Far Cry 4 {3D Screenshots}
There is a part of this game located in a "spiritual" world, and I think it might be driven by different shaders. They (Shangri-La) are like short side quests, but you have to find a mystical scroll and the place which holds it.
The first one is on the map, but after that you have to search for each one and then get to it - usually some hidden temple. I'll give you a heads up: these are the coordinates where you should search. Spoilers though.

Shangri-La:
x-245 y-505
x-545 y-482
x-751 y-812
x-678 y-560
[quote="bo3b"]That SLI call is in there because of that same experiment. I tried to lie to a single card system to see if it would help the SR4 problem, but it's not that simple apparently.[/quote] Have you by any chance managed to identify any differences between SLI & non-SLI in the API level? Given the clue from Mike that changing the SLI compatbility bits can fix the mono depth buffer in SLI I'm thinking that replicating whatever they do in non-SLI might do the trick for this game. At a complete stab in the dark - do you suppose they might affect features the card claims to support in the ID3D11Device::CheckFeatureSupport method?
bo3b said:That SLI call is in there because of that same experiment. I tried to lie to a single card system to see if it would help the SR4 problem, but it's not that simple apparently.


Have you by any chance managed to identify any differences between SLI & non-SLI at the API level? Given the clue from Mike that changing the SLI compatibility bits can fix the mono depth buffer in SLI, I'm thinking that replicating whatever they do in non-SLI might do the trick for this game.

As a complete stab in the dark - do you suppose they might affect the features the card claims to support in the ID3D11Device::CheckFeatureSupport method?
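To make that concrete, the kind of interception I have in mind would look roughly like this - log what the game queries via CheckFeatureSupport (and optionally lie about it) from inside a wrapped device, then diff the SLI and non-SLI runs. This is only an illustrative sketch, not actual 3Dmigoto code; HackerDevice, mOrigDevice and LogInfo are stand-ins for whatever the wrapper provides:

[code]
// Sketch: wrapped ID3D11Device::CheckFeatureSupport that logs every query.
HRESULT STDMETHODCALLTYPE HackerDevice::CheckFeatureSupport(
    D3D11_FEATURE Feature, void *pFeatureSupportData, UINT FeatureSupportDataSize)
{
    HRESULT hr = mOrigDevice->CheckFeatureSupport(Feature, pFeatureSupportData, FeatureSupportDataSize);
    LogInfo("CheckFeatureSupport(Feature=%i, Size=%u) = 0x%x\n", Feature, FeatureSupportDataSize, hr);

    // Example of lying about a capability to see if the game takes a different path:
    if (SUCCEEDED(hr) && Feature == D3D11_FEATURE_THREADING && pFeatureSupportData) {
        D3D11_FEATURE_DATA_THREADING *t = (D3D11_FEATURE_DATA_THREADING*)pFeatureSupportData;
        // t->DriverCommandLists = FALSE; // uncomment to experiment
    }
    return hr;
}
[/code]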

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 01/03/2015 05:48 AM   
@DarkStarSword I've been through and fixed loads more shadow/light shaders, as I set different combinations of quality presets. The big problem, and I am now convinced that this is the issue you have raised, is that in non-SLI (single GPU) there is no way to get rid of the offset rendering due to fog and clouds. I tried all sorts of things, including profiles. As soon as I go back to SLI, it's completely fixed. It's been driving me a bit nuts so I am going to leave the single GPU issue for now and focus on fixing rendering issues.

Rig: Intel i7-8700K @4.7GHz, 16Gb Ram, SSD, GTX 1080Ti, Win10x64, Asus VG278

Posted 01/03/2015 05:51 AM   
Ok, sounds like the best thing is for you to concentrate on fixing the rendering issues - at the very least SLI users will be able to enjoy the game. Meanwhile I'll keep searching for anything to help non-SLI, since I don't have much other choice short of buying a new gaming rig.

If there's any other non-SLI users following this thread interested in helping out, experimenting with running it on different profiles might help. I've tried a whole bunch already with no luck, but given how many possibilities there are maybe someone will stumble upon the right profile.

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 01/03/2015 06:52 AM   
[quote="DarkStarSword"][quote="bo3b"]That SLI call is in there because of that same experiment. I tried to lie to a single card system to see if it would help the SR4 problem, but it's not that simple apparently.[/quote] Have you by any chance managed to identify any differences between SLI & non-SLI in the API level? Given the clue from Mike that changing the SLI compatbility bits can fix the mono depth buffer in SLI I'm thinking that replicating whatever they do in non-SLI might do the trick for this game. At a complete stab in the dark - do you suppose they might affect features the card claims to support in the ID3D11Device::CheckFeatureSupport method?[/quote] I haven't looked closely enough at the SLI API to understand what might help here. If this winds up being a second game where we can't seem to fix the non-SLI version, it might be time to take a closer look. As a very good tool I've used to inspect game runtimes, look for the API Monitor from rohitab. Great for understanding a given game's startup sequence.
DarkStarSword said:
bo3b said:That SLI call is in there because of that same experiment. I tried to lie to a single card system to see if it would help the SR4 problem, but it's not that simple apparently.


Have you by any chance managed to identify any differences between SLI & non-SLI at the API level? Given the clue from Mike that changing the SLI compatibility bits can fix the mono depth buffer in SLI, I'm thinking that replicating whatever they do in non-SLI might do the trick for this game.

As a complete stab in the dark - do you suppose they might affect the features the card claims to support in the ID3D11Device::CheckFeatureSupport method?

I haven't looked closely enough at the SLI API to understand what might help here. If this winds up being a second game where we can't seem to fix the non-SLI version, it might be time to take a closer look.

A very good tool I've used to inspect game runtimes is API Monitor from rohitab - great for understanding a given game's startup sequence.

Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607
Latest 3Dmigoto Release
Bo3b's School for ShaderHackers

Posted 01/03/2015 11:34 AM   
Real 3D won't run smoothly in this game with everything on Ultra. Even Call of Duty: Advanced Warfare doesn't run smoothly with a single 970 in Real 3D at 720p.

AMD TR4 1950x @ 4.0
GA X399 AORUS Gaming 7 F12e
Gskill FlareX 32GB 4x 3200 14-14-14
INNO3D Nvidia GeForce GTX1080 8G D5 1759Mhz iChill X3 CSM Enabled
Creative SXFI AMP / USB SoundCard
SanDisk Extreme PRO SSD 480Gb
Samsung x2 1TB 7200
WD Gold 6TB Enterprise 128MB Cache (WD6002FRYZ)
Seagate 5TB Enterprise 128MB Cache (ST5000NM0024)
LG OLED55E6 3D TV 50.30.30
Logitech G502 Proteus - LGS Installed
Logitech G903
Logitech G502 Hero
Corsair Gaming K70 RED RAPIDFIRE Mechanical Keyboard
Windows 10 x64 1809 UEFI Boot

DreamScreenTV LEDs https://www.dreamscreentv.com/

Posted 01/03/2015 08:15 PM   
[quote="DarkStarSword"]Ok, sounds like the best thing is for you to concentrate on fixing the rendering issues - at the very least SLI users will be able to enjoy the game. Meanwhile I'll keep searching for a anything to help non-SLI since I don't have much other choice short of buying a new gaming rig. If there's any other non-SLI users following this thread interested in helping out, experimenting with running it on different profiles might help. I've tried a whole bunch already with no luck, but given how many possibilities there are maybe someone will stumble upon the right profile.[/quote] Sounds good. I worked on the water shaders (they are all really quite messed up) and think I have cracked the code for most of that. I will pick it up tomorrow.
DarkStarSword said:Ok, sounds like the best thing is for you to concentrate on fixing the rendering issues - at the very least SLI users will be able to enjoy the game. Meanwhile I'll keep searching for anything to help non-SLI, since I don't have much other choice short of buying a new gaming rig.

If there's any other non-SLI users following this thread interested in helping out, experimenting with running it on different profiles might help. I've tried a whole bunch already with no luck, but given how many possibilities there are maybe someone will stumble upon the right profile.

Sounds good. I worked on the water shaders (they are all really quite messed up) and think I have cracked the code for most of that. I will pick it up tomorrow.

Rig: Intel i7-8700K @4.7GHz, 16Gb Ram, SSD, GTX 1080Ti, Win10x64, Asus VG278

Posted 01/03/2015 10:10 PM   
The game runs really well on the latest drivers, patched to 1.6, with 680 SLI. The only thing that has to be on Medium or High (no higher) is terrain detail. Everything else is on Ultra and it runs smoothly whether it's in CM or S3D.

Question: since you had some problems with all the fog effects - did you check the Himalayas? I believe that with all the particles/debris and the blizzard it might be a pain as well.
As if you didn't have enough already - right ;)
Just a quick status update on some experiments I've been trying for non-SLI:

The closest I've come to success was by adding a hack into 3Dmigoto that detected when the known bad mono depth buffer was in use, disabled the active depth/stencil buffer and replaced the bad depth buffer with the one matched via the ZRepair hash. My reasoning here was that the reason the ZRepair wasn't working was the depth buffer being set as both a depth/stencil target and a shader resource simultaneously, which isn't allowed in DX11. This did indeed inject the correct depth buffer *in stereo* and I was able to get some of the effects to render correctly...

Unfortunately, this has for some reason made the driver's stereo adjustment unreliable - sometimes objects are adjusted, but swing the camera slightly and they won't be. The worst affected was the water where a seam appeared between the part of the water being stereoised correctly and the part that wasn't :( It also seems that the injected depth texture might not always be quite right, as sometimes the water rendered with full opacity (and a slightly misaligned shoreline, which seems to be related to the subtle weapon movements somehow, NFI what's up with that - maybe a bad pointer mixed up some vertex buffers or something), and from other angles renders perfectly.

This hack is currently in the fc4_hacks branch of 3Dmigoto in case someone wanted to take a look (it requires MSAA or TXAA which uses a different depth buffer to other AA settings).
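For anyone who doesn't want to dig through the branch, the guts of the hack boil down to something like the following. This is only an illustrative sketch, not the actual fc4_hacks code - HackerContext, mOrigContext, GetResourceHash() and gZBufferSRV are placeholders for however the wrapper tracks the original context, texture hashes and the ZRepair-matched buffer:

[code]
// Hash of the bad mono depth texture as logged in ShaderUsage.txt (placeholder -
// use whatever hash the logs identify on your system):
static const UINT64 BAD_MONO_DEPTH_HASH = 0x062c6213062bab8cULL;

// Sketch: in the wrapped PSSetShaderResources, spot the known bad mono depth
// texture by hash, swap in the stereo depth buffer matched by ZRepair, and
// unbind the depth/stencil target (DX11 won't allow the same resource to be
// bound as a depth target and a shader resource at the same time).
void STDMETHODCALLTYPE HackerContext::PSSetShaderResources(
    UINT StartSlot, UINT NumViews, ID3D11ShaderResourceView *const *ppSRVs)
{
    ID3D11ShaderResourceView *views[D3D11_COMMONSHADER_INPUT_RESOURCE_SLOT_COUNT] = {};

    for (UINT i = 0; i < NumViews; i++) {
        views[i] = ppSRVs ? ppSRVs[i] : NULL;
        if (views[i] && GetResourceHash(views[i]) == BAD_MONO_DEPTH_HASH && gZBufferSRV) {
            views[i] = gZBufferSRV; // inject the stereo depth buffer instead

            // Rebind the current render targets with a NULL depth/stencil view:
            ID3D11RenderTargetView *rtvs[D3D11_SIMULTANEOUS_RENDER_TARGET_COUNT] = {};
            mOrigContext->OMGetRenderTargets(D3D11_SIMULTANEOUS_RENDER_TARGET_COUNT, rtvs, NULL);
            mOrigContext->OMSetRenderTargets(D3D11_SIMULTANEOUS_RENDER_TARGET_COUNT, rtvs, NULL);
            for (UINT j = 0; j < D3D11_SIMULTANEOUS_RENDER_TARGET_COUNT; j++)
                if (rtvs[j]) rtvs[j]->Release();
        }
    }
    mOrigContext->PSSetShaderResources(StartSlot, NumViews, views);
}
[/code]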



A few other less than successful experiments:

- I've tried creating a new view of the depth buffer and injecting that, but CreateShaderResourceView is failing and I'm not sure why (a possible reason is sketched after this list).

- I've tried logging the operations that could be used to copy the depth buffer or access it from the CPU (Map, CopyResource, CopySubresourceRegion, UpdateSubresource, ResolveSubresource) in the hope that I might be able to interject and do the copy in a way that retains the data for the second eye, but I don't see the depth buffer in the log (did I miss any operations?).

- I've tried naively copying the real depth buffer into the bad depth texture with CopyResource, which didn't work (nor did I really expect it to).

- So far I haven't actually managed to get a meaningful trace out of API monitor for this game (cut short with no DX calls), but this might be PEBKAC and/or uplay related.
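On that first bullet, here is my best guess at the CreateShaderResourceView failure written out as code - depth buffers are normally created typeless (e.g. R32G8X24_TYPELESS), so the view description has to name an explicit compatible format, and the underlying texture must have been created with D3D11_BIND_SHADER_RESOURCE in the first place or no view can ever be made of it. Sketch only, untested against this game:

[code]
// Sketch: creating an SRV over a typeless depth texture. If the texture lacks
// D3D11_BIND_SHADER_RESOURCE this can never succeed, and the only option is to
// copy the depth data into a new texture that does have the flag.
HRESULT CreateDepthSRV(ID3D11Device *device, ID3D11Texture2D *depthTex,
                       ID3D11ShaderResourceView **srv)
{
    D3D11_TEXTURE2D_DESC texDesc;
    depthTex->GetDesc(&texDesc);
    if (!(texDesc.BindFlags & D3D11_BIND_SHADER_RESOURCE))
        return E_INVALIDARG;

    D3D11_SHADER_RESOURCE_VIEW_DESC desc = {};
    desc.Format = DXGI_FORMAT_R32_FLOAT_X8X24_TYPELESS; // depth half of R32G8X24_TYPELESS
    desc.ViewDimension = (texDesc.SampleDesc.Count > 1)
        ? D3D11_SRV_DIMENSION_TEXTURE2DMS
        : D3D11_SRV_DIMENSION_TEXTURE2D;
    desc.Texture2D.MostDetailedMip = 0;
    desc.Texture2D.MipLevels = 1;

    return device->CreateShaderResourceView(depthTex, &desc, srv);
}
[/code]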



A few ideas I have yet to try:

- Same as the hack I've already done with 3Dmigoto, but use a new blank depth buffer instead of disabling it.

- Keep track of more shader resource views (in the same way as the ZRepair does) and see if there is another depth texture I could use (I know for a fact that the depth buffer hash is used by several instances).

- Create my own Texture2D so I know what flags it was created with and copy the depth buffer into that.

- Read out the stereo depth buffer using nvapi's reverse stereo blit and somehow re-inject it (stereo texture with NV3D signature? set left eye, blit, set right eye, blit?) - rough sketch of this after the list

- Figure out what the SLI compatibility bits actually do (but without SLI on my own system I have nothing to compare it against)
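The reverse stereo blit idea in code form would look roughly like this - an untested sketch based on the public nvapi headers, assuming NvAPI_Initialize() and the usual format/MSAA caveats; the destination has to be a mono texture created at double the source width so both eyes land side by side:

[code]
// Sketch: read a stereo surface back as left|right side by side using nvapi's
// reverse stereo blit. "sbsDst" must be 2x the width of "stereoSrc" with a
// compatible format; error handling omitted.
#include "nvapi.h"

bool ReverseStereoBlit(ID3D11Device *device, ID3D11DeviceContext *context,
                       ID3D11Texture2D *stereoSrc, ID3D11Texture2D *sbsDst)
{
    StereoHandle stereo = NULL;
    NvAPI_Initialize();
    if (NvAPI_Stereo_CreateHandleFromIUnknown(device, &stereo) != NVAPI_OK)
        return false;

    // While reverse blit is enabled, a copy from a stereo resource writes both
    // eyes into the (double width) destination instead of just the active eye.
    NvAPI_Stereo_ReverseStereoBlitControl(stereo, true);
    context->CopySubresourceRegion(sbsDst, 0, 0, 0, 0, stereoSrc, 0, NULL);
    NvAPI_Stereo_ReverseStereoBlitControl(stereo, false);

    NvAPI_Stereo_DestroyHandle(stereo);
    return true;
}
[/code]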



I'm open to any other ideas - it's getting pretty frustrating at the moment to be honest - feels like I hit a brick wall no matter what I try.

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 01/08/2015 01:54 PM   
Thanks for putting all this effort in :-) I am not sure what to suggest to be honest. Regarding the SLI/Stereo bits, this is a long shot needle in a haystack thing, but when I used the Sniper 3 profile in Lords of the Fallen, it *made* several effects render in one eye that were not previously doing so. I can verify this at some point to make sure it was not something else, but you could look in Nvidia Inspector at the two profiles and see what is different about the SLI and stereo bits, or what "unknown" values there are at the bottom of the lists. Do you have the Nvidia API? There are .h files with loads of constants defined which might explain what some of the bits are?

Bo3b and I have been working on the decompiler issues (well, I spot them, he fixes them ;-) and making good progress with the fix otherwise. This is a hard game to fix actually, and I am still trying to work out a fair few things (not least of all the ground decals, which render a few inches above the ground and make shadows look wrong when moving). It's going to be a few weeks yet, I think.

Rig: Intel i7-8700K @4.7GHz, 16Gb Ram, SSD, GTX 1080Ti, Win10x64, Asus VG278

Posted 01/08/2015 02:10 PM   
Wooooooo Hooooooooo :)

So glad you guys are looking at this...

FarCry3 was one of my favorite 3D experiences to date & maybe the only game that I once almost played for 24 hours non stop... ( I BECAME THE JUNGLE ) LOL

Many thanks guys.

Posted 01/09/2015 12:16 PM   
Glad you guys are working on Far Cry 4! After all, there is a point to having an SLI system! :)
Thanks a lot!

ASUS Prime z370-A, i5 8600K, ASUS GTX 1080 Ti Strix, 16 GB DDR4, Corsair AX860i PSU, ASUS VG278HR, 3D Vision 2, Sound Blaster Z, Astro A50 Headphones, Windows 10 64-bit Home

Posted 01/12/2015 10:05 PM   
@DarkStarSword: Getting back late to you here, I was head-down in fixing Decompiler problems. The latest version, 0.99.30, is quite a lot better than previous versions for this game: I fixed all of the known 'udiv' problems, and all of the known 'resinfo'/GetDimensions problems. It should be generating correct code for all of those now.


For the problem of the mono depth buffer- I'm not sure I understand the nuances here, and I also am not sure we use the same terminology, so take anything I say with a caveat. Please bear with me here, I'm just sort of rambling, hoping it sparks something more useful for you.


Do we know how a surface is stereoized by the driver? What is the technique that it uses?

The reason I ask is because of the possibility of using the d3dx.ini flag of TextureOverride1, with 'StereoMode=1' to force stereo for a given texture.

When you say a mono-depth-buffer, is this the same as the driver's idea of a 'surface', and/or a directX idea of Texture?

Did you try using that texture override mechanism from d3dx.ini? It would be interesting if it made any change to the scenario at all. I'm not fully sure it's functional, but is probably worth a try if that texture is mono and should be stereo.
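To be concrete, what I have in mind is something along these lines in d3dx.ini (the hash is just a placeholder - substitute whichever hash the logs identify for the suspect texture):

[code]
; Sketch of a d3dx.ini texture override forcing a specific texture to stereo.
[TextureOverride1]
Hash=062c6213062bab8c
StereoMode=1
[/code]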


The driver has heuristics, we know, to mono some things and stereo others. Those are at least partly driven by profiles, but maybe not all. We know that changing some flags will change the heuristics, but we don't know what they are, or what they do.

Now the reason I think that is interesting, is because in SLI, it works properly. The surface/buffer is rendered in stereo, which makes the fix work. In non-SLI, that very same surface/buffer is rendered in mono, or at least sort of mono.


That may not be fully related, because you say that both eyes are the same, and shifted. So it looks like what you'd see but only from the left eye for example. That is to be contrasted with a strictly mono surface that should be offset in both eyes, and wind up usually at screen depth.

This is where you saying it is a 'mono-depth-buffer' is not completely clear to me. It seems like it's actually a stereo buffer, just that it's damaged to have both halves be the same offset.

Does that seem right? Am I mixing up ideas? If it's a driver heuristic misfiring, I'd expect it to be actually mono for real, and no offset in either eye, and thus at screen depth.


Since it works in the SLI case, that suggests that either the profile has a switch in heuristics, which seems kind of unlikely, or a driver bug that is bypassed in the SLI case. Or something else, not sure. This does seem to be exactly the same problem we saw with Saints Row. We thought that was a game engine bug, but this is 100% different game engine.

If it's some sort of heuristic problem, have you any idea why it might happen here? Are these textures perfectly square for example, or other parts that automatic mode looks at? Is it worth jacking with their dimensions slightly? Like resize them to be non-square? I know you added some more detailed logging for these, anything stand out as significantly different than other pieces that work properly?


Just for reference and terminology, I'll add the text from the automatic-mode document that seems related:

Duplicate Render Targets

One of the most obvious tasks that 3D Vision Automatic performs on an application's behalf is to duplicate the primary render target used. This allows the driver to build a render target to present each eye individually. Additionally, other render targets may be duplicated based on heuristic analysis at creation time. The driver handles all of the mapping for the developer, so that when the developer asks to bind a target, the appropriate per-eye target is bound. Likewise, if the developer uses render-to-texture, stereoization may be performed. If bound for reading, the proper left or right variant of the texture will be used.


Keeping in mind I'm no expert here, but since it works under SLI, that suggests that the most likely scenario is that the render target (the surface or texture) has been duplicated and is a stereo buffer/texture. But that render target is either damaged or created with the same image in each half/duplicate, or the driver mapping fails with a bug, returning the same side for each bind call.

Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607
Latest 3Dmigoto Release
Bo3b's School for ShaderHackers

Posted 01/15/2015 03:51 AM   
Apparently the "GPU Topology" feature (which includes SLI) has more detail in the NDA version, as does the "Frame Rendering" feature (which allows control over DirectX rendering). --- EDIT: I'm off on a tangent here, feel free to skip ahead --- In my day job I work fairly closely with low level hardware (at the operating system kernel / device driver level), so I can take a bit of a guess at what the SLI compatibility bits might do in general terms, even if I don't know the specifics. Even if you're not working at quite this low level, if you've ever written any multi-threading code you will be aware of the need for locking around access to shared data structures, and maybe you've even come across memory barriers. If you've measured performance on these things you will know they suck and maybe even had to change a mutex to a read/write lock to reduce lock contention, or found an alternative approach that avoids the need to do the locking altogether. And of course there's the lesser known but oh so significant impact of cache ping-pong, where a shared item (such as a lock) has to keep moving between the cache of different CPU cores because it keeps getting taken from different places, even if it is never actually contested. It's basically going to be the same thing in the GPU - if you add a second GPU it's going to need to be able to share resources between the two GPUs, which means locking or some other form of synchronisation primitive, but the two GPUs are a long way from each other so any time a resource needs to be accessed from a GPU that it's not currently on it will be *slow*, and if it's something the GPU is blocked on it won't be able to perform any other operations until it has got the resource. To be safe and make sure things render correctly they can go for full synchronisation between the two, but performance will suck. As such they find ways they can reduce the synchronisation - maybe both GPUs have a copy of a resource and don't bother checking if it has been updated on the other for a while. Maybe they avoid certain operations they know are going to require expensive synchronisation and use a different technique that doesn't need as much synchronisation. Maybe both GPUs perform the same calculation as it's faster than sending the result from one to another. The thing is - some of these optimisations may not be safe and could cause rendering issues, so they use these compatibility bits to define what optimisations are allowed for a given game. As I said, that's a guess based on my own experiences. I also have no idea where these optimisations take place - some are likely to be completely in hardware, others will be in the driver. Some may involve the driver telling the game that it doesn't support an operation that is known to be slow in SLI so that it is forced to do something else, and others may be an optimisation that an SLI aware game is able to do itself. ... back to the matter at hand ... 
Here's an example of one effect that uses the mono depth texture - this is a light shaft from the ceiling light in the previous two screenshots: Common vertex shader: [code] <VertexShader hash="52aed2f11227b60e"> <CalledPixelShaders>34270cfca283f591 5cc5cf5c77050b64 </CalledPixelShaders> <Register id=120 handle=00000000177F14D0>0000000000000000</Register> <Register id=125 handle=00000000177F0C10>24a83fdae0465bcc</Register> </VertexShader> [/code] Pixel shader used when anti-aliasing is disabled or SMAA: Depth information is in regsiter 1: [code] SamplerState DownsampledDepthSampler__SampObj___s : register(s1); Texture2D<float4> DownsampledDepthSampler__TexObj__ : register(t1); <PixelShader hash="34270cfca283f591"> <ParentVertexShaders>52aed2f11227b60e </ParentVertexShaders> <Register id=0 handle=0000000036177590>2413676bd2cabe27</Register> <Register id=1 handle=000000001BB5EC90>062c6213062bab8c</Register> <Register id=120 handle=00000000177F14D0>0000000000000000</Register> <Register id=125 handle=00000000177F0C10>24a83fdae0465bcc</Register> <RenderTarget id=0 handle=0000000024992750>835432345caa0c80</RenderTarget> <DepthTarget handle=000000001BB60490>44253aebc65d7056</DepthTarget> </PixelShader> RGB render target: <RenderTarget hash=835432345caa0c80 type=Texture2D Width=960 Height=540 MipLevels=1 ArraySize=1 RawFormat=9 Format="R16G16B16A16_TYPELESS" SampleDesc.Count=1 SampleDesc.Quality=0 Usage=0 BindFlags=40 CPUAccessFlags=0 MiscFlags=0></RenderTarget> Active depth target: <DepthTarget hash=44253aebc65d7056 type=Texture2D Width=960 Height=540 MipLevels=1 ArraySize=1 RawFormat=19 Format="R32G8X24_TYPELESS" SampleDesc.Count=1 SampleDesc.Quality=0 Usage=0 BindFlags=72 CPUAccessFlags=0 MiscFlags=0></DepthTarget> Mono depth texture/sampler: 062c6213062bab8c - info unknown, need to add another feature to 3Dmigoto to get it [/code] Notably, the depth texture used above (hash 062c6213062bab8c) is never used as a render or depth/stencil target. I'll have to add a bit more code to 3Dmigoto to collect information about other general textures to find out more about it. Also worthwhile to note is that this effect is being drawn at 1/2 the resolution (960x540) and then scaled up. The name of the depth texture/sampler also suggests that it has been downsampled. Also worth noting is that this effect is being drawn with an active depth target, as are many (but not all) of these broken effects. This will be the reason it can't simply access the depth target as an input (DirectX forbids it being used as both simultaneously) and uses a copy in a texture instead. For many of these effects, it may be possible to disable the active depth target as their opacity calculations would make them fully transparent on any occluded pixels anyway. This might muck up some things (e.g. an underwater explosion may be drawn in front of the water), but more seriously as I noted in my previous post for some reason this is messing up the stereoisation of these effects some (most) of the time and I haven't a clue why. 
The variation of the pixel shader for the same effect used when anti-aliasing is MSAA2 or TXAA2: Depth information is in register 0: [code] SamplerState DepthVPSampler__SampObj___s : register(s0); Texture2D<float4> DepthVPSampler__TexObj__ : register(t0); <PixelShader hash="5cc5cf5c77050b64"> <ParentVertexShaders>52aed2f11227b60e </ParentVertexShaders> <Register id=0 handle=000000001BB5E3D0>dd64dbb61b36d8f6</Register> <Register id=1 handle=0000000036177590>2413676bd2cabe27</Register> <Register id=120 handle=00000000177F14D0>0000000000000000</Register> <Register id=125 handle=00000000177F0C10>24a83fdae0465bcc</Register> <RenderTarget id=0 handle=000000001B84AA10>bad9539ab9245b05</RenderTarget> <DepthTarget handle=000000001BB5CB90>56ec0f474627a0bf</DepthTarget> </PixelShader> RGB Render target: <RenderTarget hash=bad9539ab9245b05 type=Texture2D Width=1920 Height=1080 MipLevels=1 ArraySize=1 RawFormat=9 Format="R16G16B16A16_TYPELESS" SampleDesc.Count=2 SampleDesc.Quality=0 Usage=0 BindFlags=40 CPUAccessFlags=0 MiscFlags=0></RenderTarget> Active depth target: <DepthTarget hash=56ec0f474627a0bf type=Texture2D Width=1920 Height=1080 MipLevels=1 ArraySize=1 RawFormat=19 Format="R32G8X24_TYPELESS" SampleDesc.Count=2 SampleDesc.Quality=0 Usage=0 BindFlags=72 CPUAccessFlags=0 MiscFlags=0></DepthTarget> Mono depth texture/sampler: <RenderTarget hash=dd64dbb61b36d8f6 type=Texture2D Width=1920 Height=1080 MipLevels=1 ArraySize=1 RawFormat=39 Format="R32_TYPELESS" SampleDesc.Count=1 SampleDesc.Quality=0 Usage=0 BindFlags=168 CPUAccessFlags=0 MiscFlags=0></RenderTarget> [/code] In this case, another resource with the same properties as the depth texture has been used as a render target, so I do have information about it. It is however NOT the same texture, as searching for the *handle* in the ShaderUsage.txt shows that it is only ever used as a texture, and never as a render or depth target. With these AA settings this effect has been drawn at full resolution instead of 1/2. I haven't checked, but I would expect that with MSAA4/TXAA4 it would be double resolution and with MSAA8/TXAA8 it would be 4x the screen resolution (note to self: The hashing algorithm in 3Dmigoto will not currently match these upscaled/downscaled resources if the base resolution is changed). 
For comparison, here is the outdoor shadow shader, which also uses depth information, but unlike the above shader has the correct stereo depth information: [code] <VertexShader hash="dbdfa4de5fbab6d1"> <CalledPixelShaders>149c09dd3792cebb 78fedf913827866d </CalledPixelShaders> <Register id=120 handle=00000000177F14D0>0000000000000000</Register> <Register id=125 handle=00000000177F0C10>24a83fdae0465bcc</Register> </VertexShader> [/code] No AA / SMAA: Depth information is in register 0: [code] SamplerState DepthVPSampler__SampObj___s : register(s0); Texture2D<float4> DepthVPSampler__TexObj__ : register(t0); <PixelShader hash="149c09dd3792cebb"> <ParentVertexShaders>dbdfa4de5fbab6d1 </ParentVertexShaders> <Register id=0 handle=000000001BAC15D0>617eea474caf667a</Register> <Register id=1 handle=0000000024353F50>2baa91c87b4c646a</Register> <Register id=2 handle=000000001B58A950>dcaefb46f65dc434</Register> <Register id=3 handle=000000001B58AD90>2b6806281b61ad40</Register> <Register id=120 handle=00000000177F14D0>0000000000000000</Register> <Register id=125 handle=00000000177F0C10>24a83fdae0465bcc</Register> <RenderTarget id=0 handle=000000001BB5B810>911bbcf667d2ac55</RenderTarget> <RenderTarget id=0 handle=000000004B425DD0>911bbcf667d2ac55</RenderTarget> </PixelShader> RGB render target: <RenderTarget hash=911bbcf667d2ac55 type=Texture2D Width=1920 Height=1080 MipLevels=1 ArraySize=1 RawFormat=60 Format="R8_TYPELESS" SampleDesc.Count=1 SampleDesc.Quality=0 Usage=0 BindFlags=40 CPUAccessFlags=0 MiscFlags=0></RenderTarget> Active depth target: None Stereo depth texture/sampler: <DepthTarget hash=617eea474caf667a type=Texture2D Width=1920 Height=1080 MipLevels=1 ArraySize=1 RawFormat=19 Format="R32G8X24_TYPELESS" SampleDesc.Count=1 SampleDesc.Quality=0 Usage=0 BindFlags=72 CPUAccessFlags=0 MiscFlags=0></DepthTarget> [/code] In this case the shader has no active depth/stencil target set (which makes sense given that this is a post-processing effect), but the texture it is using *IS* the depth target used elsewhere (the handle matches that used in DepthTarget for many other shaders). i.e. it is sampling directly from the real stereo depth buffer, hence why this one works while the above effect does not. Notably, in game these shadows appear to be out of sync with the rest of the scene by one frame. It is possible that the depth texture they are sampling from is actually from the previous frame, or this may be due to a draw order issue in this game. 
MSAA2 / TXAA2: Depth information is in register 0: [code] Texture2DMS<float4,2> DepthVPSampler_TextureObject : register(t0); <PixelShader hash="78fedf913827866d"> <ParentVertexShaders>dbdfa4de5fbab6d1 </ParentVertexShaders> <Register id=0 handle=000000001BB5CB90>56ec0f474627a0bf</Register> <Register id=1 handle=0000000024353F50>2baa91c87b4c646a</Register> <Register id=2 handle=000000001B58A950>dcaefb46f65dc434</Register> <Register id=3 handle=000000001B58AD90>2b6806281b61ad40</Register> <Register id=120 handle=00000000177F14D0>0000000000000000</Register> <Register id=125 handle=00000000177F0C10>24a83fdae0465bcc</Register> <RenderTarget id=0 handle=000000001BB59710>77c26bf659a54804</RenderTarget> <RenderTarget id=0 handle=000000001BB60250>77c26bf659a54804</RenderTarget> </PixelShader> RGB render target: <RenderTarget hash=77c26bf659a54804 type=Texture2D Width=1920 Height=1080 MipLevels=1 ArraySize=1 RawFormat=60 Format="R8_TYPELESS" SampleDesc.Count=2 SampleDesc.Quality=0 Usage=0 BindFlags=40 CPUAccessFlags=0 MiscFlags=0></RenderTarget> Active depth target: None Stereo depth texture/sampler: <DepthTarget hash=56ec0f474627a0bf type=Texture2D Width=1920 Height=1080 MipLevels=1 ArraySize=1 RawFormat=19 Format="R32G8X24_TYPELESS" SampleDesc.Count=2 SampleDesc.Quality=0 Usage=0 BindFlags=72 CPUAccessFlags=0 MiscFlags=0></DepthTarget> [/code] And again this has no active depth/stencil buffer and is sampling directly from the real depth buffer (again the handle for register 0 appears in the DepthTarget of many other shaders, *including* the depth buffer set for the MSAA2/TXAA2 ceiling light shaft). Unlike the no AA / SMAA case, these shadows are perfectly synchronised with the rest of the scene, indicating that the depth texture/sampler they are using is up to date for the current frame. This is also the only instance of a multi-sampled depth buffer so far (Texture2DMS<float4,2>, SampleDesc.Count=2). I don't know what the significance of this is, just adding it as it's something I've noticed. So, what conclusions can we draw? The shadows work because they are sampling the depth buffer directly. Examining the links in the ShaderUsage.txt clearly shows that the resource used for the depth buffer in many shaders is directly sampled by the shadow shader. The broken effects do not work because they are using a different resource to sample the depth information. This resource is never used as a depth target, nor a render target - it is only ever used as a texture resource. This indicates that it is never drawn by a shader on the GPU, it must be copied from the real depth target somehow, but I have yet to identify when or how this copy occurs. I believe that it is this copy operation that is losing the stereo information, perhaps because the game is performing the copy on the CPU, or perhaps there is another operation that does this copy on the GPU, but fails to maintain the stereo information. I suspect that the SLI compatibility bits somehow make this copy operation to occur differently. It may be that that game uses a different code path and uses the GPU to copy it instead of the CPU (and since the GPU knows about the stereo information this would maintain the stereo information), or it may be that the driver or hardware does something differently related to the copy - I'm not sure. I would be very interested to see the ShaderUsage.txt from this game on an SLI system to see what differs (this file is generated whenever any shader or render target is marked). 
I've pushed up the code to get this extra information, but it's not in the 0.99.30 release - should be in the next release bo3b makes :)
I've had to take a little break from this over the last few days (hence fixes to DreadOut and Eleusis on the blog), but it's time to do some more investigating now. I haven't tried any of the other ideas I wrote about in my last post as yet, but I do have high hopes for some of them :)

This post is mostly going to be a bit of a dump of the information I've gathered so far - I'm not really expecting anyone to suddenly spot the problem in all of this, and don't worry if this goes over your head, but typing this out has helped to collect my own thoughts.

bo3b said:When you say a mono-depth-buffer, is this the same as the driver's idea of a 'surface', and/or a directX idea of Texture?

I go into this in a bit more detail below, but it is probably most accurate to call it a texture, as it is never used as a render target or depth/stencil target. Rather, it seems to be that the real depth buffer has been copied into it, though as yet I don't know when or how this happens.

bo3b said:Did you try using that texture override mechanism from d3dx.ini? It would be interesting if it made any change to the scenario at all. I'm not fully sure it's functional, but is probably worth a try if that texture is mono and should be stereo.

I have already tried this. It does work in that it forces the driver to create a stereo version of the texture, but the result is that the right eye still has the correct depth information, but the left eye has no depth information at all - imagine a blank depth buffer set to infinity.

This has the result of making the water, clouds, fire, smoke, etc look correct in the right eye, but be 100% opaque in the left eye (as the depth buffer is being used to calculate opacity in most of these effects). For example, take a look at this screenshot:

[img]https://forums.geforce.com/cmd/default/download-comment-attachment/63173/[/img]

As before the water and clouds are being drawn correctly in the right eye, but instead of the left-eye halo they are now drawn fully opaque in the left eye, especially noticeable at the shoreline and where the clouds intersect the mountain. This sort of thing happens on pretty much every semi-transparent effect in the game, of which there are many.

Here's another example (ignore the broken shadows here, I haven't fixed that one yet) - without the TextureOverride to force it to use a stereo depth texture there is a halo of the barrels visible in the light shaft in the left eye:

[img]https://forums.geforce.com/cmd/default/download-comment-attachment/63174/[/img]

Forcing it to stereo makes the halo go away, but two other things break instead: The soft-particle effect on the light shaft has gone, making it abruptly intersect with the ground instead of fading out gradually, and more critically - one of the clouds on the distant mountain is now drawn fully opaque *inside* the house:

[img]https://forums.geforce.com/cmd/default/download-comment-attachment/63175/[/img]

I do believe that forcing this texture to stereo will be a necessary part of the solution, but we still need to find a way to get the correct depth information into the left eye. As for why the driver has not stereoised this texture automatically - I suspect that it is because it is never used as a render or depth/stencil target, and as such as far as the driver knows it is simply a texture and will never be stereoised (unless we force it). Rather, the depth information has (somehow) been copied into it from the real depth buffer, and this copy operation has only copied the information for the right eye.

Somehow, the SLI compatibility bits change this copy operation such that the stereo information is retained. If I had to guess, I would think that in SLI the copy is being performed on the GPU, while in non-SLI it is being performed on the CPU. It is not clear to me whether it is the game that changes the copy operation in SLI, or something in the driver or hardware - I have yet to identify when or how this copy occurs. Of course, I could be way off and maybe something else is happening that I haven't even considered.

bo3b said:The driver has heuristics, we know, to mono some things and stereo others. Those are at least partly driven by profiles, but maybe not all. We know that changing some flags will change the heuristics, but we don't know what they are, or what they do.

Yes, their driver is really a black box, and all we can do is poke at it with a stick. I know that the magic 0x701EB457 profile setting is very important - its mere existence in a profile enables a whole bunch of stereo code regardless of what value it has (the value is important as well, but I don't know what for). In particular, if this setting exists in a profile it enables windowed 3D for DX9 games, and it is required for other stereo related settings to have any effect (as I've found out through experimentation). StereoTextureEnable (0x70EDB381) seems to be the main setting that affects the heuristics as to what gets stereoised and what doesn't, but other than some of the low bits we don't know what does what. StereoCutoff (0x709A1DDF) is another setting that exists in almost every 3D profile, but I haven't a clue what it does.
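For anyone wanting to experiment with these, they can at least be poked at programmatically through the driver settings (DRS) half of NVAPI rather than clicking through Inspector every time. A rough, untested sketch - error handling is stripped and the profile name has to match whatever the driver uses for the game:

[code]
// Sketch: setting one of the stereo profile values (here StereoTextureEnable,
// 0x70EDB381) via the NVAPI driver settings (DRS) interface.
#include "nvapi.h"

bool SetProfileDword(const wchar_t *profileName, NvU32 settingId, NvU32 value)
{
    NvDRSSessionHandle session = NULL;
    NvDRSProfileHandle profile = NULL;

    if (NvAPI_Initialize() != NVAPI_OK) return false;
    NvAPI_DRS_CreateSession(&session);
    NvAPI_DRS_LoadSettings(session);
    if (NvAPI_DRS_FindProfileByName(session, (NvU16 *)profileName, &profile) != NVAPI_OK) {
        NvAPI_DRS_DestroySession(session);
        return false;
    }

    NVDRS_SETTING setting = {};
    setting.version = NVDRS_SETTING_VER;
    setting.settingId = settingId;        // e.g. 0x70EDB381 StereoTextureEnable
    setting.settingType = NVDRS_DWORD_TYPE;
    setting.u32CurrentValue = value;

    NvAPI_DRS_SetSetting(session, profile, &setting);
    NvAPI_DRS_SaveSettings(session);
    NvAPI_DRS_DestroySession(session);
    return true;
}
[/code]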

It's really a shame that nVidia insists on having a closed source driver. There are so many unanswered questions about their black box driver that would be trivial to answer with access to the source code (not to mention bugs we could fix). Of course, we've been trying to convince them of this on the Linux side for over a decade and I don't think they are going to change their mind any time soon (however, recently there have been some promising signs that there are people in nVidia who understand the benefits this would bring, and they have demonstrated that they are willing to release some information about their cards to the community-driven open source Nouveau driver, so maybe there is hope for them yet).


mike_ar69 said:you could look in Nvidia Inspector at the two profiles and see what is different about the SLI and stereo bits, or what "unknown" values there are at the bottom of the lists. Do you have the Nvidia API? There are .h files with loads of constants defined which might explain what some of the bits are?

Comparing the SLI compatibility bits of different profiles is still on my TODO list, but I don't know how much it will help unless I have some way to determine (or guess) what they actually do.

I do have the public version of the nVidia SDK and have had a bit of a look through it, but I couldn't spot definitions for the SLI compatibility bits. I'll look through it again later though - there were several bit definitions that I wasn't sure what they were related to. Does anyone happen to know if the NDA version has any extra information about SLI or 3D? Apparently the "GPU Topology" feature (which includes SLI) has more detail in the NDA version, as does the "Frame Rendering" feature (which allows control over DirectX rendering).

--- EDIT: I'm off on a tangent here, feel free to skip ahead ---

In my day job I work fairly closely with low level hardware (at the operating system kernel / device driver level), so I can take a bit of a guess at what the SLI compatibility bits might do in general terms, even if I don't know the specifics. Even if you're not working at quite this low level, if you've ever written any multi-threading code you will be aware of the need for locking around access to shared data structures, and maybe you've even come across memory barriers. If you've measured the performance of these things you will know they suck, and maybe you've even had to change a mutex to a read/write lock to reduce lock contention, or found an alternative approach that avoids the need for locking altogether. And of course there's the lesser-known but oh so significant impact of cache ping-pong, where a shared item (such as a lock) has to keep moving between the caches of different CPU cores because it keeps getting taken from different places, even if it is never actually contended.
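(For readers who haven't hit this before, here's a toy CPU-side example of the read/write lock point - nothing to do with the driver itself, just the general idea:)

[code]
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Toy example: a read-mostly table. A shared_mutex lets many readers in at
// once where a plain mutex would serialise them. Note that even the shared
// lock still updates a reader count, so its cache line keeps bouncing
// between cores - the ping-pong effect - which is why the real win is often
// avoiding the shared access altogether.
class SharedTable {
    mutable std::shared_mutex lock_;
    std::unordered_map<std::string, int> data_;
public:
    int Get(const std::string &key) const {
        std::shared_lock<std::shared_mutex> guard(lock_);  // readers in parallel
        auto it = data_.find(key);
        return it != data_.end() ? it->second : -1;
    }
    void Set(const std::string &key, int value) {
        std::unique_lock<std::shared_mutex> guard(lock_);  // writers exclusive
        data_[key] = value;
    }
};
[/code]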

It's basically going to be the same thing with GPUs: add a second GPU and resources have to be shared between the two, which means locking or some other form of synchronisation primitive. But the two GPUs are a long way from each other, so any time a resource needs to be accessed from the GPU it's not currently on, that access will be *slow*, and if it's something the GPU is blocked on, it won't be able to perform any other operations until it has the resource. To be safe and make sure things render correctly they could go for full synchronisation between the two, but performance would suck. So they find ways to reduce the synchronisation - maybe both GPUs keep a copy of a resource and don't bother checking whether it has been updated on the other for a while; maybe they avoid certain operations they know will require expensive synchronisation and use a different technique that needs less; maybe both GPUs perform the same calculation because that's faster than sending the result from one to the other. The thing is, some of these optimisations may not be safe and could cause rendering issues, so they use these compatibility bits to define which optimisations are allowed for a given game.

As I said, that's a guess based on my own experiences. I also have no idea where these optimisations take place - some are likely to be completely in hardware, others will be in the driver. Some may involve the driver telling the game that it doesn't support an operation that is known to be slow in SLI so that it is forced to do something else, and others may be an optimisation that an SLI aware game is able to do itself.


... back to the matter at hand ...



Here's an example of one effect that uses the mono depth texture - this is a light shaft from the ceiling light in the previous two screenshots:

Common vertex shader:
<VertexShader hash="52aed2f11227b60e">
<CalledPixelShaders>34270cfca283f591 5cc5cf5c77050b64 </CalledPixelShaders>
<Register id=120 handle=00000000177F14D0>0000000000000000</Register>
<Register id=125 handle=00000000177F0C10>24a83fdae0465bcc</Register>
</VertexShader>


Pixel shader used when anti-aliasing is disabled or SMAA:
Depth information is in register 1:
SamplerState DownsampledDepthSampler__SampObj___s : register(s1);
Texture2D<float4> DownsampledDepthSampler__TexObj__ : register(t1);

<PixelShader hash="34270cfca283f591">
<ParentVertexShaders>52aed2f11227b60e </ParentVertexShaders>
<Register id=0 handle=0000000036177590>2413676bd2cabe27</Register>
<Register id=1 handle=000000001BB5EC90>062c6213062bab8c</Register>
<Register id=120 handle=00000000177F14D0>0000000000000000</Register>
<Register id=125 handle=00000000177F0C10>24a83fdae0465bcc</Register>
<RenderTarget id=0 handle=0000000024992750>835432345caa0c80</RenderTarget>
<DepthTarget handle=000000001BB60490>44253aebc65d7056</DepthTarget>
</PixelShader>

RGB render target:
<RenderTarget hash=835432345caa0c80 type=Texture2D Width=960 Height=540 MipLevels=1 ArraySize=1 RawFormat=9 Format="R16G16B16A16_TYPELESS" SampleDesc.Count=1 SampleDesc.Quality=0 Usage=0 BindFlags=40 CPUAccessFlags=0 MiscFlags=0></RenderTarget>

Active depth target:
<DepthTarget hash=44253aebc65d7056 type=Texture2D Width=960 Height=540 MipLevels=1 ArraySize=1 RawFormat=19 Format="R32G8X24_TYPELESS" SampleDesc.Count=1 SampleDesc.Quality=0 Usage=0 BindFlags=72 CPUAccessFlags=0 MiscFlags=0></DepthTarget>

Mono depth texture/sampler:
062c6213062bab8c - info unknown, need to add another feature to 3Dmigoto to get it

Notably, the depth texture used above (hash 062c6213062bab8c) is never used as a render or depth/stencil target. I'll have to add a bit more code to 3Dmigoto to collect information about other general textures to find out more about it.

Also worth noting is that this effect is drawn at half resolution (960x540) and then scaled up; the name of the depth texture/sampler (DownsampledDepthSampler) also suggests it has been downsampled.

This effect is also being drawn with an active depth target, as are many (but not all) of these broken effects. That will be the reason it can't simply access the depth target as an input (DirectX forbids a resource being bound as both simultaneously) and uses a copy in a texture instead. For many of these effects it may be possible to disable the active depth target, as their opacity calculations would make them fully transparent on any occluded pixels anyway. This might muck up some things (e.g. an underwater explosion might be drawn in front of the water), but more seriously, as I noted in my previous post, for some reason doing so messes up the stereoisation of these effects some (most) of the time, and I haven't a clue why.
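A rough sketch of what "disable the active depth target" would look like from a wrapper, just before issuing one of these draw calls (whether it is actually safe depends on the effect doing its own occlusion via the sampled depth, as above):

[code]
#include <d3d11.h>

// Rebind the current render targets with a null depth/stencil view for the
// next draw call. OMGetRenderTargets AddRef()s everything it returns, so the
// references are released afterwards.
void UnbindDepthTargetForThisDraw(ID3D11DeviceContext *ctx)
{
    ID3D11RenderTargetView *rtvs[D3D11_SIMULTANEOUS_RENDER_TARGET_COUNT] = {};
    ID3D11DepthStencilView *dsv = nullptr;

    ctx->OMGetRenderTargets(D3D11_SIMULTANEOUS_RENDER_TARGET_COUNT, rtvs, &dsv);
    ctx->OMSetRenderTargets(D3D11_SIMULTANEOUS_RENDER_TARGET_COUNT, rtvs, nullptr);

    for (auto *rtv : rtvs)
        if (rtv) rtv->Release();
    if (dsv) dsv->Release();
}
[/code]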



The variation of the pixel shader for the same effect used when anti-aliasing is MSAA2 or TXAA2:
Depth information is in register 0:
SamplerState DepthVPSampler__SampObj___s : register(s0);
Texture2D<float4> DepthVPSampler__TexObj__ : register(t0);

<PixelShader hash="5cc5cf5c77050b64">
<ParentVertexShaders>52aed2f11227b60e </ParentVertexShaders>
<Register id=0 handle=000000001BB5E3D0>dd64dbb61b36d8f6</Register>
<Register id=1 handle=0000000036177590>2413676bd2cabe27</Register>
<Register id=120 handle=00000000177F14D0>0000000000000000</Register>
<Register id=125 handle=00000000177F0C10>24a83fdae0465bcc</Register>
<RenderTarget id=0 handle=000000001B84AA10>bad9539ab9245b05</RenderTarget>
<DepthTarget handle=000000001BB5CB90>56ec0f474627a0bf</DepthTarget>
</PixelShader>

RGB Render target:
<RenderTarget hash=bad9539ab9245b05 type=Texture2D Width=1920 Height=1080 MipLevels=1 ArraySize=1 RawFormat=9 Format="R16G16B16A16_TYPELESS" SampleDesc.Count=2 SampleDesc.Quality=0 Usage=0 BindFlags=40 CPUAccessFlags=0 MiscFlags=0></RenderTarget>

Active depth target:
<DepthTarget hash=56ec0f474627a0bf type=Texture2D Width=1920 Height=1080 MipLevels=1 ArraySize=1 RawFormat=19 Format="R32G8X24_TYPELESS" SampleDesc.Count=2 SampleDesc.Quality=0 Usage=0 BindFlags=72 CPUAccessFlags=0 MiscFlags=0></DepthTarget>

Mono depth texture/sampler:
<RenderTarget hash=dd64dbb61b36d8f6 type=Texture2D Width=1920 Height=1080 MipLevels=1 ArraySize=1 RawFormat=39 Format="R32_TYPELESS" SampleDesc.Count=1 SampleDesc.Quality=0 Usage=0 BindFlags=168 CPUAccessFlags=0 MiscFlags=0></RenderTarget>


In this case, another resource with the same properties as the depth texture has been used as a render target, so I do have information about it. It is however NOT the same texture, as searching for the *handle* in the ShaderUsage.txt shows that it is only ever used as a texture, and never as a render or depth target.

With these AA settings this effect has been drawn at full resolution instead of 1/2. I haven't checked, but I would expect that with MSAA4/TXAA4 it would be double resolution and with MSAA8/TXAA8 it would be 4x the screen resolution (note to self: The hashing algorithm in 3Dmigoto will not currently match these upscaled/downscaled resources if the base resolution is changed).
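One hypothetical way to address that note to self: fold the texture size into the hash as a ratio of the swap chain size rather than as absolute pixels, so the half-res and full-res variants keep matching when the base resolution changes. This is not what 3Dmigoto currently does, just a sketch of the idea (names are mine):

[code]
#include <cstdint>
#include <d3d11.h>

// FNV-1a over a 32-bit value, for brevity.
static uint32_t Fnv1a(uint32_t hash, uint32_t value)
{
    for (int i = 0; i < 4; i++) {
        hash ^= (value >> (i * 8)) & 0xff;
        hash *= 16777619u;
    }
    return hash;
}

// Hash a texture description using its size relative to the backbuffer, so
// e.g. a half-resolution target hashes the same at 1920x1080 and 1280x720.
uint32_t HashTexture2DDescScaled(const D3D11_TEXTURE2D_DESC &desc,
                                 UINT backbuffer_width, UINT backbuffer_height)
{
    uint32_t hash = 2166136261u;
    hash = Fnv1a(hash, desc.Width  ? backbuffer_width  / desc.Width  : 0);
    hash = Fnv1a(hash, desc.Height ? backbuffer_height / desc.Height : 0);
    hash = Fnv1a(hash, desc.Format);
    hash = Fnv1a(hash, desc.BindFlags);
    hash = Fnv1a(hash, desc.SampleDesc.Count);
    return hash;
}
[/code]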



For comparison, here is the outdoor shadow shader, which also uses depth information, but unlike the above shader has the correct stereo depth information:

<VertexShader hash="dbdfa4de5fbab6d1">
<CalledPixelShaders>149c09dd3792cebb 78fedf913827866d </CalledPixelShaders>
<Register id=120 handle=00000000177F14D0>0000000000000000</Register>
<Register id=125 handle=00000000177F0C10>24a83fdae0465bcc</Register>
</VertexShader>


No AA / SMAA:
Depth information is in register 0:
SamplerState DepthVPSampler__SampObj___s : register(s0);
Texture2D<float4> DepthVPSampler__TexObj__ : register(t0);

<PixelShader hash="149c09dd3792cebb">
<ParentVertexShaders>dbdfa4de5fbab6d1 </ParentVertexShaders>
<Register id=0 handle=000000001BAC15D0>617eea474caf667a</Register>
<Register id=1 handle=0000000024353F50>2baa91c87b4c646a</Register>
<Register id=2 handle=000000001B58A950>dcaefb46f65dc434</Register>
<Register id=3 handle=000000001B58AD90>2b6806281b61ad40</Register>
<Register id=120 handle=00000000177F14D0>0000000000000000</Register>
<Register id=125 handle=00000000177F0C10>24a83fdae0465bcc</Register>
<RenderTarget id=0 handle=000000001BB5B810>911bbcf667d2ac55</RenderTarget>
<RenderTarget id=0 handle=000000004B425DD0>911bbcf667d2ac55</RenderTarget>
</PixelShader>

RGB render target:
<RenderTarget hash=911bbcf667d2ac55 type=Texture2D Width=1920 Height=1080 MipLevels=1 ArraySize=1 RawFormat=60 Format="R8_TYPELESS" SampleDesc.Count=1 SampleDesc.Quality=0 Usage=0 BindFlags=40 CPUAccessFlags=0 MiscFlags=0></RenderTarget>

Active depth target:
None

Stereo depth texture/sampler:
<DepthTarget hash=617eea474caf667a type=Texture2D Width=1920 Height=1080 MipLevels=1 ArraySize=1 RawFormat=19 Format="R32G8X24_TYPELESS" SampleDesc.Count=1 SampleDesc.Quality=0 Usage=0 BindFlags=72 CPUAccessFlags=0 MiscFlags=0></DepthTarget>


In this case the shader has no active depth/stencil target set (which makes sense given that this is a post-processing effect), but the texture it is using *IS* the depth target used elsewhere (the handle matches the DepthTarget used by many other shaders) - i.e. it is sampling directly from the real stereo depth buffer, which is why this one works while the above effect does not.
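The check the ShaderUsage.txt comparison is doing by hand could also be done live in a wrapper - pointer equality on the underlying resource is exactly what makes the handles line up. Something along these lines (my own helper, assuming the depth/stencil view of interest is already known):

[code]
#include <d3d11.h>

// Does the resource behind the pixel shader's t0 input match the resource
// behind a given depth/stencil view?
bool SamplesFromDepthTarget(ID3D11DeviceContext *ctx, ID3D11DepthStencilView *dsv)
{
    ID3D11ShaderResourceView *srv = nullptr;
    ctx->PSGetShaderResources(0, 1, &srv);     // slot t0, as in the shadow shader
    if (!srv)
        return false;

    ID3D11Resource *srv_res = nullptr, *dsv_res = nullptr;
    srv->GetResource(&srv_res);
    dsv->GetResource(&dsv_res);

    bool same = (srv_res == dsv_res);          // same underlying texture?

    srv_res->Release();
    dsv_res->Release();
    srv->Release();
    return same;
}
[/code]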

Notably, in game these shadows appear to be out of sync with the rest of the scene by one frame. It is possible that the depth texture they are sampling from is actually from the previous frame, or this may be due to a draw order issue in this game.


MSAA2 / TXAA2:
Depth information is in register 0:
Texture2DMS<float4,2> DepthVPSampler_TextureObject : register(t0);

<PixelShader hash="78fedf913827866d">
<ParentVertexShaders>dbdfa4de5fbab6d1 </ParentVertexShaders>
<Register id=0 handle=000000001BB5CB90>56ec0f474627a0bf</Register>
<Register id=1 handle=0000000024353F50>2baa91c87b4c646a</Register>
<Register id=2 handle=000000001B58A950>dcaefb46f65dc434</Register>
<Register id=3 handle=000000001B58AD90>2b6806281b61ad40</Register>
<Register id=120 handle=00000000177F14D0>0000000000000000</Register>
<Register id=125 handle=00000000177F0C10>24a83fdae0465bcc</Register>
<RenderTarget id=0 handle=000000001BB59710>77c26bf659a54804</RenderTarget>
<RenderTarget id=0 handle=000000001BB60250>77c26bf659a54804</RenderTarget>
</PixelShader>

RGB render target:
<RenderTarget hash=77c26bf659a54804 type=Texture2D Width=1920 Height=1080 MipLevels=1 ArraySize=1 RawFormat=60 Format="R8_TYPELESS" SampleDesc.Count=2 SampleDesc.Quality=0 Usage=0 BindFlags=40 CPUAccessFlags=0 MiscFlags=0></RenderTarget>

Active depth target:
None

Stereo depth texture/sampler:
<DepthTarget hash=56ec0f474627a0bf type=Texture2D Width=1920 Height=1080 MipLevels=1 ArraySize=1 RawFormat=19 Format="R32G8X24_TYPELESS" SampleDesc.Count=2 SampleDesc.Quality=0 Usage=0 BindFlags=72 CPUAccessFlags=0 MiscFlags=0></DepthTarget>


And again this has no active depth/stencil buffer and is sampling directly from the real depth buffer (again the handle for register 0 appears in the DepthTarget of many other shaders, *including* the depth buffer set for the MSAA2/TXAA2 ceiling light shaft).

Unlike the no AA / SMAA case, these shadows are perfectly synchronised with the rest of the scene, indicating that the depth texture/sampler they are using is up to date for the current frame.

This is also the only instance of a multi-sampled depth buffer so far (Texture2DMS<float4,2>, SampleDesc.Count=2). I don't know what the significance of this is, just adding it as it's something I've noticed.



So, what conclusions can we draw?

The shadows work because they are sampling the depth buffer directly. Examining the links in the ShaderUsage.txt clearly shows that the resource used for the depth buffer in many shaders is directly sampled by the shadow shader.

The broken effects do not work because they are using a different resource to sample the depth information. This resource is never used as a depth target, nor as a render target - it is only ever used as a texture resource. This indicates that it is never drawn by a shader on the GPU; it must be copied from the real depth target somehow, but I have yet to identify when or how this copy occurs.
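One way to catch it, if it happens through the D3D11 API at all, would be to log every copy/resolve whose destination is the suspect texture from the wrapper's hooked context methods. A sketch (g_mono_depth_tex is a placeholder for however the wrapper identifies that resource, e.g. by hash at bind time):

[code]
#include <cstdio>
#include <d3d11.h>

// Placeholder for the resource we want to track.
extern ID3D11Resource *g_mono_depth_tex;

// Called from the wrapper's hooked ID3D11DeviceContext methods.
void LogIfMonoDepthCopy(const char *api, ID3D11Resource *dst, ID3D11Resource *src)
{
    if (dst == g_mono_depth_tex)
        fprintf(stderr, "mono depth texture written by %s (src=%p)\n", api, (void *)src);
}

// e.g. from the hooked methods:
//   LogIfMonoDepthCopy("CopyResource", pDstResource, pSrcResource);
//   LogIfMonoDepthCopy("CopySubresourceRegion", pDstResource, pSrcResource);
//   LogIfMonoDepthCopy("ResolveSubresource", pDstResource, pSrcResource);
//   LogIfMonoDepthCopy("UpdateSubresource", pDstResource, nullptr);
// If none of these ever fire, the copy is happening somewhere we can't see
// from the D3D11 API (driver internal, or a CPU path we haven't hooked).
[/code]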

I believe it is this copy operation that is losing the stereo information - perhaps because the game is performing the copy on the CPU, or perhaps because some other operation performs the copy on the GPU but fails to maintain the stereo information.

I suspect that the SLI compatibility bits somehow cause this copy operation to occur differently. It may be that the game takes a different code path and copies it on the GPU instead of the CPU (and since the GPU knows about the stereo information, this would preserve it), or it may be that the driver or hardware does something differently related to the copy - I'm not sure.

I would be very interested to see the ShaderUsage.txt from this game on an SLI system to see what differs (this file is generated whenever any shader or render target is marked). I've pushed up the code to get this extra information, but it's not in the 0.99.30 release - should be in the next release bo3b makes :)

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 01/15/2015 03:38 PM   
I pulled in top-of-tree and ran against the current shader set in the project. This is running SLI, but no fix is enabled for the white fog/tree haze problem - that shouldn't matter for a ShaderUsage.txt, right?
Attachments

ShaderUsage.txt.jpg

Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607
Latest 3Dmigoto Release
Bo3b's School for ShaderHackers

Posted 01/16/2015 06:46 AM   