3Dmigoto now open-source...
ddr0390 said:BTW should line 17 be "};" or just "}"?

"};" - this is an ugly left over from the C language from which HLSL is inspired. In C it is necessary as you can specify a variable name straight after a structure definition and the semicolon tells the language where the variable name ends or that you aren't using one. This makes no sense in HLSL and IMO was a mistake in C as well as it is inconsistent with other } constructs and therefore easily forgotten, I suspect leading to millions of hours of wasted compiler time and a lot of head scratching followed by face palming in cases where the compiler's error message says some cryptic nonsense about whatever came after the mistake instead of pointing to the mistake itself (I will give credit to compilers that suggest you may have forgotten a semicolon, but this should never have been in the language).

Does the fact that there is "t3.xxxx" (line 121 in original shader) in one case and "t3.xyzw" (line 159) in the other case have any meaning for fixing the code?
Yes it does, and I'm glad to see you were observant enough to question it, but don't worry too much if this explanation goes over your head - most of these ld_structured_indexable cases are the simple cases where you can ignore it, as is the case here - but it's worth knowing what to look out for that may indicate you need to do something special :)

It plays into the register component swizzle for inputs (which may swap register components around or affect which entry in the struct is selected) and masking for outputs. Unfortunately this is a complicated area with some key differences in how it works between assembly and HLSL which can complicate translating between the two, which is why I didn't cover this before.

ld_str...(...)(...) r3.w, v10.x, l(272), t3.xxxx
                    ^^^^ first parameter is output register and >component mask<
                          ^^^^^ structure index within buffer
                                 ^^^^^^ offset within individual structure
                                         ^^^^^^^ input register and >swizzle<

In this case the first 32-bit component (x) of the input t3 register at the offset specified by v10.x and l(272) is swizzled to all four channels (xxxx), but only the fourth channel (w) is enabled in the output mask, so you can disregard the first three channels in the swizzle, making this r3.w = t3.___x or just r3.w = t3.x for some offset of t3. Since the offset points to "float m_glowFactor", which is a >single< 32-bit float, and the x in the swizzle indicates we are interested in the first 32 bits at that offset, we just take the float by itself. In other words, there is a lot of complexity here, but we can just ignore it in this case.

ld_st...(...)(...) r2.xyzw, v10.x, l(208), t3.xyzw
                   ^^^^^^^ first parameter is output register and >component mask<
                                           ^^^^^^^ input register and >swizzle<

In this case the output mask on r2 says we are writing to all four components (xyzw), and the input swizzle says we are reading from all four components (that is, four consecutive 32bit values) from the input structure >in the same xyzw order<. If we look at what the offset points to in the structure we find a "float4 m_debugColor", matching both the swizzle and mask, so in this case we can simply take all four input values (it's a float4, so this matches) and put them in the output register without re-arranging anything.
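If it helps to see where this ends up, here is a rough sketch of how those two loads could be written by hand in HLSL. The struct and buffer names here are placeholders for whatever the structure definition in your shader actually calls them, and the lines go inside the shader body where r2, r3 and v10 are already declared:

struct MaterialParams          // placeholder name - use the real struct from your shader
{
    float4 m_debugColor;       // the entry at offset l(208) in the real struct
    float m_glowFactor;        // the entry at offset l(272) in the real struct
};
StructuredBuffer<MaterialParams> material_params : register(t3);

// ld_structured_indexable r2.xyzw, v10.x, l(208), t3.xyzw:
r2.xyzw = material_params[v10.x].m_debugColor;
// ld_structured_indexable r3.w, v10.x, l(272), t3.xxxx:
r3.w = material_params[v10.x].m_glowFactor;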

Most of these that I've seen fall into one of these two simple cases, but you may come across other cases where this is more complicated, such as if you have two or three components in the output mask, especially if they are "yz", "yw", "zw" or "yzw" (a missing x, and maybe y, means you >disregard< the first, and maybe second, position in the input swizzle), or where the swizzle is something other than "xxxx" or "xyzw", meaning you may have to re-arrange components or account for a slightly different offset in the structure. r2.xyzw = t1.xxxx is another special case where you take a single input component and replicate it to multiple (four in this case) output components.

Occasionally you may also find a situation where the item you find in the structure doesn't match with the number of components the instruction is using and you have to do something more complicated to specify the correct items. e.g. perhaps you find a "float2 foo;" (two components) followed by a "float bar;" (one component) and a "float baz;" (one component) and the instruction calls for four components in consecutive xyzw order, which would become something like output = float4(struct[index].foo.x, struct[index].foo.y, struct[index].bar, struct[index].baz); This shouldn't be very common, and most of the load instructions I've seen only point to a single entry in the structure, but I'm pretty sure I have seen this happen occasionally.
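Sketched out (with made-up names for the struct, buffer and register slot, and with "output" and "index" standing in for whatever the decompiled shader already uses), that mismatched layout and its load would look something like this:

struct ExampleStruct           // made-up name
{
    float2 foo;
    float bar;
    float baz;
};
StructuredBuffer<ExampleStruct> example_buf : register(t4);   // made-up slot

// Four consecutive 32-bit values spanning three members of the struct:
output = float4(example_buf[index].foo.x, example_buf[index].foo.y,
                example_buf[index].bar, example_buf[index].baz);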

Above I mentioned that Assembly and HLSL are slightly different in how this is specified - in assembly an input swizzle must always be specified with four components (one also works, but two or three are silently corrupted), and the output mask indicates which ones are in use and which are ignored. In HLSL you don't even mention any components that aren't in use. So in assembly an output mask of r3.yzw and an input swizzle of t1.xxyz becomes r3.yzw = t1.xyz in HLSL (plus whatever other boilerplate is needed on that line), where the first x has been dropped since it was masked out in assembly.
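As a trivial, compilable illustration of that translation (t1 and r3 here are just local stand-ins for whatever registers your shader actually uses):

float4 translate_example(float4 t1)
{
    float4 r3 = 0;

    // Assembly: output mask r3.yzw, input swizzle t1.xxyz
    // HLSL: masked-out components are not mentioned at all, and the leading
    // x of the swizzle is dropped because it lined up with the masked-out
    // output x:
    r3.yzw = t1.xyz;

    return r3;
}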

I don't get an error sound when I start the game. But as soon as I make any changes to the shader and reload the shader I get an error sound (used the old 3Dmigoto version included in the original fix). Also inserting my workaround code in the shader BEFORE starting the game has no results as the shader seems to be ignored.
Use the newest version - there are a number of bug fixes to do with shader reloading and caching that may be important here, and it has an overlay that will tell you what the error was instead of having to check the log. We very rarely make any backwards-incompatible changes in 3DMigoto, so in most cases it is safe to use a newer version.

ddr0390 said:Update: Tested the changed shader with a "clean install" of 3Dmigoto (1.3.11) and only f8bdf818996f7aae-ps_replace.txt in ShaderFixes to prevent any conflicts with other fixed shaders.
Got the same error beep when reloading the shader after implementing my workaround code and the UI says that there is a problem with
"f8bdf818996f7aae-ps_replace.txt(135,10-18): error X3004: undeclared identifier 'AnisoWrap' ".
This seems to refer to
r2.y = AnisoWrap[]..swiz;
Ahh, I missed those when I looked at the shaders originally. Just delete all those lines - they are bogus outputs from the code stub that warned about the ld_structured_indexable just above them.

I've added tackling structured buffer support in the decompiler to my TODO list, so a future version of 3DMigoto might be able to handle most of this for you.

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 10/27/2018 04:14 PM   
Update: Just deleted the lines with AnisoWrap[]..swiz and now I get no error sound/message any more. Also the specular glow workaround shows the desired result. So far I could not observe any negative side effects, the surfaces look normal. I will change the other shaders the same way and test if it works.

My original display name is 3d4dd - for some reason Nvidia changed it..?!

Posted 10/27/2018 04:18 PM   
So we were posting simultaneously ;)
Tested some more shaders and everything works fine now. Thank you very much for your great support and especially for patiently explaining all the details!

My original display name is 3d4dd - for some reason Nvidia changed it..?!

Posted 10/27/2018 04:48 PM   
DarkStarSword:

i managed to get a dump out of MGSV w/ 3Dmigoto by fiddling w/ the INI settings. i also got something that 'seems' usable according to dump folder's size from other games but every time i try to open any of those files in blender i get the following message:

Only draw calls using a single vertex buffer and a single index buffer are supported for now.

is there a way to manually import 'parts' of the dump into blender to see if i'm getting the right geometry out of 3Dmigoto?

thanks

i rip posed models from video games, clean them up in blender and post them on thingiverse so people can 3D print them.

Posted 11/12/2018 02:46 PM   
Are we really dumping models when extracting shaders from a game or do you export the models first and try to replicate the game's rendering with shaders?

Thanks to everybody using my assembler it warms my heart.
To have a critical piece of code that everyone can enjoy!
What more can you ask for?

donations: ulfjalmbrant@hotmail.com

Posted 11/12/2018 02:51 PM   
Only draw calls using a single vertex buffer and a single index buffer are supported for now.
This is still on my TODO list - I've got a few people waiting on support for loading models spread across multiple buffers. I've got a trello card for it if you want to keep an eye on when I get to it:
https://trello.com/c/x0XT1NO3/42-blender-addon-handle-meshes-using-multiple-vertex-buffers

Are we really dumping models when extracting shaders from a game
It's not part of the ordinary workflow - it's something that can be enabled in frame analysis, and mostly is targeted at other modding communities that are using 3DMigoto to do mesh swaps (DOAXVV is the big one right now), and some people are now using 3DMigoto in place of dedicated tools like Ninja Ripper.

try to replicate the game's rendering with shaders?
Some people are doing that to create fan art, e.g. shuubaru on DeviantArt and everyone using his ports (NSFW so I won't link directly), and some people like fASE-2 are using it to create 3D printed objects.

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 11/13/2018 12:29 AM   
Is it possible to set up a hold toggle in the d3dx.ini? I want to hold a button and then release it so the shader will be off, hold it again to turn it on.

I've tried "type = hold" but that requires you to hold the button down constantly to keep the shader off.

1080 Ti - i7 5820k - 16Gb RAM - Win 10 version 1607 - ASUS VG236H (1920x1080@120Hz)

Posted 11/27/2018 06:36 PM   
Yep, combine type=toggle with a delay, like:

[KeyToggleMods]
Key = no_modifiers F2
type = toggle
delay = 1000
z = 0
run = CommandListToggleMods
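For reference, since the rest of the fix isn't shown here, the command list that key runs might look something like this minimal sketch - assuming a global $mods_enabled variable declared in [Constants] that the rest of the fix checks to decide whether to apply its changes (the variable name is made up for illustration):

[Constants]
global $mods_enabled = 1

[CommandListToggleMods]
; Flip the flag each time the key fires after the hold delay:
if $mods_enabled == 1
    $mods_enabled = 0
else
    $mods_enabled = 1
endif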

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 11/28/2018 02:46 AM   
That works, thanks!

1080 Ti - i7 5820k - 16Gb RAM - Win 10 version 1607 - ASUS VG236H (1920x1080@120Hz)

Posted 11/28/2018 06:32 AM   
Hi, 3dmigoto is awesome!

I've experienced texture hash collision when replacing textures with 3dmigoto, even with texture_hash = 1. I've had a situation where I swapped 2 textures, and the 2nd texture replaced the 1st one because they had the same hash.

Would it be possible to add a capability to compare arbitrary pixels in a texture to avoid collision?

Something like

[TextureOverride1]
hash = 12345678
match_pixel 1024*1 + 50 == FF FF FF ; or 00 00 00 FF 00 00 00 FF 00 00 00 FF .. whatever's the right value depending on texture format?
match_pixel 1024*1 + 51 == 00 00 00 ;

would check if pixels at (x,y) (50, 1) and (51,1) are white and black after checking the hash.. For example imagine it's a 1024 x 1024 texture.

Posted 12/02/2018 07:37 PM   
It's highly unlikely you are actually getting a hash collision if the textures are distinct. It is far more likely that the game has updated a texture after creating it, thereby causing the hash 3DMigoto calculated earlier to become stale. Textures dumped with frame analysis include warnings !M! !U! !C! and/or !S! in the filename signifying a possible hash contamination (the letters indicate what caused the contamination), and ShaderUsage.txt indicates this with hash_contaminated=true (additional details elsewhere in the file).

You can tell 3DMigoto to recalculate the hashes in the most common* of these circumstances by enabling track_texture_updates=1, however there is a significant performance penalty associated with that which can harm the framerate of CPU-bound players or games that play FMVs (improvements are planned in this area, but will require changes to the texture hashes).

* exceptions include textures used as render targets or UAVs, region copies and copies between distinct subresource levels - that latter one I've seen in Far Cry 4 when changing the texture quality in the settings, but otherwise these generally do not represent cases you would be trying to match hashes on. We also never recalculate hashes on buffers, because those are updated constantly.

We're not going to add individual pixel checks on the CPU - it would suffer exactly the same pitfalls as hashes do at the moment with no real added benefit (performance maybe, but only if we drop the hashes altogether). Plus, most games use block compression, so we'd have to decompress the textures to do pixel checks at all, adding extra overhead (block compression is efficient for GPUs to decompress, not CPUs), or we would have to do compressed byte checks for those cases.

You can however do this on the GPU, e.g. I do this in WATCH_DOGS2 where stale hashes only affected a single texture and I didn't want to pay the penalty for enabling hash tracking:

// Textures we are going to match in the shader instead of the CPU:
Texture2D<float4> ps_t0 : register(t100);
Texture2D<float4> bullet : register(t101);

bool textures_match(Texture2D<float4> tex1, Texture2D<float4> tex2)
{
    uint w1, h1, w2, h2, x, y;

    tex1.GetDimensions(w1, h1);
    tex2.GetDimensions(w2, h2);

    if (w1 != w2 || h1 != h2)
        return false;

    for (y = 0; y < h1; y++) {
        for (x = 0; x < w1; x++) {
            if (any(tex1[uint2(x, y)] != tex2[uint2(x, y)]))
                return false;
        }
    }

    return true;
}

...

// Check against blacklisted textures that 3DMigoto can't filter due to hash
// contamination (without incurring a performance penalty):
if (!textures_match(ps_t0, bullet)) {
    to_screen_depth(o2);
}

[ShaderOverrideWaypointMarkersMinimapPS]
hash = 0632b4741fa82155
; There are also some other textures that we don't want to return to screen
; depth, but that suffer from hash contamination and matching the hash from
; 3DMigoto is unreliable without track_texture_updates=1, but we don't want to
; enable that due to the performance impact it incurs. Instead, we are going to
; match these on the GPU. For now, we pass a copy of the textures to match into
; the shader:
vs-t100 = ps-t0
vs-t101 = ResourceBullet
post vs-t100 = null
post vs-t101 = null

; Dump textures in mono so we can feed them back into the shaders to check for
; matches on the GPU:
analyse_options = dump_tex dds mono

[ResourceBullet]
filename = ShaderFixes/2dcbd7e1-bullet.dds


In this case the texture I was checking was very small, so I wasn't worried about the performance penalty associated with iterating over every pixel. For larger textures you could check a handful of known distinct pixels, or use custom shaders to do the comparison in a parallel manner then downscale to get the result (map/reduce).
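For example, a spot-check variant of the textures_match function above (the probe coordinates are placeholders - pick texels you know differ between the textures you are trying to tell apart):

bool textures_match_spot_check(Texture2D<float4> tex1, Texture2D<float4> tex2)
{
    uint w1, h1, w2, h2, i;

    tex1.GetDimensions(w1, h1);
    tex2.GetDimensions(w2, h2);

    if (w1 != w2 || h1 != h2)
        return false;

    // Placeholder coordinates - a handful of texels known to be distinct:
    static const uint2 probes[4] = {
        uint2(50, 1), uint2(51, 1), uint2(100, 100), uint2(200, 7)
    };

    for (i = 0; i < 4; i++) {
        if (any(tex1[probes[i]] != tex2[probes[i]]))
            return false;
    }

    return true;
}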

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 12/03/2018 03:57 AM   
3Dmigoto 1.3.12

https://github.com/bo3b/3Dmigoto/releases
This release is largely focussed on improving some aspects of shader hunting and the decompiler.

New Marking Actions

There is a new 'marking_actions' option in the d3dx.ini to choose what happens when marking a shader or buffer:

"hlsl" = decompile shader to HLSL and copy to ShaderFixes (as before)
"asm" = disassemble shader and copy to ShaderFixes
"clipboard" = copy shader/buffer hash to clipboard
"mono_snapshot" = take mono screenshot (replaces old mark_snapshot=1 option)
"stereo_snapshot" = take stereo screenshot (replaces old mark_snapshot=2 option)
"snapshot_if_pink" = limit mono/stereo_snapshot to when marking_mode=pink

If both "hlsl" and "asm" are specified 3DMigoto will first try to decompile a shader to HLSL. If that fails, or the decompiled shader fails to recompile 3DMigoto will then automatically switch to using assembly. In this configuration the failed HLSL shader will be included in a comment block at the end of the assembly shader (this will not happen if only "asm" is selected).

"clipboard" is the most useful option for hunting vertex/index buffers, as it allows the hash to be quickly and accurately copied to another text file. It can also be useful to grab a hash to define a ShaderOverride section.

In addition, copy on mark will now refuse to write out a shader if any version (HLSL, Assembly or Binary) of it already exists in ShaderFixes, and existing Assembly files will now be touched to appear as the most recently modified file in much the same way as we already do for existing HLSL files. The ShaderFixes folder itself will now also have its timestamp updated, which helps encourage Windows Explorer to refresh automatically.
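For example (a hedged sketch - I'm assuming marking_actions sits alongside the other marking options in the [Hunting] section of the d3dx.ini), to decompile to HLSL, fall back to assembly if that fails, and put the hash on the clipboard with every mark:

[Hunting]
marking_actions = hlsl asm clipboard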

Vertex Buffer Hunting Keys

3DMigoto can now hunt vertex buffers in the same way it could already hunt for index buffers or shaders. This is of most interest to the modding communities using 3DMigoto for mesh swaps, where Vertex buffers tend to be a more precise way to match a given mesh than an index buffer.

Numpad / * - have been set as the vertex buffer hunting keys in the template d3dx.ini (since numpad 7, 8 and 9 were already set for index buffer hunting). These keys were previously used for render target hunting, but since render target hunting is very rarely useful it has been disabled in the template d3dx.ini - you can always uncomment it if you want it.

Decompiler

The decompiler will now properly handle StructuredBuffers, RWStructuredBuffers and shared memory. If reflection information is available in the shader it will include the structure definition and decode the loads and stores to show the correct variable being accessed within the structure, which should make it a lot easier to understand what shaders that use these StructuredBuffers are doing - something that is becoming more and more prevalent.

Even if the decompiler fails on the shader for some other reason (as it does for all compute shaders), the fact that "marking_actions=hlsl asm" will include the HLSL with these decoded StructuredBuffers in a comment will still make life a lot easier.

When reflection information is not available it will manufacture fake type information, much like the decompiler already does for constant buffers, so that the shader has a chance of compiling. This also handles the "fake" structured buffer case, where a StructuredBuffer just uses a primitive type (such as float4) directly rather than a structure.
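To illustrate that "fake" structured buffer case (the names here are made up - this is only a sketch of the sort of HLSL you might see, not the decompiler's literal output), the declaration and an access might come out looking like:

StructuredBuffer<float4> fake_structured_buf : register(t3);

float4 main(nointerpolation uint index : TEXCOORD0) : SV_Target
{
    // Equivalent of an ld_structured_indexable with a zero offset, where the
    // "struct" is just a float4:
    return fake_structured_buf[index];
}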

cmd_Decompiler has been updated with this support as well.

A small caution here: This update has required extensive updates to the HLSL and Binary decompilers, and while I have tested this fairly thoroughly there is a risk of new bugs appearing here. Please keep an eye out for any new problems or regressions in this version and report them.

In addition:
  • Fixed cases where hex literals such as l(0xffff, 0xffff, 0xffff, 0xffff) would be decompiled as l(0, 0, 0, 0)
  • Fixed ByteAddressBuffers being mistaken for StructuredBuffers (raw loads/stores still not supported)
  • Fixed a memory leak in the decompiler
  • Test suite added for the decompiler
  • cmd_Decompiler now has a --copy-reflection option to use reflection information from an existing shader binary
  • cmd_Decompiler now has a --version option to report its version on the command line


Raw Resources

Support for accessing raw buffers (AKA ByteAddressBuffers) has been added to the arbitrary resource copying. You can now manipulate raw buffers created by the game, or make your own if you desire. This post (https://forums.geforce.com/default/topic/980906/3d-vision/resident-evil-7/post/5896712/#5896712) contains an example of converting a raw buffer to mono using this new support as one method to fix the 9 second issue in Resident Evil 7.

Raw buffers may or may not have a type - if they do they can also be bound as a regular typed buffer, and 3DMigoto will treat them this way by default - in this case use the "raw" keyword to bind it as a ByteAddressBuffer instead. If a raw buffer lacks a type 3DMigoto will always treat it as raw.

This is how you might go about creating your own raw resource if you needed one (raw resources are noteworthy in that they provide a means to perform atomic operations between threads in a compute shader):

[ResourceByteAddressable]
type = ByteAddressBuffer
format = R32_TYPELESS
array = 50

[CustomShaderRawCS]
cs-u0 = raw ResourceByteAddressable
post cs-u0 = null
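And a minimal compute shader sketch of the kind of thing you might run against that resource - here just doing an atomic add on the first 32 bits so multiple threads can update a shared counter. The shader file name, how you dispatch it and what you count are up to you; this is only illustrative:

RWByteAddressBuffer counters : register(u0);

[numthreads(64, 1, 1)]
void main(uint3 tid : SV_DispatchThreadID)
{
    // Atomically increment a 32-bit counter at byte offset 0:
    uint prev;
    counters.InterlockedAdd(0, 1, prev);
}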


Misc

  • The take_screenshot key (print screen by default) will now work when hunting is disabled. This triggers a 3D Vision screenshot much like Alt+F1, but always uses .pns and doesn't show the screenshot saved message.
  • Fix a bug causing crashes or otherwise interfering with full screen or resolution changes after dumping a shader when mark_snapshot was set to 2
  • Stereo snapshot on mark will now automatically fall back to a mono snapshot if stereo is disabled
  • Touching an existing file on mark will no longer add a space to the end of the file

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 12/03/2018 11:11 PM   
Thanks a lot DSS!!!
Those new marking actions will be very useful.

MY WEB

Helix Mod - Making 3D Better

My 3D Screenshot Gallery

Like my fixes? you can donate to Paypal: dhr.donation@gmail.com

Posted 12/04/2018 12:00 PM   
DarkStarSword said: @Losti: This is an alternate to deny_cpu_read for CryEngine that performs better - this is from Lichdom, so it may be a different shader in KDC (or possibly other differences - let me know if you have trouble).

Here's an update to this that will hopefully eliminate any remaining flickering and can hopefully be adapted to solve your occlusion culling issues in Dead Rising as well. Whereas my prior attempt merged the left and right depth buffers together into some sort of weird hybrid, this attempt de-stereoises the depth buffer in a sort of reverse-compatibility mode operation so that the depth values should line up with what the game is expecting. Once again this is from Lichdom, so may have to be adapted for KCD and Dead Rising.

I've put together two versions of this - one is more accurate, but uses some expensive atomic operations and two shaders, while the second skips those expensive operations to save performance at the cost of some corruption on the resulting depth buffer - whether that will be enough to matter or not I can't say, but it seems fine where I'm testing in Lichdom.

I had some rather unexpected results using UAVs with pixel shaders while setting this up - the pixel shader ran for both eyes insofar as the bound depth buffer (ps-t0) and render target (o0) were concerned, but both eyes ended up writing to just the right eye's view of the UAV (ps-u1). I've ended up exploiting that a little below, but it's worth checking if the debug output looks correct in case there are any differences for you. For reference I had SLI *enabled* while doing this and StereoFlagsDX10 set to 0x00000008 (either of which could conceivably affect this, though I haven't experimented with different options as yet).

[ShaderOverrideFlicker]
hash = 630978719b4885a7
; This shader downscales the depth buffer prior to sending it to the CPU for
; occlusion culling. The CPU will only get a mono copy, which results in
; incorrect occlusion culling and appears as flickering geometry in the
; distance that is just to the right of something in the foreground.
;
if stereo_active
    ; CHOOSE ONE (FIXME: I should switch this over to "post" for better
    ; performance, but that changes which binds are used in a few places
    ; and I wanted this closer to what will be required for Dead Rising):
    ;run = CustomShaderOcclusionCullingFlattenDepthBufferWithAtomicOps
    ;run = CustomShaderOcclusionCullingFlattenDepthBufferRaceCondition
endif

; The depth buffer used for occlusion culling is a weird scale that complicates
; finding the linear depth. Grab the regular depth target instead:
[ResourceZBuffer]
[ClearDepthStencilView]
ResourceZBuffer = ref this

Version 1:

No race condition so has a cleaner resulting depth buffer, but at the cost of having worse performance:

[ResourceOcclusionCullingFlattenDepthBufferUint]
; This resource is used for atomic operations to avoid race conditions, and
; must be a uint. It seems (rather unexpectedly, and possibly due to a driver
; bug) that when used as a pixel shader UAV both eyes will write to just one
; side of this buffer if it is forced stereo, so may as well just keep it mono:
format = R32_UINT
mode = mono

[ResourceOcclusionCullingFlattenDepthBufferFloat]
; This buffer serves two purposes:
; 1. During the first flattening stage it's width, height and stereo determines
;    how many pixel shader invocations will run, but the pixel shader will not
;    write to it. We want depth values to be combined from both eyes, so we set
;    it stereo.
; 2. This buffer is written to by the second shader that translates the
;    normalised depth values in the uint buffer back to the original floating
;    point range that the game is expecting. Since this has already been
;    reduced to mono in phase 1 we would only need mono for this, but requiring
;    stereo for 1. trumps this.
; Part of the reason these two purposes are conflated in a single buffer is in
; part due to DirectX API limitations - if we bind a different o0 for the
; second shader, DirectX will unbind ps-u1. That's not a show stopper, but it
; would mean having to perform a bit of a dance to get things where we want
; them, and having this one buffer serve two purposes allows us to skip
; changing any of the render target or UAV binds between shaders.
mode = stereo

[CustomShaderOcclusionCullingFlattenDepthBufferWithAtomicOps]
; This shader will take a stereo depth buffer and perform a sort of
; reverse-compatibility mode transformation on it to move the depth values to
; their mono locations so that once the occlusion culling buffer is read out on
; the CPU things will more or less match up with what it expects.
ps = ShaderFixes\cryengine_occlusion_culling_flatten_depth_buffer.hlsl
;
; We only want to invoke this once during the downscaling process. Dead
; Rising's frame analysis looks like there are two indenendent occlusion
; culling buffers being read out, so this should possibly be removed for that
; game:
max_executions_per_frame = 1
;
; ps-t0 already contains the depth buffer we need to flatten (game specific)
;
; Create our UAVs and render targets to match the depth buffer size in ps-t0:
ResourceOcclusionCullingFlattenDepthBufferUint = copy_desc ps-t0
ResourceOcclusionCullingFlattenDepthBufferFloat = copy_desc ps-t0
;
; Clear the flattened depth buffer we are using as a UAV:
clear = ResourceOcclusionCullingFlattenDepthBufferUint 0
;
; Remove any potentially incompatible render targets:
run = BuiltInCommandListUnbindAllRenderTargets
;
; o0 must be set to a resource the same size as the depth buffer we are working
; on as this determines how many pixel shader invocations will run, even though
; this shader does not actually write to it:
o0 = set_viewport ResourceOcclusionCullingFlattenDepthBufferFloat
;
; Bind the flattened depth buffer as a UAV to allow the pixel shader to draw
; arbitrary pixels rather than just SV_Target:
ps-u1 = ResourceOcclusionCullingFlattenDepthBufferUint
;
; Bind the PER_FRAME constant buffer to get the view-projection matrix and
; calculate linear depth (game specific):
ps-cb13 = vs-cb3
;
; Bind the depth buffer used to calculate the linear depth (but in this game's
; case not the same depth values we will be passing to the CPU):
ps-t100 = ResourceZBuffer
;
; Invoke the flattening shader:
draw = from_caller
;
; Load up the shader for the next stage with o0 and ps-u1 still bound:
post run = CustomShaderOcclusionCullingDenormaliseDepthBuffer

[CustomShaderOcclusionCullingDenormaliseDepthBuffer]
; This shader converts the normalised depth values back to the range the game is after.
ps = ShaderFixes\cryengine_occlusion_culling_denormalise_depth_buffer.hlsl
;
; ps-u1 is still bound with the normalised flattened depth buffer and o0 is
; still bound with the floating point flattened depth buffer (see above for an
; explanation).
;
; Invoke the denormalising shader:
draw = from_caller
;
; Bind the output back to the input for the game to use in the next stage.
; Since DirectX forbids resources being bound as both input and output
; simultaneously we unbind it from o0 first (or, if we did this outside the
; CustomShader we could rely on 3DMigoto having restored the original o0):
post o0 = null
post ps-t0 = ResourceOcclusionCullingFlattenDepthBufferFloat
;
; Clean up anything we bound that 3DMigoto won't automatically restore (the
; CustomShader section will take care of restoring o0 and ps-u1)
post ps-cb13 = null
post ps-t100 = null
;
; Optional debugging to view the resulting flattened depth buffer:
;post Resource\ShaderFixes\debug_2d.ini\Debug2D = ref ResourceOcclusionCullingFlattenDepthBufferFloat

cryengine_occlusion_culling_flatten_depth_buffer.hlsl:

Texture2D<float4> StereoParams : register(t125);
#define separation StereoParams.Load(0).x
#define convergence StereoParams.Load(0).y
#define eye StereoParams.Load(0).z

#ifdef PIXEL_SHADER
// Depth buffer game wants to pass back to the CPU:
Texture2D<float4> occlusion_culling_depth_buffer : register(t0);
// Flattened depth buffer. Normalised uint version to support atomic operations:
RWTexture2D<uint> flattened_depth_buffer_uint : register(u1);
// Depth buffer used throughout the game, for ease of calculating linear depth:
Texture2D<float4> z_buffer : register(t100);

// Copied from an appropriate shader with 3DMigoto:
cbuffer PER_FRAME : register(b13)
{
    row_major float4x4 g_VS_ViewProjMatr : packoffset(c0);
    float4 g_VS_WorldViewPos : packoffset(c6);
    row_major float4x4 g_VS_ViewProjZeroMatr : packoffset(c10);
    row_major float4x4 unknown : packoffset(c14);
}

matrix inverse(matrix m)
{
    matrix inv;
    float det = determinant(m);

    inv[0].x = m[1].y*(m[2].z*m[3].w - m[2].w*m[3].z) + m[1].z*(m[2].w*m[3].y - m[2].y*m[3].w) + m[1].w*(m[2].y*m[3].z - m[2].z*m[3].y);
    inv[0].y = m[0].y*(m[2].w*m[3].z - m[2].z*m[3].w) + m[0].z*(m[2].y*m[3].w - m[2].w*m[3].y) + m[0].w*(m[2].z*m[3].y - m[2].y*m[3].z);
    inv[0].z = m[0].y*(m[1].z*m[3].w - m[1].w*m[3].z) + m[0].z*(m[1].w*m[3].y - m[1].y*m[3].w) + m[0].w*(m[1].y*m[3].z - m[1].z*m[3].y);
    inv[0].w = m[0].y*(m[1].w*m[2].z - m[1].z*m[2].w) + m[0].z*(m[1].y*m[2].w - m[1].w*m[2].y) + m[0].w*(m[1].z*m[2].y - m[1].y*m[2].z);
    inv[1].x = m[1].x*(m[2].w*m[3].z - m[2].z*m[3].w) + m[1].z*(m[2].x*m[3].w - m[2].w*m[3].x) + m[1].w*(m[2].z*m[3].x - m[2].x*m[3].z);
    inv[1].y = m[0].x*(m[2].z*m[3].w - m[2].w*m[3].z) + m[0].z*(m[2].w*m[3].x - m[2].x*m[3].w) + m[0].w*(m[2].x*m[3].z - m[2].z*m[3].x);
    inv[1].z = m[0].x*(m[1].w*m[3].z - m[1].z*m[3].w) + m[0].z*(m[1].x*m[3].w - m[1].w*m[3].x) + m[0].w*(m[1].z*m[3].x - m[1].x*m[3].z);
    inv[1].w = m[0].x*(m[1].z*m[2].w - m[1].w*m[2].z) + m[0].z*(m[1].w*m[2].x - m[1].x*m[2].w) + m[0].w*(m[1].x*m[2].z - m[1].z*m[2].x);
    inv[2].x = m[1].x*(m[2].y*m[3].w - m[2].w*m[3].y) + m[1].y*(m[2].w*m[3].x - m[2].x*m[3].w) + m[1].w*(m[2].x*m[3].y - m[2].y*m[3].x);
    inv[2].y = m[0].x*(m[2].w*m[3].y - m[2].y*m[3].w) + m[0].y*(m[2].x*m[3].w - m[2].w*m[3].x) + m[0].w*(m[2].y*m[3].x - m[2].x*m[3].y);
    inv[2].z = m[0].x*(m[1].y*m[3].w - m[1].w*m[3].y) + m[0].y*(m[1].w*m[3].x - m[1].x*m[3].w) + m[0].w*(m[1].x*m[3].y - m[1].y*m[3].x);
    inv[2].w = m[0].x*(m[1].w*m[2].y - m[1].y*m[2].w) + m[0].y*(m[1].x*m[2].w - m[1].w*m[2].x) + m[0].w*(m[1].y*m[2].x - m[1].x*m[2].y);
    inv[3].x = m[1].x*(m[2].z*m[3].y - m[2].y*m[3].z) + m[1].y*(m[2].x*m[3].z - m[2].z*m[3].x) + m[1].z*(m[2].y*m[3].x - m[2].x*m[3].y);
    inv[3].y = m[0].x*(m[2].y*m[3].z - m[2].z*m[3].y) + m[0].y*(m[2].z*m[3].x - m[2].x*m[3].z) + m[0].z*(m[2].x*m[3].y - m[2].y*m[3].x);
    inv[3].z = m[0].x*(m[1].z*m[3].y - m[1].y*m[3].z) + m[0].y*(m[1].x*m[3].z - m[1].z*m[3].x) + m[0].z*(m[1].y*m[3].x - m[1].x*m[3].y);
    inv[3].w = m[0].x*(m[1].y*m[2].z - m[1].z*m[2].y) + m[0].y*(m[1].z*m[2].x - m[1].x*m[2].z) + m[0].z*(m[1].x*m[2].y - m[1].y*m[2].x);

    inv /= det;
    return inv;
}

void main(float4 pos : SV_Position0)
{
    float width, height, x = pos.x, y = pos.y, z, w;
    uint normalised_depth;
    float4 tmp;

    // Game specific: Calculate linear depth:
    z = z_buffer.Load(int3(pos.x, pos.y, 0)).x;
    tmp = mul(inverse(g_VS_ViewProjMatr), float4(0, 0, z, 1));
    w = mul(g_VS_ViewProjMatr, tmp / tmp.w).w;

    // Calculate mono position:
    flattened_depth_buffer_uint.GetDimensions(width, height);
    x -= separation * (w - convergence) / w / 2 * width;

    // Since multiple pixels may try to write to the same pixel, we use
    // atomic operations to avoid race conditions. Atomic operations can
    // only be done using uint resources, so we scale the non-linear depth
    // values up to take the full range of a 32bit uint.
    normalised_depth = occlusion_culling_depth_buffer.Load(int3(pos.x, pos.y, 0)).x * (float)0xffffffff;
    InterlockedMax(flattened_depth_buffer_uint[int2(x, y)], normalised_depth);
}
#endif /* PIXEL_SHADER */

cryengine_occlusion_culling_denormalise_depth_buffer.hlsl:

#ifdef PIXEL_SHADER
// Flattened depth buffer. Normalised uint version to support atomic operations:
RWTexture2D<uint> flattened_depth_buffer_uint : register(u1);

void main(float4 pos : SV_Position0, out float4 o0 : SV_Target0)
{
    uint normalised_depth = flattened_depth_buffer_uint[int2(pos.x, pos.y)];

    if (normalised_depth) {
        o0 = (float)normalised_depth / (float)0xffffffff;
    } else {
        // This pixel was missed during the flattening process and we
        // need to make something up. For now trying maximum depth to
        // avoid false culling, but there might be enough of these that
        // it prevents any occlusion culling defeating the performance
        // gains of using it in the first place...
        //o0 = 1;

        // Alternatively, maybe we should set the minimum value on the
        // assumption the holes will be small enough not to matter and
        // that this will allow for more occlusion culling and better
        // performance?
        o0 = 0;

        // Alternatively, we could search through u1 to find nearby
        // values and interpolate around them.
    }
}
#endif /* PIXEL_SHADER */

If this is working correctly, the debug output enabled in the final line of the above d3dx.ini excerpt should look like this (grab my debug_2d shader and change the blend setting in debug_2d.ini to "blend = ADD ONE ONE"):

https://forums.geforce.com/cmd/default/download-comment-attachment/76596/

Note that the objects in the red depth buffer are now back where they would have been placed originally, so that when the game uses this for occlusion culling things will line up. The performance monitor shows around 76fps here.

Version 2:

This skips the atomic operations and 2nd shader necessary to avoid the race conditions, so will perform better, but the depth buffer it produces has some corruption where multiple pixels wrote to the same destination that may or may not have an impact on occlusion culling:

[ResourceOcclusionCullingFlattenDepthBufferRaceCondition]
mode = mono

[CustomShaderOcclusionCullingFlattenDepthBufferRaceCondition]
; This is similar to the above, but skips the atomic operations, possibly
; improving performance at the expense of having a partially broken depth
; buffer we send back to the CPU - whether that is acceptable or not is yet to
; be seen.
ps = ShaderFixes\cryengine_occlusion_culling_flatten_depth_buffer_race_condition.hlsl
;
; We only want to invoke this once during the downscaling process. Dead
; Rising's frame analysis looks like there are two indenendent occlusion
; culling buffers being read out, so this should possibly be removed for that
; game:
max_executions_per_frame = 1
;
; ps-t0 already contains the depth buffer we need to flatten (game specific)
;
; Create our UAVs and render targets to match the depth buffer size in ps-t0:
ResourceOcclusionCullingFlattenDepthBufferFloat = copy_desc ps-t0
ResourceOcclusionCullingFlattenDepthBufferRaceCondition = copy_desc ps-t0
;
; Clear the flattened depth buffer with whatever value we want missing pixels
; to contain (0 for near, 1 for far). May have some trade off on performance
; vs false occlusion culling?
clear = ResourceOcclusionCullingFlattenDepthBufferRaceCondition 0
;
; Remove any potentially incompatible render targets:
run = BuiltInCommandListUnbindAllRenderTargets
;
; o0 must be set to a resource the same size as the depth buffer we are working
; on as this determines how many pixel shader invocations will run, even though
; this shader does not actually write to it:
o0 = set_viewport ResourceOcclusionCullingFlattenDepthBufferFloat
;
; Bind the flattened depth buffer as a UAV to allow the pixel shader to draw
; arbitrary pixels rather than just SV_Target:
ps-u1 = ResourceOcclusionCullingFlattenDepthBufferRaceCondition
;
; Bind the PER_FRAME constant buffer to get the view-projection matrix and
; calculate linear depth (game specific):
ps-cb13 = vs-cb3
;
; Bind the depth buffer used to calculate the linear depth (but in this game's
; case not the same depth values we will be passing to the CPU):
ps-t100 = ResourceZBuffer
;
; Invoke the flattening shader:
draw = from_caller
;
; Bind the output back to the input for the game to use in the next stage.
; Since DirectX forbids resources being bound as both input and output
; simultaneously we unbind it from o0 first (or, if we did this outside the
; CustomShader we could rely on 3DMigoto having restored the original o0):
post o0 = null
post ps-t0 = ResourceOcclusionCullingFlattenDepthBufferRaceCondition
;
; Clean up anything we bound that 3DMigoto won't automatically restore (the
; CustomShader section will take care of restoring o0 and ps-u1)
post ps-cb13 = null
post ps-t100 = null
;
; Optional debugging to view the resulting flattened depth buffer:
;post Resource\ShaderFixes\debug_2d.ini\Debug2D = ref ResourceOcclusionCullingFlattenDepthBufferRaceCondition

cryengine_occlusion_culling_flatten_depth_buffer_race_condition.hlsl:

Texture2D<float4> StereoParams : register(t125);
#define separation StereoParams.Load(0).x
#define convergence StereoParams.Load(0).y
#define eye StereoParams.Load(0).z

#ifdef PIXEL_SHADER
// Depth buffer game wants to pass back to the CPU
Texture2D<float4> occlusion_culling_depth_buffer : register(t0);
// Flattened depth buffer. Floating point version we pass back to the CPU directly:
RWTexture2D<float> flattened_depth_buffer : register(u1);
// Depth buffer used throughout the game, for ease of calculating linear depth:
Texture2D<float4> z_buffer : register(t100);

// Copied from an appropriate shader with 3DMigoto:
cbuffer PER_FRAME : register(b13)
{
    row_major float4x4 g_VS_ViewProjMatr : packoffset(c0);
    float4 g_VS_WorldViewPos : packoffset(c6);
    row_major float4x4 g_VS_ViewProjZeroMatr : packoffset(c10);
    row_major float4x4 unknown : packoffset(c14);
}

matrix inverse(matrix m)
{
    matrix inv;
    float det = determinant(m);

    inv[0].x = m[1].y*(m[2].z*m[3].w - m[2].w*m[3].z) + m[1].z*(m[2].w*m[3].y - m[2].y*m[3].w) + m[1].w*(m[2].y*m[3].z - m[2].z*m[3].y);
    inv[0].y = m[0].y*(m[2].w*m[3].z - m[2].z*m[3].w) + m[0].z*(m[2].y*m[3].w - m[2].w*m[3].y) + m[0].w*(m[2].z*m[3].y - m[2].y*m[3].z);
    inv[0].z = m[0].y*(m[1].z*m[3].w - m[1].w*m[3].z) + m[0].z*(m[1].w*m[3].y - m[1].y*m[3].w) + m[0].w*(m[1].y*m[3].z - m[1].z*m[3].y);
    inv[0].w = m[0].y*(m[1].w*m[2].z - m[1].z*m[2].w) + m[0].z*(m[1].y*m[2].w - m[1].w*m[2].y) + m[0].w*(m[1].z*m[2].y - m[1].y*m[2].z);
    inv[1].x = m[1].x*(m[2].w*m[3].z - m[2].z*m[3].w) + m[1].z*(m[2].x*m[3].w - m[2].w*m[3].x) + m[1].w*(m[2].z*m[3].x - m[2].x*m[3].z);
    inv[1].y = m[0].x*(m[2].z*m[3].w - m[2].w*m[3].z) + m[0].z*(m[2].w*m[3].x - m[2].x*m[3].w) + m[0].w*(m[2].x*m[3].z - m[2].z*m[3].x);
    inv[1].z = m[0].x*(m[1].w*m[3].z - m[1].z*m[3].w) + m[0].z*(m[1].x*m[3].w - m[1].w*m[3].x) + m[0].w*(m[1].z*m[3].x - m[1].x*m[3].z);
    inv[1].w = m[0].x*(m[1].z*m[2].w - m[1].w*m[2].z) + m[0].z*(m[1].w*m[2].x - m[1].x*m[2].w) + m[0].w*(m[1].x*m[2].z - m[1].z*m[2].x);
    inv[2].x = m[1].x*(m[2].y*m[3].w - m[2].w*m[3].y) + m[1].y*(m[2].w*m[3].x - m[2].x*m[3].w) + m[1].w*(m[2].x*m[3].y - m[2].y*m[3].x);
    inv[2].y = m[0].x*(m[2].w*m[3].y - m[2].y*m[3].w) + m[0].y*(m[2].x*m[3].w - m[2].w*m[3].x) + m[0].w*(m[2].y*m[3].x - m[2].x*m[3].y);
    inv[2].z = m[0].x*(m[1].y*m[3].w - m[1].w*m[3].y) + m[0].y*(m[1].w*m[3].x - m[1].x*m[3].w) + m[0].w*(m[1].x*m[3].y - m[1].y*m[3].x);
    inv[2].w = m[0].x*(m[1].w*m[2].y - m[1].y*m[2].w) + m[0].y*(m[1].x*m[2].w - m[1].w*m[2].x) + m[0].w*(m[1].y*m[2].x - m[1].x*m[2].y);
    inv[3].x = m[1].x*(m[2].z*m[3].y - m[2].y*m[3].z) + m[1].y*(m[2].x*m[3].z - m[2].z*m[3].x) + m[1].z*(m[2].y*m[3].x - m[2].x*m[3].y);
    inv[3].y = m[0].x*(m[2].y*m[3].z - m[2].z*m[3].y) + m[0].y*(m[2].z*m[3].x - m[2].x*m[3].z) + m[0].z*(m[2].x*m[3].y - m[2].y*m[3].x);
inv[3].z = m[0].x*(m[1].z*m[3].y - m[1].y*m[3].z) + m[0].y*(m[1].x*m[3].z - m[1].z*m[3].x) + m[0].z*(m[1].y*m[3].x - m[1].x*m[3].y); inv[3].w = m[0].x*(m[1].y*m[2].z - m[1].z*m[2].y) + m[0].y*(m[1].z*m[2].x - m[1].x*m[2].z) + m[0].z*(m[1].x*m[2].y - m[1].y*m[2].x); inv /= det; return inv; } void main(float4 pos : SV_Position0) { float width, height, x = pos.x, y = pos.y, z, w; float4 tmp; // Game specific: Calculate linear depth: z = z_buffer.Load(int3(pos.x, pos.y, 0)).x; tmp = mul(inverse(g_VS_ViewProjMatr), float4(0, 0, z, 1)); w = mul(g_VS_ViewProjMatr, tmp / tmp.w).w; // Calculate mono position: flattened_depth_buffer.GetDimensions(width, height); x -= separation * (w - convergence) / w / 2 * width; // Write the output depth shifted back to the mono position. Note that // since multiple pixels may be writing to the same target pixel this // is racy and pixels may end up with either depth value, possibly // still leading to false occlusion culling in some situations? flattened_depth_buffer[int2(x, y)] = occlusion_culling_depth_buffer.Load(int3(pos.x, pos.y, 0)).x; } #endif /* PIXEL_SHADER */ [/code] If this is working correctly, the debug output enabled in the final line of the above d3dx.ini excerpt should look like this (grab my debug_2d shader and change the blend setting in debug_2d.ini to "blend = ADD ONE ONE"): [url=https://forums.geforce.com/cmd/default/download-comment-attachment/76597/][img]https://forums.geforce.com/cmd/default/download-comment-attachment/76597/[/img][/url] Note that in this case the edges of objects on the red depth buffer show a slight corruption pattern. The performance monitor shows around 90fps here, 14fps more than the accurate version, though that performance difference narrows considerably in other areas.
DarkStarSword said: @Losti: This is an alternative to deny_cpu_read for CryEngine that performs better - this is from Lichdom, so it may be a different shader in KCD (or possibly other differences - let me know if you have trouble).
Here's an update to this that will hopefully eliminate any remaining flickering, and that can be adapted to solve your occlusion culling issues in Dead Rising as well. Whereas my prior attempt merged the left and right depth buffers together into some sort of weird hybrid, this attempt de-stereoises the depth buffer - a sort of compatibility-mode operation run in reverse - so that the depth values line up with what the game is expecting. Once again this is from Lichdom, so it may have to be adapted for KCD and Dead Rising.
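To put the core of the transformation in one place before the config and shader code below: the shift each flatten shader applies per pixel is just the standard stereo correction run backwards (the same line appears in both shaders; W is the linear view-space depth, width is the depth buffer width in pixels):

[code]
// Reverse of the usual stereo offset, expressed in pixels:
mono_x = stereo_x - separation * (W - convergence) / W * width / 2;
[/code]

The divide by 2 comes from converting out of clip/NDC space, where the full viewport width spans 2 units (-1 to +1), so the offset has to be halved when scaling up to pixels.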

I've put together two versions of this - the first is more accurate, but uses some expensive atomic operations and two shaders, while the second skips those operations to save performance at the cost of some corruption in the resulting depth buffer. Whether that corruption will be enough to matter I can't say, but it seems fine where I'm testing in Lichdom.

I had some rather unexpected results using UAVs with pixel shaders while setting this up - the pixel shader ran for both eyes insofar as the bound depth buffer (ps-t0) and render target (o0) were concerned, but both eyes ended up writing to just the right eye's view of the UAV (ps-u1). I've ended up exploiting that a little below, but it's worth checking if the debug output looks correct in case there are any differences for you. For reference I had SLI *enabled* while doing this and StereoFlagsDX10 set to 0x00000008 (either of which could conceivably affect this, though I haven't experimented with different options as yet).

[ShaderOverrideFlicker]
hash = 630978719b4885a7
; This shader downscales the depth buffer prior to sending it to the CPU for
; occlusion culling. The CPU will only get a mono copy, which results in
; incorrect occlusion culling and appears as flickering geometry in the
; distance that is just to the right of something in the foreground.
;
if stereo_active
; CHOOSE ONE (FIXME: I should switch this over to "post" for better
; performance, but that changes which binds are used in a few places
; and I wanted this closer to what will be required for Dead Rising):
;run = CustomShaderOcclusionCullingFlattenDepthBufferWithAtomicOps
;run = CustomShaderOcclusionCullingFlattenDepthBufferRaceCondition
endif

; The depth buffer used for occlusion culling is a weird scale that complicates
; finding the linear depth. Grab the regular depth target instead:
[ResourceZBuffer]
[ClearDepthStencilView]
ResourceZBuffer = ref this

Version 1:
No race condition, so it produces a cleaner depth buffer, but at the cost of worse performance:

[ResourceOcclusionCullingFlattenDepthBufferUint]
; This resource is used for atomic operations to avoid race conditions, and
; must be a uint. It seems (rather unexpectedly, and possibly due to a driver
; bug) that when used as a pixel shader UAV both eyes will write to just one
; side of this buffer if it is forced stereo, so may as well just keep it mono:
format = R32_UINT
mode = mono
[ResourceOcclusionCullingFlattenDepthBufferFloat]
; This buffer serves two purposes:
; 1. During the first flattening stage its width, height and stereo mode determine
; how many pixel shader invocations will run, but the pixel shader will not
; write to it. We want depth values to be combined from both eyes, so we set
; it stereo.
; 2. This buffer is written to by the second shader that translates the
; normalised depth values in the uint buffer back to the original floating
; point range that the game is expecting. Since this has already been
; reduced to mono in phase 1 we would only need mono for this, but requiring
; stereo for 1. trumps this.
; Part of the reason these two purposes are conflated in a single buffer is a
; DirectX API limitation - if we bind a different o0 for the
; second shader, DirectX will unbind ps-u1. That's not a show stopper, but it
; would mean having to perform a bit of a dance to get things where we want
; them, and having this one buffer serve two purposes allows us to skip
; changing any of the render target or UAV binds between shaders.
mode = stereo

[CustomShaderOcclusionCullingFlattenDepthBufferWithAtomicOps]
; This shader will take a stereo depth buffer and perform a sort of
; reverse-compatibility mode transformation on it to move the depth values to
; their mono locations so that once the occlusion culling buffer is read out on
; the CPU things will more or less match up with what it expects.
ps = ShaderFixes\cryengine_occlusion_culling_flatten_depth_buffer.hlsl
;
; We only want to invoke this once during the downscaling process. Dead
; Rising's frame analysis suggests there are two independent occlusion
; culling buffers being read out, so this should possibly be removed for that
; game:
max_executions_per_frame = 1
;
; ps-t0 already contains the depth buffer we need to flatten (game specific)
;
; Create our UAVs and render targets to match the depth buffer size in ps-t0:
ResourceOcclusionCullingFlattenDepthBufferUint = copy_desc ps-t0
ResourceOcclusionCullingFlattenDepthBufferFloat = copy_desc ps-t0
;
; Clear the flattened depth buffer we are using as a UAV:
clear = ResourceOcclusionCullingFlattenDepthBufferUint 0
;
; Remove any potentially incompatible render targets:
run = BuiltInCommandListUnbindAllRenderTargets
;
; o0 must be set to a resource the same size as the depth buffer we are working
; on as this determines how many pixel shader invocations will run, even though
; this shader does not actually write to it:
o0 = set_viewport ResourceOcclusionCullingFlattenDepthBufferFloat
;
; Bind the flattened depth buffer as a UAV to allow the pixel shader to draw
; arbitrary pixels rather than just SV_Target:
ps-u1 = ResourceOcclusionCullingFlattenDepthBufferUint
;
; Bind the PER_FRAME constant buffer to get the view-projection matrix and
; calculate linear depth (game specific):
ps-cb13 = vs-cb3
;
; Bind the depth buffer used to calculate the linear depth (but in this game's
; case not the same depth values we will be passing to the CPU):
ps-t100 = ResourceZBuffer
;
; Invoke the flattening shader:
draw = from_caller
;
; Load up the shader for the next stage with o0 and ps-u1 still bound:
post run = CustomShaderOcclusionCullingDenormaliseDepthBuffer
[CustomShaderOcclusionCullingDenormaliseDepthBuffer]
; This shader converts the normalised depth values back to the range the game is after.
ps = ShaderFixes\cryengine_occlusion_culling_denormalise_depth_buffer.hlsl
;
; ps-u1 is still bound with the normalised flattened depth buffer and o0 is
; still bound with the floating point flattened depth buffer (see above for an
; explanation).
;
; Invoke the denormalising shader:
draw = from_caller
;
; Bind the output back to the input for the game to use in the next stage.
; Since DirectX forbids resources being bound as both input and output
; simultaneously we unbind it from o0 first (or, if we did this outside the
; CustomShader we could rely on 3DMigoto having restored the original o0):
post o0 = null
post ps-t0 = ResourceOcclusionCullingFlattenDepthBufferFloat
;
; Clean up anything we bound that 3DMigoto won't automatically restore (the
; CustomShader section will take care of restoring o0 and ps-u1)
post ps-cb13 = null
post ps-t100 = null
;
; Optional debugging to view the resulting flattened depth buffer:
;post Resource\ShaderFixes\debug_2d.ini\Debug2D = ref ResourceOcclusionCullingFlattenDepthBufferFloat

cryengine_occlusion_culling_flatten_depth_buffer.hlsl:

Texture2D<float4> StereoParams : register(t125);
#define separation StereoParams.Load(0).x
#define convergence StereoParams.Load(0).y
#define eye StereoParams.Load(0).z

#ifdef PIXEL_SHADER
// Depth buffer game wants to pass back to the CPU:
Texture2D<float4> occlusion_culling_depth_buffer : register(t0);
// Flattened depth buffer. Normalised uint version to support atomic operations:
RWTexture2D<uint> flattened_depth_buffer_uint : register(u1);
// Depth buffer used throughout the game, for ease of calculating linear depth:
Texture2D<float4> z_buffer : register(t100);

// Copied from an appropriate shader with 3DMigoto:
cbuffer PER_FRAME : register(b13)
{
row_major float4x4 g_VS_ViewProjMatr : packoffset(c0);
float4 g_VS_WorldViewPos : packoffset(c6);
row_major float4x4 g_VS_ViewProjZeroMatr : packoffset(c10);
row_major float4x4 unknown : packoffset(c14);
}

matrix inverse(matrix m)
{
matrix inv;

float det = determinant(m);
inv[0].x = m[1].y*(m[2].z*m[3].w - m[2].w*m[3].z) + m[1].z*(m[2].w*m[3].y - m[2].y*m[3].w) + m[1].w*(m[2].y*m[3].z - m[2].z*m[3].y);
inv[0].y = m[0].y*(m[2].w*m[3].z - m[2].z*m[3].w) + m[0].z*(m[2].y*m[3].w - m[2].w*m[3].y) + m[0].w*(m[2].z*m[3].y - m[2].y*m[3].z);
inv[0].z = m[0].y*(m[1].z*m[3].w - m[1].w*m[3].z) + m[0].z*(m[1].w*m[3].y - m[1].y*m[3].w) + m[0].w*(m[1].y*m[3].z - m[1].z*m[3].y);
inv[0].w = m[0].y*(m[1].w*m[2].z - m[1].z*m[2].w) + m[0].z*(m[1].y*m[2].w - m[1].w*m[2].y) + m[0].w*(m[1].z*m[2].y - m[1].y*m[2].z);
inv[1].x = m[1].x*(m[2].w*m[3].z - m[2].z*m[3].w) + m[1].z*(m[2].x*m[3].w - m[2].w*m[3].x) + m[1].w*(m[2].z*m[3].x - m[2].x*m[3].z);
inv[1].y = m[0].x*(m[2].z*m[3].w - m[2].w*m[3].z) + m[0].z*(m[2].w*m[3].x - m[2].x*m[3].w) + m[0].w*(m[2].x*m[3].z - m[2].z*m[3].x);
inv[1].z = m[0].x*(m[1].w*m[3].z - m[1].z*m[3].w) + m[0].z*(m[1].x*m[3].w - m[1].w*m[3].x) + m[0].w*(m[1].z*m[3].x - m[1].x*m[3].z);
inv[1].w = m[0].x*(m[1].z*m[2].w - m[1].w*m[2].z) + m[0].z*(m[1].w*m[2].x - m[1].x*m[2].w) + m[0].w*(m[1].x*m[2].z - m[1].z*m[2].x);
inv[2].x = m[1].x*(m[2].y*m[3].w - m[2].w*m[3].y) + m[1].y*(m[2].w*m[3].x - m[2].x*m[3].w) + m[1].w*(m[2].x*m[3].y - m[2].y*m[3].x);
inv[2].y = m[0].x*(m[2].w*m[3].y - m[2].y*m[3].w) + m[0].y*(m[2].x*m[3].w - m[2].w*m[3].x) + m[0].w*(m[2].y*m[3].x - m[2].x*m[3].y);
inv[2].z = m[0].x*(m[1].y*m[3].w - m[1].w*m[3].y) + m[0].y*(m[1].w*m[3].x - m[1].x*m[3].w) + m[0].w*(m[1].x*m[3].y - m[1].y*m[3].x);
inv[2].w = m[0].x*(m[1].w*m[2].y - m[1].y*m[2].w) + m[0].y*(m[1].x*m[2].w - m[1].w*m[2].x) + m[0].w*(m[1].y*m[2].x - m[1].x*m[2].y);
inv[3].x = m[1].x*(m[2].z*m[3].y - m[2].y*m[3].z) + m[1].y*(m[2].x*m[3].z - m[2].z*m[3].x) + m[1].z*(m[2].y*m[3].x - m[2].x*m[3].y);
inv[3].y = m[0].x*(m[2].y*m[3].z - m[2].z*m[3].y) + m[0].y*(m[2].z*m[3].x - m[2].x*m[3].z) + m[0].z*(m[2].x*m[3].y - m[2].y*m[3].x);
inv[3].z = m[0].x*(m[1].z*m[3].y - m[1].y*m[3].z) + m[0].y*(m[1].x*m[3].z - m[1].z*m[3].x) + m[0].z*(m[1].y*m[3].x - m[1].x*m[3].y);
inv[3].w = m[0].x*(m[1].y*m[2].z - m[1].z*m[2].y) + m[0].y*(m[1].z*m[2].x - m[1].x*m[2].z) + m[0].z*(m[1].x*m[2].y - m[1].y*m[2].x);
inv /= det;

return inv;
}

void main(float4 pos : SV_Position0)
{
float width, height, x = pos.x, y = pos.y, z, w;
uint normalised_depth;
float4 tmp;

// Game specific: Calculate linear depth:
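// (The next three lines un-project a point at depth z on the camera axis and
// re-project it - the resulting clip-space w is the linear view-space depth.
// This assumes g_VS_ViewProjMatr really is the combined view-projection matrix.)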
z = z_buffer.Load(int3(pos.x, pos.y, 0)).x;
tmp = mul(inverse(g_VS_ViewProjMatr), float4(0, 0, z, 1));
w = mul(g_VS_ViewProjMatr, tmp / tmp.w).w;

// Calculate mono position:
flattened_depth_buffer_uint.GetDimensions(width, height);
x -= separation * (w - convergence) / w / 2 * width;

// Since multiple pixels may try to write to the same pixel, we use
// atomic operations to avoid race conditions. Atomic operations can
// only be done using uint resources, so we scale the non-linear depth
// values up to take the full range of a 32bit uint.
normalised_depth = occlusion_culling_depth_buffer.Load(int3(pos.x, pos.y, 0)).x * (float)0xffffffff;
InterlockedMax(flattened_depth_buffer_uint[int2(x, y)], normalised_depth);
}
#endif /* PIXEL_SHADER */
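A quick note on the InterlockedMax choice: with the conventional 0 = near / 1 = far depth values this game appears to use, taking the maximum at contested pixels keeps the farthest occluder, which errs on the side of not culling. If a game turned out to use reversed depth (1 = near), the conservative choice would flip to a minimum - a hypothetical, untested variant that would also require changing the clear value and the hole test in the second shader to match:

[code]
// Hypothetical variant for a reversed (1 = near) occlusion depth buffer:
InterlockedMin(flattened_depth_buffer_uint[int2(x, y)], normalised_depth);
[/code]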

cryengine_occlusion_culling_denormalise_depth_buffer.hlsl:

#ifdef PIXEL_SHADER
// Flattened depth buffer. Normalised uint version to support atomic operations:
RWTexture2D<uint> flattened_depth_buffer_uint : register(u1);

void main(float4 pos : SV_Position0, out float4 o0 : SV_Target0)
{
uint normalised_depth = flattened_depth_buffer_uint[int2(pos.x, pos.y)];
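// (Treating a stored value of exactly 0 as "nothing wrote here" relies on the
// uint buffer having been cleared to 0 and on no real surface normalising to
// exactly 0 - a pixel sitting precisely on the near plane would be mistaken
// for a hole.)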

if (normalised_depth) {
o0 = (float)normalised_depth / (float)0xffffffff;
} else {
// This pixel was missed during the flattening process and we
// need to make something up. For now trying maximum depth to
// avoid false culling, but there might be enough of these that
// it prevents any occlusion culling defeating the performance
// gains of using it in the first place...
//o0 = 1;
// Alternatively, maybe we should set the minimum value on the
// assumption the holes will be small enough not to matter and
// that this will allow for more occlusion culling and better
// performance?
o0 = 0;
// Alternatively, we could search through u1 to find nearby
// values and interpolate around them.
}
}
#endif /* PIXEL_SHADER */
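For what it's worth, the third option mentioned in the comments (searching u1 for nearby values) might look something like the following inside the else branch - purely a sketch, with the search radius and the use of max chosen arbitrarily and untested:

[code]
// Hypothetical hole filling: scan a small neighbourhood of the uint buffer and
// take the largest (farthest) non-zero value found, falling back to far depth.
uint filled = 0;
for (int j = -2; j <= 2; j++) {
    for (int i = -2; i <= 2; i++) {
        filled = max(filled, flattened_depth_buffer_uint[int2(pos.x + i, pos.y + j)]);
    }
}
o0 = filled ? (float)filled / (float)0xffffffff : 1.0;
[/code]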

If this is working correctly, the debug output enabled in the final line of the above d3dx.ini excerpt should look like this (grab my debug_2d shader and change the blend setting in debug_2d.ini to "blend = ADD ONE ONE"):

[url=https://forums.geforce.com/cmd/default/download-comment-attachment/76596/][img]https://forums.geforce.com/cmd/default/download-comment-attachment/76596/[/img][/url]

Note that the objects in the red depth buffer are now back where they would have been placed originally, so that when the game uses this for occlusion culling things will line up. The performance monitor shows around 76fps here.

Version 2:
This skips the atomic operations and the second shader needed to avoid race conditions, so it performs better, but the depth buffer it produces has some corruption where multiple pixels wrote to the same destination, which may or may not have an impact on occlusion culling:

[ResourceOcclusionCullingFlattenDepthBufferRaceCondition]
mode = mono
[CustomShaderOcclusionCullingFlattenDepthBufferRaceCondition]
; This is similar to the above, but skips the atomic operations, possibly
; improving performance at the expense of having a partially broken depth
; buffer we send back to the CPU - whether that is acceptable or not is yet to
; be seen.
ps = ShaderFixes\cryengine_occlusion_culling_flatten_depth_buffer_race_condition.hlsl
;
; We only want to invoke this once during the downscaling process. Dead
; Rising's frame analysis suggests there are two independent occlusion
; culling buffers being read out, so this should possibly be removed for that
; game:
max_executions_per_frame = 1
;
; ps-t0 already contains the depth buffer we need to flatten (game specific)
;
; Create our UAVs and render targets to match the depth buffer size in ps-t0:
ResourceOcclusionCullingFlattenDepthBufferFloat = copy_desc ps-t0
ResourceOcclusionCullingFlattenDepthBufferRaceCondition = copy_desc ps-t0
;
; Clear the flattened depth buffer with whatever value we want missing pixels
; to contain (0 for near, 1 for far). May have some trade off on performance
; vs false occlusion culling?
clear = ResourceOcclusionCullingFlattenDepthBufferRaceCondition 0
;
; Remove any potentially incompatible render targets:
run = BuiltInCommandListUnbindAllRenderTargets
;
; o0 must be set to a resource the same size as the depth buffer we are working
; on as this determines how many pixel shader invocations will run, even though
; this shader does not actually write to it:
o0 = set_viewport ResourceOcclusionCullingFlattenDepthBufferFloat
;
; Bind the flattened depth buffer as a UAV to allow the pixel shader to draw
; arbitrary pixels rather than just SV_Target:
ps-u1 = ResourceOcclusionCullingFlattenDepthBufferRaceCondition
;
; Bind the PER_FRAME constant buffer to get the view-projection matrix and
; calculate linear depth (game specific):
ps-cb13 = vs-cb3
;
; Bind the depth buffer used to calculate the linear depth (but in this game's
; case not the same depth values we will be passing to the CPU):
ps-t100 = ResourceZBuffer
;
; Invoke the flattening shader:
draw = from_caller
;
; Bind the output back to the input for the game to use in the next stage.
; Since DirectX forbids resources being bound as both input and output
; simultaneously we unbind it from o0 first (or, if we did this outside the
; CustomShader we could rely on 3DMigoto having restored the original o0):
post o0 = null
post ps-t0 = ResourceOcclusionCullingFlattenDepthBufferRaceCondition
;
; Clean up anything we bound that 3DMigoto won't automatically restore (the
; CustomShader section will take care of restoring o0 and ps-u1)
post ps-cb13 = null
post ps-t100 = null
;
; Optional debugging to view the resulting flattened depth buffer:
;post Resource\ShaderFixes\debug_2d.ini\Debug2D = ref ResourceOcclusionCullingFlattenDepthBufferRaceCondition

cryengine_occlusion_culling_flatten_depth_buffer_race_condition.hlsl:

Texture2D<float4> StereoParams : register(t125);
#define separation StereoParams.Load(0).x
#define convergence StereoParams.Load(0).y
#define eye StereoParams.Load(0).z

#ifdef PIXEL_SHADER
// Depth buffer game wants to pass back to the CPU
Texture2D<float4> occlusion_culling_depth_buffer : register(t0);
// Flattened depth buffer. Floating point version we pass back to the CPU directly:
RWTexture2D<float> flattened_depth_buffer : register(u1);
// Depth buffer used throughout the game, for ease of calculating linear depth:
Texture2D<float4> z_buffer : register(t100);

// Copied from an appropriate shader with 3DMigoto:
cbuffer PER_FRAME : register(b13)
{
row_major float4x4 g_VS_ViewProjMatr : packoffset(c0);
float4 g_VS_WorldViewPos : packoffset(c6);
row_major float4x4 g_VS_ViewProjZeroMatr : packoffset(c10);
row_major float4x4 unknown : packoffset(c14);
}

matrix inverse(matrix m)
{
matrix inv;

float det = determinant(m);
inv[0].x = m[1].y*(m[2].z*m[3].w - m[2].w*m[3].z) + m[1].z*(m[2].w*m[3].y - m[2].y*m[3].w) + m[1].w*(m[2].y*m[3].z - m[2].z*m[3].y);
inv[0].y = m[0].y*(m[2].w*m[3].z - m[2].z*m[3].w) + m[0].z*(m[2].y*m[3].w - m[2].w*m[3].y) + m[0].w*(m[2].z*m[3].y - m[2].y*m[3].z);
inv[0].z = m[0].y*(m[1].z*m[3].w - m[1].w*m[3].z) + m[0].z*(m[1].w*m[3].y - m[1].y*m[3].w) + m[0].w*(m[1].y*m[3].z - m[1].z*m[3].y);
inv[0].w = m[0].y*(m[1].w*m[2].z - m[1].z*m[2].w) + m[0].z*(m[1].y*m[2].w - m[1].w*m[2].y) + m[0].w*(m[1].z*m[2].y - m[1].y*m[2].z);
inv[1].x = m[1].x*(m[2].w*m[3].z - m[2].z*m[3].w) + m[1].z*(m[2].x*m[3].w - m[2].w*m[3].x) + m[1].w*(m[2].z*m[3].x - m[2].x*m[3].z);
inv[1].y = m[0].x*(m[2].z*m[3].w - m[2].w*m[3].z) + m[0].z*(m[2].w*m[3].x - m[2].x*m[3].w) + m[0].w*(m[2].x*m[3].z - m[2].z*m[3].x);
inv[1].z = m[0].x*(m[1].w*m[3].z - m[1].z*m[3].w) + m[0].z*(m[1].x*m[3].w - m[1].w*m[3].x) + m[0].w*(m[1].z*m[3].x - m[1].x*m[3].z);
inv[1].w = m[0].x*(m[1].z*m[2].w - m[1].w*m[2].z) + m[0].z*(m[1].w*m[2].x - m[1].x*m[2].w) + m[0].w*(m[1].x*m[2].z - m[1].z*m[2].x);
inv[2].x = m[1].x*(m[2].y*m[3].w - m[2].w*m[3].y) + m[1].y*(m[2].w*m[3].x - m[2].x*m[3].w) + m[1].w*(m[2].x*m[3].y - m[2].y*m[3].x);
inv[2].y = m[0].x*(m[2].w*m[3].y - m[2].y*m[3].w) + m[0].y*(m[2].x*m[3].w - m[2].w*m[3].x) + m[0].w*(m[2].y*m[3].x - m[2].x*m[3].y);
inv[2].z = m[0].x*(m[1].y*m[3].w - m[1].w*m[3].y) + m[0].y*(m[1].w*m[3].x - m[1].x*m[3].w) + m[0].w*(m[1].x*m[3].y - m[1].y*m[3].x);
inv[2].w = m[0].x*(m[1].w*m[2].y - m[1].y*m[2].w) + m[0].y*(m[1].x*m[2].w - m[1].w*m[2].x) + m[0].w*(m[1].y*m[2].x - m[1].x*m[2].y);
inv[3].x = m[1].x*(m[2].z*m[3].y - m[2].y*m[3].z) + m[1].y*(m[2].x*m[3].z - m[2].z*m[3].x) + m[1].z*(m[2].y*m[3].x - m[2].x*m[3].y);
inv[3].y = m[0].x*(m[2].y*m[3].z - m[2].z*m[3].y) + m[0].y*(m[2].z*m[3].x - m[2].x*m[3].z) + m[0].z*(m[2].x*m[3].y - m[2].y*m[3].x);
inv[3].z = m[0].x*(m[1].z*m[3].y - m[1].y*m[3].z) + m[0].y*(m[1].x*m[3].z - m[1].z*m[3].x) + m[0].z*(m[1].y*m[3].x - m[1].x*m[3].y);
inv[3].w = m[0].x*(m[1].y*m[2].z - m[1].z*m[2].y) + m[0].y*(m[1].z*m[2].x - m[1].x*m[2].z) + m[0].z*(m[1].x*m[2].y - m[1].y*m[2].x);
inv /= det;

return inv;
}

void main(float4 pos : SV_Position0)
{
float width, height, x = pos.x, y = pos.y, z, w;
float4 tmp;

// Game specific: Calculate linear depth:
z = z_buffer.Load(int3(pos.x, pos.y, 0)).x;
tmp = mul(inverse(g_VS_ViewProjMatr), float4(0, 0, z, 1));
w = mul(g_VS_ViewProjMatr, tmp / tmp.w).w;

// Calculate mono position:
flattened_depth_buffer.GetDimensions(width, height);
x -= separation * (w - convergence) / w / 2 * width;

// Write the output depth shifted back to the mono position. Note that
// since multiple pixels may be writing to the same target pixel this
// is racy and pixels may end up with either depth value, possibly
// still leading to false occlusion culling in some situations?
flattened_depth_buffer[int2(x, y)] = occlusion_culling_depth_buffer.Load(int3(pos.x, pos.y, 0)).x;
}
#endif /* PIXEL_SHADER */

If this is working correctly, the debug output enabled in the final line of the above d3dx.ini excerpt should look like this (grab my debug_2d shader and change the blend setting in debug_2d.ini to "blend = ADD ONE ONE"):

[url=https://forums.geforce.com/cmd/default/download-comment-attachment/76597/][img]https://forums.geforce.com/cmd/default/download-comment-attachment/76597/[/img][/url]

Note that in this case the edges of objects on the red depth buffer show a slight corruption pattern. The performance monitor shows around 90fps here, 14fps more than the accurate version, though that performance difference narrows considerably in other areas.


Posted 12/04/2018 04:48 PM   
If 3dmigoto is dumping 3 different textures as

ps-t0-000100=12345678...dds
ps-t0-000101=12345678...dds
ps-t0-000102=12345678...dds

12345678 is the hash

Is there a way I could differentiate between these textures by using those 000100..000102 numbers?

Posted 12/04/2018 05:24 PM   