Bo3b's School For Shaderhackers
3d4dd said:
DarkStarSword said:
3. Try adding the stereo correction formula instead of subtracting it (remove the - from -r31.w on the final line of the stereo correction)
4. Try adding the convergence instead of subtracting it (this is based on one pattern I still can't explain in the Unity 4 version of Stranded Deep)

The combination of both was the solution - now the shadows are placed correctly :) Thank you so much for the help!
Now I will try to fix the dark spot under the character's feet that simulates "ambient occlusion" inside buildings (it also wouldn't hurt to disable it). I will also try to separate the textures for HUD elements and some ground effects, as they use the same VS and moving the HUD to depth messes up these effects.
Great stuff!!!

Or should I use the correction on registers connected to v2 (dcl_texcoord2 v2.xyz)? But before I continue with experiments I just wanted to ask if my approach is somewhat useful or if I'm on the wrong track.
You are definitely on the right track - as you guessed v2 is relevant here :)

I'll walk you through my process looking for the right spot to add a fix:

The trick with these is to find the depth buffer:

//   $s_cameraDepthSampler s0       1

Then look for where it is sampled:

texld r0, r0, s0

Then follow how the depth value (in r0.x here) gets used, looking for something that multiplies it by a three-dimensional coordinate (there might be some intermediate steps that scale it and maybe take its reciprocal with rcp first, but that depends on the game):

mov r1.xyz, v0
mad r0.yzw, v2.xxyz, r0.x, -r1.xxyz

You tried adding the code after the first instruction there, but that did not use the depth - the important one is the next instruction, which multiplies the depth in r0.x by a three-dimensional coordinate in v2.xyz.
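To see why the later steps talk about r0.y and r0.w rather than x and z, it may help to model how a D3D9 assembly instruction combines a source swizzle with a destination write mask. This is a simplified Python illustration, not an emulator. It assumes registers are plain 4-element lists, and the input values are made up.

```python
# Simplified model of D3D9 shader-assembly operand handling (illustration only).
# Registers are 4-element lists indexed x=0, y=1, z=2, w=3.
COMPONENTS = "xyzw"


def swizzle(reg, pattern):
    """Read a register through a source swizzle such as 'xxyz'.

    A short swizzle repeats its last component, so '.x' reads as '.xxxx'.
    """
    pattern = pattern + pattern[-1] * (4 - len(pattern))
    return [reg[COMPONENTS.index(c)] for c in pattern]


def mad(dest, mask, a, b, c):
    """dest.mask = a * b + c, component-wise; unmasked components are untouched."""
    for i, comp in enumerate(COMPONENTS):
        if comp in mask:
            dest[i] = a[i] * b[i] + c[i]
    return dest


# Hypothetical inputs: v2 is the interpolated view ray, v0 another coordinate,
# and r0.x holds the depth sampled from s0.
v2 = [0.25, -0.5, 1.0, 0.0]
v0 = [1.0, 2.0, 3.0, 0.5]
r0 = [8.0, 0.0, 0.0, 0.0]
r1 = v0[:3] + [0.0]  # mov r1.xyz, v0 (r1.w is simply left at 0 in this model)

# mad r0.yzw, v2.xxyz, r0.x, -r1.xxyz
# All sources are read before the destination is written, as in the shader.
neg_r1 = [-value for value in swizzle(r1, "xxyz")]
mad(r0, "yzw", swizzle(v2, "xxyz"), swizzle(r0, "x"), neg_r1)

# r0.y now holds x (v2.x * depth - v0.x) and r0.w holds z (v2.z * depth - v0.z),
# which is why the correction has to use .y as x and .w as the depth.
print(r0)  # [8.0, 1.0, -6.0, 5.0]
```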

Since this then goes on to subtract another coordinate, and you will (probably) want to adjust it before that point, you might first want to split the mad into a mul and an add:

mul r0.yzw, v2.xxyz, r0.x
add r0.yzw, r0, -r1.xxyz

And insert the correction between the two. Keep in mind here that since it is r0.yzw instead of .xyz, you will need to use .w as the depth and .y as x:

// view-space stereo correction from r0.yzw:
texldl r31, c220.z, s13
add r31.w, r0.w, r31.y
mul r31.w, r31.w, r31.x
mad r0.y, r31.w, v3.x, r0.y
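Numerically, the four instructions above just shift the view-space x coordinate by an amount proportional to depth. Below is a small Python model of that arithmetic. It assumes, as in the usual Helix Mod pattern, that r31.x is the separation and r31.y the convergence read from the stereo texture on s13, and that v3.x carries the x scale from s_projectionReciprocal. It mirrors this particular variant (convergence added, result added), which is what worked for the shadows earlier in the thread; the version originally suggested subtracted both.

```python
def stereo_correct_x(x, depth, separation, convergence, inv_proj_x):
    """Mirror of the correction above, one line per instruction.

    x          -> r0.y (view-space x)
    depth      -> r0.w (view-space depth)
    separation -> r31.x, convergence -> r31.y (assumed stereo texture layout)
    inv_proj_x -> v3.x (from s_projectionReciprocal)
    """
    t = depth + convergence    # add r31.w, r0.w, r31.y
    t = t * separation         # mul r31.w, r31.w, r31.x
    return t * inv_proj_x + x  # mad r0.y, r31.w, v3.x, r0.y


# Hypothetical values; separation has opposite signs in the two eyes.
x, depth, conv, k = 1.0, 5.0, 1.0, 0.5
left = stereo_correct_x(x, depth, -0.5, conv, k)
right = stereo_correct_x(x, depth, 0.5, conv, k)
print(left, right)  # -0.5 2.5
```

The shift grows with depth (the same inputs at depth 11 move the right eye to 4.0 instead of 2.5). A fix that only ever produces a constant gap between the two eyes is therefore probably not picking up the real depth value.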

See how that goes.


VSDC043178:
// Parameters:
//
// float4 $s_projectionReciprocal;
// float2 $s_surfaceSize;
// float4x4 $s_worldProjectionMatrix;
// float4x4 $s_worldViewMatrix;
So, interestingly this time you also have access to two additional matrices - if the same pattern as the shadows doesn't work they might prove useful.

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 01/08/2016 01:11 PM   
Thank you for your interesting explanations! Maybe I made a mistake when I tried to change the code according to your instructions, as the ambient occlusion dot just remains 2D and there is no split between the left and right eye. I have used this code in PS903F9CB3:
dcl vPos.xy
dcl_2d s0
// New input with s_projectionReciprocal from the vertex shader:
dcl_texcoord3 v3
// Helix Mod Stereo Params:
dcl_2d s13
def c220, 0, 1, 0.0625, 0.5
frc r0.xy, vPos
...
mov r1.xyz, v0
// mad r0.yzw, v2.xxyz, r0.x, -r1.xxyz
// split mad into mul and add
mul r0.yzw, v2.xxyz, r0.x
// view-space stereo correction from r0.yzw:
texldl r31, c220.z, s13
add r31.w, r0.w, r31.y
mul r31.w, r31.w, r31.x
mad r0.y, r31.w, v3.x, r0.y
add r0.yzw, r0, -r1.xxyz
dp3 r0.y, v1, r0.yzww
max r1.w, r0.y, -v1.w
...

I tried some of the experiments used for the fixed dynamic shadow shader, like -r31.y or -r31.w, and used a constant register instead of v3.x. But in contrast to the shadow shader this had no effect, and the result was always the same 2D dot without a split...

My original display name is 3d4dd - for some reason Nvidia changed it..?!

Posted 01/09/2016 09:40 AM   
3d4dd said: Thank you for your interesting explanations! Maybe I made a mistake when I tried to change the code according to your instructions, as the ambient occlusion dot just remains 2D and there is no split between the left and right eye. I have used this code in PS903F9CB3:
That looks like what I was thinking, which suggests that this one needs a different pattern.

But first, make sure there isn't just a typo: disable the output from the shader and check that it actually disappears when you reload.

Once you have confirmed that the shader is reloading, try moving the adjustment to after the mad instruction.

Since you already identified that adjusting r1.xyz had an effect, I notice that that is used a little later in a similar mad instruction, so check if inserting an adjustment after / in the middle of / before that has any effect.

I notice that both texcoord1 and texcoord2 use the world-view matrix in the vertex shader - it might be worth seeing what happens if you add a view-space forwards (or backwards) correction to one or both of those in the vertex shader.


Posted 01/10/2016 04:23 AM   
DarkStarSword said: Since you already identified that adjusting r1.xyz had an effect, I notice that that is used a little later in a similar mad instruction, so check if inserting an adjustment after / in the middle of / before that has any effect.

As I didn't know exactly which mad instruction you meant, I tried both
mad r0.yzw, v1.xxyz, r0.y, r1.xxyz

and
mad r0.xyz, v2, -r0.x, r0.yzww
(I realized that it was xyz instead of yzw, so I used r0.z and r0.x in the adjustment)
Placing the adjustment in the middle of and after the mad instruction made the ambient occlusion dot split for the left and right eye. But the gap between the dots always remained the same regardless of the distance to the character. Where is the information about the distance taken from? From the VS? I also tried passing the values (x, y, z or w) from worldProjectionMatrix and worldViewMatrix to the PS, but the effect was the same.

DarkStarSword said: I notice that both texcoord1 and texcoord2 use the world-view matrix in the vertex shader - it might be worth seeing what happens if you add a view-space forwards (or backwards) correction to one or both of those in the vertex shader.

I think you are referring to
//   $s_worldViewMatrix       c12      4
...
mad r0.xyz, c12, v1.x, r0
...
mad r0.xyz, c12, v2.x, r0

Could you please give me an example of what a "view-space forwards (or backwards) correction" looks like in a VS? In vertex shaders I have only used the prime directive (halos) and the HUD/sky depth correction so far...


Posted 01/10/2016 09:59 PM   
3d4dd said: Where is the information about the distance taken from? From the VS?
There are several coordinates in this shader - some are just passed in from the game (likely the coordinate of the character's feet) while others are calculated by multiplying the screen coordinate by the depth buffer. Those are the ones we need to change (since that didn't work we tried a few other things, but I think I may have just missed something).

Could you please give me an example of what a "view-space forwards (or backwards) correction" looks like in a VS? In vertex shaders I have only used the prime directive (halos) and the HUD/sky depth correction so far...
A view-space correction is very similar to a regular correction, except you use z instead of w and multiply the whole thing by the value in the top-left corner of the inverse projection matrix (or divide by the top-left corner of the forwards projection matrix) - this is the type of correction we have added to the pixel shaders already.
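To make the relationship concrete: NVIDIA's driver shifts clip-space x by separation * (w - convergence). For a symmetric perspective projection, clip.x is just view x scaled by the top-left element of the projection matrix, and clip.w equals view z. The same shift can therefore be written in view space using z and dividing by that element, or equivalently multiplying by the top-left element of the inverse projection, which for this kind of matrix is its reciprocal. The Python sketch below checks that numerically. The matrix layout (D3D-style, left-handed, row vectors) and all numbers are assumptions for illustration. Whether you add or subtract the shift depends on whether you are applying or undoing it, as this thread shows.

```python
def perspective(sx, sy, near, far):
    """Hypothetical D3D-style left-handed projection, row-vector convention."""
    a = far / (far - near)
    return [
        [sx, 0.0, 0.0, 0.0],
        [0.0, sy, 0.0, 0.0],
        [0.0, 0.0, a, 1.0],
        [0.0, 0.0, -near * a, 0.0],
    ]


def transform(v, m):
    """Row vector v times 4x4 matrix m."""
    return [sum(v[k] * m[k][j] for k in range(4)) for j in range(4)]


def clip_space_shift(clip, sep, conv):
    """Regular correction: uses w and works directly in clip space."""
    x, y, z, w = clip
    return [x + sep * (w - conv), y, z, w]


def view_space_shift(view, sep, conv, proj):
    """View-space correction: uses z and divides by the projection's top-left."""
    x, y, z, w = view
    return [x + sep * (z - conv) / proj[0][0], y, z, w]


proj = perspective(1.5, 2.0, 0.1, 1000.0)
view = [2.0, -1.0, 25.0, 1.0]
sep, conv = 0.06, 3.0

a = clip_space_shift(transform(view, proj), sep, conv)
b = transform(view_space_shift(view, sep, conv, proj), proj)
print(abs(a[0] - b[0]) < 1e-9)  # True
```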


Looking at the code again I think I missed a fairly vital clue the first time round:
mad r0.yzw, v2.xxyz, r0.x, -r1.xxyz
...
mad r0.xyz, v2, -r0.x, r0.yzww


That's creating two coordinates from the depth buffer (and combining each with some other coord), but in the second one the depth-derived coordinate is negated, and our correction was only applied to the first. Try this:

// Parameters:
//
// sampler2D $s_cameraDepthSampler;
// float2 $s_screenReciprocal;
// float4 $s_shadeFactor;
//
//
// Registers:
//
// Name Reg Size
// --------------------- ----- ----
// $s_screenReciprocal c0 1
// $s_shadeFactor c1 1
// $s_cameraDepthSampler s0 1
//

ps_3_0
def c2, 0.5, 1, 0, 0
def c3, 3.73000002, 2.73000002, -3.73000002, -0.730000019
dcl_texcoord v0
dcl_texcoord1 v1
dcl_texcoord2 v2.xyz
dcl vPos.xy
dcl_2d s0

// New input with s_projectionReciprocal from the vertex shader:
dcl_texcoord3 v3
// Helix Mod Stereo Params:
dcl_2d s13
def c220, 0, 1, 0.0625, 0.5

frc r0.xy, vPos
add r0.xy, -r0, vPos
add r0.xy, r0, c2.x
mul r0.xy, r0, c0
texld r0, r0, s0
mov r1.xyz, v0

// Split the mad instruction (using r30 this time as I want to use the
// result twice):
//mad r0.yzw, v2.xxyz, r0.x, -r1.xxyz
mul r30.yzw, v2.xxyz, r0.x

// view-space stereo correction from r30.yzw:
texldl r31, c220.z, s13
add r31.w, r30.w, r31.y
mul r31.w, r31.w, r31.x
mad r30.y, r31.w, v3.x, r30.y

// Second part of split mad instruction:
add r0.yzw, r30, -r1.xxyz

dp3 r0.y, v1, r0.yzww
max r1.w, r0.y, -v1.w
min r0.y, v1.w, r1.w
mad r0.yzw, v1.xxyz, r0.y, r1.xxyz

// Replace this mad to use the adjusted coord from above negated:
//mad r0.xyz, v2, -r0.x, r0.yzww
add r0.xyz, -r30.yzw, r0.yzww

dp3 r0.x, r0, r0
rsq r0.x, r0.x
rcp r0.x, r0.x
mul_pp r0.x, r0.x, v0.w
max_pp r1.x, c2.y, r0.x
mad_pp r0.xy, r1.x, c3, c3.zwzw
rcp r0.y, r0.y
mul_sat_pp r0.x, r0.y, r0.x
max_pp oC0.xyz, c1.x, r0.x
mov_pp oC0.w, c2.y
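For reference, here is a rough Python model of what the rest of this pixel shader appears to compute. This is an interpretation, not something confirmed in the thread. It reads as the distance from the reconstructed point to a short segment centred on v0 along v1.xyz with half-length v1.w (a true closest-point calculation only if v1.xyz is unit length, which is an assumption). That distance is mapped through the curve built from c3 and clamped below by s_shadeFactor. The p_first/p_second split shows why the second mad had to reuse the corrected coordinate: the original code used the corrected point for the projection but the uncorrected one for the final distance.

```python
import math


def closest_on_segment(p, center, axis, half_len):
    """dp3 / max / min / mad: clamp the projection of p onto the axis."""
    t = sum(a * (pc - c) for a, pc, c in zip(axis, p, center))  # dp3 r0.y, v1, r0.yzww
    t = min(half_len, max(t, -half_len))  # max r1.w, r0.y, -v1.w / min r0.y, v1.w, r1.w
    return [c + a * t for c, a in zip(center, axis)]  # mad r0.yzw, v1.xxyz, r0.y, r1.xxyz


def ao_output(p_first, p_second, center, axis, half_len, scale, shade_factor):
    """p_first feeds the projection, p_second the final distance."""
    q = closest_on_segment(p_first, center, axis, half_len)
    # add r0.xyz, ... then dp3 / rsq / rcp: length of (q - p_second)
    d = math.sqrt(sum((qc - pc) ** 2 for qc, pc in zip(q, p_second)))
    d = max(1.0, d * scale)  # mul_pp by v0.w, max_pp with c2.y (= 1)
    ratio = (3.73 * d - 3.73) / (2.73 * d - 0.73)  # mad_pp with c3, rcp, mul
    ratio = min(1.0, max(0.0, ratio))  # _sat modifier
    return max(shade_factor, ratio)  # max_pp oC0.xyz, c1.x, r0.x


# Hypothetical segment along x, centred at the origin.
center, axis, half_len = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], 2.0
print(ao_output([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], center, axis, half_len, 1.0, 0.25))    # 0.25
print(ao_output([0.0, 100.0, 0.0], [0.0, 100.0, 0.0], center, axis, half_len, 1.0, 0.25))  # 1.0
```

A point on the segment gets the darkest value (s_shadeFactor), and a distant point is left unshaded. Passing a different point as p_second than p_first changes the result, which is the mismatch the replaced mad introduced.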


Posted 01/11/2016 04:32 AM   
Btw. I did a quick test of Resident Evil Revelations 2, since people couldn't get the wrapper to work way back when (figured it's a good idea since Dragon's Dogma is on the horizon).

You just need to grab the latest version of the wrapper and add:
[General]
UseRenderedShaders = true


This applies to all games on that engine. Again, no pressure, but I figured it would be helpful. I can't do any shader work on this computer because the setup is a pain in the ass, with EDID overrides causing issues with alt-tabbing.

Co-founder/Web host of helixmod.blog.com

Donations for web hosting @ paypal -eqzitara@yahoo.com
or
https://www.patreon.com/user?u=791918

Posted 01/11/2016 08:14 AM   
We are getting closer :) The ambient occlusion is now placed correctly. But unfortunately it is only partially visible and gets cropped, especially for distant objects: http://photos.3dvisionlive.com/3d4dd/image/569367a9e7e564b37c0001a8/ It looks like a scissor rectangle issue (an issue I hadn't observed in this game so far). But SkipSetScissorRect = true doesn't help...


Posted 01/11/2016 08:31 AM   
eqzitara said: Btw. I did a quick test of Resident Evil Revelations 2, since people couldn't get the wrapper to work way back when (figured it's a good idea since Dragon's Dogma is on the horizon).

You just need to grab the latest version of the wrapper and add:
[General]
UseRenderedShaders = true


This applies to all games on that engine. Again, no pressure, but I figured it would be helpful. I can't do any shader work on this computer because the setup is a pain in the ass, with EDID overrides causing issues with alt-tabbing.


Perhaps we are off topic here...

But anyway, in Resident Evil Revelations 2 I use the Max Payne 3 profile and add this:
Under C:\Users\Username\AppData\Local\CAPCOM\RESIDENT EVIL REVELATIONS2, add "Stereo=ON" in alphabetical order.
Now it's perfect in 3D!

Win7 64bit Pro
CPU: 4790K 4.8 GHZ
GPU: Asus Geforce RTX 2080 TI Rog Strix OC
Monitor: Asus PG278QR
And lots of ram and HD's ;)

Posted 01/11/2016 09:17 AM   
3d4dd said: We are getting closer :) The ambient occlusion is now placed correctly. But unfortunately it is only partially visible and gets cropped, especially for distant objects: http://photos.3dvisionlive.com/3d4dd/image/569367a9e7e564b37c0001a8/ It looks like a scissor rectangle issue (an issue I hadn't observed in this game so far). But SkipSetScissorRect = true doesn't help...
This is almost certainly going to need an adjustment in the vertex shader to fix. I've fixed a bunch of these and it can get a little tricky, so first see if you can confirm that it will be necessary: change the pixel shader to output a solid colour so you can see where the bounds of the effect are. I'm expecting that you will see a 2D rectangle at screen depth that stops right where you are seeing the effect clip. You should then be able to adjust the output position in the vertex shader and see the rectangle move (I don't care what adjustment yet - all you want to confirm is that you can move the rectangle beyond its current bounds).

The next step will be to move the rectangle so that it encompasses the entire bounds of the effect regardless of depth and with any reasonable convergence setting. Here are the options:

1. Move the effect to a fixed depth
2. Move the effect to a variable depth based on something in the shader
3. Move the left side of the effect in the left eye separation pixels to the left, and the right side of the effect in the right eye separation pixels to the right
4. Move the left side of the effect to the left of the screen and the right side of the effect to the right of the screen
5. Move the effect to a variable depth based on the depth buffer

1 is the easiest option, but often won't work for all distances and might work less well for someone playing at a high convergence. You should at least start with this to see if you can make it work before adding any more complexity.

2 might be suitable here - I'm fairly certain that one of the coordinates you have been passed will be related to the character (possibly their feet). If that is the case it might have a suitable depth value you can use.

3 is a *lot* easier in DX11, as you can check the SV_VertexID to determine which side of the effect the current vertex is on. However, I believe that this should be possible in DX9 as well by using FirstVertexPosReg to get the position of the first vertex and comparing the current vertex position to that (I added documentation on this to the feature list page once I figured out how to make it work in Dreamfall Chapters). This approach works for effects that don't pop too far out of the screen (hence why I mentioned that convergence can be important).

4 is like 3, but it works on any effect regardless of pop-out.

5 is added for completeness, but should only be used as a last resort and I don't think it will come to that here.


Now, the tricky part is that once you adjust the position to move the bounding box, you will almost certainly find that you have misaligned the effect once again. To fix this you will need to adjust one or more of the texcoord outputs (any that are in some way related to the screen position) to match the adjustment you made to the position. They will often be in a different coordinate system, so this is not as simple as just adding the same value to each. In some cases you might be able to simplify this by applying an adjustment earlier in the shader so that it affects multiple outputs.

Here's an example of approach 3 to fix clipping on certain shadows in Lichdom Battlemage - this adjusts the position, a texture coordinate and a world-space coordinate: https://github.com/DarkStarSword/3d-fixes/blob/7f3919851c3575e50a76c9624e63c85612243696/Lichdom%20Battlemage/ShaderFixes/bfb176156cee7a92-vs_replace.txt

Here's an example of approach 4 to fix a clipping issue on interior shadows in Far Cry 4 - this only needed to adjust the position (as you can see from the comment, I tried approach 3 first): https://github.com/bo3b/3Dmigoto/blob/8ab435d44d2de4720fd44e47380ea466d8fbc5f2/FC4/ShaderFixes/438efe85d344ddd3-vs_replace.txt

Here's an example of approach 1 to fix clipping on lights in Far Cry 4 - note that I have applied the adjustment rather early here and have effectively adjusted an input to the vertex shader rather than an output, which then gets picked up by both outputs: https://github.com/bo3b/3Dmigoto/blob/8ab435d44d2de4720fd44e47380ea466d8fbc5f2/FC4/ShaderFixes/bbcba0c185334df3-vs_replace.txt

Based on what we have learned about this shader so far, here is my first stab at approach 1. I'm inserting the adjustment a little early in the hope that it will affect both the output position and texcoord2 identically, while leaving texcoord 0 and 1 alone (I'm guessing we won't need to adjust those if they are what I think they are, but I could be wrong - if they do need to be adjusted, at least we know that they are in view-space coordinates):

// Parameters:
//
// float4 $s_projectionReciprocal;
// float2 $s_surfaceSize;
// float4x4 $s_worldProjectionMatrix;
// float4x4 $s_worldViewMatrix;
//
//
// Registers:
//
// Name Reg Size
// ------------------------ ----- ----
// $s_surfaceSize c0 1
// $s_projectionReciprocal c1 1
// $s_worldProjectionMatrix c8 4
// $s_worldViewMatrix c12 4
//

vs_3_0
def c2, 0.100000001, 1, -1, 0
dcl_position v0
dcl_texcoord v1
dcl_texcoord1 v2
dcl_position o0
dcl_texcoord o1
dcl_texcoord1 o2
dcl_texcoord2 o3.xyz

def c220, 0, 1, 0.0625, 0.5
dcl_2d s0

mul r0.xyz, c13, v1.y
mad r0.xyz, c12, v1.x, r0
mad r0.xyz, c14, v1.z, r0
add o1.xyz, r0, c15
rcp o1.w, v1.w
mul r0.xyz, c13, v2.y
mad r0.xyz, c12, v2.x, r0
mad o2.xyz, c14, v2.z, r0
mul r0.xyz, c9.xyww, v0.y
mad r0.xyz, c8.xyww, v0.x, r0
mad r0.xyz, c10.xyww, v0.z, r0
add r0.xyz, r0, c11.xyww
max r0.z, r0.z, c2.x
rcp r0.z, r0.z
mul r0.xy, r0.z, r0
mov r0.z, c2.y

// Add separation as a first attempt at fixing the clipping issue on the AO
// shadows by moving the bounding box. Adjust at this point to hopefully affect
// the output position and texcoord2 equally:
texldl r31, c220.z, s0
add r0.x, r0.x, r31.x

mul o3.xyz, r0, c1
rcp r1.x, c0.x
rcp r1.y, c0.y
mad o0.xy, r1, c2.zyzw, r0
mov o0.zw, c2.xywy
mov o2.w, v2.w
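The arithmetic behind options 1, 3 and 4 is small enough to sketch in Python. Assuming NVIDIA's usual clip-space shift of separation * (w - convergence), dividing by w gives the shift in normalized device coordinates. That shift is zero at the convergence depth and tends to the separation itself as w grows. So adding r31.x after the perspective divide, as the vertex shader above does, matches the offset that formula gives a very distant point. The edge helpers for options 3 and 4 are hypothetical. They assume separation is negative in the left eye and positive in the right, and that you already know which edge a vertex belongs to. In DX9 that classification is the fiddly FirstVertexPosReg part, which isn't modelled here.

```python
def ndc_shift(sep, conv, w):
    """Stereo shift after the perspective divide: sep * (w - conv) / w."""
    return sep * (w - conv) / w


def fixed_depth_x(ndc_x, sep, conv, depth):
    """Option 1: move a vertex as if it sat at one fixed depth."""
    return ndc_x + ndc_shift(sep, conv, depth)


def widen_by_separation(ndc_x, is_left_edge, sep):
    """Option 3 (hypothetical helper): push the left edge left in the left eye
    (sep < 0) and the right edge right in the right eye (sep > 0); otherwise
    leave the edge where it is."""
    if is_left_edge:
        return ndc_x + min(sep, 0.0)
    return ndc_x + max(sep, 0.0)


def widen_to_screen(is_left_edge):
    """Option 4 (hypothetical helper): snap the edge to the screen border in NDC."""
    return -1.0 if is_left_edge else 1.0


sep, conv = 0.05, 2.0
print(fixed_depth_x(0.0, sep, conv, conv))    # 0.0: no shift at the convergence depth
print(ndc_shift(sep, conv, 1e9))               # very close to the separation (0.05)
print(widen_by_separation(-0.5, True, -sep))   # left edge, left eye: moves further left
print(widen_by_separation(-0.5, True, sep))    # left edge, right eye: unchanged
```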
3d4dd said:We are getting closer :) The ambiant occlusion is now placed correctly. But unfortunately it is only partially visible and especially for distant objects cropped: http://photos.3dvisionlive.com/3d4dd/image/569367a9e7e564b37c0001a8/ Looks like a scissor rectangle issue (an issue I couldn't observe in this game so far). But SkipSetScissorRect = true doesn't help...
This is almost certainly going to need an adjustment in the vertex shader to fix, but I've fixed a bunch of these and it can get a little tricky so first see if you can confirm that will be necessary - change the pixel shader to output a solid colour so you see where the bounds of the effect are. I'm expecting that you will see a 2D rectangle at screen depth that stops right where you are seeing the effect clipping. You should then be able to adjust the output position in the vertex shader and you should see the rectangle move (I don't care what adjustment yet - all you want to confirm is that you can move the rectangle beyond it's current bounds).

The next step will be to move the rectangle so that it will encompass the entire bounds of the effect regardless of depth and with any reasonable convergence setting. Here are the options:

1. Move the effect to a fixed depth
2. Move the effect to a variable depth based on something in the shader
3. Move the left side of the effect in the left eye separation pixels to the left and the right side of the effect in the right eye separation pixels to the right
4. Move the left side of the effect to the left of the screen and the right side of the effect to the right of the screen
5. Move the effect to a variable depth based on the depth buffer

1 is the easiest option, but it often won't work at all distances and might work less well for someone playing at a high convergence. You should at least start with this to see if you can make it work before adding any more complexity.

2 might be suitable here - I'm fairly certain that one of the coordinates you have been passed will be related to the character (possibly their feet). If that is the case it might have a suitable depth value you can use.

3 is a *lot* easier in DX11, as you can check SV_VertexID to determine which side of the effect the current vertex is on. However, I believe it should be possible in DX9 as well by using FirstVertexPosReg to get the position of the first vertex and comparing the current vertex position to that (I added documentation on this to the feature list page once I figured out how to make it work in Dreamfall Chapters). This approach works for effects that don't pop too far out of the screen (which is why I mentioned that convergence can be important).

4 is like 3, but it works on any effect regardless of pop-out.

5 is added for completeness, but should only be used as a last resort and I don't think it will come to that here.
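This rough Python model makes the difference between options 1, 3 and 4 concrete by showing what each does to the x coordinates of a quad's vertices. It is CPU-side arithmetic, not shader code, and it rests on a few assumptions. The x values are already in normalised device coordinates (after the divide by w). `separation` is the signed per-eye value from the stereo texture, assumed negative in the left eye and positive in the right. The function names are made up for illustration. In a real shader the left/right decision would come from SV_VertexID (DX11) or a FirstVertexPosReg comparison (DX9), not from looking at all four corners as this sketch does.

```python
def fixed_depth(x_ndc, separation, convergence, depth):
    """Option 1: shift the whole effect as if it sat at one fixed depth.

    Clip-space stereo adds separation * (w - convergence) to x; after the
    divide by w that is separation * (1 - convergence / w) in NDC.
    """
    return x_ndc + separation * (1.0 - convergence / depth)


def quad_sides(xs):
    """Classify each vertex of a screen-aligned quad: True = left edge."""
    centre = (min(xs) + max(xs)) / 2.0
    return [x < centre for x in xs]


def widen_per_eye(xs, separation):
    """Option 3: move only the outer edge for the current eye outwards."""
    out = []
    for x, is_left in zip(xs, quad_sides(xs)):
        if is_left and separation < 0:        # left edge in the left eye
            x += separation
        elif not is_left and separation > 0:  # right edge in the right eye
            x += separation
        out.append(x)
    return out


def stretch_to_screen(xs):
    """Option 4: push the left edge to -1 and the right edge to +1."""
    return [-1.0 if is_left else 1.0 for is_left in quad_sides(xs)]


quad = [-0.2, 0.3, -0.2, 0.3]               # quad spanning x = -0.2 .. 0.3 in NDC
print(fixed_depth(0.0, 0.05, 1.0, 1e9))     # ~0.05: at "infinity" it is just the separation
print(widen_per_eye(quad, -0.05))           # left eye: left edge moves to about -0.25
print(widen_per_eye(quad, 0.05))            # right eye: right edge moves to about 0.35
print(stretch_to_screen(quad))              # [-1.0, 1.0, -1.0, 1.0]
```

Option 1 with a very large depth reduces to just adding the separation. That is the same thing the `add r0.x, r0.x, r31.x` line in the shader later in this post does.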


Now, the tricky part is that once you adjust the position to move the bounding box, you will almost certainly find that you have misaligned the effect once again. To fix this you will need to adjust one or more of the texcoord outputs (any that are in some way related to the screen position) to match the adjustment you made to the position. However, they will often be in a different coordinate system, so this is not as simple as just adding the same value to each. In some cases you might be able to simplify this by applying an adjustment earlier in the shader so that it will affect multiple outputs.

Here's an example of approach 3 to fix clipping on certain shadows in Lichdom Battlemage - this adjusts the position, a texture coordinate and a world space coordinate:

https://github.com/DarkStarSword/3d-fixes/blob/7f3919851c3575e50a76c9624e63c85612243696/Lichdom%20Battlemage/ShaderFixes/bfb176156cee7a92-vs_replace.txt

Here's an example of approach 4 to fix a clipping issue on interior shadows in Far Cry 4 - this only needed to adjust the position (as you can see from the comment I tried approach 3 first):

https://github.com/bo3b/3Dmigoto/blob/8ab435d44d2de4720fd44e47380ea466d8fbc5f2/FC4/ShaderFixes/438efe85d344ddd3-vs_replace.txt

Here's an example of approach 1 to fix clipping on lights in Far Cry 4 - note that I have applied the adjustment rather early here and have effectively adjusted an input to the vertex shader rather than an output which then gets picked up by both outputs:

https://github.com/bo3b/3Dmigoto/blob/8ab435d44d2de4720fd44e47380ea466d8fbc5f2/FC4/ShaderFixes/bbcba0c185334df3-vs_replace.txt

Based on what we have learned about this shader so far, here is my first stab at approach 1. I'm inserting the adjustment a little early in the hopes that it will affect both the output position and texcoord2 identically, while leaving texcoords 0 and 1 alone (I'm guessing we won't need to adjust those if they are what I think they are, but I could be wrong - if they do need to be adjusted, at least we know that they are in view-space coordinates):

[code]
// Parameters:
//
// float4 $s_projectionReciprocal;
// float2 $s_surfaceSize;
// float4x4 $s_worldProjectionMatrix;
// float4x4 $s_worldViewMatrix;
//
//
// Registers:
//
// Name                     Reg   Size
// ------------------------ ----- ----
// $s_surfaceSize           c0       1
// $s_projectionReciprocal  c1       1
// $s_worldProjectionMatrix c8       4
// $s_worldViewMatrix       c12      4
//

vs_3_0
def c2, 0.100000001, 1, -1, 0
dcl_position v0
dcl_texcoord v1
dcl_texcoord1 v2
dcl_position o0
dcl_texcoord o1
dcl_texcoord1 o2
dcl_texcoord2 o3.xyz

def c220, 0, 1, 0.0625, 0.5
dcl_2d s0

mul r0.xyz, c13, v1.y
mad r0.xyz, c12, v1.x, r0
mad r0.xyz, c14, v1.z, r0
add o1.xyz, r0, c15
rcp o1.w, v1.w
mul r0.xyz, c13, v2.y
mad r0.xyz, c12, v2.x, r0
mad o2.xyz, c14, v2.z, r0
mul r0.xyz, c9.xyww, v0.y
mad r0.xyz, c8.xyww, v0.x, r0
mad r0.xyz, c10.xyww, v0.z, r0
add r0.xyz, r0, c11.xyww

max r0.z, r0.z, c2.x
rcp r0.z, r0.z
mul r0.xy, r0.z, r0
mov r0.z, c2.y

// Add separation as a first attempt at fixing the clipping issue on the AO
// shadows by moving the bounding box. Adjust at this point to hopefully affect
// the output position and texcoord2 equally:
texldl r31, c220.z, s0
add r0.x, r0.x, r31.x

mul o3.xyz, r0, c1
rcp r1.x, c0.x
rcp r1.y, c0.y
mad o0.xy, r1, c2.zyzw, r0
mov o0.zw, c2.xywy
mov o2.w, v2.w
[/code]
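This sanity check shows why adjusting r0.x at that point should move both outputs. It transcribes the tail of the shader above into Python and runs the same arithmetic on the CPU. The register meanings come from the listing. The surface size and `$s_projectionReciprocal` values are made-up numbers, and reading texcoord2 as a view-space ray is a guess.

```python
def vs_tail(ndc_x, ndc_y, surface_size, proj_reciprocal, separation=0.0):
    """CPU model of the end of the vertex shader above, after the divide by w.

    r0 = (ndc_x, ndc_y, 1) at this point. The injected lines add the
    separation to r0.x before r0 feeds both o3 (texcoord2) and o0 (position).
    """
    r0 = [ndc_x + separation, ndc_y, 1.0]                # add r0.x, r0.x, r31.x
    o3 = [r0[i] * proj_reciprocal[i] for i in range(3)]  # mul o3.xyz, r0, c1
    r1 = [1.0 / surface_size[0], 1.0 / surface_size[1]]  # rcp r1.x / r1.y from c0
    # mad o0.xy, r1, c2.zyzw, r0 with c2.z = -1 and c2.y = 1
    o0_x = -r1[0] + r0[0]
    o0_y = r1[1] + r0[1]
    return [o0_x, o0_y, 0.0, 1.0], o3                    # mov o0.zw, c2.xywy -> (0, 1)


size = (1280.0, 720.0)          # example $s_surfaceSize
c1 = (0.8, 0.45, 1.0, 0.0)      # made-up $s_projectionReciprocal
base_o0, base_o3 = vs_tail(0.1, -0.2, size, c1)
adj_o0, adj_o3 = vs_tail(0.1, -0.2, size, c1, separation=0.05)
print(adj_o0[0] - base_o0[0])   # about 0.05: the position moves by the separation
print(adj_o3[0] - base_o3[0])   # about 0.04: texcoord2 moves by separation * c1.x
```

Both outputs shift, but not by the same amount. Texcoord2 moves by the separation scaled by c1.x, because it lives in a different coordinate system from the position. Applying the change before the `mul o3.xyz, r0, c1` line gets that conversion for free instead of having to work it out by hand.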

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 01/11/2016 01:20 PM   
helifax said:
DarkStarSword said: It is UE3, but it is a very heavily modified UE3, and the lighting they are using is entirely a custom job for this specific game, so it's unlikely there will be a standard configuration option for it (a custom one maybe).

That said, I was able to crack one of the two clipping issues on the tile lighting last night and I'm 99.9% confident that I know how to solve the other, so at the moment it's looking very promising :)


That sounds very promising! I am wondering if we can use the same things on the Frostbite 3 engine and its tile-based lighting as well ;)
I was waiting to reply to this until I had fixed them 100%, which I now have :)

I suspect that these are going to be fairly game/engine-specific, so the fix for Batman is very unlikely to work as-is in any other game, but perhaps there are similarities.

This is what the lighting originally looked like:

Image: http://darkstarsword.net/screenshots/BatmanAK%20-%202016-01-12%20-%20003852.0.jps

Identifying how to fix the position of the lights was fairly straightforward (just the usual - find where the depth buffer is sampled, look for a 3D coordinate multiplied by it, then insert a correction, in this case in view-space), but then this happened (it looked fine up close; this was only a problem from a distance):

Image: https://forums.geforce.com/cmd/default/download-comment-attachment/67772/
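The "usual" view-space correction mentioned just above is a single line of arithmetic. This minimal Python sketch assumes the standard stereo formula, where clip-space x is shifted by separation * (w - convergence). It also assumes view-space x maps to clip-space x through a single projection element, called `p00` here, with off-centre terms ignored. Whether the shift is added or subtracted varies between games, as found earlier in this thread, so the `sign` parameter stands in for that choice.

```python
def view_space_correction(x_view, depth, separation, convergence, p00, sign=1.0):
    """Apply a stereo correction to a view-space x reconstructed from depth.

    Assumes the usual stereo formula x_clip += separation * (w - convergence)
    with w == view-space depth, and x_clip = p00 * x_view (off-centre
    projection terms ignored). `sign` stands in for the add-vs-subtract
    choice that varies between games.
    """
    return x_view + sign * separation * (depth - convergence) / p00


corrected = view_space_correction(2.0, depth=10.0, separation=0.05,
                                  convergence=2.0, p00=1.2)
# Projecting the shift back to clip space recovers separation * (w - convergence):
print(1.2 * (corrected - 2.0))   # about 0.4 == 0.05 * (10 - 2)
```

At depth == convergence the shift is zero, which matches the usual stereo behaviour of objects at the convergence depth sitting at screen depth.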

The problem is that before the lights are drawn, each tile decides which lights it is going to draw, and this decision is still being made using the old 2D position of each light, which no longer lines up. We experimented with enabling all the lights in every tile, but that was a massive performance killer. Fixing this was not obvious, but it ended up just being a view-space stereo correction (or four) on what I assume must be the corners of each tile. This wasn't obvious because it was so far removed from the depth buffer samples, but enough experimentation and analysis eventually made me suspicious of some shared memory g buffers that were being used to pass some positions around, and of some other code that looked like it was finding the minimum and maximum depths in a tile, until I found the correct spot somewhere between those two things:

Image: https://forums.geforce.com/cmd/default/download-comment-attachment/67770/

You might notice that there is still an issue with clipping here, but now it's on a larger 4x2 grid instead of the small tiles. This was a bit trickier - the game had already calculated on the CPU which lights fell within which parts of that large grid and passed the result to the shader in a structured buffer. The structured buffer held 8 sorted lists, one for each of the 8 large tiles in the grid, and a footer containing the length of each list (how do I know this? Frame analysis' dump_tex option dumps out structured buffers, so I could examine their contents).
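This rough Python sketch reads that kind of dump and makes the described layout concrete: 8 lists followed by a footer of lengths. The exact layout was only worked out by inspecting the dump. The fixed 512-entry slot per list used here is therefore an assumption, based on the 512-entry per-list maximum mentioned later in this post. Both function names are hypothetical.

```python
import struct

NUM_TILES = 8         # the 4x2 grid of large tiles
LIST_CAPACITY = 512   # hypothetical: assumes each list gets a fixed 512-entry slot


def parse_light_lists(raw):
    """Split a dumped buffer of uint32 light IDs into one sorted list per tile.

    Hypothetical layout: NUM_TILES slots of LIST_CAPACITY IDs, followed by a
    footer of NUM_TILES lengths telling how many IDs in each slot are valid.
    """
    count = NUM_TILES * LIST_CAPACITY + NUM_TILES
    values = struct.unpack("<%dI" % count, raw[:count * 4])
    lengths = values[NUM_TILES * LIST_CAPACITY:]
    return [list(values[t * LIST_CAPACITY:t * LIST_CAPACITY + lengths[t]])
            for t in range(NUM_TILES)]


def build_fake_buffer(lists):
    """Pack lists into the same hypothetical layout, for testing the parser."""
    slots = []
    for ids in lists:
        slots.extend(ids + [0] * (LIST_CAPACITY - len(ids)))
    slots.extend(len(ids) for ids in lists)
    return struct.pack("<%dI" % len(slots), *slots)


example = [[3, 17, 250], [17, 980]] + [[] for _ in range(6)]
print(parse_light_lists(build_fake_buffer(example))[:2])   # [[3, 17, 250], [17, 980]]
```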

Not using the list wasn't a good option. There were over 100 lights active simultaneously, but this buffer just listed IDs that were looked up in other buffers, and those IDs ranged up to almost 1000 - without the list we would have had to check them all, and this shader was already really expensive.

So I added some code to the shader to merge adjacent lists together, which resulted in this:

Image: https://forums.geforce.com/cmd/default/download-comment-attachment/67769/
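Merging two of these lists is a classic two-pointer merge of sorted sequences. This minimal Python sketch assumes the lists hold ascending light IDs and that a light present in both tiles should only be kept once. Which neighbouring tile each list is merged with isn't specified above, so that choice is left to the caller.

```python
def merge_sorted_unique(a, b):
    """Two-pointer merge of two ascending ID lists, dropping duplicates."""
    out = []
    i = j = 0
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i] < b[j]):
            value = a[i]          # only a has items left, or a's head is smaller
            i += 1
        elif i == len(a) or b[j] < a[i]:
            value = b[j]          # only b has items left, or b's head is smaller
            j += 1
        else:
            value = a[i]          # same ID in both lists: take it once
            i += 1
            j += 1
        if not out or out[-1] != value:
            out.append(value)
    return out


print(merge_sorted_unique([3, 17, 250], [17, 980]))   # [3, 17, 250, 980]
```

Because both inputs are already sorted, the merge is linear in the combined length and the output stays sorted, so the rest of the shader can treat the merged list exactly like an original one.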

But unfortunately that wasn't free either. This shader had been costing me 5fps and was now costing me 15fps (down to 10, not really playable any more)! I determined that the cost was not the extra instructions but the extra reads from the structured buffer containing the lists. Further experimentation showed that this was due to reading the buffer from so many threads simultaneously. I was copying it to a shared memory structured buffer, which was only shared within each thread group of 256 threads, so I had to do the merge in every thread group - and there were something like 1,800 of those while playing in 720p. That works out to 1,800 groups * 8 lists * 2 lists being merged = 28,800 simultaneous reads, with up to 512 entries per list to be merged. Some GPUs might perform better here - I would bet that this performance issue was due to something along the lines of "cache ping pong" (though why that happened on reads...).

So this was my motivation for finally writing that custom shader injection feature I've been planning for some time now. Using a second shader meant I could run a single thread group of 8 threads to perform the merge, eliminating the performance-killing cache ping pong effect and completely restoring my original frame rate :)
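This is a quick back-of-the-envelope check of the numbers in the last two paragraphs. The 1,800 thread groups at 720p is the rough figure quoted above. Treating the injected shader as 8 threads each reading 2 lists is my reading of "a single thread group of 8 threads", not something taken from the actual shader.

```python
def simultaneous_reads(thread_groups, lists_per_group, lists_per_merge):
    """Rough count of structured-buffer list reads issued at the same time."""
    return thread_groups * lists_per_group * lists_per_merge


before = simultaneous_reads(1800, 8, 2)   # merge repeated in every 256-thread group
after = simultaneous_reads(1, 8, 2)       # one injected group of 8 threads, one list each
worst_case_entries = before * 512         # up to 512 entries per list being merged
print(before, after, before // after)     # 28800 16 1800
```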

The final shader is here:
https://github.com/DarkStarSword/3d-fixes/blob/master/Batman/ShaderFixes/eb8c3e5e00a6c476-cs.txt

And the injected shader to merge the lists:
https://github.com/DarkStarSword/3d-fixes/blob/master/Batman/ShaderFixes/merge_tiles.hlsl


Posted 01/11/2016 02:18 PM   
Yes, yes, yes! It works :)
I have to confess that I was so curious that after the solid colour test I just skipped to your suggested code instead of studying all the other options. But now I will take a look at them and try to understand them - after all, we are in a school for shaderhackers ;) By the way, I want to thank you so much for your detailed explanations. Several times I was just going to gift the game, lean back and wait/hope for the perfect fix to be delivered. And I'm sure that this would have been more convenient for all of us ;) But on the other hand I also wanted to contribute something, or at least try to learn. So thank you for your patience and helpfulness!
Now the last step will be to separate some (ground) effects from HUD elements so that they don't get messed up when I push the HUD to depth. I also had to do this for FFXIII-1 and FFXIII-2, and it was quite tricky (CRCs were not recognized reliably, etc.). On the other hand, you and other hackers provided me with useful tricks back then, and I hope that I can do it by myself...

My original display name is 3d4dd - for some reason Nvidia changed it..?!

Posted 01/11/2016 03:03 PM   
3d4dd said: Yes, yes, yes! It works :)
=D


Posted 01/11/2016 03:07 PM   
DarkStarSword said:
helifax said:
DarkStarSword said: It is UE3, but it is a very heavily modified UE3, and the lighting they are using is entirely a custom job for this specific game, so it's unlikely there will be a standard configuration option for it (a custom one maybe).

That said, I was able to crack one of the two clipping issues on the tile lighting last night and I'm 99.9% confident that I know how to solve the other, so at the moment it's looking very promising :)


That sounds very promising! I am wondering if we can use the same things on the Frostbite 3 engine and its tile-based lighting as well ;)
I was waiting to reply to this until I had fixed them 100%, which I now have :)

I suspect that these are going to be fairly game/engine-specific, so the fix for Batman is very unlikely to work as-is in any other game, but perhaps there are similarities.

This is what the lighting originally looked like:

Image: https://forums.geforce.com/cmd/default/download-comment-attachment/67771/

Identifying how to fix the position of the lights was fairly straightforward (just the usual - find where the depth buffer is sampled, look for a 3D coordinate multiplied by it, then insert a correction, in this case in view-space), but then this happened (it looked fine up close; this was only a problem from a distance):

Image: https://forums.geforce.com/cmd/default/download-comment-attachment/67772/

The problem is that before the lights are drawn, each tile decides which lights it is going to draw, and this decision is still being made using the old 2D position of each light, which no longer lines up. We experimented with enabling all the lights in every tile, but that was a massive performance killer. Fixing this was not obvious, but it ended up just being a view-space stereo correction (or four) on what I assume must be the corners of each tile. This wasn't obvious because it was so far removed from the depth buffer samples, but enough experimentation and analysis eventually made me suspicious of some shared memory g buffers that were being used to pass some positions around, and of some other code that looked like it was finding the minimum and maximum depths in a tile, until I found the correct spot somewhere between those two things:

Image: https://forums.geforce.com/cmd/default/download-comment-attachment/67770/

You might notice that there is still an issue with clipping here, but now it's on a larger 4x2 grid instead of the small tiles. This was a bit trickier - the game had already calculated on the CPU which lights fell within which parts of that large grid and passed the result to the shader in a structured buffer. The structured buffer held 8 sorted lists, one for each of the 8 large tiles in the grid, and a footer containing the length of each list (how do I know this? Frame analysis' dump_tex option dumps out structured buffers, so I could examine their contents).

Not using the list wasn't a good option. There were over 100 lights active simultaneously, but this buffer just listed IDs that were looked up in other buffers, and those IDs ranged up to almost 1000 - without the list we would have had to check them all, and this shader was already really expensive.

So I added some code to the shader to merge adjacent lists together, which resulted in this:

Image: https://forums.geforce.com/cmd/default/download-comment-attachment/67769/

But unfortunately that wasn't free either. This shader had been costing me 5fps and was now costing me 15fps (down to 10, not really playable any more)! I determined that the cost was not the extra instructions but the extra reads from the structured buffer containing the lists. Further experimentation showed that this was due to reading the buffer from so many threads simultaneously. I was copying it to a shared memory structured buffer, which was only shared within each thread group of 256 threads, so I had to do the merge in every thread group - and there were something like 1,800 of those while playing in 720p. That works out to 1,800 groups * 8 lists * 2 lists being merged = 28,800 simultaneous reads, with up to 512 entries per list to be merged. Some GPUs might perform better here - I would bet that this performance issue was due to something along the lines of "cache ping pong" (though why that happened on reads...).

So this was my motivation for finally writing that custom shader injection feature I've been planning for some time now. Using a second shader meant I could run a single thread group of 8 threads to perform the merge, eliminating the performance-killing cache ping pong effect and completely restoring my original frame rate :)

The final shader is here:

https://github.com/bo3b/3Dmigoto/blob/master/Batman/ShaderFixes/eb8c3e5e00a6c476-cs.txt


And the injected shader to merge the lists:

https://github.com/bo3b/3Dmigoto/blob/master/Batman/ShaderFixes/merge_tiles.hlsl


F*cking Genius is all I can say to this :-) Astounding work :-)

Rig: Intel i7-8700K @4.7GHz, 16Gb Ram, SSD, GTX 1080Ti, Win10x64, Asus VG278

Posted 01/11/2016 03:07 PM   
mike_ar69 said:
DarkStarSword said:
helifax said:
DarkStarSword said: It is UE3, but it is a very heavily modified UE3, and the lighting they are using is entirely a custom job for this specific game, so it's unlikely there will be a standard configuration option for it (a custom one maybe).

That said, I was able to crack one of the two clipping issues on the tile lighting last night and I'm 99.9% confident that I know how to solve the other, so at the moment it's looking very promising :)


That sounds very promising! I am wondering if we can use the same things on the Frostbite 3 engine and its tile-based lighting as well ;)
I was waiting to reply to this until I had fixed them 100%, which I now have :)

I suspect that these are going to be fairly game/engine specific, so the fix for Batman is very unlikely to work as is in any other game, but perhaps there are similarities.

This is what the lighting originally looked like:

Image: https://forums.geforce.com/cmd/default/download-comment-attachment/67771/

Identifying how to fix the position of the lights was rather straightforward (just the usual: find where the depth buffer is sampled, look for a 3D coordinate multiplied by it, then insert a correction, in this case in view-space), but then this happened (it looked fine up close; this was only a problem from a distance):
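The "depth times a 3D coordinate, then correct" pattern can be sketched on the CPU in Python, since the shader itself can't be run here. This is a minimal sketch, not the Batman fix: it assumes a symmetric perspective projection whose horizontal scale is `p00` (the projection matrix's [0][0] element), with `separation` and `convergence` standing in for the values the shader reads from the stereo parameters. The function names are hypothetical, and the sign of the correction varies between games.

```python
from typing import NamedTuple


class Vec3(NamedTuple):
    x: float
    y: float
    z: float


def reconstruct_view_pos(ray: Vec3, linear_depth: float) -> Vec3:
    """Scale an interpolated view ray (z == 1) by the linear depth sampled
    from the depth buffer: the 'depth times a 3D coordinate' step."""
    return Vec3(ray.x * linear_depth, ray.y * linear_depth, ray.z * linear_depth)


def stereo_correct_view(p: Vec3, separation: float, convergence: float, p00: float) -> Vec3:
    """Hypothetical view-space stereo correction.

    For a symmetric projection clip.x = p00 * view.x and clip.w = view.z, so a
    clip-space shift of separation * (w - convergence) becomes
    separation * (z - convergence) / p00 in view space.
    Some games need the sign flipped.
    """
    return p._replace(x=p.x + separation * (p.z - convergence) / p00)


pos = reconstruct_view_pos(Vec3(0.5, 0.25, 1.0), 10.0)
print(stereo_correct_view(pos, separation=0.1, convergence=2.0, p00=2.0))
```

Doing the correction in view space rather than clip space is why the division by `p00` appears: the same screen-space shift has to be expressed in view-space units.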

Image: https://forums.geforce.com/cmd/default/download-comment-attachment/67772/

The problem is that, before the lights are drawn, each tile decides which lights it is going to draw, and this decision is still being made on the old 2D position of each light, which no longer lines up. We experimented with enabling all the lights in every tile, but that was a massive performance killer. Fixing this was not obvious, but it ended up being just a view-space stereo correction (or four) on what I assume must be the corners of each tile. This wasn't obvious because it was so far removed from the depth buffer samples, but enough experimentation and analysis eventually made me suspicious of some shared memory g buffers that were being used to pass some positions around, and of some other code that looked like it was finding the minimum and maximum depths in a tile, until I found the correct spot somewhere between those two things:
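A rough Python sketch of why mono tile bounds miss lights, and of correcting a tile's corners in view space. It is not the code from the Batman shader: it assumes the corners are rebuilt from the tile's NDC bounds at one view depth under a symmetric projection (`p00`, `p11` are the projection's x and y scales), and all names are hypothetical. A real tile frustum would also be bounded by the tile's minimum and maximum depths.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # view-space (x, y, z)


def ndc_shift(w: float, separation: float, convergence: float) -> float:
    """Horizontal NDC offset one eye receives at clip-space depth w when
    separation * (w - convergence) is added to clip.x (NDC x = clip.x / w).
    Zero at the convergence depth, tending toward `separation` far away."""
    return separation * (w - convergence) / w


def tile_corners_view(ndc_x: Tuple[float, float], ndc_y: Tuple[float, float],
                      depth: float, p00: float, p11: float) -> List[Point]:
    """View-space positions of a tile's four corners at one view depth,
    assuming view.x = ndc.x * z / p00 and view.y = ndc.y * z / p11."""
    return [(nx * depth / p00, ny * depth / p11, depth) for nx in ndc_x for ny in ndc_y]


def correct_corners(corners: List[Point], separation: float,
                    convergence: float, p00: float) -> List[Point]:
    """Apply the view-space stereo correction to every corner so the tile
    matches the current eye before lights are culled against it.
    The sign is game specific."""
    return [(x + separation * (z - convergence) / p00, y, z) for x, y, z in corners]


corners = tile_corners_view((-0.5, 0.5), (-0.5, 0.5), depth=4.0, p00=2.0, p11=2.0)
print(correct_corners(corners, separation=0.25, convergence=2.0, p00=2.0))
```

If `ndc_shift` for a light's depth is larger than a tile's width in NDC, culling that light against uncorrected tile bounds can put it in the wrong tile.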

Image: https://forums.geforce.com/cmd/default/download-comment-attachment/67770/

You might notice that there is still a clipping issue here, but now it's on a larger 4x2 grid instead of the small tiles. This was a bit trickier: the game had already calculated on the CPU which lights fell within which parts of that large grid and passed the result to the shader in a structured buffer. The structured buffer held 8 sorted lists, one for each of the 8 large tiles in the grid, and a footer containing the length of each list (how do I know this? Frame analysis' dump_tex option dumps out structured buffers, so I could examine its contents).
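To make the described layout concrete, here is a Python sketch of reading one tile's list from such a buffer. The exact packing is an assumption: it treats the buffer as a flat array of integers with each list in a fixed slot of `capacity` entries (512 is only a guess), followed by the footer of 8 lengths. The real buffer could pack the lists differently, for example back to back with offsets derived from the lengths.

```python
from typing import List, Sequence

NUM_TILES = 8        # the 4x2 grid of large tiles
LIST_CAPACITY = 512  # assumed fixed slot size per list (hypothetical)


def read_tile_list(buffer: Sequence[int], tile: int,
                   capacity: int = LIST_CAPACITY,
                   num_tiles: int = NUM_TILES) -> List[int]:
    """Return the sorted light IDs for one large tile.

    Assumed layout: num_tiles lists, each in a fixed slot of `capacity`
    entries, followed by a footer holding the num_tiles list lengths.
    """
    if not 0 <= tile < num_tiles:
        raise IndexError(f"tile {tile} out of range")
    length = buffer[num_tiles * capacity + tile]
    if length > capacity:
        raise ValueError(f"list length {length} exceeds capacity {capacity}")
    start = tile * capacity
    return list(buffer[start:start + length])


# Tiny example with 2 tiles of capacity 4 holding the lists [3, 7, 9] and [1].
example = [3, 7, 9, 0,   1, 0, 0, 0,   3, 1]
print(read_tile_list(example, 0, capacity=4, num_tiles=2))  # [3, 7, 9]
print(read_tile_list(example, 1, capacity=4, num_tiles=2))  # [1]
```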

Not using the list wasn't a good option. There were over 100 lights active simultaneously, but this buffer just listed IDs that were looked up in other buffers, and those IDs went up to almost 1000. Without the list we would have to check them all, and this shader was already really expensive.

So I added some code to the shader to merge adjacent lists together, which resulted in this:
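Merging two adjacent tiles' lists amounts to a standard two-pointer merge of sorted sequences. The Python sketch below is not the code in merge_tiles.hlsl; it assumes the lists are in ascending order and that a light present in both tiles should only be processed once, so duplicates are dropped while the sorted order is preserved.

```python
from typing import List, Sequence


def merge_sorted_unique(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Merge two ascending lists of light IDs into one ascending list,
    emitting any ID that appears in both lists only once."""
    out: List[int] = []
    i = j = 0
    while i < len(a) or j < len(b):
        if j >= len(b) or (i < len(a) and a[i] < b[j]):
            v = a[i]
            i += 1
        elif i >= len(a) or b[j] < a[i]:
            v = b[j]
            j += 1
        else:  # same ID in both lists: consume it from both
            v = a[i]
            i += 1
            j += 1
        if not out or out[-1] != v:
            out.append(v)
    return out


print(merge_sorted_unique([1, 3, 5], [2, 3, 6]))  # [1, 2, 3, 5, 6]
```

The merge is linear in the combined length of the two lists, but every thread that runs it still has to read both lists in full.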

Image: https://forums.geforce.com/cmd/default/download-comment-attachment/67769/

But unfortunately that wasn't free either. This shader had been costing me 5fps and was now costing me 15fps (down to 10fps, not really playable any more)! I determined that the cost was not the extra instructions but the extra reads from the structured buffer containing the lists. Further experimentation showed that this was due to reading the buffer from so many threads simultaneously. I was copying it to a shared memory structured buffer, which was only shared within each thread group of 256 threads, so I had to do the merge in every thread group. There were something like 1,800 of those while playing in 720p, so 1,800 * 8 lists * 2 lists being merged = 28,800 simultaneous reads, with up to a maximum of 512 entries per list to be merged. Some GPUs might perform better here; I would bet that this performance issue was due to something along the lines of "cache ping pong" (though why that happened on reads...).

So this was my motivation for finally writing the custom shader injection feature I've been planning for some time now. Using a second shader meant I could run a single thread group of 8 threads to perform the merge, eliminating the performance-killing cache ping pong effect and completely restoring my original frame rate :)
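The read counts involved work out as follows. The 1,800 thread groups figure is the post's own estimate, and the injected-shader figure assumes that each of the 8 threads merges one pair of lists:

$$
\underbrace{1800}_{\text{thread groups (720p)}} \times \underbrace{8}_{\text{lists}} \times \underbrace{2}_{\text{lists per merge}} = 28\,800
\qquad\text{vs.}\qquad
\underbrace{1}_{\text{thread group}} \times \underbrace{8}_{\text{threads}} \times 2 = 16
$$

That is roughly 1,800 times fewer concurrent list reads, since $28\,800 / 16 = 1800$.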

The final shader is here:

https://github.com/bo3b/3Dmigoto/blob/master/Batman/ShaderFixes/eb8c3e5e00a6c476-cs.txt


And the injected shader to merge the lists:

https://github.com/bo3b/3Dmigoto/blob/master/Batman/ShaderFixes/merge_tiles.hlsl


F*cking Genius is all I can say to this :-) Astounding work :-)


.... aaa...aaaa... WOOOOW... NOW THAT is some serious coding and STUFF you did there! Really impressive and really awesome! .....aaaa... WOW AGAIN! BRILLIANT JOB! I'm positively speechless right now...

1x Palit RTX 2080Ti Pro Gaming OC(watercooled and overclocked to hell)
3x 3D Vision Ready Asus VG278HE monitors (5760x1080).
Intel i9 9900K (overclocked to 5.3 and watercooled ofc).
Asus Maximus XI Hero Mobo.
16 GB Team Group T-Force Dark Pro DDR4 @ 3600.
Lots of Disks:
- Raid 0 - 256GB Sandisk Extreme SSD.
- Raid 0 - WD Black - 2TB.
- SanDisk SSD PLUS 480 GB.
- Intel 760p 256GB M.2 PCIe NVMe SSD.
Creative Sound Blaster Z.
Windows 10 x64 Pro.
etc


My website with my fixes and OpenGL to 3D Vision wrapper:
http://3dsurroundgaming.com

(If you like some of the stuff that I've done and want to donate something, you can do it with PayPal at tavyhome@gmail.com)

Posted 01/11/2016 05:46 PM   
A quick note about my latest experiences with the FFLR fix. It seems that the debug.dll has some problems with this game. Because of the necessary UseExtInterfaceOnly=true, all the texture CRCs are shown as 0xFFFFFFFF, as 4everAwake and DarkStarSword already said. As a workaround, I could use the PS to distinguish the effects that need to be fixed from other elements (DarkStarSword explained it here: https://forums.geforce.com/default/topic/766890/3d-vision/bo3bs-school-for-shaderhackers/22/). Because very different effects in this game use the same VS and/or PS, I couldn't fix everything this way, but the major issues should be solved.
Another problem was that some VS that I hunted and placed unmodified into the ShaderOverride folder simply disabled the effect I wanted to fix. Changing these VS even caused the game to crash! So, for example, I couldn't push the sun to depth. I can only offer the option to disable it with a hotkey using the only PS I found (which, on the other hand, is so common that disabling it also disables many other effects, so I don't want to disable it by default). Same issue with certain lens flare effects. But the sun and lens flare effects are rarely visible, so it is a minor issue.
Most importantly, the fix for the shadows has worked excellently in every situation I have tried so far :)

My original display name is 3d4dd - for some reason Nvidia changed it..?!

Posted 01/12/2016 06:33 PM   