I have an idea of how to skip that culling, but I would like to know if it's even doable.
What if I used the main shader of the buildings and ran a pass to render the depth buffer using the view-projection matrix (from any of the occluded objects) used for the shadow map? Then, because it will be 1 frame late, I could combine the new depth buffer with the depth buffer used to render the shadows, but using a sampling offset. The thing I'm unsure about is whether I would be able to somehow calculate that pixel offset from the old and new view-projection matrices of the light source view.
I don't know for sure - it sounds crazy enough that it just might work, but you are very much in the realm of dragons right now. I'm not entirely sure I follow what you mean by sampling offset... depth bias?
You might be able to calculate the shadow matrix for the current frame, something like inverse(last_frame_shadow_matrix) * last_frame_view_project * inverse(this_frame_view_project) * last_frame_shadow_matrix
If the shadow matrix is something like clip_to_shadow or screen_to_shadow you will need a last_clip_to_this_clip or similar matrix - but you can most likely calculate that as well from the last and current view projection matrices.
I don't know if that will work, and maybe I've got the order wrong, but it might be worth a shot.
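If it helps, here is a rough sketch of that reprojection idea (pure speculation like the above - the multiplication order depends on your row/column vector convention, inverse() is not a stock HLSL intrinsic so assume a helper or precompute it CPU-side, and every name here is a placeholder):
[code]
// Map this frame's clip space back to last frame's clip space:
float4x4 this_clip_to_last_clip =
    mul(inverse(this_frame_view_project), last_frame_view_project);

// Chain that with last frame's clip_to_shadow matrix so a current
// clip-space position can sample the one-frame-old shadow map:
float4x4 corrected_shadow_matrix =
    mul(this_clip_to_last_clip, last_frame_shadow_matrix);

float4 shadow_pos = mul(pos_clip, corrected_shadow_matrix);
shadow_pos /= shadow_pos.w; // perspective divide if the matrices are projective
[/code]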
I'm doing vaguely similar things in UE4 at the moment (for stereo, not shadows), and compensating for the differences between the last and current frame is how I fix temporal AA and motion blur (but in my case I have a ClipToPrevClip matrix). I'm not quite ready to announce that just yet, but if it helps this is one way I can calculate a correction matrix for the SVPosition coordinate system (I've already calculated a stereo version of SVPositionToTranslatedWorld from other things, but in your case you would use something like the last and current frame view projection matrices to generate a correction matrix in either world or clip space and go from there):
[code]
// Build a correction matrix between the mono and stereo SVPosition coordinate systems:
injection[0].sv_position_inv = mul(stereo_SVPositionToTranslatedWorld, inverse(mono.SVPositionToTranslatedWorld));
[/code]
My alter ego is a rule-bender, so I will just say one thing: “Hold my beer!” :D
By saying offset I meant compensating for the camera movement when combining the old depth buffer used to draw the shadows (with the occluded buildings removed, as that's the most visible artifact) with the 1-frame-late depth buffer I render from the main buildings shader.
I started looking into some shaders last night and am a bit confused by the results I got from 3DM. This is in Watch Dogs 1 - it's one of my favorite games by now. I don't think these games got all the credit they deserved, same as with Assassin's Creed III - two recent games I caught up with. I've been reading through their respective threads and realized how much effort went into them and what breakthroughs were made by these titles alone! It's unbelievable, and at the same time very impressive what's been achieved here. I hope 3DMigoto will continue to evolve at the same pace engines do =) Thank you for all the effort you guys are investing in this wrapper and the fixes that have been done!
My problem might be a very simple one. What I did was remove all the repaired shaders from the fixed folder to start hunting clean. I just want to get hold of some shaders to compare. But the 5 shaders I dumped didn't match any in the fixed folder? Some of the textures were blotted out as pixel and vertex shaders, but I don't know why they won't correspond with at least one shader in the fixed folder.
The games are pretty much flawless and look amazing, but there are still some messed-up shaders that I thought I could try to learn from while I have some shaders to compare them with. Can someone please tell me what's going on and how I can find some shaders to compare?
Thanks
Have the fixed shaders in and use marking_mode=original to hunt - then as you go past a fixed effect it will change back to the broken version. Hit the dump key and the file with the most recent timestamp in ShaderFixes will be for that, then you can study it to see how it was fixed.
Also turn on export_hlsl so 3DMigoto will dump all the shaders to ShaderCache (if memory serves it might not dump ones that are also in ShaderFixes - so remove those for that run).
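For reference, the relevant d3dx.ini settings would look something like this (a sketch - the exact section layout can differ between 3DMigoto versions, so check against the stock d3dx.ini):
[code]
[Hunting]
; Enable hunting, and render fixed shaders with their original (broken)
; version while cycling past them so they are easy to spot:
hunting=1
marking_mode=original

[Rendering]
; Dump decompiled HLSL for the shaders the game uses into ShaderCache:
export_hlsl=1
[/code]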
I've managed to get some shaders to compare. But somehow I feel like I might have wasted my time (hopefully not). At the end of the shaders there is a difference in the compiler versions...
// Generated by Microsoft (R) HLSL Shader Compiler 9.29.952.3111
// Generated by Microsoft (R) HLSL Shader Compiler 9.30.9200.20789
Should there be any differences, is this normal?
What I've done is look for all the lines that were modified and those that were added. This way I could look for consistency across the different shaders more easily.
I marked the changed code with "[color="orange"]>>>>>[/color]".
The lines which were added get a simple "[color="orange"]added[/color]" reference.
I will add the differences I've compiled that Mike had to implement to get the shader in the correct stereoscopic position. This is some crazy stuff!!! Don't know how you guys figure this shit out!?!?!
Please advise if my approach will work, and improvise where necessary. I'm still not 100% sure if 3DM is doing what you explained, DSS. When I made the changes in the ini you suggested I wasn't able to dump the shaders like you explained. There was very little visual "movement", and rather than breaking them, one actually was fixed. Very weird. I think the one shader that was fixed is actually the one I'm adding. I tried to keep it as simple and neat as possible. The code is in brackets with the line number added. Please feel free to ask if my compilation doesn't make sense.
[code]189.... "r0.x = abs(r1.y) < abs(r1.x);" >>>>> "r1.x = abs(r1.y) < abs(r1.x);"
added 190.... "r1.y = abs(r1.z) < abs(r1.x);"
added 191.... "r1.z = abs(r1.x) < abs(r1.y);"
added 192.... "r1.w = abs(r1.z) < abs(r1.y);"
193.... "r0.x = r1.y ? r1.x : 0;" >>>>> "r0.x = r0.x ? 1.000000 : 0;"
194.... "r0.y = r1.w ? r1.z : 0;" >>>>> "r0.y = abs(r1.z) < r0.x;"
201.... "r5.xyzw = _DiffuseUVTiling1.xyzw * r1.xyzw;" >>>>> "r5.xyzw = _SpecularUVTiling1.xyzw * r1.xyzw;"
202.... "r1.xyzw = _SpecularUVTiling1.xyzw * r1.xyzw;" >>>>> "r1.xyzw = _DiffuseUVTiling1.xyzw * r1.xyzw;"
203.... "o3.zw = r1.xy + r1.zw;" >>>>> "o3.xy = r1.xy + r1.zw;"
204.... "o3.xy = r5.xy + r5.zw;" >>>>> "o3.zw = r5.xy + r5.zw;"
added 202.... "r0.z = r0.x ? r0.y : 0;"
added 203.... "r0.y = r0.y ? 1.000000 : 0;"
added 204.... "r0.x = r0.x < r0.y;"
added 205.... "r0.x = r0.x ? 1.000000 : 0;"
added 206.... "r0.y = r0.x < r0.y;"
added 207.... "r0.x = r0.x ? r0.y : 0;"
added 208.... "r0.x = 0 != r0.x;"
added 209.... "r0.z = 0 != r0.z;"
205.... "r4.w = r0.y ? r4.z : r4.y;" >>>>> "r4.w = r0.x ? r4.z : r4.y;"
206.... "r0.xy = r0.xx ? r4.yz : r4.xw;" >>>>> "r0.xy = r0.zz ? r4.yz : r4.xw;"
207.... "r0.z = 5.000000000e-001 * _ViewportSize.x;" >>>>> "r0.z = _ViewportSize.x * 5.000000000e-001;"
// Generated by Microsoft (R) HLSL Shader Compiler 9.29.952.3111
// Generated by Microsoft (R) HLSL Shader Compiler 9.30.9200.20789
475.... removed "lt r1.xyzw, |r1.yzxz|, |r1.xxyy|"
476.... removed "and r0.xy, r1.ywyy, r1.xzxx"
added 480.... "lt r0.x, |r1.y|, |r1.x|"
added 481.... "and r0.x, r0.x, l(0x3f800000)"
added 482.... "lt r0.y, |r1.z|, r0.x"
485.... "add o3.zw, r1.zzzw, r1.xxxy" >>>>> "add o3.xy, r1.zwzz, r1.xyxx"
486.... "add o3.xy, r5.zwzz, r5.xyxx" >>>>> "add o3.zw, r5.zzzw, r5.xxxy"
added 493.... "and r0.z, r0.x, r0.y"
added 494.... "and r0.y, r0.y, l(0x3f800000)"
added 495.... "lt r0.x, r0.x, r0.y"
added 496.... "and r0.x, r0.x, l(0x3f800000)"
added 497.... "lt r0.y, r0.x, r0.y"
added 498.... "and r0.x, r0.x, r0.y"
added 499.... "ne r0.xz, l(0.000000, 0.000000, 0.000000, 0.000000), r0.xxzx"
487.... "movc r4.w, r0.y, r4.z, r4.y" >>>>> "movc r4.w, r0.x, r4.z, r4.y"
488.... "movc r0.xy, r0.xxxx, r4.yzyy, r4.xwxx" >>>>> "movc r0.xy, r0.zzzz, r4.yzyy, r4.xwxx"
489.... "mul r0.z, cb0[35].x, l(0.500000)" >>>>> "mul r0.z, l(0.500000), cb0[35].x"
// Approximately 77 instruction slots used
// Approximately 85 instruction slots used[/code]
I'm in no rush with this. I'm making a backup of my save so I can simultaneously continue playing and fixing. The broken shaders are in a place you probably only pass once on this specific mission. =)
[quote="KoelerMeester 1ste"]I've managed to get some shaders to compare. But somehow I feel like I might have wasted my time. (hopefully not). At the end of the shaders there are a difference in the compiler versions...
// Generated by Microsoft (R) HLSL Shader Compiler 9.29.952.3111
// Generated by Microsoft (R) HLSL Shader Compiler 9.30.9200.20789
Should there be any differences, is this normal?[/quote]No - unless 3DMigoto was set to use the "bytecode" hash type, a change in the compiler version would have completely changed the shader hash and they wouldn't match up at all. I think you may have hit one of our long-standing bugs where 3DMigoto can decompile a previously fixed shader - this can happen, for example, if you delete a shader from ShaderFixes then dump it out again. I've fixed a bunch of these recently, which will be in 3DMigoto 1.2.66, so you might like to try again once that is out.
[quote]What I've done is look for all the lines that were modified and those that were added. This way I could look for consistency across the different shaders more easily.[/quote]None of those lines look like anything we would normally change - I think this is probably the result of 3DMigoto decompiling a fixed shader that it shouldn't have.
[quote]I will add the differences I've compiled that Mike had to implement to get the shader in the correct stereoscopic position. This is some crazy stuff!!! Don't know how you guys figure this shit out!?!?![/quote]Have you gone through Bo3b's shaderhacker school? That will teach you some of the basics, and you can't go on to anything more advanced until you have a good mastery of that.
[quote]Please advise if my approach will work, and improvise where necessary. I'm still not 100% sure if 3DM is doing what you explained, DSS. When I made the changes in the ini you suggested I wasn't able to dump the shaders like you explained. There was very little visual "movement", and rather than breaking them, one actually was fixed. Very weird. I think the one shader that was fixed is actually the one I'm adding.[/quote]None of the changed or added lines have anything to do with 3D coordinates - if marking_mode=original is showing this one looks better without the changes then you should not be touching it.
Also, if you haven't already you should update to the latest 3DMigoto DLL - we have fixed a metric tonne of bugs since the original Watch Dogs fix, not to mention all the modern features that have been added since then.
The vertex buffer is a resource like any other, so you can use arbitrary resource copying in the same way you would copy textures. If you copy it into a texture slot 3DMigoto will turn it into a structured buffer and chop off any vertex offset within that buffer, then in the destination shader you can define a structure that matches the vertex buffer layout to access fields inside it.
e.g. I access the vertex buffers in Dreamfall Chapters to find the center of floating icons as part of the HUD analysis and adjustment:
https://github.com/DarkStarSword/3d-fixes-DreamfallChapters/blob/master/ShaderFixes/hud_vb_0bd32bb622c2d611.hlsl
https://github.com/DarkStarSword/3d-fixes-DreamfallChapters/blob/master/ShaderFixes/hud_analyse.hlsl
https://github.com/DarkStarSword/3d-fixes-DreamfallChapters/blob/master/ShaderFixes/hud.hlsl
https://github.com/DarkStarSword/3d-fixes-DreamfallChapters/blob/master/ShaderFixes/hud_analyse_clear.hlsl
https://raw.githubusercontent.com/DarkStarSword/3d-fixes-DreamfallChapters/master/d3dx.ini (search for vb0)
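A minimal sketch of the mechanism (the hash, texture slot and structure layout below are made up for illustration - the real layout has to be worked out per game as described next):
[code]
; d3dx.ini - copy the draw call's first vertex buffer into a texture slot:
[ShaderOverrideExample]
hash = 0123456789abcdef
vs-t100 = vb0
[/code]
[code]
// Destination shader - define a structure matching the vertex layout:
struct vb_vertex
{
    float4 pos; // POSITION - W is often garbage, see the gotcha below
    float2 uv;  // TEXCOORD0
};
StructuredBuffer<vb_vertex> vb : register(t100);

// e.g. read the first vertex, substituting a hardcoded W:
// float4 p = float4(vb[0].pos.xyz, 1);
[/code]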
If you specifically want to build a table of vertex data from multiple draw calls, take a close look at what analyse_text_shader() is doing (it's recording positions from matrices in a constant buffer, but would be trivial to do the same with vertex buffer data).
Just beware that the vertex buffer layout is similar to, but does not match the inputs to the vertex shader - I suggest using frame analysis and/or my constant buffer debug shader (which works with other buffers, but beware that it treats everything as 32bit floats, which will not be the case for all fields in a vertex buffer) to examine it for yourself to work out the correct layout.
Also beware that it might use certain formats that are not available inside shaders - position is usually just a vector of four regular 32bit little endian floats, so that's easy since it matches HLSL's float4 data type, but if you needed to pull out say a colour stored as an 8 bit integer, you would have to read out the whole 32bits as an unsigned integer and mask and shift to get the bits you need (tip: use unsigned data types or shift before masking so you don't need to worry about sign extended shifting messing things up if the high bit was a 1 - ishr vs ushr instructions).
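For example, unpacking a colour stored as four 8-bit channels might look something like this (a sketch assuming RGBA byte order - verify the real format with frame analysis; raw_colour is a placeholder for the uint field in your vertex structure):
[code]
uint packed = vb[i].raw_colour;                 // whole 32 bits as an unsigned int
float4 colour = float4((packed >>  0) & 0xff,   // shift then mask - unsigned,
                       (packed >>  8) & 0xff,   // so no sign extension to
                       (packed >> 16) & 0xff,   // worry about
                       (packed >> 24) & 0xff) / 255.0;
[/code]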
One [color="orange"][b]huge[/b][/color] gotcha that I always forget - if you read the input position from the vertex buffer directly it might have undefined fields, like the position W coordinate is often NAN, and if you use that in a calculation you will wind up with all sorts of bizarre behaviour (read the comments in the above example - I still have no idea why the symptoms of this problem *change after an F10 reload* - best guess is something to do with GPU or system memory layout subtly changing on a reload affecting what garbage data was in memory before filling out the vertex data), so if you read the input position ignore W and use a hardcoded W=1 instead.
Thanks DSS, I'm planning to use it for just the depth buffer calculations, so no colour data is needed in my case. What about the instanced geometry? Will it work the same?
Damnit, forum ate my post again :(
It might not do the right thing when chopping off the vertex offset - it can't distinguish between vertex and instance buffers so would apply StartVertexLocation / BaseVertexLocation as the offset to both (all) vertex & instance buffers and ignore StartInstanceLocation entirely. You might want to check a frame analysis log to make sure the game is using 0 for these in the DrawInstanced / DrawIndexedInstanced calls*, otherwise I'll need to work out the best way to handle it (the flexibility of the IA layout makes this non-trivial... I might be better off ignoring the vertex offset for instanced geometry and leaving it to you to work out how to deal with it in the shader... I always meant to add a keyword to ignore the offset when copying a buffer, though there are actually two offsets 3DMigoto takes into account for vertex buffers so we'd only want to ignore this one).
If you need it, you can get access to the VertexCountPerInstance / IndexCountPerInstance and InstanceCount parameters via command list assignment (these will be 0 if they aren't applicable to the current draw call):
[code]
x = vertex_count
y = index_count
z = instance_count
[/code]
* if the game is using one of the Indirect draw calls 3DMigoto won't do anything with the offsets and all the counts will be 0... Oh, hey I should use the new 'this' keyword to get access to the indirect buffer via the command list :)
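Putting that together, the assignments go in the relevant [ShaderOverride] section, and the values show up in the IniParams resource on the shader side (t120 by default) - a sketch with a placeholder hash:
[code]
[ShaderOverrideExample]
hash = 0123456789abcdef
x = vertex_count
y = index_count
z = instance_count
[/code]
[code]
Texture1D<float4> IniParams : register(t120); // 3DMigoto's default IniParams slot

void example()
{
    // Read the values assigned above (x/y/z map to .x/.y/.z of param 0):
    uint instance_count = (uint)IniParams.Load(int2(0, 0)).z;
}
[/code]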
OK, I've analysed your code and I partially understand it. I know how to store the model transformation matrices for each draw call. The only thing that's still not clear to me is how to store/use the geometry buffers from the multiple draw calls in the custom resource. Can I do it in the ini, or does it require some more advanced trickery inside the shader?
Can you take a step back and tell me what you are trying to achieve here? I'm a little concerned that your use case may not be well suited to 3DMigoto's current level of support, and while I think you can probably still achieve what you want going down this path, there may be better options we haven't considered yet, or dead ends we might hit if we don't.
A little background: I originally added the ability to access vertex buffers to 3DMigoto so that we could find the locations of other vertices from any vertex being drawn, which is useful for automatic depth buffer based adjustments to HUD elements where we may need to find the center of an icon (or at least one single consistent point) to check the depth buffer from each invocation to make sure we adjust all vertices by the same amount so as not to distort the icon.
The Dreamfall Chapters example is a more extreme example of the same concept, but instead of accessing the vertex buffer from other vertex shader invocations I'm using a compute shader to gather the positions of multiple HUD and text elements across a frame to decide on one single point to ray trace the depth buffer to determine what depth to draw the entire HUD at in the following frame.
Another example where we might do something similar is to solve the classic 2D nameplate/health bar problem, by collecting the positions and depths of every actor that may have a nameplate and match up the nameplates to the closest actor - in some cases we only need the matrices, but in others we may need a point from the vertex buffers as well.
In all those cases, I'm only interested in a small piece of information (one single position for the entire object) I can get from the vertex buffer, so I only store that - I have no need to store the entire vertex buffer, and I'm concerned from your comments that you might be.
Depending on what you're trying to do, maybe we should be looking into e.g. hooking up the stream output stage and/or indirect draw calls in 3DMigoto (DrawAuto and assigning resources to the stream output stage is already in, but without being able to configure a geometry/vertex/domain shader to use the stream output it is of limited use), or maybe we need to start thinking about ways we could give you more precise control over things like resource arrays, subresource copying and so on from 3DMigoto if they could help (some of this may be of limited use until we get more flexibility elsewhere in the command lists though, like "x = x + instance_count" and so on), or maybe we can just find another way to approach the problem.
Or perhaps if you do need something a little too situation specific we could think about creating an API for 3DMigoto to enable you to write a sort of extension DLL to do what you need, or otherwise add some sort of scripting support to 3DMigoto - if you noticed the recent bike shedding issue I opened in the tracker you would realise I'm already working on something like this for a UE4 specific extension (delayed because of depression triggers) and am open to input if other people might also like to use an API in 3DMigoto for whatever.
-----
Anyway, if you keep going down this path you would store the data you are interested in inside an array in the custom resource (either an array inside the structure as I did in Dreamfall Chapters, or use a higher number for array= inside the custom resource definition to turn the whole resource into an array, which may be more suitable if you want to use it *as* a vertex buffer again later, which is not tested but should work, especially once we gain support for indirect draw calls). You would also use a counter separate from the array (either a separate field in the structure as I did, or possibly in an entirely separate resource, which would be more appropriate if you turn the whole resource into an array) to know which index you are up to.
If you are only accessing this from a single shader invocation at once you don't need to worry about atomically updating the counter, but if you were updating it from multiple shader invocations simultaneously you could use InterlockedAdd() to give you the atomic guarantee you need to do that safely (I don't recall for sure - do we need raw byte access buffer support for InterlockedAdd, or can it be used with other UAVs? It can be used with shared memory in compute shaders at least, which aren't backed by UAVs and we don't need to do anything more to support, but those are per thread-group and only have a lifetime of one dispatch call. I have some support for raw buffers in a topic branch - I should merge that into master).
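A sketch of that counter-plus-array pattern (the structure, capacity and UAV slot are placeholders, and the resource itself would be a custom resource bound from the ini; HLSL's interlocked intrinsics do accept UAV destinations like this, though as noted above it's worth verifying on your target):
[code]
struct gathered
{
    uint count;             // next free index
    float4 positions[256];  // gathered per-draw data (capacity is arbitrary)
};
RWStructuredBuffer<gathered> gather : register(u1);

void store_position(float4 pos)
{
    uint idx;
    // Atomic increment so simultaneous invocations get unique slots:
    InterlockedAdd(gather[0].count, 1, idx);
    if (idx < 256)
        gather[0].positions[idx] = pos;
}
[/code]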
I have made some progress towards support for append/consume/counter structured buffers, which could also be used to provide the atomic guarantees you need to update a counter from multiple shader invocations simultaneously, but it needs some rework of the view cache to make it useful, and I decided that was a little too risky for the last release given how much other stuff was already going in and I could make do without it. Let me know if you feel you need this.
Of course, you should keep in mind that atomic operations work by each GPU thread taking turns to own and update a cache line, so that cache line may have to travel all over the GPU while each thread waits for it - use sparingly and from as few threads as possible or performance may suffer (but may still be better than doing everything from a single shader invocation depending on exactly what you are doing and other factors).
Let me describe in more detail what I'm planning to achieve.
The game uses 2 types of shadows: prebaked, used only on the ground, and dynamic, used on the objects. There are 2 dynamic shadow maps which the game fades between depending on the distance from the camera. The dynamic shadows were only visible up to a distance of 200 and were very light, so I've made them more visible by increasing the distance and the shadow mixing in the pixel shader. Then I scaled the xy output by 0.5 in the vertex shader passes used for calculating the depth buffer for the shadow map, so the shadow map covers more area.
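In other words, something like this at the end of each shadow-pass vertex shader (o0 as the assumed position output register):
[code]
// Halve the projected xy -> the ortho shadow map covers 2x the area per axis:
o0.xy *= 0.5;
[/code]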
The problem is that the game has 2 types of culling hardcoded: the camera view frustum culling (not the orthogonal sun view) and the square area culling, as you can see in the video when I scale the vertex output even more by multiplying xy by 0.125:
https://www.youtube.com/watch?v=6Taa5jzOQ6A
Because of that, the objects' shadows disappear even when the objects are still visible from the camera.
I was thinking about 2 routes to draw those missing objects:
1. Store the geometry from the later shaders that draw the textured objects and use it in an additional pass of the vertex shader that calculates the depth buffer for the shadow map. The objects will be drawn 1 frame late, but because they are static there will be no offset in the depth map.
2. Store the transformation matrices of the orthogonal view and render an additional pass after the textured objects by skipping the pixel shader and using those orthogonal view transform matrices. Then I could mix the result in the next frame with the rendered depth buffer before the shadow map shader is called. That would need some sort of orthogonal camera offset compensation in the custom shader that mixes the depth buffers, as the orthogonal view transform matrices will be 1 frame late.
From what you described, I see the most difficult brick wall to crush would be handling the multiple passes of the shaders drawing the buildings.
Making 3DM modular is a very interesting idea. It would surpass the ReShade wrapper by a magnitude of 100.
Forgive me for saying it, but using 3DM solely for fixing stereo effects is a waste of potential.
Custom resources that automatically build up as a stack would be a nice feature. Instead of overriding the data, it would add another element to the queue. When you consume the data by referencing it, it could increment the pointer to the next/previous element.
[quote]I have made some progress towards support for append/consume/counter structured buffers, which could also be used to provide the atomic guarantees you need to update a counter from multiple shader invocations simultaneously, but it needs some rework of the view cache to make it useful, and I decided that was a little too risky for the last release given how much other stuff was already going in and I could make do without it. Let me know if you feel you need this.[/quote]
I'm not sure what you mean by atomic - is it what I just wrote, but in different words?
What if I used a main shader of the buildings and run a pass to render the depth buffer using the viewprojection matrix from any of the occluded objects used for the shadow map. Then because it will be 1 frame late I could combine the new depth buffer with the depth buffer used to render the shadows but using the sampling offset. The thing I'm unsure about is would I be able to somehow calculate that pixel offset from old and new viewprojection matrix of the light source view.
EVGA GeForce GTX 980 SC
Core i5 2500K
MSI Z77A-G45
8GB DDR3
Windows 10 x64
You might be able to calculate the shadow matrix for the current frame, something like inverse(last_frame_shadow_matrix) * last_frame_view_project * inverse(this_frame_view_project) * last_frame_shadow_matrix
If the shadow matrix is something like clip_to_shadow or screen_to_shadow you will need a last_clip_to_this_clip or similar matrix - but you can most likely calculate that as well from the last and current view projection matrices.
I don't know if that will work, and maybe I've got the order wrong, but it might be worth a shot.
I'm doing vaguely similar things in UE4 at the moment (for stereo, not shadows), and compensating for the differences between the last and current frame is how I fix temporal AA and motion blur (but in my case I have a ClipToPrevClip matrix). I'm not quite ready to announce that just yet, but if it helps this is one way I can calculate a correction matrix for the SVPosition coordinate system (I've already calculated a stereo version of SVPositionToTranslatedWorld from other things, but in your case you would use something like the last and current frame view projection matrices to generate a correction matrix in either world or clip space and go from there):
2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit
Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD
Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword
By saying an offsett I meant to compensate the camera movement between combining the old depth buffer used to draw shadows (with occluded buildings removed, as it’s the most visible artifact) with that 1 frame late depth buffer I render called from the main buildings shader
EVGA GeForce GTX 980 SC
Core i5 2500K
MSI Z77A-G45
8GB DDR3
Windows 10 x64
My problem might be a very simple one. What I did was to remove all the repaired shaders from the fixed folder to start clean hunting. I just want to get hold of shaders to compare. But the 5 shaders I dumped didn't match any in the fixed folder? Some of the textures were blotted out as pixel and vertex shaders. But I dont know why they wont correspond with at least one shader in the fixed folder?
The games are pretty much flawless and looking amazing but there are still some messed up shaders that I thought I could try and learn from while I have some shaders to compare them with. Can someone please tell me whats going on and how I will find some shaders to compare?
Thanks
Also turn on export_hlsl so 3DMigoto will dump all the shaders to ShaderCache (if memory serves it might not dump ones that are also in ShaderFixes - so remove those for that run).
2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit
Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD
Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword
// Generated by Microsoft (R) HLSL Shader Compiler 9.29.952.3111
// Generated by Microsoft (R) HLSL Shader Compiler 9.30.9200.20789
Should there be any differences, is this normal?
What I've done was to look for all the lines that have been modified and those added. This way I could look for consistency in the different shaders more easily.
So the changed code I marked with ">>>>>"
The lines which have been added a simple "added" reference.
I will add the differences I've compiled that Mike had to implement to get the shader in the correct stereo scopic position. This is some crazy stuff!!! Dont know how you guys figure this shit out!?!?!
Please advice if my approach will work and improvise where necessary. Im still not 100% sure if 3dM is doing what you explained DSS. When I made the necessary changes in the ini you suggested I weren't able to dump the shaders like you explained. There were very little visual "movement" and rather than breaking them one actually was fixed. Very weird. Think the one shader that was fixed is actually the one Im adding. I tried to keep it as simple and neat as possible. The code are in brackets with the line number added. Please feel free to ask if my compilation doesn't make sense.
Im in no rush with this. Im making a backup of my save so I can simultaneously continue playing and fixing. The broken shaders are in a place you only probably pass once on this specific mission. =)
None of those lines look like anything we would normally change - I think this is probably the result of 3DMigoto decompiling a fixed shader that it shouldn't have.
Have you gone through Bo3b's shaderhacker school? That will teach you some of the basics, and you can't go on to anything more advanced until you have a good mastery of that.
None of the changed or added lines have anything to do with 3D coordinates - if marking_mode=original is showing this one looks better without the changes then you should not be touching it.
Also, if you haven't already you should update to the latest 3DMigoto DLL - we have fixed a metric tonne of bugs since the original Watch Dogs fix, not to mention all the modern features that have been added since then.
2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit
Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD
Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword
EVGA GeForce GTX 980 SC
Core i5 2500K
MSI Z77A-G45
8GB DDR3
Windows 10 x64
e.g. I access the vertex buffers in Dreamfall Chapters to find the center of floating icons as part of the HUD analysis and adjustment:
https://github.com/DarkStarSword/3d-fixes-DreamfallChapters/blob/master/ShaderFixes/hud_vb_0bd32bb622c2d611.hlsl
https://github.com/DarkStarSword/3d-fixes-DreamfallChapters/blob/master/ShaderFixes/hud_analyse.hlsl
https://github.com/DarkStarSword/3d-fixes-DreamfallChapters/blob/master/ShaderFixes/hud.hlsl
https://github.com/DarkStarSword/3d-fixes-DreamfallChapters/blob/master/ShaderFixes/hud_analyse_clear.hlsl
https://raw.githubusercontent.com/DarkStarSword/3d-fixes-DreamfallChapters/master/d3dx.ini (search for vb0)
If you specifically want to build a table of vertex data from multiple draw calls, take a close look at what analyse_text_shader() is doing (it's recording positions from matrices in a constant buffer, but would be trivial to do the same with vertex buffer data).
Just beware that the vertex buffer layout is similar to, but does not match the inputs to the vertex shader - I suggest using frame analysis and/or my constant buffer debug shader (which works with other buffers, but beware that it treats everything as 32bit floats, which will not be the case for all fields in a vertex buffer) to examine it for yourself to work out the correct layout.
Also beware that it might use certain formats that are not available inside shaders - position is usually just a vector of four regular 32bit little endian floats, so that's easy since it matches HLSL's float4 data type, but if you needed to pull out say a colour stored as an 8 bit integer, you would have to read out the whole 32bits as an unsigned integer and mask and shift to get the bits you need (tip: use unsigned data types or shift before masking so you don't need to worry about sign extended shifting messing things up if the high bit was a 1 - ishr vs ushr instructions).
One huge gotcha that I always forget - if you read the input position from the vertex buffer directly it might have undefined fields, like the position W coordinate is often NAN and if you use that in a calculation you will wind up with all sorts of bizzare behaviour (read the comments in the above example - I still have no idea why the symptoms of this problem *change after an F10 reload* - best guess is something to do with GPU or system memory layout subtly changing on a reload affecting what garbage data was in memory before filling out the vertex data), so if you read the input position ignore W and use a hardcoded W=1 instead.
It might not do the right thing when chopping off the vertex offset - it can't distinguish between vertex and instance buffers, so it would apply StartVertexLocation / BaseVertexLocation as the offset to both (all) vertex & instance buffers and ignore StartInstanceLocation entirely. You might want to check a frame analysis log to make sure the game is using 0 for these in the DrawInstanced / DrawIndexedInstanced calls*, otherwise I'll need to work out the best way to handle it (the flexibility of the IA layout makes this non-trivial... I might be better off ignoring the vertex offset for instanced geometry and leaving it to you to work out how to deal with it in the shader... I always meant to add a keyword to ignore the offset when copying a buffer, though there are actually two offsets 3DMigoto takes into account for vertex buffers, so we'd only want to ignore this one).
If you need it, you can get access to the VertexCountPerInstance / IndexCountPerInstance and InstanceCount parameters via command list assignment (these will be 0 if they aren't applicable to the current draw call):
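Something along these lines in the d3dx.ini, for example - note the operand names here are my assumption based on the parameter names above, so check the template d3dx.ini for your 3DMigoto version for the exact spelling:
[code]
; Hypothetical sketch - the hash and choice of IniParams are placeholders:
[ShaderOverrideInstancedExample]
hash = 0123456789abcdef
x = IndexCountPerInstance
y = InstanceCount
[/code]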
* if the game is using one of the Indirect draw calls, 3DMigoto won't do anything with the offsets and all the counts will be 0... Oh hey, I should use the new 'this' keyword to get access to the indirect buffer via the command list :)
A little background: I originally added the ability to access vertex buffers to 3DMigoto so that we could find the locations of other vertices from any vertex being drawn. This is useful for automatic depth-buffer based adjustments to HUD elements, where we may need to find the center of an icon (or at least one single consistent point) to check the depth buffer from each invocation, so that we adjust all vertices by the same amount and don't distort the icon.
The Dreamfall Chapters example is a more extreme application of the same concept: instead of accessing the vertex buffer from other vertex shader invocations, I'm using a compute shader to gather the positions of multiple HUD and text elements across a frame, to decide on one single point at which to ray trace the depth buffer and determine what depth to draw the entire HUD at in the following frame.
Another example where we might do something similar is solving the classic 2D nameplate/health bar problem: collect the positions and depths of every actor that may have a nameplate, then match each nameplate up to the closest actor - in some cases we only need the matrices, but in others we may need a point from the vertex buffers as well.
In all those cases I'm only interested in a small piece of information (one single position for the entire object) that I can get from the vertex buffer, so that is all I store - I have no need to store the entire vertex buffer, and from your comments I'm concerned that you might.
Depending on what you're trying to do, maybe we should be looking into e.g. hooking up the stream output stage and/or indirect draw calls in 3DMigoto (DrawAuto and assigning resources to the stream output stage are already in, but without being able to configure a geometry/vertex/domain shader to use the stream output it is of limited use). Or maybe we need to start thinking about ways to give you more precise control over things like resource arrays, subresource copying and so on from 3DMigoto, if they could help (some of this may be of limited use until we get more flexibility elsewhere in the command lists, like "x = x + instance_count" and so on). Or maybe we can just find another way to approach the problem.
Or, if you do need something a little too situation-specific, we could think about creating an API for 3DMigoto that lets you write a sort of extension DLL to do what you need, or otherwise add some sort of scripting support to 3DMigoto. If you noticed the recent bike shedding issue I opened in the tracker, you would realise I'm already working on something like this for a UE4-specific extension (delayed because of depression triggers), and I am open to input if other people would also like to use an API in 3DMigoto for whatever.
-----
Anyway, if you keep going down this path, you would store the data you are interested in inside an array in the custom resource - either an array inside the structure as I did in Dreamfall Chapters, or a higher number for array= inside the custom resource definition to turn the whole resource into an array, which may be more suitable if you want to use it *as* a vertex buffer again later (not tested, but it should work, especially once we gain support for indirect draw calls). You would also use a counter separate from the array (either a separate field in the structure as I did, or possibly an entirely separate resource, which would be more appropriate if you turn the whole resource into an array) to know which index you are up to - see the sketch below.
If you are only accessing this from a single shader invocation at a time you don't need to worry about atomically updating the counter, but if you were updating it from multiple shader invocations simultaneously you could use InterlockedAdd() to give you the atomic guarantee you need to do that safely (I don't recall for sure - do we need raw byte access buffer support for InterlockedAdd, or can it be used with other UAVs? It can at least be used with shared memory in compute shaders, which isn't backed by UAVs and needs nothing more from us to support, but that is per thread-group and only has a lifetime of one dispatch call. I have some support for raw buffers in a topic branch - I should merge that into master).
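Here's a minimal sketch of the structure + counter pattern with an atomic update - the structure layout, array size and UAV slot are all assumptions for illustration (and for what it's worth, InterlockedAdd does work on uint members of a RWStructuredBuffer, not just raw buffers):
[code]
// Hypothetical custom resource layout - an array plus a counter field:
struct gathered_data
{
    uint count;
    float4 positions[256];
};
RWStructuredBuffer<gathered_data> gathered : register(u1); // assumed slot

void append_position(float4 pos)
{
    uint index;
    // InterlockedAdd returns the pre-increment value, handing each
    // invocation a unique slot even when many run simultaneously:
    InterlockedAdd(gathered[0].count, 1, index);
    if (index < 256)
        gathered[0].positions[index] = pos;
}
[/code]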
I have made some progress towards support for append/consume/counter structured buffers, which could also be used to provide the atomic guarantees you need to update a counter from multiple shader invocations simultaneously, but it needs some rework of the view cache to make it useful, and I decided that was a little too risky for the last release given how much other stuff was already going in and I could make do without it. Let me know if you feel you need this.
Of course, you should keep in mind that atomic operations work by each GPU thread taking turns to own and update a cache line, so that cache line may have to travel all over the GPU while each thread waits for it - use them sparingly and from as few threads as possible, or performance may suffer (though it may still be better than doing everything from a single shader invocation, depending on exactly what you are doing and other factors).
The game uses 2 types of shadows: prebaked, used only on the ground, and dynamic, used on the objects. There are 2 dynamic shadow maps, which the game fades between depending on the distance from the camera. The dynamic shadows were only visible out to a distance of 200 and were very light, so I've made them more visible by increasing the distance and the shadow mixing in the pixel shader. Then I scaled the xy output of the vertex shader passes that calculate the depth buffer used in the shadow map calculation by 0.5, so the shadow map covers more area.
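That scaling amounts to something like this sketch of a shadow-depth vertex shader - the cbuffer layout here is a made-up stand-in for whatever the game's shaders really use:
[code]
// Hypothetical shadow-depth vertex shader with the projected xy scaled by
// 0.5, so each axis of the shadow map covers twice the area:
cbuffer shadow_transform : register(b0) // assumed layout
{
    matrix light_view_proj;
};

float4 main(float4 pos : POSITION) : SV_Position
{
    float4 o = mul(light_view_proj, pos);
    o.xy *= 0.5; // the modification: squeeze xy to widen the covered area
    return o;
}
[/code]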
The problem is that the game has 2 types of culling hardcoded: the camera view frustum culling (not the orthogonal sun view) and the square area culling, as you can see in the video when I scale the vertex output even further by multiplying xy by 0.125.
Because of that, the objects' shadows disappear even when the objects are still visible from the camera.
I was thinking about 2 routes to draw those missing objects:
1. Store the geometry from the later shaders that draw the textured objects, and use it in an additional pass of the vertex shader that calculates the depth buffer for the shadow map. The objects will be drawn 1 frame late, but because they are static there will be no offset in the depth map.
2. Store the transformation matrices of the orthogonal view and render an additional pass after the textured objects, skipping the pixel shader and using those orthogonal view transform matrices. Then I could mix the result in the next frame with the rendered depth buffer before the shadow map shader is called. That would need some sort of orthogonal camera offset compensation in the custom shader that mixes the depth buffers, as the orthogonal view transform matrices will be 1 frame late (a rough sketch follows below).
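Assuming the sun direction stays fixed, so the two orthographic views differ only by a translation, that compensation could reduce to a 2D offset computed from the last and current view-projection matrices (e.g. the translation part of last_ortho_vp * inverse(this_ortho_vp)). A minimal sketch of the mixing pass under that assumption - every binding and name here is made up:
[code]
Texture2D<float>   this_depth : register(t100); // depth rendered this frame
Texture2D<float>   last_extra : register(t101); // extra objects, 1 frame late
RWTexture2D<float> combined   : register(u0);

cbuffer compensation : register(b13) // assumed layout
{
    float2 ndc_offset;   // translation of last_ortho_vp * inverse(this_ortho_vp)
    float2 texture_size;
};

[numthreads(8, 8, 1)]
void main(uint3 tid : SV_DispatchThreadID)
{
    // Find where this frame's texel sat in last frame's shadow map
    // (NDC y runs opposite to texture v, hence the sign flip):
    float2 uv         = (tid.xy + 0.5) / texture_size;
    float2 last_uv    = uv + float2(ndc_offset.x, -ndc_offset.y) * 0.5;
    uint2  last_texel = uint2(saturate(last_uv) * texture_size);

    // Keep the nearer depth of the two sources (assumes a conventional
    // depth buffer where smaller = closer, and an unchanged depth range):
    combined[tid.xy] = min(this_depth[tid.xy], last_extra[last_texel]);
}
[/code]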
From what you described, I think the most difficult brick wall to crush would be handling the multiple passes of the shaders drawing the buildings.
Making 3DM modular is a very interesting idea. It would surpass the ReShade wrapper by a magnitude of 100.
Forgive me for saying it, but using 3DM solely for fixing stereo effects is a waste of potential.
The custom resources that automatically build up as a stack would be a nice feature. Instead of overriding the data, it would add another element to the queue, and when you consume the data by referencing it, it could increment the pointer to the next/previous element.
I'm not sure what you mean by 'atomic' - is it what I just wrote, but in different words?