Quite some time back when we were fixing Mad Max I wondered:
[quote="DarkStarSword"]One thing I've been meaning to try to work out, is does a matrix exist that we can multiply a regular projection matrix by to get a modified projection matrix with the stereo correction built in (but not change it in any other way)?
If there is, we should be able to multiply any matrix that includes a projection matrix as well as some other matrix (like MVP or VP) by it, or alternatively inverse it then multiply that by the inverse MV/MVP matrix to build the stereo correction right into them as well.
Not sure what the answer is yet - I haven't sat down with a notebook to work out the math. [/quote][size="S"]Original Post: [url]https://forums.geforce.com/default/topic/872839/3d-vision/mad-max-3d-vision/post/4672069/#4672069[/url][/size]
And about an hour later I had worked through that math and came up with this:
[quote]The answer to my above musing is:
[code]
[ 1, 0, 0, 0 ],
[ 0, 1, 0, 0 ],
[ (sep*conv) / (q*near), 0, 1, 0 ],
[ sep - (sep*conv)/near, 0, 0, 1 ]
[/code]
Where q = far/(far-near)
...
It should be possible to invert that and multiply the inverse view-projection matrix by it to add a stereo correction to it, or multiply the screen coordinate by it before multiplying by the inverse view-projection matrix.
Edit: Its inverse turns out to be just as simple:
[code]
[ 1, 0, 0, 0 ],
[ 0, 1, 0, 0 ],
[ -(sep*conv) / (q*near), 0, 1, 0 ],
[ -sep + (sep*conv)/near, 0, 0, 1 ]
[/code]
[/quote]
I added that to my matrix.py playground at the time and verified that the numbers I got out matched the results of the nvidia formula, but the idea was largely shelved since we never really came across a situation where we needed it. It has been sitting in the back of my mind though, because I was sure the theory was solid and that there might be some situations where it could be useful, so this weekend I spent a little time switching a couple of shaders over to it as a proof of concept that it does indeed work. I don't have any screenshots - these games were already fixed, so the before and after screenshots would look identical ;-)
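For the curious, this is roughly what that verification looks like (a minimal numpy sketch, not my actual matrix.py code - it assumes a standard row-vector D3D projection matrix and the usual separation/convergence definitions):
[code]
import numpy as np

near, far, sep, conv = 0.1, 1000.0, 0.06, 1.5
q = far / (far - near)

# Standard D3D-style projection matrix (row vectors, clip = v @ P)
P = np.array([[1.2, 0,   0,       0],
              [0,   1.8, 0,       0],
              [0,   0,   q,       1],
              [0,   0,   -q*near, 0]])

# The stereo injection matrix from above, and its posted inverse
S = np.array([[1,                     0, 0, 0],
              [0,                     1, 0, 0],
              [(sep*conv)/(q*near),   0, 1, 0],
              [sep - (sep*conv)/near, 0, 0, 1]])
S_inv = np.array([[1,                      0, 0, 0],
                  [0,                      1, 0, 0],
                  [-(sep*conv)/(q*near),   0, 1, 0],
                  [-sep + (sep*conv)/near, 0, 0, 1]])

def nvidia_formula(clip):
    # x' = x + separation * (w - convergence), applied in clip space
    x, y, z, w = clip
    return np.array([x + sep*(w - conv), y, z, w])

v = np.array([2.0, -3.0, 7.0, 1.0])   # arbitrary view-space point
assert np.allclose(v @ P @ S, nvidia_formula(v @ P))
assert np.allclose(S @ S_inv, np.eye(4))
[/code]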
I started out by switching a couple of random vertex shaders to use it instead of the driver's stereo correction. At first I used a simple case in MGSVTPP with a pure projection matrix, just to make sure I had the details ironed out before trying anything more complex:
https://github.com/DarkStarSword/3d-fixes/commit/0645695dc8236b4776ab9e0bab16d2e0f96b7fe2
Then I applied it to another random vertex shader in Far Cry Primal, but this one is more interesting because it was modifying a View-Rotation-Projection matrix, not just a pure projection matrix - and the fact that this worked proved the theory that this can be applied to any composite matrix that includes a projection matrix:
https://github.com/DarkStarSword/3d-fixes/commit/923a79255f4721d3ee5eabf5b3a012b880696464
So far I've just duplicated the results of the nvidia driver, and only for forwards projection matrices, which doesn't really show what we could do with this. So, then I took one of the shadow shaders from Far Cry Primal, ripped out my previous shadow fix and replaced it with a variation of this that injects a stereo correction into an *Inverse* View-Rotation-Projection matrix - the maths are a little different here, but not by much. This worked beautifully and the result was identical to editing the shader:
https://github.com/DarkStarSword/3d-fixes/commit/8b7b9c7ba6dc053b35bdbf7df177634299696f62
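The inverse case boils down to the factorisation inv(VP @ S) = inv(S) @ inv(VP) - undo the stereo shift first, then unproject. A toy numpy sketch of the same idea (the view matrix here is just a stand-in for illustration):
[code]
import numpy as np

near, far, sep, conv = 0.1, 1000.0, 0.06, 1.5
q = far / (far - near)
P = np.array([[1.2, 0, 0, 0], [0, 1.8, 0, 0],
              [0, 0, q, 1], [0, 0, -q*near, 0]])
S = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
              [(sep*conv)/(q*near), 0, 1, 0],
              [sep - (sep*conv)/near, 0, 0, 1]])

V = np.eye(4); V[3, 2] = 5.0   # toy view matrix (camera 5 units back)
VP = V @ P                     # composite view-projection
inv_VP = np.linalg.inv(VP)     # what the game hands the shader

# The stereo-corrected forward matrix is VP @ S, so the corrected inverse is:
corrected_inv_VP = np.linalg.inv(S) @ inv_VP
assert np.allclose(corrected_inv_VP, np.linalg.inv(VP @ S))
[/code]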
So, this is where things get interesting - because this technique relies on changing the matrices passed to the shader instead of modifying the code in the shader it could *in some cases* allow us to fix effects in a game without necessarily understanding the code, only identifying relevant matrices in the constant buffers. In fact, with a bit more work in 3DMigoto it should be possible to eliminate the need to edit the shader at all as 3DMigoto could simply replace the constant buffer with a stereo pair containing the modified matrices for each eye.
This isn't a magic bullet - it alone will not allow someone to 100% fix all rendering issues in a game... but it might get us to a starting point with fewer broken effects that need manual fixes. I also have some hopes that it might help in complex shaders that are difficult to understand or that perform a large number of coordinate transformations (I'm thinking screen space reflections, ambient occlusion, etc.) - however, I was not able to make this work with the water reflections (not screen space) in Far Cry Primal, so don't get your hopes up too high.
At the moment this is not optimised for performance, which doesn't matter here since I'm only applying this to a small number of effects, but if we ever applied this globally then running an extra compute shader for every draw call could bite. I'm sure there are a few tricks we could employ though, for one - the injection matrix should be constant for an entire frame (or at least per render target), so we could calculate that once and then just multiply each new MVP matrix by it. And if the game has the model matrices separate from the view-projection then we would only need to do the multiplication once.
At the moment, I'm not entirely sure where this will lead. Maybe this gives us a new tool that we can use to try to fix certain effects, maybe this is the next step on the path towards getting 3DMigoto to be able to fix a large number of effects with minimal effort... or maybe this is just a mathematical curiosity that we never end up using in practice.
The above three commits are all in a topic branch "stereo_injection_matrix_proof_of_concept" on my github. I have another interesting maths discovery (used as part of this, but with broader uses) which I'll go into in just a moment.
This was the stereo injection matrix I had to calculate for this trick:
[quote]
[code]
[ 1, 0, 0, 0 ],
[ 0, 1, 0, 0 ],
[ (sep*conv) / (q*near), 0, 1, 0 ],
[ sep - (sep*conv)/near, 0, 0, 1 ]
[/code]
Where q = far/(far-near)
[/quote]Since this matrix is derived from an inverse projection matrix and a stereo corrected forwards projection matrix it includes a couple of fields from the projection matrix - the near and far clipping planes. If we were working with a pure projection matrix we could probably derive those quite easily with some trivial algebra, but the whole point of this technique is that it will work with composite projection matrices as well, and the game doesn't always hand us these values, so I had to find a way to derive them from any composite projection matrix.
The trick I came up with is actually fairly simple - take a homogeneous coordinate (0, 0, 0, 1) and multiply it by the inverse projection (or inverse view-projection, or inverse model-view-projection, etc) matrix. Then normalise that coordinate (i.e. turn it into a 3D cartesian coordinate somewhere in space) and multiply that by the forwards matrix - the resulting W value will be the near clipping plane. To find the far clipping plane do exactly the same, but start with the homogeneous coordinate (0, 0, 1, 1). This may be of use in other cases where we need to find the near and far clipping planes (e.g. for some variations of the stereo crosshair code, or certain corrections where we need to multiply or divide by the far clipping plane).
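A minimal numpy sketch of the trick (assuming row-vector maths and D3D clip space, where NDC Z=0 is the near plane and Z=1 the far plane):
[code]
import numpy as np

def find_near_far(forward, inverse):
    def probe(clip):
        p = clip @ inverse        # unproject a known clip-space point
        p = p / p[3]              # normalise to a cartesian point in space
        return (p @ forward)[3]   # reproject - the W value is the view depth
    near = probe(np.array([0.0, 0.0, 0.0, 1.0]))  # a point on the near plane
    far  = probe(np.array([0.0, 0.0, 1.0, 1.0]))  # a point on the far plane
    return near, far

# Quick check against a plain projection matrix:
near, far = 0.1, 1000.0
q = far / (far - near)
P = np.array([[1.2, 0, 0, 0], [0, 1.8, 0, 0],
              [0, 0, q, 1], [0, 0, -q*near, 0]])
print(find_near_far(P, np.linalg.inv(P)))  # -> (0.1, 1000.0) +/- float error
[/code]
The same works with a composite view-projection matrix, since the extra view transform cancels out between the inverse and forward multiplications.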
Mind blowing work, as usual. Seriously, your ability to not only understand the maths & concepts behind fixing things, but to then be able to come up with ideas to fully automate them is some next level s***. I'll bet money that at some point down the road, every fix ever made afterward will all be done via your awesome tools with minimal to no effort involved by anyone, just plug and play baby!
Have you ever wanted to view the contents of a constant buffer live while playing the game? I know I sure have... Well, now you can:
[url=https://forums.geforce.com/cmd/default/download-comment-attachment/71133/][img]https://forums.geforce.com/cmd/default/download-comment-attachment/71133/[/img][/url]
Obviously we can already dump these out with frame analysis, and we can often use that to identify various matrices in the constant buffers, but sometimes we might be looking for something that might not be immediately obvious, like a combined MVP matrix that could just look like 16 random numbers. Being able to view it live means that we can quickly work out which numbers change while rotating or moving the camera, suggesting that they may well be a view matrix, or a composite matrix including a view matrix.
This does not require any new support in 3DMigoto - it leverages the custom shader support we already have, just in a new and exciting way. There are several parts to make this work, all of which can be found here:
https://github.com/DarkStarSword/3d-fixes/tree/master/custom_shader_cb_live_view
That's all commented and should be fairly straightforward to drop into any game you are working on. Add a line similar to this to a [ShaderOverride] section to copy or reference the constant buffer of interest:
[code]ResourceDebugCB = ps-cb1[/code]
Then in the present section add this:
[code]run = CustomShaderDebugCB[/code]
To reduce how many entries it displays, simply lower the number on the draw line in the custom shader section:
[code]draw = 4096, 0[/code]
If you wish to change the text colour, simply edit this line in the pixel shader:
[code]static const float3 colour = float3(1, 0.5, 0.25);[/code]
Note that the floating point values this displays may not be as accurate as those dumped out using frame analysis - this is not intended to give you super accurate values, but rather to show how the values change during gameplay, and a bit of floating point error shouldn't hurt there.
For a bit of an overview of how this works - this is entirely decoding the constant buffers on the GPU. It uses three shaders - the vertex shader is not very interesting as it just passes the index into the constant buffer along to the geometry shader. The geometry shader is the real heart of this - it includes routines to convert the floating point values from the constant buffer to text, position each character on screen taking into account the width of each individual character and generate the required triangle strips to pass to the pixel shader, which draws the font to the screen. A single invocation of the geometry shader can only generate a limited amount of text because the buffer size it passes to the pixel shader is quite small, so each invocation only generates the text for the four components of a single offset in the constant buffer, and we simply invoke it for however many entries we want to see.
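To give a feel for the layout step, here's a back-of-the-envelope Python model of what the geometry shader does for each entry - format the floats as text, then advance a cursor by each glyph's individual width to place the character quads (the widths here are made up; the real ones come from the font):
[code]
char_widths = {'.': 4.0, '-': 5.0, 'e': 7.0, '+': 7.0}

def layout(value, x=0.0, y=0.0, scale=1.0):
    quads = []
    for ch in '%.6g' % value:
        w = char_widths.get(ch, 8.0) * scale  # digits default to 8.0
        quads.append((ch, x, y, w))           # one triangle-strip quad per glyph
        x += w
    return quads

print(layout(-3.14159))
[/code]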
Dude, I f***ing love you!
Seriously, I put in a request for this a while back, and was just thinking about this 2 days ago. Christmas comes 4 days early this year!
So, I've just proved that the approach of modifying matrices is viable to fix a very common rendering issue present on almost every surface in ABZU (UE4):
https://github.com/DarkStarSword/3d-fixes/commit/d09e0e3d0d05ab8efeab0ce156101eeec33dff29
That engine has some complications, in particular their projection matrix looks like this:
[code]
w 0 0 0
0 h 0 0
0 0 0 1
0 0 n 0
[/code]
Specifically, the 0 in location 3x3 stuffs up the near & far calculations - that location would normally hold far/(far-near), but it seems this engine can optionally use something called "reverse Z projection", and in that case the above method of finding near & far ends up with a divide by zero and propagates the result "Not a Number" through the rest of the calculations.
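You can see the failure with a couple of lines of numpy - probing the reverse-Z matrix above with (0, 0, 0, 1) comes back with W exactly 0, and normalising that divides by zero:
[code]
import numpy as np

w, h, n = 1.2, 1.8, 0.1   # reverse-Z projection as printed above
P = np.array([[w, 0, 0, 0], [0, h, 0, 0],
              [0, 0, 0, 1], [0, 0, n, 0]])

probe = np.array([0.0, 0.0, 0.0, 1.0]) @ np.linalg.inv(P)
print(probe)             # -> [0. 0. 1. 0.] - W is exactly zero
print(probe / probe[3])  # divide by zero -> inf/NaN poisons everything after
[/code]
(The (0, 0, 1, 1) probe still recovers the near plane, since reverse-Z swaps near and far - it's the far plane, now at infinity, that blows up.)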
At the moment I'm deriving the injection matrix from first principles instead since I do have access to the projection matrix elsewhere in the constant buffer, and that works fine. There's probably some other maths I could use to derive the injection matrix, but this will do for now (I'm certain I can make this run once per frame in this engine, so I'm not worried about a few more calculations here).
The second complication is that the matrix I needed to adjust was "SVPositionToTranslatedWorld", which includes a resolution divide and viewport offset, as opposed to "ClipToTranslatedWorld" which is the inverse view-projection matrix that the injection matrix operates on (these names come from the engine's source code). Fortunately, it is easy to pull out the extra component matrix in SVPositionToTranslatedWorld with a bit more math and inject the stereo correction between it and the inverse projection matrix. We could also generate the extra matrix fairly easily based on the resolution, but for now I'm deriving it in case differences in render scale or viewport offset mess that up.
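In numpy terms (row-vector convention, with toy matrices standing in for the real engine values - only the two matrix names come from the UE4 source, everything else here is illustrative):
[code]
import numpy as np

near, far, sep, conv = 0.1, 1000.0, 0.06, 1.5
q = far / (far - near)
P = np.array([[1.2, 0, 0, 0], [0, 1.8, 0, 0],
              [0, 0, q, 1], [0, 0, -q*near, 0]])
S_inv = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                  [-(sep*conv)/(q*near), 0, 1, 0],
                  [-sep + (sep*conv)/near, 0, 0, 1]])

# Stand-ins: ClipToTranslatedWorld is just an inverse projection here, and
# the extra component models a resolution divide + viewport offset.
ClipToTranslatedWorld = np.linalg.inv(P)
screen_to_clip = np.array([[2/1920, 0, 0, 0], [0, -2/1080, 0, 0],
                           [0, 0, 1, 0], [-1, 1, 0, 1]])
SVPositionToTranslatedWorld = screen_to_clip @ ClipToTranslatedWorld

# Pull the extra component matrix back out, then re-compose with the stereo
# correction sandwiched between it and the inverse projection:
extra = SVPositionToTranslatedWorld @ np.linalg.inv(ClipToTranslatedWorld)
corrected = extra @ S_inv @ ClipToTranslatedWorld
assert np.allclose(extra, screen_to_clip)
[/code]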
Stereoised matrix in the buffer - note that the values displayed in each eye are slightly different:
[url=https://forums.geforce.com/cmd/default/download-comment-attachment/71137/][img]https://forums.geforce.com/cmd/default/download-comment-attachment/71137/[/img][/url]
So far I've only been able to get this to work with buffers in UAV / SRV slots - to use the buffer in a constant buffer slot I need to make a copy of it with different bind flags (a DirectX restriction related to hardware optimisations), but doing that copy loses the values for the left eye. Edit: Seems the problem is that buffers are created stereo based on driver heuristics and ignore the surface creation mode - and I can't influence those heuristics for constant buffers.
Tried working on Batman: The Telltale Series - 3DMigoto causes a crash on load, even using old versions of the tool. Relevant section from the debug log is:
[code]*** D3D11 DLL successfully initialized. ***
Trying to load original_d3d11.dll
Hooked_LoadLibraryExW load: original_d3d11.dll.
Hooked_LoadLibraryExW switching to original dll: original_d3d11.dll to C:\Windows\system32\d3d11.dll.
*** D3D11CreateDevice called with
pAdapter = 0000000000000000
Flags = 0x20
pFeatureLevels = 0xb100
FeatureLevels = 2
ppDevice = 000000B67CBFEAA0
pFeatureLevel = 0
ppImmediateContext = 000000B67CBFEAA8
->Feature level != 11.0: 0xb100, returning E_INVALIDARG
Hooked_LoadLibraryExW load: kernel32.dll.
Hooked_LoadLibraryExW load: C:\Windows\System32\MSCTF.dll.[/code]
Same error with all hooking options and "allow" options, however a couple of interesting notes:
1) With none of the hooking/allow options set (or using any other than what I mention in #2 below) the game shows the error message "DirectX Error - The parameter is incorrect". I checked the Steam forums, and people were also getting that error when running on Win7 without the evil update, and installing the evil update fixed the issue for them. (I'm running on Win 10, and already have the evil update installed on Win 7, so that doesn't apply here)
2) If I set allow_create_device to 1 or 2 I don't get the above message, a .dmp file gets generated in the game's folder, and unless I use an unbuffered log the log section cuts off at:
[code] *** D3D11CreateDevice called with
pAdapter = 0000000000000000
Flags = 0x20
pFeatureLevels = 0xb100
FeatureLevels = 2
ppDevice =
[/code]
Can anything be done? Game is available on shaderhackers account if testing is required.
Hello after a few months' break. Sorry for coming straight in with a question :)
I need to inject a custom vertex/pixel shader before an original pixel shader but after an original vertex shader. Can I use o0 from an original vertex shader as a texture in a custom pixel shader? It doesn't seem to work.
I'm not entirely sure I follow - can you describe a bit more what you are trying to achieve?
Taking your question at face value no, you can't inject a vertex and pixel shader between an existing vertex shader and pixel shader because that's not how the graphics pipeline works:
[code]
input assembler <--- vertex buffers, index buffer
|
v
-- vertex shader + constant buffers,
/ | 1D, 2D and 3D textures,
| v typed buffers,
| hull shader raw buffers or
| | structured buffers
| v ... as inputs to
| tessellator ^ any type of shader.
| | |
| v |
|\ domain shader |
| \ | |
| v v |
| geometry shader ---> stream output
\ |
\ v
> rasterizer <--- depth buffer for early Z check
|
v
pixel shader <--> UAVs (RW textures, RW buffers, append/consume buffers)
|
v
output merger ---> render targets, depth/stencil target
[/code]
You could inject one of the other shader types between the vertex and pixel shader - if you wanted to manipulate the vertices in some way before they go to the pixel shader you could use a geometry shader to do it*, though unless you need to generate new vertices that is probably better done by just editing the vertex shader, or you could** use the stream output stage to pass the vertex information out to wherever you like.
* this uses the custom shader support in 3DMigoto to duplicate a draw call with the new geometry shader, but be aware that the original draw call will still happen, so you need to edit one of the original shaders to conditionally abort based on a value in IniParams that you only set when running the custom shader and clear afterwards.
** "could" in theory, but not in practice - we have untested support to bind buffers to the stream output stage as part of the arbitrary resource copying feature (so0, so1, so2, so3), but the shaders also need to be specially prepared to use it, which we can't do yet.
However, 3DMigoto can be used to duplicate a draw call (except indirect draw calls) replacing only what you need from the original, which might be what you want. I used this in The Witness to duplicate a draw call using the original vertex shader with my own pixel shader and a custom render target that I rendered depth information to. Since the original vertex shader is run with the same inputs as the original draw call it should produce the same result. This is done using a custom shader section but only overriding the pixel shader (and this is the reason the custom shader sections do not automatically unbind anything from the pipeline unless you ask them to). There is a special draw command you use to duplicate the original draw call without needing to know the exact parameters the game used: "draw = from_caller".
o0 from the vertex shader can't be used as a texture, because it is not a texture (you could turn it into one by rendering it to a render target with a pixel shader) - it is some form of per vertex information that the rasterizer interpolates (usually linearly, though you can customise that when you refer to it in the pixel shader input) between the three vertices of a triangle for each pixel. In the pixel shader you define an input with a matching*** semantic (typically a TEXCOORD for arbitrary information) to read the interpolated value for that pixel. If the output you are interested in is a "system value" (e.g. SV_Position) the rules are a little different (too complex to go into here - google directx semantics and read the first hit on MSDN) - in particular, SV_Position in the vertex shader is in clip space coordinates, but in the pixel shader it is in pixels (the rasterizer is the point at which that changes in the rendering pipeline).
*** If you are trying to match a TEXCOORD, be aware that DirectX ignores the texcoord index, so you need to specify *all* texcoords that the vertex shader outputs with the same types and indices, matching the order exactly - you can usually just copy the function prototype from the original pixel shader, as 3DMigoto usually gets it right unless there is some unusual packing.
I think one thing worth mentioning, is that for debug purposes it should be possible to use a variation of my constant buffer debug shader to render per vertex information from the vertex shader to the screen as text. The shaders would need some changes, but I don't see any reason it couldn't work in theory... Is this what you are after?
Yeah, I've figured out it's horse crap after writing this :)
I took the approach from your example for The Witness and it's almost what I want to achieve.
I have a water shader which samples the depth buffer, but the buffer is incomplete - it does not include the water surface. It's somehow exposed in the v0.z register, so I was wondering about combining them into a texture before the water pixel shader is called.
edit: then it's used for screen space reflections.
I will post an example to look at as soon as I figure out what I did wrong with the csPosition transformations :)
Yeah, that sounds like the same problem I hit in The Witness where translucent surfaces did not write to the depth buffer (common in a lot of games actually since it allows for order independent transparency effects), so the stereo crosshair would render behind them, which was not acceptable since some of those surfaces were puzzles.
I did first try merging the depth information into a copy of the depth buffer, but I ran into problems with that (I don't quite recall the specifics - it might have been related to floating point rounding errors when dividing Z by W to match the depth buffer scaling that got amplified when later calculating the linear depth. I don't recall if I tried reading from SV_Depth in the pixel shader as an alternative), so I just wrote the linear depth to a separate texture for better accuracy, then in the destination shader I checked both buffers and used the linear depth for any pixels where it had been written (I cleared the buffer between frames), and calculated it from the Z buffer for other pixels.
[b][center][color="orange"][size="XL"]3DMigoto 1.2.52[/size][/color]
[size="M"][url]https://github.com/bo3b/3Dmigoto/releases[/url][/size]
[/center][/b]
This update includes a *major* overhaul to the assembler, which should now work in many more games. It should be considered a slightly risky update given how much has changed, but is still highly recommended for anyone working with assembly shaders. The assembler still has a tendency to drop instructions it doesn't recognise - I would like to address that and have it explicitly fail instead, but I judged it a little too risky for this release given the potential to break existing fixes (e.g. if there is a typo, incorrect comment character, etc. that is currently being ignored).
[code]
$ git diff --stat 1.2.51 '*.cpp' '*.h'
D3D_Shaders/Assembler.cpp | 1230 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------------------------------------------------
DirectX11/CommandList.cpp | 167 ++++++++++++++++++++++++-----
DirectX11/CommandList.h | 57 +++++++++-
DirectX11/IniHandler.cpp | 18 +++-
HLSLDecompiler/cmd_Decompiler/cmd_Decompiler.cpp | 136 ++++++++++++++++++------
5 files changed, 1029 insertions(+), 579 deletions(-)
[/code]
[size="L"][color="orange"]General:[/color][/size]
[.]Fixed an off by one error in max_copies_per_frame and max_executions_per_frame that allowed them to execute once more than specified [/.]
[.]Custom resource sections now have a mode=stereo / mode=mono / mode=auto option to set the driver creation mode for the resource (note that driver heuristics mean that even forcing this does not guarantee it will do what you ask...). It was already possible to set this for resources created through copying another - the new options are primarily intended for use with resources created by filling out a complete resource description (but they can also be used as an alternative to specifying the mode in the copy operation if the destination is a custom resource)[/.]
[.]Custom resources now have a bind_flags option to override the flags that 3DMigoto automatically selects. This is an advanced option and I do not recommend using it unless you have a good understanding of what these do, but it can be used to influence driver heuristics and may be useful in some rare circumstances (e.g. a buffer created with the render_target bind flag will be stereoised, and a buffer with the unordered_access bind flag will be stereoised if stereo compute shaders are enabled)[/.]
[.]Buffer type custom resources can now be loaded from a file with the filename= option. Note that unlike textures it is necessary to specify the type of buffer (type=Buffer, StructuredBuffer, AppendStructuredBuffer, ConsumeStructuredBuffer, ByteAddressBuffer, RWBuffer, RWStructuredBuffer, RWByteAddressBuffer), and either format= or stride= as these cannot be automatically determined (only one of format or stride is required - stride is usually filled in automatically based on the format). The buffer size will be set to the file size unless it is overridden in the custom resource section.[/.]
[.]The frame analysis log now indicates which type of copy operation was used when copying a resource[/.]
[size="L"][color="orange"]Assembler:[/color][/size]
[.]Tabs may now be used for indentation[/.]
[.]Comments are now stripped before being parsed (fixes an issue where a malformed comment could cause the assembler to enter an infinite loop and consume all memory)[/.]
[.]Add support for all missing interpolation modes in dcl_input_ps_siv and dcl_input_ps instructions (Fixes issues found in WATCH_DOGS2)[/.]
[.]Add support for missing resource types (int, uint, unorm, snorm, double, etc) in all resource declaration and load instructions (Fixes issues in WATCH_DOGS2 and Song of the Deep)[/.]
[.]Add support for MSAA resources with arbitrary number of samples[/.]
[.]Add support for address offset immediate variants of various resource load, sample and gather instructions[/.]
[.]Add support for the vGSInstanceID special purpose register[/.]
[.]Add missing system values to dcl_output_sgv, dcl_output_siv, dcl_input_siv, dcl_input_sgv, dcl_input_ps_sgv and dcl_input_ps_siv instructions[/.]
[.]Add missing global flags enableDoublePrecisionFloatOps, skipOptimization, enableMinimumPrecision, enable11_1DoubleExtensions and enable11_1ShaderExtensions[/.]
[.]Add support for pow2 tessellation mode to hull shaders[/.]
[.]Add support for point outputs from hull shaders[/.]
[.]Add support for point strip outputs from geometry shaders[/.]
[.]Add support for line list with adjacency as an input to geometry shaders[/.]
[.]Add support for instanced geometry shaders[/.]
[.]Add support for writing to SV_IsFrontFace from geometry shaders[/.]
[.]Add missing atomic operations and synchronisation instructions in compute shaders[/.]
[.]Add support for the following missing instructions:
abort, atomic_cmp_store, atomic_xor, bufinfo, dadd, dcl_gsinstances, dcl_output_sgv, ddiv, deq, dfma, dge, dlt, dmax, dmin, dmov, dmovc, dmul, dne, drcp, dtof, dtoi, dtou, emit_then_cut, emit_then_cut_stream, eval_centroid, eval_snapped, firstbit_shi, ftod, gather4_aoffimmi, gather4_c, gather4_c_aoffimmi, gather4_po, gather4_po_c, imm_atomic_imax, imm_atomic_imin, imm_atomic_or, imm_atomic_umax, imm_atomic_umin, imm_atomic_xor, itod, ld_uav_typed, ldms_aoffimmi, msad, nop, resinfo_rcpfloat, sample_b_aoffimmi, sample_b_aoffimmi_indexable, sample_c_aoffimmi_indexable, sample_d_aoffimmi, sample_d_aoffimmi_indexable, sync, sync_g, sync_sat_uglobal, sync_sat_uglobal_g, sync_sat_uglobal_g_t, sync_sat_uglobal_t, sync_sat_ugroup, sync_sat_ugroup_g, sync_sat_ugroup_g_t, sync_sat_ugroup_t, sync_t, sync_uglobal_g, sync_uglobal_g_t, sync_uglobal_t, uaddc, usubb, utod[/.]
Most of these missing features were discovered during a comprehensive audit of the assembler, and most have been verified with test cases. There are still some missing features, but they should be fairly rare - instructions related to function calls are still not supported (fxc aggressively inlines everything - I have never seen it produce a function), nor are debug layer instructions (useless without the DirectX SDK, though potentially quite useful with it) and one instruction and register optionally used by the hull shader join phase is still missing (we have never needed to fix a hull shader and I am not familiar with how the join phase maps to HLSL).
[size="L"][color="orange"]cmd_Decompiler:[/color][/size]
[.]The --validate flag used when disassembling a shader will now show the binary differences between the original and reassembled shader if validation fails.[/.]
[.]Shader validation now uses signature parsing and checks each section of the binary shader separately, to point out whether any problems lie in the assembly text or signature sections[/.]
[.]Fixed an issue with signature parsing for geometry shaders using the SV_PrimitiveID semantic[/.]
[.]Fixed a minor issue in the binary dump in the verbose output from the signature parser[/.]
This update includes a *major* overhaul to the assembler, which should now work in many more games. It should be considered a slightly risky update given how much has changed, but is still highly recommended for anyone working with assembly shaders. The assembler still has a tendancy to drop instructions it doesn't recognise - I would like to address that and have it explicitly fail instead, but I judged it a little too risky for this release given the potential for it to break existing fixes (e.g. if there is a typo, incorrect comment character, etc that is currently being ignored).
Fixed an off by one error in max_copies_per_frame and max_executions_per_frame that allowed them to execute once more than specified
Custom resource sections now have a mode=stereo / mode=mono / mode=auto option to set the driver creation mode for the resource (note that driver heuristics mean that even forcing this does not guarantee it will do what you ask...). It was already possible to set this for resources created though copying another - the new options are primarily intended for use with resources created by filling out a complete resource description (but can also be used as an alternative to specifying the mode in the copy operation if the destination is a custom resource)
Custom resources now have a bind_flags option to override the flags that 3DMigoto automatically selects. This is an advanced option and I do not recommended using it unless you have a good understanding of what these do, but it can be used to influence driver heuristics and may be useful in some rare circumstances (e.g. a buffer created with the render_target bind flag will be stereoised, and a buffer with the unordered_access bind flag will be steroised if stereo compute shaders are enabled)
Buffer type custom resources can now be loaded from a file with the filename= option. Note that unlike textures it is necessary to specify the type of buffer (type=Buffer, StructuredBuffer, AppendStructuredBuffer, ConsumeStructuredBuffer, ByteAddressBuffer, RWBuffer, RWStructuredBuffer, RWByteAddressBuffer), and either format= or stride= as these cannot be automatically determined (only one of format or stride are required - stride is usually filled in automatically based on the format). The buffer size will be set to the file size unless it is overridden in the custom resource section.
The frame analysis log now indicates which type of copy operation was used when copying a resource
Assembler:
Tabs may now be used for indentation
Comments are now stripped before being parsed (fixes an issue where a malformed comment could cause the assembler to enter an infinite loop and consume all memory)
Add support for all missing interpolation modes in dcl_input_ps_siv and dcl_input_ps instructions (Fixes issues found in WATCH_DOGS2)
Add support for missing resource types (int, uint, unorm, snorm, double, etc) in all resource declaration and load instructions (Fixes issues in WATCH_DOGS2 and Song of the Deep)
Add support for MSAA resources with arbitrary number of samples
Add support for address offset immediate variants of various resource load, sample and gather instructions
Add support for the vGSInstanceID special purpose register
Add missing system values to dcl_output_sgv, dcl_output_siv, dcl_input_siv, dcl_input_sgv, dcl_input_ps_sgv and dcl_input_ps_siv instructions
Add missing global flags enableDoublePrecisionFloatOps, skipOptimization, enableMinimumPrecision, enable11_1DoubleExtensions and enable11_1ShaderExtensions
Add support for pow2 tesselation mode to hull shaders
Add support for point outputs from hull shaders
Add support for point strip outputs from geometry shaders
Add support for line list with adjacency as an input to geometry shaders
Add support for instanced geometry shaders
Add support for writing to SV_IsFrontFace from geometry shaders
Add missing atomic operations and synchronisation instructions in compute shaders
Most of these missing features were discovered during a comprehensive audit of the assembler, and most have been verified with test cases. There are still some missing features, but they should be fairly rare - instructions related to function calls are still not supported (fxc aggressively inlines everything - I have never seen it produce a function), nor are debug layer instructions (useless without the DirectX SDK, though potentially quite useful with it) and one instruction and register optionally used by the hull shader join phase is still missing (we have never needed to fix a hull shader and I am not familiar with how the join phase maps to HLSL).
cmd_Decompiler:
The --validate flag used when disassembling a shader will now show the binary differences between the original and reassembled shader if validation fails.
Shader validation now uses signature parsing and checks each section of the binary shader separately, to point out whether any problems lie in the assembly text or signature sections
Fixed an issue with signature parsing for geometry shaders using the SV_PrimitiveID semantic
Fixed a minor issue in the binary dump in the verbose output from the signature parser
I'm trying to draw 2 small debug resources, ResourceDepthBuffer and ResourceLakePositionBuffer. Because ResourceLakePositionBuffer is using Draw = from_caller, only the area of the geometry gets updated and the rest of the resource contains stale information, so I'm using a custom shader to clear it. post run = CustomShaderClearRT does nothing, and when I use run = CustomShaderClearRT the whole back buffer goes black. Do you know by any chance what I'm doing wrong? I thought ResourceLakePositionBuffer = null would work, but it does nothing as well.
[code]
[ResourceDepthBuffer]
[ResourceLakePositionBuffer]
format = R8G8B8A8_UNORM
max_copies_per_frame=1
[ShaderOverride-SkyPS]
Hash = 5fb7805badf885b7
ResourceDepthBuffer = copy oD
[CustomShaderPostprocess]
vs = ShaderFixes\postProcess.vs.hlsl
ps = ShaderFixes\postProcess.ps.hlsl
blend = disable
x1=rt_width
y1=rt_height
ps-t104 = bb
ps-t105 = ResourceDepthBuffer
ps-t106 = ResourceLakePositionBuffer
o0 = bb
draw = 6, 0
[ResourceBackupo0]
[ResourceBackupo1]
[CustomShaderPositionBuffer]
blend = disable
ps = ShaderFixes\position.ps.hlsl
ResourceLakePositionBuffer = copy_desc o0
ResourceBackupo0 = ref o0
o0 = ResourceLakePositionBuffer
Draw = from_caller
post o0 = ResourceBackupo0
[ShaderOverride-665483756892af90-vs]
hash = 665483756892af90
run = CustomShaderPositionBuffer
[CustomShaderClearRT]
blend = disable
vs = ShaderFixes\clear_rt.vs.hlsl
ps = ShaderFixes\clear_rt.ps.hlsl
ResourceBackupo0 = ref o0
o0 = ResourceLakePositionBuffer
Draw = 6, 0
post o0 = ResourceBackupo0
[Present]
ResourceLakePositionBuffer = null
run = CustomShaderClearRT //post run = CustomShaderClearRT
run = CustomShaderPostprocess
[/code]
So, this is where things get interesting - because this technique relies on changing the matrices passed to the shader instead of modifying the code in the shader it could *in some cases* allow us to fix effects in a game without necessarily understanding the code, only identifying relevant matrices in the constant buffers. In fact, with a bit more work in 3DMigoto it should be possible to eliminate the need to edit the shader at all as 3DMigoto could simply replace the constant buffer with a stereo pair containing the modified matrices for each eye.
This isn't a magic bullet - it alone will not allow someone to 100% fix all rendering issues in a game... but it might mean our starting point has fewer broken effects needing manual fixes. I also have some hopes that it might help in complex shaders that are difficult to understand or perform a large number of coordinate transformations (I'm thinking screen space reflections, ambient occlusion, etc... however I was not able to make this work with the water reflections (not screen space) in Far Cry Primal, so don't get your hopes up too high).
At the moment this is not optimised for performance, which doesn't matter here since I'm only applying this to a small number of effects, but if we ever applied this globally then running an extra compute shader for every draw call could bite. I'm sure there are a few tricks we could employ though, for one - the injection matrix should be constant for an entire frame (or at least per render target), so we could calculate that once and then just multiply each new MVP matrix by it. And if the game has the model matrices separate from the view-projection then we would only need to do the multiplication once.
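In shader terms the cached version would boil down to something like this (a sketch only - row-vector mul() convention assumed, and all of the names here are made up):
[code]
// stereo_injection would be derived once per frame (or per render target,
// per eye) from separation, convergence, near and far. Then per draw call
// the correction folds into the game's matrix with a single extra multiply,
// applied in clip space after the projection:
float4 stereo_transform(float4 pos, float4x4 game_mvp, float4x4 stereo_injection)
{
    float4x4 stereo_mvp = mul(game_mvp, stereo_injection);
    // Per vertex, exactly as the game would have used its own matrix:
    return mul(pos, stereo_mvp);
}
[/code]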
At the moment, I'm not entirely sure where this will lead. Maybe this gives us a new tool that we can use to try to fix certain effects, maybe this is the next step on the path towards getting 3DMigoto to be able to fix a large number of effects with minimal effort... or maybe this is just a mathematical curiosity that we never end up using in practice.
The above three commits are all in a topic branch "stereo_injection_matrix_proof_of_concept" on my github. I have another interesting maths discovery (used as part of this, but with broader uses) which I'll go into in just a moment.
Since this matrix is derived from an inverse projection matrix and a stereo corrected forwards projection matrix it includes a couple of fields from the projection matrix - the near and far clipping planes. If we were working with a pure projection matrix we could probably derive those quite easily with some trivial algebra, but the whole point of this technique is that it will work with composite projection matrices as well, and the game doesn't always hand us these values, so I had to find a way to derive them from any composite projection matrix.
The trick I came up with is actually fairly simple - take a homogeneous coordinate (0, 0, 0, 1) and multiply it by the inverse projection (or inverse view-projection, or inverse model-view-projection, etc) matrix. Then normalise that coordinate (i.e. turn it into a 3D cartesian coordinate somewhere in space) and multiply that by the forwards matrix - the resulting W value will be the near clipping plane. To find the far clipping plane do exactly the same, but start with the homogeneous coordinate (0, 0, 1, 1). This may be of use in other cases where we need to find the near and far clipping planes (e.g. for some variations of the stereo crosshair code, or certain corrections where we need to multiply or divide by the far clipping plane).
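In HLSL the trick looks something like this (a sketch, assuming a row-vector mul() convention and that both the composite matrix and its inverse are at hand; the function name is made up):
[code]
float2 derive_near_far(float4x4 fwd, float4x4 inv)
{
    // Unproject a homogeneous point on the near plane (Z=0 in DirectX
    // clip space), normalise it to a cartesian point, then reproject -
    // the resulting W is the view-space depth of the near plane:
    float4 p = mul(float4(0, 0, 0, 1), inv);
    float near = mul(p / p.w, fwd).w;

    // Same again, starting from a point on the far plane (Z=1):
    p = mul(float4(0, 0, 1, 1), inv);
    float far = mul(p / p.w, fwd).w;

    return float2(near, far);
}
[/code]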
Next up: a way to view the contents of constant buffers live on screen while the game is running. Obviously we can already dump these out with frame analysis, and we can often use that to identify various matrices in the constant buffers, but sometimes we might be looking for something that is not immediately obvious, like a combined MVP matrix that could just look like 16 random numbers. Being able to view it live means we can quickly work out which numbers change while rotating or moving the camera, suggesting that they may well be a view matrix, or a composite matrix including a view matrix.
This does not require any new support in 3DMigoto - it leverages the custom shader support we already have, just in a new and exciting way. There are several parts to make this work, all of which can be found here:
https://github.com/DarkStarSword/3d-fixes/tree/master/custom_shader_cb_live_view
That's all commented and should be fairly straightforward to drop into any game you are working on. Add a line to a [ShaderOverride] section to copy or reference the constant buffer of interest, then run the custom shader from the [Present] section. To reduce how many entries it displays, simply lower the vertex count on the draw line in the custom shader section, and if you wish to change the text colour, edit the line in the pixel shader that sets it. The wiring looks something like the sketch below.
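(A sketch only - the hash, slots, paths and section names here are hypothetical; use the ones from the ini snippet in the repository:)
[code]
[ShaderOverrideOfInterest]
hash = <some shader of interest>
; reference the constant buffer we want to watch (example slot):
ResourceDebugCB = ref vs-cb1

[CustomShaderDebugCB]
; placeholder paths - use the shaders from the repository:
vs = ShaderFixes\cb_live_view.vs.hlsl
gs = ShaderFixes\cb_live_view.gs.hlsl
ps = ShaderFixes\cb_live_view.ps.hlsl
; bind the buffer somewhere the debug shaders can read it (example slot):
gs-t113 = ResourceDebugCB
o0 = bb
; one invocation per constant buffer offset displayed - lower to show fewer:
draw = 32, 0

[Present]
run = CustomShaderDebugCB
[/code]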
Note that the floating point values this displays may not be as accurate as those dumped out using frame analysis - it is not intended to give you super accurate values, but rather to show how the values change during gameplay, where a bit of floating point error shouldn't hurt.
For a bit of an overview of how this works - this is entirely decoding the constant buffers on the GPU. It uses three shaders - the vertex shader is not very interesting as it just passes the index into the constant buffer along to the geometry shader. The geometry shader is the real heart of this - it includes routines to convert the floating point values from the constant buffer to text, position each character on screen taking into account the width of each individual character and generate the required triangle strips to pass to the pixel shader, which draws the font to the screen. A single invocation of the geometry shader can only generate a limited amount of text because the buffer size it passes to the pixel shader is quite small, so each invocation only generates the text for the four components of a single offset in the constant buffer, and we simply invoke it for however many entries we want to see.
Seriously, I put in a request for this a while back, and was just thinking about this 2 days ago. Christmas comes 4 days early this year!
https://github.com/DarkStarSword/3d-fixes/commit/d09e0e3d0d05ab8efeab0ce156101eeec33dff29
That engine has some complications; in particular, their projection matrix looks something like this (a sketch of the reversed-Z shape - X and Y stand for the usual FOV scale factors):
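[code]
[ X, 0, 0,    0 ],
[ 0, Y, 0,    0 ],
[ 0, 0, 0,    1 ],
[ 0, 0, near, 0 ]
[/code]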
Specifically, the 0 in location 3x3 stuffs up the near & far calculations - that location would normally hold far/(far-near), but it seems this engine can optionally use something called "reverse Z projection", and in that case the above method of finding near & far will hit a divide by zero and propagate the resulting "Not a Number" through the rest of the calculations.
At the moment I'm deriving the injection matrix from first principles instead since I do have access to the projection matrix elsewhere in the constant buffer, and that works fine. There's probably some other maths I could use to derive the injection matrix, but this will do for now (I'm certain I can make this run once per frame in this engine, so I'm not worried about a few more calculations here).
The second complication is that the matrix I needed to adjust was "SVPositionToTranslatedWorld", which includes a resolution divide and viewport offset, as opposed to "ClipToTranslatedWorld" which is the inverse view-projection matrix that the injection matrix operates on (these names come from the engine's source code). Fortunately, it is easy to pull out the extra component matrix in SVPositionToTranslatedWorld with a bit more math and inject the stereo correction between it and the inverse projection matrix. We could also generate the extra matrix fairly easily based on the resolution, but for now I'm deriving it in case differences in render scale or viewport offset mess that up.
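In matrix terms the decomposition is along these lines (a sketch, assuming row-vector mul() order and that the forward TranslatedWorldToClip matrix is also available in the constant buffer; StereoInjectionInv stands for the inverse-path variant of the injection matrix):
[code]
float4x4 stereo_svpos_to_translated_world(
        float4x4 SVPositionToTranslatedWorld,
        float4x4 TranslatedWorldToClip,
        float4x4 ClipToTranslatedWorld,
        float4x4 StereoInjectionInv)
{
    // SVPositionToTranslatedWorld = ScreenToClip * ClipToTranslatedWorld,
    // so multiplying by the forward matrix on the right isolates ScreenToClip:
    float4x4 ScreenToClip = mul(SVPositionToTranslatedWorld, TranslatedWorldToClip);
    // Re-compose with the stereo correction injected in clip space between
    // the screen-space component and the inverse view-projection:
    return mul(mul(ScreenToClip, StereoInjectionInv), ClipToTranslatedWorld);
}
[/code]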
So far I've only been able to get this to work with buffers in UAV / SRV slots - to use the buffer in a constant buffer slot I need to make a copy of it with different bind flags (a DirectX restriction related to hardware optimisations), but doing that copy loses the values for the left eye. Edit: It seems the problem is that buffers are created stereo based on driver heuristics, ignoring the surface creation mode - and I can't influence those heuristics for constant buffers.
Same error with all hooking options and "allow" options; however, a couple of interesting notes:
1) With none of the hooking/allow options set (or using any other than what I mention in #2 below) the game shows the error message "DirectX Error - The parameter is incorrect". I checked the Steam forums, and people were also getting that error when running on Win7 without the evil update, and installing the evil update fixed the issue for them. (I'm running on Win 10, and already have the evil update installed on Win 7, so that doesn't apply here.)
2) If I set allow_create_device to 1 or 2, I don't get the above message, a .dmp file gets generated in the game's folder, and unless I capture an unbuffered log the log section cuts off at:
Can anything be done? The game is available on the shaderhackers account if testing is required.
I need to inject a custom vertex/pixel shader before an original pixel shader but after an original vertex shader. Can I use o0 from an original vertex shader as a texture in a custom pixel shader? It doesn't seem to work.
Taking your question at face value: no, you can't inject a vertex and pixel shader between an existing vertex shader and pixel shader, because that's not how the graphics pipeline works - the pixel shader consumes the rasterised output of the stages before it, so there is no point in the pipeline where a second vertex shader + pixel shader pair could slot in.
You could inject one of the other shader types between the vertex and pixel shader - if you wanted to manipulate the vertices in some way before they go to the pixel shader you could use a geometry shader to do it*, though unless you need to generate new vertices that is probably better done by just editing the vertex shader, or you could** use the stream output stage to pass the vertex information out to wherever you like.
* this uses the custom shader support in 3DMigoto to duplicate a draw call with the new geometry shader, but be aware that the original draw call will still happen, so you need to edit one of the original shaders to conditionally abort based on a value in IniParams that you only set when running the custom shader and clear afterwards.
** "could" in theory, but not in practice - we have untested support to bind buffers to the stream output stage as part of the arbitrary resource copying feature (so0, so1, so2, so3), but the shaders also need to be specially prepared to use it, which we can't do yet.
However, 3DMigoto can be used to duplicate a draw call (except indirect draw calls) replacing only what you need from the original, which might be what you want. I used this in The Witness to duplicate a draw call using the original vertex shader with my own pixel shader and a custom render target that I rendered depth information to. Since the original vertex shader is run with the same inputs as the original draw call it should produce the same result. This is done using a custom shader section but only overriding the pixel shader (and this is the reason the custom shader sections do not automatically unbind anything from the pipeline unless you ask them to). There is a special draw command you use to duplicate the original draw call without needing to know the exact parameters the game used: "draw = from_caller".
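In ini terms the pattern looks something like this (a sketch - the hash, file and resource names are placeholders, and the structure mirrors the config quoted earlier in the thread):
[code]
[ResourceCustomDepth]
[ResourceRTBackup]

[ShaderOverrideWaterVS]
hash = <original vertex shader hash>
run = CustomShaderDepthPass

[CustomShaderDepthPass]
; Only the pixel shader and render target are overridden - the vertex
; shader, buffers and other state are inherited from the game's draw call:
ps = ShaderFixes\depth_out.ps.hlsl
ResourceRTBackup = ref o0
o0 = ResourceCustomDepth
draw = from_caller
post o0 = ResourceRTBackup
[/code]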
o0 from the vertex shader can't be used as a texture, because it is not a texture (you could turn it into one by rendering it to a render target with a pixel shader) - it is some form of per vertex information that the rasterizer interpolates (usually linearly, though you can customise that when you refer to it in the pixel shader input) between the three vertices of a triangle for each pixel. In the pixel shader you define an input with a matching*** semantic (typically a TEXCOORD for arbitrary information) to read the interpolated value for that pixel. If the output you are interested in is a "system value" (e.g. SV_Position) the rules are a little different (too complex to go into here - google directx semantics and read the first hit on MSDN) - in particular, SV_Position in the vertex shader is in clip space coordinates, but in the pixel shader it is in pixels (the rasterizer is the point at which that changes in the rendering pipeline).
*** If you are trying to match a TEXCOORD, be aware that DirectX ignores the texcoord index, so you need to specify *all* texcoords that the vertex shader outputs, with the same types and indices and in exactly the same order - you can usually just copy the function prototype from the original pixel shader, as 3DMigoto usually gets it right unless there is some unusual packing.
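For example, if the original vertex shader declared the outputs shown in the comment below, the custom pixel shader's input signature must list them all (a sketch with made-up semantics):
[code]
// Suppose the original vertex shader output, per its disassembly:
//   float4 pos : SV_Position; float2 uv : TEXCOORD0; float3 normal : TEXCOORD1;

// The custom pixel shader must declare all of them, same types, same order:
float4 main(float4 pos    : SV_Position,
            float2 uv     : TEXCOORD0,
            float3 normal : TEXCOORD1) : SV_Target
{
    // e.g. visualise the interpolated normal:
    return float4(normal * 0.5 + 0.5, 1);
}
[/code]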
I took the approach from your example for The Witness and it's almost what I want to achieve.
I have a water shader which samples the depth buffer, but the buffer is incomplete - it does not include the water surface. That is somehow exposed in the v0.z register, so I was wondering about combining them into a texture before the water pixel shader is called.
edit: it's then used for screen space reflections.
I will post an example to look at as soon as I figure out what I did wrong with the csPosition transformations :)
I did first try merging the depth information into a copy of the depth buffer, but I ran into problems with that (I don't quite recall the specifics - it might have been related to floating point rounding errors when dividing Z by W to match the depth buffer scaling, which got amplified when later calculating the linear depth; I don't recall if I tried reading from SV_Depth in the pixel shader as an alternative). So instead I wrote the linear depth to a separate texture for better accuracy, then in the destination shader I checked both buffers and used the linear depth for any pixels where it had been written (I cleared the buffer between frames), falling back to calculating it from the Z buffer for the other pixels.
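The fallback logic in the destination shader amounts to something like this (a sketch - the register slots and names are made up, and the linearisation assumes a standard forward projection):
[code]
Texture2D<float> tCustomLinearDepth : register(t100); // example slots
Texture2D<float> tGameDepth         : register(t101);

float linear_depth(uint2 pos, float near, float far)
{
    // Prefer the injected linear depth anywhere the custom pass wrote it
    // (the texture is cleared to zero between frames):
    float d = tCustomLinearDepth.Load(int3(pos, 0));
    if (d != 0)
        return d;
    // Otherwise linearise the game's Z buffer value:
    float z = tGameDepth.Load(int3(pos, 0));
    return near * far / (far - z * (far - near));
}
[/code]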
https://github.com/bo3b/3Dmigoto/releases
This update includes a *major* overhaul to the assembler, which should now work in many more games. It should be considered a slightly risky update given how much has changed, but is still highly recommended for anyone working with assembly shaders. The assembler still has a tendency to drop instructions it doesn't recognise - I would like to address that and have it explicitly fail instead, but I judged it a little too risky for this release given the potential for it to break existing fixes (e.g. if there is a typo, incorrect comment character, etc that is currently being ignored).
Assembler:
Newly supported instructions include: abort, atomic_cmp_store, atomic_xor, bufinfo, dadd, dcl_gsinstances, dcl_output_sgv, ddiv, deq, dfma, dge, dlt, dmax, dmin, dmov, dmovc, dmul, dne, drcp, dtof, dtoi, dtou, emit_then_cut, emit_then_cut_stream, eval_centroid, eval_snapped, firstbit_shi, ftod, gather4_aoffimmi, gather4_c, gather4_c_aoffimmi, gather4_po, gather4_po_c, imm_atomic_imax, imm_atomic_imin, imm_atomic_or, imm_atomic_umax, imm_atomic_umin, imm_atomic_xor, itod, ld_uav_typed, ldms_aoffimmi, msad, nop, resinfo_rcpfloat, sample_b_aoffimmi, sample_b_aoffimmi_indexable, sample_c_aoffimmi_indexable, sample_d_aoffimmi, sample_d_aoffimmi_indexable, sync, sync_g, sync_sat_uglobal, sync_sat_uglobal_g, sync_sat_uglobal_g_t, sync_sat_uglobal_t, sync_sat_ugroup, sync_sat_ugroup_g, sync_sat_ugroup_g_t, sync_sat_ugroup_t, sync_t, sync_uglobal_g, sync_uglobal_g_t, sync_uglobal_t, uaddc, usubb, utod