Thanks for the help! I'll keep all of that in mind.
Except for the HUD shaders (where I made the depth and tilting formulas), the rest of the shaders only needed the typical stereo correction. I can easily make them use only ASM and then check the performance (edit 2: strange. The main one controlling wave effects looks different: it is incorrect in ASM despite using the same formula).
Edit 3: now that I think about it. Some of my shaders stereoize o0 because they were in 2D (and this gives them even proper depth). Those would be the main suspects of those 2 fps when pressing F9 in hunting mode.
I'll wait for 1.2.66 patiently while I play :).
Can you tell me how to load the xyzw and x1y1z1w1 iniparams in ASM? I think I tried once and didn't figure it out correctly (edit: probably "dcl_resource_texture2d (float,float,float,float) t120" and then "dcl_resource_texture2d (float,float,float,float) t120" for the normal ones, right?). And now that I asked that, has anyone here loaded the "matrix.hlsl" and used it in ASM? (I don't use it for Grim Dawn, but just in case I need it for an ASM shader in the future).
[quote="bo3b"]@mx-2: That's interesting that it does look like d3d11.dll is loaded properly from the game directory, and then dxgi is loaded from system like we'd expect.
Please run it again with full debugging enabled, and we'll take a look:
calls=1
input=1
debug=1
unbuffered=1
[/quote]
I uploaded a log at [url]https://raw.githubusercontent.com/mx-2/log/master/d3d11_log_elex.txt[/url].
Do make sure that those shaders are actually contributing to the fps hit before converting them (i.e. move them out of the way and reload), because it might be a lot of effort for zero gain if they are not. HLSL is not inherently slower than assembly, but in some pathological cases it may be - it just happens that I hit a particularly bad case of that in Dreamfall Chapters.
For IniParams x/y/z/w you want:
[code]dcl_resource_texture1d (float,float,float,float) t120
...
ld_indexable(texture1d)(float,float,float,float) r12.xyzw, l(0, 0, 0, 0), t120.xyzw
[/code]
Note that it is texture1d, not texture2d. If you want to load say x2/y2/z2/w2 you do:
[code]ld_indexable(texture1d)(float,float,float,float) r8.xyzw, l(2, 0, 0, 0), t120.xyzw
[/code]
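As a rough illustration of the indexing rule above, this Python sketch generates the load instruction for a given IniParams slot. The helper name `iniparams_load` is hypothetical (it is not part of 3DMigoto); it only assumes what the post states: IniParams is bound as a texture1d on t120, and slot N is selected by the first component of the literal (x/y/z/w is slot 0, x2/y2/z2/w2 is slot 2).

```python
def iniparams_load(slot, dest_reg):
    """Return the assembly line that loads IniParams slot `slot` into `dest_reg`.

    Hypothetical helper for illustration only. Slot 0 is x/y/z/w, slot 1 is
    x1/y1/z1/w1, and so on, following the description in the post.
    """
    if slot < 0:
        raise ValueError("slot must be non-negative")
    return (
        "ld_indexable(texture1d)(float,float,float,float) "
        f"{dest_reg}.xyzw, l({slot}, 0, 0, 0), t120.xyzw"
    )


# Reproduces the two loads shown in the post.
print(iniparams_load(0, "r12"))
print(iniparams_load(2, "r8"))
```

The shader still needs the single `dcl_resource_texture1d (float,float,float,float) t120` declaration, and the destination register must be covered by `dcl_temps`.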
The recommended way to do a matrix inverse in assembly is to inverse it in a separate custom compute shader, such as this one (which uses 4 threads in a compute shader to do it in parallel):
https://github.com/DarkStarSword/3d-fixes/blob/master/inverse-cs.hlsl
Example:
https://github.com/DarkStarSword/3d-fixes/commit/14f07cbf994f5d9335909d28a70c97ea8039621d
Another example is in my Unity template, which does a little more and optimises it to only do the inverse once per frame (which may or may not be possible for you depending on what you are inversing).
You can do it directly in assembly without that, but if you need a full matrix inverse it can get pretty nasty - this is from fxc compiling a trivial shader that inverses a matrix in v0-v3 into o0-o3. You would need to search and replace all the registers with the right inputs + outputs + unused temps, and add 9 to dcl_temps:
[code]
mul r0.xyzw, v0.zwyw, v3.wzwy
mad r0.xyzw, v0.zwyw, v3.wzwy, -r0.yxwz
mul r1.xyz, r0.yxwy, v1.yxxy
mul r0.xyz, r0.xyzx, v2.yxxy
mul r2.xyzw, v1.wzwy, v3.zwyw
mad r2.xyzw, v3.zwyw, v1.wzwy, -r2.yxwz
mad r1.xyz, v0.yxxy, r2.yxwy, r1.xyzx
mul r2.xyz, r2.xyzx, v2.yxxy
mul r3.xyzw, v0.zwyw, v1.wzwy
mad r3.xyzw, v0.zwyw, v1.wzwy, -r3.yxwz
mad r1.xyz, v3.yxxy, r3.xyzx, r1.xyzx
mov r4.z, r1.x
mul r5.xyz, v2.zwxz, v3.wxzw
mad r5.xyz, v2.wxzw, v3.zwxz, -r5.xyzx
dp3 r5.y, r5.xyzx, v1.xzwx
mul r6.xyz, v2.yzxy, v3.zxyz
mad r6.xyz, v2.zxyz, v3.yzxy, -r6.xyzx
dp3 r5.w, r6.xyzx, v1.xyzx
mul r6.xyz, v2.ywxy, v3.wxyw
mad r6.yzw, -v2.wwxy, v3.yywx, r6.xxyz
mad r0.w, v2.w, v3.y, -r6.x
mad r0.z, v0.x, r0.w, r0.z
dp3 r5.z, r6.yzwy, v1.xywx
mul r6.xyz, v2.wyzw, v3.zwyz
mad r6.xyz, v2.zwyz, v3.wyzw, -r6.xyzx
dp3 r5.x, r6.xyzx, v1.yzwy
dp4 r0.w, r5.xyzw, v0.xyzw
mul r5.xyzw, v0.wzwy, v2.zwyw
mad r5.xyzw, v2.zwyw, v0.wzwy, -r5.yxwz
mul r7.xyz, r5.yxwy, v1.yxxy
mul r8.xyzw, v1.zwyw, v2.wzwy
mad r8.xyzw, v1.zwyw, v2.wzwy, -r8.yxwz
mad r7.xyz, v0.yxxy, r8.yxwy, r7.xyzx
mad r3.xyz, v2.yxxy, r3.yxwy, r7.xyzx
mov r4.w, r3.x
mad r1.x, v1.y, r6.x, r2.x
mad r4.x, v3.y, r8.x, r1.x
mul r7.xyz, v2.zwyz, v3.wyzw
mad r1.xw, v2.wwwz, v3.zzzy, -r7.xxxz
mad r2.x, v2.y, v3.w, -r7.y
mad r2.x, v1.x, r2.x, r2.z
mad r2.y, v1.x, r1.x, r2.y
mad r7.x, v3.x, r8.y, r2.y
mad r2.x, v3.x, r8.z, r2.x
mad r0.x, v0.y, r1.x, r0.x
mad r0.y, v0.x, r6.x, r0.y
mul r1.x, r6.z, v0.x
mad r7.y, v3.x, r5.y, r0.y
mul r0.y, r1.w, v1.x
mad r4.y, v3.y, r5.x, r0.x
mad r2.y, v3.x, r5.z, r0.z
div o0.xyzw, r4.xyzw, r0.wwww
mov r7.z, r1.y
mov r2.z, r1.z
mov r7.w, r3.y
mov r2.w, r3.z
div o2.xyzw, r2.xyzw, r0.wwww
div o1.xyzw, r7.xyzw, r0.wwww
mul r0.xz, v1.yyzy, v3.zzyz
mad r0.xz, v1.yyzy, v3.zzyz, -r0.zzxz
mad r0.x, v2.x, r0.x, r0.y
mul r1.yz, v1.zzyz, v2.yyzy
mad r1.yz, v2.yyzy, v1.zzyz, -r1.zzyz
mad r2.x, v3.x, r1.y, r0.x
mul r0.xy, v0.zyzz, v3.yzyy
mad r0.xy, v3.yzyy, v0.zyzz, -r0.yxyy
mad r0.x, v2.x, r0.x, r1.x
mul r0.y, r0.y, v1.x
mad r0.y, v0.x, r0.z, r0.y
mul r1.xy, v0.yzyy, v2.zyzz
mad r1.xy, v0.yzyy, v2.zyzz, -r1.yxyy
mad r2.y, v3.x, r1.x, r0.x
mul r0.x, r1.y, v1.x
mad r0.x, v0.x, r1.z, r0.x
mul r1.xy, v0.zyzz, v1.yzyy
mad r1.xy, v1.yzyy, v0.zyzz, -r1.yxyy
mad r2.z, v3.x, r1.x, r0.y
mad r2.w, v2.x, r1.y, r0.x
div o3.xyzw, r2.xyzw, r0.wwww
[/code]
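The search and replace step can be scripted. The following is a minimal Python sketch, not an existing tool: `remap_registers` and the register names in the example mapping are hypothetical, chosen only to show the idea. It renames everything in one regex pass so that a register that has already been renamed is never renamed again, which is easy to get wrong with repeated manual search and replace.

```python
import re


def remap_registers(asm, mapping, temp_base):
    """Rename registers in a block of shader assembly in a single pass.

    mapping   -- explicit renames for inputs/outputs, e.g. {"v0": "r20"}
    temp_base -- offset added to every rN temp so the block only touches
                 temps the target shader does not already use
    """
    def rename(match):
        name = match.group(0)
        if name in mapping:
            return mapping[name]
        if name[0] == "r":
            return f"r{int(name[1:]) + temp_base}"
        return name

    # \b stops v1 from matching inside v10; opcodes such as dp3 or mov are
    # not touched because they do not match a bare v/o/r register name.
    return re.sub(r"\b[vor]\d+\b", rename, asm)


snippet = "mul r0.xyzw, v0.zwyw, v3.wzwy\ndiv o0.xyzw, r4.xyzw, r0.wwww"
# Hypothetical mapping: inputs live in r20/r23, the result goes to r30.
print(remap_registers(snippet, {"v0": "r20", "v3": "r23", "o0": "r30"}, 10))
```

The block above uses r0 to r8, which is why 9 has to be added to dcl_temps. If the target shader originally declares N temps, passing `temp_base=N` places the block in the newly added registers.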
I have a hand-crafted optimised version for a euclidean matrix (one that doesn't include a projection matrix, so we can assume the 4th column is 0001), which is a lot shorter. But that's in DX9 assembly, and if I try to convert it now without testing I'm sure to screw it up. It's in my matrix.py if you want to take a look (there's some magic in the pyasm module that can convert it to shader model 3 and substitute the registers for whatever you need, but I haven't updated that for DX11 and I never got around to making it easy to use; the idea was that it would end up in shadertool)... Or you could do the same trick with fxc and just tell it that the 4th column is constant.
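To show why the constant 4th column makes the inverse so much shorter, here is a small Python sketch of the math. It is not the code from matrix.py. It assumes a row-major matrix used with row vectors, so the layout is a 3x3 block A in the upper left and a translation row t at the bottom. Under that assumption only the 3x3 block needs a real inverse, and the new translation row is -t multiplied by inverse(A).

```python
def inverse_affine(m):
    """Invert a 4x4 matrix (list of 4 rows) whose 4th column is (0, 0, 0, 1)."""
    a = [row[:3] for row in m[:3]]   # upper-left 3x3 block
    t = m[3][:3]                     # translation row

    # Cofactor of a[i][j]; cyclic indices build the sign into the formula.
    def cof(i, j):
        i1, i2 = (i + 1) % 3, (i + 2) % 3
        j1, j2 = (j + 1) % 3, (j + 2) % 3
        return a[i1][j1] * a[i2][j2] - a[i1][j2] * a[i2][j1]

    det = sum(a[0][j] * cof(0, j) for j in range(3))
    if det == 0:
        raise ValueError("matrix is singular")

    # The 3x3 inverse is the transposed cofactor matrix divided by det.
    inv = [[cof(j, i) / det for j in range(3)] for i in range(3)]

    # New translation row: -t * inverse(A).
    new_t = [-sum(t[k] * inv[k][j] for k in range(3)) for j in range(3)]

    return [inv[0] + [0.0], inv[1] + [0.0], inv[2] + [0.0], new_t + [1.0]]


example = [[0, 1, 0, 0], [-1, 0, 0, 0], [0, 0, 2, 0], [3, 4, 5, 1]]
for row in inverse_affine(example):
    print(row)
```

The general inverse needs the full set of cofactors of a 4x4 matrix, while this form needs only nine 2x2 determinants, one divide, and a vector by matrix multiply.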
2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit
[quote="mx-2"][quote="bo3b"]@mx-2: That's interesting that it does look like d3d11.dll is loaded properly from the game directory, and then dxgi is loaded from system like we'd expect.
Please run it again with full debugging enabled, and we'll take a look:
calls=1
input=1
debug=1
unbuffered=1
[/quote]
I uploaded a log at [url]https://raw.githubusercontent.com/mx-2/log/master/d3d11_log_elex.txt[/url].[/quote]
OK, thanks for that. The log shows that we are in fact hooking the DXGI properly, and creating the Overlay. But we never get Present calls.
The problem is the game does another one of these incomprehensible things:
[code]HackerUnknown::QueryInterface(class HackerDXGISwapChain@0000000002FFC2D0) called with IID: IDXGISwapChain
returns result = 0 for 0000000003ABD838
[/code]
That's a call to SwapChain asking for ... itself.
We don't presently have any code around the QueryInterface for the HackerSwapChain because we never expect devs to do something like this. Of course, we can be surprised. This effectively unhooks us, because the returned result is what they use to call Present.
For this game, we'll need a 3Dmigoto code change to look for that stupid self->whoami call, and return self.
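The unhooking can be modelled in a few lines of Python. This is only a conceptual sketch, not 3Dmigoto's C++ code: the class names echo the log, but the methods, the `return_self` switch, and the string stand-in for the interface ID are all hypothetical.

```python
IID_IDXGISwapChain = "IDXGISwapChain"  # stand-in for the real interface GUID


class RealSwapChain:
    """Stand-in for the system swap chain object."""

    def query_interface(self, iid):
        return self

    def present(self):
        return "real Present (wrapper bypassed)"


class HackerSwapChain:
    """Simplified model of a wrapper around the real swap chain."""

    def __init__(self, original, return_self):
        self.original = original
        self.return_self = return_self  # True models the proposed fix

    def query_interface(self, iid):
        if self.return_self and iid == IID_IDXGISwapChain:
            # The caller asked the swap chain for itself: hand back the
            # wrapper so later Present calls still go through it.
            return self
        # Plain pass-through: the caller receives the unwrapped object.
        return self.original.query_interface(iid)

    def present(self):
        return "hooked Present (overlay drawn)"


for fixed in (False, True):
    chain = HackerSwapChain(RealSwapChain(), return_self=fixed)
    print(fixed, chain.query_interface(IID_IDXGISwapChain).present())
```

With the pass-through behaviour the game ends up holding the real object, so Present never reaches the wrapper. Returning the wrapper for its own interface keeps the hook in place.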
In the meantime, you can use 1.0.1 to hunt shaders. I know that it has been reported that it's not possible to hunt all shaders, but it is. There isn't any difference in how we handle the active shaders from 1.0.1 to 1.2.65, it's the active list in the current frame.
If you can't find shaders with 1.0.1, you won't be able to with 1.2.65 after we fix it.
Looking at the log, the most likely answer is that the game is using ComputeShaders for those effects. I don't think 1.0.1 had any ability to sift ComputeShaders, but it might be worth looking at the dumped ComputeShaders in ShaderCache.
Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607 Latest 3Dmigoto Release Bo3b's School for ShaderHackers
[quote="DarkStarSword"]Ok, thanks for that. I'm pretty sure it's the [index] buffer hash, which was re-enabled in that release - I was concerned when I saw that had gone in, because the hashes can be expensive, and while most games tend to restrict most texture creation to loading times so we get away with that, they can be a bit more liberal on creating buffers whenever they feel like. The buffer hash is still broken anyway, because it's not hooked up to anything that can make use of it - I'll try to fix that up so it works for 1.2.66 and probably add an option or something to restrict it to specific types of buffers that we actually want, which in 99% of cases will be none.[/quote]
In theory, we should not have had a hit here, because llyzs switched it to use GetPrivateData for any IndexBuffer hashes, which should be inexpensive. This test case seems very solid though, so probably we missed something.
In any case, the goal was to not have them always active; it was to put them into PrivateData for the ones that were used. And the 99% case would be a GetPrivateData that returns null and we skip it.
Edit: I see you updated the GitHub Issue for this already. BTW, for others following along, this was introduced as part of the automatic convergence feature.
[quote="DarkStarSword"]
For IniParams x1/y1/z1/w1 you want:
[code]dcl_resource_texture1d (float,float,float,float) t120
...
ld_indexable(texture1d)(float,float,float,float) r12.xyzw, l(0, 0, 0, 0), t120.xyzw
[/code]
[/quote]Wouldn't that be for x0-w0? I'd expect it to be l(1,0,0,0), but have not looked closely.
[quote="DarkStarSword"]The recommended way to do a matrix inverse in assembly is to inverse it in a separate custom compute shader, such as this one (which uses 4 threads in a compute shader to do it in parallel):
https://github.com/DarkStarSword/3d-fixes/blob/master/inverse-cs.hlsl
Example:
https://github.com/DarkStarSword/3d-fixes/commit/14f07cbf994f5d9335909d28a70c97ea8039621d
Another example is in my Unity template, which does a little more and optimises it to only do the inverse once per frame (which may or may not be possible for you depending on what you are inversing).[/quote]BTW, I'm keeping a full list of Inverse examples on the wiki at:
[url]http://wiki.bo3b.net/index.php?title=Canonical_Stereo_Code[/url]
This includes both ASM and HLSL examples. And the ComputeShader references. Please don't hesitate to add other examples.
When you look at the ASM example you can see it's pretty expensive. You can possibly get away with that in a VertexShader, but you would not want to do that in a PixelShader unless nothing else worked.
[quote="bo3b"]In theory, we should not have had a hit here, because llyzs switched it to use GetPrivateData for any IndexBuffer hashes, which should be inexpensive. This test case seems very solid though, so probably we missed something.[/quote]I don't think the problem will be where we store the hash - more that we didn't use to calculate it at all with hunting disabled, and now we always calculate it whenever any type of buffer is created, regardless of the hunting setting.
Do we actually know what the performance of the PrivateData is? It entirely depends on how DirectX handles that, but the GUID is quite long - too long to be used directly as a key for a hash map lookup, so if that's how they are storing it they have to be hashing that GUID every time we call it, and I'm concerned that could add up if we call it a lot - I don't know, I'd need to measure it. The private data approach has other advantages though, so if performance is good on it I'd like to look into switching a lot of our maps to use it some day.
[quote]Edit: I see you updated the GitHub Issue for this already. BTW, for others following along, this was introduced as part of the automatic convergence feature.[/quote]I don't think it was ever actually hooked up properly - the indexbufferfilter could only ever be combined with the per-draw call convergence/separation overrides, making it rather useless... Anyway, once it's hooked up to the same infrastructure as texture filtering, then it will work with everything that texture hashes do, including presets.
To llyzs's credit, he was looking at the exact same shader override code I looked at when I first joined 3DMigoto, and started looking to improve it the same way I did. His bug report on it shows he was thinking of the same type of improvements that I originally thought about as well. It's just a shame I was out at the time, because I had been working towards deprecating and eventually removing all of that code in favour of the flexibility offered by the command lists.
[quote="bo3b"]Wouldn't that be for x0-w0? I'd expect it to be l(1,0,0,0), but have not looked closely.[/quote]Yep, edited and fixed ;-)
[quote="DarkStarSword"]BTW, I'm keeping a full list of Inverse examples on the wiki at:
[url]http://wiki.bo3b.net/index.php?title=Canonical_Stereo_Code[/url]
This includes both ASM and HLSL examples. And the ComputeShader references. Please don't hesitate to add other examples.
When you look at the ASM example you can see it's pretty expensive. You can possibly get away with that in a VertexShader, but you would not want to do that in a PixelShader unless nothing else worked.[/quote]Nice - I should add my euclidean optimised assembly version to it :)
Edit: Done
3D Migoto version 1.2.57 is the first one that breaks Lords of the Fallen.
By breaking I mean it crashes on startup.
Something in 1.2.57 onward is broken.
Log:
D3D11 DLL starting init - v 1.2.57 - Tue Oct 24 14:11:53 2017
Looking up profiles related to D:\Steam\steamapps\common\Lords Of The Fallen\bin\LordsOfTheFallen.exe
----------- Driver profile settings -----------
BaseProfile "Base Profile"
SelectedGlobalProfile "Base Profile"
----------- End driver profile settings -----------
No profile update required
*** D3D11 DLL successfully initialized. ***
Trying to load original_d3d11.dll
Hooked_LoadLibraryExW switching to original dll: original_d3d11.dll to C:\WINDOWS\system32\d3d11.dll.
Hooked_CreateDXGIFactory1 called with riid: IDXGIFactory1
calling original CreateDXGIFactory1 API
CreateDXGIFactory1 returned factory = 0000000033E63CE0, result = 0
new HackerDXGIFactory1(class HackerDXGIFactory1@000000004C110250) wrapped 0000000033E63CE0
HackerDXGIFactory1::EnumAdapters1(class HackerDXGIFactory1@000000004C110250) adapter 0 requested
created HackerDXGIAdapter1 wrapper = 000000004C110400 of 000000004C0B6520
returns result = 0
HackerDXGIAdapter1::GetDesc1(class HackerDXGIAdapter1@000000004C110400) called
returns adapter: NVIDIA GeForce GTX 980 Ti, sysmem=0, vidmem=6290079744, flags=0
*** D3D11CreateDevice called with
pAdapter = 000000004C110400
Flags = 0
pFeatureLevels = 0xb000
FeatureLevels = 3
ppDevice = 000000000014CB18
pFeatureLevel = 0
ppImmediateContext = 000000004C125B20
return HackerDXGIAdapter1 wrapper = 000000004C110400
HackerDXGIAdapter1::GetDesc1(class HackerDXGIAdapter1@000000004C110400) called
returns adapter: NVIDIA GeForce GTX 980 Ti, sysmem=0, vidmem=6290079744, flags=0
HackerUnknown::Release(class HackerDXGIAdapter1@000000004C110400), counter=1, this=000000004C110400
HackerDXGIObject::GetPrivateData(class HackerDXGIAdapter1@000000004C110400) called with GUID: {1D6AD054-FB2F-4000-B3AB-E873A9131A7C}
returns result = 887a0002
HackerUnknown::AddRef(class HackerDXGIAdapter1@000000004C110400), counter=2, this=000000004C110400
Replaced Hooked_LoadLibraryExW for: C:\WINDOWS\system32\nvapi64.dll to nvapi64.dll.
Replaced Hooked_LoadLibraryExW for: C:\WINDOWS\system32\nvapi64.dll to nvapi64.dll.
Replaced Hooked_LoadLibraryExW for: C:\WINDOWS\system32\nvapi64.dll to nvapi64.dll.
Replaced Hooked_LoadLibraryExW for: C:\WINDOWS\system32\nvapi64.dll to nvapi64.dll.
Hooked_CreateDXGIFactory called with riid: IDXGIFactory
calling original CreateDXGIFactory API
CreateDXGIFactory1 returned factory = 0000000033E64300, result = 0
new HackerDXGIFactory1(class HackerDXGIFactory1@000000004C11ADB0) wrapped 0000000033E64300
HackerDXGIFactory::QueryInterface(class HackerDXGIFactory1@000000004C11ADB0) called with IID: IDXGIFactory2
*** returns E_NOINTERFACE as error for IDXGIFactory2.
HackerUnknown::Release(class HackerDXGIFactory1@000000004C11ADB0), counter=1, this=000000004C11ADB0
Replaced Hooked_LoadLibraryExW for: C:\WINDOWS\system32\nvapi64.dll to nvapi64.dll.
HackerDXGIObject::GetPrivateData(class HackerDXGIAdapter1@000000004C110400) called with GUID: {D722FB4D-7A68-437A-B20C-5804EE2494A6}
returns result = 887a0002
HackerUnknown::Release(class HackerDXGIAdapter1@000000004C110400), counter=4, this=000000004C110400
D3D11CreateDevice returned device handle = 00000000336F8180, context handle = 000000004C449158
HackerDevice 000000004C2E5E20 created to wrap 00000000336F8180
HackerContext 000000004C4B6F60 created to wrap 000000004C449158
HackerDevice::Create3DMigotoResources(class HackerDevice@000000004C2E5E20) call
This is the "out-of-the-box" wrapper.
The last version that works is 1.2.56.
1x Palit RTX 2080Ti Pro Gaming OC(watercooled and overclocked to hell)
3x 3D Vision Ready Asus VG278HE monitors (5760x1080).
Intel i9 9900K (overclocked to 5.3 and watercooled ofc).
Asus Maximus XI Hero Mobo.
16 GB Team Group T-Force Dark Pro DDR4 @ 3600.
Lots of Disks:
- Raid 0 - 256GB Sandisk Extreme SSD.
- Raid 0 - WD Black - 2TB.
- SanDisk SSD PLUS 480 GB.
- Intel 760p 256GB M.2 PCIe NVMe SSD.
Creative Sound Blaster Z.
Windows 10 x64 Pro.
etc
[quote="bo3b"][quote="mx-2"][quote="bo3b"]@mx-2: That's interesting that it does look like d3d11.dll is loaded properly from the game directory, and then dxgi is loaded from system like we'd expect.
Please run it again with full debugging enabled, and we'll take a look:
calls=1
input=1
debug=1
unbuffered=1
[/quote]
I uploaded a log at [url]https://raw.githubusercontent.com/mx-2/log/master/d3d11_log_elex.txt[/url].[/quote]
OK, thanks for that. The log shows that we are in fact hooking the DXGI properly, and creating the Overlay. But we never get Present calls.
The problem is the game does another one of these incomprehensible things:
[code]HackerUnknown::QueryInterface(class HackerDXGISwapChain@0000000002FFC2D0) called with IID: IDXGISwapChain
returns result = 0 for 0000000003ABD838
[/code]
That's a call to SwapChain asking for ... itself.
We don't presently have any code around the QueryInterface for the HackerSwapChain because we never expect devs to do something like this. Of course, we can be surprised. This effectively unhooks us, because the returned result is what they use to call Present.
For this game, we'll need a 3Dmigoto code change to look for that stupid self->whoami call, and return self.
In the meantime, you can use 1.0.1 to hunt shaders. I know that it has been reported that it's not possible to hunt all shaders, but it is. There isn't any difference in how we handle the active shaders from 1.0.1 to 1.2.65, it's the active list in the current frame.
If you can't find shaders with 1.0.1, you won't be able to with 1.2.65 after we fix it.
Looking at the log, the most likely answer is that the game is using ComputeShaders for those effects. I don't think 1.0.1 had any ability to sift ComputeShaders, but it might be worth looking at the dumped ComputeShaders in ShaderCache.[/quote]
Hi Bo3b,
Do we expect a fix for this problem? Indeed, 1.0.1 doesn't support CS, and after dumping the shaders it seems there are quite a few CS. Looking at the VS and PS for the shadows, it doesn't look like we can do anything about it there.
Looking at the CS, it seems most likely that this is where the shadows are calculated. However, without knowing exactly which shader it is, binary chopping to find the right CS for the shadows will take an insane amount of time ;)
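For anyone unfamiliar with the term, binary chopping means disabling half of the candidate shaders at a time and keeping whichever half makes the effect disappear. A minimal Python sketch of the idea follows. The `effect_disappears` callback is hypothetical: in practice it stands for a manual step (skip those shaders, reload, look at the shadows), and the sketch assumes exactly one shader in the list is responsible.

```python
def find_shader(hashes, effect_disappears):
    """Bisect `hashes` to find the single shader responsible for an effect.

    effect_disappears(disabled) is a hypothetical callback that skips the
    given shaders in game and returns True if the effect is gone.
    Returns the shader hash and the number of checks that were needed.
    """
    candidates = list(hashes)
    if not candidates:
        raise ValueError("no shaders to search")
    checks = 0
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        checks += 1
        if effect_disappears(half):
            candidates = half                      # culprit is in this half
        else:
            candidates = candidates[len(half):]    # culprit is in the rest
    return candidates[0], checks


# Simulated run: 64 made-up hashes, with one secretly responsible.
shaders = [f"{i:016x}" for i in range(64)]
culprit = shaders[37]
print(find_shader(shaders, lambda disabled: culprit in disabled))
```

Each check is still a manual round trip in game, which is why this is slow without working hunting, and it gets harder if more than one shader contributes to the effect.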
[quote="Helifax"]Hi Bo3b,
Do we expect a fix for this problem? Indeed, 1.0.1 doesn't support CS. And after dumping the shaders, it seems there are quite a few CS. Looking at the VS and PS for the shadows, it doesn't look like we can do anything about it there.
Looking at the CS, it seems most likely that this is where the shadows are calculated. However, without knowing exactly which shader it is, it will take an insane amount of time binary chopping to find the right CS for the shadows ;)[/quote]
Just pushed up a fix for this lost object problem, although I don't have access to an example to test it.
Also added a fix for the crash on null device, when platform_update=1.
New build will depend upon what DarkStarSword has in progress, I'll let him post the build when he's ready.
Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607
Latest 3Dmigoto Release
Bo3b's School for ShaderHackers
[quote="Helifax"]3D Migoto version 1.2.57 is the first one that breaks Lords of the Fallen.
By breaking I mean it crashes on startup.
Something in 1.2.57 onward is broken.
Log:
<snip>
This is "out-of the box" wrapper.
Last version that it works is 1.2.56[/quote]
Same d3dx.ini and settings? Only code change of new dlls?
This is strange because the difference between 1.2.56 and 1.2.57 is a bunch of changes that start to implement platform_update, and they should have no impact when that is not enabled.
Looks like it crashes during CreateDevice. Maybe create a log with debug=1, unbuffered=1 for more detail.
[quote="bo3b"][quote="Helifax"]3D Migoto version 1.2.57 is the first one that breaks Lords of the Fallen.
By breaking I mean it crashes on startup.
Something in 1.2.57 onward is broken.
Log:
<snip>
This is "out-of the box" wrapper.
Last version that it works is 1.2.56[/quote]
Same d3dx.ini and settings? Only code change of new dlls?
This is strange because the difference between 1.2.56 and 1.2.57 is a bunch of changes that start to implement platform_update, and they should have no impact when that is not enabled.
Looks like it crashes during CreateDevice. Maybe create a log with debug=1, unbuffered=1 for more detail.[/quote]
Yes, same ini, just the dlls were changed (all of them).
I'll try to get an extra log for this ;)
Many thanks for the other changes! Looking forward to testing them out! Many thanks again!
[quote="bo3b"]Just pushed up a fix for this lost object problem, although I don't have access to an example to test it.
Also added a fix for the crash on null device, when platform_update=1.
New build will depend upon what DarkStarSword has in progress, I'll let him post the build when he's ready.[/quote]I just tried it but unfortunately it doesn't work for me because
typeid(this) == "class HackerUnknown * __ptr64" and typeid(*ppvObject) is "void * __ptr64".
Below is a really ugly hack which uses the logging string output. With that hack hunting and overlay works.
[code]
string thisName = type_name(this);
if (thisName.find("class Hacker", 0) == 0)
    thisName.erase(0, 12); // strip leading "class Hacker"
string riidName = NameFromIID(riid);
riidName.erase(0, 1); // strip leading "I"
// If the wrapper's class name matches the requested interface name, return the wrapper itself
if (thisName == riidName)
    *ppvObject = this;
[/code]
Edit: Attachment deleted
[quote="mx-2"]I just tried it but unfortunately it doesn't work for me because
typeid(this) == "class HackerUnknown * __ptr64" and typeid(*ppvObject) is "void * __ptr64".
Below is a really ugly hack which uses the logging string output. With that hack hunting and overlay works. I attach the modified dll to this post.
[code]
string thisName = type_name(this);
if (thisName.find("class Hacker", 0) == 0)
    thisName.erase(0, 12); // strip leading "class Hacker"
string riidName = NameFromIID(riid);
riidName.erase(0, 1); // strip leading "I"
// If the wrapper's class name matches the requested interface name, return the wrapper itself
if (thisName == riidName)
    *ppvObject = this;
[/code]
[/quote]OK, that's interesting. Thanks for taking a look. I pretty much figured that typeid wasn't going to cut it, but that implies that it also has more object weirdness, in that at that point, I can't get what the true object would be. I don't want to know it's a HackerUnknown, I already know that. I need to know if it's say a HackerDXGISwapChain at that moment. Helpful to know that's what happens.
On the plus side, at least I successfully figured out what the original problem was. :->
For this hack-fix, you can just do the assignment, [i]*ppvObject=this[/i]. This is going to be called super rarely. That will fix it for this specific problem. I'll look into a better way to handle this.
[quote="mx-2"]I attach the modified dll to this post.[/quote]Can we maybe give you commit access to 3DMigoto? I'd rather avoid side builds like this one becoming public (giving them to an individual for testing is fine) because we end up with version numbers in the wild that don't match the source code, so I'd rather that if you do a public release like this that you use the proper publish.bat, tag and upload it to github.
For now (unless Bo3b has anything to supersede it?) I'll just apply your code to 3DMigoto (tagging you as the author) and do a proper release - I had wanted to get some things I need for the UE4 extension DLL in for this release, but if I put them in now they will be a rush job and we've got enough important fixes (and ShaderRegex) as it is that I don't want to delay it any longer.
2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit
Except for the HUD shaders (where I made the depth and tilting formulas), the rest of the shaders only needed the typical stereo correction. I can easily make them use only ASM and then check the performance (edit 2: strange. The main one controlling wave effects looks different. It's incorrect in ASM despite using the same formula).
Edit 3: now that I think about it, some of my shaders stereoize o0 because they were in 2D (and this even gives them proper depth). Those would be the main suspects for those 2 fps when pressing F9 in hunting mode.
I'll wait for 1.2.66 patiently while I play :).
Can you tell me how to load the xyzw and x1y1z1w1 IniParams in ASM? I think I tried once and didn't figure it out correctly (edit: probably "dcl_resource_texture2d (float,float,float,float) t120" and then "dcl_resource_texture2d (float,float,float,float) t120" for the normal ones, right?). And now that I've asked that, has anyone here loaded "matrix.hlsl" and used it in ASM? (I don't use it for Grim Dawn, but just in case I need it for an ASM shader in the future).
CPU: Intel Core i7 7700K @ 4.9GHz
Motherboard: Gigabyte Aorus GA-Z270X-Gaming 5
RAM: GSKILL Ripjaws Z 16GB 3866MHz CL18
GPU: MSI GeForce RTX 2080Ti Gaming X Trio
Monitor: Asus PG278QR
Speakers: Logitech Z506
Donations account: masterotakusuko@gmail.com
I uploaded a log at https://raw.githubusercontent.com/mx-2/log/master/d3d11_log_elex.txt.
My 3D fixes with Helixmod for the Risen series on GitHub
Bo3b's School for Shaderhackers - starting point for your first 3D fix
For IniParams x/y/z/w you want:
Note that it is texture1d, not texture2d. If you want to load say x2/y2/z2/w2 you do:
The recommended way to do a matrix inverse in assembly is to invert the matrix in a separate custom compute shader, such as this one (which uses 4 threads in a compute shader to do it in parallel):
https://github.com/DarkStarSword/3d-fixes/blob/master/inverse-cs.hlsl
Example:
https://github.com/DarkStarSword/3d-fixes/commit/14f07cbf994f5d9335909d28a70c97ea8039621d
Another example is in my Unity template, which does a little more and optimises it to only do the inverse once per frame (which may or may not be possible for you depending on what you are inversing).
You can do it directly in assembly without that, but if you need a full matrix inverse it can get pretty nasty - this is from fxc compiling a trivial shader that inverses a matrix in v0-v3 into o0-o3. You would need to search and replace all the registers with the right inputs + outputs + unused temps, and add 9 to dcl_temps:
I have a hand-crafted optimised version for a euclidean matrix (one that doesn't include a projection matrix where we can assume the 4th column is 0001), which is a lot shorter, but that's in DX9 assembly and if I try to convert it now without testing I'm sure to screw it up - it's in my matrix.py if you wanted to take a look (there's some magic in the pyasm module that can convert it to shader model 3 and substitute the registers for whatever you need, but I haven't updated that for DX11 and I never got around to making that easy to use - the idea was that it would end up in shadertool)... Or you could do the same trick with fxc and just tell it that the 4th column is constant.
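To illustrate why the "4th column is 0001" assumption shortens things so much, here is a minimal CPU-side sketch in C++. It is not the hand-crafted DX9 assembly mentioned above and it is not taken from matrix.py; the names Mat4 and InverseAffine are made up for this example, and it assumes the row-vector layout in which the translation sits in the 4th row. Under that assumption only the upper-left 3x3 block needs a real inverse, and the translation row is the negated old translation multiplied by that inverse.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<std::array<double, 4>, 4>;  // hypothetical helper type, m[row][col]

// Inverts a matrix whose 4th column is assumed to be (0,0,0,1).
// Returns false if the upper-left 3x3 block is (numerically) singular.
bool InverseAffine(const Mat4& m, Mat4& out) {
    // Cofactors of the first row of the 3x3 block, also used for the determinant
    const double c00 = m[1][1] * m[2][2] - m[1][2] * m[2][1];
    const double c01 = m[1][2] * m[2][0] - m[1][0] * m[2][2];
    const double c02 = m[1][0] * m[2][1] - m[1][1] * m[2][0];
    const double det = m[0][0] * c00 + m[0][1] * c01 + m[0][2] * c02;
    if (std::fabs(det) < 1e-12) return false;
    const double inv = 1.0 / det;

    // 3x3 inverse = transposed cofactor matrix divided by the determinant
    out[0][0] = c00 * inv;
    out[1][0] = c01 * inv;
    out[2][0] = c02 * inv;
    out[0][1] = (m[0][2] * m[2][1] - m[0][1] * m[2][2]) * inv;
    out[1][1] = (m[0][0] * m[2][2] - m[0][2] * m[2][0]) * inv;
    out[2][1] = (m[0][1] * m[2][0] - m[0][0] * m[2][1]) * inv;
    out[0][2] = (m[0][1] * m[1][2] - m[0][2] * m[1][1]) * inv;
    out[1][2] = (m[0][2] * m[1][0] - m[0][0] * m[1][2]) * inv;
    out[2][2] = (m[0][0] * m[1][1] - m[0][1] * m[1][0]) * inv;

    // New translation row = -(old translation row) * (3x3 inverse)
    for (int j = 0; j < 3; ++j)
        out[3][j] = -(m[3][0] * out[0][j] + m[3][1] * out[1][j] + m[3][2] * out[2][j]);

    // The 4th column stays (0,0,0,1)
    out[0][3] = out[1][3] = out[2][3] = 0.0;
    out[3][3] = 1.0;
    return true;
}
```

For example, a uniform scale of 2 combined with a translation of (1, 2, 3) inverts to a scale of 0.5 with a translation of (-0.5, -1, -1.5).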
2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit
Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD
Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword
In theory, we should not have had a hit here, because Ilys switched it to use GetPrivateData for any IndexBuffer hashes, which should be inexpensive. This test case seems very solid though, so probably we missed something.
In any case, the goal was not to have them always active; it was to put them into PrivateData for the ones that were used. And the 99% case would be a GetPrivateData that returns null, and we skip it.
Edit: I see you updated the GitHub Issue for this already. BTW, for others following along, this was introduced as part of the automatic convergence feature.
BTW, I'm keeping a full list of Inverse examples on the wiki at:
http://wiki.bo3b.net/index.php?title=Canonical_Stereo_Code
This includes both ASM and HLSL examples. And the ComputeShader references. Please don't hesitate to add other examples.
When you look at the ASM example you can see it's pretty expensive. You can possibly get away with that in a VertexShader, but you would not want to do that in a PixelShader unless nothing else worked.
Do we actually know what the performance of the PrivateData is? It entirely depends on how DirectX handles that, but the GUID is quite long - too long to be used directly as a key for a hash map lookup, so if that's how they are storing it they have to be hashing that GUID every time we call it, and I'm concerned that could add up if we call it a lot - I don't know, I'd need to measure it. The private data approach has other advantages though, so if performance is good on it I'd like to look into switching a lot of our maps to use it some day.
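As a purely illustrative sketch of that concern: if private data were kept in a hash map keyed on the 16-byte GUID, every lookup would have to hash all 16 bytes before it could find (or fail to find) the entry. How DirectX really stores private data is not known here, so this is an assumption, and MockGuid, MockGuidHash, MockPrivateDataMap and LookupPrivateData are invented names. The hash used is the standard 64-bit FNV-1a.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>

// Hypothetical 16-byte key standing in for a GUID.
struct MockGuid { unsigned char bytes[16]; };
inline bool operator==(const MockGuid& a, const MockGuid& b) {
    return std::memcmp(a.bytes, b.bytes, sizeof(a.bytes)) == 0;
}

// 64-bit FNV-1a over all 16 bytes; this work is repeated on every lookup.
struct MockGuidHash {
    std::size_t operator()(const MockGuid& g) const {
        std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
        for (unsigned char c : g.bytes) {
            h ^= c;
            h *= 1099511628211ull;                  // FNV prime
        }
        return static_cast<std::size_t>(h);
    }
};

using MockPrivateDataMap = std::unordered_map<MockGuid, std::uint32_t, MockGuidHash>;

// Returns true and fills 'value' if the key is present; the common miss case returns false.
bool LookupPrivateData(const MockPrivateDataMap& store, const MockGuid& key, std::uint32_t& value) {
    auto it = store.find(key);  // hashes the key, then compares candidates
    if (it == store.end()) return false;
    value = it->second;
    return true;
}
```

Whether that per-call cost matters in practice would still have to be measured, as noted above.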
I don't think it was ever actually hooked up properly - the indexbufferfilter could only ever be combined with the per-draw call convergence/separation overrides, making it rather useless... Anyway, once it's hooked up to the same infrastructure as texture filtering, then it will work with everything that texture hashes do, including presets.
To llyzs's credit - he was looking at the exact same shader override code I looked at when I first joined 3DMigoto and started looking to improve it same as I did, and his bug report on it shows he was thinking of the same type of improvements that I originally thought about as well - it's just a shame I was out at the time because I had been working towards deprecating and eventually removing all of that code in favour of the flexibility offered by the command lists.
Yep, edited and fixed ;-)
Nice - I should add my euclidean optimised assembly version to it :)
Edit: Done
My website with my fixes and OpenGL to 3D Vision wrapper:
http://3dsurroundgaming.com
(If you like some of the stuff that I've done and want to donate something, you can do it with PayPal at tavyhome@gmail.com)