3Dmigoto now open-source...
Thanks for the help! I'll keep all of that in mind.

Except for the HUD shaders (where I made the depth and tilting formulas), the rest of the shaders only needed the typical stereo correction. I can make them use only ASM easily, and then check the performance (edit 2: strange. The main one controlling wave effects looks different. It's incorrect in ASM despite the same formula).

Edit 3: now that I think about it. Some of my shaders stereoize o0 because they were in 2D (and this gives them even proper depth). Those would be the main suspects of those 2 fps when pressing F9 in hunting mode.

I'll wait for 1.2.66 patiently while I play :).

Can you tell me how to load the xyzw and x1y1z1w1 iniparams in ASM? I think I tried once and didn't figure it out correctly (edit: probably "dcl_resource_texture2d (float,float,float,float) t120" and then "dcl_resource_texture2d (float,float,float,float) t120" for the normal ones, right?). And now that I've asked that, has anyone here loaded "matrix.hlsl" and used it in ASM? (I don't use it for Grim Dawn, but just in case I need it for an ASM shader in the future).

CPU: Intel Core i7 7700K @ 4.9GHz
Motherboard: Gigabyte Aorus GA-Z270X-Gaming 5
RAM: GSKILL Ripjaws Z 16GB 3866MHz CL18
GPU: MSI GeForce RTX 2080Ti Gaming X Trio
Monitor: Asus PG278QR
Speakers: Logitech Z506
Donations account: masterotakusuko@gmail.com

Posted 10/22/2017 02:56 PM   
bo3b said:@mx-2: That's interesting that it does look like d3d11.dll is loaded properly from the game directory, and then dxgi is loaded from system like we'd expect.

Please run it again with full debugging enabled, and we'll take a look:
calls=1
input=1
debug=1
unbuffered=1

I uploaded a log at https://raw.githubusercontent.com/mx-2/log/master/d3d11_log_elex.txt.
Do make sure that those shaders are actually contributing to the fps hit before converting them (i.e. move them out of the way and reload), because it might be a lot of effort for zero gain if they are not. HLSL is not inherently slower than assembly, but in some pathological cases it may be - it just happens that I hit a particularly bad case of that in Dreamfall Chapters.



For IniParams x/y/z/w you want:

dcl_resource_texture1d (float,float,float,float) t120

...

ld_indexable(texture1d)(float,float,float,float) r12.xyzw, l(0, 0, 0, 0), t120.xyzw

Note that it is texture1d, not texture2d. If you want to load say x2/y2/z2/w2 you do:

ld_indexable(texture1d)(float,float,float,float) r8.xyzw, l(2, 0, 0, 0), t120.xyzw
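To make the indexing explicit: the first component of the l(...) address picks which IniParams entry you get (0 for x/y/z/w, 1 for x1/y1/z1/w1, 2 for x2/y2/z2/w2, and so on). Here is a small Python sketch that just builds the two assembly lines as text. The helper names `iniparams_decl` and `iniparams_load` are made up for illustration and are not part of 3Dmigoto or shadertool; the default register and the t120 slot are simply the ones used above.

```python
def iniparams_decl(slot=120):
    """Declaration line for the IniParams texture (texture1d, not texture2d)."""
    return f"dcl_resource_texture1d (float,float,float,float) t{slot}"


def iniparams_load(index, dest="r12", slot=120):
    """Build the ld_indexable line that reads IniParams entry `index`.

    index 0 -> x/y/z/w, index 1 -> x1/y1/z1/w1, index 2 -> x2/y2/z2/w2, ...
    `dest` is whichever temp register is free in the shader (hypothetical default).
    """
    if index < 0:
        raise ValueError("IniParams index must be non-negative")
    return (
        "ld_indexable(texture1d)(float,float,float,float) "
        f"{dest}.xyzw, l({index}, 0, 0, 0), t{slot}.xyzw"
    )


if __name__ == "__main__":
    print(iniparams_decl())
    print(iniparams_load(0))             # x/y/z/w
    print(iniparams_load(1, dest="r9"))  # x1/y1/z1/w1
    print(iniparams_load(2, dest="r8"))  # x2/y2/z2/w2
```

Whatever register you pick for `dest` still has to be covered by dcl_temps in the shader.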



The recommended way to do a matrix inverse in assembly is to compute it in a separate custom compute shader, such as this one (which uses 4 threads in a compute shader to do it in parallel):
https://github.com/DarkStarSword/3d-fixes/blob/master/inverse-cs.hlsl

Example:
https://github.com/DarkStarSword/3d-fixes/commit/14f07cbf994f5d9335909d28a70c97ea8039621d

Another example is in my Unity template, which does a little more and optimises it to only do the inverse once per frame (which may or may not be possible for you depending on what you are inverting).



You can do it directly in assembly without that, but if you need a full matrix inverse it can get pretty nasty - this is from fxc compiling a trivial shader that inverts a matrix in v0-v3 into o0-o3. You would need to search and replace all the registers with the right inputs + outputs + unused temps, and add 9 to dcl_temps:
mul r0.xyzw, v0.zwyw, v3.wzwy
mad r0.xyzw, v0.zwyw, v3.wzwy, -r0.yxwz
mul r1.xyz, r0.yxwy, v1.yxxy
mul r0.xyz, r0.xyzx, v2.yxxy
mul r2.xyzw, v1.wzwy, v3.zwyw
mad r2.xyzw, v3.zwyw, v1.wzwy, -r2.yxwz
mad r1.xyz, v0.yxxy, r2.yxwy, r1.xyzx
mul r2.xyz, r2.xyzx, v2.yxxy
mul r3.xyzw, v0.zwyw, v1.wzwy
mad r3.xyzw, v0.zwyw, v1.wzwy, -r3.yxwz
mad r1.xyz, v3.yxxy, r3.xyzx, r1.xyzx
mov r4.z, r1.x
mul r5.xyz, v2.zwxz, v3.wxzw
mad r5.xyz, v2.wxzw, v3.zwxz, -r5.xyzx
dp3 r5.y, r5.xyzx, v1.xzwx
mul r6.xyz, v2.yzxy, v3.zxyz
mad r6.xyz, v2.zxyz, v3.yzxy, -r6.xyzx
dp3 r5.w, r6.xyzx, v1.xyzx
mul r6.xyz, v2.ywxy, v3.wxyw
mad r6.yzw, -v2.wwxy, v3.yywx, r6.xxyz
mad r0.w, v2.w, v3.y, -r6.x
mad r0.z, v0.x, r0.w, r0.z
dp3 r5.z, r6.yzwy, v1.xywx
mul r6.xyz, v2.wyzw, v3.zwyz
mad r6.xyz, v2.zwyz, v3.wyzw, -r6.xyzx
dp3 r5.x, r6.xyzx, v1.yzwy
dp4 r0.w, r5.xyzw, v0.xyzw
mul r5.xyzw, v0.wzwy, v2.zwyw
mad r5.xyzw, v2.zwyw, v0.wzwy, -r5.yxwz
mul r7.xyz, r5.yxwy, v1.yxxy
mul r8.xyzw, v1.zwyw, v2.wzwy
mad r8.xyzw, v1.zwyw, v2.wzwy, -r8.yxwz
mad r7.xyz, v0.yxxy, r8.yxwy, r7.xyzx
mad r3.xyz, v2.yxxy, r3.yxwy, r7.xyzx
mov r4.w, r3.x
mad r1.x, v1.y, r6.x, r2.x
mad r4.x, v3.y, r8.x, r1.x
mul r7.xyz, v2.zwyz, v3.wyzw
mad r1.xw, v2.wwwz, v3.zzzy, -r7.xxxz
mad r2.x, v2.y, v3.w, -r7.y
mad r2.x, v1.x, r2.x, r2.z
mad r2.y, v1.x, r1.x, r2.y
mad r7.x, v3.x, r8.y, r2.y
mad r2.x, v3.x, r8.z, r2.x
mad r0.x, v0.y, r1.x, r0.x
mad r0.y, v0.x, r6.x, r0.y
mul r1.x, r6.z, v0.x
mad r7.y, v3.x, r5.y, r0.y
mul r0.y, r1.w, v1.x
mad r4.y, v3.y, r5.x, r0.x
mad r2.y, v3.x, r5.z, r0.z
div o0.xyzw, r4.xyzw, r0.wwww
mov r7.z, r1.y
mov r2.z, r1.z
mov r7.w, r3.y
mov r2.w, r3.z
div o2.xyzw, r2.xyzw, r0.wwww
div o1.xyzw, r7.xyzw, r0.wwww
mul r0.xz, v1.yyzy, v3.zzyz
mad r0.xz, v1.yyzy, v3.zzyz, -r0.zzxz
mad r0.x, v2.x, r0.x, r0.y
mul r1.yz, v1.zzyz, v2.yyzy
mad r1.yz, v2.yyzy, v1.zzyz, -r1.zzyz
mad r2.x, v3.x, r1.y, r0.x
mul r0.xy, v0.zyzz, v3.yzyy
mad r0.xy, v3.yzyy, v0.zyzz, -r0.yxyy
mad r0.x, v2.x, r0.x, r1.x
mul r0.y, r0.y, v1.x
mad r0.y, v0.x, r0.z, r0.y
mul r1.xy, v0.yzyy, v2.zyzz
mad r1.xy, v0.yzyy, v2.zyzz, -r1.yxyy
mad r2.y, v3.x, r1.x, r0.x
mul r0.x, r1.y, v1.x
mad r0.x, v0.x, r1.z, r0.x
mul r1.xy, v0.zyzz, v1.yzyy
mad r1.xy, v1.yzyy, v0.zyzz, -r1.yxyy
mad r2.z, v3.x, r1.x, r0.y
mad r2.w, v2.x, r1.y, r0.x
div o3.xyzw, r2.xyzw, r0.wwww
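
If you do port a listing like that by hand, it helps to have known-good numbers to compare against after the register search and replace. Below is a minimal reference inverse in plain Python, assuming the usual adjugate-over-determinant approach (the final div instructions by r0.wwww suggest the fxc output does something similar, but this is not a transcription of it). The function names are my own, and the matrix is a list of rows.

```python
def minor(m, row, col):
    """Return m with the given row and column removed."""
    return [[v for c, v in enumerate(r) if c != col]
            for i, r in enumerate(m) if i != row]


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1  # determinant of the empty matrix
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** c * m[0][c] * det(minor(m, 0, c))
               for c in range(len(m)))


def inverse(m):
    """Inverse of a square matrix as adjugate / determinant."""
    n = len(m)
    d = det(m)
    if d == 0:
        raise ZeroDivisionError("matrix is singular")
    # Entry (r, c) of the inverse is the (c, r) cofactor divided by det.
    return [[(-1) ** (r + c) * det(minor(m, c, r)) / d for c in range(n)]
            for r in range(n)]


def matmul(a, b):
    """Plain matrix product, used here only to check the result."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


if __name__ == "__main__":
    m = [[2.0, 0.0, 0.0, 0.0],
         [0.0, 4.0, 0.0, 0.0],
         [0.0, 0.0, 5.0, 0.0],
         [1.0, 2.0, 3.0, 1.0]]
    for row in matmul(m, inverse(m)):
        print(row)  # should be close to the identity matrix
```

Feed the same test matrix to the shader (for example through constants you control) and compare the outputs with what this prints.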

I have a hand-crafted optimised version for a Euclidean matrix (one that doesn't include a projection matrix, so we can assume the 4th column is 0001), which is a lot shorter. It's in DX9 assembly, though, and if I try to convert it now without testing I'm sure to screw it up. It's in my matrix.py if you want to take a look. There's some magic in the pyasm module that can convert it to shader model 3 and substitute the registers for whatever you need, but I haven't updated that for DX11 and I never got around to making it easy to use (the idea was that it would end up in shadertool). Or you could do the same trick with fxc and just tell it that the 4th column is constant.
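
To show why the constant 4th column makes things so much shorter, here is a rough Python sketch of the maths. It is not the code from matrix.py and not a port of the DX9 assembly. It assumes the matrix is stored as a list of rows with the translation in the 4th row, so that the 4th column is (0, 0, 0, 1), and `inverse_affine` is just a name I picked. With the upper-left 3x3 block A and translation row t, the inverse is A^-1 in the upper-left block and -t * A^-1 in the 4th row.

```python
def cross(a, b):
    """3D cross product."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def inverse_affine(m):
    """Invert a 4x4 matrix (list of rows) whose 4th column is (0, 0, 0, 1)."""
    if [row[3] for row in m] != [0, 0, 0, 1]:
        raise ValueError("4th column must be (0, 0, 0, 1)")
    a0, a1, a2 = (row[:3] for row in m[:3])
    t = m[3][:3]
    # The columns of the inverse 3x3 block are cross products of the rows,
    # divided by the 3x3 determinant.
    c0, c1, c2 = cross(a1, a2), cross(a2, a0), cross(a0, a1)
    d = dot(a0, c0)
    if d == 0:
        raise ZeroDivisionError("3x3 block is singular")
    inv3 = [[c0[i] / d, c1[i] / d, c2[i] / d] for i in range(3)]
    # New translation row is -t multiplied by the inverted 3x3 block.
    new_t = [-sum(t[k] * inv3[k][j] for k in range(3)) for j in range(3)]
    return [inv3[0] + [0.0], inv3[1] + [0.0], inv3[2] + [0.0], new_t + [1.0]]


if __name__ == "__main__":
    m = [[0.0, 1.0, 0.0, 0.0],
         [-1.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 2.0, 0.0],
         [3.0, 4.0, 5.0, 1.0]]
    for row in inverse_affine(m):
        print(row)
```

That is three cross products, one dot product and a small vector-matrix multiply instead of a full 4x4 cofactor expansion. If your game stores the matrix transposed (4th row constant instead of the 4th column), transpose before and after.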

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 10/22/2017 04:15 PM   
mx-2 said:
bo3b said:@mx-2: That's interesting that it does look like d3d11.dll is loaded properly from the game directory, and then dxgi is loaded from system like we'd expect.

Please run it again with full debugging enabled, and we'll take a look:
calls=1
input=1
debug=1
unbuffered=1

I uploaded a log at https://raw.githubusercontent.com/mx-2/log/master/d3d11_log_elex.txt.

OK, thanks for that. The log shows that we are in fact hooking the DXGI properly, and creating the Overlay. But we never get Present calls.


The problem is the game does another one of these incomprehensible things:

HackerUnknown::QueryInterface(class HackerDXGISwapChain@0000000002FFC2D0) called with IID: IDXGISwapChain
returns result = 0 for 0000000003ABD838

That's a call to SwapChain asking for ... itself.

We don't presently have any code around the QueryInterface for the HackerSwapChain because we never expect devs to do something like this. Of course, we can be surprised. This effectively unhooks us, because the returned result is what they use to call Present.

For this game, we'll need a 3Dmigoto code change to look for that stupid self->whoami call, and return self.
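
For anyone following along who wants to see why this unhooks us, here is a toy model in Python. It is not 3Dmigoto code and involves no real COM: `RealSwapChain`, `WrappedSwapChain`, the `return_self` switch and the string IID are all stand-ins invented for the illustration. The point is only that if the wrapper forwards QueryInterface to the original object, the caller gets the unwrapped swap chain back and its Present calls bypass the wrapper.

```python
class RealSwapChain:
    """Stand-in for the game's original swap chain."""

    def query_interface(self, iid):
        return self if iid == "IDXGISwapChain" else None

    def present(self):
        return "presented"


class WrappedSwapChain:
    """Stand-in for a wrapper that wants to see every Present call."""

    def __init__(self, original, return_self):
        self.original = original
        self.return_self = return_self  # hypothetical switch for the demo
        self.present_calls = 0

    def query_interface(self, iid):
        # The "self->whoami" case: asked for our own interface, hand back
        # the wrapper instead of the object we wrap.
        if self.return_self and iid == "IDXGISwapChain":
            return self
        return self.original.query_interface(iid)

    def present(self):
        self.present_calls += 1  # this is where an overlay would draw
        return self.original.present()


def run_game(swap_chain):
    """Mimic the game: ask the swap chain for itself, then present with that."""
    queried = swap_chain.query_interface("IDXGISwapChain")
    queried.present()


if __name__ == "__main__":
    unhooked = WrappedSwapChain(RealSwapChain(), return_self=False)
    run_game(unhooked)
    print("forwarding QueryInterface:", unhooked.present_calls, "hooked Present calls")

    hooked = WrappedSwapChain(RealSwapChain(), return_self=True)
    run_game(hooked)
    print("returning self:", hooked.present_calls, "hooked Present calls")
```

A real fix also has to handle reference counting and the related interface IDs, which this model ignores.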


In the meantime, you can use 1.0.1 to hunt shaders. I know that it has been reported that it's not possible to hunt all shaders, but it is. There isn't any difference in how we handle the active shaders from 1.0.1 to 1.2.65: in both, it's the active list in the current frame.

If you can't find shaders with 1.0.1, you won't be able to with 1.2.65 after we fix it.


Looking at the log, the most likely answer is that the game is using ComputeShaders for those effects. I don't think 1.0.1 had any ability to sift ComputeShaders, but it might be worth looking at the dumped ComputeShaders in ShaderCache.

Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607
Latest 3Dmigoto Release
Bo3b's School for ShaderHackers

Posted 10/23/2017 12:47 AM   
DarkStarSword said:Ok, thanks for that. I'm pretty sure it's the [index] buffer hash, which was re-enabled in that release - I was concerned when I saw that had gone in, because the hashes can be expensive, and while most games tend to restrict most texture creation to loading times so we get away with that, they can be a bit more liberal on creating buffers whenever they feel like. The buffer hash is still broken anyway, because it's not hooked up to anything that can make use of it - I'll try to fix that up so it works for 1.2.66 and probably add an option or something to restrict it to specific types of buffers that we actually want, which in 99% of cases will be none.

In theory, we should not have had a hit here, because llyzs switched it to use GetPrivateData for any IndexBuffer hashes, which should be inexpensive. This test case seems very solid though, so probably we missed something.

In any case, the goal was not to have them always active; it was to put them into PrivateData only for the ones that were used. The 99% case would be a GetPrivateData that returns null, and we skip it.

Edit: I see you updated the GitHub Issue for this already. BTW, for others following along, this was introduced as part of the automatic convergence feature.


Posted 10/23/2017 12:52 AM   
DarkStarSword said:
For IniParams x1/y1/z1/w1 you want:

dcl_resource_texture1d (float,float,float,float) t120

...

ld_indexable(texture1d)(float,float,float,float) r12.xyzw, l(0, 0, 0, 0), t120.xyzw

Wouldn't that be for x0-w0? I'd expect it to be l(1,0,0,0), but have not looked closely.


DarkStarSword said:The recommended way to do a matrix inverse in assembly is to inverse it in a separate custom compute shader, such as this one (which uses 4 threads in a compute shader to do it in parallel):
https://github.com/DarkStarSword/3d-fixes/blob/master/inverse-cs.hlsl

Example:
https://github.com/DarkStarSword/3d-fixes/commit/14f07cbf994f5d9335909d28a70c97ea8039621d

Another example is in my Unity template, which does a little more and optimises it to only do the inverse once per frame (which may or may not be possible for you depending on what you are inversing).
BTW, I'm keeping a full list of Inverse examples on the wiki at:

http://wiki.bo3b.net/index.php?title=Canonical_Stereo_Code

This includes both ASM and HLSL examples. And the ComputeShader references. Please don't hesitate to add other examples.


When you look at the ASM example you can see it's pretty expensive. You can possibly get away with that in a VertexShader, but you would not want to do that in a PixelShader unless nothing else worked.


Posted 10/23/2017 12:57 AM   
bo3b said: In theory, we should not have had a hit here, because llyzs switched it to use GetPrivateData for any IndexBuffer hashes, which should be inexpensive. This test case seems very solid though, so probably we missed something.
I don't think the problem will be where we store the hash. It's more that we used to not calculate it at all with hunting disabled, and now we always calculate it whenever any type of buffer is created, regardless of the hunting setting.

Do we actually know what the performance of the PrivateData is? It entirely depends on how DirectX handles that, but the GUID is quite long - too long to be used directly as a key for a hash map lookup, so if that's how they are storing it they have to be hashing that GUID every time we call it, and I'm concerned that could add up if we call it a lot - I don't know, I'd need to measure it. The private data approach has other advantages though, so if performance is good on it I'd like to look into switching a lot of our maps to use it some day.

Edit: I see you updated the GitHub Issue for this already. BTW, for others following along, this was introduced as part of the automatic convergence feature.
I don't think it was ever actually hooked up properly - the indexbufferfilter could only ever be combined with the per-draw call convergence/separation overrides, making it rather useless... Anyway, once it's hooked up to the same infrastructure as texture filtering then it will work with everything that texture hashes do, including presets.

To llyzs's credit, he was looking at the exact same shader override code I looked at when I first joined 3DMigoto, and started looking to improve it just as I did. His bug report on it shows he was thinking of the same type of improvements that I originally thought about as well. It's just a shame I was out at the time, because I had been working towards deprecating and eventually removing all of that code in favour of the flexibility offered by the command lists.


bo3b said:Wouldn't that be for x0-w0? I'd expect it to be l(1,0,0,0), but have not looked closely.
Yep, edited and fixed ;-)


bo3b said: BTW, I'm keeping a full list of Inverse examples on the wiki at:

http://wiki.bo3b.net/index.php?title=Canonical_Stereo_Code

This includes both ASM and HLSL examples. And the ComputeShader references. Please don't hesitate to add other examples.


When you look at the ASM example you can see it's pretty expensive. You can possibly get away with that in a VertexShader, but you would not want to do that in a PixelShader unless nothing else worked.
Nice - I should add my euclidean optimised assembly version to it :)

Edit: Done


Posted 10/23/2017 02:18 AM   
3D Migoto version 1.2.57 is the first one that breaks Lords of the Fallen.
By breaking I mean it crashes on startup.
Something in 1.2.57 onward is broken.

Log:
D3D11 DLL starting init - v 1.2.57 - Tue Oct 24 14:11:53 2017


----------- d3dx.ini settings -----------
[Logging]
calls=1
input=1
debug=0
unbuffered=0
force_cpu_affinity=0
[System]
[Device]
get_resolution_from=swap_chain
[Stereo]
automatic_mode=0
create_profile=0
force_no_nvapi=0
[Rendering]
shader_hash=3dmigoto
cache_shaders=0
use_criticalsection=1
rasterizer_disable_scissor=1
export_fixed=0
export_shaders=0
export_hlsl=0
dump_usage=1
stereo_params=125
ini_params=120
override_directory=D:\Steam\steamapps\common\Lords Of The Fallen\bin\ShaderFixes
cache_directory=D:\Steam\steamapps\common\Lords Of The Fallen\bin\ShaderCache
fix_sv_position=0
... missing automatic ini section
[Hunting]
hunting=1
marking_mode=0
mark_snapshot=2
reload_config=no_modifiers VK_F10
toggle_hunting=no_modifiers VK_NUMPAD0
next_pixelshader=no_modifiers VK_NUMPAD2
previous_pixelshader=no_modifiers VK_NUMPAD1
mark_pixelshader=no_modifiers VK_NUMPAD3
take_screenshot=no_modifiers VK_SNAPSHOT
next_indexbuffer=no_modifiers VK_NUMPAD8
previous_indexbuffer=no_modifiers VK_NUMPAD7
mark_indexbuffer=no_modifiers VK_NUMPAD9
next_vertexshader=no_modifiers VK_NUMPAD5
previous_vertexshader=no_modifiers VK_NUMPAD4
mark_vertexshader=no_modifiers VK_NUMPAD6
next_rendertarget=no_modifiers VK_MULTIPLY
previous_rendertarget=no_modifiers VK_DIVIDE
mark_rendertarget=no_modifiers VK_SUBTRACT
done_hunting=no_modifiers VK_ADD
reload_fixes=no_modifiers VK_F10
show_original=no_modifiers VK_F9
analyse_options=log dump_rt_jps clear_rt
repeat_rate=6
[KeyChange3DVision2SBSOutputMode]
type=cycle
Cycle 1: x7=1
Cycle 2: x7=2
Cycle 3: x7=3
Cycle 4: x7=4
Cycle 5: x7=5
Cycle 6: x7=0
Key=no_modifiers F11
[Resource3DVision2SBSBackupTexture]
[customshader3dvision2sbs]
vs=ShaderFixes/3dvision2sbsvs.hlsl
hs=null
ds=null
gs=null
ps=ShaderFixes/3dvision2sbsps.hlsl
blend=disable
cull=none
topology=triangle_strip
o1=null
o2=null
o3=null
o4=null
o5=null
o6=null
o7=null
od=null
o0=set_viewport no_view_cache bb
resource3dvision2sbsbackuptexture=reference ps-t100
ps-t100=stereo2mono bb
vs-t125=stereoparams
ps-t125=stereoparams
vs-t120=iniparams
ps-t120=iniparams
draw=4, 0
post ps-t100=reference resource3dvision2sbsbackuptexture
[Present]
[Constants]
x7=0.00
[Profile]

NVIDIA driver version 387.92 (branch r387_90)

Looking up profiles related to D:\Steam\steamapps\common\Lords Of The Fallen\bin\LordsOfTheFallen.exe
----------- Driver profile settings -----------
BaseProfile "Base Profile"
SelectedGlobalProfile "Base Profile"

Profile "Base Profile"
ShowOn All
Setting ID_0x005a375c = 0x96861077 UserSpecified=true // Vertical Sync Tear Control
Setting ID_0x00a879cf = 0x60925292 UserSpecified=true // Vertical Sync
Setting ID_0x101ae763 = 0x00000000 UserSpecified=true // Flag to control smooth AFR behavior
Setting ID_0x10a879ac = 0x00000004 UserSpecified=true // G-SYNC
Setting ID_0x10a879cf = 0x00000004 UserSpecified=true // G-SYNC
Setting ID_0x1194f158 = 0x00000000 UserSpecified=true // Enable G-SYNC globally
Setting ID_0x80303a19 = 0x00000001
Setting ID_0x80857a28 = 0x00000001
Setting ID_0x809d5f60 = 0x00000001
EndProfile

Profile "Lords of the Fallen"
ShowOn GeForce
Executable "lordsofthefallen.exe"
Setting ID_0x002c7f45 = 0x00140000
Setting ID_0x00664339 = 0x00000001 // NVIDIA Predefined Ambient Occlusion Usage
Setting ID_0x00a06946 = 0x000020f5
Setting ID_0x10115c8b = 0x00000028 // Quiet Mode Application FPS
Setting ID_0x1033cec2 = 0x00000002 // NVIDIA predefined SLI mode on DirectX 10
Setting ID_0x1033dcd3 = 0x00000004 // NVIDIA predefined number of GPUs to use on SLI rendering mode on DirectX 10
Setting ID_0x106d5cff = 0x00000000 // Do not display this profile in the Control Panel
Setting ID_0x10f9dc81 = 0x00000011 // Enable application for Optimus
Setting ID_0x701eb457 = 0x00000001 // StereoProfile
Setting ID_0x702442fc = 0x00005008 UserSpecified=true // StereoFlagsDX10
SettingString ID_0x7049c7ec = "5" // Value
SettingString ID_0x704d456e = "Fix by: Mike_ar69, Helifax, Masterotaku, Kaimasta" UserSpecified=true // Comments
SettingString ID_0x7051e5f5 = "0" UserSpecified=true // Compat
Setting ID_0x707f4b45 = 0x00000001 UserSpecified=true // StereoMemoEnabled
Setting ID_0x708db8c5 = 0x4341f366 // StereoConvergence = 193.950775
Setting ID_0x709a1ddf = 0x00000001 // StereoCutoff
Setting ID_0x709adada = 0x10000006 // 2DDHUDSettings
Setting ID_0x709adadb = 0x3f6f402c // 2DDConvergence = 0.934572935
SettingString ID_0x709adadc = "Some objects render at wrong depth." // 2DD_Notes
Setting ID_0x709adadd = 0x00000001 UserSpecified=true // Disable2DD
SettingString ID_0x70b5603f = "D3D" // API
Setting ID_0x70edb381 = 0x00000023 // StereoTextureEnable
EndProfile

----------- End driver profile settings -----------
No profile update required

*** D3D11 DLL successfully initialized. ***

Trying to load original_d3d11.dll
Hooked_LoadLibraryExW switching to original dll: original_d3d11.dll to C:\WINDOWS\system32\d3d11.dll.
Hooked_CreateDXGIFactory1 called with riid: IDXGIFactory1
calling original CreateDXGIFactory1 API
CreateDXGIFactory1 returned factory = 0000000033E63CE0, result = 0
new HackerDXGIFactory1(class HackerDXGIFactory1@000000004C110250) wrapped 0000000033E63CE0
HackerDXGIFactory1::EnumAdapters1(class HackerDXGIFactory1@000000004C110250) adapter 0 requested
created HackerDXGIAdapter1 wrapper = 000000004C110400 of 000000004C0B6520
returns result = 0
HackerDXGIAdapter1::GetDesc1(class HackerDXGIAdapter1@000000004C110400) called
returns adapter: NVIDIA GeForce GTX 980 Ti, sysmem=0, vidmem=6290079744, flags=0


*** D3D11CreateDevice called with
pAdapter = 000000004C110400
Flags = 0
pFeatureLevels = 0xb000
FeatureLevels = 3
ppDevice = 000000000014CB18
pFeatureLevel = 0
ppImmediateContext = 000000004C125B20
return HackerDXGIAdapter1 wrapper = 000000004C110400
HackerDXGIAdapter1::GetDesc1(class HackerDXGIAdapter1@000000004C110400) called
returns adapter: NVIDIA GeForce GTX 980 Ti, sysmem=0, vidmem=6290079744, flags=0
HackerUnknown::Release(class HackerDXGIAdapter1@000000004C110400), counter=1, this=000000004C110400
HackerDXGIObject::GetPrivateData(class HackerDXGIAdapter1@000000004C110400) called with GUID: {1D6AD054-FB2F-4000-B3AB-E873A9131A7C}
returns result = 887a0002
HackerUnknown::AddRef(class HackerDXGIAdapter1@000000004C110400), counter=2, this=000000004C110400
Replaced Hooked_LoadLibraryExW for: C:\WINDOWS\system32\nvapi64.dll to nvapi64.dll.
Replaced Hooked_LoadLibraryExW for: C:\WINDOWS\system32\nvapi64.dll to nvapi64.dll.
Replaced Hooked_LoadLibraryExW for: C:\WINDOWS\system32\nvapi64.dll to nvapi64.dll.
Replaced Hooked_LoadLibraryExW for: C:\WINDOWS\system32\nvapi64.dll to nvapi64.dll.
Hooked_CreateDXGIFactory called with riid: IDXGIFactory
calling original CreateDXGIFactory API
CreateDXGIFactory1 returned factory = 0000000033E64300, result = 0
new HackerDXGIFactory1(class HackerDXGIFactory1@000000004C11ADB0) wrapped 0000000033E64300
HackerDXGIFactory::QueryInterface(class HackerDXGIFactory1@000000004C11ADB0) called with IID: IDXGIFactory2
*** returns E_NOINTERFACE as error for IDXGIFactory2.
HackerUnknown::Release(class HackerDXGIFactory1@000000004C11ADB0), counter=1, this=000000004C11ADB0
Replaced Hooked_LoadLibraryExW for: C:\WINDOWS\system32\nvapi64.dll to nvapi64.dll.
HackerDXGIObject::GetPrivateData(class HackerDXGIAdapter1@000000004C110400) called with GUID: {D722FB4D-7A68-437A-B20C-5804EE2494A6}
returns result = 887a0002
HackerUnknown::Release(class HackerDXGIAdapter1@000000004C110400), counter=4, this=000000004C110400
D3D11CreateDevice returned device handle = 00000000336F8180, context handle = 000000004C449158
HackerDevice 000000004C2E5E20 created to wrap 00000000336F8180
HackerContext 000000004C4B6F60 created to wrap 000000004C449158
HackerDevice::Create3DMigotoResources(class HackerDevice@000000004C2E5E20) call


This is the "out-of-the-box" wrapper.
The last version that works is 1.2.56.

1x Palit RTX 2080Ti Pro Gaming OC(watercooled and overclocked to hell)
3x 3D Vision Ready Asus VG278HE monitors (5760x1080).
Intel i9 9900K (overclocked to 5.3 and watercooled ofc).
Asus Maximus XI Hero Mobo.
16 GB Team Group T-Force Dark Pro DDR4 @ 3600.
Lots of Disks:
- Raid 0 - 256GB Sandisk Extreme SSD.
- Raid 0 - WD Black - 2TB.
- SanDisk SSD PLUS 480 GB.
- Intel 760p 256GB M.2 PCIe NVMe SSD.
Creative Sound Blaster Z.
Windows 10 x64 Pro.
etc


My website with my fixes and OpenGL to 3D Vision wrapper:
http://3dsurroundgaming.com

(If you like some of the stuff that I've done and want to donate something, you can do it with PayPal at tavyhome@gmail.com)

Posted 10/24/2017 01:14 PM   
bo3b said:
mx-2 said:
bo3b said:@mx-2: That's interesting that it does look like d3d11.dll is loaded properly from the game directory, and then dxgi is loaded from system like we'd expect.

Please run it again with full debugging enabled, and we'll take a look:
calls=1
input=1
debug=1
unbuffered=1

I uploaded a log at https://raw.githubusercontent.com/mx-2/log/master/d3d11_log_elex.txt.

OK, thanks for that. The log shows that we are in fact hooking the DXGI properly, and creating the Overlay. But we never get Present calls.


The problem is the game does another one of these incomprehensible things:

HackerUnknown::QueryInterface(class HackerDXGISwapChain@0000000002FFC2D0) called with IID: IDXGISwapChain
returns result = 0 for 0000000003ABD838

That's a call to SwapChain asking for ... itself.

We don't presently have any code around the QueryInterface for the HackerSwapChain because we never expect devs to do something like this. Of course, we can be surprised. This effectively unhooks us, because the returned result is what they use to call Present.

For this game, we'll need a 3Dmigoto code change to look for that stupid self->whoami call, and return self.


In the meantime, you can use 1.0.1 to hunt shaders. I know that it has been reported that it's not possible to hunt all shaders, but it is. There isn't any difference in how we handle the active shaders from 1.0.1 to 1.2.65, it's the active list in the current frame.

If you can't find shaders with 1.0.1, you won't be able to with 1.2.65 after we fix it.


Looking at the log, the most likely answer is that the game is using ComputeShaders for those effects. I don't think 1.0.1 had any ability to sift ComputeShaders, but it might be worth looking at the dumped ComputeShaders in ShaderCache.


Hi Bo3b,
Do we expect a fix for this problem? Indeed, 1.0.1 doesn't support CS, and dumping the shaders shows there are quite a few of them. Looking at the VS and PS for the shadows, it doesn't look like we can do anything about it there.
Looking at the CS, that is most likely where the shadows are calculated. However, without knowing exactly which shader it is, it will take an insane amount of time to binary-chop through them and find the right CS for the shadows ;)


Posted 10/24/2017 02:28 PM   
Helifax said:Hi Bo3b,
Do we expect a fix for this problem? Indeed, 1.0.1 doesn't support CS, and dumping the shaders shows there are quite a few of them. Looking at the VS and PS for the shadows, it doesn't look like we can do anything about it there.
Looking at the CS, that is most likely where the shadows are calculated. However, without knowing exactly which shader it is, it will take an insane amount of time to binary-chop through them and find the right CS for the shadows ;)

Just pushed up a fix for this lost object problem, although I don't have access to an example to test it.

Also added a fix for the crash on null device, when platform_update=1.


New build will depend upon what DarkStarSword has in progress, I'll let him post the build when he's ready.

Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607
Latest 3Dmigoto Release
Bo3b's School for ShaderHackers

Posted 10/25/2017 11:32 AM   
Helifax said:3D Migoto version 1.2.57 is the first one that breaks Lords of the Fallen.
By breaking I mean it crashes on startup.
Something in 1.2.57 onward is broken.

Log:
<snip>

This is the "out-of-the-box" wrapper.
The last version that works is 1.2.56.

Same d3dx.ini and settings? Only code change of new dlls?

This is strange because the difference between 1.2.56 and 1.2.57 is a bunch of changes that start to implement platform_update, and they should have no impact when that is not enabled.


Looks like it crashes during CreateDevice. Maybe create a log with debug=1, unbuffered=1 for more detail.


Posted 10/25/2017 11:54 AM   
bo3b said:
Helifax said:3D Migoto version 1.2.57 is the first one that breaks Lords of the Fallen.
By breaking I mean it crashes on startup.
Something in 1.2.57 onward is broken.

Log:
<snip>

This is the "out-of-the-box" wrapper.
The last version that works is 1.2.56.

Same d3dx.ini and settings? Only code change of new dlls?

This is strange because the difference between 1.2.56 and 1.2.57 is a bunch of changes that start to implement platform_update, and they should have no impact when that is not enabled.


Looks like it crashes during CreateDevice. Maybe create a log with debug=1, unbuffered=1 for more detail.


Yes, same ini, just the dlls were changed (all of them).
I'll try to get an extra log for this ;)

Many thanks for the other changes! Looking forward to testing them out! Many thanks again!


Posted 10/25/2017 02:28 PM   
bo3b said:Just pushed up a fix for this lost object problem, although I don't have access to an example to test it.

Also added a fix for the crash on null device, when platform_update=1.


New build will depend upon what DarkStarSword has in progress, I'll let him post the build when he's ready.
I just tried it but unfortunately it doesn't work for me because
typeid(this) == "class HackerUnknown * __ptr64" and typeid(*ppvObject) is "void * __ptr64".

Below is a really ugly hack which uses the logging string output. With that hack hunting and overlay works.
string thisName = type_name(this);
if (thisName.find("class Hacker", 0) == 0)
    thisName.erase(0, 12); // leading "class Hacker"
string riidName = NameFromIID(riid);
riidName.erase(0, 1); // leading "I"

if (thisName == riidName)
    *ppvObject = this;
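
For anyone who wants to try the comparison outside the DLL, here is a minimal standalone sketch of the same name-matching idea. `wrapperMatchesInterface` is a hypothetical helper, and its two string arguments stand in for the output of `type_name(this)` and `NameFromIID(riid)`, which are not reimplemented here. It is slightly stricter than the snippet above: it refuses to match when either expected prefix is missing.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of 3Dmigoto.
// wrapperName   : stand-in for type_name(this), e.g. "class HackerDXGISwapChain"
// interfaceName : stand-in for NameFromIID(riid), e.g. "IDXGISwapChain"
inline bool wrapperMatchesInterface(std::string wrapperName, std::string interfaceName) {
    const std::string prefix = "class Hacker";

    // Only wrapper classes carry the "class Hacker" prefix.
    if (wrapperName.compare(0, prefix.size(), prefix) != 0)
        return false;
    wrapperName.erase(0, prefix.size());

    // Interface names are expected to start with a leading "I".
    if (interfaceName.empty() || interfaceName[0] != 'I')
        return false;
    interfaceName.erase(0, 1);

    // "DXGISwapChain" == "DXGISwapChain" means the wrapper was asked for itself.
    return wrapperName == interfaceName;
}
```

With the names from the log quoted earlier in the thread, `wrapperMatchesInterface("class HackerDXGISwapChain", "IDXGISwapChain")` is true, while asking the same wrapper for `"IDXGIFactory2"` is false.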


Edit: Attachment deleted
mx-2 said:I just tried it but unfortunately it doesn't work for me because
typeid(this) == "class HackerUnknown * __ptr64" and typeid(*ppvObject) is "void * __ptr64".

Below is a really ugly hack which uses the logging string output. With that hack hunting and overlay works. I attach the modified dll to this post.

string thisName = type_name(this);
if (thisName.find("class Hacker", 0) == 0)
    thisName.erase(0, 12); // leading "class Hacker"
string riidName = NameFromIID(riid);
riidName.erase(0, 1); // leading "I"

if (thisName == riidName)
    *ppvObject = this;

OK, that's interesting. Thanks for taking a look. I pretty much figured that typeid wasn't going to cut it, but that implies that it also has more object weirdness, in that at that point, I can't get what the true object would be. I don't want to know it's a HackerUnknown, I already know that. I need to know if it's say a HackerDXGISwapChain at that moment. Helpful to know that's what happens.
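
A side note on that typeid result, as a minimal sketch in standard C++ rather than 3Dmigoto code: `typeid(this)` describes the pointer, so it reports the static type of the class the method is declared in, while `typeid(*this)` on a polymorphic class reports the most derived type of the object. `WrapperBase` and `WrapperSwapChain` are hypothetical stand-ins; whether this carries over to the real Hacker* classes depends on how they are declared, which is not assumed here.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical stand-ins for a wrapper base class and one derived wrapper.
struct WrapperBase {
    virtual ~WrapperBase() = default;  // a virtual member makes the class polymorphic

    // typeid applied to the pointer itself: always the static type WrapperBase*.
    const std::type_info& pointerType() { return typeid(this); }

    // typeid applied to the dereferenced pointer: the dynamic (most derived) type.
    const std::type_info& objectType() { return typeid(*this); }
};

struct WrapperSwapChain : WrapperBase {};
```

Called through a `WrapperBase*` that points at a `WrapperSwapChain`, `pointerType()` compares equal to `typeid(WrapperBase*)` and `objectType()` compares equal to `typeid(WrapperSwapChain)`.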

On the plus side, at least I successfully figured out what the original problem was. :->


For this hack-fix, you can just do the assignment, *ppvObject=this. This is going to be called super rarely. That will fix it for this specific problem. I'll look into a better way to handle this.
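
A minimal sketch of that self-query fix, assuming simplified stand-ins for the COM types so it builds without Windows headers. `Guid`, `SwapChainLike`, `RealSwapChain` and `WrappedSwapChain` are all hypothetical, the IID values are placeholders, and reference counting is left out. Unlike the unconditional assignment, this version only returns the wrapper when the requested IID is the wrapped interface. It shows why forwarding QueryInterface unchanged loses the hook, and how handing back the wrapper keeps Present visible.

```cpp
#include <cassert>
#include <cstring>

// Simplified stand-in for a COM GUID; the values below are placeholders,
// not the real IID_IDXGISwapChain.
struct Guid { unsigned char bytes[16]; };
inline bool sameGuid(const Guid& a, const Guid& b) {
    return std::memcmp(a.bytes, b.bytes, sizeof a.bytes) == 0;
}
const Guid kSwapChainIid = {{1}};
const Guid kOtherIid     = {{2}};

// Minimal interface with the two calls that matter for this problem.
struct SwapChainLike {
    virtual ~SwapChainLike() = default;
    virtual long QueryInterface(const Guid& iid, void** out) = 0;
    virtual int Present() = 0;
};

// Stand-in for the game's real swap chain: answers only for its own IID.
struct RealSwapChain : SwapChainLike {
    long QueryInterface(const Guid& iid, void** out) override {
        if (!sameGuid(iid, kSwapChainIid)) { *out = nullptr; return -1; }
        *out = static_cast<SwapChainLike*>(this);
        return 0;
    }
    int Present() override { return 0; }
};

// Stand-in for the wrapper: counts the Present calls it gets to see.
struct WrappedSwapChain : SwapChainLike {
    SwapChainLike* original;
    bool returnSelf;      // false reproduces the unhooking, true applies the fix
    int presentsSeen = 0;

    WrappedSwapChain(SwapChainLike* orig, bool fix) : original(orig), returnSelf(fix) {}

    long QueryInterface(const Guid& iid, void** out) override {
        long hr = original->QueryInterface(iid, out);
        // The fix: a successful query for the interface we wrap returns the
        // wrapper, so later calls through the result still pass through us.
        if (hr == 0 && returnSelf && sameGuid(iid, kSwapChainIid))
            *out = static_cast<SwapChainLike*>(this);
        return hr;
    }
    int Present() override { ++presentsSeen; return original->Present(); }
};
```

Without the fix, a caller that queries the wrapper for `kSwapChainIid` and calls `Present()` on the result goes straight to `RealSwapChain`, and `presentsSeen` stays at 0. With `returnSelf` set, the same sequence increments it.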


Posted 10/25/2017 05:04 PM   
mx-2 said:I attach the modified dll to this post.
Can we maybe give you commit access to 3DMigoto? I'd rather avoid side builds like this one becoming public (giving them to an individual for testing is fine) because we end up with version numbers in the wild that don't match the source code, so I'd rather that if you do a public release like this that you use the proper publish.bat, tag and upload it to github.

For now (unless Bo3b has anything to supersede it?) I'll just apply your code to 3DMigoto (tagging you as the author) and do a proper release - I had wanted to get some things I need for the UE4 extension DLL in for this release, but if I put them in now they will be a rush job and we've got enough important fixes (and ShaderRegex) as it is that I don't want to delay it any longer.

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 10/26/2017 04:04 AM   