I just want to say that the idea from the Helixmod feature list guide about analyzing the depth of one register in the shader worked, at least for the effect I wanted (the Hakumen symbols), without breaking bloom. Some other HUD elements were affected, so I'll have to fine-tune it to distinguish between that symbol, the HUD and bloom.
I'm glad, because I couldn't get texture filtering to work: with that "8EF88061.txt.ps" file, these values didn't work in DX9Settings.ini:
- [VS91D12237]
- [VS8EF88061]
- [PS8EF88061] (Using a PS for texture filtering isn't even documented. Probably can't be done, I guess).
A preset hotkey I made for changing a constant doesn't work! In DX9Settings.ini, changing which preset has "UseByDef = true" works, but in-game the hotkey doesn't change the constant value.
"8EF88061.txt.ps" (relevant part inside the second "else"):
//HUD, most of it. Blue color in second "tips". VS.
//
// Generated by Microsoft (R) D3DX9 Shader Compiler 9.15.779.0000
//
// Parameters:
//
// float fScreenHeigft;
// float fScreenWidth;
//
//
// Registers:
//
// Name Reg Size
// ------------- ----- ----
// fScreenWidth c0 1
// fScreenHeigft c1 1
//
//
// Default values:
//
// fScreenWidth
// c0 = { 0, 0, 0, 0 };
//
// fScreenHeigft
// c1 = { 0, 0, 0, 0 };
//
Maybe somebody can help me with this compute shader ;) I know DarkStarSword is the MASTER of CS ^_^, but there are others who might know what to do :)
In short, this is a CS from Battlefield 1 (Frostbite 3 engine). We currently have fixes for:
- Dragon Age: Inquisition
- Battlefield 4
- Battlefield Hardline
- Star Wars: Battlefront
Yet, in all these fixes the CS are disabled!
I did manage to fix it in 3D;) But there is one tiny problem. The damn TILES. Basically the fix works but at certain angles and distances...
So can anyone help me out?:)
// Global Illumination
// RUNs ONCE per eye...grrr
If we manage to fix it once and for all, then we can get all the other games fixed as well, as it is pretty much the same logic ;) and the shaders are like 90% the same or alike :)
Thank you in advance!
1x Palit RTX 2080Ti Pro Gaming OC(watercooled and overclocked to hell)
3x 3D Vision Ready Asus VG278HE monitors (5760x1080).
Intel i9 9900K (overclocked to 5.3 and watercooled ofc).
Asus Maximus XI Hero Mobo.
16 GB Team Group T-Force Dark Pro DDR4 @ 3600.
Lots of Disks:
- Raid 0 - 256GB Sandisk Extreme SSD.
- Raid 0 - WD Black - 2TB.
- SanDisk SSD PLUS 480 GB.
- Intel 760p 256GB M.2 PCIe NVMe SSD.
Creative Sound Blaster Z.
Windows 10 x64 Pro.
etc
Also can somebody tell me how I can skip a compute shader?
Setting handling=skip doesn't seem to work:-s
Not really a clue on what to try out...Hacking around it just crashes the game though:-s
[quote="helifax"]Also can somebody tell me how I can skip a compute shader?
Setting handling=skip doesn't seem to work:-s[/quote]
No way to use the ShaderOverride 'skip' for ComputeShader. Just took a look at the code, and CS are not in that code sequence. They were there earlier, but I think that DarkStarSword pulled it out because if we skip CS, that typically leads to a crash.
Best bet to emulate skip would be to try to get an idea of how to disable a CS without crashing. Would require looking at the code and deciding what might be skippable. Pretty unclear though.
An example for these tiled lighting shaders: if we skip the CS, that might make the number of tiles=0, which could easily destroy some later CS or PS that is not expecting a no-tiles scenario. Putting all the calculations on the GPU is bad for us because the tools for GPU debugging are weak (not just 3Dmigoto).
For the tiled lighting problem, you might ping DarkStarSword by PM. I know he's really booked up and probably has no time to read the forum. PMs send an email notification.
Also, you have probably already looked, but if not, check his github repo (not 3Dmigoto) for examples of CS fixes. https://github.com/DarkStarSword/3d-fixes
[quote="bo3b"][quote="helifax"]Also can somebody tell me how I can skip a compute shader?
Setting handling=skip doesn't seem to work:-s[/quote]
No way to use the ShaderOverride 'skip' for ComputeShader. Just took a look at the code, and CS are not in that code sequence. They were there earlier, but I think that DarkStarSword pulled it out because if we skip CS, that typically leads to a crash.
Best bet to emulate skip would be to try to get an idea of how to disable a CS without crashing. Would require looking at the code and deciding what might be skippable. Pretty unclear though.
An example for these tiled lighting shaders: if we skip the CS, that might make the number of tiles=0, which could easily destroy some later CS or PS that is not expecting a no-tiles scenario. Putting all the calculations on the GPU is bad for us because the tools for GPU debugging are weak (not just 3Dmigoto).
For the tiled lighting problem, you might ping DarkStarSword by PM. I know he's really booked up and probably has no time to read the forum. PMs send an email notification.
Also, you have probably already looked, but if not, check his github repo (not 3Dmigoto) for examples of CS fixes. https://github.com/DarkStarSword/3d-fixes[/quote]
Cheers Bo3b!
I noticed that and had a feeling this was the case, so I went ahead and started "chopping" at the CS until I actually managed to disable it. (It has some secondary consequences of course - like you said - but no crashing, just missing other stuff).
Definitely NOT ideal, but the damn Frostbite 3 compute shaders still elude me :)) (The DAMN tiles... Fixing the rendering wasn't that bad, but making it "stick" at all angles is a PAIN ^_^ )
Thanks, I'll try to PM DSS about it and see if he has some time for me;)
Thank you again!
PS: Also if somebody is interested in a FULL ASM matrix inverse code let me know (I had some issues with the HLSL shader bound in ASM so I decided to make one in ASM;) Well FXC did most of the work, I just helped it out ^_^).
[quote="helifax"]PS: Also if somebody is interested in a FULL ASM matrix inverse code let me know (I had some issues with the HLSL shader bound in ASM so I decided to make one in ASM;) Well FXC did most of the work, I just helped it out ^_^).[/quote]
Heh! I had actually built one in ASM for DHR for JC3. He wound up not needing it because he got the ASM linked HLSL working.
However, I would be interested in posting your ASM code to the Wiki for others to reference. The version I made had pieces culled because of fxc optimizations, which sounds like maybe you avoided.
Please post to wiki.bo3b.net, or post here and I can add it. Adding a new page for matrix inversions of different techniques would be good.
Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607 Latest 3Dmigoto Release Bo3b's School for ShaderHackers
@helifax
Is the injected inverse matrix + fixing code from Mirror's Edge that I sent you not working in the CS tile lights from BF1? It is easier to use the injected one.
If it doesn't work, that makes this an even stranger case.
Those CS lights behave very strangely... I'm almost sure it is a profile/driver thing. I already tried a few things with Mirror's Edge, but in some spots and at some angles it only renders in one eye.
@bo3b
It was for Mirror's Edge Catalyst... that inverted matrix also works fine! But it is easier to use the injected one, because the inputs and outputs are clearer.
Sure thing Bo3b.
I'll paste it here so you can put it on wiki.bo3b.net where you believe is best place to have it;)
This code has all the optimisations removed, so it is as clean as the original HLSL source.
// Inverse
// cb0[0], etc is the inverseMatrix
mov r0.xyzw, cb0[0].xyzw
mov r1.xyzw, cb0[1].xyzw
mov r2.xyzw, cb0[2].xyzw
mov r3.xyzw, cb0[3].xyzw
mul r4.x, r2.z, r3.w
mul r4.y, r2.w, r3.z
mov r4.y, -r4.y
add r4.x, r4.y, r4.x
mul r4.x, r1.y, r4.x
mul r4.y, r2.w, r3.y
mul r4.z, r2.y, r3.w
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r1.z, r4.y
add r4.x, r4.y, r4.x
mul r4.y, r2.y, r3.z
mul r4.z, r2.z, r3.y
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r1.w, r4.y
add r4.x, r4.y, r4.x
mul r4.x, r0.x, r4.x
mul r4.y, r2.w, r3.z
mul r4.z, r2.z, r3.w
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r1.x, r4.y
mul r4.z, r2.x, r3.w
mul r4.w, r2.w, r3.x
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.z, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r2.z, r3.x
mul r4.w, r2.x, r3.z
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.w, r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.y, r4.y
add r4.x, r4.y, r4.x
mul r4.y, r2.y, r3.w
mul r4.z, r2.w, r3.y
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r1.x, r4.y
mul r4.z, r2.w, r3.x
mul r4.w, r2.x, r3.w
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.y, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r2.x, r3.y
mul r4.w, r2.y, r3.x
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.w, r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.z, r4.y
add r4.x, r4.y, r4.x
mul r4.y, r2.z, r3.y
mul r4.z, r2.y, r3.z
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r1.x, r4.y
mul r4.z, r2.x, r3.z
mul r4.w, r2.z, r3.x
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.y, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r2.y, r3.x
mul r4.w, r2.x, r3.y
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.z, r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.w, r4.y
add r4.x, r4.y, r4.x
mul r4.y, r2.z, r3.w
mul r4.z, r2.w, r3.z
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r1.y, r4.y
mul r4.z, r2.w, r3.y
mul r4.w, r2.y, r3.w
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.z, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r2.y, r3.z
mul r4.w, r2.z, r3.y
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.w, r4.z
add r5.x, r4.z, r4.y
mul r4.y, r2.w, r3.z
mul r4.z, r2.z, r3.w
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.y, r4.y
mul r4.z, r2.y, r3.w
mul r4.w, r2.w, r3.y
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.z, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r2.z, r3.y
mul r4.w, r2.y, r3.z
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.w, r4.z
add r5.y, r4.z, r4.y
mul r4.y, r1.z, r3.w
mul r4.z, r1.w, r3.z
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.y, r4.y
mul r4.z, r1.w, r3.y
mul r4.w, r1.y, r3.w
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.z, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r1.y, r3.z
mul r4.w, r1.z, r3.y
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.w, r4.z
add r5.z, r4.z, r4.y
mul r4.y, r1.w, r2.z
mul r4.z, r1.z, r2.w
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.y, r4.y
mul r4.z, r1.y, r2.w
mul r4.w, r1.w, r2.y
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.z, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r1.z, r2.y
mul r4.w, r1.y, r2.z
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.w, r4.z
add r5.w, r4.z, r4.y
mul r4.y, r2.w, r3.z
mul r4.z, r2.z, r3.w
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r1.x, r4.y
mul r4.z, r2.x, r3.w
mul r4.w, r2.w, r3.x
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.z, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r2.z, r3.x
mul r4.w, r2.x, r3.z
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.w, r4.z
add r6.x, r4.z, r4.y
mul r4.y, r2.z, r3.w
mul r4.z, r2.w, r3.z
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.x, r4.y
mul r4.z, r2.w, r3.x
mul r4.w, r2.x, r3.w
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.z, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r2.x, r3.z
mul r4.w, r2.z, r3.x
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.w, r4.z
add r6.y, r4.z, r4.y
mul r4.y, r1.w, r3.z
mul r4.z, r1.z, r3.w
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.x, r4.y
mul r4.z, r1.x, r3.w
mul r4.w, r1.w, r3.x
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.z, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r1.z, r3.x
mul r4.w, r1.x, r3.z
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.w, r4.z
add r6.z, r4.z, r4.y
mul r4.y, r1.z, r2.w
mul r4.z, r1.w, r2.z
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.x, r4.y
mul r4.z, r1.w, r2.x
mul r4.w, r1.x, r2.w
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.z, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r1.x, r2.z
mul r4.w, r1.z, r2.x
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.w, r4.z
add r6.w, r4.z, r4.y
mul r4.y, r2.y, r3.w
mul r4.z, r2.w, r3.y
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r1.x, r4.y
mul r4.z, r2.w, r3.x
mul r4.w, r2.x, r3.w
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.y, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r2.x, r3.y
mul r4.w, r2.y, r3.x
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.w, r4.z
add r7.x, r4.z, r4.y
mul r4.y, r2.w, r3.y
mul r4.z, r2.y, r3.w
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.x, r4.y
mul r4.z, r2.x, r3.w
mul r4.w, r2.w, r3.x
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.y, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r2.y, r3.x
mul r4.w, r2.x, r3.y
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.w, r4.z
add r7.y, r4.z, r4.y
mul r4.y, r1.y, r3.w
mul r4.z, r1.w, r3.y
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.x, r4.y
mul r4.z, r1.w, r3.x
mul r3.w, r1.x, r3.w
mov r3.w, -r3.w
add r3.w, r3.w, r4.z
mul r3.w, r0.y, r3.w
add r3.w, r3.w, r4.y
mul r4.y, r1.x, r3.y
mul r4.z, r1.y, r3.x
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.w, r4.y
add r7.z, r3.w, r4.y
mul r3.w, r1.w, r2.y
mul r4.y, r1.y, r2.w
mov r4.y, -r4.y
add r3.w, r3.w, r4.y
mul r3.w, r0.x, r3.w
mul r2.w, r1.x, r2.w
mul r1.w, r1.w, r2.x
mov r1.w, -r1.w
add r1.w, r1.w, r2.w
mul r1.w, r0.y, r1.w
add r1.w, r1.w, r3.w
mul r2.w, r1.y, r2.x
mul r3.w, r1.x, r2.y
mov r3.w, -r3.w
add r2.w, r2.w, r3.w
mul r0.w, r0.w, r2.w
add r7.w, r0.w, r1.w
mul r0.w, r2.z, r3.y
mul r1.w, r2.y, r3.z
mov r1.w, -r1.w
add r0.w, r0.w, r1.w
mul r0.w, r0.w, r1.x
mul r1.w, r2.x, r3.z
mul r2.w, r2.z, r3.x
mov r2.w, -r2.w
add r1.w, r1.w, r2.w
mul r1.w, r1.w, r1.y
add r0.w, r0.w, r1.w
mul r1.w, r2.y, r3.x
mul r2.w, r2.x, r3.y
mov r2.w, -r2.w
add r1.w, r1.w, r2.w
mul r1.w, r1.w, r1.z
add r8.x, r0.w, r1.w
mul r0.w, r2.y, r3.z
mul r1.w, r2.z, r3.y
mov r1.w, -r1.w
add r0.w, r0.w, r1.w
mul r0.w, r0.w, r0.x
mul r1.w, r2.z, r3.x
mul r2.w, r2.x, r3.z
mov r2.w, -r2.w
add r1.w, r1.w, r2.w
mul r1.w, r0.y, r1.w
add r0.w, r0.w, r1.w
mul r1.w, r2.x, r3.y
mul r2.w, r2.y, r3.x
mov r2.w, -r2.w
add r1.w, r1.w, r2.w
mul r1.w, r0.z, r1.w
add r8.y, r0.w, r1.w
mul r0.w, r1.z, r3.y
mul r1.w, r1.y, r3.z
mov r1.w, -r1.w
add r0.w, r0.w, r1.w
mul r0.w, r0.w, r0.x
mul r1.w, r1.x, r3.z
mul r2.w, r1.z, r3.x
mov r2.w, -r2.w
add r1.w, r1.w, r2.w
mul r1.w, r0.y, r1.w
add r0.w, r0.w, r1.w
mul r1.w, r1.y, r3.x
mul r2.w, r1.x, r3.y
mov r2.w, -r2.w
add r1.w, r1.w, r2.w
mul r1.w, r0.z, r1.w
add r8.z, r0.w, r1.w
mul r0.w, r1.y, r2.z
mul r1.w, r1.z, r2.y
mov r1.w, -r1.w
add r0.w, r0.w, r1.w
mul r0.x, r0.w, r0.x
mul r0.w, r1.z, r2.x
mul r1.z, r1.x, r2.z
mov r1.z, -r1.z
add r0.w, r0.w, r1.z
mul r0.y, r0.w, r0.y
add r0.x, r0.y, r0.x
mul r0.y, r1.x, r2.y
mul r0.w, r1.y, r2.x
mov r0.w, -r0.w
add r0.y, r0.w, r0.y
mul r0.y, r0.y, r0.z
add r8.w, r0.y, r0.x
div r0.xyzw, r5.xyzw, r4.xxxx
div r1.xyzw, r6.xyzw, r4.xxxx
div r2.xyzw, r7.xyzw, r4.xxxx
div r3.xyzw, r8.xyzw, r4.xxxx
// Store results for later use as r0-r4 are most
// likely to be used by the default shader code.
// r50 equivalent of matrix._m00_m01_m02_m03
// r51 equivalent of matrix._m10_m11_m12_m13
// r52 equivalent of matrix._m20_m21_m22_m23
// r53 equivalent of matrix._m30_m31_m32_m33
If you follow the ASM code and compare it with the HLSL, you can see that it is exactly the same, without any weird optimizations inside ;) (It also explains why the ASM code is so long ^_^).
Also, this code should be pasted at the beginning of main() to avoid overwriting any of the existing registers. In short, it should be the first thing that gets executed in that shader (I know it is obvious for you, but other readers might be confused).
The code works, as I am currently using it in the BF1 fix.
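For anyone who wants to sanity-check the register math off the GPU, here is a minimal Python sketch of the same cofactor (Cramer's rule) inversion. It is not the HLSL source that FXC compiled, just a plain reference under the assumption that the ASM computes the determinant by expanding along the first row and divides the transposed cofactors by it. All function names are made up for illustration.

```python
def minor(m, row, col):
    """Return the 3x3 matrix left after deleting one row and one column."""
    return [[m[r][c] for c in range(4) if c != col]
            for r in range(4) if r != row]


def det3(a):
    """Determinant of a 3x3 matrix, expanded along its first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def cofactor(m, row, col):
    """Signed 3x3 minor of a 4x4 matrix."""
    sign = -1.0 if (row + col) % 2 else 1.0
    return sign * det3(minor(m, row, col))


def inverse_cofactor(m):
    """Invert a 4x4 matrix (list of 4 rows) with Cramer's rule."""
    # Determinant by expansion along the first row (r4.x in the ASM).
    det = sum(m[0][j] * cofactor(m, 0, j) for j in range(4))
    if det == 0.0:
        raise ValueError("matrix is singular")
    # inverse[i][j] = cofactor(j, i) / det, i.e. the transposed cofactors.
    return [[cofactor(m, j, i) / det for j in range(4)] for i in range(4)]
```

Multiplying a test matrix by the result and comparing against the identity is a quick way to catch a wrong swizzle before pasting the ASM into a shader.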
Thank you!
Hi all,
I haven't used 3Dmigoto yet, but I read in this thread that matrix inversion has to be done manually in the shader code. The commonly used code seems to be based on Cramer's rule. I think that code based on the Gauss-Jordan algorithm should be much faster on a GPU because it exploits the vector nature of the registers, especially if the code is executed for each pixel/vertex.
Has anybody tried that out yet, or is there some reason for the code currently used?
For a (not debugged) code example see below.
[code]
// inverseMatrix.asm
// Matrix inversion with Gauss-Jordan elimination algorithm
// input matrix is in r0-r3
// output will be in r4-r7
// r8, r9 are used as temporary registers
// c200 = (1,0,0,0) is required
// r0.x r0.y r0.z r0.w | r4.x, r4.y, r4.z, r4.w
// r1.x r1.y r1.z r1.w | r5.x, r5.y, r5.z, r5.w
// r2.x r2.y r2.z r2.w | r6.x, r6.y, r6.z, r6.w
// r3.x r3.y r3.z r3.w | r7.x, r7.y, r7.z, r7.w
// Init registers
def c200, 1, 0, 0, 0
mov r4, c200.xyzw
mov r5, c200.wxyz
mov r6, c200.zwxy
mov r7, c200.yzwx
// First column
rcp r8.x, r0.x
mul r8.y, r8.x, r1.x
mul r9, r0, r8.y
sub r1, r1, r9
mul r9, r4, r8.y
sub r5, r5, r9
mul r8.y, r8.x, r2.x
mul r9, r0, r8.y
sub r2, r2, r9
mul r9, r4, r8.y
sub r6, r6, r9
mul r8.y, r8.x, r3.x
mul r9, r0, r8.y
sub r3, r3, r9
mul r9, r4, r8.y
sub r7, r7, r9
// Second column
rcp r8.x, r1.y
mul r8.y, r8.x, r2.y
mul r9, r1, r8.y
sub r2, r2, r9
mul r9, r5, r8.y
sub r6, r6, r9
mul r8.y, r8.x, r3.y
mul r9, r1, r8.y
sub r3, r3, r9
mul r9, r5, r8.y
sub r7, r7, r9
// Third column
rcp r8.x, r2.z
mul r8.y, r8.x, r3.z
mul r9, r2, r8.y
sub r3, r3, r9
mul r9, r6, r8.y
sub r7, r7, r9
// Normalize fourth row (pivot r3.w)
rcp r8.x, r3.w
mul r3, r3, r8.x
mul r7, r7, r8.x
// Fourth column (upper part)
// The right-hand side is updated first, while rN.w still holds the factor
mul r9, r7, r2.w
sub r6, r6, r9
mul r9, r3, r2.w
sub r2, r2, r9
mul r9, r7, r1.w
sub r5, r5, r9
mul r9, r3, r1.w
sub r1, r1, r9
mul r9, r7, r0.w
sub r4, r4, r9
mul r9, r3, r0.w
sub r0, r0, r9
// Normalize third row (pivot r2.z)
rcp r8.x, r2.z
mul r2, r2, r8.x
mul r6, r6, r8.x
// Third column (upper part)
mul r9, r6, r1.z
sub r5, r5, r9
mul r9, r2, r1.z
sub r1, r1, r9
mul r9, r6, r0.z
sub r4, r4, r9
mul r9, r2, r0.z
sub r0, r0, r9
// Normalize second row (pivot r1.y)
rcp r8.x, r1.y
mul r1, r1, r8.x
mul r5, r5, r8.x
// Second column (upper part)
mul r9, r5, r0.y
sub r4, r4, r9
mul r9, r1, r0.y
sub r0, r0, r9
// Normalize first row (pivot r0.x)
rcp r8.x, r0.x
mul r0, r0, r8.x
mul r4, r4, r8.x
[/code]
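As a cross-check for the register version, here is a small Python sketch of Gauss-Jordan inversion on a 4x4 matrix. It is only a reference under stated assumptions: the lists `a` and `inv` stand in for r0-r3 and r4-r7, the function name is made up, and the loop normalizes each pivot row before eliminating above and below it, which orders the steps a little differently from the ASM. Like the ASM, it does no row swaps, so a zero pivot is an error here (in the shader, rcp would produce inf/NaN instead).

```python
def inverse_gauss_jordan(m):
    """Invert a 4x4 matrix by Gauss-Jordan elimination without pivoting."""
    a = [[float(x) for x in row] for row in m]           # role of r0-r3
    inv = [[1.0 if r == c else 0.0 for c in range(4)]    # role of r4-r7
           for r in range(4)]
    for col in range(4):
        pivot = a[col][col]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; a row swap would be needed")
        # Scale the pivot row so the pivot becomes 1 (rcp + mul in the ASM).
        scale = 1.0 / pivot
        a[col] = [x * scale for x in a[col]]
        inv[col] = [x * scale for x in inv[col]]
        # Subtract multiples of the pivot row from every other row.
        for row in range(4):
            if row == col:
                continue
            factor = a[row][col]  # saved before the row is modified
            a[row] = [x - factor * p for x, p in zip(a[row], a[col])]
            inv[row] = [x - factor * p for x, p in zip(inv[row], inv[col])]
    return inv
```

Each row operation touches four components at once, which is the part that maps well onto the vector registers.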
[quote="DHR"]@helifax
Is the injected inverse matrix + fixing code from Mirror's Edge that I sent you not working in the CS tile lights from BF1? It is easier to use the injected one.
If it doesn't work, that makes this an even stranger case.
Those CS lights behave very strangely... I'm almost sure it is a profile/driver thing. I already tried a few things with Mirror's Edge, but in some spots and at some angles it only renders in one eye.
[/quote]
1) The HLSL for matrix inversion works ;) but 3Dmigoto just gives a low beep when I put in the hash of the compute shader, so it doesn't work with the game. It doesn't like that hash for some reason, no matter what I do.
Thus, I had to put the matrix inverse in the shader code :)
2) The CS not working IS NOT A DRIVER ISSUE. It is actually the CS shaders that need fixing! I haven't made a FULL fix for them, but I see where things go wrong! Believe it or not, the driver is actually working as it should ;) It only affects the LEFT eye. What we know is that for the left eye we say "(-1) * separation". I expect that the CS doesn't like the NEGATIVE value of the position and just discards it or does weird things with it!
@DHR:
I managed to hack it to some degree, but it is not a proper fix ;)
Sadly, I don't know much about compute shaders in 3D Vision. I know DSS is the expert, as he has always helped me with them before. If you want, we can try to see what is wrong, but without a proper understanding I don't think we can come up with the true formula ;)
@mx-2:
- Thanks for that code. I didn't try it yet, but it definitely makes sense ;) Big thanks for your reply!!!
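To make the sign issue in point 2) concrete, here is a minimal Python sketch of the stereo correction usually quoted for 3D Vision fixes, x' = x + eye_sign * separation * (w - convergence). It is an assumption about the formula in play, not code taken from the BF1 shaders, and the function and parameter names are made up.

```python
def stereo_shift(clip_x, clip_w, separation, convergence, eye):
    """Return the clip-space x after the per-eye stereo correction.

    eye is "left" or "right"; the left eye uses (-1) * separation.
    """
    eye_sign = -1.0 if eye == "left" else 1.0
    return clip_x + eye_sign * separation * (clip_w - convergence)


if __name__ == "__main__":
    # Same point, both eyes: the shifts are mirror images of each other.
    for eye in ("left", "right"):
        print(eye, stereo_shift(0.0, 10.0, 0.05, 2.0, eye))
```

For a point beyond the convergence depth, the left-eye result moves in the negative direction, so any code that clamps or rejects negative values would only break the left eye.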
About my Blazblue problems and after trying a few things, I've noticed something: my "8EF88061.txt.ps" file isn't receiving constants (that I want to use with a hotkey). Even if I write (important parts of DX9Settings.ini here):
[code]
DefVSConst1 = 190
DefPSConst1 = 190
[KEY3]
Key = 114
Presets = 3;4;
Type = 1
[PRES3]
Const1 = 0x3f800000
[PRES4]
Const1 = 0x00000000
UseByDef = true
[/code]
The shader always treats c190.x as 0, even if I make both presets have the "0x3f800000" value. Changing the other side of the "if_eq" in the shader (if_eq r27.x, c190.x), where r27.x refers to a constant defined in the shader that can be 0 or 1 for testing, changed the effects in-game correctly. So the "Const1" value isn't reaching the shader correctly.
Is this normal?
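For reference, the preset values above read as raw IEEE-754 single-precision bit patterns (an assumption based on 0x3f800000 being the bit pattern of 1.0). A small Python sketch for converting in both directions, with made-up helper names:

```python
import struct


def const_to_float(hex_word):
    """Interpret a 32-bit integer as an IEEE-754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", hex_word))[0]


def float_to_const(value):
    """Format a float as the 32-bit hex word used in the preset entries."""
    return "0x%08x" % struct.unpack("<I", struct.pack("<f", value))[0]


if __name__ == "__main__":
    print(const_to_float(0x3F800000))  # the PRES3 value
    print(const_to_float(0x00000000))  # the PRES4 value
    print(float_to_const(0.5))
```

So the two presets are meant to toggle c190.x between 1.0 and 0.0, which is what the if_eq test against r27.x expects.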
Hi Bo3b, I was wondering if there is something I can do to decrease the time it takes the wrapper to dump the shaders?
In any Frostbite 3 game it takes 15 minutes to load a game... I am currently only using the "export_hlsl=2" option.
Normally it dumps around 20k shaders when a level loads...
If you can think of anything I can do to decrease this insane time, please let me know!
Cheers!
[quote="masterotaku"]About my Blazblue problems and after trying a few things, I've noticed something: my "8EF88061.txt.ps" file isn't receiving constants (that I want to use with a hotkey). Even if I write (important parts of DX9Settings.ini here):
[code]
DefVSConst1 = 190
DefPSConst1 = 190
[KEY3]
Key = 114
Presets = 3;4;
Type = 1
[PRES3]
Const1 = 0x3f800000
[PRES4]
Const1 = 0x00000000
UseByDef = true
[/code]
The shader always treats c190.x as 0, even if I give both presets the "0x3f800000" value. Changing the other operand of the "if_eq" in the shader (if_eq r27.x, c190.x), where r27.x holds a constant defined in the shader that I set to 0 or 1 for testing, changed the effects ingame correctly. So the "Const1" value isn't reaching the shader.
Is this normal?[/quote]
Not normal, should work.
Try a different constant register. It might conflict with a register that a game shader already uses. It might also be worth doing a full dump of all shaders to see if c190 is in use.
I also seem to vaguely remember that there is some sort of conflict when both VS and PS use the same register, or maybe that on specific games it doesn't always work. It might be worth trying a forum search for something like that.
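A rough way to run the "is c190 in use" check over a full dump is to search the dumped shader text files for the register name. This is a sketch in Python, not a Helixmod feature; the function names are made up for illustration and the dump folder passed to it is an assumption about where your dump ended up.

```python
import re
from pathlib import Path


def references_register(shader_text, register="c190"):
    """Return True if the shader text mentions `register` as a whole token."""
    # The \b word boundaries mean c190 matches "c190.x" but not "c1900".
    pattern = r"\b" + re.escape(register) + r"\b"
    return re.search(pattern, shader_text) is not None


def shaders_using_register(dump_dir, register="c190"):
    """List files under dump_dir whose text references `register`."""
    hits = []
    for path in sorted(Path(dump_dir).rglob("*")):
        if path.is_file() and references_register(
                path.read_text(errors="ignore"), register):
            hits.append(str(path))
    return hits
```

Call it as shaders_using_register("path/to/your/dump"). A plain text search like this will miss registers that are only covered by a range in the "Registers:" comment table (for example a parameter starting at c188 with Size 4) or that are reached through relative addressing, so an empty result is not proof that the register is free.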
Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607
Latest 3Dmigoto Release
Bo3b's School for ShaderHackers
"8EF88061.txt.ps" (relevant part inside the second "else"):
"DX9Settings.ini":
Download of this fix: https://www.dropbox.com/s/fvze23rzce1os5x/Blazblue_CT_3D_Vision_fix_HUD_tests.7z?dl=0
Using "DefPSConst1 = 233" instead also didn't work, just in case.
Maybe somebody can help me with this Compute Shader ;) I know DarkStarSword is the MASTER of CS ^_^, but there are others who might know what to do :)
In short, this is a CS from Battlefield 1 (Frostbite 3 engine). We currently have fixes for:
- Dragon Age: Inquisition
- Battlefield 4
- Battlefield Hardline
- Star Wars: Battlefront
Yet in all these fixes the CS are disabled!
I did manage to fix it in 3D ;) But there is one tiny problem: the damn TILES. Basically the fix works, but at certain angles and distances...
So can anyone help me out? :)
If we manage to fix it once and for all, then we can get all the other games fixed as well, as the logic is pretty much the same ;) and the shaders are about 90% alike :)
Thank you in advance!
My website with my fixes and OpenGL to 3D Vision wrapper:
http://3dsurroundgaming.com
(If you like some of the stuff that I've done and want to donate something, you can do it with PayPal at tavyhome@gmail.com)
Setting handling=skip doesn't seem to work:-s
I bet this is where we need to extend the tiles or something similar...
I don't really have a clue what to try... Hacking around it just crashes the game, though :-s
No way to use the ShaderOverride 'skip' for ComputeShader. Just took a look at the code, and CS are not in that code sequence. They were there earlier, but I think that DarkStarSword pulled it out because if we skip CS, that typically leads to a crash.
Best bet to emulate skip would be to try to get an idea of how to disable a CS without crashing. Would require looking at the code and deciding what might be skippable. Pretty unclear though.
Take these tiled lighting shaders as an example: if we skip the CS, that might make the number of tiles = 0, which could easily break some later CS or PS that is not expecting a no-tiles scenario. Putting all the calculations on the GPU is bad for us because the tools for GPU debugging are weak (not just 3Dmigoto).
For the tiled lighting problem, you might ping DarkStarSword by PM. I know he's really booked up and probably has no time to read the forum. PMs send an email notification.
Also, you have probably already looked, but if not, check his github repo (not 3Dmigoto) for examples of CS fixes. https://github.com/DarkStarSword/3d-fixes
Cheers Bo3b!
I noticed that and had a feeling this was the case, so I went ahead and started "chopping" at the CS until I actually managed to disable it. (It has some secondary consequences of course - like you said - but no crashing, just missing other stuff).
Definitely NOT ideal, but the damn Frostbite 3 Computes still elude me :)) (The DAMN tiles... Fixing the rendering wasn't that bad, but making it "stick" at all angles is a PAIN ^_^ )
Thanks, I'll try to PM DSS about it and see if he has some time for me ;)
Thank you again!
PS: Also, if somebody is interested in FULL ASM matrix inverse code, let me know (I had some issues with the HLSL shader bound in ASM, so I decided to make one in ASM ;) Well, FXC did most of the work, I just helped it out ^_^).
Heh! I had actually built one in ASM for DHR for JC3. He wound up not needing it because he got the ASM linked HLSL working.
However, I would be interested in posting your ASM code to the Wiki for others to reference. The version I made had pieces culled because of fxc optimizations, which sounds like maybe you avoided.
Please post to wiki.bo3b.net, or post here and I can add it. Adding a new page for matrix inversions of different techniques would be good.
Is the injected inverse matrix + fixing code from Mirror's Edge that I sent you not working in the CS tile lights from BF1? It is easier to use the injected one.
If it doesn't work, that makes this an even stranger case.
Those CS lights have very strange behavior... I am almost sure it is a profile/driver thing... I already tried a few things with Mirror's Edge, but in some spots and at some angles it only renders in one eye.
@bo3b
It was for Mirror's Edge Catalyst... that inverted matrix also works fine! But it is easier to use the injected one, because the inputs and outputs are more evident/clear.
MY WEB
Helix Mod - Making 3D Better
My 3D Screenshot Gallery
Like my fixes? you can donate to Paypal: dhr.donation@gmail.com
I'll paste it here so you can put it on wiki.bo3b.net wherever you believe is the best place for it ;)
This code has all the optimisations removed, so it is as clean as the original HLSL source.
The original HLSL code looks like this:
In ASM the exact same code looks like this:
If you follow the ASM code and compare it with the HLSL, you can see that it is exactly the same, without any weird optimizations inside ;) (It also explains why the ASM code is so long ^_^).
Also, this code should be pasted at the beginning of main() to avoid overwriting any of the existing registers. In short, it should be the first thing that gets executed in that shader (I know it is obvious for you, but other readers might be confused).
The code works, as I am currently using it in the BF1 fix.
Thank you!
I haven't used 3Dmigoto yet, but I read in this thread that matrix inversion has to be done manually in the shader code. The commonly used code seems to be based on Cramer's rule. I think that code based on the Gauss-Jordan algorithm should be much faster on a GPU because it exploits the vector nature of the registers, especially if the code is executed for each pixel/vertex.
Has anybody tried that out yet, or is there some reason for using the current code?
For a (not debugged) code example see below.
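To illustrate the idea only (this is not the code that was attached to the post, and it is Python rather than shader code), here is a minimal Gauss-Jordan inversion sketch. Each elimination step is a whole-row multiply-add, which is the part that would map onto four-component registers. The function name is made up for illustration, and whether this is actually faster in a shader than the Cramer's rule version is untested here.

```python
def invert_gauss_jordan(m, eps=1e-12):
    """Invert a square matrix (list of rows) by Gauss-Jordan elimination."""
    n = len(m)
    # Build the augmented matrix [M | I], one list per row.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < eps:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot element becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Clear this column in every other row with one row-wide multiply-add.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]
```

A shader version would probably unroll the loops for the fixed 4x4 case, and the pivot search adds branching that the Cramer's rule code does not need, so the comparison is not one-sided.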
My 3D fixes with Helixmod for the Risen series on GitHub
Bo3b's School for Shaderhackers - starting point for your first 3D fix
1) The HLSL for matrix inversion works ;) but 3Dmigoto just makes a low beep when I put in the hash of the Compute Shader, so it doesn't work with the game. It doesn't like that hash for some reason, no matter what I do.
Thus, I had to put the matrix inverse in the shader code :)
2) The CS not working IS NOT A DRIVER ISSUE. It is actually the CS shaders that need fixing! I haven't made a FULL fix for them, but I see where things go wrong! Believe it or not, the driver is actually working as it should ;) It only affects the LEFT eye. What we know is that for the left eye we use "(-1) * separation". I suspect that the CS doesn't like the NEGATIVE value of the position and just discards it or does something weird with it!
@DHR:
I managed to hack it to some degree, but it is not a proper fix ;)
Sadly, I don't know much about Compute Shaders in 3D Vision. I know DSS is the expert, as he always helped me with them before. If you want, we can try to see what is wrong, but without a proper understanding I don't think we can come up with the true formula ;)
@mx-2:
- Thanks for that code. I didn't try it yet, but it definitely makes sense ;) Big thanks for your reply!!!
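On point 2, this is where the sign enters in the stereo correction commonly used in 3D Vision fixes: the clip-space x is shifted by separation * (w - convergence), with the sign flipped for the left eye. A small Python sketch of just that arithmetic (the function and parameter names are made up for illustration; it does not model the BF1 compute shader or its tiles):

```python
def stereo_x(clip_x, clip_w, separation, convergence, eye_sign):
    """Shift clip-space x for one eye; eye_sign is -1 (left) or +1 (right)."""
    return clip_x + eye_sign * separation * (clip_w - convergence)


# Same point, both eyes: the offsets are mirrored around the mono position.
left = stereo_x(0.0, 2.0, 0.1, 1.0, -1)
right = stereo_x(0.0, 2.0, 0.1, 1.0, +1)
print(left, right)
```

Anything behind the convergence plane moves to negative x offsets in the left eye, which would fit the suspicion that some later step mishandles negative values, but that remains a guess.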