Bo3b's School For Shaderhackers
I just want to say that the idea from the Helixmod feature list guide about analyzing the depth held in one of the shader's registers worked, at least for the effect I wanted (the Hakumen symbols), and without breaking bloom. Some other HUD elements were affected too, so I'll have to fine-tune it to distinguish between that symbol, the HUD and bloom.

I'm glad, because I couldn't get texture filtering to work: with that "8EF88061.txt.ps" file, none of these section names worked in DX9Settings.ini:

- [VS91D12237]
- [VS8EF88061]
- [PS8EF88061] (using a PS section for texture filtering isn't even documented, so it probably can't be done).

CPU: Intel Core i7 7700K @ 4.9GHz
Motherboard: Gigabyte Aorus GA-Z270X-Gaming 5
RAM: GSKILL Ripjaws Z 16GB 3866MHz CL18
GPU: MSI GeForce RTX 2080Ti Gaming X Trio
Monitor: Asus PG278QR
Speakers: Logitech Z506
Donations account: masterotakusuko@gmail.com

Posted 10/18/2016 08:24 PM   
A preset hotkey I made for changing a constant doesn't work! In DX9Settings.ini, changing the preset that has "UseByDef = true" takes effect, but in-game the hotkey doesn't change the constant value.

"8EF88061.txt.ps" (relevant part inside the second "else"):

//HUD, most of it. Blue color in second "tips". VS.
//
// Generated by Microsoft (R) D3DX9 Shader Compiler 9.15.779.0000
//
// Parameters:
//
// float fScreenHeigft;
// float fScreenWidth;
//
//
// Registers:
//
// Name Reg Size
// ------------- ----- ----
// fScreenWidth c0 1
// fScreenHeigft c1 1
//
//
// Default values:
//
// fScreenWidth
// c0 = { 0, 0, 0, 0 };
//
// fScreenHeigft
// c1 = { 0, 0, 0, 0 };
//

//preshader
//rcp c3.x, c0.x
//rcp c4.x, c1.x

// approximately 2 instructions used
//
// Generated by Microsoft (R) D3DX9 Shader Compiler 9.15.779.0000
//
// Parameters:
//
// float4x4 matWorld;
//
//
// Registers:
//
// Name Reg Size
// ------------ ----- ----
// matWorld c0 3
//
//
// Default values:
//
// matWorld
// c0 = { 0, 0, 0, 0 };
// c1 = { 0, 0, 0, 0 };
// c2 = { 0, 0, 0, 0 };
//

vs_3_0
def c5, 1, 0, -0.5, -1
def c220, 0.40, 0, 0.0625, 1
def c201, 1280, 768, 0, 0.003125
def c202, 0, 0.003, 0.001, 0.99069
dcl_2d s0
dcl_position o10
dcl_texcoord o0.xy
dcl_color o8
dcl_position v0
dcl_texcoord v1
dcl_color v2
rcp r15.x, c201.x
rcp r16.x, c201.y
mad r0, v0.xyzx, c5.xxxy, c5.yyyx
dp4 o10.z, r0, c2
mov r25.w, r0.w
dp4 r1.w, r0, c0
dp4 r0.w, r0, c1
add r1.w, r1.w, c5.z
add r2.w, r1.w, r1.w
add r1.w, r0.w, c5.z
mov r0.xw, c5

//mov r25, c251

//if_eq r25.x, c5.x

mov r26.x, c233.x
//Depth compare. >0.99069 for Hakumen. ==0 for bloom and most of the HUD.
if_gt r0.z, c202.w
mad r10.x, r2.w, r15.x, r0.w
texldl r11, c220.z, s0
mul r11.x, r11.x, c220.w
mul r12.x, r11.x, r11.y
mul r12.x, r12.x, c201.w
add r11.x, r11.x, -r12.x
add r10.x, r10.x, r11.x
mov o10.x, r10.x
else
if_gt r26.x, c202.x
//Investigate.
mad r10.x, r2.w, r15.x, r0.w
texldl r11, c220.z, s0
mul r11.x, r11.x, c220.w
mul r12.x, r11.x, r11.y
mul r12.x, r12.x, c201.w
add r11.x, r11.x, -r12.x
add r10.x, r10.x, r11.x
mov o10.x, r10.x
else
mad o10.x, r2.w, r15.x, r0.w
endif
endif
add r0.w, r1.w, r1.w
mad o10.y, r0.w, -r16.x, r0.x
mov o10.w, c5.x
mul o0.xy, v1, c5.xwzw
mov o8, v2

// approximately 14 instruction slots used


"DX9Settings.ini":

[General]
UseRenderedShaders = true
PresetsKeysList = 1;2;3;
//bCalcTexCRCatStart = true
//UseExtInterfaceOnly = true
DefVSConst1 = 233
//DefPSViewSizeConst = 210
//DefVSViewSizeConst = 254
//SkipSetScissorRect = true


[KEY1]
Key = 112
Presets = 1;
Type = 1


[KEY2]
Key = 113
Presets = 2;
Type = 1

[KEY3]
Key = 114
Presets = 3;4;5;6;7;8;
Type = 1


[PRES1]
SaveSepSettings = true
UseSepSettings = true
Convergence = 0x42fbb98d
Separation = 0x42c80000


[PRES2]
SaveSepSettings = true
UseSepSettings = true
Convergence = 0x439f9d59
Separation = 0x42c80000

[PRES3]
Const1 = 0x3f19999a


[PRES4]
Const1 = 0x3f4ccccd


[PRES5]
Const1 = 0x3f666666


[PRES6]
Const1 = 0x3f7851ec


[PRES7]
Const1 = 0x3f800000


[PRES8]
Const1 = 0x00000000
UseByDef = true


Download of this fix: https://www.dropbox.com/s/fvze23rzce1os5x/Blazblue_CT_3D_Vision_fix_HUD_tests.7z?dl=0


In case it matters, using "DefPSConst1 = 233" instead also didn't work.
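
For anyone reading the presets above: the Const1, Convergence and Separation values are the hexadecimal bit pattern of a 32-bit float, so they are easy to check by hand. A minimal Python sketch for converting in both directions (the helper names hex_to_float and float_to_hex are made up for illustration and are not part of Helixmod):

```python
import struct


def hex_to_float(bits: int) -> float:
    """Reinterpret a 32-bit integer as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def float_to_hex(value: float) -> str:
    """Return the 32-bit IEEE 754 bit pattern of value as 0x-prefixed hex."""
    return "0x%08x" % struct.unpack("<I", struct.pack("<f", value))[0]


# Const1 values copied from the [PRES3]..[PRES8] sections of the ini above.
presets = {
    "PRES3": 0x3F19999A,
    "PRES4": 0x3F4CCCCD,
    "PRES5": 0x3F666666,
    "PRES6": 0x3F7851EC,
    "PRES7": 0x3F800000,
    "PRES8": 0x00000000,
}

if __name__ == "__main__":
    for name, bits in presets.items():
        # Rounded only for display; the stored value is the nearest float32.
        print(name, round(hex_to_float(bits), 4))
```

Decoded this way, the KEY3 presets step Const1 through roughly 0.6, 0.8, 0.9, 0.97, 1.0 and 0, so every preset except PRES8 should satisfy the "if_gt r26.x, c202.x" test in the shader once the value actually reaches c233.x.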


Posted 10/18/2016 09:57 PM   
Hi guys;)

Maybe somebody can help me with this Compute Shader;) I know DarkStarSword is the MASTER of CS ^_^, but there are others who might know what to do:)

In short, this is a CS from Battlefield 1 (Frostbite 3 engine). We currently have fixes for:
- Dragon Age: Inquisition
- Battlefield 4
- Battlefield Hardline
- Star Wars: Battlefront

Yet in all of these fixes the CS is disabled!

I did manage to fix it in 3D ;) But there is one tiny problem: the damn TILES. Basically the fix works, but at certain angles and distances the tiles still cause trouble...

So can anyone help me out?:)
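
For anyone following the tile logic in the shader below: right after "dcl_thread_group 16, 16, 1", each thread group reads one packed uint from g_compactTileGridBuffer (t25), splits it into a high and a low 16-bit half, multiplies both by 16 and adds the thread ID to get the pixel it works on. A rough Python sketch of that addressing, as read from the assembly (calling the two halves tile X and tile Y is an interpretation, and decode_tile / pixel_coord are hypothetical names):

```python
from typing import Tuple

TILE_SIZE = 16  # matches "dcl_thread_group 16, 16, 1" in the shader


def decode_tile(packed: int) -> Tuple[int, int]:
    """Split one packed 32-bit entry into its two 16-bit halves.

    Mirrors:  ushr r1.x, r0.x, l(16)
              and  r1.yzw, r0.xxxx, l(0, 0x0000ffff, 0x0000ffff, 0x0000ffff)
    """
    tile_x = (packed >> 16) & 0xFFFF  # high half, assumed to be the tile column
    tile_y = packed & 0xFFFF          # low half, assumed to be the tile row
    return tile_x, tile_y


def pixel_coord(packed: int, tid_x: int, tid_y: int) -> Tuple[int, int]:
    """Pixel handled by thread (tid_x, tid_y) of the group given this entry.

    Mirrors:  imad r0.xyzw, r1.xyzw, l(16, 16, 16, 16), vThreadIDInGroup.xyyy
    """
    tile_x, tile_y = decode_tile(packed)
    return tile_x * TILE_SIZE + tid_x, tile_y * TILE_SIZE + tid_y


if __name__ == "__main__":
    # Tile column 3, row 2, thread (5, 7) inside the 16x16 group.
    print(pixel_coord((3 << 16) | 2, 5, 7))
```

The light range the loop walks is also loaded once per group (the store_raw to g0/g1 inside "if_z vThreadIDInGroupFlattened.x"), so every pixel in a tile shares that tile's light list. One possible reason for the artifacts, and this is only a guess, is that the stereo-shifted position can fall under lights that were culled for a neighbouring tile.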

// Global Illumination
// RUNs ONCE per eye...grrr


//
// Generated by Microsoft (R) HLSL Shader Compiler 6.3.9600.16384
//
// using 3Dmigoto v1.2.43 on Sat Oct 22 17:07:53 2016
//
//
// Buffer Definitions:
//
// cbuffer cbLightInfo
// {
//
// struct LocalIBLLightInfo
// {
//
// float3 pos; // Offset: 0
// float sqrAttenuationRadius; // Offset: 12
// float4x4 invTransform; // Offset: 16
// float3 extend; // Offset: 80
// float textureIndex; // Offset: 92
// float3 sideFadePositive; // Offset: 96
// float influenceFadeDistance; // Offset: 108
// float3 sideFadeNegative; // Offset: 112
// float IBLType; // Offset: 124
// float3 localOffset; // Offset: 128
// float influenceExpandDistance; // Offset: 140
// float3 influenceFadeNormal; // Offset: 144
// float skipSkyVisibilityAsAO; // Offset: 156
// float skipSkyVisibilityAsMask; // Offset: 160
// float3 unused; // Offset: 164
//
// } g_lightInfoLocalIBL[128]; // Offset: 0 Size: 22528
//
// }
//
// cbuffer cb0
// {
//
// float4x4 invViewProjectionMatrix; // Offset: 0 Size: 64
// float4 g_exposureMultipliers; // Offset: 64 Size: 16
// float localIblMipmapBias; // Offset: 80 Size: 4
// float screenAspectRatio; // Offset: 84 Size: 4 [unused]
// float2 invResolution; // Offset: 88 Size: 8
// float4 shadowMapSizeAndInvSize; // Offset: 96 Size: 16 [unused]
// uint forceSplitLighting; // Offset: 112 Size: 4 [unused]
// uint sssScatteringEnables; // Offset: 116 Size: 4 [unused]
// float volumetricShadowmapHalfTexelOffset;// Offset: 120 Size: 4 [unused]
// float volumetricShadowmapOneMinusHalfTexelOffset;// Offset: 124 Size: 4 [unused]
// float volumetricShadowmapInvMaxCount;// Offset: 128 Size: 4 [unused]
// float dynamicAOFactor; // Offset: 132 Size: 4 [unused]
// uint tileCountX; // Offset: 136 Size: 4
// uint pad1; // Offset: 140 Size: 4 [unused]
// float4x3 g_normalBasisTransforms[6];// Offset: 144 Size: 288
//
// }
//
// Resource bind info for g_lightCullInput
// {
//
// uint4 $Element; // Offset: 0 Size: 16
//
// }
//
// Resource bind info for g_lightIndexInput
// {
//
// uint $Element; // Offset: 0 Size: 4
//
// }
//
// Resource bind info for g_compactTileGridBuffer
// {
//
// uint $Element; // Offset: 0 Size: 4
//
// }
//
//
// Resource Bindings:
//
// Name Type Format Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// g_linearSampler sampler NA NA 0 1
// g_linearMipmapSampler sampler NA NA 1 1
// g_gbufferTexture0 texture float4 2d 0 1
// g_gbufferTexture1 texture float4 2d 1 1
// g_gbufferTexture2 texture float4 2d 2 1
// g_depthTexture texture float 2d 6 1
// g_specularLocalIBLsTexture texture float4 cubearray 7 1
// g_preIntegratedFGTexture texture float4 2d 8 1
// g_diffuseOcclusionTexture texture float 2d 10 1
// g_lightCullInput texture struct r/o 19 1
// g_lightIndexInput texture struct r/o 20 1
// g_compactTileGridBuffer texture struct r/o 25 1
// g_outputTexture0 UAV float4 2d 0 1
// cb0 cbuffer NA NA 0 1
// cbLightInfo cbuffer NA NA 1 1
//
//
//
// Input signature:
//
// Name Index Mask Register SysValue Format Used
// -------------------- ----- ------ -------- -------- ------- ------
// no Input
//
// Output signature:
//
// Name Index Mask Register SysValue Format Used
// -------------------- ----- ------ -------- -------- ------- ------
// no Output
cs_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb1[1408], dynamicIndexed
dcl_constantbuffer cb0[27], dynamicIndexed
dcl_sampler s0, mode_default
dcl_sampler s1, mode_default
dcl_resource_texture2d (float,float,float,float) t0
dcl_resource_texture2d (float,float,float,float) t1
dcl_resource_texture2d (float,float,float,float) t2
dcl_resource_texture2d (float,float,float,float) t6
dcl_resource_texturecubearray (float,float,float,float) t7
dcl_resource_texture2d (float,float,float,float) t8
dcl_resource_texture2d (float,float,float,float) t10
dcl_resource_structured t19, 16
dcl_resource_structured t20, 4
dcl_resource_structured t25, 4
dcl_uav_typed_texture2d (float,float,float,float) u0
dcl_input vThreadIDInGroupFlattened
dcl_input vThreadGroupID.x
dcl_input vThreadIDInGroup.xy

dcl_temps 60

// 3DMigoto StereoParams:
dcl_resource_texture2d (float,float,float,float) t120
dcl_resource_texture2d (float,float,float,float) t125
ld_indexable(texture2d)(float,float,float,float) r41.xyzw, l(0, 0, 0, 0), t120.xyzw
ld_indexable(texture2d)(float,float,float,float) r40.xyzw, l(0, 0, 0, 0), t125.xyzw

// Inverse
// ASM code for matrix inverse is here but is too damn long to paste

// store results for later use
mov r50.xyzw, r0.xyzw
mov r51.xyzw, r1.xyzw
mov r52.xyzw, r2.xyzw
mov r53.xyzw, r3.xyzw

dcl_tgsm_raw g0, 4
dcl_tgsm_raw g1, 4
dcl_thread_group 16, 16, 1
ld_structured_indexable(structured_buffer, stride=4)(mixed,mixed,mixed,mixed) r0.x, vThreadGroupID.x, l(0), t25.xxxx
ushr r1.x, r0.x, l(16)
and r1.yzw, r0.xxxx, l(0, 0x0000ffff, 0x0000ffff, 0x0000ffff)
imad r0.xyzw, r1.xyzw, l(16, 16, 16, 16), vThreadIDInGroup.xyyy
if_z vThreadIDInGroupFlattened.x
imad r1.x, r1.w, cb0[8].z, r1.x
ld_structured_indexable(structured_buffer, stride=16)(mixed,mixed,mixed,mixed) r1.xyzw, r1.x, l(0), t19.xyzw
ushr r2.xy, r1.yzyy, l(16, 16, 0, 0)
and r1.yzw, r1.yyzw, l(0, 0x0000ffff, 0x0000ffff, 0x0000ffff)
store_raw g1.x, l(0), r1.w
iadd r1.y, r2.x, r1.y
iadd r1.y, r1.z, r1.y
iadd r1.y, r2.y, r1.y
iadd r1.x, r1.y, r1.x
store_raw g0.x, l(0), r1.x
endif
sync_g_t
utof r1.xy, r0.xwxx
add r1.zw, r1.xxxy, l(0.000000, 0.000000, 0.500000, 0.500000)
mul r1.zw, r1.zzzw, cb0[5].zzzw
ftoi r2.xy, r1.xyxx
mov r2.zw, l(0,0,0,0)
ld_indexable(texture2d)(float,float,float,float) r3.xyzw, r2.xyww, t0.xyzw
ld_indexable(texture2d)(float,float,float,float) r4.xyzw, r2.xyww, t1.xyzw
ld_indexable(texture2d)(float,float,float,float) r5.xyz, r2.xyww, t2.yzwx
ld_indexable(texture2d)(float,float,float,float) r2.z, r2.xyzw, t6.yzxw
mul r1.x, r4.w, l(6.000000)
round_ne r1.x, r1.x
ftou r1.x, r1.x
mad r6.xy, r3.xyxx, l(2.000000, 2.000000, 0.000000, 0.000000), l(-1.000000, -1.000000, 0.000000, 0.000000)
dp2 r1.y, r6.xyxx, r6.xyxx
min r1.y, r1.y, l(1.000000)
add r1.y, -r1.y, l(1.000000)
sqrt r6.z, r1.y
imul null, r1.x, r1.x, l(3)
dp3 r7.x, r6.xyzx, cb0[r1.x + 9].xyzx
dp3 r7.y, r6.xyzx, cb0[r1.x + 10].xyzx
dp3 r7.z, r6.xyzx, cb0[r1.x + 11].xyzx
add r1.x, -r3.z, l(1.000000)
mul r1.y, r3.w, l(3.000000)
round_ne r1.y, r1.y
ftoi r1.y, r1.y
ieq r1.y, r1.y, l(1)
movc r1.y, r1.y, l(0), r5.x
mul r3.x, r5.y, r5.y
mul r3.y, r3.x, l(0.160000)
mad r3.xzw, -r3.xxxx, l(0.160000, 0.000000, 0.160000, 0.160000), r4.xxyz
mad r3.xyz, r1.yyyy, r3.xzwx, r3.yyyy
dp3 r1.y, r3.xyzx, l(0.330000, 0.330000, 0.330000, 0.000000)
mul_sat r1.y, r1.y, l(50.000000)
mul r4.y, r1.x, r1.x
mad r4.zw, r1.zzzw, l(0.000000, 0.000000, 2.000000, 2.000000), l(0.000000, 0.000000, -1.000000, -1.000000)
mul r2.xy, r4.zwzz, l(1.000000, -1.000000, 0.000000, 0.000000)
mov r2.w, l(1.000000)

// Original code
dp4 r6.x, r2.xyzw, cb0[0].xyzw
dp4 r6.y, r2.xyzw, cb0[1].xyzw
dp4 r6.z, r2.xyzw, cb0[2].xyzw
dp4 r2.x, r2.xyzw, cb0[3].xyzw
div r2.x, l(1.000000, 1.000000, 1.000000, 1.000000), r2.x

//Fix
// Multiply by normal matrix
mov r38.xyz, r6.xyz
mov r38.w, l(1.000000)
dp4 r39.x, r38.xyzw, r50.xyzw
dp4 r39.y, r38.xyzw, r51.xyzw
dp4 r39.z, r38.xyzw, r52.xyzw
dp4 r39.w, r38.xyzw, r53.xyzw

// r39.x -= stereo.x * iniParams.w;
mul r40.w, r40.x, l(3.0)
add r39.x, r39.x, -r40.w

// Apply inverse again
dp4 r38.x, r39.xyzw, cb0[0].xyzw
dp4 r38.y, r39.xyzw, cb0[1].xyzw
dp4 r38.z, r39.xyzw, cb0[2].xyzw
dp4 r38.w, r39.xyzw, cb0[3].xyzw
mov r6.xyz, r38.xyzw
//End Fix

mul r8.xyz, r2.xxxx, r6.xyzx
dp3 r2.y, -r8.xyzx, -r8.xyzx
rsq r2.y, r2.y
mul r2.yzw, r2.yyyy, -r8.xxyz
dp3 r3.w, -r2.yzwy, r7.xyzx
add r3.w, r3.w, r3.w
mad r5.xyw, r7.xyxz, -r3.wwww, -r2.yzyw
dp3_sat r4.x, r7.xyzx, r2.yzwy
sample_l_indexable(texture2d)(float,float,float,float) r1.z, r1.zwzz, t10.yzxw, s0, l(0.000000)
ld_raw r1.w, l(0), g0.xxxx
sample_l_indexable(texture2d)(float,float,float,float) r2.yz, r4.xyxx, t8.zxyw, s0, l(0.000000)
mul r1.y, r1.y, r2.z
mad r2.yzw, r3.xxyz, r2.yyyy, r1.yyyy
ld_raw r1.y, l(0), g1.xxxx
iadd r1.y, r1.w, r1.y
mad r3.x, -r1.x, r1.x, l(1.000000)
max r3.y, r3.x, l(0.000000)
sqrt r3.y, r3.y
mad r3.y, r1.x, r1.x, r3.y
mul r3.x, r3.y, r3.x
add r3.yzw, -r7.xxyz, r5.xxyw
mad r3.xyz, r3.xxxx, r3.yzwy, r7.xyzx
dp3 r3.w, r3.xyzx, r3.xyzx
mul r4.z, r3.w, l(4.000000)
eq r4.w, r3.w, l(0.000000)
add r3.w, r3.w, r3.w
min r5.x, r4.y, l(1.000000)
mad r5.y, r5.x, l(-2.000000), l(3.000000)
mul r5.x, r5.x, r5.x
mul r5.x, r5.x, r5.y
mad r5.y, r4.y, l(-16.000000), l(-1.000000)
exp r5.y, r5.y
add r5.w, -r5.z, l(1.000000)
mov r8.w, l(1.000000)
mov r9.xyz, l(0,0,0,0)
mov r6.w, l(0)
mov r7.w, r1.w
loop
ult r9.w, r7.w, r1.y
lt r10.x, r6.w, l(1.000000)
and r9.w, r9.w, r10.x
breakc_z r9.w
ld_structured_indexable(structured_buffer, stride=4)(mixed,mixed,mixed,mixed) r9.w, r7.w, l(0), t20.xxxx
imul null, r9.w, r9.w, l(11)
eq r10.x, l(0.000000), cb1[r9.w + 7].w
if_nz r10.x
mad r10.xyz, r6.xyzx, r2.xxxx, -cb1[r9.w + 0].xyzx
dp3 r10.w, r3.xyzx, r10.xyzx
add r11.x, r10.w, r10.w
dp3 r11.y, r10.xyzx, r10.xyzx
mad r11.z, -cb1[r9.w + 5].x, cb1[r9.w + 5].x, r11.y
mul r11.z, r4.z, r11.z
mad r11.x, r11.x, r11.x, -r11.z
lt r11.z, r11.x, l(0.000000)
or r11.z, r4.w, r11.z
if_nz r11.z
mov r11.w, l(0)
else
sqrt r11.x, r11.x
mad r10.w, -r10.w, l(2.000000), r11.x
div r11.w, r10.w, r3.w
endif
if_z r11.z
mad r12.xyz, r11.wwww, r3.xyzx, r8.xyzx
add r12.xyz, r12.xyzx, -cb1[r9.w + 0].xyzx
dp3 r10.w, r12.xyzx, r12.xyzx
sqrt r10.w, r10.w
mul r11.x, r4.y, r11.w
div r10.w, r11.x, r10.w
mad r11.x, r1.x, r1.x, -r10.w
mad r10.w, r5.x, r11.x, r10.w
add r11.xzw, r3.xxyz, -r12.xxyz
mad r12.xyz, r5.xxxx, r11.xzwx, r12.xyzx
mad r10.w, r10.w, cb0[5].x, l(1.000000)
log r10.w, r10.w
mov r12.w, cb1[r9.w + 5].w
sample_l_indexable(texturecubearray)(float,float,float,float) r12.xyzw, r12.xyzw, t7.xyzw, s1, r10.w
mul r11.xzw, r2.yyzw, r12.xxyz
max r10.w, r5.z, cb1[r9.w + 9].w
min r10.w, r1.z, r10.w
add r12.x, r4.x, r10.w
log r12.x, r12.x
mul r12.x, r5.y, r12.x
exp r12.x, r12.x
add r10.w, r10.w, r12.x
add_sat r10.w, r10.w, l(-1.000000)
mul r11.xzw, r10.wwww, r11.xxzw
max r10.w, r5.w, cb1[r9.w + 10].x
mul r10.w, r10.w, r12.w
sqrt r12.x, r11.y
add r12.y, cb1[r9.w + 5].x, cb1[r9.w + 8].w
add r12.y, -r12.x, r12.y
max r12.z, l(0.000100), cb1[r9.w + 6].w
div_sat r12.y, r12.y, r12.z
add r12.x, r12.x, -cb1[r9.w + 5].x
add_sat r12.x, r12.x, cb1[r9.w + 9].x
rsq r11.y, r11.y
mul r10.xyz, r10.xyzx, r11.yyyy
dp3 r10.x, r7.xyzx, r10.xyzx
mad_sat r10.x, r10.x, l(-2.500000), l(1.500000)
lt r10.y, l(0.000000), r12.x
movc r10.x, r10.y, r10.x, l(1.000000)
mul r10.x, r10.x, r12.y
mul r10.y, r10.x, r10.x
mad r10.x, -r10.x, l(2.000000), l(3.000000)
mul r10.x, r10.x, r10.y
mul r10.x, r10.x, r10.w
else
mov r11.xzw, l(0,0,0,0)
mov r10.x, l(0)
endif
else
dp4 r12.x, r8.xyzw, cb1[r9.w + 1].xyzw
dp4 r12.y, r8.xyzw, cb1[r9.w + 2].xyzw
dp4 r12.z, r8.xyzw, cb1[r9.w + 3].xyzw
dp3 r13.x, r3.xyzx, cb1[r9.w + 1].xyzx
dp3 r13.y, r3.xyzx, cb1[r9.w + 2].xyzx
dp3 r13.z, r3.xyzx, cb1[r9.w + 3].xyzx
div r10.yzw, l(1.000000, 1.000000, 1.000000, 1.000000), r13.xxyz
add r14.xyz, -r12.xyzx, -cb1[r9.w + 5].xyzx
mul r14.xyz, r10.yzwy, r14.xyzx
add r15.xyz, -r12.xyzx, cb1[r9.w + 5].xyzx
mul r10.yzw, r10.yyzw, r15.xxyz
min r15.xyz, r10.yzwy, r14.xyzx
max r10.yzw, r10.yyzw, r14.xxyz
max r11.y, r15.z, r15.y
max r11.y, r11.y, r15.x
min r10.y, r10.z, r10.y
min r10.y, r10.w, r10.y
lt r10.z, r11.y, r10.y
if_nz r10.z
mad r14.xyz, r10.yyyy, r13.xyzx, r12.xyzx
add r14.xyz, r14.xyzx, -cb1[r9.w + 8].xyzx
dp3 r10.z, r14.xyzx, r14.xyzx
sqrt r10.z, r10.z
mul r10.y, r4.y, r10.y
div r10.y, r10.y, r10.z
mad r10.z, r1.x, r1.x, -r10.y
mad r10.y, r5.x, r10.z, r10.y
add r13.xyz, r13.xyzx, -r14.xyzx
mad r13.xyz, r5.xxxx, r13.xyzx, r14.xyzx
mad r10.y, r10.y, cb0[5].x, l(1.000000)
log r10.y, r10.y
mov r13.w, cb1[r9.w + 5].w
sample_l_indexable(texturecubearray)(float,float,float,float) r14.xyzw, r13.xyzw, t7.xyzw, s1, r10.y
mul r10.yzw, r2.yyzw, r14.xxyz
max r11.y, r5.z, cb1[r9.w + 9].w
min r11.y, r1.z, r11.y
add r12.w, r4.x, r11.y
log r12.w, r12.w
mul r12.w, r5.y, r12.w
exp r12.w, r12.w
add r11.y, r11.y, r12.w
add_sat r11.y, r11.y, l(-1.000000)
mul r11.xzw, r10.yyzw, r11.yyyy
max r10.y, r5.w, cb1[r9.w + 10].x
mul r10.y, r10.y, r14.w
add r14.xyz, cb1[r9.w + 5].xyzx, cb1[r9.w + 8].wwww
add r15.xyz, -r14.xyzx, cb1[r9.w + 6].wwww
add r16.xyz, r14.xyzx, -cb1[r9.w + 6].wwww
lt r17.xyz, r12.xyzx, r15.xyzx
add r15.xyz, -r12.xyzx, r15.xyzx
mul r15.xyz, r15.xyzx, r15.xyzx
and r15.xyz, r15.xyzx, r17.xyzx
lt r17.xyz, r16.xyzx, r12.xyzx
add r16.xyz, r12.xyzx, -r16.xyzx
mul r16.xyz, r16.xyzx, r16.xyzx
and r16.xyz, r16.xyzx, r17.xyzx
dp3 r10.z, r15.xyzx, l(1.000000, 1.000000, 1.000000, 0.000000)
dp3 r10.w, r16.xyzx, l(1.000000, 1.000000, 1.000000, 0.000000)
add r10.z, r10.w, r10.z
sqrt r10.z, r10.z
max r10.w, l(0.000100), cb1[r9.w + 6].w
div r10.z, r10.z, r10.w
min r10.z, r10.z, l(1.000000)
add r10.z, -r10.z, l(1.000000)
add r12.xyz, |r12.xyzx|, -r14.xyzx
add_sat r12.xyz, r12.xyzx, cb1[r9.w + 9].xyzx
mad r14.xyz, r6.xyzx, r2.xxxx, -cb1[r9.w + 0].xyzx
dp3 r10.w, r14.xyzx, r14.xyzx
rsq r10.w, r10.w
mul r14.xyz, r10.wwww, r14.xyzx
dp3 r10.w, r7.xyzx, r14.xyzx
mad_sat r10.w, r10.w, l(-2.500000), l(1.500000)
dp3 r11.y, r12.xyzx, l(1.000000, 1.000000, 1.000000, 0.000000)
lt r11.y, l(0.000000), r11.y
movc r10.w, r11.y, r10.w, l(1.000000)
mul r10.z, r10.w, r10.z
dp3 r10.w, r13.xyzx, r13.xyzx
rsq r10.w, r10.w
mul r12.xyz, r10.wwww, r13.xyzx
mad r13.xyz, r12.xyzx, l(6.000000, 6.000000, 6.000000, 0.000000), l(-2.000000, -2.000000, -2.000000, 0.000000)
mul_sat r13.xyz, r13.xyzx, cb1[r9.w + 6].xyzx
mad r12.xyz, r12.xyzx, l(-6.000000, -6.000000, -6.000000, 0.000000), l(-2.000000, -2.000000, -2.000000, 0.000000)
mul_sat r12.xyz, r12.xyzx, cb1[r9.w + 7].xyzx
add r12.xyz, r12.xyzx, r13.xyzx
add r9.w, r12.y, r12.x
add r9.w, r12.z, r9.w
add r9.w, -r9.w, l(1.000000)
max r9.w, r9.w, l(0.000000)
mul r9.w, r9.w, r10.z
mul r10.z, r9.w, r9.w
mad r9.w, -r9.w, l(2.000000), l(3.000000)
mul r9.w, r9.w, r10.z
mul r10.x, r9.w, r10.y
else
mov r11.xzw, l(0,0,0,0)
mov r10.x, l(0)
endif
endif
add_sat r9.w, -r6.w, r10.x
mad r9.xyz, r11.xzwx, r9.wwww, r9.xyzx
add r9.w, r6.w, r9.w
min r6.w, r9.w, l(1.000000)
iadd r7.w, r7.w, l(1)
endloop
add r1.w, -r6.w, l(1.000000)
mul r2.xyz, r9.xyzx, cb0[4].zzzz
min r1.xyz, r2.xyzx, l(65504.000000, 65504.000000, 65504.000000, 0.000000)
store_uav_typed u0.xyzw, r0.xyzw, r1.xyzw
ret
// Approximately 280 instruction slots used


If we manage to fix it once and for all, then we can get all the other games fixed as well, since the logic is pretty much the same ;) and the shaders are about 90% identical :)
Thank you in advance!
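
For anyone following along, the //Fix block in the shader above does a round trip: transform the position forward, nudge x, then apply the inverse again. Here is a minimal CPU-side sketch of that idea in Python, useful for sanity-checking the matrices outside the game. It assumes r50..r53 hold the forward matrix and cb0[0..3] the inverse; the function names, the example matrices and the `clip_x_offset` parameter are hypothetical stand-ins, not anything taken from the game or from 3Dmigoto.

```python
def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-vector (one dp4 per row)."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def stereo_shift(pos, view_proj, inv_view_proj, clip_x_offset):
    """Round-trip a position through clip space, shifting clip-space x.

    pos            -- [x, y, z, w], the value the shader holds in r38
    view_proj      -- forward matrix (assumed to be r50..r53 in the shader)
    inv_view_proj  -- inverse matrix (cb0[0..3] in the shader)
    clip_x_offset  -- hypothetical stand-in for the "separation * 3.0" term
    """
    clip = mat_vec(view_proj, pos)        # "Multiply by normal matrix"
    clip[0] -= clip_x_offset              # r39.x -= offset
    return mat_vec(inv_view_proj, clip)   # "Apply inverse again"


# Hand-made example pair: INV_VIEW_PROJ is the exact inverse of VIEW_PROJ.
VIEW_PROJ = [[0.5, 0, 0, -0.5], [0, 0.5, 0, 0], [0, 0, 0.25, 0], [0, 0, 0, 1]]
INV_VIEW_PROJ = [[2, 0, 0, 1], [0, 2, 0, 0], [0, 0, 4, 0], [0, 0, 0, 1]]
```

With a zero offset the round trip should hand back the input unchanged, which is a quick way to confirm that the two matrices really are inverses of each other before blaming the stereo term.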

1x Palit RTX 2080Ti Pro Gaming OC(watercooled and overclocked to hell)
3x 3D Vision Ready Asus VG278HE monitors (5760x1080).
Intel i9 9900K (overclocked to 5.3 and watercooled ofc).
Asus Maximus XI Hero Mobo.
16 GB Team Group T-Force Dark Pro DDR4 @ 3600.
Lots of Disks:
- Raid 0 - 256GB Sandisk Extreme SSD.
- Raid 0 - WD Black - 2TB.
- SanDisk SSD PLUS 480 GB.
- Intel 760p 256GB M.2 PCIe NVMe SSD.
Creative Sound Blaster Z.
Windows 10 x64 Pro.
etc


My website with my fixes and OpenGL to 3D Vision wrapper:
http://3dsurroundgaming.com

(If you like some of the stuff that I've done and want to donate something, you can do it with PayPal at tavyhome@gmail.com)

Posted 10/22/2016 06:25 PM   
Also, can somebody tell me how I can skip a compute shader?
Setting handling=skip doesn't seem to work :-s

Posted 10/22/2016 06:34 PM   
Found another shader that is actually calculating the tiles.
I bet this is where we need to extend the tiles or something similar...

// Tiles CS???
//
// Generated by Microsoft (R) HLSL Shader Compiler 6.3.9600.16384
//
// using 3Dmigoto v1.2.43 on Sat Oct 22 19:46:23 2016
//
//
// Buffer Definitions:
//
// cbuffer cb0
// {
//
// float4x4 invViewProjectionMatrix; // Offset: 0 Size: 64 [unused]
// uint2 resolutionMinusOne; // Offset: 64 Size: 8 [unused]
// float2 invResolutionTimesBlockSize;// Offset: 72 Size: 8 [unused]
// uint tileCountX; // Offset: 80 Size: 4
// uint tileCountY; // Offset: 84 Size: 4 [unused]
// uint maxLightsInLightList; // Offset: 88 Size: 4 [unused]
// uint cullAppendPassMaterialPassType;// Offset: 92 Size: 4
// uint lightInfoCount_Punctual; // Offset: 96 Size: 4 [unused]
// uint lightInfoCount_PunctualShadow;// Offset: 100 Size: 4 [unused]
// uint lightInfoCount_Area; // Offset: 104 Size: 4 [unused]
// uint lightInfoCount_AreaShadow; // Offset: 108 Size: 4 [unused]
// uint lightInfoCount_LocalIBL; // Offset: 112 Size: 4 [unused]
// uint lightInfoCount_LocalPR; // Offset: 116 Size: 4 [unused]
// uint unused1; // Offset: 120 Size: 4 [unused]
// uint unused2; // Offset: 124 Size: 4 [unused]
// uint2 coarseToFineTileShift; // Offset: 128 Size: 8 [unused]
// uint2 coarseToFineInputTileRes; // Offset: 136 Size: 8 [unused]
//
// }
//
// Resource bind info for g_lightCullInput
// {
//
// uint4 $Element; // Offset: 0 Size: 16
//
// }
//
// Resource bind info for g_compactTileGridUav_Punctual
// {
//
// uint $Element; // Offset: 0 Size: 4
//
// }
//
// Resource bind info for g_compactTileGridUav_PunctualShadow
// {
//
// uint $Element; // Offset: 0 Size: 4
//
// }
//
// Resource bind info for g_compactTileGridUav_Area
// {
//
// uint $Element; // Offset: 0 Size: 4
//
// }
//
// Resource bind info for g_compactTileGridUav_AreaShadow
// {
//
// uint $Element; // Offset: 0 Size: 4
//
// }
//
// Resource bind info for g_compactTileGridUav_LocalIBL
// {
//
// uint $Element; // Offset: 0 Size: 4
//
// }
//
// Resource bind info for g_compactTileGridUav_LocalIBLAndPR
// {
//
// uint $Element; // Offset: 0 Size: 4
//
// }
//
// Resource bind info for g_compactTileGridUav_AllLights
// {
//
// uint $Element; // Offset: 0 Size: 4
//
// }
//
//
// Resource Bindings:
//
// Name Type Format Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// g_materialIdTileMaskTexture texture uint 2d 0 1
// g_lightCullInput texture struct r/o 1 1
// g_compactTileGridUav_Punctual UAV struct append 0 1
// g_compactTileGridUav_PunctualShadow UAV struct append 1 1
// g_compactTileGridUav_Area UAV struct append 2 1
// g_compactTileGridUav_AreaShadow UAV struct append 3 1
// g_compactTileGridUav_LocalIBL UAV struct append 4 1
// g_compactTileGridUav_LocalIBLAndPR UAV struct append 5 1
// g_compactTileGridUav_AllLights UAV struct append 6 1
// cb0 cbuffer NA NA 0 1
//
//
//
// Input signature:
//
// Name Index Mask Register SysValue Format Used
// -------------------- ----- ------ -------- -------- ------- ------
// no Input
//
// Output signature:
//
// Name Index Mask Register SysValue Format Used
// -------------------- ----- ------ -------- -------- ------- ------
// no Output
cs_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb0[6], immediateIndexed
dcl_resource_texture2d (uint,uint,uint,uint) t0
dcl_resource_structured t1, 16
dcl_uav_structured u0, 4
dcl_uav_structured u1, 4
dcl_uav_structured u2, 4
dcl_uav_structured u3, 4
dcl_uav_structured u4, 4
dcl_uav_structured u5, 4
dcl_uav_structured u6, 4
dcl_input vThreadID.xy
dcl_temps 3
dcl_thread_group 64, 1, 1
ult r0.x, vThreadID.x, cb0[5].x
if_nz r0.x
mov r0.xy, vThreadID.xyxx
mov r0.zw, l(0,0,0,0)
ld_indexable(texture2d)(uint,uint,uint,uint) r0.x, r0.xyzw, t0.xyzw
ieq r0.x, r0.x, l(1)
ieq r0.y, cb0[5].w, l(0)
ieq r0.x, r0.y, r0.x
if_nz r0.x
imad r0.x, vThreadID.y, cb0[5].x, vThreadID.x
ld_structured_indexable(structured_buffer, stride=16)(mixed,mixed,mixed,mixed) r0.xyz, r0.x, l(4), t1.xyzx
ushr r1.xyz, r0.xyzx, l(16, 16, 16, 0)
and r0.xyz, r0.xyzx, l(0x0000ffff, 0x0000ffff, 0x0000ffff, 0)
iadd r1.zw, r1.xxxz, r0.xxxz
iadd r0.w, r0.y, r1.z
iadd r0.w, r1.y, r0.w
ishl r1.z, vThreadID.x, l(16)
or r1.z, r1.z, vThreadID.y
if_nz r0.w
ieq r0.x, r0.w, r0.x
if_nz r0.x
imm_atomic_alloc r2.x, u0
store_structured u0.x, r2.x, l(0), r1.z
else
ieq r0.x, r0.w, r1.x
if_nz r0.x
imm_atomic_alloc r2.x, u1
store_structured u1.x, r2.x, l(0), r1.z
else
ieq r0.x, r0.w, r0.y
if_nz r0.x
imm_atomic_alloc r2.x, u2
store_structured u2.x, r2.x, l(0), r1.z
else
ieq r0.x, r0.w, r1.y
//if_nz r0.x
imm_atomic_alloc r2.x, u3
store_structured u3.x, r2.x, l(0), r1.z
//else
imm_atomic_alloc r2.x, u6
store_structured u6.x, r2.x, l(0), r1.z
//endif (commented out together with the if_nz/else above so the remaining if/endif pairs stay balanced)
endif
endif
endif
endif
if_nz r1.w
ieq r0.x, r0.z, r1.w
if_nz r0.x
imm_atomic_alloc r0.x, u4
store_structured u4.x, r0.x, l(0), r1.z
else
imm_atomic_alloc r0.x, u5
store_structured u5.x, r0.x, l(0), r1.z
endif
endif
endif
endif
ret
// Approximately 59 instruction slots used


Not really a clue what to try out... Hacking around in it just crashes the game though :-s
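
To make the ASM above easier to reason about, here is a rough Python model of what the shader appears to do per tile. It follows the branch structure with the commented-out if_nz/else treated as active. Which 16-bit half holds which light count is an assumption inferred from the UAV slot names in the bindings table, the function names are hypothetical, and the outer checks (tileCountX bound, material mask, pass type) are left out.

```python
def pack_tile(x, y):
    """Pack tile coordinates like the shader: ishl x by 16, then or with y."""
    return ((x << 16) | y) & 0xFFFFFFFF


def split16(word):
    """Return (low 16 bits, high 16 bits) of a 32-bit word (and / ushr)."""
    return word & 0xFFFF, (word >> 16) & 0xFFFF


def classify_tile(elem_y, elem_z, elem_w):
    """Return the names of the append lists this tile would be written to.

    elem_y/z/w are the three uints loaded from g_lightCullInput at byte
    offset 4. The count-to-list mapping below is assumed, not confirmed.
    """
    lo0, hi0 = split16(elem_y)
    lo1, hi1 = split16(elem_z)
    lo2, hi2 = split16(elem_w)
    lists = []

    total = lo0 + hi0 + lo1 + hi1          # r0.w in the shader
    if total:
        if total == lo0:
            lists.append("Punctual")        # u0
        elif total == hi0:
            lists.append("PunctualShadow")  # u1
        elif total == lo1:
            lists.append("Area")            # u2
        elif total == hi1:
            lists.append("AreaShadow")      # u3
        else:
            lists.append("AllLights")       # u6: more than one light type

    local = lo2 + hi2                       # r1.w in the shader
    if local:
        lists.append("LocalIBL" if lo2 == local else "LocalIBLAndPR")  # u4 / u5
    return lists
```

Read this way, a tile only lands in one of the single-type lists when every light in it is of that type; anything mixed goes to AllLights. So the tile coordinates themselves are just thread IDs packed into one uint, and the per-tile light counts come from g_lightCullInput, which some earlier pass must be filling in.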

Posted 10/22/2016 06:52 PM   
helifax said: Also, can somebody tell me how I can skip a compute shader?
Setting handling=skip doesn't seem to work :-s

There's no way to use the ShaderOverride 'skip' for a ComputeShader. I just took a look at the code, and CS are not in that code sequence. They were there earlier, but I think DarkStarSword pulled them out, because skipping a CS typically leads to a crash.

The best bet for emulating skip would be to figure out how to disable a CS without crashing. That would require looking at the code and deciding what might be skippable. Pretty unclear though.

Take these tiled lighting shaders as an example. If we skip the CS, that might make the number of tiles 0, which could easily break some later CS or PS that is not expecting a no-tiles scenario. Putting all the calculations on the GPU is bad for us because the tools for GPU debugging are weak (not just 3Dmigoto).


For the tiled lighting problem, you might ping DarkStarSword by PM. I know he's really booked up and probably has no time to read the forum. PMs send an email notification.

Also, you have probably already looked, but if not, check his GitHub repo (not 3Dmigoto) for examples of CS fixes. https://github.com/DarkStarSword/3d-fixes

Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607
Latest 3Dmigoto Release
Bo3b's School for ShaderHackers

Posted 10/23/2016 12:53 AM   
bo3b said:
helifax said: Also, can somebody tell me how I can skip a compute shader?
Setting handling=skip doesn't seem to work :-s

There's no way to use the ShaderOverride 'skip' for a ComputeShader. I just took a look at the code, and CS are not in that code sequence. They were there earlier, but I think DarkStarSword pulled them out, because skipping a CS typically leads to a crash.

The best bet for emulating skip would be to figure out how to disable a CS without crashing. That would require looking at the code and deciding what might be skippable. Pretty unclear though.

Take these tiled lighting shaders as an example. If we skip the CS, that might make the number of tiles 0, which could easily break some later CS or PS that is not expecting a no-tiles scenario. Putting all the calculations on the GPU is bad for us because the tools for GPU debugging are weak (not just 3Dmigoto).


For the tiled lighting problem, you might ping DarkStarSword by PM. I know he's really booked up and probably has no time to read the forum. PMs send an email notification.

Also, you have probably already looked, but if not, check his GitHub repo (not 3Dmigoto) for examples of CS fixes. https://github.com/DarkStarSword/3d-fixes


Cheers Bo3b!
I noticed that and had a feeling this was the case, so I went ahead and started "chopping" at the CS until I actually managed to disable it. (It has some secondary consequences of course, like you said, but no crashing, just other stuff missing.)
Definitely NOT ideal, but the damn Frostbite 3 computes still elude me :)) (The DAMN tiles... Fixing the rendering wasn't that bad, but making it "stick" at all angles is a PAIN ^_^ )

Thanks, I'll try to PM DSS about it and see if he has some time for me ;)

Thank you again!

PS: Also, if somebody is interested in FULL ASM matrix inverse code, let me know. (I had some issues with the HLSL shader bound in ASM, so I decided to make one in ASM ;) Well, FXC did most of the work, I just helped it out ^_^)

Posted 10/23/2016 01:05 AM   
helifax said: PS: Also, if somebody is interested in FULL ASM matrix inverse code, let me know. (I had some issues with the HLSL shader bound in ASM, so I decided to make one in ASM ;) Well, FXC did most of the work, I just helped it out ^_^)

Heh! I had actually built one in ASM for DHR for JC3. He wound up not needing it because he got the ASM linked HLSL working.

However, I would be interested in posting your ASM code to the Wiki for others to reference. The version I made had pieces culled because of fxc optimizations, which sounds like maybe you avoided.

Please post to wiki.bo3b.net, or post here and I can add it. Adding a new page for matrix inversions of different techniques would be good.

Posted 10/23/2016 01:22 AM   
@helifax
Is the injected inverse matrix + fix code from Mirror's Edge that I sent you not working in the CS tile lights from B1? It's easier to use the injected one.

If it doesn't work, that makes this an even stranger case.

Those CS lights behave very strangely... I'm almost sure it's a profile/driver issue. I already tried a few things with Mirror's Edge, but in some spots and at some angles they only render in one eye.


@bo3b
It was for Mirror's Edge Catalyst... that inverted matrix also works fine! But it's easier to use the injected one, because the inputs and outputs are clearer.

MY WEB

Helix Mod - Making 3D Better

My 3D Screenshot Gallery

Like my fixes? you can donate to Paypal: dhr.donation@gmail.com

Posted 10/23/2016 10:41 AM   
Sure thing, Bo3b.
I'll paste it here so you can put it on wiki.bo3b.net wherever you think is the best place for it ;)
This is the code with all the optimisations removed, so the ASM is as clean as the original HLSL source.

The original HLSL code looks like this:
//Work out Inverse
//...Variables
float4 a1, a2, a3, a4;
float4 b1, b2, b3, b4;
float det;
//...Original Matrix
a1 = g_invViewProjMatrix._m00_m10_m20_m30;
a2 = g_invViewProjMatrix._m01_m11_m21_m31;
a3 = g_invViewProjMatrix._m02_m12_m22_m32;
a4 = g_invViewProjMatrix._m03_m13_m23_m33;
//...Determinant
det = a1.x*(a2.y*(a3.z*a4.w - a3.w*a4.z) + a2.z*(a3.w*a4.y - a3.y*a4.w) + a2.w*(a3.y*a4.z - a3.z*a4.y));
det += a1.y*(a2.x*(a3.w*a4.z - a3.z*a4.w) + a2.z*(a3.x*a4.w - a3.w*a4.x) + a2.w*(a3.z*a4.x - a3.x*a4.z));
det += a1.z*(a2.x*(a3.y*a4.w - a3.w*a4.y) + a2.y*(a3.w*a4.x - a3.x*a4.w) + a2.w*(a3.x*a4.y - a3.y*a4.x));
det += a1.w*(a2.x*(a3.z*a4.y - a3.y*a4.z) + a2.y*(a3.x*a4.z - a3.z*a4.x) + a2.z*(a3.y*a4.x - a3.x*a4.y));
//...Inverse Matrix Elements
b1.x = a2.y*(a3.z*a4.w - a3.w*a4.z) + a2.z*(a3.w*a4.y - a3.y*a4.w) + a2.w*(a3.y*a4.z - a3.z*a4.y);
b1.y = a1.y*(a3.w*a4.z - a3.z*a4.w) + a1.z*(a3.y*a4.w - a3.w*a4.y) + a1.w*(a3.z*a4.y - a3.y*a4.z);
b1.z = a1.y*(a2.z*a4.w - a2.w*a4.z) + a1.z*(a2.w*a4.y - a2.y*a4.w) + a1.w*(a2.y*a4.z - a2.z*a4.y);
b1.w = a1.y*(a2.w*a3.z - a2.z*a3.w) + a1.z*(a2.y*a3.w - a2.w*a3.y) + a1.w*(a2.z*a3.y - a2.y*a3.z);
b2.x = a2.x*(a3.w*a4.z - a3.z*a4.w) + a2.z*(a3.x*a4.w - a3.w*a4.x) + a2.w*(a3.z*a4.x - a3.x*a4.z);
b2.y = a1.x*(a3.z*a4.w - a3.w*a4.z) + a1.z*(a3.w*a4.x - a3.x*a4.w) + a1.w*(a3.x*a4.z - a3.z*a4.x);
b2.z = a1.x*(a2.w*a4.z - a2.z*a4.w) + a1.z*(a2.x*a4.w - a2.w*a4.x) + a1.w*(a2.z*a4.x - a2.x*a4.z);
b2.w = a1.x*(a2.z*a3.w - a2.w*a3.z) + a1.z*(a2.w*a3.x - a2.x*a3.w) + a1.w*(a2.x*a3.z - a2.z*a3.x);
b3.x = a2.x*(a3.y*a4.w - a3.w*a4.y) + a2.y*(a3.w*a4.x - a3.x*a4.w) + a2.w*(a3.x*a4.y - a3.y*a4.x);
b3.y = a1.x*(a3.w*a4.y - a3.y*a4.w) + a1.y*(a3.x*a4.w - a3.w*a4.x) + a1.w*(a3.y*a4.x - a3.x*a4.y);
b3.z = a1.x*(a2.y*a4.w - a2.w*a4.y) + a1.y*(a2.w*a4.x - a2.x*a4.w) + a1.w*(a2.x*a4.y - a2.y*a4.x);
b3.w = a1.x*(a2.w*a3.y - a2.y*a3.w) + a1.y*(a2.x*a3.w - a2.w*a3.x) + a1.w*(a2.y*a3.x - a2.x*a3.y);
b4.x = a2.x*(a3.z*a4.y - a3.y*a4.z) + a2.y*(a3.x*a4.z - a3.z*a4.x) + a2.z*(a3.y*a4.x - a3.x*a4.y);
b4.y = a1.x*(a3.y*a4.z - a3.z*a4.y) + a1.y*(a3.z*a4.x - a3.x*a4.z) + a1.z*(a3.x*a4.y - a3.y*a4.x);
b4.z = a1.x*(a2.z*a4.y - a2.y*a4.z) + a1.y*(a2.x*a4.z - a2.z*a4.x) + a1.z*(a2.y*a4.x - a2.x*a4.y);
b4.w = a1.x*(a2.y*a3.z - a2.z*a3.y) + a1.y*(a2.z*a3.x - a2.x*a3.z) + a1.z*(a2.x*a3.y - a2.y*a3.x);
b1.xyzw /= det;
b2.xyzw /= det;
b3.xyzw /= det;
b4.xyzw /= det;
//End Inverse


In ASM the exact same code looks like this:
// Declare how many registers we use
// The code uses registers from r38 to r53.
dcl_temps 60

// 3DMigoto StereoParams:
dcl_resource_texture1d (float,float,float,float) t120
dcl_resource_texture2d (float,float,float,float) t125
ld_indexable(texture1d)(float,float,float,float) r41.xyzw, l(0, 0, 0, 0), t120.xyzw
ld_indexable(texture2d)(float,float,float,float) r40.xyzw, l(0, 0, 0, 0), t125.xyzw

// Inverse
// cb0[0], etc is the inverseMatrix
mov r0.xyzw, cb0[0].xyzw
mov r1.xyzw, cb0[1].xyzw
mov r2.xyzw, cb0[2].xyzw
mov r3.xyzw, cb0[3].xyzw
mul r4.x, r2.z, r3.w
mul r4.y, r2.w, r3.z
mov r4.y, -r4.y
add r4.x, r4.y, r4.x
mul r4.x, r1.y, r4.x
mul r4.y, r2.w, r3.y
mul r4.z, r2.y, r3.w
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r1.z, r4.y
add r4.x, r4.y, r4.x
mul r4.y, r2.y, r3.z
mul r4.z, r2.z, r3.y
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r1.w, r4.y
add r4.x, r4.y, r4.x
mul r4.x, r0.x, r4.x
mul r4.y, r2.w, r3.z
mul r4.z, r2.z, r3.w
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r1.x, r4.y
mul r4.z, r2.x, r3.w
mul r4.w, r2.w, r3.x
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.z, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r2.z, r3.x
mul r4.w, r2.x, r3.z
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.w, r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.y, r4.y
add r4.x, r4.y, r4.x
mul r4.y, r2.y, r3.w
mul r4.z, r2.w, r3.y
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r1.x, r4.y
mul r4.z, r2.w, r3.x
mul r4.w, r2.x, r3.w
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.y, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r2.x, r3.y
mul r4.w, r2.y, r3.x
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.w, r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.z, r4.y
add r4.x, r4.y, r4.x
mul r4.y, r2.z, r3.y
mul r4.z, r2.y, r3.z
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r1.x, r4.y
mul r4.z, r2.x, r3.z
mul r4.w, r2.z, r3.x
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.y, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r2.y, r3.x
mul r4.w, r2.x, r3.y
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.z, r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.w, r4.y
add r4.x, r4.y, r4.x
mul r4.y, r2.z, r3.w
mul r4.z, r2.w, r3.z
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r1.y, r4.y
mul r4.z, r2.w, r3.y
mul r4.w, r2.y, r3.w
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.z, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r2.y, r3.z
mul r4.w, r2.z, r3.y
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.w, r4.z
add r5.x, r4.z, r4.y
mul r4.y, r2.w, r3.z
mul r4.z, r2.z, r3.w
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.y, r4.y
mul r4.z, r2.y, r3.w
mul r4.w, r2.w, r3.y
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.z, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r2.z, r3.y
mul r4.w, r2.y, r3.z
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.w, r4.z
add r5.y, r4.z, r4.y
mul r4.y, r1.z, r3.w
mul r4.z, r1.w, r3.z
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.y, r4.y
mul r4.z, r1.w, r3.y
mul r4.w, r1.y, r3.w
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.z, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r1.y, r3.z
mul r4.w, r1.z, r3.y
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.w, r4.z
add r5.z, r4.z, r4.y
mul r4.y, r1.w, r2.z
mul r4.z, r1.z, r2.w
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.y, r4.y
mul r4.z, r1.y, r2.w
mul r4.w, r1.w, r2.y
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.z, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r1.z, r2.y
mul r4.w, r1.y, r2.z
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.w, r4.z
add r5.w, r4.z, r4.y
mul r4.y, r2.w, r3.z
mul r4.z, r2.z, r3.w
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r1.x, r4.y
mul r4.z, r2.x, r3.w
mul r4.w, r2.w, r3.x
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.z, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r2.z, r3.x
mul r4.w, r2.x, r3.z
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.w, r4.z
add r6.x, r4.z, r4.y
mul r4.y, r2.z, r3.w
mul r4.z, r2.w, r3.z
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.x, r4.y
mul r4.z, r2.w, r3.x
mul r4.w, r2.x, r3.w
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.z, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r2.x, r3.z
mul r4.w, r2.z, r3.x
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.w, r4.z
add r6.y, r4.z, r4.y
mul r4.y, r1.w, r3.z
mul r4.z, r1.z, r3.w
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.x, r4.y
mul r4.z, r1.x, r3.w
mul r4.w, r1.w, r3.x
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.z, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r1.z, r3.x
mul r4.w, r1.x, r3.z
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.w, r4.z
add r6.z, r4.z, r4.y
mul r4.y, r1.z, r2.w
mul r4.z, r1.w, r2.z
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.x, r4.y
mul r4.z, r1.w, r2.x
mul r4.w, r1.x, r2.w
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.z, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r1.x, r2.z
mul r4.w, r1.z, r2.x
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.w, r4.z
add r6.w, r4.z, r4.y
mul r4.y, r2.y, r3.w
mul r4.z, r2.w, r3.y
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r1.x, r4.y
mul r4.z, r2.w, r3.x
mul r4.w, r2.x, r3.w
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.y, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r2.x, r3.y
mul r4.w, r2.y, r3.x
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r1.w, r4.z
add r7.x, r4.z, r4.y
mul r4.y, r2.w, r3.y
mul r4.z, r2.y, r3.w
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.x, r4.y
mul r4.z, r2.x, r3.w
mul r4.w, r2.w, r3.x
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.y, r4.z
add r4.y, r4.z, r4.y
mul r4.z, r2.y, r3.x
mul r4.w, r2.x, r3.y
mov r4.w, -r4.w
add r4.z, r4.w, r4.z
mul r4.z, r0.w, r4.z
add r7.y, r4.z, r4.y
mul r4.y, r1.y, r3.w
mul r4.z, r1.w, r3.y
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.x, r4.y
mul r4.z, r1.w, r3.x
mul r3.w, r1.x, r3.w
mov r3.w, -r3.w
add r3.w, r3.w, r4.z
mul r3.w, r0.y, r3.w
add r3.w, r3.w, r4.y
mul r4.y, r1.x, r3.y
mul r4.z, r1.y, r3.x
mov r4.z, -r4.z
add r4.y, r4.z, r4.y
mul r4.y, r0.w, r4.y
add r7.z, r3.w, r4.y
mul r3.w, r1.w, r2.y
mul r4.y, r1.y, r2.w
mov r4.y, -r4.y
add r3.w, r3.w, r4.y
mul r3.w, r0.x, r3.w
mul r2.w, r1.x, r2.w
mul r1.w, r1.w, r2.x
mov r1.w, -r1.w
add r1.w, r1.w, r2.w
mul r1.w, r0.y, r1.w
add r1.w, r1.w, r3.w
mul r2.w, r1.y, r2.x
mul r3.w, r1.x, r2.y
mov r3.w, -r3.w
add r2.w, r2.w, r3.w
mul r0.w, r0.w, r2.w
add r7.w, r0.w, r1.w
mul r0.w, r2.z, r3.y
mul r1.w, r2.y, r3.z
mov r1.w, -r1.w
add r0.w, r0.w, r1.w
mul r0.w, r0.w, r1.x
mul r1.w, r2.x, r3.z
mul r2.w, r2.z, r3.x
mov r2.w, -r2.w
add r1.w, r1.w, r2.w
mul r1.w, r1.w, r1.y
add r0.w, r0.w, r1.w
mul r1.w, r2.y, r3.x
mul r2.w, r2.x, r3.y
mov r2.w, -r2.w
add r1.w, r1.w, r2.w
mul r1.w, r1.w, r1.z
add r8.x, r0.w, r1.w
mul r0.w, r2.y, r3.z
mul r1.w, r2.z, r3.y
mov r1.w, -r1.w
add r0.w, r0.w, r1.w
mul r0.w, r0.w, r0.x
mul r1.w, r2.z, r3.x
mul r2.w, r2.x, r3.z
mov r2.w, -r2.w
add r1.w, r1.w, r2.w
mul r1.w, r0.y, r1.w
add r0.w, r0.w, r1.w
mul r1.w, r2.x, r3.y
mul r2.w, r2.y, r3.x
mov r2.w, -r2.w
add r1.w, r1.w, r2.w
mul r1.w, r0.z, r1.w
add r8.y, r0.w, r1.w
mul r0.w, r1.z, r3.y
mul r1.w, r1.y, r3.z
mov r1.w, -r1.w
add r0.w, r0.w, r1.w
mul r0.w, r0.w, r0.x
mul r1.w, r1.x, r3.z
mul r2.w, r1.z, r3.x
mov r2.w, -r2.w
add r1.w, r1.w, r2.w
mul r1.w, r0.y, r1.w
add r0.w, r0.w, r1.w
mul r1.w, r1.y, r3.x
mul r2.w, r1.x, r3.y
mov r2.w, -r2.w
add r1.w, r1.w, r2.w
mul r1.w, r0.z, r1.w
add r8.z, r0.w, r1.w
mul r0.w, r1.y, r2.z
mul r1.w, r1.z, r2.y
mov r1.w, -r1.w
add r0.w, r0.w, r1.w
mul r0.x, r0.w, r0.x
mul r0.w, r1.z, r2.x
mul r1.z, r1.x, r2.z
mov r1.z, -r1.z
add r0.w, r0.w, r1.z
mul r0.y, r0.w, r0.y
add r0.x, r0.y, r0.x
mul r0.y, r1.x, r2.y
mul r0.w, r1.y, r2.x
mov r0.w, -r0.w
add r0.y, r0.w, r0.y
mul r0.y, r0.y, r0.z
add r8.w, r0.y, r0.x
div r0.xyzw, r5.xyzw, r4.xxxx
div r1.xyzw, r6.xyzw, r4.xxxx
div r2.xyzw, r7.xyzw, r4.xxxx
div r3.xyzw, r8.xyzw, r4.xxxx

// Store results for later use as r0-r4 are most
// likely to be used by the default shader code.
// r50 equivalent of matrix._m00_m01_m02_m03
// r51 equivalent of matrix._m10_m11_m12_m13
// r52 equivalent of matrix._m20_m21_m22_m23
// r53 equivalent of matrix._m30_m31_m32_m33

mov r50.xyzw, r0.xyzw
mov r51.xyzw, r1.xyzw
mov r52.xyzw, r2.xyzw
mov r53.xyzw, r3.xyzw


If you follow the ASM code and compare it with the HLSL, you can see that it is exactly the same, without any weird optimizations inside ;) (It also explains why the ASM code is so long ^_^).

Also, this code should be pasted at the beginning of main() to avoid overwriting any of the existing registers. In short, it should be the first thing that gets executed in that shader (I know it is obvious to you, but other readers might be confused).
The code works, as I am currently using it in my BF1 fix.
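
For anyone who wants to sanity-check a hand-expanded inverse like this before pasting it into a shader, here is a minimal CPU-side sketch in Python. It is not part of the fix itself: it builds the inverse the same way (cofactors divided by the determinant) and checks that matrix times inverse gives the identity. The test matrix is made up, and all function names are only for illustration.

```python
def minor(m, row, col):
    """Return m with the given row and column removed."""
    return [[v for j, v in enumerate(r) if j != col]
            for i, r in enumerate(m) if i != row]


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j))
               for j in range(len(m)))


def inverse(m):
    """Inverse as adjugate / determinant."""
    n = len(m)
    d = det(m)
    if d == 0:
        raise ValueError("matrix is singular")
    # inverse[i][j] = cofactor(j, i) / det (note the swapped indices).
    return [[(-1) ** (i + j) * det(minor(m, j, i)) / d for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain n x n matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_identity(m, tol=1e-9):
    """True if m equals the identity matrix within tol."""
    return all(abs(v - (1.0 if i == j else 0.0)) < tol
               for i, row in enumerate(m) for j, v in enumerate(row))


# Made-up, diagonally dominant (hence invertible) test matrix.
M = [[4.0, 1.0, 0.0, 1.0],
     [1.0, 5.0, 1.0, 0.0],
     [0.0, 1.0, 6.0, 1.0],
     [1.0, 0.0, 1.0, 7.0]]

print(is_identity(matmul(M, inverse(M))))  # prints True
```

A single wrong index in one cofactor term is enough to make this check fail, which makes it a cheap way to catch copy/paste slips in the long expressions.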

Thank you!

1x Palit RTX 2080Ti Pro Gaming OC(watercooled and overclocked to hell)
3x 3D Vision Ready Asus VG278HE monitors (5760x1080).
Intel i9 9900K (overclocked to 5.3 and watercooled ofc).
Asus Maximus XI Hero Mobo.
16 GB Team Group T-Force Dark Pro DDR4 @ 3600.
Lots of Disks:
- Raid 0 - 256GB Sandisk Extreme SSD.
- Raid 0 - WD Black - 2TB.
- SanDisk SSD PLUS 480 GB.
- Intel 760p 256GB M.2 PCIe NVMe SSD.
Creative Sound Blaster Z.
Windows 10 x64 Pro.
etc


My website with my fixes and OpenGL to 3D Vision wrapper:
http://3dsurroundgaming.com

(If you like some of the stuff that I've done and want to donate something, you can do it with PayPal at tavyhome@gmail.com)

Posted 10/23/2016 10:57 AM   
Hi all,

I haven't used 3DMigoto yet, but I read in this thread that matrix inversion has to be done manually in the shader code. The commonly used code seems to be based on Cramer's rule. I think that code based on the Gauss-Jordan algorithm should be much faster on a GPU because it operates on whole vector registers at once, especially if the code is executed for each pixel/vertex.

Has anybody tried that out yet, or is there a reason for the code currently in use?
For a (not debugged) code example, see below.

// inverseMatrix.asm
// Matrix inversion with Gauss-Jordan elimination algorithm

// input matrix is in r0-r3
// output will be in r4-r7
// r8, r9 are used as temporary registers
// c200 = (1,0,0,0) is required

// r0.x r0.y r0.z r0.w | r4.x, r4.y, r4.z, r4.w
// r1.x r1.y r1.z r1.w | r5.x, r5.y, r5.z, r5.w
// r2.x r2.y r2.z r2.w | r6.x, r6.y, r6.z, r6.w
// r3.x r3.y r3.z r3.w | r7.x, r7.y, r7.z, r7.w

// Init registers
def c200, 1, 0, 0, 0
mov r4, c200.xyzw
mov r5, c200.wxyz
mov r6, c200.zwxy
mov r7, c200.yzwx

// First column
rcp r8.x, r0.x
mul r8.y, r8.x, r1.x
mul r9, r0, r8.y
sub r1, r1, r9
mul r9, r4, r8.y
sub r5, r5, r9

mul r8.y, r8.x, r2.x
mul r9, r0, r8.y
sub r2, r2, r9
mul r9, r4, r8.y
sub r6, r6, r9

mul r8.y, r8.x, r3.x
mul r9, r0, r8.y
sub r3, r3, r9
mul r9, r4, r8.y
sub r7, r7, r9

// Second column
rcp r8.x, r1.y
mul r8.y, r8.x, r2.y
mul r9, r1, r8.y
sub r2, r2, r9
mul r9, r5, r8.y
sub r6, r6, r9

mul r8.y, r8.x, r3.y
mul r9, r1, r8.y
sub r3, r3, r9
mul r9, r5, r8.y
sub r7, r7, r9

// Third column
rcp r8.x, r2.z
mul r8.y, r8.x, r3.z
mul r9, r2, r8.y
sub r3, r3, r9
mul r9, r6, r8.y
sub r7, r7, r9

// Normalize r3.w
rcp r8.x, r3.w
mul r3, r3, r8.x
mul r7, r7, r8.x

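// Fourth column (upper part)
// The right-hand side is updated first: the left-hand update
// zeroes rN.w, which is still needed as the multiplier.
mul r9, r7, r2.w
sub r6, r6, r9
mul r9, r3, r2.w
sub r2, r2, r9

mul r9, r7, r1.w
sub r5, r5, r9
mul r9, r3, r1.w
sub r1, r1, r9

mul r9, r7, r0.w
sub r4, r4, r9
mul r9, r3, r0.w
sub r0, r0, r9

// Normalize third row
rcp r8.x, r2.z
mul r2, r2, r8.x
mul r6, r6, r8.x

// Third column (upper part)
mul r9, r6, r1.z
sub r5, r5, r9
mul r9, r2, r1.z
sub r1, r1, r9

mul r9, r6, r0.z
sub r4, r4, r9
mul r9, r2, r0.z
sub r0, r0, r9

// Normalize second row
rcp r8.x, r1.y
mul r1, r1, r8.x
mul r5, r5, r8.x

// Second column (upper part)
mul r9, r5, r0.y
sub r4, r4, r9
mul r9, r1, r0.y
sub r0, r0, r9

// Normalize first row
rcp r8.x, r0.x
mul r0, r0, r8.x
mul r4, r4, r8.x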
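
Since the shader version is hard to test on its own, here is a minimal CPU-side sketch of the same elimination order in Python, useful as a reference to compare results against. It is an illustration under stated assumptions, not a translation of the ASM: like the shader code it does no row swaps, so it assumes every pivot it meets is non-zero. The test matrix and the helper names are made up.

```python
def gauss_jordan_inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination without pivoting.

    Assumes every pivot met along the way is non-zero (no row swaps).
    """
    n = len(matrix)
    left = [[float(v) for v in row] for row in matrix]        # r0-r3
    right = [[1.0 if i == j else 0.0 for j in range(n)]       # r4-r7
             for i in range(n)]

    def subtract_scaled(target, source, factor):
        # One whole-row operation, i.e. one mul + one sub on a register.
        return [t - factor * s for t, s in zip(target, source)]

    # Forward elimination: clear the entries below each pivot.
    for col in range(n):
        inv_pivot = 1.0 / left[col][col]
        for row in range(col + 1, n):
            # Multiplier is saved before the row is modified.
            factor = left[row][col] * inv_pivot
            left[row] = subtract_scaled(left[row], left[col], factor)
            right[row] = subtract_scaled(right[row], right[col], factor)

    # Back substitution: normalise each pivot row, then clear above it.
    for col in range(n - 1, -1, -1):
        inv_pivot = 1.0 / left[col][col]
        left[col] = [v * inv_pivot for v in left[col]]
        right[col] = [v * inv_pivot for v in right[col]]
        for row in range(col):
            factor = left[row][col]
            left[row] = subtract_scaled(left[row], left[col], factor)
            right[row] = subtract_scaled(right[row], right[col], factor)

    return right


def matmul(a, b):
    """Plain n x n matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_identity(m, tol=1e-9):
    """True if m equals the identity matrix within tol."""
    return all(abs(v - (1.0 if i == j else 0.0)) < tol
               for i, row in enumerate(m) for j, v in enumerate(row))


# Made-up, diagonally dominant (hence invertible) test matrix.
M = [[4.0, 1.0, 0.0, 1.0],
     [1.0, 5.0, 1.0, 0.0],
     [0.0, 1.0, 6.0, 1.0],
     [1.0, 0.0, 1.0, 7.0]]

print(is_identity(matmul(M, gauss_jordan_inverse(M))))  # prints True
```

Two details are easy to get wrong when writing this by hand: the multiplier for a row has to be read before that row is changed, and each pivot row has to be scaled to a unit pivot before it is used to clear the entries above it.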
DHR said:@helifax
The injected inverse matrix + fixing code from Mirror Edge i send you is not working in the CS Tile Lights from B1?. Is easier to use the injected one.

If don't work will make this a more strange case.

Those CS Lights have a very strange behavior....i almost sure is a profile/driver stuff....I already try i few thing with Mirror Edge, but in some spot and some angles only renders in one eye.


1) The HLSL for matrix inversion works ;) but 3DMigoto just makes a low beep when I put in the hash of the compute shader, so it doesn't work with the game. It doesn't like that hash for some reason, no matter what I do.
Thus, I had to put the matrix inverse in the shader code :)

2) The CS not working IS NOT A DRIVER ISSUE. It is actually the CS shaders that need fixing! I haven't made a FULL fix for them, but I see where things go wrong! Believe it or not, the driver is actually working as it should ;) It only affects the LEFT eye. What we know is that for the left eye we use "(-1) * separation". I expect that the CS doesn't like the NEGATIVE value of the position and just discards it or does something weird with it!
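
To make the sign of the left-eye term concrete, here is a tiny Python sketch of the stereo correction that shader fixes commonly apply to clip-space x. The formula (x += eye * separation * (w - convergence)) is the commonly quoted 3D Vision one and is an assumption about what is meant here; it is not taken from the BF1 compute shaders, and the function and parameter names are made up.

```python
def stereo_corrected_x(clip_x, clip_w, separation, convergence, eye):
    """Return clip-space x shifted for one eye.

    eye is -1 for the left eye and +1 for the right eye, so the
    left-eye offset is the negated right-eye offset.
    """
    return clip_x + eye * separation * (clip_w - convergence)


# Made-up values: a point behind the convergence plane (w > convergence).
x, w, separation, convergence = 0.25, 4.0, 0.5, 2.0
left = stereo_corrected_x(x, w, separation, convergence, -1)
right = stereo_corrected_x(x, w, separation, convergence, +1)
print(left, right)  # prints -0.75 1.25
```

With these numbers the left-eye result goes negative while the right-eye result stays positive, which is the kind of asymmetry that would hurt code that assumes non-negative positions.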

@DHR:
I managed to hack it to some degree, but it is not a proper fix ;)
Sadly, I don't know much about compute shaders in 3D Vision. I know DSS is the expert, as he has always helped me with them before. If you want, we can try to see what is wrong, but without a proper understanding I don't think we can come up with the true formula ;)


@mx-2:
- Thanks for that code. I haven't tried it yet, but it definitely makes sense ;) Big thanks for your reply!!!


Posted 10/23/2016 07:49 PM   
About my Blazblue problems and after trying a few things, I've noticed something: my "8EF88061.txt.ps" file isn't receiving constants (that I want to use with a hotkey). Even if I write (important parts of DX9Settings.ini here):

DefVSConst1 = 190
DefPSConst1 = 190

[KEY3]
Key = 114
Presets = 3;4;
Type = 1

[PRES3]
Const1 = 0x3f800000

[PRES4]
Const1 = 0x00000000
UseByDef = true


The shader always treats c190.x as 0, even if I give both presets the "0x3f800000" value. Changing the other operand of the "if_eq" in the shader (if_eq r27.x, c190.x), where r27.x holds a value defined in the shader that can be 0 or 1 for testing, changed the effects ingame correctly. So the "Const1" value isn't reaching the shader.

Is this normal?
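
As a side note for anyone editing presets like these: the Const values appear to be the raw 32-bit patterns of IEEE-754 floats, so 0x3f800000 is 1.0 and 0x00000000 is 0.0. A small Python sketch for converting in both directions (the helper names are made up and are not part of Helixmod):

```python
import struct


def float_to_hex(value):
    """Return the single-precision bit pattern of value, e.g. '0x3f800000'."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return "0x%08x" % bits


def hex_to_float(text):
    """Return the float encoded by a 32-bit hex string such as '0x3f800000'."""
    return struct.unpack("<f", struct.pack("<I", int(text, 16)))[0]


print(float_to_hex(1.0))           # prints 0x3f800000
print(hex_to_float("0x00000000"))  # prints 0.0
```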

CPU: Intel Core i7 7700K @ 4.9GHz
Motherboard: Gigabyte Aorus GA-Z270X-Gaming 5
RAM: GSKILL Ripjaws Z 16GB 3866MHz CL18
GPU: MSI GeForce RTX 2080Ti Gaming X Trio
Monitor: Asus PG278QR
Speakers: Logitech Z506
Donations account: masterotakusuko@gmail.com

Posted 10/23/2016 08:00 PM   
Hi Bo3b, I was wondering if there is something I can do to decrease the time it takes the wrapper to dump the shaders?

In any Frostbite 3 game it takes 15 minutes to load a game... I am currently only using the "export_hlsl=2" option.
Normally it dumps around 20k shaders when a level loads...
If you can think of anything I can do to decrease this insane time, please let me know!

Cheers!


Posted 10/24/2016 08:14 PM   
masterotaku said:About my Blazblue problems and after trying a few things, I've noticed something: my "8EF88061.txt.ps" file isn't receiving constants (that I want to use with a hotkey). Even if I write (important parts of DX9Settings.ini here):

DefVSConst1 = 190
DefPSConst1 = 190

[KEY3]
Key = 114
Presets = 3;4;
Type = 1

[PRES3]
Const1 = 0x3f800000

[PRES4]
Const1 = 0x00000000
UseByDef = true


The shader always treats c190.x as 0, even if I give both presets the "0x3f800000" value. Changing the other operand of the "if_eq" in the shader (if_eq r27.x, c190.x), where r27.x holds a value defined in the shader that can be 0 or 1 for testing, changed the effects ingame correctly. So the "Const1" value isn't reaching the shader.

Is this normal?

Not normal, should work.

Try a different constant register. It might be a conflict with a register that the game's own shaders use. It might also be worth doing a full dump of all shaders to see if c190 is in use.
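
One way to do that check once the shaders are dumped is a small script that lists every dumped file mentioning the register. The sketch below is in Python; the directory name and the *.txt pattern are assumptions about where and how the dumps are stored, so adjust them. It only finds literal mentions: a multi-register constant (for example a matrix declared at c188 with size 3) would also cover c190 without naming it.

```python
import re
from pathlib import Path


def shaders_using_register(dump_dir, register, pattern="*.txt"):
    """Return the dumped shader files under dump_dir that mention register.

    The match is on whole words, so 'c190' matches 'c190.x' but not
    'c1900' or 'c19'.
    """
    word = re.compile(r"\b" + re.escape(register) + r"\b")
    hits = []
    for path in sorted(Path(dump_dir).rglob(pattern)):
        text = path.read_text(errors="ignore")
        if word.search(text):
            hits.append(path)
    return hits


if __name__ == "__main__":
    # "Dumps" is a hypothetical folder name; point it at your own dump.
    for shader in shaders_using_register("Dumps", "c190"):
        print(shader)
```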

I also seem to vaguely remember that there is some sort of conflict when both the VS and PS use the same register, or maybe on specific games it doesn't always work. Might be worth trying a forum search for something like that.

Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607
Latest 3Dmigoto Release
Bo3b's School for ShaderHackers

Posted 10/25/2016 10:12 AM   