Rise of the Tomb Raider (3D Vision Ready Support)
DarkStarSword said:
helifax said: Did anyone notice that with Bloom enabled we get a "bleed" image in the left eye from the Main Menu? I know it's not a big deal and we could play perfectly well with Bloom disabled, but I was wondering whether you guys see the same thing, and how we could fix it, hmm...
Sounds like a mono render target that needs to be forced to Stereo (same thing in Mad Max IIRC). I find that frame analysis tends to be the easiest and most reliable way to find these.

helifax said: Can anyone tell me how to manually fix these lines in HLSL, please? :)

r2.xyz = v1.xyz * r1.www;
// Known bad code for instruction (needs manual fix):
ld_structured_indexable(structured_buffer, stride=72)(mixed,mixed,mixed,mixed) r0.w, r0.w, l(4), t5.xxxx
r0.w = g_sTrilinear[]..swiz;


I think I gave you a hand on one of these before:
https://forums.geforce.com/default/topic/766890/3d-vision/bo3bs-school-for-shaderhackers/post/4739720/#4739720

Something like:
r0.w = <whatever the StructuredBuffer for t5 is called>[r0.w].<whatever is 4 bytes into the structure>.<possibly with a swizzle to select the right offset if it's in a float4, uint4 or similar>;


r2.y = (((uint)r0.y << 0) & bitmask.y) | ((uint)r0.x & ~bitmask.y);
r0.xy = (uint2)v0.yx;
// Needs manual fix for instruction:
imm_atomic_iadd r1.x, u3, l(0, 0, 0, 0), l(1)
InterlockedAdd?(dest, value, orig_value);


I think that will be something like:
uint oldval;
<whatever the buffer for u3 is called>.InterlockedAdd(0, 1, oldval);
r1.x = oldval;



Big thanks! I had actually bookmarked that post for future use, and I even looked at it yesterday, but I missed it >:<...
Now I feel silly for asking again :))

I think I was able to fix the first one. I've attached the whole shader below. Could you please take a look and check whether I understood it correctly?

// ---- Created with 3Dmigoto v1.2.27 on Fri Jan 29 19:41:02 2016
// HAIR ON

cbuffer cbPerScene : register(b8)
{
row_major float4x4 g_mViewProj : packoffset(c0);
row_major float4x4 g_mInvViewProj : packoffset(c4);
float3 g_vEye : packoffset(c8);
float g_FiberAlpha : packoffset(c8.w);
float2 g_WinSize : packoffset(c9);
float g_FiberRadius : packoffset(c9.z);
float g_fvFov : packoffset(c9.w);
float2 g_zMinMax : packoffset(c10);
float g_zWriteValue : packoffset(c10.z);
float g_NoiseScale : packoffset(c10.w);
float4 g_ScreenExtents : packoffset(c11);
float3 g_dirtColor : packoffset(c12);
float g_dirtLevel : packoffset(c12.w);
uint g_baseGroupId : packoffset(c13);
uint g_dirtGroupId : packoffset(c13.y);
float g_widthCurveOverride : packoffset(c13.z);
float g_alphaCurveOverride : packoffset(c13.w);
float3 g_SlaveOffsetsReferencePosition : packoffset(c14);
float g_SlaveOffsetsScale : packoffset(c14.w);
}

SamplerState g_sTrilinear_s : register(s1);
Texture2D<float> g_txNoise : register(t3);
Texture2D<float> g_txDirt : register(t4);
// Do we need to comment out the old one???
//StructuredBuffer<GroupRenderData> GroupRenderData : register(t5);


// 3Dmigoto declarations
#define cmp -
Texture1D<float4> IniParams : register(t120);
Texture2D<float4> StereoParams : register(t125);


struct GroupRenderDataType
{
struct HairGroupRenderData
{
float noiseFrequency; // Offset: 0
float noiseIntensity; // Offset: 4
struct curve
{
float4 samples03; // Offset: 8
float4 samples47; // Offset: 24
} thicknessCurve; // Offset: 8
struct curve
{
float4 samples03; // Offset: 40
float4 samples47; // Offset: 56
} alphaCurve; // Offset: 40
} $Element; // Offset: 0 Size: 72
};
// New Struct Buffer definition
StructuredBuffer<GroupRenderDataType> GroupRenderData : register(t5);

void main(
float4 v0 : SV_POSITION0,
float4 v1 : TANGENT0,
float4 v2 : TEXCOORD0,
float3 v3 : TEXCOORD1,
float4 v4 : COLOR0,
out float4 o0 : SV_Target0)
{
float4 r0,r1,r2,r3;
uint4 bitmask, uiDest;
float4 fDest;

r0.x = dot(g_ScreenExtents.zz, v0.xx);
r0.xz = float2(-1,-1) + r0.xx;
r1.x = dot(g_ScreenExtents.ww, v0.yy);
r0.yw = float2(1,1) + -r1.xx;
r1.xyzw = g_WinSize.xyxy * v2.xyzw;
r0.xyzw = -r0.xyzw * g_WinSize.xyxy + r1.xyzw;
r0.x = dot(r0.xy, r0.xy);
r0.y = dot(r0.zw, r0.zw);
r0.zw = v2.xy * g_WinSize.xy + -r1.zw;
r0.z = dot(r0.zw, r0.zw);
r0.xyz = sqrt(r0.xyz);
r0.w = cmp(r0.x >= r0.z);
r0.z = cmp(r0.y >= r0.z);
r1.xy = r0.wz ? 1.000000 : 0;
r0.z = dot(r1.xy, r1.xy);
r0.z = cmp(r0.z != 0.000000);
r0.z = r0.z ? -1 : 1;
r0.x = min(r0.x, r0.y);
r0.x = min(1, r0.x);
r0.x = r0.z * r0.x + 1;
r0.x = v3.z * r0.x;
r0.y = cmp(0.00776470592 < r0.x);
r1.x = v1.w;
r1.y = 0.5;
r0.z = g_txNoise.Sample(g_sTrilinear_s, r1.xy).x;
if (r0.y != 0) {
r0.y = g_txDirt.Sample(g_sTrilinear_s, v3.xy).x;
r0.w = (uint)v4.w;
r1.x = -g_dirtLevel + 1;
r0.y = -r1.x + r0.y;
r1.x = g_dirtLevel + 0.00100000005;
r0.y = saturate(r0.y / r1.x);
r1.xyz = r0.yyy * g_dirtColor.xyz + -v4.xyz;
r1.xyz = saturate(r0.yyy * r1.xyz + v4.xyz);
r1.w = dot(v1.xyz, v1.xyz);
r1.w = rsqrt(r1.w);
r2.xyz = v1.xyz * r1.www;

// Known bad code for instruction (needs manual fix):
//ld_structured_indexable(structured_buffer, stride=72)(mixed,mixed,mixed,mixed) r0.w, r0.w, l(4), t5.xxxx
// is this correct?!?!
r0.w = GroupRenderData[r0.w].Element.y;


r0.w = g_sTrilinear[]..swiz;
r0.z = r0.w * r0.z;
r0.y = cmp(0.5 < r0.y);
r3.xy = g_dirtGroupId;
r0.w = v4.w + r3.y;
r0.y = r0.y ? r3.x : r0.w;
r2.xyz = saturate(r2.xyz * float3(0.5,0.5,0.5) + float3(0.5,0.5,0.5));
r2.xyz = float3(255,255,255) * r2.xyz;
r2.xyz = (uint3)r2.xyz;
r2.yz = (uint2)r2.yz << int2(16,8);
r0.w = mad((int)r2.x, 0x01000000, (int)r2.y);
r0.w = (int)r0.w + (int)r2.z;
r0.x = saturate(-r0.x * 0.5 + 1);
r0.x = 255 * r0.x;
r0.xy = (uint2)r0.xy;
r2.x = (int)r0.w + (int)r0.x;
r1.xyz = float3(127,127,63) * r1.xyz;
r1.xyz = (uint3)r1.xyz;
r0.xw = (uint2)r1.yz << int2(18,12);
r0.x = mad((int)r1.x, 0x02000000, (int)r0.x);
r0.x = (int)r0.x + (int)r0.w;
r0.z = saturate(4 * r0.z);
r0.z = 63 * r0.z;
r0.z = (uint)r0.z;
r0.z = (uint)r0.z << 6;
r0.x = (int)r0.x + (int)r0.z;
bitmask.y = ((~(-1 << 6)) << 0) & 0xffffffff;
r2.y = (((uint)r0.y << 0) & bitmask.y) | ((uint)r0.x & ~bitmask.y);
r0.xy = (uint2)v0.yx;
// Needs manual fix for instruction:
imm_atomic_iadd r1.x, u3, l(0, 0, 0, 0), l(1)
InterlockedAdd?(dest, value, orig_value);
r0.xy = (uint2)r0.xy;
r0.x = r0.x * g_WinSize.x + r0.y;
r0.x = 4 * r0.x;
r0.x = (uint)r0.x;
}

/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
//
// Generated by Microsoft (R) HLSL Shader Compiler 6.3.9600.16384
//
// using 3Dmigoto v1.2.27 on Fri Jan 29 19:41:02 2016
//
//
// Note: shader requires additional functionality:
// Early depth-stencil
//
//
// Buffer Definitions:
//
// cbuffer cbPerScene
// {
//
// row_major float4x4 g_mViewProj; // Offset: 0 Size: 64 [unused]
// row_major float4x4 g_mInvViewProj; // Offset: 64 Size: 64 [unused]
// float3 g_vEye; // Offset: 128 Size: 12 [unused]
// float g_FiberAlpha; // Offset: 140 Size: 4 [unused]
// float2 g_WinSize; // Offset: 144 Size: 8
// float g_FiberRadius; // Offset: 152 Size: 4 [unused]
// float g_fvFov; // Offset: 156 Size: 4 [unused]
// float2 g_zMinMax; // Offset: 160 Size: 8 [unused]
// float g_zWriteValue; // Offset: 168 Size: 4 [unused]
// float g_NoiseScale; // Offset: 172 Size: 4 [unused]
// float4 g_ScreenExtents; // Offset: 176 Size: 16
// float3 g_dirtColor; // Offset: 192 Size: 12
// float g_dirtLevel; // Offset: 204 Size: 4
// uint g_baseGroupId; // Offset: 208 Size: 4
// uint g_dirtGroupId; // Offset: 212 Size: 4
// float g_widthCurveOverride; // Offset: 216 Size: 4 [unused]
// float g_alphaCurveOverride; // Offset: 220 Size: 4 [unused]
// float3 g_SlaveOffsetsReferencePosition;// Offset: 224 Size: 12 [unused]
// float g_SlaveOffsetsScale; // Offset: 236 Size: 4 [unused]
//
// }
//
// Resource bind info for GroupRenderData
// {
//
// struct HairGroupRenderData
// {
//
// float noiseFrequency; // Offset: 0
// float noiseIntensity; // Offset: 4
//
// struct curve
// {
//
// float4 samples03; // Offset: 8
// float4 samples47; // Offset: 24
//
// } thicknessCurve; // Offset: 8
//
// struct curve
// {
//
// float4 samples03; // Offset: 40
// float4 samples47; // Offset: 56
//
// } alphaCurve; // Offset: 40
//
// } $Element; // Offset: 0 Size: 72
//
// }
//
// Resource bind info for HairElementsUAV
// {
//
// struct ABufferNode
// {
//
// uint uPackedData0; // Offset: 0
// uint uPackedData1; // Offset: 4
// uint uPackedData2_Next; // Offset: 8
// float fDepth; // Offset: 12
//
// } $Element; // Offset: 0 Size: 16
//
// }
//
// Resource bind info for HairElementsCounterUAV
// {
//
// uint $Element; // Offset: 0 Size: 4
//
// }
//
//
// Resource Bindings:
//
// Name Type Format Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// g_sTrilinear sampler NA NA 1 1
// g_txNoise texture float 2d 3 1
// g_txDirt texture float 2d 4 1
// GroupRenderData texture struct r/o 5 1
// HairPixelHeadUAV UAV byte r/w 1 1
// HairElementsUAV UAV struct r/w 2 1
// HairElementsCounterUAV UAV struct r/w 3 1
// cbPerScene cbuffer NA NA 8 1
//
//
//
// Input signature:
//
// Name Index Mask Register SysValue Format Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_POSITION 0 xyzw 0 POS float xyz
// TANGENT 0 xyzw 1 NONE float xyzw
// TEXCOORD 0 xyzw 2 NONE float xyzw
// TEXCOORD 1 xyz 3 NONE float xyz
// COLOR 0 xyzw 4 NONE float xyzw
//
//
// Output signature:
//
// Name Index Mask Register SysValue Format Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Target 0 xyzw 0 TARGET float xyzw
//
ps_5_0
dcl_globalFlags refactoringAllowed | forceEarlyDepthStencil
dcl_constantbuffer cb8[14], immediateIndexed
dcl_sampler s1, mode_default
dcl_resource_texture2d (float,float,float,float) t3
dcl_resource_texture2d (float,float,float,float) t4
dcl_resource_structured t5, 72
dcl_uav_raw u1
dcl_uav_structured u2, 16
dcl_uav_structured u3, 4
dcl_input_ps_siv linear noperspective v0.xyz, position
dcl_input_ps linear v1.xyzw
dcl_input_ps linear v2.xyzw
dcl_input_ps linear v3.xyz
dcl_input_ps linear v4.xyzw
dcl_output o0.xyzw
dcl_temps 4
dp2 r0.x, cb8[11].zzzz, v0.xxxx
add r0.xz, r0.xxxx, l(-1.000000, 0.000000, -1.000000, 0.000000)
dp2 r1.x, cb8[11].wwww, v0.yyyy
add r0.yw, -r1.xxxx, l(0.000000, 1.000000, 0.000000, 1.000000)
mul r1.xyzw, v2.xyzw, cb8[9].xyxy
mad r0.xyzw, -r0.xyzw, cb8[9].xyxy, r1.xyzw
dp2 r0.x, r0.xyxx, r0.xyxx
dp2 r0.y, r0.zwzz, r0.zwzz
mad r0.zw, v2.xxxy, cb8[9].xxxy, -r1.zzzw
dp2 r0.z, r0.zwzz, r0.zwzz
sqrt r0.xyz, r0.xyzx
ge r0.w, r0.x, r0.z
ge r0.z, r0.y, r0.z
and r1.xy, r0.wzww, l(0x3f800000, 0x3f800000, 0, 0)
dp2 r0.z, r1.xyxx, r1.xyxx
ne r0.z, r0.z, l(0.000000)
movc r0.z, r0.z, l(-1.000000), l(1.000000)
min r0.x, r0.y, r0.x
min r0.x, r0.x, l(1.000000)
mad r0.x, r0.z, r0.x, l(1.000000)
mul r0.x, r0.x, v3.z
lt r0.y, l(0.00776470592), r0.x
mov r1.x, v1.w
mov r1.y, l(0.500000)
sample_indexable(texture2d)(float,float,float,float) r0.z, r1.xyxx, t3.yzxw, s1
if_nz r0.y
sample_indexable(texture2d)(float,float,float,float) r0.y, v3.xyxx, t4.yxzw, s1
ftou r0.w, v4.w
add r1.x, l(1.000000), -cb8[12].w
add r0.y, r0.y, -r1.x
add r1.x, l(0.001000), cb8[12].w
div_sat r0.y, r0.y, r1.x
mad r1.xyz, r0.yyyy, cb8[12].xyzx, -v4.xyzx
mad_sat r1.xyz, r0.yyyy, r1.xyzx, v4.xyzx
dp3 r1.w, v1.xyzx, v1.xyzx
rsq r1.w, r1.w
mul r2.xyz, r1.wwww, v1.xyzx
ld_structured_indexable(structured_buffer, stride=72)(mixed,mixed,mixed,mixed) r0.w, r0.w, l(4), t5.xxxx
mul r0.z, r0.z, r0.w
lt r0.y, l(0.500000), r0.y
utof r3.xy, cb8[13].yxyy
add r0.w, r3.y, v4.w
movc r0.y, r0.y, r3.x, r0.w
mad_sat r2.xyz, r2.xyzx, l(0.500000, 0.500000, 0.500000, 0.000000), l(0.500000, 0.500000, 0.500000, 0.000000)
mul r2.xyz, r2.xyzx, l(255.000000, 255.000000, 255.000000, 0.000000)
ftou r2.xyz, r2.xyzx
ishl r2.yz, r2.yyzy, l(0, 16, 8, 0)
imad r0.w, r2.x, l(0x01000000), r2.y
iadd r0.w, r0.w, r2.z
mad_sat r0.x, -r0.x, l(0.500000), l(1.000000)
mul r0.x, r0.x, l(255.000000)
ftou r0.xy, r0.xyxx
iadd r2.x, r0.w, r0.x
mul r1.xyz, r1.xyzx, l(127.000000, 127.000000, 63.000000, 0.000000)
ftou r1.xyz, r1.xyzx
ishl r0.xw, r1.yyyz, l(18, 0, 0, 12)
imad r0.x, r1.x, l(0x02000000), r0.x
iadd r0.x, r0.x, r0.w
mul_sat r0.z, r0.z, l(4.000000)
mul r0.z, r0.z, l(63.000000)
ftou r0.z, r0.z
ishl r0.z, r0.z, l(6)
iadd r0.x, r0.x, r0.z
bfi r2.y, l(6), l(0), r0.y, r0.x
ftou r0.xy, v0.yxyy
imm_atomic_iadd r1.x, u3, l(0, 0, 0, 0), l(1)
utof r0.xy, r0.xyxx
mad r0.x, r0.x, cb8[9].x, r0.y
mul r0.x, r0.x, l(4.000000)
ftou r0.x, r0.x
imm_atomic_exch r0.x, u1, r0.x, r1.x
bfi r2.z, l(26), l(0), r0.x, l(0xfc000000)
mov r2.w, v0.z
store_structured u2.xyzw, r1.x, l(0), r2.xyzw
endif
mov o0.xyzw, l(1.000000,0,0,1.000000)
ret
// Approximately 77 instruction slots used

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
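A possible shape for the first fix is sketched below, assuming the "Resource bind info for GroupRenderData" dump in the comment block is accurate. HLSL does not accept the `$Element` wrapper that the reflection dump prints, so the member struct is declared on its own, and the buffer and its element type need distinct names, which means only one declaration at t5 (the old one can stay commented out). In `ld_structured_indexable(...) r0.w, r0.w, l(4), t5.xxxx` the second operand is the element index (r0.w, set earlier by `ftou r0.w, v4.w`) and `l(4)` is a byte offset into the 72-byte element, which lands on `noiseIntensity`, so no `.y` swizzle is involved. The `g_sTrilinear[]..swiz` line looks like the decompiler's failed attempt at the same instruction and would be removed. This has not been tested in the game.

```hlsl
// Sketch: t5 declaration rebuilt from the reflection comment
// (dcl_resource_structured t5, 72). Structured buffers are tightly
// packed, so the float4 members sit at the byte offsets listed there.
struct curve
{
    float4 samples03;          // +0 within the curve
    float4 samples47;          // +16 within the curve
};

struct HairGroupRenderData
{
    float noiseFrequency;      // Offset: 0
    float noiseIntensity;      // Offset: 4
    curve thicknessCurve;      // Offset: 8
    curve alphaCurve;          // Offset: 40
};                             // Size: 72

// Single declaration at t5; element type and buffer now have distinct names.
StructuredBuffer<HairGroupRenderData> GroupRenderData : register(t5);

// Inside main(), replacing the "needs manual fix" block:
//   ld_structured_indexable(...) r0.w, r0.w, l(4), t5.xxxx
//   index = r0.w (a uint held in a float temp), byte offset 4 = noiseIntensity
r0.w = GroupRenderData[(uint)r0.w].noiseIntensity;
// ...and the garbled "r0.w = g_sTrilinear[]..swiz;" line is deleted.
r0.z = r0.w * r0.z;   // unchanged: noise sample scaled by noiseIntensity
```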



For the atomic add operation, I still don't understand how to fix it. I can't find any u3 in the whole file (the same file attached above).
Could you please elaborate a bit more?
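On the u3 question: the decompiler declared none of the UAVs, so u3 never appears in the HLSL. The Resource Bindings table and the `dcl_uav_*` lines identify them: HairPixelHeadUAV (raw, u1), HairElementsUAV (structured, stride 16, u2) and HairElementsCounterUAV (structured, stride 4, u3). Since u3 is a structured buffer rather than a byte-address buffer, the sketch uses the `InterlockedAdd(buffer[0], 1, old)` intrinsic rather than a `.InterlockedAdd()` method. The decompiled `main()` also ends early: the `imm_atomic_exch`, the last `bfi`, the `store_structured` and the final `mov o0` have no HLSL counterpart, and `r3.xy = g_dirtGroupId;` drops g_baseGroupId (the asm is `utof r3.xy, cb8[13].yxyy`). Below is a sketch of the tail of the shader read directly from the asm, untested in game. It keeps the packed values in `uint` variables because the decompiled code routes 32-bit bit patterns through `float` temporaries (e.g. `r2.x = (int)r0.w + (int)r0.x;`), which converts by value and can round off low bits.

```hlsl
// ---- Declarations (next to the other resources, outside main) ----
struct ABufferNode
{
    uint  uPackedData0;        // Offset: 0
    uint  uPackedData1;        // Offset: 4
    uint  uPackedData2_Next;   // Offset: 8
    float fDepth;              // Offset: 12
};                             // Size: 16

RWByteAddressBuffer             HairPixelHeadUAV       : register(u1); // dcl_uav_raw u1
RWStructuredBuffer<ABufferNode> HairElementsUAV        : register(u2); // dcl_uav_structured u2, 16
RWStructuredBuffer<uint>        HairElementsCounterUAV : register(u3); // dcl_uav_structured u3, 4

// ---- Inside main(): replaces everything after "r0.z = r0.w * r0.z;" ----
// ---- up to and including the closing brace of the if block.       ----
// Relies on the decompiled lines above for: r0.x (width term), r0.y (dirt
// amount), r0.z (noise * noiseIntensity), r1.xyz (colour), r2.xyz (tangent).

    // lt/utof/add/movc: dirty strands use g_dirtGroupId, others
    // g_baseGroupId + v4.w (done in float, as in the asm).
    float groupF = (0.5 < r0.y) ? (float)g_dirtGroupId
                                : (float)g_baseGroupId + v4.w;
    uint groupId = (uint)groupF;

    // uPackedData0: tangent x,y,z in bits 31..8, width term in bits 7..0.
    uint3 tan8 = (uint3)(saturate(r2.xyz * 0.5 + 0.5) * 255.0);
    uint width8 = (uint)(saturate(-r0.x * 0.5 + 1.0) * 255.0);
    uint packed0 = (tan8.x << 24) + (tan8.y << 16) + (tan8.z << 8) + width8;

    // uPackedData1: colour 7/7/6 bits at 25/18/12, noise 6 bits at 6,
    // then bfi r2.y, l(6), l(0), ...: group id in the low 6 bits.
    uint3 col = (uint3)(r1.xyz * float3(127.0, 127.0, 63.0));
    uint noise6 = (uint)(saturate(4.0 * r0.z) * 63.0);
    uint packed1 = (col.x << 25) + (col.y << 18) + (col.z << 12) + (noise6 << 6);
    packed1 = (packed1 & ~0x3Fu) | (groupId & 0x3Fu);

    // imm_atomic_iadd r1.x, u3, l(0, 0, 0, 0), l(1):
    // add 1 to element 0 and keep the previous value as this node's index.
    uint nodeIndex;
    InterlockedAdd(HairElementsCounterUAV[0], 1u, nodeIndex);

    // ftou/utof/mad/mul/ftou: byte address of this pixel's head pointer,
    // (y * width + x) * 4, computed in float as the asm does.
    float2 pixel = (float2)(uint2)v0.xy;
    uint headAddr = (uint)((pixel.y * g_WinSize.x + pixel.x) * 4.0);

    // imm_atomic_exch r0.x, u1, r0.x, r1.x:
    // this node becomes the new list head; the old head comes back.
    uint prevHead;
    HairPixelHeadUAV.InterlockedExchange(headAddr, nodeIndex, prevHead);

    // bfi r2.z, l(26), l(0), r0.x, l(0xfc000000):
    // low 26 bits = previous head, top 6 bits set.
    uint nextField = (prevHead & 0x03FFFFFFu) | 0xFC000000u;

    // mov r2.w, v0.z / store_structured u2.xyzw, r1.x, l(0), r2.xyzw
    ABufferNode node;
    node.uPackedData0      = packed0;
    node.uPackedData1      = packed1;
    node.uPackedData2_Next = nextField;
    node.fDepth            = v0.z;
    HairElementsUAV[nodeIndex] = node;
}   // end of if (r0.y != 0)

// mov o0.xyzw, l(1.0, 0, 0, 1.0): also missing from the decompiled main().
o0 = float4(1, 0, 0, 1);
```

Keeping the float arithmetic for the group id and the pixel address mirrors the `utof`/`ftou` pairs in the asm, so those two values should come out the same as in the original shader.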

Thank you again!

1x Palit RTX 2080Ti Pro Gaming OC(watercooled and overclocked to hell)
3x 3D Vision Ready Asus VG278HE monitors (5760x1080).
Intel i9 9900K (overclocked to 5.3 and watercooled ofc).
Asus Maximus XI Hero Mobo.
16 GB Team Group T-Force Dark Pro DDR4 @ 3600.
Lots of Disks:
- Raid 0 - 256GB Sandisk Extreme SSD.
- Raid 0 - WD Black - 2TB.
- SanDisk SSD PLUS 480 GB.
- Intel 760p 256GB M.2 PCIe NVMe SSD.
Creative Sound Blaster Z.
Windows 10 x64 Pro.
etc


My website with my fixes and OpenGL to 3D Vision wrapper:
http://3dsurroundgaming.com

(If you like some of the stuff that I've done and want to donate something, you can do it with PayPal at tavyhome@gmail.com)

Posted 02/02/2016 03:24 PM   
I tried to buy the game from the Windows Store ($9), but I get a message telling me to contact Microsoft because they were unable to process the payment.
Does this still work for others?

Intel i7 8086K
Gigabyte GTX 1080Ti Aorus Extreme
DDR4 2x8gb 3200mhz Cl14
TV LG OLED65E6V
Windows 10 64bits

Posted 02/02/2016 04:42 PM   
I believe MS have fixed the error allowing that to work.

Regardless, I wouldn't recommend doing it. From what I can tell so far, it's not going to be possible to fix the WS version. Users have no access to the folder where the game's files are kept, and even if you take ownership and grant read access, you can't get write access. The only way I've found to get 3dmigoto installed is to switch to my Win8 install and access the folder.

But after doing that, the game refuses to launch. I'm going to play around some more, but the WS version looks like a bust.

Posted 02/02/2016 04:54 PM   
Pirateguybrush said: I believe MS have fixed the error allowing that to work.

Regardless, I wouldn't recommend doing it. From what I can tell so far, it's not going to be possible to fix the WS version. Users have no access to the folder where the game's files are kept, and even if you take ownership and grant read access, you can't get write access. The only way I've found to get 3dmigoto installed is to switch to my Win8 install and access the folder.

But after doing that, the game refuses to launch. I'm going to play around some more, but the WS version looks like a bust.

I had the exact same problem with Minecraft Win10 Edition. The install location is locked down tight: you have to jump through hoops just to access it, and then you can't write to it no matter what permissions or ownership you give yourself. The game can't be copied anywhere else and run from there either, because then it complains that it needs to be run in a "container" or something. I really hope Microsoft sorts this crap out; it basically removes any ability to mod games for any purpose.

Rig: Intel i7-8700K @4.7GHz, 16Gb Ram, SSD, GTX 1080Ti, Win10x64, Asus VG278

Posted 02/02/2016 04:57 PM   
You can try Give Me Power by Wagnard: https://forums.geforce.com/default/topic/550192/geforce-drivers/wagnard-tools-ddu-gmp-tdr-manipulator-new-cpu-core-analyzer-updated-01-19-2016-/post/3846181/#3846181

Posted 02/02/2016 05:09 PM   
Thanks for the tip D-Man, but no dice. Weirdly, if I launched Notepad through GMP and attempted to browse to the folder, I couldn't see quite a few of my drives and folders. I could access them by typing paths though. But once I got there, I still couldn't create a new folder as a test.

Tried doing it by launching explorer.exe through GMP as well. I could see everything properly there, but still couldn't modify those files.

Posted 02/02/2016 05:18 PM   
Hmm, found another shader with compile problems:

cbuffer WorldBuffer : register(b0)
{

struct
{
row_major float4x4 ViewProject;

struct
{
row_major float4x4 World;
row_major float4x4 WorldViewProject;
} PerInstance[256];

} WorldParameters : packoffset(c0);

}

cbuffer MaterialBuffer : register(b4)
{
float4 MaterialParams[512] : packoffset(c0);
}



// 3Dmigoto declarations
#define cmp -
Texture1D<float4> IniParams : register(t120);
Texture2D<float4> StereoParams : register(t125);


void main(
float4 v0 : COLOR0,
float3 v1 : POSITION0,
uint v2 : SV_InstanceID0,
uint v3 : SV_VertexID0,
out float4 o0 : SV_POSITION0,
out float4 o1 : COLOR0,
out uint o2 : PSIZE0)
{
float4 r0,r1;
uint4 bitmask, uiDest;
float4 fDest;

r0.x = (uint)v2.x << 3;
r1.xyzw = WorldParameters.PerInstance[r0.x/4]._m10_m11_m12_m13 * v1.yyyy;
r1.xyzw = v1.xxxx * WorldParameters.PerInstance[r0.x/4]._m00_m01_m02_m03 + r1.xyzw;
r1.xyzw = v1.zzzz * WorldParameters.PerInstance[r0.x/4]._m20_m21_m22_m23 + r1.xyzw;
o0.xyzw = WorldParameters.PerInstance[r0.x/4]._m30_m31_m32_m33 + r1.xyzw;
r0.xyzw = max(float4(0,0,0,0), v0.xyzw);
r0.xyzw = log2(r0.xyzw);
r0.xyzw = MaterialParams[0].xxxx * r0.xyzw;
o1.xyzw = exp2(r0.xyzw);
o2.x = v2.x;
return;
}


I narrowed down the problem to "WorldParameters.PerInstance[r0.x/4]._m10_m11_m12_m13".
It doesn't like the "PerInstance[]" access, but I have no idea how to fix it...
Any help? :) ^_^


Edit:
I fixed the problem. I hadn't noticed that the decompiled version was missing the member name (WorldViewProject) after the PerInstance element.
The fix is like this:

r0.x = (uint)v2.x << 3;
r1.xyzw = WorldParameters.PerInstance[r0.x/4].WorldViewProject._m10_m11_m12_m13 * v1.yyyy;
r1.xyzw = v1.xxxx * WorldParameters.PerInstance[r0.x/4].WorldViewProject._m00_m01_m02_m03 + r1.xyzw;
r1.xyzw = v1.zzzz * WorldParameters.PerInstance[r0.x/4].WorldViewProject._m20_m21_m22_m23 + r1.xyzw;
o0.xyzw = WorldParameters.PerInstance[r0.x/4].WorldViewProject._m30_m31_m32_m33 + r1.xyzw;
r0.xyzw = max(float4(0,0,0,0), v0.xyzw);
r0.xyzw = log2(r0.xyzw);
r0.xyzw = MaterialParams[0].xxxx * r0.xyzw;
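A quick way to sanity-check the index expression is to work out the register layout of WorldBuffer by hand. The sketch below assumes standard cbuffer packing (16-byte registers, one register per row of a row_major float4x4) and models only what the declaration above implies; the function names are made up for illustration. Each PerInstance element holds two matrices, so it spans 8 registers, and `r0.x = (uint)v2.x << 3` is the instance ID times 8. That suggests the element index would be `r0.x / 8` rather than the `r0.x / 4` the decompiler emitted (both give 0 for the first instance, so it would only show up with instancing). I haven't seen the assembly for this shader, so treat this as something to compare against the cb0[...] indices there, not as a confirmed bug.

```cpp
#include <cassert>
#include <cstdint>

// Register layout of cbuffer WorldBuffer (b0) as declared in the shader,
// assuming standard cbuffer packing: one register is 16 bytes (a float4) and
// a row_major float4x4 occupies four registers, one per row.
constexpr uint32_t kRegsPerMatrix = 4;
constexpr uint32_t kViewProjectRegs = kRegsPerMatrix;     // ViewProject: c0..c3
constexpr uint32_t kInstanceStride = 2 * kRegsPerMatrix;  // World + WorldViewProject

// First register of PerInstance[i] (its World matrix).
constexpr uint32_t PerInstanceReg(uint32_t i) {
    return kViewProjectRegs + i * kInstanceStride;
}

// Register holding row `row` of PerInstance[i].WorldViewProject.
constexpr uint32_t WorldViewProjectReg(uint32_t i, uint32_t row) {
    return PerInstanceReg(i) + kRegsPerMatrix + row;
}

// The shader's "r0.x = (uint)v2.x << 3": instance ID times 8 registers.
constexpr uint32_t ShaderOffset(uint32_t instanceId) {
    return instanceId << 3;
}

static_assert(kInstanceStride == 8, "each PerInstance element spans 8 registers");
static_assert(WorldViewProjectReg(0, 0) == 8, "WorldViewProject of instance 0 starts at c8");
```

If the assembly reads rows from `cb0[r0.x + 8]` to `cb0[r0.x + 11]`, that lines up with `WorldViewProjectReg(instance, row)` above, i.e. `PerInstance[v2.x].WorldViewProject`.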

1x Palit RTX 2080Ti Pro Gaming OC (watercooled and overclocked to hell)
3x 3D Vision Ready Asus VG278HE monitors (5760x1080).
Intel i9 9900K (overclocked to 5.3 GHz and watercooled ofc).
Asus Maximus XI Hero Mobo.
16 GB Team Group T-Force Dark Pro DDR4 @ 3600.
Lots of Disks:
- Raid 0 - 256GB Sandisk Extreme SSD.
- Raid 0 - WD Black - 2TB.
- SanDisk SSD PLUS 480 GB.
- Intel 760p 256GB M.2 PCIe NVMe SSD.
Creative Sound Blaster Z.
Windows 10 x64 Pro.
etc


My website with my fixes and OpenGL to 3D Vision wrapper:
http://3dsurroundgaming.com

(If you like some of the stuff that I've done and want to donate something, you can do it with PayPal at tavyhome@gmail.com)

Posted 02/02/2016 08:42 PM   
helifax said:
I think I was able to fix the 1st one. I have attached the whole shader here. Can you please give it a look and see that I understood correctly???

...
// Do we need to comment out the old one???
//StructuredBuffer<GroupRenderData> GroupRenderData : register(t5);

Yeah, you should only have one definition for t5, so either comment out the old one or make sure it's using the same data type name.

struct GroupRenderDataType
{
struct HairGroupRenderData
{
float noiseFrequency; // Offset: 0
float noiseIntensity; // Offset: 4
struct curve
{
float4 samples03; // Offset: 8
float4 samples47; // Offset: 24
} thicknessCurve; // Offset: 8
struct curve
{
float4 samples03; // Offset: 40
float4 samples47; // Offset: 56
} alphaCurve; // Offset: 40
} $Element; // Offset: 0 Size: 72


You want to remove the $ from that name (I think you could probably actually remove the entire containing struct and just use HairGroupRenderData directly, but either way should work).

};
// New Struct Buffer definition
StructuredBuffer<GroupRenderDataType> GroupRenderData : register(t5);

...

// Known bad code for instruction (needs manual fix):
//ld_structured_indexable(structured_buffer, stride=72)(mixed,mixed,mixed,mixed) r0.w, r0.w, l(4), t5.xxxx
// is this correct?!?!
r0.w = GroupRenderData[r0.w].Element.y;

Looking at the structure definition, offset 4 is noiseIntensity (which is just a 'float', so it won't have a swizzle in this case), so that should be:
r0.w = GroupRenderData[r0.w].Element.noiseIntensity;
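To double-check which member a given ld_structured byte offset lands on, you can mirror the struct on the CPU and let offsetof do the arithmetic. This is a sketch under two assumptions: structured-buffer elements are tightly packed (so the float4 members sit at 8, 24, 40 and 56 as the listing says, with no 16-byte alignment), and a float4 can be modelled as float[4]. `LdStructuredX` is a made-up helper that reads one 32-bit value at `index * stride + byteOffset`, which is what the single-component `t5.xxxx` form of the instruction loads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// CPU-side mirror of HairGroupRenderData as listed in the assembly's
// "Resource bind info". float4 is modelled as float[4] (4-byte alignment).
struct Curve {
    float samples03[4];
    float samples47[4];
};

struct HairGroupRenderData {
    float noiseFrequency;  // offset 0
    float noiseIntensity;  // offset 4
    Curve thicknessCurve;  // offset 8
    Curve alphaCurve;      // offset 40
};

static_assert(offsetof(HairGroupRenderData, noiseIntensity) == 4, "offset 4");
static_assert(offsetof(HairGroupRenderData, thicknessCurve) == 8, "offset 8");
static_assert(offsetof(HairGroupRenderData, alphaCurve) == 40, "offset 40");
static_assert(sizeof(HairGroupRenderData) == 72, "matches stride=72");

// Emulates a single-component ld_structured: read 4 bytes located at
// index * stride + byteOffset within the buffer.
float LdStructuredX(const HairGroupRenderData* buffer, uint32_t index, uint32_t byteOffset) {
    const unsigned char* base = reinterpret_cast<const unsigned char*>(buffer);
    float value;
    std::memcpy(&value, base + index * sizeof(HairGroupRenderData) + byteOffset, sizeof value);
    return value;
}
```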


For the atomic addition operation, I still don't understand how to fix it. I can't find any u3 in the whole file (same file attached above).
Can you please elaborate a bit more?


Based on the assembly version:
// HairElementsCounterUAV                UAV  struct         r/w    3        1
...
dcl_uav_structured u3, 4


I think it will be something like:

RWStructuredBuffer<uint> HairElementsCounterUAV : register(u3);


The only thing I'm not positive about is that MSDN doesn't list InterlockedAdd as a method on RWStructuredBuffer (only lists it on RWByteAddressBuffer), but based on the documentation of InterlockedAdd itself I think that may just be an oversight.
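A rough CPU-side model of what `imm_atomic_iadd r1.x, u3, l(0, 0, 0, 0), l(1)` does, using C++ `std::atomic` as a stand-in for element 0 of the UAV (an illustration only, not how the GPU implements it): the destination register receives the value the counter held before the add, so every invocation that reaches the instruction gets a distinct index. In HLSL that original value comes back through the third argument of the `InterlockedAdd` intrinsic, e.g. `InterlockedAdd(HairElementsCounterUAV[0], 1, r1.x)` with the counter declared as `RWStructuredBuffer<uint>`. The helper names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Models imm_atomic_iadd dst, u3, l(0), l(value): atomically adds `value`
// to the counter and returns the value it held beforehand.
uint32_t ImmAtomicIAdd(std::atomic<uint32_t>& counter, uint32_t value) {
    return counter.fetch_add(value);
}

// Each call hands out a fresh slot index; the shader later uses r1.x as the
// element index for store_structured into HairElementsUAV (u2).
std::vector<uint32_t> AllocateSlots(std::atomic<uint32_t>& counter, int count) {
    std::vector<uint32_t> slots;
    for (int i = 0; i < count; ++i) {
        slots.push_back(ImmAtomicIAdd(counter, 1));
    }
    return slots;
}
```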

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 02/03/2016 09:49 AM   
Big thank you DarkStarSword!
I will definitely try it once I get home this evening!


Posted 02/03/2016 12:16 PM   
Those changes are pretty close, but I think they still won't compile. This one is pretty complicated; it took me a while to work out the syntax for the different variants. Rather than go through them piece by piece, here is that shader from above with a bunch of manual fixes (documented in comments). You can compare it to what you started with to see the changes.

This compiles, but I can't promise that it's working fully correctly, and don't have time at the moment to compare the output to the original.
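Something that may be worth checking when comparing output: several packed values in this shader (for example r2.x, which is assembled as `(nx << 24) + (ny << 16) + (nz << 8) + b` with a fourth byte b) pass through the `float4` temporaries, and a 32-bit float only has a 24-bit significand. A small C++ check of that round trip, using an arbitrary packed value picked for illustration (the helper names are made up):

```cpp
#include <cassert>
#include <cstdint>

// Packs four bytes the same way the shader assembles r2.x:
// (a << 24) + (b << 16) + (c << 8) + d.
uint32_t PackBytes(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
    return (a << 24) + (b << 16) + (c << 8) + d;
}

// Stores a packed value in a float (like the r0/r2 temporaries) and reads it
// back as an unsigned int. Above 2^24, floats can no longer represent every
// integer, so the low bits get rounded away.
uint32_t RoundTripThroughFloat(uint32_t packed) {
    float reg = static_cast<float>(packed);
    return static_cast<uint32_t>(reg);
}
```

Here `PackBytes(255, 1, 2, 3)` is 0xFF010203, but it comes back as 0xFF010200. If that low byte matters, keeping those intermediates in `uint` variables (or moving bits with `asuint`/`asfloat`) instead of the float temporaries might avoid it; I haven't tried that on this shader.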

It's also worth noting that there were some complicated instructions missing at the bottom of the shader. I would have expected some "missing instruction" output rather than them being completely blank. I put those back.
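For checking the restored tail on the CPU, here is a single-threaded sketch of what those instructions appear to do, reading straight from the assembly: compute a byte address of `(y * width + x) * 4` into HairPixelHeadUAV, swap the freshly allocated node index in as the pixel's new head (`imm_atomic_exch` returns the old head), and store the old head in the low 26 bits of `uPackedData2_Next` via `bfi`, with the top six bits taken from the 0xfc000000 constant. Plain reads and writes stand in for the atomics, only the link field of a node is modelled, and all names are hypothetical. Initialising heads to 0x03ffffff as an end-of-list marker is purely an assumption for the example; I don't know what value the game clears u1 to, or what those top six bits are used for.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Models bfi dst, width, offset, insert, base: copies the low `width` bits
// of `insert` into `base` starting at bit `offset`. The shader uses offset 0
// with widths 6 and 26; width 32 is excluded to keep the shift defined.
uint32_t Bfi(uint32_t width, uint32_t offset, uint32_t insert, uint32_t base) {
    assert(width < 32 && offset < 32);
    uint32_t mask = ((1u << width) - 1u) << offset;
    return ((insert << offset) & mask) | (base & ~mask);
}

// Byte address of a pixel's head entry in HairPixelHeadUAV (one uint each).
uint32_t HeadAddress(uint32_t x, uint32_t y, uint32_t winWidth) {
    return (y * winWidth + x) * 4u;
}

struct Node { uint32_t next; };  // only uPackedData2_Next is modelled

// One fragment insert into a per-pixel linked list.
void InsertFragment(std::vector<uint32_t>& heads, std::vector<Node>& nodes,
                    uint32_t& counter, uint32_t x, uint32_t y, uint32_t winWidth) {
    uint32_t node = counter++;                        // imm_atomic_iadd
    uint32_t slot = HeadAddress(x, y, winWidth) / 4;  // byte address -> uint index
    uint32_t oldHead = heads[slot];                   // imm_atomic_exch returns old value
    heads[slot] = node;
    nodes[node].next = Bfi(26, 0, oldHead, 0xfc000000u);
}
```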


// ---- Created with 3Dmigoto v1.2.27 on Fri Jan 29 19:41:02 2016
// HAIR ON

cbuffer cbPerScene : register(b8)
{
row_major float4x4 g_mViewProj : packoffset(c0);
row_major float4x4 g_mInvViewProj : packoffset(c4);
float3 g_vEye : packoffset(c8);
float g_FiberAlpha : packoffset(c8.w);
float2 g_WinSize : packoffset(c9);
float g_FiberRadius : packoffset(c9.z);
float g_fvFov : packoffset(c9.w);
float2 g_zMinMax : packoffset(c10);
float g_zWriteValue : packoffset(c10.z);
float g_NoiseScale : packoffset(c10.w);
float4 g_ScreenExtents : packoffset(c11);
float3 g_dirtColor : packoffset(c12);
float g_dirtLevel : packoffset(c12.w);
uint g_baseGroupId : packoffset(c13);
uint g_dirtGroupId : packoffset(c13.y);
float g_widthCurveOverride : packoffset(c13.z);
float g_alphaCurveOverride : packoffset(c13.w);
float3 g_SlaveOffsetsReferencePosition : packoffset(c14);
float g_SlaveOffsetsScale : packoffset(c14.w);
}

// Manual copy from ASM declaration
// Resource bind info for GroupRenderData
// {
//
// struct HairGroupRenderData
// {
//
// float noiseFrequency; // Offset: 0
// float noiseIntensity; // Offset: 4
//
// struct curve
// {
//
// float4 samples03; // Offset: 8
// float4 samples47; // Offset: 24
//
// } thicknessCurve; // Offset: 8
//
// struct curve
// {
//
// float4 samples03; // Offset: 40
// float4 samples47; // Offset: 56
//
// } alphaCurve; // Offset: 40
//
// } $Element; // Offset: 0 Size: 72
//
// }
// Becomes:
struct HairGroupRenderData
{

float noiseFrequency; // Offset: 0
float noiseIntensity; // Offset: 4

struct
{

float4 samples03; // Offset: 8
float4 samples47; // Offset: 24

} thicknessCurve; // Offset: 8

struct
{

float4 samples03; // Offset: 40
float4 samples47; // Offset: 56

} alphaCurve; // Offset: 40

}; // Offset: 0 Size: 72

// Resource bind info for HairElementsUAV
// {
//
// struct ABufferNode
// {
//
// uint uPackedData0; // Offset: 0
// uint uPackedData1; // Offset: 4
// uint uPackedData2_Next; // Offset: 8
// float fDepth; // Offset: 12
//
// } $Element; // Offset: 0 Size: 16
//
// }
// Becomes:
struct ABufferNode
{

uint uPackedData0; // Offset: 0
uint uPackedData1; // Offset: 4
uint uPackedData2_Next; // Offset: 8
float fDepth; // Offset: 12

}; // Offset: 0 Size: 16

SamplerState g_sTrilinear_s : register(s1);
Texture2D<float> g_txNoise : register(t3);
Texture2D<float> g_txDirt : register(t4);

// Use main struct for definition
StructuredBuffer<HairGroupRenderData> GroupRenderData : register(t5);

// Manually added u1,u2,u3

RWByteAddressBuffer HairPixelHeadUAV : register(u1); //dcl_uav_raw u1
RWStructuredBuffer<ABufferNode> HairElementsUAV : register(u2); //dcl_uav_structured u2, 16
RWStructuredBuffer<uint> HairElementsCounterUAV : register(u3); //dcl_uav_structured u3, 4


// 3Dmigoto declarations
#define cmp -
Texture1D<float4> IniParams : register(t120);
Texture2D<float4> StereoParams : register(t125);



void main(
float4 v0 : SV_POSITION0,
float4 v1 : TANGENT0,
float4 v2 : TEXCOORD0,
float3 v3 : TEXCOORD1,
float4 v4 : COLOR0,
out float4 o0 : SV_Target0)
{
float4 r0,r1,r2,r3;
uint4 bitmask, uiDest;
float4 fDest;

r0.x = dot(g_ScreenExtents.zz, v0.xx);
r0.xz = float2(-1,-1) + r0.xx;
r1.x = dot(g_ScreenExtents.ww, v0.yy);
r0.yw = float2(1,1) + -r1.xx;
r1.xyzw = g_WinSize.xyxy * v2.xyzw;
r0.xyzw = -r0.xyzw * g_WinSize.xyxy + r1.xyzw;
r0.x = dot(r0.xy, r0.xy);
r0.y = dot(r0.zw, r0.zw);
r0.zw = v2.xy * g_WinSize.xy + -r1.zw;
r0.z = dot(r0.zw, r0.zw);
r0.xyz = sqrt(r0.xyz);
r0.w = cmp(r0.x >= r0.z);
r0.z = cmp(r0.y >= r0.z);
r1.xy = r0.wz ? 1.000000 : 0;
r0.z = dot(r1.xy, r1.xy);
r0.z = cmp(r0.z != 0.000000);
r0.z = r0.z ? -1 : 1;
r0.x = min(r0.x, r0.y);
r0.x = min(1, r0.x);
r0.x = r0.z * r0.x + 1;
r0.x = v3.z * r0.x;
r0.y = cmp(0.00776470592 < r0.x);
r1.x = v1.w;
r1.y = 0.5;
r0.z = g_txNoise.Sample(g_sTrilinear_s, r1.xy).x;
if (r0.y != 0) {
r0.y = g_txDirt.Sample(g_sTrilinear_s, v3.xy).x;
r0.w = (uint)v4.w;
r1.x = -g_dirtLevel + 1;
r0.y = -r1.x + r0.y;
r1.x = g_dirtLevel + 0.00100000005;
r0.y = saturate(r0.y / r1.x);
r1.xyz = r0.yyy * g_dirtColor.xyz + -v4.xyz;
r1.xyz = saturate(r0.yyy * r1.xyz + v4.xyz);
r1.w = dot(v1.xyz, v1.xyz);
r1.w = rsqrt(r1.w);
r2.xyz = v1.xyz * r1.www;

// Known bad code for instruction (needs manual fix):
//ld_structured_indexable(structured_buffer, stride=72)(mixed,mixed,mixed,mixed) r0.w, r0.w, l(4), t5.xxxx
r0.w = GroupRenderData[r0.w].noiseIntensity;

r0.z = r0.w * r0.z;
r0.y = cmp(0.5 < r0.y);
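// utof r3.xy, cb8[13].yxyy (cb8[13].x = g_baseGroupId, cb8[13].y = g_dirtGroupId)
r3.xy = float2(g_dirtGroupId, g_baseGroupId);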
r0.w = v4.w + r3.y;
r0.y = r0.y ? r3.x : r0.w;
r2.xyz = saturate(r2.xyz * float3(0.5,0.5,0.5) + float3(0.5,0.5,0.5));
r2.xyz = float3(255,255,255) * r2.xyz;
r2.xyz = (uint3)r2.xyz;
r2.yz = (uint2)r2.yz << int2(16,8);
r0.w = mad((int)r2.x, 0x01000000, (int)r2.y);
r0.w = (int)r0.w + (int)r2.z;
r0.x = saturate(-r0.x * 0.5 + 1);
r0.x = 255 * r0.x;
r0.xy = (uint2)r0.xy;
r2.x = (int)r0.w + (int)r0.x;
r1.xyz = float3(127,127,63) * r1.xyz;
r1.xyz = (uint3)r1.xyz;
r0.xw = (uint2)r1.yz << int2(18,12);
r0.x = mad((int)r1.x, 0x02000000, (int)r0.x);
r0.x = (int)r0.x + (int)r0.w;
r0.z = saturate(4 * r0.z);
r0.z = 63 * r0.z;

// ftou r0.z, r0.z
// ishl r0.z, r0.z, l(6)
// iadd r0.x, r0.x, r0.z
r0.z = (uint)r0.z;
r0.z = (uint)r0.z << 6;
r0.x = (int)r0.x + (int)r0.z;

// bfi r2.y, l(6), l(0), r0.y, r0.x
bitmask.y = ((~(-1 << 6)) << 0) & 0xffffffff;
r2.y = (((uint)r0.y << 0) & bitmask.y) | ((uint)r0.x & ~bitmask.y);

// ftou r0.xy, v0.yxyy
r0.xy = (uint2)v0.yx;

// Needs manual fix for instruction:
// imm_atomic_iadd r1.x, u3, l(0, 0, 0, 0), l(1)
InterlockedAdd(HairElementsCounterUAV[0], 1, r1.x);

// utof r0.xy, r0.xyxx
// mad r0.x, r0.x, cb8[9].x, r0.y
// mul r0.x, r0.x, l(4.000000)
// ftou r0.x, r0.x
r0.xy = (uint2)r0.xy;
r0.x = r0.x * g_WinSize.x + r0.y;
r0.x = 4 * r0.x;
r0.x = (uint)r0.x;

// This section was somehow damaged: the instructions were missing entirely, so they were restored by hand.

// imm_atomic_exch r0.x, u1, r0.x, r1.x
HairPixelHeadUAV.InterlockedExchange(r0.x, r1.x, r0.x);

// bfi r2.z, l(26), l(0), r0.x, l(0xfc000000)
bitmask.z = ((~(-1 << 26)) << 0) & 0xffffffff;
r2.z = (((uint)r0.x << 0) & bitmask.z) | ((uint)0xfc000000 & ~bitmask.z);

// mov r2.w, v0.z
r2.w = v0.z;

// store_structured u2.xyzw, r1.x, l(0), r2.xyzw
HairElementsUAV[r1.x].uPackedData0 = r2.x;
HairElementsUAV[r1.x].uPackedData1 = r2.y;
HairElementsUAV[r1.x].uPackedData2_Next = r2.z;
HairElementsUAV[r1.x].fDepth = r2.w;

// endif
}
// mov o0.xyzw, l(1.000000,0,0,1.000000)
o0.xyzw = float4(1,0,0,1);
// ret
}

/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
//
// Generated by Microsoft (R) HLSL Shader Compiler 6.3.9600.16384
//
// using 3Dmigoto v1.2.27 on Fri Jan 29 19:41:02 2016
//
//
// Note: shader requires additional functionality:
// Early depth-stencil
//
//
// Buffer Definitions:
//
// cbuffer cbPerScene
// {
//
// row_major float4x4 g_mViewProj; // Offset: 0 Size: 64 [unused]
// row_major float4x4 g_mInvViewProj; // Offset: 64 Size: 64 [unused]
// float3 g_vEye; // Offset: 128 Size: 12 [unused]
// float g_FiberAlpha; // Offset: 140 Size: 4 [unused]
// float2 g_WinSize; // Offset: 144 Size: 8
// float g_FiberRadius; // Offset: 152 Size: 4 [unused]
// float g_fvFov; // Offset: 156 Size: 4 [unused]
// float2 g_zMinMax; // Offset: 160 Size: 8 [unused]
// float g_zWriteValue; // Offset: 168 Size: 4 [unused]
// float g_NoiseScale; // Offset: 172 Size: 4 [unused]
// float4 g_ScreenExtents; // Offset: 176 Size: 16
// float3 g_dirtColor; // Offset: 192 Size: 12
// float g_dirtLevel; // Offset: 204 Size: 4
// uint g_baseGroupId; // Offset: 208 Size: 4
// uint g_dirtGroupId; // Offset: 212 Size: 4
// float g_widthCurveOverride; // Offset: 216 Size: 4 [unused]
// float g_alphaCurveOverride; // Offset: 220 Size: 4 [unused]
// float3 g_SlaveOffsetsReferencePosition;// Offset: 224 Size: 12 [unused]
// float g_SlaveOffsetsScale; // Offset: 236 Size: 4 [unused]
//
// }
//
// Resource bind info for GroupRenderData
// {
//
// struct HairGroupRenderData
// {
//
// float noiseFrequency; // Offset: 0
// float noiseIntensity; // Offset: 4
//
// struct curve
// {
//
// float4 samples03; // Offset: 8
// float4 samples47; // Offset: 24
//
// } thicknessCurve; // Offset: 8
//
// struct curve
// {
//
// float4 samples03; // Offset: 40
// float4 samples47; // Offset: 56
//
// } alphaCurve; // Offset: 40
//
// } $Element; // Offset: 0 Size: 72
//
// }
//
// Resource bind info for HairElementsUAV
// {
//
// struct ABufferNode
// {
//
// uint uPackedData0; // Offset: 0
// uint uPackedData1; // Offset: 4
// uint uPackedData2_Next; // Offset: 8
// float fDepth; // Offset: 12
//
// } $Element; // Offset: 0 Size: 16
//
// }
//
// Resource bind info for HairElementsCounterUAV
// {
//
// uint $Element; // Offset: 0 Size: 4
//
// }
//
//
// Resource Bindings:
//
// Name Type Format Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// g_sTrilinear sampler NA NA 1 1
// g_txNoise texture float 2d 3 1
// g_txDirt texture float 2d 4 1
// GroupRenderData texture struct r/o 5 1
// HairPixelHeadUAV UAV byte r/w 1 1
// HairElementsUAV UAV struct r/w 2 1
// HairElementsCounterUAV UAV struct r/w 3 1
// cbPerScene cbuffer NA NA 8 1
//
//
//
// Input signature:
//
// Name Index Mask Register SysValue Format Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_POSITION 0 xyzw 0 POS float xyz
// TANGENT 0 xyzw 1 NONE float xyzw
// TEXCOORD 0 xyzw 2 NONE float xyzw
// TEXCOORD 1 xyz 3 NONE float xyz
// COLOR 0 xyzw 4 NONE float xyzw
//
//
// Output signature:
//
// Name Index Mask Register SysValue Format Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Target 0 xyzw 0 TARGET float xyzw
//
ps_5_0
dcl_globalFlags refactoringAllowed | forceEarlyDepthStencil
dcl_constantbuffer cb8[14], immediateIndexed
dcl_sampler s1, mode_default
dcl_resource_texture2d (float,float,float,float) t3
dcl_resource_texture2d (float,float,float,float) t4
dcl_resource_structured t5, 72
dcl_uav_raw u1
dcl_uav_structured u2, 16
dcl_uav_structured u3, 4
dcl_input_ps_siv linear noperspective v0.xyz, position
dcl_input_ps linear v1.xyzw
dcl_input_ps linear v2.xyzw
dcl_input_ps linear v3.xyz
dcl_input_ps linear v4.xyzw
dcl_output o0.xyzw
dcl_temps 4
dp2 r0.x, cb8[11].zzzz, v0.xxxx
add r0.xz, r0.xxxx, l(-1.000000, 0.000000, -1.000000, 0.000000)
dp2 r1.x, cb8[11].wwww, v0.yyyy
add r0.yw, -r1.xxxx, l(0.000000, 1.000000, 0.000000, 1.000000)
mul r1.xyzw, v2.xyzw, cb8[9].xyxy
mad r0.xyzw, -r0.xyzw, cb8[9].xyxy, r1.xyzw
dp2 r0.x, r0.xyxx, r0.xyxx
dp2 r0.y, r0.zwzz, r0.zwzz
mad r0.zw, v2.xxxy, cb8[9].xxxy, -r1.zzzw
dp2 r0.z, r0.zwzz, r0.zwzz
sqrt r0.xyz, r0.xyzx
ge r0.w, r0.x, r0.z
ge r0.z, r0.y, r0.z
and r1.xy, r0.wzww, l(0x3f800000, 0x3f800000, 0, 0)
dp2 r0.z, r1.xyxx, r1.xyxx
ne r0.z, r0.z, l(0.000000)
movc r0.z, r0.z, l(-1.000000), l(1.000000)
min r0.x, r0.y, r0.x
min r0.x, r0.x, l(1.000000)
mad r0.x, r0.z, r0.x, l(1.000000)
mul r0.x, r0.x, v3.z
lt r0.y, l(0.00776470592), r0.x
mov r1.x, v1.w
mov r1.y, l(0.500000)
sample_indexable(texture2d)(float,float,float,float) r0.z, r1.xyxx, t3.yzxw, s1
if_nz r0.y
sample_indexable(texture2d)(float,float,float,float) r0.y, v3.xyxx, t4.yxzw, s1
ftou r0.w, v4.w
add r1.x, l(1.000000), -cb8[12].w
add r0.y, r0.y, -r1.x
add r1.x, l(0.001000), cb8[12].w
div_sat r0.y, r0.y, r1.x
mad r1.xyz, r0.yyyy, cb8[12].xyzx, -v4.xyzx
mad_sat r1.xyz, r0.yyyy, r1.xyzx, v4.xyzx
dp3 r1.w, v1.xyzx, v1.xyzx
rsq r1.w, r1.w
mul r2.xyz, r1.wwww, v1.xyzx
ld_structured_indexable(structured_buffer, stride=72)(mixed,mixed,mixed,mixed) r0.w, r0.w, l(4), t5.xxxx
mul r0.z, r0.z, r0.w
lt r0.y, l(0.500000), r0.y
utof r3.xy, cb8[13].yxyy
add r0.w, r3.y, v4.w
movc r0.y, r0.y, r3.x, r0.w
mad_sat r2.xyz, r2.xyzx, l(0.500000, 0.500000, 0.500000, 0.000000), l(0.500000, 0.500000, 0.500000, 0.000000)
mul r2.xyz, r2.xyzx, l(255.000000, 255.000000, 255.000000, 0.000000)
ftou r2.xyz, r2.xyzx
ishl r2.yz, r2.yyzy, l(0, 16, 8, 0)
imad r0.w, r2.x, l(0x01000000), r2.y
iadd r0.w, r0.w, r2.z
mad_sat r0.x, -r0.x, l(0.500000), l(1.000000)
mul r0.x, r0.x, l(255.000000)
ftou r0.xy, r0.xyxx
iadd r2.x, r0.w, r0.x
mul r1.xyz, r1.xyzx, l(127.000000, 127.000000, 63.000000, 0.000000)
ftou r1.xyz, r1.xyzx
ishl r0.xw, r1.yyyz, l(18, 0, 0, 12)
imad r0.x, r1.x, l(0x02000000), r0.x
iadd r0.x, r0.x, r0.w
mul_sat r0.z, r0.z, l(4.000000)
mul r0.z, r0.z, l(63.000000)

ftou r0.z, r0.z
ishl r0.z, r0.z, l(6)
iadd r0.x, r0.x, r0.z

bfi r2.y, l(6), l(0), r0.y, r0.x

ftou r0.xy, v0.yxyy

imm_atomic_iadd r1.x, u3, l(0, 0, 0, 0), l(1)

utof r0.xy, r0.xyxx
mad r0.x, r0.x, cb8[9].x, r0.y
mul r0.x, r0.x, l(4.000000)
ftou r0.x, r0.x

imm_atomic_exch r0.x, u1, r0.x, r1.x
bfi r2.z, l(26), l(0), r0.x, l(0xfc000000)
mov r2.w, v0.z
store_structured u2.xyzw, r1.x, l(0), r2.xyzw
endif
mov o0.xyzw, l(1.000000,0,0,1.000000)
ret
// Approximately 77 instruction slots used

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
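Two parts of the dump above are easier to follow with a small model:

- The two `bfi` (bit-field insert) instructions are what the decompiler rendered as the `bitmask` expressions.
- The run from `dp3` down to `iadd r2.x` packs the normalised tangent (v1) into bytes.

The rough Python model below covers both. The `bfi` semantics follow the SM5 assembly reference. The packing order is read straight off the instructions. `low_byte` stands in for whatever byte the shader computes into r0.x. Treat it as a sketch for reading the asm, not a verified port.

```python
import math

MASK32 = 0xFFFFFFFF


def bfi(width: int, offset: int, insert: int, base: int) -> int:
    """Model of the SM5 bfi instruction: copy the low `width` bits of
    `insert` into `base` starting at bit `offset`; other bits come from `base`."""
    width &= 0x1F
    offset &= 0x1F
    mask = (((1 << width) - 1) << offset) & MASK32
    return ((insert << offset) & mask) | (base & ~mask & MASK32)


def to_unorm8(component: float) -> int:
    """mad_sat(c, 0.5, 0.5), mul by 255, then ftou (which truncates)."""
    saturated = min(max(component * 0.5 + 0.5, 0.0), 1.0)
    return int(saturated * 255.0)


def pack_tangent(tx: float, ty: float, tz: float, low_byte: int) -> int:
    """Normalise the tangent (dp3/rsq/mul), convert each axis to a byte and
    pack as x<<24 + y<<16 + z<<8 + low_byte (imad/ishl/iadd)."""
    inv_len = 1.0 / math.sqrt(tx * tx + ty * ty + tz * tz)
    x, y, z = (to_unorm8(c * inv_len) for c in (tx, ty, tz))
    return ((x << 24) + (y << 16) + (z << 8) + low_byte) & MASK32


if __name__ == "__main__":
    # bfi r2.z, l(26), l(0), r0.x, l(0xfc000000) with an old head of 5
    print(hex(bfi(26, 0, 5, 0xFC000000)))  # 0xfc000005
```

`bfi(6, 0, r0.y, r0.x)` corresponds to the `r2.y` line. That matches the decompiled `(((uint)r0.y << 0) & bitmask.y) | ((uint)r0.x & ~bitmask.y)`.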

Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607
Latest 3Dmigoto Release
Bo3b's School for ShaderHackers

Posted 02/03/2016 01:33 PM   
Wow, big thank you bo3b for taking the time and looking into this!
I hadn't even realised that some instructions were missing.
I'll definitely try it out when I get home and let you know whether it's working or not ;)

Big thank you again!

1x Palit RTX 2080Ti Pro Gaming OC (watercooled and overclocked to hell)
3x 3D Vision Ready Asus VG278HE monitors (5760x1080).
Intel i9 9900K (overclocked to 5.3 and watercooled ofc).
Asus Maximus XI Hero Mobo.
16 GB Team Group T-Force Dark Pro DDR4 @ 3600.
Lots of Disks:
- Raid 0 - 256GB Sandisk Extreme SSD.
- Raid 0 - WD Black - 2TB.
- SanDisk SSD PLUS 480 GB.
- Intel 760p 256GB M.2 PCIe NVMe SSD.
Creative Sound Blaster Z.
Windows 10 x64 Pro.
etc


My website with my fixes and OpenGL to 3D Vision wrapper:
http://3dsurroundgaming.com

(If you like some of the stuff that I've done and want to donate something, you can do it with PayPal at tavyhome@gmail.com)

Posted 02/03/2016 01:52 PM   
Glad to help out, hope it works. I put up an online diff for comparison:

https://www.diffchecker.com/iywcixuf

The big instructions like ld_indexed and InterlockedExchange are all correct, but the compiler moves stuff into different registers, so it looks like a lot of mismatch when it's just using different registers, like r2 instead of r1.
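One way to make a diff like that less noisy is to rename the temp registers in order of first use before comparing. This was not done here; it is only an idea. The Python below is a rough sketch of that idea. It only hides a consistent one-to-one renaming (r2 everywhere in one listing versus r1 everywhere in the other). If the compiler reuses a register for different values in one listing but not the other, the diff will still show differences.

```python
import re

# Matches temp registers such as r0, r12 (not cb8, v0, o0, t5, u1).
TEMP_REG = re.compile(r"\br(\d+)\b")


def canonicalize_registers(asm: str) -> str:
    """Rename temp registers in order of first appearance (the first one
    seen becomes r0, the next new one r1, and so on)."""
    mapping: dict[str, str] = {}

    def rename(match: re.Match) -> str:
        reg = match.group(0)
        if reg not in mapping:
            mapping[reg] = f"r{len(mapping)}"
        return mapping[reg]

    return TEMP_REG.sub(rename, asm)


if __name__ == "__main__":
    original = "mul r2.xyz, r1.wwww, v1.xyzx\nmov r2.w, v0.z"
    recompiled = "mul r1.xyz, r3.wwww, v1.xyzx\nmov r1.w, v0.z"
    print(canonicalize_registers(original) == canonicalize_registers(recompiled))  # True
```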

Overall this looks pretty close, but could have glitches.

It didn't help that Microsoft's documentation for InterlockedExchange is 100% wrong on the syntax.
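For reference, this is how the two atomics in this shader appear to fit together, going only by the assembly:

1. `imm_atomic_iadd` on the counter UAV (u3) hands out a node index.
2. `imm_atomic_exch` on the per-pixel head buffer (u1) swaps that index in and returns the previous head.
3. The previous head is stored, via `bfi`, as the "next" link of the new node in u2.

That looks like the usual per-pixel linked list (A-buffer) pattern. The Python below is a single-threaded model of it. The `interlocked_*` helpers only mimic the "return the original value" behaviour of the atomics. The `EMPTY` sentinel and the `walk` function are hypothetical, because the clear value and the resolve pass are not part of this shader.

```python
MASK32 = 0xFFFFFFFF
NEXT_MASK = 0x03FFFFFF   # low 26 bits of uPackedData2_Next hold the link
EMPTY = NEXT_MASK        # hypothetical end-of-list value; real clear value unknown


def interlocked_add(buf: list, index: int, value: int) -> int:
    """Like imm_atomic_iadd / InterlockedAdd: add, return the original value."""
    original = buf[index]
    buf[index] = (original + value) & MASK32
    return original


def interlocked_exchange(buf: list, index: int, value: int) -> int:
    """Like imm_atomic_exch / InterlockedExchange: store, return the original value."""
    original = buf[index]
    buf[index] = value
    return original


def insert_fragment(heads, nodes, counter, x, y, width, packed0, packed1, depth):
    # imm_atomic_iadd r1.x, u3, l(0, 0, 0, 0), l(1): grab the next free node
    node = interlocked_add(counter, 0, 1)
    # u1 is a raw buffer addressed in bytes as (y * width + x) * 4, where width
    # is g_WinSize.x; a Python list of uints is indexed by that address / 4.
    old_head = interlocked_exchange(heads, y * width + x, node)
    # bfi r2.z, l(26), l(0), r0.x, l(0xfc000000)
    next_field = (old_head & NEXT_MASK) | 0xFC000000
    # store_structured u2.xyzw, r1.x, l(0), r2.xyzw (fDepth comes from v0.z)
    nodes[node] = (packed0, packed1, next_field, depth)
    return node


def walk(heads, nodes, x, y, width):
    """Hypothetical resolve-side traversal: returns depths newest-first."""
    depths = []
    index = heads[y * width + x]
    while index != EMPTY:
        _, _, next_field, depth = nodes[index]
        depths.append(depth)
        index = next_field & NEXT_MASK
    return depths


if __name__ == "__main__":
    width, height = 2, 2
    heads = [EMPTY] * (width * height)
    nodes = [None] * 8
    counter = [0]
    insert_fragment(heads, nodes, counter, 1, 0, width, 0, 0, 0.5)
    insert_fragment(heads, nodes, counter, 1, 0, width, 0, 0, 0.25)
    print(walk(heads, nodes, 1, 0, width))  # [0.25, 0.5]
```

On the GPU, many pixel shader invocations run these steps concurrently. That is why the counter increment and the head swap must be atomics rather than plain loads and stores.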


I've got a pretty good handle on the ld_structured_indexable we are seeing a lot in SM5 games, and I have an idea for how to generate the right code for these struct loads. It'll take me a while, but I should be able to solve that set of manual fixes. Like:

//ld_structured_indexable(structured_buffer, stride=72)(mixed,mixed,mixed,mixed) r0.w, r0.w, l(4), t5.xxxx
r0.w = GroupRenderData[r0.w].noiseIntensity;
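The offset-to-field lookup described above can be sketched roughly as follows. It uses the `HairGroupRenderData` layout from the reflection comment in the shader dump. This is only an illustration of the idea, not 3Dmigoto's actual decompiler code:

- It handles a single-component read, the `t5.xxxx` case.
- It does not decode the source swizzle.
- `ld_structured_to_hlsl` is a hypothetical helper name, and the buffer name is passed in rather than looked up from the resource bindings.

```python
# HairGroupRenderData layout, copied from the reflection comment in the dump:
# (member name, byte offset, number of 32-bit components).
# Structured buffers are tightly packed, so float4 members sit at 8, 24, ...
LAYOUT = [
    ("noiseFrequency", 0, 1),
    ("noiseIntensity", 4, 1),
    ("thicknessCurve.samples03", 8, 4),
    ("thicknessCurve.samples47", 24, 4),
    ("alphaCurve.samples03", 40, 4),
    ("alphaCurve.samples47", 56, 4),
]
STRIDE = 72  # matches dcl_resource_structured t5, 72


def member_at(byte_offset: int) -> str:
    """Return the member (plus a swizzle for vectors) at byte_offset."""
    for name, start, components in LAYOUT:
        end = start + 4 * components
        if start <= byte_offset < end:
            if components == 1:
                return name
            return f"{name}.{'xyzw'[(byte_offset - start) // 4]}"
    raise ValueError(f"no member at byte offset {byte_offset} (stride {STRIDE})")


def ld_structured_to_hlsl(dest: str, index: str, byte_offset: int,
                          buffer_name: str = "GroupRenderData") -> str:
    """Hypothetical helper: rewrite one single-component ld_structured read."""
    return f"{dest} = {buffer_name}[{index}].{member_at(byte_offset)};"


if __name__ == "__main__":
    print(ld_structured_to_hlsl("r0.w", "r0.w", 4))
    # r0.w = GroupRenderData[r0.w].noiseIntensity;
```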

Posted 02/03/2016 02:08 PM   
helifax said: So, I know some of US (including me) hate CM for not being as GOOD as proper geometry-rendered 3D. I agree, it's not as good and NEVER will be!
...


Thank you for the profile; it works very well for CM :)
But I am also looking forward to the fix you guys are working on. You're doing a great job!

Gaming Machine:
CPU: i7-5960X @4.5 GHz | Board: Asus Rampage V Extreme | GPU: GTX Titan X 2000/5000 | RAM: 16 GB Corsair Vengeance LPX DDR4 2666@2800 | PSU: Lepa G1600 | SSD: Samsung 830, Intel SSD 750 | Case: Corsair 500r | Monitor: Asus PG278Q, Asus PB279Q | OS: Win10 x64 | Cooling: Airplex Gigant: http://www.forum-3dcenter.org/vbulletin/showthread.php?t=557708


Home Entertainment:
Sony X85C 65", Nvidia Shield Pro 500 GB, Denon AVR X1200W, 5.0 Nubert nuJubilee 40

Posted 02/03/2016 10:28 PM   
Nothing to worry about! Mike has already fixed a lot of stuff. Unfortunately, more stuff remains to be fixed (and some of it looks complicated), so it will take a while. If you can't wait for a fix, you can always use CM, but do keep an eye on this thread: once the fix is done, this game DEMANDS a replay!!! ;)

Posted 02/03/2016 11:21 PM   
bo3b said: Glad to help out, hope it works. I put up an online diff for comparison:

https://www.diffchecker.com/iywcixuf

The big instructions like ld_indexed and InterlockedExchange are all correct, but the compiler moves stuff into different registers, so it looks like a lot of mismatch when it's just using different registers, like r2 instead of r1.

Overall this looks pretty close, but could have glitches.

It didn't help that Microsoft's documentation for InterlockedExchange is 100% wrong on the syntax.

I've got a pretty good handle on the ld_structured_indexable we are seeing a lot in SM5 games, and I have an idea for how to generate the right code for these struct loads. It'll take me a while, but I should be able to solve that set of manual fixes. Like:

//ld_structured_indexable(structured_buffer, stride=72)(mixed,mixed,mixed,mixed) r0.w, r0.w, l(4), t5.xxxx
r0.w = GroupRenderData[r0.w].noiseIntensity;

Yes! The shader actually worked very well!
Unfortunately, it wasn't "the droid" I was looking for... /sigh :( So, back to square one :(

Again, big thanks!
Note to self: I really need to write down all the fixes discussed in these threads; otherwise I forget and end up asking the same questions later on -_-...

Posted 02/03/2016 11:23 PM   