Thanks, bo3b!
By the way, the memory leak only happens when hunting mode is enabled (I see you opened an issue on GitHub about this). With it disabled, I used alt+enter multiple times and VRAM usage remained steady.
I've released the fixes for Zelda OOT here: https://forums.dolphin-emu.org/Thread-zelda-collector-s-edition-hd-texture-pack-v0-4-patch-2-new-dds-link-and-ar-fix
If the iniparams issue gets fixed someday, only 2 fix packs will be needed instead of 4.
Totally not related to Dolphin but related to the memory leak on alt+tabbing:
- I saw this a while back in Dragon Age: Inquisition as well. If you alt+tab (with shader hunting=1) around 4-5 times while monitoring RAM usage, you will see that at some point your whole RAM is used (16GB in my case) and it starts dumping stuff to the HDD. At some point the game silently crashes...
I haven't checked this lately and I can't remember whether you found the issue there or not... but I just remembered it, and maybe it helps you identify the issue more easily? :)
1x Palit RTX 2080Ti Pro Gaming OC(watercooled and overclocked to hell)
3x 3D Vision Ready Asus VG278HE monitors (5760x1080).
Intel i9 9900K (overclocked to 5.3 and watercooled ofc).
Asus Maximus XI Hero Mobo.
16 GB Team Group T-Force Dark Pro DDR4 @ 3600.
Lots of Disks:
- Raid 0 - 256GB Sandisk Extreme SSD.
- Raid 0 - WD Black - 2TB.
- SanDisk SSD PLUS 480 GB.
- Intel 760p 256GB M.2 PCIe NVMe SSD.
Creative Sound Blaster Z.
Windows 10 x64 Pro.
etc
[quote=""]Totally not related to Dolphin but related to the memory leak on alt+tabbing:
- I saw this a while back in Dragon Age: Inquisition as well. If you alt+tab (with shader hunting=1) around 4-5 times while monitoring RAM usage, you will see that at some point your whole RAM is used (16GB in my case) and it starts dumping stuff to the HDD. At some point the game silently crashes...[/quote]
In my case, Shadow Warrior and Dolphin filled my VRAM, and after a few more alt+tabs they crashed, made the Nvidia drivers crash, or Windows complained about a lack of memory, even though system RAM usage was normal.
By the way, with 3Dmigoto in 2D, it's not only constants that don't work. Shader partnering doesn't work either, for example: it made the vertex shader not apply to anything, instead of applying to the defined pixel shader. This gets in the way of fixing one particular game with 2D backgrounds that should be played in 2D.
Just posted an update to 3Dmigoto for version 1.2.1. [url]https://github.com/bo3b/3Dmigoto/releases[/url]
This version primarily includes a new hash calculation for textures and buffers. If you want to use TextureOverrides, the hash calculation is now completely different and so the hashes themselves will be different.
Old fixes using those will not work directly with this new version, which is why we are bumping to 1.2: it's an incompatible change. I think there are only 2 or 3 fixes using this, including FarCry4 and Crysis 2 & 3. If it's too inconvenient, it's always OK to use the older version.
The new hash is only 32 bits instead of 64, so it's a bit easier to manage. The hash function for shaders themselves did not change, as there was no advantage there, and there are plenty of shaders already out there. This will make it easy to tell the difference between hashes, as texture hashes will be shorter.
The reason for this change was to improve the performance of the texture override function, as it was using 2.4% or so of the CPU, almost all in the hash calculation itself. This change uses a hardware-based crc32c function that is in all modern processors, and so it's some 30x faster than the old one.
With this change, we are now back to using 0.8% of the CPU for 3Dmigoto as measured with performance tools.
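For anyone curious what a CRC-32C value looks like, here is a minimal pure-Python sketch of the Castagnoli CRC, done bit by bit with no hardware acceleration. It is only an illustration: the function name is made up for this example, and it does not reproduce which bytes 3Dmigoto feeds in (texture description, initial data) or how it seeds the calculation, so it will not reproduce real TextureOverride hashes.

```python
# Bitwise CRC-32C (Castagnoli). Illustration only: 3Dmigoto uses a
# hardware-accelerated crc32c library, and its exact inputs are not shown here.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c(data: bytes, crc: int = 0) -> int:
    """Return the CRC-32C of data, optionally continuing from a previous crc."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; XOR in the polynomial if a 1 bit fell off.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


# Standard check value for the ASCII string "123456789":
print(f"{crc32c(b'123456789'):08x}")  # e3069283
```

A 32 bit result always fits in 8 hex digits, against 16 for the 64 bit shader hashes, which is why the two kinds are easy to tell apart at a glance.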
Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607 Latest 3Dmigoto Release Bo3b's School for ShaderHackers
Looks like I'll have to add support for this eventually. Did you consider making it backwards compatible?
Basically, detect whether TextureOverride uses old or new hashes and do either the efficient new texture hashing or the old one, depending on d3dx.ini.
As you mentioned it only affects a few game fixes so not sure if it matters.
Yes, we considered it, but DarkStarSword and I thought now was a good time to make a clean break there because only a couple of games are using it. If someone decides to revisit those games, it won't be a large burden to replace the TextureOverride.
So, I wouldn't worry about the old version, and just go with the new version. The crc32c library code is in the project, and I did not modify the project, so you should be able to just grab those code files and call them.
It's basically a matter of changing the call from fnv_64_buf for Texture2D, Texture3D, and Buffer to the append_hw call. You might want to use the superset version of CalcTexture2DDescHash from HackerDevice.cpp, as that has the code to include the pDesc.
For example, look at CreateTexture2D to see how to exactly match the new texture hash for these. It should be simpler than it was before.
I'm having an issue with a game I'm looking at. I have a number of shaders (4 so far) that get blanked out simply by being in the ShaderFixes directory. When I look in the D3d11 log for the associated shader, it only shows "D:\Origin Games\BFH\wrapper1349(116,3-14): warning X3557: loop only executes for 0 iteration(s), forcing loop to unroll". I'm using the latest 3Dmigoto version (1.2.3) and I tried with hunting off, but no dice. I'm wondering if there's anything else I can try to fix this issue?
4everAwake, look at the WIP I sent you. The reflection shader has a manual fix (replacing one line), and in Star Wars Battlefront we found a similar issue that requires manually removing one line (you can look at the thread).
Upload the shaders here... I think it will be a similar issue to the one I mentioned. It's a specific issue with the wrapper.
Introducing 3DMigoto 1.2.4 with support for copying resources arbitrarily between shaders:
https://github.com/bo3b/3Dmigoto/releases/tag/1.2.4
Currently supports copying:
- Constant Buffers (<type>s-cb<slot>)
- Vertex Buffers (vb<slot>)
- Index Buffers (ib)
- Stream Output Buffers (so<slot>)
- Textures (<type>s-t<slot>)
- Render Targets (o<slot>)
- Depth Targets (oD)
(Unordered Access Views and sampler states not yet supported)
I haven't fully documented this in the d3dx.ini yet, but here's a few examples to get you started.
Copy constant buffer 1 from the pixel shader to the vertex shader as constant buffer 13 (used in Mad Max to get access to the depth of a light to fix bloom):
[code]
[ShaderOverrideBloomVS]
; The vertex shader doesn't have access to the light depth, which we need to
; correct the bloom position accurately. The pixel shader does have the depth
; which it uses to calculate the bloom opacity. Bind the relevant constant
; buffer from the pixel shader to the vertex shader so we can get the correct
; depth and fix the position.
Hash=e9849e745227d124
vs-cb13 = ps-cb1
[/code]
- https://github.com/DarkStarSword/3d-fixes/commit/565daab56d5b4d52aa6aaa9e1b45cf37d03e2492
Copy a constant buffer from one shader to another, using an intermediate resource (used to fix specular highlights and environmental reflections in Unity 5 games):
[code]
; Define an intermediate resource to copy between different shaders. Custom
; resources like this start with "Resource":
[ResourceUnityPerCameraRare]
; Copy constant buffer 1 from the directional lighting shader into the
; intermediate resource:
[ShaderOverrideDirectional]
Hash = b78925705424e647
ResourceUnityPerCameraRare = vs-cb1
; Copy (actually reference) the intermediate resource into the physical
; lighting shader as constant buffer 13:
[ShaderOverridePhysical]
Hash = ca5cfc8e4d8b1ce5
vs-cb13 = ResourceUnityPerCameraRare
[/code]
- https://github.com/DarkStarSword/3d-fixes/commit/cacec95fe30661194ef014e1b386bb346dc10118
Copy the currently active depth buffer into a vertex shader (e.g. for automatically adjusting a crosshair depth) - this replaces the experimental depth_input feature, which has been removed:
[code]
[ShaderOverrideCrosshair]
Hash = 07e0f4c1eb997ee1
vs-t110 = oD
[/code]
Copy a vertex buffer into the shader as a constant buffer (Might be able to look up the position of other vertices?):
[code]
[ShaderOverrideClippedLight]
Hash = 1234
vs-cb12 = vb0
[/code]
Copy an active render target into the shader as a texture:
[code]
[ShaderOverrideClippedTransparency]
Hash = abcd
ps-t108 = o0
[/code]
There's also keywords to control some advanced features. For example, 3DMigoto will try to guess whether it should do a full copy of a resource, or only a lightweight reference, but maybe you want to override this:
[code]
[ResourceTempStorage]
[ShaderOverrideFoo]
Hash = foo
; Assigning *to* a temporary resource defaults to copying, but maybe we
; actually want a reference either to get any updates the game makes to the
; original texture, or because we know the resource won't change before we need
; to use it:
ResourceTempStorage = reference ps-t0
[ShaderOverrideBar]
Hash = bar
; Assigning *from* a temporary resource defaults to reference, but perhaps the
; temporary resource doesn't have the right bind flags for what we are
; assigning it to, or we want to leave it assigned in this slot while
; preventing it from getting updated if the game changes the original:
ps-t110 = copy ResourceTempStorage
[/code]
By default, if you try to copy something that wasn't bound, 3DMigoto will unbind the destination as well. But, perhaps the resource you are copying is only bound some of the time, and if it is not bound you want to leave whatever was previously bound in the destination alone:
[code]
[ShaderOverrideBaz]
Hash = baz
ps-t50 = o2 unless_null
[/code]
Or, perhaps you just want to unbind something from the pipeline for some reason (like influencing driver heuristics?):
[code]
[ShaderOverrideRubbish]
Hash = rubbish
o0 = null
[/code]
All this code is very new and there might still be bugs or memory leaks. If you can think of something this might be useful for, please give it a try and see what happens :)
2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit
[quote="DarkStarSword"]
Copy the currently active depth buffer into a vertex shader (e.g. for automatically adjusting a crosshair depth) - this replaces the experimental depth_input feature, which has been removed:
[code]
[ShaderOverrideCrosshair]
Hash = 07e0f4c1eb997ee1
vs-t110 = oD
[/code]
[/quote]
Can you expand a little more on this feature: how to use it in a shader, and what info it gives us?
With this we can implement dynamic crosshairs... yeaaah!!
There's a few pieces you need to get an auto crosshair working, and you will need to adjust it for each game.
[size="L"][center][b][color="green"]Step 1: Copy the depth buffer[/color][/b][/center][/size]
Firstly, you will need to get access to the depth buffer in the crosshair's vertex shader, which you can use the new resource copying feature for.
[size="M"][b][color="green"]Option 1: Simple[/color][/b][/size]
If the game leaves the depth buffer assigned while it is drawing UI elements, you can simply do this to copy it to texture 110 in the vertex shader:
[code]
[ShaderOverrideCrosshair]
Hash = ...
vs-t110 = oD
[/code]
This may or may not work in a given game. You can use frame analysis to dump out the depth buffer for this shader to check, or just try it and see.
There may also be some other problems with this method in terms of performance or vram usage, as this will cause a full copy every time this shader is encountered. This is likely to happen multiple times in a frame, and if you are adjusting the entire UI you may end up with copies performed for each UI element (eating performance), and copies hanging around for each unique UI shader (consuming vram). Resource renaming (a hardware feature to trade off vram for performance) may also cause more vram to be consumed than we would naively expect. This should be fairly well contained though - if you see ram or vram usage continue to increase without limit while using this feature let me know, as that may signify a bug (which is entirely possible as this is all very new code).
[size="M"][b][color="green"]Option 2: Reduced copies, better performance, less vram usage[/color][/b][/size]
You can limit the number of copies to once per frame for a given resource to eliminate these issues, though it uses slightly different code:
[code]
[ResourceDepthBuffer]
max_copies_per_frame=1
[ShaderOverrideCrosshair]
Hash = ...
; Since we are limiting the number of copies, use the 'unless_null' keyword to
; make sure we don't end up with a blank buffer if some draw call doesn't have
; a depth buffer bound:
ResourceDepthBuffer = oD unless_null
vs-t110 = ResourceDepthBuffer
[/code]
The idea there is that a depth buffer will only be copied to ResourceDepthBuffer the first time a UI shader is encountered in a frame, eliminating excess copies and reducing the performance overhead. Using an intermediate resource means we can get this benefit even if multiple UI shaders are used. The "copy" of the temporary resource to a texture happens by reference (by default), so there is no overhead there.
Edit: Supported in 3DMigoto 1.2.5. Added 'unless_null' keyword.
[size="M"][b][color="green"]Option 3: Copy from another shader (may make step 5 easier)[/color][/b][/size]
If the depth buffer is not assigned when the UI shader is drawn you can't just copy it directly and will need to copy it from a separate shader. This should not be too difficult as the depth buffer is typically available all over the place. You might even be able to find a version that has been pre-scaled to world coordinates (a "W-buffer"), which will make step 5 trivial.
e.g. you might copy it out of a shadow shader:
[code]
[ResourceDepthBuffer]
[ShaderOverrideShadow]
Hash = ...
; There's a good chance that copying by reference will work as we are copying a
; texture for use in a texture slot (same type of binding), and if it works this
; will save us the copy and extra storage:
ResourceDepthBuffer = reference ps-t0
; But if it doesn't work (maybe the game overwrites the texture afterwards),
; use a full copy instead, which is the default when copying to a temporary
; resource so we don't need to explicitly say so:
;ResourceDepthBuffer = ps-t0
[ShaderOverrideCrosshair]
Hash = ...
vs-t110 = ResourceDepthBuffer
[/code]
or perhaps you have found (using frame analysis) a shader run once at the start of post processing that renders the depth buffer to a render target (seems to be fairly common):
[code]
[ResourceDepthBuffer]
[ShaderOverridePostProcessingDepthBuffer]
Hash = ...
; Unlike the above we almost certainly need to do a full copy here as we are
; copying a render target for use in a texture, which are different types of
; bindings (unless the game created the texture with both bind flags, which is
; unlikely). This is the default behaviour when copying to a temporary
; resource, so we don't need to explicitly say so:
;
; Since we are copying a render target we want to wait until after the draw
; call has finished before we copy it, which requires the "post" keyword
; introduced in 3DMigoto 1.2.5:
post ResourceDepthBuffer = o0
[ShaderOverrideCrosshair]
Hash = ...
vs-t110 = ResourceDepthBuffer
[/code]
Edit: Added post keyword in 3DMigoto 1.2.5 to copy render targets after the draw call
[size="L"][center][b][color="green"]Step 2: Declare the copied depth buffer[/color][/b][/center][/size]
Once you have the depth buffer copied into the vertex shader you need to add this declaration to the top, adjusting register(t110) to match whichever texture slot you copied the depth buffer to:
[code]
// Depth buffer copied to this input with 3Dmigoto:
Texture2D<float> DepthBuffer : register(t110);
[/code]
Note that if you copied the resource from another texture (as opposed to a depth buffer) you should use the same declaration as the shader you copied it from (except for the register number). If you copied it from another render target you may also need to change float to float4 to match the oN register in the original pixel shader. If either of these mean using a float2/float3/float4 you may also need to adjust which channel the depth is read from where it is used below.
[size="L"][center][b][color="green"]Step 3: Copy and paste the auto crosshair code[/color][/b][/center][/size]
Then, you will want to paste this code into the shader before the main() function. You will need to change some things (either near & far, or the scaling applied in world_z_from_depth_buffer), but we will come back to that:
[code]
static const float near = 0.1;
static const float far = 40000;

float world_z_from_depth_buffer(float x, float y)
{
    uint width, height;
    float z;

    DepthBuffer.GetDimensions(width, height);

    x = min(max((x / 2 + 0.5) * width, 0), width - 1);
    y = min(max((-y / 2 + 0.5) * height, 0), height - 1);
    z = DepthBuffer.Load(int3(x, y, 0));
    if (z == 1)
        return 0;

    // Derive world Z from depth buffer. This is a kluge since I don't know
    // the correct scaling, and the Z buffer seems to be (1 - what I expected).
    // Might be able to determine the correct way to scale it from other shaders.
    return far*near/(((1-z)*near) + (far*z));
}

float adjust_from_depth_buffer(float x, float y)
{
    float4 stereo = StereoParams.Load(0);
    float separation = stereo.x; float convergence = stereo.y;
    float old_offset, offset, w, sampled_w, distance;
    uint i;

    // Stereo cursor: To improve the accuracy of the stereo cursor, we
    // sample a number of points on the depth buffer, starting at the near
    // clipping plane and working towards original x + separation.
    //
    // You can think of this as a line in three dimensional space that
    // starts at each eye and stretches out towards infinity. We sample 255
    // points along this line (evenly spaced in the X axis) and compare
    // with the depth buffer to find where the line is first intersected.
    //
    // Note: The reason for sampling 255 points came from a restriction in
    // DX9/SM3 where loops had to run a constant number of iterations and
    // there was no way to set that number from within the shader itself.
    // I'm not sure if the same restriction applies in DX11 with SM4/5 - if
    // it doesn't, we could change this to check each pixel instead for
    // better accuracy.
    //
    // Based on DarkStarSword's stereo crosshair code originally developed
    // for Miasmata, adapted to Unity, then translated to HLSL.

    offset = (near - convergence) * separation; // Z = X offset from center
    distance = separation - offset; // Total distance to cover (separation - starting X offset)

    old_offset = offset;
    for (i = 0; i < 255; i++) {
        offset += distance / 255.0;

        // Calculate depth for this point on the line:
        w = (separation * convergence) / (separation - offset);

        sampled_w = world_z_from_depth_buffer(x + offset, y);
        if (sampled_w == 0)
            return 0;

        // If the sampled depth is closer than the calculated depth,
        // we have found something that intersects the line, so exit
        // the loop and return the last point that was not intersected:
        if (w > sampled_w)
            break;

        old_offset = offset;
    }

    return old_offset;
}
[/code]
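If you want a feel for what the sampling loop settles on before touching a shader, here is a rough Python model of adjust_from_depth_buffer(). It is a sketch under stated assumptions, not part of 3DMigoto: the depth buffer lookup is replaced by a hypothetical sample_w callback, the scene is a flat wall at a constant depth, and the separation, convergence and wall distance are made-up numbers.

```python
import math


def adjust_from_depth(separation, convergence, sample_w, near=0.1, steps=255):
    """Model of adjust_from_depth_buffer(). sample_w(offset) stands in for the
    depth buffer lookup and returns world Z at that X offset (0 = no data)."""
    offset = (near - convergence) * separation
    distance = separation - offset
    old_offset = offset
    for _ in range(steps):
        offset += distance / steps
        remaining = separation - offset
        # At the far end of the line separation - offset reaches zero. Python
        # raises on division by zero, so treat that point as infinitely far.
        w = math.inf if remaining <= 0 else (separation * convergence) / remaining
        sampled_w = sample_w(offset)
        if sampled_w == 0:
            return 0.0
        # Same test as the shader: stop at the first point behind the surface
        # and keep the last offset that was still in front of it.
        if w > sampled_w:
            break
        old_offset = offset
    return old_offset


# Hypothetical numbers: flat wall 10 units away, convergence 1, separation 0.05.
separation, convergence, wall = 0.05, 1.0, 10.0
found = adjust_from_depth(separation, convergence, lambda offset: wall)
ideal = separation * (1 - convergence / wall)
print(found, ideal)
```

The value the loop returns sits within one step (distance / 255) below separation * (1 - convergence / depth), which is just the w formula in the loop solved for offset.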
[size="L"][center][b][color="green"]Step 4: Hook up the auto crosshair code[/color][/b][/center][/size]
[size="M"][b][color="green"]Option 1: Adjust based on the center of the screen[/color][/b][/size]
Then, somewhere in the body of the code you call this function and pass it the coordinates on the depth buffer you want to check. For example, if you are adjusting a crosshair you probably want to sample around the center of the screen (0,0):
[code]
o0.x += adjust_from_depth_buffer(0, 0);
[/code]
This assumes that o0.w == 1 and the UI element was being displayed at screen depth originally. If that is not the case, you would need to change the adjustment to compensate (e.g. by multiplying by o0.w, and/or subtracting the nvidia formula and/or normalising the coordinate so that it is at depth==1 or depth==convergence) - standard UI adjustment stuff.
[size="M"][b][color="green"]Option 2: Adjust each UI element separately[/color][/b][/size]
I've generally found it's simpler to adjust the whole UI to the crosshair depth, or better to only adjust the crosshair, but if you wanted to experiment with automatically adjusting the entire UI you could do something like this to adjust each vertex individually. The result looks very similar to how the UI appears in compatibility mode, with the UI distorted over the geometry, so I don't necessarily recommend this:
[code]
o0.x += adjust_from_depth_buffer(o0.x, o0.y);
[/code]
You might also experiment with finding a consistent point to sample for a given UI element. You might be able to multiply a point at the origin (or some fixed offset) by the MVP matrix passed to the shader to find a point that will be consistent for all corners, or perhaps you could copy a vertex buffer in to look up the coordinates of other vertices (I haven't tried this yet).
[size="L"][center][b][color="green"]Step 5: Figure out the right scaling[/color][/b][/center][/size]
Finally, you will need to figure out the correct way to scale a value from the depth buffer to world Z for this particular game, and the right answer here will vary depending on the game.
[size="M"][b][color="green"]Option 1: Adjust near + far clipping planes[/color][/b][/size]
A simple way that might work is to find or guess the values of the near and far clipping planes and adjust the definitions at the top of the code. You might be able to find these in a constant buffer somewhere and use them directly or dump them out using frame analysis to find their values and hardcode them (which will only work if they don't change during the game - check a few levels & cutscenes to make sure they are ok).
You can also use the convergence to guess these - find something in the game that clips through the camera (or failing that is as close to the camera as possible), then adjust the convergence to put the point it clips at screen depth. At that point the convergence will be an upper bound for the near clipping plane. Then, find something far away (like a mountain, but probably not the sky box) and adjust convergence until that is at screen depth, which will give you a lower bound for the far clipping plane. Plug those values into the code, then use trial and error to tune them until the crosshair rests on whatever you are aiming at.
If this is totally not working it may be that the game is fundamentally scaling their depth buffer differently (e.g. linear vs exponential), and the formula in world_z_from_depth_buffer() may need to be adjusted (the above one works in Witcher 3 and Mad Max).
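While tuning, it can help to see how the formula behaves for candidate near and far values outside the game. This small Python sketch repeats the arithmetic of world_z_from_depth_buffer() from step 3 without the texture load; the function name is made up for this example, and the defaults are just the placeholder values from step 3, not values known to be right for any particular game.

```python
def world_z_from_depth(z, near=0.1, far=40000.0):
    """Same arithmetic as world_z_from_depth_buffer(), minus the texture load."""
    if z == 1:
        return 0.0  # mirrors the early-out in the HLSL
    return far * near / (((1 - z) * near) + (far * z))


# Print world Z for a few stored depth values to see how quickly it falls off.
for z in (0.0, 0.001, 0.01, 0.5):
    print(z, world_z_from_depth(z))
```

With this formula a stored value of 0 maps to the far plane and values approaching 1 map towards the near plane, with most of the range squeezed close to near. If the crosshair only behaves on very close or very distant objects, that is a hint that near or far is off, or that the game scales its depth buffer differently.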
[size="M"][b][color="green"]Option 2: Do what the game does[/color][/b][/size]
A better way is to look at other places the game uses the depth buffer to calculate a world coordinate (such as in most deferred lighting shaders) and replace the "return far*near/..." line in the world_z_from_depth_buffer() function with code that does the same scaling as the game. You may find you also need some values which may or may not already be available in the crosshair shader - if they aren't you can copy them in, or use frame analysis to dump them out and hardcode them (which will only work if they don't change during the game).
[size="M"][b][color="green"]Option 3: Revisit step 1[/color][/b][/size]
The game may have a pre-scaled copy of the depth buffer available somewhere (a "W-buffer"), allowing you to use its values directly with no scaling whatsoever. This won't be available if you copied a depth buffer from oD, so you may wish to go back to step 1 and look for alternative sources of depth information.
e.g. in The Witcher 3 the first shader run during post-processing scales the Z-buffer into a W-buffer, so using it allows us to skip scaling it ourselves:
[code]
[ResourceWBuffer]
[ShaderOverrideHBAODepthPass]
Hash = 170486ed36efcc9e
; This shader converts the Z buffer into a W buffer. Save it off for later use
; in HUD shaders:
post ResourceWBuffer = o0
[/code]
There are a few pieces you need to get an auto crosshair working, and you will need to adjust it for each game.
Step 1: Copy the depth buffer
Firstly, you will need to get access to the depth buffer in the crosshair's vertex shader, which you can use the new resource copying feature for.
Option 1: Simple
If the game leaves the depth buffer assigned while it is drawing UI elements, you can simply do this to copy it to texture 110 in the vertex shader:
[ShaderOverrideCrosshair]
Hash = ...
vs-t110 = oD
This may or may not work in a given game. You can use frame analysis to dump out the depth buffer for this shader to check, or just try it and see.
There may also be some other problems with this method in terms of performance or vram usage as this will cause a full copy every time this shader is encountered. This is likely to happen multiple times in a frame, and if you are adjusting the entire UI you may end up with copies performed for each UI element (eating performance), and copies hanging around for each unique UI shader (consuming vram). Resource renaming (a hardware feature to trade off vram for performance) may also cause more vram to be consumed than we would naively expect. This should be fairly well contained though - if you see ram or vram usage continue to increase without limit while using this feature let me know as that may signify a bug (which is entirely possible as this is all very new code).
Option 2: Reduced copies, better performance, less vram usage
You can limit the number of copies to once per frame for a given resource to eliminate these issues, though it uses slightly different code:
[ResourceDepthBuffer]
max_copies_per_frame=1
[ShaderOverrideCrosshair]
Hash = ...
; Since we are limiting the number of copies, use the 'unless_null' keyword to
; make sure we don't end up with a blank buffer if some draw call doesn't have
; a depth buffer bound:
ResourceDepthBuffer = oD unless_null
vs-t110 = ResourceDepthBuffer
The idea there is that a depth buffer will only be copied to ResourceDepthBuffer the first time a UI shader is encountered in a frame, eliminating excess copies and reducing the performance overhead. Using an intermediate resource means we can get this benefit even if multiple UI shaders are used. The "copy" of the temporary resource to a texture happens by reference (by default), so there is no overhead there.
Edit: Supported in 3DMigoto 1.2.5. Added 'unless_null' keyword.
Option 3: Copy from another shader (may make step 5 easier)
If the depth buffer is not assigned when the UI shader is drawn you can't just copy it directly and will need to copy it from a separate shader. This should not be too difficult as the depth buffer is typically available all over the place. You might even be able to find a version that has been pre-scaled to world coordinates (a "W-buffer"), which will make step 5 trivial.
e.g. you might copy it out of a shadow shader:
[ResourceDepthBuffer]
[ShaderOverrideShadow]
Hash = ...
; There's a good chance that copying by reference will work as we are copying a
; texture for use in a texture slot (same type of binding), and if it works this
; will save us the copy and extra storage:
ResourceDepthBuffer = reference ps-t0
; But if it doesn't work (maybe the game overwrites the texture afterwards),
; use a full copy instead, which is the default when copying to a temporary
; resource so we don't need to explicitly say so:
;ResourceDepthBuffer = ps-t0
or perhaps you have found (using frame analysis) a shader run once at the start of post processing that renders the depth buffer to a render target (seems to be fairly common):
[ResourceDepthBuffer]
[ShaderOverridePostProcessingDepthBuffer]
Hash = ...
; Unlike the above we almost certainly need to do a full copy here as we are
; copying a render target for use in a texture, which are different types of
; bindings (unless the game created the texture with both bind flags, which is
; unlikely). This is the default behaviour when copying to a temporary
; resource, so we don't need to explicitly say so:
;
; Since we are copying a render target we want to wait until after the draw
; call has finished before we copy it, which requires the "post" keyword
; introduced in 3DMigoto 1.2.5:
post ResourceDepthBuffer = o0
Edit: Added post keyword in 3DMigoto 1.2.5 to copy render targets after the draw call
Step 2: Declare the copied depth buffer
Once you have the depth buffer copied into the vertex shader you need to add this declaration to the top, adjusting register(t110) to match whichever texture slot you copied the depth buffer to:
// Depth buffer copied to this input with 3Dmigoto:
Texture2D<float> DepthBuffer : register(t110);
Note that if you copied the resource from another texture (as opposed to a depth buffer) you should use the same declaration as the shader you copied it from (except for the register number). If you copied it from another render target you may also need to change float to float4 to match the oN register in the original pixel shader. If either of these mean using a float2/float3/float4 you may also need to adjust which channel the depth is read from where it is used below.
Step 3: Copy and paste the auto crosshair code
Then, you will want to paste this code into the shader before the main() function. You will need to change some things (either near & far, or the scaling applied in world_z_from_depth_buffer), but we will come back to that:
static const float near = 0.1;
static const float far = 40000;

float world_z_from_depth_buffer(float x, float y)
{
	uint width, height;
	float z;

	DepthBuffer.GetDimensions(width, height);

	// Convert clip space coordinates to a pixel position clamped to the buffer:
	x = min(max((x / 2 + 0.5) * width, 0), width - 1);
	y = min(max((-y / 2 + 0.5) * height, 0), height - 1);
	z = DepthBuffer.Load(int3(x, y, 0));
	if (z == 1)
		return 0;

	// Derive world Z from depth buffer. This is a kluge since I don't know
	// the correct scaling, and the Z buffer seems to be (1 - what I expected).
	// Might be able to determine the correct way to scale it from other shaders.
	return far*near/(((1-z)*near) + (far*z));
}

float adjust_from_depth_buffer(float x, float y)
{
	// Assumes the StereoParams texture that 3DMigoto's decompiler declares
	// in dumped shaders (x = separation, y = convergence):
	float4 stereo = StereoParams.Load(0);
	float separation = stereo.x;
	float convergence = stereo.y;
	float old_offset, offset, w, sampled_w, distance;
	uint i;

	// Stereo cursor: To improve the accuracy of the stereo cursor, we
	// sample a number of points on the depth buffer, starting at the near
	// clipping plane and working towards original x + separation.
	//
	// You can think of this as a line in three dimensional space that
	// starts at each eye and stretches out towards infinity. We sample 255
	// points along this line (evenly spaced in the X axis) and compare
	// with the depth buffer to find where the line is first intersected.
	//
	// Note: The reason for sampling 255 points came from a restriction in
	// DX9/SM3 where loops had to run a constant number of iterations and
	// there was no way to set that number from within the shader itself.
	// I'm not sure if the same restriction applies in DX11 with SM4/5 - if
	// it doesn't, we could change this to check each pixel instead for
	// better accuracy.
	//
	// Based on DarkStarSword's stereo crosshair code originally developed
	// for Miasmata, adapted to Unity, then translated to HLSL.

	offset = (near - convergence) * separation;	// Z = X offset from center
	distance = separation - offset;	// Total distance to cover (separation - starting X offset)

	old_offset = offset;
	for (i = 0; i < 255; i++) {
		offset += distance / 255.0;

		// Calculate depth for this point on the line:
		w = (separation * convergence) / (separation - offset);

		// Look up the depth of the geometry behind this point. A result of 0
		// means world_z_from_depth_buffer() found nothing usable there:
		sampled_w = world_z_from_depth_buffer(x + offset, y);
		if (sampled_w == 0)
			return 0;

		// If the sampled depth is closer than the calculated depth,
		// we have found something that intersects the line, so exit
		// the loop and return the last point that was not intersected:
		if (w > sampled_w)
			break;

		old_offset = offset;
	}

	return old_offset;
}
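If it helps to see what that loop is doing outside of a shader, here is a rough Python model of the same march. It is only a sketch: sample_world_z is a hypothetical stand-in for the depth buffer lookup, the stereo settings are made-up numbers, and treating a sample of 0 as "nothing there" is an assumption based on world_z_from_depth_buffer() returning 0 in that case.

```python
import math

def adjust_from_depth(sample_world_z, separation, convergence, near=0.1, steps=255):
    """Walk along the eye ray in X and return the last offset that is
    still in front of the geometry.

    sample_world_z(offset) is a hypothetical stand-in for the depth buffer
    lookup: it returns the world depth at that X offset, or 0 for "nothing".
    """
    offset = (near - convergence) * separation  # starting X offset
    distance = separation - offset              # total X distance to cover
    old_offset = offset
    for _ in range(steps):
        offset += distance / steps
        remaining = separation - offset
        # Depth of this point on the ray. The ray heads off to infinity as
        # the offset approaches the full separation, so guard the division.
        w = math.inf if remaining <= 0 else (separation * convergence) / remaining
        sampled_w = sample_world_z(offset)
        if sampled_w == 0:
            return 0.0
        if w > sampled_w:
            break
        old_offset = offset
    return old_offset

# A flat wall 5 units away, with made-up stereo settings:
separation, convergence, wall = 0.05, 1.0, 5.0
found = adjust_from_depth(lambda _offset: wall, separation, convergence)

# Offset that would place a point exactly at the wall's depth:
expected = separation * (1 - convergence / wall)
step = (separation - (0.1 - convergence) * separation) / 255
print(found, expected, step)
```

The returned offset should land within one step below the exact answer, which is why more samples (or per-pixel sampling) would give a more accurate crosshair.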
Step 4: Hook up the auto crosshair code
Option 1: Adjust based on the center of the screen
Then, somewhere in the body of the code you call this function and pass it the coordinates on the depth buffer you want to check. For example, if you are adjusting a crosshair you probably want to sample around the center of the screen (0,0):
o0.x += adjust_from_depth_buffer(0, 0);
This assumes that o0.w == 1 and the UI element was being displayed at screen depth originally. If that is not the case, you would need to change the adjustment to compensate (e.g. by multiplying by o0.w, and/or subtracting the nvidia formula and/or normalising the coordinate so that it is at depth==1 or depth==convergence) - standard UI adjustment stuff.
Option 2: Adjust each UI element separately
I've generally found it's simpler to adjust the whole UI to the crosshair depth, or better to only adjust the crosshair, but if you wanted to experiment with automatically adjusting the entire UI you could do something like this to adjust each vertex individually. The result looks very similar to how the UI appears in compatibility mode, with the UI distorted over the geometry, so I don't necessarily recommend this:
o0.x += adjust_from_depth_buffer(o0.x, o0.y);
You might also experiment with finding a consistent point to sample for a given UI element. You might be able to multiply a point at the origin (or some fixed offset) by the MVP matrix passed to the shader to find a point that will be consistent for all corners, or perhaps you could copy a vertex buffer in to look up the coordinates of other vertices (I haven't tried this yet).
Step 5: Figure out the right scaling
Finally, you will need to figure out the correct way to scale a value from the depth buffer to world Z for this particular game, and the right answer here will vary depending on the game.
Option 1: Adjust near + far clipping planes
A simple way that might work is to find or guess the values of the near and far clipping planes and adjust the definitions at the top of the code. You might be able to find these in a constant buffer somewhere and use them directly or dump them out using frame analysis to find their values and hardcode them (which will only work if they don't change during the game - check a few levels & cutscenes to make sure they are ok).
You can also use the convergence to guess these - find something in the game that clips through the camera (or failing that is as close to the camera as possible), then adjust the convergence to put the point it clips at screen depth. At that point the convergence will be an upper bound for the near clipping plane. Then, find something far away (like a mountain, but probably not the sky box) and adjust convergence until that is at screen depth, which will give you a lower bound for the far clipping plane. Plug those values into the code, then use trial and error to tune them until the crosshair rests on whatever you are aiming at.
If this is totally not working it may be that the game is fundamentally scaling their depth buffer differently (e.g. linear vs exponential), and the formula in world_z_from_depth_buffer() may need to be adjusted (the above one works in Witcher 3 and Mad Max).
Option 2: Do what the game does
A better way is to look at other places the game uses the depth buffer to calculate a world coordinate (such as in most deferred lighting shaders) and replace the "return far*near/..." line in the world_z_from_depth_buffer() function with code that does the same scaling as the game. You may find you also need some values which may or may not already be available in the crosshair shader - if they aren't you can copy them in, or use frame analysis to dump them out and hardcode them (which will only work if they don't change during the game).
Option 3: Revisit step 1
The game may have a pre-scaled copy of the depth buffer available somewhere (a "W-buffer"), allowing you to use its values directly with no scaling whatsoever. This won't be available if you copied a depth buffer from oD, so you may wish to go back to step 1 and look for alternative sources of depth information.
e.g. in The Witcher 3 the first shader run during post-processing scales the Z-buffer into a W-buffer, so using it allows us to skip scaling it ourselves:
[ResourceWBuffer]
[ShaderOverrideHBAODepthPass]
Hash = 170486ed36efcc9e
; This shader converts the Z buffer into a W buffer. Save it off for later use
; in HUD shaders:
post ResourceWBuffer = o0
2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit
DHR said: 4everAwake, look at the WIP I sent you. The reflection shader has a manual fix (replace one line), and in Star Wars Battlefront we found a similar issue where one line needs to be removed manually (you can look at the thread).
Upload the shaders here... I think it will be a similar issue to the one I mentioned. It's a specific issue with the wrapper.
I looked at the example in the Battlefront thread. I know there's a line you have to disable, but I was unable to determine which line it is with this shader:
I'm not the expert, but you can try replacing this line (line 54)
[code]r1.zw = r1.zz ? float2(2.80259693e-045,0.499999493) : float2(1.40129846e-045,0.999998987);[/code]
with this one:
[code]r1.zw = r1.zz ? float2(2.0,0.499999) : float2(1.0,0.999999);[/code]
It's the same issue pattern as in the reflection shader I sent you in the WIP version.
I hit that same decompiler bug in some of the bloom pixel shaders in Mad Max, where 0x3e000000 (0.125) was somehow replaced with 1040187392.0 (0x4e780000). Here 2.0 (0x40000000) was somehow replaced with 2.8e-45 (0x00000002).
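The hex values quoted there can be checked by reinterpreting the raw 32 bits as a float and back. This is a small Python sketch using the standard struct module. The helper names are only for illustration and have nothing to do with the decompiler's own code.

```python
import struct

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def float_to_bits(value):
    """Reinterpret a single precision float as its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

print(bits_to_float(0x3e000000))              # the intended constant in Mad Max
print(0x3e000000)                             # the same bits read as an integer
print(hex(float_to_bits(float(0x3e000000))))  # that integer stored as a float
print(bits_to_float(0x00000002))              # integer 2 read as a float
print(bits_to_float(0x40000000))              # what 2.0 really looks like
```

So in the Mad Max case the bit pattern of the float was treated as an integer and then converted to a float, while in this shader a small integer bit pattern was printed as if it were a float, which is where the tiny e-045 values come from.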
In my case Shadow Warrior and Dolphin filled my VRAM, and after a few more alt+tabbing, they crash, or make the Nvidia drivers crash, or Windows complained about the lack of memory, but the system RAM usage was normal.
By the way, with 3Dmigoto in 2D, it's not only constants that don't work. Shader partnering doesn't work either, for example: it made the vertex shader not apply to anything, instead of applying only alongside the defined pixel shader. This gets in the way of fixing a game with 2D backgrounds that should be played in 2D.
CPU: Intel Core i7 7700K @ 4.9GHz
Motherboard: Gigabyte Aorus GA-Z270X-Gaming 5
RAM: GSKILL Ripjaws Z 16GB 3866MHz CL18
GPU: MSI GeForce RTX 2080Ti Gaming X Trio
Monitor: Asus PG278QR
Speakers: Logitech Z506
Donations account: masterotakusuko@gmail.com
This version primarily includes a new hash calculation for textures and buffers. If you want to use TextureOverrides, the hash calculation is now completely different and so the hashes themselves will be different.
Old fixes using those will not directly work with this new version. Because that is an incompatible change, we are bumping the version to 1.2. I think there are only 2 or 3 fixes using this, including FarCry4, Crysis2&3. If it's too inconvenient, it's always OK to use the older version.
The new hash is only 32 bits instead of 64, so a bit easier to manage. The hash function for shaders themselves did not change, as there was no advantage there, and plenty of shaders already out there. This will make it easy to tell the difference between hashes, as textures will be shorter.
The reason to do this change was to improve the performance of the texture override function, as it was using 2.4% or so of the CPU, almost all in the hash calculation itself. This change uses a hardware based crc32c function that is in all modern processors, and so it's some 30x faster than the old one.
With this change, we are now back to using 0.8% of the CPU for 3Dmigoto as measured with performance tools.
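For anyone curious what crc32c actually computes, here is a slow bit-by-bit Python sketch of the CRC-32C (Castagnoli) checksum, which is the one the CPU instruction implements. It only illustrates the checksum and the shorter hash format. It does not reproduce which bytes 3DMigoto feeds in or how it seeds the calculation, so don't expect it to match real texture hashes.

```python
def crc32c(data, crc=0):
    """Bitwise CRC-32C using the reflected Castagnoli polynomial 0x82F63B78."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

# Standard check string for CRC implementations:
check = crc32c(b"123456789")
print(format(check, "08x"))

# A 32-bit hash prints as 8 hex digits, a 64-bit shader hash as 16:
print(len(format(check, "08x")), len("170486ed36efcc9e"))
```

The hardware version produces the same result a few bytes per instruction instead of one bit per loop iteration, which is where the large speedup comes from.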
Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607
Latest 3Dmigoto Release
Bo3b's School for ShaderHackers
Basically, detect whether a TextureOverride uses old or new hashes, and either do the efficient new texture hashing or the old one depending on d3dx.ini.
As you mentioned it only affects a few game fixes so not sure if it matters.
Thanks to everybody using my assembler it warms my heart.
To have a critical piece of code that everyone can enjoy!
What more can you ask for?
donations: ulfjalmbrant@hotmail.com
So, I wouldn't worry about the old version, and just go with the new version. The crc32c library code is in the project, and I did not modify the project, so you should be able to just grab those code files and call them.
It's basically changing the call from fnv_64_buf for Texture2D, Texture3D, and Buffer to use the append_hw call. You might want to use the superset version of CalcTexture2DDescHash from HackerDevice.cpp, as that includes the code to include the pDesc.
Please look for example at CreateTexture2D to be able to exactly match the new texture hash for these. It should be simpler than it was before.
Dual boot Win 7 x64 & Win 10 (1809) | Geforce Drivers 417.35
MY WEB
Helix Mod - Making 3D Better
My 3D Screenshot Gallery
Like my fixes? you can donate to Paypal: dhr.donation@gmail.com
https://github.com/bo3b/3Dmigoto/releases/tag/1.2.4
Currently supports copying:
- Constant Buffers (<type>s-cb<slot>)
- Vertex Buffers (vb<slot>)
- Index Buffers (ib)
- Stream Output Buffers (so<slot>)
- Textures (<type>s-t<slot>)
- Render Targets (o<slot>)
- Depth Targets (oD)
(Unordered Access Views and sampler states not yet supported)
I haven't fully documented this in the d3dx.ini yet, but here's a few examples to get you started.
Copy constant buffer 1 from the pixel shader to the vertex shader as constant buffer 13 (used in Mad Max to get access to the depth of a light to fix bloom):
- https://github.com/DarkStarSword/3d-fixes/commit/565daab56d5b4d52aa6aaa9e1b45cf37d03e2492
Copy a constant buffer from one shader to another, using an intermediate resource (used to fix specular highlights and environmental reflections in Unity 5 games):
- https://github.com/DarkStarSword/3d-fixes/commit/cacec95fe30661194ef014e1b386bb346dc10118
Copy the currently active depth buffer into a vertex shader (e.g. for automatically adjusting a crosshair depth) - this replaces the experimental depth_input feature, which has been removed:
Copy a vertex buffer into the shader as a constant buffer (Might be able to look up the position of other vertices?):
Copy an active render target into the shader as a texture:
There's also keywords to control some advanced features. For example, 3DMigoto will try to guess whether it should do a full copy of a resource, or only a lightweight reference, but maybe you want to override this:
By default, if you try to copy something that wasn't bound, 3DMigoto will unbind the destination as well. But, perhaps the resource you are copying is only bound some of the time, and if it is not bound you want to leave whatever was previously bound in the destination alone:
Or, perhaps you just want to unbind something from the pipeline for some reason (like influencing driver heuristics?):
All this code is very new and there might still be bugs or memory leaks. If you can think of something this might be useful for, please give it a try and see what happens :)
2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit
Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD
Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword
Can you explain this feature a little more? How do we use it in a shader, and what does all the info mean?
With this we can implement dynamic crosshairs... yeaaah!!