3Dmigoto now open-source...
While waiting for DSS to take another look at my trees, I decided to look around in the 3DM source code. Out of curiosity, I started experimenting with the ReloadFixes() function to see if I could add the things I was asking you about some time ago: the ability to load shaders from subfolders and to allow prefixes in shader names (that's an alternative to friendly names, since the hash is preserved). It went smoothly, everything is hunky-dory, but the shaders only load on pressing F10. OK, what's next then, ReplaceShader() maybe?... Emmm... ermmm... how do you even?... Then I stumbled on DSS's comment:

// This whole function is in need of major refactoring. At a quick glance I can
// see several code paths that will leak objects, and in general it is far,
// too long and complex - the human brain has between 5 an 9 (typically 7)
// general purpose registers, but this function requires far more than that to
// understand.
oh well ... never mind then :)

EVGA GeForce GTX 980 SC
Core i5 2500K
MSI Z77A-G45
8GB DDR3
Windows 10 x64

Posted 01/19/2017 11:03 PM   
Hehe, and you're looking at this after Bo3b made some attempt to refactor parts of it last year, so it is nowhere near as bad as it used to be (it used to be 560 lines, now it's 334 lines)... it's still pretty bad though and my comment still stands - I ultimately want that top-level routine to be nothing more than a description of the logic flow, with helpers to take care of all the details.

I think we should probably start by scrapping the old method for the zero output shader to simplify it a little (does this actually still work for anyone? Last time I tried it 3DMigoto crashed) - a variant of Bo3b's pinking shader that outputs 0 might work just as well.

There's also some relatively low-hanging fruit that could simplify the way we call the HLSL decompiler - we copy a bunch of things out of the global struct, and we could move them into a struct of their own so we only need to copy that struct (or better yet pass a pointer to it), or alternatively look into getting the HLSL decompiler to access the global struct directly (however, I'd rather reduce our dependency on that struct as it creates too many cyclic dependencies - that's why most of the new globals I added for the command list code do not live there).
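
Roughly the shape I have in mind for that first option (just a sketch - the names here are invented for illustration, not the real 3DMigoto identifiers):

#include <string>

// Sketch only - invented names, not the actual code.
struct DecompilerSettings {
    bool fix_sv_position;
    bool recompile_vs;
    int  shader_model;
    // ...everything the HLSL decompiler actually needs, and nothing else
};

// One struct copy (or better, one pointer) instead of copying a dozen
// individual globals before every call:
std::string DecompileShader(const DecompilerSettings *settings,
                            const void *bytecode, size_t size);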

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 01/20/2017 01:45 AM   
No worries, I don't mind pressing F10 one more time after the game starts for the convenience of having a file structure readable by a human being during my experiments with shaders. I planned to revert the shaderoverride folder back to its original form before releasing a mod anyway. It can stay as it is. It has to, since in its current form going further and adding a directory crawler to each ReplaceShader call would murder the CPU. There is one thing about that function that puzzles me though. Does iterating through shaders to check whether a file exists to override them with serve some special purpose? Couldn't it just iterate through the existing files instead, like it does during reloading, and leave all the other shaders alone?

EVGA GeForce GTX 980 SC
Core i5 2500K
MSI Z77A-G45
8GB DDR3
Windows 10 x64

Posted 01/20/2017 03:12 AM   
[quote="Oomek"]I planned to revert the shaderoverride folder back to it's original form before releasing a mod anyway. It can stay as it is. It must to, as in the current form going further and adding a directory crawler to each ReplaceShader call would murder the cpu.[/quote]Probably for the best - we'd love it if you joined the 3DMigoto development team, but that is one area where I am likely to NAK a lot of patches, for the reason you just outlined. [quote]There is a thing that wonders me though about that function. Does iterating through shaders to find if there exist a file to override it with serve some special function? Couldn't it just iterate through existing files instead like it's during reloading and leave all other shaders alone?[/quote]It's simpler *and* faster. Directory listings in Windows are slow, whereas looking for a file by name is fast and the result is cached in the OS (including the case where the file does not exist) making subsequent lookups even faster. If we did a directory listing we would still have to process the result to cope with there being multiple versions (_replace.txt .txt .bin _replace.bin _bad.txt) and use the correct one based on the timestamps and our chosen policy of which take priority over which, not to mention handle differing cases, so we would still need to do the same work but now we would have extra overhead and complexity. Also, a directory listing means we process every file in that directory, whereas our current approach only process shaders that have actually been used by the game so far (in some cases that is all of them, in others it can be orders of magnitude less). Any patches that are changing to a directory walk or otherwise making fundamental changes to this area would be more palatable if they included performance metrics proving that they don't make things worse (our three largest DX11 mods that would be of particular interest when obtaining these metrics are Rise of the Tomb Raider with 26,047 assembly + 18 HLSL shaders, keeping in mind that the mod is only for older versions of that game and that most of those shaders will not be used until enough progression through the game has been made, WATCH_DOGS2 with 3,661 assembly + 56 HLSL replacement shaders + a number of custom shaders and included files, and Dragon Age Inquisition with 1,970 HLSL shaders). In actual fact, in some cases we have far more overhead coming from the config reload than the shader reload (particularly for some Unity games which include copy directives in the d3dx.ini for almost every shader) - the shader reload is pretty fast if it checks the timestamps and finds that it doesn't need to do anything (and the slow case where it does need to do something is always going to be slow), but the API we are using to parse the ini file opens and parses it everytime we look up an option in it. It would be faster if we replaced that API with one that parsed it once into a data structure that we could then look up. Also, custom shaders are recompiled on every config reload - adding a cache to those would be worthwhile (e.g. anyone using my constant buffer debug shader may have noticed that it adds a full second or two to every config reload). Beware if evaluating other libraries or considering writing our own version that we do have some special needs beyond the basic ini handling, such as allowing repeated lines, allowing lines without an equals sign and preserving the order of lines in the file.
Oomek said: I planned to revert the shaderoverride folder back to its original form before releasing a mod anyway. It can stay as it is. It has to, since in its current form going further and adding a directory crawler to each ReplaceShader call would murder the CPU.
Probably for the best - we'd love it if you joined the 3DMigoto development team, but that is one area where I am likely to NAK a lot of patches, for the reason you just outlined.

Oomek said: There is one thing about that function that puzzles me though. Does iterating through shaders to check whether a file exists to override them with serve some special purpose? Couldn't it just iterate through the existing files instead, like it does during reloading, and leave all the other shaders alone?
It's simpler *and* faster. Directory listings in Windows are slow, whereas looking for a file by name is fast and the result is cached in the OS (including the case where the file does not exist) making subsequent lookups even faster. If we did a directory listing we would still have to process the result to cope with there being multiple versions (_replace.txt .txt .bin _replace.bin _bad.txt) and use the correct one based on the timestamps and our chosen policy of which take priority over which, not to mention handle differing cases, so we would still need to do the same work but now we would have extra overhead and complexity.
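
To illustrate the lookup-by-name idea (sketch only, the helper name is invented - the real code also juggles the .bin cache, timestamps and the other variants listed above):

#include <windows.h>
#include <wchar.h>

// Sketch only. hash and shader_type would come from the ReplaceShader call.
bool replacement_exists(const wchar_t *shader_path, unsigned long long hash,
                        const wchar_t *shader_type)
{
    wchar_t path[MAX_PATH];

    // e.g. "ShaderFixes\0123456789abcdef-ps_replace.txt". GetFileAttributes is
    // cheap and the result (including "file not found") is cached by the OS,
    // so repeated lookups stay fast.
    swprintf_s(path, MAX_PATH, L"%ls\\%016llx-%ls_replace.txt",
               shader_path, hash, shader_type);
    return GetFileAttributesW(path) != INVALID_FILE_ATTRIBUTES;
}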

Also, a directory listing means we process every file in that directory, whereas our current approach only processes shaders that have actually been used by the game so far (in some cases that is all of them, in others it can be orders of magnitude fewer).

Any patches that switch to a directory walk or otherwise make fundamental changes to this area would be more palatable if they included performance metrics proving that they don't make things worse. Our three largest DX11 mods that would be of particular interest when gathering these metrics are Rise of the Tomb Raider with 26,047 assembly + 18 HLSL shaders (keeping in mind that the mod is only for older versions of that game and that most of those shaders will not be used until enough progression through the game has been made), WATCH_DOGS2 with 3,661 assembly + 56 HLSL replacement shaders + a number of custom shaders and included files, and Dragon Age Inquisition with 1,970 HLSL shaders.


In actual fact, in some cases we have far more overhead coming from the config reload than the shader reload (particularly for some Unity games which include copy directives in the d3dx.ini for almost every shader) - the shader reload is pretty fast if it checks the timestamps and finds that it doesn't need to do anything (and the slow case where it does need to do something is always going to be slow), but the API we are using to parse the ini file opens and parses it every time we look up an option in it. It would be faster if we replaced that API with one that parsed it once into a data structure that we could then look up.

Also, custom shaders are recompiled on every config reload - adding a cache to those would be worthwhile (e.g. anyone using my constant buffer debug shader may have noticed that it adds a full second or two to every config reload). Beware, if evaluating other libraries or considering writing our own version, that we do have some special needs beyond basic ini handling, such as allowing repeated lines, allowing lines without an equals sign and preserving the order of lines in the file.
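
The shape I mean is roughly this (sketch only - repeated keys, lines without an '=' and preserving file order are the special requirements mentioned above):

#include <string>
#include <vector>
#include <unordered_map>

// Sketch only. Parse d3dx.ini once at (re)load time into this, then every
// option lookup walks the in-memory structure instead of re-opening and
// re-parsing the file the way the current API does.
struct IniLine {
    std::string key;    // empty if the line had no '=' sign
    std::string value;  // the whole line in that case
};

struct IniSection {
    std::vector<IniLine> lines;  // preserves file order and allows repeats
};

typedef std::unordered_map<std::string, IniSection> IniFile;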

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 01/20/2017 04:32 AM   
Also worth noting for anyone doing refactoring of that monster routine - there are other copies of the ReloadShader sequence (copy/paste) in other parts of the code. In CopyOnMark, there is another copy of the same sequence, because we made the choice that any CopyOnMark needed to activate that shader immediately, without having to hit F10.

That spot is a little removed from the primary loading routine, but it has been my much delayed goal to make specific routines to handle those scenarios, so that we only have a single loader routine that is used in multiple spots.

Same is true of the disassembly functions to fetch the ASM to begin with. There are multiple copies, that would ideally be merged to one utility function.

But... as DarkStarSword notes, that function is fragile, so I try very hard to tread lightly there.
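
Roughly the end state I have in mind (the names here are invented just to show the shape, not actual code):

// Invented names - sketch of the intended shape only.
// One loader shared by the F10 reload path and the CopyOnMark path:
static bool LoadReplacementShader(UINT64 hash, const wchar_t *shader_type,
                                  ID3D11DeviceChild **replacement);

// And one helper that fetches/disassembles the original ASM, so the several
// copy/pasted disassembly blocks collapse into a single call:
static std::string DisassembleShader(const void *bytecode, size_t size);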

Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607
Latest 3Dmigoto Release
Bo3b's School for ShaderHackers

Posted 01/20/2017 06:33 AM   
I've come up with another crazy idea this morning; if it's all bollocks just tell me, but gently please :)

Is it possible to hack DirectX in a way that would allow using one GPU for rendering and another for displaying the back buffer? As you know, there are some external GPUs available for laptops, but you need an external monitor for them to work. Is it even doable to:
- initialize 2 adapters (no1 internal connected to a display, and no2 external disconnected)
- set rendering pipeline to use adapter no2
- initialize fullscreen on no1
- intercept the present call
- copy the backbuffer from no2 to no1

Some laptops with multiple GPUs have that functionality, but I don't know if it's done at a software or hardware level.

This would also allow using a cheap AMD card just for its FreeSync functionality, while still rendering games on the NVIDIA card.
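
For the first couple of steps I was imagining something along these lines (just a sketch, untested, and the function name is made up):

#include <d3d11.h>
#include <dxgi.h>

// Sketch only - untested, no error handling. Enumerate the adapters and
// create the rendering device on the second (external) one.
void create_device_on_second_adapter(ID3D11Device **device,
                                     ID3D11DeviceContext **context)
{
    IDXGIFactory1 *factory = nullptr;
    CreateDXGIFactory1(__uuidof(IDXGIFactory1), (void**)&factory);

    IDXGIAdapter1 *adapter = nullptr, *external = nullptr;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; i++) {
        if (i == 1)          // adapter no2 from the list above
            external = adapter;
        else
            adapter->Release();
    }

    // Driver type must be UNKNOWN when an explicit adapter is passed in:
    D3D11CreateDevice(external, D3D_DRIVER_TYPE_UNKNOWN, nullptr, 0,
                      nullptr, 0, D3D11_SDK_VERSION, device, nullptr, context);
}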

EVGA GeForce GTX 980 SC
Core i5 2500K
MSI Z77A-G45
8GB DDR3
Windows 10 x64

Posted 01/21/2017 01:22 PM   
You might run into issues if the external GPU doesn't have a display connected - I'm guessing here, but it might just disallow you from creating the device because there is no valid resolution at which to create a swap chain, and things like the present call and vsync lose their meaning when there is no output. Maybe you can somehow force this, but I really don't know... and it's still the game that would have to choose to use the external GPU, so you might also need to fool it into believing that there are valid modes available for it to use on that GPU.

There are some mentions in the DirectX documentation about sharing resources between devices, so you might be able to make it work, I just can't guarantee it because I've never tried. I'm going off memory here, but I think you might need to force a flag at device creation, as well as when creating any resources that are going to be shared. Alternatively you could copy the back buffer into a staging resource at the present call to copy it back to the CPU (this can stall the pipeline if you block waiting for it to be ready - you can probably use an event query to determine when the staging resource is ready, removing the need to block), then copy it to a resource on the internal GPU and then to its back buffer and present.
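
Something like this for the staging copy plus event query, done at the present call (sketch only - error handling omitted, function names invented, and it assumes a non-MSAA back buffer):

#include <d3d11.h>

// Sketch only. Kick off an asynchronous copy of the back buffer into a staging
// texture and create an event query we can poll instead of blocking in Map():
void queue_backbuffer_readback(ID3D11Device *dev, ID3D11DeviceContext *ctx,
                               ID3D11Texture2D *backbuffer,
                               ID3D11Texture2D **staging, ID3D11Query **query)
{
    D3D11_TEXTURE2D_DESC desc;
    backbuffer->GetDesc(&desc);
    desc.Usage = D3D11_USAGE_STAGING;
    desc.BindFlags = 0;
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
    desc.MiscFlags = 0;
    dev->CreateTexture2D(&desc, nullptr, staging);
    ctx->CopyResource(*staging, backbuffer);

    D3D11_QUERY_DESC qdesc = { D3D11_QUERY_EVENT, 0 };
    dev->CreateQuery(&qdesc, query);
    ctx->End(*query);
}

// Later (e.g. on a subsequent present), GetData() returning S_OK means the
// copy has finished and Map() will not stall:
bool readback_ready(ID3D11DeviceContext *ctx, ID3D11Query *query)
{
    return ctx->GetData(query, nullptr, 0, 0) == S_OK;
}

From there the mapped staging data can be uploaded to a texture on the internal GPU's device and presented from its swap chain.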


I'm also not sure if we would really want this feature in 3Dmigoto - it's a little far removed from 3D modding, and accepting it means we (Bo3b and I) pay the maintenance cost due to the increased complexity - but I have no objection to you trying it out in a topic branch or a side build and showing us what you come up with, and maybe we could find a use for it to justify it being in mainline (maybe VR integration could use it?)

I have a feeling Bo3b might have some advice or suggestions here... Bo3b?

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 01/22/2017 06:36 AM   
[quote="DarkStarSword"]Any patches that are changing to a directory walk or otherwise making fundamental changes to this area would be more palatable if they included performance metrics proving that they don't make things worse (our three largest DX11 mods that would be of particular interest when obtaining these metrics are Rise of the Tomb Raider with 26,047 assembly + 18 HLSL shaders, keeping in mind that the mod is only for older versions of that game and that most of those shaders will not be used until enough progression through the game has been made, WATCH_DOGS2 with 3,661 assembly + 56 HLSL replacement shaders + a number of custom shaders and included files, and Dragon Age Inquisition with 1,970 HLSL shaders).[/quote] I will come back to this subject after doing some benchmarks on games you mentioned.
DarkStarSword said: Any patches that switch to a directory walk or otherwise make fundamental changes to this area would be more palatable if they included performance metrics proving that they don't make things worse. Our three largest DX11 mods that would be of particular interest when gathering these metrics are Rise of the Tomb Raider with 26,047 assembly + 18 HLSL shaders (keeping in mind that the mod is only for older versions of that game and that most of those shaders will not be used until enough progression through the game has been made), WATCH_DOGS2 with 3,661 assembly + 56 HLSL replacement shaders + a number of custom shaders and included files, and Dragon Age Inquisition with 1,970 HLSL shaders.

I will come back to this subject after doing some benchmarks on games you mentioned.

EVGA GeForce GTX 980 SC
Core i5 2500K
MSI Z77A-G45
8GB DDR3
Windows 10 x64

Posted 01/22/2017 12:42 PM   
[quote="DarkStarSword"]You might run into issues if the external GPU doesn't have a display connected - I'm guessing here, but it might just disallow you from creating the device because there is no valid resolution at which to create a swap chain, and things like the present call and vsync lose their meaning when there is no output. Maybe you can somehow force this, but I really don't know... and it's still the game that would have to choose to use the external GPU, so you might also need to fool it into believing that there are valid modes available for it to use on that GPU. There is some mentions in the DirectX documentation about sharing resources between devices, so you might be able to make it work, I just can't guarantee it because I've never tried. I'm going off memory here, but I think you might need to force a flag at device creation, as well as when creating any resources that are going to be shared. Alternatively you could copy the back buffer into a staging resource at the present call to copy it back to the CPU (this can stall the pipeline if you block waiting for it to be ready - you can probably use an event query to determine when the staging resource is ready removing the need to block), then copy it to a resource on the internal GPU and then to it's back buffer and present. I'm also not sure if we would really want this feature in 3Dmigoto - it's a little far removed from 3D modding and accepting it means (Bo3b and I) paying the maintenance cost due to the increased complexity - but I have no objection to you trying it out in a topic branch or a side build and showing us what you come up with, and maybe we could find a use for it to justify it being in mainline (maybe VR integration could use it?) I have a feeling Bo3b might have some advice or suggestions here... Bo3b?[/quote] It is possible to share resources between different DXGI devices, at least on the same adapter. I've got some prototype code for VR that is doing just that, although the setup is really weird and complicated, and the sharing is primitive and restrictive. But it's good enough to share a back-buffer which is what you care about here. However, I cannot speak to whether it works cross-adapter or not. I did not do any cross-adapter copies, and did not research that, so I'm not sure if your idea will work or not. Direct cross-adapter copies is a funny scenario, and is not how something like Surround works (which is done by the video driver, not DirectX). In principle, you can definitely do this, even if you have to do it the hard way by copying back to system RAM first. The performance impact of a full frame copy from GPU->CPU-GPU2 is probably not all that significant if you only do it once per frame. Given what I've seen of the DirectX support here, I'd actually recommend going the CPU route to begin with, just because it should be a lot simpler and reliable, especially if we throw AMD cards into the mix. I'd only try the direct card-to-card approach if the performance proved out to be problematic.
DarkStarSword said: You might run into issues if the external GPU doesn't have a display connected - I'm guessing here, but it might just disallow you from creating the device because there is no valid resolution at which to create a swap chain, and things like the present call and vsync lose their meaning when there is no output. Maybe you can somehow force this, but I really don't know... and it's still the game that would have to choose to use the external GPU, so you might also need to fool it into believing that there are valid modes available for it to use on that GPU.

There are some mentions in the DirectX documentation about sharing resources between devices, so you might be able to make it work, I just can't guarantee it because I've never tried. I'm going off memory here, but I think you might need to force a flag at device creation, as well as when creating any resources that are going to be shared. Alternatively you could copy the back buffer into a staging resource at the present call to copy it back to the CPU (this can stall the pipeline if you block waiting for it to be ready - you can probably use an event query to determine when the staging resource is ready, removing the need to block), then copy it to a resource on the internal GPU and then to its back buffer and present.


I'm also not sure if we would really want this feature in 3Dmigoto - it's a little far removed from 3D modding, and accepting it means we (Bo3b and I) pay the maintenance cost due to the increased complexity - but I have no objection to you trying it out in a topic branch or a side build and showing us what you come up with, and maybe we could find a use for it to justify it being in mainline (maybe VR integration could use it?)

I have a feeling Bo3b might have some advice or suggestions here... Bo3b?

It is possible to share resources between different DXGI devices, at least on the same adapter. I've got some prototype code for VR that is doing just that, although the setup is really weird and complicated, and the sharing is primitive and restrictive. But it's good enough to share a back-buffer which is what you care about here.

However, I cannot speak to whether it works cross-adapter or not. I did not do any cross-adapter copies, and did not research that, so I'm not sure if your idea will work or not. Direct cross-adapter copying is a funny scenario, and is not how something like Surround works (which is done by the video driver, not DirectX).
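
From memory, the same-adapter mechanism looks roughly like this (sketch only, names invented; whether these handles can be opened on a device from a different adapter is exactly the part I never tested):

#include <d3d11.h>
#include <dxgi.h>

// Sketch from memory. Create a shareable texture on device A, fetch its
// shared handle, and open it on device B.
void share_texture(ID3D11Device *devA, ID3D11Device *devB,
                   UINT width, UINT height,
                   ID3D11Texture2D **texA, ID3D11Texture2D **texB)
{
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width = width;
    desc.Height = height;
    desc.MipLevels = 1;
    desc.ArraySize = 1;
    desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.Usage = D3D11_USAGE_DEFAULT;
    desc.BindFlags = D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_RENDER_TARGET;
    desc.MiscFlags = D3D11_RESOURCE_MISC_SHARED;  // mark the resource shareable
    devA->CreateTexture2D(&desc, nullptr, texA);

    IDXGIResource *dxgi_res = nullptr;
    HANDLE handle = nullptr;
    (*texA)->QueryInterface(__uuidof(IDXGIResource), (void**)&dxgi_res);
    dxgi_res->GetSharedHandle(&handle);
    dxgi_res->Release();

    devB->OpenSharedResource(handle, __uuidof(ID3D11Texture2D), (void**)texB);
}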


In principle, you can definitely do this, even if you have to do it the hard way by copying back to system RAM first. The performance impact of a full frame copy from GPU -> CPU -> GPU2 is probably not all that significant if you only do it once per frame.

Given what I've seen of the DirectX support here, I'd actually recommend going the CPU route to begin with, just because it should be a lot simpler and reliable, especially if we throw AMD cards into the mix. I'd only try the direct card-to-card approach if the performance proved out to be problematic.

Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607
Latest 3Dmigoto Release
Bo3b's School for ShaderHackers

Posted 01/24/2017 02:22 AM   
@DarkStarSword
I have a suspicion that an improperly calculated rayLength is causing the shift and artifacts.
You have mentioned that Dirt 3 is using an inverse Z projection; shouldn't that also be applied to the code below?

float rayLength = ((csOrig.z + csDir.z * cb_maxDistance) > -cb_nearPlaneZ) ? // Changed < to > and negated near to match the paper
(-cb_nearPlaneZ - csOrig.z) / csDir.z : cb_maxDistance;
float3 csEndPoint = csOrig + csDir * rayLength;
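
If I read it right, that test just clamps the ray so its end point does not cross the near plane; a quick worked example with my own numbers (assuming camera space looks down -Z, so csOrig.z is negative in front of the camera):

// My numbers, just to see what the clamp does:
float cb_nearPlaneZ  = 0.1f;
float cb_maxDistance = 100.0f;
float csOrigZ = -5.0f;  // 5 units in front of the camera
float csDirZ  =  1.0f;  // ray heading back towards the camera
// Unclamped end: csOrigZ + csDirZ * cb_maxDistance = 95.0, which is > -0.1,
// i.e. past the near plane, so the first branch fires:
// rayLength = (-cb_nearPlaneZ - csOrigZ) / csDirZ = (-0.1 + 5.0) / 1.0 = 4.9,
// which puts csEndPoint.z exactly at -0.1, the near plane.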

EVGA GeForce GTX 980 SC
Core i5 2500K
MSI Z77A-G45
8GB DDR3
Windows 10 x64

Posted 01/24/2017 06:04 PM   
I negated those at the function call, since that seems to be when they stop being used as "view-space" and start being used as "camera-space":

DarkStarSword said:
@@ -367,7 +355,7 @@
//float jitter = cb_stride > 1.0f ? float(int(v0.x + v0.y) & 1) * 0.5f : 0.0f;
float jitter = 0;
// perform ray tracing - true if hit found, false otherwise
- bool intersection = traceScreenSpaceRay(rayOriginVS, rayDirectionVS, jitter, hitPixel, hitPoint);
+ bool intersection = traceScreenSpaceRay(-rayOriginVS, -rayDirectionVS, jitter, hitPixel, hitPoint);


depth = DepthBuffer.Load(int3(hitPixel, 0));

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 01/24/2017 06:13 PM   
When I set it to a static distance of the object, it fixes the shift for that object. That indicates that the problem may be with the rayLength.

EVGA GeForce GTX 980 SC
Core i5 2500K
MSI Z77A-G45
8GB DDR3
Windows 10 x64

Posted 01/24/2017 06:16 PM   
I really need you to take another look at that function as it's very difficult for me to debug something I do not fully understand.

EVGA GeForce GTX 980 SC
Core i5 2500K
MSI Z77A-G45
8GB DDR3
Windows 10 x64

Posted 01/24/2017 06:40 PM   
[quote="Oomek"]@DarkStarSword I have a suspicion that improperly calculated rayLenght is causing the shift and artiffacts. You have mentioned that Dirt 3 is using an inverse z projection, should't that be also applied to the code below? [code]float rayLength = ((csOrig.z + csDir.z * cb_maxDistance) > -cb_nearPlaneZ) ? // Changed < to > and negated near to match the paper (-cb_nearPlaneZ - csOrig.z) / csDir.z : cb_maxDistance; float3 csEndPoint = csOrig + csDir * rayLength;[/code][/quote] This equation does not make any sense for the cb_maxDistance = inf (i set it to maximum float allowed) and cb_nearPlaneZ = 10.0
Oomek said: @DarkStarSword
I have a suspicion that an improperly calculated rayLength is causing the shift and artifacts.
You have mentioned that Dirt 3 is using an inverse Z projection; shouldn't that also be applied to the code below?

float rayLength = ((csOrig.z + csDir.z * cb_maxDistance) > -cb_nearPlaneZ) ? // Changed < to > and negated near to match the paper
(-cb_nearPlaneZ - csOrig.z) / csDir.z : cb_maxDistance;
float3 csEndPoint = csOrig + csDir * rayLength;


This equation does not make any sense for cb_maxDistance = inf (I set it to the maximum float allowed) and cb_nearPlaneZ = 10.0.
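
For example (my reading, with csOrig/csDir as in the snippet above and camera space looking down -Z):

#include <limits>
// Sketch of why inf breaks the clamp:
float cb_maxDistance = std::numeric_limits<float>::infinity();
// For a ray heading away from the camera, csDir.z < 0, so
// csOrig.z + csDir.z * cb_maxDistance = -inf, the test is false, and
// rayLength = cb_maxDistance = inf; csEndPoint = csOrig + csDir * inf then
// blows up to -inf, leaving the ray march nothing sane to step towards.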

EVGA GeForce GTX 980 SC
Core i5 2500K
MSI Z77A-G45
8GB DDR3
Windows 10 x64

Posted 01/25/2017 12:56 AM   
What? I didn't say this game used reverse Z projection - I didn't check, but I don't think it does. That was ABZU that used those values. You shouldn't need to hardcode the near clipping plane - I gave you the procedure to find it: just pass 0 to the linearise_depth function I gave you. And even if your far clipping plane were infinity, which would surprise me, you still need a finite distance scale for this, so don't set the max distance to infinity.
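
(For reference, the reason passing 0 works - this is a generic form of the standard linearisation, not necessarily identical to the function I posted earlier:)

// Generic linearisation for a standard (non-reversed) projection:
float linearise_depth(float depth, float near_clip, float far_clip)
{
    return near_clip * far_clip / (far_clip - depth * (far_clip - near_clip));
}
// linearise_depth(0, n, f) = n*f / f = n  -> the near clipping plane
// linearise_depth(1, n, f) = n*f / n = f  -> the far clipping plane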

2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit

Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD

Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword

Posted 01/25/2017 05:32 AM   