[quote="DarkStarSword"][quote="mx-2"]I attach the modified dll to this post.[/quote]Can we maybe give you commit access to 3DMigoto? I'd rather avoid side builds like this one becoming public (giving them to an individual for testing is fine) because we end up with version numbers in the wild that don't match the source code, so I'd rather that if you do a public release like this that you use the proper publish.bat, tag and upload it to github.
For now (unless Bo3b has anything to supersede it?) I'll just apply your code to 3DMigoto (tagging you as the author) and do a proper release - I had wanted to get some things I need for the UE4 extension DLL in for this release, but if I put them in now they will be a rush job and we've got enough important fixes (and ShaderRegex) as it is that I don't want to delay it any longer.[/quote]
I'd go with his code for now, because it works for his scenario, and is unlikely to be called by others.
I already know this is not what we want here, because it will match any Hacker* object, not solely the one passed in by RIID, which was what I was trying to get at. There is something that will work better here, but it's not clear to me without a live test. So at least temporarily, this is OK.
To fix only this scenario, it's better to add a QueryInterface override in HackerDXGISwapChain and return 'this' from there. That is safer in that it fixes this exact problem, and won't be possible to break others. My hope was to make something that fixes all possible uses of QueryInterface(self), but if we cannot determine the true object at HackerUnknown it'll never work. Based on our discussion, there may not be a true object here.
@mx-2: But most definitely, if you want to make builds, we can give you access. I agree with DarkStarSword that we pretty much never want eng builds to go out that are not marked as such.
Edit: Actually I don't know how that is working. If it's always that this=HackerUnknown, then it should only ever match "IUnknown" RIID. The typeid(*ppvObject) is clearly wrong, it's doing typeid on the pointer, not the object. Some alt-syntax there might work.
I looked through 40 games or so, and did not find others that call HackerUnknown::QueryInterface in the log, so I think this is pretty rare.
[center][color="orange"][size="XL"]3Dmigoto 1.2.66[/size][/color][/center][center][color="green"][url]https://github.com/bo3b/3Dmigoto/releases[/url][/color][/center]
[color="orange"][size="XL"]This release has been removed due to a serious regression with key bindings.[/size][/color]
The main highlight of this release is the new ShaderRegex engine, detailed below. There are a few other new things, like an easier method to clear resources, but besides that this release mostly contains a great many bug fixes and should be considered a fairly important upgrade.
- Fix Elex (Bo3b & mx-2)
- Fix null pointer crash on platform_update=1 (Bo3b)
[size="S"][color="green"]Shader Management[/color][/size]
- Fixes two long standing bugs that could cause StereoParams to be defined twice (and other bad behaviour) in some circumstances.
- Reloading shaders will now compile other shaders even if one fails to compile.
- Fixes some crashes in rare circumstances when reloading shaders where some shaders had been removed.
[size="S"][color="green"]Overlay[/color][/size]
- Overlay now shows the tag lines of every type of shader currently selected rather than just one.
- New "verbose_overlay" option to show the shader hashes in the overlay.
- Overlay now shows index buffer and render target hunting progress (hidden unless actually hunting these).
- Overlay no longer shows tag lines of removed shaders.
- Fixed crash if shader tag line contained a unicode character.
[size="S"][color="green"]Command Lists / Resource Copying[/color][/size]
- New [ClearUnorderedAccessViewUint] and [ClearUnorderedAccessViewFloat] command list sections, which work in the same way as the [ClearRenderTargetView] and [ClearDepthStencilView] sections to run a command list before & after the game clears a UAV.
- Command lists run from [Clear*] sections now have a "this" resource copy target, which can be used to get access to the resource the game is clearing.
- Accessing the back buffer with resource copying now always implies 'no_view_cache' to fix crashes when changing the resolution if this was omitted.
- New "clear" command to clear resources, eliminating the need to use a custom shader to do this in most cases. The simple form is just "clear = ResourceFoo", but for more advanced usage examples (including clearing with a given colour) refer to this commit message:
https://github.com/bo3b/3Dmigoto/commit/0f5bb989b24180db939aa90aa86f90c3e738f70c
Note that in some cases a resource may only be cleared in one eye when using this command (this is a driver bug, and is known to affect at least the back buffer and structured buffer UAVs in SLI. Render and depth targets seem to be ok, but more investigation is required).
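As a rough sketch of the simple form of the new "clear" command: ResourceFoo is the placeholder name from the note above, and running it from the [Present] command list (i.e. once per frame) is only my assumption for illustration. Put it in whichever command list suits your case, and refer to the commit message for the advanced forms:
[code]
; ResourceFoo is a placeholder custom resource, not something the game defines
[ResourceFoo]
[Present]
; Simple form: clear the resource, no custom shader needed
clear = ResourceFoo
[/code]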
[size="S"][color="green"]Index Buffers[/color][/size]
- Index buffers (and other types of buffers) can now be used with texture filtering, checktextureoverride=ib, presets, etc. [color="orange"][s]but refer to the below note[/s][/color]
- Fix performance regression related to index buffers introduced in 1.2.63 (@masterotaku - can you compare the performance with 1.2.62 with hunting disabled? There's more we could do here, but if it's not a problem in practice I'd rather leave it as is to keep things simpler. If we do need more, I need to know ASAP before people start using index buffer filtering)
- List of visited index buffers when dumping vertex/pixel shaders, and list of visited vertex/pixel shaders when dumping index buffers is now correct in the d3d11_log.txt
[s]just a heads up to anyone thinking about using index buffer filtering - we might change how you have to define [TextureOverride] sections for these to indicate what kind of buffer you are using for the next release, so you may wish to hold off using this, or be prepared to make some minor changes to d3dx.ini for 1.2.67.[/s] Looks like we're good - the remaining performance problems are coming from elsewhere so we don't need this complexity and index buffer filtering will remain as is.
[size="S"][color="green"]Ini Parser[/color][/size]
- Audible warning tones when loading or reloading the d3dx.ini are now limited to the first three warnings.
- Fix crash if a line in the d3dx.ini was too long (some old fixes such as Alien Isolation have incorrect comment styles in the [Constants] section which caused this - these will still trigger warnings, but no longer crash).
- Fix spurious warnings about lines consisting of only whitespace
- Minor logging fixes
[size="S"][color="green"]Frame Analysis[/color][/size]
- New dump_on_update option to dump resources that the game updates with UpdateSubresource()
[size="S"][color="green"]Hunting[/color][/size]
- Hunting render targets now only considers those used in the current scene (still not the recommended way to identify render targets).
[size="S"][color="green"]Misc[/color][/size]
- 3DMigoto can now be used on AMD/Intel systems for non-3D Vision modding by removing nvapi.dll (as in, the old hack of shipping the nvidia nvapi.dll instead of ours is no longer required to do this, but we still can't have our nvapi.dll wrapper on non-nvidia systems).
- The "use_criticalsection" option has been removed - we now always use the critical section for compatibility with multi-threaded games.
[size="XL"][color="green"][b]ShaderRegex Engine[/b][/color][/size]
This release includes the new ShaderRegex engine, which is capable of patching shaders on the fly that match a given pattern.
I'm just going to go ahead and jump into some real world examples of using this feature, because I think it will make more sense once you see it. The d3dx.ini includes the first of these examples along with some documentation.
This is a complete, working example from Hellblade that locates a matrix multiply and injects a stereo correction before it. This example will work regardless of which registers the game used. It uses named capture groups to pull out the registers and components that contain the X and Z values (and another "result" capture group to check that the same register is used in two instructions, though this is for demonstration purposes and is not actually required in this case). Two temporary registers named "stereo" and "tmp1" are defined for use in the replace section, InsertDeclarations is used to define StereoParams, and ${0} is used to insert the entire matching pattern in the replaced text (this is how you insert text before or after a pattern):
[code]
[ShaderRegexScreenToShadowMatrix]
shader_model = ps_5_0
temps = stereo tmp1
[ShaderRegexScreenToShadowMatrix.Pattern]
mul r\d+\.xyzw, r\d+\.yyyy, cb0\[28\]\.xyzw\n
mad r\d+\.xyzw, (?P<pos_x>r\d+)\.(?P<swizzle_x>[xyzw])[xyzw]{3}, cb0\[27\]\.xyzw, r\d+\.xyzw\n
mad r\d+\.xyzw, (?P<pos_z>r\d+)\.(?P<swizzle_z>[xyzw])[xyzw]{3}, cb0\[29\]\.xyzw, r\d+\.xyzw\n
add (?P<result>r\d+)\.xyzw, r\d+\.xyzw, cb0\[30\]\.xyzw\n
div r\d+\.[xyzw]{2}, (?P=result)\.[xyzw]{4}, r\d+\.wwww\n
[ShaderRegexScreenToShadowMatrix.InsertDeclarations]
dcl_resource_texture2d (float,float,float,float) t125
[ShaderRegexScreenToShadowMatrix.Pattern.Replace]
\n// Shadows automatically corrected by DarkStarSword's UE4 autofix:\n
ld_indexable(texture2d)(float,float,float,float) ${stereo}.xyzw, l(0, 0, 0, 0), t125.xyzw\n
add ${tmp1}.x, ${pos_z}.${swizzle_z}, -${stereo}.y\n
mad ${pos_x}.${swizzle_x}, -${tmp1}.x, ${stereo}.x, ${pos_x}.${swizzle_x}\n
\n
${0}
[/code]
This is another example (also from Hellblade) that shows how this can be used in conjunction with a command list to copy a custom resource containing a matrix into the shader, and shows how you can use named capture groups to capture entire chunks of the original shader and decide where to place them in the replaced output (this is how to insert and/or alter text in the middle of a pattern). This example is a little more precise than the above one since it only matches shaders using the same register numbers - I'm not recommending either approach at the moment: you could start with a more precise pattern and make it more flexible as you come across variations, or start with something more flexible and blacklist any shaders it shouldn't have been applied to - both are valid approaches and we need more real world experience to work out which is "better":
[code]
[ShaderRegexScreenToDecalMatrix]
shader_model = ps_5_0
temps = tmp1 m0 m1 m2 m3
; Need SV_Position stereoisation matrix:
ps-t90 = ResourceStereoInjectionMatrices
post ps-t90 = null
[ShaderRegexScreenToDecalMatrix.Pattern]
; Texture and sampler numbers known to vary. The indented whitespace here is
; for my reference, it is not part of the pattern and is not required:
(?P<prefix>
mul r0\.xy, v0\.xyxx, cb1\[123\]\.zwzz\n
sample_l_indexable\(texture2d\)\(float,float,float,float\) r0\.x, r0\.xyxx, t\d+\.xyzw, s\d+, l\(0\.000000\)\n
)
mul r1\.xyzw, v0\.yyyy, cb0\[2\]\.xyzw\n
mad r1\.xyzw, v0\.xxxx, cb0\[1\]\.xyzw, r1\.xyzw\n
mad r0\.xyzw, r0\.xxxx, cb0\[3\]\.xyzw, r1\.xyzw\n
(?<after>
add r0\.xyzw, r0\.xyzw, cb0\[4\]\.xyzw\n
div r0\.[xyzw]{3}, r0\.[xyzw]{4}, r0\.wwww\n
)
[ShaderRegexScreenToDecalMatrix.InsertDeclarations]
// Injection Matrices:
dcl_resource_structured t90, 64
[ShaderRegexScreenToDecalMatrix.Pattern.Replace]
${prefix}
\n
// SV_Position inverse stereoisation matrix:\n
ld_structured_indexable(structured_buffer, stride=64)(mixed,mixed,mixed,mixed) ${m0}.xyzw, l(0), l(0), t90.xyzw\n
ld_structured_indexable(structured_buffer, stride=64)(mixed,mixed,mixed,mixed) ${m1}.xyzw, l(0), l(16), t90.xyzw\n
ld_structured_indexable(structured_buffer, stride=64)(mixed,mixed,mixed,mixed) ${m2}.xyzw, l(0), l(32), t90.xyzw\n
ld_structured_indexable(structured_buffer, stride=64)(mixed,mixed,mixed,mixed) ${m3}.xyzw, l(0), l(48), t90.xyzw\n
\n
// Multiply SV_Position by injection matrix:\n
mul ${tmp1}.xyzw, v0.xxxx, ${m0}.xyzw\n
mad ${tmp1}.xyzw, v0.yyyy, ${m1}.xyzw, ${tmp1}.xyzw\n
mad ${tmp1}.xyzw, r0.xxxx, ${m2}.xyzw, ${tmp1}.xyzw\n
add ${tmp1}.xyzw, ${tmp1}.xyzw, ${m3}.xyzw\n
\n
// Adjust original matrix multiply to use inverse stereoised SV_Position:\n
mul r1.xyzw, ${tmp1}.yyyy, cb0[2].xyzw\n
mad r1.xyzw, ${tmp1}.xxxx, cb0[1].xyzw, r1.xyzw\n
mad r0.xyzw, ${tmp1}.zzzz, cb0[3].xyzw, r1.xyzw\n
${after}
[/code]
As you can see there are four sections that make up this feature:
1. [ShaderRegex*] defines which shader models it matches and declares any temporary registers that you can use in the Replace section (which will automatically update/insert dcl_temps as required). This section also acts as a command list, which is appended to the ShaderOverride command list of all matching shaders.
2. [ShaderRegex*.Pattern] is a PCRE2 regular expression that will match part of the shader.
3. [ShaderRegex*.Pattern.Replace] is the text that will replace the matched pattern. Use ${0} to insert the original text the pattern matched, or use named capture groups to insert part of the matched pattern wherever it is needed. Temporary registers you have defined in the main section use the exact same syntax here as named capture groups, like ${tmp1}, etc.
4. [ShaderRegex*.InsertDeclarations] is where you insert any additional declarations you need, which will typically be used for StereoParams. 3DMigoto will prevent these from being inserted if a 100% identical declaration is already in the shader (e.g. if multiple ShaderRegex sections match a single shader you still only want one StereoParams declaration).
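To make the four sections easier to see at a glance, here is a stripped down skeleton built only from pieces of the first example above. The section name "ShaderRegexExample" is made up, and the replacement just loads StereoParams into a temporary register after the matched instruction without applying any correction, so treat it as a template to start from rather than a working fix:
[code]
; 1. Main section: shader model to match, temporary registers, command list
[ShaderRegexExample]
shader_model = ps_5_0
temps = tmp1
; 2. PCRE2 pattern to find in the shader assembly
[ShaderRegexExample.Pattern]
add r\d+\.xyzw, r\d+\.xyzw, cb0\[30\]\.xyzw\n
; 4. Declarations to add if an identical one is not already present
[ShaderRegexExample.InsertDeclarations]
dcl_resource_texture2d (float,float,float,float) t125
; 3. Replacement text: the original match, followed by the new instruction
[ShaderRegexExample.Pattern.Replace]
${0}
ld_indexable(texture2d)(float,float,float,float) ${tmp1}.xyzw, l(0, 0, 0, 0), t125.xyzw\n
[/code]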
A few things to point out here that might trip some people up:
1. Only assembly shaders can be patched using this engine. This is by design, for reliability and performance reasons.
2. [color="orange"]The explicit newline characters \n in the [b]pattern *and* replacement[/b] text are very important[/color] - try not to forget these, I am fully expecting they will be the number 1 thing wrong if a pattern isn't working.
3. [color="orange"]Some characters like . [] () need to be escaped with a \ in the [b]Pattern[/b][/color] section because these have special meanings in regular expressions. You may be able to get away without escaping dots, but that is because a dot in a regular expression matches *any* character, which happens to include dots - you should try to remember to escape these as well to be on the safe side. [color="orange"]These characters are [b]not[/b] to be escaped in the [b]Replace[/b] section[/color].
4. Whitespace at the [color="orange"]start and end of each line is ignored[/color], but [color="orange"]whitespace in the middle of a line is not[/color]. This is due to the ini parsing and is not inherently part of regular expressions, but it seems to work pretty well in practice (as you can see above, I indented some lines in the patterns for my reference). There is a switch you can use in the regular expression to ignore all whitespace (refer to the PCRE2 syntax guide), but that is separate to this.
5. The Replace section is ".Pattern.Replace", not just ".Replace". At the moment there can only be one of each, but the intent is that in the future we will support multiple patterns in a ShaderRegex group (e.g. to match part of a header in one pattern, and code in another), so the .Replace section is associated with a specific pattern.
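To illustrate points 2 and 3, this is the same instruction written the wrong way and the right way. The section name is hypothetical, and the wrong version is left as a comment so only the correct line is part of the pattern:
[code]
[ShaderRegexEscapingExample]
shader_model = ps_5_0
[ShaderRegexEscapingExample.Pattern]
; Wrong: the unescaped [30] is a character class (matches a single 3 or 0),
; the unescaped dots match any character, and there is no explicit \n:
;   add r\d+.xyzw, r\d+.xyzw, cb0[30].xyzw
; Right: dots and brackets escaped, newline made explicit:
add r\d+\.xyzw, r\d+\.xyzw, cb0\[30\]\.xyzw\n
[/code]
With no .Pattern.Replace section this will not change the shader, but the d3d11_log.txt should still tell you whether the pattern matched.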
The grammar we are using is the very powerful PCRE2, which is largely compatible with the Perl and Python (and I think .NET) grammars if you are familiar with any of those. I'm using Python style named capture groups above, but Perl and .NET styles work as well. The reference for PCRE2 is here (beware that if you aren't familiar with regular expressions this will look like Greek - any good regular expression tutorial should provide a good primer on the general syntax and only refer to this for specific details):
http://www.pcre.org/current/doc/html/pcre2syntax.html
The d3d11_log.txt will tell you if the pattern matched a shader, and will show you the patched shader - use this to make sure that the pattern is matching a shader at all, and applying correctly if it did match.
The hunting overlay will show the [ShaderRegex] section for any patched shaders. For debugging I recommend turning on the new verbose_overlay option to see the hash of these shaders to match these up with the log.
If you dump out a patched shader, you will get the *original* shader, not the patched version in ShaderFixes, and doing this will blacklist it from further consideration in the ShaderRegex engine until it has been removed from ShaderFixes. If you need the patched version, you can copy it out of the d3d11_log.txt (or bug me to add an option to switch this behaviour around).
This feature fully works with F10 reload (it is recommended to have both shader and config reload assigned to the same key) to adjust the patterns on the fly and see the results instantly in game. This includes unpatching shaders that no longer match, so what you see after an F10 reload should be the same as starting the game from scratch.
A few special uses of this feature for [color="orange"]advanced users[/color]:
- The .Pattern.Replace section is optional. If you omit it, the pattern will still be matched against shaders and the command list and .InsertDeclarations section will still be applied. It just won't replace the matched part of the shader.
- The .Pattern section is also optional. If you omit it, the ShaderRegex section will apply to every shader that matches the shader_model, and the command list and .InsertDeclarations section will still be applied. This can be used to do advanced things like globally disabling the driver's stereo correction via the cb12 method in InsertDeclarations, or running a command list for every shader.
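As a sketch of the second case, a section with no .Pattern at all might look something like this. The section name is made up, and I'm reusing the StereoParams declaration from the first example rather than showing the cb12 method. It should insert the declaration into every ps_5_0 shader that doesn't already have an identical one:
[code]
; No .Pattern section, so this applies to every shader matching shader_model
[ShaderRegexAllPixelShaders]
shader_model = ps_5_0
; Any command list lines placed in the main section would run for all of them
[ShaderRegexAllPixelShaders.InsertDeclarations]
dcl_resource_texture2d (float,float,float,float) t125
[/code]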
[color="orange"]Current Limitations & Performance Notes[/color]
- There is a slight performance hit from enabling the patching engine.
- Shaders are patched when they are first used, which may introduce some minor stuttering during gameplay (in the future we may try to patch these at load time when possible to reduce this impact, but I have other reasons for doing this on the fly that are not apparent just yet).
- Patched shaders are not currently cached anywhere, so the cost will be paid every time the game is run, and again after any F10 reload.
- Each ShaderRegex section is currently limited to one pattern and one replace section - in the future we may allow for multiple patterns to make it easier to e.g. match a header with one pattern and use a value extracted from that header in the main patch. This can be done without this, but would require a more complicated pattern that matches the header and instructions and everything between.
This release has been removed due to a serious regression with key bindings.
The main highlight of this release is the new ShaderRegex engine, detailed below. There's a few other new things like an easier method to clear resources, but besides that this release mostly contains a great many bug fixes and should be considered a fairly important upgrade.
Shader Management
- Fixes two long standing bugs that could cause StereoParams to be defined twice (and other bad behaviour) in some circumstances.
- Reloading shaders will now compile other shaders even if one fails to compile.
- Fixes some crashes in rare circumstances when reloading shaders where some shaders had been removed.
Overlay
- Overlay now shows the tag lines of every type of shader currently selected rather than just one.
- New "verbose_overlay" option to show the shader hashes in the overlay.
- Overlay now shows index buffer and render target hunting progress (hidden unless actually hunting these).
- Overlay no longer shows tag lines of removed shaders.
- Fixed crash if shader tag line contained a unicode character.
Command Lists / Resource Copying
- New [ClearUnorderedAccessViewUint] and [ClearUnorderedAccessViewFloat] command list sections, which work in the same way as the [ClearRenderTargetView] and [ClearDepthStencilView] sections to run a command list before & after the game clears a UAV.
- Command lists run from [Clear*] sections now have a "this" resource copy target, which can be used to get access to the resource the game is clearing.
- Accessing the back buffer with resource copying now always implies 'no_view_cache' to fix crashes when changing the resolution if this was omitted.
- New "clear" command to clear resources, eliminating the need to use a custom shader to do this in most cases. The simple form is just "clear = ResourceFoo", but for more advanced usage examples (including clearing with a given colour) refer to this commit message:
https://github.com/bo3b/3Dmigoto/commit/0f5bb989b24180db939aa90aa86f90c3e738f70c
Note that in some cases a resource may only be cleared in one eye when using this command (this is a driver bug, and is known to affect at least the back buffer and structured buffer UAVs in SLI. Render and depth targets seem to be ok, but more investigation is required).
Index Buffers
- Index buffers (and other types of buffers) can now be used with texture filtering, checktextureoverride=ib, presets, etc. but refer to the below note
- Fix performance regression related to index buffers introduced in 1.2.63 (@masterotaku - can you compare the performance with 1.2.62 with hunting disabled? There's more we could do here, but if its not a problem in practice I'd rather leave it as is to keep things simpler, but if we do need more I need to know ASAP before people start using index buffer filtering)
- List of visited index buffers when dumping vertex/pixel shaders, and list of visited vertex/pixel shaders when dumping index buffers is now correct in the d3d11_log.txt
just a heads up to anyone thinking about using index buffer filtering - we might change how you have to define [TextureOverride] sections for these to indicate what kind of buffer you are using for the next release, so you may wish to hold off using this, or be prepared to make some minor changes to d3dx.ini for 1.2.67. Looks like we're good - the remaining performance problems are coming from elsewhere so we don't need this complexity and index buffer filtering will remain as is.
Ini Parser
- Audible warning tones when loading or reloading the d3dx.ini are now limited to the first three warnings.
- Fix crash if a line in the d3dx.ini was too long (some old fixes such as Alien Isolation have incorrect comment styles in the [Constants] section which caused this - these will still trigger warnings, but no longer crash).
- Fix spurious warnings about lines consisting of only whitespace
- Minor logging fixes
Frame Analysis
- New dump_on_update option to dump resources that the game updates with UpdateSubresource()
Hunting
- Hunting render targets now only considers those used in the current scene (still not the recommended way to identify render targets).
Misc
- 3DMigoto can now be used on AMD/Intel systems for non-3D Vision modding by removing nvapi.dll (as in, the old hack of shipping the nvidia nvapi.dll instead of ours is no longer required to do this, but we still can't have our nvapi.dll wrapper on non-nvidia systems).
- The "use_criticalsection" option has been removed - we now always use the critical section for compatibility with multi-threaded games.
ShaderRegex Engine
This release includes the new ShaderRegex engine, which is capable of patching shaders on the fly that match a given pattern.
I'm just going to go ahead and jump into some real world examples of using this feature, because I think it will make more sense once you see it. The d3dx.ini includes the first of these examples along with some documentation.
This is a complete, working example from Hellblade that locates a matrix multiply and injects a stereo correction before it. This example will work regardless of which registers the game used, and uses named capture groups to pull out the registers and components that contain the X and Z values (and another "result" capture group to check that the same register is used in two instructions, though this for demonstration purposes and is not actually required in this case), two temporary registers are defined named "stereo" and "tmp1" that can be used in the replace section, InsertDeclarations is used to define StereoParams, and ${0} is used to insert the entire matching pattern in the replaced text (this is how you insert text before or after a pattern):
[ShaderRegexScreenToShadowMatrix]
shader_model = ps_5_0
temps = stereo tmp1
This is another example (also from Hellblade) that shows how this can be used in conjunction with a command list to copy a custom resource containing a matrix into the shader, and shows how you can use named capture groups to capture entire chunks of the original shader and decide where to place them in the replaced output (this is how to insert and/or alter text in the middle of a pattern). This example is a little more precise than the above one since it only matches shaders using the same register numbers - I'm not recommending either approach at the moment: you could start with a more precise pattern and make it more flexible as you come across variations, or start with something more flexible and blacklist any shaders it shouldn't have been applied to - both are valid approaches and we need more real world experience to work out which is "better":
[ShaderRegexScreenToDecalMatrix]
shader_model = ps_5_0
temps = tmp1 m0 m1 m2 m3
; Need SV_Position stereoisation matrix:
ps-t90 = ResourceStereoInjectionMatrices
post ps-t90 = null
[ShaderRegexScreenToDecalMatrix.Pattern]
; Texture and sampler numbers known to vary. The indented whitespace here is
; for my reference, it is not part of the pattern and is not required:
(?P<prefix>
mul r0\.xy, v0\.xyxx, cb1\[123\]\.zwzz\n
sample_l_indexable\(texture2d\)\(float,float,float,float\) r0\.x, r0\.xyxx, t\d+\.xyzw, s\d+, l\(0\.000000\)\n
)
mul r1\.xyzw, v0\.yyyy, cb0\[2\]\.xyzw\n
mad r1\.xyzw, v0\.xxxx, cb0\[1\]\.xyzw, r1\.xyzw\n
mad r0\.xyzw, r0\.xxxx, cb0\[3\]\.xyzw, r1\.xyzw\n
(?<after>
add r0\.xyzw, r0\.xyzw, cb0\[4\]\.xyzw\n
div r0\.[xyzw]{3}, r0\.[xyzw]{4}, r0\.wwww\n
)
As you can see there are four sections that make up this feature:
1. [ShaderRegex*] defines which shader models it matches and declares any temporary registers that you can use in the Replace section (which will automatically update/insert dcl_temps as required). This section also acts as a command list, which is appended to the ShaderOverride command list of all matching shaders.
2. [ShaderRegex*.Pattern] is a PCRE2 regular expression that will match part of the shader.
3. [ShaderRegex*.Pattern.Replace] is the text that will replace the matched pattern. Use ${0} to insert the original text the pattern matched, or use named capture groups to insert part of the matched pattern whereever it is needed. Temporary registers you have defined in the main section use the exact same syntax here as named capture groups here, like ${tmp1}, etc.
4. [ShaderRegex*.InsertDeclarations] is where you insert any additional declarations you need, which will typically be used for StereoParams. 3DMigoto will prevent these from being inserted if a 100% identical declaration is already in the shader (e.g. if multiple ShaderRegex sections match a single shader you still only want one StereoParams declaration).
A few things to point out here that might trip some people up:
1. Only assembly shaders can be patched using this engine. This is by design, for reliability and performance reasons.
2. The explicit newline characters \n in the pattern *and* replacement text are very important - try not to forget these, I am fully expecting they will be the number 1 thing wrong if a pattern isn't working.
3. Some characters like . [] () need to be escaped with a \ in the Pattern section because these have special meanings in regular expressions. You may be able to get away without escaping dots, but that is because a dot in a regular expression matches *any* character, which happens to include dots - you should try to remember to escape these as well to be on the safe side. These characters are not to be escaped in the Replace section.
4. Whitespace at the start and end of each line is ignored, but whitespace in the middle of a line is not. This is due to the ini parsing and is not inherently part of regular expressions, but it seems to work pretty well in practice (as you can see above, I indented some lines in the patterns for my reference). There is a switch you can use in the regular expression to ignore all whitespace (refer to the PCRE2 syntax guide), but that is separate to this.
5. The Replace section is ".Pattern.Replace", not just ".Replace". At the moment there can only be one of each, but the intent is that in the future we will support multiple patterns in a ShaderRegex group (e.g. to match part of a header in one pattern, and code in another), so the .Replace section is associated with a specific pattern.
The grammar we are using is the very powerful PCRE2, which is largely compatible with the Perl and Python (and I think .NET) grammars if you are familiar with any of those. I'm using Python style named capture groups above, but Perl and .NET styles work as well. The reference for PCRE2 is here (beware that if you aren't familiar with regular expressions this will look like Greek - any good regular expression tutorial should provide a good primer on the general syntax and only refer to this for specific details):
http://www.pcre.org/current/doc/html/pcre2syntax.html
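The escaping and newline points above are easy to try outside the game. The following is only a sketch using Python's `re` module rather than PCRE2 itself (the two share the syntax used here). The shader fragment, the register names and the comment line added by the replacement are all made up for illustration, and `\g<name>` is Python's replacement syntax, which is not necessarily what 3DMigoto expects in a Replace section.

```python
import re

# Hypothetical two-instruction shader fragment, for illustration only.
asm = "mul r0.xyzw, r1.yyyy, cb0[28].xyzw\nadd r2.x, r0.x, l(1.0)\n"

# Unescaped: "[28]" is a character class (a single "2" or "8") and "." is
# any character, so this never matches the literal text "cb0[28].xyzw".
loose = re.search(r"cb0[28].xyzw", asm)

# Escaped: brackets and dots are taken literally.
strict = re.search(r"cb0\[28\]\.xyzw", asm)

# An unescaped dot is too permissive: it also matches "r0Qxyzw".
dot_too_loose = re.search(r"r0.xyzw", "r0Qxyzw")

# Named groups plus an explicit \n so the match covers the whole line.
pattern = r"mul (?P<dst>r\d+)\.xyzw, (?P<src>r\d+)\.yyyy, cb0\[28\]\.xyzw\n"

# Replacement text: dots and brackets are NOT escaped here, but the
# explicit \n at the end of each line is still required.
replacement = (
    r"mul \g<dst>.xyzw, \g<src>.yyyy, cb0[28].xyzw\n"
    r"// dst=\g<dst> src=\g<src>\n"
)
patched = re.sub(pattern, replacement, asm)

print(loose is None, strict is not None, dot_too_loose is not None)
print(patched)
```

Dropping the trailing `\n` from the replacement would glue the inserted line onto the following `add` instruction, which is the kind of mistake point 2 warns about.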
The d3d11_log.txt will tell you if the pattern matched a shader, and will show you the patched shader - use this to make sure that the pattern is matching a shader at all, and applying correctly if it did match.
The hunting overlay will show the [ShaderRegex] section for any patched shaders. For debugging I recommend turning on the new verbose_overlay option to see the hash of these shaders to match these up with the log.
If you dump out a patched shader, you will get the *original* shader, not the patched version in ShaderFixes, and doing this will blacklist it from further consideration in the ShaderRegex engine until it has been removed from ShaderFixes. If you need the patched version, you can copy it out of the d3d11_log.txt (or bug me to add an option to switch this behaviour around).
This feature fully works with F10 reload (it is recommended to have both shader and config reload assigned to the same key) to adjust the patterns on the fly and see the results instantly in game. This includes unpatching shaders that no longer match, so what you see after an F10 reload should be the same as starting the game from scratch.
A few special uses of this feature for advanced users:
- The .Pattern.Replace section is optional. If you omit it the pattern will still be matched against shaders and the command list and .InsertDeclarations section will still be applied, it just won't replace the matched part of the shader.
- The .Pattern section is also optional. If you omit it the pattern will match every shader that matches the shader_model, and the command list and .InsertDeclarations section will still be applied. This can be used to do advanced things like globally disabling the driver's stereo correction via the cb12 method in InsertDeclarations, or running a command list for every shader.
Current Limitations & Performance Notes
- There is a slight performance hit from enabling the patching engine.
- Shaders are patched when they are first used, which may introduce some minor stuttering during gameplay (in the future we may try to patch these at load time when possible to reduce this impact, but I have other reasons for doing this on the fly that are not apparent just yet).
- Patched shaders are not currently cached anywhere, so the cost will be paid every time the game is run, and again after any F10 reload.
- Each ShaderRegex section is currently limited to one pattern and one replace section - in the future we may allow for multiple patterns to make it easier to e.g. match a header with one pattern and use a value extracted from that header in the main patch. This can be done without this, but would require a more complicated pattern that matches the header and instructions and everything between.
2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit
Whoa, that's a big update. ShaderRegex will be very useful for games that have hundreds or thousands of shaders that need the same fixing pattern, once they can be cached/saved in the future especially.
I'll try this version with Grim Dawn right when I get home, as you tell me. In 5.5-6 hours.
Big and Awesome thanks DSS! This is one hell of an awesome update!
1x Palit RTX 2080Ti Pro Gaming OC(watercooled and overclocked to hell)
3x 3D Vision Ready Asus VG278HE monitors (5760x1080).
Intel i9 9900K (overclocked to 5.3 and watercooled ofc).
Asus Maximus XI Hero Mobo.
16 GB Team Group T-Force Dark Pro DDR4 @ 3600.
Lots of Disks:
- Raid 0 - 256GB Sandisk Extreme SSD.
- Raid 0 - WD Black - 2TB.
- SanDisk SSD PLUS 480 GB.
- Intel 760p 256GB M.2 PCIe NVMe SSD.
Creative Sound Blaster Z.
Windows 10 x64 Pro.
etc
Thanks so much for the update DSS and huge patch notes post. That alone must have taken some time. You are a hero!
i7-4790K CPU 4.8Ghz stable overclock.
16 GB RAM Corsair
ASUS Turbo 2080TI
Samsung SSD 840Pro
ASUS Z97-WS3D
Surround ASUS Rog Swift PG278Q(R), 2x PG278Q (yes it works)
Obutto R3volution.
Windows 10 pro 64x (Windows 7 Dual boot)
[quote="DarkStarSword"]The fps drop you see when the OSD is enabled will be due to the statistics collection (the info that ends up in shader_usage.txt when you dump something), which is known to be expensive (some games are worse than others - this sounds like a particularly bad case) so we limit it to only happen when hunting is fully enabled with the OSD shown. I've thought about adding an option for this specific thing, but turning off the OSD has usually been good enough, so unless you want one I'd rather leave that as is.[/quote]Turns out I already did that and had forgotten about it (because in Firewatch this was more than just a performance issue and quickly ate up all available RAM until Windows freaked out and it crashed, which only took about a minute) - if you disable dump_usage you should be able to recover this performance, at the cost of no longer getting ShaderUsage.txt.
2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit
That's good to know. It will be useful for hunting while playing once ShaderUsage isn't useful anymore, and useful for comparing fps of default vs modified shaders.
Thanks a lot for your work, DSS and bo3b.
[quote="DarkStarSword"][center][color="orange"][size="XL"]3Dmigoto 1.2.66[/size][/color][/center][center][color="green"][url]https://github.com/bo3b/3Dmigoto/releases[/url][/color][/center]
Snip
[/quote]
As always, thank you for such a monumental release and all the work put into it. So glad that I just happened to chance upon starting to learn regex using text editors for my most recent fix, so that I'm able to actually understand this. The overlay additions sound awesome, and lots of good bug fixes too. Speaking of which...
[quote]- Fix performance regression related to index buffers introduced in 1.2.63[/quote]
In my most recent fix, I tried copying an index buffer into a shader to use with your autodepth stereo correction code for some HUD icons, and performance completely tanked so I had to scratch using it. When I looked at the dumped index buffer, it was a rather huge file so I assumed that was why, but could it have been due to this regression issue? I suppose I could always just test and find out a bit later, but doesn't hurt to ask.
Anyway, thanks again.
3D Gaming Rig: CPU: i7 7700K @ 4.9Ghz | Mobo: Asus Maximus Hero VIII | RAM: Corsair Dominator 16GB | GPU: 2 x GTX 1080 Ti SLI | 3xSSDs for OS and Apps, 2 x HDD's for 11GB storage | PSU: Seasonic X-1250 M2| Case: Corsair C70 | Cooling: Corsair H115i Hydro cooler | Displays: Asus PG278QR, BenQ XL2420TX & BenQ HT1075 | OS: Windows 10 Pro + Windows 7 dual boot
There are three things I can comment about the new version:
1- Something weird must be going on with hunting mode, because fps tank HARD even when the overlay is not visible. Like 23-30 fps in Grim Dawn. I tried disabling all kinds of logging, including "dump_usage=0". Setting "hunting=0" was the only solution I found.
2- bo3b, when you talked about "allow_create_device=1" I thought it was weird that you didn't mention "allow_create_device=2", which is the default value. I can see why. "2" keeps crashing Grim Dawn at boot. "1" doesn't make it crash anymore. Not that the game needs any value higher than 0 for anything. Just that 0 and 1 are fine now, and only 0 worked before.
3- About fps (when "hunting=0"), I seem to get very slightly higher fps. With all pets in the first town, I now get consistently 46fps instead of 45, with some short peaks to 47. Almost within margin of error and still far from not using 3Dmigoto.
[quote="DarkStarSword"]Can we maybe give you commit access to 3DMigoto? I'd rather avoid side builds like this one becoming public (giving them to an individual for testing is fine) because we end up with version numbers in the wild that don't match the source code, so I'd rather that if you do a public release like this that you use the proper publish.bat, tag and upload it to github.
For now (unless Bo3b has anything to supersede it?) I'll just apply your code to 3DMigoto (tagging you as the author) and do a proper release - I had wanted to get some things I need for the UE4 extension DLL in for this release, but if I put them in now they will be a rush job and we've got enough important fixes (and ShaderRegex) as it is that I don't want to delay it any longer.[/quote]
[quote="bo3b"]@mx-2: But most definitely, if you want to make builds, we can give you access. I agree with DarkStarSword that we pretty much never want eng builds to go out that are not marked as such.
[/quote]
I don't think that I need commit access for now - this was just a dirty hack for that problem until there is a better solution and the next 3Dmigoto version is released. If I write a patch next time, I'll send a github pull request.
My reason for that build was that I didn't know if somebody else is also stuck at this bug and how long it takes until the next 3Dmigoto release. To prevent further version number chaos, I removed my attachment from the previous post.
[quote="DJ-RK"][quote]- Fix performance regression related to index buffers introduced in 1.2.63[/quote]In my most recent fix, I tried copying an index buffer into a shader to use for using your autodepth stereo correction code for some HUD icons, and performance completely tanked so I had to scratch using it. When I looked at the dumped index buffer, it was a rather huge file so I assumed that was why, but could it have been due to this regression issue? I suppose I could always just test and find out a bit later, but doesn't hurt to ask.[/quote]
The index buffer hash isn't used when copying an index buffer, so I doubt this will make any difference (this performance issue was a passive one that could knock a few fps off any CPU bound game regardless of what features they were using), but certainly do let me know.
When you say huge - how huge is huge? Games can combine tons of these into a single buffer, so it certainly could be quite large. 3DMigoto knocks off the offset into the buffer for the current draw call when it copies it, so it won't actually copy the full buffer, but that's more intended to make the buffer easier to use in the destination shader (so you don't have to deal with the offsets) rather than save on performance, and it still copies to the end of the buffer.
Performance problems when copying a resource are usually due to performing too many full copies in a frame - I recommend using the frame analysis log to check this, searching for these strings:
"performing full copy" - 3DMigoto performed a full copy - expensive.
"performing region copy" - 3DMigoto performed a partial full copy, but knocked off an offset - still expensive.
"copying by reference" - Just a lightweight reference, good for performance.
"max_copies_per_frame exceeded" - 3DMigoto would have performed a full copy, but this limit prevented it.
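If the log is large, tallying those four strings can be quicker than searching by hand. This is only a rough sketch: the sample lines below are invented to stand in for a real frame analysis log (the actual line layout will differ), and the only thing taken from 3DMigoto is the four message strings themselves.

```python
from collections import Counter

# The four messages quoted above.
COPY_MESSAGES = (
    "performing full copy",
    "performing region copy",
    "copying by reference",
    "max_copies_per_frame exceeded",
)

def count_copy_messages(lines):
    """Count how many lines contain each copy-related message."""
    counts = Counter({msg: 0 for msg in COPY_MESSAGES})
    for line in lines:
        lowered = line.lower()
        for msg in COPY_MESSAGES:
            if msg in lowered:
                counts[msg] += 1
    return counts

# Invented sample lines, standing in for a real frame analysis log.
sample_log = [
    "draw 1: performing full copy",
    "draw 2: copying by reference",
    "draw 3: performing full copy",
    "draw 4: max_copies_per_frame exceeded",
    "draw 5: unrelated line",
]

for msg, n in count_copy_messages(sample_log).items():
    print(f"{n:4d}  {msg}")
```

To run it on a real log, pass an open file instead of `sample_log`, since iterating a file yields its lines. A high count for the first two messages relative to the number of frames captured is the thing to look for.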
Depending on your situation there are some strategies you can use to try to avoid this:
- Copy by reference. Since you are copying index buffers into a different resource slot it is unlikely you would be able to use this because the bind flags will likely be wrong.
- Set max_copies_per_frame=1 in the destination [Resource] section, but this is no good if you legitimately need an updated copy every time. If you only need an updated copy once per scene (for games that render multiple scenes every frame) you can reset this counter from a [ClearRenderTargetView] section (refer to my latest Unity template for an example).
- Use an intermediate resource that you copy everything to by reference, then do a full copy of that resource only when you actually need it in the destination shader. This is suitable for situations where you are trying to opportunistically get access to some resource, but only need it once or twice later - I use this approach a lot for depth buffers. You can also perform a full copy from the first intermediate resource into a second using max_copies_per_frame=1 to limit the number of full copies if you use this resource in a lot of shaders or HUD elements.
[s]- Use a custom shader to save only the information you need to a UAV or custom render target. This is by far the most flexible option if you do need updated copies for every draw call, but the flexibility means it is kind of hard to just point to one example to show how it is done. It will still have a performance cost since you are now running extra shaders, and it's hard to say how that will compare with the performance cost of doing the full copy. You could study my HUD analysis shaders in Dreamfall Chapters, which use a UAV to track state across multiple draw calls in a frame, including the positions of up to eight text elements drawn in a frame.[/s] Scratch that - you'd still need to copy the index buffer into the custom shader making this quite pointless.
2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit
I've gone back through this thread and added links to most of the previous release notes to the github releases page going back to 1.0 - this is something that Oomek requested and I'd been meaning to do anyway. There's probably a few missing, but I hope I at least got all the major ones.
2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit
[quote="masterotaku"]1- Something weird must be going on with hunting mode, because fps tank HARD even when the overlay is not visible. Like 23-30 fps in Grim Dawn. I tried disabling all kinds of logging, including "dump_usage=0". Setting "hunting=0" was the only solution I found.[/quote]
Bugger :(
The fact that hunting=0 fixes it suggests this might be something other than the index buffer hash (the cost of performing the hash is paid regardless of hunting, and the additional performance cost that was showing up in our profiling should now only be paid when hunting=1, but not when hunting=2, and should be lower than 1.2.65 regardless). There's a few new things in this release related to ShaderRegex that could have cost performance, but nothing really makes sense regarding your observation on hunting modes... I might just have to grab this game for the shared accounts and do some profiling.
Edit: I'm thinking this might be an interaction with the hash contamination detection...
[quote]3- About fps (when "hunting=0"), I seem to get very slightly higher fps. With all pets in the first town, I now get consistently 46fps instead of 45, with some short peaks to 47. Almost within margin of error and still far from not using 3Dmigoto.[/quote]Ok, so we still have more work to do.
[s]In that case, just a heads up to anyone thinking about using index buffer filtering - we might change how you have to define [TextureOverride] sections for these to indicate what kind of buffer you are using for the next release, so you may wish to hold off using this, or be prepared to make some minor changes to d3dx.ini for 1.2.67. I'll know more after examining this game in closer detail.[/s] Nope, this doesn't show up in profiling so leaving it as is.
2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit
[quote="DarkStarSword"]snip[/quote]
Crap, you know what? I confused the vertex buffer with an index buffer, it's the vertex buffer that was massive and caused major slowdown when copied over into a texture slot. Not sure if that makes any difference to your comments. Either which way, I probably won't bother messing around with it more for the current game if it's not a simple matter of swapping out the .dll since I'm happy enough with the implementation I was able to achieve.
3D Gaming Rig: CPU: i7 7700K @ 4.9Ghz | Mobo: Asus Maximus Hero VIII | RAM: Corsair Dominator 16GB | GPU: 2 x GTX 1080 Ti SLI | 3xSSDs for OS and Apps, 2 x HDD's for 11GB storage | PSU: Seasonic X-1250 M2| Case: Corsair C70 | Cooling: Corsair H115i Hydro cooler | Displays: Asus PG278QR, BenQ XL2420TX & BenQ HT1075 | OS: Windows 10 Pro + Windows 7 dual boot
I'd go with his code for now, because it works for his scenario, and is unlikely to be called by others.
I already know this is not what we want here, because it will match any Hacker* object, not solely the one passed in by RIID, which was what I was trying to get at. There is something that will work better here, but it's not clear to me without a live test. So at least temporarily, this is OK.
To fix only this scenario, it's better to add a QueryInterface override in HackerDXGISwapChain and return 'this' from there. That is safer in that it fixes this exact problem, and won't be possible to break others. My hope was to make something that fixes all possible uses of QueryInterface(self), but if we cannot determine the true object at HackerUnknown it'll never work. Based on our discussion, there may not be a true object here.
@mx-2: But most definitely, if you want to make builds, we can give you access. I agree with DarkStarSword that we pretty much never want eng builds to go out that are not marked as such.
Edit: Actually I don't know how that is working. If it's always that this=HackerUnknown, then it should only ever match "IUnknown" RIID. The typeid(*ppvObject) is clearly wrong, it's doing typeid on the pointer, not the object. Some alt-syntax there might work.
I looked through 40 games or so, and did not find others that call HackerUnknown::QueryInterface in the log, so I think this is pretty rare.
Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607
Latest 3Dmigoto Release
Bo3b's School for ShaderHackers
This release has been removed due to a serious regression with key bindings.
The main highlight of this release is the new ShaderRegex engine, detailed below. There's a few other new things like an easier method to clear resources, but besides that this release mostly contains a great many bug fixes and should be considered a fairly important upgrade.
- Fix Elex (Bo3b & mx-2)
- Fix null pointer crash on platform_update=1 (Bo3b)
Shader Management
- Fixes two long standing bugs that could cause StereoParams to be defined twice (and other bad behaviour) in some circumstances.
- Reloading shaders will now compile other shaders even if one fails to compile.
- Fixes some crashes in rare circumstances when reloading shaders where some shaders had been removed.
Overlay
- Overlay now shows the tag lines of every type of shader currently selected rather than just one.
- New "verbose_overlay" option to show the shader hashes in the overlay.
- Overlay now shows index buffer and render target hunting progress (hidden unless actually hunting these).
- Overlay no longer shows tag lines of removed shaders.
- Fixed crash if shader tag line contained a unicode character.
Command Lists / Resource Copying
- New [ClearUnorderedAccessViewUint] and [ClearUnorderedAccessViewFloat] command list sections, which work in the same way as the [ClearRenderTargetView] and [ClearDepthStencilView] sections to run a command list before & after the game clears a UAV.
- Command lists run from [Clear*] sections now have a "this" resource copy target, which can be used to get access to the resource the game is clearing.
- Accessing the back buffer with resource copying now always implies 'no_view_cache' to fix crashes when changing the resolution if this was omitted.
- New "clear" command to clear resources, eliminating the need to use a custom shader to do this in most cases. The simple form is just "clear = ResourceFoo", but for more advanced usage examples (including clearing with a given colour) refer to this commit message:
https://github.com/bo3b/3Dmigoto/commit/0f5bb989b24180db939aa90aa86f90c3e738f70c
Note that in some cases a resource may only be cleared in one eye when using this command (this is a driver bug, and is known to affect at least the back buffer and structured buffer UAVs in SLI. Render and depth targets seem to be ok, but more investigation is required).
Index Buffers
- Index buffers (and other types of buffers) can now be used with texture filtering, checktextureoverride=ib, presets, etc. (but refer to the below note)
- Fix performance regression related to index buffers introduced in 1.2.63 (@masterotaku - can you compare the performance with 1.2.62 with hunting disabled? There's more we could do here, but if it's not a problem in practice I'd rather leave it as is to keep things simpler, but if we do need more I need to know ASAP before people start using index buffer filtering)
- List of visited index buffers when dumping vertex/pixel shaders, and list of visited vertex/pixel shaders when dumping index buffers is now correct in the d3d11_log.txt
Just a heads up to anyone thinking about using index buffer filtering - we might change how you have to define [TextureOverride] sections for these to indicate what kind of buffer you are using for the next release, so you may wish to hold off using this, or be prepared to make some minor changes to d3dx.ini for 1.2.67.
Edit: Looks like we're good - the remaining performance problems are coming from elsewhere so we don't need this complexity and index buffer filtering will remain as is.
Ini Parser
- Audible warning tones when loading or reloading the d3dx.ini are now limited to the first three warnings.
- Fix crash if a line in the d3dx.ini was too long (some old fixes such as Alien Isolation have incorrect comment styles in the [Constants] section which caused this - these will still trigger warnings, but no longer crash).
- Fix spurious warnings about lines consisting of only whitespace
- Minor logging fixes
Frame Analysis
- New dump_on_update option to dump resources that the game updates with UpdateSubresource()
Hunting
- Hunting render targets now only considers those used in the current scene (still not the recommended way to identify render targets).
Misc
- 3DMigoto can now be used on AMD/Intel systems for non-3D Vision modding by removing nvapi.dll (as in, the old hack of shipping the nvidia nvapi.dll instead of ours is no longer required to do this, but we still can't have our nvapi.dll wrapper on non-nvidia systems).
- The "use_criticalsection" option has been removed - we now always use the critical section for compatibility with multi-threaded games.
ShaderRegex Engine
This release includes the new ShaderRegex engine, which is capable of patching shaders on the fly that match a given pattern.
I'm just going to go ahead and jump into some real world examples of using this feature, because I think it will make more sense once you see it. The d3dx.ini includes the first of these examples along with some documentation.
This is a complete, working example from Hellblade that locates a matrix multiply and injects a stereo correction before it. This example will work regardless of which registers the game used. It uses named capture groups to pull out the registers and components that contain the X and Z values, plus another "result" capture group to check that the same register is used in two instructions (this is for demonstration purposes and is not actually required in this case). Two temporary registers named "stereo" and "tmp1" are defined for use in the replace section, InsertDeclarations is used to define StereoParams, and ${0} is used to insert the entire matching pattern in the replaced text (this is how you insert text before or after a pattern):
This is another example (also from Hellblade) that shows how this can be used in conjunction with a command list to copy a custom resource containing a matrix into the shader, and shows how you can use named capture groups to capture entire chunks of the original shader and decide where to place them in the replaced output (this is how to insert and/or alter text in the middle of a pattern). This example is a little more precise than the above one since it only matches shaders using the same register numbers - I'm not recommending either approach at the moment: you could start with a more precise pattern and make it more flexible as you come across variations, or start with something more flexible and blacklist any shaders it shouldn't have been applied to - both are valid approaches and we need more real world experience to work out which is "better":
As you can see there are four sections that make up this feature:
1. [ShaderRegex*] defines which shader models it matches and declares any temporary registers that you can use in the Replace section (which will automatically update/insert dcl_temps as required). This section also acts as a command list, which is appended to the ShaderOverride command list of all matching shaders.
2. [ShaderRegex*.Pattern] is a PCRE2 regular expression that will match part of the shader.
3. [ShaderRegex*.Pattern.Replace] is the text that will replace the matched pattern. Use ${0} to insert the original text the pattern matched, or use named capture groups to insert part of the matched pattern wherever it is needed. Temporary registers you have defined in the main section use the exact same syntax here as named capture groups, like ${tmp1}, etc.
4. [ShaderRegex*.InsertDeclarations] is where you insert any additional declarations you need, which will typically be used for StereoParams. 3DMigoto will prevent these from being inserted if a 100% identical declaration is already in the shader (e.g. if multiple ShaderRegex sections match a single shader you still only want one StereoParams declaration).
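To make the named capture groups and ${0} substitution a bit more concrete, here is a minimal sketch in Python rather than d3dx.ini syntax. Python's re module is not PCRE2, but it accepts the same Python style named groups and backreferences. The shader lines, the register numbers, the placeholder "mov" instruction and the expand() helper are all made up for illustration. The helper only imitates the ${0} / ${name} substitution described above, it assumes a temporary expands to a full register name, and it is not 3DMigoto's implementation.

```python
import re

# Hypothetical assembly fragment; not taken from any real shader.
shader = (
    "mul r2.xyzw, r1.yyyy, cb0[29].xyzw\n"
    "mad r2.xyzw, r1.xxxx, cb0[28].xyzw, r2.xyzw\n"
)

# Named groups capture the register numbers; (?P=result) checks that the same
# register shows up again. Literal dots and brackets are escaped.
pattern = re.compile(
    r"mul r(?P<result>\d+)\.xyzw, r(?P<src>\d+)\.yyyy, cb0\[29\]\.xyzw\n"
    r"mad r(?P=result)\.xyzw, r(?P=src)\.xxxx, cb0\[28\]\.xyzw, r(?P=result)\.xyzw\n"
)

# Replacement template in the ${...} style: new text first, then the whole match.
template = "mov ${stereo}.x, r${src}.x\n${0}"


def expand(template, match, temps):
    """Expand ${0}, ${group} and ${temp} placeholders (illustrative helper)."""
    def lookup(m):
        name = m.group(1)
        if name == "0":
            return match.group(0)   # the entire matched text
        if name in temps:
            return temps[name]      # assumed: a temp expands to a register name
        return match.group(name)    # a named capture group
    return re.sub(r"\$\{(\w+)\}", lookup, template)


def patch(shader, pattern, template, temps):
    """Replace every match of pattern with the expanded template."""
    return pattern.sub(lambda m: expand(template, m, temps), shader)


# Pretend r3 was free and got assigned to the "stereo" temporary.
patched = patch(shader, pattern, template, {"stereo": "r3"})
print(patched)
```

Because the template ends with ${0}, the original two instructions survive untouched and the new line lands in front of them. A shader that does not match the pattern comes back unchanged.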
A few things to point out here that might trip some people up:
1. Only assembly shaders can be patched using this engine. This is by design, for reliability and performance reasons.
2. The explicit newline characters \n in the pattern *and* replacement text are very important - try not to forget these, I am fully expecting they will be the number 1 thing wrong if a pattern isn't working.
3. Some characters like . [] () need to be escaped with a \ in the Pattern section because these have special meanings in regular expressions. You may be able to get away without escaping dots, but that is because a dot in a regular expression matches *any* character, which happens to include dots - you should try to remember to escape these as well to be on the safe side. These characters are not to be escaped in the Replace section.
4. Whitespace at the start and end of each line is ignored, but whitespace in the middle of a line is not. This is due to the ini parsing and is not inherently part of regular expressions, but it seems to work pretty well in practice (as you can see above, I indented some lines in the patterns for my reference). There is a switch you can use in the regular expression to ignore all whitespace (refer to the PCRE2 syntax guide), but that is separate to this.
5. The Replace section is ".Pattern.Replace", not just ".Replace". At the moment there can only be one of each, but the intent is that in the future we will support multiple patterns in a ShaderRegex group (e.g. to match part of a header in one pattern, and code in another), so the .Replace section is associated with a specific pattern.
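Points 2 to 4 can be tried out quickly in Python, whose re module treats escaping and \n the same way for these simple cases. This is only a sketch: the shader lines are made up, and join_section() is my assumed model of the ini parsing (trim each line, then concatenate with nothing in between), not code from 3DMigoto.

```python
import re

line = "mad r2.xyzw, r1.xxxx, cb0[28].xyzw, r2.xyzw\n"

# Unescaped: "[28]" is a character class (one "2" or "8") and "." matches any
# character, so this pattern does not mean what it looks like.
unescaped = re.compile(r"cb0[28].xyzw")
# Escaped: matches the literal text only.
escaped = re.compile(r"cb0\[28\]\.xyzw")

print(bool(unescaped.search(line)))         # False: "[" is neither 2 nor 8
print(bool(unescaped.search("cb02_xyzw")))  # True: an unintended match
print(bool(escaped.search(line)))           # True


def join_section(lines):
    """Assumed model of the ini parsing: trim each line, then concatenate."""
    return "".join(part.strip() for part in lines)


two_lines = "mul r0.x, r1.x, r2.x\nadd r0.x, r0.x, r3.x\n"

# Indentation on the second line is trimmed away, the explicit \n is kept.
with_newlines = join_section([
    r"mul r\d+\.x, r\d+\.x, r\d+\.x\n",
    r"    add r\d+\.x, r\d+\.x, r\d+\.x\n",
])
# Same pattern with the explicit \n forgotten.
without_newlines = join_section([
    r"mul r\d+\.x, r\d+\.x, r\d+\.x",
    r"    add r\d+\.x, r\d+\.x, r\d+\.x",
])

print(bool(re.search(with_newlines, two_lines)))     # True
print(bool(re.search(without_newlines, two_lines)))  # False
```

Under that assumed model, forgetting the \n glues "x" and "add" together in the pattern, which can never match two separate instructions.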
The grammar we are using is the very powerful PCRE2, which is largely compatible with the Perl and Python (and I think .NET) grammars if you are familiar with any of those. I'm using Python style named capture groups above, but Perl and .NET styles work as well. The reference for PCRE2 is here (beware that if you aren't familiar with regular expressions this will look like Greek - any good regular expression tutorial should provide a good primer on the general syntax and only refer to this for specific details):
http://www.pcre.org/current/doc/html/pcre2syntax.html
The d3d11_log.txt will tell you if the pattern matched a shader, and will show you the patched shader - use this to make sure that the pattern is matching a shader at all, and applying correctly if it did match.
The hunting overlay will show the [ShaderRegex] section for any patched shaders. For debugging I recommend turning on the new verbose_overlay option to see the hash of these shaders to match these up with the log.
If you dump out a patched shader, the copy written to ShaderFixes will be the *original* shader, not the patched version, and doing this will blacklist it from further consideration in the ShaderRegex engine until it has been removed from ShaderFixes. If you need the patched version, you can copy it out of the d3d11_log.txt (or bug me to add an option to switch this behaviour around).
This feature fully works with F10 reload (it is recommended to have both shader and config reload assigned to the same key) to adjust the patterns on the fly and see the results instantly in game. This includes unpatching shaders that no longer match, so what you see after an F10 reload should be the same as starting the game from scratch.
A few special uses of this feature for advanced users:
- The .Pattern.Replace section is optional. If you omit it the pattern will still be matched against shaders and the command list and .InsertDeclarations section will still be applied, it just won't replace the matched part of the shader.
- The .Pattern section is also optional. If you omit it the section will match every shader that matches the shader_model, and the command list and .InsertDeclarations section will still be applied. This can be used to do advanced things like globally disabling the driver's stereo correction via the cb12 method in InsertDeclarations, or running a command list for every shader.
Current Limitations & Performance Notes
- There is a slight performance hit of enabling the patching engine
- Shaders are patched when they are first used, which may introduce some minor stuttering during gameplay (in the future we may try to patch these at load time when possible to reduce this impact, but I have other reasons for doing this on the fly that are not apparent just yet).
- Patched shaders are not currently cached anywhere, so the cost will be paid every time the game is run, and again after any F10 reload.
- Each ShaderRegex section is currently limited to one pattern and one replace section - in the future we may allow for multiple patterns to make it easier to e.g. match a header with one pattern and use a value extracted from that header in the main patch. This can be done without this, but would require a more complicated pattern that matches the header and instructions and everything between.
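As a rough idea of what the "more complicated pattern" in the last point looks like, here is a sketch in Python (again, not PCRE2 and not d3dx.ini syntax, though the constructs used exist in both). The header layout and buffer names are invented for the example and are much simpler than a real shader header. The pattern captures a slot number from the header, lazily skips whole lines, and then requires an instruction that uses the same slot.

```python
import re

# Made-up shader text: a comment header followed by instructions.
shader = (
    "// Name        Type     Slot\n"
    "// PerFrame    cbuffer  3\n"
    "// PerObject   cbuffer  5\n"
    "dcl_constantbuffer cb3[4], immediateIndexed\n"
    "mul r0.xyzw, r1.xxxx, cb3[0].xyzw\n"
    "mul r2.xyzw, r1.yyyy, cb5[0].xyzw\n"
)

pattern = re.compile(
    r"// PerObject +cbuffer +(?P<slot>\d+)\n"  # value captured from the header
    r"(?P<between>(?:.*\n)*?)"                  # whole lines in between, lazily
    r"(?P<insn>mul r\d+\.xyzw, r\d+\.yyyy, cb(?P=slot)\[0\]\.xyzw\n)"
)

m = pattern.search(shader)
print(m.group("slot"))
print(m.group("insn"), end="")
```

The "between" group is what makes this heavier than two separate patterns would be: the engine has to try skipping zero lines, then one, then two, until the instruction lines up.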
2x Geforce GTX 980 in SLI provided by NVIDIA, i7 6700K 4GHz CPU, Asus 27" VG278HE 144Hz 3D Monitor, BenQ W1070 3D Projector, 120" Elite Screens YardMaster 2, 32GB Corsair DDR4 3200MHz RAM, Samsung 850 EVO 500G SSD, 4x750GB HDD in RAID5, Gigabyte Z170X-Gaming 7 Motherboard, Corsair Obsidian 750D Airflow Edition Case, Corsair RM850i PSU, HTC Vive, Win 10 64bit
Alienware M17x R4 w/ built in 3D, Intel i7 3740QM, GTX 680m 2GB, 16GB DDR3 1600MHz RAM, Win7 64bit, 1TB SSD, 1TB HDD, 750GB HDD
Pre-release 3D fixes, shadertool.py and other goodies: http://github.com/DarkStarSword/3d-fixes
Support me on Patreon: https://www.patreon.com/DarkStarSword or PayPal: https://www.paypal.me/DarkStarSword
I'll try this version with Grim Dawn right when I get home, as you tell me. In 5.5-6 hours.
CPU: Intel Core i7 7700K @ 4.9GHz
Motherboard: Gigabyte Aorus GA-Z270X-Gaming 5
RAM: GSKILL Ripjaws Z 16GB 3866MHz CL18
GPU: MSI GeForce RTX 2080Ti Gaming X Trio
Monitor: Asus PG278QR
Speakers: Logitech Z506
Donations account: masterotakusuko@gmail.com
1x Palit RTX 2080Ti Pro Gaming OC(watercooled and overclocked to hell)
3x 3D Vision Ready Asus VG278HE monitors (5760x1080).
Intel i9 9900K (overclocked to 5.3 and watercooled ofc).
Asus Maximus XI Hero Mobo.
16 GB Team Group T-Force Dark Pro DDR4 @ 3600.
Lots of Disks:
- Raid 0 - 256GB Sandisk Extreme SSD.
- Raid 0 - WD Black - 2TB.
- SanDisk SSD PLUS 480 GB.
- Intel 760p 256GB M.2 PCIe NVMe SSD.
Creative Sound Blaster Z.
Windows 10 x64 Pro.
etc
My website with my fixes and OpenGL to 3D Vision wrapper:
http://3dsurroundgaming.com
(If you like some of the stuff that I've done and want to donate something, you can do it with PayPal at tavyhome@gmail.com)
i7-4790K CPU 4.8Ghz stable overclock.
16 GB RAM Corsair
ASUS Turbo 2080TI
Samsung SSD 840Pro
ASUS Z97-WS3D
Surround ASUS Rog Swift PG278Q(R), 2x PG278Q (yes it works)
Obutto R3volution.
Windows 10 pro 64x (Windows 7 Dual boot)
Thanks!!!!
MY WEB
Helix Mod - Making 3D Better
My 3D Screenshot Gallery
Like my fixes? you can donate to Paypal: dhr.donation@gmail.com
Thanks a lot for your work, DSS and bo3b.
As always, thank you for such a monumental release and all the work put into it. So glad that I just happened to chance upon starting to learn regex using text editors for my most recent fix, so that I'm able to actually understand this. The overlay additions sound awesome, and lots of good bug fixes too. Speaking of which...
In my most recent fix, I tried copying an index buffer into a shader to use for using your autodepth stereo correction code for some HUD icons, and performance completely tanked so I had to scratch using it. When I looked at the dumped index buffer, it was a rather huge file so I assumed that was why, but could it have been due to this regression issue? I suppose I could always just test and find out a bit later, but doesn't hurt to ask.
Anyway, thanks again.
3D Gaming Rig: CPU: i7 7700K @ 4.9Ghz | Mobo: Asus Maximus Hero VIII | RAM: Corsair Dominator 16GB | GPU: 2 x GTX 1080 Ti SLI | 3xSSDs for OS and Apps, 2 x HDD's for 11GB storage | PSU: Seasonic X-1250 M2| Case: Corsair C70 | Cooling: Corsair H115i Hydro cooler | Displays: Asus PG278QR, BenQ XL2420TX & BenQ HT1075 | OS: Windows 10 Pro + Windows 7 dual boot
Like my fixes? Dontations can be made to: www.paypal.me/DShanz or rshannonca@gmail.com
Like electronic music? Check out: www.soundcloud.com/dj-ryan-king
1- Something weird must be going on with hunting mode, because fps tank HARD even when the overlay is not visible. Like 23-30 fps in Grim Dawn. I tried disabling all kinds of logging, including "dump_usage=0". Setting "hunting=0" was the only solution I found.
2- bo3b, when you talked about "allow_create_device=1" I thought it was weird that you didn't mention "allow_create_device=2", which is the default value. I can see why. "2" keeps crashing Grim Dawn at boot. "1" doesn't make it crash anymore. Not that the game needs any value higher than 0 for anything. Just that 0 and 1 are fine now, and only 0 worked before.
3- About fps (when "hunting=0"), I seem to get very slightly higher fps. With all pets in the first town, I now get consistently 46fps instead of 45, with some short peaks to 47. Almost within margin of error and still far from not using 3Dmigoto.
I don't think that I need commit access for now - this was just a dirty hack for that problem until there is a better solution and the next 3Dmigoto version is released. If I write a patch next time, I'll send a github pull request.
My reason for that build was that I didn't know if somebody else is also stuck at this bug and how long it takes until the next 3Dmigoto release. To prevent further version number chaos, I removed my attachment from the previous post.
My 3D fixes with Helixmod for the Risen series on GitHub
Bo3b's School for Shaderhackers - starting point for your first 3D fix
The index buffer hash isn't used when copying an index buffer, so I doubt this will make any difference (this performance issue was a passive one that could knock a few fps off any CPU bound game regardless of what features they were using), but certainly do let me know.
When you say huge - how huge is huge? Games can combine tons of these into a single buffer, so it certainly could be quite large. 3DMigoto knocks off the offset into the buffer for the current draw call when it copies it, so it won't actually copy the full buffer, but that's more intended to make the buffer easier to use in the destination shader (so you don't have to deal with the offsets) rather than save on performance, and it still copies to the end of the buffer.
Performance problems when copying a resource are usually due to performing too many full copies in a frame - I recommend using the frame analysis log to check this, searching for these strings:
"performing full copy" - 3DMigoto performed a full copy - expensive.
"performing region copy" - 3DMigoto performed a full copy of part of the buffer (knocking off an offset) - still expensive.
"copying by reference" - Just a lightweight reference, good for performance.
"max_copies_per_frame exceeded" - 3DMigoto would have performed a full copy, but this limit prevented it.
Depending on your situation there are some strategies you can use to try to avoid this:
- Copy by reference. Since you are copying index buffers into a different resource slot it is unlikely you would be able to use this because the bind flags will likely be wrong.
- Set max_copies_per_frame=1 in the destination [Resource] section, but this is no good if you legitimately need an updated copy every time. If you only need an updated copy once per scene (for games that render multiple scenes every frame) you can reset this counter from a [ClearRenderTargetView] section (refer to my latest Unity template for an example).
- Use an intermediate resource that you copy everything to by reference, then do a full copy of that resource only when you actually need it in the destination shader. This is suitable for situations where you are trying to opportunistically get access to some resource, but only need it once or twice later - I use this approach a lot for depth buffers. You can also perform a full copy from the first intermediate resource into a second using max_copies_per_frame=1 to limit the number of full copies if you use this resource in a lot of shaders or HUD elements.
- Use a custom shader to save only the information you need to a UAV or custom render target. This is by far the most flexible option if you do need updated copies for every draw call, but the flexibility means it is kind of hard to just point to one example to show how it is done. It will still have a performance cost since you are now running extra shaders, and it's hard to say how that will compare with the performance cost of doing the full copy. You could study my HUD analysis shaders in Dreamfall Chapters, which use a UAV to track state across multiple draw calls in a frame, including the positions of up to eight text elements drawn in a frame.
Edit: Scratch that - you'd still need to copy the index buffer into the custom shader making this quite pointless.
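To see whether any of these strategies is actually helping, the four frame analysis strings listed earlier can be tallied with a few lines of Python. This is just a sketch: it assumes the strings appear in the log as quoted above (matched case-insensitively), the path passed to count_copies_in_file() is whatever your frame analysis log happens to be called, and the sample lines are made up.

```python
from collections import Counter

# The strings quoted earlier in this post, mapped to a short label.
MARKERS = {
    "performing full copy": "full",
    "performing region copy": "region",
    "copying by reference": "reference",
    "max_copies_per_frame exceeded": "limited",
}


def count_copies(lines):
    """Count how many lines mention each kind of copy."""
    counts = Counter({label: 0 for label in MARKERS.values()})
    for line in lines:
        lowered = line.lower()
        for needle, label in MARKERS.items():
            if needle in lowered:
                counts[label] += 1
    return counts


def count_copies_in_file(path):
    """Same tally, reading the log from disk."""
    # errors="replace" so an unexpected byte does not abort the scan.
    with open(path, encoding="utf-8", errors="replace") as log:
        return count_copies(log)


# Made-up sample lines; a real log will look different around the markers.
sample = [
    "000123 ... performing full copy",
    "000124 ... copying by reference",
    "000125 ... performing full copy",
    "000126 ... max_copies_per_frame exceeded",
]
print(dict(count_copies(sample)))
```

A high "full" or "region" count for a single frame is the thing to look for. After switching to an intermediate resource or max_copies_per_frame, most of those should move to "reference" or "limited".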
Bugger :(
The fact that hunting=0 fixes it suggests this might be something other than the index buffer hash (the cost of performing the hash is paid regardless of hunting, and the additional performance cost that was showing up in our profiling should now only be paid when hunting=1, but not when hunting=2, and should be lower than 1.2.65 regardless). There's a few new things in this release related to ShaderRegex that could have cost performance, but nothing really makes sense regarding your observation on hunting modes... I might just have to grab this game for the shared accounts and do some profiling.
Edit: I'm thinking this might be an interaction with the hash contamination detection...
Ok, so we still have more work to do.
In that case, just a heads up to anyone thinking about using index buffer filtering - we might change how you have to define [TextureOverride] sections for these to indicate what kind of buffer you are using for the next release, so you may wish to hold off using this, or be prepared to make some minor changes to d3dx.ini for 1.2.67. I'll know more after examining this game in closer detail.
Edit: Nope, this doesn't show up in profiling so leaving it as is.
Crap, you know what? I confused the vertex buffer with an index buffer, it's the vertex buffer that was massive and caused major slowdown when copied over into a texture slot. Not sure if that makes any difference to your comments. Either which way, I probably won't bother messing around with it more for the current game if it's not a simple matter of swapping out the .dll since I'm happy enough with the implementation I was able to achieve.