A short update on my experiences with toggling shaders, and on the problem that certain code inside if_eq statements causes helixmod to ignore the override. I added the toggle mechanism to other shaders where not just one line (texld r2, r0, s..) but much more code had to go inside the if_eq statements. Once again helixmod ignored the override shaders, so I had to filter out the other code that caused issues. In some cases the presence of texldp inside the if_eq statements was the reason.
To keep texldp out of the if_eq statements I used a workaround. So instead of
[code]if_eq r30.x, c3.x
texldp r2, r2, s5
else
texldp r1, r1, s5
endif[/code]
I used:
[code]if_eq r30.x, c3.x
mov r9, r2
else
mov r9, r1
endif
texldp r9, r9, s5
if_eq r30.x, c3.x
mov r2, r9
else
mov r1, r9
endif [/code]
As texldp is outside if_eq this code was accepted and toggling finally worked.
Is there another way to avoid this issue similar to replacing texld with texldl?
I'm looking forward to lesson 6 but I wanted to finish my Sims 4 (hot)fix first ;)
My original display name is 3d4dd - for some reason Nvidia changed it..?!
@3d4dd: That is some wacky stuff there. This doesn't surprise me too much, there are these sorts of bizarre restrictions at the ASM level, as they come from the underlying hardware. In this case, this is a multiple sample of a texture, like a 2x2 sample, and thus flow control can give wrong results, so they just make it an illegal operation instead.
Do you get any errors in the Assemble? I'd really expect the assembler to enforce these restrictions, but Microsoft's assembler seems pretty weak because they put all the emphasis on the HLSL compiler.
Here is the documentation for that specific restriction. I haven't heard of this exact one before, but you used the right technique to narrow it down to this restriction, without even knowing about it.
[url]http://msdn.microsoft.com/en-us/library/windows/desktop/bb219848(v=vs.85).aspx[/url]
Now curiously, the documentation suggests that those instructions should still work, because of the use of the temporary registers. Not the first time the docs don't agree with the runtime of course.
I don't think there is a replacement for texldp in this case, it's fairly special purpose.
Your code seems fine to use. The only other option I can see would be to do both texldp variants in the mainline code, and then use the if to decide which result to keep. Your way might be more efficient, but with more instructions (hard to tell how expensive texldp is). I'd keep your way.
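For reference, here is a rough sketch of that other option: both texldp samples sit in the mainline code and only the choice of result goes inside the if. It assumes r9 and r10 are free temporary registers in this shader (picked purely for illustration), and it has not been tested against helixmod:
[code]// r9 and r10 are assumed to be unused temporary registers in this shader
texldp r9, r2, s5     // sample for the first case, outside any flow control
texldp r10, r1, s5    // sample for the second case, also outside
if_eq r30.x, c3.x
mov r2, r9            // keep the first result
else
mov r1, r10           // keep the second result
endif[/code]
This always pays for two texture samples, which is one more reason to stay with the version that uses the extra mov instructions.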
Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607
Latest 3Dmigoto Release
Bo3b's School for ShaderHackers
Hey bo3b,
I'm making my way through lesson 3, and in the video you've got the constants set to 250. But on the page you tell people to copy from, they're set to 220. If I've understood things correctly so far, this shouldn't really make a difference. But would it be better to change it to 250 on the page for consistency?
http://wiki.bo3b.net/index.php?title=Default_DX9Settings.ini
I see you've got it set to 220 in step 6 of the lesson, too. I'm going to proceed according to the video (changing them all to 250), but if I run into trouble I'll change it to 220.
EDIT: Aha. I'm further through now, and it all makes sense.
I've got a question. You use r30, but I don't see you define it earlier. Does that mean it already existed, or does the mov command call it into existence? Also, does the letter have to be r? Could we just as easily have made it f90, for example?
EDIT: Rewatching part of it. So constants have to be c, and r is a register? What's the difference between a constant and a register?
[quote="Pirateguybrush"]I've got a question. You use r30, but I don't see you define it earlier. Does that mean it already existed, or does the mov command call it into existence? Also, does the letter have to be r? Could we just as easily have made it f90, for example?
EDIT: Rewatching part of it. So constants have to be c, and r is a register? What's the difference between a constant and a register?[/quote]For the regular registers like r30, we don't have to define them to use them. They are already available as temporary registers in every shader. We can only use from r0 to r31, and of course we try to avoid conflicts with things already in use. These have to start with the 'r' to signify they are temporary registers. (Unlike in high level languages, we cannot define our own variables.)
Constants are defined starting with 'c' instead, and the primary difference is that a constant register cannot be changed. So for example a "mov c250, r0" will not work, because c250 cannot change. The constants can be sent in from the calling program like you see in the header information, and also like the constants passed in by Helix, so those also are not explicitly declared. Anything that is only used locally like our c200 needs to be declared before it can be used.
You use constants just like registers, but with some restrictions. So for example doing a "mul r0, r0, c200" is legal, and works just like "mul r0, r0, r30" would, just using the constant as a parameter instead of a register. They sort of act like registers.
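As a small sketch of those rules, assuming c200 is not already used by the game and that r0 and r30 are free in the shader being edited (the values in the def are simply the ones used elsewhere in this thread, nothing here depends on them):
[code]def c200, 1.0, 600, 0.0625, 0   // local constant, declared before first use
...
mov r30, c200           // legal: copy the constant into temporary register r30
mul r0, r0, c200        // legal: constant used as a source operand
// mul r0, r0, r30      // equivalent alternative, since r30 holds a copy of c200
// mov c200, r0         // not legal: a constant cannot be the destination[/code]
The last two lines are left commented out on purpose: one is just an alternative to the mul above it, and the other is the case that will not work.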
In general, the rules for ASM are a bit inconsistent, so we just have to follow their restrictions.
Here is a link for some extra info about the registers and constants:
[url]http://msdn.microsoft.com/en-us/library/windows/desktop/bb172920(v=vs.85).aspx[/url]
I think for the constants being passed in, I was planning on using c250 to pass in from DX9Settings.ini, but ran into a conflict in this game, so had to move to c220. The videos tend to not necessarily match the text because it's hard to edit. In general I am trying to make the text on the wiki the definitive version for copying.
Thanks bo3b, I think I understand most of that. Your memory of why you changed it is correct, and the video explains that. I just hadn't made it all the way through before I asked my question. It was actually quite helpful to see a case where it didn't quite work though, and have the reason behind it explained.
You've said c250 can't change, but it can if it's toggled with a keypress as defined in dx9settings though, correct?
[quote="Pirateguybrush"]Thanks bo3b, I think I understand most of that. Your memory of why you changed it is correct, and the video explains that. I just hadn't made it all the way through before I asked my question. It was actually quite helpful to see a case where it didn't quite work though, and have the reason behind it explained.
You've said c250 can't change, but it can if it's toggled with a keypress as defined in dx9settings though, correct?[/quote]Good deal. Yeah, I was torn on leaving those mistakes in the video, but figured that the mistakes plus corrections are probably nearly as valuable as the always working case, because it's not always going to go to plan.
For the constants, that's a good point- they are constant, but only in the context of the shader itself. They do in fact change if the DX9Setting specifies that they would change. Just like all the other constant inputs to the shader in the header. They can change between invocations of the shader, but cannot be changed while the shader code is running.
This is one of those things that is invisible to me as a computer guy because I'm so accustomed to it. In formal terms it's called 'scope'. In the shader scope, they are constant, but in the game as a whole they can change.
[quote="bo3b"][quote="Pirateguybrush"]I've got a question. You use r30, but I don't see you define it earlier. Does that mean it already existed, or does the mov command call it into existence? Also, does the letter have to be r? Could we just as easily have made it f90, for example?
EDIT: Rewatching part of it. So constants have to be c, and r is a register? What's the difference between a constant and a register?[/quote]For the regular registers like r30, we don't have to define them to use them. They are already available as temporary registers in every shader. We can only use from r0 to r31, and of course we try to avoid conflicts with things already in use. These have to start with the 'r' to signify they are temporary registers. (Unlike in high level languages, we cannot define our own variables.)
Constants are defined starting with 'c' instead, and the primary difference is that a constant register cannot be changed. So for example a "mov c250, r0" will not work, because c250 cannot change. The constants can be sent in from the calling program like you see in the header information, and also like the constants passed in by Helix, so those also are not explicitly declared. Anything that is only used locally like our c200 needs to be declared before it can be used.
You use constants just like registers, but with some restrictions. So for example doing a "mul r0, r0, c200" is legal, and works just like "mul r0, r0, r30" would, just using the constant as a parameter instead of a register. They sort of act like registers.
In general, the rules for ASM are a bit inconsistent, so we just have to follow their restrictions.
Here is a link for some extra info about the registers and constants:
[url]http://msdn.microsoft.com/en-us/library/windows/desktop/bb172920(v=vs.85).aspx[/url]
I think for the constants being passed in, I was planning on using c250 to pass in from DX9Settings.ini, but ran into a conflict in this game, so had to move to c220. The videos tend to not necessarily match the text because it's hard to edit. In general I am trying to make the text on the wiki the definitive version for copying.[/quote]
Something else to remember with constants is that on any given instruction line you can only use one of them, though you can access different components of the same constant. So:
1. This IS allowed "mad r0, r1, c1.x, c1.y" because only c1 is being used
2. This is NOT allowed "mad r0, r1, c1.x, c2.x" because it is trying to use c1 and c2 on the same line.
You have to do something like this instead:
mov r11.x, c2.x
mad r0, r1, c1.x, r11.x
You may have covered that already, but wanted to point it out just in case.
I don't know what mad does yet, and I wouldn't say I entirely understand mov (though I think I'm starting to get it).
mov r11.x, c2.x will replace the x value of r11 with the x value from c2, yes?
[quote="Pirateguybrush"]mov r11.x, c2.x will replace the x value of r11 with the x value from c2, yes?[/quote]
Correct. Also, in Lesson 2, bo3b brought up another example:
[code]
def c1, 0.5, 1, -0.5, 0
...
mov oC0.xyzw, c1.wwww [/code]
c1.w is moved into oC0.x, oC0.y, oC0.z and oC0.w
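A few more variations on the same idea, as a sketch. r0 is assumed to be a free temporary register, and the comments show the values each mov would leave behind for that same def:
[code]def c1, 0.5, 1, -0.5, 0
...
mov r0, c1         // r0 = (0.5, 1, -0.5, 0): no swizzle, straight copy
mov r0, c1.x       // r0 = (0.5, 0.5, 0.5, 0.5): a single component is replicated
mov r0, c1.wzyx    // r0 = (0, -0.5, 1, 0.5): components in reverse order
mov r0.x, c1.z     // only r0.x changes, to -0.5; r0.yzw keep their old values[/code]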
[quote="Pirateguybrush"]I don't know what mad does yet, and I wouldn't say I entirely understand mov (though I think I'm starting to get it).
mov r11.x, c2.x will replace the x value of r11 with the x value from c2, yes?[/quote]
The main ones you will come across are as follows, though I am simplifying for now something called "swizzling", where you can map components (x,y,z,w) from one register to different components in another register:
1. mov dest, src: copies the contents from src to dest, on a component by component (xyzw) basis
2. add dest, src1, src2: adds src1 and src2 on a componentwise basis and puts it in dest
3. mul dest, src1, src2: multiplies src1 and src2 on a componentwise basis and puts it in dest
4. rcp dest.n, src.m: Only works on one component, takes the reciprocal (1/x) of src.m (m=x,y,z or w) and copies it into dest.n (and n can be different to m). This is used to do division, by multiplying with the reciprocal, since SM3 does not have "div".
5. mad dest, src1, src2, src3: This is "multiply and add". Multiplies src1 and src2 in the same way as "mul", then adds src3 to this result, and copies the result into dest
6. dp4 dest.n, src1, src2: This is a "dot product". There is also dp3 (and, in pixel shaders, dp2add). Note the result is a single number copied to dest.n:
dest.n = (src1.x * src2.x) + (src1.y * src2.y) + (src1.z * src2.z) + (src1.w * src2.w)
If you get on top of these, all the rest are more of the same.
One last comment is that when using mov, add, mul and mad you really want to have the same number of components in all registers.
e.g. mul r1.xy, r2.xy, r3.xy
e.g. add r1.xy, r2.xy, r3.xx
You can get away with this:
e.g. add r1.xy, r2.xy, r3 - and the compiler will work out what bits of r3 to access based on r1.xy
But you cannot have (or at least it will be left undefined):
e.g. mul r1.xyz, r2.xy, r3.xy
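To make the six instructions listed above more concrete, here is a worked sketch with the results written out in the comments. The constant values are made up to keep the arithmetic easy, and c200 and r1 to r6 are assumed to be unused in whatever shader this gets pasted into:
[code]def c200, 2, 4, 0.5, 1    // hypothetical values, chosen only for easy arithmetic
...
mov r1, c200              // r1 = (2, 4, 0.5, 1)
add r2, r1, r1            // r2 = (4, 8, 1, 2)
mul r3, r1, r2            // r3 = (8, 32, 0.5, 2)
rcp r4.x, r1.y            // r4.x = 1/4 = 0.25
mul r4.y, r1.x, r4.x      // r4.y = 2 * 0.25 = 0.5, i.e. r1.x divided by r1.y
mad r5, r1, r1, r2        // r5 = r1*r1 + r2 = (8, 24, 1.25, 3)
dp4 r6.x, r1, r2          // r6.x = 2*4 + 4*8 + 0.5*1 + 1*2 = 42.5[/code]
The rcp followed by mul is the usual way to get a division out of SM3.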
[quote="InsaneInGame"]But what did happen was that when I disabled that shader, and looked around, aimed at a different surface, the whole game crashed! Very strange. I'm very new at this, so I don't know anything about the script you're using. Would you mind telling me how it works? Sounds interesting. [/quote]
I don't have a script.. I was just advocating for one =P. But what I did in regards to this was:
-Saved all 80+ vertex shaders one by one.
-I exited the game & used Notepad++ to sort the shaders (filter out the shaders I don't need).
-Edited the rest of the files with the aid of AutoHotKey (the actual correction is covered in Lesson 6).
-Put the edited files in the shader override directory.
-In game, I noticed that the water was fixed so I commented out the output positions of a group of shaders at a time until I narrowed down the one shader I was looking for.
There are probably much better ways to go about doing this. Actually, I have to go back & redo Lesson 5 since it says I have to find a PS, instead of a VS.
@PirateGuyBrush: I wouldn't worry too much about the details of the swizzling and even the instructions. In general, those get into more advanced fixes, where you might need to shuffle stuff around in registers, or tweak values before they are used or something.
For the first fixes, we'll be doing pretty much copy/paste type operations, where we just use a code snippet. It's helpful to know what some of those instructions do to find the right spot to paste the code snippet. But if it seems a little fuzzy that's not going to limit you.
I'll take an example. You've seen haloing around characters in some games I'm sure, where everything seems OK, but there is a halo around the main character, sometimes near water.
This type of problem happens because the NVidia driver automatically fixes the location of the underlying model, the wireframes, but doesn't automatically fix the textures being applied.
In code, we'll sometimes see stuff like:
[code] ...
dcl_texcoord7 o5
dcl_position o6
...
mov o5, r5
mov o6, r5
[/code]
That is clearly writing the same value (r5) to both outputs. One is for the position, one is for a texture coordinate. The NVidia driver only fixes the position.
So to fix it, we'll add our magic code, the prime directive, which you'll get in Lesson 6. So the end fixed code would be:
[code]...
def c200, 1.0, 600, 0.0625, 0 // new
dcl_2d s0 // new
dcl_texcoord7 o5
dcl_position o6
...
// mov o5, r5 commented out
mov o6, r5
texldl r30, c200.z, s0
add r30.w, r5.w, -r30.y
mad r5.x, r30.x, r30.w, r5.x
mov o5, r5 // fixed
[/code]
Maybe that example makes it more clear. You need to understand and recognize the pattern of mov to two different outputs, and then paste in the magic code to fix the texture one.
Early on, and certainly up through intermediate type fixes, you only really need to be able to paste in the magic code sequence in the proper spot.
@PirateGuyBrush: I wouldn't worry too much about the details of the swizzling and even the instructions. In general, those get into more advanced fixes, where you might need to shuffle stuff around in registers, or tweak values before they are used or something.
For the first fixes, we'll be doing pretty much copy/paste type operations, where we just use a code snippet. It's helpful to know what some of those instructions do to find the right spot to paste the code snippet. But if it seems a little fuzzy that's not going to limit you.
I'll take an example. You've seen haloing around characters in some games I'm sure, where everything seems OK, but there is a halo around the main character, sometimes near water.
This type of problem happens because the NVidia driver automatically fixes the location of the underlying model, the wireframes, but doesn't automatically fix the textures being applied.
Maybe that example makes it more clear. You need to understand and recognize the pattern of mov to two different outputs, and then paste in the magic code to fix the texture one.
Early on, and certainly up through intermediate type fixes, you only really need to be able to paste in the magic code sequence in the proper spot.
Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607 Latest 3Dmigoto Release Bo3b's School for ShaderHackers
To keep texldp out of the if_eq statements I used a workaround. So instead of
I used:
As texldp is outside if_eq this code was accepted and toggling finally worked.
Is there another way to avoid this issue similar to replacing texld with texldl?
I'm looking forward to lesson 6 but I wanted to finish my Sims 4 (hot)fix first ;)
My original display name is 3d4dd - for some reason Nvidia changed it..?!
Do you get any errors in the Assemble? I'd really expect the assembler to enforce these restrictions, but Microsoft's assembler seems pretty weak because they put all the emphasis on the HLSL compiler.
Here is the documentation for that specific restriction. I haven't heard of this exact one before, but you used the right technique to narrow it down to this restriction, without even knowing about it.
http://msdn.microsoft.com/en-us/library/windows/desktop/bb219848(v=vs.85).aspx
Now curiously, the documentation suggests that those instructions should still work, because of the use of the temporary registers. Not the first time the docs don't agree with the runtime of course.
I don't think there is a replacement for texldp in this case, it's fairly special purpose.
Your code seems fine to use. The only other option I can see would be to do both texldp variants in the mainline code, and then use the if to decide which result to keep. Your way might be more efficient, but with more instructions (hard to tell how expensive texldp is). I'd keep your way.
Acer H5360 (1280x720@120Hz) - ASUS VG248QE with GSync mod - 3D Vision 1&2 - Driver 372.54
GTX 970 - i5-4670K@4.2GHz - 12GB RAM - Win7x64+evilKB2670838 - 4 Disk X25 RAID
SAGER NP9870-S - GTX 980 - i7-6700K - Win10 Pro 1607
Latest 3Dmigoto Release
Bo3b's School for ShaderHackers
I'm making my way through lesson 3, and in the video you've got the constants set to 250. But on the page you tell people to copy from, they're set to 220. If I've understood things correctly so far, this shouldn't really make a difference. But would it be better to change it to 250 on the page for consistency?
http://wiki.bo3b.net/index.php?title=Default_DX9Settings.ini
I see you've got it set to 220 in step 6 of the lesson, too. I'm going to proceed according to the video (changing them all to 250), but if I run into trouble I'll change it to 220.
EDIT: Aha. I'm further through now, and it all makes sense.
EDIT: Rewatching part of it. So constants have to be c, and r is a register? What's the difference between a constant and a register?
Constants are named starting with 'c' instead of 'r', and the primary difference is that a constant register cannot be changed by the shader. So for example a "mov c250, r0" will not work, because c250 cannot change. Constants can be sent in from the calling program, like the ones you see in the header information, and also like the constants passed in by Helix, so those are not explicitly declared. Anything that is only used locally, like our c200, needs to be declared before it can be used.
You use constants just like registers, but with some restrictions. So for example doing a "mul r0, r0, c200" is legal, and works just like "mul r0, r0, r30" would, just using the constant as a parameter instead of a register. They sort of act like registers.
In general, the rules for ASM are a bit inconsistent, so we just have to follow their restrictions.
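A small sketch of those rules, in case it helps. The values in the def line are made up for illustration, not the ones from the lesson:
[code]// local constant: has to be declared with def before use (values are hypothetical)
def c200, 1.0, 0.5, 0.0, 0.0
// legal: a constant used as a source parameter, just like a register
mul r0, r0, c200
// legal: same thing with a register as the parameter
mul r0, r0, r30
// NOT legal: a constant can never be the destination
// mov c250, r0[/code]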
Here is a link for some extra info about the registers and constants:
http://msdn.microsoft.com/en-us/library/windows/desktop/bb172920(v=vs.85).aspx
For the constants being passed in, I was planning on using c250 to pass in from DX9Settings.ini, but ran into a conflict in this game, so had to move to c220. The videos don't always match the text, because video is hard to edit after the fact. In general I am trying to make the text on the wiki the definitive version for copying.
You've said c250 can't change, but it can if it's toggled with a keypress as defined in dx9settings though, correct?
For the constants, that's a good point: they are constant, but only in the context of the shader itself. They do in fact change if DX9Settings.ini specifies that they should change, just like all the other constant inputs to the shader in the header. They can change between invocations of the shader, but cannot be changed while the shader code is running.
This is one of those things that is invisible to me as a computer guy because I'm so accustomed to it. In formal terms it's called 'scope'. In the shader scope, they are constant, but in the game as a whole they can change.
Something else to remember with constants is that on any given instruction line you can only use one of them, though you can access different components of the same constant. So:
1. This IS allowed "mad r0, r1, c1.x, c1.y" because only c1 is being used
2. This is NOT allowed "mad r0, r1, c1.x, c2.x" because it is trying to use c1 and c2 on the same line.
You have to do something like this instead:
[code]mov r11.x, c2.x
mad r0, r1, c1.x, r11.x[/code]
You may have covered that already, but wanted to point it out just in case.
Rig: Intel i7-8700K @4.7GHz, 16Gb Ram, SSD, GTX 1080Ti, Win10x64, Asus VG278
mov r11.x, c2.x will replace the x value of r11 with the x value from c2, yes?
Correct. Also, in Lesson 2, bo3b brought up another example:
c1.w is moved into oC0.x, oC0.y, oC0.z and oC0.w
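From memory, that example is a single mov where the source names only one component, so treat the exact line as my reconstruction and not a quote from the lesson:
[code]// the single .w on the source is replicated to every component of the destination
mov oC0, c1.w
// which does the same as writing it out in full
mov oC0.xyzw, c1.wwww[/code]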
Dual boot Win 7 x64 & Win 10 (1809) | Geforce Drivers 417.35
The main instructions you will come across are as follows, though for now I am glossing over something called "swizzling", where you can map components (x,y,z,w) from one register to different components in another register:
1. mov dest, src: copies the contents from src to dest, on a component by component (xyzw) basis
2. add dest, src1, src2: adds src1 and src2 on a componentwise basis and puts it in dest
3. mul dest, src1, src2: multiplies src1 and src2 on a componentwise basis and puts it in dest
4. rcp dest.n, src.m: Only works on one component. It takes the reciprocal (1/x) of src.m (m = x, y, z or w) and copies it into dest.n (n can be different from m). This is used to do division, since SM3 does not have "div".
5. mad dest, src1, src2, src3: This is "multiply and add". Multiplies src1 and src2 in the same way as "mul", then adds src3 to this result, and copies the result into dest
6. dp4 dest.n, src1, src2: This is a "dot product". There is also dp3 (three components), and dp2add in pixel shaders. Note the result is a single number copied to dest.n:
dest.n = (src1.x * src2.x) + (src1.y * src2.y) + (src1.z * src2.z) + (src1.w * src2.w)
If you get on top of these, all the rest are more of the same.
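As a quick sketch of the "rcp for division" point, here is how r0.x / r0.w could be computed. The register numbers are arbitrary, just for illustration:
[code]// r1.w = 1 / r0.w
rcp r1.w, r0.w
// r1.x = r0.x * (1 / r0.w), i.e. r0.x divided by r0.w
mul r1.x, r0.x, r1.w[/code]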
One last comment is that when using mov, add, mul and mad you really want to have the same number of components in all registers.
e.g. mul r1.xy, r2.xy, r3.xy
e.g. add r1.xy, r2.xy, r3.xx
You can get away with this:
e.g. add r1.xy, r2.xy, r3 - and the compiler will work out what bits of r3 to access based on r1.xy
But you cannot have (or at least it will be left undefined):
e.g. mul r1.xyz, r2.xy, r3.xy
I don't have a script.. I was just advocating for one =P. But what I did in regards to this was:
-Saved all 80+ vertex shaders one by one.
-I exited the game & used Notepad++ to sort the shaders (filter out the shaders I don't need).
-Edited the rest of the files with the aid of AutoHotKey (the actual correction is covered in Lesson 6).
-Put the edited files in the shader override directory.
-In game, I noticed that the water was fixed so I commented out the output positions of a group of shaders at a time until I narrowed down the one shader I was looking for.
There are probably much better ways to go about doing this. Actually, I have to go back & redo Lesson 5 since it says I have to find a PS, instead of a VS.
For the first fixes, we'll be doing pretty much copy/paste type operations, where we just use a code snippet. It's helpful to know what some of those instructions do to find the right spot to paste the code snippet. But if it seems a little fuzzy that's not going to limit you.
I'll take an example. You've seen haloing around characters in some games I'm sure, where everything seems OK, but there is a halo around the main character, sometimes near water.
This type of problem happens because the NVidia driver automatically fixes the location of the underlying model, the wireframes, but doesn't automatically fix the textures being applied.
In code, we'll sometimes see stuff like:
That is clearly writing the same value to both outputs. One is for the position, one is for a texture. The NVidia driver only fixes the position.
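A hypothetical instance of that pattern in a vertex shader might look like the following. The register numbers are only illustrative, and which outputs are position and texcoord depends on the dcl lines in the shader you are actually looking at:
[code]// assumed declarations near the top of the shader
dcl_position o0
dcl_texcoord o1
// the same value goes to the position output...
mov o0, r0
// ...and to a texcoord output, which the driver does not correct
mov o1, r0[/code]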
So to fix it, we'll add our magic code, the prime directive, which you'll get in Lesson 6. So the end fixed code would be:
Maybe that example makes it more clear. You need to understand and recognize the pattern of mov to two different outputs, and then paste in the magic code to fix the texture one.
Early on, and certainly up through intermediate type fixes, you only really need to be able to paste in the magic code sequence in the proper spot.