What is the purpose of the “PAUSE” instruction in x86?

I am trying to create a dumb version of a spin lock. While browsing the web, I came across an x86 assembly instruction called “PAUSE”, which is used to give the processor a hint that a spin-lock loop is currently running on this CPU. The Intel manual and other available information state that:

The processor uses this hint to avoid the memory order violation in
most situations, which greatly improves processor performance. For
this reason, it is recommended that a PAUSE instruction be placed in
all spin-wait loops. The documentation also mentions that “wait (some
delay)” is the pseudo implementation of the instruction.

The last line of the previous paragraph is intuitive: if I fail to grab the lock, I have to wait for a while before trying to grab it again.

But what does “memory order violation” mean in the case of a spin lock?
Does “memory order violation” mean the incorrect speculative load/store of the instructions after the spin lock?

The spin-lock question has been asked on Stack Overflow before, but the memory order violation question remains unanswered (at least to my understanding).

Imagine how the processor would execute a typical spin-wait loop:

1 Spin_Lock:
2 CMP lockvar, 0; Check if lock is free
3 JE Get_Lock
4 JMP Spin_Lock
5 Get_Lock:

After a few iterations, the branch predictor will predict that the conditional branch (3) will never be taken, and the pipeline will fill with CMP instructions (2). This goes on until another processor finally writes a zero to lockvar. At this point, the pipeline is full of speculative (i.e. not yet committed) CMP instructions, some of which have already read lockvar and reported an (incorrect) non-zero result to the following conditional branch (3), which is also speculative. This is when the memory order violation happens. Whenever the processor “sees” an external write (a write from another processor), it searches its pipeline for instructions that speculatively accessed the same memory location and have not yet committed. If any such instructions are found, the speculative state of the processor is invalid and is erased with a pipeline flush.

Unfortunately, this scenario will (very likely) repeat every time a processor waits on a spin lock, and it makes these locks much slower than they ought to be.

Enter the PAUSE instruction:

1 Spin_Lock:
2 CMP lockvar, 0; Check if lock is free
3 JE Get_Lock
4 PAUSE; Wait for memory pipeline to become empty
5 JMP Spin_Lock
6 Get_Lock:

The PAUSE instruction will “de-pipeline” the memory reads, so that the pipeline is not filled with speculative CMP (2) instructions as in the first example. (That is, it could block the pipeline until all older memory instructions have committed.) Because the CMP instructions (2) now execute sequentially, it is unlikely (i.e. the time window is much shorter) that an external write occurs after a CMP instruction (2) has read lockvar but before that CMP has committed.

Of course, “de-pipelining” will also waste less energy in the spin lock and, in the case of hyperthreading, will not waste resources that the other thread could put to better use. On the other hand, there is still a branch misprediction waiting to occur before each loop exit. Intel’s documentation does not suggest that PAUSE eliminates that pipeline flush, but who knows…
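For what it’s worth, here is roughly how the PAUSE loop above might look in C. Treat it as a sketch under a few assumptions rather than a reference implementation: `demo_spinlock`, `cpu_relax`, `spin_init`, `spin_trylock`, `spin_lock` and `spin_unlock` are names I made up for illustration, and the compare-and-swap that actually takes the lock is my addition (the assembly above only shows the waiting part and jumps to Get_Lock). On x86 with GCC or Clang, `_mm_pause()` from `<immintrin.h>` emits PAUSE; on other targets the sketch falls back to a no-op.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>                 /* _mm_pause() */
#define cpu_relax() _mm_pause()        /* emits the PAUSE instruction */
#else
#define cpu_relax() ((void)0)          /* assumption: plain no-op elsewhere */
#endif

/* Hypothetical lock type; 0 = free, non-zero = held, like lockvar above. */
struct demo_spinlock {
    atomic_int lockvar;
};

static inline void spin_init(struct demo_spinlock *l)
{
    atomic_init(&l->lockvar, 0);
}

/* One attempt to take the lock (the "Get_Lock" part); true on success. */
static inline bool spin_trylock(struct demo_spinlock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->lockvar, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

static inline void spin_lock(struct demo_spinlock *l)
{
    while (!spin_trylock(l)) {
        /* Read-only spin, the counterpart of CMP / JE / PAUSE / JMP:
         * keep checking lockvar, with a PAUSE between the reads. */
        while (atomic_load_explicit(&l->lockvar, memory_order_relaxed) != 0)
            cpu_relax();
    }
}

static inline void spin_unlock(struct demo_spinlock *l)
{
    atomic_store_explicit(&l->lockvar, 0, memory_order_release);
}
```

The inner loop only reads lockvar, so a waiting CPU is not hammering the cache line with atomic writes; the (more expensive) compare-and-swap is retried only once the lock looks free.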
