I am porting 32-bit Delphi BASM code to 64-bit FPC (Win64 target OS), and I would like to know why the following instruction does not compile in 64-bit FPC:
{$IFDEF FPC}
{$ASMMODE INTEL}
{$ENDIF}
procedure DoesNotCompile;
asm
LEA ECX,[ECX + ESI + $265E5A51]
end;
// Error: Asm: 16 or 32 Bit references not supported
Possible workarounds are:
procedure Compiles1;
asm
ADD ECX,ESI
ADD ECX,$265E5A51
end;
procedure Compiles2;
asm
LEA ECX,[RCX + RSI + $265E5A51]
end;
I just don't understand what is wrong with the 32-bit LEA instruction on the Win64 target (it compiles fine in 32-bit Delphi, so it is a valid CPU instruction).
Optimization notes:
The following code was compiled with 64-bit FPC 2.6.2:
{$MODE DELPHI}
{$ASMMODE INTEL}
procedure Test;
asm
LEA ECX,[RCX + RSI + $265E5A51]
NOP
LEA RCX,[RCX + RSI + $265E5A51]
NOP
ADD ECX,$265E5A51
ADD ECX,ESI
NOP
end;
It generates the following assembler output:
00000000004013F0 4883ec08 sub $0x8,%rsp
project1.lpr:10 LEA ECX,[RCX + RSI + $265E5A51]
00000000004013F4 8d8c31515a5e26 lea 0x265e5a51(%rcx,%rsi,1),%ecx
project1.lpr:11 NOP
00000000004013FB 90 nop
project1.lpr:12 LEA RCX,[RCX + RSI + $265E5A51]
00000000004013FC 488d8c31515a5e26 lea 0x265e5a51(%rcx,%rsi,1),%rcx
project1.lpr:13 NOP
0000000000401404 90 nop
project1.lpr:14 ADD ECX,$265E5A51
0000000000401405 81c1515a5e26 add $0x265e5a51,%ecx
project1.lpr:15 ADD ECX,ESI
000000000040140B 01f1 add %esi,%ecx
project1.lpr:16 NOP
000000000040140D 90 nop
project1.lpr:17 end;
000000000040140E 4883c408 add $0x8,%rsp
The winner is (7 bytes long):
LEA ECX,[RCX + RSI + $265E5A51]
The other 3 alternatives (including LEA ECX,[ECX + ESI + $265E5A51], which 64-bit FPC refuses to compile) are 8 bytes long.
I am not sure that the winner is also the fastest.
I think this is a bug in the FPC assembler. The asm code you provided is valid: in 64-bit mode it is perfectly legal to use LEA with 32-bit registers as you have done. The Intel processor documentation is clear on this, and the Delphi 64-bit inline assembler accepts this code.
To work around the problem, you need to hand-assemble the instruction:
DQ $265e5a510e8c8d67
In the Delphi CPU view, it is displayed as:
Project1.dpr.12: DQ $265e5a510e8c8d67
0000000000424160 678D8C0E515A5E26 lea ecx,[esi+ecx+$265e5a51]
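To make the DQ constant above less opaque, here is a minimal sketch (the program name and the `Encoded` constant are illustrative, not part of the original code) that prints the bytes of the constant in the order they are laid out in memory on little-endian x86. It should print `67 8D 8C 0E 51 5A 5E 26`, matching the encoding shown in the CPU view: `67` is the address-size override prefix, `8D` is the LEA opcode, `8C` is the ModRM byte (ECX as destination, SIB byte plus 32-bit displacement), `0E` is the SIB byte (base ESI, index ECX, scale 1), and the last four bytes are the displacement $265E5A51 stored low byte first.

```pascal
program ShowEncoding;
{$IFDEF FPC}
{$MODE DELPHI}
{$ENDIF}
{$APPTYPE CONSOLE}
uses
  SysUtils;
const
  // The value passed to the DQ directive in the workaround above.
  Encoded: UInt64 = $265E5A510E8C8D67;
var
  I: Integer;
begin
  // Extract the bytes from least to most significant, which is the
  // order in which a little-endian CPU stores them in memory.
  for I := 0 to 7 do
    Write(IntToHex(Integer((Encoded shr (8 * I)) and $FF), 2), ' ');
  Writeln;
end.
```

Compare this with the 7-byte output FPC produced for the RCX/RSI form: it is the same instruction without the `67` prefix, which is why the 64-bit addressing form is one byte shorter.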
I ran a very simple benchmark to compare LEA with 32-bit address operands, LEA with 64-bit address operands, and the version that uses two ADD instructions. The code is as follows:
{$APPTYPE CONSOLE}
uses
System.Diagnostics;
function BenchWithTwoAdds: Integer;
asm
MOV EDX,ESI
XOR EAX,EAX
MOV ESI,$98C34
MOV ECX,$ffffffff
@loop:
ADD EAX,ESI
ADD EAX,$265E5A51
DEC ECX
CMP ECX,0
JNZ @loop
MOV ESI,EDX
end;
function BenchWith32bitOperands: Integer;
asm
MOV EDX,ESI
XOR EAX,EAX
MOV ESI,$98C34
MOV ECX,$ffffffff
@loop:
LEA EAX,[EAX + ESI + $265E5A51]
DEC ECX
CMP ECX,0
JNZ @loop
MOV ESI,EDX
end;
{$IFDEF CPUX64}
function BenchWith64bitOperands: Integer;
asm
MOV EDX,ESI
XOR EAX,EAX
MOV ESI,$98C34
MOV ECX,$ffffffff
@loop:
LEA EAX,[RAX + RSI + $265E5A51]
DEC ECX
CMP ECX,0
JNZ @loop
MOV ESI,EDX
end;
{$ENDIF}
var
Stopwatch: TStopwatch;
begin
{$IFDEF CPUX64}
Writeln('64 bit');
{$ELSE}
Writeln('32 bit');
{$ENDIF}
Writeln;
Writeln('BenchWithTwoAdds');
Stopwatch := TStopwatch.StartNew;
Writeln('Value =', BenchWithTwoAdds);
Writeln('Elapsed time =', Stopwatch.ElapsedMilliseconds);
Writeln;
Writeln('BenchWith32bitOperands');
Stopwatch := TStopwatch.StartNew;
Writeln('Value =', BenchWith32bitOperands);
Writeln('Elapsed time =', Stopwatch.ElapsedMilliseconds);
Writeln;
{$IFDEF CPUX64}
Writeln('BenchWith64bitOperands');
Stopwatch := TStopwatch.StartNew;
Writeln('Value =', BenchWith64bitOperands);
Writeln('Elapsed time =', Stopwatch.ElapsedMilliseconds);
{$ENDIF}
Readln;
end.
The output on an Intel i5-2300:
32 bit
BenchWithTwoAdds
Value = -644343429
Elapsed time = 2615
BenchWith32bitOperands
Value = -644343429
Elapsed time = 3915
----------------------
64 bit
BenchWithTwoAdds
Value = -644343429
Elapsed time = 2612
BenchWith32bitOperands
Value = -644343429
Elapsed time = 3917
BenchWith64bitOperands
Value = -644343429
Elapsed time = 3918
As you can see, there is nothing to choose between the two LEA variants on this basis: the difference between their times is well within the variability of the measurement. The variant using two ADDs, however, is clearly faster on this machine.
The results differ somewhat on other machines. This is the output on a Xeon E5530:
64 bit
BenchWithTwoAdds
Value = -644343429
Elapsed time = 3434
BenchWith32bitOperands
Value = -644343429
Elapsed time = 3295
BenchWith64bitOperands
Value = -644343429
Elapsed time = 3279
On a Xeon E5-4640 v2:
64 bit
BenchWithTwoAdds
Value = -644343429
Elapsed time = 4102
BenchWith32bitOperands
Value = -644343429
Elapsed time = 5868
BenchWith64bitOperands
Value = -644343429
Elapsed time = 5868
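Every run reports the same Value, which can be checked without running the loop. The following is only a sanity-check sketch (the program name and the `Step` constant are illustrative). Each iteration adds $98C34 + $265E5A51 to EAX, and the loop runs $FFFFFFFF = 2^32 - 1 times, so modulo 2^32 the final sum is simply the negated step.

```pascal
program ExpectedValue;
{$IFDEF FPC}
{$MODE DELPHI}
{$ENDIF}
{$APPTYPE CONSOLE}
const
  // Amount added to EAX per loop iteration in all three benchmarks.
  Step: Int64 = $98C34 + $265E5A51;
begin
  // (2^32 - 1) * Step is congruent to -Step modulo 2^32, and Step is
  // below 2^31, so -Step is already a valid signed 32-bit result.
  Writeln('Value = ', -Step);
end.
```

This prints `Value = -644343429`, the figure shown in the benchmark output, which confirms that the ADD and LEA variants compute the same 32-bit result and differ only in timing.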