How to calculate sine values elsewhere and then move them into XMM0 in assembly?

I was doing integration tasks with the FPU before, and now I am struggling with SSE.

My main problem is this: with the FPU stack there is the fsin instruction, which computes the sine of the number at the top of the stack (st0).

Now I want to calculate the sine of all four numbers in XMM0, or calculate them somewhere else and then move the results into XMM0. I am using AT&T syntax.

I think the second idea is actually possible, but I don’t know how 🙂

Does anyone know how to do it?

Three choices:

(1) Use an existing library to compute the sine of the SSE vector.
(2) Use SSE to write your own vector sine function.
(3) Store the vector to memory, use fsin to compute the sine of each element, and then load the results back. Assuming your stack is 16-byte aligned and has 16 bytes of space available, it looks like this:

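movaps %xmm0, (%rsp)        # spill the four floats to the stack
mov $3, %rcx                # start at the last element
0: flds (%rsp,%rcx,4)       # push element rcx onto the x87 stack
fsin                        # st0 = sin(st0)
fstps (%rsp,%rcx,4)         # store it back as a float and pop
sub $1, %rcx
jns 0b                      # loop while rcx >= 0
movaps (%rsp), %xmm0        # reload the four sines into xmm0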

(1) is almost certainly both the best-performing and the easiest option. If you have extensive experience writing vector code and know a priori that the arguments fall within a certain range, you may be able to get better performance with (2). Using fsin will work, but if performance or precision matters, it is ugly, slow, and not particularly accurate.
