How to calculate a sine value somewhere, then move it to XMM0 in assembly?

I was doing integration tasks with the FPU before, and now I am struggling with SSE.

My main problem is that on the FPU stack there is the fsin instruction, which operates on the number at the top of the stack (st0).

Now I want to calculate the sine of all four numbers in XMM0, or calculate it elsewhere and move the results into XMM0. I am using AT&T syntax.

I think the second idea is actually possible, but I don’t know how 🙂

Does anyone know how to do it?

Three choices:

(1) Use an existing library to calculate the sine of the SSE vector.
(2) Use SSE to write your own vector sine function.
(3) Store the vector to memory, use fsin to calculate the sine of each element, and then load the results. Assuming your stack is 16-byte aligned and has 16 bytes of free space at (%rsp), this looks as follows:

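movaps %xmm0, (%rsp)       # spill the four floats to the stack
mov $3, %rcx               # lane index, counting from 3 down to 0
0: flds (%rsp,%rcx,4)      # push one lane onto the x87 stack
fsin                       # st0 = sin(st0)
fstps (%rsp,%rcx,4)        # store the result back and pop
sub $1, %rcx
jns 0b                     # loop while the index is still >= 0
movaps (%rsp), %xmm0       # reload the four results into XMM0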

(1) is almost certainly the best-performing option and also the easiest. If you have extensive experience writing vector code and know a priori that the arguments fall within a certain range, you may be able to get better performance with (2). Using fsin will work, but it is ugly, slow, and not especially accurate, if that matters.
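
To give an idea of what option (2) involves, here is a minimal sketch in C with SSE intrinsics rather than hand-written assembly. Each intrinsic corresponds closely to one SSE instruction (mulps, addps). The function name sin4_small, the choice of a plain Taylor polynomial, and the restriction to |x| <= pi/2 are all my own assumptions for illustration; a real library would add range reduction and use better-tuned coefficients.

```c
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_mul_ps, _mm_add_ps, ... */

/* Hypothetical helper (not from any library): approximate sin(x) for four
 * packed floats at once. ASSUMES every lane satisfies |x| <= pi/2. There is
 * no range reduction, so larger inputs give increasingly wrong results. */
static __m128 sin4_small(__m128 x)
{
    /* Taylor coefficients: sin x = x - x^3/3! + x^5/5! - x^7/7! + x^9/9! */
    const __m128 c3 = _mm_set1_ps(-1.0f / 6.0f);
    const __m128 c5 = _mm_set1_ps( 1.0f / 120.0f);
    const __m128 c7 = _mm_set1_ps(-1.0f / 5040.0f);
    const __m128 c9 = _mm_set1_ps( 1.0f / 362880.0f);

    __m128 x2 = _mm_mul_ps(x, x);            /* mulps: x^2 in all four lanes */

    /* Horner's rule in x^2: p = c3 + x2*(c5 + x2*(c7 + x2*c9)) */
    __m128 p = _mm_mul_ps(c9, x2);
    p = _mm_add_ps(p, c7);                   /* addps */
    p = _mm_mul_ps(p, x2);
    p = _mm_add_ps(p, c5);
    p = _mm_mul_ps(p, x2);
    p = _mm_add_ps(p, c3);

    /* sin(x) ~= x + x * x^2 * p */
    p = _mm_mul_ps(p, x2);
    p = _mm_mul_ps(p, x);
    return _mm_add_ps(x, p);
}
```

With the Taylor terms up to x^9, the truncation error on this range stays in the low 1e-6 region, which may or may not be good enough for your integration. Under the System V x86-64 calling convention the __m128 argument arrives in %xmm0 and the result is returned in %xmm0, so compiling this with gcc -O2 -S should give you AT&T-syntax assembly to compare against your own code.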
