Parallel processing – Parallel speedup performance of Julia for large-scale computations

General context:

I developed a fairly large Navier-Stokes (finite difference) solver written in Fortran 90. It has an adaptive grid (hence a load-balancing problem), and I tried various techniques (MPI, OpenMP and hybrid OpenMP-MPI) to parallelize it. However, it does not scale well: according to an Amdahl's law calculation, only 96-97% of the run is parallel. In addition, the typical grid size is several hundred million points, and this will need to grow in the future.
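For reference, here is a minimal sketch of what a 96-97% parallel fraction implies under Amdahl's law. The function names and the core counts are illustrative assumptions, not taken from the solver. Amdahl's law gives the speedup on n cores as 1 / ((1 - p) + p / n) for a parallel fraction p, so the serial remainder alone caps the speedup at 1 / (1 - p).

```python
def amdahl_speedup(p, n):
    """Speedup predicted by Amdahl's law for parallel fraction p on n cores."""
    return 1.0 / ((1.0 - p) + p / n)


def amdahl_limit(p):
    """Upper bound on speedup as the core count goes to infinity."""
    return 1.0 / (1.0 - p)


if __name__ == "__main__":
    # Parallel fractions quoted in the post; core counts are arbitrary examples.
    for p in (0.96, 0.97):
        for n in (100, 1000, 10000):
            print(f"p={p:.2f} n={n:>5} speedup={amdahl_speedup(p, n):6.2f}")
        print(f"p={p:.2f} limit={amdahl_limit(p):.1f}")
```

With p between 0.96 and 0.97 the limit is roughly 25 to 33, so adding cores beyond a few hundred gains very little regardless of the language used.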

Question:

Now I am considering switching to Julia, because it has become very cumbersome to maintain the existing code and add more features to it.

The problem is that I can't find a good answer about Julia's parallel performance. I have searched the Internet and watched many YouTube videos. What I noticed is that most people say Julia is very suitable for parallel computing, and some even show a bar graph in which the elapsed time is reduced compared to the serial code. However, some of these answers and videos are quite old, and since this new language keeps evolving, that makes them somewhat unreliable.

So, I would like to know whether the language can scale to even a few thousand cores.

Additional information:

I am still trying to improve the speedup of the existing code to achieve near-linear scaling on several thousand cores. The solver needs to exchange 3-4 overlapping grid points between neighbouring subdomains at every time step, so it involves a large communication overhead. However, the non-adaptive grid version of the code scales easily to 20k cores.
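To make the overlap exchange concrete, here is a minimal serial sketch of a 1D halo (ghost cell) exchange. It assumes an overlap width of 3, a non-periodic domain, and a grid size divisible by the number of subdomains. The function names are hypothetical, and the list copies stand in for what would be MPI send/receive pairs in the actual solver.

```python
HALO = 3  # assumed overlap width; the post mentions 3-4 points


def split_with_halo(field, nparts, halo=HALO):
    """Split a 1D field into nparts chunks, each padded with halo ghost cells per side."""
    size = len(field) // nparts  # assumes len(field) is divisible by nparts
    parts = []
    for r in range(nparts):
        interior = list(field[r * size:(r + 1) * size])
        parts.append([0.0] * halo + interior + [0.0] * halo)
    return parts


def exchange_halos(parts, halo=HALO):
    """Fill each chunk's ghost cells from the interior edge cells of its neighbours."""
    for r in range(len(parts)):
        if r > 0:
            # Left ghost cells receive the last interior cells of the left neighbour.
            parts[r][:halo] = parts[r - 1][-2 * halo:-halo]
        if r < len(parts) - 1:
            # Right ghost cells receive the first interior cells of the right neighbour.
            parts[r][-halo:] = parts[r + 1][halo:2 * halo]
    return parts


if __name__ == "__main__":
    field = [float(i) for i in range(12)]
    parts = exchange_halos(split_with_halo(field, 3))
    for r, chunk in enumerate(parts):
        print(r, chunk)
```

Each subdomain moves a fixed number of cells per step whatever its interior size, so the ratio of communication to computation rises as the same grid is spread over more cores.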

I have also read in some places that Julia does not use the InfiniBand standard for data communication in parallel runs.

The following paper gives scaling results for a PDE-constrained parameter estimation problem, but nowhere near the number of cores you seem to be interested in: https://arxiv.org/abs/1606.07399. I have not seen any examples with thousands of cores.

Re InfiniBand: By default, Julia uses shared memory for intra-node communication and TCP/IP across nodes, so InfiniBand is not supported by default. However, the language allows custom transports, and I expect someone will add InfiniBand support at some point, but I could not find any implementation with a quick Google search.

