Why is Fortran slow in the Julia benchmark "rand_mat_mul"?

2022-01-23 00:00:00 numpy fortran julia blas

Problem description

The benchmark results on the Julia home page (http://julialang.org/) show that Fortran is about 4x slower than Julia/Numpy in the "rand_mat_mul" benchmark.

I cannot understand why Fortran is slower when all three call the same Fortran library (BLAS).

I also ran a simple matrix multiplication test involving Fortran, Julia and Numpy, and got similar results:

Julia

n = 1000; A = rand(n,n); B = rand(n,n);
@time C = A*B;

>> elapsed time: 0.069577896 seconds (7 MB allocated)

Numpy in IPython

from numpy import *
n = 1000; A = random.rand(n,n); B = random.rand(n,n);
%time C = dot(A,B);

>> Wall time: 98 ms

Fortran

PROGRAM TEST

IMPLICIT NONE
INTEGER, PARAMETER :: N = 1000
INTEGER :: I,J
REAL*8 :: T0,T1

REAL*8 :: A(N,N), B(N,N), C(N,N)

CALL RANDOM_SEED()
DO I = 1, N, 1
    DO J = 1, N, 1
        CALL RANDOM_NUMBER(A(I,J))
        CALL RANDOM_NUMBER(B(I,J))
    END DO
END DO

call cpu_time(t0)
CALL DGEMM ( "N", "N", N, N, N, 1.D0, A, N, B, N, 0.D0, C, N )
call cpu_time(t1)

write(unit=*, fmt="(a24,f10.3,a1)") "Time for Multiplication:",t1-t0,"s"

END PROGRAM TEST

gfortran test_blas.f90 libopenblas.dll -O3 & a.exe

>> Time for Multiplication: 0.296s

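Solution

I changed the timing function from cpu_time() to system_clock(). cpu_time() reports processor time, which with a multithreaded BLAS such as OpenBLAS can add up the time spent on every thread, whereas Julia's @time and IPython's %time report wall-clock time. With system_clock() the results are as follows (I ran the multiplication five times in one program):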

Time for Multiplication: 92ms

Time for Multiplication: 92ms

Time for Multiplication: 89ms

Time for Multiplication: 85ms

Time for Multiplication: 94ms

This is roughly the same as Numpy, but still about 20% slower than Julia.
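
The difference between a CPU-time reading and a wall-clock reading can also be checked from the Numpy side. The following is only a minimal sketch, not the code used for the numbers above: measure and demo_matmul are hypothetical helper names, and it assumes Numpy is installed and linked against a multithreaded BLAS. time.perf_counter() plays the role of system_clock(), and time.process_time() plays the role of cpu_time().

```python
import time


def measure(fn):
    """Run fn() once and return (wall_seconds, cpu_seconds)."""
    wall0 = time.perf_counter()   # wall-clock timer
    cpu0 = time.process_time()    # user + system CPU time of the whole process
    fn()
    cpu1 = time.process_time()
    wall1 = time.perf_counter()
    return wall1 - wall0, cpu1 - cpu0


def demo_matmul(n=1000):
    """Time one n-by-n matrix product both ways (hypothetical demo helper)."""
    import numpy as np  # assumption: Numpy is available and uses a BLAS backend

    A = np.random.rand(n, n)
    B = np.random.rand(n, n)
    wall, cpu = measure(lambda: np.dot(A, B))
    print(f"wall: {wall * 1e3:.1f} ms, cpu: {cpu * 1e3:.1f} ms")
```

Calling demo_matmul() prints both readings. If the BLAS runs on several threads, the CPU figure would be expected to exceed the wall-clock figure, which is the same effect that made the cpu_time() result look slow.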
