Why is Fortran slow in the Julia benchmark "rand_mat_mul"?
Problem description
Benchmark results on the Julia home page (http://julialang.org/) show that Fortran is about 4x slower than Julia/Numpy in the "rand_mat_mul" benchmark.
I don't understand why Fortran is slower when all three call the same Fortran library (BLAS).
I also ran a simple matrix multiplication test with Fortran, Julia, and Numpy, and got similar results:
Julia
n = 1000; A = rand(n,n); B = rand(n,n);
@time C = A*B;
>> elapsed time: 0.069577896 seconds (7 MB allocated)
Numpy in IPython
from numpy import *
n = 1000; A = random.rand(n,n); B = random.rand(n,n);
%time C = dot(A,B);
>> Wall time: 98 ms
Fortran
PROGRAM TEST
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 1000
  INTEGER :: I,J
  REAL*8 :: T0,T1
  REAL*8 :: A(N,N), B(N,N), C(N,N)
  CALL RANDOM_SEED()
  DO I = 1, N, 1
    DO J = 1, N, 1
      CALL RANDOM_NUMBER(A(I,J))
      CALL RANDOM_NUMBER(B(I,J))
    END DO
  END DO
  call cpu_time(t0)
  CALL DGEMM ( "N", "N", N, N, N, 1.D0, A, N, B, N, 0.D0, C, N )
  call cpu_time(t1)
  write(unit=*, fmt="(a24,f10.3,a1)") "Time for Multiplication:",t1-t0,"s"
END PROGRAM TEST
gfortran test_blas.f90 libopenblas.dll -O3 & a.exe
>> Time for Multiplication: 0.296s
Solution
I changed the timing function to system_clock(), and the results are as follows (five runs within one program):
Time for Multiplication: 92ms
Time for Multiplication: 92ms
Time for Multiplication: 89ms
Time for Multiplication: 85ms
Time for Multiplication: 94ms
This is roughly on par with Numpy, but still about 20% slower than Julia.
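A likely reason for the difference is that cpu_time reports CPU time summed over all threads of the process, whereas system_clock reports elapsed wall-clock time, and OpenBLAS normally runs DGEMM on several threads. The sketch below illustrates the same effect using only the Python standard library. It is an analogy under stated assumptions, not the Fortran fix itself: the names busy and measure are made up for this example, and hashing a large buffer stands in for the BLAS call because hashlib releases the GIL and can therefore occupy several cores at once.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def busy(_):
    # Stand-in workload (an assumption of this sketch, not the BLAS call):
    # hashlib releases the GIL for large buffers, so several of these
    # calls can run on different cores at the same time.
    data = b"x" * (16 * 1024 * 1024)
    return hashlib.sha256(data).hexdigest()


def measure(n_threads=4):
    """Return (wall_seconds, cpu_seconds) for n_threads parallel workloads."""
    wall0 = time.perf_counter()  # wall-clock time, comparable to system_clock
    cpu0 = time.process_time()   # process CPU time over all threads, comparable to cpu_time
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(busy, range(n_threads)))
    cpu = time.process_time() - cpu0
    wall = time.perf_counter() - wall0
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = measure()
    print(f"wall: {wall:.3f} s, cpu: {cpu:.3f} s")
```

On a multi-core machine the CPU figure is expected to come out larger than the wall figure, roughly in proportion to the number of threads that actually ran in parallel. The exact ratio depends on the hardware and on system load.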