Using Python multiprocessing with a different random seed for each process

2022-01-12 00:00:00 python multiprocessing



I wish to run several instances of a simulation in parallel, but with each simulation having its own independent data set.


import multiprocessing as mp

P = mp.Pool(ncpus)  # create a pool of ncpus worker processes
for j in range(nrun):  # submit one task per simulation run
    sim = MDF.Simulation(tstep, temp, time, writeout, boundaryxy, boundaryz, relax, insert, lat, savetemp)
    lattice = MDF.Lattice(tstep, temp, time, writeout, boundaryxy, boundaryz, relax, insert, lat, kb, ks, kbs, a, p, q, massL, randinit, initvel, parangle, scaletemp, savetemp)
    adatom1 = MDF.Adatom(tstep, temp, time, writeout, boundaryxy, boundaryz, relax, insert, lat, ra, massa, amorse, bmorse, r0, z0, name, lattice, samplerate, savetemp)
    # run simulation and ISF analysis in a worker; After is called with the result
    P.apply_async(run, (j, sim, lattice, adatom1), callback=After)
P.close()  # no more tasks will be submitted; must be called before join()
P.join()  # wait for all submitted tasks to finish

Here sim, adatom1 and lattice are objects passed to the function run, which starts the simulation.

However, I recently found out that the simulations in each batch that runs simultaneously (that is, each group of ncpus runs out of the total nrun simulation runs) give exactly the same results.


Can someone explain how to fix this?



Just thought I would add an actual answer to make it clear for others.

Quoting the answer from aix in this question:

What happens is that on Unix every worker process inherits the same state of the random number generator from the parent process. This is why they generate identical pseudo-random sequences.
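
To make the quoted explanation concrete, here is a small illustration of why a copied generator state leads to identical sequences. It does not fork anything; it only imitates the inherited state with getstate() and setstate() from the standard random module, and the names parent, worker_a and worker_b are made up for this sketch.

```python
import random

# Stand-in for the parent process's generator (the seed value is arbitrary).
parent = random.Random(2022)
parent.random()  # the parent has already drawn a number before the workers start

# Imitate two workers that each begin with a copy of the parent's state.
snapshot = parent.getstate()
worker_a = random.Random()
worker_b = random.Random()
worker_a.setstate(snapshot)
worker_b.setstate(snapshot)

draws_a = [worker_a.random() for _ in range(5)]
draws_b = [worker_b.random() for _ in range(5)]

# Same starting state, same pseudo-random sequence.
print(draws_a == draws_b)  # prints True
```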

Use the random.seed() method (or the scipy/numpy equivalent) to set the seed properly. See also this numpy thread.
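
One way this could look in practice is sketched below, under some assumptions: simulate is a hypothetical stand-in for the question's run function, and base_seed + j is just one simple scheme for giving each run its own seed. Each task carries its seed and builds its own generator, so the result no longer depends on whatever state the worker inherited.

```python
import multiprocessing as mp
import random


def simulate(task):
    """Hypothetical stand-in for `run`: draws a few numbers for one task."""
    j, seed = task
    rng = random.Random(seed)  # private generator, seeded explicitly for this task
    # If the simulation uses NumPy's global generator instead, the analogous
    # step would be numpy.random.seed(seed) at this point.
    return j, [rng.random() for _ in range(3)]


def make_tasks(nrun, base_seed):
    """Pair each run index with its own seed (base_seed + j is an assumed scheme)."""
    return [(j, base_seed + j) for j in range(nrun)]


def run_batch(nrun, ncpus, base_seed=12345):
    """Run all tasks in a pool and return the draws keyed by run index."""
    with mp.Pool(ncpus) as pool:
        return dict(pool.map(simulate, make_tasks(nrun, base_seed)))
```

Calling run_batch(nrun, ncpus) from under an `if __name__ == "__main__":` guard should then give a different sequence for every run index. Passing the seed with the task, rather than seeding once per worker, also keeps each run reproducible regardless of which worker picks it up.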
