Appending simulation data with HDF5

2022-04-06 00:00:00 python hdf5 h5py plot simulation

Problem description

I am running a simulation several times and want to save the results of each run so that I can use them later for visualization.

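The simulation is run 100 times, and each run generates about 1 million data points (one value per episode, for 1 million episodes). I want to store these data points efficiently. The goal is to compute, for each episode, the average of its value across all 100 simulations.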
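The stated goal (one average per episode, taken over all runs) does not require holding every run in memory at once. A minimal NumPy sketch, assuming each run's values arrive as a 1-D array of the same length (the function name episode_means and the tiny sizes are illustrative only), keeps a running sum and divides by the number of runs at the end:

```python
import numpy as np

def episode_means(runs):
    """Average each episode's value over an iterable of equal-length 1-D runs."""
    total = None
    count = 0
    for run in runs:
        run = np.asarray(run, dtype=np.float64)
        if total is None:
            total = np.zeros_like(run)  # one accumulator slot per episode
        total += run                    # only the current run is needed in memory
        count += 1
    if count == 0:
        raise ValueError("no runs given")
    return total / count

# Example: 3 runs of 4 episodes each
runs = [np.array([1.0, 2.0, 3.0, 4.0]),
        np.array([3.0, 2.0, 1.0, 0.0]),
        np.array([2.0, 2.0, 2.0, 2.0])]
print(episode_means(runs))  # [2. 2. 2. 2.]
```

Because the runs are consumed one at a time, the same function works on a generator that reads each run from disk.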

My main file looks like this:

import h5py

# Defining the test simulation environment
def test_simulation():
    # environment is my own simulation class (not shown); the instance gets a
    # different name so it does not shadow the class inside this function
    env = environment(
        periods=1000000,
        parameter_x=...,
        parameter_y=...,
    )

    # Run the simulation
    env.simulation()

    # Save simulation data: fails from the second run on, because the
    # dataset 'data' already exists in the file
    hf = h5py.File('runs/simulation_runs.h5', 'a')
    hf.create_dataset('data', data=env.value_history, compression='gzip', chunks=True)
    hf.close()

# Run the simulation 100 times
for i in range(100):
    print(f'--- Iteration {i} ---')
    test_simulation()

value_history is generated inside the simulation method: values are appended one by one to an initially empty list:

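def simulation(self):
    # self.value_history starts out as an empty list
    for episode in range(self.periods):
        value = doSomething()  # placeholder for the per-episode computation
        self.value_history.append(value)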

When the loop moves on to the next simulation, I get the following error message:

ValueError: Unable to create dataset (name already exists)

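I understand that every run tries to create a dataset named data again, which fails because a dataset with that name already exists in the file from the first run. Instead, I want to reopen the file created in the first simulation, append the data from the next simulation, and save it again.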
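A minimal reproduction of this behavior, using a throwaway temporary file (the path and the 5-element array are illustrative assumptions): the first create_dataset call succeeds, the second raises ValueError because 'data' is already in the file, and testing membership with 'data' in hf before creating avoids the error.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "simulation_runs.h5")
run = np.arange(5.0)

# First run: the file is new, so creating 'data' succeeds.
with h5py.File(path, "a") as hf:
    hf.create_dataset("data", data=run)

# Second run: mode 'a' keeps the existing 'data' dataset,
# so creating it again raises ValueError (name already exists).
try:
    with h5py.File(path, "a") as hf:
        hf.create_dataset("data", data=run)
    failed = False
except ValueError as err:
    failed = True
    print("second create_dataset failed:", err)

# Checking membership first avoids the error.
with h5py.File(path, "a") as hf:
    if "data" in hf:
        print("dataset exists, shape", hf["data"].shape)
    else:
        hf.create_dataset("data", data=run)
```

The membership check alone only avoids the crash; to actually keep every run, the dataset has to grow or each run needs its own name.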

Solution

The example below shows how to put these ideas together. It creates 2 files:

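1. On the first loop, create a resizable dataset with the maxshape parameter, then call dataset.resize() on subsequent loops. Output: simulation_runs1.h5
2. Create a uniquely named dataset for each simulation. Output: simulation_runs2.h5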

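As the simulation data I created a simple 100x100 NumPy array of random numbers, and ran 10 simulations. These sizes are variables, so you can increase them to find out which method works better (faster) for your data. You may also run into memory limits when saving 1M data points for 1M time periods.
Note 1: If you cannot hold all of the data in system memory, you can save the simulation results to the H5 file incrementally. It is only slightly more complex.
Note 2: I added a mode variable that controls whether a new file is created for the first simulation (i == 0) or the existing file is opened in append mode for subsequent simulations.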
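Note 1 can be sketched as follows, assuming one row per run in a dataset that is resizable along its first axis. simulate_block, PERIODS, BLOCK, N_RUNS and the file name are hypothetical stand-ins for the real simulation; the point is that only BLOCK values sit in memory at a time, and each block is written straight into its slice of the current run's row.

```python
import os
import tempfile

import h5py
import numpy as np

PERIODS = 1_000   # episodes per run (1,000,000 in the question)
BLOCK = 256       # episodes buffered in memory before each write
N_RUNS = 3

def simulate_block(rng, n):
    """Hypothetical stand-in for n calls of the per-episode computation."""
    return rng.random(n)

path = os.path.join(tempfile.mkdtemp(), "simulation_runs_incremental.h5")
rng = np.random.default_rng(0)

with h5py.File(path, "w") as hf:
    # Starts with 0 rows; axis 0 is unlimited so a row can be added per run.
    dset = hf.create_dataset("data", shape=(0, PERIODS), maxshape=(None, PERIODS),
                             dtype="f8", chunks=(1, BLOCK), compression="gzip")
    for run in range(N_RUNS):
        dset.resize(run + 1, axis=0)  # make room for this run's row
        for start in range(0, PERIODS, BLOCK):
            stop = min(start + BLOCK, PERIODS)
            # Only one block of values is held in memory at a time.
            dset[run, start:stop] = simulate_block(rng, stop - start)

with h5py.File(path, "r") as hf:
    print(hf["data"].shape)  # (3, 1000)
```

The chunk shape (1, BLOCK) is a guess that matches the write pattern; for real data it is worth timing a few alternatives.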

import os

import h5py
import numpy as np

# Create some pseudo-test data
def test_simulation(i):
    periods = 100
    times = 100

    # Define the simulation with some random data
    val_hist = np.random.random(periods*times).reshape(periods, times)
    a0, a1 = val_hist.shape

    # New file for the first simulation, append mode for the rest
    if i == 0:
        mode = 'w'
    else:
        mode = 'a'

    # Save simulation data (resize dataset)
    with h5py.File('runs/simulation_runs1.h5', mode) as hf:
        if 'data' not in hf:
            print('create new dataset')
            # Initial shape (1, a0, a1); None in maxshape lets axis 0 grow
            hf.create_dataset('data', data=val_hist[np.newaxis, :, :], maxshape=(None, a0, a1),
                              compression='gzip', chunks=True)
        else:
            print('resize existing dataset')
            d0 = hf['data'].shape[0]
            hf['data'].resize((d0+1, a0, a1))  # add one slot along axis 0
            hf['data'][d0, :, :] = val_hist     # write this run into the new slot

    # Save simulation data (unique datasets)
    with h5py.File('runs/simulation_runs2.h5', mode) as hf:
        hf.create_dataset(f'data_{i:03}', data=val_hist,
                          compression='gzip', chunks=True)

# h5py cannot create files in a directory that does not exist
os.makedirs('runs', exist_ok=True)

# Run the simulation 10 times
for i in range(10):
    print(f'--- Iteration {i} ---')
    test_simulation(i)
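To get back to the goal in the question, here is a sketch of reading either layout and averaging over runs. The function names and the tiny demo files are assumptions for illustration; both functions accumulate one run at a time, so only a single run needs to fit in memory.

```python
import os
import tempfile

import h5py
import numpy as np

def means_from_resizable(path):
    """Per-episode mean over runs stacked along axis 0 of 'data'."""
    with h5py.File(path, "r") as hf:
        dset = hf["data"]
        total = np.zeros(dset.shape[1:])
        for i in range(dset.shape[0]):
            total += dset[i]  # one run in memory at a time
        return total / dset.shape[0]

def means_from_unique(path):
    """Per-episode mean over the separate data_000, data_001, ... datasets."""
    with h5py.File(path, "r") as hf:
        names = sorted(k for k in hf if k.startswith("data_"))
        total = np.zeros(hf[names[0]].shape)
        for name in names:
            total += hf[name][...]  # one run in memory at a time
        return total / len(names)

# Small demo files with the same layouts as simulation_runs1.h5 / simulation_runs2.h5
tmp = tempfile.mkdtemp()
runs = [np.full((2, 3), 1.0), np.full((2, 3), 3.0)]
p1, p2 = os.path.join(tmp, "runs1.h5"), os.path.join(tmp, "runs2.h5")
with h5py.File(p1, "w") as hf:
    hf.create_dataset("data", data=np.stack(runs))
with h5py.File(p2, "w") as hf:
    for i, r in enumerate(runs):
        hf.create_dataset(f"data_{i:03}", data=r)

print(means_from_resizable(p1))  # all entries 2.0
print(means_from_unique(p2))     # all entries 2.0
```

With the single resizable dataset the runs are addressed by index; with unique datasets they are addressed by name, which also makes it easy to skip or re-run an individual simulation.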
