Dear ALPS developers,
I am running worm simulations for many different parameter sets on a cluster. The problem I am having is a filesystem bandwidth limitation, probably mostly because other users also use the system. What I have found is that the generated output data is quite large: one simulation with approximately 20 tasks (worm simulation, 1,000,000 sweeps, varying t in the Hubbard model over the range [0.03, 0.25]) produced 72 MB of data. Since I run many different simulations, this quickly adds up to 10 GB+ of data. I am not using the --write-xml option. My question is: why is the output so large? From what I understand, the only thing that needs to be stored is the observables with their errors (plus checkpoint data). Is there any way to either minimize the output generated (so I don't contribute to the bandwidth limitations of the cluster) or copy only the data I actually need (namely the averages) to my laptop? I would like to work on the data while not connected to the cluster, and copying many GBs of files seems wasteful (and is really slow on this system).
Regards, Žiga Osolin
Hi,
10 GB is actually not much data, but if you want to minimize the disk space you can delete the files ending in *.run1, *.run2, and so on. If you just want to do final evaluations and are 100% sure that the errors are converged and the simulation is equilibrated, then you only need to transfer the files ending in *task1.out.h5, *task2.out.h5, ..., and not any of the run files.
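For example, here is a minimal sketch in Python (not part of ALPS; the directory paths and the *.task*.out.h5 pattern are assumptions based on the usual ALPS naming scheme) that stages only the per-task result files for transfer and leaves the run/checkpoint files behind:

import glob
import os
import shutil

src_dir = "/scratch/myproject/sim_t_scan"   # hypothetical simulation directory
dst_dir = "/scratch/myproject/to_transfer"  # hypothetical staging directory

os.makedirs(dst_dir, exist_ok=True)

# copy only the final per-task result files, skipping the *.run* checkpoints
for path in glob.glob(os.path.join(src_dir, "*.task*.out.h5")):
    shutil.copy2(path, dst_dir)             # copy2 preserves timestamps

You can then transfer the staging directory to your laptop in one go, e.g. with rsync or scp.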
The files do not contain only the final average and error, but much more information that is required, e.g. for a jackknife analysis when you want to calculate nonlinear functions of measurements.
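To illustrate why that extra per-bin information matters, here is a small sketch (plain NumPy with synthetic data, not ALPS code) of a jackknife error estimate for a nonlinear function f = <A>/<B>; knowing only the means and error bars of A and B would not be enough for this:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(1.0, 0.1, size=1000)   # synthetic binned measurements of A
B = rng.normal(2.0, 0.1, size=1000)   # synthetic binned measurements of B

n = len(A)
# leave-one-out (jackknife) estimates of f = <A>/<B>
jack = np.array([(A.sum() - A[i]) / (B.sum() - B[i]) for i in range(n)])

f_mean = A.mean() / B.mean()
f_err = np.sqrt((n - 1) * np.mean((jack - jack.mean())**2))

print(f"f = {f_mean:.4f} +/- {f_err:.4f}")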
Matthias Troyer