Hi Matthias,
thank you for your quick answer.
The machine I'm talking about is this one: http://www.top500.org/system/9881 , so it is basically a Linux cluster with Intel processors.
A typical error message is:
parsing task files ...
Loading information about run 1 from file /scratch/cont003/carleo/beta40/N180.task1.out.run1
failed to read array of type double from an IXDRDump
Cannot open simulation file /scratch/cont003/carleo/beta40/N180.task1.out.xml.
This issue happens for all the checkpoints. The checkpoint files exist (i.e., in the previous case both /scratch/cont003/carleo/beta40/N180.task1.out.xml and /scratch/cont003/carleo/beta40/N180.task1.out.run1 exist) and they are not truncated (i.e., at least the .out.xml files correctly end with </MCRUN></SIMULATION>).
Moreover, the checkpoint files are indeed accessible and have the right permissions (-rw-r-----)... uhm, strange.
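In case it is useful for reproducing this, the checks described above can be sketched roughly as follows in Python. This is only a minimal sketch and not part of ALPS: the helper name check_checkpoint is made up for illustration, and it assumes that an intact .out.xml file ends with </MCRUN></SIMULATION> (ignoring whitespace), as the files here do. It says nothing about the binary .run1 dump, whose layout is not assumed here.

```python
import os

def check_checkpoint(xml_path, closing_tag="</MCRUN></SIMULATION>"):
    """Return a list of problems found for one checkpoint XML file (empty if none).

    Hypothetical helper for illustration only; it is not an ALPS function.
    """
    # 1. Does the file exist as a regular file?
    if not os.path.isfile(xml_path):
        return ["missing: " + xml_path]

    # 2. Can the current user read it?
    if not os.access(xml_path, os.R_OK):
        return ["not readable: " + xml_path]

    problems = []

    # 3. Does the file end with the expected closing tags?
    #    Only the last few kilobytes are read, so large files stay cheap.
    with open(xml_path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - 4096))
        tail = f.read().decode("utf-8", errors="replace")

    # Strip all whitespace so "</MCRUN>\n</SIMULATION>" also counts as intact.
    if not "".join(tail.split()).endswith(closing_tag):
        problems.append("possibly truncated: " + xml_path)

    return problems
```

Calling check_checkpoint("/scratch/cont003/carleo/beta40/N180.task1.out.xml") on the compute node itself, rather than on the login node, would show whether the file looks the same from where the job actually runs.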
Giuseppe
Hi Giuseppe
What type of machine are you using ALPS on? I cannot immediately tell you what the problem might be. Do all files actually exist locally or might the checkpoints not be accessible? Or maybe the file was truncated by the process being killed? Does this happen to all checkpoints or just some?
Matthias
On May 5, 2010, at 10:38 AM, Giuseppe Carleo wrote:
Hello everybody,
I am currently using the ALPS (v. 1.35) scheduler in my QMC code, and everything works pretty well.
Nonetheless, I've experienced an error when trying to restart my simulations on an HPC machine:
"failed to read array of type double from an IXDRDump"
which doesn't allow me to restart any simulation...
On the other hand, on other machines the simulations are restarted correctly. I therefore assume that the way I restart simulations is correct (i.e., I invoke something like mpirun ./myprogram.o simulation_name.out.xml), and that the dumping of the internal variables is done correctly in my code.
I think the error message is related to some machine-specific issue.
Do you have suggestions for this problem?
Thank you in advance,
Giuseppe