I'm attempting to execute tutorial3a.py in the mc-03-magnetization directory using MPI. I'm using a cluster running Ubuntu Server 12.04 LTS with the SLURM queue manager.
I have run it successfully in serial by submitting it to nodes via SLURM. I have also successfully run it in serial without SLURM.
For the serial runs, the relevant line of code inside the .py file is:
res = pyalps.runApplication('dirloop_sse',input_file,Tmin=5,writexml=True)
According to the ALPS tutorial webpage, the simulation can be run in parallel by simply changing the line above to
res = pyalps.runApplication('dirloop_sse',input_file,Tmin=5,writexml=True,MPI=4)
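As far as I understand it, MPI=4 should simply make pyalps start dirloop_sse under mpirun with four processes; my rough picture of what that amounts to is the sketch below, which is only an assumption on my part and not actual pyalps code (the exact dirloop_sse flags may also differ between ALPS versions):

# Sketch of what I assume MPI=4 does under the hood -- this is a guess,
# not the pyalps implementation.
import subprocess

input_file = 'parm3a.in.xml'                    # placeholder name for the job file written by pyalps.writeInputFiles
subprocess.call(['mpirun', '-np', '4',          # four MPI processes
                 'dirloop_sse', '--mpi',        # the ALPS application in MPI mode
                 '--Tmin', '5', '--write-xml',  # same options as in the serial call
                 input_file])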
All I have done is append "MPI=4" as an argument. This generates the following output when run with or without SLURM:
parsing task files ...
Creating a new simulation: 1 on 1 processes
Created run 1 locally
All processes have been assigned
NUMBER_OF_WORMS_PER_SWEEP: 39
Checking if Simulation 1 is finished: Finished
Halted Simulation 1
WARNING: Unclosed tag: SIMULATION!
boost::filesystem::rename: No such file or directory: "/home/sgarland/moreTutorials/alps/mc-03-magnetization/outputFolder/output.task1.out.run1.h5.bak", "/home/sgarland/moreTutorials/alps/mc-03-magnetization/outputFolder/output.task1.out.run1.h5"
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -2.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on exactly
when Open MPI kills them.
Does anyone have any suggestions regarding this issue? I have tried both defining and not defining the number of cores for SLURM to allocate, but haven't had any luck.
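For concreteness, the general shape of what I am submitting is something like the sketch below: a batch file with a Python shebang (sbatch accepts this, since the #SBATCH lines are read as directives regardless of the interpreter). The job settings and simulation parameters here are illustrative placeholders, not my exact values.

#!/usr/bin/env python
#SBATCH --job-name=alps-tut3a     # placeholder job name
#SBATCH --ntasks=4                # one slot per intended MPI rank
#SBATCH --time=01:00:00           # placeholder wall-time limit

# Write a single illustrative task and run it in parallel; pyalps is expected
# to invoke mpirun itself when MPI=4 is given, and mpirun should pick up the
# SLURM allocation if Open MPI was built with SLURM support.
import pyalps

parms = [{'LATTICE': 'chain lattice', 'MODEL': 'spin',
          'L': 20, 'J': 1, 'h': 0.5, 'T': 0.08,
          'THERMALIZATION': 1000, 'SWEEPS': 20000}]   # illustrative parameters only
input_file = pyalps.writeInputFiles('parm_sketch', parms)
res = pyalps.runApplication('dirloop_sse', input_file,
                            Tmin=5, writexml=True, MPI=4)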
Thanks!
Chad Garland
Have you tried running it using the command line instead of calling through Python? Does that work?
On Sep 4, 2013, at 4:06 PM, Scott Garland scgarland191@yahoo.com wrote:
Hello. Thanks for responding so quickly.
No, I had not tried that. I am supposed to use Python for this particular assignment.
To be sure, I just tried it via the command line for the first time. From what I can tell, it works from the command line both in serial and in parallel, both on the master node and through SLURM. The run took ~144 s in serial and ~30 s in parallel, so the timing checks out.
Any idea what the Python problem could be? If necessary, the command-line route should be sufficient, but any further insight into this issue would be very much appreciated.
Thanks, Chad
From: Matthias Troyer troyer@phys.ethz.ch
To: comp-phys-alps-users@lists.phys.ethz.ch
Sent: Wednesday, September 4, 2013 10:53 AM
Subject: Re: [ALPS-users] Parallel ALPS (using python) jobs via SLURM
What command did you use for the command line version? And can you check whether the Python version (without the MPI=4) might be launched four times?
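For example, a few lines like the following at the very top of tutorial3a.py would show how many copies of the script actually get started (the environment variable names assume SLURM and Open MPI, and will simply print None where they are not set):

# Print one line per process that executes this script; if the script itself
# is launched several times, several lines will appear in the output.
import os, socket
print('host=%s pid=%d SLURM_PROCID=%s SLURM_NTASKS=%s OMPI_COMM_WORLD_RANK=%s' % (
    socket.gethostname(), os.getpid(),
    os.environ.get('SLURM_PROCID'),           # task id assigned by SLURM
    os.environ.get('SLURM_NTASKS'),           # number of tasks in the allocation
    os.environ.get('OMPI_COMM_WORLD_RANK')))  # set only if mpirun started this process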
On Sep 4, 2013, at 10:17 PM, Chad Garland scgarland191@yahoo.com wrote: