What command did you use for the command line version? And can you check whether the Python version (without the MPI=4) might be launched four times?
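One quick way to check, for example, is to put something like the lines below at the very top of the script. If the script itself gets started four times you will see four lines with different process IDs. This is only an illustration; SLURM_PROCID is set only inside a SLURM job step, and the exact variable you look at does not matter much.

import os
import socket

# Print one line per process that executes this script; seeing four
# different pids / task indices means the script itself was launched
# four times rather than once.
print("host=%s  pid=%d  SLURM_PROCID=%s"
      % (socket.gethostname(), os.getpid(), os.environ.get("SLURM_PROCID", "unset")))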

On Sep 4, 2013, at 10:17 PM, Chad Garland <scgarland191@yahoo.com> wrote:

Hello. Thanks for responding so quickly.

No, I had not tried that. I am supposed to use Python for this particular assignment.

To be sure, I just tried it via the command line for the first time. From what I can tell, it works from the command line both in serial and in parallel, both on the master node and through SLURM. The run took ~144 seconds in serial and ~30 seconds in parallel, so the timing checks out.

Any idea what the Python problem could be? If it comes to that, running from the command line should be sufficient, but if you have any more insight into this issue, it would be very much appreciated.

Thanks,
Chad


From: Matthias Troyer <troyer@phys.ethz.ch>
To: comp-phys-alps-users@lists.phys.ethz.ch
Sent: Wednesday, September 4, 2013 10:53 AM
Subject: Re: [ALPS-users] Parallel ALPS (using python) jobs via SLURM

Have you tried running it from the command line instead of through Python? Does that work?

On Sep 4, 2013, at 4:06 PM, Scott Garland <scgarland191@yahoo.com> wrote:

I'm attempting to execute tutorial3a.py in the mc-03-magnetization directory using MPI, on a cluster running Ubuntu Server 12.04 LTS with the SLURM queue manager.

I have run it successfully in serial by submitting it to nodes via SLURM. I have also successfully run it in serial without SLURM.

For the serial run, the .py file contains the following line of code:

res = pyalps.runApplication('dirloop_sse',input_file,Tmin=5,writexml=True)

According to the ALPS tutorial webpage, the simulation can be run in parallel by simply changing the line above to

res = pyalps.runApplication('dirloop_sse',input_file,Tmin=5,writexml=True,MPI=4)
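
For context, the surrounding part of the script follows the tutorial fairly closely; roughly as sketched below. The parameter values are abbreviated and only illustrative (the real script loops over more fields h), and the 'outputFolder/output' prefix is inferred from the file names in the error message further down.

import pyalps

# Build a short list of parameter sets (abbreviated; illustrative values only).
parms = []
for h in [0.0, 0.5, 1.0]:
    parms.append({
        'LATTICE'        : 'chain lattice',
        'MODEL'          : 'spin',
        'local_S'        : 0.5,
        'T'              : 0.08,
        'J'              : 1.0,
        'h'              : h,
        'THERMALIZATION' : 1000,
        'SWEEPS'         : 20000,
        'L'              : 20,
    })

# Write the XML job and task files, then run dirloop_sse on them
# with 4 MPI processes.
input_file = pyalps.writeInputFiles('outputFolder/output', parms)
res = pyalps.runApplication('dirloop_sse', input_file, Tmin=5, writexml=True, MPI=4)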

All I have done is append "MPI=4" as an argument. This generates the following output when run with or without SLURM:

parsing task files ... 
Creating a new simulation: 1 on 1 processes
Created run 1 locally
All processes have been assigned
NUMBER_OF_WORMS_PER_SWEEP: 39
Checking if Simulation 1 is finished: Finished
Halted Simulation 1
WARNING: Unclosed tag: SIMULATION!
boost::filesystem::rename: No such file or directory: "/home/sgarland/moreTutorials/alps/mc-03-magnetization/outputFolder/output.task1.out.run1.h5.bak", "/home/sgarland/moreTutorials/alps/mc-03-magnetization/outputFolder/output.task1.out.run1.h5"
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode -2.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
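
In case leftover output files from my earlier serial runs are part of the problem (that is just a guess on my part, based on the rename error above), one thing I can try before each parallel attempt is clearing the old per-run files, e.g.:

import glob
import os

# Remove leftover per-run output files (and their .bak copies) from
# earlier attempts before starting a fresh run -- a diagnostic step only.
for f in glob.glob('outputFolder/output.task*.out.run*.h5*'):
    os.remove(f)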


Does anyone have any suggestions regarding this issue? I have tried both specifying and not specifying the number of cores for SLURM to allocate, but haven't had any luck.
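
For what it's worth, a variant I could also try is taking the process count from SLURM instead of hard-coding MPI=4, assuming the SLURM_NTASKS environment variable is set inside the job. The hard-coded input file name below just stands in for the one produced earlier in the script by pyalps.writeInputFiles:

import os
import pyalps

# 'outputFolder/output.in.xml' stands in for the job file written
# earlier in the script by pyalps.writeInputFiles(...).
input_file = 'outputFolder/output.in.xml'

# Use however many tasks SLURM allocated, falling back to 4 outside SLURM.
nprocs = int(os.environ.get('SLURM_NTASKS', 4))
res = pyalps.runApplication('dirloop_sse', input_file, Tmin=5, writexml=True, MPI=nprocs)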

Thanks!

Chad Garland