Hello,
I am using the ALPS libraries for scheduling and observable handling. I typically have more tasks than cores, especially when debugging on my personal computer. Is it possible to make the scheduler start, say, the first four jobs and, when it is time to write a checkpoint, switch to the next four instead of continuing with the first four until they are finished?
Also, for debugging purposes, I print some messages along the way, and it would help to know which process printed them. Is the MPI rank saved somewhere, or do I have to obtain it myself?
Best regards, Peter
Dear Peter,
On 2013/09/17, at 17:55, Peter Bröcker <peter.broecker@uni-koeln.de> wrote:
> I am using the ALPS libraries for scheduling and observable handling. I typically have more tasks than cores, especially when debugging on my personal computer. Is it possible to make the scheduler start, say, the first four jobs and, when it is time to write a checkpoint, switch to the next four instead of continuing with the first four until they are finished?
No, currently it is not possible. Could you tell me a bit more why you need such functionality?
> Also, for debugging purposes, I print some messages along the way, and it would help to know which process printed them. Is the MPI rank saved somewhere, or do I have to obtain it myself?
Please use MPI_Comm_rank.
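For example, a minimal standalone sketch (not ALPS-specific) of tagging debug output with the rank, using the standard MPI C API:

```c
#include <mpi.h>
#include <stdio.h>

/* Minimal sketch: query this process's rank once and prefix all
 * debug output with it. Requires an MPI environment (mpicc/mpirun). */
int main(int argc, char** argv) {
    int rank = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank within the global communicator */
    printf("[rank %d] debug message\n", rank);
    MPI_Finalize();
    return 0;
}
```

Querying the rank once after MPI_Init and reusing it is cheaper than calling MPI_Comm_rank at every print site.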
Best, Synge
On Sep 17, 2013, at 5:44 PM, Synge Todo <wistaria@comp-phys.org> wrote:
> Dear Peter,
>
> On 2013/09/17, at 17:55, Peter Bröcker <peter.broecker@uni-koeln.de> wrote:
>> I am using the ALPS libraries for scheduling and observable handling. I typically have more tasks than cores, especially when debugging on my personal computer. Is it possible to make the scheduler start, say, the first four jobs and, when it is time to write a checkpoint, switch to the next four instead of continuing with the first four until they are finished?
>
> No, currently it is not possible. Could you tell me a bit more why you need such functionality?
The easiest is just to split the simulation into several input files, each containing four instances, and then run them alternately for some fixed time.
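As a sketch of that workflow, assuming two hypothetical input files parm1a.in.xml and parm1b.in.xml, an ALPS application such as spinmc on the PATH, and the scheduler's --time-limit option (seconds), which makes each run checkpoint and exit when the time is up:

```shell
#!/bin/sh
# Run the two batches in turns: each invocation works for at most an
# hour, writes a checkpoint, and exits, so the other batch gets its
# turn; completed tasks are skipped on restart.
for round in 1 2 3 4; do
    spinmc --time-limit 3600 parm1a.in.xml
    spinmc --time-limit 3600 parm1b.in.xml
done
```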
Matthias
Hello,
> No, currently it is not possible. Could you tell me a bit more why you need such functionality?
When I run my simulations, I like to have a rough idea of where they are going. So before switching to the ALPS libraries for scheduling, I defined a threshold for how many sweeps were done on a given task before switching to a different one. That way, I could run a job for maybe 24 h and see where the results were headed. This is especially useful when the cluster queue is loaded and it might take some time before a resubmitted job starts. I could set the number of sweeps to a very large number and see whether the result had converged or not. If it had, I would deactivate the task via an entry in its file. That way, jobs that require less work don't take up computing time from those that need more.
I am guessing the same can be done in ALPS by defining a rather low number of sweeps and then changing the XML files if needed, right?
Does the parapack scheduler have this functionality?
> The easiest is just to split the simulation into several input files, each containing four instances, and then run them alternately for some fixed time.
>
> Matthias
That's of course true. When running alpspython on an input file, can I define the number of input files the job should be split into?
Best, Peter
On Sep 18, 2013, at 5:35 PM, Peter Bröcker <peter.broecker@uni-koeln.de> wrote:
> Hello,
>
>> No, currently it is not possible. Could you tell me a bit more why you need such functionality?
>
> When I run my simulations, I like to have a rough idea of where they are going. So before switching to the ALPS libraries for scheduling, I defined a threshold for how many sweeps were done on a given task before switching to a different one. That way, I could run a job for maybe 24 h and see where the results were headed. This is especially useful when the cluster queue is loaded and it might take some time before a resubmitted job starts. I could set the number of sweeps to a very large number and see whether the result had converged or not. If it had, I would deactivate the task via an entry in its file. That way, jobs that require less work don't take up computing time from those that need more.
> I am guessing the same can be done in ALPS by defining a rather low number of sweeps and then changing the XML files if needed, right?
> Does the parapack scheduler have this functionality?
>
>> The easiest is just to split the simulation into several input files, each containing four instances, and then run them alternately for some fixed time.
>>
>> Matthias
>
> That's of course true. When running alpspython on an input file, can I define the number of input files the job should be split into?
No, but you can simply change the Python code and write multiple files, e.g.:
input_file_a = pyalps.writeInputFiles('parm1a', parms[0:4])
input_file_b = pyalps.writeInputFiles('parm1b', parms[4:8])
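If the number of parameter sets grows, the same split can be generated in a loop; the chunking itself is plain Python (pyalps.writeInputFiles is the call from the two lines above, and the helper below is just a sketch):

```python
def chunk_params(parms, size=4):
    """Split a flat list of parameter dicts into consecutive chunks of `size`."""
    return [parms[i:i + size] for i in range(0, len(parms), size)]

# Each chunk would then get its own input-file prefix, e.g.:
# for n, chunk in enumerate(chunk_params(parms)):
#     pyalps.writeInputFiles('parm1_%d' % n, chunk)
```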
Matthias