ok, I'll try to run it on CINECA and I'll tell you if it works.
On 1 Jul 2012, at 15:25, Rachele Nerattini wrote:
Thank you all very much!
Yes, I think this could help. I'll send your mail to the CINECA-help-desk so that they will tell me if this is all we need and how to concatenate jobs.
In any case, if I understand correctly, all I have to do is:
Nearly
0) use parameter2xml to generate the file .*.in.xml;
Yes
1) run the first block with the flag ' --time-limit #sec ' on the command line, which tells the program to stop after #sec seconds;
2) run the second block, again with the flag ' --time-limit #sec ' on the command line, using the file .out.xml as the input file;
3) go on like this until the end of the simulation.
24 hours will be 86400 seconds, but the code will need time to write the checkpoints, thus specify about 1000 seconds less to be on the safe side.
Is that ok?
As for the walltime limit, I think it is 24 hours (24:00:00)...
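The arithmetic above can be sketched as a small shell helper. This is only an illustration: the function name walltime_to_limit and the default margin of 1000 seconds are assumptions made for this example, not part of ALPS or PBS.

```shell
# Hypothetical helper: turn a PBS-style walltime (HH:MM:SS) into a value
# for --time-limit, keeping a safety margin for writing the checkpoints.
walltime_to_limit() {
    local walltime margin h rest m s
    walltime=$1
    margin=${2:-1000}          # seconds reserved for checkpoint writing (assumed)
    h=${walltime%%:*}          # text before the first colon
    rest=${walltime#*:}        # text after the first colon
    m=${rest%%:*}
    s=${rest#*:}
    # Drop one leading zero so that fields such as "08" are not read as octal.
    h=${h#0}; m=${m#0}; s=${s#0}
    echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} - margin ))
}

walltime_to_limit 24:00:00        # prints 85400
walltime_to_limit 01:00:00 600    # prints 3000
```

The printed value would then be passed as the argument of --time-limit on the spinmc command line.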
Thank you again, I'll let you know if everything works.
Bye for now
Rachele
2012/7/1 Fabien Alet <alet@irsamc.ups-tlse.fr>
Dear Rachele,
if I understood correctly, what you need to do is:
0) run your job for 10 seconds or so locally on the main server, so that the file Ising3DL10mpiprocs10.out.xml gets generated (e.g. something like
spinmc --time-limit 10 --write-xml Ising3DL10mpiprocs10.in.xml
or alternatively just stop the job with CTRL-C)
1) add the time limit of 2.4 hours to the command line where you launch your executable, e.g.
mpirun -np 10 spinmc --mpi --Tmin 100 --time-limit 8000 --write-xml Ising3DL10mpiprocs10.out.xml
[actually you should give slightly less than 2.4 hours (8640 seconds), so that the files have time to be written to disk; this is why I used 8000 seconds]
Note that I used .out.xml and not .in.xml, so that your script really continues your ongoing jobs and does not restart from scratch every time.
I hope this can help,
Best
Fabien
On 1 Jul 2012, at 14:40, Rachele Nerattini wrote:
Dear Mr. Troyer,
I read the page you suggested, but I still don't understand. It explains how to restart a simulation that stopped because the computer shut down, or how to make it stop before its natural end once the required error has been reached, but I don't understand how to stop it in a controlled way.
For instance, let's suppose I have to run a simulation which is 24 hours long.
I want to divide the long run into 10 runs of 2.4 hours each. The output of each run must be the input of the following one, and I want to do all the runs in cascade.
What do I have to write in the input file to do that? Sorry, but I didn't find it on the page
http://alps.comp-phys.org/mediawiki/index.php/Documentation:Running
certainly because I'm not an expert in this kind of thing.
I copy and paste below an example of an input file I wrote for CINECA and an example of a script I had to write to run a test job....
Can you tell me what I have to change to make it stop after a certain number of seconds? Then I'll ask CINECA assistance how to make all the blocks run in cascade.
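One possible way to run the blocks in cascade is to let the batch system chain them through job dependencies. The sketch below only prints the qsub commands for such a chain (a dry run) and submits nothing. It assumes that the cluster's PBS accepts -W depend=afterany:<jobid> and that qsub prints the job id on standard output; whether CINECA allows this should be checked with their help desk. The function name print_chain is made up for this example.

```shell
# Hypothetical helper: print (without running) the qsub commands that
# submit the same job script nblocks times, each block waiting for the
# previous one to finish.
print_chain() {
    local script nblocks i prev
    script=$1
    nblocks=$2
    echo "JOB1=\$(qsub $script)"
    i=2
    while [ "$i" -le "$nblocks" ]; do
        prev=$((i - 1))
        # afterany: start once the previous job has ended, whatever its exit status
        echo "JOB$i=\$(qsub -W depend=afterany:\$JOB$prev $script)"
        i=$((i + 1))
    done
}

print_chain Ising3DL10mpiprocs10.sh 3
```

Every block runs the same script here, so the script itself has to pass --time-limit and continue from the .out.xml file once that file exists.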
Thank you and all the best,
Rachele Nerattini
input file = Ising3DL10mpiprocs10
LATTICE="simple cubic lattice"
LATTICE_LIBRARY="lattices.xml"
T=4.511441614
J=1
THERMALIZATION=200000
SWEEPS=1000000
UPDATE="cluster"
MODEL="Ising"
{L=10;}
script = Ising3DL10mpiprocs10.sh
#!/bin/bash
#PBS -A name_of_the_project
#PBS -l walltime=1:00:00
#PBS -l select=1:ncpus=10:mpiprocs=10:mem=40GB
#PBS -q parallel
#PBS -o Ising3DL10mpiprocs10.out
#PBS -e Ising3DL10mpiprocs10.err
# put the executable in the PATH
module load autoload alps
# cd into the directory that contains the input and job.sh
cd $PBS_O_WORKDIR
# prepare the input files
parameter2xml Ising3DL10mpiprocs10
# load openmpi so that mpirun is in the PATH
module load profile/advanced openmpi/1.4.4--intel--co-2011.6.233--binary
mpirun -np 10 spinmc --mpi --Tmin 100 --write-xml Ising3DL10mpiprocs10.in.xml
qsub Ising3DL10mpiprocs10.sh
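For the script above to be reusable for every block, it could pick its input file depending on whether a previous block has already written a checkpoint. A minimal sketch, assuming the file naming used in this thread (base.in.xml from parameter2xml, base.out.xml written by spinmc); the function name choose_input and the 3000-second limit are illustrative assumptions.

```shell
# Hypothetical helper: print the file to hand to spinmc.
# Use the .out.xml checkpoint if an earlier block wrote one,
# otherwise fall back to the .in.xml generated by parameter2xml.
choose_input() {
    local base
    base=$1
    if [ -f "$base.out.xml" ]; then
        echo "$base.out.xml"
    else
        echo "$base.in.xml"
    fi
}

# In the job script the mpirun line could then read (shown as a comment, not run here):
#   mpirun -np 10 spinmc --mpi --Tmin 100 --time-limit 3000 --write-xml "$(choose_input Ising3DL10mpiprocs10)"
```

With a walltime of 1:00:00 as in the script, a limit of 3000 seconds leaves 600 seconds for writing the checkpoint.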
What do I have to add to have what I need?
Thank you again for the help,
I wish you all the best
Rachele Nerattini
2012/7/1 Rachele Nerattini <r.nerattini@gmail.com>
Thank you very much for the help! I'll do it immediately.
All the best
Rachele Nerattini
2012/7/1 Matthias Troyer <troyer@phys.ethz.ch>
It is explained here:
Best regards
Matthias Troyer
On Jul 1, 2012, at 12:18 PM, Rachele Nerattini wrote:
Dear Mr. Troyer,
I'm using Monte Carlo simulations for classical O(n) spin models. To be more precise I'm using the spinmc algorithm for Ising, XY and Heisenberg models.
Thank you for the help,
all the best
Rachele Nerattini
2012/6/28 Matthias Troyer <troyer@phys.ethz.ch>
Dear Rachele Nerattini,
Are you using Monte Carlo simulations or one of the other codes?
With best regards
Matthias Troyer
On 28 Jun 2012, at 16:59, Rachele Nerattini wrote:
> Dear Mr. Troyer,
>
> I'm running some simulations using ALPS at the CINECA cluster in Bologna. I will have to run really long simulations which go well beyond the walltime limit of the machine.
>
> To do that I have to divide the whole run into several subsections and then run all the processes in cascade.
>
> Can you tell me how I can do that with ALPS? Which stop/restart commands do I have to use?
>
> Thank you for your help,
>
> Rachele Nerattini