Hi,
The function generate_spline_matrix in fouriertransform.C in the dmft folder contains the following comment:
// A is the matrix whose inverse defines spline_matrix
//
//        6 6
//        1 4 1
//          1 4 1
// A =          ...
//
//                1 4 1
//       -2 0       0 2
However, the following code,
dense_matrix A = 4*dt/6.*boost::numeric::ublas::identity_matrix<double>(Np1);
for (int i=1; i<Np1-1; i++) {
  A(i,i-1) = dt/6.;
  A(i,i+1) = dt/6.;
}
A(0,0) = 1.;
A(0, Np1-1) = 1.;
A(Np1-1, 0) = -2.*dt/6.;
A(Np1-1, 1) = -1.*dt/6.;
A(Np1-1, Np1-2) = 1*dt/6.;
A(Np1-1, Np1-1) = 2*dt/6.;
the lines marked in red make the matrix look like
//        6 6
//        1 4 1
//          1 4 1
// A =          ...
//
//                1 4 1
//       -2 *-1*    *1* 2
Is the comment correct or the code correct? Any reference for this algorithm?
Thank you, Kuang-Shing Chen
Hi,
the code is correct. I have changed the comment to match the code. As a reference you can have a look at Emanuel's thesis, p. 138: http://e-collection.library.ethz.ch/eserv/eth:31103/eth-31103-02.pdf
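If you want to check it quickly, here is a minimal standalone sketch (not taken from fouriertransform.C; Np1 and dt below are placeholder values chosen only for illustration) that mirrors the quoted assignments and prints A, so the last row -2 -1 ... 1 2 (in units of dt/6) is directly visible:

#include <cstdio>
#include <vector>

int main() {
  const int Np1 = 6;     // placeholder matrix size, for illustration only
  const double dt = 0.5; // placeholder time step

  // Mirror the construction quoted from generate_spline_matrix.
  std::vector<std::vector<double> > A(Np1, std::vector<double>(Np1, 0.0));
  for (int i = 0; i < Np1; ++i) A[i][i] = 4 * dt / 6.;  // the identity_matrix term
  for (int i = 1; i < Np1 - 1; ++i) {
    A[i][i - 1] = dt / 6.;
    A[i][i + 1] = dt / 6.;
  }
  A[0][0] = 1.;
  A[0][Np1 - 1] = 1.;
  A[Np1 - 1][0] = -2. * dt / 6.;
  A[Np1 - 1][1] = -1. * dt / 6.;
  A[Np1 - 1][Np1 - 2] = 1. * dt / 6.;
  A[Np1 - 1][Np1 - 1] = 2. * dt / 6.;

  // Print A; the last row shows the -2 -1 ... 1 2 pattern (times dt/6).
  for (int i = 0; i < Np1; ++i) {
    for (int j = 0; j < Np1; ++j) std::printf("%8.4f ", A[i][j]);
    std::printf("\n");
  }
  return 0;
}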
Best, Hartmut
On 29.12.2011 at 00:04, Kuangshing Chen wrote:
Hi,
The function generate_spline_matrix in fouriertransform.C in the dmft folder shows the comments as follows:
// A is the matrix whose inverse defines spline_matrix
//
//        6 6
//        1 4 1
//          1 4 1
// A =          ...
//
//                1 4 1
//       -2 0       0 2
However, the following code,
dense_matrix A = 4*dt/6.*boost::numeric::ublas::identity_matrix<double>(Np1);
for (int i=1; i<Np1-1; i++) {
  A(i,i-1) = dt/6.;
  A(i,i+1) = dt/6.;
}
A(0,0) = 1.;
A(0, Np1-1) = 1.;
A(Np1-1, 0) = -2.*dt/6.;
A(Np1-1, 1) = -1.*dt/6.;
A(Np1-1, Np1-2) = 1*dt/6.;
A(Np1-1, Np1-1) = 2*dt/6.;
the lines marked in red make the matrix look like
//        6 6
//        1 4 1
//          1 4 1
// A =          ...
//
//                1 4 1
//       -2 -1      1 2
Is the comment correct or the code correct? Any reference for this algorithm?
Thank you, Kuang-Shing Chen
-- Hartmut Hafermann
École Polytechnique Centre de Physique Theorique (CPHT) 91128 Palaiseau Cedex, France
Tel.: +33 1 69 33 42 34 Fax: +33 1 69 33 49 49
Thank you Hartmut,
I just tested the Fourier transformation inside SemicircleHilbertTransformer::initial_G0. The initial G0_omega is defined by
std::complex<double> zeta = iw+mu+(flavor%2 ? -h : h);
G0_omega(i, flavor) = (zeta - sqrt(zeta*zeta-4*tsq[flavor]))/(2*tsq[flavor]);
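(For reference, and not part of the original message: with tsq = t^2 this is the standard Hilbert transform of the semicircular density of states,

  G_0(i\omega_n) = \frac{\zeta - \sqrt{\zeta^2 - 4t^2}}{2t^2},
  \qquad \zeta = i\omega_n + \mu \pm h,

where the square-root branch is chosen so that G_0 decays as 1/(i\omega_n) at large frequencies.)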
The code uses backward_ft to inverse Fourier transform G0_tau to G0_omega: fourier_ptr->backward_ft(G0_tau, G0_omega);
If I Fourier transform G0_omega back to G0_tau now, I should get the original G0_tau. But the result of the test shows a different G0_tau. Again, if I do fourier_ptr->backward_ft(G0_tau, G0_omega), I get a new G0_omega'. It seems that FT(IFT) does not equal the identity! Could you double-check that for me?
Kuang-Shing Chen
On Wed, Dec 28, 2011 at 6:15 PM, Hartmut Hafermann <hartmut.hafermann@cpht.polytechnique.fr> wrote:
Hi,
the code is correct. I have changed the comment to comply with the code. As a reference you can have a look into Emanuel's thesis, p. 138: http://e-collection.library.ethz.ch/eserv/eth:31103/eth-31103-02.pdf
Best, Hartmut
On 29.12.2011 at 00:04, Kuangshing Chen wrote:
Hi,
The function generate_spline_matrix in fouriertransform.C in the dmft folder shows the comments as follows:
// A is the matrix whose inverse defines spline_matrix
//
//        6 6
//        1 4 1
//          1 4 1
// A =          ...
//
//                1 4 1
//       -2 0       0 2
However, the following code,
dense_matrix A = 4*dt/6.*boost::numeric::ublas::identity_matrix<double>(Np1);
for (int i=1; i<Np1-1; i++) {
  A(i,i-1) = dt/6.;
  A(i,i+1) = dt/6.;
}
A(0,0) = 1.;
A(0, Np1-1) = 1.;
A(Np1-1, 0) = -2.*dt/6.;
A(Np1-1, 1) = -1.*dt/6.;
A(Np1-1, Np1-2) = 1*dt/6.;
A(Np1-1, Np1-1) = 2*dt/6.;
the lines marked in red make the matrix look like
//        6 6
//        1 4 1
//          1 4 1
// A =          ...
//
//                1 4 1
//       -2 *-1*    *1* 2
Is the comment correct or the code correct? Any reference for this algorithm?
Thank you, Kuang-Shing Chen
-- Hartmut Hafermann
École Polytechnique Centre de Physique Theorique (CPHT) 91128 Palaiseau Cedex, France
Tel.: +33 1 69 33 42 34 Fax: +33 1 69 33 49 49
Hi Kuang-Shing Chen,
yes, that's correct: FT(IFT) is not the identity. The Fourier transform from the frequency to the time domain uses model functions. The Fourier transform from time to frequency uses (in this implementation) a spline fitting routine.
You can, if you prefer, do a Fourier transform with a model function also from the time to the frequency domain but you will need to know your high frequency moments.
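For reference, the transform pair in question, in standard Matsubara conventions (not quoted from the code), is

  G(\tau) = \frac{1}{\beta} \sum_{n=-\infty}^{\infty} e^{-i\omega_n \tau}\, G(i\omega_n),
  \qquad
  G(i\omega_n) = \int_0^{\beta} d\tau\, e^{i\omega_n \tau}\, G(\tau),
  \qquad
  \omega_n = \frac{(2n+1)\pi}{\beta}.

Since G(i\omega_n) falls off only as 1/(i\omega_n), the frequency sum converges slowly; the frequency-to-time direction therefore typically subtracts a model function whose transform is known analytically, while the time-to-frequency direction is done here by a spline fit. The two approximations are independent, so their composition reproduces the input only up to the fitting and tail errors.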
There are various points where you can read about this. Probably Nils Blümer's thesis has the most thorough description of Fourier transforms in the DMFT context (http://komet337.physik.uni-mainz.de/Bluemer/thesis.en.shtml).
Emanuel
On Dec 28, 2011, at 7:34 PM, Kuangshing Chen wrote:
Thank you Hartmut,
I just tested the Fourier transformation inside SemicircleHilbertTransformer::initial_G0. The initial G0_omega is defined by
std::complex<double> zeta = iw+mu+(flavor%2 ? -h : h);
G0_omega(i, flavor) = (zeta - sqrt(zeta*zeta-4*tsq[flavor]))/(2*tsq[flavor]);
The code uses backward_ft to inverse Fourier transform G0_tau to G0_omega: fourier_ptr->backward_ft(G0_tau, G0_omega);
If I Fourier transform G0_omega back to G0_tau now, I should get the original G0_tau. But the result of the test shows a different G0_tau. Again, if I do fourier_ptr->backward_ft(G0_tau, G0_omega), I get a new G0_omega'. It seems that FT(IFT) does not equal the identity! Could you double-check that for me?
Kuang-Shing Chen
On Wed, Dec 28, 2011 at 6:15 PM, Hartmut Hafermann <hartmut.hafermann@cpht.polytechnique.fr> wrote:
Hi,
the code is correct. I have changed the comment to comply with the code. As a reference you can have a look into Emanuel's thesis, p. 138: http://e-collection.library.ethz.ch/eserv/eth:31103/eth-31103-02.pdf
Best, Hartmut
On 29.12.2011 at 00:04, Kuangshing Chen wrote:
Hi,
The function generate_spline_matrix in fouriertransform.C in the dmft folder shows the comments as follows:
// A is the matrix whose inverse defines spline_matrix
//
//        6 6
//        1 4 1
//          1 4 1
// A =          ...
//
//                1 4 1
//       -2 0       0 2
However, the following code,
dense_matrix A = 4*dt/6.*boost::numeric::ublas::identity_matrix<double>(Np1);
for (int i=1; i<Np1-1; i++) {
  A(i,i-1) = dt/6.;
  A(i,i+1) = dt/6.;
}
A(0,0) = 1.;
A(0, Np1-1) = 1.;
A(Np1-1, 0) = -2.*dt/6.;
A(Np1-1, 1) = -1.*dt/6.;
A(Np1-1, Np1-2) = 1*dt/6.;
A(Np1-1, Np1-1) = 2*dt/6.;
the lines marked in red make the matrix look like
//        6 6
//        1 4 1
//          1 4 1
// A =          ...
//
//                1 4 1
//       -2 -1      1 2
Is the comment correct or the code correct? Any reference for this algorithm?
Thank you, Kuang-Shing Chen
-- Hartmut Hafermann
École Polytechnique Centre de Physique Theorique (CPHT) 91128 Palaiseau Cedex, France
Tel.: +33 1 69 33 42 34 Fax: +33 1 69 33 49 49
-- Kuang-Shing Chen
Dear All, While running some QMC (worm) computations by MPI over several nodes I noticed an "Avoided problem" message appearing from time to time. It does not seem particularly dangerous, but is it possible to find out what the problem was in the first place?
Regards, Mateusz Łącki
This is a debug message which I added a while ago when solving a problem that we had because of finite resolution of floating point numbers. There is a chance of 1e-16 per site and unit imaginary time interval that two bosons hop away from two neighboring sites at exactly the same time. This case needs special consideration, and the notice was added to indicate that this case had happened.
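As a side note, here is a tiny standalone illustration (not taken from the worm code) of where a number like 1e-16 comes from: imaginary times stored as doubles live on a discrete grid whose spacing near tau = 1 is the machine epsilon, so two independently generated times can coincide exactly, and the strict ordering the update normally relies on has to be handled as a special case:

#include <cfloat>
#include <cmath>
#include <cstdio>

int main() {
  // Spacing of representable doubles just above tau = 1
  // (one unit of imaginary time).
  double tau = 1.0;
  double spacing = std::nextafter(tau, 2.0) - tau;

  std::printf("grid spacing near tau = 1 : %.3g\n", spacing);   // about 2.2e-16
  std::printf("DBL_EPSILON               : %.3g\n", DBL_EPSILON);
  // Two hopping times drawn independently on such a grid coincide
  // exactly with a probability of this order per unit time interval;
  // that rare coincidence is what the "Avoided problem" notice reports.
  return 0;
}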
Matthias
On Dec 29, 2011, at 9:39 AM, Mateusz Łącki wrote:
Dear All, While running some QMC (worm) computations by MPI over several nodes I noticed an "Avoided problem" message appearing from time to time. It does not seem particularly dangerous, but is it possible to find out what the problem was in the first place?
Regards, Mateusz Łącki
Dear Matthias,
Thank you for your answer. If I understand correctly, the problem is solved and the results now take this special case into account?
Regards, Mateusz
This is a debug message which I added a while ago when solving a problem that we had because of finite resolution of floating point numbers. There is a chance of 1e-16 per site and unit imaginary time interval that two bosons hop away from two neighboring sites at exactly the same time. This case needs special consideration, and the notice was added to indicate that this case had happened.
Matthias
On Dec 29, 2011, at 9:39 AM, Mateusz Łącki wrote:
Dear All, While running some QMC (worm) computations by MPI over several nodes I noticed an "Avoided problem" message appearing from time to time. It does not seem particularly dangerous, but is it possible to find out what the problem was in the first place?
Regards, Mateusz Łącki
Yes, indeed
On Dec 29, 2011, at 9:53 AM, Mateusz Łącki wrote:
Dear Matthias,
Thank you for your answer. If I understand correctly the problem is solved and results take into account this special case now?
Regards, Mateusz
This is a debug message which I added a while ago when solving a problem that we had because of finite resolution of floating point numbers. There is a chance of 1e-16 per site and unit imaginary time interval that two bosons hop away from two neighboring sites at exactly the same time. This case needs special consideration, and the notice was added to indicate that this case had happened.
Matthias
On Dec 29, 2011, at 9:39 AM, Mateusz Łącki wrote:
Dear All, While running some QMC (worm) computations by MPI over several nodes I noticed an "Avoided problem" message appearing from time to time. It does not seem particularly dangerous, but is it possible to find out what the problem was in the first place?
Regards, Mateusz Łącki
Dear All, I have set up some computation which failed:
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 27 in communicator MPI_COMM_WORLD
with errorcode -2.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec has exited due to process rank 27 with PID 15245 on
node clone18 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
--------------------------------------------------------------------------
just before that (in the output file, not sure about the time):
Avoided problem
q = -0 state1 = 0 state2 = 0 bond_type = 0 id 2047013814 2047013814
Spin configuration:
Wormheads at 78 0.999997 and 78 0.259301
Site: 0
Kink : [ 0.119336 : 0 ]
Kink : [ 0.124065 : 1 ]
Kink : [ 0.174815 : 0 ]
Kink : [ 0.17605 : 1 ]
Kink : [ 0.335094 : 2 ]
Kink : [ 0.368865 : 1 ]
(...)
Site: 299
Kink : [ 0.00590279 : 2 ]
Kink : [ 0.0326616 : 1 ]
Kink : [ 0.0697665 : 0 ]
Kink : [ 0.0977223 : 1 ]
Kink : [ 0.254292 : 2 ]
Kink : [ 0.256147 : 1 ]
Kink : [ 0.328286 : 2 ]
Kink : [ 0.329838 : 1 ]
Kink : [ 0.405038 : 0 ]
Kink : [ 0.438803 : 1 ]
Kink : [ 0.487034 : 2 ]
Kink : [ 0.503331 : 1 ]
Kink : [ 0.812159 : 2 ]
Kink : [ 0.827811 : 1 ]
Is this related? I am not sure whether this output indicates an error.
Regards, Mateusz
On Dec 29, 2011, at 9:59 AM, Matthias Troyer wrote:
Yes, indeed
On Dec 29, 2011, at 9:53 AM, Mateusz Łącki wrote:
Dear Matthias,
Thank you for your answer. If I understand correctly the problem is solved and results take into account this special case now?
Regards, Mateusz
This is a debug message which I added a while ago when solving a problem that we had because of finite resolution of floating point numbers. There is a chance of 1e-16 per site and unit imaginary time interval that two bosons hop away from two neighboring sites at exactly the same time. This case needs special consideration, and the notice was added to indicate that this case had happened.
Matthias
On Dec 29, 2011, at 9:39 AM, Mateusz Łącki wrote:
Dear All, While running some QMC (worm) computations by MPI over several nodes I noticed an "Avoided problem" message appearing from time to time. It does not seem particularly dangerous, but is it possible to find out what the problem was in the first place?
Regards, Mateusz Łącki
Hi,
I can only look into this if you send your input file.
Matthias
On 30 Dec 2011, at 12:05, Mateusz Łącki wrote:
Dear All, I have set up some computation which failed:
MPI_ABORT was invoked on rank 27 in communicator MPI_COMM_WORLD with errorcode -2.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.
mpiexec has exited due to process rank 27 with PID 15245 on node clone18 exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpiexec (as reported here).
just before that (in the output file, not sure about the time):
Avoided problem q = -0 state1 = 0 state2 = 0 bond_type = 0 id 2047013814 2047013814 Spin configuration: Wormheads at 78 0.999997 and 78 0.259301 Site: 0 Kink : [ 0.119336 : 0 ] Kink : [ 0.124065 : 1 ] Kink : [ 0.174815 : 0 ] Kink : [ 0.17605 : 1 ] Kink : [ 0.335094 : 2 ] Kink : [ 0.368865 : 1 ] (...) Site: 299 Kink : [ 0.00590279 : 2 ] Kink : [ 0.0326616 : 1 ] Kink : [ 0.0697665 : 0 ] Kink : [ 0.0977223 : 1 ] Kink : [ 0.254292 : 2 ] Kink : [ 0.256147 : 1 ] Kink : [ 0.328286 : 2 ] Kink : [ 0.329838 : 1 ] Kink : [ 0.405038 : 0 ] Kink : [ 0.438803 : 1 ] Kink : [ 0.487034 : 2 ] Kink : [ 0.503331 : 1 ] Kink : [ 0.812159 : 2 ] Kink : [ 0.827811 : 1 ]
Is this related? I am not sure whether this output indicates an error.
Regards, Mateusz
On Dec 29, 2011, at 9:59 AM, Matthias Troyer wrote:
Yes, indeed
On Dec 29, 2011, at 9:53 AM, Mateusz Łącki wrote:
Dear Matthias,
Thank you for your answer. If I understand correctly the problem is solved and results take into account this special case now?
Regards, Mateusz
This is a debug message which I added a while ago when solving a problem that we had because of finite resolution of floating point numbers. There is a chance of 1e-16 per site and unit imaginary time interval that two bosons hop away from two neighboring sites at exactly the same time. This case needs special consideration, and the notice was added to indicate that this case had happened.
Matthias
On Dec 29, 2011, at 9:39 AM, Mateusz Łącki wrote:
Dear All, While running some QMC (worm) computations by MPI over several nodes I noticed an "Avoided problem" message appearing from time to time. It does not seem particularly dangerous, but is it possible to find out what the problem was in the first place?
Regards, Mateusz Łącki
Dear Matthias, I attach the input file (parm5c), the modified models.xml and lattices.xml, and stdout and stderr in separate files (out, out2).
Regards, Mateusz Łącki
On Dec 30, 2011, at 5:36 PM, Matthias Troyer wrote:
Hi,
I can only look into this if you send your input file.
Matthias
On 30 Dec 2011, at 12:05, Mateusz Łącki wrote:
Dear All, I have set up some computation which failed:
MPI_ABORT was invoked on rank 27 in communicator MPI_COMM_WORLD with errorcode -2.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.
mpiexec has exited due to process rank 27 with PID 15245 on node clone18 exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpiexec (as reported here).
just before that (in the output file, not sure about the time):
Avoided problem q = -0 state1 = 0 state2 = 0 bond_type = 0 id 2047013814 2047013814 Spin configuration: Wormheads at 78 0.999997 and 78 0.259301 Site: 0 Kink : [ 0.119336 : 0 ] Kink : [ 0.124065 : 1 ] Kink : [ 0.174815 : 0 ] Kink : [ 0.17605 : 1 ] Kink : [ 0.335094 : 2 ] Kink : [ 0.368865 : 1 ] (...) Site: 299 Kink : [ 0.00590279 : 2 ] Kink : [ 0.0326616 : 1 ] Kink : [ 0.0697665 : 0 ] Kink : [ 0.0977223 : 1 ] Kink : [ 0.254292 : 2 ] Kink : [ 0.256147 : 1 ] Kink : [ 0.328286 : 2 ] Kink : [ 0.329838 : 1 ] Kink : [ 0.405038 : 0 ] Kink : [ 0.438803 : 1 ] Kink : [ 0.487034 : 2 ] Kink : [ 0.503331 : 1 ] Kink : [ 0.812159 : 2 ] Kink : [ 0.827811 : 1 ]
Is this related? I am not sure whether this output indicates an error.
Regards, Mateusz
On Dec 29, 2011, at 9:59 AM, Matthias Troyer wrote:
Yes, indeed
On Dec 29, 2011, at 9:53 AM, Mateusz Łącki wrote:
Dear Matthias,
Thank you for your answer. If I understand correctly the problem is solved and results take into account this special case now?
Regards, Mateusz
This is a debug message which I added a while ago when solving a problem that we had because of finite resolution of floating point numbers. There is a chance of 1e-16 per site and unit imaginary time interval that two bosons hop away from two neighboring sites at exactly the same time. This case needs special consideration, and the notice was added to indicate that this case had happened.
Matthias
On Dec 29, 2011, at 9:39 AM, Mateusz Łącki wrote:
Dear All, While running some QMC (worm) computations by MPI over several nodes I noticed an "Avoided problem" message appearing from time to time. It does not seem particularly dangerous, but is it possible to find out what the problem was in the first place?
Regards, Mateusz Łącki
It's hard debugging this if you launch so many jobs by MPI. Have you tried to see whether the problem also occurs if you don't use MPI? And, which version of ALPS do you use?
Matthias
On 30 Dec 2011, at 23:04, Mateusz Łącki wrote:
Dear Matthias, I attach input file (parm5c), modified models.xml and lattices.xml, stdout and stderr in separate files (out, out2)
Regards, Mateusz Łącki
<dalps.zip>
On Dec 30, 2011, at 5:36 PM, Matthias Troyer wrote:
Hi,
I can only look into this if you send your input file.
Matthias
On 30 Dec 2011, at 12:05, Mateusz Łącki wrote:
Dear All, I have set up some computation which failed:
MPI_ABORT was invoked on rank 27 in communicator MPI_COMM_WORLD with errorcode -2.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.
mpiexec has exited due to process rank 27 with PID 15245 on node clone18 exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpiexec (as reported here).
just before that (in the output file, not sure about the time):
Avoided problem q = -0 state1 = 0 state2 = 0 bond_type = 0 id 2047013814 2047013814 Spin configuration: Wormheads at 78 0.999997 and 78 0.259301 Site: 0 Kink : [ 0.119336 : 0 ] Kink : [ 0.124065 : 1 ] Kink : [ 0.174815 : 0 ] Kink : [ 0.17605 : 1 ] Kink : [ 0.335094 : 2 ] Kink : [ 0.368865 : 1 ] (...) Site: 299 Kink : [ 0.00590279 : 2 ] Kink : [ 0.0326616 : 1 ] Kink : [ 0.0697665 : 0 ] Kink : [ 0.0977223 : 1 ] Kink : [ 0.254292 : 2 ] Kink : [ 0.256147 : 1 ] Kink : [ 0.328286 : 2 ] Kink : [ 0.329838 : 1 ] Kink : [ 0.405038 : 0 ] Kink : [ 0.438803 : 1 ] Kink : [ 0.487034 : 2 ] Kink : [ 0.503331 : 1 ] Kink : [ 0.812159 : 2 ] Kink : [ 0.827811 : 1 ]
Is this related? I am not sure whether this output indicates an error.
Regards, Mateusz
On Dec 29, 2011, at 9:59 AM, Matthias Troyer wrote:
Yes, indeed
On Dec 29, 2011, at 9:53 AM, Mateusz Łącki wrote:
Dear Matthias,
Thank you for your answer. If I understand correctly the problem is solved and results take into account this special case now?
Regards, Mateusz
This is a debug message which I added a while ago when solving a problem that we had because of finite resolution of floating point numbers. There is a chance of 1e-16 per site and unit imaginary time interval that two bosons hop away from two neighboring sites at exactly the same time. This case needs special consideration, and the notice was added to indicate that this case had happened.
Matthias
One thing that might help me find which of the hundreds of simulations that you started caused the issue is if you apply the following patch and try again:
--- applications/qmc/worms/Wcheck.C (revision 5899)
+++ applications/qmc/worms/Wcheck.C (working copy)
@@ -94,6 +94,7 @@
 
 void WRun::print_spins()
 {
+  std::cout << parms;
   std::cout << "Spin configuration:\n";
   std::cout << "Wormheads at " << worm_head[0].site() << " " << worm_head[0].time()
             << " and " << worm_head[1].site() << " " << worm_head[1].time() << std::endl;
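(A note on applying it, in case it helps: assuming the diff is saved to a file, for example wcheck.diff, it can be applied from the top level of the ALPS source tree with the standard patch utility, e.g. "patch -p0 < wcheck.diff", and the worm application rebuilt afterwards; the file name here is only an example.)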
On 30 Dec 2011, at 23:26, Matthias Troyer wrote:
It's hard debugging this if you launch so many jobs by MPI. Have you tried to see whether the problem also occurs if you don't use MPI? And, which version of ALPS do you use?
Matthias
On 30 Dec 2011, at 23:04, Mateusz Łącki wrote:
Dear Matthias, I attach input file (parm5c), modified models.xml and lattices.xml, stdout and stderr in separate files (out, out2)
Regards, Mateusz Łącki
<dalps.zip>
On Dec 30, 2011, at 5:36 PM, Matthias Troyer wrote:
Hi,
I can only look into this if you send your input file.
Matthias
On 30 Dec 2011, at 12:05, Mateusz Łącki wrote:
Dear All, I have set up some computation which failed:
MPI_ABORT was invoked on rank 27 in communicator MPI_COMM_WORLD with errorcode -2.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.
mpiexec has exited due to process rank 27 with PID 15245 on node clone18 exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpiexec (as reported here).
just before that (in the output file, not sure about the time):
Avoided problem q = -0 state1 = 0 state2 = 0 bond_type = 0 id 2047013814 2047013814 Spin configuration: Wormheads at 78 0.999997 and 78 0.259301 Site: 0 Kink : [ 0.119336 : 0 ] Kink : [ 0.124065 : 1 ] Kink : [ 0.174815 : 0 ] Kink : [ 0.17605 : 1 ] Kink : [ 0.335094 : 2 ] Kink : [ 0.368865 : 1 ] (...) Site: 299 Kink : [ 0.00590279 : 2 ] Kink : [ 0.0326616 : 1 ] Kink : [ 0.0697665 : 0 ] Kink : [ 0.0977223 : 1 ] Kink : [ 0.254292 : 2 ] Kink : [ 0.256147 : 1 ] Kink : [ 0.328286 : 2 ] Kink : [ 0.329838 : 1 ] Kink : [ 0.405038 : 0 ] Kink : [ 0.438803 : 1 ] Kink : [ 0.487034 : 2 ] Kink : [ 0.503331 : 1 ] Kink : [ 0.812159 : 2 ] Kink : [ 0.827811 : 1 ]
Is this related? I am not sure whether this output indicates an error.
Regards, Mateusz
On Dec 29, 2011, at 9:59 AM, Matthias Troyer wrote:
Yes, indeed
On Dec 29, 2011, at 9:53 AM, Mateusz Łącki wrote:
Dear Matthias,
Thank you for your answer. If I understand correctly the problem is solved and results take into account this special case now?
Regards, Mateusz
Ok, I have set up the calculation again with the new patch. I will also try to run it on one core only (but this will take time). Will any output files, such as the h5 files, be of any help?
Regards, Mateusz
One thing that might help me find which of the hundreds of simulations that you started caused the issue is if you apply the following patch and try again:
--- applications/qmc/worms/Wcheck.C (revision 5899)
+++ applications/qmc/worms/Wcheck.C (working copy)
@@ -94,6 +94,7 @@
 
 void WRun::print_spins()
 {
+  std::cout << parms;
   std::cout << "Spin configuration:\n";
   std::cout << "Wormheads at " << worm_head[0].site() << " " << worm_head[0].time()
             << " and " << worm_head[1].site() << " " << worm_head[1].time() << std::endl;
On 30 Dec 2011, at 23:26, Matthias Troyer wrote:
It's hard debugging this if you launch so many jobs by MPI. Have you tried to see whether the problem also occurs if you don't use MPI? And, which version of ALPS do you use?
Matthias
On 30 Dec 2011, at 23:04, Mateusz Łącki wrote:
Dear Matthias, I attach input file (parm5c), modified models.xml and lattices.xml, stdout and stderr in separate files (out, out2)
Regards, Mateusz Łącki
<dalps.zip>
On Dec 30, 2011, at 5:36 PM, Matthias Troyer wrote:
Hi,
I can only look into this if you send your input file.
Matthias
On 30 Dec 2011, at 12:05, Mateusz Łącki wrote:
Dear All, I have set up some computation which failed:
MPI_ABORT was invoked on rank 27 in communicator MPI_COMM_WORLD with errorcode -2.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.
mpiexec has exited due to process rank 27 with PID 15245 on node clone18 exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpiexec (as reported here).
just before that (in the output file, not sure about the time):
Avoided problem q = -0 state1 = 0 state2 = 0 bond_type = 0 id 2047013814 2047013814 Spin configuration: Wormheads at 78 0.999997 and 78 0.259301 Site: 0 Kink : [ 0.119336 : 0 ] Kink : [ 0.124065 : 1 ] Kink : [ 0.174815 : 0 ] Kink : [ 0.17605 : 1 ] Kink : [ 0.335094 : 2 ] Kink : [ 0.368865 : 1 ] (...) Site: 299 Kink : [ 0.00590279 : 2 ] Kink : [ 0.0326616 : 1 ] Kink : [ 0.0697665 : 0 ] Kink : [ 0.0977223 : 1 ] Kink : [ 0.254292 : 2 ] Kink : [ 0.256147 : 1 ] Kink : [ 0.328286 : 2 ] Kink : [ 0.329838 : 1 ] Kink : [ 0.405038 : 0 ] Kink : [ 0.438803 : 1 ] Kink : [ 0.487034 : 2 ] Kink : [ 0.503331 : 1 ] Kink : [ 0.812159 : 2 ] Kink : [ 0.827811 : 1 ]
Is this related? I am not sure whether this output indicates an error.
Regards, Mateusz
On Dec 29, 2011, at 9:59 AM, Matthias Troyer wrote:
Yes, indeed
No, the output files are not needed. I just need to know which of the simulations caused the problem and the patch will help with that.
Matthias
On Dec 31, 2011, at 12:20 PM, Mateusz Łącki wrote:
Ok, I have set up the calculation again with the new patch. I will also try to do in one core only (but this will take time). Will any output files such that h5 files be of any help?
Regards, Mateusz
One thing that might help me find which of the hundreds of simulations that you started caused the issue is if you apply the following patch and try again:
--- applications/qmc/worms/Wcheck.C (revision 5899)
+++ applications/qmc/worms/Wcheck.C (working copy)
@@ -94,6 +94,7 @@
 
 void WRun::print_spins()
 {
+  std::cout << parms;
   std::cout << "Spin configuration:\n";
   std::cout << "Wormheads at " << worm_head[0].site() << " " << worm_head[0].time()
             << " and " << worm_head[1].site() << " " << worm_head[1].time() << std::endl;
On 30 Dec 2011, at 23:26, Matthias Troyer wrote:
It's hard debugging this if you launch so many jobs by MPI. Have you tried to see whether the problem also occurs if you don't use MPI? And, which version of ALPS do you use?
Matthias
On 30 Dec 2011, at 23:04, Mateusz Łącki wrote:
Dear Matthias, I attach input file (parm5c), modified models.xml and lattices.xml, stdout and stderr in separate files (out, out2)
Regards, Mateusz Łącki
<dalps.zip>
On Dec 30, 2011, at 5:36 PM, Matthias Troyer wrote:
Hi,
I can only look into this if you send your input file.
Matthias
On 30 Dec 2011, at 12:05, Mateusz Łącki wrote:
Dear All, I have set up some computation which failed:
MPI_ABORT was invoked on rank 27 in communicator MPI_COMM_WORLD with errorcode -2.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.
mpiexec has exited due to process rank 27 with PID 15245 on node clone18 exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpiexec (as reported here).
just before that (in the output file, not sure about the time):
Avoided problem q = -0 state1 = 0 state2 = 0 bond_type = 0 id 2047013814 2047013814 Spin configuration: Wormheads at 78 0.999997 and 78 0.259301 Site: 0 Kink : [ 0.119336 : 0 ] Kink : [ 0.124065 : 1 ] Kink : [ 0.174815 : 0 ] Kink : [ 0.17605 : 1 ] Kink : [ 0.335094 : 2 ] Kink : [ 0.368865 : 1 ] (...) Site: 299 Kink : [ 0.00590279 : 2 ] Kink : [ 0.0326616 : 1 ] Kink : [ 0.0697665 : 0 ] Kink : [ 0.0977223 : 1 ] Kink : [ 0.254292 : 2 ] Kink : [ 0.256147 : 1 ] Kink : [ 0.328286 : 2 ] Kink : [ 0.329838 : 1 ] Kink : [ 0.405038 : 0 ] Kink : [ 0.438803 : 1 ] Kink : [ 0.487034 : 2 ] Kink : [ 0.503331 : 1 ] Kink : [ 0.812159 : 2 ] Kink : [ 0.827811 : 1 ]
Is this related? I am not sure whether this output indicates an error.
Regards, Mateusz
Dear Matthias, I attach new output files. Sorry for the delay - I had misapplied the patch and needed to redo the whole procedure. Nevertheless, the parameters are printed.
Regards, Happy New Year Mateusz
On Dec 31, 2011, at 12:31 PM, Matthias Troyer wrote:
No, the output files are not needed. I just need to know which of the simulations caused the problem and the patch will help with that.
Matthias
On Dec 31, 2011, at 12:20 PM, Mateusz Łącki wrote:
Ok, I have set up the calculation again with the new patch. I will also try to do in one core only (but this will take time). Will any output files such that h5 files be of any help?
Regards, Mateusz
One thing that might help me find which of the hundreds of simulations that you started caused the issue is if you apply the following patch and try again:
--- applications/qmc/worms/Wcheck.C (revision 5899)
+++ applications/qmc/worms/Wcheck.C (working copy)
@@ -94,6 +94,7 @@
 
 void WRun::print_spins()
 {
+  std::cout << parms;
   std::cout << "Spin configuration:\n";
   std::cout << "Wormheads at " << worm_head[0].site() << " " << worm_head[0].time()
             << " and " << worm_head[1].site() << " " << worm_head[1].time() << std::endl;
On 30 Dec 2011, at 23:26, Matthias Troyer wrote:
It's hard debugging this if you launch so many jobs by MPI. Have you tried to see whether the problem also occurs if you don't use MPI? And, which version of ALPS do you use?
Matthias
On 30 Dec 2011, at 23:04, Mateusz Łącki wrote:
Dear Matthias, I attach input file (parm5c), modified models.xml and lattices.xml, stdout and stderr in separate files (out, out2)
Regards, Mateusz Łącki
<dalps.zip>
On Dec 30, 2011, at 5:36 PM, Matthias Troyer wrote:
Hi,
I can only look into this if you send your input file.
Matthias
Dear Matthias, Did you have any success troubleshooting the issue? I have run into it again (different compilation, different machine). The only common factor seems to be the large number of worms involved (in the newest case it was 50). Of course the error is identical, the *.xml files too, but the parameters are a little different.
Regards, Mateusz
On Dec 31, 2011, at 10:35 PM, Mateusz Łącki wrote:
Dear Matthias, I attach new output files. Sorry for the delay - I have misapplied the patch and needed to redo the whole procedure. Nevertheless the parameters are printed.
<dalps2.zip>
Regards, Happy New Year Mateusz
On Dec 31, 2011, at 12:31 PM, Matthias Troyer wrote:
No, the output files are not needed. I just need to know which of the simulations caused the problem and the patch will help with that.
Matthias
On Dec 31, 2011, at 12:20 PM, Mateusz Łącki wrote:
Ok, I have set up the calculation again with the new patch. I will also try to do in one core only (but this will take time). Will any output files such that h5 files be of any help?
Regards, Mateusz
One thing that might help me find which of the hundreds of simulations that you started caused the issue is if you apply the following patch and try again:
--- applications/qmc/worms/Wcheck.C (revision 5899)
+++ applications/qmc/worms/Wcheck.C (working copy)
@@ -94,6 +94,7 @@
 
 void WRun::print_spins()
 {
+  std::cout << parms;
   std::cout << "Spin configuration:\n";
   std::cout << "Wormheads at " << worm_head[0].site() << " " << worm_head[0].time()
             << " and " << worm_head[1].site() << " " << worm_head[1].time() << std::endl;
On 30 Dec 2011, at 23:26, Matthias Troyer wrote:
It's hard debugging this if you launch so many jobs by MPI. Have you tried to see whether the problem also occurs if you don't use MPI? And, which version of ALPS do you use?
Matthias
On 30 Dec 2011, at 23:04, Mateusz Łącki wrote:
Dear Matthias, I attach input file (parm5c), modified models.xml and lattices.xml, stdout and stderr in separate files (out, out2)
Regards, Mateusz Łącki
<dalps.zip>
Dear Matthias, I would like to add some new details (the "read" messages come from the very first patch for diagnosing the iostream fix; I was too lazy to unpack the source again). On one core with no MPI the problem is apparently less severe. At least I was able to obtain the following output:
Done with checkpoint.
Checking if it is finished: not yet, next check in 900 seconds ( 52% done).
Checking if it is finished: not yet, next check in 900 seconds ( 54% done).
Making regular checkpoint.
Checkpointing Simulation 1
read: 111
read: 98
read: 111
read: 98
Done with checkpoint.
Checking if it is finished: not yet, next check in 900 seconds ( 55% done).
Checking if it is finished: not yet, next check in 900 seconds ( 57% done).
Making regular checkpoint.
Checkpointing Simulation 1
read: 111
read: 98
read: 111
read: 98
Done with checkpoint.
Checking if it is finished: not yet, next check in 900 seconds ( 59% done).
Checking if it is finished: not yet, next check in 900 seconds ( 61% done).
Making regular checkpoint.
Checkpointing Simulation 1
read: 111
read: 98
read: 111
read: 98
Done with checkpoint.
Avoided problem
Avoided problem
Avoided problem
Avoided problem
Avoided problem
But the simulation stopped outputting anything 24h ago (it took over 1904 minutes altogether, so it took around 9h to calculate up to 60%). The problem looks substantially different from the MPI case.
The file was:
LATTICE="inhomogeneous open chain lattice"; L=30;
MODEL="trapped boson Hubbard"; NONLOCAL=0; U = 1.0; mu = 0.5; Nmax = 5;
T=0.04; t=0.05; K=0.00
MEASURE[Correlations] = 'True'; MEASURE_LOCAL[Occupation] = "n"; MEASURE_LOCAL[SlonTrabalski] = "n2"; MEASURE_CORRELATION[Czeslaw] = "n:n" THERMALIZATION=100000; SWEEPS=2000000; dasdaRESTRICT_MEASUREMENTS[N]=30
{t=0.55; mu=-0.315; L=120; RESTRICT_MEASUREMENTS[N]=120}
On 14 January 2012 at 19:22, Mateusz Łącki <mateusz.lacki@gmail.com> wrote:
Dear Matthias, Did you have any success troubleshooting the issue? I have run into it again (different compilation, different machine). The only common factor seems to be the large number of worms involved (in the newest case it was 50). Of course the error is identical, the *.xml files too, but the parameters are a little different.
Regards, Mateusz
On Dec 31, 2011, at 10:35 PM, Mateusz Łącki wrote:
Dear Matthias, I attach new output files. Sorry for the delay - I have misapplied the patch and needed to redo the whole procedure. Nevertheless the parameters are printed.
<dalps2.zip>
Regards, Happy New Year Mateusz
On Dec 31, 2011, at 12:31 PM, Matthias Troyer wrote:
No, the output files are not needed. I just need to know which of the simulations caused the problem and the patch will help with that.
Matthias
On Dec 31, 2011, at 12:20 PM, Mateusz Łącki wrote:
Ok, I have set up the calculation again with the new patch. I will also try to do in one core only (but this will take time). Will any output files such that h5 files be of any help?
Regards, Mateusz
One thing that might help me find which of the hundreds of simulations that you started caused the issue is if you apply the following patch and try again:
--- applications/qmc/worms/Wcheck.C (revision 5899)
+++ applications/qmc/worms/Wcheck.C (working copy)
@@ -94,6 +94,7 @@
 
 void WRun::print_spins()
 {
+  std::cout << parms;
   std::cout << "Spin configuration:\n";
   std::cout << "Wormheads at " << worm_head[0].site() << " " << worm_head[0].time()
             << " and " << worm_head[1].site() << " " << worm_head[1].time() << std::endl;
On 30 Dec 2011, at 23:26, Matthias Troyer wrote:
It's hard debugging this if you launch so many jobs by MPI. Have you tried to see whether the problem also occurs if you don't use MPI? And, which version of ALPS do you use?
Matthias
Dear Mateusz,
I will look at the issue as soon as I find some time to investigate it. Right now I have been too busy with other important things to do. But I have not forgotten your problem.
Matthias
On 17 Jan 2012, at 08:31, Mateusz Łącki wrote:
Dear Matthias, I would like to add some new details (the "read" messages come from the very first patch for diagnosing the iostream fix; I was too lazy to unpack the source again). On one core with no MPI the problem is apparently less severe. At least I was able to obtain the following output:
Done with checkpoint. Checking if it is finished: not yet, next check in 900 seconds ( 52% done). Checking if it is finished: not yet, next check in 900 seconds ( 54% done). Making regular checkpoint. Checkpointing Simulation 1 read: 111 read: 98 read: 111 read: 98 Done with checkpoint. Checking if it is finished: not yet, next check in 900 seconds ( 55% done). Checking if it is finished: not yet, next check in 900 seconds ( 57% done). Making regular checkpoint. Checkpointing Simulation 1 read: 111 read: 98 read: 111 read: 98 Done with checkpoint. Checking if it is finished: not yet, next check in 900 seconds ( 59% done). Checking if it is finished: not yet, next check in 900 seconds ( 61% done). Making regular checkpoint. Checkpointing Simulation 1 read: 111 read: 98 read: 111 read: 98 Done with checkpoint. Avoided problem Avoided problem Avoided problem Avoided problem Avoided problem
But the simulation stopped outputting anything 24h ago (it took over 1904 minutes altogether, so it took around 9h to calculate up to 60%). The problem looks substantially different from the MPI case.
The file was:
LATTICE="inhomogeneous open chain lattice"; L=30;
MODEL="trapped boson Hubbard"; NONLOCAL=0; U = 1.0; mu = 0.5; Nmax = 5;
T=0.04; t=0.05; K=0.00
MEASURE[Correlations] = 'True'; MEASURE_LOCAL[Occupation] = "n"; MEASURE_LOCAL[SlonTrabalski] = "n2"; MEASURE_CORRELATION[Czeslaw] = "n:n" THERMALIZATION=100000; SWEEPS=2000000; dasdaRESTRICT_MEASUREMENTS[N]=30
{t=0.55; mu=-0.315; L=120; RESTRICT_MEASUREMENTS[N]=120}
On 14 January 2012 at 19:22, Mateusz Łącki <mateusz.lacki@gmail.com> wrote:
Dear Matthias, Did you have any success troubleshooting the issue? I have run into it again (different compilation, different machine). The only common factor seems to be the large number of worms involved (in the newest case it was 50). Of course the error is identical, the *.xml files too, but the parameters are a little different.
Regards, Mateusz
On Dec 31, 2011, at 10:35 PM, Mateusz Łącki wrote:
Dear Matthias, I attach new output files. Sorry for the delay - I have misapplied the patch and needed to redo the whole procedure. Nevertheless the parameters are printed.
<dalps2.zip>
Regards, Happy New Year Mateusz
On Dec 31, 2011, at 12:31 PM, Matthias Troyer wrote:
No, the output files are not needed. I just need to know which of the simulations caused the problem and the patch will help with that.
Matthias
On Dec 31, 2011, at 12:20 PM, Mateusz Łącki wrote:
Ok, I have set up the calculation again with the new patch. I will also try to do in one core only (but this will take time). Will any output files such that h5 files be of any help?
Regards, Mateusz
One thing that might help me find which of the hundreds of simulations that you started caused the issue is if you apply the following patch and try again:
--- applications/qmc/worms/Wcheck.C (revision 5899)
+++ applications/qmc/worms/Wcheck.C (working copy)
@@ -94,6 +94,7 @@
 
 void WRun::print_spins()
 {
+  std::cout << parms;
   std::cout << "Spin configuration:\n";
   std::cout << "Wormheads at " << worm_head[0].site() << " " << worm_head[0].time()
             << " and " << worm_head[1].site() << " " << worm_head[1].time() << std::endl;
Dear Matthias, In the meantime I wish to report that I restarted the task and it completed on one core.
Regards, Mateusz
On 17 January 2012 at 16:38, Matthias Troyer <troyer@phys.ethz.ch> wrote:
Dear Mateusz,
I will look at the issue as soon as I find some time to investigate it. Right now I have been too busy with other important things to do. But I have not forgotten your problem.
Matthias
On 17 Jan 2012, at 08:31, Mateusz Łącki wrote:
Dear Matthias, I would like to add some new details (the "read" messages come from the very first patch for diagnosing the iostream fix; I was too lazy to unpack the source again). On one core with no MPI the problem is apparently less severe. At least I was able to obtain the following output:
Done with checkpoint. Checking if it is finished: not yet, next check in 900 seconds ( 52% done). Checking if it is finished: not yet, next check in 900 seconds ( 54% done). Making regular checkpoint. Checkpointing Simulation 1 read: 111 read: 98 read: 111 read: 98 Done with checkpoint. Checking if it is finished: not yet, next check in 900 seconds ( 55% done). Checking if it is finished: not yet, next check in 900 seconds ( 57% done). Making regular checkpoint. Checkpointing Simulation 1 read: 111 read: 98 read: 111 read: 98 Done with checkpoint. Checking if it is finished: not yet, next check in 900 seconds ( 59% done). Checking if it is finished: not yet, next check in 900 seconds ( 61% done). Making regular checkpoint. Checkpointing Simulation 1 read: 111 read: 98 read: 111 read: 98 Done with checkpoint. Avoided problem Avoided problem Avoided problem Avoided problem Avoided problem
But the simulation stopped outputting anything 24h ago (it took over 1904 minutes altogether, so it took around 9h to calculate up to 60%). The problem looks substantially different from the MPI case.
The file was:
LATTICE="inhomogeneous open chain lattice"; L=30;
MODEL="trapped boson Hubbard"; NONLOCAL=0; U = 1.0; mu = 0.5; Nmax = 5;
T=0.04; t=0.05; K=0.00
MEASURE[Correlations] = 'True'; MEASURE_LOCAL[Occupation] = "n"; MEASURE_LOCAL[SlonTrabalski] = "n2"; MEASURE_CORRELATION[Czeslaw] = "n:n" THERMALIZATION=100000; SWEEPS=2000000; dasdaRESTRICT_MEASUREMENTS[N]=30
{t=0.55; mu=-0.315; L=120; RESTRICT_MEASUREMENTS[N]=120}
W dniu 14 stycznia 2012 19:22 użytkownik Mateusz Łącki mateusz.lacki@gmail.com napisał:
Dear Matthias, Did you have any success troubleshooting the issue? I have run into it again (different compilation, different machine). The only common thing seems to be a large number of worms involved (in the newest case it was 50). Of course the error is identical, and the *.xml files too, but the parameters are a little different.
Regards, Mateusz
On Dec 31, 2011, at 10:35 PM, Mateusz Łącki wrote:
Dear Matthias, I attach new output files. Sorry for the delay - I had misapplied the patch and needed to redo the whole procedure. Nevertheless, the parameters are printed.
<dalps2.zip>
Regards, Happy New Year Mateusz
On Dec 31, 2011, at 12:31 PM, Matthias Troyer wrote:
No, the output files are not needed. I just need to know which of the simulations caused the problem and the patch will help with that.
Matthias
On Dec 31, 2011, at 12:20 PM, Mateusz Łącki wrote:
Ok, I have set up the calculation again with the new patch. I will also try to run it on one core only (but this will take time). Will any output files, such as the h5 files, be of any help?
Regards, Mateusz
> One thing that might help me find which of the hundreds of simulations that you started caused the issue is if you apply the following patch and try again:
>
> --- applications/qmc/worms/Wcheck.C (revision 5899)
> +++ applications/qmc/worms/Wcheck.C (working copy)
> @@ -94,6 +94,7 @@
>
>  void WRun::print_spins()
>  {
> +  std::cout << parms;
>    std::cout << "Spin configuration:\n";
>    std::cout << "Wormheads at " << worm_head[0].site() << " " << worm_head[0].time()
>              << " and " << worm_head[1].site() << " " << worm_head[1].time() << std::endl;
Dear Matthias, Did you have time to investigate the issue? Is it perhaps fixed in newer versions of ALPS (the online trunk)? After our correspondence I have run into the issue several times (always after lengthy calculations; I have never seen a case where the kink output appears right away - typically it shows up after several hours of running). It occurs in 2.2 as well.
Is there perhaps a quick “dirty” fix one can do?
Best, Mateusz
I think now the problem is slightly different, as I get something else too:

q = -0 state1 = 0 state2 = 0 bond_type = 0 id 1173987462 1173987462
zero matrix element in remove_jump
This appears to be caused by low temperature (such as T=0.02) and a BH model with high Nmax=6 (however, this is a guess based on a single run).
Is it possible that the “weight” of a configuration with occupation 5 is something like exp(-50*( 5*(5-1)/2 )) = exp(-50 * 10) = underflow, and this kills the worm code?
Let me point out that I am guessing at this point, having little understanding of the code.
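As a quick sanity check of the order of magnitude behind this guess, here is a minimal Python sketch, independent of the ALPS code, that just evaluates the suspected on-site Boltzmann factor exp(-beta*U*n(n-1)/2) for the values quoted above (T = 0.02, i.e. beta = 50, U = 1, n = 5) and compares it with the smallest normal double:

import math
import sys

beta, U, n = 50.0, 1.0, 5
weight = math.exp(-beta * U * n * (n - 1) / 2)   # exp(-500)

print(weight)                        # ~7.1e-218
print(sys.float_info.min)            # ~2.2e-308, smallest normal double
print(weight < sys.float_info.min)   # False: the factor itself is still representable

So for these parameters the factor is astronomically small but not yet a true IEEE underflow; whether ratios of such tiny weights lose all precision inside the worm updates is a separate question that only the code itself can answer.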
Please use the new dwa code for doing simulations with the worm algorithm. The old worm code is about to be deprecated.
Matthias
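For reference, a minimal sketch of how a dwa run can be driven from Python with pyalps, loosely following the ALPS DWA tutorials. The lattice, model and parameter values below are placeholders adapted from the input files quoted earlier in this thread, and the exact parameter set that dwa expects may differ:

import pyalps

# placeholder parameters modeled on the worm input quoted earlier in the thread;
# consult the DWA tutorials for the parameters the dwa application actually uses
parms = [{
    'LATTICE'        : 'square lattice',
    'MODEL'          : 'boson Hubbard',
    'L'              : 20,
    'Nmax'           : 5,
    't'              : 0.05,
    'U'              : 1.0,
    'mu'             : 0.5,
    'T'              : 0.04,
    'THERMALIZATION' : 100000,
    'SWEEPS'         : 2000000,
}]

input_file = pyalps.writeInputFiles('parm_dwa', parms)   # writes parm_dwa.in.xml
pyalps.runApplication('dwa', input_file)                 # runs the directed worm algorithm application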
Dear All, I want to average results from a DWA simulation of a BH model to get the 2D density of a gas in a harmonic trap.

I was trying to use the function "collectXY" in pyalps to combine simulation results, but the following issues arise:
- I can average any measurement that produces a single number: collectXY('J', 'Stiffness') as in tutorial5a.py.
- How can I average the local density? There the 'y' in the hdf5 files is no longer a number but a 2D array of dimensions Lx, Ly, and 'x' is a 1D vector of size Lx*Ly; a naive application of collectXY results in an error about mismatching tensor dimensions.

Is there an "official" way to do it?
In http://alps.comp-phys.org/mediawiki/index.php/ALPS_2_Tutorials:DWA-03_Time_o...
there is an example of plotting the Green function. Can I save the density measurement to a file somehow instead of plotting?
Best, Mateusz
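On the reshaping and saving question, here is a minimal sketch of one possible approach with pyalps and numpy. The result-file prefix, the observable label 'Local Density', the grid size and the row-major ordering of sites are all assumptions that have to be adapted to the actual dwa output (pyalps.loadObservableList shows the available names):

import numpy as np
import pyalps

Lx, Ly = 40, 40   # assumed trap grid; use the lattice size from your own input

# 'parm_dwa' and 'Local Density' are placeholders for the actual prefix and observable name
data = pyalps.loadMeasurements(pyalps.getResultFiles(prefix='parm_dwa'), 'Local Density')

for d in pyalps.flatten(data):
    # strip error bars if the entries carry them, then arrange the means on the 2D grid
    mean = np.array([float(getattr(v, 'mean', v)) for v in d.y])
    density = mean.reshape(Lx, Ly)   # assumes sites are stored row by row
    # write a plain text matrix instead of plotting it
    np.savetxt('density_t%s.dat' % d.props['t'], density)

This sidesteps collectXY entirely: for a vector-valued observable it is usually easier to post-process the loaded arrays with numpy than to push them through the scalar-oriented collectXY interface.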
Dear All, I would like to ask what exactly "MEASURE_GREEN" in the worm code is (it is untested and should not be used). Is it <b_j^dag b_i> or some version of the thermal Green function? If the latter, is there a way to measure the correlations "bdag:b"? I think that MEASURE_CORRELATIONS[Name]="bdag:b" gets ignored (no relevant output is found). On the other hand, MEASURE_CORRELATIONS[Name]="n:n" works fine.
Kind regards, Mateusz
On Jan 11, 2012, at 5:14 PM, Mateusz Łącki wrote:
Dear All, I would like to ask what exactly "MEASURE_GREEN" in the worm code is (it is untested and should not be used). Is it <b_j^dag b_i> or some version of the thermal Green function?
yes
If the latter, is there a way to measure the correlations "bdag:b"? I think that MEASURE_CORRELATIONS[Name]="bdag:b" gets ignored (no relevant output is found).
Indeed, only diagonal correlations can be measured. You need to wait for the next release to measure the Green's function.
On the other hand, MEASURE_CORRELATIONS[Name]="n:n" works fine.
Correct
Matthias
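For completeness, a small sketch of how a diagonal correlator defined this way (for example the "n:n" measurement labelled Czeslaw in the input file quoted earlier) might be read back with pyalps; the result-file prefix is an assumption:

import pyalps

files = pyalps.getResultFiles(prefix='parm5c')     # assumed prefix; adjust to your own run
data = pyalps.loadMeasurements(files, 'Czeslaw')   # label chosen in MEASURE_CORRELATIONS[...]

for d in pyalps.flatten(data):
    print(d.props['observable'])
    print(d.y[:5])   # first few entries of the n:n correlator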