Hi all, Is it not possible to define a simple graph with a complex hopping parameter in the Hamiltonian when using full diagonalization?
For example, a 4-site chain is defined using a simple graph as
<LATTICES>
  <GRAPH name="Chain" vertices="4">
    <VERTEX id="1" type="0"></VERTEX>
    <VERTEX id="2" type="0"></VERTEX>
    <VERTEX id="3" type="0"></VERTEX>
    <VERTEX id="4" type="0"></VERTEX>
    <EDGE type="0" source="1" target="2"/>
    <EDGE type="0" source="2" target="3"/>
    <EDGE type="0" source="3" target="4"/>
  </GRAPH>
</LATTICES>
which I could also get from the lattices.xml file in the lattice library. I found that when a lattice from the library is used, complex hopping works fine, but with the simple graph above I get the error 'cannot convert complex number into real one'. Is there any reason for this error? The reason I want to use a simple graph is that I want to use a different lattice structure for my calculations.
Thanks
Akin
I have recently been running a lot of dmrg calculations on a Linux cluster. I realize that the dmrg program is not parallelized, but I run many separate programs on separate processors. However, the clusters are set up so that there are 2 processors per node. I'm using ALPS 1.3.3 with Boost 1.34.1 and lp_solve 4.0. I had to specify where the MPICH2 files were when I compiled, as well as specifying the compiler as GNU (maybe that's the problem?). I was told by the support staff of the Linux cluster to try that.
The most distressing error has been "St9bad_alloc" which crashes the program most of the time (sometimes it is able to keep going). I recently received that error followed by five lines of: *** ERROR in dsyev: info != 0 (failed to converge)
when I was running a 2D heisenberg model (4x4 lattice).
I also have received the error: *** glibc detected *** double free or corruption (!prev): 0x00000000007b34a0 ***
but it never seems to crash the program and the results seem fine, so I'm not as worried about that one. I don't receive these errors all the time, but they worry me all the same. Any ideas on what might be wrong? Is it because I used GNU as the compiler?
Thanks, Justin
This indicates a problem in LAPACK:
*** ERROR in dsyev: info != 0 (failed to converge)
I have had issues in the past with certain proprietary LAPACK libraries failing to converge where the slower Netlib reference version succeeds.
What library are you using? Is your machine 64-bit? Are you compiling ALPS for 64-bit integers? Is your BLAS/LAPACK for 64-bit integers?
When you're running multiple independent jobs on the cluster, are you running them in the same directory on a shared file system? The DMRG scratch files aren't named uniquely for each job, so if they are all written to one directory on the same filesystem, each job will overwrite the other jobs' files. This can cause all sorts of terrible things to occur.
You might want to set up your job submission script to create a temporary scratch directory for each job to run in, and have the script copy the final output files back to whatever directory you submitted the job from. For example, something along these lines:
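A minimal sketch of such a PBS submission script, assuming the cluster provides a per-user scratch area under /scratch and that the parameter file is called parms; the paths, resource limits, and the dmrg location are placeholders, not taken from this thread:

#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -l walltime=24:00:00

# unique scratch directory per job, so DMRG temporary files never collide
SCRATCH=/scratch/$USER/$PBS_JOBID
mkdir -p "$SCRATCH"
cd "$SCRATCH"

# copy the input from the submission directory, run the serial dmrg, copy results back
cp "$PBS_O_WORKDIR/parms" .
"$HOME/alps/bin/dmrg" parms > dmrg.out 2>&1
cp -r ./* "$PBS_O_WORKDIR/"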
And no, GCC is not the problem here.
Jeff
I don't know the answers to all of those things (I'm not very experienced with Linux), but I do know some. The jobs are not all running in the same shared folder; I am using a scratch folder (the Linux cluster people direct us to do so because it gives faster disk access). The machine is 64-bit. I don't know about the LAPACK library (I'm not sure how to check). I didn't have to specify a location for that one or for the BLAS library. I just followed the ALPS installation instructions, so I don't know if I really installed it for 64-bit or not. I've tried looking around for the LAPACK library but haven't found it. Maybe I'll have to ask the people who run the cluster about this.
Thanks for the reply, Justin
type "man dsyev" in the terminal. if Lapack is properly installed, you should be able to see some explanation about this LAPACK subroutine. Can you find the explicit value of the "info" variable after the occurance of the error. As you will see in the manual page of "dsyev" that the value of "info" gives clues about the nature of the error.
Dear Justin,
To properly diagnose your problem we need some help from you. Could you please send us the output file? Could you also try running it under gdb? You need to follow these steps:
$ gdb ./dmrg
(gdb) run parms_file
... crash!
(gdb) bt
and you send us the output.
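If it is awkward to run gdb interactively on the cluster, an equivalent non-interactive invocation (assuming GNU gdb and a parameter file called parms_file) would be:

$ gdb -batch -ex run -ex bt --args ./dmrg parms_file > gdb_output.txt 2>&1

which runs the program and, if it crashes, writes the backtrace into gdb_output.txt.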
Another thing: can you tell us a little bit about the problem you are studying? It may help.
Thank you, Saludos, <ADRIAN>
I think that I may have figured it out. I think it has to do with my entering one of the mpi commands incorrectly. This may not be all of the problems, but I'll let you know if I have any more of them. I was having trouble duplicating the errors in gdb, but that was probably because of my ignorance of the mpi commands. It could also be that these errors are very infrequent. I'll just have to see. Thanks so much for being willing to help and being patient with me.
Justin
Hi Justin,
I don't know why you are using MPI at all. As you have already noticed, the dmrg code is serial. You only need to use your queuing system to submit serial jobs on different nodes. Just make sure you are running different jobs in different directories. If you run in interactive mode or under gdb and everything works fine, the problem is elsewhere.
Saludos, <ADRIAN>
The reason I have been using mpi is because I have only seen examples for these clusters using mpi. I know that dmrg is serial. I'll see if I can figure out how to submit the jobs without mpi. I am running different jobs in different directories.
Thanks, Justin
What queue system are you using? Can you post your submission script?
You should be able to just replace the line that looks like this:
mpirun (mpi commands) inputfile > outputfile
with this:
executable inputfile > outputfile
If that doesn't work, trial-and-error should lead to a solution in less than 5 iterations, if my experience holds true.
You may need to recompile the executable with non-MPI compilers if you used CXX=mpicxx the first time.
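A quick way to check whether the existing binary actually pulls in MPI (and which BLAS/LAPACK it links against), assuming it is dynamically linked:

$ ldd ./dmrg | grep -iE 'mpi|lapack|blas'

If nothing MPI-related shows up, the binary itself is already serial and only the submission script needs to change.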
Jeff
I'm using PBS. I had already decided to try what you suggested; I just never thought to do that before because I was blindly following the only examples posted by the cluster admin. It seems to be working so far.
Thanks again, Justin