Type "man dsyev" in the terminal. If LAPACK is properly installed, you should see some documentation for this LAPACK subroutine. Can you find the explicit value of the "info" variable after the error occurs? As you will see in the manual page for "dsyev", the value of "info" gives clues about the nature of the error (for dsyev, info = i > 0 means that i off-diagonal elements of an intermediate tridiagonal form did not converge to zero).
On Fri, Feb 6, 2009 at 12:11 PM, Justin David Peel justin.peel@utah.edu wrote:
I don't know the answers to all of those things (I'm not very experienced with Linux), but I do know some. The jobs are not all running in the same shared folder; I am using a scratch folder (the cluster administrators direct us to do so because disk access is faster there). The machine is 64-bit. I don't know about the LAPACK library (I'm not sure how to check); I didn't have to specify a location for it or for the BLAS library. I just followed the ALPS installation instructions, so I don't know if I really built it for 64-bit or not. I've tried looking around for the LAPACK library but haven't found it. Maybe I'll have to ask the people who run the cluster about this.
Thanks for the reply, Justin
-----Original Message-----
From: comp-phys-alps-users-bounces@phys.ethz.ch on behalf of Jeff Hammond
Sent: Fri 2/6/2009 9:45 AM
To: comp-phys-alps-users@phys.ethz.ch
Subject: Re: [ALPS-users] Some errors while running dmrg
This indicates a problem in LAPACK:
*** ERROR in dsyev: info != 0 (failed to converge)
I have had issues in the past with certain proprietary LAPACK libraries failing to converge in cases where the slower Netlib reference implementation converges fine.
What library are you using? Is your machine 64-bit? Are you compiling ALPS for 64-bit integers? Is your BLAS/LAPACK for 64-bit integers?
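A couple of quick shell checks can help answer these questions (the paths and script name here are just placeholders; point BIN at your actual dmrg executable):

```shell
#!/bin/sh
# Inspect a binary: is it 64-bit, and which BLAS/LAPACK is it linked against?
# BIN defaults to /bin/sh only so the script runs anywhere as written;
# on the cluster you would pass the path to your dmrg binary instead.
BIN=${1:-/bin/sh}

# "ELF 64-bit" in this output means a 64-bit executable.
file "$BIN"

# List any dynamically linked BLAS/LAPACK libraries.
ldd "$BIN" | grep -i -E 'lapack|blas' || echo "no dynamically linked BLAS/LAPACK found for $BIN"
```

Running it as `sh check.sh ./dmrg` would show whether the binary is 64-bit and which LAPACK it actually picked up at link time. Note that a statically linked binary shows nothing under `ldd`, in which case you would have to check the build logs instead.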
When you're running multiple independent jobs on the cluster, are they all running in the same directory on a shared file system? The DMRG scratch files aren't named uniquely for each job, so if they are all written to one directory on the same file system, each job will overwrite the others' files. This can cause all sorts of terrible things to occur.
You might want to set up your job submission script to create a temporary scratch directory for each job to run in, and have the script copy your final output files back to whatever directory you submitted the job from, for example.
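A minimal sketch of such a wrapper, assuming a plain sh batch script (the dmrg invocation is commented out as a placeholder, since the actual command line depends on your setup):

```shell
#!/bin/sh
# Run each job in its own unique scratch directory, then copy results back.
SUBMIT_DIR=$(pwd)                                   # directory the job was submitted from
SCRATCH=$(mktemp -d "${TMPDIR:-/tmp}/dmrg.XXXXXX")  # unique per-job scratch directory

cd "$SCRATCH" || exit 1

# ... run the actual calculation here, e.g.:
# dmrg my_input_file

# Copy any output back to the submission directory and clean up.
cp -p "$SCRATCH"/* "$SUBMIT_DIR"/ 2>/dev/null
cd "$SUBMIT_DIR" && rm -rf "$SCRATCH"
```

Because `mktemp -d` generates a unique directory name per invocation, concurrent jobs can never step on each other's scratch files even when they start from the same submission directory.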
And no, GCC is not the problem here.
Jeff
On Fri, Feb 6, 2009 at 10:35 AM, Justin David Peel justin.peel@utah.edu wrote:
I have recently been running a lot of dmrg calculations on a Linux
cluster. I realize that the dmrg program is not parallelized, but I run a lot of separate jobs on separate processors. However, the cluster is set up with 2 processors per node. I'm using ALPS 1.3.3 with Boost 1.34.1 and lp_solve 4.0. I had to specify where the mpich2 files were when I compiled, as well as specifying the compiler as GNU (maybe that's the problem?). I was told by the support staff of the Linux cluster to try that.
The most distressing error has been "St9bad_alloc", which crashes the
program most of the time (sometimes it is able to keep going). I recently received that error followed by five lines of:
*** ERROR in dsyev: info != 0 (failed to converge)
when I was running a 2D heisenberg model (4x4 lattice).
I also have received the error:
*** glibc detected *** double free or corruption (!prev): 0x00000000007b34a0 ***
but it never seems to crash the program and the results seem to be
fine, so I'm not as worried about that one. I don't get these errors all the time, but they worry me all the same. Any ideas on what might be wrong? Is it because I used GNU as the compiler?
Thanks, Justin
-- Jeff Hammond The University of Chicago http://home.uchicago.edu/~jhammond/