Type "man dsyev" in the terminal. If LAPACK is properly installed, you should see some documentation for this LAPACK subroutine. Can you find the explicit value of the "info" variable after the error occurs? As you will see in the manual page for "dsyev", the value of "info" gives clues about the nature of the error (for dsyev, info = i > 0 means that i off-diagonal elements of an intermediate tridiagonal form did not converge to zero).
On Fri, Feb 6, 2009 at 12:11 PM, Justin David Peel justin.peel@utah.edu wrote:
I don't know the answers to all of those things (I'm not very experienced with Linux), but I do know some. The jobs are not all running in the same shared folder; I am using a scratch folder (the cluster administrators direct us to do so because disk access is faster there). The machine is 64-bit. I don't know about the LAPACK library (I'm not sure how to check); I didn't have to specify a location for it or for the BLAS library. I just followed the ALPS installation instructions, so I don't know if I really built it for 64-bit or not. I've tried looking around for the LAPACK library but haven't found it. Maybe I'll have to ask the people who run the cluster about this.
Thanks for the reply, Justin
-----Original Message-----
From: comp-phys-alps-users-bounces@phys.ethz.ch on behalf of Jeff Hammond
Sent: Fri 2/6/2009 9:45 AM
To: comp-phys-alps-users@phys.ethz.ch
Subject: Re: [ALPS-users] Some errors while running dmrg
This indicates a problem in LAPACK:
*** ERROR in dsyev: info != 0 (failed to converge)
I have had issues in the past with certain proprietary LAPACK libraries failing to converge in cases where the slower Netlib reference implementation converges fine.
What library are you using? Is your machine 64-bit? Are you compiling ALPS for 64-bit integers? Is your BLAS/LAPACK for 64-bit integers?
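A couple of quick shell checks can help answer these questions (the paths and script name here are just placeholders; point BIN at your actual dmrg executable):

```shell
#!/bin/sh
# Inspect a binary: is it 64-bit, and which BLAS/LAPACK is it linked against?
# BIN defaults to /bin/sh only so the script runs anywhere as written;
# on the cluster you would pass the path to your dmrg binary instead.
BIN=${1:-/bin/sh}

# "ELF 64-bit" in this output means a 64-bit executable.
file "$BIN"

# List any dynamically linked BLAS/LAPACK libraries.
ldd "$BIN" | grep -i -E 'lapack|blas' || echo "no dynamically linked BLAS/LAPACK found for $BIN"
```

Running it as `sh check.sh ./dmrg` would show whether the binary is 64-bit and which LAPACK it actually picked up at link time. Note that a statically linked binary shows nothing under `ldd`, in which case you would have to check the build logs instead.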
When you're running multiple independent jobs on the cluster, are they all running in the same directory on a shared file system? The DMRG scratch files aren't named uniquely for each job, so if they are all written to one directory on the same file system, each job will overwrite the others' files. This can cause all sorts of terrible things to occur.
You might want to set up your job submission script to create a temporary scratch directory for each job to run in, and have the script copy your final output files back to whatever directory you submitted the job from, for example.
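A minimal sketch of such a wrapper, assuming a plain sh batch script (the dmrg invocation is commented out as a placeholder, since the actual command line depends on your setup):

```shell
#!/bin/sh
# Run each job in its own unique scratch directory, then copy results back.
SUBMIT_DIR=$(pwd)                                   # directory the job was submitted from
SCRATCH=$(mktemp -d "${TMPDIR:-/tmp}/dmrg.XXXXXX")  # unique per-job scratch directory

cd "$SCRATCH" || exit 1

# ... run the actual calculation here, e.g.:
# dmrg my_input_file

# Copy any output back to the submission directory and clean up.
cp -p "$SCRATCH"/* "$SUBMIT_DIR"/ 2>/dev/null
cd "$SUBMIT_DIR" && rm -rf "$SCRATCH"
```

Because `mktemp -d` generates a unique directory name per invocation, concurrent jobs can never step on each other's scratch files even when they start from the same submission directory.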
And no, GCC is not the problem here.
Jeff
On Fri, Feb 6, 2009 at 10:35 AM, Justin David Peel justin.peel@utah.edu wrote:
I have recently been running a lot of dmrg calculations on a Linux
cluster. I realize that the dmrg program is not parallelized, but I run a lot of separate jobs on separate processors. However, the cluster is set up with 2 processors per node. I'm using ALPS 1.3.3 with Boost 1.34.1 and lp_solve 4.0. I had to specify where the mpich2 files were when I compiled, as well as specifying the compiler as GNU (maybe that's the problem?). I was told by the support staff of the Linux cluster to try that.
The most distressing error has been "St9bad_alloc", which crashes the
program most of the time (sometimes it is able to keep going). I recently received that error followed by five lines of:
*** ERROR in dsyev: info != 0 (failed to converge)
when I was running a 2D heisenberg model (4x4 lattice).
I also have received the error:
*** glibc detected *** double free or corruption (!prev): 0x00000000007b34a0 ***
but it never seems to crash the program and the results seem to be
fine, so I'm not as worried about that one. I don't get these errors all the time, but they worry me all the same. Any ideas on what might be wrong? Is it because I used GNU as the compiler?
Thanks, Justin
-- Jeff Hammond The University of Chicago http://home.uchicago.edu/~jhammond/