BAD TERMINATION using epw.x 

Joined: Thu May 05, 2016 5:18 pm
Posts: 63
University: Rondonia university
Post BAD TERMINATION using epw.x
Dear all,

I am running EPW-4.3 with QE-6.2.1. Once the "amn" files have been calculated, the code stops with the following error:

"AMN calculated

MMN
k points = 576 in 512 pools
1 of 2 on ionode
2 of 2 on ionode

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 17975 RUNNING AT kcn464.local
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES"

Checking the error file I get:

"forrtl: error (69): process interrupted (SIGINT)
Image PC Routine Line Source
epw.x 00000000010A1A21 Unknown Unknown Unknown
epw.x 00000000010A0177 Unknown Unknown Unknown
epw.x 0000000000FA3464 Unknown Unknown Unknown
epw.x 0000000000FA3276 Unknown Unknown Unknown
epw.x 0000000000F23B04 Unknown Unknown Unknown
epw.x 0000000000F2B887 Unknown Unknown Unknown
libpthread.so.0 00002ADC89925500 Unknown Unknown Unknown
libc.so.6 00002ADC89E863A7 Unknown Unknown Unknown
libmpi.so.12 00002ADC88E56034 Unknown Unknown Unknown
libmpi.so.12 00002ADC88E503E3 Unknown Unknown Unknown
libmpi.so.12 00002ADC88F61CE9 Unknown Unknown Unknown
libmpi.so.12 00002ADC88F619DF Unknown Unknown Unknown
libmpi.so.12 00002ADC88E2385C Unknown Unknown Unknown
libmpi.so.12 00002ADC88E2773D Unknown Unknown Unknown
libmpi.so.12 00002ADC88E26F6E Unknown Unknown Unknown
libmpifort.so.12 00002ADC889FEC57 Unknown Unknown Unknown
epw.x 0000000000D04526 reduce_base_real_ 303 mp_base.f90
epw.x 0000000000CF6519 mp_mp_mp_sum_c4d_ 1537 mp.f90
epw.x 00000000005059A6 compute_mmn_para_ 1125 pw2wan90epw.f90
epw.x 00000000004F2FED pw2wan90epw_ 78 pw2wan90epw.f90
epw.x 00000000004F100C wann_run_ 69 wannierize.f90
epw.x 0000000000412C6E MAIN__ 137 epw.f90
epw.x 0000000000411EDE Unknown Unknown Unknown
libc.so.6 00002ADC89DD5CDD Unknown Unknown Unknown
epw.x 0000000000411DE9 Unknown Unknown Unknown"

I suspect that this is due to the number of processors used. I am using

mpirun -np 512 /......./EPW/bin/epw.x -npool 512 < epw.in > epw.out

I need such a large number to accelerate the calculations. In fact, I usually define the nodes in the PBS file as:

select= ;ncpus=;mpiprocs=;ompthreads=;

How can I define the variables above so that the number of processors equals the number of pools while spreading the job over more than one node, that is, without having to put 512 processors on a single node?
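As a rough illustration of the resource request asked about here, the sketch below computes a chunk specification that spreads a given number of MPI ranks over several nodes. It assumes PBS Professional's colon-separated form `select=N:ncpus=C:mpiprocs=M:ompthreads=T` and 64 cores per node. The helper name `pbs_select_line` and both numbers are illustrative only and should be checked against the local cluster documentation.

```python
import math


def pbs_select_line(total_ranks: int, cores_per_node: int, omp_threads: int = 1) -> str:
    """Build a PBS Pro style 'select' string for total_ranks MPI processes.

    Assumption: every node offers cores_per_node cores and each MPI rank
    uses omp_threads OpenMP threads (1 means a pure-MPI run).
    """
    if total_ranks <= 0 or cores_per_node <= 0 or omp_threads <= 0:
        raise ValueError("all arguments must be positive")
    # MPI ranks that fit on one node once the threads per rank are accounted for
    ranks_per_node = cores_per_node // omp_threads
    if ranks_per_node == 0:
        raise ValueError("omp_threads exceeds cores_per_node")
    # Round up so that every rank gets a slot; the last node may be partly idle
    nodes = math.ceil(total_ranks / ranks_per_node)
    return (
        f"select={nodes}:ncpus={cores_per_node}"
        f":mpiprocs={ranks_per_node}:ompthreads={omp_threads}"
    )


if __name__ == "__main__":
    # 512 MPI ranks on hypothetical 64-core nodes
    print(pbs_select_line(512, 64))
```

For 512 ranks on 64-core nodes this yields `select=8:ncpus=64:mpiprocs=64:ompthreads=1`, which would go on a `#PBS -l` line, while the mpirun command keeps `-np 512` and `-npool 512`.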

Thanks in advance

Elio

_________________
Physics Department
university of Rondonia Brazil
Porto Velho- Rondonia


Mon Jun 11, 2018 12:22 am

Joined: Thu May 05, 2016 5:18 pm
Posts: 63
University: Rondonia university
Post Re: BAD TERMINATION using epw.x
Just a quick update on the matter...

I performed the scf, nscf and epw calculations using np 64 and npool 64 by issuing

mpirun -np 64 /path/to/executable/pw.x -npool 64 <scf.in > scf.out

mpirun -np 64 /path/to/executable/pw.x -npool 64 <nscf.in > nscf.out

mpirun -np 64 /path/to/executable/epw.x -npool 64 <epw.in > epw.out

All went fine until the MMN calculation. The code simply froze at:

MMN
k points = 576 in 64 pools
1 of 9 on ionode
2 of 9 on ionode
3 of 9 on ionode
4 of 9 on ionode
5 of 9 on ionode
6 of 9 on ionode
7 of 9 on ionode
8 of 9 on ionode
9 of 9 on ionode
MMN calculated
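The "1 of 2" and "9 of 9" counters in the two logs are consistent with a simple block distribution of k-points over pools. The sketch below reproduces those numbers, assuming that the first `remainder` pools each take one extra k-point. The function name `kpoints_per_pool` is hypothetical and is not taken from the EPW source.

```python
def kpoints_per_pool(nk_total: int, npool: int) -> list[int]:
    """Return how many k-points each pool receives.

    Assumption: k-points are split as evenly as possible and the first
    (nk_total % npool) pools get one extra point.
    """
    if npool <= 0 or nk_total < 0:
        raise ValueError("npool must be positive and nk_total non-negative")
    base, remainder = divmod(nk_total, npool)
    return [base + 1 if pool < remainder else base for pool in range(npool)]


if __name__ == "__main__":
    for npool in (512, 64):
        counts = kpoints_per_pool(576, npool)
        # Pool 0 hosts the ionode, whose counter appears in the log
        print(npool, "pools: ionode handles", counts[0], "k-points;",
              "min", min(counts), "max", max(counts))
```

Under this assumption, 576 k-points in 512 pools leave 64 pools with two points and 448 pools with one, so the slowest pools do as much work as they would with 288 pools. With 64 pools every pool gets exactly nine points.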

Also, the wout file did not contain the Wannier function spreads, which are usually obtained when using the W90 code alone with the wannier90.x and pw2wannier90.x executables.

Any idea what is going on? Is it still a wrong choice of procs and npool?

Regards

_________________
Physics Department
university of Rondonia Brazil
Porto Velho- Rondonia


Mon Jun 11, 2018 5:20 am
Site Admin

Joined: Wed Jan 13, 2016 7:25 pm
Posts: 569
University: Oxford
Post Re: BAD TERMINATION using epw.x
Hello,

It could be that EPW is not compiled properly.

Is the test-suite running properly?

You can go into the test-suite directory with cd q-e/test-suite
and then run

make run-custom-test-parallel testdir=epw_base

Does that work?

If not, then you have to recompile.

Best wishes,
Samuel

_________________
Dr. Samuel Poncé
Department of Materials
University of Oxford
Parks Road
Oxford OX1 3PH, UK


Tue Jun 12, 2018 1:16 pm

Joined: Thu May 05, 2016 5:18 pm
Posts: 63
University: Rondonia university
Post Re: BAD TERMINATION using epw.x
Dear Samuel,

Thanks for your reply. I have tried it for a system such as Al, and it ran through without problems.

In any case, my calculation worked once I omitted the line wdata(1)= 'exclude_bands:.....'. The scf calculation has 40 bands, whereas there are 11 Wannier functions. The exclude_bands option works perfectly well in WANNIER90, but it seems to cause problems in EPW. I have set the minimum and maximum of the frozen window so that it includes only those 11 bands and set nbndsub=11, omitting the exclude_bands option, and the calculations are running smoothly.

Regards

_________________
Physics Department
university of Rondonia Brazil
Porto Velho- Rondonia


Tue Jun 12, 2018 11:01 pm

Joined: Thu May 05, 2016 5:18 pm
Posts: 63
University: Rondonia university
Post Re: BAD TERMINATION using epw.x
Hello again,

The Wannier calculations have completed successfully. Now the code is freezing at the "kmaps" calculation:
"Calculating kmap and kgmap
Progress kmap: ########################################"
for more than one hour and 40 minutes. My system has 3 atoms per unit cell, with 256 points on the coarse electronic grid (as a test at this stage). I am using 64 processors on 1 node:

mpirun -np 64 /......./epw.x -npool 64 < epw1.in > epw1.out

I know that the kmap calculations are computationally expensive, but is this normal?

If so, is it possible to speed up the calculations by using more nodes or a different distribution of processors? What would the format be?

Sorry to bother you with this, but I am really desperate to finish testing and move on to the real calculations, and it is taking a long time!

Regards

_________________
Physics Department
university of Rondonia Brazil
Porto Velho- Rondonia


Wed Jun 13, 2018 2:54 am
Site Admin

Joined: Wed Jan 13, 2016 7:25 pm
Posts: 569
University: Oxford
Post Re: BAD TERMINATION using epw.x
Dear eliephys78,

For your first issue, indeed, EPW did not support the "exclude_bands" option from Wannier90.

However, Roxana has been working on it, and it might be working in the most recent EPW development version.

For the second issue, it should indeed not take that much time.

Maybe try logging in to the node to make sure that the memory usage is not exploding and/or that the job is not dead.

If the code works for simple small systems, then it is most likely a problem with memory.
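One way to act on the memory suggestion, once logged in to the compute node, is to sample the resident memory of a running epw.x process several times and see whether it keeps growing. The sketch below is Linux-only and reads the `VmRSS` field from `/proc/<pid>/status`. The helper names and the "strictly increasing" criterion are illustrative choices, not part of EPW or Quantum ESPRESSO.

```python
import re
from typing import Optional, Sequence


def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Extract the resident set size in kB from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def read_vmrss_kib(pid: int) -> Optional[int]:
    """Read the current resident memory of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vmrss_kib(handle.read())


def is_steadily_growing(samples_kib: Sequence[int]) -> bool:
    """True if every sample is larger than the previous one.

    Illustrative criterion only: a strictly increasing series of samples
    hints at runaway memory use, while a flat series with no progress in
    the output file hints at a hung job instead.
    """
    if len(samples_kib) < 2:
        return False
    return all(later > earlier
               for earlier, later in zip(samples_kib, samples_kib[1:]))
```

In practice one would call `read_vmrss_kib` on the PID of one epw.x rank every minute or so, collect the values in a list, and pass that list to `is_steadily_growing`.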

Best wishes,
Samuel

_________________
Dr. Samuel Poncé
Department of Materials
University of Oxford
Parks Road
Oxford OX1 3PH, UK


Thu Jun 14, 2018 1:58 pm

Joined: Thu May 05, 2016 5:18 pm
Posts: 63
University: Rondonia university
Post Re: BAD TERMINATION using epw.x
Dear Samuel,

Thanks a lot for your help, as always. I will try to figure out what is going on.

Regards

_________________
Physics Department
university of Rondonia Brazil
Porto Velho- Rondonia


Thu Jun 14, 2018 10:20 pm