
Re: How to reduce load on parallel file system
ganwar wrote:
Hi,
I'm supporting a user on our HPC facility who is running epw from QE 6.3. Unfortunately, the user's jobs are generating a very high load on our parallel file system (GPFS), to the extent that several (2-3) concurrent multi-node jobs (3-10 nodes each) make the file system unusable for other users.
Does anyone have advice on reducing this IO load? I believe that with QE (pw.x) you can separately set wfcdir to a local disk (for per-processor files) and outdir to the parallel file system to reduce disk IO, as well as setting disk_io. However, for epw it seems that everything goes via outdir, and setting it to a local disk for multi-node jobs results in MPI_FILE_OPEN errors.
Any advice or suggestions would be welcome. Apologies if I've misunderstood or missed something.
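For reference, the pw.x arrangement described above could look roughly like the following fragment of the &control namelist. This is only a sketch: the prefix is the one used in the epw input posted in this thread, both directory paths are hypothetical placeholders, and the most suitable disk_io value depends on the calculation type and the QE version, so it should be checked against the pw.x input description.

```
&control
  calculation = 'scf'
  prefix      = 'NbCoSn'
  outdir      = '/gpfs/scratch/username/NbCoSn/'  ! hypothetical path on the shared parallel file system
  wfcdir      = '/tmp/username/NbCoSn/'           ! hypothetical node-local path for per-processor files
  disk_io     = 'low'                             ! assumed setting; see the pw.x documentation for allowed values
/
```

Only the &control namelist is shown; the remaining namelists and cards of the pw.x input are unchanged by this arrangement.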
Thanks
Hi Samuel,
I am the HPC user mentioned here. The following is my input file for epw.
--
&inputepw
prefix = 'NbCoSn',
amass(1) = 92.90638
amass(2) =58.933195
amass(3) =118.71
! outdir = '/tmp/esscmv/NbCoSn/'
! dvscf_dir = '/tinisgpfs/home/csc/esscmv/bandstructure_qe/NbCoSn/EPW_2/save'
outdir = './'
dvscf_dir = './save'
elph = .true.
kmaps = .true.
epbwrite = .true.
epbread = .false.
epwwrite = .true.
epwread = .false.
nbndsub = 12
nbndskip = 0
wannierize = .true.
num_iter = 300
dis_win_max = 25
dis_win_min = 0
dis_froz_min= 14
dis_froz_max= 25
wdata(1) = 'bands_plot = .true.'
wdata(2) = 'begin kpoint_path'
wdata(3) = 'G 0.00 0.00 0.00 X 0.00 0.50 0.50'
wdata(4) = 'X 0.00 0.50 0.50 W 0.25 0.50 0.75'
wdata(5) = 'W 0.25 0.50 0.75 L 0.50 0.50 0.50'
wdata(6) = 'L 0.50 0.50 0.50 K 0.375 0.375 0.75'
wdata(7) = 'K 0.375 0.375 0.75 G 0.00 0.00 0.00'
wdata(8) = 'G 0.00 0.00 0.00 L 0.50 0.50 0.50'
wdata(9) = 'end kpoint_path'
wdata(10) = 'bands_plot_format = gnuplot'
iverbosity = 3
etf_mem = 1
restart=.true.
restart_freq=1000
elecselfen = .true.
delta_approx= .true.
phonselfen = .false.
efermi_read = .true.
fermi_energy= 16.4224
fsthick = 2.5 ! eV
eptemp = 300 ! K
degaussw = 0.05 ! eV
a2f = .false.
nkf1 = 48
nkf2 = 48
nkf3 = 48
nqf1 = 48
nqf2 = 48
nqf3 = 48
nk1 = 24
nk2 = 24
nk3 = 24
nq1 = 6
nq2 = 6
nq3 = 6
/
16 cartesian
0.000000000000000E+00 0.000000000000000E+00 0.000000000000000E+00
0.117851130197756E+00 0.117851130197756E+00 -0.117851130197756E+00
0.235702260395511E+00 0.235702260395511E+00 -0.235702260395511E+00
-0.353553390593267E+00 -0.353553390593267E+00 0.353553390593267E+00
0.235702260395511E+00 -0.654205191118227E-17 0.654205191118227E-17
0.353553390593267E+00 0.117851130197756E+00 -0.117851130197756E+00
-0.235702260395511E+00 -0.471404520791023E+00 0.471404520791023E+00
-0.117851130197756E+00 -0.353553390593267E+00 0.353553390593267E+00
0.261682076447291E-16 -0.235702260395511E+00 0.235702260395511E+00
0.471404520791023E+00 -0.130841038223645E-16 0.130841038223645E-16
-0.117851130197756E+00 -0.589255650988778E+00 0.589255650988778E+00
-0.261682076447291E-16 -0.471404520791023E+00 0.471404520791023E+00
-0.707106781186534E+00 0.000000000000000E+00 0.000000000000000E+00
-0.235702260395511E+00 -0.471404520791023E+00 0.707106781186534E+00
-0.117851130197756E+00 -0.353553390593267E+00 0.589255650988778E+00
-0.707106781186534E+00 0.235702260395511E+00 0.261682076447291E-16
I would be really grateful if you could have a look and see whether there is anything I can change here to solve this issue.
Regards,
Chathu