Aborting showers with EPOS-LHC
Hi,
I've been producing CORSIKA showers (version 7.7420) on HoreKa with the EPOS-LHC hadronic model, after having already produced datasets successfully with Sibyll 2.3d. However, a small fraction of showers (<1%) does not complete, even though the jobs are given sufficient memory. I identify incomplete showers by their log file, which simply stops midway and therefore does not end with the usual "========== END OF RUN ================================================" line. Deleting the corrupt files (data, input, log and long) and re-running the job worked a few times, but most of the time the same showers abort again and again.
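For reference, the completeness check described above can be automated. This is a minimal sketch, assuming one plain-text `DAT*.log` file per shower in a single directory; the function name `find_incomplete_showers` is hypothetical, and only the "END OF RUN" marker is taken from the actual logs.

```python
from pathlib import Path

# Substring of the line CORSIKA writes at the end of a complete run
# (matched without the '=' padding so the exact number of '=' does not matter).
END_MARKER = "END OF RUN"


def find_incomplete_showers(log_dir):
    """Return the names of *.log files in log_dir that lack the END OF RUN marker.

    Hypothetical helper: assumes one log file per shower in log_dir.
    """
    incomplete = []
    for log_file in sorted(Path(log_dir).glob("*.log")):
        # Replace undecodable bytes instead of failing on a truncated log.
        text = log_file.read_text(errors="replace")
        if END_MARKER not in text:
            incomplete.append(log_file.name)
    return incomplete
```

Calling `find_incomplete_showers(log_dir)` on the directory holding the `DAT*.log` files returns the logs of the showers that need to be re-run.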
The error file for a good shower looks like this:
rm: cannot remove '/hkfs/work/workspace/scratch/mk9399-corsika_eposlhc/datasets/proton_15001//temp//6.4/640041DAT640041': No such file or directory
rm: cannot remove '/hkfs/work/workspace/scratch/mk9399-corsika_eposlhc/datasets/proton_15001//temp//6.4/640041DAT640041.long': No such file or directory
rm: cannot remove '/hkfs/work/workspace/scratch/mk9399-corsika_eposlhc/datasets/proton_15001//log//6.4/DAT640041.log': No such file or directory
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
In contrast, the error files of the corrupt showers look like this:
rm: cannot remove '/hkfs/work/workspace/scratch/mk9399-corsika_eposlhc/datasets/proton_15001//temp//6.4/640042DAT640042': No such file or directory
rm: cannot remove '/hkfs/work/workspace/scratch/mk9399-corsika_eposlhc/datasets/proton_15001//temp//6.4/640042DAT640042.long': No such file or directory
rm: cannot remove '/hkfs/work/workspace/scratch/mk9399-corsika_eposlhc/datasets/proton_15001//log//6.4/DAT640042.log': No such file or directory
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x151cc7fdc6b0 in ???
#1 0x151cc7fdb905 in ???
#2 0x151cc7237b1f in ???
#3 0x4a1be0 in conxyz_
    at /home/hk-project-icecores/mk9399/corsika/epos/epos/epos-con-lhc.f:1143
#4 0x4a2086 in conaa_
    at /home/hk-project-icecores/mk9399/corsika/epos/epos/epos-con-lhc.f:104
#5 0x47d87b in emsaaa_
    at /home/hk-project-icecores/mk9399/corsika/epos/epos/epos-bas-lhc.f:5640
#6 0x49e7c1 in aepos_
    at /home/hk-project-icecores/mk9399/corsika/epos/epos/epos-bas-lhc.f:5340
#7 0x42d982 in nexlnk_
    at /home/hk-project-icecores/mk9399/corsika/epos/src/corsika.F:65462
#8 0x42fc6f in nucint_
    at /home/hk-project-icecores/mk9399/corsika/epos/src/corsika.F:25390
#9 0x46b5c0 in aamain
    at /home/hk-project-icecores/mk9399/corsika/epos/src/corsika.F:3699
#10 0x40372c in main
    at /home/hk-project-icecores/mk9399/corsika/epos/src/corsika.F:5518
/hkfs/work/workspace/scratch/mk9399-corsika_eposlhc/datasets/proton_15001//temp//6.4/temp_640042.sh: line 6: 1473164 Segmentation fault (core dumped) /home/hk-project-icecores/mk9399/corsika/epos/run//corsika77420Linux_EPOS_flukainfn < /hkfs/work/workspace/scratch/mk9399-corsika_eposlhc/datasets/proton_15001//inp//6.4/SIM640042.inp > /hkfs/work/workspace/scratch/mk9399-corsika_eposlhc/datasets/proton_15001//log//6.4/DAT640042.log
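The `rm` warnings appear in both error files, so they are not what distinguishes a failed shower; the SIGSEGV and the backtrace are. Below is a minimal sketch for sorting the captured stderr (which appears to be Python `bytes`) into good and crashed runs and pulling out the first backtrace frame that has a source location. The helper name `classify_stderr` is hypothetical, and the regular expression assumes the gfortran backtrace layout shown above.

```python
import re

# Matches gfortran backtrace frames that carry a source location, e.g.
#   "#3 0x4a1be0 in conxyz_\n\tat /path/epos-con-lhc.f:1143"
# Frames printed as "in ???" have no "at" line and are skipped.
FRAME_RE = re.compile(r"#\d+ 0x[0-9a-f]+ in (\S+)\s+at (\S+):(\d+)")


def classify_stderr(stderr: bytes):
    """Return ("ok", None) or ("segfault", (function, file, line) or None).

    Hypothetical helper: assumes stderr is the raw bytes captured from the job.
    """
    text = stderr.decode(errors="replace")
    if "SIGSEGV" not in text:
        return "ok", None
    match = FRAME_RE.search(text)
    if match is None:
        # Crashed, but no frame with a source location was printed.
        return "segfault", None
    function, source, line = match.groups()
    # Keep only the file name, not the full build path.
    return "segfault", (function, source.rsplit("/", 1)[-1], int(line))
```

For the corrupt shower above this returns `("segfault", ("conxyz_", "epos-con-lhc.f", 1143))`, which makes it easy to check whether all the aborting showers crash at the same place.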
Tanguy already provided me with an alternative epos-con-lhc.f file, but using it had no effect.
Best, Julian