So, this remains a big puzzle. :( But thank you so very much Arthur for digging deeply into this, I really appreciate it. Let me try to summarize our findings so far, and please speak up if you think any of these conclusions is wrong!
So, overall this is very disturbing and puzzling. One caveat: I wrote "7.7420 mpi" above, but I am not sure whether all those plots were indeed for the "7.7420 mpi" set, or whether some of them were for the "7.75 mpi" set. @acorstanje, can you clarify?
We had a very useful discussion in the LOFAR call today. Some noteworthy points:
A little update on the current state of knowledge:
Some additional info from Jelena:
I think the intermediate conclusion is that there is indeed some MPI-related problem, maybe a runtime issue that was not there for older simulations. Although, in principle, if there is a runtime issue, the jobs should not produce any output.
The next step, I think, is to run one of the cancelling (and maybe also one of the endlessly running) jobs with MPI debug output, to see more clearly what is going on. Jelena will do that.
While trying to simulate electric fields for gamma-ray events (10 PeV) in CoREAS (corsika-7.7402 & corsika-7.5700), I observed that no E-field traces are generated at any of the poles/antennas, which can also be seen in two of the attached dat-files. I have been using the SIBYLL and GHEISHA models with thinning. When I simulate other primaries such as proton, helium and iron, the traces are generated properly.
Apparently, no error popped up during the CORSIKA/CoREAS compilation. I am clueless as to what is going wrong for the gamma-ray primaries!
I agree that the author list should not be part of the repository. But maybe we should put a link to the current one on the Wiki? The link could actually also be printed out when one starts the corsika.cpp main program.
Thanks a lot, Felix! I think @lguelzow has to answer your specific questions.
There is this merge request https://gitlab.iap.kit.edu/AirShowerPhysics/corsika-legacy/corsika7/-/merge_requests/13 but no changes yet. The affected lines are quoted in the issue text.
If the sampling rate for the radio traces is increased or slices in grammage are added, CoREAS becomes much slower. The amount of work to be done in the simulation does not increase with these changes, so it is not obvious why this happens. The most plausible explanation is that CPU caching becomes less effective as more memory needs to be accessed. It would be good to look at this behaviour in a more systematic way and check whether the way the data are stored in memory could be optimized (row vs. column ordering, contiguity of data, and similar aspects).
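To make the caching hypothesis a bit more concrete, here is a toy model in Python. It is only a sketch and does not reflect how CoREAS actually stores its traces: the dense array layout, the 64-byte cache line, the 8-byte values, and the function names `working_set_bytes` and `cache_lines_touched` are all assumptions made for illustration. It estimates how the amount of trace data grows with the number of samples and slices, and counts how many distinct cache lines one track would touch under two possible memory orderings.

```python
CACHE_LINE_BYTES = 64  # assumption: typical x86-64 cache line size
VALUE_BYTES = 8        # assumption: one double per stored field value


def working_set_bytes(n_antennas, n_samples, n_slices=1, n_components=3):
    """Bytes needed to hold all traces in a hypothetical dense layout."""
    return n_antennas * n_samples * n_slices * n_components * VALUE_BYTES


def cache_lines_touched(n_antennas, n_samples, window, antenna_major):
    """Count distinct cache lines written when one track updates the first
    `window` samples of every antenna (single field component)."""
    lines = set()
    for antenna in range(n_antennas):
        for sample in range(window):
            if antenna_major:
                # Each antenna's trace is contiguous in memory.
                index = antenna * n_samples + sample
            else:
                # Each time sample is contiguous across antennas.
                index = sample * n_antennas + antenna
            lines.add(index * VALUE_BYTES // CACHE_LINE_BYTES)
    return len(lines)


if __name__ == "__main__":
    # Doubling the number of samples doubles the memory that must stay cached.
    for n_samples in (4096, 8192):
        mib = working_set_bytes(160, n_samples) / 2**20
        print(f"{n_samples} samples: {mib:.1f} MiB of trace data")
    # Same update pattern, two orderings, different cache footprint.
    print("antenna-major:", cache_lines_touched(160, 4096, 2, True))
    print("sample-major: ", cache_lines_touched(160, 4096, 2, False))
```

In this toy setup the same update touches four times as many cache lines in one ordering as in the other, which is the kind of effect a systematic study (for example with a profiler and hardware cache counters) could confirm or rule out for the real code.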
I don't think it is a problem that CHARM always comes back. This is desired behaviour. If someone really wants to deactivate it, they need to contact you. ;) Hiding the "PYTHIADIR" also sounds like a good solution to me, so as not to confuse people. (I was also confused!)
It seems to me we might want to close this issue? Probably the most confusing thing for the user is the appearance of "PYTHIADIR", because they are not aware that PYTHIA plays a role in this. Maybe this option could be renamed to something less confusing?