Optimisation of MPI global calls
This issue was created automatically from an original CUE issue. Further discussion may take place here.
The goal here would be to improve scalability by reducing the number of calls to p_max/p_sum/p_min...
Sometimes 4 calls to p_max are made on 4 separate integers when it might be better to make one call on a 4-integer array.
Similarly, some calls are made inside a loop on a single element of an array when one call could be made on the whole array outside the loop.
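To illustrate the idea, here is a minimal pure-Python model (not TELEMAC code and not real MPI). `CountingComm` and `allreduce_max` are hypothetical names made up for this sketch; the class only counts how many global reductions are issued, so the per-element and grouped variants can be compared.

```python
class CountingComm:
    """Toy stand-in for a communicator (hypothetical, no real MPI).

    All ranks' buffers live in one process; the class only counts how
    many global reductions are issued.
    """

    def __init__(self):
        self.calls = 0

    def allreduce_max(self, per_rank_buffers):
        """Model one global max: per_rank_buffers[r] is the buffer of rank r."""
        self.calls += 1
        # Element-wise maximum across ranks, as a max reduction on an array does.
        return [max(column) for column in zip(*per_rank_buffers)]


# Local values held by 3 ranks, 4 integers each (made-up data).
local = [
    [3, 0, 7, 1],
    [2, 5, 6, 1],
    [9, 4, 0, 8],
]

# Variant 1: one reduction per integer, e.g. 4 separate calls or a call inside a loop.
scalar_comm = CountingComm()
ungrouped = []
for i in range(4):
    one_value_per_rank = [[rank_values[i]] for rank_values in local]
    ungrouped.append(scalar_comm.allreduce_max(one_value_per_rank)[0])

# Variant 2: a single reduction on the whole 4-integer array, outside any loop.
array_comm = CountingComm()
grouped = array_comm.allreduce_max(local)

print(ungrouped == grouped, scalar_comm.calls, array_comm.calls)
```

Both variants return the same maxima, but the grouped one issues a single global reduction. In a real run each reduction is a synchronisation point across all processes, which is why reducing their number should help scalability.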
Here is a list of files that should be investigated:
- sources/utils/bief/fluxpr.f: calls to p_min and p_max within the loop on nsec could be done outside the loop
- sources/gaia/flyxptr_gaia.f: same as above but for gaia
- sources/telemac2d/sluxpr_telemac.f: same as above
- sources/gaia/conlit_gaia.f: p_max(yadeb) within the nfrliq loop could be done outside the loop
- sources/telemac2d/bord.f: p_max(yadeb) within the nfrliq loop could be done outside the loop
- sources/telemac2d/oilspill.f: group the p_sum calls in the oil spill balance
- sources/telemac2d/buse.f: group p_min and p_max
- sources/telemac2d/lecbus.f: group p_min and p_max
- sources/telemac2d/cltrac.f: group p_max and p_min
Also, p_min + p_max is often used to retrieve a value that is zero on some processes and has the same non-zero value on all the others.
Creating a custom operator to handle that case in a single reduction might be worth it.
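As a rough sketch of what such an operator could compute, here is a pure-Python model (again not TELEMAC code and not real MPI). `nonzero_op`, `via_min_plus_max` and `via_custom_op` are hypothetical names; the list `values` stands for the per-process values, and the operator assumes that all non-zero entries are identical.

```python
from functools import reduce


def nonzero_op(a, b):
    """Hypothetical reduction operator: keep whichever operand is non-zero.

    Assumes that non-zero operands are all equal, so the choice between
    them does not matter and the operator is associative.
    """
    return a if a != 0 else b


def via_min_plus_max(values):
    """Current pattern: two global reductions (min and max), then a sum."""
    return min(values) + max(values)


def via_custom_op(values):
    """Proposed pattern: one global reduction with the custom operator."""
    return reduce(nonzero_op, values, 0)


# One entry per process: zero on some, the same value on the others (made-up data).
negative_case = [0.0, -2.5, 0.0, -2.5]
positive_case = [0, 4, 4, 0]
all_zero_case = [0, 0, 0]

for case in (negative_case, positive_case, all_zero_case):
    print(via_min_plus_max(case), via_custom_op(case))
```

The min + max trick works for either sign because one of the two reductions returns zero, but it relies on at least one process holding zero. The single-operator version halves the number of global reductions and does not need that condition.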
Edited by Yoann Audouin