Impacts of loading high-core-count nodes

Running many similar root4star jobs simultaneously on any high-core-count node in the SDCC farm leads to longer CPU times per job, i.e. each core runs less efficiently. These efficiency losses are outweighed by the benefit of using more cores, so throughput per node is still maximized with all (virtual) cores loaded. However, at the time of this posting, the cause of the efficiency losses is not understood.

These tests are on machines with 48 real cores, which provide 96 virtual cores via hyperthreading (HT).
  • Closed symbols are with HT; open symbols are without HT
  • Circles are 64-bit; triangles are 32-bit
  • Red and orange are on an Alma9 machine; blues and greens are on an SL7 machine
  • Darker colors are in SL7 containers; lighter colors are directly on SL7 (I do not have root4star running directly on Alma9)



Inverting the slowdown factor and multiplying by the number of jobs (i.e. # jobs divided by slowdown factor) gives the effective number of cores on the machine (e.g. 45 means the machine is effectively 45 times faster than a single, isolated core).
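As a minimal sketch of that arithmetic, the snippet below takes the slowdown factor to be the CPU time of a job on the loaded node divided by the CPU time of the same job running alone, which is an assumption about how the factor is defined. The function names and the timing values are hypothetical and chosen only for illustration; they are not measurements from these tests.

```python
def slowdown_factor(cpu_time_loaded, cpu_time_isolated):
    """Ratio of per-job CPU time on a loaded node to that of an isolated job.

    Assumed definition: a value of 1.0 means no slowdown.
    """
    return cpu_time_loaded / cpu_time_isolated


def effective_cores(n_jobs, slowdown):
    """Effective number of cores: # jobs divided by the slowdown factor."""
    return n_jobs / slowdown


# Hypothetical example values (not measured data):
n_jobs = 96          # one job per virtual core on a 48-core node with HT
t_isolated = 100.0   # CPU seconds for a single, isolated job
t_loaded = 200.0     # CPU seconds per job with all 96 jobs running

s = slowdown_factor(t_loaded, t_isolated)
print(f"slowdown factor: {s}")
print(f"effective cores: {effective_cores(n_jobs, s)}")
```

With these made-up inputs each job takes twice as long, so the 96 loaded virtual cores deliver the throughput of 48 isolated cores.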




-Gene