Dataset access time in Xrootd: 2016 survey
Updated on Tue, 2016-05-10 10:35. Originally created by jeromel on 2016-05-10 08:19.
From the "raw" numbers, we counted the number of accesses to each file and summed them per dataset. Each reported count is therefore not the number of times a dataset was accessed, but the number of times files from that dataset were requested from Xrootd.
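The counting rule above can be sketched in a few lines of Python. This is a minimal illustration, not the script used for the survey: the `(dataset, file_name, access_count)` record layout, the function name `dataset_access_counts`, and the placeholder dataset and file names are all hypothetical.

```python
from collections import defaultdict


def dataset_access_counts(file_records):
    """Sum per-file access counts into one total per dataset.

    file_records: iterable of (dataset, file_name, access_count) tuples
    (a hypothetical input layout chosen for illustration).
    """
    totals = defaultdict(int)
    for dataset, _file_name, count in file_records:
        # Every file request adds to its dataset's total, so the result
        # counts file requests, not whole-dataset reads.
        totals[dataset] += count
    return dict(totals)


# Placeholder records: two files from one dataset, one from another.
records = [
    ("setupA,P00aa", "file_1.MuDst.root", 20),
    ("setupA,P00aa", "file_2.MuDst.root", 13),
    ("setupB,P00bb", "file_3.MuDst.root", 6),
]
print(dataset_access_counts(records))
```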

Xrootd dataset access
A survey of the access times of all files in the Xrootd distributed data storage was done on 2016/05/09, and the data were extracted and regrouped by trigger setup name / production name (there are many datasets and combinations, and the graphs became cumbersome at best). Specifying the stream as well might have helped clarify things (but this was a quick first attempt to extract some useful information). For example, AuAu_200_production_mid_2014,P15ic and AuAu_200_production_mid_2014,P15ie are not a repeat of the same production: the former relates to st_physics with HFT tracking, the latter to st_mtd stream data only. A later version may present such a split, but it will inflate the number of representations to an unmanageable number (again, please take this as guidance and a general lesson learned).
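To see why adding the stream inflates the number of groups, here is a minimal Python sketch of the grouping key. It assumes, hypothetically, that file names follow an `st_<stream>_<run>_...` pattern, so the stream can be read off the prefix before the first all-digit token; the function `group_key` and the sample names are illustrative only.

```python
def group_key(trigger_setup, production, file_name, split_by_stream=False):
    """Build the key used to aggregate access counts.

    Default: "triggersetup,production", as in the survey table.
    With split_by_stream=True the stream is appended. The stream is
    ASSUMED to be the file-name prefix before the first all-digit
    token (e.g. "st_physics_<run>_..." -> "st_physics"); this naming
    rule is hypothetical.
    """
    key = f"{trigger_setup},{production}"
    if split_by_stream:
        stream_tokens = []
        for token in file_name.split("_"):
            if token.isdigit():  # first all-digit token taken as the run number
                break
            stream_tokens.append(token)
        key += "," + "_".join(stream_tokens)
    return key


files = [
    ("setupA", "P00aa", "st_physics_12345678_raw_0001.MuDst.root"),
    ("setupA", "P00aa", "st_mtd_12345678_raw_0001.MuDst.root"),
]
print(len({group_key(*f) for f in files}))                        # one group
print(len({group_key(*f, split_by_stream=True) for f in files}))  # two groups
```

With the stream included, a single trigger setup / production pair can split into several groups, which is the inflation described above.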
The access counts are shown below.
Dataset (trigger setup, production) | Access count |
muonminbias,P06ie | 6 |
ppEmcBackgroundCheck,P06ie | 14 |
pp2pp_VPDMB,P10ic | 20 |
pp500_production_fms_2013,P14ia | 28 |
vernier_scan,P10ic | 33 |
tof_prepost_himult,P10ic | 34 |
tof_prepost_himult,P11id | 34 |
tof_production2009_single,P10ic | 35 |
production2009_500Gev_25,P09ig | 37 |
ppLongTest,P06ie | 40 |
upsilonTest,P06ie | 52 |
vernier_scan,P11id | 59 |
barrelBackground62,P12ia | 75 |
ppProductionTransFPDonly,P06ie | 76 |
pp500_production_2013a,P14ia | 93 |
tof_production2009_single,P11id | 128 |
LowLuminosity_2010,P10ik | 139 |
production_fms_pp200long2_2015,P15ik | 148 |
pp2ppStrawMan,P10ic | 149 |
ppEmcCheck,P06ie | 149 |
low_luminosity2009,P10ic | 159 |
pp200_production_2012_setup,P12id | 181 |
pp500_upc_2013,P14ia | 233 |
production2009_500Gev_b,P09ig | 312 |
production_fms_pp200trans_2015,P15ik | 327 |
pp500_production_fmsonly_2013,P14ia | 341 |
barrelBackground,P06ie | 397 |
production_pp200long3_2015,P15ik | 407 |
low_luminosity2009,P11id | 431 |
production_pAu200_fms_2015,P15il | 487 |
pp500_lowluminosity_2012,P13ib | 490 |
pp500_production_fmsonly_2013,P14ig | 521 |
production2009_500GeV_carl,P09ig | 521 |
commission2009_200Gev_Lo,P10ic | 541 |
pp200_production_fms_2012,P12id | 549 |
ppProductionLongNoEmc,P06ie | 616 |
pp200_production_noemc_2012,P12id | 698 |
AuAu_200_production_mid_2014,P15il | 755 |
commission2009_200Gev_Lo,P11id | 831 |
production2009_200Gev_nocal,P10ic | 969 |
cu62productionMinBias,P13ib | 1013 |
production2009_200Gev_noendcap,P10ic | 1025 |
AuAu_200_production_low_2014,P15il | 1059 |
commission2009_200Gev_Hi,P10ic | 1212 |
pp500_production_fms_2012,P13ib | 1497 |
production62GeV,P04ie | 1775 |
production62GeV,P04id | 1840 |
production2009_200Gev_nocal,P11id | 1975 |
pp2pp_Production2009,P10ic | 2027 |
production2009_200Gev_noendcap,P11id | 2058 |
ppProductionTransNoEMC,P06ie | 2093 |
commission2009_200Gev_Hi,P11id | 2483 |
pp500_production_2013_noendcap,P14ia | 2882 |
pp500_production_2012_noeemc,P13ib | 3232 |
production2009_200Gev_Lo,P10ic | 3314 |
pp500_production_2013_noendcap,P14ig | 3332 |
ppProductionJPsi,P06ie | 3343 |
production2009_500GeV,P09ig | 3680 |
production2009_200Gev_Lo,P11id | 4241 |
zdc_polarimetry,P10ic | 5221 |
pp500_production_2011_noeemc,P11id | 5329 |
zdc_polarimetry,P11id | 5574 |
ppProductionMB62,P12ia | 6582 |
ppProduction62,P12ia | 6612 |
AuAu11_production,P10ih | 7824 |
pp2006MinBias,P06ie | 8063 |
ppProduction,P06ie | 10305 |
production2009_200Gev_Hi,P10ic | 13462 |
production_pAu200_2015,P15il | 16121 |
production2009_500Gev_c,P09ig | 18809 |
AuAu19_production,P11id | 19277 |
pp500_production_2011_long,P11id | 22772 |
AuAu7_production,P10ih | 24613 |
ppProductionLong,P06ie | 25085 |
productionMinBias,P05ic | 25469 |
production2009_200Gev_Hi,P11id | 29870 |
cuAu_production_2012,P15ie | 30779 |
AuAu27_production_2011,P11id | 30914 |
ppProductionTrans,P06ie | 31101 |
cuAu_production_2012,P14ia | 32217 |
production_pp200long2_2015,P15ik | 47721 |
production_pp200trans_2015,P15ik | 49772 |
AuAu_200_production_2014,P15ic | 50739 |
production_15GeV_2014,P14ii | 53324 |
AuAu_200_production_high_2014,P15ic | 62430 |
production2009_200Gev_Single,P10ic | 72434 |
AuAu39_production,P10ik | 75555 |
AuAu62_production,P10ik | 76928 |
ppProduction,P05if | 78457 |
pp500_production_2011,P11id | 80153 |
AuAu_200_production_2014,P15ie | 86255 |
AuAu_200_production_low_2014,P15ic | 103837 |
pp200_production_2012,P13ib | 113806 |
AuAu_200_production_low_2014,P15ie | 127728 |
pp200_production_2012,P12id | 142581 |
production2009_200Gev_Single,P11id | 146920 |
AuAu_200_production_mid_2014,P15ic | 154651 |
pp500_production_2012,P13ib | 183427 |
AuAu_200_production_mid_2014,P15ie | 205764 |
UU_production_2012,P12id | 208982 |
CosmicLocalClock,P11id | 216021 |
AuAu200_production_2011,P11id | 231039 |
pp500_production_2013,P14ia | 265296 |
pp500_production_2013,P14ig | 299906 |
AuAu_200_production_high_2014,P15ie | 395588 |
AuAu200_production,P10ik | 438687 |
A few patterns appeared, as follows:
- Based on the access count alone and the number of datasets as defined above (106 in total), a first finding is that 14% of those datasets have an access count below 100, 23% below 500, 32% below 1,000, 50% below 5,000 and 57% below 10,000 over a period of 10 months.
Observation 1: there is room for some cleanup, although those datasets are unlikely to be large (their total size is still being sorted out).
- Some datasets, such as AuAu11_production,P10ih, had been on storage for a while without any access at all, apart from a small usage over the past 2 months (~8k accesses). The same holds for AuAu27_production_2011,P11id, AuAu39_production,P10ik and AuAu7_production,P10ih, but also for AuAu_200_production_2014,P15ie.
Observation 2: some of those are old, and it is unclear why they kept being requested as data to preserve if they were not being used. From a purely resource perspective, they should have been rotated out and restored whenever needed.
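The cumulative shares in the first finding are simply the fraction of datasets whose access count falls below each threshold. A minimal Python sketch follows; `fraction_below` is a hypothetical helper, and the sample is a handful of counts picked from the table, not the full 106-entry list.

```python
from bisect import bisect_left


def fraction_below(counts, thresholds):
    """Map each threshold t to the fraction of datasets with count < t."""
    ordered = sorted(counts)
    n = len(ordered)
    # On a sorted list, bisect_left returns how many entries are strictly below t.
    return {t: bisect_left(ordered, t) / n for t in thresholds}


# A few access counts taken from the table above (not the full list).
sample = [6, 14, 75, 148, 541, 1013, 7824, 438687]
for t, share in fraction_below(sample, [100, 500, 1000, 5000, 10000]).items():
    print(f"< {t}: {share:.0%}")
```

Sorting once and bisecting keeps each threshold lookup logarithmic, which is convenient if many thresholds are tried.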
As a visual guide, the access counts by grouping are shown below (best effort; some series may be repeated).
Au+Au 200 series
Executive summary: overall, all datasets in this category are being accessed, apart from a few with a clear drop in access activity over the last month (not a single access), which is very visible and identifiable on the graph.
BES, special trigger series
In this category, there is a slew of small sets that are rarely accessed (all squished at the bottom of the graph), including zdc_polarimetry, but also p+Au 200 2015 (P15il) and Cu+Au production 2012 (P15ie), which have not been accessed for a month, and Au+Au 62 GeV (P10ik), not accessed for 3+ months now. U+U production 2012 (P12id) seems to be the most popular, followed by Au+Au 39 (P10ik). Not sure of the reason, but the CosmicLocalClock dataset has been regularly accessed across the entire time period.
p+p series
200 GeV
From this graph, we can see that most of the p+p 200 GeV datasets have not been accessed in the last months, with the exception of only a few datasets (3 at most). Everything stays below a few thousand accesses.
500 GeV
The 500 GeV datasets show a similar pattern: most datasets have not been accessed in the past month (such as p+p 500 production 2012, P13ib), and a slew of datasets have not been accessed at all in ages (only 4 are regularly accessed).
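The recurring "no access in the last month" pattern, and the suggestion in Observation 2 that idle datasets be rotated out and restored when needed, could be flagged automatically from last-access dates. The sketch below is hypothetical: the `rotation_candidates` function, the 30-day window, and the placeholder datasets and dates are assumptions, not part of the survey.

```python
from datetime import date, timedelta


def rotation_candidates(last_access, today, idle_days=30):
    """Return datasets not accessed within the last idle_days days.

    last_access maps dataset -> date of its most recent file request,
    or None if it was never accessed in the survey window.
    The 30-day default is a hypothetical policy, not a STAR rule.
    """
    cutoff = today - timedelta(days=idle_days)
    idle = [name for name, seen in last_access.items()
            if seen is None or seen < cutoff]
    # Never-accessed datasets sort first (date.min), then the oldest accesses.
    return sorted(idle, key=lambda name: last_access[name] or date.min)


# Placeholder datasets and dates; 2016-05-09 is the survey date.
last_access = {
    "setupA,P00aa": date(2016, 5, 2),   # accessed a week earlier: kept
    "setupB,P00bb": date(2016, 2, 1),   # idle for 3+ months
    "setupC,P00cc": None,               # never accessed
}
print(rotation_candidates(last_access, today=date(2016, 5, 9)))
```

Ordering never-accessed and oldest datasets first gives a natural order in which to rotate data out.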
