Dataset access time in Xrootd: 2016 survey
Updated on Tue, 2016-05-10 10:35. Originally created by jeromel on 2016-05-10 08:19.
Xrootd dataset access
A survey of the access time of all files in the Xrootd distributed data storage was done on 2016/05/09. The data were extracted and regrouped by trigger setup name / production name (there are many datasets and combinations, and graphing them all became cumbersome at best). More precision on the stream might have helped clarify the picture, but this was a quick first attempt to extract some useful information. For example, AuAu_200_production_mid_2014,P15ic and AuAu_200_production_mid_2014,P15ie are not a repeat of the same production: the former relates to st_physics with HFT tracking, the latter to st_mtd stream data only. A later version may present a split by stream, but that would inflate the number of entries to an unmanageable level (again, please take this as guidance and a general lesson learned).

From the "raw" numbers, we counted the number of accesses to each file and summed them per dataset. Each count reported is hence not how many times a dataset was accessed but how many times files from this dataset were requested from Xrootd.
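The per-dataset aggregation described above can be sketched in Python. This is a minimal sketch, assuming a hypothetical per-file dump where each row already carries the trigger setup, the production tag, the file name and that file's access count; the `FileAccess` record and the `dataset_access_counts` helper are illustrative names, not part of any Xrootd tool. Keying on both trigger setup and production keeps pairs such as P15ic and P15ie apart, and the totals count file requests rather than dataset-level accesses.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class FileAccess(NamedTuple):
    """One row of a hypothetical per-file dump (not a real Xrootd format)."""
    trigger_setup: str
    production: str
    filename: str
    access_count: int


def dataset_access_counts(rows: Iterable[FileAccess]) -> Counter:
    """Sum per-file access counts into per-dataset totals.

    The key mirrors the 'triggersetup,production' naming of the table,
    so each total is a number of file requests, not of dataset accesses.
    """
    totals: Counter = Counter()
    for row in rows:
        key = f"{row.trigger_setup},{row.production}"
        totals[key] += row.access_count
    return totals


if __name__ == "__main__":
    # Made-up rows for illustration only; these are not survey data.
    sample = [
        FileAccess("AuAu_200_production_mid_2014", "P15ic", "a.MuDst.root", 3),
        FileAccess("AuAu_200_production_mid_2014", "P15ic", "b.MuDst.root", 2),
        FileAccess("AuAu_200_production_mid_2014", "P15ie", "c.MuDst.root", 7),
    ]
    # Print in the same "dataset | count |" layout as the table, sorted by count.
    for dataset, count in sorted(dataset_access_counts(sample).items(),
                                 key=lambda kv: kv[1]):
        print(f"{dataset} | {count} |")
```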
The access counts are shown below.
Dataset (trigger setup, production) | Access count |
--- | ---: |
muonminbias,P06ie | 6 |
ppEmcBackgroundCheck,P06ie | 14 |
pp2pp_VPDMB,P10ic | 20 |
pp500_production_fms_2013,P14ia | 28 |
vernier_scan,P10ic | 33 |
tof_prepost_himult,P10ic | 34 |
tof_prepost_himult,P11id | 34 |
tof_production2009_single,P10ic | 35 |
production2009_500Gev_25,P09ig | 37 |
ppLongTest,P06ie | 40 |
upsilonTest,P06ie | 52 |
vernier_scan,P11id | 59 |
barrelBackground62,P12ia | 75 |
ppProductionTransFPDonly,P06ie | 76 |
pp500_production_2013a,P14ia | 93 |
tof_production2009_single,P11id | 128 |
LowLuminosity_2010,P10ik | 139 |
production_fms_pp200long2_2015,P15ik | 148 |
pp2ppStrawMan,P10ic | 149 |
ppEmcCheck,P06ie | 149 |
low_luminosity2009,P10ic | 159 |
pp200_production_2012_setup,P12id | 181 |
pp500_upc_2013,P14ia | 233 |
production2009_500Gev_b,P09ig | 312 |
production_fms_pp200trans_2015,P15ik | 327 |
pp500_production_fmsonly_2013,P14ia | 341 |
barrelBackground,P06ie | 397 |
production_pp200long3_2015,P15ik | 407 |
low_luminosity2009,P11id | 431 |
production_pAu200_fms_2015,P15il | 487 |
pp500_lowluminosity_2012,P13ib | 490 |
pp500_production_fmsonly_2013,P14ig | 521 |
production2009_500GeV_carl,P09ig | 521 |
commission2009_200Gev_Lo,P10ic | 541 |
pp200_production_fms_2012,P12id | 549 |
ppProductionLongNoEmc,P06ie | 616 |
pp200_production_noemc_2012,P12id | 698 |
AuAu_200_production_mid_2014,P15il | 755 |
commission2009_200Gev_Lo,P11id | 831 |
production2009_200Gev_nocal,P10ic | 969 |
cu62productionMinBias,P13ib | 1013 |
production2009_200Gev_noendcap,P10ic | 1025 |
AuAu_200_production_low_2014,P15il | 1059 |
commission2009_200Gev_Hi,P10ic | 1212 |
pp500_production_fms_2012,P13ib | 1497 |
production62GeV,P04ie | 1775 |
production62GeV,P04id | 1840 |
production2009_200Gev_nocal,P11id | 1975 |
pp2pp_Production2009,P10ic | 2027 |
production2009_200Gev_noendcap,P11id | 2058 |
ppProductionTransNoEMC,P06ie | 2093 |
commission2009_200Gev_Hi,P11id | 2483 |
pp500_production_2013_noendcap,P14ia | 2882 |
pp500_production_2012_noeemc,P13ib | 3232 |
production2009_200Gev_Lo,P10ic | 3314 |
pp500_production_2013_noendcap,P14ig | 3332 |
ppProductionJPsi,P06ie | 3343 |
production2009_500GeV,P09ig | 3680 |
production2009_200Gev_Lo,P11id | 4241 |
zdc_polarimetry,P10ic | 5221 |
pp500_production_2011_noeemc,P11id | 5329 |
zdc_polarimetry,P11id | 5574 |
ppProductionMB62,P12ia | 6582 |
ppProduction62,P12ia | 6612 |
AuAu11_production,P10ih | 7824 |
pp2006MinBias,P06ie | 8063 |
ppProduction,P06ie | 10305 |
production2009_200Gev_Hi,P10ic | 13462 |
production_pAu200_2015,P15il | 16121 |
production2009_500Gev_c,P09ig | 18809 |
AuAu19_production,P11id | 19277 |
pp500_production_2011_long,P11id | 22772 |
AuAu7_production,P10ih | 24613 |
ppProductionLong,P06ie | 25085 |
productionMinBias,P05ic | 25469 |
production2009_200Gev_Hi,P11id | 29870 |
cuAu_production_2012,P15ie | 30779 |
AuAu27_production_2011,P11id | 30914 |
ppProductionTrans,P06ie | 31101 |
cuAu_production_2012,P14ia | 32217 |
production_pp200long2_2015,P15ik | 47721 |
production_pp200trans_2015,P15ik | 49772 |
AuAu_200_production_2014,P15ic | 50739 |
production_15GeV_2014,P14ii | 53324 |
AuAu_200_production_high_2014,P15ic | 62430 |
production2009_200Gev_Single,P10ic | 72434 |
AuAu39_production,P10ik | 75555 |
AuAu62_production,P10ik | 76928 |
ppProduction,P05if | 78457 |
pp500_production_2011,P11id | 80153 |
AuAu_200_production_2014,P15ie | 86255 |
AuAu_200_production_low_2014,P15ic | 103837 |
pp200_production_2012,P13ib | 113806 |
AuAu_200_production_low_2014,P15ie | 127728 |
pp200_production_2012,P12id | 142581 |
production2009_200Gev_Single,P11id | 146920 |
AuAu_200_production_mid_2014,P15ic | 154651 |
pp500_production_2012,P13ib | 183427 |
AuAu_200_production_mid_2014,P15ie | 205764 |
UU_production_2012,P12id | 208982 |
CosmicLocalClock,P11id | 216021 |
AuAu200_production_2011,P11id | 231039 |
pp500_production_2013,P14ia | 265296 |
pp500_production_2013,P14ig | 299906 |
AuAu_200_production_high_2014,P15ie | 395588 |
AuAu200_production,P10ik | 438687 |
A few patterns emerged, as follows:
- Based on the access counts alone and the number of datasets as defined above (106 in total), a first finding is that 14% of those datasets were accessed fewer than 100 times, 23% fewer than 500 times, 32% fewer than 1,000 times, 50% fewer than 5,000 times and 57% fewer than 10,000 times over a period of 10 months.
Observation 1: there is room for some cleanup, although those datasets are unlikely to be large (the total size is still being sorted out).
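The cumulative percentages in the first finding can be computed with a short helper. This is a sketch only: `fraction_below` is a hypothetical name, a dataset counts toward a threshold only when its count is strictly below it, and the usage runs on ten counts picked from the table rather than on all 106 datasets, so its fractions are not the survey figures.

```python
from bisect import bisect_left
from typing import Dict, Sequence


def fraction_below(counts: Sequence[int], thresholds: Sequence[int]) -> Dict[int, float]:
    """For each threshold, return the fraction of datasets with count < threshold."""
    ordered = sorted(counts)
    n = len(ordered)
    # bisect_left gives the number of elements strictly smaller than t.
    return {t: bisect_left(ordered, t) / n for t in thresholds}


if __name__ == "__main__":
    # Ten counts picked from the table above (not all 106 datasets).
    sample = [6, 75, 148, 490, 969, 4241, 8063, 29870, 80153, 438687]
    for t, frac in fraction_below(sample, [100, 500, 1_000, 5_000, 10_000]).items():
        print(f"< {t:,}: {frac:.0%}")
```

Sorting once and bisecting keeps each threshold lookup logarithmic, which matters little for 106 datasets but keeps the helper usable on a per-file listing too.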
- Some datasets, such as AuAu11_production,P10ih, had been on storage for a while without being accessed at all, with only a small usage over the past 2 months (8k accesses). The same holds for AuAu27_production_2011,P11id / AuAu39_production,P10ik / AuAu7_production,P10ih, but also for AuAu_200_production_2014,P15ie.
Observation 2: some of those are old, and it is unclear why they kept coming up as data to preserve if they were not being used. From a purely resource perspective, those should have been rotated out and restored whenever needed.
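Spotting datasets like those in Observation 2 needs access dates rather than totals. The sketch below assumes a hypothetical list of (dataset, access date) pairs and a catalog of all datasets on disk; `idle_datasets` and its 30-day default window are illustrative choices, not the survey's method. A dataset never seen in the access list is treated as idle, which also covers sets that were never read.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple


def idle_datasets(catalog: Iterable[str],
                  accesses: Iterable[Tuple[str, date]],
                  survey_day: date,
                  idle_days: int = 30) -> List[str]:
    """Return datasets with no recorded access in the last `idle_days` days.

    Catalog entries that never appear in `accesses` are reported as idle.
    """
    # Keep only the most recent access date per dataset.
    last_seen: Dict[str, date] = {}
    for dataset, day in accesses:
        if dataset not in last_seen or day > last_seen[dataset]:
            last_seen[dataset] = day
    cutoff = survey_day - timedelta(days=idle_days)
    return sorted(d for d in catalog
                  if d not in last_seen or last_seen[d] < cutoff)


if __name__ == "__main__":
    # Made-up catalog and access log for illustration only.
    catalog = ["dsA", "dsB", "dsC"]
    log = [("dsA", date(2016, 5, 1)), ("dsB", date(2016, 2, 1))]
    print(idle_datasets(catalog, log, survey_day=date(2016, 5, 9)))
```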
For eye guidance, below are the access counts by grouping (best effort; there may be repeats of series).
Au+Au 200 series
Executive summary: overall, all datasets in this category are being accessed, modulo a few with a clear drop in access activity in the last month (not a single access), very visible and identifiable on the graph.

BES, special trigger series
In this category, there is a slew of small sets that are rarely accessed (all squished to the bottom of the graph), including things like zdc_polarimetry, but also p+Au 200 2015 (P15il) and Cu+Au production 2012 (P15ie), which have not been accessed for a month, and Au+Au 62 GeV (P10ik), not accessed for more than 3 months now. The U+U production 2012 (P12id) seems to be the most popular, followed by Au+Au 39 (P10ik). Not sure of the reason, but the CosmicLocalClock dataset has been regularly accessed across the entire time period.

p+p series
200 GeV
From this graph, we can see that most of the p+p 200 GeV datasets have not been accessed in the last months, with the exception of only a few datasets (3 at most). Everything is below a few thousand accesses.

500 GeV
The 500 GeV datasets show a similar pattern: most datasets have not been accessed in the past month (such as p+p 500 production 2012, P13ib), and a slew of datasets have not been accessed at all in ages (only 4 are regularly accessed).
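The series used for the graphs were assembled by hand ("best intent"), so any automatic grouping is an approximation. The sketch below assigns a dataset to a series with first-match regular expressions on the trigger setup name; the `SERIES_RULES` patterns, the `series_of` and `group_by_series` helpers and the `other` bucket are assumptions for illustration. They will misfile some sets: for example ppProduction,P06ie, whose name carries no beam energy, lands in `other`.

```python
import re
from typing import Dict

# Hypothetical first-match rules approximating the hand-made grouping.
SERIES_RULES = [
    ("p+p 500", re.compile(r"pp500|500Ge[Vv]")),
    ("p+p 200", re.compile(r"pp200|200Ge[Vv]")),
    ("Au+Au 200", re.compile(r"^AuAu_?200")),
]


def series_of(dataset: str, default: str = "other") -> str:
    """Map 'triggersetup,production' to a series name using SERIES_RULES."""
    name = dataset.split(",", 1)[0]  # drop the production tag
    for series, pattern in SERIES_RULES:
        if pattern.search(name):
            return series
    return default


def group_by_series(counts: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """Split a {dataset: access count} mapping into one mapping per series."""
    groups: Dict[str, Dict[str, int]] = {}
    for dataset, count in counts.items():
        groups.setdefault(series_of(dataset), {})[dataset] = count
    return groups


if __name__ == "__main__":
    # A few rows copied from the access-count table above.
    counts = {
        "AuAu200_production,P10ik": 438687,
        "pp500_production_2013,P14ig": 299906,
        "pp200_production_2012,P12id": 142581,
        "UU_production_2012,P12id": 208982,
    }
    for series, members in sorted(group_by_series(counts).items()):
        print(f"{series}: {len(members)} dataset(s), {sum(members.values())} accesses")
```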