Just-in-Time Analytics on Large File Systems
Nov. 2012 (vol. 61 no. 11)
pp. 1651-1664
H. Howie Huang, The George Washington University, Washington, DC
Nan Zhang, The George Washington University, Washington, DC
Wei Wang, University of Delaware, Newark
Gautam Das, University of Texas at Arlington, Arlington
Alexander S. Szalay, Johns Hopkins University, Baltimore
As file systems reach the petabyte scale, users and administrators are increasingly interested in acquiring high-level analytical information for file management and analysis. Two particularly important tasks are the processing of aggregate and top-k queries which, unfortunately, cannot be quickly answered by hierarchical file systems such as ext3 and NTFS. Existing preprocessing-based solutions, e.g., file system crawling and index building, consume a significant amount of time and space (for generating and maintaining the indexes), which in many cases cannot be justified by the infrequent usage of such solutions. In this paper, we argue that user interests can often be sufficiently satisfied by approximate (i.e., statistically accurate) answers. We develop Glance, a just-in-time sampling-based system which, after consuming a small number of disk accesses, is capable of producing extremely accurate answers for a broad class of aggregate and top-k queries over a file system without requiring any prior knowledge. We use a number of real-world file systems to demonstrate the efficiency, accuracy, and scalability of Glance.
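The abstract does not spell out Glance's estimator. To illustrate how a sampling-based system can answer an aggregate query (here, total file count) after reading only a few directories, the following is a minimal Python sketch of Knuth-style random descent, a classic unbiased estimator of tree size: each walk visits one root-to-leaf path and weights the files it sees by the inverse probability of reaching that directory. The in-memory `Dir` tree and the functions `random_descent_estimate`, `estimate_file_count`, and `exact_file_count` are hypothetical illustrations, not the paper's algorithm or API.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Dir:
    # Hypothetical in-memory stand-in for a directory (not a real file system API).
    files: int = 0                                      # regular files directly in this directory
    subdirs: list["Dir"] = field(default_factory=list)  # child directories


def random_descent_estimate(root: Dir, rng: random.Random) -> float:
    """One random root-to-leaf walk; returns an unbiased estimate of the total file count."""
    estimate = 0.0
    weight = 1.0  # 1 / (probability of reaching the current directory)
    node = root
    while True:
        estimate += weight * node.files
        if not node.subdirs:
            return estimate
        weight *= len(node.subdirs)       # uniform choice among children
        node = rng.choice(node.subdirs)


def estimate_file_count(root: Dir, walks: int, seed: int = 0) -> float:
    """Average several independent walks to reduce variance."""
    rng = random.Random(seed)
    return sum(random_descent_estimate(root, rng) for _ in range(walks)) / walks


def exact_file_count(root: Dir) -> int:
    """Full crawl, for comparison: visits every directory."""
    return root.files + sum(exact_file_count(d) for d in root.subdirs)
```

On a perfectly balanced tree every walk returns the exact total; on skewed trees individual walks vary, but their average converges to the true count, so accuracy is traded against the number of directories read rather than against a prebuilt index.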
Index Terms:
Estimation, Aggregates, Indexes, Accuracy, Query processing, Calculators, History, File systems, Data analytics
Citation:
H. Howie Huang, Nan Zhang, Wei Wang, Gautam Das, Alexander S. Szalay, "Just-in-Time Analytics on Large File Systems," IEEE Transactions on Computers, vol. 61, no. 11, pp. 1651-1664, Nov. 2012, doi:10.1109/TC.2011.186