2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL) (2014)
London, United Kingdom
Sept. 8, 2014 to Sept. 12, 2014
ISBN: 978-1-4799-5569-5
pp: 321-330
Justin F. Brunelle, Old Dominion University, Department of Computer Science, Norfolk, Virginia, 23529, USA
Mat Kelly, Old Dominion University, Department of Computer Science, Norfolk, Virginia, 23529, USA
Hany SalahEldeen, Old Dominion University, Department of Computer Science, Norfolk, Virginia, 23529, USA
Michele C. Weigle, Old Dominion University, Department of Computer Science, Norfolk, Virginia, 23529, USA
Michael L. Nelson, Old Dominion University, Department of Computer Science, Norfolk, Virginia, 23529, USA
ABSTRACT
Web archives do not capture every resource on every page that they attempt to archive, so archived pages are often missing a portion of their embedded resources. These embedded resources vary in historical value, utility, and importance. The proportion of missing embedded resources does not accurately measure their impact on the Web page, because some embedded resources are more important to the utility of a page than others. We propose a method to measure the relative value of embedded resources and assign a damage rating to archived pages as a way to evaluate archival success. In this paper, we show that Web users' perceptions of damage are not accurately estimated by the proportion of missing embedded resources; that proportion is a less accurate estimate of resource damage than a random selection. We propose a damage rating algorithm that aligns more closely with Web user perception, improving overall agreement with users on memento damage by 17%, and by 51% when the mementos are not similarly damaged. We use our algorithm to measure damage in the Internet Archive, showing that it is getting better at mitigating damage over time (going from 0.16 in 1998 to 0.13 in 2013). However, we show that a greater number of important embedded resources (2.05 per memento on average) are missing over time.
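The contrast the abstract draws, between counting missing embedded resources and weighting them by importance, can be illustrated with a minimal sketch. This is not the paper's damage rating algorithm: the `EmbeddedResource` type, the `weight` values, and both function names are hypothetical, chosen only to show why the two measures can disagree.

```python
from dataclasses import dataclass


@dataclass
class EmbeddedResource:
    uri: str
    weight: float  # hypothetical importance weight (assumption, not from the paper)
    missing: bool  # True if the archive failed to capture this resource


def proportion_missing(resources):
    """Unweighted baseline: fraction of embedded resources that are missing."""
    if not resources:
        return 0.0
    return sum(1 for r in resources if r.missing) / len(resources)


def weighted_damage(resources):
    """Illustrative weighted rating: missing weight divided by total weight."""
    total = sum(r.weight for r in resources)
    if total == 0:
        return 0.0
    return sum(r.weight for r in resources if r.missing) / total


# Hypothetical memento: one important stylesheet is missing, three minor images survive.
memento = [
    EmbeddedResource("style.css", 5.0, True),
    EmbeddedResource("logo.png", 1.0, False),
    EmbeddedResource("icon1.png", 0.5, False),
    EmbeddedResource("icon2.png", 0.5, False),
]

print(proportion_missing(memento))  # one of four resources is missing
print(weighted_damage(memento))     # the missing resource carries most of the weight
```

With these made-up weights, only a quarter of the resources are missing, yet the weighted rating is much higher because the missing stylesheet dominates the total weight. How importance is actually assigned is the subject of the paper itself.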
INDEX TERMS
Internet, Web pages, Cascading style sheets, Harmonic analysis, Equations, Mathematical model, Twitter
CITATION
Justin F. Brunelle, Mat Kelly, Hany SalahEldeen, Michele C. Weigle, Michael L. Nelson, "Not all mementos are created equal: Measuring the impact of missing resources", 2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL), pp. 321-330, 2014, doi:10.1109/JCDL.2014.6970187