Issue No. 06 - Nov.-Dec. (2014 vol. 18)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MIC.2014.104
Andrew G. West, Verisign Labs
Adam J. Aviv, US Naval Academy
Publicly posted URLs sometimes contain a wealth of information about the identities and activities of the users who share them. URLs often utilize query strings -- that is, key-value pairs appended to the URL path -- to pass session parameters and form data. Although often benign and necessary to render the Web page, query strings sometimes contain tracking mechanisms, usernames, email addresses, and other information that users might not wish to publicly reveal. In isolation, this isn't particularly problematic, but the growth of Web 2.0 platforms such as social networks and microblogging means URLs, which are often copied and pasted from Web browsers, are increasingly publicly broadcast. To study URL sharing's privacy ramifications, the authors ran a measurement study that looked at 892 million user-submitted URLs, many disseminated in semipublic forums. That corpus contained a trove of personal information, including 1.7 million email addresses. In the most egregious examples, query strings contain plaintext usernames and passwords for administrative and sensitive accounts. The authors identify data leakage via both key-driven and value-driven analysis using manual inspections and automatic detection logic. Additionally, they analyze the click-through rates of sensitive URLs, examine geographical and mobile behavior patterns, and measure the broader statistical properties of key-value pairs. Finally, they propose a CleanURL service that can "scrub" URLs of privacy-violating content.
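As a rough illustration of how key-driven and value-driven detection might be combined to scrub a URL, consider the Python sketch below. It is not the authors' CleanURL implementation: the key list, the email pattern, and the names `SENSITIVE_KEYS`, `is_sensitive`, and `scrub_url` are assumptions made only for this example, and a real detector would need far broader rules.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of key names treated as sensitive (key-driven check).
SENSITIVE_KEYS = {"user", "username", "email", "password", "pass", "pwd"}

# Deliberately simple email pattern (value-driven check); an assumption,
# not a full address validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_sensitive(key, value):
    """Flag a pair if its key is on the list or its value looks like an email."""
    return key.lower() in SENSITIVE_KEYS or EMAIL_RE.match(value) is not None


def scrub_url(url):
    """Return the URL with flagged key-value pairs removed from its query string."""
    parts = urlsplit(url)
    # parse_qsl percent-decodes values, so "%40" is seen as "@".
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if not is_sensitive(k, v)]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(scrub_url(
        "http://example.com/page?id=42&email=alice%40example.com&password=hunter2"
    ))
```

In this sketch the `email` and `password` pairs are dropped because of their key names, while a pair such as `contact=bob%40example.org` would be dropped because of its value. Removing a pair can change what the server returns, so a practical service would presumably need to check that the scrubbed URL still renders the intended page.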
Privacy, Entropy, Electronic mail, Mobile communication, Internet, Mobile handsets, Uniform resource locators, Computer security, Query processing
Andrew G. West, Adam J. Aviv, "Measuring Privacy Disclosures in URL Query Strings", IEEE Internet Computing, vol. 18, no. 6, pp. 52-59, Nov.-Dec. 2014, doi:10.1109/MIC.2014.104