Browser Fingerprints Threaten Privacy
by George Lawton
The ongoing contest between Web users' privacy and behavior-tracking browser applications has moved from cookies to fingerprints. By gathering seemingly insignificant bits of information, such as a browser's version number and plug-ins, websites can uniquely identify ("fingerprint") a browser and, in extreme cases, its user. Browser fingerprints track users more accurately than cookies. They're also harder to detect and erase than predecessor technologies, including supercookies such as Flash local shared objects (LSOs). Moreover, their presence and sophistication are growing, not least because the technology has useful applications in detecting fraudulent online bank and merchant transactions.
And right now, websites can implement browser fingerprinting without user consent or knowledge, said Seth Schoen, staff technologist at the Electronic Frontier Foundation (EFF).
"We might not object to a financial site using these techniques to reduce fraud," Schoen said, "but that doesn't necessarily mean the technique should exist or that browsers shouldn't take measures to reduce distinctiveness. It's a problem when people can be tracked without their knowledge in a way that doesn't let them take measures to control tracking."
Internet Tracks
Websites commonly drop small files, known as cookies, on visitors' browsers to record information about their visit, such as the items they looked at in a store or the articles they read at a news site. The practice can benefit visitors in some ways, for example by reducing the need to reenter their names and passwords every time they check into a site.
However, privacy advocates have long condemned the practice when it's implemented without users' knowledge. They've focused on sites that didn't disclose the activity and on services, such as DoubleClick, that tracked a user's actions across multiple sites. Browser makers have responded by developing more sophisticated privacy-management tools that let users better manage cookies or choose a private browsing mode to surf anonymously. As consumers began adopting these tools, cookies became less effective for tracking online behavior. Many websites subsequently began using Flash LSOs. Because Flash is installed on over 99 percent of all browsers, LSOs served essentially the same function as cookies. Additionally, Flash allowed multiple browsers on the same machine to use the same LSOs.
This issue drew a lot of attention last year, when Berkeley researchers found that many sites were using Flash cookies without noting it in their privacy policies (www.computer.org/portal/web/computingnow/archive/news032). Adobe responded by improving the user tools for managing Flash supercookies. Flash version 10.1, which is scheduled for general release later this year, will include privacy enhancements that the company says will better mirror a local browser's privacy settings, including the option to completely disable Flash cookies and surf in a private browsing mode.
Again, these improvements are reducing tracking effectiveness. Results from a 2010 study conducted by Scout Analytics, a Web analytics service provider, showed that generic cookies overstated the number of unique users by two to four times and that Flash cookies overstated it by 10 to 15 percent, a figure that could increase when Flash upgrades its privacy-management tools.
Browser Entropy and User Privacy
Browser fingerprinting is the next generation of tracking technology. It's based on the idea of reducing the information entropy, or randomness, associated with identifying a variable. We can measure entropy in bits, and new information can reduce the amount of entropy by a certain number of bits. For example, learning someone's sex reduces entropy by 1 bit (2^1 = 2), while learning their birth month reduces entropy by 3.58 bits (2^3.58 ≈ 12). Not all information bits lead to a closer identification, but to put this idea in context: In 2000, Latanya Sweeney, a professor of computer science at Carnegie Mellon University and director of its Laboratory for International Data Privacy, showed that 87 percent of Americans could be uniquely identified from three pieces of information: birth date, zip code, and sex.
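The arithmetic behind those two figures can be checked with a short Python sketch. It assumes every outcome is equally likely (two sexes, twelve birth months), which is a simplification, and the function names are illustrative rather than taken from any library.

```python
import math


def bits_from_outcomes(n_outcomes: int) -> float:
    """Bits of entropy removed by learning one of n equally likely outcomes."""
    if n_outcomes < 1:
        raise ValueError("n_outcomes must be at least 1")
    return math.log2(n_outcomes)


def bits_from_probability(p: float) -> float:
    """Bits of entropy removed by observing a value that occurs with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


sex_bits = bits_from_outcomes(2)       # two equally likely values
month_bits = bits_from_outcomes(12)    # twelve equally likely months

print(f"sex: {sex_bits:.2f} bits")
print(f"birth month: {month_bits:.2f} bits")
```

The second function covers the more realistic case where values are not equally common: a rare value removes more entropy than a common one.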
Some combinations of browser information can reduce device entropy by 18 bits or more. A crude form of the technique has been possible since HTTP 1.1 began supporting queries of up to 11 browser variables, such as its operating system, browser version, language, remote address, and file types it can open. About half these variables are useful for identifying a device, said Avivah Litan, research director at Gartner Group. But these don't reduce entropy enough to uniquely identify users.
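To make the idea concrete, here is a minimal Python sketch of how a handful of reported browser variables might be combined into a single identifier. It is an illustration only, not any vendor's method: the attribute names and values are hypothetical, and hashing a sorted JSON string is just one simple way to get a stable ID. The last line shows what 18 bits means in practice, assuming all devices are equally likely.

```python
import hashlib
import json


def fingerprint(attributes: dict[str, str]) -> str:
    """Hash a set of browser attributes into one stable identifier."""
    # Sort keys so the same attributes always serialize the same way.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def anonymity_set_size(bits: float) -> float:
    """Number of equally likely devices that `bits` of entropy can tell apart."""
    return 2 ** bits


# Hypothetical values, for illustration only.
browser_a = {
    "operating_system": "Windows XP",
    "browser_version": "Firefox 3.6",
    "language": "en-US",
    "remote_address": "192.0.2.10",
    "accepted_types": "text/html,application/xhtml+xml",
}
# Same browser profile except for the language setting.
browser_b = dict(browser_a, language="da-DK")

print(fingerprint(browser_a)[:16])
print(fingerprint(browser_b)[:16])
print(anonymity_set_size(18))
```

Changing a single attribute yields an unrelated hash, which is why a real system would more likely compare attributes individually than rely on one exact-match digest.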
Henrik Gemal, a Danish Web applications developer, introduced JavaScript-based browser fingerprinting in 1999, when he built BrowserSpy as a collection of JavaScript utilities that could detect a browser's name and version. At the time, he meant merely to demonstrate the privacy weaknesses in all browsers.
Browser Fingerprinting in Fraud Detection
In 2005, researchers at the University of California, San Diego, found variables that could be measured across browsers or even across multiple virtual machines on the same physical device. By measuring subtle shifts in the way different clocks track time, they reduced entropy in device IDs by 6.44 bits, making the odds of identifying a unique machine 1 in 86. This spurred the interest of Web fraud-detection vendors, who were developing techniques to help banks and merchants, noted Cory Siddens, senior product manager at CyberSource.
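The article doesn't spell out how the researchers measured clock drift, so the Python sketch below is a stand-in rather than their method: it fits a least-squares line through synthetic timing samples to estimate how fast a remote clock gains or loses time, in parts per million. The 50 ppm skew and the sampling schedule are invented for illustration. The final line converts the reported 6.44 bits into odds.

```python
def estimate_skew_ppm(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (elapsed seconds, observed offset seconds), in ppm."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_o = sum(o for _, o in samples) / n
    covariance = sum((t - mean_t) * (o - mean_o) for t, o in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("samples must span more than one point in time")
    return covariance / variance * 1e6


# Synthetic data: a remote clock that gains 50 microseconds per second,
# sampled every 10 seconds for 10 minutes (values are made up).
TRUE_SKEW_PPM = 50.0
samples = [(float(t), t * TRUE_SKEW_PPM / 1e6) for t in range(0, 600, 10)]

estimated = estimate_skew_ppm(samples)
odds = 2 ** 6.44  # devices distinguishable with 6.44 bits

print(f"estimated skew: {estimated:.3f} ppm")
print(f"6.44 bits is about 1 in {odds:.1f}")
```

Real measurements would be noisy, so a practical estimator would need many more samples and some outlier handling.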
These vendors began using device fingerprints to identify multiple logins or purchases from otherwise unique identities. "Fraudsters are often very good at obtaining complete identities with no interconnected characteristics," Siddens explained. "But placing orders and logins from unique machines is much more difficult. Typical fraud-detection systems will contain functionality to look for linkages between orders and velocities around transactions. Without a device fingerprint, the fraudsters are using perfect fraudulent identities to evade linkage/velocity-checking systems."
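The linkage check Siddens describes can be sketched in a few lines of Python. This is a simplified illustration, not CyberSource's system: the `Order` record, the one-hour window, and the one-identity threshold are all assumptions chosen for the example. It flags any device fingerprint that shows up under more than the allowed number of distinct identities within the window.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    identity: str     # customer identity presented with the order
    device_id: str    # device fingerprint observed for the order
    timestamp: float  # seconds since an arbitrary epoch


def flag_linked_devices(orders, max_identities=1, window_seconds=3600.0):
    """Return device IDs used by more than max_identities identities in one window."""
    by_device = defaultdict(list)
    for order in orders:
        by_device[order.device_id].append(order)

    flagged = set()
    for device_id, device_orders in by_device.items():
        device_orders.sort(key=lambda o: o.timestamp)
        start = 0
        for end in range(len(device_orders)):
            # Slide the window start forward until it spans at most window_seconds.
            while device_orders[end].timestamp - device_orders[start].timestamp > window_seconds:
                start += 1
            identities = {o.identity for o in device_orders[start:end + 1]}
            if len(identities) > max_identities:
                flagged.add(device_id)
                break
    return flagged


# Hypothetical orders: three "unrelated" identities share dev-1 within ten minutes.
orders = [
    Order("alice", "dev-1", 0.0),
    Order("bob", "dev-1", 300.0),
    Order("erin", "dev-1", 600.0),
    Order("frank", "dev-2", 100.0),
    Order("frank", "dev-2", 200.0),
    Order("carol", "dev-3", 0.0),
    Order("dave", "dev-3", 86400.0),  # same device, but a day later
]

print(flag_linked_devices(orders))
```

Without the `device_id` field, the three dev-1 orders would share nothing a linkage check could catch, which is the gap the quote points to.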
According to a 2009 report from Javelin Research, identity fraud costs about $54 billion per year in the United States alone. Sophisticated browser fingerprinting techniques could decrease fraud by 15 percent, said Gartner's Litan. Consequently, several vendors have sprung up to apply them to banking and merchant transactions as well as to travel agency and even dating sites, said Ori Eisen, chair of The 41st Parameter, a technology provider for online device intelligence and identification. Other vendors include CyberSource, Arcot, Iovation, ThreatMetrix, and Scout Analytics.
Burton-Taylor, which tracks the financial services publishing market, estimates the annual information services market at US$23 billion. Scout Analytics has launched a service to reduce password sharing for these exclusive publishing services. The service combines browser fingerprinting with a unique biometric based on a user's typing pattern to distinguish an authorized user from, say, a friend. Publishers generally see a 15 percent revenue increase from deploying this type of security, said Matt Shanahan, Scout Analytics' senior vice president of strategy.
One limitation of browser fingerprinting is that applications are more difficult to program than they are with other client-identification techniques, said Shanahan. They also require more storage, computation, and bandwidth, and functions such as link checking increase resource requirements even further. Finally, browser fingerprinting isn’t as effective on mobile platforms, which tend to have more constant font sets and browser versions.
Privacy Holes
Privacy concerns go beyond identity theft and fraud, of course. One fingerprinting technique can track browser history by asking a series of yes/no questions about whether a browser has visited a particular site. This approach consumes even more computer, communications, and storage resources than other techniques, but it can reveal an alarming amount of information about users. Sites such as "I want to know you" (http://agph.dyndns.org/IWTKY) demonstrate this privacy hole.
Beencounter.com recently launched a commercial service that lets website managers see whether you've visited any of up to 50 different sites that they select. In the interest of privacy, the company claims to block searches for visits to adult sites, gambling sites, and financial institutions.
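To see why a list of yes/no answers is so revealing, consider this small Python sketch. It doesn't perform any history detection; it simply takes the answers as given and packs them into one number, one bit per probed site. The site names are placeholders. With 50 probed sites, the number of possible answer patterns is 2^50, an upper bound on how many different histories such a list could distinguish.

```python
def history_signature(probe_sites: list[str], visited: set[str]) -> int:
    """Pack one yes/no answer per probed site into an integer bit vector."""
    signature = 0
    for index, site in enumerate(probe_sites):
        if site in visited:
            signature |= 1 << index  # bit `index` is 1 if the site was visited
    return signature


# Placeholder site names; the visited set stands in for the yes/no answers.
probe_sites = ["news.example", "shop.example", "forum.example", "mail.example"]
visited = {"shop.example", "mail.example"}

signature = history_signature(probe_sites, visited)
max_patterns = 2 ** 50  # possible answer patterns for a 50-site probe list

print(f"signature: {signature:04b}")
print(f"possible patterns for 50 sites: {max_patterns:,}")
```

In practice most people share many answers, so the real number of distinguishable users is far smaller than the upper bound, but even a few uncommon sites narrow the field quickly.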
Arvind Narayanan, a postdoctoral privacy researcher at Stanford, has discussed several invasive uses of history tracking. For example, he described techniques that let a site gather information about the social networking groups an individual visited as a basis for uniquely identifying the person. The EFF's Schoen said that public versions of these techniques have been out since 2006, and any Web developer could code a version of them without any restrictions. Firefox recently became the first browser to fix this hole by letting users block a server's access to their history. Schoen expects other browsers to follow suit soon.
Addressing the Privacy Gap
Right now, there's no way to tell if a website is tracking users with browser fingerprinting. In the case of the Berkeley research on Flash LSOs, the researchers compared each site's stated policies against a measurement of its actual LSO usage to find discrepancies. So far, no tools are available to test a site's use of browser-fingerprinting technologies.
Furthermore, users don't yet have tools for making a browser more anonymous. You can turn off JavaScript, but at the cost of losing many useful website features. You can also use services such as TorButton (https://www.torproject.org/torbutton), which improves privacy in Firefox but slows down the Web surfing experience. Few similar tools exist for other browsers.
Most industry experts agree that websites must be transparent about their tracking policies to build trust. For example, John Lovett, a senior partner with the Web Analytics Demystified industry-consulting firm, said, "It borders on ethical boundaries when the consumer can't shake the browser fingerprinting."
Schoen said, "A lot of the privacy and ease-of-tracking issues on the Web are not because the browser developer intended to make it easier to track people, but because of the unintended consequences of making the Web more usable."
The legal framework regarding tracking is still in its infancy, noted Gartner's Litan. There's been considerable discussion about the use of tracking cookies in the US and Canada, and a European directive recently mandated that websites ask users to opt in to cookies. But at the moment, no regulators have even considered the possibility that PCs could be tracked without leaving a trace. "They haven't caught up with clientless device identification," Litan said, "but they will get to it."
The EFF's Primer on Information Theory and Privacy gives an excellent summary of entropy as it relates to privacy: https://www.eff.org/deeplinks/2010/01/primer-information-theory-and-privacy.
George Lawton is a freelance writer based in Guerneville, CA. You can reach him at http://glawton.com.