Electronic Medical Records: Confidentiality, Care, and Epidemiology
Michael Lesk
FEB 20, 2014 13:15 PM

Electronic medical record (EMR) systems are expected to improve patient care, save staff time, and support epidemiological research. For these and other reasons, the Affordable Care Act requires that all US patients have an EMR by 2014. Approximately US$35 billion will be spent to support doctors' and medical facilities' installation of records systems, with a criterion of achieving "meaningful use" in actual practice. Unfortunately, EMRs in the US suffer from not only implementation problems but also policy decisions about their privacy that might impede both patient care and medical progress.

What Does the Rest of the World Do?

Europe has seen impressive results with EMR systems: Denmark has had full coverage for more than 10 years, and other countries such as the Netherlands and Sweden also have essentially full coverage. Denmark has the lowest drug error rate in Europe, and its doctors report that EMR systems save them approximately one hour per day. 1 Meanwhile, the US is still struggling to reduce errors. The famous 2000 Institute of Medicine report "To Err Is Human" estimated that approximately 100,000 deaths resulted from medical errors each year; a decade later, estimates suggest that the rate is approximately the same. 2

The most common serious errors relate to pharmaceuticals. In addition to unintentional drug errors, there are also cases of prescribed but medically inappropriate drug administration. Among eight European countries studied (Denmark, the Netherlands, the UK, Iceland, Norway, Finland, Italy, and the Czech Republic), Denmark has the lowest rate of inappropriate medication: 5.8 percent, compared to 19.8 percent in these countries on average. 3 Fear of medical error is much less common in Denmark than in countries with less complete records, suggesting that Denmark's population has recognized the gains from their system, or that publicity about the dangers of EMRs hasn't obscured observation of their benefits.

In the US, we don't yet see convincing results about care quality. Some practices and experiments report fewer errors with e-prescribing, but other studies don't. Many authors report anecdotal problems with software, some even saying that paper records would be better. Serious studies disagree. To pick just a few papers, one study reports improved results from health IT in hospitals, another reports no improvements with outpatients, and a review suggests that effects are minimal and nonsignificant. 4

Patient compliance with physician instructions—a key ingredient in improving health—doesn't seem to improve as a result of automation. The excitement about personal health records and getting people to monitor their own health has died down a bit, with the demise of Google Health being an example. Devices such as the FitBit, which tracks calorie consumption, and Wi-Fi-enabled bathroom scales attract young "geeks" who are still in their 20s and don't represent a major share of health problems or costs. The elderly are less enthusiastic about maintaining their own health records.

Electronic health records (EHRs) are also vital for epidemiological research. From 2000 to 2007, studies on electronic records increased by a factor of 6. 5 The UK announced that it will make a medical records database available for UK researchers. 6 Patients choose to opt in, but a high rate of participation is expected, with some 52 million records available for study. France will similarly be making health data available for epidemiological research.

Software Problems Afflict the US EMR System

The US EMR systems' code base is quite old, and their interfaces old-fashioned. Years after mobile devices have become ubiquitous, many medical systems still present doctors with a desktop system, showing a screen of fields to populate. Presentation issues detract from care; for example, screens listing patient drug schedules in the order that prescriptions are written make detecting multiple prescriptions for the same drug difficult. Recently, a doctor sent me a screenshot from an EHR system showing a list of four prescriptions for the same drug interspersed with 13 warning messages (many duplicative) and asking how he could quickly and confidently figure out the total prescribed dosage with such a confusing interface.

Interoperability is another problem. Many patients are treated by multiple healthcare providers, and formatting problems can impede data exchange. Often, when a patient moves to a new provider, the old records system won't deliver structured data to the new system, just images of printed forms. One physician related an amusing problem—in her hospital, EKG tracings were presented horizontally, but when patient records arrived from a nearby provider, EKG tracings were displayed vertically, causing her to spend her time standing with her head turned 90 degrees.

Such interoperability problems impede efforts for coordinated care. Modern healthcare attempts to consider all patient problems together, rather than isolating and treating issues separately. When different specialists can't easily see the same record, care might suffer. Using a single record per patient—as many European systems do—can reduce conflicts and improve care. Many participants recognize interoperability as a major issue; for instance, Minnesota law requires interoperable health records by 2015.

One disadvantage of the single record is that it's bulkier. The modern medical record averages more than 200 pages, and doctors are supposed to spend less than 10 minutes with each patient. As a result, information overload is a serious problem, and bad displays and lack of summarization make things worse. When health providers spend all their time looking at screens instead of patients, aspects of patient behavior might be missed, and patients might feel ignored and less involved.

Some EMR standards exist, both at high (CCDs [continuity of care documents] and CCRs [continuity of care records]) and lower levels (radiological image exchange). However, too often it appears that the vendors' goal is to lock in customers rather than to facilitate data transfer to new systems.

A combination of interface and interoperability problems, along with training issues, planning problems, and installation difficulties, has meant that 30 to 40 percent of US attempts to install an EHR system fail. 7 There's now an entire literature on EMR system problems in practice, with discussions on procedures to improve interfaces, workflow, and user engagement. A recent Middleton review reports on efforts to increase computer systems' usability for the patients' benefit. 8

Typically, outsiders can't inspect EMR system code. Although the Veterans Administration uses an open source system, and West Virginia Senator Jay Rockefeller proposed a bill in 2009 to fund open source records systems, the industry in general hasn't been enthusiastic about using open source. This makes debugging healthcare software a proprietary and unobservable task.

To obtain the improvements observed in Europe, the US requires better software. Unfortunately, during the three-year period between the enactment of the requirement for health records in 2009 and the 2012 election—which might have resulted in the repeal of the Affordable Care Act—less was done to develop and improve EMR systems than might have happened without uncertainty. Over the next few years, rapid progress is necessary to achieve real patient benefit.

Problems Imposed by Vendor Confidentiality

Most EMR vendors insist on contracts that don't allow healthcare staff to disclose software problems to anyone except the vendor. Some contracts even forbid showing screenshots of their systems to anybody. These contracts also transfer all liability from the software vendor to the healthcare system. 9 In addition, despite its importance in medical treatment today, EMR software isn't regulated by the US Food and Drug Administration (FDA). This makes historical sense; when software was first introduced to medicine, it arrived as billing software, and billing mistakes aren't likely to cause patient injury. However, we're long past those days, and our hospitals' EMR/EHR software still lacks oversight.

Confidentiality clauses prohibiting disclosure of medical software problems contrast with the mandatory reporting of drug problems and the medical device–reporting systems. The FDA operates the Adverse Event Reporting System to which pharmaceutical companies must report adverse drug effects. Individual clinicians and even consumers can also make reports (but aren't compelled to do so). Similarly, the FDA's Manufacturer and User Facility Device Experience system collects reports of adverse events involving medical devices. Again, device manufacturers must report adverse events, and hospitals must report deaths related to medical devices to the FDA but may report injuries to the manufacturer alone. No similar system exists for software problems.

The largest single class of medical errors is drug errors, and e-prescribing systems are believed to reduce error. A recent comparison of e-prescribing systems' error rates found wide variations, from 5 to 37 percent. 10 However, the authors couldn't get permission to publish the systems' names, so physicians don't know which are best.

If patient safety is a critical health IT issue, corporate policies that interfere with research to help patients by restricting hazards disclosure—in the name of intellectual property and liability limitations—are difficult to defend. For improved patient safety, we need data about actual problems, particularly given the increasing dominance and complexity of computer software. In fact, the confidentiality imposed on bug reports hurts the software further: vendors are encouraged to fix bugs only at the hospital that reported the problem, leading to a proliferation of versions with different bugs persisting in different places. Enforcing ignorance of software problems hurts patients and is poor public policy.

As an example of a similar confidentiality issue in a different domain, consider NASA's Aviation Safety Reporting System, which accepts reports from air transport workers, including pilots, flight crews, and ground staff in both industry and government, of any incident that presents a safety risk. As an incentive to report incidents, people who report a problem that didn't involve an actual accident or a violation of law aren't penalized for their actions if they promptly report the situation. In addition, the data is kept confidential, and the entire system is run by NASA, not the Federal Aviation Administration, which would be the enforcement agency. NASA has no enforcement power over air travel. As a result of this bargain—better data in exchange for immunity—more than 1 million incidents are in the database. These are available in anonymized form for research.

Patient Confidentiality Impacts Research

Increasingly, we're restricting the details available even in anonymized medical records. We used to publish death records by cause of death and town. Now they're published only by state, making it difficult to find "cancer clusters" or other data. Daniel Wartenberg and W. Douglas Thompson discuss the conflict between privacy and research, noting that in 1988, public health records included county or city location and date, whereas now they have no geographic information and only a partial date. 11 The authors note that important research on air pollution done in the past couldn't be done today. We might ask why death statistics need the same level of privacy as the recording of events about living patients.

Similarly, a study of the Health Insurance Portability and Accountability Act's (HIPAA) impact on influenza research observed the distortion resulting from HIPAA restrictions on geographic coding. As a result of privacy concerns, one research group's question about the relationship of a bacterial infection to stomach cancer in a small community couldn't be answered. 12 Because health professionals couldn't say whether the community actually had a higher-than-normal risk of stomach cancer, they couldn't address the resulting anxiety or resolve the underlying issue.

The onrush of genetic data will make privacy an increasingly relevant issue. Groups like Sage Genomics believe that by analyzing patient genomic data, they can revolutionize cancer treatment, making chemotherapy more effective with fewer side effects. Today, however, patient privacy is an obstacle to gathering such data. In the UK, the law recognizes "DNA theft" as an offense, although research on DNA samples is permitted with approval. If the US adopted similar rules, those rules would impede progress in genomic research. As with other data that might be personally identifiable—genomic or radiological, for example—research that requires access to the data can be impeded by privacy constraints.

It's particularly upsetting to epidemiologists that commercial companies have better access to medical records than researchers do, because companies can buy records from insurance companies, pharmacies, and hospitals. A few years ago, Vermont tried to give doctors the right to prohibit the sale of information about the prescriptions they wrote, but the Supreme Court struck down the law. As a result, epidemiology is easier in corporations than in medical schools and hospitals—although billions of public dollars are spent on medical research, public researchers are hampered in their efforts.

Anonymization Has Been Given a Bad Name

Opinions such as "the anonymization process is an illusion" are common. A few years ago, someone found the medical records of William Weld, then Governor of Massachusetts, in an anonymized dataset. More widely publicized examples are the identification of particular people in nonmedical datasets, in particular, AOL search logs and Netflix movie recommendations. These instances frightened people into further fuzzing of medical data, which impacts formal research. Institutional review board requirements also constrain attempts to do medical research. The recognition that DNA databases and radiological images are also personally identifiable has further frustrated efforts to create patient databases for use in research.

More recently, there's been some pushback. Daniel Barth-Jones argued that the Weld deanonymization case depended on publicity given to his hospitalization and doesn't represent a set of generally applicable circumstances. 13 Researchers are exploring additional ways to anonymize data. In general, these methods rely on aggregating data to a level at which individuals can't be identified. These methods can be difficult to understand and rely on statistical methods, so confusion can lead to disclosure out of ignorance. For instance, data administrators who don't understand statistical methods might allow searches that tell you that 102 people in a group are more than 40 years old, but 101 are more than 41 years old, so that you know exactly one person is 41 years old; by combining this with other features, you can find out more about that one person.
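The age example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Patient` record, the `build_group` data, and the `count_older_than` query function are all hypothetical, standing in for whatever aggregate search a real data administrator might expose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    patient_id: str   # hypothetical identifier, never shown to the querier
    age: int
    diagnosis: str


def build_group():
    """Build a made-up group that matches the numbers in the text."""
    # 101 people older than 41 (ages 42 to 71).
    group = [Patient(f"p{i:03d}", 42 + i % 30, "hypertension") for i in range(101)]
    # Exactly one person aged 41.
    group.append(Patient("p101", 41, "diabetes"))
    # 20 younger people (ages 30 to 39) who match neither query.
    group += [Patient(f"y{i:03d}", 30 + i % 10, "asthma") for i in range(20)]
    return group


def count_older_than(records, age, diagnosis=None):
    """Aggregate query: how many people are older than `age`,
    optionally restricted to a single diagnosis."""
    return sum(
        1 for r in records
        if r.age > age and (diagnosis is None or r.diagnosis == diagnosis)
    )


records = build_group()

# Two innocent-looking aggregate answers...
over_40 = count_older_than(records, 40)
over_41 = count_older_than(records, 41)
# ...whose difference isolates the single 41-year-old.
exactly_41 = over_40 - over_41

# Repeating the same pair of queries with one more attribute
# reveals something about that one person.
diabetic_41 = (count_older_than(records, 40, "diabetes")
               - count_older_than(records, 41, "diabetes"))

print(over_40, over_41, exactly_41, diabetic_41)
```

In this made-up group, both counts are above 100, so a rule that merely suppresses small counts would not by itself block the pair of queries; it is the difference between them that singles out one person.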

It's sometimes possible to deanonymize data by comparing multiple public sources, but it won't be apparent that individuals can be identified until the various data sources are compared. For example, people can be identified from cell phone records, even though the locations aren't reported to full precision. Limiting the number of data requests to databases can impede someone who wishes to identify an individual by comparing results from multiple queries. For example, consider how the Netflix dataset was de-anonymized. Imagine I take three rarely watched movies and find that one and only one person in the Netflix dataset saw all of them. Then I find that one and only one person on IMDb has reviewed all of them. It's a good guess that this is the same person, and IMDb reviews are signed and often link to real names and biographies. The restrictions that prevent this kind of game-playing also pose obstacles for researchers, and so restrictions and permissions for qualified researchers need to be negotiated.
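The three-movie thought experiment can be written down as a small linkage sketch. Everything here is invented for illustration: the user labels, reviewer names, film titles, and the helper functions `viewers_of_all` and `link_by_rare_titles` are assumptions, not the method or data of any actual de-anonymization study.

```python
def viewers_of_all(histories, titles):
    """Return everyone whose history contains every title in `titles`."""
    wanted = set(titles)
    return [person for person, seen in histories.items() if wanted <= seen]


def link_by_rare_titles(anonymous_views, signed_reviews, rare_titles):
    """Guess a match only when exactly one person in each dataset
    has seen (or reviewed) all of the rare titles."""
    anonymous_hits = viewers_of_all(anonymous_views, rare_titles)
    signed_hits = viewers_of_all(signed_reviews, rare_titles)
    if len(anonymous_hits) == 1 and len(signed_hits) == 1:
        return anonymous_hits[0], signed_hits[0]
    return None  # ambiguous or no match: nothing can be concluded


# Hypothetical "anonymized" viewing histories.
anonymous_views = {
    "user_0017": {"Rare Film A", "Rare Film B", "Rare Film C", "Blockbuster X"},
    "user_0042": {"Rare Film A", "Blockbuster X"},
    "user_0099": {"Blockbuster X", "Blockbuster Y"},
}

# Hypothetical signed public reviews.
signed_reviews = {
    "reviewer_jane": {"Rare Film A", "Rare Film B", "Rare Film C"},
    "reviewer_raj": {"Blockbuster X", "Rare Film B"},
}

rare = ["Rare Film A", "Rare Film B", "Rare Film C"]
match = link_by_rare_titles(anonymous_views, signed_reviews, rare)
popular = link_by_rare_titles(anonymous_views, signed_reviews, ["Blockbuster X"])

print(match)    # the one anonymous user paired with the one signed reviewer
print(popular)  # a widely watched title identifies nobody
```

The sketch shows why rarity matters: the rare titles single out one record in each source, while a popular title matches several people and supports no conclusion. Neither dataset reveals the link on its own; it appears only when the two are compared.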

Various researchers are now trying to balance anonymization with clinical needs. For example, Oscar Ferrández and his colleagues look at various methods to anonymize clinical reports and compare their effectiveness at removing personal information while leaving enough detail for clinical study. 14 Privacy advocates will object that the recommended methods don't guarantee anonymization; however, perfect confidentiality is an unachievable goal (and didn't exist with paper records, either). More computer security wouldn't have helped in the recent case of two Australian comedians who, by impersonating the Queen of England, persuaded London hospital staff members that they had authority to know about the Duchess of Cambridge's medical condition.

Amateur Epidemiology

Sites such as www.patientslikeme.com and www.23andme.com attempt to collect medical records for research, as do professional researchers in organizations like Sage Genomics. These efforts demonstrate that sharing medical data can produce better results for individual patients, so that patients enthusiastically participate. For example, patients know privacy risks exist in using Internet discussion groups about health, but the more serious their illness, the more willing they are to disclose information.

Professional epidemiologists have some hesitation about these sites owing to problems such as self-selection and inaccurate reporting. Nevertheless, Internet data has been valuable for medical research. The best-known example is Google Flu Trends, which uses information about popular search terms to track those that are correlated with influenza outbreaks. This search data detects places where the disease is occurring faster than the Centers for Disease Control receives and publicizes reports from doctors. Today, there is more interest in sites with more direct medical data, such as disease support groups and the volunteer sites I mentioned earlier. However, A. Cecile Janssens and Peter Kraft have several hesitations about exploiting data from online communities. 15 They worry about selection bias, for example. If you imagine that people who like the Internet are more likely to fill out forms about their psychological health, you might get a distorted view about the impact of Internet use on depression. Similarly, confounding can arise when people report only a few variables, some of which might be related to unreported variables. In a normal experiment, we might be able to ask about those other variables, but this can be more difficult with a volunteer survey. Janssens and Kraft also stress a need for careful disclosure of what's being done.

The individual genomics data on the 23andMe website can also be used in research, but questions have arisen as to whether risks are adequately disclosed. A problem with discussion of the detailed risks is that sufficiently frightened patients might refuse to discuss their problems with their physician. Others have dismissed (or at least criticized) personal testing as "recreational genetics" and tried to steer clear of this data.

Some systems are a mixture of self-managed treatment and epidemiological research. For example, some systems for diabetes patients encourage patients to pay careful attention to their own diabetes and manage their own treatment. Data collected from these patients can be valuable for epidemiology. In addition to the general sites already mentioned, researchers have exploited data from several UK diabetes-related online communities. Again, we must take steps to ensure patient protection and to understand the risks involved in data sharing.

Paul Wicks and his colleagues wrote a particularly interesting article on data exploitation from PatientsLikeMe in which they selected control patients from a dataset to reduce selection bias. 16 Their article suggests that patient-reported data can accelerate the discovery of new treatments as well as help evaluate the effectiveness and side effects of current methods. Patient-contributed data is available in large quantities and more rapidly than data from most clinical trials. It's particularly important for rarer diseases in which researchers in one geographic location might have difficulty finding enough sufferers to achieve statistical validity.

In all these situations, patients voluntarily contributed data. They might not know what the risks are, or they might have decided they are small. In any case, patients have decided that voluntary disclosure is useful to them personally and are willing to accept that other people will use the data for research. People who aren't currently sick don't see the same advantages in disclosure, but we can't do longitudinal studies without information on people who haven't yet developed the diseases we're investigating.


In the US, data from patient records can indeed be used against you. Although your health insurance company can't use genetic testing results in rate setting, there's no such prohibition for life insurance or long-term-care insurance companies. And, of course, it's legal to fire people for being sick. As a strange metric of risk, criminals sell social security numbers for $5 but medical records for $50. People perceive particular dangers in medical data exposure—aside from the now familiar and general risks of identity theft, barrages of telemarketing, and public notoriety, medical records might affect employment, medical treatment, insurance, and many other facets of life such as the ability to buy firearms or criminal sentencing decisions.

Strangely enough, some of the same arguments made about patient privacy are made about corporate confidentiality. Medical software companies want neither mandatory disclosure of software flaws nor regulation by the FDA. They argue the possibility of financial loss if disclosure of errors leads to liability lawsuits, for example. Just as individuals worry that they won't be able to get a new job if employers know too much about their medical history, vendors worry that they can't introduce new features if regulators and purchasers are able to closely investigate the process and take too long to evaluate new options. They argue that government regulation, in particular, will slow the creation of new features and the introduction of new software methods; this would be more convincing as an argument if there weren't still EMR systems using Cobol. Again, there's a conflict between public benefit and participants' privacy, in this case, vendors' privacy. Most of us see an ethical difference between confidentiality of patient data and confidentiality of software design, but similar arguments are being made, and I am reminded that Governor Mitt Romney argued that "corporations are people."

People—real people—legitimately fear losing their job as a result of medical records disclosure. Sometimes, these consequences are justified. In 1996, a train crash in New Jersey killed three people, including the train engineer who ran through a red signal and who had concealed from the railroad company his loss of color vision as a result of diabetes. 17 And, going back more than a century, a New York woman best known as "Typhoid Mary" infected multiple people with typhoid fever but kept taking jobs as a cook. She was released from quarantine after promising to stop working as a cook, but being a laundress paid less, so she changed her name and returned to cooking. After another series of typhoid cases, she was confined until her death.

The news media regularly feature stories about theft of records, typically credit card numbers. One result of these stories is an increased level of fear about data disclosure, causing people to demand ever more confidentiality about medical records, which as noted, interferes with medical research and treatment decisions. We don't see news stories about our inability to recognize carcinogens because we can't do adequate data mining. Thus, the media bias the discussion in favor of privacy and against medical research.

The Conflict between Epidemiology and Privacy

Jane Yakowitz wrote a detailed and insightful article about the conflict between research and privacy, ranging far beyond medical epidemiology. 18 She points to the many valuable studies done with large datasets and the importance of continuing such research. Anonymization is possible, if not perfect, and she suggests that the public benefit is so important that it outweighs exaggerated privacy concerns. We can also anticipate further improvements in our knowledge of anonymization techniques and our understanding of the risks.

Some believe that patient data should be "property" belonging to the patient and should not be available without payment. Given that patient data is routinely traded today, 19 albeit in an anonymized HIPAA-compliant form, it's understandable to think that if patient data is sold, the patients should get the money. However, introducing property rights in medical records is likely to create a mess for the entire healthcare system. Many aspects of people's lives, such as their credit, where and how rapidly they drive, what books they buy, and what movies they watch, are of commercial value and exploited today. Should all these become an individual's property right? At a minimum, this will produce a vast expansion of license agreements, so that buying a cell phone will require acknowledgment of a transfer of ownership of travel history. As a society, we're unwilling to impede data studies done for marketing; is it not more important to preserve our ability to do medical research?

As an example of the importance of detailed data, Janet Currie and W. Reed Walker have shown that introducing E-ZPass electronic toll collection in New Jersey improved health, reducing premature births and low birth weight infants among mothers who lived within 2 km of a toll plaza. 20 Avoiding the need for drivers to stop at the tollbooths lowered congestion and pollution. This study couldn't have been done without access to the mothers' exact street addresses, which is exactly the kind of precise data that privacy advocates fear can be used for deanonymization. Should we make studies like this one difficult to do?

More openness about medicine would benefit all of us. We should believe in anonymized records, make them more widely available for study, and push for disclosure and regulation of medical software programs.



1. D. Protti and I. Johansen, "Widespread Adoption of Information Technology in Primary Care Physician Offices in Denmark: A Case Study," Commonwealth Fund, Mar. 2010.
2. D. Grady, "Study Finds No Progress in Safety at Hospitals," The New York Times, 24 Nov. 2010.
3. D. Fialová et al., "Potentially Inappropriate Medication Use among Elderly Home Care Patients in Europe," J. Am. Medical Assoc., vol. 293, no. 11, 2005, pp. 1348-1358.
4. C.M. DesRoches et al., "Electronic Health Records' Limited Successes Suggest More Targeted Uses," Health Affairs, vol. 29, no. 4, 2010, pp. 639-646.
5. B.B. Dean et al., "Use of Electronic Medical Records for Health Outcomes Research: A Literature Review," Medical Care Research and Rev., vol. 66, no. 6, 2009, pp. 611-638.
6. I. Sample, "NHS Patient Records to Revolutionise Medical Research in Britain," The Guardian, 28 Aug. 2012.
7. S. Alfreds, Health Information Technology Adoption in Massachusetts: Costs and Timeframe, Univ. Massachusetts Medical School.
8. A.F. Rose et al., "Using Qualitative Studies to Improve the Usability of an EMR," J. Biomedical Informatics, vol. 38, no. 1, 2005, pp. 51-60.
9. R. Koppel and D. Kreda, "Health Care Information Technology Vendors' 'Hold Harmless' Clause: Implications for Patients and Clinicians," J. Am. Medical Assoc., vol. 301, no. 12, 2009, pp. 1276-1278.
10. K.C. Nanji et al., "Errors Associated with Outpatient Computerized Prescribing Systems," J. Am. Medical Informatics Assoc., vol. 18, 2011, pp. 767-773.
11. D. Wartenberg and W.D. Thompson, "Privacy versus Public Health: The Impact of Current Confidentiality Rules," Am. J. Public Health, vol. 100, no. 3, 2010, pp. 407-412.
12. A. Colquhoun et al., "Challenges Created by Data Dissemination and Access Restrictions When Attempting to Address Community Concerns: Individual Privacy Versus Public Wellbeing," Int'l J. Circumpolar Health, vol. 7, 2012, pp. 1-7.
13. D. Barth-Jones, "The 'Re-Identification' of Governor William Weld's Medical Information: A Critical Re-Examination of Health Data Identification Risks and Privacy Protections, Then and Now," 4 June 2012.
14. O. Ferrández et al., "Evaluating Current Automatic De-identification Methods with Veteran's Health Administration Clinical Documents," BMC Medical Research Methodology, vol. 12, 2012.
15. A.C.J.W. Janssens and P. Kraft, "Research Conducted Using Data Obtained through Online Communities: Ethical Implications of Methodological Limitations," PLoS Medicine, vol. 9, no. 10, 2012; e1001328.
16. P. Wicks et al., "Sharing Health Data for Better Outcomes on PatientsLikeMe," J. Medical Internet Research, vol. 12, no. 2, 2010, p. e19.
17. M.L. Wald, "Eye Problem Cited in '96 Train Crash," The New York Times, 26 Mar. 1997, p. A1.
18. J. Yakowitz, "Tragedy of the Data Commons," Harvard J. Law and Technology, vol. 25, no. 1, 2011, pp. 1-67.
19. M.A. Rodwin, "Patient Data: Property, Privacy & the Public Interest," Am. J. Law and Medicine, vol. 36, 2010, pp. 586-618.
20. J. Currie and R. Walker, "Traffic Congestion and Infant Health: Evidence from E-ZPass," Am. Economic J.: Applied Economics, vol. 3, no. 1, 2011, pp. 65-90.

Michael Lesk is a professor of library and information science at Rutgers University. Contact him at lesk@acm.org.
