The data science community has an obligation to educate politicians and the general public about the implications of big data, said Grady Booch, chief scientist of software engineering at IBM Research. Speaking Tuesday at Rock Stars of Big Data at the Computer History Museum, Booch urged members of the community to speak out when they see big data being misused.
Technologists need to understand that the technologies being created can have unintended consequences. “Consider the implications beyond the technology. Just because something is possible to do, doesn’t necessarily mean that we should do it,” he said.
Inappropriate data collection isn’t anything new. Booch recalled potentially inappropriate questions from past US censuses, such as whether a person was a slave, had a contagious disease, could read or write, or, in recent times, was a citizen. “These questions sound absolutely outrageous, but this is the kind of data collection that took place,” he said.
Often, the implications of data aren’t considered. For example, collecting data on electricity usage can yield information on private personal habits. “We know if you look at your electrical usage, I can make a pretty good prediction of when you’re in the house, when you’re doing laundry, and when you’re watching TV,” Booch said.
Even data collected with benign intent can have a dark side. Booch used data collected to protect the monk seal as an example. Making monk seals’ locations public could increase awareness of their plight, but it could also put the endangered species in peril by giving their locations out to those who wish to do them harm.
A big problem is that laws governing big data aren’t keeping up with technology. That’s where the big data community comes in. Booch said the community should be aware of ethical considerations. A good starting point is the codes of ethics developed by organizations such as IEEE, ACM, and INCOSE.
Members of the big data community should ask themselves how they would feel if they themselves or a member of their family were affected by an unintended consequence of big data.
In the past, big data in the health insurance space meant collecting data to allow risk to be spread among large groups. But now, it can be used to evaluate individual risks, leading to some potentially troubling unintended results. For example, if data revealed that someone was going skydiving or had recently gotten a speeding ticket, their insurance rates could go up.
“The law is going to do some stupid things. We as professional people in this space have a responsibility to step up and say something,” Booch said.
Unfortunately, future misuse of big data seems unavoidable. “We’re going to see some really sad, heart-wrenching uses of big data that destroy individuals. We can’t help that,” Booch said. “But issues need to be raised. If we don’t do it, who will?”