Helping People See and Understand Data
Tableau was founded to help people see and understand data. Tableau revolutionized business intelligence by letting a domain expert explore their own data without involving specialists in databases, analytics, and graphics. And it set an intellectual standard for an analytics tool that was both easy to use and allowed deep exploration and analysis.
Tableau’s innovative approach to data analytics is based on research done by founders Chris Stolte and Pat Hanrahan at Stanford University. The resulting system, called Polaris, is a user-centered, interactive tool designed so that a domain expert can explore their own data. It is based on a domain-specific language called VizQL that combines database querying with the automated visual encoding of data. Over the years, we have added many features, but the core product has remained firmly based on this foundation. In this Tableau Blog, Jock Mackinlay uses the product itself to illustrate our history of innovation, starting with our first product launch in 2004.
In 2019, Tableau was acquired by Salesforce, and our product offerings now include Einstein Analytics (since renamed Tableau CRM). While Tableau’s products are human-centered, GUI-based tools for data exploration and visual analysis, Tableau CRM focuses on automated discovery based on ML/AI technology. We are now working as a business to integrate these two approaches.
Tableau Research was founded in 2012 to help drive Tableau innovation with a combination of academic-style research and practical product application. To guide our research, we look both at specific product and customer needs, and at research and technology trends. We are deeply connected to the IEEE VIS community, both as contributors and as sponsors. What follows next are some research areas we feel are important both for us and for the VIS community.
Responsible Analytic Assistance
We need to use the power of automation to help people achieve analytic results that are ethical, trustworthy, and easy to verify. Instead of exploring more and different ways to visualize data, let’s invest in helping people trust and verify that what they see accurately reflects their data. We need to seriously address these questions: Can you trust your visualization? Are your results correct? And if they are, will they lead you to act ethically out in the world?
Collecting, shaping, visualizing, and viewing data is a process that includes many opportunities for introducing bias and error. In some cases, the problems may be obvious. But more often, the results may look entirely reasonable and correct but are not, creating what we call Visualization Mirages that can lead the unwary into discovering insights that are not true.
Here’s the challenge for our research community: What if we could automatically audit or check visualizations for problems? What could we detect? How would we present the findings effectively?
Adding ML/AI to the analytic process offers new challenges. The process becomes more complex, creating new opportunities for bias and error. Bias and error in the ML/AI algorithms themselves are currently getting a lot of attention, much of it focused on the bias built into the training sets that drive these algorithms. Our message, especially for the data visualization community, is that helping people trust what they see when they visualize data is not limited to problems with ML/AI algorithms. It is an important part of any process that helps people see and understand data.
Applying Machine Learning and AI to Visualization
Human-in-the-loop machine learning and explainable AI are large, rapidly growing areas of research and application. Our goal is to understand how best to apply these to visualization and visual analytics. Instead of offering magic boxes of automated “insights,” we need to discover how best to integrate these technologies with our human-centered tools and processes. We want to provide their power in a way that people can understand and trust what they produce.
We believe the answer is to create a collaborative, almost conversational interaction between humans and automated systems like ML/AI. Our vision is to provide a cooperative, cybernetic analytical experience that leads to ethical results. Our research emphasizes creating a trusted partnership between humans and computers, leveraging the strength of both for working with data. We need to consider what this interaction looks like in both directions.
To understand and trust our computer partners, humans need reliable ways to review and understand their results. In the other direction, we need effective ways to act on these results, correct errors, and provide feedback for improvement. The key in both directions is looking broadly at the people and the processes involved in collecting, analyzing, and understanding data. Rather than focusing immediately on new models and algorithms, we should first discover when and how people might engage with the process. Sometimes the results are surprising.
Consider the problem of defining and clustering a collection of text documents into a set of topics. Computers are essential for working with large bodies of text, but modeling and automating this process is difficult and often produces unsatisfactory results, requiring human curation and interpretation. How might we best improve these tools, and specifically, best make it possible for people and machines to work together on them?
To answer that question, we first created a text mining pipeline designed to make it easy to simulate different user interactions throughout the pipeline, then analyzed their impact on the visualized results. We discovered that high-impact interactions tend to be earlier in the data analytics pipeline, like data preparation or feature engineering. Making changes to the later modeling steps had relatively little effect. Therefore, rather than researching new models and ways to manipulate their parameters, we should explore better, more integrated tools at the pre-processing and data preparation stages of the pipeline. (Crisan and Correll).
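The experimental approach above can be sketched in a few lines. The toy pipeline below is an illustrative assumption, not the system from the Crisan and Correll study: it runs a baseline through three stages (preparation, feature engineering, modeling), simulates one user intervention per stage, and measures each intervention’s impact as the fraction of documents whose final assignment changes.

```python
# A toy harness for measuring where in a text-mining pipeline a user
# intervention matters most. Stage logic and the "impact" metric are
# illustrative assumptions, not the published system.

def prepare(docs, drop_stopwords=True):
    """Tokenize, optionally dropping a tiny stopword list."""
    stop = {"the", "a", "of"} if drop_stopwords else set()
    return [[w for w in d.lower().split() if w not in stop] for d in docs]

def featurize(tokens, min_count=1):
    """Bag-of-words counts, keeping terms seen at least min_count times."""
    vocab = {}
    for doc in tokens:
        for w in doc:
            vocab[w] = vocab.get(w, 0) + 1
    kept = {w for w, c in vocab.items() if c >= min_count}
    return [{w: doc.count(w) for w in doc if w in kept} for doc in tokens]

def model(features, threshold=1):
    """Trivially 'cluster' each doc by whether its total weight clears a threshold."""
    return [sum(f.values()) >= threshold for f in features]

def impact(baseline, variant):
    """Fraction of documents whose assignment changed versus the baseline."""
    return sum(b != v for b, v in zip(baseline, variant)) / len(baseline)

docs = ["the cat sat", "a dog ran fast", "the bird", "fish swim of sea"]
base = model(featurize(prepare(docs)))

# Simulate one intervention per stage and compare each to the baseline.
early = model(featurize(prepare(docs, drop_stopwords=False)))   # data preparation
mid = model(featurize(prepare(docs), min_count=2))              # feature engineering
late = model(featurize(prepare(docs)), threshold=2)             # modeling
print({"prep": impact(base, early), "features": impact(base, mid), "model": impact(base, late)})
```

The interesting output is not any single clustering but the per-stage impact scores, which is exactly the kind of evidence the study used to argue for better tooling at the pre-processing stages.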
Another exploration looked at the problem of building analytic data models from XML. The resulting prototype, called Natto, demonstrates how a collaborative interaction between human and machine can iteratively solve this complex problem.
Our challenge to the research community is to move beyond the current focus on visualizing algorithms and explaining models, and to broadly explore the best ways to use ML/AI in visualization as a partner to our human users. We need to discover ways to detect, inspect, and correct their results as an intrinsic part of their application.
Natural Language + Visualization
Tools for analyzing and visualizing data tend to require people to learn to code (R, Python, D3), or learn a tool with a specialized, often complex GUI (Tableau, PowerBI). Advances in Natural Language Processing offer natural language as a form of interaction, which may be combined with GUIs and other more traditional tools for human-computer interaction. We want to provide ways for people to explore their data, not just create chatbots that answer simple questions. But how best to do it? We are trying to answer this question through a series of exploratory prototypes and user studies, such as those described below.
The 2016 system called Eviza demonstrates how people might ask a series of questions about their visualized data. For example, a user begins by selecting a map of earthquake locations. Their first question asks about large earthquakes near California. In response, the system automatically filters the data to “large” and zooms into a view of California. In addition, it offers a widget that lets the user specify the magnitude they mean by “large.” In answer to a follow-up question, for example, “how about near Texas?”, it switches the view to Texas but retains the filter showing only large earthquakes, inferring that intent from the previous context. Maintaining the analytic context in this way supports a conversational flow.
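The context-carrying behavior described above can be sketched as a slot-filling loop. This is a minimal sketch under stated assumptions, not Eviza’s implementation: the parser, the state list, and the default magnitude for the vague term “large” are all hypothetical. Each utterance updates only the filter slots it explicitly mentions, so a follow-up like “how about near Texas?” keeps the earlier magnitude filter.

```python
# Minimal sketch of conversational context in a visual-analytics NL interface.
# Parsing rules and defaults are illustrative assumptions, not Eviza's.
import re

STATES = {"california", "texas", "oklahoma"}

def parse_utterance(text):
    """Extract only the filter slots an utterance explicitly mentions."""
    slots = {}
    t = text.lower()
    if "large" in t:
        slots["min_magnitude"] = 5.0  # assumed default for the vague term "large"
    m = re.search(r"near (\w+)", t)
    if m and m.group(1) in STATES:
        slots["region"] = m.group(1)
    return slots

class Conversation:
    def __init__(self):
        self.context = {}

    def ask(self, text):
        # Pragmatics: slots not mentioned in this utterance persist from
        # the previous question, so the filter state accumulates.
        self.context.update(parse_utterance(text))
        return dict(self.context)

conv = Conversation()
print(conv.ask("show large earthquakes near California"))
print(conv.ask("how about near Texas?"))  # region changes; magnitude filter persists
```

The second call returns a context that still contains the magnitude filter even though the follow-up never mentions “large,” which is the conversational behavior the system demonstrated.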
Eviza demonstrated a novel way to use NL in visual analytics and emphasized the value of mixing an NL interface with more traditional visualization and direct manipulation tools. It highlighted the importance of establishing analytic intent through both the visual context and the linguistic pragmatics. And, it strongly influenced the development of Tableau’s first natural language feature, Ask Data.
Our experience with Eviza convinced us that natural language interaction should be an important part of visualization research. Fundamentally, it lets us think differently about how we build tools and help people understand their data. To make it effective, we need to address two fundamental challenges. The first is helping people learn what to explore in their data. The second is helping people ask their questions effectively.
Autocompletion is an important and useful scaffold in any natural language interface to help with the search and discovery of information. It’s pretty much ubiquitous in any UI that allows people to type in a question. For asking questions about data, we have explored surfacing visualizations and their associated widgets as a form of autocompletion. The SneakPique and GeoSneakPique systems demonstrate these ideas.
The system called Snowy recommends analytic questions to the user, based both on their data and the visualizations they have already created. It uses a combination of data analysis (to define interestingness metrics) and language pragmatics to prompt the user to explore further.
Data values are precise; people are not. Studies can help us better understand what words people use and what they might mean when applied to data analytics. For example, what does a vague modifier like “high” or “expensive” mean when used in data questions, and what types of visualizations best answer such questions? A more recent study explores analytic comparisons.
NL has the potential to enrich human-computer interaction far beyond its current applications such as converting speech to text or helping people answer simple questions. Fundamentally, it lets us think differently about how we build tools and help people understand their data. We challenge the visualization community to embrace this opportunity.
While these themes are important, they don’t represent all the research we do or even all the research we do relevant to the IEEE VIS community. You can find more about our recent work at IEEE VIS in the video included with this blog, which was presented in the sponsors’ session at IEEE VIS this year. A longer video, with more detail about these themes, was given as a sponsor’s keynote at the conference and can be found here. Our published papers can be found on our website.
About the Writer
Maureen Stone has been a member of Tableau Research since its founding in 2012. She joined its leadership team in 2015 and became its director in 2017. She returned to a full-time research role in December, under the leadership of longtime colleague Vidya Setlur.