2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE) (2017)
Urbana, IL, USA
Oct. 30, 2017 to Nov. 3, 2017
Jiawei Han , Abel Bliss Professor, Department of Computer Science, University of Illinois at Urbana-Champaign, USA
The real-world big data are largely unstructured, interconnected text data. One of the grand challenges is to turn such massive unstructured text data into structured, actionable knowledge. We propose a text mining approach that requires only distant or minimal supervision but relies on massive text data. We show quality phrases can be mined from such massive text data, types can be extracted from massive text data with distant supervision, and entities/attributes/values can be discovered by meta-path directed pattern discovery. We show text-rich and structure-rich networks can be constructed from massive unstructured data. Finally, we speculate whether such a paradigm could be useful for turning massive software repositories into multi-dimensional structures to help searching and mining software repositories.
J. Han, "Mining structures from massive text data: Will it help software engineering?," 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), Urbana, IL, USA, 2017, pp. 2.