2010 IEEE 26th International Conference on Data Engineering (ICDE 2010)
Long Beach, CA, USA
Mar. 1, 2010 to Mar. 6, 2010
Nikos Sarkas, University of Toronto, Canada
Albert Angel, University of Toronto, Canada
Nick Koudas, University of Toronto, Canada
Divesh Srivastava, AT&T Labs-Research, USA
The relentless pace at which textual data are generated online necessitates novel paradigms for their understanding and exploration. To this end, we introduce a methodology for discovering strong entity associations in all the slices (subsets of documents defined by meta-data value restrictions) of a document collection. Since related documents mention approximately the same group of core entities (people, locations, etc.), the groups of coupled entities discovered can be used to expose themes in the document collection. We devise and evaluate algorithms capable of addressing two flavors of our core problem: algorithm THR-ENT for computing all sufficiently strong entity associations and algorithm TOP-ENT for computing the top-k strongest entity associations, for each slice of the document collection.
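To make the two problem flavors concrete, here is a minimal brute-force sketch. It is not the authors' THR-ENT or TOP-ENT algorithm, whose details the abstract does not give. It assumes each document is a set of entity names plus a dictionary of meta-data attributes, treats a slice as any assignment of values to a subset of attributes (the empty assignment being the whole collection), and scores an entity pair by the Jaccard coefficient of the sets of documents that mention each entity. That association measure and the names `enumerate_slices`, `pair_scores`, `threshold_pairs`, and `top_k_pairs` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def enumerate_slices(docs, attrs):
    """Group documents' entity sets by every non-empty slice.

    A slice key is a tuple of (attribute, value) pairs for some subset of
    attrs; the empty tuple () is the slice covering the whole collection.
    """
    groups = {}
    for entities, meta in docs:
        for r in range(len(attrs) + 1):
            for subset in combinations(sorted(attrs), r):
                key = tuple((a, meta[a]) for a in subset)
                groups.setdefault(key, []).append(frozenset(entities))
    return groups


def pair_scores(entity_sets):
    """Score every co-occurring entity pair within one slice.

    Assumption (not necessarily the paper's measure): strength is the
    Jaccard coefficient |D(a) & D(b)| / |D(a) | D(b)|, where D(x) is the
    set of documents in the slice that mention entity x.
    """
    single, pair = Counter(), Counter()
    for ents in entity_sets:
        single.update(ents)
        pair.update(combinations(sorted(ents), 2))
    return {(a, b): c / (single[a] + single[b] - c) for (a, b), c in pair.items()}


def threshold_pairs(docs, attrs, tau):
    """Threshold flavor: all pairs scoring at least tau, for each slice."""
    return {
        key: {p: s for p, s in pair_scores(sets).items() if s >= tau}
        for key, sets in enumerate_slices(docs, attrs).items()
    }


def top_k_pairs(docs, attrs, k):
    """Top-k flavor: the k highest-scoring pairs per slice (ties broken by name)."""
    result = {}
    for key, sets in enumerate_slices(docs, attrs).items():
        ranked = sorted(pair_scores(sets).items(), key=lambda kv: (-kv[1], kv[0]))
        result[key] = [p for p, _ in ranked[:k]]
    return result


if __name__ == "__main__":
    docs = [
        ({"Alice", "Bob", "Paris"}, {"source": "A", "month": "jan"}),
        ({"Alice", "Bob"}, {"source": "A", "month": "feb"}),
        ({"Alice", "Carol"}, {"source": "B", "month": "jan"}),
    ]
    print(threshold_pairs(docs, ["source", "month"], 0.5)[()])
    print(top_k_pairs(docs, ["source", "month"], 2)[(("month", "jan"),)])
```

This naive approach enumerates every slice (exponential in the number of attributes) and every co-occurring pair in each slice, which is presumably the cost that dedicated algorithms for this problem are designed to reduce.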
A. Angel, N. Koudas, N. Sarkas and D. Srivastava, "Efficient identification of coupled entities in document collections," 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), Long Beach, CA, USA, 2010, pp. 769-772.