Extracting interesting related context-dependent concepts from social media streams using temporal distributions
2013 IEEE 29th International Conference on Data Engineering (ICDE) (2013)
Brisbane, Australia
Apr. 8, 2013 to Apr. 12, 2013
C. P. Sayers, Hewlett-Packard Labs., Palo Alto, CA, USA
Meichun Hsu, Hewlett-Packard Labs., Palo Alto, CA, USA
To enable the interactive exploration of large social media datasets, we exploit the temporal distributions of word n-grams within the message stream to discover “interesting” concepts, determine “relatedness” between concepts, and find representative examples for display. We present a new algorithm for context-dependent “interestingness” based on the coefficient of variation of the temporal distribution, apply the well-known technique of Pearson's correlation to tweets, using equi-height histogramming to compute the correlation, and employ an asymmetric variant for computing “relatedness” to encourage exploration. We further introduce techniques that use interestingness, correlation, and relatedness to automatically discover concepts and select preferred word n-grams for display. These techniques are demonstrated on an 800,000-tweet dataset from the Academy Awards.
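The two measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact formulation, so the sketch assumes that "equi-height histogramming" means choosing time bins that each hold roughly the same number of messages from the whole stream, and that the same bins are used for both measures. All function names are hypothetical, and the asymmetric relatedness variant is not attempted because the abstract does not define it.

```python
import math
from bisect import bisect_right


def equi_height_edges(all_times, bins):
    """Interior bin edges so each bin holds about the same number of
    messages from the whole stream (assumed meaning of equi-height)."""
    ordered = sorted(all_times)
    n = len(ordered)
    return [ordered[(i * n) // bins] for i in range(1, bins)]


def histogram(times, edges):
    """Count how many timestamps fall into each bin defined by edges."""
    counts = [0] * (len(edges) + 1)
    for t in times:
        counts[bisect_right(edges, t)] += 1
    return counts


def coefficient_of_variation(counts):
    """Population standard deviation divided by the mean of the bin counts.
    A flat temporal distribution scores 0; a bursty one scores higher."""
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 0.0
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    return math.sqrt(variance) / mean


def pearson(xs, ys):
    """Pearson's correlation between two equal-length count vectors.
    Returns 0.0 when either vector is constant (correlation undefined)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


# Toy stream: one message per time unit, plus two hypothetical n-grams.
stream_times = list(range(100))
edges = equi_height_edges(stream_times, 4)
steady = histogram(stream_times, edges)        # appears throughout the stream
bursty = histogram([60, 61, 62, 63], edges)    # appears only in one burst
print(coefficient_of_variation(steady), coefficient_of_variation(bursty))
print(pearson(bursty, histogram([55, 56, 70], edges)))
```

In this toy data the n-gram that appears evenly has a coefficient of variation of zero, while the one concentrated in a single bin scores higher, which matches the intuition that temporally bursty terms are the "interesting" ones. Two n-grams whose bursts fall in the same bins correlate strongly.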
Correlation, Awards activities, Histograms, Media, Visualization, Context, Twitter
C. P. Sayers and Meichun Hsu, "Extracting interesting related context-dependent concepts from social media streams using temporal distributions," 2013 29th IEEE International Conference on Data Engineering (ICDE 2013), Brisbane, QLD, 2013, pp. 1308-1311.