Competitor Mining with the Web
October 2008 (vol. 20 no. 10)
pp. 1297-1310
Shenghua Bao, Shanghai Jiao Tong University, Shanghai
Rui Li, Shanghai Jiao Tong University, Shanghai
Yong Yu, Shanghai Jiao Tong University, Shanghai
Yunbo Cao, Microsoft Research Asia, Beijing
This paper addresses the problem of automatically mining competitors from the web. Fierce market competition requires every company to know not only which companies are its primary competitors, but also in which fields those rivals compete with it and how strong they are in each competitive domain. The competitor mining task addressed in this paper therefore covers all three kinds of information: competitors, competing fields, and competitors' strength. We propose a novel algorithm, CoMiner, which aims to conduct web-scale mining in a domain-independent manner. CoMiner consists of three parts: 1) given an input entity, extracting a set of comparative candidates and ranking them by comparability; 2) extracting the fields in which the given entity and its competitors compete; and 3) identifying and summarizing the competitive evidence that details the competitors' strength. For evaluation, we present a prototype system implementing CoMiner and construct a data set of 70 entities. The prototype discovers 728 competitors and 3,640 competitive fields, supported by 6,381 pieces of competitive evidence. The experimental results show that the proposed algorithm is highly effective.
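
The first part of the pipeline (extracting comparative candidates for an input entity and ranking them) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the comparative patterns in PATTERN_TEMPLATES, the function name rank_candidates, the sample snippets, and the use of raw match counts as a stand-in for the comparability score. The abstract does not state which patterns or ranking measure CoMiner actually uses.

```python
import re
from collections import Counter

# Hypothetical comparative patterns (an assumption for illustration);
# "{e}" is replaced by the escaped input entity, and the capture group
# picks up a capitalized candidate name.
PATTERN_TEMPLATES = [
    r"{e}\s+(?:vs\.?|versus)\s+([A-Z][\w-]*)",
    r"([A-Z][\w-]*)\s+(?:vs\.?|versus)\s+{e}",
    r"{e}\s+or\s+([A-Z][\w-]*)",
    r"(?:such as|like)\s+{e}\s+and\s+([A-Z][\w-]*)",
]


def rank_candidates(entity, snippets):
    """Return (candidate, match_count) pairs, most frequent first.

    Match count is only a crude proxy for comparability.
    """
    patterns = [
        re.compile(template.format(e=re.escape(entity)))
        for template in PATTERN_TEMPLATES
    ]
    counts = Counter()
    for text in snippets:
        for pattern in patterns:
            for match in pattern.finditer(text):
                candidate = match.group(1)
                # Skip matches that merely repeat the input entity.
                if candidate.lower() != entity.lower():
                    counts[candidate] += 1
    return counts.most_common()


# Made-up snippets standing in for web search results.
snippets = [
    "Toyota vs Honda: which sedan is better?",
    "Honda versus Toyota in the hybrid market",
    "Should I buy a Toyota or Ford?",
    "Carmakers such as Toyota and Honda lead in hybrids",
]

ranking = rank_candidates("Toyota", snippets)
print(ranking)
```

In this toy run, the candidate matched by the most patterns ranks first. A real system would need a much larger pattern set, candidate normalization, and a ranking measure more robust than raw counts.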


Index Terms:
Content Analysis and Indexing, Information Search and Retrieval, Performance Evaluation
Shenghua Bao, Rui Li, Yong Yu, Yunbo Cao, "Competitor Mining with the Web," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 10, pp. 1297-1310, Oct. 2008, doi:10.1109/TKDE.2008.98