2018 IEEE International Conference on Services Computing (SCC) (2018)
San Francisco, CA, USA
Jul 2, 2018 to Jul 7, 2018
The rapid growth in both the number and diversity of Web services creates a need for clustering techniques that facilitate service discovery, service repository management, and related tasks. Existing methods for clustering Web services primarily rely on semantic distances between service features, e.g., topic vectors, mined from WSDL documents. However, high-quality topic vectors are hard to obtain because Web service description documents contain little textual information. In practice, prior knowledge drawn from how people have actually used Web services can help improve the accuracy of Web service clustering. From an analysis of a dataset of Web services and mashups from ProgrammableWeb, we observe that Web services combined in the same mashup are highly likely to belong to different clusters, whereas Web services annotated with identical tags tend to fall within the same cluster. Based on these observations, this paper proposes an efficient clustering approach for Web services. The approach first uses a probabilistic topic model to elicit latent topic vectors from Web service description documents. It then performs clustering based on the K-means++ algorithm, incorporating parameters that represent the prior knowledge described above. A comprehensive evaluation on a ground-truth dataset crawled from ProgrammableWeb validates the performance of the proposed approach. Experimental comparisons of the approach with and without this prior knowledge show that it significantly improves clustering accuracy.
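The clustering step described above can be illustrated with a minimal sketch. The abstract does not give the formulation, so everything here is an assumption rather than the authors' method: the function names, the weights `alpha` and `beta`, and the additive penalty and reward scheme are hypothetical. Topic vectors are assumed to be already computed (for example, by a topic model such as LDA) and passed in as rows of `X`.

```python
import numpy as np


def kmeanspp_init(X, k, rng):
    """Standard K-means++ seeding: sample centers proportionally to squared distance."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        total = d2.sum()
        probs = d2 / total if total > 0 else np.full(len(X), 1.0 / len(X))
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers, dtype=float)


def cluster_with_priors(X, k, mashup_pairs=(), tag_pairs=(),
                        alpha=1.0, beta=1.0, n_iter=50, seed=0):
    """Hypothetical K-means variant with soft pairwise priors.

    mashup_pairs: (i, j) services used in the same mashup; placing them in
                  the same cluster costs an extra `alpha` (assumed scheme).
    tag_pairs:    (i, j) services sharing a tag; placing them in the same
                  cluster lowers the cost by `beta` (assumed scheme).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    rng = np.random.default_rng(seed)
    centers = kmeanspp_init(X, k, rng)

    # Adjacency lists for the two kinds of prior knowledge.
    co_mashup = [[] for _ in range(n)]
    same_tag = [[] for _ in range(n)]
    for i, j in mashup_pairs:
        co_mashup[i].append(j)
        co_mashup[j].append(i)
    for i, j in tag_pairs:
        same_tag[i].append(j)
        same_tag[j].append(i)

    # Initial assignment uses plain squared Euclidean distance.
    labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)

    for _ in range(n_iter):
        # Update step: recompute centroids (keep the old center if a cluster is empty).
        for c in range(k):
            members = X[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
        # Assignment step: distance plus prior-knowledge adjustments.
        changed = False
        for i in range(n):
            cost = ((X[i] - centers) ** 2).sum(axis=1)
            for j in co_mashup[i]:
                cost[labels[j]] += alpha   # discourage sharing a cluster
            for j in same_tag[i]:
                cost[labels[j]] -= beta    # encourage sharing a cluster
            new_label = int(np.argmin(cost))
            if new_label != labels[i]:
                labels[i] = new_label
                changed = True
        if not changed:
            break
    return labels, centers
```

With `alpha = beta = 0` this reduces to ordinary K-means with K-means++ seeding, which gives a baseline for comparing runs with and without the prior knowledge. Larger weights let the mashup and tag relations override small differences in topic-vector distance.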
data mining, pattern clustering, probability, text analysis, Web services
M. Shi, J. Liu, B. Cao, Y. Wen and X. Zhang, "A Prior Knowledge Based Approach to Improving Accuracy of Web Services Clustering," 2018 IEEE International Conference on Services Computing (SCC), San Francisco, CA, USA, 2018, pp. 1-8.