Web Intelligence, IEEE / WIC / ACM International Conference on (2006)
Hong Kong, China
Dec. 18, 2006 to Dec. 22, 2006
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/WI.2006.33
Ioannis Anagnostopoulos, University of the Aegean, Greece
Photis Stavropoulos, University of the Aegean, Greece
This paper proposes a statistical approach for estimating the evolution of web pages in directories. The approach is based on the capture-recapture method used in wildlife biology to study animal, bird, or fish populations, modified with the assumptions and amendments needed to apply it to a search engine directory. In these experiments, web pages are treated as animals and specific types of web pages as particular species whose abundance, birth, death, and survival rates are estimated. The population is open: new web pages are submitted to the search engine directory while others are removed from its indexes, resembling immigration and emigration processes in nature. The role of the biologist who recognizes the species under study and records their history is assigned to a web page classifier, trained on the taxonomy of the Open Directory (DMOZ project). The classifier is a three-layer Probabilistic Neural Network that identifies and categorizes web pages on the basis of information filtering. A virtual experiment is simulated based on the classifier's performance over real web pages, and the results are promising.
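To make the capture-recapture idea concrete, the following minimal sketch computes the classic two-sample Lincoln-Petersen estimate and Chapman's bias-corrected variant. It is an illustration only and not the authors' estimator: both formulas assume a closed population, whereas the abstract describes an open one with births and deaths. The function names and the example counts are hypothetical.

```python
def lincoln_petersen(marked_first, caught_second, recaptured):
    """Two-sample Lincoln-Petersen estimate of population size.

    marked_first:  pages of the target category seen in the first sample
    caught_second: pages of the target category seen in the second sample
    recaptured:    pages seen in both samples
    Assumes a closed population (a simplification, not the paper's setting).
    """
    if recaptured <= 0:
        raise ValueError("at least one recapture is needed")
    if recaptured > min(marked_first, caught_second):
        raise ValueError("recaptures cannot exceed either sample size")
    return marked_first * caught_second / recaptured


def chapman(marked_first, caught_second, recaptured):
    """Chapman's bias-corrected variant; defined even with zero recaptures."""
    if min(marked_first, caught_second, recaptured) < 0:
        raise ValueError("counts must be non-negative")
    if recaptured > min(marked_first, caught_second):
        raise ValueError("recaptures cannot exceed either sample size")
    return (marked_first + 1) * (caught_second + 1) / (recaptured + 1) - 1


# Hypothetical counts: 200 pages classified into a category in the first
# sample, 150 in the second, 30 of them present in both.
estimate_lp = lincoln_petersen(200, 150, 30)
estimate_chapman = chapman(200, 150, 30)
```

In this analogy the classifier plays the role of the biologist: a page counts as "recaptured" when it is assigned to the same category in both samples, so classification errors feed directly into the estimate.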
I. Anagnostopoulos and P. Stavropoulos, "Adopting Wildlife Experiments for Web Evolution Estimations: The Role of an AI Web Page Classifier," 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI), Hong Kong, 2006, pp. 897-901.