2009 Fourth International Conference on Internet and Web Applications and Services
Utilizing RSS Feeds for Crawling the Web
Venice/Mestre, Italy
May 24-May 28
ISBN: 978-0-7695-3613-2
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICIW.2009.37
We present “advaRSS”, a crawling mechanism created to support peRSSonal, a mechanism for creating personalized RSS feeds. In contrast to common crawling mechanisms, our system focuses on fetching the latest news from major and minor portals worldwide by utilizing their communication channels. What distinguishes “advaRSS” from a conventional crawler is that news is produced in no fixed order at any time of day, so the freshness of the offline collection can be measured in minutes. The system therefore has to be updated with news items as soon as they appear. To achieve this, we utilize the communication channels that exist in the modern architecture of the WWW, and more specifically in almost every modern news portal. Because RSS feeds are used by every major and minor portal, it is possible to keep our crawler up to date and retain high freshness of the “offline content” maintained in our system’s database by applying algorithms that observe the temporal behaviour of each RSS feed.
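The abstract does not specify the algorithms used to observe a feed's temporal behaviour. As a rough illustration only, the sketch below derives a polling interval from the gaps between a feed's recent item timestamps. The function name `next_poll_interval`, the median-gap policy, and the one-minute and six-hour bounds are all assumptions made for this example, not details taken from the paper.

```python
from datetime import datetime, timedelta
from statistics import median


def next_poll_interval(pub_times,
                       min_interval=timedelta(minutes=1),
                       max_interval=timedelta(hours=6)):
    """Estimate how long to wait before polling a feed again.

    pub_times: publication timestamps (datetime) of recent feed items,
    in any order. The policy here (median gap between consecutive
    items, clamped to [min_interval, max_interval]) is a hypothetical
    choice for illustration, not the method described in the paper.
    """
    times = sorted(pub_times)
    if len(times) < 2:
        # Too little history to estimate a rate: fall back to the slowest poll.
        return max_interval
    # Gaps in seconds between consecutive publications.
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
    interval = timedelta(seconds=median(gaps))
    # Clamp so bursty feeds are not hammered and quiet feeds are not forgotten.
    return max(min_interval, min(interval, max_interval))


# Example: items published 0, 10, 20 and 40 minutes after a start time.
start = datetime(2009, 5, 24, 9, 0)
observed = [start + timedelta(minutes=m) for m in (0, 10, 20, 40)]
interval = next_poll_interval(observed)
```

Under these assumptions, a feed that publishes frequently is polled more often than a quiet one, which is one simple way to trade crawl cost against the freshness of the stored collection.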
Index Terms:
rss crawling, web crawler, rss analysis, offline content
Citation:
George Adam, Christos Bouras, Vassilis Poulopoulos, "Utilizing RSS Feeds for Crawling the Web," iciw, pp.211-216, 2009 Fourth International Conference on Internet and Web Applications and Services, 2009
