IEEE International Conference on E-Commerce Technology for Dynamic E-Business (CEC-East'04)
Client-Side Deep Web Data Extraction
Beijing, China
September 13-15, 2004
ISBN: 0-7695-2206-8
The problem of data extraction from the Deep Web can be divided into two tasks: crawling the client-side deep web and crawling the server-side deep web. The objective of this paper is to define an architecture and a set of related techniques for accessing the information located in the client-side deep web. This involves dealing with aspects such as JavaScript, non-standard session-maintenance mechanisms, client-side redirections, pop-up menus, etc. Our work uses current browser APIs as building blocks and leverages them to implement novel crawling models and algorithms.
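As a small illustration of one of the aspects listed above, client-side redirections, the following sketch detects the simplest kind: a `<meta http-equiv="refresh">` tag. It is not the paper's method. The names `MetaRefreshParser` and `find_meta_refresh` are hypothetical, and the code uses only Python's standard `html.parser` and `urllib.parse`. Static parsing like this cannot see redirections triggered by JavaScript (for example, assignments to `window.location`), which is one reason a crawler for the client-side deep web benefits from building on browser APIs that actually execute page scripts.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Hypothetical helper: records the target of the first meta-refresh tag."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta" or self.target is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "5; url=next.html" (delay, then URL).
        _, _, rest = attr_map.get("content", "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.target = rest[4:].strip().strip("'\"")


def find_meta_refresh(html: str, base_url: str) -> Optional[str]:
    """Return the absolute redirect URL of a meta-refresh tag, or None."""
    parser = MetaRefreshParser()
    parser.feed(html)
    parser.close()
    if parser.target is None:
        return None
    # Resolve relative targets against the URL of the page being crawled.
    return urljoin(base_url, parser.target)
```

For example, `find_meta_refresh('<meta http-equiv="Refresh" content="0; URL=results.html">', "http://example.com/search/")` yields `"http://example.com/search/results.html"`, and a page without such a tag yields `None`.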
Citation:
Manuel Álvarez, Alberto Pan, Juan Raposo, Angel Viña, "Client-Side Deep Web Data Extraction," cec-east, pp. 158-161, IEEE International Conference on E-Commerce Technology for Dynamic E-Business (CEC-East'04), 2004