Issue No.03 - March (1997 vol.19)
pp: 273-277
ABSTRACT
<p><b>Abstract</b>: This paper describes a new bottom-up method for document layout analysis. The algorithm was implemented in the CLiDE (Chemical Literature Data Extraction) system (http://chem.leeds.ac.uk/ICAMS/CLiDE.html), but the method described here is suitable for a broader range of documents. It is based on Kruskal's algorithm and uses a special distance metric between the components to construct the physical page structure. The method has all the major advantages of bottom-up systems: independence from different text spacing and independence from different block alignments. The algorithm's computational complexity is reduced to linear by using heuristics and path compression.</p>
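The general idea of Kruskal-style grouping with path compression can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not define the special distance metric, so `box_gap` (the Euclidean gap between bounding boxes), the threshold `max_gap`, and the function `cluster_boxes` are all hypothetical names chosen for illustration. The sketch also enumerates every pair of components, which is quadratic; it does not reproduce the heuristics the authors use to reach linear complexity.

```python
from itertools import combinations

def box_gap(a, b):
    """Hypothetical metric: Euclidean gap between boxes (x0, y0, x1, y1); 0 if they overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return (dx * dx + dy * dy) ** 0.5

def find(parent, i):
    """Return the root of i's set, compressing the path along the way."""
    root = i
    while parent[root] != root:
        root = parent[root]
    while parent[i] != root:
        # Point every node on the path directly at the root.
        parent[i], i = root, parent[i]
    return root

def cluster_boxes(boxes, max_gap):
    """Merge components in order of increasing distance, as Kruskal's algorithm does."""
    parent = list(range(len(boxes)))
    edges = sorted(
        (box_gap(boxes[i], boxes[j]), i, j)
        for i, j in combinations(range(len(boxes)), 2)
    )
    for gap, i, j in edges:
        if gap > max_gap:
            break  # remaining edges are longer, so no further merges happen
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[rj] = ri  # joining two trees never creates a cycle
    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())

# Two words on one line and two words on a line far below.
boxes = [(0, 0, 10, 10), (12, 0, 22, 10), (0, 100, 10, 110), (13, 100, 23, 110)]
lines = cluster_boxes(boxes, max_gap=5)  # [[0, 1], [2, 3]]
```

Raising `max_gap` in this sketch merges the two lines into a single block, which loosely mirrors how a bottom-up method builds larger layout units from smaller ones.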
INDEX TERMS
Document analysis, physical page layout, bottom-up layout analysis, Kruskal's algorithm, spanning tree, chemical documents.
CITATION
Anikó Simon, Jean-Christophe Pret, A. Peter Johnson, "A Fast Algorithm for Bottom-Up Document Layout Analysis", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.19, no. 3, pp. 273-277, March 1997, doi:10.1109/34.584106