Issue No. 03 - March (1997 vol. 19)
DOI: http://doi.ieeecomputersociety.org/10.1109/34.584106
<p><b>Abstract</b>: This paper describes a new bottom-up method for document layout analysis. The algorithm was implemented in the CLiDE (Chemical Literature Data Extraction) system (http://chem.leeds.ac.uk/ICAMS/CLiDE.html), but the method described here is suitable for a broader range of documents. It is based on Kruskal's algorithm and uses a special distance metric between components to construct the physical page structure. The method has all the major advantages of bottom-up systems: independence from variations in text spacing and from differing block alignments. The algorithm's computational complexity is reduced to linear by using heuristics and path compression.</p>
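The abstract does not give the paper's distance metric or the heuristics that make the method linear. The following is therefore only a minimal Python sketch of the general idea under stated assumptions. Connected components are reduced to bounding boxes (`Box`). The hypothetical `gap` function stands in for the paper's special metric, and `threshold` is an assumed stopping parameter. Candidate edges are processed in increasing distance order, as in Kruskal's algorithm, and a union-find structure with path compression decides whether an edge joins two different trees. Unlike the method described above, this sketch scores every pair of components, so it runs in O(n² log n) time rather than linear time.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a connected component (pixel coordinates)."""
    x0: int
    y0: int
    x1: int
    y1: int


def gap(a: Box, b: Box) -> int:
    """HYPOTHETICAL distance metric (not the paper's): the larger of the
    horizontal and vertical gaps between two boxes, 0 if they overlap."""
    dx = max(0, max(a.x0, b.x0) - min(a.x1, b.x1))
    dy = max(0, max(a.y0, b.y0) - min(a.y1, b.y1))
    return max(dx, dy)


class DisjointSet:
    """Union-find with full path compression and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, i: int) -> int:
        # First pass: locate the root of i's tree.
        root = i
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression, pointing every node on the path at the root.
        while i != root:
            nxt = self.parent[i]
            self.parent[i] = root
            i = nxt
        return root

    def union(self, i: int, j: int) -> bool:
        ri, rj = self.find(i), self.find(j)
        if ri == rj:
            return False  # same tree already: this edge would close a cycle
        if self.size[ri] < self.size[rj]:
            ri, rj = rj, ri
        self.parent[rj] = ri
        self.size[ri] += self.size[rj]
        return True


def group_components(boxes: list[Box], threshold: int) -> list[list[int]]:
    """Kruskal-style grouping: take edges shortest first, keep those that join
    two different trees, and stop once edges exceed the (assumed) threshold."""
    edges = sorted((gap(boxes[i], boxes[j]), i, j)
                   for i, j in combinations(range(len(boxes)), 2))
    ds = DisjointSet(len(boxes))
    for d, i, j in edges:
        if d > threshold:
            break  # every remaining edge is at least this long
        ds.union(i, j)
    groups: dict[int, list[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(ds.find(i), []).append(i)
    return list(groups.values())


if __name__ == "__main__":
    # Two nearby components on one line and one distant component.
    boxes = [Box(0, 0, 10, 10), Box(12, 0, 22, 10), Box(100, 100, 110, 110)]
    print(group_components(boxes, threshold=5))  # [[0, 1], [2]]
```

Stopping at a distance threshold turns the spanning tree into a spanning forest, so each remaining tree corresponds to one candidate block. Raising `threshold` to 100 in the example merges all three components into a single group.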
Document analysis, physical page layout, bottom-up layout analysis, Kruskal's algorithm, spanning tree, chemical documents.
Anikó Simon, Jean-Christophe Pret, A. Peter Johnson, "A Fast Algorithm for Bottom-Up Document Layout Analysis", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 19, no. 3, pp. 273-277, March 1997, doi:10.1109/34.584106