<p><b>Abstract</b>: This paper describes a new bottom-up method for document layout analysis. The algorithm was implemented in the CLiDE (Chemical Literature Data Extraction) system, but the method described here is suitable for a broader range of documents. It is based on Kruskal's algorithm and uses a special distance metric between the components to construct the physical page structure. The method has all the major advantages of bottom-up systems: independence from text spacing and independence from block alignment. The algorithm's computational complexity is reduced to linear by using heuristics and path compression.</p>
Document analysis, physical page layout, bottom-up layout analysis, Kruskal's algorithm, spanning tree, chemical documents.
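The general idea named in the abstract, Kruskal-style merging of page components with a path-compressed union-find, can be illustrated with a minimal sketch. This is not the paper's implementation: the distance function `box_gap`, the threshold `max_gap`, and the function name `group_boxes` are hypothetical stand-ins, since the abstract does not define its special distance metric. The sketch also compares all pairs of boxes, so it is quadratic; the heuristics the paper uses to reach linear complexity are not reproduced here.

```python
from typing import Dict, List, Tuple

# Axis-aligned bounding box of a page component: (x0, y0, x1, y1).
Box = Tuple[float, float, float, float]


def box_gap(a: Box, b: Box) -> float:
    """Hypothetical distance: Euclidean gap between two boxes (0 if they overlap).

    Assumption for illustration only; the paper uses its own metric.
    """
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5


def find(parent: List[int], i: int) -> int:
    """Return the set representative of i, compressing the path on the way."""
    root = i
    while parent[root] != root:
        root = parent[root]
    # Second pass: point every node on the path directly at the root.
    while parent[i] != root:
        parent[i], i = root, parent[i]
    return root


def group_boxes(boxes: List[Box], max_gap: float) -> List[List[int]]:
    """Kruskal-style grouping: merge closest pairs first, stop at max_gap."""
    n = len(boxes)
    # All candidate edges, sorted by distance (quadratic in this sketch).
    edges = sorted(
        (box_gap(boxes[i], boxes[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))
    for dist, i, j in edges:
        if dist > max_gap:
            break  # remaining edges are all longer
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[rj] = ri  # accept the edge: merge the two groups
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Two pairs of nearby boxes, separated by a wide gap.
    page = [(0, 0, 10, 10), (12, 0, 22, 10), (100, 0, 110, 10), (113, 0, 123, 10)]
    print(group_boxes(page, max_gap=5.0))
```

Stopping at a distance threshold turns the spanning tree into a forest, and each tree of that forest is one candidate block. In a layered bottom-up analysis the same routine could presumably be rerun with larger thresholds to go from characters to words, lines, and blocks, though that layering is an assumption here rather than something the abstract states.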

A. Simon, J. Pret and A. P. Johnson, "A Fast Algorithm for Bottom-Up Document Layout Analysis," in IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 19, pp. 273-277, 1997.