Script-free text line segmentation using interline space model for printed document images

  • Minwoo Kim*
  • Il Seok Oh
  • *Corresponding author for this work

    Research output: Contribution to conference › Conference paper › peer-review

    Abstract

    This paper proposes a model-based text line segmentation algorithm for machine-printed document images. The model is a geometric configuration built from the interline spaces rather than from the text lines themselves. The paper defines an objective function whose maximization yields the optimal segmentation. The primary advantage of the interline space model is that it is script-free. The model is also versatile: it processes both horizontally and vertically written documents and infers the semantics of the reading order. Experiments on document images in Latin, Korean, Chinese, and Japanese scripts support these advantages and show that the method tolerates noise.
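
    The idea of segmenting by interline spaces can be illustrated with a minimal sketch. This is not the paper's model or objective function, which the abstract does not specify. It swaps in a plain projection-profile heuristic: blank rows or columns between inked regions are treated as interline spaces, and the writing direction is guessed from whichever axis has more interior blank space. The function names and the 0/1 image representation are assumptions made for illustration only.

```python
# Hypothetical sketch: a projection-profile heuristic, NOT the paper's model.
# An image is a list of rows; each pixel is 1 (ink) or 0 (background).

def blank_runs(profile):
    """Return (start, end) pairs, end exclusive, of maximal runs of zeros."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value == 0 and start is None:
            start = i
        elif value != 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def interior_runs(profile):
    """Keep only blank runs that do not touch the page border (margins)."""
    return [(s, e) for s, e in blank_runs(profile) if s > 0 and e < len(profile)]


def find_interline_spaces(image):
    """Guess the writing direction and return the interline spaces found.

    Returns ("horizontal", runs) when blank rows dominate, otherwise
    ("vertical", runs); each run is a (start, end) index pair, end exclusive.
    """
    row_profile = [sum(row) for row in image]
    col_profile = [sum(col) for col in zip(*image)]
    row_gaps = interior_runs(row_profile)
    col_gaps = interior_runs(col_profile)
    # Assumed scoring rule: total thickness of interior blank space.
    total = lambda runs: sum(e - s for s, e in runs)
    if total(row_gaps) >= total(col_gaps):
        return "horizontal", row_gaps
    return "vertical", col_gaps


# Two horizontal text lines separated by two blank rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
direction, spaces = find_interline_spaces(page)
```

    Because the sketch looks only at blank space and never at glyph shapes, it does not depend on the script, which is the property the abstract attributes to the interline space model. Unlike the method described in the abstract, this heuristic does not infer reading order, and a single stray ink pixel in a gap would break it.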

    Original language: English
    Title of host publication: Proceedings - 11th International Conference on Document Analysis and Recognition, ICDAR 2011
    Pages: 1354-1358
    Number of pages: 5
    State: Published - 2011
    Event: 11th International Conference on Document Analysis and Recognition, ICDAR 2011 - Beijing, China
    Duration: 2011.09.18 → 2011.09.21

    Publication series

    Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
    ISSN (Print): 1520-5363

    Conference

    Conference: 11th International Conference on Document Analysis and Recognition, ICDAR 2011
    Country/Territory: China
    City: Beijing
    Period: 11.09.18 → 11.09.21

    Keywords

    • geometric matching
    • interline space
    • model-based approach
    • reading order
    • text line segmentation

    Quacquarelli Symonds (QS) Subject Topics

    • Computer Science & Information Systems
    • Data Science
