The ENP image and ground truth dataset of historical newspapers

Keywords: image dataset, document analysis, ground truth, historical documents
Abstract: This paper presents a research dataset of historical newspapers comprising over 500 page images, uniquely representative of European cultural heritage, drawn from the digitization projects of 12 national and major European libraries and created within the scope of the ENP project. Each image is accompanied by comprehensive ground truth: Unicode-encoded full text and layout information with precise region outlines, type labels, and reading order. The dataset is aimed at pattern recognition technologies, such as OCR systems, that analyze and recognize the content and layout of these historical documents.
Subject terms: library automation (document digitization through OCR technology); retrieval and indexing (retrieving and indexing documents using OCR technology and layout analysis); institutional collections (historical news materials collected through digitization projects)
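
The ground truth components named in the abstract (full text, region outlines, type labels, reading order) can be pictured as a simple per-page record. The following is only a minimal sketch under stated assumptions: the class names, field names, and sample values are hypothetical and chosen for illustration, and they do not reproduce the dataset's actual file format or schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Region:
    """One layout region on a page (hypothetical structure)."""
    region_id: str
    region_type: str                 # type label, e.g. "heading" or "paragraph"
    outline: List[Tuple[int, int]]   # polygon vertices in pixel coordinates
    text: str = ""                   # Unicode full text of the region


@dataclass
class PageGroundTruth:
    """Ground truth for one page image (hypothetical structure)."""
    image_name: str
    regions: List[Region] = field(default_factory=list)
    reading_order: List[str] = field(default_factory=list)  # region ids in order

    def text_in_reading_order(self) -> str:
        """Join region texts following the reading order, skipping unknown ids."""
        by_id = {r.region_id: r for r in self.regions}
        parts = [by_id[rid].text for rid in self.reading_order if rid in by_id]
        return "\n".join(p for p in parts if p)


# Invented sample values, not taken from the dataset.
example_page = PageGroundTruth(
    image_name="page_0001.tif",
    regions=[
        Region("r2", "paragraph", [(10, 120), (400, 120), (400, 600), (10, 600)],
               "Body text of the first article."),
        Region("r1", "heading", [(10, 10), (400, 10), (400, 100), (10, 100)],
               "Morning Edition"),
    ],
    reading_order=["r1", "r2"],
)

print(example_page.text_in_reading_order())
```

Keeping the reading order as a separate list of region ids, rather than relying on the order in which regions are stored, lets the same regions be traversed correctly even when they were recorded out of sequence.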