Examining Patterns of Text Reuse in Digitized Text Collections

Keywords: Text Mining, Digital Library, Text Structure
Abstract: Repeating text presents challenges to access, retrieval, and mining of large-scale digital libraries. We discuss the various forms of duplicate works in scanned text collections and show the patterns of how those variations manifest within the structure o…

Retrieval and Indexing: The work involves retrieving and identifying duplicate or similar texts in large digital libraries, where such duplicates complicate access, retrieval, and mining. The article discusses how to identify identical works through page-level similarity. This identification process is related to pattern recognition, since it involves analyzing and recognizing text structures in order to identify and manage duplicate and variant works.
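
As an illustration of what page-level similarity between two scanned works might look like, the following is a minimal sketch and not the article's own method. It assumes each work is available as a list of page strings, and it swaps in a common stand-in technique: Jaccard overlap of word shingles. The function names (shingles, jaccard, matching_pages), the shingle size k, and the 0.8 threshold are all hypothetical choices made for this example.

```python
from typing import List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(page: str, k: int = 3) -> Set[Shingle]:
    """Return the set of k-word shingles for one page (k is an assumed default)."""
    words = page.lower().split()
    if len(words) < k:
        # Very short pages yield a single shingle, or nothing if the page is empty.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Jaccard similarity of two shingle sets; two empty pages score 0.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def matching_pages(
    work_a: List[str],
    work_b: List[str],
    threshold: float = 0.8,  # assumed cutoff, not taken from the article
) -> List[Tuple[int, int, float]]:
    """Return (page index in A, page index in B, score) for pages at or above threshold."""
    sets_a = [shingles(p) for p in work_a]
    sets_b = [shingles(p) for p in work_b]
    matches = []
    for i, sa in enumerate(sets_a):
        for j, sb in enumerate(sets_b):
            score = jaccard(sa, sb)
            if score >= threshold:
                matches.append((i, j, score))
    return matches


if __name__ == "__main__":
    work_a = [
        "the quick brown fox jumps over the lazy dog",
        "an unrelated page of text here",
    ]
    work_b = [
        "front matter added by publisher",
        "the quick brown fox jumps over the lazy dog",
    ]
    print(matching_pages(work_a, work_b))
```

In this toy input, the second work carries an extra front-matter page, so the shared page appears at a shifted index. Offsets of that kind are one way a variant scan of the same work could show up in the structure of page-level matches. The all-pairs comparison is quadratic in page count and is only meant for illustration; a large collection would need an indexing or hashing step in front of it.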