MexPub: Deep Transfer Learning for Metadata Extraction from German Publications

Keywords: transfer learning, metadata extraction, neural networks
Abstract: In contrast to most English scientific publications, which follow standard and simple layouts, the order, content, position, and size of metadata in German publications vary greatly among publications. This variety makes traditional NLP methods fail [...] retrieval and indexing, because it helps to automatically extract data from the literature, making the retrieval process more precise and efficient. This article describes a new method that treats PDF documents as images for data extraction. The method uses deep learning techniques, particularly Mask R-CNN, for pattern recognition and object detection to accurately extract metadata from German publications with different layouts and styles. Because traditional NLP methods may encounter difficulties in this area, this computer-vision-based approach provides an effective solution.
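To make the image-based idea concrete, the following is a minimal sketch of a post-processing step that could follow an object detector such as Mask R-CNN. It is not the authors' implementation. The `Detection` type, the label names, the `min_score` threshold, and both helper functions are hypothetical and introduced only for illustration. The sketch assumes the detector returns labeled boxes with confidence scores on a rendered page image, and that the image and the PDF page share the same origin and differ only in scale.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass(frozen=True)
class Detection:
    """One detected region on a page image (hypothetical structure)."""
    label: str    # assumed metadata class, e.g. "title" or "author"
    score: float  # detector confidence in [0, 1]
    box: Box      # pixel coordinates on the rendered page image


def select_metadata_regions(
    detections: Iterable[Detection], min_score: float = 0.5
) -> Dict[str, Detection]:
    """Keep, for each label, the highest-scoring detection at or above min_score."""
    best: Dict[str, Detection] = {}
    for det in detections:
        if det.score < min_score:
            continue  # discard low-confidence regions
        current = best.get(det.label)
        if current is None or det.score > current.score:
            best[det.label] = det
    return best


def to_pdf_coords(
    box: Box, image_size: Tuple[float, float], page_size: Tuple[float, float]
) -> Box:
    """Scale a pixel box to PDF page units (assumes a shared origin)."""
    scale_x = page_size[0] / image_size[0]
    scale_y = page_size[1] / image_size[1]
    x0, y0, x1, y1 = box
    return (x0 * scale_x, y0 * scale_y, x1 * scale_x, y1 * scale_y)


if __name__ == "__main__":
    # Made-up detections for a single page, for illustration only.
    page_detections = [
        Detection("title", 0.9, (100, 50, 900, 120)),
        Detection("title", 0.6, (100, 300, 900, 360)),
        Detection("author", 0.4, (100, 140, 500, 170)),
        Detection("abstract", 0.8, (100, 400, 900, 800)),
    ]
    for label, det in select_metadata_regions(page_detections).items():
        print(label, to_pdf_coords(det.box, (1000, 1400), (500, 700)))
```

Once a region is mapped back to page coordinates, the text inside it can be read from the PDF text layer or passed to OCR. Which of the two is appropriate depends on the document and is not specified in the abstract.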