HerCulB: content-based information extraction and retrieval for cultural heritage of the Balkans

Keywords: Information extraction, Content-based search, Natural language processing, Intangible cultural heritage

Abstract:

Purpose: The purpose of this paper is to provide a methodology for automatic annotation of a multimedia collection of intangible cultural heritage, mostly in the form of interviews. Assigned annotations provide a way to search the collection.

Design/methodology/approach: Annotation is based on automatic extraction of metadata and is conducted by named entity and topic extraction from textual descriptions, with a rule-based approach supported by vocabulary resources, a compiled domain-specific classification scheme and domain-oriented corpus analysis.

Findings: The proposed methodology for automatic annotation of a collection of intangible cultural heritage, applied on the cultural heritage of the Balkans, has very good results according to F measure, which is 0.87 for the named entity and 0.90 for topic annotation. The overall methodology enables encapsulating domain-specific and language-specific knowledge into collections of finite state transducers and allows further improvements.

Originality/value: Although cultural heritage has a significant role in the development of identity of a group or an individual, it is one of those specific domains that have not yet been fully explored in case of many languages. A methodology is proposed that can be used for incorporating natural language processing techniques into digital libraries of cultural heritage.

Retrieval and indexing: Named entities and topics are extracted from the descriptions so that cultural heritage material can be retrieved and indexed more easily.

Collection classification: The collection of intangible cultural heritage is classified according to the specific classification schemes in use.
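For readers unfamiliar with the F measure quoted in the findings, the following is a minimal sketch of how such a score can be computed from a set of automatic annotations and a manually annotated gold standard. It assumes the balanced form (F1, `beta = 1`), which the abstract does not specify; the function name `f_measure` and the example annotations are hypothetical and are not taken from the paper or its data.

```python
def f_measure(predicted: set, gold: set, beta: float = 1.0) -> float:
    """Return the F-measure of predicted annotations against a gold standard.

    beta = 1.0 gives the balanced F1 score (an assumption; the abstract
    only says "F measure").
    """
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of predictions that are correct
    recall = true_positives / len(gold)          # share of gold annotations that were found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical (entity text, entity type) pairs for one interview description.
gold = {("Beograd", "LOC"), ("Sofija", "LOC"), ("Petar", "PERS"), ("1950", "DATE")}
predicted = {("Beograd", "LOC"), ("Sofija", "LOC"), ("Petar", "PERS"), ("Balkan", "PERS")}

score = f_measure(predicted, gold)
print(f"F1 = {score:.2f}")
```

Treating each annotation as an exact (text, type) pair is one possible matching criterion; evaluations that accept partial span matches would count true positives differently.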
