Improving the visibility of library resources via mapping library subject headings to Wikipedia articles

Keywords: FAST subject headings, Controlled vocabularies, Wikipedia, Data integration, Library catalogues
Abstract:

Purpose: Linking libraries and Wikipedia can significantly improve the quality of services provided by these two major silos of knowledge. Such linkage would enrich the quality of Wikipedia articles and at the same time increase the visibility of library resources. To this end, the purpose of this paper is to describe the design and development of a software system for automatic mapping of FAST subject headings, used to index library materials, to their corresponding articles in Wikipedia.

Design/methodology/approach: The proposed system works by first detecting all the candidate Wikipedia concepts (articles) occurring in the titles of the books and other library materials which are indexed with a given FAST subject heading. This is then followed by training and deploying a machine learning (ML) algorithm designed to automatically identify those concepts that correspond to the FAST heading. Specifically, the ML algorithm used is a binary classifier which classifies the candidate concepts into either "corresponding" or "non-corresponding" categories. The classifier is trained to learn the characteristics of those candidates which have the highest probability of belonging to the "corresponding" category, based on a set of 14 positional, statistical, and semantic features.

Findings: The authors have assessed the performance of the developed system using the standard information retrieval measures of precision, recall, and F-score on a data set containing 170 FAST subject headings manually mapped to their corresponding Wikipedia articles. The evaluation results show that the developed system is capable of achieving F-scores as high as 0.65 and 0.99 in the corresponding and non-corresponding categories, respectively.

Research limitations/implications: The size of the data set used to evaluate the performance of the system is rather small. However, the authors believe that the developed data set is large enough to demonstrate the feasibility and scalability of the proposed approach.

Practical implications: The sheer size of English Wikipedia makes the manual mapping of Wikipedia articles to library subject headings a very labor-intensive and time-consuming task. Therefore, the aim is to reduce the cost of such mapping and integration.

Social implications: The proposed mapping paves the way for connecting libraries and Wikipedia as two major silos of knowledge, and enables the bi-directional movement of users between the two.

Originality/value: To the best of the authors' knowledge, the current work is the first attempt at automatic mapping of Wikipedia to a library-controlled vocabulary.

Retrieval and indexing

The main purpose of this research is to map library subject headings to their corresponding Wikipedia articles in order to strengthen the connection between the two. This article describes a software system that automatically maps FAST subject headings, used to index library materials, to their corresponding articles in Wikipedia. To this end, the system first detects the candidate Wikipedia concepts that appear in the titles of books and other library materials, and then uses a machine learning algorithm to automatically identify the concepts corresponding to the FAST subject heading. Specifically, the machine learning algorithm is a binary classifier that learns, from a set of 14 positional, statistical, and semantic features, the characteristics of the candidates most likely to belong to the "corresponding" category.
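
The two steps that bracket the classifier, candidate detection in titles and per-category evaluation, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the n-gram matching strategy, and the toy titles and labels are all assumptions made for illustration, and the 14 features and the trained classifier are not reproduced here.

```python
from collections import Counter


def candidate_concepts(book_titles, wikipedia_titles, max_ngram=3):
    """Count how often each known Wikipedia title occurs as a word n-gram
    in the given book titles (hypothetical matching strategy)."""
    known = {t.lower() for t in wikipedia_titles}
    counts = Counter()
    for title in book_titles:
        # Naive whitespace tokenization; real titles would need punctuation handling.
        words = title.lower().split()
        for n in range(1, max_ngram + 1):
            for i in range(len(words) - n + 1):
                ngram = " ".join(words[i:i + n])
                if ngram in known:
                    counts[ngram] += 1
    return counts


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and F-score with `positive` treated as the target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (assumed): titles indexed under one heading and a tiny set of article titles.
titles = ["A history of jazz music", "Jazz music in New Orleans", "The jazz age"]
articles = ["Jazz", "Jazz music", "New Orleans", "History"]
counts = candidate_concepts(titles, articles)

# Toy gold labels and classifier outputs for four candidates (assumed).
gold = ["corresponding", "non-corresponding", "non-corresponding", "non-corresponding"]
pred = ["corresponding", "corresponding", "non-corresponding", "non-corresponding"]
scores_pos = precision_recall_f1(gold, pred, "corresponding")
scores_neg = precision_recall_f1(gold, pred, "non-corresponding")
```

Scoring each category separately, as the Findings do, matters because the "non-corresponding" category is typically far larger than the "corresponding" one, so a single pooled score would hide weak performance on the category of interest.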