An AI-Based Methodology for the Automatic Classification of a Multiclass Ebook Collection Using Information From the Tables of Contents

Keywords: Artificial intelligence, natural language processing, classification algorithms, self-organizing maps, unsupervised learning, deep neural networks, digital libraries, book classification, ToC vectorization.
Abstract: Book recommendation to support professors and students in the identification of relevant sources is of significant importance for both universities and digital libraries and, hence, motivates the development of a recommendation system. This paper aims at automatically classifying a multiclass corpus that was created from ebooks from the Springer collection, which is available through the Hellenic Academic Libraries' subscription, by utilizing an unsupervised neural network (NN), self-organizing maps (SOM), and two deep neural network (DNN) architectures, namely, a long short-term memory (LSTM) network and a convolutional neural network (CNN) combined with an LSTM (CNN-LSTM), under various configuration scenarios. The vector construction leverages information that was extracted from the table of contents (ToC) of each book, using the TF-IDF weighting scheme in the first case and the Keras tokenizer in the second. Extensive experiments were conducted using various configurations of preprocessing steps, NN setup, and vector and vocabulary sizes to assess their impact on the classifiers' performance. Furthermore, we show that majority voting is more suitable for selecting the dominant label for a specified node. The experimental analysis showed the feasibility of developing a recommendation system for supporting professors and students in the identification of related sources based on a detailed thematic description (e.g., the abstract or table of contents of a book) rather than a few keywords. In the conducted experiments, the subsystem that utilized the DNN (LSTM) performed the best, with F1-scores of 67% for the 26 categories and 80% for the 5 general categories, whereas the SOM achieved F1-scores of less than 5% in both cases.

Consulting Service: Help professors and students identify related sources through the recommendation system.
Collection Classification: Treat the search for similar books as a multiclass classification problem.
Retrieval and Indexing: Use the table of contents (ToC) to make effective book recommendations.

This paper focuses on applying deep learning, specifically deep neural networks based on LSTM and CNN-LSTM, and machine learning with self-organizing maps (SOM) to ebooks from various categories. Using the table of contents of each book, these methods recommend books similar to a query.
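
To make the two steps named in the abstract more concrete, the following is a minimal sketch of TF-IDF vectorization of ToC text and of majority voting for the dominant label of a node. It is illustrative only and not the authors' implementation: the weighting variant (relative term frequency times log(N/df)), the function names `tfidf_vectors` and `dominant_labels`, the toy ToC strings, the category labels, and the node coordinates are all assumptions made for this example. In particular, the assignment of books to nodes is given by hand here, whereas in the paper it would come from a trained SOM.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one TF-IDF vector per tokenized document.

    Assumed weighting: (term count / document length) * log(N / df).
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append([
            (counts[t] / total) * math.log(n_docs / df[t]) if total else 0.0
            for t in vocab
        ])
    return vocab, vectors


def dominant_labels(node_of_book, label_of_book):
    """Majority vote: each node receives the most frequent label of its books."""
    votes = defaultdict(Counter)
    for book, node in node_of_book.items():
        votes[node][label_of_book[book]] += 1
    return {node: c.most_common(1)[0][0] for node, c in votes.items()}


# Hypothetical, already preprocessed ToC text for three books.
tocs = [
    "neural networks deep learning".split(),
    "neural networks optimization".split(),
    "organic chemistry reactions".split(),
]
vocab, vectors = tfidf_vectors(tocs)

# Hypothetical mapping of books to SOM grid nodes (row, column).
node_of_book = {"a": (0, 0), "b": (0, 0), "c": (1, 2), "d": (0, 0)}
label_of_book = {
    "a": "Computer Science",
    "b": "Computer Science",
    "c": "Chemistry",
    "d": "Mathematics",
}
node_labels = dominant_labels(node_of_book, label_of_book)
```

In this toy data, a term that appears in only one ToC (such as "deep") receives a higher weight than a term shared by two ToCs (such as "neural"), and node (0, 0) is labeled by the category held by two of its three books. Ties between labels are not handled specially in this sketch.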