Natural language processing and machine learning as practical toolsets for archival processing


Keywords: Appraisal, Machine learning, Computational archival science, Natural language processing (NLP), Personally identifiable information (PII), Sensitivity review
Abstract:
Purpose: This study aims to provide an overview of recent efforts relating to natural language processing (NLP) and machine learning applied to archival processing, particularly appraisal and sensitivity reviews, and to propose functional requirements and workflow considerations for transitioning these tools from experimental to operational use.
Design/methodology/approach: The paper has four main sections: (1) a short overview of the NLP and machine learning concepts referenced in the paper; (2) a review of the literature reporting on NLP and machine learning applied to archival processes; (3) an overview of, and commentary on, key existing and developing tools that use NLP or machine learning techniques for archives; (4) a discussion, informed by this review and analysis, of functional requirements and workflow considerations for NLP and machine learning tools for archival processing.
Findings: Applications for processing e-mail have received the most attention so far, although most initiatives have been experimental or project based. It now seems feasible to branch out to develop more generalized tools for born-digital, unstructured records. Effective NLP and machine learning tools for archival processing should be usable, interoperable, flexible, iterative and configurable.
Originality/value: Most implementations of NLP for archives have been experimental or project based. The main exception that has moved into production is ePADD, which includes robust NLP features through its named entity recognition module. This paper takes a broader view, assessing the prospects and possible directions for integrating NLP tools and techniques into archival workflows.

Retrieval and Indexing: With NLP and machine learning technology, records and other materials can be indexed and retrieved more readily. This article deals mainly with archives and is not directly concerned with libraries; however, archives and libraries are similar in some respects, especially in how they organize and manage data. The article mainly discusses how natural language processing (NLP) and machine learning can be applied to archival processing, especially appraisal and sensitivity review, and it explores the functional requirements and workflow considerations for moving these tools from the experimental stage to operational use.
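As a rough illustration of the kind of screening that sensitivity review involves, the Python sketch below flags a few common forms of personally identifiable information (PII) in a text, such as an e-mail body, and routes anything flagged to a human reviewer. It is a minimal rule-based sketch under stated assumptions, not the approach used by ePADD or any other tool discussed in the paper: the names `flag_pii`, `needs_review` and `PII_PATTERNS` and the three regular expressions are hypothetical, and they would miss most real PII (personal names, postal addresses, non-US formats), which is where named entity recognition and trained classifiers come in.

```python
import re
from typing import NamedTuple


class PIIMatch(NamedTuple):
    label: str  # category of the suspected PII
    text: str   # the matched substring
    start: int  # character offset in the source text


# Hypothetical, deliberately simple patterns (an assumption for illustration);
# a real sensitivity review would need far broader and locale-aware coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "US_PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def flag_pii(text: str) -> list[PIIMatch]:
    """Return suspected PII spans, ordered by position, for a human reviewer."""
    matches = [
        PIIMatch(label, m.group(), m.start())
        for label, pattern in PII_PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(matches, key=lambda match: match.start)


def needs_review(text: str) -> bool:
    """Route a document to manual sensitivity review if anything was flagged."""
    return bool(flag_pii(text))


if __name__ == "__main__":
    sample = "Contact jane.doe@example.org or 555-123-4567; SSN 123-45-6789."
    for match in flag_pii(sample):
        print(match.label, match.text, match.start)
```

Keeping the patterns in a single table makes them easy to adjust per collection, and the script only suggests candidates: the reviewer, not the code, makes the final sensitivity decision.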