- Year: 2021
- ID: 223
- Topic category: 4
- Topic score: 0.4210602647
- Published in: 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)
- Authors: Shivashankar Subramanian; Daniel King; Doug Downey; Sergey Feldman
Keywords: Digital libraries, Author name disambiguation, Out-of-domain evaluation
Abstract: S2AND: A Benchmark and Evaluation System for Author Name Disambiguation. Author Name Disambiguation (AND) is the task of resolving which author mentions in a bibliographic database refer to the same real-world person, and is a critical ingredient of digital library applications such as search and citation analysis. While many AND algorithms have been proposed, comparing them is difficult because they often employ distinct features and are evaluated on different datasets. In response to this challenge, we present S2AND, a unified benchmark dataset for AND on scholarly papers, as well as an open-source reference model implementation. Our dataset harmonizes eight disparate AND datasets into a uniform format, with a single rich feature set drawn from the Semantic Scholar (S2) database. Our evaluation suite for S2AND reports performance split by facets like publication year and number of papers, allowing researchers to track both global performance and measures of fairness across facet values. Our experiments show that because previous datasets tend to cover idiosyncratic and biased slices of the literature, algorithms trained to perform well on one of them may generalize poorly to others. By contrast, we show how training on a union of datasets in S2AND results in more robust models that perform well even on datasets unseen in training. The resulting AND model also substantially improves over the production algorithm in S2, reducing error by over 50% in terms of B³ F1. We release our unified dataset, model code, trained models, and evaluation suite to the research community.
This article addresses the disambiguation of author names in large-scale bibliographic databases, which is important for retrieval and indexing work in libraries. The goal of Author Name Disambiguation (AND) is to identify the author names mentioned in a bibliographic database and determine which of them actually refer to the same person. To this end, different datasets must be integrated and evaluated through data analysis and pattern recognition. This involves a model that mines the various datasets for similar author mentions and groups them together, which is a form of data mining.
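The abstract reports results in terms of B³ F1, a standard clustering metric that scores each author mention by how well its predicted cluster overlaps its true cluster. The following is a minimal sketch of that metric, not the code of the S2AND evaluation suite; the function name `b_cubed` and the dictionary-based input format are assumptions made for illustration.

```python
from collections import defaultdict


def b_cubed(pred, gold):
    """Return (precision, recall, f1) under the B-cubed metric.

    pred, gold: dicts mapping each mention id to a cluster label.
    Hypothetical helper for illustration only.
    """
    if not pred or set(pred) != set(gold):
        raise ValueError("pred and gold must cover the same non-empty mention set")

    # Invert the label maps into cluster -> set of mentions.
    pred_clusters = defaultdict(set)
    gold_clusters = defaultdict(set)
    for mention, label in pred.items():
        pred_clusters[label].add(mention)
    for mention, label in gold.items():
        gold_clusters[label].add(mention)

    precision_sum = 0.0
    recall_sum = 0.0
    for mention in pred:
        p = pred_clusters[pred[mention]]
        g = gold_clusters[gold[mention]]
        overlap = len(p & g)  # always >= 1: the mention itself
        precision_sum += overlap / len(p)
        recall_sum += overlap / len(g)

    n = len(pred)
    precision = precision_sum / n
    recall = recall_sum / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Five mentions: the true authors are X (a, b, c) and Y (d, e);
# the predicted clustering wrongly moves mention c into Y's cluster.
gold = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y"}
pred = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 2}
precision, recall, f1 = b_cubed(pred, gold)
```

Because the scores are averaged over mentions rather than over clusters, authors with many papers contribute more to the total, which is one reason an evaluation split by facets such as number of papers can be informative.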
© All Rights LibAiRsystem.

