Review and Commentary

A Review of Research on Automatic Annotation Models for Discourse Elements in Scientific and Technical Literature

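  • Yu Gaihong,
  • Zhang Zhixiong,
  • Ma Na
  • 1. University of Chinese Academy of Sciences, Beijing 100049;
    2. National Science Library, Chinese Academy of Sciences, Beijing 100190;
    3. Wuhan Library, Chinese Academy of Sciences, Wuhan 430071
Yu Gaihong (ORCID: 0000-0003-1301-2871), librarian, master's degree; Ma Na (ORCID: 0000-0001-5016-0879), librarian, master's degree.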

Received: 2017-12-20

  Revised: 2018-03-16

  Published online: 2018-08-05

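Funding

This paper is one of the research outputs of the Chinese Academy of Sciences special project on literature and information capacity building, "Application Demonstration of Automatic Semantic Annotation and Indexing of Physics Research Papers Based on arXiv Data" (project No. 院1657).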

Overview of Science and Technology Literature Discourse Elements Automatic Annotation Model Research

  • Yu Gaihong ,
  • Zhang Zhixiong ,
  • Ma Na
  • 1. University of Chinese Academy of Sciences, Beijing 100049;
    2. National Science Library, Chinese Academy of Sciences, Beijing 100190;
    3. Wuhan Library, Chinese Academy of Sciences, Wuhan 430071

Received date: 2017-12-20

  Revised date: 2018-03-16

  Online published: 2018-08-05

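Abstract

[Purpose/significance] To better enrich the semantics of scientific and technical literature, this paper surveys and summarizes the models, technologies, and methods used in China and abroad to annotate discourse elements in scientific and technical literature, offering a reference for researchers in text mining, knowledge extraction from scientific papers, and semantic analysis systems. [Method/process] Using academic website search and the search engines of relevant databases, the authors closely read and investigated references and research reports on scientific paper annotation, discourse elements, knowledge extraction, sentence recognition, and automatic article classification, and summarized the automatic annotation models for discourse elements together with the progress of related work. [Result/conclusion] Annotating discourse elements in scientific and technical literature has considerable practical value; building an annotation model requires full consideration of the design rationale, the annotation domain, the annotation granularity, and the annotation techniques.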

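Cite this article

Yu Gaihong, Zhang Zhixiong, Ma Na. Overview of Science and Technology Literature Discourse Elements Automatic Annotation Model Research[J]. Library and Information Service, 2018, 62(15): 132-144. DOI: 10.13266/j.issn.0252-3116.2018.15.015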

Abstract

[Purpose/significance] To improve the semantic enrichment of scientific and technical literature, this paper summarizes domestic and foreign models, technologies, and methods for the automatic annotation of discourse elements in such literature, and provides a reference for researchers working on text mining, knowledge extraction from scientific papers, and semantic analysis systems. [Method/process] Using academic web search and the search engines of related databases, we read in depth and reviewed references and research reports on the annotation of scientific and technical papers, discourse elements, knowledge extraction, sentence recognition, and automatic article classification, and summarized the automatic annotation models for discourse elements and the progress of related work. [Result/conclusion] The annotation of discourse elements in scientific literature has important practical value. Constructing an annotation model requires taking full account of the design rationale, the annotation domain, the annotation granularity, and the annotation techniques.
