图书情报工作 (Library and Information Service) ›› 2018, Vol. 62 ›› Issue (13): 64-73. DOI: 10.13266/j.issn.0252-3116.2018.13.009

• Information Research •

基于深度学习的数据科学招聘实体自动抽取及分析研究

王东波1, 胡昊天1, 周鑫2, 朱丹浩3   

  1. 南京农业大学信息科学技术学院 南京 210095;
    2. 南京大学信息管理学院 南京 210093;
    3. 南京大学计算机科学与技术系 南京 210093
  • 收稿日期:2017-12-02 修回日期:2018-04-08 出版日期:2018-07-05 发布日期:2018-07-05
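  • About the authors: Wang Dongbo (ORCID: 0000-0002-9894-9550), associate professor and master's supervisor, E-mail: db.wang@njau.edu.cn; Hu Haotian, undergraduate student; Zhou Xin, PhD candidate; Zhu Danhao, assistant librarian.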
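  • Supported by:
    This paper is one of the research outcomes of the National Social Science Fund of China Major Project "Research on the Discipline Construction of Information Science and the Future Development Path of Information Work" (No. 17ZDA291) and the Jiangsu Province Research and Innovation Program for Academic Degree Postgraduates in Regular Universities project "Citation Content Analysis: Automatic Mining of Citation Semantic Information" (KYZZ16_0033).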

Research of Automatic Extraction of Entities of Data Science Recruitment and Analysis Based on Deep Learning

Wang Dongbo1, Hu Haotian1, Zhou Xin2, Zhu Danhao3   

  1. College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095;
    2. Department of Information Management, Nanjing University, Nanjing 210093;
    3. Department of Computer Science and Technology, Nanjing University, Nanjing 210093
  • Received:2017-12-02 Revised:2018-04-08 Online:2018-07-05 Published:2018-07-05

摘要: [目的/意义]数据科学作为一个融合诸多领域的新兴交叉学科正在快速形成。从数据科学招聘的公告信息中,抽取出相应的实体知识不仅有助于从市场的角度了解数据科学的发展动态,而且有助于改进数据科学教学的内容。[方法/过程]基于各大招聘网站职位招聘公告,结合情报学的数据获取、标注和组织方法,构建数据科学招聘语料库并从中抽取相应的实体进行分析与研究。[结果/结论]在搜集到的11 000篇经过标注的职位招聘公告语料的基础上,基于Bi-LSTM-CRF、CRF和Bi-LSTM模型,对数据科学招聘实体的抽取任务进行性能的对比,确定最终的数据科学招聘实体自动抽取模型,设计数据科学招聘实体自动抽取平台,并构建数据科学招聘实体网络。

关键词: 数据科学, 条件随机场, 深度学习, Bi-LSTM-CRF

Abstract: [Purpose/significance] Data science is rapidly taking shape as a new interdisciplinary field that draws on many areas. Extracting entity knowledge from data science job postings helps both to understand the development of data science from a market perspective and to improve the content of data science teaching. [Method/process] Based on job postings from major recruitment websites, and drawing on the data acquisition, annotation, and organization methods of information science, a data science recruitment corpus was constructed, and entities were extracted from it for analysis. [Result/conclusion] On a collected corpus of 11,000 annotated job postings, this paper compared the performance of the Bi-LSTM-CRF, CRF, and Bi-LSTM models on the task of extracting data science recruitment entities, selected the final automatic extraction model, designed an automatic extraction platform for data science recruitment entities, and built a data science recruitment entity network.

Key words: data science, conditional random field, deep learning, Bi-LSTM-CRF
