Library and Information Service ›› 2019, Vol. 63 ›› Issue (5): 92-99. DOI: 10.13266/j.issn.0252-3116.2019.05.011

• Information Research •

An Exploratory Posts Detecting Method for MOOC Forums Based on Deep Learning

Dong Qingxing1,2, Li Huayang3, Cao Gaohui1, Xia Lixin1

    1 School of Information Management, Central China Normal University, Wuhan 430079;
    2 Centre for Studies of Information Resources, Wuhan University, Wuhan 430079;
    3 Tencent AI Lab, Shenzhen 518057
  • Received: 2018-06-27 Revised: 2018-08-12 Online: 2019-03-05 Published: 2019-03-05
  • Corresponding author: Cao Gaohui (ORCID: 0000-0002-2760-4889), vice dean, associate professor, PhD, E-mail: ghcao@mail.ccnu.edu.cn
  • About the authors: Dong Qingxing (ORCID: 0000-0003-3512-9333), department head, associate professor, PhD; Li Huayang (ORCID: 0000-0003-3539-8648), assistant researcher, bachelor's degree; Xia Lixin (ORCID: 0000-0002-4162-2282), vice president, professor, PhD
  • Funding:
    This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on Group Evaluation Models and Methods for Crowd-Sensing Big Data" (Grant No. 71871102) and the Central China Normal University Fundamental Research Funds for the Central Universities project "Research on User Information Perception and Offline Behavior Change Mechanisms in the Online Medical Service Environment" (Grant No. CCNU17TS0009).

Abstract: [Purpose/significance] Massive Open Online Course (MOOC) forums hold rich user comment data. Automatically identifying exploratory dialogues, which have high knowledge density, from large volumes of undifferentiated comments and mining their potential value matters for improving teaching quality and students' mastery of knowledge. [Method/process] The study first trains word embeddings with the GloVe method to strengthen semantic understanding of the text, then uses a Convolutional Neural Network (CNN) to learn text features automatically. On this basis it proposes a deep-learning model that automatically identifies exploratory dialogues, and evaluates it in an empirical and comparative study on an annotated dataset from the forum of the course Introduction to Psychology on the Xuetang Online (学堂在线) platform. [Result/conclusion] The experiments show that pretraining the word embeddings with GloVe and continuing to fine-tune them during training both improve model performance. The model reaches an F1 score of 0.94 on exploratory dialogue identification, compared with 0.88 for Naive Bayes, 0.89 for Logistic Regression, 0.88 for Decision Tree, and 0.88 for Random Forest. It is also practical to apply and has a low learning cost.

Key words: MOOC forum, exploratory posts, GloVe, convolutional neural network
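The pipeline described in the abstract (word embeddings, then convolution over the token sequence, then classification) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the vocabulary, the random vectors standing in for GloVe embeddings, the filter widths, and the function names `embed`, `conv_max_pool`, and `predict_proba` are all assumptions made for illustration, and the parameters are untrained.

```python
import math
import random

random.seed(0)
EMBED_DIM = 4  # toy size; real GloVe vectors are typically 50-300 dimensions

# Hypothetical vocabulary; random vectors stand in for pretrained GloVe embeddings.
VOCAB = ["why", "does", "memory", "fade", "thanks", "<unk>"]
EMBEDDINGS = {w: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in VOCAB}


def embed(tokens):
    """Map each token to its vector, falling back to <unk> for unknown words."""
    return [EMBEDDINGS.get(t, EMBEDDINGS["<unk>"]) for t in tokens]


def conv_max_pool(embedded, kernel, bias=0.0):
    """Slide one filter over the sequence, apply ReLU, keep the maximum response."""
    width = len(kernel)  # number of consecutive tokens the filter spans
    best = 0.0           # ReLU output is never below zero
    for start in range(len(embedded) - width + 1):
        window = embedded[start:start + width]
        activation = bias + sum(
            w * x
            for k_row, tok in zip(kernel, window)
            for w, x in zip(k_row, tok)
        )
        best = max(best, activation)  # ReLU and max-over-time pooling combined
    return best


def predict_proba(tokens, kernels, out_weights, out_bias=0.0):
    """Return the sigmoid probability that a post is an exploratory dialogue."""
    embedded = embed(tokens)
    features = [conv_max_pool(embedded, k) for k in kernels]
    logit = out_bias + sum(w * f for w, f in zip(out_weights, features))
    return 1.0 / (1.0 + math.exp(-logit))


# Untrained random parameters: two filters spanning 2 and 3 tokens.
kernels = [
    [[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(width)]
    for width in (2, 3)
]
out_weights = [random.uniform(-1, 1) for _ in kernels]

prob = predict_proba(["why", "does", "memory", "fade"], kernels, out_weights)
print(f"{prob:.3f}")
```

In a trained model the filters, output weights, and (when fine-tuning is enabled, as the abstract reports) the embedding table itself would all be updated by gradient descent; the sketch only shows how a post flows through the layers.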

CLC number: