Knowledge Organization

Research on Chinese Informational Webpage Classification Based on Query Intention

  • Wang Xiaoyan,
  • Lin Changyi
  • Concord College, Fujian Normal University, Fuzhou 350117
Wang Xiaoyan, lecturer, master's degree, E-mail: 21155271@qq.com; Lin Changyi, associate professor, master's degree.

Received date: 2014-11-28

  Revised date: 2014-12-20

  Online published: 2015-01-05

Abstract

[Purpose/significance] Webpage classification can improve the retrieval performance of search engines and content sites, and classifying pages by query intention meets user needs more precisely. [Method/process] Taking Chinese informational webpages as the research object, this paper constructs a category system of informational query intentions by manual induction, proposes a method for classifying informational webpages according to that category system, and verifies the method through experiments. [Result/conclusion] The experimental results show that the proposed method is feasible, helps meet users' information needs more precisely, and improves the retrieval performance of search engines and content sites.

Cite this article

Wang Xiaoyan, Lin Changyi. Research on Chinese Informational Webpage Classification Based on Query Intention[J]. Library and Information Service, 2015, 59(1): 113-118, 126. DOI: 10.13266/j.issn.0252-3116.2015.01.015

