Library and Information Service, 2022, Vol. 66, Issue 13: 4-14. DOI: 10.13266/j.issn.0252-3116.2022.13.001

Special topic: Research on Chinese-Tibetan Bilingual Public Digital Cultural Services for Ethnic Minority Regions


A Study of Chinese-Tibetan Bilingual Query Formulation from the Perspective of Code-Switching

Zhang Shutian1,2, He Zhu3

  1. School of Information Management, Wuhan University, Wuhan 430072;
  2. Center for Studies of Human-Computer Interaction and User Behavior, Wuhan University, Wuhan 430072;
  3. Department of Public Education, Tibet Agricultural and Animal Husbandry College, Linzhi 860000
  • Received: 2022-01-26 Revised: 2022-05-11 Online: 2022-07-05 Published: 2022-07-06
  • Corresponding author: He Zhu, lecturer, E-mail: 378104261@qq.com.
  • About the author: Zhang Shutian, PhD candidate.
  • Funding:
    This article is one of the research outputs of the Major Project of the National Social Science Fund of China "Research on Multilingual Information Organization and Retrieval for the Resource Integration of the Three Major Public Digital Culture Projects" (Project No. 19ZDA341).

Abstract: [Purpose/Significance] Addressing the code-switching that Chinese-Tibetan bilingual users in China exhibit when searching for information online, this paper studies how these users formulate bilingual queries, in order to offer search-strategy recommendations to Chinese-Tibetan bilingual users and retrieval-system optimization suggestions to search engines for code-switching search scenarios. [Method/Process] A controlled user experiment was conducted to collect the queries of Chinese-Tibetan bilingual users in a code-switching context. Text analysis and cluster analysis were applied to the bilingual query set to identify the characteristics of query formulation under Chinese-Tibetan code-switching and to summarize query reformulation schemas. [Result/Conclusion] Comparing the Chinese-to-Tibetan and Tibetan-to-Chinese directions of code-switching, the study finds marked differences in users' queries before and after switching in terms of query length, syntactic complexity and semantic similarity. Query reformulation under Chinese-Tibetan code-switching exhibits four schemas: deviation, proximity, expansion and abbreviation.

Key words: query formulation, query reformulation, Tibetan language, code-switching
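The abstract names four reformulation schemas but does not define them. As a rough illustration only, the sketch below labels one pre/post code-switching query pair using two of the measures the abstract mentions, query length and semantic similarity. The `QueryPair` type, the `classify` function, the thresholds and the rule order are all hypothetical; the paper derived its schemas through cluster analysis, and this rule-based stand-in is not that procedure. It also assumes that both query lengths and a cross-lingual similarity score in [0, 1] have already been computed by some other means.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration (not from the paper).
SIM_THRESHOLD = 0.5   # below this similarity, treat the new query as having drifted
LEN_TOLERANCE = 0.2   # relative length change still counted as "about the same length"


@dataclass
class QueryPair:
    before_len: int    # length of the query before switching language (assumed unit, e.g. words)
    after_len: int     # length of the reformulated query after switching language
    similarity: float  # assumed cross-lingual semantic similarity in [0, 1]


def classify(pair: QueryPair) -> str:
    """Assign one of the four schema names using hypothetical rules."""
    if pair.before_len <= 0:
        raise ValueError("before_len must be positive")
    # Low semantic similarity dominates: the user moved to a different information need.
    if pair.similarity < SIM_THRESHOLD:
        return "deviation"
    # Otherwise decide by relative change in query length.
    change = (pair.after_len - pair.before_len) / pair.before_len
    if change > LEN_TOLERANCE:
        return "expansion"
    if change < -LEN_TOLERANCE:
        return "abbreviation"
    return "proximity"


if __name__ == "__main__":
    # Made-up example pairs, purely to exercise each branch.
    pairs = [
        QueryPair(4, 4, 0.9),
        QueryPair(3, 6, 0.8),
        QueryPair(6, 2, 0.7),
        QueryPair(4, 5, 0.2),
    ]
    print([classify(p) for p in pairs])
    # prints ['proximity', 'expansion', 'abbreviation', 'deviation']
```

Checking similarity first reflects one possible reading of "deviation" as a change of meaning regardless of length; a clustering approach such as the paper's would instead let these boundaries emerge from the data.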
