REVIEW & COMMENTARY

Current Research on Intelligent Information Processing for Ancient Books

  • Liu Yang,
  • Wang Dongbo
  • College of Information Management, Nanjing Agricultural University, Nanjing 210095

Received date: 2024-03-29

  Revised date: 2024-07-16

  Online published: 2024-12-04

Supported by

This work is supported by the major project of the National Social Science Fund of China, “Research on the Construction and Application of Cross-Language Knowledge Base of Ancient Chinese Books” (Grant No. 21&ZD331).

Abstract

[Purpose/Significance] The rapid development of AI technology has accelerated research on intelligent information processing of ancient books. To grasp the current state and development trends of this field as a whole, it is necessary to systematically review the literature to date and provide a reference for related research. [Method/Process] Through a systematic review and analysis of the existing literature, this paper defines the connotation and extension of intelligent information processing of ancient books, summarizes its main technical methods, tasks, and applications, and discusses its development trends. [Result/Conclusion] Intelligent information processing technology for ancient books is based on statistical learning, machine learning, deep learning, pre-trained models, and large language models. The main tasks include digitization of ancient books, automatic sentence segmentation and punctuation, automatic word segmentation, automatic POS tagging, automatic information extraction, automatic classification, citation analysis, machine translation, entity disambiguation, knowledge base construction, and knowledge graph construction. Current applications focus on specialized and thematic research. Looking ahead, the development trends are the application of large language model technology, multimodal intelligent information processing tasks, and the deepening and expansion of application scenarios.

Cite this article

Liu Yang, Wang Dongbo. Current Research on Intelligent Information Processing for Ancient Books[J]. Library and Information Service, 2024, 68(23): 120-138. DOI: 10.13266/j.issn.0252-3116.2024.23.010
