[1] QIAN Y, DU Y, DENG X, et al. Detecting new Chinese words from massive domain texts with word embedding[J]. Journal of information science, 2019, 45(2): 196-211.
[2] CHEN X, SHI Z, QIU X, et al. Adversarial multi-criteria learning for Chinese word segmentation[C]//Proceedings of the 55th annual meeting of the association for computational linguistics. Vancouver: Association for Computational Linguistics, 2017: 1193-1203.
[3] LILEIKYTE R, FRAGA-SILVA T, LAMEL L, et al. Effective keyword search for low-resourced conversational speech[C]//2017 IEEE international conference on acoustics, speech and signal processing (ICASSP). New Orleans: IEEE, 2017: 5785-5789.
[4] SHEIKH I, FOHR D, ILLINA I, et al. Modelling semantic context of OOV words in large vocabulary continuous speech recognition[J]. IEEE/ACM transactions on audio, speech, and language processing, 2017, 25(3): 589-610.
[5] HUANG C, ZHAO H. Chinese word segmentation: a decade review[J]. Journal of Chinese information processing, 2007, 21(3): 8-19.
[6] 曹艳, 杜慧平, 刘竟, 等. 基于词表和N-gram算法的新词识别实验[J]. 情报科学, 2007(11): 1687-1691. (CAO Y, DU H P, LIU J, et al. An experiment of new words identification based on vocabulary and N-gram algorithm[J]. Information science, 2007(11): 1687-1691.)
[7] GOH C, ASAHARA M, MATSUMOTO Y. Machine learning-based methods to Chinese unknown word detection and POS tag guessing[J]. Journal of Chinese language and computing, 2006, 16: 185-206.
[8] LI H, HUANG C, GAO J, et al. The use of SVM for Chinese new word identification[C]//Proceedings of the first international joint conference on natural language processing. Hainan Island: Springer-Verlag, 2004: 723-732.
[9] CHEN F, LIU Y, WEI C, et al. Open domain new word detection using condition random field method[J]. Journal of software, 2013, 24(5): 1051-1060.
[10] WANG A, KAN M. Mining informal language from Chinese microtext: joint word recognition and segmentation[C]//Proceedings of the 51st annual meeting of the association for computational linguistics. Sofia: Association for Computational Linguistics, 2013: 731-741.
[11] PENG F, FENG F, MCCALLUM A. Chinese segmentation and new word detection using conditional random fields[C]//Proceedings of the 20th international conference on computational linguistics. Geneva: Association for Computational Linguistics, 2004: 562.
[12] ZOU G, LIU Y, LIU Q, et al. Internet-oriented Chinese new words detection[J]. Journal of Chinese information processing, 2004, 18(6): 1-9.
[13] ZHENG Y, LIU Z, SUN M, et al. Incorporating user behaviors in new word detection[C]//Proceedings of the 21st international joint conference on artificial intelligence. Pasadena: Morgan Kaufmann Publishers Inc, 2009: 2101-2106.
[14] 邹纲, 刘洋, 刘群, 等. 面向Internet的中文新词语检测[J]. 中文信息学报, 2004, 18(6): 1-9. (ZOU G, LIU Y, LIU Q, et al. Internet-oriented Chinese new words detection[J]. Journal of Chinese information processing, 2004, 18(6): 1-9.)
[15] MA W Y, CHEN K J. A bottom-up merging algorithm for Chinese unknown word extraction[C]//Proceedings of the 2nd SIGHAN workshop on Chinese language processing. Sapporo: Association for Computational Linguistics, 2003: 31-38.
[16] 郑家恒, 李文花. 基于构词法的网络新词自动识别初探[J]. 山西大学学报(自然科学版), 2002(2): 115-119. (ZHENG J H, LI W H. A study on automatic identification for internet new words according to word-building rule[J]. Journal of Shanxi university (natural science edition), 2002(2): 115-119.)
[17] LI X, CHEN X. New word discovery algorithm based on N-gram for multi-word internal solidification degree and frequency[C]//2020 5th international conference on control, robotics and cybernetics (CRC). Piscataway: IEEE, 2020: 51-55.
[18] YAN L, BAI B, CHEN W, et al. New word extraction from Chinese financial documents[J]. IEEE signal processing letters, 2017, 24(6): 770-773.
[19] ZHU G L, LIU W T, ZHANG S X, et al. The method for extracting new login sentiment words from Chinese micro-blog based on improved mutual information[J]. Computer systems science and engineering, 2020, 35(3): 223-232.
[20] HUANG J, POWERS D. Chinese word segmentation based on contextual entropy[C]//Proceedings of the 17th Pacific Asia conference on language, information and computation. Sentosa: Colips Publications, 2003: 152-158.
[21] LEE C W, WU Y L, YU L C. Combining mutual information and entropy for unknown word extraction from multilingual code-switching sentences[J]. Journal of information science and engineering, 2019, 35(3): 597-610.
[22] CUI S, LIU Q, MENG Y, et al. New word detection based on large-scale corpus[J]. Journal of computer research and development, 2006, 43: 927.
[23] PECINA P, SCHLESINGER P. Combining association measures for collocation extraction[C]//Proceedings of the 21st international conference on computational linguistics and 44th annual meeting of the association for computational linguistics (COLING/ACL 2006). Sydney: Association for Computational Linguistics, 2006: 651-658.
[24] DU L P, LI X G, LIN D Y. Chinese term extraction from web pages based on expected point-wise mutual information[C]//2016 12th international conference on natural computation, fuzzy systems and knowledge discovery (ICNC-FSKD). Piscataway: IEEE, 2016: 1647-1651.
[25] 徐豪杰, 吴新丽, 杨文珍, 等. 基于改进PMI和最小邻接熵结合策略的未登录词识别[J]. 计算机系统应用, 2020, 29(6): 181-188. (XU H J, WU X L, YANG W Z, et al. Out-of-vocabulary detection based on combination strategy of improved PMI and minimum branch entropy[J]. Computer systems & applications, 2020, 29(6): 181-188.)
[26] 曹帅. 结合关联置信度与结巴分词的新词发现算法[J]. 计算机系统应用, 2020, 29(5): 144-151. (CAO S. New word detection algorithm combining correlation confidence and jieba word segmentation[J]. Computer systems & applications, 2020, 29(5): 144-151.)
[27] XIE T, WU B, WANG B. New word detection in ancient Chinese literature[C]//Asia-Pacific Web (APWeb) and Web-Age information management (WAIM) joint conference on Web and big data. Berlin: Springer, 2017, 10367: 260-275.
[28] JIANG D C, CHEN X Y, YANG X. A Chinese new word detection approach based on independence testing[C]//13th international conference on artificial intelligence and symbolic computation. Berlin: Springer, 2018, 11110: 227-236.
[29] JIANG D, JIANG A, TANG S. An adaptive method for Chinese new word detection based on hypothesis testing[J]. Pattern analysis and applications, 2022, 25: 993-999.
[30] 王欣. 一种基于多字互信息与邻接熵的改进新词合成算法[J]. 现代计算机(专业版), 2018, 4(1): 7-11. (WANG X. An improved new word synthesis algorithm based on multi-word mutual information and branch entropy[J]. Modern computer, 2018, 4(1): 7-11.)
[31] 李文坤, 张仰森, 陈若愚. 基于词内部结合度和边界自由度的新词发现[J]. 计算机应用研究, 2015, 32(8): 2302-2304. (LI W K, ZHANG Y S, CHEN R Y. New word detection based on inner combination degree and boundary freedom degree of word[J]. Application research of computers, 2015, 32(8): 2302-2304.)
[32] 周霜霜, 徐金安, 陈钰枫, 等. 融合规则与统计的微博新词发现方法[J]. 计算机应用, 2017, 37(4): 1044-1050. (ZHOU S S, XU J A, CHEN Y F, et al. New words detection method for microblog text based on integrating of rules and statistics[J]. Journal of computer applications, 2017, 37(4): 1044-1050.)
[33] SUN X. Fast online training with frequency-adaptive learning rates for Chinese word segmentation and new word detection[C]//Proceedings of the 50th annual meeting of the association for computational linguistics. Jeju Island: Association for Computational Linguistics, 2012: 253-262.
[34] 杜丽萍, 李晓戈, 于根, 等. 基于互信息改进算法的新词发现对中文分词系统改进[J]. 北京大学学报(自然科学版), 2016, 52(1): 35-40. (DU L P, LI X G, YU G, et al. New word detection based on an improved PMI algorithm for enhancing segmentation system[J]. Acta scientiarum naturalium universitatis Pekinensis, 2016, 52(1): 35-40.)
[35] MEI L, HUANG H, WEI X, et al. A novel unsupervised method for new word extraction[J]. Science China (information sciences), 2016, 59(9): 11-21.
[36] 刘伟童, 刘培玉, 刘文锋, 等. 基于互信息和邻接熵的新词发现算法[J]. 计算机应用研究, 2019, 36(5): 1293-1296. (LIU W T, LIU P Y, LIU W F, et al. New word discovery algorithm based on mutual information and branch entropy[J]. Application research of computers, 2019, 36(5): 1293-1296.)
[37] JIA Y, LIU L, CHEN H, et al. A Chinese unknown word recognition method for micro-blog short text based on improved FP-growth[J]. Pattern analysis and applications, 2020, 23(2): 1011-1020.
[38] MEI L L, HUANG H Y, WEI X C, et al. A novel unsupervised method for new word extraction[J]. Science China-information sciences, 2016, 59: 92102.
[39] 赵京胜, 宋梦雪, 高祥, 等. 自然语言处理中的文本表示研究[J]. 软件学报, 2022, 33(1): 102-128. (ZHAO J S, SONG M X, GAO X, et al. Research on text representation in natural language processing[J]. Journal of software, 2022, 33(1): 102-128.)
[40] 张婧, 黄锴宇, 梁晨, 等. 面向中文社交媒体语料的无监督新词识别研究[J]. 中文信息学报, 2018, 32(3): 17-25. (ZHANG J, HUANG K Y, LIANG C, et al. Unsupervised new word extraction from Chinese social media data[J]. Journal of Chinese information processing, 2018, 32(3): 17-25.)
[41] QIAN Y, DU Y, DENG X W, et al. Detecting new Chinese words from massive domain texts with word embedding[J]. Journal of information science, 2019, 45(2): 196-211.
[42] 赵志滨, 石玉鑫, 李斌阳. 基于句法分析与词向量的领域新词发现方法[J]. 计算机科学, 2019, 46(6): 29-34. (ZHAO Z B, SHI Y X, LI B Y. Newly-emerging domain word detection method based on syntactic analysis and term vector[J]. Computer science, 2019, 46(6): 29-34.)
[43] DU Y, YUAN H, QIAN Y. A word vector representation based method for new words discovery in massive text[C]//5th CCF conference on natural language processing and Chinese computing (NLPCC 2016) and 24th international conference on computer processing of oriental languages (ICCPOL 2016). Kunming: Springer, 2016, 10102: 76-88.
[44] 张乐, 冷基栋, 吕学强, 等. MWEC:一种基于多语义词向量的中文新词发现方法[J]. 数据分析与知识发现, 2022, 6(1): 113-121. (ZHANG L, LENG J D, LV X Q, et al. Discovering Chinese new words based on multi-sense word embedding[J]. Data analysis and knowledge discovery, 2022, 6(1): 113-121.)
[45] 潘嘉鑫. 基于互信息和左右邻接熵改进的新词发现算法及情感分析[D]. 武汉: 华中科技大学, 2022. (PAN J X. Improved new word discovery algorithm and sentiment analysis based on mutual information and left and right neighbor entropy[D]. Wuhan: Huazhong University of Science and Technology, 2022.)
[46] WANG Y, ANANIADOU S, TSUJII J. Improving clinical named entity recognition in Chinese using the graphical and phonetic feature[J]. BMC medical informatics and decision making, 2019, 19: 273.
[47] ULLMANN J R. A binary N-gram technique for automatic correction of substitution, deletion, insertion and reversal errors in words[J]. The computer journal, 1977, 20: 141-147.
[48] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[Z]. Ithaca: Cornell University Library, 2019: 4171-4186.
[49] WANG W J, LI X Y, REN H L, et al. Chinese clinical named entity recognition from electronic medical records based on multisemantic features by using robustly optimized bidirectional encoder representation from transformers pretraining approach whole word masking and convolutional neural networks: model development and validation[J]. JMIR medical informatics, 2023, 11: e44597.