A Review on Federated Search
Received date: 2014-11-25
Revised date: 2014-12-20
Online published: 2015-01-05
Yang Haifeng, Lu Wei. A review on federated search[J]. Library and Information Service, 2015, 59(1): 134-143. DOI: 10.13266/j.issn.0252-3116.2015.01.018
[Purpose/significance] This paper summarizes the current state of research on federated search and proposes directions for future research. [Method/process] Drawing on an extensive review of the literature, the paper surveys and assesses work on federated search. [Result/conclusion] Research on federated search centers on three problems: collection representation, collection selection, and result merging. Algorithms have been proposed for each of these from a variety of angles, but authoritative datasets and uniform evaluation criteria are still relatively scarce. Although federated search theory and techniques are already widely applied, the big data environment has raised many new research topics.