Technology Improvement and Optimization of Massive Data Analysis Process by the Three Google Cloud Computing Techniques

  • Lu Xiaobin,
  • Wang Tao
  • School of Information Resource Management, Renmin University of China, Beijing 100872

Received date: 2014-09-02

Revised date: 2015-01-12

Online published: 2015-02-05

Abstract

[Purpose/significance] Massive data analysis in a cloud computing environment is a form of data computation that requires large data sets to be preloaded. To address the quality and efficiency problems caused by the way traditional methods analyze and process massive data, this paper applies the three Google cloud computing techniques to improve the process. [Method/process] Applying literature research, content analysis and technical analysis to the three Google cloud computing technologies (GFS, MapReduce and Bigtable), this paper summarizes the deployment innovations and design improvements they bring to data processing, technical frameworks and algorithm models. [Result/conclusion] By comparing Google cloud computing technology with the traditional local data processing mode, this paper identifies the advantages of Google cloud computing technology for massive data analysis. Drawing on Google cloud computing, we propose technical optimizations and improvements to the massive data analysis process in three aspects: storage and access, organization and management, and parallel processing.

Cite this article

Lu Xiaobin, Wang Tao. Technology Improvement and Optimization of Massive Data Analysis Process by the Three Google Cloud Computing Techniques[J]. Library and Information Service, 2015, 59(3): 6-11,102. DOI: 10.13266/j.issn.0252-3116.2015.03.001

