Speaker: Xiaodong Zhang, Robert M. Critchfield Professor, The Ohio State University
Time: June 24, 10:00-11:00 am
Venue: Academic Lecture Hall, 2nd Floor, Office Building, Software Park Campus, 0638太阳集团
Host: Prof. Baoquan Chen (陈宝权)
Title: Building Big Data Processing Systems under the Scale-out Computing Model
Abstract:
We have entered an era of data-driven decision making in almost all applications across society. From a system perspective, the increasingly high volume of data has the following implications: (1) Conventional database systems, including parallel database systems, are not designed to handle such large volumes of data, which demands new system infrastructure. (2) Big data users from many application fields require cost-effective solutions for their analytics because conventional data processing solutions are neither scalable nor affordable. (3) System designers and practitioners are in great need of new software tools for big data processing and analytics. (4) The computing paradigm for data processing has shifted from a scale-up model for high performance to a scale-out model for high throughput, as computers come to serve mainly in data centers.
I will discuss how the systems community addresses the above issues by presenting a case study on major technical advancements in Apache Hive, which has been widely adopted by many organizations for various big data analytics applications. Working closely with many users and organizations, we have identified several shortcomings of the early version of Hive in its data storage structure, query planning, and query execution. I will present a community-based effort and show how academic research lays a foundation for Hive to improve its daily operations in production systems.
Bio:
Xiaodong Zhang is the Robert M. Critchfield Professor in Engineering and Chair of the Computer Science and Engineering Department at The Ohio State University. His research interests focus on data management in computer, networking, and distributed systems. He has made strong efforts to transfer his academic research into advanced technology that impacts production systems. He received his Ph.D. in Computer Science from the University of Colorado at Boulder, where he received the Distinguished Engineering Alumni Award in 2011. He is a Fellow of the ACM and a Fellow of the IEEE.