Apache’s first Asian Technology Summit: a detailed introduction to the big data field

Time: 2021-08-21

Introduction

As more and more enterprises pursue digital transformation, the big data industry has grown at an unprecedented pace. This prosperity has in turn brought unprecedented opportunities and challenges to the technologies of the big data ecosystem. When it comes to big data technology, Apache should be no stranger: the vast majority of big data open source technologies come from the Apache Software Foundation. Today we introduce Apache's annual event: ApacheCon.

ApacheCon: the ASF's official global conference series

ApacheCon is the official global conference series of the Apache Software Foundation (ASF), held once a year. As a prestigious open source gathering, it is one of the most anticipated conferences in the open source community.

Since its launch in 1998, ApacheCon has attracted more than 350 technology projects and their communities, bringing together industry experts from home and abroad to share the latest global technology trends and practices and jointly discuss "tomorrow's technology", so that technology enthusiasts can follow the latest developments across technical frontiers and upgrade their technology stacks.

However, ApacheCon has been held overseas for more than a decade. This year, the organizing committee held the first ApacheCon online conference for the Asia-Pacific region: ApacheCon Asia. The Asia Conference brings together 140+ topics from speakers in China, Japan, India, the United States and other countries, divided into 14 forums, including big data, incubator, API/microservices, middleware, workflow and data governance, data visualization, observability, stream processing, messaging systems, IoT and industrial IoT, integration, open source community/culture, web server/Tomcat, etc.

By participating in the Asia Conference on August 6-8, 2021, you will be able to:

·Learn the latest global technology trends and practices
·Exchange with 200+ top experts from home and abroad
·Enjoy a 3-day event with 140+ topics, free to attend throughout

Official website of the conference: https://www.apachecon.com/aca…
Details of the conference agenda: https://apachecon.com/acasia2…


About the big data forum

Big data is one of Apache's most important topics, and this year's big data forum is very lively. It covers top-level projects and projects in incubation, such as Arrow, Atlas, Bigtop, CarbonData, Cassandra, DolphinScheduler, Doris (incubating), Druid, Flink, Hadoop, HBase, Hive, Hudi, Impala, Kylin, Kyuubi (incubating), Liminal (incubating), Nemo, Pinot, Pulsar, Spark and YuniKorn (incubating), as well as popular open source projects such as Milvus and openLooKeng. Over this three-day event, everyone can learn about the cutting-edge trends of these technologies, practical experience from front-line users, and analyses of principles and architecture.

Producers


Because the big data track is so popular and its agenda fills all three days, today we will give a detailed look at the first day's talks by technology experts from home and abroad.

The big data forum has also specially invited three hosts.


Highlights of the August 6 big data agenda @ ApacheCon Asia

Scaling Impala: common errors and best practices

Sharing guest: Manish Maheshwari
Time: 13:30, August 6
Topic introduction:
Apache Impala is a complex engine that requires comprehensive technical understanding to use to its full potential. In this talk, we will discuss best practices for keeping an Impala deployment scalable and for configuring access control to provide a consistent experience for end users. We will also take a high-level look at Impala's query profile, which serves as the first stop for any performance troubleshooting. In addition, we will discuss mistakes that users and BI tools often make when interacting with Impala. Finally, we will discuss an ideal configuration that puts all of the above into practice.
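As a small illustration of using the query profile as the first stop in troubleshooting, the sketch below picks the slowest operator out of a made-up, Impala-style profile summary. The profile text and timings here are hypothetical, not real Impala output; actual profiles are obtained from impala-shell's `profile;` command after running a query.

```python
# Hypothetical sketch: find the slowest operator in an Impala-style
# profile summary. The profile text below is made up for illustration;
# real profiles come from impala-shell's `profile;` command.
profile = """
00:SCAN HDFS 4s210ms
01:AGGREGATE 0s850ms
02:EXCHANGE 12s040ms
"""

def parse_ms(s):
    # "12s040ms" -> 12040 milliseconds
    sec, ms = s.rstrip("ms").split("s")
    return int(sec) * 1000 + int(ms)

rows = [line.rsplit(None, 1) for line in profile.strip().splitlines()]
slowest = max(rows, key=lambda r: parse_ms(r[1]))
print(slowest[0])  # the operator to investigate first: 02:EXCHANGE
```

In a real investigation you would then drill into that operator's detailed counters (rows produced, bytes read, skew across hosts) rather than its wall-clock time alone.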
Guest introduction:
Manish Maheshwari
Principal Sales Engineer at Cloudera with more than 15 years of experience building very large data warehouse and analytics solutions. Extensive experience with Apache Hadoop, DI and BI tools, data mining and forecasting, data modeling, master data and metadata management, and dashboard tools. Proficient in Hadoop, SAS, R, Informatica, Teradata and QlikView.

How DBS (Development Bank of Singapore) uses an Apache CarbonData based data platform to drive real-time insight and analytics

Sharing guests: Ravindra Pesala / Kumar Vishal
Time: 13:30, August 6
Topic introduction:
DBS is a leading bank headquartered in Singapore. The bank holds massive volumes of structured and unstructured data, which provide important support for the bank's strategy. In 2020, DBS invested in a CarbonData-based data platform to enable real-time analytics and unlock insight from existing data across various sources. In this talk, we will introduce how DBS Bank uses the Spark and Presto engines to move from a traditional data warehouse to a CarbonData-based data lake.
Guest introduction:
Ravindra Pesala
Senior Vice President at DBS Bank, head of the big data platform
Apache CarbonData PMC
Leads the big data engineering platform, covering ingestion, compute, data access, streaming and metadata.

Kumar Vishal
Apache CarbonData PMC
Senior Big Data Engineer
Works on the big data engineering platform, covering ingestion, compute, data access and streaming

Challenges of building a distributed, fault-tolerant and scalable analytics stack

Sharing guest: Nishant Bangarwa
Time: 14:10, August 6
Topic introduction:
To date, Apache Druid clusters have ingested more than 50 trillion events, equivalent to more than 500 PB of raw data, and the number keeps growing. In this talk, we will cover the design and challenges of a distributed, fault-tolerant, scalable analytics stack, and describe our path to developing Apache Druid into a powerful distributed, fault-tolerant, scalable analytics data store.
Guest introduction:
Nishant Bangarwa
Co-founder and engineering director of Rill Data.
Active open source contributor; Apache Druid PMC & Apache Superset PMC; committer on Apache Calcite and Apache Hive.
Prior to Rill Data, he was a member of Cloudera's data warehouse team and the Metamarkets Druid team, responsible for managing large-scale Apache Druid deployments.
Bachelor of Computer Science, National Institute of Technology Kurukshetra, India

How security is implemented in Apache Ozone

Sharing guests: Bharat Viswanadham / Shashikant Banerjee
Time: 14:10, August 6
Topic introduction:
Apache Ozone is a scalable, redundant, distributed object store for Hadoop. It became an Apache top-level project in 2020. Apache Ozone has two metadata services: the Storage Container Manager (SCM), which manages block/container allocation and replication, certificates and node management; and the Ozone Manager (OM), which manages namespace metadata. In this talk, we will discuss how security is implemented in Ozone.
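As a rough sketch of what enabling security involves on the configuration side, the fragment below shows ozone-site.xml properties of the kind used to turn on Kerberos-based security for the SCM and OM services. The principal and keytab values are placeholders, and the exact property set should be checked against the Ozone security documentation for your version.

```xml
<!-- Sketch of ozone-site.xml security settings; principal/keytab
     values below are placeholders, not a tested configuration. -->
<property>
  <name>ozone.security.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hdds.scm.kerberos.principal</name>
  <value>scm/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>hdds.scm.kerberos.keytab.file</name>
  <value>/etc/security/keytabs/scm.keytab</value>
</property>
<property>
  <name>ozone.om.kerberos.principal</name>
  <value>om/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>ozone.om.kerberos.keytab.file</name>
  <value>/etc/security/keytabs/om.keytab</value>
</property>
```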
Guest introduction:
Bharat Viswanadham: software engineer with more than 7 years of experience designing and building scalable, high-performance distributed storage systems. Committer & PMC member of Apache Hadoop and Apache Ozone.
Shashikant Banerjee: expert in distributed storage systems with more than 8 years of experience. Committer & PMC member of the Apache Hadoop, Apache Ozone and Apache Ratis communities.

Analysis and application of the openLooKeng heuristic index framework

Sharing guest: Li Zheng
Time: 14:50, August 6
Topic introduction:
With the application and development of big data technology, data types are becoming more diverse, data distribution wider, and query scenarios more complex, which makes data processing difficult. To improve the usability of big data, Huawei launched the open source data virtualization engine openLooKeng.
openLooKeng provides a unified SQL interface with interactive query and analysis capabilities, and continues to evolve in cross-data-center/cloud deployment, data source extension, performance, reliability and security, so as to simplify big data. This talk will focus on the openLooKeng heuristic index framework, the indexing technologies built on it, and the challenges in their implementation and application.
Guest introduction:
Li Zheng
PhD from Huazhong University of Science and Technology. Joined Huawei in June 2018. Currently focuses on performance optimization research for openLooKeng and is deeply involved in the design and implementation of the big data query and analysis engine architecture.

Kyuubi: NetEase's exploration and practice of serverless Spark

Sharing guest: Yao Qin
Time: 14:50, August 6
Topic introduction:
This topic covers the architecture, implementation principles and application scenarios of Kyuubi, NetEase's open source big data component, and uses practical cases to show how Kyuubi helps businesses realize serverless Spark, along with the corresponding process and thinking. It also introduces how, in this process, we participate directly in the Spark open source community to handle the related issues and feature optimizations upstream.
Guest introduction:
Yao Qin
Lead author of the Apache Kyuubi project
Apache Spark Committer
Apache Submarine Committer
Member of the NetEase big data team

Cross-data-source data analytics at China Merchants Bank

Sharing guest: Chieh-Min Wu
Time: 15:30, August 6
Topic introduction:
China Merchants Bank (CMB) has PB-level data stored in RDBMSs, NoSQL databases, object storage and big data frameworks such as Apache Hadoop, Spark and Flink. Moving data between different data sources via ETL is very costly. Therefore, openLooKeng was introduced to connect the different data sources and process data in place across data centers and hybrid clouds.
This talk will give an overview of CMB's data processing engine, which can analyze geographically remote data sources in place, and show how we use openLooKeng features such as high availability, auto-scaling, built-in cache and index support to meet the reliability requirements of enterprise workloads.
Guest introduction:
Chieh-Min Wu
Big data technology expert at China Merchants Bank with 9 years of big data experience in fintech, responsible for the architecture design, implementation and maintenance of CMB's big data platform. openLooKeng PMC.

Inside Apache Druid's storage and query engine

Sharing guest: Gian Merlino
Time: 15:30, August 6
Topic introduction:
Apache Druid is an open source columnar database famous for its large scale and high performance; its largest deployments comprise thousands of servers. Regardless of scale, however, high performance starts from a good foundation. This talk will explore the internal workings of a single data server to understand these fundamentals. We will cover how Apache Druid stores data, which compression methods it uses, how the storage engine connects to the query processing engine, and how the system handles resource management and multithreading.
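To make the storage ideas concrete, here is a minimal, illustrative sketch (not Druid's actual code) of two techniques behind columnar string storage: dictionary encoding and bitmap indexes.

```python
# Illustrative sketch (not Druid's implementation): dictionary encoding
# plus bitmap indexes, two core ideas behind columnar string storage.
values = ["us", "cn", "us", "jp", "cn", "us"]

# Dictionary encoding: store each distinct string once, then keep
# a compact array of integer ids instead of repeated strings.
dictionary = sorted(set(values))             # ['cn', 'jp', 'us']
ids = [dictionary.index(v) for v in values]  # [2, 0, 2, 1, 0, 2]

# Bitmap index: for each dictionary entry, a bitmap of the rows where
# it appears, so filters become cheap bitwise operations.
bitmaps = {
    term: [1 if v == term else 0 for v in values] for term in dictionary
}

# A filter like country = 'us' OR country = 'jp' is the OR of two bitmaps.
matching = [a | b for a, b in zip(bitmaps["us"], bitmaps["jp"])]
print(matching)  # rows that satisfy the filter
```

Real systems additionally compress both the id array and the bitmaps (e.g. with roaring-style bitmap compression), but the query path is the same: evaluate filters on bitmaps first, then read only the matching rows.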
Guest introduction:
Gian Merlino
Co-founder and CTO of Imply and one of Druid's main committers. He led the data ingestion team at Metamarkets and held a senior engineering position at Yahoo. Bachelor of Computer Science, California Institute of Technology.

Speeding up big data analytics with Apache CarbonData indexes

Sharing guests: Akash R Nilugal / Kunal Kapoor
Time: 16:10, August 6
Topic introduction:
Data in the 21st century is like oil in the 18th century: processed in an intelligent way, it is a huge, untapped and valuable asset. Storing and analyzing big data is challenging and expensive in terms of both cost and time, and analytics solutions need to constantly adapt to keep up with exponential data growth. Apache CarbonData is a unified storage solution and file format designed to optimize query performance and reduce analytics costs, and it has been adopted by more than 100 open source users. In databases, the index is one of the main features: it helps answer queries without scanning every row. Inspired by this concept, Apache CarbonData supports custom indexes such as min/max, bloom, Lucene, secondary indexes and materialized views to speed up row-level updates, deletes, OLAP and point queries. This presentation highlights CarbonData's custom index architecture and distributed index cache server, which help deliver faster query results, as well as future challenges and scope.
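The row-skipping idea behind a min/max index can be sketched in a few lines. This is illustrative only, with made-up data, not CarbonData's implementation: the engine keeps per-block min/max statistics so a point query can skip blocks that cannot contain the value.

```python
# Illustrative min/max ("zone map") pruning, the idea behind min/max
# indexes: keep per-block statistics so a point query can skip blocks
# whose [min, max] range cannot contain the target value.
blocks = [
    [3, 7, 9],     # block 0
    [12, 15, 18],  # block 1
    [21, 22, 30],  # block 2
]
stats = [(min(b), max(b)) for b in blocks]

def scan(target):
    hits = []
    for block, (lo, hi) in zip(blocks, stats):
        if lo <= target <= hi:  # only scan blocks that may match
            hits += [v for v in block if v == target]
    return hits

print(scan(15))  # touches only block 1
```

Note that pruning is conservative: `scan(5)` still reads block 0 because 5 falls inside its [3, 9] range, even though the value is absent; bloom filters exist precisely to cut down such false positives.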
Guest introduction:
Akash R Nilugal
Apache CarbonData PMC & committer
Senior technical lead in the Cloud and AI/Data Platform team at Huawei's Bangalore research center.
With 5 years of experience in big data, he is interested in big data index support, materialized views, CDC for big data, Spark SQL query optimization, Spark Structured Streaming, and data lake and data warehouse features.

Kunal Kapoor
Apache CarbonData PMC & committer, system architect in the Cloud and AI/Data Platform team at Huawei's Bangalore research center. Mainly responsible for the distributed index cache server, Hive + CarbonData integration, pre-aggregation support, S3 support for CarbonData, CarbonData's secondary index, and Spark SQL query optimization in CarbonData.

A Java-based big data machine learning solution

Sharing guest: Lan Qing
Time: 16:10, August 6
Topic introduction:
The success of machine learning (ML) applications depends on the use of big data. Most big data comes in unstructured formats, and it may be available offline or online. While there are many options for ML tasks in Python, integrating Python applications into existing Java/Scala-based big data pipelines is quite challenging. In addition, in Java/Scala there are few options that bridge the gap between processing big data and running ML workloads with the same library.
To solve these problems, we will use DJL, a machine learning framework for Java, to demonstrate a big data ML solution in Java. DJL supports a variety of ML engines, including TensorFlow, PyTorch, Apache MXNet (incubating), PaddlePaddle, ONNX Runtime and so on. By using Apache Flink and Apache Spark, users can easily build their online/offline ML pipelines. By the end of the session, the audience will be able to build easy-to-use, high-performance ML pipelines for a range of different scenarios.
Guest introduction:
Lan Qing
Software development engineer on the Amazon AWS machine learning platform, with deep experience in big data and in the architecture of machine learning applications in production environments.
One of the co-authors of DJL (djl.ai)
Apache MXNet PPMC
Master of Computer Engineering, Columbia University

Insights into the open source community: best practices for data-driven community operations

Sharing guests: Zhong Jun / Jiang Yikun / Peng Lei
Time: 16:50, August 6
Topic introduction:
When evaluating an open source community, data-driven insight into and analysis of the community's current state is of great significance in helping the community grow healthily, so data-driven operations play a key role. In this topic, we will introduce best practices in data-driven community operations. This operations management system helps several of China's most active open source communities (such as openEuler, openGauss, openLooKeng and MindSpore) measure community health, activity and other key indicators efficiently and scientifically. Using the real case of the openEuler community, this topic will also explain how to build a data-driven operations system, how to use powerful Apache big data projects to build a first usable version (covering data storage, analysis, data insight and visualization), and the improvements we contribute back to the upstream Apache projects.
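As a toy illustration of the kind of metric such a system computes, the sketch below derives monthly active contributors from a hypothetical list of commit events. Names and dates are made up; a real pipeline would ingest git logs or community event streams into a big data store and run the aggregation there.

```python
# Hypothetical sketch of a data-driven community metric: monthly active
# contributors computed from (author, month) commit events. Real systems
# would pull these events from git logs or a community events API.
from collections import defaultdict

events = [
    ("alice", "2021-06"), ("bob", "2021-06"), ("alice", "2021-06"),
    ("alice", "2021-07"), ("carol", "2021-07"), ("bob", "2021-07"),
]

active = defaultdict(set)
for author, month in events:
    active[month].add(author)  # distinct contributors per month

monthly_active = {m: len(authors) for m, authors in sorted(active.items())}
print(monthly_active)  # {'2021-06': 2, '2021-07': 3}
```

Health dashboards layer many such aggregations (new vs. returning contributors, review latency, issue close rates) on top of the same event stream.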
Guest introduction:
Zhong Jun
Has participated in open source communities for more than 6 years. Responsible for the digital operations systems of the openEuler, MindSpore, openGauss and openLooKeng projects. Core contributor to several communities: maintainer of the Infra SIG in the openEuler community, maintainer of the Infra SIG in the openGauss community, and core member of the OpenStack Manila project.

Jiang Yikun
Senior software engineer on Huawei's open source development team. He has participated in open source communities for more than 5 years and works on multi-architecture support and improvements for projects in the big data field, with five years of experience in cloud computing and big data optimization. Previously, he was a committer on an OpenStack storage project.

Peng Lei
Senior software engineer on Huawei's open source development team, working on multi-architecture support and improvements for MySQL. Five years of experience in SQL development and big data use; has studied the MySQL kernel, including MySQL Group Replication, and worked on kernel development for a distributed database. Two years of experience with big data projects such as Spark/Kafka/Hadoop.

Apache Hudi on AWS

Sharing guest: Fei Lianghong
Time: 16:50, August 6
Topic introduction:
This talk introduces Apache Hudi on AWS, including an introduction to Apache Hudi, common use cases, Hudi storage types, writing Hudi datasets, querying Hudi datasets and some practical tips.
Guest introduction:
Fei Lianghong
Principal Developer Evangelist, Amazon Web Services (AWS)
He draws on 20 years of experience to support innovation and help start-ups and enterprises turn their ideas into reality, focusing on software development and cloud-native architecture, as well as the technical and business impact of machine learning and data analytics. Before joining AWS, he worked at Apple and Microsoft. His interests include artificial intelligence, data science and photography.

The above covers the first day of the big data forum at the Asia Conference. Stay tuned for the speakers on the second and third days!

Why hesitate? Come and sign up now!

Registration method

ApacheCon Asia 2021
August 6-8
14 forums, 100+ technology projects
140+ topic talks
Online dialogue with global technology masters and experts
A full 3-day exchange event
Free to attend throughout
ApacheCon's first Asian online conference
August 6-8, 2021

Look forward to the arrival of friends!

Click [here] to sign up: https://hopin.com/events/apac…