• Week 24, 2018 – YARN of big data


    Hadoop can be seen as a large distributed operating system: HDFS is its file system, and YARN is its computing layer. YARN (Yet Another Resource Negotiator) is Hadoop's newer resource manager; before it, MapReduce 1.0 filled that role. Differences between MapReduce 1.0 and MapReduce 2.0 (YARN): the MapReduce 1.0 daemons are: 1. JobTracker, the daemon that processes jobs (user-submitted code) and decides which files are involved […]

  • iOS multithreading (GCD)


    1. Introduction to GCD. What is GCD? To understand the concept, let's start with the definition from Baidu Encyclopedia: Grand Central Dispatch (GCD) is a multi-core programming solution developed by Apple. It is mainly used to optimize applications to support multi-core processors and other symmetric multiprocessing systems. It is […]

  • The Gradle Java plugin


    The Java plugin is the basis for building JVM projects; it adds many capabilities, such as compiling, testing, packaging, and publishing. Many other plugins are built on top of it, such as the Android plugin. Usage: apply the plugin by ID: plugins { id ‘java’ }. Source sets: the Java plugin introduces the concept of source […]

  • A brief analysis of MapReduce input splits


    What does a split mean in MapReduce? As the name implies, the input data is divided into slices of data. This split determines both the number of map tasks in the map stage and the amount of data each map task processes. In MapReduce, how is the number of map tasks for a job determined by […]
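To make the excerpt's question concrete: in Hadoop, the number of map tasks follows from the input split size, which FileInputFormat derives from the block size and configurable min/max bounds. Below is a minimal Java sketch of that arithmetic; the class name is hypothetical, and the 1.1 "slack" constant mirrors FileInputFormat's SPLIT_SLOP behavior, but treat the exact defaults as version-dependent assumptions.

```java
// Sketch of FileInputFormat-style split sizing and the resulting map count.
public class SplitMath {
    // Hadoop allows the last split to be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (and hence map tasks) for one file of the given length.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining > 0) splits++; // the tail becomes the final split
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(128 * mb, 1L, Long.MAX_VALUE);
        System.out.println(splitSize / mb);                   // 128
        System.out.println(countSplits(200 * mb, splitSize)); // 2
    }
}
```

Note how the slack factor means a 130 MB file still yields a single split, since 130/128 is below 1.1.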

  • 60 TB jobs migrated from Hive to Spark at Facebook


    Facebook often uses analytics to make data-driven decisions. Over the past few years, as users and products have grown, the data volume of a single query in our analytics engine has reached tens of terabytes. Some of our batch analytics runs on the Hive platform (Apache Hive was contributed to the community by Facebook in […]

  • Four stages of MapReduce


    1. Split phase: in this stage, each input file is divided into splits that are fed to map tasks. If a file is 200 MB, it will be divided into two splits by default, because the default split size equals the HDFS block size of 128 MB. If the input is a large number of small files, […]
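The split arithmetic above feeds the map stage; to make the remaining stages concrete, here is a self-contained word-count sketch in plain Java — a simplified illustration of the map, shuffle, and reduce stages, not the actual Hadoop API (all class and method names here are hypothetical):

```java
import java.util.*;

// Simplified sketch of the MapReduce stages, using word count.
public class WordCountSketch {
    // Map stage: each line of an input split becomes (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) out.add(Map.entry(word, 1));
        }
        return out;
    }

    // Shuffle stage: group the emitted values by key (sorted here).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce stage: sum the grouped counts for one word.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) sum += c;
        return sum;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = map("the quick fox the fox");
        shuffle(pairs).forEach((word, counts) ->
            System.out.println(word + "=" + reduce(counts)));
    }
}
```

In real Hadoop each stage runs distributed across many tasks; the point here is only the shape of the data flowing between stages.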

  • The migration from JStorm to Flink at Toutiao


    Author: Zhang Guanghui. This article walks through the whole process, and the follow-up plans, of ByteDance's migration from JStorm to Flink. You can learn about the background of introducing Flink and the process of building the Flink cluster, how ByteDance stays compatible with previous JStorm jobs, and […]

  • Implementing task scheduling in C# and deploying it as a Windows service with Quartz.NET


    1. Introduction to Quartz.NET. Quartz.NET is a powerful, open-source, lightweight job scheduling framework. It is a .NET port of OpenSymphony's Quartz API. It can be written in C# and used in WinForms, ASP.NET MVC, and .NET Core applications. It's flexible and not complicated. You can use it to create simple or complex job […]

  • Understanding the JavaScript concurrency model and event loop mechanism


    We know that JavaScript executes serially and is event-driven, so how does it support concurrent data processing? The “single-threaded” language: in browser implementations, each page is an independent process, which includes the JS engine, GUI rendering, event triggering, timers, asynchronous HTTP requests, and other threads. A process is the smallest unit of […]
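The model the excerpt describes can be sketched abstractly (in Java here, since the mechanism itself is language-independent): a single thread drains a task queue and runs callbacks serially, while other threads — timers, network I/O — only enqueue work. All names below are hypothetical; this is a sketch of the event-loop idea, not a browser implementation.

```java
import java.util.concurrent.*;

// Abstract sketch of an event loop: one thread runs callbacks serially,
// while other threads (timers, I/O) only enqueue work.
public class EventLoopSketch {
    private final BlockingQueue<Runnable> taskQueue = new LinkedBlockingQueue<>();
    private volatile boolean running = true;

    public void post(Runnable task) { taskQueue.add(task); } // callable from any thread
    public void stop() { post(() -> running = false); }      // stop via the queue itself

    public void run() throws InterruptedException {          // the single loop thread
        while (running) {
            taskQueue.take().run(); // callbacks never run concurrently with each other
        }
    }

    public static void main(String[] args) throws Exception {
        EventLoopSketch loop = new EventLoopSketch();
        // A "timer thread" enqueues a callback after 50 ms, like setTimeout.
        new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) {}
            loop.post(() -> System.out.println("timeout fired"));
            loop.stop();
        }).start();
        loop.post(() -> System.out.println("sync task"));
        loop.run(); // prints "sync task" then "timeout fired"
    }
}
```

Because only the loop thread ever executes callbacks, shared state needs no locking inside them — which is exactly the trade-off JavaScript makes: concurrency in I/O, serial execution of user code.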

  • Practice and application of Flink at Meituan


    Author: Liu Dishan. This article is compiled from the Flink Meetup held in Beijing on August 11. The speaker, Liu Dishan, joined Meituan's data platform team in 2015. The team is committed to building an efficient and easy-to-use real-time computing platform and exploring enterprise-level solutions and unified services for real-time applications in different […]

  • Kafka-style two-level scheduling for distributed coordination of microservice task assignment in Go


    Background: a two-level coordinated scheduling architecture based on the Kafka message queue. To coordinate the work of internal consumers and Kafka Connect workers, Kafka implements a group rebalance protocol. The main work is divided into two steps: a worker (consumer or Connect) obtains its own topic offsets and other metadata and gives it […]

  • Using gulp


    Building a project with gulp. When multiple developers work on a project together, each responsible for a different module, the complete project ends up composed of many “code snippets”. Preprocessors such as Less and Sass are used to reduce the maintenance cost of CSS, and these ultimately need to be compiled; combining CSS […]