  • Examples of using MapReduce in MongoDB: learning notes


    1. MapReduce groups results by the first argument of the emit function called in the map function. MapReduce is a computation model: put simply, it decomposes a large amount of work (data) into pieces (map) and then merges the results into a final result (reduce). To use MapReduce, you need to […]
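The grouping behavior described above can be sketched in pure Python. This is a simulation of the emit/group/reduce flow, not MongoDB's actual engine, and the sample documents and field names are made up for illustration:

```python
from collections import defaultdict

# Hypothetical sample documents (invented for this sketch).
docs = [
    {"cust_id": "A", "amount": 100},
    {"cust_id": "B", "amount": 50},
    {"cust_id": "A", "amount": 25},
]

def map_reduce(docs, map_fn, reduce_fn):
    """Collect every value emitted by map_fn under its key, then reduce each group."""
    groups = defaultdict(list)
    emit = lambda key, value: groups[key].append(value)  # emit's first arg is the group key
    for doc in docs:
        map_fn(doc, emit)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# map emits (cust_id, amount); reduce sums the amounts per customer.
result = map_reduce(
    docs,
    map_fn=lambda doc, emit: emit(doc["cust_id"], doc["amount"]),
    reduce_fn=lambda key, values: sum(values),
)
print(result)  # {'A': 125, 'B': 50}
```

The point of the sketch is only that everything emitted under the same first argument lands in one group before reduce runs, which is exactly why that argument acts as the grouping key.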

  • MapReduce development under IDE


    MapReduce development in an IDE. In the Hadoop class at school, our group project required writing some MapReduce jobs. The school provided a cluster, but opening Vim directly on the cluster to write Java is not very comfortable, so I experimented, stepped on some pits, and recorded them here. Deploying code with Git: The most […]

  • Detailed explanation of data aggregation method implemented by MapReduce in mongodb


    MongoDB is a non-relational database born in the big-data era to store large volumes of data. For large data sets, how to perform statistical operations matters a great deal. So how do we compute statistics on data in MongoDB? MongoDB provides us with three methods for data […]

  • Analysis of map reduce 1.0


    A brief analysis of the principle of first-generation MapReduce. Without further ado, here is the figure. The figure above is a schematic diagram of first-generation MapReduce: the map stage is on the left, the reduce stage is on the right, and map and reduce each run as a separate process. First look at the left, the map stage: the map input data resides in HDFS. […]

  • Windows debugging Hadoop MapReduce task logging (using idea)


    First, prepare the Hadoop connection driver: put it in any folder, add its bin directory to the PATH environment variable, and copy the hadoop.dll file into the C:\Windows\System32 folder. Create an empty Maven project; these are all my dependencies (there may be package conflicts among the Maven dependencies, please […]

  • Using Python to operate Hadoop: Python MapReduce


    Environment: Hadoop 3.1, Python 3.6, Ubuntu 18.04. Hadoop is developed in Java, so Java is the recommended way to operate HDFS, but sometimes we need to use Python instead. This time we discuss how to use Python to operate HDFS: uploading files, downloading files, viewing folders, and using Python to […]

  • How to implement MapReduce single process version in golang


    Preface: As Hadoop's programming framework, MapReduce is the part engineers deal with most often, and apart from the network environment and cluster configuration it also has a great impact on the execution efficiency of the whole job, so a deep understanding of the whole process is worthwhile. On the first […]

  • HBase Secondary Index Scheme


    HBase Secondary Index Scheme [TOC] Scheme using an HBase coprocessor. Test case requirement: build a secondary index on column mycf:name of the original table LJK_TEST. Step 1: create the index table: create 'INDEX_LJK_TEST','mycf'. Step 2: write the code: public class SecondIndexObserver extends BaseRegionObserver { private static final String INDEX_TABLE_NAME = "INDEX_LJK_TEST"; private static final byte[] […]

  • Hadoop MapReduce Spark Configuration Item


    Scope of application: the configuration items covered in this article mainly target Hadoop 2.x and Spark 2.x. MapReduce official documentation: https://hadoop.apache.org/doc… (lower left corner: mapred-default.xml). Example configuration items (name / value / description): mapreduce.job.reduce.slowstart.completedmaps, 0.05, resource requests for reduce tasks are not made until the fraction of completed map tasks reaches this value; mapreduce.output.fileoutputformat.compress, false, […]
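As a small illustration of how such an item is set, the slowstart parameter quoted above would go into mapred-site.xml roughly like this (the value shown is the default from the excerpt; this is a sketch, not a tuning recommendation):

```xml
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.05</value>
  <!-- Reduce tasks start requesting resources once 5% of map tasks have completed. -->
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>false</value>
  <!-- Whether job output files are compressed. -->
</property>
```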

  • Hadoop Construction and the First Hadoop Small Project: Word Counting


    Building Hadoop: I built Hadoop on Windows 10 myself. References: 1. detailed Hadoop installation and configuration; 2. winutils download; 3. Hadoop 3.0.3 download; 4. Hadoop startup error java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/server/timeline Collector Manager. The first Hadoop project: word counting. Word counting is many people's first small entry project into Hadoop. My own reference […]
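The word-count idea behind that first project can be sketched in a few lines of Python in the Hadoop Streaming style (this is a local simulation for illustration, not the Java WordCount from the referenced tutorial; the sample lines are made up):

```python
import re
from collections import Counter

def wc_map(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in re.findall(r"[A-Za-z']+", line)]

def wc_reduce(pairs):
    """Reduce: sum the counts emitted for each word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Simulate the job on two input lines.
lines = ["Hello Hadoop", "hello world"]
pairs = [kv for line in lines for kv in wc_map(line)]
print(wc_reduce(pairs))  # {'hello': 2, 'hadoop': 1, 'world': 1}
```

In a real cluster run the shuffle phase would sort and group the (word, 1) pairs between the two functions; here the flat list plays that role.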

  • Summarization Patterns in MapReduce Design Patterns


    What is a summarization design pattern? Summarization patterns group similar data together and then perform analysis operations such as statistical calculations, index generation, or simple counting. What categories of summarization patterns are there? (1) numerical summarization, (2) inverted index summarization, (3) counters, and so on. Numerical summarization includes maximum, minimum, average, variance, and […]
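The numerical-summarization idea (min, max, average per key) can be sketched as a tiny reduce step in Python; the monthly sample data is invented for illustration:

```python
def numerical_summary(records):
    """For (key, value) pairs, compute min, max, and average per key (the reduce step)."""
    groups = {}
    for key, value in records:
        groups.setdefault(key, []).append(value)
    return {
        key: {"min": min(vs), "max": max(vs), "avg": sum(vs) / len(vs)}
        for key, vs in groups.items()
    }

# Hypothetical (month, measurement) pairs.
data = [("2023-01", 3), ("2023-01", 7), ("2023-02", 5)]
summary = numerical_summary(data)
print(summary)  # {'2023-01': {'min': 3, 'max': 7, 'avg': 5.0}, '2023-02': {'min': 5, 'max': 5, 'avg': 5.0}}
```

In a real MapReduce job the map step would emit the (key, value) pairs and the framework would group them; only the per-group arithmetic is shown here.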

  • Filtering Patterns in MapReduce Design Patterns


    Filtering patterns: filtering (regular-expression filtering and random sampling). Application scenario: screening a small data set with certain characteristics out of a large data set. Code implementation: in the Mapper stage, values are filtered with regular expressions; in the Reducer stage, random doubles are generated to determine whether they are less than the given […]
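Both filtering techniques named above can be sketched in Python; this is a local simulation of the idea (the log lines, pattern, and sampling fraction are made up, and a real job would run the regex filter inside the Mapper):

```python
import random
import re

def regex_filter(lines, pattern):
    """Regular-expression filtering: keep only lines matching the pattern."""
    matcher = re.compile(pattern)
    return [line for line in lines if matcher.search(line)]

def random_sample(lines, fraction, seed=42):
    """Random sampling: keep a line when a random double falls below the fraction."""
    rng = random.Random(seed)  # seeded here so the sketch is reproducible
    return [line for line in lines if rng.random() < fraction]

lines = ["ERROR disk full", "INFO ok", "ERROR timeout", "DEBUG x"]
errors = regex_filter(lines, r"^ERROR")
print(errors)  # ['ERROR disk full', 'ERROR timeout']
sample = random_sample(lines, 0.5)  # roughly half the lines survive
```

Because each record is kept or dropped independently, both functions map directly onto a per-record MapReduce filter with no reduce-side state required.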