• An applied example of the MapReduce programming model in MongoDB


    Note: the author used MongoDB version 2.4.7. Word count example. First, insert the data to be counted: db.data.insert({sentence: 'Consider the following map-reduce operations on a collection orders that contains documents of the following prototype'}); db.data.insert({sentence: 'I get the following error when I follow the code found in this link'}). The figure […]
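In MongoDB, this word count runs as JavaScript map and reduce functions inside the mongo shell. The Python sketch below is a rough illustration of what those two functions compute, not the MongoDB API itself. The map step emits (word, 1) for every word in each document's sentence field, and the reduce step sums the counts for each word. The helper names map_sentence, reduce_counts and map_reduce are hypothetical.

```python
from collections import defaultdict

def map_sentence(doc, emit):
    # Mirrors a shell map function: emit(word, 1) for every word in the sentence.
    for word in doc["sentence"].split():
        emit(word, 1)

def reduce_counts(key, values):
    # Mirrors a shell reduce function: add up all counts emitted for one word.
    return sum(values)

def map_reduce(docs, map_fn, reduce_fn):
    # Simplified driver: collect emitted values by key, then reduce each group.
    groups = defaultdict(list)

    def emit(key, value):
        groups[key].append(value)

    for doc in docs:
        map_fn(doc, emit)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# The two documents inserted in the excerpt above.
docs = [
    {"sentence": "Consider the following map-reduce operations on a collection orders that contains documents of the following prototype"},
    {"sentence": "I get the following error when I follow the code found in this link"},
]
counts = map_reduce(docs, map_sentence, reduce_counts)
print(counts["the"], counts["following"])
```

In the mongo shell itself, the corresponding call would be db.data.mapReduce(mapFunction, reduceFunction, {out: ...}), with both functions written in JavaScript.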

  • Introduction to MapReduce in MongoDB


    MongoDB MapReduce. MapReduce is a computing model that, put simply, decomposes a large amount of work (data) into pieces (map) and then merges the partial results into the final result (reduce). The advantage of this approach is that once the task is decomposed, many machines can compute in parallel, reducing the […]
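The decompose-then-merge idea can be made concrete with a minimal Python sketch over a toy in-memory dataset. The input is split into chunks, and each chunk is mapped to a partial result independently. Worker threads stand in for separate machines here, so the sketch shows the structure rather than a real speedup. The partial results are then merged into the final answer. The sample records and function names are hypothetical. Real MapReduce frameworks also distribute the data and recover from failures, which this sketch omits.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def split_into_chunks(items, n_chunks):
    """Decompose the input into at most n_chunks roughly equal slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def map_chunk(records):
    """Map step: compute a partial per-key total for one chunk, independently of the others."""
    partial = Counter()
    for key, amount in records:
        partial[key] += amount
    return partial

def merge(a, b):
    """Reduce step: combine two partial results. Counter addition drops
    non-positive totals, which is fine here because all quantities are positive."""
    return a + b

# Hypothetical sample data: (product, quantity) pairs.
records = [("apple", 3), ("pear", 1), ("apple", 2), ("plum", 5), ("pear", 4), ("apple", 1)]
chunks = split_into_chunks(records, n_chunks=2)

# Each worker thread maps its own chunk.
with ThreadPoolExecutor(max_workers=max(1, len(chunks))) as pool:
    partials = list(pool.map(map_chunk, chunks))

total = reduce(merge, partials, Counter())
print(dict(total))
```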

  • MongoDB learning notes: examples of using MapReduce


    1. MapReduce groups results by the first argument of the emit function called in the map function. MapReduce is a computing model that, put simply, decomposes a large amount of work (data) (map) and then merges the results into the final result (reduce). To use MapReduce, you need to […]
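The grouping rule in this excerpt is that values are grouped by the first argument passed to emit. The Python sketch below simulates that rule using hypothetical field names, cust_id and amount. It emits a {count, total} value per order, reduces the values for each customer, and finalizes them into an average. It also models two documented MongoDB behaviors. First, reduce may run more than once on partial results, so it returns the same shape it receives. Second, reduce is not invoked for a key that has only one emitted value. This is a simulation, not MongoDB's engine.

```python
from collections import defaultdict

def map_order(doc, emit):
    # emit's first argument is the grouping key; the second is the value.
    emit(doc["cust_id"], {"count": 1, "total": doc["amount"]})

def reduce_totals(key, values):
    # Returns the same shape as each emitted value, so its output can be reduced again.
    return {"count": sum(v["count"] for v in values),
            "total": sum(v["total"] for v in values)}

def finalize(key, value):
    # Runs once per key after reduction: turn the running totals into an average.
    return value["total"] / value["count"]

def run(docs):
    groups = defaultdict(list)
    for doc in docs:
        map_order(doc, lambda k, v: groups[k].append(v))
    result = {}
    for key, values in groups.items():
        # As in MongoDB, reduce is skipped when a key has only one emitted value.
        reduced = values[0] if len(values) == 1 else reduce_totals(key, values)
        result[key] = finalize(key, reduced)
    return result

# Hypothetical order documents.
orders = [
    {"cust_id": "A", "amount": 10},
    {"cust_id": "B", "amount": 5},
    {"cust_id": "A", "amount": 30},
]
averages = run(orders)
print(averages)

# Re-reducing a partial result gives the same answer as reducing everything at once.
vals = [{"count": 1, "total": 10}, {"count": 1, "total": 30}, {"count": 1, "total": 20}]
assert reduce_totals("A", [reduce_totals("A", vals[:2]), vals[2]]) == reduce_totals("A", vals)
```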

  • MapReduce development in an IDE


    MapReduce development in an IDE. In our Hadoop course at school, the group project required writing some MapReduce jobs. The school provided a cluster, but I was not used to opening Vim directly on the cluster to write Java. So I experimented, ran into a few pitfalls, and recorded them here. Deploying code with Git: The most […]

  • A detailed explanation of data aggregation with MapReduce in MongoDB


    MongoDB is a non-relational database created for environments that must store large volumes of data. With that much data, performing statistical operations is very important. So how do we compute statistics over data in MongoDB? MongoDB provides three methods for data […]

  • Analysis of MapReduce 1.0


    A brief analysis of how first-generation MapReduce works. Without further ado, here is the diagram. The figure above is a schematic of first-generation MapReduce: the map stage is on the left and the reduce stage is on the right. Map and reduce each run as their own process. First, the map stage on the left: the map input data is stored in HDFS. […]
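Between the map and reduce stages shown in the figure, Hadoop divides each map task's output into one partition per reduce task and sorts it by key. Each reduce task then fetches its partition from every map task. This is general Hadoop behavior, not detail recovered from the truncated excerpt. The minimal Python sketch below assumes string keys. Its partition formula has the same form as Hadoop's default HashPartitioner, (hash & Integer.MAX_VALUE) % numReduceTasks. However, zlib.crc32 stands in for Java's hashCode(), so the actual bucket numbers will differ from a real cluster.

```python
import zlib
from collections import defaultdict

def partition_for(key, num_reduce_tasks):
    # Same form as HashPartitioner: (hash & Integer.MAX_VALUE) % numReduceTasks.
    # zlib.crc32 is a stand-in for Java's hashCode(); it only needs to be deterministic.
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reduce_tasks

def map_side_output(records, num_reduce_tasks):
    """Split one map task's (key, value) output into per-reducer partitions, each sorted by key."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_reduce_tasks)].append((key, value))
    return {p: sorted(kvs) for p, kvs in partitions.items()}

# Hypothetical output of two map tasks.
map_task_1 = [("hdfs", 1), ("map", 1), ("reduce", 1)]
map_task_2 = [("map", 1), ("hdfs", 1)]
R = 2
outputs = [map_side_output(m, R) for m in (map_task_1, map_task_2)]

# Shuffle: reduce task r fetches partition r from every map task and merges by key.
for r in range(R):
    fetched = sorted(kv for out in outputs for kv in out.get(r, []))
    print(r, fetched)
```

Because the partition depends only on the key, every occurrence of a key lands on the same reduce task, whichever map task produced it.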

  • Notes on debugging Hadoop MapReduce tasks on Windows (using IntelliJ IDEA)


    First, prepare the Hadoop connection driver: put it in any folder, add its bin directory to the PATH environment variable, and then copy the hadoop.dll file into the System32 folder on the C: drive. Next, create an empty Maven project; below are all my dependencies (Maven dependencies may cause package conflicts, please […]

  • Using Python with Hadoop: Python MapReduce


    Environment: Hadoop 3.1, Python 3.6, Ubuntu 18.04. Hadoop is written in Java, and Java is the recommended way to work with HDFS. Sometimes, however, we need to work with HDFS from Python. This post discusses how to use Python to operate on HDFS (uploading files, downloading files, listing folders) and how to use Python to […]
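One common way to write MapReduce in Python is Hadoop Streaming. Under Streaming, the mapper and reducer are ordinary programs that read lines on stdin and write tab-separated key/value lines to stdout. The framework sorts mapper output by key before it reaches the reducer. The truncated excerpt does not show whether the article uses Streaming, so treat that as an assumption. The word-count sketch below keeps the logic in generator functions, so it can be tested locally by passing the mapper output through sorted(). The main function and the map/reduce stage arguments are hypothetical conventions.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Streaming mapper: emit 'word<TAB>1' for every word on every input line."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Streaming reducer: input arrives sorted by key, so equal keys are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

def main(stage):
    # On a cluster, this script would be run once as the mapper and once as the reducer.
    step = mapper if stage == "map" else reducer
    for out in step(sys.stdin):
        sys.stdout.write(out + "\n")

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])

# Local simulation of map -> sort (shuffle) -> reduce.
print(list(reducer(sorted(mapper(["hello hadoop", "hello python"])))))
```

On a cluster, the script would be passed to the streaming jar through its -mapper and -reducer options, for example as `python3 wc.py map` and `python3 wc.py reduce` (wc.py is a hypothetical file name). The jar's location depends on the installation.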

  • How to implement a single-process MapReduce in golang


    Preface. As Hadoop's programming framework, MapReduce is the part engineers work with most often. Apart from the network environment and cluster configuration, it also has a large impact on the execution efficiency of the whole job, so it is worth understanding the whole process in depth. On the first […]

  • HBase Secondary Index Scheme


    HBase secondary index scheme. Approach: using an HBase coprocessor. Test case requirement: on the original table LJK_TEST, use the column mycf:name as a secondary index. Step 1: create an index table: create 'INDEX_LJK_TEST', 'mycf'. Step 2: write the code: public class SecondIndexObserver extends BaseRegionObserver { private static final String INDEX_TABLE_NAME = "INDEX_LJK_TEST"; private static final byte[] […]
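The coprocessor approach keeps a second table whose row keys are derived from the indexed column. A lookup by name then becomes a prefix scan on the index table. The excerpt does not show the index row-key layout, so the Python sketch below assumes one: the indexed value, then a separator byte, then the original row key. The Table class is a toy in-memory stand-in for an HBase table, not the HBase client API. The separator and the mycf:rk column are hypothetical. put_with_index only mimics what an observer hooked after puts on LJK_TEST might do.

```python
import bisect

SEP = "\x00"  # hypothetical separator between the indexed value and the original row key

class Table:
    """Toy stand-in for an HBase table: rows are kept sorted by row key."""

    def __init__(self):
        self.keys = []
        self.rows = {}

    def put(self, row_key, columns):
        if row_key not in self.rows:
            bisect.insort(self.keys, row_key)
            self.rows[row_key] = {}
        self.rows[row_key].update(columns)

    def prefix_scan(self, prefix):
        # Row keys are sorted, so all keys sharing a prefix are contiguous.
        i = bisect.bisect_left(self.keys, prefix)
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            yield self.keys[i]
            i += 1

data = Table()   # plays the role of LJK_TEST
index = Table()  # plays the role of INDEX_LJK_TEST

def put_with_index(row_key, name):
    # Write the data row, then the matching index row (hypothetical mycf:rk column).
    data.put(row_key, {"mycf:name": name})
    index.put(name + SEP + row_key, {"mycf:rk": row_key})

def find_by_name(name):
    # Prefix scan on the index table; returns the matching original row keys.
    return [k.split(SEP, 1)[1] for k in index.prefix_scan(name + SEP)]

put_with_index("row1", "alice")
put_with_index("row2", "bob")
put_with_index("row3", "alice")
print(find_by_name("alice"))
```

The separator keeps "ali" from matching "alice". This sketch ignores deletes and updates, so changing a row's name would leave a stale index entry. A real observer would have to handle those cases as well.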

  • Hadoop MapReduce and Spark Configuration Items


    Scope: the configuration items covered in this article apply mainly to Hadoop 2.x and Spark 2.x. MapReduce. Official documentation: https://hadoop.apache.org/doc… (lower-left corner: mapred-default.xml). Example configuration items (name / value / description): mapreduce.job.reduce.slowstart.completedmaps / 0.05 / resource requests for reduce tasks are not made until the fraction of completed map tasks reaches this value. mapreduce.output.fileoutputformat.compress / false […]
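The Python sketch below is a worked illustration of mapreduce.job.reduce.slowstart.completedmaps. It computes how many map tasks must finish before reduce-task resource requests may begin. It assumes the check is simply completed maps / total maps >= threshold. The real scheduler in the MapReduce ApplicationMaster layers further ramp-up logic on top, so this only shows the threshold arithmetic. The function names are hypothetical.

```python
def reduces_may_be_requested(completed_maps, total_maps, slowstart=0.05):
    """True once the completed-map fraction reaches the slowstart threshold."""
    if total_maps == 0:
        return True
    return completed_maps / total_maps >= slowstart

def first_completed_count(total_maps, slowstart=0.05):
    """Smallest number of completed maps at which reduce requests may start."""
    for k in range(total_maps + 1):
        if reduces_may_be_requested(k, total_maps, slowstart):
            return k
    return total_maps

for total in (20, 100, 1000):
    print(total, first_completed_count(total))
```

With the default of 0.05, a job with 100 map tasks would start requesting reduce containers after 5 maps have finished. Setting the value to 1.0 holds reduces back until every map is done.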

  • Setting Up Hadoop and a First Small Hadoop Project: Word Count


    Setting up Hadoop. I set up Hadoop on Windows 10 myself, using the following references: 1. Detailed installation and configuration of Hadoop; 2. Winutils download; 3. Hadoop 3.0.3 download; 4. Hadoop startup error java.lang.NoClassDefFoundError: /org/apache/hadoop/yarn/server/timeline Collector Manager. The first Hadoop project: word count. Word count is probably the first small project many people write when getting started with Hadoop. My own reference […]