  • Illustrated tutorial: configuring a distributed Hadoop setup on VMware virtual machines


    1. Experimental environment: (1) operating system: CentOS 6; (2) environment software: VMware 12; (3) JDK: jdk1.8.0_181; (4) Hadoop: Hadoop 2.8.5; (5) host operating system: Windows 10 Professional. 2. Prepare the Linux network environment: 1.0 Click the VMware shortcut, right-click it to open the file location, and double-click vmnetcfg.exe -> VMnet1 host-only -> Modify subnet […]

  • Running Hadoop in Docker and building the image


    At the risk of reinventing the wheel, here we repackage Hadoop to build a Docker-based Hadoop image. A Hadoop cluster depends on software such as the JDK and SSH, so these, together with Hadoop itself, can be packaged into the image. Configuration file preparation: 1. Hadoop-related configuration files: core-site.xml, hdfs- […]

  • Compiling the Hadoop 2.7.2 source code for 64-bit: a detailed walkthrough


    1. Environment preparation. 1) CentOS configuration: it is best to use a freshly cloned virtual machine. Give the VM a bit more memory (I set 4 GB), configure the network and host name, turn off the firewall, and disable SELinux. Note: compile as root to avoid folder-permission problems. 2) Jar package preparation (Hadoop […]

  • Building a pseudo-distributed Hadoop 2.4.1 environment


    1. Prepare the Linux environment. 1.0 Click the VMware shortcut, right-click it to open the file location -> double-click vmnetcfg.exe -> VMnet1 host-only -> modify the subnet IP to set the network segment: subnet mask: -> Apply -> OK. Go back to Windows -> open Network and Sharing […]

  • Analysis of the Hive CLI, with examples


    Hive CLI startup. CliDriver. Purpose: when the hive command is executed, the class that actually runs is org.apache.hadoop.hive.cli.CliDriver (CliDriver.java). Entry point: public static void main(String[] args) throws Exception { int ret = new CliDriver().run(args); System.exit(ret); } public int run(String[] args) throws Exception { // Parse the command-line parameters, set the hiveconf parameters as environment variables, and add […]
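
The entry point in the excerpt follows a common Java CLI pattern: main only constructs the driver, delegates to an instance run(String[]) method that returns an int status, and hands that status to System.exit. The sketch below reproduces just that pattern under stated assumptions. DemoCliDriver is a hypothetical class, not Hive's CliDriver, and its handling of "--hiveconf key=value" pairs (the option form the hive command accepts) is deliberately simplified to collecting them into a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical driver illustrating the main() -> run() -> System.exit() pattern.
// This is NOT Hive's CliDriver; option handling is deliberately simplified.
class DemoCliDriver {
    // Collected "--hiveconf key=value" pairs (hypothetical, for illustration only).
    final Map<String, String> conf = new LinkedHashMap<>();

    // Returns 0 on success and a non-zero status on bad input,
    // so the caller can turn it into a process exit code.
    int run(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if ("--hiveconf".equals(args[i])) {
                if (i + 1 >= args.length) {
                    return 1; // flag given without a value
                }
                String pair = args[++i];
                int eq = pair.indexOf('=');
                if (eq <= 0) {
                    return 1; // value is not in key=value form
                }
                // Split on the first '=' only, so values may themselves contain '='.
                conf.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Keep main() thin: all logic lives in run(), which is easy to test.
        int ret = new DemoCliDriver().run(args);
        System.exit(ret);
    }
}
```

Keeping the logic in run() rather than main() means the driver can be exercised in tests without the JVM exiting.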

  • Hadoop installation and environment setup: an illustrated tutorial


    I. Hadoop installation 1. Download address: https://archive.apache.org/dist/hadoop/common/ (I downloaded hadoop-2.7.3.tar.gz). 2. Create the folder hadoop in /usr/local/: mkdir hadoop 3. Upload the file to the /usr/local/source directory on Linux. 4. Decompress it by running the following command: tar -zxvf hadoop-2.7.3.tar.gz -C /usr/local/hadoop 5. Modify the configuration files: enter the directory with cd /usr/local/hadoop/hadoop-2.7.3/etc/hadoop/ , […]

  • Analysis of MapReduce 1.0


    A brief analysis of how first-generation MapReduce works. Without further ado, here is the diagram. The figure above shows the architecture of first-generation MapReduce, with the map stage on the left and the reduce stage on the right. Map and reduce each run in their own process. First, the map stage on the left: the map input data is stored in HDFS. […]
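
The map and reduce stages described in the excerpt can be illustrated with the classic word-count example. This is a minimal in-memory simulation on a single JVM, not the Hadoop MapReduce API: WordCountSim and its methods are hypothetical names, and the shuffle step is approximated by a sorted TreeMap that groups map output by key before reduce. It assumes Java 9+ for Map.entry and List.of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of the MapReduce model (map -> shuffle/group -> reduce).
// This is NOT the Hadoop API; it only mirrors the data flow on a single JVM.
class WordCountSim {
    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {           // skip the empty token from leading spaces
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all emitted values by key; the TreeMap keeps keys
    // sorted, mimicking the sort that happens before reduce.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: each input line is handed to one map call, then shuffle, then reduce.
    static Map<String, Integer> run(List<String> input) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : input) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, mapreduce=1}
        System.out.println(run(List.of("hello hadoop", "hello mapreduce")));
    }
}
```

In a real MapReduce 1.0 cluster the map calls run in separate processes on the nodes holding the HDFS blocks, and the grouped data is transferred over the network to the reduce process; the simulation only keeps the order of the three steps.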

  • Ten years of data analysis experience: how to make the best use of these three types of tools


    When it comes to data analysis tools, most of you are probably familiar with them, but many people share the same question: with so many data analysis tools, what is the difference between them? Which is better? Which is more powerful? Which should I learn? Although this question is a common one, it is very important. I have […]

  • Complex-looking, flashy big-screen data visualizations are easy once you learn this tool


    “I’m drunk right now, yet I can’t sleep until the report is done. Asking where a tavern might be, I end up working all night to make it up.” Behind this doggerel and its touch of humor lies my sad experience from many years on the job. Yes, I’m your brother, your cousin, who […]

  • From open source, for open source: an in-depth analysis of Alibaba’s optimizations and improvements to Apache Flink


    Apache Flink overview. Apache Flink (hereinafter Flink) is a big data project that originated in Europe, formerly known as Stratosphere. Stratosphere was a research project at the Technical University of Berlin, initially focused on batch computing. In 2014, core members of the Stratosphere project spun off Flink and donated […]

  • Paid or open-source reporting tools? I compared these six


    Over the past year I have been dealing with reporting problems: I researched many reporting tools and developed reporting applications suited to the company’s business. Here I share some personal opinions on how to choose a reporting tool, in the hope that they serve as a useful reference. For most enterprises, those that can spend the time and manpower to develop applications to […]

  • Hadoop Kerberos operations


    Diagnostics for a failed YARN job submission on a Kerberos-enabled Hadoop cluster: Application application_1542706493784_0005 failed 2 times due to AM Container for appattempt_1542706493784_0005_000002 exited with exitCode: -1000 Failing this attempt.Diagnostics: [2018-11-20 17:48:05.088]Application application_1542706493784_0005 initialization failed (exitCode=255) with output: main : command provided 0 main : run as user is xxxx main : requested yarn user is xxxx User xxxx not found Solution […]
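
The IDs in this diagnostic follow YARN's naming scheme, application_&lt;clusterTimestamp&gt;_&lt;sequence&gt; and appattempt_&lt;clusterTimestamp&gt;_&lt;sequence&gt;_&lt;attempt&gt;, so "appattempt_1542706493784_0005_000002" is the second attempt of application 5, which matches "failed 2 times". The helper below is a hypothetical sketch (YarnDiagnostics and AttemptInfo are illustrative names, not a YARN API) that extracts those fields and the first "exitCode:" value from such a message with regular expressions, based only on the layout visible in the log above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extract the attempt-ID fields and the AM exit code
// from a YARN diagnostics string. Not part of any Hadoop/YARN API.
class YarnDiagnostics {
    private static final Pattern ATTEMPT =
            Pattern.compile("appattempt_(\\d+)_(\\d+)_(\\d+)");
    // Matches "exitCode: -1000" but not "(exitCode=255)", which uses '='.
    private static final Pattern EXIT_CODE =
            Pattern.compile("exitCode:\\s*(-?\\d+)");

    static final class AttemptInfo {
        final long clusterTimestamp; // first numeric field of the ID
        final int appSequence;       // e.g. "0005" -> 5
        final int attempt;           // e.g. "000002" -> 2
        final int exitCode;

        AttemptInfo(long clusterTimestamp, int appSequence, int attempt, int exitCode) {
            this.clusterTimestamp = clusterTimestamp;
            this.appSequence = appSequence;
            this.attempt = attempt;
            this.exitCode = exitCode;
        }
    }

    // Returns null when the message does not contain both an attempt ID and an exit code.
    static AttemptInfo parse(String diagnostics) {
        Matcher a = ATTEMPT.matcher(diagnostics);
        Matcher e = EXIT_CODE.matcher(diagnostics);
        if (!a.find() || !e.find()) {
            return null;
        }
        return new AttemptInfo(
                Long.parseLong(a.group(1)),
                Integer.parseInt(a.group(2)),
                Integer.parseInt(a.group(3)),
                Integer.parseInt(e.group(1)));
    }
}
```

Feeding it the message above yields attempt 2 of application 5 with exit code -1000; the separate "User xxxx not found" line is what points at the actual cause.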