Tag:hadoop

  • Detailed explanation of compiling the Hadoop 2.7.2 source code for 64-bit

    Time:2020-7-29

    1. Environment preparation. (1) CentOS configuration: it is best to use a freshly cloned virtual machine. Give the virtual machine a bit more memory (I set 4 GB), configure the network and host name, turn off the firewall, and disable SELinux. Note: compile as the root user to reduce folder permission problems. (2) JAR package preparation (Hadoop […]

  • Building a pseudo-distributed Hadoop 2.4.1 cluster

    Time:2020-1-10

    1. Prepare the Linux environment. 1.0 Right-click the VMware shortcut and open the file location -> double-click vmnetcfg.exe -> VMnet1 host-only -> change the subnet IP to set the network segment: 192.168.1.0, subnet mask: 255.255.255.0 -> Apply -> OK. Go back to Windows -> open Network and Sharing […]

  • An analysis of the Hive CLI

    Time:2020-1-7

    Hive CLI startup. CliDriver. Function: executes commands; the class that actually runs when you start hive is org.apache.hadoop.hive.cli.CliDriver. Entry point: public static void main(String[] args) throws Exception { int ret = new CliDriver().run(args); System.exit(ret); } public int run(String[] args) throws Exception { // Parse the command-line parameters, set the hiveconf parameters as environment variables, and add […]

  • Illustrated tutorial: Hadoop installation and environment setup

    Time:2019-12-15

    First, Hadoop installation. 1. Download address: https://archive.apache.org/dist/hadoop/common/ I downloaded hadoop-2.7.3.tar.gz. 2. Create the folder hadoop in /usr/local/: mkdir hadoop 3. Upload the file to the /usr/local/source directory on Linux. 4. Decompress it by running the following command: tar -zxvf hadoop-2.7.3.tar.gz -C /usr/local/hadoop 5. Modify the configuration files. Go to the configuration directory: cd /usr/local/hadoop/hadoop-2.7.3/etc/hadoop/, […]

  • Analysis of MapReduce 1.0

    Time:2019-12-13

    A brief analysis of how first-generation MapReduce works. Without further ado, here is the figure. The figure above is a schematic of first-generation MapReduce: the map stage is on the left and the reduce stage is on the right. Map and reduce each run as a separate process. First, look at the map stage on the left: the map input data is stored in HDFS. […]

  • Ten years of data analysis experience: the three types of tools I find most useful

    Time:2019-12-9

    When it comes to data analysis tools, I am sure you are familiar with them, but many people still have questions. With so many data analysis tools, what is the difference between them? Which is better? Which is more powerful? Which should I learn? Although this question is something of a cliché, it is very important, and I have […]

  • Data visualization dashboards look complex and flashy; learn this tool and you can build them easily

    Time:2019-12-2

    “I’m drunk at present. I can’t sleep if I don’t make the report. I have to make it up for a night if I ask where the wine house has it.” Behind this doggerel, with its touch of humor, lies my sad experience from many years of work. Yes, I’m your brother, cousin, who […]

  • Taking from open source, giving back to open source: an in-depth analysis of Alibaba’s optimizations and improvements to Apache Flink

    Time:2019-11-30

    Apache Flink overview. Apache Flink (hereinafter referred to as Flink) is a big data research project born in Europe, formerly known as Stratosphere. It began as a research project at the Technical University of Berlin, focusing on batch computing in its early stage. In 2014, the core members of the Stratosphere project incubated Flink and donated […]

  • Reporting tools: paid or open source? I compared these six tools

    Time:2019-11-29

    Over the past year, I have been dealing with reporting problems, researched many reporting tools, and developed reporting applications suited to the company’s business. Here I share some personal opinions on how to choose a reporting tool, hoping they will be a useful reference. For most enterprises, those who can spend time and manpower to develop applications to […]

  • Hadoop Kerberos operation

    Time:2019-11-29

    Hadoop Kerberos: YARN job submission failure. Diagnostics: Application application_1542706493784_0005 failed 2 times due to AM Container for appattempt_1542706493784_0005_000002 exited with exitCode: -1000 Failing this attempt.Diagnostics: [2018-11-20 17:48:05.088]Application application_1542706493784_0005 initialization failed (exitCode=255) with output: main : command provided 0 main : run as user is xxxx main : requested yarn user is xxxx User xxxx not found Solution […]

  • Alibaba Apsara big data architecture and the Hadoop ecosystem

    Time:2019-11-28

    Many people have asked what Alibaba’s Apsara big data platform, Ladder 2, MaxCompute, and real-time computing really are, and how they differ from Alibaba’s own Hadoop platform. Let’s talk about Hadoop first. What is Hadoop? Hadoop is an open-source, highly reliable, and scalable distributed big data computing framework, which is mainly […]

  • From simple to advanced: a step-by-step guide to getting started with data analysis and moving beyond the basics

    Time:2019-11-26

    Source data analysis Recently, many people have been asking questions about data analysis. How do you learn data analysis? How do you get started quickly, and how do you break through the bottleneck between technology and business? In fact, before learning data analysis, we at least need to know what skills a data analyst needs. Some students saw […]