• Alibaba Apsara big data architecture and the Hadoop ecosystem


    Many people have asked what Alibaba’s Apsara big data platform, Yunti 2 ("Ladder 2"), MaxCompute, and real-time computing really are, and how they differ from a self-built Hadoop platform. Let’s talk about Hadoop first. What is Hadoop? Hadoop is an open-source, highly reliable, and scalable distributed computing framework for big data, which is mainly […]

  • Let’s talk about Hadoop and its ecosystem


    There are already plenty of articles and books about Hadoop and its ecosystem. When the concept of big data took off in 2016, I was lucky enough to enter the data industry. Although the past two years have not met my initial expectations, I did take that step. Here, let’s talk about Hadoop and its […]

  • Hadoop small file solutions: addressing NameNode memory and MapReduce performance


    In the first article, I discussed what constitutes a small file and why Hadoop has a small file problem. I defined a small file as any file smaller than 75% of the Hadoop block size, and explained that Hadoop prefers fewer, larger files because of NameNode memory usage and MapReduce performance. In this article, I’ll […]

  • Hadoop small file solution based on file integration


    This article looks at some less commonly used alternatives for solving MapReduce performance problems and the factors to consider when choosing a solution.
    Solving MapReduce performance problems
    The following solutions alleviate MapReduce performance problems:
    Changing the ingestion process / interval
    Batch file merging
    Sequence files
    HBase
    S3DistCp (if Amazon EMR is used)
    Using CombineFileInputFormat
    Hive […]

  • Py => Ubuntu: installing and configuring Hadoop, YARN, HDFS, Hive, and Spark


    Environment
    Java 8
    Python 3.7
    Scala 2.12.10
    Spark 2.4.4
    Hadoop 2.7.7
    Hive 2.3.6
    MySQL 5.7
    mysql-connector-java-5.1.48.jar
    R 3.1+ (optional)
    Install Java
    Prerequisite link: https://segmentfault.com/a/11
    Install Python
    Ubuntu ships with Python 3.7
    Install Scala
    Download: https://downloads.lightbend.c
    Extract: tar -zxvf followed by the downloaded Scala archive
    Configure: vi ~/.bashrc
    export SCALA_HOME=/home/lin/spark/scala-2.12.10
    export PATH=${SCALA_HOME}/bin:$PATH
    Save and exit
    Activate the configuration: source ~/.bashrc
    Install […]

  • Using Python with Hadoop: Python MapReduce


    Environment: Hadoop 3.1, Python 3.6, Ubuntu 18.04. Hadoop is written in Java, and Java is the recommended way to work with HDFS, but sometimes we need to use Python instead. This time, we will discuss how to use Python to work with HDFS (uploading files, downloading files, and listing directories) and how to use Python to […]

  • Solution for jps not showing the DataNode in a Hadoop cluster


    Each time you run hdfs namenode -format, the NameNode’s cluster ID is regenerated. In this case, first check the DataNode’s log to confirm that the cluster IDs are inconsistent. Then go to the tmp/dfs/current directory of HDFS and update the DataNode’s cluster ID to […]

  • Java client cannot upload files to HDFS


    019-07-01 16:45:24,933 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from Call#3 Retry#0
    java.io.IOException: File /a1.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1620)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3350)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:678)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:213)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:491)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) […]

  • [Resolved] Errors when calling HBase from Java


    After setting up HBase in pseudo-distributed mode, the system runs normally and listing all tables works, but querying the details of a table fails with: java.net.ConnectException: Call to localhost/ failed on connection exception. The error message also shows that the master node should be the name […]

  • MapReduce cannot connect to HDFS


    Configuring a Hadoop environment can be really painful, and unexpected problems can arise at any time, such as: Failing this attempt. Diagnostics: Call From hadoop001/ to hadoop001:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused. It’s a strange problem, because all the configuration is correct. The cause is IPv6. Just […]