Tag: hdfs

  • Alibaba Apsara big data architecture and the Hadoop ecosystem

    Time:2019-11-28

    Many people have asked what Alibaba’s Apsara big data platform, Ladder 2, MaxCompute, and real-time computing really are, and what the difference is between the open-source Hadoop platform and Alibaba’s own. Let’s talk about Hadoop first. What is Hadoop? Hadoop is an open-source, highly reliable, and scalable distributed big data computing framework, which is mainly […]

  • Let’s talk about Hadoop and its ecosystem

    Time:2019-11-24

    In fact, there are already a lot of articles and books about Hadoop and its ecosystem. When the concept of big data took off in 2016, I was lucky enough to enter the data industry. Although the past two years have not lived up to my initial expectations, I did take that step. Here, let’s talk about Hadoop and its […]

  • Hadoop small file solutions – namenode memory and MapReduce performance

    Time:2019-11-19

    In the first article, I discussed what constitutes a small file and why Hadoop has a small file problem. I defined a small file as any file smaller than 75% of the Hadoop block size, and explained that Hadoop prefers fewer, larger files because of namenode memory usage and MapReduce performance. In this article, I’ll […]
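
    To make the namenode-memory point concrete, here is a rough back-of-the-envelope sketch (not from the article itself): the ~150 bytes per namespace object is a commonly cited approximation, and the data sizes are made up for illustration.

      # Rough estimate of namenode heap used by file and block objects.
      BYTES_PER_OBJECT = 150          # commonly cited approximation per inode/block
      MB = 1024 * 1024

      def heap_mb(num_files, num_blocks):
          return (num_files + num_blocks) * BYTES_PER_OBJECT / MB

      data_mb = 10 * 1024 * 1024      # 10 TB of data, expressed in MB

      # Stored as 1 MB files: every file needs its own (mostly empty) block.
      small = data_mb // 1
      print(heap_mb(small, small))                 # ~3000 MB of namenode heap

      # The same data as 1 GB files packed into 128 MB blocks.
      big = data_mb // 1024
      print(heap_mb(big, big * (1024 // 128)))     # ~13 MB of namenode heap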

  • Hadoop small file solution based on file merging

    Time:2019-11-18

    This article looks at some less commonly used alternatives for solving MapReduce performance problems and at the factors to consider when choosing a solution. The following approaches can alleviate MapReduce performance problems: changing the ingestion process or interval, batch file merging, sequence files, HBase, S3DistCp (if Amazon EMR is used), CombineFileInputFormat, Hive […]
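
    As one illustration of the batch file merge option, here is a minimal sketch using the third-party Python HdfsCLI package over WebHDFS; the namenode URL, user, and paths are assumptions, and a real job would also need to handle record boundaries and clean up the original files.

      from hdfs import InsecureClient  # pip install hdfs; requires WebHDFS to be enabled

      client = InsecureClient('http://namenode-host:9870', user='hadoop')  # assumed host/user

      src_dir = '/data/small-files'        # hypothetical directory full of small files
      dst_path = '/data/merged/part-0000'  # hypothetical merged output file

      # Concatenate every small file in src_dir into one larger HDFS file.
      with client.write(dst_path, overwrite=True) as writer:
          for name in sorted(client.list(src_dir)):
              with client.read(f'{src_dir}/{name}') as reader:
                  writer.write(reader.read())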

  • Py => Ubuntu Hadoop YARN HDFS Hive Spark installation and configuration

    Time:2019-11-11

    Environment requirements: Java 8, Python 3.7, Scala 2.12.10, Spark 2.4.4, Hadoop 2.7.7, Hive 2.3.6, MySQL 5.7, mysql-connector-java-5.1.48.jar, R 3.1+ (may not need to be installed). Install Java: see https://segmentfault.com/a/11 Install Python: Ubuntu ships with Python 3.7. Install Scala: download from https://downloads.lightbend.c then extract the downloaded Scala package with tar -zxvf and configure it: vi ~/.bashrc, add export SCALA_HOME=/home/lin/spark/scala-2.12.10 and export PATH=${SCALA_HOME}/bin:$PATH, save and exit, then activate the configuration with source ~/.bashrc. Install […]
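
    After activating the configuration, a quick sanity check like the sketch below (not from the article; tool names only, no paths assumed) can confirm that each component is on the PATH and reports the expected version.

      import shutil
      import subprocess

      # Check that the tools installed in this guide are on PATH and print their versions.
      for tool, args in [('java', ['-version']), ('scala', ['-version']),
                         ('hadoop', ['version']), ('spark-submit', ['--version'])]:
          if shutil.which(tool) is None:
              print(f'{tool}: not found on PATH')
              continue
          result = subprocess.run([tool, *args], capture_output=True, text=True)
          # java, scala, and spark-submit print version info on stderr; hadoop uses stdout.
          lines = (result.stdout or result.stderr).splitlines()
          print(f'{tool}: {lines[0] if lines else "(no output)"}')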

  • Using Python to operate Hadoop, and Python MapReduce

    Time:2019-11-4

    Environment: Hadoop 3.1, Python 3.6, Ubuntu 18.04. Hadoop is developed in Java, so Java is the recommended way to operate HDFS, but sometimes we need to use Python. This time we will discuss how to use Python to operate HDFS (upload files, download files, view folders) and use Python to […]
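
    As a taste of what the article covers, here is a minimal sketch using the third-party HdfsCLI package (pip install hdfs), which talks to HDFS over WebHDFS; the namenode address, user, and paths are assumptions, and the article itself may use a different client library.

      from hdfs import InsecureClient

      # Hadoop 3.x serves WebHDFS on port 9870 by default; host and user are assumptions.
      client = InsecureClient('http://namenode-host:9870', user='hadoop')

      client.upload('/user/hadoop/a.txt', 'a.txt', overwrite=True)           # upload a local file
      client.download('/user/hadoop/a.txt', './a_copy.txt', overwrite=True)  # download it back
      print(client.list('/user/hadoop'))                                     # view folder contents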

  • Solution for jps not showing datanode information in a Hadoop cluster

    Time:2019-9-28

    Each time hdfs namenode -format is run, the namenode’s cluster ID is regenerated. In this case, we first check the datanode’s logs to confirm that the cluster IDs are inconsistent. Then we should go to the tmp/dfs/current directory of HDFS and update the datanode’s cluster ID to […]
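
    A small sketch of that comparison, assuming default-style paths under hadoop.tmp.dir (the exact directories depend on your configuration and are assumptions here):

      # Compare the clusterID recorded by the namenode and the datanode; the paths below
      # are assumptions and must be adjusted to your hadoop.tmp.dir / dfs.*.dir settings.
      def read_cluster_id(version_file):
          with open(version_file) as f:
              for line in f:
                  if line.startswith('clusterID='):
                      return line.strip().split('=', 1)[1]
          return None

      nn = read_cluster_id('/tmp/hadoop/dfs/name/current/VERSION')
      dn = read_cluster_id('/tmp/hadoop/dfs/data/current/VERSION')
      print('namenode clusterID:', nn)
      print('datanode clusterID:', dn)
      if nn != dn:
          print('Mismatch: copy the namenode clusterID into the datanode VERSION file '
                'and restart the datanode.')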

  • Java client cannot upload files to HDFS

    Time:2019-9-24

    2019-07-01 16:45:24,933 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 58.211.111.42:63048 Call#3 Retry#0 java.io.IOException: File /a1.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1620) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3350) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:678) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:213) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:491) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) […]
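
    This “could only be replicated to 0 nodes … excluded in this operation” error often means the namenode is reachable but the client cannot stream the block to the single datanode (for example a firewall, or the datanode advertising an internal address). A quick connectivity check is sketched below; the hostnames are assumptions, and the datanode data-transfer port defaults to 50010 on Hadoop 2.x and 9866 on Hadoop 3.x.

      import socket

      def port_open(host, port, timeout=3):
          """Return True if a TCP connection to host:port succeeds within timeout."""
          try:
              with socket.create_connection((host, port), timeout=timeout):
                  return True
          except OSError:
              return False

      # 8020 is the namenode IPC port from the log; 50010 is the assumed datanode data port.
      print('namenode reachable:', port_open('namenode-host', 8020))
      print('datanode reachable:', port_open('datanode-host', 50010))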

  • [Resolved] Error when calling HBase from Java

    Time:2019-9-16

    Pseudo-distributed HBase setup: the system runs normally and you can query the list of all tables, but querying the details of a table fails with java.net.ConnectException: call to localhost/127.0.0.1:16020 failed on connection exception. It can also be seen from the error message that the master node should be the name […]
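
    Since the client is being sent to localhost/127.0.0.1:16020, it is worth checking how the HBase machine resolves its own hostname; the sketch below (run on the HBase host, and only an assumption about the cause) shows whether the hostname maps to 127.0.0.1, which would make region servers register as localhost.

      import socket

      # Run this on the HBase host: if its hostname resolves to 127.0.0.1, region servers
      # register as "localhost" and remote Java clients are told to connect to 127.0.0.1:16020.
      hostname = socket.getfqdn()
      print(hostname, '->', socket.gethostbyname(hostname))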

  • MapReduce does not connect to HDFS

    Time:2019-9-11

    Configuring the Hadoop environment is really painful, and unexpected problems can arise at any time, such as: Failing this attempt. Diagnostics: Call From hadoop001/0.0.0.0 to hadoop001:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused It’s a strange problem: all the configurations are OK, and the problem turns out to be IPv6. Just […]
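
    One quick way to see whether the hostname is resolving over IPv6 (or to an unreachable address like 0.0.0.0) is the sketch below; the hostname hadoop001 and port 8020 are taken from the error message above, but adjust them to your cluster.

      import socket

      # Show every address family and IP that "hadoop001" resolves to for port 8020.
      for family, _, _, _, sockaddr in socket.getaddrinfo('hadoop001', 8020):
          print(socket.AddressFamily(family).name, sockaddr)
      # If only IPv6 (AF_INET6) or loopback entries appear, the namenode may be listening
      # where clients cannot reach it; pinning Java to IPv4 or fixing /etc/hosts usually helps.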