• 3.1 Docker image overview


    Getting images. Docker Store official base image source: https://store.docker.com/search?category=base&source=verified&type=image; NetEase Honeycomb image source: https://c.163yun.com/hub; Alibaba Cloud image source: https://dev.aliyun.com/search.html; Docker accelerator: https://www.daocloud.io/mirror. Generating an image: after making changes to a container, commit the changes to generate a new image: docker commit 259b310e11e8 siguoya/centos-vim However, this method of building images is not recommended: even with docker history it is difficult to see […]
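The excerpt notes that building images with docker commit is discouraged because the build steps are opaque. The usual alternative, a Dockerfile, records each step as a visible layer; a minimal sketch (the base image and the vim package are illustrative, not from the post):

```dockerfile
# Build a CentOS image with vim via a Dockerfile instead of `docker commit`,
# so every build step shows up as a layer in `docker history`.
FROM centos:7
RUN yum install -y vim && yum clean all
```

Built with, for example, `docker build -t siguoya/centos-vim .` — the resulting image is reproducible from source, unlike one produced by committing a hand-modified container.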

  • Convert string data to bigint in Hive


    Use the cast function to convert a string to bigint: SELECT CAST('00321' AS BIGINT) FROM table; As a BIGINT it will show on the screen and in delimited text files as 321. Reference: Hive – Converting a string to bigint. Supplementary knowledge: solving data-association errors when joining bigint and varchar fields in Hive. Implicitly […]
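A sketch of both points in the excerpt — the cast itself, and the join pitfall the supplementary note refers to. The table and column names below are hypothetical:

```sql
-- Leading zeros are dropped: '00321' becomes the 64-bit integer 321.
SELECT CAST('00321' AS BIGINT);

-- When a BIGINT column is compared to a VARCHAR/STRING column, Hive
-- implicitly converts both sides to DOUBLE, which loses precision for
-- large IDs and can match the wrong rows. Casting the numeric side to
-- STRING keeps the comparison exact.
SELECT a.*
FROM orders a
JOIN users b
  ON CAST(a.user_id AS STRING) = b.user_id_str;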

  • The right way to land an API strategy: building an efficient "API management platform"


    Baishan Cloud Technology: “An organization’s API strategy should be an important support of the organization’s digital strategy, and a considerable proportion of it.” —— Gartner analysts Paolo Malinverno and Mark O’Neill. I. The rise of API strategy. In the era of digital transformation, with the rapid iteration of new technologies and the endless emergence of new demands, the number, complexity and […]

  • Py => Ubuntu Hadoop YARN HDFS Hive Spark installation and configuration


    Environment requirements: Java 8, Python 3.7, Scala 2.12.10, Spark 2.4.4, Hadoop 2.7.7, Hive 2.3.6, MySQL 5.7, mysql-connector-java-5.1.48.jar, R 3.1+ (may be left uninstalled). Install Java — see the earlier guide: https://segmentfault.com/a/11 Install Python: Ubuntu ships with Python 3.7. Install Scala. Download: https://downloads.lightbend.c Decompress: tar -zxvf the downloaded Scala archive. Configure: vi ~/.bashrc, add export SCALA_HOME=/home/lin/spark/scala-2.12.10 and export PATH=${SCALA_HOME}/bin:$PATH Save and exit. Activate the configuration: source ~/.bashrc Install […]
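The Scala configuration steps above, collected as the exact lines to append to ~/.bashrc (the install path matches the excerpt's example layout):

```shell
# Append to ~/.bashrc, then activate with: source ~/.bashrc
export SCALA_HOME=/home/lin/spark/scala-2.12.10
export PATH=${SCALA_HOME}/bin:$PATH
```

After sourcing the file, `scala -version` should report 2.12.10 if the archive was unpacked to that path.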

  • Summary of Hive


    Automatic mapjoin: set hive.auto.convert.join=true; Once automatic mapjoin is enabled, we no longer need to write the mapjoin hint in the query. Mapjoin is used in the scenario of joining a small table to a large table: while the large table passes through the mappers, the small table is loaded entirely into memory, and Hive performs the join at the map end, because Hive can match the […]
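A sketch of the setting in use; the table names are made up for illustration:

```sql
-- Let Hive convert the join to a map-side join automatically
-- when one side is small enough to fit in memory.
SET hive.auto.convert.join=true;

-- small_dim is loaded into memory on each mapper; big_fact streams
-- through, so the join happens at the map end with no shuffle/reduce.
SELECT f.order_id, d.region_name
FROM big_fact f
JOIN small_dim d ON f.region_id = d.region_id;
```

With the flag enabled, no /*+ MAPJOIN(small_dim) */ hint is needed in the query itself.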

  • Big Data Engineer interview question (1)


    Let’s talk about the first project. Optimization of shuffle in Hive. Compression: compression can reduce the amount of data stored on disk and improve query speed by reducing I/O. Enable compression for the series of MR intermediate stages generated by Hive: set hive.exec.compress.intermediate=true; set mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec; Compress the final output (files written to […]
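The intermediate-compression settings quoted in the excerpt, together with the final-output settings the excerpt is cut off before — the two output properties below are the usual companions to the intermediate ones, not text from the post itself:

```sql
-- Compress data passed between the MR stages Hive generates
SET hive.exec.compress.intermediate=true;
SET mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Compress the job's final output files as well
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
```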

  • Use Spark to synchronize Hive data across clusters


    This article is suitable for readers with basic knowledge of Spark. Learn how to use Spark to synchronize Hive data across clusters by reading it! One of the more mature synchronization tools in the industry is Sqoop, which serves as the bridge between relational databases and Hadoop. A common scenario is […]

  • HBase statistics


    HBase statistical methods. Using Hive statistics: create a Hive table mapping the HBase table: CREATE EXTERNAL TABLE LJKTEST( ID STRING, AGE STRING, NAME STRING, COMPANY STRING, SCHOOL STRING ) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,0:AGE,0:NAME,0:COMPANY,0:SCHOOL") TBLPROPERTIES("hbase.table.name" = "LJKTEST"); Then execute the Hive statistics SQL. Neither COUNT(1) nor COUNT(*) will work here. […]

  • Installation of Hive


    1 Hadoop. The use of Hive depends on Hadoop, so let’s introduce Hadoop first. 1.1 Hadoop download and installation. The author installed Hadoop on CentOS 7.3, using version hadoop-3.2; it can be installed by following the hadoop-2.9 documentation directly. Hadoop download address. 1.2 Setting Hadoop environment variables. The environment variables of Hadoop must be set, […]
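The environment variables the excerpt refers to typically look like the following in ~/.bashrc; the install path here is illustrative, not from the post:

```shell
# Typical Hadoop environment variables for ~/.bashrc
# (adjust HADOOP_HOME to the actual unpack location).
export HADOOP_HOME=/opt/hadoop-3.2.0
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
```

With these set, `hadoop version` and the start-up scripts under sbin/ become available from any directory.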

  • Common Hive SQL techniques


    1. Multi-row merging. Multi-row merging is often used for interval statistics: by defining a set of amount-range levels, hundreds of millions of records are reduced to totals per interval, with each record mapped to exactly one interval. Typical scenario: based on users’ daily transaction flow, count the amount of money in different […]
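One common way to implement the interval statistics described above is a CASE expression that maps each amount to a range label, followed by a GROUP BY; the table name and range boundaries below are hypothetical:

```sql
-- Reduce hundreds of millions of transaction rows to one row per
-- amount range. The CASE is repeated in GROUP BY because older Hive
-- versions do not allow grouping by a select-list alias.
SELECT CASE
         WHEN amount < 100   THEN '0-100'
         WHEN amount < 1000  THEN '100-1000'
         WHEN amount < 10000 THEN '1000-10000'
         ELSE '10000+'
       END          AS amount_range,
       COUNT(*)     AS txn_cnt,
       SUM(amount)  AS txn_total
FROM user_transactions
GROUP BY CASE
           WHEN amount < 100   THEN '0-100'
           WHEN amount < 1000  THEN '100-1000'
           WHEN amount < 10000 THEN '1000-10000'
           ELSE '10000+'
         END;
```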

  • Hive + Sqoop Shallow Learning Guide


    Business requirement: statistics of PV per hour. Data acquisition: HDFS, Hive. Data cleaning (ETL): the process of extracting, transforming, and loading data from the source to the destination. Field filtering: “31/Aug/2015:00:04:37 +0800” “GET /course/view.php?id=27 HTTP/1.1” Field completion: user information, commodity information – “RDBMS”. Field formatting: 2015-08-31 00:04:37 20150831000437 Data analysis: MapReduce, Hive, Spark. Export data […]
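The field-formatting step above (an Apache access-log timestamp rewritten into the two standard forms shown) can be sketched with Hive's built-in date functions:

```sql
-- '31/Aug/2015:00:04:37' -> '2015-08-31 00:04:37' and '20150831000437'
SELECT from_unixtime(
         unix_timestamp('31/Aug/2015:00:04:37', 'dd/MMM/yyyy:HH:mm:ss')
       ) AS formatted_time,
       from_unixtime(
         unix_timestamp('31/Aug/2015:00:04:37', 'dd/MMM/yyyy:HH:mm:ss'),
         'yyyyMMddHHmmss'
       ) AS compact_time;
```

In a real ETL job the literal would be a log-line column parsed out during the field-filtering step.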