HBase management, performance tuning

Time: 2021-1-13


Label (space separated): HBase performance tuning Hadoop


Note: the original text of this article is HBASE ADMINISTRATION, PERFORMANCE TUNING.

Setting up Hadoop to spread disk I/O

Modern servers usually have multiple disk drives to provide large storage capacity. These disks are usually configured as RAID arrays in their factory settings. This is fine in many cases, but not for Hadoop.

Hadoop slave nodes store HDFS data blocks and MapReduce temporary files on their local disks. These local disk operations benefit from using multiple independent disks to spread disk I/O.

In this recipe, we will describe how to spread disk I/O by setting up Hadoop to use multiple disks.

preparation

Let’s assume that each of your datanodes has multiple disks. These disks are in a JBOD (Just a Bunch Of Disks) or RAID 0 configuration. Suppose these disks are mounted at /mnt/d0, /mnt/d1, ..., /mnt/dn, and that the user who starts HDFS has write permission on each mount point.

How to do it

To set up Hadoop to spread disk I/O, follow these instructions:

  1. On each datanode, create a directory on each disk for HDFS to store its data blocks:

    hadoop$ mkdir -p /mnt/d0/dfs/data
    hadoop$ mkdir -p /mnt/d1/dfs/data
    …
    hadoop$ mkdir -p /mnt/dn/dfs/data
    
  2. Add the following to the HDFS configuration file (hdfs-site.xml):

    <property>
      <name>dfs.data.dir</name>
      <value>/mnt/d0/dfs/data,/mnt/d1/dfs/data,...,/mnt/dn/dfs/data</value>
    </property>
    
  3. Synchronize the modified hdfs-site.xml file across the cluster:

    hadoop@master1$ for slave in `cat $HADOOP_HOME/conf/slaves`
    do
       rsync -avz $HADOOP_HOME/conf/ $slave:$HADOOP_HOME/conf/
    done
    
  4. Restart HDFS:

    hadoop@master1$ $HADOOP_HOME/bin/stop-dfs.sh
    hadoop@master1$ $HADOOP_HOME/bin/start-dfs.sh
    

How does it work

I recommend JBOD or RAID 0 for the datanode disks, because you don’t need RAID redundancy there: HDFS ensures data redundancy by replicating blocks between nodes, so no data is lost when a single disk fails.

Which one, JBOD or RAID 0? In theory, a JBOD configuration is better than a RAID configuration. This is because in a RAID configuration, you have to wait for the slowest disk in the array before an entire write operation completes, which makes the average I/O time equal to that of the slowest disk. In a JBOD configuration, operations on a fast disk are independent of the slowest disk, which makes the average I/O time faster than that of the slowest disk. Still, enterprise RAID controllers can make a big difference. Before deciding which to choose, you may want to benchmark both your JBOD and RAID 0 setups.
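
If you do benchmark, a crude way to compare raw sequential write throughput per disk is to time a large direct-I/O write on each mount point. This is only a rough sketch; the path and size are examples, and a realistic benchmark should also exercise concurrent and random I/O:

$ dd if=/dev/zero of=/mnt/d0/dd.test bs=1M count=2048 oflag=direct
$ rm /mnt/d0/dd.test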

For both JBOD and RAID 0 configurations, you mount the disks at different paths. The key point here is to set the dfs.data.dir property to all the directories created on each disk. The dfs.data.dir property specifies where the datanode should store its local blocks. By setting it to a comma-separated list of directories, the datanode stores its blocks across all the disks in a round-robin fashion. This enables Hadoop to efficiently spread disk I/O over all the disks.

Warning: do not leave spaces between the comma-separated directories in the dfs.data.dir property value, or it might not work as expected.

You will need to synchronize these changes across the cluster and restart HDFS for them to take effect.
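
Once HDFS has been running for a while, you can spot-check that blocks really are being spread over all the disks. This assumes the example mount points used above; the reported sizes should be roughly equal across disks:

hadoop$ du -sh /mnt/d*/dfs/data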

There's more

MapReduce stores its temporary files on the local file system of the tasktracker, so if you run MapReduce, you may also want to set it up to spread its disk I/O:

  1. On each tasktracker node, create a directory on each disk for MapReduce to store its intermediate data files:

    hadoop$ mkdir -p /mnt/d0/mapred/local
    hadoop$ mkdir -p /mnt/d1/mapred/local
    …
    hadoop$ mkdir -p /mnt/dn/mapred/local
    
  2. Add the following to the MapReduce configuration file (mapred-site.xml):

    hadoop@master1$ vi $HADOOP_HOME/conf/mapred-site.xml
    <property>
      <name>mapred.local.dir</name>
      <value>/mnt/d0/mapred/local,/mnt/d1/mapred/local,...,/mnt/dn/mapred/local</value>
    </property>
    
  3. Synchronize the changed mapred-site.xml file across the cluster and restart MapReduce.

MapReduce generates many temporary files on the local disks of the tasktrackers during execution. As with HDFS, setting up multiple directories on different disks helps spread MapReduce’s disk I/O significantly.

Using a network topology script to make Hadoop rack aware

Hadoop has the concept of “rack awareness”. Administrators can define the rack of each datanode in the cluster. Making Hadoop rack aware is very important because:

  • Rack awareness prevents data loss
  • Rack awareness improves network performance

In this recipe, we will describe how to make Hadoop rack aware and why it is so important.

preparation

You need to know which rack each of your slave nodes belongs to. Log in to the master node as the user who starts Hadoop.

How to do it

The following steps describe how to make Hadoop rack aware:

  1. Create a topology.sh script and store it in Hadoop's configuration directory. Change the topology.data file path on line 3 to fit your environment:

    hadoop@master1$ vi $HADOOP_HOME/conf/topology.sh
    while [ $# -gt 0 ] ; do
     nodeArg=$1
     exec< /usr/local/hadoop/current/conf/topology.data
     result=""
     while read line ; do
       ar=( $line )
       if [ "${ar[0]}" = "$nodeArg" ] ; then
         result="${ar[1]}"
       fi
     done
     shift
     if [ -z "$result" ] ; then
       echo -n "/default/rack "
     else
       echo -n "$result "
     fi
    done
    

    Don’t forget to set the executable permissions for this script

    hadoop@master1$ chmod +x $HADOOP_HOME/conf/topology.sh
    
  2. Create a topology.data file, as in the following snippet; change the IP addresses and racks to fit your environment:

    hadoop@master1$ vi $HADOOP_HOME/conf/topology.data
    10.161.30.108 /dc1/rack1
    10.166.221.198 /dc1/rack2
    10.160.19.149 /dc1/rack3
    
  3. Add the following to the Hadoop core configuration file (core-site.xml):

    hadoop@master1$ vi $HADOOP_HOME/conf/core-site.xml
    <property>
      <name>topology.script.file.name</name>
      <value>/usr/local/hadoop/current/conf/topology.sh</value>
    </property>
    
  4. Synchronize the changed files in the cluster and restart HDFS and MapReduce.

  5. Make sure HDFS is now rack aware. If everything works well, you should be able to find something like the following in your namenode log file:

    2012-03-10 13:43:17,284 INFO org.apache.hadoop.net.NetworkTopology: 
    Adding a new node: /dc1/rack3/10.160.19.149:50010
    2012-03-10 13:43:17,297 INFO org.apache.hadoop.net.NetworkTopology: 
    Adding a new node: /dc1/rack1/10.161.30.108:50010
    2012-03-10 13:43:17,429 INFO org.apache.hadoop.net.NetworkTopology: 
    Adding a new node: /dc1/rack2/10.166.221.198:50010
    
  6. Make sure MapReduce is now rack aware. If everything works well, you should find something like the following in your jobtracker log file:

    2012-03-10 13:50:38,341 INFO org.apache.hadoop.net.NetworkTopology: 
    Adding a new node: /dc1/rack3/ip-10-160-19-149.us-west-1.compute.internal
    2012-03-10 13:50:38,485 INFO org.apache.hadoop.net.NetworkTopology: 
    Adding a new node: /dc1/rack1/ip-10-161-30-108.us-west-1.compute.internal
    2012-03-10 13:50:38,569 INFO org.apache.hadoop.net.NetworkTopology: 
    Adding a new node: /dc1/rack2/ip-10-166-221-198.us-west-1.compute.internal
    

How does it work

The following chart shows the concept of Hadoop rack awareness:

[Figure: Hadoop rack awareness]

Each block of an HDFS file is replicated to multiple datanodes to prevent losing all copies of the data when one machine fails. However, if all the replicas happen to be placed on datanodes in the same rack and that rack fails, all the replicas are lost. To avoid this, the namenode needs to know the network topology, so that it can use that information to place replicas intelligently.

As shown in the figure above, with the default replication factor of three, two replicas are placed on machines in the same rack and the other on a machine in a different rack. This ensures that a single rack failure does not destroy all copies of the data. Normally, two machines in the same rack have more bandwidth and lower latency between them than two machines in different racks. With the network topology information, Hadoop can maximize network performance by reading data from the most appropriate datanode: if the data is available on the local machine, Hadoop reads it from there; if not, Hadoop tries a machine in the same rack, and only if that is also unavailable does it read from a machine in a different rack.

In step 1, we created the topology.sh script. The script takes DNS names (or IP addresses) as parameters and returns the network topology (rack) names as output. The mapping from DNS names to network topology is provided by the topology.data file, created in step 2. If an entry is not found in topology.data, the script returns /default/rack as the default rack name.
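
You can test the script from the command line before wiring it into Hadoop. Assuming line 3 of the script points at your topology.data file, the example entries above should resolve to their racks, and an unknown address should fall back to the default rack:

hadoop@master1$ $HADOOP_HOME/conf/topology.sh 10.161.30.108 10.160.19.149
/dc1/rack1 /dc1/rack3
hadoop@master1$ $HADOOP_HOME/conf/topology.sh 10.99.99.99
/default/rack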

Note: we used IP addresses instead of host names in the topology.data file, because there is a known bug where Hadoop does not correctly process host names that start with the letters “a” to “f”. See HADOOP-6682 for more details.

Note: the above bug is fixed in version 0.22.0, so you should be able to use host names there, but you will need to test it.

In step 3, we set the topology.script.file.name property in core-site.xml, which tells Hadoop to invoke topology.sh to resolve DNS names into network topology (rack) names.

Then restart Hadoop. As shown in the logs in steps 5 and 6, HDFS and MapReduce add the correct rack name as the prefix of the DNS name of the slave node. This shows that the rack awareness of HDFS and MapReduce works well with the settings mentioned above.

Mounting disks with noatime and nodiratime

If you are mounting disks purely for Hadoop, you can use the ext3, ext4, or XFS file system. I suggest you mount the disks with the noatime and nodiratime options.

If you mount a disk with noatime, access timestamps are not updated when files are read on that file system. With the nodiratime option, the access times of directory inodes are not updated either. Because no extra disk I/O is spent on updating access timestamps, this improves the read performance of the file system.

In this recipe, we will describe why noatime and nodiratime are recommended for Hadoop, and how to mount disks with these options.

preparation

You need root privileges on your slave nodes. Let’s assume you have two disks used only by Hadoop, /dev/xvdc and /dev/xvdd, mounted at /mnt/is1 and /mnt/is2 respectively. Also, let’s assume you are using the ext3 file system.

How to do it

In order to mount disks using noatime and nodiratime, execute the following instructions on each slave node in the cluster:

  1. Add the following entries to the /etc/fstab file:

    $ sudo vi /etc/fstab
    /dev/xvdc /mnt/is1 ext3 defaults,noatime,nodiratime 0 0
    /dev/xvdd /mnt/is2 ext3 defaults,noatime,nodiratime 0 0
    
  2. Unmount the disks and mount them again for the changes to take effect:

    $ sudo umount /dev/xvdc
    $ sudo umount /dev/xvdd
    
    $ sudo mount /dev/xvdc
    $ sudo mount /dev/xvdd
    
  3. Check the mount options that are in effect:

    $ mount
    /dev/xvdc on /mnt/is1 type ext3 (rw,noatime,nodiratime)
    /dev/xvdd on /mnt/is2 type ext3 (rw,noatime,nodiratime)
    

How does it work

Because Hadoop (HDFS) keeps its metadata (inodes) on the namenode, the access time information kept by HDFS is independent of the atime attribute of individual blocks. Access timestamps on a datanode’s local file system are therefore meaningless, which is why I recommend mounting disks with noatime and nodiratime if they are used purely by Hadoop. Mounting with noatime and nodiratime saves write I/O on the local file system.
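
A quick way to see the effect is to compare a file's access time before and after reading it on a noatime mount. The file path below is just an example:

$ echo test > /mnt/is1/atime.test
$ stat -c %x /mnt/is1/atime.test
$ cat /mnt/is1/atime.test
$ stat -c %x /mnt/is1/atime.test

With noatime in effect, the second stat should report the same access time as the first.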

These options are set in the /etc/fstab file. Don’t forget to unmount and remount the disks for the changes to take effect.

Making these options effective can improve the read performance of HDFS. Because HBase stores its data on HDFS, the read performance of HBase will also be improved.

There's more

Another optimization is to reduce the percentage of reserved blocks on ext3 or ext4 file systems. By default, some file system blocks are reserved for use by privileged processes. This avoids situations where user processes fill up the disk space needed by system daemons to keep working. This is very important for disks hosting the operating system, but of limited value for disks used only by Hadoop.

These Hadoop-only disks usually have very large storage capacity, so reducing the percentage of reserved blocks adds noticeable storage capacity to the HDFS cluster. The default reserved percentage is 5%; it can be reduced to 1%. For example, going from 5% to 1% on a 2 TB disk frees roughly 80 GB per disk for HDFS.

Note: do not reduce the reserved block of disk space on the host operating system.

To achieve this, run the following command on each disk of each slave node in the cluster:

$ sudo tune2fs -m 1 /dev/xvdc
tune2fs 1.41.12 (17-May-2010)
Setting reserved blocks percentage to 1% (1100915 blocks)
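
You can read the setting back to verify it took effect; the device name is the same example disk as above:

$ sudo tune2fs -l /dev/xvdc | grep -i "reserved block count"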

Setting vm.swappiness to 0 to avoid swap

Linux moves memory pages that have not been accessed for a while to swap space, even if there is enough free memory. This is called swapping out; reading swapped-out data from swap space back into memory is called swapping in. Swapping is necessary in most cases, but because the Java Virtual Machine (JVM) does not behave well under swapping, HBase may run into problems if it is swapped. ZooKeeper session expiration is a typical problem introduced by swapping.

In this recipe, we’ll describe how to tune the Linux vm.swappiness parameter to avoid swapping.

preparation

Make sure you have root permission on your cluster node.

How to do it

To adjust Linux parameters to avoid swap, call the following command on each node in the cluster:

  1. Execute the following command to set the vm.swappiness parameter to 0:

    root# sysctl -w vm.swappiness=0
    vm.swappiness = 0
    

    This change will remain in effect until the next server restart.

  2. Add the following to the /etc/sysctl.conf file so that the setting persists across reboots:

    root# echo "vm.swappiness = 0" >> /etc/sysctl.conf
    

How does it work

The vm.swappiness parameter controls how aggressively memory pages are swapped to disk. It accepts any value from 0 to 100: a lower value means the kernel swaps less often, while a higher value makes the kernel swap applications out more often. The default value is 60.

In step 1, we set vm.swappiness to 0, which makes the kernel avoid swapping processes out of physical memory for as long as possible. This is very useful for HBase, because HBase processes consume a large amount of memory, and a high vm.swappiness value will make HBase swap heavily and suffer very slow garbage collection. As a result, the region server process may be killed when the ZooKeeper session times out. We recommend setting it to 0 or another low number (for example, 10) and watching the swapping state.

Note that the value set by the sysctl command persists only until the next server reboot. You need to set vm.swappiness in the /etc/sysctl.conf file so that the setting takes effect every time the server restarts.
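
To confirm the value currently in effect, and to watch whether the node is actually swapping, you can use the following standard Linux commands:

$ cat /proc/sys/vm/swappiness
0
$ free -m

The "Swap: used" figure reported by free should stay at or near zero on a healthy HBase node.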

Java GC and HBase heap settings

Because HBase runs in the JVM, the JVM Garbage Collection (GC) settings are very important for HBase to run smoothly with high performance. In addition to general guidelines for configuring the HBase heap, it is equally important to have the HBase processes output their GC logs and then adjust the JVM settings based on what the GC logs show.

In this recipe, I’ll describe the most important HBase JVM heap settings, how they work, and how to read GC logs. I’ll also cover some guidelines for tuning HBase’s Java GC settings.

preparation

Log in to your HBase region server.

How to do it

The following are recommended for Java GC and HBase heap settings:

  1. Edit the hbase-env.sh file to give HBase a large enough heap size. For example, the following snippet configures an 8000 MB heap for HBase:

    $ vi $HBASE_HOME/conf/hbase-env.sh
    export HBASE_HEAPSIZE=8000
    
  2. Enable GC logging by using the following command:

    export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/usr/local/hbase/logs/gc-hbase.log"
    
  3. Add the following code to start the Concurrent-Mark-Sweep GC (CMS) earlier than the default:

    $ vi $HBASE_HOME/conf/hbase-env.sh
    export HBASE_OPTS="$HBASE_OPTS -XX:CMSInitiatingOccupancyFraction=60"
    
  4. Synchronize changes in the cluster and restart HBase.
  5. Check the GC logs output to the specified log file (/usr/local/hbase/logs/gc-hbase.log). The GC log looks like the following screen capture:

[Screen capture: sample gc-hbase.log output]

How does it work

In step 1, we configured the HBase heap memory size. By default, HBase uses a 1 GB heap, which is too low for modern machines. A heap larger than 4 GB is good for HBase; we recommend 8 GB or more, but less than 16 GB.

In step 2, we enabled GC logging. With this setting, you get the region server’s JVM logs, similar to the one we showed in step 5. Basic knowledge of JVM memory allocation and garbage collection is required to understand the log output. The following is a diagram of the JVM generational garbage collection system:

[Diagram: the JVM generational garbage collection system]

There are three heap generations: the Perm (permanent) generation, the Old generation (tenured objects), and the Young generation. The Young generation consists of three separate spaces: the Eden space and two survivor spaces, S0 and S1.

Usually, objects are allocated in the Eden space of the young generation. If the allocation fails (Eden is full), all Java threads are stopped and a minor GC is invoked. All surviving objects in the young generation (the Eden and S0 spaces) are copied to the S1 space. If the S1 space is full, objects are copied (promoted) to the old generation. When a promotion fails, the old generation is collected (a major or full GC). The permanent and old generations are usually collected together. The permanent generation is used to store class and method definitions.
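
If you want to see how these generations are actually sized on a running region server, the jmap tool shipped with HotSpot JDKs can print the heap configuration and per-generation usage. This is a sketch; how you look up the region server's process ID may differ on your installation:

$ jps | grep HRegionServer
$ jmap -heap <region-server-pid>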

Back to step 5 of our example, the minor GC output of the above options is in the following form:

<timestamp>: [GC [<collector>: <starting occupancy1> -> <ending occupancy1>, <pause time1> secs] 
<starting occupancy3> -> <ending occupancy3>, <pause time3> secs] 
[Times: <user time> <system time>, <real time>]

In this output:

  • Timestamp is the time at which the GC occurred, relative to the application start time.
  • Collector is the internal name of the collector used for the minor collection.
  • Starting occupancy1 is the occupancy of the young generation before garbage collection.
  • Ending occupancy1 is the occupancy of the young generation after garbage collection.
  • Pause time1 is the pause time of the minor collection.
  • Starting occupancy3 is the occupancy of the entire heap before garbage collection.
  • Ending occupancy3 is the occupancy of the entire heap after garbage collection.
  • Pause time3 is the pause time of the whole garbage collection, including a major collection.
  • [Times:] shows the time spent in garbage collection: user time, system time, and real (elapsed) time.

In step 5, the first line of our output shows a minor GC, which paused the JVM for 0.0764200 seconds. It reduced the young generation occupancy from about 14.8 MB to 1.6 MB.

Next, let’s look at the CMS GC logs. HBase uses CMS GC as its default garbage collector for the old generation.

CMS GC performs the following steps:

  1. Initial mark
  2. Concurrent marking
  3. Remark
  4. Concurrent sweeping

CMS pauses the application threads only during the initial mark and remark phases. During the concurrent marking and sweeping phases, the CMS thread runs alongside the application threads.

The second line of the example shows that the CMS initial mark took 0.0100050 seconds and the concurrent marking took 6.496 seconds. Note that the application is not paused during the concurrent marking.

In the earlier screen capture of the GC log, there is a pause on the line starting with 1441.435: [GC[YG occupancy:…]. The pause here is 0.0413960 seconds, used to remark the heap. After that, you can see the start of the sweeping phase. The CMS sweeping took 3.446 seconds, but the heap size did not change much during it (it continued to occupy about 150 MB).

The tuning point here is to keep all these pause times low. To keep the pauses low, you may need to adjust the young generation size with the -XX:NewSize and -XX:MaxNewSize JVM arguments, setting them to relatively small values (for example, a few hundred MB). If the server has more CPU power, we recommend using the Parallel New collector by setting the -XX:+UseParNewGC option. You may also want to tune the number of parallel GC threads for the young generation with the -XX:ParallelGCThreads JVM argument.

We recommend adding the above settings to the HBASE_REGIONSERVER_OPTS variable in the hbase-env.sh file, rather than to the HBASE_OPTS variable. HBASE_REGIONSERVER_OPTS affects only the region server processes, which is fine because the HBase master neither handles heavy tasks nor participates in data processing.

For the old generation, concurrent collection (CMS) generally cannot be sped up, but it can be started earlier. CMS starts running when the percentage of allocated space in the old generation crosses a threshold, which is calculated automatically by the collector. In some situations, especially under heavy load, if CMS starts too late, HBase may run into a full garbage collection. To avoid this, we recommend setting the -XX:CMSInitiatingOccupancyFraction JVM argument to specify exactly at what percentage CMS should be started, as we did in step 3. Starting at 60 or 70 percent is a good practice. When CMS is used for the old generation, the default young generation GC is set to the Parallel New collector.
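
Putting the recommendations above together, a hbase-env.sh fragment for the region servers might look like the following. This is only a sketch: the young generation size (128 MB here) and the GC thread count are example values that you should tune against your own GC logs.

$ vi $HBASE_HOME/conf/hbase-env.sh
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -XX:NewSize=128m -XX:MaxNewSize=128m \
  -XX:+UseParNewGC -XX:ParallelGCThreads=4 \
  -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=60 \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/usr/local/hbase/logs/gc-hbase.log"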

There's more

If you are using an HBase version earlier than 0.92, consider enabling the MemStore-Local Allocation Buffer, which helps prevent old generation heap fragmentation:

$ vi $HBASE_HOME/conf/hbase-site.xml
  <property>
    <name>hbase.hregion.memstore.mslab.enabled</name>
    <value>true</value>
  </property>

This feature is enabled by default in HBase 0.92.

Using compression

Another important feature of HBase is the use of compression. It is very important because:

  • Compression reduces the number of bytes read and written from HDFS
  • Save disk space
  • When getting data from a remote server, it improves the efficiency of network bandwidth

HBase supports the GZip and LZO compression formats. My suggestion is to use the LZO compression algorithm, because it decompresses data quickly and has low CPU usage. If a better compression ratio is preferable for your system, you should consider GZip instead.

Unfortunately, HBase cannot ship with LZO because of license issues: HBase uses the Apache license, while LZO is GPL-licensed. Therefore, we need to install LZO ourselves. We will use the hadoop-lzo library, which brings splittable LZO compression to Hadoop.

In this recipe, we will describe how to install LZO and how to configure HBase to use LZO compression.

preparation

Make sure Java is installed on the machine where hadoop-lzo will be built. Apache Ant is required to build hadoop-lzo from source. Install Ant by running the following command:

$ sudo apt-get -y install ant

All nodes in the cluster need to have native LZO libraries installed. You can install it by using the following command:

$ sudo apt-get -y install liblzo2-dev

How to do it

We will use the hadoop-lzo library to add LZO compression support to HBase:

  1. Get the latest hadoop-lzo source code from https://github.com/toddlipcon/hadoop-lzo
  2. Build the native hadoop-lzo library from the source code. Depending on your OS, you should build either 32-bit or 64-bit binaries. For example, to build 32-bit binaries, run the following commands:

    $ export JAVA_HOME="/usr/local/jdk1.6"
    $ export CFLAGS="-m32"
    $ export CXXFLAGS="-m32"
    $ cd hadoop-lzo
    $ ant compile-native
    $ ant jar
    

    These commands create the hadoop-lzo/build/native directory and the hadoop-lzo/build/hadoop-lzo-x.y.z.jar file. To build 64-bit binaries, change CFLAGS and CXXFLAGS to -m64.

  3. Copy the built packages to your master node's $HBASE_HOME/lib and $HBASE_HOME/lib/native directories:

    hadoop@master1$ cp hadoop-lzo/build/hadoop-lzo-x.y.z.jar $HBASE_HOME/lib
    hadoop@master1$ mkdir $HBASE_HOME/lib/native/Linux-i386-32
    hadoop@master1$ cp hadoop-lzo/build/native/Linux-i386-32/lib/* $HBASE_HOME/lib/native/Linux-i386-32/
    

    For a 64-bit OS, change Linux-i386-32 (in the previous step) to Linux-amd64-64.

  4. Add the hbase.regionserver.codecs configuration to your hbase-site.xml file:

    hadoop@master1$ vi $HBASE_HOME/conf/hbase-site.xml
    <property>
      <name>hbase.regionserver.codecs</name>
      <value>lzo,gz</value>
    </property>
    
  5. Synchronize the $HBASE_HOME/conf and $HBASE_HOME/lib directories across the cluster.
  6. HBase ships with a tool to test whether compression is set up properly. Use it to test the LZO setup on each node in the cluster. If everything is configured correctly, you will get the SUCCESS output:

    hadoop@master1$ $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.util.CompressionTest /tmp/lzotest lzo
    12/03/11 11:01:08 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 249.6m
    12/03/11 11:01:08 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
    12/03/11 11:01:08 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev Unknown build revision]
    12/03/11 11:01:08 INFO compress.CodecPool: Got brand-new compressor
    12/03/11 11:01:18 INFO compress.CodecPool: Got brand-new decompressor
    SUCCESS
    
  7. Create a table using LZO compression to test the configuration, and verify it in the HBase shell:

    $ hbase> create 't1', {NAME => 'cf1', COMPRESSION => 'LZO'}
    $ hbase> describe 't1'
    DESCRIPTION 
    ENABLED 
    {NAME => 't1', FAMILIES => [{NAME => 'cf1', BLOOMFILTER => 
    'NONE', true 
    REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'LZO',    
    MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536', 
    IN _MEMORY => 'false', BLOCKCACHE => 'true'}]}                                                           
    1 row(s) in 0.0790 seconds
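
If you would rather switch an existing table to LZO instead of creating a new one, you can alter its column family from the HBase shell. The table and family names below are examples, and existing store files are only rewritten with the new codec when they are next compacted:

$ hbase> disable 't1'
$ hbase> alter 't1', {NAME => 'cf1', COMPRESSION => 'LZO'}
$ hbase> enable 't1'
$ hbase> major_compact 't1'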
    

How does it work

The hbase.hregion.majorcompaction property specifies the interval between major compactions of all the store files in a region. The default value is 86400000 milliseconds, which is one day. We set it to 0 in step 1 to disable automatic major compaction. This prevents major compactions from running during busy load times, for example while MapReduce jobs are running over the HBase cluster.

On the other hand, major compaction is still required to help performance. In step 4, we showed an example of how to manually trigger a major compaction on a particular region from the HBase shell. In that example, we passed a region name to the major_compact command to invoke major compaction on that single region only. It is also possible to run major compaction on all regions of a table by passing the table name to the command. The major_compact command queues the specified tables or regions for major compaction, which is then executed in the background by the region servers hosting them.

As we mentioned earlier, you may want to execute major compaction manually only during periods of low load. This is easy to do by invoking major_compact from a scheduled task.
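
For example, a cron entry like the following would trigger a major compaction of a table every night at 1 a.m., a typically quiet period. The table name, schedule, and HBase installation path are all examples:

0 1 * * * echo "major_compact 't1'" | /usr/local/hbase/current/bin/hbase shell > /dev/null 2>&1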

There's more

Another way to trigger major compaction is to use the org.apache.hadoop.hbase.client.HBaseAdmin class. This API is easy to call from Java, so you can manage complex major compaction schedules from Java code.

Manage region splitting

Usually, an HBase table starts with a single region. However, as data keeps growing and the region reaches its configured maximum size, it is automatically split into two halves so that they can handle more data. The following diagram shows how an HBase region splits:

[Diagram: HBase region splitting]

This is the default behavior of HBase region splitting. It works well in most cases, but there are situations where it causes problems, such as split/compaction storms.

With uniform data distribution and growth, all the regions in a table may need to be split at the same time. Immediately after a split, compactions run on the daughter regions to rewrite their data into separate files. This causes a large amount of disk I/O and network traffic.

To avoid this, you can turn off automatic splitting and invoke it manually. Because you control when splits happen, this helps spread the I/O load. Another advantage is that manual splitting gives you finer control over your regions, which helps you trace and fix region-related problems.

In this recipe, I’ll describe how to turn off automatic region splitting and invoke it manually.

preparation

Log in to your HBase master server as the user who started the cluster.

How to do it

To turn off automatic region splitting and call it manually, follow these steps:

  1. Add the following code to the hbase-site.xml file:

    $ vi $HBASE_HOME/conf/hbase-site.xml
    <property>
    <name>hbase.hregion.max.filesize</name>
    <value>107374182400</value>
    </property>
    
  2. Synchronize these changes in the cluster and restart HBase.
  3. With the above setting, region splitting will not occur until a region reaches the configured 100 GB threshold. You will need to invoke it explicitly on selected regions.
  4. To run a region split through HBase shell, use the following command:

    $ echo "split 'hly_temp,,1327118470453.5ef67f6d2a792fb0bd737863dc00b6a7.'" | $HBASE_HOME/bin/hbase shell
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell  Version 0.92.0, r1231986, Tue Jan 17 02:30:24 UTC 2012
    split 'hly_temp,,1327118470453.5ef67f6d2a792fb0bd737863dc00b6a7.'
    0 row(s) in 1.6810 seconds
    

How does it work

The hbase.hregion.max.filesize property specifies the maximum region size in bytes. The default value is 1 GB (256 MB before HBase 0.92). This means that when a region exceeds this size, it is split in two. In step 1, we set the maximum region size to 100 GB, which is a very high number.

Because splitting will not happen until a region exceeds the 100 GB boundary, we need to invoke it explicitly. In step 4, we invoked a split on a specified region through the HBase shell’s split command.

Don’t forget to split large regions. A region is the basic unit of data distribution and load balancing in HBase. Regions should be split into appropriately sized pieces during periods of low load.

On the other hand, too many splits are bad; having too many regions on a region server degrades its performance.

After manually splitting regions, you may want to trigger major compaction and load balancing.
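
Both follow-up actions can be issued from the HBase shell as well. A minimal sketch, reusing the hly_temp table from the earlier example (the balancer command asks the master to run its load balancer, and only has an effect if balancing is switched on):

$ echo "major_compact 'hly_temp'" | $HBASE_HOME/bin/hbase shell
$ echo "balancer" | $HBASE_HOME/bin/hbase shell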

There's more

Our previous setting gives the whole cluster a default maximum region size of 100 GB. Besides changing it for the entire cluster, you can also specify the MAX_FILESIZE property when creating a table:

  $ hbase> create 't1', {NAME => 'cf1', MAX_FILESIZE => '107374182400'}

Like major compaction, you can also use the org.apache.hadoop.hbase.client.HBaseAdmin class to invoke region splits programmatically.
