Kudu: introduction and architecture of Apache Kudu



Introduction to Kudu

Kudu is a storage engine open-sourced by Cloudera. It provides low-latency random reads and writes together with efficient batch data analysis. It is a new storage component that sits between HDFS and HBase and combines characteristics of both.

Kudu and HBase HDFS comparison.png
  • Kudu is a big data storage engine that can be combined with other frameworks for large-scale data analysis
  • It combines the high-throughput data access of HDFS with the fast random reads and writes of HBase

Kudu architecture

Kudu uses a single Master node to manage the cluster and its metadata, and any number of Tablet Server nodes to store the actual data. Multiple master nodes can be deployed to improve fault tolerance.
The Kudu architecture is divided into Master Server, Tablet Server, Table, and Tablet.

Kudu architecture.png
  • Master Server: the leader of the Kudu cluster. There can be multiple master servers to improve the cluster’s fault tolerance, but only one master server serves external requests. It is responsible for managing the cluster and the metadata.
  • Tablet Server: the workers of the Kudu cluster. There can be any number of them, and they are responsible for storing data and for reading and writing it. Tablets are stored on tablet servers. For a given tablet, only one tablet server acts as leader and serves reads and writes; the other tablet servers are followers and serve reads only.
  • Table: a table in Kudu has a Schema and a Primary Key. A Kudu table is divided horizontally into multiple Tablets, which are stored on the tablet servers.
  • Tablet: a tablet is a contiguous fragment of a table, that is, a horizontal partition of the table. The primary key ranges of different tablets do not overlap, and together the tablets of a table cover the table’s entire primary key range. A tablet is stored redundantly as replicas on multiple tablet servers; at any time, only one replica is the leader and the others are followers.
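
The relationship between primary keys, tablets, and leader replicas described above can be sketched in a few lines. This is a toy model, not the Kudu client API: the `Tablet` class, the server names `ts1` to `ts3`, and the split points 0, 1000, and 2000 are all made up for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Tablet:
    start_key: int    # inclusive lower bound of this tablet's primary key range
    leader: str       # tablet server holding the leader replica
    followers: list   # tablet servers holding follower replicas


# Hypothetical table split into three tablets with non-overlapping key ranges.
TABLETS = [
    Tablet(0, "ts1", ["ts2", "ts3"]),
    Tablet(1000, "ts2", ["ts1", "ts3"]),
    Tablet(2000, "ts3", ["ts1", "ts2"]),
]
START_KEYS = [t.start_key for t in TABLETS]


def locate_tablet(key: int) -> Tablet:
    """Return the single tablet whose key range contains `key`."""
    if key < START_KEYS[0]:
        raise KeyError(key)
    return TABLETS[bisect_right(START_KEYS, key) - 1]


def server_for_write(key: int) -> str:
    """Writes go to the leader replica of the tablet that owns the key."""
    return locate_tablet(key).leader
```

Because the ranges do not overlap, every key maps to exactly one tablet, and a write for that key is routed to that tablet's leader.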

Data model

Kudu is designed for structured storage, and its data model is similar to that of a traditional relational database. A Kudu cluster is composed of multiple tables, and each table is composed of multiple fields. A table must specify a primary key composed of one or more fields, as shown in the following figure:

Data model.png
  • When creating a table, Kudu requires Schema information, including the column definitions (with column types) and the primary key
  • Data uniqueness in Kudu depends on the combination of primary key columns
  • Kudu does not support the secondary indexes of traditional relational databases
  • Every field in a Kudu table is strongly typed, unlike HBase, where all fields are treated as bytes. The advantage is that different types of data can be encoded differently. Kudu’s data types include BOOL, INT8, INT16, INT32, INT64 (BIGINT in Impala), FLOAT, DOUBLE, STRING, and BINARY
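
To make the schema rules above concrete, here is a minimal sketch of a typed table with a primary key. It is a toy model rather than Kudu's API: `Schema`, `ToyTable`, and the mapping from Kudu type names to Python types are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Python types standing in for a few Kudu column types (illustrative mapping only).
PY_TYPES = {"INT32": int, "INT64": int, "DOUBLE": float, "STRING": str, "BOOL": bool}


@dataclass
class Schema:
    columns: dict        # column name -> type name
    primary_key: tuple   # one or more column names


@dataclass
class ToyTable:
    schema: Schema
    rows: dict = field(default_factory=dict)   # primary key tuple -> row

    def insert(self, row: dict) -> None:
        # Strong typing: every column value must match its declared type.
        for name, type_name in self.schema.columns.items():
            if not isinstance(row[name], PY_TYPES[type_name]):
                raise TypeError(f"{name} must be {type_name}")
        # Uniqueness depends on the combination of primary key columns.
        key = tuple(row[c] for c in self.schema.primary_key)
        if key in self.rows:
            raise KeyError(f"duplicate primary key {key}")
        self.rows[key] = row
```

For example, with `Schema({"id": "INT64", "name": "STRING"}, ("id",))`, inserting two rows with the same `id` raises `KeyError`, and inserting a string as `id` raises `TypeError`.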

Underlying data model

Kudu’s underlying storage is a storage system organized at the table / tablet / replica view level.

Kudu underlying storage.png
  • Each table is divided into tablets, and each tablet contains one MetaData and several RowSets
  • MetaData records the tablet’s metadata, that is, which table the tablet belongs to. The RowSets consist of one MemRowSet and several DiskRowSets
  • MemRowSet: new data is written to the MemRowSet on insert, and data already in the MemRowSet is modified there. When the MemRowSet is full or a certain time has elapsed, it is flushed to disk to form several DiskRowSets. The defaults are 1 GB or 120 s
  • DiskRowSet: each MemRowSet flush generates a DiskRowSet. Once flushed, the DiskRowSet’s data is not modified in place. A DiskRowSet contains BloomFile, AdhocIndex, BaseData, UndoFile, RedoFile, and DeltaMem
  • BloomFile: a Bloom filter generated from the keys in a DiskRowSet, used to quickly determine whether a key may be in the DiskRowSet
  • AdhocIndex: an index on the primary key, used to locate the specific offset of a key within the DiskRowSet
  • BaseData: the data flushed from the MemRowSet to disk, stored by column and sorted by primary key
  • RedoFile: saves the data after an update, guarding against the data not being updated on disk after a transaction succeeds
  • UndoFile: saves the data before an update, so that the original data can be restored after a transaction fails
  • DeltaMem: holds updates to DiskRowSet data. As the DiskRowSet’s data changes, DeltaMem records the changes; when it grows to a certain size, it is flushed to disk to form delta data
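
The MemRowSet-to-DiskRowSet flush described above can be sketched as follows. This is a simplified in-memory model, not Kudu's storage code: a row-count threshold (`flush_threshold_rows`, a hypothetical parameter) stands in for the size and time thresholds, and a plain `frozenset` of keys stands in for the Bloom filter.

```python
from dataclasses import dataclass, field


@dataclass
class DiskRowSet:
    base_data: list       # (key, row) pairs sorted by primary key, never modified in place
    key_filter: frozenset  # stand-in for the Bloom filter over this row set's keys
    delta_mem: dict = field(default_factory=dict)   # key -> changes made after the flush


@dataclass
class ToyTablet:
    flush_threshold_rows: int = 3   # hypothetical; stands in for the size/time thresholds
    mem_rowset: dict = field(default_factory=dict)
    disk_rowsets: list = field(default_factory=list)

    def insert(self, key, row) -> None:
        """New rows always land in the MemRowSet first."""
        self.mem_rowset[key] = row
        if len(self.mem_rowset) >= self.flush_threshold_rows:
            self.flush()

    def flush(self) -> None:
        """Turn the current MemRowSet into one new DiskRowSet, sorted by key."""
        base = sorted(self.mem_rowset.items(), key=lambda item: item[0])
        self.disk_rowsets.append(DiskRowSet(base, frozenset(self.mem_rowset)))
        self.mem_rowset = {}
```

Inserting keys 3, 1, and 2 into a fresh `ToyTablet` triggers one flush: the resulting DiskRowSet holds the rows in key order 1, 2, 3, and the MemRowSet is empty again.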

Data partition policy

Kudu partitions a table horizontally and stores the partitions as multiple tablets. Compared with other storage engines, Kudu provides a richer and more flexible set of data partitioning strategies. There are two general strategies: Range Partitioning and Hash Partitioning.

  • Range Partitioning: partitions by the value range of a field; HBase partitions data this way. The advantage is that batch reads are mostly sequential reads within the same tablet, which improves read throughput. Range partitions are also easy to extend. The disadvantage is that writes for data in the same range all land on a single tablet, so write pressure is high and writes are slow
  • Hash Partitioning: partitions by the hash value of a field; Cassandra adopts this method. Because of hashing, writes are spread evenly across the tablets and write speed is high. However, this strategy suits sequential reads poorly: because the data is scattered, one sequential read has to read from every tablet and combine the results, so throughput is low. Hash partitioning also cannot cope with partition expansion

Kudu lets users specify one range partitioning rule and multiple hash partitioning rules for a table.
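
As a rough illustration of combining the two strategies, the sketch below assigns a row to a (hash bucket, range partition) pair. Everything here is assumed for the example: the split points, the bucket count, the column names, and the use of CRC32 as the hash function (Kudu's actual hash function is not shown in this article).

```python
import zlib
from bisect import bisect_right

# Hypothetical range split points on a `year` column:
# (.., 2020), [2020, 2021), [2021, 2022), [2022, ..)
RANGE_BOUNDS = [2020, 2021, 2022]
HASH_BUCKETS = 4   # hypothetical number of hash buckets on a `user_id` column


def range_partition(year: int) -> int:
    """Index of the range partition that contains `year`."""
    return bisect_right(RANGE_BOUNDS, year)


def hash_partition(user_id: str) -> int:
    """Bucket for `user_id`; CRC32 is used only because it is deterministic."""
    return zlib.crc32(user_id.encode("utf-8")) % HASH_BUCKETS


def tablet_for(user_id: str, year: int) -> tuple:
    """A row belongs to the tablet identified by (hash bucket, range partition)."""
    return hash_partition(user_id), range_partition(year)
```

With these settings the table has 4 x 4 = 16 tablets. Rows for the same year are spread over 4 buckets, which eases the single-tablet write hotspot of pure range partitioning, while a scan restricted to one year only needs to touch the 4 tablets of that range.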

Kudu partition.png

Kudu’s read, write, and update process

Kudu write process.png
  • First find the tablet to access according to the primary key; that is, use the key range to filter out the ranges that cannot contain the key
  • Within that range, use the Bloom filter to filter out the RowSets that cannot contain the key
  • Finally, use the B-tree index to pinpoint whether the key exists
  • If it exists, an error is reported; otherwise, the row is inserted into the MemRowSet
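
The write steps above (key range, then Bloom filter, then exact index lookup, then insert or error) can be sketched like this. It is an illustrative model only: the `BloomFilter` and `RowSet` classes are hypothetical, a sorted list with binary search stands in for the B-tree index, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
from bisect import bisect_left


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size, self.k, self.bits = size_bits, num_hashes, 0

    def _positions(self, key):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key) -> None:
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(key))


class RowSet:
    """A flushed row set; `keys` must be non-empty in this sketch."""

    def __init__(self, keys):
        self.keys = sorted(keys)   # sorted list stands in for the B-tree index
        self.bloom = BloomFilter()
        for k in self.keys:
            self.bloom.add(k)

    def contains(self, key) -> bool:
        if not (self.keys[0] <= key <= self.keys[-1]):   # 1. key range check
            return False
        if not self.bloom.might_contain(key):            # 2. Bloom filter check
            return False
        i = bisect_left(self.keys, key)                  # 3. exact index lookup
        return i < len(self.keys) and self.keys[i] == key


def insert(rowsets, mem_rowset: dict, key, row) -> None:
    """Insert into the MemRowSet unless the key already exists somewhere."""
    if key in mem_rowset or any(rs.contains(key) for rs in rowsets):
        raise KeyError(f"key {key} already exists")
    mem_rowset[key] = row
```

The cheap checks run first, so most row sets are ruled out before the exact lookup, and a Bloom filter false positive is still caught by step 3.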

Kudu read process.png
  • First find the tablet according to the key range
  • Scan the data under the tablet, apply the modifications found in the delta store, and merge in the data in the in-memory MemRowSet that has not yet been flushed to disk

Kudu update process.png
  • As with writing data, first locate the data for the key
  • After finding the data, write the modified content to the delta store
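
The read and update paths can be sketched together, since a read has to merge what an update leaves behind. This is a toy model under stated assumptions, not Kudu's implementation: `MergingTablet` is a hypothetical class, rows are plain dictionaries, and the delta store keeps only the latest value per changed column.

```python
class MergingTablet:
    def __init__(self, base_rows: dict):
        self.base_data = dict(base_rows)   # flushed rows: key -> dict of columns
        self.delta_store = {}              # key -> dict of changed columns
        self.mem_rowset = {}               # rows not yet flushed to disk

    def update(self, key, changes: dict) -> None:
        """Locate the row, then record the change without touching base data."""
        if key in self.mem_rowset:
            self.mem_rowset[key].update(changes)   # in-memory rows are modified directly
        elif key in self.base_data:
            self.delta_store.setdefault(key, {}).update(changes)
        else:
            raise KeyError(key)

    def scan(self) -> dict:
        """Merge base data with its deltas, then add unflushed MemRowSet rows."""
        merged = {
            key: {**row, **self.delta_store.get(key, {})}
            for key, row in self.base_data.items()
        }
        merged.update(self.mem_rowset)
        return dict(sorted(merged.items(), key=lambda item: item[0]))
```

After `update(1, {"age": 11})` on a flushed row, `base_data[1]` still holds the old value while `scan()` returns the new one, which mirrors the idea that flushed base data stays unchanged and changes accumulate in the delta store.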

Kudu Web UI

Access the Kudu Web UI over HTTP at http://<master hostname>:8051

Master homepage.png

Click Masters to see that there is a single master node whose role is leader, along with its RPC address and HTTP address, which are used for using Kudu and for viewing Kudu respectively.

Click master.png

Click Tablet Servers to see that there are 3 registered tablet servers distributed on three machines, 3 successful and 0 failed. Click one to enter that tablet server’s page.

Click tablet server.png

Click Tables to view each table’s name, status, number of tablets, and on-disk size.

Click tables.png

Click a table to view its primary key, data types, partitions, and Impala table creation statement.

Click table.png