Develop Paper
  • Program
  • Server
  • Development Tool
  • Blockchain
  • Database
  • Artificial Intelligence
  • Blogs
Position: Home > Database > Content

Clickhouse uses an external dictionary for dimension completion

Time:2022-6-2

background

Clickhouse does not implement join very well, which is not suitable for large table Association. It advocates the concept of large and wide tables. Because by default, Clickhouse uses the hash join algorithm. Clickhouse takes <right_ Table> and create a hash table for it in RAM. After a certain memory consumption threshold, Clickhouse falls back to the merge join algorithm. Therefore, when Clickhouse is used in the case of a large amount of data, join should be used as little as possible, and field redundancy should be carried out through redundant large wide tables. However, in some scenarios, such as the scenario of dimension completion, we can use the join method when the data in the dimension table is not large. Of course, there will be better methods. Let me talk about them slowly.

Business scenario

Existence fact table: ODS_ device_ Log (equipment submission log), dimension table: product table (dim_product), account tariff configuration table (dim_billing_policy).

A simple requirement is to count the metering data of devices in the billing cycle: the number of reported messages.

ods_ device_ The log table structure is as follows:

`product_ Key` string comment “product key, which is used to obtain the account information”,

`device_ Key` string comment “device key”,

`data_ Time ` datetime comment “data reporting time”,

`log_ Info` string comment “report data content”

The structure of the product table is as follows:

`org_ ID ` Int64 comment “the account ID to which the product belongs”,

`product_ Key` string comment “product key”,

`product_ Name` string comment “product name”

The account tariff configuration table is as follows:

`org_ ID ` Int64 comment “the account ID to which the product belongs”,

`

` policy_ Name` string comment “tariff name”,

`billing_ start_ Time` datetime comment “billing cycle start time”,

`billing_ end_ Time` datetime comment “billing cycle end time”

Through join

The SQL is still relatively simple. The countif function can be used to easily count the measurement data. The SQL is as follows:

select

countIf(a.data_time>c.billing_start_time)

from

ods_device_log a

left join dim_product b on a.product_key = b.product_key

left join dim_billing_policy c on b.org_id =c.org_id

where

a.data_time >= toDateTime(‘2021-12-01 00:00:00’)

and a.data_time <toDateTime(‘2021-12-02 00:00:00’)

GROUP by

a.product_key,

a.device_key;

A better implementation

For dimension tables with small volume, we can replace the join method with an external dictionary. Because join each query will revisit the associated table. You can build a dimension table into an external dictionary table. The dictionary table periodically synchronizes dimension information. Then obtain dimension data through dictionary function. The reference SQL is as follows:

select

countIf(a.data_time > dictGetDateTime(‘dim_billing_policy’,’billing_start_time’,

tuple(dictGetInt64(‘dim_product’,’org_id’,tuple(a.product_key)),’00-00-01′))

) num

from

ods_device_log a

where

a.data_time >= toDateTime(‘2021-12-01 00:00:00’)

and a.data_time <toDateTime(‘2021-12-02 00:00:00’)

GROUP by

a.product_key,

a.device_key;

Description: dictionary table dim_ billing_ Policy primary key is ORG_ id、policy_ code。 Dictionary table dim_ The product primary key is product_ key。 First, through the product_ Key to obtain the account ID, and then obtain the start time of the billing cycle through the account ID and billing policy code.

Tags: account, charging, Dictionaries, dimension, postage

Recommended Today

C language sequence point problem

C language sequence point problem C language side effects: (side effect) refers to the modification of the objects or documents of the stacker. For example, statement v = 99; The side effect of V is to change the value of V to 99. C language sequence point: (sequence point) refers to a special point in […]

  • Alibaba cloud experience Award: how to connect polardb-x with big data and other systems
  • Activiti prints the SQL of mybatis
  • Dumping export intra table concurrency optimization – tidb tool sharing
  • Yiwen master advanced SQL skills of gaussdb (DWS): full text retrieval
  • Craftsmanship! RDS database from getting started to mastering is open for download!
  • Why mongodb uses B-tree
  • Technology sharing | how to find the repaired bug content according to the MySQL crash log
  • Six use cases for real-time decision-making – from fraud prevention to analysis, learn how global enterprises use the volt active data platform to promote digital transformation.
  • Eclipse development tool installs mybatipse
  • After waiting for more than a month, I did it myself
Pre: Class annotation template and method annotation template in idea
Next: Kotlin– realize MVVM architecture by comprehensively using hilt, paging3, flow, room, retro fit, coil, etc

    Tags

    address algorithm array assembly attribute Browser c Character string Client code command configuration file container data Database Definition Edition element Example file function java javascript Journal link linux Memory method Model Modular mysql node object page parameter php Plug-in unit project python Route source code The server Thread time user

    Recent Posts

    • C language sequence point problem
    • The idea compilation project always prompts that the package cannot be found. Delete the dependency in ieda, update maven, and re import. Ensure that the local warehouse location is the location of the project dependency reference
    • About interpersonal relationships
    • Network counting Experiment III
    • [play with cloud function] get through GitHub to enterprise wechat

    Recent Comments

    • Blogs on Answer for Java: why is array not passed by reference?
    • douya0808 on Answer for Java: why is array not passed by reference?
    • joyqi on Under the ES6 specification, the repeat function reports an error invalid count value
    • joyqi on Under the ES6 specification, the repeat function reports an error invalid count value
    • joyqi on Under the ES6 specification, the repeat function reports an error invalid count value

    Categories

    • .NET Core
    • Agile Development
    • Algorithm And Data Structure
    • Android
    • Apple MAC
    • Architecture Design
    • Artificial Intelligence
    • ASP.NET
    • Backend
    • Blockchain
    • C
    • C#
    • C++
    • Cloud
    • Database
    • Design Pattern
    • Development Tool
    • Embedded
    • Erlang
    • Freshman
    • Game
    • Golang
    • HTML/CSS
    • HTML5
    • Informal Essay
    • Information Security
    • IOS
    • Java
    • JavaScript
    • JSP
    • Linux
    • Lua
    • MongoDB
    • MsSql
    • MySql
    • Network Security
    • OOP
    • oracle
    • Other DB
    • Other Technologies
    • Other Technology
    • Perl
    • PHP
    • Program
    • Python
    • Redis
    • Regular Expression
    • Ruby
    • Rust
    • SAP
    • Server
    • Software Testing
    • Team Management
    • VBS
    • VUE
    • WEB Front End
    • Windows
    • XML/XSLT
  • java
  • php
  • python
  • linux
  • windows
  • android
  • ios
  • mysql
  • html
  • .net
  • github
  • node.js

Copyright © 2022 Develop Paper All Rights Reserved      Sitemap    About DevelopPaper    Privacy Policy    Contact Us