Small T's introduction: In the article “These mysterious parameters teach you the correct use of TDengine cluster”, we talked about how to configure vnodes sensibly to shard TDengine data. In this issue, we continue by looking at how TDengine manages data along the time dimension.
First, let’s take a look at the relevant descriptions on the official website:
“In addition to vnode sharding, TDengine partitions time series data by time period. Each data file contains the time series data of only one time period, and the length of that period is determined by the database configuration parameter days. Partitioning by time period also makes it easy to implement data retention policies efficiently: once a data file exceeds the specified number of days (the system configuration parameter keep), it is deleted automatically. Moreover, different time periods can be stored in different paths and on different storage media, which facilitates hot and cold data management for big data and enables multi-level storage.”
As this shows, the retention policy for time series data is controlled by two parameters, keep and days. However, if we want to understand TDengine's storage logic for time series data and tune its performance, knowing only the above is not enough.
The official documents describe keep and days as follows:
keep: the number of days to retain data in the database, in days. Default: 3650
days: the time span of the data stored in one data file, in days. Default: 10
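As a minimal sketch of how these two parameters might be set, the helper below builds a CREATE DATABASE statement. It assumes the KEEP and DAYS options of the TDengine 2.x CREATE DATABASE syntax; other versions may name these options differently, so check the documentation for your version. The helper name create_database_sql and the database name demo are hypothetical and used only for illustration.

```python
def create_database_sql(name: str, keep: int, days: int) -> str:
    """Build a CREATE DATABASE statement that sets keep and days (both in days).

    Assumption: the KEEP and DAYS options of the TDengine 2.x syntax.
    """
    if keep <= 0 or days <= 0:
        raise ValueError("keep and days must be positive")
    return f"CREATE DATABASE IF NOT EXISTS {name} KEEP {keep} DAYS {days};"


# "demo" is a hypothetical database name; 3650 and 10 are the defaults quoted above.
statement = create_database_sql("demo", keep=3650, days=10)
print(statement)
```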
TDengine strictly controls the timestamp range of inserted data through keep and days: for past data, the timestamp cannot be earlier than the current time minus keep; for future data, the timestamp cannot be later than the current time plus days.
Let’s assume that a database has keep set to 7 and days set to 3, and that the current time is 0:00 on the 9th of a month.
Since keep is 7, data earlier than the 2nd (9 - 7) cannot be written. Writes of future data are limited as well: data later than the 12th (9 + 3) cannot be inserted. This gives the time range (the colored range) in which TDengine can currently accept data. If you try to write data in the gray time area, you will see the prompt “timestamp out of time range”.
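The rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not TDengine's actual implementation: the function names are hypothetical, the year and month are made up (the example only says “the 9th of a month”), and treating both boundaries as inclusive is an assumption.

```python
from datetime import datetime, timedelta


def writable_range(now: datetime, keep: int, days: int):
    """Return (earliest, latest) timestamps accepted for writes."""
    return now - timedelta(days=keep), now + timedelta(days=days)


def is_writable(ts: datetime, now: datetime, keep: int, days: int) -> bool:
    """Check whether ts lies in the writable range (boundaries assumed inclusive)."""
    earliest, latest = writable_range(now, keep, days)
    return earliest <= ts <= latest


# keep = 7, days = 3, current time 0:00 on the 9th (hypothetical January 2024).
now = datetime(2024, 1, 9)
earliest, latest = writable_range(now, keep=7, days=3)
print(earliest.day, latest.day)  # the 2nd and the 12th
```

With these values, a timestamp on the 5th is accepted, while timestamps on the 1st or the 13th fall in the gray area and would be rejected.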
This group of diagrams shows how the distribution of data files and the range of writable data change as the current time moves forward.
As time passes, the timestamp of each piece of data is compared against the system time. Once data is older than keep days, it is treated as expired. A data file, however, is not removed from disk until all the data in it has expired.
Taking the diagrams above as an example: the data of day 2 and day 4 are in the same data file (data file 1). The data of day 4 is retained until the end of day 11, so the data of day 2 must also stay on disk until the end of day 11. That is why data file 1 is deleted on the 12th.
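The deletion date in this example can be checked with a short calculation. The sketch below assumes that a data file becomes removable once its newest possible timestamp (the end of its time span) is older than keep days; the function name and the concrete year and month are hypothetical.

```python
from datetime import datetime, timedelta


def file_expiry(file_start: datetime, days: int, keep: int) -> datetime:
    """Return the moment at which every row in a data file has expired."""
    file_end = file_start + timedelta(days=days)  # exclusive end of the file's span
    return file_end + timedelta(days=keep)


# Data file 1 covers the 2nd through the 4th (days = 3), and keep = 7.
expiry = file_expiry(datetime(2024, 1, 2), days=3, keep=7)
print(expiry.day)  # the file becomes removable at 0:00 on the 12th
```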
Careful readers may ask: if each file holds 3 days of data, how do I know whether my data falls in the range of days 3-4-5, 1-2-3, or 2-3-4? The answer is that TDengine starts from 0:00:00 on January 1, 1970 (the epoch) and marks off one partition per days interval, which is every three days in this example. Therefore, the partition that any given timestamp belongs to is fixed in advance.
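One way to picture this is as integer division of the timestamp by the partition length. The sketch below only illustrates the idea described in the text; the function names are hypothetical, timestamps are assumed to be milliseconds since the epoch, and the real file numbering inside TDengine may differ (for example in how time zones are handled).

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def partition_index(ts_ms: int, days: int) -> int:
    """Index of the days-long partition, counted from the epoch, that holds ts_ms."""
    return ts_ms // (days * MS_PER_DAY)


def partition_bounds(ts_ms: int, days: int):
    """Return (start, end) of that partition in epoch milliseconds, end exclusive."""
    span = days * MS_PER_DAY
    start = (ts_ms // span) * span
    return start, start + span


# With days = 3, epoch days 0-2 share partition 0 and epoch days 3-5 share partition 1.
for epoch_day in range(6):
    print(epoch_day, partition_index(epoch_day * MS_PER_DAY, days=3))
```

Because the boundaries depend only on the epoch and on days, two writes with the same timestamp always land in the same partition, regardless of when they are written.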
Because the deletion granularity of this mechanism is coarse, after version 18.104.22.168 we improved the user experience: by default, the start time of the timestamp condition in the WHERE clause of an SQL query is set to be later than the expiration time, so that “expired data deletion” is fully controlled from the user's point of view. As a result, all expired data is now invisible to users.
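The effect of this default can be sketched as clamping the lower bound of a query's time range to the expiration time. This is only a model of the behavior described above, not TDengine's query code; the function name and the dates are hypothetical.

```python
from datetime import datetime, timedelta


def effective_query_start(requested_start: datetime, now: datetime, keep: int) -> datetime:
    """Clamp the start of a query's time range so that expired rows are never returned."""
    expiration = now - timedelta(days=keep)
    return max(requested_start, expiration)


# keep = 7 and the current time is 0:00 on the 9th (hypothetical January 2024).
now = datetime(2024, 1, 9)
print(effective_query_start(datetime(2024, 1, 1), now, keep=7))  # clamped to the 2nd
print(effective_query_start(datetime(2024, 1, 5), now, keep=7))  # left unchanged
```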
At the physical level, data is still deleted in units of data files. However, apart from users with extremely strict requirements on storage space, the vast majority of users will not notice this. After this optimization, users no longer need to worry about the granularity of deletion; you only need to set the days parameter according to your business type to find the best performance.
In addition, since both the time range of writable data (now - keep to now + days) and the time span of each data partition (days) are given, the automatic deletion mechanism can be considered to be working normally as long as the number of data file groups under the vnode directory is less than or equal to keep / days + 1.
This is what the official documentation refers to when it says: “given the two parameters of days and keep, the maximum number of data files in a vnode is: keep / days + 2”.
Conceptually speaking, “TDengine divides big data along the vnode and time dimensions, which makes it convenient to manage in parallel and efficiently, and to scale out horizontally.” Turning dry concepts into a correct understanding of your own, however, still takes some study. “These mysterious parameters teach you the correct use of TDengine cluster” and this article approach the principles of TDengine from these two dimensions, which can be considered core knowledge.
For TDengine, we hope you know not only what it does, but also why.