Intensive reading of SQL vs flux

Time:2020-1-17

1 Introduction

Intensive reading of SQL vs flux

There are two ways to process time series data, as shown in the figure. On the right is SQL, on the left is custom query language, also known as NoSQL. In the middle is SQL-like.

This paper tries to give some comparison by comparing timescaledb of SQL camp with influxdb of NoSQL camp.

2 Overview

TimescaleDB

Timescaledb fully accepts SQL syntax, so there is almost no threshold for learning, and it optimizes the way of use through visual operation.

InfluxDB

Influxdb has created a new query language, here is flux grammar

Why does influxdb create flux syntax

There are two main reasons why influxdb creates flux syntax instead of using SQL:

  1. Stronger query function: SQL can not easily complete time series query.
  2. The query of time series needs the function model based on flow, not the algebra model of SQL.

The so-called flow model is similar to the similar concept in JS functional programming:

source.pipe(
  map(x => x + x),
  mergeMap(...),
  filter(...)
)

Stronger query function?

The following is an example of influxdb:

Flux:

from(db:"telegraf")
  |> range(start:-1h)
  |> filter(fn: (r) => r._measurement == "foo")
  |> exponentialMovingAverage(size:-10s)

SQL:

select id,
       temp,
       avg(temp) over (partition by group_nr order by time_read) as rolling_avg
from (
  select id,
         temp,
         time_read,
         interval_group,
         id - row_number() over (partition by interval_group order by time_read) as group_nr
  from (
     select id,
            time_read,
            'epoch'::timestamp + '900 seconds'::interval * (extract(epoch from time_read)::int4 / 900) as interval_group,
            temp
     from readings
  ) t1
) t2
order by time_read;

Although it seems that SQL is much longer than flux, the core of flux code is to implement custom functionsexponentialMovingAveragePostgreSQL also has the ability to create functions.

Define a custom function through SQL:

CREATE OR REPLACE FUNCTION exponential_moving_average_sfunc
(state numeric, next_value numeric, alpha numeric)
RETURNS numeric LANGUAGE SQL AS
$$
SELECT
         CASE
                WHEN state IS NULL THEN next_value
                ELSE alpha * next_value + (1-alpha) * state
         END
$$;
CREATE AGGREGATE exponential_moving_average(numeric, numeric)
(sfunc = exponential_moving_average_sfunc, stype = numeric);

After that, it can be called like a flux function:

SELECT time,
       exponential_moving_average(value, 0.5) OVER (ORDER BY time)
FROM telegraph
WHERE measurement = 'foo' and time > now() - '1 hour';

It can be seen that from the function definition, it is also tied with flux. The author thinks that since the functions are the same and the learning cost of SQL based language is lower, there is no need to create a new language.

About syntax sugar and SQL standard

The author believes that although there is a view that the syntax sugar of flux is more concise than SQL, the maintainability of the code is not that the fewer the lines, the better, but whether it is easy to understand by human beings.

For creating a function standard may destroy the portability of SQL, the author thinks that is better than creating a new syntax completely.

Is stream based function model better than SQL algebra model?

Indeed, from a functional point of view, of course, the functional model is better than the algebraic model, because the algebraic model is only describing things, and can not accurately control every step of execution.

But we need to make clear the scenario of SQL, by describing an unordered query problem, let the database give the results. In the process of query, the database can optimize the SQL statement.

In contrast, the business code is used to describe the query request, which can not be automatically optimized. Although it provides users with a lower level of control, the cost is that it can not be optimized by the database execution engine.

If you prefer query language to execution logic, SQL is still the best choice.

3 Summary

The reason for this intensive reading is to explore the relationship between SQL and other query languages, and to understand why SQL has been used up to now.

SQL and other functional query languages are not on the same level. If we attack SQL in terms of legal sugar and maneuverability, we can only draw a seemingly correct, but actually absurd conclusion.

SQL is a query language. Compared with ordinary programming language, it is still at the upper level and will eventually be converted into relational algebra for execution. However, relational algebra will follow some equivalent conversion laws, such as exchange law, combination law, filter condition splitting, etc. by estimating the time cost of each step and recombining the execution order of SQL, the execution efficiency can be improved.

If more than one SQL is executed at the same time, it can also be integrated into one or more new SQL to merge duplicate query requests.

In today’s data driven business, SQL is still the most common solution for data query.

4 more discussion

Discussion address: Intensive Reading of SQL vs flux · issue 񖓿 96 · DT Fe / weekly

If you want to participate in the discussion, please click here. There are new topics every week, which will be released on weekends or Monday.