Riding the wave of real-time technology, Microsoft releases a real-time big data analytics product!


New programming models that span the cloud, intelligent devices, and parallel technologies are a key aspect of digital transformation. One of the key data types generated by these new application paradigms is telemetry data. Telemetry data is everywhere: IoT sensors, application logs, network logs, infrastructure logs, security logs, metrics, click streams, time series, and more. The insights unlocked from this data drive the progress of the connected devices that consumers and enterprises rely on every day.

Using telemetry data requires a flexible, adaptive platform that can process large volumes of data and give users real-time insight to improve their operations and innovate. Traditionally, this data has been stored and managed in siloed systems that lack real-time visibility, are constrained in scale, and are costly to maintain. In addition, it is complex to democratize this data and correlate it with enterprise business data.

What is Azure Synapse Data Explorer?

To help customers make full use of log and telemetry data, Microsoft has released a public preview of Azure Synapse Data Explorer. Complementing the existing SQL pools and Apache Spark engines, the new Data Explorer runtime engine uses powerful indexing technology to automatically index free-text and semi-structured data, so that large volumes of structured, semi-structured, and free-text telemetry and time series data can be queried in near real time. Here are some key features that make this possible:

  • A powerful distributed query engine that indexes all data, including free-text and semi-structured data. Data is automatically compressed, indexed, and optimized, cached on SSD, and persisted in storage. Compute and storage are separated, which gives users full flexibility to scale automatically without downtime.
  • The intuitive Kusto Query Language (KQL), which draws on the text indexing of Synapse Data Explorer to explore raw telemetry and time series data with efficient free-text search, regular expressions, and parsing of trace and text data.
  • Comprehensive JSON parsing capabilities for querying semi-structured data, including arrays and nested structures.
  • Native, advanced time series support for creating, manipulating, and analyzing multiple time series, plus in-engine Python and R execution for model scoring.
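To make the JSON parsing feature in the list above more concrete, here is a minimal Python sketch of what querying nested structures and arrays involves: walking a semi-structured record and addressing each leaf value by its path. This is only an illustration of the idea, not how Data Explorer implements it; the `flatten` helper and the sample record with its field names are hypothetical.

```python
import json


def flatten(value, prefix=""):
    """Yield (path, scalar) pairs for every leaf in a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten(child, path)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, value


# Hypothetical telemetry record; the field names are invented for illustration.
record = json.loads("""
{
  "device": {"id": "sensor-7", "location": {"site": "north"}},
  "readings": [{"metric": "temp", "value": 21.5},
               {"metric": "humidity", "value": 40}]
}
""")

# Each nested field and array element becomes addressable by a path string.
flat = dict(flatten(record))
for path, value in flat.items():
    print(path, "=", value)
```

Once every leaf has a path such as `readings[1].value`, filtering or aggregating on nested fields becomes the same kind of operation as on ordinary columns.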

What is the architecture of Azure Synapse Data Explorer?

A Data Explorer cluster achieves a scale-out architecture by separating compute resources from storage resources. Users can therefore scale each resource independently and, for example, run multiple read-only compute instances against the same data. A Data Explorer cluster contains a set of compute resources running the engine, which is responsible for automatic indexing, compression, caching, and serving distributed queries. The cluster also has a second set of compute resources running the data management service, which is responsible for background system jobs and for managed and queued data ingestion. All data is stored in a compressed, columnar format on managed blob storage accounts.

Data Explorer clusters support a rich ecosystem and can ingest data through connectors, SDKs, REST APIs, and other managed capabilities. Users can consume the data in a variety of ways, including ad hoc queries, reports, dashboards, alerts, REST APIs, and SDKs.

What are the innovations and features of Azure Synapse Data Explorer?

Unlimited streaming data ingestion: Data Explorer provides built-in integrations for no-code/low-code, high-throughput data ingestion and for caching data from real-time sources. Data can be ingested from sources such as Azure Event Hubs, Kafka, Azure Data Lake, open-source agents such as Fluentd and Fluent Bit, and a wide range of cross-cloud and on-premises data sources.

No complex data modeling: With Data Explorer, you do not need to build complex data models or write complex scripts to transform data before using it.

Unlimited data scale: Data Explorer is a distributed system whose compute and storage scale independently, making data analysis at petabyte (PB) scale readily achievable.

No index maintenance required: Data is optimized for query performance without any maintenance tasks, and indexes do not need to be maintained. With Data Explorer, all raw data is available immediately, so you can run high-performance, high-concurrency queries against both streaming and persisted data. These queries can power near real-time dashboards and alerts and connect operational analytics data to the rest of the data analytics platform.

Low latency, high performance, high concurrency: Data Explorer indexes semi-structured data (JSON) and unstructured data (free text), so queries on such data run very efficiently. By default, every field is indexed during data ingestion, and you can use a low-level encoding policy to fine-tune or disable the index for specific fields. The scope of an index is a single data shard.
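The statement that an index is scoped to a single unit of data can be illustrated with a small, purely conceptual Python sketch. It assumes a simple inverted index (term to row IDs) built separately for each batch of rows, with a query consulting every batch's index in turn. The `Shard` class, `tokenize`, `search_all`, and the sample log lines are hypothetical and say nothing about the engine's real data structures.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split free text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Shard:
    """A batch of rows with its own term index (the index covers this batch only)."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.index = defaultdict(set)
        # Index every term at load time, so no separate maintenance step exists.
        for row_id, row in enumerate(self.rows):
            for term in tokenize(row):
                self.index[term].add(row_id)

    def search(self, term):
        # Look the term up in this batch's index instead of scanning its rows.
        return [self.rows[i] for i in sorted(self.index.get(term.lower(), ()))]


def search_all(shards, term):
    """Query each batch independently and concatenate the matches."""
    return [row for shard in shards for row in shard.search(term)]


# Hypothetical log lines, split across two independently indexed batches.
shards = [
    Shard(["Disk latency warning on node-1", "User login ok"]),
    Shard(["Timeout contacting node-2", "disk full on node-3"]),
]
hits = search_all(shards, "disk")
print(hits)
```

Because each batch carries its own index, new data can be indexed as it arrives without rebuilding a global structure, and batches can be searched in parallel.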

Standardized data analysis: Data Explorer standardizes self-service big data analysis through the intuitive Kusto Query Language (KQL). KQL combines the expressiveness and power of SQL with the simplicity of Excel. KQL is highly optimized to use Data Explorer's text indexing technology to explore raw telemetry and time series data, with efficient free-text and regular expression search and comprehensive parsing capabilities for querying trace and text data and JSON semi-structured data (including arrays and nested structures). KQL provides advanced time series support for creating, manipulating, and analyzing multiple time series, and supports in-engine Python execution for model scoring.
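As a rough illustration of what creating a time series from raw telemetry means, the following Python sketch buckets event timestamps into fixed-width bins and fills empty bins with zero, which is the general idea behind operators such as KQL's `make-series`. It is a sketch under stated assumptions, not the engine's implementation: the function name `make_series` here is a hypothetical Python helper, and the sample timestamps are invented.

```python
from collections import Counter
from datetime import datetime, timedelta


def make_series(timestamps, start, end, step):
    """Count events per fixed-width bin in [start, end), with 0 for empty bins."""
    bins = (end - start) // step  # timedelta // timedelta gives an int
    counts = Counter()
    for ts in timestamps:
        if start <= ts < end:
            counts[(ts - start) // step] += 1
    # Emit one value per bin so the series is regular and gap-free.
    return [counts[i] for i in range(bins)]


# Hypothetical events, given as minute offsets from the start of the window.
start = datetime(2021, 11, 1, 12, 0)
events = [start + timedelta(minutes=m) for m in (1, 5, 16, 50, 59, 61)]

series = make_series(
    events,
    start=start,
    end=start + timedelta(hours=1),
    step=timedelta(minutes=15),
)
print(series)
```

A regular, gap-free series like this is the usual starting point for later steps such as anomaly detection or model scoring.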

Multi-ecosystem integration: Azure Synapse Analytics provides data interoperability among the Data Explorer, Apache Spark, and SQL engines, enabling data engineers, data scientists, and data analysts to easily and securely access and collaborate on the same data in the data lake.

What digital business scenarios does Azure Synapse Data Explorer support?

Accurate real-time behavior optimization

Azure Synapse Data Explorer fits flexibly into customers' Azure hybrid cloud solutions. For example, a railway network company can rely on Azure Synapse Data Explorer to replace its on-premises log management solution. In the transportation industry, safety is the primary consideration, because people's lives depend on real-time telemetry data. As large-scale infrastructure expands nationwide, railway management companies need a platform that can quickly ingest large volumes of time series and log data and then produce powerful insights and data visualizations in Power BI. Azure Synapse Data Explorer enables the railway company to effectively identify behavior patterns or violations across its vast transportation network, making the railway system safer.

Real-time supply chain insight

Azure Synapse Data Explorer can deliver real-time big data analysis over custom event and log data, saving enterprises time and resources so they can focus on the core value of their business. For example, an online food delivery company that wants to improve its processes and business to provide a consistent, first-class customer experience may be held back by slow, complex, and expensive log management solutions. With the Azure Synapse Data Explorer engine, such a company can benefit immediately from faster data ingestion, higher concurrency, and greater flexibility. This lets it focus on its core mission: delivering delicious food and consistent customer service.

Complex security event handling

In the face of digital security threats, every second counts. Client-side latency, network failures, and query timeouts can be devastating, yet these are exactly the problems that can plague network security and log management service providers. Their existing technology solutions may hinder their ability to deliver on a core value proposition of accessibility and transparency. In this case, network security providers can use Azure Synapse Data Explorer as a data platform that gives their customers valuable insights into threat detection, intelligence alerts, and security trends. As a result, they can build stronger relationships and greater trust with their users.

To sum up, Azure Synapse Data Explorer can create meaningful connections across a variety of data sources and databases. Today, digital businesses of all kinds are flooded with time series, log, and telemetry data from Internet of Things devices, applications, websites, and other sources. This continuous flow of real-time data can overwhelm IT infrastructure and slow it down. With the distributed query engine of Azure Synapse Data Explorer, customers can gain powerful insights and focus on their core business, whether that is creating a safer world or delivering the best takeout.

(Azure Synapse Analytics, operated by 21Vianet, is now available.)
