Every organization that commits to data-driven decision-making has to think about its data architecture. Compared with five years ago, far more technologies are now available that can change how organizations compete and serve customers.
Rather than reacting after the fact, modern data-driven organizations should anticipate changes in business needs and markets and proactively optimize outcomes. Companies that do not update or reshape their data architecture will lose customers, capital, and market share.
This article summarizes the main characteristics of a modern data architecture and can serve as a guide for organizations that are developing or selecting one.
What is Data Architecture
Just as traditional architects design houses and buildings, data architects design data structures that meet the organization's short-term and long-term goals and fit its particular culture and context.
For most people, data architecture defines a standard set of products and tools that an organization uses to manage data. But it is more than that. A data architecture also defines the processes for acquiring, transforming, and processing data and for delivering it to business users. Most importantly, it identifies who will use the data and what they need from it. A good data architecture should flow from right to left, from the consumers of data back to the data sources, not the other way around.
In the past, organizations built static, IT-driven data architectures called data warehouses. Because of their underlying technology and design patterns, most data warehouses took many people to build and update, yet delivered little return. Although they were promoted as a way to integrate data and simplify reporting and analysis, in practice they often did little more than store the data in one place.
A modern data architecture still includes a data warehouse, ideally one that is flexible, agile, and highly adaptable. But the data warehouse is only one component of a modern data architecture, or modern data analytics environment. The new architecture is like a living organism: it detects and responds to change, continuously learns and adapts, and gives everyone customized, governed access to data.
Data Architecture vs. Data Platform
Data architecture is also not the same thing as a data platform. A data platform is the set of engines and tools responsible for moving, processing, and validating data. It includes the underlying database engines for data processing (e.g., relational databases, Hadoop, OLAP) and the data integration framework layered on top, which lets data engineers create data sets for business use. A data platform emphasizes data integration and is IT-centric, whereas data architecture emphasizes cooperation between business and IT.
A modern data architecture has the following characteristics.
- Customer-centered. Rather than focusing on the data itself or on the technologies used to extract, process, transform, and present information, a modern data architecture starts from business users and their needs. Customers can be internal or external to the organization, and their requirements vary by role, department, and time. A good data architecture keeps evolving to meet users' new and changing needs.
- Adaptable. In a modern data architecture, data flows from source systems to business users like water. The architecture manages that flow by creating a series of interconnected data pipelines that serve different business needs. The pipelines are built from basic data objects: data snapshots, data increments, data views, reference data, master data, and object-oriented tables. These objects can be reused and extended to keep a steady supply of high-quality, relevant data flowing to the business.
- Automated. To create an adaptive architecture with continuously flowing data, designers need to automate everything. The architecture must profile and tag data as it is ingested and map it to existing data sets and attributes. This process, known as metadata injection, is a key function of data catalog products. Likewise, the architecture must detect structural changes in data sources and assess their impact on downstream data and applications. In real-time environments, it must also detect anomalies and notify the responsible people or trigger alerts.
- Intelligent. The ideal data architecture is not only automated; it also uses machine learning and artificial intelligence to build the data objects, tables, views, and models that keep data flowing. It uses intelligence rather than brute force to identify data types, common keys, and join relationships; to find and fix data quality problems; to discover relationships between tables; to recommend related data objects; and to analyze usage patterns. By learning, adapting, alerting, and recommending, a modern data architecture makes the people who manage and use data more productive.
- Flexible. A modern data architecture needs to be flexible enough to support many business needs. It must support multiple types of business users, load modes and frequencies (batch, mini-batch, streaming), query operations (create, read, update, delete), deployment modes (on-premises, public cloud, hybrid cloud), data processing engines (OLAP, MapReduce, SQL, graph, etc.), and delivery channels (data warehouses, data marts, OLAP cubes, visual discovery, real-time applications). A modern data architecture has to meet everyone's needs.
- Collaborative. In contrast to the IT-only approach of the past, a modern data architecture divides data acquisition and processing between the IT department and business users. IT remains responsible for acquiring and moving data from operational systems and for building reusable data objects. From there, capable business units with demanding requirements can take over. Data engineers and analysts in the business units use data preparation and data catalog tools to build custom data sets, and use those data sets to build applications that support the business. This division of labor means IT no longer has to master the business context, which has never been its strong point.
- Governed. Ironically, governance is the key to self-service. A modern data architecture defines how each type of user accesses data, according to that user's needs. Four types of users can be defined: data consumers, data explorers, data analysts, and data scientists. For example, data scientists can be given access to raw data in the landing area or, better, in a purpose-built sandbox where they can mix raw data with their own data.
- Simple. By Occam's razor, the simplest architecture is the best one. Given the diversity of today's requirements and the complexity of the components, achieving simplicity is a daunting task. Applying the rule, an organization with a small amount of data can build its data architecture with a BI tool and its built-in data management environment rather than a massively parallel processing (MPP) engine or a Hadoop environment. To reduce complexity further, organizations should limit data movement and replication as much as possible and standardize on a unified data platform, data management framework, and analytics platform. A better individual product may exist in each category, but the simplicity that comes from unification matters too.
- Elastic. In the era of big data and variable workloads, organizations need a scalable, elastic architecture that adapts to continually changing data processing needs. Many companies are now moving to cloud platforms (public or private) to get on-demand scalability at an affordable price. An elastic architecture frees administrators from having to estimate capacity precisely, throttle usage when necessary, or over-buy hardware.
- Secure. A modern data architecture gives authorized users access to data whenever they need it while keeping hackers and intruders out. It also complies with privacy regulations, including the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR). It does this by encrypting data as it is ingested, masking personally identifiable information, and tracking data elements in the data catalog, including their lineage, usage, and audit history. Life cycle management ensures that every data element has an owner, a location, and a retirement plan.
- Reliable. Any data architecture must be reliable, with high availability, disaster recovery, and backup and restore capabilities. Machine failures are common, especially in data architectures that run in the cloud or on large server clusters. The good news is that cloud providers now offer built-in redundancy and failover along with strong service-level agreements (SLAs), and let companies back up data to geographically dispersed data centers at low cost for disaster recovery.
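
The automation characteristic in the list above mentions detecting structural changes in data sources and assessing their impact on downstream data. The following is a minimal sketch of that idea, not a description of any particular product. It assumes that a schema can be represented as a plain dictionary of column names to type names, and that a hypothetical `DOWNSTREAM` mapping records which columns each downstream data set reads. All names and sample values are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDiff:
    """Columns added, removed, or retyped between two schema snapshots."""
    added: dict = field(default_factory=dict)
    removed: dict = field(default_factory=dict)
    retyped: dict = field(default_factory=dict)  # column -> (old_type, new_type)

    def is_empty(self) -> bool:
        return not (self.added or self.removed or self.retyped)


def diff_schemas(old: dict, new: dict) -> SchemaDiff:
    """Compare two {column: type} snapshots of the same source table."""
    diff = SchemaDiff()
    for col, typ in new.items():
        if col not in old:
            diff.added[col] = typ
        elif old[col] != typ:
            diff.retyped[col] = (old[col], typ)
    for col, typ in old.items():
        if col not in new:
            diff.removed[col] = typ
    return diff


def impacted_datasets(diff: SchemaDiff, downstream: dict) -> list:
    """Names of downstream data sets that read a removed or retyped column."""
    broken = set(diff.removed) | set(diff.retyped)
    return sorted(name for name, cols in downstream.items() if broken & set(cols))


# Hypothetical snapshots of one source table, taken on two consecutive loads.
OLD = {"customer_id": "int", "email": "str", "signup_date": "date"}
NEW = {"customer_id": "str", "email": "str", "country": "str"}

# Hypothetical record of which columns each downstream data set depends on.
DOWNSTREAM = {
    "churn_report": ["customer_id", "signup_date"],
    "mailing_list": ["email"],
}

diff = diff_schemas(OLD, NEW)
print(impacted_datasets(diff, DOWNSTREAM))  # ['churn_report']
```

Added columns are reported but not treated as breaking, since existing consumers do not read them. A real pipeline would pull the snapshots from the data catalog and route the result to an alerting channel.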
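
The governance and security characteristics in the list above describe defining access per type of user and masking personal data. One way to picture how the two fit together is the sketch below. The four user types come from the article, but the zone names, the `ACCESS_POLICY` table, the `PII_FIELDS` set, and the choice of who sees unmasked data are all assumptions made for illustration.

```python
import hashlib

# Assumption: these field names hold personal data in the example records.
PII_FIELDS = {"name", "email"}

# Hypothetical policy: which data zones each user type may read,
# and whether personal data is shown unmasked.
ACCESS_POLICY = {
    "data_consumer": {"zones": {"reports"}, "sees_pii": False},
    "data_explorer": {"zones": {"reports", "curated"}, "sees_pii": False},
    "data_analyst": {"zones": {"reports", "curated"}, "sees_pii": False},
    "data_scientist": {"zones": {"reports", "curated", "raw", "sandbox"}, "sees_pii": True},
}


def mask(value: str) -> str:
    """Replace a value with a short, stable token derived from its SHA-256 hash."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def read_record(role: str, zone: str, record: dict) -> dict:
    """Return the record as the given role may see it, or raise PermissionError."""
    policy = ACCESS_POLICY.get(role)
    if policy is None or zone not in policy["zones"]:
        raise PermissionError(f"{role!r} may not read zone {zone!r}")
    if policy["sees_pii"]:
        return dict(record)
    # Mask only the personal fields; leave everything else untouched.
    return {k: mask(str(v)) if k in PII_FIELDS else v for k, v in record.items()}


row = {"customer_id": 7, "email": "a@example.com", "plan": "pro"}
print(read_record("data_analyst", "curated", row))
```

Because the token is stable, masked records can still be joined on the masked field. An unsalted hash is used here only to keep the sketch short; it is not strong protection for low-entropy values such as email addresses, so a real deployment would more likely use a keyed hash or a tokenization service.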