• 08 – Data extraction through page parsing – Python crawler


    Generally speaking, what we need to capture is the content of a website or an application, from which we extract useful value. The content generally falls into two categories: unstructured text and structured text. Structured data comes as JSON, XML, or HTML. HTML text (including JavaScript code) is the most common data format. It should be a structured text organization, […]

  • [Big data practice] Game event processing system (2): event processing with Logstash


    Preface: The previous article, "[Big data practice] Game event processing system (1): event collection with Filebeat", summarized the background, objectives, and technical scheme of the system, and introduced how Filebeat collects logs and sends them to Logstash. This article therefore focuses on how Logstash receives, processes, and outputs events. […]

  • Intelligent electronic medical record system based on natural language processing technology


    1. Design concept and product introduction. The intelligent electronic medical record system uses its own NLP technology to process medical records in detail and with professional accuracy, so that the computer can “understand” their internal meaning, which makes monitoring and utilization possible. Its core value is not only the […]

  • What is PL/SQL?


    1. What is PL/SQL? PL/SQL (Procedural Language/SQL) is Oracle's extension to the standard SQL language. PL/SQL can not only embed SQL statements, but also define variables and constants, use conditional and loop statements, and handle various errors through exceptions, which makes its […]

  • SQL Language Overview (4.1)


    Contents: 4.1 SQL Language Overview; 4.1.1 Introduction to history and standards; 4.1.2 Definition and characteristics of the SQL language; 4.1.3 Instructions. Reference material: Database Principle and Design (3rd Edition). Supporting database: Microsoft SQL Server. Refer to the ANSI SQL-92 standard. 4.1 SQL Language Overview 4.1.1 Introduction to history and […]

  • 25 big data terms everyone should know


    Abstract: If you’re new here, big data looks scary! Starting from the basics, let’s focus on some key terms to impress your date, boss, family, or anyone else. Let’s start: 1. Algorithm. How is “algorithm” related to big data? Even though algorithm is a general term, big data analysis makes it more popular and […]

  • Spark authority Guide – what is Spark? (qbit)


    Preface: These are study notes on the Spark authority Guide. English original: "Spark: The Definitive Guide" by Bill Chambers / Matei Zaharia, first edition February 2018. Chinese translation: "Spark authority Guide", translated by Zhang Yanfeng / Wang Fangjing / Chen Jingjing, first edition April 2020. Most of the contents of the Spark authority Guide are […]

  • Object-oriented programming and UML class diagrams


    Why object-oriented? 1. Program execution: sequence, branching, loops (structured). 2. Object-oriented: structured data. 3. For the computer, structured is the simplest. 4. Programming should be simple and abstract. A basic class: class People { constructor(name, age) { this.name = name; this.age = age; } eat() { alert(`${this.name} eat something`); } speak() { alert(`My […]

  • Research on typical characteristics and development direction of medical data architecture


    Preface: The medical and health industry is developing rapidly and is at a key stage of being empowered by the Internet. Because medical data is highly private, it is difficult to obtain public medical and health data for research through traditional methods. It is an […]

  • About Spark SQL


    Introduction to Spark SQL: Spark SQL is a Spark module for processing structured data. It provides a programming abstraction called DataFrame and also acts as a distributed SQL query engine. By analogy, Hive translates Hive SQL into MapReduce jobs and then submits them to the cluster for execution, which greatly simplifies the […]

  • A list of skills that data scientists should have


    Abstract: A list of skills that data science professionals should have, covering both technical and non-technical skills. Interested readers can use the list to improve step by step. There is a link to learning resources at the end of the article! In the era of big data, what occupations are popular? The answer can be found […]

  • Running operational workloads on Hadoop


    Compared with Oracle, IBM DB2, Microsoft SQL Server, Informix, MySQL, PostgreSQL, Teradata, and other relational databases, as well as Impala, Tez, Hive, Drill, Presto, and other SQL-on-Hadoop solutions, what are the advantages of Apache Trafodion? Apache Trafodion is a state-of-the-art database, on par with the relational databases listed above. Each database has different […]