Category:Artificial Intelligence

  • Pipeline of Spark MLlib

    Time:2020-4-3

    The Spark Pipeline API is inspired by scikit-learn and aims to simplify the creation, tuning, and validation of machine learning workflows. An ML pipeline usually consists of the following stages: data preprocessing, feature extraction, model creation and parameter fitting, and validation. The stages of an ML pipeline are implemented by a series of transformers […]

  • NLTK natural language processing library

    Time:2020-4-2

    Natural language processing, usually referred to as NLP, is a branch of artificial intelligence that deals with the interaction between computers and people using natural language. The ultimate goal of NLP is to read, interpret, understand, and make sense of human language in a valuable way. Most NLP technologies rely on machine learning to extract meaning from human […]

  • Ambari 2.7.3.0: adding components

    Time:2020-4-2

    Installing new components in Ambari 2.7.3.0 is slightly different from previous versions. This article briefly describes the process. If you have already installed Ambari 2.7.3.0 and some components were not added at the time, you will need to install them afterwards. First, log in to Ambari. Then select Stack and Versions in […]

  • What is a data middle platform? What is a data platform? What is the relationship between them?

    Time:2020-4-2

    As the concept of the data middle platform becomes more and more popular, more and more technology companies are gradually entering the middle-platform space, whether it is a data middle platform, a technology middle platform, a business middle platform, and so on. As long as it is connected with […]

  • TensorFlow in practice: Scikit Flow tutorial series, Part 1

    Time:2020-4-2

    Original address: here. Google recently open-sourced TensorFlow, a machine learning framework that earned more than 10k stars on GitHub in a short time and drew a strong response from AI researchers. Why should I care? Before getting to know TensorFlow, we first need to answer a question. As a professional data scientist, why do […]

  • Real-time statistics over billions of records with Flink

    Time:2020-4-2

    Background: The message report is mainly used to track the delivery of message tasks, for example the total number of app users a single push message is sent to, the number of phones the push successfully reached, and how many app users clicked the pop-up notification and opened the app. Through the message report, we can directly […]

  • Building a Kafka test cluster

    Time:2020-4-2

    Requesting machines: Contact the ops team to request one, three, or five Linux servers (2 * n + 1). A ZooKeeper cluster can serve requests as long as more than half of its nodes are up; with three nodes, two is a majority, so one node is allowed to fail. There is no need to use an even […]
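
    The majority rule above can be checked with a little arithmetic. The following is a minimal sketch, not ZooKeeper code; the function names `quorum_size` and `tolerated_failures` are hypothetical and chosen only for illustration.

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of nodes that is strictly more than half."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes may fail while a majority is still alive."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    # Compare odd and even ensemble sizes side by side.
    for nodes in (1, 2, 3, 4, 5):
        print(nodes, quorum_size(nodes), tolerated_failures(nodes))
```

    Under this rule a four-node ensemble tolerates one failure, the same as a three-node ensemble, which is why odd sizes of the form 2 * n + 1 are the usual choice.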

  • Spark ML parameter tuning

    Time:2020-4-1

    In machine learning, the process of fitting an algorithm model's parameters to a given data set so that the model performs as well as possible is called “tuning”. Spark's MLlib provides CrossValidator and TrainValidationSplit, two ways to help tune a model. Generally, the following settings are required to use either of them: the setEstimator method, to specify […]
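
    To make the difference between the two tuning strategies concrete, here is a minimal pure-Python sketch of how each one partitions row indices. It does not use Spark; the helper names `k_fold_indices` and `train_validation_split` are hypothetical, and the splits are deterministic for readability, whereas Spark splits the data randomly.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_rows: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists, one pair per fold.

    This mirrors the idea behind cross-validation: every row is used
    for validation exactly once and for training k - 1 times.
    """
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, validation


def train_validation_split(n_rows: int, train_ratio: float) -> Tuple[List[int], List[int]]:
    """Single split: each candidate model is evaluated only once."""
    cut = int(n_rows * train_ratio)
    return list(range(cut)), list(range(cut, n_rows))


if __name__ == "__main__":
    for train, validation in k_fold_indices(6, 3):
        print("train", train, "validation", validation)
    print(train_validation_split(10, 0.75))
```

    The k-fold approach trains every candidate k times and is therefore more expensive, while the single split is cheaper but gives a noisier estimate on small data sets.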

  • Python for NLP: deep learning text generation using Keras

    Time:2020-4-1

    Original link: http://tecdat.cn/?p=8448 Text generation is one of the latest applications of NLP. Deep learning technology has been used in various text generation tasks, such as writing poetry, generating movie scripts and even creating music. However, in this article, we’ll see a very simple example of text generation where given the input […]

  • How to ensure the business stability of Taobao: Noah adaptive flow control

    Time:2020-4-1

    Noah’s adaptive flow control solution is based on an automatic control algorithm. It addresses the pain points of manually configured rate limits being missed or going stale, and greatly improves an application’s ability to withstand traffic spikes. During past Double 11 events, Noah has protected a large number of business application systems, with large-scale deployment […]

  • TensorFlow: Scikit Flow tutorial series, Part 2

    Time:2020-4-1

    Original address: here. In this part, we will go deeper and build a multi-layer fully connected neural network, then customize the network model and try a convolutional network on that basis. Multi-layer fully connected neural network: Of course, there is not much point in yet another linear / logistic regression framework. A basic […]

  • Kafka file storage mechanism

    Time:2020-4-1

    1. Storage file structure. topic: can be understood as the name of a message queue. partition: to achieve scalability, a very large topic can be distributed across multiple brokers (i.e. servers); a topic can be divided into multiple partitions, each of which is an ordered queue. segment: a partition is physically composed of multiple segments. message: […]
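
    The partition-to-segment relationship can be sketched as a lookup from a message offset to the segment that holds it. This is an illustration only, not Kafka source code: it assumes the common Kafka convention that each segment file is named after its base offset, zero-padded to 20 digits, and the helper names and sample offsets below are hypothetical.

```python
from bisect import bisect_right
from typing import List


def segment_file_name(base_offset: int) -> str:
    """Name of the log file for a segment starting at base_offset."""
    return f"{base_offset:020d}.log"


def find_segment(base_offsets: List[int], offset: int) -> int:
    """Return the base offset of the segment that contains `offset`.

    `base_offsets` must be sorted ascending; the matching segment is
    the one with the largest base offset that is <= the target.
    """
    index = bisect_right(base_offsets, offset) - 1
    if index < 0:
        raise ValueError("offset is before the first segment")
    return base_offsets[index]


if __name__ == "__main__":
    # Hypothetical base offsets of three segments in one partition.
    segments = [0, 368769, 737337]
    base = find_segment(segments, 400000)
    print(segment_file_name(base))
```

    Because segments are sorted by base offset, a binary search finds the right file without scanning the whole partition.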