[Reading notes] Computational Advertising (Part 3)


By logm

This article was originally published at https://segmentfault.com/u/logm/articles and may not be reproduced.


These are reading notes on Computational Advertising (Second Edition).

This part covers the key technologies of online advertising and is aimed at engineers.

Chapter 9 Overview of computational advertising technology

9.1 Personalized system framework

  • Log -> data highway -> stream computing -> online features -> ad serving engine;
  • Log -> data highway -> distributed computing -> offline features -> ad serving engine.

9.2 Optimization objectives of various advertising systems

  • GD (guaranteed delivery): meet contractual requirements;
  • ADN (ad network): CPC, predicted click-through rate;
  • ADX (ad exchange): CPM;
  • DSP (demand-side platform): predicted click-through rate + click value.

9.3 Computational advertising system architecture

  • Ad serving engine:

    • Ad server: retrieval + ranking + yield management, with strict QPS and latency requirements;
    • Ad retrieval: recall a candidate set according to user tags and page tags;
    • Ad ranking: predict click-through rate and click value, compute eCPM, and sort;
    • Yield management: optimize for global revenue;
    • Ad request interface: web request or SDK;
    • Custom audience segmentation: user segment data supplied by advertisers.
  • Data highway
  • Offline data processing:

    • User session log generation: logs are consolidated and organized by user ID;
    • Behavioral targeting: mine user logs and tag users;
    • Contextual targeting: tag the pages on which ads appear;
    • Click-through rate modeling: prepare the features of the CTR model;
    • Allocation planning: mine suitable allocation plans from logs with the goal of global revenue optimization;
    • Business intelligence (BI) system: provide data to decision makers;
    • Ad management system: the tool that ad operators (account executives, AE) use to manage delivery plans.
  • Online data processing:

    • Online anti-cheating: filter out fraudulent traffic;
    • Billing;
    • Online behavior feedback: real-time audience targeting, real-time click feedback;
    • Real-time index: receive ad data in real time and update the index.

9.4 Main technologies of computational advertising systems

  • Algorithm optimization:

    • Audience targeting;
    • eCPM prediction, click-through rate prediction;
    • Online allocation (traffic requirements in contracts);
    • Pricing strategy: maximize profit in the auction game;
    • Explore and exploit (E&E): sample more comprehensively;
    • Personalized recommendation.
  • Architecture optimization:

    • Real-time index;
    • NoSQL database;
    • Distributed computing + stream computing;
    • High-concurrency, low-latency bidding interface.

9.5 Open source tools

  • Nginx: web server;
  • ZooKeeper: distributed cluster management;
  • Lucene: indexing + retrieval;
  • Thrift: cross-language communication, used to wrap the interface of each module;
  • Flume: data highway;
  • Hadoop: distributed data processing;
  • Redis: online feature cache, a NoSQL database;
  • Storm: stream computing;
  • Spark: supports many computation styles, including iterative, batch, stream, and graph computation and SQL relational queries.

Chapter 10 Basic knowledge preparation

10.1 Information retrieval

  • Inverted index:

    • Basic operations: add a document to the index; given a query, return the matching set of documents.
  • Vector space model (VSM):

    • Build vectors with TF-IDF and compare them with cosine distance.
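
The two inverted-index operations and the TF-IDF/cosine comparison can be sketched in a few lines. This is a toy illustration only: the class name `InvertedIndex`, the whitespace tokenizer, the `idf = log(N / df)` weighting, and the sample ads are all assumptions made here, not details from the book.

```python
import math
from collections import Counter, defaultdict


class InvertedIndex:
    """Toy inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}  # doc_id -> term counts

    def add(self, doc_id, text):
        # Basic operation 1: add a document to the index.
        terms = text.lower().split()
        self.docs[doc_id] = Counter(terms)
        for term in terms:
            self.postings[term].add(doc_id)

    def lookup(self, query):
        # Basic operation 2: return documents containing any query term.
        result = set()
        for term in query.lower().split():
            result |= self.postings.get(term, set())
        return result

    def tfidf(self, counts):
        # Sparse TF-IDF vector; terms unknown to the index are dropped.
        n = len(self.docs)
        vec = {}
        for term, tf in counts.items():
            df = len(self.postings.get(term, ()))
            if df:
                vec[term] = tf * math.log(n / df)
        return vec


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


index = InvertedIndex()
index.add("ad1", "cheap flights to tokyo")
index.add("ad2", "tokyo hotel deals")
index.add("ad3", "running shoes sale")

candidates = index.lookup("tokyo flights")
query_vec = index.tfidf(Counter("tokyo flights".split()))
ranked = sorted(
    candidates,
    key=lambda d: cosine(query_vec, index.tfidf(index.docs[d])),
    reverse=True,
)
```

Here `ad1` ranks above `ad2` because it matches both query terms, and the rarer term "flights" carries more weight than "tokyo".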

10.2 Optimization methods

  • Lagrangian method: optimization with constraints

    • The optimal value of the dual problem is always a lower bound on the optimal value of the original problem (weak duality). When the original problem is convex and a constraint qualification such as Slater's condition holds, strong duality holds and the two optimal values are equal;
    • For a convex problem under strong duality, a point satisfying the KKT conditions is a solution of the original problem.
  • Downhill simplex method:

    • Can be used when derivatives are not available, as long as the function is continuous;
    • It is a bit like bisection in high-dimensional space;
    • Also known as the amoeba method.
  • Gradient descent method;
  • Quasi-Newton method.
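
For reference, the Lagrangian method mentioned in this section can be written out in its standard textbook form. The symbols $f$, $g_i$, $h_j$, $\lambda_i$, $\nu_j$, $p^*$ and $d^*$ are generic notation chosen here, not taken from the book.

$$
\min_x f(x) \quad \text{s.t.} \quad g_i(x) \le 0, \quad h_j(x) = 0
$$

$$
L(x, \lambda, \nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x), \qquad \lambda_i \ge 0
$$

$$
d^* = \max_{\lambda \ge 0,\, \nu} \min_x L(x, \lambda, \nu) \;\le\; p^* = \min_x \max_{\lambda \ge 0,\, \nu} L(x, \lambda, \nu)
$$

The inequality is weak duality; strong duality means $d^* = p^*$. The KKT conditions at a candidate point $(x, \lambda, \nu)$ are:

$$
\nabla_x L(x, \lambda, \nu) = 0, \qquad g_i(x) \le 0, \qquad h_j(x) = 0, \qquad \lambda_i \ge 0, \qquad \lambda_i g_i(x) = 0
$$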

10.3 Statistical machine learning

  • Maximum entropy and exponential family distributions:

    • The maximum entropy solution is equivalent to the maximum likelihood solution of the corresponding exponential family distribution;
    • An exponential family distribution is unimodal, so it is not suitable for random variables driven by multiple factors.
  • Mixture models and the EM algorithm:

    • Address the unimodality of exponential family distributions;
    • Multiple exponential family distributions are superimposed into a mixture model.
  • Bayesian learning;
  • Deep learning: CNN, RNN, GAN.

Chapter 11 Core technology of contract advertising

11.1 Ad scheduling system

  • CPT, non-personalized;
  • Fallback ad (to prevent blank slots): the default ad shown when an ad fails to load.

11.2 Guaranteed delivery system

  • Traffic forecasting: use historical data to fit future traffic;
  • Frequency control: the more times a user sees the same ad, the lower the click-through rate; it is implemented by recording per-user frequency in a database.
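
A minimal sketch of frequency control as described above, assuming a per-(user, ad) impression counter. The in-memory dictionary stands in for the database, and the class name `FrequencyCap` and the cap of 3 are hypothetical choices for illustration.

```python
from collections import defaultdict


class FrequencyCap:
    """In-memory stand-in for the database that stores per-user counts."""

    def __init__(self, max_impressions):
        self.max_impressions = max_impressions
        self.counts = defaultdict(int)  # (user_id, ad_id) -> impressions

    def allow(self, user_id, ad_id):
        # The ad stays eligible until the user has seen it max_impressions times.
        return self.counts[(user_id, ad_id)] < self.max_impressions

    def record(self, user_id, ad_id):
        self.counts[(user_id, ad_id)] += 1


cap = FrequencyCap(max_impressions=3)
shown = 0
for _ in range(5):
    if cap.allow("u1", "ad42"):
        cap.record("u1", "ad42")
        shown += 1
```

A production system would also expire counts after a time window, which this sketch leaves out.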

11.3 Online allocation

  • Modeled as a bipartite graph matching problem; to make it solvable, the book assumes that ad traffic is approximately the same in each period.

Chapter 12 Core technology of audience targeting

12.1 Classification of audience targeting techniques

  • User tag t(u): demographic targeting and behavioral targeting;
  • Context tag t(c): geo targeting, channel targeting, and contextual targeting;
  • Custom tag t(a, u): a tag assigned by a specific advertiser to a specific user; retargeting and new-customer recommendation.

12.2 Contextual targeting

  • Semi-online crawling system: contextual targeting needs the page content, but crawling in real time adds too much latency, and crawling the whole web costs too much. The usual solution: when a request arrives for the context tags of a page that is not yet cached, the page is queued for crawling and the current ad is served without context tags; once the tags are in the cache, later requests for the same page can use them.
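
The semi-online idea can be sketched as a cache with a deferred crawl queue. Everything named here (`ContextTagCache`, `fetch_tags`, `run_crawler`) is hypothetical; `fetch_tags` stands for whatever crawler and tagging step a real system would use.

```python
class ContextTagCache:
    """Serve cached context tags; crawl unknown pages off the request path."""

    def __init__(self, fetch_tags):
        self.fetch_tags = fetch_tags  # hypothetical crawler + tagger callable
        self.cache = {}
        self.pending = []

    def lookup(self, url):
        if url in self.cache:
            return self.cache[url]
        if url not in self.pending:
            self.pending.append(url)  # crawl later, not during this request
        return None  # this request is served without context tags

    def run_crawler(self):
        # Background job: crawl queued pages and fill the cache.
        while self.pending:
            url = self.pending.pop(0)
            self.cache[url] = self.fetch_tags(url)


tags = ContextTagCache(fetch_tags=lambda url: ["sports"])
first = tags.lookup("http://example.com/page")   # cache miss
tags.run_crawler()
second = tags.lookup("http://example.com/page")  # cache hit
```

The first request pays no crawling latency and simply goes without context tags; only later requests for the same page benefit.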

12.3 Text topic mining

  • Topic model
  • LSA (latent semantic analysis): unsupervised; applies SVD to the TF-IDF matrix, similar to PCA.
  • PLSI (probabilistic latent semantic indexing): assume there are $K$ topics ($z_1, z_2, \ldots, z_K$) and model $p(w_n \mid z, \beta)$ with a mixture of $K$ multinomial distributions, where $\beta$ are the parameters ($K$ groups, one per topic) and $w_n$ is each word in the document; the mixture model is solved with EM.
  • LDA (latent Dirichlet allocation): builds on PLSI by introducing a Bayesian prior to smooth the estimates when data is sparse.
  • word2vec.

12.4 Behavioral targeting

  • Modeling: a Poisson distribution models the number of clicks by a user on ads of a given targeting category; a linear model links the Poisson parameter $\lambda$ to user behavior; the whole model is equivalent to a generalized linear model with a Poisson distribution.
  • Features: user behaviors are mapped onto a predefined tag taxonomy and expressed as accumulated intensity per unit of time; a moving average gives the mean over a time window. Two further considerations:

    • Training set length: to remove weekly periodicity, the number of days in the training set is usually a multiple of 7;
    • Time window size: use a narrower time window if the system should react faster.
  • Decision making: because the whole model is linear, today’s score can be obtained as a moving average of the scores of previous days.
  • Evaluation: reach/CTR curve.
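
A rough sketch of the feature and modeling steps above: a sliding window over daily behavior counts, and a linear link from those features to the Poisson parameter $\lambda$. The feature names, weights, and daily counts are made up for illustration; in practice the weights would be fitted from click logs.

```python
from collections import deque


class SlidingWindowFeature:
    """Keeps the last `window` daily counts for one (user, tag) pair."""

    def __init__(self, window):
        self.days = deque(maxlen=window)

    def add_day(self, count):
        self.days.append(count)  # the oldest day drops out automatically

    def average(self):
        return sum(self.days) / len(self.days) if self.days else 0.0


def poisson_rate(weights, features):
    # Linear link between behavior features and the Poisson parameter lambda.
    return sum(weights[name] * value for name, value in features.items())


searches = SlidingWindowFeature(window=7)  # 7 days covers one weekly cycle
for daily_count in [0, 2, 1, 0, 3, 1, 0, 7]:
    searches.add_day(daily_count)

weights = {"search_sports": 0.05, "view_sports_page": 0.02}  # hypothetical
features = {"search_sports": searches.average(), "view_sports_page": 4.0}
lam = poisson_rate(weights, features)  # expected clicks on sports-targeted ads
```

A narrower `window` makes `average()` react faster to recent behavior, at the cost of noisier features.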

12.5 Demographic attribute prediction

  • Multi-class classification problems in machine learning: gender, age, education level, income level.

12.6 Data management platform

Chapter 13 Core technology of auction-based advertising

13.1 Pricing algorithms of auction-based advertising

  • GSP: generalized second price;
  • MRP: market reserve price (the final price cannot be lower than this price);
  • Price squashing factor: controls the relative influence of click-through rate and bid on the final ranking.
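
The three pricing elements above can be combined in one small sketch. It assumes a ranking score of `bid * ctr ** kappa`, a common way to write price squashing; the notes do not give the formula, so treat that form, the function name `gsp_auction`, and the sample bids as assumptions.

```python
def gsp_auction(bids, slots, reserve_price, kappa=1.0):
    """bids: list of (ad_id, bid_per_click, predicted_ctr) tuples.

    Returns (ad_id, price_per_click) for each filled slot, best slot first.
    """
    # MRP: ads bidding below the reserve price do not take part.
    eligible = [b for b in bids if b[1] >= reserve_price]
    # Squashing: kappa = 0 ranks by bid alone, larger kappa favors high CTR.
    ranked = sorted(eligible, key=lambda b: b[1] * b[2] ** kappa, reverse=True)
    results = []
    for i, (ad_id, bid, ctr) in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            _, next_bid, next_ctr = ranked[i + 1]
            # GSP: lowest bid that still matches the next ad's ranking score.
            price = next_bid * next_ctr ** kappa / ctr ** kappa
        else:
            price = reserve_price
        results.append((ad_id, max(price, reserve_price)))
    return results


bids = [("a", 2.0, 0.10), ("b", 3.0, 0.05), ("c", 1.0, 0.08), ("d", 0.2, 0.50)]
by_ecpm = gsp_auction(bids, slots=2, reserve_price=0.5)            # kappa = 1
by_bid = gsp_auction(bids, slots=2, reserve_price=0.5, kappa=0.0)  # bid only
```

With `kappa=1.0` ad "a" wins the first slot on the strength of its click-through rate; with `kappa=0.0` the higher bidder "b" wins it instead. Ad "d" is excluded in both cases because it bids below the reserve price.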

13.2 Search advertising system

  • Query expansion:

    • Recommendation-based methods: collaborative filtering;
    • Topic-model-based methods: topic models;
    • Methods based on historical performance: reuse expansions with proven results in historical data.
  • Ad placement: maximize overall ad revenue subject to a constraint on the total number of ads shown in the north (top) region over a period of time.

13.3 Ad network

  • Short-term behavior feedback and stream computing:

    • Real-time anti-cheating;
    • Real-time billing: ads that have run out of budget are taken offline promptly;
    • Short-term user tags;
    • Short-term dynamic features: the dynamic features used in CTR prediction.

13.4 Ad retrieval

  • Boolean expression retrieval;
  • Relevance retrieval: the WAND algorithm, TF-IDF relevance scoring + a min-heap for fast top-k retrieval;
  • DNN-based semantic modeling: DSSM, the YouTube personalized recommendation model;
  • Approximate nearest neighbor (ANN) semantic retrieval:

    • Hashing algorithms: locality-sensitive hashing (LSH);
    • Vector quantization algorithms: hierarchical k-means tree (HKM tree);
    • Graph-based algorithms: NSW.
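
The min-heap part of relevance retrieval can be illustrated on its own. The sketch below is not WAND itself: it only shows how a size-k min-heap keeps the best k candidates, with the heap minimum acting as the score threshold a new document must beat. The function name and sample scores are hypothetical.

```python
import heapq


def top_k(scored_docs, k):
    """Return the k highest-scoring (doc_id, score) pairs, best first."""
    heap = []  # min-heap of (score, doc_id); heap[0] holds the threshold
    for doc_id, score in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # Better than the current k-th best: replace the minimum.
            heapq.heapreplace(heap, (score, doc_id))
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]


best = top_k([("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.7)], k=2)
```

WAND uses the same threshold together with per-term score upper bounds to skip documents that cannot enter the top k, so they are never fully scored.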

Chapter 14 Click-through rate prediction model

  • Click-through rate prediction is modeled as a “regression problem” rather than a “ranking problem”, because the predicted click-through rate is used to compute the eCPM for bidding.

14.1 Click-through rate prediction

  • Basic model: logistic regression;
  • Optimization algorithms: L-BFGS, trust region method;
  • Correction: for the imbalance between positive and negative samples;
  • Features:

    • Nonlinear feature transforms: bucketing, square, log, square root;
    • Feature combination;
    • Dynamic features: the historical click-through rate of a feature combination;
    • Bias and COEC (click on expected click):

      • Cause: for example, ad position biases the click-through rate; the top and bottom ad positions have very different click-through rates;
      • Solution: estimate the expected clicks (EC) for each ad position; COEC = actual clicks / expected clicks, summed over the impressions in question;
      • Common bias sources: ad position, ad size, ad latency, date and time, browser.
    • Smoothing: handles missing features and smooths statistics when a feature has too few samples;
    • Evaluation: ROC;
    • Intelligent frequency control: EC counts or frequency counts are added to the model as features to suppress the serving of ads already shown at high frequency.
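
A small worked example of COEC, assuming that the expected click-through rate of each ad position has already been estimated. The position names and rates are invented for illustration.

```python
def coec(impressions, expected_ctr_by_position):
    """impressions: list of (position, clicked) pairs for one ad or feature."""
    clicks = sum(1 for _, clicked in impressions if clicked)
    # Expected clicks: sum of the position's expected CTR over all impressions.
    expected = sum(expected_ctr_by_position[pos] for pos, _ in impressions)
    return clicks / expected if expected else 0.0


expected_ctr = {"top": 0.10, "bottom": 0.01}  # hypothetical position priors

# Ad A: 10 impressions in the top slot, 1 click (raw CTR 10%).
ad_a = [("top", i == 0) for i in range(10)]
# Ad B: 100 impressions in the bottom slot, 2 clicks (raw CTR 2%).
ad_b = [("bottom", i < 2) for i in range(100)]

coec_a = coec(ad_a, expected_ctr)
coec_b = coec(ad_b, expected_ctr)
```

Ad A has the higher raw click-through rate, but its COEC is about 1 (it performs as its position predicts), while ad B's COEC is about 2, so B is the stronger ad once position bias is removed.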

14.2 Other click-through rate models

  • Factorization machine (FM);
  • GBDT;
  • Deep learning click-through rate models.

14.3 Exploration and exploitation

  • If the system always serves the ad that currently looks best, the statistics of long-tail ads are never sampled accurately.
  • Reinforcement learning: part of the traffic is used for exploration and exploitation (E&E), modeled as a multi-armed bandit (MAB);

    • UCB (upper confidence bound) method: rather than simply choosing the ad with the best empirical estimate, it accounts for the uncertainty of that estimate and selects the ad with the largest upper bound;
    • Contextual bandit: LinUCB.
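
The UCB idea can be sketched with the classic UCB1 rule, where the upper bound is the empirical mean plus $\sqrt{2 \ln t / n}$. UCB1 is used here as one well-known instance; the notes do not say which variant the book presents. The reward rates are hypothetical and deliberately exaggerated so that a short simulation separates the arms.

```python
import math
import random


class UCB1:
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.rewards = [0.0] * n_arms

    def select(self):
        # Play every arm once before relying on the confidence bound.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)

        def upper_bound(arm):
            mean = self.rewards[arm] / self.counts[arm]
            bonus = math.sqrt(2 * math.log(total) / self.counts[arm])
            return mean + bonus  # rarely shown ads get a larger bonus

        return max(range(len(self.counts)), key=upper_bound)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.rewards[arm] += reward


rng = random.Random(0)
true_rate = [0.1, 0.3, 0.6]  # hypothetical, unknown to the policy
policy = UCB1(len(true_rate))
for _ in range(5000):
    arm = policy.select()
    clicked = rng.random() < true_rate[arm]
    policy.update(arm, 1.0 if clicked else 0.0)
```

After the run, `policy.counts` shows most traffic going to the best arm while the other arms still receive occasional exploratory impressions.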

Chapter 15 Core technology of programmatic trading

15.1 Ad exchange

  • Cookie mapping;
  • Bid request optimization: each bid request is sent only to the DSPs that are likely to win, which reduces server load; however, care is needed so that no DSP is starved of traffic entirely.

15.2 Demand-side platform

  • Custom user tags;
  • Click-through rate calibration: COPC (click on predicted click), the ratio of actual clicks to predicted clicks, is used to correct overestimation and underestimation of the click-through rate;
  • Click value estimation: used with CPS / CPA / ROI settlement; click value = arrival rate * conversion rate * value per conversion;
  • Bidding strategy: take budget constraints into account.
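
The COPC and click value formulas above can be combined into a simple bid calculation. This sketch assumes the DSP bids its calibrated eCPM and ignores budget constraints; that bidding rule and all numbers are assumptions for illustration, not the book's strategy.

```python
def copc(actual_clicks, predicted_ctrs):
    """Ratio of observed clicks to the sum of predicted click probabilities."""
    predicted = sum(predicted_ctrs)
    return actual_clicks / predicted if predicted else 1.0


def click_value(arrival_rate, conversion_rate, value_per_conversion):
    # click value = arrival rate * conversion rate * value per conversion
    return arrival_rate * conversion_rate * value_per_conversion


def bid_cpm(predicted_ctr, calibration, value_per_click):
    # eCPM = calibrated CTR * click value * 1000 impressions
    return predicted_ctr * calibration * value_per_click * 1000


# The model predicted 2% CTR on 1000 impressions, but only 15 clicks occurred.
calibration = copc(actual_clicks=15, predicted_ctrs=[0.02] * 1000)
value = click_value(arrival_rate=0.8, conversion_rate=0.05,
                    value_per_conversion=40.0)
bid = bid_cpm(predicted_ctr=0.02, calibration=calibration,
              value_per_click=value)
```

A COPC below 1 means the model overestimates click-through rate, so the bid is scaled down accordingly; here the calibration factor is about 0.75.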

15.3 Supply-side platform

  • Network optimization: dynamically decide which ad network each ad request is sent to.

Chapter 16 Other advertising-related technologies

16.1 Creative optimization

  • Programmatic creatives: geo-based creatives (for example, putting the contact number for the user’s region into the ad), search retargeting creatives (for example, putting the user’s past search terms into the ad’s search box), personalized retargeting creatives (for example, Taobao’s retargeting ads are all generated in real time);
  • Click heat map: count which areas of a creative users click most often;
  • Trends: video, interactive, and carrying more information.

16.2 Experimental framework

  • Buckets are split by user rather than by random request, because multiple ad impressions shown to the same user are correlated.
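
One common way to split buckets by user is to hash the user ID together with an experiment name, so the same user always lands in the same bucket. The function name, the use of MD5, and the 100-bucket layout are illustrative assumptions.

```python
import hashlib


def bucket_for(user_id, experiment, n_buckets=100):
    """Deterministically map a user to a bucket for a given experiment."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.md5(key).hexdigest()
    return int(digest, 16) % n_buckets


# The same user gets the same bucket on every request of one experiment.
b1 = bucket_for("user-123", "ctr-model-v2")
b2 = bucket_for("user-123", "ctr-model-v2")
in_treatment = b1 < 10  # for example, buckets 0-9 receive the new model
```

Including the experiment name in the hash key keeps bucket assignments in different experiments independent of each other.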

16.3 Ad monitoring and attribution

  • Ad monitoring: a third-party company is engaged to verify the actual numbers of impressions and clicks;
  • Ad safety: some ads have negative effects when they run on certain media (such as vulgar media); some media pass off low-quality traffic as high-quality; it is also necessary to confirm that the browser actually rendered the ad (that is, the ad was really visible to the user);
  • Attribution: for ads settled by CPA / CPS / ROI, it is necessary to verify the conversion data and determine which media channel the converted user came from.

16.4 Cheating and anti-cheating

  • Classification of cheating methods:

    • By cheating party:

      • Media cheating: the media fabricates clicks;
      • Ad platform cheating: an ADN or ADX fabricates clicks; a DSP fabricates clicks, impressions, and conversions;
      • Cheating by an advertiser’s competitors: draining the advertiser’s budget;
    • By cheating principle:

      • Non-human traffic (NHT): fabricated impressions, clicks, and conversions;
      • Attribution cheating: claiming credit for conversions that others brought in;
    • By cheating means: machine cheating, human cheating.
  • Common cheating methods:

    • Server-side firing of tracking code: crawlers visit the web page so that advertisers believe exposure is high;
    • Client-side firing of tracking code: when a user visits the web page, a page script makes the user’s browser issue several extra visits in the background, so that the advertiser overestimates exposure;
    • Frequent change of user identity: used in combination with the two methods above;
    • Compromised devices: a hacker controls infected devices to visit web pages in the background;
    • Traffic hijacking;
    • Cookie stuffing: for example, while a user is shopping on Taobao, the redirect address is modified so that Taobao mistakenly believes the user arrived by clicking an ad on a third-party website;
    • IP masking: the cheater hides when an anti-cheating check is detected;
    • Click spamming and click injection.

16.5 Practical technology selection for products