By logm
This article was originally published at https://segmentfault.com/u/logm/articles and may not be reproduced.
These are reading notes on Computational Advertising (Second Edition).
This part covers the key technologies of online advertising and is aimed at engineers.
Chapter 9 overview of computational advertising technology
9.1 personalization system framework
 Logs -> data highway -> stream computing -> online features -> ad serving engine;
 Logs -> data highway -> distributed computing -> offline features -> ad serving engine.
9.2 optimization objectives of various advertising systems
 GD (guaranteed delivery): meet contractual requirements;
 ADN (ad network): CPC, estimated click-through rate;
 ADX (ad exchange): CPM;
 DSP: estimated click-through rate + click value.
9.3 computational advertising system architecture

Advertising engine:
 Ad server: retrieval + ranking + yield management, with strict QPS and latency requirements;
 Ad retrieval: recall the candidate set according to user tags and page tags;
 Ad ranking: compute eCPM from the estimated click-through rate and click value, then sort;
 Yield management: aim for globally optimal revenue;
 Ad request interface: web request or SDK;
 Custom audience segmentation: advertisers' own segmentation data about users;
 Data highway.
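The ad ranking step can be sketched in a few lines. This is only an illustration: the `Candidate` fields, the sample numbers, and the convention eCPM = 1000 * CTR * click value are assumptions made for the example, not code from the book.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    ad_id: str
    pctr: float         # estimated click-through rate (hypothetical model output)
    click_value: float  # value of one click, e.g. the advertiser's CPC bid


def ecpm(c: Candidate) -> float:
    """Expected revenue per thousand impressions."""
    return 1000.0 * c.pctr * c.click_value


def rank(candidates, top_k=2):
    """Sort recalled candidates by eCPM and keep the best top_k."""
    return sorted(candidates, key=ecpm, reverse=True)[:top_k]


ads = [
    Candidate("a", 0.010, 2.0),
    Candidate("b", 0.030, 0.5),
    Candidate("c", 0.002, 5.0),
]
top = rank(ads)  # "a" ranks first even though "b" has the highest CTR
```

Ranking by eCPM rather than by CTR or bid alone is what lets CPC-priced ads compete on the expected revenue of an impression.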

Offline data processing:
 User session log generation: logs are consolidated and sorted by user ID;
 Behavioral targeting: mine user logs and tag users;
 Contextual targeting: tag the context pages;
 Click-through rate modeling: prepare features for the CTR model;
 Allocation planning: mine suitable allocation plans from logs with the goal of global revenue optimization;
 Business intelligence (BI) system: provides data for decision makers;
 Ad management system: the tool used by ad operators (AE) to manage delivery plans.

Online data processing:
 Online anti-cheating: filter out fraudulent traffic;
 Billing;
 Online behavior feedback: real-time audience targeting, real-time click feedback;
 Real-time index: receive ad data in real time and update the index.
9.4 main technologies of computational advertising systems

Algorithm optimization:
 Audience targeting;
 eCPM prediction, click-through rate prediction;
 Online allocation (meeting the traffic requirements in contracts);
 Pricing strategy: maximize profit in a game-theoretic setting;
 Explore and exploit (E&E): sample more comprehensively;
 Personalized recommendation.

Architecture optimization:
 Real time index;
 NoSQL database;
 Distributed computing + stream computing;
 High concurrency and low latency bidding interface.
9.5 open source tools
 Nginx: Web server;
 Zookeeper: distributed cluster management;
 Lucene: index + retrieval;
 Thrift: cross language communication, used to encapsulate the interface of each module;
 Flume: data highway;
 Hadoop: distributed data processing;
 Redis: online feature cache, a NoSQL database;
 Storm: stream computing;
 Spark: supports multiple computing paradigms, including iterative computing, batch processing, stream computing, graph computing, and SQL relational queries.
Chapter 10 basic knowledge preparation
10.1 information retrieval

Inverted index:
 Basic operation: add document to index; given query, return corresponding collection
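A minimal in-memory sketch of the two basic operations follows. The class name, the whitespace tokenizer, and the AND semantics for multi-term queries are assumptions made for illustration; a production system would use something like Lucene.

```python
from collections import defaultdict


class InvertedIndex:
    """Maps each term to the set of document IDs that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_document(self, doc_id, text):
        # Naive whitespace tokenization, lowercased.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def query(self, text):
        """Return IDs of documents that contain every query term."""
        terms = text.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add_document(1, "red running shoes")
index.add_document(2, "blue running jacket")
matches = index.query("running shoes")
```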

Vector space model (VSM):
 Build vectors with TF-IDF and compare them by cosine distance.
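As a rough sketch of the vector space model, the snippet below builds sparse TF-IDF vectors and compares them with cosine similarity. The weighting used here (term frequency times log(N / df)) is one common variant chosen for the example; the notes do not say which variant the book uses.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for tokens in tokenized for t in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "cheap flights to paris",
    "cheap hotels in paris",
    "python tutorial for beginners",
]
vecs = tfidf_vectors(docs)
related = cosine(vecs[0], vecs[1])    # shares "cheap" and "paris"
unrelated = cosine(vecs[0], vecs[2])  # no shared terms
```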
10.2 optimization method

Lagrangian method: optimization with constraints
 The optimal value of the dual problem is always a lower bound on the optimal value of the original problem (weak duality); when the original problem is convex and a constraint qualification such as Slater's condition holds, strong duality holds, that is, the two optimal values are equal;
 For a convex problem with strong duality, a point satisfying the KKT conditions is an optimal solution of the original problem.

Downhill simplex method:
 Usable when derivatives are unavailable, as long as the function is continuous;
 Somewhat like bisection in a high-dimensional space;
 Also known as the amoeba method or the Nelder-Mead method.
 Gradient descent method;
 Quasi Newton method.
10.3 statistical machine learning

Maximum entropy and exponential family distribution:
 The maximum entropy solution is equivalent to the maximum likelihood solution of the corresponding exponential family distribution;
 Exponential family distributions are unimodal, so they are not suitable for random variables influenced by multiple factors.

Mixture model and EM algorithm:
 Address the unimodality problem of exponential family distributions;
 Multiple exponential family distributions are superimposed into a mixture model.
 Bayesian learning;
 Deep learning: CNN, RNN, GAN.
Chapter 11 core technology of contract advertising
11.1 advertisement scheduling system
 CPT, non-personalized;
 Fallback ad (to avoid blank slots): the default ad shown when an ad fails to load.
11.2 guaranteed delivery system
 Traffic forecast: use historical data to fit future traffic;
 Frequency capping: the more times a user sees the same ad, the lower the click-through rate; it is implemented by recording per-user impression counts in a database.
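Frequency capping can be illustrated with a small counter keyed by (user, ad). In this sketch an in-memory dictionary stands in for the database mentioned above, and the class and method names are hypothetical.

```python
from collections import defaultdict


class FrequencyCapper:
    """Blocks an ad once a user has seen it max_impressions times."""

    def __init__(self, max_impressions):
        self.max_impressions = max_impressions
        # Stand-in for the database: (user_id, ad_id) -> impression count.
        self.counts = defaultdict(int)

    def allowed(self, user_id, ad_id):
        return self.counts[(user_id, ad_id)] < self.max_impressions

    def record_impression(self, user_id, ad_id):
        self.counts[(user_id, ad_id)] += 1


capper = FrequencyCapper(max_impressions=2)
shown = []
for _ in range(3):
    if capper.allowed("u1", "ad42"):
        capper.record_impression("u1", "ad42")
        shown.append(True)
    else:
        shown.append(False)
```

A real deployment would also need the counts to expire, for example per day, which this sketch leaves out.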
11.3 online allocation
 Abstracted as a bipartite graph matching problem; to solve it, the assumption is introduced that ad traffic is approximately the same in each period.
Chapter 12 core technology of audience targeting
12.1 classification of audience targeting techniques
 User tag t(u): demographic targeting and behavioral targeting;
 Context tag t(c): geographic targeting, channel targeting and contextual targeting;
 Custom tag t(a, u): the tag a specific advertiser assigns to a specific user; retargeting and new customer recommendation.
12.2 contextual targeting
 Semi-online crawling system: contextual targeting needs to crawl page content, but crawling in real time adds too much latency and crawling the whole web costs too much. The usual solution is to crawl a page the first time its context tag is requested and put the result in a cache; the ad for that request is served without the tag, and later requests for the same page get the tag from the cache.
12.3 text topic mining
 Topic model
 LSA (latent semantic analysis): unsupervised; SVD decomposition of the TF-IDF matrix, similar to PCA.
 PLSI (probabilistic latent semantic indexing): suppose there are $K$ topics ($z_1, z_2, \dots, z_K$); a mixture of $K$ multinomial distributions is used to model $p(w_n \mid z_k, \beta)$, where $\beta$ are the parameters ($K$ groups in total) and $w_n$ is each word in the document; the mixture model is solved with EM.
 LDA (latent Dirichlet allocation): on top of PLSI, a Bayesian prior is introduced to smooth estimates when data is scarce.
 word2vec.
12.4 behavioral targeting
 Modeling: a Poisson distribution models the number of clicks by a user on a given category of targeted ads; a linear model links the Poisson parameter $\lambda$ to user behavior; the whole model is equivalent to a generalized linear model with a Poisson distribution.

Features: user behavior is mapped to a predefined tag system and expressed as accumulated intensity per unit time; the average within a time window is computed with a sliding average. In addition, consider:
 Training set length: to eliminate the weekly periodicity caused by working days, the number of days in the training set is usually a multiple of 7;
 Time window size: if the system should react more quickly, use a narrower time window.
 Decision making: because the whole model is linear, today's score can be obtained from a sliding average of the scores of previous days.
 Evaluation: reach-CTR curve.
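The feature and scoring steps above might look roughly like the following. The behavior names, the weights, and the choice to treat missing days as zero are all assumptions made for illustration; in practice the weights would be fitted by Poisson regression on click logs.

```python
def window_intensity(daily_counts, window):
    """Average daily count over the most recent `window` days.

    Days missing from the history are treated as zero activity.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return sum(daily_counts[-window:]) / window


def poisson_rate(features, weights):
    """Linear link: lambda = sum_n w_n * x_n (weights assumed non-negative)."""
    return sum(weights[name] * x for name, x in features.items())


# One user's behavior toward a single tag over the last 7 days (made-up data).
history = {
    "search": [0, 2, 4, 6, 2, 0, 0],
    "page_view": [7, 7, 7, 7, 7, 7, 7],
}
weights = {"search": 0.05, "page_view": 0.01}  # hypothetical fitted weights

features = {name: window_intensity(c, window=7) for name, c in history.items()}
rate = poisson_rate(features, weights)  # expected clicks on ads with this tag
```

Shrinking `window` makes the feature follow recent behavior more closely, which matches the note on time window size.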
12.5 prediction of population attributes
 A multi-class classification problem in machine learning: gender, age, education level, income level.
12.6 data management platform
Chapter 13 core technology of auction-based advertising
13.1 pricing algorithms for auction-based advertising
 GSP: generalized second price;
 MRP: market reserve price (the final price cannot be lower than this price);
 Price squashing factor: controls the relative influence of click-through rate and bid on the final ranking.
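The three pricing ingredients above can be combined in a short sketch. It assumes the commonly used ranking score bid * CTR^kappa, where `kappa` plays the role of the squashing factor, and charges each winner the smallest bid that would keep its position, floored at the reserve price. The function name and tuple layout are hypothetical.

```python
def gsp_auction(bids, slots, reserve_price, kappa=1.0):
    """Run one GSP auction.

    bids: list of (ad_id, bid_per_click, pctr) with pctr > 0.
    Returns a list of (ad_id, price_per_click) for the winning slots.
    """
    # Bids below the reserve price do not take part.
    eligible = [b for b in bids if b[1] >= reserve_price]
    ranked = sorted(eligible, key=lambda b: b[1] * b[2] ** kappa, reverse=True)
    results = []
    for i, (ad_id, bid, pctr) in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            _, next_bid, next_pctr = ranked[i + 1]
            # Smallest bid that still outranks the next ad.
            price = next_bid * next_pctr ** kappa / pctr ** kappa
        else:
            price = reserve_price
        results.append((ad_id, max(price, reserve_price)))
    return results


bids = [("a", 2.0, 0.02), ("b", 3.0, 0.01), ("c", 1.0, 0.01), ("d", 0.1, 0.05)]
by_ecpm = gsp_auction(bids, slots=2, reserve_price=0.5, kappa=1.0)
by_bid = gsp_auction(bids, slots=2, reserve_price=0.5, kappa=0.0)
```

With `kappa=1.0` the ranking follows bid times CTR, so "a" wins the first slot; with `kappa=0.0` CTR is ignored and the highest bidder "b" wins instead.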
13.2 search advertising system

Query expansion:
 Recommendation-based methods: collaborative filtering;
 Topic-model-based methods: topic models;
 Methods based on historical performance: the effect is obvious.
 Ad placement: maximize overall ad revenue subject to a limit on the total number of ads shown in the north (top) area over a period of time.
13.3 advertising network

Short-term behavior feedback and stream computing:
 Real-time anti-cheating;
 Real-time billing: ads that run out of budget are taken offline promptly;
 Short-term user tags;
 Short-term dynamic features: the dynamic features used in CTR prediction.
13.4 ad retrieval
 Boolean expression retrieval;
 Relevance retrieval: WAND algorithm, TF-IDF relevance scoring + min-heap for fast retrieval;
 DNN-based semantic modeling: DSSM, the YouTube personalized recommendation model;

Approximate nearest neighbor semantic retrieval (ANN):
 Hashing algorithms: locality-sensitive hashing (LSH);
 Vector quantization algorithms: hierarchical k-means tree (HKM tree);
 Graph based algorithm: NSW.
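To give a feel for the hashing family, here is a sketch of one LSH scheme, random hyperplane hashing for cosine similarity. The notes do not say which LSH variant the book describes, so this choice, the function names, and the parameters are assumptions.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        int(sum(p * x for p, x in zip(plane, vec)) >= 0.0) for plane in planes
    )


planes = make_hyperplanes(dim=4, n_bits=8)
v = [0.2, -1.0, 0.5, 0.3]
scaled = [2.0 * x for x in v]   # same direction as v
opposite = [-x for x in v]      # opposite direction

sig_v = signature(v, planes)
sig_scaled = signature(scaled, planes)
sig_opposite = signature(opposite, planes)
```

Vectors pointing in the same direction always share a signature, and the smaller the angle between two vectors, the more bits they are expected to share. Candidates are then looked up only in buckets whose signatures match or nearly match.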
Chapter 14 click-through rate prediction model
 Click-through rate prediction is modeled as a regression problem rather than a ranking problem, because the predicted click-through rate is used to estimate eCPM for bidding.
14.1 click through rate prediction
 Basic model: logistic regression;
 Optimization algorithms: L-BFGS, trust region method;
 Correction: for the imbalance between positive and negative samples;

Features:
 Nonlinear feature transforms: bucketing, square, log, square root;
 Feature combination;
 Dynamic feature: the historical click rate of a feature combination;

Bias and COEC (click on expected click):
 Cause: for example, the click-through rate bias caused by ad position; the click-through rates of the top and bottom ad slots differ greatly;
 Solution: estimate the expected click-through rate (EC) of each ad slot; COEC = actual clicks / expected clicks, where expected clicks are the sum of the slot EC over the ad's impressions.
 Common bias factors: ad position, ad size, ad latency, date and time, browser.
 Smoothing: handle missing features and apply statistical smoothing when a feature has too few samples;
 Evaluation: ROC;
 Intelligent frequency control: EC counts or frequency counts are added to the model as features to suppress the delivery of high-frequency ads.
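A small worked example shows why COEC is preferred over raw click-through rate as a dynamic feature. It uses the usual definition, actual clicks divided by the sum of the expected click-through rates of the slots where the ad was shown. The slot names and EC values are made up.

```python
def coec(impressions, expected_ctr_by_slot):
    """Click on expected click.

    impressions: iterable of (slot, clicked) pairs for one ad.
    expected_ctr_by_slot: slot -> expected CTR (EC) of that slot.
    """
    clicks = 0
    expected_clicks = 0.0
    for slot, clicked in impressions:
        clicks += int(clicked)
        expected_clicks += expected_ctr_by_slot[slot]
    return clicks / expected_clicks if expected_clicks > 0 else 0.0


ec = {"top": 0.05, "bottom": 0.01}  # hypothetical per-slot expected CTRs

# Ad A: 100 impressions in the top slot, 5 clicks (raw CTR 5%).
ad_a = [("top", i < 5) for i in range(100)]
# Ad B: 100 impressions in the bottom slot, 2 clicks (raw CTR 2%).
ad_b = [("bottom", i < 2) for i in range(100)]

coec_a = coec(ad_a, ec)  # about 1.0: performs as the slot predicts
coec_b = coec(ad_b, ec)  # about 2.0: twice as good as the slot predicts
```

Ad B has the lower raw click-through rate but the higher COEC, because its clicks came from a much weaker position.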
14.2 other click through rate models
 Factorization machine (FM);
 GBDT；
 Deep learning click through rate model.
14.3 exploration and exploitation
 If the ad that currently looks best is always served, the statistics of long-tail ads are sampled too sparsely to be estimated accurately.

Reinforcement learning: part of the traffic is used for exploration and exploitation (E&E); multi-armed bandit (MAB);
 UCB (Upper Confidence Bound) method: rather than simply choosing the ad with the best empirical estimate, it accounts for the uncertainty of the estimate and selects the ad with the largest upper confidence bound;
 Contextual bandit: LinUCB.
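The UCB idea can be sketched with the classic UCB1 rule, where each ad's score is its empirical mean reward plus sqrt(2 ln t / n). Using UCB1 specifically is an assumption for the example; the notes only name the general UCB method. Each arm is an ad and the reward is 1 for a click, 0 otherwise.

```python
import math


class UCB1:
    """Multi-armed bandit that picks the arm with the largest upper bound."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms     # times each arm was played
        self.rewards = [0.0] * n_arms  # total reward per arm

    def select(self):
        # Play every arm once before trusting any estimate.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)

        def upper_bound(arm):
            mean = self.rewards[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(total) / self.counts[arm])
            return mean + bonus

        return max(range(len(self.counts)), key=upper_bound)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.rewards[arm] += reward


bandit = UCB1(n_arms=2)
first = bandit.select()
bandit.update(first, 1.0)   # ad 0 was clicked
second = bandit.select()
bandit.update(second, 0.0)  # ad 1 was not clicked
third = bandit.select()     # equal bonuses, so the higher mean wins
```

The bonus term shrinks as an ad accumulates impressions, so rarely shown long-tail ads keep getting occasional traffic until their estimates are reliable.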
Chapter 15 core technology of programmatic trading
15.1 ad exchange
 Cookie mapping;
 Bid request optimization: each bid request is sent only to the DSPs that are likely to win, to reduce server load; however, care is needed to avoid starving some DSPs of traffic entirely.
15.2 demand-side platform
 Custom user tags;
 Click on predicted click (COPC): add COPC, the ratio of actual clicks to predicted clicks, to correct overestimation and underestimation of the click-through rate;
 Click value estimation: used in CPS / CPA / ROI settlement, click value = arrival rate * conversion rate * conversion unit price;
 Bidding strategy: take budget constraints into account.
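The two formulas above are easy to put into code. The click value and COPC functions follow the definitions in the notes; `calibrated_bid`, which scales the predicted click-through rate by COPC before bidding, is only one plausible way to apply the correction and is an assumption of this sketch, as are all the numbers.

```python
def click_value(arrival_rate, conversion_rate, conversion_price):
    """Click value = arrival rate * conversion rate * conversion unit price."""
    return arrival_rate * conversion_rate * conversion_price


def copc(actual_clicks, predicted_ctrs):
    """Ratio of actual clicks to the clicks the model predicted."""
    predicted_clicks = sum(predicted_ctrs)
    if predicted_clicks <= 0:
        raise ValueError("predicted clicks must be positive")
    return actual_clicks / predicted_clicks


def calibrated_bid(pctr, value, copc_ratio):
    """Per-impression bid after correcting the predicted CTR (assumed usage)."""
    return pctr * copc_ratio * value


value = click_value(arrival_rate=0.8, conversion_rate=0.05, conversion_price=100.0)
ratio = copc(actual_clicks=30, predicted_ctrs=[0.02] * 1000)  # model underestimates
bid = calibrated_bid(pctr=0.02, value=value, copc_ratio=ratio)
```

A COPC above 1 means the model underestimates clicks, and a value below 1 means it overestimates them.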
15.3 supply-side platform
 Network optimization: dynamically decide which ad network to send each ad request to.
Chapter 16 other advertising related technologies
16.1 creative optimization
 Programmatic creatives: geographic creatives (such as putting the contact number for the user's region into the ad), search retargeting creatives (such as putting the user's historical search terms into the ad's search box), personalized retargeting creatives (such as Taobao's retargeting ads, which are all generated in real time);
 Click heat map: count which areas of a creative users click most;
 Development trends: video, interactivity, carrying more information.
16.2 experimental framework
 Buckets are split by user rather than randomly per impression, because multiple ad impressions for the same user are correlated.
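One common way to split buckets by user is to hash the user ID together with an experiment name, so the same user always lands in the same bucket within an experiment. The function name, the use of MD5, and the bucket count below are illustrative assumptions.

```python
import hashlib


def bucket_for_user(user_id, experiment, n_buckets=100):
    """Deterministically map a user to a bucket in [0, n_buckets)."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.md5(key).hexdigest()
    return int(digest, 16) % n_buckets


# Every request from the same user sees the same treatment.
b1 = bucket_for_user("user-123", "ctr-model-v2")
b2 = bucket_for_user("user-123", "ctr-model-v2")
in_treatment = b1 < 10  # e.g. route buckets 0-9 (about 10% of users) to the new model
```

Including the experiment name in the hash key keeps bucket assignments in different experiments independent of each other.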
16.3 advertising monitoring and attribution
 Ad monitoring: a third-party company is commissioned to verify the actual numbers of impressions and clicks;
 Ad safety: some ads have a negative effect when placed on certain media (such as vulgar media); media may disguise low-quality traffic; it is necessary to confirm that the ad was actually rendered in the browser (that is, the ad was really visible to the user);
 Attribution: for ads settled by CPA / CPS / ROI, it is necessary to confirm that the conversion data is correct and determine which media the converted user came from.
16.4 cheating and anti cheating

Classification of cheating methods:

Cheating subject:
 Media cheating: the media fabricates clicks;
 Ad platform cheating: an ADN or ADX fabricates clicks; a DSP fabricates clicks, impressions and conversions;
 Cheating by the advertiser's competitors: draining the advertiser's budget;

Cheating principle:
 Non-human traffic (NHT): fabricate impressions, clicks and conversions;
 Attribution cheating: claim credit for conversions brought by others;
 Cheating means: machine cheating, manual cheating.


Common cheating methods:
 Server-side firing of tracking code: crawlers visit the web page so that advertisers believe the exposure is large;
 Client-side firing of tracking code: when a user visits the web page, a page script makes the user's browser automatically request the page several more times in the background, so that the advertiser mistakenly believes the exposure is large;
 Frequent change of user identity: combined with the two methods above;
 Hacked devices: hackers control infected devices to visit web pages in the background;
 Traffic hijacking;
 Cookie stuffing: for example, while the user is shopping on Taobao, the redirect address is modified so that Taobao mistakenly believes the user arrived by clicking an ad on a third-party website;
 IP cloaking: hide when anti-cheating inspection is detected;
 Click spamming and click injection.