[kafka KSQL] Game Log Statistical Analysis (1)

Time: 2019-06-19

Taking a game settlement log as an example, this article walks through the process of statistically analyzing logs with KSQL.

Start Confluent

cd ~/Documents/install/confluent-5.0.1/

bin/confluent start
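
If anything fails to come up, you can check the status of each component (standard Confluent CLI usage; the exact output varies by version):

bin/confluent status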

View the list of Kafka topics

bin/kafka-topics --list --zookeeper localhost:2181

Create a topic to receive the game settlement log

bin/kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 4 --topic score-normalized
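
To confirm the topic was created with the expected partition count and replication factor (standard kafka-topics usage):

bin/kafka-topics --describe --zookeeper localhost:2181 --topic score-normalized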

Use the producer command-line tool to write a log entry to the topic

bin/kafka-console-producer --broker-list localhost:9092 --topic score-normalized

> 

{"cost":7,""epoch": 1512342568296,"gameId":"2017-12-04_07:09:0909:28_master 1 area _200_015_185175","gameType":"situan","gamers": [[{{"balance": 4405682", "delta":-60","username""","0791754000"}, {"balance": 6953532", "delta":-60","username"" "" "" "delta" -60","username"" "" "" "" "708379797979 79 79 79 79 99"}, {{{"balance""{{"balance"""::::::::::username": "abc 6378303"}, {"balance": 23129,""delta": 180," "username", "a137671268"},""reason" ":" xiayu"}

Use the consumer command-line tool to verify that the log was written correctly

bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic score-normalized --from-beginning

You can see:

{"cost":7,""epoch": 1512342568296,"gameId":"2017-12-04_07:09:0909:28_master 1 area _200_015_185175","gameType":"situan","gamers": [[{{"balance": 4405682", "delta":-60","username""","0791754000"}, {"balance": 6953532", "delta":-60","username"" "" "" "delta" -60","username"" "" "" "" "708379797979 79 79 79 79 99"}, {{{"balance""{{"balance"""::::::::::username": "abc 6378303"}, {"balance": 23129,""delta": 180," "username", "a137671268"},""reason" ":" xiayu"}

Start KSQL Client

bin/ksql http://localhost:8088

After KSQL starts, you can see the KSQL banner and an interactive terminal prompt.

View the list of Kafka topics from the KSQL terminal

ksql> show topics;

Print messages in topic

PRINT 'score-normalized';

As you can see:

Format:STRING
19-1-5 11:59:31 PM , NULL , {"cost":7,"epoch":1512342568296,"gameId":"2017-12-04_07:09:28_\xE9\xAB\x98\xE6\x89\x8B1\xE5\x8C\xBA_200_015_185175","gameType":"situan","gamers":[{"balance":4405682,"delta":-60,"username":"0791754000"},{"balance":69532,"delta":-60,"username":"70837999"},{"balance":972120,"delta":-60,"username":"abc6378303"},{"balance":23129,"delta":180,"username":"a137671268"}],"reason":"xiayu"}

Among them:

  • The first field, 19-1-5 11:59:31 PM, is the message timestamp.
  • The second field, NULL, is the message key; kafka-console-producer pushes a NULL key by default.
  • The rest is the content of the pushed message.

Create a Stream from topic score-normalized

CREATE STREAM SCORE_EVENT \
 (epoch BIGINT, \
  gameType VARCHAR, \
  cost INTEGER, \
  gamers ARRAY< \
              STRUCT< \
                      username VARCHAR, \
                      balance BIGINT, \
                      delta BIGINT \
                      > \
               >, \
  gameId VARCHAR, \
  tax BIGINT, \
  reason VARCHAR) \
  WITH ( KAFKA_TOPIC='score-normalized', \
         VALUE_FORMAT='JSON', \
         TIMESTAMP='epoch');

Here TIMESTAMP='epoch' tells KSQL to use the epoch field as the event timestamp.
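
To double-check the stream's schema and its timestamp column, you can inspect it (the exact output layout varies across KSQL versions):

ksql> DESCRIBE EXTENDED SCORE_EVENT;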

Delete a STREAM

DROP STREAM stream_name;

If any query is still reading from or writing into the stream, an error occurs:

Cannot drop USER_SCORE_EVENT. 
The following queries read from this source: []. 
The following queries write into this source: [CSAS_USER_SCORE_EVENT_2, InsertQuery_4, InsertQuery_5, InsertQuery_3]. 
You need to terminate them before dropping USER_SCORE_EVENT.

Use the TERMINATE command to stop these queries, then drop the stream:

TERMINATE CSAS_USER_SCORE_EVENT_2;
TERMINATE InsertQuery_4;
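
The IDs of the running queries can be listed beforehand with SHOW QUERIES (the IDs above are the ones reported in this example; yours will differ):

ksql> SHOW QUERIES;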

Query from the earliest records

ksql> SET 'auto.offset.reset' = 'earliest';
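
You can verify the setting with SHOW PROPERTIES, and later switch back to reading only new records (a minimal sketch):

ksql> SHOW PROPERTIES;
ksql> SET 'auto.offset.reset' = 'latest';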

Query all data from the Stream

ksql> SELECT * FROM SCORE_EVENT;

As you can see:

1546707023896 | null | 1512342568296 | situan | 7 | [{USERNAME=0791754000, BALANCE=4405682, DELTA=-60}, {USERNAME=70837999, BALANCE=69532, DELTA=-60}, {USERNAME=abc6378303, BALANCE=972120, DELTA=-60}, {USERNAME=a137671268, BALANCE=23129, DELTA=180}] | 2017-12-04_07:09:28_高手1区_200_015_185175 | null | xiayu

Among them:

  • The first column is the timestamp of the record.
  • The second column is the key of the record.
  • From the third column onward are the values of the message fields, in the order declared when the stream was created.
  • The null in the penultimate column appears because the tax field does not exist in the message.

Count the total number of games on 2017-12-04

Add a game_date field for the statistics:
CREATE STREAM SCORE_EVENT_WITH_DATE AS \
    SELECT SUBSTRING(gameId, 0, 10) AS game_date, * \
    FROM SCORE_EVENT;
    
SELECT game_date, COUNT(*) \
    FROM SCORE_EVENT_WITH_DATE \
    WHERE game_date = '2017-12-04' AND reason = 'game' \
    GROUP BY game_date;
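
The same count can also be materialized as a continuously updated table rather than a one-off query (a sketch; the name GAME_COUNT_BY_DATE is chosen here, not from the original):

CREATE TABLE GAME_COUNT_BY_DATE AS \
    SELECT game_date, COUNT(*) AS game_count \
    FROM SCORE_EVENT_WITH_DATE \
    WHERE reason = 'game' \
    GROUP BY game_date;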

At present, KSQL does not support queries like the following:

SELECT COUNT(*) \
  FROM SCORE_EVENT \
  WHERE gameId LIKE '2017-12-04_%';

Count the total number of players who took part in games (de-duplicated)

Because one log entry contains game information for multiple players, the idea is to split each player out into a separate event.

  • Consolidate the per-player events into a unified stream USER_SCORE_EVENT (a quick verification query follows the four statements below):
CREATE STREAM USER_SCORE_EVENT AS \
    SELECT epoch, gameType, cost, gameId, tax, reason, gamers[0]->username AS username, gamers[0]->balance AS balance, gamers[0]->delta AS delta \
    FROM SCORE_EVENT;
    
INSERT INTO USER_SCORE_EVENT \
    SELECT epoch, gameType, cost, gameId, tax, reason, gamers[1]->username AS username, gamers[1]->balance AS balance, gamers[1]->delta AS delta \
    FROM SCORE_EVENT;
    
INSERT INTO USER_SCORE_EVENT \
    SELECT epoch, gameType, cost, gameId, tax, reason, gamers[2]->username AS username, gamers[2]->balance AS balance, gamers[2]->delta AS delta \
    FROM SCORE_EVENT;
    
INSERT INTO USER_SCORE_EVENT \
    SELECT epoch, gameType, cost, gameId, tax, reason, gamers[3]->username AS username, gamers[3]->balance AS balance, gamers[3]->delta AS delta \
    FROM SCORE_EVENT;
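
After the four statements above, every record in SCORE_EVENT fans out into four per-player records in USER_SCORE_EVENT. A quick way to spot-check the result (the field list here is chosen for brevity):

SELECT username, delta, reason FROM USER_SCORE_EVENT;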
  • To support later JOIN queries on the player name username, the key must be reset:
CREATE STREAM USER_SCORE_EVENT_REKEY AS \ 
SELECT * FROM USER_SCORE_EVENT \
PARTITION BY username;

Output:

ksql> SELECT * FROM USER_SCORE_EVENT_REKEY;


4000 | lzc | 4000 | situan | 7 | 2017-12-04_07:09:28_高手2区_500_015_185175 | null | game | lzc | 972120 | 60
4000 | LZB | 4000 | situan | 7 | 2017-12-04_07:09:28_高手2区_500_015_185175 | null | game | LZB | 69532 | 60

Note:

In practice, only PARTITION BY on an existing field of the STREAM takes effect.
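
In other words, to key the stream by a derived value, first materialize that value as a column in one stream, then PARTITION BY the new column in a second step (a hypothetical sketch; UNAME and the stream names are invented here):

CREATE STREAM USER_SCORE_EVENT_UPPER AS \
    SELECT UCASE(username) AS UNAME, * \
    FROM USER_SCORE_EVENT;

CREATE STREAM USER_SCORE_EVENT_BY_UNAME AS \
    SELECT * FROM USER_SCORE_EVENT_UPPER \
    PARTITION BY UNAME;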

  • Create a table USER_SCORE_TABLE that aggregates each player's number of games, net win/loss, and tax contribution:
CREATE TABLE USER_SCORE_TABLE AS \
    SELECT username, COUNT(*) AS game_count, SUM(delta) AS delta_sum, SUM(tax) AS tax_sum \
    FROM USER_SCORE_EVENT_REKEY \
    WHERE reason = 'game' \
    GROUP BY username;

View all data in USER_SCORE_TABLE:

ksql> SELECT * FROM USER_SCORE_TABLE;
1546709338711 | 70837999 | 70837999 | 4 | -240 | 0
1546709352758 | 0791754000 | 0791754000 | 4 | -240 | 0
1546709338711 | a137671268 | a137671268 | 4 | 720 | 0
1546709352758 | abc6378303 | abc6378303 | 4 | -240 | 0
  • Query a single player's total games, net win/loss, and contribution:
ksql> SELECT * FROM USER_SCORE_TABLE WHERE username = '70837999';

Output:

1546709338711 | 70837999 | 70837999 | 4 | -240 | 0

Count the total number of players (de-duplicated)

  • Add a dummy tag field for the aggregation (KSQL aggregations require a GROUP BY, so a constant tag collapses all rows into one group):
CREATE TABLE USER_SCORE_WITH_TAG AS \
    SELECT 1 AS tag, * FROM USER_SCORE_TABLE;
  • Count the de-duplicated total number of players:
SELECT tag, COUNT(username) \
FROM USER_SCORE_WITH_TAG \
GROUP BY tag;

To be continued

KSQL WINDOW functions.