Bitcoin on-chain dataset on Kaggle: using the Google BigQuery API to process Bitcoin data (1)

Time: 2020-02-10

About Kaggle

Kaggle is a data competition platform, founded in 2010 and acquired by Google in 2017. The platform provides a large number of open datasets and free computing resources. You only need to register an account to write code and analyze data online.

BigQuery Bitcoin dataset

Data set homepage https://www.kaggle.com/bigque

There are currently more than 700 kernels for this dataset. The introduction says the data is updated continuously, but at present it only goes up to September 2018.

The Bitcoin on-chain data exceeds 100 GB. It is accessed through the Google BigQuery API and does not come as data files, so the dataset can only be used online and cannot be downloaded. However, the data extraction code is provided (https://github.com/blockchain…), so you can choose to build this data locally. According to the documentation, each account can access up to 5 TB of data per month.
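Because every query counts against that monthly limit, it can help to estimate how much data a query would scan before running it. The following is a minimal sketch, assuming the dry-run mode of the google-cloud-bigquery client; the helper names quota_fraction and estimate_scan_bytes are hypothetical, and reading "5 TB" as decimal terabytes is also an assumption.

```python
MONTHLY_QUOTA_BYTES = 5 * 10**12  # assumption: "5 TB" read as decimal terabytes


def quota_fraction(bytes_scanned, quota_bytes=MONTHLY_QUOTA_BYTES):
    """Return the share of the monthly quota that a scan of this size would use."""
    return bytes_scanned / quota_bytes


def estimate_scan_bytes(client, sql):
    """Dry-run a query and return the bytes it would process, without running it."""
    # Imported here so quota_fraction stays usable without the BigQuery library.
    from google.cloud import bigquery

    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


# A full scan of roughly 100 GB would use about 2% of the assumed quota.
print(f"{quota_fraction(100 * 10**9):.1%}")
```

In a kernel, estimate_scan_bytes(client, query) could be called with an existing bigquery.Client() and a query string, and its result passed to quota_fraction.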

The dataset has four tables: blocks, inputs, outputs, and transactions.

An example

The code is taken from an existing example, with some changes (the original no longer runs because the library version has changed) and some minor parts omitted.

Query the number of distinct bitcoin addresses that receive outputs each day

from google.cloud import bigquery
import pandas as pd

client = bigquery.Client()

# Query by Allen Day, Google Cloud Developer Advocate (https://medium.com/@allenday)
query = """
#standardSQL
SELECT
  o.day,
  COUNT(DISTINCT(o.output_key)) AS recipients
FROM (
  SELECT
    TIMESTAMP_MILLIS((timestamp - MOD(timestamp,
          86400000))) AS day,
    output.output_pubkey_base58 AS output_key
  FROM
    `bigquery-public-data.bitcoin_blockchain.transactions`,
    UNNEST(outputs) AS output ) AS o
GROUP BY
  day
ORDER BY
  day
"""

query_job = client.query(query)

iterator = query_job.result(timeout=30)
rows = list(iterator)

# Transform the rows into a nice pandas dataframe
transactions = pd.DataFrame(data=[list(x.values()) for x in rows], columns=list(rows[0].keys()))

# Look at the first 10 rows
transactions.head(10)

Output:

[Image: first 10 rows of the transactions dataframe]

transactions.tail(10)

Output:

[Image: last 10 rows of the transactions dataframe]
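The query above groups transactions by day by subtracting MOD(timestamp, 86400000) from the millisecond timestamp, which drops the time-of-day part. The same arithmetic can be sketched in plain Python; the function names and the example timestamp here are hypothetical and only illustrate the calculation.

```python
from datetime import datetime, timezone

MS_PER_DAY = 86_400_000  # 24 * 60 * 60 * 1000, the constant used in the query


def floor_to_day_ms(timestamp_ms):
    """Drop the time-of-day part, mirroring timestamp - MOD(timestamp, 86400000)."""
    return timestamp_ms - timestamp_ms % MS_PER_DAY


def day_of(timestamp_ms):
    """Return the UTC midnight datetime, analogous to TIMESTAMP_MILLIS in the query."""
    return datetime.fromtimestamp(floor_to_day_ms(timestamp_ms) / 1000, tz=timezone.utc)


# Hypothetical timestamp: 2020-02-10 12:30:45.123 UTC, in milliseconds.
example_ms = 1_581_337_845_123
print(floor_to_day_ms(example_ms))     # 1581292800000
print(day_of(example_ms).isoformat())  # 2020-02-10T00:00:00+00:00
```

Every transaction from the same UTC day maps to the same value, which is what lets GROUP BY day count distinct recipient addresses per day.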

Plotting

from matplotlib import pyplot as plt
# Jupyter magic: render figures inline in the notebook
%matplotlib inline
# Plot the number of distinct recipient addresses per day
plt.plot(transactions['day'], transactions['recipients'])

[Image: line plot of recipients per day]

Welcome to my blog: https://codeplot.top/