Bitcoin dataset on Kaggle: using the Google BigQuery API to process Bitcoin data (1)

Time: 2021-3-2

About Kaggle

Kaggle is a data science competition platform, founded in 2010 and acquired by Google in 2017. It hosts a large number of open datasets and provides free computing resources. Once you register an account, you can write code and analyze data online.

BigQuery Bitcoin dataset

Dataset home page: https://www.kaggle.com/bigque…

There are currently more than 700 kernels for this dataset. The introduction says the data is updated continuously; at the time of writing, it runs through September 2018.

The Bitcoin on-chain data exceeds 100 GB. It is exposed through the Google BigQuery API rather than as data files, so the dataset can only be queried online and cannot be downloaded. The maintainers do provide the data extraction code (https://github.com/blockchain…), so you can build a local copy of the data yourself. According to the documentation, each account can query up to 5 TB of data per month.
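Because every query counts against the monthly allowance, it can help to estimate how much data a query would scan before running it. The following is a minimal sketch, not part of the original kernel: it assumes the `google-cloud-bigquery` client library, uses a dry-run job to read `total_bytes_processed`, and treats the 5 TB figure as binary terabytes (an assumption; the documentation may count differently). The helper names `estimate_query_bytes` and `quota_fraction` are hypothetical.

```python
# Assumption: the 5 TB monthly limit is counted in binary units (TiB).
MONTHLY_QUOTA_BYTES = 5 * 1024**4


def estimate_query_bytes(client, sql):
    """Dry-run `sql` and return the number of bytes it would scan.

    A dry run validates the query and reports its cost without
    executing it, so it should not consume any of the quota.
    """
    # Imported here so the pure-Python helper below works without the library.
    from google.cloud import bigquery

    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


def quota_fraction(bytes_scanned, quota_bytes=MONTHLY_QUOTA_BYTES):
    """Return the share of the monthly quota that a query would use."""
    return bytes_scanned / quota_bytes


# Example: a query scanning 100 GiB uses just under 2% of the allowance.
share = quota_fraction(100 * 1024**3)
print(f"{share:.2%}")
```

With a real client, `quota_fraction(estimate_query_bytes(client, query))` gives the share for a specific query string.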

There are four tables: blocks, inputs, outputs and transactions.
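To check which tables and columns are actually available, the dataset can be inspected through the client. This is a minimal sketch assuming the `google-cloud-bigquery` library's `Client.list_tables` and `Client.get_table` methods; the dataset ID is taken from the query used later in this post, and the table list it returns may differ from the one described above. The helper names are hypothetical.

```python
# Dataset ID as it appears in the example query in this post.
DATASET_ID = "bigquery-public-data.bitcoin_blockchain"


def list_table_ids(client, dataset_id=DATASET_ID):
    """Return the sorted table names found in the dataset."""
    return sorted(table.table_id for table in client.list_tables(dataset_id))


def describe_table(client, table_id, dataset_id=DATASET_ID):
    """Return (column name, column type) pairs for one table."""
    table = client.get_table(f"{dataset_id}.{table_id}")
    return [(field.name, field.field_type) for field in table.schema]
```

In a Kaggle notebook, this would be called as `list_table_ids(bigquery.Client())` and then `describe_table(bigquery.Client(), "transactions")`.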

An example

The code below is adapted from an existing example. The original no longer runs because of library version changes, so it has been modified, and some minor parts are omitted.

Query the number of distinct Bitcoin addresses that receive outputs each day:

from google.cloud import bigquery
import pandas as pd

client = bigquery.Client()

# Query by Allen Day, Google Cloud Developer Advocate (https://medium.com/@allenday)
query = """
#standardSQL
SELECT
  o.day,
  COUNT(DISTINCT(o.output_key)) AS recipients
FROM (
  SELECT
    TIMESTAMP_MILLIS((timestamp - MOD(timestamp,
          86400000))) AS day,
    output.output_pubkey_base58 AS output_key
  FROM
    `bigquery-public-data.bitcoin_blockchain.transactions`,
    UNNEST(outputs) AS output ) AS o
GROUP BY
  day
ORDER BY
  day
"""

# Run the query and wait up to 30 seconds for the result
query_job = client.query(query)
iterator = query_job.result(timeout=30)
rows = list(iterator)

# Turn the rows into a pandas DataFrame (also works if the result is empty)
transactions = pd.DataFrame([dict(row.items()) for row in rows])

# Look at the first 10 rows
transactions.head(10)

Output:

[Image: output of transactions.head(10)]

transactions.tail(10)

Output:

[Image: output of transactions.tail(10)]
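The query groups by day using `timestamp - MOD(timestamp, 86400000)`, which rounds a millisecond timestamp down to midnight UTC, and then counts distinct output addresses within each day. The following plain-Python sketch reproduces that arithmetic on a few made-up records; the addresses are hypothetical, and only the first timestamp (the genesis block) is real.

```python
from collections import defaultdict
from datetime import datetime, timezone

MS_PER_DAY = 86_400_000  # 24 * 60 * 60 * 1000


def truncate_to_day(ts_ms):
    """Round a millisecond Unix timestamp down to midnight UTC."""
    return ts_ms - ts_ms % MS_PER_DAY


def to_utc_date(ts_ms):
    """Format a millisecond Unix timestamp as an ISO date in UTC."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).date().isoformat()


def recipients_per_day(records):
    """Count distinct addresses per day, like COUNT(DISTINCT ...) GROUP BY day."""
    addresses = defaultdict(set)
    for ts_ms, address in records:
        addresses[truncate_to_day(ts_ms)].add(address)
    return {day: len(addrs) for day, addrs in sorted(addresses.items())}


# (timestamp in ms, output address); the addresses are made up for illustration.
records = [
    (1231006505000, "addr_a"),  # genesis block, 2009-01-03 18:15:05 UTC
    (1231010000000, "addr_a"),  # same day, same address: counted once
    (1231010000000, "addr_b"),
    (1231100000000, "addr_a"),  # next day
]

for day, count in recipients_per_day(records).items():
    print(to_utc_date(day), count)
# prints:
# 2009-01-03 2
# 2009-01-04 1
```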

Plotting

from matplotlib import pyplot as plt
%matplotlib inline

# Plot the number of distinct recipient addresses per day
plt.plot(transactions['day'], transactions['recipients'])

[Image: line plot of recipients per day]

The next article in this series: "Getting BigQuery Bitcoin data with SQL: using the Google BigQuery API to process Bitcoin data (2)"

Welcome to my blog: https://codeplot.top/