About Kaggle
Kaggle is a data science competition platform that was founded in 2010 and acquired by Google in 2017. It provides a large number of open datasets and free computing resources. Once you register an account, you can write code and analyze data online.
BigQuery Bitcoin dataset
Dataset home page: https://www.kaggle.com/bigque…
The dataset currently has more than 700 kernels. Its introduction says the data is updated continuously; at the time of writing, it covers blocks up to September 2018.
The Bitcoin blockchain data exceeds 100 GB. It is accessed through the Google BigQuery API rather than shipped as data files, so the dataset can only be used online and cannot be downloaded. However, the data extraction code is available (https://github.com/blockchain…), so you can build this data locally if you prefer. According to the documentation, each account can query up to 5 TB of data per month.
There are four tables: blocks, inputs, outputs and transactions.
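Because of the 5 TB monthly limit mentioned above, it can help to estimate how much data a query will scan before running it. The following is a minimal sketch, not part of the original kernel: it uses the BigQuery client's dry-run mode to get the byte count without executing the query. The helper names and the reading of "5 TB" as 5 * 1024**4 bytes are assumptions made for illustration.

```python
# Sketch: estimate how much of the monthly quota one query would consume.
# QUOTA_BYTES is an assumption: "5 TB" is read here as 5 * 1024**4 bytes.
QUOTA_BYTES = 5 * 1024**4


def estimate_query_bytes(client, sql):
    """Dry-run a query and return the bytes it would scan, without running it."""
    # Imported lazily so the arithmetic helper below works without the library.
    from google.cloud import bigquery

    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    query_job = client.query(sql, job_config=job_config)
    return query_job.total_bytes_processed


def quota_fraction(bytes_scanned, quota_bytes=QUOTA_BYTES):
    """Return the share of the monthly quota used by scanning bytes_scanned."""
    return bytes_scanned / quota_bytes


# Hypothetical figure: a query that scans 100 GiB of the dataset.
example_bytes = 100 * 1024**3
print(f"{quota_fraction(example_bytes):.2%} of the monthly quota")
```

In a kernel you would pass an existing bigquery.Client() and the SQL string to estimate_query_bytes, then feed the result to quota_fraction. A query that really scanned 100 GiB would use roughly 2% of the quota under these assumptions.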
An example
The code is adapted from here. I changed it because the original no longer runs after changes in the library version, and I omitted some minor content.
Query the number of distinct Bitcoin addresses that receive outputs each day
from google.cloud import bigquery
import pandas as pd
client = bigquery.Client()
# Query by Allen Day, Google Cloud Developer Advocate (https://medium.com/@allenday)
query = """
#standardSQL
SELECT
  o.day,
  COUNT(DISTINCT(o.output_key)) AS recipients
FROM (
  SELECT
    TIMESTAMP_MILLIS((timestamp - MOD(timestamp, 86400000))) AS day,
    output.output_pubkey_base58 AS output_key
  FROM
    `bigquery-public-data.bitcoin_blockchain.transactions`,
    UNNEST(outputs) AS output ) AS o
GROUP BY
  day
ORDER BY
  day
"""
query_job = client.query(query)
iterator = query_job.result(timeout=30)
rows = list(iterator)
# Transform the rows into a nice pandas dataframe
transactions = pd.DataFrame(data=[list(x.values()) for x in rows], columns=list(rows[0].keys()))
# Look at the first 10 rows
transactions.head(10)
Output:
transactions.tail(10)
Output:
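To make the logic of the query above easier to follow, here is a minimal pure-Python sketch of the same computation on a few in-memory records. It is only an illustration, not how BigQuery executes the query. The record layout mirrors the column names used in the SQL (timestamp, outputs, output_pubkey_base58), while the sample addresses and the function names are made up for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone

MS_PER_DAY = 86_400_000  # same constant as in the SQL above


def day_start_ms(timestamp_ms):
    """Floor a millisecond timestamp to 00:00 UTC of its day.

    Mirrors the SQL expression timestamp - MOD(timestamp, 86400000).
    """
    return timestamp_ms - timestamp_ms % MS_PER_DAY


def recipients_per_day(transactions):
    """Count distinct output addresses per UTC day."""
    addresses_by_day = defaultdict(set)
    for tx in transactions:
        day = day_start_ms(tx["timestamp"])
        # Equivalent of UNNEST(outputs): one (day, address) row per output.
        for output in tx["outputs"]:
            bucket = addresses_by_day[day]  # the day appears even if the address is NULL
            address = output["output_pubkey_base58"]
            if address is not None:  # COUNT(DISTINCT ...) ignores NULLs
                bucket.add(address)
    # Sort by day, like ORDER BY day, and convert each key to a calendar date.
    return {
        datetime.fromtimestamp(day / 1000, tz=timezone.utc).date(): len(addrs)
        for day, addrs in sorted(addresses_by_day.items())
    }


# Hypothetical sample data; the addresses are placeholders, not real ones.
sample = [
    {"timestamp": 1231006505000,
     "outputs": [{"output_pubkey_base58": "addr_a"}]},
    {"timestamp": 1231010000000,
     "outputs": [{"output_pubkey_base58": "addr_a"},
                 {"output_pubkey_base58": "addr_b"}]},
    {"timestamp": 1231100000000,
     "outputs": [{"output_pubkey_base58": "addr_c"}]},
]

print(recipients_per_day(sample))
```

The first two sample transactions fall on the same UTC day and share one address, so that day counts two distinct recipients; the third transaction falls on the following day.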
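Plotting
from matplotlib import pyplot as plt
# Notebook magic (Jupyter/Kaggle kernels): render plots inline
%matplotlib inline
# Plot the number of distinct receiving addresses per day
plt.plot(transactions['day'], transactions['recipients'])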
The next article in this series: Getting BigQuery Bitcoin data with SQL – Using the Google BigQuery API to process Bitcoin data (2)
Welcome to my blog: https://codeplot.top/
The Bitcoin category on my blog