Bitcoin time-series forecasting

Issam Sebri
6 min readMay 16, 2021
Photo by Angela Compagnone on Unsplash

Introduction

As Bitcoin has become the financial trend of the last few years, it is indisputable that many researchers and analysts seek to demystify this piece of technology as it conquers our daily financial lives. In this article, I will try to use machine learning to predict Bitcoin’s price based on 9 years of records from the Bitstamp and Coinbase datasets, and we should hope that Elon Musk forgets about Bitcoin during this experiment.

Time series forecasting

A time series is a series of data points indexed or listed in time order, usually taken at fixed time intervals.

e.g. the air temperature recorded every hour, seismic sensor readings captured every second, or stock price movements.

Forecasting is the analysis of a time series in order to extract meaningful patterns and predict future values. Time series come in many forms, which heavily affects the choice of prediction model, so we can distinguish four types based on seasonality (periodic similarity) and trend (movement of the mean):

  • No seasonality, No trend — (fig 01)
  • No seasonality, Has a trend — (fig 02)
  • Has seasonality, No trend — (fig 03)
  • Has seasonality, Has trend — (fig 04)

Trend and seasonality affect the value of a time series at different times. A series without them, such as “white noise”, is called stationary: its behavior does not depend on time or season. A stationary time series is hard to predict in the long term, which is something we should take into consideration when windowing our datasets.
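To make these four categories concrete, here is a small synthetic sketch; the series below are toy NumPy data, not the Bitcoin series, and the coefficients are arbitrary illustrative choices:

```python
# Generate one toy series for each combination of trend and seasonality.
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(365)
noise = np.random.normal(0, 1, len(t))       # "white noise", stationary
trend = 0.05 * t                             # slowly increasing mean
season = 2 * np.sin(2 * np.pi * t / 30)      # 30-day periodic pattern

series = {
    "No seasonality, no trend": noise,
    "No seasonality, has a trend": trend + noise,
    "Has seasonality, no trend": season + noise,
    "Has seasonality, has a trend": trend + season + noise,
}

fig, axes = plt.subplots(4, 1, figsize=(8, 8), sharex=True)
for ax, (title, y) in zip(axes, series.items()):
    ax.plot(t, y)
    ax.set_title(title)
plt.tight_layout()
plt.show()
```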

Datasets

The Bitstamp dataset is a time series of Bitcoin prices from 2011 until 2020: 9 years of consecutive monitoring, with one row per 60-second window and 7 features: Open, High, Low, Close, Volume_(BTC), Volume_(Currency), and Weighted_Price.

Exploring the Bitstamp dataset
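A minimal sketch of that first look with pandas; the CSV file name below is a placeholder for the Kaggle Bitstamp export, so adjust it to your copy:

```python
import pandas as pd

# Placeholder file name for the Kaggle Bitstamp minute-level export.
df = pd.read_csv("bitstamp_1min.csv")

print(df.shape)              # one row per minute
print(df.columns.tolist())   # Timestamp, Open, High, Low, Close, Volume_(BTC), ...
print(df.head())             # Timestamp is a Unix epoch in seconds
print(df.isna().mean())      # fraction of missing values per column
```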

Feature engineering

To prepare our data, we need to perform several steps and observations in order to understand and clean it.

1- Imputation: Missing values are the most common problem; they are like worms that can mess up our data and cause lots of modeling errors. So this is the first step I took to start cleaning up my data.

We can distinguish four statistical imputation techniques for dealing with missing values that are not specific to any type of data:

  • Mean imputation.
  • Mode imputation.
  • Median imputation.
  • Random sample imputation.

But we can also find imputation techniques that are specific to time series data:

  • Last Observation Carried Forward (LOCF).
  • Next Observation Carried Backward (NOCB).
  • Linear Interpolation.
  • Spline Interpolation.
Difference between mean/median imputation and LOCF/NOCB

Our final decision is to go ahead with LOCF. In the following code we can observe that the percentage of missing values can reach 30% of the available data.
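A sketch of how that check and the LOCF fill might look with pandas, assuming df is the Bitstamp DataFrame loaded above:

```python
# Percentage of missing values per column.
missing_pct = df.isna().mean() * 100
print(missing_pct)

# LOCF: carry the last known observation forward in time.
df = df.ffill()

# Any leading rows with no previous observation are still NaN; drop them.
df = df.dropna()
```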

2- Date extraction: The Timestamp column is useful for successful and meaningful windowing, but it brings no such advantage to the learning model itself. We also have to say that the Unix timestamp format is definitely unreadable and abstract, so extracting a relevant, human-readable format is our next step.
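A short sketch of that conversion, again assuming the df from the previous steps:

```python
# Convert the Unix epoch Timestamp (in seconds) into a readable datetime index.
df["Date"] = pd.to_datetime(df["Timestamp"], unit="s")
df = df.set_index("Date").drop(columns=["Timestamp"])
print(df.index.min(), "to", df.index.max())
```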

3- Data grouping: At this phase, we need to define our goal and configure our prediction interval; this step is strongly tied to the specifics of the data. Since we want to predict the Bitcoin price one hour into the future based on the previous 24 hours, and it is tedious to follow the price every minute, we can resample our data hourly, with a slight shift to align with the clock.
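One way to do that grouping is with pandas resampling; the aggregation rule chosen for each column below is my assumption of what makes sense, not necessarily the author’s exact choice:

```python
# Resample the minute-level data into hourly rows.
hourly = df.resample("1H").agg({
    "Open": "first",
    "High": "max",
    "Low": "min",
    "Close": "last",
    "Volume_(BTC)": "sum",
    "Volume_(Currency)": "sum",
    "Weighted_Price": "mean",
})
```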

4- Feature correlation: Studying the correlation between features is crucial to understanding how they relate to each other, and removing highly correlated features can simplify modeling, improve efficiency, and get rid of redundant data.
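A quick sketch of that correlation check and the resulting feature selection, assuming the hourly DataFrame from the previous step:

```python
# Inspect pairwise correlations between the features.
corr = hourly.corr()
print(corr["Weighted_Price"].sort_values(ascending=False))

# The price columns are strongly correlated with each other,
# so we keep only the volumes and the weighted price.
data = hourly[["Volume_(BTC)", "Volume_(Currency)", "Weighted_Price"]]
```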

It is more than clear that many features are highly similar, which leads us to keep only the relevant ones: Volume_(BTC), Volume_(Currency), and Weighted_Price, which is also our target. Here is the final data plot:

5- Split data: We need to split our data chronologically into 70% for training, 20% for validation, and the final 10% for testing.
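A sketch of that chronological split on the reduced data DataFrame; the exact boundaries are an assumption based on the 70/20/10 proportions above:

```python
# Chronological split: no shuffling, since order matters in a time series.
n = len(data)
train_df = data[: int(n * 0.7)]
val_df   = data[int(n * 0.7): int(n * 0.9)]
test_df  = data[int(n * 0.9):]
```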

6- Standardization: The data comes in different scales and ranges, which makes tracking patterns and performing regression an almost impossible task for our model, so scaling our features, for example into the [-1, 1] interval, is a good practice to standardize our data.
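A minimal sketch of standardization using only training-set statistics, so that no information leaks from the validation and test sets:

```python
# Standardize with the mean and standard deviation of the training split only.
train_mean = train_df.mean()
train_std = train_df.std()

train_df = (train_df - train_mean) / train_std
val_df   = (val_df - train_mean) / train_std
test_df  = (test_df - train_mean) / train_std
```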

Data Pipeline

1- Data windowing: The model that we hope to build takes the features from 24 successive hours, predicts the Weighted_Price for the following hour, and evaluates the prediction against the actual recorded value for that hour.

Dedicating a function to splitting windows is a great help when we need to map this split behavior over the batches of a tf.data.Dataset.
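A sketch of such a function, in the spirit of the TensorFlow time-series tutorial; the constants follow the 24-hours-in, 1-hour-out setup described above, and the column lookup assumes the data DataFrame from the feature-selection step:

```python
import tensorflow as tf

INPUT_WIDTH = 24   # hours fed to the model
LABEL_WIDTH = 1    # hours to predict
TOTAL_WIDTH = INPUT_WIDTH + LABEL_WIDTH
label_index = data.columns.get_loc("Weighted_Price")

def split_window(features):
    """Split a batch of consecutive rows into (inputs, labels) windows."""
    inputs = features[:, :INPUT_WIDTH, :]
    labels = features[:, INPUT_WIDTH:, label_index:label_index + 1]
    inputs.set_shape([None, INPUT_WIDTH, None])
    labels.set_shape([None, LABEL_WIDTH, None])
    return inputs, labels
```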

2- Building datasets: To finish our pipeline, we take help from keras.preprocessing, which converts a given array into a TensorFlow Dataset. This function can produce batched and shuffled datasets, and we can map the split_window function over it to provide inputs and labels suited to our needs.
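A sketch of that step using tf.keras.preprocessing.timeseries_dataset_from_array; the batch size and shuffling are illustrative choices:

```python
import numpy as np

def make_dataset(frame, batch_size=32):
    """Turn a DataFrame into batches of (inputs, labels) windows."""
    array = np.array(frame, dtype=np.float32)
    ds = tf.keras.preprocessing.timeseries_dataset_from_array(
        data=array,
        targets=None,          # labels are carved out by split_window
        sequence_length=TOTAL_WIDTH,
        sequence_stride=1,
        shuffle=True,
        batch_size=batch_size,
    )
    return ds.map(split_window)

train_ds = make_dataset(train_df)
val_ds   = make_dataset(val_df)
test_ds  = make_dataset(test_df)
```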

3- Building a model: The model is an RNN (Recurrent Neural Network) with LSTM (Long Short-Term Memory) cells, which ensures that the previous states of our data are taken into account in the prediction.
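A minimal sketch of such a model in Keras; the layer sizes, optimizer, and number of epochs are illustrative assumptions, not necessarily the author’s exact configuration:

```python
# 24 hours of 3 features in, one Weighted_Price value out.
lstm_model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, return_sequences=False,
                         input_shape=(INPUT_WIDTH, train_df.shape[1])),
    tf.keras.layers.Dense(LABEL_WIDTH),
    tf.keras.layers.Reshape([LABEL_WIDTH, 1]),  # match the (batch, 1, 1) labels
])

lstm_model.compile(
    loss=tf.keras.losses.MeanSquaredError(),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=[tf.keras.metrics.MeanAbsoluteError()],
)

history = lstm_model.fit(train_ds, validation_data=val_ds, epochs=20)
print(lstm_model.evaluate(test_ds))
```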

Result

By plotting a few random batches and comparing the predictions with the labels, the results seem quite decent considering the simplicity of the model.

And here you can find the model performance, first evaluated on the validation dataset and then against the held-out test data.

Credit

Sorry about the length of this article, and thank you for getting this far. I need to give credit to some resources that helped me a lot in understanding these concepts.

You can get the whole code for this article at this Github link
