# Time Series Forecasting Using Past and Future External Data with Darts

by Julien Herzen

Building models that are able to capture external data is often a key aspect of time series forecasting projects. For instance:

Recently-observed activity on an e-commerce website can help predict future sales.

Observed rainfall and known weather forecasts can help predict hydro and solar electricity production.

Making the model aware of upcoming holidays can help sales forecasting.

Knowing that some intervention is ongoing on a system can help correct forecasts or detect outages.

etc.

In fact, more often than not, relying solely on the history of a time series to predict its future means missing a lot of valuable information.

Darts is an open source Python library whose primary goal is to smooth the time series forecasting experience in Python. Out of the box, it provides a variety of models, from ARIMA to deep learning models, which can all be used in the same straightforward way using fit() and predict(). In this post, we'll show how Darts can be used to easily take "covariates" (other time series providing useful information) into account. First, let us quickly explain a subtle yet important distinction between "past" and "future" covariates.

Past and Future Covariates

We define two kinds of time series which can be used for forecasting:

Past covariates are time series whose past values are known at prediction time. These series often contain values that have to be observed to be known.

Future covariates are time series whose future values are known at prediction time. More precisely, for a prediction made at time t for a forecast horizon n, the values at times t+1, ..., t+n are known. Often, the past values of future covariates (for times t-k, t-k+1, ..., t, for some lookback window k) are known as well. Future covariate series contain, for instance, calendar information or weather forecasts.

Note that in general future covariates can also be used as past covariates, whereas the reverse is not true.
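To make the distinction concrete, here is a small plain-Python sketch listing which time steps of each kind of covariate a model may read when it forecasts times t+1, ..., t+n. It uses the lookback t-k, ..., t from the definition above. The helper `usable_covariate_steps` is hypothetical and not part of Darts.

```python
from __future__ import annotations


def usable_covariate_steps(kind: str, t: int, n: int, k: int) -> list[int]:
    """Time steps of a covariate that may be read for a forecast made at time t.

    Hypothetical helper: `kind` is "past" or "future", `n` is the forecast
    horizon and `k` the lookback window, following the definitions in the text.
    """
    history = list(range(t - k, t + 1))      # t-k, ..., t (already observed)
    horizon = list(range(t + 1, t + n + 1))  # t+1, ..., t+n (not yet observed)
    if kind == "past":
        return history                       # past covariates stop at t
    if kind == "future":
        return history + horizon             # future covariates extend to t+n
    raise ValueError(f"unknown covariate kind: {kind!r}")


past = usable_covariate_steps("past", t=100, n=10, k=3)
future = usable_covariate_steps("future", t=100, n=10, k=3)
print(past)        # [97, 98, 99, 100]
print(future[-1])  # 110
# Every step a past-covariate model reads is also available for a future
# covariate, which is why future covariates can double as past covariates.
print(set(past) <= set(future))  # True
```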

Past and Future Covariates in Darts

Darts differentiates models that make use of past and future covariates:

Past covariates models: The fit() and predict() methods of these models accept only a past_covariates argument (specifying one or a sequence of TimeSeries). These models look only at past values of the covariate series when making a prediction. Models in this category: BlockRNNModel, NBEATSModel, TCNModel, TransformerModel, RegressionModel (incl. LinearRegressionModel and RandomForest).

Depiction of the inputs/outputs for "past covariates models" at prediction time.

Future covariates models: The fit() and predict() methods of these models accept only a future_covariates argument. The training procedure looks at future values of the covariates (and possibly at historic values too), and future values have to be provided at prediction time. Global future covariates models: RNNModel, RegressionModel (incl. LinearRegressionModel and RandomForest). Local future covariates models: ARIMA, VARIMA, AutoARIMA.

"You shouldn't be too worried about making a mistake when employing past and future covariates, because Darts will complain if you try providing the wrong kind of covariates to the wrong model, or if your covariates are not known sufficiently far into the future (or into the past). In addition, it takes care of slicing the covariates and targets for you automatically, even if they are not aligned (as long as the time axes of the series are correct)."

Note that RegressionModel (incl. LinearRegressionModel and RandomForest) supports both past_covariates and future_covariates. In the rest of the article, we'll see how to fit some RNN-based models using either past covariates or future covariates, and then we'll fit a RegressionModel using both past and future covariates.

A Toy Example: Forecasting a River Flow

As a toy example, let's assume we want to forecast the flow of a river. We'll use synthetic time series data (also created with Darts). This example is only meant to demonstrate how covariates can be used, and by no means represents a good (or realistic) way to forecast an actual river flow.

You can reproduce this example by installing Darts as follows:

```shell
pip install darts
```

The entire code is also available in a notebook here.

A Simplistic River Model

We assume that the flow of our river on day t depends on two factors:

The melting rate of an upstream glacier 5 days earlier (on day t-5).

The rainfall during the last 5 days (from day t-4 to day t).

We want to forecast the flow 10 days in advance. Furthermore, we assume that:

The glacier's melting rate is not known in advance, because we have to measure it directly in order to know it; it is thus a past covariate.

The rainfall is known 10 days in advance from weather forecasts, so it is a future covariate. Its past values are known as well.

We start by generating some synthetic daily time series to create a problem instance. Darts' global models (such as neural networks and regression models) can easily be trained on multiple time series (for instance, simply by calling model.fit([series1, series2, ...], past_covariates=[covariate1, covariate2, ...])), so we could simulate several rivers and train one model on all of them. Here, however, we focus on showing how to use past and future covariates with a single target series.

In the notebook code, melting is our past glacier melting covariate series, rainfalls is the future rainfall covariate series, and flow is the target river flow (which we want to forecast).

Our synthetic daily dataset representing the flow as a sum of lagged glacier melting and rainfalls. The dataset starts in January 2000 and lasts 3 years.
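The article's generation code lives only in the notebook. As a hypothetical stand-in, the plain-Python sketch below builds three years of daily series following the river model above: flow on day t equals melting on day t-5 plus the rainfall of days t-4 to t, plus optional noise. The seasonal melting shape, the rainfall distribution and the noise level are assumptions made for illustration, not the values used in the article.

```python
import math
import random


def make_river_data(n_days: int = 3 * 365, noise: float = 0.05, seed: int = 0):
    """Return (melting, rainfalls, flow) for a synthetic river.

    melting and rainfalls are lists indexed by day; flow is a dict keyed by day
    that starts at day 5, the first day for which melting[t - 5] exists.
    """
    rng = random.Random(seed)
    # Assumed yearly melting cycle (peaking mid-year) plus Gaussian noise.
    melting = [
        0.5 - 0.5 * math.cos(2 * math.pi * day / 365) + rng.gauss(0, noise)
        for day in range(n_days)
    ]
    # Assumed non-negative daily rainfall.
    rainfalls = [max(0.0, rng.gauss(0.1, 0.1)) for _ in range(n_days)]
    flow = {}
    for t in range(5, n_days):
        # River model: melting 5 days ago + rainfall over the last 5 days.
        signal = melting[t - 5] + sum(rainfalls[t - 4 : t + 1])
        flow[t] = signal + rng.gauss(0, noise)
    return melting, rainfalls, flow


melting, rainfalls, flow = make_river_data()
print(len(melting), len(flow))  # 1095 1090
```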

Evaluating Models

Now that we have our data, we can think about how we want to evaluate and compare the different models we'll build. We use a small function that performs backtesting and evaluates, using RMSE, the accuracy of 10-day-ahead predictions over the last 20% of the flow series.
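The notebook's evaluation function is not reproduced here. The plain-Python sketch below shows the idea under stated assumptions:

- For every time step in the last 20% of the series, a forecast is issued `horizon` steps earlier, using only the history up to that point.
- Only the `horizon`-th predicted value is kept.
- The RMSE is computed over those values.

The `Forecaster` interface and the `naive` forecaster are hypothetical. In Darts, this role is played by a fitted model and its historical_forecasts() / backtest() methods, which also handle covariates.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface: (history, horizon) -> list of `horizon` predictions.
Forecaster = Callable[[Sequence[float], int], List[float]]


def backtest_rmse(series: Sequence[float], forecaster: Forecaster,
                  horizon: int = 10, start_fraction: float = 0.8) -> float:
    """RMSE of horizon-step-ahead forecasts over the last (1 - start_fraction) of series."""
    start = int(len(series) * start_fraction)
    squared_errors = []
    for target_idx in range(start, len(series)):
        origin = target_idx - horizon          # last observed index
        history = series[: origin + 1]         # no peeking past the origin
        prediction = forecaster(history, horizon)
        error = prediction[horizon - 1] - series[target_idx]  # keep step `horizon` only
        squared_errors.append(error * error)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def naive(history: Sequence[float], horizon: int) -> List[float]:
    """Baseline forecaster: repeat the last observed value."""
    return [history[-1]] * horizon


line = [float(i) for i in range(100)]
print(backtest_rmse(line, naive))  # 10.0 (always 10 steps behind on a unit slope)
```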

First Model: No Covariate

Let's first create a BlockRNNModel. These models support past_covariates, but to get a first benchmark we'll fit it on the target only and see what we get. We somewhat arbitrarily select an input_chunk_length of 30 (the lookback window of the model), and we set output_chunk_length to 10, as this is the horizon we're interested in forecasting.
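Conceptually, a model with input_chunk_length=30 and output_chunk_length=10 learns from (input, output) pairs cut from the target series: 30 consecutive values as input, followed by the next 10 values as the output to predict. The plain-Python sketch below shows this slicing. It illustrates the idea only; it is not Darts' actual dataset code, which differs in details such as how many samples are drawn per series. The helper name is hypothetical.

```python
from typing import List, Sequence, Tuple


def make_training_windows(
    series: Sequence[float],
    input_chunk_length: int = 30,
    output_chunk_length: int = 10,
) -> List[Tuple[List[float], List[float]]]:
    """Slide over the series and cut (input chunk, output chunk) pairs."""
    window = input_chunk_length + output_chunk_length
    samples = []
    for start in range(len(series) - window + 1):
        split = start + input_chunk_length
        x = list(series[start:split])                          # lookback values
        y = list(series[split : split + output_chunk_length])  # values to predict
        samples.append((x, y))
    return samples


samples = make_training_windows([float(i) for i in range(50)])
print(len(samples))      # 11
print(samples[0][1][0])  # 30.0 (first value after the 30-step lookback)
```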

Block RNN model without covariate. Backtest RMSE = 0.194

Second Model: Using Past Melting Data

Let's now provide the melting series as past_covariates to the model's fit() function. This means that the model will look at the past 30 time steps of melting (in addition to the past 30 time steps of the target) when producing a forecast.
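The plain-Python sketch below illustrates what this means for a single forecast made at time t. The model's input pairs each of the last 30 target values with the melting value observed on the same day, and no melting value after t is read. The helper name and the toy values are hypothetical.

```python
from typing import List, Sequence, Tuple


def block_input_with_past_covariates(
    target: Sequence[float],
    past_covariate: Sequence[float],
    t: int,
    input_chunk_length: int = 30,
) -> List[Tuple[float, float]]:
    """Input rows (target, covariate) for days t-input_chunk_length+1 .. t."""
    first = t - input_chunk_length + 1
    if first < 0 or t >= len(target) or t >= len(past_covariate):
        raise ValueError("not enough history in the target or the past covariate")
    # Only steps up to and including t are used: the covariate's future is unknown.
    return [(target[i], past_covariate[i]) for i in range(first, t + 1)]


flow = [float(i) for i in range(100)]         # toy target
melting = [10.0 * i for i in range(100)]      # toy past covariate
rows = block_input_with_past_covariates(flow, melting, t=49)
print(len(rows), rows[0], rows[-1])  # 30 (20.0, 200.0) (49.0, 490.0)
```

Truncating melting after day 49 would not change this input, which is exactly why a past covariate does not need to be known in advance.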

Block RNN model with melting as a past covariate. Backtest RMSE = 0.172

This already improved the RMSE from 0.194 to 0.172, which is not bad; looking at the past melting helps because it determines part of the current flow.

Third Model: Using Past Melting and Past Rainfall Data

We can seamlessly extend this to use both past melting and past rainfall data. The rainfall is known in advance, but here we specify it as part of the past_covariates, which means that the model will only look at past rainfalls.

Here, melting.stack(rainfalls) produces one multivariate TimeSeries containing two dimensions (components): the melting and the rainfall. This is the series we use as a past covariate.
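As a plain-Python illustration of stacking, two daily series keyed by date can be combined into one two-component series only when they share the same time index. In Darts, stack() likewise expects series with the same time index. The helper below is hypothetical, not Darts' implementation, and the values are made up.

```python
from datetime import date, timedelta
from typing import Dict, Tuple


def stack(a: Dict[date, float], b: Dict[date, float]) -> Dict[date, Tuple[float, float]]:
    """Combine two univariate series into one series with two components."""
    if a.keys() != b.keys():
        raise ValueError("both series must share the same time index")
    return {day: (a[day], b[day]) for day in sorted(a)}


days = [date(2000, 1, 1) + timedelta(days=i) for i in range(3)]
melting = dict(zip(days, [0.2, 0.3, 0.4]))    # made-up values
rainfalls = dict(zip(days, [0.0, 1.5, 0.1]))  # made-up values
covariates = stack(melting, rainfalls)
print(covariates[date(2000, 1, 2)])  # (0.3, 1.5)
```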

Adding past rainfalls helps too, reducing the error further from 0.172 to 0.169. Rainfall affects the flow over the following 5 days, so past rainfalls provide some signal for predicting the next 10 days' flow. The impact is still somewhat limited, though, because this model only looks at past rainfalls and not at the actual rainfalls during the 10 days for which we want to predict the flow.
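A quick worked check of this reasoning, assuming the noise-free river model from above (flow on day t = melting on day t-5 plus rainfall on days t-4 to t). For a forecast issued at day t, the sketch counts, for each horizon h, how many of the terms generating the flow on day t+h have already been observed by day t. The helper name is hypothetical.

```python
from typing import Tuple


def observed_terms(h: int, melting_lag: int = 5, rain_window: int = 5) -> Tuple[int, int]:
    """For the flow at day t+h, count generating terms already observed at day t.

    Melting enters via day t+h-melting_lag; rainfall via days t+h-rain_window+1 .. t+h.
    """
    melting_observed = 1 if h <= melting_lag else 0  # t+h-5 <= t  <=>  h <= 5
    rain_observed = max(0, rain_window - h)          # rain days falling at or before t
    return melting_observed, rain_observed


for h in range(1, 11):
    print(h, observed_terms(h))
```

Past rainfall covers 4, 3, 2 and 1 of the 5 rainfall terms for h = 1 to 4, and none of them from h = 5 onwards. This is consistent with the limited gain from past rainfalls, and with the larger potential gain once future rainfall (weather forecasts) is available.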

Fourth Model: Using Future Rainfalls

Let's now try to use future rainfalls as a covariate. This might help, because a model using future_covariates will be able to look at the next 10 days' rainfalls (in addition to past rainfalls) in order to predict the next 10 days' flow. To do this, we'll use an RNNModel, which is a "pure RNN" implementation that is able to use future_covariates (our RNNModel is similar to DeepAR).

RNNModel using the rainfalls as a future covariate. Backtest RMSE = 0.158

It seems to work: letting the model see the rainfalls for the next n=10 days brings the RMSE down to 0.158. Again, this makes sense, as the recent rainfalls make up a large component of the flow.

Note that we cannot use the melting as a future covariate: it is not known in advance, so we would not be able to provide it at prediction time. Darts would complain if you tried to call predict() with a future_covariates series that doesn't extend at least 10 time steps further into the future than the target.
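Darts performs this check for you. Purely to illustrate the rule for daily series, the future covariates must end at least n days after the last day of the target. The helper below is hypothetical and is not the Darts implementation; the dates are arbitrary examples.

```python
from datetime import date, timedelta


def check_future_covariates_cover(target_end: date, covariate_end: date, n: int) -> None:
    """Raise if daily future covariates stop before target_end + n days."""
    required_end = target_end + timedelta(days=n)
    if covariate_end < required_end:
        missing = (required_end - covariate_end).days
        raise ValueError(
            f"future covariates end on {covariate_end}, but forecasting {n} days "
            f"requires values up to {required_end} ({missing} day(s) missing)"
        )


target_end = date(2002, 12, 31)
check_future_covariates_cover(target_end, date(2003, 1, 10), n=10)  # passes silently
try:
    check_future_covariates_cover(target_end, date(2003, 1, 9), n=10)
except ValueError as err:
    print(err)  # reports that 1 day is missing
```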

Fifth Model: Using Past Melting and Future Rainfalls

Finally, we will use a RegressionModel, which lets us specify both past_covariates and future_covariates. RegressionModel in Darts is a wrapper around any "scikit-learn-like" regression model, and by default it uses a linear regression. It can predict future values of the target series as a function of any combination of lagged values of the target, the past covariates and the future covariates.

The lags of the target and past covariates have to be strictly negative (in the past), whereas the lags of the future covariates can also be zero or positive (the present or the future). For instance, a lag of -5 means that the value at time t-5 is used to predict the target at time t, and a lag of 0 means that the future covariate value at time t is used to predict the target at time t. In the notebook code, we specify the past covariate lags as [-5, -4, -3, -2, -1], which means that the model will look at the last 5 past_covariates values (we could also have specified lags_past_covariates=5 instead). Similarly, we specify the future covariate lags as [-4, -3, -2, -1, 0], which means that the model will look at the last 4 historic values (lags -4 to -1) and the current value (lag 0) of the future_covariates (we could also have specified lags_future_covariates=(4, 1) instead). Note that we do not specify the lags argument for the target, which means that this model won't look at past values of the target at all; it will look at covariates only.
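The lag conventions described above can be mirrored in a few lines of plain Python. This sketch is not Darts' implementation. `expand_lags` and `feature_row` are hypothetical helpers: the first expands the shorthand forms mentioned in the text (lags_past_covariates=5 and lags_future_covariates=(4, 1)), and the second builds the feature row used to predict the target at time t from toy data.

```python
from typing import List, Sequence, Tuple, Union

LagSpec = Union[int, Tuple[int, int], Sequence[int]]


def expand_lags(spec: LagSpec, kind: str) -> List[int]:
    """Expand a lag specification following the conventions described in the text.

    kind is "target", "past" or "future". An int n means the last n lags
    (-n, ..., -1); for future covariates a 2-tuple (n_past, n_future) means
    lags -n_past, ..., n_future - 1. Any other sequence is taken as explicit lags.
    """
    if kind in ("target", "past"):
        lags = list(range(-spec, 0)) if isinstance(spec, int) else list(spec)
        if any(lag >= 0 for lag in lags):
            raise ValueError("target and past covariate lags must be strictly negative")
        return lags
    if kind == "future":
        if isinstance(spec, tuple) and len(spec) == 2:
            n_past, n_future = spec
            return list(range(-n_past, n_future))
        return list(spec)
    raise ValueError(f"unknown kind: {kind!r}")


def _value_at(series: Sequence[float], idx: int) -> float:
    """Read series[idx], refusing Python's wrap-around for negative indices."""
    if not 0 <= idx < len(series):
        raise IndexError(f"time index {idx} is outside the series")
    return series[idx]


def feature_row(t: int, past_cov: Sequence[float], future_cov: Sequence[float],
                past_lags: Sequence[int], future_lags: Sequence[int]) -> List[float]:
    """Features used to predict the target at time t: lag l reads time t + l."""
    return ([_value_at(past_cov, t + lag) for lag in past_lags]
            + [_value_at(future_cov, t + lag) for lag in future_lags])


past_lags = expand_lags(5, "past")            # [-5, -4, -3, -2, -1]
future_lags = expand_lags((4, 1), "future")   # [-4, -3, -2, -1, 0]
melting = [float(i) for i in range(20)]       # toy values: melting[i] == i
rainfalls = [100.0 + i for i in range(20)]    # toy values: rainfalls[i] == 100 + i
print(feature_row(10, melting, rainfalls, past_lags, future_lags))
# [5.0, 6.0, 7.0, 8.0, 9.0, 106.0, 107.0, 108.0, 109.0, 110.0]
```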

RegressionModel (using a linear regression) predicting the flow as a function of the past 5 melting values and the past 4 and current rainfall values. Backtest RMSE = 0.102.

This model drastically improves the RMSE, down to 0.102. So once again, linear regression wins! In fact, if we kept some additive noise on the covariates but removed the additive noise on the flow, we would find that this model produces perfect forecasts. To be fair, this was expected: the target is built as a linear combination of the covariates to begin with, and we built our RegressionModel with exactly the lags that capture the data generation process. Still, we expect these regression models to be very useful in practice, due to their speed, their versatility in capturing both past and future covariates with precise lags, and the fact that, like neural networks, they can be trained on multiple series, while requiring less tuning.
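To see why the regression model can be perfect when the flow has no noise, the plain-Python sketch below fits an ordinary least-squares regression on the same lags as above. The fit is a stand-in for the linear regression that RegressionModel uses by default, with no intercept for brevity. The data are random covariates and a noise-free flow built from the river model; these inputs and the sample size are assumptions for illustration. The recovered coefficients are, up to floating-point rounding, 1 for melting at lag -5, 0 for the other melting lags, and 1 for each rainfall lag, which is exactly the data-generating process.

```python
import random
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows: List[List[float]], targets: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def fit_river_regression(n_days: int = 400, seed: int = 0) -> List[float]:
    """Fit flow on melting lags -5..-1 and rainfall lags -4..0 (noise-free flow)."""
    rng = random.Random(seed)
    melting = [rng.random() for _ in range(n_days)]    # assumed random covariates
    rainfalls = [rng.random() for _ in range(n_days)]
    past_lags = [-5, -4, -3, -2, -1]     # melting (past covariate)
    future_lags = [-4, -3, -2, -1, 0]    # rainfall (future covariate)
    rows, targets = [], []
    for t in range(5, n_days):
        rows.append([melting[t + lag] for lag in past_lags]
                    + [rainfalls[t + lag] for lag in future_lags])
        # Noise-free river model: melting 5 days ago + rainfall over the last 5 days.
        targets.append(melting[t - 5] + sum(rainfalls[t - 4 : t + 1]))
    return least_squares(rows, targets)


coefficients = fit_river_regression()
print(coefficients)  # close to [1, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```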

Conclusions

Past and future covariates often play an important role in forecasting problems, but they can be hard to handle and reason about. One goal of Darts is to make this experience easier and less error-prone: using covariates with Darts boils down to passing your external time series as the past_covariates or future_covariates arguments of the models' fit() and predict() methods. In our river flow example, we observed that knowing past glacier melting and future rainfalls can each improve forecasting to different extents, and a simple linear-regression-based model capturing both obtains the best results in this case.

If you have any feedback on Darts, or if you have forecasting challenges you'd like to tell us about, feel free to reach out to us.
