# Time Series Algorithms in SQL Server

This is the fourth article on data mining

series. The below are the previous articles in this series.

Shopping

Basket Analysis in SQL Server

Using

Decision Trees in SQL Server

Data

Mining Cluster Analysis in SQL Server

This article focuses Time Series Algorithms which

are a forecasting technique. One of the most common algorithms used in

industry are time series algorithms which can be used to answer questions on the future values such sales volume for the next season,

or petrol prices in winter. Most of the cases, time series algorithms are limited to prices

and quantities. However, using the same theories and techniques, they have the

capabilities of predicting trajectory or a moving object and next video frame.

Time series data is typically correlated

with time. Also, there is often a seasonality factor. For example, hotels they have a seasons for

their sales. So when running predictions, you will need to consider seasonal activities.

## Requirements for Time Series Algorithm

Let us look at requirements for a Time

Series Algorithm. Let us use the vTimeSeries view in the AdventureWorksDW2014

database. Here is the output of the vTimeSeries view.

A single key time column

This defines the time

slices that the model will use. In the above example, TimeIndex is the time

column. This needs to be a continuous value as there shouldn’t be any gaps. In case of missing values there are few techniques can be

adopted which will be discussed later.

A predictable column

This is the column which

you will predict. In the above example, Amount and Quantity are the

predictable column.

Series key column

This is an optional column

to identify a series. In case of sales, there can be several regions so that

you can predict sales for region wise. In the above example, ModelRegion is

the series key column.

## Sample

Let us do a sample with SQL Server 2014

using the sample database. We will create a SQL Server Analysis Service (SSAS)

project. After the project creation is completed, the data source should be created

with the AdventureWorksDW2014 database. Then vTimeSeries view needs to be added

to the data source views.

After that your solution explorer of the

project looks like the below.

The next task is to create a data mining model.

By right clicking the mining structures nodes and select new mining structure.

In this wizard driven approach, you need to select Microsoft Time Series and

then vTimeSeries as the case table.

After setting a few default settings, you will end up

with option of selecting the key, predictable and series columns.

The TimeIndex column is the key column. ‘Amount’ is

selected as the Input and Predict column. In data mining a predication column

can be Predict or Predict Only. Predict means, for the prediction of next value

previous predicted values will be used. Predict Only means that predicted

values will not be used for prediction.

The ModelRegion column is selected as the

optional column. After creation of the mining model, at the properties

following can be observed.

Data types can be observed from the

following screen.

The data types can be changed from the above

screen. Most of the time, default data types are used.

Finally after the building model and

processing it, users can see the predictions from the mining model viewer.

There are a few options in this screen, where

you can show the deviations and the number of prediction steps.

Also, rather than selecting all the

regions, you have the options of selecting them. By selecting required group,

only that group can be viewed.

## Prediction

Important activity in the data mining is

predicting. Predicting can be done from the **Mining Model Predicting **tab

which is shown below.

After the above configuration, the results will

be shown below.

Data mining algorithm doesn’t understand

that after 201312, it should be 201401. Instead it will show 201313.

## DMX Query

DMX queries can be executed to retrieve

data.

The following query is to retrieve above result which will present next

4.

SELECT [v Time Series].[Model Region], PredictTimeSeries([v Time Series].[Amount],4) FROM [v Time Series]

The FLATTENED command will present the

result in a tabular format.

SELECT FLATTENED [v Time Series].[Model Region], PredictTimeSeries([v Time Series].[Amount],4) FROM [v Time Series]

The output will be the following.

DMX has the option of giving the range for the prediction. The following DMX

query will return the next 4-10 predictions ignoring 1^{st}, 2^{nd}

and 3^{rd}.

SELECT FLATTENED [v Time Series].[Model Region], PredictTimeSeries([v Time Series].[Amount],4,10) FROM [v Time Series]

A Filtering option is also available with the DMX.

SELECT FLATTENED [v Time Series].[Model Region], PredictTimeSeries([v Time Series].[Amount],4) FROM [v Time Series] Where [Model Region] ='M200 Europe'

Standard deviation and variance are important attributes in time

series which the following query will provide.

SELECT FLATTENED [Model Region], (SELECT *, PredictStdev(Amount) AS std, PredictVariance(Amount) AS Vari FROM PredictTimeSeries(Amount, 5) ) AS Forecast FROM [v Time Series]

## Missing Values

Since gaps in data are not allowed in time

series, missing value substitution is a major concern. In case of cross

prediction ( discussed below) with multiple values, all series should have

the same end points. With SQL Server Analysis Service there are three

options which are Previous, Mean and Constant.

·

Previous – > Repeats the value from the

previous time slice.

·

Mean -> Uses moving average of time slices

used in training.

·

Numeric Constant -> Uses the specified number

to replace all missing values.

Also, in some cases data mining algorithms

are used. For example, if the 10^{th} value is not available, it will

be predicted using previous nine values.

## Periodicity

Periodicity or seasonality is a repeating

pattern found within the data. For example, seasonal temperature fluctuations,

weekly or daily traffic flow fluctuations, sales in holiday seasons. Both

ARTxP and ARIMA (discussed later) support detection of seasonality or

periodicity. Fast Fourier transformations are used to detect seasonality

before training. You can control seasonality detection by using AUTODETECT_SEASONALITY

and PERIODICITY_HINT parameters.

## Cross Prediction

Cross prediction means predicting the

outcome of one series based on the behavior of another series. Typically, every

time series does not behave only based on it’s previous values. For example, petrol

prices depends on the political stability and other factors.

## Parameters

There are few parameters available for

users to tweak the time series algorithm.

#### AUTO_DETECT_PERIODICITY

Specifies a numerical value between 0 and 1

used to detect periodicity. Setting this value closer to 1 favors discovery of

many near-periodic patterns and automatic generation of periodicity hints.

Dealing with a large number of periodicity hints will likely lead to

significantly longer model training times. If the value is closer to 0,

periodicity is detected only for strongly periodic data.

#### COMPLEXITY_PENALTY

Inhibits the growth of the decision tree.

Decreasing this value increases the likelihood of a split, while increasing

this value decreases the likelihood. This is available only Enterprise

Edition.

#### FORECAST_METHOD

Specifies the forecasting algorithm to use.

There are two types available, ARTxp and ARIMA. ARTxp based on Microsoft

Decision Trees. This is optimized for predicting the next likely value in a

series and this supports cross-prediction. According to Microsoft this is a

unique feature available with Microsoft tools. On the other hand ARIMA is well

known and understood in the industry which does not supports cross-prediction.

Select MIXED to create models for both

ARTXP and ARIMA time series algorithms and combine their results during

prediction.

In Standard Edition, the models are

combined using an automatic ratio that favors ARTXP for near-term and ARIMA for

long-term prediction. In Enterprise Edition, the models are combined and

weighted according to the value set for PREDICTION_SMOOTHING.

If FORECAST_METHOD is set to ARTXP or

ARIMA, the value for PREDICTION_SMOOTHING is ignored.

#### HISTORIC_MODEL_COUNT

This specifies the number of historic

models that will be built. This is an Enterprise Edition feature

#### MAXIMUM_SERIES_VALUE

This specifies the upper constraint for any

time series prediction. Predicted values will never be greater than this

constraint. This is an Enterprise Edition feature.

#### MISSING_VALUE_SUBSTITUTION

As discussed before, this specifies the

method used to fill the gaps in historical data. By default, irregular gaps or

ragged edges in data are not allowed. The methods available to fill in

irregular gaps or edges are: by Previous value, by Mean value, or by specific

numerical constant. This parameter is used only for incomplete rows of input

data.

#### INSTABILITY_SENSITIVITY

This controls the threshold at which the

algorithm detects instability in variances in the ARTXP time series algorithm

and stops making predictions. The specified value is only valid when

FORECAST_METHOD is set to ARTXP. The default value of 1 turns on instability

detection. This setting optimizes the reliability of predictions, but might

limit the number of prediction steps. A value of 0 turns off instability

detection. This setting extends predictions indefinitely, but those predictions

might become statistically invalid. This is an Enterprise Edition feature.

Array

## No comments yet... Be the first to leave a reply!