# Ust

### Ust Oldfield's Blog

Forecasting is pretty crucial to any business as nearly every function within a company needs an estimate of what the future will look like as a foundation to create and update plans. In a previous blog I addressed the principles and methods of forecasting and in this blog I will go through how you can forecast in R and SQL using the Autoregressive Integrated Moving Average (ARIMA) technique.

I will approach this in two parts. Part 1 will be demonstrating the forecast in an R platform; and in Part 2 I will be demonstrating the forecast using the R integration in SQL Server in conjunction with SSRS to visualise the results.

Part 1: The Pure R Approach

As mentioned, this will be creating a forecast using R and taking data from a SQL database. You will need to be able to connect SQL to R in order to do follow this section. You’ll also need to install the forecast CRAN package. As always we’ll be using AdventureWorks for the provision of data.

The R script is as follows:

library(RODBC)

library(forecast)

Data <- sqlQuery(SQLConnection, "WITH CTE AS(SELECT YearDate as[Year], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]  FROM (

SELECT * FROM (SELECT --DISTINCT

SUM(salesamount) as monthsales

,MONTH([OrderDate]) as monthdate

,YEAR(OrderDate) as  yeardate

group by MONTH(orderdate), year(orderdate)) t )

AS Table1

PIVOT(SUM(MonthSales) FOR Monthdate IN([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) AS PVT

)

SELECT [Year], cast([1] as numeric(28,8)) as Jan, cast([2] as numeric(28,8)) as Feb, cast([3] as numeric(28,8)) as Mar, cast([4] as numeric(28,8)) as Apr, cast([5] as numeric(28,8)) as May,

cast([6] as numeric(28,8)) as Jun, cast([7] as numeric(28,8)) as Jul, cast([8] as numeric(28,8)) as Aug, cast([9] as numeric(28,8)) as Sep, cast([10] as numeric(28,8)) as Oct,

cast([11] as numeric(28,8)) as Nov, cast([12] as numeric(28,8)) as [Dec] FROM CTE

ORDER BY [Year] ASC")

month <- c(1,2,3,4,5,6,7,8,9,10,11,12)

mn<- month.abb[month]

DataMonth <- Data[ ,mn]

DataTS<- ts(c(t(DataMonth)),start = c(2010,12), end = c(2014,1),frequency=12)

ForecastArima <- forecast(auto.arima(DataTS))

plot(ForecastArima)

ForecastArima

The numerical output will look like this:

With the graphical output should look something like this:

Part 2: Forecasting in SQL

Forecasting in SQL follows a pretty similar approach to the pure R approach in producing the numerical output. But that’s where the similarity ends. As we’ll want to visualise the output we’ll need to do some manipulation on the result from the R script we will have to wrap the initial query in a stored procedure which can be called by the stored procedure which will populate the graph in SSRS.

The first step is to write the initial stored procedure which will contain the R script

CREATE PROCEDURE dbo.spForecastSalesPreparation AS

BEGIN

EXECUTE sp_execute_external_script

@language = N'R'

,@script = N'library(forecast);

Data<- InputDataSet

month<- c(1,2,3,4,5,6,7,8,9,10,11,12)

mn<-month.abb[month]

DataMonth <- Data[ ,mn]

DataTS <- ts(c(t(DataMonth)), start = c(2010,12), end = c(2014,1), frequency =12)

ForecastArima <- forecast(auto.arima(DataTS))

,@input_data_1 = N'WITH CTE AS(SELECT YearDate as[Year],[1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12]  FROM (

SELECT * FROM (SELECT

SUM(salesamount) as monthsales

,MONTH([OrderDate]) as monthdate

,YEAR(OrderDate) as  yeardate

group by MONTH(orderdate), year(orderdate)) t )

AS Table1

PIVOT(SUM(MonthSales) FOR Monthdate IN([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) AS PVT)

SELECT [Year],

cast([2] as numeric(28,8)) as Feb,

cast([3] as numeric(28,8)) as Mar,

cast([1] as numeric(28,8)) as Jan,

cast([4] as numeric(28,8)) as Apr,

cast([5] as numeric(28,8)) as May,

cast([6] as numeric(28,8)) as Jun,

cast([7] as numeric(28,8)) as Jul,

cast([8] as numeric(28,8)) as Aug,

cast([9] as numeric(28,8)) as Sep,

cast([10] as numeric(28,8)) as Oct,

cast([11] as numeric(28,8)) as Nov,

cast([12] as numeric(28,8)) as [Dec] FROM CTE

ORDER BY [Year] ASC'

WITH RESULT SETS (("Point.Forecast" numeric(28,2) NOT NULL,

"Lo.80" NUMERIC(28,2) NOT NULL,

"Hi.80" NUMERIC(28,2) NOT NULL,

"Lo.95" NUMERIC(28,2) NOT NULL,

"Hi.95" NUMERIC(28,2) NOT NULL));

END;

The R script that you invoke is very similar to the one in the pure R approach, as is the SQL query. But the key element is the WITH RESULT SETS command, which returns the results of the query in a tabular format. Note, it is important to specify the data types for each of the columns you wish to return.

Because of the desire to visualise the results from R we’ll need to join the output from R with the input to R and keep it in a structure that makes sense, both as a table and as a graph. The stored procedure that I’ve created is as follows:

CREATE PROCEDURE dbo.spForecastSales AS

BEGIN

DECLARE @ForecastResults TABLE (Id INT IDENTITY(1,1),

Value NUMERIC(28,2),

Lo80 NUMERIC(28,2),

Hi80 NUMERIC(28,2),

Lo95 NUMERIC(28,2),

Hi95 NUMERIC(28,2))

DECLARE @FinalResults TABLE (Id INT IDENTITY(1,1),

ResultType VARCHAR(10),

YearDate INT,

MonthDate INT,

Value NUMERIC(28,2),

Lo80 NUMERIC(28,2),

Hi80 NUMERIC(28,2),

Lo95 NUMERIC(28,2),

Hi95 NUMERIC(28,2))

-- insert the actual sales values

INSERT INTO @FinalResults (ResultType,

YearDate,

MonthDate,

Value)

SELECT       'Actual'

,YEAR(OrderDate)   as  YearDate

,MONTH(OrderDate)  as      MonthDate

,SUM(SalesAmount)  as    Value

FROM [dbo].[FactInternetSales]

GROUP BY MONTH(OrderDate), YEAR(OrderDate)

ORDER BY YearDate ASC, MonthDate ASC

-- insert the forecast sales values

INSERT INTO @ForecastResults

EXECUTE dbo.spForecastSalesPreparation

DECLARE @MaxId INT,

@LastDate DATE;

SET @MaxId = (SELECT MAX(Id) FROM @FinalResults)

SELECT @LastDate = DATEFROMPARTS(YearDate, MonthDate, 1)

FROM @FinalResults

WHERE Id = @MaxId

INSERT INTO @FinalResults (ResultType,

YearDate,

MonthDate,

Value,

Lo80,

Hi80,

Lo95,

Hi95)

SELECT 'Forecast',

YearDate = DATEPART(year, DATEADD(month, Id, @LastDate)),

MonthDate = DATEPART(Month, DATEADD(month, Id, @LastDate)),

Value,

Lo80,

Hi80,

Lo95,

Hi95

FROM @ForecastResults

ORDER BY Id ASC

SELECT  Id,

ResultType,

DATEFROMPARTs(YearDate, MonthDate, 1) AS YearDate2,

Value,

Lo80,

Hi80,

Lo95,

Hi95

FROM @FinalResults

END;

Following the normal steps of creating a report in SSRS using a stored procedure as a dataset your graph might look something like this:

So, we’ve shown two similar yet different approaches to forecasting using the ARIMA method and utilising R and SQL. Hopefully this demonstrates how quick and easy it is to establish a rudimentary forecasting practice wherever you are.

What comes to your mind when you hear the words forecasting, forecasts etc?

Invariably, you’ll think of weather forecasts. But forecasts are much more than that.

Forecasting is the process of making predictions of the future based on past and present data and analysis of trends. It’s a process that has existed for millennia, though often with dubious methodologies… Instead of looking in to a crystal ball to predict the future we are going to employ the power of statistics!

Why use forecasting?

Nearly every function within a company needs an estimate of what the future will look like as a foundation to create and update plans. For example:

• Marketers need forecasts to estimate the expected demand that will be generated from their activities.

• Salespeople use forecasts to understand what volumes are expected in each period and to evaluate the sales pipeline activity to make sure that it supports those expectations.

• Managers in the supply chain use forecasts to make timely purchasing requests, develop production plans, evaluate capacity needs, and develop logistical plans.

• Finance professionals use forecasts to prepare and understand financial plans. They also use them to report on earnings expectations.

What can be forecast?

The predictability of an event or a quantity depends the following conditions:

1. How well we understand the factors that contribute to it;

• For example, in energy consumption the change in temperature will have an impact on the amount of energy we use to heat our homes.

2. How much data is available;

• Generally, the more data you have to hand the more accurate your forecasts will be.

3. Whether the forecasts can affect the thing we are trying to forecast.

• This is the principle of self-fulfilling forecasts. For example, if you publicly forecast a share price buyers will adjust their behaviour in order to achieve the forecasted price.

Often in forecasting, a key step is knowing when something can be forecast accurately, and when forecasts will be no better than tossing a coin. Good forecasts capture the genuine patterns and relationships which exist in the historical data, but do not replicate past events that will not occur again.

Forecasting methods

1.       Average approach

a.       The prediction of all future values are the mean of the past values

2.       Naïve approach

a.       The prediction of all future values is that of the last observed value

b.       Approach is cost effective and provides a benchmark for more sophisticated models

3.       Drift method

a.       A variation on the Naïve method whereby the increase or decrease (drift) is the average change seen across the historical data

4.       Seasonal Naïve

a.       Accounts for seasonal change seen in the data and sets each prediction to the last observed value in the same season

5.       Autoregressive integrated moving average (ARIMA)

a.       Used to apply a better understanding of data or to predict future points

6.       Exponential smoothing

a.       Applies smoothing to the data in order to remove noise

b.       Three types

i.      Simple exponential

1.       Models level

ii.      Double exponential

1.       Models level and trend

iii.      Triple exponential

1.       Models trend and seasonal components

7.       Probabilistic

a.       Assign a probability value to each of a variety of different outcomes, and the complete set of probabilities represents a probability forecast.

b.       Most common example is in weather forecasting, but can be used in energy consumption, sports betting and population.

8.       Qualitative approaches

a.       Typically asking people, usually experts, what they think is going to happen.

i.      Can be interpreted as a crowd sourced forecast

Forecast Planning

Forecasting is a common statistical task which helps to inform decisions about scheduling of staff, production, etc., and provides a guide for strategic planning. Forecasting is often used in conjunction with planning and setting goals.

Forecasting is about the process of making predictions of the future based on past and present data and analysis of trends.

Goals are what you would like to happen. You should plan how you are going to achieve them and forecast to see if they are realistic.

Planning is a response to forecasts and goals. It involves determining an appropriate response to make forecasts match goals.

The application of these three activities sets the length of time you want to forecast.

Short-term forecasts:

The sort of activity this is needed for is scheduling of staff, production, transportation etc. Demand forecasts are often required

Medium-term forecasts:

Needed to plan future resource requirements, such as hiring of staff and ordering materials and parts.

Long-term forecasts:

Used in strategic planning.

Forecasting is a really powerful tool to add to your toolkit in order to better understand the future and to better accommodate what that forecast might bring so that you can grow your business in the best way possible.  I will be publishing a follow-up blog in how to perform forecasting in R in the near future.