Ust

Ust Oldfield's Blog

Process a Tabular model from a Stored Procedure

The Challenge

Recently, at a client, I was challenged to create a stored procedure that would process a tabular model. This stored procedure would then be executed from a web application. The process behind it being: a user enters data into a web application, which gets written to a database. That data then needs to be immediately surfaced up into reports, with additional calculations and measures along the way. Therefore the tabular model, which does all the additional calculation and measures, needs to be processed by a user from the web application.

image

That’s the challenge – process a tabular model quickly, but should be processed by users on-demand.

The Solution

Part one: Method of process

There’s quite a few methods to process a tabular model: use an SSIS package, PowerShell, SQL Agent Job and others. I opted for SQL Agent Job because it was the most simple method of execution without having to reconfigure the server or introduce technologies and languages that weren’t already in use.

First things first, create a SQL Agent Job, I called mine ProcessTabular. Then create a Step. The Type should be SQL Server Analysis Services Command, input the server address and input the refresh script. I’m using SQL Server 2016, so using Tabular Model Scripting Language (TMSL) for my command. XMLA commands also work for older versions. A full list of commands for processing a tabular database can be found here.

image

Part two: Start the Agent Job

Now that we have a SQL Agent job, we’ll need to start that job using SQL. Luckily, there’s a system stored procedure that can start agent jobs: msdb.dbo.sp_start_job

Method for calling it is

EXEC msdb.dbo.sp_start_job 'ProcessTabular'

producing the following successful message

image

Part three: The Stored Procedure

sp_start_job works, but it doesn’t accommodate for providing a completion message, or informing a user that a process is in progress.

Introducing the code for the stored procedure:

CREATE PROCEDURE [dbo].[ProcessTabularDatabase]
 
AS
 
DECLARE @JobName nvarchar(50) = 'ProcessTabular',
@ResultCheck INT
 
IF NOT EXISTS(    
         SELECT 1
         FROM msdb.dbo.sysjobs_view job 
         JOIN msdb.dbo.sysjobactivity activity ON job.job_id = activity.job_id
         WHERE 
             activity.run_Requested_date IS NOT NULL 
         AND activity.stop_execution_date IS NULL 
         AND job.name = @JobName
         )
BEGIN     
     PRINT 'Starting job ''' + @JobName + '''';
     EXEC msdb.dbo.sp_start_job @JobName;
 
  WHILE (1 = 1)
  BEGIN
   SELECT @ResultCheck =  IIF(stop_execution_date IS NULL,1,0)
      FROM msdb.dbo.sysjobactivity AS sja
      JOIN msdb.dbo.sysjobs AS sj ON sja.job_id = sj.job_id
      WHERE sj.name = @JobName
   IF @ResultCheck = 0 BREAK;
  END
 
  PRINT 'Successfully Processed Tabular Database'
END
ELSE
BEGIN
     PRINT 'Job ''' + @JobName + ''' is already started ';
END


Conclusion

This stored procedure can be executed by the web application, enabling users to process a tabular database on-demand and get feedback as to the success of the task.











Forecasting: R and SQL Server 2016

Forecasting is pretty crucial to any business as nearly every function within a company needs an estimate of what the future will look like as a foundation to create and update plans. In a previous blog I addressed the principles and methods of forecasting and in this blog I will go through how you can forecast in R and SQL using the Autoregressive Integrated Moving Average (ARIMA) technique.

I will approach this in two parts. Part 1 will be demonstrating the forecast in an R platform; and in Part 2 I will be demonstrating the forecast using the R integration in SQL Server in conjunction with SSRS to visualise the results.

Part 1: The Pure R Approach

As mentioned, this will be creating a forecast using R and taking data from a SQL database. You will need to be able to connect SQL to R in order to do follow this section. You’ll also need to install the forecast CRAN package. As always we’ll be using AdventureWorks for the provision of data.

The R script is as follows:

library(RODBC)

library(forecast)

SQLConnection <- odbcConnect("AdventureWorks")

 

Data <- sqlQuery(SQLConnection, "WITH CTE AS(SELECT YearDate as[Year], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]  FROM (

SELECT * FROM (SELECT --DISTINCT

             SUM(salesamount) as monthsales

                                ,MONTH([OrderDate]) as monthdate

                                ,YEAR(OrderDate) as  yeardate

                   FROM [AdventureworksDW2016CTP3].[dbo].[FactInternetSales]

                   group by MONTH(orderdate), year(orderdate)) t )

AS Table1

PIVOT(SUM(MonthSales) FOR Monthdate IN([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) AS PVT

)

SELECT [Year], cast([1] as numeric(28,8)) as Jan, cast([2] as numeric(28,8)) as Feb, cast([3] as numeric(28,8)) as Mar, cast([4] as numeric(28,8)) as Apr, cast([5] as numeric(28,8)) as May,

cast([6] as numeric(28,8)) as Jun, cast([7] as numeric(28,8)) as Jul, cast([8] as numeric(28,8)) as Aug, cast([9] as numeric(28,8)) as Sep, cast([10] as numeric(28,8)) as Oct,

cast([11] as numeric(28,8)) as Nov, cast([12] as numeric(28,8)) as [Dec] FROM CTE

ORDER BY [Year] ASC")

 

month <- c(1,2,3,4,5,6,7,8,9,10,11,12)

mn<- month.abb[month]

 

DataMonth <- Data[ ,mn]

 

DataTS<- ts(c(t(DataMonth)),start = c(2010,12), end = c(2014,1),frequency=12)

ForecastArima <- forecast(auto.arima(DataTS))

plot(ForecastArima)

ForecastArima

 

The numerical output will look like this:


With the graphical output should look something like this:


Part 2: Forecasting in SQL

Forecasting in SQL follows a pretty similar approach to the pure R approach in producing the numerical output. But that’s where the similarity ends. As we’ll want to visualise the output we’ll need to do some manipulation on the result from the R script we will have to wrap the initial query in a stored procedure which can be called by the stored procedure which will populate the graph in SSRS.

The first step is to write the initial stored procedure which will contain the R script

CREATE PROCEDURE dbo.spForecastSalesPreparation AS

BEGIN

EXECUTE sp_execute_external_script

      @language = N'R'

     ,@script = N'library(forecast);

                                Data<- InputDataSet

                                month<- c(1,2,3,4,5,6,7,8,9,10,11,12)

                                mn<-month.abb[month]

                                DataMonth <- Data[ ,mn]

                                DataTS <- ts(c(t(DataMonth)), start = c(2010,12), end = c(2014,1), frequency =12)

                                ForecastArima <- forecast(auto.arima(DataTS))

                                ForecastArimadf <- data.frame(ForecastArima)'

     ,@input_data_1 = N'WITH CTE AS(SELECT YearDate as[Year],[1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12]  FROM (

                                                                      SELECT * FROM (SELECT

                                                                       SUM(salesamount) as monthsales

                                                                       ,MONTH([OrderDate]) as monthdate

                                                                       ,YEAR(OrderDate) as  yeardate

                                                                       FROM [AdventureworksDW2016CTP3].[dbo].[FactInternetSales]

                                                                       group by MONTH(orderdate), year(orderdate)) t )

                                                          AS Table1

                                                          PIVOT(SUM(MonthSales) FOR Monthdate IN([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) AS PVT)

                                      SELECT [Year],

                                      cast([2] as numeric(28,8)) as Feb,

                                      cast([3] as numeric(28,8)) as Mar,

                                      cast([1] as numeric(28,8)) as Jan,

                                      cast([4] as numeric(28,8)) as Apr,

                                      cast([5] as numeric(28,8)) as May,

                                      cast([6] as numeric(28,8)) as Jun,

                                      cast([7] as numeric(28,8)) as Jul,

                                      cast([8] as numeric(28,8)) as Aug,

                                      cast([9] as numeric(28,8)) as Sep,

                                      cast([10] as numeric(28,8)) as Oct,

                                      cast([11] as numeric(28,8)) as Nov,

                                      cast([12] as numeric(28,8)) as [Dec] FROM CTE

                                      ORDER BY [Year] ASC'

     ,@output_data_1_name = N'ForecastArimadf'

    WITH RESULT SETS (("Point.Forecast" numeric(28,2) NOT NULL,

                                      "Lo.80" NUMERIC(28,2) NOT NULL,

                                      "Hi.80" NUMERIC(28,2) NOT NULL,

                                      "Lo.95" NUMERIC(28,2) NOT NULL,

                                      "Hi.95" NUMERIC(28,2) NOT NULL));

END;

 

The R script that you invoke is very similar to the one in the pure R approach, as is the SQL query. But the key element is the WITH RESULT SETS command, which returns the results of the query in a tabular format. Note, it is important to specify the data types for each of the columns you wish to return.

Because of the desire to visualise the results from R we’ll need to join the output from R with the input to R and keep it in a structure that makes sense, both as a table and as a graph. The stored procedure that I’ve created is as follows:

CREATE PROCEDURE dbo.spForecastSales AS

BEGIN

DECLARE @ForecastResults TABLE (Id INT IDENTITY(1,1),

                                                   Value NUMERIC(28,2),

                                                   Lo80 NUMERIC(28,2),

                                                   Hi80 NUMERIC(28,2),

                                                   Lo95 NUMERIC(28,2),

                                                  Hi95 NUMERIC(28,2))

 

DECLARE @FinalResults TABLE (Id INT IDENTITY(1,1),

                                            ResultType VARCHAR(10),

                                            YearDate INT,

                                            MonthDate INT,

                                            Value NUMERIC(28,2),

                                            Lo80 NUMERIC(28,2),

                                            Hi80 NUMERIC(28,2),

                                            Lo95 NUMERIC(28,2),

                                            Hi95 NUMERIC(28,2))

 

 

-- insert the actual sales values

INSERT INTO @FinalResults (ResultType,

                                            YearDate,

                                            MonthDate,

                                            Value)

SELECT       'Actual'

             ,YEAR(OrderDate)   as  YearDate

             ,MONTH(OrderDate)  as      MonthDate

             ,SUM(SalesAmount)  as    Value

FROM [dbo].[FactInternetSales]

GROUP BY MONTH(OrderDate), YEAR(OrderDate)

ORDER BY YearDate ASC, MonthDate ASC

                                     

-- insert the forecast sales values

INSERT INTO @ForecastResults

EXECUTE dbo.spForecastSalesPreparation

 

DECLARE @MaxId INT,

             @LastDate DATE;

 

SET @MaxId = (SELECT MAX(Id) FROM @FinalResults)

 

SELECT @LastDate = DATEFROMPARTS(YearDate, MonthDate, 1)

FROM @FinalResults

WHERE Id = @MaxId

 

INSERT INTO @FinalResults (ResultType,

                                         YearDate,

                                         MonthDate,

                                         Value,

                                         Lo80,

                                         Hi80,

                                         Lo95,

                                         Hi95)

SELECT 'Forecast',

         YearDate = DATEPART(year, DATEADD(month, Id, @LastDate)),

         MonthDate = DATEPART(Month, DATEADD(month, Id, @LastDate)),

         Value,

         Lo80,

         Hi80,

         Lo95,

         Hi95

FROM @ForecastResults

ORDER BY Id ASC

 

SELECT  Id,

             ResultType,

             DATEFROMPARTs(YearDate, MonthDate, 1) AS YearDate2,

             Value,

             Lo80,

             Hi80,

             Lo95,

             Hi95

FROM @FinalResults

 

END;

 

Following the normal steps of creating a report in SSRS using a stored procedure as a dataset your graph might look something like this:


So, we’ve shown two similar yet different approaches to forecasting using the ARIMA method and utilising R and SQL. Hopefully this demonstrates how quick and easy it is to establish a rudimentary forecasting practice wherever you are.

Utilising R in SSRS

With the release of SQL Server 2016 CTP3 the inclusion of R has been a powerful and welcome addition. As R can now be directly queried from inside SQL Server Management Studio (SSMS) one can now use the data outputs from R in SQL Server Reporting Services (SSRS) thereby utilising the power of R in the convenience of SSRS.

This blog will guide you through the process of creating a simple report in SSRS using data from R.

As demonstrated in a previous blog, it is very easy to begin using R within SQL Server and this is no different.

First of you will need your SQL R Script, for which I’m producing a simple K Means cluster of employees in the Adventure Work Data Warehouse. Then you will want to wrap that query inside a stored procedure.

CREATE PROCEDURE dbo.spKMeansEmployee

AS

BEGIN

      EXECUTE sp_execute_external_script

                   @language = N'R',

                   @script = N'ClusterCount <- 4;

                                      df <- data.frame(InputDataSet);

                                      ClusterFeatures <- data.frame(df$BaseRate, df$VacationHours, df$SickLeaveHours, df$SalaryFlag);

                                      ClusterResult <- kmeans(ClusterFeatures, centers = ClusterCount, iter.max = 10)$cluster;

                                      OutputDataSet <- data.frame(df, ClusterResult);',

                   @input_data_1 = N'SELECT

                                                   EmployeeKey,

                                                   BaseRate,

                                                   VacationHours,

                                                   SickLeaveHours,

                                                   CAST(SalariedFlag AS VARCHAR(1)) AS SalaryFlag

                                            FROM dbo.DimEmployee;'

      WITH RESULT SETS (( EmployeeKey INT NOT NULL,

                                      BaseRate MONEY NOT NULL,

                                      VacationHours INT NOT NULL,

                                      SickLeaveHours INT NOT NULL,

                                      SalaryFlag VARCHAR(1) NOT NULL,

                                      ClusterResult INT NOT NULL

                                ));

END

 

The next step is to create a new report in Visual Studio and add a new Data Source.


Then create a dataset.


And link that dataset to a new report.


Then build the report how you want, using that dataset. This is the quick output I’ve opted for as you can quickly analyse employees based on the cluster they are in.


As you can see, and hopefully reproduce, it’s a very quick and relatively easy process that allows you to make business decisions by utilising the combined powerful capabilities of R and SQL.

SQL 2016 CTP3 R Setup and Review

The release of SQL Server 2016 CTP3 is pretty big as it’s the first CTP which incorporates in-database support for R. This means that you can call R scripts and functions within a SQL query without any workarounds as previously demonstrated.

To get yourself up and running on CTP3 you’ll need to follow these steps:

1)      Download, install and configure CTP3 https://www.microsoft.com/en-us/evalcenter/evaluate-sql-server-2016

a.       Make sure that Advanced Analytics Extension is installed

b.       verify that the service for Advanced Analytics Extensions starts automatically using the MSSQLLaunchpad account.

c.        On the Database Engine Configuration page, for Authentication mode, select Mixed Mode (SQL Server authentication and Windows authentication).

                                                               i.      For CTP3, a SQL login is required when using the RevoScaleR functions from a remote workstation to run R scripts in the context of the server.

2)      Download, install and configure Revolution R Open https://www.microsoft.com/en-us/download/details.aspx?id=49525

3)      Download, install and configure Revolution R Enterprise https://www.microsoft.com/en-us/download/details.aspx?id=49505

4)      Enable external scripts

exec sp_configure 'external scripts enabled',1;

RECONFIGURE;

5)      Run the post-installation script(s)

a.       Open an instance of command prompt with administrator privileges and run the following script

b.       C:\Program Files\RRO\RRO-3.2.2-for-RRE-7.5.0\R-3.2.2\library\RevoScaleR\rxLibs\x64\RegisterRExt.exe /install

                                                               i.      More info can be found here: https://msdn.microsoft.com/en-us/library/mt590536.aspx

Now you should be up and running and ready to fire up a new instance of SQL Server Management Studio to begin working with R. For the rest of the demo I will be using the latest version of Adventure Works for SQL 2016 CTP3, which can be found here: https://www.microsoft.com/en-us/download/confirmation.aspx?id=49502 Please restore the backed up databases to continue.

To start off we’re just going to do something really simple, which is return the mean of the UnitPrice from Sales.SalesOrderDetail in a result set, just to demonstrate functionality and ease in which you can write and execute R scripts within SQL.

The script I’m going to use is:

USE AdventureWorks2016CTP3;

 

EXECUTE sp_execute_external_script

@language = N'R'

,@script = N'OutputDataSet <-data.frame(mean(InputDataSet[,1]))'

,@input_data_1 = N'SELECT UnitPrice FROM [Sales].[SalesOrderDetail]'

WITH RESULT SETS ((col int not null));

 

What it does is call a stored procedure with three input parameters: @language; @script; and @input_data_1

The only valid input for @language is 'R’.

@script is the external language script specified as a literal or variable input. Datatype of @script is NVARCHAR(MAX).

@input_data_1 specifies the input data for the external script in the form of a T-SQL query. Datatype of @input_data_1 is NVARCHAR(MAX).

              With the specification of _1 it would appear that in future one might be able to pass in more than one lot of data.

The syntax for sp_execute_external_script can be much more complex than demonstrated and more information can be found here: https://msdn.microsoft.com/en-us/library/mt604368.aspx

So, we’ve run our script. SQL has initiated the R script, which is taking a mean of the first column and converting it to a data frame. A data frame is similar to a table and, at the moment, the output data has to be in a data frame structure in order for it to be passed back into SQL. The end result is that the mean of the UnitPrice has been returned.

Query and result should look something like this:

clip_image001

R is a really powerful tool as it is and by incorporating in-database support for R within SQL Server 2016 allows for a convenient, usable and even more powerful toolkit to process and analyse data. The possibilities of application to your data are vast but can range from simple data exploration, as demonstrated above, to predictive modelling.

Be on the lookout for more demonstrations of R functionality within SQL Server 2016.