Ust Oldfield's Blog

Connecting SQL Server to R

In this post I’m going to use R to retrieve some data from SQL Server.

In the absence of SQL Server 2016 and its soon-to-be-incorporated R functionality, using R in conjunction with SQL Server requires a few workarounds to produce the desired outcome.

R is a package-based platform and does not inherently communicate with other platforms unless the relevant package is installed and loaded. Quite a few packages allow R to talk to SQL Server; I prefer the RODBC package, as it is simpler to use than the alternatives, and I will be using it for this example.


Step 1: Create a connection

As RODBC requires a connection to your SQL Server database, you'll have to open the ODBC Data Source Administrator on the machine you're using. Under the User DSN tab (though you could use the System DSN), click Add to create a new connection.

Select SQL Server Native Client 11.0 and click Finish.


The wizard then asks for a name and description for the data source, and for the server you want to connect to; fill these in appropriately.

The next screen asks for the necessary security information, but as I'm using a local instance I will persist with Windows authentication.

The next screen is where you choose the database you want to connect to. By default you will just connect to the server, but if you wish to save the results from your analysis in R back to SQL Server and to the correct database it is important that you select the desired database. For this example I’m connecting to Adventure Works.


The next screen contains general configuration properties, which can be changed to suit the user's needs.


Click Finish, test the connection, and you're all done for this step!

Step 2: Connecting R

For this next part we’ll be using an R client. My preferred client is R Studio as it provides a clean interface.

Now that you've created your connection you'll want to use it within R. After firing up an instance of R with the RODBC package installed, load the package with the following syntax:

library(RODBC)
To bring the connection through to R you’ll need to assign a variable to it with the help of the odbcConnect function.

The format for invoking the function is as follows:

connectionstring <- odbcConnect("some dsn", uid = "user", pwd = "****")

connectionstring is the variable assigned to store the connection

odbcConnect is the function

“some dsn” is the name of your DSN connection

uid  and pwd are the User ID and password for the server, if needed

For our example using AdventureWorks on a local machine the syntax is as follows:

AdventureWorks <- odbcConnect("AdventureWorks")

To see which objects are in your database, run the sqlTables function:

sqlTables(AdventureWorks)

This returns a data frame listing the tables and views in the database.
You can then begin to use your data from SQL Server in R by using the sqlQuery function to extract data.

Employee <- sqlQuery(AdventureWorks, "SELECT * FROM HumanResources.Employee")

The purpose of the sqlQuery function is to be able to get a specific set of data, potentially from multiple tables. If you just wish to return the contents of one table it would be better to use the sqlFetch function.

Employee <- sqlFetch(AdventureWorks,"HumanResources.Employee")

sqlFetch returns the contents of the specified table and stores it in the assigned variable.


Connecting R to SQL Server is relatively easy and allows you to unleash the power of R on your data without employing expensive add-ons or waiting for a future SQL Server 2016 CTP to be released.

Currency Conversion (Triangulation Arbitrage)

Some systems use a single currency as a base into which local currencies are converted, e.g. USD, which is something I noticed recently when working with IBM Cognos Controller. But what if you want or need to rebase into another currency while still retaining the original base?

This doesn’t appear to be easy to achieve within Cognos Controller itself, but it is achievable within SQL and a wider SSIS framework.

The basis for rebasing exchange rates is the technique of triangulation arbitrage, which is most often used within the banking system. In principle you deal with two known exchange rates and one unknown.

For example, suppose you have an exchange rate base of USD, know that the GBP/USD exchange rate is 1.54631 and the EUR/USD exchange rate is 1.11470, and wish to rebase from USD to EUR. You would begin by finding the inverse of the GBP/USD rate (1/1.54631 = 0.6467) and multiplying it by the EUR/USD rate (0.6467 * 1.11470 = 0.72087649), which produces the EUR/GBP rate; you would then find the inverse of that (1/0.72087649 = 1.3872) to produce the GBP/EUR rate. To find the USD/EUR exchange rate, you simply take the inverse of the EUR/USD rate (1/1.11470 = 0.8971).
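The triangulation steps above can be sketched outside of SQL as well. Below is a minimal Python illustration using the same example rates; the function and variable names are my own.

```python
# Rebase a rate quoted against one base currency (e.g. USD) so that it is
# quoted against a new base (e.g. EUR), via triangulation arbitrage.
def rebase_rate(currency_vs_usd, new_base_vs_usd):
    inverse = 1 / currency_vs_usd        # invert the USD leg (1/1.54631 = 0.6467)
    cross = inverse * new_base_vs_usd    # cross rate against the new base (0.7209)
    return 1 / cross                     # invert again: currency vs new base

gbp_eur = rebase_rate(1.54631, 1.11470)  # GBP/EUR via triangulation
usd_eur = 1 / 1.11470                    # USD/EUR is just the inverse of EUR/USD

print(round(gbp_eur, 4))  # 1.3872
print(round(usd_eur, 4))  # 0.8971
```

Note that the inversion and multiplication steps algebraically reduce to dividing one USD-based rate by the other, which is a handy sanity check on the arithmetic.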

That might sound simple, but most exchange rates are held in a table across a range of dates. This complicates the calculation somewhat. I’ve used CTEs because I find that it makes the script neater and easier to debug. Below is an example of the triangulation using the Sales.CurrencyRate table in the AdventureWorks2012 database.


USE AdventureWorks2012
GO

WITH
LocalExchange AS (
    SELECT
         CurrencyRateDate
        ,ToCurrencyCode                                                AS CurrencyCode
        ,CASE WHEN ToCurrencyCode <> 'EUR' THEN (1 / AverageRate) END  AS LocalCurrencyConversion
    FROM [Sales].[CurrencyRate]
),
EuroExchange AS (
    SELECT
         CurrencyRateDate
        ,ToCurrencyCode                                                AS CurrencyCode
        ,CASE WHEN ToCurrencyCode = 'EUR' THEN AverageRate END         AS EuroConversion
    FROM [Sales].[CurrencyRate]
    WHERE CASE WHEN ToCurrencyCode = 'EUR' THEN AverageRate END IS NOT NULL
)
SELECT
     CASE
        WHEN C.ToCurrencyCode = 'EUR' THEN 'USD'
        WHEN C.AverageRate = 1 THEN 'EUR'
        ELSE C.ToCurrencyCode
     END                                                               AS CurrencyCode
    ,CASE
        WHEN C.ToCurrencyCode <> 'EUR' THEN (E.EuroConversion * L.LocalCurrencyConversion)
        ELSE 1
     END                                                               AS EuroCrossRate  -- cross rate against EUR
    ,CASE
        WHEN C.AverageRate = 1 THEN 1
        ELSE 1 / (CASE
                      WHEN C.ToCurrencyCode <> 'EUR' THEN (L.LocalCurrencyConversion * E.EuroConversion)
                      ELSE C.AverageRate
                  END)
     END                                                               AS ImplicitEuroExchangeRate
    --,C.ExchangeRate                                                  AS ExchangeRate
FROM [Sales].[CurrencyRate] AS C
INNER JOIN LocalExchange AS L
    ON  L.CurrencyRateDate = C.CurrencyRateDate
    AND L.CurrencyCode = C.ToCurrencyCode
INNER JOIN EuroExchange AS E
    ON  E.CurrencyRateDate = C.CurrencyRateDate
ORDER BY C.CurrencyRateID


As always, if you have any feedback and can suggest a simpler way of performing the triangulation I would love to hear it.

How to get an age from two dates in SQL

For a project that I've been working on recently I was tasked with finding the age of a product at a certain point in time, in years and months. What should have been a relatively simple task proved a bit more difficult, as the requirement was that the age be returned in two separate columns: Years and Months.

I approached it by using DATEDIFF as a starting point, as that's what I would want to emulate in order to produce my age:

DATEDIFF(DAY, @StartDate, @EndDate)
I used DAY as the interval as I thought it would be easier to calculate the age in a more granular format, if required.

I then started breaking down the output, as working in a number of days is a bit unmanageable. I divided by the average number of days in a month (30.42) and then by the number of months in a year (12):

DATEDIFF(DAY, @StartDate, @EndDate) / 30.42 / 12

Then I hit a stumbling block. How was I to split out the month from the year and the year from the month and have them displayed as whole numbers?

To get the year I used the following query:

CONVERT(INT, ROUND((DATEDIFF(DAY, @StartDate, @EndDate) / 30.42 / 12), 1) - (ROUND((DATEDIFF(DAY, @StartDate, @EndDate) / 30.42 / 12), 1) % 1))
To get the month I used the following query, not as long as the query to produce the year but still similar:

ROUND((DATEDIFF(DAY,@StartDate,@EndDate)/30.42/12),1)%1 * 12

The multiplication of the month by 12 is crucial, as it’s the only way to turn a fraction of a year into a number of months.

I have then wrapped the overall query up within a CTE, producing the following script:

DECLARE @StartDate DATE
DECLARE @EndDate DATE

SET @StartDate = '1969-07-16'
SET @EndDate = GETDATE()

;WITH Age AS (
    SELECT
        CONVERT(INT, (ROUND(((DATEDIFF(DAY, @StartDate, @EndDate) / 30.42) / 12), 1)) % 1 * 12)
            AS AgeMonth,
        CONVERT(INT, ROUND(((DATEDIFF(DAY, @StartDate, @EndDate) / 30.42) / 12), 1)
            - (ROUND(((DATEDIFF(DAY, @StartDate, @EndDate) / 30.42) / 12), 1)) % 1)
            AS AgeYear
)
SELECT AgeYear, AgeMonth
FROM Age
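For comparison, the same year/month split can be sketched in Python using the post's 30.42-day-month approximation; the function and variable names are my own.

```python
from datetime import date

# Split an age into whole years and leftover months, using the approximation
# of 30.42 days per month (so 365.04 days per year).
def age_parts(start, end):
    days = (end - start).days                         # equivalent of DATEDIFF(DAY, ...)
    years_fraction = round(days / 30.42 / 12, 1)      # age in years, to 1 decimal place
    age_years = int(years_fraction)                   # whole years
    age_months = int(round(years_fraction % 1 * 12))  # fractional year -> months
    return age_years, age_months

print(age_parts(date(2010, 1, 1), date(2012, 7, 1)))  # (2, 6)
```

Note that rounding the year to one decimal place limits the month precision to steps of roughly 1.2 months, a limitation shared by the T-SQL approach above.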

Adatis Hackathon Jan 2015 – Predictive Analytics and Azure ML

Back in January, Adatis hosted an internal hackathon in order to explore the various features of the Azure platform. We were split into three teams: STEAM (using Stream Analytics), DREAM (using Azure ML) and GLEAM (using Power BI).

I was looking after the DREAM team, which was responsible for predicting when a thermostat system would break down and therefore need to be replaced. We used the Azure Machine Learning (ML) platform to achieve this.

We had quite a substantial element of preparation work to do before we could even begin using Azure ML. We began by creating a database and tables within the Azure platform so that we could connect the data to the ML system.

Once connected we were up and running, trying different statistical techniques to produce our desired outcome. Our objective was to predict the age at which a thermostat system would break down, so that it could be replaced before it failed; our initial hypothesis was that as the age of the system increased, the likelihood of failure would increase. Our first test used a linear regression, which proved wildly inaccurate, with an accuracy rating of 0.04%. We then used a Two-Class Boosted Decision Tree model, which proved to be the complete opposite, with an accuracy rating of 100%. How did two models produce such vastly differing results?

So we had to delve into the data itself and look row by row at what it was telling us. The long and short of it was that, because we were using dummy data, we were unable to accurately predict an outcome: there were no relationships between the different data points.

To snatch victory from the jaws of defeat we used the remainder of the exercise to give an overview of ML and its capabilities to members of the team.

Our impression was that Azure ML is a very powerful platform and easy to get to grips with, thanks to a drag-and-drop interface that many of us are familiar with from working with SSIS. However, despite the easy-to-use interface, it does require at least some knowledge of statistics and statistical techniques, otherwise you could end up wandering blindly through a forest. When things go wrong, it's not intuitive to fix them; it is a bit of a black box in this regard, but with the ability to run R scripts there are ways around this.

As it's still in preview, I expect it will only get better and more intuitive. But a word of warning to would-be users: brush up on your stats beforehand.