# Ust

### Ust Oldfield's Blog

For a project that I’ve been working on recently I was tasked with finding the Age of a product at a certain point in time in year and months. What should have been a relatively simple task proved to be a bit more difficult as the requirement was that the age be returned in two separate columns: Years and Months.

I approached it by using DATEDIFF as a starting point as that’s what I would want to emulate in order to produce my age.

DATEDIFF(DAY,@StartDate,@EndDate)

I used DAY as the interval as I thought it would be easier to calculate the age in a more granular format, if required.

I then started breaking down the output as working in number of days is a bit unmanageable. I broke the average number of days into a month (30.42) and the number of months in a year (12)

DATEDIFF(DAY,@StartDate,@EndDate)/30.42/12

Then I hit a stumbling block. How was I to split out the month from the year and the year from the month and have them displayed as whole numbers?

To get the year I used the following query:

ROUND((DATEDIFF(DAY,@StartDate,@EndDate)/30.42/12),1)-ROUND((DATEDIFF(DAY,@StartDate,@EndDate)/30.42/12),1)%1

To get the month I used the following query, not as long as the query to produce the year but still similar:

ROUND((DATEDIFF(DAY,@StartDate,@EndDate)/30.42/12),1)%1 * 12

The multiplication of the month by 12 is crucial, as it’s the only way to turn a fraction of a year into a number of months.

I have then wrapped the overall query up within a CTE producing the following query script:

DECLARE @StartDate AS DATE
DECLARE @EndDate AS DATE

SET @StartDate = '1969-07-16'
SET @EndDate = GETDATE()
;

WITH Age AS(
SELECT
CONVERT(INT,(ROUND(((DATEDIFF(DAY,@StartDate,@EndDate)/30.42)/12),1))%1 *12 )
AS AgeMonth,
CONVERT(INT,(ROUND(((DATEDIFF(DAY,@StartDate,@EndDate)/30.42)/12),1))-(ROUND(((DATEDIFF(DAY,@StartDate,@EndDate)/30.42)/12),1))%1)
AS AgeYear
)
SELECT AgeYear, AgeMonth
FROM Age

Back in January, Adatis hosted an internal hackathon in order to explore the various features of the Azure platform. We were split into three teams: STEAM (using stream analytics), DREAM (using Azure ML) and GLEAM (using PowerBI).

I was looking after the DREAM Team, which was responsible for providing a prediction on when a thermostat system would breakdown and, therefore, need to be replaced. We used the Azure Machine Learning (ML) platform to achieve this.

We had quite a substantial element of preparation work to do before we could even begin using Azure ML. We began by creating a database and tables within the Azure platform so that we could connect the data to the ML system.

Once connected we were up and running, trying different statistical techniques in order to produce our desired outcome. Our objective was to predict at what age a thermostat system would breakdown so that it could be replaced before it broke. So, our initial hypothesis was that as the age of the system increased the likelihood of the system failing would increase. Our first test was using a linear regression and this proved to be inaccurate with an accuracy rating of 0.04%. We then used a Two-Class Boosted Decision Tree model which proved to be the complete opposite with an accuracy rating of 100%. How do two models produced vastly differing results?

So we had to delve into the data itself and look row by row at what the data was telling us. Long and short of it was that by using dummy data we were unable to accurately predict an outcome because there was an absence of relationships with the different data points.

To snatch victory from the jaws of defeat we used the remainder of the exercise to give an overview of ML and its capabilities to members of the team.

Our impressions were that Azure ML is a very powerful platform and easy to get to grips with, due to the drag-and-drop interface that many of us are familiar with due to working with SSIS. However, it despite the easy to use interface it does require you to have, at least, a knowledge of statistics and statistical techniques otherwise you could end up wandering blindly through a forest. When things go wrong, it’s not intuitive to fix. It is a bit of a black box in this regards but, with the inclusion of running R Scripts, there are ways around this.

As it’s still in preview I expect that it can only get better and more intuitive. But a word of warning to would be users: brush up on your stats beforehand.