Ust Oldfield's Blog

Adatis Hackathon Jan 2015 – Predictive Analytics and Azure ML

Back in January, Adatis hosted an internal hackathon to explore the various features of the Azure platform. We were split into three teams: STEAM (using Stream Analytics), DREAM (using Azure ML) and GLEAM (using Power BI).

I was looking after the DREAM Team, which was responsible for predicting when a thermostat system would break down and, therefore, need to be replaced. We used the Azure Machine Learning (ML) platform to achieve this.

We had a substantial amount of preparation work to do before we could even begin using Azure ML. We started by creating a database and tables on the Azure platform so that we could connect our data to Azure ML.
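The post doesn't describe the schema, but one plausible layout for this kind of prediction is a single table with one row per thermostat, holding its age and whether it failed. The sketch below uses Python's built-in `sqlite3` module as a local stand-in for the Azure-hosted database; the table name `thermostat`, its columns and the rows are all hypothetical, for illustration only, and are not the hackathon's data.

```python
import sqlite3

# Local stand-in for the Azure-hosted database: an in-memory SQLite database.
# Table and column names are hypothetical, chosen for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE thermostat (
        thermostat_id INTEGER PRIMARY KEY,
        age_years     REAL    NOT NULL,                          -- feature: age of the system
        failed        INTEGER NOT NULL CHECK (failed IN (0, 1))  -- label to predict
    )
    """
)

# A few made-up rows, purely to show the shape of the training table.
conn.executemany(
    "INSERT INTO thermostat (thermostat_id, age_years, failed) VALUES (?, ?, ?)",
    [(1, 2.5, 0), (2, 7.0, 0), (3, 11.0, 1), (4, 14.5, 1)],
)
conn.commit()

# The query a modelling tool would read from: the feature and the label side by side.
rows = conn.execute(
    "SELECT age_years, failed FROM thermostat ORDER BY thermostat_id"
).fetchall()
print(rows)  # prints [(2.5, 0), (7.0, 0), (11.0, 1), (14.5, 1)]
```

Keeping the label in the same row as the features means the ML tool can read the training set with one query and no joins.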

Once connected, we were up and running, trying different statistical techniques to produce our desired outcome. Our objective was to predict at what age a thermostat system would break down, so that it could be replaced before it failed. Our initial hypothesis, therefore, was that the likelihood of a system failing would increase with its age. Our first test used a linear regression, which proved inaccurate, with an accuracy rating of 0.04%. We then used a Two-Class Boosted Decision Tree model, which produced the complete opposite: an accuracy rating of 100%. How could two models produce such vastly different results?
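Two general effects can produce a gap like this, though the post doesn't say which applied here. First, a regression is normally judged by error measures or the coefficient of determination (R²) rather than classification accuracy, so the two figures may not be the same kind of score. Second, a flexible model scored on the rows it was trained on can reach 100% simply by memorising them. The sketch below illustrates both on randomly generated dummy data. The "memoriser" is a deliberately crude stand-in for an over-fitted model, not a boosted decision tree, and every name in it is hypothetical.

```python
import random
from statistics import mean

random.seed(42)


def make_rows(n, start_id):
    """Dummy data: the failure label is drawn independently of age, so there is no signal."""
    rows = []
    for i in range(n):
        age = random.randint(0, 20)
        failed = random.randint(0, 1)
        rows.append((start_id + i, age, failed))  # (unique id, age in years, label)
    return rows


def r_squared(xs, ys):
    """Fit ys = a + b * xs by ordinary least squares and return the coefficient of determination."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def fit_memoriser(rows):
    """Stand-in for an over-fitted model: it memorises each training row's label by id."""
    lookup = {row_id: failed for row_id, _, failed in rows}
    labels = [failed for _, _, failed in rows]
    majority = max(set(labels), key=labels.count)  # fallback for rows it has never seen
    return lambda row_id: lookup.get(row_id, majority)


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(row_id) == failed for row_id, _, failed in rows) / len(rows)


train = make_rows(1000, start_id=0)
held_out = make_rows(1000, start_id=1000)

r2 = r_squared([age for _, age, _ in train], [failed for _, _, failed in train])
model = fit_memoriser(train)
train_acc = accuracy(model, train)
test_acc = accuracy(model, held_out)

print(f"linear regression R^2 (failure ~ age): {r2:.4f}")
print(f"memoriser accuracy on its training rows: {train_acc:.0%}")
print(f"memoriser accuracy on unseen rows: {test_acc:.0%}")
```

On data with no signal, R² sits near zero, while the memoriser is perfect on its own training rows and falls back to roughly coin-flip accuracy on unseen ones. Scoring on held-out rows, or with cross-validation, gives a fairer comparison between models.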

So we had to delve into the data itself and look row by row at what it was telling us. The long and short of it was that, because we were using dummy data, we could not accurately predict an outcome: there were no meaningful relationships between the different data points.
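A quick check before modelling is to measure whether the feature and the label move together at all. The minimal sketch below compares two synthetic datasets generated purely for illustration (not the hackathon's data): one built to match the team's hypothesis, where the chance of failure rises with age, and one where failure has no relationship to age, as the post describes for the dummy data. It uses the Pearson correlation, which for a 0/1 label is the point-biserial correlation.

```python
import random

random.seed(7)


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 label this is the point-biserial correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


ages = [random.randint(0, 20) for _ in range(2000)]

# Hypothesis-shaped data: the chance of failure rises linearly with age (0 at age 0, 1 at age 20).
related = [1 if random.random() < age / 20 else 0 for age in ages]
# Dummy data with no built-in relationship: failure is a coin flip, whatever the age.
unrelated = [random.randint(0, 1) for _ in ages]

r_related = pearson(ages, related)
r_unrelated = pearson(ages, unrelated)
print(f"age vs failure (hypothesis-shaped): r = {r_related:.2f}")
print(f"age vs failure (dummy):             r = {r_unrelated:.2f}")
```

A correlation near zero does not rule out a non-linear relationship, but it is a cheap early warning that a model may have nothing real to learn.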

To snatch victory from the jaws of defeat, we used the remainder of the exercise to give team members an overview of ML and its capabilities.

Our impression was that Azure ML is a very powerful platform and easy to get to grips with, thanks to a drag-and-drop interface that many of us know from working with SSIS. However, despite the easy-to-use interface, it does require at least a working knowledge of statistics and statistical techniques; otherwise you could end up wandering blindly through a forest. When things go wrong, it’s not intuitive to fix. It is a bit of a black box in this regard but, because it can run R scripts, there are ways around this.

As it’s still in preview, I expect it can only get better and more intuitive. But a word of warning to would-be users: brush up on your stats beforehand.