Matt Willis' Blog

An agile waterfall – An approach to transitioning from a waterfall to an agile methodology

Many large organisations are embarking on the journey from a waterfall methodology to an agile methodology, and the effort involved is often underestimated. At Adatis we have assisted clients with this journey, and in this blog I will share what we have learnt.

At Adatis we work exclusively in a scrum agile methodology; we work with agile clients, waterfall clients and, ever increasingly, clients in the process of transitioning from waterfall to agile. This transition can be difficult, often leading to both methodologies being in practice at the same time, and this can cause friction.

This blog outlines a hybrid approach between waterfall and agile that will assist in crossing the bridge between the two methodologies, making everyone’s life easier.

To better explain the technique, I have worked through an example using the scrum agile methodology. This blog assumes some basic understanding of scrum; if you do not have this, please refer to this website to get up to speed.

How to navigate the agile waterfall

Education

First things first: education. As with most things in life, if you don’t understand something, it can be daunting! Often people concentrate on the buzzwords involved in agile and map them back to waterfall concepts without really understanding the true definitions of the terms. Make sure everyone understands exactly what agile is, how it works and, crucially, why it works.

Sprint ahead

In true agile fashion, you should focus on one sprint at a time, but this is where the compromising starts and what defines this hybrid approach. As always, carry out detailed sprint planning for the current sprint. Everything else on your backlog should have high-level, worst-case estimates. Using those estimates, combined with your average capacity for a sprint and your prioritised backlog, you can work out which story fits into which sprint. This allocation is not fixed and will change over time, but the key is that there is always a plan in place right up to the deadline.

[Image: diagram mapping the prioritised backlog into sprints, with estimates in days]

In the diagram above, we have the prioritised backlog on the right-hand side, with the estimate in days in brackets. Our average capacity is 12 days per sprint. We pick stories off the backlog in priority order and fit them into each sprint. For example, the customer dimension takes 8 days to complete, so it easily fits into sprint 1. That leaves 4 days; our next highest priority is the product dimension, so we squeeze as much of it as we can into sprint 1. The remaining 1 day of the product dimension becomes the first item in sprint 2. We continue until every story is allocated to a sprint.
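For the technically minded, this packing logic is simple enough to sketch in a few lines of R; below is a minimal illustration, using the estimates from the example above plus a hypothetical third backlog item.

## Prioritised backlog: high-level estimates in days, in priority order.
backlog <- c(customerDimension = 8, productDimension = 5, salesFact = 10)
capacity <- 12 ## average days of capacity per sprint

## Fill each sprint to capacity; an item that does not fit is split across sprints.
sprint <- 1
spare <- capacity
for (item in names(backlog)) {
    days <- backlog[[item]]
    while (days > 0) {
        used <- min(days, spare)
        cat(sprintf("Sprint %d: %s (%g days)\n", sprint, item, used))
        days <- days - used
        spare <- spare - used
        if (spare == 0) {
            sprint <- sprint + 1
            spare <- capacity
        }
    }
}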

Once it’s decided which story fits into which sprint, you must calculate a more detailed capacity that factors in the people available, so that the sprint duration can be set. With the sprint duration known, you can start to map out dates, which will put you on a path to satisfying the waterfall methodology.

[Image: sprint plan with durations and dates mapped out]

Contingency, deprioritisation and delaying deadlines

If every single sprint goal from here to the end of the project is met, the project will be a roaring success. However, this will not always be the case, and steps should be taken to deal with potential issues proactively.

To allow for minor issues, contingency should be added. For major issues, expectations should be set and bad news delivered as soon as possible. If a sprint goal is not met and therefore carries over to the next sprint, this has ramifications. Do not fall into the trap of thinking you can make that time back later in the project; all these little setbacks add up. You must face up to it now, and in this scenario there are two choices: either add a new sprint and reshuffle everything around, thus extending the deadline, or deprioritise a story and have everyone accept that it will no longer be delivered as part of the final solution.

It is perhaps that last paragraph that is most important and most critical to this hybrid approach. By mapping everything out in a structured waterfall plan built upon the agile methodology, you can utilise agile, staying flexible and dynamic, and utilise waterfall, planning ahead and identifying and dealing with problems early on. Thus the two competing methodologies play to their strengths and work well together.

Using R Tools for Visual Studio (RTVS) with Azure Machine Learning

Azure Machine Learning

Whilst you can implement very complex machine learning algorithms in R, for anyone new to machine learning I personally believe Azure Machine Learning is a more suitable tool for being introduced to the concepts.

Please refer to this blog, where I describe how to create the Azure Machine Learning web service used in the next section. You can either use your own web service or follow that blog, which was written specifically so you can follow along with this one.

Coming back to RTVS, we want to execute the web service we have created.

You need to add a settings JSON file. Add an empty JSON file called settings.json to C:\Users\<your name>\Documents\.azureml. Handy tip: if you want a folder name to begin with a dot, you must also place a dot at the end of the name, which Windows Explorer will then remove. So, for example, to create a folder called .azureml you must name it .azureml. in Windows Explorer.
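Alternatively, you can sidestep the Explorer quirk entirely by creating the folder from R; a one-line sketch, assuming your Documents folder is in the default location:

## Create the .azureml folder directly, avoiding the Windows Explorer renaming quirk.
dir.create(file.path(Sys.getenv("USERPROFILE"), "Documents", ".azureml"))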

Copy and paste the following code into the empty JSON file, making sure to enter your Workspace ID and Primary Authorization Token.

{"workspace":{

"id" : "<your Workspace ID>",

"authorization_token" : "<your Primary Authorization Token>",

"api_endpoint": "https://studioapi.azureml.net",

"management_endpoint": "https://management.azureml.net"

}}

You can get your Workspace ID by going to Settings > Name, and the Primary Authorization Token by going to Settings > Authorization Tokens. Once you’re happy, save and close the JSON file.

Head back into RTVS; we’re ready to get started. There are two ways to proceed: either I will take you through what to do line by line, or you can use the R script I have provided, which contains a function allowing you to take a shortcut. Whichever option you take, the result is the same.

Running the predictive experiment in R – Line by line

Copy and paste each line into the console.

First, a bit of setup. Presuming you’ve installed the devtools package as described on the GitHub page for the download, load AzureML and connect to the workspace specified in settings.json. To do this, use the code below:

## Load the AzureML package.

library(AzureML)

## Load the workspace settings using the settings.json file.

workSpace <- workspace()

Next we need to set the web service. This can be any web service created in Azure ML; for this post we will use the web service created in the blog linked above. The code is as follows:

## Set the web service created in Azure ML.

automobileService <- services(workSpace, name = "Automobile Price Regression [Predictive Exp.]")

Next we need to define the correct endpoint, which can easily be achieved using:

## Set the endpoint from the web service.

automobileEndPoint <- endpoints(workSpace, automobileService)

Everything is set up and ready to go, except we need to define our test data. The test data must be in exactly the same format as the source data of your experiment: the same number of columns, with the same column names. Even include the column you are predicting, entering just a 0 or leaving it blank. Below is the test data I used:

[Image: sample test data in the same format as the experiment’s source data]

This will need to be loaded into R and converted into a data frame. To do so, use the code below, making sure the path points to your test data.

## Load and set the testing data frame.

automobileTestData <- data.frame(read.csv("E:\\OneDrive\\Data Science\\AutomobilePriceTestData.csv"))

Finally, we are ready to do the prediction and see the result! The final line of code needed is:

## Send the test data to the web service and output the result.

consume(automobileEndPoint, automobileTestData)
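Note that consume() returns a data frame, so you can also capture the output for further analysis rather than just printing it; a small sketch (the exact name of the scored column may vary):

## Capture the scored results for further use.
predictions <- consume(automobileEndPoint, automobileTestData)
print(predictions) ## the predicted price appears in a scored label column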

Running the predictive experiment – Shortcut

Below is the entire script; paste it all into the R script window (top left).

automobileRegression <- function(webService, testDataLocation) {

    ## Load the AzureML package.
    library(AzureML)

    ## Load the workspace settings using the settings.json file.
    amlWorkspace <- workspace()

    ## Set the web service created in Azure ML.
    automobileService <- services(amlWorkspace, name = webService)

    ## Set the endpoint from the web service.
    automobileEndPoint <- endpoints(amlWorkspace, automobileService)

    ## Load and set the testing data frame.
    automobileTestData <- data.frame(read.csv(testDataLocation))

    ## Send the test data to the web service and output the result.
    consume(automobileEndPoint, automobileTestData)
}

Run the script by highlighting the whole function and pressing Ctrl + Enter. Then run the function by typing the code below into the console:

automobileRegression("Automobile Price Regression [Predictive Exp.]","E:\\OneDrive\\Data Science\\AutomobilePriceTestData.csv")

Where the first parameter is the name of the Azure ML web service and the second is the path to the test data file.

The Result

Both methods should give you the same result: a data frame displaying the test data along with the predicted value:

[Image: output data frame showing the test data with the predicted value]

Wahoo! There you have it: a predictive Azure Machine Learning regression experiment running through Visual Studio… the possibilities are endless!

Introduction to R Tools for Visual Studio (RTVS)

Introduction

This blog is not looking at one or two exciting technologies, but THREE! Namely Visual Studio, R and Azure Machine Learning. We will be looking at bringing them together in harmony using R Tools for Visual Studio (RTVS).

Installation

As this blog touches on a whole host of technologies, I won’t go into much detail on how to set each one up. Instead, I will provide you with a flurry of links containing all the information you need.

Here comes the flurry…!

· Visual Studio 2015 with Update 1 – I hope anyone reading this is familiar with Visual Studio, but to piece all these technologies together, version 2015 with Update 1 is required; look no further than here: visualstudio.com/en-us/downloads/download-visual-studio-vs.aspx

· R – I’m not sure exactly what version is needed, but just go ahead and get the latest version you can, which can be found here: https://cran.r-project.org/bin/windows/base/

· Azure Machine Learning – No installation required here, yay! But you will need to set up an account if you have not done so already, which can be done here: studio.azureml.net/

· R Tools for Visual Studio – More commonly known as RTVS. The name is fairly self-explanatory: it allows you to run R through Visual Studio. If you have used R and Visual Studio separately before, it will feel strangely familiar. Everything you need to download, install and set it up can be found here: microsoft.github.io/RTVS-docs/

· Devtools Package – The final installation step is a simple one: installing the correct R packages to allow you to interact with Azure ML. If you’ve used R to interact with Azure ML before, you have probably already done this step, but for those who have not, all the information you need can be found here: github.com/RevolutionAnalytics/AzureML

Introduction to RTVS

Once all the prerequisites have been installed, it is time to move onto the fun stuff! Open up Visual Studio 2015 and add an R project: File > Add > New Project and select R. You will be presented with the screen below; name the project AutomobileRegression and select OK.

[Image: New Project dialog with the R project type selected]

Microsoft have done a fantastic job of realising that the settings and toolbars required for R are very different to those required when using Visual Studio generally, so they have split them out and made it very easy to switch between the two. To switch to the settings designed for R, go to R Tools > Data Science Settings; you’ll be presented with two pop-ups, select Yes on both to proceed. This will allow you to use all those nifty shortcuts you have learnt in RStudio. Any time you want to go back to the original settings, you can do so by going to Tools > Import/Export Settings.

You should now be looking at a screen similar to the one below:

[Image: RTVS layout within Visual Studio]

This should look very recognisable to anyone familiar with R:

[Image: annotated RTVS window layout]

For those not familiar, the top left window is the R script; this is where you will do your work and what you will run.

Bottom left is the console, which allows you to type in commands and see the output; from here you will run your R scripts and test various functions.

Top right is your environment, which shows all your current objects and allows you to interact with them. You can also switch to History, which displays a history of the commands used so far in the console.

Finally, the bottom right is where Visual Studio differs from RStudio a little. The familiar Solution Explorer is visible within Visual Studio and serves its usual function. Visual Studio does, however, contain R Plot and R Help, which both also feature in RStudio. R Plot displays plots of graphs when appropriate, and R Help provides more information on the different functions available within R.
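To see the panes in action, try a quick one-liner in the console; the chart should appear in the R Plot window:

## Plot a built-in dataset; the output appears in the R Plot pane.
plot(pressure, main = "Vapour pressure of mercury")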

Look out for my next blog, which will go into more detail on how to use RTVS.

Azure ML Regression Example - Part 3 Deploying the Model

In the final blog of this series, we will take the regression model we created earlier and make it accessible so that it can be consumed by other programs.

Making the experiment accessible to the outside world

The next part of the process is to make the whole experiment accessible to the world outside of Azure ML. To do so, you need to create a web service. This can be achieved by clicking the Set Up Web Service button next to Run and then selecting Predictive Web Service [Recommended]. The experiment will change in front of your eyes, and you should be left with a canvas looking similar to the one displayed below.

[Image: predictive experiment canvas]

If you would like to get back to your training experiment at any time, you can do so by clicking Training experiment in the top right corner. You can then set up the predictive web service again to update the predictive experiment.

Whilst in the predictive experiment window, run the experiment once again and then click Deploy Web Service. Having done this, you should see the screen below:

[Image: web service dashboard]

Select Excel 2013 or later in the same row as REQUEST/RESPONSE. Click the tick to download the Excel document, open it and click Enable Editing; you will see something like the image below. If you are using Excel 2010, feel free to follow the example; it will be fairly similar, but not identical.

[Image: Excel workbook connected to the web service]

Click Automobile Price Regression [Predictive Exp.] to begin. Click Use sample data to quickly construct a table with all the appropriate columns and a few examples. Feel free to alter the sample data to your heart’s content. Once you’re happy with your data, highlight it and set it as the Input range. Choose an empty cell as the Output. Click Predict. You should see something similar to the image below:

[Image: Excel prediction output with the Scored Labels column]

You should now be able to see a replica table with a Scored Labels column displaying the estimated price for each row.

Go ahead and rerun the experiment, putting in whatever attribute values you desire. The experiment will now always return a Scored Label for the price, based upon the trained model.

What next?

This has just been a toe dip into the world of Azure ML. For more information on getting started with Azure ML, track down a copy of Microsoft Azure Essentials – Azure Machine Learning by Jeff Barnes; it is a great starting point.

If you want to know what you can do with Azure ML and how to start using Azure ML within other programs then check out my upcoming blog which will show you how to integrate Azure ML straight into Visual Studio.

Azure ML Regression Example - Part 2 Training the Model

This is where the real fun begins! In this blog we will get to the heart of machine learning and produce a regression model.

Training the model

We now need to split the data into training and testing sets, so that we can train the algorithm using the training set and then test the accuracy of its predictions using the testing set. To do so, search for ‘split’ in the Search experiment items search bar and drag the Split Data task onto the canvas. Under the properties is a setting called Fraction of rows in the first output dataset; this lets you choose what percentage of rows is used for training and what percentage is held back to test prediction accuracy. Let’s set it to 0.9, meaning 90% of rows will be used for training and 10% for testing. Leave the other properties as they are. The properties window should look like the image below:

[Image: Split Data properties with the fraction set to 0.9]
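For reference, the same 90/10 split can be reproduced in plain R; a minimal sketch, assuming the dataset has been loaded into a data frame called autoData (an illustrative name):

## A rough R analogue of the Split Data module: 90% training, 10% testing.
set.seed(123) ## arbitrary seed, for repeatability
trainIndex <- sample(nrow(autoData), floor(0.9 * nrow(autoData)))
trainData <- autoData[trainIndex, ]
testData <- autoData[-trainIndex, ]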

Now let’s get to the fundamental core of machine learning: the algorithm itself. For this we will use one of my personal favourites, a Boosted Decision Tree. Decision trees frequently produce highly accurate predictions and are great for discovering more about your data from the leaves of the tree. Go to the item toolbox, clear the search box, navigate to Machine Learning > Initialize Model > Regression and drag the Boosted Decision Tree Regression item onto the left side of the canvas. Change the properties to match the values below; these were selected after using a Sweep Parameters item to work out the optimal parameter settings.

Parameter Name                            Parameter Value
Create trainer mode                       Single Parameter
Maximum number of leaves per tree         36
Minimum number of samples per leaf node   7
Learning rate                             0.33128
Total number of trees constructed         182
Random number seed                        (leave blank)
Allow unknown categorical levels          Checked
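If you would like to experiment with the same idea in plain R, the gbm package offers a loosely comparable boosted tree regression. To be clear, this is an approximation of mine, not the implementation Azure ML uses, and the parameters do not map one-to-one; trainData is the training frame from the split sketch above.

## A loose R analogue using the gbm package (install.packages("gbm") if needed).
## Rough mapping of the Azure ML settings above:
##   learning rate (0.33128)           -> shrinkage
##   total number of trees (182)       -> n.trees
##   minimum samples per leaf node (7) -> n.minobsinnode
library(gbm)
model <- gbm(price ~ ., data = trainData, distribution = "gaussian",
             n.trees = 182, shrinkage = 0.33128, n.minobsinnode = 7)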

Drag on the Train Model item, which is located under Train in the item toolbox. Join up the appropriate output and input ports so your canvas looks like the image below.

[Image: canvas with the dataset, Split Data, Boosted Decision Tree Regression and Train Model connected]

Click on the Train Model item and select Launch column selector in the properties window. Here you are selecting the column you want to predict, so just select price.

Now we need to predict the results for the testing data. To do so, drag on a Score Model item (located under Score) and connect the Train Model and Split Data items to the input ports of the Score Model. Once complete, hit Run to run the experiment; your canvas should be illuminated with green ticks, like the image below.

[Image: experiment run complete, with green ticks on every item]

Now let’s have a look and see whether this algorithm has actually produced any decent results. Right click on the output port of the Score Model and left click on Visualise. You should see something similar to the image below.

[Image: scored dataset visualisation with the price vs Scored Labels scatter plot]

This table displays the values for each and every piece of test data. If you scroll all the way to the right, you should see two columns: price and Scored Labels. Price is the actual price of the car; Scored Labels is the price the regression algorithm has predicted. The numbers are quite close, which is exactly the result we’re after. If you click on the Scored Labels column header you can conduct some further analysis: scroll down and, making sure compare to is set to price, view a scatter plot of the two values. I have done so in the image above, and the scatter plot shows a strong positive correlation with only a few outliers.
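If you later pull the scored results into R (for example via the AzureML package), the same sanity check takes two lines; a sketch assuming a data frame called scored with illustrative column names:

## Quick accuracy check: actual price versus predicted price.
plot(scored$price, scored$scoredLabels, xlab = "Actual price", ylab = "Predicted price")
cor(scored$price, scored$scoredLabels) ## close to 1 means a strong positive correlation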

Your Azure Machine Learning regression algorithm is now complete! In the next blog we will be deploying the model so we can use it outside of Azure Machine Learning and really put what we have created into practice.

Azure ML Regression Example - Part 1 Preparing the File

This blog series will give you a quick run-through of Azure Machine Learning and, by the end of it, will have you publishing an analytical model that can be consumed by various external programs. This particular blog focuses on preparing the file; we will look at training the model in the next blog.

Source file

The source file we will be using can be found at archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data. The column names are fairly self-explanatory but if you would like a little bit more information please visit archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.names. Make sure you download the file to somewhere you can easily access later.

Please open up the file, convert all hyphenated column headers to camel case, for example ‘normalized-losses’ to ‘normalizedLosses’, and save it.
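If you would rather script this preparation than edit the file by hand, here is a hedged R sketch. The raw UCI file has no header row and uses ‘?’ for missing values, so we assign the camel-case names directly; the 26 names come from the .names file linked above, and the output filename is illustrative.

## Download the raw data and apply camel-case column names.
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
autoData <- read.csv(url, header = FALSE, na.strings = "?", stringsAsFactors = FALSE)
names(autoData) <- c("symboling", "normalizedLosses", "make", "fuelType", "aspiration",
                     "numOfDoors", "bodyStyle", "driveWheels", "engineLocation", "wheelBase",
                     "length", "width", "height", "curbWeight", "engineType", "numOfCylinders",
                     "engineSize", "fuelSystem", "bore", "stroke", "compressionRatio",
                     "horsepower", "peakRpm", "cityMpg", "highwayMpg", "price")
## Save locally, ready to upload to Azure ML Studio.
write.csv(autoData, "AutomobilePriceData.csv", row.names = FALSE)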

Once you are logged into Microsoft Azure Machine Learning Studio, you must add a dataset. To do so, go to Datasets > New > From Local File. Click Browse, navigate to the source file you downloaded earlier and select it. The rest of the fields should populate automatically and display values similar to the image below; when you are happy, click the tick to add the dataset.

[Image: new dataset dialog with the fields populated]

Creating the experiment

Next, navigate to New > Experiment > Blank Experiment. You will be presented with a blank canvas. Rename it by overwriting the default title, ‘Experiment created on’ followed by today’s date, with ‘Automobile Price Regression’. You should be looking at a screen similar to the one below:

[Image: blank experiment canvas in Azure Machine Learning Studio]

Learning your way around

The far left blue strip allows you to navigate within Azure Machine Learning Studio. It gives you access to Projects, Experiments, Web Services, Notebooks, Datasets, Trained Models and Settings. Projects and Notebooks are in preview, so we won’t discuss them in this blog. Experiments lists all the experiments you have created. Web Services lists all the experiments you have published; more information on this will be provided later. Trained Models are the complete predictive models that have already been trained; again, more information will follow. Finally, Settings is fairly self-explanatory and allows you to view information such as various IDs, tokens, users and details about your workspace.

The white toolbox to the right of the navigation pane is the experiment item toolbox. This contains all the datasets and modules needed to create a predictive analytics model. The toolbox is searchable and the items can be dragged onto the canvas.

The canvas describes the experiment and shows all the datasets and modules used. The datasets and modules can be moved freely and lines are drawn between input and output ports to enforce the ordering.

The properties pane allows you to modify certain properties of a dataset or module by clicking on the item and modifying the chosen property in the pane.

Cleaning the source file

First up, expand Saved Datasets > My Datasets within the experiment item toolbox and drag your newly created dataset onto the canvas.

Next, expand Data Transformation > Manipulation and drag on Clean Missing Data. Connect the tasks by dragging the output port of the dataset to the input port of the Clean Missing Data task. Make sure the properties mirror the screenshot below; this will replace all missing values with the value ‘0’. The image below is what you should see when Clean Missing Data is selected:

[Image: Clean Missing Data properties, replacing missing values with 0]

Then drag the Project Columns task onto the canvas and connect the leftmost output port of the Clean Missing Data task to its input port. Select Launch column selector, choose All columns under Begin With, make sure Exclude and column names are selected, and add the columns bore and stroke. This removes the selected columns because they are not relevant when predicting the price and would therefore have an adverse effect on the accuracy of the prediction. When you are happy, it should look something like the screen below; click the tick to confirm.

[Image: column selector excluding the bore and stroke columns]
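For reference, a rough R equivalent of these two cleaning steps, reusing the autoData frame from the sketch above (where ‘?’ was already read as NA):

## Clean Missing Data: replace every missing value with 0.
autoData[is.na(autoData)] <- 0
## Project Columns: exclude bore and stroke, which are not relevant to predicting price.
autoData <- autoData[, !(names(autoData) %in% c("bore", "stroke"))]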

The boring bit is now out of the way! In the next blog we will start the real machine learning by splitting the data into training and testing sets and then producing a model.