Jose Mendes

Jose Mendes' Blog

Extraction and Analysis of GeoSpatial data with Azure Data Lake Analytics

I recently had to implement a solution to prove it was possible to integrate a shape file (.SHP) in Azure Data Lake Store (ADLS) for post geographic spatial analysis using Azure Data Lake Analytics (ADLA).

A shape file is a data set used by a geographic analysis application that stores a collection of geographic features, such as streets or zip code boundaries, in the form of points, lines or area features.

As you already figured, storing a shape file in ADLS is not a difficult goal to achieve, however, how can you possibly use ADLA to obtain the geographic data from the file? In this blog I’ll explain how we can extract the data to a supported format, such as CSV, and use it to run geographic spatial analysis in ADLA, with the support of the spatial data types introduced in the SQL Server 2008 (details here).

As always, whenever we face a limitation of ADLA, C# is our best friend. In order to read the content of a shape file, we need to start by adding a geospatial assembly to our solution, which, in my case, was the “Catfood” ESRI Shapefile Reader (details here).

The shape file used in this example contains a list of parks in London. The following code demonstrates how to extract the metadata and the geographic shapes to a CSV file. The only shapes extracted are polygons, although it is possible to add more if needed.

public static void CreateWorkForThreads()
    //Create a new dataset and store the data in a table
    DataSet ds = CreateNewDataSet();
    DataTable dt = ds.Tables[0];

    int i;
    int count = 0;

    // Parse the shapefile and select the columns we are interested in
    using (Shapefile shapefile = new Shapefile(@"path\file.shp"))
        foreach (Shape shape in shapefile)
            string[] metadataNames = shape.GetMetadataNames();
            string geometry = "";
            int countParts = 0;
            int countShape = 0;

            DataRow dr = dt.NewRow();

            //Extract the metadata. The first iteraction will extract the name of the columns
            if (metadataNames != null)
                foreach (string metadataName in metadataNames)
                    if (count == 0)
                        dr[metadataName] = metadataName;
                        dr[metadataName] = shape.GetMetadata(metadataName);


            //Shape is not part of the metadata, so manually defining the name of the column
            if (count == 0)
                dr["shape"] = "shape";
                // cast shape based on the type
                switch (shape.Type)
                    case ShapeType.Point:
                        // a point is just a single x/y point
                        ShapePoint shapePoint = shape as ShapePoint;
                        MessageBox.Show("Point (" + shapePoint.Point.X.ToString() + ", " + shapePoint.Point.Y.ToString() + ")");

                    case ShapeType.Polygon:
                        // a polygon contains one or more parts - each part is a list of points which
                        // are clockwise for boundaries and anti-clockwise for holes 
                        // see
                        ShapePolygon shapePolygon = shape as ShapePolygon;
                        foreach (PointD[] part in shapePolygon.Parts)
                            countShape = 0;

                            if (countParts == 0)
                                geometry = "(";
                                geometry = geometry + " | (";

                            foreach (PointD point in part)
                                if (part.Length - 1 != countShape)
                                    geometry = geometry + point.X + " " + point.Y + " |";
                                    geometry = geometry + point.X + " " + point.Y + " )";




                //Build our Polygon. 
                //Eg. POLYGON((-122.358 47.653, -122.348 47.649| -122.348 47.658, -122.358 47.658, -122.358 47.653))
                dr["shape"] = "POLYGON(" + geometry + ")";


    //Extract the data to a csv file
    using (System.IO.StreamWriter sw =
    new System.IO.StreamWriter(@"path\filename.csv"))
        foreach (DataRow row in dt.Rows)
            object[] array = row.ItemArray;
            for (i = 0; i < array.Length - 1; i++)
                sw.Write(array[i].ToString() + ",");

public static DataSet CreateNewDataSet()
    DataSet dsTemp = new DataSet();
    DataTable dtTemp = new DataTable("londonparks");
    dtTemp.Columns.Add("id", typeof(string));
    dtTemp.Columns.Add("parkname", typeof(string));
    dtTemp.Columns.Add("street", typeof(string));
    dtTemp.Columns.Add("postcode", typeof(string));
    dtTemp.Columns.Add("shape", typeof(string));

    return dsTemp;

Now that we have a valid file that can be processed by ADLA, we can upload it to ADLS and start performing geospatial analysis. To do so, I simply followed the logic described in Sacha’s blog (here).

The following U-SQL has in consideration a dataset that contains details of the trajectory of a courier, tracked on a daily basis. With the following code, we identify if a courier drove by a park by using the Intersect function. Because we have to cross two datasets, a C# function was created to help the evaluation of multiple events.

// Reference the assemblies we require in our script.
REFERENCE ASSEMBLY [SQLServerExtensions].[SqlSpatial];

// Once the appropriate assemblies are registered, we can alias them using the USING keyword.
USING Geometry = Microsoft.SqlServer.Types.SqlGeometry;
USING Geography = Microsoft.SqlServer.Types.SqlGeography;
USING SqlChars = System.Data.SqlTypes.SqlChars;
USING [USQL].[Core].[Utilities];

// Extract the list of parks
@parks =
      [ID]       	    string,
      [PARKNAME]        string,
      [STREET]	        string,
      [POSTCODE]        string,
      [SHAPE]           string
   FROM "RAW/Parks.csv"
   USING Extractors.Text(delimiter : ',', silent : false, quoting : true, skipFirstNRows : 1);

//Extract data from the file containing the courier trajectory
@trajectories =
        GPSDateTimeUTC          DateTime,
        ReceivedDatetimeUTC     DateTime,
        VehicleKey              string,
        Altitude                int,
        Longitude               double,
        Latitude                double,
        Distance                decimal,
        VehicleSpeedMph         decimal
    FROM "CURATED/Trajectory/Trajectory.TXT"
    USING Extractors.Text(delimiter : '|', silent : false, quoting : true, skipFirstNRows : 1);

//Get the list of vehicles that drove by the park. 
@vehicleIntersection =
                    "1" AS VehicleIntersected
    FROM @trajectories AS a
         CROSS JOIN
             @parks AS b
    WHERE Utilities.Intersect(b.[SHAPE], a.[Longitude], a.[Latitude]).ToString() == "True";

//Get the list of vehicles that didn't drive by the park. 
@vehicleWithoutIntersection =
    SELECT a. *,
           "0" AS VehicleIntersected
    FROM @trajectories AS a
         LEFT JOIN
             @vehicleIntersection AS b
         ON b.VehicleKey == a.VehicleKey
            AND b.GPSDateTimeUTC == a.GPSDateTimeUTC
    WHERE b.VehicleKey IS NULL;

//Union both datasets to get the complete set of data
@finalData =
    SELECT *
    FROM @vehicleIntersection
    SELECT *
    FROM @vehicleWithoutIntersection;

//Export the results to a csv file
   @finalData TO "LABORATORY/GeoSpatialIntersection.csv"
   USING Outputters.Text(outputHeader : true, delimiter : ',', quoting : true);

And here is the C# function. It accepts three parameters and calculate the intersection of a point with a shape.

public static string Intersect(string shape, double longitude, double latitude)
	//Because we had a csv file, the coordinates in the polygon were separated by |
	//It is important to use the .MakeValid() method to validate any invalid shape
	//In case the dataset had multypoligon shapes, without the MakeValid(), the function would throw an error
    var g =
            new SqlChars(
                shape.Replace('|',',')), 4326).MakeValid();
    var h = Geography.Point(longitude, latitude, 4326);

    return g.STIntersects(h).ToString();

As always, if you have any questions or comments, do let me know.

Working with Manual and Automated KPIs in a Tabular Cube

In a recent project, we had to produce a scorecard using a set of manual and calculated KPIs. To obtain the manual KPI figures, we used Master Data Services (MDS) where the users could insert the values, while for the calculated, we used the base measures created in the tabular cube.

So far, this requirement does not look very complicated, however, what if I tell you that the same KPI can either be manually or automatically calculated by the cube? And that we have to present the values for different levels of a hierarchy? And that some of the KPIs are not absolute values but ratios? Now that I got your attention, let’s have a look at the solution we implemented.

How to join manual and automated KPIs?

Because the client couldn’t always provide the data to calculate the base measures, we delivered an MDS model to, among other functionalities, manually insert the numeric values. You can check this blog if you want to know more about the Survey model (

Since we were working with different markets, the same KPI could either be manually or automatically calculated, which means, the cube had to to select the appropriate scenario, depending on the selected market. In order to achieve such requirement, we created 3 measures.

AutKPI – Using base measures from multiple tables, we defined the DAX code to calculate the KPI

ManKPI – Knowing all the MDS values were in one table, we defined a simple DAX query to sum the values

Actual – This measure was implemented with an IF statement. Eg.

        Actual:=IF(ISBLANK([AutKPI]), [ManKPI], [AutKPI])

How to aggregate ratio KPIs?

Let’s have a look at the example below, where we are calculating the KPI for two levels of a geography hierarchy.


Automated KPI 1 – Europe is naturally aggregating the values from Great Britain and Ireland

Automated KPI 2 - Considering we are using base measures, the cube can properly calculate the KPI at Europe level.

Manual KPI 1 – All manual entries were aggregated with a SUM. Because those are absolute values, the figure for Europe is correct

Manual KPI 2 ­– Following the same logic as Manual KPI 1, we can see the Europe value is incorrect. Because this is a ratio we can't aggregate the value from the lower levels. The simplest approach to resolve this problem was to create a new calculation using an AVERAGE function, however, considering the requirements, we had to introduce a weighted average.

Implementing weighted averages

The first step to this approach is to define a weight for each market. Since the values can change according to the user’s needs, we added a new entity to the MDS model.


Now let’s consider the example below showing the weighted approach.


Following is the formula to calculate the KPI at Europe level. For a better understanding, I split it in different steps.

C1 GB: Manual KPI x Weight

C1 Ireland: Manual KPI x Weight

C2: C1 GB + C1 Ireland

C3: GB Weight + Ireland Weight

Europe KPI: C2 / C3

The scope of the project stated we had to implement the following logic:

· When presenting the KPIs at market level, don’t apply the weighting

· When presenting the KPIs at region level, apply the weighting but only for the ratio KPIs

The biggest challenge of this requirement was to overwrite the aggregating logic of the geography hierarchy. To achieve that, we implemented a dynamic segmentation pattern on the ratio KPIs (more details on this link This approach can be split in four steps.

First step is the calculation of our numerator.

Num Weighted AutKPI – Because the base measures from our automated KPIs are from different tables, we had to firstly group our data by market and region level and only then apply the calculation. Eg.

Num Weighted AutKPI:=
                          'KPI Value',
                        [AutKPI] * [KPI Weight]

Num Weighted ManKPI – On this instance, the grouping was not necessary because we only had one measure to consider. Eg.

Num Weighted ManKPI:=
                       'KPI Value',
                           SUM ( 'KPI Value'[KPIActual] ) * [KPI Weight]),
                           'KPI'[KPI] = "Manual KPI"

The second step is the calculation of our denominator.

Den Weighted AutKPI – Once again, because the weights were stored in a single table no grouping was necessary.

Den Weighted AutKPI:=
                       'KPI Value',
                        CALCULATE([KPI Weight])
                    ,'KPI'[KPI] = “Automated KPI"

Den Weighted ManKPI – The same logic applies on this instance.

Den Weighted ManKPI:=
                       'KPI Value',
                        CALCULATE([KPI Weight])
                    ,'KPI'[KPI] = “Manual KPI"

The third step is the division of our measures.

Weighted AutKPI:= DIVIDE([Num Weighted AutKPI], [Den Weighted AutKPI])
Weighted ManKPI:= DIVIDE([Num Weighted ManKPI], [Num Weighted ManKPI])

The fourth step is the calculation of our Weighted Actual measure, by once again, using an IF function.

Weighted Actual:= IF(ISBLANK([Weighted AutKPI]), [Weighted ManKPI], [Weighted AutKPI])

Finally, considering we only wanted to use the weighted measures for a subset of our KPIs, we created a new measure using a SWITCH function. Eg.

         “Percentage KPI”, [Weighted Actual],
         "Percentage KPI2", [Weighted Actual],
         “Absolute KPI”, [Actual],
         "Absolute KPI2",[Actual]

Hopefully, I was able to clearly demonstrate our problem and how we managed to implement a solution to solve it. As always, if you have any questions or comments, do let me know.

How to run a successful testing stage

As we all know, testing is one of the most important stages of an IT project, however, either because the client doesn’t know how to test the solution, because we don’t have sample data we can use to compare against our results or because there is not a general approach we can apply to all projects, testing is sometimes set to failure. On this blog, I will share the approach adopted on the project I have been working.

Step 1 - Build test scenarios

This step can only succeed with the help of the Business Analyst or any other person from the business side.

In this project, we are creating a set of KPIs to be used on Power BI and Excel. Considering the end goal, the BA created a set of scenarios (example below) that we used to test our values.



Step 2 – Create SQL scripts to query the source data

One of the biggest risks of this approach lies on this step. Here, we want to create a set of SQL scripts that will follow the logic implemented in the cube. If the logic is wrong, the KPI will show incorrect values, even though we managed to match the results from the source data and the cube. This is where the input of the business user is crucial, since only him will be able to look at the numbers and confirm they are accordingly.

Building the test script is very simple. All we should do is set a couple of variables and make sure all the business rules are applied.


DECLARE @StartDate DATETIME = '20170602'
DECLARE @EndDate DATETIME = '20170603'
DECLARE @OutletUniverse Int

IF OBJECT_ID(N'tempdb..#CallDataAggregated', N'U') IS NOT NULL DROP TABLE #CallDataAggregated;

INTO #CallDataAggregated
FROM Dms.SalRdCallPerformanceRep A
INNER JOIN Dms.SalVsDailyTimingSum B
INNER JOIN Dms.SalSlSalesman C 
WHERE (A.VISIT_DT >= @StartDate AND A.VISIT_DT < @EndDate)

SELECT @OutletUniverse = 
FROM Warehouse.Fct.MarketConfiguration
WHERE MarketKey = 13
 AND (DateKey >= CONVERT(VARCHAR(8),@StartDate,112) AND DateKey < CONVERT(VARCHAR(8),@EndDate,112))

 ,(CONVERT(FLOAT,POPCount) / @OutletUniverse) AS Coverage
FROM #CallDataAggregated


Step 3 – Share the results with the Business Analyst and Testers

Once our testing is complete and the results are approved by the BA, we release the KPIs to UAT. If we are very lucky, we will have a tester that will then carry with his own checks, however, if that is not the case, we will have to make the work for them.


Step 4 – Product testing session with the business users

To sign off the KPIs, the business users need to agree with the results that are shown on the cube, however, they don’t always have the time, skills or tools to query the data. To resolve such problem, we created some examples in excel were we compare the source data with the cube.

KPI UAT Cube – In this sheet, we run a query in the cube for a specific scenario


KPI Source – We query the source data ensuring that all the business rules are applied, which is a risky approach as discussed above

KPI Pivot – We create a Pivot table based on the data from the KPI Source sheet


Once the excel scenarios are completed, we arrange a session with the business users and demonstrate that the values from the cube match with the source data. If they agree with the results, the KPIs are signed off and the testing stage is considered a success.

If you have any questions or thoughts, please leave your comment below.

Is Azure Stack a Rhino?

Last Thursday at the MPUGUK session, I was speaking with someone who told me that the first slide of his presentation about Azure Stack had an image of a Rhino. According to him, Azure Stack is a combination of the “old” on-premises solutions (dinosaur) with the new and futuristic cloud services (unicorn), hence a rhino (dinosaur + unicorn) Smile

So, what is this new option provided by Microsoft?

Azure Stack, currently in TP3, is nothing more nothing less than the Microsoft’s Azure cloud brought into a datacenter (running Microsoft’s Hyper-V, Windows, networking and storage). It is the first true private and hybrid cloud platform that benefits us with all the advantages of the Azure platform, but, keep the physical service in our facilities.

Architecturally speaking, Azure Public and Azure Stack are the same, which means the apps built on-premises can easily be shifted to the public azure cloud. This also means that the cloud services are finally available to the CFO’s that are reluctant to publish their instances in the cloud due to the “sensitive” nature of the information and all security and data governance concerns.



Another advantage of the Azure Stack, is the possibility of organisations that were stuck with the datacenter solutions, to benefit the capabilities of a public cloud.

Let’s consider the cruise industry example provided by Airlift. The cruise ships can only connect to the internet when they are near land, and even then, they don’t have enough bandwidth to run cloud services. To run their internal IT services on board, each ship has their own mini datacenter. With Azure Stack, they can now use services like Azure Bot Service to improve their customer service (eg. personalised marketing, digital and face recognition to access the cabins and make payments,etc.).

On the less bright side, we need to consider the limitations of this platform. A very obvious one, is the cost of the equipment (that can only be bought to DELL EMC, HPE, Lenovo and later in the year, Cisco) and all other costs associated to a datacenter. 

Another downside is the elasticity and scalability limitations when comparing with Azure Public. Finally, the fact that Azure Public will always have more services than Azure Stack (the below image, shows the services that will be available in GA).


Regarding the pricing, just like Azure, we have the ability to start and stop services and will only pay for what we are using. Azure IaaS and PaaS services have no upfront fees, and use the same subscriptions, monetary commitments, and billing tools as Azure Public.


Additional sources:

Data Data Revolution – The Results

This blog will take you through the Power BI Dashboard, Data Data Revolution – The Results, which is the product of the data collected from the demo presented in the last SQLBits conference (for further details, please check my previous blog


This dashboard provides a breakdown on the player’s preferences and performance split by different indicators. In the following video, I’ll show some of the possible conclusions we can gather from the analysis of the data.

Data Data Revolution

Following the DISCO theme, Adatis decided to present all the SQLBits attendees with a challenge based on the game Dance Dance Revolution. At the end of the game, the players were presented with two Power BI dashboards, one that streamed the data in near real time and the other representing historical data. This blog will detail the different components used in the demo.

  SQLBits Architecture

     (High Level Architecture)


The starting point

The first requirement was to have a game that could run on a laptop and store the output data in a file. Based on the theme of the conference, we chose the game Stepmania 5 ( After understanding how it worked and what type of details we wanted to capture, we adapted the program so it was possible to save the output in a TXT file every time a key was pressed. Following is an example of how the data was structured.

{"Player": "0", "Row": "768", "Direction": "Left", "NoteType": "Tap", "Rating": "OKAY", "Health": "Alive", "Combo": "0", "Score": "0", "Artist": "Katrina feat. Sunseaker", "Song": "1 - Walking On Sunshine", "Difficulty": "Easy"}


Capturing player details

To complement the game output, we decided to create an MVC application that had two functions, capturing the player details in an Azure SQL DB, and, upload a new Game ID along with the player details to a reference BLOB stored in an Azure Storage Container.


Sending the data to an Event Hub

Since we wanted to stream the data in near real time, we needed an application that could read the data from the output file as soon as it was updated. To achieve this, we built a C# application that was sending the data to an Event Hub. To make sure we didn’t upload duplicate data, we implemented a logic that compared the last row with the previous one. If they were different, the row was uploaded and if not, the program would wait for the next input.


Distributing the data

To distribute the data between the Azure SQL DB and the Power BI dataset, we used two separate Stream Analytics Jobs.

The first job was using the Event Hub and the reference BLOB as inputs and the Azure SQL DB as output, while the second job was using the same inputs but having a Power BI dataset as an output. Due to the dataset limitations, we ensured that all the formatting was applied in the Stream Analytics Query (eg. cast between varchar and bigint, naming conventions, …).


Power BI streaming datasets

In this scenario, the streaming datasets only work properly when created by the Stream Analytics Job. Any of the following actions invalidates the connection between the jobs and the dataset:

· Create the dataset in Power BI

· Change column names

· Change column types

· Disable the option Historic data analysis

When the dataset crashes, the only solution to fix the issue is to delete and re-create it. As a result, all the linked reports and dashboards are deleted.


Representing the data

By the time the demo was built, the connectivity of live datasets to the Power BI Desktop was not available, which means the live streaming dashboard was built using the online interface.

It is important to note that it is impossible to pin an entire page as a dashboard when using live datasets since it won’t refresh as soon as the data is transmitted. Instead, each individual element must be pinned to the dashboard, adding some visual limitations.


The performance of the players could be followed by checking the dashboard streaming the results in near real time. The use of the word near was used several times in the blog because the streaming is limited not only by the internet connection but also by the Power BI concurrency and throughput constraints, meaning the results were not immediately refreshed.

The second report was built using Power BI Desktop and was connected to the Azure SQL DB.


At the end of the game, the players could obtain the following information:

· Who was the winner

· How did they perform during the game

· The number of hits for each rating

· Which direction they were more proficient

Design Thinking

On February 24th, I had the opportunity to present my first Half Hour Huddle on the subject of “Design Thinking”. The session followed a format where I challenged the participants to solve a problem using the methodology while I guided them through the process. The result? An assortment of different and interesting ideas, that went from a Christmas Training course for people who don’t appreciate the season; a service that would allow you to travel around the world, buy the best products of each region and deliver it in a hot air balloon to a Sponge Bob Square Pants with an inside rubber pocket that allow the user to pour in some moult wine or hot coffee.

The Introduction to Design Thinking

According to Wikipedia, Design Thinking refers to creative strategies designers utilize during the process of designing. It is also an approach that can be used to consider issues and resolve problems more broadly than within professional design practice, and has been applied in business and to social issues (

In other words, Design Thinking is a methodology focused on the users’ experiences, especially their emotional ones, that create models to examine complex problems, build prototypes to explore potential solutions, test the ideas, and most importantly, tolerates failure.


The methodology follows 5 different stages:

Empathize – Create empathy with the user and start to build a connection

Define – Define a problem statement from the previous empathy work

Ideate – Brainstorm to get a lot of new ideas to solve the defined problem

Prototype – Build and make things

Test – Test the concepts created with the users


The challenge

Participate in a crash course and redesign the gift giving experience in about 40min.

For the crash course, the participants formed pairs and were told they had to redesign the gift giving experience of the partner while following the supporting material ( In other words, the interviewee had to think about the last gift he/she offered and talk about the whole experience to the interviewer.

Definition of experience: Realizing you have to buy a gift to realizing you forgot to buy a gift to thinking about what you might get to purchasing it, wrapping it and offering it to the other person.


The 9 Steps to Achieve Success


1. Interview

The challenge is to design something useful and meaningful to the partner, and the most important thing of designing for someone is to gain empathy for that person, which means, in this step, the interviewer will make questions that would allow him to create a connection and reach the emotions of the interviewee (eg. When was the last time you gave a gift? How did it go? What was your favorite part? Least favorite?)

2. Dig Deeper

After creating a connection, the interviewer will want to forget about the gift and find out what’s important for the interviewee. He will want to dig deeper and seek for emotions, stories and motivations, which is why, an excel file is not used in this methodology. (eg. If the interviewee said he offered a gift to the mother and feels emotional, the interviewer will want to explore the subject and ask him what’s going on with the mother, why did he felt the need to offer her a gift)



3. Capture Findings

The interviewer will synthesize the learnings into a few “needs” he discovered and a few “insights” he found interesting.

Needs – typically verbs, are actions the person is trying to achieve while offering a gift (eg. Show love, be appreciated, trying to feel important)

Insights – learnings from the partner’s feelings (eg. The interviewee offered a gift because he/she feels pleased to make the other person happy)

4. Define problem statement

Using the needs and insights, the interviewer will create a statement he’s going to address with the design, which means it has to be something actionable and doable (eg. Paul wants to reconnect with an old friend because he misses the adventures they spent together while they were young).



5. Sketch

The interviewer will sketch at least 5 radical ways to meet the interviewee needs. In this step, perfection is not needed and quantity should be more important than quality, since the interviewer will want to explore all the possibilities

6. Share the solutions and capture feedback

The interviewer will share the sketches with the interviewee and capture the feedback by making open questions, always having in consideration not to defend his ideas and convince him/her what is good or bad (eg. What did you think about this sketch? what do you think it went wrong? what is missing?)

7. Reflect and generate a new solution

The interviewer will incorporate what he learned based on the solutions and the feedback provided and will create one single sketch, that can be an improvement of something he had sketched previously or something completely new



8. Build your solution

Using different art and craft materials (kitchen foil, paper clips, duct tape, balloons, plasticine, post-its, …) the interviewer will prototype the solution sketched. It should be something the interviewee can engage and react to.



9. Share your solution and get feedback

The interviewer will capture the feedback provided by point down what worked, what could be improved, questions and ideas the interviewee raised while testing the solution.


The Result

At the end of the session, the participants managed to apply the methodology in what could be a very complex experience for some users. Some great and crazy ideas were generated and who knows, if the next big thing was not born on that day?


More info on Design Thinking and how companies like IBM and GE are applying it in their business, just check the following links:

IoT Hub, Device Explorer, Stream Analytics, Visual Studio 2015 and Power BI

As we saw in my previous blog, the IoT Hub allow us to collect millions of telemetry data and establish bi-directional communication between the devices, however, more than quantity, what we need is valuable insights that will lead to smart decisions. But how can we do that?

Collecting the data

There are thousands of sensors we can use, depending on the purpose. If we check the Microsoft documentation we will find tutorials for the Raspberry Pi, Arduino, Intel Edison or even simulators created with .Net, Java or Node.

The first step is always the creation of the IoT Hub on the Azure Portal. Next, we have to add the devices, which can either be done using C# and the IoT Hub Extension for VS 2015 or the Device Explorer. This last tool, provided by Microsoft, can easily register new devices in the IoT Hub and check the communication between the device and the cloud.

Once the devices are properly configured we will need to store the data, which can be done using a SQL Azure Database.


Represent the data

Now that we collected the data, we want to be able to represent it. One of the best ways to do that, is by creating some Power BI reports and dashboards, which will be populated via Stream Analytics.

A good example of a similar architecture and example dashboards can be found on Piotr’s blog Using Azure Machine Learning and Power BI to Predict Sporting Behaviour. Note that on his example, he used Event Hubs instead of the IoT Hub.


Insights and actions

Let’s imagine a transportation company is collecting the telemetry from a food truck equipped with speed, location, temperature and breaking sensors. In order to assist their delivery process, they have a report being refreshed with real time data that triggers some alerts when certain values are reached.

One of the operators received an alert from the temperature sensor, and after checking the dashboard he realizes the temperature is too high and it will affect the quality of the products being transported. Instead of calling the driver and make him aware of the situation, because the sensors are connected to an IoT Hub, he can simply send a command to the sensor and reduce the temperature.


More info:

Azure Event Hubs and IoT Hub

Imagine that you are the CEO of a big Logistic & Transport Company that works across the UK. In order to obtain multiple insights that will allow you to efficiently analyse how your company is performing and help you take better decisions, you decide to start collecting different telemetry information from the vehicles fleet. The question is, how will you manage to deal with hundreds of connected devices producing millions of telemetry data? The answer is….

Event Hubs

Back in 2014, Microsoft announced the release of Azure Event Hubs, a service that allows the collection of high throughput ingress of data streams generated by devices and services in an easy, secure and reliable way.

The Event Hubs can be created either through the Azure Portal or the Management API and gets immediately available without the need of further setups or management/maintenance requirements. The information stored in customer partitions will provide the message streaming. Each customer will only read a specific subset of the message stream due to the portioned customer pattern.



IoT Hub

In a very simplistic way, the Azure IoT Hub is the bridge between our devices and the cloud. It is fully manageable, enables reliable and secure device-to-cloud and cloud-to-device communication between millions of IoT devices and provides a service interface to support the application development.



Event Hubs or IoT Hub

If we consider the IoT Hub as an improvement of Event Hubs, shall we assume that the solution to the scenario described on the top of the article will be the first option? The answer is… depends on what we want to achieve.

If our needs require bidirectional communication and millions of simultaneously connected devices, IoT Hub would be the choice, however, combining the IoT Hub and Event Hubs instead of using them separately is a better option. While IoT Hub can deal with the device-to-cloud communication, the Event Hubs can deal with the huge amount of event ingresses produced by our vehicle fleet.


More Info: