Adatis

Adatis BI Blogs

How to prepare for 70-776 - Perform Big Data Engineering on Microsoft Cloud Services

There is a new exam currently in beta titled "Perform Big Data Engineering on Microsoft Cloud Services (beta)". As with all new exams, there is little content on how to revise for it beyond the exam's summary. This exam, however, covers exactly what Adatis specialises in! Microsoft may call this "Big Data Engineering"; we call it "Modern Data Analytics", and we have a few blogs on the subject.

You can sign up to the exam here: https://www.microsoft.com/en-us/learning/exam-70-776.aspx

Below you will find links to blog posts by Adatis consultants on topics related to all the key objectives of this exam. I will endeavour to keep this up-to-date with new content added by the team. Good luck with the exam.

Design and Implement Complex Event Processing By Using Azure Stream Analytics (15-20%)

Streaming data is vital to achieving real-time analytics. The following blog posts focus on this and offer an introduction and walkthrough for getting started with Stream Analytics. When talking about a wider Lambda approach to Big Data, streaming enables rapid processing via a “Speed” layer.

http://blogs.adatis.co.uk/simonwhiteley/post/Adatis-Hackathon-Jan-2015-Streaming-Analytics-First-Thoughts
http://blogs.adatis.co.uk/Jose%20Mendes/post/IoT-Hub-Device-Explorer-Stream-Analytics-Visual-Studio-2015-and-Power-BI
http://blogs.adatis.co.uk/sachatomey/post/2017/01/19/Power-BI-Streaming-Datasets-An-Alternative-PowerShell-Push-Script
http://blogs.adatis.co.uk/Jose%20Mendes/post/Data-Data-Revolution

Design and Implement Analytics by Using Azure Data Lake (25-30%)

Azure Data Lake Store and Analytics are a vital component of “Modern Data Analytics”. Data which is too large for traditional single-server processing needs distributed parallel computation. Rather than pulling the data out and then processing it, Azure Data Lake Analytics (ADLA) pushes the processing to the data. Understanding how to process large volumes of data is one part of the “Batch” layer in Lambda.

http://blogs.adatis.co.uk/ustoldfield/post/data-lakes
http://blogs.adatis.co.uk/ustoldfield/post/Data-Flow-Job-Execution-in-the-Azure-Data-Lake
http://blogs.adatis.co.uk/ustoldfield/post/Data-Flow-Pt-2-Vertexes-In-Azure-Data-Lake

Design and Implement Azure SQL Data Warehouse Solutions (15-20%)

Either as an alternative or in accompaniment to Data Lake is Azure SQL Data Warehouse. If Data Lake is batch across many files, Azure SQLDW is parallel batch over many databases. The key to both services is processing at the storage and not at the compute. The following is an on-going blog series covering the basics all the way to a deep-dive.

http://blogs.adatis.co.uk/simonwhiteley/post/A-Guide-to-Azure-SQL-DataWarehouse
http://blogs.adatis.co.uk/simonwhiteley/post/Azure-SQLDW-What-is-it
http://blogs.adatis.co.uk/simonwhiteley/post/Azure-SQLDW-How-Does-Scaling-Work
http://blogs.adatis.co.uk/simonwhiteley/post/Azure-SQLDW-Distribution
http://blogs.adatis.co.uk/simonwhiteley/post/Azure-SQLDW-Polybase
http://blogs.adatis.co.uk/simonwhiteley/post/Azure-SQLDW-Polybase-Design-Patterns
http://blogs.adatis.co.uk/simonwhiteley/post/Azure-SQLDW-Polybase-Limitations
http://blogs.adatis.co.uk/simonwhiteley/post/Azure-SQLDW-CTAS-Statements

Design and Implement Cloud-Based Integration by using Azure Data Factory (15-20%)

If you’re looking for a PaaS solution to move data in Azure, there is only really one option: Azure Data Factory. The following blogs will get you up-to-speed with ADF.

http://blogs.adatis.co.uk/terrymccann/post/Getting-started-with-Azure-Data-Factory
http://blogs.adatis.co.uk/terrymccann/post/Setting-up-your-first-Azure-Data-Factory
http://blogs.adatis.co.uk/terrymccann/post/Azure-Data-Factory-using-the-Copy-Data-task-to-migrate-data-from-on-premise-SQL-Server-to-Blob-storage

Manage and Maintain Azure SQL Data Warehouse, Azure Data Lake, Azure Data Factory, and Azure Stream Analytics (20-25%)

Knowing each of the parts is only half the battle; you also need to know how, when and why to use each part. What are the best practices?

http://blogs.adatis.co.uk/ustoldfield/post/Deploying-a-Hybrid-Cloud
http://blogs.adatis.co.uk/terrymccann/post/Azure-Data-Factory-Suggested-naming-conventions-and-best-practices
http://blogs.adatis.co.uk/ustoldfield/post/Azure-Data-Lake-Store-Storage-and-Best-Practices
http://blogs.adatis.co.uk/ustoldfield/post/Shaping-The-Lake-Data-Lake-Framework

Data Data Revolution

Following the DISCO theme, Adatis decided to present all the SQLBits attendees with a challenge based on the game Dance Dance Revolution. At the end of the game, the players were presented with two Power BI dashboards, one that streamed the data in near real time and the other representing historical data. This blog will detail the different components used in the demo.

(High Level Architecture)

The starting point

The first requirement was to have a game that could run on a laptop and store the output data in a file. Based on the theme of the conference, we chose the game Stepmania 5 (https://www.stepmania.com/download/). After understanding how it worked and what type of details we wanted to capture, we adapted the program so that it saved the output to a TXT file every time a key was pressed. The following is an example of how the data was structured.

{"Player": "0", "Row": "768", "Direction": "Left", "NoteType": "Tap", "Rating": "OKAY", "Health": "Alive", "Combo": "0", "Score": "0", "Artist": "Katrina feat. Sunseaker", "Song": "1 - Walking On Sunshine", "Difficulty": "Easy"}

Capturing player details

To complement the game output, we decided to create an MVC application that had two functions: capturing the player details in an Azure SQL DB, and uploading a new Game ID along with the player details to a reference BLOB stored in an Azure Storage Container.

Sending the data to an Event Hub

Since we wanted to stream the data in near real time, we needed an application that could read the data from the output file as soon as it was updated. To achieve this, we built a C# application that sent the data to an Event Hub. To make sure we didn’t upload duplicate data, we implemented logic that compared the last row with the previous one. If they were different, the row was uploaded; if not, the program would wait for the next input.
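The duplicate check described above can be illustrated with a short sketch. The original application was written in C#; the Python below is only an approximation of the idea, not the code used in the demo. The names last_line and RowForwarder are hypothetical, and the send callable is a stand-in for whatever client actually publishes to the Event Hub.

```python
import json
from typing import Callable, Optional


def last_line(path: str) -> Optional[str]:
    """Return the last non-empty line of the output file, or None if it is empty."""
    with open(path, encoding="utf-8") as handle:
        lines = [line.strip() for line in handle if line.strip()]
    return lines[-1] if lines else None


class RowForwarder:
    """Forwards a row only when it differs from the previously forwarded one."""

    def __init__(self, send: Callable[[dict], None]) -> None:
        # `send` is a hypothetical stand-in for the Event Hub client call.
        self._send = send
        self._previous: Optional[str] = None

    def offer(self, row: Optional[str]) -> bool:
        """Send the row if it is new; return True when something was sent."""
        if row is None or row == self._previous:
            # Nothing new in the file: wait for the next key press.
            return False
        self._send(json.loads(row))
        self._previous = row
        return True
```

In a polling loop, the program would repeatedly call offer(last_line(path)) on the game's output file. Offering the same row twice forwards it only once, which matches the behaviour described in this section.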
Distributing the data

To distribute the data between the Azure SQL DB and the Power BI dataset, we used two separate Stream Analytics Jobs. The first job used the Event Hub and the reference BLOB as inputs and the Azure SQL DB as output, while the second job used the same inputs but had a Power BI dataset as its output. Due to the dataset limitations, we ensured that all the formatting was applied in the Stream Analytics Query (e.g. casts between varchar and bigint, naming conventions, …).

Power BI streaming datasets

In this scenario, the streaming datasets only work properly when created by the Stream Analytics Job. Any of the following actions invalidates the connection between the jobs and the dataset:

· Create the dataset in Power BI
· Change column names
· Change column types
· Disable the option Historic data analysis

When the dataset crashes, the only way to fix the issue is to delete and re-create it. As a result, all the linked reports and dashboards are deleted.

Representing the data

By the time the demo was built, connecting live datasets to Power BI Desktop was not available, which meant the live streaming dashboard was built using the online interface. It is important to note that it is impossible to pin an entire page as a dashboard when using live datasets since it won’t refresh as soon as the data is transmitted. Instead, each individual element must be pinned to the dashboard, adding some visual limitations.

The performance of the players could be followed by checking the dashboard streaming the results in near real time. The word "near" is used several times in this blog because the streaming is limited not only by the internet connection but also by the Power BI concurrency and throughput constraints, meaning the results were not immediately refreshed.

The second report was built using Power BI Desktop and was connected to the Azure SQL DB. At the end of the game, the players could obtain the following information:

· Who was the winner
· How did they perform during the game
· The number of hits for each rating
· Which direction they were more proficient in
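As a rough illustration of the last two figures in the list above, the sketch below casts the string fields from the game output to integers and then counts hits per rating and per direction. In the demo this work was done by the Stream Analytics query and the Power BI reports, so the Python here is only an assumed equivalent. The function names are hypothetical, and which ratings count as a "good" hit is left as a parameter because the source only shows the rating OKAY.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

# Fields that the game writes as strings but that are really numbers.
NUMERIC_FIELDS = ("Player", "Row", "Combo", "Score")


def to_typed(event: Dict[str, str]) -> Dict[str, object]:
    """Return a copy of the event with the numeric fields cast to int."""
    typed: Dict[str, object] = dict(event)
    for field in NUMERIC_FIELDS:
        typed[field] = int(event[field])
    return typed


def hits_per_rating(events: List[Dict[str, object]], player: int) -> Counter:
    """Count how many notes the given player hit at each rating."""
    return Counter(e["Rating"] for e in events if e["Player"] == player)


def best_direction(
    events: List[Dict[str, object]], player: int, good_ratings: Iterable[str]
) -> Optional[str]:
    """Return the direction with the most hits whose rating is in good_ratings."""
    good = set(good_ratings)
    counts = Counter(
        e["Direction"]
        for e in events
        if e["Player"] == player and e["Rating"] in good
    )
    # most_common(1) yields the single highest count, if there is any hit at all.
    return counts.most_common(1)[0][0] if counts else None
```

Keeping the cast in one place mirrors the choice made in the demo, where all the formatting was applied before the data reached the dataset.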

IoT Hub, Device Explorer, Stream Analytics, Visual Studio 2015 and Power BI

As we saw in my previous blog, the IoT Hub allows us to collect millions of telemetry messages and establish bi-directional communication with the devices. However, more than quantity, what we need is valuable insights that will lead to smart decisions. But how can we do that?

Collecting the data

There are thousands of sensors we can use, depending on the purpose. If we check the Microsoft documentation we will find tutorials for the Raspberry Pi, Arduino, Intel Edison or even simulators created with .NET, Java or Node. The first step is always the creation of the IoT Hub on the Azure Portal. Next, we have to add the devices, which can be done using either C# and the IoT Hub Extension for VS 2015 or the Device Explorer. This last tool, provided by Microsoft, can easily register new devices in the IoT Hub and check the communication between the device and the cloud. Once the devices are properly configured we will need to store the data, which can be done using an Azure SQL Database.

Represent the data

Now that we have collected the data, we want to be able to represent it. One of the best ways to do that is by creating some Power BI reports and dashboards, which will be populated via Stream Analytics. A good example of a similar architecture and example dashboards can be found on Piotr’s blog Using Azure Machine Learning and Power BI to Predict Sporting Behaviour. Note that in his example, he used Event Hubs instead of the IoT Hub.

Insights and actions

Let’s imagine a transportation company is collecting the telemetry from a food truck equipped with speed, location, temperature and braking sensors. In order to assist their delivery process, they have a report being refreshed with real time data that triggers some alerts when certain values are reached. One of the operators receives an alert from the temperature sensor, and after checking the dashboard he realizes the temperature is too high and that it will affect the quality of the products being transported. Instead of calling the driver to make him aware of the situation, because the sensors are connected to an IoT Hub, he can simply send a command to the sensor and reduce the temperature.

More info:

https://github.com/Azure/azure-iot-sdks/commit/ed5b6e9b16c6a16be361436d3ecb7b3f8772e943?short_path=636ff09
https://github.com/Azure/connectthedots
https://sandervandevelde.wordpress.com/2016/02/26/iot-hub-now-available-in-europe/
https://powerbi.microsoft.com/en-us/blog/monitor-your-iot-sensors-using-power-bi/
https://blogs.msdn.microsoft.com/mvpawardprogram/2016/12/06/real-time-temperature-webapp/
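The food truck scenario from the Insights and actions section can be reduced to a small sketch: compare a temperature reading against a threshold and, when it is too high, send a command back to the device. In the scenario an operator makes that decision after seeing the alert; the code below automates the check purely for illustration. Every name here (Reading, check_temperature, the command payload) is hypothetical, and send_command is a stand-in for the cloud-to-device call, not part of any Azure SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Reading:
    """One telemetry message from a (hypothetical) temperature sensor."""
    device_id: str
    temperature_c: float


def check_temperature(
    reading: Reading,
    max_temperature_c: float,
    send_command: Callable[[str, Dict[str, object]], None],
) -> bool:
    """Send a cooling command when the reading exceeds the threshold.

    Returns True when a command was sent, False when the reading was fine.
    """
    if reading.temperature_c <= max_temperature_c:
        return False
    # Hypothetical payload: ask the device to bring the temperature back down.
    send_command(
        reading.device_id,
        {"command": "set_target_temperature", "value": max_temperature_c},
    )
    return True
```

Passing send_command in as a parameter keeps the threshold logic separate from the transport, so the same check can be exercised with a fake sender before it is wired to a real hub.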