Adatis BI Blogs

LETS Process Data–Modern ETL for the Data Lake

At the PASS Summit this year, I attended a session by Michael Rys in which he introduced the concept of LETS as an approach to processing data in the data lake. If you are familiar with the data lake, then you will be familiar with having to apply a schema to the data held within it. The LETS approach is purpose-designed for schematization. Where ETL stands for Extract, Transform, Load and ELT stands for Extract, Load, Transform, LETS stands for Load, Extract, Transform, Store:

- Data are Loaded into the data lake
- Data are Extracted and schematized
- Data are Transformed in rowsets
- Data are Stored in a location, such as the Catalog in Azure Data Lake Analytics, Azure SQL Data Warehouse or Azure Analysis Services, for analysis purposes

I really like this approach as it makes sense for how data are handled in the data lake. It’s something that I will be advocating and using, and I hope you do too!
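To make that concrete, here is a minimal U-SQL sketch of the Extract, Transform and Store steps, assuming the raw file has already been Loaded into the lake; the paths and schema are hypothetical rather than from the session.

```
// Extract: apply a schema, on read, to a raw file already loaded into the lake
@sales =
    EXTRACT OrderId int,
            OrderDate DateTime,
            Amount decimal
    FROM "/raw/sales.csv"
    USING Extractors.Csv(skipFirstNRows: 1);

// Transform: shape the schematized rowset
@withDay =
    SELECT OrderDate.ToString("yyyy-MM-dd") AS OrderDay,
           Amount
    FROM @sales;

@dailyTotals =
    SELECT OrderDay,
           SUM(Amount) AS TotalAmount
    FROM @withDay
    GROUP BY OrderDay;

// Store: persist the result for analysis - written to a file here, though it
// could equally be INSERTed into a table in the ADLA Catalog
OUTPUT @dailyTotals
TO "/curated/daily_totals.csv"
USING Outputters.Csv(outputHeader: true);
```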

SQL PASS Summit–Day 3 and Reflections

Apologies for the delay in getting this blog out to you all. When PASS finished on the Friday we had to rush over to the airport to get our flight back to the UK. When I landed on Saturday I was suffering from jet lag, and only now am I in a fit state to blog again.

I got the impression from the schedule that Day 3 of PASS was going to be a wind-down day, as very few of the sessions seemed as intense as the previous days’ sessions. My first session of the day, despite it being the last day of PASS, was early - earlier than any of the keynotes, but worth getting up for: a Chalk Talk with the Data Warehouse Fast Track team. This included the Azure SQL Data Warehouse team as well, and the conversation was much more focused on the Azure side of data warehousing. There were lots of conversations around PolyBase and patterns for getting data from on-prem to cloud using PolyBase (there’s a sketch of this pattern at the end of this post). In terms of patterns, it was reassuring to learn that the approach Adatis has adopted is spot on. Simon Whiteley is the man to see about that. His blog is here: http://blogs.adatis.co.uk/simonwhiteley/

On the Fast Track theme, my next session explored the SQL Server Fast Track Data Warehouse, which was interesting to know about, especially the various testing that these pre-configured servers go through. At some point next year, Microsoft will be releasing the Fast Track Testing Programme to the community, so that everyone will be able to test their hardware to the same exacting standards and know what their maximum throughput, IO demand and so on are, in order to properly gauge hardware performance.

After this session I got talking to a few people about data warehousing. The conversation was so engrossing that I missed the session I was due to attend. Luckily, most of the sessions at PASS are recorded, so I will have to chase up that session and others when they get released.

My final session of the day was a deep dive into SSIS 2016. It wasn’t so much a deep dive as a run-down of upcoming features. The one I’m most excited about is the Azure Data Lake Store connector, which will be released once Azure Data Lake goes into General Availability - which I’ve been told is soon…

Now that I’ve had a week to digest and reflect on the PASS Summit, my findings are these: the PASS Summit is invaluable. It provides an opportunity to learn so much from so many people, and not just from the presenters. There are so many people from all over the SQL community, with different experiences of SQL, different experiences of data and different experiences of life, that you can’t not learn something. PASS provides an easy environment in which to share ideas among peers and to learn new technologies, new ways of working and new tricks. I’ve already started sharing some of my learnings with colleagues, and I can’t wait to share them with everyone else too!
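As a footnote, this is roughly the shape of the external-table pattern PolyBase enables in Azure SQL Data Warehouse, sketched from the publicly documented syntax; it is not the specific Adatis approach, and the storage account, credential and object names are all hypothetical.

```
-- Assumes the ext schema and a database scoped credential for the
-- storage account already exist; all names here are illustrative
CREATE EXTERNAL DATA SOURCE AzureBlobStore
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://data@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = BlobStorageCredential
);

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);

-- Project a schema over the files landed in blob storage
CREATE EXTERNAL TABLE ext.Sales
(
    OrderId   INT,
    OrderDate DATE,
    Amount    DECIMAL(18, 2)
)
WITH (
    DATA_SOURCE = AzureBlobStore,
    LOCATION = '/sales/',
    FILE_FORMAT = CsvFormat
);

-- Land the data into a distributed internal table with CTAS
CREATE TABLE dbo.Sales
WITH (DISTRIBUTION = HASH(OrderId))
AS
SELECT OrderId, OrderDate, Amount FROM ext.Sales;
```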

SQL PASS Summit–Day 2

Day 2, Thursday, started off with a keynote from David DeWitt on cloud data warehousing, scalable storage and scalable compute. This set the theme for the majority of my day, which turned out to be big data tech.

My first session was with James Rowland-Jones and Kevin Ngo on sizing Azure SQL Data Warehouse for proposals - essentially answering “how much is this going to cost me?”. There are various factors to consider, which I will blog on separately. I’ve already briefly fed back to members of the team, and they’re excited to hear what I learnt in more detail.

My second session was about best practices for big BI which, unfortunately, ended up being a sales pitch, and I came away feeling that I didn’t learn anything. There’s a lot of promise for BI in the big data space, so watch this space as we explore Azure SQL Data Warehouse, Azure Data Lake (Store and Analytics) and other big data technology for BI.

The third session was with Michael Rys on Tuning and Optimising U-SQL Queries for Maximum Performance. It was a full-on session; I learnt loads and took loads of notes. I need time to digest the information, as Michael covered a very complex topic very quickly. I will, however, be blogging on it in due course.

After an intense third session, I chose a less intense one to end the day: a Q&A with the SQL engineering team. This was a great opportunity to learn from other users how they’re using SQL. Most users who asked questions wanted to know about indexing, backups and High Availability.

Tonight - packing, and networking before the last day of PASS tomorrow!

SQL PASS Summit–Day 1

Day 1, Wednesday, technically started on Tuesday with a speed-networking event for newbies, in which we had to rotate through a crowd of 10 other people, introducing ourselves and asking questions about our professional lives. This was awkward to begin with but, as the evening wore on, introducing ourselves to strangers became a lot easier and more normal. We then moved on to the Welcome Reception and a #SQLKaraoke event - great opportunities to meet new people from different areas of the world and parts of the community.

Wednesday morning proper began with a keynote from Joseph Sirosh, which essentially set the tone and theme for a large part of the conference sessions: Azure, Big Data and the Cortana Intelligence Suite. The first session I attended was on Design Patterns for Azure SQL Database (for which a separate blog will be forthcoming). The next session I attended was about incorporating Azure Data Lake Analytics into a BI environment (again, another blog is in the pipeline). My final session of the day was Going Under the Hood with Azure Data Lake. This was the most insightful session of the day, which has subsequently sparked my brain into Data Lake mode (expect many blogs on this), and it went through how Azure Data Lake works, how the U-SQL language works and how resources are allocated.

Tonight - more networking. So far, the community has been so welcoming, and I’m very much looking forward to tomorrow, where I’ll be learning about Big Data solutions and best practices. I’m also looking forward to sharing all my experiences and learnings with my colleagues and the wider SQL community.

Reflections On PASS Summit

Well, it’s been quite a week, and this has certainly been a summit with many announcements for the wider BI community. Whilst I really enjoyed my previous visit to PASS and the sessions were certainly very interesting, aside from a few announcements around Power BI there wasn’t much new in the BI landscape. This summit has been quite a different experience - it really feels that BI features are high on the priority agenda again with SQL Server 2016. We have seen the resurrection of SSRS and significant enhancements across the MS BI stack.

As usual, alongside the new technology announcements there were some excellent sessions from community members. I particularly enjoyed Rob Farley’s session on Query Plan Operators and Jason Strate’s talk on the Plan Cache, both of which contained techniques I will certainly be using in future.

I thought I would write up a summary of the changes which have been announced, and my thoughts on them, split by feature.

SSRS

Probably the most unexpected announcement was the set of changes to SSRS. There has been so little discussion of SSRS since the release of the 2008 R2 edition of SQL Server that I had started to suspect SSRS was going to be left to go the same way as PerformancePoint. This would have been a great shame, as SSRS has been a great tool for providing on-premise reporting and fits nicely into a gap which other parts of the BI stack don’t cover. The following new features were announced:

- The SSRS portal has been rewritten in HTML 5, making it cross-browser compatible and able to work across devices (tablets, phones etc.). It also means the portal looks much newer and more in line with other offerings from MS, such as the Power BI portal.
- It will now be possible to set up the positioning and grouping of parameters in SSRS reports. This will make quite a significant change to the usability of reports and was something which could not previously be done without building a custom wrapper to display the reports.
- The portal will support hosting of DataZen reports. The functionality of DataZen will remain, and reports hosted on the site will be separate from traditional SSRS reports; however, this additional report type will be rolled into SSRS, and the current DataZen product will not exist in its current form.
- SSRS report parts can now be pinned to Power BI dashboards.
- Reports can now be exported directly to PowerPoint, without requiring third-party tools.
- Changes have been made to the default charts and report objects, which should mean that off-the-shelf reports immediately look nicer than previously. This seems like an immediate win: as all those who have worked with SSRS will know, it can take some time to produce nice-looking reports in SSRS, and much of this work is down to changing the current default settings.
- The Power BI mobile app will now support connections to SSRS, allowing users on the move to view on-premise SSRS reports.
- Tree Map and Sunburst chart types have been added, providing additional visualisation types.
- Printing is no longer dependent upon ActiveX.
- Support for .NET 4.

There were also a couple of things I noticed which were not quite so positive in the list of announcements:

- When asked about the licensing model, MS could not confirm how SSRS would fit into the licensing model in SQL 2016. This could be because it has yet to be decided; however, it did make me wonder whether, with all these enhancements, this is something which could potentially incur an additional fee.
- Reports deployed to SharePoint will not benefit from all of the new features; for example, they will not benefit from any of the parameter changes, and I am unsure whether the other enhancements, such as HTML 5 and the new chart types, will be supported in reports deployed to SharePoint. Basically, don’t deploy your reports to SP unless you have to.

Overall, though, the picture is fairly positive, and certainly far more was announced than I had expected. The HTML 5 compliance and changes to parameters will certainly appeal to clients. My only criticism is that, functionally, the navigation and user experience of the reports will be quite similar for a desktop user. Certainly the HTML 5 will improve the look, and changing the parameters will also make some improvements, but the reports will still essentially be static parameterised reports - there was certainly no announcement of any new features such as AJAX-style interaction between different report parts in the way that is currently supported by Power BI.

SSIS

Whilst perhaps not quite as major as the changes announced to SSRS, SSIS has some pretty nifty improvements:

- Reusable (parameterised) control flow templates have been added to enable re-use of SSIS logic. Whilst I will have to try the implementation of this out for myself, there are often many similar actions performed in an ETL process which could certainly benefit from this.
- Column names now appear in the data viewer in the event of an error! This is a great change and, whilst it has always amazed me that this has not been present in previous releases (one might think this is pretty fundamental!), it will be a great enhancement for many people working with SSIS on a daily basis.
- The latest version of VS 2015 (available now in preview!) will support all versions of SSIS, allowing all SSIS development to take place in the same version of Visual Studio. On a side note, Data Tools has been rolled into one product, allowing for development of DB projects, SSIS packages, SSRS and SSAS. Whilst this may not sound groundbreaking, these are long-overdue features which I think will make many people very happy.
- New connectors allow direct connectivity to Azure storage and Hadoop.
- Packages can now be deployed incrementally to the SSIS catalog, rather than requiring entire projects to be deployed at a time.
- A new role, ‘ssis_logreader’, will be added, which will allow access to read logs without requiring admin permissions.
- The Foreach loop will now be able to iterate over blobs in Azure Storage.
- A new property on SSIS packages, AutoAdjustBufferSize, will, when set to true, try to calculate the optimum buffer size for the package.
- Support for Excel 2013 as a data source.
- The OData source and connection managers now support the v3 and v4 protocols, enabling support for JSON and ATOM.

These features will certainly make some improvements to what is already an excellent product. Fixing some of the major bugbears, such as VS support for different versions and showing column names in the event of an error rather than leaving the developer to guess, will further improve the product, and the additional support for connections to Azure will be very useful for those hosting data in the cloud. Unfortunately, when I asked whether the issues of source control around SSIS packages will be addressed (the difficulty of working collaboratively and adding packages when several people are working on the same project), this is not something we can expect for this release.
SSAS

Last but not least, there were a number of changes to SSAS, covering both the new Tabular models and the traditional multidimensional cubes:

- Tabular models will have proper support for many-to-many relationships and bi-directional cross-filtering. Whilst for many this was an expected feature, already being available in the Power BI Desktop app, this is a major improvement. I think that currently one of the major reasons people choose to implement the traditional multidimensional model over Tabular is the complexity of these measure calculations in Tabular.
- An improved designer for Tabular models - this was shown in the demonstration and included some major improvements, such as colour highlighting of DAX code, better IntelliSense and the ability to add comments to measure definitions. Whilst from an end-user perspective there will be no change, from a productivity and developer-happiness perspective these features will be hugely appreciated and will help make the Tabular model a more mature product. Again, I think this is another feature which is likely to make people who had previously rejected the Tabular model reconsider its viability.
- Tabular models will support parallel partition processing, and translations, which for Tabular have only been available with BIDS Helper as an unsupported feature, will become fully supported.
- DirectQuery has a number of enhancements. Firstly, the queries generated by it are far more efficient than previously (they used to be very inefficient, so I will wait tentatively to see how much of an improvement this is). It also supports calculated columns (which must be used with caution!) and, perhaps the largest change, it has better MDX support, meaning that Excel can now be used to connect to the model and query it directly.
- The DAX language has some improvements - it now supports variables and can return multiple result sets from a single query.
- A new JSON document format representing the Tabular model will become available and can be used when the compatibility level is set to the new 1200 value. From the demo, this format looked far more readable to the eye, as well as making changes to the model itself quicker, thanks to an improved structure for the model’s metadata.
- Support has been added for tracking Extended Events in SSAS graphically.
- Power BI will support connections to multidimensional models.
- Issues around inefficiencies in the MDX queries generated by Excel have been resolved, leading to significantly faster performance in the 2016 release of Excel.

Overall this is a really nice set of improvements. We see the Tabular model becoming a more mature product, and also some enhancements which will give great benefit to users of the traditional MD models, such as integration with Power BI and faster performance in Excel. There are also some significant improvements to both Power BI and the database engine which I am looking forward to. I should also note that Microsoft have announced their BI roadmap, which gives an overview of their future direction, here: http://blogs.technet.com/b/dataplatforminsider/archive/2015/10/29/microsoft-business-intelligence-our-reporting-roadmap.aspx

Anyway, that’s it for now. I’ll leave you with this picture from earlier in the week, when me and my colleague Simon took a trip to Mt St Helens :)

SQL PASS Summit - Day 3

Today was the final day of the PASS Summit. The schedule was a little different in the morning, with no keynote speech. Instead, I opted to see Jeremiah Peschka’s session ‘Dynamic SQL: Build Fast, Flexible Queries’. This was an interesting talk which covered the security aspects of dynamic SQL, such as how to avoid SQL injection, and also how to ensure that queries generated from dynamic SQL run quickly and avoid issues, for example by ensuring plan reuse on each execution (there’s a small sketch of this at the end of this post). Whilst I very rarely find myself in a situation in which dynamic SQL is essential, I found this an interesting talk, and I am sure that when the need does arise it will come in very handy.

Following this I attended a Microsoft lab on Azure Machine Learning (AML), ‘Authoring a hand-written digit recognizer in Microsoft Azure Machine Learning Studio’. This was a hands-on session where computers, a set of exercises and sample data were supplied. Essentially, the lesson consisted of providing a model with vector images of numbers, training the model to recognise the numbers and then passing a new set of number vector images to the model to see it identify them correctly (> 85%). The session was interesting and gave me an overview of using AML Studio, which was great. My main issue was that the data was already provided in vector format (CSV files which were imported). Whilst I can see that time constraints meant the conversion from image to vector during the class would have been difficult, it would have been very interesting to have seen a few examples of the images, so that we could see exactly what was being identified and how different the numbers were, as this would have illustrated how clever the algorithms are.

I finished the conference with Brian Larson’s talk entitled ‘Effective Reporting through SSRS Advanced Authoring Features’. As you may know, SSRS has not been updated for some time now, and with the current focus on Power View I wasn’t sure how much I would benefit from this and what the plans are for the tool in the future. I was quite surprised, therefore, when the room filled up and overflowed, so that quite a crowd of people had to stand in order to watch it. The talk showed some interesting uses of SSRS - images embedded in the background of reports, adding strip lines and markers, and an example of creating an interactive report which allowed users to set their preferences for running other reports. The examples given were very good, as was the delivery; my only issue here is that, without new features added to SSRS (for example, the ability to do an asynchronous refresh of some report elements), I am not sure what the uptake of SSRS is going to be.

All in all I have had a fantastic and informative time here. I leave you with some pictures, taken earlier in the week :)
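As a footnote, here is a minimal sketch of the core technique talks like this cover: parameterising dynamic SQL with sp_executesql, which keeps user input out of the statement text (guarding against injection) and gives the engine a stable statement to cache, enabling plan reuse. The table and columns are hypothetical, not from the session.

```
-- Parameterised dynamic SQL: the user-supplied value never becomes part of
-- the SQL text, and the unchanging text lets the cached plan be reused
DECLARE @sql  NVARCHAR(MAX);
DECLARE @city NVARCHAR(50) = N'Seattle';

SET @sql = N'SELECT CustomerId, CustomerName
             FROM dbo.Customers
             WHERE City = @city;';

EXEC sys.sp_executesql
     @sql,
     N'@city NVARCHAR(50)',
     @city = @city;
```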

SQL PASS Summit – Day 2

This morning started with a talk from Rimma Nehme, a key Microsoft architect. The talk was informative, and clearly Azure and the cloud are something Microsoft are very keen on; it was well thought through, and there was a fantastic analogy between pizza and the different flavours of Azure, which is not what I had expected.

Adam Machanic did a presentation covering different methods which can be used to force the query optimiser into choosing more efficient query plans. This was not for the faint-hearted and included some ingenious methods he had developed which can be used to ‘fool’ the optimiser into selecting different query plans. The methods push towards parallel queries and ensure that the right workload is assigned to each of the threads, so they finish at roughly the same time. The only drawback was that some of the examples produced code which was quite obfuscated - not something I thought could be used except for an exceptional issue which necessitated it.

Davide Mauri’s session on unit testing and agile development was very interesting; he showed demonstrations of NUnit, BI.Quality and NBi. These are tools which allow users to create unit tests when following an agile approach. Whilst we also use similar methods at Adatis to provide unit testing functionality, the demonstrations were very effective, and I will be doing some further research into these products to realise their full potential.

Conor Cunningham also presented a fantastic talk titled ‘Managing 1M+ DBs - How Big Data is used to run SQL Azure’. This was a very insightful talk which detailed the issues that had arisen when running the SQL Azure service, and the methods used to detect problems, quickly resolve them and identify their root cause. Central to this was the collection of telemetry data from many sources, which could be used to perform diagnostics, together with the methods used to identify problems. Here we saw a very interesting use of Azure Machine Learning, which had been set up to trigger alerts on the occurrence of unusual behaviour. Conor is a fantastic speaker and clearly one of the most knowledgeable people on the SQL engine; I really enjoyed this talk.

I’m off to the evening party now at the EMP Museum - looking forward to having a few beers and meeting some new people :)

SQL PASS Summit – Day 1

After several days getting to know and enjoying Seattle, the conference is now underway. Yesterday there were some introductory meetings where we got to meet some of the other attendees and get a feel for the environment; today everything was in full swing.

The morning started with the opening keynote, presented to a huge audience. There were demonstrations of some really exciting new features. In one, we observed Kinect being used with Power Map to track customer movements and observe customer interest in different parts of a store. We saw some great-looking Power BI dashboarding functionality, with the ability to drill through into detailed reports. As well as this, we saw some further enhancements to SQL Server’s Azure and on-premise integration, including a new Stretch functionality which will allow users to seamlessly ‘stretch’ their data into the cloud, keeping the most frequently queried records on premise and the others in the cloud. We also saw a columnstore index being created on an in-memory table (a sketch of the syntax is at the end of this post)!

Miguel Lopes gave a talk on the new features in Power Query, where we saw the ability to query using ODBC and support for Analysis Services connections. On the whole, though, whilst I think the ODBC support will be particularly useful in some situations, much of this talk gave an overview of Power Query as a whole rather than demonstrating new functionality. The future integration of SSIS and Power Query was mentioned; however, no dates have been set for this, and we are told that it may (or may not) be available at some point in 2015.

Jen Stirrup gave an interesting overview of some of the features available in R; the session was so popular that many people had to sit round the front on the floor! Niko Neugebauer’s contagious enthusiasm when presenting his session on ETL patterns with clustered columnstore indexes was great to see, and I picked up a few tricks that should be quite handy when working in 2014 environments. I also very much enjoyed John Welch’s session on Continuous Delivery for Data Warehouses and Marts; this is something I have been involved with a lot recently, and it was very interesting to see his methods of achieving it, and also to discover that in many cases we were both doing things in the same way :)

Overall the day has been very interesting. We have seen some exciting new features announced and some significant enhancements to the Power BI product range. It has seemed to me for some time that the lack of dashboarding functionality in Power View was holding it back, and I think many people will be very pleased with this new functionality and the further enhancements to the Azure service.
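For the curious, that last demo corresponds to syntax along the following lines in SQL Server 2016; this is a hypothetical sketch, not the keynote’s actual demo table.

```
-- A memory-optimised table with a clustered columnstore index declared
-- inline (SQL Server 2016 syntax); table and columns are illustrative
CREATE TABLE dbo.Trades
(
    TradeId   INT IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED,
    TradeDate DATETIME2 NOT NULL,
    Amount    DECIMAL(18, 2) NOT NULL,
    INDEX cci CLUSTERED COLUMNSTORE
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```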