Adatis BI Blogs

Iconography in Design

In this blog I will discuss the importance of design, in particular looking at how icons can help add the finishing touches to a piece of front end development. Whether you’re building a Power BI report, SSRS report, PowerApp, or doing any kind of front end visualisation and design, you’ll know that 50% of the battle is making whatever you’re building jump out of the page. You could have built the most useful report a user could wish for, but unless it looks good, you’re not going to get any plaudits. Essentially, good design goes a long way to making a success of a piece of development. This is even more important if you’re also building the underlying architecture / ETL which, for all our good practice and thoughtfulness in this area as developers, will rarely impress or make an impact on an end user.

It’s important to get inspiration for design from somewhere, so before designing or even building something, take a look on Google, do some research into things similar to what you’re doing, and see what you think looks good, or not so good. You should build up an idea of how you want to design your landing page, report or report headers, etc. within the tool. If you’re in Power BI, take a look at the partner showcase for example. There are some really good examples to give you ideas – like this one! Microsoft apps such as Power BI or PowerApps go a long way in helping us build the best when it comes to data visualisation, but unfortunately they sometimes rely on developers going outside the box to finish things off.

Icon Finder

Recently I’ve started to use a site called iconfinder.com, which has a large pool of useful icons you can use for building out certain corners of apps that the standard MS tooling will not support. The icons are mostly $2 each, but you can get a subscription for £20/month or $10 with a discount code (which you should cancel as soon as you register). Please don’t jump straight in if you think you need to use this as a resource; alternatively, save the icon using Chrome and use it as a placeholder until you are happy to push to production. For non-subscribed users, the icon will always come on a background of faint grey lines. This isn’t too bad, as the lines don’t completely ruin the look and feel of the icon during development and are good for a placeholder for demos, etc.

To get started, just type in a keyword for the type of icon you’re looking for, and then it’s just a case of wading through the results to find the icon that fits the look and feel you’re inspired to build against. Sometimes you’ll get another bit of inspiration off the back of this, which you can use as another keyword to find even more icons.

The site also comes with a very handy icon editor tool, essentially Paint Shop Pro on the web. There are lots of these tools out there, but it’s useful having it all integrated in one place at no extra cost. If you’re subscribed, it will load the SVG icon into the editor automatically, which then allows you to edit colours or shapes, etc. In my instance, I found a nice % complete icon set which would look good on a white background. Unfortunately, I wanted it on a blue header bar, so I needed to change it up slightly to fit the look and feel. No problem – it took less than a minute to modify and download. It’s also worth mentioning that the site does a good job of helping you find a pool of icons which fit together nicely with the same look and feel, by showing you icons from the same icon set.
In one instance, I replaced an icon with one from the same set even though it wasn’t exactly what I was looking for – because overall it just fitted together better with the other icons on the screen.

Design in Practice

As mentioned above, doing some research before you build can really help you create a much better finish. For inspiration for a recent PowerApps design, I did a quick search for landing pages on Google and found a few I liked the look of (below). As long as the general elements you are working to are similar, it really doesn’t matter where the inspiration comes from; in these cases, they were in the form of mobile apps. From these images, I was able to identify the key components which made me bookmark them:

- I wanted some kind of non-offensive background, possibly semi-transparent, or with an overlay.
- I wanted a title that stands out on the page, so white on grey or similar.
- I wanted a small section for a blurb for the PowerApp.
- I wanted 2 buttons, and the buttons to stand out.
- I wanted logos in the top corners.
- I wanted a nice look and feel for the colour palette.

From this, I then produced the following landing page. I found the background on picjumbo.com, which turned out to be quite a nice resource for generic business-style artwork, and then added a blurring filter across the top. This still interferes slightly with the buttons / title, so I’m not completely happy, but I’m satisfied enough that it achieves the look and feel I was after. For the title, the range of fonts supplied with PowerApps is rather limited, so I could go externally for this too, but was happy enough for the time being. The layout also leaves room to shrink the title and add a small blurb if need be. The buttons are made up of a number of icons and fit with the theme for the app. As mentioned above, I also added % complete icons to each page so users were able to understand how far along the scoring pages within the app they were. PowerApps provides sufficient icons for the back/refresh buttons that fit in with the white-on-blue theme, so I didn’t have to go externally for these. These were placed on the page header next to the logo.

Power BI

While this most recent bit of design was focused on PowerApps, I also add small touches to Power BI during report design. For instance, rather than just have a generic button that pushes you to a “details” page with a table for the row-by-row breakdown of some aggregated data, I looked for an icon, edited the colour palette slightly and added this to the report. With recent Power BI functionality, I can make the image act as a button and redirect the user to another page. I’ve also used icons in dashboard design where a single visual didn’t really represent the content of the report to drill into. This can also be a good way to add a bit of flavour to a dashboard and mix things up. In my case, it also made the drill-down to report level less ambiguous, by asking a question in the title about whether that’s what the user wants to do.

Design is always subjective of course, but it’s great to use the other resources at your disposal when building out apps. Depending upon the current estate in which you develop, it also helps your reports/apps stand out a bit more and adds a unique context to them within the project. Hopefully this blog has given you a few ideas for your next project!

Data Source Permissions and On-Premises Data Gateway: SQL Server and Analysis Services

In Microsoft’s documentation surrounding the On-Premises Data Gateway, the advice on permissions for the account used to authenticate the Data Source in the Power BI Service can be concerning for most, especially DBAs. In the Analysis Services section of the documentation, the advice is:

“The Windows account you enter must have Server Administrator permissions for the instance you are connecting to. If this account’s password is set to expire, users could get a connection error if the password isn’t updated for the data source.”

Server Administrator permissions…? What happened to the principle of least privilege? In a practical sense, the On-Premises Data Gateway has to deal with two very different implementations of Analysis Services: Multidimensional and Tabular. Each is set up and configured differently from the other, and the nature of their security models also differs. As a one-size-fits-all approach, it works. But as we will soon see, the permissions do not have to be set as Server Administrator. The SQL section of the documentation, on the other hand, doesn’t actually specify what permissions are required for the Data Source to be established in the Power BI Service.

Permissions

Exactly what permissions are required for these common data sources, I hear you ask. As data sources are established at a database level, so too are the permissions set.

Data Source                      | Minimum Permissions Level
SQL Server Database              | db_datareader
SSAS Tabular Database            | Process database and Read
SSAS Multidimensional Database   | Full control (Administrator)

The principle of least privilege is now restored. Though there remains the curious incident of Analysis Services data sources requiring permissions in addition to Read. I am unsure why, I have my suspicions, and I have tried to find out. If you know, please leave a comment below!
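To make the SQL Server case concrete, below is a minimal sketch of granting the gateway account the db_datareader minimum, using pyodbc to run the T-SQL. The server, database and Windows account names are hypothetical placeholders; swap in your own.

```python
import pyodbc

# Hypothetical server, database and gateway service account names.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=MyOnPremServer;DATABASE=SalesDW;Trusted_Connection=yes;",
    autocommit=True,
)
cursor = conn.cursor()

# Map the gateway's Windows account into the database
# (assumes the Windows login already exists at the instance level).
cursor.execute("CREATE USER [CONTOSO\\svc_gateway] FOR LOGIN [CONTOSO\\svc_gateway];")

# Grant read-only access: the minimum for a SQL Server data source.
cursor.execute("ALTER ROLE db_datareader ADD MEMBER [CONTOSO\\svc_gateway];")
```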

Archiving the Data Lake

In a blog introducing the Data Lake Framework, keen readers will be aware that in the diagram there’s a box titled “ARCHIVE”, but it has not been brought up since. The reason the Archive layer in the data lake has not been discussed is because we’ve been waiting for the Archive tier in Blob Storage. To remind readers of the framework and where the archive layer sits in it, here it is again with the archive layer highlighted.

The Archive Blob

The Archive access tier in blob storage was made generally available today (13th December 2017) and with it comes the final piece in the puzzle for archiving data from the data lake. Where the Hot and Cool access tiers can be applied at a storage account level, the Archive access tier is applied to the individual blobs within a blob storage container. To understand why the Archive access tier works at this level, you need to understand its features. It is intended for data that has no or low SLAs for availability within an organisation, and the data is stored offline (Hot and Cool access tiers are online). Therefore, it can take up to 15 hours for archived data to be brought online and made available. Bringing Archive data online is a process called rehydration (fitting for the data lake). If you have lots of blob containers in a storage account, you can archive the blobs they hold and rehydrate them as required, rather than having to rehydrate the entire storage account.

Archive Pattern

An intended use for the Archive access tier is to store raw data that must be preserved, even after it has been fully processed, and does not need to be accessed within 180 days. Data gets loaded into the RAW area of the data lake, is fully processed through to CURATED, and a copy of the raw data is archived off to a blob container with the Cool access tier applied to it. When the archive cycle comes around, a new Cool access tiered blob container is created and the now old container has its blobs moved to the Archive access tier. For example, our archive cycle is monthly and we have a Cool access tiered blob container in our storage account called “december-2017”. When data has finished being processed in the Azure Data Lake, the raw data is archived to this blob container. January comes around, so we create a new blob container called “january-2018” with the Cool access tier and change the access tier of the blobs in “december-2017” from Cool to Archive. This data has now been formally archived and is only available for disaster recovery, auditing or compliance purposes.
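As a rough illustration of that monthly cycle, here is a minimal sketch using the current azure-storage-blob Python SDK (v12, which postdates this post; the API available at the time differed). The connection string and container names are placeholders.

```python
from azure.storage.blob import BlobServiceClient, StandardBlobTier

# Hypothetical connection string for the archive storage account.
service = BlobServiceClient.from_connection_string("<storage-connection-string>")

# New month: create the Cool-tiered container that will receive this
# month's archived raw data (the account default tier is assumed Cool).
service.create_container("january-2018")

# Previous month: the Archive tier is set per blob, so "archiving the
# container" means moving every blob it holds to the Archive tier.
december = service.get_container_client("december-2017")
for blob in december.list_blobs():
    december.get_blob_client(blob.name).set_standard_blob_tier(
        StandardBlobTier.ARCHIVE
    )
```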

Testing the Waters: An Overview of Data Science using Azure Data Lakes

Data Science can fit seamlessly within the ecosystem of the data lake, whether this is through HDInsight or the extensibility of Azure Data Lake Analytics and U-SQL. This blog will give a brief overview of what Data Science is; how to link Data Science toolkits to the Azure Data Lake; and best practices for managing the data output from experiments.

Data Science

Data Science is the relatively new kid on the block. One way to consider data science is as an evolutionary step in interdisciplinary fields like business analysis that incorporate computer science, modelling, statistics, analytics, and mathematics. At its core, data science involves using automated methods to analyse massive amounts of data and to extract insight from them. Data science is helping to create new branches of science, and influencing areas of social science and the humanities. The trend is expected to accelerate in the coming years as data from mobile sensors, sophisticated instruments, the web, and more, grows.

Data Science in the Data Lake

The nature of the Azure Data Lake Store lends itself to Data Science in that it can hold any data, which the data scientist will want to access, transform and analyse. HDInsight contains many implementations for data science, such as Spark, R Server and others. Hooking HDInsight up to Azure Data Lake Store is pretty simple and follows these steps:

1. In the Azure Portal Marketplace, select HDInsight, which will bring up a series of blades. In this blog, I will be using Spark as my cluster type.
2. In the storage settings, link your HDInsight cluster to Azure Data Lake Store.
3. Confirm your configuration on the next blade, wait around 20 minutes for your cluster to deploy, and you’re good to go!

With Azure Data Lake Analytics, you incorporate data science by extending the capabilities of U-SQL, which you do by installing a series of files:

1. Open up your Azure Data Lake Analytics account and click on Sample Scripts at the top, which will bring forward a blade.
2. From there, click on the U-SQL Advanced Analytics tab, which will copy about 1.5GB of files to the default Azure Data Lake Store associated with your ADLA account. This will take about 3 minutes to complete.
3. When it has finished copying the files, it will call a job to register the extensions, which can be found in the assemblies folder of the master database.

More resources about the extensibility of U-SQL can be found here:

https://blogs.msdn.microsoft.com/azuredatalake/2017/03/10/using-custom-python-libraries-with-u-sql/
https://blogs.msdn.microsoft.com/azuredatalake/2016/11/22/u-sql-advanced-analytics-introducing-python-extensions-for-u-sql/

The Laboratory

Within Azure Data Lake Store, folder and file management is incredibly important for a well-run data lake. See my blogs on Storage and Best Practices and Shaping The Lake for more information on how to set up your Azure Data Lake Store. The Laboratory is an area to be used exclusively by data scientists. It’s an area where they can persist the results of experiments and data sets without impacting the day-to-day operations within the data lake or other data scientists. The Laboratory is organised into two broad areas: Desks and Exhibits. Desks contain personal workspaces, the contents of which can be organised however the person wishes; they can be as well organised, or disorganised, as the person themselves. Exhibits contain data sources produced in the Laboratory which are ready to be consumed by other users or systems.
Both of these areas are laid out below, along with a sketch of how a data scientist might work against them.
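To make this concrete, here is a minimal PySpark sketch for an HDInsight Spark cluster linked to the store, reading raw data out of the lake and persisting an experiment’s output to a personal Desk. The store name, folder paths, column name and aggregate are all hypothetical placeholders.

```python
from pyspark.sql import SparkSession

# On an HDInsight Spark cluster linked to Azure Data Lake Store, the
# lake is addressable via the adl:// scheme.
spark = SparkSession.builder.appName("lab-experiment").getOrCreate()

# Read raw source data straight out of the lake (hypothetical path).
raw = spark.read.csv(
    "adl://mydatalake.azuredatalakestore.net/Raw/Internal/Sales/2017/12/",
    header=True,
    inferSchema=True,
)

# A trivial aggregate stands in for the real experiment...
result = raw.groupBy("Region").count()

# ...whose output is persisted to the scientist's own Desk, leaving the
# day-to-day areas of the lake untouched.
result.write.mode("overwrite").parquet(
    "adl://mydatalake.azuredatalakestore.net/Laboratory/Desks/ust/sales-by-region/"
)
```

As always, if you have any feedback or comments do let me know!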

Azure Data Lake Store–Storage and Best Practices

The Azure Data Lake Store is an integral component for creating a data lake in Azure, as it is where data is physically stored in many implementations of a data lake. Under the hood, the Azure Data Lake Store is the web implementation of the Hadoop Distributed File System (HDFS), meaning that files are split up and distributed across an array of cheap storage. This blog will go into the physical storage of files in the Azure Data Lake Store and then best practices, which utilise the framework.

Azure Data Lake Store File Storage

As mentioned, the Azure Data Lake Store is the web implementation of HDFS. Each file you place into the store is split into 250MB chunks called extents, which enables parallel reads and writes. For availability and reliability, extents are replicated into three copies. As files are split into extents, bigger files have more opportunities for parallelism than smaller files: a file smaller than 250MB is allocated to one extent and one vertex (the unit of work presented to Azure Data Lake Analytics), whereas a larger file is split across many extents and can be accessed by many vertices.

The format of the file has a huge implication for storage and parallelisation. Splittable formats – files which are row oriented, such as CSV – are parallelisable, as data does not span extents. Non-splittable formats, however – files which are not row oriented and where data is often delivered in blocks, such as XML or JSON – cannot be parallelised, as data spans extents and can only be processed by a single vertex.

In addition to the storage of unstructured data, Azure Data Lake Store also stores structured data in the form of row-oriented, distributed clustered index storage, which can also be partitioned. The data itself is held within the “Catalog” folder of the data lake store, but the metadata is contained in the data lake analytics. For many, working with structured data in the data lake is very similar to working with SQL databases.

Azure Data Lake Store Best Practices

The best practices generally involve the framework as outlined in the following blog: http://blogs.adatis.co.uk/ustoldfield/post/Shaping-The-Lake-Data-Lake-Framework. The framework allows you to manage and maintain your data lake. So, when setting up your Azure Data Lake Store, you will want to initially create the following folders in your root: Raw, Enriched and Curated.

Raw is where data is landed directly from source, and the underlying structure is organised ultimately by source. Source is categorised by Source Type, which reflects the ultimate source of the data and the level of trust one should associate with it. Within the Source Type, data is further organised by Source System. Within the Source System, the folders are organised by Entity and, if possible, further partitioned using the standard Azure Data Factory partitioning pattern of Year > Month > Day etc., as this will allow you to achieve partition elimination using file sets.

The folder structure of Enriched and Curated is organised by Destination Data Model, and within each Destination Data Model folder it is structured by Destination Entity. Enriched or Curated data can live in the folder structure, within the database, or both.
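As a starting point, here is a minimal sketch of creating that initial structure with the azure-datalake-store Python SDK. The tenant, store name and source hierarchy are hypothetical placeholders; lib.auth() below opens an interactive login.

```python
from azure.datalake.store import core, lib

# Hypothetical tenant and store names.
token = lib.auth(tenant_id="<tenant-guid>")
adls = core.AzureDLFileSystem(token, store_name="mydatalake")

# Root folders of the framework, with an example source hierarchy under
# Raw partitioned Year > Month > Day for partition elimination, plus the
# Laboratory areas described in the previous post.
folders = [
    "/Raw/Internal/SalesSystem/Customer/2017/12/13",
    "/Enriched/SalesModel/Customer",
    "/Curated/SalesModel/Customer",
    "/Laboratory/Desks",
    "/Laboratory/Exhibits",
]
for folder in folders:
    adls.mkdir(folder)  # creates intermediate folders as required
```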