Matt How's Blog

Master Data in the Cloud


As data warehouses become more advanced and move to the cloud, reference data management is often bottom of the list. Being tied to an IaaS VM solely for Master Data Services feels like a big step in the wrong direction! In this blog, I will show you the secret of ‘app-iness with a cloud alternative that pieces together Azure and Office 365 services to deliver a beautiful, mobile-ready front end coupled with a serverless, scalable and artificially intelligent back end.

Before that, let’s look at why we should move on from MDS.

Excel is easy to use, but not user friendly

Excel is on nearly every desktop in any Windows-based organisation and, with the Master Data Services Add-in, it puts the data well within reach of the users. Whilst it is simple, it is in no way user friendly when compared to the other applications your users may be using. Not to mention that, for most users, this will be the only part of the solution they see! Wouldn’t it be great if there was a way to supply the same data but with an intuitive, mobile-ready front end that people enjoy using?

Developers are tightly constrained

Developers like to develop, not choose options from drop-down menus in a web-based portal. With MDS, not only can devs not make use of Visual Studio and the like, but they are also very tightly constrained by the business rules engine. At this point we should be able to make use of our preferred IDE so that we can benefit from source control, frameworks and customised business logic.

Not scalable according to modern expectations

Finally, MDS cannot scale to handle any kind of “big data”. It’s a bit of a buzzword, but as businesses collect more and more data, we need a data management option that can grow with that data. Because MDS must be deployed to a server, there is no easy way to meet those big data requirements.

OK, so hopefully we are all agreed that we should move on from Master Data Services, but what do we move on to? In this approach, the solution pieces together a number of Azure and Office 365 services that give us an excellent front end and a robust, scalable back end. The first thing to think about is the back end. For this we should use Azure SQL DB because it offers the scalability we need and is also a reliable PaaS option, so we can maintain a cloud-based architecture.

By thinking about the data it will need to contain, we can create a data model comprising tables for user entry as well as list and hierarchy tables that will supply the values for drop-down lists and other data entry controls.


When creating the tables, it is good practice to be strict with datatypes and nullability, as our front end can make use of this metadata to improve the app. Once we know our model, we can map individual fields to the controls available within our front end, which will be PowerApps.
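As a rough illustration, a cut-down version of such a model might look like the following in Azure SQL DB (the table and column names here are purely hypothetical, not the ones from the actual solution):

-- List table that feeds a drop-down control in the app
CREATE TABLE dbo.CustomerSegment
(
    CustomerSegmentId  INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    SegmentName        NVARCHAR(50)      NOT NULL
);

-- User entry table; strict datatypes and nullability drive the validation in the app
CREATE TABLE dbo.MasterCustomer
(
    MasterCustomerId   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CustomerName       NVARCHAR(100)     NOT NULL,   -- will surface as a required field
    CustomerSegmentId  INT               NOT NULL
        REFERENCES dbo.CustomerSegment (CustomerSegmentId),  -- supplies the drop-down values
    CreditLimit        DECIMAL(18,2)     NULL        -- optional field in the app
);

Keeping the NOT NULL constraints tight pays off later, because PowerApps can use them to mark fields as required automatically.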


Now that we have a data model that can map to a front end, we can build the first version of our App. For a guide on how to build an App from a data source you should follow this blog. The important things to consider here are the routes back to your data source and how to validate your data as cleanly as possible. The first consideration, the route back to your data, can be solved in one of two ways depending on the complexity of the scenario. If you simply want to submit data back to a table then you can use the pre-configured SubmitForm() function. This will validate your data and then push it into the table. If you need to execute a stored procedure, or make use of looping, conditional switching and so on, then you can create Flow pipelines to take care of those requirements. On top of that, Flow has some other great features, shown below:

  • Flows can send push notifications back to any device that has PowerApps installed on it. You can easily create a pattern where User A creates a new customer and this triggers Flow to notify User B. User B can then open the notification on their device which, if configured correctly, will open their copy of the PowerApp with that new customer loaded.
  • Flows can be triggered by updates or inserts to a SQL table, meaning that if an external system updates a record you can still get hold of that event and apply your business logic.


The second consideration, data validation, can be tricky depending on the scenario. Based on the nullability of a column in your source data, PowerApps will automatically create a required flag for that field, meaning that if a user tries to submit the form they will get an error and be forced to enter the missing data; crisis averted. If, however, you have pushed the data validation logic into your Flow or a stored procedure, then you cannot push that error information back to your user and prompt some action. You can configure a Response object that will be executed using the run after options within Flow, but this can only provide detail back to the PowerApps designer – not the user.
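To make the stored procedure route a little more concrete, here is a minimal sketch of the kind of procedure a Flow could call for the upsert, using the hypothetical tables from earlier (again, this is illustrative rather than the procedure from the actual solution). It validates the incoming record and returns an outcome the Flow can inspect, even if, as noted above, surfacing that outcome to the end user is limited:

CREATE PROCEDURE dbo.UpsertMasterCustomer
    @CustomerName      NVARCHAR(100),
    @CustomerSegmentId INT,
    @CreditLimit       DECIMAL(18,2) = NULL
AS
BEGIN
    SET NOCOUNT ON;

    -- Validation that nullability alone cannot express
    IF NOT EXISTS (SELECT 1 FROM dbo.CustomerSegment WHERE CustomerSegmentId = @CustomerSegmentId)
    BEGIN
        SELECT 'Failed' AS Outcome, 'Unknown customer segment' AS Detail;
        RETURN;
    END;

    -- Insert a new customer or update an existing one
    MERGE dbo.MasterCustomer AS target
    USING (SELECT @CustomerName AS CustomerName) AS source
        ON target.CustomerName = source.CustomerName
    WHEN MATCHED THEN
        UPDATE SET CustomerSegmentId = @CustomerSegmentId, CreditLimit = @CreditLimit
    WHEN NOT MATCHED THEN
        INSERT (CustomerName, CustomerSegmentId, CreditLimit)
        VALUES (@CustomerName, @CustomerSegmentId, @CreditLimit);

    SELECT 'Succeeded' AS Outcome, 'Customer saved' AS Detail;
END;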

The final piece of the puzzle is Azure Data Lake Analytics, which is what joins our master data to any transactional data, whether that sits on-premises, in an Azure SQL service (DB/DW) or in flat files within Azure Data Lake. The architecture for this is outlined below.


In this architecture, the Power App reads and writes data to and from the SQL DB. Meanwhile, we have some source data coming into our Data Lake. If you are interested in the structure of my data lake then check out Ust’s blogs on the Adatis Data Lake Framework. By using the U-SQL language we can create an external data source for our master data and join that to the transactional flat file data, outputting the result to blob storage for further processing. (more info here)
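The U-SQL itself appeared as a screenshot in the original post; a minimal sketch along the same lines might look like this (the object names and file paths are illustrative, and the MasterDataSource external data source is assumed to have been registered in the U-SQL catalog already):

// Master data held in Azure SQL DB, exposed through an external data source
@masterCustomers =
    SELECT *
    FROM EXTERNAL MasterDataSource LOCATION "dbo.MasterCustomer";

// Transactional data landed in the lake as CSV
@transactions =
    EXTRACT CustomerName string,
            SaleDate     DateTime,
            SaleAmount   decimal
    FROM "/Raw/Sales/{*}.csv"
    USING Extractors.Csv(skipFirstNRows : 1);

// Join the transactions to the master data
@enriched =
    SELECT t.SaleDate,
           t.SaleAmount,
           m.MasterCustomerId,
           m.CustomerSegmentId
    FROM @transactions AS t
         INNER JOIN @masterCustomers AS m
           ON t.CustomerName == m.CustomerName;

// Write the result out for further processing (swap the path for a wasb:// URI to land it in blob storage)
OUTPUT @enriched
TO "/Enriched/SalesWithMasterData.csv"
USING Outputters.Csv(outputHeader : true);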


Hopefully you can see from this brief overview of the approach that we can leave Master Data Services behind and replace it with a much better alternative that covers all the things that MDS is missing. For any comments, feedback or questions please comment on this post or catch me on Twitter @MattTheHow.

How to Find Your Next Job with Power Apps and Flow

Both PowerApps and Flow exist within the Office 365 suite and bring an enormous range of possibilities to mildly technical business users. No longer will Dan in IT, who knows a bit of VBA, be hassled to write a dodgy macro that puts some data in a database. Not only that, business users can now reach out to literally hundreds of other services that come connected straight out of the box! In this blog, I’m going to demonstrate a way we can use PowerApps to put a professional, mobile-ready interface onto a Flow, allowing us to query an API and present the results back using Power BI.

 

Creating a PowerApp

You can create a PowerApp in either the Web Portal or using PowerApps Studio (https://powerapps.microsoft.com/en-us/downloads/). I personally prefer to use Studio but both work the same way, and all connections, Flows and custom APIs are managed through a web portal. If you have ever developed with Windows Forms then PowerApps will feel very comfortable. There isn’t a toolbox as such, but you can easily drag and drop controls from the ribbon bar and all the properties live on the right-hand side. It also holds some similarities with Apple’s Xcode in the sense that you can see all your Screens (Scenes in Xcode) on the left.


1. Ribbon Bar: Here you can drag and drop a wide range of controls, galleries and media APIs onto the App design screen

2. Preview App: This button will run/debug your App. You can also use F5

3. Screen Viewer: Here you can see all the screens that make up your App

4. App Design Surface

5. Properties Window: Configure properties about the controls within your App

 

The Common Data Service

Because we are looking at this from an Office 365 perspective we can make use of the Common Data Service, but we could also choose any other data store, including Oracle, MySQL, SQL Server, SharePoint etc. As it says on the tin, the CDS is a generic, cloud-hosted database that gives users the ability to create their own datastores and then share those throughout the organisation using AD. It also integrates very nicely with PowerApps and Flow, meaning we can avoid any SQL DDL or stored procedures. Out of the box you get a range of standard tables that cover off a variety of business needs, but you can also create custom entities to tailor the CDS to your specific needs. Here’s an example of an entity I created in CDS to use as the main datastore for my App.


1. Ribbon Bar: New fields, Import/Export, Settings and Delete

2. Tab Bar: Fields and Keys. Preview Data within table

3. Custom Fields: Showing data types, Nullability and Cardinality

4. Standard Fields: Audit fields e.g. Created by / Created on

 

Developing a PowerApp

One of the best features of PowerApps is that it is very smart with metadata: we simply point it at a table and PowerApps uses that to make decisions on how to construct your App in a way that suits the CRUD needs of your datastore. By creating the app from the custom CDS entity, PowerApps will know that you need a browse screen, a details screen and a new/edit record screen. Better yet, PowerApps will create a form control with all of the custom fields ready to be populated. Based on each field’s configuration it can auto-create mandatory flags, error handling and hint text.

You may question whether PowerApps has limitations due to not having a code editor; whilst I’m sure some will find this to be true, I am yet to be disappointed. Instead of code, PowerApps uses Excel-like functions and context variables, which will feel very intuitive to any Excel user. Context variables are stored at App level and can be called and updated from anywhere within your App.

When creating the App, you can choose from a range of controls including Power BI tiles, Star Ratings, PDF viewers, Import/Export – the list goes on. Additionally, the gallery options mean you can display data or images in a real variety of ways. Above all that, though, is the integration with the device’s media capabilities, which makes PowerApps a really cool product for non-coders. With PowerApps you can take and save pictures, play and record video/audio and even scan barcodes.

I’ve made a few basic changes to my App that you can see below but even if you hit F5 and previewed your app straight after creating it, you could successfully view, edit and input data to the database. So far I have written no code and Dan in IT is now free to go back to work.


1. Quick Actions: PowerApps has automatically created these quick actions to submit or close the form

2. Mandatory Indicator: Depending on the “Required” Property in the CDS

3. Text Box: In New mode will be blank, In Edit mode will show data. Can also show hint text and error messages if input is invalid.

4. Star Rating Control: I swapped a standard integer input with a star rating to make the App more user friendly.

 

Creating a Flow

By default a newly built app is configured to write data back to the datastore by using a SubmitForm() function. These functions are handy for a lot of things as they take care of resetting the form after submission but also setting the form to Edit or New mode. If we want to do anything more than this – avoiding code – then we need to start looking at Flow. Flow can do an awful lot – just look at the pre-built templates for some ideas, but I’m going to use it to call the Glassdoor API to get job progression information.

To create a Flow, you need to start with a trigger. The same goes for Logic Apps; the difference is that with Flow you can trigger the process from a button press within your PowerApp. From then on you can create actions, loops, branches, conditional logic and constraints in order to connect up any number of systems.


1. Trigger: Trigger point that is called from PowerApps

2. Initialize Variable: Passes a parameter from PowerApps into a variable to be used within the Flow

3. HTTP: Uses HTTP GET method to call the Glassdoor Job Progression API

4. Parse JSON: Parses the JSON response from Glassdoor and provides results in the form of variables

5. Email on Failure: By using the Run After feature I have configured an email notification if the Glassdoor API call fails

6. For Each Loop: Iterates over the JSON results and writes each set of variables to the database (a sketch of the receiving table is shown below). At the moment I am using SQL so I can feed Power BI; the PowerApps team are working on deploying the CDS connector for Power BI to the UK in the coming months
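As a rough sketch, the SQL table receiving those parsed results could look something like this (the names are hypothetical, as the post doesn’t show the real schema):

CREATE TABLE dbo.FutureJob
(
    FutureJobId       INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CurrentJobTitle   NVARCHAR(100)     NOT NULL,   -- the occupation passed in from PowerApps
    SuggestedJobTitle NVARCHAR(100)     NOT NULL,   -- a suggested next role returned by Glassdoor
    LoadedOn          DATETIME2(0)      NOT NULL DEFAULT SYSUTCDATETIME()
);

Each iteration of the For Each loop then inserts one row, and the Power BI report behind the tile simply reads from this table.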

The formula that is used to call the Flow from PowerApps looks like this:

GetFutureJobs.Run(Occupation); Navigate(Results, ScreenTransition.None, {CurrentJob: Occupation})

There are two functions here. The first (GetFutureJobs.Run(Occupation)) is the function that executes the Flow. Anything within the brackets will be passed into the Flow and can be used at any point within your process. In this case I pass in the user’s current job and use that to search Glassdoor for potential next jobs.

Next is the Navigate function. This is a common occurrence in PowerApps and is used to take the user to the results screen. The first parameter is the target screen, Results. The second tells PowerApps how to transition between screens, and the final part (the bit between the {}) is a set of context variables that are passed into the next screen.

 

Implementing a Power BI tile

The final step for my App is to analyse the results from Glassdoor using a Power BI tile. By creating a simple report and dashboard my PowerApp now has a fully functioning Power BI tile that will refresh on the same schedule as the main Power BI report within the service.


 

Hopefully from this blog you can see how powerful these two services can be when paired together but also how accessible these tools are now. The fact that I can have a working mobile app within minutes is somewhat revolutionary. I can certainly see a load of opportunities for these to be used and I encourage anyone reading this to have a play and unleash the POWER!

Optimising Lookups with Caching

The Lookup transform must be one of the most used transformations in SQL Server Integration Services (SSIS), if not the most used. Pretty much every ETL routine will make use of the component at some point, especially when loading fact tables. Given the nature of fact and dimension tables, these lookups can be tasked with loading large amounts of data into memory to perform high-speed lookups. There are occasions when a few optimisations on these lookups can greatly improve the efficiency of your ETL.

 

Choosing the Best Cache Option

One of the most important options featured on the Lookup Transform is the Cache Mode. By understanding the differences between these choices, you can be sure to pick the option that best suits your ETL needs.


 

Using the Full Cache Option (Default)

The Full Cache option tells SSIS to load all of your reference data into memory and then perform its lookups on that data. This has the benefit of only hitting the database once to fetch the data, but can cause a delay on package start-up as all the lookup data has to be loaded before the data flow executes. You can optimise this by using a SQL Command to fetch only the columns you need but, as you can imagine, a dimension with more than 1 million rows will take some time to cache, and this is before you have even done any lookups!
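For example, rather than pointing the lookup at the whole dimension, a trimmed-down SQL Command along these lines (the table and column names are illustrative) keeps the cache as small as possible:

SELECT CustomerKey, CustomerAlternateKey   -- only the join column and the key you need back
FROM dbo.DimCustomer;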

Be aware that if you cannot fit your entire reference set into memory your package will FAIL!! SSIS cannot spool data out to disk if Full Cache is chosen and there is no graceful way of handling this error either. Luckily there is a great blog on how to calculate the cache size here.
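As a very rough illustration of the sums involved, caching 1,000,000 rows at around 100 bytes per row (say an INT surrogate key plus a 50-character Unicode business key) works out at roughly 1,000,000 × 100 bytes ≈ 100 MB before any overhead, so large dimensions add up quickly.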

Use this option when:

  • You want to avoid excessive traffic to your database or it is remote
  • You have a small reference table that can be cached quickly
  • You can fit your whole lookup dataset into memory

 

Using the Partial Cache Option

The Partial Cache is a good halfway house. This option tells SSIS to fetch the data at runtime, meaning a quicker start-up time but also a lot more queries to the database. Any value not currently in the cache will be queried for in the database and then stored in the cache, making it available for future lookups. The efficiency here comes from not having to charge the cache on package start-up and also from having any previously fetched items stored in memory. One very specific benefit of the partial cache is that if your lookup table is being written to during the data flow, the full cache wouldn’t see those rows as it only queries the database once prior to execution. The partial cache, however, works perfectly here as it will see all the rows in the table at runtime.

Additional performance can be gained by making use of the Miss Cache. This option (found on the advanced page) allows a portion of the cache to be reserved for rows that don’t have a match in the database. SSIS will then know not to look for these rows again, thereby saving any unnecessary queries.

Use this option when:

  • You have a small reference set not worth charging a full cache for
  • You want to optimise your cache for more frequently accessed rows.
  • Your reference table is being updated within your dataflow.

 

Using the No Cache Option

Finally, No Cache literally means that no data will be cached. This means that every single row will be looked up against the database, even if that row has already been looked up before.

Use this option when:

  • You only need to process a small number of rows
  • You have severe memory limitations
  • For some reason, the partial cache wasn’t a better option

 

The Cascading Lookup Pattern

A nice way of optimising your lookups is to combine a full and a partial cache, creating what is known as a Cascading Lookup. The idea here is to use the full cache to store a subset of your reference data that is frequently looked up, and a partial cache for the rest. This works because the full cache shouldn’t take too long to charge, as it’s only a subset of the data, but will still have the benefits of being in memory. The partial cache is used for the larger, less frequently looked-up data as it has no charging time but can still make use of the cache to avoid querying the database twice for the same lookup value. An added benefit is that if the reference table is written to during the data flow, these rows will be caught by the partial cache lookup.

In this example, we have a large customer dimension, but we know that only 500 of those customers make a lot of purchases and so we can use a cascading lookup to optimise this dataflow.


  1. Firstly, choose the Full Cache option. Then use a select statement to get only the data that is most looked up against – in this case, the 500 most loyal customers (see the example queries after this list). Remember to redirect non-matching rows to the non-matching output!
  2. Now choose the Partial Cache option and then use a select statement to get the rest of the dataset – excluding the loyal customers.
  3. Finally, union the rows back together so that you have a single result set.
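A minimal sketch of the two source queries behind steps 1 and 2 might look like this (dbo.DimCustomer and the LoyaltyRank column are illustrative, not taken from a particular solution):

-- Step 1: Full Cache lookup – only the 500 most loyal customers
SELECT CustomerKey, CustomerAlternateKey
FROM dbo.DimCustomer
WHERE LoyaltyRank <= 500;

-- Step 2: Partial Cache lookup – everyone else
SELECT CustomerKey, CustomerAlternateKey
FROM dbo.DimCustomer
WHERE LoyaltyRank > 500;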

 

The Cache Connection Manager

Another option that is available is to use the Cache Connection Manager. This is akin to having a persisted full cache that can be referenced several times throughout your package. Better yet, you can convert a package cache connection manager to a project cache connection manager so that it can be referenced by other packages in the project! This offers multiple benefits including:

  • Charging your lookup caches in parallel to speed up package start-up
  • Caches can be loaded at any given time (although must be before the lookups are needed) so if you need to load the dimension during your ETL, you can.
  • Caches will only need to be charged once for your whole ETL project, meaning you can avoid similar lookups querying the same data twice.

The way to implement a Cache Connection Manager is seriously easy!

  • To start you need to create a data flow task and then drag a data source onto the design surface.
  • Below that, add a Cache Transform and connect the two up. Once you have configured your data source (remembering to select only the columns you absolutely need!), open the editor for the cache transform. Where prompted for a cache connection manager, select New.
  • On the General page, you can add a name and description.
  • On the Columns page, you will need to do some configuration. For the column or columns you want to look up on, you should add an Index Position. The lowest index number that is > 0 will be looked up first, followed by any other columns with an Index Position > 0. Any columns with a 0 index are columns that you want to pull into the data flow – in this case, that is the CustomerKey. You can also set data types here.


  • Once that’s complete you can map the columns in the cache transform.
  • Finally, you will now see a new connection manager in your Connection Managers window. Right click on the new cache connection manager and click Convert to Project.

An implementation of a cache connection manager might have a parent package like the below. Within each “Populate XXXXXX Cache” is a dataflow that extracts source data and puts it into a cache transform. The fact table load packages below that can now reference all the dimensional data using the cache connection manager.


In conclusion, there is a lot of optimisation that can be done with regards to lookups and their caching options. Not all of these need to be implemented all of the time – it is more of a case-by-case decision, but knowing that the options exist is half the battle.

Power BI Mobile Feature Review

In March Microsoft released Deep Touch integration for iOS amongst several other improvements to the Power BI Mobile application. This blog will look at a few of those features and examine some of the areas that still need work.


So far it seems Microsoft are doing a pretty good job of developing Power BI for the mobile platform and this is most apparent when they exploit some of the built-in functionality that makes mobiles so handy! One of the best features of the latest iOS is 3D Touch integration and Power BI has fully grasped this bull by the horns. Using a deep touch, you can launch a pop-up menu offering some of the most useful features, such as search and notifications, as well as quick access to your recently accessed dashboards.

 

Another big issue that the Power BI team have tackled head on is how to make rich visualisations mobile-optimised. For this they have two solutions, the first being the desktop and mobile view options within Power BI Desktop.

Desktop view of Report

Mobile view of Report


 
The mobile view essentially deconstructs your report and lets you drag and drop your visualisations into the mobile template. By default, this view will always be displayed when viewing the report on a mobile device unless you rotate the device into landscape mode, in which case the desktop version loads. I have mixed feelings about this feature. On the one hand, I like that I can see both views, but if the report were to remain in mobile view and expand to fill the horizontal space as well, this could open up a lot more opportunities for mobile reporting.

 
However, despite gaining some control over how the report looks on a mobile device, there are some pretty major limitations for the time being. Firstly, you cannot specify different properties, such as text size, for the desktop and mobile views. This means that you need to consider both views when creating a report that will be used on both mobile and desktop, otherwise your visuals will be sacrificed, as seen above in the deliveries and fragile items cards.


Another drawback is that each visual element has to be snapped to the prescribed grid and this includes the transparent grab handle/border that is used to select the visual. This border is half a grid square tall in the mobile view which means that you get a lot of white space, something you want to reduce in a mobile view.

 
Finally, visuals cannot be overlaid. Where I have circles around some of my cards in the desktop view, this is not possible in the mobile view. Whilst you can add an image or logo, you cannot make use of any backgrounds, whether they be an image or just a flat colour.
Thankfully, all custom visuals will work in mobile view and any cross-filtering, auto-play or auto-zoom features (maps) are preserved perfectly.
 
Microsoft’s second solution is focussed around dashboards. From within the Power BI service you can arrange your pinned visuals into a mobile dashboard by switching the view as shown below.

 

However, the best part is that if you access a dashboard that doesn’t already have a mobile view, the app will automatically optimise the visuals into a mobile view so you don’t have to!


One of Power BI’s most notable features is Q&A – a method of querying your data using natural language. Using a recognisable messenger format, this feature is really well replicated in the mobile app and adds another layer of accessibility for non-techie, non-deskbound users.

 

A relatively new feature in Power BI is the ability to add custom links to a dashboard visual. This can be utilised quite nicely in the mobile app as it will make use of the deep linking technology in iOS, so I can launch a relevant app instead of just a web page. Here I have set a Google Maps URL as the custom URL for the map visual. Clicking on this in the mobile app launches my Google Maps app, not a webpage!

 

Overall, I can see that the Power BI mobile team have not tried to just recreate the desktop version but have embraced the mobile platform and made use of existing features within iOS to present a feature-rich app that has a familiar feel to it. Whilst there are some limitations, my next blog will look at how to create a mobile-optimised report right from the start so that your users can benefit from mobile BI straight away!

Extracting Data from Salesforce using SSIS

This blog looks at a couple of methods of extracting data from Salesforce and gives a demo of how to output tables and queries into flat files.

 

Salesforce is HUGE! How do I find what I’m looking for?

Thankfully, SFDC make it easy for us to find the tables that hold custom objects, as they are suffixed with “__c” (e.g. “Assessment_Questions__c”). This convention carries through to column level as well, meaning that when a Salesforce standard table is used, such as SFDC’s Event table, any custom columns are also suffixed with “__c”.

 

Can I use SQL?

Salesforce utilises a modified version of SQL called SOQL (Salesforce Object Query Language). There are some keyword similarities, such as starting a query with “SELECT” and calling a table with “FROM”, but if you want to limit a query result to a fixed number of rows you need to add “LIMIT n” to the end of the query string, not the start!
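For example, a query against the custom object mentioned earlier might look like the following, with the LIMIT clause sitting at the end (the Assessment_Score__c column is just an illustrative custom field):

SELECT Id, Name, Assessment_Score__c
FROM Assessment_Questions__c
ORDER BY CreatedDate DESC
LIMIT 10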

Additionally, SFDC provide some handy functions to make SOQL slightly easier on us developers. I have listed a few that I used below but a full documentation site is available here - https://help.salesforce.com/articleView?id=customize_functions.htm&type=0

  • LAST_YEAR – Provides a date range from Jan 1st of the previous year to Dec 31st of the previous year. Example: SELECT AccountId FROM Customers WHERE CreatedDate > LAST_YEAR
  • LAST_N_MONTHS:n – Provides a date range from the last day of the previous month going back over the past n months. Example: SELECT Id FROM Account WHERE CreatedDate > LAST_N_MONTHS:12
  • BLANKVALUE – Similar to ISNULL; provides a substitute value if the field is null. Example: SELECT BLANKVALUE(AddressLine2, "N/A") FROM…

 

How do I connect to Salesforce?

In order to access the Salesforce system, you will need a username, password and security token. From my research there are two viable methods for connecting to Salesforce using SSIS: an API component or a linked server. Both have their pros and cons, but generally I have found the API route to be more efficient. There are a number of 3rd party providers around and I have listed a few of these below.

  • CData (API) – http://www.cdata.com/drivers/salesforce/ssis/
  • Task Factory (API) – Forces you to download all 50 components. Can use SSIS variables and SOQL. http://pragmaticworks.com/Products/Task-Factory/Free-Trial
  • Kingsway Soft (API) – Simple interface and can use SSIS variables. Auto-generates SOQL. Can’t edit error output from the main component. http://www.kingswaysoft.com/products/ssis-integration-toolkit-for-salesforce/download
  • DBAmp (Linked Server) – Allows you to use an OLE DB connection manager. http://forceamp.com/trial.htm

 

You can follow the steps below to access SFDC and start retrieving data.

1. Download an SFDC SSIS component. For this demo I have used the Kingsway Soft component, available from the link above.

 

2. Run through the install wizard and, once complete, open Visual Studio.

3. Create a new SSIS project and add a Data Flow Task. Rename this to Kingsway Soft Demo.

 


 

4. Go into the data flow, right-click in the toolbox and hit “Refresh Toolbox”. You should now see two new components called “Salesforce Destination” and “Salesforce Source”.

 


 

5. Drag a Salesforce Source onto the design surface and double-click to edit the properties. Drop down the “Connection Manager” list and select “New”. Fill out the details required using the connection info below.

 

Username: mih@adatis.co.uk

Password: S4lesforce

Security Token: oKar72dhKcBAWlq0i4M0RF7ua 

 


 

6. Test the connection and click “OK” once it has connected successfully. From the component properties screen, use the Source Type drop-down to select “Object Query”. Enter your SOQL statement into the text editor below it, then click “OK”.

 


TIP!! – If you select “Object” from the Source Type drop-down and choose your table, you can then switch the source type back to “Object Query” and the full SOQL statement to retrieve all columns will be generated automatically.

 

7. Add a flat file destination and configure the connection manager. Enable the data viewer between your SFDC component and the flat file destination and preview the data that is being extracted from SFDC.

 


 

8. From here you can then repeat the process to build up a package that will bring down all the files that you might need for your solution.

There are additional tools such as Salesforce Workbench that will allow developers greater access to the underlying solution but these will be covered in a different blog. If this solution does or doesn’t work for you then let me know!