Ust

Ust Oldfield's Blog

Whitelisting Azure IP addresses for SQL Server

In a recent blog post, I wrote about whitelisting Azure Data Centre IP addresses for Key Vault. Key Vault’s firewall uses CIDR notation for IP ranges, which is exactly what is contained within the list of IP addresses supplied by Microsoft. However, there are some resources, like Azure SQL Server, which only accept IP ranges. Therefore, we need a way of converting CIDR to an IP range.

Handily, there’s a PowerShell script which exists to provide that conversion – called ipcalc.ps1. When you download it, make sure it’s in the same working folder as the script you’re going to use to create the new firewall rules.

From there, we can make slight amends to the script we had in the previous post and produce the following script:


If you need to assign the IP ranges to other resources you can substitute the New-AzSqlServerFirewallRule with the appropriate cmdlet and parameters


Whitelisting Azure IP addresses for Key Vault

A colleague came to me with an interesting request:

We want to put Key Vault behind a firewall, but when we do that it means that Azure Data Factory can no longer access the secrets. Is there a way to whitelist the IP addresses for a given Azure Data Centre?

The short answer is: Yes.

By default, the following option is enabled on Azure Key Vault under the Firewalls and virtual networks blade.

image

For most users, having unrestricted access from external networks to a resource that holds secrets, certificates and other sensitive information is a big red flag.

If we choose to only allow access from Selected Networks we get the following options opening up for us:

image

Note that trusted Microsoft services is not an extensive list and does not include Azure Data Factory.

image

Therefore we need to whitelist a series of IP Addresses in the firewall rules. The list of IP Addresses are published by Microsoft and are updated on a weekly basis. The IP addresses are published in an XML document, which isn’t always the best format when one needs to update firewalls in Azure.

Shredding XML

To update the Firewall in Azure, we’re going to use PowerShell to shred the XML and extract the IP ranges for a given region. Then, we’re going to use the updated Azure PowerShell module to register the IP ranges against the Key Vault.

Using the last command, we can check that the IP ranges have been registered successfully. You should see something like:

image

There we have it, explicit IP whitelisting of Azure Data Centres so we can lock down Azure resources, only opening up access when we need to.

Update

Key Vault is currently limited to 127 firewall rules. If you are adding a region with more than 127 IP ranges, you might have an issue…

Testing: What’s the point?

I’m almost certain that every developer has asked themselves this question at least once throughout their careers. You’ve developed your solution, it works fine on your machine and now the deployment into production is being held up because someone mentions the need to do testing. What’s the point of testing? Ultimately, to provide assurance about the quality of a product.

With testing, there are two approaches:

  • Manual
  • Automated

Manual testing is what most developers complain about: it’s expensive to setup; laborious to execute; time consuming to repeat; and prone to human error. Manual testing typically takes the form of User Acceptance Tests – and sometimes can be the only tests that are conducted on a product. How confident are we that the product is of high quality if we only do manual testing? Not very.

Automated testing is what every developer should be doing: they’re executed by a machine; they’re repeatable; they’re more robust and reliable than manual testing. However, like manual testing, the quality of the test is dependent on how well the test scripts have been written and the test scripts can vary hugely in complexity. The tests could vary from very simple build verification tests through to complex regression tests.

Types of Testing

At its most simplest, testing can be build verification and at its most complex, testing can be user acceptance testing. But to get a true feel of how complex they are and how often you should use them, we should refer to a testing tree.

test tree

As we can see, the wider the segment the more frequently we should employ it and, as we work our way up the pyramid, the more complex the type of testing becomes. For the remainder of this blog post, I’m going to briefly expand on the following types of tests:

  • Build Verification
  • Unit
  • Integration
  • Regression

Build Verification Tests

A build verification test is using a tool like MS Build to answer the question: does my code compile? If it does compile, the test has passed. If it doesn’t compile, then the test has failed. This can be used in the local development environment, through Visual Studio or it can be conducted as a task in a Build Pipeline within Azure DevOps. These types of tests are extremely cheap to automate and maintain; and very quick to run.

Unit Tests

Unit tests are low level tests, meaning that they are close to the source of the product. They should be written with the aim of testing individual methods and functions for a given code base, using a unit test framework to support the authoring and execution of a test. As a developer, you would typically author the unit tests in a development tool like Visual Studio; you’d run them locally to ensure that the tests pass; and then they would be executed on a regular basis as a task in a Build Pipeline within Azure DevOps. Unit Tests are cheap to automate and should be quick to run.

Integration Tests

We know that individual units of code work, due to unit tests, but how can we be sure that those units work together? Integration tests are intended to verify that the units of code and the services used in a product work together. As a result, they are more expensive to automate and maintain than unit tests; and can take considerably longer to run. Whilst unit tests can be run without dependencies of other parts of the product being available, integration tests often require multiple parts of the product – including infrastructure – to be up and running so that the integrations between units and services can be tested. Because integration tests might require infrastructure to be available, and certainly multiple parts of the product available, integration tests are best run as part of a Release Pipeline in Azure DevOps.

Regression Tests

We’ve verified that individual elements of the product work; and we’ve verified that the individual elements of the product work together; what happens if we change elements of the product? This is where regression testing comes in – to verify that newly developed code into a deployed product does not regress expected results. We’ll still need to go through the process of unit testing and integration testing; but do we want to go through the rigmarole of manual testing to check if a change has changed more than what it was meant to? That’s something that we would like to avoid, so we have regression testing to alleviate that need. Like integration tests, they do need multiple parts of the product available so would need to be executed as part of a Release Pipeline in Azure DevOps. Regression testing is expensive to automate and maintain; and slow to run – but that doesn’t mean that they should be avoided. The add a layer of confidence to a newly changed code base which is about to be deployed. However, because we are testing targeted elements, perhaps the entire solution at once, we don’t want to run all regressions tests all the time because they would take a very long time to complete.

Summary

We know why we’re do testing; we are aware of some high-level approaches; and we’ve gone through some types of automated tests in brief detail. This post is first in a series on testing, future posts will include:

As always, do let me know if you have any feedback or questions in the comments section.

Integration Testing Overview

In a previous post, I touched on the point of testing and briefly talked about integration testing. In this post, I will be going into more detail about what integration testing is and why it’s important to do it.

In the previous post, I said that Integration Tests are:

intended to verify that the units of code and the services used in a product work together. As a result, they are more expensive to automate and maintain than unit tests; and can take considerably longer to run. Whilst unit tests can be run without dependencies of other parts of the product being available, integration tests often require multiple parts of the product – including infrastructure – to be up and running so that the integrations between units and services can be tested. Because integration tests might require infrastructure to be available, and certainly multiple parts of the product available, integration tests are best run as part of a Release Pipeline in Azure DevOps.

To expand on this, integration tests are written for each integration point for a solution. But what do we mean by “integration point”? An integration point is typically where two or more units of code interact with each other, or two or more services interact with each other – verifying that the individual parts or components of a solution works as intended together with other parts. How do we define an integration point?

Integration Points

We define an integration point by whiteboarding each component of our solution with the aim to document how they interact with each other. We can highlight the integration point by drawing a circle around it.

Consider the following architecture:

image

It’s a fairly typical modern data warehouse solution. We’re ingesting data from a variety of sources and storing it in a data lake. We’re then transforming and processing that data into our warehouse schema before presenting it in a data warehouse; processing it in an analysis services model so that it can be reported on. That’s the architecture, but the components used might be very different and interact differently with the architecture.

For the ingestion, our integration points are going to be between the following components:

sourceToRawIntegration

For the transformation piece, our integration points are going to look like:

rawToCuratedIntegration

Finally, for processing our data into the semantic model, the integration points look like:

curatedToSemanticIntegration

As you can see, the integration points do not align perfectly with the architecture – bear in mind that every solution is different, so your integration points will definitely look different even if the broad architecture is the same.

Integration Testing

We’ve documented our integration points and now we need to write some integration tests. Integration tests are executed for the various integration points that exist in a solution. Like most forms of testing, integration tests follow a pattern of:

  • Initialise system under test
  • Call functionality under test
  • Assert expected outcome against result of method

Generally, integration tests will be executed after deployment as they often require the infrastructure to exist. Most of the time, integration tests should not be dependent on data. However, if data does need to exist, this must be created at the time of setting up the tests. Most of the time, you can automate integration tests using a unit test framework such as Pester or NUnit.

Some Best Practices

To get you going, I’m going to set out some best practices that you should aim to follow:

  • Only create integration tests you need
  • Don’t depend on data being available. If you have tests that depend on data – create that data before execution, as part of the test setup
  • Multiple asserts per test. You might have dependencies on external resources that you’d like to keep open or you want a fast running set of tests. Multiple asserts help with all of these.
  • Choose unit tests over integration tests when feasible. Don’t duplicate effort.

Further reading

My colleague Ben has written an excellent blog on SQL Integration Testing using NUnit.

I’ll add another post soon about how to do Integration Testing using Pester.


Unit Testing Overview

In a previous post, I touched on the point of testing and briefly talked about unit testing. In this post, I will be going into more detail about what unit testing is and why it’s important to do it.

In the previous post, I said that Unit Tests are:

low level tests, meaning that they are close to the source of the product. They should be written with the aim of testing individual methods and functions for a given code base, using a unit test framework to support the authoring and execution of a test. As a developer, you would typically author the unit tests in a development tool like Visual Studio; you’d run them locally to ensure that the tests pass; and then they would be executed on a regular basis as a task in a Build Pipeline within Azure DevOps. Unit Tests are cheap to automate and should be quick to run.

To expand on this, unit tests are written by a developer to apply to a unit of code. But what do we mean by “unit of code”? A unit of code is the smallest testable part of a solution – verifying that the individual part or component of a solution works as intended, independently from other parts. A unit could be a C# method; a PowerShell function; a T-SQL Stored Proc, and many others. Like most forms of testing, unit tests follow a pattern of:

  • Initialise system under test
  • Call method under test
  • Assert expected outcome against result of method

A best practice would be to write the unit test before the writing any code, but if you’ve not got to that level of maturity with your test approach - writing tests after code is still good practice.

How do you write a good unit test?

Keep it simple

  • A unit test shouldn’t replicate the code it is intended to test.
  • You’ll be writing lots of them, so make them quick and easy to write.

Readable

  • By keeping it simple, the test should also be readable. Making it easy to know what method is being tested and the expected behaviour of the method.
  • By making it readable, you can easily address any failures that may surface.

Reliable and Repeatable

  • Unit tests should only fail if there are bugs in the system, not because there are bugs in the tests. Keeping it simple and readable will avoid that issue.
  • Unit tests need to be run many times, sometimes multiple times throughout the course of a day, so they need to be executed quickly in a repeatable manner. Keeping it simple helps achieve this aim.

How do you write a unit test?

We’ve got an understanding of what a unit test is, but how do we write one? For this example, we’ll be writing our code and tests using C#.

Our application is a very simple calculator, which adds two numbers together.

Calculator

Simply, to add a new Unit Test, we can right-click on the method and select Create Unit Tests. Because we’ve not built any unit tests before, we can use it to create a new unit test project using a framework of choice. If we already had a unit test project, we could add the new test to the existing project.

createUnitTest

Using this method, it creates a skeleton of a unit test from which we can amend for our needs.

unitTestNew

As you can see, this doesn’t contain what we need, so we amend the test so that it reflects our requirements, as in the below.

unitTestAmended

To run a Unit Test, you can either right-click on the test method and click on Run Test(s) or open up the Test Explorer window, navigate to the desired test and click on Run Selected Tests.

Unit Tests in Azure DevOps

We’ve written our unit tests and have run them locally, but how do we make it repeatable? We utilise the power of Azure DevOps to have repeatable tests run against a changing code base as part of the Build or Continuous Integration process.

image

The process is:

  1. Install NuGet on the Build Agent
  2. Restore any packages from NuGet that your application requires
  3. Build solution
  4. Run tests
  5. Publish tests
  6. Copy successfully built and tested artifacts to a staging directory
  7. Publish those artifacts

Using Azure DevOps, or another CI tool, we can rely upon our tests in a repeatable manner.

Additional Reading

There’s a good post by Sergey Kolodiy on the importance of writing good code and how unit testing encourages good behaviour.

My colleague Jon has also written a post on the subject: Setup Unit Testing with NUnit and NBi.


Azure Active Directory Authentication and Azure Data Catalog

In a previous post I introduced Azure Data Catalog. Because it’s great for data discovery and for data asset management, it makes sense to automate, as much as possible, the process of registering new data assets, and allowing users to discover data in a more natural, perhaps conversational, way. In order to automate the registration of data assets or to allow discovery through other tools, it’s necessary to look at how Azure Data Catalog authenticates users using Azure Active Directory (AAD). This post is going to explore some of the options the Azure Data Catalog uses for authentication and a walkthrough of a code example to make authentication work without user input.

Azure Active Directory Authentication

If you have interacted with Azure Data Catalog before, you will find that there are two ways of doing so. First, there’s the web application that allows you to conduct data discovery and data asset management. Then there’s the native application that sits on your local machine that can be used for registering data assets. These use different methods of authenticating using Azure Active Directory. The first one uses Web Browser to Web Application authentication. The second uses Native Application to Web API authentication.

Web Browser to Web Application

What is involved with Web Browser to Web Application authentication? Simply put, the web application directs the user’s browser to get them to sign-in AAD. AAD then returns a token which authenticates the user to use the web application. In practice, it’s a bit more complex, so here’s a diagram to help explain it.

image

In a bit more detail, the process it follows is:

1) A user visits the application and needs to sign in, they are redirected via a sign-in request to the authentication endpoint in AAD.

2) The user signs in on the sign-in page.

3) If authentication is successful, AAD creates an authentication token and returns a sign-in response to the application’s Reply URL that was configured in the Azure Portal. The returned token includes claims about the user and AAD that are required by the application to validate the token.

4) The application validates the token by using a public signing key and issuer information available at the federation metadata document for Azure AD. After the application validates the token, Azure AD starts a new session with the user. This session allows the user to access the application until it expires.

This method of authentication is used by Azure Data Catalog when discovering data through the browser.

Native Application to Web API

What’s the process of Native Application to Web API authentication? Simply put, the application will ask you to sign-in to AAD, so that it can acquire a token in order to access resources from the Web API. In practice, it’s a bit more complex, so here’s a diagram to help explain it.

image

In a bit more detail, the process it follows is:

1) The native application makes a request to the authorisation endpoint in AAD, but using a browser pop-up. This request includes the Application ID and redirect URI of the native application (see the following article for native applications and registering them in Azure) and the Application ID URI of the Web API. The user is then requested to sign-in.

2) AAD authenticates the user. AAD then issues an authorisation code response back to the application’s redirect URI.

3) The Application then stops the browser activity and extracts the authorisation code from the response. Using the authorisation code, the Application then requests an access token from AAD. It also uses details about the native application and the desired resource (Web API).

4) The authorisation code and details are checked by AAD, which then returns an access token and a refresh token.

5) The Application then uses the access token to add to the authorisation header in its request to the Web API. Which returns the requested resource, based on successful authentication.

6) When the access token expires, the refresh token is used to acquire a new access token without requiring the user to sign-in again.

This method of authentication is used by Azure Data Catalog when registering data assets via the desktop application.

Automated Interaction with Azure Data Catalog

In both of the examples above, they require the user to interact in order to provide sign-in credentials. This is not ideal if we want to automate the registration of data assets or conduct data discovery outside of the browser. Therefore we’ll need to use a different method of authentication. This is the Server Application to Web API authentication method. Simply, it assumes that the server has already required a user to login and therefore has the user’s credentials. It then uses those credentials to request the access and refresh tokens from AAD.

image

In a bit more detail, the process it follows is:

1) The Server Application makes a request to AAD’s Token Endpoint, bypassing the Authentication Endpoint, providing the credential, Application ID and Application URI.

2) AAD authenticates the application and returns an access token that can be used to call the Web API.

3) The Application uses the access token to add to the authorisation header in its request to the Web API. Which returns the requested resource, based on successful authentication.

This method is what we’re going to use to automate our interaction with Azure Data Catalog.

From an authentication aspect, the code for Server Application to Web API is simple and this example will take us to the point of returning that token, from which we can then use to request resources from the Azure Data Catalog API. The full code can be found my GitHub repo.

We are going to use the Client Id and Secret from an application we’ve registered in AAD (full process can be found in this Microsoft article on Integrating Applications with AAD).

private static string clientId = "ApplicationId";

private static string secret = "ApplicationKey";

Then, we’re going to make sure we’re connecting to the correct AAD instance

private static string authorityUri = string.Format("https://login.windows.net/{0}", tenantId);

So we can create an authorisation context

AuthenticationContext authContext = new AuthenticationContext(authorityUri);

In order to acquire a token

authResult = await authContext.AcquireTokenAsync(resourceUri, new ClientCredential(clientId, secret));

Which can then be used in an authorisation header in requests to the Azure Data Catalog API. In the next related post, we’ll explore how to make a call to the API using this authentication method.

Introduction to Kubernetes

Kubernetes is an orchestrator for containerised applications. This post will aim to give a high-level overview of what Kubernetes is.

According to the team at Kubernetes, Kubernetes provides a container-centric management environment. It orchestrates computing, networking, and storage infrastructure on behalf of user workloads. This provides much of the simplicity of Platform as a Service (PaaS) with the flexibility of Infrastructure as a Service (IaaS), and enables portability across infrastructure providers.

Where PaaS operates at a hardware, Kubernetes sits at the container level which means that you don’t get a full PaaS offering – but you do get some features such as ease of deployment, scalability, load balancing, logging and monitoring. Unlike IaaS, it’s not a monolithic solution – each solution is optional and pluggable, providing a platform to build upon, like Lego bricks, preserving choice and flexibility where required.

It is also not just an orchestrator. Most orchestrators use workflow: Do this, then that etc., whereas Kubernetes is a set of independent control processes to drive the current state to the desired state. Traditional orchestration can be viewed as the means justify the end, whereas Kubernetes can be viewed as the end justifies the means.

You can think of Kubernetes as one of a few things. Either a container platform; a microservices platform; or a portable cloud platform. There are probably more applications for Kubernetes, but those are the three broad and dominant uses of it.

Why Containers?

Without containers, the way to deploy an application was to install the application on the host system using the OS package manager. It entangles the application with the host OS. Rollback is difficult, but possible. However rollback would often be restoring a VM image – which is heavy-duty and non-portable.

Containers virtualise the operating system rather than virtualise the hardware, like a VM does. They’re isolated from each other and the host. They have their own file systems and their resource usage can be bound. Because they are decoupled from the infrastructure and the host OS, they are portable across different operating systems and between on-prem and cloud distributions.

image


Working with Kubernetes

To interact with Kubernetes, you interact with the Kubernetes API objects. These objects describe the cluster’s desired state. Effectively, what applications or work loads do you want to run; the container image they should use; the number of replicas; the resources to make available – to name but a few. The desired state is set by creating objects using the API, typically using a command line interface called kubectl. Once this desired state has been set the Control Plane works to make the current state match the desired state. The process of doing this, Kubernetes manages automatically, but it does so through a collection of processes that run on a cluster. These are:

  • The Kubernetes Master, which is a collection of three processes (kube-apiserver, kube-controller-manager, kube-scheduler) that run on a single node in the cluster. When you interact with a Kubernetes cluster through kubectl, you’re interacting with the master.
  • A worker node will run two processes – kubelet, which communicates with the master node; and kube-proxy, which is a network proxy for the node. A worker node is a machine that runs the workload. The master controls each node.

Kubernetes Objects

There are several Kubernetes objects. As a basic set, these objects are:

  • Pod – like DNA, a Pod is the basic building block of Kubernetes. A Pod represents a process running on a cluster. It encapsulates a container and the resources it needs and the behaviour for how it should run. A Pod represents a unit of deployment: a single instance of Kubernetes, which may contain one or many tightly coupled containers. Docker is the most container runtime used in a Pod.
  • Service – a Service is a logical abstraction for a set of Pods and a policy by which to access them.
  • Volume – a Volume is similar to a shared disk but are vital to resolving issues that arise with containers. On-disk, containers are temporary. They are mortal. If a container crashes, it will be restarted but files that it had within are lost. Similarly, if you run many containers in a Pod it can be necessary to share files between the containers. Volume solves these problems.

The Control Plane

The Control Plane maintains a record of all Kubernetes objects and runs continuous maintenance loops to check that each objects matches the desired state.

At a high-level, that is Kubernetes. Be on the look out for more posts around Kubernetes.

UPDATE: This post was updated on the 20/03/2018 to give more detail to what Kubernetes is

Tabular Automation and NuGet

In a recent blog post, I wrote about processing an Azure Analysis Services tabular model using Azure Functions. In it, there’s a lengthy process of downloading some DLLs and uploading them to the Azure Function. Handily, the Analysis Services team at Microsoft have released the Analysis Services NuGet package, which means that the necessary DLLs can be automatically installed to an Azure Function without much hassle. This blog is going to go through the steps of adding the NuGet package to your Azure Function.

Add a new file to your function called project.json

image

Input the following code in the newly created file

{
   "frameworks": {
     "net46":{
       "dependencies": {
         "Microsoft.AnalysisServices.retail.amd64": "15.0.2"
       }
     }
    }
}

Then save the Azure Function to proceed with the NuGet restore and compile your function. You should see the following logs in your log window.

image

That is the entire process. Much easier than documented previously!

SQL Server Data Tools Deployment Method

The Deploy command in SQL Server Data Tools (SSDT) provides a simple and intuitive method to deploy a tabular model project from the SSDT authoring environment. However, this method should not be used to deploy to production servers. Using this method can overwrite certain properties in an existing model.

Deploying a tabular model using SSDT is a simple process; however, certain steps must be taken to ensure your model is deployed to the correct Analysis Services instance and with the correct configuration options.

Deployment in SSDT requires the properties page of the Tabular model to be configured properly. The properties and options are as follows:

Property

Default Setting

Description

Processing Option

Default

This property specifies the type of processing required when changes to objects are deployed. This property has the following options:

Default – Metadata will be deployed and unprocessed objects will be processed including any necessary recalculation of relationships, hierarchies and calculated columns.

Do Not Process – Only the metadata will be deployed. After deploying, it may be necessary to run a process operation on the deployed database to update and recalculate data.

Full – This setting specifies that both the metadata is deployed and a process full operation is performed. This assures that the deployed database has the most recent updates to both metadata and data.

Transactional Deployment

False

This property specifies whether or not the deployment is transactional. By default, the deployment of all or changed objects is not transactional with the processing of those deployed objects. Deployment can succeed and persist even though processing fails. You can change this to incorporate deployment and processing in a single transaction.

Server

<localhost>

This property, set when the project is created, specifies the Analysis Services instance by name to which the model will be deployed. By default, the model will be deployed to the default instance of Analysis Services on the local computer. However, you can change this setting to specify a named instance on the local computer or any instance on any remote computer on which you have permission to create Analysis Service objects.

Edition

<same as the edition of the instance the workspace server is located>

This property specifies the edition of the Analysis Services server to which the model will be deployed. The server edition defines various features that can be incorporated into the project. By default, the edition will be of the local Analysis Services server. If you specify a different Analysis Services server, for example a production Analysis Service server, be sure to specify the edition of the Analysis Services server.

Database

<project name>

This property specifies the name of the Analysis Services database in which model objects will be instantiated upon deployment. This name will also be specified in a reporting client data connection or an .bism data connection file. You can change this name at any time when you are authoring the model.

Model Name

Model

This property specifies the model name as shown in client tools, such as Excel and Power BI, and AMO.


image

This is where you can set what type of processing the Tabular model does once it has been deployed. For a development environment, it is best practice to set to Do Not Process due to the frequency of deployments. For deployments to more stable environments, such as UAT or Pre-Production, the Processing Option can be changed away from Do Not Process.

The Deployment Server, Version of SQL Server and Database Name can all be configured from this properties page.

Then, by right-clicking on the project, you can deploy your database to your server.

image

Introduction to Azure Data Catalog

With the rise of self-service business intelligence tools, like Power BI, and an increased engagement with data in the workplace, people’s expectations of where they can find expert information about data has changed. Where previously there would an expert that people would have to book time with in order to understand data, now people expect to get quick and detailed information about the data assets that an enterprise holds and maintains without going through a single contact. With Azure Data Catalog, data consumers can quickly discover data assets and gain knowledge about the data from documentation, tags and glossary terms from the subject matter experts. This post aims to give a brief introduction to Azure Data Catalog and what it can broadly be used for.

What is Azure Data Catalog?

Azure Data Catalog is a fully managed Azure service which is an enterprise-wide metadata catalogue that enables data discovery. With Azure Data Catalog, you register; discover; annotate; and, for some sources, connect to data assets. Azure Data Catalog is designed to manage disparate information about data; to make it easy to find data assets, understand them, and connect to them. Any user (analyst, data scientist, or developer) can discover, understand, and consume data sources. Azure Data Catalog is a one-stop central shop for all users to contribute their knowledge and build a community and culture of data.

What can Azure Data Catalog be used for?

As mentioned in the earlier headings, Azure Data Catalog can be used for data asset management; data governance; and data discovery. For data asset management, this means knowing what data is available and where; for data governance teams, this means answering questions like: where is my customer data? or what does this data model look like?; for data discovery, this means knowing which data is suitable for particular reports and who you can go to if you have any questions. There are some common scenarios for using Azure Data Catalog that Microsoft has put together, and it’s well worth reading to get a fuller understanding of what Azure Data Catalog can be used for.