Victor

Victor Mendes' Blog

SSRS Stacked Column Charts–Adjust Columns with Different Widths

I thought I’d share my findings on this subject when I was unable to find an example. You can adjust the width of all columns in a chart to the same size by modifying the chart series attribute called PointWidth, which can be found here.

This post will show you how to adjust each column with a different width for a stacked column chart.

Requirement

This was a client request who wanted to use a column chart to identify potential customers, either new or existing for a particular product.

A typical report requirement will read:

  • As a Marketing Analyst, I want to identify the potential for sales of new customers and existing customers following the launch of a new product. So that we can increase overall sales by X amount with a more effective campaign.

An example of the report is below.

image

Dataset

The dataset that allows the comparison of Customer Sales Size Percentage against the Customer Segment Size Percentage for a particular product is in the following data table.

ID AgeBand CustomerSalesSizePerc CustomerSizePerc Gender Product

1

U16

60

10

M

X

2

16-24

40

25

M

X

3

25-34

35

35

M

X

4

35-44

15

15

M

X

5

45-60

10

10

M

X

6

60+

2

5

M

X

So for ID=1, 10% of all male customers are in the age band U16. 60% of these males have purchased product X for a given period. And so on.

Reporting Services (RS) – Varying Column Report

In RS create a dataset as above. Add a Stacked Column chart and link the dataset to the chart.

image

Click on the edge of the chart to show the Chart Data window. Add the Customer Sales Size Percentage to the value pane and add Age Band to the category group pane and series group pane.

image

A category for each age band will be created from U16 to 60+. We must also add age band to the series group to be able to modify the column properties.

Bring up the age band category group properties window. By default “General” will be highlighted in the left pane. Click the fx icon to the right of the label text box. Add the following to the expression window.

  • =Fields!AgeBand.Value & " ( " & Fields!CustomerSizePerc.Value & " % )"

In the left pane highlight “Sorting” and sort by ID. Now bring up the age band series group properties window and also sort by ID.

Lets preview the chart (text has been added to the title, y-axis and x-axis).

image

Although we have added the male percentage population to the age band category, it still isn’t evident which of the age bands has a higher or lower population percentage.

Adjust Columns Width

A particular column width can be adjusted using the PixelPointWidth attribute found in the chart series custom attributes.

We adjust the width of a column by specifying it’s width in pixels.

image

A switch function is used to specify the column pixel width for each column (age band).

The width of a column is measured in pixels. To ensure the columns are in proportion with one another, I have multiplied the customer size percentage by a factor of 2. The column sizes are relative to the size of your chart and therefore need to be calculated correctly. Otherwise you will notice either large gaps between columns or overlapping.

Add the following expression to the PixelPointWidth expression window:

image

=Switch(
Fields!Id.Value=1, Fields!CustomerSizePerc.Value*2
,Fields!Id.Value=2, Fields!CustomerSizePerc.Value*2
,Fields!Id.Value=3, Fields!CustomerSizePerc.Value*2
,Fields!Id.Value=4, Fields!CustomerSizePerc.Value*2
,Fields!Id.Value=5, Fields!CustomerSizePerc.Value*2
,Fields!Id.Value=6, Fields!CustomerSizePerc.Value*2)

Lets preview our chart now.

image

We can now see there is potential for new customers for the age band U16. There is also potential for existing customers for the age bands 16-24 and 25-34. Less potential for the remaining age bands.

Summary

Hopefully this has shown a very neat and simple solution to compare segmented customer sizes for there uptake of a particular product. More time was spent writing the query to generate the required dataset, then building the chart.

What about a Column Chart?

You might be wondering why I didn’t use a column chart? I did. Although the result was similar it was not aesthetically pleasing. The more I increased the PixelWidthPoint factor the more unpleasant it became.

As we must use age band in both groups of the category and series, the column chart will try and evenly distribute six columns per age band horizontally. You can set the space between bars to zero but this applies to the category level and not the series.

When you try to apply the expression above for the PixelWidthPoint the bars are either overlapping or out of proportion. The column chart example is below.

image

Data Quality Services (RC0) - Cleanse and Train the KB

This post will show a new KB (Knowledge Base) being trained using the new feature via domain management which allows the KB to import the knowledge gained from a cleansing project. In the previous release, CTP3, the KB was trained using Knowledge Discovery from either a table or excel file. This could also have been done interactively via domain management, but not directly from a cleansed project.

In this example we will retrain the new KB by importing knowledge learnt from two cleansed samples of data. The outputs of the cleansed projects will be imported into the KB. Before retraining our KB we will need to tweak one of the cleansing configuration settings. This will retrain our new KB faster.

New Knowledge Base

If you have not created a KB before a step by step guide can be found here.

The new KB was called UK Counties. A single domain value was used and it was called UK County (Full Name).

Using the Knowledge Discovery feature we first trained our empty KB with a unique list of UK Counties. The Domain Management feature could also have been used by entering domain values or importing them from an excel file.

Training the KB doesn't stop there, further training is required to expand the KB knowledge. Interactively we could add new domain values or synonyms to the already existing values such as Bucks for Buckinghamshire if we knew this was a potential synonym. For this example we will be using the knowledge gained from our cleansing project to retrain our KB.

Below is a screen shot of some of the domain values prior to retraining and being published to the DQS server.

A90_NewKB

Configuring Cleansing Projects

During cleansing a computer-assisted process will analyse how the source data conforms to the knowledge in the KB and a confidence level is determined. There are two confidence level thresholds: suggestions and auto corrections. Suggestions can be approved or rejected. Auto corrections are already approved, but it is possible to reject them as well.

These thresholds can be set in the general settings for a cleansing project from the DQS configuration area.

By default 0.6 (60%) and 0.8 (80%) are set respectively for suggestions and auto corrections.

As the KB is still young we need to retrain it with some samples of our real data and therefore make more suggestions. Hence we will reduce the confidence level percentage for suggestions down to 0.4 (40%).

B02_Config

N.B. This can go the other way where too many suggestions are given or a source value could be mapped to a completely incorrect domain value. As the knowledge of KB increases the confidence level for suggestions should also be increased to isolate new values and minimise interactive cleaning.

Training the KB

There are a number of ways to train the KB

  1. Knowledge Discovery,  or
  2. Domain Management by either
    1. Interactively entering or editing domain values,
    2. Importing domain values from an excel file, or
    3. Using the completed output of a cleansing project.

The final option is the new feature added to RC0 which makes managing knowledge within the KB simpler.

Let’s look at our first sample of data.

B03_Config

You will notice a number of data quality issues such as postcode and county, town and county or a short county name. Our objective is to clean this data to the County Full Name domain value in the KB.

If you have not created a data quality project before a step by step guide can be found here.

We will skip to the end of our cleansing project and analyse the results.

On successful completion we can see from the results that 0 records were corrected and 7 records have suggested domain values. We can now perform an interactive cleanse by which we approve or reject the suggestions made. Where the data confidence level has not been met for suggestions of 40% or auto corrections of 80% it will be considered as a new value. For this particular example we should not be expecting new values as we have a definitive list of counties and therefore we will need to add corrected values manually.

Let’s start with the suggested values. All are correct and therefore only need to be approved. We approve all values by clicking the check box under the Approve column or we can click the Approve All icon C08_DQP. All suggested values will move into the Corrected tab area.

C05_DQP

Now let’s review the new tab below. Three new values in the source data have been found, the analysis could not confidently associate any of the three values with a confidence level greater than or equal to 40% and therefore has presented them as new values.

C06_DQP

Looking at the values we know the county is Middlesex as is in the KB, hence we manually enter this value into the Correct to column. Now we can approve each value or we can click the Approve All icon.

C07_DQP

All values have now moved to the corrected tab.

C09_DQP

Our data cleanse results are now ready for final review before finishing the project. Finishing a project no longer allows the project to be re-run for cleansing purposes as the knowledge gained would be lost, however you will be able to interactively clean to alter the final results. It’s these results and knowledge that is imported back into the KB, but the project must be marked as finished.

It is the CountyName_Source that we want to feed back to the KB as synonyms for the domain values found or modified in the CountyName_Output.

C10_DQP

To feed this new acquired knowledge we open our KB via Domain Management and can now use the new feature Import project values.

D01_KBDM

We select the finished cleansing project and uncheck Add values from New Tab.

D02_KBDM

Only the new knowledge from the project has been added to the KB. The Show Only New check box is checked automatically.

D03_KBDM

If we uncheck the Show Only New check box and scroll down to London and Middlesex you will see the synonyms for these counties have been added to the KB.

D04_KBDM

We now publish the KB to the DQS server and are ready to analyse another data sample.

E01_DQP

As above we create a new cleansing project and analyse the data. The newly found knowledge from the previous cleansing project has corrected five values with 4 values suggested, leaving one value as new.

If we view the reason column for the corrected tab, four of the values were associated with the synonyms of domain values within the KB from the knowledge learned. 'London.' was cleansed and had a confidence greater than or equal to 80% and therefore auto corrected.

E03_DQP

In the suggested tab the correct to values are all correct and only need approving.

E04_DQP

The new value failed the confidence level of 40%, however it could have been found and a suggestion made if the threshold was reduced further. This is not a problem as the correct value was entered and approved.

E06_DQP

Finally we review our results and finish the project.

E07_DQP

We now feed this new knowledge back into the KB via Domain Management.

E08_DQP

The new synonyms will now be added to the existing KB domain values.

E09_DQP

This has been a simple example on how to train and retrain a new KB. By having a definitive list of counties the need to analyse our data prior to creating the KB is somewhat reduced.

When building a KB it is important to acquire knowledge from samples of source data using any of the features available. From this example the output of a cleansing project provides a simpler method where the confidence level for suggested values is much lower. Once the KB contains sufficient knowledge for cleansing the confidence level can be increased and larger sets of data can be cleansed.

DQS is a great tool for the power user to maintain content with little input from IT. In posts to follow I will also review Matching projects and the SSIS DQS component.