Jonathon Eve-O'Connor's Blog

Setup Of Oracle Attunity Connectors

Fairly often I come across a project where there is a requirement to extract data from an Oracle source. Whilst this can be done in SSIS using a standard ODBC connection, there are some drawbacks to this – the main one for me being that, when working with larger data volumes, the speed of data transfer can be quite slow.

As such, I have found the best solution is generally to set up the Attunity connectors in order to facilitate quick extraction from the Oracle system. For a great comparison of the performance of straight ODBC against Attunity, please see this post from Len Wyatt.

Unfortunately the setup can be a little bit fiddly so I thought I would document the process here (as much for me as anyone else).

1. The first step is to determine the version of the Attunity connectors required – this depends upon your SQL Server version:

- SQL Server 2016: Attunity connector v4.0 (supports Oracle 10.x, 11.x, 12c)
- SQL Server 2014: Attunity connector v3.0 (supports Oracle 10.x, 11.x, 12c)
- SQL Server 2012: Attunity connector v2.0 (supports Oracle 9.2.0.4 and higher, 10.x, 11.x)
- SQL Server 2008 / 2008 R2: Attunity connector v1.2 (supports Oracle 9.2.0.4 and higher, 10.x, 11.x)

2. Following this you need to install the Oracle client. Generally you will be working in a 64-bit environment, which will require both the 64-bit and 32-bit versions of the Oracle client to be installed (Data Tools uses a 32-bit connection, but deployed packages should run in 64-bit mode for optimum performance). The Oracle client can be downloaded from http://www.oracle.com/technetwork/database/features/instant-client/index-097480.html; you will need to sign up on the site and create an account if you don't have one already.

NOTE: It is not enough to only download the connectors for SSIS and run the installation packages; in order to retrieve data you must also install the Oracle client.

Once you have determined the correct version of the Attunity connectors, you will need to select the relevant version of the client. Because Visual Studio runs SSIS packages in 32-bit mode, you will need the 32-bit version of the client installed on development machines. However, when you deploy the packages and run them from the SSIS Catalog they run in 64-bit mode, so you need to install the 64-bit version to get the best performance in production environments.

There are several versions of the Oracle client available for download on the site; most recently I used the following version of the Oracle components. I find that these install relatively simply and provide the required features without requiring too much disk space or consuming too much in the way of resources.

[Image: the Oracle client components selected for download]

Depending on the version of the client you have selected, the installation instructions for Oracle will vary. However, I would suggest the following points:

1. Run the installation packages as an administrator by right-clicking and selecting 'Run as Administrator' – previously when I have not done this it has led to unusual results.


2. You will need to install the clients on all machines which will be running the SSIS packages – this may be several servers, including developer machines used for creating SSIS packages and servers used for scheduled ETL runs.

3. The installation should be done in the following order:

- Install Oracle 32-bit Client (if required)
- Install Oracle 64-bit Client (if required)
- Restart machine

Once the client is installed, a simple way to test that it is working is to use SQL Developer to open a connection to the Oracle database. Once this is working correctly, you can then install the Attunity connectors.
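If you test from SQL Developer or sqlplus using a TNS alias, the client needs a tnsnames.ora entry such as the one below (the alias, host, and service names here are placeholders for illustration – substitute your own):

```
ORCLDEV =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = ORCLPDB1))
  )
```

Remember that with both clients installed there are two client homes, each with its own tnsnames.ora, so the entry may need adding in both places.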

Again, depending on the system you are working on, you will need to install either the 64-bit or 32-bit Attunity components for SSIS. However, here the rules are slightly different.

- If you need to run in 64-bit or 64-bit and 32-bit mode, you should install only the 64-bit version of the components.
- If you need to run in 32-bit mode only, then you should install only the 32-bit version of the components.

The latest version of the connectors (for SQL Server 2016) is available from: https://www.microsoft.com/en-us/download/details.aspx?id=52950

The installation of the components is pretty straightforward – once you have downloaded them, you run the executable and then restart the machine.

Once completed, the Attunity components will be available for selection within the SSIS toolbox in the Data Flow, as per the below. The connectors are very versatile components and support many of the features you would expect from an SSIS component. Be aware that if you are looking for expressions, these are to be found on the Data Flow task's properties rather than the properties of the actual component.

[Image: the Attunity source and destination components in the SSIS toolbox]

Nonclustered Columnstore Indexes And Stored Procedures

I was recently working on a project with a developer who needed to execute a stored procedure which followed this pattern (amended for simplicity):

CREATE PROC ExampleProc
AS

IF (SELECT COUNT(*) FROM sys.indexes WHERE name = 'NCCIDX_DemoIndex') > 0
    DROP INDEX [NCCIDX_DemoIndex] ON [dbo].[Demo]

INSERT INTO [dbo].[Demo] ([Field1], [Field2], [Field3])
VALUES ('Some', 'Test', 'Data')
     , ('More', 'Test', 'Data')

CREATE NONCLUSTERED COLUMNSTORE INDEX [NCCIDX_DemoIndex] ON [dbo].[Demo]
(
    [Field1],
    [Field2],
    [Field3]
) WITH (DROP_EXISTING = OFF)

Unfortunately on execution the procedure would fail intermittently and the following error message would be returned:

Msg 35330, Level 15, State 1, Procedure ExampleProc, Line 7
INSERT statement failed because data cannot be updated in a table with a columnstore index. Consider disabling the columnstore index before issuing the INSERT statement, then rebuilding the columnstore index after INSERT is complete.

The reason for the error message is that, at compilation time, if the columnstore index exists the optimiser decides the INSERT statement will fail (although we drop the index within the procedure, the optimiser does not take this into account).

If the procedure is executed and generates a plan at a time when the index does not exist, it will run as expected. However, we cannot reliably resolve the issue by first removing the index and then creating the procedure. The reason is that when the plan is subsequently flushed from the plan cache (something which can be forced by using DBCC FREEPROCCACHE), the optimiser will create a new plan, and if the index exists when this happens the query will fail again.

The solution I found was to add OPTION (RECOMPILE) to the INSERT statement; this causes it to be assessed at runtime, after the index has been dropped, which stops the error. Please be aware that this forces a recompilation each time the procedure is run, adding a small overhead; whilst this was quite acceptable within the DW environment, you will need to make your own assessment.

The amended procedure therefore follows the following format:

CREATE PROC ExampleProc
AS

IF (SELECT COUNT(*) FROM sys.indexes WHERE name = 'NCCIDX_DemoIndex') > 0
    DROP INDEX [NCCIDX_DemoIndex] ON [dbo].[Demo]

INSERT INTO [dbo].[Demo] ([Field1], [Field2], [Field3])
VALUES ('Some', 'Test', 'Data')
     , ('More', 'Test', 'Data')
OPTION (RECOMPILE)

CREATE NONCLUSTERED COLUMNSTORE INDEX [NCCIDX_DemoIndex] ON [dbo].[Demo]
(
    [Field1],
    [Field2],
    [Field3]
) WITH (DROP_EXISTING = OFF)

Optimising SSIS Lookups with HASHBYTES()

I was recently working on a load for a dimension table which worked like the below.

[Image: the dimension load data flow]

The package follows a fairly standard routine. Data for the dimension is transformed into the required format using SSIS, following which it is inserted into the target table. On the way, an initial check ('LKU – Check Exists') determines if the record is new, and a secondary check determines if the record has changed, which is subsequently used in an update routine.

Although the package was loading incrementally, it sometimes had to check several million rows to determine whether they had changed. As the table was quite wide, I decided to use the HASHBYTES() function to take a hash of the columns for comparison, rather than comparing each of the columns individually.
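The comparison logic can be sketched outside SSIS. Below is a minimal Python analogue of the HASHBYTES() approach (the keys, column values, and delimiter choice are illustrative, not taken from the actual dimension):

```python
import hashlib

def row_hash(columns):
    # Concatenate the columns with a delimiter, mirroring the T-SQL
    # HASHBYTES('SHA2_256', [Col1] + '|' + [Col2] + ...) pattern.
    # The delimiter stops ('ab', 'c') and ('a', 'bc') producing the same hash.
    return hashlib.sha256('|'.join(columns).encode('utf-8')).digest()

# Hashes already stored against the dimension (one existing row).
existing = {'P1': row_hash(('Bikes', 'Road Bikes'))}

# Incoming rows from the source: P1 has changed, P2 is new.
incoming = {
    'P1': ('Bikes', 'Mountain Bikes'),
    'P2': ('Clothing', 'Gloves'),
}

# One hash comparison per row replaces a column-by-column comparison.
changed = [key for key, cols in incoming.items()
           if key in existing and existing[key] != row_hash(cols)]
new_rows = [key for key in incoming if key not in existing]
```

The saving grows with the width of the table: however many columns the row has, the change check is a single fixed-size comparison.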

The initial results were quite promising, with the package running in around half the time. However, on a particularly big load I noticed the package failed as it had run out of memory for the second lookup (full cache was used). I found this odd because the HASHBYTES function only returns a single value, which I anticipated would be smaller than the sum of the columns. On a little deeper investigation I found that, by default, the value returned by HASHBYTES is typed at its maximum size, so every row in the cache reserved 8000 bytes for it.

[Image: the hash column shown with a size of 8000 bytes]

A quick check of the SQL documentation for the HASHBYTES function at http://msdn.microsoft.com/en-GB/library/ms174415.aspx states that the size of the data returned when HASHBYTES is used with the SHA2_256 algorithm is 32 bytes, meaning that most of this space was being wasted.
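This fixed output size is easy to verify: a SHA-256 digest is always 32 bytes regardless of input length (Python's hashlib is used here purely as an illustration of the algorithm, not of HASHBYTES itself):

```python
import hashlib

# SHA-256 output is fixed at 256 bits = 32 bytes, whether the
# input is one byte or a very wide concatenated row.
short_digest = hashlib.sha256(b'a').digest()
long_digest = hashlib.sha256(b'x' * 100_000).digest()

print(len(short_digest), len(long_digest))  # 32 32
```

This is why VARBINARY(32) is sufficient to hold the hash without wasting cache space.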

Therefore, changing the formula from 1 to 2 below significantly increases the efficiency of the lookup and also makes the required cache size smaller by a factor of around 250.

1. HASHBYTES('SHA2_256',[ProductName] + '|' + [CategoryName]) AS HashValue

2. CONVERT(VARBINARY(32),HASHBYTES('SHA2_256',[ProductName] + '|' + [CategoryName])) AS NewHash
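Rough arithmetic shows why this matters for a full-cache lookup. Assuming (purely illustratively) five million cached rows, and ignoring key columns and per-row overhead:

```python
ROWS = 5_000_000  # illustrative row count, not from the actual load

# Hash column cached at its default 8000-byte size vs. after
# CONVERT(VARBINARY(32), ...) trims it to the real digest size.
default_bytes = ROWS * 8000
converted_bytes = ROWS * 32

print(default_bytes // converted_bytes)  # 250
print(round(default_bytes / 1024**3, 1), 'GiB vs',
      round(converted_bytes / 1024**3, 3), 'GiB')
```

The 8000 / 32 ratio is where the factor of around 250 comes from, which is the difference between a cache that fits comfortably in memory and one that does not.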

Upon doing this, the size of each row was significantly reduced, the package runs faster than before and, most importantly, there are no memory issues :)

[Image: the hash column reduced to 32 bytes]