Adatis BI Blogs

New Features in SQL Server 2016 – Part 1: Database Engine

In early May 2015, SQL Server 2016 was made available in Preview.  To download the CTP2 version, click here.  This blog will list all of the new Database Engine features coming next year.  In future posts, I will pick a couple of the enhancements and demonstrate how they will benefit DBAs, Developers and BI specialists.  In the meantime, I am aiming to give the SQL community a high-level understanding of what is new in 2016 and help people think about how some of it could benefit their clients and stakeholders.

Database Engine Feature Enhancements

There are lots of new, exciting additions to the database engine in SQL Server 2016.  They can be found below.  I have picked out my favourite five and provided additional detail on them.

·        Columnstore Indexes
·        In-Memory OLTP
·        Live Query Statistics
·        Query Store
·        Temporal Tables
·        Backup to Microsoft Azure
·        Managed Backup
·        Trace Flag 4199
·        FOR JSON
·        Always Encrypted
·        PolyBase
·        Stretch Database
·        Transact-SQL Enhancements
·        System View Enhancements
·        Security Enhancements
o   Row-Level Security
o   Dynamic Data Masking
o   New Permissions
o   Transparent Data Encryption (TDE)
·        High Availability Enhancements
·        Tools Enhancements

Columnstore Indexes

Whilst SQL Server 2014 offered some improvements on 2012, the 2016 version takes Columnstore indexes to the next level.  In addition, Azure SQL Databases can incorporate almost every feature available in a standard on-premises database.

1.       A table can have one updateable nonclustered columnstore index.
2.       The nonclustered columnstore index definition now supports filtered conditions.  As a result, the performance impact on an OLTP table will be minimal.
3.       An in-memory table can have one columnstore index. Previously, only a disk-based table could have a columnstore index.
4.       A clustered columnstore index can have one or more nonclustered indexes. Previously, the columnstore index did not support nonclustered indexes.
5.       Primary and foreign keys are now supported, using a b-tree index to enforce these constraints on a clustered columnstore index.

In-Memory OLTP

The current In-Memory OLTP engine offers transaction throughput improvements of up to 30x.  In SQL Server 2016, you will also be able to apply it to more applications and benefit from increased concurrency.  In addition, you can use the new in-memory columnstore with In-Memory OLTP, delivering queries up to 100x faster.  There are other benefits to the refined In-Memory OLTP (a short sketch follows this list):

1.       Support for ALTER operations on memory-optimized tables and natively compiled stored procedures.
2.       Support for natively compiled, scalar user-defined functions.
3.       Support for all collations.
4.       Storage improvements.
         a.       The ability to estimate memory requirements for memory-optimised tables.
         b.       Once you determine the size, you need to provide disk space that is four times the size of durable, in-memory tables.
5.       Enhancements to transaction performance analysis reports.
         a.       The transaction performance collector in SQL Server Management Studio helps you evaluate whether In-Memory OLTP will benefit your workload.
         b.       Use the 'Memory Optimization Advisor' to help migrate tables to in-memory.
6.       Support for subqueries and an increased query surface area in natively compiled stored procedures.
         a.       E.g. BETWEEN, GROUP BY, ORDER BY, TOP, UPDATE, TRY/CATCH, etc.
         b.       DISTINCT and ORDER BY still cannot be combined.
         c.       Neither can TOP and PERCENT be combined in a SELECT statement.
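To make the In-Memory OLTP points above more concrete, below is a minimal sketch of a memory-optimized table, a 2016-style ALTER and a natively compiled stored procedure.  The table, procedure and column names are purely illustrative, and the database is assumed to already have a MEMORY_OPTIMIZED_DATA filegroup.

-- Durable memory-optimized table (requires a MEMORY_OPTIMIZED_DATA filegroup in the database).
CREATE TABLE dbo.SessionCache
(
    SessionId INT NOT NULL PRIMARY KEY NONCLUSTERED,
    UserName  NVARCHAR(100) NOT NULL,
    LastSeen  DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- New in SQL Server 2016: memory-optimized tables support ALTER,
-- so a schema change no longer means dropping and recreating the table.
ALTER TABLE dbo.SessionCache ADD Device NVARCHAR(50) NULL;
GO

-- Natively compiled stored procedure that updates the in-memory table.
CREATE PROCEDURE dbo.usp_TouchSession @SessionId INT
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'english')
    UPDATE dbo.SessionCache
    SET LastSeen = SYSUTCDATETIME()
    WHERE SessionId = @SessionId;
END;
GO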
System View Enhancements

There are a number of new views that will aid a developer, especially around security and query statistics.  The enhancements are grouped into three sections.

1.       Row-Level Security
         a.       sys.security_predicates – returns a row for each security predicate in a database.
         b.       sys.security_policies – returns a row for each security policy in the database.
2.       Query Store catalog views
         a.       Seven new Query Store support views:
                  i.      sys.database_query_store_options
                  ii.     sys.query_context_settings
                  iii.    sys.query_store_plan
                  iv.     sys.query_store_query
                  v.      sys.query_store_query_text
                  vi.     sys.query_store_runtime_stats
                  vii.    sys.query_store_runtime_stats_interval
3.       Query hints
         a.       MIN_GRANT_PERCENT
         b.       MAX_GRANT_PERCENT

Row-Level Security (RLS)

Microsoft's explanation of row-level security reads:

"Row level security (RLS) introduces a flexible, centralised, predicate-based evaluation that considers metadata or any other criteria the administrator determines as appropriate. This is used as a criterion to determine whether or not the user has the appropriate access to the data based on user attributes."

This can be translated into something a little easier to digest:

1.       The ability to control access to rows in a database table based on the characteristics of the user executing a query.
         a.       E.g. group membership, such as company department or job status.
2.       It simplifies the design and coding of application security.
3.       The restriction logic lives in the database layer rather than in the application, which limits the number of places security has to be enforced and therefore increases reliability.
4.       Use the 'CREATE SECURITY POLICY' T-SQL statement to set up RLS, as sketched below.
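As an illustration of that last point, here is a minimal, hypothetical RLS sketch: an inline table-valued predicate function bound to an invented dbo.Sales table by a security policy.  The table, function, column and user names are all made up for the example.

-- Predicate function: returns a row only when the current user may see the given SalesRep's data.
CREATE FUNCTION dbo.fn_SalesPredicate (@SalesRep AS sysname)
    RETURNS TABLE
    WITH SCHEMABINDING
AS
    RETURN
        SELECT 1 AS fn_SalesPredicate_result
        WHERE @SalesRep = USER_NAME()        -- a sales rep sees only their own rows
           OR USER_NAME() = N'SalesManager'; -- the manager sees everything
GO

-- Security policy: the filter predicate is applied transparently to every query against dbo.Sales.
CREATE SECURITY POLICY dbo.SalesFilter
    ADD FILTER PREDICATE dbo.fn_SalesPredicate(SalesRep)
    ON dbo.Sales
    WITH (STATE = ON);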
High Availability Enhancements

If you are not familiar with High Availability in SQL Server 2012/14, click here for a quick introduction.  For everyone else, the enhancements below greatly enrich this feature:

1.       Load-balancing is now possible across a set of read-only replicas.  Previously, connections were always directed to the first available read-only replica.
2.       The number of replicas that support automatic failover has increased from two to three.
3.       Group Managed Service Accounts are now supported for AlwaysOn Failover Clusters. *
4.       AlwaysOn Availability Groups can be configured to fail over when a database goes offline; to enable this, set the 'DB_FAILOVER' option to ON.

*             You will need to apply an update on Windows Server 2012 R2 to avoid any downtime after a password change.

Future Blogs

As the Database Engine contains the majority of the changes, I have covered the other features in a series of further blog posts.  Click the links below to access them.

1.       Part 2 - Analysis Services - TBC
2.       Part 3 - Reporting Services - TBC
3.       Part 4 - Integration Services - TBC

References

For more information on all of the new SQL Server 2016 features, the below resources/blogs are highly recommended.

·        What's New in SQL Server 2016 (Official Microsoft) - https://msdn.microsoft.com/en-us/library/bb500435(v=sql.130).aspx
·        T.K. Ranga Rengarajan's SQL Server Blog - http://blogs.technet.com/b/dataplatforminsider/archive/2015/05/27/sql-server-2016-first-public-preview-now-available.aspx
·        Jen Underwood's SQL Server BI Blog - http://sqlmag.com/blog/what-coming-sql-server-2016-business-intelligence

SQL PASS Summit 2014 – Kick Off

Day 1 has kicked off in Seattle, a remarkable city. Having arrived a week early, I've had plenty of time to check out the sights and the food and have enjoyed it immensely - a brilliant venue for PASS!

There were a few announcements at this morning's keynote, mostly regarding Azure. Azure SQL Databases are gaining larger index handling, parallel queries, extended events and in-memory columnstore for data marts. Joseph Sirosh gave a talk about Machine Learning and Information Management, showing a cool example of Azure Stream Analytics in which Microsoft Kinect sensor data on customer interactions in a shop was uploaded straight into Power Map! I am looking forward to hearing more on Machine Learning. There are also a handful of great improvements for Power BI. I am most looking forward to the new Live Ops Dashboards and drill-through actions! Combo Reports also look promising…

Moving on to the first session, I chose to attend 'What's new in Microsoft Power Query for Excel'. As it turns out, there's not a massive amount of new stuff – some new data sources and a tick box when you do a merge to remove name prefixes. However, one of these new sources is the long-awaited Analysis Services data source. The new ODBC data source is a great addition too. There was a mention of the possibility of a decoupled M-query SSIS component! We probably won't hear anything more until later in 2015, unfortunately. I would say this was not a level 300 session, more like 100/200.

The second session was 'SQL Server 2014 Always On (High Availability and Disaster Recovery)': a nice overview of what was introduced in 2012 and the newer 2014 features, including an increased maximum number of secondary replicas, increased availability of readable secondaries and the Add Azure Replica Wizard. Despite not being a DBA and it being a level 300 session, I found it easy to follow and absorb. I suspect many of the DBAs in the room did not take away much they did not already know, however.

Niko Neugebauer gave a fantastic, in-depth session on 'ETL Patterns with Clustered Columnstore Indexes'. It was a fast-moving talk, despite the time spent waiting for some executions, and it demanded your full attention! Definitely worthy of its 400 level. It left me a little tired for Marco Russo's 'DAX Patterns' session, which showed some examples of workarounds for common tabular weaknesses, like distinct counts for Type 2 slowly changing dimensions and cumulative quantities.

Overall it was a strong day. I am looking forward to tomorrow. More to follow…

SQL Server 2012 : Columnstore Index in action

One of the new SQL Server 2012 data warehouse features is the Columnstore index. It stores data by columns instead of by rows, similar to a column-oriented DBMS like the Vertica Analytic Database, and claims to increase query performance by hundreds to thousands of times.

The issue with indexes in a data warehouse environment is the number and broad range of questions that the warehouse may have to answer. You either have to introduce a large number of large indexes (which in many cases results in a larger set of indexes than actual data), plump for a costly spindle-rich hardware infrastructure, or opt for a balanced hardware and software solution such as a Microsoft SQL Server 2008 R2 Fast Track Data Warehouse or an HP Business Data Warehouse Appliance, where the approach is 'index-light' and you rely on the combination of high throughput and processing power to reduce the dependency on the traditional index.

The Columnstore index is different in that, when applied correctly, a broad range of questions can benefit from a single Columnstore index. The index is compressed (using the same VertiPaq technology that PowerPivot and Tabular-based Analysis Services share), reducing the effort required of the expensive and slow disk subsystem and shifting it to the fast and lower-cost memory/processor combination.

In order to test the claims of the Columnstore index, I've performed some testing on a Hyper-V instance of SQL Server 2012 "Denali" CTP3, using a blown-up version of the AdventureWorksDWDenali sample database. I've increased the FactResellerSales table from approximately 61,000 records to approximately 15.5 million records and removed all existing indexes to give me a simple, but reasonably large, 'heap'.

Heap

With a clear cache, run the following simple aggregation:

SELECT
    SalesTerritoryKey
    ,SUM(SalesAmount) AS SalesAmount
FROM [AdventureWorksDWDenali].[dbo].[FactResellerSales]
GROUP BY SalesTerritoryKey
ORDER BY SalesTerritoryKey

Table 'FactResellerSales'. Scan count 5, logical reads 457665, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 7641 ms, elapsed time = 43718 ms.

Non-Clustered Index

Before jumping straight in with a columnstore index, let's review performance using a traditional index. I tried a variety of combinations; the fastest I could get this query to go was to simply add the following:

CREATE NONCLUSTERED INDEX [IX_SalesTerritoryKey] ON [dbo].[FactResellerSales]
(
    [SalesTerritoryKey] ASC
)
INCLUDE ([SalesAmount])
WITH (
    PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
    DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
    ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100, DATA_COMPRESSION = PAGE
) ON [PRIMARY]
GO

Notice I have compressed the index using page compression; this significantly reduced the number of pages my data consumed. The IO stats when I re-ran the same query (on a clear cache) looked like this:

Table 'FactResellerSales'. Scan count 5, logical reads 26928, physical reads 0, read-ahead reads 26816, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 6170 ms, elapsed time = 5201 ms.

Much better! Approximately 6% of the original logical reads were required, resulting in a query response time of just over 5 seconds. Remember though, this new index will really only answer this specific question. If we change the query, performance is likely to fall off a cliff and revert back to the table scan.

Incidentally, adopting an index-light (no index) approach and simply compressing (and reloading, to remove fragmentation) the underlying table itself, performance was only nominally slower than with the indexed table, with the added advantage of being able to perform for a large number of different queries. (Effectively this speeds up the table scan; partitioning the table can help with this approach too.)
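For anyone repeating these tests, the sketch below shows how the cache can be cleared and the IO/timing figures captured before each run. This is an assumed test harness rather than the exact script used here, and DBCC DROPCLEANBUFFERS should only ever be run on a test server.

-- Flush dirty pages, then empty the buffer pool so the run starts with a cold cache (test servers only).
CHECKPOINT;
DBCC DROPCLEANBUFFERS;

-- Report logical/physical reads and CPU/elapsed times for the statements that follow.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT
    SalesTerritoryKey
    ,SUM(SalesAmount) AS SalesAmount
FROM [AdventureWorksDWDenali].[dbo].[FactResellerSales]
GROUP BY SalesTerritoryKey
ORDER BY SalesTerritoryKey;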
Columnstore Index

Okay, time to bring out the columnstore. The recommendation is to add all columns into the columnstore index (columnstore indexes do not support 'include' columns); practically, there may be a few cases where you exclude some columns. Metadata or system columns that are unlikely to be used in true analysis are good candidates to leave out of the columnstore. However, in this instance, I am including all columns:

CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_Columnstore] ON [dbo].[FactResellerSales]
(
    [ProductKey],
    [OrderDateKey],
    [DueDateKey],
    [ShipDateKey],
    [ResellerKey],
    [EmployeeKey],
    [PromotionKey],
    [CurrencyKey],
    [SalesTerritoryKey],
    [SalesOrderNumber],
    [SalesOrderLineNumber],
    [RevisionNumber],
    [OrderQuantity],
    [UnitPrice],
    [ExtendedAmount],
    [UnitPriceDiscountPct],
    [DiscountAmount],
    [ProductStandardCost],
    [TotalProductCost],
    [SalesAmount],
    [TaxAmt],
    [Freight],
    [CarrierTrackingNumber],
    [CustomerPONumber],
    [OrderDate],
    [DueDate],
    [ShipDate]
) WITH (DROP_EXISTING = OFF) ON [PRIMARY]

Now, when I run the query on a clear cache:

Table 'FactResellerSales_V2'. Scan count 4, logical reads 2207, physical reads 18, read-ahead reads 3988, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 235 ms, elapsed time = 327 ms.

I think the figures speak for themselves! Sub-second response, and because all columns are part of the index, a broad range of questions can be satisfied by this single index.

Storage

The traditional (compressed) non-clustered index takes up around 208 MB, whereas the Columnstore index comes in a little less at 194 MB, so we gain both speed and storage efficiency, a benefit further compounded when you take into account the additional indexes the warehouse might otherwise require.

So, the downsides? Columnstore indexes render the table read-only. In order to update the table you either need to drop and re-create the index or employ a partition-switching approach. The other notable disadvantage, consistently witnessed during my tests, is that the columnstore index takes longer to build. The traditional non-clustered index took approximately 21 seconds to build, whereas the columnstore took approximately 1 minute 49 seconds. Remember though, you only need one columnstore index to satisfy many queries, so that's potentially not a fair comparison.

Troubleshooting

If you don't notice a huge difference between a table scan and a Columnstore Index Scan, check the Actual Execution Mode of the Columnstore Index Scan. This should be set to Batch, not Row. If the Actual Execution Mode is reporting Row, then your query cannot run in parallel:

- Ensure, if running via Hyper-V, you have assigned more than one processor to the image.
- Ensure the server property 'Max Degree of Parallelism' is not set to 1.
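On that last point, here is a small, hedged sketch of how the setting can be checked and reset from T-SQL; this is a generic illustration rather than part of the original test, and changing it affects the whole instance.

-- Confirm the instance is not capped at a single scheduler, which prevents parallel (Batch mode) plans.
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'max degree of parallelism';

-- If value_in_use is 1, allow parallelism again (0 lets SQL Server decide).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 0;
RECONFIGURE;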
Summary

In summary, for warehousing workloads, a columnstore index is a great addition to the database engine, with significant performance improvements even on reasonably small data sets. It will re-define the 'index-light' approach that the SQL Server Fast Track Data Warehouse methodology champions and help simplify warehouse-based performance tuning activities. Will it work in every scenario? I very much doubt it, but it's a good place to start until we get to experience it live in the field.