Martyn

Martyn Bullerwell's Blog

SQL Server Delete does not release file space allocation

This one’s a short blog post, and by no means new, but it may help you in the future and maybe save some head scratching. Commonly in Data Warehousing projects you will be processing data through tables whilst staging and cleaning source data. These tables may commonly be heaps (no Clustered Index), that have no persistent data; rather they are a transient set of tables that get deleted from when your Warehouse loads each day.

What you may not know is that when a heap is deleted from (as opposed to truncated) the space that was used by that data is not automatically de-allocated from the table, and when new data is added the table space is increased, even though the table may be “empty”.

So what is actually going on? When deleting from a heap the database engine either locks the page or the row, as opposed to the whole table when deleting on a clustered index. During the delete on the heap the page is emptied, but the engine does not de-allocate the page. This behaviour only occurs on a heap, and can cause very large tables with no data in them. For my recent project, it caused the stage and clean processes to slow down gradually over time because the database engine was scanning huge tables with no data in them.

So, what can we do to stop this behaviour? There are three options (as documented by Microsoft here (see Locking Behaviour)):

1. When deleting from the table obtain an exclusive lock on the whole table, by using a TABLOCK hint as follows:

     DELETE FROM [TABLE] WITH (TABLOCK) WHERE [COLUMN] = 'CRITERIA'

2. If possible, truncate the whole table, this will de-allocate all pages.

3. Finally, my least favourite, add a clustered index to the table.

 

PerformancePoint Server and SQL 2008

So after many hours looking into this issue, it seems that the current release of PPS does not work on SQL Server 2008, and there seems to be no tricking it.  Should you manage to get round the install checks, while setting up the Database Connections to SQL and Analysis servers, the versions are checked once again, and the door is currently closed.

In the mean time, SQL 2005 will still work!  SQL 2008 is still due to Launch SQL 2008 in February, however it now looks as if the RTM (Release to Manufacture) will now not be until Q3 of 2008 - so we are unlikely to see,even early adoption

until July time.  Microsoft see  SQL server 2008 as a critical step forward towards their "broader vision for business intelligence", so I think we can be rest assured that PerformancePoint will be integrated before release date. 

In the mean time I will update you with any further developments on this, and will continue testing with further CTP's and patches.

SQL 2008 - what's changed, and what's new?

As I am sure you can all Guess SQL 2008 is the next release of SQL Server, and does not redesign the wheel, and is more an enhancement to 2005.   Microsoft claim that they have made it more secure, more reliable, more scalable etc... etc... 

Business Intelligence:

  • Analysis Scale and Performance: Allows more complex computations and aggregations and brings new cube design tools.
  • Block Computations: Provides a significant improvement in processing performance enabling users to increase the depth of their hierarchies and complexity of the computations.
  • Writeback: New MOLAP enabled writeback capabilities in SQL Server 2008 Analysis Services removes the need to query ROLAP partitions. This provides users with enhanced writeback scenarios from within analytical applications without sacrificing the traditional OLAP performance.
  • Enterprise Reporting Engine: Reports can easily be delivered both internally and externally with easier deployment and configuration.
  • Internet Report Deployment: Allows deploying of reports over the internet. 
  • Report Builder Enhancements: Easily build ad-hoc and author reports with any structure through Report Designer.
  • Report Server Application Embedding: Report Server application embedding enables the URLs in reports and subscriptions to point back to frontend applications.
  • Microsoft Office Integration: SQL Server 2008 provides new Word rendering that enables users to consume reports directly from within Microsoft Office Word. In addition, the existing Excel renderer has been greatly enhanced to accommodate the support of features like nested data regions, sub-reports as well as merged cell improvements. This lets users maintain layout fidelity and improves the overall consumption of reports from Microsoft Office applications.

Development:

  • Language Integrated Query (LINQ) See previos Blog Entries on LINQ: Enable developers to issue queries against data using a managed programming language such as C# or VB.NET, instead of SQL statements. Enable seamless, strongly-typed, set-oriented queries written in.NET languages to run against ADO.Net (LINQ to SQL), ADO.Net DataSets (LINQ to DataSets), the ADO.NET Entity Framework (LINQ to Entities) and to the Entity Data Service Mapping provider. Use the new LINQ to SQL provider that enables developers to use LINQ directly on SQL Server 2008 tables and columns.
  • ADO.Net Object Services: The Object services layer of ADO.NET enables the materialization, change tracking, and persistence of data as CLR objects. Developers using the ADO.NET framework can program against a database using CLR objects that are managed by ADO.NET. SQL Server 2008 introduces more efficient, optimized support that improves performance and simplifies development.
  • DATE/TIME: SQL Server 2008 introduces new date and time data types:
    • DATE – a date only type
    • TIME – a time only type
    • DATETIMEOFFSET – a time zone aware datetime type
    • DATETIME2 – a datetime type w/ larger fractional seconds and year range than the existing DATETIME type
  • HIERARCHY ID: Enable database applications to model tree structures in a more efficient way than currently possible. New system type, HierarchyId, can store values that represent nodes in a hierarchy tree. This new type will be implemented as a CLR UDT, and will expose several efficient and useful built-in methods for creating and operating on hierarchy nodes with a flexible programming model.
  • FILESTREAM Data: Allow large binary data to be stored directly in an NTFS file system while preserving an integral part of the database and maintaining transactional consistency. Enable the scale-out of large binary data traditionally managed by the database to be stored outside the database on more cost-effective storage without comprise.
  • Integrated Full Text Search: The Integrated Full Text search makes the transition between Text Search and relational data seamless while enabling users to use the Text Indexes to perform high-speed text searches on large text columns.
  • Sparse Columns: NULL data consumes no physical space which provides a highly efficient way of managing empty data in a database. For example, Sparse Columns allows object models that typically have numerous null values to be stored in a SQL Server 2005 database without experiencing large space costs.
  • Large User Defined Types: SQL Server 2008 eliminates the 8KB limit for User Defined Types (UDTs) allowing users to dramatically expand the size of their UDTs.
  • Spatial Data Types: Build spatial capabilities into your applications by using the support for spatial data.
    • Implement Round Earth solutions with the geography data type. Use latitude and longitude coordinates to define areas on the Earth's surface.
    • Implement Flat Earth solutions with the geometry data type. Store polygons, points, and lines that are associated with projected planar surfaces and naturally planar data, such as interior spaces

Security:

  • Transparent Data Encryption: Enable encryption of an entire database, data files or log files, without the need for application changes.
  • Extensible Key Management: Comprehensive solution for encryption and key management. Supports third-party key management and HSM products.
  • Auditing: Create and manage auditing via DDL enabling answers common questions such as "What data was retrieved?"
  • Built-in Forms Authentication: Built-in forms authentication enables users to easily switch between Windows and Forms.

Redundancy and Performance:

  • Enhanced Database Mirroring: Eliminates the need to specify the failover partner which enables existing applications to take advantage of mirroring.
  • Resource Governor: Provide a consistent and predictable response to end users with the introduction of Resource Governor.
  • Predictable Query Performance: Enable greater query performance stability and predictability by providing functionality to lock down query plans.
  • Data Compression: Enable data to be stored more effectively and reduce the storage requirements for your data. Data Compression also provides significant performance improvements for large I/O bound workloads like data warehousing.
  • Hot Add CPU: Dynamically scale a database on demand by allowing CPU resources to be added.
  • Declarative Management Framework (DMF):  Use DMF with SQL Server Management Studio to create policies that manage entities on the server, such as the instance of SQL Server, databases, and other SQL Server objects.
  • Performance Data Collection: More extensive performance data collection, a new centralized data repository for storing performance data and new tools for reporting and monitoring.
  • Backup Compression: Keeping disk-based backups online is expensive and time consuming. With SQL Server 2008 backup compression, less storage is required to keep backups online and backups run significantly faster since less disk I/O is required.
  • Partitioned Table Parallelism: Partitions enable organizations to manage large growing tables more effectively by transparently breaking them into manageable blocks of data. SQL Server 2008 builds on the advances of partitioning in SQL Server 2005 by improving the performance on large partitioned tables.
  • Star Join Query Optimizations: SQL Server 2008 provides improved query performance for common data warehouse scenarios. Star Join Query optimizations reduce query response time by recognizing data warehouse join patterns.
  • Grouping Sets: Grouping Sets is an extension to the GROUP BY clause that lets users define multiple groupings in the same query. Grouping Sets produces a single result set that is equivalent to a UNION ALL of differently grouped rows making aggregation querying and reporting easier and faster.
  • Change Data Capture: With Change Data Capture, changes are captured and placed in change tables. It captures complete content of changes and maintains cross table consistency and even works across schema changes. This enables organizations to integrate the latest information into the data warehouse.
  • MERGE SQL Statement: With the introduction of the MERGE SQL Statement, developers can more effectively handle common data warehousing scenarios like checking whether a row exists and then executing an insert or update.
  • SQL Server Integration Services (SSIS) Pipeline Improvements: Data Integration packages can now scale more effectively making use of available resources and managing the largest enterprise scale workloads. The new design improves the scalability of runtime into multiple processors.
  • SQL Server Integration Services (SSIS) Persistent Lookups: The need to perform lookups is one of the most common ETL operations. This is especially prevalent in data warehousing where fact records need to use lookups to transform business keys to their corresponding surrogates. SSIS increases the performance of lookups to support the largest tables.

The Difference Between MOSS 2007 and WSS 3.0

It always astounds me what Microsoft are willing to bundle with their software, Analysis Services or SSIS (SQL Server Integration Services) with SQL Server for example!  They haven't stopped, they bundle Windows Sharepoint Services with Windows Server 2003, with is basically a fully functional SharePoint Server, however they still have SharePoint Server 2007  as a product.  So what are the differences? At first glance it doesn't appear to be much, however for tight integration into the enterprise, it seems that MOSS 2007 (Microsoft Office SharePoint Server) is a must! I will highlight the most interesting bits that MOSS 2007 has over and above WSS 3.0 (Windows Sharepoint Services) :

  • User Profiles support - Allows each user to store profile information
  • Site Manager - Manage Navigation, Security and look and feel with drag an drop functionality
  • Enterprise Search Tools - numerous tools to search Sharepoint Sites and Portals across entire enterprises
  • Business Data Catalog - The Business Data Catalog (BDC) tightly integrates external data, providing access to external data residing within other business applications, and enabling the display of, and interaction with external data
  • Business data search - Search data residing in your business applications using the BDC (See Above)
  • Business Data Web Parts - Used for viewing lists, entities, and related information retrieved through the Business Data Catalog
  • Business document workflow support - Automate document review, approval, signature collection, and issue tracking using workflow applications
  • Retention and auditing policies - Allows customized information management policies to control retention period, expiration, and auditing
  • Browser-based forms - Integration with InfoPath, allows integration on to SharePoint of created forms and surveys.
  • Integrated, flexible spreadsheet publishing - Allows information workers to easily choose what they want to share with others and determine how others can interact with published spreadsheets.
  • Share, manage, and control spreadsheets - Provides access to spreadsheet data and analysis through server-calculated, interactive spreadsheets from a Web browser. Can help to protect any sensitive or proprietary information embedded in documents, such as financial models, and audits their usage.
  • Web-based business intelligence using Excel Services - Allows spreadsheets to be broadly and easily shared. Fully interactive, data-bound spreadsheets including charts, tables, and PivotTable views can be created as part of a portal, dashboard, or business scorecard.
  • Data Connection Libraries - Document Libraries storing ODCs (Office Data Connections), Making one single location for all data connections.
  • Business Data actions -  Easily create actions that open Web pages, display the user interfaces of other business applications, launch InfoPath forms, and perform other common tasks.
  • Integrated business intelligence dashboards - Rich, interactive BI dashboards that assemble and display business information from disparate sources by using built-in Web parts, Excel spreadsheets, Reporting Services, or a collection of business data connectivity Web Parts.
  • Report Center - Provides consistent management of reports, spreadsheets, and data connections.
  • Key performance indicators - A KPI web Part can connect to Analysis Services, Excel Spreadsheets, SharePoint Lists, or manual entered data.
  • Notification service - Improved allowing workflow users to receive emails by default, and improved triggering and filtering
  • Single Sign-On (SSO) - Allows the User to log onto a variety of applications with a single user name and password, therefore integrating back office applications, and helps pre-population with integration to the Profile part of MOSS 2007.
  • Social Networking Part - Connect to Public My Site pages to help establish connections between colleagues with common interests
  • Personal Site Support - Allows Users to create Personal Web Sites  
  • Content syndication - Use RSS feeds to syndicate content managed in a portal site.

 

This is just a few of the extras you get in MOSS 2007, and I have highlighted those that are more angled to Business Intelligence! As for the cost, you can find that here: (In USD$ I am afraid) http://office.microsoft.com/en-us/sharepointserver... 

MOSS 2007 sits centre stage in Microsoft's BPIO (Business Productivity Infrastructure Optimisation) vision for the "People Ready Business"... Lets see what it can do!