Adatis

Adatis BI Blogs

Joins vs CROSS APPLY

Every professional (who has worked with SQL Server) will have used a Join operator.  However, the same cannot be said about the APPLY operator.  Although APPLY was introduced back in SQL Server 2005, there are still a number of developers who have barely seen the syntax, let alone utilised it’s capabilities. This blog focuses solely on INNER JOINS vs CROSS APPLY and when it is more beneficial to use the latter.  I will provide additional links at the end of the post, which will go into other features of APPLY, such as performance benefits against PIVOT’s. This blog has been written with the assumption that the reader already knows about Joins.  If you would like to refresh your memory, click here. What is APPLY? The original purpose of the APPLY operator was to use them with table-valued functions (TVF’s). The MSDN definition is: “The APPLY operator is similar to the JOIN operator, but the difference is that the right-hand side operator of APPLY can reference columns from the left-hand side”.   In simple terms, a join relies on self-sufficient sets of data, i.e. sets should not depend on each other.  On the other hand, CROSS APPLY is only based on one predefined set and can be used with another separately created set.  A worked example should help with understanding this difference.  Example Problem Scenario:             Find the last 3 orders from all customers that live in the UK. Database:            AdverntureWorksDW2014 – Download the free Microsoft database here. Tables:                 dbo.FactInternetSales                               dbo.DimCustomer 1.      Create a function that accepts CustomerKey and the Top(N) orders (e.g. last 3 orders) to be returned.   USE [AdventureWorksDW2014] GO CREATE FUNCTION dbo.GetTopOrders (@CustomerKey INT, @N INT) RETURNS TABLE AS RETURN -- find last 3 orders, placed on different days        SELECT DISTINCT TOP(@N)                        SalesOrderNumber,                     OrderDate,                     CustomerKey        FROM   [dbo].[FactInternetSales]        WHERE  CustomerKey = @CustomerKey        ORDER BY OrderDate DESC, SalesOrderNumber DESC 2.    Using traditional CROSS JOIN approach.  The C.CustomerKey reference (in the CROSS JOIN & function) is intended to dynamically pass through every customer, with the ‘3’ being the number of orders to show per customer. SELECT       C.CustomerKey,              C.FirstName,              C.LastName,              C.EmailAddress,              O.SalesOrderNumber,              O.OrderDate FROM         dbo.DimCustomer C CROSS JOIN              dbo.GetTopOrders(C.CustomerKey,3) AS O   The reason why this fails is because C.CustomerKey is referencing the left set (DimCustomer), but does not exist itself in the right set (the Function).  Joins require pre-defined sets from both sides, which is not how this query works. 3.     Using CROSS APPLY.  This operator reference the left set first, before then checking the second set against the first one.  The second set runs a row by row basis, passing the CustomerKey in a type of recursive variable. SELECT C.CustomerKey,              C.FirstName,              C.LastName,              C.EmailAddress,              O.SalesOrderNumber,              O.OrderDate FROM   dbo.DimCustomer C CROSS APPLY              dbo.GetTopOrders(C.CustomerKey,3) AS O   -- 26760 rows affected The only change to the code is changing CROSS JOIN to CROSS APPLY.  Although the sets are analysed separately, you can still return data from both – just like a Join.  4.     The above example would also apply to an INNER JOIN.  The left set only returns data if it matches with the right side, so if there is not a match, neither set is applied.  A way to ensure all records are returned, is to use OUTER APPLY. For the purpose of this article, I have added a new customer (not shown below) to the dbo.Customers table, who has yet to place an order.  We can now run the original CROSS APPLY query, but as an OUTER APPLY instead. SELECT C.CustomerKey,              C.FirstName,              C.LastName,              C.EmailAddress,              O.SalesOrderNumber,              O.OrderDate FROM   dbo.DimCustomer C OUTER APPLY              dbo.GetTopOrders(C.CustomerKey,3) AS O WHERE  O.SalesOrderNumber IS NULL   The additional ‘Where’ clause specifically returns the non-matching record, but in total 26761 rows where affected. Conclusion As demonstrated, CROSS APPLY is a very useful operator when referencing a Table Function or filtering on a subset of data.  The key benefit is the ability to use one defined set with another separately created set.  Unlike Joins, you can define a dynamic subset of data to match with the outer query. APPLY is a very underrated and underused within the SQL Server Community.  Look out for future blogs, where I demonstrate the uses of the other APPLY operators and how they can produce more efficient queries. References For further reading on the keywords in this article, try the recommended links below: 1.      Boost your T-SQL with CROSS APPLY - http://www.microsoftvirtualacademy.com/training-courses/boost-your-t-sql-with-the-apply-operator 2.      CROSS APPLY in SQL Server - http://www.microsoftvirtualacademy.com/training-courses/boost-your-t-sql-with-the-apply-operator 3.      Using APPLY – https://technet.microsoft.com/en-us/library/ms175156(v=sql.105).aspx 4.      INNER JOIN VS CROSS APPLY - http://explainextended.com/2009/07/16/inner-join-vs-cross-apply/