Friday, July 3, 2015

Set operations in SQL Server - Part 2: ANY() and ALL()

In the previous post Set operations in SQL Server - Part 1, we looked at the basic set options Union, Intersection, Complement (or subtraction) and how they can be used and combined.  We also looked at the problem of determining table equality in a relational database -- especially when duplicate rows are permitted.  In this installment, we will look at various ways of testing set membership.

Set membership is a property of an item that may or may not be a member of a given set.  In database terms, membership implies that there exists some row in a table that satisfies the membership test.  This usually means that one or more columns contain data that satisfy some constraint such as equality or inequality.  Testing for set membership is an essential operation in processing data in a relational database, such as SQL Server.

Starting easy

Let's start off with a simple example.  Suppose you have a table that contains information on bicycles.  Perhaps it looks something like this:

CREATE TABLE Bicycle ( BicycleId INT , ModelName VARCHAR(50) , Manufacturer VARCHAR(50) , Cost DECIMAL(6, 2) ) -- Add some data INSERT INTO Bicycle (BicycleId, ModelName, Manufacturer, Cost) VALUES (1, '700', 'Trek', 1000.01), (2, '500', 'Giant', 600.00)

Now, suppose this table was populated with various bike data and you want to see if there are any bicycles that cost over $1,000.00.  A typical query might look like:

IF EXISTS(SELECT * FROM Bicycle WHERE Cost > 1000.00) PRINT 'There is at least one bicycle costing over $1,000.00'

This is simple test for membership that prints a message if the constraint is satisfied. Note that it uses the function EXISTS(). There are actually four functions available to help us with membership testing like this: EXISTS(), IN(), ANY() (a.k.a SOME()) and ALL(). You're probably familiar with EXISTS() and IN() and may have used them before. Perhaps you are curious about the ANY() and ALL() functions and what they can do for you.

Using ANY()

Let's rewrite the sample query using the ANY() function instead:

IF ANY (SELECT Cost FROM Bicycle) > 1000.00 PRINT 'There is at least one bicycle costing over $1,000.00'

This query returns the same result as the first. It is actually a more direct translation of the requirement and is shorter to boot! Before we go further though, I'm going to rewrite this query this way:

IF 1000 < ANY (SELECT Cost FROM Bicycle) PRINT 'There is at least one bicycle costing over $1,000.00'

Why did I do that? Well, sometimes the query inside the call to the ALL() function can be long, making it harder to spot the condition being tested.

Using ALL()

Now that we've seen how ANY() operates, what about ALL()?  It turns out that this is fairly intuitive. The function performs the desired comparison against all rows in the query and only returns True if the comparison is True for every member examined. Sticking with out bicycle example:

IF 1000 < ALL (SELECT Cost FROM Bicycle) PRINT 'All bicycles cost more than $1,000.00'

If you try this against the sample data inserted above, you should receive no message, since there is at least one bicycle (the Giant 500) that costs less than $1,000.00.

Negating ANY() and ALL()

SQL allows for these operators when using ANY() and ALL():
    { = | <> | != | > | >= | !> | < | <= | !< } 
 
However the first time you use the negative forms, you may be surprised at the results.  For example, if you run this query:

IF 1000 !< ANY (SELECT Cost FROM Bicycle) PRINT 'No bicycle costs less than $1,000.00'

you might be surprised to see that the message is indeed printed! If so, that is probably because you are reading the query in the ordinary English meaning of the command, "If there is no bicycle that costs less than $1,000.00, print the message." However, SQL sees it differently. You see,

    <condition> ANY()

is really short hand for:

    value <condition> first item OR value <condition> second item, etc.

With our sample data, negating that:

    1000.00 !< ANY(...)

is executed as:

    1000.00 !< 1000.01 OR 1000.00 !< 600.00

Clearly. 1000.00 > 600.00 so the expression evaluates to True and the message is printed.  You might have expected behavior similar to:

IF NOT EXISTS(SELECT * FROM Bicycle WHERE Cost < 1000.00) PRINT 'No bicycle costs less than $1,000.00'

which would result in no message, but NOT EXISTS() is not the same as !< ANY() as shown. In fact, this is a case where using the alias SOME() can make the intent clearer.  That is:

IF 1000 !< SOME (SELECT Cost FROM Bicycle) PRINT 'No bicycle costs less than $1,000.00'

more clearly indicates what we're asking for. What about negating the ALL() function? Well, similar to ANY(), ALL() is evaluated like this:

    value <condition> first item AND value <condition> second item etc. So, this query:

IF 1000.00 !< ALL (SELECT Cost FROM Bicycle) PRINT 'No bicycle costs less than $1,000.00'

is executed as:

    1000.00 !< 1000.01 AND 1000.00 !< 600.00

which evaluates to False, so the message is not printed.

Note:  We can rewrite the !< ANY() query (that didn't work) to achieve the desired result in this way:

IF NOT 1000 < ALL (SELECT Cost FROM Bicycle) PRINT 'No bicycle costs less than $1,000.00'

which, when you think about it, makes perfect sense!

Summary

ANY() and ALL() are valuable functions in SQL Server that can be used to write succint queries that reflect the business logic.  However, we need to be careful when using them with negative comparisons, since their execution differs from the normal, English reading. 

No comments:

Post a Comment