- Union (UNION): Return a combination of two sets, eliminating duplicates
- Intersection(INTERSECT): Return a set consisting of members present in both of two sets
- Complement(EXCEPT): Return a set consisting of members present in one set but not the other
- Symmetric difference (A EXCEPT B UNION B EXCEPT A): Union of the two complements
- Cartesian product (A CROSS JOIN B): Set of all possible tuples from two sets
- Cardinality (COUNT(*))

### Testing set equality

One interesting problem is how to determine the equality of two sets in an RDBMS while allowing for duplicate rows. In set theory, two sets are equivalent if every member of set A is also a member of set B. That means we must have:A \ B = B \ A = {}, the empty set, where "\" is the operator for the complement operation. We can also look at the cardinality:

|A\B| = |B\A| = 0

For tables in an RDBMS with a primary key (which can have no duplicate rows), we can check equality simply:

(
SELECT * FROM A
EXCEPT
SELECT * FROM B
)
UNION
(
SELECT * FROM B
EXCEPT
SELECT * FROM A
)

which you might recognize as computing the symmetric difference. If this query returns no rows, the two tables are equal. (The parentheses are needed to overcome the default operator precedence in SQL Server. See EXCEPT and INTERSECT (Transact-SQL) for details) For tables in an RDBMS without a primary key (which enforces uniqueness), we need to use the cardinality to account for cases where some rows are duplicated. In SQL we can write:

SELECT (
SELECT count(*)
FROM (
SELECT * FROM A
EXCEPT
SELECT * FROM B
) [A\B]
) AS [|A\B|]
, (
SELECT count(*)
FROM (
SELECT * FROM B
EXCEPT
SELECT * FROM A
) [B\A]
) AS [|B\A|]

but this won't work if the two tables have the same unique rows but different duplicated rows. For example, if we include the above logic in a WITH statement such as this:

WITH A AS
(
SELECT * FROM (VALUES (1),(1),(2),(3),(4)) v(n)
)
, B AS
(
SELECT * FROM (VALUES (1),(2),(2),(3),(4)) v(n)
)
-- query from previous example --

we can see that the results indicate equality. That is, the orders of the two complements is indeed 0, but the tables are not equal, since table A has two rows of 1's but table B has two rows of 2's. What is happening here? SQL

*removes*duplicates as part of the set operations UNION, EXCEPT and INTERSECT!

To solve this problem, we need to compare the two tables row-by-row. However, since there is no primary key, how should we match them up? One simple way is to impose an ordering on the tables and assign a surrogate key. In this case we simply use the ROW_NUMBER() function. e.g.

WITH A(col1) AS
(
SELECT * FROM (VALUES (1),(1),(2),(3),(4)) v(n)
)
, B(col1) AS
(
SELECT * FROM (VALUES (1),(2),(2),(3),(4)) v(n)
)
,[A'] AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY col1) AS rn
FROM A
)
,[B'] AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY col1) AS rn
FROM B
)
SELECT (
SELECT count(*)
FROM [A'] a
INNER JOIN [B'] b
ON a.rn = b.rn
AND a.col1 = b.col1
) AS [|A join B|]
, ( SELECT COUNT(*) FROM A ) AS [|A|]
, ( SELECT COUNT(*) FROM B ) AS [|B|]

This produces the results:

|A join B| |A| |B|
4 5 5

We can see that, even though A and B have the same cardinality, the JOIN produces a different cardinality, showing us that at some point, the two sets diverge. Hence the two sets are not equal.

## No comments:

## Post a Comment