SQL (pronounced /ˌɛsˌkjuˈɛl/ S-Q-L; or alternatively as /ˈsiːkwəl/ "sequel") stands for Structured Query Language. SQL is a standard language that was designed to query and manage data in relational database management systems (RDBMSs).

An RDBMS is a database management system based on the relational model (a semantic model for representing data), which in turn is based on two mathematical branches: set theory and predicate logic.

"NULL marker" or just "NULL" (/nʌl/) is not a NULL value but rather a marker for a missing value.

SQL is both an ANSI and ISO standard language based on the relational model, designed for querying and managing data in an RDBMS.

SQL has several categories of statements, including data definition language (DDL), data manipulation language (DML), and data control language (DCL).

  • DDL deals with object definitions and includes statements such as CREATE, ALTER, and DROP.

  • DML allows you to query and modify data and includes statements such as SELECT, INSERT, UPDATE, DELETE, TRUNCATE, and MERGE.

  • DCL deals with permissions and includes statements such as GRANT and REVOKE.

Microsoft provides T-SQL as a dialect of, or an extension to, SQL in SQL Server—its on-premises RDBMS flavor, and in Azure SQL and Azure Synapse Analytics—its cloud-based RDBMS flavors.

T-SQL is based on standard SQL, but it also provides some nonstandard/proprietary extensions. Moreover, T-SQL does not implement all of standard SQL.

To run T-SQL code against a database, a client application needs to connect to a SQL Server instance and be in the context of, or use, the relevant database.

  • The application can still access objects from other databases by adding the database name as a prefix. That’s the case with both SQL Server and Azure SQL Managed Instance.

  • Azure SQL Database does not support cross-database/three-part name queries.

SQL Server supports a feature called contained databases that breaks the connection between a database user and an instance-level login.

  • The user (Windows or SQL authenticated) is fully contained within the specific database and is not tied to a login at the instance level.

  • When connecting to SQL Server, the user needs to specify the database he or she is connecting to, and the user cannot subsequently switch to other user databases.

Unless specified otherwise, all T-SQL references to the name of a database object can be a four-part name in the following form:

-- Machine -> * Servers (instances) -> * Databases -> * Schemas -> * Tables, * Views

server_name.[database_name].[schema_name].object_name

| database_name.[schema_name].object_name

| schema_name.object_name

| object_name
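As a sketch of these forms, assuming the TSQLV6 sample database used throughout these notes, the same table can be referenced with one-, two-, or three-part names (the server part matters mainly when querying linked servers):

```sql
-- Assumes the current database is TSQLV6.
SELECT orderid FROM Orders;              -- one-part name; resolves only if Orders
                                         -- is in the user's default schema
SELECT orderid FROM Sales.Orders;        -- two-part, schema-qualified (recommended)
SELECT orderid FROM TSQLV6.Sales.Orders; -- three-part, database-qualified
```

Schema-qualifying object names (the two-part form) is generally considered a best practice because it avoids ambiguity in name resolution.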

1. Data Integrity

SQL provides several mechanisms for enforcing data integrity:

  • PRIMARY KEY constraint

  • FOREIGN KEY constraint with actions like CASCADE, SET NULL, RESTRICT

  • NOT NULL constraint

  • CHECK constraint

  • UNIQUE constraint

  • DEFAULT constraint

  • Triggers

  • Stored procedures

USE TSQLV6;

DROP TABLE IF EXISTS dbo.Employees;

CREATE TABLE dbo.Employees (
    empid INT NOT NULL,
    firstname VARCHAR(30) NOT NULL,
    lastname VARCHAR(30) NOT NULL,
    hiredate DATE NOT NULL,
    mgrid INT NULL,
    ssn VARCHAR(20) NOT NULL,
    salary MONEY NOT NULL
);

1.1. PRIMARY KEY

A primary key constraint enforces the uniqueness of rows and also disallows NULLs in the constraint attributes.

  • Each unique combination of values in the constraint attributes can appear only once in the table—in other words, only in one row.

  • An attempt to define a primary key constraint on a column that allows NULLs will be rejected by the RDBMS.

  • Each table can have only one primary key.

ALTER TABLE dbo.Employees
  ADD CONSTRAINT PK_Employees
  PRIMARY KEY (empid);

To enforce the uniqueness defined by the logical primary key constraint, SQL Server creates a unique index behind the scenes.

  • A unique index is a physical object used by SQL Server to enforce uniqueness.

  • Indexes (not necessarily unique ones) are also used to speed up queries by avoiding sorting and unnecessary full table scans (similar to indexes in books).

1.2. UNIQUE

A unique constraint enforces the uniqueness of rows, allowing the concept of alternate keys from the relational model to be implemented in a database.

Unlike with primary keys, multiple unique constraints can be defined within the same table.

Also, a unique constraint is not restricted to columns defined as NOT NULL.

ALTER TABLE dbo.Employees
  ADD CONSTRAINT UNQ_Employees_ssn
  UNIQUE(ssn);

For the purpose of enforcing a unique constraint, SQL Server handles NULLs just like non-NULL values.

  • Consequently, for example, a single-column unique constraint allows only one NULL in the constrained column.

However, the SQL standard defines NULL-handling by a unique constraint differently, like so: “A unique constraint on T is satisfied if and only if there do not exist two rows R1 and R2 of T such that R1 and R2 have the same non-NULL values in the unique columns.”

  • In other words, only the non-NULL values are compared to determine whether duplicates exist.

  • Consequently, a standard single-column unique constraint would allow multiple NULLs in the constrained column.
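To emulate the standard behavior (multiple NULLs allowed) in SQL Server, one common workaround is a filtered unique index instead of a unique constraint. This is a sketch against the dbo.Employees table defined above, under the assumption that ssn were declared as a nullable column; the index name is illustrative:

```sql
-- A filtered unique index enforces uniqueness only among rows that
-- satisfy the filter predicate, so multiple NULLs are allowed while
-- duplicate non-NULL ssn values are still rejected.
CREATE UNIQUE INDEX idx_ssn_notnull
  ON dbo.Employees(ssn)
  WHERE ssn IS NOT NULL;
```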

1.3. FOREIGN KEY

A foreign key enforces referential integrity.

  • It is defined on one or more attributes in what’s called the referencing table and points to candidate key (primary key or unique constraint) attributes in what’s called the referenced table.

  • Note that the referencing and referenced tables can be one and the same.

  • The foreign key’s purpose is to restrict the values allowed in the foreign key columns to those that exist in the referenced columns.

DROP TABLE IF EXISTS dbo.Orders;

CREATE TABLE dbo.Orders (
    orderid INT NOT NULL,
    empid INT NOT NULL,
    custid VARCHAR(10) NOT NULL,
    orderts DATETIME2 NOT NULL,
    qty INT NOT NULL,
    CONSTRAINT PK_Orders
      PRIMARY KEY (orderid)
);
-- enforce an integrity rule that restricts the values supported by the empid column in the Orders table to the values that exist in the empid column in the Employees table.
ALTER TABLE dbo.Orders
  ADD CONSTRAINT FK_Orders_Employees
  FOREIGN KEY(empid)
  REFERENCES dbo.Employees(empid);
-- restrict the values supported by the mgrid column in the Employees table to the values that exist in the empid column of the same table.
ALTER TABLE dbo.Employees
  ADD CONSTRAINT FK_Employees_Employees
  FOREIGN KEY(mgrid)
  REFERENCES dbo.Employees(empid);
Note that NULLs are allowed in the foreign key columns (mgrid in the last example) even if there are no NULLs in the referenced candidate key columns.

1.4. CHECK

A check constraint is used to define a predicate that a row must meet to be entered into the table or to be modified.

ALTER TABLE dbo.Employees
  ADD CONSTRAINT CHK_Employees_salary
  CHECK(salary > 0.00);
Note that a check constraint rejects an attempt to insert or update a row when the predicate evaluates to FALSE. The modification will be accepted when the predicate evaluates to either TRUE or UNKNOWN.
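The accept-on-UNKNOWN behavior matters only for nullable columns. As a hypothetical sketch against the dbo.Employees table above (the constraint name and sample values are illustrative), a CHECK constraint on the nullable mgrid column does not reject a row with a NULL mgrid:

```sql
-- Hypothetical constraint on the nullable mgrid column.
ALTER TABLE dbo.Employees
  ADD CONSTRAINT CHK_Employees_mgrid
  CHECK(mgrid > 0);

-- NULL > 0 evaluates to UNKNOWN, not FALSE,
-- so this insert is accepted by the constraint.
INSERT INTO dbo.Employees(empid, firstname, lastname, hiredate, mgrid, ssn, salary)
  VALUES(1, 'Dave', 'Davis', '20200101', NULL, '111-11-1111', 100000.00);
```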

1.5. DEFAULT

A default constraint is associated with a particular attribute.

  • It’s an expression that is used as the default value when an explicit value is not specified for the attribute when inserting a row.

ALTER TABLE dbo.Orders
  ADD CONSTRAINT DFT_Orders_orderts
  DEFAULT(SYSDATETIME()) FOR orderts;

When done, run the following code for cleanup:

DROP TABLE IF EXISTS dbo.Orders, dbo.Employees;

2. Logical Query Processing

The logical query processing in standard SQL defines how a query should be processed and the final result achieved.

  • The database engine is free to physically process a query differently by rearranging processing phases, as long as the final result would be the same as that dictated by logical query processing.

  • The database engine’s query optimizer can—and in fact, often does—apply many transformation rules and shortcuts in the physical processing of a query as part of query optimization.

USE TSQLV6;

SELECT empid, YEAR(orderdate) AS orderyear, COUNT(*) AS numorders
FROM Sales.Orders
WHERE custid = 71
GROUP BY empid, YEAR(orderdate)
HAVING COUNT(*) > 1
ORDER BY empid, orderyear;
If an identifier is irregular—for example, if it has embedded spaces or special characters, starts with a digit, or is a reserved keyword—it must be delimited. There are a couple of ways to delimit identifiers in T-SQL. One is the standard SQL form using double quotes—for example, "Order Details". Another is the T-SQL-specific form using square brackets—for example, [Order Details].

In most programming languages, the lines of code are processed in the order that they are written. In SQL, things are different. Even though the SELECT clause appears first in the query, it is logically processed almost last. The clauses are logically processed in the following order:

FROM Sales.Orders
WHERE custid = 71
GROUP BY empid, YEAR(orderdate)
HAVING COUNT(*) > 1
SELECT empid, YEAR(orderdate) AS orderyear, COUNT(*) AS numorders
ORDER BY empid, orderyear
FROM -> WHERE -> GROUP BY -> HAVING -> SELECT (expressions -> DISTINCT) -> ORDER BY -> TOP/OFFSET-FETCH

2.1. FROM

The FROM clause is the very first query clause that is logically processed. It is used to specify the names of the tables to query and table operators that operate on those tables.

FROM Sales.Orders

2.2. WHERE

In the WHERE clause, a predicate, or logical expression, is specified to filter the rows returned by the FROM phase.

FROM Sales.Orders
WHERE custid = 71
T-SQL uses three-valued predicate logic, where logical expressions can evaluate to TRUE, FALSE, or UNKNOWN. With three-valued logic, saying “returns TRUE” is not the same as saying “does not return FALSE.” The WHERE phase returns rows for which the logical expression evaluates to TRUE, and it doesn’t return rows for which the logical expression evaluates to FALSE or UNKNOWN.
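As a sketch of this behavior against the TSQLV6 sample database, customers whose region is NULL are returned by neither an equality filter nor its negation, because both predicates evaluate to UNKNOWN for NULL inputs:

```sql
-- Neither query returns customers whose region is NULL:
-- NULL = N'WA' and NULL <> N'WA' both evaluate to UNKNOWN.
SELECT custid, region
FROM Sales.Customers
WHERE region = N'WA';

SELECT custid, region
FROM Sales.Customers
WHERE region <> N'WA';

-- To ask for NULLs explicitly, use IS NULL rather than = NULL.
SELECT custid, region
FROM Sales.Customers
WHERE region IS NULL;
```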

2.3. GROUP BY

The GROUP BY phase is used to arrange the rows returned by the previous logical query processing phase into groups, determined by the elements, or expressions, specified in the GROUP BY clause.

FROM Sales.Orders
WHERE custid = 71
GROUP BY empid, YEAR(orderdate)
  • If the query is a grouped query, all phases subsequent to the GROUP BY phase— including HAVING, SELECT, and ORDER BY—operate on groups as opposed to operating on individual rows.

  • Each group is ultimately represented by a single row in the final result of the query.

  • All expressions specified in clauses that are processed in phases subsequent to the GROUP BY phase are required to guarantee returning a scalar (single value) per group.

    SELECT empid, YEAR(orderdate) AS orderyear, freight -- sum(freight) AS totalfreight
    FROM Sales.Orders
    WHERE custid = 71
    GROUP BY empid, YEAR(orderdate);
    Msg 8120, Level 16, State 1, Line 1
    Column 'Sales.Orders.freight' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
    • Expressions based on elements that participate in the GROUP BY clause meet the requirement because, by definition, each such element represents a distinct value per group.

    • Elements that do not participate in the GROUP BY clause are allowed only as inputs to an aggregate function such as COUNT, SUM, AVG, MIN, or MAX.

      • Note that all aggregate functions that are applied to an input expression ignore NULLs.

        The COUNT(*) function isn’t applied to any input expression; it just counts rows irrespective of what those rows contain.

        • For example, consider a group of five rows with the values 30, 10, NULL, 10, 10 in a column called qty.

        • The expression COUNT(*) returns 5 because there are five rows in the group, whereas COUNT(qty) returns 4 because there are four known (non-NULL) values.

      • To handle only distinct (unique) occurrences of known values, specify the DISTINCT keyword before the input expression to the aggregate function, like COUNT(DISTINCT qty), AVG(DISTINCT qty) and so on.
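The qty example above can be sketched with a self-contained table value constructor (any five-row set with these values behaves the same):

```sql
-- For qty values 30, 10, NULL, 10, 10:
SELECT
  COUNT(*)            AS cnt_rows,      -- 5: counts rows, regardless of content
  COUNT(qty)          AS cnt_known,     -- 4: the NULL is ignored
  COUNT(DISTINCT qty) AS cnt_distinct,  -- 2: distinct known values (10 and 30)
  SUM(qty)            AS sum_qty        -- 60: aggregates ignore the NULL
FROM (VALUES(30), (10), (NULL), (10), (10)) AS D(qty);
```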

2.4. HAVING

Whereas the WHERE clause is a row filter, the HAVING clause is a group filter.

  • Only groups for which the HAVING predicate evaluates to TRUE are returned by the HAVING phase to the next logical query processing phase.

  • Groups for which the predicate evaluates to FALSE or UNKNOWN are discarded.

  • The HAVING clause is processed after the rows have been grouped, so aggregate functions can be referred to in the HAVING filter predicate.

    SELECT empid, YEAR(orderdate) AS orderyear, SUM(freight) AS totalfreight
    FROM Sales.Orders
    WHERE custid = 71
    GROUP BY empid, YEAR(orderdate)
    -- filters only groups (employee and order year) with more than one row, and total freight with more than 500.0
    HAVING COUNT(*) > 1 AND SUM(freight) > 500.0
    ORDER BY empid, YEAR(orderdate)
    1	2021	711.13
    2	2022	672.16
    4	2022	651.83
    6	2021	628.31
    7	2022	1231.56

2.5. SELECT

The SELECT clause is where to specify the attributes (columns) to return in the result table of the query.

SELECT empid, YEAR(orderdate) AS orderyear, COUNT(*) AS numorders
FROM Sales.Orders
WHERE custid = 71
GROUP BY empid, YEAR(orderdate)
HAVING COUNT(*) > 1
  • The SELECT clause is processed after the FROM, WHERE, GROUP BY, and HAVING clauses, which means that aliases assigned to expressions in the SELECT clause do not exist as far as clauses that are processed before the SELECT clause are concerned.

    It’s a typical mistake to try and refer to expression aliases in clauses that are processed before the SELECT clause, such as in the following example in which the attempt is made in the WHERE clause:

    SELECT orderid, YEAR(orderdate) AS orderyear
    FROM Sales.Orders
    WHERE orderyear > 2021;
    Msg 207, Level 16, State 1, Line 3
    Invalid column name 'orderyear'.

    One way around this problem is to repeat the expression YEAR(orderdate) in both the WHERE and SELECT clauses:

    SELECT orderid, YEAR(orderdate) AS orderyear
    FROM Sales.Orders
    WHERE YEAR(orderdate) > 2021;
In addition to the standard form <expression> AS <alias>, T-SQL also supports the forms <alias> = <expression> (“alias equals expression”) and <expression> <alias> (“expression space alias”).
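The three alias forms can be sketched in a single query; all produce the same column values, only the aliasing syntax differs:

```sql
-- Three equivalent ways to alias the same expression:
SELECT
  YEAR(orderdate) AS orderyear1, -- standard AS form (recommended for clarity)
  orderyear2 = YEAR(orderdate),  -- "alias equals expression" (T-SQL specific)
  YEAR(orderdate) orderyear3     -- "expression space alias" (easy to misread)
FROM Sales.Orders;
```

The "expression space alias" form is error prone: a missing comma between two column names in the SELECT list silently turns the second column into an alias of the first.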

In relational theory, a relational expression is applied to one or more input relations using operators from relational algebra and returns a relation as output. In SQL terms, a relation is a table, and a relational expression is a table expression.

Recall that a relation’s body is a set of tuples, and a set has no duplicates. Unlike relational theory, which is based on mathematical set theory, SQL is based on multiset theory.

  • The mathematical term multiset, or bag, is similar in some aspects to a set, but it does allow duplicates.

  • A table in SQL isn’t required to have a key.

    • Without a key, the table can have duplicate rows and therefore isn’t relational.

    • Even if the table does have a key, a SELECT query against the table can still return duplicate rows. SQL query results do not have keys.

  • SQL provides the means to remove duplicates using the DISTINCT clause to return a relational result.

    SELECT DISTINCT empid, YEAR(orderdate) AS orderyear
    FROM Sales.Orders
    WHERE custid = 71;

2.6. ORDER BY

In terms of logical query processing, ORDER BY comes after SELECT.

  • With T-SQL, elements can also be specified in the ORDER BY clause that do not appear in the SELECT clause, meaning the result can be sorted by something that is not necessarily returned.

    SELECT empid, firstname, lastname, country
    FROM HR.Employees
    ORDER BY hiredate;
  • However, when the DISTINCT clause is specified, the ORDER BY clause is restricted to elements that appear in the SELECT list.

    SELECT DISTINCT empid, firstname, lastname, country
    FROM HR.Employees
    ORDER BY hiredate;
    Msg 145, Level 15, State 1, Line 1
    ORDER BY items must appear in the select list if SELECT DISTINCT is specified.

One of the most important points to understand about SQL is that a table—be it an existing table in the database or a table result returned by a query—has no guaranteed order. That’s because a table is supposed to represent a set of rows (or multiset, if it has duplicates), and a set has no order.

  • It means that when querying a table without specifying an ORDER BY clause, SQL Server is free to return the rows in the output in any order.

  • The only way to guarantee the presentation order in the result is with an ORDER BY clause.

  • However, realize that if an ORDER BY clause is specified, the result can’t qualify as a table, because it is ordered. Standard SQL calls such a result a cursor.

2.7. TOP

The TOP filter is a proprietary T-SQL feature that can be used to limit the number or percentage of rows the query returns. It relies on two elements as part of its specification: one is the number or percent of rows to return, and the other is the ordering.

SELECT TOP (5)
  orderid, orderdate, custid, empid
FROM Sales.Orders
ORDER BY orderdate DESC;
Note that the TOP filter is handled after DISTINCT.

TOP can be used with the PERCENT option, in which case SQL Server calculates the number of rows to return based on a percentage of the number of qualifying rows, rounded up.

SELECT TOP (1) PERCENT
  orderid, orderdate, custid, empid
FROM Sales.Orders
ORDER BY orderdate DESC;

The query returns nine rows because the Orders table has 830 rows, and 1 percent of 830, rounded up, is 9.

11074	2022-05-06	73	7
11075	2022-05-06	68	8
11076	2022-05-06	9	4
11077	2022-05-06	65	1
11070	2022-05-05	44	2
11071	2022-05-05	46	1
11072	2022-05-05	20	4
11073	2022-05-05	58	2
11067	2022-05-04	17	1

In the above query, notice that the ORDER BY list is not unique (because no primary key or unique constraint is defined on the orderdate column).

  • In other words, the ordering is not a strict total ordering; multiple rows can have the same order date.

  • In such a case, the ordering among rows with the same order date is undefined, which makes the query nondeterministic—more than one result can be considered correct.

  • In case of ties, SQL Server filters rows based on optimization choices and physical access order.

  • Note that when using the TOP filter in a query without an ORDER BY clause, the ordering is completely undefined—SQL Server returns whichever n rows it happens to physically access first, where n is the requested number of rows.

  • To make the query deterministic, a strict total ordering is needed; in other words, add a tiebreaker.

    SELECT TOP (5)
      orderid, orderdate, custid, empid
    FROM Sales.Orders
    ORDER BY orderdate DESC, orderid DESC; -- the row with the greater order ID value will be preferred.
    11077	2022-05-06	65	1
    11076	2022-05-06	9	4
    11075	2022-05-06	68	8
    11074	2022-05-06	73	7
    11073	2022-05-05	58	2
  • Instead of adding a tiebreaker to the ORDER BY list, a request can be made to return all ties by adding the WITH TIES option.

    SELECT TOP (5) WITH TIES
      orderid, orderdate, custid, empid
    FROM Sales.Orders
    ORDER BY orderdate DESC;
    • SQL Server first returned the TOP (5) rows based on orderdate DESC ordering, and it also returned all other rows from the table that had the same orderdate value as in the last of the five rows that were accessed.

    • Using the WITH TIES option, the selection of rows is deterministic, but the presentation order among rows with the same order date isn’t.

      11077	2022-05-06	65	1
      11076	2022-05-06	9	4
      11075	2022-05-06	68	8
      11074	2022-05-06	73	7
      11073	2022-05-05	58	2
      11072	2022-05-05	20	4
      11071	2022-05-05	46	1
      11070	2022-05-05	44	2
The TOP filter is very useful, but it has two shortcomings—it’s not standard, and it doesn’t support a skipping capability.

2.8. OFFSET-FETCH

T-SQL also supports a standard, TOP-like filter called OFFSET-FETCH, which supports a skipping option that makes it very useful for paging purposes.

According to the SQL standard, the OFFSET-FETCH filter is considered an extension to the ORDER BY clause. The OFFSET clause indicates how many rows to skip, and the FETCH clause indicates how many rows to return after the skipped rows.

SELECT orderid, orderdate, custid, empid
FROM Sales.Orders
ORDER BY orderdate, orderid
  OFFSET 50 ROWS FETCH NEXT 25 ROWS ONLY;
  -- OFFSET 50 ROWS;
  -- OFFSET 0 ROWS FETCH NEXT 25 ROWS ONLY;
Note that a query that uses OFFSET-FETCH must have an ORDER BY clause. Also, contrary to the SQL standard, T-SQL doesn’t support the FETCH clause without the OFFSET clause. However, OFFSET without FETCH is allowed; it skips the indicated number of rows and returns all remaining rows in the result.
In the syntax for the OFFSET-FETCH filter, the singular and plural forms ROW and ROWS are interchangeable, as are FIRST and NEXT, allowing the filter to be phrased in an intuitive, English-like manner.
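Because OFFSET-FETCH supports skipping, it lends itself naturally to paging. This is a sketch with hypothetical @pagenum and @pagesize variables (the names are illustrative):

```sql
-- Hypothetical paging parameters; this returns page 3, 25 rows per page.
DECLARE @pagenum AS INT = 3, @pagesize AS INT = 25;

SELECT orderid, orderdate, custid, empid
FROM Sales.Orders
ORDER BY orderdate, orderid          -- deterministic order: orderid is a tiebreaker
  OFFSET (@pagenum - 1) * @pagesize ROWS
  FETCH NEXT @pagesize ROWS ONLY;
```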

2.9. OVER

A window function is a function that, for each row in the underlying query, operates on a window (a set of rows derived from the underlying query result, defined with an OVER clause) and computes a scalar (single) result value.

SELECT orderid, custid, freight,
  ROW_NUMBER() OVER(PARTITION BY custid
                     ORDER BY freight) AS rownum
FROM Sales.Orders
ORDER BY custid, freight;
  • For each row in the underlying query, the OVER clause exposes to the function a subset of the rows from the underlying query’s result set.

  • The OVER clause can restrict the rows in the window by using an optional window partition clause (PARTITION BY).

  • It can define ordering for the calculation (if relevant) using a window order clause (ORDER BY)—not to be confused with the query’s presentation ORDER BY clause.

Window functions are defined by the SQL standard, and T-SQL supports a subset of the features from the standard.

3. Predicates and Operators

T-SQL has language elements in which predicates can be specified—for example, query filters such as WHERE and HAVING, the JOIN operator’s ON clause, CHECK constraints, and others.

T-SQL uses three-valued predicate logic, where logical expressions can evaluate to TRUE, FALSE, or UNKNOWN.

3.1. Predicates: IN, BETWEEN, and LIKE

  • The IN predicate is used to check whether a value, or scalar expression, is equal to at least one of the elements in a set.

    SELECT orderid, empid, orderdate
    FROM Sales.Orders
    WHERE orderid IN(10248, 10249, 10250);
  • The BETWEEN predicate is used to check whether a value falls within a specified range, inclusive of the two delimiters of the range.

    SELECT orderid, empid, orderdate
    FROM Sales.Orders
    WHERE orderid BETWEEN 10300 AND 10310;
  • The LIKE predicate is used to check whether a character string value meets a specified pattern.

    SELECT empid, firstname, lastname
    FROM HR.Employees
    WHERE lastname LIKE N'D%';
    Notice the use of the letter N to prefix the string 'D%'; it stands for National and is used to denote that a character string is of a Unicode data type (NCHAR or NVARCHAR), as opposed to a regular character data type (CHAR or VARCHAR).

3.2. Comparison Operators: =, >, <, >=, <=, and <>

  • T-SQL supports the following comparison operators: =, >, <, >=, <=, <>, !=, !>, and !<, of which the last three are not standard and should be avoided.

    SELECT orderid, empid, orderdate
    FROM Sales.Orders
    WHERE orderdate >= '20220101';

3.3. Logical Operators OR, AND, and NOT

  • The logical operators OR, AND, and NOT are used to combine logical expressions.

    SELECT orderid, empid, orderdate
    FROM Sales.Orders
    WHERE orderdate >= '20220101'
      AND empid NOT IN(1, 3, 5);

3.4. Arithmetic Operators: +, -, *, /, and %

  • T-SQL supports the four obvious arithmetic operators: +, -, *, and /, and also supports the % operator (modulo), which returns the remainder of integer division.

    SELECT orderid, productid, qty, unitprice, discount,
      qty * unitprice * (1 - discount) AS val
    FROM Sales.OrderDetails;

    Note that the data type of a scalar expression involving two operands is determined in T-SQL by the operand with the higher data-type precedence.

    • If both operands are of the same data type, the result of the expression is of the same data type as well.

    • If the two operands are of different types, the one with the lower precedence is promoted to the one that is higher.

      WITH Numbers AS (
          SELECT 5 AS IntValue, 2 AS IntDivisor, 5.0 AS FloatValue
      )
      SELECT
        IntValue / IntDivisor AS IntegerDivisionResult, -- Integer division
        CAST(IntValue AS NUMERIC(12, 2)) / CAST(IntDivisor AS NUMERIC(12, 2)) AS DecimalDivisionResult, -- Decimal division with casting
        FloatValue / IntDivisor AS DecimalDivisionFromFloatResult -- Division with a float
      FROM Numbers;

4. CASE Expressions

A CASE expression, based on the SQL standard, is a scalar expression that returns a value based on conditional logic.

Note that CASE is a scalar expression and not a statement; that is, it returns a value and is allowed wherever scalar expressions are allowed, such as in the SELECT, WHERE, HAVING, and ORDER BY clauses and in CHECK constraints.

There are two forms of CASE expressions: simple and searched.

  • The simple CASE expression has a single test value or expression right after the CASE keyword that is compared with a list of possible values or expressions, in the WHEN clauses.

    • If no value in the list is equal to the tested value, the CASE expression returns the value that appears in the ELSE clause (or NULL if an ELSE clause is not present).

      SELECT supplierid, COUNT(*) AS numproducts,
        CASE COUNT(*) % 2
           WHEN 0 THEN 'Even'
           WHEN 1 THEN 'Odd'
           ELSE 'Unknown'
         END AS countparity
      FROM Production.Products
      GROUP BY supplierid;
  • The searched CASE expression returns the value in the THEN clause that is associated with the first WHEN predicate that evaluates to TRUE.

    • If none of the WHEN predicates evaluates to TRUE, the CASE expression returns the value that appears in the ELSE clause (or NULL if an ELSE clause is not present).

      SELECT orderid, custid, freight,
        CASE
           WHEN freight < 1000.00  THEN 'Less than 1000'
           WHEN freight <= 3000.00 THEN 'Between 1000 and 3000'
           WHEN freight > 3000.00  THEN 'More than 3000'
           ELSE 'Unknown'
         END AS valuecategory
      FROM Sales.Orders;

5. Joins

T-SQL supports four table operators: JOIN, APPLY, PIVOT, and UNPIVOT.

  • The JOIN table operator is standard, whereas APPLY, PIVOT, and UNPIVOT are T-SQL extensions to the standard.

  • Each table operator acts on tables provided to it as input, applies a set of logical query processing phases, and returns a table result.

A JOIN table operator operates on two input tables with three fundamental types of joins: cross joins, inner joins, and outer joins.

  • A cross join applies only one phase—Cartesian Product.

  • An inner join applies two phases—Cartesian Product and Filter.

  • An outer join applies three phases— Cartesian Product, Filter, and Add Outer Rows.

5.1. Cross Joins

The cross join is the simplest type of join that implements only one logical query processing phase—a Cartesian Product.

  • It operates on the two tables provided as inputs and produces a Cartesian product of the two, that is, each row from one input is matched with all rows from the other.

    -- SQL-92 syntax
    SELECT C.custid, E.empid
    FROM Sales.Customers AS C
       CROSS JOIN HR.Employees AS E;
    
    -- SQL-89 syntax (not recommended)
    SELECT C.custid, E.empid
    FROM Sales.Customers AS C, HR.Employees AS E;
    
    -- Self cross joins
    SELECT
      E1.empid, E1.firstname, E1.lastname,
      E2.empid, E2.firstname, E2.lastname
    FROM HR.Employees AS E1
       CROSS JOIN HR.Employees AS E2;
    DROP TABLE IF EXISTS dbo.Digits;
    CREATE TABLE dbo.Digits (digit INT NOT NULL PRIMARY KEY);
    INSERT INTO dbo.Digits(digit)
      VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9);
    -- Producing tables of numbers
    SELECT D3.digit * 100 + D2.digit * 10 + D1.digit + 1 AS n
    FROM dbo.Digits AS D1
       CROSS JOIN dbo.Digits AS D2
       CROSS JOIN dbo.Digits AS D3
    ORDER BY n;
    1
    2
    3
    . . .
    998
    999
    1000

5.2. Inner Joins

An inner join applies two logical query processing phases—it applies a Cartesian product between the two input tables like in a cross join, and then it filters rows based on a specified predicate in a designated clause called ON.

-- SQL-92 syntax
SELECT E.empid, E.firstname, E.lastname, O.orderid
FROM HR.Employees AS E
  INNER JOIN Sales.Orders AS O
  ON E.empid = O.empid;

-- Note that the SQL-89 syntax has no ON clause.
SELECT E.empid, E.firstname, E.lastname, O.orderid
FROM HR.Employees AS E, Sales.Orders AS O
WHERE E.empid = O.empid;
As with the WHERE and HAVING clauses, the ON clause also returns only rows for which the predicate evaluates to TRUE, and it does not return rows for which the predicate evaluates to FALSE or UNKNOWN.
  • When a join condition involves only an equality operator, the join is said to be an equi join.

  • When a join condition involves any operator besides equality, the join is said to be a non-equi join.

    SELECT
      E1.empid, E1.firstname, E1.lastname,
      E2.empid, E2.firstname, E2.lastname
    FROM HR.Employees AS E1
      INNER JOIN HR.Employees AS E2
      ON E1.empid < E2.empid;

Standard SQL supports a concept called natural join, which represents an inner join based on a match between columns with the same name in both sides. T-SQL doesn’t have an implementation of a natural join.

For example, T1 NATURAL JOIN T2 joins the rows between T1 and T2 based on a match between the columns with the same names on both sides.
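Since T-SQL doesn’t implement NATURAL JOIN, the equivalent must be spelled out as an inner join with an explicit ON predicate listing the shared columns. A sketch against the TSQLV6 sample database, where Sales.Orders and Sales.OrderDetails share the orderid column:

```sql
-- Standard: Sales.Orders NATURAL JOIN Sales.OrderDetails (not valid T-SQL).
-- T-SQL equivalent, with the shared column made explicit:
SELECT O.custid, OD.productid, OD.qty
FROM Sales.Orders AS O
  INNER JOIN Sales.OrderDetails AS OD
  ON O.orderid = OD.orderid;
```

Making the join predicate explicit is arguably safer anyway: a natural join silently changes meaning if a column with a matching name is later added to either table.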

A join that has an explicit join predicate like equi join and non-equi join that is based on a binary operator (equality or inequality) is known as a theta join.

5.3. Outer Joins

Outer joins were introduced in SQL-92 and, unlike inner joins and cross joins, have only one standard syntax—the one in which the JOIN keyword is specified between the table names and the join condition is specified in the ON clause.

Outer joins apply the two logical processing phases that inner joins apply (Cartesian Product and the ON filter), plus a third phase called Adding Outer Rows that is unique to this type of join.

In an outer join, a table is marked as a preserved table by using the keywords LEFT OUTER JOIN, RIGHT OUTER JOIN, or FULL OUTER JOIN between the table names.

  • The OUTER keyword is optional.

  • The LEFT keyword means that the rows of the left table (the one to the left of the JOIN keyword) are preserved; the RIGHT keyword means that the rows in the right table are preserved; and the FULL keyword means that the rows in both the left and right tables are preserved.

  • The third logical query processing phase of an outer join identifies the rows from the preserved table that did not find matches in the other table based on the ON predicate. It adds those rows to the result table produced by the first two phases of the join, using NULLs as placeholders for the attributes from the nonpreserved side of the join in those outer rows.

    SELECT C.custid, C.companyname, O.orderid
    FROM Sales.Customers AS C
      LEFT OUTER JOIN Sales.Orders AS O
      ON C.custid = O.custid;
A FULL OUTER JOIN with the condition ON 1=1 is functionally equivalent to a CROSS JOIN.

If the predicate in the WHERE clause refers to an attribute from the nonpreserved side of the join using an expression in the form <attribute> <operator> <value>, it’s usually an indication of a bug. Attributes from the nonpreserved side of the join are NULLs in outer rows, and an expression in the form NULL <operator> <value> yields UNKNOWN (unless the operator is IS NULL, which explicitly looks for NULLs, or the distinct predicate IS [NOT] DISTINCT FROM), so such a filter discards all outer rows.

SELECT C.custid, C.companyname, O.orderid, O.orderdate
FROM Sales.Customers AS C
  LEFT OUTER JOIN Sales.Orders AS O
  ON C.custid = O.custid
WHERE O.orderdate >= '20220101'; -- Effectively, the join becomes an inner join.
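If the intent is to keep customers without qualifying orders, one possible rewrite (a sketch against the same sample tables, not the only option) moves the predicate from the WHERE clause into the ON clause, so it only affects matching:

```sql
-- Customers are still preserved; the date condition only limits which
-- orders match, so customers without 2022 orders appear with NULLs.
SELECT C.custid, C.companyname, O.orderid, O.orderdate
FROM Sales.Customers AS C
  LEFT OUTER JOIN Sales.Orders AS O
    ON C.custid = O.custid
   AND O.orderdate >= '20220101';
```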

If the predicate in the inner join’s ON clause compares an attribute from the nonpreserved side of the outer join and an attribute from the third table, all outer rows are discarded.

Remember that outer rows have NULLs in the attributes from the nonpreserved side of the join, and comparing a NULL with anything yields UNKNOWN. UNKNOWN is filtered out by the ON filter. In other words, such a predicate nullifies the outer join, effectively turning it into an inner join.

-- outer rows are dropped whenever any kind of outer join (left, right, or full) is
-- followed by a subsequent inner join or right outer join.
SELECT C.custid, O.orderid, OD.productid, OD.qty
FROM Sales.Customers AS C
  LEFT OUTER JOIN Sales.Orders AS O
  ON C.custid = O.custid
  INNER JOIN Sales.OrderDetails AS OD
  ON O.orderid = OD.orderid;
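One possible fix (a sketch, not the only option) is to use a left outer join for the second join as well, so customers without orders survive the second ON filter:

```sql
-- Both joins preserve the left side, so customers with no orders
-- are kept, with NULLs in the order and order-detail attributes.
SELECT C.custid, O.orderid, OD.productid, OD.qty
FROM Sales.Customers AS C
  LEFT OUTER JOIN Sales.Orders AS O
    ON C.custid = O.custid
  LEFT OUTER JOIN Sales.OrderDetails AS OD
    ON O.orderid = OD.orderid;
```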

6. Subqueries

SQL supports writing queries within queries, or nesting queries.

  • The outermost query is a query whose result set is returned to the caller and is known as the outer query.

  • The inner query is a query whose result set is used by the outer query and is known as a subquery.

  • A subquery can be either self-contained or correlated.

    • A self-contained subquery has no dependency on tables from the outer query, whereas a correlated subquery does.

    • A subquery can be single-valued, multivalued, or table-valued, that is, a subquery can return a single value, multiple values, or a whole table result.

  • A scalar subquery is a subquery that returns a single value and can appear anywhere in the outer query where a single-valued expression can appear (such as WHERE or SELECT).

    DECLARE @maxid AS INT = (SELECT MAX(orderid)
    FROM Sales.Orders);
    SELECT orderid, orderdate, empid, custid
    FROM Sales.Orders
    WHERE orderid = @maxid;
    -- substitute the above variable with a scalar self-contained subquery
    SELECT orderid, orderdate, empid, custid
    FROM Sales.Orders
    WHERE orderid = (SELECT MAX(O.orderid)
    FROM Sales.Orders AS O);
  • For a scalar subquery to be valid, it must return no more than one value.

    • If a scalar subquery returns more than one value, it fails at run time.

      SELECT orderid
      FROM Sales.Orders
      WHERE empid =
         (SELECT E.empid
      FROM HR.Employees AS E
      WHERE E.lastname LIKE N'D%');
      Msg 512, Level 16, State 1, Line 1
      Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
    • If a scalar subquery returns no value, the empty result is converted to a NULL.

  • A multivalued subquery is a subquery that returns multiple values as a single column; predicates such as IN operate on a multivalued subquery.

    SELECT orderid
    FROM Sales.Orders
    WHERE empid IN
      (SELECT E.empid
    FROM HR.Employees AS E
    WHERE E.lastname LIKE N'D%');
In some cases the database engine optimizes both the subquery and the join the same way; sometimes joins perform better, and sometimes subqueries perform better.
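For comparison, a join form of the same request (a sketch; the optimizer may or may not produce the same plan for both forms):

```sql
-- Orders handled by employees whose last name starts with D;
-- each order has a single empid, so the join introduces no duplicates.
SELECT O.orderid
FROM HR.Employees AS E
  INNER JOIN Sales.Orders AS O
    ON E.empid = O.empid
WHERE E.lastname LIKE N'D%';
```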
  • A correlated subquery is a subquery that refers to attributes from tables that appear in the outer query; this means the subquery is dependent on the outer query and cannot be invoked as a standalone query.

    • Logically, the subquery is evaluated separately for each outer row in the logical query processing step in which it appears.

    • To simplify things, it’s suggested to focus attention on a single row in the outer table and think about the logical processing that takes place in the inner query for that row.

      SELECT custid, orderid, orderdate, empid
      FROM Sales.Orders AS O1
      WHERE orderid =
        (SELECT MAX(O2.orderid)
      FROM Sales.Orders AS O2
      WHERE O2.custid = O1.custid);
    • T-SQL supports a predicate called EXISTS, which accepts a subquery as input and returns TRUE if the subquery returns any rows and FALSE otherwise.

      SELECT custid, companyname
      FROM Sales.Customers AS C
      WHERE country = N'Spain'
        AND EXISTS
           (SELECT *
        FROM Sales.Orders AS O
        WHERE O.custid = C.custid);
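Negating the predicate with NOT EXISTS returns the complementary set; a sketch against the same sample tables:

```sql
-- Customers from Spain who placed no orders.
SELECT custid, companyname
FROM Sales.Customers AS C
WHERE country = N'Spain'
  AND NOT EXISTS
      (SELECT *
       FROM Sales.Orders AS O
       WHERE O.custid = C.custid);
```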

7. Table Expressions

A table expression is an expression—typically a query—that conceptually returns a table result and as such can be nested as an operand of another table expression.

  • Recall that a table in SQL is the counterpart to a relation in relational theory.

  • A table expression is therefore SQL’s counterpart to a relational expression.

  • A relational expression in relational theory is an expression that returns a relation and as such can be nested as an operand of another relational expression.

  • A named table expression is then a table expression assigned a name, which you can interact with much as you do with a base table.

T-SQL supports four types of named table expressions: derived tables, common table expressions (CTEs), views, and inline table-valued functions (inline TVFs).

7.1. Derived Tables

Derived tables are defined in the FROM clause of an outer query and treated as if they were regular tables for that query; they are also sometimes referred to as inline views.

SELECT *
FROM (SELECT custid, companyname
  FROM Sales.Customers
  WHERE country = N'USA') AS USACusts;
SELECT orderyear, COUNT(DISTINCT custid) AS numcusts
FROM (SELECT YEAR(orderdate) AS orderyear, custid
  FROM Sales.Orders) AS D
GROUP BY orderyear;

7.2. CTEs

Common table expressions (CTEs) are another standard form of table expression similar to derived tables, yet with a couple of important advantages.

WITH <CTE_Name>[(<target_column_list>)]
AS
(
  <inner_query_defining_CTE>
)
<outer_query_against_CTE>;
  • CTEs also support two forms of column aliasing: inline and external. For the inline form, specify <expression> AS <column_alias>; for the external form, specify the target column list in parentheses immediately after the CTE name.

    WITH C AS
    (
      SELECT YEAR(orderdate) AS orderyear, custid
      FROM Sales.Orders
    )
    SELECT orderyear, COUNT(DISTINCT custid) AS numcusts
    FROM C
    GROUP BY orderyear;
    WITH C(orderyear, custid) AS
    (
      SELECT YEAR(orderdate), custid
      FROM Sales.Orders
    )
    SELECT orderyear, COUNT(DISTINCT custid) AS numcusts
    FROM C
    GROUP BY orderyear;
  • Each CTE can refer to all previously defined CTEs, and the outer query can refer to all CTEs.

    WITH C1 AS
    (
      SELECT YEAR(orderdate) AS orderyear,
        custid
      FROM Sales.Orders
    ),
    C2 AS
    (
      SELECT orderyear, COUNT(DISTINCT custid) AS numcusts
      FROM C1
      GROUP BY orderyear
    )
    SELECT orderyear, numcusts
    FROM C2
    WHERE numcusts > 70;
  • A single CTE can be referenced multiple times in table operators such as joins

    WITH YearlyCount AS
    (
      SELECT YEAR(orderdate) AS orderyear,
        COUNT(DISTINCT custid) AS numcusts
      FROM Sales.Orders
      GROUP BY YEAR(orderdate)
    )
    SELECT Cur.orderyear,
      Cur.numcusts AS curnumcusts, Prv.numcusts AS prvnumcusts,
      Cur.numcusts - Prv.numcusts AS growth
    FROM YearlyCount AS Cur
      LEFT OUTER JOIN YearlyCount AS Prv
      ON Cur.orderyear = Prv.orderyear + 1;
  • CTEs are unique among table expressions in the sense that they support recursion.

    Recursive CTEs, like nonrecursive ones, are defined by the SQL standard.
    WITH <CTE_Name>[(<target_column_list>)]
    AS
    (
      <anchor_member>
      UNION ALL
      <recursive_member>
    )
    <outer_query_against_CTE>;
    • A recursive CTE is defined by at least two queries (more are possible)—at least one query known as the anchor member and at least one query known as the recursive member.

    • The anchor member is a query that returns a valid relational result table—like a query that is used to define a nonrecursive table expression. The anchor member query is invoked only once.

    • The recursive member is a query that has a reference to the CTE name and is invoked repeatedly until it returns an empty set. The reference to the CTE name represents the previous result set.

    • The first time that the recursive member is invoked, the previous result set represents whatever the anchor member returned.

    • In each subsequent invocation of the recursive member, the reference to the CTE name represents the result set returned by the previous invocation of the recursive member.

    • Both queries must be compatible in terms of the number of columns they return and the data types of the corresponding columns.

    • The reference to the CTE name in the outer query represents the unified result sets of the invocation of the anchor member and all invocations of the recursive member.

      WITH EmpsCTE AS
      (
        SELECT empid, mgrid, firstname, lastname
        FROM HR.Employees
        WHERE empid = 2
      
        UNION ALL
      
        SELECT C.empid, C.mgrid, C.firstname, C.lastname
        FROM EmpsCTE AS P
          INNER JOIN HR.Employees AS C
            ON C.mgrid = P.empid
      )
      SELECT empid, mgrid, firstname, lastname
      FROM EmpsCTE;
      empid	mgrid	firstname	lastname
      2	1	Don	Funk
      3	2	Judy	Lew
      5	2	Sven	Mortensen
      6	5	Paul	Suurs
      7	5	Russell	King
      9	5	Patricia	Doyle
      4	3	Yael	Peled
      8	3	Maria	Cameron
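By default, SQL Server limits the number of recursive-member invocations to 100 and aborts with an error when the limit is exceeded; the proprietary MAXRECURSION query hint overrides the limit (0 removes it). A small sketch that generates the numbers 1 through 1000:

```sql
-- Without the hint this query would fail: producing 1,000 rows needs
-- far more than the default 100 recursive invocations.
WITH Nums AS
(
  SELECT 1 AS n

  UNION ALL

  SELECT n + 1 FROM Nums WHERE n < 1000
)
SELECT n FROM Nums
OPTION (MAXRECURSION 0); -- 0 removes the limit entirely
```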

7.3. Views and TVFs

Derived tables and CTEs have a single-statement scope, which means they are not reusable. Views and inline table-valued functions (inline TVFs) are two types of table expressions whose definitions are stored as permanent objects in the database, making them reusable.

CREATE OR ALTER VIEW Sales.USACusts
AS
  SELECT
    custid, companyname, contactname, contacttitle, address,
    city, region, postalcode, country, phone, fax
  FROM Sales.Customers
  WHERE country = N'USA';
GO -- The GO command is used here to terminate what’s called a batch in T-SQL.

SELECT custid, companyname
FROM Sales.USACusts;
  • Remember that a presentation ORDER BY clause is not allowed in the query defining a table expression because a relation isn’t ordered.

    CREATE OR ALTER VIEW Sales.USACusts
    AS
      SELECT
        custid, companyname, contactname, contacttitle, address,
        city, region, postalcode, country, phone, fax
      FROM Sales.Customers
      WHERE country = N'USA'
      ORDER BY region;
    GO
    Msg 1033, Level 15, State 1, Procedure USACusts, Line 8
    The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.
  • Inline TVFs are reusable table expressions that support input parameters.

    • In most respects, except for the support for input parameters, inline TVFs are similar to views; you can think of them as parameterized views.

    • T-SQL supports another type of table function called multi-statement TVF, which populates and returns a table variable.

      CREATE OR ALTER FUNCTION dbo.GetCustOrders
        (@cid AS INT) RETURNS TABLE
      AS
      RETURN
        SELECT orderid, custid, empid, orderdate, requireddate,
          shippeddate, shipperid, freight, shipname, shipaddress, shipcity,
          shipregion, shippostalcode, shipcountry
        FROM Sales.Orders
        WHERE custid = @cid;
      GO
      
      SELECT orderid, custid
      FROM dbo.GetCustOrders(1) AS O;
      GO
      
      SELECT O.orderid, O.custid, OD.productid, OD.qty
      FROM dbo.GetCustOrders(1) AS O
        INNER JOIN Sales.OrderDetails AS OD
        ON O.orderid = OD.orderid;
      GO
      
      DROP FUNCTION IF EXISTS dbo.GetCustOrders;

7.4. APPLY

The nonstandard APPLY operator operates on two input tables like a correlated join: instead of treating its two inputs as a set, it applies the right table expression (typically a derived table or a TVF) to each row from the left table (which is evaluated first) and produces a result table with the unified result sets.

  • When the right input is not correlated to the left, a CROSS APPLY operator is logically equivalent to a CROSS JOIN.

    SELECT S.shipperid, E.empid
    FROM Sales.Shippers AS S
      CROSS JOIN HR.Employees AS E;
    
    SELECT S.shipperid, E.empid
    FROM Sales.Shippers AS S
      CROSS APPLY HR.Employees AS E;
  • With APPLY, the left side is evaluated first, and the right side is evaluated per row from the left, and can have references to elements from the left.

    SELECT C.custid, A.orderid, A.orderdate
    FROM Sales.Customers AS C
      CROSS APPLY
        (SELECT TOP (3) orderid, empid, orderdate, requireddate
         FROM Sales.Orders AS O
         WHERE O.custid = C.custid
         ORDER BY orderdate DESC, orderid DESC) AS A; -- A is a correlated derived table
    Because the derived table is applied to each left row, the CROSS APPLY operator returns the three most recent orders for each customer.
    1	11011	2022-04-09
    1	10952	2022-03-16
    1	10835	2022-01-15
    2	10926	2022-03-04
    2	10759	2021-11-28
    2	10625	2021-08-08
    3	10856	2022-01-28
    3	10682	2021-09-25
    3	10677	2021-09-22
    . . .
  • If the right table expression returns an empty set, the CROSS APPLY operator does not return the corresponding left row. To return rows from the left side even if there are no matches on the right side, use OUTER APPLY.

    SELECT C.custid, A.orderid, A.orderdate
    FROM Sales.Customers AS C
      OUTER APPLY
        (SELECT orderid, empid, orderdate, requireddate
         FROM Sales.Orders AS O
         WHERE O.custid = C.custid AND O.custid IN (22, 57)
         ORDER BY orderdate DESC, orderid DESC
         OFFSET 0 ROWS FETCH FIRST 3 ROWS ONLY) AS A;
    1	NULL	NULL
    2	NULL	NULL
    3	NULL	NULL
  • With APPLY, it’s often more convenient to work with inline TVFs instead of derived tables.

    CREATE OR ALTER FUNCTION dbo.TopOrders
      (@custid AS INT, @n AS INT)
      RETURNS TABLE
    AS
    RETURN
      SELECT orderid, empid, orderdate, requireddate
      FROM Sales.Orders
      WHERE custid = @custid
      ORDER BY orderdate DESC, requireddate DESC
      OFFSET 0 ROWS FETCH NEXT @n ROWS ONLY;
    GO
    
    SELECT
      C.custid, C.companyname,
      A.orderid, A.empid, A.orderdate, A.requireddate
    FROM Sales.Customers AS C
      CROSS APPLY dbo.TopOrders(C.custid, 3) AS A;
    GO
    
    -- cleanup
    DROP FUNCTION IF EXISTS dbo.TopOrders;

8. UNION, UNION ALL, INTERSECT, and EXCEPT

Set operators are operators that combine rows from two query result sets (or multisets). Some of the operators remove duplicates from the result, and hence return a set, whereas others don’t, and hence return a multiset.

  • T-SQL supports the following operators: UNION, UNION ALL, INTERSECT, and EXCEPT.

  • A set operator compares complete rows between the results of the two input queries involved.

    Input Query1
    <set_operator>
    Input Query2
    [ORDER BY ...];
    • Because a set operator expects multisets as inputs, the two queries involved cannot have ORDER BY clauses.

      Remember that a query with an ORDER BY clause does not return a multiset—it returns an ordered result.
    • In terms of logical-query processing, each of the individual queries can have all logical-query processing phases except for a presentation ORDER BY.

    • The operator is applied to the results of the two queries, and the outer ORDER BY clause (if one exists) is applied to the result of the operator.


    • The two input queries must produce results with the same number of columns, and corresponding columns must have compatible data types.

    • The names of the columns in the result are determined by the first query. Still, it’s considered a best practice to make sure that all columns have names in both queries, and that the names of the corresponding columns are the same.

    • When a set operator compares rows between the two inputs, it doesn’t use an equality-based comparison; rather, it uses a distinctness-based comparison.

      The semantics of distinctness-based comparisons are the same as the ones used by a standard predicate called the distinct predicate that treats NULLs just like non-NULL values for comparison purposes.
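To see the difference, compare a set operator with an equality-based inner join over the same columns; in this sketch against the sample tables, rows whose region is NULL never match, because region = region yields UNKNOWN for them:

```sql
-- Unlike INTERSECT, this join loses locations with a NULL region.
SELECT DISTINCT E.country, E.region, E.city
FROM HR.Employees AS E
  INNER JOIN Sales.Customers AS C
    ON  C.country = E.country
    AND C.region  = E.region
    AND C.city    = E.city;
```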

  • The SQL standard supports two "flavors" of each operator—DISTINCT (the default) and ALL.

    • The DISTINCT flavor eliminates duplicates and returns a set.

    • ALL doesn’t attempt to remove duplicates and therefore returns a multiset.

    • All three operators in T-SQL support an implicit distinct version, but only the UNION operator supports the ALL version.

    • In terms of syntax, T-SQL doesn’t allow you to specify the DISTINCT clause explicitly. Instead, it’s implied when you don’t specify ALL.

  • SQL defines precedence among set operators: INTERSECT operator precedes UNION and EXCEPT, and UNION and EXCEPT are evaluated in order of appearance.

-- the result is a multiset and not a set
SELECT country, region, city FROM HR.Employees
UNION ALL
SELECT country, region, city FROM Sales.Customers;
-- returns distinct locations
SELECT country, region, city FROM HR.Employees
UNION
SELECT country, region, city FROM Sales.Customers;
-- returns only distinct rows that appear in both input query results
SELECT country, region, city FROM HR.Employees
INTERSECT
SELECT country, region, city FROM Sales.Customers;
-- returns only distinct rows that appear in the first set but not the second
SELECT country, region, city FROM HR.Employees
EXCEPT
SELECT country, region, city FROM Sales.Customers;
SELECT country, region, city FROM Production.Suppliers
EXCEPT
SELECT country, region, city FROM HR.Employees
INTERSECT -- evaluated first
SELECT country, region, city FROM Sales.Customers;
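Parentheses can override the default precedence; a sketch forcing the EXCEPT of the previous example to be evaluated first:

```sql
-- The parenthesized EXCEPT runs first; INTERSECT is applied
-- to its result and the customer locations.
(SELECT country, region, city FROM Production.Suppliers
 EXCEPT
 SELECT country, region, city FROM HR.Employees)
INTERSECT
SELECT country, region, city FROM Sales.Customers;
```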

9. Data Analysis

T-SQL in SQL Server offers robust features for data analysis, including window functions, pivoting, unpivoting, grouping sets, and time series data handling.

9.1. Window Functions

A window function is a function that, for each row, computes a scalar result value based on a calculation against a subset of rows from the underlying query result set, known as a window.

  • Window functions perform calculations on a per-row basis within a defined window of rows, preserving detail, whereas grouped queries lose detail by aggregation.

  • Window functions operate directly on the underlying query result set, while subqueries often start with a fresh view of the data, potentially requiring duplication of query logic.

  • Window functions can define the order of rows for calculations separately from the presentation order of the result set.

  • Window functions are allowed only in the SELECT and ORDER BY clauses of a query.

A window function is defined by using the OVER clause with up to three parts: window-partition, window-order, and window-frame.

<function>( <expression> ) [ IGNORE NULLS | RESPECT NULLS ] OVER(...)
  • An empty OVER() clause represents the entire underlying query’s result set.

  • The window-partition clause (PARTITION BY) restricts the window to the subset of rows that have the same values in the partitioning columns as in the current row.

  • The window-order clause (ORDER BY) defines ordering, but don’t confuse this with presentation ordering.

    • In a window aggregate function, window ordering supports a frame specification.

    • In a window ranking function, window ordering gives meaning to the rank.

  • The window-frame clause, defined using ROWS or RANGE, filters a frame, or subset, of rows from the window partition between two specified delimiters.

    • ROWS: Defines the frame based on the number of rows before and after the current row.

      ROWS BETWEEN <top delimiter> AND <bottom delimiter>
      • UNBOUNDED PRECEDING: The frame starts at the first row of the partition.

      • n PRECEDING: The delimiter is n rows before the current row.

      • CURRENT ROW: The delimiter is the current row itself.

      • n FOLLOWING: The delimiter is n rows after the current row.

      • UNBOUNDED FOLLOWING: The frame ends at the last row of the partition.

    • RANGE: Defines the frame based on the values of the ORDER BY column.

      RANGE BETWEEN <top delimiter> AND <bottom delimiter>
      • UNBOUNDED PRECEDING: The frame starts at the first row of the partition.

      • n PRECEDING: The delimiter is the current row’s ordering value minus n (T-SQL does not support this RANGE delimiter).

      • CURRENT ROW: The delimiter is the current row’s ordering value, so the frame includes the current row’s peers (rows tied in the ordering values).

      • n FOLLOWING: The delimiter is the current row’s ordering value plus n (T-SQL does not support this RANGE delimiter).

      • UNBOUNDED FOLLOWING: The frame ends at the last row of the partition.

    -- compute the running-total for each employee and month
    SELECT empid, ordermonth, val,
      SUM(val) OVER(
                    PARTITION BY empid -- For an underlying row with employee ID `1`, the window exposed to the function filters only the rows where the employee ID is `1`.
                    ORDER BY ordermonth
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                   ) AS runval
    FROM Sales.EmpOrders;
    1	2020-07-01	1614.88	1614.88
    1	2020-08-01	5555.90	7170.78
    1	2020-09-01	6651.00	13821.78
    . . .
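A RANGE counterpart of the same running total, as a sketch; with RANGE, the CURRENT ROW delimiter includes the current row's peers (ties in ordermonth), and T-SQL supports only UNBOUNDED and CURRENT ROW as RANGE delimiters:

```sql
-- With a unique ordermonth per employee, the result matches the
-- ROWS version; with ties it would include all peer rows.
SELECT empid, ordermonth, val,
  SUM(val) OVER(PARTITION BY empid
                ORDER BY ordermonth
                RANGE BETWEEN UNBOUNDED PRECEDING
                          AND CURRENT ROW) AS runval
FROM Sales.EmpOrders;
```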

9.1.1. Ranking

T-SQL supports four ranking functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE. They rank each row with respect to the others in the window.

SELECT orderid, custid, val,
  ROW_NUMBER() OVER(ORDER BY val) AS rownum,
  RANK()       OVER(ORDER BY val) AS rank,
  DENSE_RANK() OVER(ORDER BY val) AS dense_rank,
  NTILE(10)    OVER(ORDER BY val) AS ntile
FROM Sales.OrderValues
ORDER BY val;
orderid	custid	val	rownum	rank	dense_rank	ntile
10782	12	12.50	1	1	1	1
10807	27	18.40	2	2	2	1
10586	66	23.80	3	3	3	1
10767	76	28.00	4	4	4	1
10898	54	30.00	5	5	5	1
10900	88	33.75	6	6	6	1
10883	48	36.00	7	7	7	1
11051	41	36.00	8	7	7	1
10815	71	40.00	9	9	8	1
10674	38	45.00	10	10	9	1
11057	53	45.00	11	10	9	1
10271	75	48.00	12	12	10	1
. . .
10496	81	190.00	83	83	78	1
10793	4	191.10	84	84	79	2
10428	66	192.00	85	85	80	2
. . .
  • The ROW_NUMBER function assigns incremental sequential integers to the rows in the query result based on the mandatory window ordering.

  • The RANK and DENSE_RANK functions produce the same value for rows with ties in the ordering values. The difference between the two is that RANK reflects the count of rows that have a lower ordering value than the current row (plus 1), whereas DENSE_RANK reflects the count of distinct ordering values that are lower than the current row’s (plus 1).

  • The NTILE function associates the rows in the result with tiles (equally sized groups of rows) and assigns a tile number to each row.

    If the number of rows can’t be evenly divided by the number of tiles, an extra row is added to each of the first tiles from the remainder. For example, if 102 rows and five tiles were requested, the first two tiles would have 21 rows instead of 20.

  • Window functions are logically evaluated as part of the SELECT list, before the DISTINCT clause is evaluated.

    -- DISTINCT clause has no effect here, no duplicate rows to remove
    SELECT DISTINCT val, ROW_NUMBER() OVER(ORDER BY val) AS rownum
    FROM Sales.OrderValues;
    -- an alternative solution: GROUP BY phase is processed before the SELECT phase
    SELECT val, ROW_NUMBER() OVER(ORDER BY val) AS rownum
    FROM Sales.OrderValues
    GROUP BY val;

9.1.2. Offset

T-SQL supports two pairs of offset functions: LAG and LEAD, and FIRST_VALUE and LAST_VALUE, to return an element from a row that is at a certain offset from the current row or at the beginning or end of a window frame.

  • The LAG and LEAD functions look before and ahead respectively to obtain an element from a row that is at a certain offset from the current row within the partition, based on the indicated ordering.

    -- the LAG and LEAD functions support window partitions and window-order clauses.
    LAG(column_name, offset, default_value)  OVER(...)
    LEAD(column_name, offset, default_value) OVER(...)
    • column_name: (Mandatory) The element to return.

    • offset: (Optional) An integer specifying how many rows to offset from the current row (1 if not specified).

    • default_value: (Optional) The value to return when there is no row at the requested offset (NULL if not specified).

    SELECT custid, orderid, val,
      LAG(val)  OVER(PARTITION BY custid -- same as: LAG(val, 1, NULL)
                     ORDER BY orderdate, orderid) AS prevval,
      LEAD(val) OVER(PARTITION BY custid -- same as: LEAD(val, 1, NULL)
                     ORDER BY orderdate, orderid) AS nextval
    FROM Sales.OrderValues
    ORDER BY custid, orderdate, orderid;
    custid	orderid	val	prevval	nextval
    . . .
    1	10952	471.20	845.80	933.50
    1	11011	933.50	471.20	NULL
    2	10308	88.80	NULL	479.75
    2	10625	479.75	88.80	320.00
    . . .
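A sketch with an explicit offset and default, returning the value from three rows back within the customer partition and 0 when no such row exists:

```sql
SELECT custid, orderid, val,
  LAG(val, 3, 0) OVER(PARTITION BY custid
                      ORDER BY orderdate, orderid) AS prev3val
FROM Sales.OrderValues
ORDER BY custid, orderdate, orderid;
```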
  • The FIRST_VALUE and LAST_VALUE functions return an element from the first and last rows in the window frame, respectively.

    • To obtain the element from the first row in the window partition, use FIRST_VALUE with the window-frame extent ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

    • To obtain the element from the last row in the window partition, use LAST_VALUE with the window-frame extent ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING.

    SELECT custid, orderid, val,
      FIRST_VALUE(val) OVER(PARTITION BY custid
                            ORDER BY orderdate, orderid
                            ROWS BETWEEN UNBOUNDED PRECEDING
                                     AND CURRENT ROW) AS firstval,
      LAST_VALUE(val) OVER(PARTITION BY custid
                            ORDER BY orderdate, orderid
                            ROWS BETWEEN CURRENT ROW
                                     AND UNBOUNDED FOLLOWING) AS lastval
    FROM Sales.OrderValues
    ORDER BY custid, orderdate, orderid
    custid	orderid	val	firstval	lastval
    1	10643	814.50	814.50	933.50
    . . .
    1	11011	933.50	814.50	933.50
    2	10308	88.80	88.80	514.40
    . . .

9.1.3. Aggregate

The aggregate window functions aggregate the rows in the defined window, and support window-partition, window-order, and window-frame clauses.

SELECT orderid, custid, val,
  100. * val / SUM(val) OVER() AS pctall, -- percentage out of the grand total
  100. * val / SUM(val) OVER(PARTITION BY custid) AS pctcust -- percentage out of the customer total
FROM Sales.OrderValues;
SELECT empid, ordermonth, val,
  SUM(val) OVER(PARTITION BY empid
                ORDER BY ordermonth
                ROWS BETWEEN UNBOUNDED PRECEDING
                         AND CURRENT ROW) AS runval
FROM Sales.EmpOrders;

9.1.4. WINDOW

The WINDOW clause defines and names reusable entire window specifications or part of them, improving code readability and maintainability by reducing redundancy in complex queries with multiple window functions. It is available in SQL Server 2022 and higher, as well as in Azure SQL Database, provided that the database compatibility level is set to 160 or higher.

SELECT DATABASEPROPERTYEX(N'TSQLV6', N'CompatibilityLevel'); -- 160
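If the level is lower, it can be raised (a sketch; assumes sufficient permissions on the TSQLV6 sample database):

```sql
ALTER DATABASE TSQLV6 SET COMPATIBILITY_LEVEL = 160;
```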

When considering all major query clauses (SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY), place the WINDOW clause between the HAVING and ORDER BY clauses of the query.

SELECT empid, ordermonth, val,
  SUM(val) OVER W AS runsum,
  MIN(val) OVER W AS runmin,
  MAX(val) OVER W AS runmax,
  AVG(val) OVER W AS runavg
FROM Sales.EmpOrders
WINDOW W AS (PARTITION BY empid -- name an entire window specification
             ORDER BY ordermonth
             ROWS BETWEEN UNBOUNDED PRECEDING
                      AND CURRENT ROW);
SELECT custid, orderid, val,
  FIRST_VALUE(val) OVER(PO
                        ROWS BETWEEN UNBOUNDED PRECEDING
                                 AND CURRENT ROW) AS firstval,
  LAST_VALUE(val)  OVER(PO
                        ROWS BETWEEN CURRENT ROW
                                 AND UNBOUNDED FOLLOWING) AS last
FROM Sales.OrderValues
WINDOW PO AS (PARTITION BY custid -- name part of a window specification
              ORDER BY orderdate, orderid)
ORDER BY custid, orderdate, orderid;
SELECT orderid, custid, orderdate, qty, val,
  ROW_NUMBER() OVER PO AS ordernum,
  MAX(orderdate) OVER P AS maxorderdate,
  SUM(qty) OVER POF AS runsumqty,
  SUM(val) OVER POF AS runsumval
FROM Sales.OrderValues
WINDOW P AS ( PARTITION BY custid ), -- reuse one window name within the definition of another
       PO AS ( P ORDER BY orderdate, orderid ),
       POF AS ( PO ROWS UNBOUNDED PRECEDING )
ORDER BY custid, orderdate, orderid;

9.2. Pivoting

Pivoting data involves rotating data from a state of rows to a state of columns, possibly aggregating values along the way; in many cases it is handled by the presentation layer for purposes such as reporting.

-- create and populate the sample table dbo.Orders
USE TSQLV6;

DROP TABLE IF EXISTS dbo.Orders;

CREATE TABLE dbo.Orders
(
  orderid   INT        NOT NULL
    CONSTRAINT PK_Orders PRIMARY KEY,
  orderdate DATE       NOT NULL,
  empid     INT        NOT NULL,
  custid    VARCHAR(5) NOT NULL,
  qty       INT        NOT NULL
);

INSERT INTO dbo.Orders(orderid, orderdate, empid, custid, qty)
VALUES
  (30001, '20200802', 3, 'A', 10),
  (10001, '20201224', 2, 'A', 12),
  (10005, '20201224', 1, 'B', 20),
  (40001, '20210109', 2, 'A', 40),
  (10006, '20210118', 1, 'C', 14),
  (20001, '20210212', 2, 'B', 12),
  (40005, '20220212', 3, 'A', 10),
  (20002, '20220216', 1, 'C', 20),
  (30003, '20220418', 2, 'B', 15),
  (30004, '20200418', 3, 'C', 22),
  (30007, '20220907', 3, 'D', 30);
-- query and return the total order quantity for each employee and customer
SELECT empid, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY empid, custid;
empid	custid	sumqty
2	A	52
3	A	20
1	B	20
2	B	27
1	C	34
3	C	22
3	D	30
Pivoted view of total quantity per employee (on rows) and customer (on columns)
empid	A	B	C	D
1	NULL	20	34	NULL
2	52	27	NULL	NULL
3	20	NULL	22	30

Every pivoting request involves three logical processing phases, each with associated elements:

  1. A grouping phase with an associated grouping or on rows element

  2. A spreading phase with an associated spreading or on cols element

  3. An aggregation phase with an associated aggregation element and aggregate function

9.2.1. Grouped Query

SELECT empid,
  SUM(                                      -- (3)
      CASE WHEN custid = 'A' THEN qty END   -- (2)
     ) AS A,
  SUM(CASE WHEN custid = 'B' THEN qty END) AS B,
  SUM(CASE WHEN custid = 'C' THEN qty END) AS C,
  SUM(CASE WHEN custid = 'D' THEN qty END) AS D
FROM dbo.Orders
GROUP BY empid;                             -- (1)
1 The grouping phase is achieved with a GROUP BY clause—in this case, GROUP BY empid.
2 The spreading phase is achieved in the SELECT clause with a CASE expression for each target column.
-- returns the quantity from the current row only when
-- the current row represents an order for customer A;
--  otherwise, the expression returns a NULL.
CASE WHEN custid = 'A' THEN qty END
3 Finally, the aggregation phase is achieved by applying the relevant aggregate function to the result of each CASE expression.
-- produces the result column for customer A
SUM(CASE WHEN custid = 'A' THEN qty END) AS A

9.2.2. PIVOT Operator

T-SQL also supports a proprietary table operator called PIVOT, which operates on a source table or table expression provided to it as its left input, pivots the data, and returns a result table.

SELECT ...
FROM <input_table>
  PIVOT(<agg_function>(<aggregation_element>)
          FOR <spreading_element> IN (<list_of_target_columns>))
WHERE ...;
  • The PIVOT operator figures out the grouping elements implicitly, by elimination: they are all the attributes from the source table that were not specified as either the spreading element or the aggregation element.

    -- custid is the spreading element
    -- qty is the aggregation element
    -- the left empid is the implied grouping element
    SELECT empid, A, B, C, D
    FROM (SELECT empid, custid, qty
          FROM dbo.Orders) AS D
      PIVOT(SUM(qty) FOR custid IN(A, B, C, D)) AS P;
  • As a best practice with the PIVOT operator, always work with a table expression rather than querying the underlying table directly.

    -- the dbo.Orders table contains the attributes orderid, orderdate, empid, custid, and qty.
    -- the remaining attributes (orderid, orderdate, and empid) are all considered the grouping elements
    SELECT empid, A, B, C, D
    FROM dbo.Orders
      PIVOT(SUM(qty) FOR custid IN(A, B, C, D)) AS P;
  • The items in the list of the target columns must be referred to as identifiers in the IN clause, and be delimited using square brackets when they are irregular (contain spaces, special characters, or are reserved keywords).

    SELECT custid, [1], [2], [3], [4]
    FROM (SELECT empid, custid, qty
          FROM dbo.Orders) AS D
      PIVOT(SUM(qty) FOR empid IN ([1], [2], [3], [4])) AS P;

9.3. Unpivoting

Unpivoting is a technique that rotates data from a state of columns to a state of rows.

-- create and populate the sample table dbo.EmpCustOrders
USE TSQLV6;

DROP TABLE IF EXISTS dbo.EmpCustOrders;

CREATE TABLE dbo.EmpCustOrders
(
  empid INT NOT NULL
    CONSTRAINT PK_EmpCustOrders PRIMARY KEY,
  A VARCHAR(5) NULL,
  B VARCHAR(5) NULL,
  C VARCHAR(5) NULL,
  D VARCHAR(5) NULL
);

INSERT INTO dbo.EmpCustOrders(empid, A, B, C, D)
  SELECT empid, A, B, C, D
  FROM (SELECT empid, custid, qty
        FROM dbo.Orders) AS D
    PIVOT(SUM(qty) FOR custid IN(A, B, C, D)) AS P;
SELECT * FROM dbo.EmpCustOrders;
empid	A	B	C	D
1	NULL	20	34	NULL
2	52	27	NULL	NULL
3	20	NULL	22	30
Unpivoted data returns a row for each employee and customer, along with the order quantity
empid	custid	qty
1	B	20
1	C	34
2	A	52
2	B	27
3	A	20
3	C	22
3	D	30

9.3.1. APPLY Operator

-- 1. Producing copies
SELECT *
FROM dbo.EmpCustOrders
  CROSS JOIN (VALUES('A'),('B'),('C'),('D')) AS C(custid);

--	empid	A	B	C	D	custid
--	1	NULL	20	34	NULL	A
--	1	NULL	20	34	NULL	B
--	1	NULL	20	34	NULL	C
--	. . .

-- 2. Extracting values
SELECT empid, custid, qty
FROM dbo.EmpCustOrders
  -- a join treats its two inputs as a set, so the right input
  -- cannot refer to elements from the left input;
  -- use the CROSS APPLY operator instead of the CROSS JOIN operator
  -- to refer to the columns A, B, C, and D from the left side (EmpCustOrders)
  CROSS APPLY (VALUES('A', A),('B', B),('C', C),('D', D)) AS C(custid, qty);

--	empid	custid	qty
--	1	A	NULL
--	1	B	20
--	1	C	34
--	. . .

-- 3. Eliminating irrelevant rows
SELECT empid, custid, qty
FROM dbo.EmpCustOrders
  CROSS APPLY (VALUES('A', A),('B', B),('C', C),('D', D)) AS C(custid, qty)
WHERE qty IS NOT NULL; -- discard rows with a NULL in the qty column

--	empid	custid	qty
--	1	B	20
--	1	C	34
--	2	A	52
--	. . .

9.3.2. UNPIVOT Operator

Like the PIVOT operator, T-SQL also supports a proprietary UNPIVOT table operator to unpivot data. Unpivoting involves producing two result columns from any number of source columns: one to hold the source column names as strings and another to hold the source column values.

SELECT ...
FROM <input_table>
  UNPIVOT(<values_column> FOR <names_column> IN(<source_columns>)) AS <table_alias>
WHERE ...;
SELECT empid, custid, qty
FROM dbo.EmpCustOrders
  UNPIVOT(qty FOR custid IN(A, B, C, D)) AS U;

9.4. Grouping Sets

A grouping set is a set of expressions to group the data by in a grouped query (a query with a GROUP BY clause).

  • Traditionally in SQL, a single grouped query defines a single grouping set.

    -- set(empid, custid)
    SELECT empid, custid, SUM(qty) AS sumqty
    FROM dbo.Orders
    GROUP BY empid, custid;
    
    -- set(empid)
    SELECT empid, SUM(qty) AS sumqty
    FROM dbo.Orders
    GROUP BY empid;
    
    -- set(custid)
    SELECT custid, SUM(qty) AS sumqty
    FROM dbo.Orders
    GROUP BY custid;
    
    -- set()
    SELECT SUM(qty) AS sumqty
    FROM dbo.Orders;
  • UNION ALL with NULL placeholders can combine the separate grouped queries into a single result set for reporting, but this approach has two main problems: lengthy code and poor performance, because each separate query requires its own scan of the data.

    SELECT empid, custid, SUM(qty) AS sumqty
    FROM dbo.Orders
    GROUP BY empid, custid
    
    UNION ALL
    
    SELECT empid, NULL, SUM(qty) AS sumqty
    FROM dbo.Orders
    GROUP BY empid
    
    UNION ALL
    
    SELECT NULL, custid, SUM(qty) AS sumqty
    FROM dbo.Orders
    GROUP BY custid
    
    UNION ALL
    
    SELECT NULL, NULL, SUM(qty) AS sumqty
    FROM dbo.Orders;
  • T-SQL supports the standard GROUPING SETS, CUBE, and ROLLUP subclauses of the GROUP BY clause, and the GROUPING and GROUPING_ID functions to define multiple grouping sets in the same query for reporting and data analysis.

    • The GROUPING SETS subclause is a powerful enhancement to the GROUP BY clause to define multiple grouping sets in the same query.

      • List the grouping sets, separated by commas, within the parentheses of the GROUPING SETS subclause; within each grouping set, list its members, separated by commas, within parentheses.

      • SQL Server typically needs fewer scans of the data than the number of grouping sets because it can roll up aggregates internally.

        SELECT empid, custid, SUM(qty) AS sumqty
        FROM dbo.Orders
        GROUP BY
          GROUPING SETS
          (
            (empid, custid),
            (empid),
            (custid),
            ()
          );
  • The CUBE subclause of the GROUP BY clause provides an abbreviated way to define multiple grouping sets.

    • In the parentheses of the CUBE subclause, provide a set of members separated by commas, and get all possible grouping sets that can be defined based on the input members.

    • In set theory, the set of all subsets of elements that can be produced from a particular set is called the power set.

      For example, CUBE(a, b, c) is equivalent to GROUPING SETS( (a, b, c), (a, b), (a, c), (b, c), (a), (b), (c), () ).

      SELECT empid, custid, SUM(qty) AS sumqty
      FROM dbo.Orders
      GROUP BY CUBE(empid, custid);
  • The ROLLUP subclause of the GROUP BY clause also provides an abbreviated way to define multiple grouping sets.

    • Unlike CUBE, which produces all possible grouping sets, ROLLUP assumes a hierarchy among input members and produces only grouping sets that form leading combinations of those members.

      For example, whereas CUBE(a, b, c) produces all eight possible grouping sets, ROLLUP(a, b, c) assumes the hierarchy a>b>c and produces only four. It is the equivalent of specifying GROUPING SETS( (a, b, c), (a, b), (a), () ), rolling up the aggregations from the most granular level (a, b, c) through (a, b) and (a) to the grand total ().

      -- ROLLUP(YEAR(orderdate), MONTH(orderdate), DAY(orderdate))
      -- =>
      -- GROUPING SETS(
      --   (YEAR(orderdate), MONTH(orderdate), DAY(orderdate)),
      --   (YEAR(orderdate), MONTH(orderdate)),
      --   (YEAR(orderdate)),
      --   () )
      SELECT
        YEAR(orderdate) AS orderyear,
        MONTH(orderdate) AS ordermonth,
        DAY(orderdate) AS orderday,
        SUM(qty) AS sumqty
      FROM dbo.Orders
      GROUP BY ROLLUP(YEAR(orderdate), MONTH(orderdate), DAY(orderdate));
  • The GROUPING and GROUPING_ID functions are used to identify which elements of the GROUP BY clause are part of the current grouping set and which are represented by a NULL placeholder in the aggregated result set.

    • GROUPING: returns 1 when the element isn’t part of the grouping set and 0 otherwise.

      SELECT
        GROUPING(empid) AS grpemp,
        GROUPING(custid) AS grpcust,
        empid, custid, SUM(qty) AS sumqty
      FROM dbo.Orders
      GROUP BY CUBE(empid, custid);
      grpemp	grpcust	empid	custid	sumqty
      0	0	2	A	52
      0	0	3	A	20
      1	0	NULL	A	72
      . . .
    • GROUPING_ID: returns an integer bitmap in which each bit represents a different input element—the rightmost element represented by the rightmost bit.

      SELECT
        GROUPING_ID(empid, custid) AS groupingset,
        empid, custid, SUM(qty) AS sumqty
      FROM dbo.Orders
      GROUP BY CUBE(empid, custid);
      groupingset	empid	custid	sumqty
      0          	2    	A     	52  -- 00
      0          	3    	A     	20  -- 00
      2          	NULL 	A     	72  -- 10
      . . .

9.5. Time Series

Time series data is data representing a series of events, or measurements, typically taken at regular time intervals. Time series data analysis usually involves organizing the data in groups, also known as buckets, and then aggregating some measures per bucket.
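
As a sketch of the bucketing idea, the following query groups the dbo.Orders rows into 7-day buckets based on orderdate and aggregates qty per bucket. It assumes SQL Server 2022 or later, where the DATE_BUCKET function is available; on earlier versions, similar logic can be expressed with DATEADD/DATEDIFF arithmetic. The anchor date is an arbitrary illustrative choice.

```sql
-- group orders into 7-day buckets anchored at an arbitrary origin date
DECLARE @origin AS DATE = '20220101'; -- hypothetical anchor date

SELECT
  DATE_BUCKET(week, 1, CAST(orderdate AS DATE), @origin) AS bucketstart,
  SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY DATE_BUCKET(week, 1, CAST(orderdate AS DATE), @origin)
ORDER BY bucketstart;
```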

10. INSERT, DELETE, TRUNCATE, UPDATE, and MERGE

SQL has a set of statements known as Data Manipulation Language (DML) that includes the statements SELECT, INSERT, UPDATE, DELETE, TRUNCATE, and MERGE.

10.1. INSERT

T-SQL provides several statements for inserting data into tables: INSERT VALUES, INSERT SELECT, INSERT EXEC, SELECT INTO, and BULK INSERT.

10.1.1. INSERT VALUES

The standard INSERT VALUES statement is used to insert rows into a table based on specified values.

INSERT INTO dbo.Orders(orderid, orderdate, empid, custid)
   VALUES(10001, '20220212', 3, 'A');
  • Specifying the target column names right after the table name is optional, but doing so lets you control the value-column associations instead of relying on the order of the columns in the CREATE TABLE statement.

  • In T-SQL, specifying the INTO clause is optional.

  • If a value isn't specified for a column, SQL Server will use a default value if one was defined for the column.

  • If a default value isn’t defined and the column allows NULLs, a NULL will be used.

  • If no default is defined and the column does not allow NULLs and does not somehow get its value automatically, the INSERT statement will fail.
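
These three rules can be illustrated with a small hypothetical table (the table name, columns, and default constraint are illustrative, not part of the sample database):

```sql
-- hypothetical table demonstrating column defaults and nullability
CREATE TABLE dbo.DefaultsDemo
(
  keycol  INT          NOT NULL,
  created DATETIME2(0) NOT NULL
    CONSTRAINT DFT_DefaultsDemo_created DEFAULT (SYSDATETIME()),
  note    VARCHAR(50)  NULL
);

-- created gets its default value; note gets a NULL
INSERT INTO dbo.DefaultsDemo(keycol) VALUES(1);

-- fails: keycol has no default, does not allow NULLs,
-- and does not get its value automatically
INSERT INTO dbo.DefaultsDemo(note) VALUES('no key');
```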

T-SQL supports an enhanced standard VALUES clause that can be used to specify multiple rows separated by commas.

  • The enhanced standard VALUES statement is processed as a transaction, meaning that if any row fails to enter the table, none of the rows in the statement enters the table.

    INSERT INTO dbo.Orders
      (orderid, orderdate, empid, custid)
    VALUES
      (10003, '20220213', 4, 'B'),
      (10004, '20220214', 1, 'A'),
      (10005, '20220213', 1, 'C'),
      (10006, '20220215', 3, 'C');
  • The enhanced VALUES clause can be used as a table-value constructor to construct a derived table.

    SELECT *
    FROM ( VALUES
             (10003, '20220213', 4, 'B'),
             (10004, '20220214', 1, 'A'),
             (10005, '20220213', 1, 'C'),
             (10006, '20220215', 3, 'C') )
         AS O(orderid, orderdate, empid, custid); -- alias the table and name its target columns

10.1.2. INSERT SELECT

The standard INSERT SELECT statement inserts a set of rows returned by a SELECT query into a target table.

  • The INSERT SELECT statement is performed as a transaction, so if any row fails to enter the target table, none of the rows enters the table.

    INSERT INTO dbo.Orders(orderid, orderdate, empid, custid)
      SELECT orderid, orderdate, empid, custid
      FROM Sales.Orders
      WHERE shipcountry = N'UK';
    If a system function such as SYSDATETIME is included in the inserted query, the function gets invoked only once for the entire query and not once per row. The exception to this rule is generating globally unique identifiers (GUIDs) with the NEWID function, which gets invoked per row.
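
A quick way to observe the difference (the target table dbo.FnDemo is hypothetical): inserting several rows in one statement yields the same SYSDATETIME value in every row but a distinct NEWID value per row.

```sql
-- hypothetical target table
CREATE TABLE dbo.FnDemo
(
  n       INT              NOT NULL,
  whenrun DATETIME2        NOT NULL,
  guid    UNIQUEIDENTIFIER NOT NULL
);

INSERT INTO dbo.FnDemo(n, whenrun, guid)
  SELECT n, SYSDATETIME(), NEWID()
  FROM (VALUES(1),(2),(3)) AS D(n);

-- whenrun is identical in all three rows; guid differs per row
SELECT n, whenrun, guid FROM dbo.FnDemo;
```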

10.1.3. INSERT EXEC

The INSERT EXEC statement is used to insert a result set returned from a stored procedure or a dynamic SQL batch into a target table.

CREATE OR ALTER PROC Sales.GetOrders
  @country AS NVARCHAR(40)
AS
SELECT orderid, orderdate, empid, custid
FROM Sales.Orders
WHERE shipcountry = @country;
GO
INSERT INTO dbo.Orders (orderid, orderdate, empid, custid)
EXEC Sales.GetOrders @country = N'France';

10.1.4. SELECT INTO

The SELECT INTO statement is a nonstandard (not part of the ISO and ANSI SQL standard) T-SQL statement that creates a target table and populates it with the result set of a query.

DROP TABLE IF EXISTS dbo.Orders;

SELECT orderid, orderdate, empid, custid
INTO dbo.Orders
FROM Sales.Orders;
-- SELECT INTO statement with set operations
DROP TABLE IF EXISTS dbo.Locations;

SELECT country, region, city
INTO dbo.Locations
FROM Sales.Customers

EXCEPT

SELECT country, region, city
FROM HR.Employees;
  • The target table’s structure and data are based on the source table.

  • The SELECT INTO statement copies the base structure from the source (such as column names, types, nullability, and the identity property) and the data, but it does not copy constraints, indexes, triggers, column properties, or permissions.

10.1.5. BULK INSERT

The BULK INSERT statement is used to insert data originating from a file into an existing table.

BULK INSERT dbo.Orders FROM 'c:\temp\orders.txt'
  WITH(
    DATAFILETYPE    = 'char',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n'
  );

10.2. DELETE and TRUNCATE

T-SQL provides two statements for deleting rows from a table: DELETE and TRUNCATE.

  • The DELETE statement is a standard statement used to delete data from a table based on an optional filter predicate.

    DELETE FROM dbo.Orders
    WHERE orderdate < '20210101';
    • The DELETE statement tends to be expensive when deleting a large number of rows, mainly because it’s a fully logged operation.

  • The standard TRUNCATE statement deletes all rows from a table, with no filter.

    TRUNCATE TABLE dbo.T1;
    • The advantage that TRUNCATE has over DELETE is that the former is minimally logged, whereas the latter is fully logged, resulting in significant performance differences.

    • TRUNCATE resets the identity value back to the original seed, but DELETE doesn’t—even when used without a filter.

    • The TRUNCATE statement is not allowed when the target table is referenced by a foreign-key constraint, even if the referencing table is empty and even if the foreign key is disabled.

    • The TRUNCATE statement can be used to truncate individual partitions in a partitioned table.

      TRUNCATE TABLE dbo.T1 WITH ( PARTITIONS(1, 3, 5, 7 TO 10) );
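
The identity-reset difference can be sketched as follows (dbo.IdentDemo is a hypothetical table with an identity column seeded at 1):

```sql
CREATE TABLE dbo.IdentDemo
(
  keycol INT         NOT NULL IDENTITY(1, 1),
  col1   VARCHAR(10) NOT NULL
);

INSERT INTO dbo.IdentDemo(col1) VALUES('A'),('B'),('C');

DELETE FROM dbo.IdentDemo;            -- identity value is not reset
INSERT INTO dbo.IdentDemo(col1) VALUES('D');
SELECT keycol FROM dbo.IdentDemo;     -- keycol is 4

TRUNCATE TABLE dbo.IdentDemo;         -- identity value is reset to the seed
INSERT INTO dbo.IdentDemo(col1) VALUES('E');
SELECT keycol FROM dbo.IdentDemo;     -- keycol is 1
```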
  • T-SQL supports a nonstandard DELETE syntax based on joins to delete rows from one table based on a filter against attributes in related rows from another table.

    DELETE FROM O
     FROM dbo.Orders AS O
      INNER JOIN dbo.Customers AS C
      ON O.custid = C.custid
    WHERE C.country = N'USA';

10.3. UPDATE

T-SQL supports a standard UPDATE statement to update rows in a table, and also supports nonstandard forms of the UPDATE statement with joins and with variables.

  • The UPDATE statement is a standard statement to update a subset of rows in a table.

    • To identify the subset of rows to update, specify a predicate in a WHERE clause.

    • Specify the assignment of values to columns in a SET clause, separated by commas.

      UPDATE dbo.OrderDetails
        SET discount = discount + 0.05
      WHERE productid = 51;
    • T-SQL supports compound assignment operators: += (plus equal), -= (minus equal), *= (multiplication equal), /= (division equal), %= (modulo equal), and others.

      UPDATE dbo.OrderDetails
        SET discount += 0.05
      WHERE productid = 51;
    • All-at-once operations: all expressions that appear in the same logical phase are evaluated as a set, logically at the same point in time.

      -- the assignments take place all at once, meaning that both assignments use the same value of col1—the value before the update.
      UPDATE dbo.T1
        SET col1 = col1 + 10, col2 = col1 + 10;
      -- swap the values in the columns col1 and col2
      UPDATE dbo.T1
        SET col1 = col2, col2 = col1;
  • The UPDATE statement also supports a nonstandard form based on joins that serves a filtering purpose.

    UPDATE OD
      SET discount += 0.05
    FROM dbo.OrderDetails AS OD
      INNER JOIN dbo.Orders AS O
      ON OD.orderid = O.orderid
    WHERE O.custid = 1;
    -- same task by using standard code (recommended)
    UPDATE dbo.OrderDetails
      SET discount += 0.05
    WHERE EXISTS
      (SELECT *
      FROM dbo.Orders AS O
      WHERE O.orderid = OrderDetails.orderid
        AND O.custid = 1);
  • T-SQL supports a proprietary UPDATE syntax that both updates data in a table and assigns values to variables at the same time.

    DECLARE @nextval AS INT;
    
    UPDATE dbo.MySequences
      SET @nextval = val += 1
    WHERE id = 'SEQ1';
    
    SELECT @nextval;

10.4. MERGE

T-SQL supports a statement called MERGE to merge data from a source into a target, applying different actions (INSERT, UPDATE, and DELETE) based on conditional logic.

A task achieved by a single MERGE statement typically translates to a combination of several other DML statements (INSERT, UPDATE, and DELETE) without MERGE.

MERGE INTO dbo.Customers AS TGT
USING dbo.CustomersStage AS SRC
  ON TGT.custid = SRC.custid
WHEN MATCHED THEN
  UPDATE SET
    TGT.companyname = SRC.companyname,
    TGT.phone = SRC.phone,
    TGT.address = SRC.address
WHEN NOT MATCHED THEN
  INSERT (custid, companyname, phone, address)
  VALUES (SRC.custid, SRC.companyname, SRC.phone, SRC.address);

11. Temporal Tables

Temporal tables provide a powerful mechanism for tracking changes to data over time, making it easy to capture and query historical versions of the data. This capability is crucial for various scenarios such as auditing, data analysis, and data recovery.

  • Track Data Changes: Capture all modifications (insertions, updates, deletions) to the data over time.

  • Audit Data Modifications: Track who made changes, when they were made, and the previous state of the data for auditing and compliance purposes.

  • Perform Point-in-Time Analysis: Analyze data as it existed at any point in the past.

  • Data Recovery: Easily restore previous versions of data in case of accidental deletions or updates.

  • Support Slowly Changing Dimensions: Efficiently manage slowly changing dimensions (SCDs) in data warehousing.

The SQL standard supports three types of temporal tables:

  • System-versioned temporal tables rely on the system transaction time to define the validity period of a row.

  • Application-time period tables rely on the application’s definition of the validity period of a row.

  • Bitemporal combines the two types just mentioned (transaction and valid time).

    SQL Server 2022 supports only system-versioned temporal tables.

A system-versioned temporal table has two columns representing the validity period of the row, plus a linked history table with a mirrored schema holding older states of modified rows.

  • To create a system-versioned temporal table, make sure the table definition has all the following elements:

    CREATE TABLE dbo.Employees (
      empid INT                              NOT NULL
        CONSTRAINT PK_Employees PRIMARY KEY, (1)
      empname VARCHAR(25)                    NOT NULL,
      department VARCHAR(50)                 NOT NULL,
      salary NUMERIC(10, 2)                  NOT NULL,
      validfrom DATETIME2(0)
        GENERATED ALWAYS AS ROW START HIDDEN NOT NULL, (2)
      validto DATETIME2(0)
        GENERATED ALWAYS AS ROW END   HIDDEN NOT NULL, (2)
      PERIOD FOR SYSTEM_TIME (validfrom, validto) (3)
    )
    WITH (
      SYSTEM_VERSIONING = ON (4)
        (
          HISTORY_TABLE = dbo.EmployeesHistory, (5)
          HISTORY_RETENTION_PERIOD = 5 YEARS    (6)
        )
    );
    1 A primary key
    2 Two columns defined as DATETIME2 with any precision, which are non-nullable and represent the start and end of the row’s validity period in the UTC time zone
    • A start column that should be marked with the option GENERATED ALWAYS AS ROW START

    • An end column that should be marked with the option GENERATED ALWAYS AS ROW END

    • Optionally, the period columns can be marked as hidden so that when querying the table with SELECT * they won’t be returned and when inserting data they’ll be ignored.

    • The modification times that SQL Server records in the period columns reflect the transaction start time.

      If a long-running transaction started at point in time T1 and ended at T2, SQL Server records T1 as the modification time for all the transaction's statements.

    3 A designation of the period columns with the option PERIOD FOR SYSTEM_TIME (<startcol>, <endcol>)
    4 The table option SYSTEM_VERSIONING, which should be set to ON
    5 A linked history table (which SQL Server can create automatically) to hold the past states of modified rows

    If you do not specify a name for the history table, SQL Server assigns one using the form MSSQL_TemporalHistoryFor_<object_id>, where object_id is the object ID of the current table.

    6 Optionally, define a history retention policy using the HISTORY_RETENTION_PERIOD subclause of the SYSTEM_VERSIONING clause.
  • To drop a system-versioned table, first disable system versioning with an ALTER TABLE command, and then manually drop the current and history tables.

    IF OBJECT_ID(N'dbo.Employees', N'U') IS NOT NULL
    BEGIN
      ALTER TABLE dbo.Employees SET ( SYSTEM_VERSIONING = OFF );
      DROP TABLE IF EXISTS dbo.EmployeesHistory;
      DROP TABLE IF EXISTS dbo.Employees;
    END;
  • When modifying data, interact with the current table, issuing normal data-modification statements.

    SQL Server automatically updates the period columns and moves older versions of rows to the history table.

    INSERT INTO dbo.Employees (empid, empname, department, salary)
      VALUES(1, 'Sara', 'IT'       , 50000.00),
            (2, 'Don' , 'HR'       , 45000.00),
            (3, 'Judy', 'Sales'    , 55000.00),
            (4, 'Yael', 'Marketing', 55000.00),
            (5, 'Sven', 'IT'       , 45000.00),
            (6, 'Paul', 'Sales'    , 40000.00);
    -- current table has the six new rows
    SELECT empid, empname, department, salary, validfrom, validto
    FROM dbo.Employees;
    
    -- history table is empty at this point
    SELECT empid, empname, department, salary, validfrom, validto
    FROM dbo.EmployeesHistory;
    -- SQL Server moves the deleted row to the history table, setting its validto value to the deletion time.
    DELETE FROM dbo.Employees
    WHERE empid = 6;
    
    SELECT empid, empname, department, salary, validfrom, validto
    FROM dbo.EmployeesHistory;
    -- 6	Paul	Sales	40000.00	2025-01-15 03:42:15	2025-01-15 03:44:53
    -- An update of a row is treated as a delete plus an insert.
    UPDATE dbo.Employees
      SET salary *= 1.05
    WHERE department = 'IT';
    
    SELECT empid, empname, department, salary, validfrom, validto
    FROM dbo.Employees
    WHERE department = 'IT';
    -- 1	Sara	IT	52500.00	2025-01-15 03:47:42	9999-12-31 23:59:59
    -- 5	Sven	IT	47250.00	2025-01-15 03:47:42	9999-12-31 23:59:59
    
    SELECT empid, empname, department, salary, validfrom, validto
    FROM dbo.EmployeesHistory
    WHERE department = 'IT';
    -- 1	Sara	IT	50000.00	2025-01-15 03:42:15	2025-01-15 03:47:42
    -- 5	Sven	IT	45000.00	2025-01-15 03:42:15	2025-01-15 03:47:42
  • When querying data, for the current state simply query the current table as usual. To see a past state, correct to a certain point or period of time, query the current table followed by the FOR SYSTEM_TIME clause, plus a subclause that indicates the specifics.

    SQL Server will retrieve the data from both the current and history tables as needed.

    Table 1. Qualifying rows for FOR SYSTEM_TIME subclauses

    Subclause	Qualifying rows
    AS OF @datetime	validfrom <= @datetime AND validto > @datetime
    FROM @start TO @end	validfrom < @end AND validto > @start
    BETWEEN @start AND @end	validfrom <= @end AND validto > @start
    CONTAINED IN(@start, @end)	validfrom >= @start AND validto <= @end
    ALL	All rows from both tables (a T-SQL extension)

    DECLARE @datetime AS DATETIME2 = '2025-01-15 03:45:00';
    
    SELECT empid, empname, department, salary, validfrom, validto
    FROM dbo.Employees FOR SYSTEM_TIME AS OF @datetime;
    
    -- same as
    
    DECLARE @datetime AS DATETIME2 = '2025-01-15 03:45:00';
    
    SELECT empid, empname, department, salary, validfrom, validto
    FROM dbo.Employees
    WHERE validfrom <= @datetime AND validto > @datetime
    
    UNION ALL
    
    SELECT empid, empname, department, salary, validfrom, validto
    FROM dbo.EmployeesHistory
    WHERE validfrom <= @datetime AND validto > @datetime;

12. Transactions and Concurrency

A transaction is a unit of work that might include multiple activities that query and modify data and that can also change the data definition.

  • Transaction boundaries can be defined either explicitly or implicitly.

    • A transaction is defined explicitly by beginning it with a BEGIN TRAN (or BEGIN TRANSACTION) statement and ending it with either a COMMIT TRAN statement to commit its changes or a ROLLBACK TRAN (or ROLLBACK TRANSACTION) statement to undo them.

      BEGIN TRAN;
        INSERT INTO dbo.T1(keycol, col1, col2) VALUES(4, 101, 'C');
        INSERT INTO dbo.T2(keycol, col1, col2) VALUES(4, 201, 'X');
      COMMIT TRAN;
  • If the boundaries of a transaction aren't marked explicitly, by default SQL Server treats each individual statement as a transaction; this is known as autocommit mode.

  • Transactions have four properties—atomicity, consistency, isolation, and durability— abbreviated with the acronym ACID:

    • Atomicity: A transaction is an atomic unit of work. Either all changes in the transaction take place or none do.

      • If the system fails before a transaction is completed (before the commit instruction is recorded in the transaction log), upon restart, SQL Server undoes the changes that took place.

      • Also, if errors are encountered during the transaction and the error is considered severe enough, such as the target filegroup being full when trying to insert data, SQL Server automatically rolls back the transaction.

    • Consistency: The term consistency refers to the state of the data that the relational database management system (RDBMS) gives visibility to as concurrent transactions modify and query it. It is a subjective term that depends on the application's needs.

    • Isolation: Isolation ensures that transactions access only consistent data through a mechanism called isolation levels.

      • With disk-based tables, SQL Server supports two different models to handle isolation: one based purely on locking, and another based on a combination of locking and row versioning.

        • The model based on locking is the default in a box product.

          In this model, readers require shared locks. If the current state of the data is inconsistent, readers are blocked until the state of the data becomes consistent.

        • The model based on [locking and] row versioning is the default in Azure SQL Database.

          In this model, readers don’t take shared locks and don’t need to wait. If the current state of the data is inconsistent, the reader gets an older consistent state.

    • Durability: The durability property means that once a commit instruction is acknowledged by the database engine, the transaction’s changes are guaranteed to be durable—or in other words, persist—in the database.

      • A commit is acknowledged by getting control back to the application and running the next line of code.

      • Data changes are always written to the database’s transaction log on disk before they are written to the data portion of the database on disk.

      • After the commit instruction is recorded in the transaction log on disk, the transaction is considered durable even if the change hasn’t yet made it to the data portion on disk.

      • When the system starts, either normally or after a system failure, SQL Server runs a recovery process in each database that involves analyzing the log, then applying a redo phase, and then applying an undo phase.

        • The redo phase involves rolling forward (replaying) all the changes from any transaction whose commit instruction is written to the log but whose changes haven’t yet made it to the data portion.

        • The undo phase involves rolling back (undoing) the changes from any transaction whose commit instruction was not recorded in the log.

-- Start a new transaction
BEGIN TRAN;
-- Declare a variable
DECLARE @neworderid AS INT;
-- Insert a new order into the Sales.Orders table
INSERT INTO Sales.Orders
  (custid, empid, orderdate, requireddate, shippeddate,
  shipperid, freight, shipname, shipaddress, shipcity,
  shippostalcode, shipcountry)
VALUES
  (85, 5, '20220212', '20220301', '20220216',
    3, 32.38, N'Ship to 85-B', N'6789 rue de l''Abbaye', N'Reims',
    N'10345', N'France');
-- Save the new order ID in a variable
SET @neworderid = SCOPE_IDENTITY();
-- Return the new order ID
SELECT @neworderid AS neworderid;
-- Insert order lines for the new order into Sales.OrderDetails
INSERT INTO Sales.OrderDetails
  (orderid, productid, unitprice, qty, discount)
VALUES(@neworderid, 11, 14.00, 12, 0.000),
  (@neworderid, 42, 9.80, 10, 0.000),
  (@neworderid, 72, 34.80, 5, 0.000);
-- Commit the transaction
COMMIT TRAN;

12.1. Locks and Blocking

By default, a SQL Server box product uses a pure locking model to enforce the isolation property of transactions, whereas Azure SQL Database uses the row-versioning model by default.

-- turn off the database property READ_COMMITTED_SNAPSHOT to switch to the locking model as the default
ALTER DATABASE TSQLV6 SET READ_COMMITTED_SNAPSHOT OFF;

12.1.1. Locks

Locks are control resources obtained by a transaction to guard data resources, preventing conflicting or incompatible access by other transactions.

12.1.1.1. Lock Modes and Compatibility

When trying to modify data, a transaction requests an exclusive lock on the data resource, regardless of the isolation level. If granted, the exclusive lock is held until the end of the transaction.

  • For single-statement transactions, this means that the lock is held until the statement completes.

  • For multistatement transactions, this means that the lock is held until all statements complete and the transaction is ended by a COMMIT TRAN or ROLLBACK TRAN command.

As for reading data, the defaults are different for a SQL Server box product and Azure SQL Database.

  • In SQL Server, the default isolation level is called READ COMMITTED.

    Under this isolation level, when trying to read data, by default a transaction requests a shared lock on the data resource and releases the lock as soon as the read statement is done with that resource.

  • In Azure SQL Database, the default isolation level is called READ COMMITTED SNAPSHOT.

    Instead of relying only on locking, this isolation level relies on a combination of locking and row versioning.

    • Under this isolation level, readers do not require shared locks, and therefore they never wait; they rely on the row-versioning technology to provide the expected isolation.

  • Under the READ COMMITTED isolation level, if a transaction modifies rows, until the transaction completes, another transaction can’t read the same rows.

    This approach to concurrency control is known as the pessimistic concurrency approach.

  • Under the READ COMMITTED SNAPSHOT isolation level, if a transaction modifies rows, another transaction trying to read the data will get the last committed state of the rows that was available when the statement started.

    This approach to concurrency control is known as the optimistic concurrency approach.

    READ COMMITTED SNAPSHOT is an MVCC-based implementation of the READ COMMITTED isolation level in SQL Server.
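
A minimal sketch of the difference, run from two separate connections against the dbo.T1 table used earlier (timing and session interleaving are illustrative):

```sql
-- Connection 1: modify a row and keep the transaction open
BEGIN TRAN;
  UPDATE dbo.T1 SET col1 = col1 + 1 WHERE keycol = 1;
-- (no COMMIT yet; an exclusive lock is held on the row)

-- Connection 2: read the same row
SELECT col1 FROM dbo.T1 WHERE keycol = 1;
-- Under READ COMMITTED (locking): blocked until Connection 1 ends its transaction
-- Under READ COMMITTED SNAPSHOT: returns immediately with the last committed value

-- Connection 1: release the lock
COMMIT TRAN;
```
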
Table 2. Lock compatibility of exclusive and shared locks

Requested mode    Granted Exclusive (X)    Granted Shared (S)
Exclusive (X)     No                       No
Shared (S)        No                       Yes

A “No” in the intersection means that the locks are incompatible and the requested mode is denied; the requester must wait. A “Yes” in the intersection means that the locks are compatible and the requested mode is accepted.
12.1.1.2. Lockable Resource Types

SQL Server can lock different types of resources that include rows (RID in a heap, key in an index), pages, objects (for example, tables), databases, and others. Rows reside within pages, and pages are the physical data blocks that contain table or index data.

To lock a resource at a certain granularity, a transaction must first obtain intent locks of the same mode at higher levels of granularity (for example, an intent exclusive lock on the page and the table that contain a row being locked exclusively). Intent locks allow SQL Server to efficiently detect incompatible lock requests at higher levels of granularity and prevent granting them.

SQL Server determines dynamically which resource types to lock.

  • Naturally, for ideal concurrency, it’s best to lock only what needs to be locked—namely, only the affected rows.

  • However, locks require memory resources and internal management overhead. So SQL Server considers both concurrency and system resources when it’s choosing which resource types to lock.

    • When SQL Server estimates that a transaction will interact with a small number of rows, it tends to use row locks.

    • With larger numbers of rows, SQL Server tends to use page locks.

  • SQL Server might first acquire fine-grained locks (such as row or page locks) and, in certain circumstances, try to escalate the fine-grained locks to a table lock to preserve memory.
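As a sketch, lock escalation behavior can be influenced per table through a real (but optional) table option; TABLE is the default, AUTO allows partition-level escalation, and DISABLE mostly prevents escalation:

```sql
-- Prevent most lock escalation on this table (trade-off: more
-- fine-grained locks, hence more lock memory and overhead)
ALTER TABLE Production.Products SET (LOCK_ESCALATION = DISABLE);

-- Restore the default behavior
ALTER TABLE Production.Products SET (LOCK_ESCALATION = TABLE);
```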

12.1.2. Blocking

When one transaction holds a lock on a data resource and another transaction requests an incompatible lock on the same resource, the request is blocked and the requester enters a wait state.

  • By default, the blocked request keeps waiting until the blocker releases the interfering lock.

    • To restrict the amount of time the session waits for a lock, set a session option called LOCK_TIMEOUT.

    • Specify a value in milliseconds—such as 5000 for 5 seconds, 0 for an immediate timeout, and –1 for no timeout (which is the default).

      SET LOCK_TIMEOUT 5000;
      
      SELECT productid, unitprice
      FROM Production.Products
      WHERE productid = 2;
      Msg 1222, Level 16, State 51, Line 3
      Lock request time out period exceeded.
    • To remove the lock timeout value, set it back to the default (no timeout), and issue the query again.

      SET LOCK_TIMEOUT -1;
      SELECT productid, unitprice
      FROM Production.Products
      WHERE productid = 2;
  • The dynamic management view (DMV) sys.dm_tran_locks can be used to get lock information, including both locks granted to sessions and locks sessions are waiting for.

    -- Connection 1: hold exclusive lock to write
    BEGIN TRAN;
      UPDATE Production.Products
        SET unitprice += 1.00
      WHERE productid = 2;
    -- no COMMIT TRAN or ROLLBACK TRAN, the transaction remains open, and the lock is still held
    -- Connection 2: needs a shared lock to read, but is blocked and has to wait
    SELECT productid, unitprice
    FROM Production.Products
    WHERE productid = 2;
    -- Connection 3
    SELECT -- use * to explore other available attributes
        request_session_id            AS sid,
        resource_type                 AS restype,
        resource_database_id          AS dbid,
        DB_NAME(resource_database_id) AS dbname,
        resource_description          AS res,
        resource_associated_entity_id AS resid,
        request_mode                  AS mode,
        request_status                AS status
    FROM sys.dm_tran_locks;
    sid restype  dbid dbname res            resid             mode status
    52  DATABASE 6    TSQLV6                0                 S    GRANT
    56  DATABASE 6    TSQLV6                0                 S    GRANT
    59  DATABASE 6    TSQLV6                0                 S    GRANT
    52  PAGE     6    TSQLV6 1:456          72057594046251008 IS   GRANT
    56  PAGE     6    TSQLV6 1:456          72057594046251008 IX   GRANT
    52  OBJECT   6    TSQLV6                1029578706        IS   GRANT
    56  OBJECT   6    TSQLV6                1029578706        IX   GRANT
    56  KEY      6    TSQLV6 (61a06abd401c) 72057594046251008 X    GRANT
    52  KEY      6    TSQLV6 (61a06abd401c) 72057594046251008 S    WAIT
    • Each session is identified by a unique session ID.

      • A session’s ID can be determined by querying the @@SPID function.

      • When working with SQL Server Management Studio, the session ID can be found in parentheses to the right of the login name in the status bar at the bottom of the query window that has the focus, and also in the caption of the connected query window.

    • Both sessions interact with the same row (note the identical res and resid values): session 52 is waiting for a shared lock on a row in the sample database TSQLV6 that is held with an exclusive lock by session 56.

    • The involved table can be figured out by moving upward in the lock hierarchy for either session 52 or 56 and inspecting the intent locks on the object (table) where the row resides.

    • The OBJECT_NAME function can be used to translate the object ID (1029578706, in this example) that appears under the resid attribute of the object lock; here, the object is Production.Products.

      SELECT OBJECT_NAME(1029578706); -- Products
    • The sys.dm_tran_locks view provides the IDs of the sessions involved in the blocking chain: two or more sessions involved in the blocking situation, such as session x blocking session y, session y blocking session z, and so on (hence the term chain).

  • The DMV sys.dm_exec_connections can be used to get information about the connections associated with those session IDs and filter only the session IDs that are involved:

    SELECT -- use * to explore
        session_id AS sid,
        connect_time,
        last_read,
        last_write,
        most_recent_sql_handle
    FROM sys.dm_exec_connections
    WHERE session_id IN(52, 56);
    52	2025-01-13 14:50:57.367	2025-01-13 14:54:07.930	2025-01-13 14:54:07.923	0x0200000063FC7D052E09844778CDD615CFE7A2D1FB4118020000000000000000000000000000000000000000
    56	2025-01-13 14:53:33.587	2025-01-13 14:53:52.560	2025-01-13 14:53:52.560	0x020000008FAC322CF2FC73472F8E93B0DF1994A69639ED090000000000000000000000000000000000000000
    • The most_recent_sql_handle attribute is a binary value holding a handle to the most recent SQL batch run by the connection.

      • The handle can be provided as an input parameter to a table function called sys.dm_exec_sql_text, and the function returns the batch of code represented by the handle.

        SELECT session_id, text
        FROM sys.dm_exec_connections
          CROSS APPLY sys.dm_exec_sql_text(most_recent_sql_handle) AS ST
        WHERE session_id IN(52, 56);
        52	(@1 tinyint)SELECT [productid],[unitprice] FROM [Production].[Products] WHERE [productid]=@1
        56	BEGIN TRAN;    UPDATE Production.Products      SET unitprice += 1.00    WHERE productid = 2;
      • Starting with SQL Server 2016, the sys.dm_exec_input_buffer function can be used instead of sys.dm_exec_sql_text to get the code that the sessions of interest submitted last.

        SELECT session_id, event_info
        FROM sys.dm_exec_connections
          CROSS APPLY sys.dm_exec_input_buffer(session_id, NULL) AS IB
        WHERE session_id IN(52, 56);
  • The DMV sys.dm_exec_sessions can be used to find a lot of useful information about the sessions involved in a blocking situation.

    SELECT -- use * to explore
      session_id AS sid,
      login_time,
      host_name,
      program_name,
      login_name,
      nt_user_name,
      last_request_start_time,
      last_request_end_time
    FROM sys.dm_exec_sessions
    WHERE session_id IN(52, 56);
  • Another DMV, sys.dm_exec_requests, provides information that is useful for troubleshooting blocking situations, including the blocking session ID and wait details.

    SELECT -- use * to explore
      session_id AS sid,
      blocking_session_id,
      command,
      sql_handle,
      database_id,
      wait_type,
      wait_time,
      wait_resource
    FROM sys.dm_exec_requests
    WHERE blocking_session_id > 0;
  • To terminate the blocker (for example, if you realize that, as a result of a bug in the application, the transaction remained open and nothing in the application can close it), use the KILL <session_id> command.

    KILL 56;

12.2. Isolation Levels

Isolation levels determine the level of consistency when interacting with data. In the default isolation level in a box product, a reader uses shared locks on the target resources and a writer uses exclusive locks.

  • SQL Server supports four isolation levels that are based on the pure locking model: READ UNCOMMITTED, READ COMMITTED (the default in a SQL Server box product), REPEATABLE READ, and SERIALIZABLE.

  • SQL Server also supports two isolation levels that are based on a combination of locking and row versioning: SNAPSHOT and READ COMMITTED SNAPSHOT (the default in Azure SQL Database).

    READ COMMITTED SNAPSHOT and SNAPSHOT are in a sense the row-versioning counterparts of READ COMMITTED and SERIALIZABLE, respectively.

  • The isolation level of the whole session can be set by using the following command:

    SET TRANSACTION ISOLATION LEVEL <isolationname>;
  • The isolation level of a query can be set by using a table hint:

    SELECT ... FROM <table> WITH (<isolationname>);
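For example (a sketch), the hint form uses the isolation-level name written without spaces (READUNCOMMITTED, READCOMMITTED, REPEATABLEREAD, SERIALIZABLE):

```sql
-- Run this query's read against Production.Products under
-- REPEATABLE READ, regardless of the session's isolation level
SELECT productid, unitprice
FROM Production.Products WITH (REPEATABLEREAD)
WHERE productid = 2;
```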
  • With the first four isolation levels, the higher the isolation level, the stricter the locks that readers request and the longer their duration; therefore, the higher the isolation level, the higher the consistency and the lower the concurrency.

  • With the two row-versioning-based isolation levels, SQL Server is able to store previous committed versions of rows in a version store. Readers do not request shared locks; instead, if the current version of the rows is not what they are supposed to see, SQL Server provides them with an older version.

    Table 3. Isolation level properties

    Isolation level          Allows       Allows         Allows    Allows   Detects     Uses row
                             uncommitted  nonrepeatable  lost      phantom  update      versioning?
                             reads?       reads?         updates?  reads?   conflicts?
    READ UNCOMMITTED         Yes          Yes            Yes       Yes      No          No
    READ COMMITTED           No           Yes            Yes       Yes      No          No
    REPEATABLE READ          No           No             No        Yes      No          No
    SERIALIZABLE             No           No             No        No       No          No
    SNAPSHOT                 No           No             No        No       Yes         Yes
    READ COMMITTED SNAPSHOT  No           Yes            Yes       Yes      No          Yes

12.2.1. READ UNCOMMITTED

READ UNCOMMITTED is the lowest available isolation level; under it, a reader doesn’t ask for a shared lock.

  • A reader that doesn’t ask for a shared lock can never be in conflict with a writer holding an exclusive lock, so the reader can read uncommitted changes (also known as dirty reads).

  • It also means the reader won’t interfere with a writer that asks for an exclusive lock, that is, a writer can change data while a reader that is running under the READ UNCOMMITTED isolation level reads data.

    1. Open a transaction, update the unit price of product 2 by adding 1.00 to its current price (19.00), and then query the product’s row

      -- Connection 1
      BEGIN TRAN;
      
      UPDATE Production.Products
        SET unitprice += 1.00
      WHERE productid = 2;
      
      SELECT productid, unitprice
      FROM Production.Products
      WHERE productid = 2;
      2	20.00
    2. Set the isolation level to READ UNCOMMITTED and query the row for product 2.

      The query returned the state of the row after the change, even though the change was not committed
      -- Connection 2
      SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
      
      SELECT productid, unitprice
      FROM Production.Products
      WHERE productid = 2;
      2	20.00
    3. Keep in mind that Connection 1 might apply further changes to the row later in the transaction or even roll back at some point.

      -- Connection 1
      ROLLBACK TRAN;
    4. The rollback undoes the update of product 2, changing its price back to 19.00. The value 20.00 that the reader got was never committed; such a read is called a dirty read.

12.2.2. READ COMMITTED

The lowest isolation level that prevents dirty reads is READ COMMITTED, which is also the default isolation level in SQL Server (the box product). It prevents uncommitted reads by requiring a reader to obtain a shared lock.

-- Connection 1
-- open a transaction, update the price of product 2, and query the row to show the new price
BEGIN TRAN;
   UPDATE Production.Products
     SET unitprice += 1.00
   WHERE productid = 2;

   SELECT productid, unitprice
   FROM Production.Products
   WHERE productid = 2;
-- Connection 2
-- Keep in mind that this isolation level is the default, so unless the session's isolation level was previously changed, there's no need to set it explicitly.
-- The SELECT statement is currently blocked because it needs a shared lock to be able to read the row, and this shared lock request is in conflict with the exclusive lock held by the writer in Connection 1
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

SELECT productid, unitprice
FROM Production.Products
WHERE productid = 2;
-- Connection 1
-- commit the transaction
COMMIT TRAN;
-- Connection 2
2	20.00
-- restore the unitprice of the product 2
UPDATE Production.Products
  SET unitprice -= 1.00 -- 19.00
WHERE productid = 2;

In terms of the duration of locks, in the READ COMMITTED isolation level, a reader holds the shared lock only until it’s done with the resource.

  • It doesn’t keep the lock until the end of the transaction; in fact, it doesn’t even keep the lock until the end of the statement. This means that in between two reads of the same data resource in the same transaction, no lock is held on the resource.

  • Therefore, another transaction can modify the resource in between those two reads, and the reader might get different values in each read, which is called nonrepeatable reads or inconsistent analysis.
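A minimal two-connection sketch of a nonrepeatable read under READ COMMITTED (assuming the sample data, where product 2 starts at a price of 19.00):

```sql
-- Connection 1: two reads of the same row in one transaction
BEGIN TRAN;

  SELECT unitprice
  FROM Production.Products
  WHERE productid = 2;  -- returns 19.00; shared lock released right away

  -- Connection 2 runs in between, and is not blocked:
  --   UPDATE Production.Products
  --     SET unitprice += 1.00
  --   WHERE productid = 2;

  SELECT unitprice
  FROM Production.Products
  WHERE productid = 2;  -- returns 20.00: a nonrepeatable read

COMMIT TRAN;
```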

12.2.3. REPEATABLE READ

The REPEATABLE READ isolation level can be used to get repeatable reads, or consistent analysis: it ensures that no one can change values in between reads that take place in the same transaction. Under this level, not only does a reader need a shared lock to be able to read, but it also holds the lock until the end of the transaction.

REPEATABLE READ (but not lower isolation levels) can also prevent another phenomenon called a lost update, which happens when two transactions read a value, make calculations based on what they read, and then update the value.

  • In isolation levels lower than REPEATABLE READ, no lock is held on the resource after the read; both transactions can update the value, and whichever transaction updates the value last “wins,” overwriting the other transaction’s update.

  • In REPEATABLE READ, both sides keep their shared locks after the first read, so neither can acquire an exclusive lock later in order to update, which results in a deadlock, and the update conflict is prevented.

-- Connection 1
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- keep holding a shared lock on the row for product 2
BEGIN TRAN;
  SELECT productid, unitprice
  FROM Production.Products
  WHERE productid = 2;
2	19.00
-- Connection 2
-- the modifier's request for an exclusive lock is blocked because it conflicts with the reader's granted shared lock
UPDATE Production.Products
  SET unitprice += 1.00
WHERE productid = 2;
-- Connection 1
-- the second read got the same unit price for product 2 as the first read
  SELECT productid, unitprice
  FROM Production.Products
  WHERE productid = 2;
COMMIT TRAN; --  commit the transaction and release the shared lock
-- Connection 2
-- obtain the exclusive lock it was waiting for and update the row

12.2.4. SERIALIZABLE

Under the REPEATABLE READ isolation level, readers keep shared locks until the end of the transaction, but they lock only the resources (e.g., rows) that the query found the first time it ran, not rows that weren’t there when the query ran. Therefore, a second read in the same transaction might return new rows as well, which happens if, in between the reads, another transaction inserts new rows that satisfy the reader’s query filter. Such rows are called phantoms, and such reads are called phantom reads.

The SERIALIZABLE isolation level can be used to prevent phantom reads. It requires a reader to obtain a shared lock on the whole range of keys that qualify for the query’s filter to be able to read, and it keeps the lock until the end of the transaction.

-- Connection 1
-- set the transaction isolation level to SERIALIZABLE, open a transaction, and query all products with category 1
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRAN
  SELECT productid, productname, categoryid, unitprice
  FROM Production.Products
  WHERE categoryid = 1;
-- Connection 2
-- In all isolation levels that are lower than SERIALIZABLE, such an attempt would be successful.
-- In the SERIALIZABLE isolation level, the attempt is blocked.
INSERT INTO Production.Products
    (productname, supplierid, categoryid, unitprice, discontinued)
VALUES('Product ABCDE', 1, 1, 20.00, 0);
sid               restype dbid dbname res            resid             mode     status
55 (Connection 1) KEY     6    TSQLV6 (61a06abd401c) 72057594046251008 RangeS-S GRANT
64 (Connection 2) KEY     6    TSQLV6 (61a06abd401c) 72057594046251008 X        WAIT
-- Connection 1
-- get the same output as before, with no phantoms
  SELECT productid, productname, categoryid, unitprice
  FROM Production.Products
  WHERE categoryid = 1;
COMMIT TRAN; -- transaction is committed and the shared key-range lock is released

12.2.5. Isolation Levels Based on Row Versioning

With the row-versioning technology, SQL Server can store previous versions of committed rows in a version store.

  • If the Accelerated Database Recovery (ADR) feature is not enabled in the database, the version store resides in the tempdb database.

  • If ADR is enabled, the version store resides in the user database in question.

SQL Server supports two isolation levels, called SNAPSHOT and READ COMMITTED SNAPSHOT, that are based on this row-versioning technology.

  • The SNAPSHOT isolation level is logically similar to the SERIALIZABLE isolation level in terms of the types of consistency problems that can or cannot happen.

  • The READ COMMITTED SNAPSHOT isolation level is similar to the READ COMMITTED isolation level.

  • Readers using isolation levels based on row versioning do not acquire shared locks, so they don’t wait when the requested data is exclusively locked.

    • In other words, readers don’t block writers and writers don’t block readers.

    • Readers still get levels of consistency similar to SERIALIZABLE and READ COMMITTED.

    • SQL Server provides readers with an older version of the row if the current version is not the one they are supposed to see.

Note that if any of the row-versioning-based isolation levels is enabled (as they are in Azure SQL Database by default), DELETE and UPDATE statements need to copy the pre-change version of the row to the version store; INSERT statements don’t need to write anything to the version store, because no earlier version of the row exists.

But it’s important to be aware that enabling any of the isolation levels that are based on row versioning might have a negative impact on the performance of updates and deletes.

The performance of readers usually improves, sometimes dramatically, because they do not acquire shared locks and don’t need to wait when data is exclusively locked or its version is not the expected one.
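As a sketch, the row versions currently held in the version store can be inspected through a DMV (the output depends on current activity and on whether versioning is in use):

```sql
-- One row per row version retained in the version store
SELECT * -- use * to explore the available attributes
FROM sys.dm_tran_version_store;
```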

12.2.5.1. SNAPSHOT

The SNAPSHOT isolation level relies on row versioning instead of shared locks. It guarantees that a reader gets the last committed version of the row that was available when the transaction started, which yields committed reads and repeatable reads, and prevents phantom reads.

To work with the SNAPSHOT isolation level in a SQL Server box product instance (enabled by default in Azure SQL Database), first enable the option at the database level by running the following code in any open query window:

ALTER DATABASE TSQLV6 SET ALLOW_SNAPSHOT_ISOLATION ON;
  1. Open a transaction, update the price of product 2 by adding 1.00 to its current price of 19.00, and show the new price.

    -- Connection 1
    BEGIN TRAN;
    
      UPDATE Production.Products
        SET unitprice += 1.00
      WHERE productid = 2;
    
      SELECT productid, unitprice
      FROM Production.Products
      WHERE productid = 2;
    2	20.00
  2. If someone begins a transaction using the SNAPSHOT isolation level, that session gets the version of the row from before the update.

    -- Connection 2
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    
    BEGIN TRAN;
      SELECT productid, unitprice
      FROM Production.Products
      WHERE productid = 2;
    2	19.00
  3. If a transaction runs under a locking isolation level, such as the default READ COMMITTED, the query is blocked instead.

    -- Connection 3
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED; -- by default in SQL Server
    SET LOCK_TIMEOUT 5000;
    
    SELECT productid, unitprice
    FROM Production.Products
    WHERE productid = 2;
    Msg 1222, Level 16, State 51, Line 5
    Lock request time out period exceeded.
  4. Go back to Connection 1, and commit the transaction.

    -- at this point, the current version of the row with the price of 20.00 is a committed version.
    COMMIT TRAN;
  5. Go back to Connection 2, and run the following code.

    -- still get the last committed version of the row that was available when the transaction started (with a price of 19.00).
      SELECT productid, unitprice
      FROM Production.Products
      WHERE productid = 2;
    COMMIT TRAN;
    2	19.00
12.2.5.2. Conflict Detection

The SNAPSHOT isolation level prevents update conflicts, but unlike the REPEATABLE READ and SERIALIZABLE isolation levels, which do so by generating a deadlock, the SNAPSHOT isolation level generates a more specific error indicating that an update conflict was detected. It detects the conflict by examining the version store to figure out whether another transaction modified the data between a read and a write that took place in the same transaction.

  1. Set the transaction isolation level to SNAPSHOT, open a transaction, and read the row for product 2.

    -- Connection 1
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    
    BEGIN TRAN;
      SELECT productid, unitprice
      FROM Production.Products
      WHERE productid = 2;
    2	19.00
  2. Update the price of the product queried previously to 20.00, and commit the transaction.

    -- Connection 1
      UPDATE Production.Products
        SET unitprice = 20.00
      WHERE productid = 2;
    COMMIT TRAN;

    No other transaction modified the row between the read, calculation, and write; therefore, there was no update conflict and SQL Server allowed the update to take place.

  3. Restore the price of product 2 back to 19.00:

    UPDATE Production.Products
      SET unitprice = 19.00
    WHERE productid = 2;
  4. Next, run the following code in Connection 1, again, to open a transaction, and read the row for product 2:

    -- Connection 1
    BEGIN TRAN;
      SELECT productid, unitprice
      FROM Production.Products
      WHERE productid = 2;
  5. Next, run the following code in Connection 2 to update the price of product 2 to 25.00:

    -- Connection 2
    UPDATE Production.Products
      SET unitprice = 25.00
    WHERE productid = 2;
  6. Go back to Connection 1, and try to update the price of the product to 20.00:

    UPDATE Production.Products
      SET unitprice = 20.00
    WHERE productid = 2;

    SQL Server detected that this time another transaction modified the data between the read and write; therefore, it fails the transaction with the following error:

    Msg 3960, Level 16, State 2, Line 1
    Snapshot isolation transaction aborted due to update conflict. You cannot use snapshot isolation to access table 'Production.Products' directly or indirectly in database 'TSQLV6' to update, delete, or insert the row that has been modified or deleted by another transaction. Retry the transaction or change the isolation level for the update/delete statement.
12.2.5.3. READ COMMITTED SNAPSHOT

The READ COMMITTED SNAPSHOT isolation level is also based on row versioning, but it differs from the SNAPSHOT isolation level in that instead of providing a reader with a transaction-level consistent view of the data, it provides a statement-level consistent view. It also does not detect update conflicts.

To make a reader acquire a shared lock under READ COMMITTED SNAPSHOT, add a table hint called READCOMMITTEDLOCK to the SELECT statement, as in SELECT * FROM dbo.T1 WITH (READCOMMITTEDLOCK).

To enable the use of the READ COMMITTED SNAPSHOT isolation level in a SQL Server box product (enabled by default in Azure SQL Database), turn on a database option called READ_COMMITTED_SNAPSHOT.

ALTER DATABASE TSQLV6 SET READ_COMMITTED_SNAPSHOT ON;
Unlike the ALLOW_SNAPSHOT_ISOLATION flag, this flag changes the meaning, or semantics, of the READ COMMITTED isolation level to READ COMMITTED SNAPSHOT. When this database flag is turned on, READ COMMITTED SNAPSHOT is the default isolation level, unless the session’s isolation level is explicitly changed.
  1. Run the following code in Connection 1 to open a transaction, update the row for product 2, and read the row, leaving the transaction open:

    -- Connection 1
    BEGIN TRAN;
    
    UPDATE Production.Products
      SET unitprice += 1.00
    WHERE productid = 2;
    
    SELECT productid, unitprice
    FROM Production.Products
    WHERE productid = 2;
    2	20.00
  2. In Connection 2, open a transaction and read the row for product 2, leaving the transaction open:

    -- Connection 2
    BEGIN TRAN;
    
      SELECT productid, unitprice
      FROM Production.Products
      WHERE productid = 2;
    2	19.00
  3. Run the following code in Connection 1 to commit the transaction:

    -- Connection 1
    COMMIT TRAN;
  4. Now run the code in Connection 2 to read the row for product 2 again, and commit the transaction:

    -- Connection 2
    -- get the last committed version of the row that was available when the statement started (20.00) and not when the transaction started (19.00)
      SELECT productid, unitprice
      FROM Production.Products
      WHERE productid = 2;
    COMMIT TRAN;
    2	20.00
  5. Close all connections. Open a new connection, and run the following code to disable the isolation levels that are based on row versioning in the TSQLV6 database:

    ALTER DATABASE TSQLV6 SET ALLOW_SNAPSHOT_ISOLATION OFF;
    ALTER DATABASE TSQLV6 SET READ_COMMITTED_SNAPSHOT OFF;

12.3. Deadlocks

A deadlock is a situation in which two or more sessions block each other.

  • An example of a two-session deadlock is when session A blocks session B and session B blocks session A.

  • An example of a deadlock involving more than two sessions is when session A blocks session B, session B blocks session C, and session C blocks session A.

  • In any of these cases, SQL Server detects the deadlock and intervenes by terminating one of the transactions.

  • If SQL Server did not intervene, the sessions involved would remain deadlocked forever.

Unless otherwise specified (DEADLOCK_PRIORITY), SQL Server chooses to terminate the transaction that did the least work (based on the activity written to the transaction log), because rolling that transaction’s work back is the cheapest option.
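The victim choice can be biased with the session option mentioned above; a sketch:

```sql
-- Mark this session as the preferred deadlock victim.
-- Accepts LOW, NORMAL (the default), HIGH, or an integer in the
-- range -10 (most likely victim) through 10 (least likely).
SET DEADLOCK_PRIORITY LOW;
```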

  1. Run the following code in Connection 1 to open a new transaction, update a row in the Production.Products table for product 2, and leave the transaction open:

    -- Connection 1
    BEGIN TRAN;
    
      UPDATE Production.Products
        SET unitprice += 1.00
      WHERE productid = 2;
  2. Run the following code in Connection 2 to open a new transaction, update a row in the Sales.OrderDetails table for product 2, and leave the transaction open:

    -- Connection 2
    BEGIN TRAN;
    
      UPDATE Sales.OrderDetails
        SET unitprice += 1.00
      WHERE productid = 2;
  3. Run the following code in Connection 1 to attempt to query the rows for product 2 in the Sales.OrderDetails table, and commit the transaction:

    -- Connection 1
    -- needs a shared lock to be able to perform its read, but is blocked by Connection 2
      SELECT orderid, productid, unitprice
      FROM Sales.OrderDetails
      WHERE productid = 2;
    COMMIT TRAN;
  4. Next, run the following code in Connection 2 to attempt to query the row for product 2 in the Production.Products table and commit the transaction:

    -- Connection 2
    -- needs a shared lock to be able to perform its read, but is blocked by Connection 1
      SELECT productid, unitprice
      FROM Production.Products
      WHERE productid = 2;
    COMMIT TRAN;
  5. At this point, each of the sessions blocks the other, resulting in a deadlock. SQL Server identifies the deadlock (typically within a few seconds), chooses one of the sessions involved as the deadlock victim, and terminates its transaction with the following error:

    Msg 1205, Level 13, State 51, Line 3
    Transaction (Process ID 57) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

Deadlocks are expensive because they involve undoing work that has already been done and then, usually with some error-handling logic, redoing the work.

  • Obviously, the longer the transactions are, the longer locks are kept, increasing the probability of deadlocks.

    • Keep transactions as short as possible, taking activities out of the transaction that aren’t logically supposed to be part of the same unit of work. For example, don’t use transactions that require user input to finish!

  • One typical deadlock, also called a deadly embrace deadlock (such as the example above), happens when transactions access resources in inverse order.

    • By swapping the order in one of the transactions, you can prevent this type of deadlock from happening—assuming that it makes no logical difference to your application.

  • Deadlocks often also happen when there is no real logical conflict (for example, trying to access the same rows), because of a lack of good indexing to support query filters.

    For example, suppose both statements in the transaction in Connection 2 filtered product 5.

    • Now that the statements in Connection 1 handle product 2 and the statements in Connection 2 handle product 5, there shouldn’t be any conflict.

    • However, if indexes aren’t defined on the productid column in the tables to support the filter, SQL Server has to scan (and lock) all rows in the table, which, of course, can lead to a deadlock.

      While scanning the entire table, both connections might attempt to acquire locks on the same or overlapping table pages, leading to a deadlock situation even though they are working with different product data.

    • In short, good index design can help mitigate the occurrences of deadlocks that have no real logical conflict.
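For example (a sketch, with an illustrative index name), an index on the filtered column lets each transaction lock only the rows it actually touches instead of scanning the whole table:

```sql
-- Support WHERE productid = ... filters so full scans (and the
-- broad locking they entail) are avoided; the name is hypothetical
CREATE NONCLUSTERED INDEX idx_nc_productid
  ON Sales.OrderDetails(productid);
```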

  • Another option to consider to mitigate deadlock occurrences is the choice of isolation level.

    • The SELECT statements in the example needed shared locks because they ran under the READ COMMITTED isolation level.

    • If using the READ COMMITTED SNAPSHOT isolation level, readers will not need shared locks, and deadlocks that evolve because of the involvement of shared locks can be eliminated.

13. Programmable Objects

Programmable objects in SQL Server are reusable code blocks that extend the capabilities of the database beyond basic data storage and retrieval.

13.1. Variables

A variable is used to temporarily store data values for later use in the same batch in which they were declared, and a batch is one or more T-SQL statements sent to Microsoft SQL Server for execution as a single unit.

  • Use a DECLARE statement to declare one or more variables, and use a SET statement to assign a value to a single variable.

    -- declares two variables called @i and @j of the INT data type
    DECLARE @i AS INT, @j AS INT;
    
    -- a SET statement can operate on only one variable at a time
    SET @i = 10;
    SET @j = 20;
  • Alternatively, a variable can be declared and initialized in the same statement, like this:

    DECLARE @i AS INT = 10, @j AS INT = 20;
  • When assigning a value to a scalar variable, the value must be the result of a scalar expression.

    DECLARE @empname AS NVARCHAR(61);
    
    -- a scalar subquery
    SET @empname = (SELECT firstname + N' ' + lastname
                    FROM HR.Employees
                    WHERE empid = 3);
    -- a scalar subquery fails at run time if it returns more than one value
    DECLARE @empname AS NVARCHAR(61);
    
    SET @empname = (SELECT firstname + N' ' + lastname
                    FROM HR.Employees
                    WHERE mgrid = 2);
    Msg 512, Level 16, State 1, Line 2
    Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
  • T-SQL also supports a nonstandard assignment SELECT statement to query data and assign multiple values obtained from the same row to multiple variables by using a single statement.

    DECLARE @firstname AS NVARCHAR(20), @lastname AS NVARCHAR(40);
    
    -- if the query has more than one qualifying row, the values in the variables are those from the last row that SQL Server happened to access.
    SELECT
      @firstname = firstname,
      @lastname  = lastname
    FROM HR.Employees
    WHERE empid = 3;
    
    SELECT @firstname AS firstname, @lastname AS lastname;

13.2. Batches

A batch is one or more T-SQL statements sent by a client application to SQL Server for execution as a single unit.

A transaction is an atomic unit of work. A batch can have multiple transactions, and a transaction can be submitted in parts as multiple batches.
  • Client application programming interfaces (APIs) such as ADO.NET provide methods for submitting a batch of code to SQL Server for execution.

  • SQL Server utilities such as SQL Server Management Studio (SSMS), Azure Data Studio (ADS), SQLCMD, and OSQL provide a client tool command called GO that signals the end of a batch.

Note that the GO command is a client tool command and not a T-SQL server command; do not terminate the GO command with a semicolon.
  • A batch is a set of commands that are parsed and executed as a unit.

    • If the parsing is successful, SQL Server then attempts to execute the batch.

    • In the event of a syntax error in the batch, the whole batch is not submitted to SQL Server for execution.

      -- Valid batch
      PRINT 'First batch';
      USE TSQLV6;
      GO
      -- Invalid batch
      PRINT 'Second batch';
      SELECT custid FROM Sales.Customers;
      SELECT orderid FOM Sales.Orders;
      GO
      -- Valid batch
      PRINT 'Third batch';
      SELECT empid FROM HR.Employees;
      First batch
      Msg 102, Level 15, State 1, Line 8
      Incorrect syntax near 'Sales'.
      Third batch
  • A variable is local to the batch in which it’s defined, that is, it can’t be referred to in another batch.

    DECLARE @i AS INT = 10;
    -- Succeeds
    PRINT @i;
    GO
    
    -- Fails
    PRINT @i;
    10
    Msg 137, Level 15, State 2, Line 6
    Must declare the scalar variable "@i".
  • Certain CREATE statements cannot be combined with other statements in the same batch.

    -- CREATE DEFAULT, CREATE FUNCTION, CREATE PROCEDURE, CREATE RULE, CREATE SCHEMA, CREATE TRIGGER, and CREATE VIEW
    DROP VIEW IF EXISTS Sales.MyView;
    -- GO -- To get around the problem, add a GO command here
    
    CREATE VIEW Sales.MyView
    AS
    SELECT YEAR(orderdate) AS orderyear, COUNT (*) AS numorders
    FROM Sales.Orders
    GROUP BY YEAR(orderdate);
    GO
    Msg 111, Level 15, State 1, Line 3
    'CREATE VIEW' must be the first statement in a query batch.
  • A batch is a unit of resolution (also known as binding), which means that checking the existence of objects and columns happens at the batch level.

    If you apply schema changes to an object and try to manipulate the object’s data in the same batch, SQL Server might not be aware of the schema changes yet and fail the data-manipulation statement with a resolution error.

    DROP TABLE IF EXISTS dbo.T1;
    CREATE TABLE dbo.T1(col1 INT);
    GO
    
    ALTER TABLE dbo.T1 ADD col2 INT;
    -- GO -- To get around the problem, add a GO command here
    SELECT col1, col2 FROM dbo.T1; -- Invalid column name 'col2'.
    GO
    
    DROP TABLE IF EXISTS dbo.T1;
  • The GO command is not really a T-SQL command; it’s actually a command used by SQL Server’s client tools, such as SSMS, to denote the end of a batch. It also supports an argument indicating how many times you want to execute the batch.

    DROP TABLE IF EXISTS dbo.T1;
    CREATE TABLE dbo.T1(col1 INT IDENTITY);
    GO
    
    SET NOCOUNT ON;
    INSERT INTO dbo.T1 DEFAULT VALUES;
    GO 100
    
    SELECT SUM(col1) FROM dbo.T1; -- (1 + 100) * 100 / 2 = 5050
    GO
    
    DROP TABLE IF EXISTS dbo.T1;

13.3. Flow Elements: IF and WHILE

T-SQL provides basic control-of-flow elements to control the flow of the code, including the IF . . . ELSE element and the WHILE element.

  • The IF . . . ELSE element is used to control the flow of a code based on the result of a predicate.

    IF YEAR(SYSDATETIME()) <> YEAR(DATEADD(day, 1, SYSDATETIME()))
    -- a statement or statement block that is executed if the predicate is TRUE
       PRINT 'Today is the last day of the year.';
    -- optionally a statement or statement block that is executed if the predicate is FALSE or UNKNOWN.
    ELSE
    BEGIN -- mark the boundaries of a statement block with the BEGIN and END keywords
      IF MONTH(SYSDATETIME()) <> MONTH(DATEADD(day, 1, SYSDATETIME()))
        PRINT 'Today is the last day of the month but not the last day of the year.';
      ELSE
        PRINT 'Today is not the last day of the month.';
    END
    DECLARE @score AS INT = CAST(RAND() * 100 AS INT);
    
    IF @score > 90
      PRINT 'A';
    ELSE IF @score > 80
      PRINT 'B';
    ELSE IF @score > 70
      PRINT 'C';
    ELSE IF @score > 60
      PRINT 'D';
    ELSE
      PRINT 'F';
    -- CASE is an expression, not a statement
    DECLARE @score AS INT = CAST(RAND() * 100 AS INT);
    
    SELECT CASE
      WHEN @score >= 90 THEN 'A'
      WHEN @score >= 80 THEN 'B'
      WHEN @score >= 70 THEN 'C'
      WHEN @score >= 60 THEN 'D'
      ELSE 'F'
    END AS Grade;
  • The WHILE element executes a statement or statement block repeatedly while the predicate specified after the WHILE keyword is TRUE; the loop terminates when the predicate is FALSE or UNKNOWN.

    DECLARE @i AS INT = 0;
    WHILE @i <= 10
    BEGIN
      SET @i = @i + 1;
      IF @i = 3 CONTINUE; -- skip the rest of the activity in the current iteration and evaluate the loop’s predicate again
      PRINT @i;
      IF @i = 7 BREAK;    -- break out of the current loop and proceed to execute the statement that appears after the loop’s body
    END;

13.4. Cursors

SQL and T-SQL also support an object called a cursor, which processes the rows from a query result one at a time and in a requested order.

  • Most fundamentally, the use of cursors contradicts the principles of the relational model, which is grounded in set theory.

  • Cursors, due to their record-by-record processing, incur significant overhead compared to set-based operations, resulting in significantly slower execution times even for similar underlying physical processing.

  • Cursor solutions, being imperative, tend to be longer, less readable, and harder to maintain than the declarative set solutions.

Working with a cursor generally involves the following steps:

  1. Declare the cursor based on a query.

  2. Open the cursor.

  3. Fetch attribute values from the first cursor record into variables.

  4. While not reaching the end of the cursor (the value of a function called @@FETCH_STATUS is 0), loop through the cursor records.

    In each iteration of the loop, perform the processing needed for the current row, and then fetch the attribute values from the next row into the variables.

  5. Close the cursor.

  6. Deallocate the cursor.

    DROP VIEW IF EXISTS Sales.CustOrders;
    GO
    
    CREATE VIEW Sales.CustOrders
    AS
      SELECT O.custid, O.orderdate AS ordermonth, SUM(OD.qty) AS qty
      FROM Sales.Orders AS O
      INNER JOIN Sales.OrderDetails AS OD
        ON O.orderid = OD.orderid
      GROUP BY O.custid, O.orderdate;
    GO
    -- Suppress messages indicating how many rows were affected
    SET NOCOUNT ON;
    
    -- Declare table variable to hold the final result
    DECLARE @Result AS TABLE
    (
      custid     INT,
      ordermonth DATE,
      qty        INT,
      runqty     INT,
      PRIMARY KEY(custid, ordermonth)
    );
    
    -- Declare local variables that are used to store intermediate values
    DECLARE
      @custid     AS INT,
      @prvcustid  AS INT,
      @ordermonth AS DATE,
      @qty        AS INT,
      @runqty     AS INT;
    
    -- Step 1: Declare the cursor based on a query
    DECLARE C CURSOR FAST_FORWARD /* read only, forward only */ FOR
      SELECT custid, ordermonth, qty
      FROM Sales.CustOrders
      ORDER BY custid, ordermonth;
    
    -- Step 2: Open the cursor
    OPEN C;
    
    -- Step 3: Fetch attribute values from the first cursor record into variables
    FETCH NEXT FROM C INTO @custid, @ordermonth, @qty;
    
    -- Initialize variables
    SELECT @prvcustid = @custid, @runqty = 0;
    
    -- Step 4: Loop through the cursor records while the last fetch was successful
    --         In each iteration:
    --            Reset variables if customer ID changes
    --            Compute current running total and insert into table
    --            Fetch next cursor record
    WHILE @@FETCH_STATUS = 0
    BEGIN
      IF @custid <> @prvcustid
        SELECT @prvcustid = @custid, @runqty = 0;
    
      SET @runqty = @runqty + @qty;
    
      INSERT INTO @Result VALUES(@custid, @ordermonth, @qty, @runqty);
    
      FETCH NEXT FROM C INTO @custid, @ordermonth, @qty;
    END;
    
    -- Step 5: Close the cursor
    CLOSE C;
    
    -- Step 6: Deallocate the cursor
    DEALLOCATE C;
    
    -- Enable showing messages indicating how many rows were affected
    SET NOCOUNT OFF;
    
    -- Query the table variable to return the final result
    SELECT
      custid,
      CONVERT(VARCHAR(7), ordermonth, 121) AS ordermonth,
      qty,
      runqty
    FROM @Result
    ORDER BY custid, ordermonth;
    -- address the same task with a window function
    SELECT custid, ordermonth, qty,
      SUM(qty) OVER(PARTITION BY custid
                    ORDER BY ordermonth
                    ROWS UNBOUNDED PRECEDING) AS runqty
    FROM Sales.CustOrders
    ORDER BY custid, ordermonth;

13.5. Temporary Tables

Temporary tables are temporary storage structures within a SQL Server database. Unlike permanent tables, they are designed for short-term data storage and have limited lifespans.

SQL Server supports three kinds of temporary tables that are more convenient to work with than permanent tables in such cases: local temporary tables, global temporary tables, and table variables. All three kinds of temporary tables are created in the tempdb database.

It’s crucial to distinguish temporary tables (local, global, table variables) from system-versioned temporal tables.

  • Temporary Tables: Primarily used for temporary storage within a specific session or batch, often for intermediate results or data manipulation.

  • System-Versioned Temporal Tables: Specifically designed to track the history of data changes over time.

13.5.1. Local Temporary Tables

A local temporary table is created by naming it with a single number sign (#) as a prefix, such as #T1.

  • A local temporary table is visible only to the session that created it, in the creating level and all inner levels in the call stack (inner procedures, triggers, and dynamic batches).

  • A local temporary table is destroyed automatically by SQL Server when the creating level in the call stack goes out of scope.

  • A suffix is added to the table name by SQL Server internally that makes it unique in tempdb.

    DROP TABLE IF EXISTS #MyOrderTotalsByYear;
    GO
    
    CREATE TABLE #MyOrderTotalsByYear
    (
      orderyear INT NOT NULL PRIMARY KEY,
      qty       INT NOT NULL
    );
    
    INSERT INTO #MyOrderTotalsByYear(orderyear, qty)
      SELECT
        YEAR(O.orderdate) AS orderyear,
        SUM(OD.qty) AS qty
      FROM Sales.Orders AS O
        INNER JOIN Sales.OrderDetails AS OD
          ON OD.orderid = O.orderid
      GROUP BY YEAR(orderdate);
    
    SELECT Cur.orderyear, Cur.qty AS curyearqty, Prv.qty AS prvyearqty
    FROM #MyOrderTotalsByYear AS Cur
      LEFT OUTER JOIN #MyOrderTotalsByYear AS Prv
        ON Cur.orderyear = Prv.orderyear + 1;
    -- clean up resources as soon as possible
    DROP TABLE IF EXISTS #MyOrderTotalsByYear;

13.5.2. Global Temporary Tables

A global temporary table is created by naming it with a double number sign (##) as a prefix, such as ##T1.

  • A global temporary table is visible to all other sessions.

  • A global temporary table is destroyed automatically by SQL Server when the creating session disconnects and there are no active references to the table.

    -- creates a global temporary table called ##Globals with columns called id and val
    CREATE TABLE ##Globals
    (
      id  sysname     NOT NULL PRIMARY KEY,
      val SQL_VARIANT NOT NULL
    );
    -- anyone can insert rows into the table.
    INSERT INTO ##Globals(id, val) VALUES(N'I', CAST(10 AS INT));
    -- anyone can modify and retrieve data from the table.
    SELECT val FROM ##Globals WHERE id = N'I';
    -- explicitly destroy the global temporary table
    DROP TABLE IF EXISTS ##Globals;

13.5.3. Table Variables

A table variable is a local, temporary table-like data structure declared within a single batch.

  • As with local temporary tables, table variables have a physical presence as a table in the tempdb database.

  • Like local temporary tables, table variables are visible only to the creating session, but because they are variables they have a more limited scope: only the current batch.

  • If an explicit transaction is rolled back, changes made to temporary tables in that transaction are rolled back as well; however, changes made to table variables by statements that completed in the transaction aren’t rolled back.

  • In terms of performance, usually it makes more sense to use table variables with small volumes of data (only a few rows) and to use local temporary tables otherwise.

    DECLARE @MyOrderTotalsByYear TABLE
    (
      orderyear INT NOT NULL PRIMARY KEY,
      qty       INT NOT NULL
    );
    
    INSERT INTO @MyOrderTotalsByYear(orderyear, qty)
      SELECT
        YEAR(O.orderdate) AS orderyear,
        SUM(OD.qty) AS qty
      FROM Sales.Orders AS O
        INNER JOIN Sales.OrderDetails AS OD
          ON OD.orderid = O.orderid
      GROUP BY YEAR(orderdate);
    
    SELECT Cur.orderyear, Cur.qty AS curyearqty, Prv.qty AS prvyearqty
    FROM @MyOrderTotalsByYear AS Cur
      LEFT OUTER JOIN @MyOrderTotalsByYear AS Prv
        ON Cur.orderyear = Prv.orderyear + 1;

A table type is a user-defined data structure that defines the schema (columns, data types) of a table to be reused as the table definition of table variables and input parameters of stored procedures and user-defined functions.

-- create a table type called dbo.OrderTotalsByYear
DROP TYPE IF EXISTS dbo.OrderTotalsByYear;

CREATE TYPE dbo.OrderTotalsByYear AS TABLE
(
  orderyear INT NOT NULL PRIMARY KEY,
  qty       INT NOT NULL
);
-- simply specify dbo.OrderTotalsByYear as the variable’s type
DECLARE @MyOrderTotalsByYear AS dbo.OrderTotalsByYear;
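
A table type can also be used as the type of a table-valued parameter of a stored procedure. The procedure below is a sketch (its name and body are illustrative):

-- a table-valued parameter must be marked READONLY
CREATE OR ALTER PROC dbo.GetTotalQty
  @Totals AS dbo.OrderTotalsByYear READONLY
AS
SELECT SUM(qty) AS totalqty FROM @Totals;
GO

DECLARE @T AS dbo.OrderTotalsByYear;
INSERT INTO @T(orderyear, qty) VALUES(2021, 100), (2022, 150);
EXEC dbo.GetTotalQty @Totals = @T; -- returns totalqty = 250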

13.6. Dynamic SQL

Dynamic SQL in SQL Server refers to a batch of T-SQL code constructed as a character string and executed by using the EXEC command or the sp_executesql stored procedure.

  • The EXEC (short for EXECUTE) command accepts a regular or Unicode character string in parentheses as input and executes the batch of code within the character string.

    DECLARE @sql AS VARCHAR(100);
    SET @sql = 'PRINT ''This message was printed by a dynamic SQL batch'''
    EXEC(@sql);
  • The sp_executesql stored procedure is an alternative tool to the EXEC command for executing dynamic SQL code.

    • It’s more secure and more flexible in the sense that it has an interface; that is, it supports input and output parameters.

      In terms of security, parameters that appear in the code cannot be considered part of the code—they can only be considered operands in expressions.

    • Note that unlike EXEC, sp_executesql supports only Unicode character strings as the input batch of code.

    • The sp_executesql stored procedure can perform better than EXEC because its parameterization aids in reusing cached execution plans, avoiding the cost that SQL Server incurs when it needs to create a plan anew.

      • An execution plan is the physical processing plan SQL Server produces for a query, with the set of instructions describing which objects to access, in what order, which indexes to use, how to access them, which join algorithms to use, and so on.

      • One of the requirements for reusing a previously cached plan is that the query string be the same as the one for which the cached plan was created.

      DECLARE @sql AS NVARCHAR(100);
      
      SET @sql = N'SELECT orderid, custid, empid, orderdate
      FROM Sales.Orders
      WHERE orderid = @orderid;';
      
      EXEC sp_executesql
        @stmt = @sql, -- a Unicode character string holding the batch of code
        @params = N'@orderid AS INT', -- a Unicode character string holding the declarations of input and output parameters
        @orderid = 10248; -- an input parameter called @orderid
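
      sp_executesql also supports output parameters. The sketch below returns a count into a variable (the query and parameter names are illustrative):

      DECLARE @sql2 AS NVARCHAR(200), @cnt AS INT;

      SET @sql2 = N'SELECT @n = COUNT(*) FROM Sales.Orders WHERE custid = @custid;';

      EXEC sp_executesql
        @stmt   = @sql2,
        @params = N'@custid AS INT, @n AS INT OUTPUT',
        @custid = 1,
        @n      = @cnt OUTPUT; -- OUTPUT must appear in both the declaration and the call

      SELECT @cnt AS numorders;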

13.7. Routines

Routines are programmable objects that encapsulate code to calculate a result or to execute activity. SQL Server supports three types of routines: user-defined functions, stored procedures, and triggers.

13.7.1. User-defined Functions

A user-defined function (UDF) is used to encapsulate logic that calculates something, possibly based on input parameters, and returns a result. SQL Server supports scalar and table-valued UDFs.

  • Scalar UDFs return a single value; table-valued UDFs return a table.

  • Scalar UDFs can appear anywhere in the query where an expression that returns a single value can appear (for example, in the SELECT list).

  • Table UDFs can appear in the FROM clause of a query.

  • UDFs are not allowed to have side effects, which means they are not allowed to apply any schema or data changes in the database.

    CREATE OR ALTER FUNCTION dbo.GetNewID()
    RETURNS UNIQUEIDENTIFIER
    AS
    BEGIN
      RETURN NEWID(); -- Invalid use of a side-effecting operator 'newid' within a function.
    END;
    -- create a UDF called dbo.GetAge that returns the age of a person
    DROP FUNCTION IF EXISTS dbo.GetAge;
    GO
    
    CREATE OR ALTER FUNCTION dbo.GetAge
    (
      @birthdate AS DATE, -- a specified birth date (@birthdate argument)
      @eventdate AS DATE  -- a specified event date (@eventdate argument)
    )
    RETURNS INT
    AS
    
    BEGIN
      RETURN -- a RETURN clause that returns a value
        DATEDIFF(year, @birthdate, @eventdate)
        - CASE WHEN 100 * MONTH(@eventdate) + DAY(@eventdate)
                  < 100 * MONTH(@birthdate) + DAY(@birthdate)
               THEN 1 ELSE 0
          END;
    END;
    GO
    
    -- use a UDF in a query
    SELECT
      empid, firstname, lastname, birthdate,
      dbo.GetAge(birthdate, SYSDATETIME()) AS age
    FROM HR.Employees;
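
    A table-valued UDF can be defined inline as a single query. The sketch below (the function name is illustrative; the tables are from the sample database) returns the orders of a given customer:

    CREATE OR ALTER FUNCTION dbo.GetCustOrders(@custid AS INT) RETURNS TABLE
    AS
    RETURN
      SELECT orderid, custid, empid, orderdate
      FROM Sales.Orders
      WHERE custid = @custid;
    GO

    -- use a table-valued UDF in the FROM clause
    SELECT orderid, orderdate
    FROM dbo.GetCustOrders(1) AS O;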

13.7.2. Stored Procedures

Stored procedures are routines that encapsulate logic with input and output parameters, return result sets of queries, and can have side effects.

  • Stored procedures encapsulate logic, allowing for centralized modification and ensuring all users utilize the updated implementation.

  • Stored procedures give better control of security.

    • A user can be granted permission to execute a procedure without being granted direct permissions to perform the underlying activities, ensuring that all the required validations and auditing always take place.

    • Stored procedures with parameters can help prevent SQL injection.

  • All error-handling code can be incorporated within a procedure, silently taking corrective action where relevant.

  • Stored procedures give performance benefits.

    • Parameterized queries within stored procedures enhance performance by leveraging cached execution plans.

    • Stored procedures reduce network traffic by minimizing data exchange between the client and server.

    -- create a stored procedure called Sales.GetCustomerOrders
    CREATE OR ALTER PROC Sales.GetCustomerOrders
      -- a customer ID (@custid) and a date range (@fromdate and @todate) as inputs
      @custid   AS INT,
      @fromdate AS DATETIME = '19000101', -- default 19000101
      @todate   AS DATETIME = '99991231', -- default 99991231
      @numrows  AS INT OUTPUT -- the number of affected rows (@numrows) as an output
    AS
    SET NOCOUNT ON; -- suppress messages indicating affected rows by DML
    
    SELECT orderid, custid, empid, orderdate
    FROM Sales.Orders
    WHERE custid = @custid
      AND orderdate >= @fromdate
      AND orderdate < @todate;
    
    SET @numrows = @@rowcount;
    -- execute the procedure, and absorb the value of the output parameter @numrows in the variable @rc
    DECLARE @rc AS INT;
    
    EXEC Sales.GetCustomerOrders
      @custid   = 1,
      @fromdate = '20210101',
      @todate   = '20220101',
      @numrows  = @rc OUTPUT;
    
    SELECT @rc AS numrows;

13.7.3. Triggers

A trigger is a special kind of stored procedure attached to an event—one that cannot be executed explicitly.

  • SQL Server supports the association of triggers with two kinds of events: data manipulation events (DML triggers) such as INSERT, and data definition events (DDL triggers) such as CREATE TABLE.

  • A trigger is considered part of the transaction that includes the event that caused the trigger to fire.

  • Triggers in SQL Server fire per statement and not per modified row.

13.7.3.1. DML triggers

SQL Server supports two kinds of DML triggers: after and instead of.

  • An after trigger fires after the event it’s associated with finishes and can be defined only on permanent tables.

  • An instead of trigger fires instead of the event it’s associated with and can be defined on permanent tables and views.

In the trigger’s code, you can access pseudo tables called inserted and deleted, which contain the rows that were affected by the modification that caused the trigger to fire.

  • The inserted table holds the new image of the affected rows in the case of INSERT and UPDATE actions.

  • The deleted table holds the old image of the affected rows in the case of DELETE and UPDATE actions.

  • In the case of instead of triggers, the inserted and deleted tables contain the rows that were supposed to be affected by the modification that caused the trigger to fire.
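
For instance, an instead of trigger that suppresses deletes might look like the following sketch (the table dbo.T1 and the trigger name are illustrative):

CREATE OR ALTER TRIGGER trg_T1_instead_delete ON dbo.T1 INSTEAD OF DELETE
AS
SET NOCOUNT ON;

-- the deleted pseudo table holds the rows that were supposed to be deleted
PRINT CAST((SELECT COUNT(*) FROM deleted) AS VARCHAR(10)) + ' delete(s) suppressed.';
GO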

-- create a table called dbo.T1, and a table called dbo.T1_Audit
DROP TABLE IF EXISTS dbo.T1_Audit, dbo.T1;

CREATE TABLE dbo.T1
(
  keycol  INT         NOT NULL PRIMARY KEY,
  datacol VARCHAR(10) NOT NULL
);

CREATE TABLE dbo.T1_Audit
(
  audit_lsn  INT          NOT NULL IDENTITY PRIMARY KEY, -- audit log serial number
  dt         DATETIME2(3) NOT NULL DEFAULT(SYSDATETIME()),
  login_name sysname      NOT NULL DEFAULT(ORIGINAL_LOGIN()),
  keycol     INT          NOT NULL,
  datacol    VARCHAR(10)  NOT NULL
);
GO

-- create the AFTER INSERT trigger trg_T1_insert_audit on the T1 table to audit insertions
CREATE OR ALTER TRIGGER trg_T1_insert_audit ON dbo.T1 AFTER INSERT
AS
SET NOCOUNT ON;

INSERT INTO dbo.T1_Audit(keycol, datacol)
  SELECT keycol, datacol FROM inserted;
GO

-- trigger fires after each statement
INSERT INTO dbo.T1(keycol, datacol) VALUES(10, 'a');
INSERT INTO dbo.T1(keycol, datacol) VALUES(30, 'x');
INSERT INTO dbo.T1(keycol, datacol) VALUES(20, 'g');
GO

SELECT audit_lsn, dt, login_name, keycol, datacol
FROM dbo.T1_Audit;
-- 1	2025-01-15 16:03:50.997	sa	10	a
-- 2	2025-01-15 16:03:51.004	sa	30	x
-- 3	2025-01-15 16:03:51.008	sa	20	g
GO

-- cleanup
DROP TABLE dbo.T1_Audit, dbo.T1;

13.7.3.2. DDL triggers

SQL Server supports DDL triggers, which can be used for purposes such as auditing, policy enforcement, and change management.

  • The on-premises SQL Server product supports the creation of DDL triggers at two scopes, the database scope and the server scope, depending on the scope of the event.

  • Azure SQL Database currently supports only database-scoped DDL triggers.

  • SQL Server supports only after DDL triggers; it doesn’t support instead of DDL triggers.

  • Within the trigger, information about the event that caused the trigger to fire can be obtained by querying a function called EVENTDATA, which returns the event information as an XML instance.

-- creates the dbo.AuditDDLEvents table to hold the audit information
DROP TABLE IF EXISTS dbo.AuditDDLEvents;

CREATE TABLE dbo.AuditDDLEvents
(
  audit_lsn        INT          NOT NULL IDENTITY,
  posttime         DATETIME2(3) NOT NULL,
  eventtype        sysname      NOT NULL,
  loginname        sysname      NOT NULL,
  schemaname       sysname      NOT NULL,
  objectname       sysname      NOT NULL,
  targetobjectname sysname      NULL,
  eventdata        XML          NOT NULL,
  CONSTRAINT PK_AuditDDLEvents PRIMARY KEY(audit_lsn)
);
GO

-- create the trg_audit_ddl_events audit trigger on the database by using the event group DDL_DATABASE_LEVEL_EVENTS
CREATE OR ALTER TRIGGER trg_audit_ddl_events
  ON DATABASE FOR DDL_DATABASE_LEVEL_EVENTS
AS
SET NOCOUNT ON;

DECLARE @eventdata AS XML = eventdata();
INSERT INTO dbo.AuditDDLEvents(
  posttime, eventtype, loginname, schemaname,
  objectname, targetobjectname, eventdata)
  VALUES(
    @eventdata.value('(/EVENT_INSTANCE/PostTime)[1]', 'VARCHAR(23)'), -- XQuery expressions
    @eventdata.value('(/EVENT_INSTANCE/EventType)[1]', 'sysname'),
    @eventdata.value('(/EVENT_INSTANCE/LoginName)[1]', 'sysname'),
    @eventdata.value('(/EVENT_INSTANCE/SchemaName)[1]', 'sysname'),
    @eventdata.value('(/EVENT_INSTANCE/ObjectName)[1]', 'sysname'),
    @eventdata.value('(/EVENT_INSTANCE/TargetObjectName)[1]', 'sysname'),
    @eventdata);
GO

-- test the trigger
CREATE TABLE dbo.T1(col1 INT NOT NULL PRIMARY KEY);
ALTER TABLE dbo.T1 ADD col2 INT NULL;
ALTER TABLE dbo.T1 ALTER COLUMN col2 INT NOT NULL;
CREATE NONCLUSTERED INDEX idx1 ON dbo.T1(col2);
GO

SELECT * FROM dbo.AuditDDLEvents;
GO

-- cleanup
DROP TRIGGER IF EXISTS trg_audit_ddl_events ON DATABASE;
DROP TABLE IF EXISTS dbo.AuditDDLEvents, dbo.T1;

13.8. Error Handling: TRY. . .CATCH

The TRY. . .CATCH construct in SQL Server handles errors by placing the usual T-SQL code in a TRY block and placing all the error-handling code in the adjacent CATCH block.

  • If the TRY block has no error, the CATCH block is simply skipped.

  • If the TRY block has an error, control is passed to the corresponding CATCH block.

    BEGIN TRY   -- TRY block (between the BEGIN TRY and END TRY keywords)
      PRINT 10/0;
      PRINT 'No error';
    END TRY
    BEGIN CATCH -- CATCH block (between the BEGIN CATCH and END CATCH keywords)
      PRINT '    Error Message : ' + ERROR_MESSAGE();
      PRINT '    Error Line    : ' + CAST(ERROR_LINE() AS VARCHAR(10));
    END CATCH;

SQL Server also provides a set of functions to get information about the error.

  • The ERROR_NUMBER function returns an integer with the number of the error.

  • The ERROR_MESSAGE function returns error-message text.

    To get the list of error numbers and messages, query the sys.messages catalog view.

  • The ERROR_SEVERITY and ERROR_STATE functions return the error severity and state.

  • The ERROR_LINE function returns the line number in the code where the error happened.

  • The ERROR_PROCEDURE function returns the name of the procedure in which the error happened and returns NULL if the error did not happen within a procedure.

    -- create a table called dbo.Employees
    DROP TABLE IF EXISTS dbo.Employees;
    
    CREATE TABLE dbo.Employees
    (
      empid   INT         NOT NULL,
      empname VARCHAR(25) NOT NULL,
      mgrid   INT         NULL,
      CONSTRAINT PK_Employees PRIMARY KEY(empid),
      CONSTRAINT CHK_Employees_empid CHECK(empid > 0),
      CONSTRAINT FK_Employees_Employees
        FOREIGN KEY(mgrid) REFERENCES dbo.Employees(empid)
    );
    BEGIN TRY
      INSERT INTO dbo.Employees(empid, empname, mgrid)
        VALUES(1, 'Emp1', NULL);
      -- Also try with empid = 0, 'A', NULL
    END TRY
    BEGIN CATCH
      IF ERROR_NUMBER() = 2627
      BEGIN
        PRINT '    Handling PK violation...';
      END;
      ELSE IF ERROR_NUMBER() = 547
      BEGIN
        PRINT '    Handling CHECK/FK constraint violation...';
      END;
      ELSE IF ERROR_NUMBER() = 515
      BEGIN
        PRINT '    Handling NULL violation...';
      END;
      ELSE IF ERROR_NUMBER() = 245
      BEGIN
        PRINT '    Handling conversion error...';
      END;
      ELSE
      BEGIN
        PRINT 'Re-throwing error...';
        THROW; -- re-throws the error
      END;
      PRINT '    Error Number  : ' + CAST(ERROR_NUMBER() AS VARCHAR(10));
      PRINT '    Error Message : ' + ERROR_MESSAGE();
      PRINT '    Error Severity: ' + CAST(ERROR_SEVERITY() AS VARCHAR(10));
      PRINT '    Error State   : ' + CAST(ERROR_STATE() AS VARCHAR(10));
      PRINT '    Error Line    : ' + CAST(ERROR_LINE() AS VARCHAR(10));
      PRINT '    Error Proc    : ' + COALESCE(ERROR_PROCEDURE(), 'Not within a procedure');
    END CATCH;
    -- a stored procedure that encapsulates reusable error-handling code
    CREATE OR ALTER PROC dbo.ErrInsertHandler
    AS
    SET NOCOUNT ON;
    IF ERROR_NUMBER() = 2627
    BEGIN
      PRINT 'Handling PK violation...';
    END;
    ELSE IF ERROR_NUMBER() = 547
    BEGIN
      PRINT 'Handling CHECK/FK constraint violation...';
    END;
    ELSE IF ERROR_NUMBER() = 515
    BEGIN
      PRINT 'Handling NULL violation...';
    END;
    ELSE IF ERROR_NUMBER() = 245
    BEGIN
      PRINT 'Handling conversion error...';
    END;
    PRINT 'Error Number  : ' + CAST(ERROR_NUMBER() AS VARCHAR(10));
    PRINT 'Error Message : ' + ERROR_MESSAGE();
    PRINT 'Error Severity: ' + CAST(ERROR_SEVERITY() AS VARCHAR(10));
    PRINT 'Error State   : ' + CAST(ERROR_STATE() AS VARCHAR(10));
    PRINT 'Error Line    : ' + CAST(ERROR_LINE() AS VARCHAR(10));
    PRINT 'Error Proc    : ' + COALESCE(ERROR_PROCEDURE(), 'Not within a procedure');
    GO
    
    BEGIN TRY
      INSERT INTO dbo.Employees(empid, empname, mgrid)
        VALUES(1, 'Emp1', NULL);
    END TRY
    BEGIN CATCH
      IF ERROR_NUMBER() IN (2627, 547, 515, 245)
        EXEC dbo.ErrInsertHandler;
      ELSE
        THROW;
    END CATCH;
    GO
    -- clean up
    DROP TABLE IF EXISTS dbo.Employees;
    DROP PROC IF EXISTS dbo.ErrInsertHandler;

Appendix A: Character Data Types

SQL Server supports two pairs of character data types: the regular types (CHAR and VARCHAR) and the N-kind types (NCHAR and NVARCHAR).

Each of the type pairs can support different character encoding systems, and can result in different encoded byte lengths and on-disk storage sizes, based on the effective collation of the data and the character code range in use.

  • The regular types, with a UTF-8 collation, use the UTF-8 encoding system and support the full range of Unicode characters.

  • The N-kind types, with a supplementary character (SC) collation, use the UTF-16 encoding system and support the full range of Unicode characters. Otherwise, they support only the subset of characters of the UCS-2 character encoding system.

  • In VARCHAR(size), size defines the maximum size in bytes; in NVARCHAR(size), size defines the maximum size in byte pairs.

  • Literals of the regular character types are delimited with single quotes (for example, 'abc'), while literals of the N-kind types require a preceding N (for example, N'abc').

  • Any data type without the VAR element (CHAR, NCHAR) in its name has a fixed length; that is, SQL Server reserves space in the row based on the column’s defined size and not on the actual data that is stored.

  • A data type with the VAR element (VARCHAR, NVARCHAR) in its name has a variable length; that is, SQL Server uses as much storage space in the row as is required to store the actual character string, plus two extra bytes for offset data.

    • Updates of variable-length data types can be less efficient than those of fixed-length types due to potential row expansion and data movement.

    • Variable-length data types can use the MAX specifier, allowing for up to 2GB per value.

    • Any value with a size up to a certain threshold (8,000 bytes by default) can be stored inline in the row. Any value with a size above the threshold is stored external to the row as a large object (LOB).

  • The collation is a property of character data that encapsulates several aspects: language support, sort order, case sensitivity, accent sensitivity, and more.

    -- get the set of supported collations and their descriptions
    SELECT name, description
    FROM sys.fn_helpcollations();
    • In an on-premises SQL Server implementation and in Azure SQL Managed Instance, collation can be defined at four different levels: instance, database, column, and expression. The lowest level at which a collation is specified is the effective one.

      SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation'); -- SQL_Latin1_General_CP1_CI_AS
    • In Azure SQL Database, collation can be defined at the database, column, and expression levels.
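
The storage and literal differences above can be demonstrated with DATALENGTH, and the expression-level collation with the COLLATE clause; a minimal sketch (variable names are illustrative):

```sql
-- Byte lengths of regular vs. N-kind literals
DECLARE @s VARCHAR(10)  = 'abc';   -- regular literal, 1 byte per character here
DECLARE @n NVARCHAR(10) = N'abc';  -- N-kind literal, 2 bytes per character (UCS-2/UTF-16)

SELECT
  DATALENGTH(@s) AS regular_bytes,  -- 3
  DATALENGTH(@n) AS nkind_bytes;    -- 6

-- Expression-level collation: force a case-sensitive comparison
SELECT CASE WHEN 'ABC' = 'abc' COLLATE Latin1_General_CS_AS
            THEN 'equal' ELSE 'not equal'
       END AS cs_comparison; -- 'not equal'
```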

Appendix B: Date and Time Data Types

T-SQL supports six date and time data types: DATETIME and SMALLDATETIME, which are considered legacy types, as well as DATE, TIME, DATETIME2, and DATETIMEOFFSET.

  • The legacy types DATETIME and SMALLDATETIME include date and time components that are inseparable.

    The two types differ in their storage requirements, their supported date range, and their precision.

  • The DATE and TIME data types provide a separation between the date and time components.

  • The DATETIME2 data type has a bigger date range and better precision than the legacy types.

  • The DATETIMEOFFSET data type is similar to DATETIME2, but it also includes the offset from UTC.
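
The six types can be compared side by side by casting the current date and time to each of them; a minimal sketch (column aliases are illustrative):

```sql
-- The same point in time expressed in each of the six date and time types
SELECT
  CAST(SYSDATETIME() AS DATETIME)      AS legacy_datetime,       -- 1/300-second precision
  CAST(SYSDATETIME() AS SMALLDATETIME) AS legacy_smalldatetime,  -- 1-minute precision
  CAST(SYSDATETIME() AS DATE)          AS date_only,             -- date component only
  CAST(SYSDATETIME() AS TIME)          AS time_only,             -- time component only
  SYSDATETIME()                        AS datetime2_value,       -- 100-nanosecond precision
  SYSDATETIMEOFFSET()                  AS datetimeoffset_value;  -- includes the offset from UTC
```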

T-SQL doesn’t provide a means to express a date and time literal; instead, you specify a literal of a different type (typically a character string) that can be converted, explicitly or implicitly, to a date and time data type.

-- SQL Server recognizes the literal '20220212' as a character-string literal of a VARCHAR
-- data type, and not as a date and time literal; however, because the expression involves
-- operands of two different types, one operand needs to be implicitly converted to the other’s type.
SELECT orderid, custid, empid, orderdate
FROM Sales.Orders
WHERE orderdate = '20220212'; -- equivalent to: WHERE orderdate = CAST('20220212' AS DATE);
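
When a string might not convert cleanly, the explicit conversion can be hedged with TRY_CAST (or TRY_CONVERT), which returns NULL on failure instead of raising an error; a minimal sketch:

```sql
-- Explicit conversion using the language-neutral 'YYYYMMDD' format;
-- TRY_CAST returns NULL rather than failing the query
SELECT
  CAST('20220212' AS DATE)     AS valid_date,    -- 2022-02-12
  TRY_CAST('20221302' AS DATE) AS invalid_date;  -- NULL (there is no month 13)
```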

References

  • [1] Itzik Ben-Gan, T-SQL Fundamentals (Developer Reference), 3rd edition, Microsoft Press, August 2016.