Subsections

10.2 SQL queries

The basic format of an SQL query is this:

SELECT columns
    FROM
tables
    WHERE
row_condition
    ORDER BY
columns
    GROUP BY
columns

This will select the named columns from the specified tables and return all rows matching the row_condition.


10.2.1 Selecting columns

The special character * selects all columns, otherwise only those columns named are included in the result. If more than one column name is given, the column names must be separated by commas.

SELECT *
    ...

SELECT colname
    ... 

SELECT colname1, colname2
    ...

The column name may be followed by a column alias and that alias can be used elsewhere in the query (e.g., in the WHERE clause.

SELECT colname colalias
    ...

If more than one table is included in the query, and the tables share a column with the same name, a column name must be preceded by the relevant table name, with a full stop in between.

SELECT tablename.colname
    ...

Functions and operators may be used to produce results that are calculated from the column. The set of functions that is provided varies widely between DBMS, but the normal mathematical operators for addition, subtraction, multiplication, and division, plus a set of basic aggregation functions for maximum value (MAX), minimum value (MIN), summation (SUM), and arithmetic mean (AVG) should always be available.

SELECT MAX(colname)
    ...

SELECT colname1 + colname2
    ...

A column name can also be a constant value (number or string), in which case the value is replicated for every row of the result.

10.2.2 Specifying tables: the FROM clause

The FROM clause must contain at least one table and all columns specified in the query must exist in at least one of the tables in the the FROM clause.

If a single table is specified, then the result is all rows of that table, subject to any filtering applied by a WHERE clause. A table name may be followed by an table alias, in which case, the alias may be used anywhere else in the query.

SELECT colname
    FROM tablename
    ...

SELECT talias.colname
    FROM tablename talias
    ...

If two tables are specified, separated only by a comma, the result is all possible combinations of the rows of the two tables (a cartesian product). This is known as a cross join.

SELECT ...
    FROM table1, table2
    ...

An inner join is created from a cross join by specifying a condition so that only rows that have matching values are returned (typically using a foreign key to match with a primary key). The condition may be specified within the WHERE clause, or as part of an INNER JOIN syntax as shown below.

SELECT ...
    FROM table1 INNER JOIN table2
        ON table1.primarykey = table2.foreignkey
    ...

An outer join extends the inner join by including in the result rows from one table that have no match in the other table. There are left outer joins (where rows are retained from the table named on the left of the join syntax), right outer joins, and full outer joins (where non-matching rows from both tables are retained).

SELECT ...
    FROM table1 LEFT OUTER JOIN table2
        ON table1.primarykey = table2.foreignkey
    ...

A self join is a join of a table with itself. This requires the use of table aliases.

SELECT ...
    FROM tablename alias1, tablename alias2
    ...

10.2.3 Selecting rows: the WHERE clause

By default, all rows from a table or from a combination of tables, are returned. However, if the WHERE clause is used to specify a condition, then only rows matching that condition will be returned.

Conditions may involve any column from any table that is included in the query. Conditions usually involve a comparison between a column and a constant value, or between columns. Valid comparison operators include: equality (=), greater-than or less-than (>, <, or equal-to, >=, <=), and inequality (!= or <>).

SELECT ...
    FROM ...
    WHERE colname = 0;

SELECT ...
    FROM ...
    WHERE column1 > column2;

Complex conditions can be constructed by combining simple conditions with logical operators: AND, OR, and NOT. Parentheses should be used to make the order of evaluation explicit.

SELECT ...
    FROM ...
    WHERE column1 = 0 AND 
          column2 != 0;

SELECT ...
    FROM ...
    WHERE NOT (stroke = 'IM' AND
               (distance = 50 OR 
                distance = 100));

For the case where a column can match several possible values, the special IN keyword can be used to specify a range of valid values.

SELECT ...
    FROM ...
    WHERE column1 IN ('value1', 'value2');

Comparison with string constants can be generalised to allow patterns using the special LIKE comparison operator. In this case, within the string constant, the underscore character, _, has a special meaning; it will match any single character. The percent character, %, is also special and it will match any number of characters of any sort.

SELECT ...
    FROM ...
    WHERE stroke LIKE '%stroke';

See Section 11.9 for more advanced pattern matching tools.

10.2.4 Sorting results: the ORDER BY clause

The order of the columns in the results of a query is based on the order of the column names in the query.

The order of the rows in a result is undetermined unless an ORDER BY clause is included in the query. The ORDER BY clause names the column to order results by followed by ASC for ascending and DESC for descending order.

SELECT ...
    FROM ...
    ORDER BY columnname ASC;

The results can be order by the values in several columns simply by specifying several column names, separated by commas. The results are ordered by the first column then, within identical values of the first column, rows are ordered by the second column, and so on.

SELECT ...
    FROM ...
    ORDER BY column1 ASC, column2 DESC;

10.2.5 Aggregating results: the GROUP BY clause

The aggregation functions MAX, MIN, SUM, and AVG (see Section 10.2.1) all return a single value from a column. If a GROUP BY clause is included in the query, aggregated values are reported for each unique value of the column specified in the GROUP BY clause.

SELECT column1, SUM(column2)
    FROM ...
    GROUP BY column1;

Results can be reported for combinations of unique values of several columns simply by naming several columns in the GROUP BY clause.

SELECT column1, column2, SUM(column3)
    FROM ...
    GROUP BY column1, column2;

The GROUP BY clause can include a HAVING sub-clause that works like the WHERE clause, but operates on the rows of aggregated results rather than the original rows.

SELECT column1, SUM(column2) counts
    FROM ...
    GROUP BY column1
        HAVING counts > 0;

10.2.6 Sub-queries

The result of an SQL query may be used as part of a larger query. The sub-query is placed within parentheses, but otherwise follows the same syntax as a normal query.

Sub-queries are used in place of table names within the FROM clause and to provide comparison values within a WHERE clause.

SELECT column1
    FROM table1
    WHERE column1 IN 
        ( SELECT column2
              FROM table2
                  ...  );

Paul Murrell

Creative Commons License
This document is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.