Nested Subqueries




 

SQL provides a mechanism for nesting subqueries. A subquery is a select-from where expression that is nested within another query. A common use of subqueries is to perform tests for set membership, make set comparisons, and determine set cardinality, by nesting subqueries in the where clause.

1.  Set Membership

SQL allows testing tuples for membership in a relation. The in connective tests
for set membership, where the set is a collection of values produced by a select
clause. The not in connective tests for the absence of set membership.

As an illustration, reconsider the query “Find all the courses taught in the
both the Fall 2009 and Spring 2010 semesters.” Earlier, we wrote such a query by
intersecting two sets: the set of courses taught in Fall 2009 and the set of courses
taught in Spring 2010. We can take the alternative approach of finding all courses
that were taught in Fall 2009 and that are also members of the set of courses
taught in Spring 2010. Clearly, this formulation generates the same results as the
previous one did, but it leads us to write our query using the in connective of SQL.
We begin by finding all courses taught in Spring 2010, and we write the subquery

(select course_id
from section
where semester = ’Spring’ and year= 2010)

We then need to find those courses that were taught in the Fall 2009 and that
appear in the set of courses obtained in the subquery. We do so by nesting the
subquery in the where clause of an outer query. The resulting query is

select distinct course_id
from section
where semester = ’Fall’ and year= 2009 and

course_id in (select course_id

from section
where semester = ’Spring’ and year= 2010);

This example shows that it is possible to write the same query several ways in
SQL. This flexibility is beneficial, since it allows a user to think about the query in
the way that seems most natural. We shall see that there is a substantial amount
of redundancy in SQL.

We use the not in construct in a way similar to the in construct. For example,
to find all the courses taught in the Fall 2009 semester but not in the Spring 2010
semester, we can write:

select distinct course_id
from section
where semester = ’Fall’ and year= 2009 and
course id not in (select course id

from section
where semester = ’Spring’ and year= 2010);

The in and not in operators can also be used on enumerated sets. The following query selects the names of instructors whose names are neither “Mozart” nor
“Einstein”.

select distinct name
from instructor
where name not in (’Mozart’, ’Einstein’);


In the preceding examples, we tested membership in a one-attribute relation.
It is also possible to test for membership in an arbitrary relation in SQL. For
example, we can write the query “find the total number of (distinct) students who
have taken course sections taught by the instructor with ID 110011” as follows:

select count (distinct ID)
from takes
where (course_id, sec_id, semester, year) in (select course_id, sec_id, semester, year

from teaches
where teaches.ID= 10101);

2. Set Comparison

As an example of the ability of a nested subquery to compare sets, consider the
query “Find the names of all instructors whose salary is greater than at least one
instructor in the Biology department.” We wrote this query previously as
follows:

select distinct T.name
from instructor as T, instructor as S
where T.salary > S.salary and S.dept_name = ’Biology’;

SQL does, however, offer an alternative style for writing the preceding query. The
phrase “greater than at least one” is represented in SQL by > some. This construct
allows us to rewrite the query in a form that resembles closely our formulation
of the query in English.

select name
from instructor
where salary > some (select salary

from instructor
where dept_name = ’Biology’);

The subquery:

(select salary
from instructor
where dept_name = ’Biology’)

generates the set of all salary values of all instructors in the Biology department.
The > some comparison in the where clause of the outer select is true if the salary
value of the tuple is greater than at least one member of the set of all salary values
for instructors in Biology.

SQL also allows < some, <= some, >= some, = some, and <> some comparisons. As an exercise, verify that = some is identical to in, whereas <> some
is not the same as not in.

Now we modify our query slightly. Let us find the names of all instructors
that have a salary value greater than that of each instructor in the Biology department. The construct > all corresponds to the phrase “greater than all.” Using this construct we write the query as follows:

select name
from instructor
where salary > all (select salary

from instructor
where dept_name = ’Biology’);

As it does for some, SQL also allows < all, <= all, >= all, = all, and <> all
comparisons. As an exercise, verify that <> all is identical to not in, whereas =
all is not the same as in.

As another example of set comparisons, consider the query “Find the departments that have the highest average salary.” We begin by writing a query to find
all average salaries, and then nest it as a subquery of a larger query that finds those departments for which the average salary is greater than or equal to all
average salaries:

select dept_name
from instructor
group by dept_name
having avg (salary) >= all (select avg (salary)
from instructor
group by dept_name);

3. Test for Empty Relations

SQL includes a feature for testing whether a subquery has any tuples in its result.
The exists construct returns the value true if the argument subquery is nonempty.
Using the exists construct, we can write the query “Find all courses taught in both
the Fall 2009 semester and in the Spring 2010 semester” in still another way:

select course_id
from section as S
where semester = ’Fall’ and year= 2009 and
exists (select *
from section as T
where semester = ’Spring’ and year= 2010 and
S.course_id= T.course_id);

The above query also illustrates a feature of SQL where a correlation name
from an outer query (S in the above query), can be used in a subquery in the
where clause. A subquery that uses a correlation name from an outer query is
called a correlated subquery.

In queries that contain subqueries, a scoping rule applies for correlation
names. In a subquery, according to the rule, it is legal to use only correlation
names defined in the subquery itself or in any query that contains the subquery.
If a correlation name is defined both locally in a subquery and globally in a
containing query, the local definition applies. This rule is analogous to the usual
scoping rules used for variables in programming languages.

We can test for the nonexistence of tuples in a subquery by using the not exists
construct. We can use the not exists construct to simulate the set containment (that
is, superset) operation: We can write “relation Acontains relation B” as “not exists
(B except A).” (Although it is not part of the current SQL standards, the contains
operator was present in some early relational systems.) To illustrate the not exists
operator, consider the query “Find all students who have taken all courses offered
in the Biology department.” Using the except construct, we can write the query
as follows:

select distinct S.ID, S.name
from student as S
where not exists ((select course_id

from course
where dept_name = ’Biology’)
except
(selec
t T.course_id
from takes as T
where S.ID = T.ID));

Here, the subquery:

(select course_id
from course
where dept_name = ’Biology’)

finds the set of all courses offered in the Biology department. The subquery:

(select T.course_id
from takes as T
where S.ID = T.ID)

finds all the courses that student S.ID has taken. Thus, the outer select takes each
student and tests whether the set of all courses that the student has taken contains
the set of all courses offered in the Biology department.

4. Test for the Absence of Duplicate Tuples

SQL includes a boolean function for testing whether a subquery has duplicate
tuples in its result. The unique construct returns the value true if the argument
subquery contains no duplicate tuples. Using the unique construct, we can write
the query “Find all courses that were offered at most once in 2009” as follows:

select T.course_id
from course as T
where unique (select R.course_id

from section as R
where T.course_id= R.course_id and
R.year = 2009);

Note that if a course is not offered in 2009, the subquery would return an empty
result, and the unique predicate would evaluate to true on the empty set.

An equivalent version of the above query not using the unique construct is:

select T.course_id
from course as T
where 1 <= (select count(R.course_id)

from section as R
where T.course_id= R.course_id and
R.year = 2009);

We can test for the existence of duplicate tuples in a subquery by using the
not unique construct. To illustrate this construct, consider the query “Find all
courses that were offered at least twice in 2009” as follows:

select T.course_id
from course as T
where not unique (select R.course_id

from section as R
where T.course_id= R.course_id and
R.year = 2009);

Formally, the unique test on a relation is defined to fail if and only if the
relation contains two tuples t1 and t2 such that t1 = t2. Since the test t1 = t2 fails
if any of the fields of t1 or t2 are null, it is possible for unique to be true even if
there are multiple copies of a tuple, as long as at least one of the attributes of the
tuple is null.

5. Subqueries in the From Clause

SQL allows a subquery expression to be used in the from clause. The key concept
applied here is that any select-from-where expression returns a relation as a result
and, therefore, can be inserted into another select-from-where anywhere that a
relation can appear.

Consider the query “Find the average instructors’ salaries of those departments where the average salary is greater than $42,000.” We wrote this query in
Section 3.7 by using the having clause. We can now rewrite this query, without
using the having clause, by using a subquery in the from clause, as follows:

select dept_name, avg salary
from (select dept_name, avg (salary) as avg_salary

from instructor
group by dept_name)
where avg_salary > 42000;

The subquery generates a relation consisting of the names of all departments and
their corresponding average instructors’ salaries. The attributes of the subquery
result can be used in the outer query, as can be seen in the above example.

Note that we do not need to use the having clause, since the subquery in
the from clause computes the average salary, and the predicate that was in the
having clause earlier is now in the where clause of the outer query.

We can give the subquery result relation a name, and rename the attributes,
using the as clause, as illustrated below.

select dept_name, avg_salary
from (select dept_name, avg (salary)

from instructor
group by dept_name)
as dept_avg (dept_name, avg_salary)
where avg_salary > 42000;

The subquery result relation is named dept avg, with the attributes dept_name and
avg_salary.

Nested subqueries in the from clause are supported by most but not all SQL
implementations. However, some SQL implementations, notably Oracle, do not
support renaming of the result relation in the from clause.

As another example, suppose we wish to find the maximum across all departments of the total salary at each department. The having clause does not help
us in this task, but we can write this query easily by using a subquery in the from
clause, as follows:

select max (tot_salary)
from (select dept_name, sum(salary)

from instructor
group by dept_name) as dept_total (dept_name, tot_salary
);

We note that nested subqueries in the from clause cannot use correlation
variables from other relations in the from clause. However, SQL:2003 allows a
subquery in the from clause that is prefixed by the lateral keyword to access
attributes of preceding tables or subqueries in the from clause. For example, if
we wish to print the names of each instructor, along with their salary and the
average salary in their department, we could write the query as follows:

select name, salary, avg_salary
from instructor I1, lateral (select avg(salary) as avg_salary

from instructor I2
where I2.dept_name= I1.dept_name
);

Without the lateral clause, the subquery cannot access the correlation variable
I1 from the outer query. Currently, only a few SQL implementations, such as IBM
DB2, support the lateral clause.

6. The with Clause

The with clause provides a way of defining a temporary relation whose definition
is available only to the query in which the with clause occurs. Consider the
following query, which finds those departments with the maximum budget.

with max_budget (value) as

(select max(budget)
from department)

select budget
from department, max_budget
where department.budget = max_budget.value;

The with clause defines the temporary relation max_budget, which is used in
the immediately following query. The with clause, introduced in SQL:1999, is
supported by many, but not all, database systems.

We could have written the above query by using a nested subquery in either
the from clause or the where clause. However, using nested subqueries would
have made the query harder to read and understand. The with clause makes the
query logic clearer; it also permits a view definition to be used in multiple places
within a query.

For example, suppose we want to find all departments where the total salary
is greater than the average of the total salary at all departments. We can write the
query using the with clause as follows.

with dept_total (dept_name, value) as

(select dept_name, sum(salary)
from instructor
group by dept_name
),

dept_total_avg(value) as

(select avg(value)
from dept_total)

select dept_name
from dept_total, dept_total_avg
where dept_total.value >= dept_total avg.value;

We can, of course, create an equivalent query without the with clause, but it would
be more complicated and harder to understand. You can write the equivalent
query as an exercise.

7. Scalar Subqueries

SQL allows subqueries to occur wherever an expression returning a value is
permitted, provided the subquery returns only one tuple containing a single
attribute; such subqueries are called scalar subqueries. For example, a subquery can be used in the select clause as illustrated in the following example that lists
all departments along with the number of instructors in each department:

select dept_name,

(select count(*)
from instructor
where department.dept_name = instructor.dept_nam
e)
as num_instructors

from department;

The subquery in the above example is guaranteed to return only a single value
since it has a count(*) aggregate without a group by. The example also illustrates
the usage of correlation variables, that is, attributes of relations in the from clause
of the outer query, such as department.dept_name in the above example.

Scalar subqueries can occur in select, where, and having clauses. Scalar subqueries may also be defined without aggregates. It is not always possible to figure
out at compile time if a subquery can return more than one tuple in its result;
if the result has more than one tuple when the subquery is executed, a run-time
error occurs.

Note that technically the type of a scalar subquery result is still a relation,
even if it contains a single tuple. However, when a scalar subquery is used in an
expression where a value is expected, SQL implicitly extracts the value from the
single attribute of the single tuple in the relation, and returns that value.

 

 

 



Frequently Asked Questions

+
Ans: The main idea of Sampling Algorithm is to select a small sample , one that fits in main memory , of the database of transaction view more..
+
Ans: The main Idea for the Sampling Algorithm is to select a small sample , one that fits in main memory of the database of transaction view more..
+
Ans: Aggregate functions view more..
+
Ans: SQL provides a mechanism for nesting subqueries. A subquery is a select-from where expression that is nested within another query. view more..
+
Ans: SQL provides a mechanism for nesting subqueries. A subquery is a select-from where expression that is nested within another query. view more..
+
Ans: We have restricted our attention until now to the extraction of information from the database. Now, we show how to add,remove, or change information with SQL. view more..
+
Ans: We introduced the natural join operation. SQL provides other forms of the join operation, including the ability to specify an explicit join predicate, and the ability to include in the result tuples that are excluded by natural join. We shall discuss these forms of join in this section. view more..
+
Ans: In our examples up to this point, we have operated at the logical-model level. That is, we have assumed that the relations in the collection we are given are the actual relations stored in the database. view more..
+
Ans: A transaction consists of a sequence of query and/or update statements. view more..
+
Ans: Integrity constraints ensure that changes made to the database by authorized users do not result in a loss of data consistency. view more..




Rating - 3/5
453 views

Advertisements