Aggregate Functions




Aggregate functions are functions that take a collection (a set or multiset) of values
as input and return a single value. SQL offers five built-in aggregate functions:

• Average: avg
• Minimum: min
• Maximum: max
• Total: sum
• Count: count

The input to sum and avg must be a collection of numbers, but the other operators
can operate on collections of nonnumeric data types, such as strings, as well.

1. Basic Aggregation

Consider the query “Find the average salary of instructors in the Computer Sci?ence department.” We write this query as follows:

select avg (salary)
from instructor
where dept_name= ’Comp. Sci.’;

The result of this query is a relation with a single attribute, containing a single
tuple with a numerical value corresponding to the average salary of instructors
in the Computer Science department. The database system may give an arbitrary
name to the result relation attribute that is generated by aggregation; however,
we can give a meaningful name to the attribute by using the as clause as follows:

select avg (salary) as avg_salary
from instructor
where dept_name= ’Comp. Sci.’;

In the instructor relation of Figure 2.1, the salaries in the Computer Science
department are $75,000, $65,000, and $92,000. The average balance is $232,000/3
= $77,333.33.

Retaining duplicates is important in computing an average. Suppose the
Computer Science department adds a fourth instructor whose salary happens to
be $75,000. If duplicates were eliminated, we would obtain the wrong answer
($232,000/4 = $58.000) rather than the correct answer of $76,750.

There are cases where we must eliminate duplicates before computing an
aggregate function. If we do want to eliminate duplicates, we use the keyword
distinct in the aggregate expression. An example arises in the query “Find the
total number of instructors who teach a course in the Spring 2010 semester.”
In this case, an instructor counts only once, regardless of the number of course
sections that the instructor teaches. The required information is contained in the
relation teaches, and we write this query as follows:

select count (distinct ID)
from teaches
where semester = ’Springand year = 2010;

Because of the keyword distinct preceding ID, even if an instructor teaches more
than one course, she is counted only once in the result.

We use the aggregate function count frequently to count the number of tuples
in a relation. The notation for this function in SQL is count (*). Thus, to find the
number of tuples in the course relation, we write

select count (*)
from course;

Aggregate Functions�

SQL does not allow the use of distinct with count (*). It is legal to use distinct
with max and min, even though the result does not change. We can use the
keyword all in place of distinct to specify duplicate retention, but, since all is the
default, there is no need to do so.

2. Aggregation with Grouping

There are circumstances where we would like to apply the aggregate function
not only to a single set of tuples, but also to a group of sets of tuples; we specify
this wish in SQL using the group by clause. The attribute or attributes given in
the group by clause are used to form groups. Tuples with the same value on all
attributes in the group by clause are placed in one group.

As an illustration, consider the query “Find the average salary in each depart?ment.” We write this query as follows:

select dept_name, avg (salary) as avg_salary
from instructor
group by dept_name;

Figure 3.14 shows the tuples in the instructor relation grouped by the dept_name attribute, which is the first step in computing the query result. The specified
aggregate is computed for each group, and the result of the query is shown in
Figure 3.15.

In contrast, consider the query “Find the average salary of all instructors.” We
write this query as follows:

select avg (salary)
from instructor;

Aggregate Functions�

In this case the group by clause has been omitted, so the entire relation is treated
as a single group.

As another example of aggregation on groups of tuples, consider the query
“Find the number of instructors in each department who teach a course in the
Spring 2010 semester.” Information about which instructors teach which course
sections in which semester is available in the teaches relation. However, this in?formation has to be joined with information from the instructor relation to get the
department name of each instructor. Thus, we write this query as follows:

select dept_name, count (distinct ID) as instr_count
from instructor natural join teaches
where semester = ’Springand year = 2010
group by dept_name;

The result is shown in Figure 3.16.

When an SQL query uses grouping, it is important to ensure that the only
attributes that appear in the select statement without being aggregated are those
that are present in the group by clause. In other words, any attribute that is not
present in the group by clause must appear only inside an aggregate function if
it appears in the select clause, otherwise the query is treated as erroneous. For
example, the following query is erroneous since ID does not appear in the group
by
clause, and yet it appears in the select clause without being aggregated:

Aggregate Functions�

/* erroneous query */
select dept_name, ID, avg (salary)
from instructor
group by dept_name;

Each instructor in a particular group (defined by dept_name) can have a different
ID, and since only one tuple is output for each group, there is no unique way of
choosing which ID value to output. As a result, such cases are disallowed by SQL.

3.The Having Clause

At times, it is useful to state a condition that applies to groups rather than to
tuples. For example, we might be interested in only those departments where the
average salary of the instructors is more than $42,000. This condition does not
apply to a single tuple; rather, it applies to each group constructed by the group
by
clause. To express such a query, we use the having clause of SQL. SQL applies
predicates in the having clause after groups have been formed, so aggregate
functions may be used. We express this query in SQL as follows:

select dept_name, avg (salary) as avg_salary
from instructor
group by dept_name
having avg (salary) > 42000;

The result is shown in Figure 3.17.

As was the case for the select clause, any attribute that is present in the having
clause without being aggregated must appear in the group by clause, otherwise
the query is treated as erroneous.

The meaning of a query containing aggregation, group by, or having clauses
is defined by the following sequence of operations:

1. As was the case for queries without aggregation, the from clause is first
evaluated to get a relation.

Aggregate Functions�

2. If a where clause is present, the predicate in the where clause is applied on
the result relation of the from clause.

3. Tuples satisfying the where predicate are then placed into groups by the
group by clause if it is present. If the group by clause is absent, the entire
set of tuples satisfying the where predicate is treated as being in one group.

4. The having clause, if it is present, is applied to each group; the groups that
do not satisfy the having clause predicate are removed.

5. The select clause uses the remaining groups to generate tuples of the result
of the query, applying the aggregate functions to get a single result tuple for
each group.

To illustrate the use of both a having clause and a where clause in the same
query, we consider the query “For each course section offered in 2009, find the
average total credits (tot_cred) of all students enrolled in the section, if the section
had at least 2 students.”

select course_id, semester, year, sec_id, avg (tot_cred)
from takes natural join student
where year = 2009
group by course_id, semester, year, sec_id
having count (ID) >= 2;

Note that all the required information for the preceding query is available from
the relations takes and student, and that although the query pertains to sections, a
join with section is not needed.

4. Aggregation with Null and Boolean Values

Null values, when they exist, complicate the processing of aggregate operators.
For example, assume that some tuples in the instructor relation have a null value
for salary. Consider the following query to total all salary amounts:

select sum (salary)
from instructor;

The values to be summed in the preceding query include null values, since some
tuples have a null value for salary. Rather than say that the overall sum is itself
null, the SQL standard says that the sum operator should ignore null values in its
input.

In general, aggregate functions treat nulls according to the following rule: All
aggregate functions except count (*) ignore null values in their input collection.
As a result of null values being ignored, the collection of values may be empty. The
count of an empty collection is defined to be 0, and all other aggregate operations return a value of null when applied on an empty collection. The effect of null values on some of the more complicated SQL constructs can be subtle.

A Boolean data type that can take values true, false, and unknown, was
introduced in SQL:1999. The aggregate functions some and every, which mean
exactly what you would intuitively expect, can be applied on a collection of
Boolean values.



Frequently Asked Questions

+
Ans: Null values present special problems in relational operations, including arithmetic operations, comparison operations, and set operations. view more..
+
Ans: The SQL operations union, intersect, and except operate on relations. view more..
+
Ans: Four main control measures are used to provide security of data in databases: Access control, Inference control, Flow control, Data encryption view more..
+
Ans: Aggregate functions view more..
+
Ans: The main Idea for the Sampling Algorithm is to select a small sample , one that fits in main memory of the database of transaction view more..
+
Ans: The main idea of Sampling Algorithm is to select a small sample , one that fits in main memory , of the database of transaction view more..
+
Ans: SQL provides a mechanism for nesting subqueries. A subquery is a select-from where expression that is nested within another query. view more..
+
Ans: SQL provides a mechanism for nesting subqueries. A subquery is a select-from where expression that is nested within another query. view more..
+
Ans: We have restricted our attention until now to the extraction of information from the database. Now, we show how to add,remove, or change information with SQL. view more..
+
Ans: We introduced the natural join operation. SQL provides other forms of the join operation, including the ability to specify an explicit join predicate, and the ability to include in the result tuples that are excluded by natural join. We shall discuss these forms of join in this section. view more..
+
Ans: In our examples up to this point, we have operated at the logical-model level. That is, we have assumed that the relations in the collection we are given are the actual relations stored in the database. view more..
+
Ans: A transaction consists of a sequence of query and/or update statements. view more..
+
Ans: Integrity constraints ensure that changes made to the database by authorized users do not result in a loss of data consistency. view more..




Rating - 4/5
485 views

Advertisements