Aggregate Functions




Aggregate functions are functions that take a collection (a set or multiset) of values
as input and return a single value. SQL offers five built-in aggregate functions:

• Average: avg
• Minimum: min
• Maximum: max
• Total: sum
• Count: count

The input to sum and avg must be a collection of numbers, but the other operators
can operate on collections of nonnumeric data types, such as strings, as well.

1. Basic Aggregation

Topics You May Be Interested In
Database-system Applications Basic Structure Of Sql Queries
Database Security Entity Types, Entity Sets, Keys, And Value Sets-2
Data Mining And Information Retrieval Example Of Notation- Uml Class Diagrams
Introduction To Database Security Issues Access Control, User Accounts, And Database Audits
Overview Of Object Database Concepts Aggregate Functions

Consider the query “Find the average salary of instructors in the Computer Sci?ence department.” We write this query as follows:

select avg (salary)
from instructor
where dept_name= ’Comp. Sci.’;

The result of this query is a relation with a single attribute, containing a single
tuple with a numerical value corresponding to the average salary of instructors
in the Computer Science department. The database system may give an arbitrary
name to the result relation attribute that is generated by aggregation; however,
we can give a meaningful name to the attribute by using the as clause as follows:

select avg (salary) as avg_salary
from instructor
where dept_name= ’Comp. Sci.’;

Topics You May Be Interested In
View Of Data History Of Database Systems
Specialization And Generalization Introduction To Database Security Issues
Database Languages Chapter 11 Object And Object-relational Databases
Relational Databases Ordering The Display Of Tuples
Specialty Databases Where Clause Predicates

In the instructor relation of Figure 2.1, the salaries in the Computer Science
department are $75,000, $65,000, and $92,000. The average balance is $232,000/3
= $77,333.33.

Retaining duplicates is important in computing an average. Suppose the
Computer Science department adds a fourth instructor whose salary happens to
be $75,000. If duplicates were eliminated, we would obtain the wrong answer
($232,000/4 = $58.000) rather than the correct answer of $76,750.

There are cases where we must eliminate duplicates before computing an
aggregate function. If we do want to eliminate duplicates, we use the keyword
distinct in the aggregate expression. An example arises in the query “Find the
total number of instructors who teach a course in the Spring 2010 semester.”
In this case, an instructor counts only once, regardless of the number of course
sections that the instructor teaches. The required information is contained in the
relation teaches, and we write this query as follows:

select count (distinct ID)
from teaches
where semester = ’Springand year = 2010;

Topics You May Be Interested In
Characteristics Of The Database Approach Entity Types, Entity Sets, Attributes, And Keys-1
Relational Databases Ordering The Display Of Tuples
Database Schema Database Security And The Dba
Sql Data Definition Set Operations-introduction
A Sample Database Application Null Values

Because of the keyword distinct preceding ID, even if an instructor teaches more
than one course, she is counted only once in the result.

We use the aggregate function count frequently to count the number of tuples
in a relation. The notation for this function in SQL is count (*). Thus, to find the
number of tuples in the course relation, we write

select count (*)
from course;

Aggregate Functions�

Topics You May Be Interested In
Specialization And Generalization Transaction Management
Modeling Of Union Types Using Categories Chapter 11 Object And Object-relational Databases
A Sample University Eer Schema, Design Choices, And Formal Definitions Attribute Specification In Select Clause
Data Abstraction, Knowledge Representation, And Ontology Concepts Relationship Types Of Degree Higher Than Two
View Of Data Sensitive Data And Types Of Disclosures

SQL does not allow the use of distinct with count (*). It is legal to use distinct
with max and min, even though the result does not change. We can use the
keyword all in place of distinct to specify duplicate retention, but, since all is the
default, there is no need to do so.

2. Aggregation with Grouping

There are circumstances where we would like to apply the aggregate function
not only to a single set of tuples, but also to a group of sets of tuples; we specify
this wish in SQL using the group by clause. The attribute or attributes given in
the group by clause are used to form groups. Tuples with the same value on all
attributes in the group by clause are placed in one group.

As an illustration, consider the query “Find the average salary in each depart?ment.” We write this query as follows:

Topics You May Be Interested In
Modeling Of Union Types Using Categories Introduction To Database Security Issues
Using High-level Conceptual Data Models For Database Design Overview Of Object Database Concepts
Purpose Of Database Systems Additional Basic Operations-the Rename Operation
Discretionary Access Control Based On Granting And Revoking Privileges A Sample Database Application
Introduction To Database Security Issues Entity Types, Entity Sets, Attributes, And Keys-1

select dept_name, avg (salary) as avg_salary
from instructor
group by dept_name;

Figure 3.14 shows the tuples in the instructor relation grouped by the dept_name attribute, which is the first step in computing the query result. The specified
aggregate is computed for each group, and the result of the query is shown in
Figure 3.15.

In contrast, consider the query “Find the average salary of all instructors.” We
write this query as follows:

select avg (salary)
from instructor;

Topics You May Be Interested In
View Of Data Database Design
Database-system Applications Specialty Databases
Modeling Of Union Types Using Categories Additional Basic Operations-string Operations
A Sample University Eer Schema, Design Choices, And Formal Definitions Access Control, User Accounts, And Database Audits
Introduction To Transaction Processing Null Values

Aggregate Functions�

In this case the group by clause has been omitted, so the entire relation is treated
as a single group.

As another example of aggregation on groups of tuples, consider the query
“Find the number of instructors in each department who teach a course in the
Spring 2010 semester.” Information about which instructors teach which course
sections in which semester is available in the teaches relation. However, this in?formation has to be joined with information from the instructor relation to get the
department name of each instructor. Thus, we write this query as follows:

select dept_name, count (distinct ID) as instr_count
from instructor natural join teaches
where semester = ’Springand year = 2010
group by dept_name;

Topics You May Be Interested In
Constraints And Characteristics Of Specialization And Generalization Hierarchies Relational Query Languages
Database Schema Relational Operations
Introduction To Database Security Issues Entity Types, Entity Sets, Attributes, And Keys-1
Discretionary Access Control Based On Granting And Revoking Privileges Relationship Types, Relationship Sets, Roles, And Structural Constraints
Schema Diagrams Set Operations-introduction

The result is shown in Figure 3.16.

When an SQL query uses grouping, it is important to ensure that the only
attributes that appear in the select statement without being aggregated are those
that are present in the group by clause. In other words, any attribute that is not
present in the group by clause must appear only inside an aggregate function if
it appears in the select clause, otherwise the query is treated as erroneous. For
example, the following query is erroneous since ID does not appear in the group
by
clause, and yet it appears in the select clause without being aggregated:

Aggregate Functions�

/* erroneous query */
select dept_name, ID, avg (salary)
from instructor
group by dept_name;

Topics You May Be Interested In
Database-system Applications Database Architecture
Specialization And Generalization Specialty Databases
Constraints And Characteristics Of Specialization And Generalization Hierarchies Database Schema
A Sample University Eer Schema, Design Choices, And Formal Definitions Overview Of The Sql Query Language
Relational Databases Null Values

Each instructor in a particular group (defined by dept_name) can have a different
ID, and since only one tuple is output for each group, there is no unique way of
choosing which ID value to output. As a result, such cases are disallowed by SQL.

3.The Having Clause

At times, it is useful to state a condition that applies to groups rather than to
tuples. For example, we might be interested in only those departments where the
average salary of the instructors is more than $42,000. This condition does not
apply to a single tuple; rather, it applies to each group constructed by the group
by
clause. To express such a query, we use the having clause of SQL. SQL applies
predicates in the having clause after groups have been formed, so aggregate
functions may be used. We express this query in SQL as follows:

select dept_name, avg (salary) as avg_salary
from instructor
group by dept_name
having avg (salary) > 42000;

Topics You May Be Interested In
A Sample University Eer Schema, Design Choices, And Formal Definitions Relationship Types, Relationship Sets, Roles, And Structural Constraints
Relational Databases Refining The Er Design For The Company Database
Xml: Extensible Markup Language Relationship Types Of Degree Higher Than Two
Schema Diagrams Ordering The Display Of Tuples
Overview Of Object Database Concepts Where Clause Predicates

The result is shown in Figure 3.17.

As was the case for the select clause, any attribute that is present in the having
clause without being aggregated must appear in the group by clause, otherwise
the query is treated as erroneous.

The meaning of a query containing aggregation, group by, or having clauses
is defined by the following sequence of operations:

1. As was the case for queries without aggregation, the from clause is first
evaluated to get a relation.

Topics You May Be Interested In
Data Abstraction, Knowledge Representation, And Ontology Concepts Chapter 11 Object And Object-relational Databases
Using High-level Conceptual Data Models For Database Design Overview Of Object Database Concepts
Database Languages Sql Data Definition
Relational Databases Basic Structure Of Sql Queries
Specialty Databases Initial Conceptual Design Of The Company Database

Aggregate Functions�

2. If a where clause is present, the predicate in the where clause is applied on
the result relation of the from clause.

3. Tuples satisfying the where predicate are then placed into groups by the
group by clause if it is present. If the group by clause is absent, the entire
set of tuples satisfying the where predicate is treated as being in one group.

4. The having clause, if it is present, is applied to each group; the groups that
do not satisfy the having clause predicate are removed.

Topics You May Be Interested In
Database Security Chapter 11 Object And Object-relational Databases
Database Languages Relational Operations
Database Design Introduction To Database Security Issues
Relational Databases Initial Conceptual Design Of The Company Database
Specialty Databases Example Of Notation- Uml Class Diagrams

5. The select clause uses the remaining groups to generate tuples of the result
of the query, applying the aggregate functions to get a single result tuple for
each group.

To illustrate the use of both a having clause and a where clause in the same
query, we consider the query “For each course section offered in 2009, find the
average total credits (tot_cred) of all students enrolled in the section, if the section
had at least 2 students.”

select course_id, semester, year, sec_id, avg (tot_cred)
from takes natural join student
where year = 2009
group by course_id, semester, year, sec_id
having count (ID) >= 2;

Note that all the required information for the preceding query is available from
the relations takes and student, and that although the query pertains to sections, a
join with section is not needed.

Topics You May Be Interested In
Using High-level Conceptual Data Models For Database Design Relational Databases
Introduction To Transaction Processing Relational Databases
Database Security Structure Of Relational Databases
Database Languages Introduction To Database Security Issues
Database Design Entity Types, Entity Sets, Attributes, And Keys-1

4. Aggregation with Null and Boolean Values

Null values, when they exist, complicate the processing of aggregate operators.
For example, assume that some tuples in the instructor relation have a null value
for salary. Consider the following query to total all salary amounts:

select sum (salary)
from instructor;

The values to be summed in the preceding query include null values, since some
tuples have a null value for salary. Rather than say that the overall sum is itself
null, the SQL standard says that the sum operator should ignore null values in its
input.

In general, aggregate functions treat nulls according to the following rule: All
aggregate functions except count (*) ignore null values in their input collection.
As a result of null values being ignored, the collection of values may be empty. The
count of an empty collection is defined to be 0, and all other aggregate operations return a value of null when applied on an empty collection. The effect of null values on some of the more complicated SQL constructs can be subtle.

A Boolean data type that can take values true, false, and unknown, was
introduced in SQL:1999. The aggregate functions some and every, which mean
exactly what you would intuitively expect, can be applied on a collection of
Boolean values.



Frequently Asked Questions

+
Ans: Null values present special problems in relational operations, including arithmetic operations, comparison operations, and set operations. view more..
+
Ans: The SQL operations union, intersect, and except operate on relations. view more..
+
Ans: Four main control measures are used to provide security of data in databases: Access control, Inference control, Flow control, Data encryption view more..
+
Ans: Aggregate functions view more..
+
Ans: The main Idea for the Sampling Algorithm is to select a small sample , one that fits in main memory of the database of transaction view more..
+
Ans: The main idea of Sampling Algorithm is to select a small sample , one that fits in main memory , of the database of transaction view more..
+
Ans: SQL provides a mechanism for nesting subqueries. A subquery is a select-from where expression that is nested within another query. view more..
+
Ans: SQL provides a mechanism for nesting subqueries. A subquery is a select-from where expression that is nested within another query. view more..
+
Ans: We have restricted our attention until now to the extraction of information from the database. Now, we show how to add,remove, or change information with SQL. view more..
+
Ans: We introduced the natural join operation. SQL provides other forms of the join operation, including the ability to specify an explicit join predicate, and the ability to include in the result tuples that are excluded by natural join. We shall discuss these forms of join in this section. view more..
+
Ans: In our examples up to this point, we have operated at the logical-model level. That is, we have assumed that the relations in the collection we are given are the actual relations stored in the database. view more..
+
Ans: A transaction consists of a sequence of query and/or update statements. view more..
+
Ans: Integrity constraints ensure that changes made to the database by authorized users do not result in a loss of data consistency. view more..




Rating - 4/5
503 views

Advertisements