Data Mining Concepts




28.1 Overview of Data Mining Technology
In reports such as the very popular Gartner Report, data mining has been hailed as
one of the top technologies for the near future. In this section we relate data mining
to the broader area called knowledge discovery and contrast the two by means of an
illustrative example.
28.1.1 Data Mining versus Data Warehousing
The goal of a data warehouse (see Chapter 29) is to support decision making with
data. Data mining can be used in conjunction with a data warehouse to help
with certain types of decisions. Data mining can be applied to operational databases
with individual transactions. To make data mining more efficient, the data warehouse
should have an aggregated or summarized collection of data. Data mining
helps in extracting meaningful new patterns that cannot necessarily be found by
merely querying or processing data or metadata in the data warehouse. Therefore,
data mining applications should be strongly considered early, during the design of a
data warehouse. Also, data mining tools should be designed to facilitate their use in
conjunction with data warehouses. In fact, for very large databases running into terabytes
and even petabytes of data, successful use of data mining applications will
depend first on the construction of a data warehouse.
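
As a rough illustration of the "aggregated or summarized collection of data" mentioned above, the following sketch rolls individual transactions up to one summary row per region and item. The record layout, the region names, and the function name summarize are assumptions made for this example, not part of any particular warehouse product.

```python
from collections import defaultdict

# Hypothetical operational records: (store_region, item_code, quantity, total_amount).
transactions = [
    ("north", "A100", 2, 40.0),
    ("north", "A100", 1, 20.0),
    ("south", "B200", 3, 45.0),
    ("north", "B200", 1, 15.0),
]


def summarize(rows):
    """Roll individual transactions up to one row per (region, item)."""
    summary = defaultdict(lambda: {"quantity": 0, "total_amount": 0.0})
    for region, item, quantity, amount in rows:
        cell = summary[(region, item)]
        cell["quantity"] += quantity
        cell["total_amount"] += amount
    return dict(summary)


summary = summarize(transactions)
for key, totals in sorted(summary.items()):
    print(key, totals)
```

A mining tool that reads the summary touches one row per region and item rather than every transaction, which is the efficiency argument made in the text.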
28.1.2 Data Mining as a Part of the Knowledge Discovery Process
Knowledge Discovery in Databases, frequently abbreviated as KDD, typically
encompasses more than data mining. The knowledge discovery process comprises
six phases: data selection, data cleansing, enrichment, data transformation or
encoding, data mining, and the reporting and display of the discovered information.
As an example, consider a transaction database maintained by a specialty consumer
goods retailer. Suppose the client data includes a customer name, ZIP Code, phone
number, date of purchase, item code, price, quantity, and total amount. A variety of
new knowledge can be discovered by KDD processing on this client database.
During data selection, data about specific items or categories of items, or from stores
in a specific region or area of the country, may be selected. The data cleansing
process then may correct invalid ZIP Codes or eliminate records with incorrect
phone prefixes. Enrichment typically enhances the data with additional sources of
information. For example, given the client names and phone numbers, the store
may purchase other data about age, income, and credit rating and append them to
each record. Data transformation and encoding may be done to reduce the amount
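
The first three phases described for the retailer example (selection, cleansing, enrichment) might be sketched as follows. This is a minimal sketch under stated assumptions: the sample records, the set of valid phone prefixes, the purchased demographic data, and all function names are hypothetical, and the later phases (transformation, mining, reporting) are not shown.

```python
# Hypothetical client records using the fields named in the example.
records = [
    {"name": "Ann Lee", "zip": "30332", "phone": "404-555-0101",
     "date": "2024-03-01", "item": "A100", "price": 20.0,
     "quantity": 2, "total": 40.0},
    {"name": "Bob Roy", "zip": "30318", "phone": "000-555-0102",
     "date": "2024-03-02", "item": "A100", "price": 20.0,
     "quantity": 1, "total": 20.0},
    {"name": "Cy Dunn", "zip": "75001", "phone": "972-555-0103",
     "date": "2024-03-02", "item": "B200", "price": 15.0,
     "quantity": 3, "total": 45.0},
]

# Assumption: prefixes the retailer treats as valid.
VALID_PREFIXES = {"404", "972"}

# Assumption: externally purchased data, keyed by (name, phone).
purchased = {
    ("Ann Lee", "404-555-0101"): {
        "age": 41, "income": 72000, "credit_rating": "good"},
}


def select(rows, item_codes):
    """Data selection: keep only rows for the items of interest."""
    return [r for r in rows if r["item"] in item_codes]


def cleanse(rows):
    """Data cleansing: eliminate rows with an incorrect phone prefix."""
    return [r for r in rows if r["phone"].split("-")[0] in VALID_PREFIXES]


def enrich(rows, extra):
    """Enrichment: append purchased attributes to each matching row."""
    enriched_rows = []
    for r in rows:
        merged = dict(r)  # copy so the source records stay unchanged
        merged.update(extra.get((r["name"], r["phone"]), {}))
        enriched_rows.append(merged)
    return enriched_rows


selected = select(records, {"A100"})
cleaned = cleanse(selected)
enriched = enrich(cleaned, purchased)
print(enriched)
```

Each phase takes a list of records and returns a new list, so the phases can be chained or inspected one at a time. Correcting an invalid ZIP Code, which the text also mentions, would need a reference table of valid codes and is left out of this sketch.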


