XML: Extensible Markup Language




The information stored in databases is known as structured data because it is represented in a strict format. For example, each record in a relational database table (such as each of the tables in the COMPANY database in Figure 3.6) follows the same format as the other records in that table. For structured data, it is common to design the database schema carefully, using techniques such as those described in Chapters 7 and 8, in order to define the database structure. The DBMS then checks that all data follows the structures and constraints specified in the schema.

However, not all data is collected and inserted into carefully designed structured databases. In some applications, data is collected in an ad hoc manner before it is known how it will be stored and managed. This data may have a certain structure, but not all the information collected will have the identical structure. Some attributes may be shared among the various entities, but other attributes may exist only in a few entities. Moreover, additional attributes can be introduced in some of the newer data items at any time, and there is no predefined schema. This type of data is known as semistructured data. A number of data models have been introduced for representing semistructured data, often based on using tree or graph data structures rather than the flat relational model structures.

A key difference between structured and semistructured data concerns how the schema constructs (such as the names of attributes, relationships, and entity types) are handled. In semistructured data, the schema information is mixed in with the data values, since each data object can have different attributes that are not known in advance. Hence, this type of data is sometimes referred to as self-describing data. Consider the following example. We want to collect a list of bibliographic references related to a certain research project. Some of these may be books or technical reports, others may be research articles in journals or conference proceedings, and still others may refer to complete journal issues or conference proceedings. Clearly, each of these may have different attributes and different types of information. Even for the same type of reference—say, conference articles—we may have different information. For example, one article citation may be quite complete, with full information about author names, title, proceedings, page numbers, and so on, whereas another citation may not have all the information available. New types of bibliographic sources may appear in the future—for instance, references to Web pages or to conference tutorials—and these may have new attributes that describe them.
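
The self-describing idea can be sketched in a few lines of Python. This is only an illustration: the reference records, their attribute names, and their values are all hypothetical. The point is that each object carries its own attribute names, and no two objects need to have the same set.

```python
# Hypothetical bibliographic references; every name and value here is
# invented for illustration and does not come from the text.
references = [
    {"type": "book", "title": "Example Book", "authors": ["A. Author"],
     "publisher": "Example Press", "year": 2001},
    {"type": "conference_article", "title": "Example Paper",
     "authors": ["B. Writer", "C. Scholar"],
     "proceedings": "Example Conference", "pages": "10-20"},
    # An incomplete citation of the same type: fewer attributes are known.
    {"type": "conference_article", "title": "Sparse Citation"},
    # A newer kind of source that introduces a new attribute.
    {"type": "web_page", "title": "Example Page",
     "url": "http://example.org/page"},
]


def shared_attributes(objects):
    """Return the attribute names that appear in every object."""
    name_sets = [set(obj) for obj in objects]
    return set.intersection(*name_sets) if name_sets else set()


def all_attributes(objects):
    """Return every attribute name that appears in at least one object."""
    names = set()
    for obj in objects:
        names.update(obj)
    return names


shared = shared_attributes(references)
optional = all_attributes(references) - shared
print("shared by all objects:", sorted(shared))
print("present in only some objects:", sorted(optional))
```

Because the attribute names travel with each object, the new `url` attribute could be added without changing any schema declared in advance.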

Semistructured data may be displayed as a directed graph, as shown in Figure 12.1. The information shown in Figure 12.1 corresponds to some of the structured data shown in Figure 3.6. As we can see, this model somewhat resembles the object model (see Section 11.1.3) in its ability to represent complex objects and nested structures. In Figure 12.1, the labels or tags on the directed edges represent the schema names: the names of attributes, object types (or entity types or classes), and relationships. The internal nodes represent individual objects or composite attributes. The leaf nodes represent actual data values of simple (atomic) attributes.
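
A rough Python sketch of this graph view follows. It assumes a nested dictionary stands in for the graph: dictionary keys play the role of edge labels, dictionaries and lists act as internal nodes, and plain values are the leaf nodes. The project and worker values are hypothetical and are not taken from Figure 12.1.

```python
# Hypothetical semistructured object. Keys are edge labels (schema names),
# dicts and lists are internal nodes, and plain values are leaf nodes.
company_projects = {
    "project": {
        "name": "ProjectX",
        "number": 1,
        "worker": [
            {"last_name": "Smith", "hours": 32.5},
            # A second worker with a different set of attributes.
            {"last_name": "Joyce", "first_name": "Ann"},
        ],
    }
}


def leaf_paths(node, path=()):
    """Yield (edge_labels, atomic_value) for every leaf under node."""
    if isinstance(node, dict):
        for label, child in node.items():
            yield from leaf_paths(child, path + (label,))
    elif isinstance(node, list):
        # Several edges with the same label lead to sibling objects.
        for child in node:
            yield from leaf_paths(child, path)
    else:
        yield path, node


for path, value in leaf_paths(company_projects):
    print("/".join(path), "=", value)
```

Each printed line pairs a path of edge labels with one atomic value, which mirrors how the labels in the graph carry the schema names while the leaves carry the data.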

There are two main differences between the semistructured model and the object model that we discussed in Chapter 11:

1. The schema information, that is, the names of attributes, relationships, and classes (object types), is intermixed in the semistructured model with the objects and their data values in the same data structure.

2. In the semistructured model, there is no requirement for a predefined schema to which the data objects must conform, although it is possible to define a schema if necessary.

In addition to structured and semistructured data, a third category exists, known as unstructured data because there is very limited indication of the type of data. A typical example is a text document that contains information embedded within it. Web pages in HTML that contain some data are considered to be unstructured data. Consider part of an HTML file, shown in Figure 12.2. Text that appears between angled brackets, <...>, is an HTML tag. A tag with a slash, </...>, indicates an end tag, which represents the ending of the effect of a matching start tag. The tags mark up the document in order to instruct an HTML processor how to display the text between a start tag and a matching end tag. Hence, the tags specify document formatting rather than the meaning of the various data elements in the document. HTML tags specify formatting information such as font size and style (boldface, italics, and so on), color, and heading levels in documents. Some tags provide text structuring in documents, such as specifying a numbered or unnumbered list or a table. Even these structuring tags specify that the embedded textual data is to be displayed in a certain manner, rather than indicating the type of data represented in the table.

HTML uses a large number of predefined tags, which specify a variety of commands for formatting Web documents for display. The start and end tags specify the range of text to be formatted by each command. A few examples of the tags shown in Figure 12.2 follow:

• The <HTML> ... </HTML> tags specify the boundaries of the document.

• The document header information, within the <HEAD> ... </HEAD> tags, specifies various commands that will be used elsewhere in the document. For example, it may specify various script functions in a language such as JavaScript or PERL, or certain formatting styles (fonts, paragraph styles, header styles, and so on) that can be used in the document. It can also specify a title to indicate what the HTML file is for, and other similar information that will not be displayed as part of the document.
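
To make the pairing of start and end tags concrete, the following sketch uses `HTMLParser` from Python's standard library to log each tag it meets in a small page. The page content is hypothetical and is not the file from Figure 12.2; the `TagLogger` class is likewise an illustrative name.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record start tags, end tags, and non-blank text in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.events.append(("text", text))


# Hypothetical page: a header with a title, then a heading and bold text.
page = (
    "<html><head><title>Projects</title></head>"
    "<body><h1>ProjectX</h1><b>Smith</b></body></html>"
)

logger = TagLogger()
logger.feed(page)
logger.close()

for kind, value in logger.events:
    print(kind, value)
```

The log shows that `h1` and `b` say how the enclosed text should look, but nothing in the markup says that "ProjectX" is a project name or that "Smith" is an employee.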

The example in Figure 12.2 illustrates a static HTML page, since all the information to be displayed is explicitly spelled out as fixed text in the HTML file. In many cases, some of the information to be displayed may be extracted from a database. For example, the project names and the employees working on each project may be extracted from the database in Figure 3.6 through the appropriate SQL query. We may want to use the same HTML formatting tags for displaying each project and the employees who work on it, but we may want to change the particular projects (and employees) being displayed. For example, we may want to see a Web page displaying the information for ProjectX, and then later a page displaying the information for ProjectY. Although both pages are displayed using the same HTML formatting tags, the actual data items displayed will be different. Such Web pages are called dynamic, since the data parts of the page may be different each time it is displayed, even though the display appearance is the same.
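
A dynamic page of this kind can be sketched with Python's built-in `sqlite3` module. The two-table schema, the table and column names, and the employee names below are simplified assumptions made for illustration; they are not the COMPANY schema of Figure 3.6.

```python
import sqlite3
from html import escape

# In-memory database with a simplified, hypothetical schema and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (pnumber INTEGER PRIMARY KEY, pname TEXT);
    CREATE TABLE works_on (pnumber INTEGER, employee TEXT);
    INSERT INTO project VALUES (1, 'ProjectX'), (2, 'ProjectY');
    INSERT INTO works_on VALUES (1, 'Smith'), (1, 'Joyce'), (2, 'Wong');
""")


def project_page(conn, pname):
    """Build an HTML page for one project from the current database state."""
    rows = conn.execute(
        "SELECT w.employee FROM project p "
        "JOIN works_on w ON w.pnumber = p.pnumber "
        "WHERE p.pname = ? ORDER BY w.employee",
        (pname,),
    ).fetchall()
    # The formatting tags are fixed; only the data values change.
    items = "".join(f"<li>{escape(name)}</li>" for (name,) in rows)
    return f"<html><body><h1>{escape(pname)}</h1><ul>{items}</ul></body></html>"


print(project_page(conn, "ProjectX"))
print(project_page(conn, "ProjectY"))
```

Both calls produce pages with the same formatting tags, but the data items inside them come from whatever the query returns at the time of the call.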

   

 


