Web search




 

Web search has emerged as a major growth industry in the last decade, with recent figures indicating that the global number of searches has risen to over 10 billion per calendar month. The task of a web search engine is to index the entire contents of the World Wide Web, encompassing a wide range of information styles including web pages, multimedia sources and (scanned) books. This is a very complex task, as current estimates state that the Web consists of over 63 billion pages and one trillion unique web

 

Finance and commerce  - The growth of eCommerce as exemplified by companies such as Amazon and eBay, and underlying payments technologies such as PayPal; the associated emergence of online banking and trading and also complex information dissemination systems for financial markets.
The information society  - The growth of the World Wide Web as a repository of information and knowledge; the development of web search engines such as Google and Yahoo to search this vast repository; the emergence of digital libraries and the large-scale digitization of legacy information sources such as books (for example, Google Books); the increasing significance of user-generated content through sites such as YouTube, Wikipedia and Flickr; the emergence of social networking through services such as Facebook and MySpace.

Creative industries and entertainment -  The emergence of online gaming as a novel and highly interactive form of entertainment; the availability of music and film in the home through networked media centres and more widely in the Internet via downloadable or streaming content; the role of user-generated content (as mentioned above) as a new form of creativity, for example via services such as YouTube; the creation of new forms of art and entertainment enabled by emergent (including networked) technologies.

Healthcare  - The growth of health informatics as a discipline with its emphasis on online electronic patient records and related issues of privacy; the increasing role of telemedicine in supporting remote diagnosis or more advanced services such as remote surgery (including collaborative working between healthcare teams); the increasing application of networking and embedded systems technology in assisted living, for example for monitoring the elderly in their own homes.

Education -  The emergence of e-learning through for example web-based tools such as virtual learning environments; associated support for distance learning; support for collaborative or community-based learning.

Transport and logistics -  The use of location technologies such as GPS in route finding systems and more general traffic management  systems; the modern car itself as an example of a complex distributed system (also applies to other forms of transport such as aircraft); the development of web-based map services such as MapQuest, Google Maps and Google Earth.

Science The emergence - of the Grid as a fundamental technology for science including the use of complex networks of computers to support the storage, analysis, and processing of (often very large quantities of) scientific data; the associated use of the Grid as an enabling technology for worldwide collaboration between groups of scientists.

the natural environment, for The use of (networked) sensor technology to both monitor and manage Environmental management example to provide early warning of natural disasters such as earthquakes, floods or tsunamis and to co ordinate emergency response; the collation and analysis of global environmental parameters to better understand complex natural phenomena such as climate change addresses. Given that most search engines analyze the entire web content and then carry out sophisticated processing on this enormous database, this task itself represents a major challenge for distributed systems design.

Google, the market leader in web search technology, has put significant effort into the design of a sophisticated distributed system infrastructure to support search (and indeed other Google applications and services such as Google Earth). This represents one of the largest and most complex distributed systems installations in the history of computing and hence demands close examination. Highlights of this infrastructure include:

• an underlying physical infrastructure consisting of very large numbers of networked computers located at data centers all around the world.
• a distributed file system designed to support very large files and heavily optimized for the style of usage required by search and other Google applications (especially reading from files at high and sustained rates).
• an associated structured distributed storage system that offers fast access to very large datasets.
• a lock service that offers distributed system functions such as distributed locking and agreement.
• a programming model that supports the management of very large parallel and distributed computations across the underlying physical infrastructure.
Further details on Google’s distributed systems services and underlying communications support can be found in Chapter 21, a compelling case study of a modern distributed system in action.



Frequently Asked Questions

+
Ans: the wide range of applications in use today, from relatively localized systems (as found, for example, in a car or aircraft) to globalscale systems involving millions of nodes, from data-centric services to processorintensive tasks, from systems built from very small and relatively primitive sensors to those incorporating powerful computational elements, from embedded systems to ones that support a sophisticated interactive user experience, and so on. view more..
+
Ans: concurrency of components, lack of a global clock and independent failures of components and the ability to work well when the load or the number of users increases – failure handling, concurrency of components, transparency and providing quality of service view more..
+
Ans: What Is Information Systems Analysis and Design? Information systems analysis and design is a method used by companies ranging from IBM to PepsiCo to Sony to create and maintain information systems that perform basic business functions such as keeping track of customer names and addresses, processing orders, and paying employees. The main goal of systems analysis and design is to improve organizational systems, typically through applying software that can help employees accomplish key business tasks more easily and efficiently. As a systems analyst, you will be at the center of developing this software. view more..
+
Ans: The task of a web search engine is to index the entire contents of the World Wide Web, encompassing a wide range of information styles including web pages, multimedia sources and (scanned) books view more..
+
Ans: The growth of the World Wide Web as a repository of information and knowledge; the development of web search engines such as Google and Yahoo to search this vast repository view more..
+
Ans: The engineering of MMOGs represents a major challenge for distributed systems technologies, particularly because of the need for fast response times to preserve the user experience of the game. view more..
+
Ans: a very different style of underlying architecture from the styles mentioned above (for example client-server), and such systems typically employ what is known as distributed event-based systems. view more..
+
Ans: the emergence of ubiquitous computing coupled with the desire to support user mobility in distributed systems view more..
+
Ans: The Internet is also a very large distributed system. It enables users, wherever they are, to make use of services such as the World Wide Web, email and file transfer. (Indeed, the Web is sometimes incorrectly equated with the Internet.) view more..
+
Ans: Technological advances in device miniaturization and wireless networking have led increasingly to the integration of small and portable computing devices into distributed systems. view more..
+
Ans: The crucial characteristic of continuous media types is that they include a temporal dimension, and indeed, the integrity of the media type is fundamentally dependent on preserving real-time relationships between elements of a media type. view more..
+
Ans: hysical resources such as storage and processing can be made available to networked computers, removing the need to own such resources on their own. At one end of the spectrum, a user may opt for a remote storage facility for file storage requirements view more..
+
Ans: In practice, patterns of resource sharing vary widely in their scope and in how closely users work together. At one extreme, a search engine on the Web provides a facility to users throughout the world, users who need never come into contact with one another directly. At the other extreme, in computer-supported cooperative working (CSCW), a group of users who cooperate directly share resources such as documents in a small, closed group. view more..
+
Ans: Data types such as integers may be represented in different ways on different sorts of hardware – for example, there are two alternatives for the byte ordering of integers. These differences in representation must be dealt with if messages are to be exchanged between programs running on different hardware view more..
+
Ans: the publication of interfaces is only the starting point for adding and extending services in a distributed system. The challenge to designers is to tackle the complexity of distributed systems consisting of many components engineered by different people. view more..
+
Ans: a firewall can be used to form a barrier around an intranet, restricting the traffic that can enter and leave, this does not deal with ensuring the appropriate use of resources by users within an intranet, or with the appropriate use of resources in the Internet, that are not protected by firewalls. view more..
+
Ans: ly and efficiently at many different scales, ranging from a small intranet to the Internet. A system is described as scalable if it will remain effective when there is a significant increase in the number of resources and the number of users. The number of computers and servers in the Internet has increased dramatically. view more..
+
Ans: Failures in a distributed system are partial – that is, some components fail while others continue to function. Therefore the handling of failures is particularly difficult. The following techniques for dealing with failures are discussed throughout the book view more..



Recommended Posts:


Rating - 3/5
493 views

Advertisements