# Five Representative Problem

The Stable Matching Problem provides us with a rich example of the process of algorithm design. For many problems, this process involves a few significant steps: formulating the problem with enough mathematical precision that we can ask a concrete question and start thinking about algorithms to solve it; designing an algorithm for the problem; and analyzing the algorithm by proving it is correct and giving a bound on the running time so as to establish the algorithm’s efficiency.

This high-level strategy is carried out in practice with the help of a few fundamental design techniques, which are very useful in assessing the inherent complexity of a problem and in formulating an algorithm to solve it. As in any area, becoming familiar with these design techniques is a gradual process; but with experience one can start recognizing problems as belonging to identifiable genres and appreciating how subtle changes in the statement of a problem can have an enormous effect on its computational difficulty.

To get this discussion started, then, it helps to pick out a few representative milestones that we’ll be encountering in our study of algorithms: cleanly formulated problems, all resembling one another at a general level, but differing greatly in their difficulty and in the kinds of approaches that one brings to bear on them. The first three will be solvable efficiently by a sequence of increasingly subtle algorithmic techniques; the fourth marks a major turning point in our discussion, serving as an example of a problem believed to be unsolvable by any efficient algorithm; and the fifth hints at a class of problems believed to be harder still.

The problems are self-contained and are all motivated by computing applications. To talk about some of them, though, it will help to use the terminology of graphs. While graphs are a common topic in earlier computer science courses, we’ll be introducing them in a fair amount of depth in Chapter 3; due to their enormous expressive power, we’ll also be using them extensively throughout the book. For the discussion here, it’s enough to think of a graph G as simply a way of encoding pairwise relationships among a set of objects. Thus, G consists of a pair of sets (V, E)—a collection V of nodes and a collection E of edges, each of which “joins” two of the nodes. We thus represent an edge e ∈ E as a two-element subset of V: e = {u, v} for some u, v ∈ V, where we call u and v the ends of e. We typically draw graphs as in Figure 1.3, with each node as a small circle and each edge as a line segment joining its two ends.

Let’s now turn to a discussion of the five representative problems.

#### Interval Scheduling

Consider the following very simple scheduling problem. You have a resource— it may be a lecture room, a supercomputer, or an electron microscope—and many people request to use the resource for periods of time. A request takes the form: Can I reserve the resource starting at time s, until time f? We will assume that the resource can be used by at most one person at a time. A scheduler wants to accept a subset of these requests, rejecting all others, so that the accepted requests do not overlap in time. The goal is to maximize the number of requests accepted.

More formally, there will be n requests labeled 1, . . . , n, with each request i specifying a start time si and a finish time fi. Naturally, we have si < fi for all i. Two requests i and j are compatible if the requested intervals do not overlap: that is, either request i is for an earlier time interval than request j (fi ≤ sj), or request i is for a later time than request j (fj ≤ si). We’ll say more generally that a subset A of requests is compatible if all pairs of requests i, j ∈ A, i = j are compatible. The goal is to select a compatible subset of requests of maximum possible size.

We illustrate an instance of this Interval Scheduling Problem in Figure 1.4. Note that there is a single compatible set of size 4, and this is the largest compatible set.

We will see shortly that this problem can be solved by a very natural algorithm that orders the set of requests according to a certain heuristic and then “greedily” processes them in one pass, selecting as large a compatible subset as it can. This will be typical of a class of greedy algorithms that we will consider for various problems—myopic rules that process the input one piece at a time with no apparent look-ahead. When a greedy algorithm can be shown to find an optimal solution for all instances of a problem, it’s often fairly surprising. We typically learn something about the structure of the underlying problem from the fact that such a simple approach can be optimal.

#### Weighted Interval Scheduling

.In the Interval Scheduling Problem, we sought to maximize the number of requests that could be accommodated simultaneously. Now, suppose more generally that each request interval i has an associated value, or weight, vi > 0; we could picture this as the amount of money we will make from the i th individual if we schedule his or her request. Our goal will be to find a compatible subset of intervals of maximum total value.

The case in which vi = 1 for each i is simply the basic Interval Scheduling Problem; but the appearance of arbitrary values changes the nature of the maximization problem quite a bit. Consider, for example, that if v1 exceeds the sum of all other vi, then the optimal solution must include interval 1 regardless of the configuration of the full set of intervals. So any algorithm for this problem must be very sensitive to the values, and yet degenerate to a method for solving (unweighted) interval scheduling when all the values are equal to 1.

There appears to be no simple greedy rule that walks through the intervals one at a time, making the correct decision in the presence of arbitrary values. Instead, we employ a technique, dynamic programming, that builds up the optimal value over all possible solutions in a compact, tabular way that leads to a very efficient algorithm.

#### Bipartite Matching

When we considered the Stable Matching Problem, we defined a matching to be a set of ordered pairs of men and women with the property that each man and each woman belong to at most one of the ordered pairs. We then defined a perfect matching to be a matching in which every man and every woman belong to some pair.

We can express these concepts more generally in terms of graphs, and in order to do this it is useful to define the notion of a bipartite graph. We say that a graph G = (V, E) is bipartite if its node set V can be partitioned into sets X

and Y in such a way that every edge has one end in X and the other end in Y. A bipartite graph is pictured in Figure 1.5; often, when we want to emphasize a graph’s “bipartiteness,” we will draw it this way, with the nodes in X and Y in two parallel columns. But notice, for example, that the two graphs in Figure 1.3 are also bipartite.

Now, in the problem of finding a stable matching, matchings were built from pairs of men and women. In the case of bipartite graphs, the edges are pairs of nodes, so we say that a matching in a graph G = (V, E) is a set of edges M ⊆ E with the property that each node appears in at most one edge of M. M is a perfect matching if every node appears in exactly one edge of M.

To see that this does capture the same notion we encountered in the Stable Matching Problem, consider a bipartite graph G with a set X of n men, a set Y of n women, and an edge from every node in X to every node in Y. Then the matchings and perfect matchings in G are precisely the matchings and perfect matchings among the set of men and women.

In the Stable Matching Problem, we added preferences to this picture. Here, we do not consider preferences; but the nature of the problem in arbitrary bipartite graphs adds a different source of complexity: there is not necessarily an edge from every x ∈ X to every y ∈ Y, so the set of possible matchings has quite a complicated structure. In other words, it is as though only certain pairs of men and women are willing to be paired off, and we want to figure out how to pair off many people in a way that is consistent with this. Consider, for example, the bipartite graph G in Figure 1.5: there are many matchings in G, but there is only one perfect matching. (Do you see it?)

Matchings in bipartite graphs can model situations in which objects are being assigned to other objects. Thus, the nodes in X can represent jobs, the nodes in Y can represent machines, and an edge (xi, yj) can indicate that machine yj is capable of processing job xi. A perfect matching is then a way of assigning each job to a machine that can process it, with the property that each machine is assigned exactly one job. In the spring, computer science departments across the country are often seen pondering a bipartite graph in which X is the set of professors in the department, Y is the set of offered courses, and an edge (xi, yj) indicates that professor xi is capable of teaching course yj. A perfect matching in this graph consists of an assignment of each professor to a course that he or she can teach, in such a way that every course is covered.

Thus the Bipartite Matching Problem is the following: Given an arbitrary bipartite graph G, find a matching of maximum size. If |X|=|Y| = n, then there is a perfect matching if and only if the maximum matching has size n. We will find that the algorithmic techniques discussed earlier do not seem adequate for providing an efficient algorithm for this problem. There is, however, a very elegant and efficient algorithm to find a maximum matching; it inductively builds up larger and larger matchings, selectively backtracking along the way. This process is called augmentation, and it forms the central component in a large class of efficiently solvable problems called network flow problems.

#### Independent Set

Now let’s talk about an extremely general problem, which includes most of these earlier problems as special cases. Given a graph G = (V, E), we say a set of nodes S ⊆ V is independent if no two nodes in S are joined by an edge. The Independent Set Problem is, then, the following: Given G, find an independent set that is as large as possible. For example, the maximum size of an independent set in the graph in Figure 1.6 is four, achieved by the four-node independent set {1, 4, 5, 6}.

The Independent Set Problem encodes any situation in which you are trying to choose from among a collection of objects and there are pairwise conflicts among some of the objects. Say you have n friends, and some pairs of them don’t get along. How large a group of your friends can you invite to dinner if you don’t want any interpersonal tensions? This is simply the largest independent set in the graph whose nodes are your friends, with an edge between each conflicting pair.

Interval Scheduling and Bipartite Matching can both be encoded as special cases of the Independent Set Problem. For Interval Scheduling, define a graph G = (V, E) in which the nodes are the intervals and there is an edge between each pair of them that overlap; the independent sets in G are then just the compatible subsets of intervals. Encoding Bipartite Matching as a special case of Independent Set is a little trickier to see. Given a bipartite graph G = (V, E), the objects being chosen are edges, and the conflicts arise between two edges that share an end. (These, indeed, are the pairs of edges that cannot belong to a common matching.) So we define a graph G = (V, E) in which the node set V is equal to the edge set E of G. We define an edge between each pair of elements in V that correspond to edges of G with a common end. We can now check that the independent sets of G are precisely the matchings of G.While it is not complicated to check this, it takes a little concentration to deal With this type of “edges-to-nodes, nodes-to-edges” transformation.^{2}

Given the generality of the Independent Set Problem, an efficient algorithm to solve it would be quite impressive. It would have to implicitly contain algorithms for Interval Scheduling, Bipartite Matching, and a host of other natural optimization problems.

The current status of Independent Set is this: no efficient algorithm is known for the problem, and it is conjectured that no such algorithm exists. The obvious brute-force algorithm would try all subsets of the nodes, checking each to see if it is independent, and then recording the largest one encountered. It is possible that this is close to the best we can do on this problem. We will see later in the book that Independent Set is one of a large class of problems that are termed NP-complete. No efficient algorithm is known for any of them; but they are all equivalent in the sense that a solution to any one of them would imply, in a precise sense, a solution to all of them.

Here’s a natural question: Is there anything good we can say about the complexity of the Independent Set Problem? One positive thing is the following: If we have a graph G on 1,000 nodes, and we want to convince you that it contains an independent set S of size 100, then it’s quite easy. We simply show you the graph G, circle the nodes of S in red, and let you check that no two of them are joined by an edge. So there really seems to be a great difference in difficulty between checking that something is a large independent set and actually finding a large independent set. This may look like a very basic observation—and it is—but it turns out to be crucial in understanding this class of problems. Furthermore, as we’ll see next, it’s possible for a problem to be so hard that there isn’t even an easy way to “check” solutions in this sense.

#### Competitive Facility Location

Finally, we come to our fifth problem, which is based on the following twoplayer game. Consider two large companies that operate cafe franchises across ´ the country—let’s call them JavaPlanet and Queequeg’s Coffee—and they are currently competing for market share in a geographic area. First JavaPlanet opens a franchise; then Queequeg’s Coffee opens a franchise; then JavaPlanet; then Queequeg’s; and so on. Suppose they must deal with zoning regulations that require no two franchises be located too close together, and each is trying to make its locations as convenient as possible. Who will win?

Let’s make the rules of this “game” more concrete. The geographic region in question is divided into n zones, labeled 1, 2, . . . , n. Each zone i has a

value bi, which is the revenue obtained by either of the companies if it opens a franchise there. Finally, certain pairs of zones (i, j) are adjacent, and local zoning laws prevent two adjacent zones from each containing a franchise, regardless of which company owns them. (They also prevent two franchises from being opened in the same zone.) We model these conflicts via a graph G = (V, E), where V is the set of zones, and (i, j) is an edge in E if the zones i and j are adjacent. The zoning requirement then says that the full set of franchises opened must form an independent set in G.

Thus our game consists of two players, P_{1 }and P_{2}, alternately selecting nodes in G, with P_{1} moving first. At all times, the set of all selected nodes must form an independent set in G. Suppose that player P_{2} has a target bound B, and we want to know: is there a strategy for P_{2} so that no matter how P_{1} plays, P_{2 }will be able to select a set of nodes with a total value of at least B? We will call this an instance of the Competitive Facility Location Problem.

Consider, for example, the instance pictured in Figure 1.7, and suppose that P_{2}’s target bound is B = 20. Then P_{2} does have a winning strategy. On the other hand, if B = 25, then P_{2} does not.

One can work this out by looking at the figure for a while; but it requires some amount of case-checking of the form, “If P_{1} goes here, then P_{2} will go there; but if P_{1} goes over there, then P_{2} will go here. . . . ” And this appears to be intrinsic to the problem: not only is it computationally difficult to determine whether P_{2} has a winning strategy; on a reasonably sized graph, it would even be hard for us to convince you that P_{2} has a winning strategy. There does not seem to be a short proof we could present; rather, we’d have to lead you on a lengthy case-by-case analysis of the set of possible moves.

This is in contrast to the Independent Set Problem, where we believe that finding a large solution is hard but checking a proposed large solution is easy. This contrast can be formalized in the class of PSPACE-complete problems, of which Competitive Facility Location is an example. PSPACE-complete problems are believed to be strictly harder than NP-complete problems, and this conjectured lack of short “proofs” for their solutions is one indication of this greater hardness. The notion of PSPACE-completeness turns out to capture a large collection of problems involving game-playing and planning; many of these are fundamental issues in the area of artificial intelligence.

**Frequently Asked Questions**

## Recommended Posts:

**NAN/5**