In this video we'll begin our study of our final topic. First of all, what are NP-complete, or computationally intractable, problems? And second of all, what are some good strategies for approaching them algorithmically? Let's begin by formally defining computational tractability via polynomial-time solvability; that is, we'll define the complexity class P.

As you know from the past several weeks of hard work, the focus of these design and analysis of algorithms courses is to learn practical algorithms and relevant supporting theory for fundamental computational problems. By now we have a very long list of such problems: sorting, searching, shortest paths, minimum spanning trees, sequence alignment, and so on. But the sad fact is that I've given you a quite biased sample of computational problems thus far. For many important computational problems, including ones you are likely to run into in your own projects, there is no known efficient, let alone blazingly fast, algorithm for solving the problem. And as much as the focus of these courses is what you can do with algorithms rather than what you can't do, it would nevertheless be fraudulent, I think, to teach you about algorithms without discussing the spectre of intractability that haunts the professional algorithm designer and the serious programmer. In the same way that your programmer toolbox should include lots of different design paradigms, like dynamic programming, greedy algorithms, and divide and conquer, and lots of different data structures, hash tables, balanced search trees, and so on, that toolbox should also include knowledge of computationally intractable, that is, NP-complete, problems, and perhaps even how to establish that problems are NP-complete. Again, this is because these problems are likely to show up in your own projects. It is not uncommon for a graduate student at Stanford to come into my office asking for advice on how to approach, from an algorithmic perspective, a problem that has come up in their own project. Very often, after they've sketched the problem for two minutes, five minutes on my whiteboard, it's very easy to see that it is, in fact, an NP-complete problem. They should not seek out an efficient, exactly correct algorithm; instead they should resort to one of the strategies for dealing with NP-complete problems that we'll be discussing later.

So in the next sequence of videos my goal is to formalize the notion of computational intractability through the definition of NP-completeness, and also to give you a couple of examples. The relatively brief discussion of NP-completeness that I'm going to give you is no substitute for a proper study of the topic, and I encourage you to seek out a textbook or other free courses on the web to learn more. Indeed, NP-completeness is, I think, a topic worth learning at least twice, from multiple different perspectives, and those of you who have studied it before will, I hope, find my treatment complementary to the treatments you might have seen previously. NP-completeness is really one of the most famous, important, and influential exports from all of computer science to the broader scientific world. And as so many different disciplines are becoming increasingly computational, and here I'm thinking of everything from the mathematical sciences to the natural sciences and even the social sciences, these fields have really no choice but to confront and learn about NP-completeness, since it genuinely governs what can be accomplished computationally and what cannot.
My perspective here will be unapologetically that of an algorithm designer. In particular, after our discussion of NP-completeness, the rest of this course is going to focus on algorithmic approaches to these problems: if you're confronted with an NP-complete problem, what should you do about it? Rather than starting out by formalizing computational intractability, it's a little simpler to begin by defining computational tractability. That brings us to the uber-important definition of polynomial-time solvability. The definition is a bit of a mouthful, but it's exactly what you'd think it would be: a problem is polynomial-time solvable, naturally, if there's a polynomial-time algorithm that solves it. That is, there's an algorithm and a constant k so that, if you feed an input of length n to this algorithm, it will correctly solve the problem, whether that means correctly answering yes or no, or correctly outputting an optimal solution, in time O(n^k). For pretty much all of the problems we've discussed, it's been kind of obvious what the input length was: for example, the number of vertices plus the number of edges, or the number of numbers that had to be sorted, and so on. For an abstract problem, informally, I would encourage you to think of the input length as the number of keystrokes on a computer that would be required to describe the given instance. And yes, it is true that, strictly speaking, to be polynomial-time solvable all you need is a running time of O(n^k) for some constant k; even k = 10,000 is enough to meet the demands of this definition. Of course, for concrete problems you're going to see small values of k; in this course k has usually been one, two, or three.

A quick comment for those of you wondering about randomized algorithms, of which we saw some cool examples back in Part 1. To keep the discussion simple, when we discuss computational intractability, let's think only about deterministic algorithms. But don't worry: all of the qualitative conclusions we're going to reach are believed to hold equally well for randomized algorithms. In particular, we don't believe there are problems for which deterministic algorithms require exponential time and yet randomization magically allows you to get polynomial time; there's even some mathematical evidence suggesting that this is the case.

So that brings us to the definition of the complexity class capital P. Capital P is defined simply as the set of all polynomial-time solvable problems, that is, the set of all problems that admit a polynomial-time algorithm. For those of you who have heard of the P versus NP question, this is the same P. Throughout these courses we've exhibited many examples of problems in P; that's been the whole point of the course: sequence alignment, minimum cut, sorting, minimum spanning tree, shortest paths, and so on.
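For reference, here is the definition we've just been discussing written out symbolically; this is purely a restatement of the prose above, not anything new:

```latex
\textbf{Definition (polynomial-time solvability).}
A problem is \emph{polynomial-time solvable} if there is an algorithm $A$
and a constant $k \ge 1$ such that, for every input of length $n$,
$A$ correctly solves the problem in $O(n^k)$ time.

\textbf{Definition (the complexity class P).}
\[
  \mathsf{P} := \{\, \text{all polynomial-time solvable problems} \,\}.
\]
```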
There are two exceptions. One we discussed explicitly at the time, when we talked about shortest paths in graphs with negative edge costs. We stated that if you have negative-cost cycles and you want shortest paths that are simple, that do not contain any cycles, then that turns out to be an NP-complete problem; for those of you who know about reductions, there's an easy reduction from the Hamiltonian path problem. So we did not give a polynomial-time algorithm for that version of the shortest path problem; we just punted on it.

The second example is a really subtle one: believe it or not, we did not actually give a polynomial-time algorithm for the knapsack problem. This requires some explanation, because at the time I'll bet our dynamic programming algorithm felt like a polynomial-time algorithm. So let's review what its running time was. In the knapsack problem, the input is n items that have values and sizes, and you're also given the knapsack capacity, which is a positive integer capital W. Our table, our two-dimensional array, had Θ(n·W) subproblems, and we spent constant time filling in each entry. So the running time of our dynamic programming algorithm was Θ(n·W): the number of items times the knapsack capacity. What, on the other hand, is the length of an instance of knapsack? Well, as we just said, the input to knapsack is 2n + 1 numbers: the n sizes, the n values, and the knapsack capacity. And naturally, to write down any one of these numbers, you only need log of the magnitude of the number. So if you have an integer k and you want to write it in binary, that's log base 2 of k bits; if you want to write it down in base-10 digits, that's log base 10 of k, which is the same up to a constant factor. So the point is that the input length is proportional to n and to the logarithm of the magnitudes of the numbers, in particular log of capital W.

Here's another way to see why the dynamic programming algorithm is exponential in the input length. Let me use an analogy. Suppose you were solving some problem over graph cuts. Remember, the number of cuts in a graph is exponential in the number of vertices; it's roughly 2^n for n vertices. That means that, if you're doing brute-force search and I add just one more vertex to the graph, the running time of your algorithm doubles. That's exponential growth. But actually, the same thing is going on with the dynamic programming algorithm for the knapsack problem. Suppose you write everything in base 10, and I just add one extra zero to the knapsack capacity, multiplying it by ten. Then the number of subproblems you have to solve goes up by a factor of ten. So again, I added just one digit, one keystroke, to the input, and yet your running time got multiplied by a constant factor. That is, once again, exponential growth with respect to the encoding length of the input. Now, it's no accident that we failed to get a polynomial-time algorithm for the knapsack problem, because it is in fact an NP-complete problem. Again, we'll explain what NP-complete means shortly.

So that's the mathematical definition of the complexity class P, but more important than the math is the interpretation. How should you think about the class P, about polynomial-time solvability? To the practicing algorithm designer or the practicing programmer, membership in P, polynomial-time solvability, can be thought of as a rough litmus test for computational tractability. Now, the identification of algorithms that run in O(n^k) time for some constant k with practical, computationally efficient algorithms is imperfect. There are, of course, algorithms that in principle run in polynomial time but are too slow to be useful in practice. Conversely, there are algorithms that are not polynomial-time but are useful in practice; you already coded one up when you did the dynamic programming solution to knapsack, and we'll see several more examples in future lectures.
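Since the knapsack dynamic program plays a starring role in this discussion, here is a minimal sketch of it, my own rendering rather than code from the course, with the pseudo-polynomial point annotated:

```python
def knapsack(values, sizes, W):
    """Classic knapsack DP: Theta(n * W) time for n items and capacity W.

    Note that W here is the *numeric value* of the capacity, while writing
    W down in the input takes only about log(W) digits. So this running
    time is exponential in the encoding length of the input
    (a "pseudo-polynomial" algorithm).
    """
    n = len(values)
    # A[i][w] = max total value using only the first i items, capacity w.
    A = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            A[i][w] = A[i - 1][w]  # case 1: skip item i
            if sizes[i - 1] <= w:  # case 2: include item i, if it fits
                A[i][w] = max(A[i][w],
                              A[i - 1][w - sizes[i - 1]] + values[i - 1])
    return A[n][W]

# Appending one zero to W in base 10 (one extra keystroke of input)
# multiplies the number of subproblems, hence the running time, by ten:
# exponential growth in the encoding length.
```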
And more generally, anyone who has the guts to write down a precise mathematical definition, like this complexity class P, needs to be ready for the inevitability that the definition will accommodate a few edge cases that you wish it didn't, and exclude a few edge cases that you wish it didn't. This is an inevitable friction between the binary nature of mathematical properties and the fuzziness of reality. These edge cases are absolutely no excuse to ignore or dismiss a mathematical definition like this one; quite the contrary. The friction makes it that much more astonishing that you can write down a clean mathematical definition, something that either is or is not satisfied by every single computational problem, and have it serve as such a useful guide for classifying problems into the tractable-in-practice ones and the intractable-in-practice ones. We now have some 40 years of experience indicating that this definition has been unreasonably effective in just that kind of classification. Generally speaking, computational problems in P can be solved well using off-the-shelf solutions, as we've seen in many examples in this class. Problems believed not to be in P generally require significant computational resources, significant human resources, and a lot of domain expertise to solve in practice.

So we've mentioned in passing a couple of problems that are believed not to be polynomial-time solvable. Let me pause here and tell you about a more famous one: the traveling salesman problem. The traveling salesman problem, remarkably, just doesn't sound that different from the minimum spanning tree problem, a problem for which we now know a litany of greedy algorithms that run in near-linear time. The input is an undirected graph, and let's go ahead and assume it's a complete graph, so there's an edge between every pair of vertices. Every edge has a cost, and the problem is nontrivial even if every edge cost is just either one or two. Let me draw an example in green with four vertices. The responsibility of an algorithm for the TSP is to output a tour, that is, a cycle that visits every single vertex exactly once. And amongst all tours, which you can think of as permutations of the vertices, the orders in which you visit them, you want the one that minimizes the sum of the edge costs. So, for example, in the green graph that I've drawn, the minimum-cost tour has total cost 13.

So the TSP is simple enough to state. It's clear you could solve it by brute-force search, in roughly factorial time, just by trying all permutations of the vertices (see the sketch below). But we've seen many, many examples in this class where you can do better than brute-force search. And people have been studying the TSP seriously, from a computational perspective, since at least the 1950s, including many huge names in optimization, people like George Dantzig. Yet despite over 60 years of interest, there is, to date, no known polynomial-time algorithm that solves the traveling salesman problem. In fact, way back in 1965, Jack Edmonds, in a remarkable paper called "Paths, Trees, and Flowers," conjectured that no polynomial-time algorithm exists for the traveling salesman problem. This conjecture remains unresolved to this day, almost 50 years later. As we'll see, it is equivalent to the conjecture that P is not equal to NP. So how would you formalize a proof of this conjecture? And how would you amass evidence that the conjecture holds, in the absence of a proof?
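As a footnote to the brute-force point above, here is a minimal sketch of exhaustive search for the TSP; this is my own illustration rather than anything from the lecture, and the cost matrix at the bottom is a made-up instance, not the green graph from the slides:

```python
from itertools import permutations

def brute_force_tsp(cost):
    """Exact TSP on a complete graph by exhaustive search.

    cost[i][j] is the cost of the edge between vertices i and j. Fixing
    vertex 0 as the tour's starting point, we try all (n-1)! orderings of
    the remaining vertices -- roughly factorial time, so only tiny
    instances are feasible.
    """
    n = len(cost)
    best = float("inf")
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        # Sum the edge costs around the cycle, wrapping back to vertex 0.
        total = sum(cost[tour[i]][tour[(i + 1) % n]] for i in range(n))
        best = min(best, total)
    return best

# A made-up 4-vertex instance (symmetric cost matrix):
cost = [[0, 1, 2, 5],
        [1, 0, 4, 3],
        [2, 4, 0, 6],
        [5, 3, 6, 0]]
print(brute_force_tsp(cost))  # -> 12 for this instance
```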
Those two questions are the topics of the upcoming videos.