Difference between revisions of "P versus NP"

(began wikification)
(Importance: sp. millennium)
 
(15 intermediate revisions by 8 users not shown)
Line 1: Line 1:
The relation between the complexity classes P and NP is one of the most important open problems in theoretical [[computer science]] and [[mathematics]]. It is studied in [[computational complexity theory]], the part of the theory of computation dealing with the resources required during computation to solve a given problem. The most common resources are [[time]] (how many steps it takes to solve a problem) and [[space]] (how much memory it takes to solve a problem). In such analysis, a [[model]] of the computer for which time must be analysed is required. Typically, such models assume that the computer is deterministic - that, given the computer's present state and any inputs, there is only one possible action that the computer might take - and sequential - it performs actions one after the other. These assumptions reflect the behaviour of all practical computers yet devised, even including machines featuring [[parallel processing]].
+
'''<math>P</math> versus <math>NP</math>''' is one of the greatest [[computability and complexity]] problems of modern mathematics, and one of the [[Millennium Problems]]. <math>P</math> the class of decision problems (those whose answer is either "yes" or "no," as opposed to other classes such as counting problems) that can be solved by a deterministic algorithm in polynomial time. <math>NP</math> is the class of decision problems that can be solved by a ''non-deterministic'' algorithm in polynomial time. The <math>P</math> versus <math>NP</math> question asks whether these two classes are the same, or whether there are problems in <math>NP</math> that are not in <math>P</math>.
  
In this theory, the class P consists of all those decision problems that can be solved on a [[deterministic sequential machine]] in an amount of time that is [[polynomial]] in the size of the input; the class NP consists of all those decision problems whose positive solutions can be verified in [[polynomial time]] given the right [[information]], or equivalently, whose solution can be found in polynomial time on a [[non-deterministic machine]]. Arguably, the biggest open question in theoretical computer science concerns the relationship between those two classes:
+
Since all modern computers (with the exception of a few quantum computers) are deterministic, non-deterministic algorithms are of theoretical, rather than practical, interest. However, the class <math>NP</math> can also be defined without reference to nondeterminism:  
 +
{{stub}}
  
    Is P equal to NP?
+
==Overview==
 +
The relation between the complexity classes <math>P</math> and <math>NP</math> is one of the most important open problems in theoretical [[computer science]] and [[mathematics]]. The most common measurements are [[time]] (how many steps it takes to solve a problem as a function of input, usually expressed with big-O notation) and [[space]] (how much memory it takes to solve a problem). In such analysis, a [[model]] of the computer for which time must be analyzed is required. Typically, such models assume that the computer is deterministic - that, given the computer's present state and any inputs, there is only one possible action that the computer might take - and sequential - it performs actions one after the other, such as a deterministic Turing machine. These assumptions reflect the behaviour of all practical computers yet devised, even including machines featuring [[parallel processing]].
  
In a 2002 poll of 100 researchers, 61 believed the answer is no, 9 believed the answer is yes, 22 were unsure, and 8 believed the question may be independent of the currently accepted axioms, and so impossible to prove or disprove.[1] The [[Clay Mathematics Institute]] has offered a USD 1,000,000 prize for a correct solution.
+
A ''decision problem'' is a problem that admits a yes or no answer (as opposed to an optimization problem, such as "What is the length of the longest path from <math>s</math> to <math>t</math>?"). More formally, a decision problem may be thought of as a language <math>L</math> for which we wish to decide if a given word <math>w</math> belongs to the language.
  
An important role in this discussion is played by the set of NP-complete problems (or NPC) which can be loosely described as the hardest problems in NP and therefore they are the least likely to be in P. More precisely, any problem in NP, through some efficient (takes at most a polynomial-bounded number of steps) transformation, can be expressed as a problem in NP-complete. Therefore if one finds an efficient (again, polynomial-bounded) solution to any NP-complete problem, then every problem in NP can be solved efficiently and therefore must be in P, hence proving P = NP. (See NP-complete for the exact definition.) Most theoretical computer scientists currently believe that the relationship among the classes P, NP, and NPC is as shown in the picture, with the P and NPC classes disjoint.
+
We say that an algorithm <math>A</math> ''decides'' a language <math>L</math> if, for all inputs <math>w</math>, <math>A</math> either accepts or rejects <math>w</math>.
  
In essence, the P = NP question asks: if positive solutions to a YES/NO problem can be verified quickly, can the answers also be computed quickly? Here is an example to get a feeling for the question. Given a set of integers, does any subset of them sum to 0? For instance, does a subset of the set {-2, -3, 8, 15, -10} add up to 0? The answer is YES, though it may take a little while to find a subset that does - and if the set was larger, it might take a very long time to find a subset that does. On the other hand, if someone claims that the answer is "YES, because {-2, -3, -10, 15} add up to zero", then we can quickly check that with a few additions. Verifying that the subset adds up to zero is much faster than finding the subset in the first place. The information needed to verify a positive answer is also called a certificate. So we conclude that given the right certificates, positive answers to our problem can be verified quickly (i.e. in polynomial time) and that's why this problem is in NP.
+
===The class P===
 +
The class <math>P</math> consists of all those decision problems (languages) that can be decided using a deterministic Turing machine in an amount of time that is [[polynomial]] in the size of the input. More formally, <math>P = \bigcup_{k \ge 0} \text{TIME}(O(n^k))</math> where <math>\text{TIME}(f(n))</math> is the set of languages decidable by an <math>O(f(n))</math>-time deterministic Turing machine.
  
The restriction to YES/NO problems doesn't really make a difference; even if we allow more complicated answers, the resulting problem (whether FP = FNP) is equivalent.
+
===The class NP===
 +
The class <math>NP</math> (for ''non-deterministic polynomial time'') consists of all those decision problems that are decidable using a ''non-deterministic Turing machine''. It is equivalent to the set of decision problems for which whose ''yes'' instances are efficiently verifiable in polynomial time using a certificate. Examples of problems in <math>P</math> and <math>NP</math> are given below.
 +
 
 +
== Importance ==
 +
Arguably, the biggest open question in theoretical computer science concerns the relationship between those two classes:
 +
 
 +
Is <math>P</math> equal to <math>NP</math>?
 +
 
 +
In a 2002 poll of 100 researchers, 61 believed the answer is no, 9 believed the answer is yes, 22 were unsure, and 8 believed the question may be independent of the currently accepted axioms, and so impossible to prove or disprove.
 +
 
 +
The [[Clay Mathematics Institute]] has offered a USD \$1,000,000 prize for a correct solution, as it has listed it as one of its [[Millennium Problems]].
 +
 
 +
==Arguments==
 +
It is easy to show that <math>P \subseteq NP</math>, as if we are given any <math>L \in P</math>, a polynomial-time verifier for <math>L</math>, given input <math>w</math> and a certificate <math>c</math>, can simply ignore the certificate and decide if <math>w \in L</math>.
 +
 
 +
An important role in this discussion is played by the set of <math>NP</math>-complete problems (or <math>NPC</math>) which can be loosely described as the hardest problems in <math>NP</math>. More precisely, a language <math>L</math> is ''NP-complete'' if both are true:
 +
 
 +
* <math>L \in NP</math>
 +
* Any language in NP has a polynomial-time reduction to <math>L</math> (NP-hardness).
 +
 
 +
The main idea behind a polynomial-time reduction is this: If we knew how to decide <math>L</math> in polynomial time, then any problem in <math>NP</math> can be converted into an instance of <math>L</math> in polynomial time, and then we can use the algorithm that decides <math>L</math> as a subroutine.
 +
 
 +
===Examples of P, NP, NP-complete problems===
 +
The following problems are examples of problems in <math>P</math> (i.e. ones we can answer in polynomial time as a function of input):
 +
 
 +
*Given a list of <math>n</math> integers, is it sorted in non-decreasing order?
 +
*Given a weighted, undirected graph <math>G = (V,E)</math> and two vertices <math>s, t \in E</math>, does there exist a path from <math>s</math> to <math>t</math> of weight at most <math>c</math>?
 +
*Given two positive integers <math>m</math> and <math>n</math> and a positive integer <math>d</math>, is it true that <math>d = \gcd(m,n)</math>?
 +
 
 +
A classic example of a problem that is <math>NP</math>-complete but not known to be in <math>P</math> is the ''subset sum problem'': Given a list <math>S</math> of <math>n</math> integers and a number <math>t</math>, all encoded in some base <math>b > 1</math>, is there some subset of numbers in <math>S</math> whose sum is <math>t</math>? For example, is there a subset of <math>\{-4,2,3,10,-8,7\}</math> whose sum is 14? The answer is ''yes,'' and it can be checked in polynomial time that the answer is ''yes'' (by giving the certificate <math>\{2,3,10,-8,7\}</math>, but this is a difficult problem to solve in general as a brute force solution requires <math>O(2^n n)</math> computations, and it is not known if subset sum is in <math>P</math>.
 +
 
 +
The following examples are NP-complete problems:
 +
 
 +
*SAT, 3-SAT (Boolean satisfiability)
 +
*Subset sum
 +
*Vertex cover
 +
*Maximum clique in a graph
 +
*Set cover
 +
*Traveling salesman problem
 +
 
 +
The restriction to <math>YES/NO</math> problems doesn't really make a difference; even if we allow more complicated answers, the resulting problem (whether <math>FP = FNP</math>) is equivalent.

Latest revision as of 14:11, 25 October 2017

$P$ versus $NP$ is one of the greatest computability and complexity problems of modern mathematics, and one of the Millennium Problems. $P$ the class of decision problems (those whose answer is either "yes" or "no," as opposed to other classes such as counting problems) that can be solved by a deterministic algorithm in polynomial time. $NP$ is the class of decision problems that can be solved by a non-deterministic algorithm in polynomial time. The $P$ versus $NP$ question asks whether these two classes are the same, or whether there are problems in $NP$ that are not in $P$.

Since all modern computers (with the exception of a few quantum computers) are deterministic, non-deterministic algorithms are of theoretical, rather than practical, interest. However, the class $NP$ can also be defined without reference to nondeterminism: This article is a stub. Help us out by expanding it.

Overview

The relation between the complexity classes $P$ and $NP$ is one of the most important open problems in theoretical computer science and mathematics. The most common measurements are time (how many steps it takes to solve a problem as a function of input, usually expressed with big-O notation) and space (how much memory it takes to solve a problem). In such analysis, a model of the computer for which time must be analyzed is required. Typically, such models assume that the computer is deterministic - that, given the computer's present state and any inputs, there is only one possible action that the computer might take - and sequential - it performs actions one after the other, such as a deterministic Turing machine. These assumptions reflect the behaviour of all practical computers yet devised, even including machines featuring parallel processing.

A decision problem is a problem that admits a yes or no answer (as opposed to an optimization problem, such as "What is the length of the longest path from $s$ to $t$?"). More formally, a decision problem may be thought of as a language $L$ for which we wish to decide if a given word $w$ belongs to the language.

We say that an algorithm $A$ decides a language $L$ if, for all inputs $w$, $A$ either accepts or rejects $w$.

The class P

The class $P$ consists of all those decision problems (languages) that can be decided using a deterministic Turing machine in an amount of time that is polynomial in the size of the input. More formally, $P = \bigcup_{k \ge 0} \text{TIME}(O(n^k))$ where $\text{TIME}(f(n))$ is the set of languages decidable by an $O(f(n))$-time deterministic Turing machine.

The class NP

The class $NP$ (for non-deterministic polynomial time) consists of all those decision problems that are decidable using a non-deterministic Turing machine. It is equivalent to the set of decision problems for which whose yes instances are efficiently verifiable in polynomial time using a certificate. Examples of problems in $P$ and $NP$ are given below.

Importance

Arguably, the biggest open question in theoretical computer science concerns the relationship between those two classes:

Is $P$ equal to $NP$?

In a 2002 poll of 100 researchers, 61 believed the answer is no, 9 believed the answer is yes, 22 were unsure, and 8 believed the question may be independent of the currently accepted axioms, and so impossible to prove or disprove.

The Clay Mathematics Institute has offered a USD $1,000,000 prize for a correct solution, as it has listed it as one of its Millennium Problems.

Arguments

It is easy to show that $P \subseteq NP$, as if we are given any $L \in P$, a polynomial-time verifier for $L$, given input $w$ and a certificate $c$, can simply ignore the certificate and decide if $w \in L$.

An important role in this discussion is played by the set of $NP$-complete problems (or $NPC$) which can be loosely described as the hardest problems in $NP$. More precisely, a language $L$ is NP-complete if both are true:

  • $L \in NP$
  • Any language in NP has a polynomial-time reduction to $L$ (NP-hardness).

The main idea behind a polynomial-time reduction is this: If we knew how to decide $L$ in polynomial time, then any problem in $NP$ can be converted into an instance of $L$ in polynomial time, and then we can use the algorithm that decides $L$ as a subroutine.

Examples of P, NP, NP-complete problems

The following problems are examples of problems in $P$ (i.e. ones we can answer in polynomial time as a function of input):

  • Given a list of $n$ integers, is it sorted in non-decreasing order?
  • Given a weighted, undirected graph $G = (V,E)$ and two vertices $s, t \in E$, does there exist a path from $s$ to $t$ of weight at most $c$?
  • Given two positive integers $m$ and $n$ and a positive integer $d$, is it true that $d = \gcd(m,n)$?

A classic example of a problem that is $NP$-complete but not known to be in $P$ is the subset sum problem: Given a list $S$ of $n$ integers and a number $t$, all encoded in some base $b > 1$, is there some subset of numbers in $S$ whose sum is $t$? For example, is there a subset of $\{-4,2,3,10,-8,7\}$ whose sum is 14? The answer is yes, and it can be checked in polynomial time that the answer is yes (by giving the certificate $\{2,3,10,-8,7\}$, but this is a difficult problem to solve in general as a brute force solution requires $O(2^n n)$ computations, and it is not known if subset sum is in $P$.

The following examples are NP-complete problems:

  • SAT, 3-SAT (Boolean satisfiability)
  • Subset sum
  • Vertex cover
  • Maximum clique in a graph
  • Set cover
  • Traveling salesman problem

The restriction to $YES/NO$ problems doesn't really make a difference; even if we allow more complicated answers, the resulting problem (whether $FP = FNP$) is equivalent.