Difference between revisions of "P versus NP"

(wikify)
(Importance: sp. millennium)
 
(14 intermediate revisions by 7 users not shown)
Line 1: Line 1:
'''<math>P</math> versus <math>NP</math>''' is one of the greatest [[computability and complexity]] problems of modern mathematics, and one of the [[Millennium Problems]]. <math>P</math> stands for [[polynomial]], and it represents the category of problems for which there is an [[algorithm]] such that a computer could process it. <math>NP</math> stands for non-deterministic polynomial, and it represents problems such that a computer could not solve with a straightforward general algorithm.
+
'''<math>P</math> versus <math>NP</math>''' is one of the greatest [[computability and complexity]] problems of modern mathematics, and one of the [[Millennium Problems]]. <math>P</math> the class of decision problems (those whose answer is either "yes" or "no," as opposed to other classes such as counting problems) that can be solved by a deterministic algorithm in polynomial time. <math>NP</math> is the class of decision problems that can be solved by a ''non-deterministic'' algorithm in polynomial time.  The <math>P</math> versus <math>NP</math> question asks whether these two classes are the same, or whether there are problems in <math>NP</math> that are not in <math>P</math>.
 +
 
 +
Since all modern computers (with the exception of a few quantum computers) are deterministic, non-deterministic algorithms are of theoretical, rather than practical, interest. However, the class <math>NP</math> can also be defined without reference to nondeterminism:
 +
{{stub}}
  
 
==Overview==
 
==Overview==
The relation between the complexity classes <math>P</math> and <math>NP</math> is one of the most important open problems in theoretical [[computer science]] and [[mathematics]]. The most common resources are [[time]] (how many steps it takes to solve a problem) and [[space]] (how much memory it takes to solve a problem). In such analysis, a [[model]] of the computer for which time must be analysed is required. Typically, such models assume that the computer is deterministic - that, given the computer's present state and any inputs, there is only one possible action that the computer might take - and sequential - it performs actions one after the other. These assumptions reflect the behaviour of all practical computers yet devised, even including machines featuring [[parallel processing]].
+
The relation between the complexity classes <math>P</math> and <math>NP</math> is one of the most important open problems in theoretical [[computer science]] and [[mathematics]]. The most common measurements are [[time]] (how many steps it takes to solve a problem as a function of input, usually expressed with big-O notation) and [[space]] (how much memory it takes to solve a problem). In such analysis, a [[model]] of the computer for which time must be analyzed is required. Typically, such models assume that the computer is deterministic - that, given the computer's present state and any inputs, there is only one possible action that the computer might take - and sequential - it performs actions one after the other, such as a deterministic Turing machine. These assumptions reflect the behaviour of all practical computers yet devised, even including machines featuring [[parallel processing]].
 +
 
 +
A ''decision problem'' is a problem that admits a yes or no answer (as opposed to an optimization problem, such as "What is the length of the longest path from <math>s</math> to <math>t</math>?"). More formally, a decision problem may be thought of as a language <math>L</math> for which we wish to decide if a given word <math>w</math> belongs to the language.
 +
 
 +
We say that an algorithm <math>A</math> ''decides'' a language <math>L</math> if, for all inputs <math>w</math>, <math>A</math> either accepts or rejects <math>w</math>.
 +
 
 +
===The class P===
 +
The class <math>P</math> consists of all those decision problems (languages) that can be decided using a deterministic Turing machine in an amount of time that is [[polynomial]] in the size of the input. More formally, <math>P = \bigcup_{k \ge 0} \text{TIME}(O(n^k))</math> where <math>\text{TIME}(f(n))</math> is the set of languages decidable by an <math>O(f(n))</math>-time deterministic Turing machine.
  
In this theory, the class <math>P</math> consists of all those decision problems that can be solved on a [[deterministic sequential machine]] in an amount of time that is [[polynomial]] in the size of the input; the class <math>NP</math> consists of all those decision problems whose positive solutions can be verified in [[polynomial time]] given the right [[information]], or equivalently, whose solution can be found in polynomial time on a [[non-deterministic machine]].
+
===The class NP===
 +
The class <math>NP</math> (for ''non-deterministic polynomial time'') consists of all those decision problems that are decidable using a ''non-deterministic Turing machine''. It is equivalent to the set of decision problems for which whose ''yes'' instances are efficiently verifiable in polynomial time using a certificate. Examples of problems in <math>P</math> and <math>NP</math> are given below.
  
 
== Importance ==
 
== Importance ==
 
Arguably, the biggest open question in theoretical computer science concerns the relationship between those two classes:
 
Arguably, the biggest open question in theoretical computer science concerns the relationship between those two classes:
  
Is <math>P</math> equal to <math>NP</math>?
+
Is <math>P</math> equal to <math>NP</math>?
  
 
In a 2002 poll of 100 researchers, 61 believed the answer is no, 9 believed the answer is yes, 22 were unsure, and 8 believed the question may be independent of the currently accepted axioms, and so impossible to prove or disprove.  
 
In a 2002 poll of 100 researchers, 61 believed the answer is no, 9 believed the answer is yes, 22 were unsure, and 8 believed the question may be independent of the currently accepted axioms, and so impossible to prove or disprove.  
  
The [[Clay Mathematics Institute]] has offered a USD $1,000,000 prize for a correct solution, as it has listed it as one of its [[Millenium Problems]].
+
The [[Clay Mathematics Institute]] has offered a USD \$1,000,000 prize for a correct solution, as it has listed it as one of its [[Millennium Problems]].
  
 
==Arguments==
 
==Arguments==
An important role in this discussion is played by the set of <math>NP</math>-complete problems (or <math>NPC</math>) which can be loosely described as the hardest problems in <math>NP</math> and therefore they are the least likely to be in P. More precisely, any problem in <math>NP</math>, through some efficient (takes at most a polynomial-bounded number of steps) transformation, can be expressed as a problem in <math>NP</math>-complete. Therefore if one finds an efficient (again, polynomial-bounded) solution to any <math>NP</math>-complete problem, then every problem in <math>NP</math> can be solved efficiently and therefore must be in <math>P</math>, hence proving <math>P = NP</math>. (See <math>NP</math>-complete for the exact definition.) Most theoretical computer scientists currently believe that the relationship among the classes <math>P</math>, <math>NP</math>, and <math>NPC</math> is as shown in the picture, with the P and NPC classes disjoint.
+
It is easy to show that <math>P \subseteq NP</math>, as if we are given any <math>L \in P</math>, a polynomial-time verifier for <math>L</math>, given input <math>w</math> and a certificate <math>c</math>, can simply ignore the certificate and decide if <math>w \in L</math>.
 +
 
 +
An important role in this discussion is played by the set of <math>NP</math>-complete problems (or <math>NPC</math>) which can be loosely described as the hardest problems in <math>NP</math>. More precisely, a language <math>L</math> is ''NP-complete'' if both are true:
 +
 
 +
* <math>L \in NP</math>
 +
* Any language in NP has a polynomial-time reduction to <math>L</math> (NP-hardness).
 +
 
 +
The main idea behind a polynomial-time reduction is this: If we knew how to decide <math>L</math> in polynomial time, then any problem in <math>NP</math> can be converted into an instance of <math>L</math> in polynomial time, and then we can use the algorithm that decides <math>L</math> as a subroutine.
 +
 
 +
===Examples of P, NP, NP-complete problems===
 +
The following problems are examples of problems in <math>P</math> (i.e. ones we can answer in polynomial time as a function of input):
 +
 
 +
*Given a list of <math>n</math> integers, is it sorted in non-decreasing order?
 +
*Given a weighted, undirected graph <math>G = (V,E)</math> and two vertices <math>s, t \in E</math>, does there exist a path from <math>s</math> to <math>t</math> of weight at most <math>c</math>?
 +
*Given two positive integers <math>m</math> and <math>n</math> and a positive integer <math>d</math>, is it true that <math>d = \gcd(m,n)</math>?
 +
 
 +
A classic example of a problem that is <math>NP</math>-complete but not known to be in <math>P</math> is the ''subset sum problem'': Given a list <math>S</math> of <math>n</math> integers and a number <math>t</math>, all encoded in some base <math>b > 1</math>, is there some subset of numbers in <math>S</math> whose sum is <math>t</math>? For example, is there a subset of <math>\{-4,2,3,10,-8,7\}</math> whose sum is 14? The answer is ''yes,'' and it can be checked in polynomial time that the answer is ''yes'' (by giving the certificate <math>\{2,3,10,-8,7\}</math>, but this is a difficult problem to solve in general as a brute force solution requires <math>O(2^n n)</math> computations, and it is not known if subset sum is in <math>P</math>.
 +
 
 +
The following examples are NP-complete problems:
  
In essence, the <math>P = NP</math> question asks: if positive solutions to a <math>YES/NO</math> problem can be verified quickly, can the answers also be computed quickly? Here is an example to get a feeling for the question. Given a set of integers, does any subset of them sum to 0? For instance, does a subset of the set <math>\{-2, -3, 8, 15, -10\}</math> add up to <math>0</math>? The answer is <math>YES</math>, though it may take a little while to find a subset that does - and if the set was larger, it might take a very long time to find a subset that does. On the other hand, if someone claims that the answer is <math>YES</math>, because <math>\{-2, -3, -10, 15\}</math> add up to zero, then we can quickly check that with a few additions. Verifying that the subset adds up to zero is much faster than finding the subset in the first place. The information needed to verify a positive answer is also called a certificate. So we conclude that given the right certificates, positive answers to our problem can be verified quickly (i.e. in polynomial time) and that's why this problem is in <math>NP</math>.
+
*SAT, 3-SAT (Boolean satisfiability)
 +
*Subset sum
 +
*Vertex cover
 +
*Maximum clique in a graph
 +
*Set cover
 +
*Traveling salesman problem
  
 
The restriction to <math>YES/NO</math> problems doesn't really make a difference; even if we allow more complicated answers, the resulting problem (whether <math>FP = FNP</math>) is equivalent.
 
The restriction to <math>YES/NO</math> problems doesn't really make a difference; even if we allow more complicated answers, the resulting problem (whether <math>FP = FNP</math>) is equivalent.

Latest revision as of 14:11, 25 October 2017

$P$ versus $NP$ is one of the greatest computability and complexity problems of modern mathematics, and one of the Millennium Problems. $P$ is the class of decision problems (those whose answer is either "yes" or "no," as opposed to other classes such as counting problems) that can be solved by a deterministic algorithm in polynomial time. $NP$ is the class of decision problems that can be solved by a non-deterministic algorithm in polynomial time. The $P$ versus $NP$ question asks whether these two classes are the same, or whether there are problems in $NP$ that are not in $P$.

Since all modern computers (with the exception of a few quantum computers) are deterministic, non-deterministic algorithms are of theoretical, rather than practical, interest. However, the class $NP$ can also be defined without reference to nondeterminism, as the class of decision problems whose yes instances can be verified in polynomial time given a suitable certificate.

Overview

The relation between the complexity classes $P$ and $NP$ is one of the most important open problems in theoretical computer science and mathematics. Complexity theory classifies problems by the resources needed to solve them. The most common resources are time (how many steps it takes to solve a problem as a function of the input size, usually expressed with big-O notation) and space (how much memory it takes to solve a problem). Such an analysis requires a model of the computer whose running time is being analyzed. Typically, the model (for example, a deterministic Turing machine) is assumed to be deterministic, meaning that given the computer's present state and any inputs there is only one possible action it might take, and sequential, meaning that it performs actions one after the other. These assumptions reflect the behaviour of all practical computers yet devised, even including machines featuring parallel processing.

A decision problem is a problem that admits a yes or no answer (as opposed to an optimization problem, such as "What is the length of the longest path from $s$ to $t$?"). More formally, a decision problem may be thought of as a language $L$ for which we wish to decide if a given word $w$ belongs to the language.

We say that an algorithm $A$ decides a language $L$ if, for every input $w$, $A$ halts and accepts $w$ when $w \in L$ and rejects $w$ when $w \notin L$.

The class P

The class $P$ consists of all those decision problems (languages) that can be decided using a deterministic Turing machine in an amount of time that is polynomial in the size of the input. More formally, $P = \bigcup_{k \ge 0} \text{TIME}(n^k)$, where $\text{TIME}(f(n))$ is the set of languages decidable by an $O(f(n))$-time deterministic Turing machine.

The class NP

The class $NP$ (for non-deterministic polynomial time) consists of all those decision problems that are decidable in polynomial time using a non-deterministic Turing machine. Equivalently, it is the set of decision problems whose yes instances can be verified in polynomial time using a certificate. Examples of problems in $P$ and $NP$ are given below.

Importance

Arguably, the biggest open question in theoretical computer science concerns the relationship between those two classes:

Is $P$ equal to $NP$?

In a 2002 poll of 100 researchers, 61 believed the answer is no, 9 believed the answer is yes, 22 were unsure, and 8 believed the question may be independent of the currently accepted axioms, and so impossible to prove or disprove.

The Clay Mathematics Institute lists the question among its Millennium Problems and has offered a prize of USD 1,000,000 for a correct solution.

Arguments

It is easy to show that $P \subseteq NP$. Given any $L \in P$, a polynomial-time verifier for $L$ that receives an input $w$ and a certificate $c$ can simply ignore the certificate and decide directly whether $w \in L$.
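The inclusion argument can be illustrated with a small Python sketch. The language chosen here (lists sorted in non-decreasing order) and the function names `decide_sorted` and `verify_sorted` are assumptions made for illustration only; any polynomial-time decider could take their place.

```python
from typing import Sequence


def decide_sorted(w: Sequence[int]) -> bool:
    """Polynomial-time decider: is the list in non-decreasing order?"""
    return all(w[i] <= w[i + 1] for i in range(len(w) - 1))


def verify_sorted(w: Sequence[int], certificate: object) -> bool:
    """Verifier for the same language.

    The certificate is never inspected: the verifier just runs the
    decider, which already takes polynomial time.
    """
    return decide_sorted(w)
```

Whatever certificate is supplied, `verify_sorted` accepts exactly the inputs that `decide_sorted` accepts, so a language with a polynomial-time decider also has a polynomial-time verifier.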

An important role in this discussion is played by the set of $NP$-complete problems (or $NPC$), which can be loosely described as the hardest problems in $NP$. More precisely, a language $L$ is NP-complete if both of the following hold:

  • $L \in NP$
  • Any language in NP has a polynomial-time reduction to $L$ (NP-hardness).

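The main idea behind a polynomial-time reduction is this: if we knew how to decide $L$ in polynomial time, then any instance of a problem in $NP$ could be converted into an instance of $L$ in polynomial time, and the algorithm that decides $L$ could then be used as a subroutine. Consequently, a polynomial-time algorithm for any single $NP$-complete problem would place every problem in $NP$ in $P$, proving $P = NP$.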
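As a concrete sketch of a reduction, the Python code below converts a clique instance into a vertex cover instance, using the standard fact that a graph on $n$ vertices has a clique of size at least $k$ exactly when its complement graph has a vertex cover of size at most $n - k$. The function names, the edge-list representation, and the vertex labels $0, \dots, n-1$ are assumptions made for illustration. The vertex cover decider is a brute-force stand-in that takes exponential time; no polynomial-time decider is known.

```python
from itertools import combinations


def reduce_clique_to_vertex_cover(n, edges, k):
    """Map a clique instance (G, k) to a vertex cover instance.

    Assumes vertices are labelled 0..n-1 and the graph has no self-loops.
    Building the complement graph takes O(n^2) time.
    """
    present = {frozenset(e) for e in edges}
    complement = [
        (u, v)
        for u, v in combinations(range(n), 2)
        if frozenset((u, v)) not in present
    ]
    return n, complement, n - k


def has_vertex_cover(n, edges, size):
    """Stand-in decider: try every vertex subset of at most `size` vertices."""
    for r in range(0, min(size, n) + 1):
        for cover in combinations(range(n), r):
            chosen = set(cover)
            if all(u in chosen or v in chosen for u, v in edges):
                return True
    return False


def has_clique(n, edges, k):
    """Decide clique by transforming the instance and calling the subroutine."""
    return has_vertex_cover(*reduce_clique_to_vertex_cover(n, edges, k))
```

Only `reduce_clique_to_vertex_cover` needs to run in polynomial time for this to count as a polynomial-time reduction. If `has_vertex_cover` were ever replaced by a polynomial-time algorithm, `has_clique` would run in polynomial time as well.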

Examples of P, NP, NP-complete problems

The following are examples of problems in $P$ (i.e. ones we can answer in time polynomial in the size of the input):

  • Given a list of $n$ integers, is it sorted in non-decreasing order?
  • Given a weighted, undirected graph $G = (V,E)$ with nonnegative edge weights, two vertices $s, t \in V$, and a number $c$, does there exist a path from $s$ to $t$ of weight at most $c$?
  • Given two positive integers $m$ and $n$ and a positive integer $d$, is it true that $d = \gcd(m,n)$?
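Each of the three problems above can be decided by a short polynomial-time procedure. The following Python sketch is one possible rendering, not a canonical one: the function names and the edge-list graph representation are assumptions, and the path check uses Dijkstra's algorithm, which is only valid under the additional assumption that all edge weights are nonnegative.

```python
import heapq
from math import gcd


def is_sorted(xs):
    """One linear pass: is the list in non-decreasing order?"""
    return all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))


def has_path_of_weight_at_most(n, edges, s, t, c):
    """Is there an s-t path of total weight <= c?

    Assumes vertices 0..n-1, undirected edges given as (u, v, weight),
    and nonnegative weights (required by Dijkstra's algorithm).
    """
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = [float("inf")] * n
    dist[s] = 0
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist[t] <= c


def is_gcd(m, n, d):
    """Is d the greatest common divisor of m and n? (Euclid's algorithm)"""
    return gcd(m, n) == d
```

In each case the running time is bounded by a polynomial in the size of the input, which is what membership in $P$ requires.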

A classic example of a problem that is $NP$-complete, and therefore not known to be in $P$, is the subset sum problem: given a list $S$ of $n$ integers and a number $t$, all encoded in some base $b > 1$, is there some subset of the numbers in $S$ whose sum is $t$? For example, is there a subset of $\{-4,2,3,10,-8,7\}$ whose sum is 14? The answer is yes, and this can be checked in polynomial time given the certificate $\{2,3,10,-8,7\}$, since $2+3+10-8+7 = 14$. Finding such a subset is difficult in general, however: a brute-force search over all subsets requires $O(2^n n)$ computations, and no polynomial-time algorithm is known.
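The gap between checking and searching can be sketched in Python. The function names are hypothetical, and the sketch assumes the subset is required to be non-empty, which the problem statement above leaves unspecified.

```python
from itertools import combinations


def verify_subset_sum(numbers, target, certificate):
    """Polynomial-time check of a claimed solution.

    The certificate must be a non-empty selection of entries from
    `numbers` (each entry used at most once) that sums to `target`.
    """
    if not certificate:
        return False
    remaining = list(numbers)
    for x in certificate:
        if x not in remaining:
            return False
        remaining.remove(x)
    return sum(certificate) == target


def brute_force_subset_sum(numbers, target):
    """Exponential-time search: try every non-empty subset.

    Returns a subset summing to `target` as a list, or None if none exists.
    """
    for r in range(1, len(numbers) + 1):
        for subset in combinations(numbers, r):
            if sum(subset) == target:
                return list(subset)
    return None
```

For the example above, `verify_subset_sum([-4, 2, 3, 10, -8, 7], 14, [2, 3, 10, -8, 7])` returns `True` after a handful of additions, whereas `brute_force_subset_sum` may have to examine up to $2^n - 1$ subsets before it can answer.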

The following problems (in their decision versions) are NP-complete:

  • SAT, 3-SAT (Boolean satisfiability)
  • Subset sum
  • Vertex cover
  • Maximum clique in a graph
  • Set cover
  • Traveling salesman problem

The restriction to $YES/NO$ problems doesn't really make a difference; even if we allow more complicated answers, the resulting problem (whether $FP = FNP$) is equivalent.