Difference between revisions of "Newton's method"
(Definition and derivation.) |
m |
||
(9 intermediate revisions by 2 users not shown) | |||
Line 4: | Line 4: | ||
At each step of the recursion, we have <math>x_i</math> and seek a root of <math>f(x)</math>. Since all nonconstant [[Linear function|linear functions]] have exactly one root, as long as <math>f'(x_i) \neq 0</math> we can construct a [[Taylor polynomial#Tangent-line approximation|tangent-line approximation]] to <math>f(x)</math> and find its root as an approximation. The tangent-line approximation of <math>f(x)</math> about <math>x_i</math> is <cmath>f(x_i) + (x - x_i)f'(x_i).</cmath> We seek the value <math>x = x_{i+1}</math> such that the above expression equals <math>0</math>; hence, <cmath>f(x_i) + (x_{i+1} - x_i)f'(x_i) = 0.</cmath> Dividing by <math>f'(x_i)</math> (as long as <math>f'(x_i) \neq 0</math>), <cmath>\frac{f(x_i)}{f'(x_i)} + x_{i+1} - x_i = 0.</cmath> Therefore, the desired value of <math>x_{i+1}</math> is <cmath>x_{i+1} = x_i - \frac{f(x_i)}{f'(x_{i})}.</cmath> | At each step of the recursion, we have <math>x_i</math> and seek a root of <math>f(x)</math>. Since all nonconstant [[Linear function|linear functions]] have exactly one root, as long as <math>f'(x_i) \neq 0</math> we can construct a [[Taylor polynomial#Tangent-line approximation|tangent-line approximation]] to <math>f(x)</math> and find its root as an approximation. The tangent-line approximation of <math>f(x)</math> about <math>x_i</math> is <cmath>f(x_i) + (x - x_i)f'(x_i).</cmath> We seek the value <math>x = x_{i+1}</math> such that the above expression equals <math>0</math>; hence, <cmath>f(x_i) + (x_{i+1} - x_i)f'(x_i) = 0.</cmath> Dividing by <math>f'(x_i)</math> (as long as <math>f'(x_i) \neq 0</math>), <cmath>\frac{f(x_i)}{f'(x_i)} + x_{i+1} - x_i = 0.</cmath> Therefore, the desired value of <math>x_{i+1}</math> is <cmath>x_{i+1} = x_i - \frac{f(x_i)}{f'(x_{i})}.</cmath> | ||
− | {{ | + | ==Worked example== |
+ | Problem: Find the values of the roots of <math>f(x) = x^2 - x - 1</math>. | ||
+ | |||
+ | Solution: We could use the [[quadratic formula]] to find that the roots are <math>\frac{1 \pm \sqrt 5}{2}</math>, but this approach does not immediately yield a decimal value. | ||
+ | |||
+ | To form a guess, we note that <math>\sqrt 5</math> is slightly larger than <math>2</math>, so a suitable guess for the greater root is <math>\frac{1 + 2}{2} = 1.5</math>. | ||
+ | |||
+ | The derivative of <math>x^2 - x - 1</math> is <math>2x - 1</math> by the [[Derivative/Formulas|power rule]] for derivatives. Therefore, the recursive formula is | ||
+ | |||
+ | <cmath>x_{i+1} = x_i - \frac{x_i^2 - x_i - 1}{2x_i - 1} = \frac{x_i^2 + 1}{2x_i - 1}.</cmath> | ||
+ | |||
+ | Starting at <math>x_0 = 1.5</math>, we apply this formula repeatedly: | ||
+ | <cmath>\begin{alignat*}{2} x_1 &= \frac{1.5^2 + 1}{2 * 1.5 - 1} &&= 1.625, \\ | ||
+ | x_2 &= \frac{1.625^2 + 1}{2 * 1.625 - 1} &&= 1.6180556, \\ | ||
+ | x_3 &= \frac{1.6180556^2 + 1}{2 * 1.6180556 - 1} &&= 1.6180340, \\ | ||
+ | x_4 &= \frac{1.6180340^2 + 1}{2 * 1.6180340 - 1} &&= 1.6180340. \\ \end{alignat*}</cmath> | ||
+ | |||
+ | Because <math>x_4 = x_3</math> as calculated, the ratio <math>\frac{f(x_3)}{f'(x_3)}</math> must be very close to <math>0</math>. Since <math>f'(x_3) = 2.2360680</math> is not too large, <math>f(x_3)</math> must be quite close to <math>0</math>, so <math>x_3 = 1.6180340</math> is a very good estimate of the greater root. | ||
+ | |||
+ | The sum of the roots is <math>1</math> by [[Vieta's formulas]], so the lesser root is simply <math>1 - 1.6180340 = -0.6180340</math>. | ||
+ | |||
+ | ==Failure cases== | ||
+ | Although powerful, Newton's method is delicate and can be very sensitive to the initial guess and type of function, even when the function is differentiable everywhere. | ||
+ | |||
+ | ===Zero derivative=== | ||
+ | |||
+ | Suppose in the above example of finding a root of <math>x^2 - x - 1</math>, we had started with <math>x_0 = 0.5</math>. In the process of calculating <math>x_1</math>, then, we would have to divide by <math>f'(x_0) = 2x_0 - 1 = 0</math>, creating an undefined result. | ||
+ | |||
+ | ===Wrong root=== | ||
+ | Suppose we try to estimate the greater root as before, but start with <math>x_0 = -0.5</math>. Applying Newton's method to <math>x^2 - x - 1</math> yields the following sequence of estimates: | ||
+ | |||
+ | <cmath>\begin{alignat*}{2} x_1 &= \frac{-0.5^2 + 1}{2 * (-0.5) - 1} &&= -0.625, \\ | ||
+ | x_2 &= \frac{(-0.625)^2 + 1}{2 * (-0.625) - 1} &&= -0.6180556, \\ | ||
+ | x_3 &= \frac{(-0.6180556)^2 + 1}{2 * (-0.6180556) - 1} &&= -0.6180340, \\ | ||
+ | x_4 &= \frac{(-0.6180340)^2 + 1}{2 * (-0.6180340) - 1} &&= -0.61803395, \\ | ||
+ | x_5 &= \frac{(-0.61803395)^2 + 1}{2 * (-0.61803395) - 1} &&= -0.61803406, \\ | ||
+ | x_6 &= \frac{(-0.61803406)^2 + 1}{2 * (-0.61803406) - 1} &&= -0.61803395. \\ \end{alignat*}</cmath> | ||
+ | |||
+ | These estimates converge to the lesser root, in this case because the initial estimate was closer to the lesser root than the greater root. In general, predicting which root Newton's method will finally converge to is difficult, and Newton's method may converge to a given root even if a closer root is available within the interval between the initial estimate and the given root. | ||
+ | |||
+ | ===Periodic behavior without finding a root=== | ||
+ | When finding roots of <math>x^2 - x - 1</math> as above, the estimates eventually stabilize into a predictable pattern: in the <math>x_0 = 1.5</math> case, <math>x_4 = x_3</math>, so every subsequent estimate will equal <math>x_3</math>; in the <math>x_0 = -0.5</math> case, <math>x_6 = x_4</math>, so the subsequent estimates will oscillate, with every even estimate equal to <math>x_4</math> and every odd estimate equal to <math>x_5</math>. These behaviors are due to the limited precision of the calculations; with indefinite precision, the estimates would continue to approach the root without becoming periodic. | ||
+ | |||
+ | However, for some functions, Newton's method can cycle without coming close to a root. Consider <math>f(x) = x^4 - 3x^2 - 2</math>, which indeed has two real roots <math>\pm \sqrt{r}</math>, where <math>r</math> is the (unique) positive root of <math>x^2 - 3x - 2</math>. However, if we attempt to find the value of <math>\sqrt{r}</math> using Newton's method on <math>f(x)</math> with an initial guess <math>x_0 = 1</math>, then we get <math>f'(x) = 4x^3 - 6x</math>, so <cmath>\begin{align*} x_1 &= 1 - \frac{1 - 3 - 2}{4 - 6} = -1, \\ | ||
+ | x_2 &= -1 - \frac{1 - 3 - 2}{-4 + 6} = 1. \\ \end{align*}</cmath> | ||
+ | Because the value of <math>x_{i+1}</math> only depends on the single previous term <math>x_i</math>, by induction <math>x_{2n} = 1</math> and <math>x_{2n+1} = -1</math> for all nonnegative integers <math>n</math>, so the estimates never converge to a root. | ||
+ | |||
+ | ===Unbounded sequence of estimates=== | ||
+ | The function <math>f(x) = e^x</math> is always positive, so it has no roots. Since <math>f(x) = e^x</math> satisfies the differential equation <math>f(x) = f'(x)</math> for all <math>x</math>, the ratio <math>\frac{f(x)}{f'(x)}</math> is always <math>1</math>, so the sequence of estimates produced by Newton's method becomes <math>x_{i+1} = x_i - 1</math>, diverging in the negative direction. | ||
+ | |||
+ | The estimates can also decrease (or increase) without bound even if the function has a root. For example, <math>g(x) = e^x - e^{2x}</math> has a root at <math>x = 0</math>. We have <math>g'(x) = e^x - 2e^{2x}</math> (using the [[Derivative/Formulas|chain rule]]), so the steps of Newton's method are <cmath>x_{i+1} = x_i - \frac{e^{x_i} - e^{2x_i}}{e^{x_i} - 2e^{2x_i}}.</cmath> However, for <math>x_i < \ln \left( \frac{1}{2} \right)</math>, we have <math>e^{x_i} - e^{2x_i} > e^{x_i} - 2e^{2x_i} > 0</math>, so <math>x_{i+1} < x_i - 1</math>. Thus, if the initial guess is less than <math>\ln \left( \frac{1}{2} \right)</math>, each estimate is guaranteed to be less than the previous by a margin of more than <math>1</math>, so the estimates again diverge toward <math>-\infty</math>. | ||
+ | |||
+ | ==Applications in Analytic Number Theory (Advanced)== | ||
+ | The applications of Newton's Method actually apply to [[Hensel's Lemma]], specifically in the <math>p</math> adic integers <math>\mathbb{Z}_p</math>. In particular, take some polynomial <math>p\in\mathbb{Z}[x]</math> and an integer <math>x</math> such that <math>p(x)\equiv 0\pmod{p}</math>. Our goal is to approximate some <math>\hat{x}</math> such that <math>p(\hat{x})\equiv 0\pmod{p^2}</math>. However, in the <math>p</math> adics, we can't just ``lift" arbitrarily. We actually need to approximate such a solution in the next <math>p</math> power level, and then refine it iteratively. | ||
+ | |||
+ | Here's what that looks like realistically. WLOG we assume that <math>x</math> is some <math>a_0</math> between <math>0</math> and <math>p-1</math>. We search for <math>\hat{x}=a_0+a_1p</math> such that <math>p(\hat{x})\equiv 0\pmod{p^2}</math>. Note the following critical step: | ||
+ | <center><cmath>p(\hat{x})=p(a_0+a_1p)=p(a_0)+p'(a_0)a_1p+(a_1p)^2b</cmath></center> | ||
+ | for some <math>b\in\mathbb{Z}</math>. By assumption we know that <math>p(a_0)\equiv 0\pmod{p}</math> so let <math>p(a_0)=pt</math> for some <math>t</math>. The desired congruence holds modulo <math>p^2</math> when <math>t+p'(a_0)a_1\equiv 0\pmod{p}</math>. When <math>p'(a_0)\not\equiv 0\pmod{p}</math>, we see that <math>a_1\equiv -t/p'(a_0)\pmod{p}</math> which means that | ||
+ | <center><cmath>\hat{x}=a_0+a_1p=a_0-\frac{pt}{p'(a_0)}=x-\frac{p(a_0)}{p'(a_0)}.</cmath></center> | ||
+ | This is exactly the statement of the Newtonian Algorithm! Our root is approximated, so we found some <math>\hat{x}</math> such that <math>p(\hat{x})\equiv 0\pmod{p^2}</math>. But this isn't the whole story. | ||
+ | |||
+ | |||
+ | '''Theorem''': Let <math>p(x)\in\mathbb{Z}_p[x]</math> and <math>x\in\mathbb{Z}_p</math> such that <math>p(x)\equiv 0\pmod{p^n}</math>. If <math>k=v(P'(x))<n/2</math> then there exists an <math>\hat{x}=x-p(x)/p'(x)</math> such that (1) <math>p(\hat{x})\equiv 0\pmod{p^{n+1}}</math>, (2) <math>\hat{x}\equiv x\pmod{p^{n-k}}</math> and (3) <math>k=v(p'(x))=v(p'(\hat{x}))</math>. | ||
+ | |||
+ | ''Proof'': Let <math>p(x)=p^ny</math> for <math>y\in\mathbb{Z}_p</math> and <math>p'(x)=p^ku</math> where <math>u\in\mathbb{Z}_p^{\times}</math>. By definition we see that | ||
+ | <center><cmath>\hat{x}-x=-\frac{p(x)}{p'(x)}=-\frac{p^ny}{p^ku}=-p^{n-k}yu^{-1}\in p^{n-k}\mathbb{Z}_p.</cmath></center> | ||
+ | This proves (2). Notice, however, that in the Taylor expansion of <math>p</math> about <math>x</math> we see that | ||
+ | <center><cmath>p(\hat{x})=p(x)-\frac{p(x)p'(x)}{p'(x)}+(\hat{x}-x)^2t=(\hat{x}-x)^2t</cmath></center> | ||
+ | where <math>t\in\mathbb{Z}_p</math>. Also notice that | ||
+ | <center><cmath>p(\hat{x})=(\hat{x}-x)^2t\in p^{2n-2k}\mathbb{Z}_p</cmath></center> | ||
+ | but we finish (1) since <math>p^{2n-2k}\mathbb{Z}_p=p^n\cdot p^{n-2k}\mathbb{Z}_p\subset p^{n-1}\mathbb{Z}_p</math>. To prove (3) we just compute the order of <math>p'(\hat{x})</math>. Taking the Taylor expansion of <math>p'</math> at point <math>x</math> gives | ||
+ | <center><cmath>p'(\hat{x})=p'(x+(\hat{x}-x))=p'(x)+(\hat{x}-x)s=p^ku+p^{n-k}zs=p^k(u+p^{n-2k}zs)=p^kv.</cmath></center> | ||
+ | But since <math>k<n/2\implies 2k<n\implies 2k-n<0</math> and <math>u\in\mathbb{Z}_p^{\times}</math> we have | ||
+ | <center><cmath>v=u+p^{n-2k}zs\in u+p\mathbb{Z}_p\subset \mathbb{Z}_p^{\times}</cmath></center> | ||
+ | which proves that <math>k=v(p'(x))=v(p'(\hat{x}))</math> <math>\square</math> | ||
+ | |||
+ | This sets up the entire motivation behind Hensel's Lemma in a more formal way. In addition, the proof of the lemma is just an extension of what we previously noted, which actually makes all of our previous work worth it in the long run. | ||
+ | |||
+ | |||
+ | '''Hensel's Lemma''': Assume <math>p\in\mathbb{Z}_p[x]</math> and <math>x\in\mathbb{Z}_p</math> satisfies <math>p(x)\equiv 0\pmod{p^n}</math>. If <math>k=v(p'(x))<n/2</math>, then there exists a unique root <math>\xi</math> of <math>p\in\mathbb{Z}_p[x]</math> such that <math>\xi\equiv x\pmod{p^{n-k}}</math> and <math>k=v(p'(x))=v(p'(\xi))</math>. | ||
+ | |||
+ | ''Proof'': We first prove the existence of <math>\xi</math>. Let <math>x_0=x</math> be given. We can construct <math>x_1\in\mathbb{Z}</math> such that | ||
+ | <cmath>x_1\equiv x\pmod{p^{n-k}}~\text{and}~p(x_1)\equiv 0\pmod{p^{n+1}},~v(p'(x_0))=v(p'(x_1))=k.</cmath> | ||
+ | We can do this for <math>x_2</math> and <math>x_n</math> such that we have a sequence <math>(x_n)_{n\ge 0}</math> where <math>x_n\to \xi</math> as <math>n\to \infty</math> where <math>p(\xi)=0</math> and <math>\xi\equiv x\pmod{p^{n-k}}</math>. Now we prove that <math>\xi</math> is unique. For the sake of contradiction, suppose that there are two roots <math>\xi</math> and <math>\omega</math> satisfying | ||
+ | <center><cmath>\xi\equiv \omega\pmod{p^{n-k}}.</cmath></center> | ||
+ | Since it was given that <math>k<n/2</math> we have <math>2k<n</math>. This means that <math>n-k\ge k+1</math> so we have at the very least | ||
+ | <center><cmath>\xi\equiv \omega\pmod{p^{k+1}}.</cmath></center> | ||
+ | Now we have, | ||
+ | <center><cmath>p(\omega)=p(\xi)+p'(\xi)(\omega-\xi)+(\omega-\xi)^2a</cmath></center> | ||
+ | for some <math>a\in\mathbb{Z}_p</math>. Since <math>\xi</math> and <math>\omega</math> were assumed to be roots, we factor to get | ||
+ | <center><cmath>(\omega-\xi)(p'(\xi)+(\omega-\xi)a)=0</cmath></center> | ||
+ | but since <math>v(p'(\xi))=k</math> and <math>v((\omega-\xi)a)\ge k+1</math> | ||
+ | <center><cmath>p'(\xi)+(\omega-\xi)a\ne 0</cmath></center> | ||
+ | which means that <math>\omega-\xi=0\implies \omega=\xi</math>, so uniqueness holds <math>\square</math> | ||
+ | |||
+ | ==See also== | ||
+ | *[[Secant method]] | ||
+ | *[[Bisection method]] | ||
[[Category: Calculus]] | [[Category: Calculus]] |
Latest revision as of 22:34, 28 May 2023
Newton's method uses the derivative of a differentiable function to approximate its real or complex roots. For a function , the approximations are defined recursively by To begin the recursion, an initial guess must be chosen. Often the choice of determines which of several possible roots is found, and in some cases the initial guess can cause the recursion to cycle or diverge instead of converging to a root.
Contents
Derivation
At each step of the recursion, we have and seek a root of . Since all nonconstant linear functions have exactly one root, as long as we can construct a tangent-line approximation to and find its root as an approximation. The tangent-line approximation of about is We seek the value such that the above expression equals ; hence, Dividing by (as long as ), Therefore, the desired value of is
Worked example
Problem: Find the values of the roots of .
Solution: We could use the quadratic formula to find that the roots are , but this approach does not immediately yield a decimal value.
To form a guess, we note that is slightly larger than , so a suitable guess for the greater root is .
The derivative of is by the power rule for derivatives. Therefore, the recursive formula is
Starting at , we apply this formula repeatedly:
Because as calculated, the ratio must be very close to . Since is not too large, must be quite close to , so is a very good estimate of the greater root.
The sum of the roots is by Vieta's formulas, so the lesser root is simply .
Failure cases
Although powerful, Newton's method is delicate and can be very sensitive to the initial guess and type of function, even when the function is differentiable everywhere.
Zero derivative
Suppose in the above example of finding a root of , we had started with . In the process of calculating , then, we would have to divide by , creating an undefined result.
Wrong root
Suppose we try to estimate the greater root as before, but start with . Applying Newton's method to yields the following sequence of estimates:
These estimates converge to the lesser root, in this case because the initial estimate was closer to the lesser root than the greater root. In general, predicting which root Newton's method will finally converge to is difficult, and Newton's method may converge to a given root even if a closer root is available within the interval between the initial estimate and the given root.
Periodic behavior without finding a root
When finding roots of as above, the estimates eventually stabilize into a predictable pattern: in the case, , so every subsequent estimate will equal ; in the case, , so the subsequent estimates will oscillate, with every even estimate equal to and every odd estimate equal to . These behaviors are due to the limited precision of the calculations; with indefinite precision, the estimates would continue to approach the root without becoming periodic.
However, for some functions, Newton's method can cycle without coming close to a root. Consider , which indeed has two real roots , where is the (unique) positive root of . However, if we attempt to find the value of using Newton's method on with an initial guess , then we get , so Because the value of only depends on the single previous term , by induction and for all nonnegative integers , so the estimates never converge to a root.
Unbounded sequence of estimates
The function is always positive, so it has no roots. Since satisfies the differential equation for all , the ratio is always , so the sequence of estimates produced by Newton's method becomes , diverging in the negative direction.
The estimates can also decrease (or increase) without bound even if the function has a root. For example, has a root at . We have (using the chain rule), so the steps of Newton's method are However, for , we have , so . Thus, if the initial guess is less than , each estimate is guaranteed to be less than the previous by a margin of more than , so the estimates again diverge toward .
Applications in Analytic Number Theory (Advanced)
The applications of Newton's Method actually apply to Hensel's Lemma, specifically in the adic integers . In particular, take some polynomial and an integer such that . Our goal is to approximate some such that . However, in the adics, we can't just ``lift" arbitrarily. We actually need to approximate such a solution in the next power level, and then refine it iteratively.
Here's what that looks like realistically. WLOG we assume that is some between and . We search for such that . Note the following critical step:
for some . By assumption we know that so let for some . The desired congruence holds modulo when . When , we see that which means that
This is exactly the statement of the Newtonian Algorithm! Our root is approximated, so we found some such that . But this isn't the whole story.
Theorem: Let and such that . If then there exists an such that (1) , (2) and (3) .
Proof: Let for and where . By definition we see that
This proves (2). Notice, however, that in the Taylor expansion of about we see that
where . Also notice that
but we finish (1) since . To prove (3) we just compute the order of . Taking the Taylor expansion of at point gives
But since and we have
which proves that
This sets up the entire motivation behind Hensel's Lemma in a more formal way. In addition, the proof of the lemma is just an extension of what we previously noted, which actually makes all of our previous work worth it in the long run.
Hensel's Lemma: Assume and satisfies . If , then there exists a unique root of such that and .
Proof: We first prove the existence of . Let be given. We can construct such that We can do this for and such that we have a sequence where as where and . Now we prove that is unique. For the sake of contradiction, suppose that there are two roots and satisfying
Since it was given that we have . This means that so we have at the very least
Now we have,
for some . Since and were assumed to be roots, we factor to get
but since and
which means that , so uniqueness holds