LECTURE 7: NONLINEAR EQUATIONS and 1-d OPTIMIZATION • The topic is related to optimization (lecture 8), so we cover solving nonlinear equations and 1-d optimization here • f(x)=0 (either in 1-d or in many dimensions) • In 1-d we can bracket the root and then find it; in many dimensions we cannot • Bracketing in 1-d: if f(x)<0 at a and f(x)>0 at b>a (or the other way around) and f(x) is continuous, then there is a root between a and b
Other situations (figure): no roots; one or two roots with no sign change; many roots; a singularity

Bisection for bracketing • We can use bisection: divide the interval by 2, evaluate at the new position, and keep the left or right half-interval across which the function changes sign. The number of steps is log2[(b-a)/ε], where ε is the error tolerance. The method is guaranteed to succeed. • The error at the next step is ε_{n+1} = ε_n/2, so it converges linearly • Higher-order methods scale as ε_{n+1} = c·ε_n^m, with m>1
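As a concrete illustration, a minimal Python sketch of bisection (the function and parameter names `bisection`, `f`, `tol` are ours, not from the slides):

```python
# Minimal sketch of bisection for f(x) = 0, assuming f is continuous
# and f(a), f(b) have opposite signs (the root is bracketed).
import math

def bisection(f, a, b, tol=1e-10):
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "root must be bracketed: f(a) and f(b) need opposite signs"
    # number of steps ~ log2((b - a) / tol)
    while (b - a) > tol:
        m = 0.5 * (a + b)
        fm = f(m)
        if fa * fm <= 0:      # root lies in the left half-interval
            b, fb = m, fm
        else:                 # root lies in the right half-interval
            a, fa = m, fm
    return 0.5 * (a + b)

# example: root of cos(x) - x in [0, 1]
print(bisection(lambda x: math.cos(x) - x, 0.0, 1.0))
```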

1d OPTIMIZATION: LOCAL AND GLOBAL EXTREMA, BRACKETING • Optimization: minimization or maximization • In most cases only a local minimum (B, D, F) or local maximum (A, C, E, G) can be found; it is difficult to prove that they are the global minimum (D) or global maximum (G) • We bracket a local minimum if we find X < Y < Z with f(X) > f(Y) and f(Z) > f(Y)

Golden ratio search

• Remember that we need a triplet of points to bracket a minimum (a minimal sketch follows below)
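A minimal Python sketch of golden-section search under the standard scheme: probe a fraction ≈0.382 (related to the golden ratio) into the larger segment and keep the triplet that still brackets the minimum. The names `golden_section`, `f`, `tol` are illustrative choices:

```python
# Minimal sketch of golden-section search for a 1-d minimum, assuming the
# triplet (a, b, c) brackets it: a < b < c with f(b) < f(a) and f(b) < f(c).
def golden_section(f, a, b, c, tol=1e-8):
    w = 0.38197  # 2 minus the golden ratio: fraction of the larger segment to probe
    while abs(c - a) > tol:
        # place the trial point x in the larger of the two segments
        if (b - a) > (c - b):
            x = b - w * (b - a)
            if f(x) < f(b):
                a, b, c = a, x, b   # new bracketing triplet (a, x, b)
            else:
                a, b, c = x, b, c   # new bracketing triplet (x, b, c)
        else:
            x = b + w * (c - b)
            if f(x) < f(b):
                a, b, c = b, x, c
            else:
                a, b, c = a, b, x
    return b

# example: minimum of (x - 2)**2, bracketed by (0, 1, 5)
print(golden_section(lambda x: (x - 2) ** 2, 0.0, 1.0, 5.0))
```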
Newton (-Raphson) method • Most celebrated of all methods, we will use it extensively in higher dimensions • Requires a gradient: f(x+δ) = f(x) + δf'(x) + … • We want f(x+δ) = 0, hence δ = -f(x)/f'(x) • Rate of convergence is quadratic (NR 9.4): ε_{i+1} = ε_i² f''(x)/(2f'(x))
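A minimal sketch of the Newton-Raphson iteration for root finding; `f` and `fprime` are assumed to be supplied analytically (names are ours):

```python
import math

# Minimal sketch of Newton-Raphson for f(x) = 0.
def newton_raphson(f, fprime, x0, tol=1e-12, maxiter=50):
    x = x0
    for _ in range(maxiter):
        dx = -f(x) / fprime(x)   # delta = -f(x)/f'(x)
        x += dx
        if abs(dx) < tol:        # quadratic convergence near the root
            break
    return x

# example: root of cos(x) - x (approximately 0.739)
print(newton_raphson(lambda x: math.cos(x) - x, lambda x: -math.sin(x) - 1.0, 1.0))
```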

Newton-Raphson is not failure free

Newton-Raphson for 1-d optimization • Expand the function to 2nd order (note: we did this already when expanding the log likelihood): f(x+δ) = f(x) + δf'(x) + δ²f''(x)/2 + … • Expand its derivative: f'(x+δ) = f'(x) + δf''(x) + … • An extremum requires f'(x+δ) = 0, hence δ = -f'(x)/f''(x) • This requires f'': Newton's optimization method • In least-squares problems we sometimes only need products of first derivatives (f')²: the Gauss-Newton method (next lecture)
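A minimal sketch of the resulting optimization iteration, assuming f' and f'' are available (the names `fp`, `fpp` are ours):

```python
# Minimal sketch of Newton's optimization step in 1-d:
# repeatedly jump to the extremum of the local quadratic model.
def newton_minimize(fp, fpp, x0, tol=1e-10, maxiter=50):
    x = x0
    for _ in range(maxiter):
        d = -fp(x) / fpp(x)   # delta = -f'(x)/f''(x)
        x += d
        if abs(d) < tol:
            break
    return x

# example: minimum of (x - 3)**4 + x**2, which lies at x = 2
print(newton_minimize(lambda x: 4 * (x - 3) ** 3 + 2 * x,
                      lambda x: 12 * (x - 3) ** 2 + 2, 0.0))
```

Note that the iteration only looks for f'(x) = 0, so it converges to whatever extremum is nearby; checking the sign of f'' tells a minimum from a maximum.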

Secant method for nonlinear equations • Newton's method using a numerical estimate of the derivative taken across the whole interval: f'(x_2) = [f(x_2) - f(x_1)]/(x_2 - x_1) • x_3 = x_2 - f(x_2)/f'(x_2) • Can fail, since it does not always bracket the root • Convergence order m = 1.618 (golden ratio), a lot faster than bisection
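A minimal sketch of the secant iteration (names `secant`, `x1`, `x2` are illustrative):

```python
import math

# Minimal sketch of the secant method: Newton-Raphson with the derivative
# replaced by a finite difference through the last two iterates.
def secant(f, x1, x2, tol=1e-12, maxiter=100):
    f1, f2 = f(x1), f(x2)
    for _ in range(maxiter):
        fprime = (f2 - f1) / (x2 - x1)   # secant slope
        x3 = x2 - f2 / fprime
        if abs(x3 - x2) < tol:
            return x3
        x1, f1 = x2, f2
        x2, f2 = x3, f(x3)
    return x2

# example: root of cos(x) - x; convergence order ~1.618
print(secant(lambda x: math.cos(x) - x, 0.0, 1.0))
```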

False position method for nonlinear equations • Similar to the secant method, but keep the pair of points that bracket the solution, so it is guaranteed to succeed, though typically with more steps than the secant method

Sometimes convergence can be slow

Better derivative-free methods such as Ridders' or Brent's method combine these basic techniques: use these as the default option and (optionally) switch to Newton for a higher convergence rate once convergence is guaranteed
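In practice one would usually call a library routine rather than code Brent's method by hand; for example, SciPy's Brent-based root finder (it requires a bracketing interval):

```python
# Hedged usage example: SciPy's Brent root finder on a bracketed interval.
from scipy.optimize import brentq
import math

root = brentq(lambda x: math.cos(x) - x, 0.0, 1.0)
print(root)
```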

Parabolic method for 1-d optimization • Approximate the function through the three bracketing points a, b, c by a parabola and jump to the parabola's minimum
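The slides do not spell out the formula; a minimal sketch of one parabolic-interpolation step using the standard expression for the vertex of the parabola through (a, f(a)), (b, f(b)), (c, f(c)) (cf. Numerical Recipes' parabolic interpolation):

```python
# Minimal sketch of one parabolic-interpolation step: return the abscissa of
# the minimum of the parabola through (a, f(a)), (b, f(b)), (c, f(c)).
def parabolic_step(f, a, b, c):
    fa, fb, fc = f(a), f(b), f(c)
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * num / den

# example: for a true parabola the step lands on the minimum in one shot
print(parabolic_step(lambda x: (x - 2) ** 2 + 1, 0.0, 1.0, 5.0))  # -> 2.0
```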

Gradient descent in 1-d

• Suppose we do not have f'', but we have f': then we know the direction of descent. We can take a small step in that direction: δ = -h·f'(x). We must choose the sign of h to descend (if a minimum is what we want), and h must be small enough not to overshoot. • We can make a secant version of this method by evaluating the derivative with a finite difference: f'(x_2) = [f(x_2) - f(x_1)]/(x_2 - x_1)
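A minimal sketch of 1-d gradient descent with a fixed step size; the step size and stopping rule here are illustrative choices, not prescriptions from the slides:

```python
# Minimal sketch of 1-d gradient descent: take small steps against f'(x).
def gradient_descent_1d(fprime, x0, h=0.1, tol=1e-8, maxiter=10000):
    x = x0
    for _ in range(maxiter):
        step = -h * fprime(x)   # move in the descent direction
        x += step
        if abs(step) < tol:
            break
    return x

# example: minimum of (x - 3)**2, whose derivative is 2*(x - 3)
print(gradient_descent_1d(lambda x: 2 * (x - 3), 0.0))
```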

Nonlinear equations in many dimensions

• f(x,y)=0 and g(x,y)=0: the two functions f and g are in general unrelated, so it is difficult to construct general methods that will find all the solutions

Newton-Raphson in higher dimensions • Assume N functions f_i of N variables x = (x_1, …, x_N) • Taylor expand: f_i(x+δ) = f_i(x) + Σ_j (∂f_i/∂x_j) δ_j + … • Define the Jacobian: J_ij = ∂f_i/∂x_j • In matrix notation: f(x+δ) = f(x) + J·δ + … • Setting f(x+δ) = 0 we find J·δ = -f(x) • This is a matrix equation: solve with LU • Update x_new = x_old + δ and iterate again
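A minimal sketch of the multi-dimensional iteration; `newton_nd`, `f`, `jacobian` are our names, and NumPy's dense solver (which is LU-based) stands in for an explicit LU factorization:

```python
import numpy as np

# Minimal sketch of Newton-Raphson in N dimensions:
# solve J·delta = -f(x) and update x until the step is small.
def newton_nd(f, jacobian, x0, tol=1e-12, maxiter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(maxiter):
        delta = np.linalg.solve(jacobian(x), -f(x))   # LU-based linear solve
        x = x + delta
        if np.linalg.norm(delta) < tol:
            break
    return x

# example: f1 = x^2 + y^2 - 1 = 0, f2 = x - y = 0
f = lambda v: np.array([v[0] ** 2 + v[1] ** 2 - 1.0, v[0] - v[1]])
J = lambda v: np.array([[2 * v[0], 2 * v[1]], [1.0, -1.0]])
print(newton_nd(f, J, [1.0, 0.5]))   # -> [0.7071, 0.7071]
```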

Globally convergent methods and secant methods • If the quadratic approximation in the N-R method is not accurate, taking a full step may make the solution worse. Instead one can do a backtracking line search and combine it with a descent direction (or use a trust region). • When derivatives are not available we can approximate them: the multi-dimensional secant method (Broyden's method). • Both of these methods have clear analogues in optimization, and since the latter is more important for data science we will explain the concepts in the optimization lecture next.

Relaxation methods • Another class of methods, solving x = f(x) • Take x = 2 - e^{-x}, start at x_0 = 1 and evaluate f(x_0) = 2 - e^{-1} = 1.63 = x_1 • Now use this solution again: f(x_1) = 2 - e^{-1.63} = 1.80 = x_2 • The correct solution is x = 1.84140… • If there are multiple solutions, which one we converge to depends on the starting point • Convergence is not guaranteed: suppose x_0 is the exact solution; then x_{n+1} = f(x_n) = f(x_0) + (x_n - x_0)f'(x_0) + …, and since x_0 = f(x_0) we get x_{n+1} - x_0 = f'(x_0)(x_n - x_0), so this converges only if |f'(x_0)| < 1 • When this is not satisfied we can try to invert the equation and iterate x = f^{-1}(x), so that |(f^{-1})'(x)| < 1
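A minimal sketch of the relaxation (fixed-point) iteration on the slide's example x = 2 - e^{-x} (the names `relax`, `tol`, `maxiter` are ours):

```python
import math

# Minimal sketch of relaxation: iterate x_{n+1} = f(x_n) until it settles.
def relax(f, x0, tol=1e-10, maxiter=1000):
    x = x0
    for _ in range(maxiter):
        x_new = f(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

print(relax(lambda x: 2.0 - math.exp(-x), 1.0))   # -> 1.84140...
```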

Relaxation methods in many dimensions • Same idea: write equations as x=f(x,y) and y=g(x,y), use some good starting point and see if you converge • Easily generalized to N variables and equations • Simple, and (sometimes) works! • Again impossible to find all the solutions unless we know something about their structure

Over-relaxation • We can accelerate the convergence: Δx_n = x_{n+1} - x_n = f(x_n) - x_n • x_{n+1} = x_n + (1+ω)Δx_n • If ω = 0 this is the relaxation method • If ω > 0 this is the over-relaxation method • No general theory for how to select ω: trial and error
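A minimal sketch of over-relaxation on the same example; the value of ω below is just an illustrative trial-and-error choice:

```python
import math

# Minimal sketch of over-relaxation: take the relaxation step dx = f(x) - x
# but move by (1 + omega) * dx.
def overrelax(f, x0, omega=0.5, tol=1e-10, maxiter=1000):
    x = x0
    for _ in range(maxiter):
        dx = f(x) - x
        x_new = x + (1.0 + omega) * dx
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

print(overrelax(lambda x: 2.0 - math.exp(-x), 1.0))   # -> 1.84140..., same fixed point
```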

Literature • Numerical Recipes Ch. 9, 10 • Newman, Ch. 6
