{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Second Order Optimization\n", "\n", "###### By Hrishikesh Patel, Raghav Dhanuka, Siddharth Uniyal & Dicky Ardian Yunindra Wardana" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1) Non-linear Equations and Least Squares" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Equations not having form $ f(x) = ax + b $ or $ f(x) = constant $ are considered as non-linear equations.
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "i.e. $ f(x) = e^{x} + x^{2} $ is a non-liner equation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.1) Set of Non-linear equations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consider a set of m equations in n variables $ x_{1},...,x_{n} $ .

\n", "$ f_{i}(x) = 0;$ $ i = 1,...,m $

\n", "Here $ i^{th} $ equation $ f_{i}(x) $ is also a $ i^{th} $ residual. And $x = (x_{1},...,x_{n}) $ is a vector of unknowns.

\n", "Collectively $ f(x) = (f_{1}(x),...,f_{m}(x)) $, where $ f(x) $ is a vector of residuals." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example,

\n", "Consider a system of two non-linear equations $ e^{x_{1}} + (x_{2})^{3} = 4 $ and $ (x_{1})^{2} + cos(x_{2}) = 10 $.

\n", "Here $x = (x_{1},x_{2}) $ is a 2-vector of unknowns,

\n", "$ f_{1}(x) = e^{x_{1}} + (x_{2})^{3} - 4 $ and $ f_{2}(x) = (x_{1})^{2} + cos(x_{2}) - 10 $ are two residuals and

\n", "$ f(x) = (e^{x_{1}} + (x_{2})^{3} - 4,\\ (x_{1})^{2} + cos(x_{2}) - 10) $ is a vector of residuals." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.2) Non-linear Least squares" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Here our goal is to find $ \\widehat{x} $ which minimizes $ \\left \\| f(x) \\right \\|^{2} = f_{1}(x)^{2} + ... + f_{m}(x)^{2} $.\n", "

\n", "Optimality condition for any $ \\widehat{x} $ being a solution is to satisfy $ \\nabla \\left \\| f(x) \\right \\|^{2} = 0$.

\n", "$2D f(x)^Tf(\\hat{x})= 0$

\n", "Important: The optimality condition is necessary condition but not sufficient. There may be other values that satisfy the condition but they are not solutions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.3) Difficulty in solving Non-linear Least squares problem" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Non-linear least squares are difficult to solve in reality as there is no luxury to have QR factorisation like in Linear equations and least squares problem.

\n", "However, there are some heuristics algorithms like following are useful for solving the equations.
\n", "1) Gauss-Newton algorithm
\n", "2) Levenberg–Marquardt algorithm
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, if the function is convex then it reduces computation to find the minimum value of the function. Let's understand the concept of Convexity." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Convexity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### What is convexity ?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A function is convex if its graph remains above its tangents." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "e.g. $ f(x) = x^{2} $
\n", "It is clear that the tangents remain below the graph of $ x^{2} $" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mathematically, function $ F(x_{1},...,x_{n}) $ is convex if its Hessian matrix (second derivative matrix) is positive semidefinite at all x." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Why convexity ?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Convexity prevents two local minima and hence if function is convex then it would have usually one minimum value." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Newton Algorithm\n", "$$ x^{k+1} = x^k - ( \\bigtriangledown f(x)^T \\bigtriangledown f(x))^{-1} \\bigtriangledown f(x)^T * f(x) $$\n", "\n", "$$ H = \\bigtriangledown f(x)^T \\bigtriangledown f(x) $$" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "newton (generic function with 1 method)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using LinearAlgebra,Plots\n", "\n", "function newton(f, Df, x1; kmax = 20, tol = 1e-6)\n", " x = x1\n", " n = length(x)\n", " fnorms = zeros(0,1)\n", " for k = 1:kmax\n", " fk = f(x)\n", " Dfk = Df(x)\n", " fnorms = [fnorms; norm(fk)]\n", " if norm(fk) < tol\n", " break\n", " end\n", " \n", " if det(Dfk'Dfk) == 0\n", " println(\"(Dfk'Dfk) matrix has dependent Columns, therefore, Quitting\")\n", " break \n", " end\n", " \n", " x = x - inv(Dfk'Dfk)*Dfk'fk\n", "\n", " end\n", " return x, fnorms\n", "end" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(Dfk'Dfk) matrix has dependent Columns, therefore, Quitting\n", "\n", "Moved 3 steps before quitting.\n" ] } ], "source": [ "f(x) = (x[1]-3)^2/4 + (x[2]-5)^2/9\n", "Df(x) = [(x[1]-3)/2 2*(x[2]-5)/9]\n", "\n", "x, gnorms = newton(f, Df, [0.5, 1.1])\n", "println(\"\")\n", "println(\"Moved \", length(gnorms), \" steps before quitting.\")" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "1.0\n", "\n", "\n", "1.5\n", "\n", "\n", "2.0\n", "\n", "\n", "2.5\n", "\n", "\n", "3.0\n", "\n", "\n", "0.5\n", "\n", "\n", "1.0\n", "\n", "\n", "1.5\n", "\n", "\n", "2.0\n", "\n", "\n", "2.5\n", "\n", "\n", "3.0\n", "\n", "\n", "k\n", "\n", "\n", "|f|\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot(gnorms, shape=:circle, legend = false, xlabel = \"k\", ylabel = \"|f|\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# levenberg-Marquardt Algorithm\n", "
\n", "$$ x^{k+1} = x^k - ( \\bigtriangledown f(x)^T \\bigtriangledown f(x) + \\lambda I)^{-1} \\bigtriangledown f(x)^T * f(x) $$" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "levenberg_marquardt (generic function with 1 method)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "eye(n) = 1.0 .* Matrix(I,n,n)\n", "function levenberg_marquardt(f, Df, x1, lambda1; kmax=100, tol=1e-6)\n", " n = length(x1)\n", " x = x1\n", " lambda = lambda1\n", " objectives = zeros(0,1)\n", " residuals = zeros(0,1)\n", "\n", " for k = 1:kmax\n", " fk = f(x)\n", " Dfk = Df(x)\n", " \n", " objectives = [objectives; norm(fk)^2]\n", " residuals = [residuals; norm(2*Dfk'*fk)]\n", " \n", " if norm(2*Dfk'*fk) < tol\n", " break\n", " end\n", " \n", " xt = x - inv(Dfk'Dfk + lambda*eye(n))*Dfk'fk\n", " \n", " if norm(f(xt)) < norm(fk)\n", " lambda = 0.8*lambda\n", " x = xt\n", " else\n", " lambda = 2.0*lambda\n", " end\n", " end\n", " \n", " return x, Dict([(\"objectives\", objectives),(\"residuals\", residuals)])\n", "end" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Moved 10 steps before function converges to minimum with final step [2.99999; 4.97316]\n" ] } ], "source": [ "f(x) = (x[1]-3)^2/4 + (x[2]-5)^2/9\n", "Df(x) = [(x[1]-3)/2 2*(x[2]-5)/9]\n", "\n", "x, history = levenberg_marquardt(f, Df, [0.5, 1.1], 1.0)\n", "\n", "println(\"Moved \", length(history[\"residuals\"][1:10]), \" steps before function converges to minimum with final step \", x)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "2\n", "\n", "\n", "4\n", "\n", "\n", "6\n", "\n", "\n", "8\n", "\n", "\n", "10\n", "\n", "\n", "0\n", "\n", "\n", "1\n", "\n", "\n", "2\n", "\n", "\n", "3\n", "\n", "\n", "k\n", "\n", "\n", "|f|\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot(sqrt.(history[\"objectives\"][1:10]), shape = :circle,legend = false, xlabel = \"k\", ylabel = \"|f|\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Fitting\n", "\n", "As in linear model fitting, we choose the parameter θ by (approximately) minimizing the sum of the squares of the prediction residuals\n", "\n", "$$\\sum\\limits_{i = 1}^N {{{\\left( {\\widehat f\\left( {{x^{\\left( i \\right)}};\\theta } \\right) - {y^{\\left( i \\right)}}} \\right)}^2}} $$\n", "\n", "Function we have used here is :\n", "$$f\\left( {x;\\theta } \\right) = {\\theta _1}{e^{{\\theta _2}x}}\\cos \\left( {{\\theta _3}x + {\\theta _4}} \\right)$$\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "0\n", "\n", "\n", "5\n", "\n", "\n", "10\n", "\n", "\n", "15\n", "\n", "\n", "20\n", "\n", "\n", "-1.0\n", "\n", "\n", "-0.5\n", "\n", "\n", "0.0\n", "\n", "\n", "0.5\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using Random\n", "Random.seed!(45)\n", "\n", "theta_ex = [1, -0.2, pi, pi/3]\n", "#Choose 60 points x between 0 and 20.\n", "M = 30\n", "xd = [5*rand(M); 5 .+ 15*rand(M)]\n", "\n", "# Evaluate function at these points.\n", "yd = theta_ex[1] * exp.(theta_ex[2]*xd) .*cos.(theta_ex[3] * xd .+ theta_ex[4])\n", "\n", "# Create a random perturbation of yd.\n", "N = length(xd)\n", "yd = yd .* (1 .+ 0.2*randn(N)) .+ 0.015 * randn(N)\n", "\n", "# Plot data points.\n", "using Plots\n", "scatter(xd, yd, legend=false)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "0\n", "\n", "\n", "5\n", "\n", "\n", "10\n", "\n", "\n", "15\n", "\n", "\n", "20\n", "\n", "\n", "-1.0\n", "\n", "\n", "-0.5\n", "\n", "\n", "0.0\n", "\n", "\n", "0.5\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f(theta) = theta[1] * exp.(theta[2]*xd) .*cos.(theta[3] * xd .+ theta[4]) - yd\n", "Df(theta) = hcat(exp.(theta[2]*xd) .* cos.(theta[3] * xd .+ theta[4]),theta[1] * ( xd .* exp.(theta[2]*xd) .*\n", "cos.(theta[3] * xd .+ theta[4])),-theta[1] * ( exp.(theta[2]*xd) .* xd .*sin.(theta[3] * xd .+ theta[4])),\n", "-theta[1] * ( exp.(theta[2]*xd) .*sin.(theta[3] * xd .+ theta[4])) )\n", "\n", "theta1 = [1, 0, 1, 0]\n", "theta, history = levenberg_marquardt(f, Df, theta1, 1.0)\n", "theta\n", "\n", "# Plot the fitted model.\n", "x = range(0, stop= 20, length = 500)\n", "y=theta[1]*exp.(theta[2]*x) .* cos.(theta[3]*x .+ theta[4])\n", "plot!(x, y, legend = false)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.0.3 (4 threads)", "language": "julia", "name": "julia-1.0k" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.0.3" } }, "nbformat": 4, "nbformat_minor": 2 }