Simple Linear Regression

20160120
## Gradient descent

## Minimize the Residual Sum of Squares

The slope and intercept can also be found by minimizing the RSS, which is a function of the slope (w₁) and intercept (w₀):

RSS(w₀, w₁) = Σᵢ (yᵢ − (w₀ + w₁·xᵢ))²

Finding the minimum or maximum corresponds to setting the derivative equal to zero. For a concave or convex function there is only one point where there is a maximum or minimum, i.e. where the derivative equals zero. If a function is neither concave nor convex, it may have multiple maxima or minima.

## Side-step: hill descent

The gradient descent algorithm tries to find a minimum in a step-wise fashion. This can easily be compared to hill descent on a simple parabola, as follows.

## Hill descent with overshoot

Suppose you start at point x = 1. To see whether the function is going down or up we add a minute quantity dx, and know that we are descending if f(x) > f(x + dx). Increment x with step = 15.0, and look again at the direction in which the function is going. Here the direction has flipped: we have overshot the minimum. Keep doing this until the difference is smaller than a predefined constant. Reaching the result of 9.99999773021 required 57 iterations.
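A minimal sketch of the hill descent with overshoot just described. The parabola, the step-halving rule applied after an overshoot, and the stopping tolerance are assumptions (the original listing is not shown); a parabola with its minimum near x = 10 matches the reported result. The helpers f and direction are the common functions from the section further below, repeated here so the snippet runs on its own.

```python
def f(x):
    """Assumed parabola with its minimum at x = 10."""
    return (x - 10.0) ** 2 + 5.0

def direction(x, dx=1e-6):
    """Return +1.0 if f decreases to the right of x, else -1.0."""
    return 1.0 if f(x) > f(x + dx) else -1.0

def hill_descent_overshoot(x=1.0, step=15.0, tol=1e-6):
    iterations = 0
    while step > tol:
        d = direction(x)
        x_new = x + d * step          # take the full step, possibly overshooting
        if direction(x_new) != d:     # direction flipped: we crossed the minimum,
            step /= 2.0               # so shrink the step for the following moves
        x = x_new
        iterations += 1
    return x, iterations

print(hill_descent_overshoot())       # ends up very close to x = 10
```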
## Hill descent without overshooting

The following algorithm is similar, but avoids overshooting: whenever a step would overshoot, it keeps reducing the step size until we land on the left side of the minimum. This time, reaching the result of 9.99999773022 required only 21 iterations, quite a bit better than above, though that may of course be due to the particular function and the chosen step size.
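A corresponding sketch of the variant that avoids overshooting: before committing to a move, it keeps halving the step until the candidate point is still on the same (descending) side of the minimum. The same assumptions about f, the starting step and the tolerance apply as in the previous sketch.

```python
def f(x):
    """Assumed parabola with its minimum at x = 10."""
    return (x - 10.0) ** 2 + 5.0

def direction(x, dx=1e-6):
    """Return +1.0 if f decreases to the right of x, else -1.0."""
    return 1.0 if f(x) > f(x + dx) else -1.0

def hill_descent_no_overshoot(x=1.0, step=15.0, tol=1e-6):
    iterations = 0
    while step > tol:
        d = direction(x)
        # Shrink the step until the move stays on the same side of the minimum.
        while step > tol and direction(x + d * step) != d:
            step /= 2.0
        x = x + d * step
        iterations += 1
    return x, iterations

print(hill_descent_no_overshoot())    # approaches x = 10 from the left
```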
## The common functions for the above code

## Function and direction
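The original listing under these headings is not reproduced; below is a minimal version of the two common helpers that the sketches above inline: the assumed parabola and the direction check based on f(x) > f(x + dx).

```python
def f(x):
    """The function being minimized: an assumed parabola with minimum at x = 10."""
    return (x - 10.0) ** 2 + 5.0

def direction(x, dx=1e-6):
    """Direction of descent at x: +1.0 if f decreases to the right, else -1.0."""
    return 1.0 if f(x) > f(x + dx) else -1.0
```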
## Plotting
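The original plotting code is likewise not shown; as a stand-in, a small matplotlib sketch that draws the assumed parabola and marks the minimum near x = 10.

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x):
    return (x - 10.0) ** 2 + 5.0      # assumed parabola from the sections above

xs = np.linspace(-5.0, 25.0, 200)
plt.plot(xs, f(xs), label="f(x)")
plt.axvline(10.0, color="grey", linestyle="--", label="minimum near x = 10")
plt.xlabel("x")
plt.ylabel("f(x)")
plt.legend()
plt.show()
```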
## Mathematical approach

The above two solutions are naive implementations that perform fairly well. The mathematical way would be to calculate the next x as

xₜ₊₁ = xₜ − η·f′(xₜ)

with step size η typically chosen around 0.1. During the ensuing iterations η can be kept constant or decreased. Choosing the step size is an art!

## Gradient descent

Similarly to the above, for gradient descent we update the coefficients by moving in the negative gradient direction (the gradient is the direction of increase). The increment in gradient descent is −η·∇RSS(w), so that

wₜ₊₁ = wₜ − η·∇RSS(wₜ)
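To make the update rule concrete, here is a minimal sketch of gradient descent for simple linear regression, minimizing RSS(w₀, w₁). The data set, the step size η = 0.01 and the stopping tolerance are illustrative assumptions, not taken from the original.

```python
import numpy as np

def gradient_descent(x, y, eta=0.01, tol=1e-6, max_iter=100_000):
    """Minimize RSS(w0, w1) = sum((y - (w0 + w1*x))**2) by gradient descent."""
    w0, w1 = 0.0, 0.0
    for _ in range(max_iter):
        residuals = y - (w0 + w1 * x)
        grad_w0 = -2.0 * residuals.sum()          # dRSS/dw0
        grad_w1 = -2.0 * (residuals * x).sum()    # dRSS/dw1
        w0 -= eta * grad_w0                       # move against the gradient
        w1 -= eta * grad_w1
        if np.sqrt(grad_w0 ** 2 + grad_w1 ** 2) < tol:
            break
    return w0, w1

# Illustrative data: y is roughly 3 + 2x plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 3.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)
print(gradient_descent(x, y))
```

On this synthetic data the estimates should land close to the true intercept 3 and slope 2. Note that with these 50 data points a step size of 0.1 would already diverge, which illustrates why choosing the step size is an art.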