Simple Linear Regression

04_implementation
20160102

# Implementation of the 3 formulas
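
For reference, the three closed-form expressions computed in this section can be read off the code as follows. The notation is an assumption based on the variable names (`\hat{w}_1` for the slope `wh1`, `\hat{w}_0` for the intercept `wh0`, `r_{xy}` for the Pearson correlation, `\sigma` for the standard deviation) and is not necessarily the notation used elsewhere in these notes.

```latex
% Formula 1: centred sums
\hat{w}_1 = \frac{\sum_{i=1}^{n} (x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n} (x_i-\bar{x})^2},
\qquad \hat{w}_0 = \bar{y} - \hat{w}_1 \bar{x}

% Formula 2: raw sums
\hat{w}_1 = \frac{\sum x_i y_i - \frac{1}{n} \sum y_i \sum x_i}{\sum x_i^2 - \frac{1}{n} \left(\sum x_i\right)^2},
\qquad \hat{w}_0 = \frac{\sum y_i - \hat{w}_1 \sum x_i}{n}

% Formula 3: correlation times ratio of standard deviations
\hat{w}_1 = r_{xy} \, \frac{\sigma_y}{\sigma_x},
\qquad \hat{w}_0 = \bar{y} - \hat{w}_1 \bar{x}
```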

Code:

```
# Imports used below; x_v and y_v are the data vectors loaded earlier
import numpy as np
from scipy.stats import pearsonr

# Formula 1
xmean = sum(x_v)/len(x_v)
ymean = sum(y_v)/len(y_v)
wh1_f1 = sum([ (xi-xmean)*(yi-ymean) for (xi,yi) in zip(x_v,y_v) ]) / \
         sum([ (xi-xmean)**2 for xi in x_v ])
wh0_f1 = ymean - wh1_f1*xmean
print("Formula 1: slope={} intercept={}".format(wh1_f1,wh0_f1))

# Formula 2
n = len(x_v)
sig_y = sum(y_v)
sig_x = sum(x_v)
sig_xy = sum([ xi*yi for (xi,yi) in zip(x_v,y_v) ])
sig_x2 = sum([ xi*xi for xi in x_v ])
wh1_f2 = (sig_xy - (sig_y*sig_x)/n) / (sig_x2 - sig_x*sig_x/n)
wh0_f2 = (sig_y - wh1_f2*sig_x)/n
print("Formula 2: slope={} intercept={}".format(wh1_f2,wh0_f2))

# Formula 3
# Watch out: to calculate the correlation don't use np.correlate(),
# use the Pearson correlation!
wh1_f3 = pearsonr(y_v, x_v)[0] * np.std(y_v)/np.std(x_v)
wh0_f3 = ymean - wh1_f3*xmean
print("Formula 3: slope={} intercept={}".format(wh1_f3,wh0_f3))
```

Output:

```
Formula 1: slope=1.53848181625 intercept=117.041068001
Formula 2: slope=1.53848181625 intercept=117.041068001
Formula 3: slope=1.53848181625 intercept=117.041068001
```
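
The listing above was written for Python 2 and relies on `x_v` and `y_v` holding floats (with integers, `/` would truncate there). As a cross-check, here is a minimal Python 3 sketch of the same three formulas using only the standard library. The function names and the five data points are made up for illustration; they are not the housing data used in these notes.

```python
from math import sqrt

def fit_f1(xs, ys):
    """Formula 1: centred sums of products and squares."""
    n = len(xs)
    xmean, ymean = sum(xs) / n, sum(ys) / n
    slope = (sum((x - xmean) * (y - ymean) for x, y in zip(xs, ys))
             / sum((x - xmean) ** 2 for x in xs))
    return slope, ymean - slope * xmean

def fit_f2(xs, ys):
    """Formula 2: raw sums, no explicit means."""
    n = len(xs)
    sig_x, sig_y = sum(xs), sum(ys)
    sig_xy = sum(x * y for x, y in zip(xs, ys))
    sig_x2 = sum(x * x for x in xs)
    slope = (sig_xy - sig_y * sig_x / n) / (sig_x2 - sig_x * sig_x / n)
    return slope, (sig_y - slope * sig_x) / n

def fit_f3(xs, ys):
    """Formula 3: Pearson correlation times std(y)/std(x)."""
    n = len(xs)
    xmean, ymean = sum(xs) / n, sum(ys) / n
    std_x = sqrt(sum((x - xmean) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - ymean) ** 2 for y in ys) / n)
    cov = sum((x - xmean) * (y - ymean) for x, y in zip(xs, ys)) / n
    r = cov / (std_x * std_y)  # Pearson correlation coefficient
    slope = r * std_y / std_x
    return slope, ymean - slope * xmean

# Hypothetical sample: surface in m2 and price
xs = [50, 80, 100, 120, 150]
ys = [190, 240, 275, 300, 345]

fits = [fit_f1(xs, ys), fit_f2(xs, ys), fit_f3(xs, ys)]
for i, (slope, intercept) in enumerate(fits, start=1):
    print("Formula {}: slope={:.6f} intercept={:.6f}".format(i, slope, intercept))
```

All three functions should print the same slope and intercept up to floating-point rounding, which is the same agreement the output above shows for the real data.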

# Use libraries

## Python

You can use scipy's `stats.linregress()` or numpy's `np.polyfit()` with degree 1. Both give the same slope and intercept as the formulas above.

Code:

```
import numpy as np
from scipy import stats

# scipy stats
wh1_l1, wh0_l1, r_value, p_value, std_err = stats.linregress(x_v,y_v)
print("Library Function 1: slope={} intercept={}".format(wh1_l1,wh0_l1))

# numpy polyfit (degree 1 returns slope first, then intercept)
wh1_l2, wh0_l2 = np.polyfit(x_v,y_v,1)
print("Library Function 2: slope={} intercept={}".format(wh1_l2,wh0_l2))
```

Output:

```
Library Function 1: slope=1.53848181625 intercept=117.041068001
Library Function 2: slope=1.53848181625 intercept=117.041068001
```

## Plot the result

Plot the points plus fitted line:

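```
import matplotlib.pyplot as plt

# coefficients of any of the fits above, e.g. the scipy one
slope, intercept = wh1_l1, wh0_l1

# fitted line, compute 2 points
xl = [ 0.8*min(x_v), 1.2*max(x_v) ]
yl = [ slope*x+intercept for x in xl ]

plt.scatter(x_v, y_v)  # all points
plt.plot(xl, yl, 'r')  # fitted line
plt.show()
```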

Predict the price for 100, 200 and 400 m²:

```
[ (x,round(slope*x+intercept)) for x in [100,200,400] ]

[(100, 271.0),
 (200, 425.0),
 (400, 732.0)]
```
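
The prediction is just the fitted line evaluated at new x values. A small sketch that wraps it in a function, using the coefficients printed earlier in these notes; the name `predict_price` is made up for illustration. Note that under Python 3 `round()` returns an `int`, so the tuples show `271` rather than `271.0`.

```python
# Coefficients copied from the output printed earlier in these notes
SLOPE = 1.53848181625
INTERCEPT = 117.041068001

def predict_price(sqm, slope=SLOPE, intercept=INTERCEPT):
    """Evaluate the fitted line at a surface of `sqm` square metres."""
    return slope * sqm + intercept

# Same three surfaces as in the example above
predictions = [(x, round(predict_price(x))) for x in [100, 200, 400]]
print(predictions)
```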

## R implementation using lm()

First load the vectors `x_v` and `y_v` (see above).

```
df=data.frame(sqm=x_v, price=y_v)
model=lm(price~sqm, df)

model$coefficients
(Intercept)         sqm
 117.041068    1.538482
```

Plot:

```
plot(price~sqm,df)
abline(model,col="red",lwd=3)
```

Predict the price for a 100, 200 and 400 m² house:

```
predict(model, data.frame(sqm=c(100,200,400)))

       1        2        3
270.8892 424.7374 732.4338
```

Notes by Data Munging Ninja. Generated on nini:sync/20151223_datamungingninja/linregsimple at 2016-10-18 07:18