Simple Sales Prediction
 
01_intro
20160616

Intro

You have a year's worth of sales data for 2 shops and 6 products. Use simple linear regression to predict the sales for the next month. Tool to use: Apache Spark dataframes.

You receive the data in this 'wide' format, with one column per month. Beware: not all of the cells have data! Spot the 'nulls'.

scala> sale_df.orderBy("shop","product").show()

|----------+-------+----+----+----+----+----+----+----+----+----+----+----+----+
|      shop|product| jan| feb| mar| apr| may| jun| jul| aug| sep| oct| nov| dec|
|----------+-------+----+----+----+----+----+----+----+----+----+----+----+----+
|  megamart|  bread| 371| 432| 425| 524| 468| 414|null| 487| 493| 517| 473| 470|
|  megamart| cheese|  51|  56|  63|null|  66|  66|  50|  56|  58|null|  48|  50|
|  megamart|   milk|null|  29|  26|  30|  26|  29|  29|  25|  27|null|  28|  30|
|  megamart|   nuts|1342|1264|1317|1425|1326|1187|1478|1367|1274|1380|1584|1156|
|  megamart| razors| 599|null| 500| 423| 574| 403| 609| 520| 495| 577| 491| 524|
|  megamart|   soap|null|   7|   8|   9|   9|   8|   9|   9|   9|   6|   6|   8|
|superstore|  bread| 341| 398| 427| 344| 472| 370| 354| 406|null| 407| 465| 402|
|superstore| cheese|  57|  52|null|  54|  62|null|  56|  66|  46|  63|  55|  53|
|superstore|   milk|  33|null|null|  33|  30|  36|  35|  34|  38|  32|  35|  29|
|superstore|   nuts|1338|1369|1157|1305|1532|1231|1466|1148|1298|1059|1216|1231|
|superstore| razors| 360| 362| 366| 352| 365| 361| 361| 353| 317| 335| 290| 406|
|superstore|   soap|   8|   8|   7|   8|   6|null|   7|   7|   7|   8|   6|null|
|----------+-------+----+----+----+----+----+----+----+----+----+----+----+----+

(In the appendix of this article, you'll find the Scala code that creates this dataframe.)
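
To make the goal concrete before any Spark code, here is a minimal plain-Scala sketch (no Spark, and not the dataframe approach this article uses) that fits a least-squares line through one row of the table, skips the null cell, and evaluates the line at month 13. The object name SalesTrendSketch, the helper names fitLine and predictNextMonth, and the choice to number the months 1 to 12 are assumptions made for illustration only.

```scala
object SalesTrendSketch {

  // Least-squares fit of y = intercept + slope * x.
  // Returns (slope, intercept). Needs at least two distinct x values.
  def fitLine(points: Seq[(Double, Double)]): (Double, Double) = {
    val n     = points.size.toDouble
    val sumX  = points.map(_._1).sum
    val sumY  = points.map(_._2).sum
    val sumXY = points.map { case (x, y) => x * y }.sum
    val sumXX = points.map { case (x, _) => x * x }.sum
    val slope     = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX)
    val intercept = (sumY - slope * sumX) / n
    (slope, intercept)
  }

  // One 'wide' row: position 0 is jan (month 1), None stands for a null cell.
  // Null cells are dropped before fitting; the line is evaluated at the
  // month following the last column (13 for a 12-month row).
  def predictNextMonth(row: Seq[Option[Int]]): Double = {
    val points = row.zipWithIndex.collect {
      case (Some(sales), i) => ((i + 1).toDouble, sales.toDouble)
    }
    val (slope, intercept) = fitLine(points)
    intercept + slope * (row.size + 1)
  }

  def main(args: Array[String]): Unit = {
    // The megamart / bread row from the table above (jul is null).
    val megamartBread: Seq[Option[Int]] = Seq(
      Some(371), Some(432), Some(425), Some(524), Some(468), Some(414),
      None, Some(487), Some(493), Some(517), Some(473), Some(470)
    )
    println(f"predicted sales for month 13: ${predictNextMonth(megamartBread)}%.1f")
  }
}
```

Dropping the null cells, rather than treating them as zero, keeps a missing month from dragging the fitted line down.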

All the data manipulation is done in Spark Dataframes.

These dataframe functions are used:

  • groupBy(..).agg( sum(..), avg(..) )
  • withColumn()
  • withColumnRenamed()
  • join()
  • drop()
  • select(), ..
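
As a rough illustration of how these calls chain together, the sketch below runs them on a tiny 'long' dataframe (one row per shop, product and month) built from three cells of the megamart / soap row. It is not the pipeline of this article: the names long_df, stats_df, total, mean and deviation are hypothetical, and the SparkSession entry point assumes Spark 2.x or later (in an older spark-shell, sqlContext plays that role).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, sum}

object DataFrameFunctionsSketch {

  def deviations(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical long-format sample: feb, mar, apr of megamart / soap.
    val long_df = Seq(
      ("megamart", "soap", 2, 7),
      ("megamart", "soap", 3, 8),
      ("megamart", "soap", 4, 9)
    ).toDF("shop", "product", "month", "sales")

    // groupBy(..).agg(..) yields columns named "sum(sales)" and "avg(sales)";
    // withColumnRenamed() gives them friendlier names.
    val stats_df = long_df
      .groupBy("shop", "product")
      .agg(sum("sales"), avg("sales"))
      .withColumnRenamed("sum(sales)", "total")
      .withColumnRenamed("avg(sales)", "mean")

    // join() the per-group stats back onto every row, derive a new column
    // with withColumn(), then drop() and select() what is not needed.
    long_df
      .join(stats_df, Seq("shop", "product"))
      .withColumn("deviation", col("sales") - col("mean"))
      .drop("total")
      .select("shop", "product", "month", "sales", "deviation")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("dataframe-functions-sketch")
      .getOrCreate()
    deviations(spark).show()
    spark.stop()
  }
}
```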

Formula

Here's the formula to calculate the coefficients for the simple linear regression, taken from the article Simple Linear Regression:
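
For reference, the usual least-squares form of these coefficients is written out below. The notation is an assumption of this sketch and may differ from the referenced article: the fitted line is y = a + b x, with x the month number, y the sales figure, and n the number of (x, y) points that actually have data.

```latex
\[
b = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}
         {n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2},
\qquad
a = \frac{\sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i}{n}
\]
```

As a quick check with the points (1, 3), (2, 5), (3, 7): n = 3, the sum of x is 6, the sum of y is 15, the sum of xy is 34 and the sum of x squared is 14, so b = (102 - 90) / (42 - 36) = 2 and a = (15 - 12) / 3 = 1, which is the line y = 1 + 2x that the points lie on.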

 
Notes by Data Munging Ninja. Generated on nini:/home/willem/sync/20151223_datamungingninja/simplesalesprediction at 2016-06-25 10:02