Simple Sales Prediction
 
12_to_wide
20160616

Turn narrow dataframe into wide

Just for matter of completeness, here a way of turning the narrow dataframe back into a wide one.

We have this narrow dataframe:

sale_narrow_df.show()

+----------+-------+---+----+
|      shop|product|  x|   y|
+----------+-------+---+----+
|superstore|  bread|  1| 341|
|superstore| cheese|  1|  57|
|superstore| razors|  1| 360|
|superstore|   soap|  1|   8|
|superstore|   milk|  1|  33|
|superstore|   nuts|  1|1338|
|  megamart|  bread|  1| 371|
|  megamart| cheese|  1|  51|
|  megamart| razors|  1| 599|
|  megamart|   nuts|  1|1342|
|superstore|  bread|  2| 398|
|superstore| cheese|  2|  52|
|superstore| razors|  2| 362|
|superstore|   soap|  2|   8|
|superstore|   nuts|  2|1369|
|  megamart|  bread|  2| 432|
|  megamart| cheese|  2|  56|
|  megamart|   soap|  2|   7|
|  megamart|   milk|  2|  29|
|  megamart|   nuts|  2|1264|
+----------+-------+---+----+

And here's a way to turn it back into a wide frame:

Step 1: create a list of dataframes, one per month

val months=Array("jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec") 

def sel( mon:Int, colname:String) = {
        sale_narrow_df.where(s"x=$mon").select("shop","product","y").
                       withColumnRenamed("y",colname) 
} 
    
val df_ls = for ( i <- 0 until months.length ) yield { sel(i+1, months(i) ) } 

Step 2: join the list of dataframes on columns 'shop' and 'product' :

def jn(df1: org.apache.spark.sql.DataFrame, df2:org.apache.spark.sql.DataFrame) = {
    df1.join(df2,Seq("shop","product"), "outer")
}

val wide_df=df_ls.reduce( jn(_,_) ) 

Et voila:

wide_df.orderBy("shop","product").show()

+----------+-------+----+----+----+----+----+----+----+----+----+----+----+----+
|      shop|product| jan| feb| mar| apr| may| jun| jul| aug| sep| oct| nov| dec|
+----------+-------+----+----+----+----+----+----+----+----+----+----+----+----+
|  megamart|  bread| 371| 432| 425| 524| 468| 414|null| 487| 493| 517| 473| 470|
|  megamart| cheese|  51|  56|  63|null|  66|  66|  50|  56|  58|null|  48|  50|
|  megamart|   milk|null|  29|  26|  30|  26|  29|  29|  25|  27|null|  28|  30|
|  megamart|   nuts|1342|1264|1317|1425|1326|1187|1478|1367|1274|1380|1584|1156|
|  megamart| razors| 599|null| 500| 423| 574| 403| 609| 520| 495| 577| 491| 524|
|  megamart|   soap|null|   7|   8|   9|   9|   8|   9|   9|   9|   6|   6|   8|
|superstore|  bread| 341| 398| 427| 344| 472| 370| 354| 406|null| 407| 465| 402|
|superstore| cheese|  57|  52|null|  54|  62|null|  56|  66|  46|  63|  55|  53|
|superstore|   milk|  33|null|null|  33|  30|  36|  35|  34|  38|  32|  35|  29|
|superstore|   nuts|1338|1369|1157|1305|1532|1231|1466|1148|1298|1059|1216|1231|
|superstore| razors| 360| 362| 366| 352| 365| 361| 361| 353| 317| 335| 290| 406|
|superstore|   soap|   8|   8|   7|   8|   6|null|   7|   7|   7|   8|   6|null|
+----------+-------+----+----+----+----+----+----+----+----+----+----+----+----+
 
Notes by Data Munging Ninja. Generated on nini:/home/willem/sync/20151223_datamungingninja/simplesalesprediction at 2016-06-25 10:02