Dig deeper into Spark DataFrames to predict next month's sales units via simple linear regression. Get comfortable with DataFrame functions for aggregating, defining new columns, dropping and renaming columns, joining DataFrames on multiple keys, and converting a DataFrame from wide to narrow format. Also learn how to create a simple User Defined Function (UDF).
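To give a feel for the regression step, here is a minimal sketch in plain Python rather than Spark. The monthly figures and the `fit_line` helper are made up for illustration; the post itself works on Spark DataFrames.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: month index and units sold in that month.
months = [1, 2, 3, 4, 5, 6]
units = [100, 110, 120, 130, 140, 150]

slope, intercept = fit_line(months, units)

# Extrapolate one month past the last observation.
next_month = months[-1] + 1
prediction = slope * next_month + intercept
print(f"Predicted units for month {next_month}: {prediction:.1f}")
```

In Spark the same idea would typically run per product or per shop after the aggregation and wide-to-narrow steps, but that wiring is left to the post.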
Aardvark is about putting a bunch of code files together in one file, then executing a single command that does everything necessary to produce the desired output.
Stop being a manager of files and concentrate on writing code!
How do I package a Scala program for submission to my Spark cluster? And how do I check that the cluster is firing on all cylinders? (Note: uses NYC Taxi data.)
Cicero has upset his wife, and to make amends he is looking for the flower shop nearest to his house on the Palatine Hill (41.8898803,12.4849976).
Data: the OpenStreetMap pbf file of Europe (17 gigabytes).
Query tool of choice: Hive.
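The heart of a "nearest shop" search is a distance calculation between two latitude/longitude pairs. Below is a minimal sketch in plain Python using the haversine formula; it is not the Hive query from the post, and the shop names and coordinates are invented for illustration. Only Cicero's home coordinates come from the text above.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius, an approximation
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Cicero's house on the Palatine Hill (from the post).
HOME = (41.8898803, 12.4849976)

# Hypothetical flower shops: (name, latitude, longitude).
shops = [
    ("shop_a", 41.8950, 12.4800),
    ("shop_b", 41.9000, 12.5000),
    ("shop_c", 41.8700, 12.4700),
]

# Pick the shop with the smallest distance to home.
nearest = min(shops, key=lambda s: haversine_km(HOME[0], HOME[1], s[1], s[2]))
print("Nearest shop:", nearest[0])
```

On the full Europe extract, a brute-force scan like this would normally be preceded by a filter on shop type and a rough bounding box around Rome, so the distance is computed for a handful of candidates only.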