All Cylinders


Here you need to do two things, nearly at the same time:

  1. kick off your monitoring data collection (on nk00, the Xen dom0)
  2. submit the job on the Spark cluster (on nk01, the Xen VM running Spark)
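The two steps above can be wrapped in a small script run from a workstation that can reach both hosts. The hostnames nk00/nk01 come from this text; ssh access, accounts and paths are assumptions. A sketch (it defaults to a dry run that only prints the commands; set DRY_RUN=0 to actually execute them):

```shell
#!/bin/sh
# Kick off xentop on the dom0 (nk00), then submit the Spark job on nk01.
# DRY_RUN defaults to 1 (just print the commands); set DRY_RUN=0 to run them over ssh.
MONITOR_CMD='sudo xentop -b -d 1 -i 500 > xt.log'
SUBMIT_CMD='$SPARK_HOME/bin/spark-submit --master yarn-cluster --num-executors 12 taxi.jar hdfs:///user/dmn/20160421_nyc_taxi'

run() {  # run <host> <command> -- print or execute the command over ssh
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "ssh $1 $2"
  else
    ssh "$1" "$2" &
  fi
}

run nk00 "$MONITOR_CMD"   # start monitoring first...
run nk01 "$SUBMIT_CMD"    # ...then submit, nearly at the same time
wait                      # block until both remote commands finish
```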


Before you submit the Spark job, you want to kick off your virtual server monitoring.

1. Collect monitoring data

Log on to your domain zero (dom0) host, i.e. the hypervisor or mother of your Xen virtual images.

If you haven't installed it yet, get it via your package manager, e.g.: sudo apt-get install xentop

Then run it as follows (more details in the next section):

sudo xentop -b -d 1 -i 500 > xt.log
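Once the run completes, xt.log holds one xentop snapshot per interval, which you can post-process with standard tools. A sketch that averages the CPU(%) column per domain with awk; the sample log written below is made up, and the assumption that CPU(%) is the fourth field of xentop's batch output should be verified against your own capture:

```shell
# Write a hypothetical two-snapshot xentop -b capture (replace with your real xt.log).
cat > xt.log <<'EOF'
      NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)
  Domain-0 -----r       1234    5.0    2048000   25.0
      nk01 -----b       5678   80.0    4096000   50.0
      NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)
  Domain-0 -----r       1236    7.0    2048000   25.0
      nk01 -----b       5699   90.0    4096000   50.0
EOF

# Skip the repeated header rows, then average field 4 (CPU %) per domain name.
awk '$1 != "NAME" { sum[$1] += $4; n[$1]++ }
     END { for (d in sum) printf "%s %.1f\n", d, sum[d]/n[d] }' xt.log | sort
# -> Domain-0 6.0
#    nk01 85.0
```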

2. Submit spark job

Meanwhile, on your Spark system, submit your taxi job to the Spark cluster as follows:

$SPARK_HOME/bin/spark-submit --master yarn-cluster --num-executors 12 \
        taxi.jar hdfs:///user/dmn/20160421_nyc_taxi

For the above you need your freshly created taxi.jar and the location of the NYC taxi-ride CSV files on your HDFS cluster.

Check how your job is faring via Hadoop's web interface... In this case that is on node 1: http://nk01:8088/cluster/apps
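The same ResourceManager also answers on a REST API at that host and port, handy when you only have a terminal. The /ws/v1/cluster/apps endpoint is the standard YARN REST path; the JSON response below is a made-up sample, and the application id in it is hypothetical:

```shell
# Query the ResourceManager REST API (same host/port as the web UI):
#   curl -s 'http://nk01:8088/ws/v1/cluster/apps?states=RUNNING'
# Pull the id and state fields out of a (made-up) sample response:
resp='{"apps":{"app":[{"id":"application_1460000000000_0001","state":"RUNNING","progress":42.0}]}}'
echo "$resp" | grep -oE '"(id|state)":"[^"]*"'
# -> "id":"application_1460000000000_0001"
#    "state":"RUNNING"
```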

Notes by Data Munging Ninja. Generated on nini:sync/20151223_datamungingninja/allcylinders at 2016-10-18 07:19