All Cylinders


Here you need to do two things, nearly at the same time:

  1. kick off your monitoring data collection (on nk00, the Xen dom0)
  2. submit the job on the Spark cluster (on nk01, the Xen VM running Spark)
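The two steps above can be wrapped in a small script run from a workstation that can reach both hosts. The hostnames nk00/nk01 come from this text; ssh access, accounts and paths are assumptions. A sketch (it defaults to a dry run that only prints the commands; set DRY_RUN=0 to actually execute them):

```shell
#!/bin/sh
# Kick off xentop on the dom0 (nk00), then submit the Spark job on nk01.
# DRY_RUN defaults to 1 (just print the commands); set DRY_RUN=0 to run them over ssh.
MONITOR_CMD='sudo xentop -b -d 1 -i 500 > xt.log'
SUBMIT_CMD='$SPARK_HOME/bin/spark-submit --master yarn-cluster --num-executors 12 taxi.jar hdfs:///user/dmn/20160421_nyc_taxi'

run() {  # run <host> <command> -- print or execute the command over ssh
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "ssh $1 $2"
  else
    ssh "$1" "$2" &
  fi
}

run nk00 "$MONITOR_CMD"   # start monitoring first...
run nk01 "$SUBMIT_CMD"    # ...then submit, nearly at the same time
wait                      # block until both remote commands finish
```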


Before you submit the Spark job, you want to kick off your virtual server monitoring.

1. Collect monitoring data

Log on to your domain zero (dom0) host, i.e. the hypervisor or mother of your Xen virtual images.

If you haven't installed it yet, get it via your package manager, e.g.: sudo apt-get install xentop

Then run it as follows (more details in the next section):

sudo xentop -b -d 1 -i 500 > xt.log
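Once the run completes, xt.log holds one xentop snapshot per interval, which you can post-process with standard tools. A sketch that averages the CPU(%) column per domain with awk; the sample log written below is made up, and the assumption that CPU(%) is the fourth field of xentop's batch output should be verified against your own capture:

```shell
# Write a hypothetical two-snapshot xentop -b capture (replace with your real xt.log).
cat > xt.log <<'EOF'
      NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)
  Domain-0 -----r       1234    5.0    2048000   25.0
      nk01 -----b       5678   80.0    4096000   50.0
      NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)
  Domain-0 -----r       1236    7.0    2048000   25.0
      nk01 -----b       5699   90.0    4096000   50.0
EOF

# Skip the repeated header rows, then average field 4 (CPU %) per domain name.
awk '$1 != "NAME" { sum[$1] += $4; n[$1]++ }
     END { for (d in sum) printf "%s %.1f\n", d, sum[d]/n[d] }' xt.log | sort
# -> Domain-0 6.0
#    nk01 85.0
```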

2. Submit spark job

Meanwhile, on your Spark system, submit your taxi job to the Spark cluster as follows:

$SPARK_HOME/bin/spark-submit --master yarn-cluster --num-executors 12 \
        taxi.jar hdfs:///user/dmn/20160421_nyc_taxi

For the above you need your freshly created taxi.jar and the location of the NYC taxi-ride CSV files on your HDFS cluster.

Check how your job is faring via Hadoop's web interface... In this case that is on node 1: http://nk01:8088/cluster/apps
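The same ResourceManager also answers on a REST API at that host and port, handy when you only have a terminal. The /ws/v1/cluster/apps endpoint is the standard YARN REST path; the JSON response below is a made-up sample, and the application id in it is hypothetical:

```shell
# Query the ResourceManager REST API (same host/port as the web UI):
#   curl -s 'http://nk01:8088/ws/v1/cluster/apps?states=RUNNING'
# Pull the id and state fields out of a (made-up) sample response:
resp='{"apps":{"app":[{"id":"application_1460000000000_0001","state":"RUNNING","progress":42.0}]}}'
echo "$resp" | grep -oE '"(id|state)":"[^"]*"'
# -> "id":"application_1460000000000_0001"
#    "state":"RUNNING"
```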

Notes by Data Munging Ninja. Generated on nini:sync/20151223_datamungingninja/allcylinders at 2016-10-18 07:19