Terentia Flores | |||
One Page |
03_query
20151226
Hive queriesTable with all data: t_europeFor this we assume that the ingest has taken place (details: see next section), and the CSV files have been put under directory '/user/dmn/europe' on HDFS. First create the table, linked to the CSV files on HDFS:
First query: how many waypoints do we have?
The 'rough distance' functionFor finding out the distance between two waypoints, there is no standard function in Hive, so we write a User Defined Function in Java, compile it, and put it in a jar named 'udf.jar' (more details: see section 'hive_udf'). Then we incorporate it into Hive as follows:
Now we can make use of the function as shown in this query that answers how many waypoints we have in a 10km radius around the Palantine hill (lat,lon= 41.8898803,12.4849976) in Rome:
Answer:
Table with subset of data: t_palantineLet's put the waypoints close to the Palantine hill in a separate table. We also want to have a column 'distance' being the distance between the Palantine hill and all the points. At the same time we also want to filter by 'distance' :
This takes a considerable amount of time: 7 minutes 17 seconds 410 msec. Maybe because the distance has to be calculated twice? Let's use a CTE (a common table expression) in our SQL, to avoid that.
This only takes: 73.209 seconds. Big improvement! Flowers for TerentiaLet's now answer the question: which flower shops are near Cicero's on the Palantine hill?
The result:
His best bet would be 'Vivaio Piante' at 2.29 km: (picture belongs to Google maps) |