Introduction

Intention

Write a few short programs to transfer waypoint data from an OpenStreetMap protobuf (PBF) file to the Hadoop Distributed File System (HDFS). Then use Hive to run simple SQL queries on this data, and use Hive's user-defined function (UDF) feature to write custom Java functions that can be called from Hive SQL.
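To give a feel for the last step, here is a minimal sketch of the kind of Java function that could be exposed to Hive. It is not taken from the repository: the class name WaypointDistance and the choice of a great-circle (haversine) distance are assumptions made for illustration. In a real Hive UDF the class would be public and extend org.apache.hadoop.hive.ql.exec.UDF; that is left out here so the sketch compiles without the Hive jars.

```java
// Hypothetical example, not the article's code.
// For Hive: declare the class public in its own file and let it extend
// org.apache.hadoop.hive.ql.exec.UDF (requires hive-exec on the classpath).
class WaypointDistance {
    // Mean earth radius in kilometres (an approximation).
    private static final double EARTH_RADIUS_KM = 6371.0;

    // Hive's simple UDF convention finds a method named "evaluate" by reflection.
    // Boxed types are used so that SQL NULL inputs can be passed through as null.
    public Double evaluate(Double lat1, Double lon1, Double lat2, Double lon2) {
        if (lat1 == null || lon1 == null || lat2 == null || lon2 == null) {
            return null;
        }
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        // Haversine formula: a is the squared half-chord length between the points.
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }
}
```

Once packaged in a jar, a class like this is typically made available with Hive's ADD JAR and CREATE TEMPORARY FUNCTION statements, after which it can be called like any built-in function in a query.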

Road map for this article

General principle: the further you go, the deeper the detail.

Architecture: a high-level overview of how the data is read from the protobuf file and piped into Redis. A second process then reads the data from Redis and moves it into HDFS. After that, the focus shifts to writing Hive queries.
Running Hive Queries: simple SQL queries on the ingested data.
Ingest: a more detailed explanation of the ingestion pipeline, and how to run it.
Hive UDF: how to incorporate user-defined functions into Hive.
All the source code
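The two-stage shape of the architecture (one process fills a buffer, another drains it) can be sketched in a few lines of Java. This is only an illustration under stated assumptions: an in-memory BlockingQueue stands in for Redis, a List stands in for HDFS, and the tab-separated record layout and the end-of-stream marker are hypothetical. The article's actual reader is written in Go and talks to a real Redis server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch only: BlockingQueue plays the role of Redis, List the role of HDFS.
class IngestSketch {
    // Hypothetical end-of-stream marker telling the mover to stop.
    static final String EOF = "__EOF__";

    // Stage 1: the reader pushes one record per waypoint into the buffer.
    static void produce(List<String> waypoints, BlockingQueue<String> queue)
            throws InterruptedException {
        for (String w : waypoints) {
            queue.put(w); // blocks when the buffer is full
        }
        queue.put(EOF);
    }

    // Stage 2: the mover pops records until the marker and appends them to the sink.
    static List<String> consume(BlockingQueue<String> queue) throws InterruptedException {
        List<String> sink = new ArrayList<>();
        while (true) {
            String rec = queue.take(); // blocks when the buffer is empty
            if (EOF.equals(rec)) {
                break;
            }
            sink.add(rec);
        }
        return sink;
    }

    // Runs both stages concurrently and returns what arrived in the sink.
    static List<String> run(List<String> waypoints) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);
        Thread producer = new Thread(() -> {
            try {
                produce(waypoints, queue);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            List<String> sink = consume(queue);
            producer.join();
            return sink;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("ingest interrupted", e);
        }
    }
}
```

The buffer decouples the two stages: the reader and the mover can run on different machines and at different speeds, which matches the prerequisite that Redis and the Linux system can sit anywhere on the network.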

Prerequisites

A Hadoop cluster with Hive, and a recent version of Java
A Redis installation, anywhere on your network
A Linux system (anywhere on your network) that can hold a large OpenStreetMap PBF file and has a Go compiler installed. It does not need to be fast.

Versions of software used:

Debian Linux 8.2
Java 1.8.0_66
Hadoop 2.7.1
Hive 1.2.1
Redis 2.8.14

GitHub repository

The source code can be cloned from github.com/dtmngngnnj/terentiaflores

Notes by Data Munging Ninja. Generated on nini:sync/20151223_datamungingninja/terentiaflores at 2016-10-18 07:18