Terentia Flores
 
05_Hive_UDF
20151226

User Defined Functions in Hive

In Hive it is very easy to define your own function:

  • write some Java code
  • wrap it into a JAR
  • add the JAR to your Hive session
  • define a Hive function that points to your UDF class

The Java source of the user-defined function, UdfRoughDistance.java:

import java.lang.Math;
import org.apache.hadoop.hive.ql.exec.UDF;

public class UdfRoughDistance extends UDF {

    /** Calculate the approximate distance in kilometers between two points given in decimal degrees. */
    public double evaluate(double lat1, double lon1, double lat2, double lon2) {

        // convert to radians
        lat1 = lat1 * Math.PI / 180.0;
        lon1 = lon1 * Math.PI / 180.0;
        lat2 = lat2 * Math.PI / 180.0;
        lon2 = lon2 * Math.PI / 180.0;

        double r = 6371.0; // mean radius of the earth in kilometers
        double x = (lon2 - lon1) * Math.cos((lat1+lat2)/2.0);
        double y = (lat2 - lat1);
        return r*Math.sqrt(x*x+y*y);

    } // end evaluate

    /* The above formulas are called the "equirectangular approximation", 
     * to be used for small distances, if performance is more important 
     * than accuracy. 
     * See: http://www.movable-type.co.uk/scripts/latlong.html
     */
}
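
To get a feel for how rough the approximation is, the formula can be checked outside Hive against the haversine formula. The following is only a sketch: the class name RoughDistanceCheck, the helper names rough and haversine, and the sample coordinates are made up for illustration and are not part of the UDF. It deliberately does not extend UDF, so it compiles without any Hive jars on the classpath.

```java
public class RoughDistanceCheck {

    // Same earth radius as used in the UDF (kilometers)
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Equirectangular approximation, same formula as in UdfRoughDistance. */
    static double rough(double lat1, double lon1, double lat2, double lon2) {
        double p1 = Math.toRadians(lat1), l1 = Math.toRadians(lon1);
        double p2 = Math.toRadians(lat2), l2 = Math.toRadians(lon2);
        double x = (l2 - l1) * Math.cos((p1 + p2) / 2.0);
        double y = p2 - p1;
        return EARTH_RADIUS_KM * Math.sqrt(x * x + y * y);
    }

    /** Haversine great-circle distance, used here only as a reference value. */
    static double haversine(double lat1, double lon1, double lat2, double lon2) {
        double p1 = Math.toRadians(lat1), p2 = Math.toRadians(lat2);
        double dp = p2 - p1;
        double dl = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dp / 2) * Math.sin(dp / 2)
                 + Math.cos(p1) * Math.cos(p2) * Math.sin(dl / 2) * Math.sin(dl / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Two points roughly 40 km apart (illustrative coordinates only)
        double[] a = {50.85, 4.35};
        double[] b = {51.22, 4.40};
        System.out.printf("rough:     %.3f km%n", rough(a[0], a[1], b[0], b[1]));
        System.out.printf("haversine: %.3f km%n", haversine(a[0], a[1], b[0], b[1]));
    }
}
```

For points this close together the two results should differ by far less than a kilometer; the gap grows with distance and at high latitudes.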

Once you have set up the proper classpath, just compile your Java file:

javac UdfRoughDistance.java

.. and create a jar file:

jar cvf udf.jar UdfRoughDistance.class

.. which you incorporate into Hive as follows:

ADD JAR udf.jar;
CREATE TEMPORARY FUNCTION UDF_ROUGH_DISTANCE as 'UdfRoughDistance';

My classpath is defined as follows (from the script compile_jar_udf.sh):

export HH=/opt/hadoop-2.7.1/share/hadoop
export HI=/opt/apache-hive-1.2.1-bin
export CLASSPATH=$CLASSPATH\
:$HH/common/hadoop-common-2.7.1.jar\
:$HH/hdfs/hadoop-hdfs-2.7.1.jar\
:$HH/mapreduce/lib/*\
:$HH/common/lib/*\
:$HH/tools/lib/*\
:$HI/lib/hive-common-1.2.1.jar\
:$HI/lib/hive-contrib-1.2.1.jar\
:$HI/lib/hive-exec-1.2.1.jar
 
Notes by Data Munging Ninja. Generated on nini:sync/20151223_datamungingninja/terentiaflores at 2016-10-18 07:18