How to Install Hive on Mac OS

Setup Hive on Mac

This tutorial will provide step by step instruction to setup Hive on Mac OS.

Download Apache Hive
You can click here to download apache Hive 3.0.0 version or go to Apache site http://www-eu.apache.org/dist/hive/ to download directly from there. Move the downloaded Hive binary to below path & extract it.

$HOME/hadoop/

There are 7 steps to complete in order run Hive Each Step is described in detail below

  1. Validating hadoop: Hadoop version can be checked using below command.
    
    $ hadoop version
    
    Output: Hadoop 3.0.3
    

  2. Set below Hive & mysql variables in the .profile file in $HOME directory
    
    ## HIVE env variables
    export HIVE_HOME=$HOME/hadoop/apache-hive-3.0.0-bin 
    export PATH=$PATH:/$HIVE_HOME/bin
    
    
    ## MySQL ENV
    export PATH=$PATH:/usr/local/mysql-8.0.12-macos10.13-x86_64/bin
    

  3. Setup Hive Metastore: Mysql, oracle or Derby databases can be used for Hive metastore. Generally it is suggested to use Mysql over Derby. Derby is inbuilt with Hive but with this, you can create only single Hive session.

    • Mysql Setup: Mysql can be downloaded from this mysql site link. After downloading & installing mysql server, please follow below steps to start mysql server.

      1. Go to system preference and click on Dolphin like MySql icon as shown in figure below. Mysql System  Preference

      2. Next window will have button "Start MySql Server" to start server. Also ensure that checkbox "Start MySql when your computer starts up". Move to next step if it is already started. Start Mysql Server

      3. Add the below command in .profile file in home directory
        
        export PATH=$PATH:/usr/local/mysql-8.0.12-macos10.13-x86_64/bin
        

      4. By default, there is no password for root user of mysql. We need to login root user & create new hive user to be used by Hive & spark. Login to MySql root user and execute below commands
        
        mysql -u root -p 
        CREATE DATABASE metastore;
        
        use metastore;
        
        SOURCE [$HOME path]/hadoop/apache-hive-3.0.0-bin/scripts/metastore/upgrade/mysql/hive-schema-3.0.0.mysql.sql 
        
        CREATE USER 'hiveuser'@'%' IDENTIFIED BY 'hivepassword';
        
        GRANT all on metastore.* to hiveuser;
        FLUSH PRIVILEGES;
        

    • Derby Setup: Derby is inbuilt and can be setup by running below commands.
      
      cd $HOME/hadoop
      
      schematool -dbType derby -initSchema
      

      If you get below error after running above script
      
      Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
      org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
      Underlying cause: java.io.IOException : Schema script failed, errorcode 2
      Use --verbose for detailed stacktrace.
      *** schemaTool failed ***
      

      Then search NUCLEUS_ASCII & NUCLEUS_MATCH function in below metadata load file and comment them using double hyphen(--).
      
      $HOME/hadoop/apache-hive-3.0.0-bin/scripts/metastore/upgrade/derby/hive-schema-3.0.0.derby.sql
      
      And run the below command again
      schematool -dbType derby -initSchema
      

  4. MySql Connector: Click here Download MySql connector or directly download from MySql site and extract it. Place the jar file at the following location.
    
    $HOME/hadoop/apache-hive-3.0.0-bin/lib
    

  5. Setting up configuration files for Hive binary.
    • hive-config.sh: Add below command in $HIVE_HOME/bin/hive-config.sh file.
      
      export HADOOP_HOME=[$HADOOP_HOME path]
      

    • hive-site.xml (Mysql as Metastore) : Please create hive-site.xml file with below properties under $HOME/hadoop/apache-hive-3.0.0-bin/conf/ directory
      
      <?xml version="1.0" encoding="UTF-8" standalone="no"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <!--
         Licensed to the Apache Software Foundation (ASF) under one or more
         contributor license agreements.  See the NOTICE file distributed with
         this work for additional information regarding copyright ownership.
         The ASF licenses this file to You under the Apache License, Version 2.0
         (the "License"); you may not use this file except in compliance with
         the License.  You may obtain a copy of the License at
             http://www.apache.org/licenses/LICENSE-2.0
         Unless required by applicable law or agreed to in writing, software
         distributed under the License is distributed on an "AS IS" BASIS,
         WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
         See the License for the specific language governing permissions and
         limitations under the License.
      -->
      <configuration>
          
      <property>
          <name>hive.querylog.location</name>
          <value>[$HIVE_HOME PATH]/log/hive.log</value>
          <description>Location of Hive run time structured log file</description>
        </property>
        <property>
          <name>hive.querylog.enable.plan.progress</name>
          <value>false</value>
          <description>
            Whether to log the plan's progress every time a job's progress is checked.
            These logs are written to the location specified by hive.querylog.location
          </description>
        </property>
      
        <property>
          <name>hive.log.explain.output</name>
          <value>false</value>
          <description>
            Whether to log explain output for every query.
            When enabled, will log EXPLAIN EXTENDED output for the query at INFO log4j log level
            and in WebUI / Drilldown / Show Query.
          </description>
        </property>
      
      <!-- Properties for MySql as Metastore -->
         <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true&useSSL=false</value>
            <description>metadata is stored in a MySQL server</description>
         </property>
         <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.cj.jdbc.Driver</value>
            <description>MySQL JDBC driver class</description>
         </property>
         <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>hiveuser</value>
            <description>user name for connecting to mysql server</description>
         </property>
         <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>hivepassword</value>
            <description>password for connecting to mysql server</description>
         </property>
      
      <!-- Properties to use Hive by Spark -->
         <property>
            <name>hive.metastore.schema.verification</name>
            <value>false</value>
            <description>connection from Spark</description>
         </property>
      
      <property>
          <name>hive.metastore.warehouse.dir</name>
          <value>hdfs://localhost:9000/user/hive/warehouse</value>
          <description>Warehouse Location</description>
        </property>
      
      </configuration>
      
    • hive-site.xml (Derby as Metastore) : Please replace below property with MySql properties in the above hive-site.xml site.
      
      <property>
         <name>javax.jdo.option.ConnectionURL</name>
         <value>jdbc:derby:$HOME/hadoop/metastore_db;create=true </value>
         <description>JDBC connect string for a JDBC metastore </description>
      </property>
      

  6. Setup HDFS (creating directory) for storing Hive data & temporary data
    
    Create warehouse folder under hive and provide permission
    hadoop fs -mkdir -p /user/hive/warehouse
    hadoop fs -chmod g+w /user/hive/warehouse
    
    
    Create tmp folder in root and provide permission
    hadoop fs -mkdir -p /tmp
    hadoop fs -chmod g+w /tmp
    
    hadoop fs -mkdir -p /tmp/hive
    hadoop fs -chmod 777 /tmp/hive
    
    
  7. Starting Hive: Hive can be started with 'hive' command and hive shell will open
    
    $ hive
    
    hive> show databases;
    OK
    default
    

    If hive is able to connect to MySql metastore then above command will work else it will fail with below error
    
    FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    

    Start hive console again with below command & again run show database command
    
    hive --hiveconf hive.root.logger=INFO,console
    
    hive> show databases;
    
    Note: Hive console in above mode will show exact error why Hive is failing.

    If you are seeing timezone error like below
    
    java.sql.SQLException: The server time zone value 'PDT' is unrecognized or represents more than one time zone. You must configure either the server or JDBC driver (via the serverTimezone configuration property) to use a more specifc time zone value if you want to utilize time zone support.
    
    Then change the below value in hive-site.xml
    
    FROM: jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true&amp;useSSL=false
    
    TO:  jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true&amp;useSSL=false&amp;serverTimezone=UTC