Learning HiveQL

While based on SQL, HiveQL does not strictly follow the full SQL-92 standard. HiveQL offers extensions not in SQL, including multitable inserts and create table as select, but only offers basic support for indexes.
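As a rough sketch of the two extensions named above, the statements below show a create table as select and a multitable insert. All table and column names (page_views, daily_views, views_by_user, views_by_url) are hypothetical, and the two insert targets are assumed to exist already with matching columns.

```sql
-- Hypothetical source table: page_views(user_id, url, view_date).

-- Create table as select (CTAS): the new table's columns come from the query.
CREATE TABLE daily_views AS
SELECT view_date, COUNT(*) AS views
FROM page_views
GROUP BY view_date;

-- Multitable insert: one pass over page_views feeds both target tables.
-- views_by_user and views_by_url are assumed to have been created beforehand.
FROM page_views pv
INSERT OVERWRITE TABLE views_by_user
  SELECT pv.user_id, COUNT(*) GROUP BY pv.user_id
INSERT OVERWRITE TABLE views_by_url
  SELECT pv.url, COUNT(*) GROUP BY pv.url;
```

The multitable form is mainly useful when several outputs share one expensive input, since the source is read only once.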

HiveQL also lacks support for transactions and materialized views, and offers only limited subquery support. There are plans to add support for insert, update, and delete with full ACID functionality.
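To make the subquery limitation concrete: a subquery in the FROM clause is the long-supported form, and it has to be given an alias. The sketch below uses a hypothetical table page_views(user_id, url); subqueries in the WHERE clause (IN, EXISTS) only arrived with Hive 0.13, so older installations may reject them.

```sql
-- Hypothetical table: page_views(user_id, url).
-- The subquery must be named (here: t), because every table
-- in a FROM clause needs a name.
SELECT t.user_id, t.views
FROM (
  SELECT user_id, COUNT(*) AS views
  FROM page_views
  GROUP BY user_id
) t
WHERE t.views > 100;
```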

Internally, a compiler translates HiveQL statements into a directed acyclic graph of MapReduce jobs, which are submitted to Hadoop for execution.
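One way to look at the result of this translation is the EXPLAIN statement, which prints the plan instead of running the query. The sketch below assumes a hypothetical table page_views(user_id, url); the exact output differs between Hive versions and execution engines, so none is shown here.

```sql
-- Hypothetical table: page_views(user_id, url).
-- EXPLAIN does not execute the query. It prints the stage dependencies
-- (the graph of jobs) followed by a description of each stage.
EXPLAIN
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url;
```

A simple aggregation like this one typically compiles to a small number of stages; joins and multiple aggregations usually add more.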

Commands are non-SQL statements, such as setting a property or adding a resource. They can be used in HiveQL scripts or directly in the CLI or Beeline.

Command
    Description

quit
exit
    Use quit or exit to leave the interactive shell.

reset
    Resets the configuration to the default values (as of Hive 0.10: see HIVE-3202). Any configuration parameters that were set using the set command or the -hiveconf parameter on the hive command line are reset to their default values.
    Note that this does not apply to configuration parameters that were set with the set command using the “hiveconf:” prefix for the key name (for historic reasons).

set <key>=<value>
    Sets the value of a particular configuration variable (key).
    Note: If you misspell the variable name, the CLI will not show an error.

set
    Prints a list of configuration variables that are overridden by the user or Hive.

set -v
    Prints all Hadoop and Hive configuration variables.

add FILE[S] <filepath> <filepath>*
add JAR[S] <filepath> <filepath>*
add ARCHIVE[S] <filepath> <filepath>*
    Adds one or more files, jars, or archives to the list of resources in the distributed cache.

list FILE[S]
list JAR[S]
list ARCHIVE[S]
    Lists the resources already added to the distributed cache.

list FILE[S] <filepath>*
list JAR[S] <filepath>*
list ARCHIVE[S] <filepath>*
    Checks whether the given resources have already been added to the distributed cache.

delete FILE[S] <filepath>*
delete JAR[S] <filepath>*
delete ARCHIVE[S] <filepath>*
    Removes the resource(s) from the distributed cache.

! <command>
    Executes a shell command from the Hive shell.

dfs <dfs command>
    Executes a dfs command from the Hive shell.

<query string>
    Executes a Hive query and prints results to standard output.

source FILE <filepath>
    Executes a script file inside the CLI.

compile `<groovy string>` AS GROOVY NAMED <name>
    Compiles inline Groovy code so that it can be used as a UDF (as of Hive 0.13.0).

Sample Usage:

  hive> set mapred.reduce.tasks=32;
  hive> set;
  hive> select a.* from tab1 a;
  hive> !ls;
  hive> dfs -ls;
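
The resource commands are easiest to follow in a small script. The sketch below assumes a hypothetical Python script at /tmp/clean_urls.py that reads lines from standard input and writes cleaned lines to standard output, and a hypothetical table page_views with a url column; neither comes from Hive itself.

```sql
-- Hypothetical script path and table name; adjust both to your environment.

-- Register the script as a resource in the distributed cache.
add FILE /tmp/clean_urls.py;

-- Show the file resources registered so far.
list FILES;

-- Stream the url column through the script. Once added, the file can be
-- referred to by its base name.
SELECT TRANSFORM(url) USING 'python clean_urls.py' AS (clean_url)
FROM page_views;

-- Remove the resource again when it is no longer needed.
delete FILE /tmp/clean_urls.py;
```

Saved to a file, these statements could be run from the CLI with the source command described above.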