The Apache Hive project provides a data warehouse view of the data in HDFS. Using a SQL-like language Hive lets you create summarizations of your data, perform ad-hoc queries, and analysis of large datasets in the Hadoop cluster. The overall approach with Hive is to project a table structure on the dataset and then manipulate it with HiveQL.
To make it easy to interact with Hive we use a tool in the Hortonworks Sandbox called Beeswax. Beeswax gives us an interactive interface to Hive. We can type in queries and have Hive evaluate them for us using a series of MapReduce jobs.