Refining the Raw Data

    • In the Hortonworks Sandbox virtual machine (VM) console window, press the Alt and F5 keys, then log in to the Sandbox using the following user name and password:

Login: root     Password: hadoop

After you log in, a command prompt beginning with [root@sandbox ~]# appears.

    • At the command prompt, type in the following command, then press the Enter key: hive -f hiveddl.sql

Lines of text scroll by as the script runs a series of MapReduce jobs, and the script takes a few minutes to finish. When it completes, the time taken is displayed and the normal command prompt returns.

[Screenshot: hiveddl.sql script complete]

The hiveddl.sql script has performed the following steps to refine the data:

  • Converted the raw Twitter data into a tabular format.
  • Used the dictionary file to score the sentiment of each Tweet by the number of positive words compared to the number of negative words, and then assigned a positive, negative, or neutral sentiment value to each Tweet.
  • Created a new table that includes the sentiment value for each Tweet.
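The three steps above can be sketched in plain Python to make the scoring rule concrete. This is a minimal sketch, not the contents of hiveddl.sql: the sample tweets, the `DICTIONARY` word-to-polarity mapping, and the helper names are all hypothetical, and the real script does this work in HiveQL over the full data set, so its tokenization and handling of ties may differ.

```python
import json
import re

# Hypothetical raw input: one JSON tweet per line (only "id" and "text" are used).
RAW_TWEETS = """\
{"id": 1, "text": "I love this phone, great screen"}
{"id": 2, "text": "Terrible battery and awful support"}
{"id": 3, "text": "Picked it up today"}
"""

# Hypothetical sentiment dictionary mapping a word to its polarity.
DICTIONARY = {
    "love": "positive",
    "great": "positive",
    "terrible": "negative",
    "awful": "negative",
}


def to_rows(raw_json_lines):
    """Step 1: flatten raw JSON tweets into (id, text) rows."""
    rows = []
    for line in raw_json_lines.splitlines():
        if line.strip():
            tweet = json.loads(line)
            rows.append((tweet["id"], tweet["text"]))
    return rows


def score(text, dictionary):
    """Step 2: number of positive words minus number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(1 for w in words if dictionary.get(w) == "positive")
    negatives = sum(1 for w in words if dictionary.get(w) == "negative")
    return positives - negatives


def label(score_value):
    """Map a numeric score to a positive, negative, or neutral sentiment value."""
    if score_value > 0:
        return "positive"
    if score_value < 0:
        return "negative"
    return "neutral"


def build_sentiment_table(raw_json_lines, dictionary):
    """Step 3: build a new table of (id, text, sentiment) rows."""
    return [
        (tweet_id, text, label(score(text, dictionary)))
        for tweet_id, text in to_rows(raw_json_lines)
    ]


if __name__ == "__main__":
    for row in build_sentiment_table(RAW_TWEETS, DICTIONARY):
        print(row)
```

Here the first sample tweet contains two positive words and is labeled positive, the second contains two negative words and is labeled negative, and the third matches no dictionary word and is labeled neutral.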

We can look at the data from the Hive command line. To start Hive, type hive at the prompt.

[Screenshot: starting Hive]

Remember to add the JSON SerDe JAR file with the ADD JAR command so that we can query the tables.

[Screenshot: adding the JSON SerDe JAR]

The command “show tables;” lists the tables. You can browse the data using the “select * from <table_name> limit 10;” command. The limit 10 clause returns just 10 records instead of the whole table.

[Screenshot: show tables output]

We can also use HCatalog to view the results.

[Screenshot: tweetsbi table in HCatalog]
