We will use Excel Professional Plus 2013 to access the refined sentiment data.
- In Windows, open a new Excel workbook, then select Data > From Other Sources > From Microsoft Query.
- On the Choose Data Source pop-up, select the Hortonworks ODBC data source you installed previously, then click OK.The Hortonworks ODBC driver enables you to access Hortonworks data with Excel and other Business Intelligence (BI) applications that support ODBC.
- After the connection to the Sandbox is established, the Query Wizard appears. Select the “tweetsbi” table in the Available tables and columns box, then click the right arrow button to add the entire “tweetsbi” table to the query. Click Next to continue.
- Select the “text” column in the “Columns in your query” box, then click the left arrow button to remove the text column.
- After the “text” column has been removed, click Next to continue.
- On the Filter Data screen, click Next to continue without filtering the data.
- On the Sort Order screen, click Next to continue without setting a sort order.
- Click Finish on the Query Wizard Finish screen to retrieve the query data from the Sandbox and import it into Excel.
- On the Import Data dialog box, click OK to accept the default settings and import the data as a table.
- The imported query data appears in the Excel workbook.
Now that we have successfully imported the Twitter sentiment data into Microsoft Excel, we can use the Excel Power View feature to analyze and visualize the data.
In this section we will see how sentiment varies by country,
- In the Excel worksheet with the imported “tweetsbi” table, select Insert > Power View to open a new Power View report.
- The Power View Fields area appears on the right side of the window, with the data table displayed on the left. Drag the handles or click the Pop Out icon to maximize the size of the data table.
- In the Power View Fields area, clear the checkboxes next to the id and ts fields, then click Map on the Design tab in the top menu.
- The map view displays a global view of the data.
- Now let’s display the sentiment data by color. In the Power View Fields area, click sentiment, then selectAdd as Color.
- Under SIZE, click sentiment, then select Count (Not Blank).
- Now the map displays the sentiment data by color:
- Orange: positive
- Blue: negative
- Red: neutral
- Use the map controls to zoom in on Ireland. About half of the tweets have a positive sentiment score, as indicated by the color orange.
- Next, use the map controls to zoom in on the sentiment data in China.
- The United States is the biggest market, so let’s look at sentiment data there. The size of the United States pie chart indicates that a relatively large number of the total tweets come from the US.About half of the tweets in the US show neutral sentiment, with a relatively small amount of negative sentiment.
- In the Hortonworks Sandbox virtual machine (VM) console window, press the Alt and F5 keys, then log in to the Sandbox using the following user name and password:
Login: root\ Password: hadoop
After you log in, the command prompt will appear with the prefix [root@sandbox ~]#:
- At the command prompt, type in the following command, then press the Enter key:
hive -f hiveddl.sql
Lines of text appear as the script runs a series of MapReduce jobs. It will take a few minutes for the script to finish running. When the script has finished running, the time taken is displayed, and the normal command prompt appears.
The hiveddl.sql script has performed the following steps to refine the data:
- Converted the raw Twitter data into a tabular format.
- Used the dictionary file to score the sentiment of each Tweet by the number of positive words compared to the number of negative words, and then assigned a positive, negative, or neutral sentiment value to each Tweet.
- Created a new table that includes the sentiment value for each Tweet.
We can look at the data using the Hive command line. We can start Hive by typing hive at the prompt.
Remember to add the json serde jar file so we can look at the tables.
The command “show tables” will show you the tables. You can browse the data using the “select * from
limit 10;” command. The limit 10 gives you the first 10 records instead of the whole table.
We can also use HCatalog to view the results,
Sentiment data is unstructured data that represents opinions, emotions, and attitudes contained in sources such as social media posts, blogs, online product reviews, and customer support interactions.
3.5.2 Potential Uses of Sentiment Data
Organizations use sentiment analysis to understand how the public feels about something at a particular moment in time, and also to track how those opinions change over time.
An enterprise may analyze sentiment about:
- A product – For example, does the target segment understand and appreciate messaging around a product launch? What products do visitors tend to buy together, and what are they most likely to buy in the future?
- A service – For example, a hotel or restaurant can look into its locations with particularly strong or poor service.
- Competitors – In what areas do people see our company as better than (or weaker than) our competition?
- Reputation – What does the public really think about our company? Is our reputation positive or negative?
The benefits that this data offers are:
- Improves Customer Service:
It gives useful insights about current and future customers purchase preferences, brand affiliations, topics of interests, opinions, point of views on discussions, likes and dislikes in products/services and much more. This useful information lets organizations to drastically improve their customer service and engagement strategies by building on the positive sentiments and formulating methods to combat negative sentiments.
- Reviving Brand:
One of the best uses of this data is that it allows organizations to quantify perceptions – about their brand, products and services, marketing campaigns, social engagement initiatives, online content etc. Organizations can use this information for devising better and more effective branding and marketing strategies and thus improve your brand reputation.
- Beats Competition:
This data lets the organizations to know sentiments surrounding their competitors too. This allows to benchmark their performance against that of their competitors.
- Measure Effectiveness of Marketing Campaigns:
Analyze changes in sentiment relative to specific campaigns, audiences and social outlets, and can quickly identify positive talking points around the brand to measure, inform and evaluate marketing strategy.
Flume needs agents that connect to data sources to work. Agents are written in Java–and you could write your own agent. Luckily for us, Cloudera provides a sample Twitter agent for Flume that we’ll use in this tutorial. Although shared by Cloudera, it will work with other Hadoop distributions as well.
Copy the .jar file to the /usr/lib/flume/lib folder on the node where you installed the Flume software.
Now that the agent code is in place, we need to configure flume to create an agent using the class in that .jar. We do this by updating the /etc/flume/conf/flume.conf file.
Then make the following changes. Note that the configuration file uses the term “consumerKey” and “consumerSecret“. Twitter now calls these “API Key” and “API Secret“, respectively. Simply substitute in the keys from the Twitter app.
The TwitterAgent.sources.Twitter.keywords contains a comma-separated list of words used to select which tweets needed to be added to HDFS.
The TwitterAgent.sinks.HDFS.hdfs.path provides the path from the name node where the tweets should be saved. Be sure that the user running the Flume agent can write to this HDFS file location.
Now that the configuration is complete, start the flume agent. Since the agent needs to be continue running even after closing ssh session, start the process using NOHUP:
nohup flume-ng agent –conf-file /etc/flume/conf/flume.conf –name TwitterAgent >flume_twitteragent.log &
As the agent begins running, monitor its progress by using the tail command against the log file (specified on the previous command line) with the “follow” flag:
tail -f flume_twitteragent.log
And of course the “acid test” is to look at the files being collected in HDFS:
hadoop fs -ls /root/flume/tweets
The next step is to install Flume on the Hadoop cluster node you’ll use as the flume agent. Pick a node that has a client role, as the agent will be connecting with the name node, and sending files to HDFS nodes as data is streamed.
Flume is easy to install with HDP. Just run the following two Yum commands as root (or use sudo if not logged in as root):
yum install flume
yum install flume-node
After these two commands are complete, the Flume code is installed and ready to be used.
- You probably already have a twitter account. If you don’t, create one on Twitter.com.
- Next, browse to dev.twitter.com, login with your Twitter ID, and read through some of the introductory API material on Twitter Apps. A “Twitter App” can take many forms. Any piece of software that interacts with twitter on behalf of a user is an app.A mobile phone app you can write as an alternative to the one Twitter provides is a “Twitter App”. The Flume agent we’ll create is an “App” too.
- Next browse to apps.twitter.com, and click the Create New App button.
You’ll be redirected to the management page for your new app. Switch to the API Keys tab, and click the create my access token button.
OK, you’re done! There are four pieces of information you need to copy from the form before we go to the Hadoop cluster to setup the Flume agent:
- API key
- API secret
- Access token
- Access token secret
These four keys will be added to a configuration file in Flume, enabling it to use your Twitter account to stream tweets. Copy the four into a text file for later reference.