Project Specification 300958 Social Web Analytics
Part A
1 Aim
The aim of this project is to provide us with a chance to analyse the Social Web using knowledge obtained from this unit, with assistance from a computer-based statistical package. For this project, we will focus on identifying a chosen company’s Twitter image.
2 Method
To complete this project:
1. Read through this specification.
2. Choose a company that is active on Twitter and check that it is not already on the list of Group Project Twitter Handles. Then submit the Twitter handle of the company using the same link. Note that a given company cannot be allocated to more than one group. If duplicate company names are found on the list, the group with the later time stamp will be asked to find a new company.
3. Complete the data analysis required by the specification.
4. Write up your analysis using your favourite word processing/typesetting program, making sure that all of the working is shown and presented well. Include all the R code along with its output in your assignment.
5 Report Format
Once the required analysis is performed by the group, the members of the group are to write up the analysis as a report. Remember that the assessor will only see the group's report and will mark the group's analysis based on that report. Therefore, the report should contain a clear and concise description of the procedures carried out, comments on the code, explanations of what you tried to do, the analysis of results and any conclusions reached from the analysis.
6. Include the student declaration text on the front page of your report. Please make sure that the names and student numbers of each group member are clearly displayed on the front page. If a group member did not contribute to any part of the project, do not put their name on the cover (no contribution means a mark of 0).
7. Submit the report as a PDF by the due date using the Submit Group Project link. More detailed screenshots of your code should be placed in the Appendix of the assignment. Include comments in the code to explain what you tried to do.
The required analysis in this specification covers the material presented in lectures and labs. Students should use the computer software R to carry out the required analysis and then present the results from the analysis in the report.
By including this statement, we the authors of this work, verify that:
· We hold a copy of this assignment that we can produce if the original is lost or damaged.
· We hereby certify that no part of this assignment/product has been copied from any other student's work or from any other source except where due acknowledgement is made in the assignment.
· No part of this assignment/product has been written/produced for us by another person except where such collaboration has been authorised by the subject lecturer/tutor concerned.
· We are aware that this work may be reproduced and submitted to plagiarism detection software programs for the purpose of detecting possible plagiarism (which may retain a copy on its database for future plagiarism checking).
· We hereby certify that we have read and understand what the School of Computing and Mathematics defines as minor and substantial breaches of misconduct as outlined in the learning guide for this unit.
Note: An examiner or lecturer/tutor has the right not to mark this project report if the above declaration has not been added to the cover of the report.
8 Project Description PART A (due Week 10, Friday 11:59 pm)
A company is investigating its public image and has approached your team to identify what the public associates with the company name. The company wants the following three pieces of analysis to be performed in your first report.
8.1 Analysing the source of the tweets
In this section, we want to find out which sources people use when tweeting about the company.
1. Use the search_tweets function from the rtweet library to search for 750 tweets about the company you selected. Save these tweets as “tweets.about”.
2. Examine the source column to see the source of each tweet. Find out how many different levels of source exist in your tweets.
3. Obtain a vector of frequencies of each different source.
4. Create a data frame to save this information, where the first column contains the source names and the second column contains the source counts.
5. List the 10 most frequent tweet sources and draw a bar plot of their frequencies. Make sure each bar is labelled with the name of its source.
6. Comment on the bar plot.
7. The company owner claims that Twitter users are equally likely to use ‘Twitter for iPhone’, ‘Twitter for Android’ and ‘Twitter Web Client’ when they post a tweet about the company. Use your tweet sample to test at a 5% level of significance whether this claim is true (Hint: First find the frequencies of these sources in your data frame and save these counts in a vector, then apply the appropriate statistical test).
7. Comment on your findings.
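The steps in this section can be sketched in base R as follows. This is only an illustration under stated assumptions: the source vector is simulated (it stands in for the source column of tweets.about, which would normally come from search_tweets), the source names and probabilities are made up, and a chi-squared goodness-of-fit test is used as one plausible choice of "appropriate statistical test" for an equal-likelihood claim.

```r
# ASSUMPTION: simulated sources standing in for tweets.about$source.
# With real data, tweets.about would come from rtweet, for example
#   tweets.about <- search_tweets("company name", n = 750)
set.seed(1)
source.names <- c("Twitter for iPhone", "Twitter for Android",
                  "Twitter Web Client", "TweetDeck", "Hootsuite")
tweet.source <- sample(source.names, size = 750, replace = TRUE,
                       prob = c(0.40, 0.30, 0.20, 0.06, 0.04))

# Steps 2-3: number of distinct sources and the frequency of each
source.counts <- table(tweet.source)
n.sources <- length(source.counts)

# Step 4: data frame with source names and source counts
source.df <- data.frame(source = names(source.counts),
                        count = as.vector(source.counts),
                        stringsAsFactors = FALSE)

# Step 5: ten most frequent sources and a labelled bar plot
source.df <- source.df[order(source.df$count, decreasing = TRUE), ]
top.sources <- head(source.df, 10)
print(top.sources)
barplot(top.sources$count, names.arg = top.sources$source,
        las = 2, cex.names = 0.7, ylab = "Number of tweets")

# Step 7: test H0 "the three sources are equally likely" at the 5% level.
# Note: match() returns NA for a source that never occurs in the sample;
# such a count would have to be set to 0 before testing.
claimed <- c("Twitter for iPhone", "Twitter for Android",
             "Twitter Web Client")
observed <- source.df$count[match(claimed, source.df$source)]
test <- chisq.test(observed, p = rep(1 / 3, 3))
print(test)
reject.claim <- test$p.value < 0.05
```

If reject.claim is TRUE, the sample gives evidence against the owner's claim at the 5% level; otherwise the claim cannot be rejected.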
8.2 Word-cloud of the company tweets and public tweets
In this section we want to visualize the similarity between the company tweets and public tweets as well as the language used in the tweets.
8. Download the last 750 tweets from the chosen Twitter handle’s timeline, and save as tweets.company.
9. After doing pre-processing,
a. Construct a document term matrix of TFIDF weights of tweets.company.
b. Construct a document term matrix of TFIDF weights of tweets.about.
10. Construct word clouds of the words in tweets.about and tweets.company. Comment on both word-clouds.
11. Combine (merge) tweets.about with tweets.company and construct the document-term-matrix of the merged tweets using TFIDF weighting.
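A minimal base R sketch of the pre-processing and TFIDF steps above is given here. The tweet texts are made-up examples, the helper names tokenise and tfidf.matrix are hypothetical, and the weighting used (term count multiplied by log(N / document frequency)) is an assumption; the variant taught in the unit may differ.

```r
# ASSUMPTION: toy texts standing in for tweets.company$text and
# tweets.about$text.
company.text <- c("Our new phone launches today! https://example.com",
                  "Thanks for the feedback, we are improving battery life")
about.text <- c("The new phone battery is terrible",
                "Love the new phone camera, great launch")

# Minimal pre-processing: lower case, drop URLs, keep letters only
tokenise <- function(text) {
  text <- tolower(text)
  text <- gsub("http[^ ]*", " ", text)
  text <- gsub("[^a-z ]", " ", text)
  tokens <- strsplit(text, " +")
  lapply(tokens, function(w) w[nchar(w) > 0])
}

# Document-term matrix (rows = tweets, columns = terms) with TFIDF weights
tfidf.matrix <- function(text) {
  tokens <- tokenise(text)
  vocab <- sort(unique(unlist(tokens)))
  tf <- matrix(0, nrow = length(tokens), ncol = length(vocab),
               dimnames = list(NULL, vocab))
  for (i in seq_along(tokens)) {
    if (length(tokens[[i]]) > 0) {
      counts <- table(tokens[[i]])
      tf[i, names(counts)] <- as.vector(counts)
    }
  }
  doc.freq <- colSums(tf > 0)          # number of tweets containing each term
  idf <- log(nrow(tf) / doc.freq)      # terms in every tweet get weight 0
  sweep(tf, 2, idf, "*")
}

company.dtm <- tfidf.matrix(company.text)                 # question 9a
about.dtm <- tfidf.matrix(about.text)                     # question 9b
merged.dtm <- tfidf.matrix(c(about.text, company.text))   # question 11

# Question 10: total weight per word, the usual input to a word cloud.
# If the wordcloud package is installed, something like
#   wordcloud(names(about.weights), about.weights)
# would draw the cloud; it is left as a comment to keep this sketch base R.
about.weights <- sort(colSums(about.dtm), decreasing = TRUE)
company.weights <- sort(colSums(company.dtm), decreasing = TRUE)
print(head(about.weights))
```

A real analysis would normally also remove stop words and stem the remaining words during pre-processing; those steps are omitted here to keep the example short.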
8.3 Connection between public and the company
In this section, we want to categorize (cluster) all the tweets and want to determine which topics are dominated by public tweets.
12. Compute the most appropriate number of clusters using the elbow method for the merged tweets you calculated in question 11.
13. Cluster the merged tweets using the most appropriate clustering method.
14. Visualize your clustering in 2-dimensional vector space. Show each cluster in a different colour and the tweets in tweets.about and tweets.company with different symbols in your visualization.
15. Comment on your visualization.
16. Compute the proportion of tweets.company in each cluster. Print these proportions for all clusters.
17. Which cluster is dominated by tweets.about? Print the top 20 words in that cluster and comment on the theme of this cluster.
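The clustering questions above can be sketched in base R as follows. Everything about the data is an assumption made for illustration: the weighted matrix is simulated with six made-up terms, the first 35 rows play the role of tweets.about and the last 25 the role of tweets.company, k-means is used as one possible clustering method, and the number of clusters is fixed at 2 where a real analysis would read it from the elbow plot.

```r
set.seed(42)
# ASSUMPTION: simulated weights standing in for the merged TFIDF matrix.
terms <- c("battery", "camera", "price", "launch", "support", "update")
origin <- rep(c("about", "company"), times = c(35, 25))
topic <- c(rep(1, 25), rep(2, 10), rep(1, 5), rep(2, 20))
merged.dtm <- matrix(rexp(60 * 6, rate = 5), nrow = 60, ncol = 6,
                     dimnames = list(NULL, terms))
merged.dtm[topic == 1, 1:2] <- merged.dtm[topic == 1, 1:2] + 1
merged.dtm[topic == 2, 4:5] <- merged.dtm[topic == 2, 4:5] + 1

# Scale each tweet to unit length so Euclidean distance tracks cosine distance
x <- merged.dtm / sqrt(rowSums(merged.dtm^2))

# Question 12: elbow method, total within-cluster sum of squares for k = 1..8
wss <- sapply(1:8, function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss)
plot(1:8, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Within-cluster sum of squares")

# Question 13: k-means with the k chosen from the elbow (assumed to be 2 here)
fit <- kmeans(x, centers = 2, nstart = 20)

# Question 14: 2-dimensional view via classical multidimensional scaling;
# colour shows the cluster, symbol shows where the tweet came from
coords <- cmdscale(dist(x), k = 2)
plot(coords, col = fit$cluster, pch = ifelse(origin == "about", 1, 17),
     xlab = "Dimension 1", ylab = "Dimension 2")
legend("topright", legend = c("tweets.about", "tweets.company"),
       pch = c(1, 17))

# Question 16: proportion of company tweets in each cluster
prop.company <- tapply(origin == "company", fit$cluster, mean)
print(prop.company)

# Question 17: the cluster with the smallest company share is the one
# dominated by tweets.about; its heaviest centre weights are its top words
about.cluster <- as.integer(names(which.min(prop.company)))
top.words <- head(sort(fit$centers[about.cluster, ], decreasing = TRUE), 20)
print(top.words)
```

With real tweets the elbow is rarely this clear, so the chosen number of clusters should be justified in the report rather than taken from this example.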
The company wants the above three parts of the analysis to be written up as a professional report in the first deliverable. Each part should have its own section of the report and all questions should have thoughtful answers. Include all the code along with its output in your assignment.
Hint
Social media analytics is the practice of collecting data from social media sites and analysing that data with analytical tools to support business decisions. Its most basic use is mining customer sentiment in support of customer service and marketing activities.