Automatic Detection of Rumors on Twitter
Research Proposal
Recently, the problem of online fake news has attracted increasing attention in an era in which content created by internet users is a major force in shaping and disseminating news stories. Misinformation spreads widely through popular social media platforms such as Twitter, which can be traced back to the open and uncontrolled nature of these platforms. Consequently, there is a need to build models that are capable of assessing tweet messages and judging their credibility.
In this project, we aim to build a classification model to classify Saudi Arabia news posted on Twitter as rumor or non-rumor. A set of features will be extracted from rumor and non-rumor tweets to train a model that will be built using a supervised learning approach. To the best of our knowledge, this is the first work that assesses the credibility of Saudi Arabia news.
Keywords— Social Media Mining, Rumor Detection, Credibility, Twitter
Today, anyone can easily create their own content and share it with the public through the internet. Social media has transformed communication from the unidirectional model of television, print newspapers, and radio to multi-directional communication in which everyone can take part. Social media facilitated the move from a world where a small number of people report, create, and make decisions to a world where everyone has opportunities for active participation (Fischer & Reuber, 2011).
Social media are online communication channels that rely on user-generated content (UGC), where users are capable of creating online content to share information, ideas, personal messages, pictures, and videos (Moturu & Liu, 2011). Social media comprise many platforms, which are constantly growing and becoming popular all over the world. Hundreds of millions of people use these platforms on a daily basis. Facebook has exceeded 2.23 billion users, and the number is increasing steadily. Twitter is also highly popular; according to www.statista.com, it was among the most popular social networks worldwide as of July 2018, with over 300 million monthly active users. Instagram, a mainly mobile photo-sharing network where users exchange photos and videos, has reached 1 billion users all over the world.
Twitter gained high interest among Arab users, especially during the Arab Spring events in 2011 (O’Donnell, 2011). According to the report of Salem (2017), Saudi users rank top in terms of number of users, posted content, and interaction. In Saudi Arabia, Twitter is not just a conventional social media platform; it has turned into a huge platform for advertisements, customer service, travel and restaurant experiences, entertainment content, and news sharing.
With the development of social networks, the amount of information has increased dramatically, and Twitter and other platforms have become an important source of opinions, news, and other information about current events. Given the large volume of information produced in these outlets, its quality and credibility fluctuate considerably. All kinds of misinformation, especially rumors, permeate nearly every corner of social networks. This, in turn, affects readers’ opinions and the accuracy of the tasks carried out on such information (Zhang, Zhang, Dong, Xiong, & Cheng, 2015).
• Problem Statement
Social media have become an important news resource. The Reuters Institute for the Study of Journalism (2013) reported that people find social media one of the most important channels of news sources. It has been observed in many outbreak events, the Arab Spring revolution for instance, that news spread faster on popular social media than in conventional news media (Khondker, 2011). This shows how much social media affect news sources and, eventually, their interpretation and content. However, the available research (Allcott & Gentzkow, 2017; Castillo, Mendoza, & Poblete, 2011; Schmierbach & Oeldorf-Hirsch, 2012) shows that many people have published fake news on Twitter, with several examples where Twitter has led to the dissemination of misinformation, especially during disaster events such as Hurricane Sandy in 2012 (Aditi, Hemank, Ponnurangam, & Anupam, 2013), the earthquake in Chile in 2010 (Mendoza, Poblete, & Castillo, 2010), and the Boston Marathon blasts in 2013 (Gupta, Lamba, & Kumaraguru, 2013). The wide spread of this misinformation can be traced back to the open and uncontrolled nature of social media, where everyone is able to post fake messages on Twitter and take part in discussions about everything.
In conventional media such as newspapers, the source of the news is known, and the owner of the medium takes responsibility for the content. On Twitter, however, the source of the news may be compromised, and there is minimal liability for the published content. In many cases, the username is the only information we know about the source.
The spread of fake news changes how people interpret and react to true news. For instance, some news has been fabricated and intended to prompt people’s mistrust and confusion, which to some extent hinders their ability to differentiate true news from false news (Shu, Sliva, Wang, Tang, & Liu, 2017).
Users on social media lack all the facts of an event that would allow them to decide the truth of the information they are exposed to; inexperienced users especially are thus easily misled by unreliable information. For example, Hotez (2016) argued that social media played a serious role in the Texas anti-vaxxer movement, denying children vaccination, by circulating side effects of vaccination with no substantial scientific evidence. Statistics show that the rate of children not receiving their annual school vaccines jumped nearly 20-fold over the last decade. He believes that allowing this circulation to continue without immediate interference would increase the mortality rate and have catastrophic consequences.
• Research Motivation
Nowadays, Twitter is no longer just a social network; it has become one of the most important sources for publishing and sharing news. Because the content on Twitter is created and published by different users from diverse backgrounds, such an environment can be misused to spread fake news and rumors, intentionally or unintentionally. Consequently, there is a need to build models that are capable of assessing tweets and judging their credibility, that is, whether they are a rumor or not.
Al-Saggaf and Chutikulrungsee (2015) found that Twitter usage differs from one country to another and is affected by the country’s culture. As will be shown in the literature review, credibility in social media has been studied thoroughly from the perspective of Western countries, but little research has addressed this critical field from the perspective of the Arab world, especially Saudi Arabia. This motivated us to conduct this research and build a model that assesses the credibility of Saudi Arabia news on Twitter.
• Research Aim and Objectives
In this project, we aim to build a model for classifying tweets into true and fake news. A crowdsourcing web application will be built to annotate tweets collected from Twitter and to develop a labeled dataset of true and fake news covering various topics. Several features will then be extracted, and different machine learning approaches will be applied to train classifiers for rumor classification. We aim to develop this model in a Saudi Arabia news context. We also aim to answer the following questions:
- Can distinctive features be extracted from rumor and non-rumor tweets to train a classifier that evaluates tweet credibility with high precision and recall?
- Which features have a high impact on the accuracy of the classifier?
- Does the classification accuracy vary depending on the dimensionality reduction technique used?
• Delimitation
Although, to the best of our knowledge, no similar work deals with rumors written in Arabic, this research is limited to the detection of rumors about Saudi Arabia written in English. We did not choose the Arabic language for two reasons. First, according to a recent study by El-Masri et al. (2017), few tools and libraries are available to extract Arabic sentiment from text. This, in turn, makes processing the morphologically complex Arabic natural language a difficult challenge that requires more than the time available for us to finish the project. Second, this complex task is further exacerbated when dealing with the dialects of Saudi Arabia, which do not adhere to the formal grammatical structures of Modern Standard Arabic (El-Masri, Altrabsheh, & Mansour, 2017). Table 1 below shows simple examples illustrating the differences between the Hijazi dialect, the Najdi dialect, and Modern Standard Arabic.
Table 1: Differences between the Hijazi dialect, the Najdi dialect, and Modern Standard Arabic

English word | Hijazi dialect | Najdi dialect | Modern Standard Arabic
What | إيش | وش | ماذا
Window | طاقه | شباك | نافذة
We hope that in the future, tools and resources will become available to help us process the Arabic language. In addition, the experience gained from completing this project will help us face the challenges that arise when dealing with the Arabic language in a future project aimed at detecting rumors in Arabic.
• Research Contribution
The main contributions of this project are:
- To the best of our knowledge, this is the first work that assesses the credibility of Saudi Arabia news.
- To achieve this task, we will build a model to automatically predict the credibility of Saudi Arabia news on Twitter.
- This project will provide recommendations for selecting the dataset size that maximizes the accuracy of the classifiers.
- The dataset built to train the classifiers will be made available for public access.
Literature Review
Wyrwoll (2014) categorized the platforms according to the type of metadata they provide, arriving at eight categories:
- Forums
- Microblogs
- Question and answer platforms
- Blogs
- Media sharing platforms
- Location sharing and annotation platforms
- Social networks
- Rating and review platforms
Microblogs are platforms that allow users to post and share information by broadcasting real-time, short messages, and they do not require any user-side design or publishing skills.
Twitter is a well-known microblog provider where users can, on a real-time basis, post textual content, images, videos, and links and share them with family, friends, and strangers. A tweet (also referred to as a post, message, or status) is the content unit published on Twitter. Initially, users were allowed to post a maximum of 140 characters, but in late 2017, Twitter extended this limit to 280 characters, allowing users to express themselves more effectively. Each tweet has two main components: the content of the tweet and the user who posted it. Besides tweets, Twitter introduced the “hashtag”, a way to categorize tweet topics that makes it easier for users to search and browse tweets with similar content. For example, searching for #Riyadh returns all the tweets written by other users that mention the same hashtag (Java, Song, Finin, & Tseng, 2007).
• Rumor Definition
Previous literature did not agree on a single definition; the definition of a rumor differs from one study to another. A number of studies defined rumors as incorrect information (Cai, Wu, & Lv, 2014; Liang, He, Xu, Chen, & Zeng, 2015), while most studies based their definition on the distinctive feature of rumors: they are not verified at the time of posting.
DiFonzo and Bordia (2007) define rumors as “unverified and instrumentally relevant information statements in circulation that arise in contexts of ambiguity, danger or potential threat, and that function to help people make sense and manage risk”. This is in agreement with the definitions provided by well-known dictionaries, for instance the Oxford Dictionary, which defines a rumor as “A currently circulating story or report of uncertain or doubtful truth” (“Rumour,” n.d.), and the Merriam-Webster Dictionary, which offers a similar definition: “a statement or report current without known authority for its truth” (“Rumor,” n.d.).
We can conclude that a rumor is an item of information that has not yet been verified: there is no evidence to support or reject it, and it has not been confirmed or denied by official or credible sources. Thus, its truth value remains unresolved at the time of its circulation, and it may later prove to be true, partially false, or completely false.
There are two types of research on rumors: detecting rumor information and finding rumor sources (Cossu, Labatut, & Dugué, 2016). Most prior research addressed rumor detection, and the approaches used varied between supervised, unsupervised, and hybrid (Alzanin & Azmi, 2018).
• Detecting Rumors in Twitter
An extensive body of research has been carried out to automatically detect rumors and evaluate the credibility of tweets using approaches based on different methods and factors. In this section, we focus only on closely related work that takes a supervised learning approach to classify or rank the credibility of tweets.
1. Decision tree algorithm
Castillo et al. (2011) is one of the earliest works on the automatic prediction of tweet credibility. Their method aimed to predict topic credibility rather than the credibility of individual tweets. They used supervised classifiers to accomplish two tasks: first, automatically determine whether tweets describe a newsworthy event or an informal conversation; second, assess the credibility of the news topics. Data annotation was done in two rounds using a crowdsourcing tool, namely Amazon Mechanical Turk. The first round separated topics that spread information about a news event from cases corresponding to conversations or personal opinions; 113 cases were labeled as news and 134 as chat. In the second round, evaluators indicated the credibility level of the news cases by giving one of four labels: almost certainly true, likely to be false, almost certainly false, and cannot be decided. The dataset features were grouped into four classes: message, user, topic, and propagation. Several learning algorithms were trained; the J48 decision tree achieved the best results, with 68% accuracy. The results showed that some features, such as the fraction of tweets containing URLs and the number of retweets, were more effective for classifying topics as credible or not credible.
A similar study, but based on individual tweets, was carried out by Kang, O’Donovan, and Höllerer (2012). They built three models to assess tweet credibility based on source features, content features, or both. Several indicators were discovered that can help determine credibility, such as the number of followers: there was a remarkable correlation between the credibility of a tweet and the number of followers its author has. A J48 decision tree algorithm was used, and the results show that the social model (based on source features only) outperforms the hybrid and content-based models in the accuracy of predicting tweet credibility.
2. Support Vector Machines (SVM) algorithm
Gupta and Kumaraguru (2012) assessed the credibility of tweets and ranked them by credibility score, adopting supervised machine learning and a Pseudo Relevance Feedback approach. Human annotators annotated 7,000 tweets about 14 topics to construct the credibility ground truth. Content-based and user-based features were fed into an SVM ranking algorithm. The most frequent unigrams were then extracted from the top tweets, and text similarity between these repeated unigrams and the top tweets was used to re-rank them via the Pseudo Relevance Feedback (PRF) technique. The accuracy of the credibility ranking improved after applying the PRF technique, reaching 73%. They observed that content-level features are no less important than user-level features, and that features such as the number of swear words, the number of unique characters, the number of followers, and the length of the username play an important role in predicting tweet credibility.
Yang et al. (2012) carried out a study similar to the work of Castillo et al. (2011), except that it was conducted on Sina Weibo (a Chinese microblogging service) instead of Twitter. They studied rumor characteristics to extract features that can be used to build a classifier able to automatically predict whether a post is a rumor. Nineteen features were extracted; some had been studied previously, and two new features were proposed: the client program used to post a microblog and the event location mentioned by rumor-related microblogs. A series of experiments using an SVM classifier studied the impact of the features on rumor classification. The results showed that classification accuracy improved after adding the new features, rising from 72% to 77-78%.
Liu et al. (2015) proposed a real-time rumor debunking algorithm for Twitter. They focused on identifying rumors as events that can include one or more conflicting tweets. To create the dataset, they crawled snopes.com and emergent.info to find rumor events and then used the Twitter API to fetch relevant tweets; the final dataset consisted of 421 true and 421 false events. They proposed several features belonging to six categories: credibility, identity, diversity, and location of the source, in addition to message belief and event propagation. Most of the features overlapped with those used by Yang et al. (2012) and Castillo et al. (2011); only 16 of the proposed features were new. They suggested an SVM classifier that uses the beliefs of the crowd in conjunction with the adopted features. The evaluation results show that their classifier outperforms the models from Yang et al. (2012) and Castillo et al. (2011). Moreover, they compared their approach with human-based rumor debunking services, and the results show that their classifier can detect 75% of rumors faster than the corresponding services.
A framework was proposed by Krishnan and Chen (2018) to identify tweets containing fake news using several techniques, including Google reverse image search, cross-verification of fake news sources, statistical analysis, and data mining. The framework consists of two main parts: a core and a website. The core is responsible for fetching tweets from Twitter, creating a set of features, classifying the tweets, and providing the evaluation report. The website presents the tweet credibility predictions and the details of each tweet. For any given tweet, 7 user features and 13 content features are extracted. J48 decision tree and support vector machine classifiers are used to train the model for the classification task. The results of their experiments demonstrated the effectiveness of the framework.
3. Bayesian algorithm
A study on predicting tweet credibility was carried out by Xia, Yang, Wu, Li, and Bao (2012), focusing on the credibility of information on Twitter in emergency situations. First, they built a novel model using an unsupervised learning algorithm based on a dynamic keyword set to monitor Twitter and semi-automatically detect emergency situations. Second, they proposed a supervised classifier capable of predicting the credibility of tweets published about emergency situations such as riots and earthquakes. A total of 350 tweets about the British riots were collected and manually labeled by 5 experts as either credible or not credible. The classification accuracy of their proposed CIT Bayesian network structure learning algorithm reached 63%.
4. Conditional Random Fields
On the basis of their definition of rumors as circulating information that has not yet been confirmed, Zubiaga, Liakata, and Procter (2017) built a rumor detection system to differentiate between verified and unverified tweets. In their view, this would be very useful for limiting the dissemination of tweets that may later prove to be false and for warning users before believing or sharing them. Their dataset comprised 5,802 tweets labeled by journalists as rumors and non-rumors. Twelve features were used and tested: 7 content-based and 5 user-based. The researchers used Conditional Random Fields (CRF) to test their hypothesis that “aggregating rumourous and non-rumourous posts preceding the tweet being classified will improve the performance of the rumor detection system”. CRF is used as a sequential classifier, enabling aggregation of the preceding tweets in the thread of the tweet being classified. A Maximum Entropy classifier, the non-sequential equivalent of CRF, was used to test the validity of the hypothesis. The results showed that the performance of CRF was substantially better than that of the other classifiers it was compared against.
Research Methodology
Our proposed methodology is detailed here and shown in the next figure. We formulate the rumor detection task as a classification problem and will use a supervised learning approach to build a model that automatically classifies news as rumor or not. We selected this approach because our goal is to train a classifier on a labeled dataset to make predictions on unseen data.
1. Data Collection
We will scrape keyword-specific tweets from Twitter using Twitter’s APIs and store them in a database. We will select popular Saudi Arabia news events that create a flurry of activity in the network, covering various domains such as political, financial, sport, and entertainment news. The TwitterSearch Python library will be used to pull tweets from Twitter.
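The collection step can be sketched as follows. This is a minimal sketch assuming the TwitterSearch library and valid API credentials; the keyword list and the exact set of stored fields are illustrative choices, not a fixed design.

```python
def tweet_to_record(tweet):
    """Flatten a raw Twitter API payload into the fields we plan to store."""
    user = tweet.get("user", {})
    return {
        "id": tweet["id"],
        "text": tweet["text"],
        "retweet_count": tweet.get("retweet_count", 0),
        "screen_name": user.get("screen_name", ""),
        "followers_count": user.get("followers_count", 0),
        "verified": user.get("verified", False),
    }

def fetch_tweets(keywords, credentials):
    """Pull keyword-specific tweets with the TwitterSearch library.
    `credentials` is a dict with consumer_key, consumer_secret,
    access_token, and access_token_secret (placeholders here)."""
    from TwitterSearch import TwitterSearch, TwitterSearchOrder
    tso = TwitterSearchOrder()
    tso.set_keywords(keywords)   # e.g. ["#Riyadh"] -- illustrative keyword
    tso.set_language("en")       # the project targets English tweets
    ts = TwitterSearch(**credentials)
    # the library loads subsequent result pages automatically while iterating
    return [tweet_to_record(t) for t in ts.search_tweets_iterable(tso)]
```

Each flattened record can then be inserted directly into the database as one document per tweet.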
2. Data Annotation
For the purpose of annotation, a crowdsourcing web application will be created so that people can annotate tweets. Many problems benefit from the wisdom of the crowd, where an answer aggregated from a large number of people produces viable solutions (Brabham, 2008). Human annotation to establish the ground truth is a well-established research methodology (Castillo et al., 2011). Majority voting will be used to aggregate the users’ responses. The web application will display tweets to the user one at a time, with four options displayed for each tweet:
- Tweet is a rumor
- Tweet is non-rumor
- Tweet is not related (if the tweet is relevant to the topic but does not contain information that helps the reader gain knowledge about the topic, or if the tweet is spam)
- Skip (if user can’t decide)
For each tweet, the label selected by each user will be saved in the database. Once we have collected a sufficient number of responses, we will be ready to start the next phase.
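The majority-voting aggregation described above can be sketched as a small function. The label names and the minimum-vote threshold are illustrative assumptions; "skip" responses are discarded rather than counted.

```python
from collections import Counter

LABELS = ("rumor", "non-rumor", "not-related")

def aggregate_votes(votes, min_votes=3):
    """Aggregate crowd responses for one tweet by majority voting.
    Returns the winning label, or None while there are too few usable
    votes or the vote is tied (i.e. keep collecting responses)."""
    counted = [v for v in votes if v in LABELS]  # drop "skip" answers
    if len(counted) < min_votes:
        return None
    tally = Counter(counted).most_common(2)
    if len(tally) > 1 and tally[0][1] == tally[1][1]:
        return None  # tie between the top two labels
    return tally[0][0]
```

Tweets that end up labeled "not-related" would simply be excluded from the training set.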
3. Data Preprocessing
The goal of this phase is to represent the data in a format that can be efficiently analyzed and to improve its quality so that it is more meaningful and informative. This phase includes tasks such as:
- Converting user responses into numerical features.
- Preprocessing the content, e.g., removing stop words, stemming, and so on.
- Applying different scalings and transformations of the data in order to decide on the structure of the data for the next phase.
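The text-cleaning step above can be sketched in a few lines. This is a simplified stand-in: the stop-word list is a tiny illustrative subset, and the suffix stripper is a crude placeholder for a real stemmer such as Porter's.

```python
import re

# tiny illustrative stop-word list; a real run would use a fuller lexicon
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "in", "and"}

def preprocess(text):
    """Normalize a tweet: lowercase, strip URLs and @mentions,
    drop stop words, and apply a crude suffix stemmer."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    tokens = re.findall(r"[a-z]+", text)
    tokens = [t for t in tokens if t not in STOP_WORDS]
    stemmed = []
    for t in tokens:
        for suffix in ("ing", "ed", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        stemmed.append(t)
    return stemmed
```

The resulting token lists feed the content-based feature computations in the next phase.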
4. Feature extraction
Feature extraction is a key step in classification. We will rely on related studies to identify the features used to assess and analyze tweets, taking into consideration that most of the features described in the literature are very heterogeneous, with numerous close variants and overlapping names. Table 2 displays some of the features we plan to extract and compute, along with their type.
Table 2: List of some features and their type

Type | Feature
Content-based | Number of comments; Number of retweets; Number of mentions; Presence of URL; Number of duplications
Author-based | Number of followers; Number of friends; Verified account; Whether the account has a description; Number of user’s past tweets
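A few of the content- and author-based features in the table above can be computed directly from a tweet payload. The field names follow the Twitter API; the tweet structure here is a simplified assumption of what the collection phase stores.

```python
def extract_features(tweet):
    """Compute a handful of content-based and author-based features
    from a (simplified) tweet payload."""
    text = tweet["text"]
    user = tweet["user"]
    return {
        # content-based features
        "n_retweets": tweet.get("retweet_count", 0),
        "n_mentions": text.count("@"),
        "has_url": int("http://" in text or "https://" in text),
        "length": len(text),
        # author-based features
        "n_followers": user.get("followers_count", 0),
        "n_friends": user.get("friends_count", 0),
        "verified": int(user.get("verified", False)),
        "has_description": int(bool(user.get("description"))),
        "n_past_tweets": user.get("statuses_count", 0),
    }
```

Each tweet thus becomes a fixed-length numeric vector suitable for the classifiers in the later phases.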
5. Feature Selection and Dimensionality Reduction
Using more features than necessary leads to overfitting and reduces the performance of the classifier (Guyon & Elisseeff, 2003). Feature reduction is an active field of research, and many techniques have been developed (Verleysen & François, 2005). We will try different feature reduction methods to avoid overfitting and to reduce the complexity and time of the computational tasks. To find the best method, we will apply each of them to our dataset and evaluate and compare the results.
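As one simple example of the filters we will compare, a variance threshold discards features that are nearly constant across the dataset and therefore carry little discriminative information. The feature names and threshold below are illustrative.

```python
def variance(column):
    """Population variance of one feature column."""
    mean = sum(column) / len(column)
    return sum((x - mean) ** 2 for x in column) / len(column)

def select_by_variance(rows, feature_names, threshold=0.0):
    """Keep only the features whose variance across all rows exceeds
    the threshold; rows are feature vectors in feature_names order."""
    kept = []
    for j, name in enumerate(feature_names):
        col = [row[j] for row in rows]
        if variance(col) > threshold:
            kept.append(name)
    return kept
```

Wrapper methods and projection techniques such as PCA would be evaluated the same way, by comparing downstream classifier performance.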
6. Classification
After feature selection, the dataset will be split into training and testing sets. Different machine learning algorithms will be tested to find the best model.
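A sketch of this comparison, assuming scikit-learn is available, might train the algorithm families seen in the literature review (decision tree, SVM, naive Bayes) on one split and report the best performer. The split ratio and random seed are arbitrary illustrative choices.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

def best_classifier(X, y, test_size=0.25, seed=42):
    """Fit several candidate classifiers on one train/test split and
    return the name and held-out accuracy of the best one."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=seed)
    models = {
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "svm": SVC(),
        "naive_bayes": GaussianNB(),
    }
    scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
              for name, m in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

In practice we would use cross-validation rather than a single split before committing to one model.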
7. Evaluation metrics
Finally, we will apply the classifier to the testing data to evaluate the effectiveness of the classification technique. We will use a set of well-known classification metrics: precision, recall, accuracy, and F1 measure. Precision is the proportion of tweets predicted as a given class that actually belong to it. Recall is the proportion of tweets of a given class that are classified correctly. F1 is the harmonic mean of precision and recall. Accuracy is the proportion of correctly classified tweets out of the total number of tweets. These will be calculated as follows:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × Precision × Recall / (Precision + Recall)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
where TP, FP, TN, and FN denote true positives, false positives, true negatives, and false negatives, respectively.
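These standard metrics can be computed directly from the confusion counts. The label names are illustrative; "rumor" is treated as the positive class.

```python
def confusion_counts(y_true, y_pred, positive="rumor"):
    """Count true/false positives and negatives for the chosen class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

def metrics(y_true, y_pred, positive="rumor"):
    """Precision, recall, F1, and accuracy from predicted labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}
```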
Required Tools:
- The data mining tool chosen for the project is Python, a high-level language with a large number of machine learning libraries.
- The database will be built using an open-source document database with high scalability and flexibility, which has been shown to be suitable for storing tweets and supports straightforward queries and different indices (Kumar, Morstatter, & Liu, 2014).
- The application programming interface will be developed using a free and open-source web framework, written in Python, for developing scalable and maintainable web applications. It encourages rapid development and clean, pragmatic design, and the community provides many extensions that make adding new functionality easy.
- The TwitterSearch API will be used to extract data from Twitter. It is a library for collecting tweets and is flexible to use, automatically loading the next result pages during iteration and downloading all available information about each tweet, including its meta-information.
Conclusion
Today, social media has proven itself an important source of news, but because of the open and uncontrolled nature of these platforms, users can misuse them to spread rumors intentionally or unintentionally. Users lack all the facts of an event that would allow them to decide the truth of the information they are exposed to and are thus easily misled by unreliable information, which may lead to numerous undesirable responses. Therefore, there is a need to build models that assess the credibility of tweets and judge whether they are rumors or not. In this project, we aim to build a model for classifying tweets into true and fake news, and we aim to develop this model in a Saudi Arabia news context.
References
Aditi, G., Hemank, L., Ponnurangam, K., & Anupam, J. (2013). Faking Sandy: characterizing and identifying fake images on Twitter during Hurricane Sandy’, paper presented to the. In Proceedings of the 22nd international conference on World Wide Web companion.
Allcott, H., & Gentzkow, M. (2017). Social media and fake news in the 2016 election.
Journal of Economic Perspectives, 31(2), 211–36.
https://doi.org/10.1257/jep.31.2.211
Al-Saggaf, Y., & Chutikulrungsee, T. T. (2015). Twitter usage in Australia and Saudi Arabia and influence of culture: an exploratory cross-country comparison. In Refereed proceedings of the Australian and New Zealand Communication Association Conference: Rethinking communication, space and identity. Retrieved from http://www. anzca. net/conferences/past-conferences.
Alzanin, S. M., & Azmi, A. M. (2018). Detecting rumors in social media: A survey. Procedia Computer Science, 142, 294–300. https://doi.org/10.1016/j.procs.2018.10.495
Brabham, D. C. (2008). Crowdsourcing as a model for problem solving: An introduction and cases. Convergence, 14(1), 75–90. https://doi.org/10.1177/1354856507084420
Cai, G., Wu, H., & Lv, R. (2014). Rumors detection in chinese via crowd responses. In Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (pp. 912–917). IEEE Press.
Castillo, C., Mendoza, M., & Poblete, B. (2011). Information credibility on twitter. In Proceedings of the 20th international conference on World wide web (pp. 675–684). ACM. https://doi.org/10.1145/1963405.1963500
Cossu, J.-V., Labatut, V., & Dugué, N. (2016). A review of features for the discrimination of twitter users: Application to the prediction of offline influence. Social Network
Analysis and Mining, 6(1), 25. https://doi.org/10.1007/s13278-016-0329-x
DiFonzo, N., & Bordia, P. (2007). Rumor, gossip and urban legends. Diogenes, 54(1), 19–35. https://doi.org/10.1177/0392192107073433
El-Masri, M., Altrabsheh, N., & Mansour, H. (2017). Successes and challenges of Arabic sentiment analysis research: a literature review. Social Network Analysis and Mining,
7(1), 54. https://doi.org/10.1007/s13278-017-0474-x
Fischer, E., & Reuber, A. R. (2011). Social interaction via new social media:(How) can interactions on Twitter affect effectual thinking and behavior? Journal of Business
Venturing, 26(1), 1–18. https://doi.org/doi:10.1016/j.jbusvent.2010.09.002
Gupta, A., & Kumaraguru, P. (2012). Credibility ranking of tweets during high impact events. In Proceedings of the 1st workshop on privacy and security in online social
media (p. 2). ACM. https://doi.org/10.1145/2185354.2185356
Gupta, A., Lamba, H., & Kumaraguru, P. (2013). $1.00 per rt# bostonmarathon# prayforboston: Analyzing fake content on twitter. In eCrime Researchers Summit
(eCRS), 2013 (pp. 1–12). IEEE. https://doi.org/10.1109/eCRS.2013.6805772
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3(Mar), 1157–1182.
Hotez, P. J. (2016). Texas and its measles epidemics. PLoS Medicine, 13(10), e1002153.
https://doi.org/10.1371/journal.pmed.1002153
Java, A., Song, X., Finin, T., & Tseng, B. (2007). Why we twitter: understanding microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis (pp. 56–65).
ACM. https://doi.org/10.1145/1348549.1348556
Kang, B., O’Donovan, J., & Höllerer, T. (2012). Modeling topic specific credibility on twitter. In Proceedings of the 2012 ACM international conference on Intelligent User
Interfaces (pp. 179–188). ACM. https://doi.org/10.1145/2166966.2166998
Khondker, H. H. (2011). Role of the new media in the Arab Spring. Globalizations, 8(5),
675–679. https://doi.org/10.1080/14747731.2011.621287
Krishnan, S., & Chen, M. (2018). Identifying Tweets with Fake News. In 2018 IEEE International Conference on Information Reuse and Integration (IRI) (pp. 460–464).
IEEE. https://doi.org/10.1109/IRI.2018.00073
Kumar, S., Morstatter, F., & Liu, H. (2014). Twitter data analytics. Springer.
Liang, G., He, W., Xu, C., Chen, L., & Zeng, J. (2015). Rumor identification in microblogging systems based on users’ behavior. IEEE Transactions on Computational Social Systems, 2(3), 99–108.
https://doi.org/10.1109/TCSS.2016.2517458
Liu, X., Nourbakhsh, A., Li, Q., Fang, R., & Shah, S. (2015). Real-time rumor debunking on twitter. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (pp. 1867–1870). ACM.
https://doi.org/10.1145/2806416.2806651
Mendoza, M., Poblete, B., & Castillo, C. (2010). Twitter Under Crisis: Can we trust what we RT? In Proceedings of the first workshop on social media analytics (pp. 71–79). ACM. https://doi.org/10.1145/1964858.1964869
Moturu, S. T., & Liu, H. (2011). Quantifying the trustworthiness of social media content. Distributed and Parallel Databases, 29(3), 239–260. https://doi.org/10.1007/s10619-010-7077-0
O’Donnell, C. (2011). New study quantifies use of social media in Arab Spring | UW News. Retrieved November 1, 2018, from http://www.washington.edu/news/2011/09/12/new-study-quantifies-use-of-social-media-in-arab-spring/
Reuters Institute for the Study of Journalism. (2013). Reuters institute digital news report 2013.
Rumor. (n.d.). In Merriam-Webster dictionary. Retrieved from https://www.merriam-webster.com/dictionary/rumor
Rumour. (n.d.). In Oxford dictionary. Retrieved from https://en.oxforddictionaries.com/definition/rumour
Salem, F. (2017). The Arab social media report 2017: Social media and the internet of things: Towards data-driven policymaking in the Arab World (Vol. 7).
Shu, K., Sliva, A., Wang, S., Tang, J., & Liu, H. (2017). Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter, 19(1),
22–36. https://doi.org/10.1145/3137597.3137600
Verleysen, M., & François, D. (2005). The curse of dimensionality in data mining and time series prediction. In International Work-Conference on Artificial Neural Networks
(pp. 758–770). Springer. https://doi.org/10.1007/11494669_93
Wyrwoll, C. (2014). Social media: Fundamentals, models, and ranking of user-generated content. Springer.
Xia, X., Yang, X., Wu, C., Li, S., & Bao, L. (2012). Information credibility on twitter in emergency situation. In Pacific-Asia Workshop on Intelligence and Security Informatics (pp. 45–59). Springer. https://doi.org/10.1007/978-3-642-30428-6_4
Yang, F., Liu, Y., Yu, X., & Yang, M. (2012). Automatic detection of rumor on Sina Weibo.
In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics (p. 13). ACM. https://doi.org/10.1145/2350190.2350203
Zhang, Q., Zhang, S., Dong, J., Xiong, J., & Cheng, X. (2015). Automatic detection of rumor on social network. In Natural Language Processing and Chinese Computing (pp. 113–122). Springer. https://doi.org/10.1007/978-3-319-25207-0_10
Zubiaga, A., Liakata, M., & Procter, R. (2017). Exploiting context for rumour detection in social media. In International Conference on Social Informatics (pp. 109–123). Springer. https://doi.org/10.1007/978-3-319-67217-5_8