Event Details

Web Mining for Social Network Analysis

Presenter: Mohamed Elhaddad
Supervisor:

Date: Fri, July 23, 2021
Time: 11:00:00 - 00:00:00
Place: ZOOM - Please see below.

ABSTRACT

Zoom meeting link: https://uvic.zoom.us/j/84871529919?pwd=dWU5ZFdWVGJBWm1UQ1dLTzdxN29Bdz09

Meeting ID: 848 7152 9919
Password: 844033
One tap mobile
+17789072071,,84871529919# Canada
+16475580588,,84871529919# Canada

Dial by your location
+1 778 907 2071 Canada
+1 647 558 0588 Canada
Meeting ID: 848 7152 9919
Find your local number: https://uvic.zoom.us/u/kbpqT3TLFB

 

Note: Please log in to Zoom via SSO and your UVic Netlink ID 

 

Abstract: The rapid development of information systems and the widespread use of electronic media and social networks have played a significant role in accelerating the pace of events worldwide, for example in the 2012 Gaza conflict (the 8-day war), in the pro-secessionist rebellion during the 2013-2014 conflict in Eastern Ukraine, in the 2016 US Presidential elections, and in conjunction with the outbreak of the COVID-19 pandemic since the beginning of 2020. Although there are many social networking platforms, Twitter is one of the most widely used. It allows its users to communicate, share their opinions, and express their emotions (sentiments) in the form of short posts, easily and at no cost. Moreover, unlike other social networking platforms, Twitter allows research institutions to access its public and historical data, upon request and under controlled conditions. Therefore, many organizations at different levels (governmental, commercial) are seeking to benefit from the analysis and classification of shared tweets in many application domains, for example, sentiment analysis to determine users' polarity from the content of their shared text, and misleading information detection to ensure the legitimacy and credibility of the shared information. To attain this objective, one can apply numerous data representation, preprocessing, and natural language processing techniques, along with machine and deep learning algorithms. Existing techniques face several challenges and limitations, including the handling of tweets in multiple languages, the choice of which features the feature vector should include, and the assignment of representative and descriptive weights to these features for different mining tasks. In addition, existing performance evaluation metrics are limited in how fully they assess the performance of the developed classification systems.

In this work, two novel frameworks are introduced: the first efficiently analyzes and classifies bilingual textual content from social networks, while the second evaluates the performance of binary classification algorithms. The first framework comprises: (1) an approach for handling tweets written in Arabic and English, which can be extended to cover data written in more languages and from other social networking platforms, (2) effective data preparation and preprocessing techniques, (3) a novel feature selection technique that allows different types of features to be used (content-dependent, context-dependent, and domain-dependent), and (4) a novel feature extraction technique that assigns weights to the linguistic features based on how representative they are of their containing class. The proposed framework is employed to perform sentiment analysis and misleading information detection. Its performance is compared to state-of-the-art classification approaches on 11 benchmark datasets comprising both Arabic and English textual content, demonstrating considerable improvement on all performance evaluation metrics used. The framework is then applied in a real-life case study to detect misleading information surrounding the spread of COVID-19.
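To give a rough sense of what weighting a feature by how representative it is of its class can mean, here is a minimal Python sketch. It is not the technique presented in this work, whose details are not given in this abstract. The function name `class_term_weights` and the weighting rule (the share of a term's document occurrences that fall in a given class) are assumptions chosen purely for illustration.

```python
from collections import Counter, defaultdict


def class_term_weights(docs, labels):
    """Hypothetical weighting: for each (label, term) pair, return the fraction
    of documents containing the term that belong to that label."""
    doc_freq = Counter()                   # term -> number of documents containing it
    class_doc_freq = defaultdict(Counter)  # label -> term -> documents of that label containing it
    for text, label in zip(docs, labels):
        terms = set(text.lower().split())  # naive whitespace tokenization (assumption)
        doc_freq.update(terms)
        class_doc_freq[label].update(terms)
    return {
        label: {term: count / doc_freq[term] for term, count in counts.items()}
        for label, counts in class_doc_freq.items()
    }


if __name__ == "__main__":
    docs = ["good great", "good bad", "bad awful"]
    labels = ["pos", "neg", "neg"]
    for label, weights in class_term_weights(docs, labels).items():
        print(label, weights)
```

Under this toy rule, a term that appears only in one class receives weight 1.0 for that class, while a term spread evenly over two classes receives 0.5 in each.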

In the second framework, a new multidimensional classification assessment score (MCAS) is introduced. MCAS measures how well a classification algorithm performs on binary classification problems. It takes into consideration the effect of misclassification errors on the probability of correctly detecting instances from both classes. Moreover, it is valid regardless of the size of the dataset and of whether the dataset has a balanced or an unbalanced distribution of instances over the classes. An empirical and practical analysis is conducted on both synthetic and real-life datasets to compare the behavior of the proposed metric against that of commonly used metrics. The analysis reveals that the new measure can distinguish between the performance of different classification techniques. Furthermore, it allows a class-based assessment of classification algorithms, that is, an assessment of how well an algorithm handles data from each class separately. This is useful when correctly classifying instances from one class is more important than correctly classifying instances from the other, such as in COVID-19 testing, where detecting positive patients is far more important than detecting negative ones.
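The motivation for a class-based assessment on unbalanced data can be illustrated with standard metrics. The sketch below does not implement MCAS, whose definition is not given in this abstract; it only computes accuracy and per-class recall from a binary confusion matrix. The function name `per_class_report` and the example data are assumptions made for illustration.

```python
def per_class_report(y_true, y_pred, positive=1):
    """Return accuracy and per-class recall for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    recall_pos = tp / (tp + fn) if (tp + fn) else 0.0
    recall_neg = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall_positive": recall_pos,
        "recall_negative": recall_neg,
        "balanced_accuracy": (recall_pos + recall_neg) / 2,
    }


if __name__ == "__main__":
    # Hypothetical unbalanced test set: 95 negative cases, 5 positive cases.
    y_true = [0] * 95 + [1] * 5
    # A trivial classifier that labels every case as negative.
    y_pred = [0] * 100
    print(per_class_report(y_true, y_pred))
```

In this example the trivial classifier reaches 95% accuracy while detecting none of the positive cases, which is the kind of failure a class-based assessment is meant to expose.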