Muhammad Naveed Jokhio

  • BE (Mehran University of Engineering and Technology, Pakistan, 2014)
Notice of the Final Oral Examination for the Degree of Master of Applied Science

Topic

StretchVADER – A Rule-based Technique to Improve Sentiment Intensity Detection using Stretched Words and Fine-Grained Sentiment Analysis

Electrical and Computer Engineering

Date & location

  • Monday, January 15, 2024

  • 11:00 A.M.

  • Virtual Defence

Reviewers

Supervisory Committee

  • Dr. T. Aaron Gulliver, Department of Electrical and Computer Engineering, University of Victoria (Supervisor)

  • Dr. Mihai Sima, Department of Electrical and Computer Engineering, UVic (Member) 

External Examiner

  • Dr. George Tzanetakis, Department of Computer Science, University of Victoria 

Chair of Oral Examination

  • Dr. Peter Dukes, Department of Mathematics and Statistics, UVic

     

Abstract

When someone in a horror movie shouts “HEEEELLLPPPPPPPPP”, or someone replies to your joke with a huge “HAHAHAHAHAHAHAHAHAHAHA”, this is known as word stretching. Word stretching is not only an integral part of spoken language but is also found in many texts. Though it is very rare in formal writing, it is frequently used on social media. Word stretching emphasizes the meaning of the underlying word, changes the context and impacts the sentiment intensity of the sentence. In this work, a rule-based, fine-grained approach to sentiment analysis named StretchVADER is introduced that extends the capabilities of the rule-based approach called VADER. StretchVADER improves sentiment intensity detection by using textual features such as stretched words and smileys to calculate a StretchVADER Score (SVS). This score is also used to label the dataset. It has been observed that many tweets contain stretched words and smileys, e.g. 28.5% in a dataset randomly extracted from Twitter. A dataset is also generated and annotated using SVS which contains detailed features related to stretched words and smileys. Finally, Machine Learning (ML) models are evaluated with two different data encoding techniques, namely TF-IDF and Word2Vec. The results obtained show that the XGBoost algorithm with 1500 gradient-boosted trees and TF-IDF data encoding achieved higher accuracy, precision, recall and F1-score than the other ML models, i.e. 91.24%, 91.11%, 91.24% and 91.08%, respectively.
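
To make the idea of a stretched word concrete, the following is a minimal Python sketch of how such words might be detected and collapsed. It is an illustration only: the abstract does not give the rules StretchVADER actually uses or how the SVS is computed, so the two regular-expression rules and the function names (`is_stretched`, `collapse`, `stretched_words`) are assumptions made for this example.

```python
import re
from typing import List

# Assumed rule 1: three or more identical characters in a row ("sooo", "HEEEELLLP").
_STRETCH_RUN = re.compile(r"(.)\1{2,}")
# Assumed rule 2: a two-character unit occurring three or more times in a row ("HAHAHA").
_STRETCH_PAIR = re.compile(r"(\w\w)\1{2,}", re.IGNORECASE)


def is_stretched(word: str) -> bool:
    """Return True if the word matches either assumed stretching rule."""
    return bool(_STRETCH_RUN.search(word) or _STRETCH_PAIR.search(word))


def collapse(word: str) -> str:
    """Collapse each run of three or more identical characters to one character."""
    return _STRETCH_RUN.sub(r"\1", word)


def stretched_words(text: str) -> List[str]:
    """List the tokens in the text that look stretched under the assumed rules."""
    return [w for w in re.findall(r"\w+", text) if is_stretched(w)]


if __name__ == "__main__":
    print(stretched_words("HEEEELLLPPPPPPPPP me, that was sooo funny HAHAHAHA"))
    print(collapse("HEEEELLLPPPPPPPPP"))
```

Collapsing runs in this way cannot tell a stretch from a legitimate double letter that has been lengthened (for example, "goooood" becomes "god"), so a practical system would likely need a dictionary lookup or similar check on top of a rule like this.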