Date of Award

Spring 2025

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

Committee Chairperson

Richard Burns, Ph.D.

Committee Member

David Cooper, Ph.D.

Committee Member

Md Amiruzzaman, Ph.D.

Abstract

The rapid evolution of language, driven by technological advancements, has created notable cultural gaps between generations, particularly in how they communicate. This gap is most apparent in the growing use of slang and emojis among younger generations. This study explores whether Reddit comments can be classified by generation based on the usage of slang and emojis, how frequently each generation uses them, and how these features might influence the meaning of traditional language. Using Reddit’s API, we collected comments from four generational subreddits and applied three machine learning models (Naïve Bayes, Neural Networks, and Decision Trees) to identify the most effective classification method. We compared standard models against improved models that focus on selected features (slang and emojis), using both imbalanced and balanced datasets. Through this research, we seek to determine whether machine learning models can effectively classify social media comments by generation based on certain linguistic features. Our findings show that the Neural Network model outperforms the other two models, making it a promising choice for future work on classifying comments by generation. To our knowledge, this is the first work to cross-examine machine learning models for real-world generational classification of text based on specific features (slang and emojis), offering insight for applications in public social media platforms, video games, and general industry communication. It also contributes to human linguistics by revealing patterns and helping to explain communication differences between generations.
