Date of Graduation
Spring 2026
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
Committee Chairperson
Md Amiruzzaman, PhD
Committee Member
Stefanie Amiruzzaman, PhD
Committee Member
Linh Ngo, PhD
Abstract
This thesis investigates the deployment of high-accuracy Isolated ASL Recognition (ISLR) in resource-constrained edge environments. We train a lightweight Spatio-Temporal Attention Network (SSTAN, ∼2.7 M parameters, ∼10 MB) on the WLASL-100 benchmark, achieving 75.25% Top-1 and 88.24% Top-5 accuracy with 139 ms CPU-only inference. A systematic comparison against frontier multimodal LLMs (Gemini 3 Flash, Gemini 3.1 Pro, Qwen 3 VL) shows that SSTAN outperforms the best LLM baseline by ∼1.85× in accuracy while being 22–230× faster and up to 40× cheaper annually. The LLMs' core limitation is a lack of fine-grained temporal perception; they impose English-language semantic priors rather than learning the articulatory distinctions that define ASL signs. To demonstrate practical impact, we integrate the model into a browser-based ASL Word Search Game where users sign words via webcam instead of typing, grounding vocabulary practice in embodied, gesture-driven interaction.
Recommended Citation
Batchu, Raga Mouni, "Real-Time Isolated ASL Recognition: Evaluating Spatial-Temporal Networks and Multimodal LLMs" (2026). West Chester University Graduate Theses, Dissertations, and Final Projects. 64.
https://digitalcommons.wcupa.edu/all_capstones/64
