Low-resource NLP & speech intelligence.
Independent research and international academic collaboration — King Saud University, EPU Kuwait, Doane University (USA), and Hanyang University (Korea). Every release ships reproducible pipelines, evaluation documentation, and a permanent DOI.
Low-Resource NLP
Datasets, annotation pipelines, and fine-tuned models for Roman Urdu — filling the gap between major-language NLP and underrepresented South Asian languages.
Speech Intelligence
Vocal fatigue estimation, speaker verification, and continuous vocal load monitoring — deployed as open libraries and production REST APIs.
Cognitive & Ergonomic Systems
Multi-institutional research applying Cognitive Systems Engineering to real-world healthcare settings.
- Under Review · 2026
Modeling Vocal Fatigue as Embedding-Space Deviation Using Contrastively Trained ECAPA-TDNNs
ECAPA-TDNN-VHE, designed from scratch and trained with a supervised contrastive loss: 78% accuracy versus 36% for the baseline (about 2.2×), with F1 scores of 0.85 / 0.78 / 0.70 across three fatigue classes.
Springer · EURASIP J. on Signal Processing · DOI
- Under Review · 2026
Continuous Vocal Load Monitoring in Professional Voice Users
Development and occupational validation of an automated vocal load assessment tool for professional voice users — clinical-grade speech analysis in production.
Journal of Voice · King Saud University & EPU Kuwait
- Under Review · 2026
RUEmoCorp: A Large-Scale Roman Urdu Emotion Corpus & Benchmark Suite
First large-scale Roman Urdu emotion corpus: 134K labeled samples with Fleiss' κ = 0.658 (substantial agreement), multi-institute annotation, and fully open-source release on Hugging Face and Harvard Dataverse.
Language Resources and Evaluation (Springer) · DOI
- Published Preprint · 2026
RUDaSA: Roman Urdu Dataset for Sentiment Analysis — A Large-Scale, Curated Corpus with Privacy-Preserving Embeddings and Competitive Benchmarking of Transformer Models
Large-scale Roman Urdu sentiment corpus built via privacy-preserving embedding pipelines. Benchmarks state-of-the-art Transformer models — addressing a critical gap in low-resource South Asian NLP.
Research Square · Preprint · DOI
- Published Preprint · 2025
Data-Centric Roman Urdu NLP: Dataset Curation & Model Benchmarking
Largest high-quality Roman Urdu sentiment dataset via privacy-preserving embedding pipelines — SOTA 0.84 accuracy, 0.83 Macro-F1.
Zenodo · Preprint · DOI
- Published Preprint · 2025
Forecast-Based Decision Support System for Mango Malformation
Time-series forecasting and smart-agriculture DSS — demonstrated 50–60% yield improvement through data-driven intervention.
Zenodo · Preprint · DOI
- In Progress · 2026
Ergonomic Interventions and Cognitive Workload in Healthcare Settings: A Qualitative Case Study Using Cognitive Systems Engineering
Multi-institutional international study applying Cognitive Systems Engineering to healthcare ergonomics — systematic analysis of workload, safety, and intervention efficacy.
Hanyang University (Korea) · King Saud University (Saudi Arabia) · Doane University (USA)
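
For readers unfamiliar with the agreement statistic quoted for RUEmoCorp (Fleiss' κ), the following is a minimal sketch of how that measure is usually computed from a table of annotator votes. It is not the project's annotation pipeline: the function name `fleiss_kappa`, the input layout (one row per item, one column per category, each cell the number of annotators who chose that category), and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    annotators who assigned item i to category j.

    Hypothetical helper for illustration; every item must have the
    same total number of ratings.
    """
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("need at least one item")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two ratings per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement: for each item, the fraction of annotator pairs
    # that agree, averaged over all items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    proportions = [
        sum(row[j] for row in counts) / total for j in range(n_categories)
    ]
    p_chance = sum(p * p for p in proportions)

    if p_chance == 1.0:
        # All ratings fall in one category; kappa is undefined, so this
        # sketch returns 1.0 by convention.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


if __name__ == "__main__":
    # Toy data (not from RUEmoCorp): 4 items, 3 annotators, 3 emotion labels.
    toy = [
        [3, 0, 0],
        [0, 3, 0],
        [1, 2, 0],
        [0, 1, 2],
    ]
    print(f"kappa = {fleiss_kappa(toy):.3f}")
```

On the commonly used Landis and Koch scale, values between 0.61 and 0.80 are read as substantial agreement, which is the band the reported 0.658 falls into.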