Research output

Low-resource NLP & speech intelligence.

Independent research and international academic collaboration — King Saud University, EPU Kuwait, Doane University (USA), and Hanyang University (Korea). Every release ships reproducible pipelines, evaluation documentation, and a permanent DOI.

Low-Resource NLP

Datasets, annotation pipelines, and fine-tuned models for Roman Urdu — filling the gap between major-language NLP and underrepresented South Asian languages.

Speech Intelligence

Vocal fatigue estimation, speaker verification, and continuous vocal load monitoring — deployed as open libraries and production REST APIs.

Cognitive & Ergonomic Systems

Multi-institutional research applying Cognitive Systems Engineering to real-world healthcare settings.

Publications & preprints

Under Review · 2026
Modeling Vocal Fatigue as Embedding-Space Deviation Using Contrastively Trained ECAPA-TDNNs
ECAPA-TDNN-VHE, designed from scratch and trained with a supervised contrastive loss: more than 2× the baseline accuracy (78% vs 36%), with F1 scores of 0.85 / 0.78 / 0.70 across the three fatigue classes.
Springer · EURASIP J. on Signal Processing · DOI
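
The "embedding-space deviation" idea in this entry can be illustrated with a minimal sketch. It assumes, purely for illustration, that deviation is measured as the cosine distance between a new utterance embedding and the centroid of a speaker's rested-voice embeddings. The paper's actual metric and model are not described here, and the names `centroid` and `fatigue_deviation` are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fatigue_deviation(baseline_embeddings, embedding):
    """Hypothetical score: distance of an embedding from the rested-voice centroid.

    Assumption: a larger distance means a larger departure from the speaker's
    baseline. This is an illustrative stand-in, not the published method.
    """
    return cosine_distance(centroid(baseline_embeddings), embedding)


# Toy 2-D vectors standing in for speaker embeddings.
rested = [[1.0, 0.0], [1.0, 0.0]]
print(fatigue_deviation(rested, [1.0, 0.0]))  # same direction as the baseline
print(fatigue_deviation(rested, [0.0, 1.0]))  # orthogonal to the baseline
```

In a real system the vectors would come from the trained encoder, and a threshold or classifier over the deviation score would map it to fatigue classes.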
Under Review · 2026
Continuous Vocal Load Monitoring in Professional Voice Users
Development and occupational validation of an automated vocal load assessment tool for professional voice users — clinical-grade speech analysis in production.
Journal of Voice · King Saud University & EPU Kuwait
Under Review · 2026
RUEmoCorp: A Large-Scale Roman Urdu Emotion Corpus & Benchmark Suite
First large-scale Roman Urdu emotion corpus — 134K labeled samples with Fleiss κ = 0.658 (substantial agreement), multi-institute annotation, fully open-source on HuggingFace and Harvard Dataverse.
Language Resources and Evaluation (Springer) · DOI
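
For readers unfamiliar with the agreement statistic quoted in this entry, Fleiss' κ can be computed from a table of per-item category counts. The sketch below is a generic implementation of the standard formula, not the corpus's own evaluation code, and the toy count tables are invented for illustration.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every item needs the same rater count."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")

    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data: 3 raters, 2 categories.
print(fleiss_kappa([[3, 0], [0, 3]]))  # raters agree on every item
print(fleiss_kappa([[2, 1], [1, 2]]))  # raters split on every item
```

On the commonly used Landis and Koch scale, values between 0.61 and 0.80 are read as substantial agreement, which is the band the reported 0.658 falls in.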
Published Preprint · 2026
RUDaSA: Roman Urdu Dataset for Sentiment Analysis — A Large-Scale, Curated Corpus with Privacy-Preserving Embeddings and Competitive Benchmarking of Transformer Models
Large-scale Roman Urdu sentiment corpus built via privacy-preserving embedding pipelines. Benchmarks state-of-the-art Transformer models — addressing a critical gap in low-resource South Asian NLP.
Research Square · Preprint · DOI
Published Preprint · 2025
Data-Centric Roman Urdu NLP: Dataset Curation & Model Benchmarking
Largest high-quality Roman Urdu sentiment dataset, built via privacy-preserving embedding pipelines, with state-of-the-art results of 0.84 accuracy and 0.83 macro-F1.
Zenodo · Preprint · DOI
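
The two headline metrics in this entry, accuracy and macro-F1, can be computed as in the sketch below. It is a generic, dependency-free implementation of the standard definitions. The labels and predictions are invented toy data, not results from the benchmark.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); every label here occurs at least once,
        # so the guard against a zero denominator is purely defensive.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy sentiment labels (hypothetical).
gold = ["pos", "pos", "neg", "neu"]
pred = ["pos", "neg", "neg", "neu"]
print(accuracy(gold, pred))
print(macro_f1(gold, pred))
```

Reporting macro-F1 alongside accuracy matters for sentiment corpora with skewed class distributions, since accuracy alone can look strong while a minority class is poorly predicted.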
Published Preprint · 2025
Forecast-Based Decision Support System for Mango Malformation
Time-series forecasting and smart-agriculture DSS — demonstrated 50–60% yield improvement through data-driven intervention.
Zenodo · Preprint · DOI
In Progress · 2026
Ergonomic Interventions and Cognitive Workload in Healthcare Settings: A Qualitative Case Study Using Cognitive Systems Engineering
Multi-institutional international study applying Cognitive Systems Engineering to healthcare ergonomics — systematic analysis of workload, safety, and intervention efficacy.
Hanyang University (Korea) · King Saud University (Saudi Arabia) · Doane University (USA)
