Roman Urdu Sentiment Corpus (RUDaSA)
Harvard Dataverse · HuggingFace
A large-scale Roman Urdu sentiment dataset built with a privacy-preserving embedding pipeline and benchmarked against state-of-the-art Transformer models. It addresses a critical gap in low-resource South Asian NLP.
Task: Sentiment Analysis
Language: Roman Urdu
Models tested: XLM-RoBERTa, mBERT, and others
Privacy: Privacy-preserving embedding pipeline
DOI: 10.21203/rs.3.rs-9827763/v1