An Enhanced Machine Learning with NLP Modelling Technique for Smishing Attacks Detection in Low-Resourced Languages

Zimba, Aaron; Phiri, Katongo Ongani

Please use this identifier to cite or link to this item: http://41.63.8.17:80/jspui/handle/123456789/272

Full metadata record

DC Field	Value	Language
dc.contributor.author	Zimba, Aaron	-
dc.contributor.author	Phiri, Katongo Ongani	-
dc.date.accessioned	2025-08-19T13:52:11Z	-
dc.date.available	2025-08-19T13:52:11Z	-
dc.date.issued	2025-04	-
dc.identifier.citation	Zimba, Aaron and Phiri, Katongo Ongani, An Enhanced Machine Learning with NLP Modelling Technique for Smishing Attacks Detection in Low-Resourced Languages. Available at SSRN: https://ssrn.com/abstract=5195337 or http://dx.doi.org/10.2139/ssrn.5195337	en_US
dc.identifier.uri	http://41.63.8.17:80/jspui/handle/123456789/272	-
dc.description.abstract	Smishing, a form of phishing through SMS, has emerged as a significant cybersecurity threat, particularly on mobile money platforms in regions with limited cybersecurity awareness. This research introduces a machine learning model integrated with natural language processing (NLP) techniques for smishing detection. The proposed model targets English and Bemba, a low-resourced language, addressing a gap in cybersecurity research for linguistically diverse, resource-constrained environments. The model incorporates pseudonymization to enhance data security by replacing sensitive information such as personal identifiers while retaining the contextual integrity of messages. Named Entity Recognition (NER) is employed to detect and mask sensitive entities, further safeguarding user privacy. To bolster robustness against adversarial attacks, adversarial training is applied, exposing the model to perturbed inputs during training to improve its resilience to manipulation. L1 regularization is used to reduce overfitting and keep the model efficient. The evaluation used datasets in English, Bemba, and a combination of both to assess the model’s adaptability to multilingual inputs. The results show strong performance, with high F1 scores, low log loss, and AUC values exceeding 0.97 across datasets. These metrics indicate that the model distinguishes smishing from legitimate messages effectively. By combining machine learning and NLP in a privacy-preserving and security-enhanced framework, this research provides a scalable, efficient solution for smishing detection in under-resourced contexts, contributing to cybersecurity for low-resourced languages.	en_US
dc.language.iso	en	en_US
dc.publisher	KeAi: Cyber Security & Applications	en_US
dc.subject	Data Privacy	en_US
dc.subject	Mobile money platforms	en_US
dc.subject	Adversarial training	en_US
dc.subject	Low-resourced language	en_US
dc.subject	Pseudonymization	en_US
dc.title	An Enhanced Machine Learning with NLP Modelling Technique for Smishing Attacks Detection in Low-Resourced Languages	en_US
dc.type	Article	en_US
Appears in Collections:	Research Papers and Journal Articles

Files in This Item:

File	Description	Size	Format
ssrn-5195337.pdf		738.65 kB	Adobe PDF
