6 May 2022 An ensemble multilingual model for toxic comment classification
Gaofei Xie
Proceedings Volume 12176, International Conference on Algorithms, Microchips and Network Applications; 121761P (2022) https://doi.org/10.1117/12.2636419
Event: International Conference on Algorithms, Microchips, and Network Applications 2022, 2022, Zhuhai, China
Abstract
Toxic online comments cause enormous harm to society, where toxicity is defined as anything rude, disrespectful, or otherwise likely to make someone leave a discussion. In pursuit of a safer, more collaborative internet, substantial effort has gone into machine learning models that identify toxicity in English, whereas part of the misinformation spreads in other languages. Over the past year, pretrained multilingual language models have produced impressive gains in cross-lingual toxicity classification. This paper presents an approach to building toxicity models using the Jigsaw Multilingual Toxic Comment Classification dataset provided by Kaggle. We build our ensemble model in three parts based on [...]. In addition, we apply subsampling, pseudo-labeling with open-subtitles, translation of non-English text into English, and post-processing to improve classification accuracy. Our final model achieved an AUC of 0.9469 on the training set and 0.9485 on the validation set, demonstrating the effectiveness of the approach for cross-lingual toxicity detection.
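As a rough illustration of the ideas named in the abstract, the sketch below blends toxicity probabilities from several models, scores the result with ROC AUC, and selects confident predictions as pseudo-labels. The abstract does not state how the component models are combined or which thresholds are used, so the weighted average, the function names, the thresholds, and the toy scores here are all assumptions made for illustration, not the paper's method.

```python
from typing import List, Sequence, Tuple


def blend_scores(model_scores: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Weighted average of per-model toxicity probabilities (assumed blending rule)."""
    if len(model_scores) != len(weights):
        raise ValueError("one weight per model is required")
    total = float(sum(weights))
    n = len(model_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, model_scores)) / total
        for i in range(n)
    ]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: probability that a random toxic comment outscores a random
    non-toxic one, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def pseudo_label(scores: Sequence[float],
                 low: float, high: float) -> List[Tuple[int, int]]:
    """Return (index, hard label) for predictions confident enough to reuse
    as training data. The thresholds are hypothetical."""
    return [(i, int(s >= high)) for i, s in enumerate(scores)
            if s <= low or s >= high]


# Toy data: 1 = toxic, 0 = non-toxic. Scores are made up for the example.
labels = [1, 0, 1, 0]
model_a = [0.9, 0.2, 0.4, 0.6]
model_b = [0.8, 0.1, 0.7, 0.3]

blended = blend_scores([model_a, model_b], weights=[1.0, 1.0])
auc_a = roc_auc(labels, model_a)        # 0.75 on this toy data
auc_blend = roc_auc(labels, blended)    # 1.0 on this toy data
confident = pseudo_label(blended, low=0.2, high=0.8)
```

On this toy data the blend ranks every toxic comment above every non-toxic one even though one of the individual models does not, which is the usual motivation for ensembling. Real multilingual models would supply the score lists in place of the hand-written ones.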
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Gaofei Xie "An ensemble multilingual model for toxic comment classification", Proc. SPIE 12176, International Conference on Algorithms, Microchips and Network Applications, 121761P (6 May 2022); https://doi.org/10.1117/12.2636419
KEYWORDS
Data modeling
Toxicity
Performance modeling
Associative arrays
Classification systems
Transformers
Neck
