Paper
Topic extraction over danmaku text with pre-training model
8 April 2024
Jing Yang, Xin Chen, Junchao Wu
Proceedings Volume 13090, International Conference on Computer Application and Information Security (ICCAIS 2023); 1309015 (2024) https://doi.org/10.1117/12.3025823
Event: International Conference on Computer Application and Information Security (ICCAIS 2023), 2023, Wuhan, China
Abstract
Topic extraction over danmaku is an important task because danmaku text is prevalent on video websites. Applying traditional topic models directly to danmaku text does not work well, because danmaku text is very short, unconventional, and lacks explicit meaning. In this paper, we propose an improved topic model that extends the biterm topic model (BTM) to infer topics from danmaku text. Our method adds three steps: (1) a pre-training model is trained on a danmaku corpus to obtain word and sentence embeddings; (2) danmaku texts are clustered to generate distinct pseudo-danmaku texts; (3) biterms with the same or similar word pairs are removed from the biterm set. Experimental results show that our method improves the diversity among topics and finds some special topic words.
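Step (3) can be illustrated with a minimal sketch. It is not the authors' implementation, and it assumes one possible reading of the abstract: a biterm is dropped when its two words are identical or when their embeddings are nearly parallel. The function names, the toy two-dimensional embeddings, and the 0.9 cosine threshold are all hypothetical; in the paper the embeddings come from a model pre-trained on a danmaku corpus.

```python
from itertools import combinations
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def extract_biterms(tokens):
    """Return every unordered word pair of one short text, as in BTM."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def filter_biterms(biterms, embeddings, threshold=0.9):
    """Drop biterms whose two words are the same or similar.

    Assumption: "similar" means cosine similarity >= threshold between
    the word embeddings. Words without an embedding are never treated
    as similar to anything.
    """
    kept = []
    for w1, w2 in biterms:
        if w1 == w2:
            continue  # same word twice carries no co-occurrence signal
        v1, v2 = embeddings.get(w1), embeddings.get(w2)
        if v1 is not None and v2 is not None and cosine(v1, v2) >= threshold:
            continue  # near-synonymous pair, e.g. spelling variants
        kept.append((w1, w2))
    return kept


# Toy embeddings, invented for illustration only.
TOY_EMBEDDINGS = {
    "haha": [1.0, 0.0],
    "hahaha": [0.99, 0.05],
    "boss": [0.0, 1.0],
}

tokens = ["haha", "hahaha", "boss", "haha"]
biterms = extract_biterms(tokens)
kept = filter_biterms(biterms, TOY_EMBEDDINGS)
print(kept)
```

In this toy input, the pairs built from "haha" and "hahaha" are removed, and only the pairs that combine "boss" with another word remain.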
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Jing Yang, Xin Chen, and Junchao Wu "Topic extraction over danmaku text with pre-training model", Proc. SPIE 13090, International Conference on Computer Application and Information Security (ICCAIS 2023), 1309015 (8 April 2024); https://doi.org/10.1117/12.3025823
KEYWORDS
Video, Printing, Data modeling, Education and training, Modeling, 3D printing, Analytical research