Paper
21 July 2023
Sumformer: recursive positional encoding for transformer in short text classification
Peilin Zhan, Liren Lu, Weishuang Huang, Manna Zheng, Qingwen Lin, Yayao Zuo
Proceedings Volume 12717, 3rd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2023); 127172W (2023) https://doi.org/10.1117/12.2687014
Event: 3rd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2023), 2023, Wuhan, China
Abstract
In transformers, positional encoding compensates for the attention mechanism's inability to capture positional information between words. Previous work on temporal modeling in transformers has used recursive and relative positional encoding based on the Recurrent Neural Network (RNN). Recursive positional encoding captures the linear structure of text but cannot be parallelized, which limits speed; relative positional encoding ignores the linear structure of text, so it performs worse than recursive positional encoding on short text classification. To address these issues, we propose Sumformer, a model that differs from other transformers in two main components: cumsum calculation and summer initialization. Cumsum calculation simplifies the feature-extraction part of the RNN by substitution, replacing the dynamic rate function of RNNs with static trainable position parameters while preserving the recursive structure. This lets the model capture the linear structure of the text through a cumulative-sum operation while keeping the time overhead far lower than an RNN's. Summer initialization bounds the largest standard deviation of the positional parameters, so that at initialization the model attends to multi-level information in the text and has a richer optimization space, which improves convergence. Experimental results show that Sumformer achieves roughly a 3% improvement in performance and a 58% improvement in speed over existing transformers based on recursive positional encoding: it classifies short text both better and faster, and summer initialization improves performance without increasing training or inference time.
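The abstract does not give the exact formulation, so the following is a minimal sketch (in PyTorch) of how a cumsum-based recursive positional encoding with a capped-standard-deviation ("summer") initialization could look. The module name, the summer_std hyperparameter, and the simple additive combination with token embeddings are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class CumsumPositionalEncoding(nn.Module):
    # Hypothetical sketch: static trainable position parameters are
    # accumulated with a cumulative sum over the sequence, standing in
    # for an RNN's dynamic rate function while staying parallelizable.
    def __init__(self, max_len: int, d_model: int, summer_std: float = 0.02):
        super().__init__()
        # "Summer initialization" is approximated here by bounding the
        # standard deviation used to initialize the position parameters
        # (summer_std is an assumed hyperparameter name).
        self.pos_params = nn.Parameter(torch.randn(max_len, d_model) * summer_std)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) token embeddings
        seq_len = x.size(1)
        # The cumulative sum injects the linear (left-to-right) structure
        # of the text, analogous to an RNN's recursion, but it runs as a
        # single parallel operation along the sequence dimension.
        recursive_pe = torch.cumsum(self.pos_params[:seq_len], dim=0)
        return x + recursive_pe.unsqueeze(0)

In a sketch like this, the encoding is simply added to the token embeddings before the attention layers, so beyond the cumulative sum it adds no training or inference cost, consistent with the abstract's claim that summer initialization improves performance without increasing training and inference time.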
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Peilin Zhan, Liren Lu, Weishuang Huang, Manna Zheng, Qingwen Lin, and Yayao Zuo "Sumformer: recursive positional encoding for transformer in short text classification", Proc. SPIE 12717, 3rd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2023), 127172W (21 July 2023); https://doi.org/10.1117/12.2687014
KEYWORDS
Transformers, Modeling, Performance modeling, Data modeling, Feature extraction, Matrices, Mathematical optimization