Paper
21 July 2023
Sumformer: recursive positional encoding for transformer in short text classification
Peilin Zhan, Liren Lu, Weishuang Huang, Manna Zheng, Qingwen Lin, Yayao Zuo
Proceedings Volume 12717, 3rd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2023); 127172W (2023) https://doi.org/10.1117/12.2687014
Event: 3rd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2023), 2023, Wuhan, China
Abstract
In transformers, positional encoding compensates for the attention mechanism's inability to capture positional information between words. Previous work on temporal modeling in transformers has used recursive and relative positional encoding based on the Recurrent Neural Network (RNN). Recursive positional encoding captures the linear structure of text but cannot be parallelized, which limits speed; relative positional encoding ignores the linear structure of text, so it performs worse than recursive positional encoding on short text classification. To address these issues, we propose Sumformer, a model that differs from other transformers in two main components: cumsum calculation and summer initialization. Cumsum calculation simplifies the feature-extraction part of the RNN by substitution, replacing the dynamic rate function of RNNs with static trainable position parameters while preserving the recursive structure. This lets the model capture the linear structure of the text through a cumulative-sum operation while keeping the time overhead far lower than an RNN's. Summer initialization bounds the largest standard deviation of the positional parameters, so that at initialization the model attends to multi-level information in the text and has a richer optimization space, which improves convergence. Experimental results show that Sumformer achieves roughly a 3% improvement in performance and a 58% improvement in speed over existing transformers based on recursive positional encoding: it classifies short text both better and faster, and summer initialization improves performance without increasing training or inference time.
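The abstract does not give the exact formulation, so the following is a minimal sketch (in PyTorch) of how a cumsum-based recursive positional encoding with a capped-standard-deviation ("summer") initialization could look. The module name, the summer_std hyperparameter, and the simple additive combination with token embeddings are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class CumsumPositionalEncoding(nn.Module):
    # Hypothetical sketch: static trainable position parameters are
    # accumulated with a cumulative sum over the sequence, standing in
    # for an RNN's dynamic rate function while staying parallelizable.
    def __init__(self, max_len: int, d_model: int, summer_std: float = 0.02):
        super().__init__()
        # "Summer initialization" is approximated here by bounding the
        # standard deviation used to initialize the position parameters
        # (summer_std is an assumed hyperparameter name).
        self.pos_params = nn.Parameter(torch.randn(max_len, d_model) * summer_std)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) token embeddings
        seq_len = x.size(1)
        # The cumulative sum injects the linear (left-to-right) structure
        # of the text, analogous to an RNN's recursion, but it runs as a
        # single parallel operation along the sequence dimension.
        recursive_pe = torch.cumsum(self.pos_params[:seq_len], dim=0)
        return x + recursive_pe.unsqueeze(0)

In a sketch like this, the encoding is simply added to the token embeddings before the attention layers, so beyond the cumulative sum it adds no training or inference cost, consistent with the abstract's claim that summer initialization improves performance without increasing training and inference time.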
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Peilin Zhan, Liren Lu, Weishuang Huang, Manna Zheng, Qingwen Lin, and Yayao Zuo "Sumformer: recursive positional encoding for transformer in short text classification", Proc. SPIE 12717, 3rd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2023), 127172W (21 July 2023); https://doi.org/10.1117/12.2687014
KEYWORDS
Transformers, Modeling, Performance modeling, Data modeling, Feature extraction, Matrices, Mathematical optimization