Faced with the rapid growth of online information and the resulting data overload, locating valuable information efficiently and accurately is critically important. Text summarization, a technique in the field of Natural Language Processing (NLP), is an effective means of analyzing and processing such information. In this paper, we propose an abstractive text summarization model based on Bidirectional Encoder Representations from Transformers (BERT) vectorization and bidirectional decoding. BERT is adopted to obtain a more global vector representation, which helps the subsequent encoder and decoder fuse full-text information to generate a summary with high generality. The decoding phase adopts a bidirectional decoding structure combined with an attention mechanism, maintaining the decoding results of both directions to generate summaries. The bidirectional structure can be fine-tuned according to the results from both directions, overcoming the tilt problem of unidirectional structures and yielding more consistent summaries. Experimental results on the NLPCC2017 text summarization dataset show that the summaries generated by our model have higher coherence at the word and sentence levels and stronger generalization over the full text.
In this paper, we propose a DBI-based parallel clustering partition method to address the problem of determining the number of clusters for large-scale datasets. First, we calculate the dispersion of the samples within each cluster under the current K centroids. Second, following the MapReduce programming model, a parallelized procedure is designed to calculate the distance between the clusters in the clustering result, where the distance between two clusters is measured via the new centroid formed by the data samples of the two clusters. Third, for each cluster, the maximum similarity between it and all other clusters is taken as that cluster's similarity score. Finally, the similarity scores of all clusters are averaged to obtain the DBI under the current K value, which serves as the evaluation criterion for clustering performance. Experimental comparisons on two datasets demonstrate the effectiveness and efficiency of our algorithm.
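The DBI computation described above can be sketched as follows. This is a minimal serial illustration of the standard Davies-Bouldin index (the paper's contribution is its MapReduce parallelization, which is not reproduced here); the function names are illustrative, not from the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points given as coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin index: lower values indicate better-separated clusters."""
    k = len(centroids)
    # Step 1: within-cluster dispersion S_i = mean distance of members to centroid i
    s = []
    for i in range(k):
        members = [p for p, lab in zip(points, labels) if lab == i]
        s.append(sum(euclidean(p, centroids[i]) for p in members) / len(members))
    # Steps 2-3: for each cluster i, take the maximum similarity
    # R_ij = (S_i + S_j) / d(c_i, c_j) over all other clusters j
    r = [max((s[i] + s[j]) / euclidean(centroids[i], centroids[j])
             for j in range(k) if j != i)
         for i in range(k)]
    # Step 4: average over all clusters gives the DBI for the current K
    return sum(r) / k
```

Evaluating this index over a range of K values and picking the K with the smallest DBI is the usual way the index is used as a criterion for the number of clusters.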