Bug report-oriented program error localization is well targeted and inexpensive, which makes it an important direction in current research on program error localization. Work of this type takes bug reports and source code as input and locates program errors by establishing a mapping between the two through semantic mapping strategies. In fine-grained localization scenarios, however, accuracy drops sharply. Existing empirical studies attribute the differences in localization accuracy to two factors, noise in the input data and the choice of semantic mapping strategy, but most of them evaluate established localization tools and methods as a whole, use a single type of evaluation data, and lack a fine-grained analysis of the key variables from which a method is built. To evaluate how the key variables of a localization method influence its accuracy, this paper decouples the localization method using a pseudo-siamese network, measures the sensitivity of localization accuracy by the accuracy gain obtained under different input data types, and extends the evaluation with additional input data types and a variety of semantic mapping strategies. Based on an evaluation of 23,808 bug reports and the corresponding source code from 7 open source projects published on JIRA, this paper provides a more detailed empirical basis for selecting and weighting additional data types, for jointly learning from multiple data types, and for choosing among semantic mapping strategies in fine-grained program error localization.
KEYWORDS: Semantics, Associative arrays, Error analysis, Data modeling, Feature extraction, Education and training, Tunable filters, Data processing, Deep learning, Reliability
Information retrieval (IR) based bug localization is currently a well-recognized lightweight approach to locating bugs. Most IR-based methods rely on the interpretability of code semantics to bridge the semantic gap between the natural language of a bug report and the programming language of the code, and build an IR model on semantic similarity to locate faulty source code from the bug report. However, most of these studies use the bug report description to guide the generation of code semantics, ignoring the difference between what the report says and the semantics of the actual error. Because bug reports are submitted inconsistently and their descriptions are often ambiguous, such methods suffer from low localization accuracy. We observe that code is written to a specification and verified by compilation, so it is less semantically ambiguous than the bug data submitted by testers. We therefore use code data as the teacher network that guides semantic generation for the bug data, which yields the SGBL method. In addition, on a bug dataset built from Jena and other projects, we evaluate the effectiveness of our method and explain the relationship between the semantic extraction method and bug localization accuracy. The experimental results demonstrate the effectiveness of the proposed method.
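To make the IR setting concrete, the following is a minimal sketch of similarity-based bug localization: source files are ranked by the cosine similarity between their TF-IDF vectors and that of the bug report. It is a plain lexical baseline written for illustration, not the SGBL method or its teacher network. The function names, the camelCase tokenizer, and the smoothed IDF formula are all assumptions introduced here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers and return lowercase alphabetic tokens."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of documents."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    # Smoothed IDF (an assumption of this sketch, chosen to avoid zero weights).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_files(bug_report, files):
    """Rank source files (name -> text) by similarity to the bug report."""
    names = list(files)
    vecs = tfidf_vectors([bug_report] + [files[n] for n in names])
    query, file_vecs = vecs[0], vecs[1:]
    scored = [(cosine(query, v), name) for name, v in zip(names, file_vecs)]
    return sorted(scored, key=lambda pair: -pair[0])


# Hypothetical inputs, for illustration only.
example_files = {
    "QueryParser.java": "class QueryParser { void parseQuery(String sparql) {} }",
    "ModelWriter.java": "class ModelWriter { void writeModel(OutputStream out) {} }",
}
example_report = "NullPointerException when parsing a SPARQL query"
ranking = rank_files(example_report, example_files)
```

A baseline of this kind matches only shared vocabulary, so it is directly exposed to the ambiguity of bug report wording that the abstract identifies as the cause of low localization accuracy.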