Authors: Zahra Aminoroaya, Behzad Soleimani Neysiani and Mohammad Hossein Nadimi Shahraki
Abstract: With the advent of the Internet and the growing number of computer users, many applications, such as office suites and web browsers, have been developed that millions of users rely on for everyday tasks. Software companies spend over 45% of their costs on dealing with software bugs. An inevitable step in fixing bugs is bug triage, which aims to assign a new bug to the correct developer. Bug-tracking and issue-tracking systems tend to be populated with bugs, issues or tickets written by a wide variety of reporters with different levels of training and knowledge about the system being discussed. Many bug reporters lack the skills, vocabulary, knowledge or time to search the issue tracker efficiently for similar issues. As a result, issue trackers are often full of duplicate issues and bugs, and bug triaging is time-consuming and error-prone. Software bugs occur for a wide range of reasons. Bug reports can be generated automatically or written by a user of the software. Bug reports can also accompany other malfunctions of the software, mostly in beta or unstable versions. Most often, these bug reports are enriched with user-contributed accounts of what the user actually experienced. Addressing these bugs accounts for the majority of the effort spent in the maintenance phase of a software project life cycle. Most often, several bug reports sent by different users correspond to the same defect. Nevertheless, every bug report has to be analyzed separately and carefully for the possibility of a potential bug. The person responsible for processing newly reported bugs, checking for duplicates and passing them to suitable developers to be fixed is called a triager, and this process is called triaging. The utility of bug-tracking systems is hindered by a large number of duplicate bug reports. In many open source software projects, as many as one third of all reports are duplicates.
Identifying duplicate bug reports is time-consuming and adds to the already high cost of software maintenance. To reduce the time spent on manual work, text classification techniques are applied to automate bug triage. This study presents an overview of the work that has been done to improve the detection of duplicate bugs, conducted on open source data sets.
Zahra Aminoroaya, Behzad Soleimani Neysiani and Mohammad Hossein Nadimi Shahraki, 2018. Detecting Duplicate Bug Reports Techniques. Research Journal of Applied Sciences, 13: 522-531.