In RSS, candidate patches are ranked by the length of the
longest common subsequence (LCS) between the original and
patched code, from largest to smallest. ELIXIR+ leverages
two synthesis strategies: ELIXIR's original one and RSS.
The final list of candidate patches is built by interleaving the
two lists of patches generated individually by the two
strategies.
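As an illustration, the LCS-based ranking and list interleaving described above can be sketched as follows. This is a minimal Python sketch of the general idea, not RSS's or ELIXIR+'s actual implementation; the function names and token-list representation are our own.

```python
from itertools import chain, zip_longest

def lcs_length(a, b):
    """Classic dynamic-programming LCS length between two token sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rank_patches(original_tokens, patches):
    """Rank candidate patches by LCS length with the original code, largest first."""
    return sorted(patches, key=lambda p: lcs_length(original_tokens, p), reverse=True)

def interleave(ranked_a, ranked_b):
    """Merge two ranked patch lists by alternating their elements."""
    merged = chain.from_iterable(zip_longest(ranked_a, ranked_b))
    return [p for p in merged if p is not None]
```

Ranking by LCS length favors patches that stay close to the original code, which matches the intuition that correct fixes tend to be small edits.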
In addition, to address the issue in lesson 3-(C), we introduce
another repair template that simply swaps method arguments
(within the same expression or across different expressions)
in the buggy statement. When mutating method invocations,
this template is likely to generate candidate patches that are
similar to the original code.
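The core of the argument-swap template can be sketched as follows. ELIXIR+ itself mutates Java ASTs and also swaps arguments across different expressions; this simplified Python analogue, with names of our own choosing, only enumerates swaps within a single method call.

```python
import ast
import itertools

def argument_swap_mutants(stmt):
    """Enumerate variants of a statement in which two arguments of a
    method call are swapped (illustrative analogue of the
    argument-swap repair template)."""
    tree = ast.parse(stmt)
    mutants = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and len(node.args) >= 2:
            for i, j in itertools.combinations(range(len(node.args)), 2):
                # Swap one argument pair, record the mutant, then restore.
                node.args[i], node.args[j] = node.args[j], node.args[i]
                mutants.append(ast.unparse(tree))
                node.args[i], node.args[j] = node.args[j], node.args[i]
    return mutants
```

For example, `argument_swap_mutants("copy(src, dst)")` yields the single mutant `copy(dst, src)`, the kind of candidate that repairs a swapped-argument bug.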
The results of the above enhancements are shown in Table II
(ELIXIR+ column). Although our enhancements are simple,
the performance improvement is remarkable. With the real
fault localization results, ELIXIR+ correctly repairs 8/20
bugs (40%), whereas ELIXIR repairs only 2/20 (10%). With
the perfect fault localization results, the success rate rises
from 2/20 (10%) to 10/20 (50%).
We consider that the redundancy assumption also holds
in industrial software, and that redundancy-based patch
generation therefore produces better results. In addition,
widening the variety of repair templates contributes
substantially to repair performance.
V. RELATED WORK
APR is a major research topic in software engineering;
hundreds of papers on APR have been published [1].
While many studies have reported on applying APR tools to
OSS, industrial experience reports on APR remain scarce [4].
Two APR tools, SapFix [5] and Getafix [6], are integrated
into the Facebook development workflow. SapFix is an end-
to-end repair tool: it first detects latent NPEs with tests
automatically generated by Sapienz, then tries to repair them
via mutation or fix templates, achieving ≈ 50% correct fixes [5].
Getafix is a repair tool that learns fix patterns from bug-fixing
histories. Unlike generate-and-validate (G&V) repair, it utilizes
the static analyzer Infer for latent bug detection and patch
validation; it correctly fixes 40–60% of null-related bugs at
Facebook [6].
Naitou et al. [7] reported an industrial application of two
general-purpose APR tools, ASTOR [8] and NOPOL [9]. Of
the 327 industrial bugs they investigated, 9 were amenable to
the APR tools, which produced only one correct fix. They also
reported several barriers to the industrial use of APR; for
instance, only a small portion of the bugs can be repaired by
mutating program code (i.e., other types of files need changing).
This indicates the difficulty of applying general APR tools to
industrial software and the immaturity of current APR
techniques.
Apart from industrial reports, the current mainstream of
APR research focuses on preventing patch overfitting and
better ranking candidate patches. A major approach to
overfitting prevention is to leverage test case generation [21].
As for better ranking, the approaches are diverse, e.g.,
machine learning-based [10] and similarity-based [15], [16].
A recent study reported that the issue of lacking bug-
exposing test cases also exists in OSS [14]. To deal with it,
new APR approaches from different angles are required, such
as bug report-driven [18] and static analysis-based [19] ones.
VI. CONCLUSION
This paper reported our experience of applying ELIXIR, a
state-of-the-art APR tool, to large industrial software. Our case
study revealed several critical obstacles to the industrial use of
APR, including low recall, a lack of bug-exposing tests, and a
poor success rate. Current APR techniques still have several
immature aspects with respect to practical industrial
deployment, and their practicality needs further improvement.
We also presented the preliminary results of our ongoing
improvement efforts. ELIXIR+, an enhanced version of
ELIXIR, additionally leverages new repair templates and a
redundancy-based synthesis strategy, based on the insights
from our first trial. The enhancements are simple but contribute
substantially to repair performance, increasing the repair
success rate from 10% to 40%.
We hope this report contributes to future research in the
APR community.
REFERENCES
[1] L. Gazzola et al., “Automatic software repair: A survey,” IEEE Trans.
Softw. Eng., vol. 45, no. 1, pp. 34–67, 2019.
[2] T. Durieux et al., “Empirical review of Java program repair tools: A
large-scale experiment on 2,141 bugs and 23,551 repair attempts,” in
FSE, 2019, pp. 302–313.
[3] R. Just et al., “Defects4J: A database of existing faults to enable
controlled testing studies for Java programs,” in ISSTA, 2014, pp. 437–
440.
[4] M. Monperrus, “The living review on automated program repair,”
HAL/archives-ouvertes.fr, Tech. Rep. hal-01956501, 2018.
[5] A. Marginean et al., “SapFix: Automated end-to-end repair at scale,” in
ICSE-SEIP, 2019, pp. 269–278.
[6] J. Bader et al., “Getafix: Learning to fix bugs automatically,” Proc. ACM
Program. Lang., vol. 3, no. OOPSLA, pp. 159:1–159:27, Oct. 2019.
[7] K. Naitou et al., “Toward introducing automated program repair tech-
niques to industrial software development,” in ICPC, 2018, pp. 332–335.
[8] M. Martinez and M. Monperrus, “ASTOR: A program repair library for
Java (demo),” in ISSTA, 2016, pp. 441–444.
[9] J. Xuan et al., “Nopol: Automatic repair of conditional statement bugs
in Java programs,” IEEE Trans. Softw. Eng., vol. 43, no. 1, pp. 34–55,
Jan. 2017.
[10] R. K. Saha et al., “Elixir: Effective object-oriented program repair,” in
ASE, 2017, pp. 648–659.
[11] C. Le Goues et al., “A systematic study of automated program repair:
Fixing 55 out of 105 bugs for $8 each,” in ICSE, 2012, pp. 3–13.
[12] S. Mechtaev et al., “Angelix: Scalable multiline program patch synthesis
via symbolic analysis,” in ICSE, 2016, pp. 691–701.
[13] Z. Chen et al., “SEQUENCER: Sequence-to-sequence learning for end-
to-end program repair,” IEEE Trans. Softw. Eng., 2019.
[14] K. Liu et al., “TBar: Revisiting template-based automated program
repair,” in ISSTA, 2019, pp. 31–42.
[15] J. Jiang et al., “Shaping program repair space with existing patches and
similar code,” in ISSTA, 2018, pp. 298–309.
[16] M. Wen et al., “Context-aware patch generation for better automated
program repair,” in ICSE, 2018, pp. 1–11.
[17] S. Saha et al., “Harnessing evolution for multi-hunk program repair,” in
ICSE, 2019, pp. 13–24.
[18] A. Koyuncu et al., “iFixR: Bug report driven program repair,” in FSE,
2019, pp. 314–325.
[19] R. Bavishi et al., “Phoenix: Automated data-driven synthesis of repairs
for static analysis violations,” in FSE, 2019, pp. 613–624.
[20] E. T. Barr et al., “The plastic surgery hypothesis,” in FSE, 2014, pp.
306–317.
[21] Z. Yu et al., “Alleviating patch overfitting with automatic test generation:
A study of feasibility and effectiveness for the Nopol repair system,”
Empirical Software Engineering, vol. 24, no. 1, pp. 33–67, Feb. 2019.