GraalVM Compiler Benchmark Results Dataset (Data Artifact)

Lubomír Bulej, Vojtěch Horký, Michele Tucci, Petr Tůma, François Farquet, David Leopoldseder, Aleksandar Prokopec
ICPE 2023

Abstract

Systematic testing of software performance during development is a persistent challenge, and it grows in importance because mass software deployment magnifies any savings. In practice, systematic performance evaluation requires an efficient and reliable measurement procedure integrated into the development environment, combined with automated evaluation of the measurement results and compact reporting of detected performance anomalies.

A realistic evaluation of research contributions to systematic software performance testing can benefit from measurement data collected during long-term development activities in a well-documented context. This paper presents a data artifact that aggregates more than 70 machine-years of performance measurements collected over 7 years of development of the GraalVM Compiler Project, with the aim of reducing the cost of evaluating research contributions in this and similar contexts.
