ICSE-SEIP ’20, May 23–29, 2020, Seoul, Republic of Korea Diamantopoulos, et al.
REFERENCES
[1]
Apache Arrow. 2019. Apache Arrow. A cross-language development platform for
in-memory data. https://arrow.apache.org/
[2]
Susan Athey and Stefan Wager. 2017. Efficient Policy Learning.
arXiv:math.ST/1702.02896
[3]
Florian Auer and Michael Felderer. 2018. Current state of research on continuous
experimentation: A systematic mapping study. In Proceedings - 44th Euromicro
Conference on Software Engineering and Advanced Applications, SEAA 2018. 335–
344. https://doi.org/10.1109/SEAA.2018.00062
[4]
Juliette Aurisset, Michael Ramm, and Joshua Parks. 2017. Innovating Faster on
Personalization Algorithms at Netflix Using Interleaving. https://medium.com/
netflix-techblog/interleaving-in-online-experiments-at-netflix-a04ee392ec55
[5]
George Box, Stuart Hunter, and William Hunter. 2005. Statistics for Experimenters.
Wiley.
[6]
Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology.
Qualitative Research in Psychology 3, 2 (Jan. 2006), 77–101.
http://www.tandfonline.com/doi/abs/10.1191/1478088706qp063oa
[7]
Badrish Chandramouli, Jonathan Goldstein, Mike Barnett, Robert DeLine, Danyel
Fisher, John C. Platt, James F. Terwilliger, and John Wernsing. 2014. Trill: a
high-performance incremental query processor for diverse analytics. Proceedings
of the VLDB Endowment 8, 4 (Dec. 2014), 401–412.
[8]
Robert DeLine and Danyel Fisher. 2015. Supporting exploratory data analysis
with live programming. In 2015 IEEE Symposium on Visual Languages and Human-
Centric Computing (VL/HCC). IEEE, Atlanta, GA, 111–119.
[9]
Alex Deng, Ya Xu, Ron Kohavi, and Toby Walker. 2013. Improving the sensitivity
of online controlled experiments by utilizing pre-experiment data. In Proceedings
of the sixth ACM international conference on Web search and data mining. ACM,
123–132.
[10]
Pavel Dmitriev, Somit Gupta, Dong Woo Kim, and Garnet Vaz. 2017. A Dirty
Dozen. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowl-
edge Discovery and Data Mining - KDD ’17, Vol. Part F1296. ACM Press, New
York, New York, USA, 1427–1436.
[11]
Dmitriy Ryaboy. [n.d.]. Twitter experimentation: technical
overview. https://blog.twitter.com/engineering/en_us/a/2015/
twitter-experimentation-technical-overview.html
[12]
Dirk Eddelbuettel and Romain François. 2011. Rcpp: Seamless R and C++ Inte-
gration. Journal of Statistical Software 40, 8 (2011), 1–18. http://www.jstatsoft.
org/v40/i08/
[13]
Bradley Efron and Robert J Tibshirani. 1994. An introduction to the bootstrap.
CRC press.
[14]
A. Fabijan, P. Dmitriev, H. Holmström Olsson, and J. Bosch. 2018. Effective
Online Controlled Experiment Analysis at Large Scale. In 2018 44th Euromicro
Conference on Software Engineering and Advanced Applications (SEAA). 64–67.
[15]
Aleksander Fabijan, Pavel Dmitriev, Helena Holmström Olsson, and Jan Bosch.
2017. The Evolution of Continuous Experimentation in Software Product Devel-
opment: From Data to a Data-Driven Organization at Scale. In 2017 IEEE/ACM
39th International Conference on Software Engineering (ICSE). IEEE, Buenos Aires,
770–780. http://ieeexplore.ieee.org/document/7985712/
[16]
Aleksander Fabijan, Pavel Dmitriev, Helena Holmström Olsson, Jan Bosch, Lukas
Vermeer, and Dylan Lewis. 2019. Three Key Checklists and Remedies for Trust-
worthy Analysis of Online Controlled Experiments at Scale. In 2019 IEEE/ACM
41st International Conference on Software Engineering: Software Engineering in
Practice (ICSE-SEIP). 1–10.
[17]
Alessio Farcomeni. 2008. A review of modern multiple hypothesis testing, with
particular attention to the false discovery proportion. Statistical methods in
medical research 17, 4 (2008), 347–388.
[18]
Danyel Fisher, Badrish Chandramouli, Robert DeLine, Jonathan Goldstein, Andrei
Aron, Mike Barnett, John C Platt, James F Terwilliger, and John Wernsing. 2014.
Tempe: An Interactive Data Science Environment for Exploration of Temporal
and Streaming Data. (2014), 7.
[19]
Ronald A Fisher. 1922. On the mathematical foundations of theoretical statistics.
Philosophical Transactions of the Royal Society of London. Series A, Containing
Papers of a Mathematical or Physical Character 222, 594-604 (1922), 309–368.
[20] Laurent Gautier. 2019. rpy2 - R in Python. https://rpy2.bitbucket.io/
[21]
Corey Grunewald and Matt Jaquish. 2018. Modernizing
the Web Playback UI. https://medium.com/netflix-techblog/
modernizing-the-web-playback-ui-1ad2f184a5a0
[22]
Gaël Guennebaud, Benoît Jacob, et al. 2010. Eigen v3. http://eigen.tuxfamily.org.
[23]
Somit Gupta, Lucy Ulanova, Sumit Bhardwaj, Pavel Dmitriev, Paul Raff, and Aleksander
Fabijan. 2018. The Anatomy of a Large-Scale Experimentation Platform.
In 2018 IEEE International Conference on Software Architecture (ICSA). IEEE, 1–109.
https://doi.org/10.1109/ICSA.2018.00009
[24]
Hevner, March, Park, and Ram. 2004. Design Science in Information Systems
Research. MIS Quarterly 28, 1 (2004), 75. https://doi.org/10.2307/25148625
[25]
David Issa Mattos, Pavel Dmitriev, Aleksander Fabijan, Jan Bosch, and Helena
Holmström Olsson. 2018. An Activity and Metric Model for Online Controlled
Experiments. In Lecture Notes in Computer Science (including subseries Lecture
Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 11271
LNCS. 182–198. http://link.springer.com/10.1007/978-3-030-03673-7_14
[26]
Wenzel Jakob, Jason Rhinelander, and Dean Moldovan. 2019. pybind11 – Seamless
operability between C++11 and Python. https://github.com/pybind/pybind11.
[27]
Raphael Lopez Kaufman, Jegar Pitchforth, and Lukas Vermeer. 2017. Democratiz-
ing online controlled experiments at Booking.com. arXiv:1710.08217 [cs] (Oct.
2017). http://arxiv.org/abs/1710.08217 arXiv: 1710.08217.
[28]
Ron Kohavi, Alex Deng, Brian Frasca, Roger Longbotham, Toby Walker, and Ya
Xu. 2012. Trustworthy online controlled experiments: Five Puzzling Outcomes
Explained. In Proceedings of the 18th ACM SIGKDD international conference on
Knowledge discovery and data mining - KDD ’12. ACM Press, New York, New
York, USA, 786. http://dl.acm.org/citation.cfm?doid=2339530.2339653
[29]
Ron Kohavi, Roger Longbotham, Dan Sommerfield, and Randal M Henne. 2009.
Controlled experiments on the web: Survey and practical guide. Data Mining
and Knowledge Discovery 18, 1 (2009), 140–181.
[30]
Ron Kohavi, Diane Tang, and Ya Xu. 2020. Trustworthy Online Controlled Experi-
ments: A Practical Guide to A/B Testing. Cambridge University Press, Cambridge,
United Kingdom ; New York, NY.
[31]
Gopal Krishnan. 2016. Selecting the best artwork for videos
through A/B testing. https://medium.com/netflix-techblog/
selecting-the-best-artwork-for-videos-through-a-b-testing-f6155c4595f6
[32]
Roman Lukyanenko, Joerg Evermann, and Jeffrey Parsons. 2014. Instantiation
validity in IS design research. In International Conference on Design Science Research
in Information Systems. Springer, 321–328.
[33]
Roman Lukyanenko, Joerg Evermann, and Jeffrey Parsons. 2015. Guidelines
for establishing instantiation validity in IT artifacts: A survey of IS research.
In International Conference on Design Science Research in Information Systems.
Springer, 430–438.
[34]
Toby Mao, Sri Sri Perangur, and Colin McFarland. [n.d.]. Reimagining
Experimentation Analysis at Netflix. https://medium.com/netflix-techblog/
reimagining-experimentation-analysis-at-netflix-71356393af21
[35]
Nick Nelson. 2016. The Power Of A Picture. https://media.netflix.com/en/
company-blog/the-power-of-a-picture
[36]
Jay F. Nunamaker, Minder Chen, and Titus D. M. Purdin. 1990. Systems Devel-
opment in Information Systems Research. J. Manage. Inf. Syst. 7, 3 (Oct. 1990),
89–106.
[37]
H. H. Olsson and J. Bosch. 2014. From Opinions to Data-Driven Software R&D:
A Multi-case Study on How to Close the ’Open Loop’ Problem. In 2014 40th
EUROMICRO Conference on Software Engineering and Advanced Applications.
9–16.
[38]
Donald B Rubin. 2005. Causal Inference Using Potential Outcomes. J. Amer.
Statist. Assoc. 100, 469 (2005), 322–331.
[39]
Jerzy Splawa-Neyman, Dorota M Dabrowska, and TP Speed. 1990. On the ap-
plication of probability theory to agricultural experiments. Essay on principles.
Section 9. Statist. Sci. (1990), 465–472.
[40]
Martin Traverso. 2016. Presto: Interacting with petabytes of data at
Facebook. https://www.facebook.com/notes/facebook-engineering/
presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920/
[41]
Stefan Van Der Walt, S Chris Colbert, and Gael Varoquaux. 2011. The NumPy
array: a structure for efficient numerical computation. Computing in Science &
Engineering 13, 2 (2011), 22.
[42]
John Whitehead. 1997. The Design and Analysis of Sequential Clinical Trials,
Revised, 2nd Edition. Wiley.
[43]
John Whitehead. 1997. Group Sequential Methods with Applications to Clinical
Trials. Jennison, Christopher and Turnbull, Bruce W.
[44]
Jeffrey Wong, Randall Lewis, and Matthew Wardrop. 2019. Efficient Computation
of Linear Model Treatment Effects in an Experimentation Platform.
arXiv:stat.CO/1910.01305
[45]
Jeffrey Wooldridge. 2010. Econometric Analysis of Cross Section and Panel Data.
The MIT Press, Chapter 4.2.
[46]
Huizhi Xie and Juliette Aurisset. 2016. Improving the Sensitivity of Online
Controlled Experiments: Case Studies at Netflix. ACM Press, 645–654.
[47]
Ya Xu and Nanyu Chen. 2016. Evaluating Mobile Apps with A/B and Quasi
A/B Tests. In Proceedings of the 22nd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining - KDD ’16, Vol. 13-17-Augu. ACM Press,
New York, New York, USA, 313–322. https://doi.org/10.1145/2939672.2939703
[48]
Ya Xu, Nanyu Chen, Addrian Fernandez, Omar Sinno, and Anmol Bhasin. 2015.
From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social
Networks. In Proc. of KDD’15 (KDD ’15). ACM, 2227–2236.
[49]
Zhenyu Zhao, Miao Chen, Don Matheson, and Maria Stone. 2016. Online Exper-
imentation Diagnosis and Troubleshooting Beyond AA Validation. In Proc. of
DSAA’16. IEEE, 498–507.
[50]
Zhengyuan Zhou, Susan Athey, and Stefan Wager. 2018. Offline Multi-Action
Policy Learning: Generalization and Optimization. arXiv:stat.ML/1810.04778