Horizon: Facebook’s Open Source Applied Reinforcement Learning Platform
References
Adomavicius, G. and Tuzhilin, A. Toward the next genera-
tion of recommender systems: A survey of the state-of-
the-art and possible extensions. IEEE Transactions on
Knowledge & Data Engineering, (6):734–749, 2005.
Agarwal, A., Bird, S., Cozowicz, M., Hoang, L., Langford,
J., Lee, S., Li, J., Melamed, D., Oshri, G., Ribas, O.,
et al. Making contextual decisions with low technical
debt. arXiv preprint arXiv:1606.03966, 2016.
Auer, P., Cesa-Bianchi, N., Freund, Y., and Schapire, R. E.
The nonstochastic multiarmed bandit problem. SIAM
journal on computing, 32(1):48–77, 2002.
Bellemare, M., Castro, P. S., Gelada, C., Kumar, S., and
Moitra, S. Dopamine, 2018. URL
https://github.com/google/dopamine.
Bellemare, M. G., Dabney, W., and Munos, R. A distri-
butional perspective on reinforcement learning. arXiv
preprint arXiv:1707.06887, 2017.
Bellman, R. A Markovian decision process. Journal of
Mathematics and Mechanics, pp. 679–684, 1957.
Bishop, C. M. Mixture density networks. Technical report,
1994.
Bottou, L., Peters, J., Quiñonero-Candela, J., Charles, D. X.,
Chickering, D. M., Portugaly, E., Ray, D., Simard, P.,
and Snelson, E. Counterfactual reasoning and learning
systems: The example of computational advertising. The
Journal of Machine Learning Research, 14(1):3207–3260,
2013.
Brockman, G., Cheung, V., Pettersson, L., Schneider, J.,
Schulman, J., Tang, J., and Zaremba, W. OpenAI Gym.
arXiv preprint arXiv:1606.01540, 2016.
Caspi, I., Leibovich, G., Novik, G., and Endrawis, S. Rein-
forcement learning coach, December 2017. URL
https://doi.org/10.5281/zenodo.1134899.
Chen, M., Beutel, A., Covington, P., Jain, S., Belletti, F.,
and Chi, E. H. Top-k off-policy correction for a REIN-
FORCE recommender system. In Proceedings of the Twelfth
ACM International Conference on Web Search and Data
Mining, pp. 456–464. ACM, 2019.
Deisenroth, M. and Rasmussen, C. E. PILCO: A model-based
and data-efficient approach to policy search. In Proceed-
ings of the 28th International Conference on machine
learning (ICML-11), pp. 465–472, 2011.
Dudík, M., Langford, J., and Li, L. Doubly robust policy
evaluation and learning. 2011.
Dulac-Arnold, G., Evans, R., van Hasselt, H., Sunehag, P.,
Lillicrap, T., Hunt, J., Mann, T., Weber, T., Degris, T., and
Coppin, B. Deep reinforcement learning in large discrete
action spaces. arXiv preprint arXiv:1512.07679, 2015.
Open Neural Network Exchange. ONNX GitHub repository, 2018.
Finn, C. and Levine, S. Deep visual foresight for planning
robot motion. In 2017 IEEE International Conference on
Robotics and Automation (ICRA), pp. 2786–2793. IEEE,
2017.
Fortunato, M., Azar, M. G., Piot, B., Menick, J., Osband, I.,
Graves, A., Mnih, V., Munos, R., Hassabis, D., Pietquin,
O., et al. Noisy networks for exploration. arXiv preprint
arXiv:1706.10295, 2017.
Fujimoto, S., Meger, D., and Precup, D. Off-policy deep re-
inforcement learning without exploration. arXiv preprint
arXiv:1812.02900, 2018.
Ha, D. and Schmidhuber, J. World models. arXiv preprint
arXiv:1803.10122, 2018.
Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. Soft
actor-critic: Off-policy maximum entropy deep reinforce-
ment learning with a stochastic actor. arXiv preprint
arXiv:1801.01290, 2018.
Hessel, M., Modayil, J., Van Hasselt, H., Schaul, T., Ostro-
vski, G., Dabney, W., Horgan, D., Piot, B., Azar, M., and
Silver, D. Rainbow: Combining improvements in deep
reinforcement learning. arXiv preprint arXiv:1710.02298,
2017.
Horvitz, D. G. and Thompson, D. J. A generalization of sam-
pling without replacement from a finite universe. Journal
of the American statistical Association, 47(260):663–685,
1952.
Huang, T.-W. TensorboardX.
https://github.com/lanpa/tensorboardX, 2018.
Ioffe, S. and Szegedy, C. Batch normalization: Accelerating
deep network training by reducing internal covariate shift.
arXiv preprint arXiv:1502.03167, 2015.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J.,
Girshick, R., Guadarrama, S., and Darrell, T. Caffe:
Convolutional architecture for fast feature embedding. In
Proceedings of the 22nd ACM international conference
on Multimedia, pp. 675–678. ACM, 2014.
Jiang, N. and Li, L. Doubly robust off-policy value evalu-
ation for reinforcement learning. In Proceedings of the
33rd International Conference on International Confer-
ence on Machine Learning (ICML), volume 48,
pp. 652–661. JMLR.org, 2016.