Application Papers and Class Presentations

Below is a list of possible application papers for class presentation.
You may choose a paper from the list below or suggest another paper. Suggested papers should generally be journal papers or papers published at a major conference (ICML, NIPS, etc.).
Papers marked in light green cannot be chosen. 


 J. Baxter, A. Tridgell and L. Weaver, “Learning to play chess using temporal differences”, Machine Learning, Vol. 40(3), 2000, pp. 243-263. 


 X. Yan, P. Diaconis, P. Rusmevichientong, and B. Van Roy, “Solitaire: Man Versus Machine”, Proc. Advances in Neural Information Processing Systems (NIPS) 17, 2005.  [Yuval B.]


 S. Gelly and Y. Wang, “Exploration-exploitation in Go: UCT for Monte-Carlo Go”, NIPS 2006. 


 D. Silver, R. Sutton and M. Müller, “Reinforcement learning of local shape in the game of Go”. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007. 


Remark: The earliest significant application of RL is Tesauro’s Backgammon (Shesh-Besh) program. This is still one of the most impressive ones. See:
G. Tesauro, “Temporal difference learning and TD-Gammon”, Communications of the ACM, Vol. 38(3), 1995, pp. 58-67. It is recommended to take a look at this paper, but it is not offered for presentation in class.

Computing, Scheduling and Networking:
 R. Crites and A. Barto. “Elevator Group Control Using Multiple Reinforcement Learning Agents.” Machine Learning 33, 1998, pp. 235-262. 


 S. Singh and D. Bertsekas, “Reinforcement learning in dynamic channel allocation in cellular telephone systems,” Proc. of NIPS-10 (Neural Information Processing Systems), 1997. 


A.Y. Zomaya et al., “Framework for reinforcement-based scheduling in parallel processor systems”, IEEE Trans. on Parallel and Distributed Systems, Vol. 9, March 1998, pp. 249-260.


 D.P. Bertsekas et al., “Missile defense and interceptor allocation by neuro-dynamic programming”, IEEE Transactions on Systems, Man and Cybernetics Part A, Vol 30(1), 2000, pp. 42-51. 


 P. Marbach, O. Mihatsch and J. Tsitsiklis, “Call admission control and routing in integrated services networks using neuro-dynamic programming”, IEEE Journal on Selected Areas in Communications, Vol. 18(2), Feb. 2000, pp. 197-208.  [Arik S.]


 H. Tong and T.X. Brown, “Reinforcement learning for call admission control and routing under quality of service constraints in multimedia networks”, Machine Learning, Vol. 49(2), 2002, pp. 111-139. 


 G. Tesauro et al., “Managing Power Consumption and Performance of Computing Systems Using Reinforcement Learning”, NIPS 2007. 


Robotics and Control:


J. Morimoto and K. Doya, “Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning”, Robotics and Autonomous Systems, Vol. 36, 2001, pp. 37-51.


C. Atkeson, A. Moore and S. Schaal, “Locally weighted learning for control”, Artificial Intelligence Review, Vol. 11(1-5), 1997, pp. 75-113.


 P. Stone, R.S. Sutton and G. Kuhlmann, “Reinforcement Learning for RoboCup-Soccer Keepaway”, Adaptive Behavior, vol. 13(3), 2005, pp. 165-188. 


 C. Kwok and D. Fox, “Reinforcement Learning for Sensing Strategies”. Proceedings of IROS, 2004.  [Amit W.]


 S. Jodogne and J.H. Piater, “Closed-Loop Learning of Visual Control Policies”, Journal of Artificial Intelligence Research 28, 2007, pp. 349–391.  [Yehuda F.]


 A. Ng et al., “Inverted autonomous helicopter flight via reinforcement learning”, in Proc. of the International Symposium on Experimental Robotics, 2004.
Together with: P. Abbeel et al., “An Application of Reinforcement Learning to Aerobatic Helicopter Flight”, in NIPS 19, 2007. 


J. Bagnell and J. Schneider, “Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods”, Proceedings of the International Conference on Robotics and Automation (ICRA) 2001.


J. Peng and E. Bhanu, “Closed-loop object recognition using reinforcement learning”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20(2), 1998, pp. 139-154.


T. Martínez-Marín and T. Duckett, “Fast reinforcement learning for vision-guided mobile robots”, in Proc. of the IEEE International Conference on Robotics and Automation, 2005, pp. 4170-4175. Also see: M. Shaker, S. Yue and T. Duckett, “Vision-based reinforcement learning using approximate policy iteration”, in 14th International Conference on Advanced Robotics (ICAR), June 2009.


P.-Y. Yin, “Maximum entropy-based optimal threshold selection using deterministic reinforcement learning with controlled randomization”, Signal Processing, Vol. 82, 2002, pp. 993-1006.


S. Singh, D. Litman, M. Kearns and M. Walker. “Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System”. Journal of Artificial Intelligence Research, Vol. 16, pp. 105-133, 2002.


Added 15/6:


Y. Li, C. Szepesvari and D. Schuurmans, “Learning Exercise Policies for American Options”, Proc. AISTATS 2009, 16-18 April 2009. Also look at: Y. Li and D. Schuurmans, “Policy Iteration for Learning an Exercise Policy for American Options”, Proc. EWRL 2008, pp. 165-178.


 S. Proper and P. Tadepalli, “Scaling Model-Based Average-reward Reinforcement Learning for Product Delivery”, in ECML 2006: Proceedings of the 17th European Conference on Machine Learning, pp. 735-742.  [Yahel D.]


 H.P. Simão, J. Day, A. George, T. Gifford and W.B. Powell, “An Approximate Dynamic Programming Algorithm for Large-Scale Fleet Management: A Case Application”, Transportation Science, Vol. 43(2), 2009, pp. 178-197.  [Doron P.]


 K. Papadaki and W.B. Powell, “An Adaptive Dynamic Programming Algorithm for a Stochastic Multiproduct Batch Dispatch Problem”, Naval Research Logistics, Vol. 50(7), 2003, pp. 742-769.  [Mark S.]