Based on Sutton and Barto's "Reinforcement Learning", I did a simple python implementation of tic tac toe.
The algorithm has no knowledge of tic tac toe, except it knows when it has won or lost. By playing against you and itself it gradually learns how to play well.
This particular implementation does not take "exploratory" steps, but it would be simple to implement.
Python code