Skip to main content

Week of 3/11-- Technical Journal

Week of 3/11

     Our original goal for this week (these two weeks?) was to find a dataset we could train a supervised learning algorithm with to play tic-tac-toe with. We planned to create the game and then have Mr. Lee's 4th period AP Computer Science class play the game to generate the game states and the data for the algorithm to learn from. This, however, turned out to be more trouble than it was worth. For instance, we went online to find some datasets for tic-tac-toe game states. This would make the supervised learning process much easier. However, the data that we found either was formatted really weird or only represented endgame states. This created some problems. First, just having the endgame data sets won't teach the algorithm anything, and it wouldn't feed the algorithm with any information about optimal game states or plays/actions that it can make. Additionally, the strange formatting on some of the data make it unusable for us. We had to format the data in such a way that could be accessible by the code, and if we couldn't decipher the dataset to organize it in the first place, then the data would still be useless regardless. We also had a dataset that had every single gamestate, but since it doesn't have game state/action pairs, then it would be difficult to assign weights and rewards to those actions/states. 

x,x,x,x,o,o,x,o,o,positive
x,x,x,x,o,o,o,x,o,positive
x,x,x,x,o,o,o,b,b,positive
x,x,x,x,o,o,o,o,x,positive
x,x,x,x,o,o,b,o,b,positive
x,x,x,x,o,b,o,b,o,positive
x,x,x,x,o,o,b,b,o,positive
x,x,x,x,o,b,o,o,b,positive
x,x,x,o,x,o,x,o,o,positive
x,x,x,x,o,b,b,o,o,positive
x,x,x,x,b,o,o,o,b,positive
x,x,x,x,b,o,b,o,o,positive
x,x,x,x,b,o,o,b,o,positive
x,x,x,o,x,o,o,x,o,positive
x,x,x,o,x,b,o,o,b,positive
x,x,x,o,x,o,o,o,x,positive
x,x,x,o,x,o,o,b,b,positive
x,x,x,o,x,o,b,b,o,positive
x,x,x,o,x,o,b,o,b,positive
x,x,x,o,x,b,o,b,o,positive
x,x,x,o,o,x,o,o,x,positive
x,x,x,o,x,b,b,o,o,positive
x,x,x,o,o,x,x,o,o,positive
x,x,x,o,o,x,o,x,o,positive
Example from the UCI repository. This only has the endgame game states.
     We scrounged the web for more datasets, but our quest for Tic Tac Toe datasets turned out to be futile. Then, we decided to look in a direction and see if we could implement q-learning on a tic-tac-toe game board to let the agent learn how to play tic tac toe. It's kind of a solved game, but hey, it's always good to know what you're doing before you face the big project. We had a general idea of what to make and we mostly knew where we were going from, and some extensive research really blew our minds. I found some GitHub repositories of the games that used Q-agent learning in order to play tic-tac-toe, and I downloaded it and tried to test them out and decipher the code it contained within its treasure box. The best q-learning agent/code that we found only tested different Q-tables and created new Q-tables every time we ran it. It didn't actually allow for users to interact, and it only played the game by itself 100,000 times until it reached a tie every time it played itself. Or maybe I just was remiss in reading the code and didn't catch the part where I could run the game and play against the trained algorithm. We're still figuring that out.
The right side is the Tic-Tac-Toe code.
     Will, being the passionate Monte-Carlo Search Tree advocate that he is, found a code for tic-tac-toe that was played by a bot using a Monte-Carlo Search Tree engine. I've beat it a few times, but it's seriously impressive. It randomly explores different game states and actions and assigns its weights and rewards based on the game states that it explores. It then decides which game states should be pursued and further explored. It uses this process to test the importance and usefulness of each game state. The advantage of this search tree is that it's "smarter" than the Q-learning agents. This search tree can favor strategies, which is important especially when playing in a game like dots and boxes or tic-tac-toe.
Will's Monte-Carlo "baby"
     We also found a Github repository that had a dots-and-boxes played and learning using Q-learning. It's a pretty tough bot to beat, but I've bested it quite a few times before. I really like the interface it has, and I want to see how I can replicate it for our own dots and boxes game. One of the flaws I've seen with this game, however, is that it's not smart like the Monte-Carlo game. Its main purpose is just to get boxes; there are no strategies that can be found in this game. My main goals for this game is to be able to expand it's Q-table and maybe increase the size of the game board from a 4x4 gamespace to a 5x5 gamespace. I tried doing that, but because of the limitations of the Q-table, it wouldn't run. I will have to learn how to redo a Q-table for this game, which will take quite a while, if I'm being honest here. I'm overall pretty optimistic for our work ahead, though. :)
The left side is the Dots and Boxes code

Comments