Week of 4/24

     Throughout the weeks, we worked to refine our knowledge of Q-learning and the whole jazz. Mr. Lee asked Elaine to help us create an aesthetically pleasing and functional dots and boxes interface. We saw the initial version, and to say that it's impressive would be an understatement. It looks really nice and well-made, and it works very well, so honestly, I have no room to complain. There were a few aesthetic details we wanted to touch up, and I'm honestly so thankful to Elaine for her help. The game is going great and I'm excited to see what we can work on in the future when we integrate the interface with the actual game itself. It's gonna be epic. 

Here's what it looks like right now. Exciting, right?

     Since I kind of have to show y'all the code, here's an excerpt: 



Whoever told us that Q-learning only took 1-2 lines of code freaking lied to us. There are variables to initialize, moves to define, game boards to set up... I couldn't have figured this out on my own. You can spot the basic Q-learning function near the bottom. I love going back to the basics. 
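For anyone curious what that "basic Q-learning function" boils down to, here is a minimal sketch of the textbook update rule in Python. This is not our actual code: the names `q_update`, `ALPHA`, and `GAMMA`, the values 0.1 and 0.9, and the way states and actions are represented are all assumptions made for illustration.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value, for illustration only)
GAMMA = 0.9  # discount factor (assumed value, for illustration only)


def q_update(q_table, state, action, reward, next_state, next_actions):
    """Apply one tabular Q-learning update and return the new Q-value.

    q_table maps (state, action) pairs to estimated values.
    next_actions lists the legal moves in next_state (empty if the game is over).
    """
    # Best value reachable from the next state; 0.0 when no moves remain.
    best_next = max((q_table[(next_state, a)] for a in next_actions), default=0.0)
    old = q_table[(state, action)]
    # Nudge the old estimate toward reward + discounted future value.
    q_table[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
    return q_table[(state, action)]


# Unseen (state, action) pairs start at 0.0.
q_table = defaultdict(float)
```

The update itself really is only a line or two. Everything else (the board, the legal moves, the rewards) is the setup that surrounds it.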

      I mentioned earlier that we already had code in our hands. That code had its own Q-tables, fitted and tested through pre-made code that was on GitHub. The pre-made Q-table was functional and gave us quite a challenge when we played against it, but relying on it troubled us deeply for a few reasons. First, we had no idea how to set up a Q-table on our own. That means that at best we'd just be copying someone else's work. That's not learning, not in my opinion. Second, with the current Q-table we could only play 4x4 games of dots and boxes. We couldn't expand or shrink the game board array, which locks us into one option and one way of playing. Third, we had no idea how many values the current Q-table had. Robert did mention that he found a way to fix a line or two and get it to work right, so we are hoping for that.

Here's the raw Q-table. Notice how many entries there are. Then notice the scroll bar on the side. And then think about how this is only for a 4x4 gameboard. 

      The best Q-table has already sorted through every single game state and action pair within a given game; that is, it already knows the results of certain games and moves and the exact rewards of performing each before it even plays the game. In the context of dots and boxes, however, this produces a big problem. A complete Q-table would require us to test every single game state and action pair. For dots and boxes, this takes an incredible amount of memory and time on a computer like Mr. Lee's. Normally, we run the code during the period, but that definitely was not enough time for us to build up any substantial Q-values. We wanted to run the code over the weekend, but it was totally my fault that I forgot to start it. I realized my mistake Saturday afternoon. That was incredibly disappointing. 
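To put a rough number on "an incredible amount of memory," here is a back-of-the-envelope sketch in Python. It assumes a state is nothing more than which lines have been drawn (ignoring scores, whose turn it is, and board symmetries), so the counts are for that simplified encoding and not necessarily for the way our code stores things. The function names are made up for this example.

```python
def edge_count(box_rows, box_cols):
    """Number of lines that can be drawn on a board of box_rows x box_cols boxes."""
    horizontal = (box_rows + 1) * box_cols
    vertical = box_rows * (box_cols + 1)
    return horizontal + vertical


def table_size(box_rows, box_cols):
    """Return (states, state_action_pairs) when a state is the set of drawn lines."""
    edges = edge_count(box_rows, box_cols)
    # Every line is either drawn or not drawn.
    states = 2 ** edges
    # Each state offers one action per undrawn line; summed over all states
    # this comes to edges * 2^(edges - 1).
    pairs = edges * 2 ** (edges - 1)
    return states, pairs


for size in (2, 3, 4):
    states, pairs = table_size(size, size)
    print(f"{size}x{size} boxes: {states:,} states, {pairs:,} state-action pairs")
```

Under these assumptions a board of 3x3 boxes (a 4x4 grid of dots) already has about 16.8 million line configurations, and a board of 4x4 boxes has over a trillion.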

This is printed after every 100 games and reports the ratio of wins to losses. A balanced (50/50) ratio would be the best Q-result. We haven't hit that value, but we hope that we can eventually run it for long enough for that to happen. 
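A report like that can be produced with very little code. The sketch below is a hypothetical stand-in and not the routine our program uses: it assumes each game's outcome is recorded as 1 for a win and 0 for a loss, and the name `win_ratios` is invented for this example.

```python
def win_ratios(results, block=100):
    """Return (games_played, win_fraction) for each full block of games.

    results is a list of 1 (win) or 0 (loss), one entry per game.
    Games left over after the last full block are ignored.
    """
    ratios = []
    for start in range(0, len(results) - block + 1, block):
        chunk = results[start:start + block]
        ratios.append((start + block, sum(chunk) / block))
    return ratios


# Example: 60 wins in the first 100 games, 50 wins in the second 100.
history = [1] * 60 + [0] * 40 + [1] * 50 + [0] * 50
for games, fraction in win_ratios(history):
    print(f"after {games} games: {fraction:.0%} wins")
```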

      Will wants to use our Q-tables with his Monte Carlo tree search. I'm still not sure exactly how that's gonna work out, but I think that he'll use the initial Q-reward values to basically jumpstart the Monte Carlo algorithm and see if that can increase its efficiency when "branching out" to its other decision nodes. I'm not too sure how the current code actually stores the game states and the rewards, but I'm sure Will is on it. He's been gone recently, and he has the code with him, so I think our best bet with the entire Q-learning/Monte-Carlo Frankenstein thing will be when he comes back and we can actually sit down and have a conversation about it.
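I can't speak for Will's plan, but one common way to "jumpstart" a tree search with prior knowledge is to create each new node as if it had already been visited a few times with the Q-table's value. The Python sketch below is only my guess at the idea: `Node`, `make_node`, `ucb1`, `prior_weight`, and the exploration constant 1.4 are all assumptions for illustration, not anything from Will's code.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    visits: float = 0.0       # how many times the search has tried this move
    total_value: float = 0.0  # sum of the results seen through this move


def make_node(q_table, state, action, prior_weight=5):
    """Create a search node pre-loaded with the Q-table's estimate.

    The node starts as though it had been visited prior_weight times, each
    time returning the stored Q-value (0.0 if the pair is not in the table).
    """
    q = q_table.get((state, action), 0.0)
    return Node(visits=prior_weight, total_value=q * prior_weight)


def ucb1(node, parent_visits, c=1.4):
    """Standard UCB1 score: average value plus an exploration bonus."""
    if node.visits == 0:
        return math.inf  # unvisited moves are always tried first
    average = node.total_value / node.visits
    return average + c * math.sqrt(math.log(parent_visits) / node.visits)


q_table = {("start", 0): 0.8}          # hypothetical learned value
good = make_node(q_table, "start", 0)  # move the Q-table likes
unknown = make_node(q_table, "start", 1)
print(ucb1(good, 10) > ucb1(unknown, 10))
```

With a seed like this, the search would lean toward moves the Q-table already rates highly, and real playouts would gradually outweigh the prior as visits pile up.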
