Week of 4/10

As I mentioned in my last blog post, no one showed up to the last Caltech meeting, so we really did not have much instruction on what we should be doing next. We had a lot of questions about implementing Q-learning and understanding the reward function, and the answers are critical to turning what we know into code for the actual project. We needed to reach either the professor or the grad student soon in order to move forward. To get the questions answered as quickly as possible, I emailed them to Dr. Hassibi and asked whether he could come in and help us work through them. While we waited for him to respond, we tried to dig a little deeper into what we already had.

Whoever told us that Q-learning only took 1-2 lines of code freaking lied to us. There are variables to initialize, moves to define, game boards to set up... I couldn't have figured this out on my own. You can spot the basic Q-learning function near the bottom. I love going back to the basics.
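For reference, the basic Q-learning update on its own can be sketched in a few lines of Python. This is not the code from our project: ALPHA, GAMMA, q_table, and q_update are names and values assumed purely for illustration, and the sketch ignores the turn-taking rules of dots and boxes.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value, not from our project)
GAMMA = 0.9  # discount factor (assumed value, not from our project)

# Q-table: maps (state, action) -> estimated value, defaulting to 0.0
q_table = defaultdict(float)


def q_update(state, action, reward, next_state, next_actions):
    """Apply one tabular Q-learning update and return the new Q-value."""
    # Best value reachable from the next state (0.0 if no moves remain)
    best_next = max(
        (q_table[(next_state, a)] for a in next_actions), default=0.0
    )
    target = reward + GAMMA * best_next
    # Move the old estimate a fraction ALPHA of the way toward the target
    q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])
    return q_table[(state, action)]
```

The update really is only a couple of lines. Everything else (the board, the legal moves, the rewards) is the setup that makes the real code so much longer.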

I mentioned earlier that we already had code in our hands. That code had its own Q-tables, fitted and tested through pre-made code that was on GitHub. The pre-made Q-table worked and gave us quite a challenge when we played against it, but relying on it troubled us deeply for a few reasons. First, we had no idea how to set up a Q-table on our own, which means that at best we'd just be copying someone else's work. That's not learning, not in my opinion. Second, with the current Q-table we could only play 4x4 games of dots and boxes. The game board array could not be expanded or shrunk, which locks us into one option and one way of playing. Third, we had no idea how many values the current Q-table had.

Here's the raw Q-table. Notice how many entries there are. Then notice the scroll bar on the side. And then think about how this is only for a 4x4 gameboard.

The best Q-table has already sorted through every game state and action pair within a given game; that is, it already knows the results of certain games and moves, and the exact reward for each, before it even plays. In the context of dots and boxes, however, this creates some problems. A complete Q-table would require us to test every single game state and action pair, which takes an incredible amount of memory and time on a computer like Mr. Lee's. Normally we run the code during the period, but that was definitely not enough time to build up any substantial Q-values. We tried running the code over the weekend, but the power went out and we were left with no results. That was incredibly disappointing.
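One way to avoid losing a whole weekend of training to a power outage is to save the Q-table to disk every so often. Here is a minimal sketch using Python's standard pickle module. It assumes the Q-table is a plain dictionary, and save_q_table and load_q_table are hypothetical helper names, not functions from our code.

```python
import os
import pickle


def save_q_table(q_table, path):
    """Write the Q-table to disk so a crash does not wipe out training."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(dict(q_table), f)
    # Swap the finished file into place so a half-written file
    # never overwrites the last good checkpoint
    os.replace(tmp_path, path)


def load_q_table(path):
    """Load a saved Q-table, or start with an empty one if none exists."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        return pickle.load(f)
```

Calling save_q_table after every few hundred games would mean an outage costs only the games played since the last save.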

This is printed after every 100 games and reports the ratio of wins to losses. A balanced (50/50) ratio would be the best Q-result. We haven't hit that value yet, but we hope we can eventually run it long enough for that to happen.
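The kind of bookkeeping behind a report like this can be sketched as follows. This is only an illustration, not our actual code: win_rates is a hypothetical function, and it assumes each game is recorded as 1 for a win and 0 for a loss.

```python
def win_rates(outcomes, block_size=100):
    """Return the fraction of wins in each full block of games."""
    rates = []
    # Step through the results one full block at a time,
    # ignoring any leftover games at the end
    for start in range(0, len(outcomes) - block_size + 1, block_size):
        block = outcomes[start:start + block_size]
        rates.append(sum(block) / block_size)
    return rates
```

A value near 0.5 in every block would correspond to the balanced ratio described above.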

We are making Q-tables with a 5x5 game board. What makes this difficult is that a 4x4 game board already requires 2^24 unique game states (not counting the actions). A 5x5 game board requires 2^40 game states, which is insanely huge if you actually think about it. That's probably one of the reasons why the Q-table is taking forever to establish its Q-values. I might want to test the code by downsizing the game board to 3x3 before I expand it to 5x5. That might be my next goal.

Dr. Hassibi visited us on Monday. It was really awkward because he kept asking us back the same questions we tried to ask him, particularly about Q-values and reward functions. He's coming back tomorrow (Wednesday), so I hope some progress can be made. I brushed up a little on the basics of the Q-learning algorithm, so hopefully that will make it less awkward.
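To double-check the board-size numbers earlier in this post, here is a quick sketch of the arithmetic. It assumes "4x4" means a 4x4 grid of dots, which is what matches the 2^24 figure: a grid with n dots per side has 2n(n-1) possible lines, and each line is either drawn or not. The function names are made up for illustration.

```python
def edge_count(dots_per_side):
    """Number of lines that can be drawn on an n x n grid of dots."""
    n = dots_per_side
    # n rows of (n - 1) horizontal lines plus n columns of (n - 1) vertical lines
    return 2 * n * (n - 1)


def state_count(dots_per_side):
    """Each line is either drawn or not, so there are 2^edges board states."""
    return 2 ** edge_count(dots_per_side)


for n in (3, 4, 5):
    print(f"{n}x{n} dots: {edge_count(n)} lines, {state_count(n):,} states")
```

This gives 12 lines and 4,096 states for 3x3, 24 lines and 16,777,216 states for 4x4, and 40 lines and 1,099,511,627,776 states for 5x5, which is why starting with a 3x3 board looks so much more manageable.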

Update! We had a conversation with Dr. Hassibi. It was productive, and we crystallized the meaning of the reward function and how to find game states.

The fruits of our discussion.

$73M C4L73CH J0URN4L
