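Yesterday I implemented the Sarsa(λ) method using a neural network for function approximation.
But it didn't work well, so I tried changing a bunch of parameters to see what happens.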
Parameter tests
There are four parameters that need tuning:
- number of hidden-layer units
- step size
- trace-decay parameter λ
- number of training games
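For reference, here is roughly what the step size and the trace-decay parameter λ do in Sarsa(λ). This is a minimal sketch with a *linear* action-value function and accumulating eligibility traces, not the neural-network version (NNSarsaCom's internals aren't shown in this post). With a network, the feature vector in the trace update is replaced by the gradient of the network's output with respect to its weights. The class name `LinearSarsaLambda` and the discount rate `gamma` are assumptions made for illustration.

```ruby
# Minimal Sarsa(lambda) with a LINEAR action-value function and accumulating traces.
# Illustration only: NNSarsaCom uses a neural network, whose internals are not shown here.
class LinearSarsaLambda
  attr_reader :weights

  # gamma (discount rate) is an assumption; 1.0 means no discounting.
  def initialize(num_features, step_size:, td_lambda:, gamma: 1.0)
    @weights = Array.new(num_features, 0.0)
    @traces = Array.new(num_features, 0.0)
    @step_size = step_size
    @td_lambda = td_lambda
    @gamma = gamma
  end

  # Q(s, a) = w . x(s, a)
  def q(features)
    @weights.zip(features).sum { |w, x| w * x }
  end

  # Clear the eligibility traces at the start of every episode (game).
  def reset_traces
    @traces.fill(0.0)
  end

  # One update for the step (x, r, x'); pass nil as next_features when the episode ends.
  def update(features, reward, next_features)
    target = reward + (next_features ? @gamma * q(next_features) : 0.0)
    delta = target - q(features) # TD error
    # decay every trace by gamma * lambda, then add the gradient of Q
    # (for a linear model that gradient is just the feature vector)
    @traces.each_index do |i|
      @traces[i] = @gamma * @td_lambda * @traces[i] + features[i]
    end
    # move every weight along its trace, scaled by the step size and the TD error
    @weights.each_index do |i|
      @weights[i] += @step_size * delta * @traces[i]
    end
    delta
  end
end

# Two-step episode: A (features [1, 0]) -> B ([0, 1]) -> reward 1, end.
[0.0, 0.5].each do |td_lambda|
  agent = LinearSarsaLambda.new(2, step_size: 0.5, td_lambda: td_lambda)
  agent.reset_traces
  agent.update([1.0, 0.0], 0.0, [0.0, 1.0])
  agent.update([0.0, 1.0], 1.0, nil)
  puts "lambda = #{td_lambda}: weights = #{agent.weights.inspect}"
end
```

With λ = 0 only the last state's weight is updated (weights `[0.0, 0.5]`). With λ = 0.5, part of the final reward also flows back to the first state (weights `[0.25, 0.5]`). The step size scales both updates.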
So I checked what happens when each of them is varied as follows:
- number of hidden-layer units: 4, 8, 16
- step size: 0.01, 0.005, 0.001
- trace-decay parameter λ: 0.3, 0.6, 0.9
- number of training games: 20,000, 40,000, 60,000, 80,000, 100,000
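The grid grows quickly. Here is a small sketch in plain Ruby, using `Array#product`, that counts the parameter settings, the self-play games the "create" step runs in total, and the number of players saved per side:

```ruby
hidden_unit_sizes = [4, 8, 16]
step_sizes = [0.01, 0.005, 0.001]
td_lambdas = [0.3, 0.6, 0.9]
time_stamps = [20000, 40000, 60000, 80000, 100000]
sample_size = 10

# Every (hidden units, step size, lambda) triple: 3 * 3 * 3 combinations.
settings = hidden_unit_sizes.product(step_sizes, td_lambdas)

# Each sample trains from scratch up to the last checkpoint.
games_played = settings.size * sample_size * time_stamps.max
# One snapshot per setting, sample and checkpoint.
saved_players_per_side = settings.size * sample_size * time_stamps.size

puts "settings: #{settings.size}"
puts "self-play games in total: #{games_played}"
puts "saved players per side: #{saved_players_per_side}"
```

That comes to 27 settings, 27,000,000 training games, and 1,350 saved players per side. The last number is also how many opponents each player faces when every ○ plays every ×.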
Here's the code I wrote.
```ruby
#====================
# parameter_test.rb
#--------------------
# Check how learning differs depending on the parameters
#====================

require_relative "mark"
require_relative "game"
require_relative "nn_sarsa_com"

sample_size = 10
hidden_unit_sizes = [4, 8, 16]
step_sizes = [0.01, 0.005, 0.001]
td_lambdas = [0.3, 0.6, 0.9]
time_stamps = [20000, 40000, 60000, 80000, 100000]
# (the sample index is padded to 3 digits for maru but 2 for batsu;
#  saving and loading use the same formats, so the file names still match)
maru_filename_format = "parameter_test/maru%03d_hidden%02d_step%5.3f_lambda%3.1f_%06d.dat"
batsu_filename_format = "parameter_test/batsu%02d_hidden%02d_step%5.3f_lambda%3.1f_%06d.dat"

# create: train sample_size fresh pairs for every parameter combination
# and save both players at each checkpoint in time_stamps
if ARGV[0] == "create"
  hidden_unit_sizes.each do |hidden_unit_size|
    step_sizes.each do |step_size|
      td_lambdas.each do |td_lambda|
        sample_size.times do |sample|
          # the third constructor argument (0.1) is the same in every run
          maru_player = NNSarsaCom.new(Mark::Maru, hidden_unit_size, 0.1, step_size, td_lambda)
          batsu_player = NNSarsaCom.new(Mark::Batsu, hidden_unit_size, 0.1, step_size, td_lambda)
          (1..time_stamps.max).each do |i|
            game = Game.new(maru_player, batsu_player)
            game.start(false)
            next unless time_stamps.include?(i)

            puts "[hidden: #{hidden_unit_size}, step: #{step_size}, lambda: #{td_lambda}] sample: #{sample} - #{i}"
            maru_filename = sprintf maru_filename_format, sample, hidden_unit_size, step_size, td_lambda, i
            batsu_filename = sprintf batsu_filename_format, sample, hidden_unit_size, step_size, td_lambda, i
            maru_player.save(maru_filename)
            batsu_player.save(batsu_filename)
          end
        end
      end
    end
  end
end

# top-level instance variables, so the methods below can see these settings
@sample_size = sample_size
@maru_filename_format = maru_filename_format
@batsu_filename_format = batsu_filename_format

# load the sample_size maru players saved for one setting and checkpoint, with learning off
def load_maru_players(hidden_unit_size, step_size, td_lambda, time_stamp)
  Array.new(@sample_size) do |sample|
    filename = sprintf @maru_filename_format, sample, hidden_unit_size, step_size, td_lambda, time_stamp
    maru_player = NNSarsaCom.load(filename)
    maru_player.learn_mode = false
    maru_player
  end
end

# same for batsu
def load_batsu_players(hidden_unit_size, step_size, td_lambda, time_stamp)
  Array.new(@sample_size) do |sample|
    filename = sprintf @batsu_filename_format, sample, hidden_unit_size, step_size, td_lambda, time_stamp
    batsu_player = NNSarsaCom.load(filename)
    batsu_player.learn_mode = false
    batsu_player
  end
end

# play every maru against every batsu once (learning off) and
# return [maru win rate, batsu win rate, draw rate]
def check_win_rate(maru_players, batsu_players)
  total_count = 0.0
  maru_win_count = 0.0
  batsu_win_count = 0.0
  draw_count = 0.0
  maru_players.each do |maru_player|
    batsu_players.each do |batsu_player|
      total_count += 1.0
      maru_player.learn_mode = false
      batsu_player.learn_mode = false
      game = Game.new(maru_player, batsu_player)
      case game.start(false)
      when Mark::Maru
        maru_win_count += 1.0
      when Mark::Batsu
        batsu_win_count += 1.0
      when Mark::Empty
        draw_count += 1.0
      end
    end
  end
  [maru_win_count / total_count, batsu_win_count / total_count, draw_count / total_count]
end

# (to be continued)
```
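When "create" is passed as the argument, 10 runs are made for each combination of hidden-unit count, step size and λ.
Each run saves its data when training reaches 20,000, 40,000, 60,000, 80,000 and 100,000 games, so I can try various things with it later.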
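`NNSarsaCom#save` and `NNSarsaCom.load` aren't shown in this post. Here is a hypothetical sketch of how that kind of checkpointing could work in plain Ruby: a save/load round trip with `Marshal`, using a made-up `ToyPlayer` class in place of the real player and the same file-name format as the maru files.

```ruby
require "tmpdir"

# Hypothetical stand-in for a learning player; NOT the real NNSarsaCom.
class ToyPlayer
  attr_accessor :weights, :learn_mode

  def initialize(weights)
    @weights = weights
    @learn_mode = true
  end

  # Serialize the whole object with Marshal (binary mode).
  def save(filename)
    File.binwrite(filename, Marshal.dump(self))
  end

  # Only load files you wrote yourself: Marshal.load on untrusted data is unsafe.
  def self.load(filename)
    Marshal.load(File.binread(filename))
  end
end

MARU_FILENAME_FORMAT = "maru%03d_hidden%02d_step%5.3f_lambda%3.1f_%06d.dat"

saved_name = nil
restored = nil
Dir.mktmpdir do |dir|
  player = ToyPlayer.new([0.1, -0.2])
  # sample 5, 4 hidden units, step 0.01, lambda 0.6, 40,000 games
  filename = File.join(dir, format(MARU_FILENAME_FORMAT, 5, 4, 0.01, 0.6, 40000))
  player.save(filename)
  saved_name = File.basename(filename)
  restored = ToyPlayer.load(filename)
  restored.learn_mode = false # evaluate without further learning
end

puts saved_name
p restored.weights
```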
I also wrote a method for easily restoring the saved data and a method for checking win rates.
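`check_win_rate` essentially tallies the outcomes of all 10 × 10 pairings. Below is a more compact sketch of the same computation using `Array#product` and `Enumerable#tally` (Ruby 2.7+). The hypothetical symbols `:maru`/`:batsu`/`:draw` replace `Mark::Maru`/`Mark::Batsu`/`Mark::Empty`, and a block stands in for playing one game.

```ruby
# Hypothetical outcome symbols in place of Mark::Maru / Mark::Batsu / Mark::Empty.
OUTCOMES = [:maru, :batsu, :draw].freeze

# Fractions of maru wins, batsu wins and draws, in that order.
def win_rates(results)
  counts = results.tally
  total = results.size.to_f
  OUTCOMES.map { |outcome| counts.fetch(outcome, 0) / total }
end

# Every maru plays every batsu once; the block plays one game and returns an outcome.
def round_robin_rates(maru_players, batsu_players)
  results = maru_players.product(batsu_players).map { |maru, batsu| yield(maru, batsu) }
  win_rates(results)
end

# Same cell format as the tables: "maru win/batsu win/draw".
def format_cell(rates)
  format("%4.2f/%4.2f/%4.2f", *rates)
end

# Toy usage: players are plain numbers and the larger number wins.
rates = round_robin_rates([3, 5], [1, 4]) do |maru, batsu|
  if maru > batsu then :maru
  elsif maru < batsu then :batsu
  else :draw
  end
end
puts format_cell(rates)
```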
Checking whether learning is progressing
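The first thing I did was check whether learning was progressing properly at all.
In theory, if learning is progressing, a player should beat opponents that have trained for fewer games.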
```ruby
# (continued)

# time_stamp: for each setting, play every maru checkpoint against every batsu checkpoint
if ARGV[0] == "time_stamp"
  hidden_unit_sizes.each do |hidden_unit_size|
    step_sizes.each do |step_size|
      td_lambdas.each do |td_lambda|
        puts "[hidden unit: #{hidden_unit_size}, step size: #{step_size}, td lambda: #{td_lambda}]"
        puts "---------------------------------------------------------------------------------------"
        puts "maru\\batsu |          20000          40000          60000          80000         100000"
        puts "---------------------------------------------------------------------------------------"
        time_stamps.each do |maru_time_stamp|
          maru_players = load_maru_players(hidden_unit_size, step_size, td_lambda, maru_time_stamp)
          printf "%10d |", maru_time_stamp
          time_stamps.each do |batsu_time_stamp|
            batsu_players = load_batsu_players(hidden_unit_size, step_size, td_lambda, batsu_time_stamp)
            # one cell: maru win / batsu win / draw rates over 10 x 10 games
            result = check_win_rate(maru_players, batsu_players)
            printf " %4.2f/%4.2f/%4.2f", *result
          end
          print "\n"
        end
        puts "---------------------------------------------------------------------------------------"
        puts ""
      end
    end
  end
end

# (to be continued)
```
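When "time_stamp" is passed as the argument, the script shows, for each combination of hidden-unit count, step size and λ, the win rates when the AIs saved at each training count play one another.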
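Here's the result of running it.
Each cell reads "○ (maru) win rate / × (batsu) win rate / draw rate". Rows are the ○ player's training count and columns are the × player's.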
```
[hidden unit: 4, step size: 0.01, td lambda: 0.3]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.70/0.27/0.03 0.68/0.27/0.05 0.66/0.31/0.03 0.69/0.27/0.04 0.71/0.28/0.01
     40000 | 0.66/0.33/0.01 0.76/0.23/0.01 0.61/0.31/0.08 0.67/0.30/0.03 0.70/0.28/0.02
     60000 | 0.72/0.28/0.00 0.73/0.22/0.05 0.71/0.23/0.06 0.73/0.21/0.06 0.72/0.24/0.04
     80000 | 0.63/0.33/0.04 0.63/0.28/0.09 0.74/0.24/0.02 0.76/0.21/0.03 0.70/0.29/0.01
    100000 | 0.71/0.24/0.05 0.74/0.25/0.01 0.72/0.27/0.01 0.81/0.17/0.02 0.73/0.26/0.01
---------------------------------------------------------------------------------------

[hidden unit: 4, step size: 0.01, td lambda: 0.6]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.77/0.21/0.02 0.68/0.29/0.03 0.76/0.21/0.03 0.74/0.24/0.02 0.76/0.18/0.06
     40000 | 0.77/0.22/0.01 0.78/0.21/0.01 0.77/0.21/0.02 0.72/0.25/0.03 0.78/0.21/0.01
     60000 | 0.77/0.19/0.04 0.70/0.27/0.03 0.73/0.23/0.04 0.71/0.25/0.04 0.70/0.28/0.02
     80000 | 0.84/0.11/0.05 0.78/0.21/0.01 0.81/0.17/0.02 0.77/0.19/0.04 0.79/0.16/0.05
    100000 | 0.90/0.08/0.02 0.88/0.08/0.04 0.86/0.10/0.04 0.83/0.14/0.03 0.85/0.13/0.02
---------------------------------------------------------------------------------------

[hidden unit: 4, step size: 0.01, td lambda: 0.9]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.70/0.26/0.04 0.64/0.31/0.05 0.64/0.30/0.06 0.68/0.26/0.06 0.69/0.24/0.07
     40000 | 0.75/0.20/0.05 0.76/0.23/0.01 0.81/0.15/0.04 0.68/0.31/0.01 0.78/0.18/0.04
     60000 | 0.83/0.14/0.03 0.86/0.10/0.04 0.75/0.19/0.06 0.74/0.21/0.05 0.80/0.16/0.04
     80000 | 0.80/0.17/0.03 0.90/0.09/0.01 0.80/0.14/0.06 0.80/0.17/0.03 0.82/0.14/0.04
    100000 | 0.83/0.16/0.01 0.76/0.21/0.03 0.79/0.18/0.03 0.80/0.20/0.00 0.83/0.14/0.03
---------------------------------------------------------------------------------------

[hidden unit: 4, step size: 0.005, td lambda: 0.3]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.74/0.23/0.03 0.79/0.19/0.02 0.80/0.18/0.02 0.72/0.25/0.03 0.76/0.22/0.02
     40000 | 0.68/0.28/0.04 0.75/0.20/0.05 0.70/0.25/0.05 0.67/0.26/0.07 0.66/0.28/0.06
     60000 | 0.67/0.26/0.07 0.63/0.32/0.05 0.63/0.33/0.04 0.67/0.28/0.05 0.67/0.28/0.05
     80000 | 0.64/0.25/0.11 0.70/0.25/0.05 0.65/0.29/0.06 0.67/0.27/0.06 0.68/0.29/0.03
    100000 | 0.70/0.23/0.07 0.75/0.21/0.04 0.69/0.22/0.09 0.69/0.28/0.03 0.70/0.28/0.02
---------------------------------------------------------------------------------------

[hidden unit: 4, step size: 0.005, td lambda: 0.6]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.74/0.26/0.00 0.78/0.20/0.02 0.81/0.19/0.00 0.72/0.22/0.06 0.74/0.22/0.04
     40000 | 0.81/0.16/0.03 0.82/0.15/0.03 0.84/0.15/0.01 0.72/0.24/0.04 0.73/0.23/0.04
     60000 | 0.73/0.24/0.03 0.79/0.18/0.03 0.79/0.19/0.02 0.72/0.23/0.05 0.67/0.25/0.08
     80000 | 0.77/0.22/0.01 0.81/0.17/0.02 0.76/0.22/0.02 0.73/0.24/0.03 0.73/0.22/0.05
    100000 | 0.82/0.15/0.03 0.85/0.12/0.03 0.83/0.13/0.04 0.78/0.16/0.06 0.81/0.14/0.05
---------------------------------------------------------------------------------------

[hidden unit: 4, step size: 0.005, td lambda: 0.9]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.76/0.19/0.05 0.74/0.22/0.04 0.69/0.28/0.03 0.62/0.32/0.06 0.63/0.32/0.05
     40000 | 0.77/0.16/0.07 0.71/0.24/0.05 0.74/0.21/0.05 0.79/0.17/0.04 0.72/0.23/0.05
     60000 | 0.79/0.18/0.03 0.72/0.25/0.03 0.70/0.27/0.03 0.74/0.22/0.04 0.73/0.22/0.05
     80000 | 0.90/0.09/0.01 0.87/0.10/0.03 0.84/0.15/0.01 0.87/0.12/0.01 0.82/0.13/0.05
    100000 | 0.86/0.13/0.01 0.83/0.14/0.03 0.80/0.17/0.03 0.79/0.17/0.04 0.81/0.15/0.04
---------------------------------------------------------------------------------------

[hidden unit: 4, step size: 0.001, td lambda: 0.3]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.61/0.29/0.10 0.64/0.24/0.12 0.62/0.28/0.10 0.65/0.28/0.07 0.66/0.25/0.09
     40000 | 0.66/0.30/0.04 0.66/0.25/0.09 0.63/0.26/0.11 0.69/0.28/0.03 0.67/0.27/0.06
     60000 | 0.65/0.34/0.01 0.68/0.29/0.03 0.69/0.26/0.05 0.71/0.22/0.07 0.66/0.27/0.07
     80000 | 0.62/0.31/0.07 0.77/0.20/0.03 0.73/0.21/0.06 0.73/0.23/0.04 0.65/0.28/0.07
    100000 | 0.64/0.31/0.05 0.72/0.23/0.05 0.68/0.25/0.07 0.66/0.27/0.07 0.62/0.27/0.11
---------------------------------------------------------------------------------------

[hidden unit: 4, step size: 0.001, td lambda: 0.6]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.71/0.21/0.08 0.65/0.28/0.07 0.63/0.33/0.04 0.57/0.40/0.03 0.67/0.30/0.03
     40000 | 0.75/0.17/0.08 0.71/0.20/0.09 0.62/0.32/0.06 0.56/0.34/0.10 0.71/0.25/0.04
     60000 | 0.77/0.12/0.11 0.71/0.21/0.08 0.63/0.27/0.10 0.57/0.33/0.10 0.66/0.27/0.07
     80000 | 0.82/0.14/0.04 0.82/0.13/0.05 0.81/0.18/0.01 0.72/0.22/0.06 0.75/0.23/0.02
    100000 | 0.75/0.18/0.07 0.74/0.18/0.08 0.77/0.20/0.03 0.73/0.23/0.04 0.76/0.21/0.03
---------------------------------------------------------------------------------------

[hidden unit: 4, step size: 0.001, td lambda: 0.9]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.64/0.29/0.07 0.61/0.30/0.09 0.57/0.38/0.05 0.58/0.37/0.05 0.61/0.37/0.02
     40000 | 0.76/0.16/0.08 0.68/0.23/0.09 0.69/0.27/0.04 0.73/0.24/0.03 0.75/0.23/0.02
     60000 | 0.67/0.24/0.09 0.66/0.23/0.11 0.67/0.27/0.06 0.66/0.30/0.04 0.67/0.27/0.06
     80000 | 0.67/0.26/0.07 0.59/0.30/0.11 0.65/0.28/0.07 0.64/0.30/0.06 0.66/0.30/0.04
    100000 | 0.62/0.32/0.06 0.60/0.34/0.06 0.64/0.31/0.05 0.58/0.35/0.07 0.62/0.32/0.06
---------------------------------------------------------------------------------------

[hidden unit: 8, step size: 0.01, td lambda: 0.3]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.80/0.18/0.02 0.78/0.17/0.05 0.77/0.21/0.02 0.76/0.24/0.00 0.76/0.23/0.01
     40000 | 0.77/0.17/0.06 0.77/0.16/0.07 0.74/0.21/0.05 0.72/0.23/0.05 0.68/0.29/0.03
     60000 | 0.89/0.11/0.00 0.77/0.18/0.05 0.81/0.17/0.02 0.78/0.22/0.00 0.72/0.26/0.02
     80000 | 0.83/0.14/0.03 0.83/0.11/0.06 0.74/0.20/0.06 0.78/0.17/0.05 0.72/0.23/0.05
    100000 | 0.88/0.11/0.01 0.80/0.15/0.05 0.84/0.16/0.00 0.75/0.20/0.05 0.75/0.22/0.03
---------------------------------------------------------------------------------------

[hidden unit: 8, step size: 0.01, td lambda: 0.6]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.77/0.20/0.03 0.72/0.26/0.02 0.70/0.23/0.07 0.70/0.25/0.05 0.74/0.23/0.03
     40000 | 0.82/0.13/0.05 0.82/0.15/0.03 0.78/0.15/0.07 0.76/0.21/0.03 0.75/0.20/0.05
     60000 | 0.78/0.18/0.04 0.78/0.19/0.03 0.72/0.19/0.09 0.66/0.26/0.08 0.75/0.20/0.05
     80000 | 0.82/0.12/0.06 0.80/0.14/0.06 0.79/0.16/0.05 0.75/0.17/0.08 0.80/0.15/0.05
    100000 | 0.89/0.10/0.01 0.84/0.13/0.03 0.87/0.09/0.04 0.86/0.12/0.02 0.89/0.06/0.05
---------------------------------------------------------------------------------------

[hidden unit: 8, step size: 0.01, td lambda: 0.9]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.79/0.19/0.02 0.68/0.31/0.01 0.72/0.25/0.03 0.62/0.31/0.07 0.62/0.28/0.10
     40000 | 0.84/0.13/0.03 0.76/0.23/0.01 0.81/0.12/0.07 0.75/0.21/0.04 0.61/0.27/0.12
     60000 | 0.82/0.17/0.01 0.80/0.20/0.00 0.74/0.22/0.04 0.69/0.29/0.02 0.69/0.24/0.07
     80000 | 0.81/0.15/0.04 0.85/0.13/0.02 0.73/0.20/0.07 0.79/0.17/0.04 0.70/0.19/0.11
    100000 | 0.89/0.07/0.04 0.84/0.11/0.05 0.78/0.14/0.08 0.85/0.07/0.08 0.75/0.13/0.12
---------------------------------------------------------------------------------------

[hidden unit: 8, step size: 0.005, td lambda: 0.3]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.76/0.19/0.05 0.69/0.26/0.05 0.69/0.26/0.05 0.67/0.25/0.08 0.69/0.25/0.06
     40000 | 0.77/0.22/0.01 0.77/0.18/0.05 0.74/0.22/0.04 0.65/0.25/0.10 0.64/0.24/0.12
     60000 | 0.81/0.18/0.01 0.74/0.22/0.04 0.73/0.22/0.05 0.68/0.22/0.10 0.70/0.18/0.12
     80000 | 0.80/0.19/0.01 0.76/0.22/0.02 0.75/0.19/0.06 0.74/0.22/0.04 0.78/0.16/0.06
    100000 | 0.74/0.21/0.05 0.72/0.23/0.05 0.72/0.26/0.02 0.73/0.23/0.04 0.80/0.14/0.06
---------------------------------------------------------------------------------------

[hidden unit: 8, step size: 0.005, td lambda: 0.6]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.74/0.21/0.05 0.73/0.20/0.07 0.74/0.20/0.06 0.76/0.20/0.04 0.76/0.24/0.00
     40000 | 0.76/0.18/0.06 0.77/0.20/0.03 0.74/0.23/0.03 0.73/0.23/0.04 0.74/0.24/0.02
     60000 | 0.72/0.22/0.06 0.80/0.17/0.03 0.75/0.24/0.01 0.79/0.21/0.00 0.83/0.15/0.02
     80000 | 0.87/0.10/0.03 0.80/0.18/0.02 0.80/0.20/0.00 0.81/0.19/0.00 0.81/0.19/0.00
    100000 | 0.82/0.10/0.08 0.87/0.13/0.00 0.88/0.11/0.01 0.84/0.15/0.01 0.86/0.12/0.02
---------------------------------------------------------------------------------------

[hidden unit: 8, step size: 0.005, td lambda: 0.9]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.79/0.17/0.04 0.76/0.24/0.00 0.71/0.28/0.01 0.72/0.27/0.01 0.69/0.30/0.01
     40000 | 0.83/0.16/0.01 0.79/0.19/0.02 0.73/0.23/0.04 0.76/0.20/0.04 0.78/0.17/0.05
     60000 | 0.82/0.18/0.00 0.82/0.15/0.03 0.77/0.19/0.04 0.73/0.22/0.05 0.83/0.15/0.02
     80000 | 0.89/0.11/0.00 0.86/0.12/0.02 0.82/0.15/0.03 0.80/0.15/0.05 0.86/0.10/0.04
    100000 | 0.86/0.14/0.00 0.88/0.09/0.03 0.84/0.11/0.05 0.85/0.10/0.05 0.79/0.14/0.07
---------------------------------------------------------------------------------------

[hidden unit: 8, step size: 0.001, td lambda: 0.3]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.73/0.23/0.04 0.68/0.28/0.04 0.64/0.31/0.05 0.62/0.32/0.06 0.67/0.29/0.04
     40000 | 0.71/0.27/0.02 0.67/0.26/0.07 0.67/0.27/0.06 0.60/0.34/0.06 0.67/0.30/0.03
     60000 | 0.70/0.24/0.06 0.71/0.24/0.05 0.68/0.28/0.04 0.62/0.34/0.04 0.63/0.31/0.06
     80000 | 0.72/0.19/0.09 0.66/0.26/0.08 0.69/0.25/0.06 0.67/0.27/0.06 0.69/0.26/0.05
    100000 | 0.73/0.21/0.06 0.69/0.24/0.07 0.72/0.22/0.06 0.70/0.28/0.02 0.71/0.26/0.03
---------------------------------------------------------------------------------------

[hidden unit: 8, step size: 0.001, td lambda: 0.6]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.62/0.31/0.07 0.63/0.27/0.10 0.64/0.30/0.06 0.64/0.30/0.06 0.61/0.36/0.03
     40000 | 0.73/0.18/0.09 0.71/0.21/0.08 0.67/0.25/0.08 0.63/0.30/0.07 0.67/0.30/0.03
     60000 | 0.69/0.24/0.07 0.73/0.22/0.05 0.68/0.28/0.04 0.68/0.27/0.05 0.68/0.30/0.02
     80000 | 0.72/0.23/0.05 0.76/0.21/0.03 0.66/0.29/0.05 0.68/0.27/0.05 0.70/0.28/0.02
    100000 | 0.72/0.22/0.06 0.79/0.16/0.05 0.76/0.23/0.01 0.75/0.19/0.06 0.72/0.20/0.08
---------------------------------------------------------------------------------------

[hidden unit: 8, step size: 0.001, td lambda: 0.9]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.78/0.14/0.08 0.72/0.16/0.12 0.66/0.21/0.13 0.73/0.18/0.09 0.70/0.21/0.09
     40000 | 0.84/0.11/0.05 0.75/0.17/0.08 0.79/0.15/0.06 0.77/0.15/0.08 0.77/0.17/0.06
     60000 | 0.76/0.18/0.06 0.72/0.22/0.06 0.76/0.21/0.03 0.76/0.21/0.03 0.76/0.22/0.02
     80000 | 0.86/0.07/0.07 0.77/0.17/0.06 0.75/0.22/0.03 0.83/0.13/0.04 0.77/0.19/0.04
    100000 | 0.79/0.14/0.07 0.78/0.14/0.08 0.73/0.20/0.07 0.78/0.14/0.08 0.77/0.19/0.04
---------------------------------------------------------------------------------------

[hidden unit: 16, step size: 0.01, td lambda: 0.3]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.74/0.21/0.05 0.70/0.25/0.05 0.61/0.35/0.04 0.62/0.36/0.02 0.61/0.36/0.03
     40000 | 0.83/0.13/0.04 0.86/0.12/0.02 0.81/0.16/0.03 0.82/0.14/0.04 0.81/0.15/0.04
     60000 | 0.81/0.17/0.02 0.81/0.15/0.04 0.79/0.21/0.00 0.80/0.19/0.01 0.73/0.26/0.01
     80000 | 0.84/0.13/0.03 0.77/0.16/0.07 0.78/0.19/0.03 0.78/0.16/0.06 0.80/0.18/0.02
    100000 | 0.80/0.18/0.02 0.71/0.26/0.03 0.71/0.26/0.03 0.74/0.23/0.03 0.78/0.20/0.02
---------------------------------------------------------------------------------------

[hidden unit: 16, step size: 0.01, td lambda: 0.6]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.82/0.17/0.01 0.77/0.21/0.02 0.82/0.12/0.06 0.68/0.24/0.08 0.68/0.25/0.07
     40000 | 0.82/0.16/0.02 0.84/0.12/0.04 0.90/0.08/0.02 0.69/0.27/0.04 0.70/0.25/0.05
     60000 | 0.81/0.17/0.02 0.82/0.13/0.05 0.88/0.07/0.05 0.75/0.20/0.05 0.68/0.23/0.09
     80000 | 0.83/0.15/0.02 0.80/0.12/0.08 0.84/0.07/0.09 0.77/0.18/0.05 0.71/0.20/0.09
    100000 | 0.85/0.12/0.03 0.77/0.14/0.09 0.83/0.08/0.09 0.72/0.20/0.08 0.69/0.19/0.12
---------------------------------------------------------------------------------------

[hidden unit: 16, step size: 0.01, td lambda: 0.9]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.79/0.19/0.02 0.78/0.21/0.01 0.73/0.24/0.03 0.70/0.28/0.02 0.70/0.28/0.02
     40000 | 0.85/0.14/0.01 0.83/0.13/0.04 0.76/0.19/0.05 0.76/0.20/0.04 0.76/0.22/0.02
     60000 | 0.90/0.07/0.03 0.91/0.04/0.05 0.89/0.08/0.03 0.77/0.19/0.04 0.78/0.17/0.05
     80000 | 0.91/0.09/0.00 0.87/0.07/0.06 0.83/0.11/0.06 0.78/0.19/0.03 0.81/0.16/0.03
    100000 | 0.93/0.07/0.00 0.79/0.18/0.03 0.78/0.18/0.04 0.74/0.23/0.03 0.79/0.18/0.03
---------------------------------------------------------------------------------------

[hidden unit: 16, step size: 0.005, td lambda: 0.3]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.62/0.30/0.08 0.59/0.34/0.07 0.70/0.28/0.02 0.56/0.39/0.05 0.58/0.39/0.03
     40000 | 0.76/0.22/0.02 0.74/0.25/0.01 0.77/0.21/0.02 0.77/0.21/0.02 0.71/0.26/0.03
     60000 | 0.70/0.25/0.05 0.73/0.25/0.02 0.78/0.21/0.01 0.73/0.26/0.01 0.70/0.28/0.02
     80000 | 0.74/0.22/0.04 0.80/0.16/0.04 0.82/0.15/0.03 0.77/0.17/0.06 0.76/0.20/0.04
    100000 | 0.77/0.21/0.02 0.75/0.17/0.08 0.81/0.16/0.03 0.75/0.21/0.04 0.72/0.24/0.04
---------------------------------------------------------------------------------------

[hidden unit: 16, step size: 0.005, td lambda: 0.6]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.83/0.16/0.01 0.79/0.18/0.03 0.77/0.22/0.01 0.74/0.26/0.00 0.70/0.28/0.02
     40000 | 0.69/0.26/0.05 0.73/0.22/0.05 0.70/0.28/0.02 0.70/0.28/0.02 0.66/0.31/0.03
     60000 | 0.85/0.13/0.02 0.91/0.08/0.01 0.82/0.15/0.03 0.80/0.17/0.03 0.73/0.25/0.02
     80000 | 0.86/0.12/0.02 0.93/0.06/0.01 0.83/0.14/0.03 0.83/0.14/0.03 0.78/0.19/0.03
    100000 | 0.82/0.14/0.04 0.87/0.10/0.03 0.82/0.16/0.02 0.82/0.14/0.04 0.72/0.25/0.03
---------------------------------------------------------------------------------------

[hidden unit: 16, step size: 0.005, td lambda: 0.9]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.80/0.17/0.03 0.79/0.15/0.06 0.68/0.25/0.07 0.71/0.24/0.05 0.67/0.26/0.07
     40000 | 0.81/0.15/0.04 0.88/0.11/0.01 0.78/0.18/0.04 0.81/0.16/0.03 0.80/0.16/0.04
     60000 | 0.81/0.16/0.03 0.79/0.20/0.01 0.75/0.21/0.04 0.73/0.23/0.04 0.73/0.19/0.08
     80000 | 0.92/0.06/0.02 0.92/0.07/0.01 0.86/0.10/0.04 0.86/0.11/0.03 0.83/0.15/0.02
    100000 | 0.83/0.13/0.04 0.88/0.11/0.01 0.80/0.16/0.04 0.84/0.13/0.03 0.78/0.18/0.04
---------------------------------------------------------------------------------------

[hidden unit: 16, step size: 0.001, td lambda: 0.3]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.68/0.26/0.06 0.69/0.26/0.05 0.70/0.25/0.05 0.68/0.26/0.06 0.64/0.29/0.07
     40000 | 0.70/0.27/0.03 0.62/0.33/0.05 0.64/0.33/0.03 0.67/0.30/0.03 0.65/0.28/0.07
     60000 | 0.77/0.21/0.02 0.70/0.27/0.03 0.78/0.20/0.02 0.68/0.29/0.03 0.68/0.30/0.02
     80000 | 0.77/0.19/0.04 0.77/0.22/0.01 0.79/0.18/0.03 0.75/0.22/0.03 0.75/0.22/0.03
    100000 | 0.81/0.18/0.01 0.77/0.21/0.02 0.76/0.21/0.03 0.78/0.17/0.05 0.74/0.21/0.05
---------------------------------------------------------------------------------------

[hidden unit: 16, step size: 0.001, td lambda: 0.6]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.83/0.12/0.05 0.81/0.12/0.07 0.82/0.13/0.05 0.72/0.25/0.03 0.75/0.22/0.03
     40000 | 0.79/0.13/0.08 0.76/0.17/0.07 0.76/0.18/0.06 0.74/0.20/0.06 0.75/0.21/0.04
     60000 | 0.76/0.18/0.06 0.74/0.23/0.03 0.75/0.22/0.03 0.69/0.27/0.04 0.68/0.24/0.08
     80000 | 0.81/0.13/0.06 0.80/0.16/0.04 0.80/0.15/0.05 0.79/0.17/0.04 0.79/0.15/0.06
    100000 | 0.88/0.08/0.04 0.86/0.12/0.02 0.87/0.12/0.01 0.83/0.16/0.01 0.80/0.15/0.05
---------------------------------------------------------------------------------------

[hidden unit: 16, step size: 0.001, td lambda: 0.9]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.73/0.19/0.08 0.76/0.16/0.08 0.73/0.20/0.07 0.78/0.18/0.04 0.72/0.23/0.05
     40000 | 0.81/0.16/0.03 0.79/0.19/0.02 0.73/0.23/0.04 0.76/0.21/0.03 0.73/0.25/0.02
     60000 | 0.74/0.20/0.06 0.83/0.12/0.05 0.72/0.21/0.07 0.70/0.24/0.06 0.71/0.25/0.04
     80000 | 0.75/0.19/0.06 0.82/0.13/0.05 0.75/0.19/0.06 0.75/0.19/0.06 0.72/0.25/0.03
    100000 | 0.84/0.10/0.06 0.80/0.15/0.05 0.77/0.17/0.06 0.79/0.17/0.04 0.81/0.17/0.02
---------------------------------------------------------------------------------------
```
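If learning were progressing properly, then in each table the ○ win rate (or the draw rate) should rise as you go down, and the × win rate (or the draw rate) should rise as you go right. That's not what happens at all.
The ○ win rate mostly sits between a little under 70% and a little over 80%, and the × win rate between a little under 20% and a little over 30%.
And the draw rate, which should grow as play converges (tic-tac-toe played perfectly by both sides is a draw), stays at around 5% almost everywhere.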
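One way to check this numerically rather than by eye is to average each row and column of a table, then test whether the row averages rise and the column averages fall. Here is a sketch using the ○ win rates from the first table above (hidden 4, step 0.01, λ 0.3). The helper names are my own.

```ruby
# ○ win rates from the table [hidden unit: 4, step size: 0.01, td lambda: 0.3].
# Rows: maru checkpoints; columns: batsu checkpoints (20000 ... 100000).
MARU_WIN = [
  [0.70, 0.68, 0.66, 0.69, 0.71],
  [0.66, 0.76, 0.61, 0.67, 0.70],
  [0.72, 0.73, 0.71, 0.73, 0.72],
  [0.63, 0.63, 0.74, 0.76, 0.70],
  [0.71, 0.74, 0.72, 0.81, 0.73]
].freeze

def mean(values)
  values.sum / values.size
end

def non_decreasing?(values)
  values.each_cons(2).all? { |a, b| a <= b }
end

row_means = MARU_WIN.map { |row| mean(row) }
col_means = MARU_WIN.transpose.map { |col| mean(col) }

# Learning works => rows rise (stronger ○) and columns fall (stronger ×).
puts "row means:    #{row_means.map { |m| m.round(3) }.inspect}"
puts "column means: #{col_means.map { |m| m.round(3) }.inspect}"
puts "rows rise monotonically?    #{non_decreasing?(row_means)}"
puts "columns fall monotonically? #{non_decreasing?(col_means.reverse)}"
```

For this table the row averages go 0.688, 0.68, 0.722, 0.692, 0.742. They are not monotonic, and the column averages aren't falling either.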
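In other words, the learning itself isn't working properly...
In hindsight, I think the hidden layer was too small (even with 16 units) to represent the features it needed.
Because of that, improving one thing kept breaking another (even at these fairly small training counts), and learning got stuck and couldn't progress any further.
That's why the win rates just wobbled around a roughly flat level.
Mind you, I only know this because I later found that with many more hidden units (around 128) and more training (at least 1,000,000 games), learning does progress reasonably well...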
Searching for parameters with good win rates
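Since learning didn't seem to be going well, I figured I could at least look for the players with the best win rates and get some idea of promising parameter values that way. Here's the code for that.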
```ruby
# (continued)

# find_good_com: play every maru against every batsu and list the ones that never lose
if ARGV[0] == "find_good_com"
  all_maru_players = Hash.new
  all_batsu_players = Hash.new
  hidden_unit_sizes.each do |hidden_unit_size|
    step_sizes.each do |step_size|
      td_lambdas.each do |td_lambda|
        time_stamps.each do |time_stamp|
          sample_size.times do |sample|
            maru_filename = sprintf maru_filename_format, sample, hidden_unit_size, step_size, td_lambda, time_stamp
            batsu_filename = sprintf batsu_filename_format, sample, hidden_unit_size, step_size, td_lambda, time_stamp
            all_maru_players[maru_filename] = NNSarsaCom.load(maru_filename)
            all_maru_players[maru_filename].learn_mode = false
            all_batsu_players[batsu_filename] = NNSarsaCom.load(batsu_filename)
            all_batsu_players[batsu_filename].learn_mode = false
          end
        end
      end
    end
  end

  puts "[good maru players]"
  all_maru_players.each do |filename, maru_player|
    win_count = 0
    draw_count = 0
    good = true
    all_batsu_players.each_value do |batsu_player|
      game = Game.new(maru_player, batsu_player)
      result = game.start(false)
      if result == Mark::Maru
        win_count += 1
      elsif result == Mark::Empty
        draw_count += 1
      else
        # lost once: not a "good" player, so skip the remaining opponents
        good = false
        break
      end
    end
    puts "#{filename}, win: #{win_count}, draw: #{draw_count}" if good
  end

  puts "\n[good batsu players]"
  all_batsu_players.each do |filename, batsu_player|
    win_count = 0
    draw_count = 0
    good = true
    all_maru_players.each_value do |maru_player|
      game = Game.new(maru_player, batsu_player)
      result = game.start(false)
      if result == Mark::Batsu
        win_count += 1
      elsif result == Mark::Empty
        draw_count += 1
      else
        good = false
        break
      end
    end
    puts "#{filename}, win: #{win_count}, draw: #{draw_count}" if good
  end
end
```
When "find_good_com" is passed as the argument, every ○ plays every ×, and the script picks out the ○ players that never lose and the × players that never lose.
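The search keeps a player only if it never loses, and it stops checking a player as soon as that player loses once (that's what the `break` does). Below, the same filter is sketched generically with `Enumerable#any?`, which also stops at the first loss. `undefeated` and the `:win`/`:draw`/`:loss` results are hypothetical names, and a block stands in for actually playing a game.

```ruby
# Return [name, wins, draws] for every candidate that never loses to any opponent.
# The block plays one game and returns :win, :draw or :loss from the candidate's side.
def undefeated(candidates, opponents)
  candidates.filter_map do |name, candidate|
    wins = 0
    draws = 0
    lost = opponents.each_value.any? { |opponent|
      case yield(candidate, opponent)
      when :win
        wins += 1
        false
      when :draw
        draws += 1
        false
      else
        true # a loss: any? stops here, like the break in the original loop
      end
    }
    [name, wins, draws] unless lost
  end
end

# Toy usage: players are numbers; the larger one wins, equal numbers draw.
good = undefeated({ "a" => 5, "b" => 2 }, { "x" => 1, "y" => 5 }) do |mine, theirs|
  if mine > theirs then :win
  elsif mine == theirs then :draw
  else :loss
  end
end
good.each { |name, wins, draws| puts "#{name}, win: #{wins}, draw: #{draws}" }
```

`filter_map` needs Ruby 2.7 or later. With an older Ruby, `map` followed by `compact` gives the same result.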
Here's the result.

```
[good maru players]
parameter_test/maru005_hidden04_step0.010_lambda0.6_040000.dat, win: 1316, draw: 34
parameter_test/maru004_hidden04_step0.010_lambda0.6_080000.dat, win: 1277, draw: 73
parameter_test/maru005_hidden04_step0.010_lambda0.6_080000.dat, win: 1272, draw: 78
parameter_test/maru000_hidden08_step0.010_lambda0.9_080000.dat, win: 1232, draw: 118

[good batsu players]
```
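× is way too weak... (not a single × player avoided losing to every ○).
Anyway, this result suggests that a step size of 0.01 and λ = 0.6 look good.
It also suggests that fewer hidden units are better...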
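However, I think that's only because × is so weak that ○ wins most games even when it plays carelessly. In that situation, players with fewer hidden units see positions more simply than players with more units, so they are less often thrown off by strange moves.
In fact, maru005_hidden04_step0.010_lambda0.6_xxxxxx.dat appears in the list twice, at 40,000 and at 80,000 training games, but the one with more training has the lower win count.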
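To put numbers on that: every ○ faces all 3 × 3 × 3 × 5 × 10 = 1,350 saved × players. Because the listed players never lost, wins plus draws should equal 1,350 for each of them. A quick sketch that checks this and turns the counts from the output above into win rates:

```ruby
# Each maru faced every saved batsu:
# 3 hidden sizes x 3 step sizes x 3 lambdas x 5 checkpoints x 10 samples.
OPPONENTS = 3 * 3 * 3 * 5 * 10

# [wins, draws] copied from the "[good maru players]" output.
good_maru = {
  "parameter_test/maru005_hidden04_step0.010_lambda0.6_040000.dat" => [1316, 34],
  "parameter_test/maru004_hidden04_step0.010_lambda0.6_080000.dat" => [1277, 73],
  "parameter_test/maru005_hidden04_step0.010_lambda0.6_080000.dat" => [1272, 78],
  "parameter_test/maru000_hidden08_step0.010_lambda0.9_080000.dat" => [1232, 118]
}

maru_win_rates = good_maru.transform_values do |(wins, draws)|
  # these players never lost, so every game is either a win or a draw
  raise "counts don't add up" unless wins + draws == OPPONENTS
  wins.fdiv(OPPONENTS)
end

maru_win_rates.each { |name, rate| printf("%s  win rate: %.1f%%\n", name, 100 * rate) }
```

By this measure, maru005 falls from about 97.5% wins at 40,000 games to about 94.2% at 80,000.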
I think this is because, as training went on, it was thrown off by strange moves more often and could no longer pick the winning moves.
That's it for today!