いものやま。

A grab bag of miscellaneous knowledge

Combining Reinforcement Learning with a Neural Network (Part 6)

Yesterday I implemented Sarsa(λ) with a neural network as the function approximator.

It didn't work well, though, so I tried varying the parameters to see what would happen.

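Parameter test

There are four parameters that need tuning:

  • Number of hidden-layer units
  • Step size
  • Trace-decay parameter λ
  • Number of training games

So I looked at what happens when each of them is varied as follows:

  • Number of hidden-layer units
    4, 8, 16
  • Step size
    0.01, 0.005, 0.001
  • Trace-decay parameter λ
    0.3, 0.6, 0.9
  • Number of training games
    20,000, 40,000, 60,000, 80,000, 100,000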
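
As a rough sketch of the size of this sweep, the grid above can be enumerated with `Array#product`. The variable names mirror the ones used in the script in this post; the counting itself is only an illustration and is not part of the original code.

```ruby
# Parameter grid from the list above.
hidden_unit_sizes = [4, 8, 16]
step_sizes = [0.01, 0.005, 0.001]
td_lambdas = [0.3, 0.6, 0.9]
time_stamps = [20000, 40000, 60000, 80000, 100000]

# Every (hidden units, step size, lambda) combination.
combinations = hidden_unit_sizes.product(step_sizes, td_lambdas)

puts "combinations: #{combinations.size}"
puts "snapshots per training run: #{time_stamps.size}"

# Show the first few combinations in the order the nested loops would visit them.
combinations.first(3).each do |hidden_unit_size, step_size, td_lambda|
  puts "hidden: #{hidden_unit_size}, step: #{step_size}, lambda: #{td_lambda}"
end
```

`product` yields the combinations in the same order as three nested `each` loops, so it is a drop-in way to inspect the grid without running any training.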

Here is the code I wrote.

#====================
# parameter_test.rb
#--------------------
# Check how learning differs depending on the parameters
#====================

require_relative "mark"
require_relative "game"
require_relative "nn_sarsa_com"

sample_size = 10

hidden_unit_sizes = [4, 8, 16]
step_sizes = [0.01, 0.005, 0.001]
td_lambdas = [0.3, 0.6, 0.9]
time_stamps = [20000, 40000, 60000, 80000, 100000]

maru_filename_format = "parameter_test/maru%03d_hidden%02d_step%5.3f_lambda%3.1f_%06d.dat"
batsu_filename_format = "parameter_test/batsu%02d_hidden%02d_step%5.3f_lambda%3.1f_%06d.dat"

# create

if ARGV[0] == "create"
  hidden_unit_sizes.each do |hidden_unit_size|
    step_sizes.each do |step_size|
      td_lambdas.each do |td_lambda|
        sample_size.times do |sample|
          maru_player = NNSarsaCom.new(Mark::Maru, hidden_unit_size, 0.1, step_size, td_lambda)
          batsu_player = NNSarsaCom.new(Mark::Batsu, hidden_unit_size, 0.1, step_size, td_lambda)
          (1..100000).each do |i|
            game = Game.new(maru_player, batsu_player)
            game.start(false)
            if time_stamps.include?(i)
              puts "[hidden: #{hidden_unit_size}, step: #{step_size}, lambda: #{td_lambda}] sample: #{sample} - #{i}"
              maru_filename = sprintf maru_filename_format, sample, hidden_unit_size, step_size, td_lambda, i
              batsu_filename = sprintf batsu_filename_format, sample, hidden_unit_size, step_size, td_lambda, i
              maru_player.save(maru_filename)
              batsu_player.save(batsu_filename)
            end
          end
        end
      end
    end
  end
end

# for access from methods

@sample_size = sample_size
@maru_filename_format = maru_filename_format
@batsu_filename_format = batsu_filename_format

def load_maru_players(hidden_unit_size, step_size, td_lambda, time_stamp)
  maru_players = Array.new
  @sample_size.times do |sample|
    filename = sprintf @maru_filename_format, sample, hidden_unit_size, step_size, td_lambda, time_stamp
    maru_player = NNSarsaCom.load(filename)
    maru_player.learn_mode = false
    maru_players.push maru_player
  end
  maru_players
end

def load_batsu_players(hidden_unit_size, step_size, td_lambda, time_stamp)
  batsu_players = Array.new
  @sample_size.times do |sample|
    filename = sprintf @batsu_filename_format, sample, hidden_unit_size, step_size, td_lambda, time_stamp
    batsu_player = NNSarsaCom.load(filename)
    batsu_player.learn_mode = false
    batsu_players.push batsu_player
  end
  batsu_players
end

def check_win_rate(maru_players, batsu_players)
  total_count = 0.0
  maru_win_count = 0.0
  batsu_win_count = 0.0
  draw_count = 0.0
  maru_players.each do |maru_player|
    batsu_players.each do |batsu_player|
      total_count += 1.0
      maru_player.learn_mode = false
      batsu_player.learn_mode = false
      game = Game.new(maru_player, batsu_player)
      case game.start(false)
      when Mark::Maru
        maru_win_count += 1.0
      when Mark::Batsu
        batsu_win_count += 1.0
      when Mark::Empty
        draw_count += 1.0
      end
    end
  end
  [maru_win_count/total_count, batsu_win_count/total_count, draw_count/total_count]
end

# (to be continued)

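When "create" is given as the argument, 10 trials are run for each combination of hidden-layer unit count, step size, and λ.
The data is saved after 20,000, 40,000, 60,000, 80,000, and 100,000 training games, so that I can experiment with it later.

In addition, there are methods for easily restoring the saved data and a method for measuring win rates.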
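
To make the saved-file layout concrete, here is a minimal sketch of how the ○ filename format expands and how many snapshots each side ends up with. It uses only the format string and sizes from the script above; `snapshot_count` is a name introduced here for illustration.

```ruby
maru_filename_format = "parameter_test/maru%03d_hidden%02d_step%5.3f_lambda%3.1f_%06d.dat"

# Sample 5, 4 hidden units, step size 0.01, lambda 0.6, saved after 40,000 games.
filename = format(maru_filename_format, 5, 4, 0.01, 0.6, 40000)
puts filename

# 3 unit counts x 3 step sizes x 3 lambdas x 10 samples x 5 snapshots.
snapshot_count = 3 * 3 * 3 * 10 * 5
puts "snapshots per side: #{snapshot_count}"
```

Because the load methods rebuild filenames from the same format string, a snapshot can be located from its parameters alone, without keeping a separate index.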

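Checking whether learning is progressing

The first thing I did was check whether learning was progressing at all.

In theory, if learning has progressed, a player should be able to beat opponents that have trained less.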

# (continued)

# compare with time stamps

if ARGV[0] == "time_stamp"
  hidden_unit_sizes.each do |hidden_unit_size|
    step_sizes.each do |step_size|
      td_lambdas.each do |td_lambda|
        puts "[hidden unit: #{hidden_unit_size}, step size: #{step_size}, td lambda: #{td_lambda}]"
        puts "---------------------------------------------------------------------------------------"
        puts "maru\\batsu |          20000          40000          60000          80000         100000"
        puts "---------------------------------------------------------------------------------------"
        time_stamps.each do |maru_time_stamp|
          maru_players = load_maru_players(hidden_unit_size, step_size, td_lambda, maru_time_stamp)
          print sprintf "%10d |", maru_time_stamp
          time_stamps.each do |batsu_time_stamp|
            batsu_players = load_batsu_players(hidden_unit_size, step_size, td_lambda, batsu_time_stamp)
            result = check_win_rate(maru_players, batsu_players)
            print sprintf " %4.2f/%4.2f/%4.2f", *result
          end
          print "\n"
        end
        puts "---------------------------------------------------------------------------------------"
        puts ""
      end
    end
  end
end

# (to be continued)

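When "time_stamp" is given as the argument, the win rates between the AIs at each training count are shown for each combination of hidden-layer unit count, step size, and λ.

Here is the result of running it.
Rows are ○'s training count and columns are ×'s; each cell reads "○ win rate/× win rate/draw rate".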
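
As a sketch of how a single cell comes about: `check_win_rate` plays each of the 10 saved ○ samples against each of the 10 saved × samples, so a cell summarizes 100 games. The counts below are not logged anywhere in the original output; they are simply what a cell reading 0.70/0.27/0.03 would imply under that assumption.

```ruby
sample_size = 10
games_per_cell = sample_size * sample_size  # every maru sample plays every batsu sample

# Assumed counts, back-computed from a cell reading 0.70/0.27/0.03.
maru_wins = 70
batsu_wins = 27
draws = 3

rates = [maru_wins, batsu_wins, draws].map { |count| count.to_f / games_per_cell }

# Same format string the script uses when printing each cell.
cell = format(" %4.2f/%4.2f/%4.2f", *rates)
puts cell
```

With only 100 games per cell, differences of a few hundredths between neighboring cells are within what random variation could plausibly produce, which is worth keeping in mind when reading the tables.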

[hidden unit: 4, step size: 0.01, td lambda: 0.3]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.70/0.27/0.03 0.68/0.27/0.05 0.66/0.31/0.03 0.69/0.27/0.04 0.71/0.28/0.01
     40000 | 0.66/0.33/0.01 0.76/0.23/0.01 0.61/0.31/0.08 0.67/0.30/0.03 0.70/0.28/0.02
     60000 | 0.72/0.28/0.00 0.73/0.22/0.05 0.71/0.23/0.06 0.73/0.21/0.06 0.72/0.24/0.04
     80000 | 0.63/0.33/0.04 0.63/0.28/0.09 0.74/0.24/0.02 0.76/0.21/0.03 0.70/0.29/0.01
    100000 | 0.71/0.24/0.05 0.74/0.25/0.01 0.72/0.27/0.01 0.81/0.17/0.02 0.73/0.26/0.01
---------------------------------------------------------------------------------------

[hidden unit: 4, step size: 0.01, td lambda: 0.6]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.77/0.21/0.02 0.68/0.29/0.03 0.76/0.21/0.03 0.74/0.24/0.02 0.76/0.18/0.06
     40000 | 0.77/0.22/0.01 0.78/0.21/0.01 0.77/0.21/0.02 0.72/0.25/0.03 0.78/0.21/0.01
     60000 | 0.77/0.19/0.04 0.70/0.27/0.03 0.73/0.23/0.04 0.71/0.25/0.04 0.70/0.28/0.02
     80000 | 0.84/0.11/0.05 0.78/0.21/0.01 0.81/0.17/0.02 0.77/0.19/0.04 0.79/0.16/0.05
    100000 | 0.90/0.08/0.02 0.88/0.08/0.04 0.86/0.10/0.04 0.83/0.14/0.03 0.85/0.13/0.02
---------------------------------------------------------------------------------------

[hidden unit: 4, step size: 0.01, td lambda: 0.9]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.70/0.26/0.04 0.64/0.31/0.05 0.64/0.30/0.06 0.68/0.26/0.06 0.69/0.24/0.07
     40000 | 0.75/0.20/0.05 0.76/0.23/0.01 0.81/0.15/0.04 0.68/0.31/0.01 0.78/0.18/0.04
     60000 | 0.83/0.14/0.03 0.86/0.10/0.04 0.75/0.19/0.06 0.74/0.21/0.05 0.80/0.16/0.04
     80000 | 0.80/0.17/0.03 0.90/0.09/0.01 0.80/0.14/0.06 0.80/0.17/0.03 0.82/0.14/0.04
    100000 | 0.83/0.16/0.01 0.76/0.21/0.03 0.79/0.18/0.03 0.80/0.20/0.00 0.83/0.14/0.03
---------------------------------------------------------------------------------------

[hidden unit: 4, step size: 0.005, td lambda: 0.3]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.74/0.23/0.03 0.79/0.19/0.02 0.80/0.18/0.02 0.72/0.25/0.03 0.76/0.22/0.02
     40000 | 0.68/0.28/0.04 0.75/0.20/0.05 0.70/0.25/0.05 0.67/0.26/0.07 0.66/0.28/0.06
     60000 | 0.67/0.26/0.07 0.63/0.32/0.05 0.63/0.33/0.04 0.67/0.28/0.05 0.67/0.28/0.05
     80000 | 0.64/0.25/0.11 0.70/0.25/0.05 0.65/0.29/0.06 0.67/0.27/0.06 0.68/0.29/0.03
    100000 | 0.70/0.23/0.07 0.75/0.21/0.04 0.69/0.22/0.09 0.69/0.28/0.03 0.70/0.28/0.02
---------------------------------------------------------------------------------------

[hidden unit: 4, step size: 0.005, td lambda: 0.6]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.74/0.26/0.00 0.78/0.20/0.02 0.81/0.19/0.00 0.72/0.22/0.06 0.74/0.22/0.04
     40000 | 0.81/0.16/0.03 0.82/0.15/0.03 0.84/0.15/0.01 0.72/0.24/0.04 0.73/0.23/0.04
     60000 | 0.73/0.24/0.03 0.79/0.18/0.03 0.79/0.19/0.02 0.72/0.23/0.05 0.67/0.25/0.08
     80000 | 0.77/0.22/0.01 0.81/0.17/0.02 0.76/0.22/0.02 0.73/0.24/0.03 0.73/0.22/0.05
    100000 | 0.82/0.15/0.03 0.85/0.12/0.03 0.83/0.13/0.04 0.78/0.16/0.06 0.81/0.14/0.05
---------------------------------------------------------------------------------------

[hidden unit: 4, step size: 0.005, td lambda: 0.9]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.76/0.19/0.05 0.74/0.22/0.04 0.69/0.28/0.03 0.62/0.32/0.06 0.63/0.32/0.05
     40000 | 0.77/0.16/0.07 0.71/0.24/0.05 0.74/0.21/0.05 0.79/0.17/0.04 0.72/0.23/0.05
     60000 | 0.79/0.18/0.03 0.72/0.25/0.03 0.70/0.27/0.03 0.74/0.22/0.04 0.73/0.22/0.05
     80000 | 0.90/0.09/0.01 0.87/0.10/0.03 0.84/0.15/0.01 0.87/0.12/0.01 0.82/0.13/0.05
    100000 | 0.86/0.13/0.01 0.83/0.14/0.03 0.80/0.17/0.03 0.79/0.17/0.04 0.81/0.15/0.04
---------------------------------------------------------------------------------------

[hidden unit: 4, step size: 0.001, td lambda: 0.3]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.61/0.29/0.10 0.64/0.24/0.12 0.62/0.28/0.10 0.65/0.28/0.07 0.66/0.25/0.09
     40000 | 0.66/0.30/0.04 0.66/0.25/0.09 0.63/0.26/0.11 0.69/0.28/0.03 0.67/0.27/0.06
     60000 | 0.65/0.34/0.01 0.68/0.29/0.03 0.69/0.26/0.05 0.71/0.22/0.07 0.66/0.27/0.07
     80000 | 0.62/0.31/0.07 0.77/0.20/0.03 0.73/0.21/0.06 0.73/0.23/0.04 0.65/0.28/0.07
    100000 | 0.64/0.31/0.05 0.72/0.23/0.05 0.68/0.25/0.07 0.66/0.27/0.07 0.62/0.27/0.11
---------------------------------------------------------------------------------------

[hidden unit: 4, step size: 0.001, td lambda: 0.6]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.71/0.21/0.08 0.65/0.28/0.07 0.63/0.33/0.04 0.57/0.40/0.03 0.67/0.30/0.03
     40000 | 0.75/0.17/0.08 0.71/0.20/0.09 0.62/0.32/0.06 0.56/0.34/0.10 0.71/0.25/0.04
     60000 | 0.77/0.12/0.11 0.71/0.21/0.08 0.63/0.27/0.10 0.57/0.33/0.10 0.66/0.27/0.07
     80000 | 0.82/0.14/0.04 0.82/0.13/0.05 0.81/0.18/0.01 0.72/0.22/0.06 0.75/0.23/0.02
    100000 | 0.75/0.18/0.07 0.74/0.18/0.08 0.77/0.20/0.03 0.73/0.23/0.04 0.76/0.21/0.03
---------------------------------------------------------------------------------------

[hidden unit: 4, step size: 0.001, td lambda: 0.9]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.64/0.29/0.07 0.61/0.30/0.09 0.57/0.38/0.05 0.58/0.37/0.05 0.61/0.37/0.02
     40000 | 0.76/0.16/0.08 0.68/0.23/0.09 0.69/0.27/0.04 0.73/0.24/0.03 0.75/0.23/0.02
     60000 | 0.67/0.24/0.09 0.66/0.23/0.11 0.67/0.27/0.06 0.66/0.30/0.04 0.67/0.27/0.06
     80000 | 0.67/0.26/0.07 0.59/0.30/0.11 0.65/0.28/0.07 0.64/0.30/0.06 0.66/0.30/0.04
    100000 | 0.62/0.32/0.06 0.60/0.34/0.06 0.64/0.31/0.05 0.58/0.35/0.07 0.62/0.32/0.06
---------------------------------------------------------------------------------------

[hidden unit: 8, step size: 0.01, td lambda: 0.3]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.80/0.18/0.02 0.78/0.17/0.05 0.77/0.21/0.02 0.76/0.24/0.00 0.76/0.23/0.01
     40000 | 0.77/0.17/0.06 0.77/0.16/0.07 0.74/0.21/0.05 0.72/0.23/0.05 0.68/0.29/0.03
     60000 | 0.89/0.11/0.00 0.77/0.18/0.05 0.81/0.17/0.02 0.78/0.22/0.00 0.72/0.26/0.02
     80000 | 0.83/0.14/0.03 0.83/0.11/0.06 0.74/0.20/0.06 0.78/0.17/0.05 0.72/0.23/0.05
    100000 | 0.88/0.11/0.01 0.80/0.15/0.05 0.84/0.16/0.00 0.75/0.20/0.05 0.75/0.22/0.03
---------------------------------------------------------------------------------------

[hidden unit: 8, step size: 0.01, td lambda: 0.6]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.77/0.20/0.03 0.72/0.26/0.02 0.70/0.23/0.07 0.70/0.25/0.05 0.74/0.23/0.03
     40000 | 0.82/0.13/0.05 0.82/0.15/0.03 0.78/0.15/0.07 0.76/0.21/0.03 0.75/0.20/0.05
     60000 | 0.78/0.18/0.04 0.78/0.19/0.03 0.72/0.19/0.09 0.66/0.26/0.08 0.75/0.20/0.05
     80000 | 0.82/0.12/0.06 0.80/0.14/0.06 0.79/0.16/0.05 0.75/0.17/0.08 0.80/0.15/0.05
    100000 | 0.89/0.10/0.01 0.84/0.13/0.03 0.87/0.09/0.04 0.86/0.12/0.02 0.89/0.06/0.05
---------------------------------------------------------------------------------------

[hidden unit: 8, step size: 0.01, td lambda: 0.9]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.79/0.19/0.02 0.68/0.31/0.01 0.72/0.25/0.03 0.62/0.31/0.07 0.62/0.28/0.10
     40000 | 0.84/0.13/0.03 0.76/0.23/0.01 0.81/0.12/0.07 0.75/0.21/0.04 0.61/0.27/0.12
     60000 | 0.82/0.17/0.01 0.80/0.20/0.00 0.74/0.22/0.04 0.69/0.29/0.02 0.69/0.24/0.07
     80000 | 0.81/0.15/0.04 0.85/0.13/0.02 0.73/0.20/0.07 0.79/0.17/0.04 0.70/0.19/0.11
    100000 | 0.89/0.07/0.04 0.84/0.11/0.05 0.78/0.14/0.08 0.85/0.07/0.08 0.75/0.13/0.12
---------------------------------------------------------------------------------------

[hidden unit: 8, step size: 0.005, td lambda: 0.3]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.76/0.19/0.05 0.69/0.26/0.05 0.69/0.26/0.05 0.67/0.25/0.08 0.69/0.25/0.06
     40000 | 0.77/0.22/0.01 0.77/0.18/0.05 0.74/0.22/0.04 0.65/0.25/0.10 0.64/0.24/0.12
     60000 | 0.81/0.18/0.01 0.74/0.22/0.04 0.73/0.22/0.05 0.68/0.22/0.10 0.70/0.18/0.12
     80000 | 0.80/0.19/0.01 0.76/0.22/0.02 0.75/0.19/0.06 0.74/0.22/0.04 0.78/0.16/0.06
    100000 | 0.74/0.21/0.05 0.72/0.23/0.05 0.72/0.26/0.02 0.73/0.23/0.04 0.80/0.14/0.06
---------------------------------------------------------------------------------------

[hidden unit: 8, step size: 0.005, td lambda: 0.6]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.74/0.21/0.05 0.73/0.20/0.07 0.74/0.20/0.06 0.76/0.20/0.04 0.76/0.24/0.00
     40000 | 0.76/0.18/0.06 0.77/0.20/0.03 0.74/0.23/0.03 0.73/0.23/0.04 0.74/0.24/0.02
     60000 | 0.72/0.22/0.06 0.80/0.17/0.03 0.75/0.24/0.01 0.79/0.21/0.00 0.83/0.15/0.02
     80000 | 0.87/0.10/0.03 0.80/0.18/0.02 0.80/0.20/0.00 0.81/0.19/0.00 0.81/0.19/0.00
    100000 | 0.82/0.10/0.08 0.87/0.13/0.00 0.88/0.11/0.01 0.84/0.15/0.01 0.86/0.12/0.02
---------------------------------------------------------------------------------------

[hidden unit: 8, step size: 0.005, td lambda: 0.9]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.79/0.17/0.04 0.76/0.24/0.00 0.71/0.28/0.01 0.72/0.27/0.01 0.69/0.30/0.01
     40000 | 0.83/0.16/0.01 0.79/0.19/0.02 0.73/0.23/0.04 0.76/0.20/0.04 0.78/0.17/0.05
     60000 | 0.82/0.18/0.00 0.82/0.15/0.03 0.77/0.19/0.04 0.73/0.22/0.05 0.83/0.15/0.02
     80000 | 0.89/0.11/0.00 0.86/0.12/0.02 0.82/0.15/0.03 0.80/0.15/0.05 0.86/0.10/0.04
    100000 | 0.86/0.14/0.00 0.88/0.09/0.03 0.84/0.11/0.05 0.85/0.10/0.05 0.79/0.14/0.07
---------------------------------------------------------------------------------------

[hidden unit: 8, step size: 0.001, td lambda: 0.3]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.73/0.23/0.04 0.68/0.28/0.04 0.64/0.31/0.05 0.62/0.32/0.06 0.67/0.29/0.04
     40000 | 0.71/0.27/0.02 0.67/0.26/0.07 0.67/0.27/0.06 0.60/0.34/0.06 0.67/0.30/0.03
     60000 | 0.70/0.24/0.06 0.71/0.24/0.05 0.68/0.28/0.04 0.62/0.34/0.04 0.63/0.31/0.06
     80000 | 0.72/0.19/0.09 0.66/0.26/0.08 0.69/0.25/0.06 0.67/0.27/0.06 0.69/0.26/0.05
    100000 | 0.73/0.21/0.06 0.69/0.24/0.07 0.72/0.22/0.06 0.70/0.28/0.02 0.71/0.26/0.03
---------------------------------------------------------------------------------------

[hidden unit: 8, step size: 0.001, td lambda: 0.6]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.62/0.31/0.07 0.63/0.27/0.10 0.64/0.30/0.06 0.64/0.30/0.06 0.61/0.36/0.03
     40000 | 0.73/0.18/0.09 0.71/0.21/0.08 0.67/0.25/0.08 0.63/0.30/0.07 0.67/0.30/0.03
     60000 | 0.69/0.24/0.07 0.73/0.22/0.05 0.68/0.28/0.04 0.68/0.27/0.05 0.68/0.30/0.02
     80000 | 0.72/0.23/0.05 0.76/0.21/0.03 0.66/0.29/0.05 0.68/0.27/0.05 0.70/0.28/0.02
    100000 | 0.72/0.22/0.06 0.79/0.16/0.05 0.76/0.23/0.01 0.75/0.19/0.06 0.72/0.20/0.08
---------------------------------------------------------------------------------------

[hidden unit: 8, step size: 0.001, td lambda: 0.9]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.78/0.14/0.08 0.72/0.16/0.12 0.66/0.21/0.13 0.73/0.18/0.09 0.70/0.21/0.09
     40000 | 0.84/0.11/0.05 0.75/0.17/0.08 0.79/0.15/0.06 0.77/0.15/0.08 0.77/0.17/0.06
     60000 | 0.76/0.18/0.06 0.72/0.22/0.06 0.76/0.21/0.03 0.76/0.21/0.03 0.76/0.22/0.02
     80000 | 0.86/0.07/0.07 0.77/0.17/0.06 0.75/0.22/0.03 0.83/0.13/0.04 0.77/0.19/0.04
    100000 | 0.79/0.14/0.07 0.78/0.14/0.08 0.73/0.20/0.07 0.78/0.14/0.08 0.77/0.19/0.04
---------------------------------------------------------------------------------------

[hidden unit: 16, step size: 0.01, td lambda: 0.3]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.74/0.21/0.05 0.70/0.25/0.05 0.61/0.35/0.04 0.62/0.36/0.02 0.61/0.36/0.03
     40000 | 0.83/0.13/0.04 0.86/0.12/0.02 0.81/0.16/0.03 0.82/0.14/0.04 0.81/0.15/0.04
     60000 | 0.81/0.17/0.02 0.81/0.15/0.04 0.79/0.21/0.00 0.80/0.19/0.01 0.73/0.26/0.01
     80000 | 0.84/0.13/0.03 0.77/0.16/0.07 0.78/0.19/0.03 0.78/0.16/0.06 0.80/0.18/0.02
    100000 | 0.80/0.18/0.02 0.71/0.26/0.03 0.71/0.26/0.03 0.74/0.23/0.03 0.78/0.20/0.02
---------------------------------------------------------------------------------------

[hidden unit: 16, step size: 0.01, td lambda: 0.6]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.82/0.17/0.01 0.77/0.21/0.02 0.82/0.12/0.06 0.68/0.24/0.08 0.68/0.25/0.07
     40000 | 0.82/0.16/0.02 0.84/0.12/0.04 0.90/0.08/0.02 0.69/0.27/0.04 0.70/0.25/0.05
     60000 | 0.81/0.17/0.02 0.82/0.13/0.05 0.88/0.07/0.05 0.75/0.20/0.05 0.68/0.23/0.09
     80000 | 0.83/0.15/0.02 0.80/0.12/0.08 0.84/0.07/0.09 0.77/0.18/0.05 0.71/0.20/0.09
    100000 | 0.85/0.12/0.03 0.77/0.14/0.09 0.83/0.08/0.09 0.72/0.20/0.08 0.69/0.19/0.12
---------------------------------------------------------------------------------------

[hidden unit: 16, step size: 0.01, td lambda: 0.9]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.79/0.19/0.02 0.78/0.21/0.01 0.73/0.24/0.03 0.70/0.28/0.02 0.70/0.28/0.02
     40000 | 0.85/0.14/0.01 0.83/0.13/0.04 0.76/0.19/0.05 0.76/0.20/0.04 0.76/0.22/0.02
     60000 | 0.90/0.07/0.03 0.91/0.04/0.05 0.89/0.08/0.03 0.77/0.19/0.04 0.78/0.17/0.05
     80000 | 0.91/0.09/0.00 0.87/0.07/0.06 0.83/0.11/0.06 0.78/0.19/0.03 0.81/0.16/0.03
    100000 | 0.93/0.07/0.00 0.79/0.18/0.03 0.78/0.18/0.04 0.74/0.23/0.03 0.79/0.18/0.03
---------------------------------------------------------------------------------------

[hidden unit: 16, step size: 0.005, td lambda: 0.3]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.62/0.30/0.08 0.59/0.34/0.07 0.70/0.28/0.02 0.56/0.39/0.05 0.58/0.39/0.03
     40000 | 0.76/0.22/0.02 0.74/0.25/0.01 0.77/0.21/0.02 0.77/0.21/0.02 0.71/0.26/0.03
     60000 | 0.70/0.25/0.05 0.73/0.25/0.02 0.78/0.21/0.01 0.73/0.26/0.01 0.70/0.28/0.02
     80000 | 0.74/0.22/0.04 0.80/0.16/0.04 0.82/0.15/0.03 0.77/0.17/0.06 0.76/0.20/0.04
    100000 | 0.77/0.21/0.02 0.75/0.17/0.08 0.81/0.16/0.03 0.75/0.21/0.04 0.72/0.24/0.04
---------------------------------------------------------------------------------------

[hidden unit: 16, step size: 0.005, td lambda: 0.6]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.83/0.16/0.01 0.79/0.18/0.03 0.77/0.22/0.01 0.74/0.26/0.00 0.70/0.28/0.02
     40000 | 0.69/0.26/0.05 0.73/0.22/0.05 0.70/0.28/0.02 0.70/0.28/0.02 0.66/0.31/0.03
     60000 | 0.85/0.13/0.02 0.91/0.08/0.01 0.82/0.15/0.03 0.80/0.17/0.03 0.73/0.25/0.02
     80000 | 0.86/0.12/0.02 0.93/0.06/0.01 0.83/0.14/0.03 0.83/0.14/0.03 0.78/0.19/0.03
    100000 | 0.82/0.14/0.04 0.87/0.10/0.03 0.82/0.16/0.02 0.82/0.14/0.04 0.72/0.25/0.03
---------------------------------------------------------------------------------------

[hidden unit: 16, step size: 0.005, td lambda: 0.9]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.80/0.17/0.03 0.79/0.15/0.06 0.68/0.25/0.07 0.71/0.24/0.05 0.67/0.26/0.07
     40000 | 0.81/0.15/0.04 0.88/0.11/0.01 0.78/0.18/0.04 0.81/0.16/0.03 0.80/0.16/0.04
     60000 | 0.81/0.16/0.03 0.79/0.20/0.01 0.75/0.21/0.04 0.73/0.23/0.04 0.73/0.19/0.08
     80000 | 0.92/0.06/0.02 0.92/0.07/0.01 0.86/0.10/0.04 0.86/0.11/0.03 0.83/0.15/0.02
    100000 | 0.83/0.13/0.04 0.88/0.11/0.01 0.80/0.16/0.04 0.84/0.13/0.03 0.78/0.18/0.04
---------------------------------------------------------------------------------------

[hidden unit: 16, step size: 0.001, td lambda: 0.3]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.68/0.26/0.06 0.69/0.26/0.05 0.70/0.25/0.05 0.68/0.26/0.06 0.64/0.29/0.07
     40000 | 0.70/0.27/0.03 0.62/0.33/0.05 0.64/0.33/0.03 0.67/0.30/0.03 0.65/0.28/0.07
     60000 | 0.77/0.21/0.02 0.70/0.27/0.03 0.78/0.20/0.02 0.68/0.29/0.03 0.68/0.30/0.02
     80000 | 0.77/0.19/0.04 0.77/0.22/0.01 0.79/0.18/0.03 0.75/0.22/0.03 0.75/0.22/0.03
    100000 | 0.81/0.18/0.01 0.77/0.21/0.02 0.76/0.21/0.03 0.78/0.17/0.05 0.74/0.21/0.05
---------------------------------------------------------------------------------------

[hidden unit: 16, step size: 0.001, td lambda: 0.6]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.83/0.12/0.05 0.81/0.12/0.07 0.82/0.13/0.05 0.72/0.25/0.03 0.75/0.22/0.03
     40000 | 0.79/0.13/0.08 0.76/0.17/0.07 0.76/0.18/0.06 0.74/0.20/0.06 0.75/0.21/0.04
     60000 | 0.76/0.18/0.06 0.74/0.23/0.03 0.75/0.22/0.03 0.69/0.27/0.04 0.68/0.24/0.08
     80000 | 0.81/0.13/0.06 0.80/0.16/0.04 0.80/0.15/0.05 0.79/0.17/0.04 0.79/0.15/0.06
    100000 | 0.88/0.08/0.04 0.86/0.12/0.02 0.87/0.12/0.01 0.83/0.16/0.01 0.80/0.15/0.05
---------------------------------------------------------------------------------------

[hidden unit: 16, step size: 0.001, td lambda: 0.9]
---------------------------------------------------------------------------------------
maru\batsu |          20000          40000          60000          80000         100000
---------------------------------------------------------------------------------------
     20000 | 0.73/0.19/0.08 0.76/0.16/0.08 0.73/0.20/0.07 0.78/0.18/0.04 0.72/0.23/0.05
     40000 | 0.81/0.16/0.03 0.79/0.19/0.02 0.73/0.23/0.04 0.76/0.21/0.03 0.73/0.25/0.02
     60000 | 0.74/0.20/0.06 0.83/0.12/0.05 0.72/0.21/0.07 0.70/0.24/0.06 0.71/0.25/0.04
     80000 | 0.75/0.19/0.06 0.82/0.13/0.05 0.75/0.19/0.06 0.75/0.19/0.06 0.72/0.25/0.03
    100000 | 0.84/0.10/0.06 0.80/0.15/0.05 0.77/0.17/0.06 0.79/0.17/0.04 0.81/0.17/0.02
---------------------------------------------------------------------------------------

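If learning were progressing properly, ○'s win rate (or the draw rate) should rise as you move down each table, and ×'s win rate (or the draw rate) should rise as you move right, but that is not what happens at all.
○'s win rate mostly sits somewhere between just under 70% and just over 80%, and ×'s win rate between just under 20% and just over 30%.
And the draw rate, which should grow as learning converges, mostly stays at around 5%.
In other words, learning isn't really working in the first place...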
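
One way to make this check less of an eyeball judgment is to test each column for steady improvement. The sketch below does that for the ○ win rates copied from the first table (hidden unit 4, step size 0.01, λ 0.3). The helper `non_decreasing?` is a name introduced here and is not part of the original script.

```ruby
# Maru win rates from the first table; rows are maru's training count
# (20000..100000), columns are batsu's training count (20000..100000).
maru_win_rates = [
  [0.70, 0.68, 0.66, 0.69, 0.71],
  [0.66, 0.76, 0.61, 0.67, 0.70],
  [0.72, 0.73, 0.71, 0.73, 0.72],
  [0.63, 0.63, 0.74, 0.76, 0.70],
  [0.71, 0.74, 0.72, 0.81, 0.73],
]

# True when each value is at least as large as the one before it.
def non_decreasing?(values)
  values.each_cons(2).all? { |earlier, later| earlier <= later }
end

# Reading down a column fixes the batsu opponent and increases maru's training.
columns = maru_win_rates.transpose
monotonic_columns = columns.count { |column| non_decreasing?(column) }
puts "columns where maru improves steadily: #{monotonic_columns} of #{columns.size}"

all_rates = maru_win_rates.flatten
puts "maru win rate range: #{all_rates.min} to #{all_rates.max}"
```

A strict monotonicity test is harsh given the noise in 100-game cells, so this is better read as a quick sanity check than as a statistical test.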

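In hindsight, I think the hidden layer had too few units (even with 16) to represent the features.
As a result, improving play in one situation made it worse in another (even with this few training games), and learning stalled.
That would explain why the win rates fluctuate around a roughly flat level.

That said, I can only say this because I later found that learning does progress reasonably well with many more hidden units (128, say) and many more training games (at least 1,000,000)...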

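Looking for parameters with a good win rate

Since learning didn't seem to be going well, I figured I could at least look for the players with good win rates and read off promising parameter values from them. That is what the following code does.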

# (continued)

# find good com

if ARGV[0] == "find_good_com"
  all_maru_players = Hash.new
  all_batsu_players = Hash.new
  hidden_unit_sizes.each do |hidden_unit_size|
    step_sizes.each do |step_size|
      td_lambdas.each do |td_lambda|
        time_stamps.each do |time_stamp|
          sample_size.times do |sample|
            maru_filename = sprintf maru_filename_format, sample, hidden_unit_size, step_size, td_lambda, time_stamp
            batsu_filename = sprintf batsu_filename_format, sample, hidden_unit_size, step_size, td_lambda, time_stamp
            all_maru_players[maru_filename] = NNSarsaCom.load(maru_filename)
            all_maru_players[maru_filename].learn_mode = false
            all_batsu_players[batsu_filename] = NNSarsaCom.load(batsu_filename)
            all_batsu_players[batsu_filename].learn_mode = false
          end
        end
      end
    end
  end

  puts "[good maru players]"
  all_maru_players.each do |filename, maru_player|
    win_count = 0
    draw_count = 0
    good = true
    all_batsu_players.each do |_, batsu_player|
      game = Game.new(maru_player, batsu_player)
      result = game.start(false)
      if result == Mark::Maru
        win_count += 1
      elsif result == Mark::Empty
        draw_count += 1
      else
        good = false
        break
      end
    end
    if good
      puts "#{filename}, win: #{win_count}, draw: #{draw_count}"
    end
  end

  puts "\n[good batsu players]"
  all_batsu_players.each do |filename, batsu_player|
    win_count = 0
    draw_count = 0
    good = true
    all_maru_players.each do |_, maru_player|
      game = Game.new(maru_player, batsu_player)
      result = game.start(false)
      if result == Mark::Batsu
        win_count += 1
      elsif result == Mark::Empty
        draw_count += 1
      else
        good = false
        break
      end
    end
    if good
      puts "#{filename}, win: #{win_count}, draw: #{draw_count}"
    end
  end
end

When "find_good_com" is given as the argument, every ○ is played against every ×, and the ○ players and × players that never lose are picked out.
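
The "never loses" filter can be sketched on its own, without any trained players. In the example below the outcome table and the player names (`m1`, `b1`, and so on) are made up purely for illustration; only the selection logic corresponds to the script above.

```ruby
# Hypothetical outcomes: results[[maru, batsu]] is :maru, :batsu, or :draw.
results = {
  ["m1", "b1"] => :maru,  ["m1", "b2"] => :draw,
  ["m2", "b1"] => :maru,  ["m2", "b2"] => :batsu,
}
maru_names = %w[m1 m2]
batsu_names = %w[b1 b2]

# A maru player is "good" if no batsu player beats it.
# none? stops at the first loss, like the break in the original loop.
undefeated_maru = maru_names.select do |maru|
  batsu_names.none? { |batsu| results[[maru, batsu]] == :batsu }
end

# The same idea from batsu's side.
undefeated_batsu = batsu_names.select do |batsu|
  maru_names.none? { |maru| results[[maru, batsu]] == :maru }
end

puts "undefeated maru: #{undefeated_maru.inspect}"
puts "undefeated batsu: #{undefeated_batsu.inspect}"
```

Note that each pairing is played only once in the original script, so "never loses" means "did not lose in a single game against each opponent".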

Here is the result of running it.

[good maru players]
parameter_test/maru005_hidden04_step0.010_lambda0.6_040000.dat, win: 1316, draw: 34
parameter_test/maru004_hidden04_step0.010_lambda0.6_080000.dat, win: 1277, draw: 73
parameter_test/maru005_hidden04_step0.010_lambda0.6_080000.dat, win: 1272, draw: 78
parameter_test/maru000_hidden08_step0.010_lambda0.9_080000.dat, win: 1232, draw: 118

[good batsu players]

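× is way too weak...

Setting that aside, the result suggests that a step size of 0.01 and λ = 0.6 look good.
And that fewer hidden units is better...

However, I think this is only because × is so weak that ○ wins most games even when playing carelessly; in that situation, players with fewer hidden units see the position more simply than those with more units and are less often thrown off by odd moves.
In fact, the maru005_hidden04_step0.010_lambda0.6_xxxxxx.dat data is listed at both 40,000 and 80,000 training games, but the one with more training actually has fewer wins (1,272 versus 1,316).
I think this is because, as training went on, it was increasingly thrown off by odd moves and could no longer pick the winning moves.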
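
To put the win counts on a common scale, the sketch below converts them into win rates. The counts are copied from the output above; the number of opponents is derived from the parameter grid (27 combinations, 5 snapshots, 10 samples), and the check that wins plus draws add up to it is an addition of this example.

```ruby
# [label, wins, draws] copied from the "good maru players" output.
good_maru_players = [
  ["maru005_hidden04_step0.010_lambda0.6_040000", 1316, 34],
  ["maru004_hidden04_step0.010_lambda0.6_080000", 1277, 73],
  ["maru005_hidden04_step0.010_lambda0.6_080000", 1272, 78],
  ["maru000_hidden08_step0.010_lambda0.9_080000", 1232, 118],
]

# 27 parameter combinations x 5 snapshots x 10 samples of batsu opponents.
opponent_count = 27 * 5 * 10

win_rates = good_maru_players.map do |label, wins, draws|
  # An undefeated player has no losses, so wins + draws must cover every opponent.
  raise "unexpected game count for #{label}" unless wins + draws == opponent_count
  [label, (wins.to_f / opponent_count).round(3)]
end

win_rates.each { |label, rate| puts "#{label}: #{rate}" }
```

The sample-5 player appears twice, and its win rate is lower at 80,000 games than at 40,000, which is the drop discussed above.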

That's all for today!

yamaimo.hatenablog.jp