いものやま。

A miscellaneous collection of knowledge

Building a thinking routine for "BirdHead" (Part 5)

Yesterday, I implemented the action-selection and learning algorithms.

Today, I'll implement saving and loading of the parameters, which finishes the class.

SarsaCom class (continued)

Saving and loading

With saving and loading added, the whole class looks like this.
(For the omitted parts, see the posts from yesterday and the day before.)

//==============================
// BirdHead
//------------------------------
// SarsaCom.swift
//==============================

import Foundation

class SarsaCom: Player {
  // Rebuilds a SarsaCom from a plist written by save(filename:).
  static func load(filename: String) -> SarsaCom {
    let saveData = NSDictionary(contentsOfFile: filename)!
    let name = saveData["name"]! as! String
    let epsilon = saveData["epsilon"]! as! Double
    let lambda = saveData["lambda"]! as! Double
    let stepSize = saveData["stepSize"]! as! Double
    let weight = saveData["weight"]! as! [Double]

    let sarsaCom = SarsaCom(name: name, epsilon: epsilon, lambda: lambda, stepSize: stepSize)
    sarsaCom.weight = weight

    return sarsaCom
  }

  private(set) var name: String
  private(set) var isCom: Bool

  private var epsilon: Double
  private var lambda: Double
  private var stepSize: Double

  var learn: Bool
  var debugPrint: Bool

  private(set) var weight: [Double]

  private var previousFeature: [Double]
  private var currentFeature: [Double]
  private var accumulatedFeature: [Double]

  init(name: String, epsilon: Double, lambda: Double, stepSize: Double) {
    self.name = name
    self.isCom = true

    self.epsilon = epsilon
    self.lambda = lambda
    self.stepSize = stepSize

    self.learn = true
    self.debugPrint = false

    self.weight = [Double](count: 98, repeatedValue: 0.0)

    self.previousFeature = [Double]()
    self.currentFeature = [Double]()
    self.accumulatedFeature = [Double](count: 98, repeatedValue: 0.0)
  }

  func select(view: GameInfo.PlayerView) throws -> Action {
    // omitted (see the previous posts)
  }

  func learn(minusPoint: Int) {
    // omitted (see the previous posts)
  }

  // Writes the name, hyperparameters, and weights to a plist file.
  func save(filename: String) {
    var saveData = [String: AnyObject]()
    saveData["name"] = self.name
    saveData["epsilon"] = self.epsilon
    saveData["lambda"] = self.lambda
    saveData["stepSize"] = self.stepSize
    saveData["weight"] = self.weight
    (saveData as NSDictionary).writeToFile(filename, atomically: true)
  }

  private func toFeature(view: GameInfo.PlayerView, action: Action) -> [Double] {
    // omitted (see the previous posts)
  }

  private func valueOfFeature(feature: [Double]) -> Double {
    // omitted (see the previous posts)
  }
}

Saving writes the properties out to a plist file, and loading does the reverse, reading them back from that plist file.
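For comparison, here is a minimal sketch of the same save/load round trip in current Swift (the class above uses the Swift 2-era NSDictionary API). It assumes Foundation's PropertyListEncoder and PropertyListDecoder; SarsaSaveData, saveParameters(_:to:) and loadParameters(from:) are hypothetical names, not part of the original project. One design difference: the original load force-unwraps every value, so a missing file or key crashes, whereas this version throws an error the caller can handle.

```swift
import Foundation

// Hypothetical Codable mirror of the keys that SarsaCom.save(filename:) writes.
struct SarsaSaveData: Codable, Equatable {
    var name: String
    var epsilon: Double
    var lambda: Double
    var stepSize: Double
    var weight: [Double]
}

// Encodes the parameters as an XML property list and writes it atomically.
func saveParameters(_ data: SarsaSaveData, to url: URL) throws {
    let encoder = PropertyListEncoder()
    encoder.outputFormat = .xml
    let bytes = try encoder.encode(data)
    try bytes.write(to: url, options: .atomic)
}

// Reads the plist back. A missing file or a missing/mistyped key throws
// instead of crashing on a force unwrap.
func loadParameters(from url: URL) throws -> SarsaSaveData {
    let bytes = try Data(contentsOf: url)
    return try PropertyListDecoder().decode(SarsaSaveData.self, from: bytes)
}

// Round trip through a temporary file, using the same settings as the post.
let url = FileManager.default.temporaryDirectory
    .appendingPathComponent("sarsa-sketch-\(UUID().uuidString).plist")
let original = SarsaSaveData(name: "Sarsa Com 1", epsilon: 0.1, lambda: 0.9,
                             stepSize: 0.001,
                             weight: [Double](repeating: 0.0, count: 98))
try saveParameters(original, to: url)
let restored = try loadParameters(from: url)
_ = try? FileManager.default.removeItem(at: url)
print(restored.name, restored.weight.count)
```

Because the keys are checked by the decoder, renaming a property later shows up as a decoding error rather than a crash at an `as!` cast.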

With this, the SarsaCom class is complete!

Learning through self-play

Now that the implementation is done, let's train it through self-play.

import Foundation

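// Note: writeString(_:) is not an NSFileHandle method in Foundation;
// it is assumed to be a small helper extension defined elsewhere in the project.
let stdout = NSFileHandle.fileHandleWithStandardOutput()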

let players: [Player] = [
  SarsaCom(name: "Sarsa Com 1", epsilon: 0.1, lambda: 0.9, stepSize: 0.001),
  SarsaCom(name: "Sarsa Com 2", epsilon: 0.1, lambda: 0.9, stepSize: 0.001),
  SarsaCom(name: "Sarsa Com 3", epsilon: 0.1, lambda: 0.9, stepSize: 0.001),
  SarsaCom(name: "Sarsa Com 4", epsilon: 0.1, lambda: 0.9, stepSize: 0.001),
]

// learn
for i in 0..<100 {
  stdout.synchronizeFile()
  stdout.writeString("[\(i)] learning")
  for _ in 0..<100 {
    let deck = Deck()
    let game = GameInfo(deck: deck, playerCount: 4)
    let controller = GameController(gameInfo: game, players: players)
    controller.learn = true
    try! controller.start()
    stdout.writeString(".")
  }
  print("")

  for player in players {
    let sarsa = player as! SarsaCom
    sarsa.learn = false
    sarsa.debugPrint = true
    print("\(sarsa.name) weight: \(sarsa.weight)")
  }

  let deck = Deck()
  let game = GameInfo(deck: deck, playerCount: 4)
  let controller = GameController(gameInfo: game, players: players)
  controller.output = true
  try! controller.start()

  for player in players {
    let sarsa = player as! SarsaCom
    sarsa.learn = true
    sarsa.debugPrint = false
  }
}

// save
for i in 0..<4 {
  (players[i] as! SarsaCom).save("sarsa\(i).plist")
}

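It trains for 10,000 games in total (100 rounds of 100 games). After each round it prints the current weights and plays one game with learning turned off and the output shown, so progress can be checked along the way.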

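One thing to be careful about is the step size.
There are about 100 features (98 in the code above), so I specify 0.1/100 = 0.001.
Because the value function is a linear combination of the features, unless you divide by the number of features, the effective step size at the level of the value function becomes far too large.
(At first I simply used 0.1, and the parameters diverged to +/-∞ partway through.)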
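To see why, here is a small numeric sketch; it is not the post's Sarsa(λ) update. For a linear value v = w·x, one gradient step w ← w + α·δ·x changes the estimate by α·δ·‖x‖². Assuming, hypothetically, that all 98 features are active with value 1.0 and the target is fixed, each step multiplies the error by (1 - 98α): with α = 0.1 that factor is -8.8 and the error explodes, while with α = 0.001 it is 0.902 and the error shrinks. The target -2.0 and the 20 steps are arbitrary choices for illustration.

```swift
// Linear value estimate: dot product of weights and features.
func linearValue(_ w: [Double], _ x: [Double]) -> Double {
    return zip(w, x).reduce(0.0) { $0 + $1.0 * $1.1 }
}

// Repeats the gradient step w <- w + alpha * delta * x toward a fixed target
// and returns the remaining error (target - estimate).
func remainingError(stepSize alpha: Double, featureCount n: Int,
                    target: Double, steps: Int) -> Double {
    var w = [Double](repeating: 0.0, count: n)
    let x = [Double](repeating: 1.0, count: n)  // assumption: every feature is active
    for _ in 0..<steps {
        let delta = target - linearValue(w, x)  // error before this step
        for i in 0..<n {
            w[i] += alpha * delta * x[i]
        }
    }
    return target - linearValue(w, x)
}

let unscaled = remainingError(stepSize: 0.1, featureCount: 98, target: -2.0, steps: 20)
let scaled = remainingError(stepSize: 0.001, featureCount: 98, target: -2.0, steps: 20)
print("alpha = 0.1:   error \(unscaled)")
print("alpha = 0.001: error \(scaled)")
```

With 98 active features, α = 0.001 gives an effective step on the value of 98 × 0.001 ≈ 0.1, which is exactly the "divide the step size you want by the number of features" reasoning above.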

Here is the output from building and running it.

[0] learning....................................................................................................
Sarsa Com 1 weight: (omitted)
Sarsa Com 2 weight: (omitted)
Sarsa Com 3 weight: (omitted)
Sarsa Com 4 weight: (omitted)
--------------------
deal...
----------
[Sarsa Com 1]
action: Play([2]), value: -1.88601832147534
action: Play([3]), value: -2.1396629596506
action: Play([4]), value: -0.976964429850177
action: Play([4, 4]), value: -1.88794972202002
action: Play([5]), value: -1.61011720965024
action: Play([5, 5]), value: -1.48229989912132
action: Play([7]), value: -1.61855856394451
action: Play([7, 7]), value: -1.06738382619868
action: Play([9]), value: -1.03978661155825
action: Play([11]), value: -1.30387326053594
selected action: Play([4])
[Sarsa Com 2]
action: Discard([2]), value: -3.47260842094249
action: Play([5]), value: -3.43794212330004
action: Play([6]), value: -2.6766395787999
action: Play([7]), value: -2.89124387558303
action: Play([9]), value: -3.14855587932015
action: Play([11]), value: -2.87142951407622
selected action: Play([6])
[Sarsa Com 3]
action: Discard([4]), value: -2.73646500832458
action: Play([6]), value: -1.45843191106511
action: Play([8]), value: -2.00908027110381
action: Play([10]), value: -1.83481358750941
action: Play([11]), value: -1.63103209464365
selected action: Play([6])
[Sarsa Com 4]
action: Discard([2]), value: -4.25876948690351
action: Play([6]), value: -3.19366006265389
action: Play([7]), value: -3.18491459831658
action: Play([8]), value: -3.29571386976795
action: Play([9]), value: -3.25895442074911
action: Play([10]), value: -3.76713499225674
selected action: Play([7])
----------
trick 0 is done.
Sarsa Com 4 takes trick.
----------
[Sarsa Com 4]
action: Play([2]), value: -3.45387561078593
action: Play([4]), value: -2.65821054661879
action: Play([4, 4]), value: -3.4888563970052
action: Play([5]), value: -3.71971402795031
action: Play([6]), value: -2.51686386884614
action: Play([8]), value: -2.6189176759602
action: Play([9]), value: -2.58215822694136
action: Play([10]), value: -3.09033879844899
action: Play([10, 10]), value: -2.70866442678027
selected action: Play([6])
[Sarsa Com 1]
action: Discard([2]), value: -1.75787622135135
action: Play([7]), value: -1.41367065199455
action: Play([9]), value: -0.834898699608291
action: Play([11]), value: -1.09898534858598
selected action: Play([9])
[Sarsa Com 2]
action: Discard([2]), value: -3.09539638946885
action: Play([9]), value: -2.73104585969227
action: Play([11]), value: -2.45391949444834
selected action: Play([11])
[Sarsa Com 3]
action: Discard([4]), value: -2.43218603348729
action: Play([11]), value: -1.34465595675913
selected action: Play([11])
----------
trick 1 is done.
Sarsa Com 3 takes trick.
----------
[Sarsa Com 3]
action: Play([4]), value: -2.04348486502394
action: Play([5]), value: -1.73120055903466
action: Play([8]), value: -1.49015261910165
action: Play([8, 8]), value: -1.42457772120615
action: Play([8, 8, 8]), value: -0.864043633201311
action: Play([10]), value: -1.31588593550726
action: Play([10, 10]), value: -1.37424627167578
action: Play([10, 10, 10]), value: -0.997172381308313
selected action: Play([8, 8, 8])
[Sarsa Com 4]
action: Discard([2, 4, 4]), value: -3.93030196924925
action: Play([8, 9, 10]), value: -1.90364392074412
action: Play([8, 10, 10]), value: -2.03690930532466
action: Play([9, 10, 10]), value: -2.00014985630582
selected action: Play([8, 9, 10])
[Sarsa Com 1]
action: Discard([2, 3, 4]), value: -2.72376050377358
selected action: Discard([2, 3, 4])
[Sarsa Com 2]
action: Discard([2, 3, 3]), value: -2.37888230446322
selected action: Discard([2, 3, 3])
----------
trick 2 is done.
Sarsa Com 4 takes trick.
----------
[Sarsa Com 4]
action: Play([2]), value: -2.31580022191776
action: Play([4]), value: -1.52013515775062
action: Play([4, 4]), value: -2.35078100813703
action: Play([5]), value: -2.58163863908214
action: Play([10]), value: -1.52534027897033
selected action: Play([4])
[Sarsa Com 1]
action: Play([5]), value: -3.0039699381072
action: Play([7]), value: -3.01241129240146
action: Play([11]), value: -2.6977259889929
selected action: Play([11])
[Sarsa Com 2]
action: Discard([3]), value: -3.3732721045302
selected action: Discard([3])
[Sarsa Com 3]
action: Discard([4]), value: -2.07295945290131
selected action: Discard([4])
----------
trick 3 is done.
Sarsa Com 1 takes trick.
----------
[Sarsa Com 1]
action: Play([5]), value: -2.56594762380864
action: Play([5, 5]), value: -2.43813031327972
action: Play([7]), value: -2.57438897810291
action: Play([7, 7]), value: -2.02321424035707
selected action: Play([7, 7])
[Sarsa Com 2]
action: Discard([5, 7]), value: -3.74539596637002
action: Play([7, 9]), value: -3.33940793282986
action: Play([9, 9]), value: -3.0887909203641
selected action: Play([9, 9])
[Sarsa Com 3]
action: Discard([5, 10]), value: -2.81011334707719
action: Play([10, 10]), value: -2.11522500023954
selected action: Play([10, 10])
[Sarsa Com 4]
action: Discard([2, 4]), value: -2.87282518631956
selected action: Discard([2, 4])
----------
trick 4 is done.
Sarsa Com 3 takes trick.
----------
[Sarsa Com 3]
action: Play([5]), value: -2.2645385064677
action: Play([10]), value: -1.46133813928329
selected action: Play([10])
[Sarsa Com 4]
action: Discard([5]), value: -3.49158138922092
action: Play([10]), value: -2.27752912057932
selected action: Play([10])
[Sarsa Com 1]
action: Discard([5]), value: -2.35044618784709
selected action: Discard([5])
[Sarsa Com 2]
action: Discard([5]), value: -3.28708582018571
selected action: Discard([5])
----------
trick 5 is done.
Sarsa Com 4 takes trick.
----------
deal is done.
last cards:
Sarsa Com 1: 5
Sarsa Com 2: 7
Sarsa Com 3: 5
Sarsa Com 4: 5
["Sarsa Com 2"] lose in deal.
minus points:
Sarsa Com 1: []
Sarsa Com 2: [7]
Sarsa Com 3: []
Sarsa Com 4: []
--------------------
deal...
----------

~ omitted ~

--------------------
deal...
----------
[Sarsa Com 1]
action: Play([2]), value: -2.43751192490417
action: Play([3]), value: -2.69115656307944
action: Play([4]), value: -1.52845803327901
action: Play([4, 4]), value: -2.43944332544885
action: Play([5]), value: -1.90242317020955
action: Play([6]), value: -1.83985885259004
action: Play([6, 6]), value: -2.01646013464381
action: Play([7]), value: -1.67193048383309
action: Play([9]), value: -1.59128021498709
action: Play([10]), value: -1.63640452112774
selected action: Play([4])
[Sarsa Com 2]
action: Discard([2]), value: -1.83067654618551
action: Play([4]), value: -2.42451047054952
action: Play([5]), value: -1.79601024854306
action: Play([8]), value: -1.44591486450213
action: Play([9]), value: -1.50662400456317
action: Play([10]), value: -1.57833714304249
selected action: Play([8])
[Sarsa Com 3]
action: Discard([3]), value: -2.93869290600437
action: Play([8]), value: -2.91579555406669
action: Play([9]), value: -2.62907299695951
action: Play([10]), value: -3.00791218922798
selected action: Play([9])
[Sarsa Com 4]
action: Discard([2]), value: -1.15253494993807
action: Play([11]), value: -1.3074991251928
selected action: Discard([2])
----------
trick 0 is done.
Sarsa Com 3 takes trick.
----------
[Sarsa Com 3]
action: Play([3]), value: -2.22596691391021
action: Play([3, 3]), value: -2.74996511490158
action: Play([5]), value: -2.91703714028062
action: Play([7]), value: -2.37249215134294
action: Play([7, 7]), value: -2.02144132852343
action: Play([8]), value: -2.43100706433578
action: Play([8, 8]), value: -1.92882355251769
action: Play([10]), value: -2.52312369949707
action: Play([10, 10]), value: -2.17219710926474
selected action: Play([8, 8])
[Sarsa Com 4]
action: Discard([2, 2]), value: -1.58307996717115
action: Play([8, 11]), value: -0.944126600418926
action: Play([11, 11]), value: -1.12507343459922
selected action: Play([8, 11])
[Sarsa Com 1]
action: Discard([2, 3]), value: -2.93377256620945
selected action: Discard([2, 3])
[Sarsa Com 2]
action: Discard([2, 3]), value: -2.66066268712375
selected action: Discard([2, 3])
----------
trick 1 is done.
Sarsa Com 4 takes trick.
----------
[Sarsa Com 4]
action: Play([2]), value: -0.380347605514121
action: Play([2, 2]), value: -0.864391431314887
action: Play([3]), value: -1.42779516691363
action: Play([4]), value: -1.325658764665
action: Play([5]), value: -0.243718793632442
action: Play([5, 5]), value: -1.04816888306409
action: Play([11]), value: -0.350953839697532
selected action: Play([5])
[Sarsa Com 1]
action: Discard([4]), value: -3.85856268104413
action: Play([5]), value: -2.80630780869918
action: Play([6]), value: -2.74374349107967
action: Play([7]), value: -2.57581512232272
action: Play([9]), value: -2.49516485347672
action: Play([10]), value: -2.54028915961737
selected action: Play([9])
[Sarsa Com 2]
action: Discard([4]), value: -3.79727813812505
action: Play([9]), value: -2.84162991792055
action: Play([10]), value: -2.91334305639988
selected action: Play([9])
[Sarsa Com 3]
action: Discard([3]), value: -2.11062930303543
action: Play([10]), value: -2.18087639989375
selected action: Discard([3])
----------
trick 2 is done.
Sarsa Com 2 takes trick.
----------
[Sarsa Com 2]
action: Play([4]), value: -3.59188073410386
action: Play([5]), value: -2.9633805120974
action: Play([8]), value: -2.22564821208456
action: Play([9]), value: -2.16606525191463
action: Play([10]), value: -2.74570740659684
action: Play([10, 10]), value: -2.51465612726046
selected action: Play([9])
[Sarsa Com 3]
action: Discard([3]), value: -2.70332407762966
action: Play([10]), value: -2.03508026447133
selected action: Play([10])
[Sarsa Com 4]
action: Discard([2]), value: -0.528951539084747
action: Play([11]), value: -0.398980773223031
selected action: Play([11])
[Sarsa Com 1]
action: Discard([4]), value: -3.42741724468523
selected action: Discard([4])
----------
trick 3 is done.
Sarsa Com 4 takes trick.
----------
[Sarsa Com 4]
action: Play([2]), value: -0.147509226121171
action: Play([2, 2]), value: -0.631553051921938
action: Play([3]), value: -1.19495678752068
action: Play([4]), value: -1.09282038527205
action: Play([5]), value: -1.0379649781144
selected action: Play([2])
[Sarsa Com 1]
action: Play([5]), value: -3.29627589866651
action: Play([6]), value: -3.233711581047
action: Play([7]), value: -3.06578321229004
action: Play([10]), value: -3.0302572495847
selected action: Play([10])
[Sarsa Com 2]
action: Discard([4]), value: -3.42654790064123
action: Play([10]), value: -2.55143651445806
selected action: Play([10])
[Sarsa Com 3]
action: Discard([3]), value: -2.74933359707205
action: Play([10]), value: -1.70652833128607
selected action: Play([10])
----------
trick 4 is done.
Sarsa Com 3 takes trick.
----------
[Sarsa Com 3]
action: Play([3]), value: -2.09314135533522
action: Play([5]), value: -2.04572067168898
action: Play([7]), value: -1.5011756827513
action: Play([7, 7]), value: -1.15012485993179
selected action: Play([7, 7])
[Sarsa Com 4]
action: Discard([2, 3]), value: -1.97816853926348
selected action: Discard([2, 3])
[Sarsa Com 1]
action: Discard([5, 6]), value: -2.45013641320524
selected action: Discard([5, 6])
[Sarsa Com 2]
action: Discard([4, 5]), value: -3.87379281222544
action: Play([8, 10]), value: -1.78125512564604
selected action: Play([8, 10])
----------
trick 5 is done.
Sarsa Com 2 takes trick.
----------
[Sarsa Com 2]
action: Play([4]), value: -2.59059316155682
action: Play([5]), value: -1.96209293955036
selected action: Play([5])
[Sarsa Com 3]
action: Discard([3]), value: -1.59707357642548
action: Play([5]), value: -1.46934448390198
selected action: Play([5])
[Sarsa Com 4]
action: Discard([4]), value: -2.12039119357815
action: Play([5]), value: -2.01731290732492
selected action: Play([5])
[Sarsa Com 1]
action: Play([6]), value: -2.21384152124221
action: Play([7]), value: -1.67977087982789
selected action: Play([7])
----------
trick 6 is done.
Sarsa Com 1 takes trick.
----------
deal is done.
last cards:
Sarsa Com 1: 6
Sarsa Com 2: 4
Sarsa Com 3: 3
Sarsa Com 4: 4
["Sarsa Com 1"] lose in deal.
minus points:
Sarsa Com 1: [11, 6, 6]
Sarsa Com 2: [7]
Sarsa Com 3: [6]
Sarsa Com 4: [6, 11]
--------------------
game ended.
total minus points:
Sarsa Com 1: 23
Sarsa Com 2: 7
Sarsa Com 3: 6
Sarsa Com 4: 17
["Sarsa Com 1"] lose.
[1] learning....................................................................................................

~~ omitted ~~

[99] learning....................................................................................................
Sarsa Com 1 weight: (omitted)
Sarsa Com 2 weight: (omitted)
Sarsa Com 3 weight: (omitted)
Sarsa Com 4 weight: (omitted)
--------------------
deal...
----------
[Sarsa Com 1]
action: Play([2]), value: -1.78176666708138
action: Play([3]), value: -1.5332420132324
action: Play([3, 3]), value: -1.69628048251101
action: Play([3, 3, 3]), value: -2.00253789541708
action: Play([4]), value: -1.73844959257352
action: Play([4, 4]), value: -2.6573726019047
action: Play([5]), value: -2.0380987898678
action: Play([7]), value: -1.43118078846655
action: Play([7, 7]), value: -0.950012178972491
action: Play([8]), value: -0.998254752311405
selected action: Play([7, 7])
[Sarsa Com 2]
action: Discard([3, 5]), value: -6.06736010908569
action: Play([7, 8]), value: -4.39129209664311
action: Play([7, 9]), value: -4.94290742566495
action: Play([7, 10]), value: -4.67003429068773
action: Play([7, 11]), value: -4.70273274076741
action: Play([8, 9]), value: -5.049541776108
action: Play([8, 10]), value: -4.77666864113078
action: Play([8, 11]), value: -4.80936709121046
action: Play([9, 9]), value: -4.74434169491086
action: Play([9, 10]), value: -5.15040293654637
action: Play([9, 11]), value: -5.18310138662605
action: Play([10, 11]), value: -4.80864138574224
selected action: Play([7, 8])
[Sarsa Com 3]
action: Discard([2, 2]), value: -2.4812127707455
action: Play([8, 9]), value: -1.55624575684383
action: Play([8, 11]), value: -1.79410568812154
action: Play([9, 9]), value: -1.18478233462706
action: Play([9, 11]), value: -2.15686976972971
action: Play([11, 11]), value: -1.73104193488844
selected action: Play([9, 9])
[Sarsa Com 4]
action: Discard([2, 3]), value: -3.41972441296677
selected action: Discard([2, 3])
----------
trick 0 is done.
Sarsa Com 3 takes trick.
----------
[Sarsa Com 3]
action: Play([2]), value: -1.25157095224312
action: Play([2, 2]), value: -1.23959956941329
action: Play([2, 2, 2]), value: -1.3757383224057
action: Play([4]), value: -1.23218375994082
action: Play([4, 4]), value: -2.35843242745983
action: Play([8]), value: -0.677341033836141
action: Play([11]), value: -1.1413107540742
action: Play([11, 11]), value: -0.681601007374243
selected action: Play([8])
[Sarsa Com 4]
action: Discard([4]), value: -4.95439703346153
action: Play([8]), value: -3.14926341773496
action: Play([9]), value: -3.02299614817804
selected action: Play([9])
[Sarsa Com 1]
action: Discard([2]), value: -1.15788040043538
selected action: Discard([2])
[Sarsa Com 2]
action: Discard([3]), value: -4.97282420063103
action: Play([9]), value: -4.13448450587093
action: Play([10]), value: -3.76002450498711
action: Play([11]), value: -3.72145885779352
selected action: Play([11])
----------
trick 1 is done.
Sarsa Com 2 takes trick.
----------
[Sarsa Com 2]
action: Play([3]), value: -4.4893607730364
action: Play([5]), value: -3.2692424244165
action: Play([5, 5]), value: -3.58682731213736
action: Play([6]), value: -4.09414132603764
action: Play([9]), value: -3.82697050420417
action: Play([9, 9]), value: -2.8770164716773
action: Play([10]), value: -3.45251050332035
selected action: Play([9, 9])
[Sarsa Com 3]
action: Discard([2, 2]), value: -0.774641099952119
action: Play([11, 11]), value: -0.106421549110851
selected action: Play([11, 11])
[Sarsa Com 4]
action: Discard([4, 5]), value: -4.51814587991488
selected action: Discard([4, 5])
[Sarsa Com 1]
action: Discard([3, 3]), value: -1.06856870143911
selected action: Discard([3, 3])
----------
trick 2 is done.
Sarsa Com 3 takes trick.
----------
[Sarsa Com 3]
action: Play([2]), value: -0.402492985859945
action: Play([2, 2]), value: -0.390521603030115
action: Play([2, 2, 2]), value: -0.526660356022521
action: Play([4]), value: -0.383105793557637
action: Play([4, 4]), value: -1.50935446107666
selected action: Play([4])
[Sarsa Com 4]
action: Play([6]), value: -5.05471382599029
action: Play([7]), value: -4.52827688338485
action: Play([8]), value: -4.38906555404887
selected action: Play([8])
[Sarsa Com 1]
action: Discard([3]), value: -2.16876465520947
action: Play([8]), value: -0.881245991798936
selected action: Play([8])
[Sarsa Com 2]
action: Discard([3]), value: -3.90901138987242
action: Play([10]), value: -2.59885616122331
selected action: Play([10])
----------
trick 3 is done.
Sarsa Com 2 takes trick.
----------
[Sarsa Com 2]
action: Play([3]), value: -3.26880086873592
action: Play([5]), value: -2.04868252011602
action: Play([5, 5]), value: -2.36626740783688
action: Play([6]), value: -2.87358142173716
selected action: Play([5])
[Sarsa Com 3]
action: Discard([2]), value: -0.786187926540899
selected action: Discard([2])
[Sarsa Com 4]
action: Play([6]), value: -4.42625134061157
action: Play([7]), value: -3.89981439800613
selected action: Play([7])
[Sarsa Com 1]
action: Discard([3]), value: -1.38699228156571
selected action: Discard([3])
----------
trick 4 is done.
Sarsa Com 4 takes trick.
----------
[Sarsa Com 4]
action: Play([6]), value: -3.72127104497376
action: Play([6, 6]), value: -3.27250947540913
selected action: Play([6, 6])
[Sarsa Com 1]
action: Discard([4, 4]), value: -2.28567837738157
selected action: Discard([4, 4])
[Sarsa Com 2]
action: Discard([3, 5]), value: -3.0799741864967
selected action: Discard([3, 5])
[Sarsa Com 3]
action: Discard([2, 2]), value: 0.364238178287948
selected action: Discard([2, 2])
----------
trick 5 is done.
Sarsa Com 4 takes trick.
----------
deal is done.
last cards:
Sarsa Com 1: 5
Sarsa Com 2: 6
Sarsa Com 3: 4
Sarsa Com 4: 6
["Sarsa Com 4", "Sarsa Com 2"] lose in deal.
minus points:
Sarsa Com 1: []
Sarsa Com 2: [6]
Sarsa Com 3: []
Sarsa Com 4: [6]
--------------------
deal...
----------

~ omitted ~

--------------------
deal...
----------
[Sarsa Com 3]
action: Play([2]), value: -1.56493476704333
action: Play([3]), value: -2.46867409363993
action: Play([4]), value: -1.55749058903126
action: Play([4, 4]), value: -2.68373925655028
action: Play([5]), value: -2.02340651221356
action: Play([7]), value: -1.3948562075944
action: Play([7, 7]), value: -0.658053365448189
action: Play([9]), value: -1.36541194453475
action: Play([9, 9]), value: -0.460648236203308
action: Play([10]), value: -0.967481377971518
selected action: Play([9, 9])
[Sarsa Com 4]
action: Discard([2, 2]), value: -4.7164784207611
action: Play([10, 10]), value: -3.62787032096611
action: Play([10, 11]), value: -3.70774165088491
selected action: Play([10, 10])
[Sarsa Com 1]
action: Discard([2, 2]), value: -0.10790995669631
action: Play([11, 11]), value: -0.230677932574472
selected action: Discard([2, 2])
[Sarsa Com 2]
action: Discard([3, 3]), value: -3.96398175521391
selected action: Discard([3, 3])
----------
trick 0 is done.
Sarsa Com 4 takes trick.
----------
[Sarsa Com 4]
action: Play([2]), value: -3.70218023723626
action: Play([2, 2]), value: -3.96270183315716
action: Play([5]), value: -4.26793505104003
action: Play([6]), value: -3.46048888093795
action: Play([6, 6]), value: -2.98559275206603
action: Play([7]), value: -3.2844580073545
action: Play([8]), value: -3.14524667801852
action: Play([11]), value: -3.1801299214301
selected action: Play([6, 6])
[Sarsa Com 1]
action: Discard([3, 3]), value: -0.96358827833708
action: Play([7, 11]), value: -0.0790601386909588
action: Play([11, 11]), value: -0.701831500922724
selected action: Play([7, 11])
[Sarsa Com 2]
action: Discard([4, 5]), value: -5.95024765619554
selected action: Discard([4, 5])
[Sarsa Com 3]
action: Discard([2, 3]), value: -1.95385449213466
selected action: Discard([2, 3])
----------
trick 1 is done.
Sarsa Com 1 takes trick.
----------
[Sarsa Com 1]
action: Play([3]), value: 0.141443104146031
action: Play([3, 3]), value: -0.358768259301096
action: Play([4]), value: -0.860906178498235
action: Play([5]), value: -0.247666641481773
action: Play([11]), value: -0.291092148922914
action: Play([11, 11]), value: 0.293657055170194
selected action: Play([11, 11])
[Sarsa Com 2]
action: Discard([8, 8]), value: -4.44296791141928
selected action: Discard([8, 8])
[Sarsa Com 3]
action: Discard([4, 4]), value: -2.70049428671409
selected action: Discard([4, 4])
[Sarsa Com 4]
action: Discard([2, 2]), value: -3.30093221564442
selected action: Discard([2, 2])
----------
trick 2 is done.
Sarsa Com 1 takes trick.
----------
[Sarsa Com 1]
action: Play([3]), value: 0.138185586599377
action: Play([3, 3]), value: -0.362025776847751
action: Play([4]), value: -0.86416369604489
action: Play([5]), value: -0.250924159028427
selected action: Play([3])
[Sarsa Com 2]
action: Play([9]), value: -4.65481560077389
action: Play([10]), value: -4.34559501253163
selected action: Play([10])
[Sarsa Com 3]
action: Discard([5]), value: -3.28488380943377
action: Play([10]), value: -2.45272267413438
selected action: Play([10])
[Sarsa Com 4]
action: Discard([5]), value: -4.14422716266526
action: Play([11]), value: -3.22698837840968
selected action: Play([11])
----------
trick 3 is done.
Sarsa Com 4 takes trick.
----------
[Sarsa Com 4]
action: Play([5]), value: -4.08984398834974
action: Play([7]), value: -3.10636694466421
action: Play([8]), value: -2.96715561532822
selected action: Play([8])
[Sarsa Com 1]
action: Discard([3]), value: -0.822210280736597
selected action: Discard([3])
[Sarsa Com 2]
action: Play([9]), value: -4.37820875683349
selected action: Play([9])
[Sarsa Com 3]
action: Discard([5]), value: -2.48183700561925
selected action: Discard([5])
----------
trick 4 is done.
Sarsa Com 2 takes trick.
----------
[Sarsa Com 2]
action: Play([9]), value: -4.10295599692963
selected action: Play([9])
[Sarsa Com 3]
action: Discard([7]), value: -2.15609837454416
selected action: Discard([7])
[Sarsa Com 4]
action: Discard([5]), value: -2.74988698773465
selected action: Discard([5])
[Sarsa Com 1]
action: Discard([4]), value: -1.69595161390184
selected action: Discard([4])
----------
trick 5 is done.
Sarsa Com 2 takes trick.
----------
deal is done.
last cards:
Sarsa Com 1: 5
Sarsa Com 2: 9
Sarsa Com 3: 7
Sarsa Com 4: 7
["Sarsa Com 2"] lose in deal.
minus points:
Sarsa Com 1: []
Sarsa Com 2: [6, 8, 9]
Sarsa Com 3: []
Sarsa Com 4: [6, 8]
--------------------
game ended.
total minus points:
Sarsa Com 1: 0
Sarsa Com 2: 23
Sarsa Com 3: 0
Sarsa Com 4: 14
["Sarsa Com 2"] lose.

You can see that it makes some quite well-judged plays.
It's nice that its play isn't monotonous.

Also, comparing the first and last games, you can see some signs that learning has progressed.

Analyzing the learned results

Below is a graph of the learned parameter for each feature, for Sarsa Com 1 through Sarsa Com 4.

[Figure: learned weight of each feature for Sarsa Com 1 through Sarsa Com 4]

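There is some variation, but you can see that the four players ended up with roughly the same values.
(If they were completely scattered, that would mean the features lacked expressive power and learning had diverged.)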
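To put a number on "roughly the same", one option (not something the post does) is to compute, for each feature, the mean and standard deviation of its weight across the four players; a feature whose standard deviation is large relative to its mean is one the players disagree on. This is a minimal sketch assuming the four weight vectors have already been loaded (for example from sarsa0.plist to sarsa3.plist) into a [[Double]]; featureSpread is a hypothetical helper, and the numbers below are illustrative, not the learned weights.

```swift
// Per-feature mean and (population) standard deviation across players.
// weights[p][f] is player p's weight for feature f; all rows must have equal length.
func featureSpread(_ weights: [[Double]]) -> [(mean: Double, std: Double)] {
    guard let first = weights.first else { return [] }
    let players = Double(weights.count)
    return (0..<first.count).map { (f: Int) -> (mean: Double, std: Double) in
        let column = weights.map { $0[f] }
        let mean = column.reduce(0.0, +) / players
        let variance = column.map { ($0 - mean) * ($0 - mean) }.reduce(0.0, +) / players
        return (mean: mean, std: variance.squareRoot())
    }
}

// Illustrative input only: 4 players x 3 features.
let example: [[Double]] = [
    [-0.50, 0.10, 0.30],
    [-0.40, 0.20, 0.30],
    [-0.60, 0.00, 0.30],
    [-0.50, 0.10, 0.30],
]
let spread = featureSpread(example)
for (f, s) in spread.enumerated() {
    print("feature \(f): mean \(s.mean), std \(s.std)")
}
```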

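Looking at this, you can also see that having no 3s or 4s in your hand makes you considerably more likely to lose, while running out of 7s, 8s, and 9s makes you considerably less likely to lose.
Surprisingly, the last card played doesn't seem to matter much.
What seems to matter more is which cards are still left.
Card counting matters, after all...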
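Reading off conclusions like these amounts to sorting the features by their learned weight. Assuming, as the negative values in the log suggest, that a lower value means a worse outlook, a strongly negative weight marks a feature that pushes the estimate down (more likely to lose) and a positive weight marks one that pushes it up. This is a minimal sketch; the feature labels and weights are hypothetical, since the feature definitions are not listed in this post.

```swift
// Pairs each (hypothetical) feature label with its weight and sorts ascending,
// so the features that lower the value estimate the most come first.
func rankFeatures(labels: [String], weights: [Double]) -> [(label: String, weight: Double)] {
    precondition(labels.count == weights.count, "one label per weight")
    return zip(labels, weights)
        .map { (label: $0.0, weight: $0.1) }
        .sorted { $0.weight < $1.weight }
}

// Illustrative labels and weights only, not the real feature set or learned values.
let ranked = rankFeatures(
    labels: ["no 3 in hand", "no 9 in hand", "last played was 5"],
    weights: [-0.8, 0.6, 0.05]
)
for entry in ranked {
    print(entry.label, entry.weight)
}
```

In practice one would rank the per-feature mean across the four players rather than a single player's weights, so that one noisy learner does not dominate the ordering.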

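Anyway, that completes building an AI with reinforcement learning.
I had planned to adjust the number and kinds of features if learning didn't go well, but this time that worry turned out to be unfounded.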

That's all for today!