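Yesterday I implemented the action selection and learning algorithms.
Today I implement saving and loading of the parameters, which completes the class.
SarsaCom class (continued)
Saving and loading
With saving and loading added, the class as a whole looks like this.
(For the omitted parts, see the articles from yesterday and the day before.)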
//==============================
// BirdHead
//------------------------------
// SarsaCom.swift
//==============================

import Foundation

class SarsaCom: Player {
    // Rebuilds a SarsaCom from a plist file written by save(_:).
    static func load(filename: String) -> SarsaCom {
        let saveData = NSDictionary(contentsOfFile: filename)!
        let name = saveData["name"]! as! String
        let epsilon = saveData["epsilon"]! as! Double
        let lambda = saveData["lambda"]! as! Double
        let stepSize = saveData["stepSize"]! as! Double
        let weight = saveData["weight"]! as! [Double]
        let sarsaCom = SarsaCom(name: name, epsilon: epsilon, lambda: lambda, stepSize: stepSize)
        sarsaCom.weight = weight
        return sarsaCom
    }

    private(set) var name: String
    private(set) var isCom: Bool
    private var epsilon: Double
    private var lambda: Double
    private var stepSize: Double
    var learn: Bool
    var debugPrint: Bool
    private(set) var weight: [Double]
    private var previousFeature: [Double]
    private var currentFeature: [Double]
    private var accumulatedFeature: [Double]

    init(name: String, epsilon: Double, lambda: Double, stepSize: Double) {
        self.name = name
        self.isCom = true
        self.epsilon = epsilon
        self.lambda = lambda
        self.stepSize = stepSize
        self.learn = true
        self.debugPrint = false
        self.weight = [Double](count: 98, repeatedValue: 0.0)
        self.previousFeature = [Double]()
        self.currentFeature = [Double]()
        self.accumulatedFeature = [Double](count: 98, repeatedValue: 0.0)
    }

    func select(view: GameInfo.PlayerView) throws -> Action {
        // omitted
    }

    func learn(minusPoint: Int) {
        // omitted
    }

    // Writes the hyperparameters and the learned weights to a plist file.
    func save(filename: String) {
        var saveData = [String: AnyObject]()
        saveData["name"] = self.name
        saveData["epsilon"] = self.epsilon
        saveData["lambda"] = self.lambda
        saveData["stepSize"] = self.stepSize
        saveData["weight"] = self.weight
        (saveData as NSDictionary).writeToFile(filename, atomically: true)
    }

    private func toFeature(view: GameInfo.PlayerView, action: Action) -> [Double] {
        // omitted
    }

    private func valueOfFeature(feature: [Double]) -> Double {
        // omitted
    }
}
Saving writes the properties out to a plist file, and loading reads them back from a plist file.
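As a rough, modern-Swift illustration of the same round trip (not the NSDictionary-based code above), the sketch below stores the same five properties with Codable and PropertyListEncoder / PropertyListDecoder. The SarsaParameters struct and the encodePlist / decodePlist helpers are hypothetical names introduced only for this example, and the data stays in memory instead of going to a file. One difference worth noting: decoding throws an error on a missing or mistyped key, whereas the force-unwraps in load(filename:) would crash.

```swift
import Foundation

// Hypothetical stand-in for the state that SarsaCom persists.
struct SarsaParameters: Codable, Equatable {
    var name: String
    var epsilon: Double
    var lambda: Double
    var stepSize: Double
    var weight: [Double]
}

// Encode the parameters as a property list (the encoder's default format).
func encodePlist(_ parameters: SarsaParameters) throws -> Data {
    return try PropertyListEncoder().encode(parameters)
}

// Decode the parameters; throws if a key is missing or has the wrong type.
func decodePlist(_ data: Data) throws -> SarsaParameters {
    return try PropertyListDecoder().decode(SarsaParameters.self, from: data)
}

let original = SarsaParameters(
    name: "Sarsa Com 1",
    epsilon: 0.1,
    lambda: 0.9,
    stepSize: 0.001,
    weight: [Double](repeating: 0.0, count: 98)
)
let restored = try decodePlist(try encodePlist(original))
print("restored \(restored.name) with \(restored.weight.count) weights")
```

To persist to disk, the Data returned by encodePlist could be written with its write(to:) method and read back with Data(contentsOf:).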
With that, the SarsaCom class is complete!
Learning through self-play
Now that the implementation is done, let's train it through self-play.
import Foundation

let stdout = NSFileHandle.fileHandleWithStandardOutput()

let players: [Player] = [
    SarsaCom(name: "Sarsa Com 1", epsilon: 0.1, lambda: 0.9, stepSize: 0.001),
    SarsaCom(name: "Sarsa Com 2", epsilon: 0.1, lambda: 0.9, stepSize: 0.001),
    SarsaCom(name: "Sarsa Com 3", epsilon: 0.1, lambda: 0.9, stepSize: 0.001),
    SarsaCom(name: "Sarsa Com 4", epsilon: 0.1, lambda: 0.9, stepSize: 0.001),
]

// learn
for i in 0..<100 {
    stdout.synchronizeFile()
    stdout.writeString("[\(i)] learning")
    for _ in 0..<100 {
        let deck = Deck()
        let game = GameInfo(deck: deck, playerCount: 4)
        let controller = GameController(gameInfo: game, players: players)
        controller.learn = true
        try! controller.start()
        stdout.writeString(".")
    }
    print("")

    for player in players {
        let sarsa = player as! SarsaCom
        sarsa.learn = false
        sarsa.debugPrint = true
        print("\(sarsa.name) weight: \(sarsa.weight)")
    }

    let deck = Deck()
    let game = GameInfo(deck: deck, playerCount: 4)
    let controller = GameController(gameInfo: game, players: players)
    controller.output = true
    try! controller.start()

    for player in players {
        let sarsa = player as! SarsaCom
        sarsa.learn = true
        sarsa.debugPrint = false
    }
}

// save
for i in 0..<4 {
    (players[i] as! SarsaCom).save("sarsa\(i).plist")
}
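It trains for 10,000 games in total, printing the progress every 100 games.
One thing to be careful about is the step size.
There are about 100 features, so I specify 0.1/100 = 0.001.
The value function is a linear combination of the features, so unless the step size is divided by the number of features, the effective step size at the level of the value function becomes enormous.
(At first I simply used 0.1, and the parameters diverged to ±∞ partway through.)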
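To make the scaling concrete, here is a small toy experiment, not the game code. With a linear value w·x, the update w ← w + α·δ·x changes the value for that same x by α·δ·‖x‖². The sketch assumes the worst case, where all 100 feature components equal 1, so ‖x‖² = 100; the real feature vectors in SarsaCom are not shown here and may be sparser. The finalError function and the fixed target of -1 are hypothetical choices for illustration.

```swift
// Repeatedly nudge a linear value w·x toward a fixed target with a
// gradient step, and return the size of the last observed error.
func finalError(stepSize: Double, featureCount: Int, iterations: Int) -> Double {
    // Worst-case assumption: every feature is active with value 1.
    let feature = [Double](repeating: 1.0, count: featureCount)
    var weight = [Double](repeating: 0.0, count: featureCount)
    let target = -1.0
    var error = 0.0
    for _ in 0..<iterations {
        // Linear value: dot product of weights and features.
        let value = zip(weight, feature).reduce(0.0) { $0 + $1.0 * $1.1 }
        error = target - value
        for i in 0..<featureCount {
            weight[i] += stepSize * error * feature[i]
        }
    }
    return abs(error)
}

let featureCount = 100
let diverging = finalError(stepSize: 0.1, featureCount: featureCount, iterations: 50)
let converging = finalError(stepSize: 0.1 / Double(featureCount), featureCount: featureCount, iterations: 50)
print("step 0.1:   |error| = \(diverging)")
print("step 0.001: |error| = \(converging)")
```

Under this assumption each update multiplies the error by (1 - α·100): a factor of -9 for α = 0.1, which blows up, and 0.9 for α = 0.001, which shrinks steadily.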
Building and running this gives the following result.
[0] learning.................................................................................................... Sarsa Com 1 weight: (省略) Sarsa Com 2 weight: (省略) Sarsa Com 3 weight: (省略) Sarsa Com 4 weight: (省略) -------------------- deal... ---------- [Sarsa Com 1] action: Play([2]), value: -1.88601832147534 action: Play([3]), value: -2.1396629596506 action: Play([4]), value: -0.976964429850177 action: Play([4, 4]), value: -1.88794972202002 action: Play([5]), value: -1.61011720965024 action: Play([5, 5]), value: -1.48229989912132 action: Play([7]), value: -1.61855856394451 action: Play([7, 7]), value: -1.06738382619868 action: Play([9]), value: -1.03978661155825 action: Play([11]), value: -1.30387326053594 selected action: Play([4]) [Sarsa Com 2] action: Discard([2]), value: -3.47260842094249 action: Play([5]), value: -3.43794212330004 action: Play([6]), value: -2.6766395787999 action: Play([7]), value: -2.89124387558303 action: Play([9]), value: -3.14855587932015 action: Play([11]), value: -2.87142951407622 selected action: Play([6]) [Sarsa Com 3] action: Discard([4]), value: -2.73646500832458 action: Play([6]), value: -1.45843191106511 action: Play([8]), value: -2.00908027110381 action: Play([10]), value: -1.83481358750941 action: Play([11]), value: -1.63103209464365 selected action: Play([6]) [Sarsa Com 4] action: Discard([2]), value: -4.25876948690351 action: Play([6]), value: -3.19366006265389 action: Play([7]), value: -3.18491459831658 action: Play([8]), value: -3.29571386976795 action: Play([9]), value: -3.25895442074911 action: Play([10]), value: -3.76713499225674 selected action: Play([7]) ---------- trick 0 is done. Sarsa Com 4 takes trick. 
---------- [Sarsa Com 4] action: Play([2]), value: -3.45387561078593 action: Play([4]), value: -2.65821054661879 action: Play([4, 4]), value: -3.4888563970052 action: Play([5]), value: -3.71971402795031 action: Play([6]), value: -2.51686386884614 action: Play([8]), value: -2.6189176759602 action: Play([9]), value: -2.58215822694136 action: Play([10]), value: -3.09033879844899 action: Play([10, 10]), value: -2.70866442678027 selected action: Play([6]) [Sarsa Com 1] action: Discard([2]), value: -1.75787622135135 action: Play([7]), value: -1.41367065199455 action: Play([9]), value: -0.834898699608291 action: Play([11]), value: -1.09898534858598 selected action: Play([9]) [Sarsa Com 2] action: Discard([2]), value: -3.09539638946885 action: Play([9]), value: -2.73104585969227 action: Play([11]), value: -2.45391949444834 selected action: Play([11]) [Sarsa Com 3] action: Discard([4]), value: -2.43218603348729 action: Play([11]), value: -1.34465595675913 selected action: Play([11]) ---------- trick 1 is done. Sarsa Com 3 takes trick. ---------- [Sarsa Com 3] action: Play([4]), value: -2.04348486502394 action: Play([5]), value: -1.73120055903466 action: Play([8]), value: -1.49015261910165 action: Play([8, 8]), value: -1.42457772120615 action: Play([8, 8, 8]), value: -0.864043633201311 action: Play([10]), value: -1.31588593550726 action: Play([10, 10]), value: -1.37424627167578 action: Play([10, 10, 10]), value: -0.997172381308313 selected action: Play([8, 8, 8]) [Sarsa Com 4] action: Discard([2, 4, 4]), value: -3.93030196924925 action: Play([8, 9, 10]), value: -1.90364392074412 action: Play([8, 10, 10]), value: -2.03690930532466 action: Play([9, 10, 10]), value: -2.00014985630582 selected action: Play([8, 9, 10]) [Sarsa Com 1] action: Discard([2, 3, 4]), value: -2.72376050377358 selected action: Discard([2, 3, 4]) [Sarsa Com 2] action: Discard([2, 3, 3]), value: -2.37888230446322 selected action: Discard([2, 3, 3]) ---------- trick 2 is done. Sarsa Com 4 takes trick. 
---------- [Sarsa Com 4] action: Play([2]), value: -2.31580022191776 action: Play([4]), value: -1.52013515775062 action: Play([4, 4]), value: -2.35078100813703 action: Play([5]), value: -2.58163863908214 action: Play([10]), value: -1.52534027897033 selected action: Play([4]) [Sarsa Com 1] action: Play([5]), value: -3.0039699381072 action: Play([7]), value: -3.01241129240146 action: Play([11]), value: -2.6977259889929 selected action: Play([11]) [Sarsa Com 2] action: Discard([3]), value: -3.3732721045302 selected action: Discard([3]) [Sarsa Com 3] action: Discard([4]), value: -2.07295945290131 selected action: Discard([4]) ---------- trick 3 is done. Sarsa Com 1 takes trick. ---------- [Sarsa Com 1] action: Play([5]), value: -2.56594762380864 action: Play([5, 5]), value: -2.43813031327972 action: Play([7]), value: -2.57438897810291 action: Play([7, 7]), value: -2.02321424035707 selected action: Play([7, 7]) [Sarsa Com 2] action: Discard([5, 7]), value: -3.74539596637002 action: Play([7, 9]), value: -3.33940793282986 action: Play([9, 9]), value: -3.0887909203641 selected action: Play([9, 9]) [Sarsa Com 3] action: Discard([5, 10]), value: -2.81011334707719 action: Play([10, 10]), value: -2.11522500023954 selected action: Play([10, 10]) [Sarsa Com 4] action: Discard([2, 4]), value: -2.87282518631956 selected action: Discard([2, 4]) ---------- trick 4 is done. Sarsa Com 3 takes trick. ---------- [Sarsa Com 3] action: Play([5]), value: -2.2645385064677 action: Play([10]), value: -1.46133813928329 selected action: Play([10]) [Sarsa Com 4] action: Discard([5]), value: -3.49158138922092 action: Play([10]), value: -2.27752912057932 selected action: Play([10]) [Sarsa Com 1] action: Discard([5]), value: -2.35044618784709 selected action: Discard([5]) [Sarsa Com 2] action: Discard([5]), value: -3.28708582018571 selected action: Discard([5]) ---------- trick 5 is done. Sarsa Com 4 takes trick. ---------- deal is done. 
last cards: Sarsa Com 1: 5 Sarsa Com 2: 7 Sarsa Com 3: 5 Sarsa Com 4: 5 ["Sarsa Com 2"] lose in deal. minus points: Sarsa Com 1: [] Sarsa Com 2: [7] Sarsa Com 3: [] Sarsa Com 4: [] -------------------- deal... ---------- 〜省略〜 -------------------- deal... ---------- [Sarsa Com 1] action: Play([2]), value: -2.43751192490417 action: Play([3]), value: -2.69115656307944 action: Play([4]), value: -1.52845803327901 action: Play([4, 4]), value: -2.43944332544885 action: Play([5]), value: -1.90242317020955 action: Play([6]), value: -1.83985885259004 action: Play([6, 6]), value: -2.01646013464381 action: Play([7]), value: -1.67193048383309 action: Play([9]), value: -1.59128021498709 action: Play([10]), value: -1.63640452112774 selected action: Play([4]) [Sarsa Com 2] action: Discard([2]), value: -1.83067654618551 action: Play([4]), value: -2.42451047054952 action: Play([5]), value: -1.79601024854306 action: Play([8]), value: -1.44591486450213 action: Play([9]), value: -1.50662400456317 action: Play([10]), value: -1.57833714304249 selected action: Play([8]) [Sarsa Com 3] action: Discard([3]), value: -2.93869290600437 action: Play([8]), value: -2.91579555406669 action: Play([9]), value: -2.62907299695951 action: Play([10]), value: -3.00791218922798 selected action: Play([9]) [Sarsa Com 4] action: Discard([2]), value: -1.15253494993807 action: Play([11]), value: -1.3074991251928 selected action: Discard([2]) ---------- trick 0 is done. Sarsa Com 3 takes trick. 
---------- [Sarsa Com 3] action: Play([3]), value: -2.22596691391021 action: Play([3, 3]), value: -2.74996511490158 action: Play([5]), value: -2.91703714028062 action: Play([7]), value: -2.37249215134294 action: Play([7, 7]), value: -2.02144132852343 action: Play([8]), value: -2.43100706433578 action: Play([8, 8]), value: -1.92882355251769 action: Play([10]), value: -2.52312369949707 action: Play([10, 10]), value: -2.17219710926474 selected action: Play([8, 8]) [Sarsa Com 4] action: Discard([2, 2]), value: -1.58307996717115 action: Play([8, 11]), value: -0.944126600418926 action: Play([11, 11]), value: -1.12507343459922 selected action: Play([8, 11]) [Sarsa Com 1] action: Discard([2, 3]), value: -2.93377256620945 selected action: Discard([2, 3]) [Sarsa Com 2] action: Discard([2, 3]), value: -2.66066268712375 selected action: Discard([2, 3]) ---------- trick 1 is done. Sarsa Com 4 takes trick. ---------- [Sarsa Com 4] action: Play([2]), value: -0.380347605514121 action: Play([2, 2]), value: -0.864391431314887 action: Play([3]), value: -1.42779516691363 action: Play([4]), value: -1.325658764665 action: Play([5]), value: -0.243718793632442 action: Play([5, 5]), value: -1.04816888306409 action: Play([11]), value: -0.350953839697532 selected action: Play([5]) [Sarsa Com 1] action: Discard([4]), value: -3.85856268104413 action: Play([5]), value: -2.80630780869918 action: Play([6]), value: -2.74374349107967 action: Play([7]), value: -2.57581512232272 action: Play([9]), value: -2.49516485347672 action: Play([10]), value: -2.54028915961737 selected action: Play([9]) [Sarsa Com 2] action: Discard([4]), value: -3.79727813812505 action: Play([9]), value: -2.84162991792055 action: Play([10]), value: -2.91334305639988 selected action: Play([9]) [Sarsa Com 3] action: Discard([3]), value: -2.11062930303543 action: Play([10]), value: -2.18087639989375 selected action: Discard([3]) ---------- trick 2 is done. Sarsa Com 2 takes trick. 
---------- [Sarsa Com 2] action: Play([4]), value: -3.59188073410386 action: Play([5]), value: -2.9633805120974 action: Play([8]), value: -2.22564821208456 action: Play([9]), value: -2.16606525191463 action: Play([10]), value: -2.74570740659684 action: Play([10, 10]), value: -2.51465612726046 selected action: Play([9]) [Sarsa Com 3] action: Discard([3]), value: -2.70332407762966 action: Play([10]), value: -2.03508026447133 selected action: Play([10]) [Sarsa Com 4] action: Discard([2]), value: -0.528951539084747 action: Play([11]), value: -0.398980773223031 selected action: Play([11]) [Sarsa Com 1] action: Discard([4]), value: -3.42741724468523 selected action: Discard([4]) ---------- trick 3 is done. Sarsa Com 4 takes trick. ---------- [Sarsa Com 4] action: Play([2]), value: -0.147509226121171 action: Play([2, 2]), value: -0.631553051921938 action: Play([3]), value: -1.19495678752068 action: Play([4]), value: -1.09282038527205 action: Play([5]), value: -1.0379649781144 selected action: Play([2]) [Sarsa Com 1] action: Play([5]), value: -3.29627589866651 action: Play([6]), value: -3.233711581047 action: Play([7]), value: -3.06578321229004 action: Play([10]), value: -3.0302572495847 selected action: Play([10]) [Sarsa Com 2] action: Discard([4]), value: -3.42654790064123 action: Play([10]), value: -2.55143651445806 selected action: Play([10]) [Sarsa Com 3] action: Discard([3]), value: -2.74933359707205 action: Play([10]), value: -1.70652833128607 selected action: Play([10]) ---------- trick 4 is done. Sarsa Com 3 takes trick. 
---------- [Sarsa Com 3] action: Play([3]), value: -2.09314135533522 action: Play([5]), value: -2.04572067168898 action: Play([7]), value: -1.5011756827513 action: Play([7, 7]), value: -1.15012485993179 selected action: Play([7, 7]) [Sarsa Com 4] action: Discard([2, 3]), value: -1.97816853926348 selected action: Discard([2, 3]) [Sarsa Com 1] action: Discard([5, 6]), value: -2.45013641320524 selected action: Discard([5, 6]) [Sarsa Com 2] action: Discard([4, 5]), value: -3.87379281222544 action: Play([8, 10]), value: -1.78125512564604 selected action: Play([8, 10]) ---------- trick 5 is done. Sarsa Com 2 takes trick. ---------- [Sarsa Com 2] action: Play([4]), value: -2.59059316155682 action: Play([5]), value: -1.96209293955036 selected action: Play([5]) [Sarsa Com 3] action: Discard([3]), value: -1.59707357642548 action: Play([5]), value: -1.46934448390198 selected action: Play([5]) [Sarsa Com 4] action: Discard([4]), value: -2.12039119357815 action: Play([5]), value: -2.01731290732492 selected action: Play([5]) [Sarsa Com 1] action: Play([6]), value: -2.21384152124221 action: Play([7]), value: -1.67977087982789 selected action: Play([7]) ---------- trick 6 is done. Sarsa Com 1 takes trick. ---------- deal is done. last cards: Sarsa Com 1: 6 Sarsa Com 2: 4 Sarsa Com 3: 3 Sarsa Com 4: 4 ["Sarsa Com 1"] lose in deal. minus points: Sarsa Com 1: [11, 6, 6] Sarsa Com 2: [7] Sarsa Com 3: [6] Sarsa Com 4: [6, 11] -------------------- game ended. total minus points: Sarsa Com 1: 23 Sarsa Com 2: 7 Sarsa Com 3: 6 Sarsa Com 4: 17 ["Sarsa Com 1"] lose. [1] learning.................................................................................................... 〜〜省略〜〜 [99] learning.................................................................................................... Sarsa Com 1 weight: (省略) Sarsa Com 2 weight: (省略) Sarsa Com 3 weight: (省略) Sarsa Com 4 weight: (省略) -------------------- deal... 
---------- [Sarsa Com 1] action: Play([2]), value: -1.78176666708138 action: Play([3]), value: -1.5332420132324 action: Play([3, 3]), value: -1.69628048251101 action: Play([3, 3, 3]), value: -2.00253789541708 action: Play([4]), value: -1.73844959257352 action: Play([4, 4]), value: -2.6573726019047 action: Play([5]), value: -2.0380987898678 action: Play([7]), value: -1.43118078846655 action: Play([7, 7]), value: -0.950012178972491 action: Play([8]), value: -0.998254752311405 selected action: Play([7, 7]) [Sarsa Com 2] action: Discard([3, 5]), value: -6.06736010908569 action: Play([7, 8]), value: -4.39129209664311 action: Play([7, 9]), value: -4.94290742566495 action: Play([7, 10]), value: -4.67003429068773 action: Play([7, 11]), value: -4.70273274076741 action: Play([8, 9]), value: -5.049541776108 action: Play([8, 10]), value: -4.77666864113078 action: Play([8, 11]), value: -4.80936709121046 action: Play([9, 9]), value: -4.74434169491086 action: Play([9, 10]), value: -5.15040293654637 action: Play([9, 11]), value: -5.18310138662605 action: Play([10, 11]), value: -4.80864138574224 selected action: Play([7, 8]) [Sarsa Com 3] action: Discard([2, 2]), value: -2.4812127707455 action: Play([8, 9]), value: -1.55624575684383 action: Play([8, 11]), value: -1.79410568812154 action: Play([9, 9]), value: -1.18478233462706 action: Play([9, 11]), value: -2.15686976972971 action: Play([11, 11]), value: -1.73104193488844 selected action: Play([9, 9]) [Sarsa Com 4] action: Discard([2, 3]), value: -3.41972441296677 selected action: Discard([2, 3]) ---------- trick 0 is done. Sarsa Com 3 takes trick. 
---------- [Sarsa Com 3] action: Play([2]), value: -1.25157095224312 action: Play([2, 2]), value: -1.23959956941329 action: Play([2, 2, 2]), value: -1.3757383224057 action: Play([4]), value: -1.23218375994082 action: Play([4, 4]), value: -2.35843242745983 action: Play([8]), value: -0.677341033836141 action: Play([11]), value: -1.1413107540742 action: Play([11, 11]), value: -0.681601007374243 selected action: Play([8]) [Sarsa Com 4] action: Discard([4]), value: -4.95439703346153 action: Play([8]), value: -3.14926341773496 action: Play([9]), value: -3.02299614817804 selected action: Play([9]) [Sarsa Com 1] action: Discard([2]), value: -1.15788040043538 selected action: Discard([2]) [Sarsa Com 2] action: Discard([3]), value: -4.97282420063103 action: Play([9]), value: -4.13448450587093 action: Play([10]), value: -3.76002450498711 action: Play([11]), value: -3.72145885779352 selected action: Play([11]) ---------- trick 1 is done. Sarsa Com 2 takes trick. ---------- [Sarsa Com 2] action: Play([3]), value: -4.4893607730364 action: Play([5]), value: -3.2692424244165 action: Play([5, 5]), value: -3.58682731213736 action: Play([6]), value: -4.09414132603764 action: Play([9]), value: -3.82697050420417 action: Play([9, 9]), value: -2.8770164716773 action: Play([10]), value: -3.45251050332035 selected action: Play([9, 9]) [Sarsa Com 3] action: Discard([2, 2]), value: -0.774641099952119 action: Play([11, 11]), value: -0.106421549110851 selected action: Play([11, 11]) [Sarsa Com 4] action: Discard([4, 5]), value: -4.51814587991488 selected action: Discard([4, 5]) [Sarsa Com 1] action: Discard([3, 3]), value: -1.06856870143911 selected action: Discard([3, 3]) ---------- trick 2 is done. Sarsa Com 3 takes trick. 
---------- [Sarsa Com 3] action: Play([2]), value: -0.402492985859945 action: Play([2, 2]), value: -0.390521603030115 action: Play([2, 2, 2]), value: -0.526660356022521 action: Play([4]), value: -0.383105793557637 action: Play([4, 4]), value: -1.50935446107666 selected action: Play([4]) [Sarsa Com 4] action: Play([6]), value: -5.05471382599029 action: Play([7]), value: -4.52827688338485 action: Play([8]), value: -4.38906555404887 selected action: Play([8]) [Sarsa Com 1] action: Discard([3]), value: -2.16876465520947 action: Play([8]), value: -0.881245991798936 selected action: Play([8]) [Sarsa Com 2] action: Discard([3]), value: -3.90901138987242 action: Play([10]), value: -2.59885616122331 selected action: Play([10]) ---------- trick 3 is done. Sarsa Com 2 takes trick. ---------- [Sarsa Com 2] action: Play([3]), value: -3.26880086873592 action: Play([5]), value: -2.04868252011602 action: Play([5, 5]), value: -2.36626740783688 action: Play([6]), value: -2.87358142173716 selected action: Play([5]) [Sarsa Com 3] action: Discard([2]), value: -0.786187926540899 selected action: Discard([2]) [Sarsa Com 4] action: Play([6]), value: -4.42625134061157 action: Play([7]), value: -3.89981439800613 selected action: Play([7]) [Sarsa Com 1] action: Discard([3]), value: -1.38699228156571 selected action: Discard([3]) ---------- trick 4 is done. Sarsa Com 4 takes trick. ---------- [Sarsa Com 4] action: Play([6]), value: -3.72127104497376 action: Play([6, 6]), value: -3.27250947540913 selected action: Play([6, 6]) [Sarsa Com 1] action: Discard([4, 4]), value: -2.28567837738157 selected action: Discard([4, 4]) [Sarsa Com 2] action: Discard([3, 5]), value: -3.0799741864967 selected action: Discard([3, 5]) [Sarsa Com 3] action: Discard([2, 2]), value: 0.364238178287948 selected action: Discard([2, 2]) ---------- trick 5 is done. Sarsa Com 4 takes trick. ---------- deal is done. 
last cards: Sarsa Com 1: 5 Sarsa Com 2: 6 Sarsa Com 3: 4 Sarsa Com 4: 6 ["Sarsa Com 4", "Sarsa Com 2"] lose in deal. minus points: Sarsa Com 1: [] Sarsa Com 2: [6] Sarsa Com 3: [] Sarsa Com 4: [6] -------------------- deal... ---------- 〜省略〜 -------------------- deal... ---------- [Sarsa Com 3] action: Play([2]), value: -1.56493476704333 action: Play([3]), value: -2.46867409363993 action: Play([4]), value: -1.55749058903126 action: Play([4, 4]), value: -2.68373925655028 action: Play([5]), value: -2.02340651221356 action: Play([7]), value: -1.3948562075944 action: Play([7, 7]), value: -0.658053365448189 action: Play([9]), value: -1.36541194453475 action: Play([9, 9]), value: -0.460648236203308 action: Play([10]), value: -0.967481377971518 selected action: Play([9, 9]) [Sarsa Com 4] action: Discard([2, 2]), value: -4.7164784207611 action: Play([10, 10]), value: -3.62787032096611 action: Play([10, 11]), value: -3.70774165088491 selected action: Play([10, 10]) [Sarsa Com 1] action: Discard([2, 2]), value: -0.10790995669631 action: Play([11, 11]), value: -0.230677932574472 selected action: Discard([2, 2]) [Sarsa Com 2] action: Discard([3, 3]), value: -3.96398175521391 selected action: Discard([3, 3]) ---------- trick 0 is done. Sarsa Com 4 takes trick. 
---------- [Sarsa Com 4] action: Play([2]), value: -3.70218023723626 action: Play([2, 2]), value: -3.96270183315716 action: Play([5]), value: -4.26793505104003 action: Play([6]), value: -3.46048888093795 action: Play([6, 6]), value: -2.98559275206603 action: Play([7]), value: -3.2844580073545 action: Play([8]), value: -3.14524667801852 action: Play([11]), value: -3.1801299214301 selected action: Play([6, 6]) [Sarsa Com 1] action: Discard([3, 3]), value: -0.96358827833708 action: Play([7, 11]), value: -0.0790601386909588 action: Play([11, 11]), value: -0.701831500922724 selected action: Play([7, 11]) [Sarsa Com 2] action: Discard([4, 5]), value: -5.95024765619554 selected action: Discard([4, 5]) [Sarsa Com 3] action: Discard([2, 3]), value: -1.95385449213466 selected action: Discard([2, 3]) ---------- trick 1 is done. Sarsa Com 1 takes trick. ---------- [Sarsa Com 1] action: Play([3]), value: 0.141443104146031 action: Play([3, 3]), value: -0.358768259301096 action: Play([4]), value: -0.860906178498235 action: Play([5]), value: -0.247666641481773 action: Play([11]), value: -0.291092148922914 action: Play([11, 11]), value: 0.293657055170194 selected action: Play([11, 11]) [Sarsa Com 2] action: Discard([8, 8]), value: -4.44296791141928 selected action: Discard([8, 8]) [Sarsa Com 3] action: Discard([4, 4]), value: -2.70049428671409 selected action: Discard([4, 4]) [Sarsa Com 4] action: Discard([2, 2]), value: -3.30093221564442 selected action: Discard([2, 2]) ---------- trick 2 is done. Sarsa Com 1 takes trick. 
---------- [Sarsa Com 1] action: Play([3]), value: 0.138185586599377 action: Play([3, 3]), value: -0.362025776847751 action: Play([4]), value: -0.86416369604489 action: Play([5]), value: -0.250924159028427 selected action: Play([3]) [Sarsa Com 2] action: Play([9]), value: -4.65481560077389 action: Play([10]), value: -4.34559501253163 selected action: Play([10]) [Sarsa Com 3] action: Discard([5]), value: -3.28488380943377 action: Play([10]), value: -2.45272267413438 selected action: Play([10]) [Sarsa Com 4] action: Discard([5]), value: -4.14422716266526 action: Play([11]), value: -3.22698837840968 selected action: Play([11]) ---------- trick 3 is done. Sarsa Com 4 takes trick. ---------- [Sarsa Com 4] action: Play([5]), value: -4.08984398834974 action: Play([7]), value: -3.10636694466421 action: Play([8]), value: -2.96715561532822 selected action: Play([8]) [Sarsa Com 1] action: Discard([3]), value: -0.822210280736597 selected action: Discard([3]) [Sarsa Com 2] action: Play([9]), value: -4.37820875683349 selected action: Play([9]) [Sarsa Com 3] action: Discard([5]), value: -2.48183700561925 selected action: Discard([5]) ---------- trick 4 is done. Sarsa Com 2 takes trick. ---------- [Sarsa Com 2] action: Play([9]), value: -4.10295599692963 selected action: Play([9]) [Sarsa Com 3] action: Discard([7]), value: -2.15609837454416 selected action: Discard([7]) [Sarsa Com 4] action: Discard([5]), value: -2.74988698773465 selected action: Discard([5]) [Sarsa Com 1] action: Discard([4]), value: -1.69595161390184 selected action: Discard([4]) ---------- trick 5 is done. Sarsa Com 2 takes trick. ---------- deal is done. last cards: Sarsa Com 1: 5 Sarsa Com 2: 9 Sarsa Com 3: 7 Sarsa Com 4: 7 ["Sarsa Com 2"] lose in deal. minus points: Sarsa Com 1: [] Sarsa Com 2: [6, 8, 9] Sarsa Com 3: [] Sarsa Com 4: [6, 8] -------------------- game ended. total minus points: Sarsa Com 1: 0 Sarsa Com 2: 23 Sarsa Com 3: 0 Sarsa Com 4: 14 ["Sarsa Com 2"] lose.
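I think you can see that the players are making fairly subtle judgment calls.
It's nice that the play doesn't turn monotonous.
Comparing the first and last games also suggests that learning is progressing.
Analyzing the learning results
I plotted the per-feature parameters of Sarsa Com 1 through Sarsa Com 4 after learning as a graph, shown below.
There is some variation, but you can see that the values are roughly the same across the four players.
(If they were completely scattered, it would mean the features lack expressive power and learning has diverged.)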
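The same agreement check can also be done numerically. The sketch below is one possible way to do it, assuming the four weight arrays have been loaded into a [[Double]]; the spreadPerFeature helper and the toy numbers are hypothetical and are not the learned weights.

```swift
// For each feature index, the gap between the largest and smallest
// weight across all agents. Small gaps mean the agents agree.
func spreadPerFeature(_ weights: [[Double]]) -> [Double] {
    guard let featureCount = weights.first?.count else { return [] }
    precondition(weights.allSatisfy { $0.count == featureCount },
                 "all weight vectors must have the same length")
    return (0..<featureCount).map { (index: Int) -> Double in
        let column = weights.map { $0[index] }
        return column.max()! - column.min()!
    }
}

// Toy data: 4 agents x 3 features (made-up values for illustration).
let toyWeights: [[Double]] = [
    [-1.0, 0.5, 0.0],
    [-1.25, 0.5, 0.25],
    [-0.75, 0.5, 0.0],
    [-1.0, 0.75, 0.0],
]
let spread = spreadPerFeature(toyWeights)
print(spread)
```

With the real data, the input could be built from the weight property of each SarsaCom, and a large spread on one feature would point to a parameter the agents disagree on.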
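The graph also shows that a hand with no 3s or 4s is quite likely to lose, while a hand from which the 7s, 8s, and 9s are gone is quite unlikely to lose.
Also, somewhat surprisingly, the last card played doesn't seem to matter much.
What seems to matter more is which cards are still left.
So card counting matters...
In any case, this completes the AI built with reinforcement learning.
I had planned to adjust the number and kinds of features if learning didn't go well, but this time the worry turned out to be unfounded.
That's all for today!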