
AlphaGo Zero Taught Itself Go, and You Can Build One Too (Part 3)


Three-in-one training
AlphaGo Zero's training is made up of three processes, and they run asynchronously.
The first is self-play, which generates the training data.
def self_play():
    ## load_player, create_matches, db and game_id come from the
    ## surrounding module in the full implementation
    while True:
        new_player, checkpoint = load_player()
        if new_player:
            player = new_player

        ## Create the self-play match queue of processes
        results = create_matches(player, cores=PARALLEL_SELF_PLAY,
                                 match_number=SELF_PLAY_MATCH)
        for _ in range(SELF_PLAY_MATCH):
            result = results.get()
            db.insert({
                "game": result,
                "id": game_id
            })
            game_id += 1
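create_matches itself is not shown in this excerpt. As a rough sketch of the idea, assuming a hypothetical helper play_one_game that plays one full game with the given network, the match queue could look something like this:

import multiprocessing as mp

## Hypothetical sketch, not the author's implementation: each worker
## process plays its share of self-play games and pushes the finished
## game records onto a shared queue that self_play() drains
def worker(player, n_games, results):
    for _ in range(n_games):
        results.put(play_one_game(player))  ## play_one_game is assumed

def create_matches(player, cores, match_number):
    results = mp.Queue()
    for _ in range(cores):  ## ignoring the division remainder for brevity
        mp.Process(target=worker, daemon=True,
                   args=(player, match_number // cores, results)).start()
    return results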
The second is training, which uses the freshly generated data to improve the current neural network.
def train():
    criterion = AlphaLoss()
    dataset = SelfPlayDataset()
    player, checkpoint = load_player(current_time, loaded_version)
    optimizer = create_optimizer(player, lr,
                                 param=checkpoint['optimizer'])
    best_player = deepcopy(player)
    dataloader = DataLoader(dataset, collate_fn=collate_fn,
                            batch_size=BATCH_SIZE, shuffle=True)

    while True:
        for batch_idx, (state, move, winner) in enumerate(dataloader):

            ## Evaluate a copy of the current network
            ## (total_ite is the global training-step counter in the full code)
            if total_ite % TRAIN_STEPS == 0:
                pending_player = deepcopy(player)
                result = evaluate(pending_player, best_player)

                if result:
                    best_player = pending_player

            example = {
                'state': state,
                'winner': winner,
                'move': move
            }
            optimizer.zero_grad()
            winner, probas = pending_player.predict(example['state'])

            loss = criterion(winner, example['winner'],
                             probas, example['move'])
            loss.backward()
            optimizer.step()

            ## Fetch new games
            if total_ite % REFRESH_TICK == 0:
                last_id = fetch_new_games(collection, dataset, last_id)
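SelfPlayDataset and collate_fn also live elsewhere in the repo. A minimal sketch of the dataset side, assuming (as the dataloader unpacking above suggests) that each example is a (state, move, winner) tuple:

from torch.utils.data import Dataset

class SelfPlayDataset(Dataset):
    ## Hypothetical sketch: stores (state, move, winner) examples that
    ## fetch_new_games() appends as self-play produces fresh games
    def __init__(self):
        self.examples = []

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        return self.examples[idx]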
The loss function used for training looks like this:
class AlphaLoss(torch.nn.Module):
    def __init__(self):
        super(AlphaLoss, self).__init__()

    def forward(self, pred_winner, winner, pred_probas, probas):
        ## Squared error on the predicted game outcome
        value_error = (winner - pred_winner) ** 2
        ## Cross-entropy on the move probabilities; 1e-6 avoids log(0)
        policy_error = torch.sum((-probas *
                                  (1e-6 + pred_probas).log()), 1)
        total_error = (value_error.view(-1) + policy_error).mean()
        return total_error
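For reference, this is the loss from the AlphaGo Zero paper,

    l = (z - v)^2 - \pi^\top \log p + c \lVert \theta \rVert^2

where z is the actual game outcome, v the predicted winner, \pi the move probabilities produced by the tree search, and p the network's predicted probabilities. The code above drops the weight-decay term c||theta||^2 (that job is typically delegated to the optimizer) and adds 1e-6 inside the log for numerical stability.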
The third is evaluation, which checks whether the freshly trained agent has become better than the agent currently generating data (whichever is best goes back to step one and keeps generating data).
def evaluate(player, new_player):
    ## The trained (pending) player plays black, the current best plays white
    results = play(player, opponent=new_player)
    black_wins = 0
    white_wins = 0

    for result in results:
        if result[0] == 1:
            white_wins += 1
        elif result[0] == 0:
            black_wins += 1

    ## Check if the trained player (black) is better than
    ## the current best player depending on the threshold
    if black_wins >= EVAL_THRESH * len(results):
        return True
    return False
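For calibration: in the AlphaGo Zero paper, a trained network was promoted to "best" only after winning at least 55% of 400 evaluation games; EVAL_THRESH plays that role here.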
The third part is essential: the best network must keep being selected, so that it keeps generating high-quality data; only then does the AI's playing strength improve.
The three stages repeat in a cycle, and that is how a strong player is forged.
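To make the asynchrony concrete, a hypothetical top-level entry point (not part of the excerpt) could launch self-play and training as separate processes; evaluation already runs inside train():

import multiprocessing as mp

if __name__ == '__main__':
    ## The two loops run concurrently and communicate only through the
    ## game database that self_play() writes and train() reads
    workers = [mp.Process(target=self_play), mp.Process(target=train)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()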
If you have ambitions in AI Go, you can also try out this PyTorch implementation.
This article is excerpted from 量子位 (QbitAI); the original author is Dylan Djian.
Code implementation: [web link]
Original tutorial: [web link]
AlphaGo Zero paper: [web link]
