SBNeC 2010
Abstract: F.083


Poster (Panel)
F.083 Rats respond to the opponents’ change in strategy in a competitive game
Authors: Luiz Eduardo Tassi (IB USP - Inst. Biociências, Universidade de São Paulo); Gilberto Fernando Xavier (IB USP - Inst. Biociências, Universidade de São Paulo)

Abstract

Objective: In a world where animals must compete with one another for resources, the ability to adjust decisions flexibly in the face of unpredictability is essential for survival. In such environments, the outcomes of one’s actions change dynamically as a function of competitors’ decisions. Games provide a way of studying decision making involving multiple agents. Here, we investigated whether Wistar rats were capable of playing the Matching Pennies game (MPG), and how they responded to the use of different strategies by a computer opponent. MPG is a simple strategic game in which players’ choices, to be optimal, must be independent of previous choices and payoffs. According to reinforcement learning models, adjustments of choice policy are based on discrepancies between expected and obtained rewards, generating more predictable choice sequences and, consequently, a lower reward rate.

Methods: Eleven male Wistar rats were trained to play a three-hole nose-poke version of the game in a Skinner box. In each trial, after nose poking a central hole, the animal chose one of the lateral holes. When the animal and the computer chose the same hole, a reward (20 µL of 5% sucrose solution) was provided. To make its decision, the computer was programmed to use two different algorithms to predict the animal’s next choice. The first algorithm exploited only statistical biases present in the right (R)/left (L) choice sequence, e.g. the probability of an L choice after an RRL sequence; a reinforcement learning policy of repeating rewarded choices was therefore not penalized. The second algorithm took into account both the L/R sequence and the payoff of each choice, e.g. the probability of an R choice after an R(+)R(-)L(+) sequence (where (+) and (-) stand for rewarded and unrewarded choices, respectively).
Consequently, under the second algorithm, the probability of being rewarded was smaller when subjects used the reinforcement learning policy of repeating a choice after being rewarded and changing it otherwise, the so-called "win-stay-lose-shift" (WSLS) strategy. Therefore, if rats are sensitive to the opponent’s strategy, their probability of using WSLS against the second algorithm should be smaller.

Results: When playing against algorithm 2, subjects used WSLS significantly less than when playing against algorithm 1 (Wilcoxon signed-rank test, p = 0.002). There was no significant difference in reward rate between the two conditions (Wilcoxon signed-rank test, p > 0.05); the rate was 48%, significantly below the expected optimal rate of 50% (Wilcoxon signed-rank test, p = 0.0001).

Discussion: Rats approached the optimal strategy in the MPG and were sensitive to changes in the opponent’s strategy, responding accordingly. The change in the probability of WSLS, together with a reward rate below the expected optimum, indicates that reinforcement learning is at the core of the animals’ strategy.
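The penalty that a payoff-aware predictor imposes on win-stay-lose-shift can be illustrated with a minimal sketch. This is not the experimental code: the function names are ours, and the actual algorithm 2 estimated conditional probabilities over longer choice/payoff histories, whereas here the opponent is simplified to assume a strictly WSLS player. The point the sketch makes is that once the opponent conditions its prediction on the previous choice and its payoff, a deterministic WSLS player becomes fully predictable and is almost never rewarded:

```python
import random

def wsls(prev_choice, prev_rewarded):
    """Win-stay-lose-shift: repeat a rewarded choice, switch after a loss."""
    if prev_choice is None:                       # first trial: pick at random
        return random.choice("LR")
    if prev_rewarded:
        return prev_choice
    return "L" if prev_choice == "R" else "R"

def payoff_aware_opponent(prev_choice, prev_rewarded):
    """Simplified algorithm-2-style opponent: assumes the player uses WSLS,
    predicts the next choice from the last choice and its payoff, and
    selects the other hole so that the choices do not match."""
    if prev_choice is None:
        return random.choice("LR")
    predicted = wsls(prev_choice, prev_rewarded)
    return "L" if predicted == "R" else "R"

def simulate(n_trials=200, seed=0):
    """Reward rate of a strict WSLS player against the payoff-aware opponent.
    Reward is delivered only when both choose the same hole."""
    random.seed(seed)
    prev, rewarded, wins = None, None, 0
    for _ in range(n_trials):
        rat = wsls(prev, rewarded)
        comp = payoff_aware_opponent(prev, rewarded)
        rewarded = (rat == comp)
        wins += rewarded
        prev = rat
    return wins / n_trials

# A strict WSLS player can be rewarded at most once (on the random first
# trial); after that the opponent predicts every choice exactly.
print(simulate())
```

Against an algorithm-1-style opponent, which ignores payoffs, WSLS is not systematically exploited in this way, which is why mixing in less payoff-dependent choices only pays off under the second algorithm.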


Keywords: decision making, game theory, optimal strategy