Chameleon

Investigating and Overcoming Task-Switching Costs Through a Reinforcement Learning Lens

I am working on a paper and am not releasing the technical details just yet; I aim to publish soon! If you want to discuss the math and code with me, please contact me!

Inspiration

  1. Functional fixedness: the inability to think creatively in a new context, rigidly extrapolating skills learned elsewhere even when they no longer apply;
  2. Task-switching cost: the idea that it is mentally expensive to switch between unrelated tasks;
  3. Relevance encoding: the idea that our brain uses its plasticity to morph its neural architecture only when exposed to tasks that are relevant to our consciousness, or to some sense of reward/utility.

Humans can overcome functional fixedness in their quotidian life to some extent by recognizing changes in their environment. When humans identify such a context shift, they do not completely discard their experiences from previously encountered contexts; instead, they combine their previous context-specific understandings in a way that helps them navigate the current context. For example, when a toddler first walks on the uneven gravel of their neighbourhood playground, they do not simply forget how they first learned to walk at 10 months on the carpeted floor of their home; rather, their strategy, or policy, for walking in the playground may be a composition of their existing context-specific strategies, formed once they recognize that they are indeed in a new context.
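One way to picture this "composition of strategies" computationally (a minimal sketch of my own, not the mechanism from the forthcoming paper) is to keep a value estimate per previously seen context and blend them according to a belief over which context the agent is currently in. The names `ensemble_action`, `q_tables`, and `belief` below are hypothetical:

```python
import numpy as np

def ensemble_action(q_tables, belief, state):
    """Greedy action from a belief-weighted blend of per-context Q-tables.

    q_tables : list of arrays, one per seen context, each (n_states, n_actions)
    belief   : probability vector over contexts, shape (n_contexts,)
    state    : current state index
    """
    # Weight each context's value estimate by how likely that context
    # currently seems, then act greedily on the blended estimate.
    q_mix = sum(b * q[state] for b, q in zip(belief, q_tables))
    return int(np.argmax(q_mix))
```

A separate context-shift detector would update `belief` as new observations arrive, so the agent leans on old strategies exactly to the degree that they still seem relevant.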

Objective

Our vision is to emulate this autonomous recognition of a context shift, and the subsequent adaptation, within an artificial decision-making system. The system within which we hope to encode this understanding of a context shift is a reinforcement learning (RL) agent – namely, a Chameleon. We specifically investigate a finite-horizon pathfinding problem in which a baseline agent and a Chameleon explore an environment with pre-defined terminal states. To robustly evaluate the Chameleon against the baseline agent, we need methods to quantify the behaviour of an agent in an arbitrary environment. We therefore employ a core set of heuristics, not only to compare the Chameleon to the baseline, but also to offer transparency into the decision-making process of any agent subject to changing contexts.
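As a concrete (and purely illustrative) picture of such a setup, the sketch below implements a tiny finite-horizon gridworld whose reward landscape is re-drawn at each context shift. It is an assumption about the flavour of environment described above, not the paper's actual benchmark:

```python
import numpy as np

class ContextGrid:
    """Finite-horizon gridworld whose reward "hill" moves at each context shift."""

    def __init__(self, size=8, horizon=30, seed=0):
        self.size, self.horizon = size, horizon
        self.rng = np.random.default_rng(seed)
        self.new_context()

    def new_context(self):
        # Re-draw the reward peak: this is the context shift the agent must detect.
        self.peak = self.rng.integers(0, self.size, size=2)
        self.pos, self.t = np.array([0, 0]), 0

    def step(self, action):  # 0=up, 1=down, 2=left, 3=right
        moves = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}
        self.pos = np.clip(self.pos + moves[action], 0, self.size - 1)
        self.t += 1
        # Reward falls off with squared distance from the current context's peak.
        reward = float(np.exp(-np.sum((self.pos - self.peak) ** 2) / 4.0))
        done = self.t >= self.horizon or np.array_equal(self.pos, self.peak)
        return tuple(self.pos), reward, done
```

Reaching the peak (or exhausting the horizon) ends the episode, playing the role of the pre-defined terminal states mentioned above.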

Interesting Animations

This is an exhibition of the task-switching costs that the Deep Q-Network (DQN) agent encounters on the third context. On the first and second contexts, the DQN agent doesn't struggle much, since the two contexts are roughly similar. However, the peaks and valleys of the third context are in very different locations – so we see the agent spend a lot of time in areas that had high rewards in contexts 1 and 2 while it explores the final context.
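How large that cost is can be quantified in several ways; as one hedged example (the heuristics used in the paper are not yet public), the helper below measures how far episode returns drop after a context shift and how many episodes they take to recover. `switching_cost` and its parameters are hypothetical names:

```python
import numpy as np

def switching_cost(returns_before, returns_after, window=50):
    """Proxy for task-switching cost: post-shift return deficit and recovery time.

    returns_before : per-episode returns from the previous context
    returns_after  : per-episode returns after the context shift
    """
    baseline = np.mean(returns_before[-window:])   # pre-shift performance level
    deficit = np.clip(baseline - np.asarray(returns_after), 0, None)
    recovered = np.flatnonzero(deficit == 0)       # episodes back at baseline
    episodes_to_recover = int(recovered[0]) if recovered.size else len(deficit)
    return float(deficit.sum()), episodes_to_recover
```

A large total deficit paired with a long recovery, as on the third context here, is exactly the signature of a task-switching cost.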

This is the animation of a reference environment mentioned in the paper in progress (which I will probably share soon if folks are interested). The left-hand panel shows the hill, while the right-hand panel shows the corresponding contour map.

This is a longer simulation, just for fun! Interestingly, the DQN agent really struggles in the last context.

Some Results