{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Analyzing Polyn et al. (2009) CMR Data\n",
    "\n",
    "This tutorial analyzes the Polyn et al. (2009) free recall dataset, which was used to develop the Context Maintenance and Retrieval (CMR) model.\n",
    "\n",
    "The dataset contains behavioral data from 45 subjects who studied lists of 24 words using either SIZE or ANIMACY encoding tasks.\n",
    "\n",
    "**The experiment included three list types:**\n",
    "- Control (Size): All items studied using the SIZE task\n",
    "- Control (Animacy): All items studied using the ANIMACY task\n",
    "- Shift: The encoding task switched between SIZE and ANIMACY within the list\n",
    "\n",
    "We'll analyze recall performance using:\n",
    "1. Probability of First Recall (PFR) - probability that the first item recalled came from each serial position\n",
    "2. Lag-CRP - conditional response probability as a function of the lag between successively recalled items\n",
    "3. Serial Position Curve (SPC) - recall probability by encoding position\n",
    "4. Memory Fingerprint - clustering by task and temporal features\n",
    "\n",
    "**Reference:**\n",
    "Polyn, S.M., Norman, K.A., & Kahana, M.J. (2009). A Context Maintenance and Retrieval Model of Organizational Processes in Free Recall. *Psychological Review*, 116(1), 129-156. https://doi.org/10.1037/a0014420"
   ]
  },
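  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the lag-CRP concrete before using quail, here is a minimal pure-Python sketch of how such a curve can be computed. It is an illustration, not quail's implementation: the function name `toy_lag_crp`, the input format, the toy recall sequences, and the assumption that recalls contain no repeats or intrusions are all made up for this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "def toy_lag_crp(recalls, list_length):\n",
    "    # recalls: one list per trial of 1-based serial positions in output order\n",
    "    # (hypothetical input format; assumes no repeats or intrusions)\n",
    "    actual = defaultdict(int)    # transitions actually made at each lag\n",
    "    possible = defaultdict(int)  # transitions that were available at each lag\n",
    "    for trial in recalls:\n",
    "        recalled = set()\n",
    "        for prev, nxt in zip(trial, trial[1:]):\n",
    "            recalled.add(prev)\n",
    "            # Every not-yet-recalled position was an available transition\n",
    "            for candidate in range(1, list_length + 1):\n",
    "                if candidate not in recalled:\n",
    "                    possible[candidate - prev] += 1\n",
    "            actual[nxt - prev] += 1\n",
    "    # Conditional response probability = actual / possible at each lag\n",
    "    return {lag: actual[lag] / possible[lag] for lag in sorted(possible)}\n",
    "\n",
    "# Two made-up recall sequences from 6-item lists\n",
    "toy_recalls = [[6, 5, 1, 2], [6, 1, 2, 3]]\n",
    "for lag, p in toy_lag_crp(toy_recalls, list_length=6).items():\n",
    "    print(f'lag {lag:+d}: {p:.2f}')"
   ]
  },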
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import quail\n",
    "import matplotlib.pyplot as plt\n",
    "import warnings\n",
    "from collections import Counter\n",
    "\n",
    "# Suppress RuntimeWarnings about empty slices\n",
    "warnings.filterwarnings('ignore', category=RuntimeWarning)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load the dataset\n",
    "\n",
    "The CMR dataset is included with quail and can be loaded using `load_example_data()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the CMR dataset\n",
    "egg = quail.load_example_data('cmr')\n",
    "\n",
    "print(f\"Loaded CMR data: {egg.n_subjects} subjects, {egg.n_lists} lists, \"\n",
    "      f\"{egg.list_length} items per list\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set up list groupings\n",
    "\n",
    "Unlike the Murdock (1962) dataset, where each subject was in a single condition, this dataset mixes list conditions within subjects. We therefore build a nested `listgroup`: one entry per subject, each holding one condition label per list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build per-subject listgroups: map each list to its condition\n",
    "# In this dataset, lists are mixed within subjects (each list has its own condition)\n",
    "listgroup = []\n",
    "for subj_idx in range(egg.n_subjects):\n",
    "    subj_listgroup = []\n",
    "    for list_idx in range(egg.n_lists):\n",
    "        try:\n",
    "            sample = egg.pres.loc[(subj_idx, list_idx)][0]\n",
    "            if sample and 'condition' in sample:\n",
    "                subj_listgroup.append(sample['condition'])\n",
    "            else:\n",
    "                subj_listgroup.append(None)\n",
    "        except (KeyError, IndexError, TypeError):\n",
    "            subj_listgroup.append(None)\n",
    "    listgroup.append(subj_listgroup)\n",
    "\n",
    "# Count lists per condition (excluding None)\n",
    "all_conditions = [c for subj in listgroup for c in subj if c is not None]\n",
    "print(f\"Lists per condition: {dict(Counter(all_conditions))}\")"
   ]
  },
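  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conceptually, a `listgroup` tells the analysis which lists belong together so that results can be summarized per label. The sketch below illustrates that idea on made-up numbers for a single subject. It is not quail's internal code: the labels `'control'` and `'shift'`, the accuracy values, and the choice to skip `None` entries are assumptions for illustration only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "# Hypothetical per-list recall accuracy and condition labels for one subject\n",
    "toy_accuracy = [0.50, 0.75, 0.25, 0.50]\n",
    "toy_labels = ['shift', 'control', 'shift', None]\n",
    "\n",
    "# Pool the lists that share a label, skipping unlabeled lists (an assumption)\n",
    "by_condition = defaultdict(list)\n",
    "for label, acc in zip(toy_labels, toy_accuracy):\n",
    "    if label is not None:\n",
    "        by_condition[label].append(acc)\n",
    "\n",
    "# Average within each label to get one value per condition\n",
    "condition_means = {label: sum(vals) / len(vals) for label, vals in by_condition.items()}\n",
    "print(condition_means)"
   ]
  },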
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Analyze and plot\n",
    "\n",
    "We'll create four plots:\n",
    "- **PFR**: Probability of first recall by serial position\n",
    "- **Lag-CRP**: Conditional response probability by temporal lag\n",
    "- **SPC**: Serial position curve showing primacy and recency\n",
    "- **Fingerprint**: Memory clustering by encoding task and temporal features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a figure with subplots for each analysis\n",
    "fig, axes = plt.subplots(2, 2, figsize=(12, 10))\n",
    "\n",
    "# 1. Probability of First Recall\n",
    "pfr = egg.analyze('pfr', listgroup=listgroup)\n",
    "pfr.plot(ax=axes[0, 0], plot_type='list', legend=True)\n",
    "\n",
    "# 2. Lag-CRP\n",
    "lagcrp = egg.analyze('lagcrp', listgroup=listgroup)\n",
    "lagcrp.plot(ax=axes[0, 1], plot_type='list', legend=False)\n",
    "\n",
    "# 3. Serial Position Curve\n",
    "spc = egg.analyze('spc', listgroup=listgroup)\n",
    "spc.plot(ax=axes[1, 0], plot_type='list', legend=False)\n",
    "\n",
    "# Configure plots\n",
    "axes[0, 0].set_title('Probability of First Recall')\n",
    "axes[0, 0].set_xlabel('Serial Position')\n",
    "axes[0, 0].set_ylabel('Probability')\n",
    "axes[0, 0].set_ylim([0, 0.3])\n",
    "\n",
    "axes[0, 1].set_title('Lag-CRP')\n",
    "axes[0, 1].set_xlabel('Lag')\n",
    "axes[0, 1].set_ylabel('Conditional Response Probability')\n",
    "axes[0, 1].set_xlim([-10, 10])\n",
    "axes[0, 1].axvline(x=0, color='gray', linestyle='--', alpha=0.5)\n",
    "\n",
    "axes[1, 0].set_title('Serial Position Curve')\n",
    "axes[1, 0].set_xlabel('Serial Position')\n",
    "axes[1, 0].set_ylabel('Recall Probability')\n",
    "axes[1, 0].set_ylim([0, 1])\n",
    "\n",
    "# 4. Memory Fingerprint - averaged across all lists\n",
    "avg_listgroup = ['average'] * egg.n_lists\n",
    "fingerprint = egg.analyze('fingerprint', features=['task', 'temporal'],\n",
    "                          listgroup=avg_listgroup)\n",
    "fingerprint.plot(ax=axes[1, 1], title='Memory Fingerprint', ylim=[0, 1])\n",
    "axes[1, 1].set_xlabel('Feature')\n",
    "axes[1, 1].set_ylabel('Clustering Score')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.suptitle('Polyn et al. (2009) CMR Dataset Analysis', y=1.02, fontsize=14)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Key findings\n",
    "\n",
    "The plots demonstrate several important memory phenomena:\n",
    "\n",
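    "1. **Task-based organization**: The fingerprint plot shows clustering by encoding task (SIZE vs ANIMACY), indicating organization by how items were studied (source clustering), which is distinct from semantic clustering\n",
    "2. **Temporal contiguity**: The Lag-CRP peaks at small absolute lags with a forward asymmetry, showing a preference for recalling items that were studied nearby in time, especially the item that followed the one just recalled\n",
    "3. **Serial position effects**: Clear primacy and recency in the SPC\n",
    "4. **Condition differences**: Plotting each condition separately makes it possible to compare shift lists, where the encoding task changes within a list, against the two control list types"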
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.8.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}