Analyzing Polyn et al. (2009) CMR Data

This tutorial demonstrates analyzing the Polyn et al. (2009) free recall dataset, which was used in the development of the Context Maintenance and Retrieval (CMR) model.

The dataset contains behavioral data from 45 subjects who studied lists of 24 words using either SIZE or ANIMACY encoding tasks.

The experiment included three list types:

Control (Size): All items studied using the SIZE task
Control (Animacy): All items studied using the ANIMACY task
Shift: Items alternated between SIZE and ANIMACY encoding tasks

We’ll analyze recall performance using:

Probability of First Recall (PFR) - probability of recalling each position first
Lag-CRP - conditional recall probability by temporal lag
Serial Position Curve (SPC) - recall probability by encoding position
Memory Fingerprint - clustering by task and temporal features
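To make the first and third measures concrete, here is a minimal sketch that computes a PFR and an SPC by hand on made-up recall data. It does not use quail and is not how quail implements these analyses. The function name `spc_and_pfr` and the toy lists are illustrative assumptions, and recalls are assumed to be 1-based serial positions with no intrusions.

```python
def spc_and_pfr(recalls, list_length):
    """Compute serial position curve and probability of first recall.

    recalls: one list per studied list, holding 1-based serial positions
             in the order they were recalled (hypothetical toy format).
    """
    spc_counts = [0] * list_length
    pfr_counts = [0] * list_length
    for rec in recalls:
        # SPC: count each studied position at most once per list
        for pos in set(rec):
            spc_counts[pos - 1] += 1
        # PFR: only the first recalled item contributes
        if rec:
            pfr_counts[rec[0] - 1] += 1
    n_lists = len(recalls)
    spc = [c / n_lists for c in spc_counts]
    pfr = [c / n_lists for c in pfr_counts]
    return spc, pfr


# Four made-up lists of length 4
toy_recalls = [[4, 1, 2], [4, 3], [1, 2, 3, 4], [3, 4]]
spc, pfr = spc_and_pfr(toy_recalls, list_length=4)
print("SPC:", spc)  # SPC: [0.5, 0.5, 0.75, 1.0]
print("PFR:", pfr)  # PFR: [0.25, 0.0, 0.25, 0.5]
```

In this toy data the last position is recalled in every list and is the most common starting point, which is the kind of recency pattern the PFR and SPC are meant to expose.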

Reference: Polyn, S.M., Norman, K.A., & Kahana, M.J. (2009). A Context Maintenance and Retrieval Model of Organizational Processes in Free Recall. Psychological Review, Vol. 116 (1), 129-156. https://doi.org/10.1037/a0014420

[1]:

import quail
import matplotlib.pyplot as plt
import warnings
from collections import Counter

# Suppress RuntimeWarnings about empty slices
warnings.filterwarnings('ignore', category=RuntimeWarning)

Load the dataset

The CMR dataset is included with quail and can be loaded using load_example_data().

[2]:

# Load the CMR dataset
egg = quail.load_example_data('cmr')

print(f"Loaded CMR data: {egg.n_subjects} subjects, {egg.n_lists} lists, "
      f"{egg.list_length} items per list")

Loaded CMR data: 45 subjects, 34 lists, 24 items per list

Set up list groupings

Unlike Murdock (1962), where each subject was assigned to a single condition, this dataset mixes list types within subjects. We therefore build a nested listgroup: one inner list per subject, holding a condition label for each of that subject's lists.

[3]:

# Build per-subject listgroups: map each list to its condition
# In this dataset, lists are mixed within subjects (each list has its own condition)
listgroup = []
for subj_idx in range(egg.n_subjects):
    subj_listgroup = []
    for list_idx in range(egg.n_lists):
        try:
            sample = egg.pres.loc[(subj_idx, list_idx)][0]
            if sample and 'condition' in sample:
                subj_listgroup.append(sample['condition'])
            else:
                subj_listgroup.append(None)
        except (KeyError, IndexError, TypeError):
            subj_listgroup.append(None)
    listgroup.append(subj_listgroup)

# Count lists per condition (excluding None)
all_conditions = [c for subj in listgroup for c in subj if c is not None]
print(f"Lists per condition: {dict(Counter(all_conditions))}")

Lists per condition: {}
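Because the count filters out None, an empty dictionary here means that every label fell back to None, so it is worth checking the listgroup before analyzing. The following is a minimal sketch of such a check in plain Python. The helper name `summarize_listgroup` and the condition labels in the toy data are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def summarize_listgroup(listgroup):
    """Return (counts per condition, number of missing labels).

    listgroup: nested list, one inner list of labels per subject.
    """
    labels = [label for subj in listgroup for label in subj]
    n_missing = sum(label is None for label in labels)
    counts = Counter(label for label in labels if label is not None)
    return dict(counts), n_missing


# Toy listgroup: 2 subjects x 3 lists, with hypothetical labels
toy_listgroup = [
    ['size', 'animacy', 'shift'],
    ['shift', None, 'size'],
]
counts, n_missing = summarize_listgroup(toy_listgroup)
print(counts)     # {'size': 2, 'animacy': 1, 'shift': 2}
print(n_missing)  # 1

# A listgroup where every lookup failed gives empty counts
empty_counts, all_missing = summarize_listgroup([[None, None], [None, None]])
print(empty_counts, all_missing)  # {} 4
```

Reporting the number of missing labels alongside the counts makes a silent fallback visible, which the broad try/except in the cell above would otherwise hide.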

Analyze and plot

We’ll create four plots:

PFR: Probability of first recall by serial position
Lag-CRP: Conditional recall probability by temporal lag
SPC: Serial position curve showing primacy and recency
Fingerprint: Memory clustering by encoding task and temporal features

[4]:

# Create a figure with subplots for each analysis
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# 1. Probability of First Recall
pfr = egg.analyze('pfr', listgroup=listgroup)
pfr.plot(ax=axes[0, 0], plot_type='list', legend=True)

# 2. Lag-CRP
lagcrp = egg.analyze('lagcrp', listgroup=listgroup)
lagcrp.plot(ax=axes[0, 1], plot_type='list', legend=False)

# 3. Serial Position Curve
spc = egg.analyze('spc', listgroup=listgroup)
spc.plot(ax=axes[1, 0], plot_type='list', legend=False)

# Configure plots
axes[0, 0].set_title('Probability of First Recall')
axes[0, 0].set_xlabel('Serial Position')
axes[0, 0].set_ylabel('Probability')
axes[0, 0].set_ylim([0, 0.3])

axes[0, 1].set_title('Lag-CRP')
axes[0, 1].set_xlabel('Lag')
axes[0, 1].set_ylabel('Conditional Recall Probability')
axes[0, 1].set_xlim([-10, 10])
axes[0, 1].axvline(x=0, color='gray', linestyle='--', alpha=0.5)

axes[1, 0].set_title('Serial Position Curve')
axes[1, 0].set_xlabel('Serial Position')
axes[1, 0].set_ylabel('Recall Probability')
axes[1, 0].set_ylim([0, 1])

# 4. Memory Fingerprint - averaged across all lists
avg_listgroup = ['average'] * egg.n_lists
fingerprint = egg.analyze('fingerprint', features=['task', 'temporal'],
                          listgroup=avg_listgroup)
fingerprint.plot(ax=axes[1, 1], title='Memory Fingerprint', ylim=[0, 1])
axes[1, 1].set_xlabel('Feature')
axes[1, 1].set_ylabel('Clustering Score')

plt.tight_layout()
plt.suptitle('Polyn et al. (2009) CMR Dataset Analysis', y=1.02, fontsize=14)
plt.show()

[Figure: 2x2 grid of plots showing Probability of First Recall, Lag-CRP, Serial Position Curve, and Memory Fingerprint]

Key findings

The plots demonstrate several important memory phenomena:

Task-based organization: The fingerprint plot shows clustering by encoding task (SIZE vs ANIMACY), indicating that recall is organized by the task under which items were studied
Temporal contiguity: The Lag-CRP favors small lags with a forward asymmetry, showing a preference for recalling items that were studied nearby in time, particularly in the forward direction
Serial position effects: Clear primacy and recency in the SPC
Condition differences: Shift lists may show different patterns due to alternating encoding tasks
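The temporal contiguity finding rests on the lag-CRP, so a worked example of that measure may help. This is a minimal sketch on made-up data, not quail's implementation. The function name `lag_crp` is hypothetical, and recalls are assumed to be 1-based serial positions with no repeats or intrusions. For each transition, every not-yet-recalled position counts as a possible lag, and the lag actually taken counts as an actual lag.

```python
from collections import defaultdict


def lag_crp(recalls, list_length):
    """Conditional response probability as a function of lag.

    recalls: one list per studied list, holding 1-based serial positions
             in recall order (assumed free of repeats and intrusions).
    """
    actual = defaultdict(int)    # transitions made at each lag
    possible = defaultdict(int)  # transitions that were available at each lag
    for rec in recalls:
        recalled = set()
        for cur, nxt in zip(rec, rec[1:]):
            recalled.add(cur)
            # Every position not yet recalled was an available target
            for cand in range(1, list_length + 1):
                if cand not in recalled:
                    possible[cand - cur] += 1
            actual[nxt - cur] += 1
    return {lag: actual[lag] / possible[lag] for lag in sorted(possible)}


# Three made-up lists of length 4
toy_recalls = [[1, 2, 3], [3, 4, 2], [2, 3, 1]]
crp = lag_crp(toy_recalls, list_length=4)
for lag, prob in crp.items():
    print(f"lag {lag:+d}: {prob:.2f}")
```

In this toy data the lag +1 transition is taken 4 of the 5 times it is available (0.80), while lag -1 is never taken, which is the shape a forward asymmetry takes in a lag-CRP curve.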