The Egg data object
===================

This tutorial will go over the basics of the ``Egg`` data object, the
essential ``quail`` data structure that contains all the data you need to run
analyses and plot the results. An egg is made up of two primary pieces of
data:

1. ``pres`` data - stimuli/features that were presented to a subject
2. ``rec`` data - stimuli/features that were recalled by the subject

You cannot create an ``egg`` without both of these components. Additionally,
there are a few optional fields:

1. ``dist_funcs`` dictionary - this field allows you to control the distance
   functions for each of the stimulus features. For more on this, see the
   fingerprint tutorial.
2. ``meta`` dictionary - this is an optional field that allows you to store
   custom meta data about the dataset, such as the date collected, experiment
   version, etc.

There are also a few other fields and functions to make organizing and
modifying ``eggs`` easier (discussed at the bottom). Now, let’s dive in and
create an ``egg`` from scratch.

Load in the library
-------------------

.. code:: ipython3

    import quail
    %matplotlib inline

.. parsed-literal::

    /usr/local/lib/python3.6/site-packages/pydub/utils.py:165: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
      warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)

The ``pres`` data structure
---------------------------

The first piece of an ``egg`` is the ``pres`` data, or in other words the
stimuli that were presented to the subject. For a single subject’s data, the
input is a list of lists, where each inner list contains the words presented
to the subject during a particular study block. Let’s create a fake dataset
of one subject who saw two encoding lists:

.. code:: ipython3

    presented_words = [['cat', 'bat', 'hat', 'goat'], ['zoo', 'animal', 'zebra', 'horse']]

The ``rec`` data structure
--------------------------

The second fundamental component of an egg is the ``rec`` data, or the
words/stimuli that were recalled by the subject. Now, let’s create the recall
lists:

.. code:: ipython3

    recalled_words = [['bat', 'cat', 'goat', 'hat'], ['animal', 'horse', 'zoo']]

We now have the two components necessary to build an ``egg``, so let’s do
that and then take a look at the result.

.. code:: ipython3

    egg = quail.Egg(pres=presented_words, rec=recalled_words)

That’s it! We’ve created our first ``egg``. We can use the ``info`` method to
get a quick snapshot of the ``egg``:

.. code:: ipython3

    egg.info()

.. parsed-literal::

    Number of subjects: 1
    Number of lists per subject: 2
    Number of words per list: 4
    Date created: Mon Aug 6 14:43:19 2018
    Meta data: {}

Now, let’s take a closer look at how the ``egg`` is structured. First, we
will check out the ``pres`` field:

.. code:: ipython3

    egg.get_pres_items()

.. parsed-literal::

                   0       1       2      3
    Subject  List
    0        0     cat     bat     hat    goat
    0        1     zoo     animal  zebra  horse

As you can see above, the ``pres`` field was turned into a multi-index Pandas
DataFrame organized by subject and by list. This is how the ``pres`` data is
stored within an egg, which will make more sense when we consider larger
datasets with more subjects. Next, let’s take a look at the ``rec`` data:

.. code:: ipython3

    egg.get_rec_items()

.. parsed-literal::

                   0       1       2      3
    Subject  List
    0        0     bat     cat     goat   hat
    0        1     animal  horse   zoo    NaN

The ``rec`` data is also stored as a DataFrame. Notice that if the number of
recalled words is smaller than the number of presented words, the remaining
columns are filled with ``NaN`` values. Now, let’s create an ``egg`` with two
subjects’ data and take a look at the result.

Multisubject ``eggs``
---------------------

.. code:: ipython3

    # presented words
    sub1_presented = [['cat', 'bat', 'hat', 'goat'], ['zoo', 'animal', 'zebra', 'horse']]
    sub2_presented = [['cat', 'bat', 'hat', 'goat'], ['zoo', 'animal', 'zebra', 'horse']]

    # recalled words
    sub1_recalled = [['bat', 'cat', 'goat', 'hat'], ['animal', 'horse', 'zoo']]
    sub2_recalled = [['cat', 'goat', 'bat', 'hat'], ['horse', 'zebra', 'zoo', 'animal']]

    # combine subject data
    presented_words = [sub1_presented, sub2_presented]
    recalled_words = [sub1_recalled, sub2_recalled]

    # create Egg
    multisubject_egg = quail.Egg(pres=presented_words, rec=recalled_words)

    multisubject_egg.info()

.. parsed-literal::

    Number of subjects: 2
    Number of lists per subject: 2
    Number of words per list: 4
    Date created: Mon Aug 6 14:43:19 2018
    Meta data: {}

As you can see above, to create an ``egg`` with more than one subject’s data,
all you do is create a list of subjects. Let’s see how the ``pres`` data is
organized in the egg with more than one subject:

.. code:: ipython3

    multisubject_egg.get_pres_items()

.. parsed-literal::

                   0       1       2      3
    Subject  List
    0        0     cat     bat     hat    goat
    0        1     zoo     animal  zebra  horse
    1        0     cat     bat     hat    goat
    1        1     zoo     animal  zebra  horse

Looks identical to the single subject data, but now we have two unique
subject identifiers in the ``DataFrame``. The ``rec`` data is set up in the
same way:

.. code:: ipython3

    multisubject_egg.get_rec_items()

.. parsed-literal::

                   0       1       2      3
    Subject  List
    0        0     bat     cat     goat   hat
    0        1     animal  horse   zoo    NaN
    1        0     cat     goat    bat    hat
    1        1     horse   zebra   zoo    animal

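
The layout shown in the tables above can be mimicked in plain Python. The
following is a minimal sketch, not quail's internal implementation: the helper
name ``to_rows`` is hypothetical, and it assumes the multisubject nesting
(subjects, then lists, then items). Each row is keyed by a
``(subject, list)`` pair, and recall lists shorter than the requested width
are padded with ``NaN``.

.. code:: ipython3

    import math

    def to_rows(data, width):
        """Flatten nested [subject][list][item] data into {(subject, list): row}.

        Hypothetical helper for illustration only; rows shorter than
        `width` are padded with NaN.
        """
        rows = {}
        for subj_idx, subject in enumerate(data):
            for list_idx, items in enumerate(subject):
                padding = [math.nan] * (width - len(items))
                rows[(subj_idx, list_idx)] = list(items) + padding
        return rows

    sub1_recalled = [['bat', 'cat', 'goat', 'hat'], ['animal', 'horse', 'zoo']]
    sub2_recalled = [['cat', 'goat', 'bat', 'hat'], ['horse', 'zebra', 'zoo', 'animal']]

    rows = to_rows([sub1_recalled, sub2_recalled], width=4)
    for key, row in rows.items():
        print(key, row)
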
As you add more subjects, they are simply appended to the bottom of the
DataFrame with a unique subject identifier.

Adding features to the egg
--------------------------

Stimuli can also be passed as a dictionary containing the stimulus and
features of the stimulus. You can include any stimulus feature you want in
this dictionary, such as the position of the word on the screen, the color,
or perhaps the font of the word:

.. code:: ipython3

    cat_features = {
        'item': 'cat',
        'category': 'animal',
        'word_length': 3,
        'starting_letter': 'c',
    }

Let’s try creating an egg with additional stimulus features:

.. code:: ipython3

    # presentation features
    presented_words = [
        [
            {'item': 'cat', 'category': 'animal', 'word_length': 3, 'starting_letter': 'c'},
            {'item': 'bat', 'category': 'object', 'word_length': 3, 'starting_letter': 'b'},
            {'item': 'hat', 'category': 'object', 'word_length': 3, 'starting_letter': 'h'},
            {'item': 'goat', 'category': 'animal', 'word_length': 4, 'starting_letter': 'g'},
        ],
        [
            {'item': 'zoo', 'category': 'place', 'word_length': 3, 'starting_letter': 'z'},
            {'item': 'donkey', 'category': 'animal', 'word_length': 6, 'starting_letter': 'd'},
            {'item': 'zebra', 'category': 'animal', 'word_length': 5, 'starting_letter': 'z'},
            {'item': 'horse', 'category': 'animal', 'word_length': 5, 'starting_letter': 'h'},
        ],
    ]

    # recall features
    recalled_words = [
        [
            {'item': 'bat', 'category': 'object', 'word_length': 3, 'starting_letter': 'b'},
            {'item': 'cat', 'category': 'animal', 'word_length': 3, 'starting_letter': 'c'},
            {'item': 'goat', 'category': 'animal', 'word_length': 4, 'starting_letter': 'g'},
            {'item': 'hat', 'category': 'object', 'word_length': 3, 'starting_letter': 'h'},
        ],
        [
            {'item': 'donkey', 'category': 'animal', 'word_length': 6, 'starting_letter': 'd'},
            {'item': 'horse', 'category': 'animal', 'word_length': 5, 'starting_letter': 'h'},
            {'item': 'zoo', 'category': 'place', 'word_length': 3, 'starting_letter': 'z'},
        ],
    ]

    # create egg object
    egg = quail.Egg(pres=presented_words, rec=recalled_words)

Like before, you can use the ``get_pres_items`` method to retrieve the
presented items:

.. code:: ipython3

    egg.get_pres_items()

.. parsed-literal::

                   0       1       2      3
    Subject  List
    0        0     cat     bat     hat    goat
    0        1     zoo     donkey  zebra  horse

The stimulus features can be accessed by calling the ``get_pres_features``
method:

.. code:: ipython3

    egg.get_pres_features()

.. parsed-literal::

                   0                                                  1                                                  2                                                  3
    Subject  List
    0        0     {'category': 'animal', 'word_length': 3, 'star...  {'category': 'object', 'word_length': 3, 'star...  {'category': 'object', 'word_length': 3, 'star...  {'category': 'animal', 'word_length': 4, 'star...
    0        1     {'category': 'place', 'word_length': 3, 'start...  {'category': 'animal', 'word_length': 6, 'star...  {'category': 'animal', 'word_length': 5, 'star...  {'category': 'animal', 'word_length': 5, 'star...

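
Conceptually, each stimulus dictionary is divided into the item itself and
its remaining features, which is consistent with the output above, where the
visible keys are features rather than ``item``. A minimal sketch of that split
in plain Python follows; ``split_stimulus`` is a hypothetical helper written
for illustration and is not part of quail.

.. code:: ipython3

    def split_stimulus(stimulus):
        """Return (item, features) for one stimulus dictionary."""
        # Everything except the 'item' key is treated as a feature.
        features = {key: value for key, value in stimulus.items() if key != 'item'}
        return stimulus['item'], features

    cat_features = {
        'item': 'cat',
        'category': 'animal',
        'word_length': 3,
        'starting_letter': 'c',
    }

    item, features = split_stimulus(cat_features)
    print(item)
    print(features)
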
Defining custom distance functions for the stimulus feature dimensions
----------------------------------------------------------------------

As described in the fingerprint tutorial, the ``features`` data structure is
used to estimate how subjects cluster their recall responses with respect to
the features of the encoded stimuli. Briefly, these estimates are derived by
computing the similarity of neighboring recall words along each feature
dimension. For example, if you recall “dog”, and then the next word you
recall is “cat”, your clustering by category score would increase because the
two recalled words are in the same category. Similarly, if after you recall
“cat” you recall the word “can”, your clustering by starting letter score
would increase, since both words share the first letter “c”. This logic can
be extended to any number of feature dimensions.

Similarity between stimuli can be computed in a number of ways. By default,
the distance function for all textual features (like category, starting
letter) is binary. In other words, if the words are in the same category
(cat, dog), their similarity would be 1, whereas if they are in different
categories (cat, can) their similarity would be 0. For numerical features
(such as word length), by default similarity between words is computed using
Euclidean distance. However, the point of this digression is that you can
define your own distance functions by passing a ``dist_funcs`` dictionary to
the ``Egg`` class. This could be for all feature dimensions, or only a
subset. Let’s see an example:

.. code:: ipython3

    dist_funcs = {
        'word_length': lambda x, y: (x - y)**2
    }

    egg = quail.Egg(pres=presented_words, rec=recalled_words, dist_funcs=dist_funcs)

In the example code above, similarity between words for the ``word_length``
feature dimension will now be computed using this custom distance function,
while all other feature dimensions will use the default.

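
To make the idea concrete, here is a minimal sketch in plain Python of how
per-feature functions could be applied to neighboring recalls. It is not
quail's implementation: the ``match`` function and the ``neighbor_scores``
helper are assumptions made for illustration. ``match`` follows the
description above (1 when two text features are identical, 0 otherwise), and
``word_length`` uses the squared-difference function from the example.

.. code:: ipython3

    def match(x, y):
        """Binary comparison for text features: 1 if identical, else 0."""
        return int(x == y)

    # Hypothetical per-feature functions; only 'word_length' is custom.
    feature_funcs = {
        'category': match,
        'starting_letter': match,
        'word_length': lambda x, y: (x - y) ** 2,
    }

    recalled = [
        {'item': 'bat', 'category': 'object', 'word_length': 3, 'starting_letter': 'b'},
        {'item': 'cat', 'category': 'animal', 'word_length': 3, 'starting_letter': 'c'},
        {'item': 'goat', 'category': 'animal', 'word_length': 4, 'starting_letter': 'g'},
    ]

    def neighbor_scores(recalls, funcs):
        """Apply each feature's function to every pair of neighboring recalls."""
        scores = {}
        for feature, func in funcs.items():
            scores[feature] = [
                func(first[feature], second[feature])
                for first, second in zip(recalls, recalls[1:])
            ]
        return scores

    scores = neighbor_scores(recalled, feature_funcs)
    print(scores)

Each list holds one value per transition (bat to cat, then cat to goat), so a
feature with a custom function simply produces values on a different scale
from the binary text features.
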

Adding meta data to an ``egg``
------------------------------

Lastly, we can add meta data to the ``egg``. We added this field to help
researchers keep their eggs organized by adding custom meta data to the
``egg`` object. The data is added to the ``egg`` by passing the ``meta``
keyword argument when creating the ``egg``:

.. code:: ipython3

    meta = {
        'Researcher': 'Andy Heusser',
        'Study': 'Egg Tutorial'
    }

    egg = quail.Egg(pres=presented_words, rec=recalled_words, meta=meta)
    egg.info()

.. parsed-literal::

    Number of subjects: 1
    Number of lists per subject: 2
    Number of words per list: 4
    Date created: Mon Aug 6 14:43:19 2018
    Meta data: {'Researcher': 'Andy Heusser', 'Study': 'Egg Tutorial'}

Adding ``listgroup`` and ``subjgroup`` to an ``egg``
----------------------------------------------------

While the ``listgroup`` and ``subjgroup`` arguments can be used within the
``analyze`` function, they can also be attached directly to the ``egg``,
allowing you to save condition labels for easy organization and easy data
sharing. To do this, simply pass one or both of the arguments when creating
the ``egg``:

.. code:: ipython3

    # presented words
    sub1_presented = [['cat', 'bat', 'hat', 'goat'], ['zoo', 'animal', 'zebra', 'horse']]
    sub2_presented = [['cat', 'bat', 'hat', 'goat'], ['zoo', 'animal', 'zebra', 'horse']]

    # recalled words
    sub1_recalled = [['bat', 'cat', 'goat', 'hat'], ['animal', 'horse', 'zoo']]
    sub2_recalled = [['cat', 'goat', 'bat', 'hat'], ['horse', 'zebra', 'zoo', 'animal']]

    # combine subject data
    presented_words = [sub1_presented, sub2_presented]
    recalled_words = [sub1_recalled, sub2_recalled]

    # create Egg
    multisubject_egg = quail.Egg(pres=presented_words, rec=recalled_words,
                                 subjgroup=['condition1', 'condition2'],
                                 listgroup=['early', 'late'])

Saving an ``egg``
-----------------

Once you have created your egg, you can save it for use later, or to share
with colleagues.
To do this, simply call the ``save`` method with a filepath:

::

    multisubject_egg.save('myegg')

To load this egg later, simply call the ``load`` function with the path of
the egg:

::

    egg = quail.load('myegg')

Stacking ``eggs``
-----------------

Suppose we have two separate eggs, each with a single subject’s data. Let’s
create them and then combine them by passing a ``list`` of ``eggs`` to the
``stack_eggs`` function:

.. code:: ipython3

    # subject 1 data
    sub1_presented = [['cat', 'bat', 'hat', 'goat'], ['zoo', 'animal', 'zebra', 'horse']]
    sub1_recalled = [['bat', 'cat', 'goat', 'hat'], ['animal', 'horse', 'zoo']]

    # create subject 1 egg
    subject1_egg = quail.Egg(pres=sub1_presented, rec=sub1_recalled)

    # subject 2 data
    sub2_presented = [['cat', 'bat', 'hat', 'goat'], ['zoo', 'animal', 'zebra', 'horse']]
    sub2_recalled = [['cat', 'goat', 'bat', 'hat'], ['horse', 'zebra', 'zoo', 'animal']]

    # create subject 2 egg
    subject2_egg = quail.Egg(pres=sub2_presented, rec=sub2_recalled)

.. code:: ipython3

    stacked_eggs = quail.stack_eggs([subject1_egg, subject2_egg])
    stacked_eggs.get_pres_items()

.. parsed-literal::

                   0       1       2      3
    Subject  List
    0        0     cat     bat     hat    goat
    0        1     zoo     animal  zebra  horse
    1        0     cat     bat     hat    goat
    1        1     zoo     animal  zebra  horse

Cracking ``eggs``
-----------------

You can use the ``crack_egg`` function to slice out a subset of subjects or
lists:

.. code:: ipython3

    cracked_egg = quail.crack_egg(stacked_eggs, subjects=[1], lists=[0])
    cracked_egg.get_pres_items()

.. parsed-literal::

                   0       1       2      3
    Subject  List
    0        0     cat     bat     hat    goat

Alternatively, you can use the ``crack`` method, which does the same thing:

.. code:: ipython3

    stacked_eggs.crack(subjects=[0,1], lists=[1]).get_pres_items()

.. parsed-literal::

                   0       1       2      3
    Subject  List
    0        0     zoo     animal  zebra  horse
    1        0     zoo     animal  zebra  horse
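
The indexing behavior of stacking and cracking can be sketched on the
nested-list representation. The ``stack`` and ``crack`` helpers below are
hypothetical stand-ins written for illustration, not quail functions. They
only mirror what the tables above show, namely that the selected subjects and
lists are renumbered from 0 in the result.

.. code:: ipython3

    def stack(*subjects):
        """Combine single-subject data into one [subject][list][item] structure."""
        return list(subjects)

    def crack(data, subjects=None, lists=None):
        """Keep only the requested subject and list indices (None keeps all)."""
        if subjects is None:
            subjects = range(len(data))
        cracked = []
        for subj_idx in subjects:
            subject = data[subj_idx]
            keep = range(len(subject)) if lists is None else lists
            # Positions in the output restart at 0, whatever the original index.
            cracked.append([subject[list_idx] for list_idx in keep])
        return cracked

    sub1_presented = [['cat', 'bat', 'hat', 'goat'], ['zoo', 'animal', 'zebra', 'horse']]
    sub2_presented = [['cat', 'bat', 'hat', 'goat'], ['zoo', 'animal', 'zebra', 'horse']]

    stacked = stack(sub1_presented, sub2_presented)
    first_list_of_second_subject = crack(stacked, subjects=[1], lists=[0])
    second_list_of_everyone = crack(stacked, subjects=[0, 1], lists=[1])

    print(first_list_of_second_subject)
    print(second_list_of_everyone)
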