Lab #1b - Speech Encoding



Posted Monday, August 29th; due Monday October 3rd, but note that there is an “intermediate” due date of Friday September 23rd (for data collection).

To complete this lab you will need access to group data files. These will become available around September 26th. [Files from 2015 version: sequence recall raw data, sequence recall individual profiles, implicit discrimination raw data.]

In the first part of Lab #1 you familiarized yourself with two classic paradigms in speech perception research, gathered data on your own perceptual abilities using PsychoPy, and used a spreadsheet program to analyze and graph the results. In this second part of the lab you will run two further experiments on native and non-native speech perception, using more recently developed experimental paradigms that have been claimed to show greater sensitivity to the high-level phonological encoding of sounds.

You will run the experiments using PsychoPy and analyze the results using a spreadsheet program, as in part A of the lab. In addition to the novel experimental paradigms, a novel component of part B of the lab is that you will compute some basic statistical analyses of your own data and data from the class group. Each of the experiments can be downloaded as a single compressed folder (Experiment 1, Experiment 2) that should automatically decompress into a normal folder. You should be familiar with PsychoPy by now, but if you have any questions, documentation about PsychoPy can be found here.



There are good opportunities for collaboration in this lab project, particularly in the data analysis sections. You are encouraged to consult with one another on the statistical analyses, and on strategies for efficiently processing the data. The individual data analysis can be done by hand, but we have provided tips for automating this process, and all other analyses can be done very efficiently – if it’s taking you a long time, seek help! As usual, you are expected to write up your lab report individually. If you worked with other students, please indicate clearly at the top of your report who you worked with.


Lab Report

Your lab report should be submitted via email by Monday October 3rd. Your report should describe the rationale and predictions for the studies, and should give a concise summary of the methods. You should report the results, as described below, and for each of the two experiments include a discussion of whether the results fit with the predictions, and what conclusions or further questions can be drawn from the results.

Experiment 1: Sequence Recall Task

This task is based upon a paradigm developed by Emmanuel Dupoux and colleagues to test the availability of high-level phonological codes for encoding native and non-native speech sounds. The experiment is directly modeled on Dupoux et al. (2008, Cognition), which you are encouraged to read. The PsychoPy scripts for Experiments 1 and 2 were created by Julia Buffinton, based on materials created by Sunyoung Lee-Ellis in collaboration with Brian Dillon.

In each of 4 experimental conditions you will first receive practice in the sound contrast being tested, by listening to multiple speakers pronounce each of two nonce words. You will learn to associate one of the words with the key “1” and the other with the key “2”. Next you will receive practice in listening to sequences of those words and recalling them using a number sequence. In the practice session you will receive feedback about the correct answer. Finally, you will be tested – without feedback – on sequences of 4 minimally contrasting words, which you will need to recall immediately after each sequence.

(i) Run the experiment. You should test yourself in 4 experimental conditions.

  • Condition “C” – control contrast, /kada ~ kata/
  • Condition “E” – English contrast, /kasta ~ kasuta/, i.e., ‘st’ cluster vs. vowel epenthesis
  • Condition “K” – Korean contrast, /kama ~ kamma/, i.e., single vs. geminate ‘m’
  • Condition “K2” – Korean contrast, /saka ~ ssaka/, i.e., lax vs. tense ‘s’

In order to randomize the order of presentation across participants, please make a random choice of which order to run the conditions in (yes, really: pull the names out of a hat, use a random number generator … whatever works for you).

(ii)  Analyze your accuracy in recalling the 4-word sequences. You should have heard 28 sequences in each condition, so each condition should yield a score out of 28.

The output file may seem overwhelming, but you’ll only need a few columns and you should feel free to delete the rest if it helps. You will need to keep the following columns: sequence, trial_resp.keys, trial_resp.rt, and participant. You can then easily calculate accuracy by comparing the sequence column in your data file to your response in the trial_resp.keys column.

For purposes of this lab we will use a simple method of coding the data. Full credit is given if a sequence is recalled perfectly, and no credit is given otherwise. Dupoux and colleagues use a more sophisticated scheme that gives partial credit for partially correct sequences, but we do not need to do this here. Add a column to your spreadsheet called “correct” and place a 1 in the cell if you correctly recalled the sequence, and a 0 if you did not.

It is not difficult to code the accuracy of the answers by hand, since there are few trials, but there are several ways to automate this process. Try the MID() and/or COUNTIF() functions. If you have new suggestions, then we’d love to hear them.
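For those who prefer a script to spreadsheet functions, the all-or-nothing coding scheme can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes that the sequence and trial_resp.keys cells both contain the digits 1 and 2 (possibly wrapped in brackets, quotes, or commas) and no other stray 1s or 2s, and the three rows are made-up data rather than real output. Check the formatting of your own output file before relying on it.

```python
import re

def code_correct(sequence, keys):
    """Return 1 if the response matches the target sequence exactly, else 0.

    Assumption: both cells contain the digits 1 and 2, possibly surrounded
    by brackets, quotes, commas, or spaces, which are ignored here.
    """
    target = re.findall(r"[12]", str(sequence))
    response = re.findall(r"[12]", str(keys))
    return int(len(target) > 0 and target == response)

# Made-up example rows (hypothetical formatting, not real data).
rows = [
    {"sequence": "1221", "trial_resp.keys": "['1', '2', '2', '1']"},
    {"sequence": "2112", "trial_resp.keys": "['2', '1', '2', '2']"},
    {"sequence": "1212", "trial_resp.keys": "['1', '2', '1']"},
]

# Add the "correct" column: full credit for a perfect recall, none otherwise.
for row in rows:
    row["correct"] = code_correct(row["sequence"], row["trial_resp.keys"])

score = sum(row["correct"] for row in rows)
print(f"{score} out of {len(rows)} sequences recalled correctly")
```

Only the first made-up row counts as correct: the second has one wrong key and the third is one key short, and neither receives partial credit.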

(iii) Contribute your data for the group analysis by emailing Hanna Muller (hmuller@umd). Please send two things in your email. First, send text including (i) your name, (ii) your native language, (iii) whether your native language allows geminate consonants (e.g., ‘m’ vs. ‘mm’) or closed syllables as in /kasta/, (iv) scores out of 28 on the English, Korean (x2), and Control conditions. Second, send a single Excel file (.xls, .xlsx, or .csv, please!) containing four sheets, one for each condition. Each sheet should contain the columns you kept from the output file (see above), plus the correct column, in the following order: Column 1: Initials (participant); Column 2: Sequence (sequence); Column 3: Response Time (trial_resp.rt); Column 5: Keys (trial_resp.keys); Column 6: Correct (correct). You should also include Task 2 below. We need the data by the end of the day on Friday September 23rd. Hanna will concatenate the data and then we will make it available to everybody.

(iv) Create a histogram that compares your accuracy in the four conditions.

(v) Individual Statistics. Determine whether you performed significantly worse in either of the non-native conditions than in the native language condition or the control condition, i.e., two comparisons. (For most speakers the “English” condition counts as the native condition, and the “Korean” condition counts as the non-native condition, but if you are a native speaker of Korean, this is, of course, reversed.) Compute exact statistics using the binomial test. This analysis should be done “by hand”, i.e., use your spreadsheet program, but do not use a built-in binomial test function. See the separate instructions on the binomial test in the following tab.

(vi) Group analysis. Once the group data is available, compute new accuracy scores for the class group. For each condition separate the class into those speakers for whom the contrast is in their native language and those speakers for whom it is not. Create a percentage accuracy histogram that summarizes the group comparisons.

(vii) Group Statistics. Now determine whether non-native accuracy is lower than native accuracy for the group as a whole. For this analysis you should again use the binomial test, but rather than computing the test “by hand”, you should use the built-in function, e.g., BINOMDIST (or its equivalent) in your spreadsheet program. See the separate instructions on the binomial test in the following tab.

(viii) Write up your analyses as part of your lab report.


Experiment 2: Implicit Discrimination Task

This task is based upon a paradigm developed by Christophe Pallier and colleagues to test implicit phoneme discrimination in non-native speakers. The experiment is directly modeled on Navarra et al. (2005, J. Exp. Psychol: Hum Percep Perf), which you are encouraged to read.

In each of 3 sub-experiments you will perform a simple word classification task based on two-syllable nonce words. You will classify words based on their initial syllable (‘ka’ vs. ‘ti’), and the second syllable will always be irrelevant to your task. Nevertheless, the second syllable will sometimes vary across trials. Within each sub-experiment you will encounter 4 experimental conditions, organized in a 2 x 2 design. The first syllable will be either ‘ka’ (“1”) or ‘ti’ (“2”). These sounds will be presented in blocks of trials where either the second syllable of each word is identical (homogeneous conditions, “H”) or the second syllable varies (different conditions, “D”). The classification task itself is very easy, and accuracy should be very high. The interest of the study is in the impact of variation in the irrelevant second syllable on reaction times on the classification task.

(i) Run the implicit discrimination study, which consists of 3 sub-experiments.

  • Task 2C – Control: kaba ~ tiba ~ kada ~ tida
  • Task 2E – English: kasta ~ tista ~ kasuta ~ tisuta
  • Task 2K – Korean: kama ~ tima ~ kamma ~ timma

(ii) Compute accuracy for this task, but the real interest in this study is in reaction time (RT), and how it is impacted by variation in the (irrelevant) second syllable in the test words.

The data files include many columns of information – don’t be overwhelmed! You can delete most of them except for the following: stim, condition, resp.keys, resp.rt, participant. These correspond to the (i) sound heard, (ii) Condition name, i.e., H1, H2, D1, D2, (iii) response label, i.e., “1”, “2”, (iv) response time, and (v) your initials. A correct response should have the same number as the number in the Condition Name.

You should be able to calculate RTs using PivotTables, similar to what you did in Lab #1A. It is recommended that you exclude trials on which the response was incorrect. (This should NOT require manual sorting of the data.) You should not need to code answers as “correct/incorrect” manually … if you find yourself doing this, then you should stop and look for a better solution, which could include consulting a classmate. You should semi-automatically create a new data column (correct) that indicates whether each trial is answered correctly or incorrectly, where correct is coded with a 1 and incorrect with a 0.
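If you would rather script this step than use PivotTables, the logic can be sketched in Python as below. Treat it as an illustration under assumptions: it assumes that condition holds labels such as H1 or D2 and that resp.keys holds a bare 1 or 2 (your file may format the key differently), and the four trials are invented for the example.

```python
from collections import defaultdict

def is_correct(condition, key):
    """Return 1 if the key pressed matches the digit that ends the
    condition name (e.g. "D2" expects "2"), else 0."""
    return int(str(condition).strip()[-1] == str(key).strip())

def mean_rt_by_condition(trials):
    """Mean response time per condition, using correct trials only."""
    rts = defaultdict(list)
    for trial in trials:
        if is_correct(trial["condition"], trial["resp.keys"]):
            rts[trial["condition"]].append(float(trial["resp.rt"]))
    return {cond: sum(values) / len(values) for cond, values in rts.items()}

# Invented trials (hypothetical values, not real data).
trials = [
    {"condition": "H1", "resp.keys": "1", "resp.rt": 0.50},
    {"condition": "H1", "resp.keys": "1", "resp.rt": 0.70},
    {"condition": "D2", "resp.keys": "1", "resp.rt": 0.90},  # wrong key, excluded
    {"condition": "D2", "resp.keys": "2", "resp.rt": 0.80},
]

means = mean_rt_by_condition(trials)
for cond in sorted(means):
    print(cond, round(means[cond], 3))
```

The same comparison (last character of the condition name against the response key) is what a spreadsheet formula for the correct column would need to express.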

(iii) Contribute your individual data for the group analysis by sending a spreadsheet file (again, .xls, .xlsx, or .csv, please!) to Hanna Muller by Friday September 23rd. Your file should be organized exactly as follows – if you do not use a consistent format, it will become very difficult to automate the group analyses. You are all smart graduate students, so we know that you can follow specific directions. It is preferred that you send a single spreadsheet file (‘workbook’) with multiple worksheets, one for each sub-experiment, rather than as a series of separate files. Be sure to label the worksheets clearly, e.g. “Task 2E”.

Information in each worksheet should be: Column 1: Initials (participant); Column 2: Condition (condition); Column 3: Response Time (resp.rt); Column 5: Keys (resp.keys); Column 6: Correct (correct). This format is straightforwardly related to the format of the data file, except that the irrelevant columns are deleted, and a new column is added (correct).

(iv) Create a histogram that compares your average RTs in all of the experimental conditions. (Consider modifying the y-axis dimensions to fit the range of values that you need to display.)

(v) Group Analysis. Once the group data is available, compute new mean RTs for the class group, again separating the data according to speakers for whom the contrasts are native or non-native. Create a new histogram of group mean RTs.

(vi) Write up your analyses as part of your lab report.


This page gives an overview and some background on how to use the Binomial Test to calculate the probability of outcomes in experiments that consist of events with discrete outcomes. Examples of events with discrete outcomes include coin-flips (heads/tails), rolls of a die (6 possible outcomes), or tasks that are scored as correct or incorrect.

The Binomial Test is an appropriate measure for analyzing accuracy data in the sequence recall task of Lab 1B, since each trial was scored as correct or incorrect, a discrete outcome. We will use the Binomial Test here to compute exact probabilities “by hand”, and to compute probabilities using a function that approximates the exact distribution.

Don’t forget that there is a great deal of freely available information on this and other statistical concepts that are just a few keystrokes away. Make the most of this!

Some nice interactive demos can be found at the Wolfram Demonstrations Project, created by the makers of Mathematica. Statistical demos relevant to our current concerns can be found under Mathematics > Statistics > Probability. Relevant to this page are the demos “Binomial Distribution” and “Normal Approximation to a Binomial Random Variable”. In order to view these demos it is necessary to download the free Mathematica Player, which unfortunately is a rather large download (~80Mb).

Example 1: Individual Accuracy Score

Suppose that your responses to the sequence recall task showed the following results:

Native contrast: 27/28 correct
Non-native contrast: 23/28 correct

You might expect to be more accurate in a task involving native language speech sounds, but we want to know whether the difference in scores observed here reflects a difference in your perceptual abilities in the native and non-native language, or whether it simply occurred by chance.

In order to assess whether the difference in scores is likely to be meaningful we need to ask the following question: How likely is it that this difference would arise by chance? This question can, in turn, be rephrased as follows: If the errors were equally likely to occur in the native and non-native conditions, how likely is it that we would encounter the observed data?

Put differently: If the 6 errors had an equal probability of occurring in the native and non-native conditions, how likely is it that we would encounter 5 of the errors in the non-native condition and just one of the errors in the native condition? This question can be answered straightforwardly.

The Binomial Distribution

Our question about the distribution of errors is no different from a question about the number of heads occurring in a sequence of coin flips. It is equivalent to the question: In a sequence of 6 coin-flips, how likely is it that we would encounter 1 head and 5 tails?


We can compute the exact probability of this occurrence by calculating (i) the total number of possible outcomes, and (ii) the expected frequency of each possible outcome.

If we were considering a situation with just one coin toss, there would be two possible outcomes (H, T), and each should occur with equal frequency (0.5).

In a situation with two coin tosses, the number of possible outcomes increases to 4 (HH, HT, TH, TT). Since each coin toss is independent and we assume that we have a fair coin, each of these 4 outcomes is equally likely. Therefore the probability of 0 heads is 0.25, the probability of 1 head is 0.5 (2 ways of achieving this result), and the probability of 2 heads is 0.25.

In a situation with three coin tosses, the number of possible outcomes increases to 8. If you spell out the set of possible outcomes you can verify that there is 1 outcome with 0 heads, 3 outcomes with 1 head, 3 outcomes with 2 heads, and 1 with 3 heads. This converts to p(0H) = 0.125, p(1H) = 0.375, p(2H) = 0.375, p(3H) = 0.125.

More generally, the number of possible outcomes for a sequence of n coin flips is 2^n. This means that for 6 coin-flips we have 2^6 possible outcomes, i.e., 64.

Also, if there are n coin-flips, the number of possible outcomes that yield exactly h heads is: n!/{h! * (n-h)!}, where “!” stands for ‘factorial’, e.g., 6! or 6-factorial = 6 x 5 x 4 x 3 x 2 x 1 = 720.

Note: this is easier than it looks, since many of the numbers in the formula cancel each other out. For h=2, n=6, the number of possible outcomes is equal to:

(6 x 5 x 4 x 3 x 2 x 1) / {(2 x 1) * (4 x 3 x 2 x 1)}

which by canceling out is equivalent to:

(6 x 5)/2
= 15

Applying this to our case of 6 coin-flips:

The number of possible outcomes that yield exactly 0 heads is 1
The number of possible outcomes that yield exactly 1 head is 6
The number of possible outcomes that yield exactly 2 heads is 15
The number of possible outcomes that yield exactly 3 heads is 20
The number of possible outcomes that yield exactly 4 heads is 15
The number of possible outcomes that yield exactly 5 heads is 6
The number of possible outcomes that yield exactly 6 heads is 1

Now we can answer the question that we posed at the start of this section: in a sequence of 6 coin-flips, what is the probability of obtaining exactly 1 head and 5 tails? The answer is 6/64, or 0.09375. If we wanted to know the probability of obtaining at most 1 head, this would be 7/64, which is approximately 0.11.
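The counts and probabilities above can be checked with a few lines of Python. This is a minimal sketch that uses math.comb (available in Python 3.8 and later) to evaluate n!/{h! * (n-h)!}; the variable names are chosen here for illustration only.

```python
from math import comb

n = 6  # number of coin flips

# Number of outcomes with exactly h heads, for h = 0..n: n! / (h! * (n-h)!)
counts = [comb(n, h) for h in range(n + 1)]
total = 2 ** n  # total number of possible outcomes

print(counts)  # [1, 6, 15, 20, 15, 6, 1]
print(total)   # 64

p_exactly_one = counts[1] / total               # 6/64
p_at_most_one = (counts[0] + counts[1]) / total  # 7/64
print(p_exactly_one, p_at_most_one)  # 0.09375 0.109375
```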

Notice that the sequence of outcome counts shown above is equivalent to the 6th row of Pascal’s Triangle (Wikipedia, Wolfram MathWorld). More generally, the number of ways of obtaining h heads in a sequence of n coin flips can be found in the hth element of row n of Pascal’s Triangle, counting both rows and elements from 0.

Importantly, the probability of obtaining a given number of heads, as seen here, follows a Binomial Distribution (Wikipedia, Wolfram MathWorld).

Returning to the Accuracy Scores

Next we can return to our earlier question. If we were really equally good at recalling the native and non-native sound sequences, then the errors should occur with equal probability in the two conditions. If this were the case, then how likely is it that we would observe just 1 error in the native condition and 5 errors in the non-native condition?

This is just the same question that we asked about the 6 coin flips, so we already have the answer. The probability of observing exactly 1 of the 6 errors in the native condition is 0.09375. If we are really interested in the probability of observing at most one of the errors in the native condition, then the probability is 7/64, or about 0.11. Thus, it is not very likely that this outcome would arise by chance. However, if we follow the standard convention of rejecting the null hypothesis only if the observed outcome has a probability of less than 0.05, then we would conclude that our result is compatible with the null hypothesis, and that the performance on the non-native condition is not significantly worse than the performance on the native condition.

Important: we calculated the probability of observing at most 1 error in the native condition in a study with 6 total errors, and concluded that the probability was about 0.11. This is different from asking the probability of observing at most 1 error in either condition in a study with 6 total errors. This would include outcomes in which most of the errors occurred in the native condition, and the probability would rise to about 0.22. So, the overall probability of observing at most 1 of 6 errors in one of the two conditions is fairly high.
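The difference between the two probabilities can be made concrete by counting outcomes in each tail. The following Python sketch does this with exact fractions; the variable names are illustrative, and the simple doubling of the tail works here only because the two conditions are equally likely (p = 0.5), which makes the distribution symmetric.

```python
from fractions import Fraction
from math import comb

n_errors = 6       # total errors across both conditions
native_errors = 1  # errors observed in the native condition
total = 2 ** n_errors

# Outcomes with at most 1 of the 6 errors in the native condition: 1 + 6
lower_tail = sum(comb(n_errors, h) for h in range(native_errors + 1))

# Outcomes with at most 1 of the 6 errors in the non-native condition,
# i.e., at least 5 errors in the native condition: 6 + 1
upper_tail = sum(comb(n_errors, h)
                 for h in range(n_errors - native_errors, n_errors + 1))

one_direction = Fraction(lower_tail, total)                  # 7/64
either_direction = Fraction(lower_tail + upper_tail, total)  # 14/64

print(float(one_direction), float(either_direction))  # 0.109375 0.21875
```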

Which of these probabilities is most relevant when assessing the significance of our experimental results? This depends on the experimenter’s hypotheses. If the experimental hypothesis was that the two conditions should differ in the specific direction observed, then it is appropriate to use the lower probability. On the other hand, if the experimental hypothesis did not include a directional prediction, then it would be more appropriate to use the higher probability. (Additional notes here.)

Exercise. Suppose that you ran the experiment on yourself again and obtained the same result. Now you would have a total of 2 errors in the native condition and 10 errors in the non-native condition. Should we now treat this as a significant difference? (This question is optional for the lab, but is recommended as a way of seeing the effect of increasing the number of trials.)

You can now use this same procedure for the scores that you obtained in the sequence recall task.

… You have now used a Binomial Test to assess whether your performance on the sequence recall task differed significantly between the native and non-native conditions.

Once we start dealing with larger datasets it quickly becomes cumbersome to do these calculations by hand. Fortunately, it is straightforward to efficiently run a Binomial Test using built-in functions in your spreadsheet program.

Example 2: Group Data

Now let’s assume that you have the data from the entire class, and you want to find out whether performance on the non-native task was significantly different from performance on the native language task. Much as you enjoy calculating the probabilities by hand, life is short and you would like a more efficient way of doing this.

(i) Calculate total accuracy scores for the different conditions in the experiment. (These are totals, not means.)

(ii) Convert the accuracy scores into counts of errors and their distribution across conditions.

(iii) For any comparison that you might want to make, use the BINOMDIST function in your spreadsheet program to compute the probability of the observed distribution occurring by chance.

e.g., for the example that we computed by hand above, use the following expression

=BINOMDIST(1, 6, 0.5, 1)

Consult the help pages in your spreadsheet program for notes on the use of this function.
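To see what the spreadsheet function is computing, here is a rough Python counterpart. The function name binomdist and its argument names are hypothetical, chosen to mirror the four spreadsheet arguments (number of "successes", number of trials, probability of a success, and whether to return the cumulative probability); it is a sketch for checking your understanding, not a replacement for the spreadsheet analysis required in the lab.

```python
from math import comb

def binomdist(successes, trials, p, cumulative):
    """Binomial probability of `successes` out of `trials`.

    If `cumulative` is true, return P(X <= successes);
    otherwise return P(X == successes).
    """
    def pmf(k):
        # Probability of exactly k successes in `trials` independent trials.
        return comb(trials, k) * p ** k * (1 - p) ** (trials - k)

    if cumulative:
        return sum(pmf(k) for k in range(successes + 1))
    return pmf(successes)

print(binomdist(1, 6, 0.5, True))   # 0.109375, i.e., 7/64 (at most 1)
print(binomdist(1, 6, 0.5, False))  # 0.09375, i.e., 6/64 (exactly 1)
```

The final argument matters: with 1 (true) the result matches the "at most 1" probability of about 0.11 computed by hand above, and with 0 (false) it matches the "exactly 1" probability.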

Important: In all of the comparisons that we have discussed so far, we have considered scenarios that were equivalent to a flip of a fair coin, i.e., each possible outcome had an equal probability, p = 0.5. For example, in a comparison of two experimental conditions that were presented an equal number of times, random errors should occur on either condition with equal probability, i.e., p = 0.5. However, we often encounter situations where the outcomes are not equally probable.

Consider a situation where we are comparing sequence recall data from two groups of participants, one of which is much larger than the other. 3/4 of participants were in Group A (e.g., English speakers), and 1/4 in Group B (e.g., Korean speakers). We want to ask whether the performance of one group was significantly different from the other group, and use a Binomial Test for this purpose. In this case we cannot assume that a random distribution of errors would yield equal error counts for the two groups, for the simple reason that Group A received 3 times as many trials as Group B. Hence, the probability of an error appearing with Group A rather than Group B is 0.75. The probability value that is fed to the BINOMDIST function should be adjusted accordingly.
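As an illustration of the adjustment, the Python sketch below derives the chance probability from each group's share of the trials and then compares the result with what an unadjusted p = 0.5 would give. All of the numbers (21 and 7 participants, 20 errors in total, 10 of them from Group A) are invented for the example, and the helper name binom_cdf is hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for a binomial distribution with n trials and probability p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

# Invented numbers, for illustration only.
trials_per_participant = 28
trials_a = 21 * trials_per_participant  # Group A: 21 participants
trials_b = 7 * trials_per_participant   # Group B: 7 participants

# Under the null hypothesis, an error falls in Group A in proportion
# to Group A's share of the trials.
p_error_in_a = trials_a / (trials_a + trials_b)
print(p_error_in_a)  # 0.75

total_errors = 20
errors_in_a = 10  # fewer than the 15 expected by chance

# Probability of seeing at most 10 of the 20 errors in Group A.
p_adjusted = binom_cdf(errors_in_a, total_errors, p_error_in_a)
p_unadjusted = binom_cdf(errors_in_a, total_errors, 0.5)
print(p_adjusted, p_unadjusted)
```

With the adjusted probability, an even split of errors between the groups turns out to be much less likely by chance than the unadjusted p = 0.5 would suggest, because Group A contributed three times as many trials.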


When computing accuracy for the sequence recall task, you cannot simply compare your response in the “response” column to the actual sounds in the “sequence” column. With the update to PsychoPy, however, this comparison is much simpler than it was in previous versions of the experiment. This page contains some helpful tips for making this comparison.

A few years ago a couple of people took up the challenge of finding a way to (semi-)automatically code the responses in the sequence recall task for Lab 1B. Wing Yee Chow (now at UCL), Shevaun Lewis (now back at UMD, in LSC), and Erika Hussey (now at U of Illinois) provided useful tips for how to do this using the following functions:

  1. MID Function: Select a substring of a given cell using its location in the text.
  2. IF Function: Display different values depending on whether a given condition is met.
  3. AND Function: Check if multiple conditions are met, only display TRUE if all conditions are met.
  4. CONCATENATE Function: Put together text from multiple cells. Can also be achieved using the “&” symbol between texts.

If data crunching of this kind is new(-ish) to you, you are encouraged to play around with the functions suggested here, in order to get a feel for the kinds of things that one can do. You might want to experiment with a dummy data set to get the hang of the different steps. For more information on the above functions, check out the Help section in Excel.

If you have experience with Python, Emily Darley (visitor from U. of Bristol in 2015) provided the following scripts to ease processing. You can alter them to work with your own data output files. Task 1, Task 2.