Psycholinguistics I/II - 2022-2023

LING 640/641


This course is a year-long foundation course sequence in psycholinguistics, aimed at graduate students from any language science field. The course assumes no specific background in psycholinguistics, including experimentation or statistics. The first semester course also requires only limited background in formal linguistics. But all students should have a serious commitment to some area of language science, and relevant expertise that they can contribute to the class group.

Psycholinguistics is a broad field. In principle, it includes all areas of the mentalistic study of language, including the various fields of so-called formal/theoretical linguistics, plus language acquisition and the neuroscience of language. And while we’re at it, why not throw in language disorders and second language acquisition for good measure? Due to this breadth, psycholinguistics can sometimes appear like a scientific archipelago – many interesting but disconnected islands. We will make no attempt to tour all of these islands here. Instead, we will focus on trying to understand the overall space, how the pieces fit together, and recurring themes and problems. We will focus on:

  • Understanding the landscape of psycholinguistics
  • Psycholinguistic thinking: finding good questions, evaluating evidence, resolving conflicts
  • Doing psycholinguistics: tools needed to carry out psycholinguistic research

In the Fall semester (LING 640) we will devote a lot of time to ‘model’ problems, such as speech categorization and word recognition, because these relatively simple cases allow us to probe deeply into psycholinguistic issues with limited linguistic overhead.

In the Spring semester (LING 641) we will devote more attention to how the syntax and semantics of sentences relate to language learning and to language processing, in both speaking and understanding.

Location, location, location

In-person interaction is so valuable. We will do everything possible to maintain that. Online interaction is next best. Hybrid is a nut that has yet to be cracked.

For now, the semester is proceeding as ‘normal’, i.e., roughly as we used to, 3-4 years ago. We hope that it will remain this way, though there may be bumps along the way.  


We hope that you will not be sidelined by COVID-19 during this semester. But it could happen. It probably will happen. If you get infected, UMD has extensive guidelines on how that impacts your class participation. We will try to take it all in our stride.

Remember that mental health is an important element of good health, especially for graduate students. Be aware, and seek help if needed. 


A research university thrives on connectivity. We lost a great deal of that over the past 3 years. This is a time for rebuilding. We have learned a great deal over the course of the past couple of years.

We have seen fewer people. We have had fewer spontaneous encounters. We have had fewer shared experiences. So we have needed to take extra steps to be connected.

One tool that helped us last year was a class Slack channel, created within the “Maryland Psycholinguistics” workspace. It proved to be useful for sharing questions, documents, and class updates. It could be good to do this again.

Individual meetings

Individual and small group conversations are especially valuable right now. Seek them out!

I am very happy to have individual discussions. Just drop me a line.

I welcome opportunities for in-person meetings. In 2020-22 I learned to greatly value outdoor meetings in addition to traditional indoor-with-computers meetings. The UMD campus is beautiful year round, and it is good for graduate students and faculty to be visibly active on campus. (Did you know that the entire campus is an arboretum? Check out the University of Maryland Arboretum Explorer app.)



Schedule – Spring

Mondays & Wednesdays, 12:00 – 1:30.  1108B Marie Mount Hall.

January 25: Introduction. The psycholinguistic landscape

January 30: Discussion Note 1: Structural priming (Branigan & Pickering 2017)

February 1: Experiments as arbiters (or not)

February 6: Discussion Note 2: Input vs. update (Omaki et al. 2014)

February 8: Misparsing as evidence (Lidz, White, Perkins, …)

February 13: Discussion Note 3: Evidence and variation (Pinker 1989, Goro 2008)

February 15: Invisible variation (Han et al. 2007)

February 20: Discussion Note 4: Islands (Pearl & Sprouse 2013, Kush 2022)

February 22: Transparent vs. opaque learning models

February 27: Discussion Note 5: Grammatical constraints and parsing in children (Conroy et al. 2009)

March 1: Selective fallibility

March 6: Discussion Note 6: Grammatical precision in comprehension (Keshev & Meltzer-Asscher 2017)

March 8: Prediction

March 13: Discussion Note 7: something by Sol Lago

March 15: Guest speaker: Sol Lago (PhD ’14; University of Frankfurt)

March 18 – 26: SPRING BREAK

March 27: (possibly hybrid) Memory access

March 29: (possibly hybrid) Memory access

April 3: Discussion Note 8: Illusions

April 5

April 10: Discussion Note 9: Information theoretic models

April 12

April 17: Discussion Note 10: Production

April 19

April 24: Discussion Note 11: Production

April 26

May 1: Discussion Note 12: TBD

May 3

May 8

May 10: It’s a wrap!

Schedule – Fall

Mondays & Wednesdays, 12:00 – 1:30.  1108B Marie Mount Hall.

August 29: Introduction. The psycholinguistic landscape

August 31: Some core concepts. Discussion of whistled languages.

September 7: Development of Speech Perception

September 12: Development of Speech Perception

September 14:  Becoming a native listener

September 19: Distributional learning

September 21: Learning contrasts

September 26:

September 28:

October 3: Neuroscience of speech perception and production

October 5: Word recognition

October 10: Active processing

October 12: Recognizing words in context

October 17:

October 19: Neuroscience of word recognition

October 24:

October 26: Word production

October 31:

November 2:

November 7:

November 9:

November 14:

November 16:

November 21:


November 28:

November 30:

December 5:

December 7:

December 12:



This is graduate school. Your grade should not be your top concern here. You should be aiming to get a top grade, but your focus should be on using the course to develop the skills that will serve you well in your research. There will be no exams for this course. The focus of the course is on reading, discussing, writing and doing throughout the semester, and hence your entire grade will be based upon this.

Grades will be aligned with the values that guide this course: (i) active engagement with the core questions, (ii) thinking and writing clearly, (iii) taking risks and exploring new ideas, (iv) communicating and collaborating with others.

If you want to get the maximum benefit from this class (i.e. learn lots and have a grade to show for it at the end), you will do the following …

1. Come to class prepared, and participate (40% of grade).

Being prepared means having done some reading and thinking before coming to class. Writing down your thoughts or questions about the article(s) is likely to help. Although many readings are listed for this course, you are not expected to read them all from beginning to end. An important skill to develop is the ability to efficiently extract ideas and information from writing. Participating in class discussions is valuable because it makes you an active learner and greatly increases the likelihood that you will understand and retain the material. You should also feel free to contact me outside of class with questions that you have about the material.

2. Think carefully and write clearly in assignments (60% of grade).

In writing assignments you will think and write about issues raised in class and in the assigned readings. The writing assignment will often be due before the material is discussed in class: this will help you to be better prepared for class and to form your own opinions in advance of class discussion. In your writing it is important to write clearly and provide support for claims that you make.
We will plan to have many shorter writing assignments, typically involving responses to questions about individual readings, for which you will have relatively limited time. These are not intended to be major writing assignments. But they will all be read, and they will contribute to your class grade, following the guiding values of the class.
Revising your written work following discussion in and out of class is a very valuable activity. We will make more use of this in the spring semester.

If you are worried about how you are doing in the course, do not hesitate to contact me. Email is generally the most reliable way of reaching me.

Grade scale

 A   80-100%    B-  60-65%
 A-  75-80%     C+  55-60%
 B+  70-75%     C   50-55%
 B   65-70%     C-  45-50%

Note that even in the A range there is plenty of room for you to show extra initiative and insight. The threshold for A is deliberately set low, so that you have an opportunity to get additional credit for more creative work.


Written work should be submitted individually, unless the assignment guidelines state otherwise or you have made prior arrangements with the instructor, but you are strongly encouraged to work together on anything in this course. Academic honesty includes giving appropriate credit to collaborators.  Although collaboration is encouraged, collaboration should not be confused with writing up the results of a classmate’s work – this is unacceptable. If you work as a part of a group, you should indicate this at the top of your assignment when you submit it.

Assignments – Spring

Discussion Note #S12 (5/1/23). For our final discussion note of the year (#24!) we will look at some new studies about the production of multi-clause utterances. Most work on sentence production has focused on single clauses, where the speaker’s goal is, roughly, to turn a simple event-sized chunk of message into a simple clause-sized chunk of language. A lot of language involves multi-clause utterances, which are interconnected in various ways, often with linguistic dependencies that span those clauses. This raises many interesting new challenges for characterizing production processes.

We will discuss three new papers on multi-clause production. You should certainly read and discuss Momma 2021 and Momma 2022, both of which are about production of wh-dependencies in English. As an optional bonus, also discuss Sarvasy et al. 2023. This one falls into the “and now for something completely different” category, as it is about production of switch reference morphology in the Papuan language Nungon.

(i) Summarize the key argument(s). Do they work? (ii) What do these findings tell us about timing of processes, or grammatical representations, or both? (iii) In our discussions in the past week we focused on the distinction between “function assignment” and “constituent assembly” steps in widely adopted production models. Where do these new findings about multi-clause utterances fit into that general production pipeline? Are they the equivalent of the function assignment processes that we have been discussing, or do they depend on something like that having taken place already?

Discussion Note #S11 (4/24/23). Our goal in this (penultimate) assignment is to build on the discussion from class on Wednesday 4/19/23, where we were wondering about the “function assignment” stage in sentence production, specifically the evidence that it is (i) a distinct stage that (ii) maps arguments directly onto grammatical functions like ‘subject’ and ‘direct object’, rather than being mediated by an initial assignment to argument roles like ‘agent’ and ‘patient’. The review articles that we read pointed to Bock et al. 1992 as a key piece of evidence for this claim. So, please explain the empirical argument that Bock et al. make, and comment on how persuasive you find it. (Their new experiment is explained on pp. 8-10 of the PDF file. The background information in the opening pages of the paper seems to be more than is needed, so you should not need to read it word-for-word.)

In other sources that we have been drawing on, there is further discussion of what goes into the function assignment stage. Bock & Levelt 1994 is a broad review of sentence planning that includes much discussion of evidence from speech errors. Roelofs and Ferreira 2019 is a recent review that has less to say about function assignment overall, but they comment on p. 43 on priming evidence that they regard as challenging for classic views.

Can the claims of Bock et al. (1992) about “direct mapping” be reconciled with the findings by Momma and colleagues on thematic roles?

Discussion Note #S10 (4/17/23). Momma and Ferreira (2019) is a clever study on the order in which speakers plan the words that they are going to say. Their main argument is that planning order systematically diverges from surface word order. The main point of the study is made already in Experiment 1, and the following 5 experiments mostly serve to tidy up loose ends around the interpretation of the first experiment. (a) What do you regard as the most important or most surprising piece of evidence in the study, and why? (b) Most of the experiments in the paper rely on an (extended) picture word interference (PWI) measure. Towards the end of the paper, M&F introduce an exploratory analysis involving correlations of speaking times with verb frequencies (most of the action is on p. 25, which is dense, but worth it). Both of these measures are intended to measure when specific words are planned. Are these two effects measuring the same thing? Do you think that one is more useful than the other? (c) All the nouns and verbs in the study are singular. If the modifier noun is plural (“spoons”), should we expect to see agreement attraction in either of the two key sentences (1: The octopus below the spoons are boiling; 2: The octopus below the spoons are swimming)?

Discussion Note #S9 (4/10/23). After repeatedly stating this week that agreement attraction phenomena are consistent and robust, we turn to a couple of examples of a big and interesting caveat. Recent discoveries in multiple languages have shown that agreement attraction effects are also highly selective, in ways that call for careful linguistic analysis. Fun!

A recent paper by Bhatia and Dillon (2022) examines agreement attraction in Hindi. A paper by Slioussar (2018) examines agreement attraction in Russian. In both examples, agreement attraction shows selectivity. In both cases, the selectivity depends on the case marking system of the language. But the specific details differ between the two languages. The contrast between the two languages is quite striking.

(i) Give a concise description of what you take to be the key finding in each language. Which piece of evidence do you find most compelling in demonstrating that finding? (ii) Thinking in terms of explicit mechanisms, what information must Hindi and Russian speakers be encoding and accessing in order to capture the observed effects? In our cartoon example of agreement computation in class, we have assumed something like “At the agreeing word, look up its features; use those features as retrieval cues to access the memory encoding of the prior sentence; respond “ok!” when something in memory at least partly matches those retrieval cues.” Does this work for these cases? (iii) Will the same mechanism work for both languages?
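
As a starting point for question (ii), the cartoon mechanism quoted above can be written down as a minimal Python sketch. This is only an illustration under loud assumptions: the feature name `number`, the toy memory encodings, and the `min_matches` threshold are all hypothetical, and nothing here is taken from the Hindi or Russian studies.

```python
# Toy version of the in-class "cartoon" of agreement computation.
# Feature names and memory contents are hypothetical illustrations.

def count_matches(cues, item):
    """Count how many retrieval cues the memory item matches."""
    return sum(item.get(feature) == value for feature, value in cues.items())


def check_agreement(verb_features, memory, min_matches=1):
    """Use the agreeing word's features as retrieval cues.

    Responds "ok!" when at least one item in memory matches at least
    `min_matches` cues (i.e. a partial match is enough), else "not ok".
    """
    cues = dict(verb_features)
    best = max((count_matches(cues, item) for item in memory), default=0)
    return "ok!" if best >= min_matches else "not ok"


# "The key to the cabinets are ..." (plural attractor in memory)
attractor_memory = [
    {"word": "key", "number": "sg"},
    {"word": "cabinets", "number": "pl"},
]
# "The key to the cabinet are ..." (no plural noun in memory)
control_memory = [
    {"word": "key", "number": "sg"},
    {"word": "cabinet", "number": "sg"},
]

plural_verb = {"number": "pl"}  # assumption: number is the only cue
attractor_response = check_agreement(plural_verb, attractor_memory)
control_response = check_agreement(plural_verb, control_memory)
```

In this toy, the plural attractor is enough to produce “ok!” for an ungrammatical verb, while the control sentence is rejected. Notice that the sketch encodes nothing about case marking or grammatical role, so one way into question (ii) is to ask which extra features the memory items and the cues would need before the toy could be selective.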

Discussion Note #S8 (4/3/23). This week we are looking at (roughly) one empirical contrast, which has generated more controversy than one might have expected. This gets us into questions of experimental robustness, different tasks, model predictions, and linguistic detail. Fun!

In class this week we discussed findings that indicate that comprehenders are pretty good at processing reflexives, in the sense that they consider only those antecedents that are consistent with binding Principle A, roughly, “antecedents must be in the same clause,” e.g., Sturt 2003. Before Spring Break we also discussed findings about “agreement attraction”, i.e., situations where subject-verb agreement seems to be disrupted by nouns that should be irrelevant to the calculation of agreement. But, wait a minute, this sounds like a contrast. Both reflexives and verbs need to form a relation with the subject of the same clause. One of those processes seems to be pretty robust, while the other seems pretty fragile. Is this true? And, if so, why?

The literatures on agreement and anaphor processing developed largely independently of one another. The emergence of explicit memory access accounts, such as we have discussed in class, made the contrast more striking. Dillon et al. (2013, UMD study) is one attempt to show more empirically that the contrast is real, and not an artifact of different experimental procedures. Jaeger et al. (2020, Potsdam study) is one among a series of studies, from the home of general, cue-based retrieval models, that disputes whether the empirical contrast is real. Kandel & Phillips (2022, UMD study) looks at the same linguistic phenomena, but from the angle of production. So, these are three from a larger body of studies on whether the contrast that we have already seen in class is a real thing. They’re all published in the same journal, one that is known as being a stickler for empirical detail.

Our discussion topic for Monday: what’s at stake, and how to evaluate the evidence. Questions: (i) Why does this contrast matter? Or, what is at stake for the protagonists in these discussions? Feel free to disagree with their rationale. (ii) Is the contrast between anaphora and agreement real? What evidence do you find more or less compelling? (iii) Most of this dispute has played out in the area of comprehension. Do the more recent findings about production make a difference? (Agreement attraction in production is utterly uncontroversial. Its counterpart for reflexives in production has been studied much more rarely.)

Note, there are 3 articles in this week’s collection. Clearly, these can’t all get the same level of attention that you might give to a single article. Time is finite. There’s too much to read. This isn’t going to change. So, I would suggest being strategic in how you approach the task of extracting information from the articles. You already know some of the basic ideas. You probably don’t need to read cover to cover. A good feature of articles in psychology-style journals is that they are systematically structured, and there’s a good chance that the main tables and figures tell much of the story. So, write down explicit questions about what you want to know from the articles, and then go digging. E.g., “I’m guessing that they’re going to show me evidence of such-and-such kind, how does it look exactly?” Also, it’s clear that I have a horse in this race. But this should not bias your conclusions. As I emphasized on the first day of the fall semester, being “interestingly wrong” is a great place to be in science.

Discussion Note #S7 (3/13/23). This week we will look at a recent study by Stone et al. (2021), in preparation for the guest talk on Weds 3/15 by Sol Lago (UMD PhD ’14). The hope is that we will be ready for a conversation with Sol that goes beyond the specifics of what she did, and engages on questions about mechanisms, implications, etc.

German possessive pronouns are interesting because they agree twice in the same word. The pronoun stem agrees with the antecedent of the pronoun, i.e., the possessor. The pronoun suffix agrees with the head noun, i.e., the possessee. The Stone et al. study is about the potential conflict that arises between these agreement relations for comprehenders. The study argues that there is a conflict, and discusses mechanisms that could account for that. (i) Why does the phenomenon even matter? (ii) What, if anything, do we learn from the difference between experiments 1 and 2? Is this just an adjustment that helps to make underlying processes clearer, or is the change in design changing the underlying processes? (iii) Thinking counterfactually, what might a mechanism need to look like if it was to avoid the conflict that this study documents? In other words, imagine a world where the experiment(s) had worked out differently. What would that world need to be like?

Discussion Note #S6 (3/6/23). For this week’s discussion, we focus on an important line of argumentation, due to Brian McElree and colleagues (McElree, Foraker, & Dyer 2003). They rely on an interesting but rarely used experimental paradigm called Speed-Accuracy Tradeoff (SAT). The 2003 article makes the striking claim that, in comprehension of wh-dependencies, long-distance relations are constructed just as quickly as local relations. (i) Give a brief summary of the empirical argument for this claim. Does this argument depend on the logic of the SAT paradigm, or could the same argument maybe be made using more common reaction time paradigms (e.g., press a button as soon as you detect an anomaly)? (ii) Thinking about other recent things that we have read about wh-dependencies, e.g., Pearl & Sprouse 2013, the psycholinguistic studies behind Wilcox et al. 2023, we have seen good examples of how constraints on filler-gap relations impact the learning or processing of wh-dependencies. If McElree et al. are right that filler-gap dependencies are formed via a “direct access” mechanism, how could constraints on those dependencies, i.e., island constraints, impact dependency formation? Specifically, how does the direct access mechanism fit (or not fit) with what we have seen about island constraints blocking certain types of dependency formation?

If you’re interested to see more, you could check out a paper using the SAT paradigm by Brian Dillon and colleagues (2014). They argue that in some kinds of dependency formation, notably long-distance reflexive licensing in Mandarin, the structural distance between the pieces of the dependency does matter.

Discussion Note #S5 (2/27/23) Conroy et al. (2009) present a series of experiments on children’s interpretations of pronouns in sentences like (i) “Grumpy painted him” or (ii) “Every Dwarf painted him”. It has long been known that children in English and other languages often allow an interpretation of (i) where the pronoun corefers with the subject of the same clause, an interpretation that adults disallow. The reasons why they allow this are disputed. A well-known finding from the late 80s is that the same children appear to do better, i.e., allowing only the adult interpretation for (ii), where the subject is a quantifier. This is striking, because children seem to do better with (ii), despite it being intuitively more ‘complex’ than (i), and because it potentially resolved a dispute among linguists and philosophers about how pronouns link up to their antecedents. Exciting stuff! … But then things got messy. Elbourne (2005) argued that the famous contrast was an experimental artifact (boo!). So in an earlier iteration of this course we tried to remedy that, by finding a better design. To our surprise, we found that Elbourne was right. Well, ok, he was only partly right. Along the way we learned that children could do better on (i), but only if the conditions were delicately balanced. Children’s interpretations are pretty fragile.

Q1: Both the “Delay of Principle B Effect” and the “Quantificational Asymmetry” have attracted a lot of attention from a lot of language acquisition researchers. Few developmental findings have had such direct influence on linguistic theory. Why all the fuss and attention? What is at stake for (psycho)linguists?

Q2: The experimental design highlights the notions of “availability” — a property of the pronoun’s antecedent (an “NP-sized” unit) — and “disputability” — a property of a “clause-sized” interpretation. If both of these matter to children’s performance, what can we conclude (if anything) about how children go about generating or choosing interpretations? This is an invitation to think about what kind of interpretation-generation processes would care about such properties.

Q3: Although Conroy et al. found that children were surprisingly adultlike in their pronoun interpretations, many, many other studies have not found that. What should we make of this variability in results across studies?

Discussion Note #S4 [2/20/23]. Kush et al. (2021) and Wilcox et al. (in press) are studies that try to take the learning challenge addressed by Pearl & Sprouse (2013) a step further. Kush et al. explore a more ‘exotic’ case of island effects and non-effects in Norwegian. Wilcox et al. ask whether island constraints are in some sense learned by recent large language models such as GPT-3, which lack the hard-coded parsing constraints assumed by Pearl & Sprouse. Question: What conclusions should we draw about the need for constraints on learning and generalization, in light of the findings in these two studies? You can choose to go into more detail on one study or the other in your answer, but I would like you to take a stab at addressing both. At a minimum, try to give a brief characterization of the evidence that the authors regard as most important.

One thing to bear in mind: in Pearl & Sprouse’s study it was no mystery why the model fared as well as it did. By combining (i) their assumptions about the units that their parser/learner tracks, i.e., container node trigrams, with (ii) their discoveries about what structures are(n’t) attested in child input, we can see exactly how their model comes to assign low probabilities to island-violating sentences. In thinking about both of the newer studies, you may find it helpful to use those two facts as a starting point. In Norwegian, both the grammar and the corpus seem to be different. In the case of the large language models, we can maybe assume that they learn from a corpus that is roughly similar to P&S’s corpus, in terms of the distribution of questions. Well, aside from the fact that the large language models learn from a lot more input data.
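
To make those two ingredients concrete, here is a rough Python sketch of how trigram counts over container node sequences can yield a low probability for an island violation. It is a toy under stated assumptions, not Pearl & Sprouse’s implementation: the node labels, the input counts in `toy_input`, the smoothing constant `alpha`, and the `n_types` placeholder are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import prod


def trigrams(container_nodes):
    """Pad a container node sequence with start/end and list its trigrams."""
    padded = ["start", *container_nodes, "end"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


def train(dependencies):
    """Count trigrams over all wh-dependencies in the (toy) input."""
    counts = Counter()
    for nodes in dependencies:
        counts.update(trigrams(nodes))
    return counts


def probability(container_nodes, counts, n_types, alpha=0.5):
    """Product of add-alpha smoothed trigram probabilities (an assumption)."""
    total = sum(counts.values())
    return prod(
        (counts[t] + alpha) / (total + alpha * n_types)
        for t in trigrams(container_nodes)
    )


# Hypothetical input: mostly short dependencies, a few that cross a clause
# boundary, and none that cross a "whether" clause.
toy_input = (
    [["IP", "VP"]] * 90
    + [["IP", "VP", "CP_null", "IP", "VP"]] * 9
    + [["IP", "VP", "CP_that", "IP", "VP"]] * 1
)
counts = train(toy_input)
n_types = 100  # placeholder for the number of possible trigram types

long_distance = probability(["IP", "VP", "CP_that", "IP", "VP"], counts, n_types)
whether_island = probability(["IP", "VP", "CP_whether", "IP", "VP"], counts, n_types)
```

In this toy, the island sequence comes out less probable than the attested long-distance sequence only because three of its trigrams never occur in the input. That is the sense in which the outcome depends jointly on the units being tracked and on what the corpus contains, so changing either one (as in Norwegian) can change the result.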

Reading note for Kush et al.: basically, the authors are looking at a simple(-ish) table of construction types, comparing the corpus evidence with acceptability judgments. Sketching out that table is useful for understanding what’s going on.

Discussion Note #S3 [2/13/23] Pearl & Sprouse (2013) present a model of how island constraints could be learned without ‘hard coding’ them into Universal Grammar. Their model is interesting because (i) it challenges claims that these constraints are unlearnable, (ii) it uses (parsed) child-directed speech, and (iii) it is computationally explicit and transparent, i.e., no black box machinery. For this reason, I regard it as raising the bar for learning models.

Q1: Briefly describe how the P&S model generalizes beyond input sentences that it encounters. How does it come to treat long-distance wh-questions as more acceptable than island violations?

Q2: If child directed speech includes errors, e.g., inadvertent island constraint violations, can the P&S model still succeed?

Q3: Could the P&S model learn different island constraints in another language? What would it need in order to do that?

Tip: In the P&S model, the notion of “container node trigram” is central. This is the main piece to understand.

Note: In 2013 I published an article that is critical of the P&S proposal. This does not change that I think P&S’s model is really important. Their model changes the nature of the discussion. (Bonus: Jeff Lidz and I have argued over the developmental predictions of P&S’s model. I have interpreted them as predicting a conservative learner that starts with a very restrictive grammar of wh-movement and then expands its repertoire. Jeff has interpreted them as making the opposite prediction, such that learners start with a very liberal grammar of wh-movement. How are such different interpretations even possible?)

Discussion Note #S2 [2/6/23]: Omaki et al. (2014) examine adult and child interpretations of globally ambiguous wh-questions like “Where did Emily say that she hurt herself?”, in both English and Japanese. The motivation is to understand how children process sentences that they encounter.

1. What should we conclude from the cross-language comparison between English and Japanese speaking children?
2. Based on these findings, how serious is the risk that children misunderstand things that are said to them?
3. Some languages, e.g., Russian, are reported to severely limit long-distance wh-dependencies, such that “Where did Emily tell someone that she hurt herself?” can only be understood as a question about the telling event. In light of Omaki et al.’s findings: what would be needed for English and Russian speaking children to correctly figure out whether their language allows long-distance wh-dependencies?

Akira Omaki was a 2010 PhD graduate from UMD who passed away in 2018. Read about his life here.

Discussion Note #S1 [1/30/23] Branigan & Pickering 2017 (“An experimental approach to linguistic representation”; see Readings) argues for the value of syntactic priming as a tool for understanding language structure. Try to address the following three questions: (1) The finding by Bock & Loebell (1990) about priming and by-phrases (see Section 2.1, p. 8) is among the most influential in this literature. Why so? Is this fame justified? (2) What do B&P mean by “The reality of linguistic representation” (p. 3)? (3) What is the role of the evidence from missing (“elided”) elements in Mandarin (p. 10)?

As is standard for articles in this journal, B&P’s target article is followed by a collection of short commentaries, which stake out various positions in support of or in criticism of the authors. You do not need to read all of them, but it can be fun to read a sampling of the opinions.


Assignments – Fall

Discussion Note #12 [12/5/22] For our final Discussion Note of the semester, we’ll look at a topic that sits at a number of intersections: (i) word and sentence processing, (ii) syntactic and semantic processing, (iii) time-based and task-based accounts of differences. It’s also a topic that has been exercising a number of minds locally in the past couple of years. (I would have been shocked if you told me that 10-15 years ago.)

Kim & Osterhout (2005) is an influential ERP study. Its main finding helped to upend received wisdom about the most prominent ERP responses in sentence comprehension. At least two other groups published roughly the same finding at around the same time. I think the K&O study has the most interesting design. K&O take their findings to have architectural implications for the relation between syntactic and semantic processing, specifically challenging the VERY standard assumption that semantic interpretation combines smaller meanings into larger meanings by using syntactic structure as a guide. QUESTION: K&O regard their findings in Experiment 2 (“the dusty tabletops …”) as especially important to their argument. Which aspect of their results is most important to their argument?
Chow et al. (2018) reports experiments carried out in Mandarin Chinese by Wing Yee Chow in 2011-2013 (yeah, the publication process was really arduous). Chow was looking at similar phenomena to Kim & Osterhout, but she was looking at them rather differently. She concluded that she was testing mechanisms of lexical prediction. Lee et al. (2022: it’s a 20-minute video of a talk, starts around the 27:00 mark) looked at very similar phenomena. They even used materials from some of Chow’s studies. But they were focusing on different measures. That was an impact of the pandemic. They reached a different conclusion than Chow did. QUESTION: which aspects of the Chow et al. and Lee et al. findings present the most challenge for the account provided by the other?
READING/WATCHING TIP. The three pieces are closely related enough that they build on each other, and on other things that we have read about ERPs and cloze tasks. So, rather than reading them in a linear and historical fashion, you might do well to start by getting an overview of the different studies, and then from there piecing together how they relate to each other and what the implications are. You might even start by watching the 20-minute video of Rosa Lee’s talk, and working back from there. When reading the articles, it’s often good to start with the abstract and the figures/tables, and work outwards from there.

Discussion Note #11 [11/21/22]. The review by Lau et al. (2008) is widely cited in support of the claim that the N400 ERP component reflects lexical processes. Since the N400 to an incoming word is known to reflect how well that word fits in context (often operationalized in terms of cloze probability), this in turn implies that lexical processes are directly influenced by context. (i) What is the evidence that N400 reflects lexical processes, and do you buy it? (ii) If true, can this be reconciled with the claim from behavioral studies (cross-modal priming, visual world eye-tracking) that lexical access in comprehension proceeds initially in a context-independent fashion?

(Note that you should feel no need to agree with the arguments of Lau et al. (2008). The authors themselves have questions.)

Discussion Note #10 [11/14/22]. Following our discussion of a model of single word production that has been developed over decades, we shift to production of words in context. This is a more complicated phenomenon, but one where interesting recent results are helping to clarify the specifics of the processes involved (and one where there is much current activity locally, aided in part by the pandemic). This is also a topic where thinking in terms of an explicit process model helps to better understand some widely used measures, such as ‘cloze probability’.

Staub et al. (2015) offer a study of a deceptively simple “speeded cloze” task, in which speakers simply call out a word to complete a sentence as quickly as possible. The key action in this paper involves the relation between the cloze probability of a response and its timing. It is not so surprising that high cloze responses are produced more quickly. What is more surprising, and hence informative, is the finding that responses of the same cloze probability (e.g., 20% cloze) are produced faster if the alternatives in that context are high cloze (e.g., 40%) than if the alternatives in that context are low cloze (e.g., 10%). In other words, when your competitors are stronger than you, you are produced more quickly. This is what you may hear locally referred to as the “Usain Bolt effect”. Question: what is going on here, and why does it favor a “race” model of word production in context over alternatives? (Related: cloze probability has long been used as a standard measure in psycholinguistics of how well a word fits a context. Why do some now regard speed of production as a better measure than cloze probability?)
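If it helps to make the “race” idea concrete, here is a minimal simulation sketch. It is not the model from the paper. It assumes, purely for illustration, that each candidate word has a normally distributed finishing time and that whichever candidate finishes first is the response. The function name `race_trials` and all of the numbers (means, standard deviation, trial count) are hypothetical. Note also that the sketch holds the target word’s own finishing-time distribution fixed rather than matching cloze probability across contexts, so it shows a selection effect on winning times and not the exact comparison that Staub et al. report.

```python
import random
from statistics import mean


def race_trials(target_mean, competitor_mean, sd=200.0, n_trials=20000, seed=1):
    """Simulate a two-candidate race between a target word and one competitor.

    Finishing times are drawn from normal distributions (an illustrative
    assumption; all parameter values are hypothetical).
    Returns (proportion of trials the target wins, mean target time on those trials).
    """
    rng = random.Random(seed)
    winning_times = []
    for _ in range(n_trials):
        target_time = rng.gauss(target_mean, sd)
        competitor_time = rng.gauss(competitor_mean, sd)
        # The target is produced only on trials where it finishes first.
        if target_time < competitor_time:
            winning_times.append(target_time)
    return len(winning_times) / n_trials, mean(winning_times)


if __name__ == "__main__":
    # Same target distribution, raced against a faster (stronger) or slower (weaker) competitor.
    for label, competitor_mean in [("strong competitor", 800.0), ("weak competitor", 1200.0)]:
        win_rate, mean_rt = race_trials(target_mean=1000.0, competitor_mean=competitor_mean)
        print(f"{label}: target wins {win_rate:.2f} of trials, mean winning time {mean_rt:.0f} ms")
```

In this sketch the target wins less often against the stronger competitor, but on the trials where it does win it must have been unusually fast, so its average winning time is shorter. Whether that is the right way to think about the actual data is for you to decide.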

Building on the Staub et al. (2015) study, a new paper by (current HESP postdoc) Tal Ness (Ness & Meltzer-Asscher 2021) makes an argument for more specific properties of the production of words in context. Ness & Meltzer-Asscher make the interesting claim that semantic similarity between word candidates has the opposite effect in a cloze task than it has in single-word tasks … but that those different effects have the same cause. Question: Try to explain in accessible terms why semantic similarity has these different effects in different tasks. Do you find this argument persuasive? The Ness & Meltzer-Asscher paper is short, but relatively dense. One of its interesting features is its use of explicit models of how words are activated in a cloze task, prior to utterance, and how this relates to prior models of lexical activation in single word tasks.

Discussion Note #9 [11/7/22]. Building on our discussion of picture-word interference effects on Weds 11/2, this week we will dig further into models of spoken word production. We will focus on one prominent model that has been widely used in computational simulations — Ardi Roelofs’ WEAVER++ model — and examine the empirical evidence for the model’s properties. Roelofs’ model implements a theory of word production that is the result of a monumental research program by Pim Levelt’s group at the Max Planck Institute for Psycholinguistics in Nijmegen, Netherlands. Spanning many years, and with the support of MPI resources, it is probably the most extensive project on speech production ever undertaken.

A chapter by Roelofs and Ferreira (2019) summarizes some key assumptions of the model on pp. 37-40, in particular a series of “major controversies” at the end of p. 38. Look into the evidence surrounding AT LEAST ONE of those controversies. Give a summary of what is at stake, how it impacts the model, and your assessment of the evidence. Since we will not all have read the same evidence, please come prepared to explain the topic/evidence that you chose in class.

Ardi Roelofs has a personal website that is useful, with a summary of WEAVER++ research and links to most of his papers. The ‘classic’ presentation of the model is in a Behavioral and Brain Sciences target article by Levelt, Roelofs, & Meyer (1999). (Meyer became one of the MPI directors following Levelt’s retirement.) Roelofs has published a number of subsequent papers with updates to the model, including illustrations of how it can capture word production difficulties in various different forms of atypical language.

Pim Levelt has a fascinating autobiographical piece in Annual Review of Linguistics, On becoming a physicist of mind (Levelt 2020). It includes the story of the process that led to Roelofs’ model. That story encompasses a lot of important figures in the history of psycholinguistics. Levelt himself was one of the founders of the field.

Discussion Note #8 [10/31/22]. This week we’ll try something a little different that ties together (i) psycholinguistic themes that we’re focusing on in this course, (ii) current national policy debates related to language science, and (iii) UMD campus efforts that are becoming much more prominent, starting as soon as Friday’s Language Science Day.

Sold a Story is a new podcast by Emily Hanford, a reporter based in our local area. It’s about educational practices around reading, especially controversies around the role of bottom-up information (word forms, phonics, etc.) and top-down information (context) in becoming a fluent reader. Well-meaning people on both sides of this issue believe that they are being guided by scientific evidence. The first two episodes have already been released, with the third due to appear on Thursday Oct 27th. The reporting frequently appeals to what the science shows, but it’s not so clear what the actual evidence is. Your assignment: listen to the first two episodes (I don’t yet know what is in the third), identify one or more claims about what the science shows, and try to find out what the actual basis for those claims is. Two examples: (i) fluent readers pay attention to all of the letters in a word, (ii) how a person is taught affects which areas of the brain they use to read.

One thing that is different about this assignment is that there is no presumption that we will have all read the same material beforehand. So we can’t work from the same shared starting point as in our normal discussions. So please come armed with evidence, and be ready to give a compact summary of what the evidence consists of.

Please feel very free to discuss and share ideas and sources with other group members. The podcast episodes are well produced, and easy to listen to. But transcripts of the podcast are also available and searchable: Episode 1, Episode 2. Some of the psycholinguists and cognitive neuroscientists invoked in the podcast include Keith Rayner, Mark Seidenberg, and Bruce McCandliss, all highly regarded. (I gather that some of Emily Hanford’s other reporting on reading has featured UMD researchers.)

Also, here is the (equally new!) piece in Science Magazine about education and scientific evidence that I mentioned in class. UPDATE: Oh, interesting: here is an even newer piece from the New York Times about a similar approach to avoiding online misinformation in teens. (Odd coincidence, it’s partly about the work of a new UMD faculty member who recently moved a few houses down the street from me. Small world!) Something that I find striking, interesting, and a bit worrying about both of these is the message, “It’s too hard to evaluate evidence yourself, so instead focus on the credibility of the messenger.”

Discussion Note #7 [10/24/22]. Mostly by chance, two recent UMD graduates examined the effectiveness of simple syntactic contexts in constraining lexical access. Phoebe Gaston et al. (2020) focus on comprehension. Shota Momma et al. (2020) focus on production. Both employ creative experimental designs. The results lead to apparently very different conclusions about the effect of the category constraint. The basic aim of this exercise is to explore possibilities for the contrast.

As a warm-up, to think through the designs, first address the following. (i) Phoebe Gaston originally worried that she was ‘scooped’ by Strand’s 2017 paper on a similar topic. She later decided that she was not. Why do Phoebe’s design changes matter? (ii) In Shota Momma’s design, why does it matter that participants are doing a mixed task?

Finally: (iii) Discuss possible reasons for the contrasting results. Different task details? Difference between speaking and understanding? Different measures? Something else? No real conflict?

The Gaston et al. paper is a long manuscript (submitted + reviewed; reviewers were enthusiastic but had some probing questions). The main experiment is on p. 34 onwards. The simulations in TRACE on pp. 18-34 are not essential, though you may find them interesting (we learned a lot from them). Phoebe’s discussion of mechanisms involved in guiding eye fixations on pp. 38-42 is very useful.

Discussion note #6 [10/10/22]. Read the short Science paper by Van Turennout et al. (1998) that makes an attention-grabbing claim about going “from syntax to phonology” in 40 milliseconds. The study is certainly clever, but the title is a bit of a simplification. Question: describe as accurately as possible what the 40 milliseconds corresponds to. Can we say what mental computations/processes occur in that time interval? Does that seem (im)plausibly fast? (Rough estimate: as a neural signal passes through different steps in the brain, it takes around 10ms per step, e.g., there are 6 steps between the cochlea and auditory cortex, and it takes 50-60ms for signals to travel from one to the other.)

Two things that it could be helpful to think about in reading the study: (i) What is the step-by-step mental process that a participant in the study has to go through in order to perform the task? (ii) When we refer to times in psycholinguistics, we often use single numbers as a summary of a lot of different numbers (from individual trials by individual participants). The underlying numbers always form a distribution. E.g., if we say that people in a sound discrimination task responded in 620ms, what we generally mean is that there is a distribution of response times for which 620ms reflects a central tendency of that distribution, e.g., average. Sometimes we instead use summary numbers to refer to the edge of a distribution, e.g., the earliest time at which something occurs, rather than the average time. When reading studies like Van Turennout et al., be mindful of what the time estimates refer to.
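To see how much the choice of summary number can matter, here is a small sketch that simulates a skewed distribution of response times and reports three different summaries of it. Everything here is an assumption for illustration: the function names `simulate_rts` and `summarize` are hypothetical, the distribution shape (a normal component plus an exponential tail) is just one common way to mimic response times, and the parameters were chosen only so that the average lands near the 620ms example above. Taking the 5th percentile as the “early edge” is likewise an arbitrary choice.

```python
import random
from statistics import mean, median, quantiles


def simulate_rts(n_trials=10000, seed=1):
    """Simulate right-skewed response times in ms (hypothetical parameters).

    Each time is a normal component (mean 500, sd 50) plus an exponential
    tail with mean 120, so the expected average is about 620 ms.
    """
    rng = random.Random(seed)
    return [rng.gauss(500.0, 50.0) + rng.expovariate(1 / 120.0) for _ in range(n_trials)]


def summarize(rts):
    """Return three different single-number summaries of the same distribution."""
    return {
        "mean": mean(rts),      # central tendency, pulled upward by the slow tail
        "median": median(rts),  # central tendency, less affected by the tail
        "early_edge": quantiles(rts, n=20)[0],  # 5th percentile: an 'earliest time' style estimate
    }


if __name__ == "__main__":
    for name, value in summarize(simulate_rts()).items():
        print(f"{name}: {value:.0f} ms")
```

The same set of trials yields noticeably different numbers depending on whether you report the average, the median, or the early edge, which is why it matters to check which one a paper is reporting.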

Discussion note #5 [10/3/22]. Our next discussion focuses on some ways in which researchers have used electrophysiological measures (EEG/MEG) to try to understand how speech sounds are mentally represented/encoded. Näätänen et al. (1997), Kazanina et al. (2006), and Pelzl et al. (2021) are all examples of studies that rely on comparisons of speakers of different native languages. But they each adopt a different logic to do this. Choose (at least) two of the three studies, and address the following: (i) How do they differ (if at all) in terms of the sound encodings that they are seeking to clarify, and the experimental logic for how they achieve this? (ii) What (if anything) is the “value added” in these studies from the use of electrophysiological (EEG/MEG) measures, rather than behavioral measures?

Two of the articles are short. The newest one is longer, and technically more up-to-date, but the core findings are fairly clear. A couple of clarifications. First, I am not seeking a specific answer on the second question. You might conclude that the more cumbersome method is beneficial or not. What I do want you to do is to think through the rationale of what one might hope to gain from the use of specific methods, as this is the kind of decision that one needs to make again and again in psycholinguistics. Second, although I was a co-author of one of the studies and the logic comes from an earlier study of mine (Phillips et al. 2000), feel free to challenge it. None of the authors are beholden to the claims made there.

Discussion note #4 [9/26/22]. For a number of years, Naomi Feldman and colleagues have been digging into the problem of how infant sensitivity to speech categories might develop, despite lack of minimal pairs and lack of clearly distinct acoustic distributions. The aim of this discussion note is to try to situate this body of work in the context of developmental changes that we have already discussed. In particular, is there a role for early word learning in figuring out speech categories, according to Feldman et al.? (Those views might be different in different papers.)

There are a few papers, which are linked to the Readings tab. You do not need to read and respond to all of them. Do read at least one of them, preferably Hitczenko & Feldman (2022), as it is brand new. Preferably look at at least one other. Feldman et al. (2009) adopts a very clear position on the importance of word learning, which may be at odds with later claims. (In that paper, if you can explain why Feldman argues that it’s easier to learn categories from a lexicon without minimal pairs, then you understand the main point.) That one is also the shortest. Of the two 2021 papers, the one in Open Mind is more accessible.

Each of the articles involves some amount of technical material about computational models, which may be more or less accessible to you. You do not need to follow all of that material in order to get the core ideas in the papers. (Also, I do not understand all of it, either. But feel free to discuss with classmates.)

Synthesis note #1 [9/21/22]. How accurate is it to say that infants have phonological category representations like older children and adults?

For purposes of this piece, “infants” can refer to any or all of ages from birth to 18 months. The main aim of this exercise is to think carefully about the relationship between behaviors and experimental evidence on the one hand, and the cognitive representations that are responsible for these behaviors on the other. Your write-up does not need to be long, but please explain as clearly as possible.

Discussion note #3 [9/14/22]: A very short paper by Stager & Werker (1997) reports four experiments on infants’ sensitivity to labels assigned to pictures. In one key experiment, 8-month-olds appear to “outperform” 14-month-olds. This seems counterintuitive. What is going on, and why is this an elegant demonstration? (The paper appeared in Nature, it has been cited over 1,000 times, and it has generated a lot of interesting subsequent work.)

Discussion Note #2 [9/7/22]: This discussion note is about Chomsky (1965, ch. 1, pp. 3-15) and Marr (1982, pp. 8-28, esp. 20-28). Question: Marr’s discussion of visual perception highlights the “goal” of the computation. Does linguistic computation have a “goal”? Does Marr’s view on levels of analysis align with Chomsky’s contrasting of competence and performance?

I recommend keeping in mind the distinction that we discussed on Wednesday (8/31) between (i) Levels of analysis, (ii) Tasks, and (iii) Mechanisms.

Discussion note #1 [8/31/22]: Whistled languages are a striking example of the adaptation of human speech to different environments. Generally they are not distinct languages, but versions of spoken languages that are conveyed in whistled form. A recent review of whistled languages from around the world by Julien Meyer reveals striking similarities in how languages adapt to the whistled medium. This is also summarized in a broad audience piece (with demos!) by Bob Holmes in Knowable Magazine. Please answer the following questions: (i) Why does whistling force languages to limit the information that is conveyed to the listener? Describe a couple of regularities in how languages choose to do this. (ii) Are there psycholinguistic implications of how languages adapt to the whistled medium? In particular, do the adaptations seem more suited to helping speakers or helping listeners (or neither)?

Introduction [8/30/22]: Please send an email to, introducing yourself and relevant background and motivations. I’d love to know the following:

  1. What is your background in psycholinguistics, in (in)formal study or in hands-on experience? What kind of background do you have in experimentation, including knowledge of experiment control platforms, e.g., PC-Ibex, PsychoPy, or statistics/analysis tools, e.g., R, Excel?
  2. What, briefly, is your linguistic background, including languages where you have expertise?
  3. What are the outcomes that you hope to gain from the course (either by December ’22 or by May ’23)?




These are links to the slides used in the course. But note that they include some things that were not discussed in class, and in many cases the slides do not do justice to our extensive discussions in class.

Set 1: Scene setting

Set 2: Development of speech perception

Set 3: Electrophysiology and speech perception



Readings – Spring

Introduction (Spring)

Bever, T. (2021). How Cognition came into being. Cognition, 213, 104761.

Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press. [chapter 1]

Branigan, H. & Pickering, M. (2017). An experimental approach to linguistic representation. Behavioral and Brain Sciences, 40, e282.

Overview: Putting Pieces Together

This series of articles lays out the current thinking of myself and colleagues on the relation between traditional linguistic theories and theories in psycholinguistics.

Lewis, S. & Phillips, C. (2015). Aligning grammatical theories and language processing models. Journal of Psycholinguistic Research, 44, 27-46. 

Momma, S. & Phillips, C. (2018). The relationship between parsing and generation. Annual Review of Linguistics, 4, 233-254. 

Phillips, C., Gaston, P., Huang, N., & Muller, H. (2020). Theories all the way down: remarks on “theoretical” and “experimental” linguistics. In press: G. Goodall, ed., Cambridge Handbook of Experimental Syntax. 

Omaki, A. & Lidz, J. (2015). Linking parser development to acquisition of syntactic knowledge. Language Acquisition, 22, 158-192.


Introduction (Fall)

Whistled languages – something completely different … maybe

Meyer, J. (2021). Environmental and linguistic typology of whistled languages. Annual Review of Linguistics, 7, 493-510.

Holmes, B. (2021). Speaking in whistles. Knowable Magazine, publ. 8/16/21.


Higher level background

Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press. [chapter 1]

Marr, D. (1982). Vision. Cambridge, MA: MIT Press. [excerpt]

Jackendoff, R. (2002). Foundations of language. Oxford University Press. [chapter 1, chapter 2, chapter 3, chapter 4]

Lewis, S. & Phillips, C. (2015). Aligning grammatical theories and language processing models. Journal of Psycholinguistic Research, 44, 27-46.

Momma, S. & Phillips, C. (2018). The relationship between parsing and generation. Annual Review of Linguistics, 4, 233-254.

Speech Perception, Learning Sound Categories

Stager, C. & Werker, J. (1997). Infants listen for more phonetic detail in speech perception than word learning tasks. Nature, 388, 381-382. [This is one of the primary readings for the section of the course on phonetic/phonological representations. A very short, but very important study. Why are younger infants better than older infants, even on native-language contrasts?]

Hitczenko, K. & Feldman, N. (2022). Naturalistic speech supports distributional learning across contexts. Proceedings of the National Academy of Sciences, 119, e2123230119.

Feldman, N., Griffiths, T., & Morgan, J. (2009). Learning phonetic categories by learning a lexicon. Proceedings of the 31st Annual Meeting of the Cognitive Science Society.

Feldman, N., Goldwater, S., Dupoux, E., & Schatz, T. (2021). Do infants really learn phonetic categories? Open Mind, 5, 113-131.

Schatz, T., Feldman, N., Goldwater, S., Cao, X., & Dupoux, E. (2021). Early phonetic learning without phonetic categories: Insights from large-scale simulations on realistic input. Proceedings of the National Academy of Sciences, 118.

Werker, J. (1994). Cross-language speech perception: Developmental change does not involve loss. In: Goodman & Nusbaum (eds.), The Development of Speech Perception. Cambridge, MA: MIT Press, pp. 93-120. [Useful for Lab 1. This paper reviews in more detail the reasons why Werker adopts a structure-adding view of phonetic development.]

Werker, J. (1995). Exploring developmental changes in cross-language speech perception. In L. Gleitman & M. Liberman (eds) Language: An Invitation to Cognitive Science, Vol 1 (2nd edn.), 87-106. [This paper is the best starting point for this section of the course. It presents an overview of Werker’s views on phonetic development up to 1995, including a straightforward study of her important cross-language experiments from the early 1980s.]

Werker, J. F., Pons, F., Dietrich, C., Kajikawa, S., Fais, L., & Amano, S. (2007). Infant-directed speech supports phonetic category learning in English and Japanese. Cognition, 103, 147-162. [Analysis of what infants actually hear. It is presented as an argument for unsupervised distributional learning, but I suspect that it shows the opposite.]

Cognitive Neuroscience of Speech Perception

Näätänen et al. 1997. Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature, 385, 432-434.

Kazanina, N., Phillips, C., & Idsardi, W. 2006. The influence of meaning on the perception of speech sounds. Proceedings of the National Academy of Sciences, 103, 11381-11386.

Pelzl, E., Lau, E., Guo, T., & DeKeyser, R. 2021. Even in the best case scenario L2 learners have persistent difficulty perceiving and utilizing tones in Mandarin. Studies in Second Language Acquisition, 43, 268-296.

van Turennout, M., Hagoort, P., & Brown, C. 1998. Brain activity during speaking: from syntax to phonology in 40 milliseconds. Science, 280, 572-574.

Word Recognition

An accessible introduction to some foundational concepts and findings: 

Altmann, G. 1997. Words and how we (eventually) find them. Chapter 6 of The Ascent of Babel. Oxford University Press. [A good introductory chapter.]

Some recommended readings for class discussion.

Chen, L. & Boland, J. 2008. Dominance and context effects on activation of alternative homophone meanings. Memory and Cognition, 36, 1306-1323.

Magnuson, J., Mirman, D., & Myers, E. 2013. Spoken word recognition. In D. Reisberg (ed.), The Oxford Handbook of Cognitive Psychology, pp. 412-441. Oxford University Press.

Gaston, P., Lau, E., & Phillips, C. 2020. How does(n’t) syntactic context guide auditory word recognition. Submitted.

Lau, E., Phillips, C., & Poeppel, D. 2008. A cortical network for semantics: (de)constructing the N400. Nature Reviews Neuroscience, 9, 920-933.

Federmeier, K. & Kutas, M. 1999. Right words and left words: electrophysiological evidence for hemispheric differences in meaning processing. Cognitive Brain Research, 8, 373-392.

Ness, T. & Meltzer-Asscher, A. 2021. Love thy neighbor: Facilitation and inhibition in the competition between parallel predictions. Cognition, 207, 104509.

Staub, A., Grant, M., Astheimer, L., & Cohen, A. 2015. The influence of cloze probability and item constraint on cloze task response time. Journal of Memory and Language, 82, 1-17.

Some seminal papers discussed in class.

Marslen-Wilson, W. 1975. Sentence perception as an interactive parallel process. Science, 189, 226-228.

Marslen-Wilson, W. 1987. Functional parallelism in spoken word recognition. Cognition, 25, 71-102.

Boland, J. & Cutler, A. 1996. Interaction with autonomy: Multiple output models and the inadequacy of the Great Divide. Cognition, 58, 309-320.

Dahan, D., Magnuson, J., & Tanenhaus, M. 2001. Time course of frequency effects in spoken word recognition: Evidence from eye-movements. Cognitive Psychology, 42, 317-367.

Kutas, M. & Federmeier, K. 2000. Electrophysiology reveals semantic memory use in language comprehension. Trends in Cognitive Sciences, 4, 463-470.