Linguistic Illusions Seminar (HT 24)

Graduate Seminar in Linguistic Illusions
Colin Phillips
Hilary Term 2024
Tuesdays, 11:30 – 1:00, Jan 16 – Mar 5, Clarendon Institute Lecture Theater

In vision science there is a long tradition of using perceptual illusions to understand the inner workings of the visual system. A rapidly expanding body of findings about linguistic illusions — about where they occur and where they do not occur — can serve a similar function for understanding the human language system. Linguistic illusions arise when an individual’s perception or production systematically mismatches with their own internal grammar. They can be measured using a wide range of different experimental approaches. At the word and sentence level, illusions have been documented in diverse phenomena such as agreement, anaphora, negative polarity, argument roles, comparatives, and lexical access. Findings from an increasingly broad range of languages show that linguistic detail matters a great deal in controlling vulnerability to illusions. The systematic variability across phenomena and languages is unlikely to be learned, so it more likely reveals important clues to how linguistic representations are encoded and accessed. However, (psycho)linguistic theories have struggled to keep up with the growth in new findings. Computational models play an increasingly prominent role in this field, and illusion-sensitivity has been used as a test of how human-like the abilities of Large Language Models are.

This is an interactive graduate seminar, covering topics in psycholinguistics, syntax, morphology, and semantics. In order to facilitate discussions, participants will be strongly encouraged to read one or more assigned papers ahead of each session, and to submit brief written answers to questions designed to encourage synthesis of ideas and findings. This seminar aims to develop skills in thinking and writing about how to design experiments on specific topics that can serve broader psycholinguistic theories.

In case of accessibility needs, please contact Colin Phillips for information on how to join remotely. Otherwise, in-person participation is strongly encouraged, to facilitate discussion.


Week 1

Theme: setting the scene; why we care; early findings (up to ~2010).

Recommended reading for Week 1: Phillips, Wagers, & Lau (2011). Grammatical illusions and selective fallibility in real-time language comprehension. In J. Runner (ed), Experiments at the Interfaces, Syntax & Semantics, 37, 148-180.

This article is an early synthesis of what we knew on this topic around 2010. Since that time, many new things have been learned, and many new questions have arisen. That will be the focus for the remainder of the term.

Slides: Illusions Week 1 (we did not cover all of this, and not in order)

Week 2

Theme: agreement attraction

Why we care: we focus on two main reasons here.

(1) Agreement attraction effects like “the key to the cabinets are on the table” have played an important role in guiding how we think about memory encoding and access. In particular, the “grammatical asymmetry” in agreement attraction in comprehension has been a key piece of evidence in favour of cue-based retrieval in content-addressable memory (cf. Wagers, Lau, & Phillips, 2009).

(2) More recent research has revealed strikingly varied agreement attraction profiles in different languages, as a function of syntactic and morphological variation. This variability likely provides important clues about how linguistic detail is encoded and accessed.
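As a rough illustration of the cue-based retrieval idea in (1), here is a toy sketch in Python. It is not the model from Wagers, Lau, & Phillips (2009): the feature labels ("subject", "singular", "plural") and the scoring rule (a simple count of matching cues) are assumptions made purely for illustration. The point is only to show why a partial-match account predicts attraction in ungrammatical sentences but not in grammatical ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A noun phrase held in memory, with hypothetical feature labels."""
    word: str
    features: frozenset


def match_scores(cues, memory):
    """Count how many of the verb's retrieval cues each chunk matches."""
    return {chunk.word: len(cues & chunk.features) for chunk in memory}


# Memory contents after reading "The key to the cabinets ..."
memory = [
    Chunk("key", frozenset({"subject", "singular"})),
    Chunk("cabinets", frozenset({"plural"})),
]

# Grammatical: "... is on the table" (the verb looks for a singular subject)
grammatical = match_scores(frozenset({"subject", "singular"}), memory)

# Ungrammatical: "... are on the table" (the verb looks for a plural subject)
ungrammatical = match_scores(frozenset({"subject", "plural"}), memory)

print(grammatical)    # {'key': 2, 'cabinets': 0}
print(ungrammatical)  # {'key': 1, 'cabinets': 1}
```

In the grammatical case the true subject matches every cue and the attractor matches none, so there is little room for interference. In the ungrammatical case no item matches fully, and the attractor ties with the subject on a partial match, which is one way to think about why attraction shows up mainly in ungrammatical sentences.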

Discussion Note: we will look at 3 recent studies on agreement attraction effects in languages that show richer morphological agreement paradigms than English. (a) Slioussar 2018 – in Russian, singulars that share a surface form with plurals can trigger plural attraction. (b) Bhatia & Dillon 2022 – in Hindi, abstract properties of the split ergative case/agreement system modulate attraction effects. (c) Chromy et al. 2023 – Czech speakers seem to be immune to agreement attraction effects.

Prompt: Look at the results of 2 of these 3 recent articles. These studies show different attraction profiles in each language, e.g., based on surface form, abstract structural properties, or no attraction. Why are the findings surprising? How could they be simultaneously true? This is an invitation to speculate about how representations and access mechanisms combine to yield different kinds of sensitivity. Please send brief written thoughts (e.g., less than a page) by 10:30am on Tuesday, to

Slides: Illusions Week 2 (we did not cover all of this, and not in this order)

Week 3

Theme: agreement and anaphora

This week we will continue the discussion from Week 2 of systematic cross-language variation in agreement attraction effects, and then turn to systematic variation in how successfully speakers and comprehenders resolve anaphoric dependencies involving pronouns, reflexives, etc.

Why we care: a priori, anaphoric relations present interesting challenges for real-time linguistic computation. An uneven profile of success and errors/delays suggests that speakers and comprehenders do not always overcome those challenges. (i) Subject-verb agreement and subject-reflexive licensing appear to require agreement with the same item in memory, yet agreement is more vulnerable than reflexive licensing. This contrast suggests either that different mechanisms are involved or that computations occur at different times. (ii) Some constraints on anaphora make reference to c-command. Such relational notions should be hard to capture in a content-addressable memory architecture, yet humans are rather successful. This suggests that something is wrong in our understanding. (iii) We see varying levels of errors in generating appropriate forms of pronouns and reflexives. This pushes us to explain why different computations are differentially vulnerable. (iv) As a bonus, there is a striking parallel between children’s offline interpretations and adults’ fleeting online interpretations.

Discussion Note: The pandemic pushed our group to learn how to do speech production studies online. This worked much better than expected, allowing us to efficiently collect and analyze large amounts of data on what people say and on when they say it. One of the first things that we did was follow up on an unexpected finding from earlier in-lab studies on production of agreement and anaphora. Speakers sometimes say the wrong pronoun, even in situations where they know exactly what they are trying to refer to! Take a look at (i) Kandel & Phillips 2022, a paper on agreement and anaphor production, and (ii) Wyatt, Kandel, & Phillips, 2021, a presentation on pronoun production. Read the slides, or watch the video (starts around 33:00). Across these studies we see (a) lots of agreement errors, and evidence for attraction even in trials where speakers get agreement right, (b) systematic pronoun errors, but at low rates, and (c) very rare reflexive errors. What could be some mechanisms that could capture this quantitative variability in disruption? Please send brief written thoughts (e.g., less than a page) by 10:30am on Tuesday, to

Slides: Illusions Week 3 (we did not cover all of this, and not in order)

Week 4

Theme: quantifiers and negative polarity

Why we care: the discovery of Negative Polarity illusions around 20 years ago, coinciding with a growth in interest in memory access mechanisms, played a key role in triggering broader research on linguistic illusions. There have been two main puzzles. First, based on what we know about memory access on the one hand, and about NPIs and quantification on the other hand, these illusions shouldn’t occur. Second, subsequent research has uncovered a number of interesting contrasts in susceptibility to illusions with NPIs and quantifiers.

Discussion Note: first, thanks for the very interesting discussion notes that you have sent over the past couple of weeks! This week we will take as a starting point a recent paper by Orth et al. (2021). The headline finding there is that “not” does not cause NPI illusions. Then you are encouraged to choose one of two other studies that presents an interesting contrast to the Orth et al. findings. Option (i): Yanilmaz & Drury (2017) document a variety of NPI illusions in Turkish, where sentential negation does seem to cause illusions. (I think similar effects have been found in Korean.) Is the evidence comparable in the two languages, and what might be responsible for the contrast? Option (ii): Hanna Muller’s 2022 PhD dissertation is the most thorough investigation of English NPI illusions to date. One of the nice features of Muller’s work is that she tested some effects many times over, making it possible to see the robustness of the effects. Muller and her collaborators (Iria de Dios Flores and myself) agree with Orth et al. that sentential negation doesn’t trigger NPI illusions, but they take a different view of why this happens. Why do they disagree … and should they be uncomfortable with the apparent robustness of this (non-)effect? (Full disclosure: they do find it worrying.) Hanna’s dissertation is FAR too long to read in one go. So I would recommend looking at her useful summaries, and then just sampling some of her experimental designs and conclusions. A good starting point is the two summary figures 6.1 and 6.2 on pp. 195-196. The first of these shows the basic NPI illusion effect, and the second summarises 10 different experiments that tested the contrast between quantificational vs. sentential negation. Please send brief written thoughts (e.g., less than a page) by 10:30am on Tuesday, to

Slides: Illusions Week 4; also slides from Muller & Phillips 2021 talk (Frankfurt Negation workshop)

Week 5

Theme: Argument role reversals and comprehension-production relations 

First an apology. Sorry that I am posting this late on a Friday evening. All The Things were happening this week! 

Why we care: argument role reversals like “the cop that the thief arrested” have generated pockets of interest in different areas of psycholinguistics over the past 50 years, especially when they lead to actual misinterpretations in children, aphasic patients, or other specific groups.

They attracted a lot more attention in the electrophysiology literature in the 2000s when a number of different groups more or less simultaneously discovered that these semantically anomalous strings do not elicit N400 effects (contrary to expectations), and instead elicit P600 effects typically associated with syntactic anomalies (e.g., Kim & Osterhout 2005; Kuperberg et al. 2003; Kolk et al. 2005). This fueled an interest in syntax-independent interpretive mechanisms, and in faulty prediction mechanisms (Chow et al. 2016, 2018). 

Very recently, my collaborators and I became very interested in an apparent divergence between speaking and understanding. Rapid comprehension measures seem to indicate fleeting insensitivity to argument roles in comprehension. But production measures, e.g., using a speeded cloze task, indicate a high degree of sensitivity. This contrast proved irresistible to us, so it has been one of our main areas of interest since the start of the pandemic. It has led us to pay far closer attention to task details than we ever did before.

Discussion Note 5: As a change, this week’s primary “reading” is a short video, from a talk that Rosa Lee gave at the 2022 Human Sentence Processing Conference. Rosa’s talk starts at 27:00 into the video. It reflects our efforts to understand the comprehension-production contrast as of a couple of years ago. As an optional secondary reading, look at a paper from our group by Masato Nakamura et al. 2024, which is so new that we finished it just today, though the experiments are a couple of years old. Rosa and Masato’s takes on the comprehension-production contrast are not the same. 

Question: subsequently, Rosa Lee ran an EEG version of the interleaved comprehension-production design, in which she measured ERPs at the verb in the comprehension trials. What do you think she expected to find? Explain your rationale. Optional extra: if you also looked at the new Nakamura et al. paper, what do you think Nakamura would have expected in Rosa’s interleaved EEG study? Try to work through these predictions before looking at what Rosa actually found (poster presentation from HSP 2023 conference). Do you find the results surprising in light of your predictions?  

(Why am I asking this in this way? Because I think it’s a useful exercise for working through the logic of experiments, and in particular for spelling out the assumptions that you are making.)

Slides: Illusions Week 5 (we did not cover all of this!)

Week 6

Theme: comparatives, negation, substitution illusions, and more.

I am not currently planning a discussion note assignment for this week. Reason: we have a little catching up to do, and this week’s new material is topically scattered. However, I would encourage you to “nose around” the literature a little, to get a feel for what is out there and what is(n’t) known. In domains like agreement, negative polarity or role reversals we can say, “Here is a cool set of generalizations that we have established about how to modulate this illusion”. In the case of these other phenomena, we are not yet at that point, though progress is happening. I will update this section of the page in the next couple of days with papers, but here are a couple to whet your appetite.

Wellwood et al. 2018. This is an early experimental attempt to understand comparative illusions (“More people have been to Oxford than I have.”)

Zhang et al. 2023. A “noisy channel” approach to Depth Charge illusions (“No head injury is too trivial to ignore.”) This is a paper from Ted Gibson’s group at MIT. They have been arguing that a noisy channel approach explains pretty much everything. You be the judge.

Huang & Phillips 2021. A Mandarin counterpart of the “Missing VP” illusions documented in European languages. In Mandarin it’s a “Missing NP Illusion”.

Week 7

Theme: Plenty about humans, and some about large language models (LLMs)

Why we care: Large language models (LLMs) are very good at language … by some measures. Many people have begun to explore how “humanlike” LLMs’ language abilities are, partly because they are curious about the models in their own right, and partly because they want to know whether the models can inform us about how humans work. Much of this work has focused on whether LLMs can emulate the things that humans do very well. But there’s also interest in whether LLMs show the same limitations as humans. This is where LLMs meet the study of linguistic illusions.

Discussion Note 5: A short paper by Zhang, Gibson, & Davis (2023) tests whether LLMs show human-like susceptibility to 3 different kinds of linguistic illusions. (i) Do you think that the specific measures that Zhang et al. choose (perplexity and surprisal) are good (or bad) proxies for the kinds of measures that we use to measure illusions in humans? (ii) What role, if any, do you think that LLMs could play, now or in the future, in helping us to understand why humans are selectively susceptible to linguistic illusions? The first of these questions encourages you to think about what the researchers are actually doing. The second encourages you to think more integratively about what we have been trying to figure out in this seminar.

Also, this theme is intended to help whet our appetites for next Saturday’s workshop on Linguistics and NLP, at Magdalen College.
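For question (i) in this week's discussion note, it may help to have the standard definitions of the two measures in front of you. The sketch below uses the textbook formulas: surprisal is the negative log probability of a word given its context, and perplexity is 2 raised to the mean per-word surprisal in bits. The probabilities are invented for illustration and do not come from any model or from Zhang et al.; this is not their analysis pipeline.

```python
import math


def surprisal_bits(prob):
    """Surprisal of one word, in bits: -log2 p(word | context)."""
    return -math.log2(prob)


def perplexity(probs):
    """Perplexity of a word sequence: 2 ** (mean surprisal in bits)."""
    mean_surprisal = sum(surprisal_bits(p) for p in probs) / len(probs)
    return 2 ** mean_surprisal


# Hypothetical per-word probabilities that a model assigns to a 3-word string
word_probs = [0.5, 0.25, 0.125]

per_word = [surprisal_bits(p) for p in word_probs]
print(per_word)                # [1.0, 2.0, 3.0]
print(perplexity(word_probs))  # 4.0
```

Note that surprisal is a word-by-word measure, whereas perplexity summarises a whole string. That difference is worth keeping in mind when comparing these measures with word-level reading times or whole-sentence acceptability judgments in humans.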

Week 8