The US government recently formed an Interagency Working Group on Language and Communication (IWGLC) to look into the scope of research and development activity across around 15 different government agencies. This may sound like a total snoozefest, but it’s potentially a big deal.
We like to talk about how language is important for so many different parts of human life, and don’t feel like we’re always taken entirely seriously. Part of the problem is that language science is spread across so many different fields. Another part of the problem is that we language researchers don’t always see common purpose, so the world rarely gets to see the breadth of what we do.
So if a broad swath of US government agencies, ranging from the National Science Foundation to NASA to the State Department to the Department of Transportation, all get together to create a map of language science research, to identify shared interests, and to look for ways to better coordinate their efforts, this is quite encouraging.
The IWGLC produced a report. And they widely circulated a request for feedback from the language research community. Perhaps surprisingly, only 2 comments were submitted. One of them was from leaders of our Maryland Language Science community. The comments are public, but you’re probably not closely monitoring regulations.gov. So I’m posting our remarks here.
I really don’t expect you to read all of this. But it is good to know that this process is underway within the US government. We can all hope that it continues and is successful.
Response to RFI on Interagency Working Group on Language and Communication’s Report on Research and Development Activities
December 30th, 2016
From: Colin Phillips, Director, Maryland Language Science Center
Rochelle Newman, Chair, Dept of Hearing & Speech Sciences, LSC Associate Director
Michael Bunting, Acting Executive Director, Center for Advanced Study of Language
David Ellis, Executive Director, National Foreign Language Center
Maria Polinsky, Director, Maryland-Guatemala Field Station, LSC Associate Director
(all part of the University of Maryland)
These comments are submitted by a group of researchers from the University of Maryland, representing a diverse community of language science experts (from special education to electrical engineering). Our community’s R&D efforts cover most of the cells of the IWGLC taxonomy, from basic research to implementational R&D, and we have a long-term commitment to connecting the various pieces. We have experienced the benefits and the challenges of integrating these different areas, and we also have experience of coordinating with multiple government agencies for R&D on language and communication.
We read the IWGLC report with great interest, and have comments that fall into four categories: (i) support for the aims of IWGLC, (ii) specific suggestions on the taxonomy, (iii) need for integration of language R&D beyond US government agencies, (iv) specific examples that can benefit from increased coordination among US government agencies and other groups.
Support for IWGLC’s aims.
- A taxonomy that organizes language and communication R&D into a coherent landscape is valuable, and has been lacking to date. (We offer more specific suggestions below.) The taxonomy is hopefully a first step towards greater coordination among agencies.
- Language and communication R&D is a challenging area to integrate for many reasons. The same limited integration that IWGLC aims to overcome in the US government is also found in academic R&D on language and communication. Experts in education, engineering, neuroscience, and fieldwork, among others, receive different training, are housed in different parts of universities, publish in different venues, and often have little contact with one another. Widespread preconceptions about language, among the public, policymakers, and researchers, create further barriers to integration. A number of universities in the US and globally (ours included) are now making serious efforts towards greater integration in “language science”. So the IWGLC’s efforts parallel recent trends in academic research that are still in the early stages.
- A further barrier to integration of language & communication R&D is the silos found in K-12 education, where language is often pigeon-holed as an arts/humanities field, and its relevance to computer science, human biology/psychology, etc. is overlooked. These silos impact the creation of a well-prepared language R&D workforce.
- We welcome the proposal to re-charter the IWGLC for further coordination of federal efforts. But we question the proposal to house it wholly under the Subcommittee on Social, Behavioral, and Economic Sciences, given the broader reach of language & communication research. In our own academic setting we have had success in overcoming traditional silos for language research by having an umbrella unit that reports to two different divisions of the university (among the 6 broad academic divisions that have a stake in language research).
Specific comments on the proposed taxonomy.
The taxonomy is an excellent first pass at organizing the domain of language science. However, we noted some areas that were either missing or underrepresented. A risk of a taxonomy like this one is that certain areas may be overlooked because they are hard to fit into a single category.
- Health-related aspects of language seem to be underrepresented in the taxonomy, and they are split across major categories. Surprisingly, acquired language disorders, e.g., aphasia, appear in the “knowledge and processes” category, whereas developmental language disorders appear in the “abilities and skills” category. Research on hearing is underrepresented, though it is certainly an area that receives federal R&D funding. Hearing is critically tied to language comprehension, and hearing difficulties are especially important in light of the aging population (and government workforce), and the large post-military workforce. Research increasingly shows that hearing impairments are not limited to peripheral sensory damage, and that higher-level language processes play an important role in the ability to hear well.
- Creation of resources that serve multiple areas of language R&D is hard to fit into the taxonomy. For example, aggregation and integration of text corpora, recordings, analyses, diagnostics, and pedagogical materials for hundreds or thousands of languages is valuable for multiple fields of research, but it does not easily fit the taxonomy. This type of R&D partly falls under the “language documentation” heading discussed in Section 5.1 of the report, but it is broader than standard documentation.
- Two increasingly important areas of language learning research that are underrepresented in the taxonomy are heritage languages and “L3” (and beyond). Tens of millions of Americans are speakers of a heritage language, i.e., a language spoken at home and learned early, but later replaced by English as the dominant language. These speakers are a potentially enormous reserve of foreign language expertise for the US, but too little is currently known about heritage language abilities and how they can be maintained or revived across the lifespan. Meanwhile, the ability to continue learning new languages across the lifespan should hold much interest for the government and commercial sectors. In a fast-changing world it is increasingly valuable for humans to be able to adapt to work in new settings. To date, research on so-called “L3 acquisition” has focused on questions such as interference from the second language, but many other questions arise about lifelong flexibility, i.e., the transferability of language and culture training from one language to another.
- We note that the IWGLC report states that DOD has no R&D for K-12 language education. However, it does sponsor programs that support training in critical languages for K-12 students. For example, the StarTalk program (startalk.umd.edu), administered by the National Foreign Language Center at the University of Maryland, is an NSA-supported program that aims to increase the number of US citizens learning, speaking, and teaching critical-need foreign languages.
- The word “noise” does not appear in the report or in the taxonomy. The impact of adverse listening conditions (including both noise and degraded signals) on successful language comprehension is relevant to many issues facing government workers, military personnel, and technological solutions. It represents an important area of government-funded research, but is not included in the taxonomy.
- Many of the topics identified in the taxonomy become more complicated when extended beyond one or two languages. The impact of linguistic and cultural diversity is underrepresented and relevant across much of the taxonomy. For example, the notion of “finding the best ways to communicate critical information about health and safety… to the public” appears to focus on the best use of language for public information campaigns. Yet the concerns are broader when the information needs to be communicated to less familiar language communities or cultures, as might be the case following a natural disaster in another region of the world.
- Moreover, while the report notes that humanitarian relief efforts can be stymied by language differences, this might not do justice to the full extent of the difficulties; even identifying the greatest needs relies on being able to communicate with speakers of diverse and often poorly documented languages. And for communicating with populations that speak multiple languages, i.e., most of humanity, it is important to understand the relative effectiveness of communicating via a lingua franca vs. a home language. This relates to an issue that the report does a good job of highlighting: the difference between merely communicating information vs. communicating information that is trusted and acted upon.
- Language learning issues are divided in the taxonomy between implicit/naturalistic learning (under “knowledge and processes”) and explicit/classroom learning (under “abilities and skills”). This accurately reflects standard practices, but important new lines of research do not fall readily into one of the two categories. There is much current interest in how differences in learners’ naturalistic language experience impact their linguistic and educational outcomes (variously under headings such as “30 million word gap” or “language poverty”), and in how caregivers and teachers can modify their interaction to facilitate learning. These efforts focus on naturalistic interaction and implicit learning mechanisms, but with intentional modification of learner experiences. Relatedly, the impact of media and social media on early education cuts across the primary categories in the taxonomy.
- Issues of within-language diversity (dialects) feature in the technology section of the taxonomy, but they are another issue that cuts across much of the taxonomy. For example, dialect mismatches between home and school language create additional challenges for early learning and literacy. Dialect differences can also have important implications for establishing trust in communications.
- The broad category on language technologies highlights applications for language learning, but does not include applications for health, whether for language-related health issues (“language as health”) or for broader health issues (“language for health”). This includes the role of technology in diagnosis, prevention, and health monitoring, and potentially extends across many languages.
- The language technology section of the taxonomy focuses on fully automatic processes involving language technology, i.e., situations where technology replaces humans. But there is a rapid increase in language technology research on human-computer collaboration, i.e., how to effectively divide workflows between things that humans and computers do best. Sample applications include triage of large amounts of data for DOD analysts or the role of human experts (even monolingual experts) in machine translation workflows, especially for low-resource languages. This can be seen, for example, in DARPA’s recent LORELEI program.
- The taxonomy highlights automatic translation and interpreting, but does not include research on human translation and interpreting.
- There is a presumption in the report that early learning is always better. It is clear that naturalistic language learning is easier for young children than for adolescents or adults. But it is less clear that this advantage fully extends to explicit/classroom language learning. Much educational policy on foreign language instruction presumes the early advantage, but there is little research in the US on whether this advantage exists, or whether it exists for all aspects of language and all learning settings. Some research in other countries questions the early advantage for explicit learning.
- The role of scripts/orthographies in language and communication is underrepresented. Different types of writing systems present challenges and opportunities for human and computer communication.
The need to integrate language science R&D extends beyond US federal agencies.
- We applaud the proposed steps for greater coordination among federal agencies, but achieving the IWGLC’s goals will require activities that bring research fields together, not just federal agencies.
- It is important to facilitate development of a workforce that is equipped to carry out R&D that bridges the different areas of the taxonomy, especially connecting basic and applied research. Agencies such as NSF have had training programs that aim to strengthen interdisciplinary research, e.g., IGERT and NRT. These programs are successful, but (i) they have focused on connecting different areas of basic research, rather than bridging basic and applied research, and (ii) there has been only limited attention to the scalability of the training innovations beyond the programs that the government funds.
- Efforts towards early integration of fields in K-12 could make a big difference to the availability of a workforce that is equipped to work across different areas of the IWGLC taxonomy. For example, there is much current interest in early STEM education, but language and communication are typically not included, because language in K-12 is generally pigeon-holed in a specific part of the curriculum. We are involved in various local and national efforts to raise the visibility of language research among K-12 children. A common reaction that we encounter is pleasant surprise that language can be explored scientifically.
- The report focuses on language & communication R&D in the US. This is understandable, but limiting. Language R&D has a necessarily global character, and many problems cannot be solved in the US alone, or in any single country. Expertise in language and languages is broadly distributed. Federal agencies can contribute to solutions by lowering barriers to international collaboration and integration of knowledge across national borders. Also, in some fields the US could benefit from greater attention to successful models in other countries, e.g., in foreign language education.
- We welcome the plan to expand joint funding mechanisms that span multiple federal agencies, but suggest that joint program announcements not be the exclusive vehicle for this. There is language and communication R&D that should be of interest to many different agencies, but that could not feasibly be covered by a planned multi-agency program announcement. Such joint programs can lead to missed opportunities, because agencies say “We’re collaborating!”, but researchers respond that “You’re telling us to collaborate in one specific way!”. Mechanisms that allow multiple agencies to contribute to an R&D effort without cumbersome multi-agency review could be valuable.
- The IWGLC report emphasizes lack of redundancy in existing programs. This can be valuable for preserving budgets, but it suggests a climate where agencies are disincentivized to identify shared needs and to cooperate, because this could create risks of cuts.
Specific examples of overlapping interest, and challenges in supporting them
Here we provide just a few examples of research themes or initiatives that serve multiple interests, and the challenges that exist in supporting them under current arrangements.
- There is a need for closer connections between literacy and speech/hearing sciences. Learners who have speech/hearing difficulties often have literacy challenges, too. And the two types of difficulties can be confused with one another, e.g., because literacy is often assessed through oral reading. Yet the two research areas are largely independent. Researchers are trained in separate fields, K-12 schools employ different specialists, and services are often poorly coordinated. The report highlights that IES and NIH have complementary interests in this area, but this promotes the existing disconnect.
- There is broad interest in electronic resources that aggregate, integrate, and elicit information on the world’s 6000+ languages. Such resources are valuable to multiple agencies, to academia, and to the public. But this broad interest creates funding challenges, as we have experienced in coordinating the Langscape project. Langscape is an online portal for language diversity (langscape.umd.edu). It started as a DOD-internal project, with a focus on in-house data collection, then was later turned into a public-facing project, managed by the University of Maryland, with a focus on integrating existing worldwide expertise in an accessible format. Working through a university facilitates global collaboration, and the DOD can still maintain a classified mirror (the university researchers have no role in this). This is a smart strategy for partnering with academia to create resources that the government could not create on its own, at a relatively modest cost. It holds interest for multiple agencies, but it has proven difficult to support, because individual agencies understandably do not want to shoulder the burden of funding projects that serve multiple agencies. A mechanism for joint funding could benefit multi-use projects such as this.
- Language and communication in noise and signal degradation are serious concerns in many different domains, from military telecommunications to speech recognition, to school environments, to aging. The impact of noise on successful communication is even greater in multilingual societies, since comprehension is more fragile in a non-native language. But there is limited sharing of R&D resources and expertise across these different application areas.
- More generally, it is challenging for teams of researchers to pursue end-to-end solutions via government support, because they generally span diverse agencies. For example, efforts to improve child learning through rich caregiver interactions require coordination between experts in child development, linguistics, education, language technology, behavioral feedback, and more. They require a mix of all four types of research in the IWGLC taxonomy (basic, translational, applied, implementation). But it would be difficult to mount such a coordinated effort without substantial coordination between different government and non-government agencies.