A Faster Way to Create Research Videos for Children

MIT researchers develop automated pipeline that generates child-friendly audiovisual stimuli in minutes instead of weeks
[Image: cartoon of a crablike animal with spikes and bulgy eyes, one of the animated animals used in the videos. Credit: Bianca Santi and Halie Olson]

Developmental neuroscientists face a persistent challenge: creating engaging video content to hold children's attention during brain imaging experiments. Producing dozens or hundreds of customized videos—with precisely controlled audio and visual characteristics—has traditionally been tedious, expensive, and time-consuming, often requiring labs to hire professional animators or to spend weeks on manual production.

Researchers in Professor Ev Fedorenko’s group have developed an automated pipeline that accelerates this process significantly. By combining off-the-shelf tools for speech synthesis and animation, they've created a system that generates polished, child-friendly audiovisual stimuli from simple text input in minutes per stimulus.

“We were starting up a study to compare children's brain responses to large language models, and we needed to create hundreds of well-controlled, child-friendly audiovisual stimuli,” explains Dr. Halie Olson, a postdoc in the Fedorenko lab. Her initial strategy was to engage summer UROP students to make the stimuli one at a time. “Instead, they asked, ‘Can we think about how to automate this process?’”

Bianca Santi ’25, the lead author on the paper, is now a graduate student in psychology at Princeton University. She says, “The stimulus pipeline project started out as a way for us to create many child-friendly stimuli for an fMRI project, but for me it became much more than that. As we realized that the tools we used and created for our own stimuli could be useful to other researchers as well, validating and documenting the pipeline became its own spinoff project, which I was grateful to have the opportunity to lead.”

Streamlining the Process

The pipeline, described in a paper published in Developmental Cognitive Neuroscience, works in two streamlined steps. First, Google Cloud Text-to-Speech converts written sentences into audio files in less than a second per stimulus. Second, Adobe Character Animator automatically creates mouth movements synchronized to the audio and generates an animated video of an on-screen character “speaking” the text.
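As a rough sketch of the first step, generating one audio file with Google Cloud's Python Text-to-Speech client might look like the following. The voice name, speaking rate, pitch, and example sentence are illustrative placeholders, not the settings used in the study.

```python
# A minimal sketch of the audio step, using the Google Cloud
# Text-to-Speech Python client. Voice name, rate, and pitch here
# are illustrative placeholders, not the paper's exact settings.
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

def synthesize_sentence(text: str, out_path: str) -> None:
    """Convert one written sentence into a WAV file."""
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(
            language_code="en-US",
            name="en-US-Wavenet-F",  # hypothetical child-friendly voice
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.LINEAR16,  # uncompressed WAV
            speaking_rate=1.0,
            pitch=0.0,
        ),
    )
    with open(out_path, "wb") as f:
        f.write(response.audio_content)

synthesize_sentence("The little crab found a shiny shell.", "stimulus_001.wav")
```

The second step then feeds audio like this into Adobe Character Animator, whose automatic lip-sync produces the matching mouth movements for the on-screen character.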

For the researchers' test case, the full pipeline produced each finished video in approximately 1.5 minutes. Error rates remained low: fewer than 2 percent for audio and roughly 9 percent for video, and many of the video errors were minor issues, such as slight character positioning problems, that could easily be fixed.

To put this in perspective, manually producing the 880 videos the team needed for their fMRI study would have taken a researcher several weeks of dedicated effort. With the automated pipeline (at roughly 1.5 minutes per video, about 22 hours of processing for 880 videos), the full set was ready in just a couple of days.

“We realized we could leverage recent technological advances like improved text-to-speech generation to not only speed up the process but also give us additional control over the outputs. We also realized that these tools would be valuable for other developmental cognitive scientists, too; stimuli creation can be a big bottleneck in the research pipeline,” says Olson.

Standardization with Flexibility

Beyond accelerating video creation, the pipeline provides standardization. When the researchers compared audio generated by their system to traditionally recorded speech, they found that the computer-generated stimuli were significantly more consistent in pitch and other acoustic properties. This standardization is important for reducing confounds in research, ensuring that differences in brain responses reflect the experimental manipulation rather than uncontrolled variations in how sentences were spoken.
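As an illustration of how such consistency might be quantified (a sketch, not the analysis reported in the paper), one could extract a pitch track from each audio file with an off-the-shelf library such as librosa and compare variability across the stimulus set:

```python
import numpy as np
import librosa

def mean_f0(path: str) -> float:
    """Estimate the mean fundamental frequency (pitch) of one audio file."""
    y, sr = librosa.load(path, sr=None)
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    return float(np.nanmean(f0))  # average over voiced frames only

# A lower spread of mean pitch across stimuli indicates a more
# acoustically consistent set. File names here are placeholders.
pitches = [mean_f0(p) for p in ["stimulus_001.wav", "stimulus_002.wav"]]
print(f"SD of mean pitch across stimuli: {np.nanstd(pitches):.1f} Hz")
```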

The pipeline is highly customizable. Researchers can choose from more than 220 prebuilt voices in more than 40 languages, adjust speech rate and pitch, select from various animated characters, and control visual details like character movement and background. Users can also randomize certain features within specified ranges, enabling sophisticated study designs that would be impractical to implement by hand.
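For example, drawing each stimulus's speech rate and pitch from experimenter-specified ranges could look like the following sketch; the bounds and seed are made up for illustration, not taken from the paper.

```python
import random

# Hypothetical per-stimulus randomization of voice parameters within
# experimenter-specified ranges (bounds and seed are illustrative).
RATE_RANGE = (0.9, 1.1)    # speaking-rate multiplier around normal speed
PITCH_RANGE = (-2.0, 2.0)  # pitch offset in semitones

def sample_voice_params(rng: random.Random) -> dict:
    """Draw one set of voice parameters for a single stimulus."""
    return {
        "speaking_rate": rng.uniform(*RATE_RANGE),
        "pitch": rng.uniform(*PITCH_RANGE),
    }

rng = random.Random(42)  # fixed seed keeps the stimulus set reproducible
params_per_stimulus = [sample_voice_params(rng) for _ in range(880)]
```

Parameter sets sampled this way could be passed straight into the audio configuration shown in the earlier synthesis sketch, giving controlled variation across the stimulus set.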

Either component of the pipeline can be used independently. A team studying infant-directed speech, for example, might use human-recorded audio with the animation component. Or researchers might use the audio generator alone for studies that don't require video.

Accessible Research Tools

The pipeline makes sophisticated stimulus creation accessible to labs without specialized animation expertise or large budgets. “This was a great example of research needs driving methodological innovation. We didn't set out to create a pipeline,” says Olson, “but rather realized along the way that we could leverage existing tools to make our research better. It was also a true team effort that drew on the strengths of multiple lab members, including undergraduate researchers like Bianca.”

The researchers built their pipeline using a combination of free and low-cost tools (though Adobe Creative Cloud subscriptions are required for the full animation component). All code and documentation are openly available for other researchers to use and adapt.

Broader Applications

While designed for fMRI studies of language comprehension in children, the pipeline has potential applications across developmental cognitive neuroscience and beyond. It could support studies of social cognition using story vignettes with talking characters, research on emotional processing, or personalized interventions where individual children hear their own names in carefully controlled experimental contexts.

In the paper, the researchers acknowledge important limitations. The generated speech can sometimes be distinguished from human voices, which may make it unsuitable for some studies. Animated characters have long been used in developmental research, but they can also introduce confounds. And the system is currently optimized for relatively simple scenarios with a single speaking character.

Looking Forward

The researchers believe their work demonstrates how developmental cognitive neuroscientists can accelerate research through the use of new tools. “I'm thrilled with how it all turned out,” says Olson. “Not only did we generate the hundreds of stimuli that we needed for our fMRI project, but we also were able to share what we learned with the broader research community. The stimuli are being put to good use — I've been scanning children as young as six years old up through adults!”

Through this project, Santi was able to make her first presentation at a research conference and author her first journal publication; she says, “I had the opportunity to gain first-hand experience in all parts of the research process, from ideation and study design to publication.” She continues, “I also especially appreciated the opportunity to learn about how to make scientific tools available to other researchers, including how to write documentation and how to make resources available on platforms like the Open Science Framework (OSF). The work that goes into making science open and accessible is often under-appreciated, and I am grateful I got to learn about it early on in my research career. I learned important lessons about tool development and stimulus design for developmental neuroimaging and also gained general research skills that are now serving me well in graduate school.”

The research was supported by the Simons Foundation, the National Institute of Child Health and Human Development, MIT's McGovern Institute, and the MIT Siegel Family Quest for Intelligence.