Resource Web: Open Data Repositories for CSD Research

OS Tools
Open Data
For Scientists
Free
CSDisseminate Resource
Contributors: Austin Thompson, Elaine Kearney

Published: September 4, 2025


Where can I find data?

This resource web provides a curated list of open data repositories relevant to Communication Sciences and Disorders (CSD) research.

Datasets

The following datasets share data at varying levels of processing, from raw data to analytic data.

  • ASDBank Restricted
    Brian MacWhinney, Sanne Kuijper, Aparna Nadig, Angela MacDonald-Prégent, Inge-Marie Eigsti, Helen Tager-Flusberg, Janet Bang, Jean Quigley, Sinead McNally, Pamela Rollins, Nadège Foudon, Hariklia Proios, Jing Zhou

    ASDBank includes data on language development from children and adolescents with autism spectrum disorder. This bank is part of the TalkBank repository.

  • AphasiaBank Restricted
    Brian MacWhinney

    AphasiaBank is a shared database of multimedia interactions for the study of communication in aphasia. Access to the data in AphasiaBank is password protected and restricted to members of the AphasiaBank consortium group. This bank is part of the TalkBank repository.

  • BilingBank Restricted
    Brian MacWhinney, Fatma Özcan, Ilknur Keçik, Jens Normann Jörgensen, Irina Sekerina, Nada Jeletic, Eliana Mirkovic, Gordana Hrzica, Ana Maria Collazos, Eva Eppler, Aspa Hatzidaki, Penelope Gardner-Chloros, Margaret Deuchar

    BilingBank is a component of TalkBank dedicated to providing corpora for the study of multilingualism. This bank is part of the TalkBank repository.

  • Do dyslexia and stuttering share a processing deficit? Open
    Mahmoud M. Elsherif, Linda R. Wheeldon, Steven Frisson

    This OSF page contains open datasets needed to replicate the primary outcomes for the study “Do dyslexia and stuttering share a processing deficit?”.

  • FluencyBank Restricted
    Brian MacWhinney, Nan Bernstein Ratner

    FluencyBank is a shared database for the study of fluency development. Participants include typically-developing monolingual and bilingual children, children and adults who stutter (C/AWS) or who clutter (C/AWC), and second language learners. This bank is part of the TalkBank repository.

  • General Enhancement of Spatial Hearing in Congenitally Blind People Open
    Ceren Battal, Valeria Occelli, Giorgia Bertonati, Federica Falagiarda, Olivier Collignon

    This OSF page contains open datasets needed to replicate the primary outcomes for the study “General Enhancement of Spatial Hearing in Congenitally Blind People”.

  • HomeBank Restricted
    Brian MacWhinney, Mark VanDam

    HomeBank is a resource for shared multi-hour, real-world recordings of children’s everyday experiences (for example, daylong home recordings using the LENA system), plus tools for analyzing those recordings. It is a component of the TalkBank system.

  • Hypernasality associated with basal ganglia dysfunction: evidence from Parkinson's disease and Huntington's disease Open
    Michal Novotný, Jan Rusz, Roman Čmejla, Hana Růžičková, Jiří Klempíř, Evžen Růžička

    This page contains open datasets needed to replicate the primary outcomes for the study “Hypernasality associated with basal ganglia dysfunction: evidence from Parkinson's disease and Huntington's disease”.

  • Normative Reference Values for FEES and VASES: Preliminary Data From 39 Nondysphagic, Community-Dwelling Adults Open
    James Arthur Curtis, James C. Borders, Avery Dakin

    This OSF page contains open datasets needed to replicate the primary outcomes for the study “Normative Reference Values for FEES and VASES: A Prospective, Observational Study of Non-Dysphagic, Community-Dwelling Adults”.

  • PERCEPT Restricted
    Nina R Benway, Jonathan Preston, Elaine Hitchcock, Yvan Rose, Asif Salekin, Wendy Liang, Tara McAllister

    Data for PERCEPT-R and PERCEPT-GFTA were collected during 34 separate cross-sectional and longitudinal studies at Syracuse University, Montclair State University, and New York University between 2006 and 2021.

  • PhonBank Restricted
    Brian MacWhinney, Yvan Rose

    PhonBank is the child phonology component of the TalkBank system.

  • Relationships between reading performance and regional spontaneous brain activity following surgical removal of primary left-hemisphere tumors: A resting-state fMRI study Open
    Elaine Kearney, Sonia L.E. Brownsett, David A. Copland, Katharine J. Drummond, Rosalind L. Jeffree, Sarah Olson, Emma Murton, Benjamin Ong, Gail A. Robinson, Valeriya Tolkacheva, Katie L. McMahon, Greig I. de Zubicaray

    This OSF page contains open datasets needed to replicate the primary outcomes for the study “Relationships between reading performance and regional spontaneous brain activity following surgical removal of primary left-hemisphere tumors: A resting-state fMRI study”.

  • SLABank Restricted
    Brian MacWhinney and 47 other contributors

    SLABank is a component of TalkBank dedicated to providing corpora for the study of second language acquisition.

  • Supporting Emergent Bilinguals Who Use Augmentative and Alternative Communication and Their Families: Lessons in Telepractice From the COVID-19 Pandemic Open
    Marika King, Hannah Ward, Gloria Soto, Tyson S. Barrett

    This OSF page contains open datasets needed to replicate the primary outcomes for the study “Supporting Emergent Bilinguals Who Use Augmentative and Alternative Communication and Their Families: Lessons in Telepractice From the COVID-19 Pandemic”.

  • TORGO Open
    Frank Rudzicz, Aravind Kumar Namasivayam, Talya Wolff

    The TORGO database of dysarthric articulation consists of aligned acoustics and measured 3D articulatory features from speakers with either cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS), which are two of the most prevalent causes of speech disability (Kent and Rosen, 2004), and matched controls.

  • TalkBank Restricted
    Brian MacWhinney

    The goal of TalkBank is to foster fundamental research in the study of human communication with an emphasis on spoken communication. Currently, TalkBank provides repositories in 14 research areas.

  • The SEED Corpus Restricted
    Marisha Speights

    The SEED corpus includes recordings of single words and continuous speech samples that provide examples of speakers with and without speech disorders.

  • UA-Speech Restricted
    Heejin Kim, Mark Hasegawa-Johnson, Adrienne Perlman, Jon Gunderson, Thomas S. Huang, Kenneth Watkin, Simone Frame

    The UA-Speech corpus was published in 2008. It originally contained recordings of nineteen individuals with dysarthria as a correlate of cerebral palsy, plus age-matched and gender-matched controls. Of the original nineteen, three have since withdrawn permission and/or had their data corrupted, so the distribution now contains data from sixteen speakers.

  • UltraSuite Open
    Aciel Eshky, Manuel Sam Ribeiro, Joanne Cleland, Korin Richmond, Zoe Roxburgh, James Scobbie, Alan Wrench

    UltraSuite is a repository of ultrasound and acoustic data from child speech therapy sessions.

  • Vowel Acoustics as Predictors of Speech Intelligibility in Dysarthria Open
    Austin Thompson, Micah E Hirsch, Kaitlin L Lansford, Yunjung Kim

    This OSF page contains open datasets needed to replicate the primary outcomes for the study “Vowel Acoustics as Predictors of Speech Intelligibility in Dysarthria”.


Raw data is the original, unprocessed information collected directly from participants or instruments, such as audio recordings, sensor outputs, or survey responses.

Datasets

  • ASDBank Restricted
    Brian MacWhinney, Sanne Kuijper, Aparna Nadig, Angela MacDonald-Prégent, Inge-Marie Eigsti, Helen Tager-Flusberg, Janet Bang, Jean Quigley, Sinead McNally, Pamela Rollins, Nadège Foudon, Hariklia Proios, Jing Zhou

    ASDBank includes data on language development from children and adolescents with autism spectrum disorder. This bank is part of the TalkBank repository.

  • AphasiaBank Restricted
    Brian MacWhinney

    AphasiaBank is a shared database of multimedia interactions for the study of communication in aphasia. Access to the data in AphasiaBank is password protected and restricted to members of the AphasiaBank consortium group. This bank is part of the TalkBank repository.

  • BilingBank Restricted
    Brian MacWhinney, Fatma Özcan, Ilknur Keçik, Jens Normann Jörgensen, Irina Sekerina, Nada Jeletic, Eliana Mirkovic, Gordana Hrzica, Ana Maria Collazos, Eva Eppler, Aspa Hatzidaki, Penelope Gardner-Chloros, Margaret Deuchar

    BilingBank is a component of TalkBank dedicated to providing corpora for the study of multilingualism. This bank is part of the TalkBank repository.

  • FluencyBank Restricted
    Brian MacWhinney, Nan Bernstein Ratner

    FluencyBank is a shared database for the study of fluency development. Participants include typically-developing monolingual and bilingual children, children and adults who stutter (C/AWS) or who clutter (C/AWC), and second language learners. This bank is part of the TalkBank repository.

  • HomeBank Restricted
    Brian MacWhinney, Mark VanDam

    HomeBank is a resource for shared multi-hour, real-world recordings of children’s everyday experiences (for example, daylong home recordings using the LENA system), plus tools for analyzing those recordings. It is a component of the TalkBank system.

  • PERCEPT Restricted
    Nina R Benway, Jonathan Preston, Elaine Hitchcock, Yvan Rose, Asif Salekin, Wendy Liang, Tara McAllister

    Data for PERCEPT-R and PERCEPT-GFTA were collected during 34 separate cross-sectional and longitudinal studies at Syracuse University, Montclair State University, and New York University between 2006 and 2021.

  • PhonBank Restricted
    Brian MacWhinney, Yvan Rose

    PhonBank is the child phonology component of the TalkBank system.

  • SLABank Restricted
    Brian MacWhinney and 47 other contributors

    SLABank is a component of TalkBank dedicated to providing corpora for the study of second language acquisition.

  • TORGO Open
    Frank Rudzicz, Aravind Kumar Namasivayam, Talya Wolff

    The TORGO database of dysarthric articulation consists of aligned acoustics and measured 3D articulatory features from speakers with either cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS), which are two of the most prevalent causes of speech disability (Kent and Rosen, 2004), and matched controls.

  • TalkBank Restricted
    Brian MacWhinney

    The goal of TalkBank is to foster fundamental research in the study of human communication with an emphasis on spoken communication. Currently, TalkBank provides repositories in 14 research areas.

  • The SEED Corpus Restricted
    Marisha Speights

    The SEED corpus includes recordings of single words and continuous speech samples that provide examples of speakers with and without speech disorders.

  • UA-Speech Restricted
    Heejin Kim, Mark Hasegawa-Johnson, Adrienne Perlman, Jon Gunderson, Thomas S. Huang, Kenneth Watkin, Simone Frame

    The UA-Speech corpus was published in 2008. It originally contained recordings of nineteen individuals with dysarthria as a correlate of cerebral palsy, plus age-matched and gender-matched controls. Of the original nineteen, three have since withdrawn permission and/or had their data corrupted, so the distribution now contains data from sixteen speakers.

  • UltraSuite Open
    Aciel Eshky, Manuel Sam Ribeiro, Joanne Cleland, Korin Richmond, Zoe Roxburgh, James Scobbie, Alan Wrench

    UltraSuite is a repository of ultrasound and acoustic data from child speech therapy sessions.


Analytic data is information that has been cleaned, coded, or transformed into a format ready for analysis; these types of datasets are often shared alongside publications to support computational reproducibility.
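
To make the distinction between raw and analytic data concrete, here is a minimal Python sketch of how trial-level raw records might be cleaned, coded, and summarized into an analytic table. The field names, values, and the to_analytic helper are hypothetical and are not taken from any dataset listed here.

```python
# Hypothetical raw records: one row per trial, stored exactly as logged.
raw_trials = [
    {"participant": "P01", "response": " Cat ", "target": "cat", "rt_ms": "512"},
    {"participant": "P01", "response": "dog", "target": "dog", "rt_ms": "634"},
    {"participant": "P02", "response": "cap", "target": "cat", "rt_ms": "701"},
    {"participant": "P02", "response": "", "target": "dog", "rt_ms": ""},  # missed trial
]


def to_analytic(trials):
    """Clean and code raw trials, then summarize them per participant."""
    by_participant = {}
    for trial in trials:
        if not trial["rt_ms"]:
            continue  # cleaning: drop trials with no recorded reaction time
        # coding: score the response after normalizing whitespace and case
        correct = trial["response"].strip().lower() == trial["target"]
        by_participant.setdefault(trial["participant"], []).append(
            (correct, int(trial["rt_ms"]))
        )
    # transforming: one analysis-ready row per participant
    return [
        {
            "participant": pid,
            "n_trials": len(rows),
            "accuracy": sum(c for c, _ in rows) / len(rows),
            "mean_rt_ms": sum(rt for _, rt in rows) / len(rows),
        }
        for pid, rows in sorted(by_participant.items())
    ]


analytic = to_analytic(raw_trials)
for row in analytic:
    print(row)
```

In this sketch the raw list keeps every logged trial, including the missed one, while the analytic table holds only the per-participant values an analysis script would read.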

Datasets

  • Do dyslexia and stuttering share a processing deficit? Open
    Mahmoud M. Elsherif, Linda R. Wheeldon, Steven Frisson

    This OSF page contains open datasets needed to replicate the primary outcomes for the study “Do dyslexia and stuttering share a processing deficit?”.

  • General Enhancement of Spatial Hearing in Congenitally Blind People Open
    Ceren Battal, Valeria Occelli, Giorgia Bertonati, Federica Falagiarda, Olivier Collignon

    This OSF page contains open datasets needed to replicate the primary outcomes for the study “General Enhancement of Spatial Hearing in Congenitally Blind People”.

  • Hypernasality associated with basal ganglia dysfunction: evidence from Parkinson's disease and Huntington's disease Open
    Michal Novotný, Jan Rusz, Roman Čmejla, Hana Růžičková, Jiří Klempíř, Evžen Růžička

    This page contains open datasets needed to replicate the primary outcomes for the study “Hypernasality associated with basal ganglia dysfunction: evidence from Parkinson's disease and Huntington's disease”.

  • Normative Reference Values for FEES and VASES: Preliminary Data From 39 Nondysphagic, Community-Dwelling Adults Open
    James Arthur Curtis, James C. Borders, Avery Dakin

    This OSF page contains open datasets needed to replicate the primary outcomes for the study “Normative Reference Values for FEES and VASES: A Prospective, Observational Study of Non-Dysphagic, Community-Dwelling Adults”.

  • Relationships between reading performance and regional spontaneous brain activity following surgical removal of primary left-hemisphere tumors: A resting-state fMRI study Open
    Elaine Kearney, Sonia L.E. Brownsett, David A. Copland, Katharine J. Drummond, Rosalind L. Jeffree, Sarah Olson, Emma Murton, Benjamin Ong, Gail A. Robinson, Valeriya Tolkacheva, Katie L. McMahon, Greig I. de Zubicaray

    This OSF page contains open datasets needed to replicate the primary outcomes for the study “Relationships between reading performance and regional spontaneous brain activity following surgical removal of primary left-hemisphere tumors: A resting-state fMRI study”.

  • Supporting Emergent Bilinguals Who Use Augmentative and Alternative Communication and Their Families: Lessons in Telepractice From the COVID-19 Pandemic Open
    Marika King, Hannah Ward, Gloria Soto, Tyson S. Barrett

    This OSF page contains open datasets needed to replicate the primary outcomes for the study “Supporting Emergent Bilinguals Who Use Augmentative and Alternative Communication and Their Families: Lessons in Telepractice From the COVID-19 Pandemic”.

  • Vowel Acoustics as Predictors of Speech Intelligibility in Dysarthria Open
    Austin Thompson, Micah E Hirsch, Kaitlin L Lansford, Yunjung Kim

    This OSF page contains open datasets needed to replicate the primary outcomes for the study “Vowel Acoustics as Predictors of Speech Intelligibility in Dysarthria”.
