Resource Web: Open Data Repositories for CSD Research
Where can I find data?
This resource web provides a curated list of open data repositories relevant to Communication Sciences and Disorders (CSD) research.
Databases
- NIH’s Resource for Finding Data Repositories: To narrow your search, select “NIDCD” (or whichever institute holds the data you are interested in) in the Institute or Center column.
- Open Science Framework (OSF): Search for your population of interest and filter by files or projects.
- Mendeley Data
- Figshare
- IEEE Dataport
- Synapse
Datasets
The following datasets provide data at varying levels of processing, from raw data to analytic data.
-
ASDBank Restricted
ASDBank includes data on language development from children and adolescents with autism spectrum disorder. This bank is part of the TalkBank repository.
-
AphasiaBank Restricted
AphasiaBank is a shared database of multimedia interactions for the study of communication in aphasia. Access to the data in AphasiaBank is password protected and restricted to members of the AphasiaBank consortium group. This bank is part of the TalkBank repository.
-
BilingBank Restricted
BilingBank is a component of TalkBank dedicated to providing corpora for the study of multilingualism. This bank is part of the TalkBank repository.
-
Do dyslexia and stuttering share a processing deficit? Open
This OSF page contains open datasets needed to replicate the primary outcomes for the study “Do dyslexia and stuttering share a processing deficit?”.
-
FluencyBank Restricted
FluencyBank is a shared database for the study of fluency development. Participants include typically developing monolingual and bilingual children, children and adults who stutter (C/AWS) or who clutter (C/AWC), and second language learners. This bank is part of the TalkBank repository.
-
General Enhancement of Spatial Hearing in Congenitally Blind People Open
This OSF page contains open datasets needed to replicate the primary outcomes for the study “General Enhancement of Spatial Hearing in Congenitally Blind People”.
-
HomeBank Restricted
HomeBank is a resource for shared multi-hour, real-world recordings of children’s everyday experiences (for example, daylong home recordings using the LENA system), plus tools for analyzing those recordings. It is a component of the TalkBank system.
-
Hypernasality associated with basal ganglia dysfunction: evidence from Parkinson's disease and Huntington's disease Open
This page contains open datasets needed to replicate the primary outcomes for the study “Hypernasality associated with basal ganglia dysfunction: evidence from Parkinson's disease and Huntington's disease”.
-
Normative Reference Values for FEES and VASES: Preliminary Data From 39 Nondysphagic, Community-Dwelling Adults Open
This OSF page contains open datasets needed to replicate the primary outcomes for the study “Normative Reference Values for FEES and VASES: A Prospective, Observational Study of Non-Dysphagic, Community-Dwelling Adults”.
-
PERCEPT Restricted
Data for PERCEPT-R and PERCEPT-GFTA were collected during 34 separate cross-sectional and longitudinal studies at Syracuse University, Montclair State University, and New York University between 2006 and 2021.
-
PhonBank Restricted
PhonBank is the child phonology component of the TalkBank system.
-
Relationships between reading performance and regional spontaneous brain activity following surgical removal of primary left-hemisphere tumors: A resting-state fMRI study Open
This OSF page contains open datasets needed to replicate the primary outcomes for the study “Relationships between reading performance and regional spontaneous brain activity following surgical removal of primary left-hemisphere tumors: A resting-state fMRI study”.
-
SLABank Restricted
SLABank is a component of TalkBank dedicated to providing corpora for the study of second language acquisition.
-
Supporting Emergent Bilinguals Who Use Augmentative and Alternative Communication and Their Families: Lessons in Telepractice From the COVID-19 Pandemic Open
This OSF page contains open datasets needed to replicate the primary outcomes for the study “Supporting Emergent Bilinguals Who Use Augmentative and Alternative Communication and Their Families: Lessons in Telepractice From the COVID-19 Pandemic”.
-
TORGO Open
The TORGO database of dysarthric articulation consists of aligned acoustics and measured 3D articulatory features from speakers with either cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS), which are two of the most prevalent causes of speech disability (Kent and Rosen, 2004), and matched controls.
-
TalkBank Restricted
The goal of TalkBank is to foster fundamental research in the study of human communication with an emphasis on spoken communication. Currently, TalkBank provides repositories in 14 research areas.
-
The SEED Corpus Restricted
The SEED corpus includes recordings of single words and continuous speech samples that provide examples of speakers with and without speech disorders.
-
UA-Speech Restricted
The UA-Speech corpus was published in 2008. It originally contained recordings of nineteen individuals with dysarthria as a correlate of cerebral palsy, plus age-matched and gender-matched controls. Of the original nineteen, three have since withdrawn permission and/or had their data corrupted, so the distribution now contains data from sixteen speakers.
-
UltraSuite Open
UltraSuite is a repository of ultrasound and acoustic data from child speech therapy sessions.
-
Vowel Acoustics as Predictors of Speech Intelligibility in Dysarthria Open
This OSF page contains open datasets needed to replicate the primary outcomes for the study “Vowel Acoustics as Predictors of Speech Intelligibility in Dysarthria”.
Raw data is the original, unprocessed information collected directly from participants or instruments, such as audio recordings, sensor outputs, or survey responses.
Datasets
-
ASDBank Restricted
ASDBank includes data on language development from children and adolescents with autism spectrum disorder. This bank is part of the TalkBank repository.
-
AphasiaBank Restricted
AphasiaBank is a shared database of multimedia interactions for the study of communication in aphasia. Access to the data in AphasiaBank is password protected and restricted to members of the AphasiaBank consortium group. This bank is part of the TalkBank repository.
-
BilingBank Restricted
BilingBank is a component of TalkBank dedicated to providing corpora for the study of multilingualism. This bank is part of the TalkBank repository.
-
FluencyBank Restricted
FluencyBank is a shared database for the study of fluency development. Participants include typically developing monolingual and bilingual children, children and adults who stutter (C/AWS) or who clutter (C/AWC), and second language learners. This bank is part of the TalkBank repository.
-
HomeBank Restricted
HomeBank is a resource for shared multi-hour, real-world recordings of children’s everyday experiences (for example, daylong home recordings using the LENA system), plus tools for analyzing those recordings. It is a component of the TalkBank system.
-
PERCEPT Restricted
Data for PERCEPT-R and PERCEPT-GFTA were collected during 34 separate cross-sectional and longitudinal studies at Syracuse University, Montclair State University, and New York University between 2006 and 2021.
-
PhonBank Restricted
PhonBank is the child phonology component of the TalkBank system.
-
SLABank Restricted
SLABank is a component of TalkBank dedicated to providing corpora for the study of second language acquisition.
-
TORGO Open
The TORGO database of dysarthric articulation consists of aligned acoustics and measured 3D articulatory features from speakers with either cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS), which are two of the most prevalent causes of speech disability (Kent and Rosen, 2004), and matched controls.
-
TalkBank Restricted
The goal of TalkBank is to foster fundamental research in the study of human communication with an emphasis on spoken communication. Currently, TalkBank provides repositories in 14 research areas.
-
The SEED Corpus Restricted
The SEED corpus includes recordings of single words and continuous speech samples that provide examples of speakers with and without speech disorders.
-
UA-Speech Restricted
The UA-Speech corpus was published in 2008. It originally contained recordings of nineteen individuals with dysarthria as a correlate of cerebral palsy, plus age-matched and gender-matched controls. Of the original nineteen, three have since withdrawn permission and/or had their data corrupted, so the distribution now contains data from sixteen speakers.
-
UltraSuite Open
UltraSuite is a repository of ultrasound and acoustic data from child speech therapy sessions.
Analytic data is information that has been cleaned, coded, or transformed into a format ready for analysis; these types of datasets are often shared alongside publications to support computational reproducibility.
Datasets
-
Do dyslexia and stuttering share a processing deficit? Open
This OSF page contains open datasets needed to replicate the primary outcomes for the study “Do dyslexia and stuttering share a processing deficit?”.
-
General Enhancement of Spatial Hearing in Congenitally Blind People Open
This OSF page contains open datasets needed to replicate the primary outcomes for the study “General Enhancement of Spatial Hearing in Congenitally Blind People”.
-
Hypernasality associated with basal ganglia dysfunction: evidence from Parkinson's disease and Huntington's disease Open
This page contains open datasets needed to replicate the primary outcomes for the study “Hypernasality associated with basal ganglia dysfunction: evidence from Parkinson's disease and Huntington's disease”.
-
Normative Reference Values for FEES and VASES: Preliminary Data From 39 Nondysphagic, Community-Dwelling Adults Open
This OSF page contains open datasets needed to replicate the primary outcomes for the study “Normative Reference Values for FEES and VASES: A Prospective, Observational Study of Non-Dysphagic, Community-Dwelling Adults”.
-
Relationships between reading performance and regional spontaneous brain activity following surgical removal of primary left-hemisphere tumors: A resting-state fMRI study Open
This OSF page contains open datasets needed to replicate the primary outcomes for the study “Relationships between reading performance and regional spontaneous brain activity following surgical removal of primary left-hemisphere tumors: A resting-state fMRI study”.
-
Supporting Emergent Bilinguals Who Use Augmentative and Alternative Communication and Their Families: Lessons in Telepractice From the COVID-19 Pandemic Open
This OSF page contains open datasets needed to replicate the primary outcomes for the study “Supporting Emergent Bilinguals Who Use Augmentative and Alternative Communication and Their Families: Lessons in Telepractice From the COVID-19 Pandemic”.
-
Vowel Acoustics as Predictors of Speech Intelligibility in Dysarthria Open
This OSF page contains open datasets needed to replicate the primary outcomes for the study “Vowel Acoustics as Predictors of Speech Intelligibility in Dysarthria”.
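The distinction between raw and analytic data described above can be illustrated with a small sketch. The trial records, field names, and the `to_analytic` helper below are hypothetical and are not drawn from any dataset listed here; they only show what "cleaned, coded, or transformed" might look like in practice.

```python
from statistics import mean

# Hypothetical raw data: one record per trial, stored as collected (all strings).
raw_trials = [
    {"participant": "P01", "response": "Yes", "rt_ms": "512"},
    {"participant": "P01", "response": "no", "rt_ms": "478"},
    {"participant": "P02", "response": "YES", "rt_ms": ""},  # missing reaction time
    {"participant": "P02", "response": "yes", "rt_ms": "601"},
]


def to_analytic(trials):
    """Clean and code raw trials, then summarize them per participant."""
    by_participant = {}
    for trial in trials:
        # Cleaning: drop trials with no recorded reaction time.
        if not trial["rt_ms"]:
            continue
        coded = {
            # Coding: map free-text responses to 1 (yes) or 0 (anything else).
            "yes": 1 if trial["response"].strip().lower() == "yes" else 0,
            # Type conversion: reaction time string to a number.
            "rt_ms": float(trial["rt_ms"]),
        }
        by_participant.setdefault(trial["participant"], []).append(coded)

    # Transformation: one summary row per participant, ready for analysis.
    return [
        {
            "participant": pid,
            "n_trials": len(rows),
            "prop_yes": mean(r["yes"] for r in rows),
            "mean_rt_ms": mean(r["rt_ms"] for r in rows),
        }
        for pid, rows in by_participant.items()
    ]


analytic_rows = to_analytic(raw_trials)
for row in analytic_rows:
    print(row)
```

In this sketch, `raw_trials` plays the role of a raw dataset and `analytic_rows` the role of an analytic dataset of the kind often shared alongside a publication.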