BrainScape: An open-source framework for integrating and preprocessing anatomical MRI datasets
| Published in | Imaging neuroscience (Cambridge, Mass.) Vol. 3 |
|---|---|
| Main Authors | , , | 
| Format | Journal Article | 
| Language | English | 
| Published | MIT Press, 255 Main Street, 9th Floor, Cambridge, Massachusetts 02142, USA, 22.10.2025 |
| Subjects | |
| ISSN | 2837-6056 |
| DOI | 10.1162/IMAG.a.944 | 
| Summary: | MRI has revolutionized our ability to investigate and understand brain structure and function in health and disease. A large amount of MRI data is available to researchers, both from large-scale multi-site consortia and from smaller site-specific datasets. This wealth of data offers opportunities to advance our understanding of the brain, particularly through machine learning and deep learning approaches that rely on large sample sizes to reveal complex associations between brain organization and its behavioral and clinical correlates. Many large-scale initiatives provide extensive datasets with sufficient statistical power to support reproducibility, but reproducibility alone does not ensure clinical relevance or broad generalizability, because such datasets often have narrow demographic representation and minimized variability. Recent work highlights the need to embrace dataset variability and open-science collaborations for pooling heterogeneous datasets. Nevertheless, effectively integrating these diverse resources remains a significant challenge. Inconsistencies in organization, data formatting, acquisition protocols, and metadata persist, especially for smaller, site-specific datasets, despite ongoing efforts within the neuroimaging community to standardize data-sharing practices. To address these issues, we introduce BrainScape: a curated collection of 160 publicly available MRI datasets packaged with an open-source, plugin-based Python framework that automates the download, organization, preprocessing, and demographic attachment of the MRI data. Each dataset includes a detailed configuration file capturing all dataset-specific parameters, enabling other researchers to regenerate the BrainScape dataset. The current BrainScape dataset integrates 160 datasets, encompassing a total of 27,227 subjects and 46,583 multimodal MRI scans after quality control. 
The BrainScape pipeline aggregates these heterogeneous datasets while preserving the original dataset structure and demographic details. Its modular design allows integration into existing data pipelines, supporting both large-scale studies involving diverse cohorts and targeted research on rare phenotypes. The framework employs a plugin-based architecture with distinct modules for data downloading, file mapping, validation, preprocessing, and demographics attachment. Furthermore, each MR image can be traced to its source project and repository, and subjects excluded from datasets are documented in dataset-specific configuration files, providing transparent and reproducible exclusion criteria. The BrainScape dataset includes multiple MRI modalities, such as T1-weighted (T1w), T2-weighted (T2w), gadolinium-enhanced T1-weighted (T1Gd), and fluid-attenuated inversion recovery (FLAIR), from diverse sources and integrates key demographic fields, such as age, sex, and handedness, for large-scale studies. This unified workflow reduces manual labor and minimizes the risk of data duplication and biases. By providing automated, transparent, and configurable workflows, BrainScape aims to address open science challenges, accelerate data-driven investigations, and promote inclusivity and reproducibility in neuroscience research. | 
|---|---|
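
The summary describes a plugin-based pipeline driven by one configuration file per dataset. The following is a minimal sketch of that general pattern only, not BrainScape's actual code or API: the configuration keys, the `Scan` record, the `register` decorator, and the stage names are all hypothetical, and the "download" stage produces placeholder records instead of fetching data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical per-dataset configuration. The keys are illustrative and are
# not taken from BrainScape's real configuration schema.
EXAMPLE_CONFIG = {
    "dataset": "example_site",
    "modalities": ["T1w", "FLAIR"],
    "exclude_subjects": ["sub-03"],  # documented, reproducible exclusions
    "demographics": {
        "sub-01": {"age": 34, "sex": "F"},
        "sub-02": {"age": 51, "sex": "M"},
    },
}


@dataclass
class Scan:
    """One image record that stays traceable to its source dataset."""
    dataset: str
    subject: str
    modality: str
    demographics: dict = field(default_factory=dict)


Stage = Callable[[List[Scan], dict], List[Scan]]
PLUGINS: Dict[str, Stage] = {}


def register(name: str) -> Callable[[Stage], Stage]:
    """Decorator that adds a stage function to the plugin registry."""
    def decorator(func: Stage) -> Stage:
        PLUGINS[name] = func
        return func
    return decorator


@register("download")
def download(scans: List[Scan], config: dict) -> List[Scan]:
    # Placeholder for a real downloader: one record per subject and modality.
    subjects = ["sub-01", "sub-02", "sub-03"]
    return [Scan(config["dataset"], s, m)
            for s in subjects for m in config["modalities"]]


@register("validate")
def validate(scans: List[Scan], config: dict) -> List[Scan]:
    # Drop subjects listed in the dataset's exclusion list.
    excluded = set(config["exclude_subjects"])
    return [s for s in scans if s.subject not in excluded]


@register("demographics")
def attach_demographics(scans: List[Scan], config: dict) -> List[Scan]:
    # Attach whatever demographic fields the configuration provides.
    for scan in scans:
        scan.demographics = dict(config["demographics"].get(scan.subject, {}))
    return scans


def run_pipeline(config: dict, stages: List[str]) -> List[Scan]:
    """Run the named stages in order, passing records from one to the next."""
    scans: List[Scan] = []
    for name in stages:
        scans = PLUGINS[name](scans, config)
    return scans


result = run_pipeline(EXAMPLE_CONFIG, ["download", "validate", "demographics"])
```

Because each stage is looked up by name in a registry, a new step (for example, a preprocessing module) could be added by registering one more function, and a dataset-specific quirk could be handled by changing only that dataset's configuration.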