  • API

    Application Programming Interface. A software-based intermediary that lets applications exchange data.

  • AWS

    Amazon Web Services. A provider of on-demand cloud services.

  • BagIt

    BagIt is a hierarchical file packaging format for storage and transfer of arbitrary digital content.

  • CC

    Coordinating Center

  • Cloud Computing

    Internet-based computing, wherein computing power, networking, storage or applications running on computers outside an organization are presented to that organization in a secure, services-oriented way.

  • Components

    Software units that implement a specific function or functions and which can be reused.

  • Containers

    A standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another (for example, Docker).

  • COPDGene

    Chronic Obstructive Pulmonary Disease (COPD) Gene

  • CWL

    Common Workflow Language. A simple scripting language for describing computational workflows for performing sequential operations on data.

  • Data Steward (STAGE)

    Members of the TOPMed and COPDGene communities who are working with DataSTAGE teams.

  • DataSTAGE

    Data Storage, Toolspace, Access, and analytics for biG-data Empowerment.

  • dbGaP

    Database of Genotypes and Phenotypes

  • DCPPC

    Data Commons Pilot Phase Consortium. Consists of the Other Transaction Awardees, the Data Stewards, and the NIH.

  • Deep Learning

    A machine learning method based on neural networks to learn from data through training to recognize patterns in the data.

  • Deliverables

    Demonstrations and products.

  • Demos

    Activities and documentation resulting from the DCPPC to build, test and demonstrate completion of goals of the Data Commons Pilot Phase.

  • Docker

    Software for running containers, which are packaged, portable units of code and dependencies that can be run in the same way across many computers. See also Containers.

  • Dockstore

    An open platform developed by the Cancer Genome Collaboratory and used by the GA4GH for sharing Docker-based tools described with the Common Workflow Language (CWL), the Workflow Description Language (WDL), or Nextflow (NFL).

  • DOI

    Digital Object Identifier; a code used to permanently and stably identify (usually digital) objects. DOIs provide a standard mechanism for retrieval of metadata about the object, and generally a means to access the data object itself.

  • Ecosystem

    A software ecosystem is a collection of processes that execute on a shared platform or across shared protocols to provide flexible services.

  • EEP

    External Expert Panel. A group of experts who provide guidance and direction to NIH about the program.

  • FAIR

    Findable, Accessible, Interoperable, Reusable.

  • FS

    Full Stack

  • GA4GH

    Global Alliance for Genomics and Health

  • GA4GH APIs

    The Genomic Data Working Group is a coalition assembled to create interoperability standards for storing and sharing genomic data. The GA4GH Genomics API offers interoperability for exchanging genomic data between platforms and organizations through simple HTTP requests to a JSON-based RESTful API.

  • GitHub

    An online hub for storing and sharing computer programs and other plain text files. We use it for storage, hosting websites, communication and project management.

  • HLBS

    Heart, Lung, Blood, Sleep

  • Interoperability

    The ability of data or tools from multiple resources to effectively integrate data, or operate processes, across all systems with a moderate degree of effort.

  • Jupyter Notebooks

    A web-based interactive environment for organizing data, performing computation, and visualizing output.

  • Metadata

    Data about other data

  • NHLBI

    National Heart, Lung, and Blood Institute

  • NIH

    National Institutes of Health

  • PI

    Principal Investigator

  • PM

    Project Manager

  • SC

    Steering Committee

  • Scientific use case

    Defined in this project as an analysis of data from the designated sources that has relevance and value in the domain of health sciences, probably implementation- and software-agnostic.

  • Sprints

    Term of art used in software development, referring to short, iterative cycles of development, with continuous review of code through daily builds and end-of-sprint demos.

  • Stack

    Term of art referring to a suite of services that run in the cloud and enable ubiquitous, convenient, on-demand access to a shared pool of configurable computing resources.

  • Team

    Groups of people led by a Principal Investigator (PI), or PIs, who will complete milestones and produce deliverables. Each group has been assigned the name of an element from the periodic table.

  • Tiger Teams

    A diversified group of experts brought together to investigate, solve, build, or recommend possible solutions to unique situations or problems. Populated with mature experts who know what's at stake, what needs to be done, and how to work well with others; their strengths are diversity of knowledge, a single focus or purpose, cross-functional communications, decision-making sovereignty, and organizational agility.

  • TOPMed

    Trans-Omics for Precision Medicine. One of the primary data sets of the DCPPC.

  • Trans-cloud

    A provider-agnostic multi-cloud deployment architecture.

  • User story

    A description of a software feature from a technical/process-oriented perspective.

  • Whitelist

    A security measure to permit only an approved list of entities (the

  • Workflow

    A sequence of processes, usually computational in this context, through which a user may analyze data.
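
To make the BagIt entry above more concrete, the following is a minimal sketch of the "hierarchical file packaging" idea, assuming the BagIt 1.0 layout: a `bagit.txt` declaration, a `data/` payload directory, and a `manifest-sha256.txt` listing a checksum for each payload file. The function names `make_bag` and `validate_bag` are hypothetical and chosen only for illustration; this is not the tooling used by the project, and it omits optional tag files such as `bag-info.txt`.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, payload: dict[str, bytes]) -> Path:
    """Write a minimal bag: bagit.txt, data/ payload, and a SHA-256 manifest.

    `payload` maps a relative file name to its content (illustrative input format).
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(payload.items()):
        target = data_dir / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Each manifest line is "<checksum> <path relative to the bag root>".
        manifest_lines.append(f"{digest}  data/{name}")

    # The bag declaration names the BagIt version and tag-file encoding.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def validate_bag(bag_dir: Path) -> bool:
    """Return True if every payload file listed in the manifest matches its checksum."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = make_bag(Path(tmp) / "example_bag", {"readme.txt": b"hello"})
        print(validate_bag(bag))
```

The checksum manifest is what makes a bag suitable for transfer: a receiver can recompute the digests and detect files that were altered or corrupted in transit.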