Glossary

  • Agile Development

    Agile software development is an approach to software development under which requirements and solutions evolve through the collaborative effort of self-organizing and cross-functional teams and their customer(s)/end user(s).

  • Alpha Users

    A small group of users who are more willing to tolerate working in a system that isn’t fully developed and who provide detailed feedback, possibly through back-and-forth discussions.

  • Ambassadors

    A small group of experts who represent the personas featured within the priority User Narratives. For their time and help, Ambassadors will receive early access to the DataSTAGE platform, free compute time, a monetary fee for their time, and coverage of relevant travel expenses.

  • API

    Application Programming Interface. API technologies serve as software-based intermediaries to exchange data.

  • AWS

    Amazon Web Services. A provider of cloud services available on-demand.

  • BagIt

    BagIt is a hierarchical file packaging format for storage and transfer of arbitrary digital content.

  • Beta Users

    A slightly larger group than the alpha users who are not as tolerant of a difficult/clunky environment but understand that the version they are using is not polished and that they need to give feedback.

  • Beta-User Training

    Once the platform is available to a broader audience, we will support freely-accessible online training for beta-users at any time.

  • Carpentries Instructor Training Program

    Ambassadors attend this training program to become DataSTAGE trainers.

  • CC

    Coordinating Center

  • CCM

    Change Control Management; the systematic approach to managing all changes made to a document or process. Ensures no unnecessary changes are made, all changes are documented, and a process exists for implementing approved changes.

  • CIO

    Chief Information Officer

  • Cloud Computing

    Internet-based computing, wherein computing power, networking, storage or applications running on computers outside an organization are presented to that organization in a secure, services-oriented way.

  • Components

    Software units that implement a specific function or functions and which can be reused.

  • ConOps

    Concept of Operations

  • Consortia

    A collection of teams and stakeholders working to deliver on the common goals of integrated and advanced cyberinfrastructure, leading-edge data management and analysis tools, FAIR data, and HLBS researcher engagement.

  • Containers

    A standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another (for example, Docker).

  • COPDGene

    Chronic Obstructive Pulmonary Disease (COPD) Gene

  • Cost Monitoring (level)

    At the Epic Level
    The Coordinating Center will facilitate this process by developing reporting templates (see example in PM Plan, Financial Management) for distribution to the teams. The DataSTAGE teams will complete these templates and send them directly to NHLBI.
    Each team is responsible for tracking their finances based upon the award conditions and for providing status updates as requested to NHLBI.

  • CSOC Alpha

    Common Services Operations Center (CSOC): operates cloud, commons, compliance and security services that enable the operation of data commons; has ATO and hosts production system.

  • CSOC Beta

    Development/testing; Real data in pilot (not production) that can be accessed by users

  • CWL

    Common Workflow Language. A simple scripting language for describing computational workflows for performing sequential operations on data.

  • DAC

    Data Access Committee: reviews all requests for access to human studies datasets

  • Data Access

    A process that involves authorization to access different data repositories; part of a User Narrative for the December 2020 release goal. A Work Stream PM Plan constraint: NHLBI, as the project sponsor, will identify a process to enable data access by the DataSTAGE team members and by research users.

  • Data Commons

    Provides tools, applications, and workflows to enable computing large scale data sets in secure workspaces.

  • Data Steward (STAGE)

    Members of the TOPMed and COPDGene communities who are working with DataSTAGE teams.

  • DataSTAGE

    Data Storage, Toolspace, Access, and Analytics for biG-data Empowerment. HLBS researchers can go to this cloud-based platform to find, search, access, share, store, crosslink, and compute on large scale data sets.

  • dbGaP

    Database of Genotypes and Phenotypes

  • DCPPC

    Data Commons Pilot Phase Consortium. The Other Transaction Awardees, Data Stewards, and the NIH.

  • Decision Tree

    A decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility

  • Deep Learning

    A machine learning method based on neural networks to learn from data through training to recognize patterns in the data.

  • Deliverables

    Demonstrations and products.

  • Demos

    Activities and documentation resulting from the DCPPC to build, test and demonstrate completion of goals of the Data Commons Pilot Phase.

  • DEV Environment

    Set of processes and programming tools used to create the program or software product

  • DMI

    Data Management Incident

  • Docker

    Software for running containers: packaged, portable units of code and dependencies that can be run in the same way across many computers. See also Containers.

  • Dockstore

    An open platform developed by the Cancer Genome Collaboratory and used by the GA4GH for sharing Docker-based tools described with the Common Workflow Language (CWL), the Workflow Description Language (WDL), or Nextflow (NFL)

  • DOI

    Digital Object Identifier; a code used to permanently and stably identify (usually digital) objects. DOIs provide a standard mechanism for retrieval of metadata about the object, and generally a means to access the data object itself.

  • DUOS

    Data Use Oversight System, https://duos.broadinstitute.org/

  • Ecosystem

    A software ecosystem is a collection of processes that execute on a shared platform or across shared protocols to provide flexible services.

  • EEP

    External Expert Panel. A group of experts who provide guidance and direction to NIH about the program.

  • Epic

    A very large user story which can be broken down into executable stories

    *NHLBI’s cost-monitoring level

  • eRA Commons

    Designated ID provider for whitelist

  • External Expert Panel

    An independent body of experts that inform and advise the work of the DataSTAGE Consortium.

  • FAIR

    Findable, Accessible, Interoperable, Reusable.

  • Feature

    A functionality at the system level that fulfills a meaningful stakeholder need

    *Level at which the CC coordinates

  • FireCloud

    Broad Institute secure cloud environment for analytical processing, https://software.broadinstitute.org/firecloud/

  • FISMA moderate environment

    Federal Information Security Modernization Act of 2014, which amends the Federal Information Security Management Act of 2002 (FISMA). See https://www.dhs.gov/fisma

  • FS

    Full Stack

  • GA4GH

    Global Alliance for Genomics and Health

  • GA4GH APIs

    The Genomic Data Working Group is a coalition assembled to create interoperability standards for storing and sharing genomic data. The GA4GH Genomics API offers interoperability for exchanging genomic data between various platforms and organizations by sending simple HTTP requests through a JSON-equipped RESTful API.

  • GCP

    Google Cloud Platform

  • GCR

    Governance, Compliance, and Risk

  • Gen3

    Open source software, licensed under the Apache license, that you can use for setting up, developing, and operating data commons.

  • GitHub

    An online hub for storing and sharing computer programs and other plain text files. We use it for storage, hosting websites, communication and project management.

  • Gold Master

    A gold master, or GM, is the final version of software or data ready for release to production; a master version from which copies can be made.

  • GWAS

    Genome-wide Association Study

  • HLBS

    Heart, Lung, Blood, Sleep

  • Identity Providers

    A system entity that creates, maintains, and manages identity information for principals while providing authentication services to relying applications within a federation or distributed network; identity providers offer user authentication as a service

  • Interoperability

    The ability of data or tools from multiple resources to effectively integrate data, or operate processes, across all systems with a moderate degree of effort.

  • IP

    DataSTAGE Implementation Plan; outlines how the various elements from the planning phase of the DataSTAGE project will come together to form a concrete, operationalized DataSTAGE platform

  • IRB

    Institutional Review Board; the entity within a research organization that reviews and approves research protocols and clinical research protocols to protect human and animal subjects.

  • IRC

    Informatics Research Core

  • ISA

    Interoperability Service Agreement

  • ITAC

    Information Technology Applications Center

  • Jupyter Notebooks

    A web-based interactive environment for organizing data, performing computation, and visualizing output.

  • Linux

    An open source computer operating system

  • Metadata

    Data about other data

  • Milestone

    Milestones mark specific progress points on the development timeline, and they can be invaluable in measuring and monitoring the evolution and risk of a program. © Scaled Agile, Inc.

  • MSD

    Minimum set of documents

  • MVP

    Minimum viable product

  • NHLBI

    National Heart, Lung, and Blood Institute

  • NIH

    National Institutes of Health

  • NIST Moderate controls

    NIST 800-53 - A collection of security controls and assessment procedures that both U.S. Federal and non-governmental organizations can apply to their information systems, policies, and procedures.

  • OTA

    Other Transaction Authority - the mechanism of award that NHLBI chose because it provides a degree of flexibility in the scope of the work that is needed to advance this type of high risk/high reward project

  • PI

    Principal Investigator

  • PM

    Project Manager

  • PMP

    DataSTAGE Project Management Plan; breaks down the implementation of DataSTAGE from the perspective of the project managers involved in the project including details on roles, specific milestones, and the project schedule.

  • PO

    Program Officer

  • Portfolio for Jira

    Software-as-a-Service project management tool, used to track, roadmap, and visualize various project metrics.

  • Python

    Open source programming language, used extensively in research for data manipulation, analysis, and modeling

  • Quality Assurance

    The planned and systematic activities implemented in quality management so that quality requirements for a product or service satisfy stated goals and expectations.

  • Quality Control

    The operational techniques and activities aimed at monitoring and measuring work processes and eliminating the causes of unsatisfactory outputs.

  • RACI

    Responsible, Accountable, Consulted and Informed; tool that can be used for identifying roles and responsibilities during an organizational change process; DataSTAGE RACI

  • RFC

    Request for Comment: A process that documents and enables effective interactions between stakeholders to support shared decision making.

  • Risk Register

    A tool used to continuously identify risks and to track risk response planning and status updates throughout the project lifecycle. This project risk register is the primary risk reporting tool and is located in the Project Management Plan.

  • SC

    Steering Committee

  • Scientific use case

    Defined in this project as an analysis of data from the designated sources which has relevance and value in the domain of health sciences, probably implementation and software agnostic.

  • SF or SFP

    DataSTAGE Strategic Framework [Plan]; defines what the DataSTAGE teams have accomplished up to this point, what we plan to accomplish in a timelined fashion, and milestones to track and measure implementation.

  • SFTP

    Secure File Transfer Protocol

  • Software Developers Kit

    A set of software development tools that allows the creation of applications for a certain software package, software framework, hardware platform, computer system, or similar development platform

  • Sprints

    Term of art used in software generation, referring to short, iterative cycles of development, with continuous review of code through daily builds and end-of-sprint demos

  • Stack

    Term of art referring to a suite of services that run in the cloud and enable ubiquitous, convenient, on-demand access to a shared pool of configurable computing resources.

  • STAGECC

    DataSTAGE Coordinating Center

  • STAGEWS

    DataSTAGE Whole System

  • Steering Committee

    Responsible for decision-making and communication in DataSTAGE.

  • STRIDES

    Science & Technology Research Infrastructure for Discovery, Experimentation, and Sustainability

  • Team

    Groups of people led by a Principal Investigator (PI), or PIs, who will complete milestones and produce deliverables. Each group has been assigned a name, represented by the elements on the periodic chart.

  • Tiger Teams

    A diversified group of experts brought together to investigate, solve, build, or recommend possible solutions to unique situations or problems. Populated with mature experts who know what's at stake, what needs to be done, and how to work well with others; their strengths are diversity of knowledge, a single focus or purpose, cross-functional communications, decision-making sovereignty, and organizational agility.

  • TOPMed

    Trans-Omics for Precision Medicine. One of the primary data sets of the DCPPC.

  • TOPMed DCC

    TOPMed Data Coordinating Center

  • Trans-cloud

    A provider-agnostic multi-cloud deployment architecture.

  • User Narrative

    Descriptions of a user interaction experience within the system from the perspective of a particular persona. User Narratives are further broken down into Features, Epics, and User Stories. Currently formulated into rough 6-month timelines to benchmark progress.

  • User story

    A description of a software feature from a technical/process-oriented perspective; a backlog item that describes a requirement or functionality for a user

    *Finest level of PM Monitoring

  • VCF

    Variant Call Format. See http://www.internationalgenome.org/wiki/Analysis/vcf4.0/

  • VDS

    Virtual Dedicated Server. A composite of complete server hardware, along with the operating system (OS), which is powered by a remote access layer that allows end users to globally access their server via the Internet.

  • VPC

    Virtual Private Cloud

  • Whitelist

    A security measure to permit only an approved list of entities (the

  • Workflow

    A sequence of processes, usually computational in this context, through which a user may analyze data.

  • Workstream

    A collection of related features; orthogonal to a User Narrative

  • [Amazon] EFS

    [Amazon] Elastic File System. A simple, scalable, elastic file system for Linux-based workloads, for use with AWS Cloud services and on-premises resources.