- Agile Development
Agile software development is an approach in which requirements and solutions evolve through the collaborative effort of self-organizing, cross-functional teams and their customer(s)/end user(s).
- Alpha Users
A small group of users who are willing to tolerate working in a system that is not yet fully developed, providing detailed feedback and engaging in some back-and-forth discussion.
- Ambassadors
A small group of experts who represent the personas featured in the priority User Narratives. For their time and help, Ambassadors will receive early access to the BioData Catalyst platform, free compute time, a monetary fee for their time, and coverage of relevant travel expenses.
- API
Application Programming Interface. API technologies serve as software-based intermediaries to exchange data.
- AWS
Amazon Web Services. A provider of cloud services available on demand.
- BagIt
BagIt is a hierarchical file packaging format for the storage and transfer of arbitrary digital content.
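As a concrete illustration of the layout the format prescribes, the sketch below writes a minimal bag (a `bagit.txt` declaration, a `data/` payload directory, and an MD5 manifest) using only the Python standard library; the file names and contents are invented for the example, and a production bag would typically be built with a full BagIt library.

```python
import hashlib
import os
import tempfile

def make_minimal_bag(bag_dir, payload):
    """Create a minimal BagIt bag: bagit.txt, a data/ payload, manifest-md5.txt."""
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir, exist_ok=True)
    # Required bag declaration file.
    with open(os.path.join(bag_dir, "bagit.txt"), "w") as f:
        f.write("BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")
    # Write each payload file and record its checksum in the manifest.
    manifest_lines = []
    for name, content in payload.items():
        with open(os.path.join(data_dir, name), "wb") as f:
            f.write(content)
        digest = hashlib.md5(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}\n")
    with open(os.path.join(bag_dir, "manifest-md5.txt"), "w") as f:
        f.writelines(manifest_lines)

bag = tempfile.mkdtemp()
make_minimal_bag(bag, {"readme.txt": b"hello"})
print(sorted(os.listdir(bag)))  # ['bagit.txt', 'data', 'manifest-md5.txt']
```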
- BDC3
BioData Catalyst Coordinating Center
- Beta Users
A slightly larger group than the alpha users; they are less tolerant of a difficult or clunky environment but understand that the version they are using is not polished and that they need to give feedback.
- Beta-User Training
Once the platform is available to a broader audience, we will support freely-accessible online training for beta-users at any time.
- Carpentries Instructor Training Program
Ambassadors attend this training program to become BioData Catalyst trainers.
- CCM
Change Control Management; the systematic approach to managing all changes made to a document or process. It ensures that no unnecessary changes are made, that all changes are documented, and that a process exists for implementing approved changes.
- CIO
Chief Information Officer
- Cloud Computing
Internet-based computing, wherein computing power, networking, storage or applications running on computers outside an organization are presented to that organization in a secure, services-oriented way.
- Components
Software units that implement one or more specific functions and can be reused.
- ConOps
Concept of Operations
- Consortium
A collection of teams and stakeholders working to deliver on the common goals of integrated and advanced cyberinfrastructure, leading-edge data management and analysis tools, FAIR data, and HLBS researcher engagement.
- Container
A standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another (for example, Docker).
- COPDGene
Chronic Obstructive Pulmonary Disease (COPD) Gene
- Cost Monitoring (level)
At the Epic Level
The Coordinating Center will facilitate this process by developing reporting templates (see example in PM Plan, Financial Management) for distribution to the teams. The BioData Catalyst teams will complete these templates and send them directly to NHLBI.
Each team is responsible for tracking their finances based upon the award conditions and for providing status updates as requested to NHLBI.
- CSOC Alpha
Common Services Operations Center (CSOC): operates cloud, commons, compliance and security services that enable the operation of data commons; has ATO and hosts production system.
- CSOC Beta
Development/testing; real data in a pilot (not production) environment that can be accessed by users.
- CWL
Common Workflow Language. A simple scripting language for describing computational workflows for performing sequential operations on data.
- DAC
Data Access Committee; reviews all requests for access to human studies datasets.
- Data Access
A process that involves authorization to access different data repositories; part of a User Narrative for the December 2020 release goal and a Work Stream. PM Plan constraint: NHLBI, as the project sponsor, will identify a process to enable data access by BioData Catalyst team members and research users.
- Data Commons
Provides tools, applications, and workflows to enable computing on large-scale data sets in secure workspaces.
- Data Steward
Members of the TOPMed and COPDGene communities who are working with BioData Catalyst teams.
- dbGaP
Database of Genotypes and Phenotypes
- DCPPC
Data Commons Pilot Phase Consortium. The Other Transaction Awardees, Data Stewards, and the NIH.
- Decision Tree
A decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
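A toy illustration of the model, with invented outcomes and utilities: the tree is a nested dict whose internal nodes test a feature and whose leaves hold an outcome and its expected utility.

```python
# Hypothetical decision tree: internal nodes test a feature; leaves hold
# an outcome and an (invented) expected utility.
tree = {
    "feature": "smoker",
    "yes": {"outcome": "screen annually", "utility": 0.8},
    "no": {"outcome": "screen at 5-year intervals", "utility": 0.9},
}

def decide(tree, record):
    """Follow the branch selected by the record's feature value."""
    branch = tree["yes"] if record[tree["feature"]] else tree["no"]
    return branch["outcome"]

print(decide(tree, {"smoker": True}))  # screen annually
```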
- Deep Learning
A machine learning method based on neural networks that learn, through training, to recognize patterns in data.
- Deliverables
Demonstrations and products; activities and documentation resulting from the DCPPC to build, test, and demonstrate completion of the goals of the Data Commons Pilot Phase.
- DEV Environment
The set of processes and programming tools used to create a program or software product.
- DMI
Data Management Incident
- Docker
Software for running containers: packaged, portable units of code and dependencies that run the same way across many computers. See also Containers.
- Dockstore
An open platform developed by the Cancer Genome Collaboratory and used by the GA4GH for sharing Docker-based tools described with the Common Workflow Language (CWL), the Workflow Description Language (WDL), or Nextflow (NFL).
- DOI
Digital Object Identifier; a code used to permanently and stably identify (usually digital) objects. DOIs provide a standard mechanism for retrieval of metadata about the object, and generally a means to access the data object itself.
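A hedged sketch of the retrieval mechanism: the doi.org resolver supports content negotiation, so requesting a DOI URL with an appropriate `Accept` header yields metadata rather than the landing page. The snippet builds the request without sending it; the DOI shown (10.1000/182, the DOI Handbook) is used purely for illustration.

```python
from urllib.request import Request

def doi_resolver_request(doi):
    """Build (but do not send) a request that resolves a DOI via doi.org,
    asking for machine-readable metadata via content negotiation."""
    return Request(f"https://doi.org/{doi}",
                   headers={"Accept": "application/vnd.citationstyles.csl+json"})

req = doi_resolver_request("10.1000/182")
print(req.full_url)  # https://doi.org/10.1000/182
```

Sending the request with `urllib.request.urlopen(req)` would return JSON metadata for registrars that support this negotiation.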
- DUO
Data Use Ontology; a GA4GH standard for automating access (API) to human genomics data (https://github.com/EBISPOT/DUO).
- DUOS
Data Use Oversight System, https://duos.broadinstitute.org/
- Ecosystem
A software ecosystem is a collection of processes that execute on a shared platform or across shared protocols to provide flexible services. Example: the "BioData Catalyst Ecosystem", inclusive of all platforms and tools.
- EEP
External Expert Panel. A group of experts who provide guidance and direction to NIH about the program.
- Epic
A very large user story which can be broken down into executable stories.
*NHLBI’s cost-monitoring level
- eRA Commons
Designated identity provider for the whitelist.
- External Expert Panel
An independent body of experts that inform and advise the work of the BioData Catalyst Consortium.
- FAIR
Findable, Accessible, Interoperable, Reusable.
- Feature
A functionality at the system level that fulfills a meaningful stakeholder need.
*Level at which the CC coordinates
- FireCloud
The Broad Institute's secure cloud environment for analytical processing, https://software.broadinstitute.org/firecloud/
- FISMA moderate environment
Federal Information Security Modernization Act of 2014, amends the Federal Information Security Management Act of 2002 (FISMA), see https://www.dhs.gov/fisma
- GA4GH
Global Alliance for Genomics and Health
- GA4GH APIs
The Genomic Data Working Group is a coalition assembled to create interoperability standards for storing and sharing genomic data. The GA4GH Genomics API offers interoperability for exchanging genomic data between various platforms and organizations by sending simple HTTP requests through a JSON-equipped RESTful API.
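As a minimal sketch of that request style, the snippet below constructs (without sending) an HTTP GET for an object record from a hypothetical GA4GH Data Repository Service (DRS) endpoint; the base URL and object ID are invented, and real deployments publish their own service endpoints.

```python
from urllib.request import Request

# Hypothetical DRS base URL; a real deployment would supply its own host.
BASE = "https://drs.example.org/ga4gh/drs/v1"

def drs_object_request(object_id):
    """Build a GET request for a DRS object record, returned as JSON."""
    return Request(f"{BASE}/objects/{object_id}",
                   headers={"Accept": "application/json"})

req = drs_object_request("abc123")
print(req.full_url)  # https://drs.example.org/ga4gh/drs/v1/objects/abc123
```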
- GCP
Google Cloud Platform
- GRC
Governance, Compliance, and Risk
- Gen3
Gen3 is open-source software, licensed under the Apache license, that can be used for setting up, developing, and operating data commons.
- GitHub
An online hub for storing and sharing computer programs and other plain-text files. We use it for storage, hosting websites, communication, and project management.
- Gold Master
A gold master, or GM, is the final version of software or data ready for release to production; a master version from which copies can be made.
- GWAS
Genome-wide Association Study
- HLBS
Heart, Lung, Blood, and Sleep
- Identity Providers
A system entity that creates, maintains, and manages identity information for principals while providing authentication services to relying applications within a federation or distributed network; identity providers offer user authentication as a service
- Interoperability
The ability of data or tools from multiple resources to effectively integrate data, or operate processes, across all systems with a moderate degree of effort.
- IP
BioData Catalyst Implementation Plan; outlines how the various elements from the planning phase of the BioData Catalyst project will come together to form a concrete, operationalized BioData Catalyst platform.
- IRB
Institutional Review Board; the entity within a research organization that reviews and approves research protocols and clinical research protocols to protect human and animal subjects.
- IRC
Informatics Research Core
- ISA
Interoperability Service Agreement
- ITAC
Information Technology Applications Center
- Jupyter Notebooks
A web-based interactive environment for organizing data, performing computation, and visualizing output.
- Linux
An open-source computer operating system.
- Metadata
Data about other data.
- Milestones
Milestones mark specific progress points on the development timeline, and they can be invaluable in measuring and monitoring the evolution and risk of a program. © Scaled Agile, Inc.
Minimum set of documents
- MVP
Minimum viable product
- NHLBI
National Heart, Lung, and Blood Institute
- NIH
National Institutes of Health
- NIST Moderate controls
NIST 800-53 - A collection of security controls and assessment procedures that both U.S. Federal and non-governmental organizations can apply to their information systems, policies, and procedures.
- OTA
Other Transaction Authority; the mechanism of award that NHLBI chose because it provides a degree of flexibility in the scope of the work needed to advance this type of high-risk/high-reward project.
- Platform
A piece of the BioData Catalyst ecosystem. Examples: Terra, Gen3, Seven Bridges, etc.
- PM Plan
BioData Catalyst Project Management Plan; breaks down the implementation of BioData Catalyst from the perspective of the project managers involved, including details on roles, specific milestones, and the project schedule.
- Portfolio for Jira
Software-as-a-Service project management tool, used to track, roadmap, and visualize various project metrics.
- Python
Open-source programming language, used extensively in research for data manipulation, analysis, and modeling.
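A small standard-library-only example of the kind of data manipulation described here; the phenotype table and its column names are invented for illustration.

```python
import csv
import io
import statistics

# An in-memory CSV standing in for a (hypothetical) study phenotype file.
raw = """sample,age,fev1
S1,61,2.1
S2,54,3.0
S3,70,1.8
"""

# Parse rows into dicts keyed by column name, then summarize one column.
rows = list(csv.DictReader(io.StringIO(raw)))
ages = [int(r["age"]) for r in rows]
print(statistics.mean(ages))  # mean age across samples
```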
- Quality Assurance
The planned and systematic activities implemented in quality management so that quality requirements for a product or service satisfy stated goals and expectations.
- Quality Control
The operational techniques and activities aimed at monitoring and measuring work processes and eliminating the causes of unsatisfactory outputs.
- RACI
Responsible, Accountable, Consulted, and Informed; a tool that can be used for identifying roles and responsibilities during an organizational change process; BioData Catalyst RACI.
- RFC
Request for Comment: a process that documents and enables effective interactions between stakeholders to support shared decision making.
- Risk Register
A tool used to continuously identify risk, risk response planning and status updates throughout the project lifecycle. This project risk register is the primary risk reporting tool, and is located in the Project Management Plan.
- Scientific use case
Defined in this project as an analysis of data from the designated sources that has relevance and value in the domain of health sciences; generally implementation- and software-agnostic.
- SF or SFP
BioData Catalyst Strategic Framework [Plan]; defines what the BioData Catalyst teams have accomplished up to this point, what we plan to accomplish in a timeline fashion, and milestones to track and measure implementation.
- SFTP
Secure File Transfer Protocol
- Software Developers Kit
A set of software development tools that allows the creation of applications for a certain software package, software framework, hardware platform, computer system, or similar development platform.
- Sprint
Term of art used in software development, referring to short, iterative cycles of development with continuous review of code through daily builds and end-of-sprint demos.
- Stack
Term of art referring to a suite of services that run in the cloud and enable ubiquitous, convenient, on-demand access to a shared pool of configurable computing resources.
- Steering Committee
Responsible for decision-making and communication in BioData Catalyst.
- STRIDES
Science & Technology Research Infrastructure for Discovery, Experimentation, and Sustainability
- Teams
Groups of people led by a Principal Investigator (PI), or PIs, who will complete milestones and produce deliverables. Each group has been assigned a name represented by an element from the periodic table.
- Tiger Teams
A diversified group of experts brought together to investigate, solve, build, or recommend possible solutions to unique situations or problems. Populated with mature experts who know what's at stake, what needs to be done, and how to work well with others; their strengths are diversity of knowledge, a single focus or purpose, cross-functional communications, decision-making sovereignty, and organizational agility.
- TOPMed
Trans-Omics for Precision Medicine. One of the primary data sets of the DCPPC.
- TOPMed DCC
TOPMed Data Coordinating Center
A provider-agnostic multi-cloud deployment architecture.
- User Narrative
Descriptions of a user interaction experience within the system from the perspective of a particular persona. User Narratives are further broken down into Features, Epics, and User Stories. Currently formulated into rough 6-month timelines to benchmark progress.
- User story
A description of a software feature from a technical/process-oriented perspective; a backlog item that describes a requirement or functionality for a user.
*Finest level of PM Monitoring
- VCF
Variant Call Format; see http://www.internationalgenome.org/wiki/Analysis/vcf4.0/
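A minimal sketch of reading one VCF data record: the eight fixed tab-separated columns, with the semicolon-delimited INFO field split into key/value pairs. The record values are invented, and the sketch assumes every INFO entry has a `key=value` form (real VCF also allows bare flags).

```python
# One invented VCF data line; columns are CHROM POS ID REF ALT QUAL FILTER INFO.
line = "chr1\t12345\trs123\tA\tG\t50\tPASS\tDP=100;AF=0.01"

fields = line.split("\t")
record = dict(zip(
    ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"], fields))
# Split INFO into a dict (assumes key=value entries only).
record["INFO"] = dict(kv.split("=") for kv in record["INFO"].split(";"))

print(record["CHROM"], record["POS"], record["INFO"]["DP"])  # chr1 12345 100
```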
- Virtual Machine
A composite of complete server hardware along with the operating system (OS), powered by a remote-access layer that allows end users to access their server globally via the Internet.
- VPC
Virtual Private Cloud
- Whitelist
A security measure to permit only an approved list of entities (the whitelist) to access a system.
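A toy sketch of the idea, with hypothetical entity IDs: access is granted only to entities that appear on the approved list.

```python
# Hypothetical approved-entity list, e.g. keyed by eRA Commons-style IDs.
APPROVED = {"user_alpha", "user_beta"}

def is_allowed(entity_id):
    """Permit access only if the entity is on the approved list."""
    return entity_id in APPROVED

print(is_allowed("user_alpha"))  # True
print(is_allowed("user_gamma"))  # False
```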
- Workflow
A sequence of processes, usually computational in this context, through which a user may analyze data.
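The idea can be sketched as a list of processing steps applied in order; the steps and data here are invented stand-ins for real analysis stages.

```python
# Two invented processing steps, applied sequentially to the data.
def normalize(values):
    """Scale values so the maximum becomes 1.0."""
    peak = max(values)
    return [v / peak for v in values]

def threshold(values):
    """Keep only values above 0.5."""
    return [v for v in values if v > 0.5]

steps = [normalize, threshold]  # the workflow, as an ordered pipeline
data = [1, 4, 2, 8]
for step in steps:
    data = step(data)
print(data)  # [1.0]
```

Real workflow engines (e.g. those consuming CWL or WDL) generalize this pattern with dependency tracking, containerized steps, and distributed execution.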
- Workspace
Areas to work on/with data within a platform. Examples: projects within Seven Bridges.
- Work Stream
A collection of related features; orthogonal to a User Narrative.
- [Amazon] EFS
[Amazon] Elastic File System; a simple, scalable, elastic file system for Linux-based workloads, for use with AWS Cloud services and on-premises resources.