Keith Community Mtg BioIT World 2016.pdf

The tranSMART Foundation: Welcome and Update
Keith Elliston, CEO

The tranSMART Foundation

• The tranSMART Foundation is a member-driven non-profit foundation developing an open-source community around the tranSMART translational research platform.
• The mission of the Foundation is to stimulate the growth and development of the translational research community and the tranSMART platform, and to enable the development of precision medicine.

Translational Research Data is Big Data

Populations

Athey and Omenn, 2009

Volume, Variety, Velocity

tranSMART Foundation Ethos: From the Linux Foundation Playbook

Open communication
• With an open community and publicly visible and accessible communication channels, anyone can join the community and meet hundreds of other community members just like them.

Licensing of work
• Every contribution to the community is licensed in such a way that it benefits the entire community. The fair licensing of all contributions adds a strong sense of confidence to the security of the community.

Open tools
• Anyone with an Internet connection and a computer can contribute. All of the development tools and documentation are entirely free and open to access.
• These provide a low barrier to entry and let new users play with the technology.

tranSMART Foundation 3C Committees
• Open SOURCE: Code Committee
• Open DATA: Content Committee
• Open SCIENCE: Community Committee

tranSMART OPEN SOURCE Platform
• Data Sources: Relational Databases, Flat Files, Spreadsheets
• ETL: Data Extraction, Data Curation, Data Transformation, Data Integration
• Databases: Relational Datawarehouse, Genome Datawarehouse, Clinical Datawarehouse, Imaging Datawarehouse
• tranSMART API Layer
• Data Analytics: Data Exploration, Data Visualization, Data Mining
• Applications: Cohort Selection and Analysis, Mutation Analysis, Biomarker Discovery

tranSMART: More than just a Platform, an approach to enable translational science
• Applications: Cohort Selection and Analysis, Mutation Analysis, Biomarker Discovery
• tranSMART API Layer
• Databases: Relational Datawarehouse, Genome Datawarehouse, Clinical Datawarehouse, Imaging Datawarehouse

tranSMART – A Platform and Community
• Open-source and open-data translational biomedical research community
• Scientists, Developers, Service Providers, Clinicians

A Community

The Foundation’s Goals
• Establish and sustain tranSMART as the preferred data sharing and analytics platform for translational biomedical research.
• Link academic, non-profit and corporate research communities for collaborative research facilitated by tranSMART.
• Align and grow a vibrant developer network around the scientific goals of the tranSMART community.
• Reduce barriers to entry through use of advanced technologies and an active marketplace.

Foundation Governance: Board and Committees
“a member-driven organization”
• Board of Directors (15 Members)
• Member Committees & Working Groups: Executive; Community; Finance / Audit; Marketing & Communications; Code; Architecture; Content (Shared Content & Open Data); Governance / Nominating
• Officers and staff: CEO; CFO & Treasurer; Chief Science Officer; Chief Technology Officer; Secretary; VP Industry Engineering; VP Business Development; VP Marketing; Administrative Asst
• External resources: External Counsel; Accounting Firm; Audit Firm
• Chart legend: Key Contracted Resource; Member Representative

Foundation Members
• Gold Members (5)
• Silver Members (16)
• Affiliate Members (3)

Board of Directors
• Gold
  – Ed Bowen
  – Christophe Gibeault
  – Juergen Hammer
  – Gil Omenn (Chairman)
  – Eric Perakslis
• Silver
  – Kees van Bochove
  – Sherry Cao
  – Brett Davis
  – Yike Guo
  – Bob Stanley
• At-Large
  – Matteo di Tommaso
  – Hiroaki Kitano
  – Gerrit Meijer
  – Jim Serum

Management Team
• Chief Executive Officer: Keith Elliston, PhD, Seneca Creek Research
• Chief Science Officer: Brian Athey, PhD, University of Michigan
• Secretary: Kevin Smith, MSIS, University of Michigan
• Vice President Marketing: Rudy Potenzone, PhD, iChemLabs, SciencePoint Solutions
• Chief Financial Officer and Treasurer: Steve Johnson, partner at CFGI
• VP Engineering: John O’Hara
• Vice President Business Development: Ron Guerriero

The tranSMART Foundation Progress
• Ann Arbor 2013, Paris 2013, Ann Arbor 2014, Amsterdam 2015, San Diego 2016
• tranSMART Foundation incorporated April 2013
• Releases: v1.0, v1.1, v1.2, v1.2+, v16.1
• Dates: Feb 2012, Nov 2013, Aug 2014, Oct 2015, Apr 2016
• Timeline axis: 2012 to 2016

tranSMART Platform Priorities

‘Commercial grade’ platform
• Installation process
• Stability and reliability
• Data loading
• Data availability

Innovative platform
• Improve the API
• Support research projects
• Integrate technologies
• Support innovative efforts

Innovative efforts
• ‘Full genome’ variants
• ‘Wearable sensor’ data

tranSMART Platform Vision
• Easy to install and bring up in basic mode
• Commercial Grade Core, with Extensibility
• Well Managed Codebase (PMC)
• Well Organized Developer Community
• Readily shareable data available

tranSMART Platform Vision Scorecard
• Easy to install and bring up in basic mode
• Commercial Grade Core, with Extensibility
• Well Managed Codebase (PMC)
• Well Organized Developer Community
• Readily shareable data available

Scorecard items, in slide order:
✓ Scripted install for Ubuntu/Postgres
○ High quality, commercial grade core
○ Scripted install for RedHat/Oracle
○ Extensible core
✓ Managed Foundation Master Branch on GitHub
○ Support for new feature development and customization
✓ Well defined release process
○ Uniform coding standards and technologies
✓ Centrally managed feature integration
○ Packaged install
✓ Regular and predictable releases
✓ Clear path to feature integration into releases
○ Coordinate a repository of ‘tranSMART-ready’ public and private data
○ Streamline data loading process

tranSMART Development and Release Roadmap
[Gantt chart: projects 16.1, 16.2, 17.1 and 17.2 on a quarterly axis from 3Q15 to 4Q17, each passing through Specification, Development and Release, with a "Today" marker. Chart labels: "Improved Installation, code governance"; "Improved ETL, XNAT, SmartR"; "Longitudinal Data, Scalable Genomics".]

tranSMART Roadmap

Phase I – 16.1 Project (1.2.5)
• Re-engineer install process
• Simplify Data loading process
• Implement bug fixes

Phase II – 16.2 Project (1.3)
• Automated Test, Build and install
• Automated code inspection
• Install and Upgrade ‘in place’
• Restructuring of the ETL process (public app store for data)
• Potential Additional features: SmartR, XNAT Imaging, GWAS Support

Phase III – 17.1 Project (v2) / Phase IV – 17.2 Project
• Database restructuring
  – I2B2 integration
  – Extended Genomic data support
• Federated access to private data
  – Private data Federation
• Spark / Parquet Architecture
• New applications and capabilities
• Data models and applications to support TA’s
  – Oncology
  – Neuroscience

16.1 Release Project Summary
• 16.1 Project Status
  – A new wiki area has been set up for the testing instances (under ‘Developing the platform’ on the main page)
  – Development and Release-candidate test instances
• 16.1 Release Summary
  – Scripted Install (supports Ubuntu/Postgres)
  – Digitally signed release artifacts (.tar files)
  – Extensive bug fixes and enhancements to existing functionality
• 16.1 is now open for beta testing
  – See the wiki page for links to test instances, instructions for reporting bugs, etc.
  – Beta test period extends until 4 Apr, followed by time to bugfix & build release artifacts
• 16.1 Production Release planned for April 25th

16.2 Development Project Summary
• 16.2 Project Status (Completing Development Phase)
  – Planned functional upgrades (completed developments):
    • SmartR (Luxembourg/ITTM)
    • XNAT Plugin (Erasmus)
    • Improved ICE tool and ETL Loading (Sanofi)
    • GWAS enhancements (Pfizer)
  – Awaiting Legal Approval
    • Omics Cohort Selection (JNJ)
  – Ubuntu/Postgres and RHEL / Oracle installation support
• 16.2 Release planned for Sept. 30th

17.1 Specification Project Process
• 17.1 Project Requirements Gathering
  – Community Interviews
  – Use Case Development
• Technology Evaluation
• 17.1 Development Proposal Solicitation
  – Harvard, Hyve, Deloitte
• Foundation Recommendations

17.1 tranSMART Community Priorities (1 of 2)

Topic: Platform Robustness and Performance (Priority: 3.77)
• Make the platform more supportable and upgradable
• Make queries more reliable and have better performance on scaling
• Robust and well-documented APIs
• Tune queries for Oracle backend
• Support for federated data model
• Improved ETL performance
• Security and access control

Topic: Longitudinal Data Support (Priority: 3.23)
• Ability to align clinic or data capture visits across trials
• Support unscheduled data (e.g., EHRs)
• Support for i2b2-style 'encounters' table
• Support for relative / elapsed time queries and comparison operators (e.g., patients with AE within 24 hours of dosing)
• Support for date and event selections in advanced workflows and visualizations
• Backward compatibility with existing study data, such that existing studies do not have to be reloaded

Topic: Cross Study Support (Priority: 3.10)
• Ability to merge data from multiple trials for analysis without losing data origins
• Allow concepts to be independent of studies so that the same concept can be utilized and compared across studies
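The relative / elapsed time requirement under Longitudinal Data Support ("patients with AE within 24 hours of dosing") can be illustrated with a minimal sketch. This is not tranSMART code: the event tuples, the `dose` and `AE` labels, and the function `patients_with_event_within` are all hypothetical names chosen for illustration, and the data is made up.

```python
from datetime import datetime, timedelta

# Hypothetical records: (patient_id, event_type, timestamp). Not real study data.
events = [
    ("P1", "dose", datetime(2016, 1, 10, 8, 0)),
    ("P1", "AE", datetime(2016, 1, 10, 20, 30)),   # 12.5 hours after dosing
    ("P2", "dose", datetime(2016, 1, 11, 9, 0)),
    ("P2", "AE", datetime(2016, 1, 13, 9, 0)),     # 48 hours after dosing
    ("P3", "dose", datetime(2016, 1, 12, 7, 0)),   # no adverse event recorded
]


def patients_with_event_within(events, anchor, target, window):
    """Return sorted patient IDs with a `target` event at most `window` after an `anchor` event."""
    anchors, targets = {}, {}
    for pid, kind, ts in events:
        if kind == anchor:
            anchors.setdefault(pid, []).append(ts)
        elif kind == target:
            targets.setdefault(pid, []).append(ts)

    selected = set()
    for pid, anchor_times in anchors.items():
        for a in anchor_times:
            # Keep the patient if any target event falls in [a, a + window].
            if any(timedelta(0) <= t - a <= window for t in targets.get(pid, [])):
                selected.add(pid)
    return sorted(selected)


cohort = patients_with_event_within(events, "dose", "AE", timedelta(hours=24))
print(cohort)
```

The comparison is on elapsed time between two events of the same patient rather than on calendar dates, which is what distinguishes this kind of query from a simple date filter.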

17.1 tranSMART Community Priorities (2 of 2)

Topic: Support for High Volume Variant Data (Priority: 3.00)
• Allow loading, querying, exporting, and cohort-building based on genetic variation data, whether derived from sequencing or other platforms (e.g., array-based assays)
• Support for whole-genome volumes of variants (10s of millions per patient) from 100s of thousands of samples
• Support all variant types and annotations supported by VCF 4.2 spec (including structural variants)
• Support for ARVADOS, and GA4GH API

Topic: Upgrade path / i2b2 integration (Priority: 3.00)
• Backward compatibility from previous versions of tranSMART
• Ensure date support from previous versions of i2b2
• i2b2 integration must not mean that data have to be loaded twice

Topic: Continuation of SmartR or other plugin visualization / analytic tool (e.g., Spotfire) (Priority: 2.85)
• Harmonize workflows: Need to maintain SmartR workflows and tranSMART workflows
• Ability to create a cohort and data set, then be able to apply multiple workflows against it

Topic: Support use of standard and internal proprietary ontologies (Priority: 2.85)
• Embedded support for ontologies to help make data curation (ETL) and cross study data queries and normalization easier

Topic: Better support for flexible ETL (Priority: 2.23)
• tranSMART should recommend preferred ETL tool
• Improved error handling
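Cohort-building from genetic variation data, as listed under High Volume Variant Data, can be sketched against the tab-separated layout of a VCF record (eight fixed columns, then FORMAT, then one column per sample). This is an illustrative sketch only, not a tranSMART or GA4GH API: the two lines of data, the sample names `S1` to `S3`, and the function `carriers` are hypothetical, and a real loader at whole-genome scale would use a dedicated VCF library and storage layer.

```python
# Two hypothetical VCF-style lines: the #CHROM header naming the samples,
# and one variant record with a GT (genotype) field per sample.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3"
record = "1\t12345\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/1\t1|1"


def carriers(header_line, record_line):
    """Return the sample names whose genotype has at least one non-reference allele."""
    samples = header_line.split("\t")[9:]      # sample columns start at index 9
    fields = record_line.split("\t")
    gt_index = fields[8].split(":").index("GT")  # position of GT within FORMAT

    result = []
    for name, sample_field in zip(samples, fields[9:]):
        genotype = sample_field.split(":")[gt_index]
        # Alleles are separated by "/" (unphased) or "|" (phased).
        alleles = genotype.replace("|", "/").split("/")
        # "0" is the reference allele and "." a missing call; anything else is an ALT allele.
        if any(a not in ("0", ".") for a in alleles):
            result.append(name)
    return result


cohort = carriers(header, record)
print(cohort)
```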

tranSMART Foundation Funding Model
• Modeled after Linux Foundation
  – Non-profit Collaborative Alliance
  – Advantages:
    • Enables the use of diverse software development resources
      – Contract, volunteer, grant funded
    • Provides key advantages to stakeholders
      – Control budget, spending, deliverables and timelines
      – Prioritize features and requirements
  – Disadvantages:
    • Distributed software development resources (contract and volunteer)
    • Minimal critical mass required to initiate
  – Examples:
    • OpenDaylight, R Consortium, Cloud Foundry, Automotive Grade Linux, etc.
    • Over 30 successful programs at Linux Foundation

tranSMART-Pro Alliance Structure
• tranSMART Foundation (management and oversight, and enabling infrastructure)
• tranSMART-Pro Alliance
  – Governing Board (drives business decisions and prioritization for Alliance)
  – End User Advisory Group (develops use cases & recommends new features and enhancements)
  – Technical Steering Committee (TSC) (drives Alliance technical direction)

How to Join the tranSMART-Pro Development Project
• Find out more about the Project at: www.transmartfoundation.org
• Contact Keith Elliston at the tranSMART Foundation: [email protected]

Introducing the tranSMART Corporate Sponsors • Platinum Corporate Sponsors