The tranSMART Foundation: Welcome
An Update
Keith Elliston, CEO
The tranSMART Foundation
• The tranSMART Foundation is a member-driven non-profit foundation developing an open-source community around the tranSMART translational research platform.
• The mission of the Foundation is to stimulate the growth and development of the translational research community and the tranSMART platform, and to enable the development of precision medicine.
Translational Research Data is Big Data
Populations
Athey and Omenn, 2009
Volume, Variety, Velocity
tranSMART Foundation Ethos: From the Linux Foundation Playbook
Open communication
• With an open community and publicly visible and accessible communication channels, anyone can join the community and meet hundreds of other community members just like them.
Licensing of work
• Every contribution to the community is licensed in such a way that it benefits the entire community. The fair licensing of all contributions adds a strong sense of confidence to the security of the community.
Open tools
• Anyone with an Internet connection and a computer can contribute. All of the development tools and documentation are entirely free and open to access.
• These provide a low barrier to entry and let new users play with the technology.
tranSMART Foundation 3C Committees
• Open SOURCE: Code Committee
• Open DATA: Content Committee
• Open SCIENCE: Community Committee
tranSMART OPEN SOURCE Platform
(Architecture diagram)
• Data Sources: Relational Databases, Flat Files, Spreadsheets
• ETL: Data Extraction, Data Curation, Data Transformation, Data Integration
• Databases: Relational Datawarehouse, Genome Datawarehouse, Clinical Datawarehouse, Imaging Datawarehouse
• tranSMART API Layer
• Data Analytics: Data Exploration, Data Visualization, Data Mining
• Applications: Cohort Selection and Analysis, Mutation Analysis, Biomarker Discovery
tranSMART: More than just a Platform
An approach to enable translational science
tranSMART – A Platform and Community
• Open-source and open-data translational biomedical research community
• Scientists, Developers, Service Providers, Clinicians
(Diagram: Applications (Cohort Selection and Analysis, Mutation Analysis, Biomarker Discovery); tranSMART API Layer; Databases (Relational, Genome, Clinical and Imaging Datawarehouses); A Community)
The Foundation’s Goals
• Establish and sustain tranSMART as the preferred data sharing and analytics platform for translational biomedical research.
• Link academic, non-profit and corporate research communities for collaborative research facilitated by tranSMART.
• Align and grow a vibrant developer network around the scientific goals of the tranSMART community.
• Reduce barriers to entry through use of advanced technologies and an active marketplace.
Foundation Governance Board and Committees
Member Committees & Working Groups
Executive
Community
Finance / Audit
Marketing & Communications
Code
Architecture
Content (Shared Content & Open Data)
Governance / Nominating
Board of Directors (15 Members)
External Counsel
CFO & Treasurer
Chief Science Officer
Accounting Firm
Secretary
Chief Technology Officer
VP Industry Engineering
VP Business Development
CEO
Audit Firm
VP Marketing
Administrative Asst
Key Contracted Resource
Member Representative
“a member-driven organization”
Foundation Members Gold Members (5)
Silver Members (16)
Affiliate Members (3)
Board of Directors
• Gold
  – Ed Bowen
  – Christophe Gibeault
  – Juergen Hammer
  – Gil Omenn (Chairman)
  – Eric Perakslis
• Silver
  – Kees van Bochove
  – Sherry Cao
  – Brett Davis
  – Yike Guo
  – Bob Stanley
• At-Large
  – Matteo di Tommaso
  – Hiroaki Kitano
  – Gerrit Meijer
  – Jim Serum
Management Team • Chief Executive Officer Keith Elliston, PhD, Seneca Creek Research
• Chief Science Officer Brian Athey, PhD, University of Michigan
• Secretary Kevin Smith, MSIS, University of Michigan
• Vice President Marketing Rudy Potenzone, PhD, iChemLabs, SciencePoint Solutions
• Chief Financial Officer and Treasurer Steve Johnson, partner at CFGI
• VP Engineering John O’Hara
• Vice President Business Development Ron Guerriero
The tranSMART Foundation Progress
(Timeline, 2012 to 2016)
• Meetings: Ann Arbor 2013, Paris 2013, Ann Arbor 2014, Amsterdam 2015, San Diego 2016
• tranSMART Foundation incorporated April 2013
• Releases: v1.0 (Feb 2012), v1.1 (Nov 2013), v1.2 (Aug 2014), v1.2+ (Oct 2015), v16.1 (Apr 2016)
tranSMART Platform Priorities
‘Commercial grade’ platform
• Installation process
• Stability and reliability
• Data loading
• Data availability
Innovative platform
• Improve the API
• Support research projects
• Integrate technologies
• Support innovative efforts
Innovative Efforts
• ‘Full genome’ variants
• ‘Wearable sensor’ data
tranSMART Platform Vision
1. Easy to install and bring up in basic mode
2. Commercial Grade Core, with Extensibility
3. Well Managed Codebase (PMC)
4. Well Organized Developer Community
5. Readily shareable data available
tranSMART Platform Vision Scorecard
1. Easy to install and bring up in basic mode
  ✓ Scripted install for Ubuntu/Postgres
  □ Scripted install for RedHat/Oracle
  □ Packaged install
2. Commercial Grade Core, with Extensibility
  □ High quality, commercial grade core
  □ Extensible core
3. Well Managed Codebase (PMC)
  ✓ Managed Foundation Master Branch on GitHub
  ✓ Well defined release process
  ✓ Centrally managed feature integration
  ✓ Regular and predictable releases
4. Well Organized Developer Community
  □ Support for new feature development and customization
  □ Uniform coding standards and technologies
  ✓ Clear path to feature integration into releases
5. Readily shareable data available
  □ Coordinate a repository of ‘tranSMART-ready’ public and private data
  □ Streamline data loading process
tranSMART Development and Release Roadmap
(Gantt chart, 3Q15 to 4Q17, with "Today" marked; projects 16.1, 16.2, 17.1 and 17.2 are shown with Specification, Development and Release phases)
• 16.1: Improved Installation, code governance
• 16.2: Improved ETL, XNAT, SmartR
• 17.1: Longitudinal Data, Scalable Genomics
tranSMART Roadmap
Phase I – 16.1 Project (1.2.5)
• Re-engineer install process
• Simplify data loading process
• Implement bug fixes
Phase II – 16.2 Project (1.3)
• Automated Test, Build and Install
• Automated code inspection
• Install and Upgrade ‘in place’
• Restructuring of the ETL process (public app store for data)
• Potential additional features: SmartR, XNAT Imaging, GWAS Support
Phase III – 17.1 Project (v2)
• Database restructuring
  – i2b2 integration
  – Extended Genomic data support
• Federated access to private data
  – Private data Federation
• Spark / Parquet Architecture
Phase IV – 17.2 Project
• New applications and capabilities
• Data models and applications to support TAs
  – Oncology
  – Neuroscience
16.1 Release Project Summary
• 16.1 Project Status
  – A new wiki area has been set up for the testing instances (under ‘Developing the platform’ on the main page)
  – Development and Release-candidate test instances
• 16.1 Release Summary
  – Scripted Install (supports Ubuntu/Postgres)
  – Digitally signed release artifacts (.tar files)
  – Extensive bug fixes and enhancements to existing functionality
• 16.1 is now open for beta testing
  – See the wiki page for links to test instances, instructions for reporting bugs, etc.
  – Beta test period extends until 4 Apr, followed by time to bugfix & build release artifacts
• 16.1 Production Release planned for April 25th
16.2 Development Project Summary
• 16.2 Project Status (Completing Development Phase)
  – Planned functional upgrades (completed developments):
    • SmartR (Luxembourg/ITTM)
    • XNAT Plugin (Erasmus)
    • Improved ICE tool and ETL Loading (Sanofi)
    • GWAS enhancements (Pfizer)
  – Awaiting Legal Approval
    • Omics Cohort Selection (JNJ)
  – Ubuntu/Postgres and RHEL / Oracle installation support
• 16.2 Release planned for Sept. 30th
17.1 Specification Project Process
• 17.1 Project Requirements Gathering
  – Community Interviews
  – Use Case Development
• Technology Evaluation
• 17.1 Development Proposal Solicitation
  – Harvard, Hyve, Deloitte
• Foundation Recommendations
17.1 tranSMART Community Priorities (1 of 2)

Topic: Platform Robustness and Performance (Priority: 3.77)
• Make the platform more supportable and upgradable
• Make queries more reliable and have better performance on scaling
• Robust and well-documented APIs
• Tune queries for Oracle backend
• Support for federated data model
• Improved ETL performance
• Security and access control

Topic: Longitudinal Data Support (Priority: 3.23)
• Ability to align clinic or data capture visits across trials
• Support unscheduled data (e.g., EHRs)
• Support for i2b2-style 'encounters' table
• Support for relative / elapsed time queries and comparison operators (e.g., patients with AE within 24 hours of dosing)
• Support for date and event selections in advanced workflows and visualizations
• Backward compatibility with existing study data, such that existing studies do not have to be reloaded

Topic: Cross Study Support (Priority: 3.10)
• Ability to merge data from multiple trials for analysis without losing data origins
• Allow concepts to be independent of studies so that the same concept can be utilized and compared across studies
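The relative / elapsed time query requested under Longitudinal Data Support (patients with an AE within 24 hours of dosing) can be illustrated with a minimal Python sketch. The record layout, event names and sample values below are hypothetical and are not the tranSMART or i2b2 schema; the sketch only shows the kind of comparison such a query performs.

```python
from datetime import datetime, timedelta

# Hypothetical observation records: (patient_id, event_type, timestamp).
# Field names and values are illustrative only, not the tranSMART schema.
EVENTS = [
    ("P1", "dose", datetime(2016, 1, 10, 8, 0)),
    ("P1", "adverse_event", datetime(2016, 1, 10, 20, 30)),
    ("P2", "dose", datetime(2016, 1, 11, 9, 0)),
    ("P2", "adverse_event", datetime(2016, 1, 13, 9, 0)),
    ("P3", "dose", datetime(2016, 1, 12, 7, 0)),
]


def patients_with_event_within(events, anchor, target, window):
    """Return sorted patient ids that have a `target` event no later than
    `window` after one of their own `anchor` events."""
    # Collect every anchor timestamp per patient.
    anchors = {}
    for pid, kind, ts in events:
        if kind == anchor:
            anchors.setdefault(pid, []).append(ts)

    hits = set()
    for pid, kind, ts in events:
        if kind != target:
            continue
        for start in anchors.get(pid, []):
            # Elapsed time is measured relative to the anchor event.
            if timedelta(0) <= ts - start <= window:
                hits.add(pid)
                break
    return sorted(hits)


cohort = patients_with_event_within(
    EVENTS, "dose", "adverse_event", timedelta(hours=24)
)
print(cohort)
```

In this sample only P1 qualifies: the adverse event falls 12.5 hours after dosing, while P2's falls 48 hours after and P3 has none recorded.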
17.1 tranSMART Community Priorities (2 of 2)

Topic: Support for High Volume Variant Data (Priority: 3.00)
• Allow loading, querying, exporting, and cohort-building based on genetic variation data, whether derived from sequencing or other platforms (e.g., array-based assays)
• Support for whole-genome volumes of variants (10s of millions per patient) from 100s of thousands of samples
• Support all variant types and annotations supported by VCF 4.2 spec (including structural variants)
• Support for ARVADOS and GA4GH API

Topic: Upgrade path / i2b2 integration (Priority: 3.00)
• Backward compatibility from previous versions of tranSMART
• Ensure date support from previous versions of i2b2
• i2b2 integration must not mean that data have to be loaded twice

Topic: Continuation of SmartR or other plugin visualization / analytic tool (e.g., Spotfire) (Priority: 2.85)
• Harmonize workflows: need to maintain SmartR workflows and tranSMART workflows
• Ability to create a cohort and data set, then be able to apply multiple workflows against it

Topic: Support use of standard and internal proprietary ontologies (Priority: 2.85)
• Embedded support for ontologies to help make data curation (ETL) and cross study data queries and normalization easier

Topic: Better support for flexible ETL (Priority: 2.23)
• tranSMART should recommend preferred ETL tool
• Improved error handling
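As an illustration of cohort-building based on genetic variation data (the High Volume Variant Data topic), the sketch below reads a tiny VCF-style text and lists the samples carrying a non-reference allele at a given position. The sample names, genotypes and the `carriers` helper are made up for illustration; this is not tranSMART code, and an in-memory scan like this would not scale to whole-genome volumes, which would need indexed storage.

```python
# Tiny VCF-style text with three hypothetical samples (S1..S3).
# Columns follow the VCF layout: eight fixed fields, FORMAT, then samples.
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3
1\t12345\trs1\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\t1/1
2\t67890\trs2\tC\tT\t60\tPASS\t.\tGT\t0/0\t0|1\t./.
"""


def carriers(vcf_text, chrom, pos):
    """Return the sample names with at least one non-reference allele
    at (chrom, pos); an empty list if the site is absent."""
    samples = []
    for line in vcf_text.splitlines():
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow FORMAT
            continue
        if fields[0] != chrom or int(fields[1]) != pos:
            continue
        # Locate the genotype (GT) key within the FORMAT column.
        gt_index = fields[8].split(":").index("GT")
        found = []
        for name, call in zip(samples, fields[9:]):
            genotype = call.split(":")[gt_index]
            # "/" separates unphased alleles, "|" phased ones.
            alleles = genotype.replace("|", "/").split("/")
            # "0" is the reference allele, "." a missing call.
            if any(allele not in ("0", ".") for allele in alleles):
                found.append(name)
        return found
    return []


print(carriers(VCF_TEXT, "1", 12345))
print(carriers(VCF_TEXT, "2", 67890))
```

The resulting sample list could then be intersected with a clinically defined cohort; how that join would be done inside the platform is left open here.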
tranSMART Foundation Funding Model
• Modeled after Linux Foundation
  – Non-profit Collaborative Alliance
  – Advantages:
    • Enables the use of diverse software development resources
      – Contract, Volunteer, grant funded
    • Provides key advantages to stakeholders
      – Control budget, spending, deliverables and timelines
      – Prioritize features and requirements
  – Disadvantages:
    • Distributed software development resources (contract and volunteer)
    • Minimal critical mass required to initiate
  – Examples:
    • openDaylight, R-Consortium, CloudFoundry, Automotive Grade Linux, etc.
    • Over 30 successful programs at Linux Foundation
tranSMART-Pro Alliance Structure
• tranSMART Foundation (management and oversight, and enabling infrastructure)
• tranSMART-Pro Alliance
  – Governing Board (drives business decisions and prioritization for Alliance)
  – End User Advisory Group (develops use cases & recommends new features and enhancements)
  – Technical Steering Committee (TSC) (drives Alliance technical direction)
How to Join the tranSMART-Pro Development Project
• Find out more about the Project at: www.transmartfoundation.org
• Contact Keith Elliston at the tranSMART Foundation: [email protected]
Introducing the tranSMART Corporate Sponsors • Platinum Corporate Sponsors