Wednesday Keynote: “Systems Medicine, Transformational Technologies and the Emergence of Predictive, Personalized, Preventive and Participatory (P4) Medicine”

 

Dr Leroy Hood, genomics pioneer

o   Medicine will become an information science over the next decade

o   Very cross-disciplinary now

o   Computation, technology, biology (the driver)

o   Need:

o   Global analyses (all genes etc)

o   Integrate multiscalar data types

o   Dynamic analyses

o   Hypothesis driven

o   Predictive and actionable models (explain emergent behaviour)

o   Discovery approaches are key

o   Quantification and parallelisation of the generation of biological information

o   Big issue: signal to noise challenges

o   Disease arises from disease-perturbed networks

o   Studying prion disease infection in mice

o   50 million data sets generated a few years ago from one experiment

o   Most of this was noise

o   Used a variety of genetically modified mice to narrow this down to 300 relevant genes

o   Mapped these 300 genes into four types of biological networks

o   Found new genes relevant to the pathogenic networks

o   Could then modify these four disease-perturbed networks with drugs to prevent the onset of the full-blown disease

o   Blood diagnostics is going to be huge

o   Can track the onset and development of disease by the changes in proteins in the blood

o   Can use this to:

§  Detect early

§  Stratify disease (breast cancer is four or five separate diseases)

§  Observe disease progression

§  Follow therapy

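The four uses above amount to tracking each patient's blood-protein levels over time and flagging deviations from an individual baseline. A toy sketch of that idea (the function name `flag_drift` and all values are hypothetical, not from the talk):

```python
from statistics import mean, stdev

def flag_drift(baseline, new_value, k=3.0):
    """Flag a protein whose new measurement lies more than k standard
    deviations from an individual's historical baseline levels."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(new_value - mu) > k * sigma

# Hypothetical longitudinal levels for one organ-specific blood protein
history = [10.1, 9.8, 10.3, 10.0, 9.9]
print(flag_drift(history, 10.2))  # within normal variation -> False
print(flag_drift(history, 14.0))  # large excursion -> True
```

Real diagnostics would of course use population reference ranges and disease-specific protein panels; this only illustrates the longitudinal-baseline idea.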
o   Working with Complete Genomics Inc.

o   Family genome sequencing yields much more accurate sequences (Mendelian inheritance constrains the possible genotypes)

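The Mendelian error-correction idea can be sketched: at each site, the child must inherit one allele from each parent, so a trio genotype call that violates this is probably a sequencing error (function name and example genotypes are illustrative, not from the talk):

```python
from itertools import product

def mendelian_consistent(child, mother, father):
    """True if the child's genotype (a pair of alleles) can be explained
    by inheriting one allele from each parent (ignoring de novo mutation)."""
    return any(
        sorted((m, f)) == sorted(child)
        for m, f in product(mother, father)
    )

# Hypothetical genotype calls at two sites: (child, mother, father)
calls = {
    "site1": (("A", "G"), ("A", "A"), ("G", "G")),  # consistent
    "site2": (("G", "G"), ("A", "A"), ("A", "A")),  # violates inheritance
}
for site, (child, mother, father) in calls.items():
    if not mendelian_consistent(child, mother, father):
        print(f"{site}: flagged as a probable sequencing error")
```

Averaged over millions of sites, filtering calls this way is what makes family sequencing more accurate than sequencing individuals in isolation.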
o   Making significant use of sequencing microfluidics (nanotech based)

o   Just needs a pin-prick of blood sample

o   In vitro molecular diagnostics

o   Six routine analyses on individual patients within 5-10 years:

o   Complete individual genome sequences (ideally at family level)

o   Complete individual cell genome sequences – targeting cancer

o   Sequencing 1000 transcriptomes from single cancer cells simultaneously in one DNA sequencing run

o   2500 organ-specific blood proteins – measured twice per year as a wellness assessment

o   Analyze 10000 B and T cells for the functional regions of their immune receptors

o   Analyze individual stem cells from each individual

o   Predicts that all this can significantly reduce the cost of healthcare

 

Towards climate modelling in the Exaflop era

 

David Randall (Colorado State University)

o   Atmosphere, ocean (and sea ice) and land surface (and vegetation)

o   Partial differential equations, spherical grids

o   Time steps of a few minutes (seconds in the Exaflop era)

o   Need very fast computers

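Those time steps follow from the CFL stability condition, which ties the explicit time step to the grid spacing: dt ≤ dx / c. A rough check, assuming a fast-wave speed of ~300 m/s (roughly the speed of sound / fast gravity waves — my assumption, not a figure from the talk):

```python
def max_stable_dt(dx_m, wave_speed_ms=300.0):
    """Largest explicit time step allowed by the CFL condition dt <= dx / c,
    for grid spacing dx_m (metres) and fastest resolved wave speed (m/s)."""
    return dx_m / wave_speed_ms

print(max_stable_dt(100_000))  # ~100 km grid -> ~333 s (a few minutes)
print(max_stable_dt(1_000))    # 1 km cloud-resolving grid -> ~3 s (seconds)
```

This is why halving the grid spacing roughly doubles the cost per simulated day twice over: more cells, and more time steps.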
o   GCM – general circulation model (also read as global climate model)

o   First 4 models: GFDL, UCLA, NCAR, Livermore

o   First global models came from the UK

o   Current challenges:

o   Global cloud-resolving models (grid resolution <= 1km)

o   Simulation of ice-age cycles

o   Interactive (land and ocean) biology

o   Greater machine parallelism is useful, though it doesn't help speed up models of a fixed size (e.g. modelling ice ages)

o   “The small stuff strongly affects the big stuff”

o   Dust particle ~10⁻⁶ m, storm ~10⁴ m in size

o   Parameterizations required to account for dust, individual rain drops etc

o   Multiscale modelling growing in popularity

o   Have scaled to 1km resolution on 82,000 cores on Jaguar (160,000 cores imminently)

o   Taking 17 seconds per time step

o   Want:

o   1km grid with 5s time step

o   256 layers up to 80km above surface

o   128B total cells

o   32 time stepped variables per grid cell

o   32TB storage for time stepped fields

o   100 TB of storage all together

o   Will require ~500 sustained TFLOPS to simulate one day per wall-clock day

o   Could run this at 2000 simulated days per day (5 years per day) on a sustained Exaflop machine (3 weeks to simulate a century)

o   Could generate 10 PBytes/day of output

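The figures in the wish-list above are mutually consistent, which a quick back-of-envelope check confirms (Earth's surface area and 8-byte variables are my assumptions):

```python
EARTH_AREA_M2 = 5.1e14            # surface area of the Earth
columns = EARTH_AREA_M2 / 1e6     # 1 km x 1 km columns -> ~5.1e8
cells = columns * 256             # 256 layers -> ~1.3e11 (the quoted ~128B)

# 32 time-stepped variables per cell at 8 bytes each -> ~33 TB (quoted: 32 TB)
state_tb = cells * 32 * 8 / 1e12

# 500 sustained TFLOPS per simulated day per wall-clock day means a
# sustained Exaflop runs 1e18 / 5e14 = 2000 simulated days per day
speedup = 1e18 / 500e12
century_days = 365 * 100 / speedup  # ~18 wall-clock days (~3 weeks)

print(f"{cells:.2e} cells, {state_tb:.0f} TB state, "
      f"{speedup:.0f} sim-days/day, century in {century_days:.1f} days")
```

So the "5 years per day" and "3 weeks to simulate a century" claims both fall straight out of the 500 TFLOPS-per-simulated-day requirement.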
o   Parameterization won’t go away

 

Green Flash: Exascale Computing for Ultra-High Resolution Climate Modeling

Michael Wehner, LBNL

o   Building a model of a global cloud resolving model (!)

o   Dynamics, fast physics, slow physics and multi-grid solver

o   Want to simulate 1000 times faster than real-time

o   176M vertices in a 12 division grid (1.75km resolution)

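If the "12 division grid" is a standard geodesic grid built by recursively bisecting an icosahedron (my assumption — the exact construction wasn't specified), the usual vertex-count formula gives a number in the same ballpark as the quoted 176M:

```python
def icosahedral_vertices(divisions):
    """Vertex count of a geodesic grid built by recursively bisecting
    an icosahedron: 10 * 4**n + 2 vertices after n bisections."""
    return 10 * 4**divisions + 2

print(icosahedral_vertices(12))  # 167,772,162 -> ~168M vertices
```

At ~5.1e14 m² of Earth surface, ~1.7e8 columns works out to roughly 1.75 km between grid points, matching the stated resolution.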
o   Considering application-specific computer designs for climate modelling

o   Climate codes typically run at 5% of peak FLOPS or less (but what about bandwidth?)

o   Going to make extensive use of auto-tuning of software and hardware

o   Also using consumer-electronics components

o   Aiming for 100X energy efficiency improvement over mainstream HPC approach

o   Reduce power consumption by reducing waste

o   E.g. speculative execution, stalls, useless work

o   Considering a machine with:

o   ~21M cores on ~163,000 CPUs, each with 128 cores

o   Sustained speed of 12 PFLOPS

o   Needs ~600 MFLOPS sustained per core (21M cores × 600 MFLOPS ≈ 12.5 PFLOPS)

o   Bandwidth of 78 MB/s per core

o   Estimate the whole system would consume about 5MW

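The machine figures above hang together arithmetically, which is easy to verify from the quoted numbers alone:

```python
cpus = 163_000
cores_per_cpu = 128
mflops_per_core = 600.0
power_watts = 5e6  # the estimated 5 MW

cores = cpus * cores_per_cpu                             # ~20.9M (quoted ~21M)
sustained_pflops = cores * mflops_per_core * 1e6 / 1e15  # ~12.5 PFLOPS
gflops_per_watt = sustained_pflops * 1e6 / power_watts   # ~2.5 GFLOPS/W

print(f"{cores/1e6:.1f}M cores, {sustained_pflops:.1f} PFLOPS sustained, "
      f"{gflops_per_watt:.1f} GFLOPS/W")
```

The ~2.5 sustained GFLOPS/W figure is where the claimed ~100× energy-efficiency gain over mainstream HPC of the era comes from.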
§  I think they were being very naïve about this implementation, but the core idea is sound