BIMCV: The Perfect "Big Data" Storm.
Collision of Petabytes of Population Image Data, Millions of Hardware
Devices and Thousands of Software Tools.
National e-Infrastructures
Maria de la Iglesia, PhD. http://ceib.san.gva.es
OVERVIEW 
• Big Data 
• Strategic Vision of Big Data in EU 
• Strategic Vision of Big Data in US 
• Big Data in Neuroimaging 
• Population Imaging 
• EuroBioimaging – BIMCV – Valencia Node 
• Neuroimaging 
• Relevant facts
Big Data
Big data techniques and technologies
• Techniques for analyzing big data
– A/B testing
– Association rule learning
– Classification
– Cluster analysis
– Crowdsourcing
– Data fusion and data integration (e.g., signal processing, natural language processing)
– Data mining
– Ensemble learning
– Genetic algorithms
– Machine learning
– Natural language processing (NLP)
– Neural networks (e.g., for pattern recognition)
– Network analysis
– Optimization
– Pattern recognition
– Predictive modeling
– Regression
– Signal processing (e.g., time series analysis, data fusion)
– Spatial analysis
– Statistics
• Big Data technologies
– Bigtable: a proprietary distributed database system built on the Google File System; the inspiration for HBase
– Business intelligence (BI): BI tools are often used to read data that have previously been stored in a data warehouse or data mart
– Cassandra: an open source (free) database management system designed to handle huge amounts of data on a distributed system; originally developed at Facebook and now managed as a project of the Apache Software Foundation
– Cloud computing
– Data mart
– Data warehouse (loaded using ETL: extract, transform, and load)
– Distributed system
– Dynamo
– ETL
– Google File System
– Hadoop
– HBase
– MapReduce
– Mashup
– Non-relational database
– R
– Relational database
– Semi-structured data
– SQL
– Stream processing
– Structured data
– Unstructured data
– Visualization
  • Tag cloud
  • Clustergram
  • History flow
  • Spatial information flow
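MapReduce, listed among the technologies above and implemented at scale by Hadoop, splits a job into a map step that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce step that aggregates each group. The following is a minimal single-process sketch of that flow, counting imaging studies per modality; the records and modality codes are hypothetical, and a real Hadoop job distributes each phase across many machines.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator

# Hypothetical input record: (study_id, modality) from an imaging archive.
Record = tuple[str, str]


def map_phase(record: Record) -> Iterator[tuple[str, int]]:
    """Emit one (modality, 1) pair per study."""
    _study_id, modality = record
    yield modality, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Aggregate all values that share a key."""
    return key, sum(values)


def map_reduce(records: Iterable[Record]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle_phase(mapped).items())


if __name__ == "__main__":
    studies = [("s1", "CT"), ("s2", "MR"), ("s3", "CT"), ("s4", "US"), ("s5", "MR")]
    print(map_reduce(studies))
```

Because each map call and each reduce call depends only on its own input, a framework can run them in parallel on different nodes. That independence is what the model buys.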
VISUALIZATION: Tag cloud
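A tag cloud encodes how often each term occurs as its font size. The following is a minimal sketch. It assumes simple lowercase tokenization, a small hypothetical stopword list, and linear scaling between a minimum and maximum point size; the 10 pt and 48 pt bounds are arbitrary choices.

```python
import re
from collections import Counter

# Hypothetical stopword list; real tools use much larger ones.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is"}


def tag_cloud_sizes(text: str, top_n: int = 20,
                    min_pt: float = 10.0, max_pt: float = 48.0) -> dict[str, float]:
    """Map the top_n most frequent terms to font sizes scaled linearly by frequency."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    ranked = Counter(words).most_common(top_n)
    if not ranked:
        return {}
    high, low = ranked[0][1], ranked[-1][1]
    span = (high - low) or 1  # avoid division by zero when all counts are equal
    return {w: min_pt + (c - low) * (max_pt - min_pt) / span for w, c in ranked}


if __name__ == "__main__":
    sample = "brain image brain data the brain image"
    for word, size in tag_cloud_sizes(sample).items():
        print(f"{word}: {size:.1f}pt")
```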
VISUALIZATION: Clustergram
VISUALIZATION: History flow
VISUALIZATION: Spatial information flow
Strategic Vision of Big Data in EU
How Is the European Union Responding to Big Data?
Panel: Personalized Medicine in the 
Era of Big Data 
EHTEL Symposium 
Tapani Piha 
• Head of Unit for eHealth and Technology 
Assessment 
European Commission 
DG Health and Consumers 
Health Systems and Products
How does Big Data link to Personalized Medicine?
•Big Data refers to a collection of data sets so large and complex that it is impossible to process them with the usual databases and tools
•The data is gathered (most of the time) by people just living their lives (e.g. using mobile phones, the internet, driving cars, paying with bank cards)
•Big data is used in the private sector (e.g. Google) and in the public sector (e.g. NSA)
Big Data use in public health & 
health care? 
•Research: “In the last five years, more scientific data has been generated than in the entire history of mankind”¹
•Health care: more evidence about personalized treatment, better selection of the right provider, better-equipped health care providers (e.g. IBM's Watson)
•Public health: better personalized lifestyle information for citizens, earlier detection of epidemics, and more and quicker access to epidemiological information
¹ Winston Hide (2012), “The Promise of Big Data,” Harvard Public Health
Commission action on Big Data 
•BIG-project: multi-sectorial initiative started in 
2011 to promote adoption of earlier waves of big 
data technology and contribute to EU 
competitiveness; 
•Green paper on mHealth: to assess the market and further clarify what is needed in the legal framework concerning mHealth
•Study in the Health Programme: to assess the usage and adoption of big data programs for (public) health systems within the EU.
Strategic Vision of Big Data in US
How Is the U.S. Responding?
National Institute of Standards and Technology (NIST)
NIST is an agency of the U.S. Department of Commerce. 
NIST Big Data Public Working Group 
Big Data PWG Overview Presentation 
September 30, 2013 
Wo Chang, NIST 
Robert Marcus, ET-Strategies 
Chaitanya Baru, UC San Diego
Agenda 
• Why Big Data? Why NIST? 
• NBD-PWG Charter 
• Overall Workplan 
• Subgroups Charter and Deliverables 
– Use Case and Requirements SG 
– Definitions and Taxonomies SG 
– Reference Architecture SG 
– Security and Privacy SG 
– Technology Roadmap SG 
• Next Steps 
9/30/13 NBD-PWG Overview 
Why Big Data? Why NIST? 
• Why Big Data? There is broad agreement among commercial, academic, and government leaders about the remarkable potential of “Big Data” to spark innovation, fuel commerce, and drive progress.
• Why NIST? (a) A recommendation from the January 15-17, 2013 Cloud/Big Data Forum, and (b) a lack of consensus on some important, fundamental questions that is confusing potential users and holding back progress. Questions such as:
– What are the attributes that define Big Data solutions? 
– How is Big Data different from the traditional data environments and related 
applications that we have encountered thus far? 
– What are the essential characteristics of Big Data environments? 
– How do these environments integrate with currently deployed architectures? 
– What are the central scientific, technological, and standardization challenges that 
need to be addressed to accelerate the deployment of robust Big Data solutions? 
NBD-PWG is being launched to address these questions and is charged to develop 
consensus definitions, taxonomies, secure reference architecture, and technology roadmap 
for Big Data that can be embraced by all sectors. 
NBD-PWG Deliverables 
Working Drafts version 1.0 for 
1. Big Data Definitions 
2. Big Data Taxonomies 
3. Big Data Requirements 
4. Big Data Security and Privacy Requirements 
5. Big Data Architectures White Paper Survey 
6. Big Data Reference Architectures 
7. Big Data Security and Privacy Reference Architectures 
8. Big Data Technology Roadmap 
NBD-PWG Workplan 
Big Data Ecosystem in One Sentence 
• Use clouds running data analytics to collaboratively process Big Data and solve problems in X-Informatics (or e-X)
• X = Astronomy, Biology, Biomedicine, Business, Chemistry, Climate, Crisis, Earth Science, Energy, Environment, Finance, Health, Intelligence, Lifestyle, Marketing, Medicine, Pathology, Policy, Radar, Security, Sensor, Social, Sustainability, Wealth and Wellness, with more fields (e.g., physics) defined implicitly
• Spans industry and science (research)
• Education: Data Science; see recent New York Times articles
• http://datascience101.wordpress.com/2013/04/13/new-york-times-data-science-articles/
Social Informatics
Visual & Decision Informatics
Big Data Definition 
• There is more consensus on the definition of Data Science than on that of Big Data
• Big Data refers to digital data volume, velocity and/or variety that:
– enable novel approaches to frontier questions previously inaccessible or impractical using current or conventional methods; and/or
– exceed the storage capacity or analysis capability of current or conventional methods and systems; and
– differentiate by storing and analyzing population data rather than samples;
– require management that scales across coupled horizontal resources
Vendor-neutral and Technology-agnostic Proposals 
Data Processing Flow 
M0039 
Data Transformation Flow 
M0017 
IT Stack 
M0047 
Electronic Medical Record (EMR) Data I 
• Application: Large national initiatives around health data are emerging, and 
include developing a digital learning health care system to support 
increasingly evidence-based clinical decisions with timely, accurate, and up-to-date
patient-centered clinical information; using electronic observational
clinical data to efficiently and rapidly translate scientific discoveries into 
effective clinical treatments; and electronically sharing integrated health 
data to improve healthcare process efficiency and outcomes. These key 
initiatives all rely on high-quality, large-scale, standardized and aggregate 
health data. One needs advanced methods for normalizing patient, 
provider, facility and clinical concept identification within and among 
separate health care organizations to enhance models for defining and 
extracting clinical phenotypes from non-standard discrete and free-text 
clinical data using feature selection, information retrieval and machine 
learning decision-models. One must leverage clinical phenotype data to 
support cohort selection, clinical outcomes research, and clinical decision 
support. 
PP, Fusion, S/Q, Index Parallelism Streaming over EMR (a set per person), viewers
Electronic Medical Record (EMR) Data II 
• Current Approach: Clinical data from more than 1,100 discrete logical, operational healthcare sources in the Indiana Network for Patient Care (INPC), the nation's largest and longest-running health information exchange. These data describe more than 12 million patients and more than 4 billion discrete clinical observations (> 20 TB of raw data). Between 500,000 and 1.5 million new real-time clinical transactions are added per day.
• Futures: Teradata, PostgreSQL and MongoDB supporting information 
retrieval methods to identify relevant clinical features (tf-idf, latent 
semantic analysis, mutual information). Natural Language Processing 
techniques to extract relevant clinical features. Validated features will 
be used to parameterize clinical phenotype decision models based on 
maximum likelihood estimators and Bayesian networks. Decision 
models will be used to identify a variety of clinical phenotypes such as 
diabetes, congestive heart failure, and pancreatic cancer. 
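tf-idf, one of the information retrieval methods named above, weights a term by how often it appears in one document relative to how many documents contain it. Terms common to every clinical note therefore score low. The following is a minimal sketch with whitespace tokenization and the plain log(N/df) inverse document frequency. The notes are invented, and this is not the INPC pipeline.

```python
import math
from collections import Counter


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Return one {term: tf-idf score} dict per document.

    tf = term count / document length; idf = log(N / number of docs containing term).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        scores.append({term: (c / length) * math.log(n_docs / doc_freq[term])
                       for term, c in counts.items()})
    return scores


if __name__ == "__main__":
    notes = [  # hypothetical free-text snippets
        "elevated glucose insulin started",
        "chest pain troponin elevated",
        "insulin dose adjusted glucose stable",
    ]
    for i, s in enumerate(tfidf(notes)):
        top = sorted(s.items(), key=lambda kv: kv[1], reverse=True)[:2]
        print(i, top)
```

Production systems typically add stemming, negation handling, and smoothed idf variants. The sketch keeps only the core weighting.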
Pathology Imaging/ Digital Pathology I 
• Application: Digital pathology imaging is an emerging field where examination of high 
resolution images of tissue specimens enables novel and more effective ways for 
disease diagnosis. Pathology image analysis segments massive numbers (millions per image) of
spatial objects such as nuclei and blood vessels, represented by their boundaries,
along with many extracted image features from these objects. The derived information 
is used for many complex queries and analytics to support biomedical research and 
clinical diagnosis. 
MR, MRIter, PP, Classification Streaming Parallelism over Images
Pathology Imaging/ Digital Pathology II 
• Current Approach: 1 GB raw image data + 1.5 GB analytical results per 2D image. MPI is used for
image analysis; MapReduce + Hive with a spatial extension runs on supercomputers and
clouds; GPUs are used effectively. Figure 3 of section 2.12 shows the architecture of
Hadoop-GIS, a spatial data warehousing system over MapReduce to support spatial 
analytics for analytical pathology imaging. 
• Futures: Recently, 3D pathology imaging has been made possible through 3D laser technologies or by serially sectioning hundreds of tissue sections onto slides and scanning them into digital images. Segmenting 3D microanatomic objects from registered serial images could produce tens of millions of 3D objects from a single image. This provides a deep “map” of human tissues for next-generation diagnosis. 1 TB raw image data + 1 TB analytical results per 3D image, and 1 PB of data per moderate-sized hospital per year.
Architecture of Hadoop-GIS, a spatial data warehousing system over MapReduce 
to support spatial analytics for analytical pathology imaging
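Hadoop-GIS answers spatial queries by partitioning image space into tiles, so that each tile can be processed independently. The sketch below illustrates that partition-then-join idea on a single machine. It matches nucleus bounding boxes from two hypothetical segmentation runs and scores each overlapping pair by intersection over union. The tile size and box format are assumptions, and real pathology pipelines work with polygon boundaries rather than boxes.

```python
from collections import defaultdict
from typing import Iterator

Box = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


def tiles_for(box: Box, tile: float) -> Iterator[tuple[int, int]]:
    """Yield every grid tile the box touches (the partitioning 'map' step)."""
    xmin, ymin, xmax, ymax = box
    for tx in range(int(xmin // tile), int(xmax // tile) + 1):
        for ty in range(int(ymin // tile), int(ymax // tile) + 1):
            yield tx, ty


def area(b: Box) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    if inter == 0.0:
        return 0.0
    return inter / (area(a) + area(b) - inter)


def spatial_join(run_a: list[Box], run_b: list[Box],
                 tile: float = 256.0) -> dict[tuple[int, int], float]:
    """Return {(i, j): IoU} for every overlapping pair across the two runs."""
    buckets: dict[tuple[int, int], tuple[list[int], list[int]]] = defaultdict(lambda: ([], []))
    for i, box in enumerate(run_a):
        for t in tiles_for(box, tile):
            buckets[t][0].append(i)
    for j, box in enumerate(run_b):
        for t in tiles_for(box, tile):
            buckets[t][1].append(j)
    matches: dict[tuple[int, int], float] = {}
    seen: set[tuple[int, int]] = set()
    for ids_a, ids_b in buckets.values():  # each tile could run as an independent task
        for i in ids_a:
            for j in ids_b:
                if (i, j) in seen:
                    continue  # pair already compared in another tile
                seen.add((i, j))
                score = iou(run_a[i], run_b[j])
                if score > 0.0:
                    matches[(i, j)] = score
    return matches


if __name__ == "__main__":
    run_a = [(0, 0, 10, 10), (300, 300, 310, 310)]
    run_b = [(5, 5, 15, 15), (600, 600, 610, 610)]
    print(spatial_join(run_a, run_b))
```

Objects that straddle a tile boundary are assigned to every tile they touch, so the `seen` set prevents the same pair from being reported twice.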
18: Computational Bioimaging 
• Application: Data delivered from bioimaging is increasingly automated, higher 
resolution, and multi-modal. This has created a data analysis bottleneck that, if resolved, can advance bioscience discovery through Big Data techniques.
• Current Approach: The current piecemeal analysis approach does not scale to a situation where a single scan on emerging machines is 32 TB and medical diagnostic imaging amounts to around 70 PB annually, even excluding cardiology. One needs a web-based one-stop shop for high-performance, high-throughput image processing for producers and consumers of models built on bio-imaging data.
• Futures: The goal is to solve that bottleneck with extreme-scale computing and community-focused science gateways that support the application of massive data analysis to massive imaging data sets. Workflow components include data acquisition, storage, enhancement, noise minimization, segmentation of regions of interest, crowd-based selection and extraction of features, object classification, organization, and search. Tools include ImageJ, OMERO, VolRover, and advanced segmentation and feature detection software.
MR, MRIter?, PP, Classification Streaming Parallelism over Images
22: Statistical Relational Artificial Intelligence for Health Care 
• Application: The goal of the project is to analyze large, multi-modal medical data 
including different data types such as imaging, EHR, genetic and natural language. This 
approach employs the relational probabilistic models that have the capability of 
handling rich relational data and modeling uncertainty using probability theory. The 
software learns models from multiple data types and can possibly integrate the 
information and reason about complex queries. Users can provide a set of descriptions 
– say for instance, MRI images and demographic data about a particular subject. They 
can then query for the onset of a particular disease (say Alzheimer’s) and the system 
will then provide a probability distribution over the possible occurrence of this disease. 
• Current Approach: A single server can handle a test cohort of a few hundred patients 
with associated data of 100’s of GB. 
• Futures: A cohort of millions of patients can involve petabyte datasets. Issues include the availability of too much data (images, genetic sequences, etc.), which complicates analysis. A major challenge lies in aligning the data and merging it from multiple sources in a form that is useful for a combined analysis. Another issue is that sometimes a large amount of data is available about a single subject, but the number of subjects is not very high (i.e., data imbalance). This can result in learning algorithms picking up random correlations between the multiple data types as important features in analysis.
MRIter, EGO Streaming Parallelism over People and their EMR
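The data-imbalance point (many measurements per subject but few subjects) can be made concrete with a small simulation. When features and outcome are pure noise, the strongest feature-outcome correlation found among many candidates is still large if the number of subjects is small. The subject and feature counts below are arbitrary illustrations, not figures from the project.

```python
import random
import statistics


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def max_spurious_correlation(n_subjects: int, n_features: int, seed: int = 0) -> float:
    """Largest |r| between a random outcome and n_features unrelated random features."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


if __name__ == "__main__":
    # Same number of candidate features, very different numbers of subjects.
    for n in (20, 200, 2000):
        print(n, "subjects:", round(max_spurious_correlation(n, 500), 2))
```

With few subjects, some noise feature will typically look strongly "predictive." This is why such pipelines need held-out validation or multiple-comparison control.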
The P4 Paradigm of Medicine
PREDICTIVE PREVENTIVE PERSONALIZED PARTICIPATORY
The V4 Paradigm of Big Data in
Medicine
V-OLUME V-ARIETY V-ELOCITY V-ALUE
Big Data in Neuroimaging
Human neuroimaging is now, officially, a “big data” science
• Among the examples of “big data” featured at the meeting was, no surprise, human neuroimaging
• The Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative
• Initiatives surrounding large-scale brain mapping are also underway in Europe (http://www.humanbrainproject.eu)
• Organization for Human Brain Mapping (OHBM; http://www.humanbrainmapping.org)
How Big is “Big”? 
• While size is a relative term when it comes to data, medical imaging of the brain comes in a variety of forms, each generating differing types and amounts of information about neural structure and/or function.
• Data from studies published in NeuroImage indicate that since 1995 the amount of data collected has doubled approximately every 26 months. At this rate, by 2015 the amount of acquired neuroimaging data alone (discounting header information, and before additional files are generated during data processing and statistical analysis) may exceed an average of 20 GB per published research study.
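The doubling claim corresponds to a simple growth model, size(t) = size(t0) · 2^(months / 26). The following is a minimal sketch of that arithmetic. The back-projection from the ~20 GB figure for 2015 illustrates the calculation only; it is not a reported measurement.

```python
def growth_factor(months: float, doubling_months: float = 26.0) -> float:
    """Multiplicative growth after `months` if data size doubles every `doubling_months`."""
    return 2.0 ** (months / doubling_months)


def implied_size(target: float, start_year: int, end_year: int,
                 doubling_months: float = 26.0) -> float:
    """Size at start_year implied by `target` at end_year under steady doubling."""
    months = (end_year - start_year) * 12
    return target / growth_factor(months, doubling_months)


if __name__ == "__main__":
    factor = growth_factor((2015 - 1995) * 12)
    print(f"Growth factor 1995 -> 2015: about x{factor:.0f}")
    # Illustrative back-projection of the ~20 GB per-study figure (decimal units).
    print(f"Implied 1995 study size: about {implied_size(20.0, 1995, 2015) * 1000:.0f} MB")
```

Twenty years at a 26-month doubling time is about 9.2 doublings, which gives a roughly 600-fold increase.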
Growth of Neuroimaging Study Size
[Figure: study size in megabytes versus year, 1990-2020; observed, expected, and predicted curves.]
Van Horn and Toga (in press) Brain Imaging and Behavior
Kryder’s Law: Exponential Growth of Data
MB = megabyte = 10^6 bytes, GB = gigabyte = 10^9, TB = terabyte = 10^12, PB = petabyte = 10^15
[Figure: volume of data and compute power (CPU transistor counts, Moore’s law) versus years]

Single cryo brain volume (~1,600 cm³):
Voxel size | Resolution | Gray scale 8 bits | Gray scale 16 bits | Color 24 bits
1 cm | 12 × 15 × 9 | 1,620 B | 3,240 B | 4,860 B
1 mm | 120 × 150 × 90 | 1.62 MB | 3.24 MB | 4.86 MB
100 μm | 1,200 × 1,500 × 900 | 1.62 GB | 3.24 GB | 4.86 GB
10 μm | 12,000 × 15,000 × 9,000 | 1.62 TB | 3.24 TB | 4.86 TB
1 μm | 120,000 × 150,000 × 90,000 | 1.62 PB | 3.24 PB | 4.86 PB

Years | Neuroimaging (annually) | Genomics (bp/yr) | CPU transistor counts
1985-1989 | 200 GB | 10 MB | 1×10^5
1990-1994 | 1 TB | 100 MB | 1×10^6
1995-1999 | 50 TB | 10 GB | 5×10^6
2000-2004 | 250 TB | 1 TB | 1×10^7
2005-2009 | 1 PB | 30 TB | 8×10^6
2010-2014 | 5 PB | 1 PB | 1×10^9
2015-2019 (estimated) | 10+ PB | 20+ PB | 1×10^11
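The storage figures in the brain-volume table follow from voxel count × bytes per voxel. The following short sketch recomputes them. It assumes the 12 × 15 × 9 cm bounding box from the table and the decimal (10^3-based) units defined above.

```python
# Bounding box of a single cryo brain volume, as given in the table (cm).
BRAIN_EXTENT_CM = (12.0, 15.0, 9.0)
VOXEL_SIZES_CM = {"1 cm": 1.0, "1 mm": 0.1, "100 um": 0.01, "10 um": 0.001, "1 um": 0.0001}
BITS_PER_VOXEL = {"8-bit gray": 8, "16-bit gray": 16, "24-bit color": 24}


def voxel_count(extent_cm: tuple[float, ...], voxel_cm: float) -> int:
    """Number of cubic voxels of edge voxel_cm needed to tile the bounding box."""
    count = 1
    for length in extent_cm:
        count *= round(length / voxel_cm)  # round() absorbs floating-point error
    return count


def human_bytes(n_bytes: float) -> str:
    """Format a byte count with decimal prefixes (1 kB = 1000 B)."""
    value = float(n_bytes)
    for unit in ("B", "kB", "MB", "GB", "TB", "PB"):
        if value < 1000:
            return f"{value:.2f} {unit}"
        value /= 1000
    return f"{value:.2f} EB"


if __name__ == "__main__":
    for label, size in VOXEL_SIZES_CM.items():
        n = voxel_count(BRAIN_EXTENT_CM, size)
        row = [human_bytes(n * bits // 8) for bits in BITS_PER_VOXEL.values()]
        print(f"{label:>7}: {n:,} voxels -> " + " | ".join(row))
```

Each tenfold refinement of voxel edge length multiplies storage by 1,000, because the count grows along all three axes.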
Big Neuroimaging + Big Genetics = REALLY Big Data
• Obtaining genome-wide single nucleotide polymorphism (SNP) data is becoming routine, and the cost of full genomic sequencing is rapidly becoming affordable.
• Next Generation Sequencing (NGS) methods are being applied in major brain imaging studies such as the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (Weiner, Veitch et al. 2012), with its initially available sample of 832 subjects.
• As the bond between neuroimaging and genomics grows tighter, with both areas growing at incredible rates, disk storage and unique data compression techniques will be required.
Multisite Consortia and Data Sharing
• Examples of multisite neuroimaging efforts can be found in the ubiquitous application of neuroimaging in health, but also in devastating illnesses and related resources such as:
– Parkinson’s disease (Evangelou, Maraganore et al. 2009)
– psychiatric disorders (Schumann, Loth et al. 2010)
– the mapping of human brain connectivity (Toga, Clark et al. 2012)
– databases of aging and aging-related diseases, large-scale autism research (National Database for Autism Research, NDAR; Hall, Huerta et al. 2012), and Federal Interagency Traumatic Brain Injury Research (FITBIR; Bushnik and Gordon 2012)
Multisite Consortia and Data Sharing
• The various “grassroots” collections of resting-state fMRI data maintained as part of the “1000 Functional Connectomes” project (http://fcon_1000.projects.nitrc.org/; see Biswal, Mennes et al. 2010) and the task-based OpenfMRI (http://www.openfmri.org; Poldrack, Barch et al. 2013) are other notable examples.
The Role of Cyberinfrastructure
• Individual desktop computers are no longer suitable for analyzing potentially petabytes of brain and genomics data at a time.
• The National Science Foundation (NSF) has made major investments in the computing infrastructure needed for physics, weather, and geological data.
• E.g., XSEDE (https://www.xsede.org/) and the Open Science Grid (https://www.opensciencegrid.org)
The Role of Cyberinfrastructure
• The Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC; http://www.nitrc.org) and the International Neuroinformatics Coordinating Facility (INCF; http://incf.org) have begun to deploy local clusters with Amazon EC2 server technology toward this goal, but a larger effort involving dedicated processing centers or distributed grids of linked compute centers will be required.
Many Thousands of Software Tools
• Acquisition, processing, storage/DB, service, migration, mining, analysis, visualization, annotation, … “(data-driven) process understanding”
• Biomedical imaging
– There are hundreds of different types of image processing algorithms and filters
– For each type of process there may be dozens of concrete software products (instance implementations)
• (Example) Neuroimaging
– NITRC lists > 500 openly shared software tools
– For each openly shared tool there may be dozens of proprietary or less commonly used analogues
Millions of Dispersed Hardware Devices 
• Cisco: “By the end of 2012, the number of mobile-connected devices will exceed the number of people on Earth”
• There will be over 10 billion mobile-connected devices in 2016; i.e., there 
will be 1.3 mobile devices per capita 
– These include phones, tablets, laptops, handheld gaming consoles, e-readers, 
in-car entertainment systems, digital cameras, and “machine-to-machine 
modules” 
• DBs, Clients, Servers, Compute-Nodes, Web-Services, Interfaces, … 
• Solution … 
Dinov et al., BMC 2011
[Figure: neuroimaging analysis workflow (Van Horn et al., Nature Neuro, 2004)]
– Inputs: raw fMRI time series; high-resolution anatomical image; standardized brain atlas template; experimental design matrix; study metadata (scanner protocols, subject demographics, stimulus timing, etc.)
– Processing steps: slice timing adjustment; image spatial alignment; functional-structural co-registration; spatial normalization to atlas space; image smoothing (Gaussian spatial filtering); statistical modeling (e.g., GLM)
– Outputs: statistical results maps; graphical overlays; table of statistically significant voxels in atlas-space coordinates
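The statistical modeling step (GLM) fits each voxel's time series against the experimental design matrix. The following is a minimal single-voxel sketch with an intercept and one boxcar stimulus regressor, fitted by ordinary least squares on synthetic data. It omits haemodynamic response convolution, motion regressors, and autocorrelation correction, which real packages handle. The block length, effect size, and noise level are arbitrary assumptions.

```python
import math
import random


def boxcar(n_scans: int, block_len: int) -> list[float]:
    """Alternating rest (0) / stimulus (1) blocks; no haemodynamic convolution."""
    return [float((i // block_len) % 2) for i in range(n_scans)]


def fit_glm(y: list[float], x: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares for y = b0 + b1*x + noise.

    Returns (b0, b1, t), where t is the t-statistic of the stimulus effect b1.
    """
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b0, b1, b1 / se_b1


if __name__ == "__main__":
    rng = random.Random(42)
    design = boxcar(n_scans=120, block_len=10)
    # Synthetic "active" voxel: baseline 100, stimulus effect 2, Gaussian noise (SD 0.5).
    voxel = [100 + 2 * xi + rng.gauss(0, 0.5) for xi in design]
    b0, b1, t = fit_glm(voxel, design)
    print(f"baseline={b0:.2f} effect={b1:.2f} t={t:.1f}")
```

Repeating this fit for every voxel and thresholding the t-values produces the statistical results maps listed among the outputs.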
Pipeline Version 5.9.1 Features: Graphical Programming Environment (11/17/2014)
Perfect Neuroimaging-Computation Storm?
• Single-subject studies (N=1)
– Genetics:
• Depending on coverage (X)
• Whole-genome sequencing data > 320 GB (>80X)
• Requires 2+ TB RAM and 100+ hours of CPU time
– Imaging:
• Depending on protocols
• Diffusion imaging data with 40-512 gradient directions
• Raw (multimodal) neuroimaging data > 10 GB
• Derived data > 100 GB
• Requires 100 GB RAM and 70+ hours of CPU time
• Large-subject studies
– Cohort studies (N>10, typically N in the hundreds)
– Multi-institutional population-wide studies (N>1,000)
– Longitudinal (neuroimaging) studies …
From Biomedical Challenges to Modeling, Computation, Tools and Curricular Training
• Quantitative volumetric and surface-based statistical analyses
– Interactome: Challenge ↔ Models ↔ Data Analysis ↔ Computation ↔ Education
– Statistics Online Computational Resource (SOCR): Che et al., JSS (2009)
[Figure legend: no effect / marginal / significant]
Grid & Cloud Computing
• UCLA grids (Cerebro, Medulla)
– 1,200 cores
– 1.4 TB RAM
– 12,000 jobs/day
– 700 users
• Amazon Cloud (new)
– 4,300 cores
– 9.6 TB RAM
– EC2 (Elastic Compute Cloud)
– S3 (Simple Storage Service)
• UC Grid
• Globus GridFTP
• INI Cluster @ USC
– 3,328 cores, 128 GB RAM per 16 cores, 26 TB aggregate memory. Connectivity is 5 Gbit per 16 cores, roughly 4 Tbit aggregate on compute and another 4.3 Tbit on storage. 2.43 PB of online storage, currently accelerated by over 50 TB of SSD.
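The 26 TB aggregate memory quoted for the INI cluster is consistent with the per-node figures: 3,328 cores / 16 cores per node = 208 nodes, and 208 × 128 GB = 26,624 GB. The following trivial sketch performs that check. It assumes, as the slide states, that every node has 16 cores and 128 GB, and it uses decimal units.

```python
def aggregate_memory_tb(total_cores: int, cores_per_node: int, ram_gb_per_node: float) -> float:
    """Aggregate RAM in TB (decimal units) for a cluster of identical nodes."""
    if total_cores % cores_per_node:
        raise ValueError("total_cores must be a multiple of cores_per_node")
    nodes = total_cores // cores_per_node
    return nodes * ram_gb_per_node / 1000


if __name__ == "__main__":
    # Figures quoted for the INI cluster @ USC: 208 nodes x 128 GB.
    print(f"{aggregate_memory_tb(3328, 16, 128):.1f} TB")
```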
Neuroimaging Applications: 56-ROI Global Shape Analysis (NC vs. IBS/Pain), Group Effects
– Data: structural T1 data; NC n = 221, IBS n = 107
– Workflow protocol: [figure]
– Results: mean-curvature between-group differences in L_cuneus and R_angular_gyrus (left and right views)
Neuroimaging Applications: Statistical Mapping of Cortical GM Thickness (Group Effects)
– Data: structural T1 data; cortical models
– Workflow protocol: [figure]
– Results: left anterior insula (p-value color scale, 0.0 to 1.0)
Pipeline User Community
Population Imaging
Big Data and the Health Sector in Population Imaging
• According to Bonnie Feldman, “the potential of Big Data in medicine lies in the possibility of combining traditional data with other new forms of data, at both the individual and the population level”
• The potential of Big Data suggests that savings can be achieved in the health sector through several routes:
– Transforming data into information.
– Supporting people's self-care.
– Increasing knowledge.
– Raising awareness of health status.
• Big Data is an open-access methodology for integrating different types of data in population imaging, image quantification, and feature extraction.
Study Types
• Individual
• Longitudinal (time points 0, 1, 2, …, M)
• Cross-sectional
Population Studies
• Population studies
– If no groups are formed within the population, the mean of the parameter or parameters is computed.
– If groups are formed (control and pathological), a hypothesis test must be performed.
• Population modeling
– Model the volumetric degeneration of gray matter and white matter
– Establish degeneration parameters
– Compare an individual's status against that model.
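The two population-study steps can be sketched in a few lines. A group comparison uses a two-sample test; Welch's t-test is one common choice, assumed here. Individual assessment compares a subject against a normative model; here that model is a simple linear age trend with Gaussian residuals, which is an assumption. The ages and volumes are hypothetical, and converting t to a p-value would need a t-distribution routine (e.g., from SciPy), which is not included.

```python
import math
import statistics


def welch_t(group_a: list[float], group_b: list[float]) -> tuple[float, float]:
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    va = statistics.variance(group_a) / na
    vb = statistics.variance(group_b) / nb
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def fit_normative_model(ages: list[float], volumes: list[float]) -> tuple[float, float, float]:
    """Least-squares line volume = intercept + slope * age, plus residual SD."""
    n = len(ages)
    mean_a, mean_v = statistics.fmean(ages), statistics.fmean(volumes)
    sxx = sum((a - mean_a) ** 2 for a in ages)
    slope = sum((a - mean_a) * (v - mean_v) for a, v in zip(ages, volumes)) / sxx
    intercept = mean_v - slope * mean_a
    rss = sum((v - intercept - slope * a) ** 2 for a, v in zip(ages, volumes))
    return intercept, slope, math.sqrt(rss / (n - 2))


def individual_z(age: float, volume: float, model: tuple[float, float, float]) -> float:
    """How many residual SDs an individual lies from the age-expected volume."""
    intercept, slope, sd = model
    return (volume - (intercept + slope * age)) / sd


if __name__ == "__main__":
    # Hypothetical gray-matter volumes (mL) from a reference population.
    ages = [60, 70, 80, 60, 70, 80]
    volumes = [585, 565, 545, 575, 555, 535]
    model = fit_normative_model(ages, volumes)
    print("model:", model)
    print("z for a 70-year-old with 530 mL:", round(individual_z(70, 530, model), 2))
    print("Welch t, df:", welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```

A strongly negative z flags a volume below what the reference population predicts for that age. This mirrors the idea of contrasting an individual against a degeneration model, though real normative models use far larger samples and more covariates.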
Application to
Alzheimer's Disease
Application to Real Cases
Results for global parameters
Application to Real Cases
Results for thickness and volume per structure, together with reference values
Application to Real Cases
• Representation of the volume difference compared with the population
Why can't we combine
BEAUTY AND SCIENCE?
BIMCV Objectives
• Develop and implement strategies to prevent or effectively treat diseases through an imaging research infrastructure associated with large population imaging studies.
– The concept of “Population Imaging”.
• Provide data, tools, and processing resources for carrying out advanced imaging studies.
volBrain system
volBrain pipeline
Unsupervised Segmentation of Glioblastomas
GIBI230 
Luis Martí-Bonmatí 
Fernando Aparici 
Alexandre Pérez 
Roberto Sanz 
Carlos Infantes 
Jose María Salinas 
Cayetano Hernández 
NEuro-Bioimaging VLC 
Mariam de la Iglesia 
IBIME 
Juan M García-Gómez 
Elies Fuster 
Javier Juan-Albarracín
BIMCV
Valencian Node
Euro-BioImaging
European Research Infrastructure for Biomedical Imaging and Biological Imaging Technologies.
A project on the ESFRI roadmap for research infrastructures
www.eurobioimaging.eu
EIBIR key facts and daily work 
In the service of research, 
EIBIR offers to its Network Members: 
- Multidisciplinary networking 
- Project Management 
- Research communication 
- Research Training 
- Meeting organisation 
EIBIR Office 
• Established in 2006 
• Staff: 4.5, incl. 3 Project Managers, 1 assistant 
• Provision of services to Network Members + EIBIR bodies 
• Monitoring European Affairs + research funding opportunities 
• Project management and coordination 
• Information activities and media work 
• Promotion of Network Membership 
• Website and data base updates 
• Congress activities 
• Scientific Advisory Board
Timeline & Funding
2010 - 2013: Preparatory Phase
• Framework
• Definition of the eligibility criteria for the nodes
• Open call for nodes
Funded by the EC
2013 - 2017: Construction Phase
• Evaluation & selection of nodes
• Construction of the nodes
Funded by the Member States (MINECO?)
2017 - …: Operational Phase
• Access and training
• Technology and evaluation to improve the service
Funded by the Member States & EC
[Figure: Euro-BioImaging structure. Flagship nodes and multimodal technology nodes, connected through a nodes hub, form an imaging infrastructure with open user access for European life scientists. Components: user training, staff training, a web-access portal, and a data storage and analysis infrastructure; users return with results for publication.]
1st Open Call 
Euro-BioImaging Nodes – Expression of Interest 
The 1st Open Call: 1 February – 30 April 2013 
• Multi-Modal Molecular Imaging 
• Phase contrast Imaging 
• High-field MRI 
• MR-PET 
• Population Imaging 
• Data Infrastructure: Challenges Framework 
• The biological imaging community will call for EoIs in 6 technologies
Valencian Node, BIMCV
Results of the 1st Call
Biological Imaging
Biomedical Imaging
9 SPANISH NODES
– 18 Institutions –
MEDICAL IMAGING DATA BANK (BIMCV) 
Expression of Interest: Population Imaging
BIG DATA DISEASE SIGNATURES
SINGLE TECHNOLOGY FLAGSHIPS 
CONSORTIUM
Evaluation summary and Final ranking 
• The node develops and provides access to a large database of 
imaging data and the associated clinical data records. 
• Big Data repository from hospitals in the Valencia region (5 million inhabitants living over an area of 23,255 km²; an average of 5.3 million clinical cases per year from 210 different imaging modalities).
• Access to such data and tools will be an efficient way of advancing population imaging studies and research.
• The node has the ability to incorporate data from other facilities
Services offered by the node 
• BIMCV facility provides a multi-level and multi-ology storage 
service (Vendor Neutral Archive). 
• CEIB-CS node integrates access to high-performance 
computational services from local and European 
infrastructures (Principe Felipe Research Centre & UPV-I3M 
Infrastructure). 
• Open access methodology to integrate different data types for 
population imaging, quantitative resources and feature 
extraction. 
• Comprehensive user training
Single Technology Flagship Node – Population Imaging: Valencia 
Evaluation summary and Final ranking: 
• Requires minor improvements (training plan, now corrected).
Other facilities 
MEDICAL IMAGING DATA BANK (BIMCV) 
BIG DATA DISEASE SIGNATURES
Services offered by the node: 
• BIMCV facility provides a multi-level and multi-ology 
storage service (Vendor Neutral Archive). 
• CEIB-AVS node integrates access to high-performance 
computational services from local and European 
infrastructures (Principe Felipe Research Centre & UPV-I3M 
Infrastructure). 
• Open access methodology to integrate different data 
types for population imaging, quantitative resources and 
feature extraction. 
• Comprehensive user training.
Valencian Node, BIMCV
Centre of Excellence in Biomedical Imaging (CEIB) of the Conselleria de Sanitat
CEIB clinical site, CEIB computing site
Services
With a Well-Defined Architecture
Other facilities
Neuroimaging: The Landscapes of the Mind
Human Neuroimaging as a 
“Big Data” Science 
The mind landscapes 
http://prezi.com/sseievn7ujcf/?utm_campaign=share&utm_medium=copy
Study of Structure: Morphometry
Study of Structure: Tractography
Study of Function
Study of Function: Resting State
Connectomics
Acknowledgment
Relevant facts
10K Structural Modeling in Neuroimaging of the Valencia Region
• Two fellowships from the Subdirección General de Sistemas para la Salud of the CS, for computer engineers or telecommunications engineers (DOGV 9-07-2014).
• The main brain structures will be measured.
• In collaboration with LABMAN
• In collaboration with Brain Dynamics
• The University of Southern California (Jack Van Horn)
• Possibly with IBIME (volBrain system)
Augmented Virtual Reality Prototype
ARiBraiN3T (for Android)


BIMCV: The Perfect "Big Data" Storm.

  • 1. BIMCV: The Perfect "Big Data" Storm. Collision of Peta Bytes of Population Image Data, Millions of Hardware Devices and Thousands of Software Tools. e-Infraestructuras Nacionales Maria de la Iglesia, PhD. http://ceib.san.gva.es
  • 2. OVERVIEW • Big Data • Strategic Vision of Big Data in EU • Strategic Vision of Big Data in US • Big Data in Neuroimaging • Population Imaging • EuroBioimaging – BIMCV – Valencia Node • Neuroimaging • Relevant facts
  • 4.
  • 5. Big data techniques and technologies • Techniques for analyzing big data – A/B testing. • Association rule learning. • Classification. • Cluster analysis. • Crowdsourcing. • Data fusion and data integration. – Signal processing – natural language processing • Data mining.
  • 6. Big data techniques and technologies • Techniques for analyzing big data – Ensemble learning – Genetic algorithms – Machine learning – Natural language processing (NLP) – Neural Networks • Pattern recognition – Network analysis
  • 7. Big data techniques and technologies • Techniques for analyzing big data – Optimization • Pattern recognition • Predictive modeling. • Regression. • Signal processing – time series analysis – data fusion • Spatial analysis. • Statistics.
  • 8. Big data techniques and technologies • Big DataTechnologies – Big Table. (Proprietary distributed database system built on the Google File System. Inspiration for Hbase) – Business intelligence (BI). BI tools are often used to read data that have been previously stored in a data warehouse or data mart – Cassandra. An open source (free) database management system designed to handle huge amounts of data on a distributed system. This system was originally developed at Facebook and is now managed as a project of the Apache Software foundation
  • 9. Big data techniques and technologies • Big DataTechnologies – Cloud computing. – Data mart. – Data warehouse. using ETL (extract, transform, and load) – Distributed system. – Dynamo – ETL – Google File System. – Hadoop – HBase. – MapReduce. – Mashup
  • 10. Big data techniques and technologies • Big DataTechnologies – Non-relational database. – R. – Relational database. – Semi-structured data. – SQL. – Stream processing. – Structured data. – Unstructured data. – Visualization.
  • 11. Big data techniques and technologies • Big DataTechnologies – VISUALIZATION • Tag cloud • Clustergram • History flow • Spatial information flow
  • 16.
  • 17. Strategic Vision of Big Data in EU
  • 18. How Is the Europe Union Responding? In Big Data
  • 19. Panel: Personalized Medicine in the Era of Big Data EHTEL Symposium Tapani Piha • Head of Unit for eHealth and Technology Assessment European Commission DG Health and Consumers Health Systems and Products
  • 20. How does Big Data link to the Personalized Medicine? •Big Data refers to a collection of data sets so large and complex, it’s impossible to process them with the usual databases and tools •The data is gathered (most of the time) by people just living their lives (e.g. using mobile phones, the internet, driving cars, paying with banking cards) •Big data is used in the private sector (e.g. Google), and in the public sector (e.g. NSA)
  • 21. Big Data use in public health & health care? •Research: "In the last five years, more scientific data has been generated than in the entire history of mankind”1 •Health care: more evidence about personalized treatment, better selection of right provider, better equipped health care providers (e.g. IBM's Watson) •Public health: better personalized life-style info for citizens, earlier detection of epidemics, more and quicker access to epidemiological information 12012 Winston Hide, The Promise of Big Data, Harvard Public Health
  • 22. Commission action on Big Data •BIG-project: multi-sectorial initiative started in 2011 to promote adoption of earlier waves of big data technology and contribute to EU competitiveness; •Green paper on mHealth: to assess market and further clarify what is needed in the legal framework concerning mHealth •Study in health program: to assess the usages and adoption of big data programs for (public) health systems within the EU.
  • 23. Strategic Vision of Big Data in US
  • 24.
  • 25. How Is U.S. Responding? National Institute of Standards an Technology (NIST) NIST is an agency of the U.S. Department of Commerce. To search federal science and technology web sites, including online databases see: science.org NIST program questions: Public Inquiries Unit: (301) 975-NIST (6478), Federal Relay Service (800) 877-8339 (TTY). NIST, 100 Bureau Drive, Stop 1070, Gaithersburg, MD 20899-1070 Technical website questions: DO-webmaster@nist.gov
  • 26. NIST Big Data Public Working Group Big Data PWG Overview Presentation September 30, 2013 Wo Chang, NIST Robert Marcus, ET-Strategies Chaitanya Baru, UC San Diego
  • 27. Agenda • Why Big Data? Why NIST? • NBD-PWG Charter • Overall Workplan • Subgroups Charter and Deliverables – Use Case and Requirements SG – Definitions and Taxonomies SG – Reference Architecture SG – Security and Privacy SG – Technology Roadmap SG • Next Steps 9/30/13 NBD-PWG Overview 28
  • 28. Why Big Data? Why NIST? • Why Big Data? There is a broad agreement among commercial, academic, and government leaders about the remarkable potential of “Big Data” to spark innovation, fuel commerce, and drive progress. • Why NIST? (a) Recommendation from January 15 -- 17, 2013 Cloud/Big Data Forum and (b) A lack of consensus on some important, fundamental questions is confusing potential users and holding back progress. Questions such as: – What are the attributes that define Big Data solutions? – How is Big Data different from the traditional data environments and related applications that we have encountered thus far? – What are the essential characteristics of Big Data environments? – How do these environments integrate with currently deployed architectures? – What are the central scientific, technological, and standardization challenges that need to be addressed to accelerate the deployment of robust Big Data solutions? NBD-PWG is being launched to address these questions and is charged to develop consensus definitions, taxonomies, secure reference architecture, and technology roadmap for Big Data that can be embraced by all sectors. 9/30/13 NBD-PWG Overview 29
  • 29. NBD-PWG Deliverables: Working Drafts version 1.0 for 1. Big Data Definitions 2. Big Data Taxonomies 3. Big Data Requirements 4. Big Data Security and Privacy Requirements 5. Big Data Architectures White Paper Survey 6. Big Data Reference Architectures 7. Big Data Security and Privacy Reference Architectures 8. Big Data Technology Roadmap
  • 30. NBD-PWG Workplan
  • 31. Big Data Ecosystem in One Sentence • Use Clouds running Data Analytics Collaboratively processing Big Data to solve problems in X-Informatics (or e-X) • X = Astronomy, Biology, Biomedicine, Business, Chemistry, Climate, Crisis, Earth Science, Energy, Environment, Finance, Health, Intelligence, Lifestyle, Marketing, Medicine, Pathology, Policy, Radar, Security, Sensor, Social, Sustainability, Wealth and Wellness, with more fields (e.g., physics) defined implicitly • Spans industry and science (research) • Education: Data Science; see recent New York Times articles • http://datascience101.wordpress.com/2013/04/13/new-york-times-data-science-articles/
  • 33. Big Data Definition • There is more consensus on the definition of Data Science than on that of Big Data • Big Data refers to digital data volume, velocity and/or variety that: – Enable novel approaches to frontier questions previously inaccessible or impractical using current or conventional methods; and/or – Exceed the storage capacity or analysis capability of current or conventional methods and systems; and – Are characterized by storing and analyzing population data rather than samples; and – Require management with scalability across coupled horizontal resources
  • 34. Vendor-neutral and Technology-agnostic Proposals: Data Processing Flow (M0039); Data Transformation Flow (M0017); IT Stack (M0047)
  • 38. Electronic Medical Record (EMR) Data I • Application: Large national initiatives around health data are emerging. They include developing a digital learning health care system to support increasingly evidence-based clinical decisions with timely, accurate and up-to-date patient-centered clinical information; using electronic observational clinical data to efficiently and rapidly translate scientific discoveries into effective clinical treatments; and electronically sharing integrated health data to improve healthcare process efficiency and outcomes. These key initiatives all rely on high-quality, large-scale, standardized and aggregate health data. Advanced methods are needed for normalizing patient, provider, facility and clinical concept identification within and among separate health care organizations, in order to enhance models for defining and extracting clinical phenotypes from non-standard discrete and free-text clinical data using feature selection, information retrieval and machine learning decision models. Clinical phenotype data must then be leveraged to support cohort selection, clinical outcomes research, and clinical decision support. • Annotations: PP, Fusion, S/Q, Index; Parallelism over EMR (a set per person); Streaming; viewers
  • 39. Electronic Medical Record (EMR) Data II • Current Approach: Clinical data from more than 1,100 discrete logical, operational healthcare sources in the Indiana Network for Patient Care (INPC), the nation's largest and longest-running health information exchange. This describes more than 12 million patients and more than 4 billion discrete clinical observations, with > 20 TB of raw data. Between 500,000 and 1.5 million new real-time clinical transactions are added per day. • Futures: Teradata, PostgreSQL and MongoDB supporting information retrieval methods to identify relevant clinical features (tf-idf, latent semantic analysis, mutual information). Natural Language Processing techniques to extract relevant clinical features. Validated features will be used to parameterize clinical phenotype decision models based on maximum likelihood estimators and Bayesian networks. Decision models will be used to identify a variety of clinical phenotypes such as diabetes, congestive heart failure, and pancreatic cancer.
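The Futures bullet names tf-idf as one information retrieval method for ranking candidate clinical features. A minimal sketch of plain tf-idf follows, assuming a naive whitespace tokenizer and three hypothetical note snippets (not INPC data); it illustrates the weighting only and is not the Teradata/PostgreSQL/MongoDB pipeline described on the slide.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately naive; an assumption)."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(N / df), where df = number of documents containing the term
    """
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical free-text snippets, for illustration only.
notes = [
    "patient reports polyuria polydipsia elevated glucose",
    "patient reports dyspnea edema elevated bnp",
    "patient reports fatigue glucose stable",
]
scores = tf_idf(notes)
top = sorted(scores[0].items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top)
```

Terms present in every note (here `patient` and `reports`) receive an idf of zero, while terms specific to one note, such as `polyuria`, score highest for that note.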
  • 40. Pathology Imaging/Digital Pathology I • Application: Digital pathology imaging is an emerging field in which examination of high-resolution images of tissue specimens enables novel and more effective ways of diagnosing disease. Pathology image analysis segments massive numbers (millions per image) of spatial objects such as nuclei and blood vessels, represented by their boundaries, along with many image features extracted from these objects. The derived information is used for many complex queries and analytics to support biomedical research and clinical diagnosis. • Annotations: MR, MRIter, PP, Classification; Parallelism over Images; Streaming
  • 41. Pathology Imaging/Digital Pathology II • Current Approach: 1 GB of raw image data + 1.5 GB of analytical results per 2D image. MPI for image analysis; MapReduce + Hive with spatial extension on supercomputers and clouds. GPUs used effectively. Figure 3 of section 2.12 shows the architecture of Hadoop-GIS, a spatial data warehousing system over MapReduce to support spatial analytics for analytical pathology imaging. • Futures: Recently, 3D pathology imaging has been made possible through 3D laser technologies or by serially sectioning hundreds of tissue sections onto slides and scanning them into digital images. Segmenting 3D microanatomic objects from registered serial images could produce tens of millions of 3D objects from a single image. This provides a deep “map” of human tissues for next-generation diagnosis. 1 TB of raw image data + 1 TB of analytical results per 3D image, and 1 PB of data per moderate-sized hospital per year. [Figure: Architecture of Hadoop-GIS, a spatial data warehousing system over MapReduce to support spatial analytics for analytical pathology imaging]
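Hadoop-GIS is described above as a spatial data warehouse over MapReduce. A common idea behind such systems is to partition the image plane into tiles so that a spatial query only inspects the tiles it overlaps. The sketch below is a single-process analogue of that idea, not the Hadoop-GIS API: the tile size, the helper names (`build_index`, `window_query`) and the nucleus bounding boxes are all hypothetical.

```python
from collections import defaultdict

TILE = 100  # hypothetical tile edge length, in pixels


def tiles_for_box(box, tile=TILE):
    """Yield the (tx, ty) grid tiles overlapped by an axis-aligned box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    for tx in range(int(x0 // tile), int(x1 // tile) + 1):
        for ty in range(int(y0 // tile), int(y1 // tile) + 1):
            yield (tx, ty)


def build_index(objects, tile=TILE):
    """'Map' step analogue: assign each object id to every tile its box touches."""
    index = defaultdict(list)
    for obj_id, box in objects.items():
        for key in tiles_for_box(box, tile):
            index[key].append(obj_id)
    return index


def intersects(a, b):
    """Exact test: do two axis-aligned boxes overlap (touching counts)?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def window_query(index, objects, window, tile=TILE):
    """Collect candidates only from tiles overlapping the window, then refine exactly."""
    candidates = set()
    for key in tiles_for_box(window, tile):
        candidates.update(index.get(key, ()))
    return sorted(o for o in candidates if intersects(objects[o], window))


# Hypothetical nucleus bounding boxes (x0, y0, x1, y1) in pixel coordinates.
nuclei = {
    "n1": (10, 10, 20, 20),
    "n2": (95, 95, 110, 110),  # straddles four tiles
    "n3": (450, 300, 460, 312),
}
idx = build_index(nuclei)
print(window_query(idx, nuclei, (0, 0, 100, 100)))
```

Objects that straddle a tile boundary are stored in every tile they touch, which is why the final exact intersection test (and de-duplication via a set) is needed.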
  • 42. 18: Computational Bioimaging • Application: Data delivered from bioimaging is increasingly automated, higher resolution, and multi-modal. This has created a data analysis bottleneck that, if resolved, can advance discovery in the biosciences through Big Data techniques. • Current Approach: The current piecemeal analysis approach does not scale to a situation where a single scan on emerging machines is 32 TB and medical diagnostic imaging amounts to around 70 PB annually, even excluding cardiology. A web-based one-stop shop is needed for high-performance, high-throughput image processing for producers and consumers of models built on bio-imaging data. • Futures: The goal is to solve that bottleneck with extreme-scale computing and community-focused science gateways that support the application of massive data analysis to massive imaging data sets. Workflow components include data acquisition, storage, enhancement, noise minimization, segmentation of regions of interest, crowd-based selection and extraction of features, object classification, organization, and search. Use ImageJ, OMERO, VolRover, and advanced segmentation and feature detection software. • Annotations: MR, MRIter?, PP, Classification; Parallelism over Images; Streaming
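The workflow components listed above (enhancement and noise minimization, segmentation of regions of interest, feature extraction, classification) can be chained in a few lines. The sketch below is a toy, pure-Python version on a 2D intensity grid, assuming a 3x3 mean filter for noise reduction, a global threshold for segmentation, 4-connected component labeling, and a hypothetical area rule as the "classifier". It does not use or mimic ImageJ, OMERO or VolRover.

```python
from collections import deque


def mean_filter(img):
    """Noise reduction: 3x3 mean; border pixels average their in-bounds neighbours."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def threshold(img, t):
    """Segmentation: foreground wherever intensity exceeds t."""
    return [[v > t for v in row] for row in img]


def label_components(mask):
    """4-connected components via breadth-first search; returns lists of (y, x)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                regions.append([])
                lab = len(regions)
                labels[y][x] = lab
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    regions[-1].append((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = lab
                            queue.append((ny, nx))
    return regions


def features(region, img):
    """Feature extraction: area (pixels) and mean intensity of one region."""
    area = len(region)
    return {"area": area, "mean_intensity": sum(img[y][x] for y, x in region) / area}


def classify(feat, min_area=4):
    """Hypothetical toy rule: large regions are 'object', tiny ones 'debris'."""
    return "object" if feat["area"] >= min_area else "debris"


# Hypothetical image: a 2x2 bright blob plus one isolated bright pixel.
img = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 9],
    [0, 0, 0, 0, 0, 0],
]
smoothed = mean_filter(img)
for region in label_components(threshold(smoothed, 2.5)):
    feat = features(region, smoothed)
    print(feat, classify(feat))
```

In this example the isolated pixel at the lower right is smoothed below the threshold and disappears, whereas thresholding the raw image would keep it as a one-pixel "debris" region.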
  • 43. 22: Statistical Relational Artificial Intelligence for Health Care • Application: The goal of the project is to analyze large, multi-modal medical data including different data types such as imaging, EHR, genetic and natural language data. This approach employs relational probabilistic models that can handle rich relational data and model uncertainty using probability theory. The software learns models from multiple data types and can possibly integrate the information and reason about complex queries. Users can provide a set of descriptions, for instance MRI images and demographic data about a particular subject. They can then query for the onset of a particular disease (say, Alzheimer’s), and the system will provide a probability distribution over the possible occurrence of this disease. • Current Approach: A single server can handle a test cohort of a few hundred patients with associated data of hundreds of GB. • Futures: A cohort of millions of patients can involve petabyte datasets. Issues include the availability of too much data (such as images and genetic sequences), which complicates analysis. A major challenge lies in aligning the data and merging it from multiple sources into a form that is useful for a combined analysis. Another issue is that sometimes a large amount of data is available about a single subject but the number of subjects is not very high (i.e., data imbalance). This can result in learning algorithms picking up random correlations between the multiple data types as important features in the analysis. • Annotations: MRIter, EGO; Parallelism over People and their EMR; Streaming
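The data-imbalance problem (many measurements, few subjects) can be made concrete with a small simulation. The sketch below assumes purely random, hypothetical features and outcome, so every correlation it finds is spurious; it reports the largest absolute Pearson correlation among 500 such features for a small and a large cohort. It illustrates the failure mode only and is not the relational probabilistic modeling the project uses.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def max_spurious_corr(n_subjects, n_features, seed=0):
    """Largest |r| between a random outcome and purely random (uninformative) features."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_subjects)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


few = max_spurious_corr(n_subjects=10, n_features=500)
many = max_spurious_corr(n_subjects=1000, n_features=500)
print(f"10 subjects: max |r| = {few:.2f}; 1000 subjects: max |r| = {many:.2f}")
```

With only 10 subjects, some of the 500 noise features correlate strongly with the outcome by chance alone; with 1,000 subjects the largest chance correlation is far smaller.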
  • 44. The P4 Paradigm of Medicine: Predictive, Preventive, Personalized, Participatory
  • 45. The V4 Paradigm in Big Data Medicine: Volume, Variety, Velocity, Value
  • 46. Big Data in Neuroimaging
  • 48. Human neuroimaging is now, officially, a “big data” science • Among the examples of “big data” featured at the meeting was, no surprise, human neuroimaging • The Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative • Initiatives surrounding large-scale brain mapping are also underway in Europe (http://www.humanbrainproject.eu) • Organization for Human Brain Mapping (OHBM; http://www.humanbrainmapping.org)
  • 49. How Big is “Big”? • While size is a relative term when it comes to data, medical imaging applied to the brain comes in a variety of forms, each generating differing types and amounts of information about neural structure and/or function. • Data from studies published in NeuroImage indicate that since 1995 the amount of data collected has doubled approximately every 26 months. At this rate, by 2015 the amount of acquired neuroimaging data alone, discounting header information and before more files are generated during data processing and statistical analysis, may exceed an average of 20 GB per published research study.
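A doubling time of 26 months corresponds to the exponential model S(t) = S0 × 2^(t / 26), with t in months. The sketch below evaluates it; the baseline S0 is a hypothetical placeholder (only ratios are meaningful), and the 20-year horizon (1995 to 2015) is chosen to match the time span quoted above.

```python
DOUBLING_MONTHS = 26  # doubling time quoted on the slide


def projected_size(size_at_start, months_elapsed, doubling_months=DOUBLING_MONTHS):
    """Exponential growth: the size doubles every `doubling_months` months."""
    return size_at_start * 2 ** (months_elapsed / doubling_months)


def annual_growth_factor(doubling_months=DOUBLING_MONTHS):
    """Multiplicative growth over 12 months."""
    return 2 ** (12 / doubling_months)


# Hypothetical baseline of 1 unit; only the ratio matters.
print(f"{annual_growth_factor():.3f}x per year")
print(f"{projected_size(1.0, 20 * 12):.0f}x after 20 years")
```

The annual factor 2^(12/26) is roughly 1.38, i.e. about 38% more data per year, and over 240 months the multiplier is about 600.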
  • 50. Growth of Neuroimaging Study Size [Figure: study size in megabytes (0-20,000 MB) vs. year (1990-2020); series: expected, observed, predicted. Source: Van Horn and Toga (in press), Brain Imaging and Behavior]
  • 51. Kryder’s law: Exponential Growth of Data • Units: MB = megabyte = 10^6 bytes, GB = gigabyte = 10^9, TB = terabyte = 10^12, PB = petabyte = 10^15
    • Volume of data and compute power (years: neuroimaging, annually | genomics, BP/yr | CPU transistor counts, Moore’s law):
    – 1985-1989: 200 GB | 10 MB | 1x10^5
    – 1990-1994: 1 TB | 100 MB | 1x10^6
    – 1995-1999: 50 TB | 10 GB | 5x10^6
    – 2000-2004: 250 TB | 1 TB | 1x10^7
    – 2005-2009: 1 PB | 30 TB | 8x10^6
    – 2010-2014: 5 PB | 1 PB | 1x10^9
    – 2015-2019 (estimated): 10+ PB | 20+ PB | 1x10^11
    • Single cryo brain volume, 1600 cm³ (voxel size, voxel count: gray scale 8 bits | gray scale 16 bits | color 24 bits):
    – 1 cm, 12x15x9: 1,620 B | 3,240 B | 4,860 B
    – 1 mm, 120x150x90: 1.62 MB | 3.24 MB | 4.86 MB
    – 100 μm, 1200x1500x900: 1.62 GB | 3.24 GB | 4.86 GB
    – 10 μm, 12000x15000x9000: 1.62 TB | 3.24 TB | 4.86 TB
    – 1 μm, 120000x150000x90000: 1.62 PB | 3.24 PB | 4.86 PB
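The single-brain storage figures follow from simple arithmetic: the voxel count (the 12 x 15 x 9 cm bounding box divided by the voxel edge length along each axis) times the bytes per voxel (8, 16 or 24 bits). A short sketch, assuming isotropic voxels and the decimal byte units defined on the slide:

```python
BRAIN_EXTENT_CM = (12, 15, 9)  # bounding box used on the slide, in cm


def voxel_grid(voxel_size_um, extent_cm=BRAIN_EXTENT_CM):
    """Voxels per axis for an isotropic voxel edge given in micrometres (1 cm = 10,000 um)."""
    return tuple(round(e * 10_000 / voxel_size_um) for e in extent_cm)


def volume_bytes(voxel_size_um, bits_per_voxel, extent_cm=BRAIN_EXTENT_CM):
    """Uncompressed size of one volume: voxel count x bytes per voxel."""
    nx, ny, nz = voxel_grid(voxel_size_um, extent_cm)
    return nx * ny * nz * bits_per_voxel // 8


for size_um in (10_000, 1_000, 100, 10, 1):
    row = [volume_bytes(size_um, bits) for bits in (8, 16, 24)]
    print(size_um, voxel_grid(size_um), row)
```

Each tenfold refinement of the voxel edge multiplies the voxel count, and hence the storage, by 1,000.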
  • 52. Big Neuroimaging + Big Genetics = REALLY Big Data • Obtaining genome-wide sets of single nucleotide polymorphism (SNP) information is becoming routine, and the costs of full genomic sequencing are rapidly becoming affordable. • Next Generation Sequencing (NGS) methods are being used for major brain imaging studies such as the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (Weiner, Veitch et al. 2012), with its initially available sample of 832 subjects. • As the bond between neuroimaging and genomics grows tighter, with both areas growing at incredible rates, disk storage and unique data compression techniques become pressing needs.
  • 53. Multisite Consortia and Data Sharing • Examples of multisite neuroimaging efforts can be found in the widespread application of neuroimaging in health, but also in devastating illnesses and related areas such as: • Parkinson’s disease (Evangelou, Maraganore et al. 2009) • psychiatric disorders (Schumann, Loth et al. 2010) • the mapping of human brain connectivity (Toga, Clark et al. 2012) • databases of aging and aging-related diseases, the National Database for Autism Research (NDAR; Hall, Huerta et al. 2012) and the Federal Interagency Traumatic Brain Injury Research (FITBIR; Bushnik and Gordon 2012)
  • 54. Multisite Consortia and Data Sharing • The various “grass roots” collections of resting-state fMRI data maintained as part of the “1000 Functional Connectomes” project (http://fcon_1000.projects.nitrc.org/; see Biswal, Mennes et al. 2010) and the task-based OpenfMRI (http://www.openfmri.org; Poldrack, Barch et al. 2013) are other notable examples.
  • 56. The Role of Cyberinfrastructure • Individual desktop computers are no longer suitable for analyzing potentially petabytes' worth of brain and genomics data at a time. • The National Science Foundation (NSF) has made major investments in the computing infrastructure needed for physics, weather, and geological data, e.g., XSEDE (https://www.xsede.org/) and the Open Science Grid (https://www.opensciencegrid.org)
  • 57. The Role of Cyberinfrastructure • The Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC; http://www.nitrc.org) • The International Neuroinformatics Coordinating Facility (INCF; http://incf.org) • Both have begun to deploy local clusters with Amazon EC2 server technology toward this goal, but a larger effort will be required, involving dedicated processing centers or distributed grids of linked compute centers.
  • 58. Many Thousands of Software Tools • Acquisition, processing, storage/DB, service, migration, mining, analysis, visualization, annotation, … “(data-driven) process understanding” • Biomedical Imaging – There are hundreds of different types of image processing algorithms and filters – For each type of process there may be dozens of concrete software products (instance implementations) • (Example) Neuroimaging – NITRC lists > 500 openly shared software tools – For each openly shared tool there may be dozens of proprietary or less commonly used analogues
  • 59. Millions of Dispersed Hardware Devices • Cisco: "By the end of 2012, the number of mobile-connected devices will exceed the number of people on Earth” • There will be over 10 billion mobile-connected devices in 2016; i.e., there will be 1.3 mobile devices per capita – These include phones, tablets, laptops, handheld gaming consoles, e-readers, in-car entertainment systems, digital cameras, and “machine-to-machine modules” • DBs, Clients, Servers, Compute-Nodes, Web-Services, Interfaces, … • Solution … Dinov et al., BMC 2011
  • 60. [Figure: fMRI processing workflow (Van Horn et al., Nature Neuro, 2004). Inputs: raw fMRI time series; high-resolution anatomical image; standardized brain atlas template; study metadata (scanner protocols, subject demographics, stimulus timing, etc.); experimental design matrix. Processing steps: slice timing adjustment; image spatial alignment; functional-structural co-registration; spatial normalization to atlas space; image smoothing (Gaussian spatial filtering); statistical modeling (e.g. GLM). Outputs: statistical results maps; graphical overlays; table of statistically significant voxels in atlas space coordinates]
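The statistical modeling step fits a general linear model (GLM) to each voxel's time series using the experimental design matrix. The sketch below fits one voxel by ordinary least squares, solving the normal equations (X^T X) beta = X^T y with Gauss-Jordan elimination in pure Python. The block design and signal values are hypothetical, and real fMRI packages would typically also convolve the stimulus with a hemodynamic response function and model noise autocorrelation, both omitted here.

```python
def fit_glm(design, y):
    """Ordinary least squares for y = X @ beta + noise.

    Builds the normal equations (X^T X) beta = X^T y and solves them with
    Gauss-Jordan elimination using partial pivoting.
    """
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    a = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X^T X | X^T y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:  # eliminate this column from every other row
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * pv for v, pv in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


stimulus = [0, 0, 1, 1, 0, 0, 1, 1]                # hypothetical block design (off/on)
design = [[1.0, s] for s in stimulus]              # intercept column + stimulus regressor
signal = [99, 101, 104, 106, 101, 99, 106, 104]    # hypothetical voxel time series
beta = fit_glm(design, signal)
print(beta)
```

Here the intercept recovers the mean "off" signal (100) and the stimulus coefficient the on/off difference (5); a voxel-wise analysis repeats this fit for every voxel with the same design matrix.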
  • 61. Pipeline Version 5.9.1 Features: Graphical Programming Environment (11/17/2014)
  • 62. Perfect Neuroimaging-Computation Storm? • Single-subject studies (N=1) – Genetics: depends on coverage (X); whole-genome sequencing data > 320 GB (>80X); requires 2+ TB RAM and 100+ hrs CPU – Imaging: depends on protocols; diffusion imaging data with 40-512 gradient directions; raw (multimodal) neuroimaging data > 10 GB; derived data > 100 GB; requires 100 GB RAM and 70+ hrs CPU • Large-subject studies – Cohort studies (N>10, typically N in the hundreds) – Multi-institutional population-wide studies (N>1,000) – Longitudinal (neuroimaging) studies …
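Multiplying the per-subject figures above by the cohort size gives a rough lower bound on storage. The sketch assumes every subject contributes the slide's minimum raw imaging (10 GB), derived imaging (100 GB) and, optionally, whole-genome (320 GB) data; it uses decimal terabytes and ignores replicas, backups and intermediate files.

```python
# Per-subject lower bounds quoted on the slide, in GB.
GENOME_GB = 320           # whole-genome sequencing data (>80X coverage)
RAW_IMAGING_GB = 10       # raw multimodal neuroimaging data
DERIVED_IMAGING_GB = 100  # derived imaging data


def cohort_storage_tb(n_subjects, with_genome=True):
    """Lower-bound storage for a cohort, in decimal TB (1 TB = 1000 GB)."""
    per_subject = RAW_IMAGING_GB + DERIVED_IMAGING_GB + (GENOME_GB if with_genome else 0)
    return n_subjects * per_subject / 1000


for n in (1, 100, 1000):
    print(n, cohort_storage_tb(n), cohort_storage_tb(n, with_genome=False))
```

Under these assumptions a 1,000-subject population study already needs at least 430 TB, or 110 TB for imaging alone.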
  • 63. From Biomedical Challenges to Modeling, Computation, Tools and Curricular Training • Quantitative volumetric and surface-based statistical analyses – Interactome: Challenge↔Models↔Data Analysis↔Computation↔Education – Statistics Online Computational Resource (Che et al., JSS, 2009) [Figure legend: no effect, marginal, significant]
  • 64. Grid & Cloud Computing • UCLA Grids (Cerebro, Medulla): 1,200 cores; 1.4 TB RAM; 12,000 jobs/day; 700 users • Amazon Cloud: 4,300 cores; 9.6 TB RAM (new) – EC2 (Elastic Compute Cloud) – S3 (Simple Storage Service) • UC Grid • Globus GridFTP • INI Cluster @ USC – 3,328 cores, 128 GB RAM per 16 cores, 26 TB aggregate memory space. Connectivity is 5 Gbit per 16 cores, roughly 4 terabit aggregate on compute and another 4.3 Tbit on storage. 2.43 PB of online storage, currently accelerated by over 50 TB of SSD.
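The INI cluster's aggregate memory can be checked from the per-node figures. The sketch assumes each group of 16 cores is one node carrying 128 GB of RAM:

```python
TOTAL_CORES = 3328
CORES_PER_NODE = 16    # assumption: "128GB RAM per 16 cores" describes one node
RAM_PER_NODE_GB = 128

nodes = TOTAL_CORES // CORES_PER_NODE
aggregate_gb = nodes * RAM_PER_NODE_GB
print(nodes, aggregate_gb, aggregate_gb / 1024)
```

This gives 208 nodes and 26,624 GB, which is 26 TB when 1 TB is taken as 1,024 GB, consistent with the aggregate memory quoted on the slide (8 GB of RAM per core).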
  • 65. Neuroimaging Applications: 56-ROI Global Shape Analysis (NC vs. IBS/Pain), Group Effects • Data: structural T1 data; NC/IBS: 221/107 • Results: mean-curvature between-group differences in L_cuneus and R_angular_gyrus [Figure panels: Data, Workflow, Protocol, Results; left view, right view]
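The slide does not state which test produced the between-group differences. As one illustrative option, a per-ROI comparison of mean curvature between two groups can be sketched with Welch's two-sample t statistic, which does not assume equal group variances. The curvature values below are hypothetical, and turning t into a p-value would need a t-distribution CDF, which the Python standard library does not provide.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical mean-curvature values for a single ROI (not the study's data).
controls = [0.112, 0.118, 0.109, 0.121, 0.115, 0.117]
patients = [0.104, 0.108, 0.101, 0.110, 0.106]
t, df = welch_t(controls, patients)
print(f"t = {t:.2f}, df = {df:.1f}")
```

With 56 ROIs, the same comparison would be repeated per region.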
  • 66. Neuroimaging Applications: Statistical Mapping of Cortical GM Thickness (Group Effects) • Data: structural T1 data; cortical models • Results: left anterior insula [Figure panels: Data, Workflow, Protocol, Results; p-value color scale 0.0-1.0]
  • 69. Big Data and the Health Sector in Population Imaging • According to Bonnie Feldman, “the potential of Big Data in medicine lies in the possibility of combining traditional data with other new forms of data, at both the individual and the population level” • The potential of Big Data suggests that savings can be achieved in the health sector in several ways: – Transforming data into information. – Supporting people's self-care. – Increasing knowledge. – Raising awareness of health status. • Big Data is an open-access methodology for integrating different data types in population imaging, image quantification and feature extraction.
  • 70. Types of Studies • Individual • Longitudinal (time points 0, 1, 2, …, M) • Cross-sectional
  • 71. Population Studies • Population studies – If no groups are formed within the population, the mean of the parameter or parameters is computed. – If groups are formed (control and pathological), a hypothesis test must be performed. • Population modeling – Model the volumetric degeneration of gray matter and white matter – Establish degeneration parameters – Compare an individual's state against that model.
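The population modeling steps (model the degeneration, establish its parameters, compare an individual with the model) can be sketched with the simplest possible model: a straight-line fit of gray-matter volume against age, with the individual expressed as a z-score relative to the residual spread. The linear form and all volumes below are hypothetical assumptions, not the BIMCV model.

```python
import math


def fit_line(ages, volumes):
    """Least-squares line volume = intercept + slope * age, plus residual SD."""
    n = len(ages)
    mean_a = sum(ages) / n
    mean_v = sum(volumes) / n
    sxx = sum((a - mean_a) ** 2 for a in ages)
    sxy = sum((a - mean_a) * (v - mean_v) for a, v in zip(ages, volumes))
    slope = sxy / sxx  # degeneration rate (volume change per year)
    intercept = mean_v - slope * mean_a
    residuals = [v - (intercept + slope * a) for a, v in zip(ages, volumes)]
    sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return intercept, slope, sd


def z_score(age, volume, model):
    """Residual SDs an individual lies above (+) or below (-) the population model."""
    intercept, slope, sd = model
    return (volume - (intercept + slope * age)) / sd


# Hypothetical gray-matter volumes (mL) for a reference population.
ages = [20, 30, 40, 50, 60]
gm = [702, 688, 681, 669, 660]
model = fit_line(ages, gm)
print(model, z_score(45, 650, model))
```

A strongly negative z-score flags an individual whose volume lies well below what the population model predicts for that age.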
  • 73. Application to Real Cases: Results for global parameters
  • 74. Application to Real Cases: Thickness and volume results per structure, together with reference values
  • 75. Application to Real Cases • Representation of the volume difference compared with the population
  • 76. Why can't we combine BEAUTY AND SCIENCE?
  • 77. BIMCV Objectives • Develop and implement strategies to effectively prevent or treat diseases through an imaging research infrastructure associated with large population imaging studies. – The concept of “Population Imaging”. • Provide data, tools and processing resources for carrying out advanced imaging studies.
  • 80. Unsupervised Segmentation of Glioblastomas
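Per-structure volumes and their difference from the population can be sketched as follows, assuming a segmentation label map (label 0 = background), a known voxel size, and hypothetical reference volumes; this does not reproduce the output format of any particular segmentation tool.

```python
from collections import Counter


def structure_volumes_ml(label_map, voxel_mm=(1.0, 1.0, 1.0)):
    """Volume per label: voxel count x voxel volume, converted from mm^3 to mL."""
    voxel_ml = voxel_mm[0] * voxel_mm[1] * voxel_mm[2] / 1000.0
    counts = Counter(label for plane in label_map for row in plane for label in row)
    counts.pop(0, None)  # assumption: label 0 is background
    return {label: n * voxel_ml for label, n in counts.items()}


def percent_difference(volumes, reference):
    """Signed % difference of each structure with respect to a population reference."""
    return {k: 100.0 * (volumes[k] - reference[k]) / reference[k]
            for k in volumes if k in reference}


# Hypothetical 2x2x2 label map with 10 mm voxels (1 mL each) and hypothetical references.
label_map = [
    [[1, 1], [2, 0]],
    [[1, 2], [0, 0]],
]
vols = structure_volumes_ml(label_map, voxel_mm=(10.0, 10.0, 10.0))
reference = {1: 4.0, 2: 2.0}
print(vols, percent_difference(vols, reference))
```

A negative percentage indicates a structure smaller than the population reference.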
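The slide does not describe the segmentation method. Purely as an illustration of unsupervised segmentation, the sketch below clusters voxel intensities with k-means, one classic unsupervised technique, swapped in here as an assumption. The intensities are hypothetical and one-dimensional; a real glioblastoma pipeline would more likely cluster multi-parametric MR features.

```python
def kmeans_1d(values, k, iterations=100):
    """Plain k-means on scalar intensities.

    Centroids start evenly spaced between the minimum and maximum value, then
    alternate between assigning each value to its nearest centroid and moving
    each centroid to the mean of its members, until nothing changes.
    """
    lo, hi = min(values), max(values)
    if k > 1:
        centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    else:
        centroids = [sum(values) / len(values)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Keep an empty cluster's centroid where it was.
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
    return centroids, labels


# Hypothetical normalized voxel intensities falling into three groups.
intensities = [0.1, 0.2, 0.15, 0.5, 0.55, 0.9, 0.95]
centroids, labels = kmeans_1d(intensities, k=3)
print(centroids, labels)
```

Each voxel's cluster label becomes its segment; mapping clusters to tissue classes would require additional, method-specific rules.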
  • 81. GIBI230: Luis Martí-Bonmatí, Fernando Aparici, Alexandre Pérez, Roberto Sanz, Carlos Infantes, Jose María Salinas, Cayetano Hernández • Neuro-Bioimaging VLC: Maria de la Iglesia • IBIME: Juan M García-Gómez, Elies Fuster, Javier Juan-Albarracín
  • 82. BIMCV
  • 83. Euro-BioImaging Valencian Node: European Infrastructure for Research in Biomedical and Biological Imaging Technologies. A project on the ESFRI roadmap for research infrastructures. www.eurobioimaging.eu
  • 84. EIBIR key facts and daily work • In the service of research, EIBIR offers its Network Members: – Multidisciplinary networking – Project management – Research communication – Research training – Meeting organisation • EIBIR Office: established in 2006; staff: 4.5, incl. 3 project managers and 1 assistant – Provision of services to Network Members and EIBIR bodies – Monitoring European affairs and research funding opportunities – Project management and coordination – Information activities and media work – Promotion of Network Membership – Website and database updates – Congress activities – Scientific Advisory Board
  • 85. Timeline & Funding • 2010-2013 Preparatory Phase – Framework – Definition of eligibility criteria for nodes – Open call for nodes. Funded by the EC. • 2013-2017 Construction Phase – Evaluation & selection of nodes – Construction of the nodes. Funded by the Member States (MINECO?). • 2017-… Operational Phase – Access and training – Technology and evaluation to improve the service. Funded by the Member States & EC
  • 86. [Figure: Imaging infrastructure with open user access. Labels: European life scientists as users; web-access portal; nodes hub; flagship nodes; multimodal technology nodes; user training; staff training; data storage and analysis infrastructure; user returns with results for publication]
  • 88. 1st Open Call Euro-BioImaging Nodes – Expression of Interest The 1st Open Call: 1 February – 30 April 2013 • Multi-Modal Molecular Imaging • Phase contrast Imaging • High-field MRI • MR-PET • Population Imaging • Data Infrastructure: Challenges Framework • The biological imaging community will call for EoIs in 6 technologies
  • 90. Results of the 1st Call: Biological Imaging, Biomedical Imaging. 9 Spanish nodes, 18 institutions
  • 91. MEDICAL IMAGING DATA BANK (BIMCV): Expression of Interest: Population Imaging • BIG DATA • DISEASE SIGNATURES • SINGLE TECHNOLOGY FLAGSHIPS CONSORTIUM
  • 92. Evaluation summary and Final ranking • The node develops and provides access to a large database of imaging data and the associated clinical data records. • Big Data repository from hospitals in the Valencia region (5 million inhabitants living over an area of 23,255 km²; an average of 5.3 million clinical cases per year, from 210 different imaging modalities). • Access to such data and tools will be an efficient way of advancing population imaging studies and research. • The node has the ability to incorporate data from other facilities.
  • 93. Services offered by the node • The BIMCV facility provides a multi-level and multi-ology storage service (Vendor Neutral Archive). • The CEIB-CS node integrates access to high-performance computational services from local and European infrastructures (Principe Felipe Research Centre & UPV-I3M Infrastructure). • Open access methodology to integrate different data types for population imaging, quantitative resources and feature extraction. • Comprehensive user training
  • 94. Single Technology Flagship Node – Population Imaging: Valencia • Evaluation summary and final ranking: requires minor improvements (training plan, now corrected); the remaining evaluation points and the services offered by the node are those of slides 92-93, with the computational node named CEIB-AVS. [Figure labels: Other facilities; Medical Imaging Data Bank (BIMCV); Big Data; Disease Signatures]
  • 95. Valencian Node, BIMCV: Centro de Excelencia en Imagen Biomédica (Centre of Excellence in Biomedical Imaging) of the Conselleria de Sanitat • CEIB clinical site • CEIB computing site
  • 97. With a Well-Defined Architecture
  • 100. Human Neuroimaging as a “Big Data” Science The mind landscapes http://prezi.com/sseievn7ujcf/?utm_campaign=share&utm_medium=copy
  • 101. Study of Structure: Morphometry
  • 102. Study of Structure: Tractography
  • 104. Study of Function
  • 119. 10K Structural Modeling in Neuroimaging of the Valencia Region • Two grants from the Subdirección General de Sistemas para la Salud of the CS (Conselleria de Sanitat), for computer engineers or telecommunications engineers (DOGV 9-07-2014). • The main brain structures will be measured. • In collaboration with LABMAN • In collaboration with Brain Dynamics • The University of Southern California (Jack Van Horn) • Possibly with IBIME (volBrain system)
  • 121. Augmented Virtual Reality Prototype ARiBraiN3T (for Android)