The use of R statistical package in
controlled infrastructure
The case of Clinical Research industry
Adrian Olszewski
Senior Biostatistician at 2KMM
22nd Jun 2018
Poland • Sosnowiec
www.2kmm.eu
Polish National Group of the
International Society for Clinical Biostatistics
http://www.iscb.pl
40min
www.r-clinical-research.com
r.clin.res@gmail.com
PART I
DISCLAIMER
All trademarks, logos of companies and names of products
used in this document
are the sole property of their respective owners
and are included here for informational and illustrative purposes only, which falls within nominative fair use.
This presentation is based exclusively on information
publicly available on the Internet at the provided hyperlinks.
If you believe your rights are violated, please email me: r.clin.res@gmail.com
Agenda
► Quick introduction to R
o Description
o History. Events important for the use of R in EBM*
o Who uses R?
* Evidence-Based Medicine
► R in Evidence-Based Medicine
o Capabilities
o A brief overview of common tasks
o Cooperation and compliance with SAS
o www.r-clinical-research.com or CRAN Task Views
► R in Clinical Research
o Status of R on the Clinical Research market
o Myths and Facts
o What does FDA say?
o What does it mean “to validate”? Why do we want it?
o Preparing R to enter the industry
► Validation
o Validation of installation vs. numerical validation
o Numerical validation
o Methods
o Reference data
► Fixing the environment and controlling for changes
► How does R support the creation of a controlled environment?
► Conclusions
► Does it work?
► Q&A
Quick introduction to R ► Description
R is an open-source software environment, widely used in the scientific world for:
statistical computing
data manipulation
data presentation
and other general programming tasks
https://www.r-project.org
It’s also the name of a high-level, Turing-complete, interpreted, multi-paradigm
programming language used within the environment.
Quick introduction to R ► Description
Short characteristics:
► Description: computational environment + programming language
► Developer: R Development Core Team
► Operating systems: cross-platform: Windows, Unix, Linux, OS X; mobile: Android, Maemo, Raspbian
► Form: command line + third-party IDEs and editors
► Infrastructure: R core library + shell + libraries (base and third-party)
► Model of work: 1) standalone application, 2) standalone server, 3) server process
► Programming language: Turing-complete, domain-specific, interpreted, high-level with dynamic typing
► Paradigm: 1) array, 2) object-oriented (S3, S4, R5, R6 models), 3) imperative, 4) functional, 5) procedural, 6) reflective
► Source of libraries: mirrored repository – CRAN, users' sites, third-party repositories (GitHub, R-Forge)
► License of the core: GNU General Public License ver. 2
► License of libraries: 99.9% open-source; 0.1% licensed (free for non-commercial use)
model <- lm(y ~ x1 * x2)
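The one-liner above uses R's model-formula notation, in which `x1 * x2` expands to both main effects plus their interaction (`x1 + x2 + x1:x2`). A minimal sketch with simulated data; the data and the "true" coefficients are invented for illustration only:

```r
# Simulated data, for illustration only
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + 0.5 * x1 * x2 + rnorm(n, sd = 0.1)

# y ~ x1 * x2 is shorthand for y ~ x1 + x2 + x1:x2
model <- lm(y ~ x1 * x2)
summary(model)$coefficients   # estimates, standard errors, t and p values
```

The same formula syntax is shared by `glm()`, `aov()`, `nls()` and most modelling packages, which is one reason the language is described as domain-specific.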
Quick introduction to R ► Description
The basic GUI on Windows
Advanced IDE – RStudio
Advanced IDE – Microsoft Visual Studio
Quick introduction to R ► History
► 1976: S was born at Bell Laboratories (Rick Becker, Allan Wilks, John Chambers)
► 1980: first commercial release of S, via AT&T
► 1988: the New S Language
► 1988: S-PLUS was born at Statistical Sciences, Inc. (R. Douglas Martin, University of Washington)
► 1993: R was born at the University of Auckland (Ross Ihaka, Robert Gentleman)
► 1993: exclusive license to develop and sell the S language
► 1997: the R Core Team was formed; first release of R; CRAN
► 1998: the last version of S; S became the first statistical system to receive the Software System Award, the top software award from the Association for Computing Machinery
► 2000: R v 1.0.0
► 2003: the R Foundation was formed
► 2004: S code bought for $2 mln (by Insightful Corporation, from Lucent)
► 2007: Revolution was born (Revolution Analytics)
► 2008: Insightful Corporation acquired by TIBCO
► 2008: R Enterprise (Oracle)
► 2013: TERR - TIBCO Enterprise Runtime for R (TIBCO Spotfire)
► 2015: R Open was born (Microsoft); Revolution acquired by Microsoft; the R Consortium was founded
► Owners of S over time: Bell Labs (AT&T) → Lucent → Insightful Corporation → TIBCO
Quick introduction to R ► History ► (few) Events important for the use of R in EBM
► 1997: the first release of R; CRAN; FDA 21 CFR Part 11
► 1998: nlme
► 1999: FDA “Off-The-Shelf Software Use in Medical Devices”
► 2000: xtable
► 2001: DBI, survival
► 2002: multcomp; FDA “General Principles of Software Validation – Final”; Bioconductor
► 2003: lme4, nlmeODE
► 2005: drc (Dose-Response), PKfit, PK, ggplot2, ROCR
► 2006: gsDesign, meta, mice, tdm, ivivc, blockrand, pwr
► 2007: SASxport, Rtools; presentations: “Using R: Perspectives of a FDA Statistical RevieweR”, “R - Regulatory Compliance and Validation Issues”, “Use of R in C.T. & Industry-Sponsored Medical Res.”, “Op. Sour. Stat. Soft. in Pharma Developm.: A case study with R”
► 2008: MCPMod, bear, rjags, epiR, plyr, DanteR
► 2009: SAS IML Studio supports R; sas7bdat, metafor, gamm4
► 2010: PKGraph, pROC, oro.nifti, oro.dicom, PowerTOST
► 2011: RStudio, Detools, ggbio, RISmed, rplos
► 2012: Shiny, knitr, Pmetrics, TrialSize, stargazer, OpenCPU; FDA: “Sponsors may use R in their submissions”
► 2013: cpk; “The SAS® versus R Debate in Industry and Academia”
► 2014: Tidyverse, ValidR, Checkpoint, Packrat, Rmarkdown, rclinicaltrials, pubmed.miner, ReporteRs, greport, dplyr; MRAN
► 2015: rxODE, gfd, ThreeArmedTrials, randomizeR; FDA: “Statistical Software Clarifying Statement”; the R Consortium
► 2016: R Tools for Visual Studio, rankFD; the R Epid. Cons.
► 2017: dfpk - Bayesian Dose-Finding Designs, officer
► 2018: Mediana - general framework for CT simulations
► Also shown: the R Core Team, the R Foundation
Quick introduction to R ► Who uses R?
Medicine and Pharmacy:
► 2KMM
► Amgen
► AstraZeneca
► Bayer
► CardioDX
► Dr. Reddy’s Laboratories
► FDA
► GCE
► KCR (2014-2017)
► Medtronic
► Merck
► Novartis
► Pfizer
► Roche
Other Business & Science Tycoons:
► American Express
► Bank of America
► BBC
► Capgemini
► Deloitte
► eBay
► Facebook
► Fermi National Accelerator Laboratory
► Ford
► Goldman Sachs
► Google
► HP
► IBM
► J.P. Morgan
► Kickstarter
► Microsoft
► Monsanto
► Mozilla
► New York Times
► NIST - National Institute of Standards & Technology
► NOAA
► Oracle
► Twitter
► Uber
► UK Government
► Wells Fargo
Quick introduction to R ► Who uses R?
The list is built based exclusively on publicly available information:
► lists of users provided by Revolution, RStudio and others
► articles (example, example) and interviews (example)
► published documents in which the name of a company is visible (example)
► job advertisements (LinkedIn, Google, PharmiWeb, etc.)
► names of companies supporting / organizing events (conferences, courses, etc.)
► other sources (example)
That is to say, a company’s logo is included in the list only if there is clear evidence that the company uses or supports (or used or supported) R, based on information shared on the Internet and thus available to everyone.
Please note that I do not know whether all listed companies are still using any version of R at the time the presentation is being viewed. If you want me to remove your logo, please send me an email: r.clin.res@gmail.com
“We use R for adaptive designs frequently because it’s the fastest tool to explore designs that interest
us. Off-the-shelf software, gives you off-the-shelf options. Those are a good first order approximation,
but if you really want to nail down a design, R is going to be the fastest way to do that.”
Keaven Anderson
Executive Director, Late Stage Biostatistics
Merck
“De facto, R is already a significant component of Pfizer core technology. Access to a supported
version of R will allow us to keep pace with the growing use of R in the organization, and provides a
path forward to use of R in regulated applications.”
James A. Rogers Ph.D.
Associate Director, Nonclinical Statistics Group
Pfizer
Publicly available sources:
https://pharma-life-sciences.cioreview.com/news/gsdesign-explorer-to-optimize-merck-s-clinical-trial-process-nid-1305-cid-36.html
Google Books: Big Data for Big Pharma: An Accelerator for The Research and Development Engine?
https://www.featuredcustomers.com/vendor/revolution-analytics-1/customers/pfizer
Quick introduction to R ► Who uses R?
“We use R for all of our analysis,” says Elashoff. “I think it’s fair to say that R really is the
foundation of a lot of the work that we do.” To speed up the process without sacrificing
accuracy, the team also uses Revolution R analytic products. “We use R seven or eight
hours per day, so any improvement in speed is helpful, particularly when you’re looking at a
million biomarkers and wondering if you’ll need to re-run a million analyses.”
Open-source R packages enable the biostatisticians at CardioDX to run a broad range of
analyses, accurately and effectively, on a routine basis. Adding Revolution R products to the
mix improves processing speeds and makes it easier to crunch large data sets. Accelerating
the analytic process reduces overall project time, increasing the team’s efficiency. “Revolution
R is faster than regular R,” says Elashoff. “The faster we can analyze data, the less time it
takes us to build our diagnostic algorithms.”
Michael Elashoff
The company’s director of biostatistics
CardioDX
Publicly available sources:
https://www.featuredcustomers.com/media/CustomerCaseStudy.document/revolution-analytics-1_cardiodx_8284.pdf
https://www.businesswire.com/news/home/20110118006656/en/CardioDX-Revolution-Analytics-Develop-Non-Intrusive-Test-Predicting
R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks
► descriptive analysis & summarizing
► comparison of methods, errors-in-variables modeling: Deming, Passing-Bablok, Bland-Altman
► time-to-event / survival: Kaplan-Meier, Nelson-Aalen, Cox regression, Weibull
► design of experiments: parallel, cross-over, adaptive, group-sequential, multi-arm
► ROC analysis
► categorical data analysis
► planned & post-factum analysis
► advanced plotting
► sample size & power
► meta-analysis
► non-inferiority, superiority, (bio)equivalence
► PK, PD, Dose-Response
► randomization
► repeated measures & longitudinal trials: parametric / non-parametric
► modeling: (non)parametric (non)linear models with mixed effects
► resampling: bootstrap, permutation, exact
► factorial design analysis: parametric / non-parametric
► robust methods: regularized, M-estimators
► detection of outliers: univariate / multivariate
► missing data imputation: *OCF, kNN, LI, MI, censored (KM)
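As one small illustration of the “sample size & power” item, base R's `stats` package already covers the classical two-sample t-test case. A sketch assuming a standardized difference of 0.5 SD, 80% power and a two-sided alpha of 0.05; these design values are illustrative, and dedicated packages from the list (e.g. pwr, gsDesign) handle more complex designs:

```r
# Subjects per arm for a two-sample t-test: difference of 0.5 SD,
# 80% power, two-sided alpha = 0.05 (illustrative design values)
res <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80,
                    type = "two.sample", alternative = "two.sided")
n_per_arm <- ceiling(res$n)   # round up to whole subjects
res
n_per_arm
```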
R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks
► logging processes: pure ASCII, HTML, PDF, DOC
► cooperation with SAS
► interoperability: .NET, Java, Scala, Python, C++, Fortran, PHP, Perl, DDE, COM, TCP, web services
► accessing registers: clinicaltrials.gov, PubMed, PLOS
► accessing databases: ODBC, JDBC, Oracle, MS SQL, MySQL, dBase, PostgreSQL, SQLite, DB/2, Informix, Firebird, H2, MongoDB, more…
► reproducible research
► producing documents: doc(x), ppt(x), pdf, rtf, odf, ps
► exchanging data: Excel, OO Calc, Gnumeric, SPSS, Weka, Systat, Stata, EpiInfo, SAS, SAS XPT, Minitab, Octave, Matlab, DBF, CSV, XML, HTML, JSON, DICOM, NIfTI
► production tools and unit testing
► GUI desktop & server applications
► interactive presentations
► advanced data querying & transforming
R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks
► Descriptive stats
► Data review
R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks
► Linear regression
► ANOVA
► Post-hoc comparisons
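A minimal base-R sketch of the linear regression / ANOVA / post-hoc sequence, using the built-in `PlantGrowth` dataset (three groups: `ctrl`, `trt1`, `trt2`) purely as example data:

```r
# Linear model and its ANOVA table
fit_lm <- lm(weight ~ group, data = PlantGrowth)
anova(fit_lm)

# Same model via aov(), followed by Tukey's HSD post-hoc pairwise comparisons
fit_aov <- aov(weight ~ group, data = PlantGrowth)
tukey   <- TukeyHSD(fit_aov, "group", conf.level = 0.95)
tukey
```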
R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks
► GLM modelling
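A sketch of GLM modelling with a logistic regression on the built-in `mtcars` data (manual transmission `am` as a function of weight `wt`, in 1000 lbs). The dataset is only a stand-in for a binary clinical endpoint:

```r
# Logistic regression: P(am = 1) as a function of weight
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))
summary(fit_glm)

odds_ratio <- exp(coef(fit_glm)[["wt"]])           # odds ratio per 1000 lbs
ci_wald    <- exp(confint.default(fit_glm)["wt", ]) # Wald 95% CI on the OR scale
odds_ratio
ci_wald
```

`confint.default()` gives Wald intervals; `confint()` on a `glm` object gives profile-likelihood intervals instead, which is one of the method choices worth fixing in a specification.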
R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks
► NLM modelling
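A sketch of non-linear modelling with base R's `nls()`, fitting an Emax dose-response model, E0 + Emax·dose/(ED50 + dose), to simulated data. The doses, the true parameters (E0 = 2, Emax = 10, ED50 = 8) and the starting values are assumptions made for illustration:

```r
# Simulated dose-response data from an Emax model (values are invented)
set.seed(42)
dose <- rep(c(0, 1, 2.5, 5, 10, 25, 50, 100), each = 3)
resp <- 2 + 10 * dose / (8 + dose) + rnorm(length(dose), sd = 0.1)

# Non-linear least squares fit of the Emax model
fit_nls <- nls(resp ~ e0 + emax * dose / (ed50 + dose),
               start = list(e0 = 1, emax = 8, ed50 = 5))
summary(fit_nls)
est <- coef(fit_nls)
est
```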
R in Evidence-Based Medicine ► Capabilities ► Cooperation & compliance with SAS
SAS and R Team in Clinical Research (Adrian Olszewski)
Differences in:
► origin of dates
► default contrasts
► sums of squares used
► calculation of quantiles
► generation of random numbers
► implementation of advanced models
► representation of floating-point numbers
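Diagram: SAS module #1 and SAS module #2 (SAS Base, SAS IML) exchange data with R through bi-directional communication when a required algorithm or functionality is missing or expensive in SAS, or requires a different method of communication. Example of such an algorithm, the kernel density estimator:
$$\hat{f}_h(x) = \frac{1}{n h^{d}} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$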
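Several of the SAS/R differences listed on this slide can be checked directly in base R. A sketch under these assumptions: SAS date values count days from 1960-01-01; `contr.SAS` is documented as giving coefficients equivalent to those of many (not all) SAS procedures; and which SAS percentile option matches which R quantile `type` should be confirmed in both products' documentation. The example SAS date values are hypothetical.

```r
# 1) Origin of dates: SAS counts days from 1960-01-01, R's Date class from 1970-01-01
sas_dates  <- c(0, 366)                               # hypothetical SAS date values
r_dates    <- as.Date(sas_dates, origin = "1960-01-01")
origin_gap <- as.numeric(as.Date("1970-01-01") - as.Date("1960-01-01"))

# 2) Default contrasts: R's treatment contrasts use the FIRST level as reference,
#    contr.SAS uses the LAST level
fit_r <- lm(weight ~ group, data = PlantGrowth)
old   <- options(contrasts = c("contr.SAS", "contr.poly"))
fit_sas <- lm(weight ~ group, data = PlantGrowth)
options(old)                                          # restore previous settings
same_fit <- isTRUE(all.equal(fitted(fit_r), fitted(fit_sas)))  # same model, other coefficients

# 3) Quantiles: R implements nine definitions (type = 1..9), default type 7
x   <- 1:10
q25 <- sapply(c(2, 3, 6, 7), function(t) quantile(x, probs = 0.25, type = t, names = FALSE))

# 4) Floating-point representation: compare with a tolerance, not with ==
exact_equal    <- (0.1 + 0.2 == 0.3)                  # FALSE with IEEE 754 doubles
tolerant_equal <- isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE

r_dates; origin_gap; coef(fit_r); coef(fit_sas); q25
```

The quantile example shows why “the same” 25th percentile can legitimately differ between packages: for `1:10` the four definitions above give 3, 2, 2.75 and 3.25.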
Agenda
► R in Clinical Research
o Status of R on the Clinical Research market
o Myths and Facts
o What does FDA say?
o What does it mean “to validate”? Why do we want it?
o Preparing R to enter the industry
R in Clinical Research ► Status of R on the Clinical Research market
► In clinical research, however, SAS reigns par excellence
► In general bioscience and academia, S, and then R, has over the years built its position as one of the industry standards
► Pharmaceutical companies, CROs and even the FDA do use R “internally”, but they resist (or hesitate) to use it in submissions to the FDA
► Clinical Programmer or Biostatistician ≝ SAS Programmer. Period.
OK, but how did it come to this?
R in Clinical Research ► Status of R on the Clinical Research market
We can only speculate on why R users are so often told the mantra:
Too many myths have accumulated, but we cannot ignore the facts.
R in Clinical Research ► Myths and Facts
Facts:
► FDA requires software to be validated
► R is not validated out-of-the-box
► R doesn’t facilitate the creation of CDISC datasets
► R doesn’t have a metadata layer
► R doesn’t have a paid hot-line
► Nobody takes the responsibility if something goes wrong
► Packages change over time. What works today may not work tomorrow. Packages happen to be removed
► Validation of software is a challenging and time-consuming process, so not everyone can afford it
Myths / objections:
► FDA demands SAS for both the analysis and producing datasets. No other software is allowed.
► R cannot generate datasets in SAS Transport format
► R cannot cooperate with SAS, including reading and writing SAS binary files
► R cannot be validated as well as commercial software
► Commercial software doesn’t have errors
► R is full of bugs (errors) as nobody controls it
► R poorly supports the generation of TFLs
R in Clinical Research ► Myths and Facts
Facts:
► Errors happen often in non-commercial software
► Announcing errors publicly doesn’t reassure people
► FDA: “Results should be reproducible and independent of the software used to derive them”
► Creators of R packages don’t have to provide (good) unit tests. It’s a matter of goodwill.
Myths / objections:
► R is limited in terms of implemented statistical methods
► Nobody uses R (or Open Source in general) in the pharma industry (or in “serious business”). Maybe in academia, which is not “serious business”.
► R doesn’t meet 21 CFR Part 11, which is a must
► R has no SAS-like “LOG”, which records everything
► There is no commercial support (product and/or validation)
► The entire software (all packages and functions) must be validated
► Commercial software releases the end-user from any responsibility regarding the validation
R in Clinical Research ► Myths and Facts
Facts vs. myths
Who is right and…
…is it possible to use R in a controlled environment?
R in Clinical Research ► Myths and Facts
First, let us briefly address all points in the “table of shame”. Facts first.
► FDA requires software to be validated
o Yes. This is a mandatory process, and that’s good! It protects not only the sponsor from serious trouble but also the patients!
► R is not validated out-of-the-box
o Yes, the official release is not guaranteed to be error-free. A disclaimer confirming this is displayed every time R is launched. But validation is fully possible.
► R doesn’t facilitate the creation of CDISC datasets
o True. There are no easy GUI tools to map fields between CDASH and SDTM, nor easy-to-use ways to generate define.xml.
► R doesn’t have a metadata layer
o Partially true. R supports attributes at every level of a data structure, so with some effort a metadata layer can be implemented effectively. I plan to release a package allowing datasets to be annotated and printed in line with the assigned formats.
► R doesn’t have a paid hot-line
o True. This is not a commercial project. But the R community is vibrant and provides a huge amount of knowledge (Stack Overflow, GitHub).
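The point about attributes can be sketched in a few lines of base R: variable labels and units are stored as attributes on each column and collected into a small metadata table. The dataset, variable names and labels below are hypothetical (CDISC-like only in spirit), and the last line shows the main caveat a dedicated package would have to handle, namely that ordinary subsetting of an atomic vector drops such attributes.

```r
# Hypothetical subject-level dataset; names and labels are illustrative only
adsl <- data.frame(USUBJID = c("01-001", "01-002"), AGE = c(54L, 61L),
                   stringsAsFactors = FALSE)

# Attach metadata as attributes on each column
attr(adsl$USUBJID, "label") <- "Unique Subject Identifier"
attr(adsl$AGE, "label")     <- "Age"
attr(adsl$AGE, "units")     <- "YEARS"

# Collect the labels into a simple metadata vector (names = variable names)
var_labels <- vapply(adsl, function(col) attr(col, "label"), character(1))
print(var_labels)

# Caveat: subsetting an atomic vector keeps only names, so the label is lost here
label_after_subset <- attr(adsl$AGE[1], "label")   # NULL
```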
R in Clinical Research ► Myths and Facts
► Nobody takes the responsibility if something goes wrong
o True. This is not a commercial project. By the way, to what extent exactly do commercial companies take responsibility? Do you have the conditions (and $$$) written down on paper and signed?
► Packages change over time. What works today may not work tomorrow. Packages happen to be removed
o Very true. This can be managed effectively in many ways, as discussed later in this presentation.
► Validation of software is a challenging and time-consuming process, so not everyone can afford it
o Very true. Time is money. One has to analyze the profitability and then make a decision.
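One simple, base-R way to detect that packages have changed is to compare installed versions against the versions covered by the validation. This is a minimal sketch, not the approach discussed later in the presentation: the manifest and the helper `check_pinned_versions()` are hypothetical, and the `utils` entry is deliberately pinned to an impossible version to show a mismatch.

```r
# Hypothetical helper: compare installed package versions with a pinned manifest
check_pinned_versions <- function(pinned) {
  installed <- vapply(names(pinned),
                      function(p) as.character(utils::packageVersion(p)),
                      character(1))
  data.frame(package   = names(pinned),
             pinned    = unname(pinned),
             installed = unname(installed),
             ok        = unname(installed == pinned),
             stringsAsFactors = FALSE)
}

# Hypothetical manifest: 'stats' pinned to the current version, 'utils' to a fake one
manifest <- c(stats = as.character(utils::packageVersion("stats")), utils = "0.0.1")
report <- check_pinned_versions(manifest)
print(report)
if (!all(report$ok)) message("Environment differs from the validated configuration")
```

In practice the manifest would be stored together with the validation documentation, and the check run before every production analysis.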
R in Clinical Research ► Myths and Facts
► Errors happen often in non-commercial software
o That is true. There are no “paid testers”, only volunteers. This does not mean they perform any worse, but it does not guarantee that they perform well either.
o Errors happen in all software, including commercial software: even in top-quality medical devices (the FDA recalls this in its guidelines), radiation therapy machines (the Therac-25 medical accelerator case), power plants, space rockets, and even a Mars rover or the Mariner 1 space probe.
o There is no error-free software. There is only software that has not been tested well enough.
► Announcing errors publicly doesn’t reassure people
o Well, that is true. But hiding issues doesn’t make them any less dangerous.
o “Transparency” is the most reliable way of cooperating with software users. Programmers or end-users announce errors publicly, so the whole community can learn about them and react quickly. Nothing is hidden, all the more so since this is open source. How often are you informed about errors in your favorite software, with full details and the source code?
R in Clinical Research ► Myths and Facts
► FDA: “Results should be reproducible and independent of the software used to derive them”
o That is true. Results may differ between statistical packages. A little, but still.
o If the FDA uses SAS for checking, we may get into trouble with resampling methods, even with the same seed set.
► Creators of R packages don’t have to provide (good) unit tests. It’s a matter of goodwill.
o Yes. And even when forced to write tests, nobody can guarantee that the tests are properly defined and bring any advantage.
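Why the same seed is not enough: a seed reproduces a random stream only for the same generator and the same seeding scheme. The sketch below stays entirely within R and only changes R's own generator kind; it assumes R 3.6.0 or later (for the three-element `RNGkind()` result) and does not claim anything about SAS's generators beyond the fact that they are a different implementation.

```r
# Draw three uniforms with a given seed and generator kind,
# restoring the previous RNG settings afterwards (assumes R >= 3.6.0)
draw_uniform <- function(seed, kind) {
  old <- RNGkind()
  on.exit(RNGkind(kind = old[1], normal.kind = old[2], sample.kind = old[3]))
  set.seed(seed, kind = kind)
  runif(3)
}

a1 <- draw_uniform(2018, "Mersenne-Twister")
a2 <- draw_uniform(2018, "Mersenne-Twister")   # same seed, same generator
b  <- draw_uniform(2018, "Wichmann-Hill")      # same seed, different generator
rbind(a1, a2, b)
```

For resampling-based results submitted to a reviewer, it is therefore worth reporting the seed together with the generator (`RNGkind()`) and the R version.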
R in Clinical Research ► Myths and Facts
Now myths.
► FDA demands SAS for both the analysis and producing datasets. No other software is allowed.
o No. The FDA has never claimed that. This myth is repeated so often that the FDA issued an official “Statistical Software Clarifying Statement”.
► R cannot generate datasets in SAS Transport format
o False. R can generate XPT files using the SASxport package.
o The SAS Transport Format is an open format, published by SAS Institute a long time ago:
1. https://www.loc.gov/preservation/digital/formats/....
2. http://documentation.sas.com....
► R cannot cooperate with SAS, including reading and writing SAS binary files
o False. R can be combined with SAS in many ways. Check this out: https://www.quora.com/How-can-I-integrate-SAS-with-R
o SAS enabled direct communication between R and SAS in the IML module in 2009.
o R can read SAS7BDAT binary data files and both read and write XPT files.
► R cannot be validated as well as commercial software
o False. R can be validated just as well. In fact, there is at least one company offering a validated version of R: Mango.
► Commercial software doesn’t have errors
o The facts clearly contradict this claim.
► R is full of bugs (errors) as nobody controls it
o Errors do happen in third-party packages, but there is no evidence of an increased rate of bug reports.
► R poorly supports the generation of TFLs
o False. There are packages for creating advanced graphs (ggplot2), Word documents, OpenDocument files, RTF and PDF. All tasks can be automated, since R is a programming language.
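Because R is a programming language, even a standard descriptive table can be produced by a small, testable function. A base-R sketch; the table layout (n, Mean (SD), Median, Min, Max), the function name `desc_table()` and the data are assumptions for illustration, and rendering to RTF/Word would be left to packages such as those mentioned above.

```r
# Descriptive summary table by treatment arm, base R only (layout is illustrative)
desc_table <- function(value, arm) {
  s_mat <- sapply(split(value, arm), function(v)
    c(n = length(v), mean = mean(v), sd = sd(v),
      median = median(v), min = min(v), max = max(v)))
  cells <- apply(s_mat, 2, function(s) c(
    sprintf("%d", as.integer(s[["n"]])),
    sprintf("%.1f (%.2f)", s[["mean"]], s[["sd"]]),
    sprintf("%.1f", s[["median"]]),
    sprintf("%.1f, %.1f", s[["min"]], s[["max"]])))
  data.frame(Statistic = c("n", "Mean (SD)", "Median", "Min, Max"), cells,
             check.names = FALSE, stringsAsFactors = FALSE)
}

# Hypothetical data
value <- c(1, 2, 3, 4, 10, 20, 30)
arm   <- c("A", "A", "A", "A", "B", "B", "B")
tab <- desc_table(value, arm)
print(tab, row.names = FALSE)
```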
R in Clinical Research ► Myths and Facts
► R is limited in terms of implemented statistical methods
o We have just seen how rich the R statistical library is. It is arguably the most complete library after SAS (plus a few routines more).
► Nobody uses R (or Open Source in general) in the pharma industry (or in “serious business”). Maybe in academia, which is not “serious business”.
o False. We have just seen, a few slides ago, that pharmaceutical companies do use R.
o Not to mention the non-clinical representatives of “serious business”.
R in Clinical Research ► Myths and Facts
► R doesn’t meet 21 CFR Part 11, which is a must
o Let me quote this: Whoever told you that is not well-informed. CFR Part 11 has to do
with critical software that runs medical devices and about certain primary data
management software. It does not apply to statistical analysis software. We use R all
the time in industry-sponsored and NIH sponsored clinical trials. You do not need to
seek FDA's approval. FDA accepts all comers and does not dictate software policy for
analysis. They even accept Excel and Minitab for NDAs. There are many messages
related to this in the r-help archive; please look at them.
Frank E Harrell Jr
Professor and Chair School of Medicine, Department of Biostatistics
Vanderbilt University
Source
o And this: “Records submitted to FDA, under predicate rules in electronic format [are Part
11 records]. However, a record that is not itself submitted, but is used in generating a
submission, is not a part 11 record unless it is otherwise required to be maintained under
a predicate rule and it is maintained in electronic format.”
Therefore, it is not mandated that 21 CFR Part 11 is appropriate to data analysis
software systems that are not primarily intended for storage and transmission of
electronic medical records. It remains the responsibility of an individual organization
however to define the applicability of Part 11 and validation to their systems.
R: Regulatory Compliance and Validation, 11 March 25, 2018
Source
o Formal confirmation: Statistical Software Clarifying Statement by FDA Source
R in Clinical Research ► Myths and Facts
► R has no SAS-like “LOG”, which records everything
o True, R doesn’t have a “LOG”, but with R Markdown (or knitr, Sweave, odfWeave) and following the Reproducible Research paradigm, an equivalent “LOG” can be produced with little effort.
o The generated HTML (or PDF, DOCX) document contains both the code and the corresponding results.
o In addition, storing the “LOG” in a version control system (SVN, Git) allows the analyst to version it and track changes. This gives a high level of confidence.
o A less sophisticated, yet fully valid, method can be implemented with the sink() function.
► There is no commercial support (product and/or validation)
o False. Mango’s ValidR product is a good example. Revolution also offered paid support. This covers only certain packages (mostly from the “base” set).
► The entire software (all packages and functions) must be validated
o No. It has to be done properly and sufficiently.
o Practice shows that only the part of R code actually used must be validated. If a package contains 1000 functions and only two of them are used, only those two functions have to be validated. If a validated function X calls an unvalidated function Y, the result subject to validation is still the one returned by X under the given parameters and conditions.
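A minimal sketch of the sink()-based “LOG”: both printed output and messages (including warnings) are diverted into one text file, which can then be archived or committed to version control. The temporary file and the logged content are illustrative assumptions.

```r
# Minimal sink()-based log: printed output and messages go to one text file
log_file <- tempfile(fileext = ".log")
con <- file(log_file, open = "wt")
sink(con)                      # divert standard output
sink(con, type = "message")    # divert messages and warnings as well

cat("Run started:", format(Sys.time(), "%Y-%m-%d %H:%M:%S"), "\n")
cat(R.version.string, "\n")
print(summary(c(4.1, 5.0, 5.3, 6.2)))
message("NOTE: analysis step completed")

sink(type = "message")         # restore messages to the console
sink()                         # restore standard output
close(con)

log_lines <- readLines(log_file)
log_lines
```

Unlike a SAS log, this records only what the code prints or signals, so the script itself should still be kept (ideally rendered together with its output via R Markdown).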
R in Clinical Research ► Myths and Facts
► Commercial software releases the end-user from any responsibility regarding the validation.
o No. Let’s quote the FDA: “All production and/or quality system software, even if purchased off-the-shelf, should have documented requirements that fully define its intended use, and information against which testing results and other evidence can be compared, to show that the software is validated for its intended use.”
Source: General Principles of Software Validation - Final Guidance for Industry and FDA Staff
R in Clinical Research ► What does FDA say?
Now, let us see what the FDA has said about:
► The use of any software in clinical research. This is the KEY.
► The process of validation of the software
Then let us look at what some FDA-related people say about R
R in Clinical Research ► What does FDA say?
The use of any software in clinical research ( + 21 CFR part 11 status)
https://www.fda.gov/downloads/forindustry/datastandards/studydatastandards/ucm587506.pdf
R in Clinical Research ► What does FDA say?
The process of validation of the software
https://www.fda.gov/downloads/medicaldevices/.../ucm085371.pdf
[…] FDA considers software validation to be: “confirmation by examination and
provision of objective evidence that software specifications conform to user
needs and intended uses, and that the particular requirements implemented
through software can be consistently fulfilled.”
General Principles of Software Validation
Final Guidance for Industry and FDA Staff
R in Clinical Research ► What does FDA say?
This document […] can be applied to any software.
[…]
This document does not specifically identify which software is or is not regulated
[…]
The management and control of the software validation process should not be
confused with any other validation requirements, such as process validation for an
automated manufacturing process (so the regular validation of clinical programs doesn’t count)
[…]
design input requirements must be documented, and that specified requirements
must be verified
[…]
Success in accurately and completely documenting software requirements is a crucial
factor in successful validation of the resulting software.
R in Clinical Research ► What does FDA say?
A specification is defined as “a document that states requirements.”
[…]
There are many different kinds of written specifications, e.g., system requirements
specification, software requirements specification, software design specification,
software test specification, software integration specification, etc.
[…]
Software verification provides objective evidence that the design outputs of a
particular phase of the software development life cycle meet all of the specified
requirements for that phase. Software verification looks for consistency,
completeness, and correctness of the software and its supporting
documentation, as it is being developed, and provides support for a subsequent
conclusion that software is validated.
R in Clinical Research ► What does FDA say?
Software validation is a part of the design validation for a finished device, but is not
separately defined in the Quality System regulation. For purposes of this guidance,
FDA considers software validation to be “confirmation by examination and
provision of objective evidence that software specifications conform to user
needs and intended uses, and that the particular requirements implemented
through software can be consistently fulfilled.”
SOFTWARE VERIFICATION (production: are R and all packages done well?)
≠
SOFTWARE VALIDATION (installation and work: mean( 1:3 ) == 2 ?)
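The “mean( 1:3 ) == 2 ?” question can be turned into a small, repeatable check of the installed environment: run the functions actually used against reference results and compare with a tolerance rather than `==`. A sketch; the cases, the tolerance of 1e-6 and the helper name `run_reference_checks()` are assumptions, and real reference values would come from an accepted source (for example the NIST reference datasets) rather than from R itself.

```r
# Hypothetical reference cases: function, arguments and expected (reference) result
reference_cases <- list(
  list(name = "mean(1:3)",        fun = mean,  args = list(1:3),   expected = 2),
  list(name = "sd of known data", fun = sd,
       args = list(c(2, 4, 4, 4, 5, 5, 7, 9)),                     expected = sqrt(32 / 7)),
  list(name = "qnorm(0.975)",     fun = qnorm, args = list(0.975), expected = 1.959964)
)

run_reference_checks <- function(cases, tolerance = 1e-6) {
  rows <- lapply(cases, function(case) {
    observed <- do.call(case$fun, case$args)
    data.frame(check    = case$name,
               observed = observed,
               expected = case$expected,
               passed   = isTRUE(all.equal(observed, case$expected, tolerance = tolerance)),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

report <- run_reference_checks(reference_cases)
print(report)
```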
R in Clinical Research ► What does FDA say?
Software validation includes confirmation of conformance to all software
specifications and confirmation that all software requirements are traceable to the
system specifications.
Diagram: requirements, documentation, specification of the system → (verification) + validation → confirmation of the system
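Traceability between requirements and tests can be kept as plain data and checked automatically. A sketch in base R; all requirement IDs, texts and test records are hypothetical:

```r
# Hypothetical requirements and test records
requirements <- data.frame(
  req_id = c("REQ-01", "REQ-02", "REQ-03"),
  text   = c("Compute arithmetic mean", "Compute two-sided 95% CI", "Write results to CSV"),
  stringsAsFactors = FALSE)
tests <- data.frame(
  test_id = c("T-01", "T-02", "T-03"),
  req_id  = c("REQ-01", "REQ-03", "REQ-03"),
  passed  = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE)

# Traceability matrix: requirements (rows) x tests (columns), including untested requirements
trace_matrix <- table(factor(tests$req_id, levels = requirements$req_id), tests$test_id)

untested <- setdiff(requirements$req_id, tests$req_id)   # requirements with no test
failing  <- unique(tests$req_id[!tests$passed])         # requirements with a failed test
trace_matrix
untested
failing
```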
R in Clinical Research ► What does FDA say?
Because of its complexity, the development process for software should be even
more tightly controlled than for hardware, in order to prevent problems that cannot
be easily detected later in the development process.
[…]
Seemingly insignificant changes in software code can create unexpected and
very significant problems elsewhere in the software program. The software
development process should be sufficiently well planned, controlled, and documented
to detect and correct unexpected results from software changes.
R in Clinical Research ► What does FDA say?
SECTION 4. PRINCIPLES OF SOFTWARE VALIDATION
4.9. INDEPENDENCE OF REVIEW
Validation activities should be conducted using the basic quality
assurance precept of “independence of review.” Self-validation is
extremely difficult. When possible, an independent evaluation is
always better, especially for higher risk applications.
Figure: the validator and the builder (independence of review)
R in Clinical Research ► What does FDA say?
The software requirements specification document should contain a written definition of the software functions.
It is not possible to validate software without predetermined and documented software requirements.
Typical software requirements specify the following:
► All software system inputs
► All software system outputs
► All functions that the software system will perform
► All performance requirements that the software will meet, (e.g., data throughput, reliability, and timing)
► The definition of all external and user interfaces, as well as any internal software-to-system interfaces
► How users will interact with the system
► What constitutes an error and how errors should be handled
► Required response times
► The intended operating environment for the software, if this is a design constraint (e.g. hardware platform, operating system)
► All ranges, limits, defaults, and specific values that the software will accept
► All safety related requirements, specifications, features, or functions that will be implemented in software
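Items such as “all ranges, limits, defaults” and “what constitutes an error and how errors should be handled” translate directly into code. A sketch of a function whose input checks mirror written requirements; the function `mean_ci()` and its requirements are hypothetical, and its result is compared with base R's `t.test()` as an independent reference.

```r
# Hypothetical function implementing documented input requirements
mean_ci <- function(x, conf_level = 0.95) {
  # REQ: x must be numeric
  if (!is.numeric(x)) stop("x must be numeric", call. = FALSE)
  # REQ: missing values are removed; at least 2 observations must remain
  x <- x[!is.na(x)]
  if (length(x) < 2) stop("x must contain at least 2 non-missing values", call. = FALSE)
  # REQ: conf_level (default 0.95) is a single number strictly between 0 and 1
  if (!is.numeric(conf_level) || length(conf_level) != 1 ||
      is.na(conf_level) || conf_level <= 0 || conf_level >= 1)
    stop("conf_level must be a single number in (0, 1)", call. = FALSE)
  n  <- length(x)
  m  <- mean(x)
  se <- sd(x) / sqrt(n)
  q  <- qt(1 - (1 - conf_level) / 2, df = n - 1)
  c(mean = m, lower = m - q * se, upper = m + q * se)
}

x <- c(5.1, 4.9, 5.6, 5.8, 6.0, NA)
ci <- mean_ci(x)
ci
# An input outside the specified range is reported as an error
res_err <- tryCatch(mean_ci(x, conf_level = 1.5), error = function(e) conditionMessage(e))
res_err
```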
R in Clinical Research ► What does FDA say?
The vendor’s life cycle documentation, such as testing protocols and results, source code, design
specification, and requirements specification, can be useful in establishing that the software has
been validated. However, such documentation is frequently not available from commercial
equipment vendors, or the vendor may refuse to share their proprietary information.
Let’s pause for a moment and summarize what we have learned so far:
Commercial software | Open-source software
• No source code | • Source code provided
• No proprietary technical information | • Full documentation provided (if available)
• Assurance: “we did our best” | • Assurance: “we did our best”
• No guarantee | • No guarantee
• Support | • No hotline, but a very active community
• “…millions of people use it” | • “…millions of people use it”
• Full trust: “it’s paid, so it’s well validated” | • Low trust: “free things are poorly made”
R in Clinical Research ► What does FDA say?
61
The software validation process
https://www.fda.gov/downloads/MedicalDevices/.../ucm073779.pdf
Guidance for Industry, FDA Reviewers and Compliance on Off-The-Shelf Software Use in Medical Devices
This is another essential document. A must-read.
We will not analyze it in depth here, but it is strongly recommended that you become familiar with it.
R in Clinical Research ► What does FDA say?
62
Introduction to the controlled environment
https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm070266.pdf
Guidance for Industry: Computerized Systems Used in Clinical Investigations
[Diagram: S.O.P., Dependability System Documentation, System Controls, Change Control Documentation, Training of Personnel]
R in Clinical Research ► What do FDA-related people say?
63
http://user2007.org/program/presentations/soukup.pdf
R in Clinical Research ► What do FDA-related people say?
64
Hey! Do read it again!
(see the next page)
R in Clinical Research ► What do FDA-related people say?
65
1. Use of R functions without proper validation is at the organization’s risk
It seems that nobody forces us to validate R, so is it up to us? Then we had better do it (see the next page).
2. Results should be reproducible and independent of the software used to derive them.
Strictly speaking, this is impossible “by definition”: SAS, R, Stata and SPSS may return different results even for quantiles, or because of differences in floating-point representation! The results should be as close to each other as possible, but what about resampling methods, given that SAS and R generate different random numbers for the same seed?
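The quantile remark is easy to demonstrate. Python is used here only because its standard library exposes two quantile definitions side by side; in R the same effect comes from the `type` argument of `quantile()`. Judging by the interpolation rules, `method="inclusive"` corresponds to R's default `type = 7` and `method="exclusive"` to `type = 6`. The ten values are illustrative, not trial data.

```python
import statistics

# Ten illustrative observations (not real trial data).
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Same data, two quantile definitions (quartiles: n=4 gives Q1, Q2, Q3).
q_inclusive = statistics.quantiles(values, n=4, method="inclusive")  # like R type 7
q_exclusive = statistics.quantiles(values, n=4, method="exclusive")  # like R type 6

print("inclusive:", q_inclusive)  # [3.25, 5.5, 7.75]
print("exclusive:", q_exclusive)  # [2.75, 5.5, 8.25]
```

The medians agree, but the first and third quartiles differ, and neither result is "wrong": each follows a different, documented definition. This is why the validation report should state which definition the reference software used.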
R in Clinical Research ► What do FDA-related people say?
66
Another argument
for validating R
R in Clinical Research ► What do FDA-related people say?
67
R in Clinical Research ► What do FDA-related people say?
68
R in Clinical Research ► What does it mean „to validate”? Why do we want this?
69
We have finally reached this point. Let us now try to answer the question in layman’s terms:
“To validate” means to ensure that R performs all calculations properly.
But to confirm this, we need to check dozens of components, packages and functions.
Remember:
The FDA doesn’t tell you exactly what should be validated (which functions). You decide.
The risk analysis and the validation coverage are entirely up to you.
It is our responsibility to do it WELL.
Why? Validation also protects you and lets you sleep well.
Try to think of it this way: once done properly, it gives you a reliable, powerful tool.
R in Clinical Research ► Preparing R to enter the industry
70
• We know the FDA allows us to use R in submissions
• We know what the FDA wants from us, and we have some advice on how to do it
• We have the source code of both R Core and every package
• Most packages refer to handbooks and point to specific formulas
• The R Core Team prepared a very important document on this topic
• R has tools for unit testing
• Reference data for testing are available on the Internet or can be obtained from other sources
• There are tools that allow the system maintainer to protect (“freeze”) the newly validated environment against changes.
What tools do we have and what is to be done?
R in Clinical Research ► Preparing R to enter the industry
71
• Validation is incremental. Once validated, a function doesn’t have to be re-validated until it is updated. Of course, we can validate it many times (which I recommend); automated tools make this easy.
• Only the functions actually used have to be tested: unused code is, in effect, non-existent code.
• Accumulating test cases over time significantly improves the validation process. Every new trial is a source of new, real data, perfect for testing.
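The incremental rule can be sketched as simple bookkeeping: record which version of each package was validated, and re-run validation only for packages that are actually used and either were never validated or have changed since. This Python sketch is hypothetical: the registry layout, package names and version strings are illustrative assumptions, not part of any R tool.

```python
from typing import Dict, List, Set


def needs_revalidation(
    validated: Dict[str, str],  # package -> version that passed validation
    installed: Dict[str, str],  # package -> version currently installed
    used: Set[str],             # packages the project actually calls
) -> List[str]:
    """Return used packages that were never validated or whose installed version changed."""
    return sorted(
        name
        for name in used
        if name not in validated or validated[name] != installed.get(name)
    )


# Hypothetical registry: 'survival' was updated after validation, 'lme4' was
# never validated, and 'MASS' changed too but is not used, so it is skipped.
validated = {"survival": "2.41-3", "nlme": "3.1-137", "MASS": "7.3-49"}
installed = {"survival": "2.42-3", "nlme": "3.1-137", "MASS": "7.3-50", "lme4": "1.1-17"}
used = {"survival", "nlme", "lme4"}

print(needs_revalidation(validated, installed, used))  # ['lme4', 'survival']
```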
And a bonus
R in Clinical Research ► Preparing R to enter the industry
72
The R-FDA.pdf document is a major milestone and a perfect starting point for establishing your own controlled, R-based environment.
For obvious reasons, it is limited to a small subset of packages, labelled “Base” and “Recommended”.
These packages don’t cover the complete set of statistical routines used in clinical research, but they will definitely allow one to start advanced analyses employing:
• linear mixed models (with a given covariance structure) and generalized additive models,
• survival analysis,
• access to data generated by external statistical packages,
• resampling (bootstrap),
• a large number of statistical tests,
• plotting (low-level, and quite advanced via the “lattice” package),
and much more.
R in Clinical Research ► Preparing R to enter the industry
73
https://www.r-project.org/doc/R-FDA.pdf
Validation ► Validation of installation vs. numerical validation
74
What aspects of an R-based computing environment can be validated?
• The installation of core R
• The installation of the required packages (in the required versions)
• The quality of the code in installed packages (code metrics)
• Coverage by the unit tests defined in installed packages
• The outcome of these unit tests
Thought #1: an incorrectly installed R or package will not work properly, or may not even launch. It is useless.
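Checking the installation of core R and of the required package versions boils down to comparing what is installed with an approved list. A minimal Python sketch, assuming the installed versions have already been collected (in R they could be read, for instance, from `installed.packages()`); the manifest format, package names and version numbers are illustrative assumptions.

```python
from typing import Dict, List


def installation_report(required: Dict[str, str], installed: Dict[str, str]) -> List[str]:
    """Compare installed versions with the approved manifest, one report line per item."""
    lines = []
    for name, wanted in sorted(required.items()):
        found = installed.get(name)
        if found is None:
            lines.append(f"{name}: MISSING (expected {wanted})")
        elif found != wanted:
            lines.append(f"{name}: WRONG VERSION (expected {wanted}, found {found})")
        else:
            lines.append(f"{name}: OK ({found})")
    return lines


# Hypothetical approved manifest vs. what was found on a workstation.
required = {"R": "3.5.0", "survival": "2.42-3", "nlme": "3.1-137"}
installed = {"R": "3.5.0", "survival": "2.41-3"}

for line in installation_report(required, installed):
    print(line)
# R: OK (3.5.0)
# nlme: MISSING (expected 3.1-137)
# survival: WRONG VERSION (expected 2.42-3, found 2.41-3)
```

Exact version matching is deliberately strict here: in a controlled environment, "newer" is not automatically "validated".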
Validation ► Validation of installation vs. numerical validation
75
What aspects of an R-based computing environment can be validated?
• The correctness of calculations performed by selected functions in selected packages.
Thought #2: a correctly installed R or package that nevertheless returns wrong results is worse than useless: it is extremely dangerous!
Well-done Validation = Validation of installation + Numerical validation
Validation ► Numerical validation ► Methods
76
How to validate a module numerically?
• By comparing results with reference data obtained from a trusted source (good!), such as:
o trial versions (if the license permits) of other statistical packages
o someone who holds a legal license and can run a given analysis on given data
o publicly available documentation with examples
• By comparing results with calculations done by hand, step by step (makes sense only for simple methods)
• By inspecting the code and comparing the implemented formula with the reference in the corresponding textbook (so-so, but it can reveal issues)
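The "by hand, step by step" approach can look like this for the sample variance: the textbook formula is written out explicitly and its result is compared with the routine under test. Python's `statistics.variance` stands in for the function being validated; the data are illustrative.

```python
import math
import statistics


def variance_by_hand(xs):
    """Sample variance computed step by step from the textbook formula."""
    n = len(xs)
    mean = sum(xs) / n                            # step 1: arithmetic mean
    squared_dev = [(x - mean) ** 2 for x in xs]   # step 2: squared deviations
    return sum(squared_dev) / (n - 1)             # step 3: divide by n - 1


data = [2, 4, 4, 4, 5, 5, 7, 9]

by_hand = variance_by_hand(data)         # 32 / 7
by_library = statistics.variance(data)   # function under test

print(by_hand, by_library)
print(math.isclose(by_hand, by_library, rel_tol=1e-12))  # True
```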
Validation ► Numerical validation ► Methods
77
How to validate a module numerically?
Comparisons have to be made with some tolerance, as two statistical packages are likely to differ slightly in their results for numerous reasons, such as:
• Different ways of storing floating-point numbers
• Different approaches to calculating quantiles
• Different rounding algorithms
• Different default contrasts
• Different types of sums of squares (e.g., Type I vs. Type III)
• Different random number generators (for the same seed)
• Different corrections applied to a method (different rules for choosing them)
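Two of these effects are easy to demonstrate, together with the kind of tolerance-based check they call for. This Python sketch shows that a floating-point sum need not be bit-identical to the expected decimal value, and that "round half to even" (the rule R's `round()` documentation says is expected, following IEC 60559) and "round half away from zero" give different answers for the same value. The `matches_reference` helper and its default tolerances are hypothetical; real tolerances should be justified per method in the validation plan.

```python
import math
from decimal import ROUND_HALF_EVEN, ROUND_HALF_UP, Decimal

# 1. Floating-point storage: the "same" number is not always bit-identical.
print(0.1 + 0.2 == 0.3)                           # False
print(math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9))  # True

# 2. Rounding rules: half-to-even vs. half-away-from-zero on the same value.
value = Decimal("2.5")
print(value.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))  # 2
print(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))    # 3


def matches_reference(result: float, reference: float,
                      rel_tol: float = 1e-6, abs_tol: float = 1e-12) -> bool:
    """Accept a result if it agrees with the reference within the stated tolerances."""
    return math.isclose(result, reference, rel_tol=rel_tol, abs_tol=abs_tol)
```

For example, `matches_reference(3.2500000001, 3.25)` accepts a difference in the tenth decimal place, while `matches_reference(3.26, 3.25)` rejects a genuine discrepancy.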
Validation ► Numerical validation ► Methods
78
How to validate a module numerically?
• The collected information:
o Statistical method name
o Values of relevant parameters
o Input data set provided to the reference software
o Outcome returned by the reference software
…can then be wrapped into so-called “unit tests” and stored in a repository. A unit-testing engine queries the repository, fetches the test definitions and passes them to the appropriate functions under test in a fully automated manner. Each tested function returns a result, which is compared with the reference. At the end, the engine generates a validation report.
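A minimal sketch of such a data-driven engine, written in Python for illustration (in R, packages such as testthat serve this purpose; their API is not reproduced here). Each repository entry holds the four items listed above; the engine looks up the function by method name, runs it on the stored input, compares the result with the reference within a tolerance, and produces one report line per case. The repository contents, method names and reference values are hypothetical; the values were worked out by hand.

```python
import math
import statistics
from typing import Callable, Dict, List, NamedTuple, Sequence


class ReferenceCase(NamedTuple):
    """One entry of the (hypothetical) reference repository."""
    method: str                # statistical method name
    params: Dict[str, object]  # values of relevant parameters
    data: Sequence[float]      # input data set given to the reference software
    expected: float            # outcome returned by the reference software


# Functions under test, looked up by method name (stand-ins for R functions).
FUNCTIONS: Dict[str, Callable[..., float]] = {
    "mean": statistics.mean,
    "median": statistics.median,
    "sd": statistics.stdev,
}

# Illustrative reference values worked out by hand, not taken from any real package.
REPOSITORY: List[ReferenceCase] = [
    ReferenceCase("mean", {}, [1, 2, 3, 4], 2.5),
    ReferenceCase("median", {}, [1, 3, 5, 7], 4.0),
    ReferenceCase("sd", {}, [2, 4, 4, 4, 5, 5, 7, 9], math.sqrt(32 / 7)),
]


def run_validation(repository: List[ReferenceCase], rel_tol: float = 1e-9) -> List[str]:
    """Run every reference case and return one report line per case."""
    report = []
    for case in repository:
        actual = FUNCTIONS[case.method](case.data, **case.params)
        status = "PASS" if math.isclose(actual, case.expected, rel_tol=rel_tol) else "FAIL"
        report.append(f"{status}  {case.method}: got {actual!r}, expected {case.expected!r}")
    return report


for line in run_validation(REPOSITORY):
    print(line)
```

Keeping the cases as data rather than code is what makes the accumulation of test cases from every new trial cheap: adding a case means adding a row, not writing a new test.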
Validation ► Validation of installation
79
https://www.londonr.org/wp-content/uploads/sites/2/presentations/LondonR_-_Challenges_Of_Validating_R_-_Chris_Campbell_-_20140617.pdf
Fixing the environment and controlling for changes
80
How to prevent the environment from being “invalidated”?
• Prevent users from updating core R
• Prevent users from installing “illegal” (not validated) packages:
o “foreign” packages (not in local use)
o packages in different versions
BUT!
• Each project may require a different set of packages, in different versions
• A given project may require installing new (not yet validated) packages
• New packages are created within the company
Fixing the environment and controlling for changes
81
How to prevent the environment from being “invalidated”?
• Docker containers (Rocker)
• Read-only environment on a CD or DVD (slow!)
• Portable version of R with a “broken” .libPaths()
• Isolation of the workstation from the Internet (so cruel!)
• Local repository of packages (in different versions): miniCRAN
• The checkpoint solution, based on MRAN
• The packrat solution, combined with miniCRAN
• A version control system, such as SVN or Git
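Whichever of these tools is chosen, it helps to be able to detect that a validated environment has silently changed. A minimal, tool-agnostic Python sketch, assuming the validated package library is a directory tree: record a SHA-256 checksum of every file when validation is signed off, then compare the current tree against that snapshot. The function names and directory layout are hypothetical; packrat, checkpoint and the other tools listed have their own mechanisms, which this does not reproduce.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def snapshot(root: Path) -> Dict[str, str]:
    """Map every file under `root` (as a relative POSIX path) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(validated: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """List files added, removed or modified since the validated snapshot was taken."""
    return {
        "added": sorted(current.keys() - validated.keys()),
        "removed": sorted(validated.keys() - current.keys()),
        "modified": sorted(
            name for name in validated.keys() & current.keys()
            if validated[name] != current[name]
        ),
    }


# Demo on a throw-away directory standing in for a hypothetical package library.
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp)
    (lib / "survival").mkdir()
    (lib / "survival" / "DESCRIPTION").write_text("Version: 2.42-3\n")
    baseline = snapshot(lib)  # recorded when validation is signed off

    # Later: the package is updated in place and an unvalidated package appears.
    (lib / "survival" / "DESCRIPTION").write_text("Version: 2.43-1\n")
    (lib / "newpkg").mkdir()
    (lib / "newpkg" / "DESCRIPTION").write_text("Version: 0.1\n")
    changes = diff_snapshots(baseline, snapshot(lib))

print(changes)
# {'added': ['newpkg/DESCRIPTION'], 'removed': [], 'modified': ['survival/DESCRIPTION']}
```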
Thank you
82
part II - soon!

Heart Disease Classification Report: A Data Analysis Project
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptx
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 

The use of R statistical package in controlled infrastructure

  • 1. The use of R statistical package in controlled infrastructure: The case of the Clinical Research industry. Adrian Olszewski, Senior Biostatistician at 2KMM. 22nd Jun 2018, Sosnowiec, Poland. www.2kmm.eu. Polish National Group of the International Society for Clinical Biostatistics, http://www.iscb.pl. 40 min. www.r-clinical-research.com, r.clin.res@gmail.com. PART I
  • 2. DISCLAIMER: All trademarks, logos of companies and names of products used in this document are the sole property of their respective owners and are included here for informational, illustrative purposes only, which falls within nominative fair use. This presentation is based exclusively on information publicly available on the Internet at the provided hyperlinks. If you believe your rights are violated, please email me: r.clin.res@gmail.com
  • 3. Agenda ► Quick introduction to R ◦ Description ◦ History. Events important for the use of R in EBM* ◦ Who uses R? (* Evidence-Based Medicine)
  • 4. Agenda ► R in Evidence-Based Medicine ◦ Capabilities ◦ A brief overview of common tasks ◦ Cooperation and compliance with SAS ◦ www.r-clinical-research.com or CRAN Task Views
  • 5. Agenda ► R in Clinical Research ◦ Status of R on the Clinical Research market ◦ Myths and Facts ◦ What does the FDA say? ◦ What does it mean “to validate”? Why do we want this? ◦ Preparing R to enter the industry
  • 6. Agenda ► Validation ◦ Validation of installation vs. numerical validation ◦ Numerical validation ◦ Methods ◦ Reference data ► Fixing the environment and controlling for changes ► How does R support the creation of a controlled environment?
  • 7. Agenda ► Conclusions ► Does it work? ► Q&A
  • 8. Quick introduction to R ► Description. R is an open-source software environment, widely used in the scientific world for statistical computing, data manipulation, data presentation and other general programming tasks (https://www.r-project.org). It’s also the name of a high-level, Turing-complete, interpreted, multi-paradigm programming language used within the environment.
  • 9. Quick introduction to R ► Description. Short characteristics: ► Description: computational environment + programming language ► Developer: R Development Core Team ► Operating systems: cross-platform (Windows, Unix, Linux, OS X); mobile (Android, Maemo, Raspbian) ► Form: command line + third-party IDEs and editors ► Infrastructure: R core library + shell + libraries (base and third-party) ► Model of work: 1) standalone application, 2) standalone server, 3) server process ► Programming language: Turing-complete, domain-specific, interpreted, high-level, with dynamic typing ► Paradigm: 1) array, 2) object-oriented (S3, S4, R5, R6 models), 3) imperative, 4) functional, 5) procedural, 6) reflective ► Source of libraries: mirrored repository (CRAN), users' sites, third-party repositories (GitHub, R-Forge) ► License of the core: GNU General Public License ver. 2 ► License of libraries: 99.9% open-source, 0.1% licensed (free for non-commercial use) ► Example syntax: model <- lm(y ~ x1 * x2)
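The formula in the example syntax above uses R's model-formula notation, in which `x1 * x2` expands to both main effects plus their interaction (`x1 + x2 + x1:x2`). A minimal, self-contained sketch on simulated data; the variable names and the "true" coefficients are illustrative assumptions, not taken from the slides:

```r
# Simulated (hypothetical) data: two predictors with an interaction effect
set.seed(123)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + 0.3 * x1 * x2 + rnorm(n, sd = 0.1)
d  <- data.frame(y, x1, x2)

# y ~ x1 * x2 is shorthand for y ~ x1 + x2 + x1:x2
model <- lm(y ~ x1 * x2, data = d)
print(coef(model))                # intercept, two main effects, x1:x2 interaction
print(summary(model)$r.squared)   # proportion of variance explained
```

Passing the data through `data =` rather than relying on variables in the workspace keeps the fit reproducible from the dataset alone.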
  • 10. Quick introduction to R ► Description: The basic GUI on Windows
  • 11. Quick introduction to R ► Description: Advanced IDE – RStudio
  • 12. Quick introduction to R ► Description: Advanced IDE – Microsoft Visual Studio
  • 13. Quick introduction to R ► History. 1976: S was born at Bell Laboratories (Rick Becker, Allan Wilks, John Chambers). 1980: first commercial release via AT&T. 1988: the New S Language; S-PLUS was born (Statistical Sciences, Inc.; R. Douglas Martin, University of Washington). 1993: R was born at the University of Auckland (Ross Ihaka, Robert Gentleman); exclusive license to develop and sell the S language. 1997: the R Core Team was formed; first release of CRAN. 1998: the last version of S; S becomes the first statistical system to receive the Software System Award, the top software award from the Association for Computing Machinery. 2000: R v 1.0.0. 2003: the R Foundation was formed. 2004: S code bought for $2 mln by Insightful Corporation from AT&T → Lucent. 2007: Revolution Analytics was born. 2008: Insightful Corporation (IC) acquired by TIBCO; Oracle R Enterprise. 2013: TERR (TIBCO Enterprise Runtime for R), TIBCO Spotfire. 2015: Revolution Analytics acquired by Microsoft; Microsoft R Open was born; the R Consortium was founded.
  • 14. Quick introduction to R ► History ► (A few) events important for the use of R in EBM. 1997: the first release of R; FDA 21 CFR Part 11; CRAN. 1998: nlme. 1999: FDA “Off-The-Shelf Software Use in Medical Devices”. 2000: xtable. 2001: DBI, survival. 2002: multcomp; FDA “General Principles of Software Validation – Final”; Bioconductor. 2003: lme4, nlmeODE; The R Core Team. 2005: drc (Dose-Response), PKfit, PK, ggplot2, ROCR. 2006: gsDesign, meta, mice, tdm, ivivc, blockrand, pwr. 2007: SASxport, Rtools; “Using R: Perspectives of a FDA Statistical RevieweR”; “R - Regulatory Compliance and Validation Issues”; “Use of R in C.T. & Industry-Sponsored Medical Res.”; “Op. Sour. Stat. Soft. in Pharma Developm.: A case study with R”; The R Foundation. 2008: MCPMod, bear, rjags, epiR, plyr, DanteR. 2009: SAS IML Studio supports R; SAS7bdat, metafor, gamm4. 2010: PKGraph, pROC, oro.nifti, oro.dicom, PowerTOST. 2011: RStudio, Detools, ggbio, RISmed, rplos. 2012: Shiny, knitr, Pmetrics, TrialSize, stargazer, OpenCPU; FDA: “Sponsors may use R in their submissions”. 2013: cpk; “The SAS® versus R Debate in Industry and Academia”. 2014: Tidyverse, ValidR, Checkpoint, Packrat, Rmarkdown, rclinicaltrials, pubmed.miner, ReporteRs, greport, dplyr; MRAN. 2015: rxODE, gfd, ThreeArmedTrials, randomizeR; FDA: “Statistical Software Clarifying Statement”; The R Consortium. 2016: R Tools for Visual Studio, rankFD; The R Epid. Cons. 2017: dfpk (Bayesian Dose-Finding Designs), officer. 2018: Mediana (general framework for CT simulations).
  • 15. Quick introduction to R ► Who uses R? Medicine and Pharmacy: 2KMM, Amgen, AstraZeneca, Bayer, CardioDX, Dr. Reddy’s Laboratories, FDA, GCE, KCR (2014-2017), Medtronic, Merck, Novartis, Pfizer, Roche. Other Business & Science Tycoons: American Express, Bank of America, BBC, Capgemini, Deloitte, eBay, Facebook, Fermi National Accelerator Laboratory, Ford, Goldman Sachs, Google, HP, IBM, J.P. Morgan, Kickstarter, Microsoft, Monsanto, Mozilla, New York Times, NIST (National Institute of Standards & Technology), NOAA, Oracle, Twitter, Uber, UK Government, Wells Fargo.
  • 16. Quick introduction to R ► Who uses R? The list is built exclusively on publicly available information: lists of users provided by Revolution, RStudio and others; articles (example, example) and interviews (example); published documents in which the name of a company is visible (example); job advertisements (LinkedIn, Google, PharmiWeb, etc.); names of companies supporting or organizing events (conferences, courses, etc.); other sources (example). That is to say, a company's logo is included in the list only if there is clear evidence, shared on the Internet and thus available to everyone, that the company uses or supports (or used or supported) R. Please note that I do not know whether all listed companies are still using any version of R at the time the presentation is viewed. If you want me to remove your logo, please send me an email at r.clin.res@gmail.com
  • 17. Quick introduction to R ► Who uses R? “We use R for adaptive designs frequently because it’s the fastest tool to explore designs that interest us. Off-the-shelf software, gives you off-the-shelf options. Those are a good first order approximation, but if you really want to nail down a design, R is going to be the fastest way to do that.” (Keaven Anderson, Executive Director, Late Stage Biostatistics, Merck) “De facto, R is already a significant component of Pfizer core technology. Access to a supported version of R will allow us to keep pace with the growing use of R in the organization, and provides a path forward to use of R in regulated applications.” (James A. Rogers Ph.D., Associate Director, Nonclinical Statistics Group, Pfizer) Publicly available sources: https://pharma-life-sciences.cioreview.com/news/gsdesign-explorer-to-optimize-merck-s-clinical-trial-process-nid-1305-cid-36.html; Google Books: Big Data for Big Pharma: An Accelerator for The Research and Development Engine?; https://www.featuredcustomers.com/vendor/revolution-analytics-1/customers/pfizer
  • 18. Quick introduction to R ► Who uses R? “We use R for all of our analysis,” says Elashoff. “I think it’s fair to say that R really is the foundation of a lot of the work that we do.” To speed up the process without sacrificing accuracy, the team also uses Revolution R analytic products. “We use R seven or eight hours per day, so any improvement in speed is helpful, particularly when you’re looking at a million biomarkers and wondering if you’ll need to re-run a million analyses.” Open-source R packages enable the biostatisticians at CardioDX to run a broad range of analyses, accurately and effectively, on a routine basis. Adding Revolution R products to the mix improves processing speeds and makes it easier to crunch large data sets. Accelerating the analytic process reduces overall project time, increasing the team’s efficiency. “Revolution R is faster than regular R,” says Elashoff. “The faster we can analyze data, the less time it takes us to build our diagnostic algorithms.” (Michael Elashoff, the company’s director of biostatistics, CardioDX) Publicly available source: https://www.featuredcustomers.com/media/CustomerCaseStudy.document/revolution-analytics-1_cardiodx_8284.pdf
  • 19. Quick introduction to R ► Who uses R? The same CardioDX case study (Michael Elashoff, the company’s director of biostatistics), as published in another publicly available source: https://www.businesswire.com/news/home/20110118006656/en/CardioDX-Revolution-Analytics-Develop-Non-Intrusive-Test-Predicting
  • 20. R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks: descriptive analysis & summarizing; errors-in-variables modeling; comparison of methods (Deming, Passing-Bablok, Bland-Altman); time-to-event / survival (Kaplan-Meier, Nelson-Aalen, Cox regression, Weibull); design of experiments (parallel, cross-over, adaptive, group-sequential, multi-arm); ROC analysis; categorical data analysis; planned & post-factum analysis; advanced plotting; sample size & power; meta-analysis; non-inferiority, superiority, (bio)equivalence; PK, PD, Dose-Response; randomization; repeated measures & longitudinal trials; parametric / non-parametric modeling; (non)parametric (non)linear models with mixed effects; resampling (bootstrap, permutation, exact); factorial design analysis (parametric / non-parametric); robust methods (regularized, M-estimators); detection of outliers (univariate / multivariate); missing data imputation (*OCF, kNN, LI, MI, censored (KM)).
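As an illustration of one item from the list above, method comparison, the Bland-Altman bias and 95% limits of agreement can be computed in a few lines of base R. This is a minimal sketch with hypothetical measurements; the helper name `bland_altman()` is illustrative, and dedicated CRAN packages add plots and confidence intervals for the limits, which are not shown here.

```r
# Hypothetical paired measurements of the same 5 samples by two methods
method_a <- c(10, 12, 14, 16, 18)
method_b <- c(11, 12, 13, 17, 19)

# Bland-Altman: bias = mean difference, limits = bias +/- z * SD(differences)
bland_altman <- function(a, b, level = 0.95) {
  stopifnot(length(a) == length(b))
  d    <- a - b
  z    <- qnorm(1 - (1 - level) / 2)   # about 1.96 for 95% limits
  bias <- mean(d)
  s    <- sd(d)
  c(bias = bias, lower = bias - z * s, upper = bias + z * s)
}

ba <- bland_altman(method_a, method_b)
print(round(ba, 3))
```

Many texts use the rounded constant 1.96; computing it with `qnorm()` keeps the code valid for other agreement levels.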
  • 21. R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks: logging processes (pure ASCII, HTML, PDF, DOC); cooperation with SAS; interoperability (.NET, Java, Scala, Python, C++, Fortran, PHP, Perl, DDE, COM, TCP, WebServices); accessing registers (clinicaltrials.gov, PubMed, PLOS); accessing databases (ODBC, JDBC, Oracle, MS SQL, MySQL, dBase, PostgreSQL, SQLite, DB/2, Informix, Firebird, H2, MongoDB, more…); reproducible research; producing documents (doc(x), ppt(x), pdf, rtf, odf, ps); exchanging data (Excel, OO Calc, GNumeric, SPSS, Weka, Systat, Stata, EpiInfo, SAS, SAS XPT, Minitab, Octave, Matlab, DBF, CSV, XML, HTML, JSON, DICOM, NIFTI); production tools and unit testing; GUI desktop & server applications; interactive presentations; advanced data querying & transforming.
  • 22. R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks: Descriptive stats, Data review
  • 23. R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks: Linear regression, ANOVA, post-hoc
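A minimal sketch of the linear regression, ANOVA and post-hoc workflow named on this slide, using the `PlantGrowth` dataset that ships with R (chosen here only for illustration) and Tukey's HSD as the post-hoc procedure; the slide itself does not say which post-hoc method was shown.

```r
data(PlantGrowth)  # plant weights under a control and two treatments

# Linear model with a single factor; anova() gives the F-test for 'group'
fit <- lm(weight ~ group, data = PlantGrowth)
print(anova(fit))

# Post-hoc pairwise comparisons: Tukey's Honest Significant Differences
# (TukeyHSD() expects an aov fit, so the same model is refitted with aov())
aov_fit <- aov(weight ~ group, data = PlantGrowth)
tk <- TukeyHSD(aov_fit, "group", conf.level = 0.95)
print(tk)
```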
  • 24. R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks: GLM modelling
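For GLM modelling, logistic regression is the typical clinical example. A minimal sketch on the built-in `mtcars` data (an arbitrary choice for illustration: modelling transmission type `am` by weight `wt`), reporting odds ratios with Wald confidence intervals:

```r
data(mtcars)

# Logistic regression: binomial family with the (default) logit link
fit <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)
print(summary(fit)$coefficients)

# Odds ratios with Wald 95% CIs (confint.default() uses the normal approximation)
or <- exp(cbind(OR = coef(fit), confint.default(fit)))
print(round(or, 4))
```

Profile-likelihood intervals (`confint(fit)`) are often preferred for small samples; the Wald version is used here only because it is the simplest.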
  • 25. R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks: NLM modelling
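For nonlinear models (NLM), base R's `nls()` fits models by nonlinear least squares. A minimal sketch fitting the Michaelis-Menten model to the treated group of the built-in `Puromycin` data; the starting values are rough guesses supplied for illustration:

```r
data(Puromycin)
treated <- Puromycin[Puromycin$state == "treated", ]

# Michaelis-Menten: rate = Vm * conc / (K + conc)
fit <- nls(rate ~ Vm * conc / (K + conc),
           data  = treated,
           start = list(Vm = 200, K = 0.05))

print(summary(fit)$coefficients)   # estimates, SEs, t and p values
print(fit$convInfo$isConv)         # TRUE if the algorithm converged
```

Self-starting models such as `SSmicmen()` avoid the need for manual starting values; explicit start values are used here to keep the model formula visible.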
  • 26. R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks
  • 27. R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks
  • 28. R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks
  • 29. R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks
  • 30. R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks
  • 31. R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks
  • 32. R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks
  • 33. R in Evidence-Based Medicine ► Capabilities ► A brief overview of common tasks
  • 34. R in Evidence-Based Medicine ► Capabilities ► Cooperation & compliance with SAS. SAS and R Team in Clinical Research (Adrian Olszewski). Differences in: origin of dates; default contrasts; types of sums of squares used; calculation of quantiles; generation of random numbers; implementation of advanced models; representation of floating point numbers. [Diagram: SAS module #1, SAS module #2, SAS IML and SAS base connected to R by bi-directional communication, used when a required algorithm or functionality is missing or expensive in SAS, or needs a different method of communication; example formula (kernel density estimator): $\hat{f}_h(x) = \frac{1}{n h^{d}} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$]
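Three of the differences listed above can be demonstrated directly in R. The sketch rests on commonly documented defaults rather than on the slide: SAS counts dates from 1960-01-01 while R's `Date` class counts from 1970-01-01; R's default treatment contrasts use the first factor level as reference, whereas SAS PROC GLM uses the last one (`contr.SAS()` mimics this); and R's default `quantile()` type is 7, while SAS's default percentile definition (PCTLDEF=5) is usually cited as matching `type = 2`. Treat that last mapping as an assumption to verify against your SAS procedure and version. The helper `sas_date_to_r()` is a hypothetical name.

```r
# 1) Origin of dates: R's Date counts days from 1970-01-01, SAS from 1960-01-01
sas_date_to_r <- function(sas_days) as.Date(sas_days, origin = "1960-01-01")  # hypothetical helper
print(sas_date_to_r(0))                    # "1960-01-01"
print(as.numeric(as.Date("1960-01-01")))   # -3653 days relative to R's origin

# 2) Default contrasts: R uses the FIRST level as reference, SAS PROC GLM the LAST
lev <- c("A", "B", "C")
print(contr.treatment(lev))   # row of zeros for "A" (reference)
print(contr.SAS(lev))         # row of zeros for "C" (reference, as in SAS)

# 3) Quantiles: R's default is type 7; type 2 is usually cited as SAS's PCTLDEF=5
x <- 1:8
print(quantile(x, 0.25))             # type 7: 2.75
print(quantile(x, 0.25, type = 2))   # type 2: 2.5
```

In practice, fixing `type =` and the contrasts explicitly in analysis code removes the dependence on defaults when R results are compared with SAS output.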
  • 35. Agenda ► R in Clinical Research ◦ Status of R on the Clinical Research market ◦ Myths and Facts ◦ What does the FDA say? ◦ What does it mean “to validate”? Why do we want this? ◦ Preparing R to enter the industry
  • 36. R in Clinical Research ► Status of R on the Clinical Research market ► In clinical research, however, SAS reigns par excellence. ► In general bioscience and academia, S, and later R, have built over the years a position as one of the industry standards. ► Pharmaceutical companies, CROs and even the FDA do use R “internally”, but they resist using it (or hesitate to use it) in submissions to the FDA. ► Clinical Programmer or Biostatistician ≝ SAS Programmer. Period. OK, but how did it come to this?
  • 37. R in Clinical Research ► Status of R on the Clinical Research market. We can only speculate why R users are so often told the mantra. Too many myths have accumulated, but we cannot ignore the facts.
  • 38. R in Clinical Research ► Myths and Facts. Facts: ► FDA requires software to be validated ► R is not validated out-of-the-box ► R doesn’t facilitate the creation of CDISC datasets ► R doesn’t have a metadata layer ► R doesn’t have a paid hot-line ► Nobody takes the responsibility if something goes wrong ► Packages change over time. What works today may not work tomorrow. Packages happen to be removed ► Validation of software is a challenging and time-consuming process, so not everyone can afford it. Myths / objections: ► FDA demands SAS for both the analysis and producing datasets. No other software is allowed. ► R cannot generate datasets in SAS Transport format ► R cannot cooperate with SAS, including reading and writing SAS binary files ► R cannot be validated as well as commercial software ► Commercial software doesn’t have errors ► R is full of bugs (errors) as nobody controls it ► R poorly supports the generation of TFLs
  • 39. R in Clinical Research ► Myths and Facts. Facts: ► Errors happen often in non-commercial software ► Announcing errors publicly doesn’t make people calm ► FDA: “Results should be reproducible and independent of the software used to derive them” ► Creators of R packages don’t have to provide (good) unit tests. It’s a matter of goodwill. Myths / objections: ► R is limited in terms of implemented statistical methods ► Nobody uses R (or Open Source in general) in the pharma industry (or in “serious business”). Maybe in academia, which is not a serious business. ► R doesn’t meet 21 CFR Part 11, which is a must ► R has no SAS-like “LOG”, which records everything ► There is no commercial support (product and/or validation) ► The entire software (all packages and functions) must be validated ► Commercial software releases the end-user from any responsibility regarding the validation.
  • 40. R in Clinical Research ► Myths and Facts. Facts vs. myths: who is right, and… is it possible to use R in a controlled environment?
  • 41. R in Clinical Research ► Myths and Facts. First, let us briefly address all points in the “table of shame”. Facts first. ► FDA requires software to be validated. Yes. This is a mandatory process. And that’s good! It protects not only the sponsor from serious trouble but also the patients! ► R is not validated out-of-the-box. Yes, the official release is not guaranteed to be error-free; a disclaimer confirming that is displayed every time R is launched. But validation is fully possible. ► R doesn’t facilitate the creation of CDISC datasets. True. There are no easy GUI tools to map fields between CDASH and SDTM, nor easy-to-use ways to generate define.xml. ► R doesn’t have a metadata layer. Partially true. R supports attributes on every level of a data structure, and with a little effort a metadata layer can be implemented effectively. I plan to release a package allowing datasets to be annotated and printed in line with the assigned formats. ► R doesn’t have a paid hot-line. True. This is not a commercial project. But the R community is vibrant and provides a huge amount of knowledge (Stack Overflow, GitHub).
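The remark about attributes can be made concrete. A minimal sketch, assuming a hypothetical ADSL-like data frame, that attaches a dataset label and variable labels as plain R attributes and collects them into a small metadata table. It also shows one limitation: subsetting an atomic vector with `[` drops such attributes, which is part of the “little effort” a robust metadata layer requires.

```r
# Hypothetical ADSL-like dataset
adsl <- data.frame(USUBJID = c("01-001", "01-002"),
                   AGE     = c(34L, 58L),
                   SEX     = c("F", "M"),
                   stringsAsFactors = FALSE)

# Dataset- and variable-level metadata stored as ordinary attributes
attr(adsl, "label")     <- "Subject-Level Analysis Dataset"
attr(adsl$AGE, "label") <- "Age at baseline (years)"
attr(adsl$SEX, "label") <- "Sex"

# Collect the variable labels (NA where a column has none)
var_labels <- vapply(adsl, function(col) {
  lbl <- attr(col, "label", exact = TRUE)
  if (is.null(lbl)) NA_character_ else lbl
}, character(1))
print(data.frame(variable = names(var_labels), label = unname(var_labels)))

# Caveat: extracting values with [ drops the label
print(attr(adsl[1, "AGE"], "label"))   # NULL
```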
  • 42. R in Clinical Research ► Myths and Facts. First, let us briefly address all points in the “table of shame”. Facts first. ► Nobody takes the responsibility if something goes wrong. True. This is not a commercial project. By the way, to what extent exactly do commercial companies take responsibility? Do you have the conditions (and $$$) written down on paper and signed? ► Packages change over time. What works today may not work tomorrow. Packages happen to be removed. Very true. This can be managed effectively in many ways, as addressed later in this presentation. ► Validation of software is a challenging and time-consuming process, so not everyone can afford it. Very true. Time is money. One has to analyze the profitability and then make a decision.
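One simple way to detect package changes, sketched here as an assumption rather than as the method presented later in the talk, is to store the versions of the packages that were validated and compare them with what is installed before a production run. The manifest and the helper `check_versions()` are hypothetical: `stats` is pinned to the running R version (base packages share it), while the `utils` entry is deliberately wrong to show a mismatch.

```r
# Hypothetical manifest of validated package versions
validated <- c(stats = as.character(getRversion()),  # base packages share R's version
               utils = "0.0.1")                       # deliberately wrong: shows a mismatch

check_versions <- function(manifest) {
  installed <- vapply(names(manifest),
                      function(p) as.character(utils::packageVersion(p)),
                      character(1))
  data.frame(package   = names(manifest),
             validated = unname(manifest),
             installed = unname(installed),
             match     = unname(installed == manifest),
             stringsAsFactors = FALSE)
}

report <- check_versions(validated)
print(report)
if (!all(report$match)) message("Installed packages differ from the validated manifest")
```

Tools such as packrat or checkpoint (both listed in the history slide) go further by pinning and restoring whole package libraries; this check only detects drift.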
  • 43. R in Clinical Research ► Myths and Facts. First, let us briefly address all points in the “table of shame”. Facts first. ► Errors happen often in non-commercial software. That is true. There are no “paid testers”, only volunteers. This does not mean at all that they perform any worse, but it also does not guarantee that they perform well. Errors happen in all software, including commercial software: even in top-quality medical devices (the FDA mentions this in its guidelines), nuclear devices (the Therac-25 medical accelerator case), power plants, space rockets, and even in a Mars rover or the Mariner 1 space probe. There is no error-free software; there is only software that has not been tested well enough. ► Announcing errors publicly doesn’t make people calm. Well, that is true. But hiding issues doesn’t make them less dangerous. “Transparency” is the most reliable way of cooperating with software users. Programmers or end-users publicly announce errors so the whole community can learn about them and react quickly. Nothing is hidden, all the more so as this is Open Source. How often are you informed about errors in your favorite software, with full details and the source code?
  • 44. R in Clinical Research ► Myths and Facts. First, let us briefly address all points in the “table of shame”. Facts first. ► FDA: “Results should be reproducible and independent of the software used to derive them”. That is true. Results may differ between statistical packages; only a little, but still. If the FDA uses SAS for checking, we may get into trouble with resampling methods: SAS and R use different random number generators, so even the same seed will not produce the same random samples. ► Creators of R packages don’t have to provide (good) unit tests. It’s a matter of goodwill. Yes. And even if forced to write tests, nobody can guarantee that the tests are defined properly and bring any advantage.
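Within R itself, resampling results are reproducible only if both the seed and the random number generator configuration are fixed; this does not make them comparable with SAS, which uses its own generators. A minimal sketch of a reproducible percentile bootstrap of the mean, assuming hypothetical data and R >= 3.6.0 (for the `sample.kind` argument); `boot_means()` is an illustrative helper name.

```r
x <- c(4.1, 5.3, 6.0, 4.8, 5.9, 7.2, 5.5, 6.3)  # hypothetical measurements

boot_means <- function(x, B, seed) {
  # Pin the seed AND the generator settings (these are R's defaults since 3.6.0);
  # note that set.seed() with 'kind' arguments changes the session's RNG settings
  set.seed(seed, kind = "Mersenne-Twister",
           normal.kind = "Inversion", sample.kind = "Rejection")
  replicate(B, mean(sample(x, replace = TRUE)))
}

b1 <- boot_means(x, B = 2000, seed = 2018)
b2 <- boot_means(x, B = 2000, seed = 2018)
print(identical(b1, b2))               # same seed + same RNG kind -> identical draws
print(quantile(b1, c(0.025, 0.975)))   # percentile 95% CI for the mean
print(RNGkind())                       # record the RNG configuration in the log
```

Recording `RNGkind()` alongside the seed matters because R changed its default `sample.kind` in version 3.6.0, so the same seed gives different samples across those versions.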
  • 45. R in Clinical Research ► Myths and Facts. Now myths. ► FDA demands SAS for both the analysis and producing datasets. No other software is allowed. No. The FDA has never claimed that. This myth is repeated so often that the FDA issued an official “Statistical Software Clarifying Statement”. ► R cannot generate datasets in SAS Transport format. False. R can generate XPT files using the SASxport package. The SAS Transport Format is an open format, published by SAS Institute long ago: 1. https://www.loc.gov/preservation/digital/formats/.... 2. http://documentation.sas.com.... ► R cannot cooperate with SAS, including reading and writing SAS binary files. False. R can be combined with SAS in many ways; check this out: https://www.quora.com/How-can-I-integrate-SAS-with-R. SAS enabled direct communication between R and SAS in the IML module in 2009. R can read SAS7 binary data files and both read and write XPT files. ► R cannot be validated as well as commercial software. False. R can be validated no worse. In fact, at least one company offers a validated version of R: Mango. ► Commercial software doesn’t have errors. The facts clearly refute this claim. ► R is full of bugs (errors) as nobody controls it. Errors happen in third-party packages, but there is no evidence of an increased rate of reported bugs. ► R poorly supports the generation of TFLs. False. There are packages for creating advanced graphs (ggplot2), Word documents, OpenDocument files, RTF and PDF. All tasks can be automated, since R is a programming language.
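The slide mentions SASxport; the sketch below instead uses the haven package's `write_xpt()` and `read_xpt()`, swapped in here as an assumption (haven must be installed from CRAN). It writes a hypothetical two-subject dataset as a version 5 transport file, the format expected in FDA submissions, and reads it back. For reading only, the `foreign` package that ships with R also provides `read.xport()`.

```r
library(haven)  # assumption: installed from CRAN

# Hypothetical analysis dataset
adsl <- data.frame(USUBJID = c("01-001", "01-002"),
                   AGE     = c(34, 58),
                   stringsAsFactors = FALSE)
attr(adsl$AGE, "label") <- "Age (years)"   # variable label stored as an attribute

path <- file.path(tempdir(), "adsl.xpt")
# version = 5 -> SAS Transport v5; v5 limits names to 8 characters,
# so the member (dataset) name is given explicitly
write_xpt(adsl, path, version = 5, name = "ADSL")

back <- read_xpt(path)   # round-trip check
print(back)
```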
  • 46. R in Clinical Research ► Myths and Facts  R is limited in terms of implemented statistical methods  We have just seen how rich is the R statistical library. This is the most complete library after SAS (plus few routines more)  Nobody uses R (or Open Source in general) in pharma industry (or in “serious business’). Maybe in academia, which is not a kind of a serious business.  False. We have just seen few slides ago, that pharmaceutical companies do use R.  Not to mention the non-clinical representatives of a “serious business”. Now myths.
  • 47. R in Clinical Research ► Myths and Facts  R doesn’t meet 21 CFR Part 11, which is a must  Let me quote this: Whoever told you that is not well-informed. CFR Part 11 has to do with critical software that runs medical devices and about certain primary data management software. It does not apply to statistical analysis software. We use R all the time in industry-sponsored and NIH sponsored clinical trials. You do not need to seek FDA's approval. FDA accepts all comers and does not dictate software policy for analysis. They even accept Excel and Minitab for NDAs. There are many messages related to this in the r-help archive; please look at them. Frank E Harrell Jr Professor and Chair School of Medicine, Department of Biostatistics Vanderbilt University Source  And this: “Records submitted to FDA, under predicate rules in electronic format [are Part 11 records]. However, a record that is not itself submitted, but is used in generating a submission, is not a part 11 record unless it is otherwise required to be maintained under a predicate rule and it is maintained in electronic format.” Therefore, it is not mandated that 21 CFR Part 11 is appropriate to data analysis software systems that are not primarily intended for storage and transmission of electronic medical records. It remains the responsibility of an individual organization however to define the applicability of Part 11 and validation to their systems. R: Regulatory Compliance and Validation, 11 March 25, 2018 Source  Formal confirmation: Statistical Software Clarifying Statement by FDA Source Now myths.
  • 48. R in Clinical Research ► Myths and Facts  R has no SAS-like “LOG”, which records everything  Yes, R doesn’t have a “LOG”, but with RMarkdown (or knitr, sweave, odfWeave) and following the Reproducible Research paradigm, the “LOG” can easily be reproduced effortlessly.  The generated HTML (or PDF, DOCx) document contains both the code and corresponding results combined.  In addition, employing a versioning system (SVN, Git) to store the “LOG” into a repository, allows the analyst to version it and track changes. This gives a high level of confidence.  Less sophisticated, yet fully valid method can be implemented with the “sink()” function.  There is no commercial support (product and/or validation)  False. Mango ValidR product is a good example. Revolution also offered paid support. This refers only to certain packages (mostly from the “base” set)  The entire software (all packages and functions) must be validated  No. It has to be done properly and sufficiently.  Practice shows, that only the used part of R code must be validated. If a package contains 1000 functions, while only two of them are used, only the two functions have to be validated. If a validated function X calls an unvalidated function Y, the results subjected to validation is still returned by the function X under given parameters and conditions. Now myths.
  • 49. R in Clinical Research ► Myths and Facts  Commercial software releases the end-user from any responsibility regarding the validation.  No. Let’s quote FDA: All production and/or quality system software, even if purchased off-the-shelf, should have documented requirements that fully define its intended use, and information against which testing results and other evidence can be compared, to show that the software is validated for its intended use Source: General Principles of Software Validation - Final Guidance for Industry and FDA Staff Now myths.
  • 50. R in Clinical Research ► What does the FDA say? Now, let us see what the FDA has said about: ► the use of any software in clinical research (this is the KEY); ► the process of validation of the software. Then let us look at what some FDA-related people say about R.
  • 51. R in Clinical Research ► What does FDA say? 51 The use of any software in clinical research ( + 21 CFR part 11 status) https://www.fda.gov/downloads/forindustry/datastandards/studydatastandards/ucm587506.pdf
  • 52. R in Clinical Research ► What does FDA say? The process of validation of the software https://www.fda.gov/downloads/medicaldevices/.../ucm085371.pdf […] FDA considers software validation to be: “confirmation by examination and provision of objective evidence that software specifications conform to user needs and intended uses, and that the particular requirements implemented through software can be consistently fulfilled.” Source: General Principles of Software Validation; Final Guidance for Industry and FDA Staff
  • 53. R in Clinical Research ► What does FDA say? This document […] can be applied to any software. […] This document does not specifically identify which software is or is not regulated […] The management and control of the software validation process should not be confused with any other validation requirements, such as process validation for an automated manufacturing process (so the regular validation of clinical programs doesn’t count) […] design input requirements must be documented, and that specified requirements must be verified […] Success in accurately and completely documenting software requirements is a crucial factor in successful validation of the resulting software.
  • 54. R in Clinical Research ► What does FDA say? A specification is defined as “a document that states requirements.” […] There are many different kinds of written specifications, e.g., system requirements specification, software requirements specification, software design specification, software test specification, software integration specification, etc. […] Software verification provides objective evidence that the design outputs of a particular phase of the software development life cycle meet all of the specified requirements for that phase. Software verification looks for consistency, completeness, and correctness of the software and its supporting documentation, as it is being developed, and provides support for a subsequent conclusion that software is validated.
  • 55. R in Clinical Research ► What does FDA say? Software validation is a part of the design validation for a finished device, but is not separately defined in the Quality System regulation. For purposes of this guidance, FDA considers software validation to be “confirmation by examination and provision of objective evidence that software specifications conform to user needs and intended uses, and that the particular requirements implemented through software can be consistently fulfilled.” Diagram: SOFTWARE VERIFICATION ≠ SOFTWARE VALIDATION. Questions shown: Is the production of R and all packages done well? Do the installation and the work succeed? Is mean( 1:3 ) == 2 ?
  • 56. R in Clinical Research ► What does FDA say? Software validation includes confirmation of conformance to all software specifications and confirmation that all software requirements are traceable to the system specifications. Diagram: requirements documentation and specification of the system (verification) + confirmation of the system (validation).
  • 57. R in Clinical Research ► What does FDA say? Because of its complexity, the development process for software should be even more tightly controlled than for hardware, in order to prevent problems that cannot be easily detected later in the development process. […] Seemingly insignificant changes in software code can create unexpected and very significant problems elsewhere in the software program. The software development process should be sufficiently well planned, controlled, and documented to detect and correct unexpected results from software changes.
  • 58. R in Clinical Research ► What does FDA say? SECTION 4. PRINCIPLES OF SOFTWARE VALIDATION 4.9. INDEPENDENCE OF REVIEW Validation activities should be conducted using the basic quality assurance precept of “independence of review.” Self-validation is extremely difficult. When possible, an independent evaluation is always better, especially for higher risk applications. Diagram: Validator vs. Builder.
  • 59. R in Clinical Research ► What does FDA say? The software requirements specification document should contain a written definition of the software functions. It is not possible to validate software without predetermined and documented software requirements. Typical software requirements specify the following:  All software system inputs  All software system outputs  All functions that the software system will perform  All performance requirements that the software will meet (e.g., data throughput, reliability, and timing)  The definition of all external and user interfaces, as well as any internal software-to-system interfaces  How users will interact with the system  What constitutes an error and how errors should be handled  Required response times  The intended operating environment for the software, if this is a design constraint (e.g. hardware platform, operating system)  All ranges, limits, defaults, and specific values that the software will accept  All safety related requirements, specifications, features, or functions that will be implemented in software
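A requirements list like this only becomes testable once each requirement is tied to objective evidence. A minimal sketch of such a traceability matrix in base R follows; the requirement IDs and their wording are hypothetical examples (not taken from any FDA document), and each one is linked to a single executable check, including one about "what constitutes an error".

```r
# Hypothetical requirements, each traced to one executable check.
requirements <- list(
  "REQ-01: mean() returns the arithmetic mean of a numeric vector" =
    function() isTRUE(all.equal(mean(c(2, 4, 9)), 5)),
  "REQ-02: mean() ignores missing values when na.rm = TRUE" =
    function() isTRUE(all.equal(mean(c(1, NA, 3), na.rm = TRUE), 2)),
  "REQ-03: t.test() signals an error for fewer than two observations" =
    function() inherits(tryCatch(t.test(1), error = function(e) e), "error")
)

# A check that itself fails with an error counts as "not passed".
run_check <- function(check) isTRUE(tryCatch(check(), error = function(e) FALSE))

trace_matrix <- data.frame(
  requirement = names(requirements),
  passed      = vapply(requirements, run_check, logical(1)),
  row.names   = NULL
)
print(trace_matrix)
```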
  • 60. R in Clinical Research ► What does FDA say? The vendor’s life cycle documentation, such as testing protocols and results, source code, design specification, and requirements specification, can be useful in establishing that the software has been validated. However, such documentation is frequently not available from commercial equipment vendors, or the vendor may refuse to share their proprietary information. Now let’s stop for a while and quickly summarize what we have already learned (commercial software vs. open-source software):  No source code vs. Source code provided  No proprietary technical information vs. Full documentation provided (if available)  Assurance “we did our best” vs. Assurance “we did our best”  No guarantee vs. No guarantee  Support vs. No hot-line, but a very active community  “…millions of people use that” vs. “…millions of people use that”  Full trust: it’s paid = validated well vs. Low trust: free things are poorly made
  • 61. R in Clinical Research ► What does FDA say? The process of validation of the software https://www.fda.gov/downloads/MedicalDevices/.../ucm073779.pdf Guidance for Industry, FDA Reviewers and Compliance on Off-The-Shelf Software Use in Medical Devices This is another essential document, a must-read. We are not going to analyze it thoroughly, but it is strongly recommended that you familiarize yourself with it.
  • 62. R in Clinical Research ► What does FDA say? Introduction to the controlled environment https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm070266.pdf Guidance for Industry: Computerized Systems Used in Clinical Investigations. Key elements: SOPs, dependability, system documentation, system controls, change control documentation, training of personnel.
  • 63. R in Clinical Research ► What do FDA-related people say? http://user2007.org/program/presentations/soukup.pdf
  • 64. R in Clinical Research ► What do FDA-related people say? Hey! Do read it again! (see the next slide)
  • 65. R in Clinical Research ► What do FDA-related people say? 1. Use of R functions without proper validation is at the organization’s risk. It seems that nobody forces us to validate R. Is it up to us, then? We had better do it (see the next slide). 2. Results should be reproducible and independent of the software used to derive them. This is impossible “by definition”… SAS, R, Stata and SPSS may return different results even for quantiles, or due to floating-point representation! The results should be as close to each other as possible, but what about resampling methods (SAS and R give different random numbers for the same seed)?
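Both effects can be demonstrated within R alone. quantile() implements nine sample-quantile definitions (its `type` argument, following Hyndman and Fan), and the same data give different answers depending on the one chosen; likewise, one seed does not imply one random stream, because the numbers also depend on the generator. The data and the seed are arbitrary, and which `type` corresponds to a given SAS, Stata or SPSS setting should be checked in each product's own documentation.

```r
x <- 1:10

# Same data, same probability, three of R's nine quantile definitions.
q <- sapply(c(type1 = 1, type6 = 6, type7 = 7),
            function(t) unname(quantile(x, probs = 0.25, type = t)))
print(q)   # type1 = 3, type6 = 2.75, type7 (R's default) = 3.25

# Same seed, different random number generators -> different streams.
set.seed(2018, kind = "Wichmann-Hill")
u_wh <- runif(3)
set.seed(2018, kind = "Mersenne-Twister")   # R's default generator, restored here
u_mt <- runif(3)
print(identical(u_wh, u_mt))
```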
  • 66. R in Clinical Research ► What do FDA-related people say? Another argument for validating R
  • 67. R in Clinical Research ► What do FDA-related people say?
  • 68. R in Clinical Research ► What do FDA-related people say?
  • 69. R in Clinical Research ► What does it mean „to validate”? Why do we want this? Finally, we have reached this point. Let us now try to answer this question in layman’s terms: “To validate” means to ensure that R does all the calculations properly. But to confirm this, we need to check dozens of components, packages and functions. Remember: the FDA doesn’t tell you exactly what should be validated (which functions). You decide. The risk analysis and the validation coverage are entirely up to you, and it is our responsibility to do them WELL. Why? Validation also protects you and lets you sleep well. Try to think of it this way: once done properly, it gives you a reliable, powerful tool.
  • 70. R in Clinical Research ► Preparing R to enter the industry  We know the FDA allows us to use R in submissions  We know what the FDA wants from us and have some advice on how to do it  The source code of both R Core and every package is available  Most packages refer to textbooks and point to specific formulas  The R Core Team prepared a very important document on this topic  R has tools for unit testing  Reference data for testing are available on the Internet or can be obtained  There are tools allowing the system maintainer to protect (“to freeze”) the newly validated environment against changes. What tools do we have and what is to be done?
  • 71. R in Clinical Research ► Preparing R to enter the industry And a bonus:  Validation is incremental. Once validated, a function doesn’t have to be re-validated until it is updated. Of course, it can be validated many times (which I recommend); this is easy with automated tools.  Only the functions that are used have to be tested. Unused code is, in effect, non-existent code.  Accumulating test cases over time significantly improves the validation process. Every new trial is a source of new, real data, perfect for testing.
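Since re-validation is only needed after an update, one practical aid is to record the package versions that passed validation and compare them with what is installed before each analysis. A minimal sketch, assuming a hypothetical manifest kept as a named character vector (in practice it might live in a file under version control); the stale `utils` entry is deliberate, to show how a change is detected.

```r
# Hypothetical manifest: package name -> version that passed validation.
validated <- c(
  stats = as.character(packageVersion("stats")),  # matches the installed version
  utils = "0.0.1"                                  # deliberately stale entry
)

# Return the packages whose installed version differs from the manifest
# (or which are not installed at all) and therefore need re-validation.
needs_revalidation <- function(manifest) {
  installed <- vapply(names(manifest), function(p) {
    if (!requireNamespace(p, quietly = TRUE)) return(NA_character_)
    as.character(packageVersion(p))
  }, character(1))
  changed <- is.na(installed) | installed != manifest
  names(manifest)[changed]
}

print(needs_revalidation(validated))
```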
  • 72. R in Clinical Research ► Preparing R to enter the industry The R-FDA.pdf document is a giant milestone. It is a perfect starting point for establishing your own controlled R-based environment. For obvious reasons it is limited to a small subset of packages, labelled “Base” and “Recommended”. These packages don’t cover the complete set of statistical routines used in clinical research, but they will definitely allow one to start advanced analyses employing: • linear mixed models (with a given covariance structure), generalized additive models, • survival analysis, • accessing data generated by external statistical packages, • resampling (bootstrap), • tons of statistical tests, • plotting (low-level and quite advanced via the “lattice” package) and much more.
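To see which installed packages belong to the “Base” and “Recommended” sets, base R's installed.packages() can filter on the Priority field. The sketch also flags packages a project intends to use that fall outside these two sets. The `project_pkgs` vector is a hypothetical example: ggplot2 is not part of either set, while survival is Recommended and is flagged only if the Recommended packages are missing from the installation.

```r
# Packages shipped with R ("base") and the "recommended" set.
core <- installed.packages(priority = c("base", "recommended"))
core_pkgs <- rownames(core)
print(table(core[, "Priority"]))

# Hypothetical list of packages one project intends to use.
project_pkgs <- c("stats", "survival", "ggplot2")

# Anything outside Base + Recommended is outside the scope of R-FDA.pdf
# and would need its own validation effort.
outside_scope <- setdiff(project_pkgs, core_pkgs)
print(outside_scope)
```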
  • 73. R in Clinical Research ► Preparing R to enter the industry https://www.r-project.org/doc/R-FDA.pdf
  • 74. Validation ► Validation of installation vs. numerical validation What aspects of an R-based computing environment can be validated?  The process of installation of the core R  The process of installation of required packages (versions)  The quality of code in installed packages (code metrics)  Coverage by unit tests defined in installed packages  The outcome of these unit tests Thought #1: an incorrectly installed R or package may not work properly, or may not even launch. It is useless.
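A very basic installation check can be scripted in base R: confirm the running R version and that each required package is present in at least the required version. The specification below is hypothetical, and `notInstalledPkg` is a deliberately absent name. For R itself, the tools package also offers testInstalledBasic() and testInstalledPackages(), which run R's own regression tests (they may require the test files to have been installed); they are not called here because they take a while to run.

```r
# Hypothetical installation specification: minimum R version and
# minimum versions of the packages a project depends on.
spec <- list(
  r_min    = "3.5.0",
  packages = c(stats = "3.0.0", utils = "3.0.0", notInstalledPkg = "1.0")
)

check_installation <- function(spec) {
  r_ok <- getRversion() >= spec$r_min
  pkg_ok <- vapply(names(spec$packages), function(p) {
    # FALSE if the package is missing or older than the required version
    requireNamespace(p, quietly = TRUE) &&
      packageVersion(p) >= spec$packages[[p]]
  }, logical(1))
  list(r_version = as.character(getRversion()), r_ok = r_ok, packages = pkg_ok)
}

result <- check_installation(spec)
str(result)
```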
  • 75. Validation ► Validation of installation vs. numerical validation What aspects of an R-based computing environment can be validated?  The correctness of calculations performed by selected functions in selected packages. Thought #2: a correctly installed R or package that nevertheless returns wrong results is worse than useless: it’s extremely dangerous! Well-done validation = Validation of installation + Numerical validation
  • 76. Validation ► Numerical validation ► Methods How to validate a module numerically?  By comparing results with reference data obtained from a trusted source (good!), e.g.: trial versions (if the licence permits) of other statistical packages; someone who has a legal licence running a certain analysis on given data; publicly available documentation with examples  By comparing results with calculations done by hand, step by step (makes sense only for simple methods)  By inspecting the code and comparing the implemented formula with the reference in the corresponding textbook (so-so, but it allows one to find issues)
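The step-by-step method works well for simple procedures. As an example, the sketch below recomputes the Welch two-sample t-test (the default of R's t.test(), `var.equal = FALSE`) from its textbook formulas and compares the statistic, the Welch–Satterthwaite degrees of freedom and the two-sided p-value with t.test(). The data are made up for illustration.

```r
# Illustrative data (made up).
x <- c(5.1, 4.9, 6.2, 5.8, 6.0)
y <- c(4.2, 4.8, 5.0, 4.4)

# Sample variance written out by hand: sum of squared deviations / (n - 1).
s2 <- function(v) sum((v - mean(v))^2) / (length(v) - 1)

# Welch t-test "by hand".
vx <- s2(x) / length(x)
vy <- s2(y) / length(y)
t_hand  <- (mean(x) - mean(y)) / sqrt(vx + vy)
df_hand <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
p_hand  <- 2 * pt(-abs(t_hand), df = df_hand)

# The same test computed by R.
tt <- t.test(x, y)   # var.equal = FALSE (Welch) is the default

comparison <- c(
  statistic = isTRUE(all.equal(t_hand,  unname(tt$statistic))),
  df        = isTRUE(all.equal(df_hand, unname(tt$parameter))),
  p.value   = isTRUE(all.equal(p_hand,  tt$p.value))
)
print(comparison)
```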
  • 77. Validation ► Numerical validation ► Methods How to validate a module numerically? Comparison has to be done with some tolerance, as two statistical packages are likely to differ slightly in their results for numerous reasons, such as:  Different ways of storing floating-point numbers  Different approaches to calculating quantiles  Different rounding algorithms  Different default contrasts  Different types of sums of squares  Different random number generators (for the same seed)  Different corrections applied to a method (different rules of choice)
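In R, such tolerance-based comparisons are usually written with all.equal() rather than ==. The sketch shows why exact equality is too strict even within a single package, and how a reference value reported with six decimal places (here 4.242641, standing in for 3·√2 as another program might print it) only matches once the tolerance reflects that rounding. The reference value and the tolerance of 1e-6 are illustrative assumptions, not recommended acceptance criteria.

```r
# Exact comparison of floating-point results is too strict.
print(0.1 + 0.2 == 0.3)                    # FALSE
print(isTRUE(all.equal(0.1 + 0.2, 0.3)))   # TRUE

# Hypothetical reference value reported with six decimal places.
reference <- 4.242641
computed  <- 3 * sqrt(2)                   # 4.242640687...

# all.equal() uses a relative tolerance (default about 1.5e-8);
# the rounding of the reference exceeds it, so the default check fails.
default_check <- isTRUE(all.equal(reference, computed))
relaxed_check <- isTRUE(all.equal(reference, computed, tolerance = 1e-6))
print(c(default = default_check, relaxed = relaxed_check))
```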
  • 78. Validation ► Numerical validation ► Methods How to validate a module numerically?  The obtained collection:  Statistical method name  Values of relevant parameters  Input data set provided to the reference software  Outcome returned by the reference software …can then be wrapped into so-called “unit test” code and stored in a repository. A unit-testing engine queries the repository, fetches the test definitions and passes them to the appropriate functions under test in a fully automated manner. Each tested function returns a result, which is compared to the reference. At the end, the engine generates a validation report.
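A compact base-R sketch of this idea: each test case stores the method name, its parameters together with the input data, an optional extractor, and the reference outcome; a small runner calls each function, compares the result with the reference within a tolerance and returns a report. The case definitions are hypothetical, and the reference values here are exact textbook results rather than output from another package. A real setup would more likely use a testing framework such as testthat or RUnit.

```r
# Hypothetical test cases: id, function name, arguments (incl. input data),
# an optional extractor for the value of interest, and the reference outcome.
test_cases <- list(
  list(id = "TC-001", fun = "mean",   args = list(x = c(1, 2, 3)),    expected = 2),
  list(id = "TC-002", fun = "var",    args = list(x = c(1, 2, 3, 4)), expected = 5 / 3),
  list(id = "TC-003", fun = "median", args = list(x = c(5, 1, 3)),    expected = 3),
  list(id = "TC-004", fun = "lm",
       args = list(formula = y ~ x,
                   data = data.frame(x = 1:5, y = c(3, 5, 7, 9, 11))),
       extract = function(fit) unname(coef(fit)["x"]),   # slope of y = 1 + 2x
       expected = 2)
)

run_validation <- function(cases, tolerance = 1e-8) {
  rows <- lapply(cases, function(tc) {
    result <- tryCatch(do.call(tc$fun, tc$args), error = function(e) e)
    if (inherits(result, "error")) {
      passed <- FALSE                  # a crash counts as a failed test
    } else {
      actual <- if (is.null(tc$extract)) result else tc$extract(result)
      passed <- isTRUE(all.equal(unname(actual), tc$expected, tolerance = tolerance))
    }
    data.frame(id = tc$id, fun = tc$fun, passed = passed)
  })
  do.call(rbind, rows)                 # the validation report
}

report <- run_validation(test_cases)
print(report)
```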
  • 79. Validation ► Validation of installation https://www.londonr.org/wp-content/uploads/sites/2/presentations/LondonR_-_Challenges_Of_Validating_R_-_Chris_Campbell_-_20140617.pdf
  • 80. Fixing the environment and controlling for changes How to prevent the environment from being “invalidated”?  Prevent users from updating the R core  Prevent users from installing “illegal” (not validated) packages:  “foreign” packages (not in local use)  packages in a different version BUT!  Each project may require a different set of packages in different versions  A certain project may require installation of new (not yet validated) packages  New packages are created within the company
  • 81. Fixing the environment and controlling for changes How to prevent the environment from being “invalidated”?  Docker containers (Rocker)  Read-only environment on a CD or DVD (slow!)  Portable version of R with “broken” .libPaths()  Isolation of the workstation from the Internet (so cruel!)  Local repository of packages (in different versions): miniCRAN  The checkpoint solution, based on MRAN  The packrat solution, combined with miniCRAN  Employing a version control system, like SVN or Git
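Whichever of these tools is chosen, a cheap additional safeguard is to check, at the end of each analysis, that only approved packages have been loaded, and to store sessionInfo() next to the results. The sketch assumes a hypothetical allow-list; it does not replace Docker, miniCRAN, checkpoint or packrat, it only detects a deviation after the fact.

```r
# Hypothetical allow-list of validated packages (namespaces).
approved <- c("base", "stats", "utils", "graphics", "grDevices",
              "methods", "datasets", "tools", "compiler", "grid")

# Namespaces currently loaded that are not on the allow-list.
unapproved_loaded <- function(allowed) setdiff(loadedNamespaces(), allowed)

violations <- unapproved_loaded(approved)
if (length(violations) > 0) {
  warning("Unapproved packages loaded: ", paste(violations, collapse = ", "))
}

# Keep a record of the exact environment alongside the results.
session_file <- tempfile(fileext = ".txt")
writeLines(utils::capture.output(utils::sessionInfo()), session_file)
```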

Editor’s notes

  1. Mention revo, ms, tibco, spotfire, Oracle, r consortium, S