Big Data & DS Analytics
for PAARL
Albert Anthony D. Gavino, MBA
Data Scientist / DS Evangelist
About the speaker: Albert Anthony D. Gavino
Project profile
Program Objectives / Program Goals
Participants will be able to relate Big Data and Data Science
applications to library services.
1. What is Big Data?
Extremely large data sets that may be analyzed to reveal
patterns, trends and associations
The BIG 3 V’s
•  Variety: different types of data
(Facebook, Twitter, CCTV feed)
•  Velocity: the speed at which data arrives
(batch, streaming every second)
•  Volume: the sheer size of the data
(1 GB, 1 TB, 1 PB, 1 ZB)
Library Data Resources
What resources does the library have (budget, staff, premises, media, opening
hours etc.) and how is the library performing against traditional parameters,
like lending figures, visitors and social media activity? This library data can
also be combined with environmental information like community education
levels, geographical distances, age and so on.
http://www.axiell.co.uk/getting-the-most-from-your-library-data/
DATA Analytics Challenges and Pitfalls
The challenges to creating a robust institutional data analytics program include
culture, talent, cost, and data. We have deliberately mentioned culture first
because it is very easy to jump to data challenges. In fact, most of the
literature surrounding data analytics starts with challenges surrounding the
data itself. However, we are convinced that institutional culture is the most
important factor in determining the success of any given data analytics
program, including the politics and process around questions of talent, cost,
and data itself.
Reference: The Journal of Academic Librarianship, Libraries and Institutional Data Libraries:
Challenges and Opportunities
63% of researchers and administrators expressed unhappiness with the use
of metrics in higher education (Abbott et al., 2010)
What about New Tasks like streamlining for the Librarian?
If librarians take on new tasks, it is very important to track the amount of
time and level of staff required when undertaking analytics projects.
For example, collecting citation data for a researcher with a common
name often requires manual and painstaking record-by-record
searching in order to disambiguate that individual's research from
others that share his/her name. This type of work requires a librarian
with a deep and intimate knowledge of the bibliometric databases that
are being used to harvest the bibliometric data.
Reference: The Journal of Academic Librarianship, Libraries and Institutional Data Libraries:
Challenges and Opportunities
What is the Cost?
•  Data analytics should be thought of as a strategic
investment, not a cost-saving technique
•  The real cost is the time spent on cultural change and on
developing and educating a staff with the analytical skills
that we need in our discipline
•  A visionary analytics plan invests in people, in hiring and
training, over data tools and platforms.
Pitfalls of Data Sharing:
Challenges on Institutional Data Analytics
Pitfalls | Possible Solution/s
Ownership: who owns the data? It could be the registrar, the library, or IT services. | An assigned office, e.g. the Office of the President or the Compliance Office, can release the official reports.
Quality: deciding when data is accurate, good, and reliable. | A Data Governance Unit assures the quality of data.
Standards: what kinds of data variables are in use (string, numeric). | This can be addressed by Data Management in data warehousing.
Access: who has access to the data. | User roles can define who has access.
Getting Started on Institutional Data
•  Creating an inventory of institutional data
•  Developing a data dictionary
•  Designing an unambiguous process for cleaning up those data
•  Creating an open data set that answers the most commonly asked
data questions across campus.
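The data dictionary and clean-up steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (patron_id, loans_per_year, college) and the rules attached to them are hypothetical and do not come from any real library system.

```python
# Hypothetical data dictionary: field names and rules are illustrative only.
DATA_DICTIONARY = {
    "patron_id": {"type": str, "description": "Library card number", "required": True},
    "loans_per_year": {"type": int, "description": "Items borrowed in a year", "required": True},
    "college": {"type": str, "description": "Home college or department", "required": False},
}


def clean_record(raw):
    """Return (clean, problems) for one raw record whose values are strings."""
    clean, problems = {}, []
    for field, rule in DATA_DICTIONARY.items():
        # Treat missing keys, None and whitespace-only values the same way.
        value = (raw.get(field) or "").strip()
        if not value:
            if rule["required"]:
                problems.append(f"{field}: missing")
            continue
        try:
            # Convert the text to the type the dictionary declares.
            clean[field] = rule["type"](value)
        except ValueError:
            problems.append(f"{field}: expected {rule['type'].__name__}, got {value!r}")
    return clean, problems


record, issues = clean_record({"patron_id": " A-1001 ", "loans_per_year": "12"})
bad_record, bad_issues = clean_record({"patron_id": "", "loans_per_year": "twelve"})
print(record, issues)
print(bad_record, bad_issues)
```

Keeping the rules in one dictionary means the same definitions can drive both the documentation and the cleaning step, which is one way to make the process unambiguous.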
Opportunities for Libraries on Big Data
•  Libraries know metadata
•  Libraries know strategy
•  Libraries know assessment
•  Libraries are neutral
•  Libraries know the vendors
•  Libraries are part of larger bodies like PAARL
•  Libraries have influence over campuses
•  Libraries know metrics
•  Libraries have user-centered culture
•  Libraries know the politics and policy issues with commercial parties
•  Libraries collaborate with both academic and academic support
2. Building a BIG DATA culture
•  Openness and acceptance to technology: Upper Management
•  Willingness to invest in the Big Data Platform: which entails cost
•  Training Staff and making sure of job security: Skills upgrade
•  Make data sharing acceptable: Trust in the data quality and people
•  Create Data Quality Assurance Team/s
•  Foster collaboration among departments
•  Continuous improvement of models
DATA Governance and DATA Management are different roles
Data governance is the designation of decision-rights and policy-making surrounding institutional data,
while data management is the implementation of those decisions and policies. Institutions need both,
and both require investment, but the senior leadership of our institutions need to design the former.
[Diagram labels: Data Governance Council; Data Management; policies; metrics; Data Quality Dept; Data Warehouse / Data Lake]
Machine Learning
A type of artificial intelligence that gives
computers the ability to learn without being
explicitly programmed.
Market Basket Analysis on Book Recommendations (Association Rule Algorithm)
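As a rough illustration of the association-rule idea behind book recommendations, the sketch below computes support, confidence and lift for pairs of titles that are borrowed together. It is a simplified pairwise count, not a full Apriori implementation, and the loan histories and the min_support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical loan histories: each set holds the titles one patron borrowed.
loans = [
    {"Dune", "Foundation", "Neuromancer"},
    {"Dune", "Foundation"},
    {"Dune", "Emma"},
    {"Emma", "Persuasion"},
    {"Dune", "Foundation", "Emma"},
]


def pair_rules(baskets, min_support=0.4):
    """Return (lhs, rhs, support, confidence, lift) for frequent title pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of patrons who borrowed both titles
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # > 1 means positive association
            rules.append((lhs, rhs, support, confidence, lift))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


rules = pair_rules(loans)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A rule such as "Foundation -> Dune" with high confidence and lift above 1 would suggest recommending the second title to patrons who borrow the first.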
Weather related information and reading a book (use of hash tags and location and weather data)
Pic from Marco Rasos
Social Listening: the process of monitoring digital conversations to
understand what customers are saying about a brand or service.
Online Research Journals and Click through Rates
Click-through Rate (CTR)
The ratio of users who click on a specific
link, ad or button to the total number of users
who view the page on which it appears.
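The CTR arithmetic can be written out as a short sketch. The journal names and the click and view counts below are made up for illustration.

```python
def click_through_rate(clicks, impressions):
    """Clicks divided by the number of times the link was shown."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


# Hypothetical figures: (clicks, impressions) for two journal links.
journal_links = {"Journal A": (45, 1500), "Journal B": (12, 200)}

ctr = {name: click_through_rate(c, i) for name, (c, i) in journal_links.items()}
for name, rate in ctr.items():
    print(f"{name}: {rate:.1%}")
```

Note that the link with fewer clicks in absolute terms can still have the higher rate, which is why the ratio is more useful than raw click counts when comparing resources.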
OpenCV (Open Source Computer Vision)
Modern Day Data Scientists
Dr. Reina Reyes, Astrophysicist
Andrew Ng of Baidu, Coursera
Amy Smith, Uber Singapore
Data Science Conference 2016
YOU as the next
Doctor Strange
(Entering the world of
Data Science)
Isaac Reyes, Data Scientist Talas Data Scientists
CRISP-DM Methodology
The project was led by five companies: SPSS, Teradata, Daimler AG,
NCR Corporation and OHRA, an insurance company.
CRISP-DM Tasks
From regular data to BIG data, from stat to AI
[Diagram: regular data to BIG data; statistical modeling, machine learning, deep learning / A.I.; traditional to modern]
Trends in Data Science Domains
Data Science Domain | Current Status
Natural Language Processing (NLP) | Entered the market
Predictive Analytics / Machine Learning | Entered the market
Visualization / Dashboards | Entered the market
Image Processing (OpenCV) | Exploration
Internet of Things (IoT) | Exploration
Artificial Intelligence | Exploration
DS/Big Data Applications to the field of Study
Sample Field of Study | Specific Applications
Agriculture | Climate forecast modeling to help farmers manage plantations (e.g. corn yields)
Medical field | Image processing for chest X-rays, retina images for diabetic patients
Linguistics | Natural Language Processing (NLP) for dialects and Sentiment Analysis applications
Economics/Finance | Predicting a stock price based on certain indicators (e.g. noise, competitor price)
Engineering | Internet of Things (IoT) application to Big Data
Building a Data Science Team
DS role | Data Scientist | Data Engineer / DevOps | Statistician | Viz Expert
Prog Language | R, Python, Spark ML | Hadoop, Spark Core, Spark stream | SAS, SPSS, R, Matlab | Tableau, Cognos, D3, Javascript
Sample output | Neural Nets, Random Forest | RDD, dataframes, SQLContext | Linear Regression, K-means clustering | Visualization, GIS maps
Data Science Team Composition
Trends on Programming Languages
[Chart labels: Scala, R, Python, Spark, RapidMiner, EMC, Java]
TOOLS: OPEN SOURCE vs PROPRIETARY SOFTWARE
 | OPEN SOURCE | PROPRIETARY SOFTWARE
Pros | No cost on software, packages are available faster | Easy to deploy
Cons | Takes some time to create and integrate with other software | Expensive software, you have to buy in modules
Tools | Python, R, Apache Spark | SAS, IBM-SPSS, AWS, Google
Small Data vs Big Data (in comparison)
Small data | Big data
Sampling can be done (e.g. a survey sample) | Uses all of the data in storage
No need for in-memory computing; can be run on a regular PC/Mac | Eats up memory and needs distributed computing
Statistical assumptions hold true: normality, homoskedasticity, independence | Significance measures such as p-values become unreliable because the data is so large: what is not significant in small sets becomes significant, so be careful when relying on these assumptions
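The caution about p-values can be illustrated with a small calculation. The sketch below applies a standard two-sided z-test to the same tiny effect at two sample sizes; the effect size (0.01), the standard deviation (1.0) and the sample sizes are arbitrary values chosen for the example.

```python
import math


def two_sided_p_value(mean_diff, std_dev, n):
    """Two-sided z-test p-value for a sample mean that differs from zero."""
    z = mean_diff / (std_dev / math.sqrt(n))
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical numbers: the same tiny effect measured on a small and a huge sample.
effect, spread = 0.01, 1.0
p_small = two_sided_p_value(effect, spread, 100)
p_big = two_sided_p_value(effect, spread, 10_000_000)

print(f"n=100:        p = {p_small:.3f}")
print(f"n=10,000,000: p = {p_big:.3g}")
```

With a hundred observations the effect is nowhere near significant, while with ten million it is overwhelmingly so, even though the effect itself has not changed. On very large data sets it is therefore usually more informative to look at the size of the effect than at the p-value alone.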
Simple DS Cheat sheet
•  Classifiers: Neural Nets, Random Forest
•  Clustering: K-means, Hierarchical Clustering
•  Association: Association Rules
•  Predicting: Linear Regression, Logistic Regression (binary), Cox Regression (survival)
•  Medical: SVM (cancer cells)
Visualization TOOLS
Color Hues and Functionality
Local Implications: Data Privacy Act (Republic Act No. 10173)
Sensitive personal information refers to personal information:
1. About an individual’s race, ethnic origin, marital status, age, color, and religious, philosophical or
political affiliations;
2. About an individual’s health, education, genetic or sexual life of a person, or to any proceeding for
any offense committed or alleged to have been committed by such individual, the disposal of such
proceedings, or the sentence of any court in such proceedings;
3. Issued by government agencies peculiar to an individual which includes, but is not limited to, social
security numbers, previous or current health records, licenses or its denials, suspension or revocation,
and tax returns; and
4. Specifically established by an executive order or an act of Congress to be kept classified.
Solutions to the Data Privacy Act: Policies
Make sure you have the following in place
•  Opt In for customers
•  Opt out for customers
•  Update your customer policy accordingly
•  Make your policy available publicly e.g. websites
References
•  www.coursera.org/learn/machine-learning
•  www.kaggle.com
•  www.crowdanalytix.com
•  www.talas.ph
•  www.facebook.com/analytics4pinoys
•  www.linkedin.com/albertgavino

 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 

Big Data for Library Services (2017)

  • 1. Big Data & DS Analytics for PAARL Albert Anthony D. Gavino, MBA Data Scientist / DS Evangelist
  • 2. About the speaker: Albert Anthony D. Gavino
  • 4. Program Objectives / Program Goals Participants to be able to relate Big Data and Data Science applications to Library services.
  • 5. 1. What is Big Data? Extremely large data sets that may be analyzed to reveal patterns, trends and associations
  • 6. The BIG 3 V’s •  Variety: the different types of data (Facebook, Twitter, CCTV feed) •  Velocity: the speed at which data comes in (batch, streaming every second) •  Volume: the size of the data (1 GB, 1 TB, 1 PB, 1 ZB)
  • 7. Library Data Resources What resources does the library have (budget, staff, premises, media, opening hours etc.) and how is the library performing against traditional parameters, like lending figures, visitors and social media activity? This library data can also be combined with environmental information like community education levels, geographical distances, age and so on. http://www.axiell.co.uk/getting-the-most-from-your-library-data/
  • 8. DATA Analytics Challenges and Pitfalls The challenges to creating a robust institutional data analytics program include culture, talent, cost, and data. We have deliberately mentioned culture first because it is very easy to jump to data challenges. In fact, most of the literature surrounding data analytics starts with challenges surrounding the data itself. However, we are convinced that institutional culture is the most important factor in determining the success of any given data analytics program, including the politics and process around questions of talent, cost, and data itself. Reference: The Journal of Academic Librarianship, Libraries and Institutional Data Libraries: Challenges and Opportunities 63% of researchers and administrators expressed unhappiness with the use of metrics in higher education (Abbott et al., 2010)
  • 9. What about New Tasks like streamlining for the Librarian? If librarians take on new tasks, it is very important to track the amount of time and level of staff required when undertaking analytics projects. For example, collecting citation data for a researcher with a common name often requires manual and painstaking record-by-record searching in order to disambiguate that individual's research from others that share his/her name. This type of work requires a librarian with a deep and intimate knowledge of the bibliometric databases that are being used to harvest the bibliometric data. Reference: The Journal of Academic Librarianship, Libraries and Institutional Data Libraries: Challenges and Opportunities
  • 10. What is the Cost? •  Data analytics should be thought of as a strategic investment, not a cost-saving technique. •  The real cost is the time spent on cultural change and on developing and educating a staff with the analytical skills that we need in our discipline. •  A visionary analytics plan invests in people, in hiring and training, over data tools and platforms.
  • 11. Pitfalls of Data Sharing: Challenges on Institutional Data Analytics (each pitfall with its possible solution)
      •  Ownership: who owns the data? It could be the registrar, the library or IT services. Possible solution: an assigned office, e.g. the Office of the President or the Compliance Office, can release the official reports.
      •  Quality: deciding when data is accurate, good and reliable. Possible solution: a Data Governance Unit assures the quality of the data.
      •  Standards: what kinds of data variables are in use (string, numeric). Possible solution: this can be addressed by Data Management through data warehousing.
      •  Access: who has access to the data. Possible solution: user roles can be defined to set who has access.
  • 12. Getting Started on Institutional Data •  Creating an inventory of institutional data •  Developing a data dictionary •  Designing an unambiguous process for cleaning up those data •  Creating an open data set that answers the most commonly asked data questions across campus
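A data dictionary can start as a simple per-field summary. The sketch below is only an illustration under stated assumptions: the records, the field names (`patron_id`, `college`, `loans`) and the helper `build_data_dictionary` are hypothetical and do not come from the slides. It lists, for each field, the value types observed and how many values are missing, which is often enough to expose the string-versus-numeric inconsistencies that a cleaning process has to resolve.

```python
def build_data_dictionary(records):
    """Summarise each field: the value types seen and the number of missing values."""
    dictionary = {}
    for row in records:
        for field, value in row.items():
            entry = dictionary.setdefault(field, {"types": set(), "missing": 0})
            if value is None or value == "":
                entry["missing"] += 1          # empty strings and None count as missing
            else:
                entry["types"].add(type(value).__name__)
    return dictionary


# Hypothetical patron records, as if merged from two campus systems.
records = [
    {"patron_id": 1001, "college": "Engineering", "loans": 12},
    {"patron_id": 1002, "college": "", "loans": "7"},      # loans stored as text here
    {"patron_id": 1003, "college": "Education", "loans": None},
]

data_dictionary = build_data_dictionary(records)
for field, info in data_dictionary.items():
    print(field, sorted(info["types"]), "missing:", info["missing"])
```

A field that reports more than one type (here `loans`) is a candidate for a cleaning rule before the data set is opened up across campus.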
  • 13. Opportunities for Libraries on Big Data •  Libraries know metadata •  Libraries know strategy •  Libraries know assessment •  Libraries are neutral •  Libraries know the vendors •  Libraries are part of larger bodies like PAARL •  Libraries have influence over campuses •  Libraries know metrics •  Libraries have user-centered culture •  Libraries know the politics and policy issues with commercial parties •  Libraries collaborate with both academic and academic support
  • 14. 2. Building a BIG DATA culture •  Openness to and acceptance of technology: Upper Management •  Willingness to invest in the Big Data Platform, which entails cost •  Training staff and assuring job security: Skills upgrade •  Make data sharing acceptable: Trust in the data quality and people •  Create Data Quality Assurance Team/s •  Foster collaboration among departments •  Continuous improvement of models
  • 15. DATA Governance and DATA Management are different roles Data governance is the designation of decision-rights and policy-making surrounding institutional data, while data management is the implementation of those decisions and policies. Institutions need both, and both require investment, but the senior leadership of our institutions needs to design the former. (Diagram labels: Data Governance Council; Data Management; policies; metrics; Data Quality Dept; Data Warehouse / Data Lake)
  • 16. Machine Learning: a type of artificial intelligence that gives computers the ability to learn without being explicitly programmed.
  • 17. Market Basket Analysis on Book Recommendations (Association Rule Algorithm)
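As a rough illustration of how association rules could drive book recommendations, the sketch below counts which titles are borrowed together and scores each candidate rule by support, confidence and lift. Everything specific here is an assumption: the loan records, the thresholds and the function name `association_rules` are hypothetical, and the code only handles pairs of titles rather than running a full Apriori search.

```python
from collections import Counter
from itertools import combinations

# Hypothetical loan records: each set holds the titles one patron borrowed together.
loans = [
    {"Python Basics", "Statistics 101", "Data Mining"},
    {"Python Basics", "Data Mining"},
    {"Statistics 101", "Data Mining"},
    {"Python Basics", "Statistics 101"},
    {"Python Basics", "Data Mining", "Cookbook"},
]


def association_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return pairwise rules (lhs, rhs, support, confidence, lift), best confidence first."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                      # share of loans containing both titles
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]        # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)   # above 1 means a positive association
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


for lhs, rhs, support, confidence, lift in association_rules(loans):
    print(f"{lhs} -> {rhs}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A rule such as "patrons who borrow X also borrow Y" is only worth surfacing as a recommendation when its lift is above 1; confidence alone can be high simply because Y is popular with everyone.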
  • 18. Weather related information and reading a book (use of hash tags and location and weather data) Pic from Marco Rasos
  • 19. Social Listening – is the process of monitoring digital conversations to understand what customers are saying about a brand or service.
  • 20. Online Research Journals and Click-through Rates Click-through rate (CTR): the ratio of users who click on a specific link to the total number of users who see the page, ad or button that carries it.
  • 21. OpenCV (Open Source Computer Vision Library)
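The click-through rate is a simple ratio, clicks divided by impressions. The sketch below assumes hypothetical journal names and counts, and the function name `click_through_rate` is made up for illustration.

```python
def click_through_rate(clicks, impressions):
    """CTR = clicks on a link divided by the number of times the link was shown."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


# Hypothetical (clicks, impressions) counts for links to online journals.
journal_links = {
    "Journal A": (120, 4000),
    "Journal B": (45, 900),
}

for name, (clicks, impressions) in journal_links.items():
    print(f"{name}: CTR = {click_through_rate(clicks, impressions):.1%}")
```

Comparing rates rather than raw clicks keeps a heavily displayed link from looking more successful than a rarely shown one.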
  • 22. Modern Day Data Scientists: Dr. Reina Reyes, Astrophysicist; Andrew Ng of Baidu, Coursera; Amy Smith, Uber Singapore Data Science Conference 2016; YOU as the next Doctor Strange (Entering the world of Data Science); Isaac Reyes, Data Scientist; Talas Data Scientists
  • 23. CRISP-DM Methodology The project was led by five companies: SPSS, Teradata, Daimler AG, NCR Corporation and OHRA, an insurance company
  • 25. From regular data to BIG data, from stat to AI (Diagram labels: Regular data to BIG data; Statistical modeling, Machine Learning, Deep Learning / A.I.; Traditional to Modern)
  • 26. Trends in Data Science Domains (Data Science Domain: Current Status)
      •  Natural Language Processing (NLP): Entered the market
      •  Predictive Analytics / Machine Learning: Entered the market
      •  Visualization / Dashboards: Entered the market
      •  Image Processing (OpenCV): Exploration
      •  Internet of Things (IoT): Exploration
      •  Artificial Intelligence: Exploration
  • 27. DS/Big Data Applications to the Field of Study (Sample Field of Study: Specific Applications)
      •  Agriculture: Climate forecast modeling to help farmers manage plantations (e.g. corn yields)
      •  Medical field: Image processing for chest X-rays and retina images for diabetic patients
      •  Linguistics: Natural Language Processing (NLP) for dialects and Sentiment Analysis applications
      •  Economics/Finance: Predicting a stock price based on certain indicators (e.g. noise, competitor price)
      •  Engineering: Internet of Things (IoT) application to Big Data
  • 28. Building a Data Science Team: Data Science Team Composition (DS role; Prog Language; Sample output)
      •  Data Scientist; R, Python, Spark ML; Neural Nets, Random Forest
      •  Data Engineer / Dev Ops; Hadoop, Spark Core, Spark stream; RDD, dataframes, SQLContext
      •  Statistician; SAS, SPSS, R, Matlab; Linear Regression, K-means clustering
      •  Viz Expert; Tableau, Cognos, D3, Javascript; visualization, GIS maps
  • 29. Trends on Programming Languages: Scala, R, Python, Spark, RapidMiner, EMC, Java
  • 30. TOOLS: OPEN SOURCE vs PROPRIETARY SOFTWARE
      •  Pros. Open source: no cost on software, packages are available faster. Proprietary: easy to deploy.
      •  Cons. Open source: takes some time to create and integrate with other software. Proprietary: expensive software, you have to buy modules.
      •  Tools. Open source: Python, R, Apache Spark. Proprietary: SAS, IBM-SPSS, AWS, Google.
  • 31. Small Data vs Big Data (in comparison)
      •  Sampling. Small data: works on a sample (e.g. a survey). Big data: uses all of the data in storage.
      •  Computing. Small data: no need for in-memory or distributed computing, can be run on a regular PC/Mac. Big data: eats up memory and needs distributed computing.
      •  Statistics. Small data: the usual statistical assumptions (normality, homoskedasticity, independence) are expected to hold. Big data: significance tests such as p-values become less informative because the data is so large; a difference that is not significant in a small set will become significant, so be careful when relying on these tests.
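The caution about p-values in very large data sets can be shown with a small calculation. The sketch below assumes a one-sample two-sided z-test with a known standard deviation; the effect size (0.02 standard deviations) and the sample sizes are hypothetical, and the function name `two_sided_p_value` is made up for illustration.

```python
import math


def two_sided_p_value(mean_diff, sd, n):
    """p-value of a two-sided z-test for a mean difference observed on n records."""
    z = mean_diff / (sd / math.sqrt(n))          # standard error shrinks as n grows
    return math.erfc(abs(z) / math.sqrt(2))      # equals 2 * (1 - Phi(|z|))


# The same tiny effect, tested on a survey-sized sample and on a big data set.
for n in (100, 10_000, 1_000_000):
    p = two_sided_p_value(mean_diff=0.02, sd=1.0, n=n)
    print(f"n={n:>9,}: p-value = {p:.3g}")
```

The effect is identical in every row; only the sample size changes. With enough records almost any difference clears the 0.05 threshold, so on big data the size of the effect is usually the more useful number to report.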
  • 32. Simple DS Cheat sheet
      •  Classifiers: Neural Nets, Random forest, SVM (Cancer Cells, Medical)
      •  Clustering: K-means, Hierarchical Clustering
      •  Association: Assoc Rules
      •  Predicting: Linear Regression, Logistic Regression (binary), Cox Regression (Survival)
  • 34. Color Hues and Functionality
  • 35. Local Implications: Data Privacy Act of 2012 (Republic Act No. 10173) Sensitive personal information refers to personal information:
      1. About an individual’s race, ethnic origin, marital status, age, color, and religious, philosophical or political affiliations;
      2. About an individual’s health, education, genetic or sexual life of a person, or to any proceeding for any offense committed or alleged to have been committed by such individual, the disposal of such proceedings, or the sentence of any court in such proceedings;
      3. Issued by government agencies peculiar to an individual which includes, but is not limited to, social security numbers, previous or current health records, licenses or its denials, suspension or revocation, and tax returns; and
      4. Specifically established by an executive order or an act of Congress to be kept classified.
  • 36. Solutions to the Data Privacy Act: Policies Make sure you have the following in place •  Opt in for customers •  Opt out for customers •  Update your customer policy accordingly •  Make your policy publicly available, e.g. on websites
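One way an opt-in / opt-out policy could show up in an analytics workflow is as a consent filter applied before any analysis. The sketch below is illustrative only and is not a statement of what the Act requires: the `Patron` class, its `opted_in` flag, the sample records and the helper `loans_for_analysis` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Patron:
    patron_id: int
    opted_in: bool  # True only while the patron's consent is on record


def loans_for_analysis(loans, patrons):
    """Keep only loan records that belong to patrons who have opted in."""
    return [
        (patron_id, title)
        for patron_id, title in loans
        if patron_id in patrons and patrons[patron_id].opted_in
    ]


# Hypothetical patrons and loan records.
patrons = {
    1: Patron(1, opted_in=True),
    2: Patron(2, opted_in=False),   # opted out, so excluded from analysis
    3: Patron(3, opted_in=True),
}
loans = [
    (1, "Data Mining"),
    (2, "Cookbook"),
    (3, "Statistics 101"),
    (2, "Python Basics"),
    (4, "Unknown Patron Book"),     # no consent record at all, also excluded
]

print(loans_for_analysis(loans, patrons))
```

Treating a missing consent record the same way as an opt-out keeps the default on the cautious side.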
  • 37. References •  www.coursera.org/learn/machine-learning •  www.kaggle.com •  www.crowdanalytix.com •  www.talas.ph •  www.facebook.com/analytics4pinoys •  www.linkedin.com/albertgavino