Big Data Analytics: Challenges and
What Computational Intelligence
Techniques May Offer
Ah-Hwee Tan
(http://www.ntu.edu.sg/home/asahtan)
School of Computer Engineering
Nanyang Technological University
Big Data Analytics Symposium
London, UK
13 September 2013
Outline
• Big Data Analytics
• Computational Intelligence Techniques
• Web Data Analytics
– Flexible Organizer for Competitive Intelligence (FOCI)
– Web Information Fusion and Associative Discovery
• Analytics for Active Living for Elderly
The Era of Big Data
Big data refers to a collection of data sets so large and complex
that they exceed the competence of commonly used
IT systems in terms of processing space and/or time.
Sources of Big Data
• Traditionally, mostly produced in scientific fields such as astronomy, meteorology, genomics, physics, biology, and environmental research.
• With the rapid development of IT and the consequent decrease in the cost of collecting and storing data, big data has been generated by almost every industry and sector as well as governmental departments, including retail, finance, banking, security, audit, electric power, and healthcare.
• Recently, big data over the Web (big Web data for short), which includes all the context data, such as user-generated contents, browser/search log data, deep web data, etc.
Examples of Big Data
(Source: Wikipedia)
• Walmart handles more than 1 million customer transactions every hour, which is imported into databases estimated to contain more than 2.5 petabytes (2560 terabytes) of data – the equivalent of 167 times the information contained in all the books in the US Library of Congress.
• Facebook handles 50 billion photos from its user base.
• FICO Falcon Credit Card Fraud Detection System protects
2.1 billion active accounts world-wide.
• Windermere Real Estate uses anonymous GPS signals from nearly 100 million drivers to help new home buyers determine their typical drive times to and from work throughout various times of the day.
Examples of Big Data
(Source: Wikipedia)
• NASA Center for Climate Simulation
(NCCS) stores 32 petabytes of
climate observations and simulations
on the Discover supercomputing
cluster.
• Utah Data Center is a data center currently being constructed by the United States National Security Agency. When finished, the facility will handle yottabytes of information collected by NSA over the Internet.

Value     Metric
1000      kB   kilobyte
1000^2    MB   megabyte
1000^3    GB   gigabyte
1000^4    TB   terabyte
1000^5    PB   petabyte
1000^6    EB   exabyte
1000^7    ZB   zettabyte
1000^8    YB   yottabyte
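
As a rough illustration of the unit table, the following Python sketch converts between byte units using the decimal (power-of-1000) prefixes listed. The helper names `to_bytes` and `convert` are hypothetical and introduced only for this example. It also shows that the "2560 terabytes" quoted for 2.5 petabytes corresponds to binary (1024-based) scaling, whereas the decimal prefixes in the table give 2500 terabytes.

```python
# Decimal (SI) byte prefixes from the table: each step is a factor of 1000.
PREFIXES = ["kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
EXPONENT = {symbol: i + 1 for i, symbol in enumerate(PREFIXES)}


def to_bytes(value: float, unit: str, base: int = 1000) -> float:
    """Convert `value` in `unit` to bytes; base=1024 gives binary scaling."""
    return value * base ** EXPONENT[unit]


def convert(value: float, src: str, dst: str, base: int = 1000) -> float:
    """Convert between two units of the table (hypothetical helper)."""
    return to_bytes(value, src, base) / to_bytes(1, dst, base)


if __name__ == "__main__":
    print(convert(2.5, "PB", "TB"))             # decimal prefixes
    print(convert(2.5, "PB", "TB", base=1024))  # binary scaling
```

Keeping the base as a parameter makes the decimal/binary distinction explicit instead of hiding it in a constant.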
Money of Big Data
(Source: Wikipedia)
• "Big data" has increased the demand for information management specialists.
• Software AG, Oracle Corporation, IBM, Microsoft, SAP, EMC, and HP have spent more than $15 billion on software firms specializing in data management and analytics.
• In 2010, this industry on its own was worth more than $100 billion and was growing at almost 10 percent a year: about twice as fast as the software business as a whole.
Market of Big Data
(Source: Wikipedia)
• Developed economies make increasing use of data-intensive technologies. There are 4.6 billion mobile-phone subscriptions worldwide and there are between 1 billion and 2 billion people accessing the internet.
• The world's effective capacity to exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000, 65 exabytes in 2007[14] and it is predicted that the amount of traffic flowing over the internet will reach 667 exabytes annually by 2013.[5]
Big Data Market Segments
(Report by Transparency Market Research)
• Segmentation of the big data market by components, by applications and by geography.
• The different components included are software and services, hardware and storage.
• The software and services segment dominates the components market, whereas the storage segment will be the fastest growing segment for the next 5 years owing to the perpetual growth in the data generated.
Big Data Market Segment by
Applications
• Eight applications are covered in the application segment, namely financial services, manufacturing, healthcare, telecommunication, government, retail, media & entertainment, and others.
• Financial services, healthcare and the government sector are the top three contributors to the big data market and together held more than 55% of the big data market in 2012.
• Media and Entertainment and the healthcare sectors will grow at a high CAGR of nearly 42% from 2012 to 2018. The growth in data in the form of video, images, and games is driving the media and entertainment segment.
Read more: http://www.digitaljournal.com/pr/1395146#ixzz2b0hvuxrQ
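
To make the growth figure concrete: a compound annual growth rate (CAGR) r sustained over n years multiplies the starting value by (1 + r)^n. The sketch below applies that standard formula to the 42% rate quoted above, assuming six compounding periods between 2012 and 2018. The function names are illustrative and do not come from the cited report.

```python
def growth_factor(cagr: float, years: int) -> float:
    """Total multiplier after compounding `cagr` for `years` periods."""
    return (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Inverse relation: the annual rate implied by start and end values."""
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    factor = growth_factor(0.42, 2018 - 2012)
    print(f"Segment size multiplier over 6 years: {factor:.2f}x")
    # Round trip: recover the annual rate from the multiplier.
    print(f"Implied CAGR: {implied_cagr(1.0, factor, 6):.2%}")
```

Under these assumptions a segment growing at 42% a year ends the period at roughly eight times its 2012 size.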
Challenges of Big Data
• Volume
– Size in the order of petabytes,
exabytes, …

• Velocity
– Time-sensitive data, data that grow exponentially or even at rates that overwhelm the well-known Moore's Law

• Variety
– From structured data to semi-structured and completely unstructured data of different types, such as text, image, audio, video, click streams, log files,
Deeper Issues of Big Data
(The additional 3Vs)
• Validity
– Is the data correct and accurate for the intended
usage?

• Veracity
– Are the results meaningful for the given problem
space?

• Volatility
– How long do you need to look/store this data?
Computational Intelligence

• Neural Networks (IJCNN)
– Brain-like mathematical models for pattern
recognition, memory, and association discovery
– Examples: Perceptron, BP, SVM, SOM, ART, …

• Fuzzy Systems (IEEE-FUZZ)
– Fuzzy operators for handling non-discrete reasoning
– Examples: FNN, Fuzzy C-Means, …
Computational Intelligence

• Evolutionary Computing (CEC)
– Classes of heuristic algorithms that repeatedly search for good solutions by mimicking the process of natural evolution
– Commonly used for optimization and search problems
– Examples: Genetic Algo, Memetic Algo,
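
The evolutionary search loop described above can be sketched as a small genetic algorithm. This is a minimal illustration, not a method from the talk: the toy objective (OneMax, counting 1-bits), the operators (tournament selection, single-point crossover, bit-flip mutation, elitism) and all parameter values are assumptions chosen for brevity.

```python
import random


def one_max(bits):
    """Fitness: number of 1s in the bit string (a standard toy objective)."""
    return sum(bits)


def tournament(population, rng, k=3):
    """Pick the fittest of k randomly chosen individuals."""
    return max(rng.sample(population, k), key=one_max)


def crossover(a, b, rng):
    """Single-point crossover of two parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rng, rate=0.02):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def genetic_algorithm(n_bits=30, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=one_max)
    for _ in range(generations):
        children = [best]  # elitism: carry the best individual forward
        while len(children) < pop_size:
            child = crossover(tournament(population, rng),
                              tournament(population, rng), rng)
            children.append(mutate(child, rng))
        population = children
        best = max(population, key=one_max)
    return best


if __name__ == "__main__":
    solution = genetic_algorithm()
    print(one_max(solution), "of 30 bits set")
```

Because the best individual is always copied into the next generation, the best fitness never decreases from one generation to the next.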
Flagship Events of
Computational Intelligence
• World Congress on Computational Intelligence
(Australia 2012, Beijing 2014)
• IEEE Symposium on Computational Intelligence
(Singapore 2013, Florida, USA 2014)
• IEEE Symposium on Computational Intelligence
in Big Data (IEEE CIBD'2014)
Examples of Use of CI in Big Data
• Data size and feature space adaptation
• Uncertainty modeling in learning from big data
• Distributed learning techniques in uncertain environment
• Uncertainty in cloud computing
• Distributed parallel computation
• Feature selection/extraction in big data
• Sample selection based on uncertainty
• Incremental Learning
• Manifold Learning on big data
• Uncertainty techniques in big data classification/clustering
• Imbalance learning on big data
• Active learning on big data
• Random weight networks on big data
• Transfer learning on big data
Self-Organizing Neural
Networks for
Personalized Web Intelligence

Towards Personalized Web Intelligence
Ah-Hwee Tan, Hwee-Leng Ong,
Hong Pan, Jamie Ng, Qiu-Xiang Li
Knowledge and Information Systems 18 (2004) 297-306
Workflow for Web Data Analytics
• Search
– Getting the information

• Organize (clustering/categorizing)
– Putting things in perspective

• Analyze (data mining)
– Discover hidden knowledge

• Share (knowledge management)
– Saving for reference and sharing

• Track
– Constant monitoring
Approaches to
Organizing/Analyzing
• Clustering
– Organizing information into groups based on similarity functions and thresholds
– e.g. BullsEye, NorthernLight, Vivisimo

• Categorization
– Organizing information into a “predefined” set of classes
– e.g. Yahoo!, Autonomy Knowledge Server

• Which is better?
Clustering
• Pros
– Unsupervised/self-organizing, requires no training or predefinition of classes
– Able to identify new themes

• Cons
– Users have no control
– Ever changing cluster structure
– Difficult to navigate and track
Categorization
• Pros
– Good control over classes
– Every piece of information is assigned to one or more classes of interest

• Cons
– Requires learning (supervised) and/or definition of classification rules/knowledge
– Every piece of information has to be assigned to one or more classes
– Good control but lacks flexibility to handle new information
User-configurable Clustering
(Tan & Pan, PAKDD 2002)

• Information organization and content management
• Online incremental clustering + user-defined structure (preferences)
• Reduces to a clustering system if no user indication is given
• Allows personalization in a direct, intuitive, and interactive manner
• Control + flexibility
ARAM for Personalized
Information Management
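[Figure: ARAM architecture. Field F2 (information clusters) is connected to fields F1a and F1b, with activity vectors xa and xb; the inputs are the information vector A and the preference vector B.]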
Flexible Organizer for Competitive
Intelligence (FOCI)
• A platform for gathering, organizing,
tracking, analyzing, and sharing
competitive information
• Natural way of turning raw search results
into personalized CI portfolios
– Multilingual enabled
– with Multilingual Efficient Analyzer
– Domain localization (Technology)

• Patented and licensed to many companies
FOCI User Interface
FOCI Architecture
[Figure: FOCI architecture. Labelled components: Intranet/Internet, Content Gathering, Content Management, Content Analysis, Content Publishing, Domain-Specific Knowledge, User’s CI Portfolio, Visualization Front End.]
Personalized Content Management
• Portfolio created through Search
• Unsupervised clustering (ARAM Pattern Channel A)
• Loop
– Personalization by users (ARAM Pattern Channel B)
– Reorganization of clusters (ARAM Pattern Channel A&B)

• Saving of personalized portfolio
• Tracking of new information
Personalization Functions
• Marking/labeling (selected) clusters
– Personal interpretation

• Inserting Clusters
– Indicate preference on groupings

• Merging clusters
– Indicate preferences on similarities

• Splitting clusters
– Indicate preferences on differences

• ...
Information Clustering

• A portfolio created by a meta-search of 4 search engines with a query on “Text Mining”
A Personalized Portfolio
after <=19 personalization operations
(mainly labeling and creating clusters)
Organizing New Information
42 new documents from DirectHit, Netscape, and BusinessWire
[Figure: two panels showing the new documents organized without the personalized portfolio and based on the personalized portfolio.]
Summary
• A fusion neural network algorithm, called fusion ART, has been proposed for integrating clustering and categorization
• Has been applied to competitive intelligence on the web.
• Compared with existing works, fusion ART has advantages in
– Personalization: fusion ART performs analysis and organization of data based on user preferences
– Low time complexity: fusion ART performs real-time search and match of patterns, resulting in a linear time complexity
– Incremental clustering manner: fusion ART may adapt to a dynamic web multimedia data set by incrementally clustering new patterns based on the learnt cluster structure without referring to the old data.
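
The incremental behaviour summarized above can be illustrated with a single-channel Fuzzy ART, the kind of ART module that fusion ART extends to multiple pattern channels. This is a minimal sketch under stated assumptions, not the authors' fusion ART or ARAM implementation: the class name `FuzzyART`, the parameter defaults (`rho`, `alpha`, `beta`) and the toy data stream are all illustrative, and inputs are assumed to be scaled to [0, 1].

```python
class FuzzyART:
    """Minimal single-channel Fuzzy ART clusterer (illustrative sketch)."""

    def __init__(self, rho=0.75, alpha=0.01, beta=1.0):
        self.rho = rho      # vigilance: minimum match needed to join a cluster
        self.alpha = alpha  # choice parameter (small positive constant)
        self.beta = beta    # learning rate; 1.0 means fast learning
        self.weights = []   # one weight vector per cluster

    @staticmethod
    def _complement_code(x):
        # Inputs are assumed to lie in [0, 1].
        return list(x) + [1.0 - v for v in x]

    @staticmethod
    def _fuzzy_and(a, b):
        return [min(p, q) for p, q in zip(a, b)]

    def learn(self, x):
        """Assign x to a cluster, creating one if none matches; return its index."""
        i = self._complement_code(x)
        norm_i = sum(i)
        # Score every existing cluster with the choice function.
        scored = []
        for j, w in enumerate(self.weights):
            overlap = sum(self._fuzzy_and(i, w))
            scored.append((overlap / (self.alpha + sum(w)), overlap, j))
        # Try clusters in order of decreasing choice value.
        for _, overlap, j in sorted(scored, reverse=True):
            if overlap / norm_i >= self.rho:  # vigilance (resonance) test
                w = self.weights[j]
                anded = self._fuzzy_and(i, w)
                self.weights[j] = [self.beta * a + (1 - self.beta) * v
                                   for a, v in zip(anded, w)]
                return j
        self.weights.append(i)  # no resonance: commit a new cluster
        return len(self.weights) - 1


if __name__ == "__main__":
    art = FuzzyART(rho=0.75)
    stream = [[0.10, 0.10], [0.12, 0.09], [0.90, 0.85], [0.88, 0.90]]
    print([art.learn(x) for x in stream])  # → [0, 0, 1, 1]
```

Each pattern is compared only against the stored cluster weights, never against earlier patterns, which is why new data can be clustered without revisiting the old data and why the cost grows linearly with the number of patterns for a bounded number of clusters.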
Heterogeneous Data Co-clustering
for
Social Media Data
Theme Discovery and Mining

Lei Meng, Ah-Hwee Tan and Dong Xu
IEEE Transactions on Knowledge and Data Engineering, 2013
Introduction
• The popularity of social websites leads to a great increase in web multimedia documents
– Massive number – Billions of images and articles online
– Diversity – Diverse content and booming emerging topics
– Multi-modal descriptors – images, text, category, tags, comments
[Figure: example image with category "Birds" and keywords from surrounding text: Wild, bird, beach, tree, vacation, animal, mar, sunny, playa, nayarit, arena, ave, water, vacaciones, hollyday, pelicano.]
Introduction
• Clustering of web multimedia data is challenging
– Scalability to big data
– Difficulty in integrating multi-modal feature data
– Ambiguity in deciding the number of categories
– Rich but noisy meta-information – semantic gap of images, noisy tags
[Figure: two example images with their tags.
Birds: Wild, bird, beach, tree, vacation, animal, mar, sunny, playa, nayarit, arena, ave, water, vacaciones, hollyday, pelicano.
Beach: Ocean, blue, sea, summer, vacation, sun, man, beach, water, yellow, fun, sand, play, funny, adult, humor, lifestyle, sunny, resort.]
Problem Statement
We define the theme discovery of web multimedia data as a heterogeneous data co-clustering problem, which identifies the semantic categories of data patterns through the fusion and recognition of multiple types of features.
[Figure: the item "Apple" with multiple descriptions (category, tag, user description, surrounding text) and candidate themes: Fruits, Products, Movies, ……]
Proposed Approach
• A self-organizing neural network approach to Heterogeneous Data Co-clustering
– Based on Fusion Adaptive Resonance Theory (Fusion ART)
– Fuses an arbitrary number of feature modalities
– Adaptively tunes the weights for different feature modalities
– Two different learning functions, one for primary data (such as images and articles) and one for meta-information, to handle short and noisy text
– Incremental fast learning
– Does not need the number of clusters to be given
Experiments
• NUS-WIDE data set
– 36784 images of 18 categories
– Visual features: Grid color moment, Edge direction histogram, and
wavelet texture
– Textual features of surrounding text: 1142 words (7 words per image on average)

• 20 Newsgroups data set
– 12826 text documents of 10 categories
– Textual features of document content: over 60k words (800 words per document on average)
– Textual features of category: 3 labels per document on average
Experiments on NUS-WIDE Data Set
• Evaluation on weight adaptation across channels for visual and
textual features
– Performance Comparison with fixed weight values

• GHF-ART with the adaptively tuned weight values γ_SA achieves the best performance in 5 classes and in overall performance, and performs close to the best results obtained with fixed weight values
Experiments on NUS-WIDE Data Set
– Tracking of the change in weight values of γ_SA

• Textual features of surrounding text are assigned higher weights than visual features
• The value of γ_SA stabilizes in [0.7, 0.8] with the increase of patterns
• Big fluctuations may result from the generation of new clusters
Experiments on NUS-WIDE Data Set
• Clustering performance comparison with existing algorithms in terms of weighted average precision, cluster entropy (H_cluster), class entropy (H_class), purity and Rand index (RI)

• GHF-ART achieves the best performance in terms of all the evaluation measures
• With supervisory information, GHF-ART(SS) consistently obtains better performance
Experiments on NUS-WIDE Data Set
• Time complexity analysis

– GHF-ART and Fusion ART incur a very small increase in time cost
– For 23284 images, GHF-ART completes the clustering process in 10 seconds
Experiments on 20 Newsgroups Data Set
• Clustering performance comparison using document content and category information

– Both GHF-ART and GHF-ART(SS) outperform other algorithms in all the evaluation measures
– GHF-ART has a 5% gain over Fusion ART in terms of Average Precision, Purity and Rand Index.
– Compared with other unsupervised algorithms, GHF-ART achieves around 80% in Average Precision, Purity and Rand Index while other algorithms typically obtain less than 75%
Summary
• A heterogeneous data co-clustering algorithm, called GHF-ART, is proposed to discover the themes of web multimedia data via their rich but heterogeneous descriptors.
• Compared with existing works, GHF-ART has advantages in
– Strong noise immunity: a learning function for meta-information is proposed to handle noise
– Adaptive channel weighting: a well-defined weighting algorithm is proposed to identify the important feature modalities for a better fusion of multi-modal features in the overall similarity measure
– Low time complexity: GHF-ART performs real-time search and match of patterns, resulting in a linear time complexity for big data
– Incremental clustering manner: GHF-ART may adapt to a dynamic web multimedia data set by incrementally clustering new patterns based on the learnt cluster structure without referring to the old data.
Research Centre of Excellence in
Active LIving for the elderLY (LILY)

Aging in Place:
Opportunities and Challenges
Ah-Hwee Tan
(http://www.ntu.edu.sg/home/asahtan)
School of Computer Engineering
Nanyang Technological University

JOINT UBC-NTU RESEARCH CENTRE
Aging in Place
“the ability to live in one's own home and community safely, independently, and comfortably, regardless of age, income, or ability level” - Center for Disease Control, Dec 2011
Motivation
• Global aging population creates silver challenges
• Most adults would prefer to age in place
– 78 percent of adults between the ages of 50 and 64 report that they would prefer to stay in their current residence as they age
• Growing elderly population will be living independently in own homes
• Vital to transform future homes into intelligent human-centered environment for the elderly
• Golden opportunities for innovating assistive technologies for aging in place
A Basic Scenario of Tender Care for Aging-in-place
• Unobtrusive Sensing
• Social Signal Processing
• Context Aware Auto Tagging
• Social Cognitive Network

Unobtrusive sensing device detects: the elder keeps walking around at an irregular pace.
Social signal processing indicates: the elder has been silent for an unusually long time.
Cognitive analysis result: “Your mother may be feeling anxious now…”
“I need to call my mother now…”
Silver Challenges
Vision
To enable the elderly to maintain an active, healthy and engaging lifestyle in their own homes, supported by an age-friendly intelligent environment providing all-round comprehensive tender care
• Round-the-clock day-to-day health and wellness monitoring
• Cognitive support and recommendation of products and services
• Companionship and emotional support
• Support for maintaining/stimulating social interaction
Design Consideration and
Challenges
• How to perform unobtrusive monitoring?
- Mobile sensing, activity tracking
• How to provide all-around comprehensive care?
- Physical, cognitive, emotional, social, sustainability
• How to maintain ubiquitous access and interaction?
- Cross platform, multimedia, multimodal
• How to provide friendly, personal touch?
- Adaptive user modeling, mood detection
- Proactive, natural interaction
Approach and Methodology
To support active living of the elderly through an intelligent multi-agent environment with ubiquitous access, natural interface, and all-rounded comprehensive care
Key Technologies
• Unobtrusive sensing and social signal processing
• Activity pattern and user modeling
• Information and service recommendation
• Proactive stimulation and natural interaction
A Multi-Agent Collaborative
Care Environment
Isabel (Personal Nurse)
– Small talk
– Recommendations for healthcare products and services

Alfred (The Butler)
– Small talk
– User modeling
– Social and travel advisory

Frank (Robot Dog)
– Activity sensing
– Pattern modeling
Why Multi-Agent?
• Unobtrusive sensing and monitoring – agents of different characteristics and capabilities
• Ubiquitous access to information and services – agents in different platforms and locations
• Comprehensive tender care – agents with different domain knowledge and functions
• “Three’s a party” – more opportunities for cognitive stimulation and social interaction
Comprehensive Tender Care
• Physical Support – activity tracking, safety and wellness monitoring
• Cognitive Support – information and recommendation on (healthcare) products, services, skills and activities
• Emotional Support – mood detection, affective support, small talk
• Social Support – companionship and connection to family and friends (old and new) through SMS, emails, Facebook, etc.
Unobtrusive Sensing and
Ubiquitous Access to Services
Unobtrusive in-home real-time data collection and contextual social signal processing
- Essential to better understand and cater to the elderly’s needs.

• Sensing – bio sensing, motion sensors, wearable/mobile sensors for health monitoring and activity tracking
• Cross Platform – large screen interactive display, mobile handheld devices, physical robots
• Multimedia – text, audio, video
Adaptive User Modelling
• Identity and profile
• Interests and preferences
• Behaviour model: Time, space, activity
• Knowledge and skills
• Social network: Family and friends
Methods for Model Building
• Explicit: User specification
• Implicit: User actions, choices, conversation
Cognitive Support:
Product/Service Recommendation
• Domain knowledge: Healthcare, Travel, Cooking
• Delivery modes:
- Question & Answer
- Proactive recommendation
- Conversation
• Personal Touch: Personalized, context sensitive, small talks
Challenges in
Big Living Analytics
• Volume – huge amount of data through bio sensing, motion sensors, wearable/mobile sensors for health monitoring and activity tracking
• Velocity – 24x7 real-time sensing, sense making, decision making, service recommendation
• Variety – information integration and knowledge sharing from cross platform, multimedia unstructured data - text, audio, video, gestures
Thank you!
The basics of sentences session 3pptx.pptx
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 

Computational intelligence for big data analytics bda 2013

  • 1. Big Data Analytics: Challenges and What Computational Intelligence Techniques May Offer. Ah-Hwee Tan (http://www.ntu.edu.sg/home/asahtan), School of Computer Engineering, Nanyang Technological University. Big Data Analytics Symposium, London, UK, 13 September 2013
  • 2. Outline • Big Data Analytics • Computational Intelligence Techniques • Web Data Analytics – Flexible Organizer for Competitive Intelligence (FOCI) – Web Information Fusion and Associative Discovery • Analytics for Active Living for Elderly
  • 3. The Era of Big Data • Big data refers to collections of data sets so large and complex that they exceed the competence of commonly used IT systems in terms of processing space and/or time.
  • 4. Sources of Big Data • Traditionally, big data was mostly produced in scientific fields such as astronomy, meteorology, genomics, physics, biology, and environmental research. • With the rapid development of information technology and the consequent decrease in the cost of collecting and storing data, big data is now generated by almost every industry and sector as well as by government departments, including retail, finance, banking, security, audit, electric power, and healthcare. • More recently, big data has emerged over the Web (big Web data for short), which includes all the context data, such as user-generated content, browser/search log data, deep web data, etc.
  • 5. Examples of Big Data (Source: Wikipedia) • Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (2560 terabytes) of data – the equivalent of 167 times the information contained in all the books in the US Library of Congress. • Facebook handles 50 billion photos from its user base. • FICO Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts world-wide. • Windermere Real Estate uses anonymous GPS signals from nearly 100 million drivers to help new home buyers determine their typical drive times to and from work throughout various times of the day.
  • 6. Examples of Big Data (Source: Wikipedia) • NASA Center for Climate Simulation (NCCS) stores 32 petabytes of climate observations and simulations on the Discover supercomputing cluster. • Utah Data Center is a data center currently being constructed by the United States National Security Agency. When finished, the facility will handle yottabytes of information collected by NSA over the Internet. • Value / Metric: 1000 = kB (kilobyte); 1000^2 = MB (megabyte); 1000^3 = GB (gigabyte); 1000^4 = TB (terabyte); 1000^5 = PB (petabyte); 1000^6 = EB (exabyte); 1000^7 = ZB (zettabyte); 1000^8 = YB (yottabyte)
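The unit table on slide 6 uses decimal steps of 1000, while the "2.5 petabytes (2560 terabytes)" figure on slide 5 only works out with binary steps of 1024. A minimal sketch of the two readings follows; the function name `convert` and the `base` parameter are illustrative choices, not something taken from the slides.

```python
# Unit symbols in the order given by the table on slide 6.
UNITS = ["kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def convert(value, src, dst, base=1000):
    """Convert `value` from unit `src` to unit `dst`.

    base=1000 follows the decimal table on the slide;
    base=1024 is the binary reading (an assumption about how
    the 2560 TB figure was derived).
    """
    steps = UNITS.index(src) - UNITS.index(dst)
    return value * base ** steps


if __name__ == "__main__":
    print(convert(2.5, "PB", "TB"))             # decimal reading
    print(convert(2.5, "PB", "TB", base=1024))  # binary reading
```

Under the decimal table 2.5 PB is 2500 TB; the 2560 TB quoted on slide 5 corresponds to the 1024-based convention.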
  • 7. Money of Big Data (Source: Wikipedia) • "Big data" has increased the demand for information management specialists. • Software AG, Oracle Corporation, IBM, Microsoft, SAP, EMC, and HP have spent more than $15 billion on software firms specializing in data management and analytics. • In 2010, this industry on its own was worth more than $100 billion and was growing at almost 10 percent a year: about twice as fast as the software business as a whole.
  • 8. Market of Big Data (Source: Wikipedia) • Developed economies make increasing use of data-intensive technologies. There are 4.6 billion mobile-phone subscriptions worldwide and there are between 1 billion and 2 billion people accessing the internet. • The world's effective capacity to exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000, 65 exabytes in 2007[14] and it is predicted that the amount of traffic flowing over the internet will reach 667 exabytes annually by 2013.[5]
  • 9. Big Data Market Segments (Report by Transparency Market Research) • The report segments the big data market by components, by applications and by geography. • The components included are software and services, hardware, and storage. • The software and services segment dominates the components market, whereas the storage segment will be the fastest growing segment for the next 5 years owing to the perpetual growth in the data generated.
  • 10. Big Data Market Segment by Applications • The application segment covers eight applications: financial services, manufacturing, healthcare, telecommunication, government, retail, media and entertainment, and others. • Financial services, healthcare and the government sector are the top three contributors to the big data market and together held more than 55% of the market in 2012. • The media and entertainment and the healthcare sectors will grow at a high CAGR of nearly 42% from 2012 to 2018. The growth in data in the form of video, images, and games is driving the media and entertainment segment. Read more: http://www.digitaljournal.com/pr/1395146#ixzz2b0hvuxrQ
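To give a sense of scale for the CAGR quoted on slide 10, the sketch below compounds a 42% annual rate. Treating 2012 to 2018 as six annual periods is an assumption, and the function names are illustrative.

```python
def compound_growth(rate, years):
    """Total growth factor after `years` periods at a constant annual `rate`."""
    return (1.0 + rate) ** years


def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start and an end value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    factor = compound_growth(0.42, 6)  # assumes six annual periods
    print(f"growth factor: {factor:.2f}x")
    print(f"implied CAGR:  {cagr(1.0, factor, 6):.2%}")
```

Under that assumption, a segment growing at 42% a year ends the period at roughly eight times its starting size.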
  • 11. Challenges of Big Data • Volume – Size in the order of petabytes, exabytes, … • Velocity – Time-sensitive data, data that grow exponentially or even at rates that overwhelm the well-known Moore's Law • Variety – From structured data to semi-structured and completely unstructured data of different types, such as text, image, audio, video, click streams, log files • Value / Metric: 1000 = kB (kilobyte); 1000^2 = MB (megabyte); 1000^3 = GB (gigabyte); 1000^4 = TB (terabyte); 1000^5 = PB (petabyte); 1000^6 = EB (exabyte); 1000^7 = ZB (zettabyte); 1000^8 = YB (yottabyte)
  • 12. Deeper Issues of Big Data (The additional 3Vs) • Validity – Is the data correct and accurate for the intended usage? • Veracity – Are the results meaningful for the given problem space? • Volatility – How long do you need to look at or store this data?
  • 13. Computational Intelligence • Neural Networks (IJCNN) – Brain-like mathematical models for pattern recognition, memory, and association discovery – Examples: Perceptron, BP, SVM, SOM, ART, … • Fuzzy Systems (IEEE-FUZZ) – Fuzzy operators for handling non-discrete reasoning – Examples: FNN, Fuzzy C-Means, …
  • 14. Computational Intelligence • Evolutionary Computing (CEC) – Classes of heuristic algorithms that repeatedly search for good solutions by mimicking the process of natural evolution – Commonly used for optimization and search problems – Examples: Genetic Algo, Memetic Algo
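The evolutionary search loop described on slide 14 can be sketched as a small genetic algorithm. The toy "count the ones" fitness function, the parameter values, and all names below are assumptions chosen for illustration; the slides do not prescribe a particular variant.

```python
import random


def one_max(bits):
    """Toy fitness function (illustrative assumption): number of 1 bits."""
    return sum(bits)


def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Run a simple genetic algorithm and return (best individual, best-fitness history)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=one_max, reverse=True)
        history.append(one_max(pop[0]))
        next_pop = [pop[0][:]]  # elitism: carry the best individual over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: the fitter of three random individuals becomes a parent.
            p1 = max(rng.sample(pop, 3), key=one_max)
            p2 = max(rng.sample(pop, 3), key=one_max)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation applied independently to each gene.
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=one_max)
    history.append(one_max(best))
    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print("best fitness per generation:", history)
```

Because the best individual is always copied into the next generation, the recorded best fitness never decreases; selection, crossover, and mutation supply the variation that lets it improve.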
  • 15. Flagship Events of Computational Intelligence • World Congress on Computational Intelligence (Australia 2012, Beijing 2014) • IEEE Symposium on Computational Intelligence (Singapore 2013, Florida, USA 2014) • IEEE Symposium on Computational Intelligence in Big Data (IEEE CIBD'2014)
  • 16. Examples of Use of CI in Big Data • Data size and feature space adaptation • Uncertainty modeling in learning from big data • Distributed learning techniques in uncertain environment • Uncertainty in cloud computing • Distributed parallel computation • Feature selection/extraction in big data • Sample selection based on uncertainty • Incremental learning • Manifold learning on big data • Uncertainty techniques in big data classification/clustering • Imbalance learning on big data • Active learning on big data • Random weight networks on big data • Transfer learning on big data
  • 17. Self-Organizing Neural Networks for Personalized Web Intelligence • Towards Personalized Web Intelligence. Ah-Hwee Tan, Hwee-Leng Ong, Hong Pan, Jamie Ng, Qiu-Xiang Li. Knowledge and Information Systems 18 (2004) 297-306
  • 18. Workflow for Web Data Analytics • Search – Getting the information • Organize (clustering/categorizing) – Putting things in perspective • Analyze (data mining) – Discover hidden knowledge • Share (knowledge management) – Saving for reference and sharing • Track – Constant monitoring
  • 19. Approaches to Organizing/Analyzing • Clustering – Organizing information into groups based on similarity functions and thresholds – e.g. BullsEye, NorthernLight, Vivisimo • Categorization – Organizing information into a “predefined” set of classes – e.g. Yahoo!, Autonomy Knowledge Server • Which is better?
  • 20. Clustering • Pros – Unsupervised/self-organizing, requires no training or predefinition of classes – Able to identify new themes • Cons – Users have no control – Ever-changing cluster structure – Difficult to navigate and track
  • 21. Categorization • Pros – Good control over classes – Every piece of information is assigned to one or more classes of interest • Cons – Requires learning (supervised) and/or definition of classification rules/knowledge – Every piece of information has to be assigned to one or more classes – Good control but lacks the flexibility to handle new information
  • 22. User-configurable Clustering (Tan & Pan, PAKDD 2002) • Information organization and content management • Online incremental clustering + user-defined structure (preferences) • Reduces to a clustering system if no user indication is given • Allows personalization in a direct, intuitive, and interactive manner • Control + flexibility
  • 23. ARAM for Personalized Information Management [Figure: ARAM architecture. A category field F2 of information clusters is connected to two input fields, F1a receiving the information vector A and F1b receiving the preference vector B.]
  • 24. Flexible Organizer for Competitive Intelligence (FOCI) • A platform for gathering, organizing, tracking, analyzing, and sharing competitive information • Natural way of turning raw search results into personalized CI portfolios – Multilingual enabled – with Multilingual Efficient Analyzer – Domain localization (Technology) • Patented and licensed to many companies
  • 27. Personalized Content Management • Portfolio created through Search • Unsupervised clustering (ARAM Pattern Channel A) • Loop – Personalization by users (ARAM Pattern Channel B) – Reorganization of clusters (ARAM Pattern Channel A&B) • Saving of personalized portfolio • Tracking of new information
  • 28. Personalization Functions • Marking/labeling (selected) clusters – Personal interpretation • Inserting Clusters – Indicate preference on groupings • Merging clusters – Indicate preferences on similarities • Splitting clusters – Indicate preferences on differences • ...
  • 29. Information Clustering • A portfolio created by a meta-search of 4 search engines with a query on “Text Mining”
  • 30. A Personalized Portfolio after <=19 personalization operations (mainly labeling and creating clusters)
  • 31. Organizing New Information • 42 new documents from DirectHit, Netscape, and BusinessWire [Figure: two panels, "Without the Personalized Portfolio" and "Based on Personalized Portfolio"]
  • 32. Summary • A fusion neural network algorithm, called fusion ART, has been proposed for integrating clustering and categorization. • It has been applied to competitive intelligence on the web. • Compared with existing works, fusion ART has advantages in – Personalization: fusion ART performs analysis and organization of data based on user preferences – Low time complexity: fusion ART performs real-time search and match of patterns, resulting in a linear time complexity – Incremental clustering: fusion ART may adapt to a dynamic web multimedia data set by incrementally clustering new patterns based on the learnt cluster structure without referring to the old data.
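The incremental behaviour summarized on slide 32 (each new pattern is matched against the learnt clusters once, without revisiting old data) can be illustrated with a single-channel Fuzzy ART, a standard member of the Adaptive Resonance Theory family. This is a minimal sketch under that substitution, not the fusion ART or ARAM implementation behind the slides; the class name, parameter defaults, and sample data are illustrative assumptions.

```python
def complement_code(x):
    """Complement coding: [x, 1 - x] for features already scaled to [0, 1]."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a, b):
    """Element-wise minimum of two vectors."""
    return [min(u, v) for u, v in zip(a, b)]


class FuzzyART:
    def __init__(self, vigilance=0.75, alpha=0.01, beta=1.0):
        self.rho = vigilance  # minimum match required to join a cluster
        self.alpha = alpha    # choice parameter
        self.beta = beta      # learning rate (1.0 = fast learning)
        self.weights = []     # one weight vector per cluster

    def learn(self, x):
        """Present one pattern; return the index of the cluster it joins."""
        i = complement_code(x)
        norm_i = sum(i)

        def choice(j):
            # Choice function T_j = |I ^ w_j| / (alpha + |w_j|)
            w = self.weights[j]
            return sum(fuzzy_and(i, w)) / (self.alpha + sum(w))

        # Try existing clusters from highest to lowest choice value.
        for j in sorted(range(len(self.weights)), key=choice, reverse=True):
            overlap = fuzzy_and(i, self.weights[j])
            if sum(overlap) / norm_i >= self.rho:  # vigilance (match) test
                w = self.weights[j]
                self.weights[j] = [
                    self.beta * o + (1.0 - self.beta) * v
                    for o, v in zip(overlap, w)
                ]
                return j
        # No existing cluster matches well enough: create a new one.
        self.weights.append(i)
        return len(self.weights) - 1


if __name__ == "__main__":
    art = FuzzyART(vigilance=0.75)
    data = [[0.10, 0.10], [0.12, 0.09], [0.90, 0.90], [0.88, 0.92]]
    print([art.learn(x) for x in data])
```

Each pattern is compared only against the current cluster weights, so the cost per pattern depends on the number of clusters rather than on the amount of data already seen, and the number of clusters is determined by the vigilance threshold instead of being fixed in advance.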
  • 33. Heterogeneous Data Co-clustering for Social Media Data Theme Discovery and Mining. Lei Meng, Ah-Hwee Tan and Dong Xu. IEEE Transactions on Knowledge and Data Engineering, 2013
  • 34. Introduction • The popularity of social websites has led to a great increase in web multimedia documents – Massive number – Billions of images and articles online – Diversity – Diverse content and booming emerging topics – Multi-modal descriptors – images, text, category, tags, keywords, comments • Example image: Category: Birds; Surrounding text: Wild, bird, beach, tree, vacation, animal, mar, sunny, playa, nayarit, arena, ave, water, vacaciones, hollyday, pelicano.
  • 35. Introduction • Clustering of web multimedia data is challenging – Scalability to big data – Difficulty in integrating multi-modal feature data – Ambiguity in deciding the number of categories – Rich but noisy meta-information – semantic gap of images, noisy tags • Example (Birds): Wild, bird, beach, tree, vacation, animal, mar, sunny, playa, nayarit, arena, ave, water, vacaciones, hollyday, pelicano. • Example (Beach): Ocean, blue, sea, summer, vacation, sun, man, beach, water, yellow, fun, sand, play, funny, adult, humor, lifestyle, sunny, resort.
  • 36. Problem Statement • We define the theme discovery of web multimedia data as a heterogeneous data co-clustering problem, which identifies the semantic categories of data patterns through the fusion and recognition of multiple types of features. [Figure: multiple descriptions of "Apple" (category, tag, user description, surrounding text) spanning the categories Fruits, Products, and Movies]
  • 37. Proposed Approach • A self-organizing neural network approach to Heterogeneous Data Co-clustering – Based on Fusion Adaptive Resonance Theory (Fusion ART) – Fuses an arbitrary number of feature modalities – Adaptively tunes the weights for different feature modalities – Two different learning functions: one for primary data, such as images and articles, and one for meta-information, to handle short and noisy text – Incremental fast learning – Does not need to be given the number of clusters
  • 38. Experiments • NUS-WIDE data set – 36784 images of 18 categories – Visual features: grid color moment, edge direction histogram, and wavelet texture – Textual features of surrounding text: 1142 words (7 words per image on average) • 20 Newsgroups data set – 12826 text documents of 10 categories – Textual features of document content: over 60k words (800 words per document on average) – Textual features of category: 3 labels per document on average
  • 39. Experiments on NUS-WIDE Data Set • Evaluation of weight adaptation across channels for visual and textual features – Performance comparison with fixed weight values • GHF-ART with the adaptively tuned weight values γ_SA achieves the best performance in 5 classes and in overall performance, and comes close to the best results obtained with fixed weight values
  • 40. Experiments on NUS-WIDE Data Set – Tracking of the change in weight values of γ_SA • Textual features of surrounding text are assigned higher weights than visual features • The value of γ_SA stabilizes in [0.7, 0.8] as the number of patterns increases • Large fluctuations may result from the generation of new clusters
  • 41. Experiments on NUS-WIDE Data Set • Clustering performance comparison with existing algorithms in terms of weighted average precision, cluster entropy (H_cluster), class entropy (H_class), purity and Rand index (RI) • GHF-ART achieves the best performance in terms of all the evaluation measures • With supervisory information, GHF-ART(SS) consistently obtains better performance
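Two of the evaluation measures named on slide 41, purity and the Rand index, have simple textbook definitions that can be sketched as follows. The function names and sample labels are illustrative; the exact formulas used in the reported experiments (including weighted average precision and the two entropies, which are not covered here) may differ in detail.

```python
from collections import Counter
from itertools import combinations


def purity(labels_true, labels_pred):
    """Fraction of items that belong to the majority class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, []).append(t)
    majority = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority / len(labels_true)


def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which the clustering and the classes agree.

    A pair agrees if it is grouped together in both labelings or
    separated in both.
    """
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    truth = ["a", "a", "a", "b", "b", "b"]
    found = [0, 0, 1, 1, 1, 1]
    print(purity(truth, found), rand_index(truth, found))
```

Both measures equal 1.0 for a clustering that reproduces the classes exactly; the pairwise loop in `rand_index` is quadratic in the number of items, so it is suitable only for small illustrations.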
  • 42. Experiments on NUS-WIDE Data Set • Time complexity analysis – GHF-ART and Fusion ART incur a very small increase in time cost – For 23284 images, GHF-ART completes the clustering process in 10 seconds
  • 43. Experiments on 20 Newsgroups Data Set • Clustering performance comparison using document content and category information – Both GHF-ART and GHF-ART(SS) outperform other algorithms in all the evaluation measures – GHF-ART has a 5% gain over Fusion ART in terms of Average Precision, Purity and Rand Index – Compared with other unsupervised algorithms, GHF-ART achieves around 80% in Average Precision, Purity and Rand Index, while other algorithms typically obtain less than 75%
  • 44. Summary • A heterogeneous data co-clustering algorithm, called GHF-ART, is proposed to discover the themes of web multimedia data via their rich but heterogeneous descriptors. • Compared with existing works, GHF-ART has advantages in – Strong noise immunity: a learning function for meta-information is proposed to handle noise – Adaptive channel weighting: a well-defined weighting algorithm is proposed to identify the important feature modalities for a better fusion of multi-modal features in the overall similarity measure – Low time complexity: GHF-ART performs real-time search and match of patterns, resulting in a linear time complexity for big data – Incremental clustering: GHF-ART may adapt to a dynamic web multimedia data set by incrementally clustering new patterns based on the learnt cluster structure without referring to the old data.
  • 45. Research Centre of Excellence in Active LIving for the elderLY (LILY) • Aging in Place: Opportunities and Challenges. Ah-Hwee Tan (http://www.ntu.edu.sg/home/asahtan), School of Computer Engineering, Nanyang Technological University. JOINT UBC-NTU RESEARCH CENTRE
  • 46. Aging in Place • “the ability to live in one's own home and community safely, independently, and comfortably, regardless of age, income, or ability level” - Center for Disease Control, Dec 2011
  • 47. Motivation • Global aging population creates silver challenges • Most adults would prefer to age in place – 78 percent of adults between the ages of 50 and 64 report that they would prefer to stay in their current residence as they age • A growing elderly population will be living independently in their own homes • Vital to transform future homes into intelligent human-centered environments for the elderly • Golden opportunities for innovating assistive technologies for aging in place
  • 48. A Basic Scenario of Tender Care for Aging-in-place • Unobtrusive Sensing • Social Signal Processing • Context Aware Auto Tagging • Social Cognitive Network • Unobtrusive sensing device detects: the elder keeps walking around at an irregular pace. • Social signal processing indicates: the elder has been silent for an unusually long time. • Cognitive analysis result: “Your mother may be feeling anxious now…” • “I need to call my mother now…”
  • 50. Vision • To enable the elderly to maintain an active, healthy and engaging lifestyle in their own homes, supported by an age-friendly intelligent environment providing all-round comprehensive tender care • Round-the-clock day-to-day health and wellness monitoring • Cognitive support and recommendation of products and services • Companionship and emotional support • Support for maintaining/stimulating social interaction
  • 51. Design Consideration and Challenges • How to perform unobtrusive monitoring? – Mobile sensing, activity tracking • How to provide all-around comprehensive care? – Physical, cognitive, emotional, social, sustainability • How to maintain ubiquitous access and interaction? – Cross platform, multimedia, multimodal • How to provide friendly, personal touch? – Adaptive user modeling, mood detection – Proactive, natural interaction
  • 52. Approach and Methodology • To support active living of the elderly through an intelligent multi-agent environment with ubiquitous access, natural interface, and all-rounded comprehensive care • Key Technologies – Unobtrusive sensing and social signal processing – Activity pattern and user modeling – Information and service recommendation – Proactive stimulation and natural interaction
  • 53. A Multi-Agent Collaborative Care Environment • Isabel (Personal Nurse) – Small talk – Recommendations for healthcare products and services • Alfred (The Butler) – Small talk – User modeling – Social and travel advisory • Frank (Robot Dog) – Activity sensing – Pattern modeling
  • 54. Why Multi-Agent? • Unobtrusive sensing and monitoring – agents of different characteristics and capabilities • Ubiquitous access to information and services – agents in different platforms and locations • Comprehensive tender care – agents with different domain knowledge and functions • “Three’s a party” – more opportunities for cognitive stimulation and social interaction
  • 55. Comprehensive Tender Care • Physical Support – activity tracking, safety and wellness monitoring • Cognitive Support – information and recommendation on (healthcare) products, services, skills and activities • Emotional Support – mood detection, affective support, small talk • Social Support – companionship and connection to family and friends (old and new) through SMS, email, Facebook, etc.
  • 56. Unobtrusive Sensing and Ubiquitous Access to Services • Unobtrusive in-home real-time data collection and contextual social signal processing – essential to better understand and cater to the elderly’s needs • Sensing – bio sensing, motion sensors, wearable/mobile sensors for health monitoring and activity tracking • Cross Platform – large screen interactive display, mobile handheld devices, physical robots • Multimedia – text, audio, video
  • 57. Adaptive User Modelling • Identity and profile • Interests and preferences • Behaviour model: time, space, activity • Knowledge and skills • Social network: family and friends • Methods for Model Building – Explicit: user specification – Implicit: user actions, choices, conversation
  • 58. Cognitive Support: Product/Service Recommendation • Domain knowledge: Healthcare, Travel, Cooking • Delivery modes: – Question & Answer – Proactive recommendation – Conversation • Personal Touch: personalized, context sensitive, small talks
  • 59. Challenges in Big Living Analytics • Volume – huge amount of data through bio sensing, motion sensors, wearable/mobile sensors for health monitoring and activity tracking • Velocity – 24x7 real-time sensing, sense making, decision making, service recommendation • Variety – information integration and knowledge sharing from cross platform, multimedia unstructured data: text, audio, video, gestures
  • 60. Research Centre of Excellence in Active LIving for the elderLY (LILY) • Thank you! JOINT UBC-NTU RESEARCH CENTRE