SlideShare una empresa de Scribd logo
1 de 26
Descargar para leer sin conexión
大数据与金融创新:从研究到实践
Assistant Professor
DS Lee Foundation Fellow
School of Information Systems
Singapore Management University
Dec. 11, 2015
朱飞达 Feida Zhu
Founding Director
Pinnacle Lab for Analytics
DBS-SMU Lab for Life Analytics
Singapore Management University
大数据与金融创新:从研究到实践
企业痛点
1. 经济下行,市场竞争压力大,金融行业已经不同满足于传统的被动
服务,而需要全方位了解 用户,把高效便利的金融服务渗透入“医,
食,住,行,玩,学”各个生活场景中,提升用户体验,把产品服务
嵌入用户生活各个侧面。
2. 企业内部数据对用户了解有局限,用户数据来源不足,且获取的
合法性,可持续性,隐私保护性值得担忧。
3. 达到用户的“最后一公里”渠道匮乏,传统营销手段日益疲软
(营销骚扰电话等),难以构建用户数据从收集,分析,到最后营
销的闭环。
金融创新的一个角度:生活即金融	
  
大数据与金融创新:从研究到实践
大数据的三大价值	
  
– Insight	
  from	
  scale	
  
• What	
  can	
  big	
  data	
  tell	
  us	
  that	
  small	
  data	
  cannot?	
  
– Knowledge	
  from	
  enrichment	
  
• What	
  important	
  knowledge	
  can	
  we	
  learn	
  from	
  enriching	
  small	
  
data	
  with	
  big	
  data?	
  
– Agility	
  from	
  real-­‐?me	
  responsiveness	
  
• What	
  are	
  the	
  values	
  of	
  being	
  real-­‐?me?	
  
	
  
VOLUME
VARIETY
VELOCITY
大数据与金融创新:从研究到实践
外部大数据到底能为企业提供什么价值?
企业内部数据
通常只以交易纪录为基础	
  
Transac?on-­‐based	
  
总量和覆盖有限	
  
Limited	
  coverage	
  	
  
只反映用户生活的局部和侧面	
  
Fragmented	
  par?al	
  perspec?ve	
  
静态,低频	
  
Sta?c,	
  low	
  frequency	
  
孤立单一的客户视图,只见人	
  
Isolated	
  view	
  of	
  individual	
  user	
  	
  
外部社交媒体大数据
能展现交易行为的上下文场景	
  
Context-­‐based	
  
海量的社会级覆盖	
  
Societal	
  scale	
  
提供用户的多角度全景式洞察	
  
Mul?-­‐facet	
  insight	
  
动态实时,高频	
  
Dynamic,	
  high	
  frequency	
  
能综合考虑丰富真实社交关系	
  
Network-­‐embedded	
  user	
  view	
  
大数据与金融创新:从研究到实践
“以人为本”的三个打通
跨外部数据平台用户身份归一
运用独有算法技术识别同一个用户在不同外部数据平
台上的不同账号(即便是使用不同用户名),把各个
平台的数据以自然人为单位整合到同一用户。目前我
们有超过1.5亿中国用户跨多个核心平台的数据。
内外部用户数据匹配
对企业内部客户和外部数据平台用户建立身份匹配
基于大数据的360度全方位动态客户视图
对企业内部客户提供全景式客户画像,动态追踪客户生活和潜在
需求,及时捕捉销售和服务最佳时机和方式。
1
2
3
用户兴趣画像 真实生活社交网络 产品倾向性模型
1
2
企业内部数据
3
大数据与金融创新:从研究到实践
应用案例-精准营销
§  海量数据
§  2亿潜在客户的巨大候选空间
§  应用场景:保险业务员每天需要联系大量
用户来推销各种保险,如何及时找到目标
客户以及最适合这个客户的保险产品?
§  精准目标:基于大数据挖掘,自然语言处理,网络结构分析的准确客户画像
§  及时推送:动态监听客户,定义最佳营销时间点,实时响应潜在需求
大数据与金融创新:从研究到实践
应用案例-精准营销
从1.5亿多潜在客户的海量数据巨大候选空
间中进行实时排名,选出排名最高50名
(1)精准目标 基于自然语言处理和文本挖掘,机器自动给用户文本打标签,如“孩子”
(3)及时推送 动
态监听客户,发现兴
趣的增长趋势,在最
佳营销时间点实时响
应潜在需求
我今天要联系
50个潜在客户
卖“少儿险”,
告诉我打哪些
电话?	
  
为什么是
这些人?	
  
你今天最应该
打这些人的电
话!	
  
(2)关系网络分析
基于线下人际关系
挖掘,她的周围人,
亲密好友也很关心
孩子
因为这些人最关
心孩子,而且最
近有增长趋势,
现在是最好时机!	
  
大数据与金融创新:从研究到实践
应用案例-关系营销和风控
§  应用场景:银行时刻在关心两类人:高风险客
户和高净值客户,如何利用客户之间的人际关
系顺藤摸瓜找到其他潜在相关客户?
§  海量线下人际关系网络
§  3亿人,60亿条人际关系边组
成的巨大关系网
豪车
高尔夫
游艇
赌
博
高价
值
§  线下人际关系:通过线下亲密关系
来顺藤摸瓜找到其他相关目标客户
§  精准客户画像:基于大数据挖掘,自然语言处理的准确客户画像
§  人工智能挖掘
§  从用户外部大数据中自动挖掘出用
户的线下真实人际关系网络
大数据与金融创新:从研究到实践
研究课题-线下关系挖掘
1
21
31
41
51
61
12345
6789745
7
2
874
5
4
353
87
9
5
789
8
9
8
789
8
98
Figure 1. Mutual Reachability.
1
211
311
411
511
611
711
811
1 611 2111 2611
12342563789
67
2344

82
4689
2342
3285
36
339
Figure 2. Friendship Retainability.
1
2
3
45
46
51
17281729
17298179
17981799
17998176
17681769
1769817

17
817
9
17
98173
12345
671
Figure 3. Community A
Problem:	
  	
  Given	
  a	
  TwiNer	
  follow	
  network	
  of	
  a	
  target	
  user,	
  iden?fy	
  the	
  user’s	
  offline	
  community	
  by	
  
examining	
  the	
  follow	
  linkage	
  alone.	
  	
  
Informa.on	
  should	
  be	
  able	
  to	
  flow	
  in	
  
both	
  direc.ons	
  within	
  a	
  small	
  distance	
  
between	
  real-­‐life	
  friends.	
  	
  	
  
Principle I:
Mutual Reachability
Principle II:
Friendship Retainability
1
21
31
41
51
61
12345
6789745
7
2
874
5
4
353

Más contenido relacionado

Destacado

Marketing-Methods,Marketing Automation & Marketing‘s Accountability
Marketing-Methods,Marketing Automation & Marketing‘s AccountabilityMarketing-Methods,Marketing Automation & Marketing‘s Accountability
Marketing-Methods,Marketing Automation & Marketing‘s AccountabilityStephanWo
 
El Fantasma De La óPera
El Fantasma De La óPeraEl Fantasma De La óPera
El Fantasma De La óPeraSilvia-Chajarí
 
greencore northampton
greencore northamptongreencore northampton
greencore northamptonMartin Ash
 
Kotler framework 5e_13_sppt
Kotler framework 5e_13_spptKotler framework 5e_13_sppt
Kotler framework 5e_13_spptNate Wildes
 
chamblee-welcome-packet-2016-FIX-072516-sm
chamblee-welcome-packet-2016-FIX-072516-smchamblee-welcome-packet-2016-FIX-072516-sm
chamblee-welcome-packet-2016-FIX-072516-smTeresa Taylor
 
Bentley Smith - Written Reference - Shell Autocare Pty Ltd
Bentley Smith - Written Reference - Shell Autocare Pty LtdBentley Smith - Written Reference - Shell Autocare Pty Ltd
Bentley Smith - Written Reference - Shell Autocare Pty LtdBentley Smith
 
MySQL Tech Tour 2015 - 5.7 InnoDB
MySQL Tech Tour 2015 - 5.7 InnoDBMySQL Tech Tour 2015 - 5.7 InnoDB
MySQL Tech Tour 2015 - 5.7 InnoDBMark Swarbrick
 
2 environmental issues in dyeing
2 environmental issues in dyeing2 environmental issues in dyeing
2 environmental issues in dyeingAdane Nega
 
Lezione 1 Qualità Non Alimentari
Lezione 1   Qualità Non AlimentariLezione 1   Qualità Non Alimentari
Lezione 1 Qualità Non AlimentariMara Celani
 
Las empresas de la industria de la moda sostenibilidad y rse en sus reportes ...
Las empresas de la industria de la moda sostenibilidad y rse en sus reportes ...Las empresas de la industria de la moda sostenibilidad y rse en sus reportes ...
Las empresas de la industria de la moda sostenibilidad y rse en sus reportes ...Susy Inés Bello Knoll
 
Pse02 - zipato siren user manual v1.2 English
Pse02 - zipato siren user manual v1.2 EnglishPse02 - zipato siren user manual v1.2 English
Pse02 - zipato siren user manual v1.2 EnglishDomotica daVinci
 
Sensative Door Window Sensor Strip Z-Wave Plus User Manual
Sensative Door Window Sensor Strip Z-Wave Plus User ManualSensative Door Window Sensor Strip Z-Wave Plus User Manual
Sensative Door Window Sensor Strip Z-Wave Plus User ManualDomotica daVinci
 
Zipato zipatile-data-sheet-1.2z
Zipato zipatile-data-sheet-1.2zZipato zipatile-data-sheet-1.2z
Zipato zipatile-data-sheet-1.2zDomotica daVinci
 

Destacado (20)

Social Media Expo
Social Media ExpoSocial Media Expo
Social Media Expo
 
Marketing-Methods,Marketing Automation & Marketing‘s Accountability
Marketing-Methods,Marketing Automation & Marketing‘s AccountabilityMarketing-Methods,Marketing Automation & Marketing‘s Accountability
Marketing-Methods,Marketing Automation & Marketing‘s Accountability
 
El Fantasma De La óPera
El Fantasma De La óPeraEl Fantasma De La óPera
El Fantasma De La óPera
 
greencore northampton
greencore northamptongreencore northampton
greencore northampton
 
Kotler framework 5e_13_sppt
Kotler framework 5e_13_spptKotler framework 5e_13_sppt
Kotler framework 5e_13_sppt
 
chamblee-welcome-packet-2016-FIX-072516-sm
chamblee-welcome-packet-2016-FIX-072516-smchamblee-welcome-packet-2016-FIX-072516-sm
chamblee-welcome-packet-2016-FIX-072516-sm
 
Bentley Smith - Written Reference - Shell Autocare Pty Ltd
Bentley Smith - Written Reference - Shell Autocare Pty LtdBentley Smith - Written Reference - Shell Autocare Pty Ltd
Bentley Smith - Written Reference - Shell Autocare Pty Ltd
 
MySQL Tech Tour 2015 - 5.7 InnoDB
MySQL Tech Tour 2015 - 5.7 InnoDBMySQL Tech Tour 2015 - 5.7 InnoDB
MySQL Tech Tour 2015 - 5.7 InnoDB
 
Verdi traviata
Verdi traviataVerdi traviata
Verdi traviata
 
Precesores independencia
Precesores independenciaPrecesores independencia
Precesores independencia
 
2 environmental issues in dyeing
2 environmental issues in dyeing2 environmental issues in dyeing
2 environmental issues in dyeing
 
Lezione 1 Qualità Non Alimentari
Lezione 1   Qualità Non AlimentariLezione 1   Qualità Non Alimentari
Lezione 1 Qualità Non Alimentari
 
Postura correcta
 Postura correcta Postura correcta
Postura correcta
 
Las empresas de la industria de la moda sostenibilidad y rse en sus reportes ...
Las empresas de la industria de la moda sostenibilidad y rse en sus reportes ...Las empresas de la industria de la moda sostenibilidad y rse en sus reportes ...
Las empresas de la industria de la moda sostenibilidad y rse en sus reportes ...
 
Lista de precios guatemala
Lista de precios  guatemalaLista de precios  guatemala
Lista de precios guatemala
 
INTRODUCTION TO IKM LUMUT
INTRODUCTION TO IKM LUMUTINTRODUCTION TO IKM LUMUT
INTRODUCTION TO IKM LUMUT
 
Dior - Marcas e patentes
Dior - Marcas e patentesDior - Marcas e patentes
Dior - Marcas e patentes
 
Pse02 - zipato siren user manual v1.2 English
Pse02 - zipato siren user manual v1.2 EnglishPse02 - zipato siren user manual v1.2 English
Pse02 - zipato siren user manual v1.2 English
 
Sensative Door Window Sensor Strip Z-Wave Plus User Manual
Sensative Door Window Sensor Strip Z-Wave Plus User ManualSensative Door Window Sensor Strip Z-Wave Plus User Manual
Sensative Door Window Sensor Strip Z-Wave Plus User Manual
 
Zipato zipatile-data-sheet-1.2z
Zipato zipatile-data-sheet-1.2zZipato zipatile-data-sheet-1.2z
Zipato zipatile-data-sheet-1.2z
 

Similar a 大数据助推金融创新

Influence-based Network-oblivious - ICDM 2013
Influence-based Network-oblivious - ICDM 2013Influence-based Network-oblivious - ICDM 2013
Influence-based Network-oblivious - ICDM 2013Nicola Barbieri
 
The effect of social welfare
The effect of social welfareThe effect of social welfare
The effect of social welfarecsandit
 
The Effect of Social Welfare System Based on the Complex Network
The Effect of Social Welfare System Based on the Complex NetworkThe Effect of Social Welfare System Based on the Complex Network
The Effect of Social Welfare System Based on the Complex Networkcsandit
 
Jeffrey xu yu large graph processing
Jeffrey xu yu large graph processingJeffrey xu yu large graph processing
Jeffrey xu yu large graph processingjins0618
 
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK csandit
 
COMMUNITY DETECTION IN THE COLLABORATIVE WEB
COMMUNITY DETECTION IN THE COLLABORATIVE WEBCOMMUNITY DETECTION IN THE COLLABORATIVE WEB
COMMUNITY DETECTION IN THE COLLABORATIVE WEBIJMIT JOURNAL
 
User Identity Linkage: Data Collection, DataSet Biases, Method, Control and A...
User Identity Linkage: Data Collection, DataSet Biases, Method, Control and A...User Identity Linkage: Data Collection, DataSet Biases, Method, Control and A...
User Identity Linkage: Data Collection, DataSet Biases, Method, Control and A...IIIT Hyderabad
 
Adaptive Percolation Daniel Burkhardt Cerigo
Adaptive Percolation Daniel Burkhardt CerigoAdaptive Percolation Daniel Burkhardt Cerigo
Adaptive Percolation Daniel Burkhardt CerigoDaniel Burkhardt Cerigo
 
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"IJDKP
 
Online Social Netowrks- report
Online Social Netowrks- reportOnline Social Netowrks- report
Online Social Netowrks- reportAjay Karri
 
FRIEND RECOMMENDATION IN ONLINE SOCIAL NETWORKS USING LDA
FRIEND RECOMMENDATION IN ONLINE SOCIAL NETWORKS USING LDAFRIEND RECOMMENDATION IN ONLINE SOCIAL NETWORKS USING LDA
FRIEND RECOMMENDATION IN ONLINE SOCIAL NETWORKS USING LDAJournal For Research
 
A Novel Target Marketing Approach based on Influence Maximization
A Novel Target Marketing Approach based on Influence MaximizationA Novel Target Marketing Approach based on Influence Maximization
A Novel Target Marketing Approach based on Influence MaximizationSurendra Gadwal
 
Ripple Algorithm to Evaluate the Importance of Network Nodes
Ripple Algorithm to Evaluate the Importance of Network NodesRipple Algorithm to Evaluate the Importance of Network Nodes
Ripple Algorithm to Evaluate the Importance of Network Nodesrahulmonikasharma
 
Homophily and influence in social networks
Homophily and influence in social networksHomophily and influence in social networks
Homophily and influence in social networksNicola Barbieri
 
EVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKS
EVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKSEVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKS
EVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKSAIRCC Publishing Corporation
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network AnalysisWael Elrifai
 
10477-Article Text-14005-1-2-20201228 (2).pdf
10477-Article Text-14005-1-2-20201228 (2).pdf10477-Article Text-14005-1-2-20201228 (2).pdf
10477-Article Text-14005-1-2-20201228 (2).pdfpallavithakur4789
 
Interaction networks
Interaction networksInteraction networks
Interaction networksBob O'Hara
 

Similar a 大数据助推金融创新 (20)

Influence-based Network-oblivious - ICDM 2013
Influence-based Network-oblivious - ICDM 2013Influence-based Network-oblivious - ICDM 2013
Influence-based Network-oblivious - ICDM 2013
 
The effect of social welfare
The effect of social welfareThe effect of social welfare
The effect of social welfare
 
The Effect of Social Welfare System Based on the Complex Network
The Effect of Social Welfare System Based on the Complex NetworkThe Effect of Social Welfare System Based on the Complex Network
The Effect of Social Welfare System Based on the Complex Network
 
Jeffrey xu yu large graph processing
Jeffrey xu yu large graph processingJeffrey xu yu large graph processing
Jeffrey xu yu large graph processing
 
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
MODELING SOCIAL GAUSS-MARKOV MOBILITY FOR OPPORTUNISTIC NETWORK
 
COMMUNITY DETECTION IN THE COLLABORATIVE WEB
COMMUNITY DETECTION IN THE COLLABORATIVE WEBCOMMUNITY DETECTION IN THE COLLABORATIVE WEB
COMMUNITY DETECTION IN THE COLLABORATIVE WEB
 
User Identity Linkage: Data Collection, DataSet Biases, Method, Control and A...
User Identity Linkage: Data Collection, DataSet Biases, Method, Control and A...User Identity Linkage: Data Collection, DataSet Biases, Method, Control and A...
User Identity Linkage: Data Collection, DataSet Biases, Method, Control and A...
 
Adaptive Percolation Daniel Burkhardt Cerigo
Adaptive Percolation Daniel Burkhardt CerigoAdaptive Percolation Daniel Burkhardt Cerigo
Adaptive Percolation Daniel Burkhardt Cerigo
 
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
 
Online Social Netowrks- report
Online Social Netowrks- reportOnline Social Netowrks- report
Online Social Netowrks- report
 
FRIEND RECOMMENDATION IN ONLINE SOCIAL NETWORKS USING LDA
FRIEND RECOMMENDATION IN ONLINE SOCIAL NETWORKS USING LDAFRIEND RECOMMENDATION IN ONLINE SOCIAL NETWORKS USING LDA
FRIEND RECOMMENDATION IN ONLINE SOCIAL NETWORKS USING LDA
 
paper_148.pptx
paper_148.pptxpaper_148.pptx
paper_148.pptx
 
socialpref
socialprefsocialpref
socialpref
 
A Novel Target Marketing Approach based on Influence Maximization
A Novel Target Marketing Approach based on Influence MaximizationA Novel Target Marketing Approach based on Influence Maximization
A Novel Target Marketing Approach based on Influence Maximization
 
Ripple Algorithm to Evaluate the Importance of Network Nodes
Ripple Algorithm to Evaluate the Importance of Network NodesRipple Algorithm to Evaluate the Importance of Network Nodes
Ripple Algorithm to Evaluate the Importance of Network Nodes
 
Homophily and influence in social networks
Homophily and influence in social networksHomophily and influence in social networks
Homophily and influence in social networks
 
EVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKS
EVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKSEVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKS
EVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKS
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
10477-Article Text-14005-1-2-20201228 (2).pdf
10477-Article Text-14005-1-2-20201228 (2).pdf10477-Article Text-14005-1-2-20201228 (2).pdf
10477-Article Text-14005-1-2-20201228 (2).pdf
 
Interaction networks
Interaction networksInteraction networks
Interaction networks
 

Más de Jerry Wen

BDTC2015 数美时代-梁堃-sentry 金融实时风控系统
BDTC2015 数美时代-梁堃-sentry 金融实时风控系统BDTC2015 数美时代-梁堃-sentry 金融实时风控系统
BDTC2015 数美时代-梁堃-sentry 金融实时风控系统Jerry Wen
 
BDTC2015 阿里巴巴-郑斌-大数据下的数据安全
BDTC2015 阿里巴巴-郑斌-大数据下的数据安全BDTC2015 阿里巴巴-郑斌-大数据下的数据安全
BDTC2015 阿里巴巴-郑斌-大数据下的数据安全Jerry Wen
 
BDTC2015 新浪微博-姜贵彬-大数据驱动下的微博社会化推荐
BDTC2015 新浪微博-姜贵彬-大数据驱动下的微博社会化推荐BDTC2015 新浪微博-姜贵彬-大数据驱动下的微博社会化推荐
BDTC2015 新浪微博-姜贵彬-大数据驱动下的微博社会化推荐Jerry Wen
 
BDTC2015 南京大学-黄宜华-octopus(大章鱼):基于r语言的跨平台大数据机器学习与数据分析系统
BDTC2015 南京大学-黄宜华-octopus(大章鱼):基于r语言的跨平台大数据机器学习与数据分析系统BDTC2015 南京大学-黄宜华-octopus(大章鱼):基于r语言的跨平台大数据机器学习与数据分析系统
BDTC2015 南京大学-黄宜华-octopus(大章鱼):基于r语言的跨平台大数据机器学习与数据分析系统Jerry Wen
 
BDTC2015 小米-大数据和小米金融
BDTC2015 小米-大数据和小米金融BDTC2015 小米-大数据和小米金融
BDTC2015 小米-大数据和小米金融Jerry Wen
 
BDTC2015 阿里巴巴-鄢志杰(智捷)-deep learning助力客服小二:数据技术及机器学习在客服中心的应用
BDTC2015 阿里巴巴-鄢志杰(智捷)-deep learning助力客服小二:数据技术及机器学习在客服中心的应用BDTC2015 阿里巴巴-鄢志杰(智捷)-deep learning助力客服小二:数据技术及机器学习在客服中心的应用
BDTC2015 阿里巴巴-鄢志杰(智捷)-deep learning助力客服小二:数据技术及机器学习在客服中心的应用Jerry Wen
 
BDTC2015 hulu-梁宇明-voidbox - docker on yarn
BDTC2015 hulu-梁宇明-voidbox - docker on yarnBDTC2015 hulu-梁宇明-voidbox - docker on yarn
BDTC2015 hulu-梁宇明-voidbox - docker on yarnJerry Wen
 
BDTC2015 京东-刘海锋-大规模内存数据库jimdb:从2014到2016
BDTC2015 京东-刘海锋-大规模内存数据库jimdb:从2014到2016BDTC2015 京东-刘海锋-大规模内存数据库jimdb:从2014到2016
BDTC2015 京东-刘海锋-大规模内存数据库jimdb:从2014到2016Jerry Wen
 
BDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of sparkBDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of sparkJerry Wen
 
BDTC2015 启明星辰-潘柱廷-中国大数据技术与产业发展报告
BDTC2015 启明星辰-潘柱廷-中国大数据技术与产业发展报告BDTC2015 启明星辰-潘柱廷-中国大数据技术与产业发展报告
BDTC2015 启明星辰-潘柱廷-中国大数据技术与产业发展报告Jerry Wen
 

Más de Jerry Wen (10)

BDTC2015 数美时代-梁堃-sentry 金融实时风控系统
BDTC2015 数美时代-梁堃-sentry 金融实时风控系统BDTC2015 数美时代-梁堃-sentry 金融实时风控系统
BDTC2015 数美时代-梁堃-sentry 金融实时风控系统
 
BDTC2015 阿里巴巴-郑斌-大数据下的数据安全
BDTC2015 阿里巴巴-郑斌-大数据下的数据安全BDTC2015 阿里巴巴-郑斌-大数据下的数据安全
BDTC2015 阿里巴巴-郑斌-大数据下的数据安全
 
BDTC2015 新浪微博-姜贵彬-大数据驱动下的微博社会化推荐
BDTC2015 新浪微博-姜贵彬-大数据驱动下的微博社会化推荐BDTC2015 新浪微博-姜贵彬-大数据驱动下的微博社会化推荐
BDTC2015 新浪微博-姜贵彬-大数据驱动下的微博社会化推荐
 
BDTC2015 南京大学-黄宜华-octopus(大章鱼):基于r语言的跨平台大数据机器学习与数据分析系统
BDTC2015 南京大学-黄宜华-octopus(大章鱼):基于r语言的跨平台大数据机器学习与数据分析系统BDTC2015 南京大学-黄宜华-octopus(大章鱼):基于r语言的跨平台大数据机器学习与数据分析系统
BDTC2015 南京大学-黄宜华-octopus(大章鱼):基于r语言的跨平台大数据机器学习与数据分析系统
 
BDTC2015 小米-大数据和小米金融
BDTC2015 小米-大数据和小米金融BDTC2015 小米-大数据和小米金融
BDTC2015 小米-大数据和小米金融
 
BDTC2015 阿里巴巴-鄢志杰(智捷)-deep learning助力客服小二:数据技术及机器学习在客服中心的应用
BDTC2015 阿里巴巴-鄢志杰(智捷)-deep learning助力客服小二:数据技术及机器学习在客服中心的应用BDTC2015 阿里巴巴-鄢志杰(智捷)-deep learning助力客服小二:数据技术及机器学习在客服中心的应用
BDTC2015 阿里巴巴-鄢志杰(智捷)-deep learning助力客服小二:数据技术及机器学习在客服中心的应用
 
BDTC2015 hulu-梁宇明-voidbox - docker on yarn
BDTC2015 hulu-梁宇明-voidbox - docker on yarnBDTC2015 hulu-梁宇明-voidbox - docker on yarn
BDTC2015 hulu-梁宇明-voidbox - docker on yarn
 
BDTC2015 京东-刘海锋-大规模内存数据库jimdb:从2014到2016
BDTC2015 京东-刘海锋-大规模内存数据库jimdb:从2014到2016BDTC2015 京东-刘海锋-大规模内存数据库jimdb:从2014到2016
BDTC2015 京东-刘海锋-大规模内存数据库jimdb:从2014到2016
 
BDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of sparkBDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of spark
 
BDTC2015 启明星辰-潘柱廷-中国大数据技术与产业发展报告
BDTC2015 启明星辰-潘柱廷-中国大数据技术与产业发展报告BDTC2015 启明星辰-潘柱廷-中国大数据技术与产业发展报告
BDTC2015 启明星辰-潘柱廷-中国大数据技术与产业发展报告
 

Último

Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGIThomas Poetter
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...ttt fff
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 

Último (20)

Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 

大数据助推金融创新

  • 1. 大数据与金融创新:从研究到实践 Assistant Professor DS Lee Foundation Fellow School of Information Systems Singapore Management University Dec. 11, 2015 朱飞达 Feida Zhu Founding Director Pinnacle Lab for Analytics DBS-SMU Lab for Life Analytics Singapore Management University
  • 2. 大数据与金融创新:从研究到实践 企业痛点 1. 经济下行,市场竞争压力大,金融行业已经不同满足于传统的被动 服务,而需要全方位了解 用户,把高效便利的金融服务渗透入“医, 食,住,行,玩,学”各个生活场景中,提升用户体验,把产品服务 嵌入用户生活各个侧面。 2. 企业内部数据对用户了解有局限,用户数据来源不足,且获取的 合法性,可持续性,隐私保护性值得担忧。 3. 达到用户的“最后一公里”渠道匮乏,传统营销手段日益疲软 (营销骚扰电话等),难以构建用户数据从收集,分析,到最后营 销的闭环。 金融创新的一个角度:生活即金融  
  • 3. 大数据与金融创新:从研究到实践 大数据的三大价值   – Insight  from  scale   • What  can  big  data  tell  us  that  small  data  cannot?   – Knowledge  from  enrichment   • What  important  knowledge  can  we  learn  from  enriching  small   data  with  big  data?   – Agility  from  real-­‐?me  responsiveness   • What  are  the  values  of  being  real-­‐?me?     VOLUME VARIETY VELOCITY
  • 4. 大数据与金融创新:从研究到实践 外部大数据到底能为企业提供什么价值? 企业内部数据 通常只以交易纪录为基础   Transac?on-­‐based   总量和覆盖有限   Limited  coverage     只反映用户生活的局部和侧面   Fragmented  par?al  perspec?ve   静态,低频   Sta?c,  low  frequency   孤立单一的客户视图,只见人   Isolated  view  of  individual  user     外部社交媒体大数据 能展现交易行为的上下文场景   Context-­‐based   海量的社会级覆盖   Societal  scale   提供用户的多角度全景式洞察   Mul?-­‐facet  insight   动态实时,高频   Dynamic,  high  frequency   能综合考虑丰富真实社交关系   Network-­‐embedded  user  view  
  • 6. 大数据与金融创新:从研究到实践 应用案例-精准营销 §  海量数据 §  2亿潜在客户的巨大候选空间 §  应用场景:保险业务员每天需要联系大量 用户来推销各种保险,如何及时找到目标 客户以及最适合这个客户的保险产品? §  精准目标:基于大数据挖掘,自然语言处理,网络结构分析的准确客户画像 §  及时推送:动态监听客户,定义最佳营销时间点,实时响应潜在需求
  • 8. 大数据与金融创新:从研究到实践 应用案例-关系营销和风控 §  应用场景:银行时刻在关心两类人:高风险客 户和高净值客户,如何利用客户之间的人际关 系顺藤摸瓜找到其他潜在相关客户? §  海量线下人际关系网络 §  3亿人,60亿条人际关系边组 成的巨大关系网 豪车 高尔夫 游艇 赌 博 高价 值 §  线下人际关系:通过线下亲密关系 来顺藤摸瓜找到其他相关目标客户 §  精准客户画像:基于大数据挖掘,自然语言处理的准确客户画像 §  人工智能挖掘 §  从用户外部大数据中自动挖掘出用 户的线下真实人际关系网络
  • 10. 2
  • 11. 874
  • 12. 5 4
  • 13. 353
  • 15. 8 9 8
  • 16. 789
  • 17. 8 98
  • 18. Figure 1. Mutual Reachability. 1 211 311 411 511 611 711 811 1 611 2111 2611 12342563789 67
  • 20. 3285
  • 21. 36
  • 22. 339 Figure 2. Friendship Retainability. 1 2 3 45 46 51 17281729 17298179 17981799 17998176 17681769 1769817 17 817 9 17 98173 12345 671 Figure 3. Community A Problem:    Given  a  TwiNer  follow  network  of  a  target  user,  iden?fy  the  user’s  offline  community  by   examining  the  follow  linkage  alone.     Informa.on  should  be  able  to  flow  in   both  direc.ons  within  a  small  distance   between  real-­‐life  friends.       Principle I: Mutual Reachability Principle II: Friendship Retainability 1 21 31 41 51 61 12345 6789745 7
  • 23. 2
  • 24. 874
  • 25. 5 4
  • 26. 353
  • 28. 8 9 8
  • 29. 789
  • 30. 8 98
  • 31. Figure 1. Mutual Reachability. 1 211 311 411 511 611 711 811 1 611 2111 2611 12342563789 67
  • 33. 3285
  • 34. 36
  • 35. 339 Figure 2. Friendship Retainability. 1 2 3 45 46 51 17281729 17298179 17981799 17998176 17681769 1769817 17 817 9 17 98173 17381739 1739817 12345 671 Figure 3. Community Affini The  size  of  a  user’s  offline  community   has  an  upper-­‐bound  threshold  σ   related  to  Dunbar’s  number   Principle III: Community Affinity Figure 6: Case study of a user’s fo 5. EXPERIMENTAL STUDY ity with A  user’s  off-­‐line  friends  usually   group  into  clusters  within  which   members  know  each  other    
  • 36. 大数据与金融创新:从研究到实践 研究课题-线下关系挖掘 Figure 6: Case study of a user’s follow network. 5. EXPERIMENTAL STUDY An implementation of our algorithm as a demo system – TwiCube1 – is publicly available. 5.1 Case Study We now present a case study on a real user X who par- ticipated in our evaluation. X has 107 followers and follows 385 other users. Figure 6 illustrates the discovery of his core community in a total of 4 iterations each indicated by a dif- ferent color. In summary, 34 users are identified in Iteration 1, 19 in Iteration 2, 3 in Iteration 3 and only one user in the last iteration. The precision and recall for this result of X’s core community is 0.8947 and 0.9807 respectively. It can be observed from Figure 6 that there is a dense clusters of core community members heavily linked among one an- other (lower left to X) and another such cluster of non-core- community users similarly linked (upper right to X). This shows that approaches based on dense subgraph mining or structural clustering would have a hard time in distinguish- ing between these two similarly-structured communities and, consequently, identifying the true core community. In fact, this cluster of non-core-community users consists of media, business and active Twitter users sharing similar interests and topics, which is a good indicator of those of X’s own. ity with X. This case would fail the naive approa identify core community members by two-way f In (b), we show the follow networks between X community member Y , who is discovered in Iter this case, X follows Y but Y does not follow X. M is not until more core community members have tified at Iteration 1 and 2 that Y ’s sophisticated c with the core community are revealed. In this by unleashing the power of iterated core commu fication, our algorithm is still able to correctly id 5.2 Effectiveness One naive method to identify the core commun get user u is to find the set of users who have dire follow links with u, i.e., they and u follow each ot rect two-way follow links provide good indication real-world friendship? Our experiments suggest links are not sufficient. In Figure 7 we show the on the distribution (among the 65 user evaluati cision, recall and F score between our algorithm the naive algorithm. In general our solution o the naive solution by a large margin. To conduc tailed comparison between the two methods, l Figure 5: Core Community Discovery RWR and closeness score between a user node i a as follows. ri,S = j∈S ri,j rS,i = j∈S rj,i ci,S = cS,i = ri,S ∗ rS,i Given a user node i, the probability transition Approach Figure 1: Three Types of Core Community Mem- bers. We now show how these three principles help us identify core communities members of different kinds. Based on our study, we categorize a user’s follow network based on three attributes each reflects one of the above-mentioned princi- ples. Note that these attributes and their corresponding parameters are proposed for the categorization only, none of which will be actually computed in our algorithm. Suppose the target user is u and the user in consideration is v. (I) Mutual Following. The first attribute is whether u and v directly follow each other. There are two cases: (I). u and v follow each other, i.e., v ∈ N1 u← N1 u→. We call this a two-way follow case. (II). Either u follows v or v follows u, but not both, i.e., v ∈ N1 u← N1 u→ N1 u← N1 u→. We call this a one-way follow case. Principle 1 is immediately satisfied in a two-way follow case as tweets of both u and v are delivered directly to each other, while in a one-way follow case, computation considering the k-hop neighborhood of u is necessary to determine the satisfiability of Principle 1. (II) Friendship Exclusivity. The second attribute is the larger one between |Fu←| and |Fu→|. For simplicity, we use |Fu←| to illustrate while the analysis with |Fu→| can be done similarly. This attribute indicates the number of other users in whom u is interested in hearing about. In general, this !  Random Walk with Restart !  Closeness Score !  Iterative Off-line Community Discovery !  Off-line community is discovered by iterations. !  A virtual user node is used as the threshold to cut for each iteration. ose our algorithm based on the idea of random walk art(RWR). RWR has been successfully used to mea- relevance score between two nodes in a weighted 3, 9, 2, 12]. It is defined in [9] with the following ⃗ri = (1 − c) ˜W ⃗ri + c⃗ei (1) tting, given a weighted graph, a particle starts from d conducts random movement. It transmits to the hood of its current node with a probability propor- the edge weights. At each step, the particle also o the start node i with some probability c. The score of node j with respect to i is defined as the ate probability ri,j that the particle finally stays at roblem setting, given the Twitter network G = target user u ∈ V and a number k, we focus on raph Gk u induced by Nk u , which is simplified as Gu s fixed. A probability transition matrix W is de- Gu(V ) such that, for two nodes v, w ∈ Gu(V ), the puted iteratively and it finally converges to )−1 ⃗ei [9]. When it converges, the steady-state tor ⃗ri reflects the bandwidth of information from user i to user j for every j ∈ Gu(V ). eady-state probability to define the closeness wo users i and j: ci,j = ri,j ∗ rj,i (3) score thus defined satisfies Principle (I). It ng desirable properties, the proofs of which e to space limit. 1. Given a Twitter follow network G(V, E) i, j ∈ V , ci,j is symmetric, i.e., ci,j = cj,i. Property 2. Given a Twitter follow network G(V, E), two users i, j ∈ V and k, ci,j 0 if and only if i and j satisfy Principle 1 — i ∈ Nk j→ Nk j← and j ∈ Nk i→ Nk i←, i.e., tweets originated from either user i or j should be able to reach the other one in k hops. Property 3. Given a Twitter follow network G(V, E), two users i, j ∈ V and k, obtain a node j′ resulted from removing a set S of users from j’s immediate neighborhood such that for each v ∈ S, either v ∈ Fj→ Nk i← or v ∈ Fj← Nk i→. We have ci,j ≤ ci,j′ . Figure 2: Core Community Discovery closeness score between u and all the rest users, t we compute the closeness score between ˜u and eve user. From the ranking list thus generated, if any us ahead of ˆv in this iteration, the user will be adde core community of u, which ends this iteration. So so forth. Figure 2 illustrates the process. The targ is shown in red in the center and the auxiliary dum ˆv is shown in purple. In iteration 1, the core comm just u itself, which is indicated by the shaded circle u. The highlighted blue nodes and follow links re Fu← Fu→. After computing the closeness score cu v, three users are found to be ahead of ˆv in the ranking list. They are therefore added to the core nity, indicated by their color changed from blue to In iteration 2, we use the new core community ˜u, c now of 4 users, to compute the closeness scores c˜u rest nodes v. Those ranked ahead of ˆv will be adde core community. The iterations continue until no n can be added to the core community, ending the al As the virtual user node ˜u is actually a set, we no RWR and closeness score between a user node i and as follows. ri,S = j∈S ri,j the naive approach respectively. The result shows that for most users, our solution outperforms the naive solution for both precision and recall. In particular, in two cases, the difference is even close to 1. There is only one single case in which our algorithm is prevailed for both precision and recall. 5.4 On Ranking compare2(v1, v2) = ⎪⎩ compare1(v1, −1, Which one is better? We evaluate computing their AUC value for eac tions of the AUC values are showed i shows that for both rankings, more values are greater than 0.9 and more Figure 7: AUC comparison for rankings with and without incorporatin values are greater than 0.8. The right graph in Figure 7 shows that in most cases, the ranking with iteration informa- tion incorporated is superior than the ranking based solely on closeness score. This demonstrates that core community information helps the ranking. 5.5 On Iteration It has been observed in our experiments that the core com- munity discovery process ends after a few iterations. One interesting question is whether core community members identified in later iterations are as good as those found in earlier iterations. If we set a maximum number of iteration allowed in the algorithm to force termination, will the result give better precision and recall? Our experiments suggest a negative answer. Figure 8 shows that the average pre- cision, recall and F-score for varied maximum number of iterations allowed from 1 to 10 as well as unlimited. As the maximum number of iterations allowed increases, although average precision drops slightly, recall improves significantly, and so does the F-score. Intuitively, earlier iterations tend to capture those closest members to the target user, which results in a higher precision yet at the cost of missing out many other core community members with more sophisti- cated social connections with the target user. By setting no maximum number of iterations and allowing the core com- munity itself to take shape, much greater gain in recall could be achieved, offering a better result overall. In most cases, core communities stabilize after 5 or 6 iterations, as shown in Figure 9 which presents the distribution of number of iterations of all our eval 5.6 Modeling Use How to model user inter tent recommendation an Furthermore, our study ery could significantly en following two aspects: (I munity members themse terizing u’s interests tha network. u follow them m life friends anyway. On t or topics that drive u t users. As such, when i step is to distinguish u’s low network. (II). Altho themselves may not nec users followed by these c less could help understa could follow media/celeb In our experiments, we users, A,B and C to hel that A and B share mu interests, background an if we check the common by A and B, they have in Figure 11), while A in Figure 12). This me community, C could be Figure 7: AUC comparison for rankings with and without incorporating iteratio values are greater than 0.8. The right graph in Figure 7 shows that in most cases, the ranking with iteration informa- tion incorporated is superior than the ranking based solely on closeness score. This demonstrates that core community information helps the ranking. 5.5 On Iteration It has been observed in our experiments that the core com- munity discovery process ends after a few iterations. One interesting question is whether core community members identified in later iterations are as good as those found in earlier iterations. If we set a maximum number of iteration allowed in the algorithm to force termination, will the result give better precision and recall? Our experiments suggest a negative answer. Figure 8 shows that the average pre- cision, recall and F-score for varied maximum number of iterations allowed from 1 to 10 as well as unlimited. As the maximum number of iterations allowed increases, although average precision drops slightly, recall improves significantly, and so does the F-score. Intuitively, earlier iterations tend to capture those closest members to the target user, which results in a higher precision yet at the cost of missing out many other core community members with more sophisti- cated social connections with the target user. By setting no maximum number of iterations and allowing the core com- munity itself to take shape, much greater gain in recall could be achieved, offering a better result overall. In most cases, core communities stabilize after 5 or 6 iterations, as shown in Figure 9 which presents the distribution of number of iterations of all our evaluation part 5.6 Modeling User Interest How to model user interests is of cri tent recommendation and linkage pr Furthermore, our study reveals that ery could significantly enhance user following two aspects: (I) For a tar munity members themselves are les terizing u’s interests than the rest network. u follow them mostly beca life friends anyway. On the other ha or topics that drive u to follow oth users. As such, when investigating step is to distinguish u’s core comm low network. (II). Although the co themselves may not necessarily refl users followed by these core commu less could help understand u’s inte could follow media/celebrity/busine In our experiments, we identify and users, A,B and C to help us evalua that A and B share much more sim interests, background and life-style t if we check the common non-core-co by A and B, they have 15 such u in Figure 11), while A and C have in Figure 12). This means that, w community, C could be considered Figure 7: AUC comparison for rankings with and without incorporating iteration inform values are greater than 0.8. The right graph in Figure 7 shows that in most cases, the ranking with iteration informa- tion incorporated is superior than the ranking based solely on closeness score. This demonstrates that core community information helps the ranking. 5.5 On Iteration It has been observed in our experiments that the core com- munity discovery process ends after a few iterations. One interesting question is whether core community members identified in later iterations are as good as those found in earlier iterations. If we set a maximum number of iteration allowed in the algorithm to force termination, will the result give better precision and recall? Our experiments suggest a negative answer. Figure 8 shows that the average pre- cision, recall and F-score for varied maximum number of iterations allowed from 1 to 10 as well as unlimited. As the maximum number of iterations allowed increases, although average precision drops slightly, recall improves significantly, and so does the F-score. Intuitively, earlier iterations tend to capture those closest members to the target user, which results in a higher precision yet at the cost of missing out many other core community members with more sophisti- cated social connections with the target user. By setting no maximum number of iterations and allowing the core com- munity itself to take shape, much greater gain in recall could be achieved, offering a better result overall. In most cases, core communities stabilize after 5 or 6 iterations, as shown in Figure 9 which presents the distribution of number of iterations of all our evaluation participants. 5.6 Modeling User Interests How to model user interests is of critical imp tent recommendation and linkage prediction in Furthermore, our study reveals that core com ery could significantly enhance user interest m following two aspects: (I) For a target user u munity members themselves are less informa terizing u’s interests than the rest user node network. u follow them mostly because they a life friends anyway. On the other hand, it is s or topics that drive u to follow other non-c users. As such, when investigating u’s inte step is to distinguish u’s core community fro low network. (II). Although the core commu themselves may not necessarily reflect u’s i users followed by these core community mem less could help understand u’s interests, e.g could follow media/celebrity/business users o In our experiments, we identify and hire thre users, A,B and C to help us evaluate. The g that A and B share much more similar profi interests, background and life-style than A an if we check the common non-core-community by A and B, they have 15 such users in co in Figure 11), while A and C have 18 in co in Figure 12). This means that, without th community, C could be considered more sim Application Example: User Interest Pro Figure 11: Interest profile comparison for A and B Figure 12: Interest profile compari bi-directional way and relies on no other attribute informa- to predict link strength in online soci Parameters !  On # of Iterations !  On Robustness Figure 8: The result for limiting the max # of iterations allowed. Figure 9: The distribution of # of iterations. Figure 10: R B, contradicting the truth. In fact, we can use core com- munity to remedy the situation. Similar as in the idea of TF-IDF [11], for target user u, we use the following formula to compute the weight for each non-core-community user v wu(v) = |Fv→ Cu| |Cu| log |Fv→| (9) As such, for a target user u, we obtain a vector ⃗xu where each dimension is one non-core-community member. For two tar- get users u1 and u2, we compute the similarity between their interest profile as Sim(u1, u2) = ⃗xu1 ·⃗xu2 |⃗xu1 ||⃗xu2 | . In Figure 11 and Figure 12, we show the relative ratio between user A and B, where the percent for user A on dimension v is computed by wA(v) wA(v)+wB (v) , and wB (v) wA(v)+wB (v) for user B. Now if we com- of SNS and real-life social networks. [14] book has influenced the establishment o lationships. Another related direction is real-life friendship or relationship stren work using hyperlinks and text informat predict relationships between individua further information including network to tions to predict relationship strength. the same problem with a link-based late While the relationship between a user’ social network has been investigated in Facebook, few studies have so far pose on Twitter network. More importantly Facebook, Twitter has two important d tics — (I) As shown in [8], Twitter fun of news media and social network comb both. (II) Follow links on Twitter are Figure 8: The result for limiting the max # of iterations allowed. Figure 9: The distribution of # of iterations. Figure 10: Robus B, contradicting the truth. In fact, we can use core com- munity to remedy the situation. Similar as in the idea of TF-IDF [11], for target user u, we use the following formula to compute the weight for each non-core-community user v wu(v) = |Fv→ Cu| |Cu| log |Fv→| (9) As such, for a target user u, we obtain a vector ⃗xu where each dimension is one non-core-community member. For two tar- get users u1 and u2, we compute the similarity between their interest profile as Sim(u1, u2) = ⃗xu1 ·⃗xu2 |⃗xu1 ||⃗xu2 | . In Figure 11 and Figure 12, we show the relative ratio between user A and B, where the percent for user A on dimension v is computed by wA(v) wB (v) of SNS and real-life social networks. [14] looked book has influenced the establishment of new lationships. Another related direction is to us real-life friendship or relationship strength. work using hyperlinks and text information on predict relationships between individuals. [6, further information including network topolog tions to predict relationship strength. [17] ha the same problem with a link-based latent va While the relationship between a user’s onlin social network has been investigated in stand Facebook, few studies have so far pose the sa on Twitter network. More importantly, com Facebook, Twitter has two important differen tics — (I) As shown in [8], Twitter functions of news media and social network combiningFigure 5: Core Community Discovery RWR and closeness score between a u as follows. ri,S = j∈S ri,j rS,i = j∈S rj,i ci,S = cS,i = ri,S ∗ r Given a user node i, the probability Figure 6: Case study of a user’s follow A  real  TwiFer  user:       §  Following  385  users   §  Followed  by  107  users  
  • 37. 大数据与金融创新:从研究到实践 研究课题-线下亲密关系挖掘 Problem:    Given  a  user’s  tweets,  iden?fy  all  interpersonal  rela?onships  that  involve   physical  or  emo?onal  in?macy,  such  as  family  members,  husband  and  wife,  roman?c   rela?onship,  etc..     Example:   §  In.mate  expressions   §  “honey”,  “baby”,  “dear”,  “my  dear  wife”,…   §  Occasions/Events   §  Valen?ne’s  day,  anniversary,  father’s  day,   birthday,…   §  In.macy-­‐related  name  en..es   §  Resort  hotels,  kids,  home-­‐improvement,  …   §  Screen-­‐name  correla.on   §  Substring  swaps   §  Similar  PaNerns  with  keywords   §  PaNerns  with  domain  knowledge   Design Ideas I Intimacy-related Entity Use  Dempster–Shafer   theory  to  model  the   associa?on  degree   between  en??es  and  a   certain  type  of   rela?onship.    The  final   in?mate  rela?onship   scores  are  achieved   through  an  itera?ve   algorithm.     Design Ideas II: Exclusivity of “@” to identify relationship candidates
  • 38. 大数据与金融创新:从研究到实践 外部数据跨平台用户身份归一 Linkage Information Collection Photos Tweets/Retweets Trajectories ... Profiles Username Photos Tweets/Retweets Trajectories ... Profiles Username t Unlinked Identities… Step 3: Multi-objective Optimization MinW [F1(w), F2(w),…,  FM(w)] Linkage Function fW Unknown Identities Step 2: Structure Information Modeling Step 1:Heterogeneous Behavior Modeling Figure 3: HYDRA framework. Figure 4: The workflow of A face detector is employe profile images. Then a pre- fidence score in [0, 1] indica to one person. attributes used in the matchi set by probabilistic modeling Specifically, given a set o •  Nodal  aFributes  (numeric,   categorical)   •  Demographics,  loca?on,  personal   interest,  etc.     •  User  Generated  Content  (topics,   sen.ments)   •  Reviews,  tweets,  ra?ngs,   mul?media,  etc.   •  Social  network  (snapshot/sta.c   view)   •  Friend  network,  followers/followees   network,  communi?es/interest   groups,  etc.   •  Behavior  trajectory  (dynamic,   evolu.onary)   •   content  sharing  history,  social   interac?on  paNern,  network   forma?on,  etc.      
  • 39. 大数据与金融创新:从研究到实践 外部数据跨平台用户身份归一 •  People’s  closest  friends  are  similar   across  different  social  plaaorms.       •  Behavior  similarity  aggrega?on  of   the  most  frequently  interac?ng   friends  of  users  provides  insights   into  user  iden?ty  linkage.   •  Supervised  Learning   •  Structure  Consistency  Modeling   •  Mul?-­‐objec?ve  Op?miza?on   A  two-­‐class  classifica?on  problem  -­‐-­‐-­‐  construct  mul?-­‐objec?ve  op?miza?on  which  jointly  op?mizes  the  predic.on  accuracy   on  the  labeled  user  pairs  and  mul.ple  structure  consistency  measurements  across  different  plaaorms.  
  • 40. 大数据与金融创新:从研究到实践 社交媒体大数据的核心: 5个“C •  Content  内容   –  个人档案,话题分布,情感模型,兴趣画像.   •  Context  情景   –  地点,时序分析,行为轨迹,社群分析.         •  Connec.on  关联   –  线下关系挖掘,核心网络分析.   •  Crowd  众智   –  利用大众的人脑智慧,众包,众筹.   •  Cloud  云平台   –  开发多源的思维模式.     社交媒体 大数据 内容 Content   情景 Context   云平台 Cloud   众智 Crowd   关联 Connec.on  
  • 41. 大数据与金融创新:从研究到实践 社交媒体大数据的个人征信应用 •  弥补个⼈人信⽤用数据的稀疏性 •  在中国,官⽅方正式的个⼈人信⽤用数据匮乏,尤其是中低收⼊入层次的申请⼈人,⽽而这部分⼈人群正是互联⺴⽹网 ⾦金融的主要⺫⽬目标客户。 •  冷启动 •  对抗恶意欺诈 •  社交数据和⾦金融领域的弱相关 •  侦测异地诈骗 •  挖掘⻛风险的前瞻性 •  利⽤用⽣生活情景的时序推理 •  深挖信⽤用⻛风险的社会关系传递
  • 44. 大数据与金融创新:从研究到实践 •  组合较优的独立特征为复合特征,加入 传统模型。   •  使用决策树组合:地理位置特征 (所 在地、签到地点)根据各个特征上Good、 Bad分布的差异性,选出特征放入决策 树。   •  根据数据生成的决策树如下表。表中用 不同的颜色来区分决策树的层次,黄色 为第一层,绿色为第二层,蓝色为第三 层。表中的数值表示满足该条件的人群 是坏人的风险指数。基于此决策树模型, 分类的准确率达到0.83。   提取社交维度信用特征,加入现有传统信用模型  
  • 45. 大数据与金融创新:从研究到实践 提取社交维度信用特征,加入现有传统信用模型   Fid Feature Name Pearson Correlation χ2 Statistics 1 Gender 4.45 × 10−2 14.27∗ 2 Age 1.92 × 10−2 16.28∗ 3 Verified 5.128 × 10−2 17.02∗ 4 Education 4.18 × 10−3 0 5 Location 4.81 × 10−2 16.68∗ 6 Occupation 2.244 × 10−2 0.137 7 Registration time 6.944 × 10−2 39.44∗ ∗ Passes the significance test at the confidence level of 95%. Table 5: Pearson correlation and χ2 statistics evaluation for demographic features 0 10 20 30 40 50 1 2 3 4 5 6 7 Fid ImportanceValue 0 2 4 6 8 1 2 3 4 5 6 7 8 9 10 Fid ImportanceValue Fid Feature Name Pearson Correlation χ2 Statistics 1 Length 5.546 × 10−2 48.04∗ 2 Containing images 4.149 × 10−2 3.650 3 Containing URL 1.827 × 10−2 58.02∗ 4 Conta. HashTag 3.422 × 10−2 2.376 5 Conta. only mentions 6.114 × 10−2 21.63∗ 6 Conta. only emotions 5.504 × 10−2 9.475∗ 7 Grant of “badges” 2.212 × 10−2 6.449∗ 8 Commercial purpose 1.134 × 10−2 2.026 9 N. B. based prob. 7.716 × 10−2 25.76∗ 10 Topic distributions 5.370 × 10−2 39.44∗ ∗ Passes the significance test at the confidence level of 95%. Table 6: Pearson correlation and χ2 statistics evaluation for microblog features Fid Feature Name Pearson Correlation χ2 Statistics 1 Near Duplicate 2.740 × 10−2 2.642 2 Retweet Chain 9.200 × 10−2 53.05∗ 3 Plain Retweet 3.374 × 10−2 34.61∗ 4 Emoticon behavior 8.637 × 10−2 25.68∗ 5 Mention behavior 6.236 × 10−2 28.10∗ 6 Posting time 5.162 × 10−2 61.06∗ 7 Metaphysical power 4.370 × 10−2 0.660 lowees and #followees test at confidence level o parison in Figure 5 (d) more important features This phenomenon shows degree features are inform predictions in different w 4. EXPERIMEN 4.1 Experiment Data Sets. Description #user of good cr #user of bad cre Total Number o #Microblogs by #Microblogs by Total number of Size of vocabula 7 Grant of “badges” 2.212 × 10 6.449 8 Commercial purpose 1.134 × 10−2 2.026 9 N. B. based prob. 7.716 × 10−2 25.76∗ 10 Topic distributions 5.370 × 10−2 39.44∗ ∗ Passes the significance test at the confidence level of 95%. Table 6: Pearson correlation and χ2 statistics evaluation for microblog features Fid Feature Name Pearson Correlation χ2 Statistics 1 Near Duplicate 2.740 × 10−2 2.642 2 Retweet Chain 9.200 × 10−2 53.05∗ 3 Plain Retweet 3.374 × 10−2 34.61∗ 4 Emoticon behavior 8.637 × 10−2 25.68∗ 5 Mention behavior 6.236 × 10−2 28.10∗ 6 Posting time 5.162 × 10−2 61.06∗ 7 Metaphysical power 4.370 × 10−2 0.660 8 Active level 4.770 × 10−2 31.77∗ 9 Sentiment word(+) 4.240 × 10−2 0.380 10 Sentiment word(-) 5.063 × 10−2 0.092 11 Sentiment ploarity(+) 2.602 × 10−2 4.851 12 Sentiment ploarity(-) 9.272 × 10−3 2.268 ∗ Passes the significance test at the confidence level of 95%. Table 7: Pearson correlation and χ2 statistics evaluation for behavior features ing time are especially important since their chi2 statistics are all considerable high and there are 24 different features of this kind. Figure 5 (c) shows the feature importance when behavior features are used as input for GBDT model. Their importance values are all comparable with each other, and the low importance values also validate the intuition that behavior information only indirectly and limitedly reflect user’s credit risk. Although the feature importance of each feature is not very high as a whole, the combination of so many predictive behavior features also demonstrates very high per- of each feature is not very high as many predictive behavior features a formance, as will be shown in the e 3.5.4 Network Features Fid Feature Name P 1 #followees 2 #followers 3 #friends 4 #friends/#followees 5 #followers+#followees 6 Aggregated feature 1 7 Aggregated feature 4 8 Betweenness Cetnrality ∗ Passes the significance test at t Table 8: Pearson correlation an network features Table 8 and Figure 5 (d) presen network features proposed in Sect tures’ correlation value and χ2 st list all of them in the table. Amon •  发帖时间分布   •  手机终端   •  签到地区分布   •  签到地区时间跨度   0.52   0.54   0.56   0.58   0.6   0.62   1   3   5   7   9   11   13   15   17   19   21   Number  of  Features   Accuracy   accuracy  
  • 47. 大数据与金融创新:从研究到实践 基于社会关系网络的风险传递查询和探索引擎   产生种子   网络拓展   业务应用   内部不良客户名单   外部大数据平台   •  社交媒体举报名单   •  互联网金融类网站不良记录 名单   •  政府公共信息平台不良纪录 名单   •  事件新闻触发名单   多元数据挖掘维度   •  用户内容分析(主题,意 见,情感等)   •  上下文情景分析(时空序 列,地理位置等)   •  社会关系网分析(家庭, 同事,好友,社区等)   海量客户自动评分   交互式侦测调查系统  
  • 50. 大数据与金融创新:从研究到实践 社交媒体大数据用于个人信用评估的优势   •  基于⽤用户个⼈人数据可以建⽴立⽤用户个⼈人信⽤用评分 •  个⼈人数据:海量,全⽅方位,动态实时,场景理解 •  分析⼿手段: •  内容分析:兴趣爱好(赌博,⾊色情,奢侈品⾼高消费等),个⼈人素质(粗俗⽤用语,说谎, ⾃自相⽭矛盾),性格特征(易怒,偏激,鲁莽,冲动) •  上下⽂文场景分析:⾏行动轨迹(是否居⽆无定所,出⼊入不良场所,出没于诈骗⾼高发地区), ⽣生活习惯(夜⽣生活,发帖时间),使⽤用设备(⼿手机类型配置) •  基于⽤用户社交⺴⽹网络可以建⽴立⽤用户综合信⽤用评估,挖掘潜在信⽤用⻛风险 •  社交⺴⽹网络:⽤用户的核⼼心⺴⽹网络(家庭,好友,合作伙伴) •  分析⼿手段: •  基于⺴⽹网络的信⽤用推导(例如:是否和信⽤用不良⼈人⼠士关系密切)
  • 51. 大数据与金融创新:从研究到实践 社交大数据用于金融创新的挑战和课题 •  The  “CANNOTs  (or  SHOULD-­‐NOTs)”:  the  boundaries  and  fron.ers   –  Privacy   • How  to  provide  non-­‐intrusive  yet  personalized  customer  service?   • Where  is  the  boundary  between  public  and  private  data?   –  Ownership   • Who  should  own  the  data  shared  on  various  plaaorms?   • How  to  split  profit  from  the  data?   –  Valua?on   • How  to  assess  value  for  different  data  sets?   • How  to  promote  and  regulate  data  exchange  among  par?es?