
Walmart Big Data Expo


The Human Factor in Artificial Intelligence
Jennifer Prendki

Published in: Data & Analytics

  1. Natural Intelligence: the Human Factor in A.I. (Big Data Expo 2017, Utrecht, Netherlands)
  2. About Me
     • Former member of the Search team at @WalmartLabs
     • Former head of the Metrics & Measurements team
       • I also led the Human Evaluation team
     • About the Metrics and Measurements team
       • A team of engineers, analysts and scientists in charge of providing accurate and exhaustive measurements
       • We also had an auditing role towards adjacent teams
     • What do we measure?
       • Engineering metrics related to model and data quality
       • Business metrics (revenue, etc.)
       • More exotic customer-centric metrics (customer value, customer satisfaction, model impact, etc.)
     • Currently Head of Data Science at Atlassian
       • In charge of the Search & Smarts team
  6. 6. q Humans  &  Big  Data • The  role  of  human  beings  in  the  era  of  Big  Data • Why  do  we  need  to  tag  data? • How  to  get  tagged  data? q The  Era  of  Crowdsourcing • What  is  Crowdsourcing? • Use  cases  and  details  about  Crowdsourcing • Traditional  crowds  vs.  curated  crowds q The  Human-­‐in-­‐the-­‐Loop  Paradigm • Definition  and  details  about  Human-­‐In-­‐The-­‐Loop  ML • Introduction  to  Active  Learning Outline
  7. 7. q Humans  &  Big  Data • The  role  of  human  beings  in  the  era  of  Big  Data • Why  do  we  need  to  tag  data? • How  to  get  tagged  data? q The  Era  of  Crowdsourcing • What  is  Crowdsourcing? • Use  cases  and  details  about  Crowdsourcing • Traditional  crowds  vs.  curated  crowds q The  Human-­‐in-­‐the-­‐Loop  Paradigm • Definition  and  details  about  Human-­‐In-­‐The-­‐Loop  ML • Introduction  to  Active  Learning Outline
  8. 8. q Humans  &  Big  Data • The  role  of  human  beings  in  the  era  of  Big  Data • Why  do  we  need  to  tag  data? • How  to  get  tagged  data? q The  Era  of  Crowdsourcing • What  is  Crowdsourcing? • Use  cases  and  details  about  Crowdsourcing • Traditional  crowds  vs.  curated  crowds q The  Human-­‐in-­‐the-­‐Loop  Paradigm • Definition  and  details  about  Human-­‐In-­‐The-­‐Loop  ML • Introduction  to  Active  Learning Outline
  9. Humans & Big Data: The Role of Human Beings in the Era of Machine Learning
  10. The Era of Very Big Data
      ❑ VOLUME
        • More data was created from 2013 to 2015 than in the entire previous history of the human race
        • By 2020, accumulated data will reach 44 trillion gigabytes
      ❑ VELOCITY
        • By 2020, ~1.7 MB of new data per second per human being
        • 1.2 trillion search queries on Google per year
      ❑ VARIETY
        • 31 million messages and 2.8 million videos per minute on Facebook
        • Up to 300 hours of video per minute are uploaded to YouTube
        • In 2015, 1 trillion photos were taken; billions were shared online
      [Image: a data center at Google]
  16. Supervised vs. Unsupervised Machine Learning
      Supervised ML requires tagged data
        • Classification: problem where the output variable is a category
          (examples: SVM, random forest, Bayesian classifiers)
        • Regression: problem where the output variable is a real value
          (examples: linear regression, random forest)
      Unsupervised ML doesn't require tagged data
        • Clustering: discovery of inherent groupings in the data
          (examples: k-means, k-nearest neighbors)
        • Association rules: discovery of rules describing the data
          (example: Apriori algorithm)
      Typical applications: image recognition and speech recognition (supervised); feature learning and autoencoders (unsupervised)
      The case of Deep Learning: both supervised and unsupervised applications
      NB: Deep Learning algorithms are data-greedy…
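To make the supervised/unsupervised distinction concrete, here is a minimal sketch (not part of the original deck) that runs both families on the same toy data; it assumes NumPy and scikit-learn are available.

```python
# Illustrative only: the same points handled with and without human-provided tags.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)  # the tags a human would have to provide

# Supervised: learns the mapping from features to the human-provided tags.
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print("classifier prediction:", clf.predict([[4.8, 5.2]]))

# Unsupervised: no tags needed; the model only discovers groupings in the data.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster assignment:", km.predict([[4.8, 5.2]]))
```

The contrast mirrors the slide: the classifier cannot be trained at all without the tags in `y`, while k-means runs happily on the raw points.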
  19. Tagged Data
      • Gathering quality tagged training data is a common bottleneck in ML
        • Expensive
        • Quality control is hard and requires a second human pass
        • Hardly scalable → heavy use of sampling strategies
      • How do companies doing Machine Learning get tagged data?
        • Implicit tagging: customer engagement
        • Explicit tagging: manual labor
      • A few strategies to get tagged data for cheap/free:
        • Games (Google Quick Draw: https://quickdraw.withgoogle.com/)
        • Incentivization (extra lives or bonuses in games)
  22. The Wisdom from the Crowd
      Why human input matters: the use case of image colorization
      [Diagram: an image-recognition model is trained on a tagged training data set (watermelon, grapes, bananas, pineapple, orange); colorizing a photo also relies on 'general' knowledge such as "Bananas are generally …"]
      • Colorization is straightforward for humans because they can 'tap' into their general knowledge
      • That 'general' knowledge is obvious for human beings but tedious for machines
  23. Crowdsourcing: Human Wisdom at Scale
  26. Crowdsourcing
      What is Crowdsourcing?
        the process of getting labor or funding, usually online, from a crowd of people
        ➢ Crowdsourcing = 'crowd' + 'outsourcing'
        ➢ The act of taking a function once performed by employees and outsourcing it to an undefined (generally large) network of people in the form of an open call
      History of Crowdsourcing
        • The term was first used in 2005 by the editors at Wired
        • The official definition was published in the Wired article "The Rise of Crowdsourcing", June 2006
        • It describes how businesses were using the Internet to "outsource work to the crowd"
      What Crowdsourcing helps with:
        • Scale → peer-production (for jobs to be performed collaboratively)
        • Reach → connect with a large network of potential laborers (if tasks are undertaken by sole individuals)
  28. The Nature of Crowdsourcing
      Microtasks
        • Data generation: user-generated content such as reviews, pictures, translations, etc.
        • Data validation: validation of translations, etc.
        • Data tagging: image tagging, product categorization, etc.
        • Data curation: curation of news feeds, etc.
      Macrotasks
        • Solution development: algorithm improvement, etc.
        • Crowd contests: design competitions, algorithmic competitions, etc.
      Funding
  35. Some Cool Crowdsourcing Applications
      Mapping
        • Photo Sphere
        • Google Maps crowdsources info for wheelchair-accessible places
      Traffic
        • Google Traffic
        • Waze: traffic reporting app
      Translation
        • Google Translate
      Epidemiology
        • Flu tracking applications
  36. Companies Based on Crowdsourcing
      • Quora is a question-and-answer site where questions are asked, answered, edited and organized by its community of users.
      • Waze is a community-based traffic and navigation app where drivers share real-time traffic and road info.
      • Kaggle is a platform for predictive modelling competitions in which companies post data and data miners compete to produce the best models.
      • Stack Overflow is a platform for users to ask and answer questions and to vote questions and answers up or down and edit them.
      • Flickr is an image and video hosting website that is widely used by bloggers to host images that they embed in social media.
  38. The Challenges of Crowdsourcing
      Reliability
        • Retail: absence of emotional involvement (judges are not actually spending money on items)
        • Waze: locals were sending fake information to limit traffic in their area
      Relevance of knowledge
        • Retail: judges might not have appropriate knowledge of the items they are evaluating
      Subjectivity
        • Search: relevance scores vary depending on profile and personal preferences
      Speed & cost
        • Human evaluations take time and can only be performed sporadically and on samples
        • Not practical for measurement purposes
  42. Crowdsourcing vs. Curated Crowds
      Traditional Crowdsourcing Model
        + Speed: many hands generate light work
        + Lower cost: typically a few pennies per task
        - No quality control
        - Lack of control: little to no incentive to deliver on time
        - High maintenance: clear instructions needed; automated understanding checks
        - Lower reliability: high overlap required
        - Lack of confidentiality: anyone can see your tasks
      Curated Crowd
        + Quality control: judges are subject to quality metrics and removed if they don't deliver the required quality
        + Better quality: very little overlap needed
        + Expertise: judges become experts at the required task
        + Constraints on the crowd: judges are less likely to drop out
        - More expensive: typically the primary source of income for the judges
        - Consistency required: frequent tasks are needed to keep skills sharp
  44. Crowdsourcing Applications in e-Commerce
      Catalog Curation
        • Product description curation
        • Product tagging & categorization
        • Product deduplication
        • Taxonomy testing
      Search Relevance Evaluation
        • Relevance score (query-item pair scores)
        • Engine comparison (ranking-to-ranking)
      Review Moderation
        • Removal/flagging of obscene reviews
      Mystery Shopping
        • Analysis and discovery of new trends
        • Evaluation of new products
        • Competitive analysis
      [Image: the example of Product Tagging]
  48. Use Case: Evaluation of Search Engine Relevance
      Side-by-Side Engine Comparison
        [Judges are shown Ranking A and Ranking B for the same query]
        • Judge 1: prefers ranking A
        • Judge 2: prefers ranking A
        • Judge 3: prefers ranking B
      → Human evaluation makes it possible to measure the intangible with little risk
  49. Use Case: Evaluation of Search Engine Relevance
      Query-Item Relevance Scoring for Measurement of Ranking Quality
        [Two example rankings with per-item judge scores such as 5/5, 4/5, 3/5, 2/5]
      Discounted cumulative gain:
        DCG_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)}
        IDCG_p = \sum_{i=1}^{|REL_p|} \frac{2^{rel_i} - 1}{\log_2(i+1)}
        nDCG_p = \frac{DCG_p}{IDCG_p}
      where rel_i is the graded relevance of the item at position i
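As a worked illustration of the nDCG formula above (not part of the original deck), the short Python sketch below scores a ranking from graded judge scores; it uses the exponential-gain form (2^rel - 1) in both DCG and IDCG so that a perfectly ordered ranking scores exactly 1.0.

```python
# Minimal sketch: compute nDCG from the graded relevance scores judges assign.
import math

def dcg(relevances):
    # sum over positions i (starting at 1) of (2^rel_i - 1) / log2(i + 1)
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    ideal = sorted(relevances, reverse=True)  # IDCG uses the best possible ordering
    return dcg(relevances) / dcg(ideal)

# Judge scores (out of 5) for the items a ranker returned, in ranked order.
print(f"nDCG = {ndcg([5, 5, 5, 4, 3, 2]):.3f}")  # already ideally ordered -> 1.000
print(f"nDCG = {ndcg([2, 3, 4, 5, 5, 5]):.3f}")  # same items, reversed -> lower
```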
  50. Human-in-the-Loop: When Human Beings still Outperform the Machine
      Fact: the brain has 38 petaflops (thousand trillion operations per second) of processing power…
  54. The Dream of Automation
      Automation: the use of various control systems for operating equipment such as machinery and processes with minimal or reduced human intervention
      The 4 Industrial Revolutions
        • FIRST REVOLUTION (1784): mechanical production, railroad, steam power
        • SECOND REVOLUTION (1870): mass production, electrical power, assembly lines
        • THIRD REVOLUTION (1969): automated production, electronics, computers
        • FOURTH REVOLUTION (ongoing): artificial intelligence, big data
      → Automation is not a new idea
      Why?
        • Automate boring/repetitive tasks
        • Perform tasks at scale
        • Perform tasks with enhanced precision
        • Deliver consistent products
        • Use machines where they outperform humans
  56. When Full Automation can't be Achieved… Human-in-the-Loop
      Human-in-the-loop (HITL) is defined as a model or a system that requires human interaction
      The idea of using human beings to enhance the machine is not new
        • We have been doing Human-in-the-Loop all along…
        • Example: autopilot technology for planes
      Human intervention/presence is useful:
        • To handle corner cases (outlier management)
        • To "keep an eye" on the system (sanity check)
        • To correct unwanted behavior (refinement)
        • To validate appropriate behavior (validation)
  59. Human-in-the-Loop Paradigm
      Pareto Principle: aka the 80/20 rule, the law of the vital few, or the principle of factor sparsity; states that, for many events, roughly 80% of the effects come from 20% of the causes
      ML version of the Pareto Principle:
        • Evidence suggests that some of the most accurate ML systems to date need:
          • 80% computer/AI-driven input
          • 19% human input
          • 1% unknown randomness to balance things out
        • The combination of machine and human intervention achieves maximum machine accuracy
      How can human knowledge be incorporated into ML models?
        A. Helping label the original dataset that will be fed into an ML model
        B. Helping correct inaccurate predictions that arise as the system goes live
  63. Human-in-the-Loop Use Case #1
      An example of the HITL approach: face recognition
      [Image: tagged faces labeled Mary, Roberto, Victoria, Laura, Sebastian, Cecelia]
      Accuracy
        • Facebook's DeepFace software reaches 97.25% accuracy
      HITL as a feedback loop
        • When the confidence is below a certain threshold, the system:
          • suggests a label
          • asks the uploader to validate/approve or correct the suggestion
        • The new data is used to improve the accuracy of the algorithm
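The feedback loop described above can be sketched in a few lines; this is a purely illustrative outline (not from the deck), and `model.predict_with_confidence` and `ask_uploader` are hypothetical placeholders for whatever recognition model and UI a real system would expose.

```python
# Hedged sketch of a HITL feedback loop: low-confidence predictions are routed
# to a human, and the human's answers become new tagged training data.
CONFIDENCE_THRESHOLD = 0.90  # assumed value; tuned per application in practice

def label_photo(model, photo, ask_uploader, training_set):
    label, confidence = model.predict_with_confidence(photo)  # hypothetical model API
    if confidence >= CONFIDENCE_THRESHOLD:
        return label  # confident enough: fully automated path
    # Below the threshold: suggest the label and let the uploader confirm or correct it.
    corrected = ask_uploader(photo, suggested=label)
    training_set.append((photo, corrected))  # the human answer becomes tagged data
    return corrected

def retrain(model, training_set):
    # The accumulated human corrections are periodically folded back into the model.
    model.fit(training_set)
```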
  68. Human-in-the-Loop Use Case #2
      An example of the HITL approach: autonomous vehicles
      Teaching the machine
        • Driving systems were trained using a human to oversee the process
      Accuracy considerations
        • The autopilot system is now over 99% accurate
        • However, 99% accuracy means that people can die 1% of the time (!!)
        • Though we have seen huge advances in the accuracy of pure machine-driven systems, they tend to fall short of acceptable accuracy rates
      Corner cases
        • Fun fact: Volvo's self-driving cars fail in Australia because of kangaroos
          [Headline: "Volvo's driverless cars 'confused' by kangaroos"]
        • Reaching 100% is hard because of corner cases
        • A HITL approach helps get the accuracy to ~100%
  71. The Success of Human-in-the-Loop: The Example of Chess
      The Human vs. the Machine
        • In 1997, Chess Master Garry Kasparov was beaten by the IBM supercomputer Deep Blue
      Freestyle or "Advanced" Chess
        • Advanced: a human chess master works with a computer to find the best possible move
        • Freestyle: a team can be made of any combination of human beings + computers
        • In 2005, Steven Cramton, Zackary Stephen and their 3 computers won a Freestyle Chess tournament
      Why it works
        • Computers are great at reading tough tactical situations
        • But humans are better at understanding long-term strategy
        • Humans use computers to limit "blunders", while using their own intuition to force the opponent into board states that confuse the opposing computer(s)
  73. Active Learning: The Best of Both Worlds
  75. Active Learning
      Active Learning: a special case of semi-supervised ML in which a learning algorithm can interactively query the user (oracle) to obtain the desired outputs at new data points, maximizing validity and relevance
      General Strategy
        If D is the entire data set, at each iteration i, D is broken up into three subsets:
        1. D_{K,i}: data points where the label is known
        2. D_{U,i}: data points where the label is unknown
        3. D_{Q,i}: data points for which the label is queried (sometimes, even when the label is known)
      Benefits
        • Query labels only when necessary (lower cost)
      Next Generation Algorithms
        • Proactive learning:
          • relaxes the assumption that the oracle is always right
          • casts the problem as an optimization problem with a budget constraint
  81. Active Learning: How does it Work?
      Machine Learning needs:
        • Logic (algorithm)
        • Data
        • Optimization
        • Feedback ← Human-in-the-Loop
      Active Learning = a Machine Learning algorithm using an "oracle" to reduce mistakes/uncertainty
      Query Strategy: labels are queried for
        • data points for which model uncertainty is high (uncertainty sampling)
        • data points for which the different models of an ensemble method disagree the most (query by committee)
        • data points causing the most changes on the model (expected model change)
        • data points causing overall variance to be high (variance reduction)
      [Loop diagram: the Active Learning Algorithm selects a single example from the Unlabeled Data → the Oracle (Human) provides the correct label → the labeled example is added to the Labeled Data → the Classifier is updated]
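To make the loop on this slide concrete, here is a minimal pool-based uncertainty-sampling sketch (not part of the original deck); it assumes scikit-learn and NumPy arrays, and the `human_oracle` callback stands in for the real human judge.

```python
# Hedged sketch of active learning with uncertainty sampling: at each round the
# single most uncertain unlabeled point is sent to the oracle for labeling.
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X_labeled, y_labeled, X_pool, human_oracle, n_rounds=20):
    # X_pool is expected to be a 2-D NumPy array of still-unlabeled points.
    X_labeled, y_labeled = list(X_labeled), list(y_labeled)
    pool = list(range(len(X_pool)))  # indices of points not yet labeled
    model = LogisticRegression()
    for _ in range(n_rounds):
        model.fit(np.array(X_labeled), np.array(y_labeled))  # update the classifier
        probs = model.predict_proba(X_pool[pool])             # confidence on the pool
        # Uncertainty sampling: pick the point whose top-class probability is lowest.
        chosen = pool.pop(int(np.argmin(probs.max(axis=1))))
        label = human_oracle(X_pool[chosen])                  # query the human oracle
        X_labeled.append(X_pool[chosen])                      # add the labeled example
        y_labeled.append(label)
    return model
```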
  82. Active Learning: How does it Work?
      Human-in-the-Loop Active Learning
      [Flow diagram: Machine Learning Classifier → is the confidence level high? → YES: output the prediction; NO: annotation by a Human Oracle]
      By adding a human feedback loop, we allow the system to:
        • actively learn
        • correct itself where it got it wrong
        • improve the algorithm over iterations
  85. 85. q Machine  Learning  Lifecycle  Management  (Programming  by  Feedback) • Automatic  monitoring  of  input  and  output  values  for  ML  algorithm • An  algorithm  detects  failings  and  outliers  in  real-­‐time  and  suggest  an  action • A  human  validates  the  action,  creating  tagged  data  for  full  automation q Diagnosis  of  Catalog  Data  Issues  (Reinforcement  Learning) • Algorithm  uncovers  demoted  items  and  suggests  most  likely  reason  for  the  demotion • Engineer  manually  confirms/corrects  the  suggestion,  generating  training  data  for  full  automation q Refinement  of  Query  Tagging  Algorithm  (Optimization) • Human  evaluation  team  manually  measures  accuracy  of  query  tagging  model • Mistagged  queries  are  used  to  discover  patterns  specific  to  problematic  queries,  which  are  reported  to  engineers • Sample  is  enriched  with  problematic  queries  (evaluation  team  can  diagnose  problems  with  algorithms) 3  Use  Cases  using  Active  Learning  in  the  context  of  Search/Retail Active  Learning  at  Walmart  e-­‐Commerce
  86. 86. q Machine  Learning  Lifecycle  Management  (Programming  by  Feedback) • Automatic  monitoring  of  input  and  output  values  for  ML  algorithm • An  algorithm  detects  failings  and  outliers  in  real-­‐time  and  suggest  an  action • A  human  validates  the  action,  creating  tagged  data  for  full  automation q Diagnosis  of  Catalog  Data  Issues  (Reinforcement  Learning) • Algorithm  uncovers  demoted  items  and  suggests  most  likely  reason  for  the  demotion • Engineer  manually  confirms/corrects  the  suggestion,  generating  training  data  for  full  automation q Refinement  of  Query  Tagging  Algorithm  (Optimization) • Human  evaluation  team  manually  measures  accuracy  of  query  tagging  model • Mistagged  queries  are  used  to  discover  patterns  specific  to  problematic  queries,  which  are  reported  to  engineers • Sample  is  enriched  with  problematic  queries  (evaluation  team  can  diagnose  problems  with  algorithms) 3  Use  Cases  using  Active  Learning  in  the  context  of  Search/Retail Active  Learning  at  Walmart  e-­‐Commerce
  87. 87. q Machine  Learning  Lifecycle  Management  (Programming  by  Feedback) • Automatic  monitoring  of  input  and  output  values  for  ML  algorithm • An  algorithm  detects  failings  and  outliers  in  real-­‐time  and  suggest  an  action • A  human  validates  the  action,  creating  tagged  data  for  full  automation q Diagnosis  of  Catalog  Data  Issues  (Reinforcement  Learning) • Algorithm  uncovers  demoted  items  and  suggests  most  likely  reason  for  the  demotion • Engineer  manually  confirms/corrects  the  suggestion,  generating  training  data  for  full  automation q Refinement  of  Query  Tagging  Algorithm  (Optimization) • Human  evaluation  team  manually  measures  accuracy  of  query  tagging  model • Mistagged  queries  are  used  to  discover  patterns  specific  to  problematic  queries,  which  are  reported  to  engineers • Sample  is  enriched  with  problematic  queries  (evaluation  team  can  diagnose  problems  with  algorithms) 3  Use  Cases  using  Active  Learning  in  the  context  of  Search/Retail red t-shirt Size M color product  type size Active  Learning  at  Walmart  e-­‐Commerce
  88. Conclusion and Takeaways
      • Why do humans and machines complement each other?
        • Human beings are memory-constrained
        • Computers are knowledge-constrained
      • Tagged data is more important than ever
        • But getting quality data is challenging given the volume of data
        • Crowdsourcing offers more flexibility to tag data at scale
      • The Human-in-the-Loop paradigm
        • Improves the accuracy of machine learning algorithms (classifiers)
        • Many examples of successful endeavors using "Augmented Intelligence"
      • Active Learning is a booming area of ML/AI
  91. Thank You!
