9. @jeffbigham
Using Devices
User + AI + Crowds
Bigham, Jeffrey, et al. "VizWiz: nearly real-time answers to visual questions." UIST 2010.
Brady, Erin, et al. "Visual challenges in the everyday lives of blind people." CHI 2013.
18. @jeffbigham
Insights from VizWiz for AI
Supporting Blind Photography
Jayant, Chandrika, Hanjie Ji, Samuel White, and Jeffrey P. Bigham. "Supporting Blind Photography." ASSETS 2011.
19. @jeffbigham
Insights from VizWiz for AI
Wu, et al. "Automatic Alt-text: Computer-generated Image Descriptions for Blind Users on a Social Network Service." CSCW 2017.
Target What People Want to Know
20. @jeffbigham
Insights from VizWiz for AI
Errors are Important and Not Created Equal
MacLeod, Bennett, Morris, Cutrell, et al. "Understanding Blind People's Experiences with Computer-Generated Captions of Social Media Images." CHI 2017.
Measure and Express Confidence Well
21. @jeffbigham
Future Directions
* Amplify Users with AI Support (blind photography)
* Solve Real Problems (understand user needs)
* Embrace and Understand Errors (all errors are not equal)
* Measure Confidence and Express it Well
22. @jeffbigham
The End
Students and Collaborators
VizWiz Dataset:
http://www.vizwiz.org/data
VizWiz Publications:
http://www.jeffreybigham.com
* Amplify Users with AI Support (blind photography)
* Solve Real Problems (understand user needs)
* Embrace and Understand Errors (all errors are not equal)
* Measure Confidence and Express it Well
Funded by:
Microsoft, NSF, NIDILRR
Editor's Notes
Hi, I’m Jeffrey Bigham, from Carnegie Mellon University. AI has the potential to dramatically expand possibilities for people with disabilities, by making more of the world accessible.
People with disabilities have been among the earliest adopters of AI technology, and have driven some of the biggest human-facing innovations. There’s a lot to learn from their experiences, I think, in how we approach AI-driven applications, as AI makes its way more and more into interactive applications targeting mainstream use.
In my short talk, I’ll focus on what we’ve learned by deploying crowd-powered solutions for people who are blind.
If you think about it, developing technologies for people who are blind should be a grand challenge for computer vision researchers. What could be more core to the computer vision discipline, as a natural application, than reproducing in software the perceptual abilities afforded by the human eye? And there has been a long history of work at the intersection of computer vision and accessibility.
But, sometimes it’s been an afterthought – “hmm, I just did something kind of cool with computer vision, I bet a blind person could use that!” <<CLICK>>
A few years ago, my students and I set out to discover what blind people actually wanted to know – we did that with a crowdsourcing application called VizWiz. To use it, users take a picture, speak a question, and then we arranged for paid crowd workers to answer the questions in less than a minute.
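As a rough sketch of that loop – submit a photo and a spoken question, then wait for the first crowd answer – the snippet below uses an entirely hypothetical HTTP endpoint and field names; it is not the real VizWiz backend.

```python
import time
import requests

# Hypothetical endpoint and field names for illustration only; not the real VizWiz service.
CROWD_API = "https://example.org/crowd"

def ask_vizwiz(image_path, spoken_question, timeout_s=60):
    """Submit a photo plus question, then poll until a crowd answer arrives."""
    with open(image_path, "rb") as f:
        resp = requests.post(f"{CROWD_API}/questions",
                             files={"image": f},
                             data={"question": spoken_question})
    question_id = resp.json()["id"]

    deadline = time.time() + timeout_s
    while time.time() < deadline:
        answer = requests.get(f"{CROWD_API}/questions/{question_id}/answer").json()
        if answer.get("text"):      # take the first worker answer that comes back
            return answer["text"]
        time.sleep(2)               # poll every couple of seconds
    return None                     # no answer within the target minute
```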
We deployed this application, and more than 10,000 users asked over 100,000 questions with it. There was a wide range of questions. From, what does this thermostat say? <<CLICK>> To,
What does the sky look like – this user asked this question every 5 minutes over the course of an hour, as the sky went from light to dark. He was watching the sun set.
To, credit card information (which we discouraged).And, how do I use this appliance… which was really hard to answer with natural language.
We took this data, removed those containing personal information, and released a dataset of about 50,000 images, questions, and answers. Our hope is that this dataset, of real questions asked by blind people in their everyday lives, may inspire computer vision researchers to work on this problem. A true grand challenge for their field. When we were getting started, computer vision researchers would tell me that it’s interesting, but our dataset, covering so many different topics, with badly taken photographs, was too hard. Fortunately, deep learning has come along. This dataset is still really hard, and we’re far from being able to answer many if not most of the questions in it automatically, but with deep learning has come confidence. And, I’ll take that. There are a number of papers out there that have used our dataset in various ways. Microsoft’s own Seeing AI, a cool new app released just a few days ago, used it. So, you should definitely check that out.
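For anyone who wants to poke at the released data, here is a minimal sketch of iterating over it. The file name and field layout ("image", "question", "answers") are assumptions for illustration; the actual format is documented at http://www.vizwiz.org/data.

```python
import json

# Assumed layout: a JSON list of records with image filename, question, and crowd answers.
# Verify the actual schema at http://www.vizwiz.org/data before relying on these keys.
with open("vizwiz_annotations.json") as f:
    records = json.load(f)

for rec in records[:5]:
    answers = [a["answer"] for a in rec.get("answers", [])]
    print(rec["image"], "|", rec["question"], "->", answers)
```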
We’ve used it to inspire some of our own work. One thing we noticed was that a lot of people asked about how to use devices. We got lots of pictures of devices, and VizWiz didn’t have a great way to answer these questions. It’s difficult to describe the layout of a complicated device interface in a way people can understand, and it’s nearly impossible for them to find the correct buttons on the flat interfaces that are becoming more common.
In response, we created VizLens, a modified version of VizWiz that enables blind people to use appliances. They take a picture of the interface they want to use, crowd workers label it, and then computer vision running on the phone is able to tell them what button their finger is over. The computer vision works really well because we’re not trying to solve the general problem of recognizing any interface, but rather recognizing the same interface, from the same camera, from roughly the same perspective, and in roughly the same lighting conditions.
It is thus able to work across a wide variety of devices.
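To give a feel for how constrained that matching problem is, here is a sketch using off-the-shelf feature matching with OpenCV. This is not the VizLens implementation; the crowd-labeled button boxes and the fingertip detector are assumed inputs.

```python
import cv2
import numpy as np

def _gray(img):
    return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img

def button_under_finger(reference_img, labeled_boxes, frame, fingertip_xy):
    """reference_img: the crowd-labeled photo of the appliance.
    labeled_boxes: {name: (x, y, w, h)} boxes drawn by crowd workers on the reference photo.
    frame: the current camera frame.
    fingertip_xy: (x, y) of the user's fingertip in the frame, from a separate detector."""
    orb = cv2.ORB_create(1000)
    kp_ref, des_ref = orb.detectAndCompute(_gray(reference_img), None)
    kp_frm, des_frm = orb.detectAndCompute(_gray(frame), None)
    if des_ref is None or des_frm is None:
        return None

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_frm, des_ref)
    if len(matches) < 4:
        return None

    # Homography mapping live-frame coordinates back onto the labeled reference photo.
    src = np.float32([kp_frm[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_ref[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None

    fx, fy = cv2.perspectiveTransform(np.float32([[fingertip_xy]]), H)[0][0]
    for name, (x, y, w, h) in labeled_boxes.items():
        if x <= fx <= x + w and y <= fy <= y + h:
            return name   # e.g. "power", "start", "defrost"
    return None
```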
In an extension, we modified the workflow slightly to have users add a dollar bill to the device. This provides a fiducial marker of known size.
From this, we can automatically create a 3D model for a tactile overlay for the device interface.
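The scaling step itself is just arithmetic: the bill's known physical width and its measured pixel width give millimeters per pixel, which turns the labeled button boxes into physical dimensions for the overlay. The ~156 mm bill width and the box format below are assumptions for illustration.

```python
BILL_WIDTH_MM = 156.0  # approximate width of a US dollar bill; assumed for illustration

def boxes_to_mm(labeled_boxes, bill_pixel_width):
    """Convert crowd-labeled button boxes from pixels to millimeters so they can be
    used when generating the 3D-printable tactile overlay."""
    mm_per_px = BILL_WIDTH_MM / bill_pixel_width
    return {name: tuple(v * mm_per_px for v in box)
            for name, box in labeled_boxes.items()}

# e.g. a 60-pixel-wide button in a photo where the bill spans 400 pixels is ~23 mm wide.
```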
The user can then send it off to a remote service for printing.
And, then attach it to their device, providing an independent way to make their devices accessible.
So, by releasing a general-purpose crowd-powered system, we were able to motivate a number of lines of research: both computer vision research on the hard general problem of answering arbitrary visual questions, and research into systems that combine the user, crowds, and computer vision to target specific tasks.
This experience has led to some generalizable thoughts for AI systems and research in the future.
Early on in the project, Justin Romack, an independent blind blogger, recorded a video about how to use VizWiz.
I think it contains examples of the themes we’ve extracted from our experience. Here’s Justin asking his question: <<CLICK>>
One thing we noticed pretty quickly is that it’s hard for blind people to take pictures, but a good picture is vital for either human- or machine-powered vision to work well. As a result, we (and others) have done a lot of work on blind photography, to help blind people take better pictures. The lesson here is that it will still be a while until AI can take over many tasks fully. But AI can be useful now to support and amplify what users are able to do independently.
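One simple form of that support is giving feedback on photo quality before the question is ever sent. The sketch below uses a standard blur heuristic (variance of the Laplacian) and a brightness check; it is only an illustration, not the technique from the blind-photography work cited above, and the thresholds are guesses that would need tuning.

```python
import cv2

BLUR_THRESHOLD = 100.0   # assumed cutoff for Laplacian variance; tune on real photos
DARK_THRESHOLD = 40      # assumed cutoff for mean brightness (0-255)

def photo_feedback(image_path):
    """Return a spoken-style hint so the user can retake a blurry or too-dark photo."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        return "Could not read the photo. Please try again."
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    brightness = gray.mean()
    if sharpness < BLUR_THRESHOLD:
        return "The photo looks blurry. Hold the phone still and try again."
    if brightness < DARK_THRESHOLD:
        return "The photo looks dark. Try turning on a light."
    return "The photo looks OK. Sending your question."
```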
In the video, the first answer comes back from a service that identifies the main object in the photo.
<<CLICK>> Of course, Justin knew that it was a thermostat. And so, while it might be obvious, one lesson is to work toward providing the information that people want to know. <<CLICK>> A lot of effort has been put into providing fairly generic labels for images… for instance, Facebook has started tagging all of their images with such labels. This is great, but I think we need to not pretend that this is likely to answer all questions, or even a lot of them. People, blind people included, are amazing at inferring information from context… a lot of times, the easiest labels to provide are the easiest to infer.
So, how do we start answering what people want to know, which is likely much more contextual?
A few seconds later, Justin gets an answer back from the crowd. <<CLICK>> What’s interesting here is that the crowd wasn’t able to answer Justin’s question – but they were able to appropriately convey that inability, and provide information that allowed Justin to figure it out. Errors are important and not created equal; oftentimes, all we look at in AI is an accuracy rate. One result I really like, which highlights how important this is, comes from Microsoft Research at this past CHI, in which they explored how blind users interpreted correct and incorrect captions for social media images. Even when the captioning was wrong, they would often invent stories for why the captioning might be right, e.g., why a man doing a trick on a skateboard might be the image used in a Hillary Clinton campaign post. Algorithms are going to get things wrong, and so an important problem is improving how we measure and express confidence. A lot of models will output confidence values, but they’ve rarely been tuned to be accurate. A 50% confidence value doesn’t mean that 50% of the time the answer will be right.
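One concrete way to check that is a reliability measurement: bin predictions by reported confidence and compare each bin's average confidence to its actual accuracy; the weighted gap is often summarized as expected calibration error. A minimal sketch, with the (confidence, correct) pairs assumed to come from your own model's outputs:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: model-reported probabilities in [0, 1]; correct: 1/0 per prediction.
    A well-calibrated model has bin accuracy roughly equal to bin confidence, so ECE is near 0."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
        ece += (in_bin.sum() / len(confidences)) * gap
    return ece

# A model that reports 90% confidence but is right only half the time is badly calibrated:
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0]))  # -> 0.4
```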
To summarize, what I’d like to see more work in AI focus on is:
READ SLIDE
While our work has been conducted in the context of visual assistance for blind people, these are really important lessons for anyone hoping to build interactive AI systems of the future.
As has often been the case, people with disabilities are leading from the front in understanding, using, and developing smart technology.