9. @jeffbigham
Using Devices
User + AI + Crowds
Bigham, Jeffrey, et al. "VizWiz: nearly real-time answers to visual questions." UIST 2010.
Brady, Erin, et al. "Visual challenges in the everyday lives of blind people." CHI 2013.
18. @jeffbigham
Insights from VizWiz for AI
Supporting Blind Photography
Jayant, Chandrika, Hanjie Ji, Samuel White, and Jeffrey P. Bigham. "Supporting Blind Photography." ASSETS 2011.
19. @jeffbigham
Insights from VizWiz for AI
Wu, et al. "Automatic Alt-text: Computer-generated Image Descriptions for Blind Users on a Social Network Service." CSCW 2017.
Target What People Want to Know
20. @jeffbigham
Insights from VizWiz for AI
Errors are Important and Not Created Equal
MacLeod, Bennett, Morris, Cutrell, et al. "Understanding Blind People's Experiences with Computer-Generated Captions of Social Media Images." CHI 2017.
Measure and Express Confidence Well
21. @jeffbigham
Future Directions
* Amplify Users with AI Support (blind photography)
* Solve Real Problems (understand user needs)
* Embrace and Understand Errors (all errors are not equal)
* Measure Confidence and Express it Well
22. @jeffbigham
The End
Students and Collaborators
VizWiz Dataset:
http://www.vizwiz.org/data
VizWiz Publications:
http://www.jeffreybigham.com
* Amplify Users with AI Support (blind photography)
* Solve Real Problems (understand user needs)
* Embrace and Understand Errors (all errors are not equal)
* Measure Confidence and Express it Well
Funded by:
Microsoft, NSF, NIDILRR
Editor's Notes
Hi, I’m Jeffrey Bigham, from Carnegie Mellon University. AI has the potential to dramatically expand possibilities for people with disabilities, by making more of the world accessible.
People with disabilities have been among the earliest adopters of AI technology, and have driven some of the biggest human-facing innovations. There’s a lot to learn from their experiences, I think, in how we approach AI-driven applications, as AI makes its way more and more into interactive applications targeting mainstream use.
In my short talk, I’ll focus on what we’ve learned by deploying crowd-powered solutions for people who are blind.
If you think about it, developing technologies for people who are blind should be a grand challenge for computer vision researchers. What could be more core to the computer vision discipline, as a natural application, than reproducing in software the perceptual abilities afforded by the human eye? And there has been a long history of work at the intersection of computer vision and accessibility.
But, sometimes it’s been an afterthought – “hmm, I just did something kind of cool with computer vision, I bet a blind person could use that!” <<CLICK>>
A few years ago, my students and I set out to discover what blind people actually wanted to know – we did that with a crowdsourcing application called VizWiz. To use it, users take a picture, speak a question, and then we arranged for paid crowd workers to answer the questions in less than a minute.
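As a rough sketch of that loop – submit a photo and a spoken question, then wait for the first crowd answer – the snippet below uses an entirely hypothetical HTTP endpoint and field names; it is not the real VizWiz backend.

```python
import time
import requests

# Hypothetical endpoint and field names for illustration only; not the real VizWiz service.
CROWD_API = "https://example.org/crowd"

def ask_vizwiz(image_path, spoken_question, timeout_s=60):
    """Submit a photo plus question, then poll until a crowd answer arrives."""
    with open(image_path, "rb") as f:
        resp = requests.post(f"{CROWD_API}/questions",
                             files={"image": f},
                             data={"question": spoken_question})
    question_id = resp.json()["id"]

    deadline = time.time() + timeout_s
    while time.time() < deadline:
        answer = requests.get(f"{CROWD_API}/questions/{question_id}/answer").json()
        if answer.get("text"):      # take the first worker answer that comes back
            return answer["text"]
        time.sleep(2)               # poll every couple of seconds
    return None                     # no answer within the target minute
```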
We deployed this application, and more than 10,000 users asked over 100,000 questions with it. There was a wide range of questions. From, what does this thermostat say? <<CLICK>> To,
What does the sky look like – this user asked this question every 5 minutes over the course of an hour, as the sky went from light to dark. He was watching the sun set.
To, credit card information (which we discouraged).And, how do I use this appliance… which was really hard to answer with natural language.
We took this data, removed those containing personal information, and released a dataset of about 50,000 images, questions, and answers. Our hope is that this dataset, of real questions asked by blind people in their everyday lives, may inspire computer vision researchers to work on this problem. A true grand challenge for their field. When we were getting started, computer vision researchers would tell me that it’s interesting, but our dataset, covering so many different topics, with badly taken photographs, was too hard. Fortunately, deep learning has come along. This dataset is still really hard, and we’re far from being able to answer many if not most of the questions in it automatically, but with deep learning has come confidence. And, I’ll take that. There are a number of papers out there that have used our dataset in various ways. Microsoft’s own Seeing AI, a cool new app released just a few days ago, used it. So, you should definitely check that out.
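For anyone who wants to poke at the released data, here is a minimal sketch of iterating over it. The file name and field layout ("image", "question", "answers") are assumptions for illustration; the actual format is documented at http://www.vizwiz.org/data.

```python
import json

# Assumed layout: a JSON list of records with image filename, question, and crowd answers.
# Verify the actual schema at http://www.vizwiz.org/data before relying on these keys.
with open("vizwiz_annotations.json") as f:
    records = json.load(f)

for rec in records[:5]:
    answers = [a["answer"] for a in rec.get("answers", [])]
    print(rec["image"], "|", rec["question"], "->", answers)
```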
We’ve used it to inspire some of our own work. One thing we noticed was that a lot of people asked about how to use devices. We got lots of pictures of devices, and VizWiz didn’t have a great way to answer these questions. It’s difficult to describe the layout of a complicated device interface in a way people can understand, and it’s nearly impossible for them to find the correct buttons on the flat interfaces that are becoming more common.
In response, we created VizLens, a modified version of VizWiz that enables blind people to use appliances. They take a picture of the interface they want to use, crowd workers label it, and then computer vision running on the phone is able to tell them what button their finger is over. The computer vision works really well because we’re not trying to solve the general problem of recognizing any interface, but rather recognizing the same interface, from the same camera, from roughly the same perspective, and in roughly the same lighting conditions.
It is thus able to work across a wide variety of devices.
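To give a feel for how constrained that matching problem is, here is a sketch using off-the-shelf feature matching with OpenCV. This is not the VizLens implementation; the crowd-labeled button boxes and the fingertip detector are assumed inputs.

```python
import cv2
import numpy as np

def _gray(img):
    return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img

def button_under_finger(reference_img, labeled_boxes, frame, fingertip_xy):
    """reference_img: the crowd-labeled photo of the appliance.
    labeled_boxes: {name: (x, y, w, h)} boxes drawn by crowd workers on the reference photo.
    frame: the current camera frame.
    fingertip_xy: (x, y) of the user's fingertip in the frame, from a separate detector."""
    orb = cv2.ORB_create(1000)
    kp_ref, des_ref = orb.detectAndCompute(_gray(reference_img), None)
    kp_frm, des_frm = orb.detectAndCompute(_gray(frame), None)
    if des_ref is None or des_frm is None:
        return None

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_frm, des_ref)
    if len(matches) < 4:
        return None

    # Homography mapping live-frame coordinates back onto the labeled reference photo.
    src = np.float32([kp_frm[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_ref[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None

    fx, fy = cv2.perspectiveTransform(np.float32([[fingertip_xy]]), H)[0][0]
    for name, (x, y, w, h) in labeled_boxes.items():
        if x <= fx <= x + w and y <= fy <= y + h:
            return name   # e.g. "power", "start", "defrost"
    return None
```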
In an extension, we modified the workflow slightly to have users add a dollar bill to the device. This provides a fiducial marker of known size.
From this, we can automatically create a 3D model for a tactile overlay for the device interface.
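The scaling step itself is just arithmetic: the bill's known physical width and its measured pixel width give millimeters per pixel, which turns the labeled button boxes into physical dimensions for the overlay. The ~156 mm bill width and the box format below are assumptions for illustration.

```python
BILL_WIDTH_MM = 156.0  # approximate width of a US dollar bill; assumed for illustration

def boxes_to_mm(labeled_boxes, bill_pixel_width):
    """Convert crowd-labeled button boxes from pixels to millimeters so they can be
    used when generating the 3D-printable tactile overlay."""
    mm_per_px = BILL_WIDTH_MM / bill_pixel_width
    return {name: tuple(v * mm_per_px for v in box)
            for name, box in labeled_boxes.items()}

# e.g. a 60-pixel-wide button in a photo where the bill spans 400 pixels is ~23 mm wide.
```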
The user can then send it off to a remote service for printing.
And, then attach it to their device, providing an independent way to make their devices accessible.
So, by releasing a general-purpose crowd-powered system, we were able to motivate a number of lines of research: both computer vision research on the hard general problem of answering arbitrary visual questions, and research into systems that combine the user, crowds, and computer vision to target specific tasks.
This experience has led to some generalizable thoughts for AI systems and research in the future.
Early on in the project, Justin Romack, an independent blind blogger, recorded a video about how to use VizWiz.
I think it contains examples of the themes we’ve extracted from our experience. Here’s Justin asking his question: <<CLICK>>
One thing we noticed pretty quickly is that it’s hard for blind people to take pictures, but a good picture is vital for either human- or machine-powered vision to work well. As a result, we (and others) have done a lot of work on blind photography, to help blind people take better pictures. The lesson here is that it will still be a while until AI can take over many tasks fully. But AI can be useful now to support and amplify what users are able to do independently.
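One simple form of that support is giving feedback on photo quality before the question is ever sent. The sketch below uses a standard blur heuristic (variance of the Laplacian) and a brightness check; it is only an illustration, not the technique from the blind-photography work cited above, and the thresholds are guesses that would need tuning.

```python
import cv2

BLUR_THRESHOLD = 100.0   # assumed cutoff for Laplacian variance; tune on real photos
DARK_THRESHOLD = 40      # assumed cutoff for mean brightness (0-255)

def photo_feedback(image_path):
    """Return a spoken-style hint so the user can retake a blurry or too-dark photo."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        return "Could not read the photo. Please try again."
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    brightness = gray.mean()
    if sharpness < BLUR_THRESHOLD:
        return "The photo looks blurry. Hold the phone still and try again."
    if brightness < DARK_THRESHOLD:
        return "The photo looks dark. Try turning on a light."
    return "The photo looks OK. Sending your question."
```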
In the video, the first answer comes back from a service that identifies the main object in the photo.
<<CLICK>> Of course, Justin knew that it was a thermostat. And so, while it might be obvious, one lesson is to work toward providing the information that people want to know. <<CLICK>> A lot of effort has been put into providing fairly generic labels for images… for instance, Facebook has started tagging all of their images with such labels. This is great, but I think we need to not pretend that this is likely to answer all questions, or even a lot of them. People, blind people included, are amazing at inferring information from context… a lot of times, the easiest labels to provide are the easiest to infer.
So, how do we start answering what people want to know, which is likely much more contextual?
A few seconds later, Justin gets an answer back from the crowd. <<CLICK>> What’s interesting here is that the crowd wasn’t able to answer Justin’s question – but they were able to appropriately convey that inability, and provide information that allowed Justin to figure it out. Errors are important and not created equal; oftentimes, all we look at in AI is an accuracy rate. One result I really like, which highlights how important this is, comes from Microsoft Research at this past CHI, in which they explored how blind users interpreted correct and incorrect captions for social media images. Even when the captioning was wrong, they would often invent stories for why the captioning might be right, e.g., why a man doing a trick on a skateboard might be the image used in a Hillary Clinton campaign post. Algorithms are going to get things wrong, and so an important problem is improving how we measure and express confidence. A lot of models will output confidence values, but they’ve rarely been tuned to be accurate. A 50% confidence value doesn’t mean that 50% of the time the answer will be right.
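One concrete way to check that is a reliability measurement: bin predictions by reported confidence and compare each bin's average confidence to its actual accuracy; the weighted gap is often summarized as expected calibration error. A minimal sketch, with the (confidence, correct) pairs assumed to come from your own model's outputs:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: model-reported probabilities in [0, 1]; correct: 1/0 per prediction.
    A well-calibrated model has bin accuracy roughly equal to bin confidence, so ECE is near 0."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
        ece += (in_bin.sum() / len(confidences)) * gap
    return ece

# A model that reports 90% confidence but is right only half the time is badly calibrated:
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0]))  # -> 0.4
```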
To summarize, what I’d like to see more work in AI focus on is:
READ SLIDE
While our work has been conducted in the context of visual assistance for blind people, these are really important lessons for anyone hoping to build interactive AI systems of the future.
As has often been the case, people with disabilities are leading from the front in understanding, using, and developing smart technology.