Eyesight Sharing in Blind Grocery Shopping: Remote P2P Caregiving through Cloud Computing
Vladimir Kulyukin, Tanwir Zaman, Abhishek Andhavarapu,
and Aliasgar Kutiyanawala
Department of Computer Science
Utah State University
Logan, UT, USA
{vladimir.kulyukin}@usu.edu
Abstract. Product recognition continues to be a major access barrier for visual-
ly impaired (VI) and blind individuals in modern supermarkets. R&D ap-
proaches to this problem in the assistive technology (AT) literature vary from
automated vision-based solutions to crowdsourcing applications where VI cli-
ents send image identification requests to web services. The former struggle
with run-time failures and scalability while the latter must cope with concerns
about trust, privacy, and quality of service. In this paper, we investigate a mo-
bile cloud computing framework for remote caregiving that may help VI and
blind clients with product recognition in supermarkets. This framework empha-
sizes remote teleassistance and assumes that clients work with dedicated care-
givers (helpers). Clients tap on their smartphones’ touchscreens to send images
of products they examine to the cloud, where the SURF algorithm matches the
incoming image against its image database. Images along with the names of the
top 5 matches are sent to remote sighted helpers via push notification services.
A helper confirms the product’s name, if it is in the top 5 matches, or speaks or
types the product’s name, if it is not. Basic quality of service is ensured through
human eyesight sharing even when image matching does not work well. We
implemented this framework in a module called EyeShare on two Android
2.3.3/2.3.6 smartphones. EyeShare was tested in three experiments with one
blindfolded subject: one lab study and two experiments in Fresh Market, a su-
permarket in Logan, Utah. The results of our experiments show that the pro-
posed framework may be used as a product identification solution in supermar-
kets.
1 Introduction
The term teleassistance covers a wide range of technologies that enable VI and blind
individuals to transmit video and audio data to remote caregivers and receive audio
assistance [1]. Research evidence suggests that the availability of remote caregiving
reduces the psychological stress on VI and blind individuals when they perform vari-
ous tasks in different environments [2].
A typical example of how teleassistance is used for blind navigation is the system
developed by Bujacz et al. [1]. The system consists of two notebook computers: one
is carried by the VI traveler in a backpack and the other is used by the remote sighted
caregiver. The traveler transmits video through a chest-mounted USB camera. The
traveler wears a headset (an earphone and a microphone) to communicate with the
caregiver. Several indoor navigation experiments showed that VI travelers walked
faster, at a steadier pace, and were able to navigate more easily when assisted by
remote guides than when they navigated the same routes by themselves.
Our research group has applied teleassistance to blind shopping in ShopMobile, a
mobile shopping system for VI and blind individuals [3]. Our end objective is to ena-
ble VI and blind individuals to shop independently using only their smartphones.
ShopMobile is our most recent system for accessible blind shopping that follows Ro-
boCart and ShopTalk [4]. The system has three software modules: an eyes-free bar-
code scanner, an OCR engine, and a teleassistance module called TeleShop. The eyes-
free barcode scanner allows VI shoppers to scan UPC barcodes on products and MSI
barcodes on shelves. The OCR engine is being developed to extract nutrition facts
from nutrition tables available on many product packages. TeleShop provides a
teleassistance backup in situations when the barcode scanner or the OCR engine
malfunctions.
The current implementation of TeleShop consists of a server running on the VI
shopper's smartphone (Google Nexus One with Android 2.3.3/2.3.6) and a client GUI
module running on the remote caregiver's computer. All client-server communication
occurs over UDP. Images from the phone camera are continuously transmitted to the
client GUI. The caregiver can start, stop, and pause the incoming image stream and
change image resolution and quality. Images of high resolution and quality provide
more reliable detail but may cause the video stream to become choppy. Lower resolu-
tion images result in smoother video streams but provide less detail. The pause option
is for holding the current image on the screen.
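Since a camera frame rarely fits in a single UDP datagram, streaming images this way requires chunking each JPEG on the phone and reassembling it in the client GUI. The sketch below is a hypothetical illustration of that idea, not TeleShop's actual wire format: each datagram is tagged with a frame id, a chunk index, and the total chunk count.

```python
MAX_DGRAM = 1400  # stay under a typical Ethernet MTU to avoid IP fragmentation

def send_frame(sock, addr, frame_id, jpeg_bytes):
    """Split one JPEG frame into numbered UDP datagrams.

    Each datagram carries a 6-byte header: frame id (2 bytes),
    chunk index (2 bytes), and total chunk count (2 bytes).
    """
    payload = MAX_DGRAM - 6
    chunks = [jpeg_bytes[i:i + payload]
              for i in range(0, len(jpeg_bytes), payload)]
    for idx, chunk in enumerate(chunks):
        header = (frame_id.to_bytes(2, "big")
                  + idx.to_bytes(2, "big")
                  + len(chunks).to_bytes(2, "big"))
        sock.sendto(header + chunk, addr)

def reassemble(datagrams):
    """Rebuild a frame from datagrams (any order); None if any chunk is missing."""
    parts, total = {}, None
    for dgram in datagrams:
        idx = int.from_bytes(dgram[2:4], "big")
        total = int.from_bytes(dgram[4:6], "big")
        parts[idx] = dgram[6:]
    if total is None or len(parts) != total:
        return None
    return b"".join(parts[i] for i in range(total))
```

Dropping an incomplete frame and waiting for the next one, as `reassemble` does, matches the choppy-but-recoverable behavior of a UDP video stream.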
TeleShop has so far been evaluated in two laboratory studies with Wi-Fi and 3G
[3]. The first study was done with two sighted students, Alice and Bob. The second
study was done with a married couple: a completely blind person (Carl) and his sight-
ed wife (Diana). For both studies, we assembled four plastic shelves in our laboratory
and stocked them with empty boxes, cans, and bottles to simulate an aisle in a grocery
store. The shopper and the caregiver were in separate rooms. In the first study, we
blindfolded Bob to act as a VI shopper. The studies were done on two separate days.
The caregivers were given a list of nine products and were asked to help the shoppers
find the products and read the nutrition facts on the products' packages or bottles. A
voice connection was established between the shopper and the caregiver via a regular
phone call. In the first study, retrieving a product from the shelf and reading its
nutrition facts took Alice and Bob an average of 57.22 and 86.5 seconds, respectively.
The corresponding times for Carl and Diana were 19.33 and 74.8 seconds [3].
In this paper, we present an extension of TeleShop, called EyeShare, that leverages
cloud computing to assist VI and blind shoppers (clients) with product recognition in
supermarkets. The client takes a still image of the product that he or she currently
examines and sends it to the cloud. The image is processed by an open source object
recognition software application that runs on a cloud server and returns the top 5
matches from its product database. The number 5 was chosen because a 5-item list easily
fits on one Google Nexus One screen. The matches, in the form of a list of product
names, are sent to the helper along with the original image through a push notification
service. The helper uses his or her smartphone to select the correct product name from
the list or, if the product’s name is not found among the matches, to speak it into the
smartphone. If speech recognition (SR) does not work, the helper types in the prod-
uct’s name. This framework is flexible in that various image recognition algorithms
can be tested in the cloud. It is also possible to use no image recognition, in which case
all product recognition is done by the sighted caregiver.
The remainder of our paper is organized as follows. In Section 2, we present our
cloud computing framework for remote caregiving with which mobile devices form
ad hoc peer-to-peer (P2P) communication networks. In Section 3, we describe three
experiments in two different environments: a laboratory and a local supermarket
where a blindfolded individual and a remote sighted caregiver evaluated the system
on different products. In Section 4, we present the results of our experiments. In
Section 5, we discuss our investigation.
Fig. 1. Cloud Computing Framework for Remote Caregiving.
2 A Cloud Computing Framework for Remote P2P Caregiving
The cloud computing framework we have implemented consists of mobile devices
that communicate with each other in an ad hoc P2P network. The devices have
Google accounts for authentication and are registered with Google's C2DM (cloud to
device messaging) service (http://code.google.com/android/c2dm/), a
push notification service that allocates unique IDs to registered devices. Our frame-
work assumes that the cloud computing services run on Amazon's Elastic Compute
Cloud (EC2) (http://aws.amazon.com/ec2/). Other cloud computing ser-
vices may be employed. We configured an Amazon EC2 Linux server with a 1 GHz
processor and 512 MB RAM. The server runs an OpenCV 2.3.3
(http://opencv.willowgarage.com/wiki/) image matching application.
Product images are saved in a MySQL database. The use of this framework requires
that clients and helpers download the client and caregiver applications on their
smartphones. The clients and helpers subsequently find each other and form an ad hoc
P2P network via C2DM registration IDs.
Figure 1 shows this framework in action. A client sends a help request (Step 1). In
EyeShare, this request consists of a product image. However, in principle, this request
can be anything transmittable over available wireless channels such as Wi-Fi, 3G, 4G,
Bluetooth, etc. The image is received by the Amazon EC2 Linux server where it is
matched against the images in the MySQL database.
Our image matching application uses the SURF algorithm [5]. The matching op-
eration returns the top 5 matches and sends the names of the corresponding products
along with the URL that contains the client’s original image to the C2DM service
(Step 2). Thus, the image is transmitted only once – in the help request. C2DM for-
wards the message to the caregiver's smartphone (Step 3). The helper confirms the
product’s name by selecting it from the list of the top 5 matches. If the top matches
are incorrect, the helper uses SR to speak the product’s name or, if SR does not work
or is not available, types it in on the touchscreen. If the helper cannot determine the
product’s name from the image, the helper sends a resend request to the client. The
helper’s message goes back to the C2DM service (Step 4) and then on to the client's
smartphone (Step 5). The helper application is designed in such a way that the helper
does not have to interrupt his or her smartphone activities for too long to render assistance.
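The server-side part of Steps 1 and 2 can be made concrete with a short sketch. This is a minimal illustration under assumptions of our own: the scoring of SURF correspondences per database product and the delimiter-based message format are hypothetical, not the paper's actual implementation.

```python
def top5(match_scores):
    """Rank database products by the number of good SURF correspondences
    (assumed scoring) and keep the five best -- enough to fill one
    Google Nexus One screen."""
    ranked = sorted(match_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:5]]

def build_helper_message(client_reg_id, match_scores, image_url):
    """Pack the client's registration ID, the top-5 product names, and the
    URL of the client's image into one formatted string for the push
    payload. The '|' and ';' delimiters are illustrative only."""
    return "|".join([client_reg_id, ";".join(top5(match_scores)), image_url])
```

A usage example: `build_helper_message("REG42", {"Soup": 25, "Oats": 12}, "http://example.org/img.jpg")` yields one string the helper application can split apart on arrival.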
2.1 Android Cloud to Device Messaging (C2DM) Framework
C2DM (http://code.google.com/android/c2dm/) takes care of message
queuing and delivery. Push notifications ensure that the application does not need to
keep polling the cloud server for new incoming requests. C2DM wakes up the An-
droid application when messages are received through intent broadcasts. However,
the application must be set up with the proper C2DM broadcast receiver permissions.
In EyeShare, C2DM is used in two separate activities. First, C2DM forwards the mes-
sage from the server to the helper application. This message consists of a formatted
string of the client registration ID, the names of the top 5 product matches, and the
URL containing the client’s image. Clients’ images are temporarily saved on the
cloud-based Linux server and removed as soon as the corresponding help requests are
processed. Second, C2DM is used when helper messages are sent back to clients.
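Pushing a message through C2DM amounts to one authenticated HTTP POST per recipient. The sketch below uses the C2DM endpoint and GoogleLogin authorization header as they were documented at the time (C2DM has since been deprecated in favor of its successors); the payload field names are our own illustrative choices.

```python
import urllib.parse
import urllib.request

C2DM_URL = "https://android.apis.google.com/c2dm/send"

def build_c2dm_request(auth_token, registration_id, payload):
    """Build the POST request that pushes one message to a device.

    Each payload key becomes a `data.<key>` form field, which C2DM
    delivers to the Android application in an intent broadcast.
    """
    fields = {"registration_id": registration_id, "collapse_key": "eyeshare"}
    for key, value in payload.items():
        fields["data." + key] = value
    return urllib.request.Request(
        C2DM_URL,
        data=urllib.parse.urlencode(fields).encode(),
        headers={"Authorization": "GoogleLogin auth=" + auth_token},
    )

# Sending is then: urllib.request.urlopen(build_c2dm_request(...))
```

Because delivery is push-based, the helper application registers a broadcast receiver once and then sleeps until a message like this arrives.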
2.2 Image Matching
We have used SURF (Speeded Up Robust Features) [5] as a black box image match-
ing algorithm in our cloud server. SURF extracts unique key points and descriptors
from images and later uses them to match indexed images against incoming images.
SURF uses an intermediate image representation called Integral Image that is com-
puted from the input image. This intermediate representation speeds up calculations
over rectangular areas. Each entry of the integral image is the sum of the pixel values
in the rectangle spanned by the origin and that entry's x,y coordinates. This makes the
computation time of rectangular sums invariant to their size and is useful in matching
large images. The SURF detector is
based on the determinant of the Hessian matrix. The SURF descriptor describes how
pixel intensities are distributed within a scale dependent neighborhood of each interest
point detected by Fast Hessian. Object detection using SURF is scale and rotation
invariant and does not require long training. The fact that SURF is rotation invariant
makes the algorithm useful in situations where image matching works with object
images taken at different orientations than the images of the same objects used in
training.
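The integral image idea can be illustrated directly. The sketch below shows the data structure itself, not SURF's own box-filter code: two cumulative sums build the table, after which any rectangular sum costs at most four lookups, regardless of the rectangle's size.

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img over the rectangle from (0, 0) to (y, x), inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1] in constant time via
    the inclusion-exclusion of four integral-image entries."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total
```

This constant-time box sum is what lets SURF's Fast Hessian detector evaluate its box filters at many scales without rescaling the image.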
3 Experiments
We evaluated EyeShare in product recognition experiments at two locations. The first
study was conducted in our laboratory. The second and third studies were conducted
at Fresh Market, a local supermarket in Logan, Utah.
3.1 A Laboratory Study
We assembled four shelves in our laboratory and placed on them 20 products: bottles,
boxes, and cans. The same setup was successfully used in our previous experiments
on accessible blind shopping [3, 4]. We created a database of 100 images. Each of the
20 products on the shelves had 5 images taken at different orientations. The SURF
algorithm was trained on these 100 images. A blindfolded individual was given a
Google Nexus One smartphone (Android 2.3.3) with the EyeShare client application
installed on it. A sighted helper was given another Google Nexus One (Android 2.3.3)
with the EyeShare helper app installed on it.
The blindfolded client was asked to take each product from the assembled shelves
and recognize it. The client took a picture of the product by tapping the touchscreen.
The image was sent to the cloud Linux server where it was processed by the SURF
algorithm. The names of the top 5 matched products were sent to the helper for verifi-
cation along with the URL with the original image through C2DM. The helper, locat-
ed in a different room in the same building, selected the product’s name from the list
of the top matches and sent the product’s name back to the client. If the product’s
name was not in the list, the helper spoke the name of the product or, if the speech was
not recognized after three attempts, typed in the product’s name on the virtual
touchscreen keyboard. The run for an individual product was considered completed
when the product’s name was spoken on the client’s smartphone through TTS. Thus,
the total run time (in seconds) for each run included all five steps given in Fig. 1.
3.2 Store Experiments
The next two experiments were executed in Fresh Market, a local supermarket in
Logan, Utah. Prior to the experiments, we added 270 images to our image database
used in the laboratory study. We selected 45 products from 9 aisles (5 products per
aisle) in the supermarket and took 6 images at different rotations for every product.
The products included boxes, bottles, cans, and bags. We biased our selection to
products that an individual can hold in one hand. SURF was retrained on these 370
images (100 images from the lab study and 270 new ones).
The same blindfolded subject who participated in the laboratory study was given a
Samsung Galaxy S2 smartphone (Android 2.3.6) with the EyeShare client application
installed on it. The client used a 4G data plan. The same helper who participated in
the laboratory study was given a Google Nexus One (Android 2.3.6) with the Eye-
Share helper application installed on it. The helper was located in a building approx-
imately one mile away from the supermarket. The helper used a Wi-Fi connection.
The first set of experiments was confined to the first three aisles of the supermarket
and lasted for 30 minutes. In each aisle, three products from the database and three
products not from the database were chosen by a research assistant who went to the
supermarket with the blindfolded subject. The assistant gave each product to the sub-
ject who was asked to use the EyeShare client application to recognize the product.
There was no training involved, because it was the same blindfolded subject who did
the laboratory study. The subject was given 16 products, one product at a time, by the
assistant. One experimental run began at the time when the subject was given a prod-
uct and went on until the time when the subject’s smartphone received the product’s
name and read it out to the subject through TTS.
The second set of experiments was conducted in the same supermarket on a differ-
ent day with the same subject and helper. The experiments lasted 30 minutes. Since,
as explained in the discussion section, the image matching did not perform as well as
we hoped it would in the first supermarket study, we did not do any image matching
in the second set of experiments. All product recognition was done by the remote
sighted helper. The subject was given 17 products, one product at a time, taken from
the next three aisles of the supermarket by the assistant. The experimental run times
were computed in the same way as they were in the first supermarket study.
4 Results
The results of the experiments are summarized in Table 1. Column 1 gives the envi-
ronments where the experiments were executed. Column 2 gives the number of prod-
ucts used in the experiments in the corresponding environments. Column 3 gives the
mean time (in seconds) of the experimental runs. Column 4 gives the standard devia-
tions of the corresponding mean time values. Column 5 gives the number of times the
correct product was found in the top 5 matches. Column 6 gives the mean number of
SR attempts. Column 7 gives the number of SR failures when the helper had to type
the product names on the touchscreen keyboard after attempting to use SR three
times. In all experiments, all products were successfully recognized by the blindfold-
ed subject. As can be seen in Table 1, in supermarket study 1, after our image data-
base had grown in size, there were no correct product names in the top 5 matches.
Consequently, we decided not to use SURF in supermarket study 2. In supermarket
study 1, there were three cases when the helper asked the client to send another
image of a product because he could not identify the product’s name from the original
image. In supermarket study 1, there was one brief (several seconds) loss of Wi-Fi
connection on the helper’s smartphone.
Table 1. Experimental results.
Environment   # Products   Mean Time (s)   STD      Top 5   Mean SR   SR Failures
Lab           16           40              .00021   8       1.1       0
Store 1       16           60              .00033   0       1.2       2
Store 2       17           60              .00081   0       1.1       3
5 Discussion
Our study contributes to the recent body of research that addresses various aspects
of independent blind shopping through mobile and cloud computing (e.g., [6, 7, 8]).
Our approach differs from these studies in its emphasis on dedicated remote caregiv-
ing. Our approach addresses, at least to some extent, both image recognition failures
of fully automated solutions and the concerns about trust, privacy, and basic quality of
service of pure crowdsourcing approaches. Dedicated caregivers alleviate image
recognition failures through human eyesight sharing. Since dedicated caregiving is
more personal and trustworthy, clients are not required to post image recognition
requests on open web forums, which allows them to preserve more privacy. Interested
readers may watch our research videos at www.youtube.com/csatlusu for
more information on our accessible shopping experiments and projects.
The experiments show that the average product recognition time is within one minute.
The results demonstrate that SR is a viable option for product naming. We attribute
the poor performance of SURF in the first supermarket study to our failure to properly
parameterize the algorithm. As we gain more experience with SURF, we may be able
to improve the performance of automated image matching. However, database
maintenance may be a more serious long-term concern for automated image matching
unless there is direct access to the supermarket’s inventory control system.
Our findings should be interpreted with caution, because we used only one blind-
folded subject in the experiments. Nonetheless, our findings may serve as a basis for
future research on remote teleassisted caregiving in accessible blind shopping. Our
experience with the framework suggests that teleassistance may be a feasible option
for VI individuals in modern supermarkets. Dedicated remote caregiving can be ap-
plied not only to product recognition but also to assistance with cash payments and
supermarket navigation. It is a relatively inexpensive solution, because the only re-
quired hardware device is a smartphone with a data plan.
As the second supermarket study suggests, cloud-based image matching may not
be necessary. The use of mobile phones as the means of caregiving allows caregivers
to provide assistance from the comfort of their homes or offices or on the go. As data
plans move toward 4G network speeds, we can expect faster response times and better
quality of service. Faster network connections may, in time, make it feasible to com-
municate via streaming videos.
References
1. Bujacz M., Baranski P., Moranski M., Strumillo P., and Materka A. “Remote Guidance for
the Blind - A Proposed Teleassistance System and Navigation Trials,” In Proceedings of
the Conference on Human System Interactions, pp. 888-892, IEEE, Krakow, Poland, 2008.
2. Peake P. and Leonard J., “The Use of Heart-Rate as an Index of Stress in Blind Pedestri-
ans,” Ergonomics, 1971.
3. Kutiyanawala, A., Kulyukin, V., and Nicholson, J. “Teleassistance in Accessible Shopping
for the Blind.” In Proceedings of the 2011 International Conference on Internet Compu-
ting, ICOMP Press, pp. 190-193, July 18-21, 2011, Las Vegas, USA.
4. Kulyukin, V. and Kutiyanawala, A. “Accessible Shopping Systems for Blind and Visually
Impaired Individuals: Design Requirements and the State of the Art.” The Open Rehabili-
tation Journal, ISSN: 1874-9437, Volume 2, 2010, pp. 158-168, DOI:
10.2174/1874943701003010158.
5. Bay, H., Tuytelaars, T., and Van Gool, L. “SURF: Speeded Up Robust Features.” Computer
Vision-ECCV, pp. 404-417, Springer-Verlag, 2006.
6. Tsai, S.S., Chen, D., Chandrasekhar, V., Takacs, G., Ngai-Man, C., Vedantham, R.,
Grzeszczuk, R., and Girod, B. “Mobile Product Recognition.” In Proceedings of the
International Conference on Multimedia (MM '10), ACM, New York, NY, USA, pp.
1587-1590. DOI: 10.1145/1873951.1874293.
7. Girod, B., Chandrasekhar, V., Chen, D.M., Ngai-Man, C., Grzeszczuk, R., Reznik, Y.,
Takacs, G., Tsai, S.S., and Vedantham, R. “Mobile Visual Search.” IEEE Signal
Processing Magazine, vol. 28, no. 4, pp. 61-76, July 2011. DOI: 10.1109/MSP.2011.940881.
8. Von Reischach, F., Michahelles, F., Guinard, D., Adelmann, R., Fleisch, E., and Schmidt,
A. “An Evaluation of Product Identification Techniques for Mobile Phones.” In Proceedings
of the 2nd IFIP TC13 Conference on Human-Computer Interaction (Interact 2009),
Uppsala, Sweden.