This paper presents a user-calibration-free method for accurately estimating the point of gaze (POG) on a display by estimating the horizontal angles between the visual and the optical axes of both eyes. The optical axis of an eye can be estimated by using one pair of cameras and two light sources, on the basis of a spherical model of the cornea. The point of intersection of the optical axis of the eye with the display is termed the POA. By detecting the POAs of both eyes, the POG is approximated as the midpoint of the line joining the two POAs on the basis of the binocular eye model; therefore, we can estimate the horizontal angles between the visual and the optical axes of both eyes without requiring user calibration. We have developed a prototype system based on this method using a 19-inch display with two pairs of stereo cameras. We evaluated the system experimentally with 20 subjects who were at a distance of 600 mm from the display. The results show that the average root-mean-square error (RMSE) of the POG measurement in the display screen coordinate system is 16.55 mm (equivalent to less than 1.58°).
2 User-calibration-free gaze tracking method
This section describes a mathematical model for user-calibration-free gaze tracking. This method can determine the offsets between the visual and the optical axes of both the eyes by using binocular optics.
2.1 Estimation of optical axis of eye
The optical axis of a single eye can be estimated by the methods described in [Nagamatsu et al. 2008b; Shih and Liu 2004; Guestrin and Eizenman 2007]. These methods require a minimum of two cameras and two point light sources.
2.2 Binocular eye model
It is known that the visual axis of the eye inclines toward the nose, away from the optical axis of the eye. Figure 2 shows a 3D model of the visual and the optical axes of both the eyes when a user gazes at a certain point on the display. The visual axes of both the eyes intersect on the display at the POG, whose position vector is $X_{\mathrm{POG}}$. The point of intersection of the optical axis of the eye with the display is termed the POA. $X_{\mathrm{POAL}}$ and $X_{\mathrm{POAR}}$ are the position vectors of the POAs of the left and the right eyes, respectively.

Figure 2: Binocular 3D eye model.

2.3 Estimation of point of gaze (POG)

As described in the introduction, the average horizontal and vertical angles between the visual and the optical axes of the eye are 5.5° and 1.0°, respectively. In this study, we focus on the estimation of the horizontal angle between the visual and the optical axes of the eye. We ignore the vertical angle because it is smaller than the horizontal angle and its average is only 1.0°.

Figure 3 shows a detailed model of the binocular eyes. We can estimate $X_{\mathrm{POAL}}$ and $X_{\mathrm{POAR}}$ by calculating the intersections between the optical axes of both the eyes and the display. We assume the human body to be symmetric about the sagittal plane, and therefore, the horizontal angles between the visual and the optical axes of both the eyes to be similar. Therefore, the midpoint of the POAs can serve as a good approximation of the POG, since the horizontal offsets of the POAs from the POG cancel each other. $X_{\mathrm{POG}}$ is calculated as follows:

$$X_{\mathrm{POG}} = \frac{1}{2}\left(X_{\mathrm{POAL}} + X_{\mathrm{POAR}}\right). \qquad (1)$$

Figure 3: Binocular 3D eye model in detail.

Even though this concept is very simple, it is the most accurate theoretical solution that is currently available for realization of user-calibration-free gaze tracking.

Although determining the POG such that the angles between the visual and the optical axes of both the eyes are equal may seem more accurate than estimating it by Equation 1, we expect the estimate given by Equation 1 to be stable and sufficiently accurate in practical use.

By using the calculated POG, the offsets between the visual and the optical axes of both the eyes can be calculated using the one-point calibration gaze estimation method [Nagamatsu et al. 2008b]. After these offsets have been estimated, when the system is unable to detect one eye (for example, because of image processing problems), the POG can be calculated from the data of the other eye. Thus, robust operation of the system is achieved.

3 Implementation

3.1 Development of prototype system

A prototype system was implemented, as shown in Figure 4. This system consists of four synchronized monochrome IEEE-1394 digital cameras (Firefly MV, Point Grey Research Inc.), three infrared light sources (LEDs), a 19-inch LCD, and a Windows-based PC (Windows XP, Intel Core 2 Quad). The software was developed using OpenCV 1.0 [Intel]. Each camera uses a 1/3-inch CMOS image sensor whose resolution is 752 × 480 pixels. A 50-mm lens and an IR filter are attached to each camera. In order to capture high-resolution images of the eyes, we used lenses with a narrow field of view. Thus, a pair of cameras was used for capturing the left eye, and another pair was used for the right eye. These cameras were positioned under the display. The intrinsic parameters of the cameras were determined before setting up the system. The reason for using three LEDs is to reduce the estimation error of the optical axis of the eye caused by asphericity of the cornea, as described in Nagamatsu et al. [2008b].
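The geometric core of Section 2.3 can be illustrated with a minimal sketch, which is not the prototype's actual code. It assumes that the optical axis of each eye is already available as a point on the axis plus a direction vector, both expressed in the world coordinate system of Section 3.2, so that the display lies in the plane z = 0. The function names `poa_on_display` and `pog_midpoint`, the tuple types, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def poa_on_display(eye_point: Vec3, axis_dir: Vec3) -> Vec2:
    """Intersect an optical axis with the display plane z = 0 (assumed).

    eye_point is any point on the optical axis and axis_dir its direction;
    the returned (x, y) pair is the POA in display-plane coordinates (mm).
    """
    ox, oy, oz = eye_point
    dx, dy, dz = axis_dir
    if abs(dz) < 1e-12:
        raise ValueError("optical axis is parallel to the display plane")
    t = -oz / dz  # ray parameter at which z becomes 0
    if t <= 0.0:
        raise ValueError("display plane is behind the eye along this axis")
    return (ox + t * dx, oy + t * dy)


def pog_midpoint(poa_left: Vec2, poa_right: Vec2) -> Vec2:
    """Equation 1: approximate the POG as the midpoint of the two POAs."""
    return ((poa_left[0] + poa_right[0]) / 2.0,
            (poa_left[1] + poa_right[1]) / 2.0)


# Hypothetical eyes 600 mm from the display, optical axes turned slightly
# outward by the same amount (illustrative values, not measured data).
poa_l = poa_on_display((30.0, -300.0, 600.0), (0.0625, 0.0, -1.0))
poa_r = poa_on_display((-30.0, -300.0, 600.0), (-0.0625, 0.0, -1.0))
pog = pog_midpoint(poa_l, poa_r)
print(poa_l, poa_r, pog)
```

In this symmetric example the two POAs are displaced in opposite horizontal directions by the same distance, so the offsets cancel in the midpoint, which is the behavior that the sagittal-symmetry assumption relies on.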
Figure 4: Prototype system.

3.2 Calibration of extrinsic camera parameters

Once the cameras and lenses are selected, the intrinsic camera parameters do not change. However, the extrinsic camera parameters may change if the position of a camera changes.

Figure 5 shows the arrangement of the components when the extrinsic camera parameters are calibrated. A display, a camera-calibration board, and cameras are set up on the table. The display and the camera-calibration board are parallel to each other and perpendicular to the table. The origin of the world coordinate system is O, which is on the table in the plane including the display plane. The x-axis is horizontal toward the left, and the y-axis is vertical downward. The z-axis is the direction of the cross product of x and y.

Figure 5: Arrangement of components.

The extrinsic camera parameters of the four cameras are calibrated by using a camera-calibration board. The board has two checker patterns (13 × 9 grid points). Cameras 0 and 1 are for the right eye, and therefore, they are directed toward the right-hand side checker pattern, indicated by R in Figure 5. Cameras 2 and 3 are for the left eye, and therefore, they are directed toward the left-hand side checker pattern, indicated by L in Figure 5.

4 Experimental evaluation

4.1 Experiment and results

We evaluated the prototype system in a laboratory with 20 adult subjects (15 men and 5 women) who did not wear glasses or contact lenses. The ages of the subjects ranged from 21 to 40.

The proposed gaze estimation method allows the user to move; however, the current implementation cannot capture the user's eyes over a wide area. Therefore, in order to measure the performance of the gaze estimation method without error caused by head movement, the head was supported by a chin rest in this experiment to prevent the eyes from moving out of the focus range or the field of view of the cameras. The eyes were approximately 600 mm from the display.

The subjects were asked to fixate on 25 points that appeared one by one on the display. Data were recorded when the optical axes of both eyes were detected. We recorded more than 10 data points when the subject gazed at each point.

Figure 6 shows an example of the experimental results. POGs and POAs of the left and right eyes are plotted in the display coordinate system. The black diamond-shaped points on the grids represent fiducial points that were intentionally gazed at by the subject. The triangle- and square-shaped points indicate the POAs of the left and right eyes, respectively. The black circular points indicate the estimated POGs. The plotted points of the POAs are the representatives of the recorded data when the subject gazed at each point; the position is the average value of the recorded data. In cases in which the POA of either the left eye or the right eye deviated from the median value by more than 5° in terms of view angle, the data were removed as outliers. In such a case, it is assumed that the subject gazed at another target or that the image processing failed to detect the center of the pupil or the first Purkinje images.

Figure 6: Example of evaluation results for one subject.

The root-mean-square errors (RMSE) of the system evaluation are listed in Table 1; here, RMSEx and RMSEy denote the horizontal and vertical RMSE of measurement in the display screen coordinate system, respectively. RMSE is given by $\sqrt{\mathrm{RMSE}_x^2 + \mathrm{RMSE}_y^2}$. The values in the table are the averages of these values over the 25 points for each subject. The bottom row indicates the average of each column over all the subjects.

4.2 Discussion

As shown in Table 1, the averages of RMSEx and RMSEy for the 20 subjects are 8.40 mm and 12.92 mm, respectively. These are equivalent to less than 0.80° and 1.23° in terms of the view angle for a
distance of 600 mm from the eye, respectively.

Table 1: Root-mean-square error (RMSE) of the gaze tracking error for the 20 subjects.

Subject   Ave. of RMSEx        Ave. of RMSEy        Ave. of RMSE
          for 25 points        for 25 points        for 25 points
1          5.43 mm (0.52°)      5.72 mm (0.55°)      8.42 mm (0.80°)
2         12.72 mm (1.21°)     12.09 mm (1.15°)     18.19 mm (1.74°)
3          7.96 mm (0.76°)      7.10 mm (0.68°)     11.11 mm (1.06°)
4          8.13 mm (0.78°)     12.37 mm (1.18°)     15.33 mm (1.46°)
5         12.49 mm (1.19°)     16.09 mm (1.54°)     20.81 mm (1.99°)
6         17.95 mm (1.71°)      8.89 mm (0.85°)     21.58 mm (2.06°)
7          5.24 mm (0.50°)      6.58 mm (0.63°)      8.71 mm (0.83°)
8          4.05 mm (0.39°)      7.61 mm (0.73°)      8.97 mm (0.86°)
9          6.65 mm (0.64°)     18.16 mm (1.73°)     19.84 mm (1.89°)
10         7.79 mm (0.74°)     14.47 mm (1.38°)     17.31 mm (1.65°)
11         5.68 mm (0.54°)      3.81 mm (0.36°)      7.00 mm (0.67°)
12         3.71 mm (0.35°)      5.38 mm (0.51°)      6.68 mm (0.64°)
13        19.39 mm (1.85°)     12.83 mm (1.22°)     23.86 mm (2.28°)
14        12.97 mm (1.24°)      8.78 mm (0.84°)     16.39 mm (1.56°)
15         8.32 mm (0.79°)     32.11 mm (3.06°)     33.67 mm (3.21°)
16         5.52 mm (0.53°)     26.19 mm (2.50°)     26.86 mm (2.56°)
17         5.61 mm (0.54°)     15.01 mm (1.43°)     16.27 mm (1.55°)
18         5.62 mm (0.54°)      6.92 mm (0.66°)      9.28 mm (0.89°)
19         5.01 mm (0.48°)     20.14 mm (1.92°)     20.86 mm (1.99°)
20         7.66 mm (0.73°)     18.09 mm (1.73°)     19.88 mm (1.90°)
Ave.       8.40 mm (0.80°)     12.92 mm (1.23°)     16.55 mm (1.58°)

Therefore, the horizontal error is improved as compared to previous user-calibration-free methods that approximate the visual axis of the eye by the optical axis of the eye. On the other hand, the vertical error is as expected and is similar to the average value for humans reported in previous literature [Osaka 1993].

For the system as a whole, the average RMSE for the 20 subjects is 16.55 mm (∼1.58°). Therefore, the system is one of the most accurate user-calibration-free remote gaze tracking systems at the moment.

5 Conclusion

User-calibration-free gaze tracking using a binocular eye model was described, which is the most accurate theoretical solution that is currently available for realization of user-calibration-free gaze tracking. The proposed system uses two pairs of stereo cameras. One pair of cameras is used for each eye to estimate the optical axes of the left and the right eyes. The POG is estimated as the midpoint of the line joining the POAs of both the eyes. From the POG, we can calculate the offsets between the visual and the optical axes of both the eyes.

We developed a prototype system and evaluated it experimentally with 20 subjects. The results show that the average RMSEx, RMSEy, and RMSE for the 20 subjects are 8.40 mm (∼0.80°), 12.92 mm (∼1.23°), and 16.55 mm (∼1.58°), respectively. The horizontal error is improved by using the binocular eye model as compared to previous user-calibration-free methods that approximate the visual axis of the eye by the optical axis of the eye.

Future work includes estimating the vertical angle between the visual and the optical axes of the eye, enlarging the area in which the head can move during gaze estimation, and applying the system in real-life situations.

Acknowledgements

This research was partially supported by the Japan Science and Technology Agency, Research for Promoting Technological Seeds, 2008.

References

DUCHOWSKI, A. T. 2007. Eye Tracking Methodology: Theory and Practice, 2nd ed. Springer-Verlag.

GUESTRIN, E. D., AND EIZENMAN, M. 2006. General theory of remote gaze estimation using the pupil center and corneal reflections. IEEE Transactions on Biomedical Engineering 53, 6, 1124–1133.

GUESTRIN, E. D., AND EIZENMAN, M. 2007. Remote point-of-gaze estimation with free head movements requiring a single-point calibration. In Proceedings of the 29th Annual International Conference of the IEEE EMBS, 4556–4560.

INTEL. Open source computer vision library. http://sourceforge.net/projects/opencvlibrary/.

JACOB, R. J. K. 1991. The use of eye movements in human-computer interaction techniques: what you look at is what you get. ACM Transactions on Information Systems 9, 2, 152–169.

KOHLBECHER, S., BARDINS, S., BARTL, K., SCHNEIDER, E., POITSCHKE, T., AND ABLASSMEIER, M. 2008. Calibration-free eye tracking by reconstruction of the pupil ellipse in 3D space. In Proceedings of the 2008 Symposium on Eye Tracking Research & Applications, 135–138.

NAGAMATSU, T., KAIEDA, Y., KAMAHARA, J., AND SHIMADA, H. 2007. Development of a skill acquisition support system using expert's eye movement. In Proceedings of HCI International 2007, vol. 9, 430–439.

NAGAMATSU, T., KAMAHARA, J., IKO, T., AND TANAKA, N. 2008. One-point calibration gaze tracking based on eyeball kinematics using stereo cameras. In Proceedings of the 2008 Symposium on Eye Tracking Research & Applications, 95–98.

NAGAMATSU, T., KAMAHARA, J., AND TANAKA, N. 2008. 3D gaze tracking with easy calibration using stereo cameras for robot and human communication. In Proceedings of IEEE RO-MAN 2008, 59–64.

NAGAMATSU, T., KAMAHARA, J., AND TANAKA, N. 2009. Calibration-free gaze tracking using a binocular 3D eye model. In Proceedings of the 27th International Conference Extended Abstracts on Human Factors in Computing Systems, 3613–3618.

OSAKA, R. 1993. Experimental Psychology of Eye Movements (in Japanese). The University of Nagoya Press, Nagoya, Japan.

SHIH, S.-W., AND LIU, J. 2004. A novel approach to 3-D gaze tracking using stereo cameras. IEEE Transactions on Systems, Man, and Cybernetics, Part B 34, 1, 234–245.

SHIH, S.-W., WU, Y.-T., AND LIU, J. 2000. A calibration-free gaze tracking technique. In Proceedings of International Conference on Pattern Recognition, vol. 4, 201–204.

YAMAZOE, H., UTSUMI, A., YONEZAWA, T., AND ABE, S. 2008. Remote gaze estimation with a single camera based on facial-feature tracking without special calibration actions. In Proceedings of the 2008 Symposium on Eye Tracking Research & Applications, 245–250.
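As a supplementary sketch of the error measures reported in Table 1, the following shows one way the per-target RMSE and the millimeter-to-view-angle conversion could be computed. It is an illustration under stated assumptions rather than the authors' evaluation code: the function names `rmse_for_target` and `mm_to_view_angle_deg` are hypothetical, and the conversion atan(error / distance) at a viewing distance of 600 mm is an assumed formula, although it does reproduce the three average angles in the bottom row of Table 1 to two decimals.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def rmse_for_target(estimates: Sequence[Point],
                    target: Point) -> Tuple[float, float, float]:
    """Return (RMSEx, RMSEy, RMSE) in mm for the POG samples of one target.

    RMSE is combined as sqrt(RMSEx^2 + RMSEy^2), as in Section 4.1.
    """
    n = len(estimates)
    if n == 0:
        raise ValueError("at least one POG estimate is required")
    rmse_x = math.sqrt(sum((x - target[0]) ** 2 for x, _ in estimates) / n)
    rmse_y = math.sqrt(sum((y - target[1]) ** 2 for _, y in estimates) / n)
    return rmse_x, rmse_y, math.hypot(rmse_x, rmse_y)


def mm_to_view_angle_deg(error_mm: float, distance_mm: float = 600.0) -> float:
    """Convert an on-screen error to a view angle (assumed: atan(e / d))."""
    return math.degrees(math.atan(error_mm / distance_mm))


# Two made-up POG samples around a target at the origin (not measured data).
rx, ry, r = rmse_for_target([(3.0, 4.0), (-3.0, -4.0)], (0.0, 0.0))

# Average errors from the bottom row of Table 1, converted to degrees.
angles = [round(mm_to_view_angle_deg(e), 2) for e in (8.40, 12.92, 16.55)]
print(rx, ry, r, angles)
```

Per Section 4.1, the values in Table 1 are averages of such per-target values over the 25 targets, so the combined RMSE of a row is not expected to equal the square root of the sum of squares of that row's RMSEx and RMSEy.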