A model of visual saliency is often used to highlight interesting or perceptually significant features in an image. If a specific task is imposed upon the viewer, then the image features that disambiguate task-related objects from non-task-related locations should be incorporated into the saliency determination as top-down information. For this study, viewers were given the task of locating potentially cancerous lesions in synthetically generated medical images. An ensemble of saliency maps was created to model the target versus error features that attract attention. For MRI images, lesions are most reliably modeled by luminance features and errors are mostly modeled by color features, depending upon the type of error (search, recognition, or decision). Other imaging modalities showed similar differences between the target and error features that contribute to top-down saliency. This study provides evidence that image-derived saliency is task-dependent and may be used to predict target or error locations in complex images.
2 Method

An ASL Model 504 remote eye-tracker was used for this experiment, along with the ASL Eye-trac 6 User Interface Software and Control Unit. Nineteen participants from the campus community (nine males and ten females between the ages of 18 and 58) were recruited, all naïve with respect to the purpose of the experiment, and with no prior experience locating lesions in radiological images. All participants were screened for normal color vision and normal or corrected-to-normal vision and were allowed an unlimited amount of time to detect as many targets in each image as possible. The eye-tracking session lasted approximately 30 minutes per participant, including calibration time. Prior to the start of the experiment, each participant was given an instruction sheet with information about how the experiment would proceed. The instructions stated, in general, that a feature would appear as a circular spot in the image and could be located anywhere within the anatomical portion of the image (i.e., a feature would never be located on the image border). Since the participants used in this study were not radiologists, the results are not directly applicable to a clinical setting; however, untrained observers might still provide useful information about the target and error features that attract attention during search in complex imagery.

The experiment consisted of monitoring and recording participants' fixation locations, fixation durations, and mouse clicks as they viewed eleven sets of six simulated brain images (66 images total). Simulated lesions with known size, shape, contrast, and location were inserted into the images at random locations. Each image had between zero and five lesions. The images were generated from single-mode PET and MRI phantoms and multi-mode fused PET/MRI images. Three sets of fused images were used, each set using a different color look-up table for displaying the mixed modes. The fused images were sub-divided into three categories depending upon whether the lesions were embedded in the PET image, the MRI image, or both. Figure 1 shows examples of the fused PET/MRI images and the types of information that were collected during the experiment.

3 Determining Feature Weights

A naïve saliency map weights each of the low-level feature maps (color, luminance, and orientation) equally in the final summation step. An optimally weighted map would take into account the relative importance of any feature for the target type. To determine the optimal feature weights, a metric was developed to "score" a map, given a specific weight vector. A map score is defined simply as the ratio of the mean target saliency, S_t, at some pre-defined locations to the mean saliency of the entire map, S_m.

Score = S_t / S_m

Mean target saliency S_t is found by first generating a saliency map using a random set of weights for a particular input image. Next, the x,y-coordinates of a set of target locations are determined from the eye-tracking data, ground-truth data, or from a record of observer responses (mouse clicks). For each target location, the x,y-coordinate is used as an index into the saliency map, and the saliency value at that location is extracted. A 7x7 pixel window (corresponding to 1/4° visual angle at the viewing distance of 52 cm) is centered on the location, and all saliency values falling within the window are averaged together. This procedure is repeated for every target location in the map and the average of those values is used as the mean target saliency, S_t. The mean map saliency, S_m, is the average saliency over all locations in the map (target and non-target). The score of a map is then simply the ratio between the mean target saliency and the mean map saliency.

The map score is used to determine how well a saliency map models attention. If the score is close to one, then the map is not a good model of attention: since S_t is nearly equal to S_m, any random location would do just as well at predicting the response. If, on the other hand, the score is greater than one, then the map is a good (better than random) model of attention because the target locations tend to be on regions of the image that the model has computed as being highly salient.

The scoring procedure is repeated with a different set of weights to produce another candidate map, and stops when the highest possible score is produced. Since an exhaustive search across the entire weight space is computationally prohibitive, a genetic algorithm was developed to find approximately optimal weights, using the scoring metric described above as the fitness criterion. The genetic algorithm was initialized with random weights for each feature map, and then over each generation (300 total) the two highest scores were selected to randomly exchange their weights, with crossovers and mutations allowed according to established parameters. A total of 2,400 trials were run before a solution converged.

Figure 2 shows an example saliency map generated using only low-level features (color, luminance, and oriented edges) without any task- or target-related information (as in the standard model [Itti et al. 1998]). Figure 3 shows the same image with the saliency map generated using weights learned from the genetic algorithm and applied to the low-level feature maps. For this example the targets are lesions, with locations indicated on the image by a white square. Figure 4 shows the weighted saliency map found for an MRI image with five lesions applied to an MRI image with three lesions. The weighted map is able to correctly predict lesion locations in this different test image.

Figure 2: Unweighted saliency map using only low-level features of color, luminance, and oriented edges (left) and after thresholding at 0.45 (right). Lesions are shown surrounded by a white square.

Figure 3: Weighted saliency map with weights determined using a genetic algorithm optimized for target type (left) and after thresholding at 0.45 (right). Lesions are shown surrounded by a white square.

4 Results

An ensemble of (approximately) optimally-weighted saliency maps was created, one for each of the different target types: lesion locations, false positives, search errors, recognition errors, and decision errors. The map feature weights for lesion locations are frequently different from the map feature weights for errors. For example, Figure 5 shows that the highest weighted feature for lesions in the MRI images is luminance; however, for all of the MRI errors, the highest weighted feature is the blue-yellow color-opponent feature. Other imaging modalities also showed significant differences between feature weights for target and error locations. This may be an indication that visual search, recognition, and decision errors arise from specific attentional characteristics that differ from those for correct detection in a search task. This information might be useful in a decision-support or computer-aided detection (CAD) system, to highlight or otherwise flag locations in the image that have a high probability of incorrect classification.

Figure 5: Relative weights of the low-level feature maps that are combined (summed) together to create the saliency map for the MRI images. Note that low-level features of the search target (lesions) are dominated by luminance information, whereas the low-level features that attract attention for each of the four error types are dominated by the blue-yellow color feature.
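As a rough illustration of the map score (Score = S_t / S_m) and the weight search described in Section 3, the following Python sketch may help. It is not the authors' implementation. The function names (weighted_saliency, map_score, evolve_weights), the population size, the uniform crossover, the Gaussian mutation, and the retention of the two parents in each generation are all assumptions made for illustration; the text specifies only random initial weights, 300 generations, selection of the two highest-scoring candidates, and a 7x7 averaging window.

```python
import numpy as np


def weighted_saliency(feature_maps, weights):
    """Weighted sum of feature maps shaped (n_features, height, width)."""
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights, np.asarray(feature_maps, dtype=float), axes=1)


def map_score(saliency, targets, window=7):
    """Score = S_t / S_m for a saliency map and a list of (x, y) targets."""
    half = window // 2
    height, width = saliency.shape
    window_means = []
    for x, y in targets:
        # Average the saliency inside a window centred on the target.
        # Clipping at the border is an assumption; in the study, targets
        # were never placed on the image border.
        rows = slice(max(y - half, 0), min(y + half + 1, height))
        cols = slice(max(x - half, 0), min(x + half + 1, width))
        window_means.append(saliency[rows, cols].mean())
    s_t = np.mean(window_means)   # mean target saliency
    s_m = saliency.mean()         # mean saliency of the whole map
    return float(s_t / s_m)


def evolve_weights(feature_maps, targets, generations=300, pop_size=8,
                   mutation_scale=0.1, seed=0):
    """Toy genetic search for feature weights that maximise the map score."""
    feature_maps = np.asarray(feature_maps, dtype=float)
    rng = np.random.default_rng(seed)
    n_features = feature_maps.shape[0]
    population = rng.random((pop_size, n_features))  # random initial weights

    def fitness(weights):
        return map_score(weighted_saliency(feature_maps, weights), targets)

    for _ in range(generations):
        scores = np.array([fitness(w) for w in population])
        parents = population[np.argsort(scores)[-2:]]  # two highest scores
        children = []
        for _ in range(pop_size - 2):
            # Uniform crossover: each weight comes from one parent at random.
            mask = rng.random(n_features) < 0.5
            child = np.where(mask, parents[0], parents[1])
            # Gaussian mutation, clipped so weights stay positive.
            child = child + rng.normal(0.0, mutation_scale, n_features)
            children.append(np.clip(child, 1e-6, 1.0))
        # Keeping the parents (elitism) is an assumption of this sketch.
        population = np.vstack([parents, *children])

    scores = np.array([fitness(w) for w in population])
    best = int(np.argmax(scores))
    return population[best], float(scores[best])
```

Calling evolve_weights(feature_maps, targets) with a stack of color, luminance, and orientation maps and a list of lesion or error coordinates returns a weight vector and its score. A uniform saliency map scores exactly one under map_score, which matches the "no better than random" reading of the metric.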
5 Conclusion
Low-level features such as luminance, color, and edges can attract
the attention of the human visual system during a search task, and
those features are specific to certain types of targets. More re-
search into the nature of decision-making at the level just below
that of conscious awareness, such as is enabled by eye-tracking ex-
periments, will help to uncover the pre-conscious biases and strate-
gies that contribute to image interpretation, as well as image mis-
interpretation.
Acknowledgements
Thanks to Karl Baum for generation of the MRI images.
References
ITTI, L., KOCH, C., AND NIEBUR, E. 1998. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 11, 1254–1259.
KOCH, C., AND ULLMAN, S. 1985. Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology 4, 219–227.
KRUPINSKI, E. A. 2000. The importance of perception research in medical imaging. Radiation Medicine 18, 6, 329–324.

Figure 4: Weighted saliency map on different image, thresholded at 0.45. Lesion locations are correctly predicted by the saliency map.