Schedule
09:00
Registration, poster set-up, and continental breakfast
09:30
Welcome
09:45
Invited Talk: Machine Learning in Space
Kiri L. Wagstaff, NASA
10:15
A general agnostic active learning algorithm
Claire Monteleoni, UC San Diego
10:35
Bayesian Nonparametric Regression with Local Models
Jo-Anne Ting, University of Southern California
10:55
Coffee Break
11:15
Invited Talk: Applying machine learning to a real-world
problem: real-time ranking of electric components
Marta Arias, Columbia University
11:45
Generating Summary Keywords for Emails Using Topics.
Hanna Wallach, University of Cambridge
12:05
Continuous-State POMDPs with Hybrid Dynamics
Emma Brunskill, MIT
12:25
Spotlights
12:45
Lunch
14:20
Invited Talk: Randomized Approaches to Preserving Privacy
Nina Mishra, University of Virginia
14:50
Clustering Social Networks
Isabelle Stanton, University of Virginia
15:10
Coffee Break
15:30
Invited Talk: Applications of Machine Learning to Image
Retrieval
Sally Goldman, Washington University
16:00
Improvement in Performance of Learning Using Scaling
Soumi Ray, University of Maryland Baltimore County
16:20
Poster Session
17:10
Panel / Open Discussion
17:40
Concluding Remarks
Invited Talks
Machine Learning in Space
Kiri L. Wagstaff, NASA
Remote space environments simultaneously present significant challenges to the
machine learning community and enormous opportunities for advancement. In
this talk, I present recent work on three key issues associated with machine
learning in space: on-board data classification and regression, on-board
prioritization of analysis results, and reliable computing in high-radiation
environments. Support vector machines are currently being used on-board the
EO-1 Earth orbiter, and they are poised for adoption by the Mars Odyssey orbiter
as well. We have developed techniques for learning scientist preferences for
which subset of images is most critical for transmission, so that we can make the
most use of limited bandwidth. Finally, we have developed fault-tolerant SVMs
that can detect and recover from radiation-induced errors while performing on-
board data analysis.
About the speaker:
Kiri L. Wagstaff is a senior researcher at the Jet Propulsion
Laboratory in Pasadena, CA. She is a member of the
Machine Learning and Instrument Autonomy group, and
her focus is on developing new machine learning methods
that can be used for data analysis on-board spacecraft.
She has applied these techniques to data being collected
by the EO-1 Earth-orbiting spacecraft, Mars Odyssey, and
Mars Pathfinder. She has also worked on crop yield
prediction from orbital remote sensing observations, the
fault protection system for the MESSENGER mission to
Mercury, and automatic code generation for the Electra
radio used by the Mars Reconnaissance Orbiter and the
Mars Science Laboratory. She is very interested in issues such as robustness
(developing fault-tolerant machine learning methods for high-radiation
environments) and infusion (how can machine learning be used to advance
science?). She holds a Ph.D. in Computer Science from Cornell University and is
currently working on an M.S. in Geology from the University of Southern
California.
Applying machine learning to a real-world problem: real-time ranking of electric
components
Marta Arias, Columbia University
In this talk, I will describe our experience with applying machine learning
techniques to a concrete real-world problem: the generation of rankings of
electric components according to their susceptibility to failure. The system's goal
is to aid operators in the replacement strategy of most at-risk components and in
handling emergency situations. In particular, I will address the challenge of
dealing with the concept drift inherent in the electrical system and will describe
our solution based on a simple weighted-majority voting scheme.
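The weighted-majority idea mentioned above can be sketched generically; the code below is the textbook algorithm (a pool of expert rankers whose weights are penalized when they err), not the authors' production system, and the penalty factor `beta` is an illustrative choice.

```python
# Sketch of a weighted-majority vote over a pool of experts.
# Generic algorithm, not the production ranking system from the talk;
# the penalty factor `beta` is an illustrative choice.

def weighted_majority_predict(weights, votes):
    """Return the prediction with the largest total expert weight."""
    totals = {}
    for w, v in zip(weights, votes):
        totals[v] = totals.get(v, 0.0) + w
    return max(totals, key=totals.get)

def weighted_majority_update(weights, votes, truth, beta=0.5):
    """Multiply the weight of every expert that voted wrongly by beta."""
    return [w * (beta if v != truth else 1.0) for w, v in zip(weights, votes)]

# Tiny demo: three experts, the third is always right.
weights = [1.0, 1.0, 1.0]
for votes, truth in [(["fail", "ok", "fail"], "fail"),
                     (["ok", "ok", "fail"], "fail"),
                     (["ok", "fail", "fail"], "fail")]:
    pred = weighted_majority_predict(weights, votes)
    weights = weighted_majority_update(weights, votes, truth)

print(weights)  # the reliable expert keeps weight 1.0
```

Under concept drift, the multiplicative updates let the ensemble shift weight quickly toward whichever experts track the current regime.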
About the speaker:
Marta Arias received her bachelor's degree in Computer
Science from the Polytechnic University of Catalonia
(Barcelona, Spain) in 1998. After that she worked for a
year at Incyta S.A. (Barcelona, Spain), a company
specializing in software products for Natural Language
Processing applications. She then enrolled in the graduate
student program at Tufts University, receiving her PhD in
Computer Science in 2004. That same year she joined the
Center for Computational Learning Systems of Columbia
University as an Associate Research Scientist. Dr. Arias'
research interests include the theory and application of
machine learning.
Randomized Approaches to Preserving Privacy
Nina Mishra, University of Virginia, Microsoft Research
The Internet is arguably one of the most important inventions of the last century.
It has altered the very nature of our lives -- the way we communicate, work, shop,
vote, recreate, etc. The impact has been phenomenal for the machine learning
community since both old and newly created information repositories, such as
medical records and web click streams, are readily available and waiting to be
mined. However, standing opposite these capabilities and advances is the basic
right to
privacy: On the one hand, in order to best serve and protect its citizens, the
government should ideally have access to every available bit of societal
information. On the other hand, privacy is a fundamental right and human need,
which theoretically is served best when the government knows nothing about the
personal lives of its citizens. This raises the natural question of whether it is even
possible to simultaneously realize both of these diametrically opposed goals,
namely, information transparency and individual privacy. Surprisingly, the answer
is yes and I will describe solutions where individuals randomly perturb and
publish their data so as to preserve their own privacy and yet large-scale
information can still be learned. Joint work with Mark Sandler.
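One classic way individuals can randomly perturb a sensitive bit before publishing it is randomized response; the sketch below is that standard scheme with illustrative parameters, not necessarily the specific construction from the talk.

```python
import random

# Randomized response sketch: each person reports their true bit with
# probability p_truth and flips it otherwise; the aggregate proportion can
# still be recovered. Illustrative parameters, not the talk's exact scheme.

def perturb(bit, p_truth=0.75, rng=random):
    """Report the true bit with probability p_truth, else flip it."""
    return bit if rng.random() < p_truth else 1 - bit

def estimate_proportion(reports, p_truth=0.75):
    """Unbiased estimate of the true fraction of 1s from perturbed reports."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)

rng = random.Random(0)
true_bits = [1] * 300 + [0] * 700          # true proportion of 1s is 0.3
reports = [perturb(b, 0.75, rng) for b in true_bits]
print(estimate_proportion(reports, 0.75))
```

No individual report reveals the true bit with certainty, yet the population-level statistic is learned accurately as the sample grows.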
About the speaker:
Nina Mishra is an Associate Professor in the Computer
Science Department at the University of Virginia. Her
research interests are in data mining and machine
learning algorithms as well as privacy. She previously
held joint appointments as a Senior Research Scientist at
HP Labs, and as an Acting Faculty member at Stanford
University. She was Program Chair of the International
Conference on Machine Learning in 2003 and has served
on numerous data mining and machine learning program
committees. She also serves on the editorial boards of
Machine Learning, IEEE Transactions on Knowledge and
Data Engineering, IEEE Intelligent Systems and the
Journal of Privacy and Confidentiality. She is currently on
leave in Search Labs at Microsoft Research. She received a PhD in Computer
Science from UIUC.
Applications of Machine Learning to Image Retrieval
Sally Goldman, Washington University
Classic Content-Based Image Retrieval (CBIR) takes a single non-annotated
query image, and retrieves similar images from an image repository. Such a
search must rely upon a holistic (or global) view of the image. Yet often the
desired content of an image is not holistic, but is localized. Specifically, we define
Localized Content-Based Image Retrieval as a CBIR task where the user is only
interested in a portion of the image, and the rest of the image is irrelevant. We
discuss our localized CBIR system, Accio!, that uses labeled images in
conjunction with a multiple-instance learning algorithm to first identify the desired
object and re-weight the features, and then to rank images in the database using
a similarity measure that is based upon individual regions within the image. We
will discuss both the image representation and multiple-instance learning
algorithm that we have used in the localized CBIR systems that we have
developed. We also look briefly at ways in which multiple-instance learning can
be applied to knowledge-based image segmentation.
About the speaker:
Dr. Sally Goldman is the Edwin H. Murty Professor of
Engineering at Washington University in St. Louis and the
Associate Chair of the Department of Computer Science
and Engineering. She received a Bachelor of Science in
Computer Science from Brown University in December
1984. Under the guidance of Dr. Ronald Rivest at the
Massachusetts Institute of Technology, Dr. Goldman
completed her Master of Science in Electrical
Engineering and Computer Science in May 1987 and her
Ph.D. in July 1990. Dr. Goldman's research is in the area
of algorithm design and analysis and machine learning
with a recent focus on applications to the area of content-
based image retrieval. Dr. Goldman has received many
teaching awards and honors including the Emerson
Electric Company Excellence in Teaching Award in 1999, and the Governor's
Award for Excellence in Teaching in 2001. Dr. Goldman and her husband, Dr.
Ken Goldman, have just completed a book titled A Practical Guide to Data
Structures and Algorithms using Java.
Talks
A General Agnostic Active Learning Algorithm
Claire Monteleoni, UC San Diego
We present a simple, agnostic active learning algorithm that works for any
hypothesis class of bounded VC dimension, and any data distribution. Most
previous work on active learning either makes strong distributional assumptions,
or else is computationally prohibitive. Our algorithm extends a scheme due to
Cohn, Atlas, and Ladner to the agnostic setting (i.e. arbitrary noise), by (1)
reformulating it using a reduction to supervised learning and (2) showing how to
apply generalization bounds even for the non-i.i.d. samples that result from
selective sampling. We provide a general characterization of the label
complexity of our algorithm. This quantity is never more than the usual PAC
sample complexity of supervised learning, and is exponentially smaller for some
hypothesis classes and distributions. We also demonstrate improvements
experimentally.
This is joint work with Sanjoy Dasgupta and Daniel Hsu. The paper is currently
in submission; for a full version, please see the UCSD tech report:
http://www.cse.ucsd.edu/Dienst/UI/2.0/Describe/ncstrl.ucsd_cse/CS2007-0898
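The Cohn-Atlas-Ladner idea the abstract builds on can be sketched in its simplest, noise-free form: query a label only while the remaining hypotheses still disagree on the point. The sketch below uses 1-D threshold classifiers for illustration; the paper's contribution is extending this style of selective sampling to the agnostic (arbitrary-noise) setting.

```python
import random

# Selective-sampling sketch in the spirit of Cohn, Atlas, and Ladner:
# query a label only when the current version space disagrees on the point.
# Shown for noise-free 1-D thresholds; the abstract's algorithm handles
# arbitrary noise via a reduction to supervised learning.

def cal_thresholds(stream, lo, hi, oracle):
    """Learn a threshold in [lo, hi]; oracle(x) -> 0/1 label, queried lazily."""
    queries = 0
    for x in stream:
        if lo < x <= hi:            # remaining hypotheses disagree: must query
            queries += 1
            if oracle(x):           # label 1 => the threshold lies below x
                hi = x
            else:                   # label 0 => the threshold lies at or above x
                lo = x
        # points outside (lo, hi] are labeled unanimously: no query needed
    return (lo + hi) / 2, queries

points = [i / 100 for i in range(101)]       # 0.00 .. 1.00
random.Random(1).shuffle(points)
threshold = 0.42
learned, queries = cal_thresholds(points, 0.0, 1.0,
                                  lambda x: int(x > threshold))
print(learned, queries)
```

On a random stream the disagreement interval shrinks quickly, so only a small fraction of the labels is ever requested — the label-complexity savings the abstract quantifies in general.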
Bayesian Nonparametric Regression with Local Models
Jo-Anne Ting, University of Southern California
We propose a Bayesian nonparametric regression algorithm with locally linear
models for high-dimensional, data-rich scenarios where real-time, incremental
learning is necessary. Nonlinear function approximation with high-dimensional
input data is a nontrivial problem. An application example is a high-dimensional
movement system like a humanoid robot, where real-time learning of internal
models for compliant control may be needed. Fortunately, many real-world
data sets tend to have locally low-dimensional distributions, despite having a
high-dimensional embedding (e.g., Tenenbaum et al. 2000; Roweis & Saul, 2000). A
successful algorithm, thus, must avoid numerical problems arising potentially
from redundancy in the input data, eliminate irrelevant input dimensions, and be
computationally efficient to allow for incremental, online learning.
Several methods have been proposed for nonlinear function approximation, such
as Gaussian process regression (Williams & Rasmussen, 1996), support vector
regression (Smola & Schölkopf, 1998) and variational Bayesian mixture models
(Ghahramani & Beal, 2000). However, these global methods tend to be
unsuitable for fast, incremental function approximation. Atkeson, Moore & Schaal
(1997) have shown that in such scenarios, learning with spatially localized
models is more appropriate, particularly in the framework of locally weighted
learning.
In recent years, Vijayakumar & Schaal (2000) have introduced a learning
algorithm designed to fulfill the fast, incremental requirements of locally weighted
learning, specifically targeting high-dimensional input domains through the use of
local projections. This algorithm, called Locally Weighted Projection Regression
(LWPR), performs competitively in its generalization performance with state-of-
the-art batch regression methods. It has been applied successfully to
sensorimotor learning on a humanoid robot for the purpose of executing fast,
accurate movements in a feedforward controller.
The major issue with LWPR is that it requires gradient descent (with leave-one-
out cross-validation) to optimize the local distance metrics in each local
regression model. Since gradient descent search is sensitive to the initial values,
we propose a novel Bayesian treatment of locally weighted regression with
locally linear models that eliminates the need for any manual tuning of meta
parameters, cross-validation approaches or sampling. Combined with variational
approximation methods to allow for fast, tractable inference, this Bayesian
algorithm learns the optimal distance metric value for each local regression
model. It is able to automatically determine the size of the neighborhood data
(i.e., the "bandwidth") that should contribute to each local model. A Bayesian
approach offers error bounds on the distance metrics and incorporates this
uncertainty in the predictive distributions. By being able to automatically detect
relevant input dimensions, our algorithm is able to handle high-dimensional data
sets with a large number of redundant and/or irrelevant input dimensions and a
large number of data samples. We demonstrate competitive performance of our
Bayesian locally weighted regression algorithm with Gaussian Process
regression and LWPR on standard benchmark sets. We also explore extensions
of this locally linear Bayesian algorithm to a real-time setting, to offer a
parameter-free alternative for incremental learning in high-dimensional spaces.
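The building block behind such methods is locally weighted linear regression: fit a weighted linear model around each query point. A minimal 1-D sketch follows; the hand-set Gaussian bandwidth `h` is exactly the kind of meta parameter the Bayesian treatment above learns automatically.

```python
import math

# Locally weighted linear regression sketch: a Gaussian kernel weights the
# data around the query, and a weighted least-squares line is fit there.
# The fixed bandwidth `h` is hand-set here; the Bayesian algorithm in the
# abstract infers the local distance metric instead.

def lwr_predict(xs, ys, query, h=0.3):
    w = [math.exp(-((x - query) ** 2) / (2 * h * h)) for x in xs]
    # Weighted least squares for y ~ b0 + b1 * x (1-D normal equations).
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    denom = sw * swxx - swx * swx
    b1 = (sw * swxy - swx * swy) / denom
    b0 = (swy - b1 * swx) / sw
    return b0 + b1 * query

# Noise-free nonlinear target: the local linear fit tracks it closely.
xs = [i / 20 for i in range(-20, 21)]
ys = [math.sin(x) for x in xs]
print(lwr_predict(xs, ys, 0.5))   # close to sin(0.5)
```

Because each prediction only uses data near the query, the approach stays accurate where the function is locally linear even when the global function is strongly nonlinear.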
Generating Summary Keywords for Emails Using Topics.
Hanna Wallach, University of Cambridge
Email summary keywords, used to concisely represent the gist of an email, can
help users manage and prioritize large numbers of messages. Previous work on
email keyword selection has focused on a two-stage supervised learning system
that selects nouns from individual emails using pre-defined linguistic rules [1]. In
this work we present an unsupervised learning framework for selecting email
summary keywords. A good summary keyword for an email message is not best
characterized as a word that is unique to that message, but a word that relates
the message to other topically similar messages. We therefore use latent
representations of the underlying topics in a user's mailbox to find words that
describe each message in the context of existing topics rather than selecting
keywords based on a single message in isolation. We present and compare
several methods for selecting email summary keywords, based on two well-known
models for inferring latent topics: latent semantic analysis (LSA) and
latent Dirichlet allocation (LDA).
Summary keywords for an email message are generated by selecting the
words that are most topically similar to the words in the email. We use two
approaches for selecting these words, one based on query-document similarity,
and the other based on word association. Each approach may be used in
conjunction with either LSA or LDA. We evaluate keyword quality by generating
summaries for emails from twelve users in the Enron corpus and comparing each
method's performance with a TF-IDF baseline. The quality of the keywords is
assessed using two proxy tasks, in which the summaries are used in place of
whole messages: recipient prediction and foldering. In the recipient prediction
task, the keywords for each email are used to predict the intended recipients of
the current message. In the foldering task, each user's email messages are
sorted into folders using the selected keywords as features. Our topic-based
methods out-perform TF-IDF on both tasks, demonstrating that topic-based
methods yield better summary keywords. By selecting keywords based on user-
specific topics, we find summaries that represent each message in the context of
the entire mailbox, not just that of a single message. Furthermore, combining the
summary for an email with the email's subject improves foldering and recipient
prediction results over those obtained using either summaries or subjects alone.
References:
[1] S. Muresan, E. Tzoukermann, and J. Klavans (2001). Combining
linguistic and machine learning techniques for email
summarization. CONLL.
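The TF-IDF baseline that the topic-based methods are compared against can be sketched directly: score each word in a message by term frequency times inverse document frequency over the mailbox, and keep the top-scoring words. The toy mailbox below is illustrative.

```python
import math
from collections import Counter

# TF-IDF keyword baseline sketch: score words in one message against the
# whole mailbox and return the top-k. The mailbox below is a toy example.

def tfidf_keywords(message, mailbox, k=2):
    n_docs = len(mailbox)
    df = Counter(w for doc in mailbox for w in set(doc.split()))
    tf = Counter(message.split())
    scores = {w: tf[w] * math.log(n_docs / df[w]) for w in tf}
    return [w for w, _ in sorted(scores.items(), key=lambda p: -p[1])[:k]]

mailbox = [
    "meeting agenda for the budget review",
    "budget numbers for the quarterly review",
    "lunch plans for friday",
]
print(tfidf_keywords(mailbox[0], mailbox))
```

Note how TF-IDF favors words unique to the message ("meeting", "agenda") over words shared with topically similar messages ("budget", "review") — precisely the behavior the topic-based methods above are designed to improve on.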
Continuous-State POMDPs with Hybrid Dynamics
Emma Brunskill, MIT
Partially observable Markov decision processes (POMDPs) provide a rich
framework for describing many important planning problems that arise in
situations with hidden state and stochastic actions. Most previous work has
focused on solving POMDPs with discrete state, action and observation spaces.
However, in a number of applications, such as navigation or robotic grasping, the
world is most naturally represented using continuous states. Though any
continuous domain can be described using a sufficiently fine grid, the number of
discrete states grows exponentially with the dimensionality of the underlying state
space. Existing discrete state POMDP algorithms can only scale up to the order
of a few thousand states, beyond which they become computationally infeasible.
Therefore, approaches for dealing efficiently with continuous-state POMDPs are
of great interest.
Previous work (such as [1]) on planning for continuous-state POMDPs has
typically modeled the world dynamics using a single linear Gaussian model to
describe the effects of an action. Unfortunately, this model is not powerful
enough to represent the multi-modal state-dependent dynamics that arise in a
number of problems of interest. For example, in legged locomotion the different
"modes" of walking and running are described best by significantly different
dynamics. We instead employ a hybrid dynamics model for continuous-state
POMDPs that can represent stochastic state-dependent distributions over a
number of different linear dynamic models. We developed a new point-based
approximation algorithm for solving these hybrid-dynamics POMDP planning
problems that builds on Porta et al.'s continuous-state point-based approach [1].
One nice attribute of our algorithm is that by representing the value function and
belief states using a weighted sum of Gaussians, the belief state updates and
value function backups can be computed in closed form. An additional
contribution of our work is a new procedure for constructing a better
approximation of the alpha functions composing the value function. We
conducted experiments on a set of small problems to illustrate how the
representational power of the hybrid dynamics model allows us to address
problems not previously solvable by existing continuous-state approaches. In
addition, we examined the toy problem of a simulated robot searching blindly (no
observations) for a power supply in a long hallway. This problem requires a
variable level of representational granularity in order to perform well. Here our
hybrid continuous-state planner outperforms a discrete state POMDP planner,
demonstrating the potential of continuous-state approaches.
[1] J. Porta, M. Spaan, N. Vlassis, and P. Poupart. Point-based value iteration
for continuous POMDPs. Journal of Machine Learning Research, 7:2329-2367,
2006.
Clustering Social Networks
Isabelle Stanton, University of Virginia
Social networks have gained popularity recently with the advent of sites such as
MySpace, Friendster, Facebook, etc. The number of users participating in these
networks is large, e.g., a hundred million in MySpace, and growing. These
networks are a rich source of data as users populate their sites with personal
information. Of particular interest in this paper is the graph structure induced by
the friendship links.
A fundamental problem related to these networks is the discovery of clusters or
communities. Intuitively, a cluster is a collection of individuals with dense
friendship patterns internally and sparse friendships externally. There are many
reasons to seek tightly-knit communities in networks, for instance, target
marketing schemes can be designed based on clusters and terrorist cells can be
uncovered.
Existing clustering criteria are limited in that clusters typically do not overlap, all
vertices are clustered and/or external sparsity is ignored. We introduce a new
criterion that overcomes these limitations by combining internal density with
external sparsity in a natural way. Our criterion does not require a strict
partitioning of the data, which is particularly important in social networks, where
one user may be a member of many communities.
This work focuses on the combinatorial properties of the new criterion. In
particular, we bound the amount that clusters can overlap, as well as find a loose
bound for the number of clusters in a graph. From these properties we have
developed deterministic and randomized algorithms for provably finding the
clusters, provided there is a sufficiently large gap between internal density and
external sparsity. Finally, we perform experiments on real social networks that
illustrate the effectiveness of the algorithm.
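The two quantities the criterion combines can be measured directly on a friendship graph; the sketch below shows one natural way to compute each (the paper's exact combination of the two is not reproduced here).

```python
# Sketch of the two ingredients of the clustering criterion: internal edge
# density of a candidate cluster and sparsity of its boundary. One natural
# definition of each is shown; the paper's exact criterion is not reproduced.

def internal_density(adj, cluster):
    """Fraction of possible within-cluster edges that are present."""
    nodes = list(cluster)
    n = len(nodes)
    possible = n * (n - 1) // 2
    edges = sum(1 for i in range(n) for j in range(i + 1, n)
                if nodes[j] in adj[nodes[i]])
    return edges / possible if possible else 1.0

def external_sparsity(adj, cluster):
    """Fraction of cluster-to-outside edge slots that are unused."""
    outside = set(adj) - cluster
    possible = len(cluster) * len(outside)
    crossing = sum(1 for u in cluster for v in adj[u] if v in outside)
    return 1 - crossing / possible if possible else 1.0

# Two triangles joined by a single bridge edge.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(internal_density(adj, {0, 1, 2}), external_sparsity(adj, {0, 1, 2}))
```

The triangle {0, 1, 2} is fully dense internally and touches the rest of the graph through only one of nine possible crossing edges, so it scores well on both measures.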
Improvement in Performance of Learning Using Scaling
Soumi Ray, University of Maryland Baltimore County
Reinforcement learning often requires many training iterations to get an optimal
policy. We are interested in trying to speed up learning in a domain using scaling,
which works as follows: partial learning is performed to learn a sub-optimal action
value function, Q, in the domain using standard Q-learning for a few iterations.
Q-values of Q are then multiplied by a constant factor to scale the Q-values.
Then learning continues using the scaled Q-values of the new Q-table as the
initial values. Surprisingly, in many situations this scaling significantly reduces the
number of iterations required to learn compared to learning without scaling.
We can summarize our method of scaling in the following steps:
1. Partial learning is done in the domain.
2. The Q-values of the partially learned domain are scaled, using a scaling factor
decided manually.
3. Finally learning in the domain is carried out using the new scaled Q-values.
This method can reduce the number of steps required to learn in the domain
compared to learning without scaling. Two important aspects of scaling are the
scaling factor and the time of scaling. If the scaling factor and the time of scaling
are chosen correctly then we can get great improvements in the performance of
learning in a domain. We have used 10×10 grid world domains with the starting
position at the top left corner and the goal at the bottom right corner to run our
experiments.
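The three steps above can be sketched on a small grid world; the grid size, scaling factor, and learning parameters below are illustrative, not those of the actual experiments.

```python
import random

# Sketch of the scaling procedure on a small deterministic grid world:
# (1) run Q-learning for a few episodes, (2) multiply all Q-values by a
# hand-chosen constant, (3) continue learning from the scaled table.
# Grid size, scaling factor, and learning parameters are illustrative.

SIZE = 5
ACTIONS = [(0, 1), (1, 0), (0, -1), (-1, 0)]
GOAL = (SIZE - 1, SIZE - 1)

def step(state, action):
    r, c = state
    dr, dc = action
    new = (min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1))
    return new, (1.0 if new == GOAL else -0.01)   # small per-step cost

def q_learn(q, episodes, rng, alpha=0.5, gamma=0.9, eps=0.2):
    for _ in range(episodes):
        s = (0, 0)
        while s != GOAL:
            if rng.random() < eps:
                a = rng.randrange(4)                    # explore
            else:
                a = max(range(4), key=lambda i: q[s][i])  # exploit
            s2, reward = step(s, ACTIONS[a])
            q[s][a] += alpha * (reward + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

rng = random.Random(0)
q = {(r, c): [0.0] * 4 for r in range(SIZE) for c in range(SIZE)}
q = q_learn(q, 20, rng)                                     # step 1: partial learning
q = {s: [v * 2.0 for v in vals] for s, vals in q.items()}   # step 2: scale
q = q_learn(q, 20, rng)                                     # step 3: continue learning
print(max(max(q[(4, 3)]), max(q[(3, 4)])))   # a goal neighbor has learned positive value
```

The hand-chosen factor of 2.0 here mirrors step 2 of the method, where the scaling factor is decided manually; as the abstract notes, both the factor and the time of scaling must be chosen well for the speed-up to appear.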
A Theory of Similarity Functions for Clustering
Maria-Florina Balcan, Carnegie Mellon University
Problems of clustering data from pairwise similarity information are ubiquitous in
Computer Science. Theoretical treatments typically view the similarity information
as ground-truth and then design algorithms to (approximately) optimize various
graph-based objective functions. However, in most applications, this similarity
information is merely based on some heuristic: the true goal is to cluster the
points correctly rather than to optimize any specific graph property. In this work,
we initiate a theoretical study of the design of similarity functions for clustering
from this perspective. In particular, motivated by recent work in learning theory
that asks "what natural properties of a similarity function are sufficient to be able
to learn well?" we ask "what natural properties of a similarity function are
sufficient to be able to cluster well?"
We develop a notion of the clustering complexity of a given property (analogous
to notions of capacity in learning theory), that characterizes its information-
theoretic usefulness for clustering. We then analyze this complexity for several
natural game-theoretic and learning-theoretic properties, as well as design
efficient algorithms that are able to take advantage of them. We consider two
natural clustering objectives: (a) list clustering: analogous to the notion of list-
decoding, the algorithm can produce a small list of clusterings (which a user can
select from), and (b) hierarchical clustering: the algorithm produces a tree, and
the desired clustering is some pruning of it (which a user could navigate). Our
algorithms for hierarchical
clustering combine recent learning-theoretic approaches with linkage-style
methods.
This is joint work with Avrim Blum and Santosh Vempala.
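One of the linkage-style building blocks mentioned above can be sketched concretely: single linkage repeatedly merges the two most similar clusters, taking the similarity between clusters to be the maximum pairwise similarity. The similarity function below is illustrative.

```python
# Single-linkage sketch: repeatedly merge the two most similar clusters,
# where cluster similarity is the maximum pairwise similarity. The input
# is a symmetric pairwise-similarity function, as in the abstract's setting;
# every pruning of the resulting merge tree is a candidate clustering.

def single_linkage_tree(items, sim):
    clusters = [frozenset([x]) for x in items]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = max(sim(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        merged = clusters[i] | clusters[j]
        merges.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Two tight groups, {0.0, 0.1} and {1.0, 1.1}, weakly similar across groups.
points = [0.0, 0.1, 1.0, 1.1]
merges = single_linkage_tree(points, lambda a, b: -abs(a - b))
print(merges[0], merges[1])
```

If the similarity function satisfies a suitable property (e.g., within-group similarities dominate cross-group ones, as here), the two true groups appear as nodes of the tree before the final merge.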
Spotlights
Advancing Associative Classifiers - Challenges and Solutions
Luiza Antonie, University of Alberta
In recent years, associative classifiers, classifiers that use association rules,
have started to attract attention. An important advantage that these classification
systems bring is that, using association rule mining, they are able to examine
several features at a time, while other state-of-the-art methods, like decision
trees or naive Bayesian classifiers, assume that each feature is independent of
the others. However, in real-life applications, the independence assumption is
not necessarily true, and it has been shown that correlations and co-occurrence of
features can be very important. In addition, the associative classifiers can handle
a large number of features, while other classification systems do not work well for
high-dimensional data. The associative classification systems have proved to
perform as well as, or even better than, other techniques in the literature. The
associative
classifiers are models that can be read, understood, modified by humans and
thus can be manually enriched with domain knowledge.
We have proposed the integration of new types of association rules and new
methods to reduce the number of rules in the model. In our research work we
studied the behaviour of associative classifiers when negative association rules,
maximal and closed itemsets are employed. These types of association rules
have not been used in associative classifiers before, thus bringing new
challenges and opportunities to our work. Given that one advantage of the
classifiers based on association rules is their readability, another direction that
we investigated is reducing the number of association rules used in the
classification model. Pruning rules not only improves readability, but may also
reduce overfitting of the model. Another challenge is the use of rules in
the classification stage. We proposed a new technique where the system
automatically learns how to use the rules.
Many applications can benefit from a good classification model. Given the
readability of the associative classifiers, they are especially fit for applications
where the model may assist domain experts in their decisions. The medical field is
a good example where such applications may appear. Let us consider an example
where a physician has to examine a patient. There is a considerable amount of
information associated with the patient (e.g. personal data, medical tests, etc.). A
classification system can assist the physician in this process. The system can
predict if the patient is likely to have a certain disease or present incompatibility
with some treatments. Considering the output of the classification model, the
physician can make a better decision on the treatment to be applied to this
patient. Given the transparency of our model, a health practitioner can
understand how the classification model reached its decision.
Real-life applications are usually characterized by unbalanced datasets. Classes
of interest may be under-represented, thus making the discovery of knowledge
associated with them harder. We evaluated the performance of our system
under these difficult conditions. We studied the performance of our classification
model on real-life applications (mammography classification, text categorization,
preterm birth prediction) where the classes of interest are typically under-
represented.
This is joint work with my supervisors, Osmar R. Zaiane and Robert C.
Holte.
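The readability argument above is easy to illustrate: a classifier built from association rules is just a list of (antecedent itemset, class, confidence) triples, and a prediction can be traced back to the single rule that fired. The rules below are invented for illustration, not mined from real data.

```python
# Sketch of prediction with an associative classifier: each rule is
# (antecedent itemset, class, confidence); an instance is labeled by the
# highest-confidence rule whose antecedent it satisfies. The rules here
# are illustrative, not mined from real data.

def classify(rules, instance, default="unknown"):
    matching = [(conf, label) for ante, label, conf in rules
                if ante <= instance]            # antecedent must be a subset
    if not matching:
        return default
    return max(matching)[1]                     # highest-confidence rule wins

rules = [
    ({"fever", "cough"}, "flu", 0.90),
    ({"cough"}, "cold", 0.60),
    ({"rash"}, "allergy", 0.80),
]
print(classify(rules, {"fever", "cough", "fatigue"}))   # -> flu
print(classify(rules, {"cough"}))                       # -> cold
```

Because the model is a readable rule list, a domain expert can inspect exactly which antecedent produced a given prediction, or manually add and remove rules to inject domain knowledge.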
Learning to Predict Prices in a Supply Chain Management Game
Shuo Chen, UC Berkeley
Economic decisions can benefit greatly from accurate predictions of market
prices, but making such predictions is a difficult problem and an area of active
research. In this paper, we present and compare several techniques for
predicting market prices that we have employed in the Trading Agent Competition
Supply Chain Management (TAC SCM) Prediction Challenge. These strategies
include simple heuristics and various machine learning approaches, such as
simple perceptrons and support vector regression. We show that the heuristic
methods are very good, especially for predicting current prices, but that the
machine learning techniques may be more appropriate for future price
predictions.
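One of the simpler learned predictors in the family mentioned above is an online linear model trained by stochastic gradient descent on (feature, price) pairs; the sketch below uses a toy one-feature market, not the features or parameters of the actual TAC SCM agent.

```python
# Online linear price predictor sketch: stochastic gradient descent on
# squared error over (feature, price) pairs. The toy data and learning
# rate are illustrative, not those used in the actual TAC SCM agent.

def sgd_linear(data, lr=0.1, epochs=200):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x        # gradient of squared error w.r.t. w
            b -= lr * err            # gradient of squared error w.r.t. b
    return w, b

# Toy market: price falls linearly with available inventory (x).
data = [(0.0, 10.0), (0.5, 8.0), (1.0, 6.0)]
w, b = sgd_linear(data)
print(w, b)   # approaches the underlying line price = 10 - 4x
```

Refitting such a model online as new transactions arrive is one way a learned predictor can adapt to a moving market, at the cost of more tuning than the simple heuristics the abstract finds competitive for current prices.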
Sonar Terrain Mapping with BDI Agents
Shivali Gupta, University of Maryland, Baltimore County
Mapping a constantly changing environment is a challenge that necessitates a
team of agents working together. These agents must continually explore the
terrain and assemble the map in a distributed fashion. In real-world instances of
this problem, such as surveillance, agents have limited sensor and communication
ranges, further compounding the problem.
Our solution is to create multiple "Explorer" agents and a centralized "Base
station" agent using the BDI architecture. The BDI architecture provides a
framework for agents that have their individual beliefs, desires and intentions
(goals). The environment is rife with uncertainty given its continually changing
nature, which makes the BDI architecture well suited to this problem. Mobile Explorer
agents have limited range of communication and partial observability of the
environment. The Base station agent is stable and it maintains the global map of
the environment from the information of the Explorer agents. Explorer agents use
the Base station’s global map (its beliefs about the world) to decide which area to
explore next, and after exploration they send their updated map to the Base
station agent. The Base station agent merges its copy with the information
received from the explorer agent. The Explorer agents must stay within
communication range of each other to maintain a complete communication
network between all agents and the base station.

The system models the environment as a grid of cells, and the Base station
assigns each cell a "Curiosity level" based on how long it has been since that
region was explored. A higher curiosity level implies that the cell has not been
explored recently. Therefore, the curiosity level drives exploration toward
regions of uncertainty. Explorer agents calculate a force vector,

force_vector = Σ (distance_based_penalty ∗ curiosity_value ∗ unit_vector)   (1)
(summed over every cell)

where distance_based_penalty is the inverse of the manhattan distance of cells
from agents, to find the direction to explore. This calculation ensures that not
all the agents move in one direction at the same time. One of the major
advantages of this distributed approach is that a failure of an agent does not
affect the system in general. If an Explorer agent fails, then the other agents
can still continue to explore the environment.

The results show that more agents prevent the average curiosity level from
rising at a fast pace, and the average curiosity level eventually stabilizes after
a limited number of Explorer agents explore the map. Another result shows that
a distance penalty based on the manhattan distance provides a better solution
than a penalty based on euclidian distance, which localizes the search
procedure, because it allows Explorer agents to explore the local area around
them as well as the outer edges of the map. In our future work, we are
interested in adding a learning mechanism to the algorithm that would enable
Explorer agents to predict the changing behavior of the environment and how to
explore it optimally. Learning would also enable Explorer agents to avoid
obstacles in their environment.
Explorer agents to explore the local area around them, as well as the outer
edges of the map in comparison to a penalty based on euclidian distance which
localizes the search procedure. In our future work, we are interested in adding a
learning mechanism to the algorithm which would enable Explorer agents to
predict the changing behavior of the environment and how to explore it optimally.
Learning would also enable Explorer agents to avoid obstacles in their
environment.
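The force-vector calculation of Equation (1) can be sketched as follows (a minimal illustration; the NumPy grid representation and function name are our own assumptions, not details from the abstract):

```python
import numpy as np

def force_vector(agent_pos, curiosity):
    """Sum, over every cell, of (1 / Manhattan distance) * curiosity * unit
    vector pointing from the agent toward that cell, per Equation (1)."""
    ax, ay = agent_pos
    force = np.zeros(2)
    for (cx, cy), value in np.ndenumerate(curiosity):
        d = abs(cx - ax) + abs(cy - ay)            # Manhattan distance
        if d == 0:
            continue                               # the agent's own cell exerts no pull
        offset = np.array([cx - ax, cy - ay], dtype=float)
        unit = offset / np.linalg.norm(offset)     # unit vector toward the cell
        force += (1.0 / d) * value * unit          # distance penalty * curiosity
    return force

# An agent at (0, 2) on a 5x5 grid with one highly curious cell at (4, 2)
# is pulled in the +x direction.
grid = np.zeros((5, 5))
grid[4, 2] = 8.0
f = force_vector((0, 2), grid)
```

Because the penalty decays with distance, a nearby moderately curious region can outweigh a distant very curious one, which is what keeps the agents from all converging on the same spot.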
Online Learning for Offroad Robots
Raia Hadsell, NYU
We present a learning-based solution to the problem of long-range obstacle
detection in autonomous robots. The system uses sparse traversability
information from a stereo module to train a classifier online. The trained classifier
can then predict the traversability of the entire scene. This learning strategy is
called self-supervised, near-to-far learning, and, if it is done in an online manner,
it allows the robot to adapt to changing environments and still accurately predict
the traversability of distant areas.
A distance-normalized image pyramid makes it possible to efficiently train on
each frame seen by the robot, using large windows that contain contextual
information as well as shape, color, and texture. Traversability labels are initially
obtained for each target using a stereo module, then propagated to other views
of the same target using temporal and spatial concurrences, thus training the
classifier to be view-invariant. A ring buffer simulates short-term memory and
ensures that the discriminative learning is balanced and consistent. This long-
range obstacle detection system sees obstacles and paths at 30-40 meters, far
beyond the maximum stereo range of 12 meters, and adapts very quickly to new
environments.
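The ring-buffer idea can be sketched as follows (a hypothetical illustration; the per-label buffer size and two-class balancing scheme are our assumptions, not details from the talk):

```python
from collections import deque

class BalancedRingBuffer:
    """Short-term memory for online training: one fixed-size ring per label,
    so traversable and non-traversable samples stay balanced as old frames
    expire from the buffer."""

    def __init__(self, capacity_per_label=500):
        self.rings = {}
        self.capacity = capacity_per_label

    def add(self, sample, label):
        # deque(maxlen=...) silently discards the oldest sample when full.
        self.rings.setdefault(label, deque(maxlen=self.capacity)).append(sample)

    def training_batch(self):
        """Equal contribution from every label, however skewed the stream is."""
        if not self.rings:
            return []
        n = min(len(r) for r in self.rings.values())
        if n == 0:
            return []
        return [(s, lbl) for lbl, r in self.rings.items() for s in list(r)[-n:]]

buf = BalancedRingBuffer(capacity_per_label=5)
for i in range(10):
    buf.add(i, 1)       # many "traversable" samples
for i in range(3):
    buf.add(i, 0)       # few "obstacle" samples
batch = buf.training_batch()
```

Capping each label's ring separately is one simple way to keep the online discriminative update from being swamped by whichever class dominates the current terrain.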
Experiments were run on the LAGR (Learning Applied to Ground Robots) robot
platform. Both the robot and the reference "baseline" software were built by
Carnegie Mellon University and the National Robotics Engineering Center. In this
program, in which all participants are constrained to use the given hardware, the
goal is to drive from a given start to a predefined (GPS) goal position through
unknown, offroad terrain using only passive vision. Both qualitative and
quantitative results are given by comparing the field performance of the robot
with and without learning-based, long-range vision enabled.
Posters
Untitled
Mair Allen-Williams, University of Southampton
Two particular challenges faced by agents within dynamic, uncertain multi-agent
systems are learning and acting in uncertain environments, and coordination with
other agents about whom they may have little or no knowledge. Although
uncertainty and coordination have each been tackled as separate problems,
existing formal models for an integrated approach make a number of simplifying
assumptions, and often have few guarantees. In this report we explore the
extension of a Bayesian learning model into partially observable multi-agent
domains. In order to implement such a model practically we make use of a
number of approximation techniques. In addition to traditional methods such as
repair sampling and state clustering, we apply graphical inference methods within
the learning step to propagate information through partially observable nodes.
We demonstrate the scalability of this approach with an ambulance rescue
problem inspired by the Robocup Rescue system.
Supervised Learning by Training on Aggregate Outputs
Janara Christensen, Carleton College
Supervised learning is a classic data mining problem where one wishes to be
able to predict an output value associated with a particular input vector. We
present a new twist on this classic problem where, instead of having the training
set contain an individual output value for each input vector, the output values in
the training set are only given in aggregate over a number of input vectors. This
new problem arose from a particular need in learning on mass spectrometry
data, but could easily apply to situations when data has been aggregated in order
to maintain privacy. We provide a formal description of this new problem for both
classification and regression. We then examine how k-nearest neighbor, neural
networks, and support vector machines can be adapted for this problem.
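One simple adaptation of a learner to aggregate outputs can be sketched with linear regression (our own toy formulation, not necessarily the authors' adaptation): fit the parameters so that the mean prediction over each bag of input vectors matches that bag's aggregate output, which for a linear model reduces to ordinary least squares on per-bag mean feature vectors.

```python
import numpy as np

def fit_on_aggregates(bags, aggregates):
    """bags: list of (n_i x d) arrays of input vectors; aggregates[i] is the
    mean output for bag i. For a linear model, mean prediction over a bag
    equals prediction at the bag's mean features, so we regress on those."""
    X = np.array([bag.mean(axis=0) for bag in bags])      # one row per bag
    y = np.asarray(aggregates)
    A = np.column_stack([X, np.ones(len(X))])             # append intercept column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w                                              # weights + intercept

def predict(w, x):
    return x @ w[:-1] + w[-1]

# Synthetic check: individual outputs follow y = 2x + 1, but only bag means
# are observed during training.
rng = np.random.default_rng(0)
bags = [rng.normal(size=(4, 1)) for _ in range(20)]
aggs = [float((2 * b[:, 0] + 1).mean()) for b in bags]
w = fit_on_aggregates(bags, aggs)
```

The same "match the aggregate" idea extends to nonlinear models such as neural networks by back-propagating a loss on the averaged predictions of a bag.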
Disparate Data Fusion for Protein Phosphorylation Prediction
Genetha Gray, Sandia National Labs
New challenges in knowledge extraction include interpreting and classifying data
sets while simultaneously considering related information to confirm results or
identify false positives. We discuss a data fusion algorithmic framework targeted
at this problem. It includes separate base classifiers for each data type and a
fusion method for combining the individual classifiers. The fusion method is an
extension of current ensemble classification techniques and has the advantage
of allowing data to remain in heterogeneous databases. In this poster, we focus
on the applicability of such a framework to the protein phosphorylation prediction
problem and show some numerical results.
Real Boosting a la Carte with an Application to Boosting Oblique Decision Trees
Claudia Henry, Université des Antilles et de la Guyane
In the past ten years, boosting has become a major field of machine learning and
classification. We bring contributions to its theory and algorithms. We first unify a
well-known top-down decision tree induction algorithm due to Kearns and
Mansour, and discrete AdaBoost, as two versions of the same higher-level
boosting algorithm. It may be used as the basic building block to devise simple
provable boosting algorithms for complex classifiers. We provide one example:
the first boosting algorithm for Oblique Decision Trees, an algorithm which turns
out to be simpler, faster and significantly more accurate than previous
approaches.
Multimodal Integration for Multiparty Dialogue Understanding: A Machine
Learning Framework
Pei-Yun Sabrina Hsueh, University of Edinburgh
Recent advances in recording and storage technologies have led to huge
archives of multimedia conversational speech recordings in widely ranging areas,
such as clinical use, online sharing services, and meeting analysis. While it is
straightforward to replay such recordings, finding information from the often
lengthy archives has become more difficult. It is therefore essential to provide
sufficient aids to guide users through the recordings and to point out the most
important events that need their attention. My research concerns
how to infer human communicative intention from low-level audio and video
signals. In particular, I focus on identifying multimodal integration patterns (e.g.,
people tend to speak more firmly and to address the whole group more often
when they are making decisions) in human conversations, using approaches
ranging from statistical analysis and empirical study to machine learning.
Past research has shown that the identified multimodal integration patterns are
useful for recognizing local speaker intention in recorded speech, such as speech
disfluency (e.g., false starts). My research attempts to recover speaker intentions
that serve a more global communicative goal, such as "initiate-discussion" and
"reach-decision." A learning framework that can identify characteristic features of
different semantic classes has been developed. This framework has been proven
to be useful for automatic topic segmentation (and labeling) and automatic
decision detection. The ultimate goal of this research is to enhance the current
browsing and search utilities of multimedia archives.
A POMDP for Automatic Software Customization
Bowen Hui, University of Toronto
Providing personalized software for individuals has the potential to increase work
productivity and user satisfaction. In order to accommodate a wide variety of user
needs, skills, and preferences, today's software is typically packed with
functionality suitable for everyone. As a result, the interface is complicated,
functionalities go unexplored and hence unused, and users are dissatisfied
with the product. Many attempts in the user adaptive systems literature have
explored ways to customize software according to the inferred user needs.
Recent probabilistic approaches model the uncertainty in the application domain
and typically optimize single objective functions, i.e., helping the user complete a
task faster or interact with the interface more easily, but not both. A few exceptions
exist that provide a principled treatment to modeling the uncertainty and the
tradeoffs that are needed to satisfy multiple objectives. Nevertheless, existing
work has done little to address three important issues:
* the interaction principles that govern the nature of the problem's objective
functions
* the hidden user variables that explain observed preferences and behaviour
* the value of information available in the repeated, sequential nature of the
interaction between the user and the system
We are interested in designing a software agent that assists the user by adapting
the interface and suggesting task completion help. In particular, the sequential
nature of the human-computer interaction (HCI) naturally lends itself as a partially
observable Markov decision process (POMDP). We propose to develop a
customization POMDP that learns the type of user it is dealing with and adapts
its behaviour in order to maximize expected rewards formulated by the
interaction principles for that specific user. Overall, modeling the automatic
customization problem as a POMDP enables the system to take optimal actions
with respect to the value of information gain of an exploratory action and the
immediate rewards obtained by exploitation. This approach provides a decision-
theoretic treatment to balancing the opportunities to learn about the user versus
exploiting what the system already knows about the user.
This work pools together techniques and insights from artificial intelligence and
machine learning to construct and solve the POMDP. Specifically, we adopt
methods from the Bayesian user modeling literature to construct a generic user
model, the activity recognition literature to build a goal model of user activities,
the HCI literature to formulate the reward model specifying user objectives, the
preference elicitation literature to learn the user's utility function for adaptive
systems, and the machine learning literature to populate model parameters with
incomplete data and to do approximate inference. In addition to the development
of the novel user model and reward model, a major contribution here is
demonstrating that the customization POMDP is able to model real world
applications tractably and is able to adapt to different types of users quickly.
Using Probabilistic Graphical Models in Bio-Surveillance Research
Masoumeh Izadi, McGill University
Artificial intelligence methods can support and assist optimal use of clinical and
administrative knowledge from diverse perspectives, ranging from diagnostic
assistance and detection of epidemics to improved efficiency of health care
delivery processes.
Probabilistic graphical models have been successfully used for many medical
problems. We describe a decision support system in public health bio-
surveillance research. A long line of research has shown that current outbreak
detection methods are ineffective; they both raise false alarms and miss attacks.
Our approach tries to bring us closer to an effective detection system that detects
real attacks and only those. We show how Partially Observable Markov Decision
Processes (POMDPs) can be applied to outbreak detection methods to improve
the alarm function in the case of anthrax. Our results show that this
method significantly outperforms existing solutions, in terms of both sensitivity
and timeliness.
Incorporating a New Relational Feature in Online Handwritten Character
Recognition
Sara Izadi, Concordia University
Artificial neural networks have shown good capabilities in performing
classification tasks. However, classifier models used for learning in pattern
classification are challenged when the differences between the patterns of the
training set are small. Therefore, the choice of effective features is
mandatory for reaching a good performance. Statistical and geometrical features
alone are not suitable for recognition of hand printed characters due to variations
in writing styles that may result in deformations of character shapes. We address
this problem by using a relational context feature combined with a local
descriptor for training a neural network-based recognition system in a user-
independent online character recognition application. Our feature extraction
approach provides a rich representation of the global shape characteristics, in a
considerably compact form. This new relational feature generally provides a
higher distinctiveness and robustness to character deformations, thus potentially
increasing the recognition rate in a user-independent system. While enhancing
the recognition accuracy, the feature extraction is computationally simple. We
show that the ability to discriminate in handwriting characters is increased by
adopting this mechanism which provides input to the feed forward neural network
architecture. Our experiments on Arabic character recognition show results
comparable with state-of-the-art methods for online recognition of these
characters.
Description Length and the Multiple Motif Problem
Anna Ritz, Brown University
Protein interactions drive many biological functions in the cell. A source protein
can interact with several proteins; the specificity of this interaction is partly
determined by the sequence around the binding site. In the 20-letter alphabet of
protein sequences (denoting the 20 amino acids), a motif is a pattern that
describes these binding preferences for a given protein. The motif-finding
problem is to extract a motif from a set of sequences that interact with a given
protein. The problem is solved by identifying statistically enriched patterns in this
foreground set compared to a background set of non-interacting sequences.
Finding such patterns is well-studied in Computational Biology.
Recent advances in technology require us to rethink the approach to the motif-
finding problem. Mass spectrometry, for example, allows high-throughput
measurements of multiple proteins interacting simultaneously. This creates a
foreground set that is a mixture of motifs. The Multiple Motif problem is described
as follows: find a collection of motifs, called a motif model, that best describes the
foreground. The motif model is empty if the background distributions describe the
foreground better than any set of patterns.
A few algorithms to find multiple motifs exist, but they use either overly simplistic
or overly descriptive motif representations. Overly simplistic motifs provide limited
information about the structure of the data, while overly descriptive motifs use
many parameters that require unrealistically large datasets. We use a
representation between these extremes: some positions in a motif are exact,
while others are restricted to a few letters.
When comparing motif models, we want to know which model best describes the
foreground. We use description length as a metric. Our goal is to learn
the motif model that produces the most compact representation of the foreground
by minimizing description length. Using minimum description length in this
context circumvents some of the limitations of other representations. Each motif
in the model must contribute to describing the foreground as concisely as
possible, avoiding both redundancy and overfitting. Description length also gives
a criterion for merging multiple exact motifs into a single, inexact motif, a task
that is often ambiguous in other algorithms.
We describe the use of minimum description length to filter the results of known
algorithms and to discover novel motifs in synthetic and real datasets.
This is joint work with Benjamin Raphael and Gregory Shakhnarovich at Brown
University.
Machine Translation with Self Organized Maps
Aparna Subramanian, University of Maryland, Baltimore County
I am investigating the idea of using Self Organizing Maps for the purposes of
Machine Translation. Human translators seem to translate based on their
knowledge of what words/phrases of one language best represent the translation
of the word/phrase in another. While choosing these word/phrase equivalents,
they rely on similarity in the underlying concept to which the two words/phrases
in different languages correspond. This gives a good reason for a machine
translation system to do something similar, i.e. translating at a conceptual level.
Conceptual relativism of languages indicates a good source to parameterize
concepts for the purpose of translation. Self Organizing Maps (SOM) can be
used to formalize such concept categories and improve them by learning over
time. Contextual information can also be captured in SOMs and be used for
translation. The major challenges in the practical application of SOMs to problems
such as translation, which require large vectors of concepts to be stored and
processed, are speed and space. These can be addressed in at least two ways:
SOMs stored and processed as a hierarchy of concepts, and SOMs maintained
as different modules, each catering to a group of similar concepts. I plan to
further investigate the feasibility of these methods.
One approach for translation therefore is to average over the contextual
relevance of the given piece, e.g. sentence, over the whole conversation or text
in the source language under consideration. This can be done using a SOM for
contexts which learns with every input sentence in the text. The mapping of the
input sentence in the SOM can then be used as input to the Word Category Map
of the source language. The output(s) of this exercise can be the input to the
target language Word Category Map. The words/phrases that are the outcome of
this step can be organized into a sentence using the context SOM for the target
language and can be aided by the knowledge of the grammar for the target
language.
The investigation is in its initial stages, though the idea appears promising
because this kind of translation system has the capacity to evolve through
learning and takes care of pragmatics of the input. The approach also seems
viable since there have been attempts in the past to use SOM for Natural
Language Processing in general. The present work will be significant, as the use
of Self Organizing Maps for Machine Translation does not appear to have
been explored, though it has been indicated as a possibility in previous works.
Policy Recognition for Multi-Player Tactical Scenarios
Gita Sukthankar, University of Central Florida
This research addresses the problem of recognizing policies given logs of battle
scenarios from multi-player games. The ability to identify individual and team
policies from observations is important for a wide range of applications including
automated commentary generation, game coaching, and opponent modeling.
We define a policy as a preference model over possible actions based on the
game state, and a team policy as a collection of individual policies along with an
assignment of players to policies. Given a sequence of input observations, O,
(including observable game state and player actions), a set of player policies, P,
and team policies, T, the goal is to identify the individual policies p that were
employed during the scenario.
A team policy is an allocation of players to tactical roles and is typically arranged
prior to the scenario as a locker-room agreement. However, circumstances
during the battle (such as the elimination of a teammate or unexpected enemy
reinforcements) can frequently force players to take actions that were a priori
lower in their individual preference model. In particular, one difference between
policy recognition in a tactical battle and typical plan recognition is that agents
rarely have the luxury of performing a pre-planned series of actions in the face of
enemy threat. This means that methods that rely on temporal structure, such as
Dynamic Bayesian Networks (DBNs) and Hidden Markov Models (HMMs), are not
necessarily well-suited to this task. An additional challenge is that, over the
course of a single scenario, one only observes a small fraction of the possible
game states, which makes policy learning difficult.
This research explores a model-based system for combining evidence from
observed events using the Dempster-Shafer theory of evidential reasoning. The
primary benefit of this approach is that the model generalizes easily to different
initial starting states (scenario goals, agent capabilities, number and composition
of the team). Unlike traditional probability theory where evidence is associated
with mutually-exclusive outcomes, the Dempster-Shafer theory quantifies belief
over sets of events. We evaluate our Dempster-Shafer based approach on logs
of real and simulated games played using Open Gaming Foundation d20, the
rule system used by many popular tabletop games, including Dungeons and
Dragons, and compute the average accuracy over the set of battles for each of
three rules of combination.
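Dempster's rule of combination, which underlies this kind of evidence fusion, can be sketched as follows (a generic textbook implementation, not the authors' code; the example policies over a small frame of discernment are hypothetical):

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule: m(A) = (1/(1-K)) * sum over B∩C=A of m1(B)*m2(C),
    where K is the total mass on conflicting (empty) intersections.
    Masses are dicts mapping frozensets of outcomes to belief mass."""
    combined, conflict = {}, 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        a = b & c
        if a:
            combined[a] = combined.get(a, 0.0) + mb * mc
        else:
            conflict += mb * mc                  # mass lost to contradiction
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {a: m / (1.0 - conflict) for a, m in combined.items()}

# Two pieces of evidence about a player's policy over {"attack", "defend", "flee"}.
# Note that mass sits on *sets* of policies, not just single outcomes.
theta = frozenset({"attack", "defend", "flee"})
m1 = {frozenset({"attack", "defend"}): 0.8, theta: 0.2}
m2 = {frozenset({"attack"}): 0.6, theta: 0.4}
m = combine(m1, m2)
```

Assigning belief to sets like {attack, defend} is exactly what lets the model stay agnostic between policies that current observations cannot yet distinguish.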
Advice-based Transfer in Reinforcement Learning
Lisa Torrey, University of Wisconsin
This report is an overview of our work on transfer in reinforcement learning using
advice-taking mechanisms. The goal in transfer learning is to speed up learning
in a target task by transferring knowledge from a related, previously learned
source task. Our methods are designed to do so robustly, so that positive transfer
will speed up learning but negative transfer will not slow it down. They are also
designed to allow human teachers to provide simple guidance that increases the
benefit of transferred knowledge. These methods allow us to push the boundaries
of current work in this area and perform transfer between complex and dissimilar
tasks in the challenging RoboCup simulated soccer domain.
Determining a Relationship Between Two Distinct Atmospheric Data Sets of
Different Granularities
Emma Turetsky, Carleton College
Regression analysis is a classic data mining problem with many real-world
applications. We present several methods of using data mining and statistical
analysis to find a relationship between two different data sets: atmospheric
particles (and their elemental constituents) and elemental carbon (EC).
Specifically, we wish to determine which elements in the atmosphere cause
elemental carbon, something that is common in industrial zones and large cities
and can normally be found in exhaust fumes and areas where there is visible
carbon. In order to do this, we used machine learning regression algorithms
including SVM regression and Lasso regression as well as regular linear
regression. We've created several models that correlate specific elements with
the amount of elemental carbon in the atmosphere.
Inferring causal relationships between genes from steady state observations and
topological ordering information
Xin Zhang, Arizona State University
The development of high-throughput genomic technologies, such as cDNA
microarray and oligonucleotide chips, empowers researchers to reveal gene
interactions. Mathematical modeling and in-silico simulation can be used to
analyze gene interactions unambiguously, and to predict the network dynamic
behavior in a systematic way. Various network inference models have been
developed to identify gene regulatory networks using gene expression data, but
none of them are about inferring causal relationships between genes, which is a
very important issue in systems biology. Among the developed methods, the
Inductive Causation (IC) algorithm has been proven to be effective for inferring
causal relationships among variables. However, simulation study in the context of
gene regulatory network shows that the IC algorithm, which uses only one single
data source, results in low precision and recall rates. To improve the
performance, we propose a joint learning scheme that integrates multiple data
sources. We present a modified IC (mIC) algorithm that combines steady state
data with partial prior knowledge of gene topological ordering information, for
jointly learning causal relationships among genes.
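The role of the prior ordering can be illustrated with a small sketch (our own simplification, not the mIC algorithm itself): once conditional-independence tests leave an undirected skeleton, a known topological ordering orients each remaining edge from the earlier gene to the later one.

```python
def orient_with_ordering(skeleton_edges, topological_order):
    """skeleton_edges: undirected gene pairs (g1, g2) that survive the
    independence tests. Returns directed edges consistent with the given
    gene topological ordering."""
    rank = {gene: i for i, gene in enumerate(topological_order)}
    directed = []
    for g1, g2 in skeleton_edges:
        # The gene earlier in the ordering is taken as the cause.
        directed.append((g1, g2) if rank[g1] < rank[g2] else (g2, g1))
    return directed

# Hypothetical three-gene example: the skeleton A - B - C plus the prior
# ordering [A, B, C] yields the causal chain A -> B -> C.
edges = [("B", "A"), ("C", "B")]
order = ["A", "B", "C"]
chain = orient_with_ordering(edges, order)
```

This is where the second data source pays off: ambiguous edge directions that steady-state data alone cannot resolve are fixed by the ordering, which is consistent with the improved precision and recall the experiments report.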
We perform three sets of experiments on synthetic datasets for learning causal
relationships between genes using the IC and the mIC algorithms. Each
experiment contains 100 randomly generated Boolean networks (DAGs), each of
which contains 10 genes connected by proper functions, with the gene
topological ordering information. The distribution of the network is generated
based on the probability distribution of the root genes and the proper functions.
The Monte Carlo sampling method is used to generate 200 samples in a dataset
for each network based on the probability distribution. We compare the
simulation results from the mIC algorithm with the ones from the IC algorithm.
From the simulation-based evaluation we conclude that (i) the IC algorithm does
not work well for learning gene regulatory networks from steady state data alone,
(ii) a better way to learn gene causal relationships from steady state data is to
use additional knowledge such as gene topological ordering, and (iii) the precision
and recall rates for the mIC algorithm are significantly improved compared with
the IC algorithm, with statistical confidence of 95%. For randomly generated
networks, the mIC algorithm works well for jointly learning the causal regulatory
network by combining steady state data and gene topological ordering
knowledge, with a precision rate greater than 60% and a recall rate greater than
50%.
We further apply the mIC algorithm to gene expression profiles used in the study
of melanoma. Thirty-one malignant melanoma samples were quantized to the
ternary format such that the expression level of each gene is assigned -1
(down-regulated), 0 (unchanged), or 1 (up-regulated). The 10 genes involved in
this study are chosen from 587 genes from the melanoma dataset. The result
showed that some of the important causal relationships associated with the
WNT5A gene were identified using the mIC algorithm, and those causal
connections have been verified in the literature.
Workshop Organization
Organizers:
Hila Becker, Columbia University
Bethany Leffler, Rutgers University
Faculty Advisor:
Lise Getoor, University of Maryland, College Park
Reviewers:
Hila Becker
Finale Doshi
Seyda Ertekin
Katherine Heller
Bethany Leffler
Özgür Şimşek
Jenn Wortmann
Thanks to our sponsors:
CRA Committee on the Status of Women in Computing Research
Princeton University