
Machine Learning presentation.


  1. 1. CS 8520: Artificial Intelligence Machine Learning 1 Paula Matuszek Fall, 2005
  2. 2. What is learning? <ul><li>“Learning denotes changes in a system that ... enable a system to do the same task more efficiently the next time.” – Herbert Simon </li></ul><ul><li>“Learning is constructing or modifying representations of what is being experienced.” – Ryszard Michalski </li></ul><ul><li>“Learning is making useful changes in our minds.” – Marvin Minsky </li></ul>
  3. 3. Why learn? <ul><li>Understand and improve efficiency of human learning </li></ul><ul><ul><li>Improve methods for teaching and tutoring people (better CAI) </li></ul></ul><ul><li>Discover new things or structure that were previously unknown to humans </li></ul><ul><ul><li>Examples: data mining, scientific discovery </li></ul></ul><ul><li>Fill in skeletal or incomplete specifications about a domain </li></ul><ul><ul><li>Large, complex AI systems cannot be completely derived by hand and require dynamic updating to incorporate new information. </li></ul></ul><ul><ul><li>Learning new characteristics expands the domain or expertise and lessens the “brittleness” of the system </li></ul></ul><ul><li>Build software agents that can adapt to their users or to other software agents </li></ul><ul><li>Reproduce an important aspect of intelligent behavior </li></ul>
  4. 4. Learning Systems <ul><li>Many machine learning systems can be viewed as an iterative process of </li></ul><ul><ul><li>produce a result, </li></ul></ul><ul><ul><li>evaluate it against the expected results </li></ul></ul><ul><ul><li>tweak the system </li></ul></ul><ul><li>Machine learning is also used for systems which discover patterns without prior expected results. </li></ul><ul><li>May be open or black box </li></ul><ul><ul><li>Open: changes are clearly visible in KB and understandable to humans </li></ul></ul><ul><ul><li>Black Box: changes are to a system whose internals are not readily visible or understandable. </li></ul></ul>
  5. 5. Learner Architecture <ul><li>Any learning system needs to somehow implement four components: </li></ul><ul><ul><li>Knowledge base: what is being learned. Representation of a problem space or domain. </li></ul></ul><ul><ul><li>Performer: does something with the knowledge base to produce results </li></ul></ul><ul><ul><li>Critic: evaluates results produced against expected results </li></ul></ul><ul><ul><li>Learner: takes output from critic and modifies something in KB or performer. </li></ul></ul><ul><li>May also need a “problem generator” to test performance against. </li></ul>
  6. 6. A Very Simple Learning Program <ul><li>Animals Guessing Game </li></ul><ul><ul><li>Representation is a binary tree </li></ul></ul><ul><ul><li>Performer is a tree walker interacting with a human </li></ul></ul><ul><ul><li>Critic is the human player </li></ul></ul><ul><ul><li>Learning component elicits new questions and modifies the binary tree </li></ul></ul>
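The four components of the Animals game can be sketched in a few lines of Python. This is only an illustration, not the original program: the `Node` class, the `play` and `learn` functions, and the sample questions are all hypothetical names chosen here, and the human player is simulated by a dictionary of answers.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """A question node (has yes/no children) or a leaf holding an animal guess."""
    text: str
    yes: Optional["Node"] = None
    no: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.yes is None and self.no is None


def play(root: Node, answer: Callable[[str], bool]) -> Node:
    """Performer: walk the tree, asking each question, and return the leaf reached."""
    node = root
    while not node.is_leaf():
        node = node.yes if answer(node.text) else node.no
    return node


def learn(leaf: Node, new_animal: str, question: str, yes_for_new: bool) -> None:
    """Learner: turn a wrong leaf into a question separating the old and new animal."""
    old = Node(leaf.text)
    new = Node(new_animal)
    leaf.text = question
    leaf.yes, leaf.no = (new, old) if yes_for_new else (old, new)


# Knowledge base: a one-question binary tree (hypothetical content).
tree = Node("Does it live in water?", yes=Node("fish"), no=Node("dog"))

# The simulated human is thinking of a cat and acts as the critic.
facts = {"Does it live in water?": False, "Does it meow?": True}
guess = play(tree, lambda q: facts[q])
if guess.text != "cat":
    # The critic rejected the guess, so elicit a new question and extend the tree.
    learn(guess, "cat", "Does it meow?", yes_for_new=True)

# With the same answers, the tree now reaches the right animal.
print(play(tree, lambda q: facts[q]).text)
```

Modifying the leaf in place keeps the learner small: the wrong guess becomes an internal question node, and both the old and the new animal become its children.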
  7. 7. Representation <ul><li>How do you describe your problem? </li></ul><ul><ul><li>I'm guessing an animal: binary decision tree </li></ul></ul><ul><ul><li>I'm playing chess: the board itself, sets of rules for choosing moves </li></ul></ul><ul><ul><li>I'm categorizing documents: vector of word frequencies for this document and for the corpus of documents </li></ul></ul><ul><ul><li>I'm fixing computers: frequency matrix of causes and symptoms </li></ul></ul><ul><ul><li>I'm OCRing digits: probability of this digit; 6x10 matrix of pixels; % light; # straight lines </li></ul></ul>
  8. 8. Performer <ul><li>How do you take action? </li></ul><ul><ul><li>Guessing an animal: walk the tree and ask associated questions </li></ul></ul><ul><ul><li>Playing chess: chain through the rules to identify a move; use conflict resolution to choose one; output it. </li></ul></ul><ul><ul><li>Categorizing documents: apply a function to the vector of features (word frequencies) to determine which category to put document in </li></ul></ul><ul><ul><li>Fixing computers: use known symptoms to identify potential causes, check matrix for additional diagnostic symptoms. </li></ul></ul><ul><ul><li>OCRing digits: input the features for a digit, output probability that it's 0-9. </li></ul></ul>
  9. 9. Critic <ul><li>How do you judge correct actions? </li></ul><ul><ul><li>Guessing an animal: human feedback </li></ul></ul><ul><ul><li>Playing chess: who won? (Credit assignment problem) </li></ul></ul><ul><ul><li>Categorizing documents: a set of human-categorized test documents. </li></ul></ul><ul><ul><li>Fixing computers: Human input about symptoms and cause observed for a specific case </li></ul></ul><ul><ul><li>OCRing digits: Human-categorized training set. </li></ul></ul>
  10. 10. Learner <ul><li>What does the learner do? </li></ul><ul><ul><li>Guessing an animal: elicit a question from the user and add it to the binary tree </li></ul></ul><ul><ul><li>Playing chess: increase the weight for some rules and decrease for others. </li></ul></ul><ul><ul><li>Categorizing documents: modify the weights on the function to improve categorization </li></ul></ul><ul><ul><li>Fixing computers: update frequency matrix with actual symptoms and outcome </li></ul></ul><ul><ul><li>OCRing digits: modify weights on a network of associations. </li></ul></ul>
  11. 11. General Model of Learning Agent [Figure: an Agent interacting with its Environment through Sensors and Effectors. Components: Performance Standard, Critic, Learning Element, Problem Generator, Performer with KB. Arrow labels: feedback, learning goals, changes, knowledge.]
  12. 12. Some major paradigms of machine learning <ul><li>Rote learning – Hand-encoded mapping from inputs to stored representation. “Learning by memorization.” </li></ul><ul><li>Interactive learning – Human/system interaction producing explicit mapping. </li></ul><ul><li>Induction – Using specific examples to reach general conclusions. </li></ul><ul><li>Analogy – Determining correspondence between two different representations. Case-based reasoning </li></ul><ul><li>Clustering – Unsupervised identification of natural groups in data </li></ul><ul><li>Discovery – Unsupervised, specific goal not given </li></ul><ul><li>Genetic algorithms – “Evolutionary” search techniques, based on an analogy to “survival of the fittest” </li></ul>
  13. 13. Approaches to Learning Systems <ul><li>Can also be classified by degree of human involvement required, in the critic or the learner component. </li></ul><ul><ul><li>All human input </li></ul></ul><ul><ul><li>Computer-guided human input </li></ul></ul><ul><ul><li>Human-guided computer learning </li></ul></ul><ul><ul><li>All computerized, no human input </li></ul></ul>
  14. 14. Rote Learning: <ul><li>In people: straight memorization </li></ul><ul><li>In computer systems: Knowledge engineering; direct entry of rules and facts </li></ul><ul><li>This is all human input. This is the traditional approach to developing ontologies, for instance </li></ul><ul><li>Knowledge base is captured knowledge </li></ul><ul><li>Performer is an inference engine, ontology browser, or other user of the KB </li></ul><ul><li>Learner is the editor used to develop the KB + the human </li></ul><ul><li>Critic is entirely offline, as the human examines or tests the system. </li></ul>
  15. 15. Interactive <ul><li>Methods in which the computer interacts with the human to expand the knowledge base </li></ul><ul><li>Classic example is Animals. </li></ul><ul><li>Another classic example is Teiresias [1]. </li></ul><ul><ul><li>Modified rules in Emycin by interacting with human </li></ul></ul><ul><ul><li>I conclude XXX. Is this the correct diagnosis? </li></ul></ul><ul><ul><ul><li>No </li></ul></ul></ul><ul><ul><li>I concluded XXX based on YYY and ZZZ. Is this rule correct, incorrect, or incomplete? </li></ul></ul><ul><ul><ul><li>Incomplete </li></ul></ul></ul><ul><ul><li>What additional tests should be added to the rule? </li></ul></ul><ul><li>[1] B. Buchanan and E. Shortliffe, Rule-Based Expert Systems. Reading, MA: Addison-Wesley, 1984. </li></ul>
  16. 16. The inductive learning problem <ul><li>Extrapolate from a given set of examples to make accurate predictions about future examples </li></ul><ul><li>Concept learning or classification </li></ul><ul><ul><li>Given a set of examples of some concept/class/category, determine if a given example is an instance of the concept or not </li></ul></ul><ul><ul><li>If it is an instance, we call it a positive example </li></ul></ul><ul><ul><li>If it is not, it is called a negative example </li></ul></ul>
  17. 17. Inductive Learning Framework <ul><li>Representation must extract from possible observations a feature vector of relevant features for each example. </li></ul><ul><li>The number of attributes and values for the attributes are fixed (although values can be continuous). </li></ul><ul><li>Each example is represented as a specific feature vector. </li></ul><ul><li>Each example can be interpreted as a point in an n-dimensional feature space, where n is the number of attributes </li></ul><ul><li>Which features to include in the vector is a major question in developing an inductive learning system: </li></ul><ul><ul><li>They should be relevant to the prediction to be made </li></ul></ul><ul><ul><li>They should be (mostly) observable for every example </li></ul></ul><ul><ul><li>They should be as much as possible independent of one another </li></ul></ul>
  18. 18. Model spaces <ul><li>Decision trees </li></ul><ul><ul><li>Partition the instance space into axis-parallel regions, labeled with class value </li></ul></ul><ul><li>Nearest-neighbor classifiers </li></ul><ul><ul><li>Partition the instance space into regions defined by the centroid instances (or cluster of k instances) </li></ul></ul><ul><li>Associative rules (feature values -> class) </li></ul><ul><li>First-order logical rules </li></ul><ul><li>Bayesian networks (probabilistic dependencies of class on attributes) </li></ul><ul><li>Neural networks </li></ul>
  19. 19. Rule Induction <ul><li>Given </li></ul><ul><ul><li>Features </li></ul></ul><ul><ul><li>Training examples </li></ul></ul><ul><ul><li>Output for training examples </li></ul></ul><ul><li>Generate automatically a set of rules or a decision tree which will allow you to judge new objects </li></ul><ul><li>Basic approach is </li></ul><ul><ul><li>Combinations of features become antecedents or links </li></ul></ul><ul><ul><li>Examples become consequents or nodes </li></ul></ul>
  20. 20. Rule Induction Example <ul><li>Starting with 100 cases, 10 outcomes, 15 variables </li></ul><ul><li>Form 100 rules, each with 15 antecedents and one consequent. </li></ul><ul><li>Collapse rules. </li></ul><ul><li>Cancellations: If we have </li></ul><ul><ul><li>C, A => B and –C, A => B, collapse to A => B </li></ul></ul><ul><li>Drop Terms: </li></ul><ul><ul><li>D, E => F and D, G => F, collapse to D => F </li></ul></ul><ul><li>Test rules and undo collapse if performance gets worse </li></ul><ul><li>Additional heuristics for combining rules. </li></ul>
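The cancellation step above can be sketched as follows. This is a minimal illustration under stated assumptions: the encoding of a rule as a set of `(feature, truth_value)` literals plus a consequent, and the names `cancel` and `collapse_once`, are hypothetical choices made here, not part of any particular rule-induction system. The sketch omits the test-and-undo step and the drop-terms heuristic.

```python
from itertools import combinations

# A rule is (antecedents, consequent); antecedents is a frozenset of
# (feature, truth_value) literals. This encoding is an illustrative assumption.


def cancel(rule_a, rule_b):
    """Cancellation: C, A => B and -C, A => B collapse to A => B.

    Returns the collapsed rule, or None if the two rules do not differ in
    exactly one feature that appears with opposite truth values.
    """
    (ante_a, cons_a), (ante_b, cons_b) = rule_a, rule_b
    if cons_a != cons_b:
        return None
    only_a, only_b = ante_a - ante_b, ante_b - ante_a
    if len(only_a) != 1 or len(only_b) != 1:
        return None
    (feat_a, val_a), = only_a
    (feat_b, val_b), = only_b
    if feat_a == feat_b and val_a != val_b:
        return (ante_a & ante_b, cons_a)
    return None


def collapse_once(rules):
    """Apply the first cancellation found and return the new rule list."""
    for r1, r2 in combinations(rules, 2):
        merged = cancel(r1, r2)
        if merged is not None:
            return [r for r in rules if r not in (r1, r2)] + [merged]
    return rules


rules = [
    (frozenset({("C", True), ("A", True)}), "B"),
    (frozenset({("C", False), ("A", True)}), "B"),
    (frozenset({("D", True), ("E", True)}), "F"),
]
print(collapse_once(rules))
```

A fuller version would re-run the collapsed rule set against the training cases after each step and undo any collapse that makes performance worse.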
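  21. 21. Rose Diagnosis <table><tr><th>Case</th><th>Brown Spots</th><th>Wilted Leaves</th><th>Yellow Leaves</th><th>Diagnosis</th></tr><tr><td>1</td><td>Y</td><td>Y</td><td>N</td><td>Fungus</td></tr><tr><td>2</td><td>Y</td><td>Y</td><td>N</td><td>Bugs</td></tr><tr><td>3</td><td>N</td><td>N</td><td>Y</td><td>Nutrition</td></tr><tr><td>4</td><td>Y</td><td>N</td><td>N</td><td>Fungus</td></tr><tr><td>5</td><td>Y</td><td>N</td><td>Y</td><td>Fungus</td></tr><tr><td>6</td><td>N</td><td>Y</td><td>Y</td><td>Bugs</td></tr></table> <ul><li>R1: If not yellow leaves and wilted leaves and brown spots then fungus. </li></ul><ul><li>… </li></ul><ul><li>R6: If wilted leaves and yellow leaves and not brown spots then bugs </li></ul>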
  22. 22. Rose Diagnosis <ul><li>Cases 1 and 4 have opposite values for wilted leaves, so create new rule: </li></ul><ul><ul><li>R7: If not yellow leaves and brown spots then fungus. </li></ul></ul><ul><li>KB is the rules. Learner is the system collapsing and testing rules. Critic is the test cases. Performer is rule-based inference. </li></ul><ul><li>Problems: </li></ul><ul><ul><li>Over-generalization </li></ul></ul><ul><ul><li>Irrelevance </li></ul></ul><ul><ul><li>Need data on all features for all training cases </li></ul></ul><ul><ul><li>Computationally painful. </li></ul></ul><ul><li>Useful if you have enough good training cases. </li></ul><ul><li>Output can be understood and modified by humans </li></ul>
  23. 23. Decision Tree Induction <ul><li>Very common data mining technique. </li></ul><ul><li>Given: </li></ul><ul><ul><li>Examples </li></ul></ul><ul><ul><li>Attributes </li></ul></ul><ul><ul><li>Goal (classification, typically) </li></ul></ul><ul><li>Pick “important” attribute: one which divides set cleanly. </li></ul><ul><li>Recur with subsets not yet classified. </li></ul>
  24. 24. ID3 <ul><li>A greedy algorithm for decision tree construction developed by Ross Quinlan, 1986 </li></ul><ul><li>Top-down construction of the decision tree by recursively selecting the “best attribute” to use at the current node in the tree </li></ul><ul><ul><li>Once the attribute is selected for the current node, generate children nodes, one for each possible value of the selected attribute </li></ul></ul><ul><ul><li>Partition the examples using the possible values of this attribute, and assign these subsets of the examples to the appropriate child node </li></ul></ul><ul><ul><li>Repeat for each child node until all examples associated with a node are either all positive or all negative </li></ul></ul>
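The recursion described above can be sketched in Python. ID3 scores the "best attribute" by information gain (the reduction in entropy after a split), which this sketch assumes. The function names, the nested-dict tree format, and the toy data are hypothetical, and the sketch creates children only for attribute values actually seen in the examples rather than for every possible value.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(examples, attribute, target):
    """Entropy reduction obtained by splitting the examples on one attribute."""
    before = entropy([ex[target] for ex in examples])
    after = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        after += len(subset) / len(examples) * entropy(subset)
    return before - after


def id3(examples, attributes, target):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:          # all examples agree: stop
        return labels[0]
    if not attributes:                 # nothing left to split on: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    # One child per value of the chosen attribute seen in these examples.
    for value in sorted({ex[best] for ex in examples}):
        subset = [ex for ex in examples if ex[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree


# Hypothetical toy data, loosely in the spirit of a "wait for a table?" decision.
data = [
    {"hungry": "yes", "full": "some", "wait": "wait"},
    {"hungry": "yes", "full": "full", "wait": "leave"},
    {"hungry": "no",  "full": "some", "wait": "wait"},
    {"hungry": "no",  "full": "full", "wait": "leave"},
    {"hungry": "yes", "full": "none", "wait": "leave"},
]
print(id3(data, ["hungry", "full"], "wait"))
```

On this toy data the `full` attribute separates the classes completely, so the greedy choice puts it at the root and no further splitting is needed.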
  25. 25. Textbook restaurant domain <ul><li>Develop a decision tree to model the decision a patron makes when deciding whether or not to wait for a table at a restaurant </li></ul><ul><li>Two classes: wait, leave </li></ul><ul><li>Ten attributes: Alternative available? Bar in restaurant? Is it Friday? Are we hungry? How full is the restaurant? How expensive? Is it raining? Do we have a reservation? What type of restaurant is it? What’s the purported waiting time? </li></ul><ul><li>Training set of 12 examples </li></ul><ul><li>~ 7000 possible cases </li></ul>
  26. 26. A decision tree from introspection
  27. 27. A training set
  28. 28. ID3-induced decision tree
  29. 29. How well does it work? <ul><li>Many case studies have shown that decision trees are at least as accurate as human experts. </li></ul><ul><ul><li>A study for diagnosing breast cancer had humans correctly classifying the examples 65% of the time; the decision tree classified 72% correct </li></ul></ul><ul><ul><li>British Petroleum designed a decision tree for gas-oil separation for offshore oil platforms that replaced an earlier rule-based expert system </li></ul></ul><ul><ul><li>Cessna designed an airplane flight controller using 90,000 examples and 20 attributes per example </li></ul></ul><ul><li>C4.5 is an extension of ID3 that accounts for unavailable values, continuous attribute value ranges, pruning of decision trees, rule derivation, and so on </li></ul>
  30. 30. Evaluating Classifying Systems <ul><li>Standard methodology: </li></ul><ul><ul><li>1. Collect a large set of examples (all with correct classifications) </li></ul></ul><ul><ul><li>2. Randomly divide collection into two disjoint sets: training and test </li></ul></ul><ul><ul><li>3. Apply learning algorithm to training set </li></ul></ul><ul><ul><li>4. Measure performance with respect to test set </li></ul></ul><ul><li>Important: keep the training and test sets disjoint! </li></ul><ul><li>To study the efficiency and robustness of an algorithm, repeat steps 2-4 for different training sets and sizes of training sets </li></ul><ul><li>If you improve your algorithm, start again with step 1 to avoid evolving the algorithm to work well on just this collection </li></ul>
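The four-step methodology above can be sketched as follows. This is an illustration only: the helper names are hypothetical, the data is synthetic, and the "learning algorithm" is a deliberately trivial majority-class predictor standing in for whatever learner is being evaluated.

```python
import random
from collections import Counter


def split(examples, test_fraction, rng):
    """Step 2: randomly divide the examples into disjoint training and test sets."""
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train_majority(training):
    """Step 3 (placeholder learner): always predict the most common training label."""
    majority = Counter(label for _, label in training).most_common(1)[0][0]
    return lambda features: majority


def accuracy(classifier, test):
    """Step 4: fraction of test examples classified correctly."""
    return sum(classifier(x) == y for x, y in test) / len(test)


# Step 1: hypothetical labeled examples as (features, label) pairs.
examples = [((i,), "pos" if i % 3 else "neg") for i in range(30)]

# Repeat steps 2-4 with different random splits and average the results.
rng = random.Random(0)
scores = []
for _ in range(5):
    training, test = split(examples, test_fraction=0.3, rng=rng)
    scores.append(accuracy(train_majority(training), test))
print(sum(scores) / len(scores))
```

Slicing one shuffled list into two parts guarantees the training and test sets are disjoint, which is the property the slide insists on.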
  31. 31. Summary: Decision tree learning <ul><li>Inducing decision trees is one of the most widely used learning methods in practice </li></ul><ul><li>Can out-perform human experts in many problems </li></ul><ul><li>Strengths include </li></ul><ul><ul><li>Fast </li></ul></ul><ul><ul><li>Simple to implement </li></ul></ul><ul><ul><li>Can convert result to a set of easily interpretable rules </li></ul></ul><ul><ul><li>Empirically valid in many commercial products </li></ul></ul><ul><ul><li>Handles noisy data </li></ul></ul><ul><li>Weaknesses include: </li></ul><ul><ul><li>Univariate splits: partitioning on only one attribute at a time limits the types of trees that can be built </li></ul></ul><ul><ul><li>Large decision trees may be hard to understand </li></ul></ul><ul><ul><li>Requires fixed-length feature vectors </li></ul></ul><ul><ul><li>Non-incremental (i.e., batch method) </li></ul></ul><ul><li>More detail on this in two weeks </li></ul>
  32. 32. Learning by Analogy: Case-based Reasoning <ul><li>Case-based systems are a significant chunk of AI in their own right. A case-based system has two major components: </li></ul><ul><ul><li>Case base </li></ul></ul><ul><ul><li>Problem solver </li></ul></ul><ul><li>The case base contains a growing set of cases, analogous to either a KB or a training set. </li></ul><ul><li>Problem solver has </li></ul><ul><ul><li>A case retriever and </li></ul></ul><ul><ul><li>A case reasoner. </li></ul></ul><ul><li>May also have a case installer. </li></ul>
  33. 33. Case-based Reasoning <ul><li>A case must be described in terms of a set of features. </li></ul><ul><li>Case-based reasoner </li></ul><ul><ul><li>Follows or matches case as far as possible </li></ul></ul><ul><ul><li>If that doesn’t lead to a solution, generalizes application of solution </li></ul></ul><ul><ul><li>Combines solutions or features from several retrieved cases </li></ul></ul><ul><li>Can operate without reasoner by returning all retrieved cases to user </li></ul>
  34. 34. Case-Based Retrieval <ul><li>Cases are described as a set of features </li></ul><ul><li>Retrieval uses methods such as </li></ul><ul><ul><li>Nearest neighbor: compare all features to all cases in KB and choose closest match </li></ul></ul><ul><ul><li>Indexed: compute and store some indices with each case and retrieve matching indices </li></ul></ul><ul><ul><li>Domain-based model clustering: CB is organized into a domain model; insertion is harder, but retrieval is easier. </li></ul></ul><ul><li>Example: “documents like this one” </li></ul><ul><ul><li>Features are the word frequencies in the document </li></ul></ul>
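Nearest-neighbor retrieval for the "documents like this one" example can be sketched as follows. The slide does not say how closeness is measured; cosine similarity over word-frequency vectors is assumed here as one common choice, and the function names and sample cases are hypothetical.

```python
import math
from collections import Counter


def features(text):
    """Describe a case by its word frequencies."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(case_base, query, k=1):
    """Nearest neighbor: compare the query to every stored case, return the k closest."""
    q = features(query)
    ranked = sorted(case_base, key=lambda case: cosine(features(case), q), reverse=True)
    return ranked[:k]


# Hypothetical case base of short documents.
case_base = [
    "printer will not print after driver update",
    "laptop battery drains quickly",
    "screen flickers when laptop is on battery",
]
print(retrieve(case_base, "printer driver problem"))
```

Because every query is compared against every stored case, this approach gets slower as the case base grows, which is the motivation for the indexed and model-based alternatives listed above.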
  35. 35. Case-based Reasoning <ul><li>Definition of relevant features is critical: </li></ul><ul><ul><li>Need to get the ones which influence outcomes </li></ul></ul><ul><ul><li>At the right level of granularity </li></ul></ul><ul><li>The reasoner can be a complex planning and what-if reasoning system, or a simple query for missing data. </li></ul><ul><li>Only really becomes a “learning” system if there is a case installer as well. </li></ul>
  36. 36. Unsupervised Learning <ul><li>Typically used to refer to clustering methods which don’t require training cases </li></ul><ul><ul><li>No prior definition of goal </li></ul></ul><ul><ul><li>Typical aim is “put similar things together” </li></ul></ul><ul><ul><ul><li>Document clustering </li></ul></ul></ul><ul><ul><ul><li>Recommender systems </li></ul></ul></ul><ul><ul><ul><li>Grouping inputs to a customer response system </li></ul></ul></ul><ul><li>Combinations of hand-modeled and automatic can work very well: Google News, for instance. </li></ul><ul><li>Still requires good feature set </li></ul>
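As an illustration of "put similar things together", here is a minimal k-means sketch. The slide does not name a specific clustering algorithm; k-means is swapped in here as one widely used option, with a naive initialization (the first k points) and hypothetical 2-D feature vectors.

```python
import math


def kmeans(points, k, iterations=10):
    """Group points into k clusters by alternating assignment and centroid update."""
    centroids = list(points[:k])  # naive initialization: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters


# Hypothetical 2-D feature vectors forming two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1), (9.2, 8.9)]
for cluster in kmeans(points, k=2):
    print(cluster)
```

No labels or training cases are supplied, but the result still depends entirely on the feature vectors chosen, which is the slide's point about needing a good feature set.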
  37. 37. REALLY Unsupervised Learning <ul><li>Turn the machine loose to learn on its own </li></ul><ul><li>Needs </li></ul><ul><ul><li>A representation. Still need some idea of what we are trying to learn! </li></ul></ul><ul><ul><li>Good natural language processing </li></ul></ul><ul><ul><li>A context </li></ul></ul><ul><ul><li>More AI than we have yet! </li></ul></ul><ul><li>People don’t learn very well unsupervised. </li></ul><ul><li>Currently some interesting research for instance-level knowledge. </li></ul><ul><li>Much harder to acquire structural or relational knowledge </li></ul>
  38. 38. More Aspects of Machine Learning <ul><li>Machine learning varies by degree of human intervention: </li></ul><ul><ul><li>Rote -- human builds KB. Cyc </li></ul></ul><ul><ul><li>Human assisted -- human adds knowledge directed by machine. Animals, Teiresias </li></ul></ul><ul><ul><li>Human scored -- human provides training cases. Neural nets, ID3, CART. </li></ul></ul><ul><ul><li>Completely automated -- Nearest Neighbor, other statistical groupings, data mining. </li></ul></ul>
  39. 39. More Aspects of Machine Learning <ul><li>Machine Learning varies by degree of transparency </li></ul><ul><ul><li>Hand-built KBs are by definition clear to humans </li></ul></ul><ul><ul><li>Human-aided trees like Animals are also generally clear and meaningful, could easily be modified by humans </li></ul></ul><ul><ul><li>Inferred rules like ID3's are generally understood by humans but may not be intuitively obvious. Modifying them by hand may lead to worse results. </li></ul></ul><ul><ul><li>Systems like neural nets are typically black box: you can look at the functions and weights but it's hard to interpret them in any human-meaningful way and essentially impossible to modify them by hand. </li></ul></ul>
  40. 40. More Aspects of Machine Learning <ul><li>Machine learning varies by goal of the process </li></ul><ul><ul><li>Extend a knowledge base </li></ul></ul><ul><ul><li>Improve some kind of decision making, such as guessing an animal or classifying diseases. </li></ul></ul><ul><ul><li>Improve overall performance of a program, such as game playing </li></ul></ul><ul><ul><li>Organize large amounts of data </li></ul></ul><ul><ul><li>Find patterns or “knowledge” not previously known. Ultimately often still comes down to something actionable, but at one remove. </li></ul></ul>
  41. 41. The Future <ul><li>Where are we going with machine learning? </li></ul><ul><li>A couple of major factors having an impact </li></ul><ul><ul><li>DARPA </li></ul></ul><ul><ul><li>The Web </li></ul></ul>
  42. 42. Some Current DARPA Programs and Solicitations <ul><li>Learning Applied to Ground Robots (LAGR). The goal of the LAGR program is to develop a new generation of learned perception and control algorithms for autonomous ground vehicles, and to integrate these learned algorithms with a highly capable robotic ground vehicle. </li></ul><ul><li>Personalized Assistant that Learns (PAL). The mission of the PAL program is to radically improve the way computers support humans by enabling systems that are cognitive, i.e., computer systems that can reason, learn from experience, be told what to do, explain what they are doing, reflect on their experience, and respond robustly to surprise. </li></ul><ul><li>Transfer Learning: The goal of the Transfer Learning Program is to develop, implement, demonstrate and evaluate theories, architectures, algorithms, methods, and techniques that enable computers to apply knowledge learned for a particular, original set of tasks to achieve superior performance on new, previously unseen tasks. </li></ul>
  43. 43. The Web <ul><li>Machine learning is one of those fields where the web is changing everything! </li></ul><ul><li>Three major factors </li></ul><ul><ul><li>One problematic aspect of machine learning research is finding enough data. </li></ul></ul><ul><ul><ul><li>This is NOT an issue on the web! </li></ul></ul></ul><ul><ul><li>Another problematic aspect is getting a critic </li></ul></ul><ul><ul><ul><li>Web offers a lot of opportunities </li></ul></ul></ul><ul><ul><li>A third is identifying good practical uses for machine learning </li></ul></ul><ul><ul><ul><li>Lots of online opportunities here </li></ul></ul></ul>
  44. 44. Finding Enough Data <ul><li>The web is an enormous repository of machine-readable data. What are some of the things we can do with it? </li></ul><ul><ul><li>Learn instance knowledge. Searching for Common Sense, Matuszek et al, 2005. </li></ul></ul><ul><ul><li>Learn categories. Acquisition of Categorized Named Entities for Web Search, Pasca, CIKM’04, Washington, DC, 2004. </li></ul></ul>
  45. 45. Getting Critics <ul><li>People spend a lot of time on the web </li></ul><ul><li>The success of sites like Wikipedia is evidence that people are willing to volunteer time and effort </li></ul><ul><ul><li>The Open Mind </li></ul></ul><ul><ul><li>Learner </li></ul></ul><ul><ul><li>The ESP game </li></ul></ul><ul><ul><li>And more academically: AAAI Spring 2005 Symposium: Knowledge Collection from Volunteer Contributors (KCVC) </li></ul></ul><ul><li>At another level of involvement: environments where AIs can interact with humans </li></ul><ul><ul><li>MUDs: Julia. </li></ul></ul><ul><ul><li>Chatbots: The Personality Forge </li></ul></ul><ul><ul><li>Online role-playing games: Genecys </li></ul></ul>
  46. 46. Online Uses for Machine Learning <ul><li>Improved search: learn from click-throughs. Google Personalized Search </li></ul><ul><li>Recommendations: learn from people’s opinions and choices. Recommendz </li></ul><ul><li>Online games. AIs add to the background but can’t be too static. </li></ul><ul><li>Better targeting for ads. More learning from click-throughs. </li></ul><ul><li>Customer Response Centers. Clustering, improved retrieval of responses. </li></ul>
  47. 47. Summary <ul><li>Valuable both because we want to understand how humans learn and because it improves computer systems </li></ul><ul><li>May learn representation or actions or both </li></ul><ul><li>Variety of methods, some knowledge-based and some statistical </li></ul><ul><li>Currently very active research area </li></ul><ul><li>Web is providing a lot of new opportunities </li></ul><ul><li>Still a long way to go </li></ul>