Engineering Statistics.pdf Prepared by: Ass. Lec. Dilveen Hassan Omar Erbil Polytechnic University Erbil Technical Engineering College

- 1. Engineering Statistics, 2021-2022. Prepared by: Ass. Lec. Dilveen Hassan Omar
- 2. Chapter One: Introduction to Statistics
  What is Statistics? Statistics is concerned with scientific methods for collecting, organizing, summarizing, presenting, and analyzing data, as well as drawing valid conclusions and making reasonable decisions on the basis of such analysis. In other words, it is a science that helps us make decisions and draw conclusions in the presence of variability.
  For example, civil engineers working in the transportation field are concerned about the capacity of regional highway systems. A typical transportation problem would involve data on a specific system's number of non-work, home-based trips, the number of persons per household, and the number of vehicles per household. The objective would be to produce a trip generation model relating trips to the number of persons per household and the number of vehicles per household.
  Types of Statistics
  1. Descriptive Statistics: concerned with the organization, summarization, and presentation of data. That is, it covers collecting and analyzing data and describing them in a meaningful format, without extending the results beyond the data collected.
- 3. 2. Inferential Statistics: applies when the data consist of a sample and conclusions are to be drawn about the population. That is, it covers methods of analysis and interpretation that draw conclusions from a sample in order to reach decisions about the whole population. Statistical inference therefore deals with forecasting and estimation, and because its conclusions are uncertain, that uncertainty is measured using probability.
  Difference between Descriptive and Inferential Statistics

  | Basis for Comparison | Descriptive Statistics | Inferential Statistics |
  |---|---|---|
  | Meaning | The branch of statistics concerned with describing the population under study. | The branch of statistics that focuses on drawing conclusions about the population on the basis of sample analysis and observation. |
  | What it does | Organizes, analyzes, and presents data in a meaningful way. | Compares, tests, and predicts data. |
  | Form of final result | Charts, graphs, and tables | Probability |
  | Usage | To describe a situation. | To explain the chances of occurrence of an event. |
  | Function | Explains data that are already known, in order to summarize the sample. | Attempts to reach conclusions about the population that extend beyond the data available. |
- 4. Figure 1: Procedure of Work in Site Study for Each Sample
  Types of Statistics:
  - Descriptive Statistics
    - Measure of Central Tendency: Mean, Mode, Median
    - Measure of Variability: Range, Variance, Dispersion
  - Inferential Statistics
  Examples for understanding Statistics:
  Who Are Those Speedy Drivers?
- 5. Does Prayer Lower Blood Pressure? Does Aspirin Reduce Heart Attack Rates? Does the Internet Increase Loneliness and Depression?
  Types of Variables
  A variable is a characteristic that differs from one individual to the next.
  1. Categorical Variable: a variable for which the raw data are group or category names that don't necessarily have a logical ordering. Examples include eye color and country of residence. Hair color is also a categorical variable with a number of categories (blonde, brown, brunette, red, etc.); there is no agreed way to order these from highest to lowest.
  2. Ordinal Variable: a categorical variable for which the categories have a logical ordering or ranking. Examples include highest educational degree earned and T-shirt size (S, M, L, XL). People's education level can likewise be ordered as elementary school, high school, some college, and college graduate.
- 6. Figure 2: Procedure of Work in Site Study for Each Sample
  [Figure 2 flow chart. Box labels: Aim and Objects; Data Collection (Previous Information, Questionnaire, Observation, Experiment, Interview); Information; Analysis Data; Results of Analysis; Calculation and Discussion; Summary]
  Classification of Data
  After verifying the integrity, clarity, and accuracy of the data obtained, the classification process begins. The data are classified by phenomenon: the data for each phenomenon are sorted into groups on the basis of, for example, age, function, or weight.
- 7. Figure 3: Data Classification Flow Chart. Data Classification (Type of Data)
  Qualitative Data: data are measures of types and may be represented by a name, symbol, or number code.
  1. Nominal or Categorical Data: nominal data are used to label variables that have no quantitative value and no order.
     1) Gender (Women, Men)
     2) Hair color (Blonde, Brown, Brunette, Red, etc.)
     3) Marital status (Married, Single, Widowed)
     4) Ethnicity (Hispanic, Asian)
  2. Ordinal or Ranked Data: ordinal data are similar to nominal data, except that their categories can be ordered (1st, 2nd, etc.).
     1) The first, second, and third person in a competition.
     2) Letter grades: A, B, C, etc.
     3) A customer's rating of the sales experience on a scale of 1-10.
     4) Economic status: low, medium, and high.
  Quantitative Data: data are measures of values or counts and are expressed as numbers.
  1. Discrete Data: a count that involves only integers. Discrete values cannot be subdivided into parts.
     1) The number of students in a class.
     2) The number of workers in a company.
     3) The number of home runs in a baseball game.
     4) The number of test questions you answered correctly.
  2. Continuous Data: information that can be meaningfully divided into finer levels. It can be measured on a scale or continuum and can have almost any numeric value.
     1) The amount of time required to complete a project.
     2) Height
     3) Weight
     4) Speed
     5) Volume
     6) Density
- 8. Sample data: collected when measurements have been taken from a subset of a population.
  Population data: collected when all individuals in a population have been measured.
  Simple Random Sample: the most basic probability sampling plan is to use a simple random sample. To actually produce a simple random sample, random numbers are used.
  Figure 4: Population and Sampling
- 9. Chapter Two: Measures of Location and Measures of Variation
  Measures of Location (Central Tendency)
  When we work with numerical data, in most data sets the observed values tend to group themselves about some interior values; some central values seem to be characteristic of the data. This phenomenon is referred to as central tendency. We shall consider some of the more commonly used measures, namely the mean, median, and mode.
  1. Mean: the usual numerical average, calculated as the sum of the data values divided by the number of values. It is nearly universal to represent the mean of a sample with the symbol $\bar{x}$, read as "x-bar".
  $$\bar{x} = \frac{\sum x_i}{n}$$
  For frequency distribution data:
  $$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{\sum f x}{N}$$
  Example 1: Find the mean of the following speed data on a road: 60, 73, 85, 65, 90, 100, 60, 70, 95, 110, 75 km/hr
  $$\bar{x} = \frac{60 + 73 + 85 + 65 + 90 + 100 + 60 + 70 + 95 + 110 + 75}{11} = \frac{883}{11} = 80.27 \text{ km/hr}$$
  Example 2: Find the arithmetic mean of the numbers 5, 3, 6, 5, 4, 5, 2, 8, 6, 5, 4, 8, 3, 4, 5, 4, 8, 2, 5, 4.
  $$\bar{x} = \frac{5 + 3 + 6 + 5 + 4 + 5 + 2 + 8 + 6 + 5 + 4 + 8 + 3 + 4 + 5 + 4 + 8 + 2 + 5 + 4}{20} = \frac{96}{20} = 4.8$$
  Grouping equal values (5 occurs six times, 3 twice, 6 twice, 4 five times, 2 twice, 8 three times) gives the same result:
  $$\bar{x} = \frac{(6)(5) + (2)(3) + (2)(6) + (5)(4) + (2)(2) + (3)(8)}{6 + 2 + 2 + 5 + 2 + 3} = \frac{96}{20} = 4.8$$
- 10. Example 3: Use the frequency distribution of masses in the table below to find the mean mass of the 100 male students at XYZ University.

  | Mass (kg) | Class Mark (x) | Frequency (f) | fx |
  |---|---|---|---|
  | 60 – 62 | 61 | 5 | 305 |
  | 63 – 65 | 64 | 18 | 1152 |
  | 66 – 68 | 67 | 42 | 2814 |
  | 69 – 71 | 70 | 27 | 1890 |
  | 72 – 74 | 73 | 8 | 584 |
  | | | N = ∑f = 100 | ∑fx = 6745 |

  $$\bar{x} = \frac{6745}{100} = 67.45 \text{ kg}$$
  2. Median: the middle data value for an odd number of observations, after the sample has been ordered from smallest to largest. For an even number of observations, it is the average of the middle two values in the ordered sample.
  Example 4: Find the median for the data of Example 1: 60, 73, 85, 65, 90, 100, 60, 70, 95, 110, 75 km/hr
  60, 60, 65, 70, 73, 75, 85, 90, 95, 100, 110 (from smallest to largest)
  110, 100, 95, 90, 85, 75, 73, 70, 65, 60, 60 (from largest to smallest)
  With n = 11 values, the median is the 6th ordered value: Median = 75 km/hr
  Example 5: Median and Mean Quiz Scores. Suppose that the scores on a quiz for n = 8 students in a class are 91, 79, 60, 94, 89, 93, 86, 95
  Ordered: 60, 79, 86, 89, 91, 93, 94, 95
  $$\text{Median} = \frac{89 + 91}{2} = 90$$
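The mean and median calculations above can be reproduced with a short script. This is a minimal sketch using Python's standard `statistics` module; the helper `grouped_mean` and the variable names are illustrative and not part of the original notes.

```python
from statistics import mean, median

# Data copied from Examples 1, 3 and 5.
speeds = [60, 73, 85, 65, 90, 100, 60, 70, 95, 110, 75]   # km/hr
class_marks = [61, 64, 67, 70, 73]                        # kg
frequencies = [5, 18, 42, 27, 8]
scores = [91, 79, 60, 94, 89, 93, 86, 95]


def grouped_mean(marks, freqs):
    """Mean of a frequency distribution: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(marks, freqs)) / sum(freqs)


mean_speed = mean(speeds)                      # 883 / 11
median_speed = median(speeds)                  # 6th of the 11 ordered values
mean_mass = grouped_mean(class_marks, frequencies)
median_score = median(scores)                  # average of the two middle values

print(f"mean speed   = {mean_speed:.2f} km/hr")
print(f"median speed = {median_speed} km/hr")
print(f"mean mass    = {mean_mass:.2f} kg")
print(f"median score = {median_score}")
```

`median` sorts the data internally, so the list does not need to be ordered first.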
- 11. 3. Mode: the most common data value. There may be multiple modes in a distribution. If no number in the list is repeated, then there is no mode for the list.
  Example 6: Find the mode for the following speed data sets on a road:
  60, 73, 85, 65, 90, 100, 60, 70, 95, 110, 75 km/hr
  Ordered: 60, 60, 65, 70, 73, 75, 85, 90, 95, 100, 110

  | Value | 60 | 65 | 70 | 73 | 75 | 85 | 90 | 95 | 100 | 110 |
  |---|---|---|---|---|---|---|---|---|---|---|
  | Count | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |

  Mode = 60
  If the data are 60, 60, 65, 70, 75, 75, 85, 90, 95, 100, 110, 110:

  | Value | 60 | 65 | 70 | 75 | 85 | 90 | 95 | 100 | 110 |
  |---|---|---|---|---|---|---|---|---|---|
  | Count | 2 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 |

  There are three modes: Mode = 60, Mode = 75, Mode = 110
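Counting how often each value occurs, as in Example 6, can be sketched with `statistics.multimode` (Python 3.8 or later), which returns every value tied for the highest count. The variable names are illustrative.

```python
from collections import Counter
from statistics import multimode

first_set = [60, 73, 85, 65, 90, 100, 60, 70, 95, 110, 75]
second_set = [60, 60, 65, 70, 75, 75, 85, 90, 95, 100, 110, 110]

# Counter gives the tally of each value; multimode lists all most-frequent values.
tally_first = Counter(first_set)
modes_first = multimode(first_set)
modes_second = multimode(second_set)

print(tally_first[60], modes_first)
print(modes_second)
```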
- 12. Measures of Variation (Dispersion)
  Variation is among the most important quantities measured in statistical inference.
  1. Range: the difference between the highest and lowest values.
  Range = high value − low value
  2. Variance: measures how far a data set is spread out. For a sample, it is the sum of the squared differences from the mean divided by n − 1 (approximately the average squared difference).
  $$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
  3. Standard Deviation: roughly the average distance that values fall from the mean. Put another way, it measures variability by summarizing how far individual data values are from the mean.
  $$S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
  For frequency distribution data:
  $$S = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\left(\sum f_i\right) - 1}}$$
  For instance, consider the standard deviations of the following two sets of numbers, both with a mean of 100:
- 13.

  | Set | Numbers | Mean | Standard Deviation |
  |---|---|---|---|
  | 1 | 100, 100, 100, 100, 100 | 100 | 0 |
  | 2 | 90, 90, 100, 110, 110 | 100 | 10 |

  Note:
  1. For the population standard deviation, divide by N instead of n − 1.
  2. For the sample standard deviation, divide by n − 1.
  Table (1-2): Symbols for Population and Sample

  | Subject | Population | Sample |
  |---|---|---|
  | Size | N | n |
  | Mean | µ | $\bar{x}$ |
  | Standard Deviation | σ | S |
  | Variance | σ² | S² |

  4. Coefficient of Variation (CV): a statistical measure of the dispersion of data points in a data series around the mean. It is the ratio of the standard deviation to the mean, and it is useful for comparing the degree of variation from one data series to another, even if the means are drastically different.
  $$CV = \frac{\text{Standard Deviation}}{\text{Mean}} = \frac{S}{\bar{x}}$$
  Example 7: Find the range, variance, standard deviation, and coefficient of variation for the sample below.
  60, 73, 85, 65, 90, 100, 60, 70, 95, 110, 75 km/hr
  Ordered: 60, 60, 65, 70, 73, 75, 85, 90, 95, 100, 110
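- 14. 1. Range
  Range = 110 − 60 = 50 km/hr
  2. Variance

  | Number | $x$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
  |---|---|---|---|
  | 1 | 60 | -20.272 | 410.983 |
  | 2 | 60 | -20.272 | 410.983 |
  | 3 | 65 | -15.272 | 233.256 |
  | 4 | 70 | -10.272 | 105.528 |
  | 5 | 73 | -7.272 | 52.892 |
  | 6 | 75 | -5.272 | 27.801 |
  | 7 | 85 | 4.727 | 22.347 |
  | 8 | 90 | 9.727 | 94.619 |
  | 9 | 95 | 14.727 | 216.892 |
  | 10 | 100 | 19.727 | 389.165 |
  | 11 | 110 | 29.727 | 883.710 |
  | | $\bar{x}$ = 80.273 | ∑ ≈ 0 | ∑ = 2848.182 |

  $$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{2848.182}{11 - 1} = 284.818 \text{ (km/hr)}^2$$
  3. Standard Deviation
  $$S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{2848.182}{11 - 1}} = 16.877 \text{ km/hr}$$
  4. Coefficient of Variation
  $$CV = \frac{S}{\bar{x}} = \frac{16.877}{80.273} = 0.21$$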
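The four measures of variation in Example 7 can be checked with the standard `statistics` module, whose `variance` and `stdev` functions use the sample (n − 1) divisor defined above. This is a sketch only; the variable names are illustrative.

```python
from statistics import mean, stdev, variance

speeds = [60, 73, 85, 65, 90, 100, 60, 70, 95, 110, 75]   # km/hr

x_bar = mean(speeds)
value_range = max(speeds) - min(speeds)

# Sum of squared deviations from the mean (the last column of the table).
sum_sq = sum((x - x_bar) ** 2 for x in speeds)

s2 = variance(speeds)      # sample variance: sum_sq / (n - 1)
s = stdev(speeds)          # sample standard deviation: sqrt(s2)
cv = s / x_bar             # coefficient of variation

print(f"range = {value_range} km/hr")
print(f"sum of squares = {sum_sq:.3f}")
print(f"variance = {s2:.3f}, standard deviation = {s:.3f}, CV = {cv:.2f}")
```

For the population versions (divisor N), `statistics.pvariance` and `statistics.pstdev` can be used instead.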
- 15. Example 8: The table below shows the average speeds of 480 vehicles that pass through a segment of road. Find the standard deviation of the vehicles' speed.
  The mean of the frequency distribution is $\bar{x} = \frac{\sum f x}{\sum f} = \frac{46062}{480} = 95.96$ km/hr.

  | Class Speed x (km/hr) | Frequency f | $x - \bar{x}$ | $(x - \bar{x})^2$ | $(x - \bar{x})^2 f$ |
  |---|---|---|---|---|
  | 70 | 4 | -25.96 | 674.05 | 2696.21 |
  | 74 | 9 | -21.96 | 482.35 | 4341.16 |
  | 78 | 16 | -17.96 | 322.65 | 5162.42 |
  | 82 | 28 | -13.96 | 194.95 | 5458.64 |
  | 86 | 45 | -9.96 | 99.25 | 4466.31 |
  | 90 | 66 | -5.96 | 35.55 | 2346.39 |
  | 94 | 85 | -1.96 | 3.85 | 327.37 |
  | 98 | 72 | 2.04 | 4.15 | 298.90 |
  | 102 | 54 | 6.04 | 36.45 | 1968.38 |
  | 106 | 38 | 10.04 | 100.75 | 3828.55 |
  | 110 | 27 | 14.04 | 197.05 | 5320.39 |
  | 114 | 18 | 18.04 | 325.35 | 5856.33 |
  | 118 | 11 | 22.04 | 485.65 | 5342.17 |
  | 122 | 5 | 26.04 | 677.95 | 3389.76 |
  | 125 | 2 | 29.04 | 843.18 | 1686.35 |
  | $\bar{x}$ = 95.96 | ∑f = 480 | | | ∑ = 52489.32 |

  $$S = \sqrt{\frac{52489.32}{480 - 1}} = 10.47 \text{ km/hr}$$
- 16. Procedure by Microsoft Excel: Data – Data Analysis – Regression – input and output data – OK
  Procedure by SPSS: Variable View – Name (speed) – Type (Numeric) – Measure (Scale) – Analyze – Descriptive Statistics – Frequencies – Variable (speed) – Statistics (mean, median, mode, max., min., range, SD, SE, variance) – Continue – OK
  SPSS output (Statistics, variable: speed):

  | Statistic | Value |
  |---|---|
  | N Valid | 11 |
  | N Missing | 0 |
  | Mean | 56.6364 |
  | Std. Error of Mean | .67787 |
  | Median | 57.0000 |
  | Mode | 54.00 |
  | Std. Deviation | 2.24823 |
  | Variance | 5.055 |
  | Skewness | .233 |
  | Std. Error of Skewness | .661 |
  | Range | 6.00 |
  | Minimum | 54.00 |
  | Maximum | 60.00 |
- 17. Chapter Three: Frequency Distribution and Graph
  Frequency Distribution
  Frequency is how often something happened. The frequency of an observation is the number of times the observation occurs in the data.
  1. Total Range
  Total Range = (max. value − min. value) or (max. value − min. value + 1) (add 1 if this value is a whole number)
  2. Number of Classes
  $$\text{Number of classes} = 2.5 \sqrt[4]{n}$$
  If the number of classes is given, use the given value. If you are asked to determine it, use the equation above. If neither is stated, use between 5 and 20 classes.
  3. Length of Class
  $$\text{Length of class} = \frac{\text{Total Range}}{\text{Number of classes}}$$
  4. Lower and Upper Boundary of Class
  Lower class boundary = lower limit − 0.5
  Upper class boundary = upper limit + 0.5
  5. Central of Class (class midpoint)
  $$\text{Central of class} = \frac{\text{Lower Limit} + \text{Upper Limit}}{2}$$
  6. Relative Frequency Distribution
- 18. $$\text{Relative Frequency Distribution} = f_i^* = \frac{f_i}{n} \times 100$$
  Example 1: The following data represent the driving speeds for men. Find the frequency distribution for the data.
  Speed (km/hr): 55 60 80 80 80 80 85 85 85 85 90 90 90 90 90 92 94 95 95 95 90 90 90 90 90 92 94 95 95 95 95 95 95 100 100 100 100 100 100 100 100 100 101 102 105 105 105 105 105 105 105 105 109 110 110 110 110 110 110 110 110 110 110 110 112 115 115 115 115 115 115 120 120 120 120 120 120 120 120 124 125 125 125 125 125 125 130 130 140 140 140 140 145 150
  Total range = 150 − 55 = 95, or 150 − 55 + 1 = 96
  Number of classes = $2.5 \sqrt[4]{94} = 7.78 \cong 8$
  Length of class = 96 / 8 = 12

  | Class | Frequency f | Class boundaries | Central of class | Relative Frequency Distribution $f_i^*$ (%) |
  |---|---|---|---|---|
  | 55 - 67 | 2 | 54.5 – 67.5 | 61 | 2.128 |
  | 67 - 79 | 0 | 66.5 – 79.5 | 73 | 0.000 |
  | 79 - 91 | 18 | 78.5 – 91.5 | 85 | 19.149 |
  | 91 - 103 | 24 | 90.5 – 103.5 | 97 | 25.532 |
  | 103 - 115 | 21 | 102.5 – 115.5 | 109 | 22.340 |
  | 115 - 127 | 21 | 114.5 – 127.5 | 121 | 22.340 |
  | 127 - 139 | 2 | 126.5 – 139.5 | 133 | 2.128 |
  | 139 - 151 | 6 | 138.5 – 151.5 | 145 | 6.383 |
  | ∑ | 94 | | | 100 |

  Example 2:
- 19. The compressive strength results for concrete cubes with a water-cement ratio of 0.38 – 0.60 are shown in the table below:
  Compressive Strength (N/mm²): 38.2 30.8 29.3 23.1 35.9 32.7 24.2 22.4 35.9 35.4 27.6 24.9 34.7 32.5 31.3 27.6 29.5 28.4 26.8 22.4
  The largest compressive strength = 38.2 N/mm²
  The smallest compressive strength = 22.4 N/mm²
  Total range = 38.2 − 22.4 = 15.8 N/mm²
  If 5 class intervals are used, the class interval size is Total range / Number of class intervals = 15.8 / 5 = 3.16
  If 20 class intervals are used, the class interval size is Total range / Number of class intervals = 15.8 / 20 = 0.79
  Use class interval size = 2 N/mm²

  | Class | Frequency f | Class boundaries | Central of class | Relative Frequency Distribution $f_i^*$ (fraction) |
  |---|---|---|---|---|
  | 22 - 24 | 3 | 21.5 – 24.5 | 23 | 0.15 |
  | 24 - 26 | 2 | 23.5 – 26.5 | 25 | 0.10 |
  | 26 - 28 | 3 | 25.5 – 28.5 | 27 | 0.15 |
  | 28 - 30 | 3 | 27.5 – 30.5 | 29 | 0.15 |
  | 30 - 32 | 2 | 29.5 – 32.5 | 31 | 0.10 |
  | 32 - 34 | 2 | 31.5 – 34.5 | 33 | 0.10 |
  | 34 - 36 | 4 | 33.5 – 36.5 | 35 | 0.20 |
  | 36 - 38 | 0 | 35.5 – 38.5 | 37 | 0.00 |
  | 38 - 40 | 1 | 37.5 – 40.5 | 39 | 0.05 |
  | ∑ | 20 | | | 1.00 |
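The steps for building a frequency distribution can be sketched in code. The sketch below reads the class-count rule as 2.5 times the fourth root of n, which reproduces the 7.78 obtained for n = 94 in Example 1, and then tallies the compressive strength data of Example 2 into classes of width 2 starting at 22. The function names are illustrative, and each class is assumed to include its lower limit and exclude its upper limit; no data value here falls exactly on a class limit, so this choice does not affect the counts.

```python
def number_of_classes(n):
    """Class-count rule, read as 2.5 * (fourth root of n)."""
    return 2.5 * n ** 0.25


def frequency_table(data, start, width, n_classes):
    """Tally data into classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for x in data:
        counts[int((x - start) // width)] += 1
    total = len(data)
    rows = []
    for i, f in enumerate(counts):
        lower = start + i * width
        midpoint = lower + width / 2
        rows.append((lower, lower + width, midpoint, f, f / total))
    return rows


strengths = [38.2, 30.8, 29.3, 23.1, 35.9, 32.7, 24.2, 22.4, 35.9, 35.4,
             27.6, 24.9, 34.7, 32.5, 31.3, 27.6, 29.5, 28.4, 26.8, 22.4]

k_example_1 = number_of_classes(94)
table = frequency_table(strengths, start=22, width=2, n_classes=9)

print(f"classes for n = 94: {k_example_1:.2f}")
for lower, upper, midpoint, f, rel in table:
    print(f"{lower} - {upper}  mid={midpoint:g}  f={f}  rel={rel:.2f}")
```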
- 20. Graphical Presentation
  There are many ways to graph data, such as the histogram, frequency polygon, cumulative frequency curve, bar chart, and pie chart.
  1. Histogram: a bar chart of a quantitative variable that shows how many values fall in various intervals of the data.
  Example 3: A random survey is done on the number of children belonging to different age groups who play in government parks, and the information is tabulated in the table given below.
  A. Draw a histogram representing the data.
  B. Identify the number of children belonging to the age groups 2, 4, 6, and 8 who play in government parks.

  | Age (in years) | Frequency |
  |---|---|
  | 0 – 2 | 8 |
  | 2 – 4 | 10 |
  | 4 – 6 | 18 |
  | 6 – 8 | 10 |
  | 8 – 10 | 12 |
  | 10 – 12 | 6 |

  | Bin | Frequency |
  |---|---|
  | 2 | 8 |
  | 4 | 10 |
  | 6 | 18 |
  | 8 | 10 |
  | 10 | 12 |
  | 12 | 6 |
- 21. [Figure: Histogram; x-axis: Bin; y-axis: Frequency]
  2. Frequency Polygon: another method of representing a frequency distribution graphically. A frequency polygon is a graph constructed by using lines to join the midpoints of each interval or bin.
  Example 4: In a city, the weekly observations made in a study on the cost of living index are given in the following table. Draw a frequency polygon for the data below.

  | Cost of Living Index | Number of weeks |
  |---|---|
  | 140 – 150 | 2 |
  | 150 – 160 | 8 |
  | 160 – 170 | 14 |
  | 170 – 180 | 20 |
  | 180 – 190 | 10 |
  | 190 – 200 | 6 |
- 22. Frequency polygon table for Example 4, with an empty class added at each end:

  | Class Interval | Midclass | Number of weeks |
  |---|---|---|
  | 130 – 140 | 135 | 0 |
  | 140 – 150 | 145 | 2 |
  | 150 – 160 | 155 | 8 |
  | 160 – 170 | 165 | 14 |
  | 170 – 180 | 175 | 20 |
  | 180 – 190 | 185 | 10 |
  | 190 – 200 | 195 | 6 |
  | 200 – 210 | 205 | 0 |

  [Figure: Frequency Polygon; x-axis: Midclass; y-axis: Frequency]
  3. Cumulative Frequency: the sum of all the previous frequencies up to the current point. It is often referred to as the running total of the frequencies.
  Example 5: The marks obtained by 40 students in an examination are given below:
- 23. 27, 18, 15, 21, 48, 25, 49, 29, 27, 21, 19, 45, 14, 34, 37, 34, 23, 45, 24, 42, 8, 47, 22, 31, 17, 13, 38, 26, 3, 34, 29, 11, 22, 7, 15, 24, 38, 31, 21, 35

  | Class Interval | Frequency | Cumulative Frequency (%) |
  |---|---|---|
  | 0 – 8 | 3 | 7.5 |
  | 8 – 16 | 5 | 20 |
  | 16 – 24 | 11 | 47.5 |
  | 24 – 32 | 8 | 67.5 |
  | 32 – 40 | 7 | 85 |
  | 40 – 48 | 5 | 97.5 |
  | 48 – 56 | 1 | 100 |
  | ∑ | 40 | |

  [Figure: S-Curve; x-axis: Class; y-axis: Cumulative Frequency (%)]
  Example 6: Fastest driving speeds (km/hr) for men. Here are the 24 males' responses to the question about how fast they have driven a car, arranged in rows and columns to make them easier to count:
- 24. Fastest driving speeds (km/hr): 75 93 80 55 87 84 56 62 67 87 68 87 91 76 59 95 83 84 78 80 87 82 75 70
  1) Find the median and mode.
  2) Draw the histogram, if the length of class is equal to 8.
  3) Draw the S-curve, if the length of class is equal to 8.
  1) Median: ordered data (95, 93, 91, 87, 87, 87, 87, 84, 84, 83, 82, 80, 80, 78, 76, 75, 75, 70, 68, 67, 62, 59, 56, 55)
  $$\text{Median} = \frac{80 + 80}{2} = 80 \text{ km/hr}$$
  Mode: counts of repeated values (87 = 4, 84 = 2, 80 = 2, 75 = 2), so Mode = 87 km/hr
  2) & 3) Length of classes = 8

  | Range of classes | Freq. | Frequency, % | Cumulative frequency, % |
  |---|---|---|---|
  | 54 – 62 | 3 | 12.5 | 12.5 |
  | 62 – 70 | 3 | 12.5 | 25 |
  | 70 – 78 | 4 | 16.67 | 41.67 |
  | 78 – 86 | 7 | 29.17 | 70.83 |
  | 86 – 94 | 6 | 25 | 95.83 |
  | 94 – 102 | 1 | 4.17 | 100 |
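The frequency and cumulative frequency columns of Example 6 can be reproduced with a running total. This sketch assumes each class includes its lower limit and excludes its upper limit (so 62 is counted in 62 – 70), which is the convention that matches the frequencies in the table; the variable names are illustrative.

```python
from itertools import accumulate

speeds = [75, 93, 80, 55, 87, 84, 56, 62, 67, 87, 68, 87,
          91, 76, 59, 95, 83, 84, 78, 80, 87, 82, 75, 70]   # km/hr

start, width, n_classes = 54, 8, 6

# Tally each speed into its class [start + i*width, start + (i+1)*width).
counts = [0] * n_classes
for x in speeds:
    counts[(x - start) // width] += 1

n = len(speeds)
percent = [100 * f / n for f in counts]
# accumulate() yields the running total of the class frequencies.
cumulative_percent = [100 * c / n for c in accumulate(counts)]

for i, (f, p, cp) in enumerate(zip(counts, percent, cumulative_percent)):
    lower = start + i * width
    print(f"{lower} - {lower + width}: f={f}  {p:.2f}%  cumulative {cp:.2f}%")
```

Plotting `cumulative_percent` against the upper class limits gives the S-curve.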
- 25. [Figure: Histogram; x-axis: Speed, km/hr; y-axis: Frequency]
  4. Pie Chart: a circle divided into sectors, so that the total area of the circle represents the whole and each sector represents one category. The angle of each sector must be determined:
  $$\text{sector angle} = \frac{\text{frequency of the category}}{\text{total frequency}} \times 360^\circ$$
  Example 7: The following table shows the number of hours spent by a child on different activities on a working day.

  | Activity | No. of hours |
  |---|---|
  | School | 6 |
  | Sleep | 8 |
  | Playing | 2 |
  | Study | 4 |
  | T.V. | 1 |
  | Others | 3 |
- 26.

  | No. | Activity | No. of hours | Measure of central angle |
  |---|---|---|---|
  | 1 | School | 6 | (6/24) × 360° = 90° |
  | 2 | Sleep | 8 | (8/24) × 360° = 120° |
  | 3 | Playing | 2 | (2/24) × 360° = 30° |
  | 4 | Study | 4 | (4/24) × 360° = 60° |
  | 5 | T.V. | 1 | (1/24) × 360° = 15° |
  | 6 | Others | 3 | (3/24) × 360° = 45° |

  [Figure: Pie Chart for Example 7 (school 25%, sleep 33%, playing 8%, study 17%, T.V. 4%, others 13%)]
  Example 8: The following table shows the percentage of hours spent by students on different activities on a working day.

  | Activity | University | Sleep | Study | T.V. | Others |
  |---|---|---|---|---|---|
  | Percent of hours | 25% | 15% | 30% | 20% | 10% |
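The sector-angle formula used for Example 7 can be sketched as a small calculation. The dictionary below holds the hours from the table; the variable names are illustrative.

```python
hours = {"school": 6, "sleep": 8, "playing": 2, "study": 4, "T.V.": 1, "others": 3}

total_hours = sum(hours.values())

# sector angle = frequency / total frequency * 360 degrees
angles = {activity: h * 360 / total_hours for activity, h in hours.items()}

for activity, angle in angles.items():
    print(f"{activity:8s} {angle:6.1f} degrees")
```

The angles must add up to 360 degrees, which is a quick check on the arithmetic.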
- 27. [Figure: Pie Chart for Example 8 (University 25%, Sleep 15%, Study 30%, T.V. 20%, Others 10%)]
  5. Bar Chart: a set of vertical or horizontal rectangles of equal width. The horizontal axis represents classes, years, or attributes (month, year, city, ...) and the vertical axis represents frequency.
  Example 9: The vehicular traffic at a busy road crossing in a particular place was recorded on a particular day from 6 am to 2 pm, and the data were rounded to the nearest ten.

  | Time in hours | 6-7 | 7-8 | 8-9 | 9-10 | 10-11 | 11-12 | 12-1 | 1-2 |
  |---|---|---|---|---|---|---|---|---|
  | Number of vehicles | 100 | 450 | 1250 | 1050 | 750 | 600 | 550 | 200 |
- 28. [Figure: Bar Chart; x-axis: Time in hours from 6 am to 2 pm; y-axis: Number of Vehicles]
  Example 9: Create a bar chart for the data below:

  | Use | Heating | Cooling | Water Heating | Appliances | Lighting | Electronics | Other |
  |---|---|---|---|---|---|---|---|
  | Percent used | 29% | 17% | 14% | 13% | 12% | 4% | 11% |

  [Figure: Household Consumption Bar Chart; x-axis: Heating, Cooling, Water Heating, Appliances, Lighting, Electronics, Other; y-axis: Percent Used %]
- 29. Example 10: Relation between population and number of cars in different years.
  The population and the number of vehicles have been increasing rapidly with time. For example, the population increased from 730085 in 2009 to 856633 in 2015, and the number of vehicles increased from 279713 in 2009 to 642010 in 2015. These results correspond to an average annual increase of 16.37 percent in vehicles and 2.70 percent in population. The figure shows the population of Erbil City from 2009 to 2015 together with the number of cars in the same years.
  [Figure: Line Chart; x-axis: Year (2009 to 2015); y-axes: Population and Number of vehicles; series: population, number of cars (all types)]
  Home work: The spot density (km/pc/ln) data below were collected on an urban road during a density study. For these data, determine:
  1. The measures of location of density
  2. Frequency polygon
- 30. Draw the frequency polygon with the length (range) of class equal to 6, and state whether the distribution is normal or not normal.
  Spot Density (km/pc/ln): 51.92 31.72 27.93 36.83 78.95 34.21 37.56 42.43 58.71 38.02 31.6 42.59 54 38.63 43.48 39.68 37.83 41.48 46.04 33.02 30.74 39.92 57.7 30.19 37.42 36.32 68.58 35.65 53.58 33.65 49.93 37.57 48.55 43.32 42.81 40.06 42.28 37.53 84.7 33.59
  Procedure by Microsoft Excel: Insert – Charts – select the chart that you want – OK
- 31. Chapter Four: Probability
  Probability
  If an event can occur in N mutually exclusive and equally likely ways, and if m of these possess characteristic E, the probability of the occurrence of E is equal to m/N. If we read P(E) as the probability of E, we may express this definition as
  $$P(E) = \frac{m}{N}$$
  1. Every day the sun rises in the east.
  2. It is possible to live without water.
  3. Probably Arun gets that job.
  Basic Concepts of Probability:
  Sample space (S): the set of possible outcomes of an experiment.
  Event (E): a subset of a sample space.
  Union ($A \cup B$): the set of outcomes that belong either to A, to B, or to both (A or B).
  Intersection ($A \cap B$): the set of outcomes that belong both to A and to B (A and B).
  Complement ($A^c$, also written with a bar over the event, $\bar{A}$): the set of outcomes that do not belong to A. $\bar{A}$ means "not A".
- 32. Example 1: An electrical engineer has on hand two boxes of resistors, with four resistors in each box. The resistors in the first box are labeled 10 (ohms), but in fact their resistances are 9, 10, 11, and 12 (ohms). The resistors in the second box are labeled 20 (ohms), but in fact their resistances are 18, 19, 20, and 21 (ohms). The engineer chooses one resistor from each box and determines the resistance of each.
  Let A be the event that the first resistor has a resistance greater than 10.
  Let B be the event that the second resistor has a resistance less than 19.
  Let C be the event that the sum of the resistances is equal to 28.
  S = {(9, 18), (9, 19), (9, 20), (9, 21), (10, 18), (10, 19), (10, 20), (10, 21), (11, 18), (11, 19), (11, 20), (11, 21), (12, 18), (12, 19), (12, 20), (12, 21)}
  The events A, B, and C are given by:
  A = {(11, 18), (11, 19), (11, 20), (11, 21), (12, 18), (12, 19), (12, 20), (12, 21)}
- 33. B = {(9, 18), (10, 18), (11, 18), (12, 18)}
  C = {(9, 19), (10, 18)}
  Example 2: Refer to Example 1. Find $B \cup C$ and $A \cap B^c$.
  Solution:
  $B \cup C$ = {(9, 18), (10, 18), (11, 18), (12, 18), (9, 19)}
  $A \cap B^c$ = {(11, 19), (11, 20), (11, 21), (12, 19), (12, 20), (12, 21)}
  Elementary Properties of Probability:
  1. $P(E_i) \ge 0$
  2. $P(E_1) + P(E_2) + P(E_3) + \dots + P(E_n) = 1$, where $E_1, \dots, E_n$ are all the mutually exclusive outcomes
  3. $P(E_i \text{ or } E_j) = P(E_i) + P(E_j)$ for mutually exclusive $E_i$ and $E_j$
  Probabilities
  Each event in a sample space has a probability of occurring. P(A) denotes the probability that the event A occurs.
  The Axioms of Probability:
  Axiom 1: Let S be a sample space. Then P(S) = 1.
  Axiom 2: For any event A, $0 \le P(A) \le 1$.
  Axiom 3: If A and B are mutually exclusive events, then $P(A \cup B) = P(A) + P(B)$. More generally, for mutually exclusive events $A_1, A_2, \dots$: $P(A_1 \cup A_2 \cup \dots) = P(A_1) + P(A_2) + \cdots$
  Two consequences:
  1. For any event A: $P(A^c) = 1 - P(A)$
  2. Let ∅ denote the empty set. Then $P(\emptyset) = 0$
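The sample space and events of Examples 1 and 2 map directly onto Python sets, where `|` is union, `&` is intersection, and set difference from S gives the complement. This is an illustrative sketch; it takes A to refer to the resistor from the first box and B to the resistor from the second box, which is the reading that matches the sets listed above.

```python
from itertools import product

first_box = [9, 10, 11, 12]      # ohms
second_box = [18, 19, 20, 21]    # ohms

# Sample space: every (first, second) pair of resistors.
S = set(product(first_box, second_box))

A = {(a, b) for a, b in S if a > 10}        # first resistor greater than 10
B = {(a, b) for a, b in S if b < 19}        # second resistor less than 19
C = {(a, b) for a, b in S if a + b == 28}   # resistances sum to 28

B_union_C = B | C
A_and_not_B = A & (S - B)                   # S - B is the complement of B

print(sorted(B_union_C))
print(sorted(A_and_not_B))
```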
- 34. Example 3: The following table presents probabilities for the number of times that a certain computer system will crash in the course of a week. Let A be the event that there are more than two crashes during the week. Let B be the event that the system crashes at least once. Find a sample space. Then find the subsets of the sample space that correspond to the events A and B. Then find P(A) and P(B).

  | Number of Crashes | Probability |
  |---|---|
  | 0 | 0.60 |
  | 1 | 0.30 |
  | 2 | 0.05 |
  | 3 | 0.04 |
  | 4 | 0.01 |

  S = {0, 1, 2, 3, 4}
  A = {3, 4}
  B = {1, 2, 3, 4}
  P(A) = P(3 crashes or 4 crashes) = P(3 crashes) + P(4 crashes) = 0.04 + 0.01 = 0.05
  P(B) = P(1 crash) + P(2 crashes) + P(3 crashes) + P(4 crashes) = 0.30 + 0.05 + 0.04 + 0.01 = 0.40
  Or: $P(B) = 1 - P(B^c) = 1 - 0.60 = 0.40$
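Example 3 adds the probabilities of the mutually exclusive outcomes that make up each event. A minimal sketch, with the table stored as a dictionary and an illustrative helper `prob`:

```python
# Probability of each number of crashes per week (from the table).
pmf = {0: 0.60, 1: 0.30, 2: 0.05, 3: 0.04, 4: 0.01}


def prob(event):
    """P(event) as the sum of the probabilities of its outcomes."""
    return sum(pmf[outcome] for outcome in event)


A = {x for x in pmf if x > 2}     # more than two crashes
B = {x for x in pmf if x >= 1}    # at least one crash

p_a = prob(A)
p_b = prob(B)
p_b_by_complement = 1 - prob({0})  # P(B) = 1 - P(B^c)

print(f"P(A) = {p_a:.2f}, P(B) = {p_b:.2f}, 1 - P(0 crashes) = {p_b_by_complement:.2f}")
```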
- 35. General Rules in Probability Theory:
  1. The Addition Rule
  Case 1: if A and B are mutually exclusive events
  $P(A \cup B) = P(A \text{ or } B) = P(A) + P(B)$
  $\Pr(E_1 \text{ or } E_2 \text{ or } \dots \text{ or } E_k) = \Pr(E_1) + \Pr(E_2) + \dots + \Pr(E_k) = \sum \Pr(E_i)$
  Case 2: if A and B are not mutually exclusive events
  $P(A \cup B) = P(A \text{ or } B) = P(A) + P(B) - P(A \cap B)$
  $\Pr(E_1 \text{ or } E_2) = \Pr(E_1) + \Pr(E_2) - \Pr(E_1 E_2)$
  2. The Multiplication Rule
  If $E_1$ and $E_2$ are independent: $\Pr(E_1 \text{ and } E_2) = \Pr(E_1) \cdot \Pr(E_2)$
  In general: $\Pr(E_1 \text{ and } E_2) = \Pr(E_1 E_2) = \Pr(E_1) \cdot \Pr(E_2 / E_1)$, where $\Pr(E_2 / E_1)$ is the probability of $E_2$ given that $E_1$ has occurred.
  Example 4: Let the sample space S = {x: 0 < x < 1}. If A = {x: 0 < x < 1/2} and B = {x: 1/2 ≤ x < 1}, find P(B) if P(A) = 1/4.
  [Figure: sample space S from 0 to 1, with A from 0 to 1/2 and B from 1/2 to 1]
  Since P(A) = 1/4, P(S) = 1, and $S = A \cup B$ with A and B mutually exclusive:
  $P(S) = P(A \cup B) = P(A) + P(B)$
  $1 = \frac{1}{4} + P(B) \rightarrow P(B) = \frac{3}{4}$
  Example 5: The probability that road A will fail within the next 20 years is 0.025 and that road B will fail within the next 20 years is 0.030. What is the probability that:
- 36. a. Both roads A and B will fail within the next 20 years?
  b. Road A will fail and road B will not fail?
  c. Neither road A nor road B will fail?
  Treating the failures of the two roads as independent events:
  a. $P(A \cap B) = P(A) \cdot P(B) = 0.025 \times 0.030 = 0.000750$
  b. $P(A) = 0.025$, $P(B^c) = 1 - P(B) = 1 - 0.030 = 0.970$
  $P(A \cap B^c) = P(A) \cdot P(B^c) = (0.025)(0.970) = 0.02425$
  c. $P(A^c) = 1 - P(A) = 0.975$
  $P(A^c \cap B^c) = P(A^c) \times P(B^c) = (0.975)(0.970) = 0.94575$
  Example 6: The sample space $S = A \cup B$, P(A) = 0.8 and P(B) = 0.5. Find $P(A \cap B)$.
  $P(S) = P(A \cup B) = P(A) + P(B) - P(A \cap B)$
  $1 = 0.8 + 0.5 - P(A \cap B)$
  $P(A \cap B) = 1.3 - 1 = 0.3$
  Example 7: Let A = {x: 1/4 < x < 1/2} and B = {x: 1/2 < x < 1} be subsets of the sample space S = {x: 0 < x < 1} such that P(A) = 1/8 and P(B) = 1/2. Find (1) $P(A \cup B)$, (2) $P(A^c)$, and (3) $P(A^c \cap B^c)$.
  [Figure: sample space S from 0 to 1, with A from 1/4 to 1/2 and B from 1/2 to 1]
  1. A and B are mutually exclusive, so $P(A \cup B) = P(A) + P(B) = \frac{1}{8} + \frac{1}{2} = \frac{5}{8}$
- 37. 2. $P(A^c) = 1 - P(A) = 1 - \frac{1}{8} = \frac{7}{8}$
  3. A and B are mutually exclusive, not independent, so the product $P(A^c) \cdot P(B^c)$ does not apply. Since $A^c \cap B^c = (A \cup B)^c$:
  $P(A^c \cap B^c) = P\left((A \cup B)^c\right) = 1 - P(A \cup B) = 1 - \frac{5}{8} = \frac{3}{8}$
  Combinations
  A combination is a selection of objects without regard to order. We write the number of combinations of n things taken r at a time as $C_r^n$:
  $$C_r^n = \frac{n!}{(n - r)! \, r!}, \quad 0 \le r \le n$$
  Example 8: How many committees of 3 can be formed from 8 people?
  $$C_3^8 = \frac{8!}{3! \, (8 - 3)!} = \frac{8 \cdot 7 \cdot 6 \cdot 5!}{(3 \cdot 2 \cdot 1) \cdot 5!} = 56$$
  Permutation
  A permutation of a number of objects is any arrangement of these objects in a definite order. The number of permutations of n things taken r at a time is
  $$P_r^n = \frac{n!}{(n - r)!}$$
  Factorial
  Given a positive integer n, the product of all whole numbers from n down through 1 is called n factorial and is written n!.
  10! = 10·9·8·7·6·5·4·3·2·1
  5! = 5·4·3·2·1
  n! = n(n − 1)(n − 2)(n − 3) ⋯ 1
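The combination, permutation, and factorial formulas can be written out directly and compared with the equivalent functions in Python's `math` module (`math.comb` and `math.perm` require Python 3.8 or later). The function names `combinations_count` and `permutations_count` are illustrative.

```python
from math import comb, factorial, perm


def combinations_count(n, r):
    """C(n, r) = n! / ((n - r)! * r!), selections without regard to order."""
    return factorial(n) // (factorial(n - r) * factorial(r))


def permutations_count(n, r):
    """P(n, r) = n! / (n - r)!, arrangements in a definite order."""
    return factorial(n) // factorial(n - r)


committees = combinations_count(8, 3)     # Example 8: committees of 3 from 8 people
ordered_triples = permutations_count(8, 3)

print(committees, comb(8, 3))             # formula vs. library
print(ordered_triples, perm(8, 3))
print(factorial(5), factorial(10))
```

Each combination of 3 people can be ordered in 3! = 6 ways, which is why the permutation count is 6 times the combination count.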
- 38. 0! = 1
  Example 9: In how many ways can 5 differently colored marbles be arranged in a row?
  Number of arrangements of n different objects in a row = n! = 5! = 5·4·3·2·1 = 120
  Example 10: It is required to seat 5 men and 4 women in a row so that the women occupy the even places. How many such arrangements are possible?
  $P_5^5 \times P_4^4 = 5! \times 4! = (120)(24) = 2880$
  Example 11: How many 2-digit numbers can be formed from the digits 7, 5, 4, 2 if digits can be repeated?
  Each of the two places can be filled in 4 ways: 4 × 4 = 16. The numbers with a repeated digit are 77, 55, 44, 22.
  Example 12: In how many ways can a set of balls be selected from 8 white and 6 red balls such that there will be 3 white and 2 red?
  $$C_3^8 \cdot C_2^6 = \frac{8!}{3! \, 5!} \cdot \frac{6!}{2! \, 4!} = (56)(15) = 840 \text{ ways}$$
  Example 13: In how many ways can six different books be arranged on a shelf?
  n! = 6! = 6·5·4·3·2·1 = 720
  Example 14: How many possible arrangements can be formed from the letters A, B, C?
  ABC, ACB, BAC, BCA, CAB, CBA
- 39. n! = 3! = 3·2·1 = 6
  Note: AB is the same combination as BA, but not the same permutation.
  Example 15: Let 2 items be chosen at random from a lot containing 12 items, of which 4 are defective. Let A = {both items are defective} and B = {both items are non-defective}. Find P(A) and P(B).
  $$C_2^{12} = \frac{12!}{2! \, (12 - 2)!} = \frac{12 \cdot 11 \cdot 10!}{(2 \cdot 1) \cdot 10!} = 66$$
  $$C_2^4 = \frac{4!}{2! \, 2!} = \frac{4 \cdot 3 \cdot 2!}{(2 \cdot 1) \cdot 2!} = 6$$
  $$C_2^8 = \frac{8!}{2! \, 6!} = \frac{8 \cdot 7 \cdot 6!}{2! \cdot 6!} = 28$$
  $$P(A) = \frac{6}{66} = \frac{1}{11}, \qquad P(B) = \frac{28}{66} = \frac{14}{33}$$
  Example 16: A ball is drawn at random from a box containing 6 red balls, 4 white balls, and 5 blue balls. Determine the probability that the ball drawn is a. red, b. white, c. blue, d. not red, e. red or white.
  $$\Pr(R) = \frac{6}{6 + 4 + 5} = \frac{6}{15}$$
  $$\Pr(W) = \frac{4}{6 + 4 + 5} = \frac{4}{15}$$
- 40. $$\Pr(B) = \frac{5}{6 + 4 + 5} = \frac{5}{15}$$
  $$\Pr(R^c) = 1 - \Pr(R) = 1 - \frac{6}{15} = \frac{15 - 6}{15} = \frac{9}{15}$$
  $$\Pr(R \text{ or } W) = \frac{6 + 4}{6 + 4 + 5} = \frac{10}{15}$$
  Example 17: How many arrangements can be made of the letters of the word MISSISSIPPI taken all together?
  M: $n_1 = 1$, I: $n_2 = 4$, S: $n_3 = 4$, P: $n_4 = 2$
  $$\frac{n!}{n_1! \, n_2! \, n_3! \, n_4!} = \frac{11!}{1! \, 4! \, 4! \, 2!} = \frac{11 \cdot 10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4!}{1 \cdot (4 \cdot 3 \cdot 2 \cdot 1) \cdot 4! \cdot (2 \cdot 1)} = 34650$$
  Example 18: If repetitions are not allowed:
  1. How many 3-digit numbers can be formed from the six digits 2, 3, 4, 5, 6, 7?
  6·5·4 = 120 numbers
  $$P_3^6 = \frac{6!}{(6 - 3)!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1} = 120$$
  Example 19: A student is to answer 8 out of 10 questions in an exam.
  1. How many choices has he?
  2. How many, if he must answer the first 3 questions?
  3. How many, if he must answer at least 4 of the first 5 questions?
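- 41. (1) $C_8^{10} = \frac{10!}{8! \, (10 - 8)!} = \frac{10 \cdot 9 \cdot 8!}{8! \cdot 2 \cdot 1} = 45$
  (2) The first 3 questions are fixed, so 5 more are chosen from the remaining 7: $C_5^7 = \frac{7!}{5! \, 2!} = \frac{7 \cdot 6 \cdot 5!}{5! \cdot 2 \cdot 1} = 21$
  (3) Either all 5 of the first five are answered with 3 of the last five, or 4 of the first five with 4 of the last five:
  $C_5^5 \cdot C_3^5 = (1) \cdot \frac{5!}{3! \, 2!} = 10$ and $C_4^5 \cdot C_4^5 = (5)(5) = 25$, where $C_4^5 = \frac{5!}{4! \, 1!} = 5$
  Total: $C_5^5 \cdot C_3^5 + C_4^5 \cdot C_4^5 = 10 + 25 = 35$
  Example 20: A class contains 9 boys and 3 girls.
  1. In how many ways can the teacher choose a committee of 4?
  2. How many of them will contain at least one girl?
  (1) $C_4^{12} = \frac{12!}{4! \, 8!} = \frac{12 \cdot 11 \cdot 10 \cdot 9 \cdot 8!}{4 \cdot 3 \cdot 2 \cdot 1 \cdot 8!} = 495$
  (2) $C_1^3 \cdot C_3^9 + C_2^3 \cdot C_2^9 + C_3^3 \cdot C_1^9 = \frac{3!}{1! \, 2!} \cdot \frac{9!}{3! \, 6!} + \frac{3!}{2! \, 1!} \cdot \frac{9!}{2! \, 7!} + \frac{3!}{3! \, 0!} \cdot \frac{9!}{1! \, 8!} = 252 + 108 + 9 = 369$
  Example 21: Out of 5 mathematicians and 7 physicists, a committee consisting of 2 mathematicians and 3 physicists is to be formed. In how many ways can this be done if (a) any mathematician and any physicist can be included, (b) one particular physicist must be on the committee, (c) two particular mathematicians cannot be on the committee?
  (a) 2 mathematicians out of 5 can be selected in $C_2^5$ ways. 3 physicists out of 7 can be selected in $C_3^7$ ways.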
- 42. Total number of possible selections = $C_2^5 \cdot C_3^7 = \frac{5!}{(5-2)! \, 2!} \cdot \frac{7!}{(7-3)! \, 3!} = 10 \cdot 35 = 350$ ways
  (b) 2 mathematicians out of 5 can be selected in $C_2^5$ ways. 2 additional physicists out of 6 can be selected in $C_2^6$ ways.
  Total number of possible selections = $C_2^5 \cdot C_2^6 = \frac{5!}{(5-2)! \, 2!} \cdot \frac{6!}{(6-2)! \, 2!} = 10 \cdot 15 = 150$ ways
  (c) 2 mathematicians out of 3 can be selected in $C_2^3$ ways. 3 physicists out of 7 can be selected in $C_3^7$ ways.
  Total number of possible selections = $C_2^3 \cdot C_3^7 = \frac{3!}{(3-2)! \, 2!} \cdot \frac{7!}{(7-3)! \, 3!} = 3 \cdot 35 = 105$ ways
  Example 22: In how many ways can a committee consisting of 3 men and 2 women be chosen from 7 men and 5 women?
  $$C_3^7 \cdot C_2^5 = \frac{7!}{3! \, (7 - 3)!} \cdot \frac{5!}{2! \, (5 - 2)!} = \frac{7 \cdot 6 \cdot 5 \cdot 4!}{3 \cdot 2 \cdot 1 \cdot 4!} \cdot \frac{5 \cdot 4 \cdot 3!}{2 \cdot 1 \cdot 3!} = (35)(10) = 350$$
  Example 23: A woman has 11 close friends.
  a. In how many ways can she invite 5 of them to dinner?
  b. In how many ways if two of the friends are married and will not attend separately?
  c. In how many ways if two of them are not on speaking terms and will not attend together?
  (a) $C_5^{11} = 462$
  (b) Either neither of the couple attends or both attend: $C_0^2 \cdot C_5^9 + C_2^2 \cdot C_3^9 = C_5^9 + C_3^9 = 126 + 84 = 210$
  (c) Either neither of the two attends or exactly one of them attends: $C_0^2 \cdot C_5^9 + C_1^2 \cdot C_4^9 = C_5^9 + 2 C_4^9 = 126 + 252 = 378$
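The committee and dinner-party counts in Examples 21 to 23 are products and sums of combinations, so they can be checked with `math.comb` (Python 3.8 or later). The variable names are illustrative; the last line is an added cross-check of Example 23(c) by the complement (all invitations minus those that include both friends), not a step from the original solution.

```python
from math import comb

# Example 21: 2 of 5 mathematicians and 3 of 7 physicists.
ex21_a = comb(5, 2) * comb(7, 3)          # no restriction
ex21_b = comb(5, 2) * comb(6, 2)          # one physicist already chosen
ex21_c = comb(3, 2) * comb(7, 3)          # two mathematicians excluded

# Example 22: 3 of 7 men and 2 of 5 women.
ex22 = comb(7, 3) * comb(5, 2)

# Example 23: invite 5 of 11 friends.
ex23_a = comb(11, 5)
ex23_b = comb(9, 5) + comb(9, 3)          # couple: neither or both
ex23_c = comb(9, 5) + 2 * comb(9, 4)      # feuding pair: neither or exactly one
ex23_c_check = comb(11, 5) - comb(9, 3)   # all choices minus "both attend"

print(ex21_a, ex21_b, ex21_c, ex22)
print(ex23_a, ex23_b, ex23_c, ex23_c_check)
```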
- 43. 42 Example 24: A student is to answer 10 out of 13 questions on an exam.
  (1) How many choices does he have?
  (2) How many if he must answer the first two questions?
  (3) How many if he must answer the first or the second question, but not both?
  (4) How many if he must answer exactly 3 of the first 5 questions?
  (5) How many if he must answer at least 3 of the first 5 questions?
  (1) $C_{10}^{13} = 286$
  (2) $C_{2}^{2} \cdot C_{8}^{11} = 165$
  (3) $C_{1}^{2} \cdot C_{9}^{11} = 2\,C_{9}^{11} = 2 \cdot 55 = 110$
  (4) $C_{3}^{5} \cdot C_{7}^{8} = 10 \cdot 8 = 80$
  (5) $C_{3}^{5} \cdot C_{7}^{8} + C_{4}^{5} \cdot C_{6}^{8} + C_{5}^{5} \cdot C_{5}^{8} = 80 + 140 + 56 = 276$
  Example 25: How many possible permutations can be formed from the word SUCCESS?
  The word has n = 7 letters: S appears 3 times, C appears 2 times, U and E appear once each.
  $\frac{n!}{n_1!\,n_2!\,n_3!\,n_4!} = \frac{7!}{3!\,1!\,2!\,1!} = \frac{5040}{12} = 420$
  Example 26: A class contains 9 boys and 3 girls.
  (1) In how many ways can the teacher choose a committee of 4?
  (2) How many of these committees contain at least one girl?
  (3) How many of them contain exactly one girl?
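- 44. 43 (1) 9 + 3 = 12, so $C_{4}^{12} = \frac{12!}{4!\,8!} = \frac{12 \cdot 11 \cdot 10 \cdot 9 \cdot 8!}{4 \cdot 3 \cdot 2 \cdot 1 \cdot 8!} = 495$
  (2) $C_{1}^{3} \cdot C_{3}^{9} + C_{2}^{3} \cdot C_{2}^{9} + C_{3}^{3} \cdot C_{1}^{9} = \frac{3!}{1!\,2!} \cdot \frac{9!}{3!\,6!} + \frac{3!}{2!\,1!} \cdot \frac{9!}{2!\,7!} + \frac{3!}{3!\,0!} \cdot \frac{9!}{1!\,8!} = 369$
  (3) $C_{1}^{3} \cdot C_{3}^{9} = \frac{3!}{1!\,2!} \cdot \frac{9!}{3!\,6!} = (3)(84) = 252$
  Example 27: In how many ways can a teacher choose one or more students from six students?
  Using $C_{r}^{n} = C_{n-r}^{n}$:
  Number of ways $= C_{1}^{6} + C_{2}^{6} + C_{3}^{6} + C_{4}^{6} + C_{5}^{6} + C_{6}^{6}$
  $C_{1}^{6} = \frac{6!}{1!\,5!} = 6$, so $C_{5}^{6} = 6$
  $C_{2}^{6} = \frac{6!}{2!\,4!} = \frac{6 \cdot 5 \cdot 4!}{2 \cdot 1 \cdot 4!} = 15$, so $C_{4}^{6} = 15$
  $C_{3}^{6} = \frac{6!}{3!\,3!} = \frac{6 \cdot 5 \cdot 4 \cdot 3!}{3 \cdot 2 \cdot 1 \cdot 3!} = 20$
  $C_{6}^{6} = \frac{6!}{6!\,0!} = 1$
  6 + 15 + 20 + 15 + 6 + 1 = 63 ways
  Example 28: In how many ways can a set of balls be selected from 8 white and 6 red balls such that there will be 3 white and 2 red?
  $C_{3}^{8} \cdot C_{2}^{6} = \frac{8!}{3!\,5!} \cdot \frac{6!}{2!\,4!} = \frac{8 \cdot 7 \cdot 6 \cdot 5!}{3 \cdot 2 \cdot 1 \cdot 5!} \cdot \frac{6 \cdot 5 \cdot 4!}{2 \cdot 1 \cdot 4!} = (56)(15) = 840$ ways
  Example 29: In how many ways can 6 persons be seated at a circular table?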
- 45. 44 $(n-1)! = (6-1)! = 5! = 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 120$
  Example 30: In how many ways can six different books be arranged on a shelf?
  $n! = 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 720$
  Example 31: In how many ways can a party of 7 persons arrange themselves
  1. In a row of chairs?
  2. Around a circular table?
  Solution:
  1. $n! = 7! = 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 5040$
  2. $(n-1)! = 6! = 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 720$
  Example 32: How many three-digit numbers can be formed from the 5 digits 2, 4, 6, 7, 9?
  Without repetition: $5 \cdot 4 \cdot 3 = 60$
  With repetition: $5 \cdot 5 \cdot 5 = 125$
  Example 33: If repetitions are not allowed:
  1. How many 3-digit numbers can be formed from the six digits 2, 3, 4, 5, 6, 7?
  2. How many of these are less than 400?
  3. How many are even?
- 46. 45 4. How many are odd?
  5. How many are multiples of 5?
  1. $6 \cdot 5 \cdot 4 = 120$, that is $P_{r}^{n} = P_{3}^{6} = \frac{6!}{(6-3)!} = \frac{6 \cdot 5 \cdot 4 \cdot 3!}{3!} = 120$ numbers
  2. The first digit must be 2 or 3: $2 \cdot 5 \cdot 4 = 40$
  3. The last digit must be 2, 4 or 6: $3 \cdot 5 \cdot 4 = 60$
  4. The last digit must be 3, 5 or 7: $3 \cdot 5 \cdot 4 = 60$
  5. The last digit must be 5: $1 \cdot 5 \cdot 4 = 20$
  Conditional probability and independence
  For independent events: $P(A \cap B) = P(A) \cdot P(B)$
  For any two events: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
  Note: If A and B are dependent events, then:
  $P(A/B) = \frac{P(A \cap B)}{P(B)} \rightarrow P(A \cap B) = P(B) \cdot P(A/B)$
- 47. 46 If A and B are independent events, then:
  $P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \cdot P(B)}{P(B)} = P(A)$, provided $P(B) > 0$
  Example 34: Suppose 10 horses run a race. In how many ways can 3 horses finish in 1st, 2nd and 3rd place, in any order? Since order does not matter, {A, B, C}, {B, C, A}, ... count as one outcome.
  $C_{3}^{10} = \frac{10!}{7!\,3!} = \frac{10 \cdot 9 \cdot 8 \cdot 7!}{7! \cdot 3 \cdot 2 \cdot 1} = 120$
  Example 35: On a test, a student must select 6 out of 10 questions. In how many ways can this be done?
  $C_{6}^{10} = \frac{10!}{6!\,4!} = \frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6!}{6! \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 210$
  Example 36: A museum has 7 paintings by Picasso and wants to arrange 3 of them on the same wall. How many ways are there to do this?
  $P_{r}^{N} = \frac{N!}{(N-r)!} = \frac{7!}{(7-3)!} = \frac{7!}{4!} = 210$
  Example 37: In how many ways can you arrange the letters in the word lollipop?
  N = 8, L = 3, O = 2, I = 1, P = 2
  $\frac{n!}{n_1!\,n_2!\,n_3!\,n_4!} = \frac{8!}{3!\,2!\,1!\,2!} = \frac{8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(2 \cdot 1)(1)(2 \cdot 1)} = 1680$
  Example 38: Suppose that the population of a certain city is 40% male and 60% female, and that 50% of the males and 30% of the females smoke. Find the probability that a smoker is male.
- 48. 47 P(M) = 0.4, P(F) = 0.6, P(S/M) = 0.5, P(S/F) = 0.3
  1. $P(M/S) = \frac{P(M \cap S)}{P(S)}$
  2. $P(M \cap S) = P(S) \cdot P(M/S)$
  3. $P(S) = P(S \cap M) + P(S \cap F)$
  $P(M \cap S) = P(M) \cdot P(S/M) = (0.4)(0.5) = 0.20$
  $P(F \cap S) = P(F) \cdot P(S/F) = (0.6)(0.3) = 0.18$
  $P(S) = 0.20 + 0.18 = 0.38$
  $P(M/S) = \frac{0.20}{0.38} = 0.526$
  Example 39: Let $S = \{a_1, a_2, a_3, a_4\}$. Find $P(a_1)$ and $P(a_2)$ if $P(a_3) = P(a_4) = \frac{1}{4}$ and $P(a_1) = 2P(a_2)$.
  $P(S) = P(a_1) + P(a_2) + P(a_3) + P(a_4)$
  $1 = 2P(a_2) + P(a_2) + \frac{1}{4} + \frac{1}{4}$
  $1 = 3P(a_2) + \frac{2}{4}$
  $\frac{1}{2} = 3P(a_2)$
  $P(a_2) = \frac{1}{6} = 0.167$
  $P(a_1) = 2 \times \frac{1}{6} = \frac{1}{3} = 0.333$
  Example 40: A lot contains 12 items, of which 4 are defective. Three items are drawn at random from the lot, one after the other. Find the probability that all three are non-defective.
- 49. 48 The items are drawn without replacement, so each draw is conditional on the previous ones:
  $P(A_1) = \frac{8}{12}$, $P(A_2/A_1) = \frac{7}{11}$, $P(A_3/A_1 \cap A_2) = \frac{6}{10}$
  $P(A_1 \cap A_2 \cap A_3) = \frac{8}{12} \cdot \frac{7}{11} \cdot \frac{6}{10} = \frac{14}{55}$
  Example 41: Toss a die and observe the number that occurs on the top. The sample space is S = {1, 2, 3, 4, 5, 6}.
  Let A be the event that an even number occurs.
  Let B be the event that an odd number occurs.
  Let C be the event that a prime number occurs.
  A = {2, 4, 6}
  B = {1, 3, 5}
  C = {2, 3, 5}
  A∪B = {1, 2, 3, 4, 5, 6}
  A∩B = {}
  $C^{c}$ = {1, 4, 6}
  Example 42: Suppose that two dice are rolled and the sum T of the two numbers is observed. Let A be the event that T < 8 and let B be the event that T is an odd number. Then A∩B is the event that T is 3, 5 or 7. From the sample space of the two dice find: P(A), P(B), P(A∩B), P(B/A).
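- 50. 49 A = {T = 2, 3, 4, 5, 6, 7}, B = {T = 3, 5, 7, 9, 11}, A∩B = {T = 3, 5, 7}
  $P(A \cap B) = P(3) + P(5) + P(7) = \frac{2}{36} + \frac{4}{36} + \frac{6}{36} = \frac{12}{36} = \frac{1}{3}$
  $P(B) = P(3) + P(5) + P(7) + P(9) + P(11) = \frac{2}{36} + \frac{4}{36} + \frac{6}{36} + \frac{4}{36} + \frac{2}{36} = \frac{18}{36} = \frac{1}{2}$
  $P(A) = P(2) + P(3) + P(4) + P(5) + P(6) + P(7) = \frac{1}{36} + \frac{2}{36} + \frac{3}{36} + \frac{4}{36} + \frac{5}{36} + \frac{6}{36} = \frac{21}{36} = \frac{7}{12}$
  $P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{12/36}{21/36} = \frac{12}{21} = \frac{4}{7}$
  A coin is flipped three times. How many possible outcomes are there? Following the tree diagram (first flip, second flip, third flip) there are $2 \cdot 2 \cdot 2 = 8$ outcomes: HHH, HHT, HTH, HTT, THH, THT, TTH, TTT.
  Example 43: How many different committees of 2 people can we make from 3 people?
  $C_{2}^{3} = \frac{3!}{2!\,1!} = \frac{3 \cdot 2!}{2!} = 3$
  With people A, B and C the committees are AB, AC and BC.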
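The two-dice probabilities of Example 42 can be cross-checked by listing all 36 equally likely outcomes. This is a minimal sketch, not the lecture's method; it assumes that B ("T is odd") is counted over the whole sample space, and the helper names `prob`, `is_a` and `is_b` are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on one outcome."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

def is_a(o):
    return sum(o) < 8          # event A: the sum T is less than 8

def is_b(o):
    return sum(o) % 2 == 1     # event B: the sum T is odd

p_a = prob(is_a)
p_b = prob(is_b)
p_ab = prob(lambda o: is_a(o) and is_b(o))
p_b_given_a = p_ab / p_a       # conditional probability P(B/A)

print(p_a, p_b, p_ab, p_b_given_a)
```

Using `Fraction` keeps the results as exact ratios, so they can be compared directly with the hand-reduced fractions.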
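- 51. 50 Example 44: A = {2, 4, 6}, B = {4, 5, 6}
  $P(A) = \frac{3}{6} = \frac{1}{2}$, $P(B) = \frac{3}{6} = \frac{1}{2}$, $P(A \cap B) = \frac{2}{6}$
  $P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{3}{6} + \frac{3}{6} - \frac{2}{6} = \frac{4}{6} = \frac{2}{3}$
  Example 45: Let A = {1, 3, 5}, B = {2, 4, 7}. Find P(A∪B).
  $P(A) = \frac{3}{6} = \frac{1}{2}$, $P(B) = \frac{3}{6} = \frac{1}{2}$
  A and B have no outcome in common, so they are mutually exclusive events and $P(A \cap B) = 0$:
  $P(A \cup B) = P(A) + P(B) = \frac{1}{2} + \frac{1}{2} = 1$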
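The conditional-probability rule used in Example 38 (the probability that a smoker is male) generalises to any number of groups. The sketch below is one possible way to organise that calculation; the function name `posterior` and the dictionary layout are assumptions made for illustration.

```python
def posterior(prior, likelihood):
    """Return P(group / evidence) for each group, plus P(evidence).

    prior[g] is P(g); likelihood[g] is P(evidence / g).
    """
    joint = {g: prior[g] * likelihood[g] for g in prior}   # P(g and evidence)
    total = sum(joint.values())                            # P(evidence)
    return {g: joint[g] / total for g in joint}, total

# Example 38: 40% male, 60% female; 50% of males and 30% of females smoke.
prior = {"M": 0.4, "F": 0.6}
smokes = {"M": 0.5, "F": 0.3}

post, p_smoker = posterior(prior, smokes)
print(round(p_smoker, 2), round(post["M"], 3), round(post["F"], 3))
```

The two posterior values add up to 1, which is a quick sanity check on the arithmetic.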
- 52. 51 Chapter Five
  Engineering Statistic
  Correlation and Regression
  Correlation
  Correlation measures the relationship, or association, between two variables by looking at how the variables change with respect to each other. Statistical correlation also corresponds to simultaneous changes between two variables, and it is usually represented by linear relationships. Correlation describes how two or more variables are related, not whether they cause changes in one another. It is a statistic that measures the strength and direction of a linear relationship between two quantitative variables.
  Importance of Correlation
  1. Correlation helps us determine the degree of relationship between variables. It enables us to make decisions about the future course of action.
  2. Correlation analysis helps us understand the nature and degree of a relationship, which can be used for future planning and forecasting.
  3. Forecasting without any prior correlation analysis may prove to be defective, less reliable and more uncertain. If it is based on the result of correlation analysis, it will be more reliable.
  Correlation Coefficient
  The correlation coefficient is an important statistical indicator of whether, and how strongly, two variables are correlated. It is denoted by the letter r and ranges between -1 and +1.
- 53. 52 $-1 \le r \le +1$
  Positive correlation: r is greater than 0, and a perfect positive correlation is +1. The two variables move up or down together in the same direction. Example: temperature and ice cream sales; the hotter the day, the higher the ice cream sales.
  Negative correlation: r is less than 0, and a perfect negative correlation is -1. The two variables move in opposite directions. Example: length of workout and body mass index (BMI); the longer the workout, the lower the BMI.
  Zero or no correlation: a correlation of zero means there is no linear relationship between the two variables. As one variable moves one way, the other moves in an unrelated direction. Example: shoe size and hair color; shoe size has no relation to hair color.
  $r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{\left[n \sum X^2 - (\sum X)^2\right]\left[n \sum Y^2 - (\sum Y)^2\right]}}$
  OR
  $r = \frac{1}{n-1} \sum \left(\frac{X_i - X_{mean}}{S_x}\right)\left(\frac{Y_i - Y_{mean}}{S_y}\right)$
  [Figure: scale of r from -1 to +1 with marks at -1, -0.5, 0.0, +0.5 and +1, and regions labelled Strong and Weak]
- 54. 53 Types of correlation:
  1. Simple and multiple correlation
  2. Positive and negative correlation
  3. Linear and non-linear correlation
  Positive and Negative Correlation
  [Figure 1: Positive and Negative Correlation]
  [Figure 2: Zero or no correlation]
- 55. 54 Linear and non-linear correlation
  [Figure 3: Linear Correlation]
  [Figure 4: Non Linear Correlation]
  (The fitted curves in these figures are annotated with R² = 1, R² = 0.6803, R² = 0.9862 and R² = 0.7197.)
  Simple and Multiple Correlation
  Simple correlation: if there are only two variables under study, the correlation is said to be simple.
- 56. 55 Example 1: The correlation between traffic volume and lane width.
  Multiple correlation: when one variable is related to a number of other variables, the correlation is not simple. It is multiple if there is one variable on one side and a set of variables on the other side.
  Example 2:

  | Speed v (m/s) | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 |
  |---|---|---|---|---|---|---|---|---|
  | Stopping Distance d (m) | 54 | 90 | 138 | 206 | 292 | 396 | 489 | 598 |

  | Speed v (m/s), X | Stopping Distance d (m), Y | XY | X^2 | Y^2 |
  |---|---|---|---|---|
  | 20 | 54 | 1080 | 400 | 2916 |
  | 30 | 90 | 2700 | 900 | 8100 |
  | 40 | 138 | 5520 | 1600 | 19044 |
  | 50 | 206 | 10300 | 2500 | 42436 |
  | 60 | 292 | 17520 | 3600 | 85264 |
  | 70 | 396 | 27720 | 4900 | 156816 |
  | 80 | 489 | 39120 | 6400 | 239121 |
  | 90 | 598 | 53820 | 8100 | 357604 |
  | ∑ 440 | 2263 | 157780 | 28400 | 911301 |

  | Axis | Variable | Role | Also called |
  |---|---|---|---|
  | Y-axis | Dependent variable | Output | Experimental |
  | X-axis | Independent variable | Input | Explanatory |
- 57. 56 $r = \frac{8(157780) - (440 \cdot 2263)}{\sqrt{\left[(8 \cdot 28400) - (440)^2\right]\left[(8 \cdot 911301) - (2263)^2\right]}} = 0.987$
  $r^2 = (0.987)^2 = 0.975$
  Regression
  The term regression is sometimes used to describe the analysis of a straight-line (linear) relationship between a response variable (Y variable) and an explanatory variable (X variable). Regression analysis is the area of statistics that is used to examine the relationship between a quantitative response variable and one or more explanatory variables.
  The Equation for the Regression Line
  $\hat{Y} = b_0 + b_1 x$
  Y-hat = the predicted value of Y for a given x (the symbol is pronounced "Y-hat").
  b0 = the intercept of the straight line, also called the Y-intercept. The intercept is the value of Y-hat when X = 0.
  b1 = the slope of the straight line.
  Note: the least-squares criterion. This mathematical criterion is used to determine the numerical values of the intercept and slope of a sample regression line.
  $b_1 = \frac{\sum (X_i - X_{mean})(Y_i - Y_{mean})}{\sum (X_i - X_{mean})^2}$
  $b_0 = Y_{mean} - b_1 X_{mean}$
  Xi = the X measurement for the ith observation.
  Yi = the Y measurement for the ith observation.
  X mean = the mean of the X measurements.
- 58. 57 Y mean = the mean of the Y measurements.
  Formula for Correlation:
  $r = \frac{1}{n-1} \sum \left(\frac{X_i - X_{mean}}{S_x}\right)\left(\frac{Y_i - Y_{mean}}{S_y}\right)$
  $r^2 = \left(\frac{1}{n-1} \sum \left(\frac{X_i - X_{mean}}{S_x}\right)\left(\frac{Y_i - Y_{mean}}{S_y}\right)\right)^2$
  n = the sample size.
  Sx = the standard deviation of the x measurements.
  Sy = the standard deviation of the y measurements.
  $r^2 = \frac{SSTO - SSE}{SSTO} = \frac{SSR}{SSTO}$
- 59. 58 SSTO = Total Sum of Squares
  SSE = Sum of Squares of Errors
  $\hat{Y} = b_0 + b_1 X$ (sample)
  $E(Y) = B_0 + B_1 X$ (population)
  Interpreting the correlation coefficient:
  0 = no linear relationship between the variables.
  0.5 = a linear relationship that is not strong.
  1 = a strong linear relationship.
  Example 3: A physiotherapist advises 12 of his patients, all of whom had the same knee surgery, to regularly perform a set of exercises. He asks them to record how long they practice. He then summarizes the average time they practiced (X, time in minutes) and how long it took them to regain their full range of motion (Y, time in days). The results are as follows:

  | Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
  |---|---|---|---|---|---|---|---|---|---|---|---|---|
  | X | 24 | 35 | 64 | 20 | 33 | 27 | 42 | 41 | 22 | 50 | 36 | 31 |
  | Y | 90 | 65 | 30 | 60 | 60 | 80 | 45 | 45 | 80 | 35 | 50 | 45 |

  X mean = 35.4, Y mean = 57.1
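- 60. 59

  | i | X | Y | (Xi - Xmean) | (Yi - Ymean) | (Xi - Xmean)(Yi - Ymean) | (Xi - Xmean)² |
  |---|---|---|---|---|---|---|
  | 1 | 24 | 90 | -11.4 | 32.9 | -375.06 | 129.96 |
  | 2 | 35 | 65 | -0.4 | 7.9 | -3.16 | 0.16 |
  | 3 | 64 | 30 | 28.6 | -27.1 | -775.06 | 817.96 |
  | 4 | 20 | 60 | -15.4 | 2.9 | -44.66 | 237.16 |
  | 5 | 33 | 60 | -2.4 | 2.9 | -6.96 | 5.76 |
  | 6 | 27 | 80 | -8.4 | 22.9 | -192.36 | 70.56 |
  | 7 | 42 | 45 | 6.6 | -12.1 | -79.86 | 43.56 |
  | 8 | 41 | 45 | 5.6 | -12.1 | -67.76 | 31.36 |
  | 9 | 22 | 80 | -13.4 | 22.9 | -306.86 | 179.56 |
  | 10 | 50 | 35 | 14.6 | -22.1 | -322.66 | 213.16 |
  | 11 | 36 | 50 | 0.6 | -7.1 | -4.26 | 0.36 |
  | 12 | 31 | 45 | -4.4 | -12.1 | 53.24 | 19.36 |
  | ∑ | | | | | -2125.4 | 1748.92 |

  $b_1 = \frac{\sum (X_i - X_{mean})(Y_i - Y_{mean})}{\sum (X_i - X_{mean})^2} = \frac{-2125.4}{1748.92} = -1.22$
  $b_0 = Y_{mean} - b_1 X_{mean} = 57.1 - (-1.22 \cdot 35.4) = 100.29$
  $\hat{Y} = 100.29 - 1.22x$

  | i | Y-hat | SSE term (Yi - Y-hat)² | SSTO term (Yi - Ymean)² |
  |---|---|---|---|
  | 1 | 71.12 | 356.40 | 1082.41 |
  | 2 | 57.75 | 52.51 | 62.41 |
  | 3 | 22.51 | 56.09 | 734.41 |
  | 4 | 75.98 | 255.44 | 8.41 |
  | 5 | 60.18 | 0.03 | 8.41 |
  | 6 | 67.48 | 156.86 | 524.41 |
  | 7 | 49.25 | 18.03 | 146.41 |
  | 8 | 50.46 | 29.83 | 146.41 |
  | 9 | 73.55 | 41.58 | 524.41 |
  | 10 | 39.52 | 20.47 | 488.41 |
  | 11 | 56.54 | 42.75 | 50.41 |
  | 12 | 62.61 | 310.27 | 146.41 |
  | ∑ | | 1340.27 | 3922.92 |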
- 61. 60 Coefficient of Determination:
  $r^2 = \frac{SSTO - SSE}{SSTO} = \frac{3922.92 - 1340.27}{3922.92} = 0.658 \simeq 0.66$
  Curve Fitting
  Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points, possibly subject to constraints.
  Method of least squares: the method of least squares helps us find the values of the unknowns b0 and b1 in such a way that the following two conditions are satisfied.
  [Figure: Full Motion Time (day) versus Average Practice Time (min.), showing the fitted line with intercept b0 and slope b1, R² = 0.658]
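The least-squares quantities of Example 3 can be recomputed from the raw data as a check. This is a minimal sketch in plain Python with illustrative variable names; it uses unrounded means, so the intercept comes out near 100.12 (the value shown on the fitted-line figure) instead of the hand-rounded 100.29.

```python
# Example 3 data: average practice time (min) and days to full motion.
x = [24, 35, 64, 20, 33, 27, 42, 41, 22, 50, 36, 31]
y = [90, 65, 30, 60, 60, 80, 45, 45, 80, 35, 50, 45]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Sums of cross-products and squares about the means.
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)

b1 = s_xy / s_xx                 # slope
b0 = y_mean - b1 * x_mean        # intercept

# Coefficient of determination from SSE and SSTO.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ssto = sum((yi - y_mean) ** 2 for yi in y)
r2 = (ssto - sse) / ssto

print(round(b1, 4), round(b0, 2), round(r2, 4))
```

Small differences from the slide's table (for example in SSE) come from rounding the means and coefficients to one or two decimals during the hand calculation.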
- 62. 61 Residual: the difference between the given $y_i$ value and the fit function evaluated at $x_i$:
  $r_i = y_i - F(x_i) = y_i - \hat{y}_i$
  [Figure: data point yi, fitted value y-hat and the residual between them]
- 63. 62 Example 4: For the relationship between age and distance, calculate the residual for each point if the equation is Y-hat = 12.5x + 303.33.

  | Age | 18 | 20 | 22 |
  |---|---|---|---|
  | Distance | 510 | 590 | 560 |

  | X = Age | Y = Distance | Y-hat = 12.5x + 303.33 | Residual = Y - Y-hat |
  |---|---|---|---|
  | 18 | 510 | (12.5 * 18) + 303.33 = 528.33 | 510 - 528.33 = -18.33 |
  | 20 | 590 | (12.5 * 20) + 303.33 = 553.33 | 590 - 553.33 = 36.67 |
  | 22 | 560 | (12.5 * 22) + 303.33 = 578.33 | 560 - 578.33 = -18.33 |

  [Figure: Full Motion Time (day) versus Average Practice Time (min.) with fitted line y = -1.2153x + 100.12, R² = 0.6584]
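The residual table of Example 4 can be reproduced in a few lines. The sketch below takes the fitted equation Y-hat = 12.5x + 303.33 as given; the function name `predict` is an assumption for illustration.

```python
ages = [18, 20, 22]
distances = [510, 590, 560]

def predict(x, slope=12.5, intercept=303.33):
    """Fitted value Y-hat for the line given in Example 4."""
    return slope * x + intercept

# Residual = observed value minus fitted value, rounded as in the table.
residuals = [round(y - predict(x), 2) for x, y in zip(ages, distances)]
print(residuals)
```

For a least-squares line that includes an intercept, the residuals sum to zero apart from rounding, which is a useful check on a hand-built table.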
- 64. 63 Example 5: Find the linear correlation between the dependent and independent variables by the least-squares method, and state whether it is statistically significant.

  | Independent variable (X) | 41 | 73 | 67 | 37 | 58 |
  |---|---|---|---|---|---|
  | Dependent variable (Y) | 52 | 95 | 72 | 52 | 96 |

  | (X-Xmean) | (Y-Ymean) | (X-Xmean)² | (Y-Ymean)² | (X-Xmean)(Y-Ymean) | Y-hat | (Y-Yhat)² |
  |---|---|---|---|---|---|---|
  | -14.2 | -21.4 | 201.64 | 457.96 | 303.88 | 57.37 | 28.836 |
  | 17.8 | 21.6 | 316.84 | 466.56 | 384.48 | 93.466 | 2.353 |
  | 11.8 | -1.4 | 139.24 | 1.96 | -16.52 | 86.698 | 216.031 |
  | -18.2 | -21.4 | 331.24 | 457.96 | 389.48 | 52.858 | 0.736 |
  | 2.8 | 22.6 | 7.84 | 510.76 | 63.28 | 76.546 | 378.458 |
  | | | ∑ 996.8 | ∑ 1895.2 | ∑ 1124.6 | | ∑ 626.414 |

  [Figure: Distance versus Age (year), the data of Example 4]
- 65. 64 $b_1 = \frac{1124.6}{996.8} = 1.128$
  $b_0 = 73.4 - 1.128 \cdot 55.2 = 11.122$
  Y-hat = 11.122 + 1.128X (linear equation)
  $r^2 = \frac{1895.2 - 626.414}{1895.2} = 0.669$, so $r = 0.818$
  Statistically significant, because the correlation coefficient is more than 0.5.
  Some simple functions commonly used to fit data are:
  Straight line
  Parabola
  Polynomial
  Gaussian
  Sine or cosine
  Linear Regression
  For a best-fit line, the error function (E), the sum of the squared residuals, should be minimized with respect to the coefficients b0 and b1 of the equation. The resulting equations can be arranged in matrix form:
  $\begin{pmatrix} N & \sum X_i \\ \sum X_i & \sum X_i^2 \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = \begin{pmatrix} \sum Y_i \\ \sum X_i Y_i \end{pmatrix}$
  Polynomial Regression: sometimes, from the trend of the plot of a given set of data, it may appear that a higher-degree polynomial will be the curve of best fit. Can be arranged in matrix form: -
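- 66. 65 Example 6: Find the straight line that best fits the following data by the least-squares method.

  | x | 1 | 2 | 3 | 4 | 5 | 6 |
  |---|---|---|---|---|---|---|
  | y | 2 | 4 | 7 | 9 | 12 | 14 |

  | x | y | xy | x² |
  |---|---|---|---|
  | 1 | 2 | 2 | 1 |
  | 2 | 4 | 8 | 4 |
  | 3 | 7 | 21 | 9 |
  | 4 | 9 | 36 | 16 |
  | 5 | 12 | 60 | 25 |
  | 6 | 14 | 84 | 36 |
  | Total 21 | 48 | 211 | 91 |

  With the line written as Y = aX + b, the normal equations are:
  21a + 6b = 48 (eq. 1)
  91a + 21b = 211 (eq. 2)
  From eq. (1): b = -3.5a + 8; substitute into eq. (2):
  91a + 21(-3.5a + 8) = 211
  a = 2.457
  b = -3.5 * 2.457 + 8 = -0.6
  The straight line fitted to the data is: Y = 2.457X - 0.6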
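The two normal equations of Example 6 can also be solved in closed form instead of by substitution. The following sketch builds the column totals from the data and applies Cramer's rule to the 2-by-2 system; the variable names are illustrative.

```python
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 7, 9, 12, 14]

n = len(x)
sum_x = sum(x)                                   # 21
sum_y = sum(y)                                   # 48
sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # 211
sum_xx = sum(xi * xi for xi in x)                # 91

# Normal equations for Y = aX + b:
#   sum_x  * a + n     * b = sum_y
#   sum_xx * a + sum_x * b = sum_xy
det = n * sum_xx - sum_x ** 2
a = (n * sum_xy - sum_x * sum_y) / det   # slope
b = (sum_y - a * sum_x) / n              # intercept

print(round(a, 3), round(b, 3))
```

The same formulas apply to any data set, as long as the determinant `det` is non-zero, which requires at least two distinct x values.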
- 67. 66 Example 7: The spot speed data below were collected on an urban road (60-m Ring Road) in Erbil City during a spot speed study. For the input data and the output data given below, find the linear regression, the residuals and the correlation, as tabulated on the next slide.
  Input data: 47 37 51 55 65 42 40 55 60 42 35 58 59 48 42 56 59 42 53 65 65
  Output data: 582 661 530 478 316 682 726 484 559 635 762 491 520 647 682 571 526 569 458 395 360
  Normal equations for Y-hat = ax + b:
  1075.998 a + 21 b = 11634 (1)
  56960 a + 1075.998 b = 575970 (2)
  a = -11.012
  b = 1118.253
  Y-hat = -11.012x + 1118.253
  $r^2 = \frac{279536 - 57752.1376}{279536} = 0.793$
  $r^2 = \left(\frac{1}{n-1} \sum \left(\frac{X_i - X_{mean}}{S_x}\right)\left(\frac{Y_i - Y_{mean}}{S_y}\right)\right)^2 = 0.793$
- 68. 67

  | x | y | x² | xy | (y-ymean)² | y-hat | (y-yhat)² | (x-xmean)² |
  |---|---|---|---|---|---|---|---|
  | 47 | 582 | 2209 | 27354 | 784 | 600.689 | 349.278 | 17.960 |
  | 37 | 661 | 1369 | 24457 | 11449 | 710.809 | 2480.936 | 202.720 |
  | 51 | 530 | 2601 | 27030 | 576 | 556.641 | 709.742 | 0.0566 |
  | 55 | 478 | 3025 | 26290 | 5776 | 512.593 | 1196.675 | 14.152 |
  | 65 | 316 | 4225 | 20540 | 56644 | 402.473 | 7477.579 | 189.392 |
  | 42 | 682 | 1764 | 28644 | 16384 | 655.749 | 689.115 | 85.340 |
  | 40 | 726 | 1600 | 29040 | 29584 | 677.773 | 2325.843 | 126.292 |
  | 55 | 484 | 3025 | 26620 | 4900 | 512.593 | 817.559 | 14.152 |
  | 60 | 559 | 3600 | 33540 | 25 | 457.533 | 10295.55 | 76.772 |
  | 42 | 635 | 1764 | 26670 | 6561 | 655.749 | 430.521 | 85.340 |
  | 35 | 762 | 1225 | 26670 | 43264 | 732.833 | 850.713 | 263.672 |
  | 58 | 491 | 3364 | 28478 | 3969 | 479.557 | 130.942 | 45.724 |
  | 59 | 520 | 3481 | 30680 | 1156 | 468.545 | 2647.617 | 60.248 |
  | 48 | 647 | 2304 | 31056 | 8649 | 589.677 | 3285.926 | 10.484 |
  | 42 | 682 | 1764 | 28644 | 16384 | 655.749 | 689.115 | 85.340 |
  | 56 | 571 | 3136 | 31976 | 289 | 501.581 | 4818.997 | 22.676 |
  | 59 | 526 | 3481 | 31034 | 784 | 468.545 | 3301.077 | 60.248 |
  | 42 | 569 | 1764 | 23898 | 225 | 655.749 | 7525.389 | 85.340 |
  | 53 | 458 | 2809 | 24274 | 9216 | 534.617 | 5870.164 | 3.104 |
  | 65 | 395 | 4225 | 25675 | 25281 | 402.473 | 55.845 | 189.392 |
  | 65 | 360 | 4225 | 23400 | 37636 | 402.473 | 1803.955 | 189.392 |
  | ∑ 1075.998 | 11634 | 56960 | 575970 | 279536 | | 57752.537 | 1827.78 |
- 69. 68 Example 8: Select the best-fit model among the speed-density relationships shown below:
  [Figure: four panels of space mean speed (km/hr) versus density (pc/km/lane): section (1-2), R² = 0.7039; section (2-1), R² = 0.7237; section (3-4), R² = 0.611; section (4-3), R² = 0.9172]

  | Section No. | Name of Road | Zone Function | Direction of Road | Coded Number |
  |---|---|---|---|---|
  | 1 | 30-M Ring Road (Sawaf Mosque) | Tourism | From Safy to Nawroz Intersection | 1-2 |
  | 1 | 30-M Ring Road (Sawaf Mosque) | Tourism | From Nawroz to Safy Intersection | 2-1 |
  | 2 | 30-M Ring Road (Tairawa) | Commercial | From Gazino Antar to Naqliati Kon Road | 3-4 |
  | 2 | 30-M Ring Road (Tairawa) | Commercial | From Naqliati Kon to Gazino Antar Road | 4-3 |
- 70. 69 Procedure by Microsoft Excel and SPSS
  Microsoft Excel: Data - Data Analysis - Regression - input and output data - OK
  SPSS: Variable View - name (speed) - type (numeric) - measure (scale) - Analyze - Regression - Curve Estimation - dependent and independent variable - (linear, logarithmic, exponential, ...) - OK
  Model Summary and Parameter Estimates
  Dependent Variable: Void

  | Equation | R Square | F | df1 | df2 | Sig. | Constant | b1 |
  |---|---|---|---|---|---|---|---|
  | Linear | .858 | 48.449 | 1 | 8 | .000 | 19.265 | -.111 |

  The independent variable is Pressure.
- 71. 70 Chapter Six
  Engineering Statistic
  Random Variable
  A random variable assigns a numerical value to each possible outcome of a random circumstance. As an example, we might count how many people in a random sample have type O blood. The two broad classes of random variables are:
  1. Discrete random variables can take one of a countable list of distinct values. An example of a discrete random variable is the number of people with type O blood in a sample of ten individuals. The possible values are 0, 1, ..., 10, a list of distinct values.
  2. Continuous random variables can take any value in an interval or collection of intervals. An example of a continuous random variable is height for adult women. With accurate measurement to any number of decimal places, any height is possible within the range of possibilities for heights.

  | Discrete random variable (Traffic Volume, veh/hr) | Continuous random variable (Spot Speed, km/hr) |
  |---|---|
  | 230 | 38.2 |
  | 340 | 30.8 |
  | 250 | 29.3 |
  | 400 | 35.9 |
  | 450 | 34.7 |

  Probabilities for Discrete and Continuous Variables
  For discrete random variables we can find probabilities for exact outcomes.
- 72. 71 For continuous variables we are limited to finding probabilities for intervals of values.
  3. Binomial Random Variable
  The binomial distribution is a discrete distribution for data with only two possible outcomes per trial, where each trial includes replacement. Such as:
  Pass/Fail, Success/Failure, Male/Female, Hot/Cold, Heads/Tails, High/Low, Defective/Not Defective, In/Out
  $P(X = k) = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}$ for $k = 0, 1, 2, \dots, n$
  Example 1: It has been claimed that in 60% of all solar-heat installations the utility bill is reduced by at least one third. Accordingly, what are the probabilities that the utility bill will be reduced by at least one third in:
  a. Four of five installations?
  b. At least four of five installations?
  a. X = k = 4, n = 5, and p = 0.6
  $P(X = 4) = \frac{5!}{4!\,(5-4)!}\, 0.6^4 (1-0.6)^{5-4} = \frac{5 \cdot 4!}{4! \cdot 1!} \cdot 0.1296 \cdot 0.4 = 0.259$
  Table 1 (cumulative binomial probabilities, n = 5, p = 0.6): x = 3 gives 0.663, x = 4 gives 0.922.
  0.922 - 0.663 = 0.259 is the same as the result of the equation (ok)
  b. X = k = 5, n = 5, and p = 0.6
- 73. 72 $P(X = 5) = \frac{5!}{5!\,(5-5)!}\, 0.6^5 (1-0.6)^{5-5} = \frac{5!}{5! \cdot 0!} \cdot 0.078 \cdot 0.4^0 = 0.078$
  Table 1 (n = 5, p = 0.6): x = 4 gives 0.922, so 1 - 0.922 = 0.078.
  At least four of five: $P(X \ge 4) = P(X = 4) + P(X = 5) = 0.259 + 0.078 = 0.337$
  Example 2: If the probability is 0.05 that a certain wide-flange column will fail under a given axial load, what are the probabilities that among 16 such columns
  a. at most two will fail?
  b. at least four will fail?
  Let X be the number of columns that do not fail, so n = 16 and p = 0.95.
  a. Exactly two fail: X = k = 14
  $P(X = 14) = \frac{16!}{14!\,(16-14)!}\, 0.95^{14} (1-0.95)^{16-14} = 0.146$
  0.9571 - 0.8108 = 0.146 (in Table 1) ok
  At most two fail is the cumulative value for two failures in Table 1: 0.9571.
  b. Exactly four fail: X = k = 12
  $P(X = 12) = \frac{16!}{12!\,(16-12)!}\, 0.95^{12} (1-0.95)^{16-12} = 0.0061$
  0.9991 - 0.9930 = 0.0061 (in Table 1) ok
  At least four fail is one minus the cumulative value for three failures: 1 - 0.9930 = 0.0070.
  Mean, Standard Deviation, and Variance of a Binomial Random Variable
  For a binomial random variable X based on n trials with success probability p,
  $\mu = E(X) = np$
  $\sigma = \sqrt{np(1-p)}$
  $V(X) = np(1-p)$
  Where:
- 74. 73 n = number of trials
  p = probability of success
  1 - p = probability of failure
  Example 3: Find the mean of the probability distribution of the number of heads obtained in 3 flips of a balanced coin.
  n = 3, p = ½
  µ = 3 * ½ = 3/2
  Example 4: A shipment of 20 digital voice recorders contains 5 that are defective. If 10 of them are randomly chosen for inspection, find the mean of the probability distribution of the number of defectives in the sample of 10.
  n = 10, p = 5/20
  µ = 10 * 5/20 = 2.5
  Example 5: Verify that the standard deviation is equal to 2 for the binomial distribution with n = 16 and p = 1/2.
  n = 16, p = ½, (1 - p) = ½
  Variance = 16 * ½ * ½ = 4
  Standard deviation = √4 = 2
  4. Normal Random Variables
  The most commonly encountered type of continuous random variable is the normal random variable, which has a specific form of bell-shaped probability density curve called a normal curve. A normal random variable is also said to have a normal distribution. The equation of the normal probability density is shown below.
- 75. 74 $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$
  Confidence Interval
  A confidence interval displays the probability that a parameter will fall between a pair of values around the mean. Confidence intervals measure the degree of uncertainty or certainty in a sampling method. They are most often constructed using confidence levels of 95% or 99%.

  | Confidence level (%) | Constant z |
  |---|---|
  | 68.3 | 1.00 |
  | 86.6 | 1.50 |
  | 90.0 | 1.64 |
  | 95.0 | 1.96 |
  | 95.5 | 2.00 |
  | 98.8 | 2.50 |
  | 99.0 | 2.58 |
  | 99.7 | 3.00 |

  * A 95% confidence level is commonly used for engineering work, because it lies in the middle of the usual range of confidence levels.
  * The letter Z represents a standard normal random variable.
  $z^* = \frac{X - \mu}{\sigma}$
  The following basic properties are used (see figure):
  1. The normal distribution is symmetrical about the mean.
  2. The total area under the normal distribution curve is equal to 1, or 100%.
  3. The area under the curve between µ - σ and µ + σ is 0.6827.
  4. The area under the curve between µ - 1.96σ and µ + 1.96σ is 0.9500.
  5. The area under the curve between µ - 2σ and µ + 2σ is 0.9545.
  6. The area under the curve between µ - 3σ and µ + 3σ is 0.9973.
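The areas quoted in the properties above can be reproduced without a printed table. The sketch below uses the standard-library error function, relying on the identity that the area of a normal curve within k standard deviations of the mean equals erf(k/√2); the helper names `phi` and `area_within` are illustrative assumptions.

```python
from math import erf, sqrt

def phi(z):
    """Cumulative probability P(Z <= z) for a standard normal variable."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def area_within(k):
    """Area under any normal curve between mu - k*sigma and mu + k*sigma."""
    return phi(k) - phi(-k)

for k in (1.0, 1.96, 2.0, 3.0):
    print(k, round(area_within(k), 4))
```

Evaluated numerically, the area within three standard deviations comes out at about 0.9973. The same `phi` helper gives one-sided values such as P(Z ≤ 1.82).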
- 76. 75 Useful probability relationships for the normal distribution:
  1. P(Z < a)
  2. P(Z > a)
  3. P(a < Z < b)
  4. P(Z < µ - d) = P(Z > µ + d)
  Note: A normal random variable with mean μ = 0 and standard deviation σ = 1 is said to be a standard normal random variable and to have a standard normal distribution.
  Reading Table A.1: as an example, P(Z ≤ 1.82) = 0.9656
  Example 6: Assume that the compressive strength of a set of concrete cubes has a normal distribution with mean µ = 35 N/mm² and standard deviation σ = 2.7 N/mm². What is the probability that a randomly selected cube is 31 N/mm² or
- 77. 76 fewer? This is the same as asking what proportion of concrete cubes are 31 N/mm² or fewer.
  1. Step 1: $z^* = \frac{31 - 35}{2.7} = -1.481$
  2. Step 2: P(X ≤ 31) = P(Z ≤ -1.48) = 0.0694 (found by using Table A.1)
  Example 7: Suppose that for the SAT test given to prospective college students in the United States, scores on the math section have a normal distribution with mean μ = 515 and standard deviation σ = 100. Let's consider some questions that illustrate probability relationships.
  Question 1: What is the probability that a randomly selected test-taker had a score less than or equal to 600? Said another way, what is the cumulative probability for a score of 600?
  1. Step 1: $z^* = \frac{600 - 515}{100} = 0.85$
  2. Step 2: In Table A.1, the cumulative probability for this standardized score is 0.8023.
  Question 2: What is the probability that a randomly selected test-taker scored higher than 600?
  1. Step 1: 1 - 0.8023 = 0.1977
  2. Step 2: P(X > 600) = P(Z > 0.85) = 1 - P(Z ≤ 0.85) = 1 - 0.8023 = 0.1977.
- 78. 77 Question 3: What is the probability that a randomly selected test-taker scored between 515 and 600?
  1. Step 1: For 600, z* = 0.85. For 515, $z^* = \frac{515 - 515}{100} = 0$
  2. Step 2: P(515 ≤ X ≤ 600) = P(0 ≤ Z ≤ 0.85) = P(Z ≤ 0.85) - P(Z ≤ 0) = 0.8023 - 0.5000 = 0.3023
  Question 4: What is the probability that a randomly selected test-taker's score was more than 85 points from the mean in either direction?
  1. Step 1: A score that is more than 85 points below the mean is less than 430. $z^* = \frac{430 - 515}{100} = -0.85$
  2. Step 2: In Table A.1, find P(Z ≤ -0.85) = 0.1977. By symmetry the upper tail has the same area, so 0.1977 + 0.1977 = 0.3954.
  Example 8: If the amount of cosmic radiation to which a person is exposed while flying by jet across the United States is a random variable having the normal distribution with µ = 4.35 mrem and standard deviation σ = 0.59 mrem, find the probabilities that the amount of cosmic radiation to which a person will be exposed on such a flight is:
  a. Between 4.00 and 5.00 mrem.
  b. At least 5.50 mrem.
  a. $P(4.00 \le X \le 5.00) = P\left(Z \le \frac{5 - 4.35}{0.59}\right) - P\left(Z \le \frac{4 - 4.35}{0.59}\right)$
- 79. 78 = P(Z ≤ 1.10) - P(Z ≤ -0.59) = 0.8643 - 0.2776 = 0.5867
  b. $P(X \ge 5.50) = 1 - P\left(Z \le \frac{5.5 - 4.35}{0.59}\right) = 1 - P(Z \le 1.95) = 1 - 0.9744 = 0.0256$
  Example 9: The actual amount of instant coffee that a filling machine puts into 4-ounce jars may be looked upon as a random variable having a normal distribution with standard deviation σ = 0.04 ounces. If only 2% of the jars are to contain less than 4 ounces, what should be the mean fill of these jars?
  $P\left(Z \le \frac{4 - \mu}{0.04}\right) = 0.02$
  $\frac{4 - \mu}{0.04} = -2.05$
  $\mu = 4.082$ ounces
  Example 10: It has been claimed that in 80% of all solar-heat installations the utility bill is reduced by at least one third. Accordingly, what are the probabilities that the utility bill will be reduced by at least one third in:
  a. Six of seven installations?
  b. At least six of seven installations?
  Use the general equation for the binomial distribution.
  a. X = 6, n = 7, and p = 0.80
  $P(X = 6) = \frac{7!}{6!\,(7-6)!}\, 0.80^6 (1-0.80)^{7-6} = 0.367$
- 80. 79 b. X = 7, n = 7, and p = 0.80
  $P(X = 7) = \frac{7!}{7!\,(7-7)!}\, 0.80^7 (1-0.80)^{7-7} = 0.210$
  At least six of seven: $P(X \ge 6) = P(X = 6) + P(X = 7) = 0.367 + 0.210 = 0.577$
  Procedure by SPSS
  Variable View - name (speed) - type (numeric) - measure (scale) - Analyze - Descriptive Statistics - Frequencies - variable (speed) - Statistics - (mean, median, mode, max., min., range, SD, SE, variance) - Continue - Charts - (bar, pie, histogram) and show normal curve on histogram - Continue - OK
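The binomial probabilities in Example 10 follow directly from the general equation P(X = k) = n!/(k!(n-k)!) p^k (1-p)^(n-k). The sketch below is one possible implementation; the function names `binom_pmf` and `binom_at_least` are illustrative, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial variable with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_at_least(k, n, p):
    """P(X >= k), the sum of the exact probabilities from k up to n."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))

# Example 10: n = 7 installations, p = 0.80.
p_six = binom_pmf(6, 7, 0.80)
p_seven = binom_pmf(7, 7, 0.80)
p_at_least_six = binom_at_least(6, 7, 0.80)

print(round(p_six, 3), round(p_seven, 3), round(p_at_least_six, 3))
```

"At least six" is the sum of the k = 6 and k = 7 terms, which is why `binom_at_least` loops from k to n instead of evaluating a single term.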