Support vector machines (SVMs) are supervised machine learning models used for classification and regression analysis. SVMs can handle both linearly separable and non-linearly separable data by mapping data points to a higher-dimensional feature space. Kernels compute dot products between data points without explicitly computing their coordinates in the feature space. SVMs select a subset of the training points, called support vectors, to define the decision boundary. Their advantages include effectiveness in high dimensions and memory efficiency.
2. SVM FOR LINEARLY SEPARABLE DATA Plot the points. Find the support vectors and the margin. Find the hyperplane with the maximum margin. Classify each new input according to the side of that hyperplane on which it falls.
3. FIGURE REPRESENTING LINEARLY SEPARABLE DATA Figure representing the support vectors and the maximum-margin hyperplane. (w · x) + b = +1 (positive labels); (w · x) + b = -1 (negative labels); (w · x) + b = 0 (hyperplane). Margin: 2 / ||w||, the distance between the positive and negative hyperplanes.
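The three hyperplane equations above can be checked numerically. The sketch below is only an illustration: the weight vector w, the offset b and the sample points are hypothetical values picked so that one point lands on each outer hyperplane. The margin is taken as 2 / ||w||, the standard distance between the planes (w · x) + b = +1 and (w · x) + b = -1.

```python
import math

def decision_value(w, x, b):
    """Return (w . x) + b for plain Python sequences."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, x, b):
    """Return +1 on the positive side of the hyperplane, -1 otherwise."""
    return 1 if decision_value(w, x, b) >= 0 else -1

def margin_width(w):
    """Distance between (w . x) + b = +1 and (w . x) + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane, chosen only for illustration.
w = [1.0, 1.0]
b = -3.0

support_pos = [2.0, 2.0]  # satisfies (w . x) + b = +1
support_neg = [1.0, 1.0]  # satisfies (w . x) + b = -1

print(decision_value(w, support_pos, b), decision_value(w, support_neg, b))
print(margin_width(w))
print(classify(w, [3.0, 3.0], b), classify(w, [0.0, 0.0], b))
```

A new point is classified purely by the sign of (w · x) + b; the margin value describes how much room the boundary leaves on either side.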
5. STEPS FOR NON LINEARLY SEPARABLE DATA 1.) Map the points into a feature space. 2.) Use the polynomial feature map Φ(X1) = (X1, X1^2) to map the points. 3.) Compute the positive, negative and zero hyperplanes in the feature space. 4.) Read the support vectors and the margin value from them. 5.) Classify new inputs by mapping them with Φ and checking on which side of the zero hyperplane they fall.
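The steps above can be sketched on a toy one-dimensional data set. Everything except the map Φ(X1) = (X1, X1^2) is an assumption made for illustration: the four points, their labels, and the hand-picked hyperplane (w, b) in the feature space, which was chosen so that the mapped points sit exactly on the +1 and -1 hyperplanes.

```python
def phi(x):
    """Feature map from the slide: x -> (x, x^2)."""
    return (x, x * x)

# Hypothetical 1-D data: the negative class sits between the positive points,
# so no single threshold on x separates the two classes.
points = [-3.0, -1.0, 1.0, 3.0]
labels = [1, -1, -1, 1]

# Hand-picked hyperplane in the feature space (an assumption, not a trained
# model): 0.25 * x^2 - 1.25 equals +1 at x^2 = 9 and -1 at x^2 = 1.
w = (0.0, 0.25)
b = -1.25

def score(x):
    """Return (w . phi(x)) + b."""
    z = phi(x)
    return w[0] * z[0] + w[1] * z[1] + b

def classify(x):
    """Map x into the feature space and take the side of the hyperplane."""
    return 1 if score(x) >= 0 else -1

for x, y in zip(points, labels):
    print(x, phi(x), score(x), classify(x) == y)
```

In the original space the classes interleave, but after the mapping a single straight line in the (x, x^2) plane separates them.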
6. KERNEL AND ITS TYPES. Computing the coordinates of points in the feature space can be very costly because the feature space can be very high-dimensional, or even infinite-dimensional. A kernel function reduces this cost: the data points appear in the SVM only through dot products, and the kernel function computes these inner products directly. With a kernel function we can compute the inner product of two data points without explicitly mapping them into the feature space.
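A small numerical check makes this concrete. The sketch below assumes a degree-2 polynomial kernel K(x, y) = (x · y)^2 on two-dimensional inputs, whose explicit feature map is Φ(x) = (x1^2, sqrt(2)·x1·x2, x2^2). The kernel and the sample vectors are chosen for illustration and are not taken from the slides.

```python
import math

def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map for the kernel (x . y)^2 on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def kernel(x, y):
    """Same inner product, computed without leaving the input space."""
    return dot(x, y) ** 2

x = (1.0, 2.0)
y = (3.0, 4.0)

explicit = dot(phi(x), phi(y))  # map both points first, then take the dot product
implicit = kernel(x, y)         # one dot product in the input space, then square

print(explicit, implicit)
```

Both routes give the same value up to floating-point rounding, but the kernel route never builds the feature-space coordinates.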
7. KERNEL AND ITS TYPES. 1.) Polynomial kernel with degree d. 2.) Radial basis function (RBF) kernel with width s. 3.) Sigmoid kernel with parameters k and q. 4.) Linear kernel: K(x, y) = x' * y.
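The four kernels can be written out as short functions. Only the linear kernel's formula appears on the slide; the other three use one common parameterization, which is an assumption here: (x · y + c)^d for the polynomial kernel (the offset c is a hypothetical extra parameter), exp(-||x - y||^2 / (2 s^2)) for the RBF kernel, and tanh(k (x · y) + q) for the sigmoid kernel.

```python
import math

def dot(x, y):
    """Ordinary dot product, x' * y."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    """Linear kernel from the slide: K(x, y) = x' * y."""
    return dot(x, y)

def polynomial_kernel(x, y, d, c=1.0):
    """Polynomial kernel of degree d; the offset c is an assumed parameter."""
    return (dot(x, y) + c) ** d

def rbf_kernel(x, y, s):
    """RBF kernel with width s (assumed form with 2 * s^2 in the denominator)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * s * s))

def sigmoid_kernel(x, y, k, q):
    """Sigmoid kernel with parameters k and q (assumed form tanh(k x.y + q))."""
    return math.tanh(k * dot(x, y) + q)

x = (1.0, 2.0)
y = (3.0, 4.0)
print(linear_kernel(x, y))
print(polynomial_kernel(x, y, d=2))
print(rbf_kernel(x, y, s=1.0))
print(sigmoid_kernel(x, y, k=0.01, q=0.0))
```

Other texts scale these kernels differently, so the parameter values are not interchangeable between libraries.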
8. SPARSE MATRIX AND SPARSE DATA A sparse matrix is a two-dimensional array in which most values are zero; a sparse data structure stores only the non-zero values. For each non-zero value it stores the value together with its row number and column number. Operations iterate over the non-zero values only. Inner products are cheap to compute because zero entries contribute nothing and are skipped. Using sparse data therefore speeds up SVM algorithms.
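The saving on inner products can be sketched with a dictionary that maps each index to its non-zero value. The helper names and sample vectors below are hypothetical and serve only as a minimal illustration.

```python
def to_sparse(dense):
    """Keep only the non-zero entries, as {index: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0}

def sparse_dot(u, v):
    """Inner product that visits only the stored (non-zero) entries."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the vector with fewer stored entries
    return sum(val * v[i] for i, val in u.items() if i in v)

a = [0, 3, 0, 0, 5, 0]
b = [1, 2, 0, 0, 4, 0]

sa, sb = to_sparse(a), to_sparse(b)
print(sa, sb)
print(sparse_dot(sa, sb))
```

The work is proportional to the number of stored entries rather than to the full vector length, which is where the speed-up for kernel evaluations comes from.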
9. STORING SPARSE DATA Dictionary of keys (DOK): DOK represents non-zero values as a dictionary mapping (row, column) tuples to values. List of lists (LIL): LIL stores one list per row, where each entry stores a column index and a value. Typically, these entries are kept sorted by column index for faster lookup. Coordinate list (COO): COO stores a list of (row, column, value) tuples. The entries are sorted (by row index, then by column index) to improve random access times. Yale format.
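The first three layouts can be built from a dense matrix in a few lines each. This is a minimal sketch in plain Python; the function names and the sample matrix are chosen for illustration.

```python
def to_dok(m):
    """Dictionary of keys: {(row, column): value} for non-zero values."""
    return {(r, c): v
            for r, row in enumerate(m)
            for c, v in enumerate(row) if v != 0}

def to_lil(m):
    """List of lists: one list per row of (column, value) pairs, sorted by column."""
    return [[(c, v) for c, v in enumerate(row) if v != 0] for row in m]

def to_coo(m):
    """Coordinate list: (row, column, value) tuples, sorted by row then column."""
    return [(r, c, v)
            for r, row in enumerate(m)
            for c, v in enumerate(row) if v != 0]

matrix = [
    [1, 2, 0, 0],
    [0, 3, 9, 0],
    [0, 1, 4, 0],
]

print(to_dok(matrix))
print(to_lil(matrix))
print(to_coo(matrix))
```

DOK suits incremental construction and single-entry lookup, LIL suits row-wise access, and COO is a simple interchange form.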
10. STORING SPARSE DATA The Yale Sparse Matrix Format stores an initial sparse m×n matrix, M, in row form using three one-dimensional arrays. NNZ = number of nonzero entries of M. Array A: length NNZ; holds all nonzero entries of M in left-to-right, top-to-bottom (row-major) order. Array IA: length m + 1; IA(i) contains the index in A of the first nonzero element of row i. Row i of the original matrix extends from A(IA(i)) to A(IA(i+1)-1), i.e. from the start of one row to the last index before the start of the next. Array JA: length NNZ; holds the column index of each element of A. EXAMPLE: [ 1 2 0 0 ] [ 0 3 9 0 ] [ 0 1 4 0 ] For this matrix we get A = [ 1 2 3 9 1 4 ], IA = [ 0 2 4 6 ] and JA = [ 0 1 1 2 1 2 ].
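The construction of A, IA and JA can be sketched as a single pass over the rows. The function names are hypothetical; the matrix is the example from the slide, and indices are zero-based as in the slide's result.

```python
def to_yale(m):
    """Build the Yale arrays (A, IA, JA) from a dense matrix."""
    a, ia, ja = [], [0], []
    for row in m:
        for col, value in enumerate(row):
            if value != 0:
                a.append(value)  # nonzero values in row-major order
                ja.append(col)   # column index of each stored value
        ia.append(len(a))        # index in A where the next row starts
    return a, ia, ja

def row_entries(a, ia, ja, i):
    """Return the (column, value) pairs of row i: A(IA(i)) .. A(IA(i+1)-1)."""
    return [(ja[k], a[k]) for k in range(ia[i], ia[i + 1])]

matrix = [
    [1, 2, 0, 0],
    [0, 3, 9, 0],
    [0, 1, 4, 0],
]

A, IA, JA = to_yale(matrix)
print(A, IA, JA)
print(row_entries(A, IA, JA, 1))
```

The last element of IA always equals NNZ, which is why the array needs m + 1 entries rather than m.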
11. ADVANTAGES OF SVM Support vector machines are very effective in high-dimensional spaces. They remain effective even when the number of dimensions is greater than the number of samples. Memory efficient, because only a subset of the training points (the support vectors) is used as the deciding factor for classification. Versatile: different kernels can be defined for different decision functions, as long as they give correct results. Depending on our requirements, we can also define our own kernel.
12. DISADVANTAGES OF SVM If the number of features is much greater than the number of samples, the method is likely to give poor performance. It is most useful for small training sets. SVMs do not directly provide probability estimates, so these must be calculated using indirect techniques. Non-traditional data such as strings and trees can be used as input to an SVM instead of feature vectors. An appropriate kernel must be selected for each project according to its requirements.