- The ultimate goal of linear regression is to find a line that best fits the data.
- The goal of multiple linear regression is to find the plane (or, with more predictors, the hyperplane) that best fits the data in n dimensions. Polynomial regression instead fits a curve to the data.
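To make "best fits" concrete, here is a minimal sketch of fitting a single line by ordinary least squares, assuming one predictor and plain Python lists. The function name `fit_line` and the sample data are illustrative and not taken from any library.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Illustrative data lying exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Least squares is only one common definition of "best"; it minimizes the sum of squared vertical distances between the points and the line.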
- Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more independent variables.
- The logistic equation is constructed so that its output is a probability value between 0 and 1, which can then be mapped to classes.
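The mapping from a linear score to a probability, and then to a class, can be sketched as follows. This assumes a single input feature with an already-known weight and bias; the names `sigmoid` and `predict_class` and the 0.5 threshold are illustrative choices, not part of any specific library.

```python
import math

def sigmoid(z):
    """Squash any real number into the range 0 to 1."""
    # Two branches avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_class(x, weight, bias, threshold=0.5):
    """Turn the predicted probability into a 0/1 class label."""
    probability = sigmoid(weight * x + bias)
    return 1 if probability >= threshold else 0
```

Fitting the weight and bias from data is a separate step that this sketch leaves out.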
SUPPORT VECTOR MACHINE
- SVM finds the hyperplane between classes of data which maximizes the margin between classes.
- Many hyperplanes can separate the classes, but only one maximizes the distance between them.
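To illustrate why the choice of hyperplane matters, the sketch below compares the margin of two candidate separating lines on a tiny 2D dataset. It does not train an SVM; it only measures the margin that an SVM would try to maximize. The function `geometric_margin`, the data, and both candidate lines are assumptions made for illustration.

```python
import math

def geometric_margin(weights, bias, points, labels):
    """Smallest signed distance from any point to the hyperplane weights . x + bias = 0.

    Labels are +1 or -1. A positive result means every point is on its correct side.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    return min(
        label * (sum(w * x for w, x in zip(weights, point)) + bias) / norm
        for point, label in zip(points, labels)
    )

points = [(1, 1), (2, 1), (4, 4), (5, 4)]
labels = [-1, -1, 1, 1]

# Both lines separate the two classes, but with different margins.
margin_diagonal = geometric_margin((1, 1), -5, points, labels)    # line x + y = 5
margin_vertical = geometric_margin((1, 0), -3.5, points, labels)  # line x = 3.5
```

Here the diagonal line leaves more room between the classes than the vertical one, so it would be preferred of the two.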
K NEAREST NEIGHBOR
- K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure.
- It is an approach to data classification that estimates how likely a data point is to be a member of one group or the other depending on what group the data points nearest to it are in.
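The idea of classifying by nearest neighbors can be sketched in a few lines. This version assumes Euclidean distance as the similarity measure and a simple majority vote among the k closest stored points; `knn_predict` and the sample data are illustrative names, not a library API.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label the query with the most common label among its k nearest stored points."""
    # Sort every stored case by its distance to the query.
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["a", "a", "a", "b", "b", "b"]
prediction = knn_predict(train_points, train_labels, (1.5, 1.5))
```

Note that there is no training step: the algorithm just stores the cases and does all of its work at prediction time.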
- Naive Bayes is another popular classification algorithm.
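- The goal is to find the class with the maximum posterior probability, which only needs to be computed up to a proportionality constant.
- It answers the following question: “What is the probability of A given B?” Because of the naive assumption that variables are independent given the class, the probability of the features given a class is simply the product of the individual feature probabilities.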
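As a rough sketch of how the naive independence assumption is used, the code below scores each class as its prior multiplied by the per-feature probabilities, then picks the highest score. It assumes categorical features and uses raw counts with no smoothing; the function names and the toy weather data are made up for illustration.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count how often each class and each (class, feature, value) combination occurs."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for row, label in zip(rows, labels):
        for index, value in enumerate(row):
            feature_counts[(label, index)][value] += 1
    return class_counts, feature_counts

def predict_naive_bayes(row, class_counts, feature_counts):
    """Return the class whose prior times feature likelihoods is largest."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for index, value in enumerate(row):
            # Naive assumption: multiply per-feature P(value | class).
            score *= feature_counts[(label, index)][value] / count
        scores[label] = score
    return max(scores, key=scores.get)

rows = [("sunny", "weekend"), ("sunny", "weekday"), ("rainy", "weekend"),
        ("rainy", "weekday"), ("rainy", "weekday"), ("sunny", "weekday")]
labels = ["yes", "yes", "yes", "no", "no", "no"]
model = train_naive_bayes(rows, labels)
```

Without smoothing, a feature value never seen with a class drives that class's score to zero, which is why practical implementations usually add a small count to every value.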
- In a decision tree diagram, each circle is called a node.
- The last nodes of the decision tree, where a decision is made, are called the leaves of the tree.
- Decision trees are intuitive and easy to build but fall short when it comes to accuracy.
- A random forest creates multiple decision trees using bootstrapped datasets drawn from the original data, randomly selecting a subset of variables at each step of each decision tree.
- The model then selects the mode of the predictions from all of the decision trees.
- By relying on multiple trees, it reduces the risk of error from any individual tree.
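The three steps above (bootstrap the data, restrict each tree to a random subset of variables, take the mode of the predictions) can be sketched as follows. To stay short, this sketch swaps full decision trees for one-split "stumps" and uses a feature subset of size one, so it is a simplification of the idea rather than a real random forest. All function names and the `seed` parameter are illustrative.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label (the mode)."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = majority(left)
        right_label = majority(right) if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: sample n rows with replacement.
        sample = [rng.randrange(n) for _ in range(n)]
        # Random variable subset (size one in this simplified sketch).
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], feature))
    return forest

def predict_forest(forest, row):
    """Each stump votes; the forest returns the mode of the votes."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)
```

An individual stump trained on an unlucky bootstrap sample can be badly wrong, but the vote across many stumps is much more stable, which is the point of the ensemble.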
- Clustering is an unsupervised technique that involves the grouping, or clustering, of data points. It’s frequently used for customer segmentation, fraud detection, and document classification.
- Common clustering techniques include k-means clustering, hierarchical clustering, mean shift clustering, and density-based clustering. While each technique uses a different method for finding clusters, they all aim to group similar data points together.
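As one example of these techniques, k-means can be sketched as a loop that alternates between assigning points to their nearest centroid and moving each centroid to the mean of its points. This sketch assumes the caller supplies the starting centroids and runs a fixed number of iterations; the function name `k_means` and the sample points are illustrative.

```python
import math

def k_means(points, centroids, iterations=10):
    """Return (centroids, clusters) after a fixed number of assign/update rounds."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, [(0, 0), (10, 10)])
```

No labels are involved at any point, which is what makes this an unsupervised technique. Practical implementations usually choose the starting centroids randomly and stop once the assignments no longer change.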
After reading this blog, I am sure you will have a grasp of the basic concepts behind these machine learning models.