Linear regression establishes a relationship between a dependent variable (Y) and one or more independent variables (X) by finding the best-fitting straight line.
The equation for the linear model is Y = mX + c, where m is the slope and c is the intercept.
In the diagram, the blue dots show the distribution of ‘y’ with respect to ‘x’. No straight line runs through all the data points, so the objective is to find the straight line that minimizes the error between the predicted and actual values.
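As a rough illustration of fitting Y = mX + c, the sketch below assumes the error being minimized is the sum of squared differences (ordinary least squares), which the text does not state explicitly. The function names and the sample data are made up for the example.

```python
def fit_line(xs, ys):
    """Return (m, c) that minimize the sum of squared errors for Y = m*X + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - m * mean_x
    return m, c


def sum_squared_error(xs, ys, m, c):
    """Total squared gap between actual values and the line's predictions."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


# Made-up data that roughly follows a straight line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, c = fit_line(xs, ys)
print("slope:", m, "intercept:", c)
print("error:", sum_squared_error(xs, ys, m, c))
```

Any other choice of slope and intercept gives a larger squared error on the same points, which is what "best fit" means under this assumption.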
Logistic regression is used when the dependent variable is binary (0 or 1, true or false, yes or no), which means that the outcome can take only one of two forms. For example, it can be used when we need to find the probability of an event succeeding or failing.
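To make the idea of a probability for a binary outcome concrete, here is a minimal sketch. It assumes the usual logistic (sigmoid) function applied to a linear score; the coefficients m and c are hypothetical values chosen for illustration, not fitted from data.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, m, c):
    """Probability of the positive outcome (1) for input x."""
    return sigmoid(m * x + c)


def predict_label(x, m, c, threshold=0.5):
    """Turn the probability into one of the two possible outcomes."""
    return 1 if predict_proba(x, m, c) >= threshold else 0


# Hypothetical coefficients: the outcome becomes more likely as x grows.
m, c = 1.0, -5.0
for x in (0, 5, 10):
    print(x, predict_proba(x, m, c), predict_label(x, m, c))
```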
A decision tree is a type of supervised learning algorithm that can be used for classification as well as regression problems. The input to a decision tree can be continuous as well as categorical. A decision tree works on if-then rules and solves a problem by using a tree representation (nodes and leaves).
Assumptions while creating a decision tree:
1) Initially, the whole training set is considered as the root.
2) Feature values are preferred to be categorical; if they are continuous, they are discretized.
3) Records are distributed recursively on the basis of attribute values.
4) The attribute placed at the root node or at an internal node is chosen using a statistical approach.
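The "statistical approach" in point 4 is not named in the text. One common choice is Gini impurity, and the sketch below assumes it: the attribute whose split leaves the purest groups is picked for the node. The function names and the tiny dataset are invented for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when all labels agree, higher when they are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(rows, labels, feature):
    """Weighted impurity of the groups produced by splitting on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    n = len(labels)
    return sum(len(group) / n * gini(group) for group in groups.values())


def best_feature(rows, labels):
    """Pick the feature whose split gives the lowest weighted impurity."""
    return min(rows[0].keys(), key=lambda f: split_gini(rows, labels, f))


# Invented categorical data: 'outlook' separates the labels, 'windy' does not.
rows = [
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
]
labels = ["play", "play", "stay", "stay"]
print(best_feature(rows, labels))
```

Applying the same selection again inside each resulting group is what distributes the records recursively, as in point 3.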
Random forest is an ensemble machine learning algorithm that follows the bagging technique. The base estimators in a random forest are decision trees. At each node of a decision tree, random forest randomly selects a set of features that are used to decide the best split. Looking at it step by step, this is what a random forest model does:
1. Random subsets are created from the original dataset (bootstrapping).
2. At each node in the decision tree, only a random set of features is considered to decide the best split.
3. A decision tree model is fitted on each of the subsets.
4. The final prediction is calculated by averaging the predictions from all decision trees (or, for classification, by taking a majority vote).
To sum up, a random forest randomly selects data points and features and builds multiple trees.
Random forest can also be used to estimate feature importance. The attribute feature_importances_ of a fitted model (as in scikit-learn) gives the importance of each feature.
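The steps above can be sketched in plain Python. This is a deliberately simplified illustration, not a full random forest: each "tree" is a one-level regression tree (a stump), the split is scored by squared error, and all function names and data are hypothetical.

```python
import random
from statistics import mean


def bootstrap(rows, rng):
    """Step 1: draw a random subset of the same size, with replacement."""
    return [rng.choice(rows) for _ in rows]


def squared_error(values):
    center = mean(values)
    return sum((v - center) ** 2 for v in values)


def fit_stump(rows, n_features, max_features, rng):
    """Steps 2-3: fit a one-level tree using only a random set of features."""
    best = None
    for f in rng.sample(range(n_features), max_features):
        for x, _ in rows:
            threshold = x[f]
            left = [y for xi, y in rows if xi[f] <= threshold]
            right = [y for xi, y in rows if xi[f] > threshold]
            if not left or not right:
                continue
            score = squared_error(left) + squared_error(right)
            if best is None or score < best[0]:
                best = (score, f, threshold, mean(left), mean(right))
    if best is None:
        # No usable split in this subset: fall back to a constant prediction.
        constant = mean(y for _, y in rows)
        return lambda x: constant
    _, f, threshold, left_value, right_value = best
    return lambda x: left_value if x[f] <= threshold else right_value


def fit_forest(rows, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    return [
        fit_stump(bootstrap(rows, rng), n_features, max_features, rng)
        for _ in range(n_trees)
    ]


def predict(forest, x):
    """Step 4: average the predictions from all trees."""
    return mean(tree(x) for tree in forest)


# Made-up data: the target depends on feature 0 only; feature 1 is noise.
rows = [([i, (i * 7) % 10], 0.0 if i < 5 else 10.0) for i in range(10)]
forest = fit_forest(rows)
low = predict(forest, [0, 3])
high = predict(forest, [9, 3])
print(low, high)
```

Because each stump sees only one randomly chosen feature here, some trees split on the noise feature; averaging over many trees is what lets the informative feature dominate the final prediction.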
Some important parameters:-
1. n_estimators:- The number of decision trees to be created in the random forest.
2. criterion:- The function used to measure the quality of a split, “gini” or “entropy”.
3. min_samples_split:- The minimum number of samples a node must contain before a split is attempted.
4. max_features:- The maximum number of features considered when looking for the best split at each node.
5. n_jobs:- The number of jobs to run in parallel for both fit and predict. Set it to -1 to use all available cores for parallel processing.
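The parameter names above match scikit-learn's RandomForestClassifier, so the following sketch assumes that library is installed. The synthetic dataset and the specific parameter values are illustrative choices, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data for illustration only: 200 samples, 6 features.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0
)

model = RandomForestClassifier(
    n_estimators=100,      # number of decision trees in the forest
    criterion="gini",      # split quality measure ("gini" or "entropy")
    min_samples_split=2,   # minimum samples in a node before it may be split
    max_features="sqrt",   # features considered when looking for each split
    n_jobs=-1,             # use all available cores
    random_state=0,        # fixed seed so the run is repeatable
)
model.fit(X, y)

# One importance value per feature; the values sum to 1.
importances = model.feature_importances_
for index, value in enumerate(importances):
    print(f"feature {index}: {value:.3f}")
```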
5. Support Vector Machine
A support vector machine (SVM) is a supervised machine learning model that uses classification algorithms for two-group classification problems. After being given sets of labeled training data for each category, an SVM model is able to categorize new data.
Types of SVM kernels:
2. Polynomial kernel
4. Gaussian radial basis function (RBF)
3. Support Vector
4. Hyperplane
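The polynomial and RBF kernels named above can be written as small functions. This is a minimal sketch using the common textbook forms; the parameter names (degree, gamma, coef0) follow widespread convention and are assumptions, since the text does not define them.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def polynomial_kernel(a, b, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <a, b> + coef0) ** degree"""
    return (gamma * dot(a, b) + coef0) ** degree


def rbf_kernel(a, b, gamma=1.0):
    """exp(-gamma * ||a - b||^2): 1.0 for identical points, near 0 far apart."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


print(polynomial_kernel([1, 2], [3, 4], degree=2))
print(rbf_kernel([0, 0], [1, 0]), rbf_kernel([0, 0], [3, 4]))
```

A kernel acts as a similarity score between two points, which is how an SVM can separate classes that no straight line divides in the original feature space.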
6. K-Nearest Neighbor
KNN stands for the K-Nearest Neighbour algorithm. It can be used for both classification and regression and is one of the simplest machine learning algorithms. It is also known as a lazy learner because it does not create a generalized model during training; the actual work is done in the testing (prediction) phase, which is therefore costly in terms of time and money. It is also called instance-based or memory-based learning.
In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is
a positive integer, typically small). If k = 1, then the object is assigned to the class of that single nearest neighbor.
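The voting rule described above can be sketched as follows. The sketch assumes Euclidean distance, which the text does not specify, and uses invented training points. It needs Python 3.8 or later for math.dist.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """train is a list of (point, label) pairs; returns the majority label."""
    # There is no training step: all work happens here, at prediction time.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Invented data: two well-separated groups of points.
train = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B"),
]
print(knn_classify(train, (2, 2), k=3))
print(knn_classify(train, (7, 7), k=1))
```

Every prediction scans the whole training set, which is why the cost of KNN sits in the prediction phase rather than in training.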
7. Naive Bayes Classifier
It is a classification technique based on Bayes' theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
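The independence assumption means the score for a class is simply the class prior multiplied by one probability per feature. The sketch below shows this for categorical features. It adds Laplace (add-one) smoothing, which the text does not mention, so that unseen values do not produce a zero score; the function names and data are made up.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count how often each feature value appears within each class."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values


def predict_nb(model, row):
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for i, value in enumerate(row):
            # Each feature contributes independently (the "naive" assumption).
            seen = feature_counts[label][i][value]
            score *= (seen + 1) / (count + len(values[i]))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up data: (outlook, temperature) -> whether to play.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("sunny", "hot")))
print(predict_nb(model, ("rainy", "cool")))
```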
XGBoost is a decision-tree-based ensemble Machine Learning algorithm that uses a gradient boosting framework. In prediction problems involving unstructured data (images, text, etc.)
Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other than to the data points in other groups.
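As one concrete way of forming such groups, the sketch below uses k-means, a common clustering method chosen here for illustration and not named in the text. The starting centroids and the points are invented, and math.dist requires Python 3.8 or later.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Invented data: two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (9, 8)])
print(centroids)
print(clusters)
```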
Types of Clustering
Neural networks are one of the main tools used in machine learning. As the word “neural” suggests, they are brain-inspired systems intended to replicate the way that humans learn. NNs consist of input and output layers, as well as one or more hidden layers consisting of units that transform the input.
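To show how the layers transform the input, here is a minimal forward pass through one hidden layer and one output layer. It assumes sigmoid activations, and the weights are hypothetical fixed numbers; a real network would learn them during training, which this sketch does not cover.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One layer: each unit computes sigmoid(weights . inputs + bias)."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = dense(inputs, hidden_w, hidden_b)
    return dense(hidden, output_w, output_b)


# Hypothetical weights: 2 inputs, 3 hidden units, 1 output unit.
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.7, -0.5, 0.2]]
output_b = [0.05]
output = forward([1.0, 2.0], hidden_w, hidden_b, output_w, output_b)
print(output)
```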