Which model will be a good fit for your data?
Nov 1, 2022
In this article we will explain the advantages and disadvantages of each machine learning model, and which model will be a good fit for your data. The article is presented in a way that is easy to understand.
Linear Regression
1. The advantages
- Simple to implement and efficient to train.
- Overfitting can be reduced by regularization.
- Performs well when the relationship in the data is linear.
2. The disadvantages
- Assumes that the observations are independent, which is rare in real life.
- Sensitive to noise and prone to overfitting.
- Sensitive to outliers.
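As a minimal sketch of the regularization point above (scikit-learn is an assumed library choice here; the dataset is synthetic and purely illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Tiny synthetic dataset: y is a noisy linear function of x.
rng = np.random.RandomState(0)
X = rng.rand(100, 1)
y = 3.0 * X[:, 0] + 0.5 + 0.05 * rng.randn(100)

# Ridge regression adds L2 regularization, which helps reduce overfitting.
model = Ridge(alpha=1.0)
model.fit(X, y)
r2 = model.score(X, y)  # R^2 on the training data
```

Because the underlying relationship really is linear, the fit is strong; on non-linear data the same model would underfit.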
Logistic Regression
1. The advantages
- Less prone to overfitting, though it can still overfit on high-dimensional datasets.
- Efficient when the dataset has features that are linearly separable.
- Easy to implement and efficient to train.
2. The disadvantages
- Should not be used when the number of observations is smaller than the number of features.
- Assumes linearity, which is rare in practice.
- Can only be used to predict discrete outcomes (classes).
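A minimal sketch of the linearly separable case, again assuming scikit-learn with synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: the class depends on the sign of x1 + x2, so a line separates it.
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = LogisticRegression()
clf.fit(X, y)
acc = clf.score(X, y)  # near-perfect because the classes are linearly separable
```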
Support Vector Machine (SVM)
1. The advantages
- Works well on high-dimensional data.
- Can work on small datasets.
- Can solve non-linear problems.
2. The disadvantages
- Inefficient on large datasets.
- Requires picking the right kernel.
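The kernel choice can be sketched on a toy non-linear problem (scikit-learn assumed; `make_circles` generates illustrative data):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable, so the kernel choice matters.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

# An RBF kernel handles this non-linear boundary; a linear kernel would not.
clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X, y)
acc = clf.score(X, y)
```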
Decision Tree
1. The advantages
- Can work on high-dimensional data with excellent accuracy.
- Can solve non-linear problems.
- Easy to visualize and explain.
2. The disadvantages
- Prone to overfitting, which might be mitigated by a random forest.
- A small change in the data can lead to a large change in the structure of the optimal decision tree.
- Calculations can get very complex.
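The "easy to explain" point can be sketched by printing a tree's rules (scikit-learn assumed; Iris is a stand-in dataset):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# max_depth caps the tree's complexity, one simple guard against overfitting.
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)
acc = clf.score(X, y)

# export_text prints the learned if/else rules in plain text.
rules = export_text(clf)
```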
K Nearest Neighbour
1. The advantages
- Can make predictions without a training phase (lazy learning).
- Prediction time is O(n) in the number of training samples.
- Can be used for both classification and regression.
2. The disadvantages
- Does not work well with large datasets.
- Sensitive to noisy data, missing values and outliers.
- Needs feature scaling.
- Requires choosing the correct K value.
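The feature-scaling requirement can be sketched with a scaler in front of the classifier (scikit-learn assumed; Iris is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# KNN is a lazy learner: fit() mostly stores the data. Scaling matters because
# distance computations mix features measured in different units.
X, y = load_iris(return_X_y=True)
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
clf.fit(X, y)
acc = clf.score(X, y)
```

In practice the K value (here 5) is tuned, e.g. by cross-validation.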
Principal Component Analysis
1. The advantages
- Removes correlated features.
- Reduces overfitting.
2. The disadvantages
- Principal components are less interpretable.
- Information loss.
- Must standardize data before implementing PCA.
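The standardize-then-reduce workflow, sketched with scikit-learn (an assumed choice) on the Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize first: PCA is sensitive to feature scale.
X, _ = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# Keep 2 components and check how much variance survives the reduction;
# whatever is not explained is the information loss noted above.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)
explained = pca.explained_variance_ratio_.sum()
```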
Naive Bayes
1. The advantages
- Short training time.
- Better suited for categorical inputs.
- Easy to implement.
2. The disadvantages
- Assumes that all features are independent, which rarely happens in real life.
- Suffers from the zero-frequency problem: a category never seen in training gets zero probability unless smoothing is applied.
- Estimations can be wrong in some cases.
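The zero-frequency problem and its standard fix (Laplace smoothing) can be sketched with scikit-learn, an assumed library choice; the count data here is made up:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy count features (e.g., word counts); feature 2 never occurs for class 0,
# so without smoothing its likelihood for class 0 would be exactly zero.
X = np.array([[2, 1, 0], [1, 2, 0], [0, 1, 3], [0, 0, 4]])
y = np.array([0, 0, 1, 1])

# alpha=1.0 applies Laplace smoothing, avoiding the zero-frequency problem.
clf = MultinomialNB(alpha=1.0)
clf.fit(X, y)
acc = clf.score(X, y)
```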
ANN (Artificial Neural Network)
1. The advantages
- Has fault tolerance.
- Has the ability to learn and model non-linear and complex relationships.
- Can generalize to unseen data.
2. The disadvantages
- Long training time.
- Non-guaranteed convergence.
- A black box: hard to explain the solution.
- Hardware dependence.
- Requires the user's ability to translate the problem.
AdaBoost
1. The advantages
- Relatively robust to overfitting.
- High accuracy.
- Easy to understand and to visualize.
2. The disadvantages
- Sensitive to noisy data.
- Affected by outliers.
- Not optimized for speed.
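A minimal sketch with scikit-learn (an assumed choice; the breast-cancer dataset is a stand-in):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier

# AdaBoost combines many weak learners (shallow decision stumps by default),
# reweighting the samples each round to focus on previous mistakes.
X, y = load_breast_cancer(return_X_y=True)
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
acc = clf.score(X, y)
```

That same sample reweighting is why noisy points and outliers, which keep getting misclassified, can dominate later rounds.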