Which model will be a good fit for your data?

mostefa sihamdi
2 min read · Nov 1, 2022

In this article we will explain the advantages and disadvantages of several common machine learning models, so you can decide which one will be a good fit for your data. The article is presented in a way that is easy to understand.

Linear Regression

  1. The advantages
  • Simple to implement and efficient to train.
  • Overfitting can be reduced by regularization.
  • Performs well when the relationship between the features and the target is linear.

2. The disadvantages

  • Assumes that the observations are independent, which is rare in real life.
  • Prone to noise and overfitting.
  • Sensitive to outliers.
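
To make this concrete, here is a minimal scikit-learn sketch of a regularized linear regression; the synthetic data and the alpha value are illustrative assumptions, not something this comparison prescribes:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Illustrative synthetic data: a noisy linear relationship.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ridge adds L2 regularization, which helps reduce overfitting.
model = Ridge(alpha=1.0).fit(X_train, y_train)
print("R^2 on test data:", model.score(X_test, y_test))
```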

Logistic Regression

  1. The advantages
  • Less prone to overfitting, but it can still overfit on high-dimensional datasets.
  • Efficient when the classes are linearly separable.
  • Easy to implement and efficient to train.

2. The disadvantages

  • Should not be used when the number of observations is smaller than the number of features.
  • Assumes a linear decision boundary, which is rare in practice.
  • Can only be used to predict discrete classes, not continuous values.
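
A minimal sketch with scikit-learn, assuming a synthetic binary-classification dataset (the settings below are illustrative, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative synthetic data for binary classification.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C controls regularization strength (smaller C = stronger regularization),
# which is what keeps logistic regression from overfitting in high dimensions.
clf = LogisticRegression(C=1.0, max_iter=1000).fit(X_train, y_train)
print("Accuracy:", clf.score(X_test, y_test))
```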

Support Vector Machine (SVM)

  1. The advantages
  • Works well on high-dimensional data.
  • Can work on small datasets.
  • Can solve non-linear problems.

2. The disadvantages

  • Inefficient on large datasets.
  • Requires picking the right kernel.
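
For example, an RBF-kernel SVM can separate data that is not linearly separable. This sketch uses scikit-learn's two-moons toy dataset, and the kernel and parameters are illustrative choices you would normally tune:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Non-linearly separable toy data (two interleaving half-moons).
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel lets the SVM draw a non-linear decision boundary.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_train, y_train)
print("Accuracy:", clf.score(X_test, y_test))
```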

Decision Tree

  1. The advantages
  • Can work on high-dimensional data with excellent accuracy.
  • Can solve non-linear problems.
  • Easy to visualize and explain.

2. The disadvantages

  • Prone to overfitting, which might be mitigated by using a random forest.
  • A small change in the data can lead to a large change in the structure of the optimal decision tree.
  • Calculations can get very complex.
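
A minimal scikit-learn sketch on the Iris dataset; max_depth=3 is an illustrative way to limit overfitting, and export_text shows why trees are easy to explain:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Limiting the depth is one simple way to fight overfitting.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("Accuracy:", tree.score(X_test, y_test))

# The fitted tree can be printed as human-readable rules.
print(export_text(tree))
```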

K Nearest Neighbour

  1. The advantages
  • Makes predictions without an explicit training phase (it simply stores the data).
  • Prediction time complexity is O(n) in the number of stored samples.
  • Can be used for both classification and regression.

2. The disadvantages

  • Does not work well with large datasets.
  • Sensitive to noisy data, missing values and outliers.
  • Needs feature scaling.
  • Requires choosing the correct K value.
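
A minimal scikit-learn sketch; scaling the features first matters because KNN is distance-based, and n_neighbors=5 is an illustrative K value you would normally tune:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale features, then classify by the 5 nearest neighbours.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)  # "fitting" mostly just stores the training data
print("Accuracy:", knn.score(X_test, y_test))
```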

Principal Component Analysis

  1. The advantages
  • Removes correlated features by transforming them into uncorrelated principal components.
  • Helps reduce overfitting by lowering the number of dimensions.

2. The disadvantages

  • Principal components are less interpretable.
  • Some information is lost when components are dropped.
  • Must standardize data before implementing PCA.
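
A minimal scikit-learn sketch; standardizing first is required, and keeping two components is an illustrative choice, not a rule:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Standardize, then keep the two leading principal components.
pca = make_pipeline(StandardScaler(), PCA(n_components=2))
X_reduced = pca.fit_transform(X)

print("Original shape:", X.shape)
print("Reduced shape:", X_reduced.shape)
# How much variance each kept component explains (the rest is lost).
print("Explained variance ratios:", pca.named_steps["pca"].explained_variance_ratio_)
```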

Naive Bayes

  1. The advantages
  • Training is fast.
  • Better suited for categorical inputs.
  • Easy to implement.

2. The disadvantages

  • Assumes that all features are independent, which rarely happens in real life.
  • Suffers from the zero-frequency problem: a category not seen during training gets zero probability unless smoothing is applied.
  • Probability estimates can be unreliable in some cases.
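
A minimal scikit-learn sketch with categorical (integer-encoded) features; the synthetic data is purely illustrative, and alpha=1.0 applies Laplace smoothing to avoid the zero-frequency problem:

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Illustrative data: four integer-encoded categorical features.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 4))
y = (X[:, 0] + X[:, 1] > 2).astype(int)  # a simple rule for the model to learn

# alpha=1.0 is Laplace smoothing, which prevents unseen categories
# from zeroing out the whole probability estimate.
clf = CategoricalNB(alpha=1.0).fit(X, y)
print("Training accuracy:", clf.score(X, y))
print("Prediction for [2, 2, 0, 1]:", clf.predict([[2, 2, 0, 1]]))
```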

ANN (Artificial Neural Network)

  1. The advantages
  • Fault tolerant.
  • Able to learn and model non-linear and complex relationships.
  • Can generalize to unseen data.

2. The disadvantages

  • Long training time.
  • Non-guaranteed convergence.
  • Acts as a black box, so the solution is hard to explain.
  • Hardware dependence.
  • Requires the user to translate the problem into a suitable numerical representation.
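
A minimal sketch using scikit-learn's MLPClassifier on a non-linear toy dataset; the architecture and max_iter are illustrative choices, and as noted above convergence is not guaranteed:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Non-linear toy data that a linear model would struggle with.
X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A small multi-layer perceptron with two hidden layers.
net = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
)
net.fit(X_train, y_train)
print("Accuracy:", net.score(X_test, y_test))
```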

AdaBoost

  1. The advantages
  • Relatively robust to overfitting.
  • High accuracy.
  • Easy to understand and to visualize.

2. The disadvantages

  • Sensitive to noisy data.
  • Affected by outliers.
  • Not optimized for speed.
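
A minimal scikit-learn sketch; the default base learner is a depth-1 decision tree (a "stump"), and n_estimators=100 is an illustrative setting:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic classification data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost reweights the samples it gets wrong at each round, which is
# also why noisy points and outliers can hurt it.
clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("Accuracy:", clf.score(X_test, y_test))
```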
