Which model will be a good fit for your data?

mostefa sihamdi
2 min read · Nov 1, 2022

In this article we will explain the advantages and disadvantages of several common machine learning models, so you can decide which one will be a good fit for your data. The article is presented in a way that is easy to understand.

Linear Regression

  1. The advantages
  • Simple to implement and efficient to train.
  • Overfitting can be reduced by regularization.
  • Performs well when the relationship between the features and the target is linear.

2. The disadvantages

  • Assumes that the observations are independent, which is rare in real life.
  • Prone to noise and overfitting.
  • Sensitive to outliers.
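
To make this concrete, here is a minimal scikit-learn sketch of a regularized linear regression; the synthetic data and the alpha value are illustrative assumptions, not something this comparison prescribes:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Illustrative synthetic data: a noisy linear relationship.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ridge adds L2 regularization, which helps reduce overfitting.
model = Ridge(alpha=1.0).fit(X_train, y_train)
print("R^2 on test data:", model.score(X_test, y_test))
```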

Logistic Regression

  1. The advantages
  • Less prone to overfitting, but it can still overfit on high-dimensional datasets.
  • Efficient when the classes are linearly separable.
  • Easy to implement and efficient to train.

2. The disadvantages

  • Should not be used when the number of observations is smaller than the number of features.
  • Assumes a linear decision boundary, which is rare in practice.
  • Can only be used to predict discrete classes, not continuous values.
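
A minimal sketch with scikit-learn, assuming a synthetic binary-classification dataset (the settings below are illustrative, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative synthetic data for binary classification.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C controls regularization strength (smaller C = stronger regularization),
# which is what keeps logistic regression from overfitting in high dimensions.
clf = LogisticRegression(C=1.0, max_iter=1000).fit(X_train, y_train)
print("Accuracy:", clf.score(X_test, y_test))
```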

Support Vector Machine (SVM)

  1. The advantages
  • Works well on high-dimensional data.
  • Can work on small datasets.
  • Can solve non-linear problems.

2. The disadvantages

  • Inefficient on large datasets.
  • Requires picking the right kernel.
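
For example, an RBF-kernel SVM can separate data that is not linearly separable. This sketch uses scikit-learn's two-moons toy dataset, and the kernel and parameters are illustrative choices you would normally tune:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Non-linearly separable toy data (two interleaving half-moons).
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel lets the SVM draw a non-linear decision boundary.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_train, y_train)
print("Accuracy:", clf.score(X_test, y_test))
```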

Decision Tree

  1. The advantages
  • Can work on high-dimensional data with excellent accuracy.
  • Can solve non-linear problems.
  • Easy to visualize and explain.

2. The disadvantages

  • Prone to overfitting, which might be mitigated by using a random forest.
  • A small change in the data can lead to a large change in the structure of the optimal decision tree.
  • Calculations can get very complex.
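
A minimal scikit-learn sketch on the Iris dataset; max_depth=3 is an illustrative way to limit overfitting, and export_text shows why trees are easy to explain:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Limiting the depth is one simple way to fight overfitting.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("Accuracy:", tree.score(X_test, y_test))

# The fitted tree can be printed as human-readable rules.
print(export_text(tree))
```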

K Nearest Neighbour

  1. The advantages
  • Makes predictions without an explicit training phase (it simply stores the data).
  • Prediction time complexity is O(n) in the number of stored samples.
  • Can be used for both classification and regression.

2. The disadvantages

  • Does not work well with large datasets.
  • Sensitive to noisy data, missing values and outliers.
  • Needs feature scaling.
  • Requires choosing the correct K value.
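
A minimal scikit-learn sketch; scaling the features first matters because KNN is distance-based, and n_neighbors=5 is an illustrative K value you would normally tune:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale features, then classify by the 5 nearest neighbours.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)  # "fitting" mostly just stores the training data
print("Accuracy:", knn.score(X_test, y_test))
```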

Principal Component Analysis

  1. The advantages
  • Removes correlated features by transforming them into uncorrelated principal components.
  • Helps reduce overfitting by lowering the number of dimensions.

2. The disadvantages

  • Principal components are less interpretable.
  • Some information is lost when components are dropped.
  • Must standardize data before implementing PCA.
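
A minimal scikit-learn sketch; standardizing first is required, and keeping two components is an illustrative choice, not a rule:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Standardize, then keep the two leading principal components.
pca = make_pipeline(StandardScaler(), PCA(n_components=2))
X_reduced = pca.fit_transform(X)

print("Original shape:", X.shape)
print("Reduced shape:", X_reduced.shape)
# How much variance each kept component explains (the rest is lost).
print("Explained variance ratios:", pca.named_steps["pca"].explained_variance_ratio_)
```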

Naive Bayes

  1. The advantages
  • Training is fast.
  • Better suited for categorical inputs.
  • Easy to implement.

2. The disadvantages

  • Assumes that all features are independent, which rarely happens in real life.
  • Suffers from the zero-frequency problem: a category not seen during training gets zero probability unless smoothing is applied.
  • Probability estimates can be unreliable in some cases.
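
A minimal scikit-learn sketch with categorical (integer-encoded) features; the synthetic data is purely illustrative, and alpha=1.0 applies Laplace smoothing to avoid the zero-frequency problem:

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Illustrative data: four integer-encoded categorical features.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 4))
y = (X[:, 0] + X[:, 1] > 2).astype(int)  # a simple rule for the model to learn

# alpha=1.0 is Laplace smoothing, which prevents unseen categories
# from zeroing out the whole probability estimate.
clf = CategoricalNB(alpha=1.0).fit(X, y)
print("Training accuracy:", clf.score(X, y))
print("Prediction for [2, 2, 0, 1]:", clf.predict([[2, 2, 0, 1]]))
```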

ANN (Artificial Neural Network)

  1. The advantages
  • Fault tolerant.
  • Able to learn and model non-linear and complex relationships.
  • Can generalize to unseen data.

2. The disadvantages

  • Long training time.
  • Non-guaranteed convergence.
  • Acts as a black box, so the solution is hard to explain.
  • Hardware dependence.
  • Requires the user to translate the problem into a suitable numerical representation.
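
A minimal sketch using scikit-learn's MLPClassifier on a non-linear toy dataset; the architecture and max_iter are illustrative choices, and as noted above convergence is not guaranteed:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Non-linear toy data that a linear model would struggle with.
X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A small multi-layer perceptron with two hidden layers.
net = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
)
net.fit(X_train, y_train)
print("Accuracy:", net.score(X_test, y_test))
```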

AdaBoost

  1. The advantages
  • Relatively robust to overfitting.
  • High accuracy.
  • Easy to understand and to visualize.

2. The disadvantages

  • Sensitive to noisy data.
  • Affected by outliers.
  • Not optimized for speed.
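
A minimal scikit-learn sketch; the default base learner is a depth-1 decision tree (a "stump"), and n_estimators=100 is an illustrative setting:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic classification data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost reweights the samples it gets wrong at each round, which is
# also why noisy points and outliers can hurt it.
clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("Accuracy:", clf.score(X_test, y_test))
```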
