Bias and Variance Tradeoff

October 24, 2024

Bias and variance are among the most fundamental concepts in machine learning. Before trying to understand them, it is important to go through how training is done in machine learning. The balance between bias and variance is critical, because achieving it is key to building models that generalize well to new data.

Training in Machine Learning

Let us paint a scenario: you want to predict the package you will get once you are placed as a Machine Learning engineer, given a dataset of your alumni that consists of the following features:

  • CGPA: Cumulative Grade Point Average, which reflects the academic performance of an individual throughout their course (Input variable or Predictor variable)
  • No. of Internships: The number of internships completed, indicating practical experience in the field.
  • No. of Certifications: The number of certifications obtained, reflecting additional skills and knowledge acquired outside the formal education system.
  • Major/Field of Study: The specific area of study, which can impact job prospects in Machine Learning.
  • Relevant Coursework: Courses taken that are directly related to Machine Learning and AI.
  • Projects: The number and quality of projects completed, especially those related to Machine Learning.
  • Research Publications: The number of research papers published in relevant journals or conferences.
  • Work Experience: Any previous work experience in the tech industry or related fields.
  • Extracurricular Activities: Participation in relevant clubs, competitions, or hackathons.
  • Letters of Recommendation: The strength and relevance of recommendation letters.
  • Technical Skills: Proficiency in programming languages and tools commonly used in Machine Learning, such as Python, R, TensorFlow, etc.
  • Soft Skills: Communication, teamwork, and other interpersonal skills that are important for job performance.
  • Geographical Location: The location where the individual is applying for jobs, as job availability can vary by region.
  • Networking: The extent of professional networking, such as connections on LinkedIn, participation in industry conferences, etc.
  • Job Application Strategies: The quality and frequency of job applications, including the customization of resumes and cover letters for specific roles.
  • Package: The CTC (cost to company) the alumni secured. (Output variable)

Now we would select an algorithm (say, linear regression, for the sake of simplicity) and try to fit a line to all these data points: a model (the line) that understands the patterns and, when provided with a new data point, gives us a rough idea of our CTC.

The algorithm will try different lines until the MSE (Mean Squared Error), also known as the training loss, is at its minimum. This training error is one part of the model's total error, and MSE is just one of several possible loss functions. In this case, the MSE would be calculated as:

                                     MSE = (1/n) Σ (y(i) − ŷ(i))²

where:

  • n is the number of data points,
  • y(i) is the actual value for the i-th data point,
  • ŷ(i) is the predicted value for the i-th data point.
  • The difference y(i) − ŷ(i) is the prediction error. It is important to understand that errors can be reducible or irreducible: reducible errors can be minimized by improving the model, while irreducible error is due to random noise in the data.

In each pass the MSE is calculated, and in the next pass the model's parameters are updated to reduce it (typically via gradient descent); for the details of how this is done, visit this link.
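As a rough illustration, here is a minimal sketch of that loop, using made-up CGPA and package numbers and plain NumPy rather than any particular library's training routine: each pass computes the prediction errors and nudges the line's slope and intercept in the direction that lowers the MSE.

```python
import numpy as np

# Hypothetical CGPA and package (LPA) values, made up for illustration.
cgpa = np.array([6.1, 6.8, 7.2, 7.9, 8.4, 8.9, 9.3])
package = np.array([3.5, 4.2, 5.0, 6.1, 7.3, 8.8, 10.5])

# Fit package = w * cgpa + b by gradient descent on the MSE.
w, b = 0.0, 0.0
lr = 0.01  # learning rate

for _ in range(20_000):
    pred = w * cgpa + b      # y-hat for every point
    err = pred - package     # prediction errors
    # Gradients of MSE = mean(err^2) with respect to w and b.
    w -= lr * 2 * np.mean(err * cgpa)
    b -= lr * 2 * np.mean(err)

mse = np.mean((w * cgpa + b - package) ** 2)
print(f"w={w:.2f}, b={b:.2f}, final MSE={mse:.3f}")
```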

Bias

Now that we have a rough idea of how training is done, let us understand what bias is. Taking the same problem as above, let us keep only one parameter (because I am human and can only think in 3D and draw in 2D): CGPA, the input feature. This is what the CGPA vs. Package graph might look like:

Now we apply linear regression to this data and get an accuracy of 50%. Why did this happen? The model was simply unable to understand the patterns in the data (its complexity was too low). This is called underfitting, and it is why we did not get accurate predictions. An underfitted linear regression would look like:

Bias measures how well the model matches the training data. A high bias means the model makes strong assumptions about the data, leading to poor performance on the training data. This is often due to an overly simple model. Conversely, a low bias indicates the model closely matches the training data.

For a more statistical explanation of bias, let Y be the true value of a parameter, and let Ŷ be an estimator of Y based on a sample of data. Then, the bias of the estimator Ŷ is given by:
                                     Bias(Ŷ) = E(Ŷ) - Y

where E(Ŷ) is the expected value of the estimator Ŷ over repeated samples. This equation captures the systematic error introduced by the model's assumptions.
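To make this definition concrete, here is a small Monte Carlo sketch (a toy example, not from the article) that estimates E(Ŷ) for a classic biased estimator, the plug-in variance estimator that divides by n, and compares it against the true value Y:

```python
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0   # Y: the true parameter (population variance)
n = 10           # a small sample size makes the bias easy to see

# Draw many samples; np.var divides by n, the classic biased estimator.
estimates = [np.var(rng.normal(0.0, 2.0, size=n)) for _ in range(100_000)]

expected = np.mean(estimates)                   # approximates E(Y-hat)
print("E(Y-hat) ~", round(expected, 3))         # about 3.6
print("Bias ~", round(expected - true_var, 3))  # about -true_var/n = -0.4
```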

Low Bias: A low bias means fewer assumptions are made when building the target function. In this case, the model will closely match the training dataset.

High Bias: A high bias means more assumptions are made when building the target function. In this case, the model will not match the training dataset closely.
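Here is a minimal sketch of high bias in action, using a deliberately wavy toy dataset (the numbers are invented and not meant to resemble real CGPA data): a straight line simply cannot follow the pattern, so it scores poorly even on the data it was trained on.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Deliberately wavy toy data: far too non-linear for a straight line.
x = np.sort(rng.uniform(5.0, 10.0, size=200)).reshape(-1, 1)
y = 3.0 * np.sin(2.0 * x.ravel()) + rng.normal(0.0, 0.3, size=200)

line = LinearRegression().fit(x, y)
print("Training R^2:", round(line.score(x, y), 3))  # near 0 -> underfitting
```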

Variance

Variance is a measure of spread in data from its mean position. In machine learning, variance is the amount by which the performance of a predictive model changes when it is trained on different subsets of the training data. More specifically, it captures how sensitive the model is to a different subset of the training dataset, i.e., how much its estimates change when it is fit on a new subset of the training data.

Let us go back to our example. In a scenario where you get an accuracy of 99% on the model we just trained, do not jump out of your chair yet! This accuracy is on the training set; if we calculate the model's accuracy on data it has not seen and it comes out to be, say, 30%, this is what is called overfitting: the model did so well on the training data that it failed to generalize the patterns and thus failed on new data (model complexity is too high). The delicate balance between bias and variance is crucial to avoid such scenarios. In our case, an overfitted regression model would be:

A high variance means the model is very sensitive to changes in the training data, which can result in significant changes in the estimate of the target function when it is trained on different subsets of data from the same distribution. This is the case of overfitting: the model performs well on the training data but poorly on new, unseen test data, because it fits the training data so closely that it fails to generalize. The resulting gap is called generalization error, and it leads to inaccurate predictions.

Low variance means the model is less sensitive to changes in the training data and produces consistent estimates of the target function across different subsets of data from the same distribution. Low variance indicates consistent model performance across different training sets; when it is paired with high bias, we get underfitting, where the model fails to capture the patterns in both the training and test data.
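The train/test gap described above can be reproduced with a small sketch (synthetic data and scikit-learn; the CGPA axis is rescaled to [0, 1] for numerical stability, and a degree-15 polynomial stands in for an over-complex model):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)

# CGPA rescaled to [0, 1]; the true relationship is a simple line plus noise.
x = rng.uniform(0.0, 1.0, size=40).reshape(-1, 1)
y = 2.0 * x.ravel() + rng.normal(0.0, 0.5, size=40)

x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.5, random_state=0)

# A degree-15 polynomial has enough freedom to chase the noise in 20 points.
wiggly = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
wiggly.fit(x_tr, y_tr)

print("Train R^2:", round(wiggly.score(x_tr, y_tr), 3))  # close to 1.0
print("Test  R^2:", round(wiggly.score(x_te, y_te), 3))  # typically far worse
```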

                                     Variance(f(x)) = E[f(x)²] − (E[f(x)])²
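As a quick sanity check of this identity (a toy snippet, not part of the original derivation):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(7.0, 1.5, size=100_000)

lhs = np.var(x)                       # Variance(X)
rhs = np.mean(x**2) - np.mean(x)**2   # E[X^2] - (E[X])^2
print(round(lhs, 4), round(rhs, 4))   # the two values agree
```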

Bias-Variance Tradeoff

The goal is to find the right level of complexity in a machine learning model so that both bias and variance are minimized, achieving good generalization to new data. This balance is known as the Bias-Variance Tradeoff: an ideal model keeps both bias error and variance error low, achieving the lowest possible total error rate.

In terms of model complexity, we can use the following diagram to decide on the optimal complexity of our model.

[Figure: error rate vs. model complexity, with the optimal model complexity at the minimum of the total error curve]
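The same curve can be traced numerically. Here is a sketch (on synthetic data, with polynomial degree standing in for model complexity) that cross-validates each complexity setting; the score typically improves as bias falls, then degrades again as variance takes over:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 1.0, size=40).reshape(-1, 1)
y = np.sin(2 * np.pi * x.ravel()) + rng.normal(0.0, 0.3, size=40)

# Sweep model complexity (polynomial degree) and cross-validate each setting.
for degree in [1, 3, 5, 9, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    score = cross_val_score(model, x, y, cv=5).mean()
    print(f"degree={degree:2d}  mean CV R^2 = {score:.3f}")
# The score typically rises as bias falls, then drops as variance takes over.
```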


Ways to reduce high bias in Machine Learning:

To reduce high bias, one can use a more complex model, increase the number of features, reduce regularization, or increase the size of the training data; a short sketch of the first remedy follows this list. These methods aim to better capture the underlying patterns in the data, thereby improving model performance.

  • Use a more complex model: One of the main reasons for high bias is an overly simple model that cannot capture the complexity of the data. In such cases, we can make our model more complex, for example by increasing the number of hidden layers in a deep neural network, or by choosing a more expressive model: polynomial regression for non-linear datasets, CNNs for image processing, RNNs for sequence learning.
  • Increase the number of features: Adding more features to the training dataset increases the complexity of the model and improves its ability to capture the underlying patterns in the data.
  • Reduce regularization of the model: Regularization techniques such as L1 or L2 regularization help prevent overfitting and improve the generalization ability of the model. If the model has high bias, reducing the strength of the regularization, or removing it altogether, can improve its performance.
  • Increase the size of the training data: Increasing the size of the training data can help reduce bias by providing the model with more examples to learn from.
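A minimal sketch of the first remedy, on synthetic non-linear data (the sine-shaped target is an assumption for illustration): swapping a straight line for polynomial regression sharply reduces the bias.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
x = rng.uniform(0.0, 1.0, size=200).reshape(-1, 1)
y = np.sin(2 * np.pi * x.ravel()) + rng.normal(0.0, 0.1, size=200)

plain = LinearRegression().fit(x, y)
richer = make_pipeline(PolynomialFeatures(degree=5), LinearRegression()).fit(x, y)

print("Straight line R^2:", round(plain.score(x, y), 3))   # high bias
print("Degree-5 poly R^2:", round(richer.score(x, y), 3))  # bias reduced
```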

Ways to reduce high variance in Machine Learning:

To reduce high variance, techniques such as cross-validation, feature selection, regularization, ensemble methods, simplifying the model, and early stopping can be employed; a sketch combining two of these follows the list. These methods help build a model that generalizes well to new, unseen data.

  • Cross-validation: By splitting the data into training and testing sets multiple times, cross-validation can help identify whether a model is overfitting or underfitting, and it can be used to tune hyperparameters to reduce variance.
  • Feature selection: Choosing only the relevant features decreases the model's complexity, which can reduce the variance error.
  • Regularization: We can use L1 or L2 regularization to reduce variance in machine learning models; the regularization parameter controls the tradeoff between bias and variance.
  • Ensemble methods: These combine multiple models to improve generalization performance. Bagging, boosting, and stacking are common ensemble methods that can help reduce variance.
  • Simplifying the model: Reducing the complexity of the model, such as decreasing the number of parameters or layers in a neural network, can also help reduce variance and improve generalization performance.
  • Early stopping: Early stopping is a technique used to prevent overfitting by stopping the training of the deep learning model when the performance on the validation set stops improving.
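As a sketch of the regularization and cross-validation points together (synthetic data; the alpha value is an arbitrary illustrative choice): the same high-complexity features score better under cross-validation once L2 (ridge) regularization reins in the variance.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(6)
x = rng.uniform(0.0, 1.0, size=40).reshape(-1, 1)
y = np.sin(2 * np.pi * x.ravel()) + rng.normal(0.0, 0.3, size=40)

# Same high-complexity features, with and without L2 (ridge) regularization.
models = {
    "no regularization": make_pipeline(PolynomialFeatures(12), LinearRegression()),
    "ridge (alpha=0.01)": make_pipeline(PolynomialFeatures(12), Ridge(alpha=0.01)),
}
for name, model in models.items():
    score = cross_val_score(model, x, y, cv=5).mean()
    print(f"{name:>18}: mean CV R^2 = {score:.3f}")
```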

Author

This article was written by Zohair Badshah, a former member of our software team, and edited by our writing team.
