Artificial intelligence is arguably today's most talked-about and fastest-growing technology. Machine learning is a field within artificial intelligence focused on developing algorithms that allow computers to learn from large amounts of data and make predictions. These algorithms are rooted in mathematics, which is what makes complex problem solving possible. Mathematics is therefore an essential skill for anyone looking to build a career in machine learning. This article introduces the core mathematical concepts required to understand and further develop machine learning algorithms.
Mathematics equips learners with a toolkit of techniques that are important for data-efficient learning.
Linear algebra can be called the lifeblood of machine learning: it plays a central role in representing and manipulating data. Linear algebra is the study of vectors, matrices, and linear transformations, and geometry provides a visual understanding of these concepts, especially in high-dimensional spaces.
Vectors represent data points or arrays, and matrices represent datasets or tables. How is this related to machine learning? Computers work with numbers, and vectors and matrices are a natural way to organize them. In a dataset, each feature is represented as a vector, which is a one-dimensional array (although geometrically a vector also has magnitude and direction), and the dataset as a whole is represented as a matrix. It is therefore crucial to understand matrix operations such as addition, multiplication, transposition, and decomposition. Understanding these operations on a simple 3x3 matrix carries over directly to operations on a table or a large dataset.
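As a minimal sketch of these ideas, the NumPy snippet below stores one feature as a vector and a small dataset as a matrix, then applies addition and multiplication to a 3x3 matrix. The property sizes, room counts, and variable names are made up for illustration.

```python
import numpy as np

# A single feature (hypothetical property sizes) as a one-dimensional vector.
sizes = np.array([50.0, 80.0, 120.0])

# A tiny dataset as a matrix: one row per property, one column per feature.
# Columns (hypothetical): size, number of rooms.
dataset = np.array([
    [50.0, 2.0],
    [80.0, 3.0],
    [120.0, 4.0],
])

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
identity = np.eye(3, dtype=int)  # 3x3 identity matrix

matrix_sum = A + identity        # element-wise addition
matrix_product = A @ identity    # matrix multiplication; A times I gives A back
first_feature = dataset[:, 0]    # one feature column pulled out of the dataset
```

The same operators work unchanged on matrices with thousands of rows, which is why practicing on a 3x3 case transfers to real datasets.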
For example, transposition flips a matrix over its main diagonal, so that rows become columns. It is useful in image processing and data cleaning. An image can be represented as a matrix in which each entry is a pixel value; transposing that matrix reflects the image across its diagonal, which can serve as a reorientation step before filters or recognition models are applied.
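A short sketch of transposition, assuming a made-up 2x3 grayscale "image" whose pixel values are chosen only for illustration:

```python
import numpy as np

# Hypothetical grayscale image: 2 rows and 3 columns of pixel values.
image = np.array([[0, 128, 255],
                  [64, 192, 32]])

transposed = image.T  # rows become columns

print(image.shape)       # (2, 3)
print(transposed.shape)  # (3, 2)
```

The pixel that sat at row 0, column 2 of the original ends up at row 2, column 0 of the transposed matrix.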
An eigenvalue of a square matrix $A$ is a scalar (a single number), denoted $\lambda$, and the corresponding eigenvector is denoted $v$. Multiplying the matrix by an eigenvector yields the same eigenvector scaled by $\lambda$: for a square matrix $A$, an eigenvector $v$ and its corresponding eigenvalue $\lambda$ satisfy $Av = \lambda v$.
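The relation $Av = \lambda v$ can be checked numerically. The sketch below uses `np.linalg.eig` on a small symmetric matrix chosen for illustration; the matrix and variable names are not from the article.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eig returns the eigenvalues and a matrix whose columns are the eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

# Verify A v = lambda v for every eigenvalue/eigenvector pair.
checks = [
    np.allclose(A @ eigenvectors[:, i], eigenvalues[i] * eigenvectors[:, i])
    for i in range(len(eigenvalues))
]
```

For this matrix the eigenvalues are 1 and 3 (the order returned by NumPy is not guaranteed), and every entry of `checks` is `True`.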
In the machine learning context, finding the eigenvalues and eigenvectors of the data's covariance matrix lets us decompose the data into principal components, which are the "directions" of maximum variance in the data. This is very useful for dimensionality reduction, that is, reducing the number of features in a dataset while retaining the essential information. This technique, principal component analysis (PCA), is commonly applied before model training so that the model is fitted on a smaller set of derived features that capture most of the variance.
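One way to see PCA at work is to build it from the eigendecomposition of a covariance matrix. The sketch below uses synthetic two-feature data (an assumption for illustration) and keeps only the first principal component; libraries such as scikit-learn offer ready-made implementations that may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 2 strongly correlated features.
x = rng.normal(size=200)
X = np.column_stack([x, 2.0 * x + 0.1 * rng.normal(size=200)])

# Center the data and compute the 2x2 covariance matrix of the features.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# eigh is intended for symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]          # largest variance first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

first_component = eigenvectors[:, 0]           # direction of maximum variance
X_reduced = X_centered @ first_component       # 2 features reduced to 1
explained_ratio = eigenvalues[0] / eigenvalues.sum()
```

Because the second feature is almost exactly twice the first, nearly all of the variance lies along a single direction, so one component retains most of the information.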
Singular value decomposition (SVD) is a matrix factorization method that breaks a matrix down into three simpler matrices. It is important for dimensionality reduction and data compression: it helps identify patterns, reduce noise, and extract important features from data, which is useful in techniques such as collaborative filtering for recommender systems.
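A minimal sketch of the factorization with `np.linalg.svd`, using a small made-up matrix. It shows that the three factors multiply back to the original and that keeping only the largest singular value gives a compressed, rank-1 approximation.

```python
import numpy as np

M = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# U and Vt have orthonormal columns/rows; s holds the singular values, largest first.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Multiplying the three factors back together recovers M.
reconstructed = U @ np.diag(s) @ Vt

# Keep only the largest singular value for a rank-1 approximation of M.
k = 1
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

In a recommender setting, the matrix would hold user-item ratings and `k` would be chosen to trade accuracy against compression; the value used here is only for demonstration.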
In simple words, a set of vectors is linearly independent if none of the vectors can be formed by scaling and adding the other vectors in the set. Linear independence ensures that each feature in the dataset provides unique information, which prevents redundancy and improves model performance. For example, if a population dataset has a gender column and every data point has the same gender, that feature is of little use for prediction. Linear independence also determines the rank of a matrix, which affects its invertibility and the stability of algorithms such as linear regression and matrix factorizations.
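Redundant features can be detected by comparing the rank of the feature matrix with its number of columns. The sketch below uses a made-up matrix whose third column is the sum of the first two, plus a hypothetical single-valued column like the gender example.

```python
import numpy as np

# Hypothetical feature matrix: the third column equals the sum of the first two,
# so it can be formed from the others and adds no new information.
X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 1.0, 3.0],
    [3.0, 5.0, 8.0],
    [4.0, 0.0, 4.0],
])

rank = np.linalg.matrix_rank(X)      # number of linearly independent columns
n_features = X.shape[1]
has_redundant_features = rank < n_features

# A column in which every row holds the same value cannot distinguish data points.
constant_column = np.array(["F", "F", "F", "F"])
is_constant = len(np.unique(constant_column)) == 1
```

Here the rank is 2 while there are 3 columns, which signals that one feature could be dropped without losing information.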
Solving matrix equations is important for understanding transformations of spaces and for working with covariance matrices.
Many algorithms, such as linear regression, can be expressed with matrix operations. Linear regression fits a line to a set of data points so that it minimizes the sum of squared differences between the observed values and the values predicted by the line, which is usually called the best-fit line.
This involves solving linear equations of the form $y = X\beta + \epsilon$, where the coefficients are estimated as $\hat{\beta} = (X^T X)^{-1} X^T y$.
The matrix representation of linear regression uses linear algebra to achieve efficient computation, which matters especially when working with large datasets. Knowledge of NumPy, a widely used numerical library, is beneficial here.
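The coefficient estimate can be sketched in NumPy as follows. The sizes and prices are synthetic, noise-free values invented for illustration, and the code solves the system $(X^T X)\beta = X^T y$ rather than forming the inverse explicitly, which is generally the more numerically stable choice.

```python
import numpy as np

# Synthetic data: prices generated exactly as 3 * size + 20 (no noise).
sizes = np.array([50.0, 80.0, 100.0, 120.0])
prices = 3.0 * sizes + 20.0

# Design matrix X: a column of ones for the intercept, then the feature.
X = np.column_stack([np.ones_like(sizes), sizes])

# Normal equation: solve (X^T X) beta = X^T y for beta.
beta_hat = np.linalg.solve(X.T @ X, X.T @ prices)

# NumPy's least-squares routine gives the same estimate.
beta_lstsq, *_ = np.linalg.lstsq(X, prices, rcond=None)
```

With this data, `beta_hat` recovers an intercept of 20 and a slope of 3, up to floating-point error.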
To understand optimization in machine learning algorithms, a solid foundation in differentiation, integration, and multivariable calculus is required. Calculus deals with rates of change and the accumulation of quantities. In machine learning, multivariable calculus, including vector calculus, is used during optimization to adjust the parameters of a model so that prediction error is minimized (technically, finding the "minima" of the error function).
For ease of understanding, let's take the example of linear regression for predicting property prices based on size. Here we want to find the best-fit line that predicts the price of a property from its size. The best-fit line can be expressed by the equation $y = mx + b$, where $y$ is the predicted price, $x$ is the size of the property, $m$ is the "slope" of the line (how much the price increases with size), and $b$ is the y-intercept, or bias. For simplicity, we can say that $b$ represents the price of a property with size 0.
A common choice of "error function" in such an example is the mean squared error (MSE), which is the average of the squared differences between predicted prices and actual prices.
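Equation of MSE: $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - (m x_i + b)\right)^2$, where $n$ is the number of properties, $y_i$ is the actual price of the $i$-th property, and $m x_i + b$ is its predicted price.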
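The mean squared error for a candidate line can be computed in a few lines. In the sketch below, the function name, the property data, and the chosen slope and intercept are all illustrative assumptions.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical property sizes and actual prices.
sizes = np.array([50.0, 80.0, 120.0])
actual_prices = np.array([160.0, 270.0, 370.0])

# A candidate line y = m * x + b.
m, b = 3.0, 20.0
predicted_prices = m * sizes + b          # [170., 260., 380.]

mse = mean_squared_error(actual_prices, predicted_prices)
print(mse)  # 100.0
```

Each prediction is off by 10, so every squared difference is 100 and so is their average.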
Now let's look at minimizing the error with calculus. Here we want to find the best slope and bias (y-intercept) of the line.
Derivatives tell us how the error changes with respect to changes in the slope and the y-intercept. An iterative optimization algorithm called "gradient descent" uses them to update the slope and y-intercept in the direction that reduces the error.
There are 4 simple steps of gradient descent to perform:
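One common way to organize gradient descent for the line $y = mx + b$ is sketched below: start from an initial guess, compute the derivatives of the MSE with respect to the slope and intercept, step against them, and repeat. This breakdown, the learning rate, the step count, and the synthetic data are assumptions for illustration and may differ in detail from the steps described in this article.

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.05, n_steps=2000):
    """Fit y = m * x + b by minimizing the mean squared error."""
    m, b = 0.0, 0.0                                # start from an initial guess
    n = len(x)
    for _ in range(n_steps):                       # repeat the update many times
        error = (m * x + b) - y                    # prediction minus actual value
        grad_m = (2.0 / n) * np.sum(error * x)     # derivative of MSE w.r.t. m
        grad_b = (2.0 / n) * np.sum(error)         # derivative of MSE w.r.t. b
        m -= learning_rate * grad_m                # step against the gradient
        b -= learning_rate * grad_b
    return m, b

# Synthetic data generated from the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

m, b = gradient_descent(x, y)
```

On this data the result converges to a slope close to 2 and an intercept close to 1. A learning rate that is too large makes the updates diverge, so in practice it needs tuning.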
Real-world data used for analysis and prediction often carries inherent uncertainty. Probability and statistics help us draw conclusions from data and quantify that uncertainty, which makes them an important part of mathematics for machine learning.
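Probability theory quantifies the likelihood of an event occurring, and statistics is concerned with analyzing and interpreting data. A probability distribution describes how likely each possible value of a random variable is; for a discrete random variable, it lists each value together with its probability.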
For ease of understanding, let's take the example of a medical diagnosis and dissect the process in 5 parts.
Conditional Probability:
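In a diagnosis setting, conditional probability answers questions such as "given a positive test, how likely is the disease?" The sketch below applies Bayes' theorem to that question. The function name and all the numbers are hypothetical, chosen only to illustrate the calculation.

```python
def posterior_probability(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) computed with Bayes' theorem."""
    # Total probability of a positive test, over sick and healthy patients.
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive

# Hypothetical values, not taken from any real test.
prior = 0.01                 # P(disease): 1% of patients have the disease
sensitivity = 0.95           # P(positive | disease)
false_positive_rate = 0.05   # P(positive | no disease)

posterior = posterior_probability(prior, sensitivity, false_positive_rate)
print(round(posterior, 3))  # 0.161
```

Even with a fairly accurate test, the probability of disease after one positive result is only about 16% here, because the disease is rare to begin with.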
The mathematical understanding required varies from one job description to another, since different roles use different concepts. A machine learning engineer, for example, is responsible for designing, building, and deploying machine learning models. All three of these responsibilities require a strong foundation in mathematics, particularly in linear algebra, calculus, and probability, at a level somewhat beyond school math. These skills are used in various central machine learning methods like:
If this article ignited the mathematician inside you, the resources below will help you dive deeper into each of the topics covered here, and into the mathematics of machine learning in general, to expand your mathematical background.
This article was written by Kartikey Vyas, and edited by our writers team.