One of the most powerful uses of Artificial Intelligence today is image recognition. Broadly speaking, image recognition models fall into three subclasses: image classification, object detection, and image segmentation.
Before diving into the specifics of each of these and understanding image recognition technology, let us first try to understand how a computer (which only understands 0s and 1s) comprehends an image.
Images are made up of pixels, each of which is a very small illuminated area. You might have heard of an image being 1080p: this means the image is 1080 pixels tall, and a standard 1080p (Full HD) image is 1920 x 1080, or about 2.07 million pixels. The more pixels an image has, the higher its resolution.
Now let us take a black-and-white picture. It consists of only two colours, so each pixel is either black or white, and we can write it as 0 (black) or 1 (white). This is how a black-and-white image is represented in binary. But what about coloured images? After all, we aren't in the 90s!
We will apply the same logic, but instead of each pixel having one value (0 or 1), we assign a vector to each pixel. This is typically a Red-Green-Blue (RGB) vector, because screens produce the colours they display by mixing red, green, and blue light in different proportions; in a standard 8-bit image, each component ranges from 0 to 255.
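To make this concrete, here is a small NumPy sketch (assuming NumPy is installed; the tiny images are made up for illustration). It stores a black-and-white image as a 2D array of 0s and 1s, stores a coloured image as a height x width x 3 array of RGB values, and counts how many numbers a 1080p colour frame contains.

```python
import numpy as np

# A 5x5 black-and-white image of a white plus sign on a black background:
# every pixel is a single value, 0 (black) or 1 (white).
bw_image = np.array([
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
], dtype=np.uint8)

# A 2x2 colour image: every pixel is an [R, G, B] vector with components from 0 to 255.
rgb_image = np.zeros((2, 2, 3), dtype=np.uint8)
rgb_image[0, 0] = [255, 0, 0]      # red
rgb_image[0, 1] = [0, 255, 0]      # green
rgb_image[1, 0] = [0, 0, 255]      # blue
rgb_image[1, 1] = [255, 255, 0]    # red + green light = yellow

# A 1080p (Full HD) frame is 1920 pixels wide and 1080 pixels tall.
width, height, channels = 1920, 1080, 3
num_pixels = width * height            # 2,073,600 pixels
num_values = num_pixels * channels     # 6,220,800 numbers for an RGB frame

print(bw_image.shape, int(bw_image.sum()))  # (5, 5) 9 -> nine white pixels
print(rgb_image.shape)                      # (2, 2, 3)
print(num_pixels, num_values)               # 2073600 6220800
```

Keep in mind that OpenCV, which is used later in this article, loads colour images in BGR rather than RGB channel order.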
Now that we know how to assign numbers to an image, in the next section we will look at how a machine makes sense of an image from those numbers.
Let us talk about our favorite subject, Machine Learning! At their core, images are just raw data: numbers assigned to pixels. To get something useful out of these numbers, we need to follow certain steps:
And voilà! At the output of the model, your phone gets unlocked or the FBI catches a criminal!
A model that can understand our pixels is not the same as a normal neural network. A feedforward (fully connected) neural network does not factor in the spatial relationships between pixels: the image is flattened into one long vector, so the network has no built-in notion of which pixels are neighbors and treats each input on its own. You don't want your eyebrows alone to unlock your phone, right? (facial recognition pun)
Also, since a typical image consists of thousands or even millions of pixels (multiply by 3 for coloured images), a feedforward neural network would need one input per value and one weight per input for every neuron in its first layer, which is computationally very expensive.
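Here is a quick back-of-the-envelope comparison, using the standard formulas for the number of trainable parameters (weights plus biases) in a fully connected layer and in a convolutional layer. The 100x100 RGB input and the 64 units/filters are arbitrary example sizes.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    # Every unit has one weight per input plus one bias.
    return n_inputs * n_units + n_units


def conv_params(kernel_h: int, kernel_w: int, in_channels: int, filters: int) -> int:
    # Every filter has kernel_h * kernel_w * in_channels weights plus one bias,
    # and the same filter is reused at every position in the image.
    return kernel_h * kernel_w * in_channels * filters + filters


height, width, channels = 100, 100, 3
flattened_inputs = height * width * channels  # 30,000 input values

fc = dense_params(flattened_inputs, 64)   # 1,920,064 parameters
conv = conv_params(3, 3, channels, 64)    # 1,792 parameters, independent of image size
print(fc, conv)
```

The dense layer's cost grows with the number of pixels, while the convolutional layer's does not, which is one of the main motivations for the convolution operator.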
Convolutional Neural Networks (CNNs) solve these problems by introducing a new operator into the model architecture: convolution. Convolution is a mathematical operation used to extract features from the input data by applying a filter (also known as a kernel). Broadly, it involves sliding a filter (a small matrix of weights) over the input image and computing the dot product between the filter and each overlapping region of the image. This produces a feature map that highlights specific features of the input image, such as edges or textures. This process, called feature extraction, captures the relevant information from the image while greatly reducing the number of parameters required, because the same small filter is reused at every position.
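The sliding-window computation described above can be sketched in a few lines of NumPy. This is a minimal, unoptimized version assuming a single-channel image, no padding, and a stride of 1. Like most deep learning libraries, it does not flip the kernel (strictly speaking this is cross-correlation, which is what CNN "convolution" layers compute). The vertical-edge kernel is a common textbook example, not something taken from this article's models.

```python
import numpy as np


def convolve2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i:i + kh, j:j + kw]
            # Dot product between the filter and the overlapping region
            feature_map[i, j] = np.sum(region * kernel)
    return feature_map


# 4x6 grayscale image: dark left half (0), bright right half (1)
image = np.array([[0, 0, 0, 1, 1, 1]] * 4, dtype=float)

# Vertical edge detector: responds where intensity increases from left to right
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = convolve2d(image, kernel)
print(feature_map)
# [[0. 3. 3. 0.]
#  [0. 3. 3. 0.]]
```

The feature map is non-zero only where the window straddles the dark-to-bright boundary, which is the "highlights edges" behaviour described above. It is also smaller than the input (4x6 shrinks to 2x4) because no padding is used.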
Another important component of CNNs is the pooling layer. Pooling downsamples the feature maps produced by convolution, which reduces the amount of computation and the number of parameters needed in the later layers. Say you have a 3x3 region of a feature map: Max Pooling (one type of pooling) represents it as a single number, the maximum of all 9 values, while Average Pooling uses their average.
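Here is a minimal NumPy sketch of non-overlapping max and average pooling. It assumes the window size equals the stride and that leftover rows or columns that do not fill a whole window are dropped, which matches the defaults of Keras' MaxPooling2D. The input values are made up.

```python
import numpy as np


def pool2d(feature_map: np.ndarray, size: int = 2, mode: str = "max") -> np.ndarray:
    """Downsample a 2D feature map with non-overlapping size x size windows."""
    out_h, out_w = feature_map.shape[0] // size, feature_map.shape[1] // size
    # Drop rows/columns that do not fill a complete window, then split into blocks
    trimmed = feature_map[:out_h * size, :out_w * size]
    blocks = trimmed.reshape(out_h, size, out_w, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    raise ValueError(f"unknown pooling mode: {mode}")


fm = np.array([[1, 3, 2, 4],
               [5, 6, 7, 8],
               [9, 2, 1, 0],
               [3, 4, 5, 6]], dtype=float)

max_pooled = pool2d(fm, 2, "max")   # [[6, 8], [9, 6]]
avg_pooled = pool2d(fm, 2, "avg")   # [[3.75, 5.25], [4.5, 3.0]]

# The 3x3 example from the text: one window, one output number
window = np.arange(9, dtype=float).reshape(3, 3)   # values 0..8
print(max_pooled)
print(avg_pooled)
print(pool2d(window, 3, "max"), pool2d(window, 3, "avg"))   # [[8.]] [[4.]]
```

Each 2x2 pooling step halves the height and width of the feature map, so it shrinks the data passed to later layers by a factor of four.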
That's all! Repeat these two operations, add activation functions and a fully connected layer at the end, and we have successfully processed our data. But what exactly is the output? Let us find out in the next section.
Before we try to understand the output, we need to delve into the types of problems in computer vision. As mentioned above, there are primarily three image recognition tasks. Let us look at each one in detail.
The outputs of all three are, of course, numbers. For image classification, the numbers are the probabilities of the image belonging to each class; for object detection, the output is the location of each object as a bounding box; and for segmentation, it is a class label for every pixel. This is at the core of image recognition technology.
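To make the three kinds of output concrete, here is a NumPy sketch of what each might look like for a single image. All the numbers are hypothetical. The softmax function shown is the standard way a classifier turns raw scores into probabilities that sum to 1, and the (xmin, ymin, xmax, ymax) box layout matches the one used in the object detection example later on.

```python
import numpy as np


def softmax(scores: np.ndarray) -> np.ndarray:
    # Subtracting the max keeps exp() numerically stable without changing the result.
    exp = np.exp(scores - np.max(scores))
    return exp / exp.sum()


classes = ["cat", "dog"]

# 1) Classification: one probability per class.
class_probs = softmax(np.array([2.0, 0.5]))             # roughly [0.82, 0.18]
predicted_class = classes[int(np.argmax(class_probs))]  # "cat"

# 2) Object detection: where the object is, as a box in pixel coordinates.
box = np.array([34, 50, 180, 210])                       # (xmin, ymin, xmax, ymax)
box_width, box_height = box[2] - box[0], box[3] - box[1]  # 146 x 160 pixels

# 3) Segmentation: a class label for every pixel (0 = background, 1 = object).
mask = np.zeros((4, 4), dtype=int)
mask[1:3, 1:3] = 1                                       # a 2x2 object in the middle

print(predicted_class, class_probs.round(2))
print(box_width, box_height)
print(mask)
```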
Let us now use deep learning models for image recognition and build our own image recognition application. We will use TensorFlow with the Python programming language.
Enough theory; let us get our hands dirty with a good problem. For this example, we will do cat and dog classification with this training dataset.
Step 1: Set Up Your Development Environment
Before we begin, ensure you have Python and TensorFlow installed on your system. You can install TensorFlow using pip:
pip install tensorflow
Step 2: Collect the Dataset
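Extract the dataset to a directory named “dataset” in your project folder, and make sure it contains one subfolder per class, named “cat” and “dog”, because the preparation code reads images from dataset/cat and dataset/dog.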
Step 3: Prepare the Data
We need to preprocess the images before training the model. Create a Python script named prepare_data.py with the following code, which loads each image in grayscale, resizes it to 100x100 pixels, labels it, and shuffles the examples:
import os
import random
import cv2
import numpy as np
data_directory = "dataset"
categories = ["cat", "dog"]  # label 0 = cat, label 1 = dog
img_size = 100
training_data = []
def create_training_data():
    for category in categories:
        path = os.path.join(data_directory, category)
        class_num = categories.index(category)
        for img in os.listdir(path):
            try:
                # Load as a single grayscale channel and resize to img_size x img_size
                img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
                new_array = cv2.resize(img_array, (img_size, img_size))
                training_data.append([new_array, class_num])
            except Exception:
                # Unreadable files make cv2.imread return None and cv2.resize fail; skip them
                pass
create_training_data()
random.shuffle(training_data)
X = []
y = []
for features, label in training_data:
    X.append(features)
    y.append(label)
# Shape (num_images, 100, 100, 1), with pixel values scaled from 0-255 to 0-1
X = np.array(X, dtype="float32").reshape(-1, img_size, img_size, 1) / 255.0
y = np.array(y)
# Save the arrays so that image_classifier.py can load them
np.save("X.npy", X)
np.save("y.npy", y)
Step 4: Build the AI Model
Create a Python script named image_classifier.py and add the following code to build the model. It loads the prepared arrays and stacks two convolution + max-pooling blocks followed by fully connected layers:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten
# Load the arrays saved by prepare_data.py
X = np.load("X.npy")
y = np.load("y.npy")
model = Sequential()
model.add(Conv2D(64, (3, 3), input_shape=X.shape[1:], activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))  # one output: the probability that the image is a dog
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
Step 5: Train the Model
Now, let's train the model on the prepared data (still in image_classifier.py) and save it so we can reuse it later:
# Hold out 10% of the images for validation
model.fit(X, y, batch_size=32, epochs=10, validation_split=0.1)
model.save("image_classifier.keras")
Step 6: Test the Model
To test the model, create a Python script named test_model.py and use the following code:
import cv2
import tensorflow as tf
img_size = 100  # must match prepare_data.py
categories = ["cat", "dog"]
def prepare(filepath):
    img_array = cv2.imread(filepath, cv2.IMREAD_GRAYSCALE)
    new_array = cv2.resize(img_array, (img_size, img_size))
    # Same preprocessing as training: shape (1, 100, 100, 1), values scaled to 0-1
    return new_array.reshape(-1, img_size, img_size, 1) / 255.0
model = tf.keras.models.load_model("image_classifier.keras")
prediction = model.predict(prepare("test_image.jpg"))
# The sigmoid output is P(dog): threshold at 0.5 (int() alone would almost always give 0)
print(categories[int(prediction[0][0] > 0.5)])
That's it: we have successfully built a model that can classify cats and dogs!
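If you are curious where the size of the Step 4 model comes from, here is a small pure-Python sketch that traces the output shape and parameter count of each layer. It assumes Keras' defaults for these layers ("valid" padding and stride 1 for Conv2D, stride equal to the pool size for MaxPooling2D). It is a standalone illustration, not part of image_classifier.py; model.summary() should report the same numbers.

```python
def conv_out(size: int, kernel: int) -> int:
    # 'valid' convolution with stride 1 shrinks each side by kernel - 1
    return size - kernel + 1


def pool_out(size: int, pool: int) -> int:
    # Non-overlapping pooling; incomplete windows at the edge are dropped
    return size // pool


img_size, filters, kernel = 100, 64, 3

s1 = pool_out(conv_out(img_size, kernel), 2)   # 100 -> 98 -> 49
s2 = pool_out(conv_out(s1, kernel), 2)         # 49 -> 47 -> 23
flat = s2 * s2 * filters                       # 23 * 23 * 64 = 33,856 features

params = {
    "conv1": kernel * kernel * 1 * filters + filters,        # grayscale input: 640
    "conv2": kernel * kernel * filters * filters + filters,  # 36,928
    "dense1": flat * 64 + 64,                                # 2,166,848
    "dense2": 64 * 1 + 1,                                    # 65
}
total = sum(params.values())                   # 2,204,481

print(s1, s2, flat)
print(params, total)
```

Almost all of the roughly 2.2 million parameters sit in the first Dense layer after Flatten. That is why the pooling layers matter: without them, the flattened vector, and therefore that layer, would be far larger.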
Variability in Breeds:
Next, we will detect number plates in images of cars!
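Download the license plate images and their annotations from the Open Images Dataset V6 and organize them as follows (or work in Google Colab). The code below expects each annotation CSV to contain one row per box, with filename, xmin, ymin, xmax and ymax columns in pixel coordinates, so you may need to convert the downloaded annotations to this layout first.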
/data
    /train
        - img1.jpg
        - img2.jpg
        ...
    /val
        - img1.jpg
        - img2.jpg
        ...
    /test
        - img1.jpg
        - img2.jpg
        ...
/annotations
    - train_annotations.csv
    - val_annotations.csv
    - test_annotations.csv
Install necessary libraries:
pip install tensorflow keras opencv-python pandas matplotlib
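Detectors such as YOLO (You Only Look Once) or SSD (Single Shot MultiBox Detector) are the usual choice for this task. To keep this example short, though, we'll build a simplified detector instead: a pre-trained MobileNetV2 backbone with a small regression head that predicts a single bounding box per image. It is not a real YOLO model, but it shows the same idea of predicting box coordinates from pixels.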
Here's the simplified code to preprocess the images and train this model with TensorFlow/Keras:
import os
import cv2
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
IMG_SIZE = 224  # input resolution expected by the MobileNetV2 backbone below
def preprocess(img):
    # Resize to the network's input size, convert OpenCV's BGR to RGB, scale pixels to [-1, 1]
    img = cv2.cvtColor(cv2.resize(img, (IMG_SIZE, IMG_SIZE)), cv2.COLOR_BGR2RGB)
    return tf.keras.applications.mobilenet_v2.preprocess_input(img.astype(np.float32))
def load_dataset(image_dir, annotations_file):
    annotations = pd.read_csv(annotations_file)
    images = []
    boxes = []
    for _, row in annotations.iterrows():
        img = cv2.imread(os.path.join(image_dir, row['filename']))
        if img is None:  # skip missing or unreadable files
            continue
        h, w = img.shape[:2]
        images.append(preprocess(img))
        # Normalize box coordinates to [0, 1] so they match the sigmoid output layer
        boxes.append([row['xmin'] / w, row['ymin'] / h, row['xmax'] / w, row['ymax'] / h])
    return np.array(images), np.array(boxes, dtype=np.float32)
train_images, train_boxes = load_dataset('/data/train', '/annotations/train_annotations.csv')
val_images, val_boxes = load_dataset('/data/val', '/annotations/val_annotations.csv')
# A simple single-box regressor on a pre-trained MobileNetV2 backbone (for demonstration; not YOLO)
backbone = tf.keras.applications.MobileNetV2(input_shape=(IMG_SIZE, IMG_SIZE, 3), include_top=False, weights='imagenet')
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation='sigmoid')  # (xmin, ymin, xmax, ymax), normalized
])
# Bounding-box regression: MSE loss; mean absolute error is more meaningful here than accuracy
model.compile(optimizer=Adam(learning_rate=0.001), loss='mse', metrics=['mae'])
checkpoint = ModelCheckpoint('best_model.keras', monitor='val_loss', save_best_only=True, mode='min')
early_stopping = EarlyStopping(monitor='val_loss', patience=10, mode='min')
# Train the model
model.fit(train_images, train_boxes, validation_data=(val_images, val_boxes), epochs=50, batch_size=8, callbacks=[checkpoint, early_stopping])
Evaluate the model on the test set:
test_images, test_boxes = load_dataset('/data/test', '/annotations/test_annotations.csv')
model.evaluate(test_images, test_boxes)
Run inference on new images and visualize the results:
def draw_boxes(image, boxes):
    h, w = image.shape[:2]
    for box in boxes:
        # Predicted boxes are normalized; scale them back to this image's pixel coordinates
        x1, y1, x2, y2 = int(box[0] * w), int(box[1] * h), int(box[2] * w), int(box[3] * h)
        cv2.rectangle(image, (x1, y1), (x2, y2), (255, 0, 0), 2)
    return image
# Load and preprocess the new image exactly like the training images
new_image = cv2.imread('new_image.jpg')
input_image = np.expand_dims(preprocess(new_image), axis=0)
# Predict bounding box
predicted_box = model.predict(input_image)
# Draw predicted box on the image
output_image = draw_boxes(new_image, predicted_box)
cv2.imshow('Output', output_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
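Because the detection model is trained with mean squared error, the loss reported by model.evaluate is hard to interpret. Detection work commonly reports Intersection over Union (IoU) instead: the area where the predicted and true boxes overlap, divided by the area of their union. Here is a minimal sketch, assuming both boxes are (xmin, ymin, xmax, ymax) in the same coordinate system (both in pixels or both normalized); the example boxes are made up.

```python
def iou(box_a, box_b) -> float:
    """Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    # Corners of the overlapping rectangle (if any)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


true_box = (10, 10, 50, 30)       # 40 x 20 = 800 px^2
predicted = (20, 10, 60, 30)      # shifted 10 px to the right
print(iou(true_box, predicted))   # overlap 30 x 20 = 600; union 1000 -> 0.6
print(iou(true_box, (100, 100, 120, 120)))  # no overlap -> 0.0
```

To score a whole test set, you could compute this for each pair of rows in model.predict(test_images) and test_boxes and average the results. An IoU of 1.0 means a perfect match.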
Done: we have successfully performed object detection as well!
Image Segmentation is the third branch of Computer Vision and image recognition technology, and it addresses a major drawback of object detection. While object detection is proficient at identifying and locating objects within an image, it falls short in providing detailed information about the shape and boundaries of these objects. Image segmentation overcomes this limitation by partitioning the image into segments, allowing each pixel to be classified into a specific object or region. This granular approach enables more precise analysis and understanding of the visual content, making it invaluable for applications such as medical imaging, autonomous driving, and image editing.
The output of image segmentation significantly differs from object detection. Instead of generating bounding boxes around objects, image segmentation provides a mask that delineates the exact shape of each object within the image. This mask is typically a binary or multi-class matrix where each pixel is assigned a class label, corresponding to the object or background it belongs to.
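To model the output for image segmentation, we need to modify the architecture of the neural network, particularly the decoder, so that it performs per-pixel classification: the encoder compresses the image into small feature maps, and the decoder upsamples them back to the input resolution, producing one score per class for every pixel. Each pixel is then assigned the class with the highest score.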
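Here is a minimal NumPy sketch of the decoder's final step. It assumes the network outputs a coarse grid of class scores (here 2x2, with two hypothetical classes: 0 = background, 1 = number plate). The grid is upsampled back to image resolution by nearest-neighbour repetition, which is what Keras' UpSampling2D layer does with its default interpolation, and then each pixel gets the class with the highest score. Real decoders such as U-Net learn the upsampling and combine it with encoder features, so treat this only as an illustration of per-pixel classification.

```python
import numpy as np


def upsample_nearest(score_map: np.ndarray, factor: int) -> np.ndarray:
    # Repeat every cell `factor` times along height and width: (H, W, C) -> (H*f, W*f, C)
    return score_map.repeat(factor, axis=0).repeat(factor, axis=1)


def scores_to_mask(score_map: np.ndarray) -> np.ndarray:
    # Per-pixel classification: keep the index of the highest-scoring class at each pixel
    return np.argmax(score_map, axis=-1)


# Hypothetical decoder output for a tiny image: 2x2 cells, 2 class scores per cell
coarse_scores = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.6, 0.4]],
])

full_scores = upsample_nearest(coarse_scores, 2)  # shape (4, 4, 2)
mask = scores_to_mask(full_scores)                # shape (4, 4), one label per pixel
print(mask)
# [[0 0 1 1]
#  [0 0 1 1]
#  [0 0 0 0]
#  [0 0 0 0]]
```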
And now, for a fun twist: Let’s keep this one as a homework assignment! Using the same dataset, try extracting the number plate of a car pixel by pixel. Once you have it, you can even replace it with another number (but don’t try this at home, folks)!
Images now lead in terms of digital content. They have become so important that a large part of Artificial Intelligence is dedicated to analyzing pictures, and image recognition is now applied to many activities and purposes.
Autonomous vehicles are a true revolution. To a lot of people they still seem futuristic: cars that drive their passengers around without anyone touching the steering wheel or the pedals. With the help of cameras all around the vehicle, radars, and sensors, the car can determine which elements are present in its surroundings and predict their trajectories or actions. The neural networks within the program analyze the pixel patterns in the camera images and can tell whether the object on the right-hand side is a bicycle, and whether it is coming towards the car or moving away from it. Self-driving cars also detect and identify traffic signs and signals, trees, pathways, and pedestrians.
Home security has become a major concern for people as well as insurance companies. Robberies happen every day, and many individuals have decided to tackle the problem by installing cameras and security alarms all over their homes and surrounding areas. This kind of camera surveillance has proven very effective for many people. Most of the time, the footage is used to show the police or the insurance company that a thief did break into the house and steal something, but it is also used to detect fraud. On another note, CCTV cameras are increasingly installed in big cities to spot antisocial behavior and vandalism, for instance. Stores also use camera images to catch shoplifters in action and provide the police with proof of the offense. Lastly, airport security agents use this kind of camera to detect suspicious behavior, perform facial recognition, and identify potential threats such as unattended bags. It is a complex task, but Machine Learning has made it possible.
Medical staff members increasingly appreciate the application of AI in their field. On X-rays, for instance, object detection models trained on annotated images can detect and draw bounding boxes around fractures, abnormalities, or even tumors. Thanks to object detection and image preprocessing, doctors can give their patients diagnoses more rapidly and more accurately. They can check whether a treatment is working, and they can even estimate the age of certain bones.
Since the beginning of the COVID-19 pandemic and the lockdowns it brought, people have been ordering all kinds of items online (clothes, glasses, food, etc.). Some companies have developed their own image recognition algorithms for their specific activities. Online shoppers can now try on clothes or glasses virtually: they just take a video or a picture of their face or body and try on the items they choose directly through their smartphones. This way, customers can see how the items look on them and then order the ones they are interested in. Online shoppers also receive suggestions of pieces of clothing they might enjoy, based on what they have searched for, purchased, or shown interest in.
Farmers are always looking for new ways to improve their working conditions. Taking care of both their cattle and their plantations can be time-consuming and difficult. Today, more and more of them use AI and image recognition to improve the way they work. Cameras inside the buildings allow them to monitor the animals and make sure everything is fine. When an animal is giving birth, farmers can easily see if it is having difficulty delivering and quickly come to help. Farmers also have to look after the health of their plantations: object detection helps them analyze the condition of the plants and gives them indications on how to improve or save the crops, which they need to feed their cattle.
The first industry is somewhat obvious, given our application: fitness and wellness are a perfect match for image recognition and pose estimation systems.
Image recognition fitness apps can give a user some tips on how to improve their yoga asanas, watch the user’s posture during the exercises, and even minimize the possibility of injury for elderly fitness lovers.
While YouTube tutorials can only show how to perform an exercise, human pose recognition apps go much further and help users improve their performance. How many of us have gone to an in-person training session just to get some feedback and make sure we are not exercising in vain?
Image recognition works well for manufacturers and B2B retailers too. Think of a batch of milk that has to be recalled: that could be avoided with a better quality assurance system aided by image recognition.
For example, an image recognition (IR) algorithm can visually evaluate the quality of fruit and vegetables, so that those that no longer look fresh won't be shipped to retailers. Producers can also use IR in the packaging process to locate damaged or deformed items. What is more, IR makes it easy to count the number of items inside a package; a pharmaceutical company, for example, needs to know how many tablets are in each bottle.
The use of IR in manufacturing isn't limited to quality control. If you have a warehouse or just a small storage space, an image recognition system makes it much easier to keep everything organized. For instance, drones can scan products and pallets to locate misplaced items.
What about medtech? Image recognition can be applied to dermatology images, X-rays, tomography, and ultrasound scans. Classifying these images can improve telemedicine and the monitoring of treatment outcomes, which can lower hospital readmission rates and lead to better patient care.
For example, IR technology can help with cancer screenings. Medical image analysis, which is now also used to monitor tumors throughout the course of treatment, is a true revolution.
This article was written by Zohair Badshah, a former member of our software team, and edited by our writers team.