CNN 102: Building an AI That Can See 👁️
Prashant Basnet
Nov 3, 2024
Why Should You Care About CNNs?
Have you ever wondered how your phone recognizes faces in photos? Or how self-driving cars can tell the difference between a pedestrian and a street sign? Welcome to the fascinating world of Convolutional Neural Networks (CNNs) – the eyes of artificial intelligence.
While our previous exploration of Artificial Neural Networks (ANNs) showed how computers can make decisions like a human brain, CNNs take this further by giving computers the ability to understand visual information like the human eye.
What will we cover in this article?
In this guide, we'll build a CNN that can distinguish between cats and dogs – a task that seems trivial for humans but represents a significant challenge in computer vision. We'll break down each step into simple, digestible pieces, just like we did with ANNs.
By the end of this tutorial, you'll understand:
We will be using tensorflow.keras for building our neural network, since a neural network is essentially a sequence of layers.
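As a minimal sketch of the setup (assuming TensorFlow 2.x, which ships Keras as tf.keras):

```python
# Assumption: TensorFlow 2.x is installed; it bundles Keras as tf.keras.
import tensorflow as tf

# Everything below builds on the Sequential class: a plain stack of layers.
Sequential = tf.keras.models.Sequential
```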
1. Data Preprocessing:
Before we feed images into our CNN, we need to prepare them properly - like washing ingredients before cooking! Good preprocessing can make the difference between a model that learns well and one that fails to learn at all.
Why do we process our data?
A simple example: Imagine teaching a child to recognize dogs:
If you only show them perfectly straight, front-view dog photos, they might not recognize a dog when it's:
This is what image_transformer does - it creates these variations automatically so our model learns to recognize objects in different situations!
Here's what ImageDataGenerator really does:
The test set is different: it's like taking the actual exam under standard conditions, with no special variations, testing what you actually learned in a real-world setting.
Why do we apply these transformations?
We apply these transformations to the training set only. If we skip them, we'll see a huge difference between training and test accuracy; in other words, overfitting. In computer vision, augmenting images with such transformations is a standard way to avoid it.
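The augmentation above can be sketched with Keras's ImageDataGenerator (deprecated in newer Keras versions in favor of image_dataset_from_directory, but matching this article's approach). The directory paths below are assumptions for illustration; note that only the training generator gets the geometric transformations, while the test generator only rescales:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Training images: rescale pixels to [0, 1] and add random shears, zooms,
# and horizontal flips so the model sees dogs and cats in varied poses.
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
)

# Test images: only rescale -- "standard exam conditions", no variations.
test_datagen = ImageDataGenerator(rescale=1./255)

# Hypothetical directory layout: dataset/training_set/{cats,dogs}/...
# training_set = train_datagen.flow_from_directory(
#     'dataset/training_set', target_size=(64, 64),
#     batch_size=32, class_mode='binary')
# test_set = test_datagen.flow_from_directory(
#     'dataset/test_set', target_size=(64, 64),
#     batch_size=32, class_mode='binary')
```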
2. Build the CNN (Like Building with LEGO!)
Think of this like getting an empty box to build your LEGO model in. We instantiate our neural network from the Sequential class: a neural network is basically a sequence of layers of neurons, so we work with an object of the Sequential class.
a. Instantiating an object of the Sequential class:
b. Adding the first layer to our network:
We now have a Sequential instance; how do we add a layer? Our very first convolutional layer is again an object of a certain class, the Conv2D class from the layers module in Keras.
What parameters does Conv2D take?
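Putting steps a and b together, a minimal sketch (the filter count of 32 and the 64×64×3 input size are common choices for this tutorial setup, not requirements):

```python
import tensorflow as tf

# Step a: an empty Sequential model -- our "empty LEGO box".
cnn = tf.keras.models.Sequential()

# Step b: the first convolutional layer.
#   filters=32        -> learn 32 different feature detectors
#   kernel_size=3     -> each detector is a 3x3 window
#   activation='relu' -> keep positive responses, zero out the rest
#   input_shape       -> 64x64 RGB images (only needed on the first layer)
cnn.add(tf.keras.layers.Conv2D(
    filters=32, kernel_size=3, activation='relu',
    input_shape=[64, 64, 3]))
```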
3. Applying Pooling:
After finding patterns (convolution), we simplify the information: like taking a big photo and making it smaller while keeping the important details.
Real-world Example: Imagine you have a 4x4 photo of a cat's eye:
It's like looking at a city from an airplane: you don't see every house, but you still see the important landmarks. The big picture remains clear!
We are going to apply max pooling by adding a pooling layer to our Sequential model. It is again an instance of a certain class, the MaxPool2D class from the layers module.
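Here is a hedged sketch of both the idea and the code: first, max pooling on a toy 4×4 input (the numbers are made up for illustration), then the one-liner that adds the layer to a model like ours:

```python
import tensorflow as tf

# A toy 4x4 "image": batch of 1, one channel.
x = tf.constant([[1., 3., 2., 4.],
                 [5., 6., 7., 8.],
                 [3., 2., 1., 0.],
                 [1., 2., 3., 4.]])
x = tf.reshape(x, (1, 4, 4, 1))

# pool_size=2, strides=2: slide a 2x2 window and keep only the maximum.
pool = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)
y = tf.reshape(pool(x), (2, 2))
print(y.numpy())  # each 2x2 block reduced to its largest value

# In our network it is just one more line:
# cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
```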
4. Adding a second convolutional layer:
We simply copy and paste our code from step 2.b, then remove input_shape, since it is only needed to connect the first layer to the input.
Then we apply pooling to this layer again:
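The full stack so far, as a sketch; note the second Conv2D has no input_shape:

```python
import tensorflow as tf

cnn = tf.keras.models.Sequential()
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3,
                               activation='relu', input_shape=[64, 64, 3]))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

# Second convolution + pooling: same code, minus input_shape.
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu'))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
```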
5. Applying Flattening:
Like unrolling a rolled-up poster. Flattening converts our 2D image data into a single line.
After convolution and pooling layers, we have 3D feature maps (height × width × channels), but dense layers expect 1D input vectors. Flattening converts the 3D output into a 1D vector.
Real-world Analogy: Imagine a Courtroom 👨‍⚖️
Flattening is like laying all the evidence in a line on a table: everything is visible at once.
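A sketch of flattening, with the shape arithmetic spelled out under the 64×64-input assumption used above:

```python
import tensorflow as tf

cnn = tf.keras.models.Sequential()
cnn.add(tf.keras.layers.Conv2D(32, 3, activation='relu',
                               input_shape=[64, 64, 3]))  # -> 62x62x32
cnn.add(tf.keras.layers.MaxPool2D(2, 2))                  # -> 31x31x32
cnn.add(tf.keras.layers.Conv2D(32, 3, activation='relu')) # -> 29x29x32
cnn.add(tf.keras.layers.MaxPool2D(2, 2))                  # -> 14x14x32

# Unroll the 14x14x32 feature maps into one 1D vector of 6272 values.
cnn.add(tf.keras.layers.Flatten())
```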
6. How do we make it a fully connected network?
a: Adding the hidden layer, the actual brain that makes decisions. It's like connecting all the information to reach a final verdict. We use the Dense class from Keras, and the parameters it takes are:
Analogy of Courtroom 👨‍⚖️:
The activation='relu' part is like each jury member deciding "Yes, this evidence is important" or "No, this isn't relevant": relu keeps positive signals and turns everything else to zero.
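The jury, in code (128 units is a common but arbitrary choice here, not something the article's data mandates):

```python
import tensorflow as tf

# A standalone Dense layer demo:
#   units=128         -> 128 "jury members"
#   activation='relu' -> each keeps positive evidence, zeroes the rest
layer = tf.keras.layers.Dense(units=128, activation='relu')

# In the full model it would follow the Flatten layer:
# cnn.add(tf.keras.layers.Dense(units=128, activation='relu'))

# ReLU itself: negatives become 0, positives pass through unchanged.
x = tf.constant([[-2.0, 0.5, 3.0]])
print(tf.keras.activations.relu(x).numpy())
```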
b: Adding Output Layer:
Since we are doing binary classification, determining whether the image is a dog or a cat, we picked the sigmoid function. For multiclass classification, we would use the softmax function instead.
Why sigmoid not 'relu'?
Think of it like a confidence dial: sigmoid squashes any score into a number between 0 and 1 that we can read as a probability, while relu can output arbitrarily large values that have no probabilistic meaning.
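A sketch of the output layer, plus a quick numerical check that sigmoid really stays between 0 and 1:

```python
import tensorflow as tf

# One output neuron: a single probability, "is this a dog?" (or cat).
output_layer = tf.keras.layers.Dense(units=1, activation='sigmoid')
# cnn.add(output_layer)

# Sigmoid squashes any real number into the open interval (0, 1):
x = tf.constant([-10.0, 0.0, 10.0])
print(tf.keras.activations.sigmoid(x).numpy())  # close to [0, 0.5, 1]
```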
7. Train the Model
Our model will now start learning from the data, iterating 25 times as specified by our epochs parameter.
Training will take some time; once it finishes, the model should be able to tell you whether an image is a dog or a cat.
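A hedged sketch of compiling and training. The generator-based fit call is commented out because it assumes the dataset directories from earlier; the tiny random batch below only demonstrates the mechanics, not real learning:

```python
import numpy as np
import tensorflow as tf

cnn = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=[64, 64, 3]),
    tf.keras.layers.MaxPool2D(2, 2),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPool2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# adam: adaptive optimizer; binary_crossentropy: the loss for yes/no labels.
cnn.compile(optimizer='adam', loss='binary_crossentropy',
            metrics=['accuracy'])

# With the real generators it would be (25 epochs, as in the article):
# cnn.fit(x=training_set, validation_data=test_set, epochs=25)

# Mechanics demo on 4 fake random images (labels 0 = cat, 1 = dog, assumed):
x = np.random.rand(4, 64, 64, 3).astype('float32')
y = np.array([0, 1, 0, 1], dtype='float32')
history = cnn.fit(x, y, epochs=1, verbose=0)
```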
How can we feed the image to our model?
This is how we can use our model.
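A sketch of single-image prediction. The file path is hypothetical, so a random array stands in for a real photo and the snippet runs end to end; the class-to-index mapping (cat = 0, dog = 1) is also an assumption, and in practice you would check training_set.class_indices:

```python
import numpy as np
import tensorflow as tf

# A small model stands in for the trained `cnn` from earlier.
cnn = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=[64, 64, 3]),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# With a real file you would load and resize it to match the input:
# from tensorflow.keras.preprocessing import image
# test_image = image.load_img('dataset/single_prediction/some_image.jpg',
#                             target_size=(64, 64))
# test_image = image.img_to_array(test_image)
test_image = np.random.rand(64, 64, 3).astype('float32')  # stand-in pixels

# The model expects a batch, so add a batch dimension: (1, 64, 64, 3).
test_image = np.expand_dims(test_image, axis=0)

result = cnn.predict(test_image, verbose=0)
# Sigmoid output is a probability; threshold at 0.5 (cat=0, dog=1 assumed).
prediction = 'dog' if result[0][0] > 0.5 else 'cat'
print(prediction)
```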
To Conclude:
While our cat-dog classifier is just the beginning, it opens the door to understanding how modern AI sees and interprets the visual world around us.
Every advanced AI system started with fundamental concepts like what we've learned here. The journey from distinguishing cats from dogs to detecting diseases or driving cars is just a matter of scale and refinement.
#MachineLearning #ComputerVision #CNN #DeepLearning #AI #ArtificialIntelligence #DataScience #NeuralNetworks #ImageProcessing #TensorFlow #Keras #Python #Programming #Tech #Technology #CodingTutorial #AIEducation #TechTutorial #ComputerScience #CatsVsDogs #ImageClassification #ConvolutionalNeuralNetworks #LearnAI #LearnML #AITutorial #PythonProgramming #AITechnology #TechInnovation