CNN 102: Building an AI That Can See 👁️
Prashant Basnet
Nov 3, 2024
Why Should You Care About CNNs?
Have you ever wondered how your phone recognizes faces in photos? Or how self-driving cars can tell the difference between a pedestrian and a street sign? Welcome to the fascinating world of Convolutional Neural Networks (CNNs) – the eyes of artificial intelligence.
While our previous exploration of Artificial Neural Networks (ANNs) showed how computers can make decisions like a human brain, CNNs take this further by giving computers the ability to understand visual information like the human eye.
What will we cover in this article?
In this guide, we'll build a CNN that can distinguish between cats and dogs – a task that seems trivial for humans but represents a significant challenge in computer vision. We'll break down each step into simple, digestible pieces, just like we did with ANNs.
By the end of this tutorial, you'll understand:
We will be using tensorflow.keras for building our neural network, since a neural network is essentially a sequence of layers.
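As a minimal sketch of the setup (assuming TensorFlow 2.x, which ships Keras as tf.keras):

```python
# Assumption: TensorFlow 2.x is installed; it bundles Keras as tf.keras.
import tensorflow as tf

# Everything below builds on the Sequential class: a plain stack of layers.
Sequential = tf.keras.models.Sequential
```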
1. Data Preprocessing:
Before we feed images into our CNN, we need to prepare them properly - like washing ingredients before cooking! Good preprocessing can make the difference between a model that learns well and one that fails to learn at all.
Why do we process our data?
A simple example: Imagine teaching a child to recognize dogs:
If you only show them perfectly straight, front-view dog photos, they might not recognize a dog when it's:
This is what image_transformer does - it creates these variations automatically so our model learns to recognize objects in different situations!
Here's what ImageDataGenerator really does:
The test set is different: it's like taking the actual exam under standard conditions, with no special variations, testing what you actually learned in a real-world setting.
Why do we apply these transformations?
We apply these transformations to the training set only. If we skip them, we'll see a huge difference between training and test accuracy; in other words, overfitting. In computer vision, augmenting images with such transformations is a standard way to avoid it.
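The augmentation above can be sketched with Keras's ImageDataGenerator (deprecated in newer Keras versions in favor of image_dataset_from_directory, but matching this article's approach). The directory paths below are assumptions for illustration; note that only the training generator gets the geometric transformations, while the test generator only rescales:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Training images: rescale pixels to [0, 1] and add random shears, zooms,
# and horizontal flips so the model sees dogs and cats in varied poses.
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
)

# Test images: only rescale -- "standard exam conditions", no variations.
test_datagen = ImageDataGenerator(rescale=1./255)

# Hypothetical directory layout: dataset/training_set/{cats,dogs}/...
# training_set = train_datagen.flow_from_directory(
#     'dataset/training_set', target_size=(64, 64),
#     batch_size=32, class_mode='binary')
# test_set = test_datagen.flow_from_directory(
#     'dataset/test_set', target_size=(64, 64),
#     batch_size=32, class_mode='binary')
```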
2. Build the CNN (Like Building with LEGO!)
Think of this like getting an empty box to build your LEGO model in. We instantiate our neural network from the Sequential class: a neural network is basically a sequence of layers of neurons, so we work with an object of the Sequential class.
a. Instantiating an object of the Sequential class:
b. Adding the first layer to our network:
We now have a Sequential instance; how do we add a layer? Our very first convolutional layer is again an object of a certain class, the Conv2D class from the layers module in Keras.
What parameters does Conv2D take?
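Putting steps a and b together, a minimal sketch (the filter count of 32 and the 64×64×3 input size are common choices for this tutorial setup, not requirements):

```python
import tensorflow as tf

# Step a: an empty Sequential model -- our "empty LEGO box".
cnn = tf.keras.models.Sequential()

# Step b: the first convolutional layer.
#   filters=32        -> learn 32 different feature detectors
#   kernel_size=3     -> each detector is a 3x3 window
#   activation='relu' -> keep positive responses, zero out the rest
#   input_shape       -> 64x64 RGB images (only needed on the first layer)
cnn.add(tf.keras.layers.Conv2D(
    filters=32, kernel_size=3, activation='relu',
    input_shape=[64, 64, 3]))
```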
3. Applying Pooling:
After finding patterns (convolution), we simplify the information: like taking a big photo and making it smaller while keeping the important details.
Real-world Example: Imagine you have a 4x4 photo of a cat's eye:
It's like looking at a city from an airplane: you don't see every house, but you still see the important landmarks. The big picture remains clear!
We are going to apply max pooling by adding a pooling layer to our Sequential model. It is again an instance of a certain class, the MaxPool2D class from the layers module.
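Here is a hedged sketch of both the idea and the code: first, max pooling on a toy 4×4 input (the numbers are made up for illustration), then the one-liner that adds the layer to a model like ours:

```python
import tensorflow as tf

# A toy 4x4 "image": batch of 1, one channel.
x = tf.constant([[1., 3., 2., 4.],
                 [5., 6., 7., 8.],
                 [3., 2., 1., 0.],
                 [1., 2., 3., 4.]])
x = tf.reshape(x, (1, 4, 4, 1))

# pool_size=2, strides=2: slide a 2x2 window and keep only the maximum.
pool = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)
y = tf.reshape(pool(x), (2, 2))
print(y.numpy())  # each 2x2 block reduced to its largest value

# In our network it is just one more line:
# cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
```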
4. Adding a second convolutional layer:
We simply copy and paste our code from step 2.b, then remove input_shape, since it is only needed to connect the first layer to the input.
Then we apply pooling to this layer again:
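The full stack so far, as a sketch; note the second Conv2D has no input_shape:

```python
import tensorflow as tf

cnn = tf.keras.models.Sequential()
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3,
                               activation='relu', input_shape=[64, 64, 3]))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

# Second convolution + pooling: same code, minus input_shape.
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu'))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
```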
5. Applying Flattening:
Like unrolling a rolled-up poster. Flattening converts our 2D image data into a single line.
After convolution and pooling layers, we have 3D feature maps (height × width × channels), but dense layers expect 1D input vectors. Flattening converts the 3D output into a 1D vector.
Real-world Analogy: Imagine a Courtroom 👨‍⚖️
Flattening is like laying all the evidence in a line on a table: everything is visible at once.
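A sketch of flattening, with the shape arithmetic spelled out under the 64×64-input assumption used above:

```python
import tensorflow as tf

cnn = tf.keras.models.Sequential()
cnn.add(tf.keras.layers.Conv2D(32, 3, activation='relu',
                               input_shape=[64, 64, 3]))  # -> 62x62x32
cnn.add(tf.keras.layers.MaxPool2D(2, 2))                  # -> 31x31x32
cnn.add(tf.keras.layers.Conv2D(32, 3, activation='relu')) # -> 29x29x32
cnn.add(tf.keras.layers.MaxPool2D(2, 2))                  # -> 14x14x32

# Unroll the 14x14x32 feature maps into one 1D vector of 6272 values.
cnn.add(tf.keras.layers.Flatten())
```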
6. How do we make it a fully connected network?
a: Adding the hidden layer, the actual brain that makes decisions. It's like connecting all the information to reach a final verdict. We use the Dense class from Keras, and the parameters it takes are:
Analogy of Courtroom 👨‍⚖️:
The activation='relu' part is like each jury member deciding "Yes, this evidence is important" or "No, this isn't relevant": relu keeps positive signals and turns everything else to zero.
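The jury, in code (128 units is a common but arbitrary choice here, not something the article's data mandates):

```python
import tensorflow as tf

# A standalone Dense layer demo:
#   units=128         -> 128 "jury members"
#   activation='relu' -> each keeps positive evidence, zeroes the rest
layer = tf.keras.layers.Dense(units=128, activation='relu')

# In the full model it would follow the Flatten layer:
# cnn.add(tf.keras.layers.Dense(units=128, activation='relu'))

# ReLU itself: negatives become 0, positives pass through unchanged.
x = tf.constant([[-2.0, 0.5, 3.0]])
print(tf.keras.activations.relu(x).numpy())
```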
b: Adding Output Layer:
Since we are doing binary classification, determining whether the image is a dog or a cat, we picked the sigmoid function. For multiclass classification, we would use the softmax function instead.
Why sigmoid not 'relu'?
Think of it like a confidence dial: sigmoid squashes any score into a number between 0 and 1 that we can read as a probability, while relu can output arbitrarily large values that have no probabilistic meaning.
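A sketch of the output layer, plus a quick numerical check that sigmoid really stays between 0 and 1:

```python
import tensorflow as tf

# One output neuron: a single probability, "is this a dog?" (or cat).
output_layer = tf.keras.layers.Dense(units=1, activation='sigmoid')
# cnn.add(output_layer)

# Sigmoid squashes any real number into the open interval (0, 1):
x = tf.constant([-10.0, 0.0, 10.0])
print(tf.keras.activations.sigmoid(x).numpy())  # close to [0, 0.5, 1]
```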
7. Train the Model
Our model will now start learning from the data, iterating 25 times as specified by our epochs parameter.
Training will take some time; once it finishes, the model should be able to tell you whether an image is a dog or a cat.
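A hedged sketch of compiling and training. The generator-based fit call is commented out because it assumes the dataset directories from earlier; the tiny random batch below only demonstrates the mechanics, not real learning:

```python
import numpy as np
import tensorflow as tf

cnn = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=[64, 64, 3]),
    tf.keras.layers.MaxPool2D(2, 2),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPool2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# adam: adaptive optimizer; binary_crossentropy: the loss for yes/no labels.
cnn.compile(optimizer='adam', loss='binary_crossentropy',
            metrics=['accuracy'])

# With the real generators it would be (25 epochs, as in the article):
# cnn.fit(x=training_set, validation_data=test_set, epochs=25)

# Mechanics demo on 4 fake random images (labels 0 = cat, 1 = dog, assumed):
x = np.random.rand(4, 64, 64, 3).astype('float32')
y = np.array([0, 1, 0, 1], dtype='float32')
history = cnn.fit(x, y, epochs=1, verbose=0)
```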
How can we feed the image to our model?
This is how we can use our model.
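A sketch of single-image prediction. The file path is hypothetical, so a random array stands in for a real photo and the snippet runs end to end; the class-to-index mapping (cat = 0, dog = 1) is also an assumption, and in practice you would check training_set.class_indices:

```python
import numpy as np
import tensorflow as tf

# A small model stands in for the trained `cnn` from earlier.
cnn = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=[64, 64, 3]),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# With a real file you would load and resize it to match the input:
# from tensorflow.keras.preprocessing import image
# test_image = image.load_img('dataset/single_prediction/some_image.jpg',
#                             target_size=(64, 64))
# test_image = image.img_to_array(test_image)
test_image = np.random.rand(64, 64, 3).astype('float32')  # stand-in pixels

# The model expects a batch, so add a batch dimension: (1, 64, 64, 3).
test_image = np.expand_dims(test_image, axis=0)

result = cnn.predict(test_image, verbose=0)
# Sigmoid output is a probability; threshold at 0.5 (cat=0, dog=1 assumed).
prediction = 'dog' if result[0][0] > 0.5 else 'cat'
print(prediction)
```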
To Conclude:
While our cat-dog classifier is just the beginning, it opens the door to understanding how modern AI sees and interprets the visual world around us.
Every advanced AI system started with fundamental concepts like what we've learned here. The journey from distinguishing cats from dogs to detecting diseases or driving cars is just a matter of scale and refinement.
#MachineLearning #ComputerVision #CNN #DeepLearning #AI #ArtificialIntelligence #DataScience #NeuralNetworks #ImageProcessing #TensorFlow #Keras #Python #Programming #Tech #Technology #CodingTutorial #AIEducation #TechTutorial #ComputerScience #CatsVsDogs #ImageClassification #ConvolutionalNeuralNetworks #LearnAI #LearnML #AITutorial #PythonProgramming #AITechnology #TechInnovation