Biological neural networks work as follows:
- Neurons are connected to and receive electrical signals from other neurons.
- Neurons process input signals and can be activated.
Artificial Neural Networks
- Model mathematical functions from inputs to outputs based on the structure and parameters of the network.
- Allow for learning the network's parameters based on data.
Given $h(x_1,x_2) = w_0 + w_1x_1 + w_2x_2$, to make a prediction the inputs $x_1$ and $x_2$ are multiplied by the weights $w_1$ and $w_2$ and then added to a bias $w_0$ in order to produce a classification.
Weights are just numerical values.
![[Pasted image 20250301100635.png]]
These are known as [[ML Algorithms Overview .pdf|activation]] functions. Examples include ReLU, sigmoid, tanh, etc. An activation function $g$ is applied to the result of the computation:
$$ h(x_1,x_2) = g(w_0 + w_1x_1 + w_2x_2) $$
![[Pasted image 20250301101225.png]]
The neural network learns what the values of the weights $w_0$, $w_1$, and $w_2$ should be; the activation function $g$ is chosen in advance.
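A rough sketch of this computation in Python (the weight values below are hypothetical, chosen only for illustration):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1 / (1 + np.exp(-z))

def h(x1, x2, w0, w1, w2, g=sigmoid):
    # Weighted sum of the inputs plus the bias,
    # passed through the activation function g.
    return g(w0 + w1 * x1 + w2 * x2)

# Hypothetical weights, just for illustration.
print(h(0.5, 0.8, w0=-1.0, w1=1.0, w2=2.0))
```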
Simple logical functions like AND and OR can be modeled by neural networks.
Example: given the AND function:
| $x$ | $y$ | $f(x,y)$ |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

![[Pasted image 20250301101843.png]]
This can be achieved by using $-2$ as the bias.
For OR, a bias of $-1$ can be used.
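A quick sketch verifying both bias choices, assuming a step activation function and weights of $1$ on both inputs:

```python
def step(z):
    # Threshold activation: output 1 once the weighted sum reaches 0.
    return 1 if z >= 0 else 0

def unit(x, y, bias):
    # Weights of 1 for both inputs, plus the given bias.
    return step(bias + 1 * x + 1 * y)

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, "AND:", unit(x, y, bias=-2), "OR:", unit(x, y, bias=-1))
```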
Neural networks can be more complex, with more nodes; however, the idea remains the same: the inputs are multiplied by weights and then added to a bias.
![[Pasted image 20250301102246.png]]
How do we train Neural Networks?
The strategy for doing this is an algorithm from calculus called gradient descent.
Gradient Descent
Algorithm for minimizing loss when training a neural network. Loss can be defined as how bad a hypothesis function is. The gradient gives the direction in which the loss increases most steeply, so the weights are moved in the opposite direction to decrease the loss.
Steps
- Start with a random choice of weights.
- Repeat:
- Calculate the gradient based on all data points: direction that will lead to decreasing loss.
- Update weights according to the gradient.
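A minimal sketch of these steps on a toy linear model (the data, learning rate, and iteration count are made up for illustration):

```python
import numpy as np

# Toy data: learn y = 2*x1 - x2 + 1 (hypothetical coefficients).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - X[:, 1] + 1

w = rng.normal(size=2)   # start with a random choice of weights
b = 0.0
lr = 0.1                 # learning rate (step size)

for _ in range(200):
    error = X @ w + b - y
    # Gradient of mean squared error over ALL data points.
    grad_w = 2 * X.T @ error / len(y)
    grad_b = 2 * error.mean()
    # Update weights against the gradient to decrease the loss.
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # should approach [2, -1] and 1
```

Each update moves the weights a small step against the gradient; the learning rate controls the step size.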
However, using all data points is very expensive, so Stochastic Gradient Descent can be used instead, calculating the gradient based on a single data point at a time.
> [!note] Stochastic Gradient Descent can be less accurate despite being computationally efficient. Mini-Batch Gradient Descent is a compromise between the two: it computes the gradient on one small batch of data points at a time instead of using all the data points at once.
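A sketch of the mini-batch variant on the same toy problem; the batch size of 16 is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - X[:, 1] + 1
w, b, lr, batch_size = rng.normal(size=2), 0.0, 0.1, 16

for _ in range(100):
    idx = rng.permutation(len(y))            # shuffle each epoch
    for start in range(0, len(y), batch_size):
        batch = idx[start:start + batch_size]
        error = X[batch] @ w + b - y[batch]
        # Gradient estimated from one small batch, not the full dataset.
        w -= lr * 2 * X[batch].T @ error / len(batch)
        b -= lr * 2 * error.mean()
```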
Neural networks can be used for supervised and unsupervised learning tasks. Example: predicting the weather with multiple outputs (e.g. sunny, rainy, snowy) and, based on the probabilities, choosing the one with the highest probability. They can also be used for reinforcement learning, where the outputs represent the various actions an agent can perform in a given state.
![[Pasted image 20250301103905.png]]
Perceptron
- Only capable of learning a linearly separable decision boundary.
Multilayer Neural Network
- Artificial neural network with an input layer, an output layer, and at least one hidden layer.
![[Pasted image 20250301104805.png]]
This gives us the ability to model more complex functions, rather than just a single linear boundary as in the case of the [[#Perceptron|Perceptron]].
How do we train a neural network with hidden layers? We know the values of the inputs and the outputs, but we have no information about the hidden layers. The key idea is that if we know the loss at the output node, we can use it to estimate the errors at the hidden layers; i.e. given the loss, it is possible to propagate it backwards and get an estimate of how the weights into the hidden layers should change.
[[Deep Learning Fundamentals#Backpropagation|Backpropagation]]
Algorithm for training neural networks with hidden layers
Steps
- Start with a random choice of weights.
- Repeat:
- Calculate error for output layer.
- For each layer, starting with output layer, and moving inwards towards earliest hidden layer:
- Propagate error back one layer
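A minimal sketch of these steps for a network with one hidden layer, trained on XOR (a function the perceptron cannot learn); the layer sizes, learning rate, and iteration count are arbitrary choices:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# XOR: not linearly separable, so a hidden layer is required.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

rng = np.random.default_rng(1)
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # input -> hidden
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # hidden -> output
lr = 1.0

for _ in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Calculate error for the output layer.
    d_out = (out - y) * out * (1 - out)
    # Propagate the error back one layer to the hidden units.
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Update weights according to the gradients.
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```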
This is the main idea behind neural networks, and it accommodates even more complex networks with multiple hidden layers: deep neural networks.
![[Pasted image 20250301124210.png]]
Overfitting
Happens when a model learns the training data too well, including its noise and specifics, and fails to generalize to new, unseen data.
Dropout
Is one of the techniques used to overcome overfitting. It temporarily removes units, selected at random, from a neural network to prevent over-reliance on certain units.
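A sketch of the idea, using the common "inverted dropout" formulation in which the surviving units are scaled up so the expected activation stays the same (the rate and activations here are hypothetical):

```python
import numpy as np

def dropout(activations, rate, rng):
    # Randomly zero out a fraction `rate` of units during training;
    # scale the survivors so the expected activation is unchanged.
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1 - rate)

rng = np.random.default_rng(0)
h = np.ones(10)                    # hypothetical hidden-layer activations
print(dropout(h, rate=0.5, rng=rng))
```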
Pooling
Reducing the size of an input by sampling from regions in the input.
Max-pooling
Pooling by choosing the maximum value in each region. This makes the model more robust: only the strongest value in each region is kept, so the analysis is less sensitive to where exactly the values occur within a particular region.
![[Pasted image 20250301131857.png]]
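A minimal sketch of 2×2 max-pooling on a toy image:

```python
import numpy as np

def max_pool(image, size=2):
    # Split the image into non-overlapping size x size regions
    # and keep only the maximum value from each region.
    h, w = image.shape
    return image[:h - h % size, :w - w % size] \
        .reshape(h // size, size, w // size, size) \
        .max(axis=(1, 3))

img = np.arange(16).reshape(4, 4)   # toy 4x4 "image"
print(max_pool(img))                # 2x2 output of region maxima
```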
Flattening
Refers to the process of converting a multi-dimensional feature map (typically the output of convolutional and pooling layers) into a one-dimensional vector. This step is necessary to transition from the feature extraction phase (handled by convolutional layers) to the classification or regression phase (handled by fully connected layers).
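A sketch, assuming a hypothetical 4×4 feature map with 8 channels:

```python
import numpy as np

feature_maps = np.zeros((4, 4, 8))   # hypothetical 4x4 map with 8 channels
vector = feature_maps.reshape(-1)    # flatten to one dimension
print(vector.shape)                  # (128,) -> ready for a dense layer
```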
Convolutional Neural Network
Neural networks that use convolution, usually for analyzing images. This is achieved by applying image filters in order to get a feature map. The neural network is trained on what these filters should be in order to extract the most useful information.
Convolution refers to the process of applying a small filter (or kernel) to an input, such as an image, to extract features like edges, textures, or patterns.
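A minimal sketch of 2D convolution, using a hand-written edge-detection kernel; in a real CNN these kernel values would be learned:

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over every valid position and take the
    # elementwise product-sum with the patch underneath it.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

# A classic edge-detection kernel.
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])
image = np.zeros((6, 6)); image[:, 3:] = 1   # toy image: dark left, bright right
print(convolve2d(image, kernel))             # responds strongly at the edge
```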
![[Pasted image 20250301132204.png]] Traditional CNN
Convolution and Pooling can be used multiple times, in multiple steps to extract relevant features and use pooling to reduce the dimension of the image/input.
![[Pasted image 20250301132436.png]]
Feed-forward neural networks (FFNNs)
Neural networks that have connections in only one direction. The limitation here is that the input needs to have a fixed shape, with a fixed number of neurons in the input layer and a corresponding fixed number of neurons in the output layer.
> [!note] FFNNs process data in a single direction, from input to output, without any loops or memory.
Recurrent Neural Networks (RNNs)
RNNs are a type of neural network designed to handle sequential or time-series data by maintaining a "memory" of previous inputs. They achieve this through loops in their architecture, where the output of one step is fed back into the network as an input for the next step. This makes RNNs well-suited for tasks like language modeling, speech recognition, or time-series prediction, where context from earlier data points matters.
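A minimal sketch of the recurrent step $h_t = \tanh(W_x x_t + W_h h_{t-1} + b)$, with hypothetical layer sizes:

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    # Process the sequence one step at a time, feeding the previous
    # hidden state back in as extra input (the network's "memory").
    h = np.zeros(W_h.shape[0])
    for x in inputs:
        h = np.tanh(W_x @ x + W_h @ h + b)
    return h

rng = np.random.default_rng(0)
W_x = rng.normal(size=(3, 2))   # input -> hidden (hypothetical sizes)
W_h = rng.normal(size=(3, 3))   # hidden -> hidden (the recurrent loop)
b = np.zeros(3)
sequence = [rng.normal(size=2) for _ in range(5)]
print(rnn_forward(sequence, W_x, W_h, b))
```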
TensorFlow
Is a machine learning library that helps us implement concepts such as backpropagation, dropout, and activation functions.
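For example, a small Keras model in TensorFlow that combines several of the ideas above; the layer sizes and dropout rate are arbitrary choices for illustration:

```python
import tensorflow as tf

# A small feed-forward binary classifier.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(16, activation="relu"),   # hidden layer
    tf.keras.layers.Dropout(0.5),                   # dropout between layers
    tf.keras.layers.Dense(1, activation="sigmoid"), # output probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=10)  # training data not shown here
```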