
Deep Learning notation

2021-10-15 | aprates.dev


Read this post in Portuguese


> A computer would deserve to be called intelligent if it could deceive a human into believing that it was human. - Alan Turing


Deep maths, ufff…


Around mid-2021 I started diving into a machine learning course I thought I should take. A long time ago, when I graduated back in 2006, my graduation paper was about chatbots with emotions and how humans would react to them. I wanted to better understand how the techniques had evolved since then, and found something a bit different from what I was expecting.


Given the current status quo, you just cannot avoid some basic knowledge of Python libraries (such as NumPy), linear algebra, and a good dose of mathematical notation when reading descriptions of machine learning methods. And it can be very frustrating at times.


One bit of notation in an equation you don't grasp completely might prevent you from implementing the concept you are trying to learn. Even coming in as an experienced developer, I had that beginner-like feeling while facing modern machine learning basics.


So here I have collected some mathematical notation that I came across while doing the deep learning course, along with some notes on concepts that felt mysterious to me, like cost and derivatives.


I wrote these down mostly for my personal use, but posted them because I wish I had found something like this when searching the Internet. I must also say that notation varies a lot from author to author, and that I am still learning, so take my notes with a grain of salt.


Principle


The activation of a node in a neural network is something of the form:

output = activation_function(dot_product(weights, inputs) + bias)
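
A minimal sketch of that form in Python with NumPy; the sigmoid choice and all the values here are illustrative, not taken from any particular course:

```
import numpy as np

def sigmoid(z):
    # classic logistic activation: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# illustrative values: 3 inputs feeding a single node
weights = np.array([0.2, -0.5, 0.1])
inputs = np.array([1.0, 2.0, 3.0])
bias = 0.4

output = sigmoid(np.dot(weights, inputs) + bias)
print(output)  # a single activation value between 0 and 1
```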

General Notation


As per Andrew Ng, in the deeplearning.ai specialization on Coursera [2]:


x : input


y : output


m : number of examples in the dataset


X : set of training examples


Y : set of outcome examples


N : size of X


( x(i) , y(i) ) : i-th example pair in X


( x' , y' ) : a testing pair


ŷ : predicted output (y-hat)


L : loss function (can also refer to hidden layers, see hyperparameters)


J : cost function


W : set of w parameters (weights)


B : set of b parameters (biases)


w[1], w[2], b[1], b[2],… : parameters per layer


Z = transpose(W) * X + B : vectorized implementation of hidden and output layers (see the sketch after this list)


dw1, dw2, db : derivatives of the cost with respect to the parameters
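
A toy sketch of that vectorized step, with made-up shapes (3 input features, 2 units, 4 examples), keeping the transpose(W) convention from the list above:

```
import numpy as np

# made-up sizes: 3 input features, 2 units in the layer, 4 examples
n_x, n_units, m = 3, 2, 4

W = np.random.randn(n_x, n_units)  # one column of weights per unit
B = np.zeros((n_units, 1))         # one bias per unit
X = np.random.randn(n_x, m)        # each column is one training example

# all units and all examples computed at once, no explicit loops
Z = W.T @ X + B          # shape (n_units, m); B broadcasts over the columns
A = np.tanh(Z)           # element-wise activation of the layer
print(Z.shape, A.shape)  # (2, 4) (2, 4)
```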


Hyperparameters


These parameters control how the parameters w and b end up being learned:


α : learning rate (alpha symbol)


number of iterations of gradient descent (see the sketch after this list)


L : number of hidden layers


n[1], n[2],… : number of hidden units per layer


choice of activation function, such as ReLU, tanh, sigmoid…
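
To see where these hyperparameters plug in, here is a sketch of a plain gradient descent loop; the gradients function below is a stand-in for a toy cost, not code from the course:

```
def gradients(w, b):
    # stand-in gradients for the toy cost J(w, b) = w**2 + b**2,
    # so dJ/dw = 2*w and dJ/db = 2*b; a real model derives these
    # from the training data via backpropagation
    return 2 * w, 2 * b

alpha = 0.1           # learning rate (the α above)
num_iterations = 100  # number of iterations of gradient descent

w, b = 5.0, -3.0      # arbitrary starting parameters
for _ in range(num_iterations):
    dw, db = gradients(w, b)
    w = w - alpha * dw  # step against the gradient, scaled by alpha
    b = b - alpha * db

print(w, b)  # both approach 0, the minimum of the toy cost
```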


Concepts


Cost


The loss function measures the difference between the actual output and the predicted output of the model, that is, y vs. ŷ.


Although loss is sometimes also referred to as cost, it is not the same thing: the cost function is the average of the loss over the complete training dataset Y.
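
A sketch of that distinction, using the cross-entropy loss as an example; the labels and predictions here are made up:

```
import numpy as np

def loss(y_hat, y):
    # cross-entropy loss for a single example: how far ŷ is from y
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# made-up labels and predictions for 4 training examples
Y = np.array([1.0, 0.0, 1.0, 0.0])
Y_hat = np.array([0.9, 0.2, 0.6, 0.1])

per_example = loss(Y_hat, Y)  # one loss value per example
J = np.mean(per_example)      # the cost: average loss over the dataset
print(per_example, J)
```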


Derivatives (dx)

Collected from a forum note I found useful, posted by BurntCalcium (a nickname), another student:


> Basically if f is a function of x, you're taking a ratio of the *change in f* to the *change in x*, given that the latter is an infinitesimally small quantity. The 'd' that is used while writing the notation represents the Greek letter Δ (Delta), which is commonly used to show change in a quantity in physics and math. So basically dx would mean the change in x, df(x) would mean the change in f(x), and df(x)/dx as a whole is called the derivative of f(x) with respect to x. And of course, in the course the instructors have adopted the notation that dx represents df(x)/dx, however outside the context of this course dx would simply mean change in x.
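
That ratio-of-changes idea is easy to check numerically; here the change in x is small but finite, so the result only approximates the true derivative:

```
def f(x):
    return x ** 2  # toy function; the true derivative is 2*x

def approx_derivative(f, x, dx=1e-6):
    # ratio of the change in f to a tiny change in x: df(x)/dx
    return (f(x + dx) - f(x)) / dx

print(approx_derivative(f, 3.0))  # ~6.0, matching 2*x at x = 3
```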


Reference


[2] Deep Learning on Coursera


See also


Capsule Archives

Capsule Home


Want more?


Comment on one of my posts, talk to me, say: hello@aprates.dev


Subscribe to the Capsule's Feed

Check out the FatScript project on GitLab

Check out my projects on GitHub

Check out my projects on SourceHut


© aprates.dev, 2021-2023 - content on this site is licensed under

Creative Commons BY-NC-SA 4.0 License

Proudly built with GemPress

Privacy Policy
