Introduction to Machine Learning
Different types of Functions
Regression : The function outputs a scalar.
- predict the PM2.5
Classification : Given options (classes), the function outputs the correct one.
- Spam filtering
Structured Learning : create something with structure (image, document)
Example : YouTube Channel
1. Function with Unknown Parameters
$$ y=b+wx_1 $$
2. Define Loss from Training Data
- Loss is a function of parameters
$$ L(b,w) $$
- Loss : how good a set of parameter values is.
- L is mean absolute error (MAE)
$$ e=\left | y-\hat{y} \right | $$
- L is mean square error (MSE)
$$ e=(y-\hat{y})^2 $$
$$ L=\frac{1}{N} \sum_{n}^{}e_n $$
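The two loss functions above can be sketched directly for the model y = b + w·x1. This is a minimal illustration; the toy data values are made up, not from the lecture.

```python
# Sketch of MAE and MSE for the model y = b + w*x1 on a tiny made-up dataset.
def predict(b, w, x1):
    return b + w * x1

def mae_loss(b, w, xs, ys):
    # L = (1/N) * sum |y_hat - y|   (mean absolute error)
    return sum(abs(predict(b, w, x) - y) for x, y in zip(xs, ys)) / len(xs)

def mse_loss(b, w, xs, ys):
    # L = (1/N) * sum (y_hat - y)^2   (mean squared error)
    return sum((predict(b, w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # perfectly fit by b = 0, w = 2

print(mae_loss(0.0, 2.0, xs, ys))  # -> 0.0 (perfect fit)
print(mse_loss(0.0, 1.0, xs, ys))
```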
3. Optimization
$$ w^*,b^*=\arg\min_{w,b} L $$
Gradient Descent
- (Randomly) Pick an initial value :
$$ w^0 $$
- Compute :
$$ \left.\frac {\partial L} {\partial w}\right|_{w=w^0} $$
Negative : Increase w
Positive : Decrease w
$$ w^1 \leftarrow w^0-\eta\left.\frac {\partial L} {\partial w}\right|_{w=w^0} $$
η : learning rate (a hyperparameter)
- Update w iteratively
- Local minima
- global minima
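The steps above (pick an initial value, compute the gradient, update iteratively) can be sketched for the one-parameter case. The toy data and learning rate are illustrative assumptions; b is fixed at 0 for brevity.

```python
# Minimal gradient-descent sketch: minimize L(w) = MSE of y = w*x1 on toy data.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # true slope is 2

def grad(w):
    # dL/dw = (2/N) * sum (w*x - y) * x
    return 2 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

eta = 0.05   # learning rate (hyperparameter)
w = 0.0      # (randomly) picked initial value w^0
for _ in range(100):
    # negative gradient -> w increases; positive gradient -> w decreases
    w = w - eta * grad(w)

print(round(w, 3))  # -> 2.0
```

On this convex loss the iterates converge to the global minimum; with a non-convex loss the same update can get stuck in a local minimum, which is the caveat noted above.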
The same procedure for one parameter generalizes to multiple parameters.
Linear Models
Linear models have a severe limitation, known as model bias.
We need a more flexible model!
curve = constant + sum of a set of hard sigmoid functions; each hard sigmoid can be approximated by a (soft) sigmoid:
$$ y=c\,\frac {1} {1+e^{-(b+wx_1)}} =c\,\mathrm{sigmoid}(b+wx_1) $$
$$ y=b+\sum_{i}c_i\,\mathrm{sigmoid}(b_i+w_ix_1) $$
$$ y=b+\sum_{i}c_i\,\mathrm{sigmoid}\Big(b_i+\sum_{j}w_{ij}x_j\Big) $$
From the linear-algebra perspective:
$$ r=b+Wx $$
$$ a=\sigma(r) $$
$$ y=b+c^Ta $$
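The matrix form above (r = b + Wx, a = σ(r), y = b + cᵀa) can be sketched as a forward pass with NumPy. Shapes and random values are illustrative; note the two b's are different objects (a vector bias inside the sigmoids, a scalar bias outside).

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))   # 3 sigmoid units, 2 input features
b = rng.normal(size=3)        # vector bias (inside the sigmoids)
c = rng.normal(size=3)        # output weights
b0 = 0.5                      # scalar bias (outside)

def sigmoid(r):
    return 1.0 / (1.0 + np.exp(-r))

def forward(x):
    r = b + W @ x        # r = b + Wx
    a = sigmoid(r)       # a = sigma(r)
    return b0 + c @ a    # y = b + c^T a

x = np.array([1.0, -1.0])
print(forward(x))
```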
Loss
- Loss is a function of parameters L(θ)
- Loss means how good a set of parameter values is.
Optimization of New Model
$$
\theta=
\begin{bmatrix}
\theta_1 \\
\theta_2 \\
\theta_3 \\
\vdots
\end{bmatrix}
$$
$$ \theta^*=\arg\min_\theta L $$
- (Randomly) Pick initial values θ^0
1 epoch = see all the batches once
update : θ is updated once per batch
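The batch/epoch bookkeeping above can be sketched as a loop skeleton. The dataset, batch size, and epoch count here are made-up stand-ins; the gradient computation itself is elided.

```python
import random

# One epoch = one pass over all batches; theta is updated once per batch.
data = list(range(10))   # 10 training examples (stand-ins)
batch_size = 3
updates = 0

for epoch in range(2):               # 2 epochs
    random.shuffle(data)             # commonly reshuffled each epoch
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        # ... compute gradient of L on this batch, then update theta ...
        updates += 1

print(updates)  # 10 examples / batches of 3 -> 4 batches per epoch, 8 updates total
```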
Sigmoid -> ReLU (Rectified Linear Unit)
Sigmoid and ReLU are collectively called activation functions.
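The two activation functions can be written in a couple of lines; a hard sigmoid can be composed from two ReLUs, which is why the swap works.

```python
import math

def sigmoid(z):
    # smooth S-shaped curve, output in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # Rectified Linear Unit: max(0, z)
    return max(0.0, z)

print(sigmoid(0.0))            # -> 0.5
print(relu(-3.0), relu(3.0))   # -> 0.0 3.0
```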
Neural Network