Introduction to Machine Learning

Different types of Functions

Regression : The function outputs a scalar.

  • predict the PM2.5

Classification : Given options (classes), the function outputs the correct one.

  • Spam filtering

Structured Learning : create something with structure (image, document).

Example : predicting the views of a YouTube channel

1. Function with Unknown Parameters

$$ y=b+wx_1 $$
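
A minimal sketch (not from the lecture) of this one-feature model in Python; the parameter values below are arbitrary placeholders, not values learned from data:

```python
def model(x1, b, w):
    """y = b + w * x1, where b and w are the unknown parameters."""
    return b + w * x1

# Placeholder guesses for the unknown parameters (not learned yet)
y = model(3.0, b=0.1, w=0.5)
```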

2. Define Loss from Training Data

  • Loss is a function of parameters

$$ L(b,w) $$

  • Loss : how good a set of values is.
  • L is mean absolute error (MAE)

$$ e=\left | y-\hat{y} \right | $$

  • L is mean square error (MSE)

$$ e=(y-\hat{y})^2 $$

$$ L=\frac{1}{N}\sum_{n}e_n $$
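
A minimal Python sketch of the two error measures and the averaged loss above, assuming the predictions y and the labels ŷ are given as plain lists of equal length:

```python
def mae(ys, y_hats):
    """MAE: L = (1/N) * sum of e_n, with e = |y - y_hat|."""
    return sum(abs(y - y_hat) for y, y_hat in zip(ys, y_hats)) / len(ys)

def mse(ys, y_hats):
    """MSE: L = (1/N) * sum of e_n, with e = (y - y_hat)^2."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats)) / len(ys)
```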

3. Optimization

$$ w^*,b^*=\arg\min_{w,b} L $$

Gradient Descent

  • (Randomly) Pick an initial value :

$$ w^0 $$

  • Compute :

$$ \left.\frac{\partial L}{\partial w}\right|_{w=w^0} $$

Negative slope : increase w

Positive slope : decrease w

$$ w^1 \leftarrow w^0-\eta\left.\frac{\partial L}{\partial w}\right|_{w=w^0} $$

η : learning rate (a hyperparameter)

  • Update w iteratively
    • Local minima
    • Global minima

The same procedure for a single parameter generalizes to multiple parameters.
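
A minimal gradient-descent sketch for the two parameters b and w, assuming the MSE loss defined above with analytic gradients; the initial values, learning rate, and step count are illustrative placeholders:

```python
def gradient_descent(xs, y_hats, lr=1e-4, steps=1000):
    """Iteratively update b and w: parameter <- parameter - lr * dL/dparameter."""
    b, w = 0.0, 0.0          # (randomly) picked initial values b^0, w^0
    n = len(xs)
    for _ in range(steps):
        # Gradients of L(b, w) = (1/N) * sum (b + w*x - y_hat)^2
        errors = [b + w * x - y_hat for x, y_hat in zip(xs, y_hats)]
        grad_b = 2.0 / n * sum(errors)
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        b -= lr * grad_b     # negative gradient -> b increases
        w -= lr * grad_w     # positive gradient -> w decreases
    return b, w
```

For this linear model the MSE loss is convex, so gradient descent will not get stuck in a local minimum; the local/global distinction matters for the more flexible models below.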

Linear Models

Linear models have a severe limitation, known as model bias.

We need a more flexible model!

curve = constant + sum of a set of Hard Sigmoid functions

$$ y=c\,\frac{1}{1+e^{-(b+wx_1)}}=c\,\mathrm{sigmoid}(b+wx_1) $$

$$ y=b+\sum_{i}c_i\,\mathrm{sigmoid}(b_i+w_ix_1) $$

$$ y=b+\sum_{i}c_i\,\mathrm{sigmoid}\Big(b_i+\sum_{j}w_{ij}x_j\Big) $$
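
A plain-Python sketch of this summation form; the loop structure mirrors the indices i and j, and all parameter values would come from training:

```python
import math

def sigmoid(r):
    return 1.0 / (1.0 + math.exp(-r))

def flexible_model(xs, b, cs, bs, ws):
    """y = b + sum_i c_i * sigmoid(b_i + sum_j w_ij * x_j)."""
    y = b
    for c_i, b_i, w_i in zip(cs, bs, ws):
        r_i = b_i + sum(w_ij * x_j for w_ij, x_j in zip(w_i, xs))
        y += c_i * sigmoid(r_i)
    return y
```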

From a linear-algebra viewpoint:

$$ r=b+Wx $$

$$ a=\sigma(r) $$

$$ y=b+c^Ta $$
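
A NumPy sketch of the matrix form above, equivalent to the loop version but vectorized; the shapes (3 sigmoid units, 2 input features) and the random values are purely illustrative:

```python
import numpy as np

def sigmoid(r):
    return 1.0 / (1.0 + np.exp(-r))

def flexible_model(x, b_vec, W, c, b):
    """y = b + c^T * sigmoid(b_vec + W @ x)."""
    r = b_vec + W @ x    # r = b + Wx
    a = sigmoid(r)       # a = sigma(r), applied element-wise
    return b + c @ a     # y = b + c^T a

rng = np.random.default_rng(0)
x = rng.normal(size=2)                # 2 input features
W = rng.normal(size=(3, 2))           # 3 sigmoid units
b_vec, c = rng.normal(size=3), rng.normal(size=3)
y = flexible_model(x, b_vec, W, c, b=0.5)
```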

Loss

  • Loss is a function of parameters L(θ)
  • Loss means how good a set of values is.

Optimization of New Model

$$ \theta=\begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \\ \vdots \end{bmatrix} $$

$$ \theta^*=\arg\min_{\theta} L $$

  • (Randomly) Pick initial values θ^0

1 epoch = see all the batches once

update : θ is updated once for each batch
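
A sketch of the batch/epoch loop, assuming θ, the inputs, and the labels are NumPy arrays and that `loss_grad` is a hypothetical function returning the gradient of L on one batch; batch size, learning rate, and epoch count are placeholders:

```python
import numpy as np

def train(theta, xs, y_hats, loss_grad, lr=0.01, batch_size=32, epochs=10):
    """1 epoch = see all the batches once; theta is updated once per batch."""
    n = len(xs)
    for _ in range(epochs):
        order = np.random.permutation(n)                # reshuffle the examples each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]       # indices of the current batch
            g = loss_grad(theta, xs[idx], y_hats[idx])  # gradient of L on this batch only
            theta = theta - lr * g                      # one update per batch
    return theta
```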

Sigmoid -> ReLU (Rectified Linear Unit)

These are collectively called activation functions.
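
For reference, minimal NumPy definitions of the two activation functions named above:

```python
import numpy as np

def sigmoid(r):
    """Soft, S-shaped activation: 1 / (1 + exp(-r))."""
    return 1.0 / (1.0 + np.exp(-r))

def relu(r):
    """Rectified Linear Unit: max(0, r), element-wise."""
    return np.maximum(0.0, r)
```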

Neural Network