Writing a Neural Network by Hand with numpy

Author: 筋斗云

This post covers the following:

  1. Deriving the error back-propagation (BP) process of a neural network.
  2. Implementing a simple neural network with numpy, using the iris dataset for classification and the california_housing dataset for regression, and visualizing the training process.

1. Derivation of the BP Algorithm

1.1 Introduction

(Figure 1: the overall forward-propagation and back-propagation process.)

(Figure 2: the relationships between the intermediate variables.)

The network's raw output, i.e. the output before activation, is denoted $Z^{[l]}$; the output after activation is denoted $A$.

(Figure 3: the computational graph of the forward and backward passes.)

The first figure shows the network's forward pass and back-propagation; the second explains the relationships between the intermediate variables; the third is the computational graph of the forward and backward passes, which makes the derivation easier to follow. Note that the $A^{[l-2]}$ in the lower-left corner of the third figure is a mistake and should read $A^{[l-1]}$.

1.2 Notation

To make the derivation easier to follow, we first introduce the notation.

| Symbol | Meaning |
| --- | --- |
| $n_l$ | number of neurons in layer $l$ |
| $f_l(\cdot)$ | activation function of layer $l$ |
| $\mathbf{W}^l \in \mathbb{R}^{n_{l-1} \times n_l}$ | weight matrix from layer $l-1$ to layer $l$ |
| $\mathbf{b}^l \in \mathbb{R}^{n_l}$ | bias from layer $l-1$ to layer $l$ |
| $\mathbf{Z}^l \in \mathbb{R}^{n_l}$ | net input of layer $l$, i.e. the output before activation |
| $\mathbf{A}^l \in \mathbb{R}^{n_l}$ | output of layer $l$ after the activation function, with $\mathbf{A}^0 = X$ |

A deep neural network is built by stacking single-layer networks on top of each other, so we can first write the most basic building block and then stack it to obtain a deep network.

We can therefore start writing code, describing a single network layer with a `Layer` class:

```python
import numpy as np


class Layer:
    def __init__(self, input_dim, output_dim):
        # Initialize parameters
        self.W = np.random.randn(input_dim, output_dim) * 0.01
        self.b = np.zeros((1, output_dim))

    def forward(self, X):
        # Forward pass
        self.Z = np.dot(X, self.W) + self.b
        self.A = self.activation(self.Z)
        return self.A

    def backward(self, dA, A_prev, activation_derivative):
        # Backward pass; see the derivation below for the formulas
        m = A_prev.shape[0]
        self.dZ = dA * activation_derivative(self.Z)
        self.dW = np.dot(A_prev.T, self.dZ) / m
        self.db = np.sum(self.dZ, axis=0, keepdims=True) / m
        dA_prev = np.dot(self.dZ, self.W.T)
        return dA_prev

    def update_parameters(self, learning_rate):
        # Parameter update
        self.W -= learning_rate * self.dW
        self.b -= learning_rate * self.db


# Layer with a ReLU activation
class ReLULayer(Layer):
    def activation(self, Z):
        return np.maximum(0, Z)

    def activation_derivative(self, Z):
        return (Z > 0).astype(float)


# Layer with a Softmax activation (mainly used for classification)
class SoftmaxLayer(Layer):
    def activation(self, Z):
        exp_z = np.exp(Z - np.max(Z, axis=1, keepdims=True))
        return exp_z / np.sum(exp_z, axis=1, keepdims=True)

    def activation_derivative(self, Z):
        # The softmax Jacobian is more complex and is not needed in this form,
        # because dA = A - y is already the gradient with respect to Z.
        return np.ones_like(Z)
```
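To make the interface concrete, here is a minimal usage sketch of a single `ReLULayer` on random toy data; the shapes, seed, and the dummy upstream gradient are illustrative assumptions, not part of the original script:

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(4, 3)          # 4 samples, 3 features (toy data)
layer = ReLULayer(3, 2)            # maps 3 inputs to 2 outputs

A = layer.forward(X)               # A = relu(X @ W + b), shape (4, 2)
dA = np.ones_like(A)               # dummy gradient flowing back from the next layer
dA_prev = layer.backward(dA, X, layer.activation_derivative)
layer.update_parameters(learning_rate=0.1)

print(A.shape, dA_prev.shape)      # (4, 2) (4, 3)
```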

1.3 Derivation

The core of the weight update is computing `self.dW` and `self.db`; at the same time, in order to keep propagating gradients backward through the network, the `backward` function must return the gradient `dA_prev`.

The formulas we will need are:

$$
Z^l = W^l A^{l-1} + b^l \\
A^l = f(Z^l) \\
\frac{dZ}{dW} = (A^{l-1})^T \\
\frac{dZ}{db} = 1
$$

Explanation:

From the backward pass on the right-hand side of the computational graph above, the gradient `dA` coming back from the next layer passes through `dZ` and then directly into `db`, and also passes through `dU` on its way to `dW`. We can therefore write the gradients `dW` and `db` as:

$$
\begin{aligned}
dW &= dA \cdot \frac{dA}{dZ} \cdot \frac{dZ}{dW} \\
   &= A_{prev}^{T}\,\big(dA \odot f'(Z)\big)
\end{aligned}
$$

where $f(\cdot)$ is the activation function, $f'(\cdot)$ is its derivative, $\odot$ denotes element-wise multiplication, and $A_{prev}^T$ is the transpose of the previous layer's activation output.

Similarly, we obtain

$$
\begin{aligned}
db &= dA \cdot \frac{dA}{dZ} \cdot \frac{dZ}{db} \\
   &= dA \odot f'(Z)
\end{aligned}
$$

and the gradient that must continue to flow backward to the previous layer:

$$
\begin{aligned}
dA_{prev} &= dA \cdot \frac{dA}{dZ} \cdot \frac{dZ}{dA_{prev}} \\
          &= \big(dA \odot f'(Z)\big)\, W^{T}
\end{aligned}
$$
With these formulas, gradient information can be propagated from the back of the network to the front. (In the code, `dW` and `db` are additionally averaged over the `m` samples in the batch, hence the division by `m` in `backward`.)
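To sanity-check the backward formulas, a finite-difference gradient check on a single `ReLULayer` is useful. The sketch below assumes the classes defined above; the toy loss and seed are illustrative. Following the convention of the training loops later in the post, the `dA` fed into `backward()` is the un-averaged gradient, and the 1/m averaging happens inside the layer:

```python
import numpy as np

np.random.seed(1)
X = np.random.randn(5, 4)
layer = ReLULayer(4, 3)

def loss_fn():
    # scalar toy loss: sum of the layer outputs, averaged over the m samples
    return np.sum(layer.forward(X)) / X.shape[0]

# Analytic gradient from backward(): dW = A_prev^T (dA * f'(Z)) / m
A = layer.forward(X)
layer.backward(np.ones_like(A), X, layer.activation_derivative)
analytic = layer.dW.copy()

# Numerical gradient via central differences
eps = 1e-6
numeric = np.zeros_like(layer.W)
for i in range(layer.W.shape[0]):
    for j in range(layer.W.shape[1]):
        layer.W[i, j] += eps
        plus = loss_fn()
        layer.W[i, j] -= 2 * eps
        minus = loss_fn()
        layer.W[i, j] += eps          # restore the weight
        numeric[i, j] = (plus - minus) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))   # should be tiny, on the order of 1e-9
```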

Classification loss function

The most common loss for classification is the cross-entropy loss, which measures the difference between the distribution output by the model and the true labels:

$$
L = -\frac{1}{N}\sum_{i=1}^{N} \sum_{j=1}^{C} y_{ij} \log(\hat{y}_{ij})
$$

where $N$ is the number of samples, $C$ is the number of classes, and $y_{ij}$ is the value at position $j$ for sample $i$. Because the labels are one-hot encoded, each row contains a single 1 and zeros everywhere else, so for each sample the cross-entropy only takes the logarithm of the predicted probability at the non-zero position; the loss is then the negative of the average of these log-probabilities over all samples.

Combined with the softmax output layer, the gradient of the cross-entropy loss with respect to the logits $z$ can be written concisely as:

$$
\nabla_z L = \hat{\mathbf{y}} - \mathbf{y}
$$
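As a small illustration (not part of the original scripts, values chosen arbitrarily), the snippet below computes the one-hot cross-entropy for a toy batch and shows that it reduces to the mean negative log-probability of the true classes; the last line is the compact softmax-plus-cross-entropy gradient that the training loop later uses as `dA = A - y` (the 1/N factor is applied inside each layer's `backward()`):

```python
import numpy as np

y_hat = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])              # predicted class probabilities (toy values)
labels = np.array([0, 1])                        # true class indices
y = np.eye(3)[labels]                            # one-hot encoding

loss_onehot  = -np.mean(np.sum(y * np.log(y_hat), axis=1))
loss_indexed = -np.mean(np.log(y_hat[np.arange(2), labels]))
print(loss_onehot, loss_indexed)                 # both ~ 0.2899

grad_z = y_hat - y                               # gradient w.r.t. the logits (up to the 1/N factor)
print(grad_z)
```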

Regression loss function

The mean-squared-error (MSE) loss is widely used in regression problems thanks to its good properties:

$$
L = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2
$$

In vector form:

$$
L = \frac{1}{N}\,\lVert \mathbf{y} - \hat{\mathbf{y}} \rVert_2^2
$$

Its gradient is:

$$
\nabla_{\hat{y}} L = \frac{2}{N}\,(\hat{\mathbf{y}} - \mathbf{y})
$$
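A tiny worked example (values chosen arbitrarily for illustration). Note that the regression script below propagates `dA = -(y - A)`, i.e. it drops the constant factor 2/N from this formula: the 1/N is applied inside each layer's `backward()`, and the missing factor of 2 only rescales the gradient, which is equivalent to halving the learning rate.

```python
import numpy as np

y     = np.array([[1.0], [2.0], [3.0]])
y_hat = np.array([[1.5], [1.5], [2.0]])

mse  = np.mean((y - y_hat) ** 2)             # (0.25 + 0.25 + 1.0) / 3 = 0.5
grad = 2.0 / y.shape[0] * (y_hat - y)        # gradient of the MSE w.r.t. y_hat
print(mse, grad.ravel())                     # 0.5 [ 0.333 -0.333 -0.667]
```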

2. Code

2.1 Classification code

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
import matplotlib.pyplot as plt


class Layer:
    def __init__(self, input_dim, output_dim):
        self.W = np.random.randn(input_dim, output_dim) * 0.01
        self.b = np.zeros((1, output_dim))

    def forward(self, X):
        self.Z = np.dot(X, self.W) + self.b     # output before activation
        self.A = self.activation(self.Z)        # output after activation
        return self.A

    def backward(self, dA, A_prev, activation_derivative):
        # Note: gradients flow backward: l+1 --> l --> l-1
        # A_prev is the output of layer l-1, i.e. A^{l-1}
        # dA is the gradient passed back from layer l+1
        # activation_derivative is the derivative of the activation function
        # dA_prev is the gradient passed on to layer l-1
        m = A_prev.shape[0]
        self.dZ = dA * activation_derivative(self.Z)
        self.dW = np.dot(A_prev.T, self.dZ) / m
        self.db = np.sum(self.dZ, axis=0, keepdims=True) / m
        dA_prev = np.dot(self.dZ, self.W.T)     # gradient passed back to the previous layer
        return dA_prev

    def update_parameters(self, learning_rate):
        self.W -= learning_rate * self.dW
        self.b -= learning_rate * self.db


class ReLULayer(Layer):
    def activation(self, Z):
        return np.maximum(0, Z)

    def activation_derivative(self, Z):
        return (Z > 0).astype(float)


class SoftmaxLayer(Layer):
    def activation(self, Z):
        exp_z = np.exp(Z - np.max(Z, axis=1, keepdims=True))
        return exp_z / np.sum(exp_z, axis=1, keepdims=True)

    def activation_derivative(self, Z):
        # The softmax Jacobian is more complex and is not needed here,
        # because dA = A - y is already the gradient with respect to Z.
        return np.ones_like(Z)


class NeuralNetwork:
    def __init__(self, layer_dims, learning_rate=0.01):
        self.layers = []
        self.learning_rate = learning_rate
        for i in range(len(layer_dims) - 2):
            self.layers.append(ReLULayer(layer_dims[i], layer_dims[i + 1]))
        self.layers.append(SoftmaxLayer(layer_dims[-2], layer_dims[-1]))

    def cross_entropy_loss(self, y_true, y_pred):
        n_samples = y_true.shape[0]
        y_pred_clipped = np.clip(y_pred, 1e-12, 1 - 1e-12)
        return -np.sum(y_true * np.log(y_pred_clipped)) / n_samples

    def accuracy(self, y_true, y_pred):
        y_true_labels = np.argmax(y_true, axis=1)
        y_pred_labels = np.argmax(y_pred, axis=1)
        return np.mean(y_true_labels == y_pred_labels)

    def train(self, X, y, epochs):
        loss_history = []
        for epoch in range(epochs):
            A = X
            # Forward propagation
            cache = [A]
            for layer in self.layers:
                A = layer.forward(A)
                cache.append(A)

            loss = self.cross_entropy_loss(y, A)
            loss_history.append(loss)

            # Backward propagation
            # Derivative of the loss w.r.t. the logits (softmax + cross-entropy)
            dA = A - y
            for i in reversed(range(len(self.layers))):
                layer = self.layers[i]
                A_prev = cache[i]
                dA = layer.backward(dA, A_prev, layer.activation_derivative)

            # Update parameters
            for layer in self.layers:
                layer.update_parameters(self.learning_rate)

            if (epoch + 1) % 100 == 0:
                print(f'Epoch {epoch + 1}/{epochs}, Loss: {loss:.4f}')

        return loss_history

    def predict(self, X):
        A = X
        for layer in self.layers:
            A = layer.forward(A)
        return A


# Load the data
iris = load_iris()
X = iris.data
y = iris.target.reshape(-1, 1)

# One-hot encoding
encoder = OneHotEncoder(sparse_output=False)
y = encoder.fit_transform(y)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define and train the neural network
layer_dims = [X_train.shape[1], 100, 20, y_train.shape[1]]  # example with 2 hidden layers
learning_rate = 0.01
epochs = 1000

nn = NeuralNetwork(layer_dims, learning_rate)
loss_history = nn.train(X_train, y_train, epochs)

# Predict and evaluate
train_predictions = nn.predict(X_train)
test_predictions = nn.predict(X_test)

train_acc = nn.accuracy(y_train, train_predictions)
test_acc = nn.accuracy(y_test, test_predictions)

print(f'Training Accuracy: {train_acc:.4f}')
print(f'Test Accuracy: {test_acc:.4f}')

# Plot the loss curve
plt.plot(loss_history)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss Curve')
plt.show()
```
Output:

```
Epoch 100/1000, Loss: 1.0983
Epoch 200/1000, Loss: 1.0980
Epoch 300/1000, Loss: 1.0975
Epoch 400/1000, Loss: 1.0960
Epoch 500/1000, Loss: 1.0891
Epoch 600/1000, Loss: 1.0119
Epoch 700/1000, Loss: 0.6284
Epoch 800/1000, Loss: 0.3711
Epoch 900/1000, Loss: 0.2117
Epoch 1000/1000, Loss: 0.1290
Training Accuracy: 0.9833
Test Accuracy: 1.0000
```

(Figure: loss curve for the classification task.)

After 1,000 epochs, the accuracy on the test set reaches 100%.
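For completeness, a short usage sketch (not in the original script) showing how the trained network classifies a single sample; it assumes `nn`, `X_test` and `iris` from the code above, and the variable names are illustrative:

```python
sample = X_test[:1]                          # one sample, shape (1, 4)
probs = nn.predict(sample)                   # softmax probabilities, shape (1, 3)
pred_class = np.argmax(probs, axis=1)[0]
print(probs, iris.target_names[pred_class])  # probabilities and predicted class name
```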

2.2 Regression code

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing


class Layer:
    def __init__(self, input_dim, output_dim):
        self.W = np.random.randn(input_dim, output_dim) * 0.01
        self.b = np.zeros((1, output_dim))

    def forward(self, X):
        self.Z = np.dot(X, self.W) + self.b
        self.A = self.activation(self.Z)
        return self.A

    def backward(self, dA, X, activation_derivative):
        m = X.shape[0]
        self.dZ = dA * activation_derivative(self.Z)
        self.dW = np.dot(X.T, self.dZ) / m
        self.db = np.sum(self.dZ, axis=0, keepdims=True) / m
        dA_prev = np.dot(self.dZ, self.W.T)
        return dA_prev

    def update_parameters(self, learning_rate):
        self.W -= learning_rate * self.dW
        self.b -= learning_rate * self.db


class ReLULayer(Layer):
    def activation(self, Z):
        return np.maximum(0, Z)

    def activation_derivative(self, Z):
        return (Z > 0).astype(float)


class LinearLayer(Layer):
    def activation(self, Z):
        return Z

    def activation_derivative(self, Z):
        return np.ones_like(Z)


class NeuralNetwork:
    def __init__(self, layer_dims, learning_rate=0.01):
        self.layers = []
        self.learning_rate = learning_rate
        for i in range(len(layer_dims) - 2):
            self.layers.append(ReLULayer(layer_dims[i], layer_dims[i + 1]))
        self.layers.append(LinearLayer(layer_dims[-2], layer_dims[-1]))

    def mean_squared_error(self, y_true, y_pred):
        return np.mean((y_true - y_pred) ** 2)

    def train(self, X, y, epochs):
        loss_history = []
        for epoch in range(epochs):
            A = X
            # Forward propagation
            cache = [A]
            for layer in self.layers:
                A = layer.forward(A)
                cache.append(A)

            loss = self.mean_squared_error(y, A)
            loss_history.append(loss)

            # Backward propagation
            # Derivative of the loss w.r.t. the network output
            dA = -(y - A)
            for i in reversed(range(len(self.layers))):
                layer = self.layers[i]
                A_prev = cache[i]
                dA = layer.backward(dA, A_prev, layer.activation_derivative)

            # Update parameters
            for layer in self.layers:
                layer.update_parameters(self.learning_rate)

            if (epoch + 1) % 100 == 0:
                print(f'Epoch {epoch + 1}/{epochs}, Loss: {loss:.4f}')

        return loss_history

    def predict(self, X):
        A = X
        for layer in self.layers:
            A = layer.forward(A)
        return A


# Load the data
housing = fetch_california_housing()
X = housing.data
y = housing.target.reshape(-1, 1)

# Standardize
scaler_X = StandardScaler()
scaler_y = StandardScaler()
X = scaler_X.fit_transform(X)
y = scaler_y.fit_transform(y)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define and train the neural network
layer_dims = [X_train.shape[1], 50, 5, 1]  # example with 2 hidden layers
learning_rate = 0.8
epochs = 1000

nn = NeuralNetwork(layer_dims, learning_rate)
loss_history = nn.train(X_train, y_train, epochs)

# Predict and evaluate
train_predictions = nn.predict(X_train)
test_predictions = nn.predict(X_test)

train_mse = nn.mean_squared_error(y_train, train_predictions)
test_mse = nn.mean_squared_error(y_test, test_predictions)

print(f'Training MSE: {train_mse:.4f}')
print(f'Test MSE: {test_mse:.4f}')

# Plot the loss curve
plt.plot(loss_history)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss Curve')
plt.show()
```
Output:

```
Epoch 100/1000, Loss: 1.0038
Epoch 200/1000, Loss: 0.9943
Epoch 300/1000, Loss: 0.3497
Epoch 400/1000, Loss: 0.3306
Epoch 500/1000, Loss: 0.3326
Epoch 600/1000, Loss: 0.3206
Epoch 700/1000, Loss: 0.3125
Epoch 800/1000, Loss: 0.3057
Epoch 900/1000, Loss: 0.2999
Epoch 1000/1000, Loss: 0.2958
Training MSE: 0.2992
Test MSE: 0.3071
```

(Figure: loss curve for the regression task.)
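Because the targets were standardized, the predictions are in standardized units. To report them in the original units (median house value, in units of $100,000), the scaler can be inverted; this short sketch assumes `scaler_y`, `y_test` and `test_predictions` from the script above:

```python
pred_orig = scaler_y.inverse_transform(test_predictions)   # back to original target units
true_orig = scaler_y.inverse_transform(y_test)
print(pred_orig[:3].ravel(), true_orig[:3].ravel())
```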
