# Convergence Experiments on Benchmark Functions

## Optimizers

| Optimizer | Type | Line style | Color |
|-----------|------|------------|-------|
| SGD | No momentum | Dotted | Red |
| Adagrad | Adaptive | Dotted | Orange |
| SGDM | Momentum | Dashed | Green |
| Adam | Adaptive + momentum | Dashed | Blue |
| DMAdam | Adaptive + momentum | Solid | Black |

The DMAdam optimizer is implemented in `dmadam.py`. The algorithm proceeds as follows:

> **Algorithm 1: DMAdam**
>
> **Require:** $\eta_k > 0$, $\alpha_k > 0$, $\beta_k \in (0,1)$, $\epsilon > 0$
>
> **Initialize:** $x^1 \in \mathbb{R}^d$, $m^0 = 0$, $v^0 = 0$
>
> 1. **for** $k = 1$ to $K$ **do**
> 2.   $g^k = \nabla f(x^k)$
> 3.   $m^k = \frac{m^{k-1} + \lambda_k \cdot g^k}{\sqrt{k+1}}$,   where $\lambda_k = \alpha_k \cdot \sqrt{k+1}$
> 4.   $v^k = \beta_k \cdot v^{k-1} + (1-\beta_k) \cdot ((g^k)^2 + \epsilon_0)$,   where $\epsilon_0 = \frac{\epsilon}{1-\beta_k}$
> 5.   $x^{k+1} = x^k - \eta_k \cdot \frac{m^k}{\sqrt{v^k + \epsilon}}$
> 6. **end for**

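The update in Algorithm 1 can be sketched in plain Python as follows. This is only an illustration of steps 3 to 5 as written above, not the code in `dmadam.py`; the function name `dmadam_step`, its argument order, and the default `eps` are assumptions made for the example, and all operations are element-wise on lists.

```python
import math


def dmadam_step(x, g, m, v, k, eta_k, alpha_k, beta_k, eps=1e-8):
    """One iteration of Algorithm 1 (hypothetical helper, element-wise on lists).

    x, g, m, v are lists of equal length: iterate, gradient, first and
    second moment. Requires beta_k < 1 so that eps / (1 - beta_k) is defined.
    """
    root = math.sqrt(k + 1)
    lam = alpha_k * root                # lambda_k = alpha_k * sqrt(k + 1)
    eps0 = eps / (1.0 - beta_k)         # epsilon_0 = eps / (1 - beta_k)

    # Step 3: first-moment update
    m_new = [(mi + lam * gi) / root for mi, gi in zip(m, g)]
    # Step 4: second-moment update
    v_new = [beta_k * vi + (1.0 - beta_k) * (gi * gi + eps0)
             for vi, gi in zip(v, g)]
    # Step 5: parameter update
    x_new = [xi - eta_k * mi / math.sqrt(vi + eps)
             for xi, mi, vi in zip(x, m_new, v_new)]
    return x_new, m_new, v_new


# Example: first step on f(x, y) = x^2 + y^2 from (-3, 4), gradient (-6, 8).
x1, m1, v1 = dmadam_step([-3.0, 4.0], [-6.0, 8.0], [0.0, 0.0], [0.0, 0.0],
                         k=1, eta_k=3.0, alpha_k=0.1, beta_k=0.5)
print(x1, m1, v1)
```

The values `alpha_k=0.1` and `beta_k=0.5` in the example call are placeholders chosen only to exercise the function.
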
## Experimental Setup

### Parameters

- Iterations: 2000
- The remaining optimizer parameters are listed in the table of each section.
- For the DMAdam optimizer:

$$
\begin{aligned}
\eta_k &= \frac{\eta_0}{\sqrt{k}} \\
\beta_k &= 1-\frac{1}{\sqrt{k}}
\end{aligned}
$$

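As a quick illustration, the two schedules can be written as small Python helpers. The names `eta_schedule` and `beta_schedule` are hypothetical and only mirror the formulas above. Note that the formula gives $\beta_1 = 0$, so under this schedule the first iteration uses only the current squared gradient.

```python
import math


def eta_schedule(k, eta0):
    """Step size eta_k = eta0 / sqrt(k) for iteration k >= 1."""
    return eta0 / math.sqrt(k)


def beta_schedule(k):
    """Decay factor beta_k = 1 - 1 / sqrt(k) for iteration k >= 1."""
    return 1.0 - 1.0 / math.sqrt(k)


# Print both schedules at a few iterations, with eta0 = 3 as in most tables.
for k in (1, 4, 100, 2000):
    print(k, eta_schedule(k, eta0=3.0), beta_schedule(k))
```
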
### Environment

```
python >= 3.12
torch==2.10.0
numpy==2.4.2
matplotlib==3.10.8
```

---

## 1. Sphere Function

$$f(x, y) = x^2 + y^2$$

- **Global optimum**: $(0, 0)$, function value $0$
- **Starting point**: $(-3, 4)$

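For reference, a minimal sketch of the plain SGD baseline on this function, using the step size 0.01 from this section's parameter table and the 2000-iteration budget. The helper names are assumptions and this is not the experiment script. Because the gradient is $(2x, 2y)$, each step simply scales both coordinates by $1 - 2 \cdot 0.01 = 0.98$.

```python
def sphere(x, y):
    """Sphere function f(x, y) = x^2 + y^2."""
    return x * x + y * y


def sphere_grad(x, y):
    """Analytic gradient of the Sphere function."""
    return 2.0 * x, 2.0 * y


def run_sgd(x, y, lr=0.01, iters=2000):
    """Plain gradient descent without momentum (hypothetical helper)."""
    for _ in range(iters):
        gx, gy = sphere_grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y


x_final, y_final = run_sgd(-3.0, 4.0)
print(sphere(-3.0, 4.0), sphere(x_final, y_final))
```
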
### Optimizer Parameters

| Optimizer | lr | beta1 | beta2 | momentum | eta0 |
|-----------|-----|-------|-------|----------|------|
| SGD | 0.01 | - | - | - | - |
| Adagrad | 0.1 | - | - | - | - |
| SGDM | 0.01 | - | - | 0.9 | - |
| Adam | 0.1 | 0.9 | 0.999 | - | - |
| DMAdam | 0.1 | - | - | - | 3 |

---

## 2. Booth Function

$$f(x, y) = (x + 2y - 7)^2 + (2x + y - 5)^2$$

- **Global optimum**: $(1, 3)$, function value $0$
- **Starting point**: $(-8, 8)$

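The function and its analytic gradient can be sketched as below, with a central-difference check at the starting point. The helper names are assumptions for illustration; the gradient follows from differentiating the two squared residuals.

```python
def booth(x, y):
    """Booth function."""
    return (x + 2 * y - 7) ** 2 + (2 * x + y - 5) ** 2


def booth_grad(x, y):
    """Analytic gradient, written in terms of the two residuals."""
    r1 = x + 2 * y - 7
    r2 = 2 * x + y - 5
    return 2 * r1 + 4 * r2, 4 * r1 + 2 * r2


def numeric_grad(f, x, y, h=1e-6):
    """Central-difference gradient, used only as a sanity check."""
    gx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    gy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return gx, gy


print(booth(1, 3), booth(-8, 8))
print(booth_grad(-8, 8), numeric_grad(booth, -8.0, 8.0))
```
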
### Optimizer Parameters

| Optimizer | lr | beta1 | beta2 | momentum | eta0 |
|-----------|-----|-------|-------|----------|------|
| SGD | 0.01 | - | - | - | - |
| Adagrad | 0.1 | - | - | - | - |
| SGDM | 0.01 | - | - | 0.9 | - |
| Adam | 0.1 | 0.9 | 0.999 | - | - |
| DMAdam | 0.1 | - | - | - | 3 |

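### Convergence Analysis

SGDM oscillates severely, swinging back and forth across the valley floor, and converges very slowly. SGD and Adagrad converge slowly but steadily. Adam and DMAdam perform best; DMAdam takes the shortest path, sliding along the valley directly to the optimum.

---
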
## 3. Matyas Function

$$f(x, y) = 0.26(x^2 + y^2) - 0.48xy$$

- **Global optimum**: $(0, 0)$, function value $0$
- **Starting point**: $(4, -4)$

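A short sketch of the function, evaluated along its two diagonals. Substituting $y = x$ gives $f = 0.04x^2$, while $y = -x$ gives $f = x^2$, so the starting point $(4, -4)$ lies on the steep diagonal and the valley along $y = x$ is 25 times flatter. The helper name is an assumption.

```python
def matyas(x, y):
    """Matyas function."""
    return 0.26 * (x * x + y * y) - 0.48 * x * y


# Steep direction (y = -x) versus flat direction (y = x) at the same radius.
print(matyas(4.0, -4.0))  # starting point, on the steep diagonal
print(matyas(4.0, 4.0))   # same distance from the origin, on the flat diagonal
print(matyas(0.0, 0.0))   # global optimum
```
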
### Optimizer Parameters

| Optimizer | lr | beta1 | beta2 | momentum | eta0 |
|-----------|-----|-------|-------|----------|------|
| SGD | 0.01 | - | - | - | - |
| Adagrad | 0.1 | - | - | - | - |
| SGDM | 0.01 | - | - | 0.9 | - |
| Adam | 0.1 | 0.9 | 0.999 | - | - |
| DMAdam | 0.1 | - | - | - | 3 |

---

## 4. Beale Function

$$f(x, y) = (1.5 - x + xy)^2 + (2.25 - x + xy^2)^2 + (2.625 - x + xy^3)^2$$

- **Global optimum**: $(3, 0.5)$, function value $0$
- **Starting point**: $(2, 2)$

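The function is a sum of three squared residuals, which the sketch below makes explicit. All three residuals vanish at $(3, 0.5)$, which confirms the stated optimum. The helper name is an assumption.

```python
def beale(x, y):
    """Beale function as a sum of three squared residuals."""
    r1 = 1.5 - x + x * y
    r2 = 2.25 - x + x * y ** 2
    r3 = 2.625 - x + x * y ** 3
    return r1 * r1 + r2 * r2 + r3 * r3


print(beale(3.0, 0.5))  # global optimum
print(beale(2.0, 2.0))  # starting point
```
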
### Optimizer Parameters

| Optimizer | lr | beta1 | beta2 | momentum | eta0 |
|-----------|-----|-------|-------|----------|------|
| SGD | 0.01 | - | - | - | - |
| Adagrad | 0.1 | - | - | - | - |
| SGDM | 0.01 | - | - | 0.9 | - |
| Adam | 0.1 | 0.9 | 0.999 | - | - |
| DMAdam | 0.1 | - | - | - | 3 |

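### Convergence Analysis

SGDM diverges, leaving the plotted region entirely, and fails to converge. SGD converges extremely slowly and nearly stalls. Adagrad is also slow and does not reach the global minimum within the same number of iterations. Adam converges, but along a roundabout path. DMAdam is the only optimizer that converges directly to the global optimum $(3, 0.5)$, along a clean path.

---
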
## 5. Goldstein-Price Function

$$f(x,y) = [1 + (x+y+1)^2(19 - 14x + 3x^2 - 14y + 6xy + 3y^2)]$$
$$\times [30 + (2x-3y)^2(18 - 32x + 12x^2 + 48y - 36xy + 27y^2)]$$

- **Global optimum**: $(0, -1)$, function value $3$
- **Starting point**: $(-0.5, 1.5)$
- **Characteristics**: multiple local extrema, complex landscape

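A direct transcription of the formula, split into its two bracketed factors, might look like the sketch below (the helper name is an assumption). Evaluating it shows how far apart the scales are: the value at the starting point is roughly $1.97 \times 10^5$, against $3$ at the global optimum.

```python
def goldstein_price(x, y):
    """Goldstein-Price function as the product of its two bracketed factors."""
    a = 1 + (x + y + 1) ** 2 * (
        19 - 14 * x + 3 * x ** 2 - 14 * y + 6 * x * y + 3 * y ** 2
    )
    b = 30 + (2 * x - 3 * y) ** 2 * (
        18 - 32 * x + 12 * x ** 2 + 48 * y - 36 * x * y + 27 * y ** 2
    )
    return a * b


print(goldstein_price(0.0, -1.0))  # global optimum
print(goldstein_price(-0.5, 1.5))  # starting point
```
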
### Optimizer Parameters

| Optimizer | lr | beta1 | beta2 | momentum | eta0 |
|-----------|-----|-------|-------|----------|------|
| SGD | 0.01 | - | - | - | - |
| Adagrad | 0.1 | - | - | - | - |
| SGDM | 0.01 | - | - | 0.9 | - |
| Adam | 0.1 | 0.9 | 0.999 | - | - |
| DMAdam | 0.1 | - | - | - | **8** |

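### Convergence Analysis

SGD and SGDM barely move, and Adagrad converges slowly. Adam gets trapped in a local extremum. Only DMAdam converges to the global optimum $(0, -1)$, demonstrating its ability to escape local extrema on complex terrain.
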
## To Do

- The experimental results for the Bukin function still need further parameter tuning and will be added in a later update.