Linear Fitting
March 13, 2025 · root

Linear Equation

Linear equation (1):

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k + \epsilon$$

Equation (1) written in matrix form gives (2):

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix}_{n \times 1} = \begin{bmatrix} 1 & x_{11} & x_{12} & x_{13} & \dots & x_{1k} \\ 1 & x_{21} & x_{22} & x_{23} & \dots & x_{2k} \\ 1 & x_{31} & x_{32} & x_{33} & \dots & x_{3k} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & x_{n3} & \dots & x_{nk} \end{bmatrix}_{n \times (k+1)} \times \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}_{(k+1) \times 1} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \vdots \\ \epsilon_n \end{bmatrix}_{n \times 1}$$

That is:

$$y = X\beta + \epsilon$$

Linear Regression (LR)

Solution

Assumptions of linear regression:

  • The predictors x (x1, x2, x3, ..., xk) are not strictly linearly dependent on one another
  • The sample size n satisfies n >= 30 or n >= 3*(k+1)
  • The noise term ϵ is linearly uncorrelated with x and has zero mean

From linear equation (2):

$$\epsilon = y - X\beta$$

The goal of linear regression is to find the β that minimizes the mean squared error of ϵ. Ordinary least squares (OLS) gives the closed-form solution:

$$\beta = (X^T X)^{-1} X^T y$$
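
This closed form requires X^T X to be invertible, which is exactly what the first two assumptions above guarantee. Below is a minimal sketch of checking those assumptions with Eigen before fitting; the data values are illustrative and not from the original example.

cpp
#include <iostream>
#include <Eigen/Dense>
using namespace Eigen;

int main() {
    // Illustrative design matrix with n samples and k predictors
    MatrixXd X(5, 2);
    X << 1, 2,
         2, 1,
         3, 4,
         4, 3,
         5, 6;

    int n = X.rows();
    int k = X.cols();

    // Sample-size rule from the text: n >= 30 or n >= 3*(k+1)
    // (with only 5 toy samples this check reports "no")
    bool enoughSamples = (n >= 30) || (n >= 3 * (k + 1));

    // No strict linear dependence among predictors: X must have full column rank
    // (a second column that is an exact multiple of the first would fail this)
    ColPivHouseholderQR<MatrixXd> qr(X);
    bool fullRank = (qr.rank() == k);

    std::cout << "Enough samples: " << (enoughSamples ? "yes" : "no") << std::endl;
    std::cout << "Predictors independent: " << (fullRank ? "yes" : "no") << std::endl;
    return 0;
}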

Prediction

After solving for β, the prediction formula is:

$$\hat{y} = \hat{x}^T \beta$$

Code

cpp
#include <iostream>
#include <Eigen/Dense>
using namespace Eigen;
int main() {
    // Input data: the two predictors are chosen so that neither column is an
    // exact multiple of the other, otherwise X'X below would be singular
    MatrixXd X(5, 2);
    X << 1, 2,
         2, 1,
         3, 4,
         4, 3,
         5, 6;
    VectorXd y(5);
    y << 3, 6, 9, 12, 15;

    // Add a column of ones to X for the intercept term
    MatrixXd X_ones(X.rows(), X.cols() + 1);
    X_ones << MatrixXd::Ones(X.rows(), 1), X;

    // Calculate the least squares solution: beta = (X'X)^-1 X'y
    VectorXd beta = (X_ones.transpose() * X_ones).inverse() * X_ones.transpose() * y;

    // Print the coefficients (intercept and slopes)
    std::cout << "Intercept: " << beta(0) << std::endl;
    std::cout << "Slope 1: " << beta(1) << std::endl;
    std::cout << "Slope 2: " << beta(2) << std::endl;
    return 0;
}

Bayesian Linear Regression (BLR)

Solution

Assumptions of Bayesian linear regression:

  • The residual (noise) ϵ follows a normal distribution, and its variance σ² follows an inverse-Gamma distribution

Under this assumption, for a given X, the distribution of the observed y, i.e. the likelihood p(y|X, β, σ²), is:

$$p(y|X,\beta,\sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}(y - X\beta)^T (y - X\beta)\right)$$
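
As a small illustration, this likelihood can be evaluated directly (in log form, for numerical stability) for a candidate β and σ². The sketch below uses illustrative data and candidate parameter values that are not part of the original derivation.

cpp
#include <iostream>
#include <cmath>
#include <Eigen/Dense>
using namespace Eigen;

int main() {
    // Illustrative data and candidate parameters (not from the original text)
    MatrixXd X(3, 2);
    X << 1, 1,
         1, 2,
         1, 3;            // first column is the intercept term
    VectorXd y(3);
    y << 3, 6, 9;
    VectorXd beta(2);
    beta << 0.0, 3.0;     // candidate coefficients
    double sigma2 = 1.0;  // candidate noise variance

    const double PI = 3.14159265358979323846;
    int n = X.rows();
    VectorXd r = y - X * beta;  // residuals y - X*beta

    // Gaussian log-likelihood: -(n/2)*log(2*pi*sigma^2) - r'r / (2*sigma^2)
    double logLik = -0.5 * n * std::log(2.0 * PI * sigma2)
                    - r.squaredNorm() / (2.0 * sigma2);
    std::cout << "Log-likelihood: " << logLik << std::endl;
    return 0;
}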

Combining this with Bayes' theorem:

$$P(A|B) = \frac{P(A)\,P(B|A)}{P(B)}$$
  • P(A) is the prior probability of A; it does not take B into account
  • P(A|B) is the conditional probability of A given that B has occurred
  • P(B|A) is the conditional probability of B given that A has occurred
  • P(B) is the prior probability of B; it does not take A into account

we can derive the posterior distribution (3):

$$p(\beta,\sigma^2|X,y) \propto p(y|X,\beta,\sigma^2)\, p(\beta)\, p(\sigma^2)$$

Prediction

$$p(\hat{y}|\hat{x},X,y) = \int p(\hat{y}|\hat{x},\beta,\sigma^2)\, p(\beta,\sigma^2|X,y)\, d\beta\, d\sigma^2$$

Code

cpp
#include <iostream>
#include <Eigen/Dense>
using namespace Eigen;

int main() {
    // Input data
    MatrixXd X(5, 2);
    X << 1, 2,
         2, 4,
         3, 6,
         4, 8,
         5, 10;
    VectorXd y(5);
    y << 3, 6, 9, 12, 15;

    // Add a column of ones to X for the intercept term
    MatrixXd X_ones(X.rows(), X.cols() + 1);
    X_ones << MatrixXd::Ones(X.rows(), 1), X;

    // Prior / noise hyperparameters (treated as fixed, known values here)
    double alpha = 1.0; // Prior precision of the coefficients
    double beta = 1.0;  // Noise precision (1 / sigma^2)

    // Calculate posterior parameters
    int M = X_ones.cols();
    MatrixXd XtX = X_ones.transpose() * X_ones;
    MatrixXd Xty = X_ones.transpose() * y;
    MatrixXd Sigma_inv = alpha * MatrixXd::Identity(M, M) + beta * XtX;
    MatrixXd Sigma = Sigma_inv.inverse();
    VectorXd mu = beta * Sigma * Xty;

    // Calculate predictive distribution for new input data points
    MatrixXd X_star(1, 2); // New input point for prediction
    X_star << 6, 12;

    MatrixXd X_star_ones(X_star.rows(), X_star.cols() + 1);
    X_star_ones << MatrixXd::Ones(X_star.rows(), 1), X_star;

    double y_pred = (X_star_ones * mu).value();

    // Print the coefficients (intercept and slope)
    std::cout << "Intercept: " << mu(0) << std::endl;
    std::cout << "Slope 1: " << mu(1) << std::endl;
    std::cout << "Slope 2: " << mu(2) << std::endl;

    // Print the predicted output
    std::cout << "Predicted Output: " << y_pred << std::endl;

    return 0;
}

Gaussian Process Regression (GPR)

Solution

Assumption of Gaussian process regression:

  • The function values at any finite set of inputs x follow a joint Gaussian distribution

From the Gaussian model equation (4):

$$y = f(X) + \epsilon$$

and the prior distribution equation (5):

$$p(f(X)) \sim \mathcal{N}(m(X), K(X,X))$$
  • m(X) is the mean function
  • K(X,X) is the covariance matrix of X with itself

the posterior distribution is:

$$p(f(X)|X,y) \sim \mathcal{N}(\mu(X), \Sigma(X,X))$$

Prediction

$$p(\hat{y}|\hat{x},X,y) \sim \mathcal{N}(\mu(\hat{x}), \sigma^2(\hat{x}))$$
  • μ(x̂) is the predictive mean
  • σ²(x̂) is the predictive variance

Code

cpp
#include <iostream>
#include <Eigen/Dense>
#include <Eigen/Cholesky>

using namespace Eigen;

// Define the kernel function (Squared Exponential Kernel)
double kernel(const VectorXd& x1, const VectorXd& x2, double lengthScale, double variance) {
    double squaredDistance = (x1 - x2).squaredNorm();
    return variance * exp(-squaredDistance / (2.0 * lengthScale * lengthScale));
}

int main() {
    // Input data
    MatrixXd X(5, 1);
    X << 1,
         2,
         3,
         4,
         5;

    VectorXd y(5);
    y << 3, 6, 9, 12, 15;

    // New input point for prediction
    VectorXd X_star(1);
    X_star << 6;

    // Kernel hyperparameters
    double lengthScale = 1.0;
    double variance = 1.0;

    // Calculate the kernel matrix (covariance matrix)
    int n = X.rows();
    MatrixXd K(n, n);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            K(i, j) = kernel(X.row(i).transpose(), X.row(j).transpose(), lengthScale, variance);
        }
    }

    // Add small noise to the diagonal of the kernel matrix for numerical stability
    K.diagonal().array() += 1e-6;

    // Compute the kernel vector for the new input point
    VectorXd k_star(n);
    for (int i = 0; i < n; i++) {
        k_star(i) = kernel(X.row(i).transpose(), X_star, lengthScale, variance);
    }

    // Calculate the predictive mean and variance
    VectorXd mu = k_star.transpose() * K.inverse() * y;
    double sigma = kernel(X_star, X_star, lengthScale, variance) - (k_star.transpose() * K.inverse() * k_star).value();

    // Print the predictive mean and variance
    std::cout << "Predicted Mean: " << mu(0) << std::endl;
    std::cout << "Predicted Variance: " << sigma << std::endl;

    return 0;
}

Artificial Neural Network Linear Regression (ANNR)

Solution

An artificial neural network finds the weight parameters w that minimize the loss L by repeatedly propagating the prediction residuals backward through the network:

$$y_{pred} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_k x_k$$

$$L = \frac{1}{n}\sum_{i=1}^{n}\left(y_{pred,i} - y_{true,i}\right)^2$$

Solving the problem with a neural network proceeds in the following steps:

  1. Choose the network structure: for linear regression a single-layer network is usually sufficient.
  2. Choose the activation function: for linear regression use a linear (identity) activation.
  3. Forward pass: compute y_pred.
  4. Compute the loss value L.
  5. Backward pass: update the weights w using gradient descent or another optimizer.
  6. With the updated weights w, repeat steps 3-6 until the loss L is minimized.

Prediction

$$y_{pred} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_k x_k$$

Code

cpp
#include <iostream>
#include <Eigen/Dense>
using namespace Eigen;

// Linear Activation Function (Identity Function)
MatrixXd linearActivation(const MatrixXd& x) {
    return x;
}

// Neural Network Prediction: linear combination of the inputs followed by the
// linear (identity) activation
VectorXd predict(const MatrixXd& X, const MatrixXd& W) {
    return linearActivation(X * W);
}

int main() {
    // Input data (X) and true output (y)
    MatrixXd X(5, 2);
    X << 1, 2,
         2, 4,
         3, 6,
         4, 8,
         5, 10;

    VectorXd y(5);
    y << 3, 6, 9, 12, 15;

    // Add a column of ones to X for the bias term
    MatrixXd X_bias(X.rows(), X.cols() + 1);
    X_bias << MatrixXd::Ones(X.rows(), 1), X;

    // Randomly initialize the weights (including bias weight)
    MatrixXd W(X_bias.cols(), 1);
    W.setRandom();

    // Hyperparameters
    double learningRate = 0.01;
    int epochs = 1000;

    // Neural Network Training (Gradient Descent)
    for (int epoch = 0; epoch < epochs; ++epoch) {
        // Forward pass: Prediction
        VectorXd y_pred = predict(X_bias, W);

        // Calculate the loss (Mean Squared Error) and report it periodically
        VectorXd loss = y_pred - y;
        double meanSquaredError = loss.squaredNorm() / X.rows();
        if (epoch % 100 == 0) {
            std::cout << "Epoch " << epoch << ", MSE: " << meanSquaredError << std::endl;
        }

        // Backpropagation: gradient of the MSE with respect to W
        // (the constant factor of 2 is absorbed into the learning rate)
        MatrixXd gradient = X_bias.transpose() * loss / X.rows();
        W -= learningRate * gradient;
    }

    // Print the learned weights
    std::cout << "Learned Weights:" << std::endl;
    std::cout << W << std::endl;
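
    // Prediction at a new input using y_pred = w0 + w1*x1 + w2*x2
    // (the point (x1, x2) = (6, 12) is an added illustration that follows the
    //  pattern of the training data, not part of the original example)
    MatrixXd x_new(1, 3);
    x_new << 1, 6, 12;  // leading 1 matches the bias column
    std::cout << "Prediction at (6, 12): " << predict(x_new, W)(0) << std::endl;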

    return 0;
}
