k-Fold Cross-Validation Code

Posted on 2023-02-09, updated on 2023-02-10, filed under Machine Learning

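1. k-Fold Cross-Validation Code

k-fold cross-validation is a widely used method for estimating model performance in machine learning. It splits the training dataset into k non-overlapping subsets of roughly equal size. Each subset serves as the validation set exactly once while the remaining k-1 subsets are used for training, and the final result is the average of the k validation results.

Below is a Python implementation of k-fold cross-validation: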

import numpy as np

def learning_function(train_data):
    # Placeholder learning function; replace it with a real training routine in practice
    return train_data.mean()

def evaluate(model, test_data):
    # Placeholder evaluation function; replace it with a real metric in practice
    return (model - test_data.mean()) ** 2

def KFoldCV(D, A, k):
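    """
    k-fold cross-validation.
    Parameters:
    D: the dataset
    A: the learning function, called on the training portion of each fold
    k: the number of folds
    Returns the mean of the k per-fold evaluation results.
    """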
    n = len(D)
    performance = []
    for i in range(k):
        test_data = D[int(i * n / k) : int((i + 1) * n / k)]
        train_data = np.concatenate((D[: int(i * n / k)], D[int((i + 1) * n / k) :]), axis=0)
        model = A(train_data)
        p = evaluate(model, test_data)
        performance.append(p)
    return np.mean(performance)

# Generate random data
np.random.seed(0)
data = np.random.rand(100, 2)
# Run k-fold cross-validation
performance = KFoldCV(data, learning_function, 5)
print("Mean error:", performance)
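
KFoldCV above slices the data in its original order. If the rows are sorted (for example by class or by time), each fold may not be representative. The following is a minimal sketch of a shuffled variant, not part of the original code: the helper name shuffled_kfold_indices and its seed parameter are hypothetical, and it assumes k is at least 2.

```python
import numpy as np

def shuffled_kfold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) index pairs for k shuffled folds over n samples.

    Hypothetical helper for illustration; assumes 2 <= k <= n.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    rng = np.random.RandomState(seed)
    indices = rng.permutation(n)          # shuffled copy of 0..n-1
    folds = np.array_split(indices, k)    # k index arrays of nearly equal size
    for i in range(k):
        test_idx = folds[i]
        # Join every fold except the i-th into the training indices
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_idx, test_idx

# Usage with the same placeholder "model" (the mean) as in the code above
data = np.random.RandomState(0).rand(100, 2)
errors = []
for train_idx, test_idx in shuffled_kfold_indices(len(data), 5):
    model = data[train_idx].mean()
    errors.append((model - data[test_idx].mean()) ** 2)
mean_error = float(np.mean(errors))
print("Mean error:", mean_error)
```

Working with index arrays rather than slices keeps the data itself untouched and makes it easy to apply the same split to features and labels stored in separate arrays.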

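2. The np.concatenate Function

np.concatenate is a NumPy function that joins a sequence of arrays along an existing axis. Its two most commonly used parameters are:

arrays: the sequence (list or tuple) of arrays to join
axis: the axis to join along; 0 (the default) stacks the arrays row-wise, 1 joins them column-wise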

array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([[7, 8, 9], [10, 11, 12]])
array3 = np.array([[13, 14, 15], [16, 17, 18]])

# Concatenate along axis 0 (row-wise)
result = np.concatenate((array1, array2, array3), axis=0)
print("Concatenation along axis 0: \n", result)

# Concatenate along axis 1 (column-wise)
result = np.concatenate((array1, array2, array3), axis=1)
print("Concatenation along axis 1: \n", result)

Concatenation along axis 0: 
 [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]
 [13 14 15]
 [16 17 18]]
Concatenation along axis 1: 
 [[ 1  2  3  7  8  9 13 14 15]
 [ 4  5  6 10 11 12 16 17 18]]
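
The arrays in the example above all have the same shape, so both axes work. In general, np.concatenate only requires the arrays to agree on every dimension except the one being joined. The short sketch below, with made-up arrays a and b, illustrates that rule and what happens when it is violated:

```python
import numpy as np

a = np.zeros((2, 3))
b = np.ones((4, 3))

# Row counts differ, but axis 0 is the joined axis, so this works
rows = np.concatenate((a, b), axis=0)
print(rows.shape)  # (6, 3)

# Along axis 1 the row counts (2 vs 4) would have to match, so this fails
try:
    np.concatenate((a, b), axis=1)
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("ValueError:", exc)
```

This is why the training set in KFoldCV can be built from two slices of different lengths: they are joined along axis 0 and share the same number of columns.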

3. np.random.rand

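NumPy provides several functions for generating random numbers, including:

np.random.rand: uniformly distributed random numbers in [0, 1)
np.random.randn: random numbers drawn from the standard normal distribution
np.random.randint: random integers from a given range (the upper bound is exclusive)
np.random.choice: random samples drawn from a given one-dimensional array
np.random.permutation: a randomly permuted copy of a given one-dimensional array

Calling np.random.seed(0) before using these functions sets the seed of the random number generator, so the same sequence of random numbers is produced on every run.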

np.random.seed(0)
# Generate uniformly distributed random numbers in [0, 1)
result = np.random.rand(2, 3)
print("rand: \n", result)

# A second call continues the same random stream, so it returns different values
result = np.random.rand(2, 3)
print("rand: \n", result)

# Generate random numbers from the standard normal distribution
result = np.random.randn(2, 3)
print("randn: \n", result)

# Generate random integers in [0, 10)
result = np.random.randint(0, 10, size=(2, 3))
print("randint: \n", result)

# Fill an array of the given shape with elements picked at random from a 1-D array
array = np.array([1, 2, 3, 4, 5, 6])
result = np.random.choice(array, size=(2, 3))
print("choice: \n", result)

# Same as above with replacement made explicit (replace=True is the default), so elements may repeat
result = np.random.choice(array, size=(2, 3), replace=True)
print("choice with replacement: \n", result)

Example output (the values shown are illustrative and may not match an actual run):
rand: 
 [[0.31179588 0.69634349 0.37775184]
 [0.17960368 0.02467873 0.06724963]]
rand: 
 [[0.67939277 0.45369684 0.53657921]
 [0.89667129 0.99033895 0.21689698]]
randn: 
 [[-1.22543552  0.84436298 -1.00021535]
 [-1.5447711   1.18802979  0.31694261]]
randint: 
 [[7 0 3]
 [8 7 7]]
choice: 
 [[6 3 2]
 [1 5 1]]
choice with replacement: 
 [[4 5 5]
 [2 1 3]]
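
Two points from the list above are not exercised by the example: the reproducibility that np.random.seed provides, and np.random.permutation. The following minimal sketch covers both, and also shows np.random.choice with replace=False for contrast. The variable names are illustrative only.

```python
import numpy as np

# Re-seeding with the same value restarts the same random sequence
np.random.seed(0)
first = np.random.rand(3)
np.random.seed(0)
second = np.random.rand(3)
print(np.array_equal(first, second))  # True

# permutation(n) returns the integers 0..n-1 in random order
perm = np.random.permutation(6)
print(sorted(perm.tolist()))  # [0, 1, 2, 3, 4, 5]

# With replace=False every sampled element is distinct
array = np.array([1, 2, 3, 4, 5, 6])
sample = np.random.choice(array, size=4, replace=False)
print(len(set(sample.tolist())))  # 4
```

A permutation of the row indices is a convenient way to shuffle a dataset before splitting it into folds.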