How to tune decision trees in Python for model ensembling

2023-04-15 00:00:00 Tags: model, ensembling, how-to

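Model ensembling is an effective way to improve the accuracy and stability of predictions, and decision trees are among the most commonly used base models. A decision tree can be tuned with the following steps: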

1. Import the required libraries

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

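2. Define the parameter grid

param_grid = {
    'criterion': ['gini', 'entropy'],  # impurity measure used to evaluate a split
    'splitter': ['best', 'random'],  # strategy for choosing the split at each node
    'max_depth': [None, 10, 20, 30],  # maximum depth of the tree (None = unlimited)
    'min_samples_split': [2, 5, 10],  # minimum samples required to split an internal node
    'min_samples_leaf': [1, 2, 4],  # minimum samples required at a leaf node
    'max_features': [None, 'sqrt', 'log2'],  # features considered per split; 'auto' was removed in scikit-learn 1.3
    'random_state': [42]  # fixed random seed (a single value, so it is not actually searched)
}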
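Before running the search, it can help to know how many candidates the grid contains. The following is a minimal sketch using scikit-learn's ParameterGrid; the names n_candidates and n_folds are introduced here for illustration only.

```python
from sklearn.model_selection import ParameterGrid

# Grid from this step. None is used for max_features because 'auto'
# is no longer accepted by decision trees in scikit-learn 1.3 and later.
param_grid = {
    'criterion': ['gini', 'entropy'],
    'splitter': ['best', 'random'],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': [None, 'sqrt', 'log2'],
    'random_state': [42],
}

# ParameterGrid enumerates every combination: 2 * 2 * 4 * 3 * 3 * 3 * 1
n_candidates = len(ParameterGrid(param_grid))
n_folds = 5  # matches cv=5 in the grid search step

print(n_candidates)            # 432
print(n_candidates * n_folds)  # 2160 cross-validation fits
```

On a toy dataset this is cheap, but on real data the number of fits grows multiplicatively with every value added to the grid.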

3. Load the dataset

X_train = [[2, 3], [8, 8], [5, 6], [9, 12], [1, 1], [7, 10], [1, 6], [11, 12], [7, 6], [11, 11], [1, 10], [3, 2], [2, 6], [12, 11], [6, 7], [10, 10]]
y_train = ['pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程', 'pidancode.com', 'pidancode.com', 'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程']
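With a dataset this small, it is worth checking the class balance before cross-validating: for classifiers, GridSearchCV stratifies the folds, so every class should have at least as many samples as there are folds. A small sketch of that check, where n_splits and can_stratify are illustrative names:

```python
from collections import Counter

y_train = ['pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', 'pidancode.com',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程']

counts = Counter(y_train)
print(counts)  # Counter({'pidancode.com': 9, '皮蛋编程': 7})

n_splits = 5  # same value as cv=5 in the grid search step
smallest = min(counts.values())

# Stratified folds need at least n_splits samples in the rarest class
can_stratify = smallest >= n_splits
print(can_stratify)  # True
```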

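4. Run the grid search

dt = DecisionTreeClassifier()
# 5-fold cross-validation over every combination in param_grid, using all CPU cores
grid_search = GridSearchCV(dt, param_grid=param_grid, cv=5, n_jobs=-1)
grid_search.fit(X_train, y_train)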
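After fitting, the search object exposes the cross-validated scores of every candidate. The sketch below uses a deliberately smaller grid (small_grid, an assumption made here so that it runs quickly) to show how best_score_ and cv_results_ can be inspected.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X_train = [[2, 3], [8, 8], [5, 6], [9, 12], [1, 1], [7, 10], [1, 6], [11, 12],
           [7, 6], [11, 11], [1, 10], [3, 2], [2, 6], [12, 11], [6, 7], [10, 10]]
y_train = ['pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', 'pidancode.com',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程']

# Reduced grid for illustration only (3 * 3 = 9 candidates)
small_grid = {'max_depth': [None, 2, 3], 'min_samples_leaf': [1, 2, 4]}

search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid=small_grid, cv=5)
search.fit(X_train, y_train)

# Mean cross-validated accuracy of the best candidate
print(search.best_score_)

# Per-candidate mean and standard deviation of the test-fold scores
results = search.cv_results_
for params, mean, std in zip(results['params'],
                             results['mean_test_score'],
                             results['std_test_score']):
    print(params, round(mean, 3), round(std, 3))
```

With only three or four samples per fold, the standard deviations will be large, so small differences between candidates should not be over-interpreted.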

5. Retrieve the best parameters

best_params = grid_search.best_params_
print(best_params)

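6. Train a model with the best parameters

# Note: with the default refit=True, GridSearchCV has already refit this
# model on the full training set as grid_search.best_estimator_.
dt_best = DecisionTreeClassifier(**best_params)
dt_best.fit(X_train, y_train)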
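As an alternative to retraining by hand, the refit model stored on the search object can be used directly. The sketch below assumes a reduced grid (small_grid) and two made-up query points (new_points) purely for illustration, and compares the two routes.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X_train = [[2, 3], [8, 8], [5, 6], [9, 12], [1, 1], [7, 10], [1, 6], [11, 12],
           [7, 6], [11, 11], [1, 10], [3, 2], [2, 6], [12, 11], [6, 7], [10, 10]]
y_train = ['pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', 'pidancode.com',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程']

# Reduced grid for illustration only
small_grid = {'max_depth': [None, 2, 3], 'min_samples_leaf': [1, 2, 4]}
search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid=small_grid, cv=5)
search.fit(X_train, y_train)

# Route 1: the model GridSearchCV refit on all of X_train (refit=True by default)
refit_model = search.best_estimator_

# Route 2: retrain manually with the best parameters and the same seed
manual_model = DecisionTreeClassifier(random_state=42, **search.best_params_)
manual_model.fit(X_train, y_train)

# Hypothetical query points, not part of the original dataset
new_points = [[2, 2], [10, 11]]
predictions = refit_model.predict(new_points)
print(predictions)

# Both routes use the same data, parameters, and seed
same = list(refit_model.predict(X_train)) == list(manual_model.predict(X_train))
print(same)
```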

Complete code:

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

# Candidate values for each hyperparameter
param_grid = {
    'criterion': ['gini', 'entropy'],
    'splitter': ['best', 'random'],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': [None, 'sqrt', 'log2'],  # 'auto' was removed in scikit-learn 1.3
    'random_state': [42]
}

X_train = [[2, 3], [8, 8], [5, 6], [9, 12], [1, 1], [7, 10], [1, 6], [11, 12], [7, 6], [11, 11], [1, 10], [3, 2], [2, 6], [12, 11], [6, 7], [10, 10]]
y_train = ['pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程', 'pidancode.com', 'pidancode.com', 'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程']

dt = DecisionTreeClassifier()
grid_search = GridSearchCV(dt, param_grid=param_grid, cv=5, n_jobs=-1)
grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
print(best_params)

dt_best = DecisionTreeClassifier(**best_params)
dt_best.fit(X_train, y_train)

Sample output:

{'criterion': 'gini', 'max_depth': None, 'max_features': 'sqrt', 'min_samples_leaf': 1, 'min_samples_split': 2, 'random_state': 42, 'splitter': 'random'}
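The article does not show the ensembling step itself. As one possible illustration, the tuned tree can serve as the base learner of a bagging ensemble. This is a sketch under stated assumptions: bagging is chosen here only as an example of an ensemble method, tuned_params is copied from the output above, and n_estimators=25 is an arbitrary value.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X_train = [[2, 3], [8, 8], [5, 6], [9, 12], [1, 1], [7, 10], [1, 6], [11, 12],
           [7, 6], [11, 11], [1, 10], [3, 2], [2, 6], [12, 11], [6, 7], [10, 10]]
y_train = ['pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', 'pidancode.com',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程']

# Best parameters as printed in the output above
tuned_params = {'criterion': 'gini', 'max_depth': None, 'max_features': 'sqrt',
                'min_samples_leaf': 1, 'min_samples_split': 2,
                'random_state': 42, 'splitter': 'random'}

# The base estimator is passed positionally so the call works whether the
# keyword is named estimator or base_estimator in the installed version.
ensemble = BaggingClassifier(DecisionTreeClassifier(**tuned_params),
                             n_estimators=25, random_state=42)

# Cross-validated accuracy of the ensemble (5 stratified folds)
scores = cross_val_score(ensemble, X_train, y_train, cv=5)
print(scores.mean())

# Fit on the full training set for later use
ensemble.fit(X_train, y_train)
print(len(ensemble.estimators_))  # 25
```

Parameters tuned for a single tree are not necessarily the best choice for its role inside an ensemble, so the ensemble itself can also be passed to GridSearchCV if needed.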
