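How to tune decision tree hyperparameters for model ensembling in Python
Model ensembling is an effective way to improve a model's predictive accuracy and stability, and the decision tree is one of the most widely used base models. A decision tree can be tuned with scikit-learn's grid search in the following steps: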
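1. Import the required libraries
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
2. Define the parameter grid
param_grid = {
    'criterion': ['gini', 'entropy'],     # impurity measure used to evaluate a split
    'splitter': ['best', 'random'],       # strategy for choosing the split at each node
    'max_depth': [None, 10, 20, 30],      # maximum depth of the tree (None = unlimited)
    'min_samples_split': [2, 5, 10],      # minimum samples required to split an internal node
    'min_samples_leaf': [1, 2, 4],        # minimum samples required at a leaf node
    'max_features': ['sqrt', 'log2'],     # features considered per split ('auto' was removed in scikit-learn 1.3)
    'random_state': [42]                  # random seed, fixed for reproducibility
}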
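Before running the search it can help to know how many candidates the grid expands to, because grid search fits every combination once per fold. The following is a minimal sketch, assuming the grid above without the 'auto' option and the 5-fold setting used later in this article; `n_candidates` and `n_fits` are illustrative names, not part of the original code.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'criterion': ['gini', 'entropy'],
    'splitter': ['best', 'random'],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2'],
    'random_state': [42],
}

# ParameterGrid enumerates the Cartesian product of all listed values.
n_candidates = len(ParameterGrid(param_grid))

# With cv=5, each candidate is fitted once per fold (the final refit is extra).
n_fits = n_candidates * 5

print(n_candidates, n_fits)  # 288 1440
```

If the count grows too large, trimming the value lists is usually the simplest way to shorten the search.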
3. Load the dataset
X_train = [[2, 3], [8, 8], [5, 6], [9, 12], [1, 1], [7, 10], [1, 6], [11, 12],
           [7, 6], [11, 11], [1, 10], [3, 2], [2, 6], [12, 11], [6, 7], [10, 10]]
y_train = ['pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', 'pidancode.com',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程']
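With only 16 samples, it is worth checking the class counts before choosing the number of folds, since cross-validation for classifiers in scikit-learn splits in a stratified way by default. A small sketch of that check, assuming the 5 folds used in this article; `class_counts` and `n_splits` are names introduced here for illustration.

```python
from collections import Counter

y_train = ['pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', 'pidancode.com',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程']

# Count how many samples each label has.
class_counts = Counter(y_train)
n_splits = 5  # assumed fold count, matching cv=5 in the article

# Each fold can contain every class only if the rarest class
# has at least n_splits samples.
smallest = min(class_counts.values())
enough_per_class = smallest >= n_splits

print(dict(class_counts))   # {'pidancode.com': 9, '皮蛋编程': 7}
print(enough_per_class)     # True
```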
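4. Train the model
dt = DecisionTreeClassifier()
# Exhaustive search over param_grid with 5-fold cross-validation, using all CPU cores
grid_search = GridSearchCV(dt, param_grid=param_grid, cv=5, n_jobs=-1)
grid_search.fit(X_train, y_train)
5. Select the best parameters
best_params = grid_search.best_params_
print(best_params)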
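Beyond `best_params_`, the fitted search object also records the cross-validated score of every candidate, which shows how close the runners-up were. The sketch below uses a deliberately smaller grid (`small_grid`, an assumption made here to keep the run short) on the same toy data; it is an illustration, not the article's full search.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X_train = [[2, 3], [8, 8], [5, 6], [9, 12], [1, 1], [7, 10], [1, 6], [11, 12],
           [7, 6], [11, 11], [1, 10], [3, 2], [2, 6], [12, 11], [6, 7], [10, 10]]
y_train = ['pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', 'pidancode.com',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程']

# Hypothetical reduced grid: 3 x 2 = 6 candidates.
small_grid = {'max_depth': [None, 2, 4], 'min_samples_leaf': [1, 2]}

search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid=small_grid, cv=5)
search.fit(X_train, y_train)

# Mean cross-validated accuracy of the winning candidate.
print(search.best_params_, search.best_score_)

# Per-candidate mean and standard deviation of the test score across folds.
results = search.cv_results_
for params, mean, std in zip(results['params'],
                             results['mean_test_score'],
                             results['std_test_score']):
    print(params, round(float(mean), 3), round(float(std), 3))
```

On a dataset this small the fold scores are noisy, so small differences between candidates should not be over-interpreted.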
6. Train the model with the best parameters
dt_best = DecisionTreeClassifier(**best_params)
dt_best.fit(X_train, y_train)
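As an alternative to refitting by hand, `GridSearchCV` refits the best candidate on the full training set when `refit=True` (its default), and exposes the result as `best_estimator_`. A minimal sketch under that assumption, again with a reduced illustrative grid (`small_grid`) and two made-up query points:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X_train = [[2, 3], [8, 8], [5, 6], [9, 12], [1, 1], [7, 10], [1, 6], [11, 12],
           [7, 6], [11, 11], [1, 10], [3, 2], [2, 6], [12, 11], [6, 7], [10, 10]]
y_train = ['pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', 'pidancode.com',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程']

# Hypothetical reduced grid, used only to keep the example quick.
small_grid = {'max_depth': [None, 2, 4], 'min_samples_leaf': [1, 2]}

search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid=small_grid, cv=5, refit=True)
search.fit(X_train, y_train)

# The refitted model already carries the winning hyperparameters.
best_tree = search.best_estimator_
tree_params = best_tree.get_params()
params_match = all(tree_params[name] == value
                   for name, value in search.best_params_.items())
print(params_match)  # True

# Predict labels for two illustrative points.
predictions = best_tree.predict([[2, 2], [11, 12]])
print(list(predictions))
```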
Complete code example:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    'criterion': ['gini', 'entropy'],
    'splitter': ['best', 'random'],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2'],     # 'auto' was removed in scikit-learn 1.3
    'random_state': [42]
}

X_train = [[2, 3], [8, 8], [5, 6], [9, 12], [1, 1], [7, 10], [1, 6], [11, 12],
           [7, 6], [11, 11], [1, 10], [3, 2], [2, 6], [12, 11], [6, 7], [10, 10]]
y_train = ['pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', 'pidancode.com',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程']

dt = DecisionTreeClassifier()
grid_search = GridSearchCV(dt, param_grid=param_grid, cv=5, n_jobs=-1)
grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
print(best_params)

dt_best = DecisionTreeClassifier(**best_params)
dt_best.fit(X_train, y_train)
Sample output (the selected values may vary with the scikit-learn version):
{'criterion': 'gini', 'max_depth': None, 'max_features': 'sqrt', 'min_samples_leaf': 1, 'min_samples_split': 2, 'random_state': 42, 'splitter': 'random'}
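The article stops at the tuned tree and does not show the ensembling step itself. As one possible continuation, the sketch below plugs the parameters from the sample output into a bagging ensemble. Bagging is chosen here purely for illustration (the article does not say which ensembling method is intended), and `n_estimators=10` and the two query points are assumptions.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X_train = [[2, 3], [8, 8], [5, 6], [9, 12], [1, 1], [7, 10], [1, 6], [11, 12],
           [7, 6], [11, 11], [1, 10], [3, 2], [2, 6], [12, 11], [6, 7], [10, 10]]
y_train = ['pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程',
           'pidancode.com', '皮蛋编程', 'pidancode.com', 'pidancode.com',
           'pidancode.com', '皮蛋编程', 'pidancode.com', '皮蛋编程']

# Parameters copied from the sample output above.
tuned_params = {'criterion': 'gini', 'max_depth': None, 'max_features': 'sqrt',
                'min_samples_leaf': 1, 'min_samples_split': 2,
                'random_state': 42, 'splitter': 'random'}

# The base estimator is passed positionally because its keyword name
# differs between scikit-learn versions.
base_tree = DecisionTreeClassifier(**tuned_params)
ensemble = BaggingClassifier(base_tree, n_estimators=10, random_state=42)
ensemble.fit(X_train, y_train)

# Each ensemble member is a tuned tree trained on a bootstrap sample.
print(len(ensemble.estimators_))  # 10

predictions = ensemble.predict([[2, 2], [11, 12]])
print(list(predictions))
```

Averaging several trees trained on resampled data is what typically provides the stability gain attributed to ensembling at the start of this article.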