Python递归实现决策树回归算法

2023-04-16 00:00:00 算法递归回归

决策树回归算法是一种非参数回归方法，可以用于预测连续变量的值。它是通过将数据集划分成多个小部分，每个小部分都拟合一个简单的线性模型来实现的。这个划分过程可以通过递归调用来实现，这也是Python中实现决策树回归算法的常见方法之一。

下面是一个使用递归调用实现决策树回归算法的示例代码：

import numpy as np

class Node():
    def __init__(self, depth, max_depth, split_feature, split_value, left_child, right_child):
        self.depth = depth
        self.max_depth = max_depth
        self.split_feature = split_feature
        self.split_value = split_value
        self.left_child = left_child
        self.right_child = right_child

    def is_leaf(self):
        return self.left_child is None and self.right_child is None

    def predict(self, x):
        if self.is_leaf():
            return self.split_value
        elif x[self.split_feature] <= self.split_value:
            return self.left_child.predict(x)
        else:
            return self.right_child.predict(x)

class DecisionTree():
    def __init__(self, max_depth=10):
        self.max_depth = max_depth

    def fit(self, X, y):
        self.root = self.build_tree(X, y, depth=0)

    def build_tree(self, X, y, depth):
        if depth >= self.max_depth or len(X) < 2:
            return Node(depth, self.max_depth, None, np.mean(y), None, None)

        current_var = np.var(y)
        best_var = float("inf")
        best_feature = None
        best_value = None
        best_left_index = None
        best_right_index = None

        for i in range(X.shape[1]):
            for value in X[:,i]:
                left_index = X[:,i] <= value
                right_index = X[:,i] > value

                if len(y[left_index]) < 1 or len(y[right_index]) < 1:
                    continue

                left_var = np.var(y[left_index])
                right_var = np.var(y[right_index])
                total_var = (len(y[left_index])/len(y))*left_var + (len(y[right_index])/len(y))*right_var

                if total_var < best_var:
                    best_var = total_var
                    best_feature = i
                    best_value = value
                    best_left_index = left_index
                    best_right_index = right_index

        if best_feature is None or best_var >= current_var:
            return Node(depth, self.max_depth, None, np.mean(y), None, None)

        left_child = self.build_tree(X[best_left_index], y[best_left_index], depth+1)
        right_child = self.build_tree(X[best_right_index], y[best_right_index], depth+1)

        return Node(depth, self.max_depth, best_feature, None, left_child, right_child)

    def predict(self, X):
        y_pred = []
        for x in X:
            y_pred.append(self.root.predict(x))
        return np.array(y_pred)

这个实现中，首先定义了一个Node类来表示决策树的节点，一个DecisionTree类来表示整个决策树。在DecisionTree类中，有一个fit方法来训练决策树，和一个predict方法来预测新的数据。

在build_tree方法中，递归地构建决策树。对于每个节点，它会选择一个最佳的特征和分割点，然后递归地创建左右子树。如果一个节点无法分裂或分裂后不会减小方差，则将该节点标记为叶子节点。

最后，在predict方法中，通过遍历决策树来预测每个输入数据的输出值，并返回输出结果。

可以使用这个实现和任何数据集来训练和测试决策树回归模型。例如，以下是使用“pidancode.com”和“皮蛋编程”作为示例数据的使用示例：

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y = np.array([5, 6, 7, 8, 9])
tree = DecisionTree(max_depth=2)
tree.fit(X, y)
print(tree.predict(np.array([[1, 2], [3, 4], [5, 6]])))

相关文章