Yellowbrick使用笔记3-特征分析可视化

2023-02-27 00:00:00 数据 可视化 绘制 特征 流形

特征分析可视化工具设计用于在数据空间中可视化实例,以便检测可能影响下游拟合的特征或目标。因为ML操作高维数据集(通常至少35个),可视化工具将重点放在聚合、优化和其他技术上,以提供对数据的概述。这是Yellowbrick的意图,指导过程将允许数据科学家缩放和过滤,并探索他们的实例和维度之间的关系。

代码下载

目前,我们实现了以下功能分析可视化工具:

  • 特征排名Rank Features:对单个特征和成对特征进行排名以检测协方差
  • RadViz Visualizer:沿围绕圆形排列的轴绘制数据点以检测可分离性
  • 平行坐标Parallel Coordinates:沿垂直轴将样本绘制为线以检测类或聚类
  • PCA投影:使用PCA将更高维投影到可视空间中
  • 流形可视化Manifold Visualization:使用流形学习可视化高维数据
  • 双变量关系图:(又名Jointplots)绘制特征和目标之间的二维相关性

功能分析可视化工具Transformer从scikit-learn 实现API,这意味着它们可以用作Pipeline(尤其是a VisualPipeline)中的中间转换步骤。它们以相同的方式实例化,然后在它们上调用fit和transform,从而正确绘制了实例。后show被调用以完成并显示图像。
头文件调用如下:

# Feature Analysis Imports
# NOTE that all these are available for import directly from the ``yellowbrick.features`` module
from yellowbrick.features.rankd import Rank1D, Rank2D
from yellowbrick.features.radviz import RadViz
from yellowbrick.features.pcoords import ParallelCoordinates
from yellowbrick.features.jointplot import JointPlotVisualizer
from yellowbrick.features.pca import PCADecomposition
from yellowbrick.features.manifold import Manifold

本文如果数据集下载不下来,查看下面地址,然后放入yellowbrick安装目录\datasets\fixtures文件夹:

{
"bikeshare": {
"url": "https://s3.amazonaws.com/ddl-data-lake/yellowbrick/v1.0/bikeshare.zip",
"signature": "4ed07a929ccbe0171309129e6adda1c4390190385dd6001ba9eecc795a21eef2"
},
"hobbies": {
"url": "https://s3.amazonaws.com/ddl-data-lake/yellowbrick/v1.0/hobbies.zip",
"signature": "6114e32f46baddf049a18fb05bad3efa98f4e6a0fe87066c94071541cb1e906f"
},
"concrete": {
"url": "https://s3.amazonaws.com/ddl-data-lake/yellowbrick/v1.0/concrete.zip",
"signature": "5807af2f04e14e407f61e66a4f3daf910361a99bb5052809096b47d3cccdfc0a"
},
"credit": {
"url": "https://s3.amazonaws.com/ddl-data-lake/yellowbrick/v1.0/credit.zip",
"signature": "2c6f5821c4039d70e901cc079d1404f6f49c3d6815871231c40348a69ae26573"
},
"energy": {
"url": "https://s3.amazonaws.com/ddl-data-lake/yellowbrick/v1.0/energy.zip",
"signature": "174eca3cd81e888fc416c006de77dbe5f89d643b20319902a0362e2f1972a34e"
},
"game": {
"url": "https://s3.amazonaws.com/ddl-data-lake/yellowbrick/v1.0/game.zip",
"signature": "ce799d1c55fcf1985a02def4d85672ac86c022f8f7afefbe42b20364fba47d7a"
},
"mushroom": {
"url": "https://s3.amazonaws.com/ddl-data-lake/yellowbrick/v1.0/mushroom.zip",
"signature": "f79fdbc33b012dabd06a8f3cb3007d244b6aab22d41358b9aeda74417c91f300"
},
"occupancy": {
"url": "https://s3.amazonaws.com/ddl-data-lake/yellowbrick/v1.0/occupancy.zip",
"signature": "0b390387584586a05f45c7da610fdaaf8922c5954834f323ae349137394e6253"
},
"spam": {
"url": "https://s3.amazonaws.com/ddl-data-lake/yellowbrick/v1.0/spam.zip",
"signature": "000309ac2b61090a3001de3e262a5f5319708bb42791c62d15a08a2f9f7cb30a"
},
"walking": {
"url": "https://s3.amazonaws.com/ddl-data-lake/yellowbrick/v1.0/walking.zip",
"signature": "7a36615978bc3bb74a2e9d5de216815621bd37f6a42c65d3fc28b242b4d6e040"
},
"nfl": {
"url": "https://s3.amazonaws.com/ddl-data-lake/yellowbrick/v1.0/nfl.zip",
"signature": "4989c66818ea18217ee0fe3a59932b963bd65869928c14075a5c50366cb81e1f"
}
}

# 多行输出
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"


1 特征排名Rank Features

Rank1D和Rank2D使用各种指标对单个要素或要素对进行评估,这些指标以[-1,1]或[0,1]等级对要素进行评分,从而可以对它们进行排名。数在左下角的三角形热图上可视化,因此可以轻松识别特征对之间的模式以进行下游分析。Rank1D, Rank2D具体对比如下:

展示器Rank1D, Rank2D
快速使用方法rank1d(), rank2d()
模型通用线性模型
工作流程特征工程和模型选择

相关文章