Plotly.py Sankey Diagram - Controlling Node Destination

Question

I'm running into a problem similar to one that was posted previously:

Plotly: How to set node positions in a Sankey Diagram?

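In my Sankey diagram, I need all values that end with the same character to line up in the same vertical column. There are three vertical columns in total, and I want (A) in the first column, (B) in the second and (C) in the third. The earlier post provides a custom function that assigns nodes ending with the same character to the same destination, which I have modified to fit my dataset as follows: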

import plotly.graph_objects as go

# Extract the list of nodes and the Source / Target link indices from the my_df DataFrame

all_nodes = my_df.Source.values.tolist() + my_df.Target.values.tolist()
values = my_df.Value.values.tolist()
source_indices = [all_nodes.index(source) for source in my_df.Source]
target_indices = [all_nodes.index(target) for target in my_df.Target]
label_names = all_nodes + my_df.Value.values.tolist()
print(label_names)

# Function to assign identical x-positions to label names that have a common ending ((A), (B), (C))

def nodify(node_names):
    # unique name endings: the letter before the closing ")", e.g. "1(A)" -> "A"
    ends = sorted(set(e[-2] for e in node_names))
    # interval between columns
    steps = 0.5
    # x-value for each unique name ending, used as the node position
    nodes_x = {}
    xVal = 0.5
    for e in ends:
        nodes_x[str(e)] = xVal
        xVal += steps

    # x and y values in list form
    x_values = [nodes_x[n[-2]] for n in node_names]
    y_values = []
    y_val = 0
    for n in node_names:
        y_values.append(y_val)
        y_val += .001
    return x_values, y_values

nodified = nodify(node_names=all_nodes)

# Plot the Sankey diagram from my_df with node destination control

fig = go.Figure(data=[go.Sankey(
    arrangement='snap',
    node=dict(
        pad=8,
        thickness=10,
        line=dict(color="black", width=0.5),
        label=all_nodes,
        color="blue",
        x=nodified[0],
        y=nodified[1],
    ),
    # Add links
    link=dict(
        source=source_indices,
        target=target_indices,
        value=my_df.Value,
    ),
)])

fig.update_layout(title_text="My Title",
                  font_size=10,
                  autosize=True,
                  height=2000,
                  width=2000)
fig.show()

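The destination assignment didn't work for me at all until I found an open GitHub issue (#3002) indicating that Plotly doesn't like x and y coordinates set to 0. So I changed 'xVal' to start at 0.5 instead of 0, which snapped most of the nodes to their destinations; only four (B) values still end up in the (C) column.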

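  • I realize that my current 'y_val' still starts at 0, but when I tried changing it to 1e-09, everything fell into chaos.
  • I have tried increasing the height/width and bucketing my nodes to reduce the number of nodes (in case it's a fit issue), but in both cases I still get a few (B) values in the vertical (C) column.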
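For reference, the x-position part of nodify can be run on its own. The helper below, column_x_positions, is a hypothetical extraction of that logic (not code from the original post): with xVal starting at 0.5 and steps = 0.5, the three endings map to 0.5, 1.0 and 1.5, so only the (A) column lands strictly inside the 0 to 1 range that Plotly uses for normalized node positions.

```python
# Hypothetical helper: reproduces only the x-position logic of nodify()
# for node names ending in "(A)", "(B)" and "(C)".
def column_x_positions(node_names, start=0.5, step=0.5):
    # the column letter is the character just before the closing ")"
    ends = sorted(set(n[-2] for n in node_names))
    return {e: start + i * step for i, e in enumerate(ends)}


positions = column_x_positions(["1(A)", "2(B)", "44(C)"])
print(positions)  # {'A': 0.5, 'B': 1.0, 'C': 1.5}
```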

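Is there something I'm missing about the Plotly coordinate system, or about node destinations in general, that would help me understand why Plotly keeps overriding my node destination assignments for a handful of the total nodes?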

Sample DataFrame:

    Source  Target  Value
0   1(A)    11(B)   6
1   1(A)    12(B)   2
2   1(A)    13(B)   20
3   1(A)    14(B)   1
4   1(A)    15(B)   1
5   1(A)    2(B)    17
6   1(A)    16(B)   5
7   1(A)    17(B)   9
8   1(A)    18(B)   6
9   1(A)    19(B)   5
10  1(A)    20(B)   255
11  1(A)    21(B)   1
12  1(A)    22(B)   9
13  1(A)    3(B)    200
14  1(A)    23(B)   1
15  1(A)    4(B)    1035
16  1(A)    24(B)   14
17  1(A)    25(B)   20
18  1(A)    26(B)   2
19  1(A)    27(B)   222
20  1(A)    28(B)   8
21  1(A)    29(B)   44
22  1(A)    5(B)    3
23  1(A)    6(B)    1529
24  1(A)    30(B)   1
25  1(A)    31(B)   2
26  1(A)    7(B)    6
27  1(A)    32(B)   1
28  1(A)    8(B)    10
29  1(A)    33(B)   11
30  1(A)    34(B)   35
31  1(A)    35(B)   1
32  1(A)    36(B)   41
33  1(A)    37(B)   6
34  1(A)    38(B)   4
35  1(A)    39(B)   2
36  1(A)    40(B)   68
37  1(A)    41(B)   46
38  1(A)    42(B)   24
39  1(A)    9(B)    21
40  1(A)    10(B)   13
41  1(A)    43(B)   6
42  2(B)    44(C)   12
43  3(B)    45(C)   19
44  4(B)    46(C)   1
45  5(B)    47(C)   6
46  6(B)    46(C)   2
47  6(B)    48(C)   1
48  6(B)    49(C)   1
49  7(B)    50(C)   84
50  8(B)    51(C)   2
51  9(B)    46(C)   4
52  10(B)   52(C)   2
53  10(B)   52(C)   2
54  10(B)   53(C)   8
55  10(B)   53(C)   8
56  10(B)   53(C)   12
57  10(B)   53(C)   20
58  10(B)   53(C)   10
59  10(B)   53(C)   4
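To run the code above against this sample, the rows can be loaded into a DataFrame whose column names match the ones that code reads (Source, Target, Value; the names are inferred from the code, and only a handful of rows are typed in here).

```python
import pandas as pd

# A handful of rows copied from the sample above (rows 0, 5, 13, 42, 43, 46, 47)
my_df = pd.DataFrame(
    {
        "Source": ["1(A)", "1(A)", "1(A)", "2(B)", "3(B)", "6(B)", "6(B)"],
        "Target": ["11(B)", "2(B)", "3(B)", "44(C)", "45(C)", "46(C)", "48(C)"],
        "Value": [6, 17, 200, 12, 19, 2, 1],
    }
)

# The column letter is the character before the closing ")", e.g. "44(C)" -> "C"
letters = sorted(set(my_df["Source"].str[-2]) | set(my_df["Target"].str[-2]))
print(letters)  # ['A', 'B', 'C']
```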

Any help is appreciated!


Solution

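  • You haven't provided sample data, so I've built a generator that matches what you describe.
  • The normalized x and y ranges need to be > 0 and < 1.
  • I used the same approach as in this answer, plotly sankey graph data formatting, to generate the Sankey from a data frame.
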
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import itertools

S = 40
labels = [str(p + 1) + s for s, p in itertools.product(list("ABC"), range(5))]
df = pd.DataFrame(
    {
        "source": np.random.choice(labels, S),
        "target": np.random.choice(labels, S),
        "value": np.random.randint(1, 10, S),
    }
)
# make sure paths are valid...
df = df.loc[df["source"].str[-1].apply(ord) < df["target"].str[-1].apply(ord)]
df = df.groupby(["source", "target"], as_index=False).sum()


def factorize(s):
    a = pd.factorize(s, sort=True)[0]
    return (a + 0.01) / (max(a) + 0.1)


# unique nodes
nodes = np.unique(df[["source", "target"]], axis=None)
nodes = pd.Series(index=nodes, data=range(len(nodes)))
# work out positioning of nodes
nodes = (
    nodes.to_frame("id")
    .assign(
        x=lambda d: factorize(d.index.str[-1]),
        y=lambda d: factorize(d.index.str[:-1]),
    )
)

# now simple job of building sankey
fig = go.Figure(
    go.Sankey(
        arrangement="snap",
        node={"label": nodes.index, "x": nodes["x"], "y": nodes["y"]},
        link={
            "source": nodes.loc[df["source"], "id"],
            "target": nodes.loc[df["target"], "id"],
            "value": df["value"],
        },
    )
)

fig
