Plotly.py Sankey Diagrams: Controlling Node Destination
Problem description
I have an issue similar to a previously posted question:
Plotly: How to set node positions in a Sankey Diagram?
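In my Sankey diagram, I need all values that end with the same character to align in the same vertical column. There are three vertical columns in total, and I'd like (A) in the first, (B) in the second, and (C) in the third. An answer to the previous post provides a custom function that assigns nodes ending in the same character to the same destination. I have modified that function to fit my dataset, as shown below:
# Extract list of nodes and list of Source / Target links from my_df DataFrame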
all_nodes = my_df.Source.values.tolist() + my_df.Target.values.tolist()
values = my_df.Value.values.tolist()
source_indices = [all_nodes.index(source) for source in my_df.Source]
target_indices = [all_nodes.index(target) for target in my_df.Target]
label_names = all_nodes + my_df.Value.values.tolist()
print(label_names)

# Function to assign identical x-positions to label names that have a common ending ((A),(B),(C))
def nodify(node_names):
    node_names = all_nodes
    # unique name endings
    ends = sorted(list(set([e[-2] for e in node_names])))
    # intervals
    steps = 0.5
    # x-values for each unique name ending for input as node position
    nodes_x = {}
    xVal = 0.5
    for e in ends:
        nodes_x[str(e)] = xVal
        xVal += steps
    # x and y values in list form
    x_values = [nodes_x[n[-2]] for n in node_names]
    y_values = []
    y_val = 0
    for n in node_names:
        y_values.append(y_val)
        y_val += .001
    return x_values, y_values

nodified = nodify(node_names=all_nodes)

# Plot the Sankey Diagram from my_df with node destination control
fig = go.Figure(data=[go.Sankey(
    arrangement='snap',
    node=dict(
        pad=8,
        thickness=10,
        line=dict(color="black", width=0.5),
        label=all_nodes,
        color="blue",
        x=nodified[0],
        y=nodified[1]
    ),
    # Add links
    link=dict(
        source=source_indices,
        target=target_indices,
        value=my_df.Value,
    ))])
fig.update_layout(title_text="My Title",
                  font_size=10,
                  autosize=True,
                  height=2000,
                  width=2000
                  )
fig.show()
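The destination assignment did not work at all for me until I found an open GitHub issue (#3002) indicating that Plotly does not like x and y coordinates set to 0. I therefore changed 'xVal' to start at 0.5 rather than 0, which snapped most node destinations into place, except for four (B) values that still end up in the (C) column.
- I realize my 'y_val' currently still starts at 0, but when I try to swap in 1e-09, everything is thrown into chaos.
- I have tried extending the height/width and bucketing my nodes so that there are fewer of them (in case it was a fit issue), but in both cases I still end up with a few (B) values in the vertical (C) column.
Is there anything I'm missing about the Plotly coordinate system, or about node destinations in general, that would help me understand why Plotly keeps overriding my node destination assignment for a handful of the nodes?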
Sample DataFrame (columns, left to right: row index, Source, Target, Value):
0 1(A) 11(B) 6
1 1(A) 12(B) 2
2 1(A) 13(B) 20
3 1(A) 14(B) 1
4 1(A) 15(B) 1
5 1(A) 2(B) 17
6 1(A) 16(B) 5
7 1(A) 17(B) 9
8 1(A) 18(B) 6
9 1(A) 19(B) 5
10 1(A) 20(B) 255
11 1(A) 21(B) 1
12 1(A) 22(B) 9
13 1(A) 3(B) 200
14 1(A) 23(B) 1
15 1(A) 4(B) 1035
16 1(A) 24(B) 14
17 1(A) 25(B) 20
18 1(A) 26(B) 2
19 1(A) 27(B) 222
20 1(A) 28(B) 8
21 1(A) 29(B) 44
22 1(A) 5(B) 3
23 1(A) 6(B) 1529
24 1(A) 30(B) 1
25 1(A) 31(B) 2
26 1(A) 7(B) 6
27 1(A) 32(B) 1
28 1(A) 8(B) 10
29 1(A) 33(B) 11
30 1(A) 34(B) 35
31 1(A) 35(B) 1
32 1(A) 36(B) 41
33 1(A) 37(B) 6
34 1(A) 38(B) 4
35 1(A) 39(B) 2
36 1(A) 40(B) 68
37 1(A) 41(B) 46
38 1(A) 42(B) 24
39 1(A) 9(B) 21
40 1(A) 10(B) 13
41 1(A) 43(B) 6
42 2(B) 44(C) 12
43 3(B) 45(C) 19
44 4(B) 46(C) 1
45 5(B) 47(C) 6
46 6(B) 46(C) 2
47 6(B) 48(C) 1
48 6(B) 49(C) 1
49 7(B) 50(C) 84
50 8(B) 51(C) 2
51 9(B) 46(C) 4
52 10(B) 52(C) 2
53 10(B) 52(C) 2
54 10(B) 53(C) 8
55 10(B) 53(C) 8
56 10(B) 53(C) 12
57 10(B) 53(C) 20
58 10(B) 53(C) 10
59 10(B) 53(C) 4
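Any help is appreciated!
Solution
- You have not provided sample data, so I have built a generator that matches what you describe.
- Normalized x and y values need to be > 0 and < 1.
- I have used the same approach as in this answer, "plotly sankey graph data formatting", to generate the Sankey from a DataFrame.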
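As a quick check of the second point, the sketch below recomputes by hand the x positions that the question's nodify() would assign (a start of 0.5 and a step of 0.5, one position per name ending) and flags any that fall outside the open interval (0, 1). The helper name in_open_unit_interval is made up for this illustration, and the endings list assumes the three suffixes (A), (B) and (C) from the sample data.

```python
# x positions as the question's nodify() computes them:
# one value per unique name ending, starting at 0.5 and growing by 0.5.
endings = ["A", "B", "C"]  # assumed from the sample data
start, step = 0.5, 0.5
question_x = {e: start + i * step for i, e in enumerate(endings)}


def in_open_unit_interval(v):
    """Hypothetical helper: True when v is strictly between 0 and 1."""
    return 0 < v < 1


out_of_range = {e: x for e, x in question_x.items() if not in_open_unit_interval(x)}

print(question_x)    # {'A': 0.5, 'B': 1.0, 'C': 1.5}
print(out_of_range)  # {'B': 1.0, 'C': 1.5}
```

Under these assumptions, the (B) and (C) columns sit at 1.0 and 1.5, and the first y value is exactly 0, so several positions violate the constraint. That may be why some nodes are not placed where expected. The full example that follows keeps every position inside the range through its factorize helper.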
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import itertools

# generate random sample links between nodes named like "1A", "3B", "5C"
S = 40
labels = [str(p + 1) + s for s, p in itertools.product(list("ABC"), range(5))]
df = pd.DataFrame(
    {
        "source": np.random.choice(labels, S),
        "target": np.random.choice(labels, S),
        "value": np.random.randint(1, 10, S),
    }
)
# make sure paths are valid...
df = df.loc[df["source"].str[-1].apply(ord) < df["target"].str[-1].apply(ord)]
df = df.groupby(["source", "target"], as_index=False).sum()


def factorize(s):
    # rank the values, then scale the ranks so they are > 0 and < 1
    a = pd.factorize(s, sort=True)[0]
    return (a + 0.01) / (max(a) + 0.1)


# unique nodes
nodes = np.unique(df[["source", "target"]], axis=None)
nodes = pd.Series(index=nodes, data=range(len(nodes)))
# work out positioning of nodes
nodes = (
    nodes.to_frame("id")
    .assign(
        x=lambda d: factorize(d.index.str[-1]),
        y=lambda d: factorize(d.index.str[:-1]),
    )
)
# now simple job of building sankey
fig = go.Figure(
    go.Sankey(
        arrangement="snap",
        node={"label": nodes.index, "x": nodes["x"], "y": nodes["y"]},
        link={
            "source": nodes.loc[df["source"], "id"],
            "target": nodes.loc[df["target"], "id"],
            "value": df["value"],
        },
    )
)
fig.show()
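The generated labels above look like "1A", whereas the labels in the question look like "1(A)", so the column letter is the second-to-last character. Here is a minimal pure-Python sketch of the same normalization applied to four rows taken from the question's sample data. The name normalized_rank is made up for this illustration; it reuses the (rank + 0.01) / (max rank + 0.1) formula from factorize. Ordering y by the integer prefix is an assumption on my part, since the code above ranks the prefix as a string.

```python
def normalized_rank(keys):
    """Rank keys over their sorted unique values, scaled to lie strictly inside (0, 1)."""
    order = {k: i for i, k in enumerate(sorted(set(keys)))}
    top = max(order.values())
    return [(order[k] + 0.01) / (top + 0.1) for k in keys]


# (source, target, value) rows copied from the question's sample DataFrame
rows = [
    ("1(A)", "2(B)", 17),
    ("1(A)", "3(B)", 200),
    ("2(B)", "44(C)", 12),
    ("3(B)", "45(C)", 19),
]

# unique node names, each mapped to its integer id
nodes = sorted({name for s, t, _ in rows for name in (s, t)})
node_id = {name: i for i, name in enumerate(nodes)}

# x comes from the letter inside the parentheses, y from the numeric prefix
x = normalized_rank([name[-2] for name in nodes])
y = normalized_rank([int(name[:-3]) for name in nodes])  # assumption: numeric order

source = [node_id[s] for s, _, _ in rows]
target = [node_id[t] for _, t, _ in rows]
value = [v for _, _, v in rows]

print(nodes)
print(source, target)
```

The resulting x, y, source, target and value lists could then be passed to go.Sankey in the same way as the DataFrame columns above. Building the node list from unique names also gives each label exactly one position, unlike a list that concatenates Source and Target.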