Graph of the Gods Example: Aremlin, AbutionGraph's Query Language for Multidimensional Knowledge Graphs

2022-04-18 00:00:00 Tags: query, data, node, graph, multidimensional

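Aremlin is the graph query language of the graph database AbutionGraph. AbutionGraph is a GraphHOLAP graph data warehouse developed in-house by Thutmose Technology (图特摩斯科技) for large-scale, real-time graph query and analysis. On top of traditional static graphs, its distinguishing feature is the time-series, multidimensional dynamic knowledge graph. It is a native graph storage system, built from the storage layer up to handle and optimize tasks that earlier graph databases could not complete.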

Preface

A dimension is a database concept and a label is a business concept; the two correspond to each other.

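In most scenarios, knowledge data is multidimensional. Take a person-centric graph profile: each person's data (finance, shopping, travel, and so on) can differ. Some people have been to a hospital and have medical records; others have not. If such data labels are attached by generating edges, an extra and otherwise insignificant edge has to be maintained, which wastes storage. For relational reasoning, querying one more layer of entities and relationships is like adding another multi-table join, which is expensive in compute.

In real-time graph computation and analysis, time dimensions such as year, month, and day are plainly important to business models, and they are decisive in user-behavior analysis areas such as risk control. Traditional graph databases, however, cannot obtain these features efficiently. As data volume grows, and especially once super nodes appear, computation slows down markedly, and at large scale it can even crash.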

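AbutionGraph was created to solve these pain points; its storage and compute model is a better fit for real-world business scenarios.

This article walks through how one-dimensional and multidimensional graphs differ in query implementation. First, the architectural difference:

The architecture comparison figure shows how implementing a business scenario with AbutionGraph differs from earlier technology:

The knowledge structure is simpler. For example, finding birth information from education information used to take a 2-hop query; in Abution it is a 0-hop query.
There is less to maintain. Previously, node labels and relationships were easily confused; Abution no longer needs redundant edges to maintain the association between a node and its labels.
The business coverage is broader. Earlier technology could only be used for event-logic graph reasoning; Abution's time-series dynamic graph model can fully cover real-time management and monitoring of large fleets of devices, such as IoT, and replace the "time-series database + graph" combination.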
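
The "0-hop" point can be illustrated with a small plain-Java sketch. It is not the AbutionGraph storage engine; the class `MultiDimVertexSketch` and its methods are hypothetical names chosen for illustration, and the only assumption is that one vertex ID can hold several independent dimensions, each with its own properties.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy in-memory model: one vertex ID holds several independent dimensions. */
class MultiDimVertexSketch {
    // vertexId -> dimension label -> property key -> value
    private final Map<String, Map<String, Map<String, Object>>> store = new HashMap<>();

    /** Attach a property to one dimension of a vertex, creating both on demand. */
    void put(String vertexId, String dimension, String key, Object value) {
        store.computeIfAbsent(vertexId, id -> new HashMap<>())
             .computeIfAbsent(dimension, d -> new HashMap<>())
             .put(key, value);
    }

    /** Read one dimension directly by vertex ID: no edge is traversed. */
    Map<String, Object> dim(String vertexId, String dimension) {
        return store.getOrDefault(vertexId, Collections.emptyMap())
                    .getOrDefault(dimension, Collections.emptyMap());
    }

    /** Dimensions present on a vertex; vertices may differ in which ones they have. */
    Set<String> dimensionsOf(String vertexId) {
        return store.getOrDefault(vertexId, Collections.emptyMap()).keySet();
    }
}
```

A person with an "education" dimension and a "birth" dimension is read with two direct lookups on the same ID, and a person without medical records simply has no "medical" dimension, so no placeholder edge is needed.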

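I. The Graph Query Language Gremlin

Gremlin is the graph query language of JanusGraph, and it can also be used with Neo4j through Apache TinkerPop (Neo4j's native query language is Cypher). It is one of the most popular graph query languages: its path-oriented, step-by-step traversal can express complex graph traversals concisely. However, it also has drawbacks that forced AbutionGraph to give up on integrating with it:

Gremlin is a query language for one-dimensional static graphs and cannot be extended to Abution's multidimensional graphs;
Gremlin is only a set of APIs, and converting between data formats also hurts query speed considerably, which is one of the reasons JanusGraph is so slow.
Gremlin runs on top of the TinkerPop service framework, which forces users to maintain one more system service and makes development and operations harder;
TinkerPop is not a distributed architecture, so it has a single point of failure and the query service is not highly available;
TinkerPop has no data-sharing mechanism, so metadata cannot be shared across servers. For example, if a graph graphA is created on server A, it is not visible on server B in the same cluster.
Gremlin is a self-contained language and is hard to mix with programming languages such as Java and Scala. This is a high barrier for Java developers: they must master Gremlin completely before they can build complex business logic, and they cannot draw on the language skills they already have.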

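II. The Graph Query Language Aremlin

AbutionGraph has three kinds of API, and Aremlin is one of them: a graph query language. It borrows from Gremlin but was completely redesigned and reimplemented in Java, and it is more compact than Gremlin. It is a Turing-complete graph query language that addresses all of the Gremlin drawbacks listed above: fast response, distributed execution, high availability, a scheduled Job mechanism, per-user query permissions, and developer friendliness, including seamless mixing with all JVM languages. It is a query language designed specifically for multidimensional graph data.

Query example:

Get all entities whose total transaction amount in June exceeded 1,000,000, together with a summary report of their bank transaction data.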

var allEntity = g.V().dim("BankAgg")
    .has("startTime","stopTime")
        .by(P.DateRangeDualIn("2021-06-01","2021-06-30"))
    .groupBy().where("sumMoney").by(P.MoreThan(1000000))
    .exec(User("张三"));

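Let's walk through the statement above:

g: the graph we are currently using

V: all vertices in that graph

dim("BankAgg"): the pre-aggregated bank dimension data

has("startTime","stopTime").by(): keeps only the data that falls inside the start/stop time window

groupBy(): runs a post-aggregation over the filtered data, so that the multiple records of each entity are merged into one summary report.

where("sumMoney").by(): filters the June summaries, keeping the entities whose total exceeds 1,000,000; these are the result.

exec(): runs the statement with 张三 (Zhang San) as the authorized user.

We can also chain traversal steps such as Out()/OutV() afterwards. Together these steps form a path-like traversal expression in which each step can be decomposed and its result verified.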
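
To make the filter, group, filter order concrete, here is a minimal plain-Java sketch of the same three stages over an in-memory list. It does not use the AbutionGraph API. The `BankAgg` class, the `report` method, and the reading of `DateRangeDualIn` as "both startTime and stopTime fall inside the window" are assumptions made for illustration.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class JuneReportSketch {
    /** One pre-aggregated bank record for an entity (hypothetical shape). */
    static final class BankAgg {
        final String entity;
        final LocalDate startTime;
        final LocalDate stopTime;
        final long sumMoney;

        BankAgg(String entity, LocalDate startTime, LocalDate stopTime, long sumMoney) {
            this.entity = entity;
            this.startTime = startTime;
            this.stopTime = stopTime;
            this.sumMoney = sumMoney;
        }
    }

    /** Returns entity -> total, for entities whose in-window total exceeds the threshold. */
    static Map<String, Long> report(List<BankAgg> rows, LocalDate from, LocalDate to, long threshold) {
        Map<String, Long> totals = rows.stream()
            // Stage 1 (has ... by): keep records whose whole window lies in [from, to].
            .filter(r -> !r.startTime.isBefore(from) && !r.stopTime.isAfter(to))
            // Stage 2 (groupBy): merge the records of each entity into one sum.
            .collect(Collectors.groupingBy(r -> r.entity, TreeMap::new,
                                           Collectors.summingLong(r -> r.sumMoney)));
        // Stage 3 (where ... by): keep only totals strictly greater than the threshold.
        totals.values().removeIf(total -> total <= threshold);
        return totals;
    }
}
```

The point of the sketch is the ordering: the threshold is applied to the per-entity sum, not to individual records, so an entity can qualify even if none of its single records exceeds 1,000,000.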

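III. The Graph of the Gods Example

This section uses the example graph distributed with JanusGraph, called The Graph of the Gods, which makes it easy to compare Aremlin with the Gremlin implementation that JanusGraph provides. The graph is shown in the figure below. The abstract data model is known as a property graph model, and this particular instance describes the relationships between the beings and places of the Roman pantheon. In addition, special text and symbol modifiers in the figure (bold, underline, and so on) denote different schema elements/types in the graph.

The types in the figure are explained below: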

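Symbol type	Meaning
Bold key	Graph index key
Bold key with a star	Graph index key that must have a unique value
Underlined key	Vertex-centric index key
Hollow-arrow edge	Non-repeatable edge: at most one edge of this type may exist between two vertices
Solid-arrow edge	Unidirectional edge: only A–>B is allowed, not B–>A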

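Types included

There are 6 main entity types:

location: a place (sky: the sky, sea: the sea, tartarus: Tartarus)
titan: a Titan (saturn: the god of agriculture in Roman mythology)
god: a god (jupiter, neptune, pluto)
demigod: a demigod (hercules)
human: a human (alcmene)
monster: a monster (nemean, hydra, cerberus)

There are 6 main relationship types:

father: father
mother: mother
brother: brother
battled: battled
lives: lives in
pet: pet

Properties:

time: number of battles
name: name
age: age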

1. Schema Modeling for the Graph of the Gods

1) Schema modeling in JanusGraph

// 1. Obtain the graph management object
if (graph instanceof StandardJanusGraph) {
    Preconditions.checkState(mixedIndexNullOrExists((StandardJanusGraph)graph, mixedIndexName),
                       ERR_NO_INDEXING_BACKEND, mixedIndexName);
}
JanusGraphManagement management = graph.openManagement();

// 2. Start building the schema: property keys and their indexes
final PropertyKey name = management.makePropertyKey("name").dataType(String.class).make();
JanusGraphManagement.IndexBuilder nameIndexBuilder = management.buildIndex("name", Vertex.class).addKey(name);
if (uniqueNameCompositeIndex)
    nameIndexBuilder.unique();
JanusGraphIndex nameIndex = nameIndexBuilder.buildCompositeIndex();
management.setConsistency(nameIndex, ConsistencyModifier.LOCK);
final PropertyKey age = management.makePropertyKey("age").dataType(Integer.class).make();
if (null != mixedIndexName)
    management.buildIndex("vertices", Vertex.class).addKey(age).buildMixedIndex(mixedIndexName);
final PropertyKey time = management.makePropertyKey("time").dataType(Integer.class).make();
final PropertyKey reason = management.makePropertyKey("reason").dataType(String.class).make();
final PropertyKey place = management.makePropertyKey("place").dataType(Geoshape.class).make();
if (null != mixedIndexName)
    management.buildIndex("edges", Edge.class).addKey(reason).addKey(place).buildMixedIndex(mixedIndexName);

// 3. Create the edge labels and their indexes
management.makeEdgeLabel("father").multiplicity(Multiplicity.MANY2ONE).make();
management.makeEdgeLabel("mother").multiplicity(Multiplicity.MANY2ONE).make();
EdgeLabel battled = management.makeEdgeLabel("battled").signature(time).make();
management.buildEdgeIndex(battled, "battlesByTime", Direction.BOTH, Order.desc, time);
management.makeEdgeLabel("lives").signature(reason).make();
management.makeEdgeLabel("pet").make();
management.makeEdgeLabel("brother").make();

// 4. Create the vertex labels
management.makeVertexLabel("titan").make();
management.makeVertexLabel("location").make();
management.makeVertexLabel("god").make();
management.makeVertexLabel("demigod").make();
management.makeVertexLabel("human").make();
management.makeVertexLabel("monster").make();

// 5. Commit the schema
management.commit();

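2) Schema modeling in AbutionGraph

Ps: AbutionGraph is a multidimensional graph data warehouse. To match Janus's one-dimensional model, the schema below is also one-dimensional. To show what Abution can do, we added some dynamic modeling to it: the battle count is aggregated automatically, so changes to the total number of battles can be queried as soon as data is loaded into the graph in real time. You can also use groupBy() to specify dimension columns, which act as the aggregation groups for the metric columns.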

Schema schema = new Schema()
    .entity(
        Dimension.label("titan", "太阳神").property("age", Integer.class).build(),
        Dimension.label("god", "上帝").property("age", Integer.class).build(),
        Dimension.label("demigod", "小神").property("age", Integer.class).build(),
        Dimension.label("human", "人类").property("age", Integer.class).build(),
        Dimension.label("monster", "怪物").build(),
        Dimension.label("location", "场景").build())
    .edge(Dimension.label("father", "父亲").build(),
        Dimension.label("brother", "兄弟").build(),
        Dimension.label("mother", "母亲").build(),
        Dimension.label("battled", "战争").property("time", Integer.class)
                .property("place", Geoshape.class).build(),
        Dimension.label("battledAgg", "战争总数").property("totalTime", Integer.class, Agg.Sum())
                .groupBy().build(),
        Dimension.label("pet", "宠物").build(),
        Dimension.label("lives", "生活").property("reason", String.class).build())
    .build();
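
The idea behind the `battledAgg` dimension, a total that is kept up to date at write time, can be sketched in plain Java. This is not how AbutionGraph implements `Agg.Sum()`; the class `BattleAggSketch`, its methods, and the choice to key the running total by source vertex are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy write-time aggregation: every incoming battle updates a running sum. */
class BattleAggSketch {
    // source vertex -> running total of the "time" property (assumed grouping key)
    private final Map<String, Integer> totalTime = new HashMap<>();

    /** Ingest one "battled" edge and fold its time into the running total. */
    void addBattle(String source, String target, int time) {
        totalTime.merge(source, time, Integer::sum);
    }

    /** The aggregate is already materialized, so a read does no scanning. */
    int totalFor(String source) {
        return totalTime.getOrDefault(source, 0);
    }
}
```

Because the sum is maintained on every write, a query for the total is a single lookup whose cost does not grow with the number of battle edges.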

2. Writing Data into the Graph of the Gods

For a performance comparison, see

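1) Writing data with JanusGraph

JanusGraph supports bulk offline import, which requires the data to be processed into text files in GraphSON format, but its support for real-time streaming and batch writes is not very good. Below is the official example, which writes records one at a time.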

// Obtain a graph transaction
JanusGraphTransaction tx = graph.newTransaction();
// Insert the vertices
Vertex saturn = tx.addVertex(T.label, "titan", "name", "saturn", "age", 10000);
Vertex sky = tx.addVertex(T.label, "location", "name", "sky");
Vertex sea = tx.addVertex(T.label, "location", "name", "sea");
Vertex jupiter = tx.addVertex(T.label, "god", "name", "jupiter", "age", 5000);
Vertex neptune = tx.addVertex(T.label, "god", "name", "neptune", "age", 4500);
Vertex hercules = tx.addVertex(T.label, "demigod", "name", "hercules", "age", 30);
Vertex alcmene = tx.addVertex(T.label, "human", "name", "alcmene", "age", 45);
Vertex pluto = tx.addVertex(T.label, "god", "name", "pluto", "age", 4000);
Vertex nemean = tx.addVertex(T.label, "monster", "name", "nemean");
Vertex hydra = tx.addVertex(T.label, "monster", "name", "hydra");
Vertex cerberus = tx.addVertex(T.label, "monster", "name", "cerberus");
Vertex tartarus = tx.addVertex(T.label, "location", "name", "tartarus");
// Insert the edges
jupiter.addEdge("father", saturn);
jupiter.addEdge("lives", sky, "reason", "loves fresh breezes");
jupiter.addEdge("brother", neptune);
jupiter.addEdge("brother", pluto);
neptune.addEdge("lives", sea).property("reason", "loves waves");
neptune.addEdge("brother", jupiter);
neptune.addEdge("brother", pluto);
hercules.addEdge("father", jupiter);
hercules.addEdge("mother", alcmene);
hercules.addEdge("battled", nemean, "time", 1, "place", Geoshape.point(38.1f, 23.7f));
hercules.addEdge("battled", hydra, "time", 2, "place", Geoshape.point(37.7f, 23.9f));
hercules.addEdge("battled", cerberus, "time", 12, "place", Geoshape.point(39f, 22f));
pluto.addEdge("brother", jupiter);
pluto.addEdge("brother", neptune);
pluto.addEdge("lives", tartarus, "reason", "no fear of death");
pluto.addEdge("pet", cerberus);
cerberus.addEdge("lives", tartarus);
// Commit the transaction
tx.commit();

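2) Writing data with AbutionGraph

Ps: AbutionGraph supports custom IDs, so we do not need to build a separate index on name, and we can still use features such as fuzzy vertex matching, similar to the ES search engine. With real-world data we also would not need to construct every record by hand like this; template matching is enough.

Building the Graph of the Gods data:

// Build the vertex data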
Entity saturn = Knowledge.dimV("titan").vertex("saturn").property("age", 10000).build();
Entity sky = Knowledge.dimV("location").vertex("sky").build();
Entity sea = Knowledge.dimV("location").vertex("sea").build();
Entity jupiter = Knowledge.dimV("god").vertex("jupiter").property("age", 5000).build();
Entity neptune = Knowledge.dimV("god").vertex("neptune").property("age", 4500).build();
Entity hercules = Knowledge.dimV("demigod").vertex("hercules").property("age", 30).build();
Entity alcmene = Knowledge.dimV("human").vertex("alcmene").property("age", 45).build();
Entity pluto = Knowledge.dimV("god").vertex("pluto").property("age", 4000).build();
Entity nemean = Knowledge.dimV("monster").vertex("nemean").build();
Entity hydra = Knowledge.dimV("monster").vertex("hydra").build();
Entity cerberus = Knowledge.dimV("monster").vertex("cerberus").build();
Entity tartarus = Knowledge.dimV("location").vertex("tartarus").build();

// Build the edge data
Edge eg = Knowledge.dimE("father").edge("jupiter", "saturn").build();
Edge eg1 = Knowledge.dimE("lives").edge("jupiter", "sky").property("reason", "loves fresh breezes").build();
Edge eg2 = Knowledge.dimE("brother").edge("jupiter", "neptune").build();
Edge eg3 = Knowledge.dimE("brother").edge("jupiter", "pluto").build();
//neptune relation
Edge eg4 = Knowledge.dimE("lives").edge("neptune", "sea").property("reason", "loves waves").build();
Edge eg5 = Knowledge.dimE("brother").edge("neptune", "jupiter").build();
Edge eg6 = Knowledge.dimE("brother").edge("neptune", "pluto").build();
//hercules relation
Edge eg7 = Knowledge.dimE("father").edge("hercules", "jupiter").build();
Edge eg8 = Knowledge.dimE("mother").edge("hercules", "alcmene").build();
Edge eg9 = Knowledge.dimE("battled").edge("hercules", "nemean").property("time", 1).property("place", Geoshape.point(38.1,23.7)).build();
Edge eg10 = Knowledge.dimE("battled").edge("hercules", "hydra").property("time", 2).property("place", Geoshape.point(37.7, 23.9)).build();
Edge eg11 = Knowledge.dimE("battled").edge("hercules", "cerberus").property("time", 12).property("place", Geoshape.point(39,22)).build();
//pluto relation
Edge eg12 = Knowledge.dimE("brother").edge("pluto", "jupiter").build();
Edge eg13 = Knowledge.dimE("brother").edge("pluto", "neptune").build();
Edge eg14 = Knowledge.dimE("lives").edge("pluto", "tartarus").property("reason", "no fear of death").build();
Edge eg15 = Knowledge.dimE("pet").edge("pluto", "cerberus").build();
//cerberus relation
Edge eg16 = Knowledge.dimE("lives").edge("cerberus", "tartarus").build();

// Collect the data (each edge listed exactly once)
List<Entity> entitys = Lists.newArrayList(saturn,sky,sea,jupiter,neptune,hercules,alcmene,pluto,nemean,hydra,cerberus,tartarus);
List<Edge> edges = Lists.newArrayList(eg, eg1, eg2, eg3, eg4, eg5, eg6, eg7, eg8, eg9, eg10, eg11, eg12, eg13, eg14, eg15, eg16);

Running the write:

Graph g = G.GetGraph("god",mkSchema());
// Run the import (declaring the user; atomic-level access control and subgraph isolation are optional)
g.addKnow(entitys, edges).exec(new User("Thutmomse.cn"));

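AbutionGraph supports real-time writes of 1 million records on a single node, at a sustained 60,000+ records per second per core. All it takes is adding data with addKnow as above. You can also use Kafka, Flink, or Spark to start an end-to-end, always-on real-time write pipeline.

3. Querying the Graph of the Gods

Find Saturn's "grandson":

1) Gremlin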

gremlin> saturn = g.V().has('name', 'saturn').next()
==>v[256]
gremlin> g.V(saturn).valueMap()
==>[name:[saturn], age:[10000]]
gremlin> g.V(saturn).in('father').in('father').values('name')
==>hercules

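As you can see, Gremlin first has to find the global vertex ID of 'saturn' (a database auto-increment ID) from the name column across all data, then traverse by vid to find the vid of the 'father' of its 'father', and finally use the grandson's vid to fetch the name field stored on it.

Ps: all the information attached to a vid forms a one-dimensional entity model. For example, v[256] has name, age, and so on. We can only add information by adding properties, and only one type of business information can be attached to a vid.

2) Aremlin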

jshell> g.V("saturn").In().dim("father").In().dim("father").exec(user)

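As you can see, Aremlin only needs the vertex name 'saturn' to traverse 2 hops and reach the grandson's information. This is the benefit of supporting custom vertex IDs: in most business cases, identifiers such as ID-card numbers and bank-card numbers can be looked up directly this way.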
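
The two-hop lookup keyed directly by a custom vertex ID can be sketched in plain Java as follows. This is not the Aremlin implementation; the class `GrandsonSketch` and its `addEdge` and `in` methods are hypothetical, and the sketch only assumes that edges are indexed by target ID and label so that an incoming step needs no separate name-to-ID lookup.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy adjacency index keyed by user-chosen vertex IDs. */
class GrandsonSketch {
    // target vertex ID -> edge label -> source vertex IDs
    private final Map<String, Map<String, Set<String>>> inEdges = new HashMap<>();

    /** Record a directed edge source -[label]-> target. */
    void addEdge(String label, String source, String target) {
        inEdges.computeIfAbsent(target, t -> new HashMap<>())
               .computeIfAbsent(label, l -> new TreeSet<>())
               .add(source);
    }

    /** One incoming hop: all sources with a `label` edge into any given vertex. */
    Set<String> in(Collection<String> vertices, String label) {
        Set<String> result = new TreeSet<>();
        for (String v : vertices) {
            result.addAll(inEdges.getOrDefault(v, Collections.emptyMap())
                                 .getOrDefault(label, Collections.emptySet()));
        }
        return result;
    }
}
```

With the edges jupiter -father-> saturn and hercules -father-> jupiter loaded, calling `in` twice with the label "father", starting from the ID "saturn", yields the set containing only "hercules".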

To keep things readable, only the Aremlin query examples are listed below.

Query all entities:

jshell> g.V().exec(user);
==> [Entity[vertex=hercules,dimension=demigod,properties=Properties[age=30]],...

Query entities younger than 50:

jshell> g.V().dims().has("age").by(P.LessThan(50)).exec(user);
==> [Entity[vertex=hercules,dimension=demigod,properties=Properties[age=30]], Entity[vertex=alcmene,dimension=human,properties=Properties[age=45]]]

Query entities younger than 50 whose dimension is human:

jshell> g.V().dim("human").has("age").by(P.LessThan(50)).exec(user);
==>[{"class":"cn.thutmose.abution.graph.data.knowhow.Entity","dimension":"human","vertex":"alcmene","properties":{"age":45}}]

Query the IDs of the vertices one hop out from hercules:

jshell> g.V("hercules").Out().exec(user);
==> [nemean, cerberus, hydra, jupiter, alcmene]

Query hercules's father:

jshell> g.V("hercules").Out().dim("father").exec(user);
==> [jupiter]

Query hercules's father's father:

jshell> g.V("hercules").Out().dim("father").Out().dim("father").exec(user);
==> [saturn]

Starting from hercules, traverse 2 hops to find the corresponding vertices:

jshell> g.V("hercules").While()
    .gql(G.GetNeighborIds().ToSet().ToEntityIds().Out().dim("father"))
    .maxRepeats(2)
    .exec(user);
==> [saturn]

Get all events that took place within 50 km of Athens (latitude 37.97, longitude 23.72):

jshell> g.E().dims().
    has("place").
    by(P.GeoWithin(Geoshape.circle(37.97, 23.72, 50))).
    exec(user)
==> [Edge[source=hercules,target=nemean,dimension=battled,properties=Properties[time=1,place=POINT (23.7 38.1)]], Edge[source=hercules,target=hydra,dimension=battled,properties=Properties[time=2,place=POINT (23.9 37.7)]]]
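
For readers who want to check which battles fall inside the circle, here is a small plain-Java sketch of a great-circle distance test. It is not how `P.GeoWithin` is implemented; the class `GeoWithinSketch`, the haversine formula, and the spherical Earth radius of 6371 km are assumptions chosen for illustration.

```java
/** Toy "within radius" check using the haversine great-circle distance. */
class GeoWithinSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean radius, spherical approximation

    /** Great-circle distance in kilometres between two (latitude, longitude) points in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** True if the point lies inside the circle with the given centre and radius. */
    static boolean within(double centerLat, double centerLon,
                          double lat, double lon, double radiusKm) {
        return haversineKm(centerLat, centerLon, lat, lon) <= radiusKm;
    }
}
```

Under these assumptions, the nemean battle at (38.1, 23.7) and the hydra battle at (37.7, 23.9) are inside the 50 km circle around (37.97, 23.72), while the cerberus battle at (39, 22) is well outside it, which matches the two edges returned above.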

Next, add a step that caches intermediate results and, given that information, see which vertices were involved in which events:

jshell> g.E().dims().
    has("place").by(P.GeoWithin(Geoshape.circle(37.97, 23.72, 50)))
    .ToVertices().extBothE().Store("source")
    .Select("source").In().ToVertices().Store("god1")
    .Select("source").Out().ToVertices().Store("god2")
    .SelectMap().setExports("god1","god2")
    .exec(user)
==> {"god1":["hercules"],"god2":["nemean","cerberus","hydra","jupiter","alcmene"]}

Using the custom function Map:

jshell> g.V("saturn").In().dim("father")
          .Map(F.ItFunc(x-> {
               System.out.println("Intermediate result: "+x);
               return x;
           }))
          .In().dim("father").exec(user)
==> Intermediate result: EntityKey[vertex=jupiter]
==> Intermediate result: EntityKey[vertex=jupiter]

Ps: in Aremlin, you can use operations such as Map after any step to perform whatever data processing you need and pass the processed result on to the next step.

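IV. Finally

The biggest difference between AbutionGraph's Aremlin query language and Gremlin lies in the notion of dimension: multidimensional versus one-dimensional. Previously, a node could have only one label, with a single fixed data structure, and be associated with multiple relationship labels. Now a node can have multiple labels with different data structures and be associated with any number of instances of the same relationship carrying different labels. This is the step up from one dimension to many.

On the storage side, AbutionGraph has both a static-graph and a dynamic-graph storage model. Static-graph storage works the same way as in earlier graph databases, while the dynamic graph lets you tailor a model to your business and avoid complex GQL and query latency. I think real-time graph analysis is AbutionGraph's biggest contribution to the graph database field. When I have time, I will share some application cases from the still-empty field of graph data warehousing.

Source: https://www.modb.pro/db/75459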
