Is the save method of CRUDRepository very slow?

I want to store some data in my Neo4j database. I use spring-data-neo4j for that.

My code looks like this:

    for (int i = 0; i < newRisks.size(); i++) {
        myRepository.save(newRisks.get(i));
        System.out.println("saved " + newRisks.get(i).name);
    }

My newRisks array contains about 60000 objects and 60000 edges. Every node and edge has one property. This loop takes about 15 - 20 minutes; is this normal? I used Java VisualVM to look for bottlenecks, but my average CPU usage was 10 - 25% (of 4 cores) and my heap was less than half full.

Are there any options to speed up this operation?

Additionally, on the first call of myRepository.save(newRisks.get(i)); the JVM stalls for several minutes before the first output appears.

Second edit

Class Risk:

    @NodeEntity
    public class Risk {
        //...
        @Indexed
        public String name;

        @RelatedTo(type = "CHILD", direction = Direction.OUTGOING)
        Set<Risk> risk = new HashSet<Risk>();

        public void addChild(Risk child) {
            risk.add(child);
        }

        //...
    }

Creating the risks:

    @Autowired
    private Repository myRepository;

    @Transactional
    public Collection<Risk> makeSomeRisks() {

        ArrayList<Risk> newRisks = new ArrayList<Risk>();

        newRisks.add(new Risk("Root"));

        for (int i = 0; i < 60000; i++) {
            Risk risk = new Risk("risk " + (i + 1));
            newRisks.get(0).addChild(risk);
            newRisks.add(risk);
        }

        for (int i = 0; i < newRisks.size(); i++) {
            myRepository.save(newRisks.get(i));
        }

        return newRisks;
    }

Recommended answer

The problem here is that you are doing mass inserts with an API that is not intended for that.

You create a root Risk and 60k children. You first save the root, which also persists the 60k children at the same time (and creates the relationships). That is why the first save takes so long. Then you save the children again.
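To make the double work concrete, here is a toy model in plain Java. It is not SDN code: CascadeDemo, Node and the cascading save() are hypothetical stand-ins that only count how often an entity is visited if saving a parent also saves its children, as described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only (assumption): an in-memory stand-in whose save() cascades
// to related entities. It counts visits; it does not talk to Neo4j.
final class CascadeDemo {

    static final class Node {
        final String name;
        final Set<Node> children = new LinkedHashSet<>();

        Node(String name) {
            this.name = name;
        }
    }

    static int writes = 0;

    // "Persists" the node and then every child (assumed cascading behaviour).
    static void save(Node node) {
        writes++;
        for (Node child : node.children) {
            save(child);
        }
    }

    // Mirrors the loop in the question: save the root, then each child again.
    static int writesForQuestionLoop(int childCount) {
        writes = 0;
        Node root = new Node("Root");
        List<Node> all = new ArrayList<>();
        all.add(root);
        for (int i = 0; i < childCount; i++) {
            Node child = new Node("risk " + (i + 1));
            root.children.add(child);
            all.add(child);
        }
        for (Node n : all) {
            save(n);
        }
        return writes;
    }

    public static void main(String[] args) {
        // 60001 visits for the root save, plus 60000 repeated child saves.
        System.out.println(writesForQuestionLoop(60000)); // prints 120001
    }
}
```

In this model the first call alone accounts for 60001 of the 120001 visits, which matches the long pause before the first output.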

There are some ways to speed it up with SDN.

  1. Don't use the collection approach for mass inserts; persist both participants and use template.createRelationshipBetween(root, child, "CHILD",false);

  2. Persist the children first, then add all the persisted children to the root object and persist that.

  3. As you did, use the Neo4j Core API, but call template.postEntityCreation(node,Risk.class) so that you can access the entities via SDN. You then also have to index the entities yourself (db.index.forNodes("Risk").add(node,"name",name);), or use the Neo4j Core API auto-index, but that is not compatible with SDN.

Regardless of whether you use the Core API or SDN, you should use transaction sizes of around 10-20k nodes/relationships for best performance.
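A minimal sketch of how the work could be split into batches of that size, in plain Java. BatchedSave, partition and saveInBatches are hypothetical helpers, not part of SDN or Neo4j; the assumption is that the callback passed as saveBatch would, in a real application, be a method that runs in its own transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of SDN or Neo4j: splits work into fixed-size batches.
final class BatchedSave {

    // Splits items into consecutive sublists of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    // Hands each batch to saveBatch and returns the number of batches.
    // Assumption: saveBatch opens and commits one transaction per call.
    static <T> int saveInBatches(List<T> items, int batchSize, Consumer<List<T>> saveBatch) {
        List<List<T>> batches = partition(items, batchSize);
        for (List<T> batch : batches) {
            saveBatch.accept(batch);
        }
        return batches.size();
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 60001; i++) {
            names.add("risk " + i);
        }
        // Stand-in for the real save: only reports what would be committed.
        int count = saveInBatches(names, 10000,
                batch -> System.out.println("would commit " + batch.size() + " items"));
        System.out.println(count + " batches");
    }
}
```

With 60001 items and a batch size of 10000 this yields seven batches, the last one holding a single item. The batch size is a tuning knob; the 10-20k range above is the starting point.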
