SPARQL query returns duplicates and I don't understand why

2022-03-26 00:00:00 sparql java

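I use this query to fetch all programming languages and their details. This is my test class. I run it from Java and it works fine. The problem I am facing is that there is a language called "ML (programming language)".

It is printed multiple times, each time with a different abstract and different influences. It is not only ML; several other languages do the same. I don't know whether something is wrong with my query or whether it is fetching the data exactly as it is stored.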

package io.naztech.dbpedia;

import java.io.ByteArrayOutputStream;
import java.util.List;

import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.sparql.engine.http.QueryEngineHTTP;
import org.junit.BeforeClass;
import org.junit.Test;

import io.naztech.talent.model.PediaTag;

public class testDataFetching {

    @Test
    public void testAllDataFetching() {

        String q =  "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> \n" +
                    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> \n" +
                    "PREFIX dbo: <http://dbpedia.org/ontology/> \n" +
                    "PREFIX dbp: <http://dbpedia.org/property/> \n" +
                    "PREFIX owl: <http://www.w3.org/2002/07/owl#> \n" +
                    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> \n" +
                    "PREFIX foaf: <http://xmlns.com/foaf/0.1/> \n" +
                    "PREFIX dc: <http://purl.org/dc/elements/1.1/> \n" +
                    "PREFIX : <http://dbpedia.org/resource/> \n" +
                    "PREFIX dbpedia2: <http://dbpedia.org/property/> \n" +
                    "PREFIX dbpedia: <http://dbpedia.org/> \n" +
                    "PREFIX skos: <http://www.w3.org/2004/02/skos/core#> \n" +

                    "SELECT DISTINCT ?pl ?pl_label ?abstract ?_thumbnail \n" +
                    "( Group_concat ( DISTINCT ?_influenced_label; separator= \", \")   AS ?influenced ) \n" +
                    "( Group_concat ( DISTINCT ?_influencedBy_label; separator= \", \") AS ?influencedBy ) \n" +
                    "( group_concat ( DISTINCT ?_sameAs; separator=\", \" ) AS ?sameAs ) \n" +
                    "( group_concat ( DISTINCT ?_paradigm_label; separator=\", \" ) AS ?paradigm ) \n" +

                    "WHERE  {\n" +
                    "       ?pl rdf:type dbo:ProgrammingLanguage .\n" +
                    "       OPTIONAL { ?pl dbo:abstract ?abstract .\n" +
                    "       FILTER ( LANG ( ?abstract ) = 'en' ) . } \n" +
                    "       ?pl rdfs:label ?pl_label .\n" +
                    "       FILTER ( LANG ( ?pl_label ) = 'en' ) .\n" +
                    "       OPTIONAL { ?pl dbo:influenced ?_influenced . \n" +
                    "       ?_influenced rdfs:label ?_influenced_label . \n" +
                    "       FILTER ( LANG ( ?_influenced_label ) = 'en' ) . } \n" +
                    "       OPTIONAL { ?pl dbo:influencedBy  ?_influencedBy . \n" +
                    "       ?_influencedBy  rdfs:label ?_influencedBy_label . \n" +
                    "       FILTER ( LANG ( ?_influencedBy_label ) = 'en' ) . } \n" +
                    "       OPTIONAL { ?pl owl:sameAs ?_sameAs . } \n" +
                    "       OPTIONAL { ?pl dbp:paradigm ?_paradigm . \n" +
                    "       ?_paradigm rdfs:label ?_paradigm_label . } \n" +
                    "       OPTIONAL { ?pl dbo:thumbnail ?_thumbnail . } \n" +
                    "       }" +
                    "       GROUP BY ?pl ?pl_label ?abstract ?_thumbnail ?influenced ?influencedBy ?sameAs ?paradigm";

        @SuppressWarnings("resource")
        QueryEngineHTTP queryEngine = new QueryEngineHTTP("http://live.dbpedia.org/sparql", q);
        ResultSet results = queryEngine.execSelect();

        int count = 0;

        while (results.hasNext())
        {
            QuerySolution qs = results.next();
            System.out.println("NAME-->\n" + qs.get("pl_label").toString() + "\n");

            // Every field below is OPTIONAL in the query, so it may be unbound.
            if (qs.get("influenced") != null)
            {
                System.out.println("INFLUENCED-->\n" + qs.get("influenced").toString() + "\n");
            }
            if (qs.get("influencedBy") != null)
            {
                System.out.println("INFLUENCED BY-->\n" + qs.get("influencedBy").toString() + "\n");
            }
            if (qs.get("abstract") != null)
            {
                System.out.println("ABSTRACT-->\n" + qs.get("abstract").toString() + "\n");
            }

            if (qs.get("sameAs") != null)
            {
                System.out.println("SAME AS-->\n" + qs.get("sameAs").toString() + "\n");
            }

            if (qs.get("paradigm") != null)
            {
                System.out.println("PARADIGM-->\n" + qs.get("paradigm").toString() + "\n");
            }

            if (qs.get("_thumbnail") != null)
            {
                System.out.println("THUMBNAIL-->\n" + qs.get("_thumbnail").toString() + "\n");
            }

            System.out.println("\n");

            count++;
        }

        System.out.println(count);



    }

}

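Solution

The dataset contains three English abstracts for this resource; see the DBpedia Live resource.

Because ?abstract is part of the GROUP BY clause, each distinct abstract produces its own result row. You can fix this by removing the ?abstract variable from the GROUP BY clause and using an aggregate function (SAMPLE, MIN, MAX) to pick one of the abstracts: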

SELECT  ?pl ?pl_label 
        (MIN(?_abstract) AS ?abstract) # <- used MIN here to ensure stable result
        ?_thumbnail 
        (GROUP_CONCAT(DISTINCT ?_influenced_label ; separator='; ') AS ?influenced) 
        (GROUP_CONCAT(DISTINCT ?_influencedBy_label ; separator='; ') AS ?influencedBy) 
        (GROUP_CONCAT(DISTINCT ?_sameAs ; separator=', ') AS ?sameAs) 
        (GROUP_CONCAT(DISTINCT ?_paradigm_label ; separator=', ') AS ?paradigm)
WHERE
  { ?pl  a  dbo:ProgrammingLanguage ;
         rdfs:label  ?pl_label
    FILTER ( lang(?pl_label) = "en" )

    OPTIONAL
      { ?pl  dbo:abstract  ?_abstract
        FILTER ( lang(?_abstract) = "en" )
      }
    OPTIONAL
      { ?pl       dbo:influenced/rdfs:label  ?_influenced_label
        FILTER ( lang(?_influenced_label) = "en" )
      }
    OPTIONAL
      { ?pl       dbo:influencedBy/rdfs:label  ?_influencedBy_label
        FILTER ( lang(?_influencedBy_label) = "en" )
      }
    OPTIONAL
      { ?pl  owl:sameAs  ?_sameAs }
    OPTIONAL
      { ?pl       dbp:paradigm/rdfs:label  ?_paradigm_label
        FILTER ( lang(?_paradigm_label) = "en" )
      }
    OPTIONAL
      { ?pl  dbo:thumbnail  ?_thumbnail }
  }
GROUP BY ?pl ?pl_label ?_thumbnail
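
To make the effect of the GROUP BY change concrete, here is a minimal sketch in plain Java that mimics the two grouping strategies on in-memory rows. It does not call DBpedia or Jena. The class name, the helper methods, and the row values are hypothetical stand-ins for the three English abstracts of ML, and String.compareTo is only an approximation of how a SPARQL engine orders literals for MIN.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupBySketch {

    // One solution row before grouping: a language and one of its abstracts.
    static final class Row {
        final String pl;
        final String abstractText;

        Row(String pl, String abstractText) {
            this.pl = pl;
            this.abstractText = abstractText;
        }
    }

    // Mimics GROUP BY ?pl ?abstract: one output row per distinct (pl, abstract) pair.
    static List<String> groupByPlAndAbstract(List<Row> rows) {
        Map<String, String> groups = new LinkedHashMap<>();
        for (Row r : rows) {
            // The NUL character separates the two key parts so they cannot collide.
            groups.putIfAbsent(r.pl + "\u0000" + r.abstractText, r.pl);
        }
        return new ArrayList<>(groups.values());
    }

    // Mimics GROUP BY ?pl with MIN(?_abstract): one row per language,
    // keeping the smallest abstract according to String.compareTo.
    static Map<String, String> groupByPlWithMin(List<Row> rows) {
        Map<String, String> groups = new LinkedHashMap<>();
        for (Row r : rows) {
            groups.merge(r.pl, r.abstractText, (a, b) -> a.compareTo(b) <= 0 ? a : b);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical data: ML has three abstracts, Haskell has one.
        List<Row> rows = List.of(
                new Row("ML", "abstract C"),
                new Row("ML", "abstract A"),
                new Row("ML", "abstract B"),
                new Row("Haskell", "abstract H"));

        System.out.println(groupByPlAndAbstract(rows)); // [ML, ML, ML, Haskell]
        System.out.println(groupByPlWithMin(rows));     // {ML=abstract A, Haskell=abstract H}
    }
}
```

With the abstract in the grouping key, ML appears once per abstract, which matches the repeated output in the question. Grouping by the language alone and aggregating the abstract leaves one row per language.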

Update

I am adding a comment from @TallTed here. He is one of the people behind Virtuoso and knows this better than I do:

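Note that while the suggested aggregate functions (SAMPLE, MIN, MAX) will get a value, there is no guarantee that this value will be the one most recently received into the dataset.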
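
The caveat can be illustrated with a small plain-Java sketch. It assumes a list of strings held in arrival order, which is something the SPARQL aggregate itself never sees, and it uses Collections.min as a rough stand-in for MIN. The class name, method names, and values are hypothetical.

```java
import java.util.Collections;
import java.util.List;

class MinVersusLatestSketch {

    // Rough stand-in for MIN(?_abstract): the smallest string by natural ordering.
    static String min(List<String> values) {
        return Collections.min(values);
    }

    // The value that arrived last, assuming the list is in arrival order.
    static String latest(List<String> valuesInArrivalOrder) {
        return valuesInArrivalOrder.get(valuesInArrivalOrder.size() - 1);
    }

    public static void main(String[] args) {
        // Hypothetical abstracts, listed in the order they were loaded.
        List<String> abstracts = List.of("beta", "alpha", "gamma");

        System.out.println(min(abstracts));    // alpha
        System.out.println(latest(abstracts)); // gamma
    }
}
```

MIN gives a stable answer across runs, but that answer depends only on the values themselves, not on when each one was loaded.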
