Big Data Technologies: Elasticsearch (Part 2)
3.1.6 Creating a document (building the JSON source from a Map)
1) Source code
@Test
public void createIndexByMap() {
    // 1 Prepare the document data
    Map<String, Object> json = new HashMap<String, Object>();
    json.put("id", "2");
    json.put("title", "基于Lucene的搜索服务器");
    json.put("content", "它提供了一个分布式多用户能力的全文搜索引擎,基于RESTful web接口");
    // 2 Create the document
    IndexResponse indexResponse = client.prepareIndex("blog", "article", "2").setSource(json).execute().actionGet();
    // 3 Print the response
    System.out.println("index:" + indexResponse.getIndex());
    System.out.println("type:" + indexResponse.getType());
    System.out.println("id:" + indexResponse.getId());
    System.out.println("version:" + indexResponse.getVersion());
    System.out.println("result:" + indexResponse.getResult());
    // 4 Close the connection
    client.close();
}
2) View the result
3.1.7 Creating a document (building the JSON source with XContentBuilder)
1) Source code
@Test
public void createIndex() throws Exception {
    // 1 Build the JSON source with the ES helper class
    XContentBuilder builder = XContentFactory.jsonBuilder()
            .startObject()
            .field("id", 3)
            .field("title", "基于Lucene的搜索服务器")
            .field("content", "它提供了一个分布式多用户能力的全文搜索引擎,基于RESTful web接口。")
            .endObject();
    // 2 Create the document
    IndexResponse indexResponse = client.prepareIndex("blog", "article", "3").setSource(builder).get();
    // 3 Print the response
    System.out.println("index:" + indexResponse.getIndex());
    System.out.println("type:" + indexResponse.getType());
    System.out.println("id:" + indexResponse.getId());
    System.out.println("version:" + indexResponse.getVersion());
    System.out.println("result:" + indexResponse.getResult());
    // 4 Close the connection
    client.close();
}
2) View the result
3.1.8 Retrieving a document (single get)
1) Source code
@Test
public void getData() throws Exception {
    // 1 Get the document by id
    GetResponse response = client.prepareGet("blog", "article", "1").get();
    // 2 Print the result
    System.out.println(response.getSourceAsString());
    // 3 Close the connection
    client.close();
}
2) View the result
3.1.9 Retrieving documents (multi-get)
1) Source code
@Test
public void getMultiData() {
    // 1 Get multiple documents (add() accepts one or more ids; ids may repeat)
    MultiGetResponse response = client.prepareMultiGet()
            .add("blog", "article", "1")
            .add("blog", "article", "2", "3")
            .add("blog", "article", "2")
            .get();
    // 2 Iterate over the returned items
    for (MultiGetItemResponse itemResponse : response) {
        GetResponse getResponse = itemResponse.getResponse();
        // Print only the items that were found
        if (getResponse.isExists()) {
            String sourceAsString = getResponse.getSourceAsString();
            System.out.println(sourceAsString);
        }
    }
    // 3 Close the connection
    client.close();
}
2) View the result
{"id":"1","title":"基于Lucene的搜索服务器","content":"它提供了一个分布式多用户能力的全文搜索引擎,基于RESTful web接口"}
{"content":"它提供了一个分布式多用户能力的全文搜索引擎,基于RESTful web接口","id":"2","title":"基于Lucene的搜索服务器"}
{"id":3,"titile":"ElasticSearch是一个基于Lucene的搜索服务器","content":"它提供了一个分布式多用户能力的全文搜索引擎,基于RESTful web接口。"}
{"content":"它提供了一个分布式多用户能力的全文搜索引擎,基于RESTful web接口","id":"2","title":"基于Lucene的搜索服务器"}
3.1.10 Updating a document (update)
1) Source code. Note: update can only modify a document that already exists.
@Test
public void updateData() throws Throwable {
    // 1 Build the update request
    UpdateRequest updateRequest = new UpdateRequest();
    updateRequest.index("blog");
    updateRequest.type("article");
    updateRequest.id("3");
    updateRequest.doc(XContentFactory.jsonBuilder().startObject()
            // Missing fields are added; existing fields are replaced
            .field("title", "基于Lucene的搜索服务器")
            .field("content", "它提供了一个分布式多用户能力的全文搜索引擎,基于RESTful web接口。大数据前景无限")
            .field("createDate", "2017-8-22")
            .endObject());
    // 2 Execute the update
    UpdateResponse indexResponse = client.update(updateRequest).get();
    // 3 Print the response
    System.out.println("index:" + indexResponse.getIndex());
    System.out.println("type:" + indexResponse.getType());
    System.out.println("id:" + indexResponse.getId());
    System.out.println("version:" + indexResponse.getVersion());
    System.out.println("result:" + indexResponse.getResult());
    // 4 Close the connection
    client.close();
}
2) View the result
3.1.11 Updating a document (upsert)
Sets a query condition: if the document is not found, the IndexRequest content is inserted; if it is found, the document is updated according to the UpdateRequest.
@Test
public void testUpsert() throws Exception {
    // If the document is not found, the IndexRequest content is inserted
    IndexRequest indexRequest = new IndexRequest("blog", "article", "5")
            .source(XContentFactory.jsonBuilder().startObject()
                    .field("title", "搜索服务器")
                    .field("content", "它提供了一个分布式多用户能力的全文搜索引擎,基于RESTful web接口。Elasticsearch是用Java开发的,并作为Apache许可条款下的开放源码发布,是当前流行的企业级搜索引擎。设计用于云计算中,能够达到实时搜索,稳定,可靠,快速,安装使用方便。")
                    .endObject());
    // If the document is found, it is updated with the doc below
    UpdateRequest upsert = new UpdateRequest("blog", "article", "5")
            .doc(XContentFactory.jsonBuilder().startObject().field("user", "李四").endObject())
            .upsert(indexRequest);
    client.update(upsert).get();
    client.close();
}
First execution
bigdata11:9200/blog/article/5
Second execution
bigdata11:9200/blog/article/5
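The upsert decision described above can be sketched in plain Java. This is a toy model of the semantics only, not Elasticsearch client code; the class and method names (`UpsertSketch`, `upsert`) and the `Map`-based store are hypothetical stand-ins for the index:

```java
import java.util.HashMap;
import java.util.Map;

public class UpsertSketch {
    // Toy document store: document id -> field map (stands in for the index)
    static Map<String, Map<String, String>> store = new HashMap<>();

    // If the id is absent, insert indexDoc; otherwise merge updateDoc into the existing document
    static void upsert(String id, Map<String, String> indexDoc, Map<String, String> updateDoc) {
        if (!store.containsKey(id)) {
            store.put(id, new HashMap<>(indexDoc)); // not found: behaves like the IndexRequest
        } else {
            store.get(id).putAll(updateDoc);        // found: behaves like the UpdateRequest doc
        }
    }

    public static void main(String[] args) {
        Map<String, String> indexDoc = new HashMap<>();
        indexDoc.put("title", "搜索服务器");
        Map<String, String> updateDoc = new HashMap<>();
        updateDoc.put("user", "李四");

        upsert("5", indexDoc, updateDoc); // first run: id 5 absent, IndexRequest fields stored
        System.out.println(store.get("5"));
        upsert("5", indexDoc, updateDoc); // second run: id 5 present, "user" field merged in
        System.out.println(store.get("5"));
    }
}
```

This mirrors what the two executions above show: the first request stores only the IndexRequest content, the second adds the `user` field from the UpdateRequest.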
3.1.12 Deleting a document (prepareDelete)
1) Source code
@Test
public void deleteData() {
    // 1 Delete the document
    DeleteResponse indexResponse = client.prepareDelete("blog", "article", "5").get();
    // 2 Print the response
    System.out.println("index:" + indexResponse.getIndex());
    System.out.println("type:" + indexResponse.getType());
    System.out.println("id:" + indexResponse.getId());
    System.out.println("version:" + indexResponse.getVersion());
    System.out.println("result:" + indexResponse.getResult());
    // 3 Close the connection
    client.close();
}
2) View the result
3.2 Conditional queries with QueryBuilders
3.2.1 Match all documents (matchAllQuery)
1) Source code
@Test
public void matchAllQuery() {
    // 1 Execute the query
    SearchResponse searchResponse = client.prepareSearch("blog").setTypes("article")
            .setQuery(QueryBuilders.matchAllQuery()).get();
    // 2 Print the results
    SearchHits hits = searchResponse.getHits(); // the hits: how many documents matched
    System.out.println("Total hits: " + hits.getTotalHits());
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString()); // print each hit
    }
    // 3 Close the connection
    client.close();
}
2) View the result
3.2.2 Analyzed query across all fields (queryStringQuery)
1) Source code
@Test
public void query() {
    // 1 Query-string query
    SearchResponse searchResponse = client.prepareSearch("blog").setTypes("article")
            .setQuery(QueryBuilders.queryStringQuery("全文")).get();
    // 2 Print the results
    SearchHits hits = searchResponse.getHits(); // the hits: how many documents matched
    System.out.println("Total hits: " + hits.getTotalHits());
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString()); // print each hit
    }
    // 3 Close the connection
    client.close();
}
2) View the result
3.2.3 Wildcard query (wildcardQuery)
* : matches zero or more characters
? : matches exactly one character
1) Source code
@Test
public void wildcardQuery() {
    // 1 Wildcard query
    SearchResponse searchResponse = client.prepareSearch("blog").setTypes("article")
            .setQuery(QueryBuilders.wildcardQuery("content", "*全*")).get();
    // 2 Print the results
    SearchHits hits = searchResponse.getHits(); // the hits: how many documents matched
    System.out.println("Total hits: " + hits.getTotalHits());
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString()); // print each hit
    }
    // 3 Close the connection
    client.close();
}
2) View the result
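The `*` / `?` semantics described above can be illustrated in plain Java by translating the wildcard pattern into a regular expression. This is an illustration of the matching rules only, not Elasticsearch code; `WildcardDemo` and `wildcardMatch` are hypothetical names:

```java
import java.util.regex.Pattern;

public class WildcardDemo {
    // Translate a wildcard pattern to a regex: * -> .*, ? -> ., everything else is quoted literally
    static boolean wildcardMatch(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') regex.append(".*");
            else if (c == '?') regex.append('.');
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.matches(regex.toString(), term);
    }

    public static void main(String[] args) {
        System.out.println(wildcardMatch("*全*", "全文")); // true: * matches zero or more characters
        System.out.println(wildcardMatch("?文", "全文"));  // true: ? matches exactly one character
        System.out.println(wildcardMatch("?文", "文"));    // false: ? cannot match zero characters
    }
}
```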
3.2.4 Term query (termQuery)
1) Source code
@Test
public void termQuery() {
    // 1 Term query on a single field
    SearchResponse searchResponse = client.prepareSearch("blog").setTypes("article")
            .setQuery(QueryBuilders.termQuery("content", "全文")).get();
    // 2 Print the results
    SearchHits hits = searchResponse.getHits(); // the hits: how many documents matched
    System.out.println("Total hits: " + hits.getTotalHits());
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString()); // print each hit
    }
    // 3 Close the connection
    client.close();
}
2) View the result
3.2.5 Fuzzy query (fuzzyQuery)
@Test
public void fuzzy() {
    // 1 Fuzzy query
    SearchResponse searchResponse = client.prepareSearch("blog").setTypes("article")
            .setQuery(QueryBuilders.fuzzyQuery("title", "lucene")).get();
    // 2 Print the results
    SearchHits hits = searchResponse.getHits(); // the hits: how many documents matched
    System.out.println("Total hits: " + hits.getTotalHits());
    Iterator<SearchHit> iterator = hits.iterator();
    while (iterator.hasNext()) {
        SearchHit searchHit = iterator.next(); // each hit
        System.out.println(searchHit.getSourceAsString()); // print the source as a string
    }
    // 3 Close the connection
    client.close();
}
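A fuzzy query matches terms within a small edit distance of the search term; the underlying measure is Levenshtein distance. A self-contained sketch of that distance computation (plain Java, not Elasticsearch internals; `EditDistance` is a hypothetical name):

```java
public class EditDistance {
    // Classic dynamic-programming Levenshtein distance:
    // minimum number of single-character insertions, deletions, or substitutions
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // One substitution away from "lucene" -> distance 1, so a fuzzy query would still match it
        System.out.println(distance("lucene", "lucane")); // 1
        // A completely different term has a much larger distance and would not match
        System.out.println(distance("lucene", "solr"));
    }
}
```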
3.3 Mapping operations
1) Source code
@Test
public void createMapping() throws Exception {
    // 1 Build the mapping
    // Note: the "string" field type is deprecated in ES 5.x (use "text"/"keyword") and removed in 6.0
    XContentBuilder builder = XContentFactory.jsonBuilder()
            .startObject()
                .startObject("article")
                    .startObject("properties")
                        .startObject("id1")
                            .field("type", "string")
                            .field("store", "yes")
                        .endObject()
                        .startObject("title2")
                            .field("type", "string")
                            .field("store", "no")
                        .endObject()
                        .startObject("content")
                            .field("type", "string")
                            .field("store", "yes")
                        .endObject()
                    .endObject()
                .endObject()
            .endObject();
    // 2 Apply the mapping
    PutMappingRequest mapping = Requests.putMappingRequest("blog4").type("article").source(builder);
    client.admin().indices().putMapping(mapping).get();
    // 3 Close the connection
    client.close();
}
2) View the result
4 The IK Analyzer
For term queries (TermQuery), first look at what the default analyzer does with Chinese text:
[bdqn@hadoop105 elasticsearch]$ curl -XGET 'http://hadoop105:9200/_analyze?pretty&analyzer=standard' -d '中华人民共和国'
{
"tokens" : [
{
"token" : "中",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<IDEOGRAPHIC>",
"position" : 0
},
{
"token" : "华",
"start_offset" : 1,
"end_offset" : 2,
"type" : "<IDEOGRAPHIC>",
"position" : 1
},
{
"token" : "人",
"start_offset" : 2,
"end_offset" : 3,
"type" : "<IDEOGRAPHIC>",
"position" : 2
},
{
"token" : "民",
"start_offset" : 3,
"end_offset" : 4,
"type" : "<IDEOGRAPHIC>",
"position" : 3
},
{
"token" : "共",
"start_offset" : 4,
"end_offset" : 5,
"type" : "<IDEOGRAPHIC>",
"position" : 4
},
{
"token" : "和",
"start_offset" : 5,
"end_offset" : 6,
"type" : "<IDEOGRAPHIC>",
"position" : 5
},
{
"token" : "国",
"start_offset" : 6,
"end_offset" : 7,
"type" : "<IDEOGRAPHIC>",
"position" : 6
}
]
}
4.1 Installing the IK analyzer
4.1.1 Prerequisites
1) Internet access for CentOS
Configure CentOS so it can reach the Internet; ping www.baidu.com from the Linux VM should succeed.
2) Required packages
(1)elasticsearch-analysis-ik-master.zip
(download: https://github.com/medcl/elasticsearch-analysis-ik)
(2)apache-maven-3.0.5-bin.tar.gz
4.1.2 Package installation
1) Extract Maven and configure MAVEN_HOME and PATH.
[bdqn@bigdata11 software]$ tar -zxvf apache-maven-3.0.5-bin.tar.gz -C /opt/module/
[bdqn@bigdata11 apache-maven-3.0.5]$ sudo vi /etc/profile
#MAVEN_HOME
export MAVEN_HOME=/opt/module/apache-maven-3.0.5
export PATH=$PATH:$MAVEN_HOME/bin
[bdqn@bigdata11 software]$ source /etc/profile
Verify with: mvn -version
2) Extract, build, and configure the IK analyzer
Extract the IK analyzer:
[bdqn@bigdata11 software]$ unzip elasticsearch-analysis-ik-master.zip -d ./
Enter the IK analyzer directory:
[bdqn@bigdata11 software]$ cd elasticsearch-analysis-ik-master
Build the package with Maven:
[bdqn@bigdata11 elasticsearch-analysis-ik-master]$ mvn package -Pdist,native -DskipTests -Dtar
When the build finishes, target/releases/elasticsearch-analysis-ik-{version}.zip is produced:
[bdqn@bigdata11 releases]$ pwd
/opt/software/elasticsearch-analysis-ik-master/target/releases
Unzip this file, then copy the extracted directory into the plugins/ directory under the ES home:
[bdqn@bigdata11 releases]$ unzip elasticsearch-analysis-ik-6.0.0.zip
[bdqn@bigdata11 releases]$ cp -r elasticsearch /opt/module/elasticsearch-5.6.1/plugins/
Finally, edit plugin-descriptor.properties and change the ES version number in it to the version you are actually running; this completes the IK installation.
[bdqn@bigdata11 elasticsearch]$ vi plugin-descriptor.properties
Line 71:
elasticsearch.version=6.0.0
change to
elasticsearch.version=5.6.1
Installation is now complete; restart ES.
4.2 Using the IK analyzer
4.2.1 Command-line results
ik_smart mode
[bdqn@bigdata11 elasticsearch]$ curl -XGET 'http://bigdata11:9200/_analyze?pretty&analyzer=ik_smart' -d '中华人民共和国'
{
"tokens" : [
{
"token" : "中华人民共和国",
"start_offset" : 0,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 0
}
]
}
ik_max_word mode
[bdqn@bigdata11 elasticsearch]$ curl -XGET 'http://bigdata11:9200/_analyze?pretty&analyzer=ik_max_word' -d '中华人民共和国'
{
"tokens" : [
{
"token" : "中华人民共和国",
"start_offset" : 0,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "中华人民",
"start_offset" : 0,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "中华",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "华人",
"start_offset" : 1,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "人民共和国",
"start_offset" : 2,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 4
},
{
"token" : "人民",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "共和国",
"start_offset" : 4,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 6
},
{
"token" : "共和",
"start_offset" : 4,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 7
},
{
"token" : "国",
"start_offset" : 6,
"end_offset" : 7,
"type" : "CN_CHAR",
"position" : 8
}
]
}
4.2.2 Java API operations
1) Create the index
// Create the index (database)
@Test
public void createIndex() {
    // Create the index
    client.admin().indices().prepareCreate("blog4").get();
    // Close the connection
    client.close();
}
2) Create the mapping
// Create a mapping that uses the IK analyzer
@Test
public void createMapping() throws Exception {
    // 1 Build the mapping
    XContentBuilder builder = XContentFactory.jsonBuilder()
            .startObject()
                .startObject("article")
                    .startObject("properties")
                        .startObject("id1")
                            .field("type", "string")
                            .field("store", "yes")
                            .field("analyzer", "ik_smart")
                        .endObject()
                        .startObject("title2")
                            .field("type", "string")
                            .field("store", "no")
                            .field("analyzer", "ik_smart")
                        .endObject()
                        .startObject("content")
                            .field("type", "string")
                            .field("store", "yes")
                            .field("analyzer", "ik_smart")
                        .endObject()
                    .endObject()
                .endObject()
            .endObject();
    // 2 Apply the mapping
    PutMappingRequest mapping = Requests.putMappingRequest("blog4").type("article").source(builder);
    client.admin().indices().putMapping(mapping).get();
    // 3 Close the connection
    client.close();
}
3) Insert data
// Create a document from a Map
@Test
public void createDocumentByMap() {
    HashMap<String, String> map = new HashMap<>();
    map.put("id1", "2");
    map.put("title2", "Lucene");
    map.put("content", "它提供了一个分布式的web接口");
    IndexResponse response = client.prepareIndex("blog4", "article", "3").setSource(map).execute().actionGet();
    // Print the response
    System.out.println("result:" + response.getResult());
    System.out.println("id:" + response.getId());
    System.out.println("index:" + response.getIndex());
    System.out.println("type:" + response.getType());
    System.out.println("version:" + response.getVersion());
    // Close the connection
    client.close();
}
4) Term query
// Term query
@Test
public void queryTerm() {
    SearchResponse response = client.prepareSearch("blog4").setTypes("article")
            .setQuery(QueryBuilders.termQuery("content", "提供")).get();
    // Get the hits
    SearchHits hits = response.getHits();
    System.out.println("Total hits: " + hits.getTotalHits());
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
}
5) View the result