JoyLau's Blog

JoyLau's notes on learning and thinking about technology

Preparation

First

First of all, this article uses the latest Spark release available at the time of writing; just download it from the official site. A few things need attention and are explained in detail below.
Some commands may no longer match older versions, so take the latest release as the reference.
All commands below are run from the installation's root directory, using relative paths.

Versions used

  • JDK 1.8.0
  • Hadoop 2.8.2
  • OS: CentOS 7.2
  • Spark 2.2.0

What to do first

Install the JDK; I won't go into detail here, but pay attention to configuring the JDK environment variables.
Install Hadoop. Hadoop must be installed to use Spark, but if you don't touch HDFS while using Spark, it is fine to leave Hadoop stopped.

Installing Spark

Open the official download page: http://spark.apache.org/downloads.html
Note that the package type (Choose a package type) depends on the Hadoop version you installed; alternatively, choose Pre-built with user-provided Apache Hadoop,
so that we can supply our own Hadoop version.

Download and extract the archive.

Make a copy of the configuration template in the conf directory:

cp ./conf/spark-env.sh.template ./conf/spark-env.sh

Add the environment variable below to the copied ./conf/spark-env.sh:

export SPARK_DIST_CLASSPATH=$(/home/hadoop-2.8.2/bin/hadoop classpath)

Now start Spark:

# ./sbin/start-all.sh

Spark is now running; open http://localhost:8080 to see the cluster status.

Running the Spark example programs

Just like Hadoop, Spark ships with many example programs, under the ./examples directory, written in Java, Python, Scala and R.
Here we pick the Java ones, which we know best.

Looking into the Java directory, there are many programs, including the familiar wordcount.

Here we run the one that estimates the value of π:

# ./bin/run-example SparkPi

The console prints a lot of output, but you can spot this line:

Pi is roughly 3.1432557162785812

That's all we need.

RDD

RDD: Spark's distributed collection of items, the Resilient Distributed Dataset. It can be partitioned across the nodes of a cluster and operated on in parallel. RDDs can be created from Hadoop InputFormats (such as HDFS) or by transforming other RDDs.

My simple way of thinking about it is as an analogue of Hadoop's MapReduce.

RDDs support two types of operations (a small sketch follows this list):

  • actions: run a computation on the dataset and return a value
  • transformations: create a new dataset from an existing one
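
As a minimal sketch of both kinds of operations using the Java API (the file path reuses the test.txt from the Hadoop section below; the class name is my own):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("RddDemo").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // transformations are lazy: nothing is computed yet
        JavaRDD<String> lines = sc.textFile("file:///home/hadoop-2.8.2/input/test.txt");
        JavaRDD<String> joylauLines = lines.filter(line -> line.contains("joylau"));

        // actions trigger the computation and return values to the driver
        System.out.println("total lines = " + lines.count());
        System.out.println("lines containing 'joylau' = " + joylauLines.count());

        sc.stop();
    }
}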

Spark-Shell

The Spark shell supports two languages, Scala and Python; here we use Scala. I plan to write a separate article about Scala usage and syntax;
I have also written before about integrating Scala into a Maven build, so I'll just use it here.

Start the shell:

# ./bin/spark-shell

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/11/24 09:33:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/24 09:33:37 WARN util.Utils: Your hostname, JoyLinux resolves to a loopback address: 127.0.0.1; using 10.0.2.15 instead (on interface enp0s3)
17/11/24 09:33:37 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Spark context Web UI available at http://10.0.2.15:4040
Spark context available as 'sc' (master = local[*], app id = local-1511487218050).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.2.0
/_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

Let's run a simple line count on a text file:

scala> val textFile = sc.textFile("file:///home/hadoop-2.8.2/input/test.txt").count()

textFile: Long = 4

By default files are read from Hadoop HDFS; the example above reads from the local file system instead.

Now read from HDFS. We uploaded a test2.txt file to HDFS earlier, so it can be used here directly:

scala> val textFile = sc.textFile("test2.txt");textFile.count()

textFile: org.apache.spark.rdd.RDD[String] = test2.txt MapPartitionsRDD[19] at textFile at <console>:26
res7: Long = 4

As you can see, the result is the same.

Spark SQL and DataFrames

Spark SQL is Spark's built-in module for structured data. Inside a Spark program you can use SQL queries or the DataFrame API. DataFrames and SQL provide a common way to connect to many data sources, including Hive, Avro, Parquet, ORC, JSON and JDBC, and joins can even be executed across different sources.

Below, still in the Spark shell, are some basic Spark SQL operations; this part mainly follows the Spark SQL, DataFrames and Datasets guide.

Spark SQL used to be accessed through the SQLContext class, created from a SparkContext; in Spark 2.x the entry point is the SparkSession, which the shell already exposes as the spark variable used below.

scala> var df = spark.read.json("file:///home/spark-2.2.0-bin-without-hadoop/examples/src/main/resources/employees.json")
df: org.apache.spark.sql.DataFrame = [name: string, salary: bigint]

scala> df.show()
+-------+------+
| name|salary|
+-------+------+
|Michael| 3000|
| Andy| 4500|
| Justin| 3500|
| Berta| 4000|
+-------+------+


scala>

Now run two more queries:
df.select("name").show()
df.filter(df("salary")>=4000).show()

scala> df.select("name").show()
+-------+
| name|
+-------+
|Michael|
| Andy|
| Justin|
| Berta|
+-------+


scala> df.filter(df("salary")>=4000).show()
+-----+------+
| name|salary|
+-----+------+
| Andy| 4500|
|Berta| 4000|
+-----+------+

Let's also try a SQL statement:

scala> df.registerTempTable("employees")
warning: there was one deprecation warning; re-run with -deprecation for details

scala> spark.sql("select * from employees").show()
+-------+------+
| name|salary|
+-------+------+
|Michael| 3000|
| Andy| 4500|
| Justin| 3500|
| Berta| 4000|
+-------+------+


scala> spark.sql("select * from employees where salary >= 4000").show()
+-----+------+
| name|salary|
+-----+------+
| Andy| 4500|
|Berta| 4000|
+-----+------+

There is much more functionality: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame . I've only tried a couple of things here and will study the details later.

That's it for this article for now. Spark Streaming is still to come; I want to look at the stream-processing framework Storm first and then write a comparison article.

Next, once I'm comfortable with Scala syntax, I'll try writing a standalone Java/Scala application against the Spark API and deploying it on its own.

Thoughts

Writing this article amounted to repeating the original Spark setup, typing the commands and taking notes as I went. Reviewing the old to learn the new, I picked up quite a bit along the way. 💯

First

First of all, this article uses the latest Hadoop release available at the time of writing; just download it from the official site.
Some commands may no longer match older versions, so take the latest release as the reference.
All commands below are run from the installation's root directory, using relative paths.

Versions used

  • Hadoop 2.8.2
  • OS: CentOS 7.2

What to do first

Install the JDK; I won't go into detail here, but pay attention to configuring the JDK environment variables.

A plain yum install of the openjdk 1.8 package installs only the JRE, not the JDK. To install the JDK:

sudo yum install java-1.7.0-openjdk java-1.8.0-openjdk-devel

Configure the environment variable:

vim ~/.bashrc

Add this as the last line:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk

Then make the variable take effect:

source ~/.bashrc    # apply the new variable

After that, check that it worked:

echo $JAVA_HOME     # check the variable's value
java -version
$JAVA_HOME/bin/java -version # if this prints the same as plain java -version, everything is fine

Hadoop standalone setup and test run

Download the Hadoop package from the official site.

Upload it to the server and extract it: tar -zxf hadoop-2.8.2.tar.gz
Once extracted, we can check the version information:

bin/hadoop version

Hadoop 2.8.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 66c47f2a01ad9637879e95f80c41f798373828fb
Compiled by jdu on 2017-10-19T20:39Z
Compiled with protoc 2.5.0
From source with checksum dce55e5afe30c210816b39b631a53b1d
This command was run using /home/hadoop-2.8.2/share/hadoop/common/hadoop-common-2.8.2.jar

If you see output like the above, everything is fine.

Next we can run the examples that ship with Hadoop; they live in ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.2.jar

// Create an input directory. The output directory must not be created: the job creates it automatically and complains if it already exists, so to rerun an example just delete the output directory first.
mkdir ./input

// See which examples are available
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.2.jar

An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

Next, run the classic wordcount. Before that, create a text file for the program to count:

cat input/test.txt
vi input/test.txt

Insert some text.

Now run it:

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.2.jar wordcount ./input/test.txt ./output/

Part of the output:

17/11/22 11:30:08 INFO mapred.LocalJobRunner: reduce > reduce
17/11/22 11:30:08 INFO mapred.Task: Task 'attempt_local1247748922_0001_r_000000_0' done.
17/11/22 11:30:08 INFO mapred.LocalJobRunner: Finishing task: attempt_local1247748922_0001_r_000000_0
17/11/22 11:30:08 INFO mapred.LocalJobRunner: reduce task executor complete.
17/11/22 11:30:08 INFO mapreduce.Job: Job job_local1247748922_0001 running in uber mode : false
17/11/22 11:30:08 INFO mapreduce.Job: map 100% reduce 100%
17/11/22 11:30:08 INFO mapreduce.Job: Job job_local1247748922_0001 completed successfully
17/11/22 11:30:08 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=605002
FILE: Number of bytes written=1267054
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=38
Map output records=35
Map output bytes=277
Map output materialized bytes=251
Input split bytes=103
Combine input records=35
Combine output records=23
Reduce input groups=23
Reduce shuffle bytes=251
Reduce input records=23
Reduce output records=23
Spilled Records=46
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=21
Total committed heap usage (bytes)=461250560
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=140
File Output Format Counters
Bytes Written=165

Check the output:

# cat output/*
hello 1
jjjjj 1
joylau 2
world 1

You can see how many times each word appears.

Hadoop pseudo-distributed setup

We need to set the HADOOP environment variables:

gedit ~/.bashrc

export HADOOP_HOME=/home/hadoop-2.8.2
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

source ~/.bashrc

Edit the configuration files

core-site.xml

<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/temp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/temp/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/temp/hdfs/datanode</value>
</property>
</configuration>

After the configuration is done, format the NameNode (only the NameNode needs formatting; the DataNode initializes its data directory on first start):

./bin/hdfs namenode -format

Now start the pseudo-distributed Hadoop daemons:

./sbin/start-dfs.sh 
./sbin/start-yarn.sh

In older versions the command was:

./sbin/start-all.sh

Use jps to check whether everything started successfully:

jps

5360 Jps
4935 ResourceManager
5225 NodeManager
4494 NameNode
4782 SecondaryNameNode

Once it has started, you can open the web UI at http://localhost:50070 to see NameNode and DataNode information and browse the files in HDFS.
Run stop-all.sh to shut all the processes down.

Running the example in pseudo-distributed mode

The run above was the standalone version; the difference when running the example in pseudo-distributed mode is that the input file is read from HDFS.

As usual, we first create a user directory, so that later file operations can use relative paths.

A few common HDFS shell commands (a Java equivalent is sketched after this list):

  • Create a directory: ./bin/hdfs dfs -mkdir -p /user/root
  • Upload a file: ./bin/hdfs dfs -put ./input/test.txt input
  • Delete a file or directory: ./bin/hdfs dfs -rmr input
  • View a file: ./bin/hdfs dfs -cat input/*
  • List a directory: ./bin/hdfs dfs -ls input
  • Download a file: ./bin/hdfs dfs -get output/* ./output
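
For reference, here is a minimal sketch of the same operations through the HDFS Java API (the fs.defaultFS address matches core-site.xml above; the class name and paths are my own):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");  // same address as in core-site.xml
        try (FileSystem fs = FileSystem.get(conf)) {
            fs.mkdirs(new Path("/user/root/input"));                        // like: hdfs dfs -mkdir -p
            fs.copyFromLocalFile(new Path("./input/test.txt"),
                    new Path("/user/root/input/test.txt"));                 // like: hdfs dfs -put
            IOUtils.copyBytes(fs.open(new Path("/user/root/input/test.txt")),
                    System.out, 4096, false);                               // like: hdfs dfs -cat
        }
    }
}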

With these simple commands we can now run the example.

Create the user directory first: ./bin/hdfs dfs -mkdir -p /user/root
Then create an input directory: ./bin/hdfs dfs -mkdir input
Upload the file from before: ./bin/hdfs dfs -put ./input/test.txt input
After uploading, check that the file is there: ./bin/hdfs dfs -ls input
Run the example: ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.2.jar wordcount input/ output/
View the result: ./bin/hdfs dfs -cat output/*

These commands mirror the familiar Linux commands, so anyone comfortable with Linux will find them easy to use.

You can see that the counts are identical to the standalone run.

Export the result: ./bin/hdfs dfs -get output ./output

You can also browse the uploaded and output directories at http://host:50070/explorer.html#/user/root

Starting YARN

In pseudo-distributed mode it is fine not to start YARN; it generally does not affect whether programs run.
YARN was split out of MapReduce and is responsible for resource management and job scheduling; MapReduce in turn runs on top of YARN, which provides high availability and scalability.

First edit the configuration file mapred-site.xml; the template needs to be renamed first:

mv ./etc/hadoop/mapred-site.xml.template ./etc/hadoop/mapred-site.xml

Add the configuration:

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

And in yarn-site.xml:

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
./sbin/start-yarn.sh      # start YARN
./sbin/mr-jobhistory-daemon.sh start historyserver # start the history server so that job runs show up in the web UI

After starting YARN, examples are run in exactly the same way; only the resource management and job scheduling differ.
Looking at the logs, without YARN the jobs are run by "mapred.LocalJobRunner";
with YARN enabled they are run by "mapred.YARNRunner".
One benefit of starting YARN is that you can follow the jobs in the web UI: http://localhost:8088/cluster

Pitfalls

  • Not enough memory: at first the VM only had 2 GB of RAM and all sorts of errors appeared; after raising it to 8 GB there were no more problems.
  • hosts configuration: at first startup reported an error about not resolving the localhost hostname; just edit the hosts file and add one line:
127.0.0.1   localhost HostName
  • Error when uploading with put: There are 0 datanode(s) running and no node(s) are excluded in this operation
    This can happen when the data directories contain leftovers from earlier operations; just clear the configured directories and format again.

References

Hadoop: The Definitive Guide, 4th Edition, by Tom White

Thoughts

Writing this article amounted to repeating the original Hadoop setup, which took quite some effort, typing the commands and taking notes as I went. Reviewing the old to learn the new, I picked up quite a bit along the way. 💯

First

For the record: before writing this I was not brand new to Hadoop; I had already been studying it for a while.
During that time I went from knowing almost nothing about Hadoop to gradually understanding its general development workflow,
and the whole picture has become clearer and clearer.
So I decided to write down my learning plan for Hadoop going forward.

Plan

  1. Understand Hadoop's background and what it is used for
  2. Set up a Hadoop cluster and get it running on my own machine first
  3. Learn the distributed file system, HDFS
  4. Learn the distributed computing framework, MapReduce
  5. Learn stream processing with Storm
  6. Learn the distributed coordination service, ZooKeeper
  7. Learn Hive, the data warehouse tool
  8. Learn HBase, the distributed storage system
  9. Learn Spark
  10. Learn Scala
  11. Learn Spark development

Finally

These technologies alone are far from enough at work, and not every one of them actually gets used at work either.
In my current company's big data environment there are also things like Impala, ZooKeeper, Spark, Kafka… and so on.
I'll add to this list when new learning plans come up.

First of all

There are four ways to work with Elasticsearch from Java:

  1. Call Elasticsearch's REST APIs
  2. Use the Java Elasticsearch client API
  3. Integrate Spring Data and use the methods wrapped by ElasticsearchTemplate
  4. Extend the ElasticsearchRepository interface and call its methods

Test setup

Let's prepare some data first. I reuse an earlier project of mine that fetches music data from JoyMusic; the project structure looks like this:
(screenshot: elasticsearch-test-project structure)
The main data-loading code is below; its only purpose is to insert data.

@RunWith(SpringJUnit4ClassRunner.class)
@SpringBootTest(classes = JoylauElasticsearchApplication.class,webEnvironment = SpringBootTest.WebEnvironment.MOCK)
public class JoylauElasticsearchApplicationTests {
@Autowired
private RestTemplate restTemplate;

@Autowired
private PlaylistDAO playlistDAO;

@Autowired
private SongDAO songDAO;

@Autowired
private CommentDAO commentDAO;
@Test
public void createData() {
String personalizeds = restTemplate.getForObject("http://localhost:3003/apis/v1"+"/personalized",String.class);
JSONObject perJSON = JSONObject.parseObject(personalizeds);
JSONArray perArr = perJSON.getJSONArray("result");
List<Playlist> list = new ArrayList<>();
List<Integer> playListIds = new ArrayList<>();
for (Object o : perArr) {
JSONObject playListJSON = JSONObject.parseObject(o.toString());
Playlist playlist = new Playlist();
playlist.setId(playListJSON.getIntValue("id"));
playListIds.add(playlist.getId());
playlist.setName(playListJSON.getString("name"));
playlist.setPicURL(playListJSON.getString("picUrl"));
playlist.setPlayCount(playListJSON.getIntValue("playCount"));
playlist.setBookCount(playListJSON.getIntValue("bookCount"));
playlist.setTrackCount(playListJSON.getIntValue("trackCount"));
list.add(playlist);
}
playlistDAO.saveAll(list);



/* save the songs */
List<Integer> songIds = new ArrayList<>();
List<Song> songList = new ArrayList<>();
for (Integer playListId : playListIds) {
String res = restTemplate.getForObject("http://localhost:3003/apis/v1"+"/playlist/detail?id="+playListId,String.class);
JSONArray songJSONArr = JSONObject.parseObject(res).getJSONObject("playlist").getJSONArray("tracks");
for (Object o : songJSONArr) {
JSONObject songJSON = JSONObject.parseObject(o.toString());
Song song = new Song();
song.setId(songJSON.getIntValue("id"));
songIds.add(song.getId());
song.setName(songJSON.getString("name"));
song.setAuthor(getSongAuthor(songJSON.getJSONArray("ar")));
song.setTime(songJSON.getLong("dt"));
song.setPlaylistId(playListId);
song.setPicURL(songJSON.getJSONObject("al").getString("picUrl"));
song.setAlbum(songJSON.getJSONObject("al").getString("name"));
songList.add(song);
}
}
songDAO.saveAll(songList);



/* save the comments */
List<Comment> commentList = new ArrayList<>();
for (Integer songId : songIds) {
String res = restTemplate.getForObject("http://localhost:3003/apis/v1"+"/comment/music?id="+songId+"&offset="+300,String.class);
JSONArray commentArr = JSONObject.parseObject(res).getJSONArray("comments");
for (Object o : commentArr) {
JSONObject commentJSON = JSONObject.parseObject(o.toString());
Comment comment = new Comment();
comment.setId(commentJSON.getIntValue("commentId"));
comment.setSongId(songId);
comment.setContent(commentJSON.getString("content"));
comment.setAuthor(commentJSON.getJSONObject("user").getString("nickname"));
comment.setPicUrl(commentJSON.getJSONObject("user").getString("avatarUrl"));
comment.setTime(commentJSON.getLong("time"));
comment.setSupport(commentJSON.getIntValue("likedCount"));
commentList.add(comment);
}

}

commentDAO.saveAll(commentList);
}

/**
* Get the names of a song's artists
* @param arr arr
* @return String
*/
private String getSongAuthor(JSONArray arr){
StringBuilder author = new StringBuilder();
for (int i = 0; i < arr.size(); i++) {
JSONObject json = JSONObject.parseObject(arr.get(i).toString());
author.append(json.getString("name"));
// only add a separator between names, not after the last one
if (i < arr.size() - 1){
author.append(",");
}
}
return author.toString();
}
}

After running it, the data added to Elasticsearch looks like this:
(screenshots: elasticsearch-test index and data)

Now that we have the data, let's go through the various ways of querying it.

The relationship between ElasticsearchTemplate and ElasticsearchRepository

ElasticsearchTemplate is Spring Data's wrapper around the Elasticsearch client Java API, while ElasticsearchRepository is a further wrapper on top of ElasticsearchTemplate; it can be driven by annotations, much like MyBatis used to be.
ElasticsearchTemplate offers more methods; in fact everything ElasticsearchRepository can do is also implemented by ElasticsearchTemplate.
So as long as we are familiar with the operations available on ElasticsearchTemplate,
we will know how to use ElasticsearchRepository as well.
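
As a rough sketch of the repository style (SongDAO and Song come from this project; the ID type and the @Query example are my own assumptions):

// Hypothetical shape of the repository used in the tests above
public interface SongDAO extends ElasticsearchRepository<Song, Integer> {

    // derived query: Spring Data builds the Elasticsearch query from the method name
    List<Song> findByAuthor(String author);

    // annotated query: the raw query source is supplied directly, ?0 is the first argument
    @Query("{\"match\": {\"name\": {\"query\": \"?0\"}}}")
    List<Song> searchByName(String name);
}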

ElasticsearchTemplate

These are fairly low-level methods; the one we use most is elasticsearchTemplate.queryForList(searchQuery, clazz).
The main work is building the searchQuery, so below I note down the most common searchQuery variants for future reference.
Once the searchQuery is built properly, the rest is simple.
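
As a small sketch of the template path (assuming an @Autowired ElasticsearchTemplate alongside the DAOs used below):

// same kind of query as the first repository example below, but executed through the template
SearchQuery searchQuery = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.queryStringQuery("Time"))
        .withPageable(PageRequest.of(0, 100))
        .build();
List<Song> songs = elasticsearchTemplate.queryForList(searchQuery, Song.class);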

queryStringQuery

A full-text query over all fields with a single string.

/**
* Fuzzy single-string query, default ordering. Finds documents whose fields contain the analyzed terms of the given word, searching across all fields.
*/
@Test
public void queryStringQuerySong(){
SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(queryStringQuery("Time")).withPageable(of(0,100)).build();
System.out.println(searchQuery.getQuery().toString());
List<Song> songList = songDAO.search(searchQuery).getContent();
for (Song song : songList) {
System.out.println(JSONObject.toJSONString(song));
}
}

The printed query and the results:

{
"query_string" : {
"query" : "Time",
"fields" : [ ],
"use_dis_max" : true,
"tie_breaker" : 0.0,
"default_operator" : "or",
"auto_generate_phrase_queries" : false,
"max_determinized_states" : 10000,
"enable_position_increments" : true,
"fuzziness" : "AUTO",
"fuzzy_prefix_length" : 0,
"fuzzy_max_expansions" : 50,
"phrase_slop" : 0,
"escape" : false,
"split_on_whitespace" : true,
"boost" : 1.0
}
}
{"album":"Time","author":"Cat naps","id":459733590,"name":"Time","picURL":"http://p1.music.126.net/9DmApLeDwutb4HpuhD_E-Q==/18624627464667106.jpg","playlistId":900228548,"time":86465}
{"album":"Go Time","author":"Mark Petrie","id":29717271,"name":"Go Time","picURL":"http://p1.music.126.net/TJe468hZr_0ndQRfTAKdqA==/3233663697760186.jpg","playlistId":636015704,"time":136071}
{"album":"Out of Time","author":"R.E.M.","id":20282663,"name":"Losing My Religion","picURL":"http://p1.music.126.net/wYtpqN8Yu2jamQwdM6ugGg==/6638851209090428.jpg","playlistId":772430182,"time":269270}
{"album":"Time Flies... 1994-2009","author":"Oasis","id":17822660,"name":"Cigarettes & Alcohol","picURL":"http://p1.music.126.net/qDgXElJRtSsuqNwsTzW8lw==/667403558069001.jpg","playlistId":772430182,"time":291853}
{"album":"Electric Warrior","author":"T. Rex","id":29848501,"name":"There Was A Time","picURL":"http://p1.music.126.net/dn1MwEBfBcL4l6isrnEwDw==/3246857839528733.jpg","playlistId":772430182,"time":60577}
{"album":"Ride On Time","author":"山下達郎","id":22693846,"name":"DAYDREAM","picURL":"http://p1.music.126.net/GaQVveQiyTIqecs7hhoYpA==/749866930165154.jpg","playlistId":900228548,"time":273476}
{"album":"The Blossom Chronicles","author":"Philter","id":21375446,"name":"Adventure Time","picURL":"http://p1.music.126.net/YjMS5_kM3u9PCUU0lcRK8g==/6657542907248762.jpg","playlistId":636015704,"time":207412}
{"album":"Decimus","author":"Audio Machine","id":36586631,"name":"Ashes of Time","picURL":"http://p1.music.126.net/7InBepjNDGCzpzH8Feyw9A==/3395291908535260.jpg","playlistId":636015704,"time":190826}
{"album":"In Time: The Best Of R.E.M. 1988-2003","author":"R.E.M.","id":20283068,"name":"Bad Day","picURL":"http://p1.music.126.net/aZXu5ulRJvH4dnoWPjxb3A==/18277181789089107.jpg","playlistId":772430182,"time":248111}
{"album":"It's a Poppin' Time","author":"山下達郎","id":22693864,"name":"HEY THERE LONELY GIRL","picURL":"http://p1.music.126.net/PGZlyXk20_-5d6E3pDEKpg==/815837627833461.jpg","playlistId":900228548,"time":325956}
{"album":"Shire Music Annual Selection - Myth","author":"Shire Music,Songs To Your Eyes,","id":34916751,"name":"Between Space And Time","picURL":"http://p1.music.126.net/CCqLd2ly2XuuSPz0IW0u-g==/3284241233077333.jpg","playlistId":636015704,"time":222456}
{"album":"Double Live Doggie Style I","author":"X-Ray Dog","id":26246058,"name":"Time Will Tell","picURL":"http://p1.music.126.net/oYEIMWnAvpuRDTk4g_l-lg==/2503587976473913.jpg","playlistId":636015704,"time":202133}
{"album":"The Ghost Of Tom Joad","author":"Bruce Springsteen","id":16657852,"name":"Straight Time (Album Version)","picURL":"http://p1.music.126.net/yK0V-aD3Myh4xorvwUtCrw==/17889054184179160.jpg","playlistId":772430182,"time":210651}
{"album":"Epic Action & Adventure Vol. 6","author":"Epic Score","id":4054121,"name":"Time Will Remember Us","picURL":"http://p1.music.126.net/uN8AYI3sQEgoECuSYmi9Eg==/658607465082090.jpg","playlistId":636015704,"time":165000}

Now change the ordering and sort by id in descending order:

/** 
* Fuzzy single-string query, sorted by a single field.
*/
@Test
public void queryStringQueryWeightSong(){
SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(queryStringQuery("Time")).withPageable(of(0,100,new Sort(Sort.Direction.DESC,"id"))).build();
System.out.println(searchQuery.getQuery().toString());
List<Song> songList = songDAO.search(searchQuery).getContent();
for (Song song : songList) {
System.out.println(JSONObject.toJSONString(song));
}
}

It can also be written with the Pageable annotation, like this:

public void queryStringQueryWeightSong(@PageableDefault(sort = "id", direction = Sort.Direction.DESC) Pageable pageable){
SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(queryStringQuery("Time")).withPageable(pageable).build();
System.out.println(searchQuery.getQuery().toString());
List<Song> songList = songDAO.search(searchQuery).getContent();
for (Song song : songList) {
System.out.println(JSONObject.toJSONString(song));
}
}

The printed query and the results:

{
"query_string" : {
"query" : "Time",
"fields" : [ ],
"use_dis_max" : true,
"tie_breaker" : 0.0,
"default_operator" : "or",
"auto_generate_phrase_queries" : false,
"max_determinized_states" : 10000,
"enable_position_increments" : true,
"fuzziness" : "AUTO",
"fuzzy_prefix_length" : 0,
"fuzzy_max_expansions" : 50,
"phrase_slop" : 0,
"escape" : false,
"split_on_whitespace" : true,
"boost" : 1.0
}
}
{"album":"Time","author":"Cat naps","id":459733590,"name":"Time","picURL":"http://p1.music.126.net/9DmApLeDwutb4HpuhD_E-Q==/18624627464667106.jpg","playlistId":900228548,"time":86465}
{"album":"Decimus","author":"Audio Machine","id":36586631,"name":"Ashes of Time","picURL":"http://p1.music.126.net/7InBepjNDGCzpzH8Feyw9A==/3395291908535260.jpg","playlistId":636015704,"time":190826}
{"album":"Shire Music Annual Selection - Myth","author":"Shire Music,Songs To Your Eyes,","id":34916751,"name":"Between Space And Time","picURL":"http://p1.music.126.net/CCqLd2ly2XuuSPz0IW0u-g==/3284241233077333.jpg","playlistId":636015704,"time":222456}
{"album":"Electric Warrior","author":"T. Rex","id":29848501,"name":"There Was A Time","picURL":"http://p1.music.126.net/dn1MwEBfBcL4l6isrnEwDw==/3246857839528733.jpg","playlistId":772430182,"time":60577}
{"album":"Go Time","author":"Mark Petrie","id":29717271,"name":"Go Time","picURL":"http://p1.music.126.net/TJe468hZr_0ndQRfTAKdqA==/3233663697760186.jpg","playlistId":636015704,"time":136071}
{"album":"Double Live Doggie Style I","author":"X-Ray Dog","id":26246058,"name":"Time Will Tell","picURL":"http://p1.music.126.net/oYEIMWnAvpuRDTk4g_l-lg==/2503587976473913.jpg","playlistId":636015704,"time":202133}
{"album":"It's a Poppin' Time","author":"山下達郎","id":22693864,"name":"HEY THERE LONELY GIRL","picURL":"http://p1.music.126.net/PGZlyXk20_-5d6E3pDEKpg==/815837627833461.jpg","playlistId":900228548,"time":325956}
{"album":"Ride On Time","author":"山下達郎","id":22693846,"name":"DAYDREAM","picURL":"http://p1.music.126.net/GaQVveQiyTIqecs7hhoYpA==/749866930165154.jpg","playlistId":900228548,"time":273476}
{"album":"The Blossom Chronicles","author":"Philter","id":21375446,"name":"Adventure Time","picURL":"http://p1.music.126.net/YjMS5_kM3u9PCUU0lcRK8g==/6657542907248762.jpg","playlistId":636015704,"time":207412}
{"album":"In Time: The Best Of R.E.M. 1988-2003","author":"R.E.M.","id":20283068,"name":"Bad Day","picURL":"http://p1.music.126.net/aZXu5ulRJvH4dnoWPjxb3A==/18277181789089107.jpg","playlistId":772430182,"time":248111}
{"album":"Out of Time","author":"R.E.M.","id":20282663,"name":"Losing My Religion","picURL":"http://p1.music.126.net/wYtpqN8Yu2jamQwdM6ugGg==/6638851209090428.jpg","playlistId":772430182,"time":269270}
{"album":"Time Flies... 1994-2009","author":"Oasis","id":17822660,"name":"Cigarettes & Alcohol","picURL":"http://p1.music.126.net/qDgXElJRtSsuqNwsTzW8lw==/667403558069001.jpg","playlistId":772430182,"time":291853}
{"album":"The Ghost Of Tom Joad","author":"Bruce Springsteen","id":16657852,"name":"Straight Time (Album Version)","picURL":"http://p1.music.126.net/yK0V-aD3Myh4xorvwUtCrw==/17889054184179160.jpg","playlistId":772430182,"time":210651}
{"album":"Epic Action & Adventure Vol. 6","author":"Epic Score","id":4054121,"name":"Time Will Remember Us","picURL":"http://p1.music.126.net/uN8AYI3sQEgoECuSYmi9Eg==/658607465082090.jpg","playlistId":636015704,"time":165000}

matchQuery

To find documents where a given field loosely contains the target string, use matchQuery.

/** 
* Fuzzy match of a string against a single field
*/
@Test
public void matchQuerySong(){
SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(matchQuery("name","Time")).withPageable(of(0,100)).build();
System.out.println(searchQuery.getQuery().toString());
List<Song> songList = songDAO.search(searchQuery).getContent();
for (Song song : songList) {
System.out.println(JSONObject.toJSONString(song));
}
}

The printed query and the results:

{
"match" : {
"name" : {
"query" : "Time",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"boost" : 1.0
}
}
}
{"album":"Time","author":"Cat naps","id":459733590,"name":"Time","picURL":"http://p1.music.126.net/9DmApLeDwutb4HpuhD_E-Q==/18624627464667106.jpg","playlistId":900228548,"time":86465}
{"album":"Go Time","author":"Mark Petrie","id":29717271,"name":"Go Time","picURL":"http://p1.music.126.net/TJe468hZr_0ndQRfTAKdqA==/3233663697760186.jpg","playlistId":636015704,"time":136071}
{"album":"The Blossom Chronicles","author":"Philter","id":21375446,"name":"Adventure Time","picURL":"http://p1.music.126.net/YjMS5_kM3u9PCUU0lcRK8g==/6657542907248762.jpg","playlistId":636015704,"time":207412}
{"album":"Shire Music Annual Selection - Myth","author":"Shire Music,Songs To Your Eyes,","id":34916751,"name":"Between Space And Time","picURL":"http://p1.music.126.net/CCqLd2ly2XuuSPz0IW0u-g==/3284241233077333.jpg","playlistId":636015704,"time":222456}
{"album":"Electric Warrior","author":"T. Rex","id":29848501,"name":"There Was A Time","picURL":"http://p1.music.126.net/dn1MwEBfBcL4l6isrnEwDw==/3246857839528733.jpg","playlistId":772430182,"time":60577}
{"album":"Double Live Doggie Style I","author":"X-Ray Dog","id":26246058,"name":"Time Will Tell","picURL":"http://p1.music.126.net/oYEIMWnAvpuRDTk4g_l-lg==/2503587976473913.jpg","playlistId":636015704,"time":202133}
{"album":"Decimus","author":"Audio Machine","id":36586631,"name":"Ashes of Time","picURL":"http://p1.music.126.net/7InBepjNDGCzpzH8Feyw9A==/3395291908535260.jpg","playlistId":636015704,"time":190826}
{"album":"The Ghost Of Tom Joad","author":"Bruce Springsteen","id":16657852,"name":"Straight Time (Album Version)","picURL":"http://p1.music.126.net/yK0V-aD3Myh4xorvwUtCrw==/17889054184179160.jpg","playlistId":772430182,"time":210651}
{"album":"Epic Action & Adventure Vol. 6","author":"Epic Score","id":4054121,"name":"Time Will Remember Us","picURL":"http://p1.music.126.net/uN8AYI3sQEgoECuSYmi9Eg==/658607465082090.jpg","playlistId":636015704,"time":165000}

matchPhraseQuery

A PhraseMatch query: the terms must match as a phrase.

/** 
* Phrase match against a single field; the order of the analyzed terms affects the result
*/
@Test
public void phraseMatchSong(){
SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(matchPhraseQuery("name","Time")).withPageable(of(0,100)).build();
System.out.println(searchQuery.getQuery().toString());
List<Song> songList = songDAO.search(searchQuery).getContent();
for (Song song : songList) {
System.out.println(JSONObject.toJSONString(song));
}
}

termQuery

This is the strictest match, a low-level query that does not analyze the input. See this article: http://www.cnblogs.com/muniaofeiyu/p/5616316.html

/** 
* term match, i.e. no analysis: whatever value you pass in is matched exactly as-is
*/
@Test
public void termQuerySong(){
SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(termQuery("name","Time")).withPageable(of(0,100)).build();
System.out.println(searchQuery.getQuery().toString());
List<Song> songList = songDAO.search(searchQuery).getContent();
for (Song song : songList) {
System.out.println(JSONObject.toJSONString(song));
}
}

multiMatchQuery

Matches a string against several fields. If we want the name and author fields to match a string, and either field containing it is enough, we can use multiMatchQuery.

/** 
* Match against multiple fields
*/
@Test
public void multiMatchQuerySong(){
SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(multiMatchQuery("time","name","author")).withPageable(of(0,100)).build();
System.out.println(searchQuery.getQuery().toString());
List<Song> songList = songDAO.search(searchQuery).getContent();
for (Song song : songList) {
System.out.println(JSONObject.toJSONString(song));
}
}

Requiring all input terms

In the previous queries, searching for "我天" makes ES return every document containing either analyzed term, "我" or "天". If we only want documents that contain both, we need to set the Operator.

/** 
* A single field must contain all of the input terms
*/
@Test
public void matchQueryOperatorSong(){
SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(matchQuery("name","真的").operator(Operator.AND)).withPageable(of(0,100)).build();
System.out.println(searchQuery.getQuery().toString());
List<Song> songList = songDAO.search(searchQuery).getContent();
for (Song song : songList) {
System.out.println(JSONObject.toJSONString(song));
}
}

matchQuery, multiMatchQuery, queryStringQuery and the others can all take an operator. The default is OR; when set to AND, only documents containing all of the input terms are returned.
With AND, if the user enters 5 words and a document contains only 4 of them, it will not be returned. We can control this with a precision setting.

/** 
* A single field must contain the input terms (matching by proportion)
*/
@Test
public void matchQueryOperatorWithMinimumShouldMatchSong(){
SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(matchQuery("name","time").operator(Operator.AND).minimumShouldMatch("80%")).withPageable(of(0,100)).build();
System.out.println(searchQuery.getQuery().toString());
List<Song> songList = songDAO.search(searchQuery).getContent();
for (Song song : songList) {
System.out.println(JSONObject.toJSONString(song));
}
}

minimumShouldMatch can be used with match queries to set the minimum percentage of terms that must match for a document to be returned.

Combined queries

This is boolQuery, which lets a query carry several conditions. It is used to combine multiple queries, in four ways: must, must_not, filter and should.
must means the returned documents must satisfy the must clauses, and these clauses contribute to the score;
filter means the returned documents must satisfy the filter clauses, but these do not contribute to the score;
should means the returned documents may or may not satisfy the should clauses; with several should clauses, matching any one is enough, and minimum_should_match sets how many must match at minimum;
must_not means the documents must not satisfy the clauses.
For example, to query for documents whose name contains "XXX", whose userId is "2345098", and whose time is preferably less than 165000, we can combine the conditions with boolQuery.

/** 
* Combined query over multiple fields
*/
@Test
public void boolQuerySong(){
SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(boolQuery().must(termQuery("userId", "2345098"))
.should(rangeQuery("time").lt(165000)).must(matchQuery("name", "time"))).build();
System.out.println(searchQuery.getQuery().toString());
List<Song> songList = songDAO.search(searchQuery).getContent();
for (Song song : songList) {
System.out.println(JSONObject.toJSONString(song));
}
}

For more detail see http://blog.csdn.net/dm_vincent/article/details/41743955
boolQuery is used in a very wide range of scenarios and should be one of the main things to learn.

The difference between Query and Filter

Both query and filter conditions are QueryBuilders; in other words, you can put a filter condition into withQuery and the other way round. So where is the difference?
A query behaves differently in the query context and in the filter context:

1. Query: the execution context used when querying, for example during a search.
In the query context, the query answers the question "does this document match the query, and how relevant is it?"
Every document indexed in ES stores a _score; the higher the score, the better the match. Even though Lucene uses an inverted index, computing the score for a search still costs some time.

2. Filter: the execution context used for filter parameters, for example must_not or filter inside a bool query.
In the filter context, the query only answers the question "does this document match?"
It computes no score and does not care about result ordering, so it is somewhat more efficient.
In addition, frequently used filters are cached automatically by ES, which improves query performance considerably.
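
As a sketch of the filter context (reusing the Song fields from the examples above; the values are only for illustration):

// the range condition goes into a filter clause: it must match, but is not scored and can be cached
SearchQuery searchQuery = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.boolQuery()
                .must(QueryBuilders.matchQuery("name", "time"))        // query context, contributes to _score
                .filter(QueryBuilders.rangeQuery("time").lt(165000)))  // filter context, no scoring
        .build();
List<Song> songs = songDAO.search(searchQuery).getContent();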

ElasticsearchRepository

The ElasticsearchRepository interface declares the following methods:

@NoRepositoryBean
public interface ElasticsearchRepository<T, ID extends Serializable> extends ElasticsearchCrudRepository<T, ID> {
<S extends T> S index(S var1);

Iterable<T> search(QueryBuilder var1);

FacetedPage<T> search(QueryBuilder var1, Pageable var2);

FacetedPage<T> search(SearchQuery var1);

Page<T> searchSimilar(T var1, String[] var2, Pageable var3);
}

For complex queries the most commonly used method is FacetedPage<T> search(SearchQuery var1), which takes a SearchQuery as its parameter.
The two things to focus on are the QueryBuilder and SearchQuery parameters; special queries mostly come down to building these two.
Let's first look at the class relationships between them:
(class diagram screenshot)

In actual use, our main task is building a NativeSearchQuery to express complex queries.

public NativeSearchQuery(QueryBuilder query, QueryBuilder filter, List<SortBuilder> sorts, Field[] highlightFields) {  
this.query = query;
this.filter = filter;
this.sorts = sorts;
this.highlightFields = highlightFields;
}

We can see that building a NativeSearchQuery mainly requires a few constructor arguments.

Of course, we don't have to supply all of them.
Roughly speaking, we need a QueryBuilder, a filter, a SortBuilder for ordering, and the fields to highlight.
Normally we don't call new NativeSearchQuery directly; we use NativeSearchQueryBuilder instead,
building the NativeSearchQuery with a chain like NativeSearchQueryBuilder.withQuery(QueryBuilder1).withFilter(QueryBuilder2).withSort(SortBuilder1).withXXXX().build();
As the names suggest, QueryBuilder builds the query and filter conditions, and SortBuilder builds the ordering.
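
A small sketch of such a chain (field names come from the Song entity above; the filter and sort values are arbitrary):

SearchQuery searchQuery = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.matchQuery("name", "Time"))
        .withFilter(QueryBuilders.rangeQuery("time").lt(300000))
        .withSort(SortBuilders.fieldSort("id").order(SortOrder.DESC))
        .withPageable(PageRequest.of(0, 100))
        .build();
List<Song> songs = songDAO.search(searchQuery).getContent();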

Conveniently, the SearchQuery taken by ElasticsearchRepository is the same SearchQuery described above for the template, so the two can share it.

Introduction

JSONPath: a very powerful feature that can be used in Java frameworks as an object query language (OQL).

Syntax

JSONPATH  Description
$  the root object, e.g. $.name
[num]  array access, where num is a number and may be negative, e.g. $[0].leader.departments[-1].name
[num0,num1,num2…]  access several array elements, where each num is a number and may be negative; returns multiple elements, e.g. $[0,3,-2,5]
[start:end]  array range access, where start and end are the start and end indexes and may be negative; returns multiple elements, e.g. $[0:5]
[start:end:step]  array range access, where start and end are the start and end indexes and may be negative; step is the stride; returns multiple elements, e.g. $[0:5:2]
[?(key)]  filter for objects whose property is non-null, e.g. $.departs[?(name)]
[key > 123]  comparison filter on a numeric property, e.g. $.departs[id >= 123]; supported operators are =, !=, >, >=, <, <=
[key = '123']  comparison filter on a string property, e.g. $.departs[name = '123']; supported operators are =, !=, >, >=, <, <=
[key like 'aa%']  like filter on a string property, e.g. $.departs[name like 'sz*']; only % is supported as a wildcard; not like is also supported
[key rlike 'regexpr']  regex filter on a string property, e.g. departs[name rlike 'aa(.)*']; uses the JDK's regex syntax; not rlike is also supported
[key in ('v0', 'v1')]  IN filter, for strings and numbers, e.g. $.departs[name in ('wenshao','Yako')], $.departs[id not in (101,102)]
[key between 234 and 456]  BETWEEN filter, for numbers; not between is also supported, e.g. $.departs[id between 101 and 201], $.departs[id not between 101 and 201]
length() or size()  array length, e.g. $.values.size(); supports java.util.Map, java.util.Collection and arrays
.  property access, e.g. $.name
..  deep-scan property access, e.g. $..name
*  all properties of an object, e.g. $.leader.*
['key']  property access, e.g. $['name']
['key0','key1']  access several properties, e.g. $['id','name']

Examples

JSONPath  Meaning
$  the root object
$[-1]  the last element
$[:-2]  from the first element up to the second-to-last
$[1:]  every element from the second one onward
$[1,2,3]  elements 1, 2 and 3 of the collection
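
As a quick illustration of this syntax with fastjson's JSONPath (the departs data mirrors the examples in the table above; com.alibaba.fastjson.JSONPath is assumed to be available since fastjson is already used in this project):

JSONObject root = JSON.parseObject(
        "{\"departs\":[{\"id\":101,\"name\":\"wenshao\"},{\"id\":102,\"name\":\"Yako\"}]}");
Object firstName = JSONPath.eval(root, "$.departs[0].name");    // "wenshao"
Object allNames  = JSONPath.eval(root, "$.departs[*].name");    // ["wenshao", "Yako"]
Object filtered  = JSONPath.eval(root, "$.departs[id >= 102]"); // departments whose id is at least 102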

Java example

{ "store": {
"book": [
{ "category": "reference",
"author": "Nigel Rees",
"title": "Sayings of the Century",
"price": 8.95
},
{ "category": "fiction",
"author": "Evelyn Waugh",
"title": "Sword of Honour",
"price": 12.99,
"isbn": "0-553-21311-3"
}
],
"bicycle": {
"color": "red",
"price": 19.95
}
}
}
private static void jsonPathTest() {
JSONObject json = jsonTest();  // jsonTest() builds the JSON object shown above

// the author of book[0]
String author = JsonPath.read(json, "$.store.book[0].author");

// all author values
List<String> authors = JsonPath.read(json, "$.store.book[*].author");

// all books whose category == 'reference'
List<Object> referenceBooks = JsonPath.read(json, "$.store.book[?(@.category == 'reference')]");

// all books whose price > 10
List<Object> expensiveBooks = JsonPath.read(json, "$.store.book[?(@.price>10)]");

// all books that have an isbn element
List<Object> booksWithIsbn = JsonPath.read(json, "$.store.book[?(@.isbn)]");

// every price value in the document
List<Double> prices = JsonPath.read(json, "$..price");

// a path can be compiled once and reused many times
JsonPath path = JsonPath.compile("$.store.book[*]");
List<Object> allBooks = path.read(json);
}

Today, after installing Node.js, running npm install failed with an error:

npm: relocation error: npm: symbol SSL_set_cert_cb, version libssl.so.10 not defined in file libssl.so.10 with link time reference

The fix:

yum -y install openssl

If it is already installed, update it instead:

yum -y update openssl

The first pitfall

Spring Boot has had a starter for Kafka since 1.5, but I couldn't find the corresponding entry in the dependency list because its name does not contain "starter"; silly me even configured everything once more with JavaConfig.
Let's see what it looks like after integrating the starter!

<dependency>
<groupId>org.springframework.kafka</groupId>
<artifactId>spring-kafka</artifactId>
</dependency>

The dependency above is in fact the starter; there is no need to write a version, Spring Boot picks one by itself.

The yml configuration:

spring:
  kafka:
    bootstrap-servers: 192.168.10.192:9092
    consumer:
      group-id: secondary-identification
    producer:
      batch-size: 65536
      buffer-memory: 524288

By default only bootstrap-servers and group-id are required.

Next, the producer and the consumer:

@Component
public class MsgProducer {
@Autowired
private KafkaTemplate kafkaTemplate;
public void sendMessage() {
kafkaTemplate.send("index-vehicle","key","hello,kafka" + LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")));
}
}
@Component
public class MsgConsumer {
@KafkaListener(topics = {"index-vehicle"})
public void processMessage(String content) {
System.out.println(content);
}
}

The second pitfall

Messages could be sent, but the Spring Boot application never received them, even though Kafka's own console tools could. Very frustrating; I spent a long time on it without success.
After going through Google and the official docs, I finally found the cause: a single setting in the broker configuration file needs to be changed:

# The address the socket server listens on. It will get the value returned from 
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://0.0.0.0:9092

The listeners entry above needs to be set up like this because my application uses @KafkaListener to listen for messages.

For the record, the meaning of this setting:

The listener list (comma separated; different protocols such as plaintext, trace, ssl, and different IPs and ports). If the hostname is set to 0.0.0.0, it binds to all network interfaces; if the hostname is empty, it binds to the default interface. If not configured at all, it defaults to java.net.InetAddress.getCanonicalHostName().

These two pitfalls are recorded here.

Some commonly used commands, also for the record:

zookeeper-server-start.bat ../../config/zookeeper.properties : start the bundled ZooKeeper
kafka-server-start.bat ../../config/server.properties : start Kafka
kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic myTopic --from-beginning : consume messages from the given topic on the console
kafka-console-producer.bat --broker-list localhost:9092 --topic myTopic : send messages to the given topic

Note that the topic a console producer binds to has to be created on the command line first; if it already exists you can just send to it. A programmatic sketch of creating a topic follows.
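
As a hedged sketch of creating the topic from code instead of the command line (this assumes the kafka-clients AdminClient, available from client version 0.11 onward; the topic name and counts are just examples):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTopicDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // topic name, number of partitions, replication factor
            NewTopic topic = new NewTopic("myTopic", 1, (short) 1);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}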

The following is adapted from: http://blog.csdn.net/jsshaojinjie/article/details/64125458

Add to the Maven dependencies:

<dependency>  
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-devtools</artifactId>
<optional>true</optional>
</dependency>

Add to the project's build section:

<build>  
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<configuration>
<!-- fork: without this setting devtools has no effect, i.e. the application will not restart -->
<fork>true</fork>
</configuration>
</plugin>
</plugins>
</build>

IDEA settings

(screenshot: IDEA settings)

ctrl+shift+alt+/

(screenshots: IDEA settings)

Restart the project and that's it.
