1. k8s 服务质量 和 jvm参数的关系

本文使用的jdk版本为 jdk11

在Java 8及以上版本中,元空间被分配在非堆中的方法区中。因此,元空间和非堆之间的关系是,元空间是非堆中的一个子区域,用于存储类元数据。同时,非堆中还包括其他的内存区域,如方法区、虚拟机栈、本地方法栈等。

jvm配置

查看所有JVM配置的默认值

1
java  -XX:+PrintFlagsFinal -version 
  • 默认堆大小

-verbose:gc -XX:+PrintGCDetails -Xlog:gc*:stdout:time -XX:+UseConcMarkSweepGC
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump.hprof

1
-verbose:gc -XX:+PrintGCDetails -Xlog:gc*:stdout:time
  • 查看jvm使用的配置 *
    java -XX:+PrintCommandLineFlags -version
1
2
3
4
5
6
7
-XX:InitialHeapSize=19660800 -XX:MaxHeapSize=314572800 -XX:MinHeapSize=6815736
-XX:+PrintCommandLineFlags -XX:ReservedCodeCacheSize=251658240
-XX:+SegmentedCodeCache -XX:+UseCompressedClassPointers
-XX:+UseCompressedOops -XX:+UseSerialGC
openjdk version "17.0.5" 2022-10-18 LTS
OpenJDK Runtime Environment Zulu17.38+21-CA (build 17.0.5+8-LTS)
OpenJDK 64-Bit Server VM Zulu17.38+21-CA (build 17.0.5+8-LTS, mixed mode, sharing)

上述出现的参数释义:

  • InitialHeapSize: JVM初始堆内存

  • MaxHeapSize: 最大堆内存

  • ReservedCodeCacheSize: JIT即时编译后的代码存储空间限额。超出后会关闭JIT,字节码将不会再被编译为机器码,应用程序会继续运行,但运行速度会降低一个数量级。

  • UseCompressedOops:为了在 64bit机器上使用32bit的原始对象指针。使用压缩功能后,每个对象中的 Klass* 字段就会被压缩成 32bit(不是所有的 oop 都会被压缩的),总所周知 Klass* 指向的 Klass 在永久代(Java7 及之前)。Java8使用Metaspace替代永久代,于是之前压缩指针 Klass* 指向的这块 Klass 区域有了一个名字 —— Compressed Class Space。它是Metaspace的一部分。堆大小超过32g则自动关闭。关闭后就没有Compressed Class Space了。

  • UseSerialGC: 表示使用的是 serial 作为垃圾收集器。针对新生代和老年代都使用串行回收期,新生代使用 复制算法,老年代使用标记-整理算法,它是单线程的。
    同时,上述返回中Server表示jvm是以服务器模式运行的。

什么是JIT ? 什么是Compressed Class Space

当 Java 被编译为字节码形式的class文件之后,可以在任意的 JVM 运行。这里说的编译,主要是指前端编译器。但是 class 文件里面的字节码并不能直接运行,而是要通过后端编译器在程序运行期间,将字节码转变成机器码,这样电脑才能执行你的代码。
JavaCodeCache:对于某些热点代码,缓存其变以后的机器码,下次无需重新即时编译。但是也不能缓存所有代码即时编译后的机器码,需要有相关的优化和清理策略。

Java Code Cache 分块(Segmented Code Cache): 从 Java 9 开始引入的 Code Cache 分块,主要解决之前把所有种类代码放一起,导致扫描的时候效率低下。例如之前说的有些代码经过 C1 优化,之后 C2 优化,这些代码最好分开存储。(C1 优化过得代码,C2 优化完之后,C1的要被清理掉)。
目前 Java Code Cache 分为 3 块:

  • 非方法代码堆(non-method code heap): JIT 编译器要用到的内存区域,例如编译器要用的缓存等等。他们会永远存在于 Code Cache 内。
  • 带采样的代码堆(profiled code heap):目前是经过 C1 编译器优化的代码,就是轻度优化,带采样代码的代码,存活时间比较短,因为 C1 优化的代码最终会被 C2 优化,或者退回 level 1 去掉采样。
  • 不带采样的代码堆(non-profiled code heap):包含不带采样的代码,已经完全优化好,并且长期存活的代码。默认情况下,非方法代码堆包括 3MB 的来自于虚拟机的固定空间占用,以及随着编译线程个数上涨而上涨的空间占用。剩下的空间,带采样的代码堆与不带采样的代码堆评分,可以通过如下参数修改:
  • -XX:NonProfiledCodeHeapSize: 不带采样的代码堆大小.
  • -XX:ProfiledCodeHeapSize: 带采样的代码堆大小.
  • -XX:NonNMethodCodeHeapSize: 非方法代码堆大小.
  • -XX:ReservedCodeCacheSize 以上三个加起来需要等于这个

采样gc
jstat -gcutil 1 1000

jvm gc 日志

1
2
3
4
5
6
[2023-11-14T17:08:54.477+0800] GC(165) Pause Young (Allocation Failure)
[2023-11-14T17:08:54.501+0800] GC(165) DefNew: 71898K->6019K(78080K)
[2023-11-14T17:08:54.501+0800] GC(165) Tenured: 121654K->121654K(173316K)
[2023-11-14T17:08:54.501+0800] GC(165) Metaspace: 126591K(130032K)->126591K(130032K) NonClass: 111948K(114088K)->111948K(114088K) Class: 14643K(15944K)->14643K(15944K)
[2023-11-14T17:08:54.501+0800] GC(165) Pause Young (Allocation Failure) 189M->124M(245M) 23.792ms
[2023-11-14T17:08:54.501+0800] GC(165) User=0.02s Sys=0.00s Real=0.02s

DefNewGeneration是default new generation

  • [2023-11-14T17:08:54.477+0800]: 表示gc发起的时间
  • GC(165): GC没有被full 修饰,说明是young gc, 括号中的内容表示是 第165次gc。
  • Allocation Failure:表明本次引起GC的原因是因为在年轻代中没有足够的空间能够存储新的数据了
  • DefNew 71898K->6019K(78080K):使用 default new generation 默认的串行收集器进行垃圾收集。后面三个数字的含义分别为:GC前新生代的堆内存占用,GC后的堆内存占用,新生代堆内存分配总量
  • Tenured: 121654K->121654K(173316K): 表示对老年代的垃圾回收,GC前老年代的堆内存占用,GC后老年代的堆内存占用,新生代堆内存分配总量
  • Metaspace: 126591K(130032K)->126591K(130032K): 关于元空间三个参数的描述参见这篇文章深入理解堆外内存 Metaspace
  • Pause Young (Allocation Failure) 189M->124M(245M) 23.792ms: 反映了本次 young gc的整体情况,整个堆区的内存占用从189M降低为124M。此次垃圾收集共耗时 23.792 毫秒。
  • User=0.02s Sys=0.00s Real=0.02s:分别表示用户态耗时,内核态耗时和总耗时

jmap使用

jmap是一个可以输出内存中所有对象的工具。可以将JVM中的heap(堆),以二进制形式输出成文本。打印出某个java进程内存内所有’对象’的情况。

jmap –heap –pid

jdk 11中使用 jhsdb jmap –heap –pid 1

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
jhsdb jmap --heap --pid 1
Attaching to process ID 1, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 11.0.21+9-adhoc.root.jdk11u

using thread-local object allocation.
Mark Sweep Compact GC

Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 536870912 (512.0MB)
NewSize = 44695552 (42.625MB)
MaxNewSize = 178913280 (170.625MB)
OldSize = 89522176 (85.375MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
capacity = 99549184 (94.9375MB)
used = 44857688 (42.779624938964844MB)
free = 54691496 (52.157875061035156MB)
45.06082942879773% used
Eden Space:
capacity = 88539136 (84.4375MB)
used = 44857688 (42.779624938964844MB)
free = 43681448 (41.657875061035156MB)
50.66424863237879% used
From Space:
capacity = 11010048 (10.5MB)
used = 0 (0.0MB)
free = 11010048 (10.5MB)
0.0% used
To Space:
capacity = 11010048 (10.5MB)
used = 0 (0.0MB)
free = 11010048 (10.5MB)
0.0% used
tenured generation:
capacity = 220962816 (210.7265625MB)
used = 132577048 (126.4353256225586MB)
free = 88385768 (84.2912368774414MB)
59.99970963440292% used

上述参数介绍如下:

  • MinHeapFreeRatio: 空闲堆空间的最小百分比
  • MaxHeapFreeRatio:空闲堆空间的最大百分比
    HeapFreeRatio =(CurrentFreeHeapSize/CurrentTotalHeapSize) * 100
    如果 HeapFreeRatio > MaxHeapFreeRatio 则应当缩容
    如果 HeapFreeRatio < MinHeapFreeRatio 则应当扩容。扩缩容的时机应当在gc之后。
  • MaxHeapSize:jvm允许的最大堆内存 这里设置的是 512.0MB
  • NewSize:新生代堆空间的默认大小 此处是 42.625MB
  • MaxNewSize:新生代堆空的最大值,这里是 170.625MB
  • OldSize:老年代堆空间的默认值 此处是85.375MB
  • NewRatio: 新生代(eden+2*survivor)和老年代堆空间的比值 1:2
  • SurvivorRatio: 两个survivor和eden的比值: 1:1:8
  • MetaspaceSize:默认元空间大小 20M
    CompressedClassSpaceSize = 1073741824 (1024.0MB)
    MaxMetaspaceSize = 17592186044415 MB
    G1HeapRegionSize = 0 (0.0MB)
jmap -histo:live 1 打印每个class的实例数目,内存占用,类全名信息
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
[root@order-69458c44f-9rf2f app]jmap -histo:live 1
num #instances #bytes class name (module)
-------------------------------------------------------
1: 313791 19229192 [B (java.base@11.0.21)
2: 41510 13496736 [I (java.base@11.0.21)
3: 143714 8403088 [Ljava.lang.Object; (java.base@11.0.21)
4: 301469 7235256 java.lang.String (java.base@11.0.21)
5: 205974 6591168 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.21)
6: 74755 6578440 java.lang.reflect.Method (java.base@11.0.21)
7: 150974 6038960 java.util.LinkedHashMap$Entry (java.base@11.0.21)
8: 104204 5835424 java.util.LinkedHashMap (java.base@11.0.21)
9: 52196 4392392 [Ljava.util.HashMap$Node; (java.base@11.0.21)
10: 171836 4124064 java.util.ArrayList (java.base@11.0.21)
11: 68024 3265152 org.aspectj.weaver.reflect.ShadowMatchImpl
12: 23903 2855688 java.lang.Class (java.base@11.0.21)

jmap -dump:format=b,file=order.hprof 1

1
2
3
4
5
6

kubectl exec -it -n platform order-69458c44f-9rf2f -- jmap -dump:format=b,file=order.hprof 1
Dumping heap to /app/order.hprof ...
Heap dump file created [333251166 bytes in 3.741 secs]

docker cp 6c55bd3106c1:/app/order.hprof .

分析前后两次堆的差异

切换垃圾收集器

-XX:+UseConcMarkSweepGC

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
[root@order-65d6887dc4-pkkfv app]# jhsdb jmap --heap --pid 1
Attaching to process ID 1, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 11.0.21+9-adhoc.root.jdk11u

using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 536870912 (512.0MB)
NewSize = 44695552 (42.625MB)
MaxNewSize = 174456832 (166.375MB)
OldSize = 89522176 (85.375MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
capacity = 40239104 (38.375MB)
used = 39782968 (37.93999481201172MB)
free = 456136 (0.43500518798828125MB)
98.86643599221294% used
Eden Space:
capacity = 35782656 (34.125MB)
used = 35326520 (33.68999481201172MB)
free = 456136 (0.43500518798828125MB)
98.72525952237866% used
From Space:
capacity = 4456448 (4.25MB)
used = 4456448 (4.25MB)
free = 0 (0.0MB)
100.0% used
To Space:
capacity = 4456448 (4.25MB)
used = 0 (0.0MB)
free = 4456448 (4.25MB)
0.0% used
concurrent mark-sweep generation:
capacity = 184668160 (176.11328125MB)
used = 147725264 (140.8817901611328MB)
free = 36942896 (35.23149108886719MB)
79.99498343406898% used
查看gc日志
1
2
3
4
5
6
7
8
9
10
11
12
13
14
[2023-11-14T19:43:22.885+0800] GC(188) Pause Young (Allocation Failure)
[2023-11-14T19:43:22.885+0800] GC(188) Using 2 workers of 2 for evacuation
[2023-11-14T19:43:22.897+0800] GC(188) ParNew: 36153K->1398K(39296K)
[2023-11-14T19:43:22.897+0800] GC(188) CMS: 146520K->146520K(180340K)
[2023-11-14T19:43:22.897+0800] GC(188) Metaspace: 129190K(132620K)->129190K(132620K) NonClass: 114389K(116496K)->114389K(116496K) Class: 14800K(16124K)->14800K(16124K)
[2023-11-14T19:43:22.897+0800] GC(188) Pause Young (Allocation Failure) 178M->144M(214M) 11.893ms
[2023-11-14T19:43:22.897+0800] GC(188) User=0.01s Sys=0.01s Real=0.01s
[2023-11-14T19:45:01.823+0800] GC(189) Pause Young (Allocation Failure)
[2023-11-14T19:45:01.824+0800] GC(189) Using 2 workers of 2 for evacuation
[2023-11-14T19:45:01.839+0800] GC(189) ParNew: 36342K->3219K(39296K)
[2023-11-14T19:45:01.839+0800] GC(189) CMS: 146520K->146520K(180340K)
[2023-11-14T19:45:01.839+0800] GC(189) Metaspace: 129306K(132876K)->129306K(132876K) NonClass: 114505K(116752K)->114505K(116752K) Class: 14800K(16124K)->14800K(16124K)
[2023-11-14T19:45:01.839+0800] GC(189) Pause Young (Allocation Failure) 178M->146M(214M) 15.275ms
[2023-11-14T19:45:01.839+0800] GC(189) User=0.01s Sys=0.00s Real=0.02s

服务重启前有如下日志

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
[2023-11-14T20:32:51.004+0800] GC(188) Pause Young (Allocation Failure)
[2023-11-14T20:32:51.004+0800] GC(188) Using 2 workers of 2 for evacuation
[2023-11-14T20:32:51.058+0800] GC(189) Pause Full (Allocation Failure)
[2023-11-14T20:32:51.062+0800] GC(189) Phase 1: Mark live objects
[2023-11-14T20:32:51.467+0800] GC(189) Phase 1: Mark live objects 405.287ms
[2023-11-14T20:32:51.468+0800] GC(189) Phase 2: Compute new object addresses
[2023-11-14T20:32:51.564+0800] GC(189) Phase 2: Compute new object addresses 96.497ms
[2023-11-14T20:32:51.564+0800] GC(189) Phase 3: Adjust pointers
[2023-11-14T20:32:51.796+0800] GC(189) Phase 3: Adjust pointers 232.254ms
[2023-11-14T20:32:51.796+0800] GC(189) Phase 4: Move objects
[2023-11-14T20:32:51.868+0800] GC(189) Phase 4: Move objects 71.866ms
[2023-11-14T20:32:51.870+0800] GC(189) Pause Full (Allocation Failure) 175M->155M(240M) 812.843ms
[2023-11-14T20:32:51.870+0800] GC(188) ParNew: 36097K->0K(39296K)
[2023-11-14T20:32:51.870+0800] GC(188) CMS: 147437K->159306K(206632K)
[2023-11-14T20:32:51.870+0800] GC(188) Metaspace: 136869K(140664K)->136869K(140664K) NonClass: 121057K(123436K)->121057K(123436K) Class: 15812K(17228K)->15812K(17228K)
[2023-11-14T20:32:51.871+0800] GC(188) Pause Young (Allocation Failure) 179M->155M(376M) 866.841ms
[2023-11-14T20:32:51.871+0800] GC(188) User=0.77s Sys=0.02s Real=0.87s
[2023-11-14T20:32:52.264+0800] GC(190) Pause Young (Allocation Failure)
[2023-11-14T20:32:52.264+0800] GC(190) Using 2 workers of 2 for evacuation
[2023-11-14T20:32:52.323+0800] GC(190) ParNew: 106044K->2425K(119616K)
[2023-11-14T20:32:52.323+0800] GC(190) CMS: 159306K->253782K(265512K)
[2023-11-14T20:32:52.323+0800] GC(190) Metaspace: 137075K(140920K)->137075K(140920K) NonClass: 121246K(123692K)->121246K(123692K) Class: 15829K(17228K)->15829K(17228K)
[2023-11-14T20:32:52.323+0800] GC(190) Pause Young (Allocation Failure) 259M->250M(376M) 59.274ms
[2023-11-14T20:32:52.323+0800] GC(190) User=0.07s Sys=0.03s Real=0.06s
[2023-11-14T20:32:52.333+0800] GC(191) Pause Initial Mark
[2023-11-14T20:32:52.336+0800] GC(191) Pause Initial Mark 314M->314M(376M) 3.208ms
[2023-11-14T20:32:52.336+0800] GC(191) User=0.01s Sys=0.00s Real=0.01s
[2023-11-14T20:32:52.337+0800] GC(191) Concurrent Mark
[2023-11-14T20:32:52.359+0800] GC(192) Pause Young (Allocation Failure)
[2023-11-14T20:32:52.360+0800] GC(192) Using 2 workers of 2 for evacuation
[2023-11-14T20:32:52.386+0800] GC(192) ParNew: 98752K->1208K(119616K)
[2023-11-14T20:32:52.386+0800] GC(192) CMS: 253782K->282718K(294456K)
[2023-11-14T20:32:52.386+0800] GC(192) Metaspace: 137076K(140920K)->137076K(140920K) NonClass: 121246K(123692K)->121246K(123692K) Class: 15829K(17228K)->15829K(17228K)
[2023-11-14T20:32:52.386+0800] GC(192) Pause Young (Allocation Failure) 344M->277M(404M) 26.394ms
[2023-11-14T20:32:52.386+0800] GC(192) User=0.03s Sys=0.01s Real=0.03s
[2023-11-14T20:32:52.404+0800] GC(193) Pause Young (Allocation Failure)
[2023-11-14T20:32:52.404+0800] GC(193) Using 2 workers of 2 for evacuation
[2023-11-14T20:32:52.404+0800] GC(194) Pause Full (Allocation Failure)
[2023-11-14T20:32:52.670+0800] GC(191) Concurrent Mark 333.151ms
[2023-11-14T20:32:52.670+0800] GC(191) User=0.35s Sys=0.01s Real=0.33s
[2023-11-14T20:32:52.671+0800] GC(194) Phase 1: Mark live objects
[2023-11-14T20:32:53.041+0800] GC(194) Phase 1: Mark live objects 370.158ms
[2023-11-14T20:32:53.041+0800] GC(194) Phase 2: Compute new object addresses
[2023-11-14T20:32:53.136+0800] GC(194) Phase 2: Compute new object addresses 94.856ms
[2023-11-14T20:32:53.136+0800] GC(194) Phase 3: Adjust pointers
[2023-11-14T20:32:53.382+0800] GC(194) Phase 3: Adjust pointers 245.870ms
[2023-11-14T20:32:53.382+0800] GC(194) Phase 4: Move objects
[2023-11-14T20:32:53.458+0800] GC(194) Phase 4: Move objects 75.996ms
[2023-11-14T20:32:53.460+0800] GC(194) Pause Full (Allocation Failure) 371M->155M(404M) 1056.315ms
[2023-11-14T20:32:53.460+0800] GC(193) ParNew: 97634K->0K(119616K)
[2023-11-14T20:32:53.460+0800] GC(193) CMS: 282718K->159367K(294456K)
[2023-11-14T20:32:53.460+0800] GC(193) Metaspace: 137076K(140920K)->137076K(140920K) NonClass: 121247K(123692K)->121247K(123692K) Class: 15829K(17228K)->15829K(17228K)
[2023-11-14T20:32:53.461+0800] GC(193) Pause Young (Allocation Failure) 371M->155M(495M) 1057.050ms
[2023-11-14T20:32:53.461+0800] GC(193) User=0.97s Sys=0.01s Real=1.06s
MAT 安装注意事项

解决MAT无法打开

1. jvm字段含义
2. Java中9种常见的CMS GC问题分析与解决
3. 深入理解堆外内存 Metaspace
4. G1大对象致Old区占用率高
5. jvm参数详解
6. serial gc
7. Segmented Code Cache
8. Memory Analyzer Tools使用说明
9. 深度探索JFR - 3. 各种Event详细说明与JVM调优策略
9. k8s oom排查
10. k8s服务质量
11. k8s容器资源