Troubleshooting an etcd cluster fault in a Kubernetes cluster

While creating an instance in the k8s cluster, the etcd cluster reported a connection failure, so instance creation failed. This post walks through the cause.

Source of the problem

Here is the health status of the etcd cluster:

[root@docker01 ~]# cd /opt/kubernetes/ssl/
[root@docker01 ssl]# /opt/kubernetes/bin/etcdctl \
> --ca-file=ca.pem --cert-file=server.pem --key-file=server-key.pem \
> --endpoints="https://10.0.0.99:2379,https://10.0.0.100:2379,https://10.0.0.111:2379" \
> cluster-health
member 1bd4d12de986e887 is healthy: got healthy result from https://10.0.0.99:2379
member 45396926a395958b is healthy: got healthy result from https://10.0.0.100:2379
failed to check the health of member c2c5804bd87e2884 on https://10.0.0.111:2379: Get https://10.0.0.111:2379/health: net/http: TLS handshake timeout
member c2c5804bd87e2884 is unreachable: [https://10.0.0.111:2379] are all unreachable
cluster is healthy
[root@docker01 ssl]#

It is obvious that there is a problem with etcd node 03.
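Note that the last line still reports `cluster is healthy` even though one member is down: a three-member etcd cluster only needs a quorum of two. A quick sanity check of that arithmetic (a sketch, not part of the original output):

```shell
# etcd stays available as long as floor(N/2)+1 members are reachable.
N=3                           # cluster size from the article (three endpoints)
QUORUM=$(( N / 2 + 1 ))       # integer division: floor(3/2) + 1 = 2
TOLERATED=$(( N - QUORUM ))   # failures the cluster can absorb
echo "members=$N quorum=$QUORUM tolerated_failures=$TOLERATED"
```

With two of three members healthy, quorum holds, so reads and writes still succeed even while member 03 is broken.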

Next, try restarting the etcd service on node 03:

[root@docker03 ~]# systemctl restart etcd
Job for etcd.service failed because the control process exited with error code. See "systemctl status etcd.service" and "journalctl -xe" for details.
[root@docker03 ~]# journalctl -xe
Mar 24 22:24:32 docker03 etcd[1895]: setting maximum number of CPUs to 1, total number of available CPUs is 1
Mar 24 22:24:32 docker03 etcd[1895]: the server is already initialized as member before, starting as etcd member...
Mar 24 22:24:32 docker03 etcd[1895]: peerTLS: cert = /opt/kubernetes/ssl/server.pem, key = /opt/kubernetes/ssl/server-key.pem, ca = , trusted-ca = /opt/kubernetes/ssl
Mar 24 22:24:32 docker03 etcd[1895]: listening for peers on https://10.0.0.111:2380
Mar 24 22:24:32 docker03 etcd[1895]: The scheme of client url http://127.0.0.1:2379 is HTTP while peer key/cert files are presented. Ignored key/cert files.
Mar 24 22:24:32 docker03 etcd[1895]: listening for client requests on 127.0.0.1:2379
Mar 24 22:24:32 docker03 etcd[1895]: listening for client requests on 10.0.0.111:2379
Mar 24 22:24:32 docker03 etcd[1895]: member c2c5804bd87e2884 has already been bootstrapped
Mar 24 22:24:32 docker03 systemd[1]: etcd.service: main process exited, code=exited, status=1/FAILURE
Mar 24 22:24:32 docker03 systemd[1]: Failed to start Etcd Server.
-- Subject: Unit etcd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit etcd.service has failed.
--
-- The result is failed.
Mar 24 22:24:32 docker03 systemd[1]: Unit etcd.service entered failed state.
Mar 24 22:24:32 docker03 systemd[1]: etcd.service failed.
Mar 24 22:24:33 docker03 systemd[1]: etcd.service holdoff time over, scheduling restart.
Mar 24 22:24:33 docker03 systemd[1]: start request repeated too quickly for etcd.service
Mar 24 22:24:33 docker03 systemd[1]: Failed to start Etcd Server.
-- Subject: Unit etcd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit etcd.service has failed.
--
-- The result is failed.
Mar 24 22:24:33 docker03 systemd[1]: Unit etcd.service entered failed state.
Mar 24 22:24:33 docker03 systemd[1]: etcd.service failed.

The service failed to start. The key message is: member c2c5804bd87e2884 has already been bootstrapped

Searching for this error turns up the following explanation:
One of the members was bootstrapped via the discovery service. You must remove the previous data-dir to clean up the member information. Otherwise the member will ignore the new configuration and start with the old configuration. That is why you see the mismatch.

At this point the problem is clear: startup fails because the information recorded in the data-dir (/var/lib/etcd/default.etcd) does not match the information specified by the etcd startup options.

Problem solving

The first option is to fix this error by adjusting the startup parameters. Since the member information is already recorded in the data-dir, there is no need to bootstrap the member as part of a brand-new cluster. Specifically, change the --initial-cluster-state parameter:

[root@docker03 ~]# cat /usr/lib/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
EnvironmentFile=-/opt/kubernetes/cfg/etcd
ExecStart=/opt/kubernetes/bin/etcd \
--name=${ETCD_NAME} \
--data-dir=${ETCD_DATA_DIR} \
--listen-peer-urls=${ETCD_LISTEN_PEER_URLS} \
--listen-client-urls=${ETCD_LISTEN_CLIENT_URLS},http://127.0.0.1:2379 \
--advertise-client-urls=${ETCD_ADVERTISE_CLIENT_URLS} \
--initial-advertise-peer-urls=${ETCD_INITIAL_ADVERTISE_PEER_URLS} \
--initial-cluster=${ETCD_INITIAL_CLUSTER} \
--initial-cluster-token=${ETCD_INITIAL_CLUSTER} \
--initial-cluster-state=existing \  # changed from new to existing; the service then starts normally!
--cert-file=/opt/kubernetes/ssl/server.pem \
--key-file=/opt/kubernetes/ssl/server-key.pem \
--peer-cert-file=/opt/kubernetes/ssl/server.pem \
--peer-key-file=/opt/kubernetes/ssl/server-key.pem \
--trusted-ca-file=/opt/kubernetes/ssl/ca.pem \
--peer-trusted-ca-file=/opt/kubernetes/ssl/ca.pem
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

Change --initial-cluster-state=new to --initial-cluster-state=existing, restart the service, and etcd comes up normally.
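If you prefer to script the change, it can be applied with sed. The snippet below operates on a throwaway copy of a fragment of the unit file so it can run anywhere; on a real node the target would be /usr/lib/systemd/system/etcd.service, followed by `systemctl daemon-reload && systemctl restart etcd`.

```shell
# Sketch: flip --initial-cluster-state from "new" to "existing" with sed.
# A temp file stands in for /usr/lib/systemd/system/etcd.service.
UNIT=$(mktemp)
cat > "$UNIT" <<'EOF'
ExecStart=/opt/kubernetes/bin/etcd \
--initial-cluster-state=new \
--cert-file=/opt/kubernetes/ssl/server.pem
EOF
sed -i 's/--initial-cluster-state=new/--initial-cluster-state=existing/' "$UNIT"
STATE_LINE=$(grep -- '--initial-cluster-state' "$UNIT")
echo "$STATE_LINE"
rm -f "$UNIT"
```

Remember that systemd caches unit files, so the `daemon-reload` step is required before the restart picks up the edit.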

The second option is to delete the data-dir on every etcd node (not deleting them also works) and restart each node's etcd service. Each node's data-dir is then rebuilt, and this failure no longer occurs.
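For this option, moving the data-dir aside is safer than deleting it outright, since the old state stays recoverable. A minimal sketch, with a scratch directory standing in for the article's /var/lib/etcd/default.etcd (on a real node, stop etcd first):

```shell
# Sketch: retire a member's data-dir by renaming it instead of deleting it.
SCRATCH=$(mktemp -d)
DATA_DIR="$SCRATCH/default.etcd"      # stand-in for /var/lib/etcd/default.etcd
mkdir -p "$DATA_DIR/member"           # fake etcd member state for the demo
BACKUP="${DATA_DIR}.bak"
mv "$DATA_DIR" "$BACKUP"              # old state kept, data-dir path now empty
ls -d "$BACKUP/member"
# On the real node, follow with: systemctl restart etcd
```

After the restart, the member re-syncs its data from the cluster leader, so the rebuilt data-dir converges with the other nodes.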

The third option is to copy the data-dir contents from another node, forcibly start a single-member cluster from that data with --force-new-cluster, and then restore the cluster by adding the other members back one by one.
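This third option is the most invasive. The outline below is command-shaped pseudocode, not something to paste verbatim: hostnames are placeholders, the elided flags depend on your configuration, and the member-add syntax should be checked against your etcdctl version.

```shell
# Outline only -- adapt before use; hostnames and arguments are placeholders.
systemctl stop etcd                                        # on the node being rebuilt
rsync -a docker01:/var/lib/etcd/default.etcd/ /var/lib/etcd/default.etcd/   # copy a healthy data-dir
/opt/kubernetes/bin/etcd --force-new-cluster ...           # bring up a one-member cluster from the copied data
/opt/kubernetes/bin/etcdctl member add <name> <peer-url>   # then re-add the other members one by one
```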

These are the current solutions.

