Ceph basic operation

Version number

# ceph -v
ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)

Status
On the admin node, execute
ceph -s

You can see the status of the cluster, as in the following example:

cluster 936a5233-9441-49df-95c1-01de82a192f4
health HEALTH_OK
monmap e5: 6 mons at {ceph-1=100.100.200.201:6789/0,ceph-2=100.100.200.202:6789/0,ceph-3=100.100.200.203:6789/0,ceph-4=100.100.200.204:6789/0,ceph-5=100.100.200.205:6789/0,ceph-6=100.100.200.206:6789/0}
election epoch 382, quorum 0,1,2,3,4,5 ceph-1,ceph-2,ceph-3,ceph-4,ceph-5,ceph-6
fsmap e85: 1/1/1 up {0=ceph-2=up:active}
osdmap e62553: 111 osds: 109 up, 109 in
flags sortbitwise,require_jewel_osds
pgmap v72844263: 5064 pgs, 24 pools, 93130 GB data, 13301 kobjects
273 TB used, 133 TB / 407 TB avail
5058 active+clean
6 active+clean+scrubbing+deep
client io 57046 kB/s rd, 35442 kB/s wr, 1703 op/s rd, 1486 op/s wr

If we need to observe continuously, there are two ways.
One is:
ceph -w

This is the official way. The output is the same as ceph -s, except that the client io line at the bottom keeps updating.
Sometimes we also want to watch the other information above change, so I wrote a script:

watch -n 1 "ceph -s |
awk -v ll=$COLUMNS '/^ *mds[0-9]/{
    \$0 = substr(\$0, 1, ll);
}
/^ +[0-9]+ pg/{next}
/monmap/{next}
/^ +recovery [0-9]+/{next}
{print}';
ceph osd pool stats | awk '/^pool/{
    p=\$2
}
/^ +(recovery|client)/{
    if(p){print \"\n\"p; p=\"\"};
    print
}'"

Reference output

Every 1.0s: ceph -s| awk -v ll=105'/^ *mds[0-9]/{$0=substr($0, 1, ll);} /^ ... Mon Jan 21 18:09:44 2019

cluster 936a5233-9441-49df-95c1-01de82a192f4
health HEALTH_OK
election epoch 382, quorum 0,1,2,3,4,5 ceph-1,ceph-2,ceph-3,ceph-4,ceph-5,ceph-6
fsmap e85: 1/1/1 up {0=ceph-2=up:active}
osdmap e62561: 111 osds: 109 up, 109 in
flags sortbitwise,require_jewel_osds
pgmap v73183831: 5064 pgs, 24 pools, 93179 GB data, 13310 kobjects
273 TB used, 133 TB / 407 TB avail
5058 active+clean
6 active+clean+scrubbing+deep
client io 263 MB/s rd, 58568 kB/s wr, 755 op/s rd, 1165 op/s wr

cinder-sas
client io 248 MB/s rd, 33529 kB/s wr, 363 op/s rd, 597 op/s wr

vms
client io 1895 B/s rd, 2343 kB/s wr, 121 op/s rd, 172 op/s wr

cinder-ssd
client io 15620 kB/s rd, 22695 kB/s wr, 270 op/s rd, 395 op/s wr

Usage

# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
407T 146T 260T 64.04
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
cinder-sas 13 76271G 89.25 9186G 10019308
images 14 649G 6.60 9186G 339334
vms 15 7026G 43.34 9186G 1807073
cinder-ssd 16 4857G 74.73 1642G 645823
rbd 17 0 0 16909G 1

osd
You can quickly see the OSD topology, and use it to view information such as OSD status:

# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-10008 0 root sas6t3
-10007 0 root sas6t2
-10006 130.94598 root sas6t1
-12 65.47299 host ceph-11
87 5.45599 osd.87 up 1.00000 0.89999
88 5.45599 osd.88 up 0.79999 0.29999
89 5.45599 osd.89 up 1.00000 0.89999
90 5.45599 osd.90 up 1.00000 0.89999
91 5.45599 osd.91 up 1.00000 0.89999
92 5.45599 osd.92 up 1.00000 0.79999
93 5.45599 osd.93 up 1.00000 0.89999
94 5.45599 osd.94 up 1.00000 0.89999
95 5.45599 osd.95 up 1.00000 0.89999
96 5.45599 osd.96 up 1.00000 0.89999
97 5.45599 osd.97 up 1.00000 0.89999
98 5.45599 osd.98 up 0.89999 0.89999
-13 65.47299 host ceph-12
99 5.45599 osd.99 up 1.00000 0.79999
100 5.45599 osd.100 up 1.00000 0.79999
101 5.45599 osd.101 up 1.00000 0.79999
102 5.45599 osd.102 up 1.00000 0.79999
103 5.45599 osd.103 up 1.00000 0.79999
104 5.45599 osd.104 up 0.79999 0.79999
105 5.45599 osd.105 up 1.00000 0.79999
106 5.45599 osd.106 up 1.00000 0.79999
107 5.45599 osd.107 up 1.00000 0.79999
108 5.45599 osd.108 up 1.00000 0.79999
109 5.45599 osd.109 up 1.00000 0.79999
110 5.45599 osd.110 up 1.00000 0.79999

I wrote a script that highlights OSDs with high utilization:
ceph osd df | awk -v c1=84 -v c2=90 '{z=NF-2; if($z<=100&&$z>c1){c=34;if($z>c2)c=31;$z="\033["c";1m"$z"\033[0m"}; print}'

reweight
Manual weight adjustment.
When OSD load is unbalanced, manual intervention is required. The default value is 1, and we generally lower the weight:
osd reweight <int[0-]> <float[0.0-1.0]> reweight osd to 0.0 < <weight> < 1.0
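For example (the OSD id and weight here are purely illustrative, not taken from the cluster above), to lower the weight of osd.88 to 0.8:
# ceph osd reweight 88 0.8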

primary affinity
This controls how likely the PGs on an OSD are to be chosen as primary. 0 means this OSD will not become primary unless the other replicas are down; 1 means it will always be chosen as primary unless the others are all 1 as well. For values in between, the exact number of primary PGs is computed from the OSD/CRUSH topology, since different pools may sit on different OSDs.

osd primary-affinity <osdname (id|osd.id)> <float[0.0-1.0]>   adjust osd primary-affinity from 0.0 <= <weight> <= 1.0
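For example (again, the OSD id and value are illustrative only), to make osd.87 less likely to be chosen as primary:
# ceph osd primary-affinity osd.87 0.5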

pool
The commands all start with ceph osd pool.
To see which pools exist:
ceph osd pool ls

Append detail at the end to see the pool details:

# ceph osd pool ls detail
pool 13 'cinder-sas' replicated size 3 min_size 2 crush_ruleset 8 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 63138 flags hashpspool stripe_width 0
removed_snaps [1~5,7~2,a~2,e~10,23~4,2c~24,51~2,54~2,57~2,5a~a]
pool 14 'images' replicated size 3 min_size 2 crush_ruleset 8 object_hash rjenkins pg_num 512 pgp_num 512 last_change 63012 flags hashpspool stripe_width 0

Adjust pool attributes

# ceph osd pool set <pool name> <attribute> <value>
osd pool set <poolname> size|min_size|crash_replay_interval|pg_num|pgp_num|crush_ruleset|hashpspool|nodelete|nopgchange|nosizechange|write_fadvise_dontneed|noscrub|nodeep-scrub|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|use_gmt_hitset|debug_fake_ec_pool|target_max_bytes|target_max_objects|cache_target_dirty_ratio|cache_target_dirty_high_ratio|cache_target_full_ratio|cache_min_flush_age|cache_min_evict_age|auid|min_read_recency_for_promote|min_write_recency_for_promote|fast_read|hit_set_grade_decay_rate|hit_set_search_last_n|scrub_min_interval|scrub_max_interval|deep_scrub_interval|recovery_priority|recovery_op_priority|scrub_priority <val> {--yes-i-really-mean-it} : set pool parameter <var> to <val>
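As a concrete illustration (pool name and value are just examples, borrowed from the pools shown earlier), to set min_size on the cinder-sas pool:
# ceph osd pool set cinder-sas min_size 2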

pg
Commands start with ceph pg

View status

# ceph pg stat
v79188443: 5064 pgs: 1 active+clean+scrubbing, 2 active+clean+scrubbing+deep, 5061 active+clean; 88809 GB data, 260 TB used, 146 TB / 407 TB avail; 384 MB/s rd, 134 MB/s wr, 2380 op/s

ceph pg ls can be followed by a state or other parameters:

# ceph pg ls | grep scrub
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
13.1e 4832 0 0 0 0 39550330880 3034 3034 active+clean+scrubbing+deep 2019-04-08 15:24:46.496295 63232'167226529 63232:72970092 [95,80,44] 95 [95,80,44] 95 63130'167208564 2019-04-07 05:16:01.452400 63130'167117875 2019-04-05 18:53:54.796948
13.13b 4955 0 0 0 0 40587477010 3065 3065 active+clean+scrubbing+deep 2019-04-08 15:19:43.641336 63232'93849435 63232:89107385 [87,39,78] 87 [87,39,78] 87 63130'93838372 2019-04-07 08:07:43.825933 62998'93796094 2019-04-01 22:23:14.399257
13.1ac 4842 0 0 0 0 39605106850 3081 3081 active+clean+scrubbing+deep 2019-04-08 15:26:40.119698 63232'29801889 63232:23652708 [110,31,76] 110 [110,31,76] 110 63130'29797321 2019-04-07 10:50:26.243588 62988'29759937 2019-04-01 08:19:34.927978
13.31f 4915 0 0 0 0 40128633874 3013 3013 active+clean+scrubbing 2019-04-08 15:27:19.489919 63232'45174880 63232:38010846 [99,25,42] 99 [99,25,42] 99 63130'45170307 2019-04-07 06:29:44.946734 63130'45160962 2019-04-05 21:30:38.849569
13.538 4841 0 0 0 0 39564094976 3003 3003 active+clean+scrubbing 2019-04-08 15:27:15.731348 63232'69555013 63232:58836987 [109,85,24] 109 [109,85,24] 109 63130'69542700 2019-04-07 08:09:00.311084 63130'69542700 2019-04-07 08:09:00.311084
13.71f 4851 0 0 0 0 39552301568 3014 3014 active+clean+scrubbing 2019-04-08 15:27:16.896665 63232'57281834 63232:49191849 [100,75,66] 100 [100,75,66] 100 63130'57247440 2019-04-07 05:43:44.886559 63008'57112775 2019-04-03 05:15:51.434950
13.774 4867 0 0 0 0 39723743842 3092 3092 active+clean+scrubbing 2019-04-08 15:27:19.501188 63232'32139217 63232:28360980 [101,63,21] 101 [101,63,21] 101 63130'32110484 2019-04-07 06:24:22.174377 63130'32110484 2019-04-07 06:24:22.174377
13.7fe 4833 0 0 0 0 39485484032 3015 3015 active+clean+scrubbing+deep 2019-04-08 15:27:15.699899 63232'38297730 63232:32962414 [108,82,56] 108 [108,82,56] 108 63130'38286258 2019-04-07 07:59:53.586416 63008'38267073 2019-04-03 14:44:02.779800

Of course, you can also use the ls-by-* variants:

pg ls {<int>} {<state> [<state>...]}
pg ls-by-primary <osdname (id|osd.id)> {<int>} {<state> [<state>...]}
pg ls-by-osd <osdname (id|osd.id)> {<int>} {<state> [<state>...]}
pg ls-by-pool <poolstr> {<state> [<state>...]}

where <state> is one of: active|clean|down|replay|splitting|scrubbing|scrubq|degraded|inconsistent|peering|repair|recovering|backfill_wait|incomplete|stale|remapped|deep_scrub|backfill|backfill_toofull|recovery_wait|undersized|activating|peered
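For example (the OSD id is illustrative), to list only the PGs on osd.87 that are currently scrubbing:
# ceph pg ls-by-osd osd.87 scrubbing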

Repair

# ceph pg repair 13.e1
instructing pg 13.e1 on osd.110 to repair

Daily troubleshooting
pg inconsistent
When a PG shows up in the inconsistent state, it means it has a consistency problem. The "1 scrub errors" that follows indicates that this is a scrub-related problem.

# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors; noout flag(s) set
pg 13.e1 is active+clean+inconsistent, acting [110,55,21]
1 scrub errors
noout flag(s) set

Run the following to repair it:

# ceph pg repair 13.e1
instructing pg 13.e1 on osd.110 to repair

Check
At this time, you can see that 13.e1 has entered deep scrub

# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 pgs repair; 1 scrub errors; noout flag(s) set
pg 13.e1 is active+clean+scrubbing+deep+inconsistent+repair, acting [110,55,21]
1 scrub errors
noout flag(s) set

After waiting for a while, you can see that the error disappears and pg 13.e1 returns to the active+clean state:

# ceph health detail
HEALTH_WARN noout flag(s) set
noout flag(s) set

Cause of the problem
Ceph scrubs PGs periodically. An inconsistent state does not necessarily mean the data itself is inconsistent; it may only mean that the data and its checksum disagree. When a repair is issued, Ceph performs a deep scrub to determine whether the data is really inconsistent. If the deep scrub passes, the data is fine and only the checksum needs to be corrected.
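You can also trigger a deep scrub on a PG manually to verify the data (using the PG id from the example above):
# ceph pg deep-scrub 13.e1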

request blocked for XXs
First locate the OSDs whose requests are blocked:
ceph health detail | grep blocked

Then lower the primary affinity of those OSDs to divert some of the primary PGs away and reduce the pressure. The current value can be seen in the output of ceph osd tree.
ceph osd primary-affinity OSD_ID <a value lower than the current one>
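A minimal sketch of this workflow (the OSD id and the new affinity value are illustrative, not taken from the output above):
# ceph health detail | grep blocked
# ceph osd tree | grep -w osd.88     (check the current primary-affinity in the last column)
# ceph osd primary-affinity 88 0.5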

This is mainly caused by cluster imbalance: some OSDs are under too much pressure and cannot process requests in time.
1. If it occurs frequently, it is recommended to investigate the cause.
2. If it is because client IO demand has increased, try to optimize the clients to reduce unnecessary reads and writes.
3. If it is because some OSDs consistently cannot keep up with requests, it is recommended to temporarily reduce their primary affinity, and to keep watching them, because this may be a precursor to a disk failure.
4. If the problem occurs on the OSDs behind one journal SSD, it is recommended to check whether that journal SSD has a write bottleneck or is failing.
