Normal state:
Failure state:
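Both states can also be inspected from the command line on any monitor node; the two commands below are a minimal, generic check and are not part of the original screenshots:
# overall health, plus any flags currently set on the cluster
ceph -s
# per-osd status; a failed osd is reported as down
ceph osd tree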
Replacement steps:
1. Temporarily disable data migration in the ceph cluster:
When an osd's hard disk fails, the osd status changes to down. After the interval set by mon osd down out interval, ceph marks it out and starts data migration and recovery. To reduce the performance impact of ceph's recovery and scrub operations, they can be switched off for the duration of the replacement and switched back on once the new disk is in place and the osd has been restored:
for i in noout nobackfill norecover noscrub nodeep-scrub;do ceph osd set $i;done
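To confirm the flags took effect, they should now be listed in the osd map; this quick check is an addition to the original steps:
# the flags line should list noout,nobackfill,norecover,noscrub,nodeep-scrub
ceph osd dump | grep flags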
2. Locate the failed osd
ceph osd tree | grep -i down
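The filtered output contains one row per down osd; the line below is only illustrative (the id, class and weight are assumptions, not values from the original cluster):
# columns: ID CLASS WEIGHT NAME STATUS REWEIGHT PRI-AFF
 5   hdd  1.81940      osd.5      down         0  1.00000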
3. Log in to the node hosting the failed osd and unmount the osd mount directory
umount /var/lib/ceph/osd/ceph-5
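If umount reports that the target is busy, the osd daemon is probably still holding the directory; stopping it first is safe here since the osd is already down (the systemd unit name below assumes the standard ceph packaging):
# stop the failed osd daemon, then retry the unmount
systemctl stop ceph-osd@5
umount /var/lib/ceph/osd/ceph-5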
4. Remove osd from the crush map
# ceph osd crush remove osd.5
removed item id 5 name 'osd.5' from crush map
5. Delete the failed osd's key
# ceph auth del osd.5
updated
6. Delete the failed osd
# ceph osd rm 5
removed osd.5
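Steps 4 to 6 always operate on the same osd id, so they are convenient to run as one small block; this is just a sketch with the id held in a shell variable (the variable name is an assumption for illustration):
OSD_ID=5                              # id of the failed osd
ceph osd crush remove osd.${OSD_ID}   # step 4: remove it from the crush map
ceph auth del osd.${OSD_ID}           # step 5: delete its key
ceph osd rm ${OSD_ID}                 # step 6: remove it from the osd map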
7. After installing the replacement hard disk, check its device name (here /dev/sdd) and then create the osd
8. On the deployment node, switch to the cent user and add the new osd
$ ceph-deploy osd create --data /dev/sdd node3
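If the replacement disk carries old partitions or lvm metadata, osd creation can fail; zapping the disk first is the usual workaround (whether this is needed depends on the state of the new disk, so treat it as an optional extra step):
# wipe the new disk before creating the osd on it
ceph-deploy disk zap node3 /dev/sdd
ceph-deploy osd create --data /dev/sdd node3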
9. Once the new osd has been added to the crush map, clear the flags that were set on the cluster in step 1
for i in noout nobackfill norecover noscrub nodeep-scrub;do ceph osd unset $i;done
After a period of data migration, the ceph cluster returns to the active+clean state.
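Recovery progress can be monitored until every pg reports active+clean; the commands below are a standard way to watch it and are an addition to the original steps:
# stream cluster events while backfill and recovery run
ceph -w
# or poll the status summary every 10 seconds
watch -n 10 'ceph -s'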