[Unix] Replacing a Failed Hard Disk in a Sun SPARC Server

Posted by 服务系统 on 2021-11-15 13:25

A while back, a disk indicator LED on one of our SUN servers started signalling a fault; investigation showed the hard disk itself had failed. The machine is long out of support and we maintain it ourselves now. I searched online and found very little usable information, so I ended up doing the replacement by hand, and I am writing up the procedure as a reference for colleagues who run into a similar failure.

I. System Overview
Operating system: Solaris 10
File system: ZFS
Storage pool:
bash-3.2# zpool status
pool: rpool
state: ONLINE
scan: resilvered 144G in 2h10m with 0 errors on Mon Dec 11 13:56:43 2017  
config:  
  NAME      STATE   READ WRITE CKSUM
  rpool     ONLINE     0   0   0
    mirror-0  ONLINE     0   0   0
    c0t0d0s0  ONLINE     0   0   0
    c0t1d0s0  ONLINE     0   0   0
II. Fault Description
One disk in pool rpool has faulted and needs to be replaced:
bash-3.2# zpool status
pool: rpool
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
  Sufficient replicas exist for the pool to continue functioning in a
  degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device   repaired.
scan: resilvered 197G in 2h24m with 0 errors on Fri May 12 19:15:56 2017
config:
   NAME      STATE   READ WRITE CKSUM
  rpool     DEGRADED   0   0   0
    mirror-0  DEGRADED   0   0   0
    c0t0d0s0  FAULTED    9   618   0  too many errors
    c0t1d0s0  ONLINE     0   0   0
errors: No known data errors
III. Procedure
1. Identify the failed disk
If the bad disk cannot be picked out by looking at the enclosure, the following command can be used to generate a 10 GB file while you watch the drive LEDs:
bash-3.2# mkfile 10G file1
Of the two disks, one has already faulted, so while the 10 GB file is being written only the healthy disk's activity LED will blink; the disk whose LED stays dark is the failed one.
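As a cross-check (a hedged aside, not part of the original procedure), the Solaris fault manager and the per-device error counters can also point at the bad disk; the exact output varies by release:

bash-3.2# fmadm faulty          # list resources the fault manager has marked as faulted
bash-3.2# iostat -En c0t0d0     # soft/hard/transport error counters for the suspect disk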
2. Identify the boot disk
bash-3.2# prtconf -vp|grep -i bootpath
  bootpath:  '/pci@0,600000/pci@0/scsi@1/disk@0,0:a'
The boot path shows that the system boots from disk0, which is the faulted disk, so before swapping it the system must be brought up from the other disk, disk1.
Reboot the operating system down to the OK prompt:
{0} ok boot disk1   (boot from disk1; once the system is up again on disk1, proceed with replacing the faulted disk)
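If there is any doubt about which physical device the disk1 alias points to, it can be checked at the OK prompt before issuing the boot command (a hedged aside; these are standard OpenBoot commands, and the alias path will differ per machine):

{0} ok devalias disk1          (show the device path behind the disk1 alias)
{0} ok printenv boot-device    (show the default boot device list)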
3. Replace the disk
bash-3.2# zpool offline rpool c0t0d0s0
bash-3.2# zpool status 
pool: rpool
state: DEGRADED
 status: One or more devices has been taken offline by the administrator.
  Sufficient replicas exist for the pool to continue functioning in a
  degraded state.
 action: Online the device using 'zpool online' or replace the device with
  'zpool replace'.
scan: scrub repaired 0 in 2h7m with 0 errors on Thu Jan 24 20:31:42 2019
config:
  NAME      STATE   READ WRITE CKSUM
  rpool     DEGRADED   0   0   0
    mirror-0  DEGRADED   0   0   0
    c0t0d0s0  OFFLINE    0   0   0
    c0t1d0s0  ONLINE     0   0   0
errors: No known data errors
bash-3.2# df -h   
Filesystem                 size   used  avail capacity  Mounted on
rpool/ROOT/s10s_u11wos_24a
                           274G   109G   145G    43%    /
/devices    0K   0K   0K   0%  /devices
ctfs       0K   0K   0K   0%  /system/contract
proc       0K   0K   0K   0%  /proc
mnttab     0K   0K   0K   0%  /etc/mnttab
swap       92G   448K  92G   1%  /etc/svc/volatile
objfs      0K   0K   0K   0%  /system/object
sharefs    0K   0K   0K   0%  /etc/dfs/sharetab
fd       0K   0K   0K   0%  /dev/fd
swap       92G  64K  92G   1%  /tmp
swap       92G  72K  92G   1%  /var/run
rpool/export   274G  34K   145G   1%  /export
rpool/export/home 274G  13M   145G   1%  /export/home
rpool       274G   106K   145G   1%  /rpool
Pull out the old disk and insert the new one.
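On some chassis the disk also has to be unconfigured before it is physically pulled, and configured again after the new one is seated (a hedged sketch; the attachment point name c0::dsk/c0t0d0 is an assumption, check cfgadm -al for the real name on your system):

bash-3.2# cfgadm -al                               # list attachment points and their state
bash-3.2# cfgadm -c unconfigure c0::dsk/c0t0d0     # release the faulted disk before removal (hypothetical Ap_Id)
bash-3.2# cfgadm -c configure c0::dsk/c0t0d0       # bring the replacement disk online after insertion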
4. Recreate the partition table
Because the system disks are mirrored, the partition tables of the two disks must be made identical.
bash-3.2# devfsadm -C
bash-3.2# format 
Searching for disks...done
c0t0d0: configured with capacity of 279.38GB
AVAILABLE DISK SELECTIONS:
   0. c0t0d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
    /pci@0,600000/pci@0/scsi@1/sd@0,0
   1. c0t1d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
    /pci@0,600000/pci@0/scsi@1/sd@1,0
   2. c3t60060E8005638900000063890000000Cd0 <HITACHI-OPEN-V*19   -SUN-6008-4.94TB>
    /scsi_vhci/ssd@g60060e8005638900000063890000000c
   3. c3t60060E80056389000000638900000000d0 <HITACHI-OPEN-V*6  -SUN-6008-1.56TB>
    /scsi_vhci/ssd@g60060e80056389000000638900000000
   4. c3t60060E80056389000000638900000008d0 <HITACHI-OPEN-V*4  -SUN-6008-1.04TB>  oracle
    /scsi_vhci/ssd@g60060e80056389000000638900000008
   5. c3t600507640081002FC0000000000000FCd0 <IBM-2145-0000-5.00TB>
    /scsi_vhci/ssd@g600507640081002fc0000000000000fc
Specify disk (enter its number): 0
selecting c0t0d0
[disk formatted]
Disk not labeled.  Label it now? y   # a new disk must be labeled; disks over 2 TB can take an EFI label (labeling destroys any existing data on the disk)
FORMAT MENU:
  disk     - select a disk
  type     - select (define) a disk type
  partition  - select (define) a partition table
  current  - describe the current disk
  format   - format and analyze the disk
  repair   - repair a defective sector
  label    - write label to the disk
  analyze  - surface analysis
  defect   - defect list management
  backup   - search for backup labels
  verify   - read and display labels
  save     - save new disk/partition definitions
  inquiry  - show vendor, product and revision
  volname  - set 8-character volume name
  !<cmd>   - execute <cmd>, then return
  quit
format> p
PARTITION MENU:
  0    - change `0' partition
  1    - change `1' partition
  2    - change `2' partition
  3    - change `3' partition
  4    - change `4' partition
  5    - change `5' partition
  6    - change `6' partition
  7    - change `7' partition
  select - select a predefined table
  modify - modify a predefined partition table
  name   - name the current table
  print  - display the current table
  label  - write partition map and label to the disk
  !<cmd> - execute <cmd>, then return
  quit
partition> p
Current partition table (original):
Total disk cylinders available: 46873 + 2 (reserved cylinders)
Part    Tag  Flag   Cylinders     Size      Blocks
 0     root  wm     0 -  20    128.17MB  (21/0/0)   262500
 1     swap  wu    21 -  41    128.17MB   (21/0/0)  262500
 2   backup  wu     0 - 46872    279.38GB   (46873/0/0) 585912500
 3 unassigned  wm     0        0    (0/0/0)    0
 4 unassigned  wm     0        0    (0/0/0)    0
 5 unassigned  wm     0        0    (0/0/0)    0
 6    usr  wm    42 - 46872    279.13GB   (46831/0/0) 585387500
 7 unassigned  wm     0        0    (0/0/0)    0
The above is the partition layout of the new disk c0t0d0.
Compare it with the layout of the surviving disk c0t1d0:
Part        Tag  Flag   Cylinders      Size        Blocks
  0        root  wm     0 - 46872   279.38GB  (46873/0/0)  585912500
  1  unassigned  wu     0                  0  (0/0/0)              0
  2      backup  wm     0 - 46872   279.38GB  (46873/0/0)  585912500
  3  unassigned  wu     0                  0  (0/0/0)              0
  4  unassigned  wu     0                  0  (0/0/0)              0
  5  unassigned  wu     0                  0  (0/0/0)              0
  6  unassigned  wu     0                  0  (0/0/0)              0
  7  unassigned  wu     0                  0  (0/0/0)              0
The comparison shows that the new disk c0t0d0 and the surviving disk c0t1d0 do not have the same partition layout. Since the two disks mirror each other, the new disk's partition table must be made identical to c0t1d0's before resilvering can start. The command below copies the table exactly; note that the device names end in slice s2 (slice 2 represents the whole disk under a VTOC label).
bash-3.2# prtvtoc /dev/rdsk/c0t1d0s2 | fmthard -s - /dev/rdsk/c0t0d0s2
fmthard:  New volume table of contents now in place.
# copies the partition table of c0t1d0s2 onto the new disk c0t0d0s2
bash-3.2# format   # the partition table of c0t0d0 now reads as follows
Part    Tag  Flag   Cylinders    Size        Blocks
 0     root  wm     0 - 46872  279.38GB (46873/0/0)  585912500
 1 unassigned  wu     0       0   (0/0/0)       0
 2     backup  wm     0 - 46872   279.38GB  (46873/0/0)  585912500
 3 unassigned  wu     0       0  (0/0/0)       0
 4 unassigned  wu     0       0  (0/0/0)       0
 5 unassigned  wu     0       0  (0/0/0)       0
 6 unassigned  wu     0       0  (0/0/0)       0
 7 unassigned  wu     0       0  (0/0/0)       0
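To double-check that the two VTOCs really match, the prtvtoc output of both disks can be diffed (a hedged sketch; the temp file names are my own, and only the device-name comment lines at the top should differ):

bash-3.2# prtvtoc /dev/rdsk/c0t1d0s2 > /tmp/vtoc.c0t1d0
bash-3.2# prtvtoc /dev/rdsk/c0t0d0s2 > /tmp/vtoc.c0t0d0
bash-3.2# diff /tmp/vtoc.c0t1d0 /tmp/vtoc.c0t0d0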
5. Rebuild the mirror (resilver the data)
At this point only one disk in the pool is still working normally:
bash-3.2# zpool status
pool: rpool
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
  Sufficient replicas exist for the pool to continue functioning in a
  degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
  repaired.
 scan: resilvered 197G in 2h24m with 0 errors on Fri May 12 19:15:56 2017
config:
  NAME      STATE   READ WRITE CKSUM
  rpool     DEGRADED   0   0   0
    mirror-0  DEGRADED   0   0   0
    c0t0d0s0  FAULTED    9   618   0  too many errors
    c0t1d0s0  ONLINE     0   0   0
errors: No known data errors
Add the replacement disk back into the pool so the mirror can be rebuilt and resilvering begins:
bash-3.2# zpool replace rpool c0t0d0s0
Make sure to wait until resilver is done before rebooting.
bash-3.2# zpool status
pool: rpool
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
  continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Mon Dec 11 11:46:11 2017
   12.4M scanned out of 144G at 906K/s, 46h25m to go
   12.2M resilvered, 0.01% done
config:
  NAME        STATE   READ WRITE CKSUM
  rpool         DEGRADED   0   0   0
    mirror-0      DEGRADED   0   0   0
    replacing-0   DEGRADED   0   0   0
      c0t0d0s0/old  FAULTED    9   618   0  too many errors
      c0t0d0s0    ONLINE     0   0   0  (resilvering)
    c0t1d0s0    ONLINE     0   0   0
errors: No known data errors
bash-3.2# iostat -xn 3
        extended device statistics        
r/s  w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
1.0  4.8   39.9   79.3  0.0  0.2  0.0   28.1   0   3 c0t0d0
1.5  6.7   35.0  114.8  0.0  0.2  0.0   23.5   0   3 c0t1d0
0.0  0.0  0.7  0.0  0.0  0.0  0.0  2.8   0   0 c3t60060E8005638900000063890000000Cd0
2.0  1.0  1.0  0.5  0.0  0.0  0.0  0.3   0   0 c3t60060E80056389000000638900000008d0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.2   0   0 c3t60060E80056389000000638900000000d0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0 c3t600507640081002FC0000000000000FCd0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0 rdms02b:vold(pid523)
        extended device statistics        
r/s  w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0  358.8  0.0 10546.3  0.0  8.4  0.0   23.3   0  98 c0t0d0
 475.3   37.0 10446.3  392.8  0.0  2.2  0.0  4.2   0  56 c0t1d0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0 c3t60060E8005638900000063890000000Cd0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0 c3t60060E80056389000000638900000008d0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0 c3t60060E80056389000000638900000000d0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0 c3t600507640081002FC0000000000000FCd0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0 rdms02b:vold(pid523)
        extended device statistics        
r/s  w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.0  405.2  0.0 11338.2  0.0  9.1  0.0   22.4   0 100 c0t0d0
647.6  0.0 11317.5  0.0  0.0  1.4  0.0  2.1   0  44 c0t1d0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0 c3t60060E8005638900000063890000000Cd0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0 c3t60060E80056389000000638900000008d0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0 c3t60060E80056389000000638900000000d0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0 c3t600507640081002FC0000000000000FCd0
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0 rdms02b:vold(pid523)
bash-3.2# zpool status
pool: backup
state: ONLINE
scan: none requested
config:
  NAME                   STATE   READ WRITE CKSUM
  backup                   ONLINE     0   0   0
    c3t60060E8005638900000063890000000Cd0  ONLINE     0   0   0
errors: No known data errors
 pool: rpool
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
  continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Mon Dec 11 11:46:11 2017
107G scanned out of 144G at 17.5M/s, 0h36m to go
107G resilvered, 73.92% done
config:
  NAME        STATE   READ WRITE CKSUM
  rpool         DEGRADED   0   0   0
    mirror-0      DEGRADED   0   0   0
    replacing-0   DEGRADED   0   0   0
      c0t0d0s0/old  FAULTED    9   618   0  too many errors
      c0t0d0s0    ONLINE     0   0   0  (resilvering)
    c0t1d0s0    ONLINE     0   0   0
errors: No known data errors
Once resilvering completes, the status looks like this and the disk replacement is finished:
bash-3.2# zpool status
pool: backup
state: ONLINE
scan: none requested
config:
  NAME                   STATE   READ WRITE CKSUM
  backup                   ONLINE     0   0   0
    c3t60060E8005638900000063890000000Cd0  ONLINE     0   0   0
errors: No known data errors
pool: rpool
state: ONLINE
scan: resilvered 144G in 2h10m with 0 errors on Mon Dec 11 13:56:43 2017
config:
  NAME      STATE   READ WRITE CKSUM
  rpool     ONLINE     0   0   0
    mirror-0  ONLINE     0   0   0
    c0t0d0s0  ONLINE     0   0   0
    c0t1d0s0  ONLINE     0   0   0
errors: No known data errors
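One extra step worth mentioning, although it does not appear in the output above: on Solaris 10 SPARC the ZFS boot block is not necessarily written to a replacement root-pool disk automatically, so before relying on the new disk as a boot device it is usually installed by hand (a minimal sketch, assuming the replacement disk is c0t0d0s0 as above):

bash-3.2# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t0d0s0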
6. Boot test
Reboot the system and, from the OK prompt reached through the XSCF console, boot from the newly installed disk; if it comes up without problems, the disk replacement has succeeded:
{0} ok boot disk0
 Boot device: /pci@0,600000/pci@0/scsi@1/disk@0  File and args:
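After the system is back up on disk0, a final health check confirms both sides of the mirror are clean (a hedged wrap-up; zpool clear is only needed if stale error counters remain):

bash-3.2# zpool status rpool    # c0t0d0s0 and c0t1d0s0 should both show ONLINE with zero errors
bash-3.2# zpool clear rpool     # optionally reset any residual error counters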