The state database contains the configuration and status information of all volumes, hot spares and disk sets. To provide redundancy, we create multiple copies of the state database, called state database replicas. If a state database replica is lost, SVM determines which replicas are still valid by using a majority consensus algorithm. According to the algorithm, at least (half + 1) of the replicas must be available at boot time before any of them is considered valid. Each replica takes around 4 MB. It is ideal to create 3 of them on each disk if you are mirroring the root disk. This post describes the meaning of the fields displayed by the metadb -i command.
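The (half + 1) rule is plain integer arithmetic, so it can be sketched in a few lines of shell. This is only an illustration of the rule as stated above; the function names replicas_needed and can_boot are hypothetical helpers, not SVM commands.

```shell
#!/bin/sh
# replicas_needed TOTAL
# Prints the smallest replica count that satisfies the "half + 1" rule
# (integer division, so 5 -> 3 and 6 -> 4).
replicas_needed() {
    total=$1
    echo $(( total / 2 + 1 ))
}

# can_boot TOTAL AVAILABLE
# Succeeds (exit status 0) when AVAILABLE replicas meet the requirement.
can_boot() {
    [ "$2" -ge "$(replicas_needed "$1")" ]
}

# Example: 3 replicas on each of two disks (6 in total), one disk lost.
if can_boot 6 3; then
    echo "6 total, 3 available: majority reached"
else
    echo "6 total, 3 available: no majority (need $(replicas_needed 6))"
fi
```

With two disks holding 3 replicas each, losing one disk leaves exactly half of the replicas, which is one short of the majority.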
When the “metadb -i” command is run, the following output is displayed:
# metadb -i
        flags           first blk       block count
     a        u         16              8192            /dev/dsk/c3t600A0B800011BABC000036A151C1B85Ad0s0
     a        u         8208            8192            /dev/dsk/c3t600A0B800011BABC000036A151C1B85Ad0s0
     a        u         16400           8192            /dev/dsk/c3t600A0B800011BABC000036A151C1B85Ad0s0
 r - replica does not have device relocation information
 o - replica active prior to last mddb configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/lvm/mddb.cf
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 W - replica has device write errors
 a - replica is active, commits are occurring to this replica
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors
The characters in front of the device name represent the status of each state database replica. An explanation of each character is displayed after the replica status lines.
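When reading a long replica list it can help to translate a flags column back into words. The sketch below copies the legend text from the metadb -i output (plus the t flag described further down) into a lookup; describe_flag and describe_flags are hypothetical helper names, not part of SVM.

```shell
#!/bin/sh
# describe_flag LETTER
# Maps one status letter from "metadb -i" to its legend text.
# The match is case-sensitive: "m" and "M" mean different things.
describe_flag() {
    case $1 in
        r) echo "replica does not have device relocation information" ;;
        o) echo "replica active prior to last mddb configuration change" ;;
        u) echo "replica is up to date" ;;
        l) echo "locator for this replica was read successfully" ;;
        c) echo "replica's location was in /etc/lvm/mddb.cf" ;;
        p) echo "replica's location was patched in kernel" ;;
        m) echo "replica is master, this is replica selected as input" ;;
        t) echo "tagged data is associated with replicas" ;;
        W) echo "replica has device write errors" ;;
        a) echo "replica is active, commits are occurring to this replica" ;;
        M) echo "replica had problem with master blocks" ;;
        D) echo "replica had problem with data blocks" ;;
        F) echo "replica had format problems" ;;
        S) echo "replica is too small to hold current data base" ;;
        R) echo "replica had device read errors" ;;
        *) echo "unknown flag: $1" ;;
    esac
}

# describe_flags "a m pc luo"
# Describes every letter found in one flags column, one per line.
describe_flags() {
    for letter in $(echo "$1" | tr -d ' ' | sed 's/./& /g'); do
        printf '%s - %s\n' "$letter" "$(describe_flag "$letter")"
    done
}

describe_flags "a m pc luo"
```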
The flags “m”, “p”, “c”, “l” and “o” are only set after a reboot.
For example :
# metadb -a -f -c 3 /dev/rdsk/c3t600A0B800011BABC000036A151C1B85Ad0s0
# metadb
        flags           first blk       block count
     a        u         16              8192            /dev/dsk/c3t600A0B800011BABC000036A151C1B85Ad0s0
     a        u         8208            8192            /dev/dsk/c3t600A0B800011BABC000036A151C1B85Ad0s0
     a        u         16400           8192            /dev/dsk/c3t600A0B800011BABC000036A151C1B85Ad0s0
# metadb
        flags           first blk       block count
     a m  pc luo        16              8192            /dev/dsk/c3t600A0B800011BABC000036A151C1B85Ad0s0
     a    pc luo        8208            8192            /dev/dsk/c3t600A0B800011BABC000036A151C1B85Ad0s0
     a    pc luo        16400           8192            /dev/dsk/c3t600A0B800011BABC000036A151C1B85Ad0s0
t = tagged data is associated with replicas
r = replica does not have device relocation information
For various reasons it may be necessary to gather a crash dump while booted from alternate media, for example if the system is not booting properly from the normal boot disk. This document outlines the steps required to accomplish this.
In general, the steps required are:
- Boot from the alternate media
- Mount the root filesystem
- Determine the dump device
- Save the crashdump from the dump device
- Unmount the root (and other) filesystems
1. Boot from the Alternate Media
To boot from CDROM or DVD media use (in case of SPARC machine):
ok> boot cdrom -s
On x86/x64 hardware, use the guide below to boot from DVD :
2. Mount the root filesystem
1. Root Filesystem on UFS on a simple slice
Assume that the root filesystem resides on c0t0d0s0 when booted from the alternate media, and that Solaris Volume Manager is not in use. Ensure that the device has consistent metadata, then mount the root filesystem.
# fsck -y /dev/rdsk/c0t0d0s0
# mount /dev/dsk/c0t0d0s0 /a
Determine if /var or /usr are separate mount points, and if so mount them.
# egrep '/usr|/var' /a/etc/vfstab
/dev/dsk/c0t0d0s3 /dev/rdsk/c0t0d0s3 /var ufs 1 yes logging
# mount /dev/dsk/c0t0d0s3 /a/var
2. Root filesystem is a zpool and the dump device is a zvol
Import the root zpool without mounting any filesystems
# zpool import -fN rpool
Find the root filesystem we are interested in and mount it as /a.
# zfs list
NAME                      USED  AVAIL  REFER  MOUNTPOINT
...
rpool/ROOT/Solaris10-2   20.9G  79.8G  13.4G  /
...
# zfs mount rpool/ROOT/Solaris10-2 /a
If the zfs list shows /var and/or /usr these should also be similarly mounted on /a/var and /a/usr.
3. Determine the dump device
Given that we have our root filesystem mounted on /a, have a look at the dump configuration:
# cat /a/etc/dumpadm.conf
...
DUMPADM_DEVICE=/dev/zvol/dsk/rpool/dump
DUMPADM_SAVDIR=/var/crash/hostname
DUMPADM_CONTENT=kernel
DUMPADM_ENABLE=no
DUMPADM_CSAVE=on
Take note of DUMPADM_DEVICE (where we will read the crashdump from) and DUMPADM_SAVDIR (where we will save the crashdump to).
4. Save the crashdump
Assuming that you want to write the crashdump into the default dump directory (DUMPADM_SAVDIR above), run :
# savecore -dv -f DUMPADM_DEVICE DUMPADM_SAVDIR
substituting the values from step 3 for DUMPADM_DEVICE and DUMPADM_SAVDIR. Verify that you have a new crashdump in the DUMPADM_SAVDIR directory.
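The substitution can be done by reading the two values straight out of the configuration file. The sketch below only prints the resulting savecore command rather than running it; conf_value and savecore_command are hypothetical helper names, and the sample file is a temporary copy of the settings shown in step 3.

```shell
#!/bin/sh
# conf_value FILE KEY
# Prints the value of a KEY=value line from a dumpadm.conf-style file.
# If the key appears more than once, the last occurrence wins.
conf_value() {
    sed -n "s/^$2=//p" "$1" | tail -1
}

# savecore_command FILE
# Prints (does not run) the savecore command built from the settings in FILE.
savecore_command() {
    device=$(conf_value "$1" DUMPADM_DEVICE)
    savdir=$(conf_value "$1" DUMPADM_SAVDIR)
    echo "savecore -dv -f $device $savdir"
}

# Demonstration against a temporary copy of the sample configuration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
DUMPADM_DEVICE=/dev/zvol/dsk/rpool/dump
DUMPADM_SAVDIR=/var/crash/hostname
DUMPADM_CONTENT=kernel
DUMPADM_ENABLE=no
DUMPADM_CSAVE=on
EOF
savecore_command "$sample"
rm -f "$sample"
```

On a system booted from alternate media the file to read would be /a/etc/dumpadm.conf, as in step 3.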
5. Unmount the root (and other) filesystems
If you have mounted any other filesystem they will need to be unmounted before unmounting /a. For example if you had a /var mounted:
# umount /a/var
# umount /a
The Open Boot PROM (OBP) aliases in Solaris SPARC environments simplify access to hardware devices by giving them user-friendly names. They can be used in place of the full OBP hardware path at the “ok” prompt. This post discusses how to set up the OBP environment properly on SPARC systems whose root disk is mirrored with Solaris Volume Manager (SVM).
1. Identify the root disk and mirror disk
To identify the root and mirror disk used for the meta device for / (root) file system :
# df -lh /
Filesystem             size   used  avail capacity  Mounted on
/dev/md/dsk/d10         12G   5.1G   6.3G    45%    /
Look for the mirror setup and find the submirrors constituting the mirror d10.
# metastat -p d10
d10 -m d11 d12 1
d11 1 1 c0t0d0s0
d12 1 1 c0t1d0s0
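Picking the slices out of metastat -p output by eye is easy for one mirror, but a small awk filter can do it when there are many. The sketch below assumes simple one-slice submirrors of the form "dNN 1 1 slice", as in the output above; submirror_slices is a hypothetical helper name.

```shell
#!/bin/sh
# submirror_slices MIRROR
# Reads "metastat -p" style text on stdin and prints "submirror slice"
# for each submirror of MIRROR. Only one-slice submirrors are handled.
submirror_slices() {
    awk -v mirror="$1" '
        # Mirror line, e.g. "d10 -m d11 d12 1": remember d11 and d12.
        $1 == mirror && $2 == "-m" {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^d[0-9]+$/) wanted[$i] = 1
        }
        # Simple concat line, e.g. "d11 1 1 c0t0d0s0": remember the slice.
        NF == 4 && $2 == "1" && $3 == "1" {
            slice[$1] = $4
            order[++n] = $1
        }
        END {
            for (j = 1; j <= n; j++)
                if (order[j] in wanted) print order[j], slice[order[j]]
        }
    '
}

submirror_slices d10 <<'EOF'
d10 -m d11 d12 1
d11 1 1 c0t0d0s0
d12 1 1 c0t1d0s0
EOF
```

On a live system the input would come from a pipe such as metastat -p | submirror_slices d10.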
2. Identify device path of physical disk
Take the 2 disks from the metastat output and find the physical device path of each.
# ls -l /dev/dsk/c0t0d0s0
lrwxrwxrwx 1 root root 47 Dec 8 2011 /dev/dsk/c0t0d0s0 -> ../..[email protected],[email protected][email protected],0:a,raw
# ls -l /dev/dsk/c0t1d0s0
lrwxrwxrwx 1 root root 47 Dec 8 2011 /dev/dsk/c0t1d0s0 -> ../..[email protected],[email protected][email protected],0:a,raw
or at OBP
ok show-disks
a) [email protected],[email protected],1/disk
b) [email protected],[email protected]/disk
3. Setup alias at OBP
Once we know the physical paths of the disks, we can set a user-friendly alias for both the root disk and the mirror disk at the ok prompt.
ok> nvalias rootdisk [email protected],[email protected][email protected],0:a
ok> nvalias rootmirror [email protected],[email protected][email protected],0:a
4. Setup boot-device and diag-device in OBP
We also need to set boot-device to the 2 device aliases we just created.
ok> setenv boot-device rootdisk rootmirror
ok> printenv boot-device
Set diag-device as well, because if diag-switch is set, diag-device is used for booting instead of the boot-device variable.
ok> setenv diag-device rootdisk rootmirror
5. Try booting from each device alias
The final verification step is to try booting from each device alias we created. This also verifies that the system boots from both the submirrors of the SVM mirror.
ok> boot rootdisk
# init 0
ok> boot rootmirror
Like many others, I am a big fan of Live Upgrade when it comes to upgrading or patching Solaris. This post is for system admins who still want to use the traditional method of patching, for whatever reason.
The example system we will be using has SVM based mirrored root.
1. Check the health of all metadatabase replicas (metadbs) and SVM metadevices.
# metastat
# metastat -c
# metadb -i
2. Check the current boot device.
# prtconf -vp | grep -i boot
bootpath: '[email protected],0/pci15ad,[email protected][email protected],0:a'
boot-device: [email protected],0/pci15ad,[email protected][email protected],0:a disk0:a'
3. Confirm the boot disk and mirror disk from the format output.
# echo | format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c1t0d0 [email protected],0/pci15ad,[email protected][email protected],0
1. c1t1d0 [email protected],0/pci15ad,[email protected][email protected],0
Specify disk (enter its number): Specify disk (enter its number):
As seen above, the root disk is c1t0d0 and the root mirror disk is c1t1d0.
4. Backup important command outputs.
# df -h
# netstat -nrv
# ifconfig -a
# metastat -c
# metastat
# swap -l
# cat /etc/vfstab
# uname -a
# showrev -p
5. Install the bootblock on the root slice of the root mirror disk c1t1d0, to make sure it is bootable.
# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t1d0s0
6. Detach and delete the SVM mirrors.
# metadetach -f d10 d12    ## detach root mirror
# metadetach -f d20 d22    ## detach swap mirror
# metaclear d12
# metaclear d22
7. Clear the metadatabase replicas from the root mirror disk.
# metadb -d /dev/dsk/c1t1d0sX ## X is the slice on which metadb was created on root mirror disk
8. Mount the root mirror disk and replace the SVM related entries in its /etc/vfstab and /etc/system, to prevent SVM from starting when booting from the root mirror disk. Because the disk is mounted on /mnt, the files to edit are /mnt/etc/vfstab and /mnt/etc/system.
# mount /dev/dsk/c1t1d0s0 /mnt
# vi /mnt/etc/vfstab
#device             device              mount              FS       fsck  mount    mount
#to mount           to fsck             point              type     pass  at boot  options
#
fd                  -                   /dev/fd            fd       -     no       -
/proc               -                   /proc              proc     -     no       -
/dev/dsk/c1t1d0s1   -                   -                  swap     -     no       -
/dev/dsk/c1t1d0s0   /dev/rdsk/c1t1d0s0  /                  ufs      1     no       -
/devices            -                   /devices           devfs    -     no       -
sharefs             -                   /etc/dfs/sharetab  sharefs  -     no       -
ctfs                -                   /system/contract   ctfs     -     no       -
objfs               -                   /system/object     objfs    -     no       -
swap                -                   /tmp               tmpfs    -     yes      -
Remove the entries related to SVM from the /mnt/etc/system file.
# vi /mnt/etc/system
* Begin MDD root info (do not edit)
rootdev:[email protected]:0,10,blk
* End MDD root info (do not edit)
set md:mirrored_root_flag = 1
9. Confirm that the server boots from the un-encapsulated SVM root mirror disk.
# init 0
ok> boot rootmirror    (rootmirror is a devalias at OBP)
1. Boot the server into single user mode.
# init 0
ok> boot -s
2. Unzip the patchset bundle and look for the passcode.
# unzip -q sol10_Recommended.zip
# grep PASSCODE sol10_Recommended/sol10_Recommended.README
3. Install the patch cluster.
# cd sol10_Recommended
# ./installcluster --[passcode]
4. Check for any errors or warnings during the installation. If no errors are found, reboot the server into multi-user mode.
# init 6
Reattaching the mirror disk
Once we successfully install the patchset bundle and are sure to go ahead with it, we can re-mirror the root disk with the mirror.
1. Create the metadatabase replicas on the root mirror disk.
# metadb -a -c 3 c1t1d0s7
2. Re-create and attach the root and swap mirrors.
# metainit d12 1 1 c1t1d0s0
# metainit d22 1 1 c1t1d0s1
# metattach d10 d12
# metattach d20 d22
# metastat | grep -i sync    (check sync status)
If the patching gave you an error, or if applications do not work properly after patching, you can always roll back the patching and boot from the old patch level. To do this, we re-encapsulate the un-encapsulated rootmirror disk under SVM and mirror it with the original root disk. This should look like the figure below once it is completed.
1. Boot from the un-encapsulated SVM root mirror disk.
# init 0
ok> boot rootmirror
2. Copy the partition table from rootmirror disk to root disk.
# prtvtoc /dev/rdsk/c1t1d0s2 | fmthard -s - /dev/rdsk/c1t0d0s2
3. Create the state database replicas (metadbs).
# metadb -afc 3 c1t1d0s7 c1t0d0s7
4. Create the sub-mirror metadevices.
# metainit -f d11 1 1 c1t1d0s0
# metainit -f d12 1 1 c1t0d0s0
5. Create the mirror d10 for root.
# metainit d10 -m d11
# metaroot d10
6. Add the entry below to the /etc/system file; it allows the system to boot with less than or equal to half of the total metadbs.
# vi /etc/system
set md:mirrored_root_flag = 1
7. Reboot the system
# init 6
8. Attach the sub-mirror d12 to mirror d10 to sync the data.
# metattach d10 d12
9. Re-create swap under root partition.
# swap -l
swapfile             dev  swaplo  blocks    free
/dev/dsk/c1t1d0s1   30,1       8 1548280 1548280
# swap -d /dev/dsk/c1t1d0s1
# metainit d22 1 1 c1t0d0s1
# metainit d21 1 1 c1t1d0s1
# metainit d20 -m d21
# metattach d20 d22
Change the /etc/vfstab entry for the new swap from:
/dev/dsk/c1t1d0s1 - - swap - no -
to:
/dev/md/dsk/d20 - - swap - no -
Add the swap again and set it as the dump device.
# swap -a /dev/md/dsk/d20
# dumpadm -d swap
10. Set the OBP variables to reflect the new aliases
# ls -l /dev/dsk/c1t0d0s0
lrwxrwxrwx 1 root root 46 Nov 16 12:35 /dev/dsk/c1t0d0s0 -> ../..[email protected],0/pci15ad,[email protected][email protected],0:a
# ls -l /dev/dsk/c1t1d0s0
lrwxrwxrwx 1 root root 46 Nov 16 12:35 /dev/dsk/c1t1d0s0 -> ../..[email protected],0/pci15ad,[email protected][email protected],0:a
# eeprom "nvramrc=devalias rootmirror [email protected],0/pci15ad,[email protected][email protected],0:a rootdisk [email protected],0/pci15ad,[email protected][email protected],0:a"
11. Install the bootblock on the new mirror disk.
# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t0d0s0
This article describes the method used to maintain device relocation information for Solaris Volume Manager (SVM). Device relocation information is based on device IDs, and care must be taken to ensure that it is updated when a disk is replaced. Device IDs are unique identifiers branded to all disk drives and LUNs (SCSI, fibre channel, SAS, etc.). Every time a disk is incorporated into SVM, this identifier is read and stored in the replica database. This allows SVM to access the disk by its device ID, as opposed to its device name. The previous practice of accessing drives by device names, such as c2t4d3s0, allowed the possibility of a problem if these names changed during a reboot. Device ID provides a static, constant access name which persists for the life of the disk.
Updating device relocation information
1. It is possible to disable device relocation information. To verify that it is in use on your system, use the metastat command. If it is enabled, information for each disk will appear at the very end of metastat output.
# metastat
......
Device Relocation Information:
Device     Reloc  Device ID
c2t5d0     Yes    id1,[email protected]
c2t4d0     Yes    id1,[email protected]
c0t9d0     Yes    id1,[email protected]_ST39173W_SUN9.0GLMD69076000079291K9W
c0t10d0    Yes    id1,[email protected]_ST39173W_SUN9.0GLMD7772800007930HKZ1
2. After a disk replacement, the device ID of the new disk must be updated. Since SVM takes its information from Solaris, you must be sure that Solaris sees the new device. This is obvious with the replacement of fibre channel and SAS disks when you view the device name.
# ls -l (before) /dev/rdsk/c2t6d0s2 -> ../..[email protected],[email protected]/SUNW,[email protected][email protected],[email protected],0:c,raw
# ls -l (after) /dev/rdsk/c2t6d0s2 -> ../..[email protected],[email protected]/SUNW,[email protected][email protected],[email protected],0:c,raw
3. Use the metadevadm command to update device relocation information. In the example below, the device relocation information change parallels that seen in the device names above. This is an indication that the metadevadm procedure succeeded.
# metadevadm -u c2t6d0
Updating Solaris Volume Manager device relocation information for c2t6d0
Old device reloc information: id1,[email protected]
New device reloc information: id1,[email protected]
4. Device ID updates for SCSI disks are less obvious, because the device name will not change after the disk is replaced.
5. Because the device names don’t change, you can reliably verify if device relocation information has been correctly updated by examining the output of metadevadm, or by comparing current and previous output of metastat.
# metadevadm -u c0t9d0
Updating Solaris Volume Manager device relocation information for c0t9d0
Old device reloc information: id1,[email protected]_ST39173W_SUN9.0GLMD69076000079291K9W
New device reloc information: id1,[email protected]_ST39173W_SUN9.0GLMD69076000079291K9W
6. If the metadevadm command is not used, SVM will fail the disk as soon as Solaris recognizes the new device ID. This usually occurs during a reboot. The following message is logged :
Jun 22 18:22:57 host1 metadevadm: [ID 209699 daemon.error] Invalid device relocation information detected in Solaris Volume Manager
The primary cause for this error is an improper disk replacement procedure.
Most storage arrays nowadays provide dynamic LUN expansion. This feature allows you to grow an existing volume on the fly without affecting existing data or I/O. Dynamic LUN expansion increases the capacity of the physical storage. You must then make Solaris aware that the device has grown, and if a file system resides on the device, it must also be grown.
Before LUN expansion
1. Here is the disk that was used to create the SVM metadevice, as seen in the format command output :
# format
< . . . . >
97. c7t600A0B80002FBC5D00001AC952B3294Cd0 [SUN-LCSM100_F-0670 cyl 51198 alt 2 hd 128 sec 64] svm-vol
    [email protected]
format> partition
partition> print
Part      Tag    Flag     Cylinders        Size            Blocks
  2     backup    wu       0 - 51197     199.99GB    (51198/0/0) 419414016
2. Here is the sequence of commands that originally created the metadevice and UFS file system on this LUN :
# metainit d100 1 1 c7t600A0B80002FBC5D00001AC952B3294Cd0s2
# newfs /dev/md/rdsk/d100
# mkdir /svm-vol
# mount /dev/md/dsk/d100 /svm-vol
# df -k /svm-vol
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/md/dsk/d100     206532277  204809 204262146     1%    /svm-vol
After LUN expansion
In this example, 50GB are added to the existing volume on the storage array. The steps involved in adding the space in SVM are :
1. Delete the metadevice. Doing so has no effect on the data.
2. Expand the disk device.
3. Recreate the metadevice.
4. Mount and grow the file system.
Step 1: Delete the metadevice
Document the metadevice information properly and then delete the metadevice.
# metastat -p d100
d100 1 1 /dev/dsk/c7t600A0B80002FBC5D00001AC952B3294Cd0s2
# umount /svm-vol
# metaclear d100
d100: Concat/Stripe is cleared
Step 2 : Expand the disk device
# format c7t600A0B80002FBC5D00001AC952B3294Cd0
selecting c7t600A0B80002FBC5D00001AC952B3294Cd0: svm-vol
format> type
AVAILABLE DRIVE TYPES:
0. Auto configure
< . . . . >
Specify disk type (enter its number): 0
c7t600A0B80002FBC5D00001AC952B3294Cd0: configured with capacity of 249.99GB
[SUN-LCSM100_F-0670 cyl 63998 alt 2 hd 128 sec 64]
format> partition
partition> 2
Enter partition id tag[backup]: [Enter]
Enter partition permission flags[wu]: [Enter]
Enter new starting cyl: (0 was the value prior to the expansion)
Enter partition size[524271616b, 63998c, 63997e, 255992.00mb, 249.99gb]: $
partition> label
Ready to label disk, continue? yes
partition> quit
format> quit
Step 3 : Re-create the metadevice
# metainit d100 1 1 /dev/dsk/c7t600A0B80002FBC5D00001AC952B3294Cd0s2
d100: Concat/Stripe is setup
Step 4 : Mount and grow the file system
# mount /dev/md/dsk/d100 /svm-vol
# growfs -M /svm-vol /dev/md/rdsk/d100
# metastat d100
d100: Concat/Stripe
    Size: 524271616 blocks (249 GB)
    Stripe 0:
        Device                                             Start Block  Dbase  Reloc
        /dev/dsk/c7t600A0B80002FBC5D00001AC952B3294Cd0s2   0            No     Yes
Check the df -k output to confirm that the filesystem space has increased.
# df -k /svm-vol
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/md/dsk/d100     258167212  256009 255845881     1%    /svm-vol
The LUN on the storage array has been expanded. Additional space has been given to the Solaris disk device. The metadevice has been recreated to pick up the new size. The UFS file system has been grown to take advantage of the added space. The operation is complete.
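The block counts in the format and metastat output can be cross-checked from the disk geometry. The sketch below multiplies cylinders, heads and sectors, and assumes 512-byte blocks (so 2097152 blocks per GB); geometry_blocks and blocks_to_gb are hypothetical helper names.

```shell
#!/bin/sh
# geometry_blocks CYLINDERS HEADS SECTORS
# Number of blocks implied by a disk geometry.
geometry_blocks() {
    echo $(( $1 * $2 * $3 ))
}

# blocks_to_gb BLOCKS
# Whole gigabytes, rounded down, assuming 512-byte blocks.
blocks_to_gb() {
    echo $(( $1 / 2097152 ))
}

before=$(geometry_blocks 51198 128 64)   # geometry before the expansion
after=$(geometry_blocks 63998 128 64)    # geometry after the expansion
added=$(( after - before ))

echo "before: $before blocks"
echo "after:  $after blocks"
echo "added:  $added blocks ($(blocks_to_gb "$added") GB)"
```

The two results match the 419414016 and 524271616 block counts shown by format, and the difference works out to the 50 GB that was added on the array.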
There may be times when you want to boot from an unencapsulated SVM root disk by detaching one submirror. The term unencapsulation refers to taking a disk out of SVM control while retaining its contents. There can be many reasons to do this, for example patching, or a system that is not bootable under SVM. (The latter of course cannot be done in a multi-user environment and requires booting the system in single-user mode; the procedure remains the same in that case too.)
The setup where we are performing the unencapsulation is as below :
We begin with a system in which the root file system and swap are mirrored on c0t0d0 and c0t1d0. At the end of this procedure, c0t0d0 will be unchanged and will still boot under SVM control, and c0t1d0 will boot directly from slices. The two disks under a mirrored SVM configuration are :
# echo | format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t0d0 [SUN36G cyl 24620 alt 2 hd 27 sec 107]
   [email protected],[email protected][email protected],0
1. c0t1d0 [SUN36G cyl 24620 alt 2 hd 27 sec 107]
   [email protected],[email protected][email protected],0
The /etc/vfstab entries for the SVM mirrors are :
# cat /etc/vfstab
#device            device             mount    FS    fsck  mount    mount
#to mount          to fsck            point    type  pass  at boot  options
…
/dev/md/dsk/d20    -                  -        swap  -     no       -
/dev/md/dsk/d10    /dev/md/rdsk/d10   /        ufs   1     no       -
We have 6 state database replicas (3 on each disk) :
# metadb
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c0t0d0s7
     a    p  luo        8208            8192            /dev/dsk/c0t0d0s7
     a    p  luo        16400           8192            /dev/dsk/c0t0d0s7
     a    p  luo        16              8192            /dev/dsk/c0t1d0s7
     a    p  luo        8208            8192            /dev/dsk/c0t1d0s7
     a    p  luo        16400           8192            /dev/dsk/c0t1d0s7
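A quick way to confirm how the replicas are spread across the disks is to count them per device. The sketch below assumes that the device path is the last field of every replica line in metadb output; replicas_per_device is a hypothetical helper name.

```shell
#!/bin/sh
# replicas_per_device
# Reads "metadb" output on stdin and prints "count device" per device.
# Lines whose last field is not a /dev/ path (such as the header) are skipped.
replicas_per_device() {
    awk '$NF ~ /^\/dev\// { count[$NF]++ }
         END { for (d in count) print count[d], d }' | sort -k2
}

replicas_per_device <<'EOF'
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c0t0d0s7
     a    p  luo        8208            8192            /dev/dsk/c0t0d0s7
     a    p  luo        16400           8192            /dev/dsk/c0t0d0s7
     a    p  luo        16              8192            /dev/dsk/c0t1d0s7
     a    p  luo        8208            8192            /dev/dsk/c0t1d0s7
     a    p  luo        16400           8192            /dev/dsk/c0t1d0s7
EOF
```

On a live system the same function can be fed with metadb | replicas_per_device.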
The root and swap partition and their SVM mirrors :
# metastat -p
d20 -m d21 d22 1
d21 1 1 c0t0d0s1
d22 1 1 c0t1d0s1
d10 -m d11 d12 1
d11 1 1 c0t0d0s0
d12 1 1 c0t1d0s0
# df -h /
Filesystem             size   used  avail capacity  Mounted on
/dev/md/dsk/d10         29G   8.2G    20G    29%    /
# swap -l
swapfile             dev  swaplo  blocks    free
/dev/md/dsk/d20     85,20     16 8389632 8389632
Let's begin with the unencapsulation procedure.
Step 1 : Use metadetach and metaclear to detach and remove submirrors on c0t1d0. (If the submirrors have already been detached, do not repeat the metadetach commands; just use metaclear to remove them.)
# metadetach d10 d12
d10: submirror d12 is detached
# metadetach d20 d22
d20: submirror d22 is detached
# metaclear d12 d22
d12: Concat/Stripe is cleared
d22: Concat/Stripe is cleared
# metastat -p d10 d20
d10 -m d11 1
d11 1 1 c0t0d0s0
d20 -m d21 1
d21 1 1 c0t0d0s1
Step 2 : Remove state database replicas from c0t1d0. (This step is technically not necessary because replicas can exist on this disk even if it is not under SVM control. We include it here to remove all traces of SVM on c0t1d0.)
# metadb -d c0t1d0s7
# metadb
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c0t0d0s7
     a    p  luo        8208            8192            /dev/dsk/c0t0d0s7
     a    p  luo        16400           8192            /dev/dsk/c0t0d0s7
Step 3 : Mount the root file system on c0t1d0 on /mnt.
# mount /dev/dsk/c0t1d0s0 /mnt
Step 4 : Edit /mnt/etc/system to remove the following lines :
* Begin MDD root info (do not edit)
rootdev:[email protected]:0,10,blk
* End MDD root info (do not edit)
Step 5 : Edit /mnt/etc/vfstab to change entries from metadevices to Solaris slices :
Before:
/dev/md/dsk/d20 - - swap - no -
/dev/md/dsk/d10 /dev/md/rdsk/d10 / ufs 1 no -
After:
/dev/dsk/c0t1d0s1 - - swap - no -
/dev/dsk/c0t1d0s0 /dev/rdsk/c0t1d0s0 / ufs 1 no -
Step 6 : Use “init 0” and then boot from OBP to reboot the system. Because changes have been made to the root file system mounted on /mnt, this will update the boot archive for that disk, /mnt/platform/sun4u/boot_archive.
# init 0
Creating boot_archive for /mnt
updating /mnt/platform/sun4u/boot_archive
…
ok> boot mirrordisk
Refer to the post “How to identify primary and alternate boot disk” to identify the alternate root disk (mirrordisk) used above.
Step 7 : Verify that the system started from the non-mirrored disk c0t1d0s0
# df -h
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c0t1d0s0       29G   8.2G    20G    29%    /
# swap -l
swapfile             dev  swaplo  blocks    free
/dev/dsk/c0t1d0s1   32,33     16 8389632 8389632
Reestablishing the mirrors
If you want to return to the original setup of a mirrored root under SVM, here is what to do. In this example, the data on d10/d11 will be copied to d12, and the data on d20/d21 will be copied to d22. First, boot from the SVM submirror disk (aliased as rootdisk) :
ok> boot rootdisk    (I assume that this is still an SVM root mirror with one submirror active)
Create the submirrors d12 (root), d22 (swap) and attach them to the existing mirror d10 and d20 respectively :
# metainit d12 1 1 c0t1d0s0
d12: Concat/Stripe is setup
# metainit d22 1 1 c0t1d0s1
d22: Concat/Stripe is setup
# metattach d10 d12
d10: submirror d12 is attached
# metattach d20 d22
d20: submirror d22 is attached
Check if the resyncing process has started :
# metastat d10 d20 | grep Resync    (to monitor the resyncing process)
State: Resyncing
Resync in progress: 1 % done
State: Resyncing
State: Resyncing
Resync in progress: 12 % done
State: Resyncing
If the metadb was removed in step 2 then add it back with :
# metadb -a -c 3 c0t1d0s7
Why Soft Partitions ?
An obvious question in everybody’s mind – why the hell do we need the soft partitions. Well, the simple answer is – A limitation existed in that a disk can only have 8 partitions, thus limiting the number of metadevices to 8 per disk. This may have originally been an acceptable limitation, but as disk sizes increased, the restriction became unmanageable. This is where soft partitions come into play. Soft partitioning allows a disk to be subdivided into many partitions which are controlled and maintained by SVM, thereby removing the 8-metadevice limitation per disk.
Soft partition configuration information is written in two places :
– the state database replicas
– directly onto the disk, in the extent header
Interesting facts about soft partitions
– Soft partitioning was introduced in Solaris 8 with SVM product patch 108693-06.
– Soft partitions are managed with the md_sp kernel driver.
# modinfo | grep md_sp
228 78328000 4743 - 1 md_sp (Meta disk soft partition module)
Creating a soft partition
The -p option in the metainit command refers to a soft partition.
# metainit softpart -p [-e] component size
Creating Soft Partitions from a Single Disk
# metainit d100 -p -e c0t10d0 2gb
- The -e option requires that the name of the disk supplied be in the form c#t#d#.
- The last parameter (2gb) specifies the initial size of the soft partition.
- The sizes can be specified in blocks, kilobytes, megabytes, gigabytes, and terabytes.
- The -e option causes the disk to be repartitioned. One partition (other than 0) will contain enough space to hold a replica (although no replica is actually created) and slice 0 will be the remainder of the drive. The soft partition that is being created is put into slice 0.
- Further soft partitions can be created on slice 0. See the next example.
# metainit d200 -p c0t10d0s0 3gb
This creates a soft partition on the specified slice. No repartitioning of the disk is done. This soft partition starts where the previous soft partition ended, so no overlap occurs. Further soft partitions can be created as long as space is available on the drive.
A subsequent metastat (-p) will show the soft partitions, and their respective locations on the disk.
# metastat -p
d200 -p c0t10d0s0 -o 4194306 -b 6291456
d100 -p c0t10d0s0 -o 1 -b 4194304
# metastat
d200: Soft Partition
    Device: c0t10d0s0
    State: Okay
    Size: 6291456 blocks (3.0 GB)
        Device      Start Block  Dbase  Reloc
        c0t10d0s0   0            No     Yes
        Extent   Start Block   Block count
             0       4194306       6291456
d100: Soft Partition
    Device: c0t10d0s0
    State: Okay
    Size: 4194304 blocks (2.0 GB)
        Device      Start Block  Dbase  Reloc
        c0t10d0s0   0            No     Yes
        Extent   Start Block   Block count
             0             1       4194304
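The -b values in the metastat -p output are the requested sizes expressed in 512-byte blocks. The sketch below reproduces that conversion; size_to_blocks is a hypothetical helper, and the suffix spellings it accepts (kb, mb, gb) are an assumption made for illustration rather than a statement of what metainit itself parses.

```shell
#!/bin/sh
# size_to_blocks SIZE
# Converts a size such as 2gb, 512mb or 1024kb to 512-byte blocks.
size_to_blocks() {
    number=${1%[a-zA-Z][a-zA-Z]}   # strip the two-letter unit suffix
    unit=${1#"$number"}            # what was stripped is the unit
    case $unit in
        kb|KB) echo $(( number * 2 )) ;;
        mb|MB) echo $(( number * 2048 )) ;;
        gb|GB) echo $(( number * 2097152 )) ;;
        *) echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

size_to_blocks 2gb   # size requested for d100
size_to_blocks 3gb   # size requested for d200
```

The two results, 4194304 and 6291456, are the block counts reported for d100 and d200.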
Mirroring Soft partitions
The proper method to mirror soft partitions begins with creating a large mirror metadevice from slices, which you then divide into multiple soft partitions.
Create your mirror first :
# metainit d10 1 1 c0t8d0s1
# metainit d11 1 1 c0t9d0s1
# metainit d1 -m d10
# metattach d1 d11
Create soft partition from mirror :
# metainit d100 -p d1 1gb
d100: Soft Partition is setup
View results with metastat and metastat -p :
# metastat -p
d1 -m d10 d11 1
d10 1 1 c0t8d0s1
d11 1 1 c0t9d0s1
d100 -p d1 -o 32 -b 2097152
# metastat
d1: Mirror
    Submirror 0: d10
      State: Okay
    Submirror 1: d11
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 17465344 blocks (8.3 GB)
d10: Submirror of d1
    State: Okay
    Size: 17465344 blocks (8.3 GB)
    Stripe 0:
        Device     Start Block  Dbase  State  Reloc  Hot Spare
        c0t8d0s1   0            No     Okay   Yes
d11: Submirror of d1
    State: Okay
    Size: 17465344 blocks (8.3 GB)
    Stripe 0:
        Device     Start Block  Dbase  State  Reloc  Hot Spare
        c0t9d0s1   0            No     Okay   Yes
d100: Soft Partition
    Device: d1
    State: Okay
    Size: 2097152 blocks (1.0 GB)
        Extent   Start Block   Block count
             0            32       2097152
Creating Soft Partitions from a RAID-5 Metadevice
The proper method to configure soft partitions for use in a RAID-5 metadevice begins with creating a large RAID-5 metadevice from slices, which you then divide into multiple soft partitions.
Create your RAID-5 metadevice :
# metainit d5 -r c0t8d0s1 c0t9d0s1 c0t10d0s1
Create soft partitions from the RAID-5 metadevice :
# metainit d200 -p d5 1gb
d200: Soft Partition is setup
# metainit d201 -p d5 3gb
d201: Soft Partition is setup
View results with metastat and metastat -p :
# metastat -p
d5 -r c0t8d0s1 c0t9d0s1 c0t10d0s1 -k -i 32b
d200 -p d5 -o 32 -b 2097152
d201 -p d5 -o 2097216 -b 6291456
# metastat
d5: RAID
    State: Okay
    Interlace: 32 blocks
    Size: 34930688 blocks (16 GB)
Original device:
    Size: 34935360 blocks (16 GB)
        Device      Start Block  Dbase  State  Reloc  Hot Spare
        c0t8d0s1    330          No     Okay   Yes
        c0t9d0s1    330          No     Okay   Yes
        c0t10d0s1   330          No     Okay   Yes
d200: Soft Partition
    Device: d5
    State: Okay
    Size: 2097152 blocks (1.0 GB)
        Extent   Start Block   Block count
             0            32       2097152
d201: Soft Partition
    Device: d5
    State: Okay
    Size: 6291456 blocks (3.0 GB)
        Extent   Start Block   Block count
             0       2097216       6291456
What are the configuration files used in SVM ?
1. The /etc/lvm/md.tab file is empty by default. It is only used when the administrator issues the metainit command, and it is configured manually.
2. It can be populated by appending the output of metastat -p. For example: # metastat -p >> /etc/lvm/md.tab
3. It can be used to recreate all the metadevices in one go. Best used in recovery of SVM configurations.
# metainit -a     (to create all metadevices mentioned in md.tab file)
# metainit dxx    (create metadevice dxx only)
4. DO NOT use it on the root file system though.
SVM uses the configuration file /etc/lvm/mddb.cf to store the locations of the state database replicas. Do not edit this file manually.
The configuration file /etc/lvm/md.cf contains the automatically generated configuration information for the default (unspecified or local) disk set.
This file can also be used to recover the SVM configuration if your system loses the information maintained in the state database.
Again do not edit this file manually.
The configuration file md.conf contains fields like nmd (i.e. number of volumes (metadevices) that the configuration supports) etc. The file can be edited to change the default values for various such parameters.
The RC script configures and starts SVM at boot and can be used to start/stop the daemons.
The RC script checks the SVM configuration at boot, starts the sync of mirrors if necessary and starts the active monitoring daemon (mdmonitord).
If one of the root disks under mirrored SVM fails and you have to reboot the system, would the system reboot without any error ?
This is one of the most common questions asked about SVM. If any state database replica (metadb) is lost, SVM determines the valid state database replicas by using the majority consensus algorithm. According to the algorithm, at least (half + 1) of the replicas must be available at boot time before any of them is considered valid. So in our case, if we had 6 metadbs in total (3 on each disk) and lost one disk, we would need at least 4 metadbs to boot the system successfully, but only 3 remain. Hence we can’t boot the system.
To avoid this, we need to add one entry to the /etc/system file to bypass the majority consensus algorithm. This enables us to boot from a single disk, which may be required in production, for example when patching the system. The entry is :
set md:mirrored_root_flag = 1
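The replica arithmetic above can be checked with a small shell sketch. It only models the (half + 1) rule as stated in this post; the function names replicas_needed and can_boot are made up for illustration and are not SVM commands, and the effect of mirrored_root_flag is not modelled.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SVM): model the (half + 1) rule only.

# Number of replicas that must be available at boot: floor(total / 2) + 1.
replicas_needed() {
    total=$1
    echo $(( total / 2 + 1 ))
}

# Prints "yes" if the surviving replicas reach the majority, otherwise "no".
can_boot() {
    total=$1
    surviving=$2
    if [ "$surviving" -ge "$(replicas_needed "$total")" ]; then
        echo yes
    else
        echo no
    fi
}

# 6 replicas in total (3 per disk); one disk fails, so 3 survive.
echo "replicas needed: $(replicas_needed 6)"   # replicas needed: 4
echo "boot with 3 of 6: $(can_boot 6 3)"       # boot with 3 of 6: no
```

With 3 replicas on each of two disks, losing either disk always leaves exactly half, which is why the /etc/system entry matters for a two-disk root mirror.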
How to create different RAID layouts in SVM ?
RAID 0 (stripe and concatenation)
1. Creating a concatenation from slice S2 of 3 disks :
# metainit d1 3 1 c0t1d0s2 1 c1t1d0s2 1 c2t1d0s2
d1 - the metadevice
3 - the number of components to concatenate together
1 - the number of devices for each component.
2. Creating a stripe from slice S2 of 3 disks :
# metainit d2 1 3 c0t1d0s2 c1t1d0s2 c2t1d0s2 -i 16k
d2 - the metadevice
1 - the number of components to concatenate
3 - the number of devices in each stripe.
-i 16k - the stripe segment size.
3. Creating three 2-disk stripes and concatenating them together :
# metainit d3 3 2 c0t1d0s2 c1t1d0s2 -i 16k 2 c3t1d0s2 c4t1d0s2 -i 16k 2 c6t1d0s2 c7t1d0s2 -i 16k
d3 - the metadevice
3 - the number of stripes
2 - the number of disks (slices) in each stripe
-i 16k - the stripe segment size.
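The argument order of metainit for RAID 0 is easy to get wrong, so here is a minimal dry-run sketch that only prints the command lines for the first two layouts above. The helper names concat_cmd and stripe_cmd are hypothetical; nothing is executed against SVM.

```shell
#!/bin/sh
# Hypothetical dry-run helpers: they only print metainit command lines.

# Concatenation: N components, each made of a single slice ("1 <slice>").
concat_cmd() {
    name=$1
    shift
    count=$#
    args=""
    for slice in "$@"; do
        args="$args 1 $slice"
    done
    echo "metainit $name $count$args"
}

# One stripe across all given slices, with the given interlace size.
stripe_cmd() {
    name=$1
    interlace=$2
    shift 2
    echo "metainit $name 1 $# $* -i $interlace"
}

concat_cmd d1 c0t1d0s2 c1t1d0s2 c2t1d0s2
# metainit d1 3 1 c0t1d0s2 1 c1t1d0s2 1 c2t1d0s2
stripe_cmd d2 16k c0t1d0s2 c1t1d0s2 c2t1d0s2
# metainit d2 1 3 c0t1d0s2 c1t1d0s2 c2t1d0s2 -i 16k
```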
How to create a mirrored (RAID 1) layout in SVM ?
In SVM mirroring is a 2 step procedure – create the 2 sub-mirrors (d11 and d12) first and associate them with the mirror (d10).
# metainit -f d11 1 1 c0t3d0s7
# metainit -f d12 1 1 c0t4d0s7
# metainit d10 -m d11
# metattach d10 d12
Here d10 is the device to mount, and d11 and d12 hold the 2 copies of the data.
How to create a RAID 5 layout in SVM ?
To set up a RAID 5 volume using 3 disks :
# metainit d1 -r c0t1d0s2 c1t1d0s2 c2t1d0s2 -i 16k
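As a rough sizing aid: RAID 5 spends one component's worth of space on parity, so the usable capacity is about (n - 1) times the smallest component. The sketch below applies that general rule; raid5_usable is a hypothetical helper, the sizes are made-up example values, and any SVM metadata overhead is ignored.

```shell
#!/bin/sh
# Hypothetical helper: approximate usable capacity of a RAID 5 volume.
# Arguments are component sizes, all in the same unit (e.g. GB).
raid5_usable() {
    if [ "$#" -lt 3 ]; then
        echo "RAID 5 needs at least 3 components" >&2
        return 1
    fi
    count=$#
    smallest=$1
    for size in "$@"; do
        if [ "$size" -lt "$smallest" ]; then
            smallest=$size
        fi
    done
    # One component's worth of space goes to parity.
    echo $(( (count - 1) * smallest ))
}

raid5_usable 10 10 10   # 20
raid5_usable 10 8 12    # 16
```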
How to remove a metadevice ?
A metadevice can be removed only if it is not open (i.e. not mounted):
# metaclear d3
To delete all the metadevices (use it carefully as it blows away entire SVM configuration):
# metaclear -a -f
How to view the SVM configuration and status of metadevices ?
To view the entire SVM configuration of all the metadevices in md.tab format :
# metastat -p
To check the configuration and status of a particular device :
# metastat d3
Another command to view the SVM configuration, in concise form, is :
# metastat -c
How to extend a metadevice ?
To grow a metadevice we need to attach a slice to the end and then grow the underlying filesystem:
# metattach d1 c3t1d0s2
If the metadevice is not mounted :
# growfs /dev/md/rdsk/d1
If the metadevice is mounted :
# growfs -M /export/home /dev/md/rdsk/d1
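The mounted / not mounted choice above can be captured in a small dry-run sketch. grow_cmds is a hypothetical helper that only prints the commands from this section in order; pass the mount point as the third argument when the file system is mounted, and leave it out otherwise.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the commands, does not run them.
# usage: grow_cmds <metadevice> <new_slice> [mount_point]
grow_cmds() {
    metadevice=$1
    new_slice=$2
    mount_point=${3:-}
    # Step 1: attach the new slice to the end of the metadevice.
    echo "metattach $metadevice $new_slice"
    # Step 2: grow the file system; -M is used only when it is mounted.
    if [ -n "$mount_point" ]; then
        echo "growfs -M $mount_point /dev/md/rdsk/$metadevice"
    else
        echo "growfs /dev/md/rdsk/$metadevice"
    fi
}

grow_cmds d1 c3t1d0s2 /export/home
# metattach d1 c3t1d0s2
# growfs -M /export/home /dev/md/rdsk/d1
```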
How to create metasets in SVM ?
The example below has 2 nodes (node01 and node02) with 2 shared disks assigned to both.
Create the local state database replicas first (a disk set cannot be created without them) :
# metadb -afc 3 c0d0s7
Create the disk set by adding the hosts to it. On node01 :
# metaset -s [disk_set] -a -h node01 node02
Take ownership of the disk set :
# metaset -s [disk_set] -t -f
-t --> for taking ownership
-f --> forcefully
Add disks to the disk set :
# metaset -s [disk_set] -a /dev/did/rdsk/c15t0d0 /dev/did/rdsk/c15t1d0
Create the volumes in the diskset :
# metainit -s [disk_set] d11 1 1 c15t0d0
[disk_set]/d11: Concat/Stripe is setup
# metastat -s [disk_set]
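To see the whole sequence in one place, here is a dry-run sketch that prints the disk set commands from this section in order. diskset_cmds is a hypothetical helper and the set name nfs1 in the usage line is only an example; the sketch uses just the options shown above and executes nothing.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the disk set commands in order.
# usage: diskset_cmds <set> <volume> <host1> <host2> <disk1> <disk2>
diskset_cmds() {
    set_name=$1
    volume=$2
    host1=$3
    host2=$4
    disk1=$5
    disk2=$6
    echo "metaset -s $set_name -a -h $host1 $host2"          # add hosts
    echo "metaset -s $set_name -t -f"                        # take ownership
    echo "metaset -s $set_name -a $disk1 $disk2"             # add disks
    echo "metainit -s $set_name $volume 1 1 ${disk1##*/}"    # create volume
    echo "metastat -s $set_name"                             # verify
}

diskset_cmds nfs1 d11 node01 node02 /dev/did/rdsk/c15t0d0 /dev/did/rdsk/c15t1d0
```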
How to grow a concat metadevice ?
In the example shown below, the concat metadevice d80 is configured using the slice c1t3d0s0 of size 1 GB. The high level steps to grow this metadevice are :
1. Unmount the file system on the metadevice, if any.
2. Increase the size of the disk partition used by the metadevice.
3. Recreate the metadevice.
4. Grow the file system.
# metastat d80
d80: Concat/Stripe
    Size: 2104515 blocks (1.0 GB)    === 1gb of size
    Stripe 0:
        Device     Start Block  Dbase   Reloc
        c1t3d0s0          0     No      Yes
Increase size of disk partition
We would increase the size of partition 0 on disk c1t3d0 to around 1.5 GB. Check the prtvtoc command output for the increased space :
# prtvtoc /dev/rdsk/c1t3d0s0
* /dev/rdsk/c1t3d0s0 partition map
* ....(output truncated for brevity)
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      0    00     417690   3148740   3566430    === Size increased to 1.5gb (3148740 sectors)
.....
Note that at this point the metastat output still shows the old size (1 GB) for metadevice d80.
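The block and sector counts in these outputs can be converted to sizes with a little arithmetic. The sketch below assumes 512-byte sectors (an assumption, though it agrees with the 1537.5MB that growfs reports further down); sectors_to_mb is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: convert a sector count to MB (1 MB = 1048576 bytes),
# assuming 512-byte sectors.
sectors_to_mb() {
    awk -v s="$1" 'BEGIN { printf "%.1f\n", s * 512 / 1048576 }'
}

sectors_to_mb 2104515   # 1027.6  (the original ~1.0 GB metadevice)
sectors_to_mb 3148740   # 1537.5  (the enlarged ~1.5 GB partition)
```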
Recreate the metadevice
Now, to reflect the change in size for metadevice d80, we have to recreate it. Make sure the file system using this metadevice is unmounted before recreating the metadevice.
# metaclear -r d80
d80: Concat/Stripe is cleared
# metainit d80 1 1 c1t3d0s0
d80: Concat/Stripe is setup
Verify the change in size :
# metastat -c
d80 s 1.5GB c1t3d0s0    === 1.5gb of size
Growing the UFS file system
The final step is to grow the file system. The -M option of growfs expects the file system to be mounted, so mount it on /data first :
# growfs -M /data /dev/md/rdsk/d80
/dev/md/rdsk/d80: 3148740 sectors in 209 cylinders of 240 tracks, 63 sectors
        1537.5MB in 35 cyl groups (6 c/g, 44.30MB/g, 10688 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 90816, 181600, 272384, 363168, 453952, 544736, 635520, 726304, 817088,
 2269632, 2360416, 2451200, 2541984, 2632768, 2723552, 2814336, 2905120, 2995904, 3086688
With the file system mounted, verify its new size :
# df -h /data
Filesystem             size   used  avail capacity  Mounted on
/dev/md/dsk/d80        1.5G    18M   1.4G     2%    /data