ZFS Boot in Solaris

26 November 2008

http://blogs.sun.com/storage/entry/zfs_boot_in_solaris_10

Quick ZFS Overview
– Storage devices are grouped into pools
– Pools have redundancy and robustness features (mirroring, RAID-Z)
– Datasets (file systems and volumes) are allocated from within the pool (no longer associated with disk slices)
– Copy-on-write allows for fast snapshots and clones of datasets (clones are writable snapshots)
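
For illustration, here is a minimal command sketch of those concepts (the pool name "tank", the dataset names, and the device names are all hypothetical):

# zpool create tank mirror c0t0d0 c0t1d0     (create a mirrored pool from two disks)
# zfs create tank/home                       (a file system that draws space from the pool)
# zfs snapshot tank/home@monday              (instant, copy-on-write snapshot)
# zfs clone tank/home@monday tank/home_test  (writable clone of the snapshot)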


ZFS as a Root File System
– There is a benefit to having only one file system type to understand and manage (assuming ZFS is already in use for data)
– ZFS's features make it an excellent root file system with many management advantages.
– At least for Solaris, it's the coming thing. New installation and management features will depend on it.

ZFS Features that Matter (for Root File Systems)
– Pooled storage – no need to preallocate volumes. File systems only use as much space as they need.
– Built-in redundancy capabilities (such as mirroring) at the pool level
– Unparalleled data integrity features. On-disk consistency is always maintained – no fsck.
– Snapshots and clones (writable snapshots) are instantaneous, nearly free, persistent, and unlimited in size and number (except by the size of the pool)
– ZFS volumes (zvols) can be used for in-pool swap and dump areas (no need for a swap/dump slice). One pool does it all.

Storage Layout for System Software with Traditional File Systems

Disk 1                      Disk 2
Root A       --mirror-->    Root A
Root B       --mirror-->    Root B
Swap/Dump    --mirror-->    Swap/Dump
/export      --mirror-->    /export

Storage Layout for System Software with ZFS Storage Pool

Disk 1                      Disk 2
Pool "tank"  --mirror-->    Pool "tank"
  Root A                      Root A
  Root B                      Root B
  Swap/Dump                   Swap/Dump
  /export                     /export

A short Primer on Booting Solaris

Three Phases

PROM --> BOOTER --> KERNEL

Booting Solaris – PROM phase

– The PROM (BIOS on x86, OpenBoot PROM on SPARC) identifies a boot device.
– The PROM loads and executes a booter from the boot device.

Booting Solaris – Booter phase

– The booter selects a root file system
– The booter loads one or more files from the root file system into memory and executes one of them. The executable file is either part of the Solaris kernel, or a program that knows how to load the Solaris kernel.

Booting Solaris – Kernel phase

– The kernel uses the I/O facilities provided by the booter to load the necessary kernel modules and files (drivers, file systems, and some control files) in order to do its own I/O and mount the root file system.
– The root file system is mounted and system initialization is performed.

Booting from ZFS – PROM phase

– At the PROM stage, booting ZFS is essentially the same as booting any other file system type.
– The boot device identifies a storage pool, not a root file system
– At this time, the booter that gets loaded is GRUB 0.95 on x86 platforms and a standalone ZFS reader on SPARC platforms.

Booting from ZFS – Booter phase

– With ZFS, there is no one-to-one correspondence between boot device and root file system. A boot device identifies a storage pool, not a file system. Storage pools can contain multiple root file systems.
– Thus, the booter phase must have a way to select among the available root file systems in the pool.
– The booter must have a way of identifying the default root file system to be booted, and also must provide a way for a user to override the default.

Booting from ZFS – Booter phase, Root File System Selection

– Root pools have a "bootfs" property that identifies the default root file system.
– We need a control file that lists all of the available root file systems, but in which file system do we store it? (We don't want to keep it in any particular root file system.)
– Answer: keep it in the "pool dataset", which is the dataset at the root of the dataset hierarchy. There's only one of them per pool and it's guaranteed to be there.

In ZFS, datasets form a hierarchy. Every pool has a dataset with the same name as the pool; it sits at the top of the hierarchy, is always there, is the only one of its kind, and can't be deleted.
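
The default root file system can be inspected and changed through that pool property; for example (the pool and BE names here are hypothetical):

# zpool get bootfs rpool
# zpool set bootfs=rpool/ROOT/S10_U6 rpool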

Booting from ZFS – Booter phase, Root File System Selection – x86
– On x86 platforms, the GRUB menu provides a way to list alternate root file systems.
– One of the GRUB menu entries is designated as the default.
– This default entry (or any other, for that matter) can be set up to mount the pool's default root file system (indicated by the pool's "bootfs" property).
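
An illustrative menu.lst entry for a ZFS root, in the style Solaris 10 U6 generates (the title and pool name are hypothetical; $ZFS-BOOTFS expands to the pool's default root file system):

title Solaris 10 10/08 (ZFS root)
findroot (pool_rpool,0,a)
kernel$ /platform/i86pc/multiboot -B $ZFS-BOOTFS
module /platform/i86pc/boot_archive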

Booting from ZFS – Booter phase, Root File System Selection – SPARC
– On SPARC platforms, a control file (/<rootpool>/boot/menu.lst) lists the available root file systems.
– A simple "boot" or "boot disk" command at the OBP (OpenBoot PROM) prompt will boot whatever root file system is identified by the "bootfs" pool property.
– The booter has a -L option which lists the bootable datasets on the disk being booted.
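
For example, at the OBP prompt (the dataset name is hypothetical):

ok boot -L                      (list the bootable datasets in the root pool)
ok boot -Z rpool/ROOT/S10_U6    (boot a specific dataset instead of the default)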

Booting from ZFS – Kernel Phase
– The booter passes (1) the device identifier of the boot device, and (2) the name and type of the root file system as arguments to the kernel.
– Because the root file system is ZFS, the ZFS file system module is loaded and its "mountroot" function is called.
– The ZFS mountroot function reads the pool metadata from the boot device, initializes the pool, and mounts the designated dataset as root.

BOOT ENVIRONMENTS
– A boot environment is a root file system, plus all of its subordinate file systems (i.e., the file systems that are mounted under it).
– There is a one-to-one correspondence between boot environments and root file systems.
– A boot environment (sometimes abbreviated as a BE) is a fundamental object in Solaris system software management.

Using Boot Environments

– There can be multiple boot environments on a system, varying by version, patch level, or configuration.
– Boot environments can be related (for example, one BE might be a modified copy of another BE).
– Multiple BEs allow for safe application and testing of configuration changes.

The «Clone and Modify» Model of System Updates

– In-place updates of boot environments can be risky and time-consuming. A safer model is to do the following:
-> Make a new boot environment which is a clone of the currently active boot environment.
-> Update the clone (upgrade, patch, or reconfigure).
-> Boot the updated clone BE.
-> If the clone is acceptable, make it the new active BE; if not, leave the old one active.

«CLONE AND MODIFY» TOOLS

– Solaris supports a set of tools called "LiveUpgrade", which clone boot environments for the purpose of safe upgrades and patching.
– New install technology under development will support this as well.
– ZFS is ideally suited to making "clone and modify" fast, easy, and space-efficient. Both "clone and modify" tools will work much better if your root file system is ZFS. (The new install tool will require it for some features.)

Clone and Modify with Traditional File Systems

Root A
Clone of Root A   <- a full copy into its own preallocated slice; then upgrade this root file system
Swap/Dump
/export

Clone and Modify with a ZFS Storage Pool

Pool "tank" (same pool throughout):

Initial State:        After Clone:             After Upgrade:
  Root A                Root A                   Root A
  Swap+Dump             Clone of Root A          Upgraded Root A
  /export               Swap+Dump                Swap+Dump
                        /export                  /export

Boot Environment Management with ZFS
– Boot environments can be composed of multiple datasets, with exactly one root file system.
– Regardless of how many datasets compose the boot environment, the "clone and modify" tools will treat the boot environment as a single manageable object.

THE ZFS "SAFE" UPGRADE
The low-risk, almost-no-downtime system upgrade (using LiveUpgrade):
# lucreate -n S10_U6
# luupgrade -u -n S10_U6 -s /cdrom/Solaris_10_U6
# luactivate S10_U6
[ reboot ]

What Happens During the ZFS "Safe" Upgrade

lucreate
– Takes a ZFS snapshot of the datasets in the current boot environment, and then clones them to create writable copies (see the sketch below).
– Requires almost no additional disk space and occurs almost instantaneously (because ZFS cloning works by copy-on-write).
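
Conceptually, the cloning step corresponds to a ZFS snapshot plus clone, roughly like the following (the dataset names are hypothetical, and these are not the literal commands lucreate runs):

# zfs snapshot rpool/ROOT/S10_U5@S10_U6
# zfs clone rpool/ROOT/S10_U5@S10_U6 rpool/ROOT/S10_U6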

What Happens During the ZFS "Safe" Upgrade

luupgrade
– The system remains "live" (still running the original boot) during the upgrade of the clone.
– The upgrade gradually increases the amount of disk space used as copy-on-write takes place. New space is required only for files that are modified by the upgrade.

The system remains live while the clone is upgraded, but the upgrade still takes a significant amount of time because of the package installation work.

luactivate
– Makes the specified boot environment the new active BE. Both the old and the new BE are available from the boot menu (but the new one is the default).
<reboot>
– User can select either the old or the new BE. If the new BE fails for some reason, the system can be booted from the old BE.
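
One practical note: luactivate asks you to reboot with init or shutdown rather than reboot or halt, so that its boot-configuration steps run. For example:

# luactivate S10_U6
# init 6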

ludelete
– At some point, the old BE can be destroyed.

Boot Environment Management with ZFS

– Boot environments can be composed of multiple datasets.
– By default, all of Solaris is installed into one dataset. Any optional directories placed under root (such as a /zoneroots directory, for example) will typically be in their own datasets.
– The /var directory can optionally be placed in its own dataset (to prevent denial-of-service attacks that work by filling up root).
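
An illustrative root-pool dataset layout under these defaults (the BE name is hypothetical):

rpool
rpool/ROOT
rpool/ROOT/S10_U6        <- the boot environment's root file system
rpool/ROOT/S10_U6/var    <- optional separate /var dataset
rpool/dump
rpool/export
rpool/swap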

Swap and Dump support
– Swap and dump areas are (by default) zvols in the root pool.
– It's still possible to set up swap and dump areas on disk slices. Some environments (such as those where the root pool is stored on compact flash) might need this.
– Swap and dump require two separate zvols (they can't share the space as they can with slices).
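
For example (the volume sizes here are hypothetical):

# zfs create -V 2G rpool/swap
# swap -a /dev/zvol/dsk/rpool/swap
# zfs create -V 1G rpool/dump
# dumpadm -d /dev/zvol/dsk/rpool/dump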

ZFS Boot Limitations
– Currently, root pools can only be n-way mirrors (no striping or RAID-Z). We hope to relax this restriction in the next release.
– On Solaris, root pools cannot have EFI labels (the boot firmware doesn't support booting from them).

Migration from UFS to ZFS
– The system must be running a version of Solaris that supports ZFS root (S10U6, or Nevada build 90 or later).
– Create a pool (mirrored only) in some available storage.
– Use lucreate to clone one of the UFS boot environments into the ZFS root pool.
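
A minimal migration sketch (the device names and BE name are hypothetical; note the s0 slices, since root pools can't use EFI-labeled whole disks):

# zpool create rpool mirror c0t0d0s0 c0t1d0s0
# lucreate -n zfsBE -p rpool
# luactivate zfsBE
[ reboot ]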

OpenSolaris install project (Caiman):
opensolaris.org/os/project/caiman

Further Information – ZFS Boot page:
http://www.opensolaris.org/os/community/zfs/boot/