Adrenalin’s Experience

Software raid 5 under FreeBSD 7

Posted in freebsd, raid by Adrenalin on February 1, 2008

Hi, this is my first post here. I want to share what I found while testing software RAID 5 on FreeBSD 7 beta.
FreeBSD 7 is not released yet, and a lot of things are still “experimental”.

My setup:

3 x 500 GB HDDs
Intel quad-core CPU
FreeBSD 7 BETA4 and FreeBSD 7 RC1
amd64 kernel

Tested individually, each drive can write and read at about 70 MB/s.

I found three ways to build software RAID 5 under FreeBSD 7. The speed of two of them was already tested by Michael from Mindmix in his benchmark of geom raid 5, geom raid 3 and ZFS raidz.

Unfortunately, all three solutions are experimental ;o) : geom vinum RAID 5, geom raid 5 and ZFS raidz.

All my “mini-benchmarks” were done with dd, so only sequential writes and reads were tested.

For write: dd if=/dev/zero of=/tank/test bs=1m count=4000

For read: dd if=/tank/test of=/dev/null bs=1m
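To repeat both runs in one go, here is a minimal sketch of a wrapper. `seq_bench` is a hypothetical helper name, not a system tool. It uses `bs=1048576` instead of `bs=1m` because GNU dd on Linux rejects the lowercase `m` suffix that FreeBSD dd accepts, and it keeps only dd's last line, the summary.

```shell
#!/bin/sh
# seq_bench FILE [COUNT]: hypothetical helper repeating the two dd runs above.
# Writes COUNT blocks of 1 MiB from /dev/zero to FILE, then reads FILE back.
# bs=1048576 is 1 MiB spelled out in bytes, so it works with FreeBSD and GNU dd.
seq_bench() {
    file=$1
    count=${2:-4000}   # 4000 MiB by default, as in the original test

    echo "write:"
    # dd prints its statistics on stderr; keep only the final summary line.
    dd if=/dev/zero of="$file" bs=1048576 count="$count" 2>&1 | tail -n 1

    echo "read:"
    dd if="$file" of=/dev/null bs=1048576 2>&1 | tail -n 1
}
```

`seq_bench /tank/test` runs the same 4000 MiB test. Keep the file larger than RAM, otherwise the read may be served from cache; and if compression is enabled on the dataset, zeros compress very well and the numbers will look better than the disks really are.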

Geom vinum (gvinum) Raid 5

My experience with gvinum RAID 5 was pretty bad. I had exactly the same problem as this guy; it seems unfinished. There are some patches, but after applying them to the sources I couldn’t get it to compile.

According to the Handbook:

Starting with FreeBSD 5, Vinum has been rewritten in order to fit into the GEOM architecture (Chapter 19), retaining the original ideas, terminology, and on-disk metadata. This rewrite is called gvinum (for GEOM vinum).

How to set up gvinum RAID 5? First see the gvinum page in the Handbook, then see this, and this, and this.

Performance was strange: only ~10 MB/s write, ~120 MB/s read.

Maybe it is better with the patches. I filed a bug report and sent an email to the current maintainer; hopefully I’ll get an answer.

Update: I got an answer; I need to retry compiling.
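For reference, a gvinum RAID 5 volume is described by a vinum-style configuration file: the drives, a volume, a plex with `org raid5`, and one subdisk per drive. The sketch below only prints such a file. The helper name `gvinum_raid5_conf`, the volume name and the 512k stripe size are my assumptions, and `length 0` is meant to use all free space on each drive as described for vinum; check gvinum(8) on your release before relying on it.

```shell
#!/bin/sh
# gvinum_raid5_conf VOLUME DISK...: hypothetical helper that prints a
# vinum-style config for a RAID 5 volume built from whole disks.
# Assumptions: 512k stripe size, "length 0" = use all free space on the drive.
gvinum_raid5_conf() {
    vol=$1
    shift
    i=1
    for disk in "$@"; do
        echo "drive d$i device /dev/$disk"   # one drive entry per disk
        i=$((i + 1))
    done
    echo "volume $vol"
    echo "  plex org raid5 512k"
    i=1
    for _ in "$@"; do
        echo "    sd length 0 drive d$i"    # one subdisk per drive
        i=$((i + 1))
    done
}
```

Usage would be `gvinum_raid5_conf raid5 ad12 ad14 ad16 > /tmp/raid5.conf`, then `gvinum create /tmp/raid5.conf` and `newfs /dev/gvinum/raid5`. Those last two steps destroy whatever is on the disks.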

Geom raid 5

geom raid 5 is not part of the FreeBSD distribution, and because of some errors I didn’t manage to compile it from source.

How to set up geom raid 5? There is also a Wikipedia article on geom raid 5. Maybe you’ll be luckier.

ZFS RAIDZ

It’s one of the jewels of FreeBSD 7 :)

It’s a completely new file system that comes from Solaris.

Creating a raidz pool from 3 drives is as easy as this:

zpool create tank raidz ad12 ad14 ad16

ZFS also mounts the pool for you, so you can use /tank/ straight away.

It’s the only one of the three that supports hot spares. That kicks ass! %)
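Adding a hot spare and checking the result takes a couple of standard zpool commands. This is a dry-run sketch: the hypothetical `run` helper only prints each command unless `DRY_RUN=0` is set, and `ad18` is a hypothetical fourth disk that is not part of my setup. On FreeBSD 7 a spare may not be activated automatically; `zpool replace tank ad12 ad18` would swap it in by hand.

```shell
#!/bin/sh
# run CMD...: print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    fi
}

# add_hot_spare POOL DISK: hypothetical helper, DISK is an unused extra drive.
add_hot_spare() {
    pool=$1
    spare=$2
    run zpool add "$pool" spare "$spare"   # register DISK as a hot spare
    run zpool status "$pool"               # the spare is listed under "spares"
    run zfs list "$pool"                   # used/available space of the pool
}
```

`add_hot_spare tank ad18` prints the three commands; `DRY_RUN=0 add_hot_spare tank ad18` runs them.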

According to the man page, it also supports “double parity”. That’s amazing!

raidz     A  variation on RAID-5 that allows for better distribution of
raidz1    parity and eliminates the “RAID-5 write hole” (in which  data
raidz2    and  parity become inconsistent after a power loss). Data and
          parity is striped across all disks within a raidz group.

          A raidz group can have either single- or double-parity, mean-
          ing  that  the  raidz  group  can sustain one or two failures
          respectively without losing any data. The  raidz1  vdev  type
          specifies  a  single-parity  raidz  group and the raidz2 vdev
          type specifies a double-parity raidz group.  The  raidz  vdev
          type is an alias for raidz1.

          A  raidz group with N disks of size X with P parity disks can
          hold approximately (N-P)*X bytes and can withstand one device
          failing  before  data  integrity  is compromised. The minimum
          number of devices in a raidz group is one more than the  num-
          ber  of parity disks. The recommended number is between 3 and
          9.

spare     A special pseudo-vdev which  keeps  track  of  available  hot
          spares for a pool. For more information, see the “Hot Spares”
          section.
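Applied to my three 500 GB drives, the man page’s (N-P)*X rule gives about (3-1)*500 = 1000 GB for raidz1, while raidz2 on the same three drives would leave only 500 GB. A small sketch of that arithmetic; `raidz_usable` is a hypothetical helper, it ignores metadata overhead (so real free space will be somewhat lower), and it enforces the man page’s minimum of P+1 devices.

```shell
#!/bin/sh
# raidz_usable N X P: approximate usable space of a raidz group with
# N disks of size X (any unit) and P parity disks, per the man page: (N-P)*X.
raidz_usable() {
    n=$1 x=$2 p=$3
    # The man page requires at least one more device than parity disks.
    if [ "$n" -lt $((p + 1)) ]; then
        echo "need at least $((p + 1)) devices for P=$p" >&2
        return 1
    fi
    echo $(( (n - p) * x ))
}
```

`raidz_usable 3 500 1` prints 1000 and `raidz_usable 3 500 2` prints 500.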

Write:
dd if=/dev/zero of=/tank/test bs=1m count=4000
4194304000 bytes transferred in 34.494573 secs (121593156 bytes/sec)
Read:
dd if=/tank/test of=/dev/null bs=1m
4194304000 bytes transferred in 27.663889 secs (151616572 bytes/sec)

Performance was fair enough: about 116 MiB/s write and 145 MiB/s read.
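These figures are dd’s bytes/sec value divided by 1,048,576 (1 MiB). A minimal sketch of that conversion, assuming the FreeBSD dd summary format shown above; `dd_rate_mib` is a hypothetical helper, and GNU dd prints a different summary line that it will not parse.

```shell
#!/bin/sh
# dd_rate_mib LINE: extract the "(N bytes/sec)" figure from a FreeBSD dd
# summary line and print it in MiB/s (1 MiB = 1048576 bytes), one decimal.
dd_rate_mib() {
    bps=$(printf '%s\n' "$1" | sed -n 's/.*(\([0-9]*\) bytes\/sec).*/\1/p')
    [ -n "$bps" ] || return 1   # not a FreeBSD-style summary line
    awk -v b="$bps" 'BEGIN { printf "%.1f\n", b / 1048576 }'
}
```

For the write line above this prints 116.0, and for the read line 144.6.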

I need to do more testing to see how it recovers when one of the drives is wiped.

For now I’ve settled on ZFS: its RAID 5 (raidz, as they call it) seems more mature than the others.

Update: the mini crash test passed without any problem. I took a drive offline, overwrote 4 GB of it with /dev/zero, brought it back online and ran zpool scrub tank. The check finished pretty fast, in 1-2 minutes, probably because there was not much data in the pool. By comparison, when I ran a similar test with gvinum RAID 5, the parity rebuild took several hours, even though there was not much data there either.

  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed with 0 errors on Fri Feb  1 04:34:07 2008
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad12    ONLINE       0     0   714
            ad14    ONLINE       0     0     0
            ad16    ONLINE       0     0     0

errors: No known data errors
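The crash test above, written as a dry-run sketch. `crash_test` and `run` are hypothetical helpers; with the default `DRY_RUN=1` they only print the commands. Setting `DRY_RUN=0` really overwrites the first 4000 MiB of the disk, so only do that on a test pool. ad12 is the disk showing the checksum errors in the status output above.

```shell
#!/bin/sh
# run CMD...: print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    fi
}

# crash_test POOL DISK: hypothetical helper reproducing the manual test.
crash_test() {
    pool=$1
    disk=$2
    run zpool offline "$pool" "$disk"                     # take one member out
    run dd if=/dev/zero of="/dev/$disk" bs=1m count=4000  # wipe ~4 GB (FreeBSD dd syntax)
    run zpool online "$pool" "$disk"                      # bring it back
    run zpool scrub "$pool"                               # check all data, repair from parity
    run zpool status "$pool"                              # scrub result and CKSUM counters
}
```

Once the scrub has repaired everything, `zpool clear tank` resets the error counters, as the status message suggests.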

Some people report losing data with ZFS. If your data is worth anything, an additional backup, preferably offsite, is a must; nothing is perfect.

Some meditation

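A sequential read of more than 140 MB/s from RAID 5 is really great. It’s like having RAID 0 with redundancy: you lose some space, but not as much as with RAID 1 %) Here is a good summary of what RAID 5 is.

Since the blocks are striped across the drives, it makes sense that reads are faster than from a single drive: with three drives of ~70 MB/s each and one block of every stripe holding parity, sequential reads can reach roughly (3 - 1) x 70 = 140 MB/s, in line with what I measured.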
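The redundancy comes from parity: in each stripe the parity block is the XOR of the data blocks, so any single missing block equals the XOR of the remaining ones. A toy sketch with one byte per “block”; the values are arbitrary, and raidz’s real on-disk layout differs in detail.

```shell
#!/bin/sh
# Toy RAID 5 stripe over three disks: two data bytes plus one parity byte.
d1=0xA5
d2=0x3C
parity=$(( d1 ^ d2 ))        # what the third disk would store

# Pretend the disk holding d1 failed: rebuild it from the survivors.
rebuilt=$(( d2 ^ parity ))

printf 'parity=0x%02X rebuilt=0x%02X original=0x%02X\n' "$parity" "$rebuilt" "$((d1))"
# prints: parity=0x99 rebuilt=0xA5 original=0xA5
```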

