19. September 2017 aerialsurveybase Ollie

RAID-1: Theory behind RAID (Part 1)

RAID is a common topic in our industry, since aerial mapping always produces a huge amount of data at every step of the workflow.

What is a RAID?

RAID = Redundant Array of Independent Disks.

Others expand it as Redundant Array of Inexpensive Disks.

Wikipedia’s definition is: A RAID is a data storage virtualization technology that combines multiple physical disk drive components into a single logical unit for the purposes of data redundancy, performance improvement, or both. Data is distributed across the drives in one of several ways, referred to as RAID levels, depending on the required level of redundancy and performance. The different schemes, or data distribution layouts, are named by the word RAID followed by a number, for example RAID 0 or RAID 1. Each schema, or RAID level, provides a different balance among the key goals: reliability, availability, performance, and capacity. RAID levels greater than RAID 0 provide protection against unrecoverable sector read errors, as well as against failures of whole physical drives. (Wikipedia)

RAID becomes especially relevant when big data needs to be processed, for example when working with LiDAR data sets, processing the raw images of an aerial mapping camera, or creating point clouds.

The most common RAID levels are:

RAID 0 Striping without parity, improved performance, additional storage, no fault tolerance
Advantages
  • I/O performance is greatly improved by spreading the I/O load across many channels and all drives
  • No parity calculation overhead is involved
  • Very simple design
  • Easy to implement
Disadvantages
  • Not a “true” RAID, because the failure of a single drive results in the loss of all data on the virtual disk
  • Should not be used for critical data
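To make the striping idea concrete, here is a minimal sketch of how blocks could be spread round-robin over the drives of a RAID 0 set. The function names `locate_block` and `stripe` and the tiny block size are assumptions for illustration only; a real controller works on much larger blocks and on the raw devices.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)."""
    disk = logical_block % num_disks      # blocks rotate round-robin over the disks
    offset = logical_block // num_disks   # stripe row in which the block lands
    return disk, offset


def stripe(data: bytes, num_disks: int, block_size: int) -> list[list[bytes]]:
    """Split data into blocks and distribute them over num_disks lists."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for start in range(0, len(data), block_size):
        disk, _ = locate_block(start // block_size, num_disks)
        disks[disk].append(data[start:start + block_size])
    return disks


# Six 2-byte blocks on three hypothetical disks.
layout = stripe(b"ABCDEFGHIJKL", num_disks=3, block_size=2)
# Disk 0 holds blocks 0 and 3, disk 1 blocks 1 and 4, disk 2 blocks 2 and 5.
```

Because every disk holds only a part of each file, all disks can work in parallel, but the loss of any one of them leaves gaps in every larger file.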
RAID 1 Mirroring without parity, fault tolerance for disk errors and single disk failures
Advantages
  • High performance: up to twice the read transaction rate of a single disk, and the same write transaction rate as a single disk
  • 100 % redundancy of data: no rebuild of data is necessary in case of a disk failure, just a copy to the replacement disk
  • Supports hot-swap disks
  • Simplest RAID storage subsystem design
Disadvantages
  • Highest disk overhead of all RAID types (100 %), which results in inefficient use of drive capacity and higher costs
  • Limited capacity, since the usable space of a two-drive mirror is only that of a single drive
RAID 5 Striping with distributed parity, improved performance, fault tolerance for disk errors & single disk failures
Advantages
  • Most efficient use of drive capacity of all the redundant RAID configurations
  • Can survive the loss of one disk without losing data
  • High read transaction rate
  • Medium-to-high write transaction rate
Disadvantages
  • Disk failure has a medium impact on throughput
  • Most complex controller design
  • Retrieval of parity information after a drive failure takes longer than with mirroring
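The parity used by RAID 5 is commonly a bytewise XOR over the data blocks of a stripe. The sketch below shows one stripe on a hypothetical four-disk array and how a lost block can be recomputed from the survivors. The helper name `xor_blocks` and the sample data are assumptions for illustration; a real array also rotates the parity block across the disks from stripe to stripe.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equally sized blocks byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus the parity block computed from them.
data_blocks = [b"LIDA", b"R_PO", b"INTS"]
parity = xor_blocks(data_blocks)

# Simulate losing the disk that held the second data block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data_blocks[1]
```

The rebuild has to read every surviving disk to recompute the missing block, which is why recovery takes longer than simply copying a mirror.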
RAID 6 Striping with dual parity, fault tolerance for dual drive failures
Advantages
  • Can survive the loss of 2 disks without losing data
  • Data redundancy, high read rates, and good performance
Disadvantages
  • Requires 2 sets of parity data for each write operation, resulting in a significant decrease in write performance
  • Additional costs because of the extra capacity required by using 2 parity blocks per stripe
  • Retrieval of parity information after a drive failure takes longer than with mirroring
RAID 10 Combination of RAID 1 and RAID 0, combining mirroring with block-level striping, better performance, fault tolerance for disk errors and multiple drive failures (one drive failure per mirror set)
Advantages
  • RAID 10 has the same redundancy as RAID 1
  • High I/O rates are achieved by striping RAID 1 segments
Disadvantages
  • Most expensive RAID solution
  • Requires 2n disks, where n > 1 (at least four drives)
  • Very limited scalability at a very high inherent cost
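The capacity trade-offs of the levels listed above can be compared with a few lines of arithmetic. This is a rough sketch that assumes identical disks and ignores file system overhead; the names `MIN_DISKS` and `usable_capacity` and the 8 TB example are made up for illustration.

```python
MIN_DISKS = {"RAID 0": 2, "RAID 1": 2, "RAID 5": 3, "RAID 6": 4, "RAID 10": 4}


def usable_capacity(level: str, num_disks: int, disk_size_tb: float) -> float:
    """Usable capacity in TB for an array of identical disks."""
    if level not in MIN_DISKS:
        raise ValueError(f"unknown RAID level: {level}")
    if num_disks < MIN_DISKS[level]:
        raise ValueError(f"{level} needs at least {MIN_DISKS[level]} disks")
    if level == "RAID 10" and num_disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")

    data_disks = {
        "RAID 0": num_disks,        # no redundancy, every disk stores data
        "RAID 1": 1,                # all disks hold the same copy
        "RAID 5": num_disks - 1,    # one disk's worth of parity
        "RAID 6": num_disks - 2,    # two disks' worth of parity
        "RAID 10": num_disks // 2,  # half of the disks are mirror copies
    }[level]
    return data_disks * disk_size_tb


# Hypothetical example: 8 TB disks, two for the mirror and four for the other levels.
example = {"RAID 0": 4, "RAID 1": 2, "RAID 5": 4, "RAID 6": 4, "RAID 10": 4}
for level, disks in example.items():
    print(f"{level}: {disks} x 8 TB -> {usable_capacity(level, disks, 8.0)} TB usable")
```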

 

There are more levels, called hybrid or nested RAID levels, but these are not very common in the aerial industry.

Exotic RAID Levels:

unRAID

unRAID is a Linux-based operating system from Limetech that is optimized for media file storage. Personally, I use it at home and it works well. unRAID supports an SSD cache pool, which can dramatically speed up write performance. Its advantages include lower power consumption than standard RAID levels, the ability to use multiple hard drives of differing sizes to their full capacity and, if more drives fail concurrently than the redundancy covers, the loss of only the data stored on the failed drives, whereas standard RAID levels that use striping lose everything. Its disadvantages include slower write performance than a single disk and bottlenecks when multiple drives are written concurrently. As far as I understand it, unRAID is “JBOD + parity”: it is not striped like RAID 5 and is more or less similar to a RAID 4 solution with a dedicated parity disk. You can also configure unRAID similarly to RAID 6, with 2 parity disks for a higher level of data protection.
So you can create an array with drives of different sizes, and you can add further drives at any time when you need them. Because the drives are not striped, you won’t lose everything even if you lose multiple drives. unRAID started out with the journaled ReiserFS file system, but has since moved to the proven and stable XFS file system or, as in my case, BTRFS.
BTRFS is a more modern and advanced file system than XFS and has functionality to protect against “bit rot” type data corruption, among other advanced features. It is also used by the Synology DiskStations.
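The “JBOD + parity” idea described above can be captured in a small model. This is only a sketch of the behaviour as described in this post, not Limetech's implementation: it assumes that the largest drive (or the two largest) is reserved for parity, and the function names are hypothetical.

```python
def unraid_style_capacity(drive_sizes_tb: list[float], parity_drives: int = 1) -> float:
    """Usable capacity when the largest drives are reserved for parity."""
    if parity_drives not in (1, 2):
        raise ValueError("this sketch models 1 or 2 parity drives")
    if len(drive_sizes_tb) <= parity_drives:
        raise ValueError("need at least one data drive besides parity")
    ordered = sorted(drive_sizes_tb, reverse=True)
    return sum(ordered[parity_drives:])  # every remaining drive is used in full


def drives_with_lost_data(failed: set[str], data_drives: set[str],
                          parity_drives: int = 1) -> set[str]:
    """Return the data drives whose content is lost for a given set of failures."""
    if len(failed) <= parity_drives:
        return set()             # parity is sufficient to rebuild every failed drive
    return failed & data_drives  # unstriped: surviving drives stay readable


print(unraid_style_capacity([8, 4, 4, 2]))
print(drives_with_lost_data({"disk1", "disk2"}, {"disk1", "disk2", "disk3"}))
```

In the second call, two drives fail with a single parity drive, so the data on disk1 and disk2 is gone, while disk3 remains fully readable. In a striped array, the same situation would cost the whole volume.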

Synology DiskStation SHR

Almost all SOHO DiskStations from Synology support Synology Hybrid RAID (SHR), which is a good alternative to the standard RAID 5/6 levels in a SOHO environment.

SHR is an automated RAID management system that makes storage volume deployment easier than with traditional RAID systems. SHR allows 1 or 2 disks of redundancy, meaning the SHR volume can lose up to 2 disks and the data volume will still be available for use. It can use multiple hard drives of differing sizes to their full capacity. Unlike classic RAID, SHR makes newly upgraded storage readily available for use: when drives are replaced with larger ones, the additional space can be used as soon as 2 of the larger disks are in place to form a redundant storage array. That is a great cost-saving feature. There is no need to replace the whole set of drives at the same time; the system grows and storage space becomes accessible on demand, so users can reach maximum storage capacity without purchasing a full set of new drives. SHR also benefits from the more modern BTRFS file system, especially for archiving purposes, since BTRFS has functionality to protect against “bit rot” type data corruption. SHR has worked perfectly on all of my DiskStations for years. Similar technology is used in most NAS systems, such as those from QNAP and Asustor, and also by Drobo.
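Why the extra space only appears once 2 larger disks are installed can be illustrated with a simplified layer model. This is an assumption made for illustration, not Synology's actual algorithm: the drives are cut into horizontal layers, and every layer that spans at least two drives gives up one drive's share for redundancy. The function name `shr_style_capacity` is hypothetical, and only single-disk redundancy is modelled.

```python
def shr_style_capacity(drive_sizes_tb: list[float]) -> float:
    """Usable capacity of a mixed-size array with one disk of redundancy."""
    usable = 0.0
    previous_height = 0.0
    remaining = sorted(drive_sizes_tb)
    while remaining:
        current = remaining[0]
        height = current - previous_height  # thickness of this layer
        members = len(remaining)            # drives large enough to take part
        if members >= 2:
            usable += (members - 1) * height  # one share per layer holds redundancy
        previous_height = current
        remaining = [size for size in remaining if size > current]
    return usable


print(shr_style_capacity([4, 4, 4, 4]))  # four equal drives
print(shr_style_capacity([4, 4, 4, 8]))  # one drive upgraded
print(shr_style_capacity([4, 4, 8, 8]))  # two drives upgraded
```

In this model the first upgraded drive adds nothing, because its extra 4 TB has no partner to be redundant with. The second upgraded drive makes that layer usable.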

Intel Rapid Storage Technology

Intel Rapid Storage Technology (IRST) features Matrix RAID (not a RAID level), which is present in the ICH6R controller and subsequent southbridge chipsets from Intel and is accessible and configurable via the RAID BIOS setup utility of the mainboard. Matrix RAID supports RAID 0, 1, 5, or 10 volumes in the array. A Matrix RAID array can improve both performance and data integrity. A practical example would use a small RAID 0 (stripe) volume for the operating system, programs, and paging files, while a second, larger RAID 1 (mirror) volume stores critical data.

 

Microsoft Storage Spaces

Microsoft Storage Spaces is a storage virtualization technology that succeeds Logical Disk Manager and allows physical disks to be combined into logical volumes. This technology is similar to RAID 1 or RAID 5 under the Linux-based Logical Volume Manager (LVM).

A storage space can be handled like a big physical disk with thin provisioning of the available disk space. The spaces are organized within a storage pool, which is a collection of physical disks. A pool can handle multiple disks of different sizes, performance, or technology (a mix of USB, SATA, SAS, etc.). Storage Spaces has built-in resiliency against disk failures.

Storage Spaces was introduced with Windows 8 and enhanced in Windows Server 2012 R2 with tiering and SSD cache support. Personally, I have had strange experiences with this technology (also later with Windows 10 and Server 2016): the performance of Storage Spaces was incredibly bad, both for home use and for aerial production.

 

Continue with Part 2: Which RAID level should I use? We’ll compare the different RAID levels.
