GlusterFS - a distributed, redundant, scalable filesystem for file storage

I wrote a detailed post on clustered and distributed filesystems on our Wiki a while ago. I have been re-evaluating the filesystems on the market frequently since then, and am now contemplating production use of one of them - GlusterFS.

Typical applications can be divided into the usual tiers - UI, App server, Data layer. The Data layer can be an RDBMS, files, or both. For instance, a content-sharing application would likely have an RDBMS to maintain the metadata required to apply various business rules, while each content item (photos, videos etc.) is stored individually on an underlying filesystem.

RDBMSes have their own high-availability configurations (replication, shared-nothing cluster, failover cluster etc.). This article focuses on effective ways of storing files, rather than structured data, in a highly available, redundant configuration.

Config 1 - NFS Store

  • Typical models tend to store files on a central NFS server
  • The NFS Server in turn can be an appliance (NAS device) or a server with a backend SAN
  • Disadvantages
    • The NFS Server is a single point of failure and requires failover capability
    • NFS by itself does not provide any redundancy and relies on the underlying RAID to provide redundancy against hardware failure
    • RAID does not protect against a filesystem or OS crash. For instance, if the underlying filesystem were ext3 and it crashed, one would need to run fsck to recover. If such a crash results in data loss, recovery is only possible from a backup
    • NFS is slower than a locally mounted filesystem, since every operation goes through an application-layer protocol over the network

Config 2 - GFS Store

  • In this model multiple clients can mount the same underlying block device using GFS
  • Advantages
    • GFS is faster than NFS
    • Does not require an intermediate server. Clients can mount the block device directly
  • Disadvantages
    • GFS by itself does not provide any redundancy and relies on the underlying RAID to provide redundancy against hardware failure
    • RAID does not protect against a filesystem or OS crash. For instance, if GFS crashed, one would need to run fsck to recover. If such a crash results in data loss, recovery is only possible from a backup
    • Does not allow heterogeneous clients - all clients must be Linux

Config 3 - GlusterFS

  • Gluster supports various unique and interesting configurations
  • The clients could aggregate their individual hard drives using Gluster so that they look like one single filesystem
  • Alternatively, one could use separate bricks (storage servers) that all appear as a single filesystem to the client
  • Advantages
    • Gluster has redundancy built in. In a Gluster config you can specify a replication threshold, and Gluster will ensure that each file is replicated across that many bricks
    • Replication works at the file level. If a brick goes down due to a hardware, filesystem or OS crash, it does not affect Gluster, which will continue serving the file from any other brick that holds a copy
    • Complete POSIX compliance
    • Very creative configurations possible
    • Provides direct access to the underlying files through NFS/CIFS
    • Has modules that allow direct access to files from Apache/lighttpd - significantly boosting performance if the purpose of the files is web serving
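
The file-level replication described above can be illustrated with a small toy model. This is only a sketch of the idea, not how Gluster is implemented: the `Brick` and `ReplicatedVolume` classes, the hash-based placement and the in-memory storage are all assumptions made up for illustration. It shows a file being written to a fixed number of bricks and still being readable after one of those bricks goes offline.

```python
import hashlib


class Brick:
    """Hypothetical storage node that keeps whole files in memory."""

    def __init__(self, name):
        self.name = name
        self.files = {}
        self.online = True


class ReplicatedVolume:
    """Toy volume that keeps each file on `replicas` distinct bricks."""

    def __init__(self, bricks, replicas):
        if replicas > len(bricks):
            raise ValueError("need at least as many bricks as replicas")
        self.bricks = bricks
        self.replicas = replicas

    def placement(self, path):
        # Assumed placement rule: hash the path to pick a starting brick,
        # then take the following bricks in ring order so that the copies
        # always land on distinct bricks.
        digest = hashlib.md5(path.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.bricks)
        return [self.bricks[(start + i) % len(self.bricks)]
                for i in range(self.replicas)]

    def write(self, path, data):
        # Store a full copy of the file on every online replica brick.
        # (A real system would also have to heal bricks that were offline
        # during the write; this sketch does not.)
        for brick in self.placement(path):
            if brick.online:
                brick.files[path] = data

    def read(self, path):
        # Serve the file from the first online brick that holds a copy.
        for brick in self.placement(path):
            if brick.online and path in brick.files:
                return brick.files[path]
        raise IOError("no online replica holds " + path)


# Usage: four bricks, every file kept on two of them.
bricks = [Brick("brick%d" % i) for i in range(4)]
volume = ReplicatedVolume(bricks, replicas=2)
volume.write("/photos/cat.jpg", b"jpeg-bytes")

# Simulate a hardware, filesystem or OS crash on one replica.
volume.placement("/photos/cat.jpg")[0].online = False
print(volume.read("/photos/cat.jpg"))
```

Because each brick holds complete files rather than blocks, the surviving copy can be served as-is without any reconstruction step, which is the property the advantages above rely on.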

All content in the Directi Wiki is licensed under a Creative Commons Attribution-Share Alike 3.0 .