
The Role of IBM Spectrum Scale (aka GPFS) in the Enterprise

Posted in Uncategorized on April 21, 2017

IBM Spectrum Scale (formerly known as GPFS) is a high-performance, highly scalable global file system that provides a single namespace to data. Given its long history with high-performance computing (HPC) and data-intensive media and data stream serving applications, Spectrum Scale has traditionally been viewed as a niche data solution: complex to install, optimize, and maintain, with more focus on performance and less on enterprise features.

In recent years, however, IBM has added a number of enterprise-level features to Spectrum Scale, along with simplified installation, monitoring, and tuning. In addition, Spectrum Scale can use virtually any storage hardware from any vendor, and can use generic Linux or Windows hosts as either servers or clients. These attributes make Spectrum Scale an ideal data platform for a number of enterprise use cases.

Global Namespace

In an HPC environment, many different compute hosts must have access to the same data. This is the fundamental benefit of a single namespace: a given file can be addressed by a uniform file path, regardless of where the file is stored or from where it is accessed.

The single namespace concept extends beyond simply allowing data access from multiple clients, and enables several key capabilities in Spectrum Scale:

  • Data tiering. Spectrum Scale can use a variety of storage: flash-based SSDs, NVMe, spinning disk (SAS, NLSAS, SATA), and even tape. Spectrum Scale’s Information Lifecycle Management (ILM) policy engine can place files on the desired storage tier and migrate them between tiers based on access time, file type, file “heat,” and other factors. Using ILM policies, one can get the most effective use of flash-based storage while still meeting capacity requirements.
  • Universal access. Spectrum Scale runs on Linux (RHEL, SLES, Debian and similar variants), AIX, and Windows clients to allow native file system access. When export services are enabled in a Spectrum Scale cluster, clients may also access data using NFS, SMB, and object protocols such as Swift or Amazon S3.
  • Archiving to tape, object storage, or the cloud. ILM can transparently move data to “cool storage” on tape or to on-premises or public cloud object storage. This data can be immediately recalled on demand.
  • Sharing data across geographically dispersed sites. Spectrum Scale’s Active File Management (AFM) enables asynchronous replication between sites and can act as a “hot cache,” staging on-demand copies of data on fast storage. AFM supports many use cases, depending on the type of home-cache relationship established.
  • Mirroring and disaster recovery across sites. AFM can be used to establish a primary and secondary copy of data at multiple sites, to enable failover in case the primary site goes offline for any reason.
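In practice, Spectrum Scale expresses tiering and archiving rules in its SQL-like policy language; as a rough illustration of the kind of decision the ILM policy engine makes, here is a toy model in Python. The tier names, heat threshold, and 90-day archive cutoff are hypothetical values chosen for the sketch, not Spectrum Scale syntax or defaults:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class FileInfo:
    path: str
    heat: float          # hypothetical access-frequency score; higher = hotter
    last_access: datetime

def choose_tier(f: FileInfo, now: datetime) -> str:
    """Toy stand-in for an ILM placement/migration rule: hot files go to
    flash, warm files to disk, and cold files (untouched 90+ days) to tape."""
    if now - f.last_access > timedelta(days=90):
        return "tape"    # archive tier; could equally be cloud object storage
    if f.heat >= 0.8:
        return "flash"
    return "disk"

now = datetime(2017, 4, 21)
files = [
    FileInfo("/gpfs/db/hot.db", 0.9, now - timedelta(days=1)),
    FileInfo("/gpfs/home/report.pdf", 0.2, now - timedelta(days=10)),
    FileInfo("/gpfs/archive/old.log", 0.0, now - timedelta(days=400)),
]
for f in files:
    print(f.path, "->", choose_tier(f, now))
```

A real ILM policy evaluates comparable attributes (access time, file name, pool occupancy, weighted file heat) but does so inside the file system, scanning metadata in parallel rather than file by file.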

IBM Spectrum Scale connects many storage media, access methods, applications, and sites to create a “data lake” with a single namespace.

Concurrency and Scalability

A key strength of Spectrum Scale is its performance scalability, largely due to its distributed management of, and access to, metadata.

By designing a standard building block (storage plus servers) that delivers a target level of throughput for a given amount of capacity, an enterprise can readily grow its Spectrum Scale storage without sacrificing performance: it simply adds the number of building blocks required to meet its capacity goals.

In addition, Spectrum Scale supports high levels of concurrency (multiple applications reading and writing the same file at once). For this reason, the SAS statistical package uses Spectrum Scale as its preferred file system. Today’s enterprise environments have many high-concurrency applications, among them the many flavors of relational and “NoSQL” databases.

Enterprise Data Security Features

Spectrum Scale offers at-rest encryption with NIST and FIPS 140-2 compliance, and enables secure data erasure through the use of KMIP-compliant encryption keys and remote key servers. Spectrum Scale also offers immutability and append-only features that meet certifications required for the financial industry and other sectors where data integrity and security are paramount.

Analytics

In Hadoop environments, Spectrum Scale is a drop-in replacement for HDFS with enhanced scalability and reliability. Spectrum Scale may be used as either local-disk or shared-disk Hadoop storage. When using enterprise-grade shared storage, the number of data replicas in Hadoop can often be reduced to just one replica with suitable RAID protection. Spectrum Scale is validated with the Hortonworks Data Platform (HDP) and several IBM analytics offerings.
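The capacity savings from dropping Hadoop’s default three-way replication can be estimated with simple arithmetic. The sketch below compares the raw capacity needed for 3x replication on unprotected local disk against a single replica on RAID-protected shared storage; the 8+2 RAID-6 geometry is an assumption for illustration, not a Spectrum Scale requirement:

```python
def raw_capacity_needed(usable_tb: float, replicas: int,
                        raid_overhead: float = 1.0) -> float:
    """Raw storage required to provide `usable_tb` of usable data capacity."""
    return usable_tb * replicas * raid_overhead

usable = 1000.0  # 1 PB of usable Hadoop data

# Default HDFS: three full copies on unprotected local disk.
hdfs_raw = raw_capacity_needed(usable, replicas=3)

# Shared storage: one replica, protected by RAID-6 (8 data + 2 parity drives).
shared_raw = raw_capacity_needed(usable, replicas=1, raid_overhead=10 / 8)

print(f"HDFS 3x replication:     {hdfs_raw:.0f} TB raw")
print(f"Single replica + RAID-6: {shared_raw:.0f} TB raw")
```

Under these assumptions, 1 PB of usable data requires 3,000 TB raw with three-way replication but only 1,250 TB raw with a single RAID-6-protected replica, a reduction of more than half.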

Since Spectrum Scale is a POSIX file system, Hadoop, Spark, and other applications can read, write, and edit files in place without the need to copy data between a separate Hadoop storage silo and the rest of the enterprise storage environment. By leveraging AFM and ILM, Spectrum Scale provides a complete solution for managing the analytics data lifecycle (including archiving and backups) as part of a comprehensive data management framework.
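Because the file system is POSIX-compliant, an analytics job can modify a file where it sits rather than round-tripping it through a separate Hadoop silo. Here is a minimal Python illustration of such an in-place update; the record layout is invented, and a temporary file stands in for a path on a Spectrum Scale mount so the sketch is self-contained:

```python
import os
import tempfile

# Stand-in for a file on a Spectrum Scale mount (e.g. under /gpfs);
# a temporary file keeps the example runnable anywhere.
path = os.path.join(tempfile.mkdtemp(), "records.bin")

with open(path, "wb") as f:
    f.write(b"AAAA" * 4)        # 16 bytes: four 4-byte records

# In-place update: seek to byte 4 and overwrite the second record.
# On HDFS this random write would not be possible; files there are
# write-once/append-only, so the whole file would have to be rewritten.
with open(path, "r+b") as f:
    f.seek(4)
    f.write(b"BBBB")

with open(path, "rb") as f:
    print(f.read())             # b'AAAABBBBAAAAAAAA'
```

The same open/seek/write sequence works for any POSIX application, which is what lets Hadoop, Spark, and conventional enterprise workloads share one copy of the data.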

Spectrum Scale creates a “data lake” where analytics tools can be brought to bear on a vast array of unstructured data in an organization’s data space.

For a deeper discussion of how you can most effectively use IBM Spectrum Scale in your organization, reach out.


All rights reserved. Copyright 2014 RedLine Performance Solutions, LLC.