IBM Spectrum Scale (formerly known as GPFS) is a high-performance, highly scalable global file system that provides a single namespace to data. Given its long history with high-performance computing (HPC) and data-intensive media and data stream serving applications, Spectrum Scale has traditionally been viewed as a niche data solution: complex to install, optimize, and maintain, with more focus on performance and less on enterprise features.
In recent years, however, IBM has added a number of enterprise-level features to Spectrum Scale, along with simplified installation, monitoring, and tuning. In addition, Spectrum Scale can use virtually any storage hardware from any vendor, and can use generic Linux or Windows hosts as either servers or clients. These attributes make Spectrum Scale an ideal data platform for a number of enterprise use cases.
In an HPC environment, many different compute hosts must have access to the same data. A single namespace provides exactly that: a given file is addressed by one uniform file path, regardless of where the file is stored or which host accesses it.
The single namespace concept extends beyond simply allowing data access from multiple clients, and enables several key capabilities in Spectrum Scale:
A key strength of Spectrum Scale is its performance scalability, largely due to its distributed management of, and access to, metadata.
An enterprise can design a standard building block (storage plus servers) that delivers a target level of throughput for a given amount of data capacity. To meet a capacity goal, it then adds the required number of building blocks, so storage capacity grows with Spectrum Scale without sacrificing performance.
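The building-block arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name, the per-block figures, and the assumption that throughput grows linearly with the number of blocks are all hypothetical and would need to be validated against a real deployment.

```python
import math


def blocks_needed(target_capacity_tb, target_throughput_gbs,
                  block_capacity_tb, block_throughput_gbs):
    """Return how many identical building blocks satisfy both goals.

    Assumption (hypothetical): capacity and throughput both scale
    linearly with the number of building blocks.
    """
    by_capacity = math.ceil(target_capacity_tb / block_capacity_tb)
    by_throughput = math.ceil(target_throughput_gbs / block_throughput_gbs)
    # The stricter of the two requirements decides the block count.
    return max(by_capacity, by_throughput)


# Illustrative numbers only: each block offers 250 TB and 4 GB/s.
count = blocks_needed(target_capacity_tb=1000, target_throughput_gbs=20,
                      block_capacity_tb=250, block_throughput_gbs=4)
print(count)
```

With these made-up figures, capacity alone would need four blocks, but the throughput target needs five, so the throughput requirement sets the count.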
In addition, Spectrum Scale supports high levels of concurrency (multiple applications reading and writing the same file at once). For this reason, the SAS statistical package uses Spectrum Scale as its preferred file system. Today’s enterprise environments run many high-concurrency applications, including the many flavors of relational and “NoSQL” databases.
Spectrum Scale offers at-rest encryption with NIST and FIPS 140-2 compliance, and enables secure data erasure through the use of KMIP-compliant encryption keys and remote key servers. Spectrum Scale also offers immutability and append-only features that meet certifications required for the financial industry and other sectors where data integrity and security are paramount.
In Hadoop environments, Spectrum Scale is a drop-in replacement for HDFS with enhanced scalability and reliability. Spectrum Scale may be used as either local-disk or shared-disk Hadoop storage. When using enterprise-grade shared storage, the number of data replicas in Hadoop can often be reduced to just one replica with suitable RAID protection. Spectrum Scale is validated with the Hortonworks Data Platform (HDP) and several IBM analytics offerings.
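To see why dropping from several replicas to one matters, the raw capacity requirement can be estimated as follows. This is a rough sketch: the function and its parameters are hypothetical, the three-replica case reflects the HDFS default replication factor, and the 8+2 RAID layout is just an example of "suitable RAID protection", not a recommendation.

```python
def raw_capacity_tb(usable_tb, replicas, raid_data_disks=None,
                    raid_parity_disks=0):
    """Estimate the raw disk capacity needed for a given usable capacity.

    replicas: number of full copies the file system keeps.
    raid_data_disks / raid_parity_disks: optional RAID layout; when
    given, parity overhead is (data + parity) / data.
    """
    raid_overhead = 1.0
    if raid_data_disks:
        raid_overhead = (raid_data_disks + raid_parity_disks) / raid_data_disks
    return usable_tb * replicas * raid_overhead


# Three full replicas on unprotected local disks.
three_replicas = raw_capacity_tb(100, replicas=3)

# One replica on shared storage with an example 8+2 RAID layout.
one_replica_raid = raw_capacity_tb(100, replicas=1,
                                   raid_data_disks=8, raid_parity_disks=2)

print(three_replicas, one_replica_raid)
```

Under these assumptions, 100 TB of usable data needs 300 TB of raw disk with three replicas, versus 125 TB with a single replica on the example RAID layout.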
Because Spectrum Scale is a POSIX file system, Hadoop, Spark, and other applications can read, write, and edit files in place, with no need to copy data between a separate Hadoop storage silo and the rest of the enterprise storage environment. By leveraging Active File Management (AFM) and Information Lifecycle Management (ILM), Spectrum Scale provides a complete solution for managing the analytics data lifecycle (including archiving and backups) as part of a comprehensive data management framework.
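Editing in place means an application can overwrite bytes in the middle of an existing file using ordinary POSIX calls, something HDFS's write-once, append-only model does not allow. The sketch below uses only the Python standard library. The helper name is hypothetical, and the demo runs against a local temporary file; on a real cluster the path would simply point somewhere under a Spectrum Scale mount point.

```python
import os
import tempfile


def overwrite_in_place(path, offset, data):
    """Overwrite len(data) bytes at the given offset without rewriting the file."""
    with open(path, "r+b") as f:   # open for update; do not truncate
        f.seek(offset)
        f.write(data)
        f.flush()
        os.fsync(f.fileno())       # push the change to stable storage


# Demo on a local temp file (stand-in for a file on a shared mount).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"status=PENDING;id=42\n")
    demo_path = tmp.name

# Replace the 7-byte value "PENDING" with a same-length padded value.
overwrite_in_place(demo_path, offset=7, data=b"DONE   ")

with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)

print(result)
```

Only the seven bytes at the chosen offset change; the rest of the file, and its path, stay exactly as other applications expect.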
Spectrum Scale creates a “data lake” where analytics tools can be brought to bear on a vast array of unstructured data in an organization’s data space.
For a deeper discussion of how you can most effectively use IBM Spectrum Scale in your organization, reach out.