Junior GPFS Engineer (NACS)

Job Description:

RedLine Performance Solutions (RedLine) has been in the HPC solutions engineering services business for approximately 17 years and is consistently determined to keep the “bar of excellence” quite high for new hires. This enables RedLine to accomplish what other firms cannot and promotes a high level of staff retention. We offer services ranging from full life cycle HPC systems engineering to remote managed services to HPC program analysis. We are located in the Washington, DC area and are looking for a Junior GPFS Engineer to join us for our NASA NACS High Performance Computing contract.

US citizenship and the ability to obtain a Public Trust security clearance are mandatory requirements for this position. The position is located at a customer site in Greenbelt, MD. Preference is for local candidates, but we will consider relocation as well.

This position is a member of an HPC Support team focusing on storage hardware and software for two supercomputing clusters. You will specialize in both the monitoring and management of storage systems and storage-related network management for a large supercomputer.

Job Responsibilities:

    • Storage tasks:
      • Hardware installation
      • Hardware testing and daily maintenance/monitoring, LUN configuration and presentation with various controller OSes, and filesystem and cluster management with GPFS
      • Monitor and maintain Discover’s storage hardware (spinning disk and NVMe-based) and backend storage network (Fibre Channel)
      • Monitor and maintain Discover’s GPFS cluster, including all ~3700 clients and 60 NSD servers (plus managers and quorum nodes)
    • Monitor and maintain Discover’s 3 high-speed interconnect fabrics (2 FDR InfiniBand fabrics and 1 Omni-Path OPA100 fabric), including cables, switches, firmware, and software-level components such as the subnet managers (SMs)
    • Address user tickets and resolve issues in various cluster areas
    • Attend meetings with high-priority user groups to keep communication channels open and address any concerns they may have
    • Maintain test and development system to keep it consistent with the production cluster
    • Consult the customer on new cluster hardware purchases (both storage and compute)
    • Assist with benchmarking new products (storage systems and switches) that will potentially be used in production
    • Test and verify hardware such as storage and high-speed fabrics to validate it for production

Required Skills/Experience:

    • Bachelor’s degree in Computer Science, Management Information Systems, or another technical discipline, plus 3 years of relevant work experience, or equivalent
    • Experience with HPC parallel filesystems (e.g., GPFS, Lustre)
    • Experience with storage systems (data/metadata/IO server configurations in GPFS, spinning disk, SSD, and NVMe)
    • Experience with high-speed interconnect networking (e.g., InfiniBand, Omni-Path, Fibre Channel) – cabling, cards, switches, OFED/MOFED, etc.
    • Working knowledge of scripting and programming languages such as C, C++, Fortran, Bash, CSH, TCSH, Perl, Python, and Ruby
    • Good organizational skills to balance and prioritize work, and the ability to multitask
    • Good communication skills to communicate with support personnel, the customer, and managers

__________

Please email hr@redlineperf.com with your resume if this opportunity is of interest to you.

All rights reserved. Copyright 2014 RedLine Performance Solutions, LLC.