Come support the development and engineering efforts directed at enhancing the Agency's High Performance Computing (HPC) infrastructure, HPC ecosystem, and HPC storage capabilities as a Site Reliability Engineer. The candidate will support system-engineering-level studies and analysis of next-generation HPC systems and related technologies to design and optimize large-scale storage systems.
The Site Reliability Engineer (SRE) shall perform the following duties:
- Integrate, install, configure, upgrade, compile, and support COTS/GOTS software.
- Generate documentation for the full software stack.
- Update software for sustainment support.
- Perform Linux system administration and shell scripting.
- Execute test codes for characterization of software performance.
- Provide software product ownership for HPC tools.
- Work in a fast-paced environment and switch between various architectural paradigms.
- Perform administration of large distributed file systems such as Lustre and HDFS.
The individual shall have a Bachelor's Degree in Computer Science or a related field and at least 12 years of demonstrable experience with integrating, installing, configuring, upgrading, compiling, and supporting COTS/GOTS software in a heterogeneous operating system environment.
In lieu of a degree, the individual shall have five (5) years of full-time, directly related Computer Science work experience and eight (8) years of demonstrable experience with integrating, installing, configuring, upgrading, compiling, and supporting COTS/GOTS software in a heterogeneous operating system environment.
An industry-recognized professional certification, as defined in the TTOs, may substitute for one (1) year of experience. A Master's Degree in Computer Science or a related field may substitute for two (2) years of experience.
- A minimum of 3 years of experience writing scripts using Bash/Python
- A minimum of 3 years of experience with the Unix command line
- A minimum of 3 years of experience performing Unix system administration, including installation, configuration, and support of COTS/GOTS software in a large-scale Unix HPC cluster environment
- General HPC technical knowledge regarding compute, network, memory, and storage components
- Demonstrated experience supporting large Unix HPC Clusters
- Familiarity with network communication technologies such as IP and InfiniBand
- Excellent verbal and written communication skills
- Experience with Configuration Management, including versioning and automated tools such as Puppet, Chef, Salt, and Ansible
- Demonstrated experience with the sustainment, support, maintenance, development, and deployment of Lustre-based HPC parallel file systems
- Familiar with Site Reliability Engineering (SRE) principles and applications
- Demonstrated experience using system monitoring tools such as Nagios and ibmonitor
- Demonstrated experience developing test plans, procedures, and reports ensuring consistency across the storage architecture
Security Clearance Requirement
Must have an active TS/SCI with a polygraph in order to be considered for this position.
Preferred Qualifications
- Experience with the Atlassian Tool Suite (JIRA, Bitbucket, Confluence)
- Familiarity with test-driven Agile development best practices
- Familiarity with Lustre file systems
- Experience with ZFS and GPFS
- Experience with InfiniBand and Ethernet-based networks
External Referral Bonus: Eligible
Potential for Telework: No
Clearance Level Required: Top Secret/SCI with Polygraph
Scheduled Weekly Hours: 40
Job Family: Software Engineering
Leidos is a Fortune 500® information technology, engineering, and science solutions and services leader working to solve the world's toughest challenges in the defense, intelligence, homeland security, civil, and health markets. The company's 33,000 employees support vital missions for government and commercial customers. Headquartered in Reston, Virginia, Leidos reported annual revenues of approximately $10.19 billion for the fiscal year ended December 28, 2018. For more information, visit www.Leidos.com.
Pay and Benefits
Pay and benefits are fundamental to any career decision. That's why we craft compensation packages that reflect the importance of the work we do for our customers. Employment benefits include competitive compensation, Health and Wellness programs, Income Protection, Paid Leave and Retirement.
Securing Your Data
Leidos will never ask you to provide payment-related information at any part of the employment application process. And Leidos will communicate with you only through emails that are sent from a Leidos.com email address. If you receive an email purporting to be from Leidos that asks for payment-related information or any other personal information, please report the email to [email protected].
Commitment to Diversity
All qualified applicants will receive consideration for employment without regard to sex, race, ethnicity, age, national origin, citizenship, religion, physical or mental disability, medical condition, genetic information, pregnancy, family structure, marital status, ancestry, domestic partner status, sexual orientation, gender identity or expression, veteran or military status, or any other basis prohibited by law. Leidos will also consider for employment qualified applicants with criminal histories consistent with relevant laws.