We are looking for the newest member of the FADE Site Reliability Engineering (SRE) team to focus on the overall reliability of a complex, ever-changing system with a fast-growing user base. Your responsibilities include maintaining complex computer systems by writing code to automate software releases, monitor systems, and detect and fix problems before users even know there's an issue. You use these skills to improve site performance and overall reliability.
You're the type of person who automates any and all mundane tasks so that you can focus on solving real issues. Problem solving is your passion, and you enjoy discovering the root cause of an issue. You work well in a team environment to solve complex problems. As an SRE, you will work alongside the development team to ensure speedy and reliable software deployments, monitor systems, and improve the overall reliability of the platform. In addition, as you discover and document system bugs, you have the motivation to go off and fix them yourself. You have experience participating in the complete agile software life cycle, including planning, scrums, sprints, and frequent software releases. You thrive in a fast-moving environment and enjoy learning new technologies and overcoming new challenges. You have a strong understanding of configuration management and infrastructure as code to ensure consistency across multiple environments. Your most important skill is your ability to learn and pick up new technologies.
- Requires bachelor's degree in Computer Science or a related major or equivalent, and five to seven years of related experience.
- 5-7 years of relevant experience in software integration and/or test
- Strong grasp of Linux
- Automating tasks by writing quality code
- Configuration management tools (Puppet, Chef)
- Monitoring complex systems (Nagios, Elasticsearch, Grafana)
- Automation tools (Jenkins, Bamboo)
- Source-control systems (Git, SVN)
- Candidate must have a TS/SCI clearance and have, or be willing to obtain, a current polygraph
- Candidate must be certified to meet DoD 8570 IAT Level II qualifications. A Security+ certification is required within 6 months of start date.
- Willing to share on-call responsibilities, including coming into work during non-business hours to troubleshoot customer issues
- Cloud computing environments (Amazon Web Services)
- Containerization and container orchestration technologies (Docker, Kubernetes)
- Strong background in data stores including PostgreSQL and Accumulo
- Experience building Java code and deployables (e.g., JAR, WAR, EAR, OSGi bundles)
- Geospatial Information Systems (GIS) and OGC standards (WMS/WFS/KML)
- Familiarity with agile development methodologies
"External Referral Eligible"