Springfield, VA - Salary Range 140-180k (TS/SCI Poly Required)
Job Brief
An Enterprise Management Engineer is needed to support Data Center Service with installation, administration, management, configuration, testing, and integration tasks related to Fault and Performance tools. The engineer will work independently as part of a small team of systems engineers responsible for the care and feeding of a diverse IT infrastructure.
Responsibilities
- Demonstrated hands-on experience with and knowledge of Linux platforms (e.g. RHEL, CentOS), including administration, management, and troubleshooting for physical and virtual platforms
- Demonstrated hands-on experience with and understanding of proactive monitoring concepts, including experience configuring and deploying network and systems monitoring (e.g. SNMP, Nagios, Splunk, SolarWinds)
- Demonstrated experience performing trend analysis on overall system health, performance, and capacity management with regard to utilization and growth
- Demonstrated ability to develop and maintain capacity metrics
- Perform software upgrades, patch installs, and firmware upgrades on a periodic basis, then test for functionality
- Demonstrated knowledge of the following infrastructure principles: Fault tolerance, High availability, Scalability and Capacity planning, Data center organization, Backup / Recovery
- Create Perl scripts and shell scripts for various shells to automate daily and periodic tasks
- Maintain server configuration baselines and configuration compliance against baselines/benchmarks
- Collaborate with Application Teams to perform system maintenance and patch management tasks
- Document work for leadership, update/create Standard Operating Procedures, and brief staff and customers on various tasks
- Interface with other engineering teams to adapt performance management tool capabilities to meet operational requirements
- Assist with analysis using enterprise tool solutions and other tools to detect and respond to IT events, incidents, and outages
- Perform systems hardening to DoD standards
Requirements
- 2+ years of related systems engineering experience.
- 1+ years of experience with monitoring tools
- Must have a current 8570 IAT II Level Certification (CCNA-Security, GICSP, GSEC, Sec+ CE, SSCP) or higher, or obtain one within 90 days of hire
- Must be able to support a large, complex server and network infrastructure
- Experience with Linux systems in the areas of system administration, troubleshooting, integration, shell scripting, and development
- Advanced scripting skills to include experience using Perl and Python
- Experience with Infrastructure as a Service (IaaS)
- Ability to partner with other systems administrators, storage administrators, application developers, and network engineers to solve complex problems
- Strong understanding of enterprise networks including load balancers, routers, switches, TCP/IP, DNS, Local Area Networking, AD, GPO
- Fault event rules development, performance threshold management
- Experience with tools in an enterprise environment
- Event analysis
- Experience with one or more of security hardening, backup management, capacity planning, change management, or patch management.
Desired Skills
- Experience with fault and performance management tools; including installation, configuration, and administration
- Experience with CA eHealth and/or other network performance tools
- Experience with any of the following: JIRA, Nagios, Python, Puppet, YUM repo, Ansible Tower, and/or ILOM
- Familiarity with VMware ESX 6.0/6.5/6.7
- Experience with Windows System Administration