Job Summary: The System Reliability Engineer (SRE) is responsible for working with the Infrastructure, Operations, and Development teams to increase system resilience while keeping the end-user experience in focus. This position emphasizes collaboration across these teams to automate processes, improve system scalability, and maintain high availability using modern DevOps principles and best practices. The SRE is responsible for proactively identifying and resolving system issues, driving continuous improvement, and supporting the development teams with tools, frameworks, and infrastructure to achieve seamless integration and deployment.
Job Duties:
- Measures and optimizes system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement
- Gathers and analyzes metrics from operating systems as well as applications to assist in performance tuning and fault finding
- Designs, implements, and maintains automation scripts and tools to improve system reliability and efficiency
- Partners with development teams to improve services through rigorous testing, release procedures, and robust instrumentation
- Establishes and manages service-level objectives (SLOs) to define system reliability goals
- Balances feature development speed and reliability with well-defined service-level objectives
- Establishes monitoring practices in order to proactively resolve system issues
- Works with Digital Experience Engineers and Developers to implement application instrumentation via code and APIs
- Establishes and utilizes OpenTelemetry (OTel) standards for systems that cannot be monitored via traditional tool sets
- Supports incident management processes, including real-time troubleshooting, root cause analysis, and preventative action plans
- Creates, maintains, and upgrades system documentation to reflect the latest infrastructure and processes
- Other duties as required
Supervisory Responsibilities
Qualifications, Knowledge, Skills, and Abilities:
Education:
- High School diploma / GED, required
- Bachelor's Degree in Computer Science or Information Systems, preferred
Experience:
- Four (4) or more years of experience in .NET, JavaScript, or similar programming languages, required
- Four (4) or more years of experience working in a cloud or hybrid infrastructure, preferred
- Three (3) or more years of experience leading projects, preferred
- Experience with distributed storage technologies like NFS, HDFS, and Amazon S3, preferred
License/Certifications:
- Microsoft AZ-305 certification, preferred
- Dynatrace Certified Professional, preferred
Software:
- Experience with modern Microsoft Windows Servers, required
- Experience with Microsoft Azure, required
- Experience with Cisco ThousandEyes and Dynatrace monitoring platforms, preferred
- Experience or knowledge managing CI/CD pipelines (e.g., Jenkins, GitHub Actions, Azure DevOps), preferred
- Experience or knowledge of Terraform, preferred
- Experience or knowledge of VMware ESXi/vSphere, preferred
- Experience or knowledge of Linux (Red Hat), preferred
Language:
Other Knowledge, Skills & Abilities:
- Proactive approach to identifying problems, performance bottlenecks, and areas for improvement
- Understanding of observability standards, including OpenTelemetry, and experience implementing them in diverse environments
- Strong verbal and written communication skills
- Excellent interpersonal and customer relationship skills
- Capacity to work in a deadline-driven environment while handling multiple complex projects/tasks simultaneously with a focus on details
- Capable of successfully multi-tasking while working independently or within a group environment
- Ability to rely on extensive experience and judgment to plan and accomplish goals
- Capable of working well under pressure while dealing with unexpected problems in a professional manner
- Capacity to communicate and interact with all levels of employees and management
- Ability to interact and build consensus among people
- Ability to work after standard business hours and travel (up to 25%)
Individual salaries that are offered to a candidate are determined after consideration of numerous factors including but not limited to the candidate's qualifications, experience, skills, and geography.
National Range:
$120,000 - $150,000
Maryland Range:
$120,000 - $150,000
NYC/Long Island/Westchester Range:
$120,000 - $150,000