Job Details
Description
Perform effective Incident Management from incident start through resolution, partnering with Development to determine root causes and driving rigorous Problem Management to follow through on actions
Identify and resolve issues proactively
Own the production environment: monitor availability, ensure holistic system health, and handle application deployments
Triage and remediate production systems
Resolve issues within SLA
Achieve 90% automation and reduce manual intervention
Be the primary operational support engineer for multiple large, distributed, critical software applications
Job Requirements
Engineering degree with 5+ years of experience in Application Support
Strong understanding of modern monitoring and logging technologies (Logz.io, CloudWatch, Splunk, Dynatrace, New Relic, AppDynamics, etc.)
Understanding of microservice architecture
Experience with Unix, shell scripting/Python, SQL, AWS, etc.
Experience in troubleshooting complex application and environment issues
Excellent communication, presentation, and documentation skills
Strong experience with Intake, Problem Management, and Service Availability Management
Basic knowledge of CI/CD tools and concepts
Knowledge of ITIL processes
Ready to work in shifts