Site Reliability Engineer I

Requisition ID
US-CA-Agoura Hills
Position Type


This is a 24/7 team responsible for production systems health monitoring, deployment of code changes, escalation handling and standardized communication of all Change Management within the Technical Operations organization. Candidates must be comfortable communicating quickly and accurately, with both internal and external groups, in the event of production emergencies. This individual must also be comfortable navigating both Unix and Windows environments and be actively involved in troubleshooting and/or resolving production issues.

Job Description

  • Monitoring - 24x7x365 health monitoring of Windows and UNIX environments hosting various web-based, mobile and telephony platforms, using server, network and application monitoring systems under minimal supervision
  • Assist and provide first escalation support in the monitoring, troubleshooting and support of various Windows and Linux operating systems, databases, utilities, system tools and the hardware on which they reside under minimal supervision.
  • Conduct systems support activities, such as network and server monitoring, troubleshooting, escalation and resolution under minimal supervision.
  • Modify, maintain and update software, such as firmware, drivers, anti-virus and Windows Service Pack updates, performed through a practice of Change Management, Configuration Management and Release Management under minimal supervision.
  • Maintain Service Delivery to meet customer Service Level requirements, including Disaster Recovery and Business Continuity planning, data security and recovery, and Production level policies and procedures
  • Execute daily NOC operations, escalations, ticketing and communications with customers
  • Remain flexible and handle stressful situations, such as initiating emergency conference bridge calls and sending quick and accurate outage notifications.
  • Use Standardized Communications for Code Releases, Scheduled Maintenance and Service Interruptions.
  • Deployment/release of engineering code across multiple environments - all builds/releases communicated and applied to Staging and Production environments according to standard operating procedures under minimal supervision 
  • Perform other related duties as required and assigned
  • Demonstrate behaviors which are aligned with the organization’s desired culture and values

The ideal candidate will have the following:

  • System Administration skills a plus
  • Experience with Windows Server 2012, Active Directory, VMware, Amazon Web Services, Nagios, Opsview or ServiceNow is a plus
  • Financial Services and, if possible, mortgage industry experience preferred
  • Candidates must also be flexible to work a combination of day, evening and/or third shifts as needed
  • Must be a team player with strong attention to detail and able to work independently
  • Proven track record of delivering timely and accurate information in a fast-paced environment
  • Excellent critical-thinking, problem-solving and mathematical skills, and sound judgment
  • Strong business acumen and ability to interface with executive management

