Director, Service Reliability Engineering

Location UK / Belfast

Job Type Permanent full-time

Salary Not disclosed

Updated 2 days ago

Reference 1404488

Job Description

We are looking for an outstanding Director of Service Reliability Engineering with extensive experience working with both Operations and Product Engineering teams on service delivery in highly available, large-scale production environments.

Company Overview

At Proofpoint, we have a passion for protecting people, data, and brands from today’s advanced threats and compliance risks. We hire the best people in the business to:

  • Build and enhance our proven security platform
  • Blend innovation and speed in a constantly evolving cloud architecture
  • Analyze new threats and offer deep insight through data-driven intel
  • Collaborate with customers to help solve their toughest security challenges

We are singularly devoted to helping our customers protect what matters most. That’s why we’re a leader in next-generation cybersecurity—and why more than half of the Fortune 100 trust us as a security partner.

The Role

We are looking for an outstanding Director of Service Reliability Engineering with extensive experience working with both Operations and Product Engineering teams on service delivery in highly available, large-scale production environments. With the majority of the Fortune 100 as customers, we have high standards when it comes to production availability, response time, and business continuity practices.

In this role, you will lead a global team of Operations Service Reliability Engineers whose job is to keep our core production infrastructure systems online, scalable, and ahead of the business services that depend on them. You’ll be responsible for developing an infrastructure services roadmap and leading your team through the execution of continuous service improvement initiatives.

In your role as Director of Service Reliability Engineering, you will also:

  • Report to the Senior Director of Global Infrastructure Operations
  • Strategically drive our evolving Operations production design and service architecture, developing reliable and scalable infrastructure with proper monitoring, alerting, documentation, and capacity planning
  • Manage the international 24×7 multi-site infrastructure that powers Proofpoint’s production services
  • Develop the leaders of tomorrow by training and mentoring our global staff of Operations personnel
  • Manage the team’s service coverage strategies and on-call rotations
  • Own the CapEx and OpEx forecast for Proofpoint’s shared infrastructure platforms

Your day-to-day

  • Manage a global team of Service Reliability Engineers, supporting our production infrastructure and the many mission-critical business services that depend on it
  • Work with other technical leaders in Operations to define and implement process and technical standards that improve performance, stability, and standardization across our environments
  • Understand large-scale product architectures and the production ecosystem in which they live in order to develop capacity plans, SLA metrics and standards, plan product launches and releases, and solve operability challenges
  • Own the uptime, reliability, scalability, and security of our core infrastructure platforms
  • Own the shared compute and storage infrastructure: what we use, where we use it, how much it costs, and how we can constantly improve it
  • Create comprehensive plans to solve technical, process, and staffing challenges, and relentlessly drive those plans forward to achieve business success
  • Develop the technical and soft skills of your team members and participate as part of the broader Operations leadership team to develop the organization’s skills as a whole
  • Work closely with our recruiting staff to expand the team, including sourcing and interviewing candidates, participating in conferences/events, and onboarding new hires

What you bring to the team

  • Extensive experience directly managing technical staff in a large-scale Linux-based production environment
  • Previous experience as a member of a technical team responsible for a production environment
  • Exceptional verbal and written communication skills and executive presence
  • Experience managing a large distributed computing environment and the associated configuration management, orchestration, and monitoring tools (Puppet, Chef, Juju, Nagios, etc.)
  • Extensive knowledge of public and private cloud infrastructure and modern virtualization technologies (AWS, GCP, VMware, OpenStack, etc.)
  • Experience with industry-standard operational practices such as change management, incident management, and working in colocation datacenters
  • Previous responsibility for hiring and performance management decisions, including managing both superstars and underperformers
  • Solid understanding of developer tools and concepts such as Agile, continuous integration, version control, and test-driven development
  • BS or MS in Computer Science, Engineering, or a related technical discipline, or equivalent experience

Why Proofpoint

As a customer-focused and driven-to-win organization with leading-edge products, there are many exciting reasons to join the Proofpoint team. We believe in hiring the best and the brightest and in cultivating a culture of collaboration and appreciation. As we continue to grow and expand globally, we understand that hiring the right people and treating them well is key to our success! We are a multinational company with locations in 10 countries, with each location contributing to Proofpoint’s amazing culture!