Proofpoint is hiring a Senior Service Reliability Engineer to provision, maintain, and scale our production services and server farms across multiple data centers.
As a Senior Service Reliability Engineer at Proofpoint, you will develop a deep understanding of the services and applications that come together to deliver Proofpoint’s next-generation security products. You will contribute to the architecture to improve scalability, operability, service reliability, capacity, and performance. You will be responsible for provisioning, maintaining, and scaling our production services and server farms across multiple data centers worldwide. We are looking for someone who brings passion, curiosity, and attention to detail; who takes pride in their work and takes ownership; and who has ideas and opinions. If you’re an enthusiastic team player who cares about infrastructure, remains calm in a crisis, collaborates cross-functionally, and easily writes code for automation, we want to talk to you.
- Build long-lasting, effective partnerships across the organization to foster collaboration between Product, Engineering, and Operations teams.
- Organize and manage multiple simultaneous projects.
- Lead by example, care for your team, and establish credibility with the quality of your and your team’s technical execution.
- Manage an international 24×7, multi-site production infrastructure powering the Proofpoint services, including deployment, maintenance, troubleshooting, performance tuning, and security.
- Root-cause complex scaling and performance problems that span network, hardware, and software and involve multiple stakeholders.
- Ensure proper monitoring, alerting, capacity planning and reporting in the production environment.
- Contribute to the evolving design and architecture of reliable and scalable infrastructure.
- Collaborate with product engineering teams to ensure Operations standards are observed, determine resource impacts for upcoming product deployments, and ensure successful product rollouts.
- Participate in an on-call rotation and be willing to jump on escalated issues as needed.
What you bring to the team
- Extensive experience managing, troubleshooting, and tuning Linux systems.
- Demonstrable experience working in a high-volume, large-deployment, multi-data-center environment.
- Competence in automating the management of systems and applications using Perl, Python, or Ruby.
- Experience with foundational, industry-standard technologies such as TCP/IP, HTTP, DNS, SMTP, and LDAP.
- Previous experience managing a large distributed computing environment.
- Experience with virtualization technologies such as KVM and VMware vSphere (ESX, ESXi, and vCenter).
- Excellent verbal and written communication skills.
- Experience with monitoring and alerting systems.
- Experience with industry-standard operational practices such as change management and incident management, and with working in colocation data centers.
- Extensive experience with configuration management tools such as Puppet or Chef.
- Past experience implementing load-balancing technologies such as F5, NetScaler, or similar.
- Experience with Kafka, Elasticsearch, and Cassandra is desirable.
- Experience with services, tools, platforms and infrastructure components offered by public cloud providers such as AWS or Rackspace Cloud.