We’re hiring a Site Reliability Engineer

ExpressVPN is looking for SREs to join our small but growing Cloud Platform tribe. If you identify as an SRE with prior platform and observability experience, or as a Software Engineer passionate about building resilient and scalable systems, this could be the role for you. With your ability to dive deep into problem spaces and come up with automated solutions, you will help shape company-wide initiatives to improve service reliability and customer satisfaction. You will work closely with product development teams in different business domains, including teams whose services handle millions of requests per minute from millions of users across the world.

 

What you’ll be doing

We are open to varying degrees of experience; the more experienced you are, the more we’ll expect your expertise to show.

  • Design, build and operate the services we consume from AWS and the shared platform services we run on top of them.
  • Embed and pair with product development teams across the company to solve application and infrastructure challenges.
  • Help meet reliability objectives including service readiness, SLOs and SLAs across the business.
  • Offer consultation on reliability and on creating scalable, secure and resilient systems.
  • Build tools and infrastructure to make developers’ lives easier.

 

What you’ll need to succeed 

We do not expect you to have a deep understanding of, or experience with, everything listed, but you should be willing to develop in the areas where you have less experience:

  • Excellent written and verbal communication skills.
  • Working knowledge of scalable architectures and performance optimization techniques for services that handle millions of requests per minute from millions of users across the world.
  • Exceptional interpersonal skills: empathy, negotiation skills, problem-solving acumen, emotional intelligence.
  • Solution-driven, with a track record of breaking down complex problems and measuring results.
  • 3+ years of experience with a public cloud provider such as AWS or GCP.
  • 3+ years of experience with observability concepts and solutions such as Prometheus, Datadog or Grafana, including their use in creating resilient systems.
  • 3+ years of experience working with databases and object storage such as MySQL, PostgreSQL and S3.
  • Experience in Linux environments with the ability to troubleshoot problems at the OS, database, server, or network level.
  • Strong experience being on call for mission-critical services, managing incidents and running postmortems.
  • Excellent understanding of Infrastructure as Code (IaC) concepts and tooling such as Terraform or CloudFormation.
  • Experience with at least one programming language, such as Python or Golang.
  • Familiarity with software development best practices including test-driven development, continuous delivery and agile methodologies.
  • Eagerness to learn and improve your skill set.

 

Nice skills to have, but not required 

  • Experience operating services at scale on top of Kubernetes, ideally with a service mesh such as Istio.
  • Experience with distributed microservices architectures.
  • Familiarity with caches and message queues such as Redis and RabbitMQ.
  • Knowledge of OKRs.
  • Ability to participate in build versus buy decisions.

Please upload your resume as a PDF and do not include compensation information.

About Us

For more than 11 years, we’ve paved the way towards a more private and secure digital world. We’re a global SaaS company and an industry leader in cybersecurity. Millions of consumers worldwide use our internet privacy and security products every day.

Our team of over 800 employees spans the planet. Team members work from major international hubs like London, Hong Kong, Singapore, Tokyo, Toronto, Taiwan, Poznań, and more.

We’re profitable, and we’re growing. Right now, we’re hiring talent across all functions: software development and engineering, product, data analytics, marketing, content, and people.

We’d love you to join us and be part of the team.