Job Site Reliability Engineer (Kafka) in Madrid

Talent Hackers in Madrid

Site Reliability Engineer (Kafka)

Talent Hackers Madrid

On request

Office Full-time
Development 2-5 years
By Talent Hackers

Partnership with Talent Hackers

Job description


We are looking for passionate and innovative professionals eager to join our Platform Systems team, part of the Developer Experience group. The team ensures that our Cloud platform is operated using the best methodologies and tools, enabling us to delight our customers with the best Cloud experience. This position focuses on operating, scaling and automating the Kafka and streaming infrastructure to the highest standards of performance, availability and service level, while also ensuring it runs cost-effectively. The new Platform Engineer for Kafka should bring previous experience in Software Engineering, Platform Engineering or Site Reliability Engineering and a particular interest in distributed systems and large-scale, data-intensive pipelines.

  • Monitoring and reliability. Use and own the specifications of our monitoring, telemetry, reliability and automation toolset to assess the status of the data pipeline based on workloads.
  • Operation. Manage the availability of Kafka clusters and streaming platforms that power data pipelines. Understand and be able to communicate scale, capacity, security, redundancy and performance attributes and requirements.
  • Incident management and response. Detect, diagnose and correct incidents by finding solutions to achieve required Service Levels. Own the post-mortem process of such incidents by writing technical content for both customers and internal stakeholders.
  • Work with architects, team leads and developers on activities such as system design consulting, software platform and framework development, capability planning and release reviews.
  • Contribute to the tooling and automation framework for infrastructure provisioning and scaling, with a focus on resiliency and elasticity strategies.
  • You will have on-call responsibilities in rotation with the engineering team.

  • Minimum 5 years of experience as a Software Engineer, DevOps Engineer, Platform Engineer or Site Reliability Engineer, with knowledge of professional software development best practices.
  • Experience with distributed systems and streaming technologies in general, and familiarity with Apache Kafka in particular.
  • Experience operating services on Linux systems.
  • Experience with monitoring solutions such as New Relic, Prometheus, Grafana and others. Experience administering and deploying on cloud-based platforms (Azure, AWS, Google and/or others), using infrastructure as code (CloudFormation, Terraform, etc.), configuration management tools (Ansible, Puppet) and pipeline creation tools (such as Jenkins, GitHub Actions, GitLab).
  • At ease operating and managing production systems, resolving issues while striking the right balance between urgency and methodology.
  • Experience with AWS MSK.
  • Experience working with Kubernetes and writing custom operators.
  • Experience with Kafka in-depth configuration and performance optimisation.
  • Excellent written and verbal skills in English.