Senior Platform Engineer
Job posting

Place of work:
Budapest II. – Pasaréti út 83., Hybrid / Budapest
Working hours:
Full-time
Employment type:
Employee

Reference number: SEPLEN0401

In our AI Lab, we merge the stability of a bank with the dynamism of a startup. Our mission is to build groundbreaking AI products from scratch. We're looking for a Senior Platform Engineer to architect and build the high-availability, scalable platform that will power our entire AI operation.
 
Our platform will be built on a multi-region Azure foundation (AKS + Cosmos DB + Event Hubs). We are just starting to build our Platform team, and you will be a founding member. You won't just operate the platform; you will build it from the ground up, from the Terraform code for our AKS clusters to the CI/CD pipelines for our models. This is a hands-on role focused on engineering and automation. We follow SRE best practices, with the goal of a platform that achieves 99.9%+ availability.
 
What You'll Do
  • Build the Platform from Scratch:
    • Define new AKS clusters, networking (VNets), and IAM guardrails as code using Terraform and Helm charts.
    • Create "golden" Docker images, GitOps pipelines (ArgoCD/Flux), automatic node provisioning, and scaling policies for both CPU and GPU workloads.
    • Design and implement the core MLOps infrastructure, including artifact repositories, model registries, and feature stores.
  • Automate for Reliability:
    • Implement and fine-tune our observability stack: Azure Monitor metrics, Prometheus, and Grafana dashboards.
    • Build automated recovery mechanisms and chaos engineering tests to proactively find and fix weaknesses in the system.
  • Champion Platform Best Practices:
    • Work with development teams to ensure they are building reliable, observable, and secure applications from day one.
    • Create runbooks and documentation to prepare for future incident management.
 
 
Key Responsibilities
  • IaC Development and Maintenance: Manage our infrastructure state with Terraform Cloud or Atlantis.
  • Kubernetes Operations: Handle version upgrades, manage node pools (including GPU nodes), and define network policies.
  • Data Environment Reliability: Ensure the reliability of our data stores (e.g., Cosmos DB geo-replication, Event Hubs consumer group management).
  • Security Hardening: Implement security best practices, including CVE scanning for Docker images and regular patching of AKS node images.
  • Observability Pipeline: Manage log processing, alerting rules, and capacity forecasting to stay ahead of problems.
  • Support AI Engineers: Provide a self-service platform and tooling that enables AI Engineers to train, deploy, and monitor their models with minimal friction.
 
 
What You'll Bring
  • 5+ years of experience in a DevOps, SRE, or Platform Engineering role.
  • Deep, hands-on experience with at least one major cloud provider (Azure is a strong plus).
  • Proven experience with containerization (Docker) and orchestration (Kubernetes) in a production environment.
  • Expertise in Infrastructure as Code (Terraform is a must).
  • Strong programming skills in a scripting language (Python is a strong plus).
  • Experience building and maintaining production-grade CI/CD systems.
  • A proactive mindset focused on preventing incidents rather than just reacting to them.
 
 
What We Offer
  • A Greenfield Opportunity: You will build a state-of-the-art AI platform from the ground up, using the best tools for the job.
  • A Modern Toolkit: Work with GitHub, Kubernetes, Managed Grafana, Terraform, and the latest Azure AI services.
  • Real Impact: Your work is the foundation of our entire AI strategy, making you a critical enabler for the whole team.
  • Focus on Engineering, Not Firefighting: In the initial phase, your role is 100% focused on building and automating, not on reactive, on-call firefighting.
  • A Laid-back, Senior Team: We have one daily stand-up, then we focus on deep work.
  • Competitive Salary.
  • Home-office-friendly hybrid work, with a cool HQ in Budapest.
 
 
This is NOT the job for you if
  • You are looking for a role that is primarily about maintaining existing systems. We are building from scratch.
  • You enjoy manual configuration and doing the same task twice.
  • You are not passionate about building secure, reliable, and highly automated systems.

Job details

  • Field: Development
  • Language skills: Not required
  • Work arrangement: Hybrid