Systems we've shipped

Real client projects. Real problems solved. From MLOps platforms to cloud migrations to enterprise AI.

MLOps & Infrastructure · Ongoing since Jan 2026

Multi-Region MLOps Platform

Production MLOps platform on AWS EKS with KubeRay, MLflow, and GitOps for a major European retailer.

AWS EKS · KubeRay · MLflow · ArgoCD · Terraform · Karpenter

The Challenge

One of Europe's largest retail cooperatives needed a production-grade MLOps platform that data scientists could actually use without filing infrastructure tickets. The platform had to span multiple AWS regions, support GPU workloads for model training, and give teams self-service access to experiment tracking, model registry, and notebook environments—all while meeting enterprise security and compliance requirements.

What We Did

  • AWS EKS with KubeRay: Multi-region Kubernetes clusters running distributed Ray for scalable ML training and inference workloads
  • MLflow + JupyterHub: Self-service experiment tracking, model registry, and notebook environments (see the sketch after this list)
  • ArgoCD GitOps pipeline: Declarative deployments from Git, automated sync, rollback capabilities, full audit trail
  • GPU-aware Karpenter autoscaling: Right-sized GPU instances provisioned on demand, scaling down when idle
  • Observability & security: Prometheus + Loki, External Secrets, Trivy and Checkov for container and IaC scanning
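
To make the self-service loop concrete, here's a minimal sketch of the data scientist's side of it: track a run, then promote the result to the shared registry. The tracking URI, experiment, and model names are illustrative placeholders, not the client's actual setup.

```python
import mlflow

# Platform-hosted MLflow; URI is a placeholder.
mlflow.set_tracking_uri("https://mlflow.example.internal")
mlflow.set_experiment("demand-forecast")

with mlflow.start_run() as run:
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("val_rmse", 12.4)
    # ... train and log a model under the "model" artifact path, then
    # promote it to the shared registry:
    mlflow.register_model(f"runs:/{run.info.run_id}/model", "demand-forecast")
```

No infrastructure ticket in sight: the platform supplies the tracking server, the registry, and the permissions behind that one URI.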

What Actually Mattered

The hardest part wasn't any single technology—it was making them work together reliably across regions. GPU workloads have different scaling characteristics than web services. Karpenter had to understand GPU instance types, spot availability, and cost trade-offs.
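
Here's roughly what that understanding looks like in practice: a Karpenter NodePool dedicated to GPU work, sketched as a Python dict applied with the Kubernetes client. Assumes Karpenter v1 on EKS; the names, limits, and spot-first policy are illustrative, not the client's actual configuration.

```python
from kubernetes import client, config

gpu_nodepool = {
    "apiVersion": "karpenter.sh/v1",
    "kind": "NodePool",
    "metadata": {"name": "gpu-training"},
    "spec": {
        "template": {"spec": {
            "nodeClassRef": {"group": "karpenter.k8s.aws",
                             "kind": "EC2NodeClass", "name": "gpu"},
            "requirements": [
                # Only NVIDIA GPU instance types qualify for this pool
                {"key": "karpenter.k8s.aws/instance-gpu-manufacturer",
                 "operator": "In", "values": ["nvidia"]},
                # Prefer spot, fall back to on-demand when spot dries up
                {"key": "karpenter.sh/capacity-type",
                 "operator": "In", "values": ["spot", "on-demand"]},
            ],
            # Taint GPU nodes so only ML workloads schedule onto them
            "taints": [{"key": "nvidia.com/gpu", "effect": "NoSchedule"}],
        }},
        # Hard cap on total GPUs keeps cost bounded
        "limits": {"nvidia.com/gpu": "16"},
        # Empty GPU nodes are expensive; reclaim them quickly
        "disruption": {"consolidationPolicy": "WhenEmpty",
                       "consolidateAfter": "5m"},
    },
}

config.load_kube_config()
client.CustomObjectsApi().create_cluster_custom_object(
    group="karpenter.sh", version="v1", plural="nodepools", body=gpu_nodepool)
```

In the platform itself nothing is applied by hand like this, of course: the manifest lives in Git and ArgoCD syncs it, which is the point of the next paragraph.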

Everything is Terraform. Every cluster, every ArgoCD application, every security policy. If it's not in Git, it doesn't exist.

AI/ML Engineering · Ongoing since May 2025

Enterprise GenAI Platform

Enterprise generative AI platform on Azure with multi-provider orchestration, EU AI Act compliance, and automated evaluation.

Azure AI Foundry · Databricks · FastAPI · Semantic Kernel · MCP

The Challenge

A major European energy company needed to bring generative AI into their operations—but in a way that met EU AI Act and GDPR requirements, supported multiple model providers, and could be governed and evaluated at enterprise scale. Not a chatbot demo. A production platform that business units could actually build on.

What We Did

  • Azure AI Foundry + Databricks: Enterprise AI platform with centralized model management, fine-tuning pipelines, and governed data access
  • Multi-provider model orchestration: Azure OpenAI and Anthropic models behind a unified API, with routing based on task requirements and cost (see the sketch after this list)
  • GenAIOps evaluation framework: Automated quality evaluation, regression testing for prompt changes, performance benchmarking
  • MCP tool integrations: Model Context Protocol for connecting LLMs to internal systems, enabling agentic workflows
  • Zero-trust security & compliance: EU GDPR and AI Act compliant architecture, data residency controls, audit logging
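
A sketch of what "one API, several providers" looks like at the FastAPI layer. The routing rule, endpoint, and model names are simplified placeholders; the real platform routes on task requirements and cost, not a single if-statement.

```python
from fastapi import FastAPI
from pydantic import BaseModel
from openai import AzureOpenAI      # reads endpoint/key/api version from env
from anthropic import Anthropic     # reads ANTHROPIC_API_KEY from env

app = FastAPI()
azure = AzureOpenAI()
anthropic = Anthropic()

class CompletionRequest(BaseModel):
    prompt: str
    task: str = "general"   # e.g. "general", "long-context"

@app.post("/v1/complete")
def complete(req: CompletionRequest) -> dict:
    # Toy routing rule: long-context tasks go to Claude, the rest to Azure OpenAI.
    if req.task == "long-context":
        msg = anthropic.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder model id
            max_tokens=1024,
            messages=[{"role": "user", "content": req.prompt}],
        )
        return {"provider": "anthropic", "text": msg.content[0].text}
    resp = azure.chat.completions.create(
        model="gpt-4o",  # Azure deployment name, placeholder
        messages=[{"role": "user", "content": req.prompt}],
    )
    return {"provider": "azure-openai", "text": resp.choices[0].message.content}
```

The unified response shape is what matters: callers never see which provider answered, so routing can change without breaking anyone downstream.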

What Actually Mattered

Enterprise GenAI isn't about picking the best model; it's about governance: which data can flow where, who approved which prompt template, and how you prove to regulators that the system meets EU AI Act requirements.

The evaluation framework was critical. Without automated quality checks, every prompt change is a gamble. Python, FastAPI, and Semantic Kernel power the backend orchestration.
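
The shape of that check, sketched as a pytest-style regression gate: every prompt-template change has to clear the same golden set before it ships. The endpoint, file path, grader, and 0.9 threshold are placeholders; the real framework used richer, model-graded evaluation.

```python
import json
import requests
import pytest

# Golden set of graded examples checked into the repo; path is a placeholder.
with open("eval/golden_set.json") as f:
    GOLDEN_SET = json.load(f)  # [{"input": ..., "expected": ...}, ...]

def run_prompt(template: str, user_input: str) -> str:
    # Placeholder endpoint; the real platform exposes a governed completion API.
    r = requests.post(
        "https://genai.example.internal/v1/complete",
        json={"template": template, "input": user_input},
        timeout=30,
    )
    return r.json()["text"]

def score(answer: str, expected: str) -> float:
    # Simplest possible grader: normalized exact match. The real framework
    # used model-graded evaluation; this placeholder stands in for it.
    return 1.0 if answer.strip().lower() == expected.strip().lower() else 0.0

@pytest.mark.parametrize("case", GOLDEN_SET)
def test_prompt_change_does_not_regress(case):
    answer = run_prompt("summarize-v2", case["input"])
    assert score(answer, case["expected"]) >= 0.9  # threshold is illustrative
```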

Cloud Architecture · Jan–Jul 2024

Cloud Backend for Mobile Gaming

Serverless cloud backend on AWS Fargate for mobile games serving hundreds of millions of players globally.

AWS Fargate · ECS · Terraform · CI/CD

The Challenge

The client's games serve hundreds of millions of players globally. Backend services needed to handle massive concurrent load, scale elastically, and deploy reliably without impacting live games. The infrastructure had to be fully automated and reproducible across environments.

What We Did

  • AWS Fargate on ECS: Serverless container orchestration, eliminating server management while maintaining fine-grained deployment control
  • Terraform IaC: Complete infrastructure defined in code, reproducible across staging and production
  • Elastic scaling: Auto-scaling tuned for gaming traffic patterns—daily peaks, viral spikes, and event-driven load
  • CI/CD pipeline: Automated build, test, and deploy with canary releases to catch issues before full rollout

What Actually Mattered

Gaming backend traffic is unlike anything else. A new feature launch or social media moment can 10x your load in minutes. The scaling policies had to be aggressive enough to handle spikes but smart enough not to burn money during quiet hours.
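
Concretely, that tuning lives in target-tracking policies. A hedged sketch with boto3 (cluster, service, and every number are illustrative): a short scale-out cooldown so spikes are absorbed fast, paired with a long scale-in cooldown so quiet hours drain capacity gradually instead of flapping.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the ECS service as a scalable target; names are placeholders.
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/game-backend/matchmaking",
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=10,
    MaxCapacity=500,
)

# Target tracking on CPU: scale out after 60s, scale in only after 10 minutes.
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId="service/game-backend/matchmaking",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 55.0,  # headroom for spikes between scaling events
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 600,
    },
)
```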

Fargate removed the toil of managing EC2 instances, but the trade-off is less control over the underlying compute. We designed services that worked within Fargate's constraints while meeting performance requirements.

Cloud Migration · Sep 2024–Mar 2025

Datacenter to GCP Migration

Full datacenter-to-GCP migration with Cloud Run, serverless functions, and automated data pipelines.

GCP · Cloud Run · Cloud Functions · Terraform · Python

The Challenge

The client was running their entire stack in a co-location datacenter—physical servers, manual deployments, limited scalability. They needed to move to the cloud without disrupting their business, modernize their application architecture along the way, and set up proper data pipelines for their growing analytics needs.

What We Did

  • GCP Cloud Run: Containerized services running serverless—no cluster management, automatic scaling, pay-per-request
  • Python Cloud Functions: Event-driven processing for data ingestion, scheduled jobs, and third-party integrations (see the sketch after this list)
  • Terraform IaC: All GCP resources defined in code, repeatable environments, version-controlled changes
  • Data pipelines: Structured data flows replacing ad-hoc manual exports with automated, reliable pipelines
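
The event-driven pieces are small by design. A minimal sketch of an ingestion function (2nd-gen Cloud Functions); the bucket wiring and processing step are placeholders:

```python
import functions_framework

@functions_framework.cloud_event
def ingest_upload(cloud_event):
    """Runs whenever a file lands in the ingestion bucket (GCS finalize event)."""
    data = cloud_event.data
    bucket, name = data["bucket"], data["name"]
    print(f"Processing gs://{bucket}/{name}")
    # ... parse, validate, and hand off to the analytics pipeline ...
```

Deployed from CI with Terraform-managed triggers, the same pattern covered scheduled jobs and third-party webhooks.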

What Actually Mattered

Moving from a datacenter to the cloud isn't just a lift-and-shift. The application architecture had to change to take advantage of serverless and managed services. We containerized services, broke apart tightly coupled components, and introduced proper CI/CD where there was none before.

The migration was phased—least critical services first, then core systems. The co-location contract had a hard deadline, so planning and execution had to be tight.

Fullstack Development · May–Dec 2023

Psychometric Testing Platform

Psychometric testing platform with Next.js, Go, and Nomad on GCP for a scientific publisher.

Next.js · Go · Terraform · Pulumi · Nomad

The Challenge

A scientific publisher and psychological testing company needed a modern platform for delivering psychometric assessments. The system had to handle sensitive test data, provide a smooth experience for test-takers, and support complex scoring algorithms—all running on reliable infrastructure with zero-downtime deployments.

What We Did

  • Next.js frontend: Server-rendered UI for assessment delivery, optimized for accessibility and cross-device compatibility
  • Go backend services: High-performance scoring engine and API layer for complex psychometric calculations
  • GCP infrastructure with Terraform & Pulumi: Dual IaC approach leveraging the strengths of each tool
  • Nomad for container orchestration: Zero-downtime deployments, chosen for operational simplicity over Kubernetes

What Actually Mattered

Psychometric data is sensitive—test integrity depends on it. The platform needed strict access controls, audit logging, and data isolation between organizations.

We chose Nomad over Kubernetes deliberately. Its operational simplicity meant the client's team could operate the platform themselves without dedicated DevOps staff. The right tool isn't always the most popular one.

DevOps & Security · Apr–Nov 2022

Payments Hub Modernization

Kubernetes modernization and CI/CD pipeline overhaul for a Nordic bank's payments hub.

Kubernetes · Ansible · Jenkins · Kafka · IBM MQ

The Challenge

A major Nordic bank's payments hub—handling millions of transactions—needed its containerized stack modernized. The existing infrastructure had grown organically, with inconsistent deployment practices, manual configuration management, and aging CI/CD pipelines. In banking, reliability isn't optional and every change carries regulatory scrutiny.

What We Did

  • Kubernetes modernization: Standardized container orchestration across the payments stack with consistent deployment patterns
  • Ansible automation: Configuration management replacing manual server setup, ensuring consistency across environments
  • Jenkins CI/CD pipelines: Modernized build and deployment with testing stages, security scanning, and approval gates
  • Messaging infrastructure: Kafka and IBM MQ integration for reliable, ordered transaction processing

What Actually Mattered

Banking infrastructure moves slowly for good reason. Every change needs audit trails, rollback plans, and regulatory sign-off. The challenge was implementing changes in an environment where a failed deployment could affect millions of payment transactions.

We introduced changes incrementally, with extensive testing at each stage. The Kafka and IBM MQ messaging layer was particularly sensitive—message ordering and exactly-once delivery aren't negotiable when processing payments.
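
On the Kafka side, most of that guarantee comes from producer configuration, not heroics. A hedged sketch with confluent-kafka (brokers, topic, key, and payload are placeholders; full end-to-end exactly-once across consume-process-produce additionally needs Kafka transactions, elided here):

```python
from confluent_kafka import Producer

# Idempotent producer: broker-side deduplication plus ordered retries.
# enable.idempotence requires acks=all and capped in-flight requests, so
# retried sends cannot reorder or duplicate payment events.
producer = Producer({
    "bootstrap.servers": "broker1:9092",   # placeholder
    "enable.idempotence": True,
    "acks": "all",
    "max.in.flight.requests.per.connection": 5,
})

def delivery_report(err, msg):
    if err is not None:
        # A failed delivery must surface, never be silently dropped.
        raise RuntimeError(f"Delivery failed: {err}")

# Keying by account pins all of one account's events to a single partition,
# which is what preserves per-account ordering.
producer.produce("payments.transactions", key="account-42",
                 value=b'{"amount": 1000}', callback=delivery_report)
producer.flush()
```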

Have a similar problem?

Let's talk about what you're building and whether we can help.

Talk to Our Team