$ whoami

David Pigna

Senior DevOps / SRE

I operate critical high-availability platforms, with experience in large-scale streaming, reliability, automation and observability.

Large-scale streaming, Millions of users, Production Kubernetes, Multi-cloud, Incident response

$ ls ./services

What I can do for you

I help teams stabilize, automate and scale production platforms, focusing on reliability, observability and real-world operations.

Platform Engineering

Your platform grows but operations become unpredictable. I design and operate Kubernetes clusters that scale without surprises: safe deployments, stable workloads and zero downtime on every release.

Kubernetes, Helm, Argo CD, Argo Rollouts, Karpenter

Infrastructure as Code

Manually built infrastructure that nobody knows how to reproduce. I automate and codify everything with Terraform, Terragrunt and Ansible so every environment is reproducible, auditable and free from manual work.

Terraform, Terragrunt, Ansible, AWS, GCP, OCI

Observability & SRE

You find out about problems only after users have reported them. I implement full observability stacks with metrics, logs and SLO-oriented alerts so you see problems before they impact your users.

Prometheus, Grafana, Loki, OpenTelemetry, SLOs, Alerting

CI/CD & Automation

Manual, slow or fragile deployments that block your team. I design continuous delivery pipelines that reduce feedback cycles, eliminate manual steps and make getting to production boring — in the best way.

GitHub Actions, GitLab CI/CD, Jenkins, Concourse

Troubleshooting & On-call

Incidents that repeat, postmortems that lead nowhere. I solve production problems with real root cause analysis and structural remediation so the same incident doesn't wake anyone up at 3am again.

Incident Management, Root Cause Analysis, Postmortem

Networking & Databases

Unexplained latency, databases that become everyone's bottleneck. I operate and optimize cloud networks and relational and NoSQL databases in production with a focus on performance, availability and reliable recovery.

Networking, PostgreSQL, Redis, MongoDB

$ grep -r "on-call" ./incidents/

When teams reach out

I usually get called when...

  • Kubernetes is growing but nobody wants to touch it
  • Incidents are reported by users before alerts fire
  • Deployments are manual, slow, or fragile
  • Nobody knows how to reproduce the infrastructure
  • The team needs a solid reliability foundation before the next stage of growth

$ cat tech-stack.yaml

Technical stack

Orchestration & Containers

  • Kubernetes
  • Helm
  • Docker
  • containerd
  • Argo Rollouts
  • Karpenter

IaC & Automation

  • Terraform
  • Terragrunt
  • Ansible

Cloud

  • AWS
  • GCP
  • Azure
  • OCI

CI/CD

  • GitHub Actions
  • GitLab CI/CD
  • Jenkins
  • Concourse CI
  • Argo CD

Observability

  • Prometheus
  • Grafana
  • Loki
  • Alertmanager
  • OpenTelemetry

Databases

  • PostgreSQL
  • MySQL
  • Redis
  • Valkey
  • MongoDB
  • Elasticsearch

Networking & Security

  • VPC
  • VPN
  • Nginx
  • Istio
  • cert-manager
  • Vault

Scripting & Languages

  • Bash
  • Python
  • Go for tooling

$ cat about.md

About me

I'm a Senior DevOps / SRE with experience in the reliability and operation of large-scale streaming platforms, focused on availability, automation and troubleshooting of distributed systems serving millions of users.

My focus is on reliability, automation and observability. I believe well-built infrastructure is the kind that doesn't need constant attention — because it was designed, automated and monitored from day one.

I work independently with clients who need to operate their systems seriously: scaling teams, startups that need a solid foundation, or companies looking to improve their reliability posture.

Outside the terminal

TTRPGs

Player and enthusiast of tabletop role-playing games. Founder of dadomanija.com, a news and community site for TTRPGs in Spanish. I also developed the website for the publisher MitoRol.

Music & Production

Active musician currently studying Music Production. The same attention to detail I bring to systems, I bring to sound.

$ ping david

Let's work together

Reach out if you're facing reliability, automation, or operations challenges — whether it's a specific project, ongoing support, or an initial assessment.

How I usually engage

Reliability assessment, K8s / Platform review, Observability & SLOs, Incident remediation, Ongoing SRE

response.json
{
  "availability": "open to discuss",
  "mode": "remote",
  "timezone": "UTC-3 / Argentina",
  "languages": ["es", "en"]
}