Kubernetes Autoscaling: Dynamic Scaling with HPA, VPA, and KEDA

In modern cloud-native applications, scaling is critical for both performance and cost optimization. The Kubernetes ecosystem offers three core autoscaling mechanisms that let your applications adapt automatically to changes in load: Horizontal Pod Autoscaler (HPA, built in), plus Vertical Pod Autoscaler (VPA) and Cluster Autoscaler, which are installed as add-ons from the kubernetes/autoscaler project.

This post is a comprehensive guide that runs from those core mechanisms to advanced event-driven scaling with KEDA (Kubernetes Event-Driven Autoscaling). With practical examples, best practices, and real-world scenarios you can use in production, you will be able to build your autoscaling strategy from scratch.

Table of Contents

Introduction to Autoscaling
Horizontal vs Vertical Scaling
Horizontal Pod Autoscaler (HPA)
Installing Metrics Server
CPU-Based Autoscaling with HPA
Custom Metrics and Prometheus HPA
Vertical Pod Autoscaler (VPA)
VPA Installation and Configuration
Combining HPA and VPA
KEDA: Event-Driven Autoscaling
Installing KEDA
KEDA Scalers: RabbitMQ, Kafka, Azure Queue
HTTP Request Scaling with KEDA
Cluster Autoscaler
Production Best Practices
Cost Optimization Strategies
Monitoring and Troubleshooting

Introduction to Autoscaling {#giris}

Why Autoscaling?

In the traditional approach, you allocate resources for peak load. This causes several problems:

Cost inefficiency: You pay for resources that sit idle overnight
Lack of elasticity: You fall short during unexpected traffic spikes
Manual intervention: You have to scale by hand during campaign periods

Autoscaling addresses these problems:

          
text

Traditional approach:
┌─────────────────────────────────┐
│  Peak load: 100 pods            │
│  Off-peak: 20 pods needed       │
│  Cost: 100 pods × 24/7          │ ❌ Inefficient
└─────────────────────────────────┘

Autoscaling:
┌─────────────────────────────────┐
│  Peak: 100 pods (automatic)     │
│  Off-peak: 20 pods (automatic)  │
│  Cost: 40 pods on average       │ ✅ 60% savings
└─────────────────────────────────┘
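
The savings figure in the diagram comes from comparing the average pod count with a fixed peak allocation. A minimal Python sketch of that arithmetic follows; the function name is illustrative and not part of any Kubernetes tooling.

```python
def savings_percent(peak_pods: int, average_pods: float) -> float:
    """Percentage saved by paying for the average pod count instead of the peak."""
    if peak_pods <= 0:
        raise ValueError("peak_pods must be positive")
    return (peak_pods - average_pods) / peak_pods * 100


# Numbers from the diagram: 100 pods at peak, 40 pods on average
print(f"{savings_percent(100, 40):.0f}% savings")
```

The real saving depends on how long the workload actually sits at each level, so the average should come from observed replica counts rather than a guess.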

Kubernetes Autoscaling Types

Type	What it does	Typical use
HPA (Horizontal)	Increases/decreases the number of pods	Scaling on CPU/memory or custom metrics
VPA (Vertical)	Adjusts pod resource requests (and, proportionally, limits)	Finding the right resource requests/limits
Cluster Autoscaler	Increases/decreases the number of nodes	Managing cluster capacity
KEDA	Event-driven scaling	Message queues, cron, HTTP requests

Horizontal vs Vertical Scaling {#horizontal-vs-vertical}

Horizontal Scaling (HPA)

Increases the number of pods:

          
bash

# Start: 3 pods
kubectl get pods
# web-app-6d4b8c7f9-abc12
# web-app-6d4b8c7f9-def34
# web-app-6d4b8c7f9-ghi56

# Load increases → HPA kicks in
# Result: 10 pods

Advantages:

Fault tolerance (if one pod crashes, the others keep serving)
Load balancing (traffic is spread across pods)
Wide scaling range (hundreds of pods, as long as the cluster has capacity)

Disadvantages:

Network overhead (pod-to-pod communication)
Added complexity for stateful applications

Vertical Scaling (VPA)

Increases the resources of each pod:

          
text

# Start: 100m CPU, 128Mi RAM
# Load increases → VPA kicks in
# Result: 500m CPU, 512Mi RAM (same workload; the pod is recreated with the new values)

Advantages:

Simplicity (the pod count does not change)
Suitable for stateful applications
No extra network overhead

Disadvantages:

Limited by node capacity
Requires a pod restart (in VPA "Auto" mode)

Which One Should You Use?

          
text

┌─────────────────────────────────────────────┐
│  Stateless Web Apps                         │
│  Microservices                              │
│  API Servers                                │ → HPA ✅
└─────────────────────────────────────────────┘

┌─────────────────────────────────────────────┐
│  Databases                                  │
│  Stateful Sets                              │
│  Single-instance Apps                       │ → VPA ✅
└─────────────────────────────────────────────┘

┌─────────────────────────────────────────────┐
│  Batch Processing                           │
│  Message Queue Consumers                    │
│  Cron Jobs                                  │ → KEDA ✅
└─────────────────────────────────────────────┘

Horizontal Pod Autoscaler (HPA) {#hpa}

How Does HPA Work?

HPA reads data from Metrics Server or from custom metrics sources and adjusts the pod count accordingly:

          
text

┌─────────────┐
│ HPA         │
│ Controller  │
└──────┬──────┘
       │
       │ 1. Read metrics (every 15 seconds)
       │
       ▼
┌─────────────┐    2. Decide:
│ Metrics     │    Current: CPU 80%
│ Server      │    Target: CPU 50%
└─────────────┘    → Scale-up needed
       │
       │ 3. Update the Deployment
       │
       ▼
┌─────────────┐
│ Deployment  │
│ replicas: 5 │ → 8
└─────────────┘

HPA Formula

          
text

desiredReplicas = ceil[currentReplicas × (currentMetricValue / targetMetricValue)]

Example:
- Current: 4 pods
- Current CPU: 80%
- Target CPU: 50%
- Calculation: ceil[4 × (80 / 50)] = ceil[6.4] = 7 pods
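
The formula is easy to check by hand, but a small function makes it simpler to try other values. The Python sketch below is a simplified model, not the controller's actual code; it also includes the 10% tolerance band that the HPA controller applies by default before it changes anything.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1) -> int:
    """Apply the HPA formula; skip scaling when the ratio is within tolerance."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: no change
    return math.ceil(current_replicas * current_value / target_value)


print(desired_replicas(4, 80, 50))  # the worked example above
```

The tolerance explains why an HPA sitting at 52%/50% leaves the replica count alone instead of adding a pod.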

Installing Metrics Server {#metrics-server}

HPA needs Metrics Server to read CPU and memory (resource) metrics:

          
bash

# Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify
kubectl get deployment metrics-server -n kube-system

# View metrics
kubectl top nodes
kubectl top pods -A

For Minikube/development:

          
bash

# Skip kubelet TLS verification on Minikube (development only)
kubectl patch deployment metrics-server -n kube-system --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

CPU-Based Autoscaling with HPA {#hpa-cpu}

A Simple CPU-Based HPA

Example application Deployment:

          
yaml

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        resources:
          requests:
            cpu: 100m      # ⚠️ Critical: required for utilization-based HPA
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  selector:
    app: web-app
  ports:
  - port: 80
    targetPort: 80
  type: ClusterIP

HPA configuration:

          
yaml

# hpa-cpu.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # Target 50% CPU (relative to requests)
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 minutes
      policies:
      - type: Percent
        value: 50         # Remove at most 50% of pods per period
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0   # Scale up immediately
      policies:
      - type: Percent
        value: 100        # Add at most 100% (double) per period
        periodSeconds: 15
      - type: Pods
        value: 4          # Or add at most 4 pods per period
        periodSeconds: 15
      selectPolicy: Max   # Use whichever policy allows the larger change
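
To see how the two scaleUp policies interact, the Python sketch below computes the upper bound for a single scale-up period under `selectPolicy: Max`. It is a simplification under stated assumptions: it ignores how the controller tracks changes already made within `periodSeconds`, and the function name is illustrative.

```python
import math


def max_replicas_after_scale_up(current: int, percent: int = 100, pods: int = 4) -> int:
    """Upper bound for one scale-up period with selectPolicy: Max."""
    by_percent = math.ceil(current * (1 + percent / 100))  # Percent policy
    by_pods = current + pods                               # Pods policy
    return max(by_percent, by_pods)


for replicas in (3, 10):
    print(replicas, "→ at most", max_replicas_after_scale_up(replicas))
```

At small replica counts the Pods policy is the more generous one, while the Percent policy takes over as the Deployment grows. That is the reason for pairing them.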

Deploy and test:

          
bash

# Deploy
kubectl apply -f deployment.yaml
kubectl apply -f hpa-cpu.yaml

# HPA status
kubectl get hpa web-app-hpa
# NAME           REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS
# web-app-hpa    Deployment/web-app   15%/50%   3         20        3

# Load test
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh

# Inside the container:
while sleep 0.01; do wget -q -O- http://web-app; done

# Watch the HPA from another terminal
kubectl get hpa web-app-hpa --watch

# Within 1-2 minutes:
# NAME           REFERENCE            TARGETS    MINPODS   MAXPODS   REPLICAS
# web-app-hpa    Deployment/web-app   85%/50%    3         20        3
# web-app-hpa    Deployment/web-app   85%/50%    3         20        6  ← Scale UP
# web-app-hpa    Deployment/web-app   52%/50%    3         20        6
# web-app-hpa    Deployment/web-app   48%/50%    3         20        6  ← Stabilized

Memory-Based HPA

          
yaml

# hpa-memory.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-memory-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70  # Target 70% memory

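⚠️ Things to watch for with memory-based HPA:

A memory leak makes it scale up continuously (which does not fix the leak)
Memory usage often does not drop once load subsides (it depends on garbage collection), so scale-down may never trigger
A CPU + memory combination is generally recommended
Attach only one HPA to a given Deployment; the memory and combined examples are alternatives to web-app-hpa, not additions to it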

Combined CPU + Memory HPA

          
yaml

# hpa-combined.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-combined-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  # ⚠️ HPA picks the HIGHEST replica count proposed by any metric
  # CPU → 6 pods, Memory → 8 pods → Result: 8 pods
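
The "highest proposal wins" rule can be sketched in a few lines of Python. The utilization readings below are hypothetical and chosen so that CPU proposes 6 pods and memory proposes 8; the function is an illustration, not the controller's implementation.

```python
import math


def replicas_for_metrics(current_replicas: int,
                         metrics: dict[str, tuple[float, float]]) -> int:
    """Return the largest per-metric proposal; metrics maps name -> (current, target)."""
    proposals = {
        name: math.ceil(current_replicas * current / target)
        for name, (current, target) in metrics.items()
    }
    return max(proposals.values())


# Hypothetical readings: 5 pods, CPU at 58% (target 50%), memory at 105% (target 70%)
print(replicas_for_metrics(5, {"cpu": (58, 50), "memory": (105, 70)}))
```

Taking the maximum means one saturated resource is enough to trigger a scale-up, even when the other metric is comfortably below its target.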

Custom Metrics and Prometheus HPA {#custom-metrics}

Why Are Custom Metrics Needed?

CPU and memory are not always the right signal:

          
text

Scenario 1: HTTP Request Rate
- Pod CPU: 30% (low)
- But 1000 req/sec are arriving
- Response time: 2 seconds (slow)
→ Scale on request rate, not CPU!

Scenario 2: Queue Depth
- 10,000 messages are waiting in the message queue
- Worker pod CPU: 40%
→ Scale on queue depth!

Scenario 3: Database Connection Pool
- DB connection pool is 95% full
- Pod CPU: 50%
→ Scale on connection saturation!

Installing Prometheus Adapter

          
bash

# Install Prometheus Adapter with Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Configuration (the quoted 'EOF' keeps the shell from expanding anything inside)
cat > prometheus-adapter-values.yaml <<'EOF'
prometheus:
  url: http://prometheus-server.monitoring.svc
  port: 80

rules:
  default: false  # Disable the default rules
  custom:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total"
      as: "http_requests_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
EOF

# Deploy
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  -n monitoring \
  -f prometheus-adapter-values.yaml

# Verify the custom metrics API
kubectl get apiservices | grep custom.metrics
# v1beta1.custom.metrics.k8s.io

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .

HTTP Request Rate HPA

The application exposes the metric:

          
yaml

# app-with-metrics.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: api
        image: your-api:v1.0
        ports:
        - containerPort: 8080
          name: http
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi

HPA configuration:

          
yaml

# hpa-custom-requests.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"  # Target: 100 req/sec per pod
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 50
        periodSeconds: 30
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60

Test:

          
bash

# HPA status
kubectl get hpa api-server-hpa
# NAME              REFERENCE               TARGETS     MINPODS   MAXPODS   REPLICAS
# api-server-hpa    Deployment/api-server   75/100      2         50        2

# Load test (Apache Bench)
ab -n 100000 -c 500 http://api-server/

# The HPA scales automatically (target: 100 req/sec per pod)
# 75 req/sec total → 2 pods (minReplicas)
# 250 req/sec total → 3 pods
# 1000 req/sec total → 10 pods
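
With an `AverageValue` target, the steady-state replica count is the total request rate divided by the per-pod target, rounded up and clamped to the HPA bounds. The Python sketch below uses the bounds from hpa-custom-requests.yaml as defaults; the function name is illustrative.

```python
import math


def replicas_for_request_rate(total_rps: float, target_per_pod: float = 100,
                              min_replicas: int = 2, max_replicas: int = 50) -> int:
    """Pods needed so the per-pod average stays at or below the target."""
    needed = math.ceil(total_rps / target_per_pod)
    return max(min_replicas, min(max_replicas, needed))


for rps in (75, 250, 1000):
    print(rps, "req/sec →", replicas_for_request_rate(rps), "pods")
```

The scaleUp policy in the manifest (at most 50% more pods every 30 seconds) limits how fast the HPA reaches these numbers, so a sudden spike is absorbed over several periods rather than in one step.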

Vertical Pod Autoscaler (VPA) {#vpa}

Why Use VPA?

Getting resource requests and limits right is hard:

          
yaml

# ❌ Problem 1: Requests too low
resources:
  requests:
    cpu: 100m      # Actual usage: 500m
    memory: 128Mi  # Actual usage: 512Mi
# Result: the node is overcommitted; the pod competes for CPU, runs slowly, and risks eviction or OOM kills

# ❌ Problem 2: Requests too high
resources:
  requests:
    cpu: 4000m     # Actual usage: 200m
    memory: 8Gi    # Actual usage: 256Mi
# Result: wasted resources, scheduling problems

How VPA helps:

          
text

VPA → Watches actual usage → Recommends/applies the right values

VPA Installation and Configuration {#vpa-setup}

          
bash

# Install VPA (official repo)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler

# Deploy
./hack/vpa-up.sh

# Verify
kubectl get pods -n kube-system | grep vpa
# vpa-admission-controller-xxx
# vpa-recommender-xxx
# vpa-updater-xxx

# CRDs
kubectl get crd | grep autoscaling
# verticalpodautoscalers.autoscaling.k8s.io

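VPA Modes

Mode	Behavior
Off	Only produces recommendations; does not apply them
Initial	Applies recommendations only when a pod is created
Recreate	Applies recommendations by evicting and recreating pods
Auto	Currently equivalent to Recreate: evicts pods to apply changes (be careful in production!)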

VPA Example: Recommendation Only

          
yaml

# vpa-recommendation.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"  # Recommend only, do not apply
  resourcePolicy:
    containerPolicies:
    - containerName: nginx
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 1000m
        memory: 1Gi
      controlledResources: ["cpu", "memory"]

Deploy and view the recommendations:

          
bash

# Deploy the VPA
kubectl apply -f vpa-recommendation.yaml

# View the recommendations
kubectl describe vpa web-app-vpa

# Example output:
# Recommendation:
#   Container Recommendations:
#     Container Name:  nginx
#     Lower Bound:
#       Cpu:     100m    # Minimum safe value
#       Memory:  128Mi
#     Target:
#       Cpu:     250m    # Recommended value ✅
#       Memory:  256Mi
#     Uncapped Target:
#       Cpu:     300m    # Recommendation ignoring minAllowed/maxAllowed
#       Memory:  300Mi
#     Upper Bound:
#       Cpu:     500m    # Maximum safe value
#       Memory:  512Mi
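
The relationship between Uncapped Target and the `minAllowed`/`maxAllowed` policy can be pictured as a simple clamp. The Python sketch below works in millicores and uses the CPU bounds from vpa-recommendation.yaml; it is an illustration of the policy limits only, and the real recommender may apply further constraints.

```python
def capped_target(uncapped_millicores: int, min_allowed: int, max_allowed: int) -> int:
    """Clamp an uncapped recommendation into the [minAllowed, maxAllowed] range."""
    return max(min_allowed, min(max_allowed, uncapped_millicores))


# Bounds from vpa-recommendation.yaml: 50m to 1000m CPU
print(capped_target(1500, 50, 1000))  # above maxAllowed
print(capped_target(20, 50, 1000))    # below minAllowed
```

If Target and Uncapped Target keep diverging, the policy bounds are constraining the recommendation and may need a second look.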

Applying it manually:

          
yaml

# Update the Deployment with the VPA recommendations
spec:
  containers:
  - name: nginx
    resources:
      requests:
        cpu: 250m      # VPA target
        memory: 256Mi
      limits:
        cpu: 500m      # Upper bound
        memory: 512Mi

VPA Auto Mode (Production)

          
yaml

# vpa-auto.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: worker-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  updatePolicy:
    updateMode: "Auto"  # Apply automatically (evicts and restarts pods)
  resourcePolicy:
    containerPolicies:
    - containerName: worker
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 4000m
        memory: 8Gi
      controlledResources: ["cpu", "memory"]
      # Leave limits untouched and adjust requests only (avoids tightening CPU limits and throttling)
      controlledValues: RequestsOnly

⚠️ Things to watch for in Auto mode:

Pods are evicted and restarted (possible downtime!)
Use a PodDisruptionBudget (PDB)
Be careful with stateful applications
In production, test in Off mode first

Combining HPA and VPA {#hpa-vpa-combo}

⚠️ Warning: HPA and VPA Conflicts

Problem: if HPA and VPA act on the same metric (CPU/memory), they conflict:

          
text

Scenario:
1. CPU at 80% → HPA scales up → 2 pods
2. VPA lowers the CPU request → less CPU requested per pod
3. CPU back at 80% → HPA scales up again → 4 pods
4. VPA lowers it again...
→ Feedback loop! ♾️
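
The conflict arises because HPA measures utilization relative to the request, which is the very value VPA changes. The Python sketch below uses hypothetical numbers (120m of real usage per pod, 4 replicas, a 50% target) to show the desired replica count moving even though the load stays constant.

```python
import math


def utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    """CPU utilization as HPA sees it: usage relative to the request."""
    return usage_millicores / request_millicores * 100


def hpa_desired(current_replicas: int, utilization: float, target: float = 50) -> int:
    """HPA formula applied to a utilization percentage."""
    return math.ceil(current_replicas * utilization / target)


usage = 120  # hypothetical per-pod CPU usage in millicores; load stays constant
for request in (200, 150):  # VPA lowers the request from 200m to 150m
    util = utilization_percent(usage, request)
    print(f"request={request}m utilization={util:.0f}% desired={hpa_desired(4, util)}")
```

Nothing about the traffic changed between the two lines of output; only the denominator did. That is why the two autoscalers should not both act on CPU or memory.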

✅ The Right Combination

          
yaml

# 1. HPA: use custom metrics (not CPU!)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 30
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second  # Custom metric ✅
      target:
        type: AverageValue
        averageValue: "100"
---
# 2. VPA: resource optimization only
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: api
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsOnly  # Adjust requests only ✅

Result:

HPA: Adjusts the pod count based on request rate (3-30 pods)
VPA: Sets the right CPU/memory requests for each pod
✅ No conflict!

KEDA: Event-Driven Autoscaling {#keda}

What Is KEDA?

KEDA (Kubernetes Event-Driven Autoscaling) extends HPA and works with 60+ event sources:

✅ Message Queues (RabbitMQ, Kafka, Azure Service Bus, AWS SQS)
✅ Databases (PostgreSQL, MySQL, Redis)
✅ Cloud Services (AWS CloudWatch, Azure Monitor, GCP Pub/Sub)
✅ HTTP Requests (Prometheus metrics)
✅ Cron (time-based scaling)

Advantage: it can scale to zero! (A standard HPA needs at least 1 pod unless the alpha HPAScaleToZero feature gate is enabled)

          
text

Traditional HPA:
┌──────────────────────────────────┐
│  03:00 - queue is empty          │
│  1 pod still runs                │ ❌ Wasted resources
└──────────────────────────────────┘

KEDA:
┌──────────────────────────────────┐
│  03:00 - queue is empty          │
│  0 pods (scale to zero)          │ ✅ No pod cost
│  Message arrives → scale up      │
└──────────────────────────────────┘

Installing KEDA {#keda-setup}

          
bash

# Install KEDA with Helm
helm repo add kedacore https://kedacore.github.io/charts
helm repo update

helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace

# Verify
kubectl get pods -n keda
# keda-operator-xxx
# keda-operator-metrics-apiserver-xxx

# CRDs
kubectl get crd | grep keda
# scaledobjects.keda.sh
# scaledjobs.keda.sh
# triggerauthentications.keda.sh

KEDA Scalers: RabbitMQ, Kafka, Azure Queue {#keda-scalers}

RabbitMQ Scaler

          
yaml

# rabbitmq-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq-consumer
spec:
  replicas: 0  # KEDA manages the replica count; start from 0
  selector:
    matchLabels:
      app: rabbitmq-consumer
  template:
    metadata:
      labels:
        app: rabbitmq-consumer
    spec:
      containers:
      - name: consumer
        image: your-consumer:v1.0
        env:
        - name: RABBITMQ_HOST
          value: "rabbitmq.default.svc.cluster.local"
        - name: QUEUE_NAME
          value: "tasks"
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
---
# rabbitmq-scaledobject.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-consumer-scaler
spec:
  scaleTargetRef:
    name: rabbitmq-consumer
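  minReplicaCount: 0      # Scale to zero! ✅
  maxReplicaCount: 30
  pollingInterval: 15     # Check every 15 seconds
  cooldownPeriod: 60      # Wait 60 seconds after the last active trigger before scaling to zero
  triggers:
  - type: rabbitmq
    metadata:
      queueName: "tasks"
      # Demo only: in production keep credentials in a Secret + TriggerAuthentication
      host: "amqp://guest:guest@rabbitmq.default.svc.cluster.local:5672"
      mode: QueueLength   # Scale on the number of messages in the queue
      value: "20"         # Target: 20 messages per pod (replaces the deprecated queueLength)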

Behavior:

          
bash

# Start: the queue is empty
kubectl get pods
# No pods! (replicas: 0)

# Publish 100 messages
kubectl exec rabbitmq-0 -- sh -c 'for i in $(seq 1 100); do rabbitmqadmin publish routing_key=tasks payload="test"; done'

# KEDA scales automatically
kubectl get pods
# rabbitmq-consumer-xxx-1
# rabbitmq-consumer-xxx-2
# rabbitmq-consumer-xxx-3
# rabbitmq-consumer-xxx-4
# rabbitmq-consumer-xxx-5  → 100 messages / 20 = 5 pods

# Queue drained
# After 60 seconds → 0 pods
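
The replica count in this walkthrough follows from the queue backlog and the per-pod target. The Python sketch below clamps the result to the ScaledObject bounds; it is a steady-state approximation, since the underlying HPA works on sampled metrics and its own scaling policies.

```python
import math


def keda_queue_replicas(messages: int, per_pod: int = 20,
                        min_replicas: int = 0, max_replicas: int = 30) -> int:
    """Replicas needed for a queue backlog, clamped to the ScaledObject bounds."""
    needed = math.ceil(messages / per_pod)
    return max(min_replicas, min(max_replicas, needed))


print(keda_queue_replicas(100))  # the 100-message example above
```

Because consumers drain the queue while scaling is in progress, observed replica counts are often lower than this estimate.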

Kafka Scaler

          
yaml

# kafka-scaledobject.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
spec:
  scaleTargetRef:
    name: kafka-consumer
  minReplicaCount: 1      # A minimum of 1 is recommended for Kafka
  maxReplicaCount: 50
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: "kafka-broker.default.svc.cluster.local:9092"
      consumerGroup: "my-consumer-group"
      topic: "events"
      lagThreshold: "100"    # Target lag per replica: roughly 1 pod per 100 messages of lag
      offsetResetPolicy: "earliest"
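
For Kafka, the lag-based estimate has an extra ceiling: by default KEDA does not scale beyond the number of partitions on the topic, because additional consumers in the group would sit idle. A Python sketch of that estimate follows; the partition count of 12 is an assumption for illustration.

```python
import math


def kafka_replicas(total_lag: int, lag_threshold: int = 100, partitions: int = 12,
                   min_replicas: int = 1, max_replicas: int = 50) -> int:
    """Lag-based replica estimate, capped at the partition count."""
    needed = math.ceil(total_lag / lag_threshold)
    ceiling = min(max_replicas, partitions)  # extra consumers would sit idle
    return max(min_replicas, min(ceiling, needed))


print(kafka_replicas(450))   # moderate lag
print(kafka_replicas(5000))  # heavy lag, limited by partitions
```

In practice this means `maxReplicaCount: 50` only helps if the topic has at least 50 partitions.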

Azure Queue Scaler

          
yaml

# azure-queue-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: azure-storage-secret
type: Opaque
stringData:
  connection: "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;"
---
# azure-queue-scaledobject.yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: azure-queue-auth
spec:
  secretTargetRef:
  - parameter: connection
    name: azure-storage-secret
    key: connection
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: azure-queue-scaler
spec:
  scaleTargetRef:
    name: queue-processor
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
  - type: azure-queue
    authenticationRef:
      name: azure-queue-auth
    metadata:
      queueName: "orders"
      queueLength: "10"    # 10 messages per pod

HTTP Request Scaling with KEDA {#keda-http}

Installing the KEDA HTTP Add-on:

          
bash

helm install http-add-on kedacore/keda-add-ons-http \
  --namespace keda \
  --set interceptor.replicas=2

HTTP Scaler:

          
yaml

# http-scaledobject.yaml
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
  name: web-api-http-scaler
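spec:
  # Field names follow recent add-on releases; check the CRD of your installed version
  hosts:
  - web-api.example.com       # Example host (assumption): requests are routed by Host header
  scaleTargetRef:
    name: web-api
    kind: Deployment
    apiVersion: apps/v1
    service: web-api
    port: 80
  replicas:
    min: 0                    # Scale to zero!
    max: 100
  scalingMetric:
    requestRate:
      targetValue: 50         # 50 req/sec/pod
      granularity: 1s
  scaledownPeriod: 30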

Test:

          
bash

# Start with 0 pods
kubectl get pods

# First request → cold start (1-2 seconds)
# Note: traffic must pass through the add-on's interceptor proxy to be counted
curl http://web-api/

# The pod starts automatically
kubectl get pods
# web-api-xxx-1

# High traffic → scale up
ab -n 10000 -c 500 http://web-api/

# 50 req/sec/pod × 10 pods = 500 req/sec in total

Cluster Autoscaler {#cluster-autoscaler}

Cluster Autoscaler manages the number of nodes:

          
text

Scenario:
1. HPA tries to create 50 pods
2. The cluster only has room for 20 pods
3. 30 pods stay in "Pending"
4. Cluster Autoscaler kicks in
5. It adds new nodes (AWS ASG, GKE Node Pool, etc.)
6. The pending pods get scheduled
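
A rough feel for step 5 comes from dividing the pending pods by how many fit on one node. The Python sketch below considers CPU requests only and assumes 500m per pod and 4000m allocatable per node; both figures are illustrative, and the real Cluster Autoscaler simulates full scheduling (memory, affinity, taints) before it decides.

```python
import math


def nodes_to_add(pending_pods: int, pod_cpu_millicores: int,
                 node_allocatable_millicores: int) -> int:
    """Rough node count for pending pods, considering CPU requests only."""
    pods_per_node = node_allocatable_millicores // pod_cpu_millicores
    if pods_per_node == 0:
        raise ValueError("a single pod does not fit on this node type")
    return math.ceil(pending_pods / pods_per_node)


# 30 pending pods from the scenario; 500m requests and 4000m nodes are assumptions
print(nodes_to_add(30, 500, 4000))
```

The estimate also shows why accurate requests matter here: inflated requests translate directly into extra nodes.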

AWS EKS example:

          
bash

# Create the IAM policy (includes permission to terminate nodes)
cat > cluster-autoscaler-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:DescribeTags",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplateVersions"
      ],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeImages",
        "ec2:GetInstanceTypesFromInstanceRequirements",
        "eks:DescribeNodegroup"
      ],
      "Resource": ["*"]
    }
  ]
}
EOF

aws iam create-policy \
  --policy-name AmazonEKSClusterAutoscalerPolicy \
  --policy-document file://cluster-autoscaler-policy.json

# Deploy with Helm
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=your-eks-cluster \
  --set awsRegion=us-west-2 \
  --set rbac.serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::ACCOUNT_ID:role/cluster-autoscaler

Production Best Practices {#best-practices}

1. Always Set Resource Requests

          
yaml

# ❌ Bad: no requests
containers:
- name: app
  image: app:v1.0

# ✅ Good: requests are set
containers:
- name: app
  image: app:v1.0
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi

Why is this critical?

HPA computes utilization as a percentage of the requests
Without requests, utilization-based HPA targets do not work!

2. Use a PodDisruptionBudget

          
yaml

# pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 2    # At least 2 pods must always be available
  selector:
    matchLabels:
      app: web-app

Benefit: it guarantees a minimum level of availability while VPA or Cluster Autoscaler evicts pods (for example, when a node is drained).

3. Use a Stabilization Window

          
yaml

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # 5 minutes (the default); prevents flapping

Why is it needed?

          
text

# ❌ Without stabilization:
12:00 → 60% CPU → Scale up → 10 pods
12:02 → 40% CPU → Scale down → 5 pods
12:04 → 60% CPU → Scale up → 10 pods
# Constant flapping! Pods keep getting created and terminated.

# ✅ With stabilization:
12:00 → 60% CPU → Scale up → 10 pods
12:02 → 40% CPU → Wait... (stabilization window)
12:07 → Still 40% CPU → Scale down → 7 pods
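
The scale-down stabilization window works by remembering recent recommendations and acting on the highest one still inside the window. The Python sketch below models only that part; the class name is illustrative, and it omits scale-down policies, which is why it lands on the raw recommendation of 5 rather than an intermediate value such as the 7 pods in the timeline.

```python
from collections import deque


class ScaleDownStabilizer:
    """Keeps recent recommendations and returns the highest one in the window."""

    def __init__(self, window_seconds: int = 300):
        self.window_seconds = window_seconds
        self.history = deque()  # (timestamp, desired replicas) pairs

    def recommend(self, now: float, desired: int) -> int:
        self.history.append((now, desired))
        # Drop recommendations that have fallen out of the window
        while self.history[0][0] < now - self.window_seconds:
            self.history.popleft()
        return max(replicas for _, replicas in self.history)


stabilizer = ScaleDownStabilizer(window_seconds=300)
print(stabilizer.recommend(0, 10))    # 12:00, load is high
print(stabilizer.recommend(120, 5))   # 12:02, load dropped but 10 is still in the window
print(stabilizer.recommend(420, 5))   # 12:07, the old recommendation has expired
```

A short dip therefore never removes pods, while a sustained drop takes effect once every higher recommendation has aged out.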

4. Combining HPA + VPA + Cluster Autoscaler

          
text

# HPA: Pod count (custom metrics)
# VPA: Resource sizing (requests/limits)
# Cluster Autoscaler: Node count

# Best practice:
1. VPA → Set the right resource requests (start in Off mode)
2. HPA → Scale on custom metrics (request rate, queue depth)
3. Cluster Autoscaler → Manage node capacity

5. Monitoring and Alerting

          
yaml

# Prometheus alerts
groups:
- name: autoscaling
  rules:
  - alert: HPAMaxedOut
    expr: |
      kube_horizontalpodautoscaler_status_current_replicas
      >= kube_horizontalpodautoscaler_spec_max_replicas
    for: 5m
    annotations:
      summary: "HPA {{ $labels.horizontalpodautoscaler }} maxed out"
      description: "Consider increasing maxReplicas"

  - alert: HPAScalingDisabled
    expr: |
      kube_horizontalpodautoscaler_status_condition{condition="ScalingActive",status="false"}
      == 1
    for: 5m
    annotations:
      summary: "HPA {{ $labels.horizontalpodautoscaler }} scaling disabled"

Cost Optimization Strategies {#cost-optimization}

1. Scale to Zero (KEDA)

          
yaml

# Scale to zero outside working hours
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dev-api-scaler
spec:
  scaleTargetRef:
    name: dev-api
  minReplicaCount: 0          # 0 pods at night
  maxReplicaCount: 10
  triggers:
  - type: cron
    metadata:
      timezone: Europe/Istanbul
      start: 0 8 * * 1-5       # Monday-Friday 08:00 → scale to desiredReplicas (2)
      end: 0 19 * * 1-5        # Monday-Friday 19:00 → back to minReplicaCount (0)
      desiredReplicas: "2"
  - type: prometheus
    metadata:
      serverAddress: http://prometheus:9090
      metricName: http_requests_per_second
      query: sum(rate(http_requests_total[2m]))
      threshold: "50"

Savings calculation:

          
text

Traditional (2 pods, 24/7):
- 2 pods × 730 hours/month = 1,460 pod-hours
- Cost: $146/month ($0.10 per pod-hour)

KEDA (08:00-19:00, Mon-Fri):
- 2 pods × 11 hours × 22 days = 484 pod-hours
- Cost: $48.40/month
- 💰 Savings: about 67% ($97.60/month)
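
The same comparison can be scripted so that the schedule and price are easy to change. The Python sketch below takes the window from the cron trigger (08:00 to 19:00 is 11 hours) and assumes 22 working days per month; the helper name is illustrative.

```python
def monthly_pod_hours(pods: int, hours_per_day: float, days_per_month: float) -> float:
    """Pod-hours consumed in a month for a fixed daily schedule."""
    return pods * hours_per_day * days_per_month


PRICE_PER_POD_HOUR = 0.10  # USD, the rate used in the calculation above

always_on = 2 * 730                        # 2 pods, 730 hours per month
scheduled = monthly_pod_hours(2, 11, 22)   # 08:00-19:00 cron window, 22 working days
saving = (always_on - scheduled) * PRICE_PER_POD_HOUR
print(f"always-on: ${always_on * PRICE_PER_POD_HOUR:.2f}")
print(f"scheduled: ${scheduled * PRICE_PER_POD_HOUR:.2f}")
print(f"saving:    ${saving:.2f} ({saving / (always_on * PRICE_PER_POD_HOUR):.0%})")
```

This covers only the cron-driven baseline. Any extra replicas that the Prometheus trigger adds during the day increase the scheduled cost.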

2. Spot/Preemptible Nodes + Cluster Autoscaler

          
bash

# AWS EKS node group (Spot instances)
eksctl create nodegroup \
  --cluster=my-cluster \
  --name=spot-workers \
  --node-type=m5.large \
  --nodes-min=2 \
  --nodes-max=20 \
  --spot \
  --managed \
  --asg-access \
  --labels="workload=batch,lifecycle=spot"

          
yaml

# Node affinity for batch workloads (fragment: selector and pod labels omitted)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: lifecycle
                operator: In
                values:
                - spot
      containers:
      - name: processor
        image: batch:v1.0

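Savings: Spot instances can be up to 70% cheaper than on-demand (AWS advertises discounts of up to 90%), in exchange for possible interruption.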

3. Right-Sizing (VPA Recommendations)

          
bash

# Print VPA recommendations (first container of each target)
kubectl get vpa --all-namespaces -o json | \
  jq -r '.items[] |
    (.status.recommendation.containerRecommendations[0] // {}) as $r |
    "Target: " + .metadata.namespace + "/" + .spec.targetRef.name +
    "\n  CPU lower bound: " + ($r.lowerBound.cpu // "N/A") +
    "\n  CPU target:      " + ($r.target.cpu // "N/A") +
    "\n  CPU upper bound: " + ($r.upperBound.cpu // "N/A")'

# Find over-provisioned Deployments (first container requests more than 1000m CPU)
kubectl get deployments --all-namespaces -o json | \
  jq -r '
    # Convert a CPU quantity ("250m", "2", "0.5") to millicores
    def cpu_m: if . == null then 0
      elif endswith("m") then (rtrimstr("m") | tonumber)
      else (tonumber * 1000) end;
    .items[] |
    select((.spec.template.spec.containers[0].resources.requests.cpu | cpu_m) > 1000) |
    .metadata.namespace + "/" + .metadata.name'
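
Kubernetes reports CPU quantities as strings such as "250m" or "2", so they have to be converted before they can be compared numerically. The Python sketch below shows the conversion and why a plain string comparison misleads; it handles only the plain and millicore forms, which is an assumption that covers typical manifests.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "250m", "2" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


print("200m" > "1000m")                                        # string comparison
print(cpu_to_millicores("200m") > cpu_to_millicores("1000m"))  # numeric comparison
```

The string comparison looks only at the leading characters, so a 200m request would be flagged as larger than 1000m.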

Monitoring and Troubleshooting {#monitoring}

Grafana Dashboard

          
json

# HPA + VPA + KEDA dashboard (Grafana)
{
  "dashboard": {
    "title": "Kubernetes Autoscaling Overview",
    "panels": [
      {
        "title": "HPA Status",
        "targets": [
          {
            "expr": "kube_horizontalpodautoscaler_status_current_replicas"
          },
          {
            "expr": "kube_horizontalpodautoscaler_spec_max_replicas"
          },
          {
            "expr": "kube_horizontalpodautoscaler_status_desired_replicas"
          }
        ]
      },
      {
        "title": "VPA Recommendations",
        "targets": [
          {
            "expr": "kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource='cpu'}"
          }
        ]
      },
      {
        "title": "KEDA ScaledObject Status",
        "targets": [
          {
            "expr": "keda_scaledobject_paused"
          }
        ]
      }
    ]
  }
}

Troubleshooting Commands

          
bash

# HPA not working?
kubectl describe hpa <hpa-name>
# Check for:
# - "unable to get metrics" → Metrics Server is not running
# - "missing request for cpu" → the Deployment has no resources.requests
# - "failed to get cpu utilization" → pods are not ready

# VPA recommendations not updating?
kubectl logs -n kube-system deployment/vpa-recommender
# Check for:
# - OOM kills (the recommender's memory limit is too low)
# - Not enough metrics history (recommendations settle after about 24 hours of data)

# KEDA not scaling?
kubectl logs -n keda deployment/keda-operator
# Check for:
# - Authentication errors (misconfigured TriggerAuthentication)
# - Unreachable metric source (RabbitMQ or Kafka is down)
# - Scaler configuration errors

# Cluster Autoscaler not adding nodes?
kubectl logs -n kube-system deployment/cluster-autoscaler
# Check for:
# - IAM permissions (AWS/GCP/Azure)
# - Node group max size
# - Instance quota

Common Issues

Problem 1: HPA flapping (constant scale up/down)

          
bash

# Fix: lengthen the scale-down stabilization window (the default is already 300s)
kubectl patch hpa my-hpa --type='merge' -p='{
  "spec": {
    "behavior": {
      "scaleDown": {"stabilizationWindowSeconds": 600}
    }
  }
}'

Problem 2: VPA + HPA conflict

          
bash

# Which metric does the HPA use?
kubectl get hpa my-hpa -o jsonpath='{.spec.metrics[*].resource.name}'

# Does the VPA control the same resource?
kubectl get vpa my-vpa -o yaml | grep controlledResources

# Fix:
# - HPA: use custom metrics (http_requests_per_second)
# - VPA: controlledValues: RequestsOnly

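Problem 3: Cold start is too slow after KEDA scales to zero

          
yaml

# Fix: set minReplicaCount: 1 to keep one warm pod.
# If you stay at 0, activationValue decides when KEDA leaves zero (lower = earlier start);
# it does not shorten the cold start itself.
spec:
  minReplicaCount: 0
  triggers:
  - type: rabbitmq
    metadata:
      mode: QueueLength
      value: "20"
      activationValue: "5"  # Scale 0 → 1 only once more than 5 messages are waiting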
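
KEDA treats the step from 0 to 1 separately from the scaling between 1 and N, which it delegates to the HPA. The Python sketch below models only the activation check, assuming the documented rule that a trigger is active when its metric is strictly greater than the activation value; the function names are illustrative.

```python
def keda_is_active(metric_value: float, activation_value: float = 0) -> bool:
    """A trigger is active only when the metric is strictly above the activation value."""
    return metric_value > activation_value


def replicas_from_zero(queue_length: int, activation_value: int = 5) -> int:
    """Replica count right after the activation check, starting from 0 pods."""
    return 1 if keda_is_active(queue_length, activation_value) else 0


for queued in (0, 5, 6):
    print(queued, "messages →", replicas_from_zero(queued), "pod(s)")
```

A higher activation value therefore delays the first pod. It is a way to ignore trickle traffic, not a way to speed up cold starts.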

Conclusion

Kubernetes autoscaling is an essential capability for modern cloud-native applications. Used correctly, it provides:

✅ Performance: automatic scale-up when load rises
✅ Cost: automatic scale-down under low load (the examples in this post save roughly 60-70%)
✅ Operational efficiency: no manual scaling
✅ Fault tolerance: capacity lost to pod or node failures is replaced automatically

Our recommended starting path:

Start with HPA (CPU-based, simple Deployment)
Review VPA recommendations (Off mode, right-sizing)
Add custom metrics (Prometheus, request rate)
Add event-driven scaling with KEDA (queue depth, cron)
Add Cluster Autoscaler (node management)

Production checklist:

Resource requests/limits are defined
PodDisruptionBudget is configured
HPA stabilization window is tuned
VPA has been tested in Off mode
Monitoring and alerting are active
A cost-tracking dashboard has been created

For questions, you can contact TekTık Yazılım or browse our GitHub repository.

See you in the next post! 🚀

This post was prepared by the TekTık Yazılım DevOps Team. It shares the practices and experience we have gained on our production Kubernetes clusters.
