The Complete Beginner's Guide to the Argo Family: GitOps Tooling for Kubernetes

Learn the Argo ecosystem from scratch and build a modern cloud-native CI/CD platform

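The Argo family at a glance

Argo is a set of open-source tools built for Kubernetes. The project is hosted by the CNCF (Cloud Native Computing Foundation), where it has reached graduated status. Together the tools cover the lifecycle of a cloud-native application from build through deployment.

Family members

Tool                 Core function                  Typical use cases                 Maturity
Argo CD              GitOps continuous delivery     App releases, config management   ⭐⭐⭐⭐⭐ Production-ready
Argo Workflows       Workflow orchestration engine  CI/CD, data processing            ⭐⭐⭐⭐⭐ Production-ready
Argo Rollouts        Progressive delivery           Canary, blue-green deployments    ⭐⭐⭐⭐ Stable
Argo Events          Event-driven automation        Webhooks, message queues          ⭐⭐⭐⭐ Stable
Argo Image Updater   Automatic image tag updates    Automated deployments             ⭐⭐⭐ Usable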

Architecture overview

┌─────────────────────────────────────────────────────┐
│       Git Repository (Single Source of Truth)       │
└─────────────────┬───────────────────────────────────┘
                  │
                  ▼
         ┌────────────────┐
         │    Argo CD     │ ◄─── watches Git for changes,
         │  (GitOps core) │      syncs them to Kubernetes
         └────────┬───────┘
                  │
      ┌───────────┼───────────┐
      ▼           ▼           ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Rollouts │ │ Workflows│ │  Events  │
│(delivery)│ │  (tasks) │ │(triggers)│
└──────────┘ └──────────┘ └──────────┘
      │           │           │
      └───────────┴───────────┘
                  │
                  ▼
         ┌────────────────┐
         │   Kubernetes   │
         │     Cluster    │
         └────────────────┘

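Core components in detail

1️⃣ Argo CD - the foundation of GitOps continuous delivery

What is GitOps?

Traditional deployment: run kubectl apply by hand → environments drift apart → changes are hard to trace
GitOps deployment: Git is the single source of truth → changes sync automatically → everything is managed declaratively

Core concepts

# Example Application definition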
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/yourorg/configs.git
    targetRevision: main
    path: apps/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true      # delete resources that no longer exist in Git
      selfHeal: true   # revert manual changes (drift) made in the cluster
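
The selfHeal behaviour can be pictured with a toy reconcile pass. This is only a sketch under loud assumptions: two local files stand in for the Git manifest and the live cluster state, and the function name reconcile is hypothetical. Argo CD's real controller compares rendered manifests with live Kubernetes objects and is far more involved.

```shell
#!/bin/sh
# Toy model of one reconcile pass. Files stand in for Git and the cluster;
# this is NOT how Argo CD is implemented, only an illustration of selfHeal.
# Usage: reconcile DESIRED_FILE LIVE_FILE
reconcile() {
  desired="$1"
  live="$2"
  if cmp -s "$desired" "$live"; then
    # Live state already matches the desired state
    echo "Synced"
  else
    # selfHeal: overwrite the drifted live state with the desired state
    cp "$desired" "$live"
    echo "OutOfSync: healed"
  fi
}
```

Calling reconcile twice on a drifted pair of files reports the drift once and then reports Synced, which is the idempotent behaviour the declarative model relies on.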

Installation

# 1. Add the Helm repository
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update

# 2. Install Argo CD
helm install argocd argo/argo-cd \
  --namespace argocd \
  --create-namespace \
  --set server.service.type=LoadBalancer

# 3. Wait for the server Pod to become ready
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=argocd-server -n argocd --timeout=300s

# 4. Get the initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d && echo

# 5. Open the UI (pick one option)
# Option 1: port forwarding
kubectl port-forward svc/argocd-server -n argocd 8080:443

# Option 2: if the service is exposed as a LoadBalancer
kubectl get svc argocd-server -n argocd
# Then browse to https://<EXTERNAL-IP>

First-time setup

# 1. Install the Argo CD CLI (optional but recommended)
curl -sSL -o argocd-linux-amd64 https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64
sudo install -m 555 argocd-linux-amd64 /usr/local/bin/argocd
rm argocd-linux-amd64

# 2. Log in with the CLI
# (--insecure skips TLS verification; the port-forwarded server presents a self-signed certificate)
argocd login localhost:8080 --username admin --password <initial-password> --insecure

# 3. Change the password
argocd account update-password

# 4. Create your first application
argocd app create guestbook \
  --repo https://github.com/argoproj/argocd-example-apps.git \
  --path guestbook \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace default

# 5. Sync the application
argocd app sync guestbook

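2️⃣ Argo Workflows - a Kubernetes-native workflow engine

Key strengths

  • Container-native: every step runs in its own container
  • DAG support: orchestrates steps with complex dependencies
  • Parallel execution: tasks that do not depend on each other run in parallel automatically
  • Resource-efficient: step Pods exist only while they run, so there are no idle build agents to keep alive

Example workflow

# A simple CI pipeline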
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: ci-pipeline-
spec:
  entrypoint: ci-pipeline
  templates:
  
  # Main pipeline definition
  - name: ci-pipeline
    dag:
      tasks:
      - name: clone
        template: git-clone
      
      - name: test
        dependencies: [clone]
        template: run-tests
      
      - name: build
        dependencies: [test]
        template: docker-build
      
      - name: push
        dependencies: [build]
        template: docker-push

  # Step templates
  - name: git-clone
    container:
      image: alpine/git
      command: [sh, -c]
      args: ["git clone https://github.com/yourorg/app.git /work"]
      volumeMounts:
      - name: workdir
        mountPath: /work

  - name: run-tests
    container:
      image: node:18
      command: [sh, -c]
      args: ["cd /work && npm install && npm test"]
      volumeMounts:
      - name: workdir
        mountPath: /work

  - name: docker-build
    container:
      image: gcr.io/kaniko-project/executor:latest
      args:
      - "--context=/work"
      - "--dockerfile=/work/Dockerfile"
      - "--destination=myregistry.com/app:{{workflow.uid}}"
      volumeMounts:
      - name: workdir
        mountPath: /work

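  # Placeholder step: kaniko already pushes the image during docker-build
  # because --destination is set, so this step only reports success.
  - name: docker-push
    container:
      image: curlimages/curl
      command: [sh, -c]
      args: ["echo 'Image pushed successfully'"]

  # Volume shared by all steps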
  volumeClaimTemplates:
  - metadata:
      name: workdir
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
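
The dependencies fields in the DAG above determine the order in which tasks may start. As a rough illustration (not the Argo scheduler itself), the standard tsort utility performs the same kind of topological ordering. The function name dag_order is hypothetical; the task names are the ones from the workflow above.

```shell
#!/bin/sh
# Each input line "A B" means: A must finish before B may start.
# tsort prints one valid execution order for the dependency graph.
dag_order() {
  printf '%s\n' \
    "clone test" \
    "test build" \
    "build push" | tsort | paste -sd ' ' -
}

dag_order
```

For this linear chain only one order is valid. With a wider graph, tasks that have no path between them could run in parallel, which is what Argo Workflows does automatically.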

Installation and usage

# 1. Install Argo Workflows
helm install argo-workflows argo/argo-workflows \
  --namespace argo \
  --create-namespace \
  --set server.serviceType=LoadBalancer

# 2. Open the UI
kubectl -n argo port-forward svc/argo-workflows-server 2746:2746

# 3. Submit the workflow
# (use "create", not "apply": the manifest uses generateName, which apply rejects)
kubectl create -n argo -f my-workflow.yaml

# 4. Check workflow status
kubectl get workflows -n argo

# 5. View logs
kubectl logs -n argo <workflow-pod-name>

3️⃣ Argo Rollouts - the tool for progressive delivery

Supported deployment strategies

# Canary release
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app-canary
spec:
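  replicas: 10
  selector:              # required: must match the Pod template labels
    matchLabels:
      app: my-app
  strategy:
    canary:
      steps:
      - setWeight: 10    # about 10% of traffic to the new version
      - pause: {duration: 2m}
      
      - setWeight: 25    # raise to 25%
      - pause: {duration: 2m}
      
      - setWeight: 50    # raise to 50%
      - pause: {duration: 5m}
      
      - setWeight: 75    # raise to 75%
      - pause: {}        # wait for manual approval
      
      - setWeight: 100   # full rollout
      
      # Background analysis (a failed analysis aborts the rollout and rolls back)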
      analysis:
        templates:
        - templateName: success-rate
        startingStep: 2
        args:
        - name: service-name
          value: my-app
  
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: myregistry.com/app:v2.0.0
---
# Blue-green deployment
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app-bluegreen
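spec:
  replicas: 5
  selector:                            # required: must match the Pod template labels
    matchLabels:
      app: my-app
  strategy:
    blueGreen:
      activeService: my-app-active     # receives production traffic
      previewService: my-app-preview   # receives preview traffic
      autoPromotionEnabled: false      # switch over manually
      scaleDownDelaySeconds: 300       # keep the old version for 5 minutes
  
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: myregistry.com/app:v2.0.0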
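
For the canary strategy above, no trafficRouting is configured, so a setWeight step is approximated by the ratio of canary Pods to stable Pods rather than by exact request percentages. The sketch below estimates the Pod split per step. It assumes rounding to the nearest whole Pod with ties rounded up and ignores maxSurge and maxUnavailable; the exact rounding depends on the Argo Rollouts version, and the function name canary_pods is hypothetical.

```shell
#!/bin/sh
# canary_pods REPLICAS WEIGHT
# Estimated number of canary Pods for a replica-based canary:
# nearest whole Pod, ties rounded up (an assumption, see the text above).
canary_pods() {
  echo $(( ($1 * $2 + 50) / 100 ))
}

# Walk through the setWeight steps of the 10-replica example
for w in 10 25 50 75 100; do
  c=$(canary_pods 10 "$w")
  echo "setWeight $w -> $c canary / $((10 - c)) stable"
done
```

With only 10 replicas the 25% and 75% steps cannot be hit exactly, which is one reason to add a traffic router when precise percentages matter.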

Installation and integration

# 1. Install the Rollouts controller
helm install argo-rollouts argo/argo-rollouts \
  --namespace argo-rollouts \
  --create-namespace

# 2. Install the kubectl plugin
curl -LO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64
sudo install -m 555 kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts

# 3. Watch a Rollout
kubectl argo rollouts get rollout my-app -n production --watch

# 4. Promote a paused canary manually
kubectl argo rollouts promote my-app -n production

# 5. Emergency rollback
kubectl argo rollouts abort my-app -n production
kubectl argo rollouts undo my-app -n production

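4️⃣ Argo Events - the hub for event-driven automation

Core concepts

Event Source          →  Sensor               →  Trigger
    │                       │                      │
  Webhook               Conditions            Create a Workflow
  Git push              Event filtering       Send a notification
  Message queue         Data transformation   Update a Rollout

Hands-on example: trigger a deployment from a webhook

# 1. Define the event source (receives the GitHub webhook)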
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: github-webhook
  namespace: argo-events
spec:
  service:
    ports:
    - port: 12000
      targetPort: 12000
  webhook:
    github-push:
      port: "12000"
      endpoint: /push
      method: POST
---
# 2. Define the sensor (triggers the workflow)
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: github-sensor
  namespace: argo-events
spec:
  dependencies:
  - name: github-dep
    eventSourceName: github-webhook
    eventName: github-push
    filters:
      data:
      - path: body.ref
        type: string
        value:
        - "refs/heads/main"  # only react to pushes to main
  
  triggers:
  - template:
      name: trigger-workflow
      k8s:
        operation: create
        source:
          resource:
            apiVersion: argoproj.io/v1alpha1
            kind: Workflow
            metadata:
              generateName: ci-pipeline-
            spec:
              entrypoint: build-and-deploy
              arguments:
                parameters:
                - name: commit
                  value: placeholder   # overwritten by the trigger parameter below
              templates:
              - name: build-and-deploy
                container:
                  image: alpine
                  command: [sh, -c]
                  args: ["echo 'Building from commit {{workflow.parameters.commit}}'"]
        # Copy the commit SHA from the webhook payload into the Workflow
        parameters:
        - src:
            dependencyName: github-dep
            dataKey: body.head_commit.id
          dest: spec.arguments.parameters.0.value

Installation and configuration

# 1. Install Argo Events
helm install argo-events argo/argo-events \
  --namespace argo-events \
  --create-namespace

# 2. Create the EventBus (message bus)
kubectl apply -n argo-events -f - <<EOF
apiVersion: argoproj.io/v1alpha1
kind: EventBus
metadata:
  name: default
spec:
  nats:
    native:
      replicas: 3
EOF

# 3. Deploy the event source and the sensor
kubectl apply -f event-source.yaml
kubectl apply -f sensor.yaml

# 4. Expose the webhook endpoint (local testing only)
kubectl -n argo-events port-forward svc/github-webhook-eventsource-svc 12000:12000

# 5. Configure the webhook in GitHub
# GitHub must be able to reach the endpoint, so in practice expose it through an Ingress or LoadBalancer
# URL: http://your-domain:12000/push
# Content type: application/json
# Events: Push events
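
Before wiring up GitHub, the event source and the sensor filter can be exercised locally with a hand-made payload. The helper names build_push_payload and send_test_push are hypothetical, and the JSON contains only the two fields the sensor above reads (ref and head_commit.id); a real GitHub push payload carries many more. The default URL assumes the port-forward from step 4 is running.

```shell
#!/bin/sh
# build_push_payload BRANCH COMMIT_SHA
# Prints a minimal stand-in for a GitHub push payload.
build_push_payload() {
  printf '{"ref":"refs/heads/%s","head_commit":{"id":"%s"}}' "$1" "$2"
}

# send_test_push BRANCH COMMIT_SHA [URL]
# Posts the payload to the webhook event source.
send_test_push() {
  curl -sS -X POST \
    -H 'Content-Type: application/json' \
    -d "$(build_push_payload "$1" "$2")" \
    "${3:-http://localhost:12000/push}"
}

# Usage:
#   send_test_push main 0123abc        # matches the filter, should create a Workflow
#   send_test_push feature-x 0123abc   # dropped by the body.ref filter
```

Afterwards, kubectl get workflows -n argo-events should show a new ci-pipeline- Workflow only for the push to main, assuming the sensor's ServiceAccount is allowed to create Workflows.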

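5️⃣ Argo Image Updater - automated image updates

How it works

Container registry → Image Updater detects a new tag → updates the Git config → Argo CD deploys automatically

Configuration example

# Add annotations to the Argo CD Application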
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
  annotations:
    # Images to track
    argocd-image-updater.argoproj.io/image-list: myapp=myregistry.com/app
    
    # Update strategy: most recently built image
    # (newer releases name this strategy "newest-build"; "latest" is the older name)
    argocd-image-updater.argoproj.io/myapp.update-strategy: latest
    
    # Or use semantic versioning
    # argocd-image-updater.argoproj.io/myapp.update-strategy: semver
    # argocd-image-updater.argoproj.io/myapp.allow-tags: regexp:^v[0-9]+\.[0-9]+\.[0-9]+$
    
    # Git write-back settings
    argocd-image-updater.argoproj.io/write-back-method: git
    argocd-image-updater.argoproj.io/git-branch: main
spec:
  source:
    repoURL: https://github.com/yourorg/app-configs.git
    path: overlays/production
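
To see what the commented-out semver strategy with the allow-tags pattern would select, the following sketch filters a tag list with the same regular expression and picks the highest version. It is an illustration only: Image Updater uses its own semver parser, and the function name newest_semver_tag and the sample tags are hypothetical. It relies on sort -V (version sort) from GNU coreutils.

```shell
#!/bin/sh
# newest_semver_tag: reads tags on stdin, one per line.
# Keeps tags matching vMAJOR.MINOR.PATCH and prints the highest one.
newest_semver_tag() {
  grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -n 1
}

# Sample tag list: "latest" and "dev-42" are rejected by the pattern
printf '%s\n' latest v1.2.0 v1.10.0 v1.9.3 dev-42 | newest_semver_tag
```

Note that v1.10.0 wins over v1.9.3, which a plain alphabetical sort would get wrong.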

Installation

# 1. Install Image Updater in the same namespace as Argo CD.
#    There it can read Applications through the Kubernetes API, so no Argo CD
#    API token is needed. (The admin.password field in argocd-secret is a
#    bcrypt hash, not an API token, and must not be passed as one.)
helm install argocd-image-updater argo/argocd-image-updater \
  --namespace argocd

# 2. Configure registry credentials (if needed)
kubectl create secret generic regcred \
  --from-file=.dockerconfigjson=$HOME/.docker/config.json \
  --type=kubernetes.io/dockerconfigjson \
  -n argocd

# 3. Follow the update logs
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-image-updater -f

Hands-on quick start

Scenario 1: Deploy your first GitOps application from scratch

Goal: automated Git → Kubernetes deployment in about 10 minutes

# Step 1: Install Argo CD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Step 2: Open the UI
kubectl port-forward svc/argocd-server -n argocd 8080:443 &

# Step 3: Get the password and log in
ARGOCD_PASSWORD=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)
echo "ArgoCD Password: $ARGOCD_PASSWORD"

# Browse to https://localhost:8080
# Username: admin
# Password: the value printed above

# Step 4: Create the application in the UI
# - Repository URL: https://github.com/argoproj/argocd-example-apps.git
# - Path: guestbook
# - Cluster: in-cluster
# - Namespace: default

# Step 5: Click "Sync"; the application is deployed a few seconds later

# Verify
kubectl get all -n default | grep guestbook

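Scenario 2: Build a complete CI/CD pipeline

Goal: code push → automatic build → automatic test → automatic deployment

# Architecture: GitHub Webhook → Argo Events → Argo Workflows → Argo CD

# 1. Prepare the Git repository layout
your-app-repo/
├── src/                 # application code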
├── Dockerfile
├── k8s/
│   ├── base/
│   │   ├── deployment.yaml
│   │   └── kustomization.yaml
│   └── overlays/
│       └── production/
│           └── kustomization.yaml
└── .github/
    └── workflows/       # (optional) kept for other CI tasks

# 2. Create the Workflow template
kubectl apply -f - <<EOF
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: ci-template
  namespace: argo
spec:
  entrypoint: ci-pipeline
  arguments:
    parameters:
    - name: repo-url
    - name: revision

  # Each step runs in its own Pod, so /work must live on a shared volume
  volumeClaimTemplates:
  - metadata:
      name: workdir
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
  
  templates:
  - name: ci-pipeline
    dag:
      tasks:
      - name: checkout
        template: git-clone
      
      - name: unit-test
        dependencies: [checkout]
        template: run-tests
      
      - name: build-image
        dependencies: [unit-test]
        template: kaniko-build
      
      - name: update-manifest
        dependencies: [build-image]
        template: update-k8s

  - name: git-clone
    script:
      image: alpine/git
      command: [sh]
      source: |
        git clone {{workflow.parameters.repo-url}} /work
        cd /work && git checkout {{workflow.parameters.revision}}
      volumeMounts:
      - name: workdir
        mountPath: /work

  - name: run-tests
    container:
      image: node:18
      command: [sh, -c]
      args: ["cd /work && npm ci && npm test"]
      volumeMounts:
      - name: workdir
        mountPath: /work

  - name: kaniko-build
    container:
      image: gcr.io/kaniko-project/executor:latest
      args:
      - "--context=/work"
      - "--destination=myregistry.com/app:{{workflow.parameters.revision}}"
      volumeMounts:
      - name: workdir
        mountPath: /work

  - name: update-k8s
    script:
      image: alpine/git
      command: [sh]
      source: |
        # Pushing needs Git credentials, a commit identity and a checked-out
        # branch; all three are omitted from this example.
        apk add yq
        cd /work/k8s/overlays/production
        yq e ".images[0].newTag = \"{{workflow.parameters.revision}}\"" -i kustomization.yaml
        git add .
        git commit -m "Update image to {{workflow.parameters.revision}}"
        git push
      volumeMounts:
      - name: workdir
        mountPath: /work
EOF

# 3. Configure the GitHub webhook (see the Argo Events section above)

# 4. Create an Argo CD Application that watches the k8s directory
argocd app create my-app \
  --repo https://github.com/yourorg/your-app.git \
  --path k8s/overlays/production \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace production \
  --sync-policy automated

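How it flows:

  1. A developer pushes code to the main branch
  2. The GitHub webhook delivers the event to Argo Events
  3. Argo Events creates a Workflow instance
  4. The Workflow runs: clone → test → build → update manifest
  5. Argo CD detects the Git change and syncs it to the cluster
  6. For a small service the whole path from push to production can finish within about 5 minutes, depending on build and test time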

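Scenario 3: Canary releases

Goal: roll out a new version gradually, monitor it automatically, and roll back automatically on failure

# 1. Install Prometheus (for metrics)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

# 2. Create an AnalysisTemplate (defines the success criteria)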
kubectl apply -f - <<EOF
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
  namespace: production
spec:
  args:
  - name: service-name
  
  metrics:
  - name: success-rate
    interval: 1m
    count: 5
    successCondition: result[0] >= 0.95
    failureLimit: 2
    provider:
      prometheus:
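        # Service name created by kube-prometheus-stack for the release "prometheus";
        # confirm it with: kubectl get svc -n monitoring
        address: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090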
        query: |
          sum(rate(http_requests_total{
            service="{{args.service-name}}",
            status!~"5.."
          }[1m])) 
          / 
          sum(rate(http_requests_total{
            service="{{args.service-name}}"
          }[1m]))
EOF

# 3. Create the Rollout resource
kubectl apply -f - <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
  namespace: production
spec:
  replicas: 10
  strategy:
    canary:
      canaryService: my-app-canary
      stableService: my-app-stable
      trafficRouting:
        nginx:
          stableIngress: my-app-ingress
      steps:
      - setWeight: 10
      - pause: {duration: 2m}
      - setWeight: 30
      - pause: {duration: 2m}
      
      # Automated analysis
      - setWeight: 50
      - analysis:
          templates:
          - templateName: success-rate
          args:
          - name: service-name
            value: my-app
      
      - setWeight: 100
  
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: my-app
  
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: myregistry.com/app:v1.0.0
        ports:
        - containerPort: 8080
EOF

# 4. Update the image to start a canary release
kubectl argo rollouts set image my-app app=myregistry.com/app:v2.0.0 -n production

# 5. Watch the release in real time
kubectl argo rollouts get rollout my-app -n production --watch

# 6. Intervene manually if needed
kubectl argo rollouts promote my-app -n production  # advance to the next step
kubectl argo rollouts abort my-app -n production    # abort and roll back
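
The AnalysisTemplate in step 2 takes 5 measurements, treats each one below 0.95 as a failure, and fails the analysis once more than failureLimit (2) measurements have failed. The sketch below replays that decision rule on a list of sample values. It is a simplification under stated assumptions: the function name analysis_verdict is hypothetical, the sample values are made up, and the real controller stops early and also tracks inconclusive and errored measurements.

```shell
#!/bin/sh
# analysis_verdict FAILURE_LIMIT THRESHOLD MEASUREMENT...
# Prints "Failed" when more than FAILURE_LIMIT measurements fall below
# THRESHOLD, otherwise "Successful".
analysis_verdict() {
  limit="$1"
  threshold="$2"
  shift 2
  failures=0
  for m in "$@"; do
    # awk performs the floating-point comparison that shell arithmetic cannot
    if ! awk -v v="$m" -v t="$threshold" 'BEGIN { exit !(v >= t) }'; then
      failures=$((failures + 1))
    fi
  done
  if [ "$failures" -gt "$limit" ]; then
    echo "Failed"
  else
    echo "Successful"
  fi
}

# Two of five measurements are below 0.95, which is still within failureLimit: 2
analysis_verdict 2 0.95 0.99 0.90 0.97 0.93 0.99
```

A third low measurement would exceed the limit, and the Rollout would abort and shift traffic back to the stable version.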

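Reference setups by team size

Setup A: Small team, quick start (minimal)

Best for: teams of 5-20 people with a simple microservice architecture

# Argo CD alone is enough
helm install argocd argo/argo-cd -n argocd --create-namespace

# Git repository layout
app-configs/
├── apps/
│   ├── service-a/
│   ├── service-b/
│   └── service-c/
└── argocd/
    └── applications.yaml  # defines every Application

Benefits: simple to deploy, usable within about 5 minutes, and a good way to trial GitOps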


Setup B: Mid-size team, standard configuration (recommended)

Best for: teams of 20-100 people managing multiple environments

# 1. The three core components
helm install argocd argo/argo-cd -n argocd --create-namespace
helm install argo-rollouts argo/argo-rollouts -n argo-rollouts --create-namespace
helm install argocd-image-updater argo/argocd-image-updater -n argocd

# 2. Git repository layout (monorepo recommended)
infrastructure/
├── clusters/
│   ├── dev/
│   ├── staging/
│   └── production/
├── base/              # shared configuration
└── monitoring/        # monitoring configuration

app-manifests/
├── app-a/
│   ├── base/
│   └── overlays/
│       ├── dev/
│       ├── staging/
│       └── production/

Benefits: covers most scenarios, supports multiple environments, and is easy to maintain


Setup C: Large enterprise, full stack

Best for: teams of 100+ people with complex CI/CD requirements

# Deploy the whole family
helm install argocd argo/argo-cd -n argocd --create-namespace
helm install argo-workflows argo/argo-workflows -n argo --create-namespace
helm install argo-rollouts argo/argo-rollouts -n argo-rollouts --create-namespace
helm install argo-events argo/argo-events -n argo-events --create-namespace
helm install argocd-image-updater argo/argocd-image-updater -n argocd

# External tools to integrate
- Vault (secrets management)
- Harbor (image registry)
- Prometheus + Grafana (monitoring)
- ELK (logging)

Architecture:

Developer pushes code
    ↓
GitHub Webhook → Argo Events
    ↓
Triggers Argo Workflows (CI)
    ├── Code checks
    ├── Unit tests
    ├── Image build
    └── Security scan
    ↓
Image Updater detects the new image
    ↓
Updates the Git config repository
    ↓
Argo CD detects the change
    ↓
Argo Rollouts runs the canary release
    ├── 10% traffic
    ├── Automated analysis (Prometheus)
    ├── 50% traffic
    └── 100% traffic
    ↓
Running in production

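Best practices

1. Git repository organization

❌ Anti-pattern: application code and configuration in the same repository

my-app/
├── src/           # application code
├── Dockerfile
└── k8s/           # configuration files
    └── deployment.yaml

Problem: code changes trigger unnecessary deployments

✅ Recommended: a separate configuration repository

# Repository 1: application code
app-source/
├── src/
├── Dockerfile
└── .github/workflows/

# Repository 2: configuration
app-configs/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── dev/
    ├── staging/
    └── production/

Benefits: code releases and configuration changes are decoupled, and the audit trail is clear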


2. Managing multiple environments

Option 1: Kustomize overlays (recommended)

# base/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1  # base setting
  template:
    spec:
      containers:
      - name: app
        image: myregistry.com/app:latest
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"

# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:            # "bases" is deprecated in current Kustomize
- ../../base
patches:              # replaces the deprecated "patchesStrategicMerge"
- path: deployment-patch.yaml

# overlays/production/deployment-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 10  # production replica count
  template:
    spec:
      containers:
      - name: app
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"

Option 2: Helm values (suits complex applications)

# values-dev.yaml
replicaCount: 1
resources:
  limits:
    memory: 256Mi

# values-production.yaml
replicaCount: 10
resources:
  limits:
    memory: 2Gi
ingress:
  enabled: true
  hosts:
  - app.example.com

3. Secrets management best practices

❌ Wrong: storing passwords in plain text

apiVersion: v1
kind: Secret
metadata:
  name: db-password
stringData:
  password: "MyP@ssw0rd123"  # never do this!

✅ Recommended: external secrets management

# Option 1: Sealed Secrets
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-password
spec:
  encryptedData:
    password: AgBxG7... # ciphertext; safe to commit to Git
---
# Option 2: External Secrets Operator (recommended)
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-password
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: db-password
  data:
  - secretKey: password
    remoteRef:
      key: database/production
      property: password

Configure Argo CD to ignore sensitive fields:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  resource.customizations: |
    Secret:
      ignoreDifferences: |
        jsonPointers:
        - /data    

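4. Performance tuning

Argo CD tuning

# Settings for large installations (100+ applications)
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Controller concurrency is not set in this ConfigMap. It is configured with
  # controller.status.processors and controller.operation.processors
  # in the argocd-cmd-params-cm ConfigMap.
  
  # How often applications are reconciled against Git
  timeout.reconciliation: "180s"
  
  # Resource tracking method. This is unrelated to server-side apply, which is
  # enabled per Application with the ServerSideApply=true sync option.
  application.resourceTrackingMethod: "annotation+label"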

Workflows tuning

# Template with resource limits
spec:
  templates:
  - name: cpu-intensive-task
    container:
      image: myapp
      resources:
        requests:
          memory: "2Gi"
          cpu: "2000m"
        limits:
          memory: "4Gi"
          cpu: "4000m"
    # Node selection
    nodeSelector:
      workload: compute-intensive
    # Tolerations
    tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "workflows"
      effect: "NoSchedule"

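5. Monitoring and alerting

# Prometheus ServiceMonitor for Argo CD
# argocd_app_info is exported by the application controller, whose metrics
# Service is labelled argocd-metrics in the upstream manifests. Helm installs
# may need controller metrics enabled and may use different labels.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-metrics
  namespace: argocd
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-metrics
  endpoints:
  - port: metrics
---
# Example alert rules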
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-alerts
  namespace: argocd
spec:
  groups:
  - name: argocd
    interval: 30s
    rules:
    
    # Application has stayed out of sync
    - alert: ArgoAppOutOfSync
      expr: |
        argocd_app_info{sync_status="OutOfSync"} == 1
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "ArgoCD app {{ $labels.name }} has been OutOfSync for 10 minutes"
    
    # Application health is not Healthy
    - alert: ArgoAppUnhealthy
      expr: |
        argocd_app_info{health_status!="Healthy"} == 1
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "ArgoCD app {{ $labels.name }} is unhealthy"

Suggested Grafana dashboards (IDs on grafana.com; confirm each one before importing):

  • Argo CD dashboard: ID 14584
  • Argo Workflows dashboard: ID 13927
  • Argo Rollouts dashboard: ID 15386

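6. Disaster recovery and backups

# Back up Argo CD configuration
kubectl get applications -n argocd -o yaml > argocd-apps-backup.yaml
kubectl get appprojects -n argocd -o yaml > argocd-projects-backup.yaml

# Back up Workflows
kubectl get workflows -n argo -o yaml > workflows-backup.yaml
kubectl get workflowtemplates -n argo -o yaml > workflow-templates-backup.yaml

# Scheduled backup script
cat > backup-argo.sh <<'EOF'
#!/bin/bash
set -euo pipefail
BACKUP_DIR="/backup/argo/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# "kubectl get all" covers only built-in workload types, so also export
# ConfigMaps and Secrets (Argo CD keeps repositories, clusters and settings there).
# The output contains secret values: store and transfer it encrypted.
for ns in argocd argo argo-rollouts argo-events; do
  kubectl get all,configmaps,secrets -n "$ns" -o yaml > "$BACKUP_DIR/$ns-all.yaml"
done

# Compress and upload to S3
tar -czf "$BACKUP_DIR.tar.gz" "$BACKUP_DIR"
aws s3 cp "$BACKUP_DIR.tar.gz" s3://my-backups/argo/
EOF

chmod +x backup-argo.sh

# Add to crontab (daily at 02:00), keeping any existing entries
(crontab -l 2>/dev/null; echo "0 2 * * * /path/to/backup-argo.sh") | crontab -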

7. Security hardening checklist

Argo CD security settings

# 1. Enable RBAC
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.default: role:readonly
  policy.csv: |
    # Developers can only view and sync
    p, role:developer, applications, get, */*, allow
    p, role:developer, applications, sync, */*, allow
    g, dev-team, role:developer
    
    # Administrators have full access
    p, role:admin, *, *, *, allow
    g, admin-team, role:admin

# 2. Enable SSO (GitHub as an example)
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  url: https://argocd.example.com
  dex.config: |
    connectors:
    - type: github
      id: github
      name: GitHub
      config:
        clientID: $GITHUB_CLIENT_ID
        clientSecret: $GITHUB_CLIENT_SECRET
        orgs:
        - name: your-org    

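# 3. Restrict what a project may deploy
# Argo CD itself cannot filter container images by registry; enforce that with
# an admission policy engine such as Kyverno or OPA Gatekeeper. Within Argo CD,
# an AppProject limits the allowed source repositories and destinations.
# (The project name "production" is an example.)
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
  namespace: argocd
spec:
  sourceRepos:
  - https://github.com/yourorg/configs.git
  destinations:
  - server: https://kubernetes.default.svc
    namespace: production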

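Network policy

# Restrict Argo CD egress traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argocd-egress
  namespace: argocd
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/part-of: argocd
  policyTypes:
  - Egress
  egress:
  # Allow traffic between Argo CD components (repo-server, Redis, Dex)
  - to:
    - podSelector: {}

  # Allow DNS lookups
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53

  # Allow the Kubernetes API and external Git hosts.
  # These are usually not Pods, so they need an ipBlock; narrow the CIDR
  # and add the API server port (often 6443) to fit your environment.
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
    ports:
    - protocol: TCP
      port: 22
    - protocol: TCP
      port: 443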

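FAQ and troubleshooting

Q1: An Argo CD sync is stuck in "Progressing"

Cause: Pods fail to start or fail their health checks

Troubleshooting steps:

# 1. Inspect the application
argocd app get my-app

# 2. Check the state of the underlying resources
kubectl get pods -n production
kubectl describe pod <pod-name> -n production

# 3. Read the logs
kubectl logs <pod-name> -n production

# 4. Force a refresh from Git (bypasses the manifest cache), then sync again
argocd app get my-app --hard-refresh
argocd app sync my-app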

Q2: A Workflow fails with a permissions error

Fix: configure a ServiceAccount and RBAC

# Create the ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: workflow-executor
  namespace: argo

---
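# Grant permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: workflow-executor
  namespace: argo
rules:
# Executors in Argo Workflows v3.4 and later report step results through this resource
- apiGroups: ["argoproj.io"]
  resources: ["workflowtaskresults"]
  verbs: ["create", "patch"]
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "watch", "list"]
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get"]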

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: workflow-executor
  namespace: argo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: workflow-executor
subjects:
- kind: ServiceAccount
  name: workflow-executor
  namespace: argo

---
# Use it in the Workflow
spec:
  serviceAccountName: workflow-executor

Q3: A Rollout stays in "Degraded"

Common causes and fixes:

# 1. Check the Rollout status
kubectl argo rollouts status my-app -n production

# 2. Inspect the analysis results
kubectl get analysisrun -n production
kubectl describe analysisrun <analysis-name> -n production

# 3. If the metric query is the problem, check Prometheus
kubectl logs -n monitoring prometheus-server-xxx

# 4. Skip the current step (emergencies only)
kubectl argo rollouts promote my-app -n production --skip-current-step

# 5. Roll back completely
kubectl argo rollouts undo my-app -n production

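Q4: Multi-cluster management best practices

# 1. Register the clusters in Argo CD (the first argument is a kubeconfig context name)
argocd cluster add dev-cluster --name dev
argocd cluster add staging-cluster --name staging
argocd cluster add prod-cluster --name prod

# 2. Use an ApplicationSet to manage the application across clusters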
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: my-app-multi-cluster
  namespace: argocd
spec:
  generators:
  - list:
      elements:
      - cluster: dev
        url: https://dev-cluster
        namespace: default
      - cluster: staging
        url: https://staging-cluster
        namespace: default
      - cluster: prod
        url: https://prod-cluster
        namespace: production
  
  template:
    metadata:
      name: 'my-app-{{cluster}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/yourorg/configs.git
        targetRevision: main
        path: 'apps/my-app/overlays/{{cluster}}'
      destination:
        server: '{{url}}'
        namespace: '{{namespace}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true

Advanced topics

Custom health checks

# ConfigMap: argocd-cm (data section)
resource.customizations: |
  argoproj.io/Rollout:
    health.lua: |
      hs = {}
      if obj.status ~= nil then
        if obj.status.phase == "Healthy" then
          hs.status = "Healthy"
          hs.message = "Rollout is healthy"
          return hs
        end
        if obj.status.phase == "Degraded" then
          hs.status = "Degraded"
          hs.message = obj.status.message
          return hs
        end
      end
      hs.status = "Progressing"
      hs.message = "Waiting for rollout"
      return hs

Notification integration

# Argo CD notifications configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  # Slack notifications
  service.slack: |
    token: $slack-token    
  
  # Triggers
  trigger.on-sync-succeeded: |
    - when: app.status.operationState.phase in ['Succeeded']
      send: [app-sync-succeeded]    
  
  trigger.on-sync-failed: |
    - when: app.status.operationState.phase in ['Error', 'Failed']
      send: [app-sync-failed]    
  
  # Templates
  template.app-sync-succeeded: |
    message: |
      Application {{.app.metadata.name}} has been successfully synced.
      Sync Status: {{.app.status.sync.status}}
    slack:
      attachments: |
        [{
          "title": "{{.app.metadata.name}}",
          "title_link": "{{.context.argocdUrl}}/applications/{{.app.metadata.name}}",
          "color": "good"
        }]
  
  # Referenced by trigger.on-sync-failed above
  template.app-sync-failed: |
    message: |
      Application {{.app.metadata.name}} failed to sync.

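Learning resources and community

Official documentation

Hands-on tutorials

Community resources

Certification and training

  • GitOps certification courses offered through the CNCF
  • Codefresh GitOps Fundamentals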

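Summary

Quick decision tree

Deploying applications to Kubernetes?
    ├─ Yes → use Argo CD (required)
    │
    ├─ Need complex build pipelines?
    │   └─ Yes → add Argo Workflows
    │
    ├─ Need canary or blue-green deployments?
    │   └─ Yes → add Argo Rollouts
    │
    ├─ Need event-driven automation?
    │   └─ Yes → add Argo Events
    │
    └─ Want image versions updated automatically?
        └─ Yes → add Image Updater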

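Core value

  1. GitOps mindset: everything is code, and Git is the single source of truth
  2. Declarative management: describe the desired state and let the system converge to it
  3. Automation: fewer manual steps, better efficiency and reliability
  4. Observability: a complete audit log and change history
  5. Cloud-native: built for Kubernetes and deeply integrated with it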

Start your Argo journey

# Step 1: Install Argo CD (about 5 minutes)
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Step 2: Deploy your first application (about 3 minutes)
argocd app create my-first-app \
  --repo https://github.com/argoproj/argocd-example-apps.git \
  --path guestbook \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace default

argocd app sync my-first-app

# Step 3: Enjoy GitOps 🎉

Good luck on your cloud-native journey! If you have questions, the Argo community is a good place to ask.

