The Argo Family: A Complete Beginner's Guide to the GitOps Toolkit for Kubernetes
Learn the Argo ecosystem from scratch and build a modern cloud-native CI/CD platform
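Argo Family Overview
Argo is a set of open-source tools built specifically for Kubernetes. It is a graduated project of the CNCF (Cloud Native Computing Foundation), and together its tools cover the path from building an application to deploying and releasing it.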
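Family Members at a Glance
| Tool | Core function | Typical use | Maturity |
|---|---|---|---|
| Argo CD | GitOps continuous delivery | Application releases, configuration management | ⭐⭐⭐⭐⭐ Production-ready |
| Argo Workflows | Workflow orchestration engine | CI/CD, data processing | ⭐⭐⭐⭐⭐ Production-ready |
| Argo Rollouts | Progressive delivery | Canary and blue-green deployments | ⭐⭐⭐⭐ Stable |
| Argo Events | Event-driven automation | Webhooks, message queues | ⭐⭐⭐⭐ Stable |
| Argo Image Updater | Automatic image version updates | Automated deployment | ⭐⭐⭐ Usable |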
架构关系图
┌─────────────────────────────────────────────────────┐
│ Git Repository (Single Source of Truth) │
└─────────────────┬───────────────────────────────────┘
│
▼
┌────────────────┐
│ Argo CD │ ◄─── 监控 Git 变化
│ (GitOps 核心) │ 自动同步到 K8s
└────────┬───────┘
│
┌───────────┼───────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Rollouts │ │ Workflows│ │ Events │
│(部署策略) │ │(任务编排) │ │(事件触发) │
└──────────┘ └──────────┘ └──────────┘
│ │ │
└───────────┴───────────┘
│
▼
┌────────────────┐
│ Kubernetes │
│ Cluster │
└────────────────┘
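Core Components in Detail
1️⃣ Argo CD - The Foundation of GitOps Continuous Delivery
What is GitOps?
Traditional deployment: run kubectl apply by hand → configuration drifts between environments → changes are hard to trace
GitOps deployment: Git is the single source of truth → changes sync automatically → state is managed declaratively
Core concepts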
# Application 定义示例
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: my-app
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/yourorg/configs.git
targetRevision: main
path: apps/production
destination:
server: https://kubernetes.default.svc
namespace: production
syncPolicy:
automated:
prune: true # 自动删除不再存在的资源
selfHeal: true # 自动修复漂移
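The effect of `prune: true` can be pictured as a set difference between what Git declares and what is live in the cluster. The sketch below is only an illustration: plain text files stand in for Git and the cluster, and the function name `resources_to_prune` and the resource names are hypothetical. Argo CD's real comparison works on full manifests, not on names.

```shell
#!/usr/bin/env bash
# Illustration only: model "desired" (Git) and "live" (cluster) state as
# lists of resource names and compute what automated pruning would delete.

# resources_to_prune DESIRED_FILE LIVE_FILE
# Prints resources that are live in the cluster but no longer declared in Git.
resources_to_prune() {
  local desired_sorted live_sorted
  desired_sorted=$(mktemp)
  live_sorted=$(mktemp)
  sort -u "$1" > "$desired_sorted"
  sort -u "$2" > "$live_sorted"
  # comm -13 keeps lines that appear only in the second file (live state).
  comm -13 "$desired_sorted" "$live_sorted"
  rm -f "$desired_sorted" "$live_sorted"
}

# Hypothetical example data.
demo_dir=$(mktemp -d)
printf '%s\n' deployment/my-app service/my-app > "$demo_dir/desired.txt"
printf '%s\n' deployment/my-app service/my-app configmap/old-settings > "$demo_dir/live.txt"

resources_to_prune "$demo_dir/desired.txt" "$demo_dir/live.txt"
```

In this toy data set only `configmap/old-settings` is live without being declared, so it is the one resource pruning would remove. `selfHeal` is the complementary case: a resource exists on both sides but its live content has drifted from Git.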
安装步骤
# 1. 添加 Helm 仓库
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update
# 2. 安装 Argo CD
helm install argocd argo/argo-cd \
--namespace argocd \
--create-namespace \
--set server.service.type=LoadBalancer
# 3. 等待 Pod 就绪
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=argocd-server -n argocd --timeout=300s
# 4. 获取初始管理员密码
kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d && echo
# 5. 访问 UI(任选一种方式)
# 方式 1: 端口转发
kubectl port-forward svc/argocd-server -n argocd 8080:443
# 方式 2: 如果使用 LoadBalancer
kubectl get svc argocd-server -n argocd
# 访问: https://<EXTERNAL-IP>
首次使用流程
# 1. 安装 ArgoCD CLI(可选但推荐)
curl -sSL -o argocd-linux-amd64 https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64
sudo install -m 555 argocd-linux-amd64 /usr/local/bin/argocd
rm argocd-linux-amd64
# 2. CLI 登录
argocd login localhost:8080 --username admin --password <初始密码>
# 3. 修改密码
argocd account update-password
# 4. 创建第一个应用
argocd app create guestbook \
--repo https://github.com/argoproj/argocd-example-apps.git \
--path guestbook \
--dest-server https://kubernetes.default.svc \
--dest-namespace default
# 5. 同步应用
argocd app sync guestbook
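2️⃣ Argo Workflows - A Kubernetes-Native Workflow Engine
Key advantages
- Container-native: every step runs as a container
- DAG support: orchestrates complex dependency graphs
- Parallel execution: tasks that do not depend on each other run in parallel automatically
- Resource-efficient: step pods exist only while they run, so no build agents sit idle (savings of up to about 60% compared with Jenkins, depending on workload)
Workflow example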
# 简单的 CI 流水线
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: ci-pipeline-
spec:
entrypoint: ci-pipeline
templates:
# 主流程定义
- name: ci-pipeline
dag:
tasks:
- name: clone
template: git-clone
- name: test
dependencies: [clone]
template: run-tests
- name: build
dependencies: [test]
template: docker-build
- name: push
dependencies: [build]
template: docker-push
# 子任务模板
- name: git-clone
container:
image: alpine/git
command: [sh, -c]
args: ["git clone https://github.com/yourorg/app.git /work"]
volumeMounts:
- name: workdir
mountPath: /work
- name: run-tests
container:
image: node:18
command: [sh, -c]
args: ["cd /work && npm install && npm test"]
volumeMounts:
- name: workdir
mountPath: /work
- name: docker-build
container:
image: gcr.io/kaniko-project/executor:latest
args:
- "--context=/work"
- "--dockerfile=/work/Dockerfile"
- "--destination=myregistry.com/app:{{workflow.uid}}"
volumeMounts:
- name: workdir
mountPath: /work
- name: docker-push
container:
image: curlimages/curl
command: [sh, -c]
args: ["echo 'Image pushed successfully'"]
# 共享存储卷
volumeClaimTemplates:
- metadata:
name: workdir
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Gi
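The `dependencies` fields in the DAG above define a partial order, and the controller starts a task once everything it depends on has finished. As a rough local illustration (not how Argo Workflows is implemented), the same edges can be fed to `tsort` from coreutils; `dag_order` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Illustration only: derive a valid execution order for the DAG tasks
# from their "dependencies" using tsort (topological sort, coreutils).

# Each input line is "<dependency> <task>", mirroring the dag.tasks section.
dag_order() {
  tsort <<'EDGES'
clone test
test build
build push
EDGES
}

dag_order
```

Because this pipeline is a straight chain, only one order is valid. With a wider graph, tasks that share no dependency path could run at the same time, which is where the parallel execution mentioned earlier comes from.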
安装与使用
# 1. 安装 Argo Workflows
helm install argo-workflows argo/argo-workflows \
--namespace argo \
--create-namespace \
--set server.serviceType=LoadBalancer
# 2. 访问 UI
kubectl -n argo port-forward svc/argo-workflows-server 2746:2746
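# 3. Submit the workflow (the example uses generateName, which works with create but not with apply)
kubectl create -n argo -f my-workflow.yaml
# 4. Check workflow status
kubectl get workflows -n argo
# 5. View logs
kubectl logs -n argo <workflow-pod-name>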
3️⃣ Argo Rollouts - 渐进式交付的利器
支持的部署策略
# 金丝雀发布(Canary)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: my-app-canary
spec:
replicas: 10
strategy:
canary:
steps:
- setWeight: 10 # 10% 流量到新版本
- pause: {duration: 2m}
- setWeight: 25 # 增加到 25%
- pause: {duration: 2m}
- setWeight: 50 # 增加到 50%
- pause: {duration: 5m}
- setWeight: 75 # 增加到 75%
- pause: {} # 手动审批
- setWeight: 100 # 全量发布
# 分析配置(自动回滚)
analysis:
templates:
- templateName: success-rate
startingStep: 2
args:
- name: service-name
value: my-app
template:
metadata:
labels:
app: my-app
spec:
containers:
- name: app
image: myregistry.com/app:v2.0.0
# 蓝绿部署(Blue-Green)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: my-app-bluegreen
spec:
replicas: 5
strategy:
blueGreen:
activeService: my-app-active # 生产流量
previewService: my-app-preview # 预览流量
autoPromotionEnabled: false # 手动切换
scaleDownDelaySeconds: 300 # 保留旧版本 5 分钟
template:
spec:
containers:
- name: app
image: myregistry.com/app:v2.0.0
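In the canary example earlier in this section no traffic router is configured, so Argo Rollouts can only approximate each `setWeight` through the ratio of canary pods to stable pods. A rough sketch of that arithmetic follows, assuming the canary pod count is rounded up; `canary_replicas` is a hypothetical helper, not part of the Rollouts CLI.

```shell
#!/usr/bin/env bash
# canary_replicas TOTAL_REPLICAS WEIGHT_PERCENT
# Prints ceil(TOTAL_REPLICAS * WEIGHT_PERCENT / 100) using integer math.
canary_replicas() {
  local total=$1 weight=$2
  echo $(( (total * weight + 99) / 100 ))
}

# Steps from the canary example: 10 replicas, weights 10/25/50/75/100.
for weight in 10 25 50 75 100; do
  echo "setWeight ${weight} -> $(canary_replicas 10 "$weight") canary pod(s)"
done
```

With 10 replicas the requested 25% becomes 3 pods, so the effective share is closer to 30%. Exact percentages need a traffic router such as the NGINX integration used in the canary scenario later in this guide.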
安装与集成
# 1. 安装 Rollouts Controller
helm install argo-rollouts argo/argo-rollouts \
--namespace argo-rollouts \
--create-namespace
# 2. 安装 kubectl 插件
curl -LO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64
sudo install -m 555 kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts
# 3. 查看 Rollout 状态
kubectl argo rollouts get rollout my-app -n production --watch
# 4. 手动推进金丝雀发布
kubectl argo rollouts promote my-app -n production
# 5. 紧急回滚
kubectl argo rollouts abort my-app -n production
kubectl argo rollouts undo my-app -n production
4️⃣ Argo Events - 事件驱动的自动化枢纽
核心概念
事件源 (Event Source) → 传感器 (Sensor) → 触发器 (Trigger)
│ │ │
Webhook 条件判断 创建 Workflow
Git Push 事件过滤 发送通知
消息队列 数据转换 更新 Rollout
实战示例:Webhook 触发部署
# 1. 定义事件源(接收 GitHub Webhook)
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
name: github-webhook
namespace: argo-events
spec:
service:
ports:
- port: 12000
targetPort: 12000
webhook:
github-push:
port: "12000"
endpoint: /push
method: POST
# 2. Define the Sensor (triggers a Workflow)
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: github-sensor
  namespace: argo-events
spec:
  dependencies:
    - name: github-dep
      eventSourceName: github-webhook
      eventName: github-push
      filters:
        data:
          - path: body.ref
            type: string
            value:
              - "refs/heads/main"   # only react to pushes on main
  triggers:
    - template:
        name: trigger-workflow
        k8s:
          operation: create
          source:
            resource:
              apiVersion: argoproj.io/v1alpha1
              kind: Workflow
              metadata:
                generateName: ci-pipeline-
              spec:
                entrypoint: build-and-deploy
                arguments:
                  parameters:
                    - name: commit
                      value: ""     # overwritten by the trigger parameter below
                templates:
                  - name: build-and-deploy
                    container:
                      image: alpine
                      command: [sh, -c]
                      args: ["echo 'Building from commit {{workflow.parameters.commit}}'"]
          parameters:
            - src:
                dependencyName: github-dep
                dataKey: body.head_commit.id
              dest: spec.arguments.parameters.0.value
安装与配置
# 1. 安装 Argo Events
helm install argo-events argo/argo-events \
--namespace argo-events \
--create-namespace
# 2. 创建 EventBus(消息总线)
kubectl apply -n argo-events -f - <<EOF
apiVersion: argoproj.io/v1alpha1
kind: EventBus
metadata:
name: default
spec:
nats:
native:
replicas: 3
EOF
# 3. 部署事件源和传感器
kubectl apply -f event-source.yaml
kubectl apply -f sensor.yaml
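# 4. Expose the webhook endpoint (port-forward is for local testing only;
#    GitHub needs a publicly reachable address, e.g. via an Ingress or LoadBalancer)
kubectl -n argo-events port-forward svc/github-webhook-eventsource-svc 12000:12000
# 5. Configure the webhook in GitHub
#    URL: http://your-domain:12000/push
#    Content type: application/json
#    Events: Push events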
5️⃣ Argo Image Updater - 自动化镜像更新
工作原理
容器镜像仓库 → Image Updater 检测新版本 → 更新 Git 配置 → ArgoCD 自动部署
配置示例
# 在 ArgoCD Application 添加注解
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: my-app
namespace: argocd
annotations:
# 启用自动更新
argocd-image-updater.argoproj.io/image-list: myapp=myregistry.com/app
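# Update strategy: most recently built image (called `latest` here; newer Image Updater releases name it `newest-build`)
argocd-image-updater.argoproj.io/myapp.update-strategy: latest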
# 或者使用语义化版本
# argocd-image-updater.argoproj.io/myapp.update-strategy: semver
# argocd-image-updater.argoproj.io/myapp.allow-tags: regexp:^v[0-9]+\.[0-9]+\.[0-9]+$
# Git 写回配置
argocd-image-updater.argoproj.io/write-back-method: git
argocd-image-updater.argoproj.io/git-branch: main
spec:
source:
repoURL: https://github.com/yourorg/app-configs.git
path: overlays/production
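The commented-out `semver` alternative above combines a tag filter (`allow-tags`) with version ordering. The sketch below only mimics that selection locally and assumes GNU `sort -V`; the tag list and the helper name `pick_newest_semver_tag` are hypothetical, and Image Updater's own semver handling may differ in edge cases such as pre-release tags.

```shell
#!/usr/bin/env bash
# pick_newest_semver_tag: reads tags on stdin, keeps those matching the
# allow-tags pattern ^v[0-9]+\.[0-9]+\.[0-9]+$, prints the highest version.
pick_newest_semver_tag() {
  grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -n 1
}

# Hypothetical registry tags.
printf '%s\n' latest v1.2.0 v1.10.0 v1.9.3 dev-abc123 v2.0.0-rc1 | pick_newest_semver_tag
```

The filter drops `latest`, `dev-abc123` and the release candidate, and version-aware sorting ranks `v1.10.0` above `v1.9.3`, which a plain alphabetical sort would get wrong.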
安装步骤
# 1. Install Image Updater in the same namespace as Argo CD.
#    In this setup it reads Applications through the Kubernetes API, so no Argo CD API token is needed.
#    (The admin.password field in argocd-secret is a bcrypt hash, not a usable token.)
helm install argocd-image-updater argo/argocd-image-updater \
  --namespace argocd
# 2. 配置镜像仓库认证(如果需要)
kubectl create secret generic regcred \
--from-file=.dockerconfigjson=$HOME/.docker/config.json \
--type=kubernetes.io/dockerconfigjson \
-n argocd
# 3. 查看更新日志
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-image-updater -f
快速上手实战
场景 1:零基础部署第一个 GitOps 应用
目标: 10 分钟内完成 Git → Kubernetes 的自动部署
# Step 1: 安装 Argo CD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# Step 2: 访问 UI
kubectl port-forward svc/argocd-server -n argocd 8080:443 &
# Step 3: 获取密码并登录
ARGOCD_PASSWORD=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)
echo "ArgoCD Password: $ARGOCD_PASSWORD"
# 浏览器访问 https://localhost:8080
# 用户名: admin
# 密码: 上面输出的密码
# Step 4: 通过 UI 创建应用
# - Repository URL: https://github.com/argoproj/argocd-example-apps.git
# - Path: guestbook
# - Cluster: in-cluster
# - Namespace: default
# Step 5: 点击 "Sync" 按钮,几秒后应用就部署完成!
# 验证
kubectl get all -n default | grep guestbook
场景 2:构建完整的 CI/CD 流水线
目标: 代码推送 → 自动构建 → 自动测试 → 自动部署
# 架构:GitHub Webhook → Argo Events → Argo Workflows → Argo CD
# 1. 准备 Git 仓库结构
your-app-repo/
├── src/ # 应用代码
├── Dockerfile
├── k8s/
│ ├── base/
│ │ ├── deployment.yaml
│ │ └── kustomization.yaml
│ └── overlays/
│ └── production/
│ └── kustomization.yaml
└── .github/
└── workflows/ # (可选)保留用于其他 CI 任务
# 2. Create the WorkflowTemplate
kubectl apply -f - <<EOF
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: ci-template
  namespace: argo
spec:
  entrypoint: ci-pipeline
  arguments:
    parameters:
      - name: repo-url
      - name: revision
  # Shared volume: each step runs in its own pod, so /work must be a common volume
  volumeClaimTemplates:
    - metadata:
        name: workdir
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
  templates:
    - name: ci-pipeline
      dag:
        tasks:
          - name: checkout
            template: git-clone
          - name: unit-test
            dependencies: [checkout]
            template: run-tests
          - name: build-image
            dependencies: [unit-test]
            template: kaniko-build
          - name: update-manifest
            dependencies: [build-image]
            template: update-k8s
    - name: git-clone
      script:
        image: alpine/git
        command: [sh]
        source: |
          git clone {{workflow.parameters.repo-url}} /work
          cd /work && git checkout {{workflow.parameters.revision}}
        volumeMounts:
          - name: workdir
            mountPath: /work
    - name: run-tests
      container:
        image: node:18
        command: [sh, -c]
        args: ["cd /work && npm ci && npm test"]
        volumeMounts:
          - name: workdir
            mountPath: /work
    - name: kaniko-build
      container:
        image: gcr.io/kaniko-project/executor:latest
        args:
          - "--context=/work"
          - "--destination=myregistry.com/app:{{workflow.parameters.revision}}"
        volumeMounts:
          - name: workdir
            mountPath: /work
    - name: update-k8s
      script:
        image: alpine/git
        command: [sh]
        source: |
          apk add yq
          cd /work/k8s/overlays/production
          yq e ".images[0].newTag = \"{{workflow.parameters.revision}}\"" -i kustomization.yaml
          git add .
          # Assumes git user.name/user.email and push credentials are configured in this container
          git commit -m "Update image to {{workflow.parameters.revision}}"
          # The checkout above leaves a detached HEAD, so name the target branch explicitly
          git push origin HEAD:main
        volumeMounts:
          - name: workdir
            mountPath: /work
EOF
# 3. 配置 GitHub Webhook(Events 部分见上文)
# 4. 创建 ArgoCD Application 监听 k8s 目录
argocd app create my-app \
--repo https://github.com/yourorg/your-app.git \
--path k8s/overlays/production \
--dest-server https://kubernetes.default.svc \
--dest-namespace production \
--sync-policy automated
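How it flows:
- A developer pushes code to the main branch
- The GitHub webhook triggers Argo Events
- Argo Events creates a Workflow instance
- The Workflow runs: clone → test → build → update manifest
- Argo CD detects the Git change and syncs it to the cluster
- The whole path from code to production can complete in about 5 minutes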
场景 3:实现金丝雀发布
目标: 新版本灰度发布,自动监控,异常自动回滚
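# 1. Install Prometheus (used for metrics)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace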
# 2. 创建 AnalysisTemplate(定义成功标准)
kubectl apply -f - <<EOF
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
name: success-rate
namespace: production
spec:
args:
- name: service-name
metrics:
- name: success-rate
interval: 1m
count: 5
successCondition: result[0] >= 0.95
failureLimit: 2
provider:
prometheus:
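address: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090   # Service created by kube-prometheus-stack for a release named "prometheus"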
query: |
sum(rate(http_requests_total{
service="{{args.service-name}}",
status!~"5.."
}[1m]))
/
sum(rate(http_requests_total{
service="{{args.service-name}}"
}[1m]))
EOF
# 3. 创建 Rollout 资源
kubectl apply -f - <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: my-app
namespace: production
spec:
replicas: 10
strategy:
canary:
canaryService: my-app-canary
stableService: my-app-stable
trafficRouting:
nginx:
stableIngress: my-app-ingress
steps:
- setWeight: 10
- pause: {duration: 2m}
- setWeight: 30
- pause: {duration: 2m}
# 自动分析
- setWeight: 50
- analysis:
templates:
- templateName: success-rate
args:
- name: service-name
value: my-app
- setWeight: 100
revisionHistoryLimit: 3
selector:
matchLabels:
app: my-app
template:
metadata:
labels:
app: my-app
spec:
containers:
- name: app
image: myregistry.com/app:v1.0.0
ports:
- containerPort: 8080
EOF
# 4. 更新镜像触发金丝雀发布
kubectl argo rollouts set image my-app app=myregistry.com/app:v2.0.0 -n production
# 5. 实时监控发布过程
kubectl argo rollouts get rollout my-app -n production --watch
# 6. 如果需要手动干预
kubectl argo rollouts promote my-app -n production # 推进到下一步
kubectl argo rollouts abort my-app -n production # 中止并回滚
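The AnalysisTemplate in step 2 takes five measurements one minute apart (`interval: 1m`, `count: 5`), counts a measurement as successful when the success rate is at least 0.95, and tolerates at most two failed measurements (`failureLimit: 2`). The sketch below replays that decision rule on made-up numbers. It assumes the run fails as soon as the number of failed measurements exceeds the limit; `analysis_verdict` is a hypothetical helper, not a Rollouts command.

```shell
#!/usr/bin/env bash
# analysis_verdict THRESHOLD FAILURE_LIMIT MEASUREMENT...
# Prints "Successful" or "Failed" for a list of success-rate measurements.
analysis_verdict() {
  local threshold=$1 failure_limit=$2
  shift 2
  local failures=0 value
  for value in "$@"; do
    # awk does the floating-point comparison and exits 0 when value < threshold.
    if awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v < t) }'; then
      failures=$((failures + 1))
    fi
    if [ "$failures" -gt "$failure_limit" ]; then
      echo "Failed"
      return 0
    fi
  done
  echo "Successful"
}

# Two measurements below 0.95: still within failureLimit.
analysis_verdict 0.95 2 0.99 0.93 0.97 0.94 0.98
# Three measurements below 0.95: over the limit.
analysis_verdict 0.95 2 0.99 0.93 0.92 0.94 0.98
```

A failed analysis run aborts the Rollout and shifts traffic back to the stable version, which is the automatic rollback this scenario is after.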
典型场景方案
方案 A:小团队快速起步(最小化)
适用: 5-20 人团队,简单微服务架构
# 只需 Argo CD
helm install argocd argo/argo-cd -n argocd --create-namespace
# Git 仓库组织
app-configs/
├── apps/
│ ├── service-a/
│ ├── service-b/
│ └── service-c/
└── argocd/
└── applications.yaml # 定义所有 Application
优势: 部署简单,5 分钟上手,适合快速验证 GitOps
方案 B:中型团队标准配置(推荐)
适用: 20-100 人团队,多环境管理
# 1. 核心三件套
helm install argocd argo/argo-cd -n argocd --create-namespace
helm install argo-rollouts argo/argo-rollouts -n argo-rollouts --create-namespace
helm install argocd-image-updater argo/argocd-image-updater -n argocd
# 2. Git 仓库结构(推荐 Monorepo)
infrastructure/
├── clusters/
│ ├── dev/
│ ├── staging/
│ └── production/
├── base/ # 公共配置
└── monitoring/ # 监控配置
app-manifests/
├── app-a/
│ ├── base/
│ └── overlays/
│ ├── dev/
│ ├── staging/
│ └── production/
优势: 覆盖 90% 场景,支持多环境,易维护
方案 C:大型企业完整方案
适用: 100+ 人团队,复杂 CI/CD 需求
# 全家桶部署
helm install argocd argo/argo-cd -n argocd --create-namespace
helm install argo-workflows argo/argo-workflows -n argo --create-namespace
helm install argo-rollouts argo/argo-rollouts -n argo-rollouts --create-namespace
helm install argo-events argo/argo-events -n argo-events --create-namespace
helm install argocd-image-updater argo/argocd-image-updater -n argocd
# 集成外部工具
- Vault (密钥管理)
- Harbor (镜像仓库)
- Prometheus + Grafana (监控)
- ELK (日志)
架构图:
开发者提交代码
↓
GitHub Webhook → Argo Events
↓
触发 Argo Workflows (CI)
├── 代码检查
├── 单元测试
├── 镜像构建
└── 安全扫描
↓
Image Updater 检测新镜像
↓
更新 Git 配置仓库
↓
Argo CD 检测变化
↓
Argo Rollouts 执行金丝雀发布
├── 10% 流量
├── 自动分析(Prometheus)
├── 50% 流量
└── 100% 流量
↓
生产环境运行
最佳实践建议
1. Git 仓库组织策略
❌ 反模式:应用代码和配置混在一起
my-app/
├── src/ # 应用代码
├── Dockerfile
└── k8s/ # 配置文件
└── deployment.yaml
问题:代码变更触发不必要的部署
✅ 推荐:配置仓库分离
# 仓库 1: 应用代码
app-source/
├── src/
├── Dockerfile
└── .github/workflows/
# 仓库 2: 配置管理
app-configs/
├── base/
│ ├── deployment.yaml
│ ├── service.yaml
│ └── kustomization.yaml
└── overlays/
├── dev/
├── staging/
└── production/
优势:代码发布和配置变更解耦,审计清晰
2. 多环境管理策略
方式 1: Kustomize Overlays(推荐)
# base/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app
spec:
replicas: 1 # 基础配置
template:
spec:
containers:
- name: app
image: myregistry.com/app:latest
resources:
requests:
memory: "128Mi"
cpu: "100m"
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:            # replaces the deprecated `bases` field
  - ../../base
patches:              # replaces the deprecated `patchesStrategicMerge` field
  - path: deployment-patch.yaml
# overlays/production/deployment-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app
spec:
replicas: 10 # 生产环境副本数
template:
spec:
containers:
- name: app
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "1Gi"
cpu: "1000m"
方式 2: Helm Values(适合复杂应用)
# values-dev.yaml
replicaCount: 1
resources:
limits:
memory: 256Mi
# values-production.yaml
replicaCount: 10
resources:
limits:
memory: 2Gi
ingress:
enabled: true
hosts:
- app.example.com
3. 密钥管理最佳实践
❌ 错误做法:明文存储密码
apiVersion: v1
kind: Secret
metadata:
name: db-password
stringData:
password: "MyP@ssw0rd123" # 千万别这样!
✅ 推荐方案:外部密钥管理
# 方式 1: 使用 Sealed Secrets
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
name: db-password
spec:
encryptedData:
password: AgBxG7... # 加密后的密文,可以安全提交到 Git
# 方式 2: 使用 External Secrets Operator (推荐)
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: db-password
spec:
refreshInterval: 1h
secretStoreRef:
name: vault-backend
kind: SecretStore
target:
name: db-password
data:
- secretKey: password
remoteRef:
key: database/production
property: password
配置 ArgoCD 忽略敏感字段:
apiVersion: v1
kind: ConfigMap
metadata:
name: argocd-cm
namespace: argocd
data:
resource.customizations: |
Secret:
ignoreDifferences: |
jsonPointers:
- /data
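Returning to the plain Secret anti-pattern at the start of this section: even when a manifest uses `data` instead of `stringData`, the values are only base64-encoded, not encrypted. A small sketch with hypothetical helper names shows why that offers no protection in Git.

```shell
#!/usr/bin/env bash
# encode_secret_value: the form in which a value appears in a Secret's .data field.
encode_secret_value() {
  printf '%s' "$1" | base64
}

# decode_secret_value: anyone who can read the manifest can do this.
decode_secret_value() {
  printf '%s' "$1" | base64 -d
}

encoded=$(encode_secret_value 'MyP@ssw0rd123')
echo "stored in manifest:  $encoded"
echo "recovered by reader: $(decode_secret_value "$encoded")"
```

Sealed Secrets and External Secrets Operator avoid this by keeping either real ciphertext or only a reference to the secret in Git.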
4. 性能优化技巧
ArgoCD 优化
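# Large-scale setup (managing 100+ applications)
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  # Number of concurrent sync operations
  controller.operation.processors: "20"
  # Number of concurrent reconciliation (status) workers
  controller.status.processors: "40"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # How often each application is re-reconciled against Git
  timeout.reconciliation: "180s"
  # How Argo CD tracks which resources belong to an application
  # (this setting is unrelated to server-side apply)
  application.resourceTrackingMethod: "annotation+label"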
Workflows 优化
# 资源限制模板
spec:
templates:
- name: cpu-intensive-task
container:
image: myapp
resources:
requests:
memory: "2Gi"
cpu: "2000m"
limits:
memory: "4Gi"
cpu: "4000m"
# 节点选择
nodeSelector:
workload: compute-intensive
# 容忍度
tolerations:
- key: "dedicated"
operator: "Equal"
value: "workflows"
effect: "NoSchedule"
5. 监控与告警配置
# Prometheus ServiceMonitor for Argo CD
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: argocd-metrics
namespace: argocd
spec:
selector:
matchLabels:
app.kubernetes.io/name: argocd-metrics   # application controller metrics Service in the upstream manifests; it exports argocd_app_info
endpoints:
- port: metrics
# 告警规则示例
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: argocd-alerts
namespace: argocd
spec:
groups:
- name: argocd
interval: 30s
rules:
# Application has stayed OutOfSync for too long
- alert: ArgoAppOutOfSync
expr: |
argocd_app_info{sync_status="OutOfSync"} == 1
for: 10m
labels:
severity: warning
annotations:
summary: "ArgoCD app {{ $labels.name }} has been OutOfSync for 10 minutes"
# 应用健康状态异常
- alert: ArgoAppUnhealthy
expr: |
argocd_app_info{health_status!="Healthy"} == 1
for: 5m
labels:
severity: critical
annotations:
summary: "ArgoCD app {{ $labels.name }} is unhealthy"
Recommended Grafana dashboards:
- Official Argo CD dashboard: ID 14584
- Argo Workflows dashboard: ID 13927
- Argo Rollouts dashboard: ID 15386
6. 灾难恢复与备份
# 备份 ArgoCD 配置
kubectl get applications -n argocd -o yaml > argocd-apps-backup.yaml
kubectl get appprojects -n argocd -o yaml > argocd-projects-backup.yaml
# 备份 Workflows
kubectl get workflows -n argo -o yaml > workflows-backup.yaml
kubectl get workflowtemplates -n argo -o yaml > workflow-templates-backup.yaml
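# Scheduled backup script
cat > backup-argo.sh <<'EOF'
#!/bin/bash
set -euo pipefail
BACKUP_DIR="/backup/argo/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
# `kubectl get all` skips custom resources, ConfigMaps and Secrets, so list them explicitly.
# Exported Secrets contain credentials: keep the archive encrypted and access-controlled.
for ns in argocd argo argo-rollouts argo-events; do
  kubectl get all,configmaps,secrets -n "$ns" -o yaml > "$BACKUP_DIR/$ns-core.yaml"
done
# Custom resources of every Argo CRD installed in the cluster
for crd in $(kubectl get crd -o name | grep 'argoproj.io'); do
  kind="${crd#customresourcedefinition.apiextensions.k8s.io/}"
  kubectl get "$kind" --all-namespaces -o yaml > "$BACKUP_DIR/$kind.yaml"
done
# Compress and upload to S3
tar -czf "$BACKUP_DIR.tar.gz" -C "$(dirname "$BACKUP_DIR")" "$(basename "$BACKUP_DIR")"
aws s3 cp "$BACKUP_DIR.tar.gz" s3://my-backups/argo/
EOF
chmod +x backup-argo.sh
# Add to crontab (daily at 02:00) without overwriting existing entries
(crontab -l 2>/dev/null; echo "0 2 * * * /path/to/backup-argo.sh") | crontab -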
7. 安全加固清单
ArgoCD 安全配置
# 1. 启用 RBAC
apiVersion: v1
kind: ConfigMap
metadata:
name: argocd-rbac-cm
namespace: argocd
data:
policy.default: role:readonly
policy.csv: |
# 开发者只能查看和同步
p, role:developer, applications, get, */*, allow
p, role:developer, applications, sync, */*, allow
g, dev-team, role:developer
# 管理员完全权限
p, role:admin, *, *, *, allow
g, admin-team, role:admin
# 2. 启用 SSO(以 GitHub 为例)
apiVersion: v1
kind: ConfigMap
metadata:
name: argocd-cm
namespace: argocd
data:
url: https://argocd.example.com
dex.config: |
connectors:
- type: github
id: github
name: GitHub
config:
clientID: $GITHUB_CLIENT_ID
clientSecret: $GITHUB_CLIENT_SECRET
orgs:
- name: your-org
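# 3. Restrict what can be deployed
# Argo CD health checks (health.lua) only report status and cannot block images.
# Restrict Git sources and destinations with an AppProject, and enforce trusted
# image registries (e.g. myregistry.com, gcr.io/myproject) with an admission
# policy engine such as Kyverno or OPA Gatekeeper.
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
  namespace: argocd
spec:
  sourceRepos:
    - https://github.com/yourorg/*
  destinations:
    - server: https://kubernetes.default.svc
      namespace: production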
网络策略
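# Restrict Argo CD egress traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argocd-egress
  namespace: argocd
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/part-of: argocd
  policyTypes:
    - Egress
  egress:
    # Traffic between Argo CD components in this namespace (repo-server, Redis, Dex)
    - to:
        - podSelector: {}
    # DNS lookups
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Git over SSH/HTTPS and the Kubernetes API. These endpoints are usually not pods,
    # so pod and namespace selectors do not match them; narrow the CIDR for your environment.
    # Add port 6443 if your API server listens there.
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 22
        - protocol: TCP
          port: 443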
常见问题与解决方案
Q1: ArgoCD 同步卡在 “Progressing” 状态
原因: Pod 启动失败或健康检查失败
排查步骤:
# 1. 查看应用详情
argocd app get my-app
# 2. 查看具体资源状态
kubectl get pods -n production
kubectl describe pod <pod-name> -n production
# 3. 查看日志
kubectl logs <pod-name> -n production
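# 4. Force Argo CD to re-read Git and recompute the application state
#    (sync --force would delete and recreate resources, which is rarely what you want here)
argocd app get my-app --hard-refresh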
Q2: Workflow 执行失败,提示权限不足
解决方案: 配置 ServiceAccount 和 RBAC
# 创建 ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
name: workflow-executor
namespace: argo
---
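# Grant permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: workflow-executor
  namespace: argo
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "watch", "list"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
  # Needed by the executor to report step results (Argo Workflows v3.4 and later)
  - apiGroups: ["argoproj.io"]
    resources: ["workflowtaskresults"]
    verbs: ["create", "patch"]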
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: workflow-executor
namespace: argo
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: workflow-executor
subjects:
- kind: ServiceAccount
name: workflow-executor
namespace: argo
---
# 在 Workflow 中使用
spec:
serviceAccountName: workflow-executor
Q3: Rollout 一直处于 “Degraded” 状态
常见原因与解决:
# 1. 查看 Rollout 状态
kubectl argo rollouts status my-app -n production
# 2. 查看分析结果
kubectl get analysisrun -n production
kubectl describe analysisrun <analysis-name> -n production
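# 3. If the metric query is the problem, check Prometheus (pod names depend on how it was installed)
kubectl get pods -n monitoring
kubectl logs -n monitoring <prometheus-pod-name> -c prometheus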
# 4. 跳过当前分析(紧急情况)
kubectl argo rollouts promote my-app -n production --skip-current-step
# 5. 完全回滚
kubectl argo rollouts undo my-app -n production
Q4: 多集群管理最佳实践
# 1. 在 ArgoCD 中注册多个集群
argocd cluster add dev-cluster --name dev
argocd cluster add staging-cluster --name staging
argocd cluster add prod-cluster --name prod
# 2. 使用 ApplicationSet 管理多集群应用
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
name: my-app-multi-cluster
namespace: argocd
spec:
generators:
- list:
elements:
- cluster: dev
url: https://dev-cluster
namespace: default
- cluster: staging
url: https://staging-cluster
namespace: default
- cluster: prod
url: https://prod-cluster
namespace: production
template:
metadata:
name: 'my-app-{{cluster}}'
spec:
project: default
source:
repoURL: https://github.com/yourorg/configs.git
targetRevision: main
path: 'apps/my-app/overlays/{{cluster}}'
destination:
server: '{{url}}'
namespace: '{{namespace}}'
syncPolicy:
automated:
prune: true
selfHeal: true
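The list generator above produces one Application per element by substituting `{{cluster}}`, `{{url}}` and `{{namespace}}` into the template. As a quick way to preview what gets generated, the substitution can be imitated with a plain loop; `render_applications` is a hypothetical helper, and the real rendering is done by the ApplicationSet controller.

```shell
#!/usr/bin/env bash
# Illustration only: expand the list-generator elements into the values the
# ApplicationSet template would produce (name, source path, server, namespace).
render_applications() {
  local cluster url namespace
  while read -r cluster url namespace; do
    echo "my-app-${cluster} apps/my-app/overlays/${cluster} ${url} ${namespace}"
  done <<'ELEMENTS'
dev https://dev-cluster default
staging https://staging-cluster default
prod https://prod-cluster production
ELEMENTS
}

render_applications
```

Each output line corresponds to one generated Application, so adding a fourth cluster is a matter of adding one more element rather than copying a whole manifest.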
进阶主题
自定义健康检查
-- ConfigMap: argocd-cm
resource.customizations: |
argoproj.io/Rollout:
health.lua: |
hs = {}
if obj.status ~= nil then
if obj.status.phase == "Healthy" then
hs.status = "Healthy"
hs.message = "Rollout is healthy"
return hs
end
if obj.status.phase == "Degraded" then
hs.status = "Degraded"
hs.message = obj.status.message
return hs
end
end
hs.status = "Progressing"
hs.message = "Waiting for rollout"
return hs
Webhook 通知集成
# ArgoCD 通知配置
apiVersion: v1
kind: ConfigMap
metadata:
name: argocd-notifications-cm
namespace: argocd
data:
# Slack 通知
service.slack: |
token: $slack-token
# 定义触发器
trigger.on-sync-succeeded: |
- when: app.status.operationState.phase in ['Succeeded']
send: [app-sync-succeeded]
trigger.on-sync-failed: |
- when: app.status.operationState.phase in ['Error', 'Failed']
send: [app-sync-failed]
# 定义模板
template.app-sync-succeeded: |
message: |
Application {{.app.metadata.name}} has been successfully synced.
Sync Status: {{.app.status.sync.status}}
slack:
attachments: |
[{
"title": "{{.app.metadata.name}}",
"title_link": "{{.context.argocdUrl}}/applications/{{.app.metadata.name}}",
"color": "good"
}]
学习资源与社区
官方文档
- Argo CD: https://argo-cd.readthedocs.io
- Argo Workflows: https://argoproj.github.io/workflows
- Argo Rollouts: https://argoproj.github.io/rollouts
- Argo Events: https://argoproj.github.io/events
社区资源
- GitHub: https://github.com/argoproj
- Slack: https://argoproj.github.io/community/join-slack
- Twitter: @argoproj
认证培训
- CNCF 提供的 GitOps 认证课程
- Codefresh GitOps Fundamentals
总结
快速决策树
需要部署应用到 K8s?
├─ 是 → 使用 Argo CD(必选)
│
├─ 需要复杂的构建流水线?
│ └─ 是 → 添加 Argo Workflows
│
├─ 需要金丝雀/蓝绿部署?
│ └─ 是 → 添加 Argo Rollouts
│
├─ 需要事件驱动的自动化?
│ └─ 是 → 添加 Argo Events
│
└─ 想要自动更新镜像版本?
└─ 是 → 添加 Image Updater
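Core value
- GitOps mindset: everything is code, and Git is the single source of truth
- Declarative management: describe the desired state and let the system converge to it
- Automation: fewer manual steps, better efficiency and reliability
- Observability: a complete audit log and change history
- Cloud native: designed for Kubernetes and deeply integrated with it
Start your Argo journey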
# 第一步:安装 Argo CD(5 分钟)
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# 第二步:部署你的第一个应用(3 分钟)
argocd app create my-first-app \
--repo https://github.com/argoproj/argocd-example-apps.git \
--path guestbook \
--dest-server https://kubernetes.default.svc \
--dest-namespace default
argocd app sync my-first-app
# 第三步:享受 GitOps 的魅力 🎉
Good luck on your cloud-native journey! If you run into questions, the Argo community is a good place to ask.