@claudeautopm/plugin-data

Version 4.0.0 • License: MIT • Size: 151 kB

Specialist agents for data engineering, machine learning pipelines, and workflow orchestration.

Installation

# Install the plugin package
npm install -g @claudeautopm/plugin-data

# Install plugin agents to your project
autopm plugin install data

Agents Included

Workflow Orchestration

airflow-orchestration-expert - Apache Airflow DAG development
- DAG design and best practices
- Task dependencies and scheduling
- Sensor and operator development
- XCom for inter-task communication
- Connection and variable management
- Monitoring and alerting

ML Pipeline Development

kedro-pipeline-expert - Kedro ML pipeline framework
- Pipeline architecture
- Data catalog management
- Node and pipeline creation
- Parameters and configuration
- Testing and debugging
- Production deployment

AI Workflow Automation

langgraph-workflow-expert - LangGraph AI workflow orchestration
- Graph-based workflow design
- State management
- Agent coordination patterns
- LLM integration
- Error handling and retries
- Streaming and async workflows

Usage

In Claude Code

After installation, agents are available in your project:

<!-- CLAUDE.md -->
## Active Team Agents

<!-- Load data engineering agents -->
- @include .claude/agents/data/airflow-orchestration-expert.md
- @include .claude/agents/data/kedro-pipeline-expert.md

Or use autopm team load to automatically include agents:

# Load data engineering team
autopm team load data

# Or include in fullstack team
autopm team load fullstack

Direct Invocation

# Invoke agent directly from CLI
autopm agent invoke airflow-orchestration-expert "Design ETL DAG for data warehouse"

Agent Capabilities

Data Pipeline Orchestration

- Complex DAG design and scheduling
- Task dependency management
- Dynamic pipeline generation
- Resource allocation and optimization
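
As a rough illustration of task dependency management, the sketch below orders a set of tasks so that each one runs only after its upstream tasks. It uses Python's standard `graphlib` module (Python 3.9+) rather than Airflow itself, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL tasks: each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "quality_check": {"transform"},
    "load": {"quality_check"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its upstream tasks."""
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because the example is a simple chain, there is only one valid order. With branching dependencies, several orders can be valid and an orchestrator is free to run independent tasks in parallel.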

ML Workflow Management

- End-to-end ML pipeline design
- Data versioning and lineage
- Experiment tracking
- Model deployment automation

AI Agent Orchestration

- Multi-agent coordination
- LLM workflow automation
- State machine design
- Tool integration patterns

Data Engineering

- ETL/ELT pipeline development
- Data quality validation
- Incremental processing
- Error handling and recovery
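
One common way to approach incremental processing is a watermark: each run handles only rows changed since the previous run, then advances the watermark. The sketch below is a minimal plain-Python version of that idea. The `updated_at` field and the sample rows are assumptions made for illustration, not part of the plugin.

```python
from datetime import datetime, timezone


def incremental_batch(rows, watermark):
    """Return the rows newer than the watermark, plus the advanced watermark."""
    fresh = [row for row in rows if row["updated_at"] > watermark]
    # Keep the old watermark when nothing new arrived.
    new_watermark = max((row["updated_at"] for row in fresh), default=watermark)
    return fresh, new_watermark


# Hypothetical source rows with a last-modified timestamp.
rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2024, 1, 2, tzinfo=timezone.utc)},
    {"id": 3, "updated_at": datetime(2024, 1, 3, tzinfo=timezone.utc)},
]

watermark = datetime(2024, 1, 1, tzinfo=timezone.utc)
fresh, watermark = incremental_batch(rows, watermark)
print([row["id"] for row in fresh], watermark.isoformat())
```

Running the function again with the returned watermark yields an empty batch, which is what makes repeated runs safe.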

Examples

Airflow ETL Pipeline

@airflow-orchestration-expert

Create Airflow DAG for daily ETL:

Requirements:
- Extract from PostgreSQL source
- Transform data with pandas
- Load to BigQuery warehouse
- Data quality checks
- Email alerts on failure
- Retry logic with backoff

Schedule:
- Run daily at 2 AM UTC
- Handle time zones
- SLA monitoring

Include:
1. DAG definition
2. Custom operators
3. Data quality sensors
4. Alert configuration
5. Testing strategy
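
For orientation, the "retry logic with backoff" requirement in the prompt above can be sketched in plain Python. This is not what the agent generates and not Airflow's own implementation. The function names, the doubling factor, and the 60-second cap are all assumptions chosen for illustration.

```python
import time


def backoff_delays(retries, base=1.0, factor=2.0, cap=60.0):
    """Delay before each retry: base * factor**attempt, capped at `cap` seconds."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


def run_with_retries(task, retries=3, base=1.0, sleep=time.sleep):
    """Call task(); on failure wait and retry, re-raising after the last attempt."""
    delays = backoff_delays(retries, base=base)
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise
            sleep(delays[attempt])


# Hypothetical flaky task: fails twice, then succeeds.
calls = {"count": 0}


def flaky_extract():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("source unavailable")
    return "rows"


waited = []  # record the delays instead of actually sleeping
result = run_with_retries(flaky_extract, retries=3, base=1.0, sleep=waited.append)
print(result, waited)
```

In Airflow itself, retries are normally configured on the task rather than hand-written, through operator arguments such as `retries`, `retry_delay`, and `retry_exponential_backoff`.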

Kedro ML Pipeline

@kedro-pipeline-expert

Build ML pipeline for churn prediction:

Pipeline stages:
1. Data ingestion (multiple sources)
2. Feature engineering
3. Model training (XGBoost, LightGBM)
4. Model evaluation
5. Model deployment

Requirements:
- Modular pipeline design
- Data catalog for versioning
- Parameter management
- Cross-validation
- Model registry integration

Include:
1. Pipeline structure
2. Node implementations
3. Data catalog YAML
4. Parameters YAML
5. Testing suite
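
To make the cross-validation requirement in the prompt above concrete, here is a minimal sketch of how k-fold splits are formed. It is plain Python with no Kedro or scikit-learn dependency, it does not shuffle, and the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


folds = list(k_fold_indices(n_samples=10, k=3))
for train, validation in folds:
    print(len(train), validation)
```

Each sample lands in exactly one validation fold. In a Kedro project, a function like this would typically live inside a node so that the fold count can come from the parameters file.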

LangGraph AI Workflow

@langgraph-workflow-expert

Design multi-agent research workflow:

Agents:
- Research Agent (web search)
- Analysis Agent (data processing)
- Writer Agent (report generation)
- Reviewer Agent (quality check)

Workflow:
1. Research gathers information
2. Analysis processes findings
3. Writer creates draft
4. Reviewer validates quality
5. Loop back if quality < threshold

Requirements:
- State persistence
- Error recovery
- Streaming output
- Token usage tracking

Include:
1. Graph definition
2. Agent nodes
3. State management
4. Edge conditions
5. Testing examples
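
The control flow described in the prompt above (four nodes in sequence, with a loop back when quality is below a threshold) can be sketched without LangGraph as a small state machine. Everything here is an assumption for illustration: the node bodies are placeholders, the loop returns to the writer, the threshold is 0.7, and a revision cap guards against endless loops.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowState:
    """Shared state passed from node to node."""
    findings: str = ""
    summary: str = ""
    draft: str = ""
    quality: float = 0.0
    revisions: int = 0
    history: list = field(default_factory=list)


def research_node(state):
    state.findings = "raw findings"
    return state


def analysis_node(state):
    state.summary = f"analysis of {state.findings}"
    return state


def writer_node(state):
    state.revisions += 1
    state.draft = f"draft v{state.revisions}"
    return state


def reviewer_node(state):
    # Placeholder scoring: each revision adds 0.4 to the quality score.
    state.quality = min(1.0, 0.4 * state.revisions)
    return state


NODES = {
    "research": research_node,
    "analysis": analysis_node,
    "writer": writer_node,
    "reviewer": reviewer_node,
}
EDGES = {"research": "analysis", "analysis": "writer", "writer": "reviewer"}


def route_after_review(state, threshold=0.7, max_revisions=5):
    """Conditional edge: loop back to the writer until quality passes."""
    if state.quality < threshold and state.revisions < max_revisions:
        return "writer"
    return "end"


def run(state):
    current = "research"
    while current != "end":
        state = NODES[current](state)
        state.history.append(current)
        current = route_after_review(state) if current == "reviewer" else EDGES[current]
    return state


final = run(WorkflowState())
print(final.history, final.draft, final.quality)
```

A graph library would replace the `NODES`, `EDGES`, and `run` scaffolding, but the routing function keeps the same shape: it inspects the state and returns the name of the next node.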

Complex Airflow Architecture

@airflow-orchestration-expert

Design multi-tenant data platform:

Requirements:
- 10+ data sources (APIs, databases, files)
- Dynamic DAG generation per tenant
- Parallel processing with pools
- Resource quotas per tenant
- Cost tracking and optimization
- Disaster recovery

Features:
- DAG factory pattern
- Custom operators for common tasks
- Centralized logging
- Metric collection
- Auto-scaling workers

Include:
1. Architecture diagram
2. DAG factory implementation
3. Custom operator library
4. Configuration management
5. Monitoring setup
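
The DAG factory pattern named in the prompt above amounts to one function that turns per-tenant configuration into a DAG definition. The sketch below shows the shape of that idea using plain dataclasses instead of Airflow objects. The tenant names, config keys, and the `DagSpec` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DagSpec:
    """Stand-in for a DAG definition; a real factory would build Airflow DAGs."""
    dag_id: str
    schedule: str
    pool: str
    tasks: tuple


def build_dag_spec(tenant, config):
    """Create one DAG description from a tenant's configuration."""
    extract_tasks = tuple(f"extract_{source}" for source in config["sources"])
    return DagSpec(
        dag_id=f"{tenant}_etl",
        schedule=config.get("schedule", "0 2 * * *"),  # default: daily at 02:00
        pool=f"{tenant}_pool",  # one pool per tenant to bound parallelism
        tasks=extract_tasks + ("transform", "load"),
    )


# Hypothetical tenant configuration; in practice this might come from YAML or a database.
TENANTS = {
    "acme": {"sources": ["orders_api", "crm_db"]},
    "globex": {"sources": ["events_files"], "schedule": "@hourly"},
}

dag_specs = {
    spec.dag_id: spec
    for spec in (build_dag_spec(tenant, config) for tenant, config in TENANTS.items())
}
print(sorted(dag_specs))
```

In an Airflow deployment, the factory would construct `DAG` objects and expose them at module level so the scheduler can discover them. Onboarding a tenant then becomes a configuration change rather than a new DAG file.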

Kedro Production Deployment

@kedro-pipeline-expert

Productionize Kedro pipeline:

Requirements:
- Docker containerization
- Kubernetes deployment
- CI/CD with GitHub Actions
- Model registry (MLflow)
- Monitoring and logging
- A/B testing support

Pipeline:
- Training pipeline (weekly)
- Inference pipeline (real-time)
- Evaluation pipeline (daily)

Include:
1. Dockerfile and docker-compose
2. Kubernetes manifests
3. CI/CD workflows
4. Deployment scripts
5. Monitoring dashboards

Configuration

Environment Variables

Some agents benefit from environment variables:

# Airflow
export AIRFLOW_HOME=/opt/airflow
export AIRFLOW__CORE__EXECUTOR=CeleryExecutor
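# Airflow 2.3+ reads this from the [database] section; older releases use AIRFLOW__CORE__SQL_ALCHEMY_CONN
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql://...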

# Kedro
export KEDRO_ENV=production
export KEDRO_LOGGING_CONFIG=conf/base/logging.yml

# LangGraph
export OPENAI_API_KEY=your-key
export LANGSMITH_API_KEY=your-key
export LANGSMITH_PROJECT=my-project
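
A quick preflight check can confirm which of these variables are set before a session starts. The sketch below is a hypothetical helper, not part of the plugin. The grouping by tool mirrors the list above, and the database connection variable is left out because its name depends on the Airflow version.

```python
import os

# Variable names taken from the list above; the grouping is for illustration only.
EXPECTED_VARS = {
    "airflow": ["AIRFLOW_HOME", "AIRFLOW__CORE__EXECUTOR"],
    "kedro": ["KEDRO_ENV", "KEDRO_LOGGING_CONFIG"],
    "langgraph": ["OPENAI_API_KEY", "LANGSMITH_API_KEY", "LANGSMITH_PROJECT"],
}


def missing_variables(expected, environ=None):
    """Map each tool to the expected variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return {
        tool: [name for name in names if not environ.get(name)]
        for tool, names in expected.items()
    }


if __name__ == "__main__":
    for tool, names in missing_variables(EXPECTED_VARS).items():
        status = "ok" if not names else "missing " + ", ".join(names)
        print(f"{tool}: {status}")
```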

Agent Customization

You can customize agent behavior in .claude/config.yaml:

plugins:
  data:
    airflow:
      default_executor: CeleryExecutor
      default_retries: 3
      schedule_interval: '@daily'
    kedro:
      default_runner: SequentialRunner
      log_level: INFO
      data_catalog_type: local
    langgraph:
      llm_provider: openai
      model: gpt-4
      enable_tracing: true

Documentation

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT © ClaudeAutoPM Team
