CLI Reference

Complete reference for the DataCompose command-line interface

The DataCompose CLI provides commands to initialize projects, add transformers, and manage your data transformation pipeline.

Installation

pip install datacompose

Commands Overview

Initialize a Project

datacompose init [--yes] [--force]

Creates a datacompose.json configuration file with default settings.

Options:

  • --yes, -y: Auto-accept all defaults
  • --force: Overwrite existing configuration file

Add Transformers

datacompose add <transformer> [--output OUTPUT] [--verbose] [--force]

Generates production-ready transformation code for the specified transformer.

Arguments:

  • <transformer>: Name of the transformer to add (e.g., emails, addresses, phone_numbers)

Options:

  • --output, -o: Output directory (default: ./transformers/pyspark)
  • --verbose, -v: Enable verbose output
  • --force: Overwrite existing files

Examples:

# Add email transformers
datacompose add emails

# Add address transformers to custom directory
datacompose add addresses --output ./custom/path

# Add phone transformers with verbose output
datacompose add phone_numbers --verbose

# Force overwrite existing transformers
datacompose add emails --force

List Available Resources

datacompose list transformers
datacompose list generators

Displays the available transformers and code generators.

Show Version

datacompose --version

Get Help

datacompose --help
datacompose <command> --help

Configuration File

The datacompose.json file controls DataCompose behavior:

{
  "version": "1.0.0",
  "targets": {
    "pyspark": {
      "output": "./transformers/pyspark",
      "generator": "SparkPandasUDFGenerator"
    }
  },
  "templates": {
    "directory": "src/transformers/templates"
  }
}

Configuration Options

  • version: DataCompose configuration version
  • targets: Per-platform settings, keyed by target name (e.g., pyspark)
    • output: Where to generate code
    • generator: Which code generator to use
  • templates: Custom template settings
    • directory: Path to custom templates
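
For scripts that need to know where generated code lands, the configuration can be read with the standard library. The sketch below is a minimal example and not part of the DataCompose API: the helper name read_target is hypothetical, and it assumes the file follows the layout shown above.

```python
import json


def read_target(config_text, target="pyspark"):
    """Return (output, generator) for one target in a datacompose.json document.

    Hypothetical helper: it assumes the "targets" layout shown in the
    configuration example and raises KeyError if the target is missing.
    """
    config = json.loads(config_text)
    settings = config["targets"][target]
    return settings["output"], settings["generator"]
```

Passing the contents of datacompose.json, for example Path("datacompose.json").read_text() from pathlib, would return the output directory and generator name configured for the pyspark target.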

Project Structure

After running datacompose add for emails, addresses, and phone_numbers with the default output directory, your project will contain:

project/
├── datacompose.json                    # Configuration file
├── transformers/
│   └── pyspark/
│       ├── emails.py                   # Email transformation primitives
│       ├── addresses.py                # Address transformation primitives
│       ├── phone_numbers.py            # Phone number transformation primitives
│       └── utils.py                    # Core framework and PrimitiveRegistry

Update Strategies

When updating transformers, you have several options:

  1. Regenerate: Use datacompose add <transformer> --force to overwrite existing files
  2. Merge: Generate to a temporary location and manually merge changes
  3. Extend: Create wrapper functions that call generated code
  4. Fork: Copy and rename for complete independence

Best Practice: Always use version control and review changes before merging.
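
For the Merge option, it can help to see which files actually changed before merging by hand. The following is a minimal sketch using Python's standard filecmp module. The function name changed_files and both directory arguments are placeholders, and nothing here is provided by DataCompose itself.

```python
import filecmp


def changed_files(current_dir, regenerated_dir):
    """Summarize top-level file differences between two directories.

    filecmp.dircmp compares os.stat() signatures first (shallow mode) and
    only reads file contents when the signatures differ.
    """
    comparison = filecmp.dircmp(current_dir, regenerated_dir)
    return {
        "modified": sorted(comparison.diff_files),
        "only_current": sorted(comparison.left_only),
        "only_regenerated": sorted(comparison.right_only),
    }
```

Files listed under "modified" are the ones to review and merge manually. Subdirectories are not descended into in this sketch.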

Environment Variables

DataCompose respects the following environment variables:

  • DATACOMPOSE_CONFIG: Path to configuration file (default: ./datacompose.json)
  • DATACOMPOSE_OUTPUT: Default output directory
  • DATACOMPOSE_VERBOSE: Enable verbose output by default
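
A script that works alongside the CLI may want to locate the same configuration file. The sketch below mirrors the documented default for DATACOMPOSE_CONFIG. How the CLI itself treats an empty value is not stated here, so treating it as unset is an assumption, and resolve_config_path is a hypothetical name.

```python
import os


def resolve_config_path(environ=None):
    """Return DATACOMPOSE_CONFIG if set and non-empty, else the documented default.

    Assumption: an empty value is treated the same as an unset variable.
    """
    env = os.environ if environ is None else environ
    return env.get("DATACOMPOSE_CONFIG") or "./datacompose.json"
```

Accepting the environment as an optional mapping keeps the function easy to exercise without changing the real process environment.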

Common Workflows

Starting a New Project

# Initialize DataCompose
datacompose init --yes

# Or, if a datacompose.json already exists, overwrite it
datacompose init --force

# Add all common transformers
datacompose add emails
datacompose add addresses
datacompose add phone_numbers

Updating Existing Transformers

# Backup existing code
cp -r transformers/pyspark transformers/pyspark.backup

# Regenerate transformers
datacompose add emails --force

# Compare changes
diff -r transformers/pyspark.backup transformers/pyspark

Custom Output Locations

# Generate to specific locations
datacompose add emails --output src/transformers/email
datacompose add addresses --output src/transformers/address
datacompose add phone_numbers --output src/transformers/phone
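
When the same set of transformers is generated repeatedly, the commands above can be driven from a small script. This is a sketch under stated assumptions: it uses only the add flags documented in this reference, the helper names build_add_command and add_all are hypothetical, and it expects datacompose to be on your PATH.

```python
import subprocess


def build_add_command(transformer, output=None, force=False, verbose=False):
    """Build the argv list for one `datacompose add` call."""
    argv = ["datacompose", "add", transformer]
    if output:
        argv += ["--output", output]
    if verbose:
        argv.append("--verbose")
    if force:
        argv.append("--force")
    return argv


def add_all(outputs, run=subprocess.run):
    """Run `datacompose add` once per (transformer, output directory) pair."""
    for transformer, output in outputs.items():
        # check=True raises CalledProcessError if the CLI exits with an error.
        run(build_add_command(transformer, output=output), check=True)
```

For example, add_all({"emails": "src/transformers/email", "addresses": "src/transformers/address"}) would issue the first two commands shown above in order.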

Troubleshooting

Command Not Found

If datacompose is not found after installation:

# Check whether it is on your PATH
which datacompose

# Or run directly with Python
python -m datacompose init

Permission Errors

If you encounter permission errors:

# Install in user space
pip install --user datacompose

# Or use a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install datacompose

Configuration Issues

To reset the configuration to its defaults:

# Remove existing config
rm datacompose.json

# Reinitialize
datacompose init

Next Steps