CLI Reference
The DataCompose CLI provides commands to initialize projects, add transformers, and manage your data transformation pipeline.
Installation
pip install datacompose
Commands Overview
Initialize a Project
datacompose init [--yes] [--force]
Creates a datacompose.json configuration file with default settings.
Options:
- --yes, -y: Auto-accept all defaults
- --force: Overwrite existing configuration file
Add Transformers
datacompose add <transformer> [--output OUTPUT] [--verbose] [--force]
Generates production-ready transformation code for the specified transformer.
Arguments:
- <transformer>: Name of the transformer to add (e.g., emails, addresses, phone_numbers)
Options:
- --output, -o: Output directory (default: ./transformers/pyspark)
- --verbose, -v: Enable verbose output
- --force: Overwrite existing files
Examples:
# Add email transformers
datacompose add emails
# Add address transformers to custom directory
datacompose add addresses --output ./custom/path
# Add phone transformers with verbose output
datacompose add phone_numbers --verbose
# Force overwrite existing transformers
datacompose add emails --force
List Available Resources
datacompose list transformers
datacompose list generators
Displays the available transformers and code generators.
Show Version
datacompose --version
Get Help
datacompose --help
datacompose <command> --help
Configuration File
The datacompose.json file controls DataCompose behavior:
{
  "version": "1.0.0",
  "targets": {
    "pyspark": {
      "output": "./transformers/pyspark",
      "generator": "SparkPandasUDFGenerator"
    }
  },
  "templates": {
    "directory": "src/transformers/templates"
  }
}
Configuration Options
- version: DataCompose configuration version
- targets: Platform-specific settings, keyed by target name (for example, pyspark)
  - output: Directory where generated code is written
  - generator: Which code generator to use
- templates: Custom template settings
  - directory: Path to custom templates
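Because the configuration is plain JSON, a script can inspect it with the standard library. The following is a minimal sketch, assuming the layout shown in the example above; the helper name target_settings is illustrative and is not part of DataCompose.

```python
import json

# Example configuration copied from the Configuration File section.
EXAMPLE_CONFIG = """
{
  "version": "1.0.0",
  "targets": {
    "pyspark": {
      "output": "./transformers/pyspark",
      "generator": "SparkPandasUDFGenerator"
    }
  },
  "templates": {
    "directory": "src/transformers/templates"
  }
}
"""


def target_settings(config_text, target="pyspark"):
    """Return (output, generator) for one target.

    Hypothetical helper for illustration; not a DataCompose API.
    """
    config = json.loads(config_text)
    targets = config.get("targets", {})
    if target not in targets:
        raise KeyError(f"target {target!r} is not defined in the configuration")
    settings = targets[target]
    return settings["output"], settings["generator"]


output_dir, generator = target_settings(EXAMPLE_CONFIG)
print(output_dir, generator)
```

In a real project, the text would come from reading datacompose.json from disk rather than from an inline string.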
Project Structure
After running datacompose add, your project will have:
project/
├── datacompose.json          # Configuration file
└── transformers/
    └── pyspark/
        ├── emails.py         # Email transformation primitives
        ├── addresses.py      # Address transformation primitives
        ├── phone_numbers.py  # Phone number transformation primitives
        └── utils.py          # Core framework and PrimitiveRegistry
Update Strategies
When updating transformers, you have several options:
- Regenerate: Use datacompose add <transformer> --force to overwrite existing files
- Merge: Generate to a temporary location and manually merge changes
- Extend: Create wrapper functions that call generated code
- Fork: Copy and rename for complete independence
Best Practice: Always use version control and review changes before merging.
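The Extend strategy can be sketched as follows. This is only an illustration of the pattern: generated_clean_email is a stand-in defined inline, and the real function names in the generated emails.py may differ. The point is that custom logic lives in your own wrapper module, so regenerating with --force does not overwrite it.

```python
from typing import Optional


# Stand-in for a function from the generated code. Hypothetical: in a real
# project this would be imported from transformers/pyspark/emails.py, and
# its actual name and signature may differ.
def generated_clean_email(value: str) -> str:
    return value.strip().lower()


def clean_corporate_email(value: str, domain: str = "example.com") -> Optional[str]:
    """Wrapper that adds a project-specific rule on top of generated code.

    Returns the cleaned address if it belongs to `domain`, otherwise None.
    """
    cleaned = generated_clean_email(value)
    return cleaned if cleaned.endswith("@" + domain) else None


print(clean_corporate_email("  Ada@Example.com "))
print(clean_corporate_email("bob@other.org"))
```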
Environment Variables
DataCompose respects the following environment variables:
- DATACOMPOSE_CONFIG: Path to configuration file (default: ./datacompose.json)
- DATACOMPOSE_OUTPUT: Default output directory
- DATACOMPOSE_VERBOSE: Enable verbose output by default
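When DataCompose is driven from a script, these variables can be set per invocation instead of globally. The sketch below builds such an environment; the helper name datacompose_env is hypothetical, and using "1" to enable DATACOMPOSE_VERBOSE is an assumption, since the accepted values are not specified here.

```python
import os


def datacompose_env(config_path, output_dir=None, verbose=False, base=None):
    """Return a copy of the environment with DataCompose variables set.

    Hypothetical helper for illustration; not a DataCompose API.
    """
    env = dict(os.environ if base is None else base)
    env["DATACOMPOSE_CONFIG"] = str(config_path)
    if output_dir is not None:
        env["DATACOMPOSE_OUTPUT"] = str(output_dir)
    if verbose:
        # Assumption: "1" is treated as "enabled"; check the tool's behavior.
        env["DATACOMPOSE_VERBOSE"] = "1"
    return env


env = datacompose_env("./datacompose.json", output_dir="./build/pyspark", verbose=True, base={})
print(env)

# To run the CLI with this environment (requires datacompose to be installed):
# import subprocess
# subprocess.run(["datacompose", "add", "emails"], env=datacompose_env("./datacompose.json"), check=True)
```

Copying the base environment rather than mutating os.environ keeps the settings scoped to the one subprocess call.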
Common Workflows
Starting a New Project
# Initialize DataCompose
datacompose init --yes
# Or force overwrite existing config
datacompose init --force
# Add all common transformers
datacompose add emails
datacompose add addresses
datacompose add phone_numbers
Updating Existing Transformers
# Backup existing code
cp -r transformers/pyspark transformers/pyspark.backup
# Regenerate transformers
datacompose add emails --force
# Compare changes
diff -r transformers/pyspark.backup transformers/pyspark
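Where diff is unavailable, or a summary is easier to review than a full diff, the comparison step can be approximated with Python's standard library. This is a minimal sketch; compare_trees is an illustrative helper, and the demo uses throwaway directories rather than a real DataCompose project.

```python
import filecmp
import tempfile
from pathlib import Path


def compare_trees(backup, current):
    """Classify files as changed, removed, or added between two directories."""
    backup, current = Path(backup), Path(current)
    old = {p.relative_to(backup) for p in backup.rglob("*") if p.is_file()}
    new = {p.relative_to(current) for p in current.rglob("*") if p.is_file()}
    changed = [
        p for p in old & new
        # shallow=False compares file contents, not just size and mtime.
        if not filecmp.cmp(backup / p, current / p, shallow=False)
    ]
    return {
        "changed": sorted(p.as_posix() for p in changed),
        "removed": sorted(p.as_posix() for p in old - new),
        "added": sorted(p.as_posix() for p in new - old),
    }


# Small self-contained demo using temporary directories.
with tempfile.TemporaryDirectory() as tmp:
    before = Path(tmp, "pyspark.backup")
    after = Path(tmp, "pyspark")
    before.mkdir()
    after.mkdir()
    (before / "emails.py").write_text("# old version\n")
    (after / "emails.py").write_text("# new version\n")
    (before / "utils.py").write_text("# unchanged\n")
    (after / "utils.py").write_text("# unchanged\n")
    (after / "addresses.py").write_text("# newly generated\n")
    report = compare_trees(before, after)

print(report)
```

In a project, the call would be compare_trees("transformers/pyspark.backup", "transformers/pyspark").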
Custom Output Locations
# Generate to specific locations
datacompose add emails --output src/transformers/email
datacompose add addresses --output src/transformers/address
datacompose add phone_numbers --output src/transformers/phone
Troubleshooting
Command Not Found
If datacompose is not found after installation:
# Check if it's in your PATH
which datacompose
# Or run directly with Python
python -m datacompose init
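The same check can be made from a script that needs to call the CLI. The sketch below looks for the executable on PATH and otherwise falls back to the python -m form shown above; locate_cli is a hypothetical helper name.

```python
import shutil
import sys


def locate_cli(name="datacompose"):
    """Return the command prefix to use for invoking the CLI.

    Uses the executable found on PATH if there is one; otherwise falls back
    to running the module with the current Python interpreter.
    """
    path = shutil.which(name)
    if path:
        return [path]
    return [sys.executable, "-m", name]


command = locate_cli() + ["init"]
print(command)
```

The fallback only works if the package is installed for the interpreter in sys.executable, so a virtual environment mismatch will still fail at run time.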
Permission Errors
If you encounter permission errors:
# Install in user space
pip install --user datacompose
# Or use a virtual environment
python -m venv venv
source venv/bin/activate
pip install datacompose
Configuration Issues
To reset configuration:
# Remove existing config
rm datacompose.json
# Reinitialize
datacompose init
Next Steps
- Getting Started Guide - Learn the basics
- Transformers Documentation - Explore available transformers
- API Reference - Detailed API documentation