datagen - Encoded Binary
datagen is the encoded (production) binary: its models are already compiled and embedded in the executable. Use it when you have a pre-built executable and want fast, repeatable data generation without transpilation overhead.
Overview
When to use datagen:
- You have a pre-built binary with embedded models
- You’re in production or CI/CD pipelines
- You need fast, repeatable runs without transpilation
- You want to distribute a single executable
What it does:
- Uses models already embedded in the binary
- Filters models using tags or config files
- Generates data directly (no transpilation step)
Key Difference: No file paths needed - models are already inside the binary!
Commands
datagen gen - Generate Data to Files
Generate data from embedded models and output to files or stdout.
Syntax

    datagen gen [flags]

Command Flags

| Flag | Short | Description | Default | Example |
|---|---|---|---|---|
| --count | -n | Number of records to generate (overrides metadata) | Uses metadata count | -n 1000 |
| --seed | -s | Seed for deterministic random generation | none | -s 12345 |
| --tags | -t | Filter models by tags (must match ALL key-value pairs) | "" | -t "service=auth,team=platform" |
| --output | -o | Output directory or file path | "." | -o ./data |
| --format | -f | Output format: csv, json, xml, stdout | stdout | -f csv |
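
When datagen gen is driven from a script rather than typed by hand, it can help to assemble the argument list in one place. The following is a minimal sketch, assuming only the flags listed in the table above; the helper name build_gen_args is hypothetical and not part of datagen.

```python
from typing import Dict, List, Optional

# Formats listed in the flag table for --format.
KNOWN_FORMATS = {"csv", "json", "xml", "stdout"}


def build_gen_args(
    count: Optional[int] = None,
    seed: Optional[int] = None,
    tags: Optional[Dict[str, str]] = None,
    output: Optional[str] = None,
    fmt: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `datagen gen` (hypothetical helper)."""
    args = ["datagen", "gen"]
    if count is not None:
        args += ["--count", str(count)]
    if seed is not None:
        args += ["--seed", str(seed)]
    if tags:
        # Tags travel as one comma-separated list of key=value pairs.
        args += ["--tags", ",".join(f"{k}={v}" for k, v in tags.items())]
    if output is not None:
        args += ["--output", output]
    if fmt is not None:
        if fmt not in KNOWN_FORMATS:
            raise ValueError(f"unsupported format: {fmt}")
        args += ["--format", fmt]
    return args


if __name__ == "__main__":
    print(build_gen_args(count=1000, fmt="csv", output="./data"))
```

The resulting list can be passed to subprocess.run(args, check=True) on a machine where a datagen binary is on the PATH. Flags left as None are omitted so that datagen applies its own defaults.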
Quick Examples

    # Generate data for all embedded models
    datagen gen

    # Generate 1000 records for all models
    datagen gen -n 1000

    # Generate data for models matching specific tags
    datagen gen -t "service=auth"

    # Generate models matching multiple tag criteria (AND logic)
    datagen gen -t "service=auth,environment=prod"

    # Generate and save as CSV
    datagen gen -n 1000 -f csv -o ./data

    # Deterministic output with seed
    datagen gen -n 10 -s 12345

    # Generate for specific team
    datagen gen -t "team=platform" -n 500

Output Formats

- csv - Comma-separated values with headers
- json - JSON array of objects
- xml - XML format with root element
- stdout - Print to standard output (default)
Count Behavior
The --count flag controls how many records to generate:
Without --count flag
Uses the count specified in each model’s metadata section:

    model User {
      metadata {
        count: 500 // Will generate 500 records
      }
      // ...
    }

If no metadata count is specified, defaults to 1 record.
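
The precedence for record counts (the --count flag first, then the model's metadata count, then a default of 1) can be modeled in a few lines. This is an illustrative sketch of the documented behavior, not datagen's actual implementation; resolve_count and DEFAULT_COUNT are hypothetical names.

```python
from typing import Optional

# Documented fallback when a model has no metadata count.
DEFAULT_COUNT = 1


def resolve_count(flag_count: Optional[int], metadata_count: Optional[int]) -> int:
    """Pick the record count for one model (hypothetical helper).

    flag_count is the value of --count, or None if the flag was not given.
    metadata_count is the model's metadata count, or None if unspecified.
    """
    if flag_count is not None:
        return flag_count  # --count overrides every model's metadata
    if metadata_count is not None:
        return metadata_count
    return DEFAULT_COUNT


if __name__ == "__main__":
    print(resolve_count(None, 500))   # metadata count is used
    print(resolve_count(1000, 500))   # flag wins over metadata
    print(resolve_count(None, None))  # falls back to the default
```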
With --count flag
Overrides all model counts with the specified value:

    # Generate exactly 1000 records for each model, ignoring metadata
    datagen gen -n 1000

Tags Filtering
Since the binary contains multiple embedded models, use tags to filter which models to generate:
How Tags Work
Tags are defined in the model’s metadata (before compilation):

    model User {
      metadata {
        tags: {
          "service": "user-management",
          "team": "platform",
          "environment": "prod"
        }
      }
      // ...
    }

Filtering by Tags

    # Generate only models with specific service
    datagen gen -t "service=user-management"

    # Generate models matching multiple criteria (AND logic)
    datagen gen -t "service=auth,environment=prod"

    # Generate models for specific team
    datagen gen -t "team=platform"

Important: Models must match ALL provided tag key-value pairs to be selected.
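
The AND semantics can be made concrete with a small model of the filter. This sketch only mirrors the rule stated above (every key-value pair must match); it is not taken from datagen's source, and the function names and the sample model set are hypothetical. Treating an empty filter as "select everything" is an assumption based on datagen gen with no tags generating all embedded models.

```python
from typing import Dict, List


def parse_tag_filter(expr: str) -> Dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a dict; an empty string means no filter."""
    wanted: Dict[str, str] = {}
    for pair in filter(None, (p.strip() for p in expr.split(","))):
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        wanted[key.strip()] = value.strip()
    return wanted


def matches(model_tags: Dict[str, str], wanted: Dict[str, str]) -> bool:
    """True only if the model carries ALL requested key-value pairs."""
    return all(model_tags.get(k) == v for k, v in wanted.items())


def select_models(models: Dict[str, Dict[str, str]], expr: str) -> List[str]:
    """Return the names of models whose tags satisfy the filter expression."""
    wanted = parse_tag_filter(expr)
    return [name for name, tags in models.items() if matches(tags, wanted)]


if __name__ == "__main__":
    # Hypothetical embedded models and their tags.
    models = {
        "User": {"service": "user-management", "team": "platform", "environment": "prod"},
        "Session": {"service": "auth", "team": "platform", "environment": "prod"},
        "Audit": {"service": "auth", "team": "security", "environment": "staging"},
    }
    print(select_models(models, "service=auth,environment=prod"))  # ['Session']
```

In this sample set, Audit shares service=auth but is excluded because its environment tag differs, which is the practical consequence of AND matching.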
datagen execute - Load Data to Data Sinks
Load data from embedded models directly into database sinks like MySQL.
Syntax

    datagen execute --config <config_file> [flags]

Required Arguments

- --config - Path to configuration JSON file
Command Flags

| Flag | Short | Description | Example |
|---|---|---|---|
| --config | -c | Path to configuration JSON file | -c config.json |
| --output | -o | Output directory for logs/artifacts | -o ./logs |
Configuration File
The execute command requires a JSON configuration file that specifies which embedded models to use:

    {
      "models": [
        {
          "model_name": "User",
          "target_sinks": ["mysql_sink"],
          "count": 1000
        },
        {
          "model_name": "Order",
          "target_sinks": ["mysql_sink"],
          "count": 500
        }
      ],
      "sinks": [
        {
          "sink_name": "mysql_sink",
          "sink_type": "mysql",
          "config": {
            "host": "localhost",
            "database": "testdb",
            "port": "3306",
            "user": "root",
            "password": "password",
            "batch_size": 1000,
            "throttle_ms": 10
          }
        }
      ]
    }

Examples

    # Load data into database using embedded models
    datagen execute -c config.json

    # Load data with custom output directory for logs
    datagen execute -c config.json -o ./logs

    # Production deployment
    datagen execute --config prod-config.json

Process Flow

- Uses models already embedded in the binary
- Reads configuration file to determine which models to use
- Generates data according to config
- Loads data into specified sinks
- No transpilation - fast execution
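
The flow above can be previewed before a real run by expanding a config into the list of loads it implies. This is a rough pre-flight sketch, assuming the key names shown in the sample configuration (models, target_sinks, count, sinks, sink_name); plan_loads is a hypothetical helper, and whether datagen itself rejects a reference to an undefined sink in the same way is an assumption.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def plan_loads(config: Dict[str, Any]) -> List[Tuple[str, str, Optional[int]]]:
    """Expand a config into (model_name, sink_name, count) rows.

    Raises ValueError if a model targets a sink that is not defined
    under "sinks". A missing "count" is reported as None because the
    behavior for an omitted count is not documented here.
    """
    defined = {sink["sink_name"] for sink in config.get("sinks", [])}
    plan: List[Tuple[str, str, Optional[int]]] = []
    for model in config.get("models", []):
        name = model["model_name"]
        for sink in model["target_sinks"]:
            if sink not in defined:
                raise ValueError(f"model {name!r} targets undefined sink {sink!r}")
            plan.append((name, sink, model.get("count")))
    return plan


if __name__ == "__main__":
    # Trimmed-down version of the sample configuration.
    config = json.loads("""
    {
      "models": [
        {"model_name": "User", "target_sinks": ["mysql_sink"], "count": 1000},
        {"model_name": "Order", "target_sinks": ["mysql_sink"], "count": 500}
      ],
      "sinks": [
        {"sink_name": "mysql_sink", "sink_type": "mysql", "config": {}}
      ]
    }
    """)
    for row in plan_loads(config):
        print(row)
```

Running a check like this in CI catches a misspelled sink name before any data is generated or loaded.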
Use Cases
- Production data loading
- Scheduled data generation (cron jobs)
- CI/CD pipeline integration
- High-performance scenarios
- Distributed deployments with consistent models
Building an Encoded Binary
To create a datagen encoded binary from your models:

    # Use datagenc to transpile and build
    datagenc gen ./models --noexec -o ./output

    # Navigate to output directory and build
    cd ./output
    go build -o datagen

    # Now you have an encoded binary with embedded models
    ./datagen gen -t "service=auth"

Getting Help

    # General help
    datagen --help

    # Command-specific help
    datagen gen --help
    datagen execute --help

    # Version information
    datagen --version

Next Steps

- For development workflows, see the datagenc reference
- For a detailed comparison between binaries, see datagenc vs datagen
- For model syntax, see Data Model concepts
- For examples, see the Examples section