# LogWise - Local Development Setup
A complete end-to-end logging system that streams logs from Vector → Kafka → Spark → S3/Athena, with a Spring Boot Orchestrator, Grafana dashboards, and automated cron jobs.
## Architecture
```
┌─────────┐     ┌─────────┐     ┌─────────┐     ┌─────────┐
│ Vector  │────▶│  Kafka  │────▶│  Spark  │────▶│   S3    │
│ (Logs)  │     │(Stream) │     │(Process)│     │(Storage)│
└─────────┘     └─────────┘     └─────────┘     └─────────┘
                                                     │
                                                     ▼
                                              ┌─────────────┐
                                              │   Athena    │
                                              │   (Query)   │
                                              └─────────────┘
                                                     │
                                                     ▼
                                              ┌─────────────┐
                                              │   Grafana   │
                                              │ (Dashboard) │
                                              └─────────────┘
```

Components:
- Vector: Log collection and forwarding
- Kafka: Message streaming (KRaft mode)
- Spark 3.1.2: Stream processing and Parquet writing
- S3: Object storage for processed logs
- Athena: Query engine for S3 data
- Grafana: Visualization and dashboards
- Orchestrator: Spring Boot service for job management
- MySQL: Database for orchestrator configuration
Note on log collectors
In the Docker-based Logwise stack, logs are shipped into Vector using the OpenTelemetry Collector (OTEL) over OTLP by default.
If you prefer to use other agents such as Fluent Bit, Fluentd, Logstash, or syslog-ng/rsyslog, see the Send Logs → Log collectors section for agent-specific setup guides and the required Vector configuration changes.
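For a quick end-to-end smoke test of the default OTLP path, you can hand-craft a single log record and POST it over OTLP/HTTP. This is only a sketch: it assumes the OTLP receiver is published on `localhost:4318` (the conventional OTLP/HTTP port), so check the port mappings in `docker-compose.yml` and adjust if your stack exposes it elsewhere.

```bash
# Send one OTLP/HTTP log record to the collector/Vector.
# localhost:4318 is an assumption; match it to your compose port mapping.
curl -sS -X POST http://localhost:4318/v1/logs \
  -H "Content-Type: application/json" \
  -d '{
    "resourceLogs": [{
      "resource": {
        "attributes": [
          { "key": "service.name", "value": { "stringValue": "logwise-smoke-test" } }
        ]
      },
      "scopeLogs": [{
        "logRecords": [{
          "severityText": "INFO",
          "body": { "stringValue": "hello from the LogWise smoke test" }
        }]
      }]
    }]
  }'
```

If ingestion works, the record should show up on the Kafka `logs` topic and, once Spark processes the batch, under the configured S3 prefix.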
## Prerequisites
### Required
- Docker (v20.10+) and Docker Compose (v2.0+)
- Make (for convenience commands)
- AWS Credentials with access to the following (a quick CLI sanity check is sketched below):
- S3 bucket (read/write)
- Athena workgroup (query execution)
Note: The `setup.sh` script will automatically install Docker, Make, and other prerequisites if they're missing (on macOS and Debian/Ubuntu Linux). For other systems, install these manually before running setup.
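Before running setup, you can sanity-check that the credentials you plan to put in `.env` actually reach S3 and Athena. This assumes the AWS CLI is installed and configured with those credentials; the bucket name is a placeholder.

```bash
# Which identity do these credentials resolve to?
aws sts get-caller-identity

# Can they list the log bucket?
aws s3 ls s3://your-bucket-name/

# Can they see Athena workgroups?
aws athena list-work-groups --query 'WorkGroups[].Name'
```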
### Optional
- Maven 3.2+ (if building Spark JAR locally)
- Java 11+ (if building Spark JAR locally)
## Mandatory: S3 & Athena Setup (Must Complete First)
Before proceeding with the Docker setup, you MUST complete the S3 & Athena configuration. This is a required prerequisite as the LogWise stack depends on AWS S3 for log storage and Athena for querying.
### Steps to Complete
1. Follow the S3 & Athena Setup Guide (a hedged AWS CLI sketch of these steps follows this list) to:
   - Create an S3 bucket with `logs` and `athena-output` folders
   - Create an AWS Glue database
   - Create an Athena workgroup
   - Create the `application-logs` table
2. Note down the following information (you'll need it for the `.env` file):
   - S3 bucket name
   - S3 URI for logs (e.g., `s3://your-bucket-name/logs/`)
   - S3 URI for Athena output (e.g., `s3://your-bucket-name/athena-output/`)
   - Athena workgroup name
   - Athena database name (typically `logs`)
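If you prefer the command line, the following is a rough sketch of what the setup guide walks you through. All names (`your-bucket-name`, `logwise-wg`, the `logs` database, the region) are placeholders, and the `application-logs` table DDL lives in the setup guide, so only the query-execution call is hinted at here.

```bash
# Hedged sketch of the S3 & Athena setup using the AWS CLI.
aws s3 mb s3://your-bucket-name --region us-east-1

# "Folders" in S3 are just key prefixes; creating empty prefix markers is optional
aws s3api put-object --bucket your-bucket-name --key logs/
aws s3api put-object --bucket your-bucket-name --key athena-output/

# Glue database that Athena will query against
aws glue create-database --database-input '{"Name": "logs"}'

# Athena workgroup whose results land in the athena-output/ prefix
aws athena create-work-group \
  --name logwise-wg \
  --configuration "ResultConfiguration={OutputLocation=s3://your-bucket-name/athena-output/}"

# The application-logs table is created with the DDL from the setup guide, e.g.:
# aws athena start-query-execution --work-group logwise-wg \
#   --query-execution-context Database=logs \
#   --query-string "$(cat create-application-logs-table.sql)"
```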
Return to this page after completing the S3 & Athena setup to continue with the Docker deployment.
Critical: Do not proceed with the Docker setup until you have completed the S3 & Athena configuration. The setup will fail without proper AWS resources configured.
## Quick Start
### One-Command Setup
The easiest way to get started is with our one-click setup script:
```bash
cd deploy/docker/scripts
./setup.sh
```

This single command will:
- Install prerequisites (Docker, Make, AWS CLI, etc.) if needed
- Create a `.env` file from the template (`.env.example`)
- Prompt you to fill in AWS credentials
- Start all services (Vector, Kafka, Spark, Grafana, Orchestrator, MySQL)
- Wait for services to become healthy
- Create Kafka topics automatically
That's it! Your LogWise stack will be up and running.
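To verify the stack really is up, a couple of quick checks help. The `kafka-topics.sh` invocation assumes that script is on the PATH inside the Kafka container; some images keep it under a different directory, so adjust the path if needed.

```bash
# Container status and health
make ps

# Confirm the logs topic exists (script location is image-dependent)
docker compose exec kafka kafka-topics.sh --list --bootstrap-server localhost:9092
```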
## Accessing Services
| Service | URL | Credentials |
|---|---|---|
| Grafana | http://localhost:3000 | admin / admin (default) |
| Spark Master UI | http://localhost:18080 | - |
| Spark Worker UI | http://localhost:8081 | - |
| Orchestrator | http://localhost:8080 | - |
| Orchestrator Health | http://localhost:8080/healthcheck | - |
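A quick scripted liveness check from the host. The Orchestrator's healthcheck response body depends on its implementation, so only the HTTP status is inspected; Grafana's `/api/health` is a standard Grafana endpoint.

```bash
# Orchestrator healthcheck - expect HTTP 200 when the service is ready
curl -sS -o /dev/null -w "orchestrator: %{http_code}\n" http://localhost:8080/healthcheck

# Grafana liveness
curl -sS -o /dev/null -w "grafana: %{http_code}\n" http://localhost:3000/api/health
```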
## Configuration Details
The .env file contains all configuration for the LogWise stack. When you run setup.sh, it automatically creates this file from .env.example. Here are the key configuration sections:
### AWS Configuration (Required)

```bash
AWS_REGION=us-east-1                   # AWS region for S3 and Athena
AWS_ACCESS_KEY_ID=your-access-key      # AWS access key ID
AWS_SECRET_ACCESS_KEY=your-secret-key  # AWS secret access key
AWS_SESSION_TOKEN=                     # Optional: for temporary credentials
```

### S3 Configuration (Required)

```bash
S3_BUCKET_NAME=your-bucket-name  # S3 bucket for storing processed logs
S3_PREFIX=logs/                  # Prefix/path within the bucket
```

### Athena Configuration (Required)

```bash
S3_ATHENA_OUTPUT=s3://bucket/athena-output/  # S3 path for Athena query results
ATHENA_WORKGROUP=primary                     # Athena workgroup name
ATHENA_CATALOG=AwsDataCatalog                # Athena data catalog
ATHENA_DATABASE=logwise                      # Athena database name
```

### Kafka Configuration

```bash
KAFKA_BROKERS=kafka:9092                 # Kafka broker address (default for Docker)
KAFKA_TOPIC=logs                         # Kafka topic name for logs
KAFKA_CLUSTER_ID=9ZkYwXlQ2Tq8rBn5JcH0xA  # Kafka cluster ID (KRaft mode)
```

### Spark Configuration

```bash
SPARK_MASTER_URL=spark://spark-master:7077    # Spark master URL
SPARK_STREAMING=true                          # Enable Spark streaming
SPARK_MASTER_UI_PORT=18080                    # Spark Master UI port
SPARK_VERSION_MATCH=3.1.2                     # Spark version
HADOOP_AWS_VERSION=3.2.0                      # Hadoop AWS library version
AWS_SDK_VERSION=1.11.375                      # AWS SDK version
MAIN_CLASS=com.logwise.spark.MainApplication  # Spark application main class
```

### Database Configuration

```bash
MYSQL_DATABASE=myapp           # MySQL database name
MYSQL_USER=myapp               # MySQL user
MYSQL_PASSWORD=myapp_pass      # MySQL password
MYSQL_ROOT_PASSWORD=root_pass  # MySQL root password
```

### Other Configuration

```bash
ORCH_PORT=8080    # Orchestrator service port
TENANT_VALUE=ABC  # Tenant identifier
```

For a complete list of all environment variables, see `.env.example` in the deploy directory.
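After editing `.env`, it is worth confirming that Docker Compose resolves it before starting anything. Run this from the directory that contains `docker-compose.yml`; the grep is just a spot check and assumes the bucket variable is interpolated into at least one service definition.

```bash
# Render the fully resolved compose file; fails fast on interpolation errors
docker compose config > /dev/null && echo "compose config OK"

# Spot-check that a value from .env made it into the rendered config
docker compose config | grep -i "S3_BUCKET_NAME"
```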
## Common Commands
```bash
# Start all services
make up

# Stop all services
make down

# View logs
make logs

# Check service status
make ps

# Stop and remove volumes
make teardown

# Reset Kafka (fix cluster ID issues)
make reset-kafka
```

## Troubleshooting
### Spark Worker Not Accepting Resources
Symptom: WARN Master: App requires more resource than any of Workers could have
Solution:
1. Check worker memory:
   ```bash
   docker compose logs spark-worker | grep "Starting Spark worker"
   ```
2. Ensure the worker has enough memory. The worker needs:
   - Memory for driver + executor + overhead
   - Default: 512m driver + 512m executor = ~1GB minimum
3. Adjust in `.env`:
   ```bash
   SPARK_DRIVER_MEMORY=400m
   SPARK_EXECUTOR_MEMORY=400m
   ```
4. Or increase the worker memory limit in `docker-compose.yml`:
   ```yaml
   spark-worker:
     mem_limit: 3g
   ```
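To see how much memory the master thinks each worker is offering (useful when checking the numbers above), the standalone Master UI exposes a JSON view of the cluster. This assumes the UI is mapped to localhost:18080 as in the default `.env`; field names can vary slightly between Spark versions.

```bash
# Dump the master's view of the cluster: registered workers, cores, memory.
# jq is only for pretty-printing; drop the pipe if it is not installed.
curl -s http://localhost:18080/json/ | jq .
```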
### ClassNotFoundException for S3 or Kafka
Symptom: java.lang.ClassNotFoundException: org.apache.hadoop.fs.s3a.S3AFileSystem
Solution:
1. The custom Spark Dockerfile includes the required JARs:
   - `hadoop-aws-3.2.0.jar`
   - `aws-java-sdk-bundle-1.11.375.jar`
   - `spark-sql-kafka-0-10_2.12-3.1.2.jar`
   - `kafka-clients-2.6.0.jar`
2. Rebuild the Spark image:
   ```bash
   docker compose build spark-worker spark-master spark-client
   ```
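To confirm the rebuilt image actually contains these JARs, you can list Spark's jars directory inside a running container. The `/opt/spark/jars` path is an assumption about the image layout, so adjust it to wherever your Dockerfile installs Spark.

```bash
# List the bundled S3/Kafka JARs inside the running Spark master container.
# /opt/spark/jars is a common install location but may differ in this image.
docker compose exec spark-master \
  sh -c 'ls /opt/spark/jars | grep -Ei "hadoop-aws|aws-java-sdk|kafka"'
```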
### AWS Access Denied (403 Forbidden)
Symptom: AccessDeniedException: 403 Forbidden
Solution:
1. Verify AWS credentials in `.env`:
   ```bash
   AWS_ACCESS_KEY_ID=your-key
   AWS_SECRET_ACCESS_KEY=your-secret
   AWS_SESSION_TOKEN=your-token  # If using temporary credentials
   AWS_REGION=us-east-1
   ```
2. Ensure IAM permissions include:
   - `s3:GetObject`, `s3:PutObject`, `s3:ListBucket` on the target bucket
   - `athena:StartQueryExecution`, `athena:GetQueryResults` (if using Athena)
3. Restart the Spark client:
   ```bash
   docker compose restart spark-client
   ```
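A common cause is that the containers were started before `.env` was filled in, so they never picked up the credentials. A quick check, assuming the variable names from the configuration section above:

```bash
# Confirm the Spark client container actually sees the AWS settings
# (the secret key is deliberately not printed here).
docker compose exec spark-client env | grep -E '^AWS_(REGION|ACCESS_KEY_ID|SESSION_TOKEN)='

# Verify the same credentials work from the host (requires the AWS CLI)
aws sts get-caller-identity
```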
### Port Conflicts
Symptom: Error: bind: address already in use
Solution:
- Change ports in `.env`:
  ```bash
  GRAFANA_PORT=3001
  ORCH_PORT=8081
  ```
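To find out what is already bound to a port before remapping it (example for Grafana's default port 3000; `lsof` works on macOS and Linux, `ss` is a Linux alternative):

```bash
# Who is listening on Grafana's default port?
lsof -nP -iTCP:3000 -sTCP:LISTEN

# Linux alternative
ss -ltnp | grep ':3000'
```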
### Kafka Cluster ID Mismatch
Symptom: Cluster ID mismatch errors
Solution:
```bash
make reset-kafka
make up
```
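If you want to confirm it really is a cluster ID mismatch before wiping Kafka state, compare the cluster ID the broker reports in its logs (the exact wording varies by Kafka version) with the value the stack was configured with:

```bash
# The broker logs the cluster ID it found in its data directory at startup
docker compose logs kafka | grep -i "cluster id"

# ...which should match the configured value
grep KAFKA_CLUSTER_ID .env
```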
### Disk Space Issues

Symptom: no space left on device
Solution:
```bash
# Clean up Docker
docker system prune -a --volumes

# Remove unused images
docker image prune -a
```

### Spark Worker Not Registering
Symptom: Worker fails to connect to master
Solution:
1. Check network connectivity:
   ```bash
   docker compose exec spark-worker curl http://spark-master:8080
   ```
2. Verify the master is running:
   ```bash
   docker compose logs spark-master | grep "Successfully started service"
   ```
3. Check worker logs:
   ```bash
   docker compose logs spark-worker | grep -i "error\|exception"
   ```
## Project Structure
```
logwise/
├── deploy/
│   ├── docker-compose.yml       # Main orchestration file
│   ├── Makefile                 # Convenience commands
│   ├── setup.sh                 # One-click setup script
│   ├── grafana/provisioning/    # Grafana dashboards & datasources
│   └── healthcheck-dummy/
│       └── Dockerfile           # Healthcheck test service
├── vector/
│   ├── vector.yaml              # Vector configuration
│   └── logwise-vector.desc      # Protobuf descriptor
├── spark/
│   └── docker/Dockerfile        # Spark container image
└── orchestrator/
    ├── docker/Dockerfile        # Orchestrator container image
    └── db/init/                 # Database initialization scripts
```

## Security Notes
- Never commit the `.env` file - it contains sensitive AWS credentials
- Use IAM roles in production instead of access keys
- Enable TLS/SSL for production deployments
- Restrict network access to services in production
Happy Logging!
