In this tutorial, we’ll walk through the process of setting up a Kafka connector using Docker and docker-compose.yml. We’ll focus on configuring a file connector, which is useful for reading data from files and writing data to files using Kafka.
Link to repository
Prerequisites
- Docker and Docker Compose installed on your system
- Basic understanding of Kafka and Docker concepts
Project Structure
Before we begin, let’s look at the files we’ll be working with:
docker-compose.yml
: Defines our services (Kafka and Kafka Connect)Dockerfile
: Custom image for Kafka Connectentrypoint.sh
: Script to configure and start Kafka Connectplugins.sh
: Script to set up the necessary connector pluginscustom-plugins/
: Directory for storing custom/external connector plugins
Step 1: Setting up the Docker Compose File
Our docker-compose.yml
file defines two services: kafka
and connect
. Here’s a breakdown of the important configurations:
version: "3"
services:
kafka:
image: apache/kafka:latest
# ... (Kafka configuration)
connect:
build:
context: .
dockerfile: Dockerfile
depends_on:
kafka:
condition: service_healthy
ports:
- 8083:8083
environment:
BOOTSTRAP_SERVERS: kafka:9092
REST_ADVERTISED_HOST_NAME: connect
REST_PORT: 8083
GROUP_ID: compose-connect-group
CONFIG_STORAGE_TOPIC: docker-connect-configs
OFFSET_STORAGE_TOPIC: docker-connect-offsets
STATUS_STORAGE_TOPIC: docker-connect-status
KEY_CONVERTER: "org.apache.kafka.connect.storage.StringConverter"
VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
CONFIG_STORAGE_REPLICATION_FACTOR: 1
OFFSET_STORAGE_REPLICATION_FACTOR: 1
STATUS_STORAGE_REPLICATION_FACTOR: 1
CONNECTOR_CLIENT_CONFIG_OVERRIDE_POLICY: All
CONNECT_PLUGIN_PATH: "/opt/kafka/plugins"
volumes:
- ./data:/data
- ./custom-plugins:/opt/kafka/custom-plugin
The connect
service is built from our custom Dockerfile and depends on the kafka
service. We’ve set up environment variables to configure Kafka Connect, including the bootstrap servers, storage topics, and converters.
Step 2: Creating the Dockerfile
Our Dockerfile sets up the Kafka Connect image:
FROM apache/kafka:latest
WORKDIR /opt/kafka
EXPOSE 8083
RUN mkdir /opt/kafka/plugins
COPY plugins.sh .
RUN sh plugins.sh
COPY entrypoint.sh .
ENTRYPOINT ["./entrypoint.sh"]
CMD ["start"]
This Dockerfile:
- Uses the latest Apache Kafka image as the base
- Creates a plugins directory
- Copies and runs the
plugins.sh
script to set up connectors - Copies the
entrypoint.sh
script - Sets the entrypoint to our custom script
Step 3: Configuring the Entrypoint Script
The entrypoint.sh
script is responsible for configuring and starting Kafka Connect. It:
- Sets default environment variables
- Configures logging
- Processes Kafka Connect properties
- Starts the Kafka Connect distributed worker
Step 4: Setting up Connector Plugins
The plugins.sh
script copies the necessary connector JARs to the plugins directory:
#!/bin/usr/env bash
set -eo pipefail
KAFKA_HOME=${KAFKA_HOME:-/opt/kafka}
KAFKA_CONNECT_PLUGINS_DIR=${KAFKA_CONNECT_PLUGINS_DIR:-/opt/kafka/plugins}
# Copy the necessary connector JARs
cp "$KAFKA_HOME"/libs/connect-file-*.jar "$KAFKA_CONNECT_PLUGINS_DIR"
cp "$KAFKA_HOME"/libs/kafka-clients-*.jar "$KAFKA_CONNECT_PLUGINS_DIR"
cp "$KAFKA_HOME"/libs/connect-api-*.jar "$KAFKA_CONNECT_PLUGINS_DIR"
cp "$KAFKA_HOME"/libs/connect-transforms-*.jar "$KAFKA_CONNECT_PLUGINS_DIR"
echo "Kafka File Source and Sink connectors have been set up in $KAFKA_CONNECT_PLUGINS_DIR"
# List the installed connector JARs
echo "Installed connector JARs:"
ls -1 "$KAFKA_CONNECT_PLUGINS_DIR"
echo "Done!"
This script ensures that the file connector and necessary dependencies are available in the plugins directory.
Step 5: Running the Setup
To start the Kafka and Kafka Connect services:
- Make sure all files (
docker-compose.yml
,Dockerfile
,entrypoint.sh
, andplugins.sh
) are in the same directory. - Create a
data
directory in the same location for file connector data. - Run the following command:
docker-compose up -d
This command will build the custom Kafka Connect image and start both the Kafka and Kafka Connect services.
Step 6: Configuring a File Connector
Now that our services are running, we can configure a file connector. Here’s an example of how to create a file source connector:
Create a file named
source.txt
in thedata
directory with some sample content.Use the following curl command to create the connector:
curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d '{
"name": "file-source",
"config": {
"connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
"file": "/data/source.txt",
"topic": "file-topic"
}
}'
This will create a source connector that reads from /data/source.txt
and writes to the file-topic
Kafka topic.
To create a file sink connector, use:
curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d '{
"name": "file-sink",
"config": {
"connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
"file": "/data/sink.txt",
"topics": "file-topic"
}
}'
This will create a sink connector that reads from the file-topic
Kafka topic and writes to /data/sink.txt
.
Step 7: Verifying the Setup
After setting up your connectors, it’s important to verify that everything is working correctly. Here are some ways to do that:
Checking Topics
To verify if the topic is created, you can list all topics:
docker-compose exec kafka ./opt/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092
You should see file-topic
in the list of topics.
Consuming Messages
To verify that messages are being produced to the topic, you can use the Kafka console consumer:
docker exec -it kafka /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic file-topic --from-beginning
This command will display all messages in the file-topic
from the beginning.
Checking Connector Status
You can use the Kafka Connect REST API to check the status of your connectors. Here are some useful curl commands:
List all connectors:
curl http://localhost:8083/connectors
Check the status of a specific connector (e.g., file-source):
curl http://localhost:8083/connectors/file-source/status
Check the offsets of a specific connector:
curl http://localhost:8083/connectors/file-source/offsets
These commands will help you verify that your connectors are running correctly and processing data as expected.
Adding Custom Connector Plugins
While our setup includes the file connector by default, you might want to add other custom connector plugins. You can do this easily using Docker volumes. Here’s how:
Create a directory for your custom plugins:
mkdir -p custom-plugins
Add your custom connector JAR files to this directory.
Modify the
docker-compose.yml
file to add a volume for the custom plugins. Add the following under thevolumes
section of theconnect
service:volumes: - ./data:/data - ./custom-plugins:/opt/kafka/custom-plugins
Update the
CONNECT_PLUGIN_PATH
environment variable in thedocker-compose.yml
file:environment: # ... other environment variables ... CONNECT_PLUGIN_PATH: "/opt/kafka/plugins,/opt/kafka/custom-plugins"
Restart your services:
docker-compose down docker-compose up -d
Verify the plugins:
curl http://localhost:8083/connector-plugins/
Now, Kafka Connect will load any connector plugins found in your custom-plugins
directory.
Example: Adding the Aiven JDBC connector:
- Download the connector JAR from the Aiven JDBC connector releases.
- Place the JAR file in your
custom-plugins
directory. - Restart the services as described above.
- You should now be able to use the JDBC connector in your Kafka Connect setup.
Remember to check the documentation of any custom connectors you add, as they may require additional configuration or dependencies.
Conclusion
In this tutorial, we’ve walked through the process of setting up Kafka and Kafka Connect using Docker and docker-compose.yml. We’ve also demonstrated how to configure file source and sink connectors. This setup provides a flexible foundation for building data pipelines with Kafka, allowing you to easily connect various data sources and sinks to your Kafka cluster.
Remember to monitor your connectors and Kafka cluster for performance and adjust configurations as needed for your specific use case.