Kafka: Communication among Microservices
Microservice communication has two approaches:
1. RPC-based
2. Enterprise Service Bus (ESB), which has its roots in SOA.
Implementation
RPC-based communication is implemented using
1. REST API
   It needs:
   - load balancer
   - service discovery
2. gRPC
Enterprise Service Bus is implemented using
1. Messaging queue
   1.1 RabbitMQ
   1.2 ActiveMQ
   1.3 ZeroMQ
2. Kafka
1. Synchronous protocol: RESTful API over HTTP.
2. Asynchronous protocol: AMQP, following the "smart endpoints and dumb pipes" pattern.
Kafka is a distributed stream processing platform with high resilience and fault tolerance. It can replace messaging systems built on Java Message Service (JMS), Advanced Message Queuing Protocol (AMQP), etc. Kafka supports both publish-subscribe and queuing semantics.
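To illustrate how one platform can offer both pub-sub and queuing, here is a minimal in-memory sketch. It assumes the usual consumer-group idea: every group sees every message, while members of one group share the messages between them. The function name `deliver` and the group names are hypothetical, and real Kafka shares work per partition rather than per message.

```python
from collections import defaultdict
from itertools import cycle


def deliver(messages, groups):
    """Deliver each message once per group, to one member of that group.

    `groups` maps a group name to its list of consumer names.
    Returns {consumer_name: [messages received]}.
    """
    received = defaultdict(list)
    # One round-robin iterator per group: members of a group share the load.
    pickers = {name: cycle(members) for name, members in groups.items()}
    for message in messages:
        for name in groups:
            received[next(pickers[name])].append(message)
    return dict(received)


groups = {
    "billing": ["billing-1", "billing-2"],  # two members: queue semantics
    "audit": ["audit-1"],                   # separate group: pub-sub semantics
}
result = deliver(["m0", "m1", "m2", "m3"], groups)
```

In this sketch `audit-1` receives all four messages, while `billing-1` and `billing-2` each receive half of them.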
Streaming platform capabilities
1. Publish and subscribe
2. Store streams of records in a fault-tolerant way
3. Process streams of records
Kafka Features
For handling real-time data feeds, Kafka offers:
- low latency
- auto-scaling
- centralized management
- proven high availability
- a unified platform
- high throughput
Apache Kafka Architecture
Kafka is a combination of messaging, storage, and stream processing.
Messages are written to a log-style stream called a topic. A topic is like a chronological list of events, and the overall event flow is broken up into multiple such streams.
Two types of topics
1. Regular
   - retention is bound by time or space
   - default retention is 7 days
2. Compacted
   - records never expire
   - a record can be updated by writing a new value for the same key
   - to delete a key, write a tombstone message: a null value for that key
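The behaviour of a compacted topic can be sketched in a few lines of plain Python. This is a simplified model, not Kafka's implementation: the log is assumed to be a list of (key, value) pairs, `None` stands in for a tombstone, and the function name `compact` is hypothetical. Real brokers also keep tombstones around for a configurable period before removing them.

```python
def compact(log):
    """Return the log after a simplified compaction pass.

    `log` is a list of (key, value) pairs in append order. A value of None
    is a tombstone, which marks the key for deletion.
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records overwrite earlier ones
    survivors = [
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None  # drop keys whose latest record is a tombstone
    ]
    survivors.sort()  # keep the original offset order
    return [(key, value) for _, key, value in survivors]


log = [
    ("user-1", "alice@old.example"),
    ("user-2", "bob@example.com"),
    ("user-1", "alice@new.example"),  # update: replaces the first record
    ("user-2", None),                 # tombstone: deletes user-2
    ("user-3", "carol@example.com"),
]
compacted = compact(log)
```

After compaction only the latest value for `user-1` and the record for `user-3` remain.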
A topic holds records.
Each record has a key, a value, and a timestamp.
Kafka topics are divided into partitions, which allow a topic to be processed in parallel. Within a consumer group, each partition is read by at most one consumer, so a topic needs at least as many partitions as consumers to keep every consumer busy. Partitions are distributed and replicated across multiple brokers (servers). The "replication-factor" setting determines how many copies of each partition are kept. This is how fault tolerance is achieved.
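The relationship between partitions and consumers can be sketched as follows. Both helpers are hypothetical, `zlib.crc32` is only a stand-in hash chosen for this sketch (it is not Kafka's own partitioner), and the round-robin assignment is one simple strategy among several.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a record key to a partition; equal keys always map to the same one.

    zlib.crc32 is a stand-in hash chosen for this sketch, not Kafka's own.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        assignment[consumers[index % len(consumers)]].append(partition)
    return assignment


partitions = list(range(3))
# Four consumers, three partitions: one consumer is left without work.
assignment = assign(partitions, ["c1", "c2", "c3", "c4"])
```

With three partitions and four consumers, `c4` ends up with an empty assignment, which is why the partition count caps the useful number of consumers in a group.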
Each broker hosts partitions, and for any given partition it acts either as the leader or as a replica (follower).
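A rough sketch of how replicas might be spread over brokers is shown below. The placement rule (consecutive brokers, first one listed is the leader) is an assumption made for illustration, and `place_replicas` is a hypothetical helper; the real assignment logic is more involved.

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Return {partition: [leader_broker, follower_broker, ...]}.

    Simplified rule for this sketch: partition p starts at broker p and
    takes the next brokers in order, wrapping around the broker list.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the broker count")
    placement = {}
    for partition in range(num_partitions):
        placement[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return placement


placement = place_replicas(3, ["broker-1", "broker-2", "broker-3"], 2)
```

Here every broker leads one partition and follows another, so losing a single broker still leaves one copy of each partition.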
Messages commonly use JSON, Avro, or Protobuf as their serialization format. To use network bandwidth efficiently, Kafka supports the GZIP, Snappy, LZ4, and Zstandard compression codecs.
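The effect of serialization plus compression can be tried with the standard library alone. This sketch uses JSON and gzip; the `encode` and `decode` helpers and the sample batch are hypothetical. In a real deployment the Kafka client compresses batches itself once a compression type is configured, so application code would not normally call gzip directly.

```python
import gzip
import json


def encode(record):
    """Serialize a Python object to JSON bytes, then gzip-compress them."""
    return gzip.compress(json.dumps(record).encode("utf-8"))


def decode(payload):
    """Reverse of encode: decompress, then parse the JSON."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


# A repetitive batch, similar in shape to a stream of sensor readings.
batch = [{"sensor": "s-1", "reading": n} for n in range(200)]
raw = json.dumps(batch).encode("utf-8")
packed = encode(batch)
```

Comparing `len(raw)` with `len(packed)` shows how much repetitive JSON shrinks, and `decode(packed)` returns the original batch.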
Access control lists (ACLs) restrict who can read from and write to topics.
Kafka API types
1. Consumer
2. Producer
3. Connect: for importing data from and exporting data to external systems
4. Streams: for developing stateful, scalable stream processing applications. The Streams API allows writing Java applications that consume data from Kafka and write results back to Kafka. Apache Kafka also works with external stream processing systems such as
   - Apache Apex
   - Apache Flink
   - Apache Spark
   - Apache Storm
   - Samza
   - Spout
   - Spark Streaming
   - IBM Streams
   - Spring Cloud Stream
5. Admin Client API
Two types of Streams API
4.1 DSL API. The stream processing DSL (domain-specific language) offers filter, map, grouping, windowing, aggregation, joins, and the notion of tables.
4.2 Processor API.
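The Streams DSL itself is a Java API, so the following is only a plain-Python analogy of a few of its operations (filter, grouping, windowing, aggregation). The function `windowed_counts`, the event tuples, and the 60-second tumbling window are all assumptions made for this sketch.

```python
from collections import Counter


def windowed_counts(events, window_seconds, predicate):
    """Count events per (window_start, key) after filtering.

    `events` is an iterable of (timestamp_seconds, key, value) tuples.
    """
    counts = Counter()
    for timestamp, key, value in events:
        if not predicate(value):
            continue  # filter step
        window_start = timestamp - (timestamp % window_seconds)  # windowing
        counts[(window_start, key)] += 1  # group by key and aggregate
    return dict(counts)


events = [
    (0, "checkout", 120),
    (15, "checkout", 5),
    (42, "search", 80),
    (61, "checkout", 300),
    (75, "checkout", 90),
]
counts = windowed_counts(events, window_seconds=60, predicate=lambda v: v >= 50)
```

The result maps each (window start, key) pair to a count, which is roughly the shape of a windowed table in a stream processing DSL.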
Controller
The controller is a broker with the additional responsibility of partition management, which includes
* Leader election
* Leader switch
* New topics and partitions
* New brokers
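Leader election can be pictured with a small sketch. It assumes a simplified rule (pick the first replica that is both in sync and alive) and a hypothetical helper `elect_leader`; the real controller handles many more cases.

```python
def elect_leader(replicas, in_sync, live_brokers):
    """Pick the first replica that is both in sync and alive, else None."""
    for broker in replicas:
        if broker in in_sync and broker in live_brokers:
            return broker
    return None


replicas = ["broker-1", "broker-2", "broker-3"]
in_sync = {"broker-1", "broker-3"}  # broker-2 has fallen behind

# All brokers alive: the first in-sync replica leads.
leader_before = elect_leader(replicas, in_sync, {"broker-1", "broker-2", "broker-3"})
# broker-1 fails: leadership switches to the next in-sync replica.
leader_after = elect_leader(replicas, in_sync, {"broker-2", "broker-3"})
```

In this sketch leadership moves from `broker-1` to `broker-3`, skipping `broker-2` because it is not in sync.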
Kafka Monitoring tools
1. Burrow
2. Datadog
Advantages with microservices
* Messages are ordered chronologically within a partition, and at-least-once delivery can be guaranteed
* Strong durability, resilience, and performance
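Guaranteed delivery in practice often means a record may be delivered more than once, for example after a consumer restarts. A minimal sketch of a consumer that tolerates such repeats is shown below. It assumes each record carries its partition and offset, and `process_once` is a hypothetical helper that keeps the seen positions in memory (a real service would persist them).

```python
def process_once(records, handler, seen=None):
    """Call handler for each record, skipping (partition, offset) repeats.

    `records` is an iterable of (partition, offset, value) tuples.
    Returns the set of positions that have been handled.
    """
    seen = set() if seen is None else seen
    for partition, offset, value in records:
        if (partition, offset) in seen:
            continue  # redelivered record: already handled
        handler(value)
        seen.add((partition, offset))
    return seen


handled = []
records = [(0, 0, "a"), (0, 1, "b"), (0, 1, "b"), (1, 0, "c")]
process_once(records, handled.append)
```

The duplicate of record `(0, 1)` is skipped, so each value is handled once.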
Steps to move functionality from a monolith to a new service
1. Add Kafka producer code to the existing service in the monolith.
2. Develop a new service with Kafka consumer code that stores everything in a DB.
3. Test, and measure how far the new consumer service lags behind.
4. Enhance the new service with the relevant code from the monolith. It should fetch data from the DB. Temporarily disable some external calls.
5. Filter events in the monolith's producer so that only actionable events are sent to the new service via Kafka.
6. Enhance the new service further so that it sends events back to the monolith via Kafka.
7. Test.
8. Remove from the monolith the code that was moved to the new service.
9. Repeat.
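Two of the steps above lend themselves to a short sketch: measuring consumer lag (step 3) and filtering events in the producer (step 5). The offsets are passed in as plain dictionaries here, and the helper names and event types are hypothetical; how the offsets are obtained depends on the client library in use.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Return per-partition lag: records written but not yet consumed."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in end_offsets.items()
    }


def actionable(events, wanted_types):
    """Keep only the events the new service needs to act on."""
    return [event for event in events if event["type"] in wanted_types]


# Step 3: partition 0 is 50 records behind, partition 1 is caught up.
lag = consumer_lag({0: 1500, 1: 900}, {0: 1450, 1: 900})

# Step 5: only order events are forwarded to the new service.
events = [
    {"type": "order_placed", "id": 1},
    {"type": "page_viewed", "id": 2},
    {"type": "order_cancelled", "id": 3},
]
to_send = actionable(events, {"order_placed", "order_cancelled"})
```

A lag that keeps growing during the test in step 3 suggests the new service cannot keep up with the monolith's event rate.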
Dependencies
1. Gradle build tool
2. Java
Python and Kafka
Three alternatives
1. kafka-python : https://github.com/dpkp/kafka-python
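    # Run the producer and the consumer as separate scripts.
    from time import sleep
    from json import dumps, loads

    from kafka import KafkaProducer, KafkaConsumer
    from pymongo import MongoClient  # assumption: messages are stored in MongoDB

    # Producer: serialize each dict to JSON bytes before sending.
    producer = KafkaProducer(
        bootstrap_servers=['localhost:9092'],
        value_serializer=lambda x: dumps(x).encode('utf-8'),
    )

    for e in range(1000):
        data = {'number': e}
        producer.send('numtest', value=data)
        sleep(5)

    # Consumer: read from the earliest offset and decode JSON bytes back to dicts.
    consumer = KafkaConsumer(
        'numtest',
        bootstrap_servers=['localhost:9092'],
        auto_offset_reset='earliest',
        enable_auto_commit=True,
        group_id='my-group',
        value_deserializer=lambda x: loads(x.decode('utf-8')),
    )

    # Assumption: `collection` was not defined in the original snippet;
    # a collection in a local MongoDB instance is used here.
    collection = MongoClient('localhost:27017').numtest.numtest

    for message in consumer:
        record = message.value
        collection.insert_one(record)
        print('{} added to {}'.format(record, collection))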
2. pykafka https://github.com/Parsely/pykafka
3. confluent-kafka-python https://github.com/confluentinc/confluent-kafka-python
References
Kafka ecosystem: https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
Kafka documentation: http://kafka.apache.org/documentation/
Python and Kafka: https://towardsdatascience.com/kafka-python-explained-in-10-lines-of-code-800e3e07dad1
Monolithic applications into services: https://blog.heroku.com/monolithic-applications-into-services
Strimzi Kafka on Minikube: https://strimzi.io/quickstarts/