Kafka : Communication among Micro Services


Microservice Communications has two Approaches

1. RPC based and
2. Enterprise Service Bus which has root from SOA. 

Implementation

RPC based is implemented using 

1. REST API
It needs: 
-load balancer
-service discovery
2. gRPC

Enterprise Service Bus is implemented using

1. Messaging Queue
1.1 RabbitMQ
1.2 ActiveMQ
1.3 ZeroMQ
2. Kafka

1. Synchronous protocol using RESTful API over HTTP
2. Asynchronous protocol AMQP. smart endpoints and dumb pipe pattern. 

Kafka is distributed stream processing platform with high resilience and fault tolerance. Kafka replaces  Java Message Service (JMS), Advanced Message Queuing Protocol (AMQP), etc. Kafka supports both Pub-sub and queuing feature

Streaming platform capabilities

1. Publish and subscribe
2. Store stream of record in fault-tolerant way
3. Process stream of records

Kafka Features

-  low latency, 
-  auto-scaling, 
-  centralized management, 
-  proven high availability 
-  unified platform, 
-  high-throughput, 
for handling real-time data feeds

Apache Kafka Architecture

Kafka is combination of messaging, storage, and stream processing 

Messages are written to a log-style stream called a topic. It is like a list of chronological events broken up into multiple streams, known as “topics”.

Two types of topics
1. Regular topic
- time or space bound
- default 7 days.  
2. compacted
- never expire
- can be updated
- to delete write : tombstone message with null-value for a specific key.

Topic has records
Record has key, value and timestamp

Kafka topics are divided into various partitions. Partitions enable parallelization of topics. As many consumer that many partitions. Partitions are distributed and replicated across multiple brokers (servers). The flag "replication-factor" determines how many copies of the topic partition has to be made. This is how fault tolerance is achieved. 

Broker has partitions, that can be leader or replica for given topic. 

JSON, AVRO, or Protobufs as their serialization format. For effective use of n/w bandwidth, Kafka supports GZIP, Snappy, LZ4 and ZStandard compression protocol

ACL for read and write topics

Kafka APIs types

1. consumer
2. producer
3. connector : for import and export
4. Streams : To develop, stateful, scalable stream processing app. Kafka offers the Streams API that allows writing Java applications that consume data from Kafka and write results back to Kafka. Apache Kafka also works with external stream processing systems such as 
- Apache Apex, 
- Apache Flink, 
- Apache Spark, and 
- Apache Storm, 
- Samza, 
- Spout, 
- SpartStreaming, 
- IBM Streams, 
- Spring Cloud Stream. 
5. Admin Client API

Two types of Stream APIs
4.1 DSL API. Stream processing DSL (Domain specific language) offers filter, map, grouping, windowing, aggregation, joins, and the notion of tables
4.2 Processor API. 

Controller

Controller is also broker with more responsibilities of partition management, that includes

* Leader selection
* Leader switch
* New topic and partition
* New broker

Kafka Monitoring tools

1. Burrow
2. Datadog

Advantage with micro-services

* Messages are ordered chronologically and delivery is guaranteed
* Strong durability, resilience and performance

Steps

1. Add Kafka producer code to existing service in monolith
2. Develop new service with Kafka consume code and store everything in DB
3. Test. How far the new consumer service is time lagging. 
4. Enhance newly added service with some relevant code from existing monolith. It should fetch data from DB. Temporarily disable some external calls. 
5. Now filter events in producer at monolith. Send only actionable events to consumer new service, via Kafka.
6. Enhance the new service further. Send events back to monolith via Kafka. 
7. Test
8. Remove the code from monolith that was added to new service. 
9. Repeat. 

Dependency

1. Gradel build tool
2. Java

Python and Kafka

Three alternatives

1. kafka-python : https://github.com/dpkp/kafka-python
from time import sleep
from json import dumps, loads
from kafka import KafkaProducer, KafkaConsumer
producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
                         value_serializer=lambda x: 
                         dumps(x).encode('utf-8'))
for e in range(1000):
    data = {'number' : e}
    producer.send('numtest', value=data)
    sleep(5)
consumer = KafkaConsumer(
    'numtest',
     bootstrap_servers=['localhost:9092'],
     auto_offset_reset='earliest',
     enable_auto_commit=True,
     group_id='my-group',
     value_deserializer=lambda x: loads(x.decode('utf-8'))
for message in consumer:
    message = message.value
    collection.insert_one(message)
    print('{} added to {}'.format(message, collection))

2. pykafka https://github.com/Parsely/pykafka
3. confluent-kafka-python https://github.com/confluentinc/confluent-kafka-python

Reference 

Kafka eco system : https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem

Kafka Documentation
http://kafka.apache.org/documentation/

Python and Kafka
https://towardsdatascience.com/kafka-python-explained-in-10-lines-of-code-800e3e07dad1

https://blog.heroku.com/monolithic-applications-into-services

0 comments:

Post a Comment