BNY Lake Vault Integration
Table of Contents
Objective
Data Requirements
Technical requirements
Data flows diagram
Cluster configuration
BNY Lake Cluster
Topic
Development items
DNA Produce/Consume Library
DNA Producer
DNA Consumer
Schema Registry

Objective

...

Based on the recommendation in the architecture doc, the topic should be named:
BNY.IN.PRD.LAKE.DEFAULT.DEFAULT
This topic can be created once the cluster is available.

Design considerations

In the Kafka architecture, the producer is the so-called "topic owner": it creates batches, load-balances records across topic partitions, and controls commits, acks, and compression.
That is why it is critical for us to own this part.
Our business logic requires proper micro-batching. For example, if we allowed a foreign producer to send records for the feeds indicated above, we could end up with batches of completely mixed feeds, which would be very slow to process. We need to build batches using the feed type as the key. On the other hand, we do not want key-based partitioning that restricts a partition to a specific feed type, as that would populate partitions very unevenly. Instead, we should build micro-batches by serializing multiple records of the same feed type into the same physical Kafka message, applying the schema during serialization, and finally compressing the message to the configured size. This way we do not need key-based partitioning, we spread physical messages evenly across partitions, and we can easily add partitions if volume grows.
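The grouping step described above can be sketched independently of any Kafka client. The sketch below is illustrative only (the function name and the byte-size threshold are assumptions, not part of the DNA library); it shows how records accumulate per feed type and flush once a batch reaches the configured size, so that every physical message carries records of exactly one feed:

```python
from collections import defaultdict

def micro_batch(records, max_batch_bytes):
    """Group (feed_type, payload) records into per-feed batches.

    Each returned batch holds payloads of a single feed type and is
    flushed once adding another record would exceed max_batch_bytes.
    Hypothetical helper for illustration; not the DNA library API.
    """
    open_batches = defaultdict(list)   # feed_type -> payloads of the open batch
    open_sizes = defaultdict(int)      # feed_type -> byte size of the open batch
    completed = []                     # list of (feed_type, [payloads])

    for feed_type, payload in records:
        if open_batches[feed_type] and open_sizes[feed_type] + len(payload) > max_batch_bytes:
            # Flush the current batch for this feed and start a new one.
            completed.append((feed_type, open_batches[feed_type]))
            open_batches[feed_type] = []
            open_sizes[feed_type] = 0
        open_batches[feed_type].append(payload)
        open_sizes[feed_type] += len(payload)

    # Flush whatever remains open at the end of the record stream.
    for feed_type, batch in open_batches.items():
        if batch:
            completed.append((feed_type, batch))
    return completed
```

Each flushed batch would then be serialized with its feed's schema, compressed, and sent as one physical Kafka message to any partition, which is why no key-based partition assignment is required.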
We must also achieve an exactly-once delivery guarantee (this term actually means exactly-once processing). It is only possible by implementing two design principles:

...
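The two principles themselves are elided above. As general background, the building blocks Kafka documents for exactly-once processing are an idempotent, transactional producer paired with a consumer that reads only committed data and commits offsets within the producer's transaction. A minimal config sketch using Kafka's documented client property names (the `transactional.id` value is a placeholder, not a name from this design):

```python
# Producer side: idempotence de-duplicates broker-side retries, and a
# transactional.id lets the broker fence zombie producer instances.
producer_conf = {
    "enable.idempotence": True,
    "transactional.id": "dna-producer-1",  # placeholder id
    "acks": "all",
}

# Consumer side: see only committed transactional data, and disable
# auto-commit so offsets can be committed as part of the transaction.
consumer_conf = {
    "isolation.level": "read_committed",
    "enable.auto.commit": False,
}
```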