Table of Contents

- Objective
- Data Requirements
- Technical requirements
- Data/Metadata flow diagram
- Cluster configuration
- BNY iHUB Cluster
- Topic
- Design considerations
- Possible implementations
- Development items
- DNA Produce/Consume Library
- DNA Producer
- DNA Consumer
- Schema Registry
...
| Feed | Row Count | Data Size (KB) |
| --- | --- | --- |
| Positions | 10,000 | 80,000 |
| Transactions | 10,000 | 82,500 |
| Tax lots | 10,000 | 42,500 |
| Net asset values | 4,500 | |
| Ledger balance | 5,000 | 23,500 |
| Account dimensions | 162 | |
| Account share class | 462 | |
| Fund instruments | 75,000 | |
| Snapshot | 30 | |
| Accounts | 155 | |
The data will arrive unevenly, with the peak at end of day (EOD), when most of it (95%, possibly 100% initially) lands.
In the future the number of funds will grow to around 500, so we should expect roughly 10 times the current data volume, and we must be able to expand the cluster easily to accommodate that growth.
We also expect future integrations to be more evenly distributed, with at least 20-30% of the data arriving during the day.
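As a rough sizing check, the per-batch figures from the feed table can be added up and scaled by the expected growth factor. Only the feeds with a listed data size are counted, so this is a lower bound:

```python
# Back-of-the-envelope sizing from the feed table above (sizes in KB).
# Feeds without a listed size are omitted, so this is a lower bound.
feed_sizes_kb = {
    "Positions": 80_000,
    "Transactions": 82_500,
    "Tax lots": 42_500,
    "Ledger balance": 23_500,
}

current_total_kb = sum(feed_sizes_kb.values())        # 228,500 KB, ~223 MB
growth_factor = 10                                    # ~500 funds vs. today
projected_total_kb = current_total_kb * growth_factor

print(f"current EOD batch: {current_total_kb / 1024:.0f} MB")
print(f"projected at 500 funds: {projected_total_kb / 1024 / 1024:.1f} GB")
```

Even at 10x growth the EOD batch stays in the low gigabytes, which is well within what a modest Kafka cluster can absorb; the concern is the burstiness of the EOD peak rather than raw volume.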
The data is given in JSON format: simple JSON objects with at most two nesting levels. The schema is known for all data feeds.
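To illustrate the shape described above, here is a hypothetical record with the stated two nesting levels. The field names are illustrative only and are not the actual feed schema:

```python
import json

# Hypothetical feed record: a simple object with only two nesting levels.
# All field names are illustrative, not the real feed schema.
record = {
    "feed": "Positions",
    "accountId": "ACC-0001",
    "asOfDate": "2024-01-31",
    "position": {                    # second (and deepest) nesting level
        "instrumentId": "INSTR-42",
        "quantity": 1500,
        "marketValue": 125000.00,
    },
}

payload = json.dumps(record)
print(payload)

# Quick structural check: count levels of object nesting.
def nesting_levels(obj):
    if isinstance(obj, dict):
        return 1 + max((nesting_levels(v) for v in obj.values()), default=0)
    return 0

assert nesting_levels(record) == 2, "feeds use at most two nesting levels"
```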
...
Based on the recommendation in the architecture document, the name of the topic should be:
BNY.IN.PRD.IHUB.DEFAULT.DEFAULT
This topic can be created once, as soon as the cluster is available.
...
We are not considering Kafka Connectors at this point. A connector is a separate piece of infrastructure that has to be maintained and, most importantly, it would write directly to Snowflake, bypassing our current ingestion pipeline and data rules. That would require a completely different set of monitoring consoles and force us to keep the schema synchronized between the connector and Vault.
We believe that option 1, using the DNA Producer library, is the best choice, so the discussion below covers how to implement it.
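The DNA Producer library's internals are not specified here, so the following is only a sketch of what a thin producer wrapper might look like. All names (`DnaProducer`, `publish`, the injected `send` callable) are assumptions, not the library's real API; the transport is injected as any callable taking `(topic, key, value)` so the sketch runs without a live cluster:

```python
import json
from typing import Any, Callable

# Topic name from the architecture recommendation above.
TOPIC = "BNY.IN.PRD.IHUB.DEFAULT.DEFAULT"

class DnaProducer:
    """Hypothetical thin wrapper over a Kafka producer (names are assumed).

    `send` is any callable taking (topic, key, value); in a real deployment
    it would delegate to the Kafka client, here it can be a stub.
    """

    def __init__(self, send: Callable[[str, bytes, bytes], Any], topic: str = TOPIC):
        self._send = send
        self._topic = topic

    def publish(self, feed: str, record: dict) -> None:
        # Keying by feed name would keep each feed on one partition,
        # preserving per-feed ordering (an assumption, not a stated requirement).
        key = feed.encode("utf-8")
        value = json.dumps(record).encode("utf-8")
        self._send(self._topic, key, value)

# Usage with a stub transport in place of a real Kafka client:
sent = []
producer = DnaProducer(send=lambda topic, key, value: sent.append((topic, key, value)))
producer.publish("Positions", {"accountId": "ACC-0001", "quantity": 1500})
print(sent[0][0])  # BNY.IN.PRD.IHUB.DEFAULT.DEFAULT
```

Injecting the transport keeps the wrapper unit-testable; the real library would presumably own the Kafka client configuration (brokers, security, serialization) behind a similar narrow interface.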
...