Schema Drift

Schema drift detection identifies schema changes in an incoming data file. Several schema drift modes are available, and each handles drifted data in its own way.

The schema drift mode is set through the inflow API:

Dev - https://apps.dev.az.eagleinvsys.com:8443/api/vault/eds/api/doc/#/inflow-rest/submitV3InflowPOST

Stage - https://apps.stage.az.eagleinvsys.com:8443/api/vault/eds/api/doc/#/inflow-rest/submitV3InflowPOST

Prod - https://apps.prod.az.eagleinvsys.com:8443/api/vault/eds/api/doc/#/inflow-rest/submitV3InflowPOST

IGNORE Schema Drift Mode

This is the default schema drift mode.

In IGNORE mode, schema drift is ignored whether or not it is found, and data is loaded without the drifted columns.

Example of request:

{ "resourcename": "legalentityid", "dbprovider": "snowflake", "schemadriftmode": "ignore" }

VALIDATE Schema Drift Mode

In VALIDATE mode, any schema changes for the resource are detected on load.

Example of request:

{ "resourcename": "legalentityid", "dbprovider": "snowflake", "schemadriftmode": "validate" }

If no drift is found, the data is loaded into the database.

If drift is found, the user receives a response that lists the drifted columns.

Example of validate response when schema drift was found:

{
  "status": 0,
  "proc_stats": null,
  "additional_data_type": "string",
  "processing_statuses": [
    [ "Information", "Starting processing an EBS request", "eaglepy-i-1000", "", 0, "2022-07-07T11:22:16.136+00:00" ],
    [ "Error", "Schema validation failed", "eaglepy-e-1026", "", 0, "2022-07-07T11:22:16.143+00:00" ],
    [ "Information", "Executed on svc-eds-5579bc956-fzl48/PID=12/WORKER=/IMPL=CPython", "eaglepy-i-1000", "", 0, "2022-07-07T11:22:16.143+00:00" ]
  ],
  "drift_columns": {
    "_id": "legalentityid",
    "load": {
      "dataframe": {
        "_id": "legalentityid",
        "group_name": "Reference",
        "type": "dftFile",
        "description": "Lei details load",
        "maxrows": -1,
        "source_sink": {
          "sink_type": "ssFileType",
          "sink_params": {},
          "source_descriptor_type": "Mmap",
          "allow_format_discovery": true,
          "source_format_dialect": {
            "lineterminator": "\n",
            "delimiter": ",",
            "headerat": 1,
            "headerlines": 1,
            "dialectname": "csv_char_strip",
            "encode_header": true,
            "force_vocab_stru": true
          }
        },
        "format": "delimited",
        "vocabulary": {
          "newElementChar": { "path": "extension_newElementChar", "datatype": "string", "formatdialect": { "length": 16 } },
          "newElementDate": { "path": "extension_newElementDate", "datatype": "date", "formatdialect": { "dialect": "YYYY-MM-DD" } },
          "NewElementNum": { "path": "extension_NewElementNum", "datatype": "number", "formatdialect": { "precision": 38, "scale": 12 } }
        }
      },
      "interface": {
        "_id": "legalentityid",
        "format": "eaglemlformat",
        "dataset": "legalentityid",
        "dataframe": "legalentityid"
      }
    }
  },
  "worker_id": "svc-eds-5579bc956-fzl48/PID=12/WORKER=/IMPL=CPython"
}
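A caller usually needs only two things from a response like this: the error messages and the names of the drifted columns. The following is a minimal Python sketch of extracting both. It assumes the drifted columns are the keys of drift_columns.load.dataframe.vocabulary, as in the example above, and that drift_columns may be missing or empty when no drift is found (an assumption, not confirmed behavior). The helper names drifted_columns and error_messages are hypothetical.

```python
import json

# Trimmed copy of the validate response shown above; only the fields
# that the helpers below read are kept.
SAMPLE_RESPONSE = json.loads("""
{
  "status": 0,
  "processing_statuses": [
    ["Information", "Starting processing an EBS request", "eaglepy-i-1000", "", 0, "2022-07-07T11:22:16.136+00:00"],
    ["Error", "Schema validation failed", "eaglepy-e-1026", "", 0, "2022-07-07T11:22:16.143+00:00"]
  ],
  "drift_columns": {
    "_id": "legalentityid",
    "load": {"dataframe": {"vocabulary": {
      "newElementChar": {"path": "extension_newElementChar", "datatype": "string"},
      "newElementDate": {"path": "extension_newElementDate", "datatype": "date"},
      "NewElementNum": {"path": "extension_NewElementNum", "datatype": "number"}
    }}}
  }
}
""")


def drifted_columns(response):
    """Return {column_name: datatype} from drift_columns, or {} if it is absent."""
    # Assumption: a response without drift carries no (or an empty) drift_columns.
    drift = response.get("drift_columns") or {}
    vocabulary = drift.get("load", {}).get("dataframe", {}).get("vocabulary", {})
    return {name: spec.get("datatype") for name, spec in vocabulary.items()}


def error_messages(response):
    """Return the message text of every processing status with severity 'Error'."""
    statuses = response.get("processing_statuses") or []
    return [row[1] for row in statuses if row[0] == "Error"]


if __name__ == "__main__":
    print(error_messages(SAMPLE_RESPONSE))
    print(drifted_columns(SAMPLE_RESPONSE))
```

Each entry of processing_statuses is read positionally here (severity first, message second), which matches the example but is not a documented contract.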

AUTO_APPROVE Schema Drift Mode

Schema drift mode AUTO_APPROVE handles schema drift without any manual steps.
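In the request examples above, the modes differ only in the schemadriftmode value. The following is a minimal Python sketch of building such a request body. It assumes the AUTO_APPROVE value is spelled auto_approve, following the lowercase pattern of ignore and validate. The function name build_inflow_request is hypothetical, and how the body is submitted to the inflow API is not shown.

```python
import json

# "ignore" and "validate" come from the request examples in this section.
# "auto_approve" is an assumed spelling that follows the same lowercase pattern.
SCHEMA_DRIFT_MODES = ("ignore", "validate", "auto_approve")


def build_inflow_request(resource_name, mode="ignore", db_provider="snowflake"):
    """Return a request body shaped like the examples in this section.

    The default mode is "ignore", matching the documented default.
    """
    mode = mode.lower()
    if mode not in SCHEMA_DRIFT_MODES:
        raise ValueError(f"unknown schema drift mode: {mode!r}")
    return {
        "resourcename": resource_name,
        "dbprovider": db_provider,
        "schemadriftmode": mode,
    }


if __name__ == "__main__":
    print(json.dumps(build_inflow_request("legalentityid", "validate")))
```

Validating the mode on the client side is a design choice of this sketch; it catches a misspelled mode before the request is sent.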

Loading a New Custom Resource with Schema Drift

When schema drift detection finds a new custom resource, the following steps are executed:

  • generate ontology for new resource

  • generate processing rule

  • execute DDL

  • publish data models

  • load data into new table

Example of request:

Example of request with specified vendor and feedsystem:

By default, vendor is BNYM and feedsystem is EAGLE.

Example of request with schema drift tolerance:

This parameter applies when an existing custom resource is updated.

  • If the number of drifted columns is greater than the tolerance, an error is added to the processing result.

  • If the number of drifted columns is less than or equal to the tolerance, schema drift processing proceeds.
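The tolerance rule above can be sketched in Python as a single comparison. This is an illustration only: the function name apply_tolerance, the shape of its return value, and the error text are hypothetical and do not reflect the service's actual output.

```python
def apply_tolerance(drifted_column_names, tolerance):
    """Decide whether schema drift processing may proceed.

    Mirrors the rule above: more drifted columns than the tolerance
    yields an error; a count less than or equal to it proceeds.
    """
    count = len(drifted_column_names)
    if count > tolerance:
        # Hypothetical error text; the real message is not documented here.
        return {
            "proceed": False,
            "error": f"{count} drifted columns exceed tolerance {tolerance}",
        }
    return {"proceed": True, "error": None}


if __name__ == "__main__":
    columns = ["newElementChar", "newElementDate", "NewElementNum"]
    print(apply_tolerance(columns, tolerance=2))
    print(apply_tolerance(columns, tolerance=3))
```

With three drifted columns, a tolerance of 2 produces an error, while a tolerance of 3 allows processing, because the boundary case (count equal to tolerance) is accepted.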

Data file example:

Example of auto_approve response when schema drift was found for a new resource:

Loading a Core EDS Resource with Schema Drift

When schema drift is detected in data for a core resource, the following steps are executed:

  • generate extension for resource

  • generate processing rule

  • execute ALTER DDL

  • publish data models

  • load data into core and extension tables
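The core-resource steps mirror the steps for a new custom resource, with the differences concentrated in what is generated, which DDL runs, and where data lands. The following minimal Python sketch lists both sequences as data and compares them position by position. The step names are copied from the lists in this document; the constant and function names are hypothetical and do not correspond to any service API.

```python
# Step sequences as described in this document, in execution order.
NEW_CUSTOM_RESOURCE_STEPS = (
    "generate ontology for new resource",
    "generate processing rule",
    "execute DDL",
    "publish data models",
    "load data into new table",
)

CORE_RESOURCE_STEPS = (
    "generate extension for resource",
    "generate processing rule",
    "execute ALTER DDL",
    "publish data models",
    "load data into core and extension tables",
)


def shared_steps(first, second):
    """Return the steps that are identical at the same position in both flows."""
    return [a for a, b in zip(first, second) if a == b]


def differing_steps(first, second):
    """Return (first_step, second_step) pairs for positions where the flows differ."""
    return [(a, b) for a, b in zip(first, second) if a != b]


if __name__ == "__main__":
    print(shared_steps(NEW_CUSTOM_RESOURCE_STEPS, CORE_RESOURCE_STEPS))
    for new_step, core_step in differing_steps(NEW_CUSTOM_RESOURCE_STEPS, CORE_RESOURCE_STEPS):
        print(f"{new_step}  ->  {core_step}")
```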

Example of request:

Data file example:

Example of auto_approve response when schema drift was found for a core resource:
