EDS Mashup Definitions

Mashup definitions support dynamic snapshot selection and true "as of" views of security and entity datasets. They are maintained centrally and referenced from within interfaces. A mashup definition specifies the logic for selecting the required snapshot based on the request parameters and conditions, and it also drives the dynamic loading of snapshots from the database when necessary.

Mashup definition examples with genericsmf

First example

{
"_id": "m_genericsmf",
"_alias": "genericsmf",
"columns": "securityAlias,stateCode,countryOfRiskCode,issueCountryCode,securityType,processingSecurityType,assetCurrency,couponTypeCode,securitySubType,bbid,cusip,buysell,contraSecAlias,coupon,dayCnt,isin,priceMultiplier,restrictedIndicator,xrefidentifiers,underlyingCusip,underlyingSecAlias,exchange,investmentType,issueDescription,issueName,maturityDate,primaryAssetId,primaryAssetIdType,securityType2,sedol,ticker",
"key_expr": "(convert(atoi(|SECURITY_ALIAS|),0))=( convert(atoi(|SECURITY_ALIAS|),0))",
"mode": "first",
"filter_expr": null,
"asof_date": "iif(stringlength(get_instruction_param('effectivedate', :GVAR_RTR:))>0, date(get_instruction_param('effectivedate'), 'yyyy-mm-dd'), today())",
"dataframe": {
"genericsmf_asof": {
"condexpr": ":asof_date: != today() and get_global_setting('generic_smf_asof_mode') = 'full'",
"cache_update_mode": "dynamic"
},
"genericsmf_asof_delta": {
"condexpr": ":asof_date: != today() and get_global_setting('generic_smf_asof_mode') = 'delta'",
"effective_for": ":asof_date:"
},
"genericsmf": {
"condexpr": ":asof_date: = today()"
}
},
"proc_alias": "genericsmf"
}
The definition has several self-explanatory properties, which are standard for existing EDS dataset definitions:

  • _id – ID of the mashup definition.

  • _alias – name of the data object associated with this definition.

  • columns – list of columns to pick up from the dataframes. Requesting fewer fields improves processing performance. EML names are used; they are linked to exact tables and table columns via the ontology.

  • key_expr – key expression used to pull data from the dataframes.

  • filter_expr – expression used to filter data pulled from the cache. It must evaluate to True for a record from the dataframe to be mashed up.

  • mode – defines which value, and how many values, should be pulled from the dataframe.

  • proc_alias – alias to refer to during mashup processing.

The new properties are asof_date and dataframe:

  • asof_date – defines the date for which data should be used. It is evaluated from the effectivedate extract parameter.

  • dataframe – lists the dataframes that can be chosen to provide data for the data object, each with its corresponding condition. A definition can have as many dataframes with conditions as required. The dataframes, however, must already be defined in their respective dataset definitions.

Part of the condition is the global setting 'generic_smf_asof_mode'. This setting matters specifically for security master data because different clients maintain their SM data differently. The 'delta' mode assumes that only the changed records are loaded each day, whereas the 'full' mode indicates that the client reloads the entire security dataset daily.
The logic of selecting the dataframe in the above example:

  • asof_date is the value of the effectivedate parameter if it is specified in the RTR; otherwise it defaults to the current date.

  • genericsmf_asof is used when effectivedate is in the RTR and is not the current date (so security history should be used instead of security master) and the global setting generic_smf_asof_mode is set to 'full' (the value to use if the client reloads all records every day). A cache_update_mode of dynamic means the data must be extracted from the DB and a temporary dataframe built from it.

  • genericsmf_asof_delta is used when effectivedate is in the RTR and is not the current date (as in the previous case) but the client loads security data on a delta basis, so the security history data can be cached. In this case, the effective_for attribute in the dataframe node defines which date should be used to pull data from the dataframe.

  • genericsmf is used when there is no effectivedate in the RTR or when effectivedate in the RTR is the current date.
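
The selection logic above can be sketched in Python. This is only an illustration, not the mashup engine itself: the function names are hypothetical, and the sketch assumes the engine evaluates the condexpr entries in definition order (in this example the three conditions are mutually exclusive, so the order does not change the result).

```python
from datetime import date, datetime
from typing import Optional


def resolve_asof_date(effectivedate: Optional[str], today: date) -> date:
    """Mirror of the asof_date expression: use effectivedate if present, else today."""
    if effectivedate:
        return datetime.strptime(effectivedate, "%Y-%m-%d").date()
    return today


def select_dataframe(effectivedate: Optional[str], asof_mode: str, today: date) -> Optional[str]:
    """Return the name of the dataframe whose condexpr matches, or None."""
    asof = resolve_asof_date(effectivedate, today)
    # One entry per dataframe node, in the order used in the first example.
    conditions = [
        ("genericsmf_asof", asof != today and asof_mode == "full"),
        ("genericsmf_asof_delta", asof != today and asof_mode == "delta"),
        ("genericsmf", asof == today),
    ]
    for name, matched in conditions:
        if matched:
            return name
    return None  # e.g. a past date with an unrecognized generic_smf_asof_mode
```

For instance, a request with effectivedate in the past and generic_smf_asof_mode set to 'delta' resolves to genericsmf_asof_delta, while a request without effectivedate resolves to genericsmf.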

Now, let us consider each dataframe. The first dataframe, genericsmf_asof, accepts one of two parameter sets for creating a cache snapshot: effective_date and source, or max_effective_date and source. Its cache mode is set to none. It represents "cold" data, since security data for past dates is rarely requested. When its condition is true, the mashup processing engine fetches the security data snapshot and places it into a dynamic, short-lived cache.

The actual fetch from the DB is optimized by two factors. First, a dynamic query profile is built from the "columns" property in the definition and from the ontology, which maps canonical elements to DB columns. Second, the fetch is executed concurrently with the main query of the interface. The result cursor is placed into a "smart" descriptor, which uses in-memory compression and can store "sparse" data efficiently. Because of these factors, we expect almost no overhead from this request. The descriptor is then made available as a mashup dataframe.
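
As a rough illustration of the first factor, the sketch below builds a SELECT statement from a "columns" string and an ontology mapping. Everything here is an assumption made for the example: the ontology structure, the table and column names, and the build_select function are hypothetical, and the real query profile is likely to handle multiple tables and joins.

```python
from typing import Dict, Tuple

# Hypothetical ontology fragment: canonical EML name -> (table, DB column).
ONTOLOGY: Dict[str, Tuple[str, str]] = {
    "securityAlias": ("SECURITY_MASTER", "SECURITY_ALIAS"),
    "cusip": ("SECURITY_MASTER", "CUSIP"),
    "isin": ("SECURITY_MASTER", "ISIN"),
}


def build_select(columns: str, ontology: Dict[str, Tuple[str, str]]) -> str:
    """Build a SELECT that fetches only the columns named in the definition."""
    names = [c.strip() for c in columns.split(",") if c.strip()]
    unknown = [n for n in names if n not in ontology]
    if unknown:
        raise KeyError(f"no ontology mapping for: {', '.join(unknown)}")
    tables = sorted({ontology[n][0] for n in names})
    if len(tables) != 1:
        # Kept deliberately simple: this sketch does not generate joins.
        raise ValueError("sketch supports a single table only")
    select_list = ", ".join(f"{ontology[n][1]} AS {n}" for n in names)
    return f"SELECT {select_list} FROM {tables[0]}"
```

Calling build_select("securityAlias,cusip", ONTOLOGY) yields a query that reads only two DB columns, which is the point of restricting "columns" in the definition.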
There are two sets of parameters for creating a snapshot because some clients keep security data inconsistently, following neither the 'full' nor the 'delta' mode. For those clients we have to use 'max_effective_date' queries, which build the snapshot from the unique list of securities up to the specified maximum effective date. The processing of this query is still greatly optimized by the factors mentioned above.
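
The idea behind a 'max_effective_date' snapshot can be sketched as follows. This is an in-memory illustration under assumptions: the SecurityRow type and its fields are hypothetical, and the sketch assumes the snapshot keeps, for each security, the latest record whose effective date does not exceed the maximum. The real query runs in the DB.

```python
from datetime import date
from typing import Dict, Iterable, NamedTuple


class SecurityRow(NamedTuple):
    # Hypothetical, minimal shape of a security history record.
    security_alias: int
    effective_date: date
    issue_name: str


def snapshot_as_of(rows: Iterable[SecurityRow], max_effective_date: date) -> Dict[int, SecurityRow]:
    """Keep one row per security: the latest one effective on or before the given date."""
    snapshot: Dict[int, SecurityRow] = {}
    for row in rows:
        if row.effective_date > max_effective_date:
            continue  # record became effective after the requested date
        current = snapshot.get(row.security_alias)
        if current is None or row.effective_date > current.effective_date:
            snapshot[row.security_alias] = row
    return snapshot
```

A security whose first record is dated after max_effective_date does not appear in the snapshot at all, which matches the "unique list of securities up to the specified date" description.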

The second dataframe, genericsmf_asof_delta, assumes pure 'delta' mode storage of security data. Its cache mode is set to 'full', so it is fully cached, and the 'effective for' pattern is used when this dataframe is selected. To keep the cache up to date, this dataframe comes with a delta cache update policy. Finally, the third dataframe, genericsmf, represents the current snapshot of security master data. It is "hot" data that is fully cached and readily available, and it also comes with a delta cache update policy.

Second example

{
"_id": "m_genericsmf",
"_alias": "genericsmf",
"columns": "primaryAssetId,issueName,securityAlias,stateCode,smdCountry_OfRisk,nraTaxCountry,securityType,processSecType,currencyCode,couponTypeCode,smdAltSecurityType,bbid,cusip,buysell,contraSecAlias,coupon,dayCnt,isin,priceMultiplier,liquidFlag,xrefidentifiers,underlyingCusip,underlyingSecAlias,exchange,investmentType,issueDescription,matDate,primaryAssetIdType,smdeSecurityType,sedol,ticker,sec144AFlag,smdUserGroupChar14,smdUserGroupChar15,smdUserGroupChar16,couponFreqCode,cins,contractSize,dervChar1,smdUserGroupFloat1,incomeCurrency,datedDate,muniEscrowTyp2Nd,expirationDate,maturity2,firstIncomeDate,gicsIndustry,gicsIndustryName,gicsIndustryGroup,gicsIndustryGroupName,gicsSector,gicsName,gicsSubIndustry,gicsSubIndustryName,icbIndustry,icbIndustryName,icbSector,icbSectorName,icbSubsector,icbSubsectorName,icbSupersector,icbSupersectorName,settlementCurrency,issueDate,issuePrice,paymentDelay,issueCountry,muniIssueTyp2,naicCode,nxtCallDt,nxtCallPx,nextCouponDate,smdUserGroupDate11,smdUserGroupDate12,parAmount,pficFlag,muniPreRfndDt,muniAdvRfndPx,preRfndTyp,puttable,callable,varRateFreqCode,userGroupSector3,userGroupSector9,userGroupSector1,userGroupSector6,userGroupSector2,smdUserGroupDate10,smdAltInvestmentType,sharesOutstanding,strikePrice,taxStatusCode,issuerId,smdUserGroup2,countryOfIssue,smdUserGroup1,smdExpcpty,derExpirationDate,payReceiveInd,putCallFlag,smdUserGroup19,smdUserGroupChar8,smdUserGroupChar13,smdUserGroupChar6,smdUserGroupChar9,smdUserGroupChar7,smdUserGroupChar11,smdUserGroupChar2,notionalAmt,redemptionCurrency,desNotes,parentCompTicker,postRedenomCrncy,preRedenomCrncy,smdUserGroupChar3,smdUserGroupChar4,smdUserGroupChar5,smdUserGroupChar12,smdUserGroupChar20,smdUserGroupChar21,smdUserGroupChar22,smdUserGroupChar23,userGroupSector4,industryGroup,userGroupSector5,industrySubgroup,userGroupSector7,industrySector,lseSegment,lseSector,smdUserGroupDesc6,smdUserGroupDesc7,smdUserGroupDesc8,smdUserGroupDesc9,smdUserGroupChar18,smdUserGroupChar17,smdUserGroupChar25,smdUserGroupChar10,smdUserGroup18,userTag,smdUserGroupFloat10,extnsContractType,extnsAltExchange,muniOptCallTimingInd,firstRateResetDate,commercialPaperFlag,extnsIssuerName,liquidity,underlyingSecurity,adrAdsGdrFlag,smdUserGroup21,smdUserGroup22,smdUserGroup23,smdUserGroup24,smdUserGroupDesc10,smdUserGroupDesc12,smdUserGroupDesc14,smdUserGroupDesc16,sectALevel3Code,sectALevel2Code,sectALevel1Code,sectALevel4Code,paymentFreq,extnsDaysToMat,extnsAnnualRateTyp",
"query_profile": {},
"key_expr": "(convert(atoi(|SECURITY_ALIAS|),0))=( convert(atoi(|SECURITY_ALIAS|),0))",
"mode": "first",
"filter_expr": null,
"asof_date": "iif(stringlength(get_instruction_param('effectivedate', :GVAR_RTR:))>0, date(get_instruction_param('effectivedate'), 'yyyy-mm-dd'), today())",
"dataframe": {
"genericsmf_asof": {
"condexpr": ":asof_date: != today()",
"cache_update_mode": "dynamic"
},
"genericsmf_prior": {
"condexpr": "get_mashup_status(:GVAR_RTR:, 'genericsmf_asof') != '1'"
},
"genericsmf": {
"condexpr": "get_mashup_status(:GVAR_RTR:, 'genericsmf_prior') != '1'"
}
},
"proc_alias": "genericsmf"
}
Compared to the previous example:

  • This definition does not have genericsmf_asof_delta, because it assumes a full reload of all security records each day (so there is no reason to cache the security history table).

  • It adds a genericsmf_prior dataframe, which is used when asof_date is the current date or when no data was retrieved from the genericsmf_asof dataframe. In effect, this dataframe is the default one.

  • Data from the genericsmf dataframe is pulled only when the other dataframes did not return any data. This dataframe is the same as the genericsmf dataframe in the first example.

The genericsmf_prior dataframe is based on the core genericsmf_prior dataset and holds the cached security data for a single previous date. This lets extracts for current-day transactions pull SMF data from the previous day.
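
The fallback behaviour of the second definition can be sketched as a simple chain. This is a hedged illustration only: it assumes that a mashup status of '1' means the dataframe returned data for the key, it models each dataframe as a plain dictionary keyed by security alias, and the resolve function is hypothetical.

```python
from typing import Dict, Optional, Tuple

# Each dataframe is modelled as: security alias -> record (an assumption for the sketch).
Frame = Dict[int, dict]


def resolve(security_alias: int, asof_is_today: bool,
            frames: Dict[str, Frame]) -> Tuple[Optional[str], Optional[dict]]:
    """Try dataframes in fallback order and return the first one that has the record."""
    order = ["genericsmf_prior", "genericsmf"]
    if not asof_is_today:
        # genericsmf_asof is only eligible when :asof_date: != today().
        order.insert(0, "genericsmf_asof")
    for name in order:
        record = frames[name].get(security_alias)
        if record is not None:
            return name, record  # corresponds to a mashup status of '1' (assumed)
    return None, None
```

With this ordering, a security missing from the as-of snapshot falls back to the previous day's cache, and only then to the current security master snapshot.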