Develop a Data Mart Implementation Plan
With a clear and complete functional specification, a knowledgeable resource can configure a Data Mart very quickly. This makes it worth the time to create a good implementation plan at the start.
Inventory the Fields and Groupings to be Supported
The first step in planning a Data Mart is to take as complete an inventory as possible of all the fields and groupings you plan to support in your information delivery project. The more complete the inventory, the better you can create a logical and economical design of models, model extensions and groupings. This does not mean that use of the Mart is an all or nothing or big bang step, only that within the scope of the data to be supported in the Mart, a complete inventory of it makes for a better plan.
This inventory process should include an effort to standardize where possible on a minimal set of definitions of common fields like market value and security price. Standardization makes possible a compact Mart where data elements are widely shared among consumers of the data.
Start Small and Build on Your Successes
While it is best practice to create a complete inventory of fields you will need for a specific information delivery goal, that goal may be a modest one, especially if it is your first Data Mart project. The mart is architected for future growth. So doing a complete analysis of a present known need does not require foreknowledge of all future reporting requirements.
Identify and Create any Missing Meta Data Elements
Every Data Mart field requires a field attribute, and every Data Mart grouping requires some set of field attributes, range rules and dictionaries (regular and performance). The Mart implementation plan should identify these meta data elements quite specifically, and indicate where something must be created.
Use the Data Mart Planning Template
Global Professional Services has a template that is very useful for storing all of the Data Mart and regular meta data elements involved in a Mart, and can serve as the basis of an implementation plan.
Plan a Sources Strategy
A sources strategy for Data Mart is a design that uses the minimal list of source rules and source-specific fields required to provide the entire source perspectives needed in reporting. Often, this implies a single source rule if data of every type, such as positions and trades, can correctly be reported using just a single hierarchy of sources. More than a single source rule is required only if certain individual positions, trades, or securities must be viewed in more than one way within the scope of information delivery.
Common use cases for multiple source rules or source-specific fields include:
Multiple providers of bond ratings or other security analytic data whose viewpoints must all be considered in the security evaluation process;
Multiple accounting bases, such as GAAP and statutory, that require accounting fields to be computed and reported in different ways.
When more than one type of data must be reported in more than one source hierarchy, the number of required source rules can proliferate and lead to several snapshots and heavy duplication of data. For this reason, you should make choices to limit the required list of snapshots. See “Sharing of Security Details Among Snapshots” for additional information.
A best practice is to build best data in PACE where possible, and then use the resulting Eagle PACE source as the main source for those data types built.
Plan for Good System Performance
Building Only the Data You Need
The single most important aspect of Mart planning directed at good system performance in the build process is a realistic assessment of what data you really need to send to the Mart. The best way to answer this question is to consider all of the processes that will use Mart data, and understand their demands in detail. Requirements are likely to expand over time, but you can add fields. It is better to add later than to over-build your Mart from the start.
Which fields are needed daily? Many organizations maintain a monthly reporting cycle that generates demand for a comprehensive list of monthly fields. Daily values may be needed for a much shorter list. If this is true, use the Selective Fields capability to limit the daily build to those fields needed daily, with a full build conducted at month end.
Some types of performance returns, such as returns for specific years or quarters, do not change every day. If you report such returns, you can avoid building them every day or even every month by building them only once on their true effective date, for example on December 31, 2020 for a “Year 2020” return. Then in reporting, you would always select them as of their effective date.
If you plan to deliver a piece of information from Eagle Portal, you can source it from Data Mart or use the dynamic query approach where results are computed on the fly. The dynamic query approach may have limitations as to breadth of data, and the calculation may take somewhat longer than if Portal is queried directly from the Mart. On the other hand, it is not best practice to build out your Mart in an overly comprehensive way just to feed Portal with some types of data that you think might be accessed, but for which no clear current demand exists. Portal dynamic queries are best suited in those cases where you need to support access to particular types of data or calculations, but you expect access frequency to be very low.
Indexing Warehouse Trade and Cash Date Fields
When you select trade and cash activity fields for Data Mart you can specify which of several different date fields, such as trade date and settlement date, you want to use as the effective date. Whichever date you choose should have an index built on it in the appropriate warehouse table to speed processing of trades or cash events. The effective_date field itself is indexed already. Statistics should be built frequently enough, ideally daily.
Table indexing does not really apply to building performance analysis fields, since they are compute-intensive.
Using Extensions to Exploit Parallel Processing
System performance in building Mart data is enhanced when your design takes advantage of parallel processing by multiple concentration (OLAP) engine instances. Each Data Mart table extension is populated under the direction of a separate Model Manager instance that generates its own set of underlying OLAP processes. This means that the use of extensions can increase parallel processing and speed population of the Mart.
A Global Professional Services study has recorded the advantages of extensions in populating performance measurement data in particular. See the Data Mart White Paper in Knowledge Base article 8459. Performance Analysis OLAP processes underlying a Data Mart model are among the most I/O and compute-intensive of any, so extensions can be particularly helpful in minimizing run time as you build performance linked returns, risk fields and other complex calculations. When you assign performance fields to an extension, it is best to group fields together that you would logically assign to the same Performance Analysis OLAP report, such as a single attribution group or a series of performance link analysis fields that span either a short or a long-term time span.
If you install Eagle Mart and the Performance Toolkit, you can see examples of best practices for using extensions to support different types of performance fields. The toolkit uses separate Fund Summary extensions for the following field types:
Performance link fields for periods up to one year
Performance link fields for periods longer than one year
Performance risk analysis fields
GIPS Disclosure fields
GIPS Composite Performance Analysis fields
There are restrictions on building only one type of field attribute in a single extension.
Sharing Security Details Among Snapshots
If you build data for multiple accounting sources as different snapshots, you may not want to build the same Security Details model data for each snapshot, as this would be repetitive. You may find it preferable to build Security Details for just one snapshot, such as the Default snapshot.
Since Data Mart requires you to build Security Details whenever you build any detail-level table, you cannot completely avoid building Security Details in each snapshot. However, you can use Selective Fields to build just one field in Security Details in all but the Default snapshot, and save considerable time. Data Mart offers the option of building detail-level tables without also building any Security Details.
If you have different snapshots with different securities, you should build all securities. However, if you have different snapshots with the same securities you would only need to build once.
Limit the Number of Entities per OLAP Process
Experience shows that performance OLAP processes typically run best when you change the No. of Funds per Report option in the Configuration dialog box in Manage Mart to about 20. The option is set to 50 by default. You can experiment with this setting to achieve optimal results, especially if you are building a significant amount of performance data.
Limit the Number of Entities per Model Manager
Similar to the number of entities per OLAP process, you can also control the number of entities per Model Manager. You can increase the default value of 1,000 to find the value that works best for you.
Minimize Log Level when Not Troubleshooting
Logging at a high level such as 8 or 10 can occupy material time in a build. During normal production you can set the level to 1 to save time and then raise it again if necessary for troubleshooting purposes.
Select Settings for Process Manager
If you use Process Manager to build your Mart, pay special attention to the settings described in the following sections.
Avoid the Sequential Processing Setting
You can use Process Manager to manage a variety of Eagle processes. For some processes, each engine should launch in sequence. For others, parallel processing is preferable.
Parallel processing is required for Data Mart to build data with the greatest possible efficiency. To use parallel processing if you use Process Manager to build your Mart, when you configure the Process Manager Workflows, deselect the Process Effective Date Change by Batch checkbox in the Workflow Definition tab on the Workflows Setup wizard. Only deselect this option when you meet the needs of the Funds Per Model Manager and Funds per Mart conditions. This allows the system to "flood the cluster" and take full advantage of parallel processing.
Ensure Parallel Processing
You must complete the following configuration step for parallel processing for Data Mart whether or not you use Process Manager with the Mart. In Eagle Navigator, type PACE Engines and select the PACE Engines (System Management Center) link. Set a Limit Ratio value of 0.25 or other value less than 1.0 as directed by Eagle Technical Services for three engine types: Data Mart Manager Engine, Model Manager Engine and OLAP reports. The Diagnostic checks that you run in Data Mart check that these Limit Ratios are below 1.0.
Find the Best Process Manager Run Group Batch Size
When you set up a Process Manager workflow, you see the Run Groups tab in the Wizard. On the Run Groups tab, you define events within the workflow. The Batch Size setting on the Run Groups tab indicates the number of entities that are batched together for purposes of your Mart build. The default setting is 1. This indicates that the workflow will start as soon as any eligible entity is found. This setting may be inefficient because the minimal overhead of starting a Data Mart build is potentially repeated for every entity built in the Mart. A setting in the range of 25-50 is preferable. You can try various settings to determine the one that gives you the best results.
Special Considerations for Performance Measurement Fields
Performance measurement fields can involve complex database queries and extensive processing requirements. Data Mart builds that require many of these fields are the most challenging for achieving good system performance. There are several ways to improve efficiency as described in the following sections.
Limit the Lowest Level of Performance OLAP Builds
Performance returns are typically stored in the database at the security, group and fund levels. When a Data Mart performance OLAP process runs to build performance analysis fields, it computes returns for all levels represented in the performance dictionary that is linked to the model by default. For the Fund Summary and group models, you can use the Performance Report Options link in the Filters/Mappings tab of the model user interface to limit computing of returns to just what is required by the model. For example, Entity level for Fund Summary. Note the following:
If you are building performance attribution fields where your fund and/or group returns are based on rollup of lower-level contributions, you must not limit the rollup process or you will not get the expected attribution data at a rollup level. Note that this does not apply to models where you use dynamic performance rather than a performance dictionary.
If you plan to report security-level performance returns linked to your groups, you can use the Create Mapping for Details option on the group model’s Filters/Mappings tab to create a convenient table of pointers from group table groups to detail table rows. However, if you do this, you cannot also limit the OLAP build to exclude the detail level. You must choose one or the other. If the choice is to avoid building the OLAP to security level, you can still associate groups and their underlying detail members by including grouping fields as data fields in the detail model, and using reporting to sort on those fields shared by groups and detail, for example, Country and Sector.
Economize on Associated-Benchmark Fields where Possible
You can associate one or more index entities with a fund as its custom list of benchmarks. Then you can define performance analysis field attributes for the fund that point to any of its associated benchmarks. In Data Mart, all of these associated-benchmark fields are built on the same table rows as the fund, which is convenient for reporting purposes. Further, if the benchmark is of the custom blended variety and fund-specific, or if the field links benchmark returns since the inception date of the fund, only these associated-benchmark fields will do the job.
However, you might need to report benchmark returns whose values do not depend upon association with a particular fund. For example, the QTD return of the “vanilla” S&P 500 index as of a given date is the same number for all funds. Obviously you can reduce your Data Mart build requirements by building that number just once rather than once per fund. Benchmark returns may be built on their own rows in Fund Summary. Then for each individual fund you can populate Fund Summary with the entity IDs of its associated benchmarks, and design your SQL accordingly.
Engage Eagle Technical Services
Database tuning can almost always make a well-designed Mart run even better. In addition, if your Data Mart requirements involve hundreds of fields or thousands of funds, hardware sizing also deserves expert attention. Eagle Technical Services offers the benefits of their experience with a variety of significant Data Mart implementations, and can add substantial value to your Mart planning process.
Assign Roles and Access Privileges
Mart configuration is usually considered an administrator function, and is not a business user function. Data Mart’s privileges in User Manager are granular enough to support the differentiation among configuration roles that most organizations need.
Where there is more than one Data Mart role required, the roles usually fall under the following categories:
Administration, with all privileges.
Model management, with Maintain, Configuration, Migrate and View privileges.
Production, with Submit and View privileges.
Data quality assurance, with Audit, View and Validate privileges.
Adopt an Archive Policy
Archiving Data Mart tables can improve index rebuilding and query performance simply by reducing the number of rows to be navigated. A logical policy is one that preserves each data frequency (daily and monthly) for the time span that reporting may need it. Many clients preserve monthend data indefinitely, but keep daily data for a few months or a year.
Remember that all as-of reporting against Data Mart requires that data be kept for all possible prior dates you might want to report. If you need data for time periods already purged, the lost rows can be rebuilt, but there will likely be restatements of numbers previously reported.
In V9.0 under Archive Engine support of Data Mart, if you want to archive to a table and save each year’s rows together with a labeling effect, you can simply assign names to archive tables with a naming convention that includes the desired date information.
You should purge the Data Mart execution tables and leave them with no more than 1 months’ worth of data. Your specific needs may be different. Not purging can result in system performance issues.
Add Comment