The Value of Metadata-driven ETL Frameworks and Simplified SOA Services

Categories: Data & Analytics, Digital Engineering

by Joe Lolordo – September 16, 2016

Would you like to integrate a large variety of data sources more effectively, with processes that are faster, easier to manage, and repeatable? Do you need to provide common services and data analytics that let you report on that data in near real time? A key to achieving that goal is to implement a flexible EDW/Analytics Architecture that includes a metadata-driven ETL framework and a simplified SOA Services approach.

It’s a goal sought by many organizations – essentially any company that is setting up an environment for advanced business analytics to drive business decisions. And it’s a goal rooted in the desire to enhance the services they provide in order to grow the business.

In today’s economy, a key trend for Insurance (Health, P&C, Life), Healthcare (Hospitals, Clinics, Doctors Groups), and Other Financial Services (Banking and Investment Management) companies is to continue to grow through acquisitions and mergers. They need this capability to quickly integrate business systems and data from similar organizations.

Let’s consider the example of a healthcare insurance provider. This business, of course, revolves around paying claims for healthcare services. Daily business operations require providing answers to a range of specialized questions, such as:

  • What claims are outstanding?
  • When will these claims get paid?
  • Do we have the necessary healthcare provider information?
  • Did we get proper authorizations?

All businesses need the ability to quickly, effectively, and efficiently source and report the data that drives daily operations.

The Advantage of a Metadata-Driven ETL Framework

There are multiple tools available for ETL development, such as Informatica, IBM DataStage, and Microsoft’s toolset. But using these tools effectively requires strong technical knowledge of, and experience with, the software vendor’s toolset. Integrating new data sources may require complicated code customization, which can be time-consuming and error-prone.

Using a metadata-driven ETL framework means establishing an easier-to-use and more flexible abstraction layer that simplifies the technology learning curve and reduces the time to implement new data sources. It involves creating templates for data migration controls, exception handling, and rules management. Excel spreadsheets can be used by business users to create consistent transformation and integration rules. Physical data source locations, data schemas, error handling logic, and job control parameters can be stored in configuration files that can be easily maintained and processed by the Framework to generate executable ETL jobs. It streamlines the overall process of loading data into a traditional EDW, making data available sooner for analytics, reporting, and use by other applications. Last but not least, the framework removes the variability that is seen when different developers perform similar work; the code that results from the framework is standardized and is easy to review and maintain.
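To make the idea concrete, here is a minimal Python sketch of a job built from metadata rather than hand-written code. It is only an illustration under stated assumptions: the file name, table name, column names, rule names, and the `build_job` helper are all hypothetical, and a real framework would read this metadata from spreadsheets or configuration files and generate jobs for the ETL platform in use.

```python
# Hypothetical metadata for one data source. In practice this would come
# from a spreadsheet or configuration file, not from Python source code.
SOURCE_METADATA = {
    "source": "claims_state_a.csv",   # assumed physical location
    "target": "edw.claims",           # assumed target table
    "on_error": "skip",               # "skip" rejects bad rows, "fail" stops the job
    "columns": [
        {"from": "CLM_ID", "to": "claim_id", "rule": "strip"},
        {"from": "AMT", "to": "amount", "rule": "to_float"},
        {"from": "STATUS", "to": "status", "rule": "upper"},
    ],
}

# Reusable transformation rules, looked up by the names used in the metadata.
RULES = {
    "strip": lambda value: value.strip(),
    "upper": lambda value: value.strip().upper(),
    "to_float": lambda value: float(value),
}


def build_job(metadata):
    """Return a function that transforms rows as the metadata describes."""
    columns = metadata["columns"]
    on_error = metadata["on_error"]

    def run(rows):
        loaded, rejected = [], []
        for row in rows:
            try:
                loaded.append(
                    {col["to"]: RULES[col["rule"]](row[col["from"]]) for col in columns}
                )
            except (KeyError, ValueError) as exc:
                if on_error == "fail":
                    raise
                # Exception handling is driven by metadata, not by custom code.
                rejected.append((row, repr(exc)))
        return loaded, rejected

    return run


job = build_job(SOURCE_METADATA)
rows = [
    {"CLM_ID": " C100 ", "AMT": "250.00", "STATUS": "open"},
    {"CLM_ID": "C101", "AMT": "n/a", "STATUS": "paid"},  # bad amount, gets rejected
]
loaded, rejected = job(rows)
```

In this sketch, onboarding another source would mean writing a second metadata entry and calling `build_job` again; the transformation and error-handling code stays the same.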

Additional Benefits of Atomic RESTful SOA Services

While the metadata-driven ETL framework addresses the loading of data, an architecture that also employs atomic RESTful SOA Services enables the creation of flexible and streamlined applications that can be quickly assembled to give end users access to the data through Portals, Dashboards, and other applications. A REST-based service is commonly used for simple CRUD-type transactions. It is a lightweight, faster interface with a gentler learning curve for developers. An atomic SOA service is a single, simple, irreducible component of a transaction, such as retrieving a claim or creating a member – accessing a single table of information. These services are then used and combined by an application to provide the necessary information to the end user. That enables the option of picking, choosing, and reusing services as needed. Where there may be 10 services, for example, you may need only the 3 or 4 that apply to your application. In another situation you may want to use all 10. The point is that you have the flexibility to select only the services you need, and each application can be developed by reusing common services.
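The composition idea can be sketched in a few lines of Python. This is an illustration only: plain functions stand in for REST endpoints, the in-memory dictionaries stand in for single tables, and every name and record below is hypothetical. Each atomic service touches exactly one table, and each portal view picks only the services it needs.

```python
# In-memory stand-ins for single tables; all records here are made up.
CLAIMS = {
    "C100": {"claim_id": "C100", "member_id": "M1", "provider_id": "P9", "status": "OPEN"},
}
MEMBERS = {"M1": {"member_id": "M1", "name": "Pat Example"}}
PROVIDERS = {"P9": {"provider_id": "P9", "name": "Example Clinic"}}


# Atomic services: one simple operation against one table each.
def get_claim(claim_id):
    """Stand-in for something like GET /claims/{claim_id}."""
    return CLAIMS[claim_id]


def get_member(member_id):
    """Stand-in for something like GET /members/{member_id}."""
    return MEMBERS[member_id]


def get_provider(provider_id):
    """Stand-in for something like GET /providers/{provider_id}."""
    return PROVIDERS[provider_id]


def create_member(member_id, name):
    """Stand-in for something like POST /members."""
    MEMBERS[member_id] = {"member_id": member_id, "name": name}
    return MEMBERS[member_id]


# Applications combine only the atomic services they need.
def member_portal_claim_view(claim_id):
    claim = get_claim(claim_id)
    member = get_member(claim["member_id"])
    return {
        "claim_id": claim["claim_id"],
        "status": claim["status"],
        "member_name": member["name"],
    }


def provider_portal_claim_view(claim_id):
    claim = get_claim(claim_id)
    provider = get_provider(claim["provider_id"])
    return {
        "claim_id": claim["claim_id"],
        "status": claim["status"],
        "provider_name": provider["name"],
    }
```

Both views reuse `get_claim`, and neither needs `create_member`, which is the pick-and-choose flexibility described above.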

For illustration, let’s return to the example of a healthcare insurance provider. The healthcare insurance industry has been rather volatile in recent years, with lots of consolidations and mergers occurring.

Let’s say that a large healthcare insurance provider has operations in one state and plans to expand into another. To do so, it acquires an insurance provider in that state.

They need to be able to quickly provide the same capabilities and level of service after the acquisition. That requires managing data quickly and supporting reporting and analytics for their member community, the provider community, corporate management, and regulatory agencies.

The metadata-driven ETL framework provides the ability to replicate and add new data sources – and quickly. Everything is at an abstraction layer where it’s easy to define/reuse mappings, easy to define different sources and targets of where the data is supposed to be, and easy to define/reuse transformation rules in the metadata portion of the framework.

Put simply, using a metadata-driven ETL framework makes it faster and easier to process, load, and transform data, while the SOA Services Architecture makes it easy to replicate applications like Member and Provider Portals. That benefits companies by making it easier for them to manage their EDW environments and quickly replicate common reporting services and applications.

So for our healthcare insurance provider example, the benefits would be tremendous if they are acquiring multiple insurance providers in multiple states. They can easily replicate the process without recreating something totally unique for each integration effort, or for each new set of data that must be integrated.

Slashing Development Time

A metadata-driven ETL framework is an excellent approach for standardizing incoming data. It helps simplify what can be a very complicated process. It helps speed development on the ETL side by providing more flexibility during the process of incorporating different data sources into a data warehouse.

The atomic RESTful SOA services enable more services to be provided to the end business user – whether through a Member or Provider Portal application or some type of Business Analytics Dashboard. Common services can be used repeatedly, or a subset can be easily modified to provide any unique processing for a new business organization being acquired.

Summary of Advantages

In summary, here are some of the potential advantages of using this approach:

  • UNIFORMITY: The metadata-driven framework approach yields a uniform, generic way of ingesting data. Once you understand the ingestion pattern, it is very easy to review existing configurations or add new ones.
  • AGILITY: The framework approach makes it quick to develop or change configurations. Changes to ingestion mainly consist of modifying the metadata through DML statements, without any code change, which is a major advantage for an agile methodology.
  • EASY TO SCALE: New sources, configurations, environments, etc. can be added merely by creating metadata.
  • MAINTAINABILITY: Since everything from business logic to data flow is captured in the form of Excel documents, this approach is very easy to maintain.
  • ACCELERATION: ETL frameworks do not need to replace one’s existing ETL platforms. Instead, a framework can serve as an accelerator or code generator for rapid development in the native ETL platform of choice. For instance, the framework can be used to generate custom factory templates as XML files, which can be imported into Informatica custom repositories to produce ready-made ETL from the framework.
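As a rough illustration of the code-generator idea in the last bullet, the Python sketch below renders one metadata entry as an XML template using the standard library. The element and attribute names are invented for this example and do not reflect Informatica’s actual import format, and the metadata values and the `mapping_to_xml` helper are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata entry; a real framework would load this from its
# metadata store (for example, a spreadsheet or configuration table).
MAPPING_METADATA = {
    "name": "claims_state_a",
    "source": "claims_state_a.csv",
    "target": "edw.claims",
    "columns": [
        {"from": "CLM_ID", "to": "claim_id", "rule": "strip"},
        {"from": "AMT", "to": "amount", "rule": "to_float"},
    ],
}


def mapping_to_xml(metadata):
    """Render one metadata entry as an XML string (made-up schema)."""
    root = ET.Element("mapping", name=metadata["name"])
    ET.SubElement(root, "source", location=metadata["source"])
    ET.SubElement(root, "target", table=metadata["target"])
    for col in metadata["columns"]:
        # "from" is a Python keyword, so attributes are passed as a dict.
        ET.SubElement(
            root,
            "field",
            attrib={"from": col["from"], "to": col["to"], "rule": col["rule"]},
        )
    return ET.tostring(root, encoding="unicode")


xml_text = mapping_to_xml(MAPPING_METADATA)
```

The point of the sketch is the direction of flow: the metadata is the single source of truth, and the platform-specific artifact is generated from it rather than written by hand.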

What’s the real payoff?

I would estimate that the development time required for integrating an acquisition or new sources of data into a data warehousing and business analytics environment can be reduced by about 30%. And that might be a conservative estimate.

The opportunity to slash development time by as much as a third? That certainly makes using a metadata-driven ETL framework and an atomic RESTful SOA services architecture worth considering.
