Data Lake Development

Data Lakes require fluid and flexible data engineering to ensure all data is consistent, complete, and correct for its uses. Developing a Data Lake begins with acquiring data, then conforming it to satisfy all data rules and relationships, refining it into the data structure needed for each analytic use, and publishing it so that data security and user access privileges are enforced. RCG calls this process data curation; it ensures that data in the Data Lake is ready to be turned into real-time actions.
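The four curation stages can be pictured as a small pipeline. The following is only a minimal sketch in Python, not RCG's implementation: the sample rows, the single data rule (a non-empty customer_id), the per-region total, and the "analyst" role are all hypothetical and chosen purely for illustration.

```python
# Hypothetical raw input; field names and values are illustrative only.
RAW_ROWS = [
    {"customer_id": "c1", "amount": "19.99", "region": "east"},
    {"customer_id": "", "amount": "5.00", "region": "west"},  # breaks the data rule
    {"customer_id": "c2", "amount": "7.50", "region": "east"},
]


def acquire():
    """Acquire: return raw rows as they arrive from a source."""
    return list(RAW_ROWS)


def conform(rows):
    """Conform: keep rows that satisfy the data rule and normalize types."""
    conformed = []
    for row in rows:
        if not row["customer_id"]:  # assumed rule: customer_id is required
            continue
        conformed.append({**row, "amount": float(row["amount"])})
    return conformed


def refine(rows):
    """Refine: shape the data for one analytic use (total amount per region)."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


def publish(dataset, allowed_roles):
    """Publish: return a reader function that enforces access privileges."""
    def read(role):
        if role not in allowed_roles:
            raise PermissionError(f"role {role!r} may not read this dataset")
        return dataset
    return read


def curate():
    """Run acquire, conform, refine, and publish in order."""
    return publish(refine(conform(acquire())), allowed_roles={"analyst"})
```

Calling `curate()` returns a reader: `curate()("analyst")` yields the refined per-region totals, while any role outside the allowed set raises `PermissionError`. In a real Data Lake each stage would be a separate, far more capable component; the point here is only the order of the stages and what each one is responsible for.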

According to a study performed by MIT's Sloan School of Management, the top three things managers want from their data are integration, consistency/standardization, and trustworthiness (link here).

Clearly, two decades of Business Intelligence (BI) have not positioned organizations for success with Data & Analytics. The obstacle? Not understanding trustworthy data and how to develop it. Developing trustworthy data requires new approaches to:

  • Data Use Scenarios: People need information that helps them make decisions and do their jobs effectively and efficiently. Data will therefore be used for many business purposes and in many ways, so understanding data use is critical to knowing what trustworthy data needs to support;
  • Data Acquisition: As the business world increasingly runs on mobile, cloud, and real-time data, methods for acquiring and processing data must be designed for high data volumes and fast processing and analysis;
  • Data Architecture: No single data structure will satisfy every data use scenario, so the data architecture must reflect the range of scenarios it needs to support;
  • Technical Architecture: Support for a wide range of data use scenarios, real-time data, and a more complex data architecture requires a technical architecture that enables value realization, provides high performance, and can adapt to future data and analytic needs without affecting existing reports, dashboards, queries, statistical models, algorithms, and more;
  • Big Data & Advanced Analytics Architecture: Existing data warehouses, data marts, operational data stores, and the reports and queries they provide will continue to exist. However, the end-to-end data and analytics architecture must become more capable in order to support the range of the business's Data Use Scenarios, the Data Acquisition they require, and more sophisticated Data Architectures and Technical Architectures. An advanced architecture is needed that incorporates and goes beyond the current BI data infrastructure.