07 Apr Cloud Data Management: Single Source of Truth
Today’s cloud and hybrid cloud IT environments create new challenges for data management. One of these is how to architect an information system so that there is a single source of truth (SSOT) for every data element. This means that every copy of a data element refers back to a primary source, and the element is edited only at that source. Without this type of architecture, once every group pulls the data it needs into its own data mart and manipulates it with its own tools, data lineage and integrity are lost. The impacts of such a loss include:
- Executive decisions are made without really understanding the data’s reliability.
- Money and time are wasted on redundant storage, data and analytics work.
- Each data project likely follows a different process, leaving no clear path to decision processes that are truly data driven.
Organizations may end up in this situation for a variety of reasons including:
- Current data architecture doesn’t allow data lineage to be tracked across multiple storage solutions and analysis tools.
- Insufficient governance policies are in place for change control and directing how data flows across the architecture.
- New approaches to data science introduce challenges that did not exist before, especially ad hoc extract, load, transform (ELT) processes (where the T and L are intentionally flipped relative to traditional ETL) and analytics experiments that create new views and forms of the data at increasingly rapid rates.
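To make the lineage problem in the list above concrete, here is a minimal Python sketch of how a derived dataset could carry a record of what it was built from, so that any data mart or report can be traced back to its primary sources. The `Dataset` class, the `primary_sources` and `lineage` helpers, and the dataset names are all hypothetical and chosen for illustration; a real environment would more likely rely on a data catalog or lineage tool than on hand-rolled code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A dataset plus the datasets it was derived from (hypothetical model)."""
    name: str
    transform: str = "source"             # how this dataset was produced
    parents: tuple["Dataset", ...] = ()   # empty for a primary source


def primary_sources(ds: Dataset) -> set[str]:
    """Walk the parent links back to the datasets that have no parents."""
    if not ds.parents:
        return {ds.name}
    found: set[str] = set()
    for parent in ds.parents:
        found |= primary_sources(parent)
    return found


def lineage(ds: Dataset, depth: int = 0) -> list[str]:
    """Return an indented, top-down description of how ds was derived."""
    lines = [f"{'  ' * depth}{ds.name} <- {ds.transform}"]
    for parent in ds.parents:
        lines.extend(lineage(parent, depth + 1))
    return lines


# Hypothetical example: two primary sources feed a data mart, which feeds a report.
crm = Dataset("crm.customers")
billing = Dataset("billing.invoices")
mart = Dataset("sales_mart.revenue", "join + aggregate", (crm, billing))
report = Dataset("exec_report.q1", "filter", (mart,))

print("\n".join(lineage(report)))
print(sorted(primary_sources(report)))
```

The point of the sketch is only that lineage has to be recorded at the moment a dataset is derived; once a group copies data into its own tools without that record, the path back to the primary source cannot be reconstructed.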
The path towards a solution to this problem includes objectives such as simplifying the flow of data across the data ecosystem and stopping unnecessary data proliferation. A combination of common data management tools and augmented governance policies can help. Very complex environments should consider implementing a machine learning (ML) analytical environment, including system-to-system interaction driven by ML-generated outputs such as outliers and findings. This type of foundational platform for automating, maintaining, and scaling ML systems (dataflows and models) is the only way to achieve the time and cost efficiency necessary to realize the promised benefit of becoming a truly data-driven enterprise.
There are five fundamental attributes of such a system:
- Data platform – establish a platform for managing data end to end
- Data-driven workflows – manage data flows from source to machine learning application
- Data science mindset – re-envision how to approach and solve problems
- Governance – establish clear roles and responsibilities with appropriate controls in place to implement data management policies
- Strategic oversight – select and build use cases according to a common strategic thread
The goal is to provide a single source of truth for users to access the data they need via the tools they typically use, so they can provide the reliable data needed to help executives make the right decisions for the enterprise.
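As a rough illustration of the single source of truth described above, the following Python sketch keeps one primary copy of each data element, allows edits only at that source by an authorized role, and hands consumers read-only views instead of copies. The `SourceOfTruth` class, the `data_steward` role, and the `customer_count` element are assumptions made for this example, not part of any specific product or of the architecture discussed in this article.

```python
from types import MappingProxyType


class SourceOfTruth:
    """Holds the primary copy of each data element (hypothetical sketch)."""

    def __init__(self, writers):
        self._data = {}
        self._writers = set(writers)   # roles allowed to edit at the source
        self._log = []                 # simple change-control record

    def update(self, role, key, value):
        """Edit a data element at the source, if the role is permitted."""
        if role not in self._writers:
            raise PermissionError(f"role {role!r} may not edit the source")
        # Record who changed what, with the old and new values.
        self._log.append((role, key, self._data.get(key), value))
        self._data[key] = value

    def view(self):
        """Return a live, read-only view; consumers cannot edit through it."""
        return MappingProxyType(self._data)

    @property
    def history(self):
        return tuple(self._log)


# Hypothetical usage: two groups read the same element through views.
ssot = SourceOfTruth(writers={"data_steward"})
ssot.update("data_steward", "customer_count", 1200)

finance_view = ssot.view()
marketing_view = ssot.view()

# An edit at the source is immediately visible to every consumer.
ssot.update("data_steward", "customer_count", 1250)
print(finance_view["customer_count"], marketing_view["customer_count"])
```

Because the views are references rather than copies, finance and marketing cannot drift apart, and the change log gives governance a record of every edit. In practice the same idea would be enforced with database permissions and shared views rather than in-process objects.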
Lessons from the Field: Getting to an SSOT Data Platform
Our cloud data management experience has given us several lessons to consider, which we share with you below:
With the observations above in mind, organizations should then run a “proof of concept” to help guide the ultimate implementation of an SSOT architecture. The steps should include:
- Clarify objectives – align expectations based on discussions with users and study of use cases.
- Implement an end-to-end framework for managing dataflows, model development, and model deployment.
- Establish a governance approach with well-defined user roles and guidelines.
- Create a strategic oversight team that reviews and selects use cases to fund in order to ensure synergy across use cases and achieve the targeted impact.
- Identify and implement 1-3 initial use cases following an agile framework.
- Review the results, the effectiveness of the governance, and the end-to-end management.
- Adjust final solution as needed.
We share lessons learned from our cloud/data management implementations on a regular basis.