During a recent project innovation sprint at a customer, we decided to tackle our customer’s data warehouse documentation problem. It was hard to get proper insight in the data streams at hand due to various data sources, changing standards and legacy code. Take for example a random field: in which reports is it being used, to which source can it be tracked, which transformations have been applied, etc?
Since our aim was to thoroughly reshape the infrastructure, we decided to add this kind of information because it would allow us to better gauge the impact of our modifications. During the innovation sprint, we developed a system that builds said info and makes it possible to query.