Dave McCrory has an interesting presentation on InfoQ that discusses enterprise platform as a service (PaaS). From slide 23 onwards, it covers legacy data and the issues associated with consuming that data; Master Data Management is also discussed in the deck. This leads to a New York Times article, “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights”, which notes:
Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.