Some of us started on these, learning CECIL.
A few interesting recent articles on distributed systems:
- The Space Between Theory and Practice in Distributed Systems
- Distributed systems theory for the distributed systems engineer
Of particular interest:
How you decide whether an event happened before another event in the absence of any shared clock. This means Lamport clocks and their generalisation to Vector clocks, but also see the Dynamo paper.
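To make the happened-before idea concrete, here is a minimal vector clock sketch in Python. It is an illustration only, not taken from either article or from the Dynamo paper; the function names (`tick`, `merge`, `happened_before`, `concurrent`) and the dict-based representation are assumptions made for this example.

```python
from typing import Dict

# A vector clock maps a node name to the number of events seen from that node.
# Missing entries are treated as zero.
VectorClock = Dict[str, int]


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with the local node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(local: VectorClock, received: VectorClock, node: str) -> VectorClock:
    """On message receipt: take the element-wise maximum, then tick the local node."""
    keys = set(local) | set(received)
    merged = {k: max(local.get(k, 0), received.get(k, 0)) for k in keys}
    return tick(merged, node)


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if a <= b on every component and a < b on at least one."""
    keys = set(a) | set(b)
    no_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    strictly_less = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_greater and strictly_less


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True if the clocks differ and neither event happened before the other."""
    keys = set(a) | set(b)
    differ = any(a.get(k, 0) != b.get(k, 0) for k in keys)
    return differ and not happened_before(a, b) and not happened_before(b, a)


# Two nodes each record an independent event, then "b" receives a message from "a".
a1 = tick({}, "a")
b1 = tick({}, "b")
b2 = merge(b1, a1, "b")
```

Here `a1` and `b1` are concurrent because neither node has heard from the other, while `a1` happened before `b2` because `b2` was produced after receiving `a1`. A Lamport clock would collapse each of these vectors to a single integer, which orders events consistently but cannot detect concurrency.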
And then there is the coolness of state machines:
Distributed state machine replication (Wikipedia is ok, Lampson’s paper is canonical but dry).
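The core idea of state machine replication can be sketched in a few lines of Python: if every replica is a deterministic state machine and applies the same commands in the same order, all replicas end up in the same state. This sketch assumes the log order has already been agreed (the hard part, which a consensus protocol such as Paxos or Raft would handle, is deliberately left out). The `Replica` class and the `set`/`add` command format are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A command is (operation, key, amount). This format is made up for illustration.
Command = Tuple[str, str, int]


@dataclass
class Replica:
    state: Dict[str, int] = field(default_factory=dict)
    applied: int = 0  # number of log entries applied so far

    def apply(self, command: Command) -> None:
        """Apply one command deterministically: same input, same resulting state."""
        op, key, amount = command
        if op == "set":
            self.state[key] = amount
        elif op == "add":
            self.state[key] = self.state.get(key, 0) + amount
        else:
            raise ValueError(f"unknown operation: {op}")
        self.applied += 1

    def catch_up(self, log: List[Command]) -> None:
        """Apply every log entry this replica has not seen yet, in log order."""
        for command in log[self.applied:]:
            self.apply(command)


# The agreed-upon log. In a real system its order comes from a consensus protocol.
log: List[Command] = [("set", "x", 1), ("add", "x", 4), ("set", "y", 2)]

r1, r2 = Replica(), Replica()
r1.catch_up(log)      # r1 sees the whole log at once
r2.catch_up(log[:1])  # r2 lags behind, seeing only the first entry...
r2.catch_up(log)      # ...then catches up with the rest
```

Both replicas converge on the same state even though they received the log at different times, because the only thing that matters is the order of the commands, not when they arrive.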
Dave McCrory has an interesting presentation on InfoQ that discusses enterprise platform as a service. From slide 23 onwards it covers legacy data and the issues associated with data consumption. Master Data Management is also discussed in the deck. That leads to a New York Times article, “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights”:
Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.
A few interesting use cases from this InfoQ video:
- Broker-Dealer transaction history
- Client Review
- Customer Relations and Prospecting
- Enterprise Credit Risk
- Cluster Pricing
Clearly the presentation didn’t have time to get into Lambda architectures, Spark and more.