•July 4, 2016
Although not published yet, the early release of Mastering Feature Engineering looks like it’s going to be an interesting read.
Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work.
Machine Learning Mastery has a lengthy blog post on the topic, with a number of links at the end.
It’s no surprise that Mastering Feature Engineering tackles bag-of-words, since it’s a useful feature engineering tool for almost any text, e.g. CVs, documentation, etc.
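For anyone new to the idea, here is a minimal bag-of-words sketch in plain Python. It assumes a simple regex tokenizer (lower-case, alphanumeric runs only), and the two "CV" snippets are made up. Each document becomes a vector of term counts over a shared vocabulary, and word order is thrown away; libraries such as scikit-learn’s `CountVectorizer` do the same job with many more options.

```python
import re
from collections import Counter


def tokenize(text):
    # Lower-case and keep runs of letters/digits; punctuation is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    # A sorted set of every distinct token gives a stable column order.
    return sorted({tok for doc in documents for tok in tokenize(doc)})


def bag_of_words(documents):
    vocab = build_vocabulary(documents)
    vectors = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        # One count per vocabulary term (Counter returns 0 for absent terms).
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


# Hypothetical CV snippets, for illustration only.
docs = [
    "Python developer with Spark experience",
    "Java developer, Python a plus",
]
vocab, vectors = bag_of_words(docs)
print(vocab)
for v in vectors:
    print(v)
```

The resulting count vectors can be fed straight into a classifier or clustering step, which is what makes bag-of-words such a handy first feature for text.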
•July 1, 2016
Looks like BBVA are making great strides with Scrum within the context of corporate culture. It’s great to hear they are prepared to break the historical chains that have constrained them, with the obvious return on investment (ROI) being the business value derived from transformation projects.
moving away from rigid organizational and functional structures toward a much more collaborative way of working
Anyone know if BBVA are using SAFe, LeSS, DAD or something else? Some LinkedIn profiles hint at SAFe.
•June 30, 2016
Recently had a read of Infrastructure as Code. One topic particularly worth calling out is Chapter 10, Software Engineering Practices for Infrastructure, and in particular the “branch” discussion🙂
Page 183 clarifies the view that many teams have of branches:
Rather than continuously integrating changes, many development teams commit changes to separate branches in their VCS. The goal is usually to allow people to spend time finishing a large change before they worry about making it work with any change that other people are working on
This leads nicely into Codebase Organization Patterns, page 267, and the Antipattern: Branch-Based Codebases, with the great subtitle, “Workflow Effectiveness”🙂
Page 196 provides another good read on testing, and on why separating testing and engineering teams is wrong.
•June 23, 2016
As soon as most corporations venture down the road of “big data” and AI, they realise they often don’t have a big data issue; they have a data quality issue, which is probably coupled with holes in the corporate data set. This is driven by a number of issues, including:
- No Chief Data Officer
- No Data Strategy
- Lack of thought as to how the data from an application (division, department, etc.) will be used outside of the application (User Experience) itself
- No Acceptance Criteria on stories around data quality
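On the last point, one way to give a story a data-quality acceptance criterion is to make it executable, e.g. “every trade must have a counterparty, a currency and a positive notional”. The sketch below assumes a hypothetical `Trade` record and hypothetical rules; it simply reports which records break the criterion so the story cannot be signed off while failures remain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trade:
    # Hypothetical record shape, for illustration only.
    trade_id: str
    counterparty: Optional[str]
    notional: Optional[float]
    currency: Optional[str]


# Hypothetical acceptance criterion: these fields must always be populated.
REQUIRED_FIELDS = ("counterparty", "notional", "currency")


def data_quality_failures(trades):
    """Return (trade_id, field) pairs that break the acceptance criterion:
    every required field is present, and the notional is positive."""
    failures = []
    for t in trades:
        for field in REQUIRED_FIELDS:
            if getattr(t, field) in (None, ""):
                failures.append((t.trade_id, field))
        if t.notional is not None and t.notional <= 0:
            failures.append((t.trade_id, "notional"))
    return failures


trades = [
    Trade("T1", "ACME", 1_000_000.0, "USD"),
    Trade("T2", None, 250_000.0, "EUR"),
    Trade("T3", "Globex", -5.0, ""),
]
print(data_quality_failures(trades))
```

Wired into the build, a non-empty result fails the story, which makes data quality part of the definition of done rather than an afterthought.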
Data hygiene is key to deriving useful predictions and classifications from AI models – obvious🙂
What follows are a few pointers that may aid in the area of data hygiene:
- Identification of the source of truth (SoT) for data, e.g. market data, trades, orders, recruitment. Using a secondary copy of the SoT can often lead to “issues”
- Context around data changes in the SoT. Specifically, who changed what, when, and ideally why. “Why” can be difficult in certain instances, but ideally it would provide some context on the path that led to a data change, e.g. a phone call from a client requesting an amendment to a trade
- Taxonomy/ontology – if you are doing anything around LDA to extract topics, then it’s going to help considerably if the input data leverages a taxonomy to reduce the surface area of the data.
- Applications are often built with no thought around any of the above points. Further, if you are using ELK or similar as a data source for AI models, it will not be uncommon to find that application development didn’t consider the logs during development😦 In this scenario, I’d advise mandating ELK for development teams🙂 At a minimum, this will help reduce support tickets, as support staff will at least have meaningful log files to work with🙂
It’s truly amazing how much time can be wasted cleaning and collecting data prior to training AI models😦
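To make the “who changed what, when, and why” point concrete, here is a minimal sketch that records each change to a source-of-truth record as one JSON object per line, a format that is straightforward to ship into ELK. It assumes a plain dict standing in for the SoT record and a Python list standing in for the log sink; `apply_change` and all field names are hypothetical.

```python
import json
from datetime import datetime, timezone


def apply_change(record, field, new_value, changed_by, reason, audit_log):
    """Update one field of a source-of-truth record and append an audit entry
    capturing who changed what, when, and (ideally) why."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # when
        "record_id": record["id"],
        "field": field,                      # what
        "old_value": record.get(field),
        "new_value": new_value,
        "changed_by": changed_by,            # who
        "reason": reason,                    # why, e.g. a client phone call
    }
    record[field] = new_value
    # One JSON object per line is easy for a log shipper to parse into fields.
    audit_log.append(json.dumps(entry, sort_keys=True))
    return record


trade = {"id": "T42", "quantity": 100}
audit_log = []
apply_change(trade, "quantity", 150, "jsmith",
             "Client phone call requesting amendment", audit_log)
print(trade)
print(audit_log[0])
```

Because every entry carries both the old and new values plus a reason, the same log serves support staff today and becomes usable training data later.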
•June 15, 2016
“Deep Learning For Chatbots, Part 1 – Introduction” provides a good overview of the techniques needed to develop your own chatbot. Clearly, a closed-domain problem is easier.
Microsoft’s Bot Framework also provides some good resources. Particularly nice is the fact that Bot Builder is available for Node.js. Microsoft has gone for one approach to understanding natural language – LUIS.
Botkit also looks interesting, but doesn’t seem to have particularly sophisticated NLP capabilities.
ChatterBot has a training mode. I’ve not used it, but it would be interesting, for example, to feed in traders’ Bloomberg conversations or similar, and see how the bot fared🙂
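To illustrate why a closed domain helps, and what a “training mode” can amount to at its simplest, here is a toy retrieval-based bot in plain Python. It is not ChatterBot’s API or algorithm; it is a hypothetical sketch that stores consecutive lines of a conversation as statement/response pairs and answers with the response whose statement has the highest Jaccard (token-overlap) similarity to the input. The trader dialogue is invented.

```python
import re


def tokens(text):
    # Lower-cased set of alphanumeric tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class RetrievalBot:
    """Toy closed-domain bot: 'training' stores (statement, response) pairs;
    replying picks the response whose statement best overlaps the input."""

    def __init__(self, fallback="Sorry, I don't follow."):
        self.pairs = []
        self.fallback = fallback

    def train(self, conversation):
        # Each line is treated as the response to the line before it.
        for statement, response in zip(conversation, conversation[1:]):
            self.pairs.append((tokens(statement), response))

    def reply(self, text):
        query = tokens(text)
        best_score, best_response = 0.0, self.fallback
        for statement, response in self.pairs:
            union = query | statement
            # Jaccard similarity: shared tokens over all tokens.
            score = len(query & statement) / len(union) if union else 0.0
            if score > best_score:
                best_score, best_response = score, response
        return best_response


bot = RetrievalBot()
bot.train([
    "Where is EURUSD trading?",
    "1.1050 bid, 1.1052 offer.",
    "Can you show me a price in 10 million?",
    "Done at 1.1052 for 10 million.",
])
print(bot.reply("where is eurusd now"))
print(bot.reply("what's the weather like"))
```

Inside a narrow domain such as FX quoting, even crude overlap matching finds a sensible answer; an off-topic question falls through to the fallback, which is exactly why open-domain bots are so much harder.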
•June 14, 2016
Slide 8 of Martin Fowler’s deck provides clarity on what polyglot persistence means:
using multiple data storage technologies, chosen based upon the way data is being used by individual applications. Why store binary images in a relational database, when there are better storage systems?
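A minimal sketch of the idea, assuming SQLite as the relational store and a local directory standing in for a dedicated blob/object store; the `ImageRepository` class and its schema are hypothetical. The relational table keeps only queryable metadata plus a reference (`blob_key`) to the bytes, while the image bytes themselves live in the store better suited to them.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


class ImageRepository:
    """Image metadata in a relational DB; raw bytes in a separate blob store
    (here a local directory standing in for an object store)."""

    def __init__(self, db_path, blob_dir):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS images "
            "(name TEXT PRIMARY KEY, blob_key TEXT, size INTEGER)"
        )
        self.blob_dir = Path(blob_dir)
        self.blob_dir.mkdir(parents=True, exist_ok=True)

    def save(self, name, data):
        # Content-addressed key: identical images share one blob.
        key = hashlib.sha256(data).hexdigest()
        (self.blob_dir / key).write_bytes(data)
        with self.db:  # commits the transaction on success
            self.db.execute(
                "INSERT OR REPLACE INTO images VALUES (?, ?, ?)",
                (name, key, len(data)),
            )
        return key

    def load(self, name):
        row = self.db.execute(
            "SELECT blob_key FROM images WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        return (self.blob_dir / row[0]).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    repo = ImageRepository(":memory:", Path(tmp) / "blobs")
    key = repo.save("logo.png", b"\x89PNG fake image bytes")
    loaded = repo.load("logo.png")
    rows = repo.db.execute("SELECT name, size FROM images").fetchall()
print(loaded)
print(rows)
```

Each store does what it is good at: SQL for querying by name or size, and a blob store for large binary payloads that never need to be joined or filtered.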
•June 13, 2016
A few articles worth a read on serverless architecture:
- OpenWhisk Vies With AWS Lambda As Developer Service
- Google has quietly launched its answer to AWS Lambda
- How I decided to use Serverless/Nanoservices Architecture with AWS to make CAPI
- Is “Serverless” architecture just a finely-grained rebranding of PaaS?