Artificial Intelligence – Data Quality

Most corporations as soon as they venture down the road of “big data”, and AI, realise they often don’t have a big data issue, they have a data quality issue, which is probably coupled to data holes within the corporate data set.  This is driven by a number of issues, including:

  • No Chief Data Officer
  • No Data Strategy
  • Lack of thought as to how the data from a application (division, department etc) will be used outside of the application (User Experience) itself
  • No Acceptance Criteria on stories around data quality

Data hygiene is key to deriving using predictions and classifications from AI models – obvious 🙂

What follows are a few pointers that may aid in the area of data hygiene:

  • Identification of source of truth (SoT) of data e.g. market data, trades, orders, recruitment.  Using a secondary copy of the SoT can often lead to “issues”
  • Context around data changes in the SoT.  Specifically, who changed what, when, and ideally why.  “Why” can be difficult in certain instances, but ideally would provide some context on the path that lead to a data change e.g phone call from a client requesting an amendment to a trade
  • Taxonomy/ontology – if you are doing anything around LDA to extract topics, then its going to help considerably if the input data leverages a taxonomy to reduce the surface area of data.
  • Applications are often built with no thought around any of the above points.  Further, if you are using ELK or similar as a data source for AI models, its will not be uncommon to find that application development didn’t consider the logs during development 😦  In this scenario, I’d advise mandating ELK to development teams 🙂  At a minimum, this will aid the reduction of support tickets as support staff will at least have meaningful log files to work with 🙂

Its truly amazing how time can be wasted prior to training AI models with cleaning and collecting data 😦

~ by mdavey on June 23, 2016.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: