Data Science: Problem Formulation

One of the issues with data science is ensuring you know what you're attempting to solve (think of it as the ROI).  Like the constant refactoring of code that never makes it to production, hours/days/weeks can be spent on data frame construction, modelling, tuning, and refinement.  At some point, however, you need to step back from the cycle of modelling, revisit the problem, and validate that the problem you thought you were looking at is still the right problem, and that your solution is moving you towards a conclusion.

In my experience this follows a certain pipeline:

  • Discuss problem
  • Write down problem
  • Identify data sources
  • Refine problem with data sources in mind
  • Build data frame
  • Refine problem
  • Model
  • Capture evidence of results to construct the story of the solution, useful for management and discussion
  • Refine problem
  • Tune Model
  • Write summary, next steps, present to business with ROI

Or at least something similar to the above.  Clearly I’d run this work through Kanban 🙂  The business can now see the value add (ROI) and decide on next steps, thus avoiding questions like “What are those data scientists actually doing?”


~ by mdavey on April 25, 2016.
