Recurrent Neural Networks (RNNs)

•August 20, 2017 • 1 Comment

A while ago a data scientist colleague pointed me at “Practical Machine Learning With Event Streaming” over on the Monzo blog. The key takeaway, if you don’t have time to read the article in full, is:

RNNs specialise in making predictions based on a sequence of actions (known as an “event time series”), e.g. user logs in → taps on “money transfer” → encounters an error.

After reading the article, I got to wondering: could you take the event stream from the Software Development Life Cycle (SDLC) process and use RNNs to predict defects?
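As a rough thought experiment on that question, here is a minimal sketch of the idea using Keras – the event vocabulary, sequence length, and defect labels are all hypothetical placeholders, not a real SDLC dataset:

```python
# Hypothetical sketch: predict a "defect introduced" flag from a sequence of SDLC
# event IDs (e.g. commit -> build failed -> code review -> deploy). All data is fake.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

VOCAB_SIZE = 50   # number of distinct SDLC event types (assumed)
MAX_EVENTS = 20   # events per story/ticket, padded or truncated to a fixed length

model = Sequential([
    Embedding(input_dim=VOCAB_SIZE, output_dim=16),  # learn a vector per event type
    LSTM(32),                                        # recurrent layer reading the sequence
    Dense(1, activation="sigmoid"),                  # probability that a defect follows
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Toy stand-in data, purely to show the expected shapes.
X = np.random.randint(0, VOCAB_SIZE, size=(100, MAX_EVENTS))
y = np.random.randint(0, 2, size=(100, 1))
model.fit(X, y, epochs=3, batch_size=16, verbose=0)
```

The interesting work would of course be in deciding which SDLC events to capture and how to label a sequence as having led to a defect.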


Evidence of Success

•August 14, 2017 • Leave a Comment

A few thoughts if you are about to venture down the road of building a software application:

  • How will you know when you are DONE implementing the requirements (Stories)?
  • How will you demonstrate to the product owner that you are DONE?  Maybe consider traceability, linking stories to acceptance criteria to acceptance tests?
  • Ticking a box to say you have implemented all the requirements is in many ways immediately stale as soon as the code is “enhanced” after the box-ticking exercise.
  • When in production, how will you know if the application is “broken”?  Consider solutions for logging, monitoring, availability, etc.
  • Logging is only as good as the data logged 🙂  Overuse of logging will ruin the signal-to-noise ratio 🙂  Likewise for monitoring – see the sketch after this list.
  • If the application is “highly available”, prove it in production 🙂
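On the logging point above, here is a minimal sketch of what logging “the right data” might look like in Python – the event name and fields are illustrative assumptions, not a prescription:

```python
# Illustrative sketch: emit one structured, machine-parseable line per meaningful event,
# rather than scattering free-form debug noise that buries the signal.
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app")

def log_event(event, **fields):
    """Log a single JSON object so events stay greppable and easy to aggregate."""
    logger.info(json.dumps({"event": event, **fields}))

# Hypothetical example: one meaningful line instead of a dozen debug statements.
log_event("money_transfer_failed", user_id="u-123", error="insufficient_funds", duration_ms=42)
```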

Know Your Stack – Neo4j

•February 22, 2017 • 2 Comments

Sometimes things go south in production.  Often it’s a combination of events that has generated a perfect storm: production outages that are difficult to diagnose due to a lack of understanding of the application stack and its tools.

When attempting to diagnose an issue, it’s important to piece together the situation through evidence – log files, monitoring tools, etc.  Your own codebase can often be improved with appropriate logging and the inclusion of monitoring tools/UIs.  Issues often come about with off-the-shelf products and a lack of knowledge of their inner workings.  One example is Neo4j.  What follows are a few hopefully helpful suggestions for understanding what is going on in your application stack:

  • Neo4j has a query.log which is extremely useful to tail to see the queries being run and the time taken to execute each one.
  • The Neo4j browser (port 7474 by default) has a “:play sysinfo” which shows some basic information that can prove useful, e.g. the Transaction tile, and in particular “Current”.  If Current stays above 0 for a duration when you expect it to be 0, you could have long-running transactions 😦
  • Neo4j 3.1+ has improved query management.  In particular, it’s now possible to see what queries are running, and kill queries if required – see the sketch after this list.
  • Reading the documentation before using a product is key.  Remember the open files limit for Neo4j.
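As a concrete starting point for the query management item, below is a minimal sketch using the official Neo4j Python driver and the dbms.listQueries / dbms.killQuery procedures that arrived with 3.1 – the connection URI, credentials, and query ID are placeholders:

```python
# Sketch: list currently running queries and (if needed) kill a runaway one via the
# query-management procedures introduced in Neo4j 3.1. Connection details are placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Show what is executing right now - similar to tailing query.log, but on demand.
    for record in session.run("CALL dbms.listQueries()"):
        print(record["queryId"], record["query"])

    # If a query has clearly run away, terminate it by its queryId:
    # session.run("CALL dbms.killQuery($id)", id="query-123")

driver.close()
```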

Data Engineers

•February 7, 2017 • 1 Comment

The refinement of roles and titles in the ‘Big Data’ world continues to evolve.  Previously I blogged about the many roles in the data pipeline engineering world; now Maxime Beauchemin adds to the context with an article, “The Rise of the Data Engineer”.

In many scenarios I think the ETL concepts of old are dead.  As I’ve blogged about previously, it’s more an ELT pipeline, since all raw data is worth keeping.
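To make the ELT shape concrete, here is a minimal sketch – SQLite stands in for whatever warehouse you actually use, the table and field names are made up, and it assumes SQLite’s JSON1 functions are available (they are bundled with recent Python builds):

```python
# Illustrative ELT sketch: Extract, Load the raw payload untouched, Transform later in SQL.
# SQLite is a stand-in for a real warehouse; table and field names are hypothetical.
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_events (payload TEXT)")  # keep every raw record as-is

# Load: store the raw JSON untouched, so nothing is lost to early filtering.
raw = [
    {"user": "u-1", "action": "transfer", "amount": 10},
    {"user": "u-2", "action": "login"},
]
db.executemany("INSERT INTO raw_events VALUES (?)", [(json.dumps(r),) for r in raw])

# Transform: derive curated views from the raw data whenever a new question comes up.
db.execute("""
    CREATE VIEW transfers AS
    SELECT json_extract(payload, '$.user')   AS user,
           json_extract(payload, '$.amount') AS amount
    FROM raw_events
    WHERE json_extract(payload, '$.action') = 'transfer'
""")
print(db.execute("SELECT * FROM transfers").fetchall())  # [('u-1', 10)]
```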

I’d say the comment below is particularly key, especially if you’ve got an efficient Continuous Delivery pipeline:

Just like software engineers, data engineers should be constantly looking to automate their workloads and building abstraction that allow them to climb the complexity ladder.

Angular 2 – Relief from the madness of Angular 1

•January 12, 2017 • Leave a Comment

Angular 2 appears to be a breath of fresh air compared to the madness of Angular 1 – $stateProvider was one of those mad features in my view 🙂 Thankfully, with Angular 2 we are back in a component world – I’m not exactly sure why we ever left, given the history of UI frameworks from the Microsoft .NET and Java UI days.

Documentation appears good, as one would expect, on the Angular site, with an architecture overview here.  Probably most useful is the cheat sheet.  Nice callout to a pattern everyone should be familiar with:

Apply the Single Responsibility Principle to all components, services, and other symbols. This helps make the app cleaner, easier to read and maintain, and more testable.

In the templates, it’s worth remembering all the HTML5 features available.

What Digital means for a bank

•January 11, 2017 • Leave a Comment

Two interesting articles on ING’s agile/digital transformation:

What digital means:

We’ve digitized our processes to make transactions clear and easy for our customers. We’ve invested heavily in channels and touchpoints with our customers, introducing mobile and other technologies so that we can offer our services 24/7, anytime, anywhere. We’ve invested in analytics and in getting a 360-degree view of customers to better empower them to make important decisions about their financial assets.

Changing the HR process:

Our HR processes weren’t fit for that purpose, so two years ago, we looked outside—specifically, at Netflix and what they were doing. We found the five-stage Dreyfus model, which is based on observable behaviors, not the number of diplomas you have. It measures how you acquire knowledge, how you apply it, and how you transfer it across teams and the organization. We defined our own five stages of IT-engineering performance. We involved the engineers themselves in this process. We asked them which areas of knowledge and which competencies they would expect themselves and their peers to have at various levels of maturity.

On silos:

As long as you continue to have different departments, steering committees, project managers, and project directors, you will continue to have silos—and that hinders agility

Onboarding:

one important initiative has been a new three-week onboarding program, also inspired by Zappos, that involves every employee spending at least one full week at the new Customer Loyalty Team operations call center taking customer calls. As they move around the key areas of the bank, new employees quickly establish their own informal networks and gain a deeper understanding of the business

 

Data Teams

•January 5, 2017 • Leave a Comment

Another interesting read over at Monzo – Laying the Foundation for a Data Team.  Hard not to agree with the “Three core principles” listed: Autonomy, Cutting-edge managed analytics infrastructure, and Automation.  I’m always impressed with the ideas people come up with when given autonomy over data and thought 🙂  Automation and infrastructure (via code) are, I think, key to any software project, not just data 🙂

Kafka – enough said 🙂  Isn’t it in all data solutions?  “ETL Is Dead, Long Live Streams” discusses Kafka nicely 🙂