Thursday, April 2, 2015

From 1 release a month to 6 per day

Presented to Scrum Users Group of South Africa (SUGSA), telling Nomanini's story of how our dev team has optimised our continuous delivery process to increase releases from about 1 per month to more than 6 per day.

In summary:

  1. As a start-up, Nomanini has a limited cash runway and is in a competitive market. We need to develop our product as fast as possible.
  2. Tooling and automation have a compounding return on investment.
  3. In 2011 we were using Scrum. As we started supporting a production system we had more ops and less dev. We struggled to find time to work on new features. Business and customers got frustrated with engineering.
  4. Story points per sprint (week) was a terrible metric: there was too much variability to use it for reliable estimates. Using the average points per week is silly because you will miss your sprint goal about 50% of the time (roughly half the time you complete fewer points than the average). Not great for building confidence.
  5. Scrum was designed with 1 month sprints in mind. Variability is less over longer time horizons.
  6. There is pressure for shorter and shorter sprints to remain 'agile.'
  7. What happens if we have 1 story sprints, with a release at the end? You get kanban. Independent teams/pairs/people can work on independent stories and release independently.
  8. Kanban uses committed-to SLAs and different classes of work, i.e. a deadline per story:
    1. Fast-track for production issues: Release work within 2 days, 80% of the time.
    2. Fixed delivery date: Release committed to work by a fixed deadline, 80% of the time.
    3. Standard: Release a feature into prod within 5 days of starting, 80% of the time.
    4. Background: For minor features or bugs, refactors or dev spikes. Release within 5 days, 80% of the time.
  9. We track how long it takes work to travel through our development pipeline. Before we started measuring, developers thought the bottleneck was themselves. The data proved it was the release process (test, demo, user acceptance testing, roll-out).
  10. We built tooling to improve the test, demo, UAT, roll-out process.
  11. We don't estimate individual stories other than to ask ourselves: "Is there a good chance we can get this done within 5 days (our SLA)?"
    1. If no, then try to make the story smaller while still keeping the business value. (Reduce scope, or split it into multiple independent pieces of work that can be released independently.)
    2. If you can't split the work, then accept that, for this story, you will miss the SLA. That's why the target is 80% and not 100%.
  12. For estimation: Using a history of completed work, you can see the trade-off between the accuracy of an estimate and the number of days quoted to complete work (see the sketch after this list):
    1. There is a 60% chance that work will be done in 4 days.
    2. If you want 95th-percentile confidence, then quote 9 days.
    3. If managers have an "immovable deadline" then they can see what chance they have of hitting it.
  13. For release planning: We can see there is a 50% chance of completing 11 stories in a month, and an 80% chance of completing 7. By looking down the backlog we can see where we will be in 1 or 2 months.
  14. This assumes that the type of work we do this month is similar to the type of work we did last month.
  15. We have built tools that:
    1. manage automated testing,
    2. use feature flags to decouple the release of software on to servers from the release (switch-on) of a feature,
    3. track versions through the development pipeline,
    4. and use statistical methods to detect issues in production.
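
As a rough illustration of points 12 and 13 (this is not our actual tooling, and the numbers are made up), the cycle-time percentiles and a simple resampling of weekly throughput take only a few lines of R:
 # Hypothetical cycle times: calendar days from starting a story to releasing it
 cycle_times <- c(2, 2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 9, 12)
 # Point 12: the confidence vs. quoted-days trade-off is just the percentiles
 quantile(cycle_times, probs = c(0.60, 0.80, 0.95))
 # Hypothetical number of stories completed in each of the last 12 weeks
 stories_per_week <- c(2, 3, 1, 4, 2, 3, 3, 2, 4, 1, 3, 2)
 # Point 13: resample 4-week "months" to see the spread of likely monthly throughput
 months <- replicate(10000, sum(sample(stories_per_week, 4, replace = TRUE)))
 # The story count you have a 50% / 80% chance of reaching or beating is the
 # 50th / 20th percentile of the simulated monthly totals
 quantile(months, probs = c(0.50, 0.20))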

Sunday, March 22, 2015

Visualising Mobile Network Data with Google Big Query and Google Maps

Hiren's and my talk at AfricaCom about using Google Big Query to mine OpenCellID data across 1.2 billion rows, and using Google Maps to visualise the data.

Saturday, March 7, 2015

How I: Find, screen, and hire developers

Recently I made this video with Google as part of their Start-up Launch YouTube series, "How I". The series lets developers and start-up founders share knowledge and best practices on a range of topics, from UI/UX design to engineering to hiring.

I've shared some of my experience building a hiring pipeline for recruiting and onboarding developers.

I'd love to hear your experience recruiting engineers for your start-up.

Saturday, October 4, 2014

Monitoring distributed systems with Google Big Query and R

In this blog post I’m going to explain how we use Google’s Big Query for storing and mining log data, and then use the open source statistical programming language, R, to run statistical tests to determine if new code is behaving correctly.

At Nomanini we believe that Continuous Delivery is a key business differentiator that, over the past three years, has allowed us to catch up and overtake some of our more established competitors.

This year we have increased our delivery rate from 10 production releases per month in January to over 50 releases in September, without increasing our engineering head count. We’ve also scaled from a few hundred devices to almost a thousand. Manually upgrading and monitoring each device is now impossible.

Even with thousands of unit and automated acceptance tests running in CI, there was always some apprehension when it came to rolling out new code to the point-of-sale devices in the field. Broken code is going to prevent sales or, worse, turn a device into an expensive brick.

Rolling out firmware to devices in the field

After a new version of firmware runs the gauntlet of automated tests, it is promoted from CI to Alpha then to Beta status. Our users and their devices are by default subscribed to our Stable upgrade channel, but we’ve convinced a few of them to be Beta testers and their terminals are subscribed to the Beta upgrade channel. (Thanks, Chromium Project for this great idea.) As soon as code is promoted to Beta, devices on the Beta channel begin upgrading to the new firmware.

To limit any potential impact we rate-limit our roll-outs by upgrading only one device every two minutes. Once a device has been told to upgrade, it logs in to the server, downloads the new binaries, and, when the merchant isn't using the terminal, seamlessly boots up the new version of firmware.

Here we have two versions of Master code (our internal name for firmware), one in Beta and the older version in Stable.

How can we be sure that the new Beta version of firmware is not broken in some subtle way that wasn’t detected during automated testing? And how do we know when it’s okay to promote a Beta version to Stable so that all terminals receive the upgrade?

The process that we use is:
  1. Monitor metrics on both the Stable and Beta embedded firmware while running in the field.
  2. Upload the metrics from the devices to our server (Google App Engine) and save to a database. (Google Big Query)
  3. Extract the metrics in to a statistical analysis tool. (R or RStudio)
  4. Run statistical tests to determine if there is any difference between the Stable and Beta releases.
  5. If there is no difference then there is no reason not to promote Beta code to Stable.

In-field monitoring

As with most things embedded, in-the-field monitoring is surprisingly challenging. Terminals are battery operated and use the often unreliable GSM/GPRS network to connect back to our servers. We only have a few hundred kB of flash available for logs. And in some countries data over GSM costs more than $1 per MB, sometimes as much as $10 per MB. So we can’t just upload raw log files or fire off UDP packets like you might do in a data centre.

Apple’s iPad and iPhone usage statistics (Settings - General - About - Diagnostics & Usage - Diagnostics & Usage Data) aggregate key:value metrics over a full day. It looks like they have histogram-type counts (like backlight brightness, with 10% buckets), and counters for numbers of events, seconds, and power in mA.

An excerpt from one of my daily iPad log files:  
Taking that as inspiration, we wrote a small C++ library that can be called by application code that wants to be instrumented. Counts are saved in a C++ key:value map. At midnight UTC the map is serialised to JSON and uploaded to our server.

The JSON data packet looks like this:
  "counts": {
    "ERROR": 7,
    "WARNING": 1475,
    "INFO": 19622,
    "DEBUG": 362754,
    "[E].EventsManager.423": 2,
    "[E].GPSManager.259": 1,
    "[E].SlaveCommsDispatcher.158": 2,
    "[E].SlaveCommsDispatcher.311": 1,
    "[E].SlaveCommsDispatcher.395": 1,
    "CSQ.BitErrors.0": 42,
    "CSQ.BitErrors.1": 1,
    "CSQ.BitErrors.3": 2,
    "CSQ.BitErrors.5": 1,
    "CSQ.SignalStrength.6-11": 18,
    "CSQ.SignalStrength.12-17": 12,
    "CSQ.SignalStrength.18-23": 15,
    "CSQ.SignalStrength.24-29": 1,
    "GPRS.TimeToConnect.0-20": 2
  "firmwareVersion": "4264-548923b591c6",
  "startTime": "2014-09-22 00:00:01.152",
  "endTime": "2014-09-23 00:00:06.574"
There are several things going on here:

The ERROR, WARNING, INFO and DEBUG counts are the total number of debug lines that have been logged by the terminal while the code is running.

For each ERROR or CRITICAL line that is logged the library makes a key in the format [loglevel].filename.linenumber, such as [E].GPSManager.259 and increments the count for that key. We also increment the global ERROR counter.

Logging the filename and line number tells us the region of code that is causing errors. Even without a stack trace we have a good idea what caused the problem. Also, this is not meant to detect detailed errors on particular devices, but rather detect similar errors across many devices so that we can detect buggy versions.

We also use the logging library to build up interesting histograms. An example is the CSQ, or signal strength, a value that ranges from 0 (0%) to 32 (100%). Each time we read the CSQ we increment the correct bucket. This is used for testing whether changes in antenna placement on different hardware revisions have improved signal quality across those devices.

Histograms are also used for timing events: GPRS.TimeToConnect.0-20: 2 means that it took between 0 and 20 seconds both times that the modem connected to the GPRS network. Since there are no other buckets it is implied that all GPRS connections were shorter than 20s.
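
The counting itself happens inside the C++ library on the device, but the bucketing idea is simple enough to illustrate in R with a few made-up CSQ readings (this is a sketch of the concept, not the device code):
 csq_readings <- c(7, 9, 14, 15, 16, 20, 22, 25)   # hypothetical CSQ values
 buckets <- cut(csq_readings, breaks = seq(0, 36, by = 6), right = FALSE,
                labels = c("0-5", "6-11", "12-17", "18-23", "24-29", "30-35"))
 table(buckets)   # counts per bucket, e.g. "6-11": 2, "12-17": 3, "18-23": 2, "24-29": 1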

By default, at midnight (UTC) devices send up their counts in a JSON packet and the counters are reset back to 0. The server streams this data to a table in Google Big Query along with the unique ID of the device that it came from.

To detect problems earlier, Beta devices upload their statistics more often (every 2, 4, 8 or 12 hours) so that we get diagnostics sooner after a new Beta release starts rolling out.

Streaming diagnostic data to Google Big Query

The terminal uploads the JSON diagnostic packet to our application running on Google App Engine during the course of its normal connections to our server. Once the data arrives on the server, it is packaged into a push task that connects to Google Big Query’s streaming insert API, where it is inserted into our diagnostics event_log table.

Creating a View in Big Query

Because all diagnostic events, not just the counters, are streamed into this table, I created a View to pull out only the diagnostic counter data, simplifying downstream processing.

To create a View through the Big Query web UI, compose a query as usual and then click Save View. It will ask for the Project, the Dataset to save the view in, and the Table ID, which is the table name of the view. Once saved, the view can be queried like any other table.

I used the following SQL to create my view:
 SELECT
   device_id,
   JSON_EXTRACT_SCALAR(event_data, '$.firmwareVersion') AS firmware_version,
   SUM(INTEGER(JSON_EXTRACT_SCALAR(event_data, '$.counts.CRITICAL'))) AS criticals,
   SUM(INTEGER(JSON_EXTRACT_SCALAR(event_data, '$.counts.ERROR'))) AS errors,
   SUM(INTEGER(JSON_EXTRACT_SCALAR(event_data, '$.counts.DEBUG'))) AS debugs,
   1e6 * SUM(INTEGER(JSON_EXTRACT_SCALAR(event_data, '$.counts.ERROR'))) / SUM(INTEGER(JSON_EXTRACT_SCALAR(event_data, '$.counts.DEBUG'))) AS error_rate
 FROM [nomanini.event_log]
 WHERE event = 'Counters'
 GROUP BY firmware_version, device_id
 ORDER BY firmware_version DESC, device_id DESC;
In the event_log table, the counters were stored as a JSON object in the event_data column. Another great feature of Big Query SQL is JSON_EXTRACT_SCALAR, which allows you to extract data out of a JSON object by specifying the key path.

From the event_log table I've got the device_id, a unique per-terminal identifier, and from the JSON in the event_data column I've pulled out the firmware_version and the critical, error and debug counts.

I summed the counts using GROUP BY firmware_version, device_id so I get the total number of errors that each device produced on each version of firmware. I've also calculated the error_rate, which is defined as the number of errors per million debug lines, or error/debug * 1e6. (More on why, later.)

Querying this view:
 SELECT * FROM [nomanini.firmware_error_rates] LIMIT 10;  

Getting the data from Google Big Query in to R

There’s a fantastic bigrquery package in R's extensive CRAN repository which allows you to run SQL queries from within R on Big Query and get the results in an R data frame.

R will start the OAuth process and open your browser for authentication. The query runs and the results are saved in to a result data frame.

To query the firmware_error_rates view in Big Query from within R, I run the following code:
 library(bigrquery)
 # The two versions that we want to compare: Beta and Stable   
 firmware_version_beta = "4264-548923b591c6"  
 firmware_version_stable = "4252-c2b7961f0a5b"  
 # The base SQL statement  
 sql_query_a = "SELECT firmware_version, device_id, critcals, errors, debugs, error_rate FROM [nomanini-dashboard:nomanini.firmware_error_rates] WHERE firmware_version = '"  
 sql_query_b = "' ORDER BY device_id LIMIT 100;"  
 # Create the SQL strings, concatenate using paste  
 sql_beta <- paste(sql_query_a, firmware_version_beta, sql_query_b, sep="")  
 sql_stable <- paste(sql_query_a, firmware_version_stable, sql_query_b, sep="")  
 # Run the queries for both the old and new versions  
 v_beta <- query_exec("nomanini-dashboard", "nomanini", sql_beta)  
 v_stable <- query_exec("nomanini-dashboard", "nomanini", sql_stable)  
 # Fill NA's (nulls) with 1 so we can log-transform the data  
 v_beta[is.na(v_beta)] <- 1  
 v_stable[is.na(v_stable)] <- 1  
 # Join two data frames by the common key variable device_id  
 # only device_id’s that are in both v_stable and v_beta will be in the merged data frame  
 merged <- merge(v_beta, v_stable, by="device_id", suffixes = c(".beta",".stable"))  
The code runs two queries on the view, one for the Beta and one for the Stable version; replaces any nulls with 1's (so we can log-transform the data, more on that later); and then joins the two result sets so that only devices that were on both the Stable and the Beta version are included in the merged data frame.

Detecting a bad version of code using the Student t-test and the Wilcoxon Signed-Rank test

We want to detect if a Beta release is 'bad' so that we can stop the roll-out of this new, broken code as early as possible. At the moment we have defined a release as 'bad' if, on average, it logs more errors than the current Stable release.

In the image below, two versions (the blue and green version) are presumed to be of similar quality because the devices on those versions have a similar number of errors. (Each square represents a device, and the x-axis is the number of errors a device had while on that version.)

The devices on the red version have a lot more errors, which is assumed to mean that there is a problem with that version.

To test this statistically we use Student's paired-sample t-test, because we are comparing two versions of code, version_old and version_new, and the same devices are in both groups: each device was on version_old and then, a few days later, was upgraded to version_new. This gives two datasets with the same population (devices) in both samples.

The hypothesis that we are testing is:
  • H0: The null hypothesis: "There is no difference between the distributions of the error rates of the two versions."
  • Ha: The alternative hypothesis: "The new version has a different error rate distribution from the old version." (Note, the error rates could be better or worse, hence the two-sided test.) 

I’ve made some assumptions here:

  1. That more ERROR's are logged on poor releases than on good releases.
  2. That the ratio of ERROR to DEBUG log lines (the E/D ratio) is higher for a poor release (a release that has a problem) than for a good release.
  3. That the E/D ratios for all devices on a version are similar, or at least normally distributed.
  4. That the only reason the E/D ratio changes is due to code changes. Nothing external influences the number of ERROR's. (We know this is false because things like poor connectivity, flat battery, etc. cause errors to be logged, but hopefully there are relatively few of these outliers.)
  5. That the mean of the E/D ratio changes significantly with a poor release and does not change significantly between releases of similar quality.
  6. That running a paired t-test between two versions will detect changes in the sample mean of the E/D ratios.

But first we need to make some data transformations to make the data ‘better behaved’ so that our tests stand a better chance of finding a signal in the noise.

Change from absolute number of errors to an error rate to account for different usage patterns

If you’ve been paying attention you would’ve noticed that I changed from talking about errors to talking about error rate. This is because we need to normalise the absolute number of errors to account for the amount of usage a terminal gets. Terminals that are used a lot tend to have more errors caused by the environment, such as poor GSM and GPRS connectivity, running out of paper, flat batteries causing subsystems to shut down, and many other problems that happen in the field that aren’t related to firmware but should be logged, anyway.

By dividing the number of ERROR lines logged by the number of DEBUG lines logged and multiplying by 1 million, we get the number of errors per million debug lines, a number that allows comparison between devices whether they’ve been running on a version for one hour or one week.
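
For example, the device in the JSON packet shown earlier logged 7 ERROR lines and 362,754 DEBUG lines, giving an error rate of 7 / 362,754 × 1,000,000 ≈ 19 errors per million debug lines.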

Log-Normalise to pull in the outliers for the t-test

Since we have relatively few Beta testers I’ve decided not to remove outliers and throw away data. But this means that there’s a long tail with some distant outliers, and so the data is no longer normally distributed which violates one of the assumptions for the t-test.

To pull in this tail, a common transformation is to log transform the data. [1, 2] In the image below you can clearly see how the log transform made the distribution more normal.

I've chosen log base 10 (as opposed to the natural log, or base 2) so that each unit on the x-axis is an order of magnitude increase in the error rate.

You’ll see in the code snippet above that I set devices that had 0 errors (NA’s or nulls) to have 1 error so that taking the log still works. (And 0 errors per million debugs vs. 1 error per million debugs is equal as far as I'm concerned.)
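
As a quick sanity check of that transformation (a sketch, reusing the merged data frame from the earlier snippet), plotting histograms of the raw and log10-transformed error rates shows the effect described above:
 par(mfrow = c(1, 2))
 hist(merged$error_rate.beta, breaks = 20,
      main = "Raw error rate", xlab = "Errors per million debug lines")
 hist(log10(merged$error_rate.beta), breaks = 20,
      main = "log10(error rate)", xlab = "log10(errors per million debug lines)")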

Running the t-test in R

 > t.test(log10(merged$error_rate.beta), log10(merged$error_rate.stable), paired=TRUE)
 Paired t-test

data:  log10(merged$error_rate.beta) and log10(merged$error_rate.stable)
t = -1.7194, df = 13, p-value = 0.1092
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.9404635  0.1068907
sample estimates:
mean of the differences 

Geometric mean of the differences (base 10): 0.3830131 
(Note that because of the log transform the mean as quoted in the test is actually the geometric mean and not the arithmetic mean. [1])
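
If you'd rather get that back-transformation from code than by hand, something like this sketch works (t.test on the log10 values returns the mean of the log differences in its estimate field):
 t_result <- t.test(log10(merged$error_rate.beta),
                    log10(merged$error_rate.stable), paired = TRUE)
 10^t_result$estimate   # back-transform: the geometric mean ratio of Beta to Stable error rates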

Using the Wilcoxon Signed-Rank test

Probably more correctly, I run the Wilcoxon Signed-Rank test because, unlike the t-test, it does not assume a normal distribution of error rates, which makes it more robust against outliers. For this test, then, I don’t need to log-normalise the data. This also makes interpretation of the results a little easier.
 > wilcox.test(merged$error_rate.beta, merged$error_rate.stable, paired=TRUE, conf.int=TRUE)  
 Wilcoxon signed rank test

data:  merged$error_rate.beta and merged$error_rate.stable
V = 30, p-value = 0.1726
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -505.36438   55.72701
sample estimates:
The median is the change in the error rate between the two versions. Here, -121 means that the new, Beta version has 121 fewer errors per million debug lines than the current Stable version. And we can be 95% confident that the number of errors per million debug lines in the Beta version is somewhere between 505 fewer and 56 more than in the Stable version.

Because the p-value in both our tests is larger than the 0.05 significance level we cannot reject the null hypothesis, and have to accept that "there is no difference between the distributions of the error rates of the two versions", and therefore that the code quality between versions is similar.

Based on this we'd promote the Beta code to Stable.

Note that we cannot say whether the code is truly good or bad, only that if we were happy with the old Stable version then we should be happy with the new Beta version.

Displaying the data

Perhaps easier to interpret than the statistical tests are the box plot and strip chart. Each coloured square in the strip chart represents the error rate for a single device. The box plot shows the min and max, the first and third quartiles, and the dark strip is the median. It also shows any outliers as circles. (The blue version has two outliers on the right.) I’ve vertically separated the plots for readability.

Here you can see that the strip charts as well as the box plots largely overlap each other, so both versions have similar error rates, visually backing up the statistical tests above.
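
For anyone wanting to reproduce a similar chart, here is a rough base-R sketch (the colours and jitter are arbitrary, and it reuses the merged data frame from earlier):
 rates <- list(beta = merged$error_rate.beta, stable = merged$error_rate.stable)
 boxplot(rates, horizontal = TRUE, outline = TRUE, xlab = "Errors per million debug lines")
 stripchart(rates, method = "jitter", jitter = 0.15, pch = 15,
            col = c("steelblue", "darkgreen"), add = TRUE)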

Ok, so does this actually catch problems?

A few months ago we completed a major code refactor in one of our subsystems. We started rolling out the new Beta firmware, and while most of the devices behaved normally after the upgrade, on some devices the code broke because it didn't correctly deal with a corner case that only showed itself in the field, and then only on a few devices.

The errors were logged by the devices, uploaded to the server and detected when we ran these tests. The devices that exhibited the bug are clearly visible in the chart below clustered on the right as the red outliers.

Both the t-test and the Wilcoxon test had p-values below the 0.05 significance level, giving strong evidence that there was a difference between the versions.

 > t.test(log10(merged$error_rate.beta), log10(merged$error_rate.stable), paired=TRUE)
     Paired t-test

data:  log10(merged$error_rate.beta) and log10(merged$error_rate.stable)
t = 2.8624, df = 28, p-value = 0.007872
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.1131325 0.6825117
sample estimates:
mean of the differences

Geometric mean of the differences (base 10): 2.499321

 > wilcox.test(merged$error_rate.beta, merged$error_rate.stable, paired=TRUE, conf.int=TRUE)  
     Wilcoxon signed rank test

data:  merged$error_rate.beta and merged$error_rate.stable
V = 318, p-value = 0.02906
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
     7.075123 14546.761849
sample estimates:

Based on this evidence we stopped the code roll-out. With the help of the [E].filename.linenumber counters we were able to determine what caused the problem, write a unit test that recreated the bug, fix the code and roll out a new version.

The future

There are many things that I'd like to do, mostly around automating the detection process more and presenting the devops team with a nice overview dashboard of how a roll-out of new firmware is going. 

For this I'll probably run OpenCPU, a RESTful API for R, on an instance of Compute Engine so that R functions can be called by our production app running on AppEngine.

Some other ideas:
  1. Run tests automatically every hour and flag outliers for investigation by support staff.
  2. Automatically pause a roll-out if the statistical tests detect a significant change.
  3. Monitor [E].filename.linenumber's that are strong indications of specific types of problems, not just the global ERROR count.
  4. As we accelerate the number of releases to more than one firmware release per day we will need to increase the number of beta testers so that we can get test results sooner after beginning a roll-out.


The internet is awash with information on learning R and statistics in general. One of the best statistical textbooks I've ever read is Learning Statistics with R: A tutorial for psychology students and other beginners which the author has made available as a free pdf download. It's my go-to textbook every time I get myself confused.

Sunday, May 11, 2014

Parking at Cape Town International Airport: Cost Calculator

Parking at airports is expensive and Cape Town International is no different. Every time I return from a trip I'm surprised by how much my parking cost. Sometimes it's more expensive than the flight.

I made a quick chart of parking price vs days parked for various options, including getting an Uber Taxi or MyCiti bus from Cape Town CBD to the airport.

Cape Town International Airport parking cost chart

While the MyCiti bus is the cheapest, I want a door-to-door service (either my own car or a taxi.) But if you're on a budget then MyCiti is the clear winner (about R50.00 each way.)


  • Shaded parking in P3, P4 is cheapest up to 5 days.
  • Long Stay parking in P5 is cheapest up to 12 days.
  • And then Uber to the airport and back is the best option.
Interestingly, the Katanga Valet parking (which includes a wash, vacuum and polish) is not much more expensive than the Garage parking (in P1, P2.) Having used them before, I'd recommend them - especially if you are late and just want to dump your car and run to the check-in.


Wednesday, May 7, 2014

Interns: How to improve your CV

tl;dr: Make your CV easy to parse and ensure it stands out from all the other CVs that I receive. I am looking for passion and a history of self-directed learning.

Last Saturday I attended #breaktherules hosted by the University of Cape Town Developer Society to help place Comp. Sci. student interns at local companies.

Even though Nomanini is a small company (compared to the likes of Amazon and Oracle, who were also there) we received well over 100 CVs, and Gerrit and I spoke to more than 50 hopeful students.

Most of the CVs that we received were useless as a recruiting tool.

In my follow-up emails to everyone who gave me their CV I included some pointers, which I thought I would expand on here.

My biggest piece of advice: Make your CV easy to parse and ensure it stands out from all the other CVs that I receive. I am looking for passion and a history of self-directed learning.

Things which help are:

  • Clearly show your degree and major and your current year of study. (Don't make me work it out based on when you started, or when you hope to graduate.)
  • Make it easy to find your name, email address and cell number. Add a photo to help me recall who you are from all the conversations that I had.
  • Write a paragraph on personal projects that you have worked on outside of your course work. Include links to your code and applications.
  • Describe the sort of work that you'd like to do during your internship as well as what you hope to gain from an internship. 

Less important (to me) is

  • listing every course/module that you have done or your high school results. (Highlighting subjects where you got 90%+ is fine, but I really don't care that you got 63% for "Word Power" two years ago.)
  • your references. If I need them, I'll ask.

Really useless info is your age, your health status (always "Excellent"), marital status and driver's license.

I also hand out my business card at these events. If you really want to join my company, email me afterwards, attach your CV, and write a cover letter telling me why you are a perfect fit.

Remember, you are competing with your peers who largely have the same skills and experience that you do. Use any advantage to differentiate yourself.

Sunday, February 2, 2014

8tracks iPod Shuffle

I love 8tracks. And I love listening to my music while running or in the gym. Up to now I've used 8tracks’ fantastic iPhone app while out, and their website and iPad app when at home or at work.

Streaming music over my phone's 3G is expensive, and strapping a phone to my arm is uncomfortable (and frankly looks silly.)

Also, Apple's matchbook sized iPod Shuffle, which holds 2GB of music, is a thing of beauty.

I wanted to marry the two and clearly I'm not the only one thinking along these lines: Check out Alecsandru Grigoriu's fan art product brochure.

This weekend I decided to write a simple 8tracks client in Python that saves the audio files locally for later upload to my Shuffle.

I’m using the 8tracks API and the Python Requests library, which vastly simplifies HTTP comms. (I can’t believe I used to use httplib for this stuff…)

The client is its own user on 8tracks, which happens to follow another user (me) and also just happens to play all the mixes in my iPod Shuffle collection. The client is well behaved: it downloads the tracks only as fast as if it were really playing them and reports the songs as played at the 30s mark, as per the API docs.

For anyone who’s interested I’ve got the code on BitBucket at Feel free to contribute.