For many years now we've had an integrated crash reporting system. This has helped improve the stability of our applications immensely (often reports now start out with “Wow, this is the first crash I've seen…”). But it hasn't always been clear (especially in the alpha or beta timeframe) how well we were doing on overall stability. We could guess by comparing the number of crash reports against an estimate of the number of active users, but that wasn't very convincing.
Near the beginning of June, I added some support to our software update and crash logging frameworks to keep track of things like:
- total times the application has been launched
- total number of crashes
- total amount of time the application has been running
(As always, our software update system reports its information without including any personal details, and can be disabled entirely if so desired.)
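The counters above could be kept in something as simple as a small persistent store that the app updates as it runs. The post doesn't describe the actual implementation, so here is a minimal sketch with hypothetical names (`UsageStats`, `record_launch`, etc.) of what tracking launches, crashes, and accumulated run time might look like:

```python
import json
import os


class UsageStats:
    """Hypothetical sketch of the counters described above: total
    launches, total crashes, and total running time, persisted to a
    small JSON file between runs."""

    FIELDS = ("launches", "crashes", "run_seconds")

    def __init__(self, path):
        self.path = path
        self.data = {field: 0 for field in self.FIELDS}
        # Load any counts accumulated during previous runs.
        if os.path.exists(path):
            with open(path) as fp:
                self.data.update(json.load(fp))

    def record_launch(self):
        self.data["launches"] += 1
        self._save()

    def record_crash(self):
        self.data["crashes"] += 1
        self._save()

    def add_run_time(self, seconds):
        self.data["run_seconds"] += seconds
        self._save()

    def _save(self):
        # Write immediately so a crash doesn't lose the counts.
        with open(self.path, "w") as fp:
            json.dump(self.data, fp)
```

Writing the file on every update is deliberate in this sketch: since the whole point is to count crashes, the counters can't wait for a clean shutdown to be saved.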
Using this, we can now chart the total number of hours OmniFocus has been running vs. the total number of crashes (reported or not!). Whether the pool of people testing OmniFocus grows, some testers go idle, or a user with a large number of crashes isn't reporting them, we don't have to wonder as much how that affects our average crash rate.
After my latest crash fix, our rate has improved to about 8000 hours per crash. We aren't sure yet what constitutes a reasonable lower limit for hours/crash, but this does let us notice when a fix we've made is actually addressing the issue. We aren't yet tracking the number of hours that the application is active (an hour spent hidden counts the same as an hour spent in full use). Whether this matters, when averaged across a large number of users, is open to question.
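The metric itself is just a pooled mean time between crashes: total running hours across all users divided by total crashes. A small sketch (the specific figures below are illustrative, not the actual data):

```python
def hours_per_crash(total_run_hours, total_crashes):
    """Pooled mean time between crashes across all reporting users.

    Returns None when no crashes have been recorded yet, since the
    ratio is undefined (and "infinitely stable" is too optimistic).
    """
    if total_crashes == 0:
        return None
    return total_run_hours / total_crashes


# For example, 240,000 pooled hours with 30 crashes works out to
# 8,000 hours per crash (illustrative numbers only).
print(hours_per_crash(240_000, 30))
```

Because the hours and crashes are summed over the whole pool before dividing, a heavy user who never files reports still contributes their crashes and hours to the average, which is exactly the property described above.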
Still, there are only 8760 hours in a year, so if we can get above that, we'll be feeling pretty good.