Friday, April 8, 2011

Who came up with the "Hardening Sprint"?

I think this one might take the cake for stupid things invented that have led to institutionalized delusion and "Agile" dysfunction.  It's at least up there on the list.

Most projects and teams that were never brought up in an agile development context, and end up getting forced into one, start with an Agile bow wrapped around them, usually under the name of Scrum.  So now we iterate through our waterfalls!  And at the end, because we are in "transition", we of course plan for a "hardening sprint".
Now from my experience, on a mid-sized project with a negligible amount of automated testing, the variability in identifying the source of a defect is somewhere around 10x.  Things closer to the skin of the app may be really quick; breaks in the guts usually take some time to trace down and can end up being quite complicated; and defects around concurrency, performance, data, or environment issues can unravel pretty quickly.  Anything other than "skin-level" is more like finding a needle in a haystack.  The bigger the pile of hay (changes) to sift through, the longer it will take... and not in any kind of linear way.

So I have this highly variable "hardening" activity, and then, add to that, I can't drop any scope. I have to harden until it's hard enough that I can actually put it in production... and then, oh yeah, it's gotta fit in these 2 weeks planned at the end of the release.  Really?

Managers like plans.  They like the world to fit neatly into little boxes, to put milestones up on PowerPoint slides and green dots next to their projects to let everyone know that all is well.  And then we gave them this grand idea that it was OK to put a massively variable, scope-bound activity inside a "hardening sprint" box and plan the world around it like it was as likely to happen as anything else.  If you have a -reasonable- manager, they might ask, "OK, well, give me a more realistic date then to put on my PowerPoint slide."  But if you really padded the date to cover the real variability involved here, it'll never make it into the slide show.  So the date becomes "what we hope" will happen.

And even if you've got a super-energetic team that will burn themselves out for you with stay-up-late heroics to make the release date, overtime can only cover a 2x variability, tops.  Beyond that, it just ain't gonna happen.  The date will just keep slipping.  You keep setting new dates, and those might slip too, and ultimately the hardening will be done when it's hard enough.  And in some cases it might never be.  It's not hard for these kinds of things to end up binary.  It's challenging work, and if you don't have the skills on the team, or the dev that had the whole app in his head walked off the project, you might never figure it out.  You might not ever be able to ship.  There might not be any light at the end of that tunnel... in which case it becomes a decision of how much money you want to sink into it before pulling the plug.  That's a scary place to be.

You can't fix this problem by trying to improve your estimates.  Thinking harder about it won't help.  Better predictions will not improve your predictability.  If you want predictability, you gotta be more predictable.

Shrink the haystack.  Even for the craziest, most complex defect you can imagine, if you can show that it works fine with one version of the code and breaks with a slightly different version, I can guarantee it will greatly reduce the variability in finding the source of the problem.  When you only have a couple pieces of hay lying over that needle, it's much easier to find!
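To make that concrete, here's a minimal sketch of what a one-line toggle can look like in a Spring test (this is the same kind of toggle as in the story below).  The OrderDao, the Order type, and the context file name are all made up for illustration:

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
import org.springframework.transaction.annotation.Transactional;
import static org.junit.Assert.assertEquals;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration("classpath:app-context.xml")
public class OrderDaoTest {

    // Hypothetical DAO, just enough to make the example hang together.
    public interface Order { String getStatus(); }
    public interface OrderDao { Order findOrder(long id); }

    @Autowired
    private OrderDao orderDao;

    // Toggle this single annotation: with it, the whole test runs inside
    // one Spring-managed transaction; without it, each DAO call runs
    // auto-commit.  If the result flips with the toggle, your haystack
    // is exactly one line tall.
    @Transactional
    @Test
    public void returnsFreshDataFromTheTable() {
        assertEquals("PENDING", orderDao.findOrder(42L).getStatus());
    }
}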

Today I ran into a seriously obscure defect that, under other circumstances, I could see taking weeks or months to track down.  But it took me a couple hours because I had a single-line code change toggle.  (I know, a couple hours, huh? It was a crazy bug!)  When I added a Spring @Transactional annotation to my test method, my unit test broke; when I removed it and the test ran with auto-commit, it passed.  So of course my thought was... aha! Something must be broken that had to do with the transactions!  And down the rabbit hole I went.

But as I dug into it more, I found out the DATA was actually being returned differently from the query when I was in a transaction, even though the data was the same in the table!  WTH!  It turned out to be a bug, or misunderstood behavior, with MyBatis and its caching by default (which apparently you can't disable)... some other random part of the code loaded in the same Spring context was modifying an object returned from a query, and I happened to be calling the same query, and when I did, I got back their modified object!  Pretty insane bug.  But I had a little haystack.
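For the curious, here's a minimal sketch of the cache behavior I'm describing, using MyBatis's SqlSession directly.  The "selectUser" statement and the User class are stand-ins; the real point is the session-scoped local cache that MyBatis keeps by default:

import org.apache.ibatis.session.SqlSession;
import org.apache.ibatis.session.SqlSessionFactory;

public class LocalCacheGotcha {

    // Stand-in result type for the mapped "selectUser" statement.
    public static class User {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    public static void demo(SqlSessionFactory factory) {
        // One SqlSession is roughly what a single Spring-managed
        // transaction boils down to, and MyBatis keeps a session-scoped
        // local cache of query results by default.
        SqlSession session = factory.openSession();
        try {
            User first = session.selectOne("selectUser", 42);
            // "Some other random part of the code" mutates the result...
            first.setName("mutated");

            // Same statement + same parameter = local cache hit: MyBatis
            // hands back the SAME instance, mutation included, without
            // ever re-reading the (unchanged) row in the table.
            User second = session.selectOne("selectUser", 42);
            System.out.println(first == second);     // true
            System.out.println(second.getName());    // "mutated"
        } finally {
            session.close();
        }
        // Under auto-commit (no @Transactional), each call typically gets
        // its own short-lived SqlSession, so the cache never spans two
        // calls and the test sees fresh data from the table.
    }
}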
