Friday, April 8, 2011

Assumptions, Predictions, and Plans

Another old post, I really liked this one.  Although, now I think there are more options to keep steering the right way that don't necessarily involve rumble strips/feedback mechanisms.  The metaphor kinda breaks down... but, you can also steer the right way by making sure you can only steer the right way. :)

Poppendieck has a beautiful quote on this topic that I really loved; I'll have to find the whole thing at some point.  But basically, "predictions don't create predictability."

--- Feb 11, 2008

One of the hardest habits to break may be the trust and reliance on our assumptions, predictions and plans. The idea that making a prediction will somehow rein in the unpredictable and lead us to reliable results. Despite repeated failures caused by the guidance of a false sense of knowledge, we continue to walk down the same path. Our reaction to the failure is to reinforce the same habit: we assume our plan was just faulty, and that we should put more effort into creating better plans.  Why don't we just stop?

Trying to imagine structure and order around something that is by nature unstructured and disorderly only provides an illusion of control. And that illusion will only lull us into more false assumptions, bad decisions and unreliable results. Instead of trying to plan your way to success, just throw your fear and discomfort out the window. Let go of any illusion that you have control, that you know what your customers want, that your solution will solve their problem. Because you know what? You don’t know that. You just want to think you do.

So how do we get to reliable results, despite an unreliable world? By creating knowledge. So your customers have a problem to solve. How do you KNOW that your solution will solve their problem? When you’ve solved it. That’s right, not any sooner. At that point, you will be able to grasp onto a tangible reality that you have created value for your customer. This is knowledge.

Suppose your code needs to find the latest edition of a book. How do you KNOW that your code does that? You test it. At that point, you know that under some specific conditions, the code behaves as you expect. It’s concrete feedback on reality... it's knowledge. How do you know that in creating this new feature, you didn’t just break the one you wrote last month? Again, through feedback. If there are tests in place to protect all existing functionality, you already know.
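
To make that concrete, here's a minimal sketch of the kind of test I mean (the Book class and findLatestEdition() here are made up just for illustration):

    import static org.junit.Assert.assertEquals;

    import java.util.Comparator;
    import java.util.List;
    import org.junit.Test;

    public class LatestEditionTest {

        // Tiny stand-in domain class, invented for this example
        static class Book {
            final String title;
            final int edition;
            Book(String title, int edition) { this.title = title; this.edition = edition; }
        }

        // The behavior under test: pick the highest edition number
        static Book findLatestEdition(List<Book> books) {
            return books.stream()
                    .max(Comparator.comparingInt(b -> b.edition))
                    .orElseThrow(IllegalArgumentException::new);
        }

        @Test
        public void findsTheLatestEdition() {
            List<Book> editions = List.of(
                    new Book("Lean Software Development", 1),
                    new Book("Lean Software Development", 2));

            // Under these specific conditions, I now KNOW the code does what I expect
            assertEquals(2, findLatestEdition(editions).edition);
        }
    }

The toy example isn't the point; the point is that the assertion is concrete feedback, and it keeps protecting that piece of knowledge every time the suite runs.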

Even if you don’t know where you are headed, you can still steer in the right direction to reliable value. At every possible opportunity, instrument your process with feedback that builds knowledge, and let that knowledge guide you to delivering reliable value. Get your software in your customer’s hands as often as possible, so you know if you’ve added value. And if not, adjust. Find out if that code really does what you think, test your assumptions. The longer you go without feedback, the further you are likely to be off track.
Create knowledge. And let it do the steering.

A Rough Day in Training...

Another old post, this was one of my first adventures trying to do team-help consulting education type of stuff.  I thought I could just open up their skulls, dump in some knowledge, they would all say 'aha!' and we would be on our way to making things better.  Needless to say, it didn't quite work out that way.  This experience really stayed with me though, and I spent a lot of time reflecting, and trying out new ideas for how to effect change.  But if people don't care to learn, don't want to try, or don't think they need to improve, then it's better to go find a problem worth my time.

--- Feb 23, 2008

So yesterday I just had my first presentation on Lean Software Development. I did my best to make the presentation interactive to engage folks, but wow, I had no idea what I was up against.
So right before this meeting, earlier in the morning, the team decided that this would be a waste of their time because they already knew all this stuff.

Mind you, the reason I was asked to help out this project to begin with was because the project lead, who used to be the tech lead on my team, could see the headlights of the software train wreck coming his way... but didn't know how to fix it, and asked me for help. But these guys knew it all, right? They had it all down. Walking down those train tracks, completely oblivious to the fact that they had fundamentally changed nothing, other than that they were on a new project and had started over. Every project always starts with the best of intentions.

So my presentation went through a reality check of why waterfall is broken. Why it's so expensive. Why you create this big hole of technical debt that is next to impossible to reverse. On my team we managed to get some hold on the problem and start to decrease the debt, but we are so far down the cost curve at this point that EVERYTHING is expensive. It's a beastly legacy project.
So the team continued with a "ya, ya, speed it up, we know this stuff" attitude... and then we started talking about Lean.
So here's what I was up against, with their defensive guard up and unwilling to look at the problem another way (in pain order):
  1. Inspecting for defects and preventing them is the same thing
  2. Agile is the same thing as waterfall in a smaller box
  3. We must understand a solution to a problem to be able to make progress in solving it.
    They freaked when I said this: With an empirical control system (feedback and quick response, aka agility), you don't have to know where you are going, you just need to know enough to steer in the right direction. (When I said this, it was like I said the aliens landed, or the world wasn't flat. I was just dismissed as crazy.)
  4. A good bug tracking system is critical to a successful project. (what if you have no bugs? This was still unfathomable)
  5. Customer sign offs on requirements have no impact on what the customer decides to put in the requirements.
  6. A stream of features is the same as a stream of value (major team risk of feature bloat)
The part that bothered me the most was that they left the meeting with the same sense that I was crazy and they knew it all already.

How do I get these folks to question their thoughts? Is it just a waste of time to try and help them at all, especially when some people actually want my help? Does anyone have any thoughts on strategies for poking some holes in their reality that they'll have to reconcile?

Bleeding Away Our Knowledge

Kerry Kimbrough wrote this a while back (again, killing my other blog), and today, I think it mostly rings true for me...  except that I've always found it relatively easy to come 'up-to-speed' on a project quickly by reading the codebase and learning the domain.  Even with a project I'm working on now, which has an extremely confused architecture, I was able to figure out what I needed to understand to be productive on the project and to see how it might be rearranged to make way more 'sense'.  The tests don't help much at all in this case at expressing intent, but I still think they can.  And I've found working at trying to make them express intent quite valuable when I have to look at them again.  What I'm still missing is all the whys... especially for the decisions that aren't intuitive.

Even still, seeing the impact of no clear architectural direction and no common understanding of the design, I see what a mess it causes.  And while I may be able to speed read through a code base, not everyone can do that as easily.  And I have to work with those people. :) Whatever it takes to maintain that common knowledge and keep folks on the same page has gotta be much more cost effective than the impact of not having it.

--- Feb 27, 2008, Kerry Kimbrough

Janelle observed that a sign of a troubled project is an exponentially increasing cost per change over time. My response was "No, this is virtually an invariant for any software system." You can quibble about whether the $/change/year is really exponential, what the coefficient is, etc. But always upward.
I think this issue is much deeper and more intractable than agility vs. waterfall. Agility per se doesn’t help and may even hurt.

Here’s the problem: dissipation of the knowledge behind the system design. The system design is the result of a large number of design choices. There is a huge amount of knowledge embedded in which choices were made and, more importantly, why. In most teams, this knowledge is completely tacit, and never really exists outside the skulls of a few people. What’s left is the code itself, which typically does not capture design knowledge (only the effect of the design), much less the design rationale. The result is that every individual has to guess for himself. Eventually, none of the skulls with the original design knowledge remain on the project. The code gradually becomes the vector sum of several different design concepts, some of them confused and faulty. This increasing incoherency and resulting complexity eventually outstrips anyone’s ability to fully comprehend it. $/change increases because of the increasing effort required to know how things work and why things break.

Agility hurts to the extent that it refuses to create and maintain the non-code artifacts required to capture the design knowledge and rationale. Most teams fail at this difficult job, but XPers actually stand there and say you shouldn’t even try!

But the benefit of actually making the design an explicit real-world artifact is not just holding down $/change/year. The real benefit is that the discipline of creating and maintaining this artifact leads to a better, simpler, higher-quality design. The real benefit of taking it out of the skulls and into the real world is that the whole team can learn it, share it, and improve it.

It needn’t conflict with agility. You don’t need BUFD, but you do need a design that evolves incrementally. As the tests drive the code, so the design drives the tests.

Is Automated Testing Waste?

I wrote this several years ago, and I have mixed feelings about it now.  Wanted to hold onto it since I'm killing my other blog.  I think there is a blurry line between knowledge management and unnecessary work with automated testing.  I've mostly settled on this: keeping the knowledge around is the value-add part, and the rest of it is all a mechanism for trying to accomplish that, so if you could reduce that mechanism to zero without impacting the knowledge part, then you should.  I originally wrote this after the value stream mapping process with our software project, when we were discussing ways to do less automation work and I was struggling with how that made sense.

--- Feb 28, 2008

When you are designing something, the knowledge you build from the discoveries along the way and the mistakes you make are important inputs into designing the code that solves a user's problem. If you have to relearn this stuff, it's waste. Maintaining knowledge is a crucial part of lean design. When we automate a test, we codify various things that we learn and preserve that knowledge as part of the system.

Once we learn something about what the software needs to do to work, or what it shouldn’t do, or what we are intending it to do… we can build protectors into the software, so that whenever one of these important things we've learned is violated, the system tells us and we can prevent it from harming the software. And when we are working on changing that software, we have examples that tell us the intent of the software, so that we don’t have to figure all of this out again and again.
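
As a rough illustration of what I mean by a protector (the pricing rule and names here are invented, not from any real project), the test below encodes something a team might have learned the hard way, so the system speaks up the moment that lesson is violated again:

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class DiscountProtectorTest {

        // Invented rule, learned "the hard way": stacked discounts must never
        // drive a price below zero.
        static double applyDiscounts(double price, double... amountsOff) {
            for (double off : amountsOff) {
                price -= off;
            }
            return Math.max(price, 0.0);   // the lesson, baked into the code
        }

        @Test
        public void stackedDiscountsNeverProduceANegativePrice() {
            // Two generous coupons that together exceed the price
            double result = applyDiscounts(100.0, 60.0, 60.0);

            // The test name and assertion preserve the intent, so nobody has to
            // rediscover this rule the next time the code changes.
            assertEquals(0.0, result, 0.0001);
        }
    }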

QA resources have a unique and valuable skillset that is critical to building knowledge about a system. If we can use these knowledge building abilities to prevent bugs rather than detect them, we can accelerate our ability to solve customer problems, which adds value.

Who came up with the "Hardening Sprint"?

I think this one might take the cake for stupid things invented that have led to institutionalized delusion and "Agile" dysfunction.  It's at least up there on the list.

Most projects and teams that were never brought up in an agile development context, and that end up getting forced into one, start with an Agile bow wrapped around them, usually by the name of Scrum.  So now we iterate through our waterfalls!  And at the end, because we are in "transition", we of course plan for a "hardening sprint".
Now from my experience, on a mid-sized project with a negligible amount of automated testing, the variability in identifying the source of a defect is somewhere around 10x.  Things closer to the skin of the app may be really quick; breaks in the guts usually take some time to trace down and can end up being quite complicated; defects around concurrency, performance, data or environment issues can unravel pretty quickly.  Anything other than "skin-level" is more like finding a needle in a haystack.  The bigger the pile of hay (changes) to sift through, the longer it will take... and not in any kind of linear way.

So I have this highly variable "hardening" activity, and then add to that, I can't drop any scope. I have to harden until it's hard enough that I can actually put it in production... and then, oh ya, it's gotta fit in these 2 weeks planned at the end of the release.  Really? 

Managers like plans.  They like the world to fit neatly into little boxes, to put milestones up on PowerPoint slides and green dots next to their projects to let everyone know that all is well.  And then we gave them this grand idea that it was ok to put a massively variable, scope-bound activity inside a "hardening sprint" box and plan the world around it like it was as likely to happen as anything else.  If you have a -reasonable- manager, they might ask, ok, well give me a more realistic date then to put on my PowerPoint slide.  If you really padded the date to cover the real variability involved here, it'll never make it into the slide show.  So, the date becomes "what we hope" will happen.

And even if you've got a super-energetic team that will burn themselves out for you with stay-up-late heroics to make the release date, overtime can only cover a 2x variability, tops.  Beyond that, it just ain't gonna happen.  The date will just keep slipping.  You keep setting new dates, and those might slip too, and ultimately the hardening will be done when it's hard enough.  And in some cases it might never be.  It's not hard for these kinds of things to end up binary.  It's challenging work, and if you don't have the skills on the team, or the dev that had the app all in his head walked off the project, you might never figure it out.  You might not ever be able to ship.  There might not be any light at the end of that tunnel... in which case it becomes a decision of how much money you want to sink into it before pulling the plug.  That's a scary place to be.

You can't fix this problem by trying to improve your estimates.  Thinking harder about it won't help.  Better predictions will not improve your predictability.  If you want predictability, you gotta be more predictable.

Shrink the haystack.  Even for the craziest, most complex defect you can imagine, if you can illustrate that it works fine with one version of the code and breaks with a slightly different version of the code, I can guarantee that it will greatly reduce the variability in finding the source of the problem.  When you only have a couple pieces of hay lying over that needle, it's much easier to find!

Today I ran into a seriously obscure defect that, under other circumstances, I could see taking weeks or months to track down.  But it took me a couple hours because I had a single-line code change toggle.  (I know, a couple hours, huh? It was a crazy bug!)  When I added a Spring @Transactional annotation to my test method, my unit test broke; when I removed it and it ran with auto-commit, the test passed.  So of course, my thought was... aha! Something must be broken that had to do with the transactions! And down the rabbit hole I went.  But as I dug into it more, I found out the DATA was actually being returned differently from the query when I was in a transaction, even though the data was the same in the table!  WTH!   It turned out to be a bug or misunderstood behavior with MyBatis and its default caching (which apparently you can't disable)... some other random part of the code loaded in the same Spring context was modifying an object returned from a query, and I happened to be calling the same query, and when I did, I got back their modified object!  Pretty insane bug.  But I had a little haystack.
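
For anyone who hits the same thing, here's a rough sketch of the behavior I was seeing.  BookMapper and Book are made up, and you'd need a real database plus a configured SqlSessionFactory for this to actually run; the mechanism is MyBatis's per-session local cache handing back the same object instance for an identical query inside one session/transaction:

    import org.apache.ibatis.session.SqlSession;
    import org.apache.ibatis.session.SqlSessionFactory;

    public class LocalCacheSketch {

        // Sketch only: shows how a mutated query result can "come back"
        // from an identical query inside the same session/transaction.
        static void demonstrate(SqlSessionFactory factory) {
            try (SqlSession session = factory.openSession()) { // one session, like one transaction
                BookMapper mapper = session.getMapper(BookMapper.class);

                Book first = mapper.findById(42);
                first.setTitle("mutated by some other code");   // something else "touches" the result

                Book second = mapper.findById(42);              // same query, same session
                // The per-session local cache hands back the SAME object, mutation
                // and all, even though the row in the table never changed.
                assert first == second;
            }
            // With auto-commit and no shared session, each mapper call runs in its
            // own session, so the second query re-reads the database and the
            // mutation never shows up... which is exactly the toggle I was seeing.
        }

        // Hypothetical mapper and domain class, just to make the sketch compile
        interface BookMapper {
            Book findById(long id);
        }

        static class Book {
            private String title;
            public String getTitle() { return title; }
            public void setTitle(String title) { this.title = title; }
        }
    }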