Saturday, June 28, 2014

Constant Urgency is a Symptom of a Hazardous Environment

When we use analogies like land-mines, fires, and various types of hell, we're referring to the hazards in our everyday work.  We may not be working with sharp objects or molten metal, but the anxiety, stress, and exhaustion from working in a hazardous environment are very real.

From the outside, these problems are largely invisible.  So despite the hazardous work, we are usually expected to work without making mistakes.  We never have time to fix the hazards because there's always something more important to do.  Building the tools we need for safe development, like failure recovery, diagnostic support, adequate logging, and reliable deployment, is often deferred in favor of more features.

Then when something explodes, as it inevitably will, high-risk heroics are required to save the day.  We work late nights and weekends repairing complex problems by hand and hope that nothing else goes wrong.

Instead of recognizing the symptoms of a serious problem, the long hours and heroics are often rewarded.  Fire-fighting, overtime, and last-minute hacks start to be expected.  Constant stress and exhaustion become the norm.

More people just add fuel to the fire.


Those who don't want to put in the long hours anymore are seen as not pulling their weight.  Frustration builds, the team gets burned out, and the best developers start to leave.

The new guys just make things worse.  They don't know the software or the hazards to watch out for, and they keep messing things up in the code.  We try to hold things together, but it's hard to get anything else done.  It becomes a full-time job just to keep the system from falling apart.

Management doesn't understand why productivity is so poor and tries to add more people to the work.  This just adds fuel to the fire.

Once this cycle gets started, it's hard to turn things around.  We get sucked into the problems, operating in a mode of constant urgency, and we don't want to see our project fail.  So we push ourselves to the limit of stress and exhaustion, doing the best we can.  However, we're so busy reacting to all the things going wrong that there's no time to stop and fix the problems.  One more late night and a few hacks to get things working, but the cycle just doesn't end.

We knew better, but we did it anyway


The worst part about this is even when we know better, we do it anyway.

I remember one night in particular after working 60+ hour weeks for several months.  I checked in some code without running it at all and deployed my changes so I could test it in production.  I was so used to working under constant urgency, I had eventually thrown all my sense of principle out the window.

We had built out the delivery infrastructure and automated our release process from the beginning.  For a while, we were releasing every week; there were challenges, but for the most part things were going fairly well.  We had a major deadline coming up to support a new customer on our platform and investors had been promised it would happen by the end of the year.

The requirements meant drastically changing parts of the architecture and conquering some extremely difficult problems.  How long was it going to take?  We had no idea, but we did know we had better get to work!

We broke down the work and started chipping away at it, trying to do just enough unit testing to get by.  We paired on the more challenging parts and tried to parallelize the work to get it done as fast as we could.  We tried to integrate early, but there were so many problems.  The software produced weird results.  We just had to work through it.


We were caught up in the cycle


Some of us worked on testing and fixing, while others kept pushing along with the remaining features.  We knew we were headed down the path of a monstrous release, but we didn't seem to have any choice.  We worked an insane number of hours troubleshooting problems, just trying to get it stable.

The end of the year was rolling around and we finally got the software in production.  We thought the pain was finally over, but that was just the beginning.  We had no time to build out the infrastructure we needed to make changes safely, and our new users had a long list of complaints.  The pressure just never let up.

Every release, it seemed like things would go wrong.  We'd work all weekend and be up late Sunday night trying to fix deployments gone bad.  The data would be messed up.  Reports wouldn't be right.  We didn't really have a viable plan B.  The system was down, it took too long to restore from backup, and we just had to fix it in production.


Something had to give...


We were so exhausted, but the urgency didn't end.  We were yelled at and threatened whenever things went wrong, but expected to continue the high-risk work.  How could they possibly give us bandwidth for work that wasn't part of the deliverables, when the project was already several months behind schedule?

We had poured so much of our time into the software, and the people on the team were my friends.  We had great developers who had always been disciplined engineers, and yet we all got sucked into the same trap.

Sometimes you just have to leave.  Working under threat and constant urgency makes great people do really stupid things.

Saturday, June 14, 2014

Designing Effective Teams

The same things that make for good software make for good team structure.  We need high cohesion within a team and low coupling between teams.  If people need high-bandwidth communication across team structures to do their jobs effectively, the team structures are usually pretty dysfunctional.  Likewise, if the members of a team don't have a need to talk to each other, they don't really operate as a team either.

Team structure is a design problem.  Developers can be quite good at it once they start to look at it that way.  Designing the team structure around the architecture has a lot of benefits.  However, if you have a hairball of an interdependent architecture, you can't build a good team structure around it.  Trying to throw more people at the problem and artificially carve it apart is often where software organizations fail.

Trying to go faster and throw more people at it often results in going *slower*.  Teams get stuck in a trap of trying to police the code with reviews, and there's no way to keep up.  The best resources can no longer be productive because they spend all their time reacting to a system that is bursting at the seams.  Until the team learns to design the system in a way that lets *communication* scale, leaders need to keep their foot off the accelerator pedal.  We need the time to invest in that critical learning.

I don't think it's a hands-off, let-the-team-figure-it-out kind of problem.  Organizational design is a challenging problem, and we need leadership to help figure it out.  But we need leaders who listen to their engineers, who know what to look for, and who have an appreciation for the challenges of our craft.

Thursday, October 10, 2013

My new dev coaching...

Hey all,
Would you mind helping me send this out to anyone you know who might be interested?  A personalized endorsement would help too, if you know me. :)

I'm trying to get the word out... trying to build a developer coach-to-get-a-job program. :)  My new business experiment.

J

-------------
FREE Personal Development Coaching


Want to learn how to write cleaner code, or write more effective unit tests that aren't so annoying to work with?  How about learning a dev approach that will keep you feeling in control of your code, so the behavior stays predictable?

Sign up for personal one-on-one development coaching with me!  I'll meet with you, teach you, and show you how.

The catch:

1) I'm only taking 5 people, because I need my sanity. :)

2) In exchange for being coached, you have to let me find you an awesome job that will help you grow further.  It's the awesomest place to work that I know of, in fact (other than New Iron, of course).

3) Java takes preference, because I know it well.

4) And you have to be able to pass my interview. :)

Send me an email at janelle@newiron.com if you'd like to sign up.  Since I have limited time, please say a few words about why this is important to you.  And feel free to forward this to anyone you think might be interested!

Wednesday, October 9, 2013

Lovin Life

I think my most productive days have been those roll-out-of-bed-and-work days, where you're just so into what you're doing that it's the first thing that pops into your head when you wake up.  And you can't sleep because ideas keep popping into your head that you have to jot down.  I just love the rush of being so excited about what tomorrow will bring, and of ideas coming together in just the right way; it's almost magical.  I have an awesome life.

Two awesome breakthroughs: 

I figured out a working capacity model for shared business resources so that I could map capacity cost per client & revenue per client.  Talk about some amazing discovery opportunities.  What I've learned over the last week has been incredible.

And the other one: I figured out how to measure understandability and controllability in software so that the metrics matched my subjective evaluation.  I'm way excited about that - it's a huge breakthrough and the crux of my book and research effort!  I had been so close, able to make some things work OK, and now everything seems to be falling into place.

I'm so incredibly excited.  

Wednesday, August 14, 2013

Got Database Pain?

I'm doing free brown-bag talks at companies around the community.  I have a lot of really great material after dealing with DB struggles in many different environments and learning a lot along the way.  We'll go through patterns of common mistakes and how to reduce them, along with strategies for making mistakes less costly when they do happen.  If you're interested, feel free to send me an email and we can schedule it!

Database CI: Practical Strategies for Reducing Database Pain

It's a common challenge - the database gets into the wrong state, the release is delayed, and the entire team is blocked, waiting for the centrally shared resource to be fixed.  Recovering from database mistakes can be quite painful.  Most of the available tools don't really solve this problem either - migrations focus on automating the deployment of the scripts, but don't help much with developing scripts that work to begin with.

But how can we reduce the number of mistakes? And how can we detect our mistakes as early and cheaply as possible?

In this presentation, we'll discuss strategies for reducing database costs in mistake-prone systems.  By adapting continuous integration principles to database work, we can drastically reduce the costs involved in database changes.  With practical examples and patterns for organizing SQL and database build automation, we'll cover strategies for many challenging issues:
  • Managing packages, procedures and views in the database
  • Changes (or mistakes) that can't be rolled back
  • High data volumes that make everything take longer
  • Systems that are hard to keep in your head
To register for this free brown-bag lunch presentation at your company, send us an email at contact@newiron.com and schedule a date! 
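
Here's a rough taste of the kind of build automation we'll walk through: a small CI check that applies every versioned SQL script, in order, to a disposable database on each commit, so a broken script fails the build instead of blocking the whole team.  This is just a minimal sketch - it assumes PostgreSQL and the psycopg2 driver, and the CI_DB_URL setting and sql/ directory layout are placeholders, not a prescription.

```python
# Minimal sketch of a database CI check (assumes PostgreSQL + psycopg2;
# the connection URL and sql/ directory layout are placeholders).
# Every commit applies the versioned scripts to a disposable database,
# so a broken script fails the build instead of blocking the whole team.
import os
import sys
from pathlib import Path

import psycopg2  # pip install psycopg2-binary

SCRIPT_DIR = Path("sql")  # e.g. sql/001_create_tables.sql, sql/002_add_report_view.sql
DB_URL = os.environ.get("CI_DB_URL", "postgresql://ci@localhost/scratch_db")


def apply_scripts() -> int:
    conn = psycopg2.connect(DB_URL)
    try:
        with conn.cursor() as cur:
            for script in sorted(SCRIPT_DIR.glob("*.sql")):
                print(f"applying {script.name}")
                cur.execute(script.read_text())
        conn.commit()      # every script worked: keep the scratch schema
        return 0
    except psycopg2.Error as error:
        conn.rollback()    # a broken script: undo everything and fail the build
        print(f"FAILED: {error}", file=sys.stderr)
        return 1
    finally:
        conn.close()


if __name__ == "__main__":
    sys.exit(apply_scripts())
```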

Saturday, August 10, 2013

Factors in DB Development Cost

I've been working on developing a new database CI tool suite, and was talking with a friend about his DB woes.  We talked about what problems he was having, where they occurred, and how long they were taking to resolve.   His database was really simple.  The scripts were pretty much always correct.  But some scripts might be forgotten, and his problems usually revolved around tracking down missing changes.

I realized my solution didn't fit his problems at all.  I had never worked in an environment that was all that simple.  There were always mistakes in the scripts.  And it was always painful to correct them.  When things went wrong, it was like the DB became the black hole of engineering hours, sucking away everyone's time.  We'd try to repair the mistake by hand.  Or if we couldn't figure out how to repair it, we'd have to restore from a snapshot of production.  And while all these repairs were going on, the engineering department was pretty much down.

Making mistakes was so painful.  And trying to use any of the database migration tools out there didn't seem to solve my problems.  In some cases, it even made them worse.  I always ended up resorting to rolling my own custom database tools.

So that got me thinking... how do you decide what kind of solution you need?

While there are many complex factors that drive DB development costs, there are two that seem to characterize the problem space quite well: frequency of mistakes, and cost of recovery.


When mistakes are rare, and recovery costs are cheap, existing migration tools are the perfect solution.  Most of the effort is spent in keeping databases up to date, and making sure all the scripts are deployed as the changes evolve.   Migrations are excellent at solving that problem, and handle it beautifully.

When mistakes are costly, migration tools can still be very helpful, but depending on how rarely mistakes occur, and how costly they are, migrations might not be enough.  Augmenting a simple migration tool might be a good option.

When mistakes are frequent, there's an entirely different kind of problem going on.  Developers aren't struggling with deploying the scripts, they're struggling to create correctly working scripts.  Migration tools tend to be intolerant of deploying and recovering from broken scripts.  And the tools can make recovery even more cumbersome by imposing additional constraints.

When frequent mistakes are also expensive to resolve... well, life is pain.  There are really only two strategies for a less painful life--figure out how to make fewer mistakes, or figure out how to make mistakes easier to recover from.
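
As a rough illustration of that second strategy - making mistakes cheaper to recover from - here's a minimal sketch: take a snapshot before applying a risky change, and fall back to a restore if the change blows up.  It assumes PostgreSQL with pg_dump, pg_restore, and psql on the PATH, and the database name and script file are hypothetical.  (With high data volumes, the dump and restore themselves get slow, which is exactly when this stops being enough.)

```python
# Minimal sketch of "make mistakes cheap to recover from"
# (assumes PostgreSQL with pg_dump/pg_restore/psql on the PATH;
#  the database name and script file are hypothetical).
import subprocess
import sys
from datetime import datetime

DB_NAME = "app_db"


def snapshot() -> str:
    """Take a custom-format dump before applying a risky change."""
    path = f"before_change_{datetime.now():%Y%m%d_%H%M%S}.dump"
    subprocess.run(["pg_dump", "-Fc", "-f", path, DB_NAME], check=True)
    return path


def restore(path: str) -> None:
    """Recovery becomes a restore instead of an all-hands repair session."""
    subprocess.run(["pg_restore", "--clean", "-d", DB_NAME, path], check=True)


if __name__ == "__main__":
    dump = snapshot()
    try:
        # ON_ERROR_STOP makes psql exit non-zero on the first failing statement.
        subprocess.run(
            ["psql", "-d", DB_NAME, "-v", "ON_ERROR_STOP=1", "-f", "risky_change.sql"],
            check=True,
        )
    except subprocess.CalledProcessError:
        print("change failed; restoring the snapshot", file=sys.stderr)
        restore(dump)
        sys.exit(1)
```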

I haven't found much help in this space in either the open source or commercial market, which is why I set out to try to fill that gap.  And I've built custom tools for solving similar problems several times over now, and had the opportunity to make a lot of mistakes.  Now I get to do something with all that learning, and hopefully help reduce some of the pain out there. :)


Friday, August 9, 2013

Wow how time flies...

Wow, has it ever been a while.  It was January, February, then suddenly it's August.  How the hell did that happen?

Some challenges came up at work that I had to jump in and get involved with, and most everything in my life went on hold... including my book and all the community stuff I wanted to do.  I got through writing chapter 4 back in February, and that's still where I'm at.  But I'm happy to announce that I now have two uninterrupted days per week to focus on making this happen.  I couldn't be more excited!

Since then, the ideas have started whizzing around in my head again.  After going to lunch with an old friend of mine, I was reminded of an idea I had been struggling with.  I had spent at least a month trying to make sense of something I didn't quite get, without the words to describe what I thought was there.  But like a flash in my mind, he gave me a piece of the puzzle I was missing.  It was beautiful.

He was reading Thinking, Fast and Slow, a book I had read half of and actually put down.  He said, "Association triggers from specific to general," and that it had really struck him: "specific to general."

After reading On Intelligence, and working through The Art of Learning myself, I had been thinking about memory sequencing and how it might impact our ability to recognize patterns.  Like when you see some messed-up code: why is it that sometimes an idea pops into your head about how to solve it, and other times it doesn't?  Why does this seem to happen more in some developers' heads than in others?  Is this something that can be learned?  And can we learn how to do it faster?  This was the puzzle I was working on.

Specific to General.

We teach developers design patterns by handing them a book of design patterns - a collection of "aha" moments from our predecessors.  But then, armed with our new knowledge, we don't seem to have the ability to apply them.  When we see our own code, with our own problems, that flash of insight just doesn't happen.  Then for some, with experience, it happens.

Well, you could just say it's experience.  But couldn't we tailor the creation of the right experiences so that you would learn what you need to learn faster?

My friend had the key to my puzzle.  Specific to General.  The memory sequencing is critical, and the specific sequence of recognition is the opposite of what we teach.

If I want to have the kind of insight that leads to a design pattern, I need to experience specific problem instances that map to the more general pattern.  Then, when I scan code, the similarity of its structure to those specific memories should trigger recognition.

Going to have to pick up that book again I think. :)