I'm doing some research on how teams think about older codebases, and I'd love everyone's take on this. No wrong answers, just trying to understand how different teams or organizations define "legacy code".
No one has maintained it in a while, and if a change is needed, it seems like a full re-write would be the better option.
Or the whole company has moved on to something new, and you’re maintaining the code to keep the legacy systems going until everything has migrated off it… so it needs to run, but no one really cares.
To me 'Legacy code' means the person who wrote it (or most of it) is not fixing / changing it.
It implies you cannot ask them "WTF is this?" or, more critically, "Why did you do this?" Knowing the answer is of course critical to avoid tearing down a 'Chesterton's Fence', which I learned about here on HN and now find to be such a critical aspect of documentation.
Having worked on code that was anywhere from 1 to 30 years old, in my experience "legacy" mostly means, "we wouldn't do it this way now, but it's not an easy thing to fix so we're either stuck with it, or stuck with it for a while anyway".
"Legacy" means "don't blame me, I'm not saying it should be this way, just that is is this way".
Which is interesting because outside of code, the word "legacy" is usually positive.
“Legacy” means that tens if not hundreds of people made something absolutely ridiculously amazing that has stood the test of time and probably made lives better for millions of people (depending on how large the customer base is).
Michael Feathers, author of the book "Working Effectively with Legacy Code", [f] defined legacy code as code without automated tests.
From memory, the argument is that once you have automated regression/characterisation tests with sufficient coverage around the part of it you need to change, you're in a good position to attempt to make a change since you will be alerted if you are unintentionally introducing regressions. Without automated tests you can't efficiently and confidently make changes without significant risk of regressions. Feathers' book discusses testing and refactoring techniques to take OOP-ish application code that has no automated tests and get enough tests in place to let you change what needs to be changed in a controlled way.
A consequence of Feathers' definition of legacy code is that fresh code you wrote yesterday in a trendy programming language still gets classified as legacy code if you didn't also write the supporting test suite to help others maintain it in future! It's not a perfect definition, some might find it provocative, but it's both pragmatic and actionable.
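To make the characterisation-test idea concrete, here is a minimal sketch in Python. It is not taken from Feathers' book: `legacy_shipping_cost` and its pricing rules are hypothetical stand-ins for some untested function you have inherited, and the expected values are assumed to have been recorded by running the existing code once rather than derived from any spec.

```python
import unittest


def legacy_shipping_cost(weight_kg, express):
    """Hypothetical stand-in for inherited code whose rules nobody remembers."""
    cost = 4.0 + 1.5 * weight_kg
    if weight_kg > 20:
        cost += 10.0  # surcharge of unknown origin; do not "fix" it yet
    if express:
        cost *= 2
    return round(cost, 2)


class ShippingCostCharacterisation(unittest.TestCase):
    """Pins down what the code does today, not what it ought to do."""

    def test_current_behaviour_is_preserved(self):
        # Expected values were captured from the current implementation.
        recorded = {
            (1, False): 5.5,
            (20, False): 34.0,
            (21, False): 45.5,
            (21, True): 91.0,
        }
        for (weight, express), expected in recorded.items():
            with self.subTest(weight=weight, express=express):
                self.assertEqual(legacy_shipping_cost(weight, express), expected)
```

Run it with `python -m unittest` before and after each change. If a refactoring alters any recorded value, the test fails and you get to decide whether the change in behaviour was intended.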
A different take on legacy code could be something like Peter Naur's paper 'Programming as Theory Building' [n]. I don't believe Naur specifically discussed legacy code, but the rough idea is that each program has a corresponding theory (relating to the problem domain, organisational constraints, the implemented solution and its structure, alternative solutions that were considered and rejected, etc). Some of this theory can be written down in artefacts such as design docs, requirements or comments in the code, but for a software project to be alive, the theory needs to live inside the heads of enough of the team who are building and maintaining the software. If that is lost (e.g. the team departs and a fresh team is hired to replace them) then no one in the new team may understand the "theory" so the software project is effectively dead until the new team learns the problem and solution space and develops their own new theory.
I'd regard such a "dead" software project where none of the current maintainers understand the theory behind the existing code as being legacy code -- but this is a joint property of the people working on the codebase and the artefacts (source code, design docs etc), it isn't a property of the code base in isolation. Maybe "legacy code" is the wrong way of framing it as it misses the importance of the relationship between the maintainers, the codebase and the surrounding context, and something like "dead project" is a little more helpful.
If you don’t design and write code with that in mind you’re basically planting landmines for your future self.
Interfaces are the only stable ground.
[f] https://www.oreilly.com/library/view/working-effectively-wit...
[n] https://pages.cs.wisc.edu/~remzi/Naur.pdf