I have a new rule of thumb:
I love metaphors, and this one is great.
Technical Debt is the idea that as you develop software, any corners you cut can be thought of as borrowed time, which incurs interest (maintenance cost) and must sooner or later be paid back (by refactoring). So creating crufty code because it’s quicker often results in having to go back and re-do it.
Fine, but why does the metaphor stop there? Everyone loves beautiful code and elegant algorithms, sure. As a profession we don’t like mess, and we respect clever ideas that can do complex things without that mess. Makes sense.
However, debt has a positive side too. If you have a business idea but don’t have the start-up capital, you borrow it from someone who does. It saves you time, and possibly enables you to act during a window of opportunity that might be closed six months later. Without that debt you would have missed your chance. If you want to buy a home, you either borrow and live in your own home for most of your life, or save up the long way and only manage to afford it by the time you retire.
The same concepts exist in software.
Similarly, if you’ve got a project to get out, without being able to predict the future you’ll have a hard time designing some components, as you don’t always know how their use will change over time. By borrowing a small amount of time early on, you can often save yourself from wasting time trying to cover every eventuality and getting lost in a maze of total abstraction. By the time you actually get to the later stages of the project you’ll often find that you were completely mistaken about the likely outcome… and that the time you’d spent refactoring early was wasted.
By borrowing time you often gain. Think of it less like ‘technical debt’ and more like ‘technical leverage’. Don’t waste it by paying it off before you’ve had a chance to benefit.
I recently came across this 2004 Python recipe by Douglas Bagnall that demonstrates a technique for statistical language detection using tri-grams.
Tri-grams (n-grams with n = 3) are simply three-character sequences. The idea is that, given a selection of documents in known languages, you can compute the frequency of each three-character sequence for each language. Once you’ve got a frequency distribution for each language, and an idea of which tri-grams regularly follow other tri-grams, you can then assess the probability that a body of text in an unknown language is written in any specific language.
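To make the idea concrete, here’s a minimal sketch of the frequency-profile approach (not Bagnall’s actual recipe — the function names and the floor probability for unseen tri-grams are my own assumptions, and a real detector would train on much larger corpora):

```python
from collections import Counter
import math

def trigrams(text):
    """Yield the overlapping three-character sequences in the text."""
    text = " ".join(text.lower().split())  # normalise case and whitespace
    for i in range(len(text) - 2):
        yield text[i:i + 3]

def profile(training_text):
    """Map each tri-gram to its relative frequency in the training text."""
    counts = Counter(trigrams(training_text))
    total = sum(counts.values())
    return {tg: n / total for tg, n in counts.items()}

def score(text, prof, floor=1e-6):
    """Average log-probability of the text's tri-grams under a profile.
    Tri-grams never seen in training get a small floor probability."""
    tgs = list(trigrams(text))
    return sum(math.log(prof.get(tg, floor)) for tg in tgs) / len(tgs)

def detect(text, profiles):
    """Return the language whose profile best explains the text."""
    return max(profiles, key=lambda lang: score(text, profiles[lang]))
```

With a profile built per language from sample documents, `detect` just picks whichever language assigns the unknown text the highest average log-probability.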
The beauty of this system is that you don’t need to maintain large dictionaries for each language, just a single number for each tri-gram / language combination. You can see a similar system in action on Google’s AJAX Language API.