Ever since I stumbled across a question from another blog (linked here),
Why is there so much duplication in modern code?
I can’t stop thinking about this conundrum. The article gives a small example of a web app with a back-end service. Looking at the app, and at the question, I have a variety of responses…
Taking the question as a directive, I find two reactions.
First, there is a certain visceral level of agreement. Yes, this app is applying pedantic standards where it doesn’t have to. It could be cleaned up by deduplicating the code drastically; one could just access the database straight from the front-end app, without all the stuff in between. But, as taught in most CS/IT curricula, that is a very naive way to view code, because it completely ignores things like security, caching and state management, and it will turn developing and debugging into a nightmare once those concerns surface.
Note, however, that it’s not always naive to think this way. More on that later.
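To make that first reaction concrete, here is a toy sketch of what the “stuff in between” actually buys you. Everything in it is hypothetical (the Order type, the query and the function names are mine, not from the article’s app); it just contrasts the naive direct-to-database route with a minimal back-end layer.

```typescript
// All names here are made up for illustration; this is not the article's app.
type Order = { id: number; userId: number; total: number };
type Db = { query(sql: string, args: unknown[]): Promise<Order[]> };

// The naive route: the front-end talks to the database directly.
// No authentication, no caching, and the database credentials live in the client.
async function loadOrdersNaive(db: Db, userId: number): Promise<Order[]> {
  return db.query("SELECT * FROM orders WHERE user_id = ?", [userId]);
}

// The layered route: a back-end endpoint sits in between.
const cache = new Map<number, Order[]>();

async function loadOrdersViaApi(
  db: Db,
  session: { userId: number | null },
): Promise<Order[]> {
  if (session.userId === null) {
    throw new Error("not authenticated"); // security lives here
  }
  const cached = cache.get(session.userId);
  if (cached !== undefined) {
    return cached; // caching lives here
  }
  const orders = await db.query("SELECT * FROM orders WHERE user_id = ?", [session.userId]);
  cache.set(session.userId, orders); // state management lives here
  return orders;
}
```

The price of the extra layer is that the front-end and the back-end each end up with their own notion of an order, their own validation, their own serialization, which is exactly the kind of duplication the question is complaining about.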
Secondly, when I think about it a little longer, my answer actually turns into a no. The app is designed around established principles that have been tried and tested. The back-end libraries were written by others and used successfully for other systems, so there is no real reason to change a winning team. And for a web app, that’s probably the sanest response. A web developer doesn’t have the luxury, or the understanding, to meaningfully dive into deep system- and architecture-level topics.
Alright, but taking the question literally, I have a few more thoughts. Why do the same development patterns appear in our work at different stages and levels, in different languages and architectures? Let’s explore…
Getting Things Done
Surely some of it has to do with experience. If there is one trusty old way to do something, people will see no reason to learn a newer, more general/abstract method. In the example, both the front-end and the back-end developer will use their trusty old ways to do things, and there might be a huge overlap.
Would it help to put them in a room to discuss all the details of their implementations and how they could better re-use each other’s code? That doesn’t sound financially interesting, or worthwhile for the client (who traditionally tends not to care about any of this).
Innovation, Standardization and Kingdoms
Now, an altogether different reason for people to duplicate effort, rebuild things, or re-invent wheels is simply that something doesn’t yet exist in the context they’re working in. For instance, when new hardware appears, or new capabilities become available. Or when the tool or language is being used for things it was never intended for…
I’d like to put forward that the past 70 years of computer development have, to a large extent, been exactly that: we tried to use tools and languages for things they were not yet intended for. We improvised, we innovated.
We just didn’t know what we were doing…
Remember when 3D graphics were not possible inside the browser? Or when different manufacturers had support for different features? The Browser Wars? Or when Java was launched to “run everywhere,” except it notoriously lacked support for anything other than the AWT GUI? Or even further back: BASICODE, the ‘standardization’ of home computer BASIC dialects, so a BASICODE program would run everywhere…
None of it ever really worked, because a year later everything about computers had changed so much that the standards became obsolete.
I remember the introduction of the XML standard in 1998. Finally a format that was both human-readable and machine-readable, they said. I did not understand the big deal. Why was everyone so proud of a very verbose and limited way to describe a tree structure? Well, apparently that was about the only thing people could actually agree on in those days.
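For anyone who was not around for it, a tiny made-up illustration of that verbosity: the same two-book tree, once as 1998-style XML (here just a string constant) and once as the kind of nested literal we would write today. The book data is invented purely for illustration.

```typescript
// Two views of the same small tree; the data is made up.
const asXml = `<catalog>
  <book id="1"><title>Dune</title></book>
  <book id="2"><title>Neuromancer</title></book>
</catalog>`;

const asLiteral = {
  catalog: [
    { id: 1, title: "Dune" },
    { id: 2, title: "Neuromancer" },
  ],
};
```

Most of the XML characters go into the markup rather than the data, but at least everyone could parse it.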
Then there are the various corporations. In the 1980s, everything was done with C, and C++ was still just an idea. C wasn’t really enough for business, so Microsoft and Apple built their own practical object systems on top of C: Microsoft developed COM, and Apple (via NeXT) adopted Objective-C. Marketing and business practices throughout the past 40 years have fiercely protected these software kingdoms, so that even today you inevitably have to deal with duplicated code.
The same thing happened with mobile phones and VR headsets. Protected software kingdoms that have their own version of everything.
Different Strokes
Then there is the distinction between fields, each using software in its own traditional ways. In science labs, it was all UNIX, and we used Lisp, Fortran and C++, each with very long and peculiar traditions. Fortran and C++ came with a wide variety of scientific math libraries for equation solving, vector calculus, matrix inversion, etc. That is how you would process the measurements from an experiment.
Across the hallway, at the CS department, in the new Computer Graphics lab course, they also used matrices and vectors, but differently, and not with those standard libraries… The CS people also had peculiar ideals about lists, trees, heaps, etc. These were generally implemented using pointers, so that one could run arbitrarily large search operations on a supercomputer.
In game development, we went from handwritten assembly to C, to C++, and later Lua and C#. Each new game engine had newly minted 1960s/1970s solutions to the various memory and graphics problems that appeared while trying to squeeze the last cycle out of the consoles. Consoles that would last up to 3 years, before they had to make way for completely new architectures. Since the early 2000s, the vector and matrix operations are done mostly on the GPU, which is, again, a completely different architecture. Ah, yes, and game developers tend to implement lists, trees and heaps not with pointers, but in a much more direct and cache-friendly way.
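To illustrate that last remark, here is a minimal sketch, with names of my own invention rather than from any particular engine: the textbook version chains nodes through references (the moral equivalent of pointers), while the game-dev version packs the same list into flat typed arrays so the data stays contiguous and cache-friendly, with links as plain indices.

```typescript
// Textbook-style singly linked list: every node is a separate heap object.
class ListNode {
  constructor(public value: number, public next: ListNode | null = null) {}
}

function sumLinked(head: ListNode | null): number {
  let total = 0;
  for (let n = head; n !== null; n = n.next) total += n.value;
  return total;
}

// Game-dev-style flat layout: values and "next" links live in contiguous
// typed arrays; a link is an index into those arrays, and -1 marks the end.
class FlatList {
  values = new Float32Array(1024);
  next = new Int32Array(1024).fill(-1);
  head = -1;
  private top = 0;

  pushFront(value: number): void {
    const i = this.top++;
    this.values[i] = value;
    this.next[i] = this.head;
    this.head = i;
  }
}

function sumFlat(list: FlatList): number {
  let total = 0;
  for (let i = list.head; i !== -1; i = list.next[i]) total += list.values[i];
  return total;
}
```

Both compute the same sum; they simply belong to different traditions, and neither camp is in a hurry to import the other’s version.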
Game development nowadays largely means Unity or Unreal. Both provide an entirely curated ecosystem of often duplicated code and traditions. In addition, Unity and Unreal are focusing on bigger things, interfacing with robotics, AR/VR, the web, etc., where, again, code is duplicated.
Robotics does fast calculations as well (to control limbs and actuators), but might be driven mostly from Python, with web interfaces for the humans. Depending on the flavor or focus of the robots, there are some wild cultural differences here too. Using ROS is a staple (and also a duplication of some standard communication protocols), but some people don’t like ROS and have their own contraptions, or use web technology. Microsoft and NVIDIA have their own special breed of ROS, or ROS-like toolset, again duplicating a lot of code. Self-driving car companies most likely don’t use ROS at all, but something more secure and CAN-compatible. Boston Dynamics probably also has its own toolset… duplicated.
So?
Do we, as a whole, duplicate a lot in our code? Yes. Can that be avoided or mitigated? Well, whatever the reason for the duplication (practicality, standardization, innovation, kingdoms, cultures, etc.), it rarely seems to be a matter of planning and oversight, where one could easily spot the duplication and possibly fix it.
So, duplication might not be easily avoided. Nevertheless, our best bet is probably to stick to open-source environments and active, well-maintained public projects. With good communication among the teams it is possible to refactor out the duplications, but that doesn’t sound very convincing to a CFO or funding agency just yet.
Also, looking at the horizon of ubiquitous AR/VR applications and even smaller mobile devices, I can’t help but think that we might not be out of the innovation woods just yet, and some stubborn duplication will still show up here and there…