Software Factories
By David Alan Grier

A recent article on the Google software repository revealed something of the current state of software development. It showed the distance that software production had traveled since its origins in the 1950s and how it keeps returning to a basic set of ideas about the organization of work, ideas that software has fundamentally transformed.

The Google software repository is the central resource for most of Google’s 25,000 software developers. It holds the bulk of the source code that has been generated or acquired by the company. It coordinates work of each programmer, supports development teams, and enforces standard company workflows: the ways that software is created, reviewed, approved, and released. Most interestingly, the single repository unifies production. It pulls the programming staff into a common organization in much the same way that a factory unifies production for a traditional manufacturer. Indeed, the Google code repository might give us the best example of how we can organize a factory for software.

The idea of a software factory dates to the earliest years of the computer era. Almost from the start, computer developers felt that we would eventually have two kinds of factories, one for computing hardware and one for software. We “were all engineers and had been trained to organize our efforts along engineering lines,” explained a leader of one of the very first large software projects. Hence, they undertook software development with the same kind of approach that they brought to hardware.

However, for the first 20 years of the computing era, software developers found it difficult to create inexpensive and effective software in a factory environment. As they tried to create a factory environment for software production, they found that software production was far more complex than most manufacturing. Fundamentally, traditional manufacturing relies on a simple division of labor and has a simple means of coordinating the workforce. It divides labor following the divisions of the manufactured good. Just as that good can be divided into parts, so the workforce can be divided into teams that are assigned to each part. A factory organizes these teams using the chronology of final assembly. Each part is added to the final product in a specific order.

Software production proved to be far more complex and fluid than traditional manufacture. Instead of using one way to divide labor, it used a few traditional means of splitting a workforce. You can divide workers by time or space, by materials or teams, and finally by process. Programmers can work day or night, near to the main office, or far from it. They generally work on only one part of the system and have a fixed team of colleagues.

However, for most software projects, these divisions are rarely fixed. A programmer can change schedule, move from one part of the country to another, make a change to a small part of the code that affects the rest of the system, and even change teams from day to day. The only aspect of software development that tends to be fixed is the last tool for dividing labor, process. Software development teams tend to establish a common process at the start of a project and follow it. Hence, by the early 1980s, software developers had concluded that a software factory was an institution that was organized around a common process. A “Software Factory consists of an integrated and extensible facility of software development tools that supports a recommended methodology,” argued one software engineer of the era.

The division of labor is only one aspect of a factory. A second aspect, far more complex, is the task of coordinating labor. For most of the 20th century, industrial engineers have worked to coordinate labor in ways that minimized the exchange of information. The assembly line is the best example of how they have been able to minimize the interaction between workers. On an assembly line, a worker occupies a single place on the line and does a single task. When the unfinished product arrives, the worker completes the task and sends the product to the next station on the line. That worker never interacts directly with others on the line.

From the start, software developers have known that they needed some kind of communications device to coordinate the work of their team. During the early years of the computing era, many developers used a physical notebook to control the project. That notebook would contain the design specifications, documentation, reports on completed code, changes to the design, and all other information that would be relevant to the development staff. However, for many projects, a physical notebook was simply not adequate to track the dynamic nature of software production. One early operating system developer reported that his notebook grew quickly and overflowed a single binder. Eventually, it grew to occupy five feet of shelving and required changes to 150 pages a day.

If a software factory contained a dynamic division of labor, a set of common tools, a means of coordination, and an enforced common process, it would not be built out of bricks and mortar. In roughly 1984, software developers started creating systems that would track the source code for a new system, coordinate changes, document the software, and provide a means for testing changes. Many of the ideas in these systems had been tested before, but they started to be combined into unified systems. Two important software repositories of the 1980s were the Revision Control System, created at Purdue by Walter Tichy, and the Concurrent Versions System or CVS, which was built in Amsterdam by Dick Grune.

During the 1990s, CVS was perhaps the most important of these systems. It was expanded by a number of developers and was distributed by the GNU Project of the Free Software Foundation. Developers could download the system for free, even if they were using it to create commercial software. It also became a common tool of a form of software development that started to grow during the 1990s, open source development. The proponents of open source wanted to create software that was freely distributable and freely modifiable. The GNU Project was an early example of open source. More current examples include the Mozilla Web browser, the Apache HTTP Server, and the Linux operating system.

One of the more popular software repositories, Git, was created to support the Linux operating system. Linux has been supported by a community of programmers scattered around the world. It needed a tool that allowed developers to download code, test new ideas, and show those ideas to others, and that allowed project leaders to accept or reject those ideas. For a time, the Linux project used a commercial system, BitKeeper, but its developers created Git in 2005 and have used it to support development since.

Currently, there are about two dozen different software repositories in common use, with Git perhaps being the best known. Some are commercial. Some are open source. Some, such as the Google repository, are private and not used by others. They generally share a common set of features: they store source code, trace development, document work, allow for review, and provide means of accepting or distributing completed software. Conceptually, they tend to have three major classes of differences. First, they tend to have different ways of tracking and integrating changes. Changes can be particularly challenging because they can produce two versions of the code that are incompatible. The repositories try to avoid this problem by forcing programmers to work on independent copies of code, by giving priority to the changes of one developer, or by tracking each and every change.

Second, source repositories can be structured in different ways to make the best use of computing resources. The Google repository is unusual in that it contains most of the company’s code. Most repositories are smaller and focused on a single project or topic.

Finally, these repositories can have different ways of verifying, testing, and accepting the final code. The Google repository automates most of these steps, forcing its programmers to follow a well-delineated common process. Other repositories, such as the publicly accessible GitHub, are not so disciplined.
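
The first of these differences, the way a repository integrates changes, can be made concrete with a small sketch. The Python below is only an illustration under simplifying assumptions: it treats a repository as a mapping from file names to contents and compares whole files, while real systems work at a finer grain. The function name `three_way_merge` and the file names are hypothetical and are not taken from Git, CVS, or the Google repository.

```python
def three_way_merge(base, ours, theirs):
    """Combine two sets of edits made against the same base snapshot.

    Each argument maps a file name to its content (a missing name means
    the file does not exist in that snapshot). Returns (merged, conflicts).
    """
    merged = {}
    conflicts = []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            result = o          # both developers agree (same edit, or no edit)
        elif o == b:
            result = t          # only the other developer changed this file
        elif t == b:
            result = o          # only we changed this file
        else:
            conflicts.append(name)  # incompatible edits: a person must decide
            result = b              # keep the base version in the meantime
        if result is not None:      # None means the file was deleted or never existed
            merged[name] = result
    return merged, conflicts


base = {"parser.py": "v1", "lexer.py": "v1"}
ours = {"parser.py": "v2", "lexer.py": "v1"}
theirs = {"parser.py": "v1", "lexer.py": "v1", "docs.txt": "new"}

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)     # the independent edits combine cleanly
print(conflicts)  # no file was edited incompatibly
```

If both developers had instead changed `parser.py` in different ways, the function would report that file as a conflict and leave the base version in place. This corresponds to the case described above, in which two versions of the code are incompatible and the repository must either prevent the situation or ask someone to resolve it.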

Software repositories are not really the software factories that were envisioned by the early software developers, though they are perhaps close. Those developers made a clear distinction between programming and coding. To them, programming included the work of designing the software and designing how that software would be constructed. In contrast, coding was a more mechanical task, the work of completing the individual instructions. Programming would be done by engineers. Coding would be done by workers.

Nearly 70 years of experience have taught us that you create a system at least three times. (Arguably, you can say you create it eight times, but we will keep the example simple.) You create it once when you design it. You create it a second time when you code it. You create it a third time when you debug it. All of these steps require intellectual input. All require creativity. All require discipline. None is the contribution of an unthinking worker toiling on an assembly line.

Yet, software repositories remind us that software development is a form of production. That it requires a range of skills. That it needs a sophisticated division of labor, a flexible way of coordinating labor, and a well-designed process for validating the work. In the software age, those tasks can only be done with the aid of software. Just as the factory transformed the kind of production that was once done by a family business, so software repositories expand the nature of production that was once done under a factory roof.



About David Alan Grier

David Alan Grier is a writer and scholar on computing technologies and was President of the IEEE Computer Society in 2013. He writes for Computer magazine. You can find videos of his writings at video.dagrier.net. He has served as editor in chief of IEEE Annals of the History of Computing, as chair of the Magazine Operations Committee and as an editorial board member of Computer. Grier formerly wrote the monthly column “The Known World.” He is an associate professor of science and technology policy at George Washington University in Washington, DC, with a particular interest in policy regarding digital technology and professional societies. He can be reached at grier@computer.org.