Can't argue with success: bioinformatics software that produces results favors agile, time- and cost-effective work by staff very close to (or even participating in!) the research activity. Some potential inspirations come from interesting places. A recent
article on China's Shan Zhai in Strategy & Business magazine (formerly our own house organ here at Booz Allen Hamilton, but now run out of Booz&Co.) and some very interesting commentary throughout the blogosphere got me thinking about the very different approaches to software development used by the bioinformatics community. The term shan zhai, literally translated, means "mountain fortress" and suggests banditry and lawlessness. In recent years it has come to be associated with knock-off Western consumer products (iPhone, imitation is the sincerest form of flattery...), but also with a kind of native Chinese cleverness and DIY hackery. Much like the term
bricolage in French, shan zhai denotes an iterative, just-in-time form of product development, often targeting almost invisibly small communities. This leads to remarkably nimble business behavior, although sometimes at the cost of the ultimate quality, manufacturability or broad relevance of the resulting products. Like the open source development community worldwide, the shan zhai make a practice of sharing information on the materials and construction of their products. This level of information sharing is unheard of in the commercial world, but it allows a geographically distributed community to create and iterate on products rapidly. In doing so, this community relies on the emergent properties of the process to distribute and improve product ideas and concepts, rather than on the planned approach common to the large multinational companies that produce most consumer goods worldwide.
This
blog post made a fascinating connection between the shan zhai and
Situated Software, a concept floated by Clay Shirky several years ago. Situated software is software that is designed for use by a specific social group, rather than designed for generality. As such, it can be developed quickly, and usually iteratively, by the community that will ultimately use it, or at least with that community's direct participation in the development process. This approach has significant weaknesses from a commercial point of view, since commercial software often depends on scalability and re-use to maximize the profits resulting from the initial investment of engineering effort. It can also lead to software that must be continuously managed and maintained in order to stay useful and relevant. The recent surge in websites and interactive services that are focused on particular communities of use is an example of this situated approach. With the proliferation of rapid-development software tools, the easy interfacing made possible by REST web services, and the flexible and scalable hosting provided by inexpensive webhosts and even cloud providers like Amazon and Google, this kind of development has already resulted in a wide range of remarkable, community-specific software. Everywhere you look on the web, there are niche services and sites that are directed at very specific user communities, from
artists to
autopilot enthusiasts and community activists of all stripes.
Which brings me back around to bioinformatics. Bioinformatics is a classic case of software that is often developed by or for a very small community of users, often just a single lab or individual scientist. Since bioinformatics software is regularly developed to support scientific research activities, and most scientific research activity is by necessity bespoke and unique, it is not surprising that a lot of bioinformatics software development is itself also bespoke and small-scale. When one looks at the mixed success of commercial, large-scale software development in bioinformatics, one can see why smaller-scale, situated efforts are often more successful in solving the day-to-day problems that face life scientists. Such efforts, taken as a whole, are often less efficient and more repetitive than those done in a more conventional fashion, but the costs associated with individual small-scale development can be lower than those of larger-scale efforts, since they rely upon graduate students, post-docs and small contractors, all of whom are notorious for their low cost.
The challenge, of course, is finding a path between situated, small-scale, low-cost development and the scalable, reusable data and systems upon which larger-scale efforts such as comparative effectiveness research, translational medicine and molecular clinical research can rely. The real payoff for bioinformatics, and for the research that it supports, comes from work that leads to the development of new therapies and molecular markers for disease and treatments. Given the wide range of participants involved in such research, and their distribution both geographically and across disciplines, the need for scalable, secure and standardized systems begins to transcend what can be effectively (or cost-effectively) done by the small individual researcher
acting alone. Now we look at the shan zhai. They succeed in a field which is notorious for standardization, large scale, and risk-averse behavior. Consumer electronics has long since moved past the days when small garage operations dominated the field. But the shan zhai have identified the market inherent in the long tail of consumer electronics purchasers: those who want or need something just a little different (even if what is different is the lower price implied by rampant IP violation - I am not advocating piracy, merely indicating that the shan zhai are able to nimbly respond to a market demand for it).
Key to the success of the shan zhai is their ability to
share effectively with their colleagues throughout China and beyond. What they have recognized is that they can be more effective by reusing standardized components and data resources, and often improving them and sharing the result openly, than they can by fiercely protecting what they know as individuals or small companies. This leads to an important observation for those of us developing bioinformatics software and data resources. As we design and build standardized platforms and supporting infrastructure to facilitate the development of bioinformatics software, it is critical to consider how these tools can be flexibly and easily re-used by the many small developers who make up the majority of the bioinformatics community. If the goal is to facilitate the development of standards-based software, and to ensure that the data collected by the community is made available using standardized and future-proofed components and representations, then we have an obligation to ensure that the tooling we provide can be used not only by large cadres of professional software engineers, but also by the graduate students and post-docs who are the
bricoleurs constructing most of the software that is in use at any given time by the scientific community.
As we continue to develop software and systems that support the needs of translational research, comparative effectiveness research and other transdisciplinary medical and scientific work, we need to continue to ensure that, along with well-recognized standards, we provide equally standardized and easy-to-use interfaces that reflect the needs of those deploying the research solutions to the end-users. Those end-users are often just a single lab or even a single researcher, and the developer is a post-doc or graduate student who is doing his or her work using some of the simplest tools available, often no more than Perl or PHP and a website. If we can support these folks with reusable modules that they understand, and provide a platform through which they can receive help and share results, we will have gone a long way toward creating our own shan zhai, and can begin to reap the rewards of doing so.