Monday, September 28, 2009

Temple Smith, Bioinformatics Pioneer

I had the unbelievable privilege of spending my postdoctoral years working for Temple Smith, studying 3D protein structure prediction and multiple protein structure alignment. I can honestly say that those are still some of the best and most rewarding years of my professional life. This is due almost entirely to Temple's amazing brilliance, perspicacity, encouragement, and generous nature. Temple became a professor emeritus at Boston University last Friday, an occasion celebrated with a seminar series of talks given by his students, collaborators and colleagues. The depth and breadth of the presentations were remarkable, as was the collection of luminaries, each of whom demonstrated the amazing impact Temple has had on the science of bioinformatics. Equally wonderful was how each of these researchers, successful in their own right, acknowledged the support and inspiration they have drawn for their own work from Temple and his original contributions to science.

Mike Waterman shows a slide of himself and Temple at Los Alamos, NM,
Summer 1980, in a photo taken by David Lipman

Temple is (rightly) famous for his fractious demeanor, his willingness to question the status quo of any situation, scientific or otherwise, and his iconoclastic "cowboy" behavior and dress. As all the speakers acknowledged, this hides an open and giving heart, and a true and deep desire to see those with whom he works succeed. Temple likes nothing more than to "stir the pot" and upset commonly held wisdom, something he continues to do with remarkable efficacy.

No article or post about Temple would be complete without an anecdote, so I will relate one of my own experiences from several years ago. I was a postdoc in Temple's lab and was being recruited by one of the large East Coast pharmaceutical companies, and I was going to have lunch with their executive recruiter in Boston. We were to meet at the lab and go from there to lunch. As we were walking out of the offices, we passed the mailbox, where a mustachioed gentleman in a serape and cowboy hat was picking up his mail. He proceeded to ask who the "character in the suit" was and what I was doing with him, and then bustled past before I could make any introductions. As the recruiter and I walked out of the office, he asked me who "that guy" was, and I responded by asking whether he was familiar with the Smith-Waterman algorithm. My executive friend said that he was, and that knowledge of it was a prerequisite for the job I was being considered for. When I told him that "that guy" was Smith, the look on his face was priceless, and something that I treasure to this day.

It was great talking to Temple on Friday and getting a chance to catch up with so many of the remarkable people whom he has taught and with whom he has collaborated over the years. It is a wonderful legacy and a fantastic group of friends; we all expressed the hope that Temple would become an emeritus professor every year. I certainly owe what I have been able to achieve in my own career to Temple, and am glad to wish him a very happy retirement!

Monday, September 21, 2009

Shan Zhai Bioinformatics

You can't argue with success: bioinformatics software that produces results tends to favor agile, time- and cost-effective work by staff very close to (or even participating in!) the research activity. Some potential inspirations come from interesting places. A recent article on China's shan zhai in Strategy & Business magazine (formerly our own house organ here at Booz Allen Hamilton, but now run out of Booz & Co.) and some very interesting commentary throughout the blogosphere got me thinking about the very different approaches to software development used by the bioinformatics community. The term shan zhai, literally translated, means "mountain fortress" and suggests banditry and lawlessness. In recent years it has come to be associated with knock-off Western consumer products (the iPhone among them; imitation is the sincerest form of flattery...), but also with a kind of native Chinese cleverness and DIY hackery. Much as the French term bricolage suggests, the shan zhai practice an iterative, just-in-time form of product development, often targeting almost invisibly small communities with their products. This leads to remarkably nimble business behavior, although sometimes at the cost of the quality, manufacturability or broad relevance of the resulting products. Like the open source development community worldwide, the shan zhai make a practice of sharing information on the materials and construction of their products. This level of information sharing is unheard of in the commercial world, but it allows the rapid creation and iteration of products by a geographically distributed community. In doing so, this community relies on the emergent properties of the process to distribute and improve product ideas and concepts, rather than on the planned approach common to the large multinational companies that produce most consumer goods worldwide.

One blog post made a fascinating connection between the shan zhai and Situated Software, a concept floated by Clay Shirky several years ago. Situated software is software designed for use by a specific social group rather than for generality. As such, it can be developed quickly, and usually iteratively, by the community that will ultimately use it, or at least with that community's direct participation in the development process. This approach has significant weaknesses from a commercial point of view, since commercial software often depends on scalability and reuse to maximize the profits resulting from the initial investment of engineering effort. It can also lead to software that must be continuously managed and maintained in order to stay useful and relevant. The recent surge in websites and interactive services focused on particular communities of use is an example of this situated approach. With the proliferation of rapid-development software tools, the easy interfacing made possible by REST web services, and the flexible and scalable hosting provided by inexpensive webhosts and even cloud providers like Amazon and Google, this kind of development has already produced a wide range of remarkable, community-specific software. Everywhere you look on the web, there are niche services and sites directed at very specific user communities, from artists to autopilot enthusiasts and community activists of all stripes.

Which brings me back around to bioinformatics. Bioinformatics is a classic case of software that is often developed either by or for a very small community of users, often just a single lab or individual scientist. Since bioinformatics software is regularly developed to support scientific research activities, and most scientific research activity is by necessity bespoke and unique, it is not surprising that a lot of bioinformatics software development is itself bespoke and small-scale. When one looks at the mixed success of commercial, large-scale software development in bioinformatics, one can see why smaller-scale, situated efforts are often more successful in solving the day-to-day problems that face life scientists. Taken as a whole, such efforts are often less efficient and more repetitive than those done in a more conventional fashion, but the costs associated with individual small-scale development can be lower than those of larger-scale efforts, since they rely upon graduate students, post-docs and small contractors, all of whom are notorious for their low cost.

The challenge, of course, is finding a path between situated, small-scale, low-cost development and the scalable, reusable data and systems upon which larger-scale efforts such as comparative effectiveness research, translational medicine and molecular clinical research can rely. The real payoff for bioinformatics, and for the research it supports, is work that leads to the development of new therapies and molecular markers for disease and treatment. Given the wide range of participants involved in such research, and their distribution both geographically and across disciplines, the need for scalable, secure and standardized systems begins to transcend what can be done effectively (or cost-effectively) by the individual researcher acting alone. Now consider the shan zhai. They succeed in a field notorious for standardization, large scale, and risk-averse behavior. Consumer electronics has long since moved past the days when small garage operations dominated the field. But the shan zhai have identified the market inherent in the long tail of consumer electronics purchasers: those who want or need something just a little different (even if what is different is the lower price made possible by rampant IP violation; I am not advocating piracy, merely noting that the shan zhai are able to respond nimbly to a market demand for it).

Key to the success of the shan zhai is their ability to share effectively with their colleagues throughout China and beyond. What they have recognized is that they can be more effective by reusing standardized components and data resources, and often improving them and sharing the result openly, than by fiercely protecting what they know as individuals or small companies. This leads to an important observation for those of us developing bioinformatics software and data resources. As we design and build standardized platforms and supporting infrastructure to facilitate the development of bioinformatics software, it is critical to consider how these tools can be flexibly and easily reused by the many small developers who make up the majority of the bioinformatics community. If the goal is to facilitate the development of standards-based software, and to ensure that the data collected by the community is made available using standardized and future-proofed components and representations, then we have an obligation to ensure that the tooling we provide can be used not only by large cadres of professional software engineers, but also by the graduate students and post-docs who are the bricoleurs constructing most of the software in use at any given time by the scientific community.

As we continue to develop software and systems that support the needs of translational research, comparative effectiveness research and other transdisciplinary medical and scientific work, we need to ensure that, along with well-recognized standards, we provide equally standardized, easy-to-use interfaces that reflect the needs of those deploying research solutions to end-users. Those end-users are often just a single lab or even a single researcher, and the developer is a post-doc or graduate student doing his or her work with some of the simplest tools available, often no more than Perl or PHP and a website. If we can support these folks with reusable modules that they understand, and provide a platform through which they can receive help and share results, we will have gone a long way toward creating our own shan zhai, and can begin to reap the rewards of doing so.

Wednesday, September 2, 2009

The PHIN Grid (and other great stuff)

I was at this year's Public Health Information Network (PHIN) conference again, held without fail in Atlanta (and always during my kid's first week of school). This meeting is always interesting to me because of the breadth of participants and the depth and importance of the topics. This year, with H1N1 flu topmost in everyone's mind, many of the participants felt a special urgency to get the informatics of public health efforts operating as efficiently and effectively as possible. Key to these efforts was leveraging the products of many other programs, with federal and local informatics teams often making extensive use of open source tools and technologies; the resulting talks were both inspiring and cool.

I attended as many of the grid- and cloud-related talks as I could fit into the conference schedule, and was rewarded with a truly remarkable view of how stakeholders throughout the PHIN enterprise have been able to leverage technology products from a wide range of programs to satisfy their unique public health requirements. It was inspiring to hear how the CDC has been using many of the tools developed by the National Cancer Institute's caBIG program, especially key parts of the caGrid infrastructure. Equally cool was how many of the key participants in the caBIG program have been directly involved in applying those capabilities in an entirely new setting. In particular, Tom Savel and his team talked at length in a number of sessions about using these Grid tools to implement a range of services, and about the challenges of using them in a public health setting, such as security and reliability. Hearing about how familiar tools like GAARDS, Grid Grouper and Introduce are being used in a new community was well worth the trip, as was hearing how facilities that the caBIG program has implemented, such as the Knowledge Centers, are providing important support in diverse settings of national (and even international) importance.

As much as I appreciate hearing the accolades and credit given to the caBIG program (and I do!), and as rewarding as it is to see our community supporting these important areas, I am reminded that teams providing software and tools must continue to improve and iterate that software, and must ensure that the communities using these systems do not become detached from the processes used to create them. The Knowledge Centers and their staffs are clearly leading the way here, and it will be critical for them (and us) to keep listening closely to what is happening "out there" and to ensure that those needs get "in here." To that end, it is both inspiring and heartening to see a growing, talented, committed and capable group who will not only consume the resources that the caBIG program's participants create, but who can also contribute significantly to the infrastructure's development. The challenge for caBIG is to come up with the most effective possible means of incorporating these contributions, and of ensuring that they mesh with, support and inform the program going forward.

Onward and upward, PHIN! Every time I sneeze now, I am going to wonder whether I can enter that event into some part of the emerging public health Grid.