Monday, August 16, 2010

A Leadership Role for Informatics



Nosing through my copy of the Harvard Business Review this month, I read a really interesting article about my caBIG friend and colleague, Dr. Laura Esserman. Laura is a cancer surgeon and the director of the Carol Franc Buck Breast Care Center at UCSF. What is really cool is that she is also the Associate Director of Medical Informatics at the UCSF comprehensive cancer center. This is not only a remarkable achievement, but also an important demonstration of what it takes to get biomedical informatics onto the agenda of the clinical and medical research leadership within academic medical centers like UCSF. By bridging the two worlds (and by playing a serious game of hardball, as per the HBR piece), Laura has been able to make significant progress in her efforts to develop meaningful informatics-backed translational research and bring it to the forefront in fighting cancer.

This closely matches what we hear in the field: informatics teams that are closely tied to the specific needs of their stakeholders, and that have strong support from their academic and administrative leadership at the highest levels, can do really remarkable things. Those that are either far removed from the medical and scientific community at their institutions, or that do not have the ear of senior leadership, have far less success, and are less able to participate in the kind of exciting change that Laura and her team have been creating.

The funny thing is that getting the ear of senior leadership, and being in a position to work closely with the clinical and scientific stakeholders in an institution, is not as hard as you might think. In fact, as one of my mentors told me many years ago: if you want to see your ideas flourish, and have informatics and IT participate in the strategic development of the company, you have to first "make sure that email works." This mentor was the CIO for many years at one of the largest personal products companies, and as such had a lot of experience dealing with leadership for whom IT was not always considered a key part of the corporate mission. Knowing what the simple necessities are for the scientific, clinical and administrative leadership, and making sure that they work consistently and well, is the ticket to allowing informatics and IT to more fully participate in the mission of the institution and to realize the amazing progress that is possible- just like Laura has.

Monday, January 25, 2010

Thinking about the Shift Index

The concept of the "Shift Index" has been making the rounds of management strategists and consultants, and I have been thinking about how it applies to biomedical informatics. For those who keep track of such things, American companies have lost as much as 75% of their "Return on Assets," or ROA, since 1964. That is a pretty stunning number, and a recent article in the Harvard Business Review discusses some possible reasons why. Hagel et al., the authors of the piece, identify a shift of power to both consumers and to creative talent as one of the prime drivers of the change. In many ways, such large shifts in ROA can also be attributed to a much more competitive landscape, in which, over the last forty years, companies outside of the United States have become more competitive, and are increasingly putting business and pricing pressure on domestic companies. This, too, as Hagel and his colleagues point out, is the result of increasingly rapid shifts in what they call the "Foundation Index," a measure of underlying technology changes. These foundational changes are currently driving much of the overall Shift, but the authors believe that this will eventually be eclipsed by other metrics which measure information flow and its impact.
The article and an associated paper with more details both talk about the need to increasingly structure organizations to enable staff beyond the "Creatives" (as per Richard Florida in his book, The Rise of the Creative Class) to dynamically contribute to the business. In bioinformatics, this is analogous to developing processes and systems that extend access to biomedical data beyond just the software engineers and other IT professionals, to the laboratory and clinical researchers at the front lines of science. Enabling those people to make effective use of the volumes of data generated by both internal and external sources will be a key differentiator between companies that leverage the shifts in knowledge flows that Hagel et al. are talking about and those that do not. Fostering the dynamic flow of useful information both to and from the front-line scientists and clinicians gives those most knowledgeable about the underlying processes the capability to derive useful information from what is out there, enabling conversion of that information into products efficiently and effectively. This is what will ultimately drive increasing Return on our scientific and bioinformatics Assets.

Monday, October 19, 2009

Open Source Repository Software

When most bioinformatics professionals think of "open source repository" software, they are probably thinking of platforms like SourceForge or gForge, which provide a means of sharing open source software and its associated source code, binaries, and related files. There is also, however, software that provides a platform for the management and sharing of documents, in a wide variety of forms, with associated, highly structured and closely managed metadata.

The sciences, and the biomedical sciences in particular, generate a wide range of documents as a key part of any research effort. Developing and managing a laboratory notebook is one of the most important skills that a graduate student learns as part of his/her training. As more and more of these documents are created electronically, developing a means to manage them has also become more important. There have been many, many efforts to develop "electronic laboratory notebooks," with a mixed record of success. Given that practically every scientific activity involves the storage and management of a range of document types, some kind of mechanism for robust, secure and version-controlled management is critical, beyond just the final paper or complete data set.

Researchers and information technologists in library science have been working on this problem for a long time, and have started to come up with a range of solutions, both commercial and open source. Of note is an open source solution from the eSciDoc team, designed to support the management of scientific materials. It is based on a widely-used set of technologies and standards, such as those from DuraSpace.org, common web standards, and a Service-Oriented Architecture (SOA). With both REST and SOAP interfaces, a well-defined structure for controlled metadata elements and vocabularies, and distributed authentication (Shibboleth), it is a candidate for immediate integration into many scientific research information environments.
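Part of the appeal of repositories like this is that their items come wrapped in structured, namespaced metadata that is trivial to consume programmatically. Here is a minimal sketch of that idea- note that the record, element names, and values below are invented for illustration (a Dublin Core-style fragment), not actual eSciDoc output:

```python
# Illustrative only: parse a hypothetical Dublin Core-style metadata
# record of the kind a document repository might return over REST.
import xml.etree.ElementTree as ET

record = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Microarray study of tumor samples</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:date>2009-06-01</dc:date>
</record>"""

# ElementTree expands prefixes into {namespace-uri}localname form
DC = "{http://purl.org/dc/elements/1.1/}"
root = ET.fromstring(record)
metadata = {child.tag.replace(DC, "dc:"): child.text for child in root}
print(metadata["dc:title"])  # → Microarray study of tumor samples
```

Because the metadata elements are controlled and namespaced, a script like this works unchanged across any repository item that follows the same schema- which is exactly the "highly structured and closely managed" property discussed above.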


Figure from the eSciDoc website
https://www.escidoc.org/JSPWiki/en/Overview


Systems such as these may cover an important "last mile" of collaborative research, by providing a structured, metadata-rich and standards-based means to manage and share the unstructured documents that make up the background of the kind of structured scientific data that is provided in systems like GEO or caARRAY. The extent to which this kind of software can be combined with large-scale and distributed collaborative translational science research remains to be seen. The fact, however, that such software is developing at a rapid pace, and that it appears to share important standards and interface approaches with other emerging programs, suggests that we may be seeing the emergence of an interesting and important aspect of scientific research management.

Monday, October 5, 2009

10,000 Hours of caBIG?

In Malcolm Gladwell's recent book Outliers, he raises a very interesting point about subject mastery. In the book, he looked at the amount of time spent by individuals in the practice of an activity that they have truly mastered. Looking across a variety of disciplines, he came to an estimate of about 10,000 hours of work as the threshold. When you think about it, it is not really that surprising. Employers often look for "5 years experience" as the measure of sufficient experience in a specific job. This equates to 40 hours a week for about 5 years, totaling roughly 10,000 hours. Gladwell found a similar rule true for a range of professions, sports and interests.
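The back-of-the-envelope arithmetic is easy to check (assuming roughly 50 working weeks per year):

```python
# Sanity check on the "5 years experience" ≈ 10,000 hours equivalence
hours_per_week = 40
weeks_per_year = 50   # assuming about two weeks off per year
years = 5
print(hours_per_week * weeks_per_year * years)  # → 10000
```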

The question, then, is how to provide the opportunities and environment to support the development of true expertise in the caBIG tools, and the related underlying informatics framework. Given that the program itself is relatively new, and that there has hardly been time yet for anyone to develop expertise at the level that Gladwell describes, it shouldn't be a surprise that there are still many people within the caBIG community and outside of it who dismiss the tools as too complex and difficult to understand. The challenge for caBIG is to continue to establish a place where this expertise can be developed by the stakeholder community. One such critical component is the Knowledge Centers. Another is the experienced pool of caBIG implementers and developers who have successfully used the caBIG tools (often in conjunction with a wide range of tools from other sources) to satisfy the needs of their end-users. The formal documentation and training infrastructure that is provided by the caBIG Documentation and Training Workspace, as well as by the Center for Biomedical Informatics and Information Technology (CBIIT) at NCI, provides the baseline for ensuring that the information needed by the ultimate users of caBIG is available, and is structured and consistent.

All the formal documentation and training in the world cannot substitute for functional working systems at other institutions, and code and tools that can provide the template for similar successful efforts at interested institutions. Developers and integrators, especially in open source-rich environments, have long looked to software skeletons, "Hello World" examples and detailed tutorials to provide the foundation for their efforts. These kinds of frameworks also provide demonstrable evidence that their particular needs can be satisfied by the tools. This kind of sharing of experiences is one of the things that has provided real impetus to many of the deployment efforts currently ongoing in the caBIG program, and has already led to the sharing of innovative solutions to several common needs within the community. To get to that true level of mastery, caBIG will need to ensure that such frameworks, skeletons and open demonstrators are available to everyone who is interested, and are well-categorized and presented, so that what is needed can be easily found.

One thing that is clear about gaining real proficiency in anything: the sooner someone gets started, the better they will be in the long run. The old saying about training longbowmen (you start by training their grandfathers as children) holds just as true for software systems and tools. Each of us develops our approach to solving problems (and the associated toolkit) early in our careers. One of the great opportunities of the caBIG program is to give those just starting out in biomedical informatics (and there are bound to be more and more every year) a solid tool-kit supporting things like data-modeling, semantics, security, and the other components of robust systems. We could even provide graduate students and postdocs with support for attending caBIG meetings, or grant support for participating in the development and integration of caBIG tools where they support specific scientific and biomedical goals.

Gladwell points out in his book that success stories are almost never the result of a single independent effort, but are rather the product of long-term and community-wide support. By providing the environment to support this kind of horizontal community participation in each deployment effort, we can collectively give those separate development efforts the best possible chance for success.

Monday, September 28, 2009

Temple Smith, Bioinformatics Pioneer

I had the unbelievable privilege of spending my postdoctoral years working for Temple Smith, studying 3D protein structure prediction and multiple protein structure alignment. I can honestly say that those are still some of the best and most rewarding years of my professional life. This is due almost entirely to Temple's amazing brilliance, perspicacity, encouragement, and generous nature. Temple became a professor emeritus at Boston University last Friday, which was celebrated with a remarkable seminar series of talks given by his students, collaborators and colleagues. The depth and breadth of the presentations were remarkable, as was the collection of luminaries, each of whom demonstrated the amazing impact Temple has had on the science of bioinformatics. Equally wonderful was how each of these successful researchers in their own right acknowledged the support and inspiration that they have drawn for their own work from Temple and his original contributions to science.

Mike Waterman shows a slide of himself and Temple at Los Alamos, NM
Summer, 1980, in a photo taken by David Lipman

Temple is (rightly) famous for his fractious demeanor, and his willingness to question the status quo of any situation, scientific or otherwise, and for his iconoclastic "cowboy" behavior and dress. This, as was acknowledged by all the speakers, hides an open and giving heart, and a true and deep desire to see those with whom he works succeed. Temple likes nothing more than to "stir the pot" and upset the commonly-held wisdom, something that he still continues to do with remarkable efficacy.

No article or post about Temple would be complete without an anecdote, so I will relate one of my own experiences from several years ago. I was a postdoc in Temple's lab, and was being recruited by one of the large East Coast pharmaceutical companies, and was going to have lunch with their executive recruiter in Boston. We were to meet at the lab, and go from there to lunch. As we were walking out of the offices, we passed the mailboxes, where a gentleman in a mustache, serape and cowboy hat was picking up his mail. This person proceeded to ask who the "character in the suit" was, and what I was doing with him, and then bustled past before I could make any introductions. As the recruiter and I walked out of the office, he asked me who "that guy" was, and I responded by asking him if he was familiar with the Smith-Waterman algorithm. My executive friend indicated that he was, and that knowledge of the same was a prerequisite for the job I was being considered for. When I told him that "that guy" was Smith, the look on his face was priceless, and something that I treasure to this day.
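For readers outside the field: the Smith-Waterman algorithm finds the best-scoring local alignment between two sequences by dynamic programming. A minimal sketch of the score computation (the scoring parameters here are illustrative defaults, not canonical ones, and the traceback that recovers the actual alignment is omitted for brevity):

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b."""
    # H[i][j] = best score of any alignment ending at a[i-1], b[j-1]
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            # flooring at 0 is what makes the alignment *local*:
            # a bad prefix is simply dropped rather than carried along
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best

# "TACGT" occurs in both strings, so five matches score 2 * 5
print(smith_waterman_score("ACGTACGT", "TACGTA"))  # → 10
```

That max-with-zero in the recurrence is the whole difference between local alignment and the global Needleman-Wunsch variant, and it is why the method so reliably pulls conserved regions out of otherwise dissimilar sequences.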

It was great talking to Temple on Friday, and getting a chance to catch up with so many of the remarkable people whom he has taught and with whom he has collaborated over the years. It is a wonderful legacy, and a fantastic group of friends - we all expressed a hope that Temple will become an emeritus professor every year. I certainly owe what I have been able to achieve in my own career to Temple, and am glad to wish him a very happy retirement!

Monday, September 21, 2009

Shan Zhai Bioinformatics

Can't argue with success: bioinformatics software that produces results favors agile, time- and cost-effective work by staff very close to (or even participating in!) the research activity. Some potential inspirations come from interesting places. A recent article on China's shan zhai in Strategy & Business magazine (formerly our own house organ here at Booz Allen Hamilton, but now run out of Booz & Co.) and some very interesting commentary throughout the blogosphere got me thinking about the very different approaches to software development used by the bioinformatics community. The term shan zhai, literally translated, means "mountain fortress" and suggests banditry and lawlessness. It has come in recent years to be associated with knock-off Western consumer products (the iPhone, say- imitation is the sincerest form of flattery...), but has also come to be associated with a kind of native Chinese cleverness and DIY hackery. Much like French bricoleurs, the shan zhai practice an iterative, just-in-time form of product development, often targeting almost invisibly small communities with their products. This leads to remarkably nimble business behavior, although sometimes at the cost of the ultimate quality, manufacturability or broad relevance of the resulting products. Like the open source development community worldwide, the shan zhai make a practice of sharing information on the materials and construction of their products. This level of information sharing is unheard of in the commercial world, but it allows the rapid creation and iteration of products by a geographically distributed community. In doing so, this community relies on emergent properties of the process to distribute and improve product ideas and concepts, rather than the planned approach common to the large multinational companies that produce most consumer goods worldwide.

This blog post made a fascinating connection between the shan zhai and Situated Software, a concept floated by Clay Shirky several years ago. Situated software is software that is designed for use by a specific social group, rather than designed for generality. As such, it can be developed quickly, and usually iteratively, by the community by which it will ultimately be used, or at least with that community's direct participation in the development process. This approach has significant weaknesses from a commercial point of view, which often depends on scalability and re-use to maximize the profits resulting from the initial investment of engineering effort. It can also lead to software which must be continuously managed and maintained in order to stay useful and relevant. The recent surge in websites and interactive services that are focused on particular communities of use is an example of this situated approach. With the proliferation of rapid-development software tools, the easy interfacing made possible by REST web services, and the flexible and scalable hosting provided by inexpensive webhosts and even cloud providers like Amazon and Google, this kind of development has already resulted in a wide range of remarkable and community-specific software. Everywhere you look on the web, there are niche services and sites that are directed at very specific user communities, from artists to autopilot enthusiasts and community activists of all stripes.

Which brings me back around to bioinformatics. Bioinformatics is a classic case of software that is often developed for use either by or for a very small community of users- often just a single lab or individual scientist. Since bioinformatics software is regularly developed to support scientific research activities, and most scientific research activity is by necessity bespoke and unique, it is not surprising that a lot of bioinformatics software development is itself also bespoke and small-scale. When one looks at the mixed success of commercial, large-scale software development in bioinformatics, one can see why smaller-scale, situated efforts are often more successful in solving the day-to-day problems that face life scientists. Such efforts, taken as a whole, are often less efficient and more repetitive than those done in a more conventional fashion, but the costs associated with individual small-scale development can be lower than those of larger-scale efforts, since they rely upon graduate students, post-docs and small contractors, all of whom are notorious for their low cost.

The challenge, of course, is finding a path between situated, small-scale/low-cost development and the scalable, reusable data and systems upon which larger-scale efforts such as comparative effectiveness research, translational medicine and molecular clinical research can rely. The real payoff for bioinformatics and the research that it supports is that which leads to the development of new therapies and molecular markers for disease and treatments. Given the wide range of participants involved in such research, and their distribution both geographically and across disciplines, the need for scalable, secure and standardized systems begins to transcend what can be effectively (or cost-effectively) done by the small individual researcher acting alone. Now we look at the shan zhai. They succeed in a field which is notorious for standardization, large scale, and risk-averse behavior. Consumer electronics has long since passed the days of small, garage operations dominating the field. But the shan zhai have identified the market inherent in the long tail of consumer electronics purchasers, those that want or need something just a little different (even if what is different is the lower price implied by rampant IP violation - I am not advocating piracy, merely indicating that the shan zhai are able to nimbly respond to a market demand for it).

Key to the success of the shan zhai is their ability to share effectively with their colleagues throughout China and beyond. What they have recognized is that they can be more effective by reusing standardized components and data resources, and often improving them and sharing the result openly, than they can by fiercely protecting what they know as individuals or small companies. This leads to an important observation for those of us developing bioinformatics software and data resources. As we design and build standardized platforms and supporting infrastructure which can facilitate the development of bioinformatics software, it is critical to consider the means by which these tools can be flexibly and easily re-used by the many small developers that comprise the majority of the bioinformatics community. If the goal is to facilitate the development of standards-based software, and to ensure that the data collected by the community is made available using standardized and future-proofed components and representations, then we have an obligation to ensure that the tooling that we provide can be used by not only large cadres of professional software engineers, but also the graduate students and post-docs who are the bricoleurs constructing most of the software that is in use at any given time by the scientific community.

As we continue to develop software and systems that support the needs of translational research, comparative effectiveness research and other transdisciplinary medical and scientific work, we need to continue to ensure that, along with well-recognized standards, we provide equally standardized, easy-to-use interfaces that reflect the needs of those deploying the research solutions to the end-users. Those end-users are often just a single lab or even a single researcher, and the developer is a post-doc or graduate student who is doing his or her work using some of the simplest tools available- often no more than perl or PHP and a website. If we can support these folks with reusable modules that they understand, and provide a platform with which they can receive help and share results, we will have gone a long way toward creating our own shan zhai, and can begin to reap the rewards of doing so.

Wednesday, September 2, 2009

The PHIN Grid (and other great stuff)

I was at this year's Public Health Information Network (PHIN) conference again, held without fail in Atlanta (always on my kid's first week of school). This meeting is always interesting to me because of the breadth of participants and the depth and importance of the topics. This year, with H1N1 flu topmost in everyone's mind, there was a special urgency among many of the participants to get the informatics of many public health efforts operating as efficiently and effectively as possible. Key to these efforts was leveraging the products of many other programs, with federal and local informatics teams often making extensive use of open source tools and technologies- the resulting talks were both inspiring and cool.

I attended as many of the grid- and cloud-related talks as I could fit into the conference schedule, and was rewarded with a truly remarkable view of how the stakeholders throughout the PHIN enterprise have been able to leverage technology products from a wide range of programs to satisfy their unique public health requirements. It was inspiring to hear how the CDC has been using many of the tools developed by the National Cancer Institute's caBIG program, especially key parts of the caGrid infrastructure. Equally cool was how many of the key participants in the caBIG program have been directly involved in leveraging those capabilities in an entirely new setting. In particular, Tom Savel and his team talked at length in a number of sessions about using these Grid tools to implement a range of services, and about the challenges of using them in a public health setting, such as security and reliability. Hearing about how familiar tools like GAARDS, Grid Grouper and Introduce are being used in a new community was well worth the trip, as was hearing how facilities that the caBIG program has implemented, such as the Knowledge Centers, are providing important means of support in diverse settings of national (and even international) importance.

As much as I appreciate hearing the accolades and credit given to the caBIG program (and I do!), and as much as it is rewarding to see our community providing support to these important areas, I am reminded of how teams providing software and tools must continue to improve and iterate that software, and continue to ensure that the communities using these systems do not become detached from the processes used to create them. The Knowledge Centers and their staffs are clearly leading the way here, and it will be critical for them (and us) to continue to listen closely to what is happening "out there" and ensure that those needs get "in here." To that end, though, it is both inspiring and heartening to see that we have a growing, talented, committed and capable group who will not only consume the resources that the caBIG program's participants create, but who can also significantly contribute to the infrastructure's development. The challenge for caBIG is to come up with the most effective possible means of incorporating these contributions, and ensuring that they mesh with, support and inform the program going forward.

Onward and upward, PHIN! Every time I sneeze now, I am going to wonder if I can enter that event on some aspect of the emerging public health Grid.