pairs for large parts of the human genome. The magnitude of this task can be illustrated by noting that, just a few years ago, the largest DNA fragment that could be cloned was about 40,000 base pairs long, and theoretical calculations showed that the biggest overlapping sets of cloned DNAs of that size that could be attained spanned less than 200,000 base pairs.
In the last year, scientists at Washington University in St. Louis have unequivocally demonstrated that, using newer cloning techniques, DNA fragments several hundred thousand base pairs in size can be cloned and that overlapping sets that span more than 2 million base pairs can be achieved. Thus, we now know that the physical mapping goals that we set for ourselves are attainable.
Moreover, this work demonstrates some of the benefits that the improved technology and maps will offer. One of the genomic regions isolated as an overlapping clone set was the one containing the cystic fibrosis gene. This work was done by a single investigator in 6 months for less than $40,000. In contrast, the original isolation of the gene required the efforts of at least ten laboratories over the course of 8 years. Furthermore, using the newer technique, the gene was isolated intact, surrounded by 20 times its length of normal genetic material.
Other notable achievements of the past year include the development of an alternative system for cloning large DNA fragments, based on the P1 bacteriophage, and the development of new, useful classes of markers that will speed the construction and increase the usefulness of the human genetic linkage map.
HUMAN GENOME PROJECT BENEFITS SCIENCE
Question. Tell me, how is the Human Genome Project benefitting science right now? Do we have to wait for the project to be completed before the information is available?
Answer. No, we certainly do not have to wait until the Human Genome Project is completed to reap its benefits. Significant advances are already apparent. There has already been a marked improvement in the underlying technologies, such as cloning and mapping techniques. For example, genome scientists have constructed markedly better collections of cloned DNA: better in the sense that the individual members of the collections are much larger than were previously available and the collections themselves are more complete. From these improved libraries, a number of clones containing interesting genes have been isolated, including genes associated with diseases such as cystic fibrosis, hemophilia, and a form of X-linked mental retardation. Technological improvements have already been noted in the area of DNA sequencing as well. The first generation of automated sequencing machines has been brought into routine use. The continued development of better sequencing machines, as well as the adaptation of robots to take over much of the repetitive preparative work, is now an area receiving much attention.
In terms of new data, a key characteristic of the genome project, which distinguishes it from many other identifiable scientific projects, is its incremental nature. Every time a new marker is mapped, the maps improve and immediately become more useful to the general scientific community. Improved genetic maps of several human chromosomes have become available in the past year.
Thus, from the moment the first designated genome funds were spent, the project began returning the kinds of results it promised. I am confident that benefits such as these will continue to appear throughout the course of the project, and I expect they will do so at an increasing pace.
COST OF SEQUENCING
Question. I understand sequencing of DNA is still very expensive. How much does it currently cost and what are you doing to bring the cost down?
Answer. The rate of accumulation of DNA sequences in GenBank, the DNA sequence databank, continues to increase exponentially. The increase is largely due to the improved efficiency and reliability of the commercial automated sequencers and, as a result, the greater number of scientists who are producing sequence data. There is promising technology supported by NCHGR that I expect will improve the rate and decrease the cost of sequencing by commercially available machines in the next 5 years. Using capillary gel electrophoresis, NCHGR-supported scientists have been able to increase the number of base pairs that can be determined in a given time by a factor of about 10 compared to current methods. The use of mass spectrometry to detect DNA fragments during sequencing is also being developed. The advantage of such a scheme is that smaller amounts of DNA can be sequenced.
Recently, the sequences of several very large segments of DNA (45 to 100 kilobases in length) were submitted to the databanks by NCHGR-supported investigators. While these are not the first sequences of this size, the number of such sequences is clearly increasing, indicating that large-scale sequencing is improving.
The current cost of sequencing ranges between $2 and $5 per base pair, depending on the laboratory. It is difficult at present to predict whether the program will attain the 5-year goal of reducing the cost of sequencing to $0.50 per base pair. We are using a two-faceted strategy to approach this problem. First, we are supporting completely new technologies, which will dramatically improve efficiency and reduce costs, but may take 5 to 10 years to become commercially available. Second, we support several pilot sequencing projects that have the goal of producing one million base pairs of finished sequence in the next 3 years by scaling up and making incremental improvements in current methods. These projects are sequencing the genomes of important model organisms, namely the roundworm (C. elegans) and two bacteria (E. coli and Mycoplasma capricolum), as well as biologically important regions of the human genome (the T-cell receptor region). We expect that using these approaches we will be able to significantly reduce costs over the next 5 years.
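To put the quoted rates in concrete terms, the short Python sketch below multiplies the one-million-base-pair pilot target by the $2, $5, and $0.50 per-base-pair figures given in the answer above. It is a hypothetical illustration, not an NCHGR costing model: it assumes a flat per-base-pair rate, and the names `sequencing_cost`, `RATES`, and `PILOT_GOAL_BP` are invented for this example.

```python
# Hypothetical illustration: the rates and the one-million-base-pair target
# come from the testimony; the flat per-base-pair cost model is an assumption.
PILOT_GOAL_BP = 1_000_000  # one million base pairs of finished sequence

RATES = {
    "current, low end": 2.00,   # dollars per base pair
    "current, high end": 5.00,  # dollars per base pair
    "5-year goal": 0.50,        # dollars per base pair
}


def sequencing_cost(base_pairs: int, dollars_per_bp: float) -> float:
    """Total cost in dollars, assuming a flat per-base-pair rate."""
    return base_pairs * dollars_per_bp


if __name__ == "__main__":
    for label, rate in RATES.items():
        print(f"{label:>17}: ${sequencing_cost(PILOT_GOAL_BP, rate):,.0f}")
    # Fold reduction needed to move from the current range down to the goal.
    goal = RATES["5-year goal"]
    print(f"required reduction: {RATES['current, low end'] / goal:.0f}x to "
          f"{RATES['current, high end'] / goal:.0f}x")
```

At these rates, the pilot target would cost roughly $2 million to $5 million today versus $500,000 at the goal, which is a 4- to 10-fold reduction.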
CONTRIBUTIONS OF THE HUMAN GENOME RESEARCH CENTERS
Question. The FY 1992 budget requests funds for 11 human genome research centers. How will these centers contribute to the Human Genome Project?
Answer. The human genome research centers are essential to the goals of the Human Genome Project, because they represent the interdisciplinary groups that are needed to achieve major objectives such as the mapping of a complete human chromosome. Each center will also have an outreach component through which it will establish collaborations with other projects and make information and materials available to the community as a whole.
The two overall goals of the human genome research centers funded so far are mapping and technology development. Three centers (Washington University, University of California at San Francisco, and Baylor College of Medicine) have chosen as their goal the development of a physical and genetic map of one or more human chromosomes. The center at the Massachusetts Institute of Technology will develop the physical and genetic map of the mouse. The other two centers (University of Michigan and University of Utah) have chosen as their goals the development of technology for mapping, sequencing, and gene identification. All of the centers, however, include significant efforts in various aspects of technology development. Additional centers that are funded will have similar goals of mapping and technology development.
Question. What are your plans for sharing genome data with the rest of the scientific community?
Answer. This is a very important question because the raison d'être of the Human Genome Project is the generation of an extraordinarily valuable set of tools for use by the entire biomedical research community. Furthermore, because mapping and sequence data will be generated by the genome project incrementally and each new data item will be useful as soon as it is produced, it is vital that we develop means for distributing the data rapidly and widely. Thus, in addition to the traditional route of publication, we are employing a number of other devices to facilitate sharing of genome data with the scientific community. We will be supporting public databases where all the information will be collected and made easily accessible to all who want it.
In many of our grants, including all human genome research center grants, we are also supporting "outreach" activities in addition to providing research support. Outreach activities include distribution of materials to all investigators who request them, collaboration with outside investigators, training opportunities, and opportunities for visiting investigators to use the most up-to-date equipment and technology on site. In addition, we are encouraging applicants for NCHGR support to develop new and innovative means for making data and materials widely available prior to publication. In many cases, the adequacy of an investigator's distribution plan is taken into account as a criterion for award of a grant.
Senator HARKIN. Dr. Lindberg, I hate to ask you to stay. Well, let's see. I have 8 minutes left.
Dr. Lindberg, your request is $100.5 million, 10 percent more than last year. Could you just give us a brief summary? I would hate to have you sit here for another 15 minutes while I go vote.
Dr. LINDBERG. I would be happy to be brief. I am very pleased to be here, Senator Harkin.
The National Library of Medicine's task is to acquire, organize, and disseminate the information that results from the discoveries and the funding of the rest of NIH and the biomedical community.
Our work has gone very well in the last year. Compared with previous years, there were more papers published and indexed, more books published and cataloged, and more online computer inquiries. There are now 5 million such inquiries a year.
In addition, there are certain initiatives that this committee and the House have encouraged that I should report to you. In 1988, this committee gave us specific language that said I should publicize the products and services of the National Library of Medicine. That has been tremendously helpful to us. That, combined with an outreach study by Dr. Michael DeBakey and 20 colleagues, has set us on a course of putting to excellent use the additional funds that the Congress has provided specifically for outreach. There are now about 50 programs in 38 States. So, we are trying our level best to make sure all of those products and services become available to every health care professional in the United States.
NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION
Even though time is short, I should report to you about the National Center for Biotechnology Information (NCBI), also created by congressional action. Among other things, the NCBI has the responsibility to be sure that the sequencing and mapping information that results from the human genome program is available as quickly and as efficaciously as possible to the investigators who participate in the program and to those health care practitioners who will ultimately use and benefit from it. All that has been going extremely well. There are now 28 remarkably talented young professionals in the center doing this work under Dr. David Lipman's direction.
In terms of breakthroughs, I think it is interesting that in the course of the last year, the NCBI staff have been able to move from the Cray supercomputers at Frederick, which the National Cancer Institute happily let us use, to doing the same extensive calculations on smaller machines, Silicon Graphics microparallel processors. The net result is that in the NCBI laboratory, they can take any sequence, compare it against all 40 million sequences held in the data bank, and get an answer in 5 seconds. Even better, they can offer this same service (log-in, printout, and so forth) to any collaborator across the country who is attached to the Internet, with results in 15 seconds.
HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS
Now, that brings me to the last point, Mr. Chairman. Probably the most exciting thing that has happened for us in the last year has been our participation last summer in the Office of Science and Technology Policy's White House program on High Performance Computing and Communications. That is one of the Presidential initiatives. Accompanying the President's budget is this publication that describes the program. I will leave you one.
This is a major effort, $150 million of new money this year on top of $680 million a year already being spent by the four major science agencies: DARPA in the Department of Defense, Department of Energy, NASA, and NSF. During this last year we have joined that effort, I am proud to say. In fact, NLM is the only biomedical component. So, in a sense we are representing as strongly as we can, a token for NIH, a token for HHS, and a kind of a pale representation of what should be the entire biomedical enterprise.
But at least we are participating and trying to develop biomedical applications and uses for these new systems. Dr. Bromley, head of the Office of Science and Technology Policy, and Erich Bloch, the former head of NSF, have said that we must plan for community benefits from computers that have not yet been invented, using networks that have not yet been developed. But, of course, there are prototypes in both cases that are very much better than anything most investigators have. It is our task to be sure that the biomedical institutions get ready to use the new capabilities and use them wisely.
The High Performance Computing and Communications initiative will also work effectively with the human genome project. The rapid analyses that I described as possible today will have to be 1,000 times faster in the future, as Dr. Watson pointed out. We are not really behind now in analytic capability, but we know we have to do 1,000 times better in the future or the whole system won't work.
I do appreciate the past support of the Congress for NLM. This year's President's budget is $100,554,000, as you mentioned.
I would be pleased to respond to questions if your time permits. [The statement follows:]