Molecular Nutrition
Edited by
Janos Zempleni
University of Nebraska-Lincoln, USA
and
Hannelore Daniel
Technical University of Munich, Germany
CABI Publishing
CABI Publishing is a division of CAB International

CABI Publishing, CAB International, Wallingford, Oxon OX10 8DE, UK
Tel: +44 (0)1491 832111; Fax: +44 (0)1491 833508; E-mail: [email protected]
Web site: www.cabi-publishing.org

CABI Publishing, 44 Brattle Street, 4th Floor, Cambridge, MA 02138, USA
Tel: +1 617 395 4056; Fax: +1 617 354 6875; E-mail: [email protected]
©CAB International 2003. All rights reserved. No part of this publication may be reproduced in any form or by any means, electronically, mechanically, by photocopying, recording or otherwise, without the prior permission of the copyright owners.
A catalogue record for this book is available from the British Library, London, UK.
Library of Congress Cataloging-in-Publication Data Molecular nutrition/edited by Janos Zempleni and Hannelore Daniel.
p. ; cm.
Includes bibliographical references and index.
ISBN 0-85199-679-5 (alk. paper)
1. Nutrient interactions. 2. Nutrition. 3. Molecular biology.
[DNLM: 1. Nutrition. 2. Molecular Biology. QU 145 M718 2002]
I. Zempleni, Janos. II. Daniel, Hannelore.
QP143.7 .M656 2002 517.6--dc21
2002154031
ISBN 0 85199 679 5
Typeset by AMA DataSet Ltd, UK
Printed and bound in the UK by Biddles Ltd, Guildford and King’s Lynn
Contents
Contributors
Preface
PART 1: METHODS IN MOLECULAR NUTRITION RESEARCH
1 Genomics and Beyond
Ji Zhang
2 Perspectives in Post-genomic Nutrition Research
Hannelore Daniel
PART 2: CELLULAR NUTRIENT HOMEOSTASIS, PROLIFERATION AND APOPTOSIS
3 Molecular Physiology of Plasma Membrane Transporters for Organic Nutrients
Hannelore Daniel
4 Intracellular Trafficking and Compartmentalization of Vitamins and Their Physiologically Active Forms
Donald B. McCormick
5 Nutrient Homeostasis in Proliferating Cells
Janos Zempleni
6 Nutrients and Apoptosis
John C. Mathers
PART 3: ROLES FOR NUTRIENTS IN SIGNAL TRANSDUCTION, GENE EXPRESSION AND PROTEOLYSIS
7 Glucose Regulation of Gene Expression in Mammals
Fabienne Foufelle and Pascal Ferré
8 Amino Acid-dependent Control of Transcription in Mammalian Cells
Michael S. Kilberg, Van Leung-Pineda and Chin Chen
9 Fatty Acids and Gene Expression
Ulrike Beisiegel, Joerg Heeren and Frank Schnieders
10 Role of RARs and RXRs in Mediating the Molecular Mechanism of Action of Vitamin A
Dianne R. Soprano and Kenneth J. Soprano
11 Regulation of Gene Expression by Biotin, Vitamin B6 and Vitamin C
Krishnamurti Dakshinamurti
12 Selenium and Vitamin E
Alexandra Fischer, Josef Pallauf, Jonathan Majewicz, Anne Marie Minihane and Gerald Rimbach
13 Sphingolipids: a New Strategy for Cancer Treatment and Prevention
Eva M. Schmelz
14 The Health Effects of Dietary Isoflavones
Thomas M. Badger, Martin J.J. Ronis and Nianbai Fang
15 Mechanisms of Ubiquitination and Proteasome-dependent Proteolysis in Skeletal Muscle
Didier Attaix, Lydie Combaret, Anthony J. Kee and Daniel Taillandier
PART 4: NUCLEIC ACIDS AND NUCLEIC ACID-BINDING COMPOUNDS
16 Diet, DNA Methylation and Cancer
Judith K. Christman
17 Biotinylation of Histones in Human Cells
Janos Zempleni
18 Niacin Status, Poly(ADP-ribose) Metabolism and Genomic Instability
Jennifer C. Spronck and James B. Kirkland
PART 5: MOLECULAR EVENTS AFFECT PHYSIOLOGY
19 Assembly of Triglyceride-transporting Plasma Lipoproteins
Joan A. Higgins
20 Regulation of Cellular Cholesterol
Ji-Young Lee, Susan H. Mitmesser and Timothy P. Carr
21 2002 Assessment of Nutritional Influences on Risk for Cataract
Allen Taylor and Mark Siegal
22 Nutrition and Immune Function
Parveen Yaqoob and Philip C. Calder
PART 6: FOODS
23 Molecular Mechanisms of Food Allergy
J. Steven Stanley and Gary A. Bannon
24 Safety Assessment of Genetically Modified Foods
Steve L. Taylor
Index
Contributors
D. Attaix, Nutrition and Protein Metabolism Unit, INRA de Theix, 63122 Ceyrat, France.
T.M. Badger, Department of Pediatrics, Arkansas Children’s Hospital Research Institute, Slot 512-20, 1120 Marshall Street, Little Rock, AR 72202, USA.
G.A. Bannon, Product Safety Center, Monsanto Company, 800 N. Lindbergh Boulevard, St Louis, MO 63167, USA.
U. Beisiegel, Abteilung für Molekulare Zellbiologie, Institut für Biochemie und Molekularbiologie, Universitätsklinikum Hamburg-Eppendorf, Martinistraße 52, D-20246 Hamburg, Germany.
P.C. Calder, University of Southampton School of Medicine, Fetal Origin of Adult Disease Division, Institute of Human Nutrition, Highfield, Southampton SO17 1BJ, UK.
T.P. Carr, Department of Nutritional Science and Dietetics, University of Nebraska-Lincoln, 316 Ruth Leverton Hall, Lincoln, NE 68583-0806, USA.
C. Chen, Department of Biochemistry and Molecular Biology, University of Florida College of Medicine, Box 100245, JHMHC, Gainesville, FL 32610-0245, USA.
J.K. Christman, Stokes-Shackleford Professor and Chair, Department of Biochemistry and Molecular Biology and Eppley Cancer Center, University of Nebraska Medical Center, 984525 University Medical Center, Omaha, NE 68198-4525, USA.
L. Combaret, Nutrition and Protein Metabolism Unit, INRA de Theix, 63122 Ceyrat, France.
K. Dakshinamurti, Department of Biochemistry and Molecular Biology, University of Manitoba, 770 Bannatyne Avenue, Room 305 Basic Sciences Building, Winnipeg, Manitoba R3E 0W3, Canada.
H. Daniel, Department of Food and Nutrition, Technical University of Munich, Hochfeldweg 2, D-85350 Freising-Weihenstephan, Germany.
N. Fang, Department of Pediatrics, Arkansas Children’s Hospital Research Institute, Slot 512-20, 1120 Marshall Street, Little Rock, AR 72202, USA.
P. Ferré, U465 INSERM, French Institute of Health and Medical Research, Centre de Recherches Biomedicales des Cordeliers, 15 rue de l’Ecole de Medicine, 75270 Paris Cedex 06, France.
A. Fischer, Institute of Animal Nutrition and Nutrition Physiology, Justus-Liebig-University, Giessen, Germany.
F. Foufelle, U465 INSERM, French Institute of Health and Medical Research, Centre de Recherches Biomedicales des Cordeliers, 15 rue de l’Ecole de Medicine, 75270 Paris Cedex 06, France.
J. Heeren, Abteilung für Molekulare Zellbiologie, Institut für Biochemie und Molekularbiologie, Universitätsklinikum Hamburg-Eppendorf, Martinistraße 52, D-20246 Hamburg, Germany.
J.A. Higgins, Department of Molecular Biology and Biotechnology, University of Sheffield, Firth Court, Western Bank, Sheffield S10 2TN, UK.
A.J. Kee, Muscle Development Unit, Children’s Medical Research Institute, Locked Bag 23, Wentworthville, NSW 2145, Australia.
M.S. Kilberg, Department of Biochemistry and Molecular Biology, University of Florida College of Medicine, Box 100245, JHMHC, Gainesville, FL 32610-0245, USA.
J.B. Kirkland, Department of Human Biology and Nutritional Sciences, University of Guelph, 335 Animal Science and Nutrition, Guelph, Ontario N1G 2W1, Canada.
J.-Y. Lee, Department of Nutritional Science and Dietetics, University of Nebraska-Lincoln, 316 Ruth Leverton Hall, Lincoln, NE 68583-0806, USA.
V. Leung-Pineda, Department of Biochemistry and Molecular Biology, University of Florida College of Medicine, Box 100245, JHMHC, Gainesville, FL 32610-0245, USA.
J. Majewicz, School of Food Biosciences, Hugh Sinclair Human Nutrition Unit, University of Reading, Whiteknights, PO Box 226, Reading RG6 6AP, UK.
J.C. Mathers, Human Nutrition Research Centre, School of Clinical Medical Sciences, Faculty of Agriculture and Biological Sciences, University of Newcastle upon Tyne, Newcastle upon Tyne NE1 7RU, UK.
D.B. McCormick, Fuller E. Callaway Professor Emeritus, Emory University School of Medicine, Department of Biochemistry, 4013 Rollins Research Center, Atlanta, GA 30322-3050, USA.
A.M. Minihane, School of Food Biosciences, Hugh Sinclair Human Nutrition Unit, University of Reading, Whiteknights, PO Box 226, Reading RG6 6AP, UK.
S.H. Mitmesser, Department of Nutritional Science and Dietetics, University of Nebraska-Lincoln, 316 Ruth Leverton Hall, Lincoln, NE 68583-0806, USA.
J. Pallauf, Institute of Animal Nutrition and Nutrition Physiology, Justus-Liebig-University, Giessen, Germany.
G.H. Rimbach, School of Food Biosciences, Hugh Sinclair Human Nutrition Unit, University of Reading, Whiteknights, PO Box 226, Reading RG6 6AP, UK.
M.J.J. Ronis, Department of Pediatrics, Arkansas Children’s Hospital Research Institute, Slot 512-20, 1120 Marshall Street, Little Rock, AR 72202, USA.
E.M. Schmelz, Karmanos Cancer Institute, Wayne State University, HWCRC 608, 110 East Warren Avenue, Detroit, MI 48201, USA.
F. Schnieders, Abteilung für Molekulare Zellbiologie, Institut für Biochemie und Molekularbiologie, Universitätsklinikum Hamburg-Eppendorf, Martinistraße 52, D-20246 Hamburg, Germany.
M. Siegal, Laboratory for Nutrition and Vision Research, USDA Human Nutrition Research Center on Aging, Tufts University, 711 Washington St, Boston, MA 02111, USA.
D.R. Soprano, Temple University School of Medicine, Room 401 MRB, 3307 North Broad Street, Philadelphia, PA 19140, USA.
K.J. Soprano, Temple University School of Medicine, Room 401 MRB, 3307 North Broad Street, Philadelphia, PA 19140, USA.
J.C. Spronck, Department of Human Biology and Nutritional Sciences, University of Guelph, 335 Animal Science and Nutrition, Guelph, Ontario N1G 2W1, Canada.
J.S. Stanley, Department of Pediatrics, Division of Pediatric Allergy and Immunology, University of Arkansas for Medical Sciences and Arkansas Children’s Hospital Research Institute, 4301 W Markham St, Slot 512-13, Little Rock, AR 72205, USA.
D. Taillandier, Nutrition and Protein Metabolism Unit, INRA de Theix, 63122 Ceyrat, France.
A. Taylor, Laboratory for Nutrition and Vision Research, USDA Human Nutrition Research Center on Aging, Tufts University, 711 Washington St, Boston, MA 02111, USA.
S.L. Taylor, Department of Food Science and Technology, 143 Food Industry Building, University of Nebraska-Lincoln, Lincoln, NE 68583-0919, USA.
P. Yaqoob, School of Food Biosciences, University of Reading, Whiteknights, PO Box 226, Reading RG6 6AP, UK.
J. Zempleni, Department of Nutritional Science and Dietetics, University of Nebraska-Lincoln, 316 Ruth Leverton Hall, Lincoln, NE 68583-0806, USA.
J. Zhang, Center for Human Molecular Genetics, Munroe-Meyer Institute, University of Nebraska Medical Center, 985455 Nebraska Medical Center, Omaha, NE 68198-5454, USA.
Preface
Molecular biology has provided us with many powerful tools and techniques over the past 20–30 years, leading to the emergence of molecular nutrition as a new cornerstone in nutrition research. Molecular nutrition investigates roles for nutrients at the molecular level, such as signal transduction, gene expression and covalent modifications of proteins. Research findings that have been generated by using molecular techniques have guided us into new territory, going way beyond classical nutrition studies such as characterization of clinical signs of nutrient deficiencies. Those nutritionists who investigate effects of nutrients at the molecular level have long recognized what an exciting field this is.
In this book, we sought to capture some of the excitement in the field of molecular nutrition. We were lucky enough to recruit chapter authors who play leading roles within their disciplines. These individuals gladly shared their insights into various aspects of molecular nutrition. A book like this one cannot be comprehensive: the editors had to select from a large number of nutrients and an even larger number of effects of nutrients at the molecular level. For each topic that is included in this book, another equally important topic may have been left out. As editors, we apologize to those nutritionists who do not find ‘their’ nutrient in this book.
We sought to cover a broad range of research in molecular nutrition. The first two chapters by J. Zhang and H. Daniel cover technical advances that have been made in genomics and post-genomics, and the promise that these technologies hold for future nutrition research. Next, H. Daniel, D.B. McCormick and J. Zempleni tell us how nutrients enter cells, how they are being targeted to their sites of action in cells and how physiological processes such as cell proliferation affect nutrient transport.
This section is completed by a chapter by J.C. Mathers in which he reviews the roles for nutrients in apoptosis. A series of fine chapters by F. Foufelle and P. Ferré, M.S. Kilberg et al., U. Beisiegel et al., D.R. Soprano and K.J. Soprano, K. Dakshinamurti, A. Fischer et al., E.M. Schmelz, T. Badger et al., and D. Attaix et al. provides examples for roles of macro- and micronutrients in signal transduction, gene expression and proteolysis. J.K. Christman, J. Zempleni, and J. Kirkland and J.C. Spronck contributed reviews of nutrient-dependent modifications of nucleic acids and nucleic acid-binding proteins.
Of course, molecular nutrition cannot afford to be a self-serving science. Chapters on regulation of lipoprotein assembly (J.A. Higgins), cellular cholesterol metabolism (J.-Y. Lee et al.), oxidative stress (A. Taylor and M. Siegal) and immune function (P. Yaqoob and P.C. Calder) were selected to demonstrate how events at the molecular level can be integrated into multiorgan and whole-body metabolic pathways.
Finally, J.S. Stanley and G. Bannon, and S.L. Taylor share their thoughts with us on molecular mechanisms of food allergy and on genetically modified foods, respectively.
Our thanks go to all the scientists who contributed chapters for this book. These individuals created time in their busy schedules to produce chapters of outstanding quality. Also, we thank Rebecca Stubbs from CABI Publishing for her continued support during the preparation of this book.
Hannelore Daniel Janos Zempleni
1 Genomics and Beyond
Ji Zhang 1,2
1 Department of Pathology and Microbiology, University of Nebraska Medical Center;
2 Center for Human Molecular Genetics, Munroe-Meyer Institute, University of Nebraska Medical Center, Omaha, Nebraska, USA
Introduction
Dramatic advances in genome research in recent years will facilitate the precise determination of molecular mechanisms underlying human health and disease, and thus offer great potential for promoting health, lowering mortality and morbidity, and preventing disease. Nutritional science can benefit greatly from understanding the molecular mechanisms that cause heterogeneous responses to nutrient intake observed in healthy adults. Therefore, it is of considerable value for nutrition scientists to gain knowledge of essential technologies and resources of genome research. Originally, genomics referred to the scientific discipline of mapping, sequencing and analysis of an organism’s genome, the entire set of genes and chromosomes. Now, the emphasis of genomics has undergone a transition from structural analysis of the genome (structural genomics) to functional analysis of the genome (functional genomics). Structural genomics aims to construct high-resolution genetic, physical and transcriptional maps of an organism and ultimately to determine its entire DNA sequence. Functional genomics, however, represents a new phase of genome research, referring to the development of innovative technologies based on the vast amount of structural genomics information. The first section of this chapter will focus on tools and reagents utilized in structural genomics; the second section will focus on DNA microarray technology, a representative of today’s functional genomics.
Structural Genomics
Genome, genetic mapping and physical mapping
The human genome is composed of approximately 3 billion nucleotide base pairs, carrying genetic codes for 30,000–100,000 genes. The DNA of the diploid genome is organized into 22 pairs of autosomes and two sex chromosomes. Each chromosome is thought to contain one linear DNA molecule featuring three functional elements required for the successful duplication of the chromosome upon cell division: (i) autonomous replication sequences; (ii) the centromere, to which the mitotic or meiotic spindle attaches; and (iii) the telomere, which ensures the complete replication of the chromatid at its ends. A map of the genome defines the relative position of different loci (genes, regulatory sequences, polymorphic marker sequences, etc.) on the DNA molecules. Two distinct approaches are used to map the loci of the genome: genetic mapping and physical mapping.
© CAB International 2003. Molecular Nutrition (eds J. Zempleni and H. Daniel)
When the distance between two loci is measured by the meiotic recombination frequency, a genetic map is constructed. The closer two loci are on the DNA molecule, the more likely it is that they are inherited together. The term synteny refers to loci that reside on the same chromosome; these loci are not necessarily linked. Loci located on different chromosomes, or far apart on the same chromosome, segregate independently, with a recombination frequency of 50%. Genetic distances are expressed in percentage recombination or centiMorgans (cM). One cM equals 1% recombination and corresponds to approximately 0.8 × 10^6 base pairs (bp). The genetic map of the human genome spans roughly 3700 cM.
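The unit conversions quoted above can be checked with a short calculation. The following Python sketch is illustrative only: the function and constant names are hypothetical, the 0.8 × 10^6 bp per cM figure is the genome-wide average given in the text, and the 1% = 1 cM rule is assumed to hold only for closely linked loci.

```python
# Average figures quoted in the text; local values vary along the genome.
BP_PER_CM = 0.8e6             # approximately 0.8 x 10^6 bp per centiMorgan
GENETIC_MAP_LENGTH_CM = 3700  # approximate length of the human genetic map


def recombination_to_cm(recombinants: int, meioses: int) -> float:
    """Convert an observed recombination fraction to cM (1% recombination = 1 cM).

    Assumption: the loci are close enough that multiple crossovers can be ignored.
    """
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses and meioses > 0")
    return 100.0 * recombinants / meioses


def cm_to_bp(distance_cm: float) -> float:
    """Estimate physical distance from genetic distance using the average ratio."""
    return distance_cm * BP_PER_CM


# 3 recombinants in 100 informative meioses is 3 cM, or about 2.4 million bp.
distance = recombination_to_cm(3, 100)
print(distance, cm_to_bp(distance))

# The whole genetic map corresponds to about 2.96 x 10^9 bp,
# in line with the roughly 3 billion bp quoted for the human genome.
print(cm_to_bp(GENETIC_MAP_LENGTH_CM))
```

Because recombination frequency varies locally, such an estimate is only a rough guide for any particular chromosome region.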
Genetic maps are constructed by using linkage analysis: the analysis of the segregation of polymorphic markers in pedigrees. The first genetic markers used were phenotypic traits (e.g. colour blindness) and protein polymorphisms. However, polymorphic markers of these types are rare. Detailed genetic maps were made possible by the discovery of highly polymorphic sequences in the genome, typically represented by microsatellite sequences; the latter contain variable numbers of small tandem repeats of di-, tri- or tetranucleotides (Litt and Luty, 1989; Stallings et al., 1991). Linkage analysis is a statistical method and its resolution also depends on the number of informative pedigrees.
Unlike genetic mapping, physical mapping directly measures the distance between two loci on the linear DNA molecule in nucleotide base pairs. Therefore, the complete sequence of the genome is considered the ultimate physical map. Both genetic and physical maps result in an identical order of genes, but the relative distance between genes can vary widely due to local variations in recombination frequency. Molecular biology provides the instruments needed to construct the physical map. At the most detailed level, the nucleotide sequence of a cloned gene can be determined. At a lower level of resolution, the distance between restriction sites is quantified in units of base pairs. Restriction sites are short sequences (typically 4–8 bp) recognized and cleaved specifically by class II restriction enzymes. The analysis of the presence of restriction sites yields a restriction map that typically comprises 10–100 kilobases (kb). Genomic fragments of this size can be cloned and manipulated using plasmids, phages, cosmids, BACs (bacterial artificial chromosomes) (Shizuya et al., 1992), PACs (P1 artificial chromosomes) (Ioannou et al., 1994) or YACs (yeast artificial chromosomes) as vectors (Burke et al., 1987).
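The idea of a restriction map can be made concrete with a small sketch. The Python example below is not taken from the chapter; it assumes a linear DNA molecule, uses the well-known EcoRI recognition sequence GAATTC (cleaved after the first G) as an example enzyme, and the function names and the toy sequence are hypothetical.

```python
from typing import List


def find_sites(sequence: str, site: str) -> List[int]:
    """Return the 0-based start position of every occurrence of `site`."""
    sequence, site = sequence.upper(), site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def fragment_lengths(sequence: str, site: str, cut_offset: int) -> List[int]:
    """Lengths (bp) of fragments after complete digestion of a linear molecule.

    `cut_offset` is the number of bases from the start of the site to the cut.
    """
    cuts = [p + cut_offset for p in find_sites(sequence, site)]
    boundaries = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


# Toy 19 bp sequence with two EcoRI sites (G^AATTC, so cut_offset = 1).
toy = "AAGAATTCGGGGAATTCTT"
print(find_sites(toy, "GAATTC"))           # [2, 11]
print(fragment_lengths(toy, "GAATTC", 1))  # [3, 9, 7]
```

The distances between successive cut positions are what a restriction map records; real maps are built from sequences or fragments many kilobases long.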
Tools and reagents utilized in physical mapping
Mapping using human–rodent hybrid panels
Somatic cell hybrids are tools utilized to map human genes physically on to chromosomes. Fusion of human and rodent cells and the subsequent culture selection results in stable hybrids, which usually contain a complete set of rodent chromosomes and a few human chromosomes. A panel of such hybrids allows for localization of a human gene or gene product on a specific human chromosome (Zhang et al., 1990). This approach can be used to develop synteny maps. To localize a gene in subchromosome regions, however, requires special hybrids containing only part of a human chromosome. This has been achieved by various deletion mapping approaches, including microcell-mediated subchromosome transfer or fusion using donor cells with specific translocations or interstitial deletions (Zhang et al., 1989a). The widely used radiation hybrid approach is also based on this deletion mapping concept (Walter et al., 1994). Here, a single human chromosome is hybridized with rodent chromosomes and cleaved into smaller fragments by using X-rays. Fusion of this irradiated cell population with a rodent cell and the subsequent selection of human chromosome materials generates a panel of deletion hybrids containing different fragments of the human chromosome. Collections of radiation-reduced hybrids generated from each of 24 human chromosome-specific rodent hybrids produce a complete human radiation hybrid mapping panel. This facilitates the rapid mapping of genes, ESTs (expressed sequence tags), polymorphic DNA markers and STSs (sequence-tagged sites) (Olson et al., 1989; Weber and May, 1989) to subchromosome regions using polymerase chain reaction (PCR) (Deloukas et al., 1998).
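The logic of assigning a gene with a hybrid panel can be sketched as a concordance analysis: the gene is assigned to the human chromosome whose presence or absence across the hybrids matches the presence or absence of the gene signal. The Python example below is a minimal illustration under that assumption; the panel, the hybrid names and the function names are invented for the example and do not come from the cited studies.

```python
from typing import Dict, List, Set, Tuple


def concordance(gene_signal: Dict[str, bool],
                panel: Dict[str, Set[str]],
                chromosome: str) -> float:
    """Fraction of hybrids in which gene and chromosome are both present or both absent."""
    matches = sum(
        1 for hybrid, chromosomes in panel.items()
        if (chromosome in chromosomes) == gene_signal[hybrid]
    )
    return matches / len(panel)


def assign_chromosome(gene_signal: Dict[str, bool],
                      panel: Dict[str, Set[str]]) -> Tuple[List[str], float]:
    """Return the best-matching chromosome(s) and their concordance score."""
    candidates = sorted({c for chromosomes in panel.values() for c in chromosomes})
    scores = {c: concordance(gene_signal, panel, c) for c in candidates}
    best = max(scores.values())
    return [c for c in candidates if scores[c] == best], best


# Hypothetical panel: human chromosomes retained by each hybrid cell line.
panel = {
    "hybrid_A": {"1", "7", "21"},
    "hybrid_B": {"7", "X"},
    "hybrid_C": {"1", "21"},
    "hybrid_D": {"2", "7"},
}
# Hypothetical result of testing each hybrid for the human gene (e.g. by PCR).
gene_signal = {"hybrid_A": True, "hybrid_B": True, "hybrid_C": False, "hybrid_D": True}

print(assign_chromosome(gene_signal, panel))  # (['7'], 1.0)
```

In practice, panels are designed so that each chromosome has a unique retention pattern, and imperfect concordance is interpreted statistically rather than by an exact match.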
Mapping using chromosome in situ hybridization
Genes can be assigned directly to chromosome regions by in situ hybridization, in which a DNA probe containing the sequence of interest is labelled, denatured and hybridized to its complementary chromosomal DNA from denatured metaphase spreads on a glass slide. Traditionally, the probe is labelled by the incorporation of [3H]nucleotides, and the post-hybridization detection is conducted by autoradiography, in which the chromosome spreads are overlaid with a liquid emulsion, exposed and developed. Silver signals appear near the radioactive probe and therefore highlight the chromosome location of the DNA sequence of interest (Marynen et al., 1989; Zhang et al., 1989b). The major drawback of this radioactive in situ hybridization is the time needed for one experiment (typically 2–3 weeks). Also, the detection usually yields a relatively high background, and the assignment of a chromosome locus occurs on a statistical basis.
Fluorescence in situ hybridization (FISH) circumvents these problems. The probe is labelled by incorporation of nucleotides carrying a hapten (e.g. biotin), and detection is performed using a fluorescence-conjugated binding reagent (e.g. avidin or an antibody).
Background noise is low, and most metaphase chromosomes show four specific fluorescent signals (one for each chromatid on the two homologous chromosomes). The sensitivity of this procedure is usually low, and thus it requires the use of large genomic probes (ideally >10 kb). Genomic phage (15 kb), cosmid (40 kb), PACs (80–135 kb), BACs (130 kb) and even YACs (200 kb to 2 Mb) can be utilized as probes for FISH mapping (Hardas et al., 1994).
Genome mapping and disease gene isolation
One of the most prominent applications of genetic mapping is reverse genetics, i.e. mapping of a genetic disease without knowing the underlying biochemical basis. On the basis of genetic and physical mapping, it is possible to map any Mendelian phenotypic trait. Linkage of a disease locus to a mapped genetic marker allows for the diagnosis of disease and identification of carriers if informative pedigrees are available. Once a disease locus has been mapped genetically, physical mapping can be applied to identify the disease gene and its primary defect (Collins, 1995). The identification of the primary defect is also essential for the design of a specific treatment for the disease and may lead to somatic gene therapy in the future. A similar approach can be applied to investigate multiple gene diseases. This further extends the application of gene mapping.
Comparative gene mapping, the mapping of homologous genes in genomes of different species, can also be helpful in the identification of disease genes. Linked gene groups have been conserved to some extent during evolution, and homologous chromosome regions containing different genes have been conserved among species, e.g. human, mouse and rat. The existence of many mouse or rat strains with well-defined, mapped genetic diseases or syndromes can then be used to map similar genetic loci in humans (Zhang et al., 1989b).
Alternatively, association of the disease with cytogenetic lesions such as small chromosome interstitial deletions or translocations allows the direct localization of the disease locus on the physical map and the identification of the locus by means of the same technology.
Mapping the malignant genome in cancer
Cancer, characterized by neoplastic transformations of cells, invasion of tissues and finally metastasis, is a ‘disease of genes’, caused by alterations of specific genes, which are partially known. Oncogenes play a role in the neoplastic evolution when activated by either mutation or gene amplification. Tumour suppressor genes control growth of cells. Loss or inactivation of these genes also contributes to tumorigenesis. Cytologically, specific cancers often display characteristic chromosome abnormalities, including translocations, deletions, inversions and DNA amplifications. Accordingly, efforts to use molecular tools in order to analyse these recurrent chromosomal abnormalities have led to the identification of numerous genes related to tumour initiation and progression.
Conventional chromosome banding techniques have provided the major basis for karyotypic analysis of malignant cells in tumours. As the interpretation of the chromosome banding pattern depends purely on experience, errors and ambiguous data are often difficult to prevent, especially when detecting minor structural changes in complex karyotypes. FISH has contributed significantly to the characterization of chromosome abnormalities in tumours. Rearrangements involving specific chromosomes or their derivatives in malignant cells can be visualized directly by hybridizing chromosome-specific painting probes, chromosome-specific repetitive sequence probes or chromosomal region-specific DNA probes to the tumour cells. However, using this approach to characterize the frequently observed marker chromosomes or genomic segments of unknown origin would require probes for all 24 human chromosomes. The development of spectral karyotyping (SKY) (Schrock et al., 1996) has attempted to circumvent this problem by visualizing the 24 human chromosomes individually in different arbitrary colours. The resolution of this technique is limited to the chromosome level. In addition, small marker chromosomes may escape detection.
Comparative genomic hybridization (CGH) provides an overview of unbalanced genetic alterations; it is based on a competitive in situ hybridization of differentially labelled tumour DNA and normal DNA to a normal human metaphase spread (Kallioniemi et al., 1992). Regions of gain or loss of DNA sequences are seen as an increased or decreased colour ratio of the two fluorochromes used to detect the labelled DNAs. The genomic information obtained from this technique, however, is restricted to regions where gain or loss occurs in malignant genomes. In addition, it does not lead to the generation of DNA from the detected chromosome regions.
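The ratio-based reading of a CGH experiment can be illustrated with a small sketch. In the Python example below, the intensity values, region names, function name and the cut-off ratios of 1.25 and 0.75 are all assumptions chosen for illustration; the chapter does not specify thresholds, and real analyses average ratio profiles over many metaphases.

```python
from typing import Dict, Tuple


def classify_cgh(tumour: float, normal: float,
                 gain: float = 1.25, loss: float = 0.75) -> str:
    """Classify a chromosome region from its tumour/normal fluorescence ratio.

    The default thresholds are illustrative assumptions, not values from the text.
    """
    if normal <= 0:
        raise ValueError("normal-channel intensity must be positive")
    ratio = tumour / normal
    if ratio >= gain:
        return "gain"
    if ratio <= loss:
        return "loss"
    return "balanced"


# Hypothetical mean intensities (tumour channel, normal channel) per region.
regions: Dict[str, Tuple[float, float]] = {
    "8q24": (150.0, 100.0),
    "17p13": (50.0, 100.0),
    "2q21": (100.0, 100.0),
}
for name, (tumour, normal) in regions.items():
    print(name, classify_cgh(tumour, normal))
# 8q24 gain
# 17p13 loss
# 2q21 balanced
```

Balanced rearrangements such as reciprocal translocations leave the ratio unchanged, which is why the technique reports only regions of gain or loss.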
Chromosome microdissection provides an approach to isolate DNA directly from any cytologically recognizable regions. The isolated DNA can then be used: (i) with region-specific painting probes for the detection of specific chromosomal disease (Zhang et al., 1993a); (ii) in gene amplification studies (Zhang et al., 1993b); and (iii) with region-specific DNA markers for positional cloning (Zhang et al., 1995). Technically, the molecular cytogenetic tools described above are complementary to each other, facilitating rapid scanning of malignant genomes and targeting chromosome abnormalities. This may lead to the identification of genes involved in tumorigenesis.
Genome mapping integration
Genome mapping has entered the final stage of integration, i.e. the integration of genetic and physical resources into more complete and comprehensive maps. One of the milestones in this respect was the construction of a human linkage map containing 5264 genetic markers (Dib et al., 1996). This high-resolution genetic map reaches the 1 cM resolution limit of genetic mapping, with marker spacing being <1 × 10^6 bp of physical distance. This provides one of the most comprehensive instruments in genetic disease studies. Soon after, a high-density physical map with >30,000 gene markers, average spacing about 100 kb, was constructed (Deloukas et al., 1998). Based on these comprehensive maps, the assembly of isolated intact genomic fragments in PAC and BAC vectors into clone maps or comprehensive contigs has been facilitated. Representative PAC or BAC clones were then used as the DNA sources for a shotgun plasmid library, in which the average insert size is about 1 kb. Finally, these shotgun clones were used as DNA templates for high-throughput DNA sequencing using dideoxy-termination biochemistry and automated gel electrophoresis with laser fluorescent detection (Venter et al., 1998). Mapping of the human genome is still incomplete. This is due to the presence of a large number of various interspersed and tandem repeated sequences in chromosomes. In particular, the assembly of repeats in centromeres and pericentric heterochromatin regions is not possible with current technologies. Repeats may contain as few as a few base pairs or as many as 200 bp; these units may be tandemly repeated hundreds to thousands of times in a single chromosome region. Although additional efforts are required to complete the human genome sequencing, the vast amount of structural information available to date has facilitated precise determination of molecular mechanisms in human cells (Deloukas et al., 1998; Lander et al., 2001; Venter et al., 2001).
Functional Genomics
The use of microarrays in functional genomics
Functional genomics represents a new phase of genome research: to assess genes functionally on the genome-wide scale. This is represented by the emergence of DNA microarray technology (Schena et al., 1995; DeRisi et al., 1997). In microarray methodology, inserts from tens of thousands of cDNA clones (i.e. probes) are arrayed robotically on to a glass slide and subsequently probed with two differentially labelled pools of RNA (i.e. target). Typically, the RNA sample is labelled with a nucleotide conjugated to a fluorescent dye such as Cy3-dUTP or Cy5-dUTP. RNA (target) from at least two treatment groups is compared in order to identify differences in mRNA levels, e.g. normal cells versus diseased cells; wild-type versus a transgenic animal; or general control versus a series of study samples. After hybridization, the slide is excited by laser beams of appropriate wavelengths to generate two 16-bit TIF images. The pixel number of each spot in each wavelength channel is proportional to the number of fluorescent molecules and hence permits the quantification of the number of target molecules that have hybridized to the cDNA clones (probes). The difference in signal intensities at each wavelength parallels the number of molecules from the two different target sources that have hybridized to the same cDNA probe. A general process of DNA microarray is illustrated in Fig. 1.1. Experimental procedures for this technology have been well established. Thus, this chapter focuses on data analysis.
Microarray data analysis
Figure 1.2A illustrates typical DNA microarray images. The amount of data generated by each microarray experiment is substantial, potentially equivalent to that obtained through tens of thousands of individual nucleotide hybridization experiments done in the manner of traditional molecular biology (e.g. Northern blot). It is extremely challenging to convert such a massive amount of data into meaningful biological networks. Therefore, it is important for life scientists to understand the working principles of the data mining tools utilized in this field.
Data pre-processing
Various laser-based data acquisition scanners are now commercially available. For data analysis, it usually is necessary first to build up a spreadsheet-like matrix, in which rows represent genes, columns represent RNA samples, and each cell contains a ratio (e.g. pixel number of Cy5 versus pixel number of Cy3) featuring the transcriptional level of the particular gene in the particular sample. This matrix can be studied in two ways: comparing rows in the matrix and comparing columns in the matrix. By looking for similarities in expression patterns of genes in rows, functionally related genes that are co-regulated can be identified. By comparing expression profiles in samples, biologically correlated samples or differentially expressed genes can be determined. Usually, the matrix needs to be filtered to remove genes with missing or erroneous values. Then, numerical values in the matrix are scaled by logarithm with base 2 to normalize data distribution and reduce potential data bias by extreme values. When a series of test samples (e.g. clinical samples) is compared with an unpaired control (reference) sample, the logarithm scaled ratios need to be processed further by mean or median centring to allow for data analysis in test samples that is independent of the gene expression level in the unpaired control sample.
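The pre-processing steps described above (filtering, log2 scaling and median centring) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `preprocess`, the use of `None` to mark a missing spot, the toy ratios and the choice to centre each gene (row) across samples are all hypothetical and are not taken from any particular software package.

```python
import math
from statistics import median

def preprocess(ratios):
    """ratios: dict mapping gene name -> list of Cy5/Cy3 ratios, one per sample.

    None marks a missing spot (an assumption of this sketch).
    """
    # 1. Filter: drop genes with missing or non-positive (erroneous) ratios.
    kept = {gene: row for gene, row in ratios.items()
            if all(v is not None and v > 0 for v in row)}
    # 2. Scale by logarithm with base 2.
    logged = {gene: [math.log2(v) for v in row] for gene, row in kept.items()}
    # 3. Median-centre each gene across samples (assumed centring direction).
    centred = {}
    for gene, row in logged.items():
        m = median(row)
        centred[gene] = [v - m for v in row]
    return centred

# Invented toy matrix: three genes, three samples.
raw = {
    "geneA": [2.0, 4.0, 8.0],
    "geneB": [1.0, 0.5, None],   # removed by the filter
    "geneC": [0.25, 1.0, 4.0],
}
processed = preprocess(raw)
```

After this step a twofold increase and a twofold decrease have the same magnitude (+1 and −1), which is the practical reason for working with log ratios.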
Similarity measurements
Current efforts in understanding microarray data are focused primarily on clustering and visualization. Clustering is intended to catalogue genes or RNA samples into meaningful groups based on their similar behaviours; visualization is intended to depict clustering results in a readily accessible format. For comparisons of similarities, the Euclidean distance and the correlation coefficient are the measures usually employed. Euclidean distance is the distance between two n-dimensional points, e.g. X and Y. The coordinates of X are x1, x2, . . ., xn, and the coordinates of Y are y1, y2, . . ., yn. The Euclidean distance between X and Y is
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
where n is the number of RNA samples for gene comparison, or the number of genes for sample comparison. For example, comparing any two genes (e.g. X and Y) in a three-dimensional (i.e. three samples) space, the Euclidean distance between X and Y is
$$d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2}$$
The closer the distance between two points, the more similar they are.
The correlation coefficient between any two n-dimensional points is defined as
$$r = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{\delta_X} \right) \left( \frac{Y_i - \bar{Y}}{\delta_Y} \right)$$
where n is the number of RNA samples for gene comparison, or the number of genes for sample comparison, $\bar{X}$ is the average of the values in point X, and $\delta_X$ is the standard deviation of the values in point X.
For example, if points X and Y are plotted as curves based on their values in all samples or genes, r will tell how similar the shapes of the two curves are. The correlation coefficient is always between −1 and 1. When r equals 1, the two shapes are identical. When r equals 0, the two shapes are
Fig. 1.1. Ideogram depicting the general procedure of in silico DNA microarrays. The image on the left illustrates the printing process of microarray fabrication, in which cDNA inserts from individual clones are prepared by PCR and printed on to polylysine-coated glass slides through a GMS 417 arrayer (Affymetrix).
After an overnight hybridization with differentially labelled test and control probes, the slide is scanned using a GenePix 4000 scanner (Axon Instruments) to generate two TIF images: green channel and red channel. The pixel ratio between red and green for each spot is used as the numerical value for further data analysis.
Fig. 1.2. Microarray data visualizations using different algorithms (see colour version in Frontispiece).
(A) Colour images illustrating similarities and dissimilarities between two brain development stages. The upper left is the full image generated by the Cy3-labelled day 11.5 post-coitum (p.c.) mouse brain cDNA pool versus a Cy5-labelled mouse embryonic liver cDNA pool, and the lower left is the full image generated from the Cy3-labelled day 12.5 p.c. mouse brain cDNA versus the same control. The right panel depicts partial images of the left panels to illustrate details. (B) Graphic distributions showing representative clusters obtained by K-means clustering. The horizontal scale represents RNA samples obtained from ten different time points of mouse embryonic brain development. The vertical scale weighs changes in expression, from high expression (red) to low expression (green) with units in log ratio, subtracted by median. (C) A partial tree view obtained through hierarchical clustering of 4608 mouse genes over ten embryonic development samples. Red represents up-regulation and green represents down-regulation. (D) A bar graphic display of SOM illustrating gene clustering and expression patterns of regulated genes during the yeast sporulation process. All genes were organized into 324 (18×18) hexagonal map units. Each bar in a given unit illustrates the average expression of genes mapped to that unit. (E) U-matrix and component plane presentations. The colour coding in U-matrix stands for Euclidean distance. The darker the colour, the smaller the distance. The large dark blue area that occupies the majority of the display represents unregulated genes, which form some noise clustering. The component plane presentations (t0–t11) illustrate differential displays of regulated genes during sporulation of yeast on the genome-wide scale. The colour coding index stands for the expression values of genes. The brighter the colour, the higher the value.
All these differential displays are linked by position: in each display, the hexagon in a certain position corresponds to the same map unit. It is straightforward to compare expression patterns in the same positions of different displays. The last label display shows the positions of each unit on the map.
uncorrelated, i.e. there is no linear relationship between them. When r equals −1, the two shapes are negatively correlated. Both Euclidean distance and correlation coefficient are used to measure similarities in clustering. There is no clear justification to favour one procedure over the other.
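The difference between the two similarity measures can be illustrated with a short Python sketch. The function names and the three invented expression profiles are hypothetical; the standard deviation is computed by dividing by n, which is an assumption chosen to match the 1/n factor in the definition of r given above.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two n-dimensional points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def correlation(x, y):
    """Correlation coefficient r, using the population standard deviation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    total = sum(((a - mean_x) / sd_x) * ((b - mean_y) / sd_y)
                for a, b in zip(x, y))
    return total / n

# Invented profiles over three samples.
gene_x = [1.0, 2.0, 3.0]
gene_y = [2.0, 4.0, 6.0]   # same shape as gene_x, twice the amplitude
gene_z = [3.0, 2.0, 1.0]   # mirror image of gene_x

dist_xy = euclidean(gene_x, gene_y)    # about 3.74: far apart as points
r_xy = correlation(gene_x, gene_y)     # about 1: the shapes coincide
r_xz = correlation(gene_x, gene_z)     # about -1: the shapes are opposed
```

In this toy case `gene_x` and `gene_y` are clearly separated by Euclidean distance yet perfectly correlated, which shows that the two measures answer different questions: distance is sensitive to amplitude, whereas correlation compares only the shape of the profiles.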
Clustering algorithms
Commonly applied algorithms for gene clustering include hierarchical clustering, K-means clustering and the self-organizing map (SOM). Hierarchical clustering is based primarily on a similarity measure between individuals (genes or samples), usually the correlation coefficient, combined with pairwise average-linkage clustering (Eisen et al., 1998; White et al., 1999). Through pairwise comparisons, this algorithm eventually organizes individuals into a tree view. The length of the branches of the tree depicts the relationship between individuals: the shorter the branch, the greater the similarity between individuals (Fig. 1.2C). This algorithm has been used frequently in microarray data analysis, and has proven to be a valuable tool. A major drawback of hierarchical clustering is the phylogenetic tree structure of the algorithm, which may be best suited to situations of true hierarchical descent, such as in the evolution of species (Tamayo et al., 1999), rather than situations of multiple distinct pathways in living cells. This may lead to incorrect clustering of genes, especially with large and complex data sets.
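The agglomerative procedure can be sketched as follows. This is a minimal illustration, not the implementation used in the cited studies: it assumes that distance is defined as 1 − r and that the distance between two clusters is the mean over all pairs of their members (one common reading of average linkage). The function names and the four toy profiles are hypothetical.

```python
import math

def correlation(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_linkage(profiles):
    """profiles: dict name -> expression values. Returns the list of merges."""
    clusters = {name: [name] for name in profiles}   # cluster label -> members

    def dist(a, b):
        return 1.0 - correlation(profiles[a], profiles[b])   # assumed distance

    merges = []
    while len(clusters) > 1:
        best = None
        labels = sorted(clusters)
        for i, a in enumerate(labels):
            for b in labels[i + 1:]:
                # Mean distance over all member pairs of the two clusters.
                pairs = [(p, q) for p in clusters[a] for q in clusters[b]]
                d = sum(dist(p, q) for p, q in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters["(" + a + "," + b + ")"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))   # d plays the role of the branch length
    return merges

# Invented profiles: g1 and g2 rise together, g3 and g4 fall together.
toy = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 9.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 5.0, 2.0],
}
merges = average_linkage(toy)
```

The returned list records which clusters were joined and at what distance; read from first to last, it describes the tree from the leaves to the root. Every gene is forced into this single nested hierarchy, which is the structural limitation noted above.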
K-means clustering allows the partition of individuals into a given number of (K) separated and homogeneous groups based on repeated cycles of computation of the mean vector for all individuals in each cluster and reassignment of individuals to the cluster whose centre is closest to the individual (Fig. 1.2B). Euclidean distance is used commonly as the similarity measurement.
A limitation of K-means clustering is that the arbitrarily chosen number of gene clusters may not reflect the true situation in living cells. In addition, the relationship between clusters is not defined.
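The repeated cycle of reassignment and recomputation of cluster means can be sketched as below. Seeding the centres with the first K data points and running a fixed number of cycles are simplifying assumptions made so that the example is deterministic; the function names and the toy data are hypothetical.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def kmeans(points, k, cycles=20):
    # Assumption: the first k points serve as the initial cluster centres.
    centres = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(cycles):
        # Reassignment: each individual joins the cluster with the closest centre.
        labels = [min(range(k), key=lambda c: euclidean(p, centres[c]))
                  for p in points]
        # Update: recompute the mean vector of each cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

# Invented two-sample log ratios forming two obvious groups.
data = [[1.0, 1.1], [5.0, 5.2], [0.9, 1.0], [5.1, 4.9], [1.2, 0.8], [4.8, 5.0]]
labels, centres = kmeans(data, k=2)
```

Note that K must be supplied by the user, and that the result says nothing about how the clusters relate to one another; both points correspond to the limitations described above.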
The SOM (Kohonen, 1995; Kohonen et al., 1996), an artificial intelligence algorithm based on unsupervised learning, appears to be particularly promising for microarray data analysis. It is, therefore, of considerable interest to discuss this application in further detail.
Self-organizing map algorithm
This algorithm has properties of both vector quantization and vector projection, and consequently configures output prototype vectors into a topological presentation of the original multidimensional input numerical data. A SOM consists of a given number of neurons on a usually two-dimensional grid. Each of these neurons is represented by a multidimensional prototype vector. The number of dimensions of a prototype vector is equal to the number of dimensions (i.e. the number of samples) of the input vectors. The number of input vectors is equal to the number of inputs, i.e. the number of genes in the matrix. The neurons are connected to adjacent neurons by a neighbourhood relationship, which dictates the topology, or structure, of the map. The prototype vectors are initiated with random numerical values and trained iteratively.
Each actual input vector is compared with each prototype vector on the mapping grid based on:

$$\lVert \vec{x} - \vec{m}_c \rVert = \min_i \{ \lVert \vec{x} - \vec{m}_i \rVert \},$$

where $\vec{x}$ stands for the input vector and $\vec{m}_c$ for the output vector, i.e. the prototype vector of the best-matching unit (BMU). The BMU is the neuron whose prototype vector gives the smallest Euclidean distance to the input vector. Simultaneously, the topological neighbours around the BMU are stretched towards the training input vector so that they are updated as denoted by:

$$\vec{m}_i(t+1) = \vec{m}_i(t) + \alpha(t)\,[\vec{x}(t) - \vec{m}_i(t)],$$

where t is the training step and α(t) is the learning rate. The SOM training is usually processed in two phases, a first rough training step and then the fine tuning. After iterative training, the SOM eventually is formed in such a way that individuals with similar properties are mapped to the same map unit or nearby neighbouring units, creating a smooth transition of related individuals over the entire map (Kohonen et al., 1996). More importantly, this ordered map provides a convenient platform for various inspections of the numerical data set. Although this algorithm has been utilized in several microarray-based investigations (Tamayo et al., 1999; Toronen et al., 1999; Chen et al., 2001), the full potential of SOM (particularly for visual inspections) has not yet been exploited in microarray data analyses. Recently, we have introduced component plane presentations, a more in-depth visualization tool of SOM, for the illustration of microarray data, in order to depict transcriptional changes for genes. By integrating features of this component plane
presentation with SOM, microarray analyses go beyond gene clustering to include, for instance, differential displays of regulated genes on a genome-wide scale.
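The search for the best-matching unit and the update of its topological neighbours can be sketched in plain Python. This is a minimal illustration under several assumptions that are not taken from the text: a rectangular (not hexagonal) grid, a Gaussian neighbourhood weight `h`, a learning rate and radius that decay linearly in a single training phase, and hypothetical function names, parameters and toy data.

```python
import math
import random

def find_bmu(protos, x):
    """Grid position of the prototype with the smallest distance to x."""
    return min(protos,
               key=lambda u: sum((a - b) ** 2 for a, b in zip(x, protos[u])))

def train_som(data, rows, cols, steps=500, alpha0=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2.0
    # Prototype vectors are initiated with random numerical values.
    protos = {(r, c): [rng.random() for _ in range(dim)]
              for r in range(rows) for c in range(cols)}
    for t in range(steps):
        frac = 1.0 - t / steps
        alpha = alpha0 * frac                  # learning rate alpha(t)
        radius = max(radius0 * frac, 0.5)      # shrinking neighbourhood (assumed)
        x = rng.choice(data)
        bmu = find_bmu(protos, x)
        for unit, m in protos.items():
            grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2.0 * radius ** 2))   # 1 at the BMU itself
            for k in range(dim):
                # m_i(t+1) = m_i(t) + alpha(t) * h * [x(t) - m_i(t)]
                m[k] += alpha * h * (x[k] - m[k])
    return protos

# Invented profiles over three samples: three rising and three falling genes.
rising = [[0.0, 0.5, 1.0], [0.1, 0.5, 0.9], [0.0, 0.4, 1.0]]
falling = [[1.0, 0.5, 0.0], [0.9, 0.5, 0.1], [1.0, 0.6, 0.0]]
som = train_som(rising + falling, rows=3, cols=3)
unit_rising = find_bmu(som, rising[0])
unit_falling = find_bmu(som, falling[0])
```

The weight `h` is what makes the map ordered: units close to the BMU on the grid are pulled strongly towards the input, distant units only weakly, so that opposed profiles end up in different regions of the map.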
Simultaneous illustrations of gene clusters and genome-wide differential displays using component plane integrated SOM
To demonstrate the advantages of this approach over other analytical methods, we selected a previously analysed yeast sporulation data set with 6400 genes and seven RNA samples over seven time points (Chu et al., 1998). Sporulation in yeast is the process in which diploid cells undergo meiosis to produce haploid germ cells, involving two overlapping steps: meiosis and spore formation; the process can be divided into meiosis I, meiosis II and spore formation. The process of sporulation can be induced using a nitrogen-deficient medium.
For the SOM algorithm and its visualizations, we have utilized the SOM toolbox described by Vesanto (2000). This toolbox, built in the Matlab 5 computation environment, has capacities to pre-process data, to train SOM using a range of different kinds of topologies and to visualize SOM in various ways. To maximize the number of neighbourhood contacts topologically, we utilized hexagonal map units instead of rectangular ones for the SOM training. The algorithm was then conducted using 324 prototype vectors on a two-dimensional lattice (18×18 grid). For the visualization, we first utilized a bar graphical display (Fig. 1.2D), similar to previously published displays, to gain a global view of gene clustering and expression patterns of expressed genes. The number of genes mapped to individual map units varied between seven and 62, and the bar chart displayed in each hexagonal unit represented the average expression pattern of genes mapped in the unit. It can be seen that the map has been organized in such a way that related patterns are placed in nearby neighbouring map units, producing a smooth transition of expression patterns over the entire map. Therefore, gene clustering can also be recognized by surrounding neighbouring map units in addition to its core unit.
To illustrate features other than clustering of regulated genes during the sporulation process, we integrated SOM analyses with the powerful visualization tool of component plane presentations. Component plane presentations provide an in-depth approach to visualize variables that contribute to SOM. Each component plane presentation is considered as a sliced version of SOM, illustrating values of a single vector component in all map units. For example, the first component plane (t0) in Fig. 1.2E shows the SOM slice at time point 0 h and the last component plane (t11) shows the SOM at time 11 h during the sporulation process (Chu et al., 1998). The colours of map units are selected so that the brighter the colour, the greater is the average expression value of the genes mapped to the corresponding unit. Each of these SOM slices can also be considered as a genome-wide differential display of regulated genes, in which all up-regulated units (hexagons in red), down-regulated units (hexagons in blue) and moderately transcribed units (hexagons in green and yellow) are well delineated. By comparing these genome-wide differential displays, we can learn many additional features of regulated genes in cells. For instance, these displays are correlated sequentially with each other, depicting the process of sporulation at the transcriptional level. The sequential inactivation of genes mapped to the two upper corners suggests that the functional group represented by genes on the right is more sensitive to the nitrogen-deficient medium induction than the one on the left, although both of them are suppressed toward the end of the sporulation process. The sequential activation of genes mapped to the two bottom corners gives us a more vivid picture of the process leading to spore formation. Genes in the bottom left corner and left edge are activated at an early stage of the process, indicating that these genes are associated specifically with meiosis I. In contrast, the progressively increased expression of genes in the right corner suggests that these genes are associated with meiosis II and spore formation. This is consistent with the observation that known genes of meiosis II and spore formation have been mapped to these corner units.
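Slicing a trained map into component planes amounts to reading one component of every prototype vector. The Python sketch below illustrates this on a hypothetical 2 × 2 map with three time points; the prototype values are invented for the example, and the helper `activated_units` with its threshold is an assumption added only to show how two planes can be compared position by position.

```python
def component_plane(protos, rows, cols, k):
    """Values of the k-th vector component in all map units, as a grid."""
    return [[protos[(r, c)][k] for c in range(cols)] for r in range(rows)]

def activated_units(protos, rows, cols, first, last, threshold=1.0):
    """Map units whose value rises by more than `threshold` between two planes."""
    early = component_plane(protos, rows, cols, first)
    late = component_plane(protos, rows, cols, last)
    return [(r, c) for r in range(rows) for c in range(cols)
            if late[r][c] - early[r][c] > threshold]

# Hypothetical trained prototypes: grid position -> values at t0, t1, t2.
protos = {
    (0, 0): [1.5, 0.2, -1.0],
    (0, 1): [0.8, 0.1, -0.5],
    (1, 0): [-0.9, 0.0, 1.2],
    (1, 1): [-1.4, 0.3, 1.8],
}
plane_t0 = component_plane(protos, 2, 2, 0)   # SOM slice at the first time point
plane_t2 = component_plane(protos, 2, 2, 2)   # SOM slice at the last time point
rising = activated_units(protos, 2, 2, first=0, last=2)
```

Because every plane uses the same grid, the unit at a given position refers to the same group of genes in all planes, which is what allows the displays to be compared directly.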
The SOM algorithm has great potential, in particular with regard to data visualization.
To date, most of the procedures used to visualize microarray data are limited to gene clustering, typically represented by bar graphical displays as depicted in Fig. 1.2D. In contrast, U-matrix (unified distance matrix) as displayed in Fig. 1.2E is a distance matrix method that visualizes the pairwise distance between prototype vectors of neighbouring map units and helps to define the
cluster structure of SOM. We have utilized this display successfully to define some core clusters of developmentally related genes expressed during brain development. However, the interpretation of data can be difficult when noise interruption is high. This concern is supported further by the presence of a large number of unregulated genes in the sporulation data set. These genes form clusters in a random manner, producing a visible clustering area in the centre of the SOM (Fig. 1.2D).
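A simplified U-matrix can be computed from the prototype vectors alone, as sketched below. The sketch assumes a rectangular grid with up to four neighbours per unit (the analysis described here used hexagonal units) and reports, for each unit, the mean Euclidean distance to its neighbours; the function name and the toy prototypes are hypothetical.

```python
import math

def u_matrix(protos, rows, cols):
    """Mean distance between each unit's prototype and those of its neighbours."""
    grid = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                neighbour = (r + dr, c + dc)
                if neighbour in protos:
                    dists.append(math.sqrt(sum(
                        (a - b) ** 2
                        for a, b in zip(protos[(r, c)], protos[neighbour]))))
            grid[r][c] = sum(dists) / len(dists) if dists else 0.0
    return grid

# Hypothetical 1 x 3 map: two similar units and one distant unit.
protos = {(0, 0): [0.0, 0.0], (0, 1): [0.0, 1.0], (0, 2): [3.0, 5.0]}
um = u_matrix(protos, rows=1, cols=3)
```

Small values mark units that lie inside a cluster, and large values mark the borders between clusters; in this toy map the third unit is clearly set apart from the first two.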
Component plane presentations provide an in-depth approach to visualize component variables that contribute to SOM. Thus, SOM can be sliced into multiple sample-specific, genome-wide differential displays. Each of these displays details transcriptional changes of a specific sample on the genome-wide scale. These genome-wide differential displays greatly help to identify the biological meanings of microarray data. As illustrated in this section, we were able to determine directly the functional significance of genes differentially expressed during the process of sporulation at the genome-wide scale. To reach similar conclusions by alternative methods would require a much greater effort (Chu et al., 1998). Component plane presentations are also applicable to microarray data from other organisms. For example, we have applied this approach to microarray data from mouse brain samples using ten time points during early brain development stages. In these studies, we have identified a large number of genes that are related to brain development. These genome-wide differential displays can be used to identify the functional significance of regulated genes. Also, the displays can be used to correlate data from various samples, based on patterns in identical positions of the displays; this is particularly promising for samples from clinical studies. The potential impact of this approach on microarray data analysis can be substantial.
Summary
With advances in genome research, the concept of genomics extends beyond structural analyses of genomes to include functional analysis of the genome. Structural genomics focuses on genetic mapping and physical mapping of the genome by using various tools of molecular biology. A genetic map is based on linkage analysis of the segregation of polymorphic markers in pedigrees. A physical map measures the distance between loci in nucleotide base pairs. The ultimate physical map of a genome is the determination of its complete DNA sequence. One of the most prominent applications of genome mapping is disease gene studies, typically represented by reverse genetics. Cancer genetics is also an important aspect of disease gene studies. Although the completion of the human genome sequencing is approaching, an understanding of the tools and reagents involved in genome mapping may still be helpful for our current research. This chapter emphasizes functional genomics, represented by DNA microarray technology. This technology allows the expression of tens of thousands of genes to be measured in parallel, providing the most comprehensive approach to understanding molecular mechanisms involved in living cells.
The most challenging part of DNA microarray analysis is to convert the massive amount of data into biologically meaningful networks. Compared with other data mining tools, we believe that SOMs, in particular if integrated with component plane presentations, are the most powerful tool in this respect. This integrated approach not only allows genes to be clustered but also permits regulated genes to be displayed differentially on the genome-wide scale. This application is particularly appealing for clinical case studies, in which detailed comparison between transcriptional profiles of individual patients often is required. With the great abundance of genomic information and the rapid development of technology, the determination of molecular mechanisms that underlie living human cells has come within reach.
Acknowledgements
The author is grateful to Li Xiao and Yue Teng for their excellent assistance with data calculation and graphics.
References
Burke, D.T., Carle, G.F. and Olson, M.V. (1987) Cloning of large segments of exogenous DNA into yeast by means of artificial chromosome vectors. Science 236, 806–812.
Chen, J.J., Peck, K., Hong, T.M., Yang, S.C., Sher, Y.P., Shih, J.Y., Wu, R., Cheng, J.L., Roffler, S.R.,
Wu, C.W. and Yang, P.C. (2001) Global analysis of gene expression in invasion by a lung cancer model. Cancer Research 61, 5223–5230.
Chu, S., DeRisi, J., Eisen, M., Mulholland, J., Botstein, D., Brown, P.O. and Herskowitz, I. (1998) The transcriptional program of sporulation in budding yeast. Science 282, 699–705.
Collins, F.S. (1995) Positional cloning moves from perditional to traditional. Nature Genetics 9, 347–350.
Deloukas, P., Schuler, G.D., Gyapay, G., Beasley, E.M., Soderlund, C., Rodriguez-Tome, P., Hui, L., Matise, T.C., McKusick, K.B., Beckmann, J.S. et al. (1998) A physical map of 30,000 human genes. Science 282, 744–746.
DeRisi, J.L., Iyer, V.R. and Brown, P.O. (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680–686.
Dib, C., Faure, S., Fizames, C., Samson, D., Drouot, N., Vignal, A., Millasseau, P., Marc, S., Hazan, J., Seboun, E., Lathrop, M., Gyapay, G., Morissette, J. and Weissenbach, J. (1996) A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature 380, 152–154.
Eisen, M.B., Spellman, P.T., Brown, P.O. and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences USA 95, 14863–14868.
Hardas, B.D., Zhang, J., Trent, J.M. and Elder, J. (1994) Direct evidence for homologous sequences on the paracentric regions of human chromosome 1. Genomics 21, 359–363.
Ioannou, P.A., Amemiya, C.T., Garnes, J., Kroisel, P.M., Shizuya, H., Chen, C., Batzer, M.A. and de Jong, P.J. (1994) A new bacteriophage P1-derived vector for the propagation of large human DNA fragments. Nature Genetics 6, 84–89.
Kallioniemi, A., Kallioniemi, O.P., Sudar, D., Rutovitz, D., Gray, J.W., Waldman, F. and Pinkel, D. (1992) Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 258, 818–821.
Kohonen, T. (1995) Self-organizing Maps. Springer Series in Information Sciences, Vol. 30, Springer, Berlin.
Kohonen, T., Oja, E., Simula, O., Visa, A. and Kangas, J. (1996) Engineering applications of the self-organizing map. Proceedings of the IEEE 84, 1358–1384.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W. et al. (2001) Initial sequencing and analysis of the human genome. Nature 409, 860–921.
Litt, M. and Luty, J.A. (1989) A hypervariable microsatellite revealed by in vitro amplification of a dinucleotide repeat within the cardiac muscle actin gene. American Journal of Human Genetics 44, 397–401.
Marynen, P., Zhang, J., Cassiman, J.J., Van den Berghe, H. and David, G. (1989) Partial primary structure of the 48- and 90-kilodalton core proteins of cell surface-associated heparan sulfate proteoglycans of lung fibroblasts. Journal of Biological Chemistry 264, 7017–7024.
Olson, M., Hood, L., Cantor, C. and Botstein, D. (1989) A common language for physical mapping of the human genome. Science 245, 1434–1435.
Schena, M., Shalon, D., Davis, R.W. and Brown, P.O. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–470.
Schrock, E., du Manoir, S., Veldman, T., Schoell, B., Wienberg, J., Ferguson-Smith, M.A., Ning, Y., Ledbetter, D.H., Bar-Am, I., Soenksen, D., Garini, Y. and Ried, T. (1996) Multicolor spectral karyotyping of human chromosomes. Science 273, 494–497.
Shizuya, H., Birren, B., Kim, U.J., Mancino, V., Slepak, T., Tachiiri, Y. and Simon, M. (1992) Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proceedings of the National Academy of Sciences USA 89, 8794–8797.
Stallings, R.L., Ford, A.F., Nelson, D., Torney, D.C., Hildebrand, C.E. and Moyzis, R.K. (1991) Evolution and distribution of (GT)n repetitive sequences in mammalian genomes. Genomics 10, 807–815.
Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E., Lander, E.S. and Golub, T.R. (1999) Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proceedings of the National Academy of Sciences USA 96, 2907–2912.
Toronen, P., Kolehmainen, M., Wong, G. and Castren, E. (1999) Analysis of gene expression data using self-organizing maps. FEBS Letters 451, 142–146.
Venter, J.C., Adams, M.D., Sutton, G.G., Kerlavage, A.R., Smith, H.O. and Hunkapiller, M. (1998) Shotgun sequencing of the human genome. Science 280, 1540–1542.
Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A. et al. (2001) The sequence of the human genome. Science 291, 1304–1351.
Vesanto, J. (2000) Neural network tool for data mining: SOM toolbox. In: Proceedings of Symposium on Tool Environments and Development Methods for Intelligent Systems, TOOLMET2000. Oulun yliopistopaino, Oulu, Finland, pp. 184–196.
Walter, M.A., Spillett, D.J., Thomas, P., Weissenbach, J. and Goodfellow, P.N. (1994) A method for constructing radiation hybrid maps of whole genomes. Nature Genetics 7, 22–28.
Weber, J.L. and May, P.E. (1989) Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. American Journal of Human Genetics 44, 388–396.
White, K.P., Rifkin, S.A., Hurban, P. and Hogness, D.S. (1999) Microarray analysis of Drosophila development during metamorphosis. Science 286, 2179–2184.
Zhang, J., Marynen, P., Devriendt, K., Fryns, J.P., Van den Berghe, H. and Cassiman, J.J. (1989a) Molecular analysis of the isochromosome 12P in the Pallister–Killian syndrome. Construction of a mouse–human hybrid cell line containing an i(12p) as the sole human chromosome. Human Genetics 83, 359–363.
Zhang, J., Hemschoote, K., Peeters, B., De Clercq, N., Rombauts, W. and Cassiman, J.J. (1989b) Localization of the PRR1 gene coding for rat prostatic proline-rich polypeptides to chromosome 10 by in situ hybridization. Cytogenetics and Cell Genetics 52, 197–198.
Zhang, J., Devriendt, K., Marynen, P., Van den Berghe, H. and Cassiman, J.J. (1990) Chromosome mapping using polymerase chain reaction on somatic cell hybrids. Cancer Genetics and Cytogenetics 45, 217–221.
Zhang, J., Meltzer, P., Jenkins, R., Guan, X.Y. and Trent, J. (1993a) Application of chromosome microdissection probes for elucidation of BCR-ABL fusion and variant Philadelphia chromosome translocations in chronic myelogenous leukemia. Blood 81, 3365–3371.
Zhang, J., Trent, J.M. and Meltzer, P.S. (1993b) Rapid isolation and characterization of amplified DNA by chromosome microdissection: identification of IGF1R amplification in malignant melanoma. Oncogene 8, 2827–2831.
Zhang, J., Cui, P., Glatfelter, A.A., Cummings, L.M., Meltzer, P.S. and Trent, J.M. (1995) Microdissection based cloning of a translocation breakpoint in a human malignant melanoma. Cancer Research 55, 4640–4645.
2 Perspectives in Post-genomic Nutrition Research
Hannelore Daniel
Molecular Nutrition Unit, Department of Food and Nutrition, Technical University of Munich, Germany
Introduction
Every nutritional process relies on the interplay of a large number of proteins encoded by mRNA molecules that are expressed in a given cell. Alterations of mRNA levels and in turn of the corresponding protein levels (although the two variables do not necessarily change in parallel) are critical parameters in controlling the flux of a nutrient or metabolite through a biochemical pathway. Nutrients and non-nutrient components of foods, diets and lifestyle can affect essentially every step in the flow of genetic information, from gene expression to protein synthesis to protein degradation, thereby altering metabolic functions in the most complex ways. There is no doubt that with the genetic information emerging on a daily basis, we are discovering exciting tools that provide us with insights into the molecular basis of human metabolism under normal as well as pathophysiological conditions. There is also no doubt that the interplay of the rather static mammalian genome with its rapidly changing nutritional environment is one of the most attractive and interesting areas in post-genomic research.
From Gene to Function and from Genomics to Phenomics
Although a huge body of information on the number of mammalian genes, on chromosomal
localization of individual genes, their genomic structure and in part also on the functions of the encoded proteins has been gathered, we are far from understanding how these individual factors orchestrate metabolism.
The molecular descriptors of metabolism
Genomic data contain only limited information about the dynamic behaviour of integrated cellular processes. Nevertheless, recent technological advances have made it possible to analyse the variability and dynamic changes in the genetic response of a cell or organism by determining the expression level of individual RNA molecules or huge sets of mRNA molecules. Whereas genomics describes large-scale DNA sequencing that provides basic genetic information and insights into sequence heterogeneity (i.e. single nucleotide polymorphisms; SNPs) in coding regions of genes as well as in control elements (i.e. promoters), transcriptomics – also called expression profiling – assesses mRNA levels of a few or up to several thousand open reading frames simultaneously in a biological sample; this is done mainly by DNA hybridization arrays and/or by quantitative polymerase chain reaction (PCR) techniques (Celis, 2000; Lockhart and Winzeler, 2000). Proteomics allows the proteome – as the protein complement of the genome that is expressed in a cell or an organ – to be identified, and changes in protein expression patterns
© CAB International 2003. Molecular Nutrition (eds J. Zempleni and H. Daniel)