13C-Labeling Technique for Metabolic Network and Flux Analysis: Theory and Applications

Thesis

for the purpose of obtaining the degree of doctor at Delft University of Technology, by authority of the Rector Magnificus prof.dr.ir. J.T. Fokkema, chairman of the Board for Doctorates, to be defended in public on Monday 4 November 2002 at 13:30 by

Wouter Adrianus VAN WINDEN, engineer in Bioprocess Technology, born in Monster

This thesis has been approved by the promotor: prof. dr. J.J. Heijnen

Composition of the doctoral committee:
Rector Magnificus, chairman
prof. dr. J.J. Heijnen, Delft University of Technology, promotor
dr. P.J.T. Verheijen, Delft University of Technology, assistant promotor
prof. dr. J. Nielsen, Technical University of Denmark, Denmark
prof. dr. J.T. Pronk, Delft University of Technology
dr. U. Sauer, ETH Zürich, Switzerland
dr. D. Schipper, DSM
prof. dr. W. Wiechert, University of Siegen, Germany

The studies presented in this thesis were performed in the section bioprocess technology at the Kluyver Laboratory for Biotechnology and in the section process systems engineering at the DelftChemTech building, Delft University of Technology, The Netherlands. The research was part of the Delft University DIOC-program ‘Mastering the Molecules in Manufacturing’. Cover: 13e Lions-Rottemerenloop 23-2-2002 (Source: Hart van Holland, 27-2-2002) ISBN: 90-9016251-8

For Olga, Chris, Hélène, Bas & his family, Loek and ‘Quark’

Table of Contents

List of Abbreviations

1 Introduction
1.1 Metabolic network and flux analysis
1.2 Identifiability problems in metabolic flux analysis
1.3 Solutions to identifiability problems
1.4 Aim and outline of thesis

THEORY

2 Possible Pitfalls of Flux Calculations Based on 13C-Labeling
2.1 Introduction
2.2 Pitfall I: Incomplete metabolic reaction models
2.3 Pitfall II: Microcompartmentation due to metabolite channeling
2.4 Conclusions
App.A Explanation of simulated labeling data
App.B Generation of covariance matrix

3 A Priori Analysis of Metabolic Flux Identifiability from 13C-Labeling Data
3.1 Introduction
3.2 A priori identifiability analysis I: Structural analysis and reduction of metabolic networks
3.3 A priori identifiability analysis II: Cumomer balances
3.4 A priori identifiability analysis III: MS and 13C-NMR multiplet measurements
3.5 Practical application
3.6 Conclusions
App.A Linear algebra

4 Innovations in Generation and Analysis of 2D [13C,1H] COSY Spectra for Metabolic Flux Analysis Purposes
4.1 Introduction
4.2 Theory
4.3 Materials and methods
4.4 Results
4.5 Conclusions
App.A Calculation of relative intensity covariances from spectral noise variance
App.B Isotopic non-steady state correction of 2D [13C,1H] COSY spectral data
App.C Significant deviation of relative intensities

5 Cumulative Bondomers: A New Concept in Flux Analysis from 2D [13C,1H] COSY Data
5.1 Introduction
5.2 Theory
5.3 Practical application
5.4 Conclusions

6 Systematic Approach for Converting Bondomers to 2D [13C,1H] COSY and MS Data
6.1 Introduction
6.2 Theory
6.3 Practical application
6.4 Conclusions

7 Correcting Mass Isotopomer Distributions for Naturally Occurring Isotopes
7.1 Introduction
7.2 Theory
7.3 Practical applications
7.4 Conclusions

APPLICATIONS

8 Metabolic Flux and Network Analysis of Penicillium chrysogenum Using 2D [13C,1H] COSY Measurements and Cumulative Bondomer Simulation
8.1 Introduction
8.2 Theory
8.3 Materials and methods
8.4 Results and discussion
8.5 Conclusions

9 Metabolic Flux and Network Analysis of Saccharomyces cerevisiae Using 2D [13C,1H] COSY and LC-MS Measurements
9.1 Introduction
9.2 Theory
9.3 Materials and methods
9.4 Results and discussion
9.5 Conclusions
App.A Modeling isotopic non-steady state

10 Verifying Assumed Biosynthetic Pathways, Metabolic Precursors and Estimated Measurement Errors of Amino Acids, Trehalose and Levulinic Acid Using Redundant 2D [13C,1H] COSY Data
10.1 Introduction
10.2 Theory
10.3 Materials and methods
10.4 Results and discussion
10.5 Conclusions

Summary
Samenvatting (summary in Dutch)
Future Directions
List of Publications
Curriculum Vitae
Dankwoord (acknowledgements)

List of Abbreviations

Throughout this thesis the following abbreviations are used for the biological compounds:

acoa      acetyl coenzyme A
akg       α-ketoglutarate
ala       alanine
ald       acetaldehyde
arg       arginine
asp       aspartic acid
ATP       adenosine triphosphate
bpg       1,3-bisphosphoglycerate
cit       citrate
dha       dihydroxyacetone
dhap      dihydroxyacetone phosphate
e4p       erythrose 4-phosphate
ery       erythritol
fbp       fructose 1,6-bisphosphate
f6p       fructose 6-phosphate
fum       fumarate
glc       glucose
glx       glutamic acid
gly       glycine
gox       glyoxylate
g1p       glucose 1-phosphate
g6p       glucose 6-phosphate
g3p       glyceraldehyde 3-phosphate
his       histidine
ile       isoleucine
leu       leucine
lev       levulinic acid
lys       lysine
mal       malate
man       mannitol
NAD(P)H   nicotinamide adenine dinucleotide (phosphate)
oxa       oxaloacetate
pep       phosphoenolpyruvate
2pg       2-phosphoglycerate
3pg       3-phosphoglycerate
6pg       6-phosphogluconate
phe       phenylalanine
p5p       pentose 5-phosphate (representing a lumped pool of r5p, ru5p and x5p)
pro       proline
pyr       pyruvate
r5p       ribose 5-phosphate
ru5p      ribulose 5-phosphate
ser       serine
s7p       sedoheptulose 7-phosphate
suc       succinate
thr       threonine
tp        triose phosphate (representing a lumped pool of g3p, dhap, bpg, 2pg, 3pg and pep)
tre       trehalose
tyr       tyrosine
val       valine
x5p       xylulose 5-phosphate

Chapter 1

Introduction

Part of this chapter was published in Van Gulik et al. (2001).

1.1 METABOLIC NETWORK AND FLUX ANALYSIS

Determining the structure of a metabolic network and the steady state fluxes therein is a first and crucial step in the metabolic engineering of microorganisms that are applied for the production of small molecules. Good knowledge of the reaction stoichiometry and steady state flux distribution of a metabolic network forms the foundation on which one can build a kinetic metabolic model for in silico analysis of the network. Moreover, by combining flux patterns of various steady states with measurement data of enzyme activities and metabolite concentrations at those steady states, one can calculate flux control coefficients (normalized sensitivities of fluxes to enzyme activities) and elasticities (normalized sensitivities of specific reaction rates to metabolite levels), both of which are parameters that determine the kinetics of a given metabolic model (Visser, 2002). Therefore, mass balancing is an important tool for identifying the bottlenecks of the metabolic network that form the targets of choice for genetically improving a microorganism. When evaluating the results of a given genetic modification of a microorganism, mass balancing allows a first, rapid screening of the altered physiology by showing how the metabolic network and flux pattern have changed under identical cultivation conditions.

Besides this application of metabolic flux analysis as a tool for metabolic engineering purposes, the identification of intracellular fluxes is an important goal in itself in the study of the physiology of microbes, of animal and plant cells and of entire tissues or organs in animals and man.

Intracellular fluxes cannot be directly measured.
Instead, they are calculated by solving a set of linear equations consisting of the mass balances of the intracellular metabolites and the measured net consumption or net production rates of a number of membrane-crossing compounds (note that different ways of writing the same set of equations are employed in literature):

\[
\begin{pmatrix} S \\ M \end{pmatrix} \cdot v = \begin{pmatrix} 0 \\ \phi \end{pmatrix} \qquad (1)
\]

In Eq.1 S is the stoichiometry matrix, M is the measurement matrix that contains one row per measured flux with a unity entry corresponding to that specific flux, v is the vector containing the steady state reaction rates that are to be calculated, ϕ is the vector containing the measured net rates and 0 is a zero vector. The subset of the above equations that represents the mass balances is equated to the zero vector instead of the differential term d(mass)/dt, because the steady state assumption implies that the intracellular metabolites do not accumulate and the dilution of components due to biomass growth is neglected.

The algorithm that is used to validate the stoichiometric model and to determine the fluxes by means of this conventional Metabolic Network and Flux Analysis (MNA and MFA) approach is summarized in Fig.1. The algorithm in Fig.1 starts with cultivating biomass in (pseudo) steady state (e.g. in a chemostat or (fed) batch culture) (step I), measuring net consumption and production rates (step II) and setting up a stoichiometric model (step V), and finally yields a validated network structure and fluxes with their covariances (step IX).

[Figure 1: flowchart with the following boxes]
I. Perform continuous/(fed) batch culture
II. Measure net conversion rates
III. Do elementary balances significantly differ from zero? (decision; ‘yes’ leads to IV)
IV. Identify more important (by-)products
V. Set up stoichiometric model
VI. Balance measured rates
VII. Is squared difference between balanced and measured rates statistically acceptable? (decision; ‘yes’ leads to VIII)
VIII. Calculate fluxes
IX. Report network structure, fluxes and covariances

FIGURE 1: The general algorithm for verifying the metabolic network topology and quantifying metabolic fluxes by means of metabolite balancing (conventional MNA and MFA).

There are two decision steps. Step III is a black-box check in which it is verified whether the elementary balances of consumed and produced compounds close. If not, important (by-)products have been overlooked. Their production has to be included in the model and their production rate has to be measured. Steps VI and VII are possible when the sum of the number of mass balances and measured rates in Eq.1 is larger than the rank of the combined stoichiometry and measurement matrices (note: the rank need not equal the number of fluxes; this is a separate matter that will be dealt with later on). If this is the case, one can use the redundant measurement(s) to calculate a set of balanced measured rates (step VI). In decision step VII it is checked whether the sum of the covariance-weighted squared deviations between these balanced rates and the actually measured rates is statistically acceptable. If it is not acceptable (and if all measurements were performed correctly), the network model needs to be revised. For details regarding the balancing and statistical testing of measured net conversion rates see Van der Heijden (1991).

The question whether the rank of the combined stoichiometry and measurement matrices equals the number of fluxes can be translated into: can all intracellular fluxes be calculated from the measured extracellular rates (step VIII)? The answer to this question depends on the structure of the intracellular network and on how many and which extracellular fluxes are measured.

1.2 IDENTIFIABILITY PROBLEMS IN METABOLIC FLUX ANALYSIS

Some structural features of metabolic networks prohibit determination of all intracellular fluxes, irrespective of the extracellular measurements that are performed. One such feature is the occurrence of parallel pathways in the cell, i.e. when a substrate or intermediate can be converted to a product molecule by means of different routes. In such cases mass balances only yield the sum of the fluxes through the parallel pathway branches but never the separate fluxes.

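[Figure 2: network diagram. Substrate S is taken up across the cell membrane and converted to intermediate I (v1). I is converted to product P1 (v2), to product P2 via the two parallel reactions v3 and v4, and to biomass X (v5). The cofactor pair CF/CF’ takes part in the reactions.]

FIGURE 2: A metabolic network in which substrate S is converted to intermediate I that is converted to biomass X and products P1 and P2. CF is a cofactor; the prime indicates its activated state. The dotted line represents the cell membrane.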

Consider for example the metabolic network in Fig.2, where measuring the consumption rate of substrate S and the production rates of X, P1 and P2 does not suffice to calculate the separate fluxes v3 and v4. Another way of saying this is stating that the set of flux constraints is underdetermined. This becomes clear when setting up the linear constraints for the system in the same format as Eq.1:

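\[
\begin{array}{l}
\text{mass balance I} \\
\text{measured consumption S} \\
\text{measured production X} \\
\text{measured production P1} \\
\text{measured production P2}
\end{array}
\quad
\begin{pmatrix}
-1 & 1 & 1 & 1 & 1 \\
 1 & 0 & 0 & 0 & 0 \\
 0 & 0 & 0 & 0 & 1 \\
 0 & 1 & 0 & 0 & 0 \\
 0 & 0 & 1 & 1 & 0
\end{pmatrix}
\cdot
\begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{pmatrix}
=
\begin{pmatrix} 0 \\ -\phi_S \\ \phi_X \\ \phi_{P1} \\ \phi_{P2} \end{pmatrix}
\qquad (2)
\]

\[
\operatorname{rank}\begin{pmatrix} S \\ M \end{pmatrix} = 4
\]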
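The rank deficiency of Eq.2 can be checked numerically. The following is a minimal sketch, not the method used in this thesis: it applies exact Gauss-Jordan elimination to the coefficient matrix of Eq.2 and lists the free flux directions. The helper names `rref` and `null_space` are illustrative assumptions.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row echelon form, pivot column indices), using exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot_value = m[r][c]
        m[r] = [x / pivot_value for x in m[r]]
        # Clear column c in all other rows.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def null_space(rows):
    """Return one basis vector of the null space per free (non-pivot) column."""
    m, pivots = rref(rows)
    n_cols = len(rows[0])
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[free] = Fraction(1)
        for row_idx, pivot_col in enumerate(pivots):
            vec[pivot_col] = -m[row_idx][free]
        basis.append(vec)
    return basis


# Coefficient matrix of Eq.2; columns correspond to v1..v5.
A = [
    [-1, 1, 1, 1, 1],  # mass balance I
    [1, 0, 0, 0, 0],   # measured consumption S
    [0, 0, 0, 0, 1],   # measured production X
    [0, 1, 0, 0, 0],   # measured production P1
    [0, 0, 1, 1, 0],   # measured production P2
]

rank = len(rref(A)[1])
free_directions = null_space(A)

print("rank:", rank)
print("free flux directions:", [[int(x) for x in v] for v in free_directions])
```

The single free direction only has nonzero entries, of opposite sign, at the positions of v3 and v4: these two fluxes can be exchanged against each other without changing any of the balances or measured rates.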

The rank of the combined stoichiometry and measurement matrices (4) is lower than the number of fluxes (5). A real-life example of a parallel pathway is found in lysine biosynthesis (e.g. Sonntag et al., 1993), which is discussed in the section ‘13C-labeling technique’.

Besides parallel pathways, another identifiability problem in mass balancing stems from bidirectional fluxes (Wiechert et al., 1997). Mass balancing only yields net fluxes between the metabolite pools, but it does not give the information needed to conclude what forward and backward fluxes constitute that net flux. From a metabolic engineer’s point of view, it is interesting to know which reactions in the cell are reversible and to what extent the reaction runs in both directions under in vivo conditions, since highly reversible reactions are near their thermodynamic equilibrium, which means that enzymes that catalyze these steps are unlikely to exert much control on the flux through the pathway.

When modeling the metabolism of a eukaryotic microorganism, distinct pools have to be included in the model for metabolites that appear in more than one cell compartment. In these cases, the number of fluxes (including transport rates across the intracellular membranes) often increases more than the number of additional mass balances that can be made for the compartmented metabolites. As a result, compartmented metabolic models often contain more unidentifiable fluxes than those of prokaryotic microorganisms.

1.3 SOLUTIONS TO IDENTIFIABILITY PROBLEMS

One can distinguish five distinct methods for solving the identifiability problems mentioned above: (1) making a kinetic model and measuring enzyme activities plus metabolite concentrations under steady state conditions, (2) measuring metabolite concentrations under dynamic conditions, (3) adding cofactor balances to the steady state mass balances, (4) linear programming and (5) applying the 13C-labeling technique. The strengths and weaknesses of these methods will be discussed next.

Kinetic model plus steady state measurements

When a complete kinetic metabolic model is available, one can calculate all fluxes (including their reversibilities) from the measurements of enzyme activities and intracellular metabolite concentrations at a given steady state. A disadvantage of this method is that the validity of in vitro enzyme kinetics under in vivo circumstances is under debate and that methods for accurate measurements of intracellular metabolites are still under development (e.g. Theobald et al., 1997; Lange et al., 2001). Therefore, determining all intracellular fluxes from enzyme and metabolite measurements is as yet practically impossible.

Of course, when a specific enzyme activity is found to be zero, no kinetic model is needed to conclude that the flux through the reaction concerned is zero. However, this is metabolic network analysis rather than metabolic flux analysis: under the culture conditions that apply, the reaction that is found to be absent can be removed from the network.

Dynamic metabolite measurements

As said, mass balances do not give information about reversibility of reactions. An elegant, but experimentally demanding way to determine the reversibility of a given reaction is to perform a dynamic experiment (e.g. adding a substrate pulse to the fermentor) and measure the dynamic concentrations of the substrate and product of the reaction. If the ratio of the concentrations of the product and substrate remains constant (and ideally close to the theoretical equilibrium constant), the reaction may be assumed to be in chemical equilibrium and thus reversible (Visser et al., 2000).

Cofactor balances

An approach to solve the unidentifiability of fluxes that does not require any new measurements is to include mass balances of metabolic cofactors such as ATP, NAD(P)H and Coenzyme-A in the set of linear flux constraints. (Note that some researchers state that pathways employing different cofactors may not be called ‘parallel’.) A hypothetical example of how a cofactor balance can make the fluxes through two parallel pathways observable is shown in Fig.2. In this figure the parallel reactions v3 and v4 leading from intermediate I to product P2 are different with respect to their cofactor (CF). If the activated cofactor (CF’) is assumed to be ATP, one can imagine that reaction v4 is more efficient and does not involve the hydrolysis of ATP, whereas reaction v3 is less efficient and costs one ATP. If the cofactor balance is taken into account, it can be seen that the rate at which product P1 is produced determines the rate at which ATP is generated.
This rate fixes the fluxes requiring ATP (i.e. v3 and v5). Now, all intracellular fluxes can be determined, since the set of flux constraints is fully determined. Eq.3 shows that the rank of the combined stoichiometry and measurement matrices now equals the number of fluxes.

\[
\begin{array}{l}
\text{mass balance I} \\
\text{mass balance CF} \\
\text{measured consumption S} \\
\text{measured production X} \\
\text{measured production P1} \\
\text{measured production P2}
\end{array}
\quad
\begin{pmatrix}
-1 &  1 & 1 & 1 & 1 \\
 0 & -1 & 1 & 0 & 1 \\
 1 &  0 & 0 & 0 & 0 \\
 0 &  0 & 0 & 0 & 1 \\
 0 &  1 & 0 & 0 & 0 \\
 0 &  0 & 1 & 1 & 0
\end{pmatrix}
\cdot
\begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ -\phi_S \\ \phi_X \\ \phi_{P1} \\ \phi_{P2} \end{pmatrix}
\qquad (3)
\]

\[
\operatorname{rank}\begin{pmatrix} S \\ M \end{pmatrix} = 5
\]
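To illustrate how the added cofactor balance makes the system of Eq.3 solvable, the sketch below solves it for a set of made-up net rates. The rate values (ϕS = -10, ϕX = 1, ϕP1 = 4, ϕP2 = 5) are hypothetical and were chosen only so that all balances close; the helper `rref` is an illustrative exact Gauss-Jordan elimination, not a routine from this thesis.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row echelon form, pivot column indices), using exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot_value = m[r][c]
        m[r] = [x / pivot_value for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


# Hypothetical measured net rates (arbitrary units); consumption is negative.
phi_S, phi_X, phi_P1, phi_P2 = -10, 1, 4, 5

# Coefficient matrix of Eq.3; columns correspond to v1..v5.
A = [
    [-1, 1, 1, 1, 1],   # mass balance I
    [0, -1, 1, 0, 1],   # mass balance CF
    [1, 0, 0, 0, 0],    # measured consumption S
    [0, 0, 0, 0, 1],    # measured production X
    [0, 1, 0, 0, 0],    # measured production P1
    [0, 0, 1, 1, 0],    # measured production P2
]
b = [0, 0, -phi_S, phi_X, phi_P1, phi_P2]

n_fluxes = len(A[0])
coefficient_rank = len(rref(A)[1])

# Solve by reducing the augmented matrix [A | b].
reduced, pivots = rref([row + [rhs] for row, rhs in zip(A, b)])
if n_fluxes in pivots:
    raise ValueError("measured rates are inconsistent with the balances")
if len(pivots) < n_fluxes:
    raise ValueError("system is underdetermined")

# With full column rank, pivot row i holds the value of flux i in its last column.
solution = [reduced[i][-1] for i in range(n_fluxes)]

print("rank:", coefficient_rank)
print("v1..v5:", [int(v) for v in solution])
```

With six equations and rank 5 the system also contains one redundant equation, which is why the sketch checks for inconsistency before reading off the fluxes.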

A real-life example of the hypothetical situation shown in Fig.2 is encountered in the primary carbon metabolism. The glycolytic and pentose phosphate pathways may be considered as parallel pathways, both of which convert glucose 6-phosphate to pyruvate. The glycolytic branch of this parallel pathway yields more ATP and NADH per converted glucose molecule than the pentose phosphate branch, but in contrast to glycolysis the latter pathway yields NADPH. The electron carrier NADPH is known to be required for anabolism and for the biosynthesis of a number of secondary metabolites (such as penicillin in the case of Penicillium chrysogenum). Therefore, equating the consumption of NADPH in biosynthetic reactions to its production in the oxidative steps of the pentose phosphate pathway is one way of determining how much glucose is consumed via the pentose phosphate pathway.

Although cofactor balances offer a solution to some flux analyses in parallel pathways, they have serious disadvantages. Firstly, cofactor balances do not enable determination of the separate forward and backward fluxes in bidirectional reaction steps. Besides, cofactor mass balances rely on estimated reaction stoichiometries and on controversial assumptions (Marx et al., 1996, 1999; Schmidt et al., 1998; Wiechert et al., 1997). Examples of the latter are a growth-independent ATP yield of the respiratory system (P/O-ratio), fixed growth and non-growth associated maintenance requirements of the cell, the presence of ATP-wasting futile cycles, NAD(P)H specificities of various enzymes and the presence or absence of transhydrogenases that interconvert NADH and NADPH. Furthermore, cofactors are involved in so many reactions in the cell that it is hardly possible to take all of them into account. As a consequence, cofactor mass balances are most probably far from complete and may lead to erroneous flux estimates when included in the mass balances.

Linear programming

A second approach for solving unidentifiability problems that does not require any additional measurements is linear programming (Savinell and Palsson, 1992). This approach is based on the presumption that cells strive for a certain goal, such as maximizing their biomass or energy yield per amount of consumed substrate, maximizing their growth rate, or minimizing the sum of squared fluxes.
This goal is translated into an objective function that expresses the goal as a linear function of the metabolic fluxes. These fluxes can be freely varied within the free flux space that is formed by the null space of the combined stoichiometry and measurement matrix (S^T M^T)^T, i.e. S stacked on M, in Eq.1. By means of the linear programming technique one can find the unique set of fluxes that optimizes the objective function. In case the consumption rate of S and the production rates of X, P1 and P2 are measured in the example of Fig.2, the free flux space is (any multiple of) the null space of (S^T M^T)^T in Eq.2 that consists of a single column: (0, 0, 1, -1, 0)^T. This free flux space shows that for the given measurements the fluxes v3 and v4 can be freely varied while keeping their sum constant, which is easily understood by looking at Fig.2. Presuming that the studied cell strives to maximize production of activated cofactor (CF’, representing e.g. energy in the form of ATP), linear programming leads to the following optimal flux distribution: v = (-ϕS, ϕP1, 0, ϕP2, ϕX). Clearly, all flux towards product P2 is now diverted to reaction v4, which does not consume any CF’.

The main objection against the linear programming approach is that the supposed objective function may not be valid for the biological system. For instance, it is tempting to assume that cells strive for minimizing processes that waste energy. In practice, energy wasting processes (often referred to as ‘futile cycles’) do occur, because they serve other purposes such as generation of heat or allowing a more flexible control of the metabolic network. Still, Van Gulik and Heijnen (1995) successfully used linear programming with a maximal biomass yield as the objective function to predict the activity of various metabolic pathways in Saccharomyces cerevisiae growing on various glucose/ethanol mixtures. They found that their predictions agreed reasonably well with measured enzyme activities.

13C-labeling technique

The last alternative for resolving underdetermined fluxes that is discussed here and that is used throughout this thesis is the 13C-labeling technique. This method has the potential to resolve fluxes flowing through a complicated metabolic network, even when it includes many bidirectional reactions. This non-invasive technique is based on the fact that the metabolic reactions in the cell split and form carbon-carbon bonds in well-known ways. When the cell is fed with a 13C-labeled substrate, this splitting and joining of molecules leads to a rearrangement of the positions of the 13C-labeling in the intracellular metabolites that depends on the relative fluxes through the various metabolic pathways.

A classic example of two parallel pathways that lead to different 13C-labeling distributions in the common product is the lysine biosynthesis in Corynebacterium glutamicum (Sonntag et al., 1993). The biosynthesis pathway (see Fig.3) consists of two branches, one of which is the four-step succinylase variant and the other of which is the one-step dehydrogenase variant. The succinylase variant has a symmetric intermediate, L,L-diaminopimelate, which causes a 50:50 mixture of the two orientations of the carbon skeleton in the final product lysine. The dehydrogenase variant leads to a single orientation of the carbon skeleton. Fig.3 shows how a hypothetical 3-13C-labeled pyruvate would lead to 100% 5-13C-labeled lysine via the dehydrogenase variant of the pathway and to 50% 3-13C-labeled and 50% 5-13C-labeled lysine via the succinylase variant.
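The lysine example can be turned into a small calculation. The sketch below is an idealized illustration under stated assumptions: the pyruvate is fully labeled at position 3 as in the hypothetical case described above, the labeling is measured without error, and no other route contributes to lysine. The function names are illustrative and do not come from Sonntag et al. (1993) or from this thesis.

```python
def lysine_labeling(succinylase_fraction):
    """Predict lysine labeling for a given flux split between the two branches.

    succinylase_fraction: fraction of the lysine flux through the succinylase
    variant (the remainder goes through the dehydrogenase variant).
    Returns (fraction 3-13C-labeled lysine, fraction 5-13C-labeled lysine).
    """
    if not 0.0 <= succinylase_fraction <= 1.0:
        raise ValueError("flux fraction must lie between 0 and 1")
    # The succinylase branch scrambles the orientation 50:50, the
    # dehydrogenase branch gives 5-13C-labeled lysine only.
    frac_c3 = 0.5 * succinylase_fraction
    frac_c5 = 1.0 - frac_c3
    return frac_c3, frac_c5


def succinylase_fraction_from_labeling(frac_c3):
    """Invert the relation above: estimate the flux split from the C3 labeling."""
    fraction = 2.0 * frac_c3
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("a C3-labeled fraction above 0.5 cannot be explained by this model")
    return fraction


# Example: 15% of the lysine carries the label at C3.
split = succinylase_fraction_from_labeling(0.15)
print("succinylase branch:", split, "dehydrogenase branch:", 1.0 - split)
```

The inverse relation shows why the labeling data add information that mass balances lack: the measured fraction of 3-13C-labeled lysine maps one-to-one onto the split between the two parallel branches.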

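[Figure 3: pathway diagram with numbered carbon positions. L-aspartate (carbons 1 to 4) and pyruvate (carbons 1 to 3) form L-piperideine-2,6-dicarboxylate (carbons 1 to 7), which is converted via two branches to D,L-diaminopimelate (carbons 1 to 7). Carbon 7 is released as CO2, yielding lysine (carbons 1 to 6): one orientation (100%) via one branch and two orientations (50% each) via the other.]

FIGURE 3: The branched biosynthesis pathway of lysine in Corynebacterium glutamicum. The white squares and circles represent 12C-atoms; gray circles represent 13C-atoms.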

Practical aspects

A 13C-labeling experiment consists of feeding a (mixture of) 13C-labeled compound(s) to the metabolic network that is investigated and measuring the amounts of labeled carbon atoms at various carbon positions of the metabolic intermediates. These measurements either rely on the different magnetic spin of 12C- and 13C-atoms (13C-Nuclear Magnetic Resonance spectroscopy, 13C-NMR), on the splitting of the 1H-NMR signals of protons that are bonded with 13C-atoms (1H-NMR), or on the different atomic mass of the two carbon isotopes (Mass Spectrometry, MS).

The experimental steps of a typical example of a 13C-labeling experiment are illustrated in Fig.4. The figure shows the application of the NMR method 2D [13C,1H] Correlation Spectroscopy (COSY). For this type of 13C-labeling measurement often a single uniformly 13C-labeled carbon source is used (Szyperski, 1995), but this is not necessarily the case (Petersen et al., 2000). The commercial availability of numerous specifically 13C-labeled carbon substrates has made the 13C-labeling of the substrate an important design parameter of 13C-labeling experiments (Möllney et al., 1999).

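[Figure 4: schematic. An aerated chemostat culture of S. cerevisiae is fed with medium containing 90% 12C-glucose and 10% 13C-glucose, NH4+ and ethanol. After t > 3·D^-1 the biomass is harvested and hydrolyzed (sugars, amino acids) and analyzed by NMR, giving a spectrum with a 13C and a 1H axis.]

FIGURE 4: The experimental steps of a 2D [13C,1H] COSY NMR experiment. (D is the dilution rate, i.e. the medium flow rate divided by the fermentor volume.)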

As 13C-labeling experiments aim at determining the steady state flux distribution in the cell, the most appropriate cultivation technique is chemostat cultivation or (fed) batch where cells grow at a constant rate in a DO-, pH- and temperature-controlled environment. When measuring the 13C-labeling distribution of polymeric components of biomass growing in a chemostat (as is the case in the example of Fig.4), the culture has to be fed with 13C-labeled substrate for more than three residence times of the bioreactor to ensure that most (precisely: the fraction 1-e^-3 > 0.95) of the unlabeled biomass has been replaced by labeled biomass. When employing batch cultivations for 13C-labeling studies (e.g. Maaheimo et al., 2001) less 13C-labeled substrate is required (it should be sufficient to make the amount of biomass in the inoculum negligible). This latter method is based on the assumption that in batch cultures the larger part of the biomass is formed during the exponential growth phase, during which the intracellular fluxes are approximately constant. A resulting disadvantage is that batch cultures only allow the determination of the fluxes that apply at maximal growth rate.

When applying the 2D [13C,1H] COSY measurement method shown in Fig.4, the 13C-labeled biomass that is harvested only needs to be hydrolyzed. No further separation steps are needed prior to NMR analysis, as is clear from Fig.5 where a two-dimensional NMR spectrum of biomass hydrolysate is shown. Clearly, the 13C and 1H-dimensions give sufficient resolution to identify the separate multiplets of most carbon positions, four of which are highlighted in the figure.
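The wash-out criterion of three residence times mentioned above follows from first-order replacement of the reactor contents. A minimal sketch, assuming an ideally mixed chemostat at constant volume and steady state, with an instantaneous switch to labeled feed (the function names are illustrative):

```python
import math


def labeled_fraction(residence_times):
    """Fraction of the biomass replaced by labeled biomass after t*D residence times."""
    if residence_times < 0:
        raise ValueError("time must be non-negative")
    return 1.0 - math.exp(-residence_times)


def residence_times_needed(target_fraction):
    """Number of residence times (t*D) needed to reach a target labeled fraction."""
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target fraction must be in [0, 1)")
    return -math.log(1.0 - target_fraction)


for n in (1, 2, 3, 5):
    print(n, "residence times:", round(labeled_fraction(n), 4))

print("residence times for 95% replacement:", round(residence_times_needed(0.95), 3))
```

Under these assumptions, 95% replacement is reached just before three residence times, in line with the criterion t > 3·D^-1 in Fig.4.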

FIGURE 5: A 2D [13C,1H] COSY spectrum (Schmidt et al., 1999). Highlighted in the spectrum are the multiplets containing 13C-labeling information on the grey-shaded carbon atoms of the amino acids phenylalanine (‘phe’) and tyrosine (‘tyr’) that are shown on the right.

The alternative 1H-NMR method that is used to measure fractional enrichments, on the contrary, requires separation of the biomass components. This method has the advantage that 1H-NMR is more sensitive than 13C-NMR. A disadvantage of fractional enrichment measurements is that this method maximally yields a number of independent measurement data that equals the number of proton-bound carbon atoms in a compound. 2D [13C,1H] COSY measurements potentially contain more information with respect to the 13C-labeling distribution in the compound.

Another measurement method that is increasingly employed is Gas Chromatography (GC)-MS analysis of polymeric biomass components (e.g. Gombert et al., 2001). In these experiments the biomass components need to be derivatized to increase their volatility and allow their separation in the GC. GC-MS analysis maximally results in a number of independent measurement data that equals the number of carbon atoms in a compound, unless compounds are fragmented and the mass distributions of the fragments are measured as well.

In this thesis a new measurement technique is presented, in which the 13C-labeling distribution of metabolic intermediates is measured using Liquid Chromatography (LC)-MS. In contrast to GC-MS, LC-MS does not require derivatization of the metabolites. However, when measuring the 13C-labeling of the intermediates one must take care to rapidly sample biomass and quench the metabolism. When allowed to continue after sampling, the metabolic reactions can quickly deplete the pools of metabolites that are present in very low concentrations. Moreover, changing metabolic rates could easily disturb the 13C-labeling patterns of the metabolites. The new LC-MS method is the only method that gives direct access to the 13C-labeling patterns of the metabolic intermediates. In all previously mentioned methods the 13C-labeling of the intermediates had to be deduced from the measurements of polymeric biomass components, which requires good knowledge of the biosynthesis routes. As the turnover time of the intermediates (typically in the order of seconds to minutes) is much shorter than that of biomass polymers (hours to days, depending on the growth rate), only a short time of 13C-labeled medium supply is needed in order to fully label the intracellular metabolites. This allows the application of this technique for determination of fluxes in cultures where the metabolism may not be assumed to be in steady state for a long time. Similarly to GC-MS analysis, LC-MS analysis maximally results in a number of independent measurement data that equals the number of carbon atoms in a compound, if the compounds are not fragmented. But here, too, fragmentation is possible.

The 13C-labeling technique was introduced here as a method to identify fluxes that cannot be determined from measured net conversion rates alone. However, it needs to be verified, preferably a priori, whether all free fluxes are indeed identifiable from the available 13C-labeling data. Wiechert and Wurzel (2001) have proven that in general a given set of metabolic fluxes leads to a unique set of 13C-labeling data. The answer to the inverse question (i.e. whether a unique set of fluxes exists that explains the observed set of 13C-labeling data) depends partly on the studied network, but also on the design of the experiment. Two design parameters that can be freely chosen and that determine the information obtained from the 13C-labeling experiment are (1) the 13C-labeling of the carbon substrate in the medium and (2) the measurement method(s) that is (are) used.

Fig.6 shows a simple example of the varying information content of 2D [13C,1H] COSY, fractional enrichment and MS measurements of the same three-carbon compound. The figure shows both graphically and mathematically how the 2^3 = 8 isotopomers are related to the various measurement data. From the equations in Fig.6 it is clear that whereas fractional enrichment and mass distribution data are linearly related to the isotopomer fractions, the 2D [13C,1H] COSY NMR data are ratios of (sums of) isotopomer fractions.

13%

M+0=0.13

15% 10%

s=0.20

7% 21% 2%

d=0.14

5%

d*=0.10 13

+

C

M+1=0.46 M+2=0.14

27%

M

100%

dd=0.55

f.e.#1= 0.55

f.e.#3= 0.51

M+3=0.27

f.e.#2= 0.49

0 0   0 0

0 0 0 0

1 0 0 0

0 1 0 0

0 0  0 1 0   0 0 0 0 0 0

0 0 0 0

0 0 1

0 1 0

0 1 1

1 0 0

1 0 1

1 1 0

1  f .e.#1  1 ⋅ idv =  f .e.# 2     1  f .e.#3 

0 1 0 0

0 1 0 0

0 0 1 0

0 1 0 0

0 0 1 0

0 0 1 0

0  M + 0   M +1  0  ⋅ idv =   0 M + 2   M + 3 1    0 1 1 0 0 1 0 1 1 0 0 1 0 1 1 0 0 1 0 1 1 0 0 1

0 0 1 0

0 0  ⋅ idv 0 1 

0 0   0 0

1  s    d 1  ⋅ idv =  *  1 d   dd  1  

FIGURE 6: The outcomes of 2D [13C,1H] COSY measurements (left) of the central carbon atom of a 13C-labeled three-carbon compound (middle), and fractional enrichment (middle, below) and MS measurements (right) of the same compound. White spheres represent 12C-atoms, gray spheres represent 13C-atoms. ‘M’ is the mass of the unlabeled compound, ‘s’ denotes a singlet, ‘d’ a doublet with ‘*’ indicating the larger scalar coupling constant, ‘dd’ denotes the double doublet, f.e.#i denotes the fractional enrichment of the ith carbon position, idv is the isotopomer distribution vector. The division in the last equation is an elementwise operation.
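The relations of Fig.6 can be reproduced numerically. The following is a minimal sketch, not code from the thesis: it assumes that the isotopomers are ordered 000 to 111 with the first digit denoting carbon 1, and that the doublet d arises from coupling to carbon 3 (isotopomer 011) and d* from coupling to carbon 1 (isotopomer 110). Under these assumptions the isotopomer percentages of the figure are consistent with all reported measurement values. The function names are hypothetical.

```python
from itertools import product

# Isotopomers of a three-carbon compound in binary order 000, 001, ..., 111
# (1 = 13C, 0 = 12C; the first digit is carbon 1). Ordering is an assumption.
ISOTOPOMERS = list(product((0, 1), repeat=3))

# Isotopomer distribution vector (idv) read from the percentages in Fig.6.
IDV = [0.13, 0.15, 0.10, 0.07, 0.21, 0.02, 0.05, 0.27]


def fractional_enrichments(idv):
    """Fraction of molecules that carry 13C at carbon 1, 2 and 3."""
    return [sum(x for iso, x in zip(ISOTOPOMERS, idv) if iso[pos] == 1)
            for pos in range(3)]


def mass_distribution(idv):
    """Fractions M+0 .. M+3, grouped by the number of 13C atoms."""
    masses = [0.0] * 4
    for iso, x in zip(ISOTOPOMERS, idv):
        masses[sum(iso)] += x
    return masses


def cosy_central_carbon(idv):
    """Relative multiplet intensities (s, d, d*, dd) of carbon 2."""
    frac = dict(zip(ISOTOPOMERS, idv))
    # Only molecules with a 13C at the central position are observed.
    observed = sum(x for iso, x in frac.items() if iso[1] == 1)
    # Assumed assignment: d = coupling to carbon 3, d* = coupling to carbon 1.
    patterns = {"s": (0, 1, 0), "d": (0, 1, 1),
                "d*": (1, 1, 0), "dd": (1, 1, 1)}
    return {name: frac[iso] / observed for name, iso in patterns.items()}


if __name__ == "__main__":
    print([round(v, 2) for v in fractional_enrichments(IDV)])
    print([round(v, 2) for v in mass_distribution(IDV)])
    print({k: round(v, 2) for k, v in cosy_central_carbon(IDV).items()})
```

The first two functions are linear in the idv, whereas the third divides by a sum of isotopomer fractions, which is the non-linearity referred to in the text.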

Note that not all possible measurements are shown in Fig.6: the 13C-labeling pattern around the (terminal) 1st and 3rd carbons can also be measured (provided that these carbons are bound to at least one proton). Assuming 2D [13C,1H] COSY NMR spectra of all three carbons are measured, this method yields 1+3+1=5 independent data. Similarly, MS may additionally give mass distributions of fragments of the compound. Assuming that the mass distributions of the whole molecule and a two-carbon fragment thereof can be determined, this method yields 3+2=5 independent data. For fractional enrichments it is obvious that the 3 independent enrichments shown in Fig.6 are the maximum obtainable amount of information.

Theoretical aspects
From a theoretical point of view the various types of 13C-labeling studies that were discussed above hardly differ. Irrespective of the measurement method employed, metabolic fluxes are calculated from measured 13C-labeling data according to the algorithm shown in Fig.7.

[Figure 7, flowchart not reproduced. Steps shown in the flowchart:]
I Perform continuous/(fed-)batch culture, supply 13C-labeled substrate
II Measure net conversion rates
III Harvest biomass, measure 13C-labeling distribution in intermediates or biomass components by NMR or MS
IV Analyze measured spectra, transform data to suitable format and estimate errors
V Are measured spectra consistent with assumed common precursors and biosynthesis routes? (yes/no)
VI Set up stoichiometric model excluding cofactor balances
VII Balance measured rates, express fluxes as measured part and free part
VIII Set up model for simulation of 13C-labeling distributions
IX Choose free flux parameters
X Calculate fluxes
XI Simulate 13C-labeling distribution
XII Is difference between simulated and measured 13C-labeling distributions minimal? (yes/no)
XIII Is minimized difference between simulated and measured 13C-labeling distributions statistically acceptable? (yes/no)
XIV Report network structure, fluxes and covariances

FIGURE 7: The general algorithm for verifying the metabolic network topology and quantifying metabolic fluxes by means of the 13C-labeling technique.

In this algorithm it is assumed that mass balances and measured net conversion rates fix only a part of the fluxes in a metabolic network, i.e. some fluxes cannot be identified using conventional MFA. It is further assumed that the stoichiometric model has passed the checks on the elemental balances and on the balanced net conversion rates that were included in the algorithm of Fig.1. In addition to the measurements of the net conversion rates, 13C-labeling data are measured and analyzed in steps III and IV. In decision step V it is verified whether the measurements are consistent with the assumed precursors and biosynthesis routes. An example of this is the commonly made assumption that the amino acids valine and alanine are both synthesized from the precursor mitochondrial pyruvate. If the spectra of fragments of both amino acids that originate from the same carbon atoms according to the biosynthesis routes show large differences, these assumptions should be revised. This decision step should be performed before the actual flux analysis. The flux analysis consists of steps IX to XII, in which the degrees of freedom (or: the rank-deficiency) that remain in Eq.1 are fixed by fitting simulated 13C-labeling data to the measured ones. The fitting procedure is iterative because the models that describe the distribution of 13C-labeling as a function of the metabolic fluxes are non-linear. Presently, algebraic solutions have only been obtained for very small metabolic networks. Next, it is verified in step XIII whether the best fit that is obtained in step XII is statistically acceptable. Evidently, reliable covariances of the measured 13C-labeling data are needed for this step. If the fit is rejected, the stoichiometries and the reaction mechanisms in the metabolic network need to be revised. This check of the validity of the metabolic model further refines the preliminary checks of the model that were part of the algorithm shown in Fig.1.
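Steps XII and XIII can be written compactly as follows. This is a sketch in generic notation that is assumed here and not taken from the thesis: x_m is the vector of n independent measured labeling data, x_s(β) the simulated data as a function of the p free flux parameters β, and Σ_x the covariance matrix of the measurements. The χ² criterion is the usual goodness-of-fit test for covariance-weighted least squares and is likewise an assumption about how the statistical acceptance may be assessed.

```latex
% Step XII: minimize the covariance-weighted sum of squared residuals
\hat{\beta} = \arg\min_{\beta} \, \mathrm{SSR}(\beta), \qquad
\mathrm{SSR}(\beta) = \bigl(x_m - x_s(\beta)\bigr)^{\mathsf{T}} \,
                      \Sigma_x^{-1} \, \bigl(x_m - x_s(\beta)\bigr)

% Step XIII: accept the model at significance level \alpha if
\mathrm{SSR}(\hat{\beta}) \le \chi^2_{1-\alpha}(n - p)
```

Written this way, it is apparent why reliable covariances are needed: an underestimated Σ_x inflates the SSR and leads to rejection of a valid model, whereas an overestimated Σ_x lets an incomplete model pass.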
Even if the elemental balances close and the model passes the statistical test of the balanced net conversion rates, it may fail the statistical test of the best fit between measured and simulated 13C-labeling data. This decision step XIII and the iterative improvement of the metabolic model that is needed when the model fails the test are rarely explicitly mentioned in publications of flux analyses based on 13C-labeling experiments; minimized covariance-weighted sums of squared residuals are rarely published. Neither is it checked whether models that pass the test can be simplified and still yield statistically acceptable fits. This is a serious problem, because flux distributions are not reliable if the statistical acceptance of the underlying 13C-labeling models has not been verified. Therefore, 13C-labeling studies should initially aim at metabolic network analysis, resulting in a statistically validated metabolic network structure. Only when it is based on such a validated metabolic model can the 13C-labeling technique be used for metabolic flux analysis.

1.4 AIM AND OUTLINE OF THESIS
The history of the application of the 13C-labeling technique for metabolic flux analysis and many of its experimental and theoretical aspects were extensively reviewed by Szyperski (1998) and more recently by Wiechert (2001). The research project described in this thesis was started on the basis of the modeling approach that had at that time just been published by Schmidt et al. (1997) and the experimental NMR-approach proposed by Szyperski (1995). Initially we aimed at applying the 13C-labeling technique for metabolic flux analysis of P. chrysogenum growing under carbon limitation and consuming various nitrogen sources. A second application was the metabolic flux analysis of S. cerevisiae. This ‘learning by doing’ revealed a number of drawbacks of the existing methodology. This shifted the initial aim of our work from applying the technique to further developing the methodology itself in order to maximize the amount of information about metabolic fluxes that can be obtained from 13C-labeling experiments. In order to explain which aspects of the 13C-labeling methodology are covered in this thesis, the chapters will be related to the corresponding algorithmic steps in Fig.7:
• Chapter 2 focuses on step XIII. As stated in the previous section this decision step is rarely explicitly mentioned in 13C-labeling studies. This step requires a good estimation of the measurement errors (see chapter 4), since the metabolic network model needs to be accepted or rejected on statistical grounds. In this chapter we demonstrate that the omission of reactions may lead to flux estimates that differ dramatically from the true fluxes. Furthermore, the effects of channeling (i.e. direct transfer of the metabolites between enzymes that catalyze consecutive reaction steps) on estimated flux distributions are discussed.
• Chapter 3 focuses on steps VI to X. In order to allow the design of 13C-labeling experiments, we developed a method for analyzing the a priori identifiability of the fluxes in a given metabolic network from a chosen set of 13C-labeling data. As such an analysis quickly leads to large systems of non-linear equations that need to be solved, we developed a method for simplifying the studied network based on insight into the general structure of 13C-labeling balances. Furthermore, we introduced a notation in which the set of metabolic fluxes could be written as the sum of a term that was fully determined by the measured net conversion rates and a term containing a limited number of free flux parameters that needed to be fixed by fitting the measured 13C-labeling data.
Based on this free flux parameter representation and the concept of cumomer balances introduced by Wiechert et al. (1999), we proposed a systematic approach to algebraically analyze the identifiability of fluxes from a specific set of 13C-labeling data. Observing that this approach still led to computational problems for larger networks, we presented a numerical solution for these cases. The method allowed us to perform identifiability analysis for a published network for which this was previously not possible.
• Chapters 4 and 7 both focus on steps III and IV. In our study we have used two types of 13C-labeling data: (1) 2D [13C,1H] COSY spectra of biomass components such as amino acids, sugars and polyols and (2) MS spectra of intracellular metabolites. We developed a software package that allowed us to non-linearly fit the parameters of a newly developed spectral model to measured 2D [13C,1H] COSY spectra (chapter 4). This yielded an improved accuracy of the peak surface areas that were used as input for flux analysis and also yielded estimations of the measurement errors. A method was developed for correcting the obtained data for isotopic non-steady state of the biomass. With respect to the MS data we found a mistake in an existing method for correcting MS-data for the presence of naturally occurring isotopes and proposed a correct method (chapter 7).
• Chapter 5 focuses on steps VIII and XI. We developed the new ‘bondomer’ concept that allows very efficient modeling of the 13C-distributions in a metabolic network that is fed with uniformly 13C-labeled substrates. Although the method yields much smaller models than the existing isotopomer modeling approach, it is based on exactly the same state-of-the-art mathematical modeling framework as used for the isotopomer modeling.
• Chapter 6 focuses on step XI. The chapter gives a systematic method to convert the bondomers that were presented in chapter 5 to the actually measured NMR or MS data.
• Chapter 8 covers the entire algorithm of Fig.7 by presenting an application of the theory of chapters 2, 3, 4, 5 and 6. This chapter describes the metabolic network and flux analysis of P. chrysogenum growing on ammonia and nitrate as sole nitrogen sources. The 13C-labeling data used in this chapter are 2D [13C,1H] COSY spectra of amino acids and storage sugars in biomass lysate.
• Chapter 9 also covers the entire algorithm by presenting an application of the theory of chapters 2, 3, 4 and 7. This chapter describes the (compartmented) metabolic network and flux analysis of S. cerevisiae. The 13C-labeling data used in this chapter are obtained from 2D [13C,1H] COSY analysis of amino acids and storage sugars in biomass lysate and from a new LC-MS method that yields mass distributions of intermediates of the primary metabolism.
• Chapter 10 focuses on step V in the algorithm. An extensive set of 2D [13C,1H] COSY data obtained from chemostat cultures of S. cerevisiae is studied to verify the biosynthetic assumptions that were made in the previous chapters.

REFERENCES
Dauner, M., Bailey, J.E., Sauer, U. (2001) Metabolic flux analysis with a comprehensive isotopomer model in Bacillus subtilis. Biotechnol. Bioeng., 76, 2: 144-156
Gombert, A.K., Moreira dos Santos, M., Christensen, B., Nielsen, J. (2001) Network identification and flux quantification in the central metabolism of Saccharomyces cerevisiae under different conditions of glucose repression. J. Bacteriol., 183, 4: 1441-1451
Lange, H.C., Eman, M., Van Zuijlen, G., Visser, D., Van Dam, J.C., Frank, J., Teixeira De Mattos, M.J., Heijnen, J.J. (2001) Improved rapid sampling for in vivo kinetics of intracellular metabolites in Saccharomyces cerevisiae. Biotechnol. Bioeng., 75, 4: 406-415
Maaheimo, H., Fiaux, J., Çakar, Z. P., Bailey, J. E., Sauer, U., Szyperski, T. (2001) Central carbon metabolism of Saccharomyces cerevisiae explored by biosynthetic fractional 13C-labeling of common amino acids. Eur. J. Biochem., 268: 2464-2479
Marx, A., De Graaf, A.A., Wiechert, W., Eggeling, L., Sahm, H. (1996) Determination of the fluxes in the central metabolism of Corynebacterium glutamicum by nuclear magnetic resonance spectroscopy combined with metabolic balancing. Biotechnol. Bioeng., 49, 2: 111-129
Marx, A., Eikmanns, B.J., Sahm, H., De Graaf, A.A., Eggeling, L. (1999) Response of the central metabolism in Corynebacterium glutamicum to the use of an NADH-dependent glutamate dehydrogenase. Metabol. Eng., 1, 1: 35-48
Möllney, M., Wiechert, W., Kownatzki, D., De Graaf, A.A. (1999) Bidirectional reaction steps in metabolic networks: IV. Optimal design of isotopomer labeling experiments. Biotechnol. Bioeng., 66, 2: 86-103
Petersen, S., De Graaf, A.A., Eggeling, L., Möllney, M., Wiechert, W., Sahm, H. (2000) In vivo quantification of parallel and bidirectional fluxes in the anaplerosis of Corynebacterium glutamicum. J. Biol. Chem., 275, 46: 35932-35941
Sauer, U., Lasko, D.R., Fiaux, J., Hochuli, M., Glaser, R., Szyperski, T., Wüthrich, K., Bailey, J.E. (1999) Metabolic flux ratio analysis of genetic and environmental modulations of Escherichia coli central carbon metabolism. J. Bacteriol., 181, 21: 6679-6688
Savinell, J.M., Palsson, B.O. (1992) Network analysis of intermediary metabolism using linear optimization. I. Development of mathematical formalism. J. Theor. Biol., 154: 421-454
Schmidt, K., Carlsen, M., Nielsen, J., Villadsen, J. (1997) Modeling isotopomer distributions in metabolic networks using isotopomer mapping matrices. Biotechnol. Bioeng., 55, 6: 831-840
Schmidt, K., Marx, A., De Graaf, A.A., Wiechert, W., Sahm, H., Nielsen, J., Villadsen, J. (1998) 13C tracer experiments and metabolite balancing for metabolic flux analysis: comparing two approaches. Biotechnol. Bioeng., 58, 2&3: 254-257
Schmidt, K., Nørregaard, L.C., Pedersen, B., Meissner, A., Duus, J.Ø., Nielsen, J.Ø., Villadsen, J. (1999) Quantification of intracellular metabolic fluxes from fractional enrichment and 13C-13C coupling constraints on the isotopomer distribution in labeled biomass components. Metabol. Eng., 1, 2: 166-179
Sonntag, K., Eggeling, L., De Graaf, A.A., Sahm, H. (1993) Flux partitioning in the split pathway of lysine synthesis in Corynebacterium glutamicum. Eur. J. Biochem., 213: 1325-1331
Szyperski, T. (1995) Biosynthetically directed fractional 13C-labelling of proteinogenic amino acids. Eur. J. Biochem., 232: 433-448
Szyperski, T. (1998) 13C-NMR, MS and metabolic flux balancing in biotechnology research. Quart. Rev. Biophys., 31, 1: 41-106
Theobald, U., Mailinger, W., Baltes, M., Rizzi, M., Reuss, M. (1997) In vivo analysis of metabolic dynamics in Saccharomyces cerevisiae: I. Experimental observations. Biotechnol. Bioeng., 55, 2: 305-316
Van der Heijden, R.T.J.M. (1991) State estimation and error diagnosis for biotechnological processes. PhD-thesis Delft University of Technology
Van Gulik, W.M., Heijnen, J.J. (1995) A metabolic network stoichiometry analysis of microbial growth and product formation. Biotechnol. Bioeng., 48, 6: 681-698
Van Gulik, W.M., Van Winden, W.A., Heijnen, J.J. (2001) Modeling the metabolism of penicillin-G formation. In: Synthesis of β-lactam antibiotics, Ed. Bruggink, A., Kluwer Academic Publishers, Dordrecht, The Netherlands, 283-334
Visser, D., Van der Heijden, R.T.J.M., Mauch, K., Reuss, M., Heijnen, J.J. (2000) Tendency modeling: a new approach to obtain simplified kinetic models of metabolism applied to Saccharomyces cerevisiae. Metabol. Eng., 2: 252-275
Visser, D. (2002) Measuring and modeling in vivo kinetics of primary metabolism. PhD-thesis Delft University of Technology
Wiechert, W., De Graaf, A.A. (1997) Bidirectional reaction steps in metabolic networks: I. Modeling and simulation of carbon isotope labeling experiments. Biotechnol. Bioeng., 55, 1: 101-117
Wiechert, W., Möllney, M., Isermann, N., Wurzel, M., De Graaf, A.A. (1999) Bidirectional reaction steps in metabolic networks: III. Explicit solution and analysis of isotopomer labeling systems. Biotechnol. Bioeng., 66, 2: 69-85
Wiechert, W. (2001) Minireview. 13C Metabolic Flux Analysis. Metabol. Eng., 3, 3: 195-206
Wiechert, W., Wurzel, M. (2001) Metabolic isotopomer labeling systems part I: global dynamic behaviour. Math. Biosci., 169: 173-205

THEORY

Chapter 2

Possible Pitfalls of Flux Calculations Based on 13C-Labeling

This chapter is a revised version of Van Winden et al. (2001)

ABSTRACT
Metabolic engineers have enthusiastically adopted the 13C-labeling technique as a powerful tool for elucidating fluxes in metabolic networks. This tracer technique makes it possible to determine fluxes that are unobservable using only metabolite balances and allows the elimination of doubtful cofactor balances that are indispensable in flux analysis based on metabolite balancing alone. The 13C-labeling technique, however, relies on a number of assumptions that are not free from uncertainties. Two possible errors in the models that are needed to determine the metabolic fluxes from labeling data are (1) omitted reactions and (2) ignored occurrence of channeling. By means of two representative examples it is shown that these modeling errors may lead to serious errors in the calculated flux distributions despite the use of labeling data. A complicating factor is that the model errors are not always easily detected, as poor models may still yield good fits of experimental labeling data. Results of 13C-labeling experiments should therefore be interpreted with appropriate caution.


2.1 INTRODUCTION
Over the last two decades the 13C-labeling technique has become a well-established tool in metabolic flux analysis. The reason for this is that this 13C-tracer method adds important information to the extracellular net conversion measurements, which constitute the only available measurement data for ‘classical’ flux analysis based on mass balancing alone. The 13C-tracer method does not replace, but complements the ‘classical’ method. Therefore it always offers flux estimates that are at least as good as those of the basic flux analysis. Moreover, it may, for example, allow evaluation of separate fluxes through parallel pathways. One of the major advantages of flux analysis based on 13C-labeling is that the extra information allows the elimination of uncertain mass balances from the set of balances. Mass balances that are often regarded as doubtful are those of the metabolic cofactors ATP and NAD(P)H (Van Gulik et al.,1995; Marx et al.,1996,1999; Sauer et al.,1997; Schmidt et al.,1998; Szyperski,1998; Wiechert et al.,1997). Cofactor mass balances rely on estimated reaction stoichiometries and on controversial assumptions. Examples of the latter are a growth-independent ATP yield of the respiratory system (P/O-ratio), fixed growth and non-growth associated maintenance requirements of the cell, the presence of ATP-wasting futile cycles, NAD(P)H specificities of various enzymes and the presence or absence of transhydrogenases interconverting NADH and NADPH. Furthermore, cofactors are involved in so many reactions in the cell that it is hardly possible to take all of them into account. As a consequence, cofactor mass balances are most probably far from complete and may lead to erroneous flux estimates when included in the mass balances. However, it should not be forgotten that the assumptions on which the 13C-labeling method is based are not free of uncertainties either. The following assumptions have been made explicit by Schmidt et al. (1997), Wiechert et al.
(1997) and Szyperski (1998):
1) the network stoichiometry included in the metabolic model is complete. Any omitted reaction should be insignificant in its effect on the labeling state of the biomass,
2) complete biochemical information on the fate of each carbon atom in the modeled reactions is available,
3) metabolic isotope effects are absent, i.e. enzymes do not distinguish between molecules containing various numbers of labeled carbon atoms,
4) all reactions take place in compartments where metabolites are homogeneously distributed. This assumption includes:
4a) if occurring, metabolite channeling, i.e. direct transfer of metabolites from one enzyme to the next, must be taken into account,
4b) if present, compartments in eukaryotic cells should lead to the inclusion of separate metabolite pools in the model.
Although this list of assumptions has been made explicit, the full impact of some of the assumptions has not always been realized. An example is the error that is introduced in flux calculations based on labeling data by making assumptions about the reversibility of fluxes in a metabolic network. It was pointed out only recently that rough assumptions regarding the reversibility (e.g. assumed unidirectionality or isotopic equilibrium) are not good enough for metabolic models, since reversibility may have a severe effect on the labeling state of the metabolites and erroneous assumptions may therefore invalidate the calculated fluxes. This fact was illustrated by Follstad et al. (1998), who simulated the influence of reversible reactions in the pentose phosphate pathway (from here on: PPP) on the fractional enrichments of the pathway intermediates, and by Wittmann et al. (1999), who did the same for their mass distributions. Finally, to our knowledge, none of the flux analyses based on 13C-labeling data that have been published so far included a statistical scrutiny that tested the sensitivity of the outcomes to erroneous or incomplete model assumptions. Consequently, flux analyses using the tracer method are often presented without any reservations. In this chapter two possible pitfalls of the 13C-labeling method are presented. The pitfalls apply to the first and fourth assumption in the list above.

2.2 PITFALL I: INCOMPLETE METABOLIC REACTION MODELS
The main assumption underlying metabolic modeling in general is that the network stoichiometry that is included in the metabolic model is complete. This assumption may not be as trivial as it seems. Although metabolic models are often based on a one-enzyme-one-reaction scheme, the literature reports many enzymes that are more permissive, accepting broad ranges of substrates. Moreover, microbial genome sequencing combined with functional genomics shows that the exact amino acid sequences and thereby the specificities of enzymes may vary from species to species, which makes it dangerous to automatically assume standard textbook biochemistry when defining metabolic pathways for a given microorganism (Cordwell,1999).
Examples of broad-specificity enzymes are the fbp aldolase (for this and following abbreviations, see p.1), which accepts a wide range of aldehydes instead of its natural substrate g3p (Toone et al.,1989), hexokinase, which accepts other sugars in place of glucose (Toone et al.,1989), polyol dehydrogenase, which reduces several ketones (Toone et al.,1989), alcohol dehydrogenase, which reduces a variety of aldehydes and ketones (Bradshaw et al.,1992), and pyr decarboxylase, which has a side activity as carboligase linking two aldehydes (Schörken et al.,1998). Another example of a permissive enzyme, which will be described in more detail for illustrative purposes, is transketolase, which transfers an active glycolaldehyde group from a ketose donor to an α-hydroxyaldehyde acceptor (Schörken et al.,1998). Transketolase is widely known to catalyze the following two reactions of the non-oxidative branch of the PPP:
r5p + x5p ↔ s7p + g3p
e4p + x5p ↔ f6p + g3p
In the two above reactions x5p, f6p and s7p act as donors of two-carbon fragments and g3p, r5p and e4p act as the acceptors. According to Schörken et al. (1998), transketolases of various organisms also accept glucose as an acceptor and hydroxypyruvate as a donor. Apparently, the range of the accepted substrates is relatively large. Table 1 shows an overview of the intermediates of the non-oxidative PPP and of the glycolysis that may serve as donors and acceptors of two-carbon fragments according to their chemical structure. If we combine all two-carbon fragment donors and acceptors we obtain six transketolase reactions in the non-oxidative PPP (Table 2). This table is limited to the reactions that yield reaction products with a length of at least three and at most seven carbon atoms. The reason for this limitation is that intermediates of less than three carbon atoms have never been reported for the non-oxidative PPP. Products of more than seven carbon atoms, e.g. octulose, have been included in PPP models by McIntyre et al. (1989) and by Williams et al. (1987), but were only experimentally observed in small quantities in liver tissue, erythrocytes and higher plants.

TABLE 1: Possible C2-donors and acceptors of transketolase and C3-donors and acceptors of transaldolase
C2/C3-donor: D1: x5p, D2: f6p, D3: s7p
Shared structure of the donors (top to bottom): CH2OH | C=O | HOCH | HCOH | R
C2/C3-acceptor: A1: g3p, A2: e4p, A3: r5p
Shared structure of the acceptors (top to bottom): HC=O | HCOH | R
Fragments marked in the table: "Transferred by transketolase" (C2) and "Transferred by transaldolase" (C3)

TABLE 2: Possible transketolase reactions in non-oxidative PPP
Donor x acceptor | Resulting reaction
D1 x A1 | TK1: x5p + g3p ↔ g3p + x5p
D1 x A2 (forward) or D2 x A1 (backward) | TK2: x5p + e4p ↔ g3p + f6p
D1 x A3 (forward) or D3 x A1 (backward) | TK3: x5p + r5p ↔ g3p + s7p
D2 x A2 | TK4: f6p + e4p ↔ e4p + f6p
D3 x A2 (forward) or D2 x A3 (backward) | TK5: s7p + e4p ↔ r5p + f6p
D3 x A3 | TK6: s7p + r5p ↔ r5p + s7p

Reactions TK2 and TK3 of Table 2 are the commonly applied transketolase reactions that were mentioned before. Reactions TK1, TK4, TK5 and TK6 are normally not considered in metabolic networks. Reactions TK1, TK4 and TK6 have no net stoichiometric effect, so they may be ignored in flux analysis based on mass balancing. However, these reactions do have an effect on the labeling distribution of pentose pathway intermediates, so they may not be ignored when metabolic fluxes are studied using the 13C-labeling technique. The occurrence of these additional transketolase reactions was clearly demonstrated by, among others, Clark et al. (1971) and Flanigan et al. (1993). Although Flanigan explicitly warned about the adverse effects of the transketolase exchange reactions on isotope-based flux analyses, this warning seems to have been largely ignored by most metabolic engineers. This may stem from a confusion of terms that is illustrated by a recent article by Christensen et al. (2000). In their article, Christensen et al. refer to the article of Flanigan but confuse Flanigan’s meaning of exchange reactions (i.e. exchange of a two-carbon fragment) with the exchange reactions that were introduced by Wiechert et al. (1997) as a suitable mathematical form to include the reversibility of reactions in metabolic models. These two meanings of the term exchange reaction are clearly different. Due to the wrong interpretation of the term, the additional model complication by reactions TK1, TK4 and TK6 was not realized. Considering the evidence for the occurrence of all other transketolase reactions, the absence of reaction TK5 in the numerous papers devoted to the PPP is surprising, especially because this reaction has a net stoichiometric effect, so it should be included in the mass balances of four of the PPP intermediates. We see no compelling reason why this reaction would not take place as well. Transaldolase, the other enzyme of the non-oxidative PPP, has also been reported to be permissive to more substrates (Berthon et al.,1993) than those which it converts in the commonly known reaction:
g3p + s7p ↔ f6p + e4p
Therefore, similarly to transketolase, more transaldolase reactions can be hypothesized than the one that is traditionally included in models of the PPP. Combining all three-carbon fragment donors and acceptors from Table 1 and selecting the combinations that yield reaction products with a length between three and seven carbon atoms, we obtain three potential transaldolase reactions. These are listed in Table 3.

TABLE 3: Possible transaldolase reactions in non-oxidative PPP
Donor x acceptor | Resulting reaction
D2 x A1 | TA1: f6p + g3p ↔ g3p + f6p
D3 x A1 (forward) or D2 x A2 (backward) | TA2: s7p + g3p ↔ e4p + f6p
D3 x A2 | TA3: s7p + e4p ↔ e4p + s7p
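The donor x acceptor combinatorics behind Tables 2 and 3 can be sketched in a few lines. This is an illustrative sketch rather than code from the thesis: it assumes that a donor that loses a fragment of n carbons becomes the aldose that is n carbons shorter, that the acceptor becomes the ketose that is n carbons longer, and that a forward combination and its backward counterpart count as one reaction. The function name `fragment_transfers` is hypothetical.

```python
from itertools import product

# Carbon chain lengths of the intermediates listed in Table 1.
CARBONS = {"g3p": 3, "e4p": 4, "r5p": 5, "x5p": 5, "f6p": 6, "s7p": 7}
DONORS = ["x5p", "f6p", "s7p"]       # ketoses D1, D2, D3
ACCEPTORS = ["g3p", "e4p", "r5p"]    # aldoses A1, A2, A3
ALDOSE = {3: "g3p", 4: "e4p", 5: "r5p"}
KETOSE = {5: "x5p", 6: "f6p", 7: "s7p"}


def fragment_transfers(fragment_size, min_len=3, max_len=7):
    """Enumerate the unique reactions for a transferred fragment size.

    fragment_size = 2 mimics transketolase, 3 mimics transaldolase.
    Each reaction is a frozenset holding its forward and backward
    (donor, acceptor) pair, so both directions collapse to one entry.
    """
    reactions = set()
    for donor, acceptor in product(DONORS, ACCEPTORS):
        aldose_len = CARBONS[donor] - fragment_size     # donor minus fragment
        ketose_len = CARBONS[acceptor] + fragment_size  # acceptor plus fragment
        if aldose_len < min_len or ketose_len > max_len:
            continue  # product outside the 3 to 7 carbon window
        forward = (donor, acceptor)
        backward = (KETOSE[ketose_len], ALDOSE[aldose_len])
        reactions.add(frozenset([forward, backward]))
    return reactions


if __name__ == "__main__":
    print(len(fragment_transfers(2)), "transketolase reactions")
    print(len(fragment_transfers(3)), "transaldolase reactions")
```

Reactions whose forward and backward pair coincide (TK1, TK4, TK6, TA1 and TA3) are exactly those without a net stoichiometric effect.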

Reaction TA2 is the commonly applied transaldolase reaction. Reactions TA1 and TA3 do not have a net effect on the stoichiometry, but do influence the label distribution. The occurrence of reaction TA1 has been demonstrated by Ljungdahl et al. (1961). For unknown reasons, reaction TA3 has never been mentioned before. The impact of the omission of the ‘forgotten’ reactions of Tables 2 and 3 on the outcome of 13C-labeling experiments will be illustrated by an example that is based on the metabolic network of the glycolysis and the PPP (Fig.1). The network which includes the traditional reactions (Fig.1-I) and the additional four transketolase and two transaldolase reactions (Fig.1-II) is assumed to be the true metabolism. This extended model was used to generate a series of ‘measured’ labeling data for a chosen set of fluxes and for a specific substrate labeling. The chosen flux values are shown in Fig.1. Their values are deemed representative for those found in literature. The fluxes leading from the intermediates towards synthesis of biomass components are often small compared to the throughput of the central carbon metabolism and were set at zero in order to keep the model as simple as possible. The p5p pool (consisting of r5p, ru5p and x5p, see list of abbreviations on p.1) and the g3p pool (consisting of g3p, dhap, bpg, 2pg, 3pg and pep) are lumped pools that consist of intermediates which are usually assumed to be in isotopic equilibrium due to fast exchange reactions. The fluxes of the network of Fig.1 were chosen such that none of the intermediates accumulates. In Fig.1-II the fluxes of reactions TK1, TK4, TK6, TA1 and TA3 could be freely chosen, since they do not have a net stoichiometric effect. These fluxes, and the flux of the bidirectional reaction TK5, were chosen to be of the same order of magnitude as the traditional transketolase and transaldolase reactions TK2, TK3 and TA2 in Fig.1-I. The applied substrate labeling was assumed to be 10% uniformly labeled glucose and 90% unlabeled glucose.

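[Figure 1, graphic not reproduced. Flux values shown in the figure:
Panel I: V1 = 100, V2 f/b = 300/221, V3 = 93, V4 = 193, V5 = 21, V(TK2) f/b = 16/15, V(TK3) f/b = 18/5, V(TA2) f/b = 14/7
Panel II: V(TK1) = 20, V(TK4) = 20, V(TK5) f/b = 20/14, V(TK6) = 20, V(TA1) = 10, V(TA3) = 10
Metabolite pools shown: glc, g6p, f6p, p5p, e4p, s7p, g3p. Other labels shown: tre, man, his, ery, tyr, phe.]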

FIGURE 1: The ‘true’ metabolism (model 1) including the traditional reactions (Fig.1-I) and all the hypothetical transaldolase and transketolase reactions (Fig. 1-II). The chosen flux values are shown as well. The double headed arrows indicate bidirectional reactions and the corresponding double flux values denote the forward and backward reaction. Full and dotted lines are only used to avoid visual confusion. The forward transketolase and transaldolase reactions are the reactions as they are listed in tables 2 and 3. The corresponding backward reactions should be read from right to left in these tables. The forward reaction v2 runs in the direction of f6p, the backward reaction towards g6p. The grey shaded boxes represent the molecules of which the NMR-spectra are simulated. For abbreviations see list on p.1.
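The statement that none of the intermediates accumulates can be checked against the flux values of Fig.1. The sketch below is not from the thesis and rests on assumed stoichiometry: V1 is taken as glucose uptake into g6p, V2 as g6p to f6p, V3 as f6p to two units of the lumped g3p pool, V4 as the drain of the g3p pool, and V5 as g6p to p5p with loss of CO2. Reactions TK1, TK4, TK6, TA1 and TA3 are left out because they have no net stoichiometric effect.

```python
# Net fluxes (forward minus backward) taken from the values in Fig.1.
NET_FLUX = {
    "V1": 100, "V2": 300 - 221, "V3": 93, "V4": 193, "V5": 21,
    "TK2": 16 - 15, "TK3": 18 - 5, "TK5": 20 - 14, "TA2": 14 - 7,
}

# Assumed stoichiometry (negative = consumed, positive = produced).
# glc and CO2 are treated as external and are not balanced.
STOICHIOMETRY = {
    "V1": {"g6p": 1},
    "V2": {"g6p": -1, "f6p": 1},
    "V3": {"f6p": -1, "g3p": 2},
    "V4": {"g3p": -1},
    "V5": {"g6p": -1, "p5p": 1},
    "TK2": {"p5p": -1, "e4p": -1, "g3p": 1, "f6p": 1},  # x5p + e4p -> g3p + f6p
    "TK3": {"p5p": -2, "g3p": 1, "s7p": 1},             # x5p + r5p -> g3p + s7p
    "TK5": {"s7p": -1, "e4p": -1, "p5p": 1, "f6p": 1},  # s7p + e4p -> r5p + f6p
    "TA2": {"s7p": -1, "g3p": -1, "e4p": 1, "f6p": 1},  # s7p + g3p -> e4p + f6p
}


def accumulation(net_flux, stoichiometry):
    """Return the net production rate of every balanced intermediate."""
    rates = {}
    for reaction, coefficients in stoichiometry.items():
        for metabolite, coefficient in coefficients.items():
            rates[metabolite] = (rates.get(metabolite, 0)
                                 + coefficient * net_flux[reaction])
    return rates


if __name__ == "__main__":
    for metabolite, rate in sorted(accumulation(NET_FLUX, STOICHIOMETRY).items()):
        print(f"{metabolite}: {rate}")
```

Under these assumptions all six balances close, which is consistent with the flux set having been chosen for metabolic steady state.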

The labeling data that were generated are 2D [13C,1H] COSY data (Szyperski,1995) of a number of cell components that are derived from the intermediates. They were simulated using a model based on the method for isotopomer modeling published by Schmidt et al. (1997). The exact nature of the simulated labeling data is explained in appendix A.

The model of Fig.1-I represents a part of the carbon metabolism studied by many researchers who determined the fluxes for different microorganisms using 13C-labeling experiments. The split ratio between the glycolysis and the PPP has been a major target of these studies. In the reported studies the NMR spectra of the following biomass components were measured in order to derive the labeling pattern of the intermediates of the glycolysis and PPP: glucan (giving information about g6p), glycogen (g6p), trehalose (g6p), chitin (f6p), histidine (p5p), tyrosine (g3p, e4p), phenylalanine (g3p), glycine (g3p), serine (g3p), guanosine (p5p) (Szyperski,1995; Marx et al., 1996,1997; Schmidt et al., 1999; Christensen et al., 2000). The biomass components of which the NMR spectra are simulated in the present study (i.e. the grey shaded boxes in Fig.1) represent a data set that is similar to one which has actually been measured in experiments that were performed in our laboratory (see Chapter 8). This extensive set of labeling data offers labeling information of all glycolysis and PPP intermediates except s7p.

The set of ‘measured’ labeling data that were generated using the model of Fig.1 (model 1) is shown in Table 4. This table also shows the set of labeling data that were generated using the model of Fig.2 (model 2) which only contains the glycolytic and the traditional PPP reactions. The fluxes of model 2 were fitted in order to minimize the sum of squared residuals between the labeling data generated using this model and the ‘measured’ labeling data of model 1.

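[Figure 2: network diagram of model 2 with the pools glc, g6p, f6p, g3p, p5p, e4p and s7p and the cell components tre, man, his, ery, tyr and phe. Fitted fluxes: v1=100, v2 f/b=337/278, v3=86, v4=186, v5=41, v(TK2) f/b=45/31, v(TK3) f/b=42/29, v(TA2) f/b=25/11.]

FIGURE 2: The ‘traditional’ metabolism (model 2) showing the optimally fitted flux values for noise free ‘measurement’ data.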

Fig.2 shows that the optimally fitted model 2 not only fails to yield the values of the transketolase and transaldolase fluxes of Fig.1-II; the fitted fluxes also deviate from those in Fig.1-I. The non-oxidative PPP fluxes are clearly wrongly determined. More importantly, the amount of glucose entering the PPP is overestimated by almost 100% (v5=41 in Fig.2 versus v5=21 in Fig.1). This overestimation leads to erroneous estimates of the amounts of NAD(P)H and ATP that are involved in this part of the carbon metabolism. It should be noted that the deviations between the fitted fluxes (model 2) and the ‘true’ fluxes (model 1) depend on the sizes of the new transaldolase and transketolase reactions that were assumed (Fig.1-II). When the sizes of TK1, TK4, TK6, TA1 and TA3 were reduced to only 20% of the values in Fig.1-II and the sizes of the forward and backward reactions TK5 were reduced to 8 and 2 respectively, the optimally fitted flux v5 reduced from 41 (see Fig.2) to 28. This means that the amount of glucose entering the PPP is still overestimated by 33%.
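The two overestimation percentages quoted above follow directly from the v5 values of Fig.1 and Fig.2. A minimal Python check is given below; the function name is ours and is introduced only for illustration.

```python
def relative_overestimation(fitted: float, true: float) -> float:
    """Overestimation of a fitted flux relative to the true flux, in percent."""
    return 100.0 * (fitted - true) / true


V5_TRUE = 21.0         # flux v5 into the PPP in model 1 (Fig.1)
V5_FIT_FULL = 41.0     # v5 fitted with model 2 (Fig.2)
V5_FIT_REDUCED = 28.0  # v5 fitted after reducing the additional TK/TA fluxes

print(round(relative_overestimation(V5_FIT_FULL, V5_TRUE)))     # 95, i.e. almost 100%
print(round(relative_overestimation(V5_FIT_REDUCED, V5_TRUE)))  # 33
```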

From Table 4 it is clear that the optimally fitted model 2 yields labeling data that do not deviate much from those of model 1 when compared to fits of similar data sets in literature (e.g. Schmidt et al.,1999). This is a rather surprising result if one considers that model 1 contains seven more reactions than model 2 (the bidirectional reaction TK5 consisting of a separate forward and backward flux). The only data in Table 4 that show a substantial difference are the relative peak areas of the δ-carbon of histidine.

TABLE 4: Comparison of data simulated with the ‘true metabolism’ of model 1 and fitted using model 2

molecule and     ‘measured’ data   fitted data   absolute     st. dev. for
carbon atom (a)  model 1           model 2       difference   β=0.030 (b)

His α-s          0.109             0.112         0.003        0.028
His α-d1         0.005             0.005         0.000        0.043
His α-d2         0.088             0.080         0.008        0.041
His α-dd         0.798             0.802         0.004        0.056
His β-s          0.178             0.174         0.004        0.029
His β-d1         0.019             0.018         0.001        0.041
His β-d2         0.441             0.445         0.003        0.042
His β-dd         0.361             0.363         0.001        0.045
His δ-s          0.393             0.308         0.086        0.024
His δ-d          0.607             0.693         0.086        0.024
Tyr α-s          0.119             0.124         0.005        0.029
Tyr α-d1         0.133             0.121         0.012        0.038
Tyr α-d2         0.007             0.007         0.001        0.044
Tyr α-dd         0.741             0.747         0.006        0.053
Tyr β-s          0.112             0.117         0.005        0.029
Tyr β-d1         0.014             0.015         0.001        0.043
Tyr β-d2         0.778             0.773         0.005        0.065
Tyr β-dd         0.096             0.095         0.001        0.057
Tyr 26-s         0.102             0.106         0.005        0.047
Tyr 26-d         0.801             0.797         0.004        0.067
Tyr 26-t         0.097             0.097         0.001        0.054
Tyr 35-s         0.220             0.243         0.023        0.044
Tyr 35-d         0.322             0.305         0.017        0.040
Tyr 35-t         0.458             0.452         0.006        0.045
Phe α-s          0.119             0.124         0.005        0.028
Phe α-d1         0.133             0.121         0.012        0.039
Phe α-d2         0.007             0.007         0.001        0.042
Phe α-dd         0.741             0.747         0.006        0.053
Phe β-s          0.112             0.117         0.005        0.027
Phe β-d1         0.014             0.015         0.001        0.042
Phe β-d2         0.778             0.773         0.005        0.064
Phe β-dd         0.096             0.095         0.001        0.057
Man 16-s         0.132             0.141         0.009        0.027
Man 16-d         0.868             0.859         0.009        0.027
Man 25-s         0.125             0.132         0.008        0.047
Man 25-d         0.163             0.156         0.006        0.039
Man 25-t         0.713             0.712         0.001        0.053
Tre 1-s          0.142             0.157         0.015        0.027
Tre 1-d          0.858             0.843         0.015        0.027
Tre 2-s          0.133             0.146         0.013        0.046
Tre 2-d          0.199             0.203         0.004        0.038
Tre 2-t          0.669             0.651         0.017        0.049
Tre 3-s          0.192             0.221         0.029        0.046
Tre 3-d          0.168             0.148         0.020        0.038
Tre 3-t          0.640             0.631         0.009        0.047
Tre 4-s          0.112             0.115         0.003        0.048
Tre 4-d          0.124             0.144         0.020        0.038
Tre 4-t          0.764             0.741         0.023        0.054
Tre 5-s          0.094             0.096         0.002        0.049
Tre 5-d          0.026             0.028         0.002        0.040
Tre 5-t          0.880             0.877         0.003        0.060
Tre 6-s          0.096             0.098         0.002        0.028
Tre 6-d          0.904             0.902         0.002        0.028
Ery 14-s         0.243             0.269         0.026        0.026
Ery 14-d         0.758             0.731         0.026        0.026

Note:
(a) The names and symbols are explained in Appendix A.
(b) The standard deviation of the noisy ‘measurement’ data that are generated for the level of noise where the modeling error is just detectable at a 5% significance level (see text).

It must be emphasized that model 1 was used to generate error-free 2D [13C,1H] COSY data and that the fluxes of model 2 were fitted to these error-free data. Due to the absence of any measurement error, the lack of fit between the two sets of relative intensities is caused entirely by a modeling error. Therefore, Table 4 theoretically does indicate the presence of a modeling error. In practice, however, measurement errors will be present, making it harder to detect a modeling error. The maximal level of measurement noise at which the above modeling error is still detectable can be determined using statistical testing. This is discussed in the following section.

Lack of fit detection threshold

In order to determine how accurate the NMR measurements should be in order to make the modeling error detectable, it was determined how much the modeling error and pure measurement error contribute to the lack of fit for several levels of measurement noise. For this purpose five covariance matrices of the ‘measurement’ data of model 1 were generated at increasing levels of measurement noise (see appendix B). The magnitude of the error is expressed by the variable β, which is a measure for the relative error. The covariance matrices were used to weigh the sum of squared residuals in five fits of the simulated data of model 2 to those of model 1. The resulting minimized weighted sums of squared residuals are shown in Fig.3. It can be clearly seen that at increasing levels of noise (i.e. increasing values of β), the weighted sum of squares decreases. This follows from the fact that the unweighted sum of squared residuals remains constant, whereas the covariance matrix, whose inverse weighs the sum, increases with β.

From the data in Fig.3 it can be derived that the minimized covariance weighted sum of squared residuals (SSmin,weighted) equals:

$$SS_{min,weighted} = \frac{0.013}{\beta^2} \qquad (1)$$

In Eq.1 SSmin,weighted represents the modeling error, since it results from a fit based on noise-free ‘measurement’ data of model 1. When noise is added to the ‘measurement’ data, it is expected that the SSmin,weighted will increase due to a contribution of the pure measurement error. This contribution has a χ2-distribution with a number of degrees of freedom that equals the number of independent data minus the number of parameters of model 2. Table 4 shows that the number of simulation data equals 55 of which 18 are dependent, since the relative intensities of each of the 18 simulated spectra sum up to 1. Model 2 has 5 degrees of freedom, so that the χ2-distribution has 55-18-5=32 degrees of freedom. The expected mean outcome of a χ2-distribution with 32 degrees of freedom equals 32. Therefore, it is expected that when model 2 is used to fit the ‘measurement’ data of model 1 to which noise is added, SSmin,weighted will on average be 32 higher than the values shown in Fig.3. This was verified by means of Monte-Carlo simulations at various levels of measurement noise. The various levels of noise that were added to the data of model 1 were generated from the corresponding covariance matrices as described by Johnson (1987).
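The count of 32 degrees of freedom can be reproduced from the list of spectra in Table 4. The sketch below is only a bookkeeping aid; the dictionary and variable names are ours, and the number of multiplet components per spectrum is read from the table rows.

```python
# Number of multiplet components per simulated spectrum, read from Table 4:
# s/d spectra have 2 components, s/d/t spectra 3, s/d1/d2/dd spectra 4.
SPECTRA = {
    "His alpha": 4, "His beta": 4, "His delta": 2,
    "Tyr alpha": 4, "Tyr beta": 4, "Tyr 26": 3, "Tyr 35": 3,
    "Phe alpha": 4, "Phe beta": 4,
    "Man 16": 2, "Man 25": 3,
    "Tre 1": 2, "Tre 2": 3, "Tre 3": 3, "Tre 4": 3, "Tre 5": 3, "Tre 6": 2,
    "Ery 14": 2,
}
N_FREE_FLUXES_MODEL_2 = 5  # degrees of freedom of model 2, as stated in the text

n_data = sum(SPECTRA.values())   # all relative intensities
n_spectra = len(SPECTRA)         # one sum-to-one constraint per spectrum
dof = n_data - n_spectra - N_FREE_FLUXES_MODEL_2

print(n_data, n_spectra, dof)    # 55 18 32
```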

[Figure 3: plot with left axis SS(min,weighted) from 0 to 1400, right axis P (Eq.2) from 0 to 0.400, and horizontal axis β from 0 to 0.040.]

FIGURE 3: The minimized covariance weighted sum of squared residuals (SSmin,weighted) of the noise-free data of model 1 and the fitted data of model 2 ( □ : performed fits; thin line: approximation of Eq.1) and the one-tailed probability of Eq.2 (fat line) versus level of measurement noise (β).

In order to test whether the model error is detectable, the SSmin,weighted of the noisy data was tested for the null hypothesis that model 2 is correct. Using Eq.1 the test can be formulated as follows:

$$P\left( SS_{min,weighted} \geq \frac{0.013}{\beta^2} + 32 \;\middle|\; SS_{min,weighted} \sim \chi^2(32) \right) < \alpha \qquad (2)$$
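The probability of Eq.2 can be evaluated without a statistics library, because the upper tail of a χ2-distribution with an even number of degrees of freedom has a closed form. The sketch below uses this closed form and a simple bisection to locate the noise level at which the probability crosses α; the function names and the bisection bounds are our own choices and serve only as an illustration.

```python
import math

DOF = 32             # degrees of freedom of the chi-square distribution (see text)
MODEL_ERROR = 0.013  # numerator of Eq.1


def chi2_sf_even(x: float, dof: int) -> float:
    """Upper-tail probability P(X >= x) for X ~ chi2(dof); exact for even dof."""
    if dof % 2 != 0:
        raise ValueError("closed form only holds for an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i          # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def p_value(beta: float) -> float:
    """Probability of Eq.2 for a noise level beta."""
    return chi2_sf_even(MODEL_ERROR / beta**2 + DOF, DOF)


def detection_threshold(alpha: float = 0.05, lo: float = 0.01, hi: float = 0.05) -> float:
    """Largest beta in [lo, hi] for which p_value(beta) < alpha (bisection)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if p_value(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo


print(round(detection_threshold(), 3))  # 0.03
```

For α = 0.05 the bisection ends close to β = 0.030, which agrees with the threshold read from Fig.3.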

The probability P of Eq.2 is plotted versus β in Fig.3. According to Eq.2 the null hypothesis must be rejected when the probability of the observed deviation is smaller than α. When α is chosen to be 0.05, the null hypothesis is rejected (i.e. the modeling error is detected) for values of β smaller than 0.030. The covariance matrix for β=0.030 was used to generate a large number of noisy ‘measurement’ data sets. The standard deviations of the separate relative intensities in these sets were determined and are shown in Table 4. The allowed errors vary from a relative error of over 800% for small relative intensities (e.g. his α-d1) to a relative error of 3.1% (e.g. tre 1-d). This example gives an impression of the accuracy of the NMR measurements that is required to detect the omission of the additional transaldolase and transketolase reactions. Note that the accuracy of the NMR measurements that is required to detect the omission in the model depends on the model error and thus on the sizes of the new transaldolase and transketolase reactions that were assumed in model 1. It was already pointed out before that the δ-carbon of histidine largely contributes to the modeling error part of the sum of squared residuals (see Table 4). A consequence of this fact is that the maximally allowed level of noise at which the modeling error is detectable drastically decreases if the 13C-labeling data of this carbon atom are not included in the fit. This illustrates the importance of measurement data that are sensitive towards modeling


errors. It is therefore recommended to look for these model-sensitive measurement data in the design of future labeling experiments.

Additional 13C-labeling data

In order to check whether a more extensive set of 13C-labeling data would increase the lack of fit between models 1 and 2, the dataset of Table 4 was extended with a set of MS data including all the mass fractions of the intermediates of the PPP and of the branch point intermediates of the glycolysis and the PPP. These 37 mass fractions (7 for both g6p and f6p, 6 for p5p, 5 for e4p, 8 for s7p and 4 for g3p) were also simulated with the model of Fig.1 and fitted by varying the fluxes of the model of Fig.2. The resulting fitted flux set was only marginally different from the set shown in Fig.2 and fitted both the NMR and the MS data very well (results not shown). In other words, in this specific example even the extensive dataset of combined MS and NMR data does not reveal the serious modeling errors, unless the measurement errors are accurately known and small. Note that a general conclusion about the identifiability of the modeling error cannot be given here, because the sensitivities of the simulated data to the modeling error depend on the actual values of the metabolic fluxes.

2.3 PITFALL II: MICROCOMPARTMENTATION DUE TO METABOLITE CHANNELING

Besides the assumption of a correct metabolic network that was discussed in the previous section, another important premise of the 13C-labeling method is that metabolite channeling is taken into account when it is proven to occur in cells. Metabolic channeling describes the mechanism whereby the product of an enzymatic reaction is transferred to the next enzyme without mixing with the bulk-phase metabolite pool (Kholodenko et al.,1996). In their review of metabolic network analysis Christensen and Nielsen (1999) mention the influence of channeling on the outcome of 13C-labeling experiments. Several researchers have included the possibility of channeling in the metabolic models they used for the simulation of 13C-labeling (Portais et al., 1993; Schmidt et al.,1999). In each of these papers channeling is only mentioned as a mechanism that causes orientation conserved transfer of symmetric molecules which would otherwise cause scrambling of the labeling pattern. The occurrence of orientation conserved transfer is controversial, as the authors show in their review of the scientific debate regarding the channeling of the TCA cycle intermediates succinate and fumarate. The symmetry of these molecules causes randomization of the C1/C4 and C2/C3 labeling, unless the molecules are passed from succinate thiokinase to succinate dehydrogenase and fumarase without being released. In that case their orientation is conserved and scrambling does not occur.

In fact, orientation conserved transfer is only one of the effects that channeling may have on the label distribution in a metabolic network. Channeling between two enzymes may also lead to microcompartmentation of the reaction intermediate. In contrast to orientation conserved transfer, this effect of channeling has never been taken into account in 13C-labeling studies.
Still, it may be a common phenomenon in many major pathways of both eukaryotic and prokaryotic cells (Ovádi et al.,1988; Mathews,1993; Kholodenko et al.,1996). In the glycolysis direct enzyme-to-enzyme transfer of metabolites has been claimed to occur between aldolase and g3p dehydrogenase (Ovádi et al.,1978a,1978b,1983,1990). This claim is not uncontroversial, as is evidenced by the paper of Kvassman et al. (1988). Other glycolytic enzyme-enzyme interactions have been claimed for g3p dehydrogenase and 3pg kinase (Srivastava et al., 1986). Interactions between glycolytic enzymes and enzymes of the PPP have been studied by Wood et al. (1985). They found evidence of complexes between g3p dehydrogenase, transketolase and transaldolase. Debnam et al. (1997) concluded from their research that intermediates of the oxidative branch of the PPP are channeled between hexokinase, g6p dehydrogenase and 6pg dehydrogenase. Further channeling complexes have been suggested between aldolase and glycerol phosphate dehydrogenase, between aspartate aminotransferase and glutamate dehydrogenase (Ovádi,1990) and between the enzymes of the tricarboxylic acid cycle (Haggie et al.,1999).

[Figure 4: three schematic panels (I, II, III) showing the metabolites A, B, C and D and the enzymes e1, e2 and e3, with an e2e3 complex in panels II and III.]

FIGURE 4: A metabolic node with and without channeling. A black sphere denotes a 13C-atom and a white sphere a 12C-atom. I: Metabolites A and B are converted to C by enzymes e1 and e2. C is converted to D by enzyme e3. II: A fraction of e2 forms an enzyme complex with enzyme e3. This e2e3 complex converts B to D and channels the intermediate C without releasing it. III: All of the enzyme e2 complexes with enzyme e3 and forms a ‘leaky channel’ from which some of the intermediate C leaks to the bulk pool.

It is important to realize that microcompartmentation due to channeling clearly conflicts with the assumption of homogeneous metabolite pools unless the microcompartments are considered as separate pools. When this fact is neglected the 13C-labeling data of the pools will give inaccurate information about ratios of the fluxes entering the pool, since the ratios only reflect those of the non-channeled flux fractions. To our knowledge the impact of this fact has never been acknowledged. The importance of the fact will be illustrated by means of an example system shown in Fig.4. Fig.4-I represents the standard situation around a metabolic branch point. In Fig.4-II the enzymes e2 and e3 are assumed to be present both in complexed (allowing channeling) and in uncomplexed forms, which leads to a certain fraction of a one-carbon molecule C that is channeled and does not mix with the bulk pool of C. Alternatively, as shown in Fig.4-III, it may be assumed that enzyme e2 is only present in the complexed state but forms a so-called ‘leaky channel’ (Ovádi,1990). In this case the non-leaked fraction of C does not mix with the bulk pool of C. In both cases II and III the label distribution of the free metabolite C does not only depend on the fluxes through enzymes e1 and e2 but also on the complexed fraction of enzyme e2 (case II) or on the leaked fraction of the channeled metabolite (case III).

If we define vi as the flux catalyzed by enzyme ei, irrespective of the state of complexation of ei, and if we assume that in Fig.4-II α% of the flux v2 is channeled, then we can derive what percentage β of flux v3 must be channeled in order to fulfil the constraint that the absolute amount of C that is channeled by e2 must equal the amount channeled by e3:

$$\left. \begin{aligned} v_1 + v_2 &= v_3 \\ \alpha \cdot v_2 &= \beta \cdot v_3 \end{aligned} \right\} \;\Rightarrow\; \beta = \alpha \cdot \frac{v_2}{v_1 + v_2} \qquad (3)$$

Assume that in Fig.4, the fluxes are: v1=100 mole/s, v2=200 mole/s and by consequence v3=300 mole/s. If α equals 75%, then Eq.3 tells that β must equal 50%. Furthermore, assume that the label distribution of metabolites A and B is known. Written in the form of isotopomer distribution vectors (Schmidt et al.,1997), the labeling shown in Fig.4 is: a=(1,0)T and b=(0,1)T. From these isotopomer distribution vectors, the labeling of C can be calculated as follows for cases I and II in Fig.4:

$$\text{I}: \quad v_1 \cdot a + v_2 \cdot b = v_3 \cdot c = (v_1 + v_2) \cdot c \;\Leftrightarrow\; c = \frac{v_1}{v_1 + v_2} \cdot a + \frac{v_2}{v_1 + v_2} \cdot b \;\Leftrightarrow\; c = \frac{100}{300} \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \frac{200}{300} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1/3 \\ 2/3 \end{pmatrix} \qquad (4)$$

$$\text{II}: \quad v_1 \cdot a + (1 - \alpha) \cdot v_2 \cdot b = (1 - \beta) \cdot v_3 \cdot c = \left( v_1 + (1 - \alpha) \cdot v_2 \right) \cdot c \;\Leftrightarrow\; c = \frac{v_1}{v_1 + (1 - \alpha) \cdot v_2} \cdot a + \frac{(1 - \alpha) \cdot v_2}{v_1 + (1 - \alpha) \cdot v_2} \cdot b \;\Leftrightarrow\; c = \frac{100}{150} \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \frac{50}{150} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 2/3 \\ 1/3 \end{pmatrix} \qquad (5)$$
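The numbers of Eqs.3 to 5 can be reproduced with exact fractions. The sketch below is a minimal illustration; the helper `mix` and the variable names are ours and are not part of the isotopomer modeling method of Schmidt et al. (1997).

```python
from fractions import Fraction as F


def mix(contributions):
    """Flux-weighted average of isotopomer distribution vectors.

    contributions: list of (flux, vector) pairs entering a metabolite pool.
    """
    total = sum(flux for flux, _ in contributions)
    size = len(contributions[0][1])
    return tuple(sum(flux * vec[k] for flux, vec in contributions) / total
                 for k in range(size))


v1, v2 = F(100), F(200)        # fluxes through e1 and e2 (mole/s)
v3 = v1 + v2                   # no accumulation of C
alpha = F(3, 4)                # channeled fraction of v2 (case II)
beta = alpha * v2 / (v1 + v2)  # Eq.3, gives 1/2

a = (F(1), F(0))               # isotopomer distribution vector of A
b = (F(0), F(1))               # isotopomer distribution vector of B

c_case_1 = mix([(v1, a), (v2, b)])                # Eq.4, gives (1/3, 2/3)
c_case_2 = mix([(v1, a), ((1 - alpha) * v2, b)])  # Eq.5, gives (2/3, 1/3)
```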

The outcomes of Eqs.4 and 5 show that the fraction of the flux v2 that is channeled influences the labeling state of C. In case I measuring the labeling state of C by means of NMR or MS yields one independent measurement that fixes the flux ratio v1/v2. This solves the flux analysis if only one of the fluxes is known. In this case measuring the label distribution of metabolite D will not yield any extra information. In case II, however, measuring only the labeling state of C does not suffice to solve the two degrees of freedom formed by the flux ratio v1/v2 and by the parameter α. In this case the measurement of the labeling state of D is required too:

$$\text{I}: \quad v_3 \cdot c = v_3 \cdot d \;\Leftrightarrow\; c = d \;\Leftrightarrow\; d = \begin{pmatrix} 1/3 \\ 2/3 \end{pmatrix} \qquad (6)$$

$$\text{II}: \quad (1 - \beta) \cdot v_3 \cdot c + \beta \cdot v_3 \cdot b = v_3 \cdot d \;\Leftrightarrow\; d = \frac{(1 - \beta) \cdot v_3}{v_3} \cdot c + \frac{\beta \cdot v_3}{v_3} \cdot b \;\Leftrightarrow\; d = \frac{150}{300} \begin{pmatrix} 2/3 \\ 1/3 \end{pmatrix} + \frac{150}{300} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1/3 \\ 2/3 \end{pmatrix} \qquad (7)$$

Eq.6 shows that in case I, measuring D indeed does not give any additional information. This is obvious, since the labeling state of D always equals that of C. In case II, however, the labeling state of D forms an additional independent measurement. Combining the measured labeling states of C and D and Eq.3, the flux ratio v1/v2 and the parameters α and β can be determined. Note that Eq.7 might have been derived in an alternative way by considering that the net result of the fluxes in case II is a flux v1 from A to D and a flux v2 from B to D:

$$\text{II}: \quad v_1 \cdot a + v_2 \cdot b = v_3 \cdot d = (v_1 + v_2) \cdot d \;\Leftrightarrow\; d = \frac{v_1}{v_1 + v_2} \cdot a + \frac{v_2}{v_1 + v_2} \cdot b \;\Leftrightarrow\; d = \frac{100}{300} \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \frac{200}{300} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1/3 \\ 2/3 \end{pmatrix} \qquad (8)$$
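How the measured labeling states of C and D fix v1/v2, α and β can be made explicit for this example. The sketch below assumes the specific labeling a=(1,0)T and b=(0,1)T used above, so that the first element of a vector is the fraction of molecules that originates from A; for other input labelings the inversion would look different. The function name is ours.

```python
from fractions import Fraction as F


def estimate_case_2(c, d):
    """Recover v1/v2, alpha and beta from the labeling of C and D (case II).

    Assumes a = (1, 0) and b = (0, 1), as in the example of Fig.4.
    """
    flux_ratio = d[0] / d[1]                # v1/v2, from Eq.8
    unchanneled = flux_ratio * c[1] / c[0]  # (1 - alpha), from Eq.5
    alpha = 1 - unchanneled
    beta = alpha * d[1]                     # Eq.3 with v2/(v1 + v2) = d[1]
    return flux_ratio, alpha, beta


c_measured = (F(2, 3), F(1, 3))  # labeling of C in case II (Eq.5)
d_measured = (F(1, 3), F(2, 3))  # labeling of D in case II (Eq.7)

flux_ratio, alpha, beta = estimate_case_2(c_measured, d_measured)
# flux_ratio = 1/2, alpha = 3/4, beta = 1/2
```

When c equals d, as in case I, the same function returns α = 0, i.e. no channeling.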

Case III is completely analogous to case II when the fraction α of flux v2 that is channeled in case II is read as the fraction α of flux v2 that is not leaked from the ‘leaky channel’ of case III.

It will be clear that for large metabolic networks considering all the possible complexes of neighbouring enzymes and introducing additional parameters for fractions of fluxes that are channeled and thus bypass metabolite pools will lead to many extra degrees of freedom in the network. These degrees of freedom may render parts of the network unobservable, even when large amounts of high-quality 13C-labeling data are available. Therefore, the implications of channeling on the outcome of 13C-labeling studies will have to be seriously considered. In order to detect the occurrence of channeling in a metabolic network, one will have to measure the labeling of all the intermediates of the pathways in the network. If this is not done, one may not automatically assume that the labeling data of one of the intermediates of a linear pathway are representative for all other intermediates. If labeling results of several intermediates of a linear pathway differ, then channeling effects or unknown inputs in the intermediate pools must be expected.

2.4 CONCLUSIONS

The two pitfalls of the 13C-labeling technique that were described and illustrated in this chapter show some of the uncertainties that still surround this technique. Both incomplete metabolic models and the occurrence of channeling between enzymes of the network may cause serious errors in the determined fluxes. If one does not take these possible sources of errors into account, the iterative numerical procedure that is commonly used to find fluxes that fit simulated to experimental data still yields an answer. At first sight, this answer may seem acceptable, because the labeling data are well fitted. It has been shown, however, that a satisfactory fit is no guarantee of a correct set of fluxes. Therefore, publications of flux analysis based on 13C-labeling studies should pay due attention to the sensitivity of the results to these modeling uncertainties.

In order to make modeling errors detectable, as many labeled metabolites as possible should be measured. Besides a reasonably large quantity of measurement data a thorough statistical data evaluation is required to enable the rejection of models that do not yield an acceptable fit between measured and simulated labeling data. Assuming values for the measurement errors will not do; errors will have to be firmly established based on repeated independent experiments or signal-to-noise analysis of NMR or MS spectra. This allows one to check a posteriori whether modeling assumptions such as complete metabolic models and homogeneous distributions of intermediates were justifiable.

APPENDIX A: EXPLANATION OF SIMULATED LABELING DATA

In 2D [13C,1H] COSY spectra the following multiplets may be discerned: a singlet peak (denoted as ‘s’), a doublet peak (‘d’), a double doublet (‘dd’) or a triplet (‘t’) (Szyperski,1995). Singlets and doublets are found in spectra of a two-carbon fragment within a molecule.
Singlets, doublets and double doublets are observed in spectra of three-carbon fragments. In case the two 13C-13C scalar coupling constants in a three-carbon fragment are identical, only one doublet is observed and the two middle double doublet peaks overlap, leading to a triplet. Alternatively, if the 13C-13C scalar coupling constants differ, two doublets are observed (‘d1’ having a larger coupling constant than ‘d2’) and a double doublet is found.

This appendix shows the chemical structures of the molecules of which NMR spectra were assumed to be measured in the calculations in ‘Pitfall I’. The numbers between brackets indicate the numbering of the carbon atoms. For each of the spectra in Table 4 it is indicated which carbon atoms give rise to the spectrum concerned and whether scalar coupling constants are identical or not.

Histidine:
HIS α: observed carbon atom: 2 and its (different) coupling constants with atoms 1 and 3.
HIS β: observed carbon atom: 3 and its (different) coupling constants with atoms 2 and 4.
HIS δ: observed carbon atom: 5 and its coupling constant with atom 4.

Tyrosine:
TYR α: observed carbon atom: 2 and its (different) coupling constants with atoms 1 and 3.
TYR β: observed carbon atom: 3 and its (different) coupling constants with atoms 2 and 4.
TYR C26: observed carbon atoms: 5 and 9 and their (identical) coupling constants with atoms 4 and 6, and 4 and 8, respectively.
TYR C35: observed carbon atoms: 6 and 8 and their (identical) coupling constants with atoms 5 and 7, and 7 and 9, respectively.

Phenylalanine:
PHE α and PHE β: see TYR α and TYR β.

Trehalose:
TRE C1: observed carbon atom: 1 and its coupling constant with atom 2.
TRE C2: observed carbon atom: 2 and its (identical) coupling constants with atoms 1 and 3.
TRE C3: observed carbon atom: 3 and its (identical) coupling constants with atoms 2 and 4.
TRE C4: observed carbon atom: 4 and its (identical) coupling constants with atoms 3 and 5.
TRE C5: observed carbon atom: 5 and its (identical) coupling constants with atoms 4 and 6.
TRE C6: observed carbon atom: 6 and its coupling constant with atom 5.

Erythritol:
ERY C14: observed carbon atoms: 1 and 4 and their coupling constant with atom 2 and 3, respectively.

Mannitol:
MAN C16: observed carbon atoms: 1 and 6 and their coupling constant with atom 2 and 5, respectively.
MAN C25: observed carbon atoms: 2 and 5 and their (identical) coupling constants with atoms 1 and 3, and 4 and 6, respectively.

[Chemical structures with carbon atom numbering of histidine, tyrosine, phenylalanine, trehalose, erythritol and mannitol.]

APPENDIX B: GENERATION OF COVARIANCE MATRIX

The data that are obtained from 2D [13C,1H] COSY NMR experiments are the relative areas of the spectral peaks. These areas will contain measurement errors due to the spectral noise. On the basis of observations of experimental spectra we hereby postulate an assumption regarding the experimental error which we will use to generate covariance matrices for the simulated 2D [13C,1H] COSY data:

• The width of a spectral peak is independent of its height. As the error in the calculated peak area is a function of the variance of the spectral noise ($\sigma_{noise}^2$) and of the width of the peak, the absolute error of each peak in a spectrum is identical. In other words, the variance of the error in the area of a multiplet consisting of m peaks is given by $\alpha \cdot \sigma_{noise}^2 \cdot m$, where α is a constant.

Based on this assumption, we can generate covariance matrices for the multiplet areas of the spectra of the measured carbon atoms:

$$C_A = \alpha \cdot \sigma_{noise}^2 \cdot M \qquad (B1)$$

In Eq.B1 CA is the covariance matrix and M is a diagonal matrix that has the numbers of the peaks that constitute the multiplets as its diagonal elements. There are three options for M:

$$M_A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}, \qquad M_B = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}, \qquad M_C = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix}$$

MA is valid for the simplest possible spectrum (arising from a single 13C-13C scalar coupling with a neighbouring carbon atom), consisting of a singlet (one peak) and a doublet (two peaks) (Szyperski, 1995,1998). For spectra with nine peaks (arising from two different 13C-13C scalar couplings) MB should be used in Eq.B1. Finally, MC is applicable for spectra with five peaks (arising from two identical 13C-13C scalar couplings). In this case it is taken into account that the triplet area is calculated by multiplying the total area of the two peaks that do not overlap with the singlet by a factor of two. Furthermore, the pure singlet area is calculated by subtracting the total area of the two aforementioned triplet peaks from the observed singlet area.

The simulated data presented in Table 4 are not the multiplet areas, but so-called ‘relative intensities’ of the various multiplets. The relative intensity of a multiplet j (rj) is the normalized area, which is calculated as follows:

$$r_j = \frac{A_j}{A_{tot}} \qquad (B2)$$

In this equation Aj represents the summed area of the peaks of multiplet j and Atot is the total area of all the peaks of the spectrum of the observed carbon atom. The covariance matrix of the relative intensities can be derived from the covariance matrix of the multiplet areas:

$$C_r = J \cdot C_A \cdot J^T = \alpha \cdot \sigma_{noise}^2 \cdot J \cdot M \cdot J^T \qquad (B3)$$
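Eq.B3 can be evaluated numerically for a single spectrum. The sketch below builds the Jacobian of the relative intensities of Eq.B2 with respect to the multiplet areas and applies Eq.B3 in plain Python. The function and argument names are ours, and the numerical scale is an assumption made for illustration: the total area is set to 1 and α·σnoise² to 0.030², so that the result can be compared with the standard deviations listed in Table 4.

```python
import math


def relative_intensity_covariance(areas, peaks_per_multiplet, noise_var):
    """Covariance matrix of relative intensities, Eq.B3: noise_var * J M J^T.

    areas: multiplet areas A_j of one spectrum
    peaks_per_multiplet: diagonal of M (e.g. [1, 2] for M_A)
    noise_var: alpha * sigma_noise^2
    """
    n = len(areas)
    a_tot = sum(areas)
    r = [a / a_tot for a in areas]
    # Jacobian of r_j = A_j / A_tot with respect to A_k.
    jac = [[((1.0 if j == k else 0.0) - r[j]) / a_tot for k in range(n)]
           for j in range(n)]
    return [[noise_var * sum(jac[i][k] * peaks_per_multiplet[k] * jac[j][k]
                             for k in range(n))
             for j in range(n)]
            for i in range(n)]


# Tre 1 spectrum of Table 4 (singlet and doublet), scaled so that A_tot = 1.
cov = relative_intensity_covariance([0.142, 0.858], [1, 2], noise_var=0.030**2)
std = [math.sqrt(cov[i][i]) for i in range(2)]
print([round(s, 4) for s in std])  # [0.0264, 0.0264]
```

Both standard deviations come out at about 0.026, close to the value of 0.027 that Table 4 lists for Tre 1-s and Tre 1-d on the basis of a finite number of simulated data sets.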

In Eq.B3 Cr is the covariance matrix of the relative intensities and J the Jacobian containing the partial derivatives of the relative intensities of each multiplet to the areas. The diagonal and off-diagonal elements of J are calculated from Eq.B2: 35

Chapter 2  1 − rj    for j = k  δrj   A tot   =  δA k    − rj    A  for j ≠ k   tot  Filling in these equations in Eq.B3 yields: Cr = β2 ⋅ Ι − r ⋅ i T  ⋅ M ⋅ Ι − r ⋅ i T 

( B4 ) ( B5) T

( B6 )

where the r is the vector containing the relative intensities of a given carbon atom, I is a identity matrix of appropriate size, i is a vector of the size of r and contains only ones. Finally, β2 has been substituted for

α ⋅ σ2noise . Eq.B6 yields the covariance matrix of the A tot

relative intensities of a spectrum. Several of these small covariance matrices relating to the various measured carbon atoms can be combined to one large block-diagonal covariance matrix. Doing so we ignore the fact that the subspectra of several carbon atoms in the 2D [13C,1H] COSY spectrum will have varying signal-to-noise ratios (i.e. various values of β). Furthermore, we ignore possible overlapping of one or more peaks in a spectrum, which certainly does occur in experimental spectra. Still, the obtained covariance matrix will yield more realistic estimated errors than simply assuming constant absolute or relative errors of the relative intensities, which is usually done in 13C-labeling literature.

REFERENCES

Berthon, H.A., Bubb, W.A., Kuchel, P.W. (1993) 13C N.M.R. Isotopomer and computer-simulation studies of the non-oxidative pentose phosphate pathway of human erythrocytes. Biochem. J., 296: 379-387
Bradshaw, C.W., Fu, H., Shen, G.-J., Wong, C.-H. (1992) A Pseudomonas sp. alcohol dehydrogenase with broad substrate specificity and unusual stereospecificity for organic synthesis. J. Org. Chem., 57: 1526-1532
Christensen, B., Nielsen, J. (1999) Metabolic network analysis. In “Advances in biochemical engineering/biotechnology” (Th. Scheper, Ed.), Vol. 66, pp. 209-231, Springer-Verlag, Berlin
Christensen, B., Nielsen, J. (2000) Metabolic network analysis of Penicillium chrysogenum using 13C-labeled glucose. Biotechnol. Bioeng., 68, 6: 652-659
Clark, M.G., Williams, J.F., Blackmore, P.F. (1971) The transketolase exchange reaction in vitro. Biochem. J., 125: 381-384
Cordwell, S.J. (1999) Microbial genomes and “missing” enzymes: redefining biochemical pathways. Arch. Microbiol., 172: 269-279
Debnam, P.M., Shearer, G., Blackwood, L., Kohl, D.H. (1997) Evidence for channeling of intermediates in the oxidative pentose phosphate pathway by soybean and pea nodule extracts, yeast extracts, and purified yeast enzymes. Eur. J. Biochem., 246: 283-290
Flanigan, I., Collins, J.G., Arora, K.K., MacLeod, J.K., Williams, J.P. (1993) Exchange reactions catalyzed by group-transferring enzymes oppose the quantitation and the unravelling of the identity of the pentose phosphate pathway. J. Biochem., 213: 477-485
Follstad, B.D., Stephanopoulos, G. (1998) Effect of reversible reactions on isotope label redistribution analysis of the pentose phosphate pathway. Eur. J. Biochem., 252: 360-371

Van Gulik, W.M., Heijnen, J.J. (1995) A metabolic network stoichiometry analysis of microbial growth and product formation. Biotechnol. Bioeng., 48, 6: 680-698
Haggie, P.M., Brindle, K.M. (1999) Mitochondrial citrate synthase is immobilized in vivo. J. Biol. Chem., 274, 7: 3941-3945
Johnson, M.E. (1987) “Multivariate statistical simulation”, Wiley & Sons, New York
Kholodenko, B.N., Westerhoff, H.V., Cascante, M. (1996) Effect of channeling on the concentration of bulk-phase intermediates as cytosolic proteins become more concentrated. Biochem. J., 313: 921-926
Kvassman, J., Petterson, G., Ryde-Petterson, U. (1988) Mechanism of glyceraldehyde-3-phosphate transfer from aldolase to glyceraldehyde-3-phosphate dehydrogenase. Eur. J. Biochem., 172: 427-431
Ljungdahl, L., Wood, H.D., Racker, E., Couri, D. (1961) Formation of unequally labelled fructose 6-phosphate by an exchange reaction catalyzed by transaldolase. J. Biol. Chem., 236, 6: 1622-1624
Marx, A., De Graaf, A.A., Wiechert, W., Eggeling, L., Sahm, H. (1996) Determination of the fluxes in the central metabolism of Corynebacterium glutamicum by nuclear magnetic resonance spectroscopy combined with metabolic balancing. Biotechnol. Bioeng., 49, 2: 111-129
Marx, A., Striegel, K., De Graaf, A.A., Sahm, H., Eggeling, L. (1997) Response of the central metabolism of Corynebacterium glutamicum to different flux burdens. Biotechnol. Bioeng., 56, 2: 168-180
Marx, A., Eikmans, B.J., Sahm, H., De Graaf, A.A., Eggeling, L. (1999) Response of the central metabolism in Corynebacterium glutamicum to the use of an NADH-dependent glutamate dehydrogenase. Metabol. Eng., 1, 1: 35-48
Mathews, C. (1993) The cell - bag of enzymes or network of channels? J. Bacteriol., 175, 20: 6377-6381
McIntyre, L.M., Thorburn, D.R., Bubb, W.A., Kuchel, P.W. (1989) Comparison of computer simulations of the f-type and l-type non-oxidative hexose monophosphate shunts with 31P-NMR experimental data from human erythrocytes. Eur. J. Biochem., 180: 399-420
Ovádi, J., Keleti, T. (1978a) Kinetic evidence for interaction between aldolase and D-glyceraldehyde-3-phosphate dehydrogenase. Eur. J. Biochem., 85: 157-161
Ovádi, J., Salerno, C., Keleti, T., Fasella, P. (1978b) Physico-chemical evidence for the interaction between aldolase and glyceraldehyde-3-phosphate dehydrogenase. Eur. J. Biochem., 90: 499-503
Ovádi, J., Mohamed Osman, I.R., Batke, J. (1983) Interaction of the dissociable glycerol-3-phosphate dehydrogenase and fructose-1,6-bisphosphate aldolase. Eur. J. Biochem., 133: 433-437
Ovádi, J. (1988) Old pathway-new concept: control of glycolysis by metabolite-modulated dynamic enzyme associations. TIBS, 13: 486-490
Ovádi, J. (1990) Channeling and channel efficiency: theory and analytical implications. In: “Control of metabolic processes” (A. Cornish-Bowden, M. L. Cárdenas, Eds.), pp. 271-279, Plenum Press, New York
Portais, J.-C., Schuster, R., Merle, M., Canioni, P. (1993) Metabolic flux determination in C6 glioma cells using carbon-13 distribution upon [1-13C] glucose incubation. Eur. J. Biochem., 217: 457-468
Sauer, U., Hatzimanikatis, V., Bailey, J.E., Hochuli, M., Szyperski, T., Wüthrich, K. (1997) Metabolic fluxes in riboflavin-producing Bacillus subtilis. Nature Biotech., 15: 448-452
Schmidt, K., Carlsen, M., Nielsen, J., Villadsen, J. (1997) Modeling isotopomer distributions in metabolic networks using isotopomer mapping matrices. Biotechnol. Bioeng., 55, 6: 831-840

Schmidt, K., Marx, A., De Graaf, A.A., Wiechert, W., Sahm, H., Nielsen, J., Villadsen, J. (1998) 13C Tracer experiments and metabolite balancing for metabolic flux analysis: comparing two approaches. Biotechnol. Bioeng., 58, 2&3: 254-257
Schmidt, K., Nørregaard, L.C., Pedersen, B., Meissner, A., Duus, J.Ø., Nielsen, J.Ø., Villadsen, J. (1999) Quantification of intracellular metabolic fluxes from fractional enrichment and 13C-13C coupling constraints on the isotopomer distribution in labeled biomass components. Metabol. Eng., 1, 2: 166-179
Schörken, U., Sprenger, G.A. (1998) Thiamin-dependent enzymes as catalysts in chemoenzymatic syntheses. Biochim. Biophys. Acta, 1385, 2: 229-243
Srivastava, D.K., Bernhard, S.A. (1986) Metabolite transfer via enzyme-enzyme complexes. Science, 234: 1081-1086
Szyperski, T. (1995) Biosynthetically directed fractional 13C-labelling of proteinogenic amino acids. Eur. J. Biochem., 232: 433-448
Szyperski, T. (1998) 13C-NMR, MS and metabolic flux balancing in biotechnology research. Quart. Rev. Biophys., 31, 1: 41-106
Toone, E.J., Simon, E.S., Bednarski, M.D., Whitesides, G.M. (1989) Enzyme-catalyzed synthesis of carbohydrates. Tetrahedron, 45, 17: 5365-5422
Van Winden, W.A., Verheijen, P.J.T., Heijnen, J.J. (2001) Possible pitfalls of flux calculations based on 13C-labeling. Metabol. Eng., 3, 2: 151-162
Wiechert, W., De Graaf, A.A. (1997) Bidirectional reaction steps in metabolic networks: I. Modeling and simulation of carbon isotope labeling experiments. Biotechnol. Bioeng., 55, 1: 101-117
Williams, J.F., Arora, K.K., Longenecker, J.P. (1987) The pentose pathway: a random harvest. Int. J. Biochem., 19, 9: 749-817
Wittmann, C., Heinzle, E. (1999) Mass spectrometry for metabolic flux analysis. Biotechnol. Bioeng., 62, 6: 739-750
Wood, T., Muzariri, C.C., Malaba, L. (1985) Complex formation between transketolase, transaldolase and glyceraldehyde phosphate dehydrogenase. Int. J. Biochem., 17, 10: 1109-1115


Chapter 3

A Priori Analysis of Metabolic Flux Identifiability from 13C-Labeling Data

This chapter was published as Van Winden et al. (2001)

ABSTRACT

The 13C-labeling technique was introduced in the field of metabolic engineering as a tool for determining fluxes that could not be found using the ‘classical’ method of flux balancing. An a priori flux identifiability analysis is required in order to determine whether a 13C-labeling experiment allows the identification of all the fluxes. In this chapter we propose a method for identifiability analysis that is based on the recently introduced ‘cumomer’ concept. The method improves upon previous identifiability methods in two ways. First, it provides a way to systematically reduce the metabolic network on the basis of the structural elements that constitute a network. Second, it uses the implicit function theorem to determine analytically whether the fluxes in the reduced network are theoretically identifiable for various types of real measurement data. Application of the method to a realistic flux identification problem shows both the potential of the method, which yields new and interesting conclusions regarding the identifiability, and its practical limitations, which arise because the size and complexity of the symbolic calculations grow fast with the dimension of the studied system.


3.1 INTRODUCTION

It has long been recognized that the analysis of stationary metabolic fluxes is often impeded by the unobservability of fluxes due to parallel pathways, metabolic cycles or bidirectional reactions. Over the last few years 13C-labeling experiments have become a well-established tool in the analysis of these otherwise unobservable fluxes. Various measurement methods are available for determining the labeling distribution of metabolic intermediates (for an overview see Möllney et al. (1999) or Szyperski (1998)). Labeling balances are bilinear or even trilinear expressions. This characteristic prohibits direct calculation of the unknown intracellular fluxes from the 13C-labeling data and the measured extracellular fluxes. Therefore, the unknown fluxes are mostly determined using an iterative numerical procedure, in which a mathematical model simulates the labeling data as a function of the fluxes and the fluxes are varied until the simulated data fit the measured data. A disadvantage of such a flux-fitting algorithm is that a flux estimate cannot be proven to be unique (Schmidt et al., 1998; Wiechert, 1996). The possibility to uniquely identify the fluxes depends on the topology of the studied metabolic network, the measurable labeled compounds, the 13C-measurement method that is used and the substrate labeling that is applied. Although repeating the iterative flux estimation procedure for various starting values of the fluxes will increase the confidence in the uniqueness of a flux estimate, an analytical a priori (i.e. independent of the values of the actual fluxes) flux identifiability analysis is preferable.

Recently, Wiechert et al. (1999) introduced the new ‘cumomer’ concept for describing labeling distributions in metabolites. Cumomers represent sets of molecules that are 13C-labeled at specific atomic positions and of which the remaining atomic positions may be either 13C-labeled or not. Consequently, cumomers are sums of isotopomers. In contrast to isotopomers, cumomers can be expressed as explicit functions of the metabolic fluxes and substrate labeling in a metabolic network. This opens the way for analytically calculating isotopomer distributions for a given set of fluxes. Moreover, the concept of cumomers can be used to tackle the inverse problem: the structural analysis of flux identifiability from known cumomer fractions.

In their article Wiechert et al. (1999) illustrate the use of cumomer balances for identifiability purposes by means of two example networks. The authors mention that their identifiability analysis of the examples is rather intuitive and refer to computer algebraic algorithms developed by Wiechert (1995) for a more systematic method. In that article on identifiability and redundancy analysis, the author describes how to algebraically solve the flux identifiability problem for positional 13C-enrichment data using so-called Gröbner bases. However, he concludes that the computation of Gröbner bases is strictly limited to small problems, because the computational effort required to find an answer increases dramatically with the complexity of the problem. In another paper, Wiechert (1996) introduces a method for preliminary complexity reduction that allows identifiability analysis for positional 13C-enrichment data for medium-sized metabolic networks.

In the paper that introduces the cumomer concept, Wiechert et al. (1999) do not discuss the limitations of the published algebraic solution methods when they are applied to identifiability problems concerning cumomer balances instead of the simpler positional 13C-enrichment balances. In addition, the presented identifiability analysis of cumomer balances is explicitly based on the assumption that all cumomer fractions are potentially measurable. In reality, however, cumomer fractions are not the entities that are measured. Labeling experiments yield either positional 13C-enrichments, relative 13C-multiplet areas (e.g. Szyperski, 1995) or molecular mass fractions. Of these types of measurement data, only positional 13C-enrichments directly correspond with cumomer fractions. The other types of data are related to cumomers by means of (non-)linear transformations. Analytical a priori identifiability analysis based on relative 13C-multiplet areas or molecular mass fractions has not been performed before and is presented in this chapter.

3.2 A PRIORI IDENTIFIABILITY ANALYSIS I: STRUCTURAL ANALYSIS AND REDUCTION OF METABOLIC NETWORKS

In order to calculate fluxes from 13C-labeling data, both the stoichiometry and the carbon atom transfer from the reactants to the products of the reactions in the studied metabolic network need to be precisely known. This biochemical knowledge may be translated into a cumomer network and the corresponding set of cumomer balances in order to get a mathematical framework in which all possible labeling states of the intracellular metabolites are related to the fluxes in the network (Wiechert et al., 1999). In the article in which they introduce the concept of cumomers, Wiechert et al. (1999) state that cumomer networks can be reduced in order to reduce the complexity of the identification problem. This statement is not only valid for cumomer networks. Metabolic networks have structural properties that allow network reduction and a preliminary and partial identifiability analysis prior to setting up any kind of labeling balance. The generic rules of network reduction make the presently proposed method for identifiability analysis applicable to any kind of labeling experiment. The rationale behind network reduction will be illustrated using the example metabolic network of Fig.2A that was introduced by Wiechert et al. (1999).

Omitting single influx metabolic pools

One of the reductions of metabolic networks is based on the number of fluxes that enter and leave metabolite pools. Three basic structures exist: linear, divergent and convergent nodes (see Fig.1). In their article Wiechert et al. (1999) argue that pools with only one influx only yield so-called ‘labeling redundancies’ between cumomer fractions and do not give information about fluxes. It can be shown in general (i.e.
for any type of balance, be it a positional 13C-enrichment, isotopomer or cumomer balance) that only the labeling distributions of metabolites at convergent nodes yield information about relative flux sizes. This can be demonstrated for the metabolic network shown in Fig.1 by analyzing the labeling balances of linear, divergent and convergent nodes. The flux balance and general labeling balance of the linear node around metabolite B in Fig.1 are:

\[
\begin{aligned}
v_1 &= v_2 \\
v_1 \cdot \mathrm{MM}_{AxB,1} \cdot \mathbf{a} &= v_2 \cdot \mathbf{b}
\end{aligned}
\tag{1}
\]

In Eq.1, MM_SxP,i represents a mapping matrix that describes the positions in the product P onto which the carbon atoms of substrate S are mapped by the ith reaction. The mapping matrix may be an atom mapping matrix (Zupke and Stephanopoulos, 1994), an isotopomer mapping matrix (Schmidt et al., 1997) or a transition matrix (Wiechert et al., 1997, 1999). The vectors a and b contain the labeling information of metabolites A and B in the form of either positional enrichments, isotopomer fractions or cumomer fractions. Combining the flux and labeling balances in Eq.1 yields:

\[
\mathrm{MM}_{AxB,1} \cdot \mathbf{a} = \mathbf{b}
\tag{2}
\]

Eq.2 is a labeling redundancy which enables the calculation of the labeling distribution of B from that of A and vice versa. The equation does not give any information about the fluxes. It can be shown that the same type of labeling redundancy follows from the divergent node (C in Fig.1):

\[
\left.
\begin{aligned}
v_2 &= v_3 + v_5 + v_8 \\
v_2 \cdot \mathrm{MM}_{BxC,2} \cdot \mathbf{b} &= (v_3 + v_5 + v_8) \cdot \mathbf{c}
\end{aligned}
\right\}
\;\Rightarrow\;
\mathrm{MM}_{BxC,2} \cdot \mathbf{b} = \mathbf{c}
\tag{3}
\]

[Figure 1: network diagram with metabolites A (6), B (6), C (6), D (6), E (6), F (1) and G (7), fluxes v1 to v9, and nodes annotated as linear, divergent or convergent.]

FIGURE 1: A metabolic network featuring three different structural elements. The number of carbon atoms of each metabolite is given between brackets.

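A different set of relations results from combining the flux balance and the labeling balance of the convergent node D:

\[
\left.
\begin{aligned}
v_3 + v_6 &= v_4 \\
v_3 \cdot (\mathrm{MM}_{CxD,3} \cdot \mathbf{c}) + v_6 \cdot (\mathrm{MM}_{ExD,6} \cdot \mathbf{e}) &= v_4 \cdot \mathbf{d}
\end{aligned}
\right\}
\;\Rightarrow\;
v_3 \cdot (\mathrm{MM}_{CxD,3} \cdot \mathbf{c} - \mathbf{d}) + v_6 \cdot (\mathrm{MM}_{ExD,6} \cdot \mathbf{e} - \mathbf{d}) = \mathbf{0}
\tag{4}
\]

Rewriting Eq.4 in its full matrix notation and transforming it to its reduced row echelon form (defined in appendix A) leads to one single equation in which the ratio of the fluxes entering the pool appears. The remaining equations are labeling redundancies:

\[
\begin{gathered}
\begin{pmatrix}
1 & \dfrac{(\mathrm{MM}_{ExD,6} \cdot \mathbf{e} - \mathbf{d})_1}{(\mathrm{MM}_{CxD,3} \cdot \mathbf{c} - \mathbf{d})_1} \\[2ex]
0 & (\mathrm{MM}_{ExD,6} \cdot \mathbf{e} - \mathbf{d})_2 - \dfrac{(\mathrm{MM}_{CxD,3} \cdot \mathbf{c} - \mathbf{d})_2}{(\mathrm{MM}_{CxD,3} \cdot \mathbf{c} - \mathbf{d})_1} \cdot (\mathrm{MM}_{ExD,6} \cdot \mathbf{e} - \mathbf{d})_1 \\
\vdots & \vdots \\
0 & (\mathrm{MM}_{ExD,6} \cdot \mathbf{e} - \mathbf{d})_i - \dfrac{(\mathrm{MM}_{CxD,3} \cdot \mathbf{c} - \mathbf{d})_i}{(\mathrm{MM}_{CxD,3} \cdot \mathbf{c} - \mathbf{d})_1} \cdot (\mathrm{MM}_{ExD,6} \cdot \mathbf{e} - \mathbf{d})_1
\end{pmatrix}
\cdot
\begin{pmatrix} v_3 \\ v_6 \end{pmatrix}
= \mathbf{0}
\\[2ex]
\Downarrow
\\[1ex]
\frac{v_3}{v_6} = - \frac{(\mathrm{MM}_{ExD,6} \cdot \mathbf{e} - \mathbf{d})_1}{(\mathrm{MM}_{CxD,3} \cdot \mathbf{c} - \mathbf{d})_1}
\\[2ex]
\begin{aligned}
(\mathrm{MM}_{ExD,6} \cdot \mathbf{e} - \mathbf{d})_2 - \frac{(\mathrm{MM}_{CxD,3} \cdot \mathbf{c} - \mathbf{d})_2}{(\mathrm{MM}_{CxD,3} \cdot \mathbf{c} - \mathbf{d})_1} \cdot (\mathrm{MM}_{ExD,6} \cdot \mathbf{e} - \mathbf{d})_1 &= 0 \\
&\;\,\vdots \\
(\mathrm{MM}_{ExD,6} \cdot \mathbf{e} - \mathbf{d})_i - \frac{(\mathrm{MM}_{CxD,3} \cdot \mathbf{c} - \mathbf{d})_i}{(\mathrm{MM}_{CxD,3} \cdot \mathbf{c} - \mathbf{d})_1} \cdot (\mathrm{MM}_{ExD,6} \cdot \mathbf{e} - \mathbf{d})_1 &= 0
\end{aligned}
\end{gathered}
\tag{5}
\]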

Assuming that the labeling distributions of the substrates C and E of reactions 3 and 6 are known, Eq.5 shows that only the ratio v3/v6 can be determined from the labeling balance and that one single element of vector d suffices to do so. In Eq.5 the first element of d was used to solve the flux ratio. In case (MM_CxD,3·c − d)_1 equals zero, Eq.5 does not lead to a solution. Note, however, that one is free to choose any of the rows of the matrix in the top line of Eq.5 to solve the flux ratio. As an example, imagine that 100% of metabolite C in Fig.1 is labeled at the first atomic position and that 100% of metabolite E is labeled at both the first and second position. In case the carbon atoms of C and E are mapped to identical atomic positions in metabolite D (i.e. matrices MM_CxD,3 and MM_ExD,6 are identity matrices), only the measurement of the positional enrichment of the second atom of D will yield a value for (MM_CxD,3·c − d)_i that is unequal to zero and will thus solve the flux ratio.

The matrix in the top line of Eq.5 will have more columns for metabolite pools having more than two input fluxes. Consequently, calculation of the flux ratios and labeling redundancies will yield more complicated equations. Still, the fact remains that m-1 measured labeling data suffice to find the m-1 ratios of m influxes, provided that the rank of the matrix in Eq.5 is m-1 for the specific measurements, labeling distribution and mapping matrices.

The linear node formed by metabolite G in Fig.1 may appear to be a convergent node as well. However, the bimolecularity of the influx v8 should not be confused with the convergence of two fluxes. A bimolecular flux counts as one single flux, so pool G has a labeling balance similar to Eq.2.

From the above it can be concluded that linear and divergent nodes can be removed from the metabolic network in order to reduce the isotopomer model that will be set up. It should be stressed that this does not imply that labeling information of the metabolites at these nodes is useless. On the contrary, by applying labeling redundancy equations such as Eqs.2 and 3, labeling information of metabolites at linear or divergent nodes may be used to calculate the labeling of metabolites at ‘upstream’ convergent nodes. This fact forms the basis of the extensively used method of deriving the labeling patterns of non-measurable intermediates of the primary metabolism from the labeling patterns of biomass components such as amino acids (e.g. Marx et al., 1996).
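The flux-ratio expression of Eq.5 can be illustrated with a small numerical sketch. The code below is not part of the original analysis: the function name `flux_ratio`, the two-position labeling vectors and the flux values are hypothetical, and the mapping matrices are assumed to be identity matrices as in the example above. It simulates the labeling of D for known fluxes and then recovers v3/v6 from the first position for which the denominator of Eq.5 is nonzero.

```python
from fractions import Fraction

def flux_ratio(c_mapped, e_mapped, d):
    """Return v3/v6 according to Eq.5.

    c_mapped, e_mapped: labeling of C and E after applying the mapping matrices.
    d: labeling of D (same length). The first position with a nonzero
    denominator (MM*c - d)_i is used.
    """
    for c_i, e_i, d_i in zip(c_mapped, e_mapped, d):
        denominator = c_i - d_i
        if denominator != 0:
            return -(e_i - d_i) / denominator
    raise ValueError("none of the positions carries flux information")

# Hypothetical fluxes, used only to simulate the labeling of D.
v3, v6 = Fraction(3), Fraction(1)
c = [Fraction(1), Fraction(0)]  # C: fully labeled at position 1 only
e = [Fraction(1), Fraction(1)]  # E: fully labeled at positions 1 and 2
# Labeling balance of the convergent node (Eq.4) with identity mapping matrices.
d = [(v3 * c_i + v6 * e_i) / (v3 + v6) for c_i, e_i in zip(c, e)]

ratio = flux_ratio(c, e, d)
print(d, ratio)
```

In this sketch position 1 is skipped because both substrates are fully labeled there, so only position 2 reveals the ratio, in line with the argument in the text.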


Lumping equilibrium pools

A second possible reduction of metabolic networks is the lumping of metabolite pools whose labeling may be considered instantaneously equilibrated by large exchange fluxes. This reduction is often applied to the hexose monophosphate pools in the upper glycolysis, the triose monophosphate pools in the lower glycolysis and the pentose monophosphate pools in the pentose phosphate pathway (e.g. Follstad and Stephanopoulos, 1998; Marx et al., 1996, 1997; Schmidt et al., 1997). Because the sizes of the fluxes are not known in an a priori analysis, the lumping of metabolic pools whose labeling is considered to be in equilibrium can only be based on available literature data.

Example

Fig.2A shows an example network modeled on the citric acid cycle and anaplerotic reactions as introduced by Wiechert et al. (1999).

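[Figure 2: network diagrams. Panel A: metabolites A, B, C, D, E, F, G, H and K with fluxes v1, v2F, v2B, v3F, v3B, v4, v5, v6 and v7. Panel B: metabolites A, B, E, F, G, H and K with fluxes w1, w2F, w2B, w3, w4 and w5.]

FIGURE 2: An example of a metabolic network. (A) The original network. (B) The network in which single influx pools C and D have been omitted.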


In the figure, bidirectional fluxes are shown as separate forward and backward fluxes. Since no information is available about net and exchange fluxes, lumping of equilibrium pools is not possible. Omission of single influx pools, however, is possible because pools C and D each have only one influx. Omission of pools C and D leads to the lumping of fluxes v3F, v6 and v7 into one flux w3. Omission of C should also lead to the lumping of the forward and backward reactions v3F and v3B. The result would be a lumped reaction cycle leading from B and E back to B and E. However, the labeling patterns of the reaction products B and E of this cycle are necessarily identical to those of the reaction substrates B and E. Therefore, the lumped reaction cycle has no effect on the labeling and is left out. The reduced metabolic network is shown in Fig.2B. As a first conclusion regarding the identifiability of fluxes one can thus state that the separate fluxes v3F and v3B in Fig.2A cannot be identified using any sort of labeling information. This identifiability problem can only be solved if a measurable influx into pool C is added to the system by supplying C from the outside, which makes C a convergent metabolic node too.

3.3 A PRIORI IDENTIFIABILITY ANALYSIS II: CUMOMER BALANCES

The reduced network is used as the basis for setting up cumomer balances. Wiechert et al. (1999) showed that the balances of 0-cumomers correspond to flux balances, that the balances of 1-cumomers correspond to positional 13C-enrichment balances and that the total of all cumomer balances can be transformed to isotopomer balances. In the following sections, a stepwise method for a priori analysis of the identifiability of fluxes is proposed, based on cumomer balances of increasing weight.

0-Cumomer (flux) balances

Because 0-cumomer fractions equal one by definition, 0-cumomer balances are equivalent to flux balances.
The first step in a priori identifiability analysis is to calculate the degrees of freedom in a (reduced) network that cannot be fixed using the flux balances. In order to do so the flux balances are combined with the directly measured extracellular fluxes to form a set of linear equations:

\[
\begin{pmatrix} S \\ R \end{pmatrix} \cdot \mathbf{w} = \begin{pmatrix} \mathbf{0} \\ \mathbf{w}_m \end{pmatrix}
\qquad \forall i:\; w_i > 0
\tag{6}
\]

In Eq.6 S is an N×W stoichiometry matrix, where N is the number of intracellular metabolites and W the total number of fluxes (w) in the reduced metabolic network. Vector w_m contains the M measured net conversion rates. Note that each reversible flux is considered as a separate forward and backward flux. Consequently, all fluxes w_i must have positive values. R is an M×W measurement matrix. Each row of this matrix contains a one at the position that corresponds with one measured net conversion rate in vector w_m. Savinell et al. (1992) proposed a method to choose the set of rates (w_m) that should be measured in order to minimize propagation of the measurement errors in the intracellular fluxes (w). The intracellular fluxes can be calculated by inverting Eq.6. Often, the set of equations is underdetermined, in which case only some of the fluxes can be calculated. The remaining

fluxes can be described as a linear combination of the so-called minimum-norm solution of the set of linear equations and the null space:

\[
\mathbf{w} = \begin{pmatrix} S \\ R \end{pmatrix}^{\#} \cdot \begin{pmatrix} \mathbf{0} \\ \mathbf{w}_m \end{pmatrix} + \mathrm{null}\begin{pmatrix} S \\ R \end{pmatrix} \cdot \boldsymbol{\beta}
= \mathbf{w}_m^{*} + \mathrm{null}\begin{pmatrix} S \\ R \end{pmatrix} \cdot \boldsymbol{\beta}
\qquad \forall i:\; w_i \geq 0
\tag{7}
\]

In Eq.7, ‘#’ denotes the pseudoinverse (defined in appendix A). The vector w_m* contains the part of the solution of w that is fixed by extracellular measurements. For the fluxes that were directly measured, w_m*,i equals w_m,i. The elements w_m*,i corresponding to the non-measured fluxes are linear combinations of the elements of w_m. Vector β contains the linear coefficients of the columns that span the null space. The elements of β represent the degrees of freedom that remain after combining the flux balances and extracellular measurements. These degrees of freedom may be fixed using labeling information.

It can be analyzed a priori whether the maximal amount of labeling information that is obtained from the system theoretically suffices to solve all the fluxes. In the structural analysis of the labeling balance of the convergent metabolite node D in Fig.1, it was already discussed that the labeling balance of a metabolite pool with m influxes maximally adds m-1 independent flux constraints to the set of flux balances. The maximal number of independent flux constraints is not only restricted by the number of influxes, but also by the total number of independent labeling measurements. Metabolite D in Fig.1 is a six-carbon compound, so its positional 13C-enrichment balances maximally yield 6 flux ratios. Its isotopomer or cumomer balances lead to a maximum of 63 constraints: 2^6 fractions minus one dependent fraction. Eq.8 gives the overall maximum of the number of independent flux constraints that can be obtained from the labeling balances of one convergent metabolite pool:

\[
\begin{aligned}
&\min\{m-1,\; n\} && \text{for positional } ^{13}\mathrm{C}\text{-enrichment balances} \\
&\min\{m-1,\; 2^{n}-1\} && \text{for isotopomer/cumomer balances}
\end{aligned}
\tag{8}
\]
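The bound of Eq.8 is simple enough to write down as a helper function. The sketch below is only an illustration: the function name `max_flux_constraints` and the `data` keyword are assumptions made here, with m the number of fluxes entering the pool and n the number of carbon atoms of the balanced metabolite.

```python
def max_flux_constraints(m, n, data="positional"):
    """Upper bound of Eq.8 on the independent flux constraints from one pool.

    m: number of fluxes entering the pool
    n: number of carbon atoms of the balanced metabolite
    data: "positional" for positional 13C-enrichments,
          "isotopomer" for isotopomer or cumomer balances
    """
    if data == "positional":
        return min(m - 1, n)
    if data == "isotopomer":
        return min(m - 1, 2 ** n - 1)
    raise ValueError("data must be 'positional' or 'isotopomer'")

# A six-carbon pool such as D in Fig.1 with two influxes yields at most one ratio.
print(max_flux_constraints(2, 6, "positional"))
# With very many influxes the number of labeling measurements becomes limiting.
print(max_flux_constraints(100, 6, "positional"), max_flux_constraints(100, 6, "isotopomer"))
```

Summing this bound over all convergent pools and comparing the total with the number of free parameters β gives the necessary (but not sufficient) identifiability condition discussed in this section.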

In Eq.8 m represents the number of fluxes entering the pool and n is the number of carbon atoms of the balanced metabolite. From Eq.8 one can thus conclude that the complete set of positional 13C-balances of a two-carbon compound yields at most two flux ratios and therefore cannot yield all the flux ratios if more than three fluxes enter the pool, whereas its isotopomer balances yield at most three flux ratios and cannot resolve more than four influxes. This observation is especially relevant in the a priori identifiability analysis of sections of the network where small metabolites and a densely connected flux network (e.g. due to bidirectional reactions) occur.

Example

Assuming that only influx w1 is measured, the reduced metabolic network of Fig.2B yields the following combined set of flux balances and extracellular measurements:

\[
\left(\begin{array}{rrrrrr}
1 & -1 & 1 & -1 & 0 & -1 \\
0 & 1 & -1 & 0 & -1 & 0 \\
\hline
1 & 0 & 0 & 0 & 0 & 0
\end{array}\right)
\cdot
\begin{pmatrix} w_1 \\ w_{2F} \\ w_{2B} \\ w_3 \\ w_4 \\ w_5 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ w_{m,1} \end{pmatrix}
\tag{9}
\]

Eq.9 can be rearranged to the following general solution of the fluxes:

\[
\begin{pmatrix} w_1 \\ w_{2F} \\ w_{2B} \\ w_3 \\ w_4 \\ w_5 \end{pmatrix}
=
\left(\begin{array}{rrrrrr}
1 & -1 & 1 & -1 & 0 & -1 \\
0 & 1 & -1 & 0 & -1 & 0 \\
\hline
1 & 0 & 0 & 0 & 0 & 0
\end{array}\right)^{\#}
\cdot
\begin{pmatrix} 0 \\ 0 \\ w_{m,1} \end{pmatrix}
+ \mathrm{null}\left(\begin{array}{rrrrrr}
1 & -1 & 1 & -1 & 0 & -1 \\
0 & 1 & -1 & 0 & -1 & 0 \\
\hline
1 & 0 & 0 & 0 & 0 & 0
\end{array}\right) \cdot \boldsymbol{\beta}
=
\begin{pmatrix} 1 \\ 1/8 \\ -1/8 \\ 3/8 \\ 2/8 \\ 3/8 \end{pmatrix} \cdot w_{m,1}
+
\begin{pmatrix}
0 & 0 & 0 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & -1 & 0 \\
-1 & 1 & -1
\end{pmatrix}
\cdot
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
\tag{10}
\]

From Eq.10 it is clear that the network has three degrees of freedom. In order to check whether the pools of the metabolic intermediates could theoretically yield enough information to fix the free parameters β1, β2 and β3, Eq.8 is used. Fig.3 shows the metabolic network of Fig.2B once more in some more detail.

[Figure 3: network diagram with metabolites A, B, E, F, G, H and K, fluxes w1, w2F, w2B, w3, w4 and w5, and the carbon atoms of each reaction product annotated with their origin (a1, a2, b1, b2, e1, e2).]

FIGURE 3: The example network of Fig.2B with carbon-transitions added. The letters and numbers in the carbon atoms (black spheres) of the reaction products represent the carbon positions in the substrates of the reaction concerned. E.g. metabolite E is formed either from carbon atoms b1 and b2 by reaction w2F or from carbon atoms b2 and e1 by reaction w3.

The figure shows that metabolites B and E each consist of two carbon atoms and that pools B and E both have two influxes. Assuming that all isotopomers of B and E and additionally all input isotopomers can be measured, Eq.8 with m=2 and n=2 predicts that each pool maximally yields one flux constraint. In other words: the combined labeling information of pools B and E maximally yields two flux constraints and this is not enough to fix the three degrees of freedom. So from this analysis one can already conclude that more than one extracellular flux should be measured. For that reason it is assumed that the amount of F (see Fig.3) that is produced (i.e. flux w3) can be measured as well. The general flux solution (cf. Eq.7) then becomes:

\[
\begin{pmatrix} w_1 \\ w_{2F} \\ w_{2B} \\ w_3 \\ w_4 \\ w_5 \end{pmatrix}
=
\begin{pmatrix}
1 & 0 \\
1/5 & -1/5 \\
-1/5 & 1/5 \\
0 & 1 \\
2/5 & -2/5 \\
3/5 & -3/5
\end{pmatrix}
\cdot
\begin{pmatrix} w_{m,1} \\ w_{m,3} \end{pmatrix}
+
\begin{pmatrix}
0 & 0 \\
1 & 0 \\
0 & 1 \\
0 & 0 \\
1 & -1 \\
-1 & 1
\end{pmatrix}
\cdot
\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}
\;\Leftrightarrow\;
\mathbf{w} = \mathbf{w}_m^{*} +
\begin{pmatrix}
0 & 0 \\
1 & 0 \\
0 & 1 \\
0 & 0 \\
1 & -1 \\
-1 & 1
\end{pmatrix}
\cdot
\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}
\tag{11}
\]

The parameterized part of the general flux solution in Eq.11 contains two degrees of freedom. The number of remaining degrees of freedom now equals the theoretical maximum number of additional constraints. This is a necessary, but not a sufficient condition to make the system identifiable.
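The general solution of Eq.11 can be checked with exact arithmetic. The following sketch is an illustration added here, not part of the original derivation: the numerical values chosen for the measured fluxes and for β are hypothetical, and the flux order (w1, w2F, w2B, w3, w4, w5) is assumed throughout. It verifies that any choice of β satisfies the flux balances and reproduces the measurements, and that the minimum-norm part is orthogonal to the null space.

```python
from fractions import Fraction as F

# Flux order assumed throughout: w1, w2F, w2B, w3, w4, w5.
S = [[1, -1, 1, -1, 0, -1],   # flux balance of pool B
     [0, 1, -1, 0, -1, 0]]    # flux balance of pool E
R = [[1, 0, 0, 0, 0, 0],      # w1 is measured
     [0, 0, 0, 1, 0, 0]]      # w3 is measured

# Minimum-norm part of Eq.11: one column per measured flux (w_m,1 and w_m,3).
min_norm_cols = [[F(1), F(1, 5), F(-1, 5), F(0), F(2, 5), F(3, 5)],
                 [F(0), F(-1, 5), F(1, 5), F(1), F(-2, 5), F(-3, 5)]]
# Null-space part of Eq.11: one column per free parameter (beta1 and beta2).
null_cols = [[0, 1, 0, 0, 1, -1],
             [0, 0, 1, 0, -1, 1]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def general_solution(w_m, beta):
    """Evaluate w = w_m* + null-space part for given measurements and beta."""
    return [sum(x * col[i] for x, col in zip(w_m, min_norm_cols))
            + sum(b * col[i] for b, col in zip(beta, null_cols))
            for i in range(6)]

# Hypothetical measurements (w_m,1 = 10, w_m,3 = 5) and free parameters.
w = general_solution([F(10), F(5)], [F(2), F(3)])
residuals = [dot(row, w) for row in S + R]  # expected: 0, 0, w_m,1, w_m,3
orthogonal = all(dot(p, q) == 0 for p in min_norm_cols for q in null_cols)
print(w, residuals, orthogonal)
```

The same check can be applied to Eq.10 by using a single measurement row and the three null-space columns given there.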

1-Cumomer (positional 13C enrichment) balances

A 1-cumomer of a given molecule represents the fraction of all molecules of that metabolite that carry a 13C-atom at one specific molecular position, while the remaining carbon atoms may be either labeled or not. A 1-cumomer fraction of a specific molecular position is completely equivalent to the positional 13C-enrichment. Therefore, Wiechert et al. (1999) introduced a positional notation for the cumomers that corresponds to the common notation of positional 13C-enrichments: m_i represents the fraction of molecules M that are labeled at the ith molecular position. (For cumomers of higher weight this notation is consistently extended: m_ij represents the 2-cumomer fraction of molecules M that are labeled at the ith and jth positions, etc.) Balances of cumomers of various weights are always solved in a cascade-like fashion starting with the lowest weight. Therefore the 1-cumomer balances are the first to be set up after the flux (0-cumomer) balances. In case the labeling measurements used for flux analysis are positional 13C-enrichments, the 1-cumomer balances are in fact the only balances that are needed.

The balances should be written such that the terms containing unknown 1-cumomer fractions and the terms containing known substrate 1-cumomer fractions are on opposite sides of the equation sign. This allows rearrangement of the balances to a format in which the unknown fractions are non-linear functions of the known fractions and of the fluxes (Wiechert and De Graaf, 1997). Substituting the general flux solution (cf. Eq.7) for the fluxes reduces the number of variables in the expressions to the p free parameters β1 to βp. Using a symbolic mathematical software package (e.g. Maple) one can analyze a priori whether a subset of p of the non-linear expressions can be solved for β. An obvious prerequisite for the existence of a solution is that the equations in the subset feature all p parameters. If one unique symbolic solution is found, the measured positional 13C-enrichments that correspond with the chosen subset of equations allow identification of all fluxes in the metabolic network.
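The relation between isotopomers and the positional cumomer notation (m_i, m_ij) can be made concrete with a short sketch. The code below is only an illustration: the function name `cumomer_fractions`, the string encoding of labeling patterns and the isotopomer distribution are all hypothetical. A pattern such as '10' denotes a two-carbon molecule labeled at position 1 only; in a cumomer key, '1' means "labeled" and 'x' means "labeled or not".

```python
from fractions import Fraction
from itertools import product

def cumomer_fractions(isotopomers):
    """Sum isotopomer fractions into cumomer fractions.

    isotopomers: dict mapping patterns like '10' (1 = 13C, 0 = 12C) to fractions.
    Returns a dict keyed by cumomer patterns like '1x'.
    """
    n = len(next(iter(isotopomers)))
    cumomers = {}
    for mask in product("x1", repeat=n):
        key = "".join(mask)
        # An isotopomer contributes if it is labeled wherever the cumomer demands it.
        cumomers[key] = sum(
            fraction for pattern, fraction in isotopomers.items()
            if all(m == "x" or p == "1" for m, p in zip(mask, pattern))
        )
    return cumomers

# Hypothetical isotopomer distribution of a two-carbon metabolite M.
iso = {"00": Fraction(1, 2), "10": Fraction(1, 4),
       "01": Fraction(1, 8), "11": Fraction(1, 8)}
cum = cumomer_fractions(iso)
print(cum)
```

In this encoding 'xx' is the 0-cumomer (always one), '1x' and 'x1' are the 1-cumomers m_1 and m_2, i.e. the positional 13C-enrichments, and '11' is the 2-cumomer m_12.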


Example

The 1-cumomer balances for the network of Fig.3 are given below. The terms with unknown 1-cumomer fractions (b_i and e_i) are separated from the terms with the known substrate 1-cumomer fractions (a_i).

\[
\begin{aligned}
b_1:&\quad (w_1 + w_{2B}) \cdot b_1 - w_{2B} \cdot e_1 = w_1 \cdot a_1 \\
b_2:&\quad (w_1 + w_{2B}) \cdot b_2 - w_{2B} \cdot e_2 = w_1 \cdot a_2 \\
e_1:&\quad w_{2F} \cdot b_1 + w_3 \cdot b_2 - (w_{2F} + w_3) \cdot e_1 = 0 \\
e_2:&\quad w_{2F} \cdot b_2 + w_3 \cdot e_1 - (w_{2F} + w_3) \cdot e_2 = 0
\end{aligned}
\tag{12}
\]

Rearrangement of these balances and substitution of the fluxes by the general flux solutions of Eq.11 yield four expressions that relate the unknown 1-cumomer fractions of B and E to the known substrate labeling (a1, a2), measured extracellular fluxes (wm) and free flux parameters (β):

\[
\begin{pmatrix} b_1 \\ b_2 \\ e_1 \\ e_2 \end{pmatrix}
=
\begin{pmatrix}
w^*_{m,1} + w^*_{m,2B} + \beta_2 & 0 & -w^*_{m,2B} - \beta_2 & 0 \\
0 & w^*_{m,1} + w^*_{m,2B} + \beta_2 & 0 & -w^*_{m,2B} - \beta_2 \\
w^*_{m,2F} + \beta_1 & w^*_{m,3} & -w^*_{m,2F} - w^*_{m,3} - \beta_1 & 0 \\
0 & w^*_{m,2F} + \beta_1 & w^*_{m,3} & -w^*_{m,2F} - w^*_{m,3} - \beta_1
\end{pmatrix}^{-1}
\cdot
\begin{pmatrix} w^*_{m,1} \cdot a_1 \\ w^*_{m,1} \cdot a_2 \\ 0 \\ 0 \end{pmatrix}
\tag{13}
\]
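The balances of Eq.12 can also be checked numerically. The sketch below is not the symbolic (Maple) analysis used in this chapter; it solves the four 1-cumomer balances exactly for given flux values, using only the Python standard library. The helper names `solve` and `one_cumomers` and the flux values w2F=24 and w2B=16 are hypothetical choices made for illustration.

```python
from fractions import Fraction as Fr

def solve(A, rhs):
    """Solve A x = rhs exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fr(v) for v in row] + [Fr(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        # pick the first row at or below the diagonal with a nonzero entry
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def one_cumomers(w1, w2F, w2B, w3, a1, a2):
    """Return [b1, b2, e1, e2] from the 1-cumomer balances of Eq.12."""
    A = [
        [w1 + w2B, 0, -w2B, 0],
        [0, w1 + w2B, 0, -w2B],
        [w2F, w3, -(w2F + w3), 0],
        [0, w2F, w3, -(w2F + w3)],
    ]
    return solve(A, [w1 * a1, w1 * a2, 0, 0])

# Hypothetical flux values; a1 = a2 mimics a uniformly labeled substrate.
uniform = one_cumomers(100, 24, 16, 30, Fr(1, 10), Fr(1, 10))
mixed = one_cumomers(100, 24, 16, 30, Fr(3, 10), Fr(4, 10))
print([float(v) for v in uniform])
print([float(v) for v in mixed])
```

With a1 equal to a2 all four fractions equal the substrate enrichment whatever flux values are passed, whereas the second call returns enrichments that change when w2F or w2B is changed.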

Using a symbolic mathematical software package it can be a priori checked whether any of the combinations of the four equations allows the calculation of β1 and β2. For all combinations of the minimally required two enrichment measurements (i.e. {b1, b2}, {b1, e1}, {b1, e2}, {b2, e1}, {b2, e2}, and {e1, e2}), the set of two equations can be symbolically inverted. This means that in the present example two extracellular flux measurements plus any set of two positional 13C-enrichment measurements theoretically make all fluxes identifiable.
The identifiability for a given substrate labeling can be checked by filling in the concerning values of the substrate labeling (a) in Eq.13 prior to checking whether a given combination of measurements theoretically allows the calculation of the fluxes. Filling in identical values for a1 and a2 in Eq.13 results in 1-cumomer fractions of B and E that equal a1 and a2 and are not functions of the parameters. This agrees with the conclusion of Wiechert et al. (1999) that uniformly labeled substrates are clearly unsuitable when applying the positional 13C-enrichment measurement method.

>1-Cumomer balances
If the available labeling information consists of mass spectrometry (MS) data or 13C-multiplet intensity measurements (Szyperski,1995) instead of positional 13C-enrichments, balances of cumomers with weights larger than 1 have to be set up in order to perform an a priori identifiability check. These higher order cumomer balances are set up such that besides the known substrate cumomer fractions also the non-substrate cumomer fractions of weights smaller than that of the balanced cumomer are arranged on the right hand side of the equation (Wiechert et al.,1999). Due to the condition of cumomer weight preservation discussed by Wiechert et al. (1999), bilinear terms in n-cumomer balances can only contain either fractions of cumomers of a weight lower than n or one cumomer of weight n and one of weight 0.
In the first case, the bilinear term contains only cumomer fractions that have already been calculated in balances of lower weight cumomers. In the latter case the 0-cumomer fraction equals one by definition so the bilinear term reduces to a linear one. As a consequence n-cumomers always occur as linear terms in an n-cumomer balance equation (Wiechert et al.,1999), thus allowing the solution of the n-cumomers by means of a matrix inversion similar to Eq.13. By substituting the
\[
\begin{aligned}
(w_1 + w_{2B}) \cdot b_{12} - w_{2B} \cdot e_{12} &= w_1 \cdot a_{12} \\
-w_{2F} \cdot b_{12} + (w_{2F} + w_3) \cdot e_{12} &= w_3 \cdot b_2 \cdot e_1
\end{aligned}
\tag{14}
\]

In the balance of the 2-cumomer e12, the 1-cumomers b2 and e1 are arranged on the right hand side. This allows the following rearrangement:

\[
\begin{pmatrix} b_{12} \\ e_{12} \end{pmatrix}
=
\begin{pmatrix}
w^*_{m,1} + w^*_{m,2B} + \beta_2 & -w^*_{m,2B} - \beta_2 \\
-w^*_{m,2F} - \beta_1 & w^*_{m,2F} + w^*_{m,3} + \beta_1
\end{pmatrix}^{-1}
\cdot
\begin{pmatrix} w^*_{m,1} \cdot a_{12} \\ w^*_{m,3} \cdot b_2 \cdot e_1 \end{pmatrix}
\tag{15}
\]

Substituting the non-linear expressions of Eq.13 for b2 and e1 leads to an expression in which all 2-cumomer fractions of metabolites B and E are functions of wm, a and β. As the 2-cumomers were the highest weight cumomers, all cumomer fractions of B and E are now expressed as functions of the unknown flux parameters, substrate labeling and the measured metabolic fluxes.

3.4 A PRIORI IDENTIFIABILITY ANALYSIS III: MS AND 13C-NMR MULTIPLET MEASUREMENTS
Measurement equations
In order to check whether available mass isotopomer or multiplet measurements make all metabolic fluxes identifiable, the cumomer fractions have to be converted to mass fractions or to relative multiplet peak areas in 13C-spectra. This can be done by linear operations discussed by Möllney et al. (1999). Substitution of the previously obtained non-linear expressions for the cumomer fractions in the equations that relate cumomers and measurements yields relations that express the mass fractions and the relative 13C-multiplet peak areas as functions of wm, a and β. These expressions (from here on referred to as: ‘measurement equations’) are rather complex polynomials consisting of numerous terms containing the elements of β, wm and a.


The question whether a given set of measurement data enables identification of the flux parameters is equivalent to the question whether the corresponding measurement equations can be solved for β. In general, explicit (closed-form) solutions of polynomial equations of a degree greater than four do not exist. Therefore, a symbolic identifiability analysis is not feasible in case the polynomial terms in the measurement equations have a degree greater than four with respect to the flux parameters, which nearly always occurs in realistic problems. An alternative solution for these cases will be proposed in the next section.

Example
Let us assume that two labeling measurements are available for the analysis of the example network of Fig.3, namely the relative singlet area of the 13C-NMR spectrum of the first carbon atom of E and the m+2 mass fraction of metabolite E, where m is the molecular weight of the unlabeled molecule. The 13C-multiplet peak areas of metabolite E are calculated from the cumomer fractions by the following equation:

\[
\begin{pmatrix} e_{s,1} \\ e_{d,1} \\ e_{s,2} \\ e_{d,2} \end{pmatrix}
=
\frac{
\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\cdot (T_2)^{-1} \cdot
\begin{pmatrix} e \ (= 1 \text{ by definition}) \\ e_2 \ (\text{from Eq.13}) \\ e_1 \ (\text{from Eq.13}) \\ e_{12} \ (\text{from Eq.15}) \end{pmatrix}
}{
\begin{pmatrix} 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix}
\cdot (T_2)^{-1} \cdot
\begin{pmatrix} e \\ e_2 \\ e_1 \\ e_{12} \end{pmatrix}
}
\tag{16}
\]

In Eq.16, the division of the two vectors is an element-by-element division which converts the 13C-multiplet peak areas to relative areas. In the equation efs,i denotes the relative 13C-multiplet peak area of fine structure fs (in this case either a singlet fs=‘s’ or a doublet fs=‘d’) in the 13C-NMR spectrum of the ith carbon atom of a molecule E. The symbols e, e1, e2, and e12 are the cumomers of E and T2 represents the isotopomer-to-cumomer conversion matrix Tn, which was defined by Wiechert et al. (1999) as:

\[
T_0 = (1), \qquad T_{n+1} = \begin{pmatrix} T_n & T_n \\ 0 & T_n \end{pmatrix}
\tag{17}
\]

The mass fractions of metabolite E are calculated from the cumomers by (Möllney et al.,1999):

\[
\begin{pmatrix} e_{m+0} \\ e_{m+1} \\ e_{m+2} \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\cdot (T_2)^{-1} \cdot
\begin{pmatrix} e \\ e_2 \\ e_1 \\ e_{12} \end{pmatrix}
\tag{18}
\]

In Eq.18, em+i represents the mass fraction of E that has a weight of i plus the weight of the fully unlabeled molecule of E. The cumomer-to-measurement conversions of Eqs.16 and 18 are based on normalized measurement data as opposed to the conversions by Möllney et al. (1999). They based the corresponding conversions on absolute NMR and MS peak areas and introduced a scaling parameter for each spectrum. Although both approaches are mathematically equivalent, in the present chapter normalization of the data is preferred because it restricts the identifiability

problem to the original question whether a number of normalized measurements that equals the number of degrees of freedom in the metabolic network allows the identification of all the fluxes. Importantly, this keeps the number of variables in the symbolic equations at its absolute minimum.
Substituting the cumomer fractions in Eqs.16 and 18 by the appropriate expressions of Eqs.13 and 15 one obtains (using e.g. Maple) Eqs.19 and 20 that express the relative singlet area of the 13C-NMR spectrum of the first carbon atom of E and the m+2 mass fraction of metabolite E as functions of wm, a and β. For the sake of readability, values are filled in for wm and a. The measured extracellular fluxes are assumed to be: wm,1=100 and wm,3=30. The substrate cumomer distribution is assumed to be as follows: a1=0.3, a2=0.4, a12=0.2.

\[
\begin{aligned}
0 = e_{s,1} - 0.17\cdot10^{-1}\cdot\big(
& 0.25\cdot10^{12}\,\beta_1 + 0.67\cdot10^{11}\,\beta_2 + 0.11\cdot10^{11}\,\beta_1^2 + 0.65\cdot10^{10}\,\beta_2\beta_1 + 0.66\cdot10^{9}\,\beta_2^2 \\
& + 0.23\cdot10^{13} + 0.23\cdot10^{9}\,\beta_2\beta_1^2 + 0.54\cdot10^{8}\,\beta_2^2\beta_1 + 0.36\cdot10^{7}\,\beta_1^3\beta_2 + 0.15\cdot10^{7}\,\beta_2^2\beta_1^2 \\
& + 0.20\cdot10^{5}\,\beta_1^4\beta_2 + 0.13\cdot10^{5}\,\beta_2^2\beta_1^3 + 0.14\cdot10^{6}\,\beta_2^3\beta_1 + 0.25\cdot10^{4}\,\beta_2^3\beta_1^2 + 0.23\cdot10^{9}\,\beta_1^3 \\
& + 0.25\cdot10^{7}\,\beta_1^4 + 0.10\cdot10^{5}\,\beta_1^5 + 0.21\cdot10^{7}\,\beta_2^3 \big) \\
/\ \big(
& (0.21\cdot10^{5} + 0.88\cdot10^{3}\,\beta_1 + 0.10\cdot10^{2}\,\beta_1^2 + 0.22\cdot10^{3}\,\beta_2 + 0.70\cdot10^{1}\,\beta_2\beta_1) \\
& \cdot (0.85\cdot10^{4} + 0.40\cdot10^{3}\,\beta_1 + 0.50\cdot10^{1}\,\beta_1^2 + 0.87\cdot10^{2}\,\beta_2 + 0.30\cdot10^{1}\,\beta_2\beta_1) \\
& \cdot (0.40\cdot10^{3} + 0.10\cdot10^{2}\,\beta_1 + 0.30\cdot10^{1}\,\beta_2) \big)
\end{aligned}
\tag{19}
\]

\[
\begin{aligned}
0 = e_{m+2} - 0.25\cdot10^{-2}\cdot\big(
& 0.51\cdot10^{11}\,\beta_2 + 0.23\cdot10^{12}\,\beta_1 + 0.11\cdot10^{11}\,\beta_1^2 + 0.54\cdot10^{10}\,\beta_2\beta_1 + 0.45\cdot10^{9}\,\beta_2^2 \\
& + 0.22\cdot10^{9}\,\beta_2\beta_1^2 + 0.39\cdot10^{8}\,\beta_2^2\beta_1 + 0.39\cdot10^{7}\,\beta_1^3\beta_2 + 0.11\cdot10^{7}\,\beta_2^2\beta_1^2 + 0.28\cdot10^{5}\,\beta_1^4\beta_2 \\
& + 0.12\cdot10^{5}\,\beta_2^2\beta_1^3 + 0.82\cdot10^{5}\,\beta_2^3\beta_1 + 0.13\cdot10^{4}\,\beta_2^3\beta_1^2 + 0.29\cdot10^{9}\,\beta_1^3 + 0.38\cdot10^{7}\,\beta_1^4 \\
& + 0.20\cdot10^{5}\,\beta_1^5 + 0.13\cdot10^{7}\,\beta_2^3 + 0.19\cdot10^{13} \big) \\
/\ \big(
& (0.40\cdot10^{3} + 0.10\cdot10^{2}\,\beta_1 + 0.30\cdot10^{1}\,\beta_2) \\
& \cdot (0.85\cdot10^{4} + 0.40\cdot10^{3}\,\beta_1 + 0.50\cdot10^{1}\,\beta_1^2 + 0.87\cdot10^{2}\,\beta_2 + 0.30\cdot10^{1}\,\beta_2\beta_1)^2 \big)
\end{aligned}
\tag{20}
\]
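The cascade from cumomer balances to measurement values can be sketched numerically as follows. This is a minimal illustration, not the Maple derivation used in this chapter. It assumes the parameterization w2F = 14 + β1 and w2B = β2 - 14 for wm,1=100 and wm,3=30, which is inferred from the parameter bounds of Eq.27 rather than quoted from Eq.11; the function names and the test point (β1, β2) = (10, 30) are hypothetical. The last function evaluates the fraction of Eq.20 with its two-digit coefficients, so only approximate agreement with the exact balance solution should be expected.

```python
from fractions import Fraction as Fr

def solve(A, rhs):
    """Solve A x = rhs exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fr(v) for v in row] + [Fr(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def measurements(beta1, beta2, a1=Fr(3, 10), a2=Fr(4, 10), a12=Fr(2, 10)):
    """Return (relative singlet area of C1 of E, m+2 mass fraction of E)."""
    w1, w3 = 100, 30
    w2F, w2B = 14 + beta1, beta2 - 14      # assumed form of the flux solution
    # 1-cumomer balances (Eq.12)
    b1, b2, e1, e2 = solve(
        [[w1 + w2B, 0, -w2B, 0],
         [0, w1 + w2B, 0, -w2B],
         [w2F, w3, -(w2F + w3), 0],
         [0, w2F, w3, -(w2F + w3)]],
        [w1 * a1, w1 * a2, 0, 0])
    # 2-cumomer balances (Eq.14); b2 * e1 is known from the previous step
    b12, e12 = solve([[w1 + w2B, -w2B], [-w2F, w2F + w3]],
                     [w1 * a12, w3 * b2 * e1])
    # isotopomers of E via the inverse of T2 (positions 1 and 2 labeled or not)
    e11 = e12
    e10 = e1 - e12
    singlet_c1 = e10 / (e10 + e11)         # first row of Eq.16
    return singlet_c1, e11                 # e11 is the m+2 fraction (Eq.18)

def eq20_rhs(x, y):
    """Fraction of Eq.20 with rounded coefficients (x = beta1, y = beta2)."""
    num = (0.51e11*y + 0.23e12*x + 0.11e11*x**2 + 0.54e10*y*x + 0.45e9*y**2
           + 0.22e9*y*x**2 + 0.39e8*y**2*x + 0.39e7*x**3*y + 0.11e7*y**2*x**2
           + 0.28e5*x**4*y + 0.12e5*y**2*x**3 + 0.82e5*y**3*x + 0.13e4*y**3*x**2
           + 0.29e9*x**3 + 0.38e7*x**4 + 0.20e5*x**5 + 0.13e7*y**3 + 0.19e13)
    den = ((0.40e3 + 0.10e2*x + 0.30e1*y)
           * (0.85e4 + 0.40e3*x + 0.50e1*x**2 + 0.87e2*y + 0.30e1*y*x) ** 2)
    return 0.25e-2 * num / den

singlet, m_plus_2 = measurements(10, 30)
print(float(singlet), float(m_plus_2), eq20_rhs(10, 30))
```

Under the stated assumption the exact m+2 fraction and the value of the rounded polynomial fraction agree to within a few percent, which is the accuracy one would expect from coefficients given to two significant digits.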

The above two measurement equations should be inverted in order to check the identifiability of β1 and β2 from the chosen measurement data. Multiplying the measurement equations with the denominators of the terms containing the parameters βi, one obtains polynomials of the fifth degree in the parameters. As was mentioned earlier, explicit solutions for polynomial equations of this degree do not exist. So although the presently studied example network is still limited in number of parameters and cumomer weights it already yields non-linear equations for which no general solutions exist. It is clear that symbolic equations expressing measurement data as a function of parameters will be even far more complicated for many realistic metabolic networks.

Implicit function theorem
Applying the implicit function theorem (e.g. Lang,1993) to the measurement equations may solve the inversion problem that was discussed in the previous section. The measurement equations (Eqs.19 and 20) were of the following form:

\[
\begin{aligned}
0 &= F_1(m_1, \beta) \\
&\;\;\vdots \\
0 &= F_p(m_p, \beta)
\end{aligned}
\tag{21}
\]

In Eq.21 mi represents the ith measurement, p is the total number of measurement data that will be checked for their information content regarding the parameters in the p-dimensional vector β. The implicit function theorem states that the elements of β are explicitly determined by m, if the determinant of the matrix containing the partial derivatives of the implicit functions F with respect to the elements of β (from here on: ‘det(∂F/∂β)’) is unequal to zero. So, by symbolically calculating det(∂F/∂β) one can determine whether the system is structurally identifiable (det(∂F/∂β)≠0) or not (det(∂F/∂β)=0). In case the determinant is not zero but a function of the parameters, it can only be checked whether the fluxes are locally identifiable for a given set β0.
If only local identifiability can be shown, global identifiability may be made plausible by performing a grid search. This is done by using the symbolical determinant to quickly calculate the value of the determinant for a number of combinations of the p parameters. Note that a comprehensive grid search becomes practically infeasible for large values of p. This problem is, however, alleviated by the fact that the p-dimensional grid does not cover the entire ℝ^p-space. Referring to Eq.7 we see that the general solution of the fluxes in a metabolic network can be rearranged to:

\[
\operatorname{null}\begin{pmatrix} S \\ R \end{pmatrix} \cdot \beta \ \ge\ -\begin{pmatrix} S \\ R \end{pmatrix}^{\#} \cdot \begin{pmatrix} 0 \\ w_m \end{pmatrix}
\tag{22}
\]

Eq.22 represents a number of constraints on the allowed parameter values β. This may considerably reduce the grid search. Furthermore, researchers can usually indicate what lower and upper bounds can be reasonably expected for many of the fluxes. Extending Eq.22 with these additional constraints, the grid search is limited to a small subspace of ℝ^p.
Once identifiability has been shown, the linearized form of the explicit expressions β(m) can be calculated. Firstly, the flux parameters in Eq.21 are expressed as functions of m:

\[
\begin{aligned}
0 &= F_1\big(m_1, \beta_1(m), \ldots, \beta_p(m)\big) \\
&\;\;\vdots \\
0 &= F_p\big(m_p, \beta_1(m), \ldots, \beta_p(m)\big)
\end{aligned}
\tag{23}
\]

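Working out the total derivatives of Eq.23 with respect to m we obtain:

\[
\begin{pmatrix}
\partial F_1/\partial m_1 & & \emptyset \\
& \ddots & \\
\emptyset & & \partial F_p/\partial m_p
\end{pmatrix}
+
\begin{pmatrix}
\partial F_1/\partial \beta_1 & \cdots & \partial F_1/\partial \beta_p \\
\vdots & & \vdots \\
\partial F_p/\partial \beta_1 & \cdots & \partial F_p/\partial \beta_p
\end{pmatrix}
\cdot
\begin{pmatrix}
\partial \beta_1/\partial m_1 & \cdots & \partial \beta_1/\partial m_p \\
\vdots & & \vdots \\
\partial \beta_p/\partial m_1 & \cdots & \partial \beta_p/\partial m_p
\end{pmatrix}
= 0
\tag{24}
\]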

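As det(∂F/∂β) is unequal to zero (this was the condition for identifiability), the matrix ∂F/∂β is invertible and Eq.24 can be rearranged to obtain the Jacobian of the explicit functions β(m):

\[
\begin{pmatrix}
\partial \beta_1/\partial m_1 & \cdots & \partial \beta_1/\partial m_p \\
\vdots & & \vdots \\
\partial \beta_p/\partial m_1 & \cdots & \partial \beta_p/\partial m_p
\end{pmatrix}
=
-\begin{pmatrix}
\partial F_1/\partial \beta_1 & \cdots & \partial F_1/\partial \beta_p \\
\vdots & & \vdots \\
\partial F_p/\partial \beta_1 & \cdots & \partial F_p/\partial \beta_p
\end{pmatrix}^{-1}
\cdot
\begin{pmatrix}
\partial F_1/\partial m_1 & & \emptyset \\
& \ddots & \\
\emptyset & & \partial F_p/\partial m_p
\end{pmatrix}
\tag{25}
\]

Finally, using this Jacobian, the explicit expressions β(m) can be written in a linearized form:

\[
\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}
=
\begin{pmatrix}
\partial \beta_1/\partial m_1 & \cdots & \partial \beta_1/\partial m_p \\
\vdots & & \vdots \\
\partial \beta_p/\partial m_1 & \cdots & \partial \beta_p/\partial m_p
\end{pmatrix}
\cdot
\begin{pmatrix} m_1 - m_{1,0} \\ \vdots \\ m_p - m_{p,0} \end{pmatrix}
+
\begin{pmatrix} \beta_{1,0} \\ \vdots \\ \beta_{p,0} \end{pmatrix}
\tag{26}
\]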
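As a worked special case, consider measurement equations of the form used in Eqs.19 and 20, where each equation reads F_i = m_i - f_i(β) and f_i is the model prediction of the ith measurement. The symbol f_i and the covariance matrices Σ below are notation introduced here for illustration only. For this form Eq.25 simplifies to:

\[
\frac{\partial F_i}{\partial m_i} = 1, \qquad
\frac{\partial F}{\partial \beta} = -\frac{\partial f}{\partial \beta}
\quad\Longrightarrow\quad
\frac{\partial \beta}{\partial m} = -\left(-\frac{\partial f}{\partial \beta}\right)^{-1} \cdot I = \left(\frac{\partial f}{\partial \beta}\right)^{-1}
\]

Under the usual first-order approximation, measurement errors with covariance matrix Σ_m then propagate into the parameters as:

\[
\Sigma_\beta \approx \frac{\partial \beta}{\partial m} \cdot \Sigma_m \cdot \left(\frac{\partial \beta}{\partial m}\right)^{T}
\]

A Jacobian ∂f/∂β that is close to singular therefore corresponds to strongly amplified measurement errors in β.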

Following the a priori identifiability analysis the Jacobian of Eq.26 can be employed for choosing the substrate labeling and labeling measurements that lead to a minimal propagation of errors in the labeling measurements into the calculated parameters β and thus in the fluxes (Möllney et al.,1999).

Example
Firstly, it is checked whether the measurement of positional 13C-enrichments of metabolite E allows identification of the fluxes in the example network of Fig.3. Note that this identifiability problem was already solved in the section about 1-cumomer balances, since the choice of positional 13C-enrichments as measurements led to simple measurement equations that could be algebraically inverted. In the concerning section it was concluded that positional 13C-enrichments do not identify the fluxes when a combination of unlabeled and uniformly labeled substrate is applied. This conclusion is checked using the implicit function theorem. The matrix ∂F/∂β is symbolically calculated for the measurement equations of the positional 13C-enrichments of E (lower two relations of Eq.13). No values for wm are filled in. The following cumomer fractions (corresponding to 90% unlabeled A and 10% uniformly labeled A) are filled in: a=1, a1=0.1, a2=0.1, a12=0.1. The determinant of the matrix (det(∂F/∂β)) indeed equals zero irrespective of the values of wm. When a=1, a1=0.3, a2=0.4, a12=0.2 is applied as the substrate labeling, det(∂F/∂β) is unequal to zero. Therefore, it can be concluded that measurement of the positional 13C-enrichments of E theoretically allows identification of the fluxes when this substrate labeling is applied. The practical identifiability (defined as the statistical quality of the estimated fluxes) will depend on the actual values of wm and β.
Now, it is checked whether the measurements of the relative singlet area of the 13C-NMR spectrum of the first carbon atom of E and the m+2 mass fraction of metabolite E make the fluxes of the network of Fig.3 identifiable. The measurement equations are those of Eqs.19 and 20, but differ in that no values for wm were filled in. When the labeling of a combination of unlabeled and uniformly labeled substrate A is filled in, calculation of det(∂F/∂β) yields a value of zero irrespective of the values of wm, showing that the fluxes cannot be identified for such a substrate. On the contrary, the substrate a=1, a1=0.3, a2=0.4, a12=0.2 leads to a determinant unequal to zero, so the fluxes are at least locally identifiable when a suitable set of labeled substrates is used. In order to check whether the determinant is unequal to zero for a large range of parameter values (i.e. a large range of fluxes), a grid search is performed. The possible combinations (β1, β2) do not cover the entire ℝ^2-space. Filling in the values wm,1=100 and wm,3=30 as measured extracellular fluxes in Eq.11 yields the following constraints for the parameter values:

\[
\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & -1 \\ -1 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}
\ \ge\
\begin{pmatrix} -14 \\ 14 \\ -28 \\ -42 \end{pmatrix}
\tag{27}
\]


Performing a grid search for the allowed integer values of (β1,β2) smaller than (100,100) yields the values of the determinant shown in Fig.4. This figure shows that det(∂F/∂β) is larger than zero for the entire allowed area. The determinant decreases with increasing values of (β1,β2). This is understandable, because a simultaneous increase of β1 and β2 only leads to an increasing exchange flux (w2) between pools B and E (see Fig.3). At very high exchange fluxes, the labeling distribution of pools B and E will equilibrate, leading to practical unidentifiability of the fluxes between the pools (i.e. det(∂F/∂β)=0). Such a high exchange flux between metabolite pools B and E means that the metabolic network can be further reduced: the pools B and E can be lumped. Subsequently, a new identifiability analysis can be performed for the reduced network.

FIGURE 4: The determinant of the Jacobian (det(∂F/∂β)) of the linearized measurement equations plotted against the values of β1 and β2. The chosen measurements are the singlet of the first carbon atom of metabolite E and the mass fraction m+2 of the same metabolite E. The applied substrate labeling is: a00=0.5, a01=0.2, a10=0.1, a11=0.2 and the measured extracellular fluxes are assumed to be: wm,1=100 and wm,3=30.
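A grid search of this kind can be sketched as follows. The code is an illustration under stated assumptions, not the symbolic calculation behind Fig.4: the determinant is estimated by central finite differences instead of symbolic differentiation, the grid is coarser than the integer grid of the text, and the parameterization w2F = 14 + β1, w2B = β2 - 14 is inferred from the bounds of Eq.27. All function names are hypothetical. Because every measurement equation has the form F_i = m_i - f_i(β), the determinant of ∂F/∂β for two parameters equals the determinant of ∂f/∂β, which is what is computed.

```python
from fractions import Fraction as Fr

def solve(A, rhs):
    """Solve A x = rhs exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fr(v) for v in row] + [Fr(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def model(beta1, beta2, a1, a2, a12):
    """Predicted (singlet of C1 of E, m+2 fraction of E) for wm,1=100, wm,3=30."""
    w1, w3 = 100, 30
    w2F, w2B = 14 + beta1, beta2 - 14      # assumed form of the flux solution
    b1, b2, e1, e2 = solve(
        [[w1 + w2B, 0, -w2B, 0],
         [0, w1 + w2B, 0, -w2B],
         [w2F, w3, -(w2F + w3), 0],
         [0, w2F, w3, -(w2F + w3)]],
        [w1 * a1, w1 * a2, 0, 0])
    b12, e12 = solve([[w1 + w2B, -w2B], [-w2F, w2F + w3]],
                     [w1 * a12, w3 * b2 * e1])
    return (e1 - e12) / e1, e12

def det_jacobian(beta1, beta2, labeling, h=Fr(1, 100)):
    """Central-difference estimate of det(df/dbeta) at (beta1, beta2)."""
    cols = []
    for d1, d2 in ((h, 0), (0, h)):
        plus = model(beta1 + d1, beta2 + d2, *labeling)
        minus = model(beta1 - d1, beta2 - d2, *labeling)
        cols.append([(p - m) / (2 * h) for p, m in zip(plus, minus)])
    return cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1]

def allowed(beta1, beta2):
    """Parameter constraints of Eq.27."""
    return (beta1 >= -14 and beta2 >= 14
            and beta1 - beta2 >= -28 and beta2 - beta1 >= -42)

mixed = (Fr(3, 10), Fr(4, 10), Fr(2, 10))      # a1, a2, a12 of Fig.4
uniform = (Fr(1, 10), Fr(1, 10), Fr(1, 10))    # 90% unlabeled, 10% uniform
grid = [(x, y) for x in range(-14, 100, 6) for y in range(14, 100, 6)
        if allowed(x, y)]
dets_mixed = {pt: det_jacobian(*pt, mixed) for pt in grid}
dets_uniform = {pt: det_jacobian(*pt, uniform) for pt in grid}
print(all(d == 0 for d in dets_uniform.values()))
print(float(min(abs(d) for d in dets_mixed.values())))
```

Exact rational arithmetic is used so that the vanishing determinant for the uniformly labeled substrate shows up as an exact zero rather than as rounding noise.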

3.5 PRACTICAL APPLICATION
In a recent article, Petersen et al. (2000) used the 13C-labeling technique for quantifying the fluxes through a complex set of C3-C4 metabolite interconversions at the anaplerotic node of Corynebacterium glutamicum. In order to do so they grew the microorganism on a mixture of labeled carbon sources, namely [1-13C] glucose, [u-13C6] glucose and [3-13C] lactate. According to them the use of [1-13C] glucose alone did not allow discrimination between the fluxes involving phosphoenolpyruvate (pep) and those involving pyruvate (pyr), as this labeling had been shown to lead to identical positional enrichments of these two metabolites (Marx et al.,1996). Adding [3-13C] lactate led to a differently labeled pyr pool thus enabling the determination of the fluxes from pep and pyr to oxaloacetate (oxa). Adding [u-13C6] glucose enabled the determination of the corresponding backward fluxes.
The currently proposed identifiability analysis was applied to reduce the metabolic network studied by Petersen et al. (2000) and to set up the symbolic cumomer balances. This yielded a reduced metabolic network (see Fig.5) consisting of twelve fluxes. Six of these were supposed measurable, namely the two fluxes entering the studied network from the lower glycolysis and from the uptake of lactate and four fluxes leaving the network to form biomass components. In the reduced network three convergent nodes remained of which cumomer balances were set up: pep, pyr and oxa. The 0-cumomer balances yielded three independent mass balances. Combined with the six independently measurable rates, this left three degrees of freedom to be determined from labeling measurements.

[Figure 5: network diagram with the nodes g3p, pep, pyr, oxa, lac and akg, the fluxes w1 to w12, and CO2 and biomass as products.]
FIGURE 5: The reduced metabolic network including the C3-C4 interconversions at the anaplerotic node of Corynebacterium glutamicum. Dotted lines are assumed to be measured.

In trying to find the symbolic expressions giving all isotopomers of the three intermediates as functions of the substrate labeling, measured fluxes and parameters, we found that our software (Maple, version 6) and hardware (Pentium-MMX CPU, 233 MHz, 32 MB RAM) was limiting. We did not obtain a solution. However, filling in reasonable values for the substrate labeling and measured fluxes taken from Marx et al. (1996) we could express all cumomers and consequently all isotopomers of pep, pyr and oxa as functions of the three free parameters. This allowed us to perform an identifiability analysis of all the fluxes. The data of Marx et al. that were used as substrate labeling were the positional enrichments of glyceraldehyde 3-phosphate and the positional enrichment of carbon dioxide. In their experiments Marx et al. only supplied [1-13C] glucose to their culture, so the lactate uptake flux (w8 in Fig.5) was zero.
In order to fix the three remaining degrees of freedom, three labeling measurements were to be added to the aforementioned data. The identifiability analysis showed that not all combinations of three positional enrichments of pep, pyr and oxa allowed identification of all fluxes. Only the positional enrichment data sets including the enrichment of the fourth carbon atom of oxa made the fluxes identifiable. Regarding the measurements of the relative intensities of multiplets in the 13C-spectra that were additionally done by Petersen et al. (2000) it was found that the three independent relative intensities of the fine structures of the second carbon atom of both pyr and oxa did not allow identification of all fluxes, whereas those of the third carbon atom of oxa did.
We could thus conclude from our identifiability analysis that neither the feeding of [3-13C] lactate nor that of [u-13C6] glucose is necessary for determining all separate fluxes through the C3-C4 metabolite interconversions at the anaplerotic node. This is explained by the fact that the positional enrichments of pep and pyr are different even if only [1-13C] glucose is fed to the culture.
This difference is due to the fact that the labeling of pep only depends on the lower glycolysis flux and the flux from oxa to pep, whereas the pyr labeling is also influenced by the flux from oxa to pyr. Note that the difference may be small compared to the measurement accuracy, making determination of the fluxes hard in practice. This demonstrates the importance of a sound statistical analysis of measurement errors and resulting errors in determined fluxes. Although we have found that it is theoretically not necessary to supply [3-13C] lactate and [u-13C6] glucose to the culture, one may decide on statistical grounds that it is better to do so.

3.6 CONCLUSIONS
In the above we explored possibilities to gain insight into the identifiability of metabolic networks consisting of fluxes and nodes. Analysis of the general labeling balances of the nodes shows that only the balances of nodes where several fluxes converge add flux constraints to the set of flux balances. Consequently, the metabolic nodes that only have one single influx can be omitted from metabolic models that are used for flux analysis based on 13C-labeling experiments. Another possibility for network reduction is the lumping of metabolite pools of which the labeling distribution is equilibrated by fast exchange reactions.
The concept of cumomer balancing is suitable for a priori identifiability analysis of the reduced metabolic network. Balances of 0-cumomers can be solved in a generic way which yields a description of all fluxes as linear combinations of measured extracellular fluxes and a number of flux parameters that equals the degree of freedom in the flux balances. Balances of 1-cumomers can be rearranged such that the unknown 1-cumomer fractions are non-linear functions of the known substrate labeling, the measured extracellular fluxes and the unknown flux parameters. When the balances of cumomers of further increasing weight are solved, the structure of the cumomer balances allows a rearrangement such that the balanced cumomers are functions of previously solved cumomers. By repeatedly substituting the symbolical solutions of cumomer balances in the cumomer balances of higher weight, all cumomers can be explicitly described as non-linear functions of the known substrate labeling, the measured extracellular fluxes and the flux parameters.
In case only positional 13C-enrichments are measured, the symbolic solutions of the 1-cumomer balances can be equated to the corresponding measurement data. In case the measurement method is mass spectrometry or 13C-NMR, the measurement data can be expressed as functions of cumomers of various weights. The cumomers in these expressions can be substituted by the symbolic solutions of the corresponding balances. The resulting measurement equations express the measured data as non-linear functions of the substrate labeling, the measured extracellular fluxes and the unknown flux parameters.
Identifiability of the unknown fluxes is equivalent to the possibility to symbolically invert these measurement equations. Inversion is only feasible for the relatively simple measurement equations of positional 13C-enrichments. The inversion of measurement equations of mass fractions or relative 13C-multiplet peak areas poses algebraic problems. The measurement equations can, however, be proven to be either not invertible or locally invertible by means of the implicit function theorem. This concludes the systematic method for a priori flux identifiability analysis.
When applying the a priori identifiability analysis to a realistic metabolic network with three degrees of freedom it was found that the solution of the problem is still limited by the fast growth of symbolic calculations with the dimension of the system. Values had to be substituted for the measured fluxes and substrate labeling in order to allow solution of the system, reducing the a priori identifiability analysis to an identifiability analysis for a specific case. Nevertheless, the potential of the flux identifiability analysis was shown by applying it to the recently published case of flux analysis by Petersen et al. (2000). New conclusions regarding its identifiability could be drawn indicating that the flux analysis could theoretically have been done using data from an earlier study in which only [1-13C1] enriched glucose was used instead of the complex mixture of differently labeled substrates that was applied in the more recent study.

APPENDIX A: LINEAR ALGEBRA
Reduced row echelon form
A matrix is said to be in reduced row echelon form if the following conditions are satisfied (source: http://omega.albany.edu:8008/mat220dir/rref.html):
- The first nonzero number in a row is a 1 (called a ‘leading 1’),
- All rows of zeros (if there are any) are together at the bottom of the matrix,
- Each column that contains a leading 1 has zeros everywhere else.
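The conditions above can be illustrated with a small exact row reduction. The sketch below is a generic Gauss-Jordan routine written for illustration, not code used in this thesis; the function name `rref` and the example matrix are hypothetical.

```python
from fractions import Fraction as Fr

def rref(matrix):
    """Return the reduced row echelon form of a matrix (list of rows)."""
    M = [[Fr(v) for v in row] for row in matrix]
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(cols):
        # rows at or below pivot_row with a nonzero entry in this column
        candidates = [r for r in range(pivot_row, rows) if M[r][col] != 0]
        if not candidates:
            continue
        r = candidates[0]
        M[pivot_row], M[r] = M[r], M[pivot_row]
        lead = M[pivot_row][col]
        M[pivot_row] = [v / lead for v in M[pivot_row]]   # make a leading 1
        for other in range(rows):
            if other != pivot_row and M[other][col] != 0:
                f = M[other][col]
                M[other] = [a - f * b for a, b in zip(M[other], M[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return M

# Hypothetical rank-2 example: the second row is twice the first.
example = rref([[1, 2, 3], [2, 4, 6], [1, 1, 1]])
print([[str(v) for v in row] for row in example])
```

In the result the dependent row ends up as a row of zeros at the bottom, and each column holding a leading 1 is zero elsewhere.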


Pseudoinverse
The pseudoinverse or Moore-Penrose inverse (indicated by ‘#’) of a singular matrix A with the dimensions of MxN and rank R is defined as follows (Lay,1994):

\[
A^{\#} = V_R \cdot D^{-1} \cdot U_R^{T}
\tag{A1}
\]

The matrices UR, VR and D in Eq.A1 are obtained from the reduced singular value decomposition of A:

\[
A = \begin{pmatrix} U_R & U_{M-R} \end{pmatrix}
\cdot
\begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}
\cdot
\begin{pmatrix} V_R^{T} \\ V_{N-R}^{T} \end{pmatrix}
= U_R \cdot D \cdot V_R^{T}
\tag{A2}
\]

REFERENCES
Follstad, B.D., Stephanopoulos, G. (1998) Effect of reversible reactions on isotope label redistribution analysis of the pentose phosphate pathway. Eur. J. Biochem., 252: 360-371
Lang, S. (1993) Calculus of several variables. New York: Springer-Verlag. p. 446
Lay, D.C. (1994) Linear algebra and its applications. Reading: Addison-Wesley Publishing Company. p. 433
Marx, A., De Graaf, A.A., Wiechert, W., Eggeling, L., Sahm, H. (1996) Determination of the fluxes in the central metabolism of Corynebacterium glutamicum by nuclear magnetic resonance spectroscopy combined with metabolic balancing. Biotechnol. Bioeng., 49, 2: 111-129
Marx, A., Striegel, K., De Graaf, A.A., Sahm, H., Eggeling, L. (1997) Response of the central metabolism of Corynebacterium glutamicum to different flux burdens. Biotechnol. Bioeng., 56, 2: 168-180
Möllney, M., Wiechert, W., Kownatzki, D., De Graaf, A.A. (1999) Bidirectional reaction steps in metabolic networks: IV. Optimal design of isotopomer labeling experiments. Biotechnol. Bioeng., 66, 2: 86-103
Petersen, S., De Graaf, A.A., Eggeling, L., Möllney, M., Wiechert, W. (2000) In vivo quantification of parallel and bidirectional fluxes in the anaplerosis of Corynebacterium glutamicum. J. Biol. Chem., 275, 46: 35932-35941
Savinell, J.M., Palsson, B.O. (1992) Optimal selection of metabolic fluxes for in vivo measurement I, II. J. Theor. Biol., 155: 201-242
Schmidt, K., Carlsen, M., Nielsen, J., Villadsen, J. (1997) Modelling isotopomer distributions in biochemical networks using isotopomer mapping matrices. Biotechnol. Bioeng., 55, 6: 831-840
Schmidt, K. (1998) Quantification of intracellular metabolic fluxes with 13C tracer experiments, PhD thesis, Danish University of Technology, Copenhagen
Szyperski, T. (1995) Biosynthetically directed fractional 13C-labelling of proteinogenic amino acids. Eur. J. Biochem., 232: 433-448
Szyperski, T. (1998) 13C-NMR, MS and metabolic flux balancing in biotechnology research. Quart. Rev. Biophys., 31, 1: 41-106
Van Winden, W.A., Heijnen, J.J., Verheijen, P.J.T., Grievink, J. (2001) A priori analysis of metabolic flux identifiability from 13C-labeling data. Biotechnol. Bioeng., 74, 6: 505-516
Wiechert, W. (1995) Algebraic methods for the analysis of redundancy and identifiability in metabolic 13C labelling systems. In: Schomburg, D., Lessel, U., editors. Bioinformatics: from Nucleic Acids and Proteins to Cell Metabolism, Berlin, Verlag Chemie, pp. 169-184
Wiechert, W. (1996) Metabolic flux determination by stationary 13C tracer experiments: analysis of sensitivity, identifiability and redundancy, In: Dolezal, J., Fidler, J., editors. IFIP TC7, Conference on System Modelling and Optimization, New York, Chapman and Hall
Wiechert, W., De Graaf, A.A. (1997) Bidirectional reaction steps in metabolic networks: I. Modeling and simulation of carbon isotope labeling experiments. Biotechnol. Bioeng., 55, 1: 101-117
Wiechert, W., Möllney, M., Isermann, N., Wurzel, M., De Graaf, A.A. (1999) Bidirectional reaction steps in metabolic networks: III. Explicit solution and analysis of isotopomer labeling systems. Biotechnol. Bioeng., 66, 2: 69-85
Zupke, C., Stephanopoulos, G. (1994) Modelling of isotope distributions and intracellular fluxes in metabolic networks using atom mapping matrices. Biotechnol. Progr., 10: 489-498


Chapter 4

Innovations in Generation and Analysis of 2D [13C,1H] COSY Spectra for Metabolic Flux Analysis Purposes

This chapter was published as Van Winden et al. (2001)

ABSTRACT
2D [13C,1H] COSY NMR is used by metabolic engineers for determining carbon-carbon connectivities in intracellular compounds that contain information regarding the steady state fluxes in cellular metabolism. This chapter proposes innovations in the generation and analysis of these specific NMR spectra. These include a software tool that allows accurate determination of the relative peak areas and their complete covariance matrices even in very complex spectra. Additionally, a method is introduced for correcting the results for isotopic non-steady state conditions. The proposed methods are applied to measured 2D [13C,1H] COSY spectra. When analyzing the spectra, it is observed that peak areas in a one-dimensional section of the spectrum are frequently not representative for peak volumes in the two-dimensional spectrum. Furthermore, it is shown that for some spectra a significant amount of additional information can be gained from long-range 13C-13C scalar couplings in 2D [13C,1H] COSY spectra. Finally, the NMR resolution enhancement by dissolving amino acid derivatives in a non-polar solvent is demonstrated.


4.1 INTRODUCTION
For several decades labeling experiments have been used to analyze fluxes in the carbon metabolism of the cell, such as those through parallel or cyclic reactions, that cannot be determined from net consumption or production rates alone. Tracer atoms that are commonly used in these studies are either the radioactive carbon isotope 14C or the stable isotope 13C. The corresponding measurement methods are scintillation counting, nuclear magnetic resonance spectroscopy (NMR) or mass spectrometry (MS). As for NMR, the most frequently reported method is the measurement of fractional enrichments of specific carbon atoms (Portais et al.,1993; Sonntag et al.,1993; Marx et al.,1996; De Graaf et al.,1999). In several studies of the TCA cycle, NMR measurements of fractional enrichments were complemented by measurements of intensities of fine structures in 13C-spectra that are caused by adjacent 13C-atoms (Malloy et al.,1987; Tran-Dinh et al.,1996; Jucker et al.,1998).
Szyperski (1995) proposed the use of mixtures of non-labeled and uniformly 13C-labeled substrates in metabolic flux analysis. The use of these substrates leads to identical fractional enrichments for all carbon atoms. Therefore, the information to be gained from these experiments solely resides in the 13C-13C connectivities. These connectivities are derived from relative intensities of fine structures in 13C-spectra that are obtained by means of 2D [13C,1H] COSY NMR. These isotopomer measurements potentially constitute a much richer source of information than fractional enrichments.
Recently, software allowing the simulation of complete isotopomer distributions has made it possible to simultaneously estimate all metabolic fluxes from any type of labeling information (Schmidt et al.,1997).
Combined with the application of mixtures of specifically and uniformly labeled substrates, this opened the possibility to generate and analyze rich sets of both fractional enrichment data and relative fine structure intensity measurements for flux estimation (Schmidt et al.,1999).

The first step in determining fluxes from NMR data is the spectral analysis. This step partly determines the accuracies of the estimated fluxes and yields the information that is needed to determine their confidence intervals. Relatively little attention has been paid to this spectral analysis step in the metabolic engineering literature. This chapter summarizes and extends current knowledge of a specific type of NMR spectrum, namely the 2D [13C,1H] COSY spectrum. The presented innovations in the spectral analysis include:
• the introduction of a set of linear constraints on the positions of spectral peaks. These constraints yield a small set of parameters that accurately predict all peak positions and allow cross-checking of the assignment of peaks to their corresponding fine structures in complex spectra,
• the combination of the set of positional parameters with a flexible mathematical description of NMR lineshapes that is sparse in parameters. The resulting spectral model allows a nonlinear fit of measured NMR spectra that is both very fast and flexible enough to yield only small and randomly distributed residuals,
• a sensitivity analysis and error propagation calculation that yields the complete covariance matrix of the estimated relative intensities based on an estimation of the spectral noise,
• an extension of existing isotopic non-steady state corrections to the case of relative fine structure intensities,
• a method to calculate the amount of additional information to be gained from long-range 13C-13C scalar couplings in 2D [13C,1H] COSY spectra.

Nonlinear fitting of 2D [13C,1H] COSY spectra is not new. Szyperski et al. (1999) reported the development of a specialized software package FCAL for this purpose. However, this package will not be available to many workers in the field and the underlying calculations and assumptions were not published. This chapter offers a systematic procedure for nonlinear spectral fitting that can be applied by anyone interested in maximizing the quality and quantity of data extracted from their spectra. The proposed methods are applied and verified using two extensive sets of measured 2D [13C,1H] COSY spectra. Furthermore, it is checked whether peak intensities in a one-dimensional section of a spectrum are representative of relative peak volumes in the original two-dimensional spectrum. Finally, the NMR resolution enhancement obtained by using amino acid derivatives that are soluble in less polar solvents is demonstrated.

4.2 THEORY

Single section versus sum of sections

Measuring 13C-labeling by means of the 2D [13C,1H] COSY technique results in a spectrum that gives the NMR-signal intensities of the various carbon fine structures versus their 13C- and 1H-frequencies (see Fig. 1A). Separation of cellular components prior to these NMR measurements is not strictly needed, because most of the carbon atoms of the different cellular components have their own unique set of 13C- and 1H-chemical shifts (Szyperski et al.,1996). These unique coordinates make it possible to assign all the NMR signals to the corresponding carbon atoms.

FIGURE 1: (A) Schematic example of a 2D [13C,1H] COSY spectrum showing the intensities of the singlet and doublet fine structures of a carbon atom. (B) A section parallel to the 13C-frequency axis of the 2D spectrum and the resulting 1D-multiplet. [Axes: ω(13C), ω(1H) and signal intensity.]


Spectral analysis is made easier by making a one-dimensional section of a spectrum parallel to the 13C-frequency axis (see Fig. 1B). If the relative areas of the various peaks in the section are independent of the 1H-frequency at which the section is made, these relative areas can be used instead of the relative volumes to calculate the relative intensities of the peaks. However, if the relative peak areas in the section depend on the 1H-frequency, a one-dimensional plot should be made by summing the sections at successive 1H-frequencies (Szyperski,1995).

Computer tools for improved spectral analysis

The relative peak areas (or: relative intensities) in one-dimensional sections of 2D [13C,1H] COSY spectra correspond to (groups of) isotopomers (see Fig.2) and can be used for flux analysis (Szyperski,1995; Schmidt et al.,1999; Petersen et al.,2000).

FIGURE 2: The fine structures that can be distinguished in a one-dimensional section of a 2D [13C,1H] COSY spectrum with only two one-bond 13C-13C scalar couplings (left) and with an additionally observable long-range 13C-13C scalar coupling (right). The isotopomer groups causing the various fine structures are shown in the middle. The respective summed spectra are shown below. [Panel titles: "Only one-bond 13C-13C couplings" (left) and "One-bond 13C-13C couplings plus one long-range 13C-13C coupling" (right); the observed carbon atom is marked in the isotopomer diagrams.]

When peak areas are determined manually by indicating between which lower and upper frequency the signal area is to be integrated, the following problems are encountered:
• determination of the beginning and end of separate peaks introduces a subjective element in the calculated areas. Moreover, it is often difficult or even impossible to do so due to overlapping peaks. If not all peaks of a fine structure overlap, one can only integrate the areas of the non-overlapping peaks and correct these for the non-observable peaks. This will, however, increase the error in the determined area by the same correction factor,
• in complex spectra including many (overlapping) peaks it is hard to assign the various peaks to the fine structures. This is especially the case in spectra showing long-range couplings (see e.g. Fig.5),
• some of the amino acids in biomass lysates that are subjected to NMR analysis are present at low concentrations. E.g. phenylalanine constitutes less than 4% of the cellular protein in E. coli, P. chrysogenum and S. cerevisiae (Stephanopoulos et al.,1998). Besides, labeling experiments for 2D [13C,1H] COSY NMR analysis are usually performed with low fractions of labeled substrate (e.g. 10% fully labeled glucose (Szyperski,1995) or 100% glucose labeled at one single position (Marx et al.,1996)). Both factors result in low signal-to-noise ratios, which aggravate all of the above problems.

For the reasons stated above, computer-aided methods are required to identify and disentangle overlapping peaks (Wittig et al.,1995). Multiple techniques and software packages for (automated) analysis of NMR spectra have been developed in the past (e.g. Bartels et al.,1997; Ge et al.,1993; Stephenson and Binsch,1980; Günther et al.,2000; Szyperski et al.,1999). These packages are generally good at analyzing NMR spectra of all sorts based on little to no user-supplied information. Trained NMR-operators applying 2D [13C,1H] COSY NMR to determine labeling patterns of biomass components will have no problems in assigning the signals in a spectrum to the various amino acids and carbohydrate-related compounds. They only need computer support for accurate determination of the signal intensities of the specific compounds.
Ideally such a computer tool contains prior information about the specific characteristics of 2D [13C,1H] COSY spectra, such as lineshapes, positions of fine structure peaks with respect to those of other fine structures and constraints imposing equal height on peaks of one fine structure (Wittig et al.,1995). Such prior information helps to prevent erroneous peak assignments and it helps the algorithm to converge quickly to an optimal fit of the spectrum. The following section extends the prior information and constraints imposed on fits of 2D [13C,1H] COSY spectra that were proposed by Wittig et al. (1995).

Spectral model

Lineshapes of one-dimensional NMR peaks are not purely Gaussian or Lorentzian (Marshall et al.,2000). The Voigt function accurately describes experimental lineshapes, but cannot be calculated analytically. That is why several approximations of the Voigt lineshape are in use. Examples include the sum or product of a Lorentzian and a Gaussian lineshape (Stephenson and Binsch,1980; Wittig et al.,1995; Conny and Powell,2000) or the Pearson VII lineshape (Subhash and Mohanan,1997). The Pearson VII lineshape (Eq.1) was chosen as the basis of our spectral model because it is sparse in parameters and still approximates the Voigt lineshape very well.


$$
S_i\left(\omega^{13C}\right) = \frac{h_i}{\left[\dfrac{1}{p}\cdot\left(\dfrac{\omega^{13C}-\omega^{13C}_{\max,i}}{w_i}\right)^{2}+1\right]^{p}} \tag{1}
$$
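As a numerical illustration of Eq.1, the following minimal sketch evaluates the Pearson VII lineshape and compares it with its two limiting cases. The function and argument names are illustrative assumptions and are not taken from the software described in this chapter; the arguments h, omega_max, w and p stand for the four parameters of Eq.1.

```python
import math

def pearson_vii(omega, h, omega_max, w, p):
    """Pearson VII lineshape of Eq.1 evaluated at frequency omega."""
    z = (omega - omega_max) / w
    return h / (1.0 + z * z / p) ** p

def lorentzian(omega, h, omega_max, w):
    """Lorentzian lineshape, i.e. Eq.1 with p = 1."""
    z = (omega - omega_max) / w
    return h / (1.0 + z * z)

def gaussian_limit(omega, h, omega_max, w):
    """Gaussian lineshape that Eq.1 approaches for p -> infinity."""
    z = (omega - omega_max) / w
    return h * math.exp(-z * z)

def half_width_at_half_max(w, p):
    """Distance from the centroid at which the signal of Eq.1 equals h/2."""
    return w * math.sqrt(p * (2.0 ** (1.0 / p) - 1.0))

# Illustrative values only (arbitrary units).
h, omega_max, w = 2.0, 0.0, 0.5
for p in (1.0, 2.0, 10.0, 1.0e6):
    print(p, pearson_vii(0.3, h, omega_max, w, p), half_width_at_half_max(w, p))
```

The last helper makes explicit that w coincides with the half width at half-maximum only for p=1, so that the full width at half-maximum depends on both w and p.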

In Eq.1 the symbol Si represents the NMR signal size of peak i at frequency ω13C (i.e. the horizontal position in the 13C-dimension, see Fig. 1B). The Pearson VII lineshape function contains four parameters: hi is the maximal signal height, (ω13C)max,i is the peak centroid, wi represents the linewidth (note: wi does not equal the full width at half-maximum in the present formulation of the Pearson VII function) and p determines the lineshape. Eq.1 represents the Lorentzian lineshape for p=1, and equals the Gaussian lineshape in the limit p→∞.

The number of peaks in the measured multiplet of a given 13C atom is a function of the number of its 13C-13C scalar couplings with other carbon atoms. If there are no 13C-13C scalar couplings, there is only one singlet. Every additional 13C-13C scalar coupling adds, for each existing fine structure, a new fine structure with twice the number of peaks. Consequently, a multiplet of an atom with m 13C-13C scalar couplings consists of 3^m peaks divided over 2^m fine structures. The peak multiplicities of the fine structures that compose multiplets with 0, 1, 2, 3 scalar couplings increase from {1} to {1, 2}, {1, 2, 2, 4}, {1, 2, 2, 2, 4, 4, 4, 8}. Note that the numbers of elements of these sets indeed equal 2^m and their sums equal 3^m. The peaks belonging to one fine structure are identical in height and linewidth. This considerably reduces the number of independent parameters needed to describe a multiplet. Only 2^m linewidth parameters and 2^m height parameters are needed to describe a multiplet consisting of 3^m peaks. Assuming that the lineshape of all the peaks in a multiplet is identical, only one parameter p (see Eq.1) is needed.

The positions of the peaks in a multiplet are linearly constrained. Every 13C-13C scalar coupling splits a fine structure into a fine structure of which the peaks are shifted with respect to the ‘parent’ peaks by plus and minus a given 13C frequency, called the ‘scalar coupling constant’ and denoted by JC/C (Szyperski,1995). Consider Fig.3A showing the multiplet of carbon atom Cβ that has one-bond 13C-13C scalar couplings with carbon atoms Cα and Cγ. The coupling constants are JCβ/Cα and JCβ/Cγ. The frequency of the central singlet peak of Cβ is (ω13C)max,β. Whereas the 13C-13C scalar couplings yield a symmetrical multiplet, 13C isotope effects (Hansen,1988) cause asymmetry by slightly shifting all the peaks in the same (low field) direction. Consider a multiplet of carbon atom Cα that has a scalar coupling JCα/Cβ with Cβ and an isotope effect of the size TCα/Cβ. The resulting multiplet is shown in Fig.3B. All the 3^m peak positions in a multiplet can be described by the central peak position of the singlet, m scalar couplings and m isotope effects. The following equation shows how 1+2+2=5 parameters describe the central positions of 3^2=9 peaks in a multiplet with two different 13C-13C scalar coupling constants:

FIGURE 3: (A) The positions of the peaks in a multiplet of carbon atom Cβ, which are split by the 13C-13C scalar couplings with carbon atoms Cα and Cγ. (B) The positions of the peaks in a multiplet of carbon atom Cα, which are split by the 13C-13C scalar coupling with carbon atom Cβ and shifted by an isotope effect TCα/Cβ. [Peak offsets indicated in panel A: ±JCβ/Cα, ±JCβ/Cγ and ±JCβ/Cα ± JCβ/Cγ; in panel B: ±JCα/Cβ and the shift TCα/Cβ.]

$$
\begin{pmatrix}
\omega(^{13}C)_{\max,s}\\
\omega(^{13}C)_{\max,d1,1}\\
\omega(^{13}C)_{\max,d1,2}\\
\omega(^{13}C)_{\max,d2,1}\\
\omega(^{13}C)_{\max,d2,2}\\
\omega(^{13}C)_{\max,dd,1}\\
\omega(^{13}C)_{\max,dd,2}\\
\omega(^{13}C)_{\max,dd,3}\\
\omega(^{13}C)_{\max,dd,4}
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & 0 & 0\\
1 & -1 & 0 & 1 & 0\\
1 & 1 & 0 & 1 & 0\\
1 & 0 & -1 & 0 & 1\\
1 & 0 & 1 & 0 & 1\\
1 & -1 & -1 & 1 & 1\\
1 & 1 & -1 & 1 & 1\\
1 & -1 & 1 & 1 & 1\\
1 & 1 & 1 & 1 & 1
\end{pmatrix}
\cdot
\begin{pmatrix}
\omega(^{13}C)_{\max,s}\\
J_{C\beta/C\alpha}\\
J_{C\beta/C\gamma}\\
T_{C\beta/C\alpha}\\
T_{C\beta/C\gamma}
\end{pmatrix}
\tag{2}
$$

We can thus conclude that, whereas four parameters are needed to describe one NMR peak, a multiplet consisting of 3^m peaks may be described by 2+2^(m+1)+2·m parameters (1 power, 2^m heights, 2^m linewidths, 1 central peak position, m coupling constants and m isotope effects). This is a serious reduction: e.g. a multiplet consisting of 9 peaks (m=2) can be described by 14 parameters instead of 36. This reduction is valuable when fitting a multiplet because it significantly reduces the parameter estimation effort.

Initialization of the fit

The spectral model is fitted to an experimentally determined multiplet by minimizing the sum of squared residuals. This is an ill-posed nonlinear optimization problem in the sense that the optimization procedure does not know in what direction to search if the initially guessed peaks have no overlap with the true ones. In the worst case a suboptimal local minimum is found due to the fitting of minor noise peaks. Stephenson and Binsch (1980) discussed a solution for this problem that enables automatic peak finding, but only rigorously applies to a multiplet of one peak. We have opted for a hybrid approach. The positions of some peaks can be uniquely detected by the operator. As shown in the previous section, once 2·m+1 of a total of 3^m peak positions are known, the remaining ones can be estimated automatically.
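The positional constraints of Eq.2 and the parameter count can be sketched as follows. This is an illustrative reconstruction under the sign convention of Eq.2 (each coupling shifts a peak by minus or plus J, and each active coupling adds its isotope effect T); the function names and the numerical values are assumptions made for the example and do not come from the original software.

```python
from itertools import product

def multiplet_rows(m):
    """Rows of the Eq.2 design matrix for m different scalar couplings.

    Parameter order: [omega_s, J_1..J_m, T_1..T_m]. For m=2 the rows come out
    in the order singlet, doublet 1, doublet 2, double doublet.
    """
    rows = []
    for reversed_signs in product((0, -1, 1), repeat=m):
        signs = reversed_signs[::-1]          # 0: coupling absent, -1/+1: shifted by -J/+J
        active = [1 if s != 0 else 0 for s in signs]
        rows.append([1] + list(signs) + active)
    rows.sort(key=lambda row: sum(row[m + 1:]))  # stable sort: fewer couplings first
    return rows

def peak_positions(rows, params):
    """Multiply the design matrix by the positional parameter vector."""
    return [sum(c * x for c, x in zip(row, params)) for row in rows]

def params_from_picked_peaks(singlet, doublets):
    """Invert Eq.2 for the 2*m+1 picked peaks: the singlet and both peaks of each doublet."""
    couplings = [(high - low) / 2.0 for low, high in doublets]
    isotope_effects = [(high + low) / 2.0 - singlet for low, high in doublets]
    return [singlet] + couplings + isotope_effects

def n_parameters(m):
    """1 power, 2^m heights, 2^m linewidths, 1 singlet position, m couplings, m isotope effects."""
    return 2 + 2 ** (m + 1) + 2 * m

# Hypothetical positional parameters (arbitrary frequency units).
rows = multiplet_rows(2)
true_positions = peak_positions(rows, [100.0, 3.0, 2.0, -0.1, -0.2])
picked = params_from_picked_peaks(
    true_positions[0],
    [(true_positions[1], true_positions[2]), (true_positions[3], true_positions[4])],
)
print(peak_positions(rows[5:], picked))  # predicted double doublet peak positions
```

In this sketch the four double doublet positions follow entirely from the five picked peaks, so comparing them with the measured multiplet cross-checks the peak assignment.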
Adding an initial estimate for the linewidth of the peaks and initially assuming a Lorentzian lineshape (p=1 in Eq.1) suffices to make the fitting procedure converge quickly to a global minimum. By checking whether the 3^m estimated peak positions approximately correspond with the peaks in the measured multiplet, one verifies whether the manually indicated peak positions indeed correspond to the fine structures they were assumed to belong to. Consider for example the case of Eq.2. The five positional parameters can be obtained by inverting the equation and manually indicating the positions of the five peaks of the singlet and doublets 1 and 2. Based on the positional parameters one can calculate the positions of the remaining four double doublet peaks. If one of the double doublet peaks was incorrectly indicated as a doublet peak, the positional parameters will be wrong and the calculated positions of the double doublet peaks will not correspond with the experimentally observed ones. This method is of great help in correctly assigning all peaks to their corresponding fine structures in complex spectra.

Estimation of errors in relative intensities

Only a few papers are available in which relative fine structure intensity data are presented together with their standard deviations. In these papers standard deviations are derived either from multiple measured data sets (Malloy et al.,1987; Jucker et al.,1998; Schmidt et al.,1999), from comparison of redundant measurements and visual inspection of the signal-to-noise ratio (Schmidt et al.,1999), or from the outcomes of the application of various filtering functions (exponential, Gaussian or mixed exponential/Gaussian line broadening) and methods to determine the peak area (simple integration or peak fitting) of a single NMR data set (Petersen et al.,2000; A.A. De Graaf, pers.commun.). A great advantage of the peak fitting procedure discussed in the previous section is that the residual spectrum of the measured (Smeas) and optimally fitted (Sfit) spectra can be used to estimate the NMR noise, and thus the measurement error, from one single experiment. This method is based on the assumption that all NMR signal intensities that are measured along the ω13C-axis have measurement errors with a uniform variance (i.e. the homoscedasticity assumption applies). The error variance (σ2) can be estimated using Eq.3:

$$
\sigma^{2} = \frac{\left(S_{meas}-S_{fit}\right)^{T}\cdot\left(S_{meas}-S_{fit}\right)}{n-p} \tag{3}
$$

where n is the number of spectral data (i.e. the length of the vector S) and p is the total number of spectral parameters. In appendix A it is explained how the error variance of a spectrum can be used to calculate the covariance matrix (Cr) of the relative intensities. Although standard deviations of relative fine structure intensities have been published before, to our knowledge the corresponding covariances have never been determined or taken into account. Only by taking into account covariances (i.e. the known correlations between dependent relative intensities) can one perform a correct statistical test of the differences between two data sets.

Isotopic non-steady state correction of fractional enrichment and fine structure intensities

Labeling experiments are usually started by establishing a metabolic steady state in a continuous culture growing on unlabeled substrate. Subsequently, the medium is replaced by medium containing 13C-labeled substrate. Both media must be identical except for the isotopic composition of the carbon substrate in order not to disturb the metabolic steady state. In continuous cultures where the labeled substrate is limiting, the substrate concentration in the fermentor is so low that the switch to labeled medium leads to a stepwise onset of the import of labeled substrate into the cells. Because the intracellular metabolite levels are generally low, it may be assumed that the metabolites immediately reach an isotopic steady state. This results in a stepwise onset of the accumulation of label in the biomass. This assumption was checked against literature data on the metabolite concentrations and fluxes in the glycolysis (Theobald et al.,1997; Rizzi et al.,1997) and pentose phosphate pathway (Vaseghi et al.,1999) and the concentration of and flux through the combined α-ketoglutarate/glutamate pool (Ter Schure et al.,1995; Rizzi et al.,1997) that may be considered in isotopic equilibrium (Tran-Dinh et al.,1996). All literature data apply to Saccharomyces cerevisiae growing under conditions very similar to our experiments. Based on the data we calculated that by far the slowest wash-in rate of label in the intermediate pools is approximately 0.63·10^-3 s^-1 for the α-ketoglutarate/glutamate pool. This rate is 22 times larger than the turnover rate of the biomass (0.1 h^-1 = 0.028·10^-3 s^-1), which leads us to conclude that our assumption of a stepwise onset of the accumulation of label in the biomass is valid. Under these assumptions the labeling of biomass follows first order wash-in kinetics.

After a number of residence times of growth on labeled medium the biomass is harvested in order to determine the label distribution in the biomass. Theoretically, the isotopic steady state of the biomass components is only reached after an infinite number of residence times. Therefore, experimental labeling data must be corrected for the deviation from the isotopic steady state at the time of harvesting the biomass (Marx et al.,1996; Szyperski 1998; Möllney et al.,1999). The fractional enrichment vector x of biomass component X at isotopic steady state (t=∞) can be calculated from the fractional enrichment vector that is measured at time t by means of the following equation:

$$
x(\infty) = \frac{x(t) - e^{-\mu\cdot t}\cdot x(0)}{1 - e^{-\mu\cdot t}} \tag{4}
$$

In Eq.4, µ is the specific growth rate of the culture and the vector x(0) is defined by the natural labeling:

$$
x(0) = P_n \cdot i \tag{5}
$$

where Pn represents the natural fractional labeling (≈0.011), and i is a vector of the same dimension as x containing only ones. The isotopic non-steady state correction of the isotopomer distributions of the biomass components is also given by Eq.4 above. In this case vector x represents the isotopomer distribution vector and the ith element of vector x(0) is given by:

$$
x_i(0) = \left(P_n\right)^{L(x_i)} \cdot \left(1-P_n\right)^{U(x_i)} \tag{6}
$$

where L(xi) is the number of 13C-atoms and U(xi) is the number of 12C-atoms of the ith isotopomer of biomass component X. For example, the vector x(0) of a two carbon compound is: (x00, x01, x10, x11)T = (0.976, 0.012, 0.012, 0.000)T (for an explanation of the binary subscripts see appendix B). In case the labeling data used are 2D [13C,1H] COSY spectra, Eq.4 cannot be used for isotopic non-steady state correction, since the NMR spectra seldom yield a complete isotopomer distribution vector. In other words: x(t) is not known. As was discussed before, the data obtained from 2D [13C,1H] COSY spectra are the relative intensities of fine structures which represent ratios of groups of isotopomer fractions (Szyperski,1995). The relative intensity (xi,f(t)) of the fine structure ‘f’ (e.g. a singlet or doublet) of the ith carbon atom of biomass component X at time t can be corrected for isotopic non-steady state as follows (derivation in appendix B):

$$
x_{i,f}(\infty) = \frac{x_{i,f}(t)\cdot x_i(t) - x_{i,f}(0)\cdot e^{-\mu\cdot t}\cdot P_n}{x_i(t) - e^{-\mu\cdot t}\cdot P_n} \tag{7}
$$

In the above equation, xi without the subscript ‘f’ represents the fractional enrichment of the ith carbon atom of X. So in order to correct the relative intensity of a fine structure of a carbon atom, the fractional enrichment of the atom at time t must be known as well. When the applied labeled substrate is uniformly labeled, the fractional enrichment of each carbon atom is identical and can be calculated by applying Eq.8, which is derived from Eq.4. The applicability of this equation (and thereby of the presented isotopic non-steady state correction) is limited to cases where the substrates are uniformly labeled.

$$
x_i(t) = e^{-\mu\cdot t}\cdot P_n + \left(1-e^{-\mu\cdot t}\right)\cdot P_f \qquad \forall\; x,i \tag{8}
$$

In this equation Pf is the fraction of uniformly labeled substrate in the medium. Filling in this equation in Eq.7 yields the following isotopic non-steady state correction in case of uniform labeling:

$$
x_{i,f}(\infty) = \frac{x_{i,f}(t)\cdot\left(e^{-\mu\cdot t}\cdot P_n + \left(1-e^{-\mu\cdot t}\right)\cdot P_f\right) - x_{i,f}(0)\cdot e^{-\mu\cdot t}\cdot P_n}{\left(1-e^{-\mu\cdot t}\right)\cdot P_f} \tag{9}
$$
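The corrections of Eqs.7 to 9 can be illustrated with a short numerical sketch. The function names are assumptions made for this example. The helper simulate_intensity is merely Eq.7 solved for the measured intensity, used here to generate a test value; the numbers (µ=0.1 h^-1, t=11.8 h, Pn=0.011, Pf=0.10, and a steady-state intensity of 0.6) are illustrative inputs, not measured results.

```python
import math

def enrichment_uniform(t, mu, p_n, p_f):
    """Eq.8: fractional enrichment of every carbon atom at time t (uniform labeling)."""
    decay = math.exp(-mu * t)
    return decay * p_n + (1.0 - decay) * p_f

def correct_intensity(x_f_t, x_f_0, x_t, t, mu, p_n):
    """Eq.7: relative fine structure intensity at isotopic steady state."""
    decay = math.exp(-mu * t)
    return (x_f_t * x_t - x_f_0 * decay * p_n) / (x_t - decay * p_n)

def correct_intensity_uniform(x_f_t, x_f_0, t, mu, p_n, p_f):
    """Eq.9: Eq.7 with the fractional enrichment of Eq.8 filled in."""
    decay = math.exp(-mu * t)
    x_t = decay * p_n + (1.0 - decay) * p_f
    return (x_f_t * x_t - x_f_0 * decay * p_n) / ((1.0 - decay) * p_f)

def simulate_intensity(x_f_inf, x_f_0, t, mu, p_n, p_f):
    """Eq.7 rearranged for x_f(t); only used to create a consistent test input."""
    decay = math.exp(-mu * t)
    x_t = enrichment_uniform(t, mu, p_n, p_f)
    return (x_f_0 * decay * p_n + x_f_inf * (1.0 - decay) * p_f) / x_t

mu, t, p_n, p_f = 0.1, 11.8, 0.011, 0.10      # illustrative values
x_f_0 = (1.0 - p_n) ** 2                      # assumed natural singlet intensity, two neighbours
x_f_t = simulate_intensity(0.6, x_f_0, t, mu, p_n, p_f)
print(x_f_t, correct_intensity_uniform(x_f_t, x_f_0, t, mu, p_n, p_f))
```

Both routes (Eq.7 with the enrichment taken from Eq.8, and Eq.9 directly) should return the same corrected value, which serves as a consistency check on the reconstruction of the equations.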

The value of the relative intensity at time 0 (xi,f(0)) in Eqs.7 and 9 can be calculated using Eq.6, where L(xi) and U(xi) are the numbers of 13C-atoms and 12C-atoms, respectively, that neighbour the observed carbon atom. In case of superimposed fine structures due to identical 13C-13C scalar couplings, the separate relative intensities at time zero should be calculated using Eq.6 and summed afterwards.

The isotopic non-steady state correction of the relative intensities must be taken into account in the estimation of the corresponding covariances. Under the assumption that the errors in Pn, Pf, t and µ in Eq.9 are negligible compared to the error in xi,f(t), the covariance matrix of the relative intensities in the multiplet of the ith carbon atom of compound X at harvesting time t (Cr(xi,f(t)), see appendix A) can be corrected as follows:

$$
C_r\left(x_{i,f}(\infty)\right) = \left(\frac{e^{-\mu\cdot t}\cdot P_n + \left(1-e^{-\mu\cdot t}\right)\cdot P_f}{\left(1-e^{-\mu\cdot t}\right)\cdot P_f}\right)^{2} \cdot C_r\left(x_{i,f}(t)\right) \tag{10}
$$

When the labeled substrate has been supplied much longer than the dilution time prior to biomass harvesting (i.e. t >> 1/µ), the isotopic non-steady state correction only marginally improves the measured labeling data and may be neglected. However, for labeling times shorter than four times the residence time, neglecting the correction results in serious relative errors.

A short supply of labeled substrate has several advantages. Often only small amounts of biomass are required for NMR analysis. An entire fermentor content of fully labeled biomass is seldom needed. Labeling costs may therefore be reduced by reducing either the fermentor volume (Schmidt et al.,1999) or the concentration of substrate in the medium (and thus of the biomass). Both solutions may cause changes in the metabolic fluxes that are studied. Using the normal fermentor size and substrate concentration and opting for a short label supply and isotopic non-steady state correction instead does not have this disadvantage.
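To give a feeling for how the scaling factor of Eq.10 depends on the labeling time, the sketch below evaluates it for a few labeling times expressed in residence times (1/µ). The function names and the covariance matrix are hypothetical, and Pn=0.011 and Pf=0.10 are assumed illustrative values.

```python
import math

def covariance_scale(t, mu, p_n, p_f):
    """Scalar factor of Eq.10 that multiplies the covariance matrix Cr(x_f(t))."""
    decay = math.exp(-mu * t)
    ratio = (decay * p_n + (1.0 - decay) * p_f) / ((1.0 - decay) * p_f)
    return ratio ** 2

def correct_covariance(cov_t, t, mu, p_n, p_f):
    """Apply Eq.10 to a covariance matrix given as a list of rows."""
    scale = covariance_scale(t, mu, p_n, p_f)
    return [[scale * value for value in row] for row in cov_t]

mu, p_n, p_f = 0.1, 0.011, 0.10               # assumed illustrative values
for residence_times in (1, 2, 4, 8):
    t = residence_times / mu                  # labeling time in hours
    print(residence_times, covariance_scale(t, mu, p_n, p_f))

# Hypothetical 2x2 covariance matrix of two relative intensities at harvesting time.
example_cov = [[4.0e-4, -1.0e-4], [-1.0e-4, 9.0e-4]]
print(correct_covariance(example_cov, 11.8, mu, p_n, p_f))
```

The factor is always larger than one and approaches one for long labeling times, in line with the statement that the correction becomes marginal for t >> 1/µ.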


Another advantage of short labeling is that the observed metabolic system needs to be kept in a well-defined stationary state for only a short period of time. This reduces the risk of disturbances of the steady state. Finally, if labeled substrate is supplied for a long time to a fermentor with a sufficiently large volume and biomass concentration, several biomass samples may be taken prior to attaining the isotopic steady state without disturbing the metabolic steady state. By correcting these samples for the isotopic non-steady state, their labeling data may be compared. This allows multiple independent NMR analyses from a single continuous culture.

Additional information due to long-range 13C-13C couplings

As stated before, 2D [13C,1H] COSY spectra seldom yield complete isotopomer distribution vectors. Fig.2 shows the groups of isotopomers that can be distinguished in the section of a spectrum corresponding with a central carbon atom in a four-carbon compound where up to three 13C-13C couplings are observable. The figure is based on the assumption that all coupling constants are different. If coupling constants are identical, various fine structures overlap and the number of observable isotopomer groups decreases.

Only the multiplets of carbon atoms that are covalently bound to at least one proton are observed in 2D [13C,1H] COSY spectra. Consequently, amino acids always yield one multiplet less than their number of carbon atoms, because they have a carbonyl carbon atom that is not proton-bound. Suppose that all four carbon atoms of the compound in Fig.2 are proton-bound (which is the case in e.g. erythritol). In this case two multiplets of terminal carbon atoms and two multiplets of central carbon atoms can be measured. How many of the 2^4=16 isotopomer fractions (note: Fig. 2 shows only one half of all the possible isotopomers) of the compound can be determined from the combined four multiplets? This question cannot be answered by simply summing the number of separate fine structures in all the multiplets. Some information may overlap. Moreover, the fine structure areas in a multiplet are normalized with respect to the total area of the multiplet. Consequently, one of the fractions of the isotopomer groups that are observed in a multiplet follows from the others. Suppose that none of the 13C-13C couplings is identical and that one and two one-bond 13C-13C couplings can be observed in the multiplets of terminal and central carbon atoms, respectively. In this case the number of separate isotopomer fractions that is determined from the four multiplets follows from Eq.11.

$$
\begin{aligned}
x_{1,s} &= \frac{\sum x_{10??}}{\sum x_{1???}} &
x_{1,d} &= \frac{\sum x_{11??}}{\sum x_{1???}} = 1 - x_{1,s}\\
x_{2,s} &= \frac{\sum x_{010?}}{\sum x_{?1??}} &
x_{2,d1} &= \frac{\sum x_{110?}}{\sum x_{?1??}}\\
x_{2,d2} &= \frac{\sum x_{011?}}{\sum x_{?1??}} &
x_{2,dd} &= \frac{\sum x_{111?}}{\sum x_{?1??}} = 1 - x_{2,s} - x_{2,d1} - x_{2,d2}\\
x_{3,s} &= \frac{\sum x_{?010}}{\sum x_{??1?}} &
x_{3,d1} &= \frac{\sum x_{?110}}{\sum x_{??1?}}\\
x_{3,d2} &= \frac{\sum x_{?011}}{\sum x_{??1?}} &
x_{3,dd} &= \frac{\sum x_{?111}}{\sum x_{??1?}} = 1 - x_{3,s} - x_{3,d1} - x_{3,d2}\\
x_{4,s} &= \frac{\sum x_{??01}}{\sum x_{???1}} &
x_{4,d} &= \frac{\sum x_{??11}}{\sum x_{???1}} = 1 - x_{4,s}
\end{aligned}
\tag{11}
$$

$$
\begin{pmatrix}
0&0&0&0&0&0&0&0&1&1&1&1&C_1&C_1&C_1&C_1\\
0&0&0&0&1&1&0&0&0&0&0&0&0&0&C_2&C_2\\
0&0&0&0&0&0&0&0&0&0&0&0&1&1&C_3&C_3\\
0&0&0&0&0&0&1&1&0&0&0&0&0&0&C_4&C_4\\
0&0&1&0&0&0&0&C_5&0&0&1&0&0&0&0&C_5\\
0&0&0&0&0&0&1&C_6&0&0&0&0&0&0&1&C_6\\
0&0&0&1&0&0&0&C_7&0&0&0&1&0&0&0&C_7\\
0&1&0&C_8&0&1&0&C_8&0&1&0&C_8&0&1&0&C_8
\end{pmatrix}
\cdot
\begin{pmatrix}
x_{0000}\\ x_{0001}\\ \vdots\\ x_{1110}\\ x_{1111}
\end{pmatrix}
= 0
\tag{11, continued}
$$

where

$$
\begin{aligned}
C_1 &= -\frac{x_{1,s}}{x_{1,d}} & C_2 &= -\frac{x_{2,s}}{x_{2,dd}} & C_3 &= -\frac{x_{2,d1}}{x_{2,dd}} & C_4 &= -\frac{x_{2,d2}}{x_{2,dd}}\\
C_5 &= -\frac{x_{3,s}}{x_{3,dd}} & C_6 &= -\frac{x_{3,d1}}{x_{3,dd}} & C_7 &= -\frac{x_{3,d2}}{x_{3,dd}} & C_8 &= -\frac{x_{4,s}}{x_{4,d}}
\end{aligned}
$$

In Eq.11 the subscripts ‘s’, ‘d’, ‘d1’, ‘d2’ and ‘dd’ denote the singlet, doublet, doublets 1 and 2 (two different doublets in a central carbon multiplet) and double doublet. The binary subscripts are defined as in Schmidt et al. (1997). A question mark represents either a zero or a one. The rank of the 8x16 matrix above is 8. In other words: the 12 relative intensities of the 4 multiplets yield 8 independent isotopomer data.

If we assume that each multiplet has an additional long-range 13C-13C coupling that is large enough to yield well-resolved peaks, then the same compound has two multiplets of terminal carbon atoms which each yield 4 (3 independent) relative intensities and two multiplets of central carbon atoms each yielding 8 (7 independent) relative intensities. The corresponding 20x16 matrix has rank 14. This confirms the expectation that multiplets with more couplings indeed yield more data.

In the example above the obtained 14 independent isotopomer data are the maximally achievable number. The fully unlabeled isotopomer of an n-carbon compound is not visible in any of the 2D [13C,1H] COSY spectra. Furthermore, relative intensities only yield ratios between the remaining (2^n-1) isotopomers due to the normalization. The maximal number of independent ratios between (2^n-1) entities is (2^n-1)-1. Therefore, 2D [13C,1H] COSY NMR maximally yields (2^n-2) independent labeling data for an n-carbon compound.

4.3 MATERIALS AND METHODS

Labeled biomass

The labeled biomass that was used in this NMR study was obtained in two continuous cultures. In one of the experiments S. cerevisiae (CEN.PK-113.7D) was grown aerobically on a C-limited defined medium containing glucose and ethanol (10:1 on a weight basis) as carbon sources and ammonium as a nitrogen source. The glucose consisted of 90% unlabeled glucose and 10% U-[13C6] glucose (CLM-1396, 99% 13C, ARC Laboratories B.V., Amsterdam, The Netherlands). The yeast was grown in a fermentor working volume of 1670 ml at a biomass concentration of 1.70 g DW/l and a dilution rate of 0.1 h^-1. Biomass samples of 80 ml (≈136 mg DW) were taken for NMR analysis after 11.8 and 35.4 h.


In the second experiment P. chrysogenum (DS12975, DSM, Delft, The Netherlands) was grown aerobically on a C-limited defined medium containing glucose as a carbon source and nitrate as a nitrogen source (for details see Van Gulik et al. (2000)). The glucose consisted of 90% unlabeled glucose and 10% U-[13C6] glucose. The fungus was grown in a fermentor working volume of 1360 ml at a biomass concentration of 1.15 g DW/l and a dilution rate of 0.03 h^-1. A biomass sample of 100 ml (≈115 mg DW) was taken for NMR analysis after 119.6 h.

The samples of S. cerevisiae were centrifuged (6 min., 4800 rpm). After decanting the supernatant, cells were washed with 0.9% NaCl-solution, centrifuged and washed with demineralized water. After final centrifugation cells were frozen at -80°C. Prior to NMR analysis the biomass was freeze-dried. The samples of P. chrysogenum were filtered (glass fiber filter, Gelman Sciences, USA). Filters with cells were washed with 0.9% NaCl-solution and demineralized water. After final filtration cells were frozen at -80°C prior to freeze-drying.

Preparation of samples

Biomass was hydrolysed in 10 ml 6 N HCl for 16 hours at 110°C. After filtration and evaporation to dryness, the residue was dissolved in 10 ml 0.1 N HCl and the amino acids were adsorbed to an ion exchange resin (Dowex AG 50W X4) and washed with 0.1 N HCl. The amino acids were eluted with 4 N HCl. After evaporation the residue was dissolved in D2O. The presented sample preparation includes separation of proteinogenic amino acids from the remaining biomass components. This was also done by Szyperski (1995), Schmidt et al. (1999) and Sauer et al. (1999), whereas Petersen et al. (2000) used whole-cell lysate for NMR analysis. The separation of the amino acids eliminates the interference of multiplets of carbon atoms of other compounds.
Water extracts of cell components other than proteinogenic amino acids were prepared by heating the biomass in 5 ml H2O to 90°C for 10 minutes with shaking. After centrifugation, the supernatant was lyophilized and dissolved in D2O.

Derivatization
After drying the biomass hydrolysate by evaporation, the amino acids were converted to methyl esters with methanol/thionyl chloride according to Brenner et al. (1950). After the reaction, the residual solvent was removed by a stream of N2. The methyl esters were redissolved in 1 ml of a 50:50 mixture of deuterated methanol and acetone. The methyl esters were acylated with trifluoroacetic acid anhydride (Coulter and Hahn, 1968) and, after removal of the excess reagents in a stream of N2, dissolved in CDCl3.

2D [13C,1H] COSY measurements
NMR measurements were performed at 600 MHz at 37 °C on a Bruker Avance 600 spectrometer. The [13C,1H] COSY experiment was the HSQC sequence by Bax and Pochapsky (1992) with gradients for artefact suppression. Folding in F1 was used for reducing the sweep width. The 13C carrier was set to 57.5 ppm and 2400 increments were recorded with an effective sweep width of 20 ppm (t1max = 398 ms). For the aromatic carbons the offset was 129.6 ppm and 512 increments were recorded with a sweep width in F1 of 3 ppm (t1max = 652 ms). The window functions used before Fourier transformation were a cosine-bell shifted by π/3 in F2 and a sine-bell shifted by π/2.6.

4.4 RESULTS

Improving spectral quality by derivatization
In general, the linewidths of NMR signals are dependent on a number of parameters like solvent viscosity, temperature and the presence of paramagnetic ions. In order to improve the resolution, the temperature of the sample could be increased or the sample could be dissolved in a less viscous solvent. Amino acids are generally not soluble in apolar solvents. They do dissolve in methanol, but in this solvent they are partially converted to methyl esters during the measurement, which yields spectra of poor quality. A better option is to prepare methyl esters and dissolve these in a 50:50 mixture of methanol and acetone. An additional advantage over water-dissolved amino acids is that, due to the absence of salts in the methanol/acetone mixture, considerably less heat is generated by the 13C-decoupling during the acquisition time. In principle the methyl esters can be further converted to the N-acyl amino acid esters with TFAA, although not all amino acids are easily and quantitatively prepared. The resulting N-trifluoroacetyl amino acid esters are soluble in chloroform. Four multiplets of amino acids in water and the corresponding derivatives in chloroform were measured and compared. Fig.4 shows a typical result of this comparison.

[Figure 4: two stacked panels, upper "chloroform", lower "water"; horizontal axis: normalized 13C-frequency, vertical axis: normalized signal size.]

FIGURE 4: Comparison of normalized sections in 13C-direction of 2D [13C,1H] COSY spectra of the α-carbon atom of phenylalanine. Upper: multiplet of chloroform-soluble N-trifluoroacetyl-methyl ester of phenylalanine, lower: multiplet of water-dissolved phenylalanine. Rectangle includes a peak of the doublet with the smaller 13C-13C scalar coupling constant and a peak of the double doublet.


The multiplets are normalized to the same total height and width. It is clear that the linewidths of the peaks are considerably smaller for the chloroform-soluble derivative. The part of the multiplet shown within the rectangle clearly illustrates the better resolution of peaks of the derivatives dissolved in chloroform. Results of fits of multiplets of four α-carbon atoms of proteinogenic amino acids dissolved in water and chloroform are compared in Table 1. The table shows that the average linewidths are 1.4-1.7 times larger for water samples than for chloroform. This is consistent with the observation made for Fig.4, i.e. that solution of derivatives in chloroform leads to narrower peaks and thus to a better resolution of the peaks.

TABLE 1: Comparison of linewidths, D-values and determined relative intensities in multiplets of four carbon atoms of amino acids in water and chloroform. (hydrolyzed biomass of S. cerevisiae)

| Carbon atom | Quantity | water | chloroform | difference | SSweighted | P (a) |
|---|---|---|---|---|---|---|
| α-phenylalanine | average linewidth (Hz) | 1.714 | 1.145 | | | |
| | D (b) | 3.92E-05 | 1.28E-05 | | | |
| | relative intensity: singlet | 0.120 | 0.121 | 0.000 | 1.359 | 7.15E-1 |
| | relative intensity: doublet 1 (c) | 0.023 | 0.016 | 0.007 | | |
| | relative intensity: doublet 2 | 0.086 | 0.083 | 0.003 | | |
| | relative intensity: double doublet | 0.771 | 0.780 | -0.009 | | |
| α-threonine | average linewidth (Hz) | 1.749 | 1.181 | | | |
| | D | 1.85E-05 | 1.73E-05 | | | |
| | relative intensity: singlet | 0.279 | 0.274 | 0.005 | 2.081 | 5.56E-1 |
| | relative intensity: doublet 1 | 0.254 | 0.258 | -0.004 | | |
| | relative intensity: doublet 2 | 0.106 | 0.111 | -0.004 | | |
| | relative intensity: double doublet | 0.361 | 0.358 | 0.003 | | |
| α-methionine | average linewidth (Hz) | 2.203 | 1.332 | | | |
| | D | 4.09E-04 | 1.66E-04 | | | |
| | relative intensity: singlet | 0.243 | 0.236 | 0.007 | 3.150 | 3.69E-1 |
| | relative intensity: doublet 1 | 0.233 | 0.241 | -0.008 | | |
| | relative intensity: doublet 2 | 0.147 | 0.107 | 0.040 | | |
| | relative intensity: double doublet | 0.376 | 0.415 | -0.039 | | |
| α-glutamic acid | average linewidth (Hz) | 1.777 | 1.237 | | | |
| | D | 3.63E-05 | 2.52E-05 | | | |
| | relative intensity: singlet | 0.336 | 0.340 | -0.004 | 11.753 | 8.28E-3 |
| | relative intensity: doublet 1 | 0.335 | 0.354 | -0.019 | | |
| | relative intensity: doublet 2 | 0.195 | 0.183 | 0.012 | | |
| | relative intensity: double doublet | 0.133 | 0.123 | 0.011 | | |

Notes:
a See Eqs.C1 and C2 in appendix C.
b For definition, see Eq.12.
c Doublet 1 has the larger coupling constant in all cases.

Table 1 also shows the effect of the derivatization and solution in chloroform on the size of the estimated covariances of the relative intensities. The total variance of the relative intensities in a multiplet consisting of F fine structures is represented by the value of a scalar D that is defined here as:

$$ D = \left( \prod_{i=1}^{F-1} s_i(C_r) \right)^{\frac{1}{F-1}} \tag{12} $$

In Eq.12 s_i(C_r) is the ith singular value of the covariance matrix (C_r) of the relative intensities of the multiplet. Only the F-1 largest singular values are multiplied, since the Fth (smallest) singular value equals zero due to the dependence of one of the relative intensities on the others. Raising the product in Eq.12 to the power 1/(F-1) renders the outcome D independent of the number (F) of fine structures in the multiplet. This allows an unbiased comparison of the total variance of multiplets with varying numbers of peaks.

When comparing the values of D for the water-dissolved amino acids and their chloroform-dissolved derivatives in Table 1 one sees that the latter have smaller values in all four cases. The values are between 1.07 and 3.07 times smaller for the chloroform-soluble derivatives. In other words: their relative intensities are more accurately determined.

It is important to verify that the relative intensities do not depend on the solvent. One can check that the relative intensities of the multiplets in the chloroform and water samples generally show close resemblance by inspecting the one-tailed probabilities (P) in Table 1. The calculation and meaning of the values of P are explained in appendix C. When a minimal probability of 5% is chosen (i.e. α=0.05 in Eq.C2), then pairs of spectra with values of P larger than 0.05 do not significantly differ. This is the case for the multiplets of α-phenylalanine, α-threonine and α-methionine dissolved in water and chloroform. For α-glutamic acid, however, a slight, but statistically significant difference is observed.
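As an illustration of Eq.12, the following minimal Python sketch computes D from a covariance matrix of relative intensities. The function name `d_value` and the example matrix are assumptions made for illustration and are not part of the spectral analysis tool described in this chapter; NumPy is assumed to be available.

```python
import numpy as np

def d_value(cov_r):
    """Geometric mean of the F-1 largest singular values of cov_r (Eq.12)."""
    cov_r = np.asarray(cov_r, dtype=float)
    n_fine = cov_r.shape[0]                      # F, number of fine structures
    s = np.linalg.svd(cov_r, compute_uv=False)   # singular values, descending
    return float(np.prod(s[: n_fine - 1]) ** (1.0 / (n_fine - 1)))

# Hypothetical singlet/doublet multiplet: the two relative intensities sum to
# one, so they are fully anti-correlated and the matrix is singular.
sigma2 = 1.0e-4
cov_example = sigma2 * np.array([[1.0, -1.0], [-1.0, 1.0]])
d_example = d_value(cov_example)   # singular values are 2*sigma2 and 0
```

Because the smallest singular value is dropped before taking the root, the singular covariance matrix of normalized intensities still gives a finite, non-zero D.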

Single section versus sum of sections
In order to check whether the relative peak areas in a 1D-section along the 13C-frequency axis are independent of the 1H-frequency at which the section is made, a single section was compared to the sum of all sections at the 1H-frequencies where the signal was observable. Table 2 shows this comparison for eighteen carbon atoms of several amino acids and trehalose. The analyzed multiplets were selected to cover a wide range of possible fine structures and corresponding relative intensities. Evaluating the one-tailed probabilities in Table 2 versus a minimal probability of 5% one sees that the single-section multiplet significantly differs from the multiple-section multiplet for α- and δ-histidine, C2- and C4-trehalose, α- and γ’-valine, β-tyrosine and α-serine. This means that the relative areas in the single-section multiplet are not representative of the relative volumes in the 2D-NMR multiplet. In these eight cases the multiple-section multiplet must be used in order to get a reliable set of relative intensities. In the ten remaining cases, where no significant difference is found, a single-section multiplet may be used to determine the relative intensities.
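Appendix C is not reproduced in this section. The tabulated P-values are, however, numerically consistent with the upper-tail probability of a chi-square distribution with F-1 degrees of freedom evaluated at SSweighted. The sketch below works under that assumption; the function names are hypothetical and closed forms are given only for one to three degrees of freedom, which covers multiplets of two to four fine structures.

```python
import math

def chi2_upper_tail(ss_weighted, dof):
    """Upper-tail chi-square probability for 1, 2 or 3 degrees of freedom."""
    x = float(ss_weighted)
    if dof == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if dof == 2:
        return math.exp(-x / 2.0)
    if dof == 3:
        return (math.erfc(math.sqrt(x / 2.0))
                + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("closed form implemented for dof 1, 2 and 3 only")

def differs_significantly(ss_weighted, n_fine_structures, alpha=0.05):
    """True when two multiplets differ at significance level alpha."""
    return chi2_upper_tail(ss_weighted, n_fine_structures - 1) < alpha

# SSweighted of alpha-histidine in Table 2 (four fine structures)
p_alpha_his = chi2_upper_tail(15.63, 3)
```

With these inputs `p_alpha_his` lies close to the tabulated 1.35E-3, so the single section of α-histidine is flagged as significantly different from the summed sections.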

TABLE 2: Comparison of relative intensities in 1D-multiplets obtained by taking either one single section of the 2D [13C,1H] COSY spectrum or a sum of sections at all 1H-frequencies showing a signal. (hydrolyzed biomass of P. chrysogenum)

| Carbon atom | Fine structure | single section | sum of multiple sections | difference | SSweighted | P |
|---|---|---|---|---|---|---|
| α-histidine | singlet | 0.131 | 0.121 | 0.010 | 15.63 | 1.35E-3 |
| | doublet 1 | 0.070 | 0.073 | -0.004 | | |
| | doublet 2 | 0.013 | 0.014 | -0.001 | | |
| | double doublet | 0.786 | 0.792 | -0.006 | | |
| β-histidine | singlet | 0.177 | 0.190 | -0.013 | 1.73 | 6.30E-1 |
| | doublet 1 | 0.005 | 0.004 | 0.001 | | |
| | doublet 2 | 0.336 | 0.336 | 0.000 | | |
| | double doublet | 0.481 | 0.470 | 0.012 | | |
| δ-histidine | singlet | 0.584 | 0.518 | 0.067 | 103.82 | 2.21E-24 |
| | doublet | 0.416 | 0.482 | -0.067 | | |
| C1-trehalose | singlet | 0.268 | 0.262 | 0.006 | 0.95 | 3.31E-1 |
| | doublet | 0.732 | 0.738 | -0.006 | | |
| C2-trehalose | singlet | 0.206 | 0.227 | -0.021 | 12.83 | 1.64E-3 |
| | doublet | 0.298 | 0.292 | 0.006 | | |
| | triplet | 0.496 | 0.481 | 0.015 | | |
| C3-trehalose | singlet | 0.375 | 0.363 | 0.012 | 1.28 | 5.27E-1 |
| | doublet | 0.235 | 0.238 | -0.003 | | |
| | triplet | 0.390 | 0.399 | -0.009 | | |
| C4-trehalose | singlet | 0.182 | 0.163 | 0.018 | 13.31 | 1.29E-3 |
| | doublet | 0.331 | 0.324 | 0.006 | | |
| | triplet | 0.488 | 0.512 | -0.025 | | |
| C5-trehalose | singlet | 0.126 | 0.120 | 0.007 | 1.70 | 4.28E-1 |
| | doublet | 0.086 | 0.086 | -0.001 | | |
| | triplet | 0.788 | 0.794 | -0.006 | | |
| C6-trehalose | singlet | 0.132 | 0.136 | -0.004 | 0.72 | 3.96E-1 |
| | doublet | 0.868 | 0.864 | 0.004 | | |
| α-valine | singlet | 0.292 | 0.280 | 0.012 | 54.69 | 8.00E-12 |
| | doublet 1 | 0.594 | 0.598 | -0.003 | | |
| | doublet 2 | 0.041 | 0.045 | -0.004 | | |
| | double doublet | 0.072 | 0.078 | -0.005 | | |
| γ’-valine | singlet | 0.350 | 0.364 | -0.014 | 42.67 | 6.49E-11 |
| | doublet | 0.650 | 0.636 | 0.014 | | |
| γ’’-valine | singlet | 0.781 | 0.793 | -0.012 | 2.49 | 1.14E-1 |
| | doublet | 0.219 | 0.207 | 0.012 | | |
| α-tyrosine | singlet | 0.176 | 0.171 | 0.004 | 4.73 | 1.93E-1 |
| | doublet 1 | 0.040 | 0.043 | -0.003 | | |
| | doublet 2 | 0.114 | 0.109 | 0.005 | | |
| | double doublet | 0.670 | 0.676 | -0.006 | | |
| β-tyrosine | singlet | 0.254 | 0.229 | 0.025 | 46.69 | 7.27E-11 |
| | doublet | 0.724 | 0.771 | -0.047 | | |
| | triplet | 0.022 | 0.000 | 0.022 | | |
| δ-tyrosine | singlet | 0.184 | 0.177 | 0.006 | 0.82 | 6.65E-1 |
| | doublet | 0.752 | 0.757 | -0.005 | | |
| | triplet | 0.064 | 0.066 | -0.001 | | |
| ε-tyrosine | singlet | 0.299 | 0.293 | 0.006 | 0.74 | 6.89E-1 |
| | doublet | 0.298 | 0.299 | -0.001 | | |
| | triplet | 0.403 | 0.408 | -0.004 | | |
| α-serine | singlet | 0.222 | 0.213 | 0.009 | 17.13 | 6.63E-4 |
| | doublet 1 | 0.374 | 0.376 | -0.002 | | |
| | doublet 2 | 0.078 | 0.077 | 0.002 | | |
| | double doublet | 0.326 | 0.335 | -0.009 | | |
| β-serine | singlet | 0.580 | 0.569 | 0.011 | 3.14 | 7.64E-2 |
| | doublet | 0.420 | 0.431 | -0.011 | | |

Estimation of errors in relative intensities
The proposed error estimation procedure was used to generate covariance matrices of all multiplets that were analyzed. These covariance matrices have non-zero off-diagonal elements due to the overlap of peaks and due to the fact that relative intensities are normalized by dividing the area of a fine structure in a multiplet by the total area of all fine structures. Clearly, this information is richer than the estimated variances of separate relative intensities (neglecting covariances) that are commonly used (Schmidt et al., 1999; Petersen et al., 2000). Consider for example a comparison of two sets of relative intensities of a multiplet consisting of a singlet and a doublet. Due to the normalization of the peak areas, the two intensities are fully correlated. When applying Eq.C1 to these data (see appendix C) and filling in only variances in Cr (i.e. off-diagonal elements are set to zero), the weighted sum of squares is twice as large as when the covariances are filled in as well. This leads to an erroneous outcome of the statistical test of Eq.C2.

The accuracy of the error estimation procedure was tested by a Monte Carlo analysis. For this purpose four multiplets of different carbon atoms were fitted, resulting in estimates of the NMR noise and three analytically calculated covariance matrices (see appendix A). Next, normally distributed, random noise with the estimated variance was added to the original data and the resulting multiplets were fitted. This was repeated 500 times for each multiplet, resulting in sets of 500 estimated relative intensities. When calculating the covariance matrices of these sets and comparing them to the analytically determined ones, a close resemblance was observed for most covariances. This suggests that both the proposed procedure for analytical error estimation and the homoscedasticity assumption for the spectral noise are correct.

Isotopic non-steady state correction
Biomass of S. cerevisiae was harvested from a continuous culture at two different times prior to attaining isotopic steady state in order to test the isotopic non-steady state correction. 2D [13C,1H] COSY spectra of various components of the biomass hydrolysate were measured and analyzed to obtain the relative intensities shown in Table 3 (see columns ‘not corrected’). The dilution rate of the continuous culture was 0.1 h-1 and the two biomass samples were taken after 11.8 h and 35.4 h of labeled substrate supply. In these two situations the remaining fraction of naturally labeled biomass is 0.31 and 0.03, respectively. Evaluating the one-tailed probabilities in the ‘not corrected’ columns of Table 3 versus a minimal probability of 5% shows significant differences between ‘early’ and ‘late’ multiplets for all carbon atoms except β-glutamic acid, β-alanine, α-, β- and γ-proline and β- and δ’’-leucine. In general, the multiplets of the early harvested biomass have larger relative intensities of the singlets and lower values of the (double) doublets. The explanation of this observation is that the labeling pattern of the naturally labeled biomass has a larger contribution to the overall labeling pattern of the early harvested biomass. Natural labeling causes larger intensities of singlets than of other fine structures because the fortuitous labeling of several carbons in a row rarely occurs.

TABLE 3: Comparison of relative intensities in multiplets (s=singlet, d=doublet, t=triplet, dd=double doublet) of amino acids in biomass harvested after different times of label supply. Left: values prior to isotopic non-steady state correction, right: values after isotopic non-steady state correction. (hydrolyzed biomass of S. cerevisiae)

Columns 3-7: not corrected; columns 8-12: corrected. The columns 11.8 h and 35.4 h give the biomass harvesting time.

| Carbon atom | Fine structure | 11.8 h | 35.4 h | difference | SSweighted | P | 11.8 h | 35.4 h | difference | SSweighted | P |
|---|---|---|---|---|---|---|---|---|---|---|---|
| α-histidine | s | 0.159 | 0.110 | 0.049 | 291.42 | 7.16E-63 | 0.115 | 0.107 | 0.009 | 19.23 | 2.45E-4 |
| | d1 | 0.048 | 0.041 | 0.007 | | | 0.050 | 0.041 | 0.009 | | |
| | d2 | 0.026 | 0.013 | 0.014 | | | 0.027 | 0.013 | 0.014 | | |
| | dd | 0.767 | 0.837 | -0.070 | | | 0.808 | 0.840 | -0.032 | | |
| β-histidine | s | 0.172 | 0.131 | 0.041 | 43.82 | 1.65E-9 | 0.128 | 0.128 | 0.000 | 5.41 | 1.44E-1 |
| | d1 | 0.024 | 0.013 | 0.012 | | | 0.025 | 0.013 | 0.012 | | |
| | d2 | 0.344 | 0.353 | -0.009 | | | 0.361 | 0.354 | 0.007 | | |
| | dd | 0.461 | 0.503 | -0.043 | | | 0.485 | 0.505 | -0.020 | | |
| δ-histidine | s | 0.394 | 0.365 | 0.030 | 18.57 | 1.64E-05 | 0.362 | 0.362 | 0.000 | 0.00 | 9.93E-1 |
| | d | 0.606 | 0.635 | -0.030 | | | 0.638 | 0.638 | 0.000 | | |
| α-phenylalanine | s | 0.147 | 0.120 | 0.027 | 45.40 | 7.60E-10 | 0.103 | 0.117 | -0.014 | 13.23 | 4.17E-3 |
| | d1 | 0.030 | 0.023 | 0.007 | | | 0.031 | 0.023 | 0.008 | | |
| | d2 | 0.070 | 0.086 | -0.016 | | | 0.073 | 0.086 | -0.013 | | |
| | dd | 0.752 | 0.771 | -0.019 | | | 0.792 | 0.773 | 0.019 | | |
| β-tyrosine | s | 0.153 | 0.125 | 0.028 | 22.53 | 1.28E-5 | 0.109 | 0.122 | -0.013 | 3.87 | 1.45E-1 |
| | d | 0.756 | 0.784 | -0.028 | | | 0.796 | 0.787 | 0.009 | | |
| | t | 0.091 | 0.091 | 0.000 | | | 0.096 | 0.091 | 0.004 | | |
| α-glutamic acid | s | 0.364 | 0.336 | 0.028 | 39.40 | 1.42E-8 | 0.319 | 0.333 | -0.014 | 20.08 | 1.64E-04 |
| | d1 | 0.327 | 0.335 | -0.007 | | | 0.351 | 0.336 | 0.014 | | |
| | d2 | 0.190 | 0.195 | -0.005 | | | 0.203 | 0.196 | 0.007 | | |
| | dd | 0.118 | 0.133 | -0.015 | | | 0.127 | 0.134 | -0.007 | | |
| β-glutamic acid | s | 0.657 | 0.645 | 0.011 | 0.58 | 4.48E-1 | 0.632 | 0.644 | -0.012 | 0.57 | 4.51E-01 |
| | d | 0.343 | 0.355 | -0.011 | | | 0.368 | 0.356 | 0.012 | | |
| γ-glutamic acid | s | 0.215 | 0.196 | 0.019 | 10.14 | 1.74E-2 | 0.159 | 0.192 | -0.033 | 25.64 | 1.13E-05 |
| | d1 | 0.696 | 0.717 | -0.021 | | | 0.747 | 0.720 | 0.026 | | |
| | d2 | 0.019 | 0.019 | 0.000 | | | 0.020 | 0.019 | 0.001 | | |
| | dd | 0.069 | 0.068 | 0.001 | | | 0.075 | 0.069 | 0.006 | | |
| α-alanine | s | 0.164 | 0.157 | 0.007 | 8.53 | 3.62E-2 | 0.120 | 0.154 | -0.034 | 71.61 | 1.93E-15 |
| | d1 | 0.049 | 0.047 | 0.003 | | | 0.051 | 0.047 | 0.004 | | |
| | d2 | 0.104 | 0.122 | -0.018 | | | 0.109 | 0.123 | -0.014 | | |
| | dd | 0.683 | 0.674 | 0.009 | | | 0.720 | 0.676 | 0.043 | | |
| β-alanine | s | 0.189 | 0.190 | -0.001 | 0.02 | 8.79E-1 | 0.146 | 0.187 | -0.041 | 41.51 | 1.17E-10 |
| | d | 0.811 | 0.810 | 0.001 | | | 0.854 | 0.813 | 0.041 | | |
| α-proline | s | 0.341 | 0.337 | 0.004 | 1.76 | 6.24E-1 | 0.294 | 0.333 | -0.040 | 66.28 | 2.67E-14 |
| | d1 | 0.329 | 0.320 | 0.009 | | | 0.352 | 0.321 | 0.031 | | |
| | d2 | 0.202 | 0.199 | 0.003 | | | 0.216 | 0.200 | 0.016 | | |
| | dd | 0.129 | 0.145 | -0.016 | | | 0.139 | 0.146 | -0.007 | | |
| β-proline | s | 0.648 | 0.632 | 0.016 | 0.47 | 4.93E-1 | 0.623 | 0.631 | -0.008 | 0.11 | 7.44E-01 |
| | d | 0.352 | 0.368 | -0.016 | | | 0.377 | 0.369 | 0.008 | | |
| γ-proline | s | 0.220 | 0.182 | 0.038 | 4.98 | 8.28E-2 | 0.164 | 0.178 | -0.014 | 2.61 | 2.71E-01 |
| | d | 0.706 | 0.725 | -0.019 | | | 0.757 | 0.729 | 0.028 | | |
| | t | 0.074 | 0.092 | -0.019 | | | 0.079 | 0.093 | -0.014 | | |
| δ-proline | s | 0.240 | 0.209 | 0.031 | 7.29 | 6.92E-3 | 0.185 | 0.205 | -0.020 | 2.95 | 8.57E-02 |
| | d | 0.760 | 0.791 | -0.031 | | | 0.815 | 0.795 | 0.020 | | |
| α-leucine | s | 0.254 | 0.218 | 0.036 | 249.39 | 8.85E-54 | 0.200 | 0.214 | -0.014 | 18.84 | 2.95E-04 |
| | d1 | 0.638 | 0.682 | -0.044 | | | 0.685 | 0.686 | -0.001 | | |
| | d2 | 0.032 | 0.026 | 0.006 | | | 0.034 | 0.026 | 0.007 | | |
| | dd | 0.076 | 0.074 | 0.002 | | | 0.081 | 0.074 | 0.007 | | |
| β-leucine | s | 0.837 | 0.830 | 0.007 | 0.07 | 9.67E-1 | 0.829 | 0.829 | 0.000 | 0.00 | 1.00E+0 |
| | d | 0.163 | 0.170 | -0.007 | | | 0.171 | 0.171 | 0.000 | | |
| | t | 0.000 | 0.000 | 0.000 | | | 0.000 | 0.000 | 0.000 | | |
| δ’-leucine | s | 0.253 | 0.216 | 0.037 | 12.96 | 3.17E-4 | 0.214 | 0.213 | 0.001 | 0.00 | 9.57E-1 |
| | d | 0.747 | 0.784 | -0.037 | | | 0.786 | 0.787 | -0.001 | | |
| δ’’-leucine | s | 0.899 | 0.889 | 0.010 | 0.25 | 6.15E-1 | 0.894 | 0.889 | 0.005 | 0.07 | 7.90E-1 |
| | d | 0.101 | 0.111 | -0.010 | | | 0.106 | 0.111 | -0.005 | | |
| α-serine | s | 0.188 | 0.162 | 0.026 | 70.08 | 4.11E-15 | 0.145 | 0.159 | -0.013 | 14.83 | 1.97E-03 |
| | d1 | 0.246 | 0.271 | -0.025 | | | 0.258 | 0.272 | -0.014 | | |
| | d2 | 0.068 | 0.070 | -0.002 | | | 0.071 | 0.070 | 0.001 | | |
| | dd | 0.498 | 0.497 | 0.001 | | | 0.525 | 0.499 | 0.026 | | |
| β-serine | s | 0.435 | 0.377 | 0.058 | 19.09 | 1.25E-05 | 0.405 | 0.375 | 0.030 | 4.97 | 2.58E-02 |
| | d | 0.565 | 0.623 | -0.058 | | | 0.595 | 0.625 | -0.030 | | |

Table 3 also shows the relative intensities that are extrapolated to their values after infinite labeling supply (see columns ‘corrected’) using Eq.9. The value filled in for Pf in these equations is 0.10, except for the carbon atoms that are indirectly derived from acetyl-coenzyme A (acetyl-CoA). These include the carbon atoms of glutamic acid and proline and the α-carbon of leucine. The lower value for Pf in these cases is caused by the influx of unlabeled ethanol from the feed into the acetyl-CoA pool. From the glucose consumption and biomass formation measurements it was determined that for each mole acetyl-CoA formed from (10% labeled) glucose, 0.38 mole acetyl-CoA was formed from (unlabeled) ethanol. This leads to a value for Pf of 1.00/1.38*0.10=0.072.

For all but four of the twenty multiplets shown, the correction yields extrapolated relative intensities of the early and late harvested biomass that are more similar than the uncorrected values (compare P-values). However, the corrected relative intensities of the early and late harvested biomass should theoretically be identical except for random measurement errors. This is the case for only ten of the twenty carbon atoms. Quite a number of relative intensities of the early harvested biomass are ‘over-corrected’, suggesting that the corresponding carbon atoms are in isotopic steady state somewhat earlier than expected. A tentative explanation is that these amino acids are not only labeled by growth of new, labeled biomass, but also by protein turnover. In that case the assumptions underlying the isotopic non-steady state correction do not strictly apply.
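The remaining fractions of naturally labeled biomass quoted earlier in this section (0.31 and 0.03) and the reduced value of Pf follow from simple arithmetic. The sketch below assumes ideal washout in a continuous culture at steady state, so that the fraction of biomass formed before the labeled feed started decays as exp(-D·t); the variable names are chosen for illustration only.

```python
import math

dilution_rate = 0.1            # h^-1
harvest_times = (11.8, 35.4)   # h of labeled substrate supply

# Fraction of the biomass that was formed before the labeled feed started,
# assuming ideal washout (growth rate equals dilution rate).
unlabeled_fraction = [math.exp(-dilution_rate * t) for t in harvest_times]

# Effective labeled fraction of acetyl-CoA: 1.00 mol formed from 10% labeled
# glucose for every 0.38 mol formed from unlabeled ethanol.
pf_glucose = 0.10
pf_acetyl_coa = 1.00 / (1.00 + 0.38) * pf_glucose
```

Rounded to two decimals the two fractions reproduce the values 0.31 and 0.03, and `pf_acetyl_coa` agrees with the value 0.072 given in the text.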


The four carbon atoms in Table 3 of which the corrected relative intensities are more different than the uncorrected values are the γ-carbon atom of glutamic acid, the α-carbon atom of proline and both the α- and β-carbon atom of alanine. The fact that the uncorrected multiplets of both alanine carbon atoms are more similar than the corrected ones seems to indicate that alanine reaches its isotopic steady state much earlier than the other amino acids. This cannot be explained by protein turnover, as this would affect all amino acids.

[Figure 5: two panels showing measured and fitted multiplets; horizontal axes: 13C-frequency (ppm), spanning 130.886 to 131.766 ppm and 127.532 to 127.652 ppm; vertical axes: signal size. Legend: measured, fitted, singlet, doublet 1, doublet 2, doublet 3, double doublet 1,2, double doublet 1,3, double doublet 2,3, quadruple doublet.]

FIGURE 5: Multiplets of δ- and ε-tyrosine fitted by means of the spectral analysis software tool. The separate peaks to the right of the multiplets are the remainders of the cut-off peaks (rescaled). The various symbols below the multiplet indicate the central peak positions of the singlet, three doublets, three double doublets and the quadruple doublet that are caused by three (two one-bond and one long-range) 13C-13C scalar couplings. The filled and open symbols of the same form indicate fine structures resulting from groups of isotopomers that only differ in their long-range 13C-13C scalar coupling.

An alternative explanation could be that alanine is rare in biomass protein and is relatively abundantly present in a free form in the cell. However, alanine is neither a rare component of S. cerevisiae biomass nor known to be present in the cell in a free form in very high concentrations. Therefore, the observation must have another explanation.

Additional labeling information
The 2D [13C,1H] COSY data of proteinogenic amino acids reported in literature are relative intensities of fine structures caused by one-bond 13C-13C couplings (Petersen et al., 2000; Schmidt et al., 1999; Szyperski, 1995). Fiaux et al. (1999) mentioned the detection of an additional fine structure (a quadruple doublet) in the multiplet of β-histidine caused by splitting of the double doublet by a long-range 13Cβ-13Cδ coupling in the histidine molecule. Besides the reported quadruple doublet, the long-range 13Cβ-13Cδ coupling may add an additional doublet and two double doublets to the multiplet by splitting the singlet and doublets that result from the one-bond 13Cβ-13Cα and 13Cβ-13Cγ couplings. Some of these fine structures are observed in the measured multiplet of the β-carbon of histidine (Fig.6).

Measured multiplets of the δ- and ε-tyrosine (Fig.5) and δ-histidine (Fig.6) carbons also show fine structures caused by long-range 13C-13C couplings. The multiplet of δ-tyrosine in Fig.5 shows not only the 13Cγ-13Cδ and 13Cδ-13Cε couplings, but also the long-range coupling with the ζ-carbon. In the multiplet of ε-tyrosine the 13Cε-13Cγ coupling is observed in addition to the commonly reported one-bond couplings with the δ- and ζ-carbons. Likewise, Fig.6 shows that the multiplet of δ-histidine does not only consist of a singlet and a doublet. Additional fine structures result from the fact that the multiplet shows not only the 13Cδ-13Cγ coupling, but also the long-range 13Cδ-13Cβ coupling. This is valuable information, since the γ-carbon itself is not proton bound, which makes its couplings with the β- and δ-carbons unobservable by 2D [13C,1H] COSY NMR.

[Figure 6: two panels; horizontal axes: 13C-frequency (ppm), spanning 64.941 to 65.747 ppm (left) and 127.838 to 127.926 ppm (right); vertical axes: signal size.]

FIGURE 6: Example of additionally observable peaks due to long-range 13C-13C scalar couplings in the multiplets of β- (left) and δ-histidine (right). (spectra from hydrolyzed biomass of P. chrysogenum, obtained by summing multiple sections of 2D [13C,1H] COSY multiplet).

The relative intensities in Tables 2 and 3 were obtained by taking into account the long-range 13C-13C couplings and fitting eight fine structures to the measured δ- and ε-tyrosine and β-histidine multiplets (see Fig.2) and four to the δ-histidine multiplet. Subsequently, the pairs of fine structures that are caused by the same one-bond 13C-13C couplings are summed. This method (method 1) is preferable to fitting only the fine structures that are expected from the one-bond 13C-13C couplings (method 2), since the latter will cause a misfit. Table 4 shows the differences between the outcomes of the two methods for the multiplets in Figs.5 and 6.

TABLE 4: Comparison of two methods (see text) for determining relative intensities in multiplets showing long-range 13C-13C-couplings. (β- and δ-histidine, ε-tyrosine: hydrolyzed biomass of P. chrysogenum, δ-tyrosine: hydrolyzed biomass of S. cerevisiae)

| Carbon atom | Fine structure | Method 1: separate peaks | Method 1: summed peaks (a) | Method 2 | difference | SSweighted | P |
|---|---|---|---|---|---|---|---|
| β-histidine | singlet | 0.136 | 0.169 | 0.150 | 0.019 | 37.511 | 3.59E-8 |
| | doublet 3 (b) | 0.033 | | | | | |
| | doublet 1 | 0.008 | 0.029 | 0.040 | -0.011 | | |
| | double doublet 13 (c) | 0.020 | | | | | |
| | doublet 2 | 0.239 | 0.296 | 0.260 | 0.036 | | |
| | double doublet 23 | 0.056 | | | | | |
| | double doublet 12 | 0.172 | 0.507 | 0.550 | -0.044 | | |
| | quadruple doublet (d) | 0.335 | | | | | |
| δ-histidine | singlet | 0.459 | 0.518 | 0.489 | 0.028 | 11.089 | 8.69E-4 |
| | doublet 2 | 0.058 | | | | | |
| | doublet 1 | 0.328 | 0.482 | 0.511 | -0.028 | | |
| | double doublet | 0.154 | | | | | |
| δ-tyrosine | singlet | 0.179 | 0.195 | 0.219 | -0.024 | 3.759 | 1.53E-1 |
| | doublet 3 | 0.016 | | | | | |
| | doublet 1 | 0.028 | 0.758 (e) | 0.760 | -0.002 | | |
| | double doublet 13 | 0.270 | | | | | |
| | doublet 2 | 0.461 | | | | | |
| | double doublet 23 | 0.000 | | | | | |
| | double doublet 12 | 0.035 | 0.047 | 0.021 | 0.025 | | |
| | quadruple doublet | 0.011 | | | | | |
| ε-tyrosine | singlet | 0.276 | 0.293 | 0.278 | 0.015 | 7.398 | 2.48E-2 |
| | doublet 3 | 0.018 | | | | | |
| | doublet 1 | 0.124 | 0.299 (e) | 0.309 | -0.010 | | |
| | double doublet 13 | 0.107 | | | | | |
| | doublet 2 | 0.069 | | | | | |
| | double doublet 23 | 0.000 | | | | | |
| | double doublet 12 | 0.383 | 0.408 | 0.413 | -0.005 | | |
| | quadruple doublet | 0.025 | | | | | |

Notes:
a Sum of relative intensities caused by same one-bond 13C-13C coupling.
b Doublet 3 is caused by long-range 13C-13C coupling.
c Double doublet 13 is caused by combination of coupling constants of doublets 1 and 3.
d Quadruple doublet is caused by combination of all three coupling constants.
e Due to identical one-bond 13C-13C couplings, doublets 1 and 2 and double doublets 13 and 23 (see Fig.5) are summed to yield a single relative area.

Evaluating the one-tailed probabilities in Table 4 (see appendix C) versus a minimal probability of 5% clearly shows that the differences between the outcomes of the two methods are significant for three carbon atoms. In the case of δ-tyrosine the difference is not significant. This is due to the relatively low signal-to-noise ratio for this compound (see Fig.5), which causes a large covariance matrix and thus a small weighted sum of squared residuals. Although the fit of method 2 does not yield significantly differing relative intensities for this carbon atom, the non-random nature of the residual spectrum clearly indicates a serious misfit which further contributes to the estimated noise. This serves to emphasize that the probability that is calculated according to Appendix C is only a good criterion for comparing two spectra when no systematic misfit is detected.

Many peaks of the multiplets in Figs.5 and 6 are ill resolved. Still, the computer aided spectral analysis tool is able to fit them and allocate relative intensities to them. The poor resolution is accounted for in the covariance matrices of the relative intensities.

As was outlined in the theory-section, the additionally observable long-range couplings result in a larger number of independently known isotopomer groups within a molecule. This yields more independent labeling data for subsequent flux analysis. Whereas the fine structures of α-, β- and δ-histidine that are reported in Tables 2 and 3 yield seven independent isotopomer measurements of the histidine molecule, the additionally observable 13C-13C couplings in the multiplets of the latter two carbon atoms yield six more independent measurement data. This means almost a doubling of the quantity of isotopomer information.
4.5 CONCLUSIONS
In this chapter we have shown how the Pearson VII function can be combined with a set of linear constraints on the positions of spectral peaks to yield a mathematical description of 2D [13C,1H] COSY spectra that contains few parameters and still fits experimental multiplets very well. A computational tool that was developed based on this mathematical description of multiplets allowed us to efficiently and accurately determine the relative peak areas of fine structures in experimental multiplets containing up to 27 partly overlapping peaks. It could be checked whether the peaks had been correctly assigned to the corresponding fine structures. Complete covariance matrices of the determined relative intensities were calculated based on the estimated measurement noise and an analytical sensitivity analysis of the relative intensities with respect to the noise.

It was checked whether peak intensities in a one-dimensional section along the 13C-frequency axis are representative for relative peak volumes in a 2D [13C,1H] COSY multiplet. Comparison of single sections and sums of multiple sections of eighteen multiplets showed that in eight cases (i.e. 44%) the relative intensities of single sections were not representative for the peak volumes.

It was demonstrated that 2D [13C,1H] COSY spectra of chloroform-soluble derivatives of amino acids have better resolved peaks due to the lower viscosity of chloroform when compared to water. Also, the estimated variance of the determined relative intensities was smaller for the chloroform-soluble derivatives. These findings can be used to improve the accuracy of determined peak areas in multiplets with many overlapping peaks.

A new method was proposed to enable isotopic non-steady state correction of relative fine structure intensities. Application of this correction to twenty multiplets of amino acids in biomass samples harvested at different times led to a significant improvement of sixteen (i.e. 80%) of the multiplets. Two of the multiplets that were not improved by the correction were of alanine, suggesting that this amino acid is in isotopic steady state earlier than the remaining amino acids. The cause of this phenomenon is unknown.

Finally, for some of the amino acid carbon atoms long-range 13C-13C scalar couplings were observed in 2D [13C,1H] COSY spectra. We demonstrated that the presence of additional peaks in a multiplet should be taken into account when fitting the multiplets. Neglecting the satellite peaks caused by a long-range 13C-13C scalar coupling led to significantly different results. It was shown how to calculate the number of additional independent labeling data that result from the observability of the long-range 13C-13C scalar couplings. In our example of histidine the number of independent labeling data increased from the previously reported seven to thirteen.

APPENDIX A:

CALCULATION OF RELATIVE INTENSITY COVARIANCES FROM SPECTRAL NOISE VARIANCE
The relative intensities of a multiplet are the areas of the separate fine structures that are normalized with respect to the total spectral area. It is assumed that the covariances of the determined relative intensities are solely caused by the spectral noise. Fine structure areas (A_i) are determined by finding the parameters β = (w, h, (ω13C)max, p)^T of the spectral model of Eq.1 that optimally fit the measured multiplet and calculating:

$$ A_i = c \cdot m_i \cdot w_i \cdot h_i \tag{A1} $$

In this equation mi is the multiplicity of the peaks in fine structure i. The value of constant c in the equation depends on the lineshape of the peak and thus on the power p in Eq.1. The covariances of the spectral parameters (Cβ) can be calculated from the spectral noise variance (σ2, see Eq.3) by linearizing the fitted multiplet (sum of Eqs.1 applied to all peaks) around the optimally fitted parameters and calculating:

$$ C_\beta = \sigma^2 \cdot \left( J_{S_{fit},\beta}^{T} \cdot J_{S_{fit},\beta} \right)^{-1} \tag{A2} $$

The columns of the Jacobian JSfit,β in Eq.A2 can be analytically derived from Eq.1:

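$$ \frac{\partial S_{fit,i}}{\partial w_i} = \sum_{k=1}^{m_i} \frac{Z_{i,k}}{w_i} \cdot \frac{2 \cdot Y_{i,k}}{\frac{Y_{i,k}}{p} + 1} \tag{A3a} $$

$$ \frac{\partial S_{fit,i}}{\partial h_i} = \sum_{k=1}^{m_i} \frac{Z_{i,k}}{h_i} \tag{A3b} $$

$$ \frac{\partial S_{fit,i}}{\partial \omega^{13C}_{max,i,k}} = \frac{Z_{i,k}}{\frac{Y_{i,k}}{p} + 1} \cdot \frac{2 \cdot Y_{i,k}}{\omega^{13C} - \omega^{13C}_{max,i,k}} \tag{A3c} $$

$$ \frac{\partial S_{fit}}{\partial p} = \sum_{i=1}^{F} \sum_{k=1}^{m_i} -Z_{i,k} \cdot \left( \ln\left( \frac{Y_{i,k}}{p} + 1 \right) - \frac{Y_{i,k}}{p \cdot \left( \frac{Y_{i,k}}{p} + 1 \right)} \right) \tag{A3d} $$

where

$$ Y_{i,k} = \left( \frac{\omega^{13C} - \omega^{13C}_{max,i,k}}{w_i} \right)^{2} \quad \text{and} \quad Z_{i,k} = \frac{h_i}{\left( \frac{Y_{i,k}}{p} + 1 \right)^{p}} $$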

In Eqs.A3, the subscript ‘i’ refers to the ith fine structure and ‘k’ refers to the kth peak of a fine structure. F is the total number of fine structures in a multiplet. The covariance matrix of the fine structure areas (CA) follows from Cβ by two-sided multiplication with a matrix that contains the partial derivatives of A with respect to β:

$$C_A = J_{A,\beta} \cdot C_\beta \cdot J_{A,\beta}^{T} \tag{A4}$$

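For a multiplet consisting of a singlet and doublet JA,β is given by:

$$J_{A,\beta} = c \cdot \begin{pmatrix} m_s \cdot h_s & 0 & m_s \cdot w_s & 0 & \varnothing & \varnothing \\ 0 & m_d \cdot h_d & 0 & m_d \cdot w_d & \varnothing & \varnothing \end{pmatrix} \tag{A5}$$

The covariances of the relative intensities (Cr) are derived from CA in a similar manner:

$$C_r = J_{r,A} \cdot C_A \cdot J_{r,A}^{T} \tag{A6}$$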

Jacobian Jr,A contains the sensitivities of the relative intensities to the fine structure areas. For a multiplet consisting of only a singlet and a doublet fine structure Jr,A can be derived to be (Chapter 2 of this thesis):

$$J_{r,A} = \frac{1}{A_{tot}^2} \cdot \left( \begin{pmatrix} A_{tot} & 0 \\ 0 & A_{tot} \end{pmatrix} - \begin{pmatrix} A_s & A_s \\ A_d & A_d \end{pmatrix} \right) \tag{A7}$$
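The propagation of Eqs.A6 and A7 can be sketched in a few lines of Python. This is only an illustration, not the code used for the thesis: the function name, the example areas and the example covariances are hypothetical, and the Jacobian is written in the form J[i][j] = (δij·Atot − Ai)/Atot², which reduces to Eq.A7 for a singlet and a doublet.

```python
def relative_intensity_covariance(areas, cov_areas):
    """Propagate the covariance of fine structure areas to relative intensities.

    Computes C_r = J * C_A * J^T (Eq.A6), where J is the Jacobian of
    r_i = A_i / A_tot: J[i][j] = (delta_ij * A_tot - A_i) / A_tot**2.
    """
    n = len(areas)
    a_tot = sum(areas)
    jac = [[((a_tot if i == j else 0.0) - areas[i]) / a_tot**2
            for j in range(n)] for i in range(n)]
    # tmp = J * C_A
    tmp = [[sum(jac[i][k] * cov_areas[k][j] for k in range(n))
            for j in range(n)] for i in range(n)]
    # C_r = tmp * J^T
    return [[sum(tmp[i][k] * jac[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


# Hypothetical singlet and doublet areas with hypothetical covariances.
areas = [1.0, 2.0]
cov_areas = [[4.0e-8, 1.0e-8],
             [1.0e-8, 9.0e-8]]
cov_rel = relative_intensity_covariance(areas, cov_areas)
```

For two fine structures the rows of the Jacobian are each other's negatives, so the two relative intensities come out with equal variances and a covariance of the opposite sign.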

Substituting Eqs.A1, A4, A5 and A7 in Eq.A6 one finds that constant c cancels from the final solution, so the value of this constant need not be known.

Examples of covariances of peak areas and relative intensities

From the above it is clear that the covariances of the relative intensities (Cr) partly stem from correlation of the determined peak areas (accounted for in CA) and partly from the normalization of absolute peak intensities to relative values. The contribution of these two factors to the matrix Cr is illustrated in the following two examples in which CA* and Cr* are the correlation matrices corresponding to CA and Cr.

I) Spectrum of the β-serine carbon atom consisting of a non-overlapping singlet (area 1) and doublet (area 2):

$$C_A = 10^{-8} \cdot \begin{pmatrix} 9.387 & 4.412 \\ & 12.057 \end{pmatrix} \qquad C_A^* = \begin{pmatrix} 1.000 & 0.415 \\ & 1.000 \end{pmatrix}$$

$$C_r = 10^{-5} \cdot \begin{pmatrix} 2.177 & -2.177 \\ & 2.177 \end{pmatrix} \qquad C_r^* = \begin{pmatrix} 1.000 & -1.000 \\ & 1.000 \end{pmatrix}$$

The correlation matrix CA* shows that the errors in the peak surfaces are correlated although the peaks do not overlap. These correlations stem from the single peak form (parameter ‘p’ in Eq.1) that is estimated for all fine structures in a multiplet and from the correlations between the positions of the various peaks (see Eq.2). Matrix Cr* shows total (negative) correlation of the errors of the relative intensities of the singlet and the doublet, which is caused by the normalization of the two fine structure surfaces.

II) Spectrum of the β-tyrosine carbon atom consisting of an overlapping singlet (area 1) and middle two double doublet peaks (1/2*area 3), a doublet (area 2, consisting of two fully overlapping doublets) and two separated outer double doublet peaks (1/2*area 3):

$$C_A = 10^{-5} \cdot \begin{pmatrix} 0.800 & 0.709 & -0.531 \\ & 4.451 & 1.760 \\ & & 4.657 \end{pmatrix} \qquad C_A^* = \begin{pmatrix} 1.000 & 0.376 & -0.276 \\ & 1.000 & 0.387 \\ & & 1.000 \end{pmatrix}$$

$$C_r = 10^{-4} \cdot \begin{pmatrix} 0.668 & 0.436 & -1.104 \\ & 0.927 & -1.363 \\ & & 2.467 \end{pmatrix} \qquad C_r^* = \begin{pmatrix} 1.000 & 0.554 & -0.860 \\ & 1.000 & -0.901 \\ & & 1.000 \end{pmatrix}$$

In the correlation matrix CA* one sees that the errors of the non-overlapping singlet and doublet areas and the doublet and double doublet areas are somewhat positively correlated for reasons mentioned above. The errors of the singlet and double doublet areas on the other hand are negatively correlated (element (CA*)1,3) due to the partial overlap between these fine structures. Comparison of CA* and Cr* again shows that normalization of the fine structure areas leads to larger correlations of the errors. It also shows that positively correlated errors of the peak surfaces do not necessarily lead to positively correlated errors of the relative intensities (see elements (CA*)2,3 and (Cr*)2,3). This can be understood by recognizing that an overestimation of the absolute areas of a large and a small peak in a multiplet may lead to a decrease of the estimated relative area of the large peak and an increase of the relative area of the small one.

APPENDIX B: ISOTOPIC NON-STEADY STATE CORRECTION OF 2D [13C,1H] COSY SPECTRAL DATA

Sections of the 2D [13C,1H] COSY spectrum yield one-dimensional multiplets for the various carbon atoms of biomass components. The relative intensity xi,f of a fine structure ‘f’ in the multiplet of the ith carbon atom of biomass component X can be calculated from the fractional enrichment xi and from the sum of the fractions of the isotopomers that give rise to the fine structure concerned:

$$x_{i,f}(t) = \frac{\sum \left( x_{i,bin}(t) \right)}{x_i(t)} \tag{B1}$$

The numerator in the right-hand side of Eq.B1 is the sum of the isotopomers. The subscript ‘bin’ denotes the ‘binary isotopomer notation’ as introduced by Schmidt et al. (1997). For example, for the singlet (‘s’) and doublet (‘d’) of the third (β) carbon of alanine, Eq.B1 reads:

$$ala_{3,s}(t) = \frac{\sum ala_{?01}(t)}{ala_3(t)}$$

$$ala_{3,d}(t) = \frac{\sum ala_{?11}(t)}{ala_3(t)}$$

In the equations above a question mark in a binary subscript denotes that both zero and one are allowed. Applying the isotopic non-steady state correction (Eq.4) to both the numerator and denominator of Eq.B1 yields:

$$x_{i,f}(\infty) = \frac{\sum \left( x_{i,bin}(t) \right) - e^{-\mu \cdot t} \cdot \sum \left( x_{i,bin}(0) \right)}{1 - e^{-\mu \cdot t}} \cdot \frac{1 - e^{-\mu \cdot t}}{x_i(t) - e^{-\mu \cdot t} \cdot P_n} = \frac{x_{i,f}(t) \cdot x_i(t) - x_{i,f}(0) \cdot e^{-\mu \cdot t} \cdot P_n}{x_i(t) - e^{-\mu \cdot t} \cdot P_n} \tag{B2}$$
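The right-hand side of Eq.B2 is straightforward to evaluate. The following sketch is illustrative only: the function and argument names are hypothetical, and Pn is read here as the fractional enrichment at t = 0, which is an assumption because its definition lies outside this appendix.

```python
import math


def steady_state_relative_intensity(x_if_t, x_i_t, x_if_0, p_n, mu, t):
    """Evaluate the final expression of Eq.B2.

    x_if_t -- relative intensity of fine structure f measured at time t
    x_i_t  -- fractional enrichment of carbon i at time t
    x_if_0 -- relative intensity of fine structure f at t = 0
    p_n    -- P_n in Eq.B2 (assumed: fractional enrichment at t = 0)
    mu     -- rate constant in the exponent of Eq.B2
    t      -- sampling time, in units consistent with mu
    """
    decay = math.exp(-mu * t)  # weight of the t = 0 contribution
    numerator = x_if_t * x_i_t - x_if_0 * decay * p_n
    denominator = x_i_t - decay * p_n
    return numerator / denominator


# Hypothetical measurement values, for illustration only.
x_inf = steady_state_relative_intensity(0.30, 0.10, 0.60, 0.011, 0.1, 20.0)
```

Two limiting cases follow directly from the formula: for large µ·t the correction vanishes and the measured value is returned, and when the relative intensity at t equals the one at t = 0 the corrected value equals that common value.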

APPENDIX C: SIGNIFICANT DEVIATION OF RELATIVE INTENSITIES

Whether two vectors of relative intensities (r) differ significantly can be judged on the basis of the residual vector, i.e. the difference of the two vectors, and of the covariance matrix of this residual vector. This covariance matrix is simply the sum of the covariance matrices (Cr) of the subtracted vectors. The covariance weighted sum of squares (SSweighted) of the elements of the residual is:

$$SS_{weighted} = (r_1 - r_2)^{T} \cdot (C_{r,1} + C_{r,2})^{\#} \cdot (r_1 - r_2) \tag{C1}$$

In Eq.C1 the symbol ‘#’ represents the generalized pseudo-inverse. The use of the pseudoinverse implicitly accounts for the dependence that stems from the fact that the sums of the elements of both subtracted vectors equal one. SSweighted has a χ2-distribution in case the covariances are known. In our case the covariances are estimated from the spectral noise (Appendix A) but it was found by means of Monte Carlo simulation that the χ2-distribution describes the true distribution of SSweighted well. The number of degrees of freedom of the χ2-distribution of the weighted sum of squares equals the number of relative intensities of a multiplet (F) minus one (due to discussed dependency of residual elements). Thus, the significance of the deviation between two relative intensity vectors can be tested by evaluating the one-tailed probability P that a given deviation would be fortuitously found:

$$P\left( SS_{weighted} \le \chi^2(F-1) \right) < \alpha \tag{C2}$$
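A minimal sketch of the test of Eqs.C1 and C2 is given below. It assumes NumPy is available for the pseudo-inverse; the function names, the cut-off passed to `numpy.linalg.pinv` and the example numbers are hypothetical choices, not taken from the thesis. The χ2 tail probability for an integer number of degrees of freedom is computed with the standard recurrence for the regularized upper incomplete gamma function.

```python
import math

import numpy as np


def chi2_sf(x, dof):
    """Upper tail probability of a chi-square variable with integer dof.

    Uses Q(1/2, y) = erfc(sqrt(y)), Q(1, y) = exp(-y) and the recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1), with y = x / 2.
    """
    y = x / 2.0
    if dof % 2:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    while a < dof / 2.0:
        q += y**a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def deviation_test(r1, r2, cov1, cov2):
    """Return SS_weighted (Eq.C1) and its one-tailed probability (Eq.C2)."""
    d = np.asarray(r1, dtype=float) - np.asarray(r2, dtype=float)
    cov_sum = np.asarray(cov1, dtype=float) + np.asarray(cov2, dtype=float)
    # The relative cut-off 1e-10 for small singular values is an assumption.
    ss = float(d @ np.linalg.pinv(cov_sum, 1e-10) @ d)
    return ss, chi2_sf(ss, len(d) - 1)


# Hypothetical singlet/doublet multiplet measured twice.
cov = [[1e-4, -1e-4], [-1e-4, 1e-4]]
ss, p = deviation_test([0.5, 0.5], [0.4, 0.6], cov, cov)
significant = p < 0.05  # alpha = 0.05 is an arbitrary illustrative choice
```

The pseudo-inverse is needed because each covariance matrix of relative intensities is singular: its rows sum to zero, as the elements of every relative intensity vector sum to one.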

REFERENCES

Bartels, C., Güntert, P., Billeter, M., Wüthrich, K. (1997) GARANT - A general algorithm for resonance assignment of multidimensional nuclear magnetic resonance spectra. J. Comp. Chem., 18, 1: 139-149
Bax, A., Pochapsky, S.S. (1992) Optimized recording of heteronuclear multidimensional NMR spectra using pulsed field gradients. J. Magn. Res., 99: 638-643
Brenner, M., Müller, H.R., Pfister, R.W. (1950) A new enzymic synthesis. Helv. Chim. Acta, 33: 568
Conny, J.M., Powell, C.J. (2000) Standard test data for estimating peak parameter errors in X-ray photoelectron spectroscopy. Surf. Interface Anal., 29: 856-872
Coulter, J.R., Hahn, C.S. (1968) Practical quantitative gas chromatographic analysis of amino acids using the n-propyl N-acetyl esters. J. Chromatogr., 36: 42

De Graaf, A.A., Striegel, K., Wittig, R.M., Laufer, B., Schmitz, G., Wiechert, W., Sprenger, G.A., Sahm, H. (1999) Metabolic state of Zymomonas mobilis in glucose-, fructose-, and xylose-fed continuous cultures as analysed by 13C- and 31P-NMR spectroscopy. Arch. Microbiol., 171: 371-385
Fiaux, J., Andersson, C.I.J., Holmberg, N., Bülow, L., Kallio, P.T., Szyperski, T., Bailey, J.E., Wüthrich, K. (1999) 13C NMR flux ratio analysis of Escherichia coli central carbon metabolism in microaerobic bioprocesses. JACS, 121: 1407-1408
Ge, W., Lee, H.K., Nalcioglu, O. (1993) Simultaneous nonlinear least squares fitting technique for NMR spectroscopy. IEEE-Nucl. Sci. Symp. Med. Imaging Conf. 2, San Francisco, USA, 1322-1326
Günther, U.L., Ludwig, C., Rüterjans, H. (2000) NMRLAB-advanced NMR data processing in MATLAB. J. Magn. Res., 145: 201-208
Hansen, P.E. (1988) Isotope effects in nuclear shielding. Prog. NMR Spectrosc., 20: 207-255
Jucker, B.M., Lee, J.Y., Shulman, R.G. (1998) In vivo 13C NMR measurements of hepatocellular tricarboxylic acid cycle flux. J. Biol. Chem., 273, 20: 12187-12194
Malloy, C.R., Sherry, A.D., Jeffrey, F.M. (1987) Carbon flux through citric acid cycle pathways in perfused heart by 13C NMR spectroscopy. FEBS Letters, 212, 1: 58-62
Marshall, I., Bruce, S.D., Higinbotham, J., MacLullich, Al., Wardlaw, J.M., Ferguson, K.J., Seckl, J. (2000) Choice of spectroscopic lineshape model affects metabolite peak areas and area ratios. Magn. Res. Med., 44: 646-649
Marx, A., De Graaf, A.A., Wiechert, W., Eggeling, L., Sahm, H. (1996) Determination of the fluxes in the central metabolism of Corynebacterium glutamicum by nuclear magnetic resonance spectroscopy combined with metabolic balancing. Biotechnol. Bioeng., 49, 2: 111-129
Möllney, M., Wiechert, W., Kownatzki, D., De Graaf, A.A. (1999) Bidirectional reaction steps in metabolic networks: IV. Optimal design of isotopomer labeling experiments. Biotechnol. Bioeng., 66, 2: 86-103
Petersen, S., De Graaf, A.A., Eggeling, L., Möllney, M., Wiechert, W., Sahm, H. (2000) In vivo quantification of parallel and bidirectional fluxes in the anaplerosis of Corynebacterium glutamicum. J. Biol. Chem., 275, 46: 35932-35941
Portais, J.-C., Schuster, R., Merle, M., Canioni, P. (1993) Metabolic flux determination in c6 glioma cells using carbon-13 distribution upon [1-13C] glucose incubation. Eur. J. Biochem., 217: 457-468
Rizzi, M., Baltes, M., Theobald, U., Reuss, M. (1997) In vivo analysis of metabolic dynamics in Saccharomyces cerevisiae: II. Mathematical model. Biotechnol. Bioeng., 55, 4: 592-608
Sauer, U., Lasko, D.R., Fiaux, J., Hochuli, M., Glaser, R., Szyperski, T., Wüthrich, K., Bailey, J.E. (1999) Metabolic flux ratio analysis of genetic and environmental modulations of Escherichia coli central carbon metabolism. J. Bacteriol., 181, 21: 6679-6688
Schmidt, K., Carlsen, M., Nielsen, J., Villadsen, J. (1997) Modeling isotopomer distributions in metabolic networks using isotopomer mapping matrices. Biotechnol. Bioeng., 55, 6: 831-840
Schmidt, K., Nørregaard, L.C., Pedersen, B., Meissner, A., Duus, J.Ø., Nielsen, J.Ø., Villadsen, J. (1999) Quantification of intracellular metabolic fluxes from fractional enrichment and 13C-13C coupling constraints on the isotopomer distribution in labeled biomass components. Metabol. Eng., 1: 166-179
Sonntag, K., Eggeling, L., De Graaf, A.A., Sahm, H. (1993) Flux partitioning in the split pathway of lysine synthesis in Corynebacterium glutamicum. Eur. J. Biochem., 213: 1325-1331
Stephanopoulos, G.N., Aristidou, A.A., Nielsen, J. (1998) ‘Metabolic Engineering. Principles and Methodologies.’ Academic Press, San Diego

Stephenson, D.S., Binsch, G. (1980) Automated analysis of high-resolution NMR spectra. I. Principles and computational strategy. J. Magn. Res., 37: 395-407
Subhash, N., Mohanan, C.N. (1997) Curve-fit analysis of chlorophyll fluorescence spectra: application to nutrient stress detection in sunflower. Remote Sens. Environm., 60: 347-356
Szyperski, T. (1995) Biosynthetically directed fractional 13C-labeling of proteinogenic amino acids. An efficient analytical tool to investigate intermediary metabolism. Eur. J. Biochem., 232: 433-448
Szyperski, T., Bailey, J.E., Wüthrich, K. (1996) Detecting and dissecting metabolic fluxes using biosynthetic fractional 13C labeling and two-dimensional NMR spectroscopy. TIBTECH, 14: 453-459
Szyperski, T. (1998) 13C-NMR, MS and metabolic flux balancing in biotechnology research. Quart. Rev. Biophys., 31, 1: 41-106
Szyperski, T., Glaser, R.W., Hochuli, M., Fiaux, J., Sauer, U., Bailey, J.E., Wüthrich, K. (1999) Bioreaction network topology and metabolic flux ratio analysis by biosynthetic fractional 13C labeling and two-dimensional NMR spectroscopy. Metabol. Eng., 1, 3: 189-197
Ter Schure, E.G., Silljé, H.H.W., Verkleij, A.J., Boonstra, J., Verrips, C.T. (1995) The concentration of ammonia regulates nitrogen metabolism in Saccharomyces cerevisiae. J. Bacteriol., 177, 22: 6672-6675
Theobald, U., Mailinger, W., Baltes, M., Rizzi, M., Reuss, M. (1997) In vivo analysis of metabolic dynamics in Saccharomyces cerevisiae: I. Experimental observations. Biotechnol. Bioeng., 55, 2: 305-316
Tran-Dinh, S., Bouet, F., Huynh, Q.-T., Herve, M. (1996) Mathematical models for determining metabolic fluxes through the citric acid and the glyoxylate cycles in Saccharomyces cerevisiae by 13C-NMR spectroscopy. Eur. J. Biochem., 242: 770-778
Van Gulik, De Laat, W.T.A.M., Vinke, J.L., Heijnen, J.J. (2000) Application of metabolic flux analysis for the identification of metabolic bottlenecks in the biosynthesis of penicillin-G. Biotechnol. Bioeng., 68, 6: 602-618
Van Winden, W.A., Schipper, D., Verheijen, P.J.T., Heijnen, J.J. (2001) Innovations in generation and analysis of 2D [13C,1H] COSY NMR spectra for metabolic flux analysis purposes. Metabol. Eng., 3, 4: 322-343
Vaseghi, S., Baumeister, A., Rizzi, M., Reuss, M. (1999) In vivo dynamics of the pentose phosphate pathway in Saccharomyces cerevisiae. Metabol. Eng., 1, 2: 128-140
Wittig, R., Möllney, M., Wiechert, W., De Graaf, A.A. (1995) Interactive evaluation of NMR spectra from in vivo isotope labelling experiments. IFAC Comp. Appl. Biotechnol., Garmisch-Partenkirchen, Germany, 230-233

Chapter 5

Cumulative Bondomers: A New Concept in Flux Analysis from 2D [13C,1H] COSY Data

This chapter is accepted for publication as Van Winden et al. (2002)

ABSTRACT

A well established way of determining metabolic fluxes is to measure 2D [13C,1H] COSY spectra of components of biomass grown on uniformly 13C-labeled carbon sources. When using the entire set of measured data to simultaneously determine all fluxes in a proposed metabolic network model, the 13C-labeling distribution in all measured compounds has to be simulated. This requires very large sets of isotopomer or cumomer balances. This chapter introduces the new concept of bondomers: entities that only vary in the numbers and positions of C-C bonds that have remained intact since the medium substrate molecule entered the metabolism. Bondomers are shown to have many analogies to isotopomers. One of these is that bondomers can be transformed to cumulative bondomers, just like isotopomers can be transformed to cumomers. Similarly to cumomers, cumulative bondomers allow an analytical solution of the entire set of balances describing a metabolic network. The main difference is that cumulative bondomer models are considerably smaller than corresponding cumomer models. This saves computational time, allows easier identifiability analysis and yields new insights into the information content of 2D [13C,1H] COSY data. We illustrate the theoretical concepts by means of a realistic example of the glycolytic and pentose phosphate pathways. The combinations of 2D [13C,1H] COSY data that allow the identification of all metabolic fluxes in these pathways are analyzed, and it is found that the NMR data contain less independent information than was previously expected; there is much redundancy.

5.1 INTRODUCTION

The 13C-labeling technique has become an important tool in the analysis of metabolic fluxes in complex networks consisting of cyclic or parallel reaction pathways or bidirectional reactions. In this application of the 13C-labeling technique microorganisms are grown on 13C-labeled substrates, which results in the incorporation of the carbon isotopes at various positions in the intracellular carbon compounds. The distribution of the 13C-labeling in these compounds is measured using either mass spectrometry (MS) or nuclear magnetic resonance spectroscopy (NMR). Chemically identical compounds that only vary in the number and positions of incorporated 13C-atoms have been defined as ‘isotopomers’. Most NMR and MS methods only yield the relative abundance of groups of isotopomers and not of separate isotopomers. This is especially the case for larger biomolecules. Recently, De Graaf et al. (2000) proposed a new NMR method that allows the measurement of all separate isotopomers in metabolites having a backbone of up to four carbon atoms.

Szyperski (1995) proposed a method to determine metabolic fluxes from 2D [13C,1H] COSY spectra of proteinogenic amino acids in biomass grown on a mixture of uniformly labeled and naturally labeled substrate (the carbon substrate in the feed medium is from here on referred to as ‘medium substrate’ to distinguish it from the substrate of any reaction in the metabolism). Szyperski introduced probabilistic equations to calculate the fractions of intact fragments stemming from a single medium substrate molecule from the relative intensities of multiplets in 2D [13C,1H] COSY spectra. The ratios of fluxes entering specific metabolite pools can be determined from the fractions of intact fragments by employing a method named ‘METAFoR’ (metabolic flux ratio analysis) (Szyperski,1995; Szyperski et al.,1996; Sauer et al.,1999; Maaheimo et al.,2001).
METAFoR is a local approach that can be applied to determine the flux ratio of a specific metabolic node of interest. When studying a specific flux ratio, this restricts the amount of 13C-labeling data that is to be measured because only the 13C-labeling patterns of the metabolites surrounding the node need to be known. Moreover, the analyzed flux ratio is not adversely affected by possible errors in other parts of the metabolic network model. On the other hand, METAFoR is a less appropriate approach for global flux analysis because it does not simultaneously fit all intracellular fluxes to the entire set of available labeling data, which impedes reconciliation of (redundant) 13C-labeling information.

This disadvantage is overcome by an alternative approach that allows global flux balancing, which was used to determine fluxes from 2D [13C,1H] COSY data by Schmidt et al. (1999), Petersen et al. (2000) and Dauner et al. (2001). They simulated full isotopomer distributions by simultaneously solving the mass balances and the isotopomer balances (Schmidt et al.,1997) or the cumomer balances (Wiechert et al.,1999). From the simulated isotopomer distributions they calculated the relative intensities of multiplets in 2D [13C,1H] COSY spectra corresponding to the actually measured data (Möllney et al.,1999). The metabolic fluxes were determined by searching for the fluxes yielding simulated data that were most similar to the measured data.

Simulating the full isotopomer distribution and summing the subsets of isotopomers corresponding to the measured data is in fact a detour. This method is computationally costly and, in contrast to Szyperski’s (1995,1996) approach, it does not yield information about carbon-carbon (from here on: ‘C-C’) connectivities in the metabolites. Data about C-C connectivities have the advantage that they can intuitively be interpreted, because they connect to the well-known carbon transitions in metabolic reactions.

This chapter introduces a new method for determining intracellular fluxes from 2D [13C,1H] COSY data, which improves upon the above two methods while combining their advantages. The method is based on simulation of so-called ‘bondomers’. The use of bondomers is constrained to flux analysis of cells grown on uniformly 13C-labeled carbon sources. In case of multiple substrates they must all be uniformly 13C-labeled to the same extent. Bondomers are closely related to Szyperski’s (1995) fractions of intact fragments that stem from a single medium substrate molecule and therefore they have the appealing property of being intuitively interpretable. Modeling in terms of bondomers will be shown to lead to at least two (but often many more) times fewer balances than isotopomer/cumomer models. Still, it allows the simultaneous fit and reconciliation of all available 2D [13C,1H] COSY data. Finally, the compactness of the set of simulated bondomers when compared to isotopomers will be shown to advantageously reduce the mathematical complexity of the identifiability analysis of metabolic fluxes from 2D [13C,1H] COSY data (cf. Chapter 3 of this thesis).

5.2 THEORY

An overview of all the theoretical concepts used in this theory section, including their mutual relationships and literature references, is presented in Table 1.

Bondomers and bondomer balances

The data obtained from a 2D [13C,1H] COSY spectrum of a specific carbon atom in a given molecule are relative intensities of fine structures in a multiplet.
These data give the relative amounts of groups of isotopomers in which the observed atom is 13C-labeled and the adjacent carbon atoms in the carbon backbone are either 13C-labeled or not (Szyperski,1995). If the compound (or fragment thereof) of which the labeling pattern is measured was synthesized by a microorganism growing on a mixture of uniformly 13C-labeled and naturally labeled carbon substrate, the occurrence of 13C-atoms on two adjacent carbon positions may have several causes. If the fragment containing the 13C-atoms stems from a single medium substrate molecule, it may have been either a uniformly labeled molecule or (less likely so) an unlabeled molecule in which the adjacent carbons are fortuitously labeled by naturally occurring 13C. Alternatively, if the C-C bond between the adjacent 13C-atoms was formed in one of the reactions leading to the observed fragment then the adjacent 13C-atoms can both stem from either a uniformly labeled or from a naturally labeled medium substrate molecule.

Szyperski (1995) showed how fractions of molecules or fragments thereof that stem from one or more medium substrate molecules can be calculated from the relative intensities of NMR fine structures. This was done using so-called ‘probability equations’ that require as input the fraction of uniformly 13C-labeled medium substrate and the fraction of naturally 13C-labeled carbon. The molecules (or fragments thereof) that stem from one or more medium substrate molecules and that are calculated using the probability equations are both chemically and physically identical. They only vary in the numbers and positions of C-C bonds that have remained intact since the medium substrate molecule entered the metabolism. These entities are hereby defined as ‘bondomers’.

Bondomers of a given compound are denoted by an abbreviation of the compound (see abbreviation list at p.1) and a binary subscript. Whereas in isotopomer notation the binary subscript ‘0’ denotes a 12C-atom and a ‘1’ a 13C-atom (Schmidt et al.,1997), in bondomer notation ‘0’ denotes a C-C bond that has been formed in one of the metabolic reactions in a cell and ‘1’ denotes a C-C bond that was already present in the medium substrate molecule. For example, bondomer e4p011 represents an e4p molecule in which only the first C-C bond has been newly formed in the metabolism. In other words, the fragment consisting of the second, third and fourth carbon atoms of e4p stems from one and the same medium substrate molecule, whereas the first carbon atom stems from another molecule.

TABLE 1: Mapping matrices, distribution vectors and their relations

abbreviation | meaning | mutual relations, references
AMM | atom mapping matrix | AMM (Zupke and Stephanopoulos,1994) is related to CCMM by the reaction mechanism.
BMM | bondomer mapping matrix | BMM (this chapter) is calculated from CCMM using the same algorithm by Schmidt et al. (1997) that calculates IMM from AMM. BMM is used to map bdvs in a bondomer balance (Eq.2) and to map cbdv in a cumulative bondomer balance (Eq.11).
CCMM | C-C bond mapping matrix | CCMM (this chapter) is related to AMM by the reaction mechanism. CCMM is used to map CCV in a C-C bond balance (Eq.1).
IMM | isotopomer mapping matrix | IMM is calculated from AMM using the algorithm by Schmidt et al. (1997). IMM is used to map idvs in an isotopomer balance (Schmidt et al.,1997) and to map cdv in a cumomer balance (Wiechert et al.,1999).
bdv | bondomer distribution vector | bdv (this chapter) is mapped by BMM in a bondomer balance (Eq.2). bdv is converted to cbdv (Eq.3) by the same T-matrix (Wiechert et al.,1999) that interconverts idv and cdv. bdv is interconverted into riv using probability equations (Eq.6 and (Szyperski,1995)).
cbdv | cumulative bondomer distribution vector | cbdv (this chapter) is mapped by BMM in a cumulative bondomer balance (Eq.11). cbdv is converted to bdv (Eq.3) by the same T-matrix (Wiechert et al.,1999) that interconverts idv and cdv.
cdv | cumomer distribution vector | cdv (Wiechert et al.,1999) is mapped by IMM in a cumomer balance. cdv is converted to idv by T-matrix (Wiechert et al.,1999).
CCV | C-C bond vector | CCV (this chapter) is mapped by CCMM in a C-C bond balance (Eq.1).
idv | isotopomer distribution vector | idv (Schmidt et al.,1997) is mapped by IMM in an isotopomer balance. idv is converted to cdv by T-matrix (Wiechert et al.,1999).
riv | relative intensity vector | riv (this chapter, subset of ‘measurement data vector’ in Möllney et al. (1999)) is converted to bdv using probability equations (Eq.6 and (Szyperski,1995)).
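The binary bondomer notation introduced above can be made concrete with a short Python sketch. The helper names are hypothetical and the sketch assumes a backbone without closed carbon rings, so that n carbon atoms are joined by n-1 C-C bonds and the compound has 2^(n-1) bondomer subscripts.

```python
from itertools import product


def bondomer_labels(n_carbons):
    """All binary bond subscripts of an n-carbon backbone with n - 1 C-C bonds."""
    return ["".join(bits) for bits in product("01", repeat=n_carbons - 1)]


def intact_fragments(bond_bits):
    """Split carbons 1..n into fragments joined by retained ('1') C-C bonds."""
    fragments, current = [], [1]
    for position, bit in enumerate(bond_bits, start=1):
        if bit == "1":   # bond between carbon `position` and the next one is retained
            current.append(position + 1)
        else:            # newly formed bond: the next carbon starts a new fragment
            fragments.append(current)
            current = [position + 1]
    fragments.append(current)
    return fragments


e4p_labels = bondomer_labels(4)          # subscripts 000 ... 111
e4p_011 = intact_fragments("011")        # fragments of bondomer e4p011
```

For e4p011 the sketch separates the first carbon atom from the fragment formed by the second, third and fourth carbon atoms, in line with the description of this bondomer in the text.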

A linear or branched molecule that does not contain closed carbon rings and that has a backbone of n carbon atoms has n-1 C-C bonds. Such a molecule has 2^n possible isotopomers and 2^(n-1) possible bondomers. The bondomer distribution of a molecule can be simulated in a way that is completely analogous to the simulation of isotopomers (Schmidt et al.,1997). Isotopomer balancing is based on the known carbon atom transitions in metabolic reactions, which can be described by means of atom mapping matrices (AMMs) (Zupke and Stephanopoulos,1994). Schmidt et al. (1997) introduced an algorithm that automatically generates isotopomer mapping matrices (IMMs) from AMMs. IMMs and isotopomer distribution vectors (idvs) form the two ingredients of isotopomer balances. Similarly, bondomer balancing is based on the mapping of retained C-C bonds by means of matrices that are hereby defined as C-C bond mapping matrices (CCMMs). A CCMM of a specific reaction has a number of columns equal to the number of C-C bonds in (one of) the substrate(s) of the reaction and a number of rows equal to the number of C-C bonds in (one of) the product(s). Matrix element CCMMi,j equals one if the jth C-C bond in the substrate is retained as the ith C-C bond in the product. All other matrix elements are zero.

[Figure 1: structural formulas of the transaldolase reaction s7p + g3p → e4p + f6p]

FIGURE 1: The reaction mechanism of the transaldolase reaction in the non-oxidative branch of the pentose phosphate pathway. The C-C bonds are shown in various styles to indicate their mapping. The dashed C-C bond in f6p is newly formed in the reaction.

Below, the C-C bond mapping is shown for the transaldolase reaction (see Fig.1) that forms part of the pentose phosphate pathway.

$$\mathbf{e4p} = CCMM_{s7p>e4p} \cdot \mathbf{s7p} + CCMM_{g3p>e4p} \cdot \mathbf{g3p}$$

$$\mathbf{f6p} = CCMM_{s7p>f6p} \cdot \mathbf{s7p} + CCMM_{g3p>f6p} \cdot \mathbf{g3p} \tag{1}$$

In this equation s7p, g3p, e4p and f6p are C-C bond vectors (CCV). E.g. e4p2=0.75 means that in 75 percent of the e4p molecules the second C-C bond stems from the medium substrate. In the remaining 25% of the molecules, the second C-C bond was formed in one of the reaction steps following the uptake of the medium substrate. The four CCMMs in Eq.1 are the following:

$$CCMM_{s7p>e4p} = \begin{pmatrix} 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} \qquad CCMM_{g3p>e4p} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}$$

$$CCMM_{s7p>f6p} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} \qquad CCMM_{g3p>f6p} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}$$

Note that the total number of ones in the four CCMMs (i.e. 7) that represent the reaction does not equal the number of C-C bonds in the reaction substrates and products (i.e. 8). The reason is that this reaction breaks the C-C bond at the third position in s7p and forms a new C-C bond at the third position in f6p, which means that one C-C bond stemming from the medium substrate of the cell is lost. This is different in AMMs, which contain a total number of ones equalling the number of carbon atoms in the reaction products, as carbon atoms are always conserved in a reaction.

In order to set up bondomer balances, bondomer distribution vectors (bdvs) need to be defined. These vectors contain the fractions of all possible bondomers that constitute the total pool of a given metabolite. E.g. the bondomer vector of g3p is: (g3p00, g3p01, g3p10, g3p11)T. Furthermore, in order to complete the analogy with the isotopomer mapping, bondomer mapping matrices (BMMs) are defined. These matrices indicate which bondomers of the reaction products can be formed from all the possible bondomers of the reaction substrates. Each column of a BMM corresponds with a bondomer of (one of) the reaction substrate(s), each row with a bondomer of (one of) the product(s). The same algorithm that was introduced by Schmidt et al. (1997) for converting AMMs into IMMs can be applied for converting CCMMs into BMMs.
As an example, the BMMs for the substrate-product couples g3p/e4p and g3p/f6p in the above transaldolase reaction are shown:

$$
\mathrm{BMM}_{g3p>e4p} =
\begin{pmatrix}
1&1&1&1\\ 1&1&1&1\\ 1&1&1&1\\ 1&1&1&1\\ 1&1&1&1\\ 1&1&1&1\\ 1&1&1&1\\ 1&1&1&1
\end{pmatrix},
\qquad
\mathrm{BMM}_{g3p>f6p} =
\begin{pmatrix}
1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1\\ 0&0&0&0\\ 0&0&0&0\\ 0&0&0&0\\ 0&0&0&0\\ \vdots&\vdots&\vdots&\vdots
\end{pmatrix}
$$

The eight rows shown for BMM_g3p>f6p are followed by three further vertical repeats of the same block, so that the matrix has 32 rows.

BMM_g3p>e4p contains only ones, which indicates that in this reaction each of the four bondomer fractions of g3p (g3p_00, g3p_01, g3p_10, g3p_11) equally contributes to each of the eight bondomer fractions of the formed e4p (e4p_000, e4p_001, e4p_010, e4p_011, e4p_100, e4p_101, e4p_110, e4p_111). This is a trivial observation, since in Fig.1 one can see that none of the C-C bonds of g3p ends up in e4p. In other words: in this reaction the bondomer fractions of e4p are fully determined by the bondomer fractions of s7p. Furthermore, the BMM_g3p>f6p above shows that the 1st bondomer of g3p (i.e. g3p_00) can lead to the 1st, 9th, 17th and 25th of the 32 possible f6p-bondomers (i.e. f6p_00000, f6p_01000, f6p_10000, f6p_11000). The 2nd bondomer of g3p (i.e. g3p_01) can lead to the 2nd, 10th, 18th and 26th f6p-bondomers (i.e. f6p_00001, f6p_01001, f6p_10001, f6p_11001). This mapping is easily understood by realizing that the two C-C bonds of g3p form the 4th and 5th C-C bonds in the f6p molecule. The 5th-8th, 13th-16th, 21st-24th and 29th-32nd f6p-bondomers cannot be formed according to BMM_g3p>f6p. The reason is that each of these 16 bondomers has a retained C-C bond at the third position, which cannot result from the transaldolase reaction because the third C-C bond in f6p is newly formed.

In a bondomer balance, the inflows and outflows of all the $2^{n-1}$ bondomers of an n-carbon compound are accounted for. The inflow terms in the balance are the products of the fluxes, BMMs and substrate-bdvs of the reactions that lead to the balanced metabolite. In case of two reacting substrates the corresponding terms in the balance are elementwise multiplications (denoted by '⊗') of the two vectors that are the products of the BMMs and bdvs of the substrates. The outflow term is obtained by multiplying the bdv of the balanced metabolite by the sum of the fluxes of the reactions in which the metabolite itself serves as a substrate. Consider an e4p-pool with two influxes: the transaldolase reaction shown in Fig.1 (flux v_ta) and the transketolase reaction f6p + g3p → e4p + p5p (flux v_tk). The total outflow of the e4p-pool is v_tot. The bondomer balance of e4p (where $\mathbf{x}$ is the bdv of component x) is:

$$
v_{ta}\cdot\left(\mathrm{BMM}_{g3p>e4p,ta}\cdot\mathbf{g3p}\,\otimes\,\mathrm{BMM}_{s7p>e4p,ta}\cdot\mathbf{s7p}\right)
+ v_{tk}\cdot\left(\mathrm{BMM}_{g3p>e4p,tk}\cdot\mathbf{g3p}\,\otimes\,\mathrm{BMM}_{f6p>e4p,tk}\cdot\mathbf{f6p}\right)
= v_{tot}\cdot\mathbf{e4p}
\tag{2}
$$
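The construction of a BMM from a C-C bond mapping can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in this thesis: the function name `bond_mapping_matrix` and its arguments are hypothetical, bondomers are indexed by reading their binary label as a number (so g3p_00 is the 1st bondomer), and bond positions are counted from 1 as in the text.

```python
def bond_mapping_matrix(n_sub_bonds, n_prod_bonds, mapping, new_bonds=()):
    """Return a 0/1 matrix with 2**n_prod_bonds rows and 2**n_sub_bonds columns.

    mapping   : {substrate bond position: product bond position}, 1-based
    new_bonds : product bond positions that are newly formed in the reaction
    Product bonds that are neither mapped nor new stem from the co-substrate
    and are therefore left unconstrained.
    """
    bmm = []
    for i in range(2 ** n_prod_bonds):
        prod = format(i, "0{}b".format(n_prod_bonds))
        row = []
        for j in range(2 ** n_sub_bonds):
            sub = format(j, "0{}b".format(n_sub_bonds))
            # a newly formed bond can never be a retained ('1') bond
            allowed = all(prod[p - 1] == "0" for p in new_bonds)
            # mapped bonds must carry the same state in substrate and product
            allowed = allowed and all(
                sub[s - 1] == prod[p - 1] for s, p in mapping.items()
            )
            row.append(1 if allowed else 0)
        bmm.append(row)
    return bmm


# Transaldolase example from the text: no g3p bond ends up in e4p, whereas the
# two g3p bonds become the 4th and 5th bond of f6p and the 3rd f6p bond is new.
bmm_g3p_e4p = bond_mapping_matrix(2, 3, {})
bmm_g3p_f6p = bond_mapping_matrix(2, 5, {1: 4, 2: 5}, new_bonds=(3,))

# 1-based indices of the f6p-bondomers that g3p_00 and g3p_01 can lead to
from_g3p00 = [i + 1 for i, row in enumerate(bmm_g3p_f6p) if row[0]]
from_g3p01 = [i + 1 for i, row in enumerate(bmm_g3p_f6p) if row[1]]
unreachable = [i + 1 for i, row in enumerate(bmm_g3p_f6p) if not any(row)]
```

Under these assumptions `from_g3p00` and `from_g3p01` reproduce the bondomer indices quoted in the text, and `unreachable` lists the 16 f6p-bondomers that have a retained bond at the third position.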

In order to distinguish between the matrices BMM_g3p>e4p that occur in both the transaldolase and transketolase reaction, abbreviations of the reactions were added to the subscripts of the BMMs.

Cumulative bondomers and cumulative bondomer balances

In reaction networks with cyclic pathways, (parts of) the products of a given reaction may be recycled to yield the substrates of the same reaction via a sequence of reaction steps. This phenomenon, which frequently occurs in realistic metabolic networks, causes isotopomer balances that contain algebraic loops. Often, these need to be solved numerically in an iterative way. Wiechert et al. (1999) presented an elegant way of solving this problem by introducing the concept of cumomers. Cumomer balances can be derived from isotopomer balances via a set of rules and they can be analytically solved. The obtained solution in the form of cumomer distribution vectors (cdvs) can be converted to idvs by means of a simple linear mapping.

The bondomers that were introduced in the previous section were shown to be completely analogous to isotopomers, except that in isotopomers the binary digits represent 13C or 12C atoms, whereas in bondomers they represent C-C bonds that either originate from the medium substrate or not. The analogy between isotopomers and bondomers extends to the cumomer concept. A bdv of an n-carbon compound can be converted to its corresponding cumulative bondomer distribution vector (cbdv) by means of the same linear mapping matrix T that Wiechert et al. (1999) introduced for the purpose of isotopomer-to-cumomer mapping:

$$
\mathbf{cbdv} = T_{n-1}\cdot\mathbf{bdv},
\qquad
T_0 = 1,\quad
T_{m+1} = \begin{pmatrix} T_m & T_m \\ 0 & T_m \end{pmatrix}
\tag{3}
$$

In Eq.3, n is the number of carbon atoms in the compound to which the cbdv and bdv correspond. The subscript 'n-1' reflects the fact that this n-carbon compound has $2^{n-1}$ (cumulative) bondomers. The elements of a cbdv, the cumulative bondomer fractions, represent fractions of a given molecule that have C-C bonds originating from the medium substrate at specific positions (indicated by '1') and that may have either C-C bonds originating from the medium substrate or newly formed C-C bonds at the remaining positions (indicated by 'x'). E.g. cumulative bondomer e4p_x1x denotes the fraction of e4p that has a retained C-C bond at the second position and either retained or new bonds at the first and third positions. This cumulative bondomer fraction is the sum of a number of bondomers:

$$
\mathrm{e4p}_{x1x} = \sum_{i,j=0}^{1} \mathrm{e4p}_{i1j}
= \mathrm{e4p}_{010} + \mathrm{e4p}_{011} + \mathrm{e4p}_{110} + \mathrm{e4p}_{111}
\tag{4}
$$
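The recursion of Eq.3 and the sum of Eq.4 can be illustrated with a short Python sketch. The helper names `t_matrix` and `to_cumulative` are hypothetical, and the e4p bondomer fractions used below are arbitrary illustrative numbers, not measured or simulated data.

```python
def t_matrix(m):
    """Build T_m from T_0 = 1 and T_{m+1} = [[T_m, T_m], [0, T_m]] (Eq.3)."""
    t = [[1]]
    for _ in range(m):
        size = len(t)
        top = [row + row for row in t]              # [T_m  T_m]
        bottom = [[0] * size + row for row in t]    # [ 0   T_m]
        t = top + bottom
    return t


def to_cumulative(bdv):
    """Convert a bdv (ordered 0...0 up to 1...1) to its cbdv."""
    m = len(bdv).bit_length() - 1
    if 2 ** m != len(bdv):
        raise ValueError("a bdv must have 2**(n-1) elements")
    return [sum(a * b for a, b in zip(row, bdv)) for row in t_matrix(m)]


# Arbitrary bondomer fractions of e4p in the order 000, 001, ..., 111
e4p_bdv = [0.30, 0.05, 0.10, 0.05, 0.20, 0.05, 0.15, 0.10]
e4p_cbdv = to_cumulative(e4p_bdv)

# The cbdv follows the same ordering with '0' read as 'x':
# xxx, xx1, x1x, x11, 1xx, 1x1, 11x, 111
e4p_xxx = e4p_cbdv[0]   # 0-cumulative bondomer, always equal to one
e4p_x1x = e4p_cbdv[2]   # e4p_010 + e4p_011 + e4p_110 + e4p_111 (Eq.4)
```

With these illustrative fractions the element for e4p_x1x equals the four-term sum of Eq.4 and the 0-cumulative bondomer equals one.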

The weight of a cumulative bondomer is defined as its number of retained C-C bonds, in analogy to the weight of a cumomer that was defined by Wiechert et al. (1999) as its number of 13C-labeled carbon atoms. The weight of e4p_x1x is one and the weight of e4p_11x is two. The number of cumulative bondomers of weight w of an n-carbon compound is given by:

$$
c = \frac{(n-1)!}{(n-1-w)!\cdot w!}
\tag{5}
$$

Cumulative bondomer balances can be computed from bondomer balances in a way that is identical to the computation of cumomer balances from isotopomer balances (Wiechert et al., 1999). Every linear or branched metabolite has one C-C bond fewer than it has carbon atoms, and thus two times fewer cumulative bondomers than cumomers. It follows that for any metabolic network both the number of cumulative bondomer balances and the number of variables therein are only half those of a set of cumomer balances.

Converting measured relative multiplet intensities to bondomers

Using the probability equations introduced by Szyperski (1995) the vectors containing the measured relative multiplet intensities (rivs) can be converted to bdvs (for more information see Chapter 6). This can be conveniently done in a linear transformation with a square probability matrix K of which the rows represent the ratios of (groups of) isotopomer fractions that are observed as various fine structures in 13C-NMR multiplets and the columns represent the possible bondomers (note: ordinary bondomers; not cumulative ones) of the observed carbon-fragment. Each element K_i,j represents the chance that the distribution of intact medium substrate fragments corresponding to bondomer j leads to the 13C-labeling pattern that is observed as fine structure i. The concept of the probability equation is illustrated in Eq.6 with an example of a three-carbon fragment of which a singlet, two doublets and a double doublet are observed in the multiplet:

$$
\mathbf{bdv} = K^{-1}\cdot\mathbf{riv}
\tag{6}
$$

$$
K =
\begin{array}{l|cccc}
\text{ratio of (groups of) isotopomer fractions (fine structure)} & \multicolumn{4}{c}{\text{bondomers}}\\
 & 00 & 01 & 10 & 11\\ \hline
010 / x1x\ \text{(singlet)} & K_{1,1} & K_{1,2} & K_{1,3} & K_{1,4}\\
011 / x1x\ \text{(doublet a)} & K_{2,1} & K_{2,2} & K_{2,3} & K_{2,4}\\
110 / x1x\ \text{(doublet b)} & K_{3,1} & K_{3,2} & K_{3,3} & K_{3,4}\\
111 / x1x\ \text{(double doublet)} & K_{4,1} & K_{4,2} & K_{4,3} & K_{4,4}
\end{array}
$$

e.g.: K_2,3 = expected relative intensity of doublet a (= abundance of isotopomer 011 divided by abundance of cumomer x1x) when bondomer = 10 (i.e. the first C-C bond was retained, therefore the first two carbons (labeled 01) stem from one medium substrate molecule; the second C-C bond was newly formed, therefore the third carbon (labeled 1) stems from another medium substrate molecule)

$$
K_{2,3} =
\frac{
\left(\begin{array}{l}\text{chance of fragment 01 by natural labeling}\\ +\ \text{chance of fragment 01 by uniform substrate labeling}\end{array}\right)
\times
\left(\begin{array}{l}\text{chance of fragment 1 by natural labeling}\\ +\ \text{chance of fragment 1 by uniform substrate labeling}\end{array}\right)
}{
\left(\begin{array}{l}\text{chance of labeling x1x by natural labeling}\\ +\ \text{chance of labeling x1x by uniform substrate labeling}\end{array}\right)
}
$$

$$
K_{2,3} =
\frac{\left((1-P_f)\cdot(1-P_n)\cdot P_n + P_f\cdot 0\right)\cdot\left((1-P_f)\cdot P_n + P_f\cdot 1\right)}{(1-P_f)\cdot P_n + P_f\cdot 1}
= (1-P_f)\cdot(1-P_n)\cdot P_n
$$
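The K_2,3 example suggests how a complete probability matrix can be assembled: cut the observed fragment at every newly formed bond, multiply the chances of the resulting intact pieces, and divide by the fractional 13C-enrichment of the observed carbon. The Python sketch below generalizes the single worked element to all 16 elements under that assumption; the function names and the numerical values chosen for Pf and Pn are illustrative only.

```python
from itertools import product


def fragment_chance(pattern, p_f, p_n):
    """Chance that an intact substrate fragment carries `pattern` (tuple of 0/1)."""
    natural = 1.0
    for labeled in pattern:
        natural *= p_n if labeled else (1.0 - p_n)
    uniform = 1.0 if all(pattern) else 0.0   # uniformly labeled: all carbons 13C
    return (1.0 - p_f) * natural + p_f * uniform


def split_at_new_bonds(labels, bonds):
    """Cut the carbon chain wherever a bond is newly formed (bond == 0)."""
    pieces, current = [], [labels[0]]
    for bond, label in zip(bonds, labels[1:]):
        if bond:
            current.append(label)
        else:
            pieces.append(tuple(current))
            current = [label]
    pieces.append(tuple(current))
    return pieces


def k_matrix(p_f, p_n):
    """4x4 matrix K for a three-carbon fragment with a 13C central carbon."""
    fine_structures = [(0, 1, 0), (0, 1, 1), (1, 1, 0), (1, 1, 1)]
    bondomers = list(product((0, 1), repeat=2))   # 00, 01, 10, 11
    enrichment = fragment_chance((1,), p_f, p_n)  # chance of labeling x1x
    k = []
    for labels in fine_structures:
        row = []
        for bonds in bondomers:
            chance = 1.0
            for piece in split_at_new_bonds(labels, bonds):
                chance *= fragment_chance(piece, p_f, p_n)
            row.append(chance / enrichment)
        k.append(row)
    return k


P_F, P_N = 0.10, 0.011          # illustrative values only
K = k_matrix(P_F, P_N)
k_23 = K[1][2]                  # doublet a, bondomer 10
column_sums = [sum(K[i][j] for i in range(4)) for j in range(4)]
```

In this sketch `k_23` coincides with the closed-form result (1-Pf)·(1-Pn)·Pn derived above, and every column of K sums to one because the four fine structures together account for all molecules with a labeled central carbon.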

In Eq.6, P_f is the fraction of uniformly 13C-labeled carbon substrate that was fed to the growing culture and P_n is the natural 13C-abundance of the carbon atoms in the naturally labeled substrate. Note that the denominator in the last line of Eq.6 is in fact the fractional 13C-enrichment, which is identical for each carbon position when a uniformly 13C-labeled medium substrate is used. In case the medium contains multiple carbon substrates, they must all be uniformly 13C-labeled to the same extent to have identical fractional 13C-enrichments for each carbon position in the metabolic network.

Some measurements of multiplets consist of overlapping multiplets of two different carbons within a molecule that have identical chemical surroundings and therefore appear at exactly the same co-ordinates in a 2D [13C,1H] COSY spectrum (e.g. the multiplets of the two δ-carbons of tyrosine) (Szyperski, 1995). This means that one single set of relative intensities is observed, but they are caused by two sets of bondomers. When calculating the bdv of a carbon fragment from the corresponding riv this is taken into account as follows:

$$
\mathbf{bdv} = \left(0.5\cdot\begin{bmatrix} K & K \end{bmatrix}\right)^{\#}\cdot\mathbf{riv}
\tag{7}
$$

In Eq.7 the '#' indicates the pseudo inverse, which is needed because the inverted matrix is not square. The elements of the bdv that are thus calculated are not fully determined, as Eq.7 is an underdetermined set of linear equations. However, this uncertainty is accounted for by calculating the covariance matrix of this bdv from the covariance matrix of the riv by means of the same linear transformation.

A second problem in determining bondomers from relative intensities is that of overlapping doublets within a multiplet of a three-carbon fragment with identical 13C-13C scalar coupling constants (e.g. the doublets of the δ-carbon position of tyrosine) (Szyperski, 1995). In that case only three relative intensities are measured which do not define all four bondomers. This can be taken into account when calculating the bdv of a carbon fragment from the corresponding riv as follows:

$$
\mathbf{bdv} = \left(\begin{pmatrix} 1&0&0&0\\ 0&1&1&0\\ 0&0&0&1 \end{pmatrix}\cdot K\right)^{\#}\cdot\mathbf{riv}
\tag{8}
$$

Similarly to Eq.7, the elements of the bdv that are calculated using Eq.8 are not fully determined, as Eq.8 also represents an underdetermined set of linear equations. Again, this uncertainty is accounted for by calculating the covariance matrix of this bdv from the covariance matrix of the riv by means of the same linear transformation. Note that information regarding the natural occurrence of 13C-atoms (Pn) and the percentage of uniformly 13C-labeled medium substrate (Pf) is only needed in the conversion of bdvs to rivs or vice versa (see Eq.6). The (cumulative) bondomers in themselves can be simulated and interpreted without knowing the substrate labeling. As a result of this it can be studied whether the measurement-derived bondomers theoretically allow identifiability of the free fluxes. However, this analysis is often impeded in practice by the fact that not all bondomers are fully determined due to overlapping multiplets or overlapping peaks within multiplets (see discussion above). In the latter case one has to analyze the identifiability of the free fluxes from the measurable relative intensities. Still, the fact that the percentage of uniformly 13C-labeled substrate only appears in the conversion of bdvs to rivs does allow a separate identifiability step that was not possible using isotopomer/cumomer modelling. It can namely be verified whether the bondomers can be determined from the corresponding relative intensities for the chosen 13C-labeling of the carbon substrate in the medium. In other words, it can be checked whether all probability matrices K (see Eq.6) are non-singular. A corollary is that optimally designing the percentage of uniform 13C-labeling of the carbon substrate that is used in an experiment is reduced to simultaneously maximizing the signal to noise ratio of the NMR measurements and minimizing the condition number (the ratio between the largest and smallest singular values) of the matrices K corresponding to the measured spectra. 
An analysis of the identifiability of bondomers from relative intensities and the dependence of the condition number of K on the fraction of uniformly labeled glucose is illustrated by means of an example in the section 'Identifiability analysis' in the practical application presented below.

Reducing the cumulative bondomer balances

If the purpose of setting up cumulative bondomer balances is metabolic flux determination by means of simulating the same bondomer fractions as those that were calculated from the measured rivs, then the number of balances can be substantially reduced. It is assumed here that prior to setting up the cumulative bondomer balances of a metabolic network, the network itself has already been reduced by removing all metabolite pools that do not have more than one influx (so-called 'linear' or 'divergent' nodes) and by lumping metabolite pools that are in isotopic equilibrium as was described in Chapter 3 of this thesis.

The compounds of which 2D [13C,1H] COSY spectra are measured are metabolic intermediates or biomass components (such as amino acids) that are derived thereof. When this NMR technique is used for measuring one-bond 13C-13C scalar couplings, the observed carbon fragments are either two-carbon fragments for terminal carbons or three-carbon fragments for centrally embedded carbons (Szyperski, 1995). By means of CCMMs one can map which C-C bonds of the metabolic intermediates are present in the observed fragments. Below, an example is given for the six three-carbon fragments of a tyrosine molecule that yield multiplets in a 2D [13C,1H] COSY spectrum of cell protein lysate. The lumped biosynthesis reaction for tyrosine is shown in Fig.2.

$$
\mathrm{CCMM}_{pep>tyr\text{-}\alpha} = \begin{pmatrix} 1&0\\ 0&1 \end{pmatrix}
\qquad
\mathrm{CCMM}_{pep>tyr\text{-}\delta 1} = \begin{pmatrix} 0&1\\ 0&0 \end{pmatrix}
\qquad
\mathrm{CCMM}_{e4p>tyr\text{-}\varepsilon 1} = \begin{pmatrix} 0&0&0\\ 1&0&0 \end{pmatrix}
$$

$$
\mathrm{CCMM}_{pep>tyr\text{-}\beta} = \begin{pmatrix} 0&1\\ 0&0 \end{pmatrix}
\qquad
\mathrm{CCMM}_{e4p>tyr\text{-}\delta 2} = \begin{pmatrix} 0&0&0\\ 0&0&1 \end{pmatrix}
\qquad
\mathrm{CCMM}_{e4p>tyr\text{-}\varepsilon 2} = \begin{pmatrix} 0&0&1\\ 0&1&0 \end{pmatrix}
$$

In the above example the three-carbon fragments are defined as follows: tyr-α includes the carboxy carbon, Cα and Cβ, tyr-β includes Cα, Cβ and Cγ, tyr-δ1 includes Cγ, Cδ1 and Cε1, tyr-δ2 includes Cγ, Cδ2 and Cε2, tyr-ε1 includes Cδ1, Cε1 and Cζ and, finally, tyr-ε2 includes Cδ2, Cε2 and Cζ.

FIGURE 2: The lumped biosynthesis reaction of tyrosine from intermediates of the glycolysis and pentose phosphate pathway. The C-C bonds are shown in various styles to indicate their mapping. The dashed C-C bonds in tyr are newly formed in the reaction.

From the entire set of CCMMs for all the observed fragments one can determine which cumulative bondomers of the metabolic intermediates must be simulated. This is explained by means of an arbitrary CCMM_I>F that maps which C-C bonds in intermediate I are retained at what positions in observed fragment F. The columns of CCMM_I>F that have entries equal to one indicate which C-C bonds of I end up in F. The fractions of the subsets of the bondomers of I that only vary in those specific C-C bonds fully determine bdv_F. In the above example CCMM_e4p>tyr-ε2 indicates that the C-C bonds at the second and third position of the e4p-molecule are mapped to the tyr-ε2 fragment. This means that bdv_tyr-ε2 is fully determined by the following subsets of e4p-bondomers: e4p_x00, e4p_x01, e4p_x10 and e4p_x11, where x may be either '1' or '0'. Note that these subsets are a mixed form of isotopomers (having subscripts '0' and '1') and cumomers (having subscripts 'x' and '1'). It can be easily shown that these subsets are related to the cumulative bondomers e4p_xxx, e4p_xx1, e4p_x1x and e4p_x11 by means of the following linear transformation:

$$
\begin{pmatrix} \mathrm{e4p}_{xxx}\\ \mathrm{e4p}_{xx1}\\ \mathrm{e4p}_{x1x}\\ \mathrm{e4p}_{x11} \end{pmatrix}
= T_c\cdot
\begin{pmatrix} \mathrm{e4p}_{x00}\\ \mathrm{e4p}_{x01}\\ \mathrm{e4p}_{x10}\\ \mathrm{e4p}_{x11} \end{pmatrix}
\tag{9}
$$

Matrix T_c in this equation is as defined in Eq.3 where the subscript c is the number of columns in CCMM_e4p>tyr-ε2 that have an entry equal to one (i.e. c=2 in this case). The left-hand side of Eq.9 shows that the cumulative bondomer fractions that fully determine bdv_tyr-ε2 are the 0-cumulative bondomer fraction (which always equals 1) plus those in which one or more of the mapped C-C bonds is a '1'. Generalizing this for any CCMM_I>F the subset of cumulative bondomers of intermediate I that fully determine bdv_F is given by:

$$
\{\text{all cumulative bondomers } I_{ijk\ldots}\}
\quad\text{with}\quad
\begin{cases}
i,j,k,\ldots = \text{'x'} & \text{if } \mathrm{sum}(\text{column } i,j,k,\ldots \text{ of } \mathrm{CCMM}_{I>F}) = 0\\
i,j,k,\ldots = \text{'1' or 'x'} & \text{if } \mathrm{sum}(\text{column } i,j,k,\ldots \text{ of } \mathrm{CCMM}_{I>F}) = 1
\end{cases}
\tag{10}
$$

This rule leads to an important reduction of the number of cumulative bondomers that need to be simulated. As each row of a CCMM_I>F can only contain a single entry equal to one, the number of columns of which the sum equals one is smaller than or equal to the number of rows. It was discussed above that if the NMR-measurements only yield one-bond 13C-13C scalar couplings, the observed carbon fragments maximally contain three carbons (i.e. two C-C bonds). Consequently, the maximal number of rows in a CCMM_I>F is two. Therefore, the maximal weight of the cumulative bondomers that need to be simulated equals two in this case. The rule of weight preservation (Wiechert et al., 1999) states that terms in cumomer balances never contain cumomers of higher weight than that of the cumomer that is balanced. The same holds for cumulative bondomer balances. By consequence, in order to simulate cumulative bondomers up to a maximal weight of two, no cumulative bondomers of higher weight need to be simulated. This causes a dramatic reduction in the number of cumulative bondomer balances, especially when intermediates have long carbon-backbones (such as s7p that contains seven carbons).

Note that the approach that is presented above can also be applied to cumomer modeling, in which case the CCMMs have to be replaced by AMMs. This also leads to a reduction of the number of cumomers that need to be simulated. In case the observed carbon fragments maximally contain three carbons, only cumomers up to a maximal weight of three would need to be simulated.
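The rule of Eq.10 is easily mechanized. The following Python sketch (the function name `required_cumulative_bondomers` is hypothetical) takes a CCMM as a list of rows and returns the labels of the cumulative bondomers of the intermediate that have to be simulated, ordered by weight.

```python
from itertools import product


def required_cumulative_bondomers(ccmm):
    """Apply Eq.10 to one CCMM (rows: bonds of fragment F, columns: bonds of I)."""
    n_bonds = len(ccmm[0])
    column_sums = [sum(row[j] for row in ccmm) for j in range(n_bonds)]
    # a mapped bond may be '1' or 'x'; an unmapped bond is always 'x'
    choices = ["1x" if total == 1 else "x" for total in column_sums]
    labels = {"".join(combo) for combo in product(*choices)}
    return sorted(labels, key=lambda label: (label.count("1"), label))


# CCMM_e4p>tyr-ε2 from the tyrosine example: bonds 2 and 3 of e4p are mapped
ccmm_e4p_tyr_eps2 = [[0, 0, 1],
                     [0, 1, 0]]
needed = required_cumulative_bondomers(ccmm_e4p_tyr_eps2)
max_weight = max(label.count("1") for label in needed)
```

For the tyr-ε2 example this yields the four cumulative bondomers on the left-hand side of Eq.9, and the maximal weight equals the number of rows of the CCMM.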


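In order to explain which cumulative bondomer balances are needed in order to simulate the reduced set of 0-, 1- and 2-cumulative bondomers we first introduce the general structure of the 0-, 1- and 2-cumulative bondomer balances for a metabolic network with an arbitrary topology, which has the medium substrate S and intermediates A and B:

$$
\begin{aligned}
&\text{0-cumulative bondomer balances:} &
f_0(v,\mathrm{BMMs})\cdot\begin{pmatrix}\mathbf{cbdv}_{A,\mathrm{weight}=0}\\ \mathbf{cbdv}_{B,\mathrm{weight}=0}\end{pmatrix}
&= g_0\left(v,\mathrm{BMMs},\mathbf{cbdv}_{S,\mathrm{weight}=0}\right)\\
&\text{1-cumulative bondomer balances:} &
f_1(v,\mathrm{BMMs})\cdot\begin{pmatrix}\mathbf{cbdv}_{A,\mathrm{weight}=1}\\ \mathbf{cbdv}_{B,\mathrm{weight}=1}\end{pmatrix}
&= g_1\left(v,\mathrm{BMMs},\mathbf{cbdv}_{S,\mathrm{weight}=1}\right)\\
&\text{2-cumulative bondomer balances:} &
f_2(v,\mathrm{BMMs})\cdot\begin{pmatrix}\mathbf{cbdv}_{A,\mathrm{weight}=2}\\ \mathbf{cbdv}_{B,\mathrm{weight}=2}\end{pmatrix}
&= g_2\left(v,\mathrm{BMMs},\mathbf{cbdv}_{S,\mathrm{weight}=2},\mathbf{cbdv}_{A,0<\mathrm{weight}<2},\mathbf{cbdv}_{B,0<\mathrm{weight}<2}\right)
\end{aligned}
\tag{11}
$$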
In the cumulative balance of weight W the function 'f_W' represents a square matrix, the entries of which are functions of the set of metabolic fluxes (v) and of the bondomer mapping matrices (BMMs) that represent all the C-C bond mappings in the network. The matrices f_W are multiplied with a vector containing the vertically concatenated vectors cbdv_X,weight=W, each of which holds the subset of cumulative bondomers of weight W of intermediate X. On the right-hand side of the equation sign, 'g_W' represents a vector, the entries of which are functions of the metabolic fluxes, the BMMs, the cumulative bondomers of the substrate of weight W and the cumulative bondomers of the intermediates A and B of a weight higher than zero and smaller than W. The exact form of the balances in Eq.11 follows from the cumomer theory in Wiechert et al. (1999) to which the cumulative bondomer theory introduced in this chapter is analogous. An example of a set of cumulative bondomer balances of weights 0, 1 and 2 of a realistic metabolic network is presented in the section 'Practical Application'. The general structure of Eq.11 helps to explain another aspect in which cumulative bondomer balances are less complex than cumomer balances. The vectors cbdv_S in the right-hand side terms of Eq.11 contain the distributions of the cumulative bondomers of medium substrate S. By definition, S only has retained C-C bonds and no newly formed C-C bonds. Therefore, the cbdv_S of any weight reduces to a vector containing only ones. Assume that the only available NMR measurements are those of a fragment F that is made exclusively of intermediate B. Based on CCMM_B>F, the rule of Eq.10 and a transformation similar to Eq.9 it can be determined which 0-, 1- and 2-cumulative bondomers of B need to be simulated. This set of cumulative bondomers is called the 'primary set'. The cumulative bondomer balances that correspond to the primary set seldom suffice for the simulation of this primary set. 
The reason is that the balances in their turn often contain other cumulative bondomers on whose values the primary set depends. This 'secondary set' of cumulative bondomers therefore needs to be simulated as well, meaning that the corresponding cumulative bondomer balances have to be included, and so on. This asks for a structured approach. The minimal subset of cumulative bondomer balances that is required for the simulation of the primary set can be determined by the following cascade-like procedure:

1) The matrices f_W in Eq.11 are often sparse. They can be rearranged to a block diagonal form. The elements in the vectors at both sides of Eq.11 have to be rearranged accordingly. This rearrangement shows the independent sets of balances for each cumulative bondomer weight.
2) The primary set of cumulative bondomers is determined from all CCMM_I>F.
3) The required balances are determined weight-by-weight in a decreasing order. The set of balances of the highest weight (i.e. weight 2 in Eq.11) is considered first. This weight is hereby defined as weight W.
4) It is determined which cumulative bondomers of the weight W are in the primary set. This is the primary subset w.
5) The balances of weight W that are minimally needed are all those that have nonzero entries in the blocks to which one or more elements of the primary subset w belong.
6) The terms on the right-hand side of the minimally needed set of balances of weight W may contain cumulative bondomers of weights lower than W. These are added to the primary subsets of cumulative bondomers of those specific weights.
7) W is decreased by one until W=0. The above steps 4) - 7) are repeated for each W.

This procedure yields the minimal set of cumulative bondomer balances that is fully independent of the remaining balances. This set can be used to simulate the cumulative bondomers for any set of fluxes. From these simulated cumulative bondomers one can calculate the simulated equivalents of the measured rivs as was explained above (Eqs.3 and 6). An iterative determination of the flux set that yields the smallest (error weighted) difference between the simulated and measured rivs completes the metabolic flux analysis.

FIGURE 3: The reduced metabolic network consisting of the glycolysis, the conventional pentose phosphate pathway and the lumped effluxes towards all biomass components and storage compounds. White boxes: metabolite pools that have more than one influx and therefore need to be included in the 13C-labeling distribution model. Gray boxes: biomass components of which the 13C-labeling is measured. (For full names see p.1).
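The cascade-like procedure of the theory section (steps 1 to 7) can be sketched as follows. The sketch assumes that the sparsity pattern of every balance is available as two lists: the cumulative bondomers of the same weight that appear in the balance, and those of lower weight that appear in its right-hand side. The diagonal blocks of step 1 are then the connected components of the same-weight pattern. All names and the small six-balance example are hypothetical and serve only to show the bookkeeping.

```python
def diagonal_blocks(names, same_weight_refs):
    """Step 1: group the balances of one weight into independent blocks."""
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for name in names:
        for ref in same_weight_refs[name]:
            parent[find(name)] = find(ref)
    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


def minimal_balance_set(balances, primary_set):
    """Steps 2-7: return {weight: set of balances that must be solved}."""
    primary = {}
    for name in primary_set:                                   # step 2
        primary.setdefault(balances[name]["weight"], set()).add(name)
    needed = {}
    weights = sorted({b["weight"] for b in balances.values()}, reverse=True)
    for w in weights:                                          # steps 3 and 7
        names = [n for n, b in balances.items() if b["weight"] == w]
        refs = {n: balances[n]["same"] for n in names}
        selected = set()
        for block in diagonal_blocks(names, refs):             # steps 4 and 5
            if block & primary.get(w, set()):
                selected |= block
        for name in selected:                                  # step 6
            for low in balances[name]["lower"]:
                primary.setdefault(balances[low]["weight"], set()).add(low)
        needed[w] = selected
    return needed


# Hypothetical sparsity pattern of a small set of balances
toy_balances = {
    "A_11": {"weight": 2, "same": ["B_11"], "lower": ["A_1x"]},
    "B_11": {"weight": 2, "same": ["A_11"], "lower": []},
    "C_11": {"weight": 2, "same": [], "lower": ["C_1x"]},
    "A_1x": {"weight": 1, "same": ["B_1x"], "lower": []},
    "B_1x": {"weight": 1, "same": [], "lower": []},
    "C_1x": {"weight": 1, "same": [], "lower": []},
}
needed = minimal_balance_set(toy_balances, primary_set={"A_11"})
```

In this toy case only the blocks that contain A_11 and, through its right-hand side, A_1x are retained, whereas the balances of C_11 and C_1x drop out. Weight 0 is omitted from the example because all 0-cumulative bondomers equal one.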

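5.3 PRACTICAL APPLICATION

The above theory is applied to the metabolic network that consists of all glycolytic and conventional pentose phosphate reactions. The possible additional pentose phosphate reactions discussed in Chapter 2 of this thesis were not included here. The network is reduced following the theory in Chapter 3 by lumping pools that are assumed in isotopic equilibrium: g3p (for abbreviations see p.1), dhap, bpg, 2pg, 3pg, pep are lumped as triose phosphates (tp) and r5p, ru5p, x5p are lumped as pentose 5-phosphates (p5p). Further reduction is achieved by omitting the single influx metabolic pool fbp. The reduced network is shown in Fig.3. The medium substrate of the network is a mixture of naturally labeled and uniformly labeled 13C6-glucose. Measured fluxes are v1 (glucose uptake) that is directly measured and fluxes v9 to v13 that are derived from the measured rate and known stoichiometry of biomass formation.

Setting up the 0-cumulative bondomer balances

The 0-cumulative bondomer balances (mass balances) of the network are set up first. These balances can be combined with the measured rates in the following convenient notation (Chapter 3 of this thesis):

$$
\begin{pmatrix} S \\ R \end{pmatrix}\cdot v = \begin{pmatrix} 0 \\ v_m \end{pmatrix}
\quad\Leftrightarrow
\tag{12}
$$

$$
\begin{array}{l|rrrrrrrrrrrrrrrrr|l}
 & v_1 & v_{2f} & v_{2b} & v_3 & v_4 & v_5 & v_{6f} & v_{6b} & v_{7f} & v_{7b} & v_{8f} & v_{8b} & v_9 & v_{10} & v_{11} & v_{12} & v_{13} & \text{right-hand side}\\ \hline
\text{g6p-balance} & 1 & -1 & 1 & \cdot & \cdot & -1 & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & -1 & \cdot & \cdot & \cdot & \cdot & 0\\
\text{f6p-balance} & \cdot & 1 & -1 & -1 & \cdot & \cdot & \cdot & \cdot & 1 & -1 & 1 & -1 & \cdot & -1 & \cdot & \cdot & \cdot & 0\\
\text{tp-balance} & \cdot & \cdot & \cdot & 2 & -1 & \cdot & 1 & -1 & -1 & 1 & 1 & -1 & \cdot & \cdot & -1 & \cdot & \cdot & 0\\
\text{p5p-balance} & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & -2 & 2 & \cdot & \cdot & -1 & 1 & \cdot & \cdot & \cdot & -1 & \cdot & 0\\
\text{e4p-balance} & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & -1 & -1 & 1 & \cdot & \cdot & \cdot & \cdot & -1 & 0\\
\text{s7p-balance} & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & -1 & -1 & 1 & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 0\\
v_1\text{-measurement} & 1 & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & v_{m,1}\\
v_9\text{-measurement} & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & \cdot & \cdot & \cdot & \cdot & v_{m,9}\\
v_{10}\text{-measurement} & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & \cdot & \cdot & \cdot & v_{m,10}\\
v_{11}\text{-measurement} & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & \cdot & \cdot & v_{m,11}\\
v_{12}\text{-measurement} & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & \cdot & v_{m,12}\\
v_{13}\text{-measurement} & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & v_{m,13}
\end{array}
$$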

In this equation, S is the stoichiometry matrix and R is the matrix of which the elements indicate which rates are (indirectly) measured. In this equation and the ones that follow the dots indicate zeros, the vector v contains the fluxes and vector vm holds the measured rates. As was shown in Chapter 3 the underdetermined set of balances in Eq.12 can be solved in a general way to yield:

$$
v = v_m^{*} +
\begin{array}{l|rrrrr}
 & \beta_1 & \beta_2 & \beta_3 & \beta_4 & \beta_5\\ \hline
v_1 & \cdot & \cdot & \cdot & \cdot & \cdot\\
v_{2f} & 3 & 1 & \cdot & \cdot & \cdot\\
v_{2b} & \cdot & 1 & \cdot & \cdot & \cdot\\
v_3 & 1 & \cdot & \cdot & \cdot & \cdot\\
v_4 & 1 & \cdot & \cdot & \cdot & \cdot\\
v_5 & -3 & \cdot & \cdot & \cdot & \cdot\\
v_{6f} & \cdot & \cdot & 1 & \cdot & \cdot\\
v_{6b} & 1 & \cdot & 1 & \cdot & \cdot\\
v_{7f} & \cdot & \cdot & \cdot & 1 & \cdot\\
v_{7b} & 1 & \cdot & \cdot & 1 & \cdot\\
v_{8f} & \cdot & \cdot & \cdot & \cdot & 1\\
v_{8b} & 1 & \cdot & \cdot & \cdot & 1\\
v_9 & \cdot & \cdot & \cdot & \cdot & \cdot\\
v_{10} & \cdot & \cdot & \cdot & \cdot & \cdot\\
v_{11} & \cdot & \cdot & \cdot & \cdot & \cdot\\
v_{12} & \cdot & \cdot & \cdot & \cdot & \cdot\\
v_{13} & \cdot & \cdot & \cdot & \cdot & \cdot
\end{array}
\cdot \beta
\tag{13}
$$

In Eq.13, $v_m^{*}$ is the part of the solution that is fixed by the measured rates and that is calculated by multiplying the pseudo inverse of matrix $(S^T\ R^T)^T$ with the vector $(0^T\ v_m^T)^T$. The matrix in Eq.13 is a nullspace of the matrix $(S^T\ R^T)^T$. Vector β contains the five free parameters that fix the degrees of freedom that remain for this network. The entries in the representation of the nullspace shown in Eq.13 show that parameter β1 represents the split ratio between the glycolysis and pentose phosphate pathway, and that parameters β2 to β5 represent the exchange fluxes of the hexose isomerase reaction (v2: g6p ↔ f6p), transketolase reactions (v6: 2 p5p ↔ s7p + tp and v8: e4p + p5p ↔ f6p + tp), and transaldolase reaction (v7: s7p + tp ↔ f6p + e4p).

Setting up the 1- and 2-cumulative bondomer balances

Simulation of the 13C-labeling distribution in the network of Fig.3 using the isotopomer/cumomer-balancing method requires a total of 312 balances ($2^6$ both for g6p and f6p, $2^3$ for tp, $2^5$ for p5p, $2^4$ for e4p and $2^7$ for s7p). The total number of bondomer/cumulative bondomer-balances is only half of that. If we want to explore the possibilities to reduce these balances to the minimal set needed to simulate data that can be compared with measured 2D [13C,1H] COSY data, we need to define the measurement data first. The network in Fig.3 includes fluxes from the intermediate pools towards the storage compounds tre, man and ery and towards the amino acids his, phe and tyr, of all of which 2D [13C,1H] COSY spectra can be measured. Table 2 contains all the CCMM_I>F of the reactions that lead from the metabolic intermediates to the two- and three-carbon fragments in the compounds of which spectra were measured. Details regarding these spectra were given in Chapter 2 of this thesis.
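That the columns of the matrix in Eq.13 indeed leave both the mass balances and the measured rates of Eq.12 unaffected can be verified numerically. The sketch below transcribes the stoichiometry of Eq.12 and the null space of Eq.13 as they are read here; the dictionary layout and the helper `leaves_balances_unchanged` are illustrative choices, not part of the original work.

```python
FLUXES = ["v1", "v2f", "v2b", "v3", "v4", "v5", "v6f", "v6b", "v7f", "v7b",
          "v8f", "v8b", "v9", "v10", "v11", "v12", "v13"]

# Rows of S (Eq.12), stored sparsely as {flux: stoichiometric coefficient}
S = {
    "g6p": {"v1": 1, "v2f": -1, "v2b": 1, "v5": -1, "v9": -1},
    "f6p": {"v2f": 1, "v2b": -1, "v3": -1, "v7f": 1, "v7b": -1,
            "v8f": 1, "v8b": -1, "v10": -1},
    "tp":  {"v3": 2, "v4": -1, "v6f": 1, "v6b": -1, "v7f": -1, "v7b": 1,
            "v8f": 1, "v8b": -1, "v11": -1},
    "p5p": {"v5": 1, "v6f": -2, "v6b": 2, "v8f": -1, "v8b": 1, "v12": -1},
    "e4p": {"v7f": 1, "v7b": -1, "v8f": -1, "v8b": 1, "v13": -1},
    "s7p": {"v6f": 1, "v6b": -1, "v7f": -1, "v7b": 1},
}
MEASURED = ["v1", "v9", "v10", "v11", "v12", "v13"]   # rows of R

# Columns of the null space (Eq.13), one per free parameter beta_1..beta_5
NULLSPACE = [
    {"v2f": 3, "v3": 1, "v4": 1, "v5": -3, "v6b": 1, "v7b": 1, "v8b": 1},
    {"v2f": 1, "v2b": 1},
    {"v6f": 1, "v6b": 1},
    {"v7f": 1, "v7b": 1},
    {"v8f": 1, "v8b": 1},
]


def leaves_balances_unchanged(column):
    """True if S*column = 0 and R*column = 0 for one null space column."""
    mass_ok = all(
        sum(coeff * column.get(flux, 0) for flux, coeff in row.items()) == 0
        for row in S.values()
    )
    measured_ok = all(column.get(flux, 0) == 0 for flux in MEASURED)
    return mass_ok and measured_ok


checks = [leaves_balances_unchanged(column) for column in NULLSPACE]
degrees_of_freedom = len(FLUXES) - (len(S) + len(MEASURED))
```

All five columns pass the check, and the difference between the 17 fluxes and the 12 independent equations gives the five free parameters mentioned in the text.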

Since the maximal number of C-C bonds in the measured carbon-fragments is two, no cumulative bondomers of weight higher than two need to be simulated. This causes a first reduction of the total of 156 cumulative bondomer balances to a mere 70 (25 and 45 balances for the 1- and 2-cumulative bondomers respectively, cf. Eq.5).

TABLE 2: The C-C bond mapping matrices describing the formation of the two- and three-carbon fragments that are measured for flux analysis of the network in Figure 3 (matrix rows are separated by semicolons)

measured compound   multiplet #   CCMM_I>F
histidine           1             CCMM_p5p>his-α = (0 0 0 1; 0 0 1 0)
                    2             CCMM_p5p>his-β = (0 0 1 0; 0 1 0 0)
                    3             CCMM_p5p>his-δ = (1 0 0 0)
tyrosine            4             CCMM_tp>tyr-α = (1 0; 0 1)
                    5             CCMM_tp>tyr-β = (0 1; 0 0)
                    6             CCMM_tp>tyr-δ1 = (0 1; 0 0)
                                  CCMM_e4p>tyr-δ2 = (0 0 0; 0 0 1)
                    7             CCMM_e4p>tyr-ε1 = (0 0 0; 1 0 0)
                                  CCMM_e4p>tyr-ε2 = (0 0 1; 0 1 0)
phenylalanine       8             CCMM_tp>phe-α = (1 0; 0 1)
                    9             CCMM_tp>phe-β = (0 1; 0 0)
mannitol            10            CCMM_f6p>man-c6 = (0 0 0 0 1)
                                  CCMM_f6p>man-c1 = (1 0 0 0 0)
                    11            CCMM_f6p>man-c2 = (1 0 0 0 0; 0 1 0 0 0)
                                  CCMM_f6p>man-c5 = (0 0 0 0 1; 0 0 0 1 0)
trehalose           12            CCMM_g6p>tre-c1 = (1 0 0 0 0)
                    13            CCMM_g6p>tre-c2 = (1 0 0 0 0; 0 1 0 0 0)
                    14            CCMM_g6p>tre-c3 = (0 1 0 0 0; 0 0 1 0 0)
                    15            CCMM_g6p>tre-c4 = (0 0 1 0 0; 0 0 0 1 0)
                    16            CCMM_g6p>tre-c5 = (0 0 0 1 0; 0 0 0 0 1)
                    17            CCMM_g6p>tre-c6 = (0 0 0 0 1)
erythritol          18            CCMM_e4p>ery-c1 = (1 0 0)
                                  CCMM_e4p>ery-c4 = (0 0 1)

Based on the 23 CCMM_I>F of Table 2 this subset of 70 cumulative bondomer balances can be further reduced. As was discussed in the theory section, one can determine from the CCMM_I>F which cumulative bondomers directly correspond to the measured data. In the present example, this 'primary set' (see theory section 'Reducing the cumulative bondomer balances') consists of 28 cumulative bondomers (18 of weight 1 and 10 of weight 2). In order to calculate the cumulative bondomers in this primary set, one has to solve both the balances that correspond to these cumulative bondomers and the balances of the cumulative bondomers that appear in these primary balances but do not belong to the primary set themselves. This leads to a minimally needed set of 43 balances: the 25 1-cumulative bondomer balances shown in Eq.14 and the 18 2-cumulative bondomer balances shown in Eq.15. In these equations 'diag(x)' indicates a diagonal matrix of which the diagonal elements are given by vector x.

$$
\left(\mathrm{diag}(d) + M_1\right)\times c = r
\tag{14}
$$

Row by row, the elements of the vector d in the diagonal matrix, of the vector c of 1-cumulative bondomers and of the right-hand side vector r, together with the nonzero elements of the 25×25 matrix M1 (given as column: element, where the column is named after the corresponding element of c), are:

row   c             d                     nonzero elements of M1                                                          r
1     g6p_xxxx1     v1+v2b                f6p_xxxx1: -v2b                                                                 v1
2     g6p_xxx1x     v1+v2b                f6p_xxx1x: -v2b                                                                 v1
3     g6p_xx1xx     v1+v2b                f6p_xx1xx: -v2b                                                                 v1
4     g6p_x1xxx     v1+v2b                f6p_x1xxx: -v2b                                                                 v1
5     g6p_1xxxx     v1+v2b                f6p_1xxxx: -v2b                                                                 v1
6     f6p_xxxx1     v2f+v7f+v8f           g6p_xxxx1: -v2f; tp_x1: -v7f; e4p_xx1: -v8f                                     0
7     f6p_xxx1x     v2f+v7f+v8f           g6p_xxx1x: -v2f; tp_1x: -v7f; e4p_x1x: -v8f                                     0
8     f6p_xx1xx     v2f+v7f+v8f           g6p_xx1xx: -v2f; e4p_1xx: -v8f                                                  0
9     f6p_x1xxx     v2f+v7f+v8f           g6p_x1xxx: -v2f; s7p_x1xxxx: -v7f                                               0
10    f6p_1xxxx     v2f+v7f+v8f           g6p_1xxxx: -v2f; p5p_1xxx: -v8f; s7p_1xxxxx: -v7f                               0
11    tp_x1         2·v3+v6f+v7b+v8f      f6p_xxxx1: -v3-v7b; f6p_1xxxx: -v3; p5p_xxx1: -v6f-v8f                          0
12    tp_1x         2·v3+v6f+v7b+v8f      f6p_xxx1x: -v3-v7b; f6p_x1xxx: -v3; p5p_xx1x: -v6f-v8f                          0
13    p5p_xxx1      v5+2·v6b+v8b          g6p_xxxx1: -v5; tp_x1: -v6b-v8b; s7p_xxxxx1: -v6b                               0
14    p5p_xx1x      v5+2·v6b+v8b          g6p_xxx1x: -v5; tp_1x: -v6b-v8b; s7p_xxxx1x: -v6b                               0
15    p5p_x1xx      v5+2·v6b+v8b          g6p_xx1xx: -v5; s7p_xxx1xx: -v6b                                                0
16    p5p_1xxx      v5+2·v6b+v8b          g6p_x1xxx: -v5; f6p_1xxxx: -v8b; s7p_xx1xxx: -v6b; s7p_1xxxxx: -v6b             0
17    e4p_xx1       v7f+v8b               f6p_xxxx1: -v8b; s7p_xxxxx1: -v7f                                               0
18    e4p_x1x       v7f+v8b               f6p_xxx1x: -v8b; s7p_xxxx1x: -v7f                                               0
19    e4p_1xx       v7f+v8b               f6p_xx1xx: -v8b; s7p_xxx1xx: -v7f                                               0
20    s7p_xxxxx1    v6f+v7b               p5p_xxx1: -v6f; e4p_xx1: -v7b                                                   0
21    s7p_xxxx1x    v6f+v7b               p5p_xx1x: -v6f; e4p_x1x: -v7b                                                   0
22    s7p_xxx1xx    v6f+v7b               p5p_x1xx: -v6f; e4p_1xx: -v7b                                                   0
23    s7p_xx1xxx    v6f+v7b               p5p_1xxx: -v6f                                                                  0
24    s7p_x1xxxx    v6f+v7b               f6p_x1xxx: -v7b                                                                 0
25    s7p_1xxxxx    v6f+v7b               f6p_1xxxx: -v7b; p5p_1xxx: -v6f                                                 0

$$
\left(\mathrm{diag}(d) + M_2\right)\times c = r
\tag{15}
$$

In the same notation, with c now holding the 2-cumulative bondomers and M2 an 18×18 matrix:

row   c             d                     nonzero elements of M2                                                          r
1     g6p_xxx11     v1+v2b                f6p_xxx11: -v2b                                                                 v1
2     g6p_xx11x     v1+v2b                f6p_xx11x: -v2b                                                                 v1
3     g6p_x11xx     v1+v2b                f6p_x11xx: -v2b                                                                 v1
4     g6p_11xxx     v1+v2b                f6p_11xxx: -v2b                                                                 v1
5     f6p_xxx11     v2f+v7f+v8f           g6p_xxx11: -v2f; tp_11: -v7f; e4p_x11: -v8f                                     0
6     f6p_xx11x     v2f+v7f+v8f           g6p_xx11x: -v2f; e4p_11x: -v8f                                                  0
7     f6p_x11xx     v2f+v7f+v8f           g6p_x11xx: -v2f                                                                 0
8     f6p_11xxx     v2f+v7f+v8f           g6p_11xxx: -v2f; s7p_11xxxx: -v7f                                               0
9     tp_11         2·v3+v6f+v7b+v8f      f6p_xxx11: -v3-v7b; f6p_11xxx: -v3; p5p_xx11: -v6f-v8f                          0
10    p5p_xx11      v5+2·v6b+v8b          g6p_xxx11: -v5; tp_11: -v6b-v8b; s7p_xxxx11: -v6b                               0
11    p5p_x11x      v5+2·v6b+v8b          g6p_xx11x: -v5; s7p_xxx11x: -v6b                                                0
12    p5p_11xx      v5+2·v6b+v8b          g6p_x11xx: -v5; s7p_xx11xx: -v6b                                                0
13    e4p_x11       v7f+v8b               f6p_xxx11: -v8b; s7p_xxxx11: -v7f                                               0
14    e4p_11x       v7f+v8b               f6p_xx11x: -v8b; s7p_xxx11x: -v7f                                               0
15    s7p_xxxx11    v6f+v7b               p5p_xx11: -v6f; e4p_x11: -v7b                                                   0
16    s7p_xxx11x    v6f+v7b               p5p_x11x: -v6f; e4p_11x: -v7b                                                   0
17    s7p_xx11xx    v6f+v7b               p5p_11xx: -v6f                                                                  0
18    s7p_11xxxx    v6f+v7b               f6p_11xxx: -v7b                                                                 0

The set of equations required to simulate the extensive set of measured data is thus reduced from the original complete set of 156 cumulative bondomer balances to 43, i.e. a reduction of 72%. Compared to the complete set of 312 cumomer balances that would be needed to simulate the 13C-labeling distribution in the same network, the reduction is even 86%. However, as stated in the theory section, the approach for finding the minimal set of cumulative bondomers that needs to be simulated can also be applied to cumomers, which would lead to a minimal set of 92 balances (31, 43 and 18 of the 1-, 2- and 3-cumomer balances, respectively). Compared to this minimal cumomer model, the minimal cumulative bondomer model thus requires 53% fewer (and simpler) balances (43 instead of 92).
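The size comparison above is simple arithmetic and can be re-derived in a few lines. The following is only a sketch that uses the balance counts quoted in the text; the variable and function names are chosen here for illustration.

```python
# Balance counts quoted in the text for the glycolysis / pentose phosphate example.
complete_cumomers = 312
complete_cumulative_bondomers = 156
minimal_cumomers = 31 + 43 + 18          # 1-, 2- and 3-cumomer balances
minimal_cumulative_bondomers = 43


def reduction(before, after):
    """Relative reduction in the number of balances, in percent."""
    return 100.0 * (before - after) / before


print(minimal_cumomers)                                                       # 92
print(round(reduction(complete_cumulative_bondomers, minimal_cumulative_bondomers)))  # 72
print(round(reduction(complete_cumomers, minimal_cumulative_bondomers)))      # 86
# The minimal bondomer model has 43/92 = 47% of the balances of the
# minimal cumomer model, i.e. it needs 53% fewer balances.
print(round(100.0 * minimal_cumulative_bondomers / minimal_cumomers))         # 47
print(round(reduction(minimal_cumomers, minimal_cumulative_bondomers)))       # 53
```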


As all cumulative bondomers of the medium substrate glucose equal one, the vector on the right-hand side of Eq.14 contains flux v1 as the only variable. At first it seems surprising that the vector on the right-hand side of Eq.15 also contains only this single variable v1 and no terms that consist of the product of the flux through a bimolecular reaction and two 1-cumulative bondomer fractions of the reaction substrates. Apparently none of the needed 2-cumulative bondomers is formed from a combination of two 1-cumulative bondomers. This is understood when one inspects the needed 2-cumulative bondomers. They share the common feature that they have two neighbouring retained C-C bonds. A fragment containing two neighbouring C-C bonds can be transferred from substrate to product, but it can never result from a bimolecular reaction. Consequently, the 2-cumulative bondomers are not functions of the 1-cumulative bondomers in this case. The cumulative bondomer balances in Eqs.14 and 15 are very simple compared to the cumomer balances that would be needed to simulate the same measurement data for the network in Fig.3.

Identifiability analysis

The matrices in Eqs.14 and 15 are symbolically inverted to yield all cumulative bondomer fractions as functions of the fluxes vi, which are subsequently substituted by the general flux solution of Eq.13. Of the outcomes, only the equations corresponding to the primary set of 28 cumulative bondomers need to be retained for subsequent identifiability analysis. These relations between the cumulative bondomers and the flux parameters are symbolically differentiated with respect to the parameters, yielding a 28x5 Jacobian. This symbolic differentiation is made possible by the limited size and number of the balances in Eqs.14 and 15 and would certainly not be feasible for cumomer balances of the same network. The identifiability of all fluxes from a given set of 2D [13C,1H] COSY measurements and measured net conversion rates can now be analyzed.
To do so, numerical values have to be filled in for all fluxes, because symbolic determination of the rank of the Jacobian still proved infeasible due to the large non-linear symbolic entries in this matrix. In this example the biosynthetic fluxes, whose values do not affect the identifiability analysis, are set to zero and the glucose uptake is normalized to one: vm = (vm,1; vm,9; vm,10; vm,11; vm,12; vm,13)T = (1, 0, 0, 0, 0, 0)T. Furthermore, the free flux parameters are chosen such that we obtain the following set of fluxes within the flux space defined by Eq.13: v1=1.0, v2f=1.3, v2b=0.9, v3=0.8, v4=1.8, v5=0.6, v6f=0.5, v6b=0.3, v7f=0.5, v7b=0.3, v8f=0.5, v8b=0.3, v9=0, v10=0, v11=0, v12=0, v13=0. The variables in the Jacobian are substituted by these values.

As stated in the theory section, the identifiability analysis of free parameters from NMR measurements can theoretically be divided into two steps, namely: (1) analyzing whether the bondomers allow identification of the free fluxes and (2) checking whether those bondomers can be determined from the corresponding measured relative intensities. However, it was discussed that in practice one often encounters overlapping multiplets in a 2D [13C,1H] COSY spectrum and overlapping doublets within a multiplet due to identical 13C-13C scalar coupling constants, both of which lead to undetermined bondomers (see Eqs.7 and 8). Both problems are encountered in the present example: multiplets 6, 7, 10, 11 and 18 in Table 2 consist of two overlapping multiplets, and multiplets 6, 7, 11, 13, 14, 15 and 16 in Table 2 contain overlapping doublets. In order to take into account these phenomena of overlapping multiplets and overlapping peaks within multiplets, the identifiability analysis of free parameters from relative intensities is performed in a single step. This is done by applying the linear transformations of Eqs.3 and 6, 7 or 8 to transform the Jacobian containing the derivatives of the cumulative bondomers with respect to the flux parameters into the Jacobian containing the derivatives of the corresponding relative intensities of the multiplets in Table 2 with respect to the parameters.

Identifiability analysis can then be performed for all possible combinations of the multiplets in Table 2. In doing so, the entire group of relative intensities of each multiplet of the considered combination is assumed to be measured. Only when the rank of the corresponding Jacobian-submatrix equals the number of free parameters (i.e. 5 in this case) can the flux parameters be identified. Beforehand, it is clear that no single multiplet suffices to identify all five parameters, because in the current case each multiplet consists of at most four fine structures (a singlet, two doublets and a double doublet) that have only three independent relative intensities (the fourth complementing the sum of the other three to a total of one). A single multiplet can only suffice to fully identify all free parameters if more than two 13C-13C scalar couplings can be observed in the 13C-NMR spectrum. In Chapter 4 of this thesis it was shown that this is feasible for the multiplets of δ- and ε-tyrosine and of β-histidine, where three 13C-13C scalar couplings were observed, leading to seven independent relative intensities. The current identifiability analysis, in which two 13C-13C scalar couplings are assumed observable, shows that whereas 57 out of the 153 combinations of two multiplets contain a total of at least five independently measured relative intensities, none of the 57 corresponding Jacobian-submatrices has a rank equal to five.
Even the best Jacobian-submatrix (with the smallest condition number) had a condition number larger than 10^14, i.e. was very close to singular. Only by combining three or more multiplets can all parameters be identified: 150 out of the 816 combinations of three multiplets have a rank of five. The condition number of the worst full-rank Jacobian-submatrix (closest to singularity) was smaller than 10^5, whereas the condition number of the best rank-deficient Jacobian-submatrix (furthest from singularity) was larger than 10^13, which shows the clear-cut difference between the determined and underdetermined situations. The condition numbers of the 816 Jacobian-submatrices of all combinations of three multiplets are shown in Fig.4. When checking the information content of all combinations of an increasing number of the multiplets in Table 2, the fraction of the possible combinations that identify the five parameters rapidly approaches one. In the present example, the lowest number of multiplets for which the five degrees of freedom are fixed for any combination of multiplets is 12. In other words, there is a combination of 11 multiplets that still does not allow identification of the five parameters, namely the combination of multiplets 3, 4, 5, 6, 7, 8, 9, 15, 16, 17 and 18 in Table 2. Note that of these 11 multiplets only nine are actually different, as the spectra 4 and 8 (tyr-α and phe-α) and spectra 5 and 9 (tyr-β and phe-β) are identical.

The above identifiability analysis assumes that the applied carbon substrate in the medium consists of 10% uniformly 13C-labeled and 90% naturally labeled glucose (containing 1.1% 13C). It can be readily verified that this labeling does not cause singularity of any of the probability matrices K. For example, using this fraction of uniform labeling the condition number of the matrix K shown in Eq.6 is 1.48, i.e. K is far from singular. Figure 5 shows the condition number of matrix K versus the fraction of uniformly labeled glucose (Pf).
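The bookkeeping behind the combination counts in the identifiability analysis can be sketched as follows. The counts for 18 multiplets follow directly from the text; the small Jacobian blocks at the end are purely hypothetical toy numbers (not the Jacobian of Table 2) and only illustrate the rank test that decides whether a combination of multiplets identifies all free parameters. The helper name `numerical_rank` is likewise an assumption of this sketch.

```python
from itertools import combinations
from math import comb


def numerical_rank(rows, tol=1e-9):
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in rows]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
    return rank


# Number of ways to pick 2 or 3 multiplets out of the 18 in Table 2.
print(comb(18, 2), comb(18, 3))   # 153 816

# Independent relative intensities per multiplet: 2**c fine structures for
# c observable 13C-13C couplings, minus one because they sum to one.
print(2**2 - 1, 2**3 - 1)         # 3 7

# HYPOTHETICAL toy example: three "multiplets", each contributing two rows of
# sensitivities with respect to three free parameters.
N_PARAMS = 3
blocks = {
    "A": [[1, 0, 0], [0, 1, 0]],
    "B": [[2, 0, 0], [0, 3, 0]],   # same directions as A: adds no information
    "C": [[0, 0, 1], [1, 1, 0]],
}
identifying = [
    combo
    for combo in combinations(sorted(blocks), 2)
    if numerical_rank(blocks[combo[0]] + blocks[combo[1]]) == N_PARAMS
]
print(identifying)                # [('A', 'C'), ('B', 'C')]
```

In a floating-point setting the rank decision depends on the tolerance, which is why the text above additionally reports condition numbers to separate full-rank from rank-deficient submatrices.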

[Figure 4: histogram; x-axis: 10log(condition number), 0 to 18; y-axis: number of combinations of 3 out of 18 multiplets, 0 to 250]

FIGURE 4: Histogram showing the distribution of the logarithms of the condition numbers of the 816 Jacobian-submatrices corresponding to all combinations of 3 out of 18 multiplets.

Figure 5 shows that the condition number is minimal for Pf=0.075. At this value of Pf the matrix K and its inverse are as close as possible to the identity matrix, which means that errors in the relative intensities are hardly amplified when transforming them to bondomers. Interestingly, when natural 13C-labeling is neglected, the graph of Fig.5 shows a monotonic increase and the minimal condition number of K is found at the limit Pf↓0. It should be stressed that this analysis does not take into account the actual measurement errors of the relative intensities that are amplified by the matrix K. Therefore the present outcomes are not valid for specific cases. Moreover, when optimally designing an experiment it should be kept in mind that an increasing Pf causes a higher signal-to-noise ratio. Consequently, it is to be expected that the optimal value for Pf will be higher than 0.075.


[Figure 5: line plot; x-axis: fraction of uniformly 13C-labeled glucose, 0 to 1; y-axis: condition number of K, logarithmic scale from 10^0 to 10^5]

FIGURE 5: Condition number of the K-matrix (see Eq.6) that maps the bondomers of a three-carbon fragment to the corresponding relative intensities, plotted on a logarithmic scale against the fraction of uniformly 13C-labeled carbon substrate in the medium. The fraction of the carbon substrate that is not uniformly 13C-labeled is assumed to be naturally 13C-labeled (1.1% 13C).

Zero bondomers

Suppose that the 43 cumulative bondomers of the metabolic intermediates (g6p, f6p, tp, p5p, e4p and s7p) are simulated using Eqs.13 to 15 and an arbitrary set of free flux parameters. An additional advantage of the use of bondomers then becomes clear when the 78 bondomers of the two- and three-carbon fragments in Table 2 are derived from these simulated cumulative bondomers. It is observed that some of the bondomers are zero and remain so for any other set of fluxes, showing that they cannot be formed in the studied metabolic network. Examples of such bondomers are tyr-β01 and tyr-β11. The explanation for this finding is readily found by looking at CCMMtp>tyr-β in Table 2. The second row of this CCMM contains only zeros, showing that the second C-C bond in this fragment is never retained when tyrosine is synthesized. This can be verified in Fig.2. In the present example, 20 out of the 78 bondomers of the carbon fragments in Table 2 are zero due to the defined reaction mechanisms, irrespective of the fluxes. These ‘zero bondomers’ form additional constraints when analysing the relative intensities of the respective carbon fragments. In the above example of tyr-β the NMR multiplet consists of four fine structures, the relative intensities of which are determined with some error. Transforming these relative intensities to bondomers by means of the probability equations whilst constraining the values of two of the bondomer fractions to zero will reduce the errors in the remaining two bondomer fractions.

The fact that quite a number of bondomers are zero by definition also helps to explain why it was found above that some combinations of up to 11 multiplets did not suffice to identify the five parameters. Some of the multiplets contain less information with respect to the metabolic fluxes than is expected on the basis of independently measured relative intensities alone, because some of the bondomers causing the peaks are fixed at zero by the defined reaction mechanisms. For example, the multiplet of tyr-β contains four fine structures, but their relative intensities do not contain three independent data (loss of one degree of freedom due to normalization), as was previously assumed. The relative intensities yield only one single independent bondomer, because two bondomers are zero by definition. Note that these observations could not be made from simulated isotopomer distributions.

5.4 CONCLUSIONS

In this chapter, the concepts of bondomers and cumulative bondomers are introduced as practical entities for the simulation of 13C-labeling data in metabolic flux analysis studies where uniformly 13C-labeled carbon substrates are used. These concepts are shown to have, on the one hand, a mathematical analogy to the existing concepts of isotopomers and cumomers but to lead, on the other hand, to simulation models that are much smaller and simpler than the models needed to simulate isotopomers and cumomers. A realistic practical application of the theory to the metabolic network consisting of the glycolysis and pentose phosphate pathway illustrates the newly proposed concepts.
The example shows that the presented theory allows a reduction of the complete cumomer model from the original 312 to a mere 92 cumomer balances and that the concept of cumulative bondomers allows a further reduction to only 43 cumulative bondomer balances. The bondomer concept has an advantage over isotopomers in that it offers easy intuitive interpretation, because bondomer fractions are independent of the applied substrate labeling and simply represent molecules in which specific C-C bonds have been formed or retained in the metabolic network. In the practical application of the proposed concepts, this easy interpretation of bondomers led to the simple, but nevertheless surprising, insight that many sets of relative intensities determined in 2D [13C,1H] COSY spectra contain less information with regard to the fluxes than was previously thought. Until now, a priori (i.e. non-numerical) identifiability analysis of the free fluxes in a metabolic network from 13C-labeling measurements has often been found practically impossible due to the large number and size of the cumomer or isotopomer balances for the given network. The considerably smaller number and size of the cumulative bondomer balances bring a priori identifiability analysis within reach for larger networks than before. The practical application in this chapter illustrates an identifiability analysis, which is, however, still partly numerical. In the presented identifiability analysis it is shown that the bondomer concept allows a separate study of unidentifiability problems due to the network structure and due to the chosen fraction of uniformly 13C-labeled carbon substrate.

REFERENCES

Dauner, M., Bailey, J.E., Sauer, U. (2001) Metabolic flux analysis with a comprehensive isotopomer model in Bacillus subtilis. Biotechnol. Bioeng., 76, 2: 144-156

De Graaf, A.A., Mahle, M., Möllney, M., Wiechert, W., Stahmann, P., Sahm, H. (2000) Determination of full 13C isotopomer distributions for metabolic flux analysis using heteronuclear spin echo difference NMR spectroscopy. J. Biotechnol., 77: 25-35

Maaheimo, H., Fiaux, J., Çakar, Z.P., Bailey, J.E., Sauer, U., Szyperski, T. (2001) Central carbon metabolism of Saccharomyces cerevisiae explored by biosynthetic 13C labeling of common amino acids. Eur. J. Biochem., 268: 2464-2479

Möllney, M., Wiechert, W., Kownatzki, D., De Graaf, A.A. (1999) Bidirectional reaction steps in metabolic networks: IV. Optimal design of isotopomer labeling experiments. Biotechnol. Bioeng., 66, 2: 86-103

Petersen, S., De Graaf, A.A., Eggeling, L., Möllney, M., Wiechert, W., Sahm, H. (2000) In vivo quantification of parallel and bidirectional fluxes in the anaplerosis of Corynebacterium glutamicum. J. Biol. Chem., 275, 46: 35932-35941

Sauer, U., Lasko, D.R., Fiaux, J., Hochuli, M., Glaser, R., Szyperski, T., Wüthrich, K., Bailey, J.E. (1999) Metabolic flux ratio analysis of genetic and environmental modulations of Escherichia coli central carbon metabolism. J. Bacteriol., 181, 21: 6679-6688

Schmidt, K., Carlsen, M., Nielsen, J., Villadsen, J. (1997) Modeling isotopomer distributions in metabolic networks using isotopomer mapping matrices. Biotechnol. Bioeng., 55, 6: 831-840

Schmidt, K., Nørregaard, L.C., Pedersen, B., Meissner, A., Duus, J.Ø., Nielsen, J.Ø., Villadsen, J. (1999) Quantification of intracellular metabolic fluxes from fractional enrichment and 13C-13C coupling constraints on the isotopomer distribution in labeled biomass components. Metabol. Eng., 1, 2: 166-179

Szyperski, T. (1995) Biosynthetically directed fractional 13C-labelling of proteinogenic amino acids. Eur. J. Biochem., 232: 433-448

Szyperski, T., Bailey, J.E., Wüthrich, K. (1996) Detecting and dissecting metabolic fluxes using biosynthetic fractional 13C-labeling and two-dimensional NMR spectroscopy. TIBTECH, 14: 453-459

Van Winden, W.A., Heijnen, J.J., Verheijen, P.J.T. (2002) Cumulative bondomers: a new concept in flux analysis from 2D [13C,1H] COSY data. Accepted for publication in Biotechnol. Bioeng.

Wiechert, W., Möllney, M., Isermann, N., Wurzel, M., De Graaf, A.A. (1999) Bidirectional reaction steps in metabolic networks: III. Explicit solution and analysis of isotopomer labeling systems. Biotechnol. Bioeng., 66, 2: 69-85

Zupke, C., Stephanopoulos, G. (1994) Modeling of isotope distributions and intracellular fluxes in metabolic networks using atom mapping matrices. Biotechnol. Progr., 10: 489-498


Chapter 6

Systematic Approach for Converting Bondomers to 2D [13C,1H] COSY and MS Data

ABSTRACT

In this chapter we present an algorithm that offers a general approach to find all probabilistic expressions needed to convert bondomer distributions to actually measured 13C-labeling data, such as mass distributions (measured by MS) or relative intensities in multiplets (measured by NMR). The algorithm can be coupled to an earlier presented bondomer simulation model that simulates 13C-labeling measurement data. The algorithm is straightforward and can be easily applied to measured molecules (or fragments thereof) of any size. The restriction is that this approach is only valid for 13C-labeling data measured in experiments where uniformly 13C-labeled substrate is fed to the culture.


6.1 INTRODUCTION

One of the types of 13C-labeling data that are used for metabolic flux analysis purposes is 2D [13C,1H] COSY. Chapter 5 of this thesis introduced the concept of ‘(cumulative) bondomers’, which allows efficient modeling of this type of data for a given metabolic network and set of metabolic reaction rates. The theory in that chapter includes a conversion from bondomers to the relative intensities of peaks in 2D [13C,1H] COSY spectra, which was based on the probabilistic equations introduced by Szyperski (1995). In his paper, Szyperski presented the probabilistic equations that are needed to convert (analogs of) bondomers of a two-carbon fragment with an observed peripheral carbon atom and of a three-carbon fragment with an observed central carbon atom to the fine structures that are observed in the multiplets measured for the fragment concerned. In this chapter we present a systematic approach for deriving Szyperski’s analytical expressions, which can also be applied to new situations such as three-carbon fragments with an observed peripheral carbon atom and four-carbon fragments. These latter situations are encountered in 2D [13C,1H] COSY spectra where long-range 13C-13C scalar couplings are observable (Chapter 4 of this thesis). The probabilistic expressions that result from this approach are the entries of a probability matrix with which a bondomer distribution vector can be converted to its corresponding relative intensity vector or vice versa (Chapter 5 of this thesis). In addition, we will show that a similar linear transformation can be made to convert a bondomer distribution vector to its corresponding mass distribution vector. The probabilistic expressions needed for this new conversion of bondomers to mass isotopomers will be shown to follow from the same systematic approach that was applied for the conversion of bondomers to relative intensities of fine structures in NMR spectra.
The algorithm that is presented in this chapter is strictly limited to 13C-labeling experiments where uniformly 13C-labeled substrate is used. The reason for this is that bondomer modeling only tracks the positions of retained carbon-carbon bonds, irrespective of whether the bonded carbon atoms are 13C- or 12C-isotopes. These two isotopes must therefore be equally distributed over all carbon positions. This is the case for metabolites that stem from a carbon substrate in the feed that is partly naturally 13C-labeled (natural 13C abundance Pn≈0.011) and partly uniformly 13C-labeled (fraction Pf). All carbon atoms then have an identical overall 13C-enrichment (Pt), namely: Pf+(1-Pf)*Pn. In case the feed medium contains more than one carbon substrate, the conditions for bondomer modeling are still met if these substrates are all uniformly 13C-labeled to exactly the same extent.

6.2 THEORY

Calculation of isotopomer distributions of fragments of medium substrate

The first step in converting a bondomer distribution of an N-carbon molecule or fragment thereof to its relative intensity or mass distribution is to set up N vectors containing the frequencies of occurrence in the medium substrate of each of the 2^1, 2^2, …, 2^L, …, 2^N isotopomers of fragments of length 1, 2, …, L, …, N given the values of Pf and Pn. The Eth entry of the Lth vector (L≤N) is the frequency of occurrence of the isotopomer of the L-carbon fragment that has a labeling pattern equal to the L-bit binary representation of the number (E-1) (from here on represented as bin(E-1,L)). In the binary number a zero denotes a 12C-atom and a one a 13C-atom (Schmidt et al., 1997). E.g. the 7th entry of the vector of a fragment of length 4 corresponds to the isotopomer given by the 4-bit binary representation of 6: bin(6,4)=0110. Since each fragment originates either from a naturally 13C-labeled or from a uniformly 13C-labeled substrate, the frequency of occurrence of each isotopomer I of the given fragment is the sum of the occurrences of I in a naturally and a uniformly labeled fragment of the same length. The terms in this sum are, of course, weighted by the relative amounts of naturally (1-Pf) and uniformly labeled (Pf) medium substrate. It is easily understood that all frequencies of occurrence of isotopomers in a uniformly labeled fragment are zero, except for the fully labeled isotopomer (by convention the last element of an isotopomer distribution vector), of which the frequency of occurrence equals one. The frequency of occurrence of isotopomer I caused by natural labeling is given by the product of Pn to the power of the number of 13C-atoms and (1-Pn) to the power of the number of 12C-atoms. In formula, the isotopomer distribution vector (idvL) of a fragment of length L is given by:

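$$
\mathrm{idv}_L = P_f \cdot
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}
+ (1-P_f) \cdot
\begin{pmatrix}
P_n^{\sum_{\mathrm{ones}} \mathrm{bin}(0,L)} \cdot (1-P_n)^{\sum_{\mathrm{zeros}} \mathrm{bin}(0,L)} \\
P_n^{\sum_{\mathrm{ones}} \mathrm{bin}(1,L)} \cdot (1-P_n)^{\sum_{\mathrm{zeros}} \mathrm{bin}(1,L)} \\
\vdots \\
P_n^{\sum_{\mathrm{ones}} \mathrm{bin}(2^L-1,L)} \cdot (1-P_n)^{\sum_{\mathrm{zeros}} \mathrm{bin}(2^L-1,L)}
\end{pmatrix}
\qquad (1)
$$

Here the sums over ‘ones’ and ‘zeros’ denote the numbers of ones (13C-atoms) and zeros (12C-atoms) in the given binary pattern.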
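As a quick consistency check of Eq.1, consider the shortest possible fragment, L=1. The worked case below is only an illustration; it uses no symbols other than Pf, Pn and the overall enrichment Pt=Pf+(1-Pf)*Pn defined in the introduction.

$$
\mathrm{idv}_1 = P_f \cdot \begin{pmatrix} 0 \\ 1 \end{pmatrix}
+ (1-P_f) \cdot \begin{pmatrix} 1-P_n \\ P_n \end{pmatrix}
= \begin{pmatrix} (1-P_f)(1-P_n) \\ P_f + (1-P_f) \cdot P_n \end{pmatrix}
= \begin{pmatrix} 1-P_t \\ P_t \end{pmatrix}
$$

The second entry is the probability that a single carbon atom is 13C, which is indeed the overall 13C-enrichment Pt, and the two entries sum to one.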

Conversion of bondomer to isotopomer distribution

Once the frequencies of occurrence of all the isotopomers of fragments of length 1 to N have been listed, the next step is to generate a vector with all bondomers for the N-carbon molecule or fragment thereof. Such a vector contains 2^(N-1) elements, of which the Eth element denotes the bondomer that has a retained C-C bond pattern equal to the (N-1)-bit binary representation of the number E-1 (i.e. bin(E-1,N-1)). In this binary number a one denotes a C-C bond that was already present in the medium substrate and a zero denotes a C-C bond that was newly formed in the metabolism (Chapter 5 of this thesis). This bondomer distribution vector is subsequently translated to a medium substrate fragment distribution. For example, bondomer 101 denotes a 4-carbon fragment in which the first and third C-C bonds were retained whereas the second C-C bond was formed in the metabolism. Consequently, the fragment consists of two joined 2-carbon fragments, each of which stems from a single medium substrate molecule. The medium substrate fragment distribution of bondomer 101 is {2,2}. Similarly, the medium substrate fragment distribution of bondomer 0011 equals {1,1,3}. Note that the sum of the lengths of all the medium substrate fragments equals the number N of carbon atoms in the studied molecule or fragment thereof. The bondomer distribution vector (bdv) and its medium substrate fragment distribution vector (fdv) of a 3-carbon molecule are given in Eq.2 below.
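The translation from a bondomer to its medium substrate fragment distribution can be sketched as follows. This is a minimal illustration, assuming the bondomer is written as a bit string with the first C-C bond leftmost; the function name `fragment_lengths` is chosen here and does not appear in the thesis.

```python
def fragment_lengths(bondomer):
    """Translate a bondomer bit string into medium substrate fragment lengths.

    '1' marks a C-C bond retained from the medium substrate,
    '0' marks a C-C bond newly formed in the metabolism.
    """
    lengths = [1]                    # the first carbon atom opens the first fragment
    for bond in bondomer:
        if bond == "1":
            lengths[-1] += 1         # retained bond: next carbon joins the current fragment
        elif bond == "0":
            lengths.append(1)        # new bond: next carbon opens a new fragment
        else:
            raise ValueError(f"unexpected bond symbol {bond!r}")
    return lengths


print(fragment_lengths("101"))       # [2, 2]
print(fragment_lengths("0011"))      # [1, 1, 3]

# All bondomers of a 3-carbon molecule (two C-C bonds), in binary order.
print([fragment_lengths(format(e, "02b")) for e in range(4)])
# [[1, 1, 1], [1, 2], [2, 1], [3]]
```

The fragment lengths always sum to the number of carbon atoms, i.e. to the number of bonds plus one.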


 {1,1,1}   00     01  {1, 2}    → fdv = bdv = ( 2)  {2,1}   10     11     {3}  We can now compute the frequencies of occurrence of all isotopomers for a given bondomer. For this, we need the fdv and the N vectors containing the frequencies of occurrence of each of the 21, 22, …, 2N isotopomers of fragments of length 1, 2, …, N that were set up in the previous section. We take the first fragment length (L1) of the row of the fdv that corresponds to the specific bondomer. The isotopomer distribution of the fragment is given by the isotopomer distribution vector for fragment length L1, which is from here on called idvL1. Next, we take the second fragment length (L2) and its corresponding idvL2. The idvL1+L2 for fragment combination {L1,L2} is subsequently calculated by multiplying each of the 2L1 elements of idvL1 with the entire idvL2 and vertically concatenating the resulting vectors. The resulting idvL1+L2 has a length 2L1*2L2=2(L1+L2). When there are more feed substrate fragments, we take the idvL3 of the next fragment, multiply each element of idvL1+L2 with this new isotopomer distribution vector and again vertically concatenate the resulting vectors. This procedure is repeated until all feed substrate fragments of the specific bondomer have been taken. As we saw earlier, the sum of all these fragments (L1+L2+…+L(last)) equals N. Consequently, the finally obtained idvL1+L2+…+L(last) has a length 2L1+L2+…+L(last)=2N and thereby contains the frequencies of all the 2N isotopomers of the N-carbon molecule. The above is done for each bondomer of a bdv. When the outcomes are horizontally concatenated, this results in a 2Nx2(N-1) bondomer-to-isotopomer mapping matrix BIMM in which each column corresponds to a bondomer and each row to an isotopomer. Element BIMMi,j of this matrix is the relative occurrence of the ith isotopomer when the N-carbon molecule would uniquely consist of the jth bondomer. 
It is thus observed that the uniform 13C-labeling of the medium substrate constrains the relative frequencies of occurrence of all isotopomers such that all 2^N isotopomer fractions can be calculated from only 2^(N-1) bondomer fractions.

Conversion of isotopomer to relative intensity or mass distribution

Finally, the frequencies of occurrence of the isotopomers have to be translated to the actually measured data. In their article, Möllney et al. (1999) give the measurement matrices M that map isotopomer distributions to measurement data vectors containing positional 13C-enrichments, relative intensities of NMR fine structures or mass fractions. The positional 13C-enrichment data are not relevant in this case, since we are considering 13C-labeling experiments in which a single, uniformly 13C-labeled substrate is used and all positional 13C-enrichments are therefore equal. When 2^N isotopomers are mapped to D data, the measurement matrix M has the dimensions D x 2^N. Multiplying this measurement matrix M with the 2^N x 2^(N-1) bondomer-to-isotopomer mapping matrix (BIMM) that was obtained in the previous section gives the D x 2^(N-1) probability matrix K (Chapter 5 of this thesis), of which the elements represent the chances of finding given relative intensities or mass fractions for each of the 2^(N-1) bondomers.

The measured mass fractions of a specific molecule or the fine structure areas of a given multiplet are commonly normalized to sum up to one. Therefore, all columns of probability matrix K have to sum up to one as well. This is always the case for probability matrices that convert bondomer fractions to mass fractions, because each isotopomer contributes to one of the mass fractions (i.e. all columns of M contain a one). For probability matrices that convert bondomer fractions to relative intensities, however, only the subgroup of isotopomers that are 13C-labeled at a specific carbon position contributes to the relative intensities (i.e. not all columns of M contain a one), resulting in columns of K of which the sum is smaller than one. These matrices must be scaled, which is done by dividing the unscaled matrix by the positional 13C-enrichment (Pt) for the given values of Pf and Pn:

$$
K = \frac{K_{\mathrm{unscaled}}}{P_f + (1-P_f) \cdot P_n} = \frac{M \cdot \mathrm{BIMM}}{P_f + (1-P_f) \cdot P_n}
\qquad (3)
$$

As stated above, all 2^N isotopomers of an N-carbon compound can be calculated from 2^(N-1) bondomer fractions (of which 2^(N-1)-1 are independent). This automatically leads to the conclusion that at most 2^(N-1)-1 independent 13C-labeling data, be it MS, NMR or a combination of both, can be measured for this N-carbon compound. The calculation of the 13C-labeling data vector (dv) from the bondomer distribution vector (bdv) is a simple linear transformation:

$$
\mathrm{dv} = K \cdot \mathrm{bdv}
\qquad (4)
$$
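The scaling of Eq.3 and the transformation of Eq.4 can be illustrated for the smallest non-trivial case, a two-carbon fragment (N=2). The sketch below rests on several assumptions that are not taken from the thesis: the BIMM columns are written out by hand from Eq.1, the observed carbon in the NMR case is taken to be the first (leftmost) carbon, the measurement matrices are constructed here rather than copied from Möllney et al. (1999), and the bondomer fractions in `bdv` are hypothetical.

```python
PF, PN = 0.25, 0.011                 # illustrative labeling fractions
PT = PF + (1 - PF) * PN              # overall 13C enrichment of every carbon position

# BIMM for N = 2: rows are isotopomers 00, 01, 10, 11;
# columns are bondomer 0 (bond newly formed) and bondomer 1 (bond retained).
col_new = [(1 - PT) ** 2, (1 - PT) * PT, PT * (1 - PT), PT ** 2]
col_retained = [
    (1 - PF) * (1 - PN) ** 2,
    (1 - PF) * PN * (1 - PN),
    (1 - PF) * PN * (1 - PN),
    PF + (1 - PF) * PN ** 2,
]
BIMM = [list(row) for row in zip(col_new, col_retained)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(a, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def column_sums(a):
    return [sum(col) for col in zip(*a)]


# Mass distribution: m+0, m+1, m+2. Every isotopomer contributes to one mass.
M_MASS = [[1, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 1]]
# NMR multiplet of carbon 1: singlet (10) and doublet (11).
# Isotopomers with 12C at carbon 1 are not observed at all.
M_NMR = [[0, 0, 1, 0],
         [0, 0, 0, 1]]

K_mass = matmul(M_MASS, BIMM)                        # columns already sum to one
K_unscaled = matmul(M_NMR, BIMM)                     # columns sum to PT
K_nmr = [[x / PT for x in row] for row in K_unscaled]   # Eq.3

bdv = [0.4, 0.6]                                     # hypothetical bondomer fractions
mdv = matvec(K_mass, bdv)                            # Eq.4, mass fractions
riv = matvec(K_nmr, bdv)                             # Eq.4, relative intensities

print([round(s, 6) for s in column_sums(K_unscaled)])   # both equal PT
print(round(sum(mdv), 12), round(sum(riv), 12))          # 1.0 1.0
```

The unscaled NMR columns sum to Pt because only molecules with a 13C-atom at the observed position contribute, which is exactly the factor removed by Eq.3.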

If the number of independent data points is smaller than the number of independent bondomer fractions, Eq.4 cannot be inverted, since K is then singular. A generalized solution for the bondomer distribution can, however, be calculated by applying the pseudo-inverse (indicated by ‘#’) of K. When mapping the covariance matrix of the data (Cdv) by the same pseudo-inverse of K, the mutual dependencies of the calculated bondomer fractions are accounted for in the resulting covariance matrix of the bondomers (Cbdv), as was also discussed in Chapter 5 of this thesis:

$$
\mathrm{bdv} = K^{\#} \cdot \mathrm{dv}, \qquad
C_{\mathrm{bdv}} = K^{\#} \cdot C_{\mathrm{dv}} \cdot \left( K^{\#} \right)^{T}
\qquad (5)
$$

6.3 PRACTICAL APPLICATION

The above algorithm will be illustrated by means of an example. It is assumed that the complete 8x1 bondomer distribution vector of the TCA-cycle intermediate oxaloacetate (bdvoaa) has been simulated (see Eq.6). We will convert these bondomers to the mass distribution vector of oxaloacetate (mdvoaa) and to the relative intensity vector (rivthr-α) of the multiplet of the α-carbon of threonine, an oxaloacetate-derived amino acid. This multiplet contains information about the first three of the four oxaloacetate carbon atoms. The relative intensity vector is to be computed from the bdvoaa,1-3 of the first two C-C bonds of oxaloacetate, which can be derived from bdvoaa as shown in Eq.6.


 ( 000= ) 0.55     ( 001= ) 0.06   ( 010= ) 0.03   ( 00= ) 0.55+0.06=0.61      ( 011= ) 0.13  ( 01= ) 0.03+0.13=0.16    , bdv oaa,1-3 = bdv oaa =   (10= ) 0.00+0.02=0.02  100= ) 0.00  (      (101= ) 0.02   (11= ) 0.20+0.01=0.21   (110= ) 0.20     (111= ) 0.01   

( 6)
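The reduction from bdvoaa to bdvoaa,1-3 in Eq.6 amounts to summing over both states of the third C-C bond. A minimal sketch, assuming the bondomers are stored in the binary order 000, 001, ..., 111 used in Eq.6:

```python
import numpy as np

# Bondomer fractions of oxaloacetate from Eq.6, ordered 000, 001, ..., 111
bdv_oaa = np.array([0.55, 0.06, 0.03, 0.13, 0.00, 0.02, 0.20, 0.01])

# Rows index the state of bonds 1 and 2 (00, 01, 10, 11);
# columns index the state of bond 3, which is summed out.
bdv_oaa_13 = bdv_oaa.reshape(4, 2).sum(axis=1)
print(bdv_oaa_13)
```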

• Step 1: We first set up 4 vectors containing the frequencies of occurrence in the medium substrate of each of the 2, 4, 8 and 16 isotopomers of fragments of length 1, 2, 3 and 4. The natural 13C-labeling fraction (Pn) is assumed to be 0.011 and the fraction of uniformly 13C-labeled medium substrate (Pf) is 0.250. We apply Eq.1 for each fragment length. An example is shown in Eq.7 for fragment length L=2 and the complete list of frequencies of occurrence of all isotopomers is shown in Eq.8.

      0.011^  ∑ 00  ⋅ 0.989 ^  ∑ 00    ones   zeros     0       0.734  0.011^ 01 0.989 ^ 01 ⋅    ∑   ∑    0.008  0 ones    zeros      idv = 0.250 ⋅   + 0.750 ⋅   =  0.008  0      0.011^  ∑ 10  ⋅ 0.989 ^  ∑ 10      1    ones   zeros    0.250       0.011^  11 ⋅ 0.989 ^  11  ∑  ∑     ones   zeros    isotopomers of fragment lengths 1 to 4: 0  1                      

124

00 01 10 11

000 001 010 011 100 101 110 111

0000   0.742   0001   0.258 0010     0011   0100     0101   0110    = 0111     1000   1001     1010   1011     1100   1101     1110    1111  

frequency of occurrence: 0.734 0.008 0.008 0.250

0.726 0.008 0.008 0.000 0.008 0.000 0.000 0.250

0.718   0.008  0.008   0.000  0.008   0.000  0.000   0.000   0.008  0.000   0.000  0.000   0.000  0.000   0.000   0.250 

(8)

(7)
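Step 1 can be sketched in a few lines. The function name `fragment_idv` and its argument names are illustrative assumptions; the calculation follows the structure shown in Eq.7, with a uniformly labeled part (weight Pf) and a naturally labeled part (weight 1-Pf).

```python
import numpy as np

def fragment_idv(length, p_f=0.250, p_n=0.011):
    """Frequencies of occurrence of the 2**length isotopomers of a
    medium-substrate fragment, in binary order (00...0 first, 11...1 last)."""
    n_ones = np.array([bin(i).count("1") for i in range(2 ** length)])
    # Naturally labeled part: each carbon is 13C with probability p_n
    natural = p_n ** n_ones * (1.0 - p_n) ** (length - n_ones)
    # Uniformly labeled part: only the all-ones isotopomer occurs
    uniform = np.zeros(2 ** length)
    uniform[-1] = 1.0
    return p_f * uniform + (1.0 - p_f) * natural

for L in (1, 2, 3, 4):
    print(L, np.round(fragment_idv(L), 3))
```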

• Step 2: Next, we set up a bondomer distribution vector and translate it to the corresponding medium substrate fragment distribution. As the mass distribution is determined for the entire oxaloacetate molecule and the relative intensity vector only represents a 3-carbon fragment thereof, a bdv and medium substrate distribution vector have to be set up both for lengths 3 and 4 (see Eq.9).

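\[
\mathrm{bdv}_{\mathrm{oaa}} =
\begin{pmatrix} 000 \\ 001 \\ 010 \\ 011 \\ 100 \\ 101 \\ 110 \\ 111 \end{pmatrix}
\rightarrow
\mathrm{fdv}_{\mathrm{oaa}} =
\begin{pmatrix} \{1,1,1,1\} \\ \{1,1,2\} \\ \{1,2,1\} \\ \{1,3\} \\ \{2,1,1\} \\ \{2,2\} \\ \{3,1\} \\ \{4\} \end{pmatrix},
\qquad
\mathrm{bdv}_{\mathrm{oaa},1\text{-}3} =
\begin{pmatrix} 00 \\ 01 \\ 10 \\ 11 \end{pmatrix}
\rightarrow
\mathrm{fdv}_{\mathrm{oaa},1\text{-}3} =
\begin{pmatrix} \{1,1,1\} \\ \{1,2\} \\ \{2,1\} \\ \{3\} \end{pmatrix}
\tag{9}
\]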
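The translation from a bondomer to its fragment lengths in Eq.9 follows a simple rule: an intact bond (1) extends the current fragment, a broken bond (0) starts a new one. A small sketch, with the function name `bondomer_to_fragments` chosen here for illustration:

```python
def bondomer_to_fragments(bondomer):
    """Translate a bondomer string such as '010' into fragment lengths."""
    fragments = [1]                 # the first carbon starts the first fragment
    for bond in bondomer:
        if bond == "1":
            fragments[-1] += 1      # intact bond: next carbon joins the fragment
        else:
            fragments.append(1)     # broken bond: next carbon starts a new fragment
    return fragments

for b in ("000", "001", "010", "011", "100", "101", "110", "111"):
    print(b, bondomer_to_fragments(b))
```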

We can now compute the frequencies of occurrence of all isotopomers for all elements of the above bdvs. This is illustrated for one of the elements of bdvoaa, namely 010. The row of fdvoaa that corresponds to this bondomer is {1,2,1}. We take the first fragment length (L1=1) and the respective idvL1 in Eq.8. Subsequently, we take the second fragment length (L2=2) and its corresponding idvL2. The idvL1+L2 for fragment combination {1,2} is calculated by multiplying each of the elements of idvL1 with the entire idvL2 and vertically concatenating the resulting vectors. This is repeated for the third fragment length (L3=1). The calculation of idvL1+L2+L3 is shown in Eq.10.

 0.742  idv L1 =    0.258  idv L1+L2+L3

idv L1+L2

  0.734    0.544        0.742 ⋅  0.008    0.006    0.008    0.006         0.250    0.186   = = 0.734    0.189        0.002   0.008     0.258 ⋅   0.008   0.002      0.250    0.064       

  0.742   0.404   0.544 ⋅    0.258       0.141     0.742    0.005   0.006 ⋅     0.258    0.002     0.006 ⋅  0.742    0.005        0.258    0.002      0.742    0.138    0.186 ⋅  0.258    0.048     = =     0.742    0.141   0.189 ⋅      0.258    0.049     0.002  0.742     0.002 ⋅       0.001   0.258      0.002   0.742      0.002 ⋅     0.001    0.258      0.048     0.742     0.064 ⋅     0.017  0.258    

(10 )
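Multiplying each element of one vector with the whole of the next vector and stacking the results is a Kronecker product, so the calculation of Eq.10 can be sketched with `numpy.kron`. The rounded inputs of Eq.8 are used here, so the third decimal may differ slightly from Eq.10.

```python
import numpy as np
from functools import reduce

idv_1 = np.array([0.742, 0.258])                 # fragment length 1 (Eq.8)
idv_2 = np.array([0.734, 0.008, 0.008, 0.250])   # fragment length 2 (Eq.8)

# Bondomer 010 corresponds to the fragment combination {1,2,1}
idv_010 = reduce(np.kron, [idv_1, idv_2, idv_1])
print(np.round(idv_010, 3))
```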


Performing the above calculation for all bondomers in Eq.9 yields a 16x8 BIMM (the third column of which was calculated in Eq.10) for the entire oxaloacetate molecule and an 8x4 BIMM for the 3-carbon fragment thereof.

• Step 3: The measurement matrices for the mass distribution of oxaloacetate (Mmdv,oaa) and for the relative intensity vector of the threonine-α multiplet (Mriv,thr-α) are shown in Eq.11:

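\[
M_{\mathrm{mdv,oaa}} =
\begin{array}{c|cccccccccccccccc}
\text{mass} \downarrow \;/\; \text{isotopomer} \rightarrow & 0000 & 0001 & 0010 & 0011 & 0100 & 0101 & 0110 & 0111 & 1000 & 1001 & 1010 & 1011 & 1100 & 1101 & 1110 & 1111 \\ \hline
M+0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
M+1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
M+2 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 \\
M+3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 \\
M+4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}
\]

\[
M_{\mathrm{riv,thr}\text{-}\alpha} =
\begin{array}{c|cccccccc}
\text{fine structure} \downarrow \;/\; \text{isotopomer} \rightarrow & 000 & 001 & 010 & 011 & 100 & 101 & 110 & 111 \\ \hline
s  & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
d1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
d2 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
dd & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}
\tag{11}
\]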
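The two measurement matrices of Eq.11 can be generated rather than typed in. The sketch below assumes isotopomers are indexed in binary order; the function names are illustrative only.

```python
import numpy as np

def mass_measurement_matrix(n_carbons):
    """Row k has a one for every isotopomer that contains k 13C atoms."""
    n_iso = 2 ** n_carbons
    M = np.zeros((n_carbons + 1, n_iso), dtype=int)
    for i in range(n_iso):
        M[bin(i).count("1"), i] = 1
    return M

def multiplet_measurement_matrix():
    """Fine structures of the central carbon of a 3-carbon fragment (Eq.11).
    Only isotopomers with a 13C at the central position (x1x) contribute."""
    fine_structures = {"s": "010", "d1": "011", "d2": "110", "dd": "111"}
    M = np.zeros((4, 8), dtype=int)
    for row, isotopomer in enumerate(fine_structures.values()):
        M[row, int(isotopomer, 2)] = 1
    return M

M_mdv_oaa = mass_measurement_matrix(4)
M_riv_thr = multiplet_measurement_matrix()
print(M_mdv_oaa)
print(M_riv_thr)
```

Every column of the mass measurement matrix contains exactly one 1, whereas half of the columns of the multiplet matrix are empty, which is why the corresponding probability matrix needs the scaling of Eq.3.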

Multiplying these measurement matrices with the respective BIMMs yields the probability matrices K with which mdvoaa can be calculated from bdvoaa and rivthr-α can be calculated from bdvoaa,1-3. The probability matrices are shown in Eq.12. Note that K(bdvoaa,1-3→rivthr-α) was scaled to have columns summing up to one by means of Eq.3.

\[
K(\mathrm{bdv}_{\mathrm{oaa}} \rightarrow \mathrm{mdv}_{\mathrm{oaa}}) =
\begin{array}{c|cccccccc}
\text{mass} \downarrow \;/\; \text{bondomer} \rightarrow & 000 & 001 & 010 & 011 & 100 & 101 & 110 & 111 \\ \hline
M+0 & 0.303 & 0.404 & 0.404 & 0.538 & 0.404 & 0.538 & 0.538 & 0.718 \\
M+1 & 0.422 & 0.290 & 0.290 & 0.205 & 0.290 & 0.024 & 0.205 & 0.032 \\
M+2 & 0.220 & 0.193 & 0.193 & 0.007 & 0.193 & 0.367 & 0.007 & 0.001 \\
M+3 & 0.051 & 0.097 & 0.097 & 0.186 & 0.097 & 0.008 & 0.186 & 0.000 \\
M+4 & 0.004 & 0.017 & 0.017 & 0.065 & 0.017 & 0.063 & 0.065 & 0.250
\end{array}
\tag{12}
\]

\[
K(\mathrm{bdv}_{\mathrm{oaa},1\text{-}3} \rightarrow \mathrm{riv}_{\mathrm{thr}\text{-}\alpha}) =
\begin{array}{c|cccc}
\text{fine structure} \downarrow \;/\; \text{bondomer} \rightarrow & 00 & 01 & 10 & 11 \\ \hline
s  & 0.550 & 0.023 & 0.023 & 0.031 \\
d1 & 0.192 & 0.718 & 0.008 & 0.000 \\
d2 & 0.192 & 0.008 & 0.718 & 0.000 \\
dd & 0.067 & 0.250 & 0.250 & 0.968
\end{array}
\tag{12, continued}
\]

• Step 4: Finally, the probability matrices in Eq.12 are multiplied with the bdvs of Eq.6 (see Eq.4) in order to give the data vectors. The resulting mass fraction vector of oxaloacetate and the relative intensity vector of threonine-α are shown in Eq.13:

\[
\mathrm{mdv}_{\mathrm{oaa}} =
\begin{pmatrix}
(M+0=)\; 0.398 \\
(M+1=)\; 0.327 \\
(M+2=)\; 0.148 \\
(M+3=)\; 0.098 \\
(M+4=)\; 0.029
\end{pmatrix},
\qquad
\mathrm{riv}_{\mathrm{thr}\text{-}\alpha} =
\begin{pmatrix}
(s=)\; 0.346 \\
(d1=)\; 0.232 \\
(d2=)\; 0.133 \\
(dd=)\; 0.289
\end{pmatrix}
\tag{13}
\]
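The four steps of the example can be combined into one short script. This is a sketch under the conventions used in the example (binary ordering of bondomers and isotopomers, Pf = 0.250, Pn = 0.011); all function and variable names are illustrative assumptions and this is not the simulation software referred to in this thesis.

```python
import numpy as np
from functools import reduce

P_F, P_N = 0.250, 0.011

def n_ones(n_carbons):
    return np.array([bin(i).count("1") for i in range(2 ** n_carbons)])

def fragment_idv(length):
    # Step 1: isotopomer frequencies of a medium-substrate fragment
    ones = n_ones(length)
    idv = (1 - P_F) * P_N ** ones * (1 - P_N) ** (length - ones)
    idv[-1] += P_F                      # uniformly labeled substrate
    return idv

def fragments(bondomer):
    # Step 2: bondomer string -> fragment lengths, e.g. '010' -> [1, 2, 1]
    out = [1]
    for bond in bondomer:
        if bond == "1":
            out[-1] += 1
        else:
            out.append(1)
    return out

def bimm(n_carbons):
    # One column of isotopomer frequencies per bondomer (Kronecker products)
    n_bonds = n_carbons - 1
    cols = [reduce(np.kron, [fragment_idv(L) for L in fragments(format(b, f"0{n_bonds}b"))])
            for b in range(2 ** n_bonds)]
    return np.column_stack(cols)

# Step 3: measurement matrices of Eq.11
ones4 = n_ones(4)
M_mdv = np.array([(ones4 == k).astype(float) for k in range(5)])
M_riv = np.zeros((4, 8))
for row, isotopomer in enumerate(["010", "011", "110", "111"]):   # s, d1, d2, dd
    M_riv[row, int(isotopomer, 2)] = 1.0

K_mdv = M_mdv @ bimm(4)
K_riv = M_riv @ bimm(3) / (P_F + (1 - P_F) * P_N)                 # scaling of Eq.3

# Step 4: data vectors from the bondomer distribution of Eq.6
bdv_oaa = np.array([0.55, 0.06, 0.03, 0.13, 0.00, 0.02, 0.20, 0.01])
bdv_oaa_13 = bdv_oaa.reshape(4, 2).sum(axis=1)
mdv_oaa = K_mdv @ bdv_oaa
riv_thr = K_riv @ bdv_oaa_13
print(np.round(mdv_oaa, 3))
print(np.round(riv_thr, 3))
```

Because unrounded intermediate values are used throughout, individual matrix elements may differ in the last printed digit from the rounded values shown in the example.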

6.4 CONCLUSIONS

In this chapter we have presented a straightforward algorithm that yields probability matrices that can be used to convert bondomer distribution vectors to 13C-labeling data, irrespective of whether these data consist of mass fractions (measured by mass spectrometry) or relative intensities of fine structures in multiplets (measured by nuclear magnetic resonance spectroscopy). The method can easily be implemented in bondomer simulation software based on the theory presented in Chapter 5 in order to allow the simulation of specific measurement data for a given metabolic network and set of reaction rates. The algorithm was illustrated by a realistic example in which the mass distribution of oxaloacetate and the relative intensities in the multiplet of the α-carbon of the oxaloacetate-derived amino acid threonine were computed from the bondomer distribution of oxaloacetate.

REFERENCES

Möllney, M., Wiechert, W., Kownatzki, D., De Graaf, A.A. (1999) Bidirectional reaction steps in metabolic networks: IV. Optimal design of isotopomer labeling experiments. Biotechnol. Bioeng., 66, 2: 86-103
Szyperski, T. (1995) Biosynthetically directed fractional 13C-labelling of proteinogenic amino acids. Eur. J. Biochem., 232: 433-448
Schmidt, K., Carlsen, M., Nielsen, J., Villadsen, J. (1997) Modeling isotopomer distributions in biochemical networks using isotopomer mapping matrices. Biotechnol. Bioeng., 55, 6: 831-840


Chapter 7

Correcting Mass Isotopomer Distributions for Naturally Occurring Isotopes

A shortened version of this chapter is accepted for publication as Van Winden et al. (2002)

ABSTRACT

In one method of metabolic flux analysis simulated mass spectrometry data is fitted to measured mass distributions of metabolites that are isolated from cultures with defined feeding of 13C-labeled substrates. Doing so, simulated mass distributions must be corrected for the presence of naturally occurring isotopes. A method that was recently introduced for this purpose consists of consecutive correction steps for each isotope of each element in the considered compound. Here we show that all isotopes of each individual element must, however, be corrected in one single step. Furthermore, it is shown that in order to correctly take into account the presence of naturally occurring isotopes, the source of information with respect to isotopic compositions of the elements needs to be chosen with care.


7.1 INTRODUCTION

Tracer studies using 13C-labeled substrates have become an important tool for metabolic flux analysis and metabolic network analysis. The flux distribution in the carbon metabolism of a cell can be determined by setting up a mathematical model that simulates the 13C-labeling distributions in all relevant carbon compounds based on the chosen substrate labeling and a set of reaction rates, and by fitting the rates that give the best correspondence between the simulated data and a set of actually measured 13C-labeling data. The measurement data used for such studies can be obtained by NMR spectroscopy (Schmidt et al.,1999), mass spectrometry (Wittmann and Heinzle,1999) or both (Möllney et al.,1999).

When fitting simulated to measured mass distribution vectors (mdv), it should be taken into account that most biologically relevant elements occur in two or more stable isotopic forms (see Table 1). To avoid errors in the fluxes determined from these studies, the occurrence of natural isotopes has to be considered. In order to determine the mass distribution of a measured compound that is purely caused by incorporation of 13C-atoms of the labeled substrate, one has to subtract the mass contributions of naturally occurring isotopes from the measured spectra. This requires a number of sequential multiple linear regression analyses (Lee et al.,1991). As Wittmann and Heinzle (1999) showed, one can alternatively choose not to correct the measured data but to include the correction for the presence of naturally occurring isotopes in the simulation model, which is rather straightforward.

The presence of naturally labeled 13C-atoms can easily be taken into account when defining the substrate isotopomer distribution that is used as an input for the simulation model. For example, due to the occurrence of 1.1% naturally labeled 13C, the isotopomer distribution of 'unlabeled' ethanol is: 97.8% 12C2-ethanol, 1.1% [1-12C, 2-13C]-ethanol, 1.1% [1-13C, 2-12C]-ethanol and 0.0% 13C2-ethanol. The isotopomer distribution of a carbon feed consisting of 10% uniformly labeled ethanol and 90% unlabeled ethanol should therefore be defined as 0.9*97.8=88.0% 12C2-ethanol, 0.9*1.1=1.0% [1-12C,2-13C]-ethanol, 0.9*1.1=1.0% [1-13C,2-12C]-ethanol and 0.9*0.0+0.1*100.0=10.0% 13C2-ethanol.

However, as Table 1 shows, carbon is not the only atomic species that has natural isotopes. Natural isotopes other than those of carbon have to be considered as well if one is to compare simulated mass distributions with actually measured ones. Especially when GC/MS is employed for measuring mass distributions, the correction for naturally occurring isotopes other than carbon is mandatory, because commonly used derivatizing agents (e.g. tert-butyldimethylsilyl for organic acids and N-trifluoroacetyl for amino acids (Wittmann and Heinzle,1999)) add many more atoms that may be naturally labeled to an analyzed component. Although the method introduced by Wittmann and Heinzle (1999) offers an elegant solution to the problem of naturally occurring isotopes other than carbon, it contains an error that is corrected in this chapter. The effects of neglecting this error are illustrated by means of two examples, on the calculation of mass isotopomer distributions and on the estimation of flux distributions from experimental mass spectrometry data. In addition, the effects of using different values for the occurrence of natural isotopes are shown.
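The ethanol feed example from this introduction can be checked with a few lines of code. The isotopomer ordering and variable names below are choices made for this sketch.

```python
import numpy as np

P_N = 0.011   # natural 13C fraction used in the ethanol example

# Isotopomers ordered as [12C2, 1-12C/2-13C, 1-13C/2-12C, 13C2]
unlabeled = np.array([(1 - P_N) ** 2, (1 - P_N) * P_N, P_N * (1 - P_N), P_N ** 2])
uniform = np.array([0.0, 0.0, 0.0, 1.0])

# Feed of 90% unlabeled and 10% uniformly labeled ethanol
feed = 0.9 * unlabeled + 0.1 * uniform
print(np.round(100 * feed, 1))   # percentages
```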


TABLE 1: Isotopic compositions of selected elements

A (Rosman and Taylor, 1998)
element   most abundant mass (M)   fraction M+0   fraction M+1   fraction M+2
H         1                        0.999885       0.000115       0.00000
C         12                       0.9893         0.0107         0.00000
N         14                       0.99632        0.00368        0.00000
O         16                       0.99757        0.00038        0.00205
Si        28                       0.922297       0.046832       0.030872
P         31                       1.00000        0.00000        0.00000
S         32                       0.9493         0.0076         0.0429

B (Lieser, 1969)
element   most abundant mass (M)   fraction M+0   fraction M+1   fraction M+2
H         1                        0.99985        0.00015        0.00000
C         12                       0.98893        0.01107        0.00000
N         14                       0.99634        0.00366        0.00000
O         16                       0.99759        0.00037        0.00204
Si        28                       0.9217         0.0471         0.0312
P         31                       1.0000         0.0000         0.0000
S         32                       0.9502         0.0075         0.0421

7.2 THEORY

Correction method by Wittmann and Heinzle

Wittmann and Heinzle (1999) corrected a simulated mass distribution that results from differential 13C-labeling for the presence of isotopes of other elements by multiplication of the mass distribution vector with a separate correction matrix for each possible isotope, i.e. one for 2H, one for 17O, one for 18O, etc. This method is faulty in that all possible isotopes of one element have to be considered in one single correction matrix. This point can be understood by applying the correction procedure, for example, to the mass distribution of carbon dioxide. According to Wittmann and Heinzle the mass distribution of CO2 that results from the differential 13C-labeling (i.e. fractions 'M+0' containing 12C and 'M+1' containing 13C) is first corrected by multiplying with a matrix that contains the fractions that shift by zero, one or two Dalton due to the presence of respectively none, one or two 17O isotopes in the molecule. The resulting mass distribution is subsequently corrected by multiplying with a matrix that contains the fractions that shift by zero, two or four Dalton due to the presence of respectively none, one or two 18O isotopes in the molecule. It can easily be seen that the fractions that are shifted by two Dalton in the first correction have to contain two 17O atoms. Since atoms cannot be 17O and 18O at the same time, these must not be corrected anymore for the possible presence of 18O. Likewise, fractions that are shifted by one Dalton in the first correction must only be further corrected for the possible presence of a single, not two, 18O-atoms.

Correct method

The fact that atoms can only be one isotope at a time is acknowledged by correcting for the possible presence of all isotopes of one element in a single step. In the example of CO2 this is done by multiplying with a matrix that accounts for mass shift due to either 17O or 18O. For the above example of carbon dioxide the correct correction is shown in Eq.1.

\[
\mathrm{mdv}_{\mathrm{CO_2,corr}} = C_{\mathrm{CO_2,O}} \cdot \mathrm{mdv}_{\mathrm{CO_2,uncorr}}
\]

\[
\mathrm{mdv}_{\mathrm{CO_2,uncorr}} =
\begin{pmatrix} {}^{12}\mathrm{C}(\mathrm{O}_2) \\ {}^{13}\mathrm{C}(\mathrm{O}_2) \end{pmatrix},
\qquad
C_{\mathrm{CO_2,O}} =
\begin{pmatrix}
{}^{16}\mathrm{O}_2 & 0 \\
{}^{16}\mathrm{O}\,{}^{17}\mathrm{O} & {}^{16}\mathrm{O}_2 \\
{}^{17}\mathrm{O}_2 + {}^{16}\mathrm{O}\,{}^{18}\mathrm{O} & {}^{16}\mathrm{O}\,{}^{17}\mathrm{O} \\
{}^{17}\mathrm{O}\,{}^{18}\mathrm{O} & {}^{17}\mathrm{O}_2 + {}^{16}\mathrm{O}\,{}^{18}\mathrm{O} \\
{}^{18}\mathrm{O}_2 & {}^{17}\mathrm{O}\,{}^{18}\mathrm{O} \\
0 & {}^{18}\mathrm{O}_2
\end{pmatrix}
\tag{1}
\]

In Eq.1 the isotopic compositions of the elements between brackets in the mass distribution vectors (mdv) are undefined. The chemical formulas of the isotopologues (i.e. molecular species that have identical elemental and chemical compositions but differ in isotopic content (Hellerstein and Neese,1999)) of the oxygen moiety of carbon dioxide in the correction matrix CCO2,O represent their respective abundances. Dauner and Sauer (2000) presented a general equation to calculate these abundances. An equally general, but simpler form of this equation is presented in Eq.2, where the atom species of the considered elemental isotopologue has N naturally occurring isotopes I(1),...,I(N) with natural abundances p(I(1)),...,p(I(N)) and with frequencies f(I(1)),...,f(I(N)) within the isotopologue.

\[
\mathrm{abundance} = \left( \sum_{i=1}^{N} f(I(i)) \right)! \cdot \prod_{i=1}^{N} \frac{p(I(i))^{\,f(I(i))}}{f(I(i))!} \tag{2}
\]

Eq.2 shows that the abundance of an isotopologue is the product of the number of possible permutations of the isotopes in the isotopologue and of the abundances of each isotope to the power of its frequency in the isotopologue. E.g. the abundance of 16O17O in Eq.1 is 2 (=nr. of permutations) * (0.99757)^1 (=occurrence 16O) * (0.00038)^1 (=occurrence 17O) = 7.6×10^-4. As an example of the correction of the mass distribution of a molecule that contains two elements other than carbon, the correction of formic acid is presented in Eq.3:

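\[
\mathrm{mdv}_{\mathrm{HCOOH,corr,HO}} = C_{\mathrm{HCOOH,O}} \cdot \mathrm{mdv}_{\mathrm{HCOOH,corr,H}} = C_{\mathrm{HCOOH,O}} \cdot C_{\mathrm{HCOOH,H}} \cdot \mathrm{mdv}_{\mathrm{HCOOH,uncorr}}
\]

\[
\mathrm{mdv}_{\mathrm{HCOOH,uncorr}} =
\begin{pmatrix} {}^{12}\mathrm{C}(\mathrm{H}_2\mathrm{O}_2) \\ {}^{13}\mathrm{C}(\mathrm{H}_2\mathrm{O}_2) \end{pmatrix},
\qquad
C_{\mathrm{HCOOH,H}} =
\begin{pmatrix}
{}^{1}\mathrm{H}_2 & 0 \\
{}^{1}\mathrm{H}\,{}^{2}\mathrm{H} & {}^{1}\mathrm{H}_2 \\
{}^{2}\mathrm{H}_2 & {}^{1}\mathrm{H}\,{}^{2}\mathrm{H} \\
0 & {}^{2}\mathrm{H}_2
\end{pmatrix}
\]

\[
C_{\mathrm{HCOOH,O}} =
\begin{pmatrix}
{}^{16}\mathrm{O}_2 & 0 & 0 & 0 \\
{}^{16}\mathrm{O}\,{}^{17}\mathrm{O} & {}^{16}\mathrm{O}_2 & 0 & 0 \\
{}^{17}\mathrm{O}_2 + {}^{16}\mathrm{O}\,{}^{18}\mathrm{O} & {}^{16}\mathrm{O}\,{}^{17}\mathrm{O} & {}^{16}\mathrm{O}_2 & 0 \\
{}^{17}\mathrm{O}\,{}^{18}\mathrm{O} & {}^{17}\mathrm{O}_2 + {}^{16}\mathrm{O}\,{}^{18}\mathrm{O} & {}^{16}\mathrm{O}\,{}^{17}\mathrm{O} & {}^{16}\mathrm{O}_2 \\
{}^{18}\mathrm{O}_2 & {}^{17}\mathrm{O}\,{}^{18}\mathrm{O} & {}^{17}\mathrm{O}_2 + {}^{16}\mathrm{O}\,{}^{18}\mathrm{O} & {}^{16}\mathrm{O}\,{}^{17}\mathrm{O} \\
0 & {}^{18}\mathrm{O}_2 & {}^{17}\mathrm{O}\,{}^{18}\mathrm{O} & {}^{17}\mathrm{O}_2 + {}^{16}\mathrm{O}\,{}^{18}\mathrm{O} \\
0 & 0 & {}^{18}\mathrm{O}_2 & {}^{17}\mathrm{O}\,{}^{18}\mathrm{O} \\
0 & 0 & 0 & {}^{18}\mathrm{O}_2
\end{pmatrix}
\tag{3}
\]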
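Eq.2 and the construction of the correction matrices in Eq.1 and Eq.3 can be sketched as follows. This is an illustration in Python rather than the Matlab program mentioned in this chapter; the function names, the restriction to three isotopes per element (M+0, M+1, M+2) and the uncorrected mass distribution of formic acid are assumptions made for the example. Isotope fractions are taken from Table 1A.

```python
import numpy as np
from math import factorial
from itertools import product

# Fractions of M+0, M+1, M+2 (Table 1A, Rosman and Taylor, 1998)
HYDROGEN = [0.999885, 0.000115, 0.0]
OXYGEN = [0.99757, 0.00038, 0.00205]

def abundance(freqs, probs):
    """Eq.2: abundance of an isotopologue with isotope frequencies freqs."""
    value = float(factorial(sum(freqs)))
    for f, p in zip(freqs, probs):
        value *= p ** f / factorial(f)
    return value

def correction_matrix(n_atoms, probs, n_cols):
    """Single-step correction matrix for n_atoms of one element."""
    shifts = np.zeros(2 * n_atoms + 1)
    for freqs in product(range(n_atoms + 1), repeat=3):
        if sum(freqs) == n_atoms:
            # mass shift in Dalton = (number of M+1) + 2 * (number of M+2)
            shifts[freqs[1] + 2 * freqs[2]] += abundance(freqs, probs)
    shifts = np.trim_zeros(shifts, "b")          # drop impossible mass shifts
    C = np.zeros((len(shifts) + n_cols - 1, n_cols))
    for col in range(n_cols):
        C[col:col + len(shifts), col] = shifts   # same column, shifted down
    return C

# Formic acid (Eq.3): first hydrogen, then oxygen
mdv_uncorr = np.array([0.9, 0.1])                # hypothetical 13C mass distribution
C_H = correction_matrix(2, HYDROGEN, n_cols=2)
C_O = correction_matrix(2, OXYGEN, n_cols=C_H.shape[0])
mdv_corr = C_O @ C_H @ mdv_uncorr
print(abundance([1, 1, 0], OXYGEN))              # abundance of 16O17O
print(mdv_corr)
```

The number of columns of each matrix is set to the number of rows of the previous one, so that mass fractions created by one element are themselves corrected for the next element.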

Eq.3 shows that in the uncorrected mass distribution vector only the carbon isotopes are defined and that in each subsequent multiplication step, the isotopes of one of the elements become defined. It is also observed that each correction step does not only correct the originally present mass fractions (M+0 and M+1), but also the mass fractions that originate from mass shifts caused by other elements. In other words: the number of columns in each subsequent correction matrix has to equal the number of rows in the previous one.

When correcting molecules with many atoms of a given element, the number of possible mass shifts is considerable. The component glucose 6-phosphate, for example, contains 11 hydrogen atoms that may cause mass shifts of 1 to 11 Dalton and 9 oxygen atoms that may cause additional mass shifts of 1 to 18 Dalton (see Table 1). One can, however, restrict the number of mass shifts that are considered (i.e. the number of rows in the matrix Ccompound,element) to only those fractions that are larger than a given minimum value. This generally results in a small number of possible mass shifts due to the relatively rare occurrence of most isotopes in biological molecules. The presented procedure was implemented in Matlab (The Mathworks, Inc.). This program generates the correction matrices based on the chemical formula of a given metabolic intermediate and multiplies these matrices with the simulated mass distribution vector that only accounts for the 13C-labeling.

Isotope abundance information

Besides using a correct correction method, it is of course important to use correct values for the natural isotope abundance. Literature data on the occurrence of natural isotopes differ markedly, as is shown for two different sources in Table 1A and B. The consequences of different values for isotope abundances on metabolic flux analysis are illustrated below.
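The restriction to mass shifts above a minimum fraction, described earlier in this section, can be illustrated for the nine oxygen atoms of glucose 6-phosphate. In this sketch the mass-shift distribution is obtained by repeated convolution of the single-atom isotope fractions, which gives the same result as summing the abundances of Eq.2 per mass shift; the threshold of 1e-6 and the function names are arbitrary assumptions.

```python
import numpy as np

OXYGEN = [0.99757, 0.00038, 0.00205]   # Table 1A: fractions of M+0, M+1, M+2

def shift_distribution(n_atoms, probs):
    """Fractions shifted by 0, 1, ..., 2*n_atoms Dalton for n_atoms of one element."""
    dist = np.array([1.0])
    for _ in range(n_atoms):
        dist = np.convolve(dist, probs)
    return dist

def truncate(dist, min_fraction=1e-6):
    """Drop the tail beyond the last mass shift whose fraction exceeds min_fraction."""
    keep = np.nonzero(dist > min_fraction)[0]
    return dist[: keep[-1] + 1]

full = shift_distribution(9, OXYGEN)   # mass shifts of 0 to 18 Dalton
short = truncate(full)
print(len(full), len(short), short.sum())
```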
7.3 PRACTICAL APPLICATIONS

Effects of correction method and isotopic composition on corrected mass isotopomer distributions

The difference between the old and newly proposed mass distribution corrections is illustrated by the case of the tert-butyldimethylsilyl derivative of acetoacetate with the chemical formula C16H35O3Si2 (see Figure 1). The uncorrected mass distribution of the acetoacetate molecule (of which the isotopomer distribution is assumed to be simulated) consists of five fractions stemming from the presence of zero to four 13C-atoms. The 13C-labeling of the derivatizing agent is not simulated, as it is not involved in the metabolism. Therefore, not only the mass shifts caused by the non-carbon atoms in the derivatized molecule have to be corrected for but also those that are caused by naturally occurring 13C-isotopes for any of the twelve carbon atoms in the derivatizing agent.

FIGURE 1: Structure of tert-butyldimethylsilyl derivative of acetoacetate.

The column 'none' of Table 2 shows an arbitrarily chosen uncorrected mass distribution that is caused by 13C-labeling of the acetoacetate moiety of the derivatized acetoacetate molecule. The next two columns of the same table show the mass distributions that follow from the correction by the method of Wittmann and Heinzle (1999) and by the one presented in this chapter. The differences between the uncorrected and the corrected mass distributions clearly show the importance of correcting for the presence of naturally occurring isotopes. Admittedly the differences between the outcomes of the two correction methods are relatively small, but the error is systematic. If one wants to take full advantage of the measurement accuracy of mass spectrometry, all errors, however small, must be corrected if possible.

TABLE 2: Uncorrected mass distribution vector of the tert-butyldimethylsilyl derivative of acetoacetate and mass distribution corrected by two different methods and two different sources of isotope abundance information

correction method >         none             Wittmann & Heinzle          newly proposed method       newly proposed method
source of isotopic
compositions >              not applicable   (Rosman and Taylor, 1998)   (Rosman and Taylor, 1998)   (Lieser, 1969)
molecular mass \/           mass distribution \/
331 (M+0)                   0.6000           0.4449                      0.4435                      0.4404
332 (M+1)                   0.2000           0.2521                      0.2527                      0.2538
333 (M+2)                   0.1000           0.1505                      0.1523                      0.1531
334 (M+3)                   0.0700           0.0910                      0.0905                      0.0910
335 (M+4)                   0.0300           0.0454                      0.0452                      0.0455
336 (M+5)                   0.0000           0.0120                      0.0120                      0.0122
337 (M+6)                   0.0000           0.0033                      0.0032                      0.0033
338 (M+7)                   0.0000           0.0006                      0.0005                      0.0005
339 (M+8)                   0.0000           0.0001                      0.0001                      0.0001
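The comparison in Table 2 can be sketched in code. Multiplying with the banded correction matrices of Eq.1 and Eq.3 is equivalent to a convolution with the mass-shift distribution of the element, which is what the sketch uses. The function `per_isotope` is an assumed reading of the sequential scheme (one binomial correction per isotope) based on its description in Section 7.2, not code from Wittmann and Heinzle (1999); under this reading the leading entries come out close to the corresponding columns of Table 2.

```python
import numpy as np

# Fractions of M+0, M+1, M+2 from Table 1A and Table 1B
ROSMAN_TAYLOR = {"C": [0.9893, 0.0107, 0.0], "H": [0.999885, 0.000115, 0.0],
                 "O": [0.99757, 0.00038, 0.00205], "Si": [0.922297, 0.046832, 0.030872]}
LIESER = {"C": [0.98893, 0.01107, 0.0], "H": [0.99985, 0.00015, 0.0],
          "O": [0.99759, 0.00037, 0.00204], "Si": [0.9217, 0.0471, 0.0312]}

# Atoms outside the 13C simulation: 12 C of the derivatizing agent, 35 H, 3 O, 2 Si
ATOMS = {"C": 12, "H": 35, "O": 3, "Si": 2}
MDV_UNCORRECTED = [0.60, 0.20, 0.10, 0.07, 0.03]   # column 'none' of Table 2

def conv_power(step, n):
    out = np.array([1.0])
    for _ in range(n):
        out = np.convolve(out, step)
    return out

def single_step(probs, n):
    """All isotopes of one element handled together (method of this chapter)."""
    return conv_power(np.array(probs), n)

def per_isotope(probs, n):
    """Assumed sequential scheme: a separate binomial correction per isotope."""
    out = np.array([1.0])
    for shift in (1, 2):
        step = np.zeros(shift + 1)
        step[0], step[shift] = 1.0 - probs[shift], probs[shift]
        out = np.convolve(out, conv_power(step, n))
    return out

def correct(table, element_shifts, n_out=9):
    mdv = np.array(MDV_UNCORRECTED)
    for element, n in ATOMS.items():
        mdv = np.convolve(mdv, element_shifts(table[element], n))
    return mdv[:n_out]                               # M+0 ... M+8

old_rt = correct(ROSMAN_TAYLOR, per_isotope)
new_rt = correct(ROSMAN_TAYLOR, single_step)
new_lieser = correct(LIESER, single_step)
for name, mdv in (("old, R&T", old_rt), ("new, R&T", new_rt), ("new, Lieser", new_lieser)):
    print(name, np.round(mdv, 4))
```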

The mass distribution of the tert-butyldimethylsilyl derivative of acetoacetate was also corrected by the newly proposed method, but using two different sources of information regarding the natural abundance of the various isotopes (see Table 1A and B). The effects of the exact isotope abundances on the corrected mass distributions can be observed in the last two columns of Table 2. In the present example the choice of the values for natural isotope abundance has an even larger effect on the outcomes than the choice of correction method.

Effects of correction method and isotopic composition on flux parameter estimation from MS data

The impact of the two approaches for mass isotopomer correction on the estimation of intracellular flux parameters is demonstrated for a published study on lysine producing Corynebacterium glutamicum in batch culture applying MALDI-TOF MS for labeling analysis (Wittmann and Heinzle, 2001). In this study the flux distribution was estimated from labeling patterns of secreted lysine, alanine and trehalose and stoichiometric data on substrate uptake, product secretion and precursor demand for biomass formation, respectively, using the old correction method by Wittmann and Heinzle (1999). In comparison, the flux calculation performed with the same experimental data and the new isotope correction method presented in this chapter showed very small differences. This is exemplified by the comparison of several key flux parameters of the central metabolism of C. glutamicum. The first two columns of Table 3 display the obtained values for the flux partitioning ratios between pentose phosphate pathway and glycolysis (ΦPPP), between anaplerosis and tricarboxylic acid cycle (ΦPC), between dehydrogenase and succinylase pathway in lysine biosynthesis (ΦDH), and the reversibility of glucose 6-phosphate isomerase (ζPGI) and of bidirectional fluxes between C4 metabolites of the TCA cycle and C3 metabolites of glycolysis (ζPC/PEPCK), previously calculated by Wittmann and Heinzle (2001) and in this work, respectively.

TABLE 3: Flux parameter estimation from experimental data of Wittmann and Heinzle (2001) using the old and the newly proposed method for natural isotope correction and two different sources of isotope abundance information

correction method >     method Wittmann & Heinzle   newly proposed method   newly proposed method
source of isotopic
compositions >          (Lieser, 1969)              (Lieser, 1969)          (Rosman and Taylor, 1998)
flux parameter \/       estimated values \/
ΦPPP                    74.4447                     74.4448                 74.2177
ΦPC                     36.6043                     36.6043                 36.2762
ΦDH                     63.3368                     63.3315                 63.3474
ζPGI                    11.6230                     11.6236                 11.6480
ζPC/PEPCK               0.4919                      0.4918                  0.5105


The differences between the two correction methods in this example were about 0.0001% for ΦPPP and ΦPC, respectively, and 0.001% for ΦDH and thus significantly below deviations to be expected from experimental errors. The values for the reversibilities ζPGI and ζPC/PEPCK were also almost identical. In the presented example the influence of the isotope correction method can therefore be neglected. This may also hold for other cases where the analytes are not derivatized, such as amino acids, sugars or organic acids, which contain oxygen as the only element with three different stable isotopes. In the case of GC/MS derivatized analytes containing large derivatization residues the influence is more significant. Especially silicon, typically present in GC/MS derivatized compounds, and to a somewhat lesser extent sulfur, contained for example in methionine or cysteine, which exhibit relatively high amounts of both the M+1 and M+2 isotopes, should be considered in this context.

Further calculations underline the influence of natural isotope abundance data for mass isotopomer correction of input substrate and output metabolite labeling on the results of metabolic flux calculations. Applying the newly proposed isotope correction method, but two different literature sources by Lieser (1969) and Rosman and Taylor (1998) for the occurrence of natural isotopes, the entire flux distribution of C. glutamicum was estimated from the experimental data of Wittmann and Heinzle (2001). Depending on the values chosen for the relative contribution of different isotopes, the obtained flux parameters differed up to 0.3 % for the examined flux partitioning ratios ΦPPP, ΦPC, and ΦDH, respectively (Table 3, second and third column). Differences were also found for reversibilities as shown for ζPGI and ζPC/PEPCK. The effects caused by different values for natural isotope abundances were much higher than those introduced by the two different isotope correction methods.

7.4 CONCLUSIONS

Here we show the necessity of correcting simulated mass distributions for the presence of naturally occurring isotopes when these simulations are to be compared to actually measured data sets in metabolic flux analysis. We showed that an elegant method recently introduced for that purpose by Wittmann and Heinzle (1999) is not fully correct in that it fails to recognize that atoms cannot be two different isotopes at the same time. This error has been corrected in an adapted version of the method that we have proposed in this chapter. The method was implemented in Matlab; this software can be obtained from the authors on request. We further point out that care must be taken when choosing literature values for the natural isotope abundance. We recommend the use of the information of Rosman and Taylor (1998) because it is recent information that was obtained from a critical element-by-element evaluation of available literature and because it represents the isotopic composition of the chemicals most commonly encountered in the laboratory (Rosman and Taylor, 1998). Although both are relatively small, the effects of an erroneous correction method and incorrect isotope abundances are additive and should be prevented for the sake of accuracy.


REFERENCES

Dauner, M., Sauer, U. (2000) GC-MS analysis of amino acids rapidly provides rich information for isotopomer balancing. Biotechnol. Progr., 16: 642-649
Hellerstein, M.K., Neese, R.A. (1999) Mass isotopomer distribution analysis at eight years: theoretical, analytic and experimental considerations. Am. J. Physiol., 276: E1146-E1170
Lee, W.-N., Edmond, J., Byerley, L.O., Bergner, E.A. (1991) Mass isotopomer analysis: theoretical and practical considerations. Biol. Mass Spectrom., 20: 451-458
Lieser, K.H. (1969) Einführung in die Kernchemie. Weinheim: Verlag Chemie
Möllney, M., Wiechert, W., Kownatzki, D., De Graaf, A.A. (1999) Bidirectional reaction steps in metabolic networks: IV. Optimal design of isotopomer labeling experiments. Biotechnol. Bioeng., 66, 2: 86-103
Rosman, K.J.R., Taylor, P.D.P. (1998) Isotopic compositions of the elements 1997. Pure Appl. Chem., 70, 1: 217-235
Schmidt, K., Nørregaard, L.C., Pedersen, B., Meissner, A., Duus, J.Ø., Nielsen, J.Ø., Villadsen, J. (1999) Quantification of intracellular metabolic fluxes from fractional enrichment and 13C-13C coupling constraints on the isotopomer distribution in labeled biomass components. Metabol. Eng., 1, 2: 166-179
van Winden, W.A., Wittmann, C., Heinzle, E., Heijnen, J.J. (2002) Correcting mass isotopomer distributions for naturally occurring isotopes. Accepted for publication in Biotechnol. Bioeng.
Wittmann, C., Heinzle, E. (1999) Mass spectrometry for metabolic flux analysis. Biotechnol. Bioeng., 62, 6: 739-750
Wittmann, C., Heinzle, E. (2001) Novel approach for metabolic flux analysis - application of MALDI-TOF MS to lysine-producing Corynebacterium glutamicum. Eur. J. Biochem., 268: 2441-2455


APPLICATIONS

Chapter 8

Metabolic Flux and Network Analysis of Penicillium chrysogenum Using 2D [13C,1H] COSY Measurements and Cumulative Bondomer Simulation

This chapter was submitted for publication in Biotechnol. Bioeng. as Van Winden et al. (2002)

ABSTRACT

At present two alternative methods are available for analyzing the fluxes in a metabolic network: (1) combining measurements of net conversion rates with a set of metabolite balances including the cofactor balances or (2) leaving out the cofactor balances and fitting the resulting free fluxes to measured 13C-labeling data. In this study these two approaches are applied to the fluxes in the glycolysis and pentose phosphate pathway of Penicillium chrysogenum growing on either ammonia or nitrate as the nitrogen source, which is expected to give different pentose phosphate pathway fluxes. The presented flux analyses are based on extensive sets of 2D [13C,1H] COSY data. A new concept is applied for simulation of this type of 13C-labeling data: cumulative bondomer modeling. The outcomes of the 13C-labeling based flux analysis substantially differ from those of the metabolite balancing approach. The fluxes that are determined using 13C-labeling data are shown to be highly dependent on the chosen metabolic network. Extending the traditional non-oxidative pentose phosphate pathway with additional transketolase and transaldolase reactions and extending the glycolysis with a fructose 6-phosphate aldolase/dihydroxyacetone kinase reaction sequence considerably improves the fit of the measured and the simulated NMR data. The results obtained using the extended version of the non-oxidative pentose phosphate pathway model contradict the common assumption that transketolase and transaldolase reactions are reversible. Superficial inspection of the fits between the simulated and measured labeling data seems to confirm that the model describes the measurements well. Still, based on estimated measurement errors, our fits are rejected on statistical grounds. This shows that strict statistical testing of the outcomes of 13C-labeling based flux analysis using realistic measurement errors is of prime importance for verifying the assumed metabolic model.


Chapter 8

8.1 INTRODUCTION

Several investigations of the metabolic flux distributions of Penicillium chrysogenum have already been carried out in order to investigate which reaction steps in the primary or secondary metabolism limit the penicillin biosynthesis in high-yielding production strains. Jørgensen et al. (1995) analyzed the fluxes during three phases of a fed batch cultivation of a high-yielding former production strain of P. chrysogenum growing on a complex medium. They combined measured uptake and production rates of 33 compounds with a stoichiometric model consisting of 61 internal fluxes, 21 amino acid uptake rates, 49 intracellular metabolites, 7 substrates and 5 metabolic products. They reported that the flux through the pentose phosphate pathway (PPP) increases relative to the flux through the glycolysis when the metabolism shifts from rapid growth and no product formation to slow growth and high penicillin production.

Henriksen et al. (1996) used the same strain as Jørgensen et al. (1995) and a slightly adapted version of their stoichiometric model to analyze the fluxes in chemostat cultivations growing at various specific growth rates on a defined medium. They determined the macromolecular composition of biomass growing at each specific growth rate for an accurate analysis of the energy and reduction equivalent requirements for biomass synthesis. They found that their calculated fluxes depended on the assumed synthesis route of cytosolic acetyl-coenzyme A, for which several options were available.

Van Gulik et al. (2000) analyzed the flux distributions of another high-yielding former production strain of P. chrysogenum growing as a chemostat culture with a range of specific growth rates and different defined media that contained either glucose, ethanol, acetate or xylose as the single carbon source and ammonia or nitrate as the nitrogen source. They combined measured uptake and production rates of 14 compounds with a detailed stoichiometric model consisting of 195 intracellular reactions and transport steps and 147 different metabolites, many of which were modeled as distinct cytosolic, mitochondrial and peroxisomal pools. The flux analyses by Van Gulik et al. (2000) revealed that the primary carbon metabolism is not likely to contain potential bottlenecks for penicillin productivity. The availability of NADPH appeared to be an important limiting factor. However, the calculated flux through the PPP depends on the assumed NADPH specificity of a number of enzymes in the catabolism and anabolism.

Christensen and Nielsen (2000) were the first to publish a flux analysis of P. chrysogenum based on a 13C-labeling experiment. They grew the fungus as a chemostat culture with a specific growth rate of 0.08 h-1 on defined medium. They analyzed the fluxes by combining the stoichiometric model and biomass composition of Henriksen et al. (1996) with measured uptake and production fluxes and with GC-MS measurements of the 13C-labeling distributions of a number of amino acids in the cell protein. These 13C-labeling data allowed them to leave out uncertain assumptions regarding the cofactor requirements in the individual reactions and thereby obtain more robust estimations of metabolic fluxes. For their specific growth rate of 0.08 h-1, at which the penicillin production rate is very low, Christensen and Nielsen estimated the flux through the oxidative part of the PPP to be 75% of the glucose uptake rate, whereas Henriksen et al. (1996), based on their model including cofactor balances and for a specific growth rate of 0.1 h-1, had previously estimated the same flux to be 61% of the glucose uptake rate.

The fact that flux analyses based on stoichiometric models including cofactor balances and measured extracellular net conversion rates give different outcomes than 13C-labeling studies was confirmed by Schmidt et al. (1998). They applied both methods to study the flux entering the oxidative branch of the PPP relative to the glucose uptake rate for a wild type strain of Aspergillus oryzae growing as a chemostat cultivation with a specific growth rate of 0.1 h-1. They used a defined medium containing either ammonia or nitrate, because the different degree of reduction of these two nitrogen sources leads to varying NADPH requirements and thereby to different flux distributions. Using the metabolite balancing method including the cofactor balances they found a relative flux into the PPP of 34% for ammonia and 120% for nitrate. Using the 13C-labeling method, on the other hand, they found values of 53% and 60% for ammonia and nitrate, respectively. They attributed the difference to the incompleteness of the NADPH-balance that was used in the metabolite balancing approach.

In order to verify the metabolic fluxes that were determined by Van Gulik et al. (2000) based on the metabolite balancing approach including cofactor balances, we performed chemostat cultivations using the same strain of P. chrysogenum, procedures and stoichiometric network, but replaced a fraction of the glucose in the medium by uniformly 13C-labeled glucose. We measured the 2D [13C,1H] COSY spectra of a number of biomass components and used these data for a new metabolic flux analysis in which the cofactor balances could be removed from the model proposed by Van Gulik et al. (2000).
As one of the main goals of this study was to determine the NADPH demand for penicillin biosynthesis, the fluxes were analyzed for a specific growth rate of 0.03 h-1, at which rate the penicillin production is at its maximum according to Van Gulik et al. (2000). This is in contrast to the high specific growth rate (0.08 h-1) and low penicillin production rate at which Christensen and Nielsen (2000) analyzed the metabolic fluxes. To our knowledge, this 2D [13C,1H] COSY based flux analysis is the first to be performed for P. chrysogenum. In the presented study the theoretical bondomer concept proposed in Chapter 5 of this thesis finds its first application.

8.2 THEORY

Metabolic network model

The metabolic network of which the fluxes are to be analyzed consists of the glycolysis and pentose phosphate pathway plus the glucose uptake and hexokinase reaction and effluxes of the various pathway intermediates towards anabolism (i.e. a subset of the network presented in Van Gulik et al., 2000). This network can be systematically reduced by removing 'linear' and 'divergent' metabolite pools that have only one influx and by lumping metabolite pools that are in isotopic steady state due to fast exchange reactions (Chapter 3 of this thesis). The advantage of this network reduction is that the mathematical model needed to fit the measured 13C-labeling data is considerably simplified whilst no information is lost, since the fluxes in the original network can be easily deduced from those of the lumped reactions in the reduced network.


The reduced network that is thus obtained is shown in Fig.1-I. It was used as a case study in both Chapters 2 and 5 of this thesis. In Chapter 5 we demonstrated the serious effect on the estimated fluxes of extending this conventional model with the additional transaldolase and transketolase reactions shown in Fig.1-II, for the presence of which convincing evidence had been found in literature.

A second extension of the network is formed by the fructose 6-phosphate aldolase and dihydroxyacetone kinase reactions shown in Fig.1-III. This potential bypass of the phosphofructokinase and fructose 1,6-bisphosphate aldolase reaction steps in the glycolysis is as yet hypothetical for P. chrysogenum, as no proof of its functioning has been published to date. The reason for including the network extension in our flux analysis is that the gene coding for fructose 6-phosphate aldolase was recently found in Escherichia coli by Schürmann and Sprenger (2001) and that, if indeed present, this reaction sequence is likely to have an effect on the fluxes that are determined from the 13C-labeling data. This effect is expected because the intermediate dihydroxyacetone of the pathway in Fig.1-III is a symmetrical molecule. Consequently, the 13C-labeling of the first and third carbon position of this triose that is formed from the top half of the fructose 6-phosphate molecule is scrambled (hence the inclusion of scrambling reaction v21 in Fig.1-III). This phenomenon is also known to occur with the symmetrical molecules succinate and fumarate in the TCA cycle (Schmidt et al., 1997). It does not occur in normal glycolysis, where fructose 6-phosphate is phosphorylated to fructose 1,6-bisphosphate, which is subsequently split into two triose molecules that are both asymmetrical due to the attached phosphate group. From here on the metabolic network and its extensions in Fig.1 are referred to as networks I, II and III.
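The scrambling of a symmetrical intermediate described above can be illustrated with a small sketch. It is not the cumulative bondomer model of this thesis: it simply assumes that a labeling pattern and its mirror image become equally abundant once the two ends of the molecule can no longer be told apart. The function name and the input fractions are hypothetical and chosen for illustration only.

```python
def scramble_symmetric(fractions):
    """Average every labeling pattern with its mirror image.

    `fractions` maps a pattern string such as "100" (C1 labeled, C2 and C3
    unlabeled) to its fraction of the pool. In a symmetrical molecule the two
    ends are indistinguishable, so a pattern and its reverse end up with the
    same fraction.
    """
    patterns = set(fractions) | {p[::-1] for p in fractions}
    return {
        p: 0.5 * (fractions.get(p, 0.0) + fractions.get(p[::-1], 0.0))
        for p in sorted(patterns)
    }


# Hypothetical triose pool before scrambling (illustrative numbers only).
dha_in = {"000": 0.70, "100": 0.30}
dha_out = scramble_symmetric(dha_in)
print(dha_out)
```

In this example the label that was initially confined to C1 is divided evenly over C1 and C3, while the unlabeled fraction is unchanged. The same mirror averaging would presumably apply to patterns of intact C-C bonds, where the strings describe bonds rather than carbon positions.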

FIGURE 1: (I) The traditional metabolic network, (II) the hypothetical transaldolase and transketolase reactions and (III) the hypothetical fructose 6-phosphate aldolase and dihydroxyacetone kinase reaction. Double headed arrows indicate bidirectional reactions. The corresponding flux symbols represent the forward and backward reaction rate. Full and dotted lines are only used to avoid visual confusion. Grey shaded boxes represent the molecules of which we measured NMR-spectra. For abbreviations see p.1


Balanced extracellular fluxes

The net conversion rates that were measured during the cultivations were used for flux analysis and data reconciliation using the complete stoichiometric model of Van Gulik et al. (2000) and dedicated steady state flux analysis software (SPAD it, Nijmegen, The Netherlands). The measured rates and mass balances formed an overdetermined system from which all fluxes in network I could be determined.

For the 13C-labeling based flux analysis the stoichiometric model was modified by including both a cytosolic and a mitochondrial transhydrogenase-like reaction (converting NADH/NADP to NAD/NADPH and vice versa in both intracellular pools). These two reactions removed the reduction equivalent constraints on the flux analysis. Because of the many uncertainties in the ATP-balance (e.g. futile cycles, proton leakage), this energetic constraint had already been removed from the stoichiometric model (Van Gulik et al., 2000). Furthermore, a number of reaction rates were considered as separate forward and backward reactions (indicated by double headed arrows in Fig.1). As a result of these changes an underdetermined system was obtained that could only partially be solved (Chapter 3 of this thesis). Five degrees of freedom remained for model I. These had to be fixed by fitting the simulated 13C-labeling data to a corresponding set of measured data. Further extending the model with the hypothetical reactions in networks II and III led to 7 and 2 additional degrees of freedom, respectively, that had to be fixed using the 13C-labeling data.

Cumulative bondomer balances

By using a single, uniformly 13C-labeled carbon substrate, the prerequisite for using the (cumulative) bondomer concept was met in this study. For that reason, a 13C-labeling model could be set up in the form of the cumulative bondomer balances that were introduced in Chapter 5. The case study presented in that chapter is based on exactly the same metabolic network as network I in this chapter and on the same set of measured 2D [13C,1H] COSY spectra that was used for flux analysis in this study. Therefore the minimal set of cumulative bondomer balances needed to simulate the bondomers for network I is given in Chapter 5. Obviously, the network extensions II and III lead to changes in some of the cumulative bondomer balances.

Bondomers are simulated for a specific set of fluxes by solving the cumulative bondomer balances for that set of fluxes and transforming the obtained cumulative bondomers to the corresponding bondomers. In order to compare these simulated bondomers to the actually measured ones, the relative intensities of the peaks in the measured 2D [13C,1H] COSY spectra also need to be converted to their corresponding bondomers. The free fluxes that remain in the metabolic network can then be fixed by iteratively improving the set of fluxes until the sum of covariance weighted squared differences between the simulated and measured bondomers is minimal (Schmidt et al., 1997).

8.3 MATERIALS AND METHODS

Strain

The strain used was a high-yielding industrial P. chrysogenum (code name DS 12975) that was kindly donated by DSM Anti-Infectives (Delft, The Netherlands).

Cultivation media

For the two experiments defined media were used containing glucose as the sole carbon source and either ammonia or nitrate as the sole nitrogen source. The medium was based on the one used in similar fermentations reported by Van Gulik et al. (2000). The composition of the medium used in the 'ammonia experiment' was as follows: 3.3 g/l glucose.H2O, 1.40 g/l (NH4)2SO4, 0.32 g/l KH2PO4, 0.20 g/l MgSO4.7H2O, 0.20 g/l KOH, 0.41 g/l phenylacetic acid (PAA), 0.06 ml/l silicone antifoam agent (BDH, Poole, UK) and 4 ml/l trace element solution. The composition of the trace element solution was given in Van Gulik et al. (2000). The composition of the medium used in the 'nitrate experiment' was as follows: 3.3 g/l glucose.H2O, 2.16 g/l KNO3, 0.32 g/l KH2PO4, 1.40 g/l MgSO4.7H2O, 1.84 g/l Na2SO4.10H2O, 0.20 g/l KOH, 0.41 g/l PAA, 0.06 ml/l silicone antifoam agent and 4 ml/l of trace element solution. The two media had the same total nitrogen (0.30 g/l N) and sulfate (1.11 g/l SO4) concentrations. It had previously been determined that for the applied specific growth rate of 0.03 h-1 the PAA is solely consumed in β-lactam synthesis and not in catabolic reactions.

Medium preparation

The appropriate amount of PAA was dissolved in 4 l of a KOH solution. After setting the pH at 5.40 using 2N H2SO4, the medium vessel containing the PAA solution was autoclaved for 40 min at 121°C. The other medium components were dissolved in 7 l of demineralized water. This solution was filter-sterilized using a Millidisc filtration system (Millipore, Bedford, MA) and added to the PAA solution in the final medium vessel. The medium vessel was placed on a magnetic stirrer and was allowed to mix for at least 12 h before it was connected to the reactor. The medium vessel was also continuously stirred during the experiment.

Chemostat system

The chemostat culture system used was as described in Van Gulik et al. (2000), except for the working volume of the reactor, which was reduced from 1.8 l to 1.3 l, and the aeration rate, which was increased from 0.5 to 1.0 vvm (l air/l broth/min).

Reactor operation

The batch phase and the first part of the chemostat phase were operated as described in Van Gulik et al. (2000), except for the initial amount of medium in the reactor, which was 1.3 l, and for the amount of spores, which was obtained from 3.0 g rice instead of the previously reported 7.5 g. During the chemostat phase a dilution rate of 0.03 h-1 was applied. When metabolic steady state had been established after 6 to 7 dilution times, the feed medium was replaced by a new medium that was chemically identical to the previous medium, but in which 10% of the naturally 13C-labeled glucose was replaced by [u-13C6] glucose (CAS 492-62-6, Cambridge Isotope Laboratory Inc., Andover, MA). The culture was fed with this 13C-labeled medium for another 3.5 dilution times, after which more than 95% of the naturally 13C-labeled P. chrysogenum biomass had been replaced by biomass grown on the 10% uniformly labeled medium, i.e. isotopic steady state was nearly established.
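The replacement fraction quoted above can be checked with a short calculation. The sketch assumes an ideally mixed chemostat in which the old biomass is washed out exponentially, so that after a time t at dilution rate D a fraction 1 - exp(-D t) has been replaced. The function name is hypothetical; the dilution rate and sampling times are the ones given in this chapter.

```python
import math


def fraction_replaced(dilution_rate_per_h, hours):
    """Fraction of the original biomass replaced after `hours` of washout,
    assuming ideal mixing (first-order washout at the dilution rate)."""
    return 1.0 - math.exp(-dilution_rate_per_h * hours)


D = 0.03  # dilution rate in h-1, as applied in the chemostat phase

# 3.5 dilution times correspond to 3.5 / D hours of labeled feed.
after_3_5_dilution_times = fraction_replaced(D, 3.5 / D)

# The early biomass sample was taken after approximately 50 h.
after_50_h = fraction_replaced(D, 50.0)

print(f"replaced after 3.5 dilution times: {after_3_5_dilution_times:.3f}")
print(f"replaced after 50 h:               {after_50_h:.3f}")
```

Under this assumption about 97% of the biomass is replaced after 3.5 dilution times, in line with the stated 'more than 95%', whereas only about 78% is replaced after 50 h, which is why the early samples need a correction for isotopic non-steady state.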

Analytical procedures and calculation of specific conversion rates

Both the analytical procedures and the calculation of the specific conversion rates are identical to those reported by Van Gulik et al. (2000).

Biomass sampling and sample handling

Biomass samples of approximately 100 ml (≈115 mg DW) were taken for NMR analysis after approximately 50 and 120 h of 13C-labeled medium feed supply. The samples were filtered (glass fiber filter, Gelman Sciences, USA). Filters with cells were washed with 0.9% NaCl solution and demineralized water. After final filtration the biomass was stored at -80°C. Prior to NMR analysis the biomass was freeze-dried and subsequently hydrolyzed in 10 ml 6 N HCl for 16 hours at 110°C. After filtration and evaporation to dryness, the residue was dissolved in 10 ml 0.1 N HCl and the amino acids were adsorbed to an ion exchange resin (Dowex AG 50W X4) and washed with 0.1 N HCl. The amino acids were eluted with 4 N HCl. After evaporation the residue was dissolved in D2O. The presented sample preparation includes separation of proteinogenic amino acids from the remaining biomass components, which eliminates the interference of multiplets of carbon atoms of other compounds. Water extracts of cell components other than proteinogenic amino acids were prepared by heating the biomass in 5 ml H2O to 90°C for 10 minutes with shaking. After centrifugation, the supernatant was lyophilized and dissolved in D2O.

2D [13C,1H] COSY measurements

NMR measurements were performed and the recorded spectra were analyzed by means of the spectral fitting software that was described in Chapter 4 of this thesis. The resulting data are relative intensities of several fine structures that can be observed in the multiplets corresponding to the following carbon positions: phenylalanine-α, -β, histidine-α, -β, -δ, tyrosine-α, -β, -δ, -ε, mannitol-C1/C6, -C2/C5, trehalose-C1, -C2, -C3, -C4, -C5, -C6 and erythritol-C1/C4 (for a detailed explanation of the exact nature of these data see appendix A of Chapter 2). As especially the biomass samples taken after 50 h of 13C-labeled glucose supply were not yet in isotopic steady state, the measured relative intensities were corrected for this as described in Chapter 4.

8.4 RESULTS AND DISCUSSION

Fluxes determined using cofactor balances

The measured and balanced extracellular rates of the two chemostat cultures are given in Table 1. The carbon and nitrogen balances of both cultures closed within 10%. The dissolved oxygen was >90% of the saturation concentration during both cultivations. Based on these measured extracellular rates and the complete stoichiometric model of Van Gulik et al. (2000) the fluxes of the metabolic network I were determined. The results for the continuous cultures growing on either ammonia or nitrate as the nitrogen source are given in the columns 'model including cofactor balances' in Table 2. Based on this metabolite balancing approach, in which the 13C-labeling data were not considered, only net fluxes could be determined for the bidirectional reaction steps. All fluxes are normalized to the same glucose uptake rate of 100 arbitrary units.

TABLE 1: The balanced extracellular rates during the two chemostat cultivations

Specific rates in mmol/Cmol biomass/h.

consumed/produced compound            culture grown on NH4+   culture grown on NO3-
glucose                               -1.241*10^1             -1.269*10^1
ammonia/nitrate                       -6.493                  -5.775
biomass                                3.097*10^1              3.060*10^1
oxygen                                -3.193*10^1             -2.738*10^1
carbondioxide                          3.402*10^1              4.053*10^1
penicillin-G                           4.490*10^-1             1.422*10^-1
6-aminopenicillinic acid               1.121*10^-2             3.969*10^-2
phenylacetic acid                     -5.745*10^-1            -1.936*10^-1
orthohydroxyphenylacetic acid          8.052*10^-2             5.146*10^-2
6-oxopiperidine-2-carboxylic acid      5.685*10^-2             3.299*10^-2
extracellular peptides                 1.069                   1.114
extracellular polysaccharides          3.907                   2.148

As Table 2 shows, this approach for flux analysis yields a flux into the PPP (v5) of 39% of the glucose uptake rate for P. chrysogenum growing on ammonia as the nitrogen source and 122% for growth on nitrate. Consequently, the phosphoglucose isomerase reaction (v2) runs in the direction of f6p for ammonia, but in the inverse direction for nitrate. In this latter case, the PPP runs in a cyclic mode. The higher flux into the PPP is to be expected when nitrate is the nitrogen source, since nitrate is more oxidized than ammonia and therefore requires more NADPH to be assimilated in biomass and penicillin.

Recalculating our relative PPP flux under the assumption that penicillin production is absent yields values of 36% for ammonia and 119% for nitrate. Comparison of these values with those in Table 2 (determined without this assumption) shows that, according to the model of Van Gulik et al. (2000), even at a relatively high penicillin production rate the penicillin biosynthesis only consumes a minor share of the produced NADPH. The recalculated relative PPP fluxes agree very well with those of Schmidt et al. (1998), who applied metabolite balances including cofactor balances to analyze the fluxes in A. oryzae growing on ammonia and nitrate and found values of 34% and 120% for ammonia and nitrate, respectively.

Both the model of Van Gulik et al. (2000) and the model of Schmidt et al. (1998) included a NADPH-dependent isocitrate dehydrogenase, which yields NADPH in addition to the oxidative branch of the PPP. This additional source of NADPH was absent in the P. chrysogenum model of Henriksen et al. (1996), which may explain why they calculated a relative PPP flux of 61% for growth on ammonia. Another explanation may be that Henriksen et al. (1996) assumed the penicillin precursor cysteine to be formed via the transsulfuration pathway, which consumes 8 molecules of NADPH per cysteine. The model of Van Gulik et al. (2000) included the direct sulfhydrylation pathway for cysteine, which costs 5 molecules of NADPH per cysteine. However, as the total NADPH requirement for penicillin biosynthesis is relatively small, the latter difference between both models will only lead to minor differences in the estimated PPP flux.


TABLE 2: Metabolic fluxes for network shown in Figure 1, normalized to a glucose uptake flux of 100

Columns:
(A) NH4+, model including cofactor balances
(B) NO3-, model including cofactor balances
(C) NH4+, network I, 13C-labeling based fluxes
(D) NO3-, network I, 13C-labeling based fluxes
(E) NH4+, network I+II, 13C-labeling based fluxes
(F) NO3-, network I+II, 13C-labeling based fluxes
(G) NH4+, network I+II+III, 13C-labeling based fluxes
(H) NO3-, network I+II+III, 13C-labeling based fluxes

flux in Fig.1 a   net flux direction        (A)       (B)       (C)       (D)       (E)       (F)       (G)       (H)
v1                                          100 b     100       100       100       100       100       100       100
v2, net           g6p → f6p                 43.35     -37.12    -16.24    -34.64    22.93     2.36      41.35     15.07
v2, exch                                    c         -         100.19    82.61     251.68    343.68    286.46    340.87
v3                                          64.30     39.56     44.49     40.42     57.54     52.75     29.36     29.98
v4                                          126.03    107.33    106.23    108.18    119.29    120.51    125.43    124.75
v5                                          39.00     122.22    98.65     119.89    59.48     82.90     41.06     70.19
v6, net           2 p5p → s7p + tp          12.89     40.74     32.80     39.89     36.70     17.27     18.98     38.61
v6, exch                                    -         -         20.60     30.98     0.00      0.00      0.00      0.00
v7, net           s7p + tp → f6p + e4p      12.89     40.74     32.80     39.89     19.75     27.56     13.61     23.32
v7, exch                                    -         -         193.38    331.84    0.00      0.59      0.00      63.04
v8, net           p5p + e4p → f6p + tp      10.72     38.61     30.63     37.78     0.62      35.74     6.06      5.93
v8, exch                                    -         -         0.00      0.00      5.42      27.82     3.00      13.80
v9                                          17.59     14.74     17.59     14.74     17.59     14.74     17.59     14.74
v10                                         2.71      2.62      2.71      2.62      2.71      2.62      2.71      2.62
v11                                         13.37     10.44     13.37     10.44     13.37     10.44     13.37     10.44
v12                                         2.41      2.33      2.41      2.33      2.41      2.33      2.41      2.33
v13                                         2.17      2.10      2.17      2.10      2.17      2.10      2.17      2.10
v14                                         -         -         -         -         0.00      0.00      0.00      0.00
v15                                         -         -         -         -         67.82     41.61     65.74     115.86
v16, net          s7p + e4p → f6p + p5p     -         -         -         -         16.95     -10.28    5.37      15.29
v16, exch                                   -         -         -         -         7.23      38.60     43.26     0.00
v17                                         -         -         -         -         43.14     72.19     0.07      8.30
v18                                         -         -         -         -         90.17     81.15     93.35     7.47
v19                                         -         -         -         -         67.30     48.03     29.66     0.00
v20, net          f6p → dha + tp            -         -         -         -         -         -         34.32     27.01
v20, exch                                   -         -         -         -         -         -         0.00      24.32
v21                                         -         -         -         -         -         -         10000     10000
v22                                         -         -         -         -         -         -         34.33     27.01

notes:
A dash marks a flux that is absent from the network version or not determined by the method.
a Bidirectional fluxes in Figure 1 are presented as net and exchange (exch) fluxes in this column. Net fluxes are defined in the direction indicated in the second column. Positive values correspond to net fluxes in the defined direction, negative values correspond to net fluxes in the opposite direction.
b Gray cells refer to fluxes that are fixed due to model assumptions.
c Using the model including the cofactor balances, only net fluxes are determined for the bidirectional reaction steps.
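As a consistency check on the 13C-labeling based columns of Table 2, the steady-state balance around the g6p pool can be evaluated. The sketch assumes that glucose uptake (v1) is the only influx of g6p and that the net flux to f6p (v2, net), the flux into the PPP (v5) and the efflux towards trehalose (v9) are its only drains in the networks of Fig.1. The function name is hypothetical; the numbers are taken from Table 2.

```python
def v2_net_from_g6p_balance(v1, v5, v9):
    """Net g6p -> f6p flux implied by the steady-state balance around g6p,
    assuming v1 is the only influx and v2, v5 and v9 are the only drains."""
    return v1 - v5 - v9


# (v1, v5, v9, tabulated v2 net) for the 13C-labeling based columns of Table 2
table2 = {
    "NH4+, network I":        (100.0, 98.65, 17.59, -16.24),
    "NO3-, network I":        (100.0, 119.89, 14.74, -34.64),
    "NH4+, network I+II":     (100.0, 59.48, 17.59, 22.93),
    "NO3-, network I+II":     (100.0, 82.90, 14.74, 2.36),
    "NH4+, network I+II+III": (100.0, 41.06, 17.59, 41.35),
    "NO3-, network I+II+III": (100.0, 70.19, 14.74, 15.07),
}

deviations = {}
for label, (v1, v5, v9, v2_tabulated) in table2.items():
    v2_balance = v2_net_from_g6p_balance(v1, v5, v9)
    deviations[label] = abs(v2_balance - v2_tabulated)
    print(f"{label:24s} balance: {v2_balance:8.2f}   table: {v2_tabulated:8.2f}")
```

The balance also shows directly why v2 reverses: as soon as v5 plus v9 exceeds the glucose uptake of 100, the net phosphoglucose isomerase flux must run from f6p back to g6p.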


Note that the values of the fluxes of the anabolic reactions (v9 to v13, shaded rows in Table 2) depend on the nitrogen source, but are identical irrespective of the model that is used. These fluxes that are related to biomass synthesis are uniquely determined by the measured and balanced net conversion rates. In Chapter 3 we introduced a notation where the fluxes of an underdetermined system are expressed as the sum of a term fixed by the measured conversion rates and a term that is the product of the nullspace of the combined stoichiometry and measurement matrices and a vector containing the free flux parameters. For the above fluxes v1 and v9 to v13 the corresponding rows in the nullspace (given in Chapter 5) contain only zeros.

Measured relative intensities and derived bondomer distributions

In order to fit the free fluxes in the models without cofactor balances a number of 2D [13C,1H] COSY spectra were measured, analyzed and corrected for isotopic non-steady state to give the relative intensities given in Table 3.

TABLE 3: Measured relative intensities of singlets (s), doublets (d, 'd*' indicates the doublet with larger scalar coupling constant), triplets (t) and double doublets (dd) of various carbon positions in amino acids, storage sugars and polyols present in biomass lysate

Time corrected relative intensities. A dash marks a multiplet for which no value is listed for that sample.

measured multiplet   fine structure   NH4+, 50 h   NH4+, 120 h   NO3-, 50 h   NO3-, 120 h
his-α                s                -            0.118         -            0.114
                     d                -            0.014         -            0.061
                     d*               -            0.073         -            0.068
                     dd               -            0.794         -            0.756
his-β                s                0.148        0.164         0.140        0.187
                     d*               0.034        0.008         0.015        0.004
                     d                0.294        0.325         0.315        0.338
                     dd               0.525        0.503         0.530        0.471
his-δ                s                -            0.554         -            0.516
                     d                -            0.446         -            0.484
tyr-α                s                0.130        0.141         0.142        0.169
                     d                0.129        0.129         0.135        0.110
                     d*               0.039        0.034         0.039        0.043
                     dd               0.701        0.696         0.684        0.679
tyr-β                s                0.184        0.151         0.183        0.188
                     d*               0.026        0.018         0.028        0.018
                     d                0.688        0.690         0.701        0.721
                     dd               0.102        0.140         0.088        0.074
tyr-δ1/δ2            s                -            0.164         -            0.175
                     d                -            0.773         -            0.759
                     t                -            0.064         -            0.066
tyr-ε1/ε2            s                -            0.262         -            0.291
                     d                -            0.320         -            0.300
                     t                -            0.418         -            0.409
phe-α                s                0.141        0.156         0.147        0.163
                     d                0.132        0.140         0.143        0.131
                     d*               0.033        0.036         0.041        0.036
                     dd               0.694        0.668         0.669        0.671
phe-β                s                0.154        0.160         0.173        0.181
                     d*               0.023        0.027         0.027        0.033
                     d                0.726        0.708         0.718        0.679
                     dd               0.098        0.106         0.082        0.107
man-C1/C6            s                -            0.146         -            0.203
                     d                -            0.854         -            0.797
man-C2/C5            s                -            0.136         -            0.200
                     d                -            0.287         -            0.287
                     t                -            0.577         -            0.513
tre-C1               s                -            0.173         -            0.260
                     d                -            0.827         -            0.740
tre-C2               s                -            0.142         -            0.225
                     d                -            0.270         -            0.292
                     t                -            0.588         -            0.483
tre-C3               s                -            0.274         -            0.363
                     d                -            0.247         -            0.240
                     t                -            0.479         -            0.397
tre-C4               s                -            0.157         -            0.161
                     d                -            0.301         -            0.325
                     t                -            0.542         -            0.514
tre-C5               s                -            0.099         -            0.117
                     d                -            0.090         -            0.087
                     t                -            0.811         -            0.797
tre-C6               s                -            0.114         -            0.133
                     d                -            0.886         -            0.867
ery-C1/C4            s                -            0.358         -            0.400
                     d                -            0.642         -            0.600

All relative intensities in this table, except for those for the late sample of NO3-, are determined from multiplets that were obtained from a single section of the 2D [13C,1H] COSY spectra. The relative intensities for the late sample of NO3- were determined from multiplets obtained by taking multiple sections. In Chapter 4 of this thesis it was observed that these summed multiplets are in many cases a better representation of the true intensities in the 2D [13C,1H] COSY spectra. This is the reason why only the late NO3- data and not those of the early sample were used for further analysis. For NH4+ both the early and late samples were obtained by making single sections of the spectra, for which reason both data sets were used in the analysis.

In order to fit the 13C-labeling data with the bondomers that were simulated by means of the cumulative bondomer models, the measured relative intensities of the amino acids and storage sugars were converted to the bondomer distributions of their metabolic precursors. These measurement-derived bondomers are given in the columns 'meas.' in Table 4. When comparing Tables 3 and 4 one sees that the multiplets that consist of two overlapping separate multiplets (i.e. tyr-δ1/δ2, tyr-ε1/ε2, man-C1/C6, man-C2/C5 and ery-C1/C4) lead to two separate sets of bondomers.

TABLE 4: Bondomer fractions determined from the relative intensities in Table 3 and simulated using the metabolic models I and I+II measured/simulated >

meas.

meas.

nitrogen source > sample taken after (h) > C-C bonds to determined which from bondomers measured correspond a multiplet (see \/ Table 3) \/ p5p43 = 00 his-α 01 10 11 p5p32 = 00 his-β 01 10 11 p5p1 = 0 his-δ 1 tp12 = 00 tyr-α 01 10 11 tp20 = 00 tyr-β 01 10 11 tp20 = 00 tyr-δ1/δ2 01 10 11 e4p03 = 00 01 10 11 e4p01 = 00 tyr-ε1/ε2 01 10 11 e4p32 = 00 01 10 11 tp12 = 00 phe-α 01 10 11 tp20 = 00 phe-β 01 10 11

NH4+ 50

NH4+ 117


0.094 0 0.365 0.541

0.071 0.155 0 0.774 0.145 0 0.855 0

0.084 0.155 0 0.761 0.109 0 0.892 0

0.112 0 0.379 0.509 0.581 0.419 0.084 0.153 0 0.763 0.147 0 0.853 0 0.096 0 0.904 0 0.096 0.904 0 0 0.461 0.539 0 0 0.041 0 0.120 0.840 0.106 0.164 0 0.730 0.122 0 0.878 0

sim. (I) NH4+

sim. (I+II) NH4+

meas. NO3120

bondomer fractions \/ 0.045 0 0.085 0.870 0.139 0.116 0.113 0 0 0 0.392 0.365 0.416 0.469 0.520 0.471 0.412 0.595 0.533 0.588 0.405 0.467 0.088 0.084 0.126 0.126 0.157 0.124 0 0 0 0.787 0.759 0.750 0.088 0.084 0.154 0 0 0 0.912 0.916 0.846 0 0 0 0.121 0.084 0.088 0 0 0 0.879 0.916 0.912 0 0 0 0.046 0.045 0.121 0.954 0.955 0.879 0 0 0 0 0 0 0.498 0.534 0.529 0.502 0.466 0.471 0 0 0 0 0 0 0.046 0.045 0.085 0 0 0 0.066 0.084 0.089 0.887 0.871 0.826 0.088 0.084 0.115 0.126 0.155 0.151 0 0 0 0.787 0.759 0.734 0.154 0.084 0.088 0 0 0 0.846 0.916 0.912 0 0 0

sim. (I) NO3-

sim. (I+II) NO3-

0.063 0 0.069 0.868 0.132 0 0.451 0.417 0.478 0.522 0.108 0.118 0 0.774 0.108 0 0.892 0 0.108 0 0.892 0 0.063 0.937 0 0 0.583 0.417 0 0 0.063 0 0.069 0.868 0.108 0.118 0 0.774 0.108 0 0.892 0

0.065 0 0.086 0.849 0.151 0 0.439 0.410 0.582 0.418 0.110 0.146 0 0.744 0.110 0 0.890 0 0.110 0 0.890 0 0.065 0.935 0 0 0.586 0.414 0 0 0.065 0 0.085 0.850 0.110 0.146 0 0.744 0.110 0 0.890 0

f6p1 = 0 1 f6p5 = 0 1 f6p12 = 00 01 10 11 f6p54 = 00 01 10 11 g6p1 = 0 1 g6p12 = 00 01 10 11 g6p23 = 00 01 10 11 g6p34 = 00 01 10 11 g6p45 = 00 01 10 11 g6p5 = 0 1 e4p1 = 0 1 e4p3 = 0 1 note:

man-C1/C6

man-C2/C5

tre-C1 tre-C2

tre-C3

tre-C4

tre-C5

tre-C6 ery-C1/C4

0.071 0.929 0.071 0.929 0.073 0 0.331 0.597 0.073 0 0.331 0.597 0.105 0.895 0.081 0 0.308 0.611 0.268 0.118 0.118 0.496 0.102 0.341 0 0.557 0.017 0.103 0 0.879 0.031 0.969 0.335 0.665 0.335 0.665

0.260 0.740 0.070 0.930 0.260 0 0.372 0.368 0.070 0 0.100 0.829 0.140 0.860 0.140 0 0.200 0.660 0.315 0.026 0.116 0.544 0.092 0.338 0 0.570 0.038 0.054 0 0.908 0.038 0.962 0.529 0.471 0.046 0.954

0.211 0.790 0.048 0.952 0.211 0 0.391 0.399 0.048 0 0.089 0.863 0.151 0.849 0.151 0 0.280 0.570 0.316 0.114 0.090 0.480 0.098 0.308 0 0.594 0.034 0.064 0 0.902 0.034 0.966 0.534 0.466 0.045 0.955

0.142 0.858 0.142 0.858 0.163 0 0.309 0.528 0.163 0 0.309 0.528 0.213 0.787 0.198 0 0.307 0.495 0.393 0.099 0.099 0.409 0.108 0.369 0 0.523 0.042 0.093 0 0.865 0.055 0.945 0.389 0.611 0.389 0.611

0.349 0.651 0.095 0.905 0.349 0 0.381 0.269 0.095 0 0.104 0.801 0.189 0.812 0.189 0 0.206 0.606 0.375 0.019 0.100 0.506 0.108 0.367 0 0.525 0.051 0.056 0 0.893 0.051 0.949 0.583 0.417 0.063 0.937

0.296 0.704 0.065 0.936 0.296 0 0.392 0.312 0.065 0 0.085 0.850 0.230 0.771 0.230 0 0.303 0.467 0.394 0.139 0.059 0.408 0.116 0.337 0 0.547 0.050 0.066 0 0.884 0.050 0.950 0.586 0.414 0.065 0.935

a

A C-C bond denoted by the subscript ‘0’ denotes a bond that was formed in the biosynthesis of the concerning amino acid or storage sugar and that does not originate from the metabolic precursor.

Because the relative intensities from which these bondomer fractions were determined are not separately observable, the derived bondomer fractions are not uniquely determined. The bondomer sets corresponding to overlapping multiplets are mutually dependent. This dependency is accounted for in the corresponding covariance matrices of the bondomers (not shown here). These covariance matrices are derived from the covariance matrices of the relative intensities that, in turn, are automatically generated when fitting the multiplets using the peakfitting software presented in Chapter 4. The bondomers in Table 4 that equal exactly zero are the ‘zero bondomers’ that were discussed in Chapter 5. In simulation studies it was found that these bondomers cannot be formed by the reactions in the network versions I and I+II. This knowledge was used as a constraint in the conversion of the measured relative intensities to bondomers. Some of the bondomers that remain zero for networks I and I+II can, however, be formed by the additional reactions in the doubly extended network I+II+III. For that reason, conversion of the measured relative intensities to bondomers yields different outcomes for this extended network. The ‘measured’ bondomers for this network are given in the column ‘meas.’ in Table 5.

TABLE 5: Bondomer fractions determined from the relative intensities in Table 3 and simulated using the metabolic model I+II+III

measured/simulated >

meas.

meas.

nitrogen source > sample taken after (h) > C-C bonds to determined which from measured bondomers multiplet (see correspond Table 3) \/ \/ p5p43 = 00 his-α 01 10 11 p5p32 = 00 his-β 01 10 11 p5p1 = 0 his-δ 1 tp12 = 00 tyr-α 01 10 11 tp20 = 00 tyr-β 01 10 11 tp20 = 00 tyr-δ1/δ2 01 10 11 e4p03 = 00 01 10 11 e4p01 = 00 tyr-ε1/ε2 01 10 11 e4p32 = 00 01 10 11

NH4+ 50

NH4+ 117


0.094 0 0.365 0.541

0.062 0.150 0.038 0.749 0.145 0 0.855 0

sim. (I+II+ III) NH4+

meas. NO3- 120

bondomer fractions \/ 0.044 0.010 0.084 0.862 0.112 0.098 0.139 0 0 0 0.379 0.371 0.392 0.509 0.532 0.469 0.581 0.585 0.533 0.419 0.415 0.467 0.077 0.062 0.117 0.149 0.147 0.120 0.030 0.054 0.036 0.744 0.737 0.727 0.154 0.116 0.147 0 0 0 0.846 0.884 0.853 0 0 0 0.121 0.116 0.096 0 0 0 0.879 0.884 0.904 0 0 0 0.096 0.062 0.121 0.904 0.939 0.879 0 0 0 0 0 0 0.511 0.531 0.478 0.489 0.469 0.522 0 0 0 0 0 0 0.024 0.033 0.072 0.068 0.029 0.051 0.068 0.078 0.051 0.840 0.861 0.826

sim. (I+II+ III) NO3-

0.054 0.025 0.071 0.850 0.125 0 0.424 0.451 0.556 0.445 0.098 0.130 0.045 0.727 0.143 0 0.857 0 0.143 0 0.857 0 0.083 0.917 0 0 0.579 0.421 0 0 0.057 0.026 0.075 0.842

tp12 = 00 01 10 11 tp20 = 00 01 10 11 f6p1 = 0 1 f6p5 = 0 1 f6p12 = 00 01 10 11 f6p54 = 00 01 10 11 g6p1 = 0 1 g6p12 = 00 01 10 11 g6p23 = 00 01 10 11 g6p34 = 00 01 10 11 g6p45 = 00 01 10 11 g6p5 = 0 1 e4p1 = 0 1 e4p3 = 0 1

phe-α

phe-β

man-C1/C6

man-C2/C5

tre-C1 tre-C2

tre-C3

tre-C4

tre-C5

tre-C6 ery-C1/C4

0.077 0.152 0.029 0.742 0.109 0 0.892 0

0.098 0.160 0.030 0.712 0.122 0 0.878 0 0.071 0.929 0.071 0.929 0.073 0.165 0.165 0.597 0.073 0.165 0.165 0.597 0.105 0.895 0.081 0.154 0.154 0.611 0.268 0.118 0.118 0.496 0.102 0.341 0 0.557 0.017 0.052 0.052 0.879 0.031 0.969 0.335 0.665 0.335 0.665

0.062 0.147 0.054 0.737 0.116 0 0.884 0 0.145 0.855 0.063 0.937 0.145 0.000 0.472 0.382 0.034 0.030 0.080 0.856 0.108 0.892 0.108 0.000 0.350 0.542 0.322 0.136 0.084 0.458 0.085 0.321 0 0.594 0.025 0.060 0.022 0.893 0.047 0.953 0.531 0.469 0.062 0.939

0.108 0.147 0.028 0.717 0.154 0 0.846 0 0.142 0.858 0.142 0.858 0.163 0.155 0.155 0.528 0.163 0.155 0.155 0.528 0.213 0.787 0.198 0.154 0.154 0.495 0.393 0.099 0.099 0.409 0.108 0.369 0 0.523 0.042 0.047 0.047 0.865 0.055 0.945 0.389 0.611 0.389 0.611

0.098 0.130 0.045 0.727 0.143 0 0.857 0 0.269 0.731 0.084 0.916 0.250 0.019 0.430 0.301 0.057 0.027 0.076 0.840 0.208 0.792 0.194 0.014 0.333 0.459 0.386 0.140 0.068 0.405 0.104 0.351 0 0.546 0.044 0.059 0.021 0.876 0.065 0.935 0.579 0.421 0.083 0.917

Table 6 shows the number of independent data points (bondomers) that are available for the flux analyses for the two nitrogen sources and three network models. The higher number of independent data points for network I+II+III compared to networks I and I+II is explained by the lower number of zero bondomers for the doubly extended network. Note that out of the 94 (ammonia) and 78 (nitrate) bondomers in Tables 4 and 5 only 34 (network I or I+II) or 38 (network I+II+III) are independent for ammonia, and only 28 or 31 for nitrate. This observation stresses the need for covariance weighted fitting in which the correlation between the mutually dependent data is accounted for.

TABLE 6: Number of independent data points (a), number of free fluxes (b), minimal sums of squared residuals (SSres) (c) and the ratio c/(a-b) for the three network versions and two nitrogen sources

nitrogen source   network >                      I         I+II      I+II+III
NH4+              a) independent data points     34        34        38
                  b) free fluxes                 5         12        14
                  c) minimal weighted SSres      3706.6    1756.8    619.8
                  c/(a-b)                        127.8     79.9      25.8
NO3-              a) independent data points     28        28        31
                  b) free fluxes                 5         12        14
                  c) minimal weighted SSres      1929.8    1146.5    450.2
                  c/(a-b)                        83.9      71.7      26.5

Fluxes determined using 13C-labeling data

Table 2 contains the fluxes that were found by nonlinear minimization of the sum of covariance weighted squared residuals of the measured and simulated bondomers for the cases of network I, I+II and I+II+III. It is observed that for the nitrate fed culture the fluxes that were found by fitting the 13C-labeling data by means of network model I agree quite well with those that were determined by metabolite balancing including the cofactor balances. For the ammonia fed culture, on the other hand, the 13C-labeling method yields a much larger flux into the PPP (v5) than the cofactor based method (99% versus 39% of the glucose uptake rate). Qualitatively, this finding agrees with the fact that Christensen and Nielsen (2000) found a larger flux into the PPP in their 13C-labeling study of a culture growing on ammonia than Henriksen et al. (1996) found in their cofactor based study (75% versus 61%). The differences found here are, however, much larger. Comparison of the outcomes shows that the 13C-labeling method yields a smaller difference between the flux into the PPP for growth on ammonia and nitrate (99% versus 120%) than the cofactor based method (39% versus 122%). This is consistent with the findings of Schmidt et al. (1998) who also found a much smaller difference when using the 13C-labeling based method than when using the cofactor based method. In their case, however, this smaller difference mainly resulted from the fact that the 13C-labeling method estimated a much lower flux into the PPP for growth on nitrate. Table 6 gives the key characteristics and outcomes of the optimizations. The ratio c/(a-b) in this table has an expectation of 1 in case the model is correct. Assuming that the model is correct, the actually found minimized sums of squared residuals can be tested against the χ2-distribution for (a-b) degrees of freedom.
This yields negligible probabilities for the found values of 3706.6 (ammonia) and 1929.8 (nitrate). Therefore, the model of network I is rejected. The same table shows that the network extensions II and III improve the fit considerably. Especially the improvement of the fit by the two additional free fluxes that are added with network III is substantial and gives a reason to believe that this hypothetical pathway may actually be active. Still, the ratios c/(a-b) that are found for the fits with the extended networks are too large to accept these network models. Returning to Table 2 it is clear that network extensions II and III also cause considerable changes to the fluxes. Adding additional transaldolase and transketolase reactions causes a drastic decrease of the flux into the PPP both for growth on ammonia and nitrate (99% and 120% for network I; 59% and 83% for network I+II). This tendency was already observed in the case study in Chapter 2. Adding the hypothetical fructose 6-phosphate aldolase/dihydroxyacetone kinase reaction further decreases this flux into the PPP to 41% and 70% for the two nitrogen sources. It is clear that the flux into the PPP is consistently higher for growth on nitrate than on ammonia, but the exact fluxes are found to be highly dependent on the chosen network model. Clearly, it is of prime importance for flux analysis to verify whether the additional non-oxidative PPP reactions actually occur in the cell. The found reversibility of the phosphoglucose isomerase reaction (v2) varies with the chosen network model, but is consistently larger than the net flux of the reaction. The reversibilities of the transketolase and transaldolase reactions are not very consistent, especially when it is considered that reactions v6, v8 and v16 are commonly assumed to be catalyzed by one and the same transketolase (Marx et al., 1997). The fluxes found by applying network I+II+III show that approximately equal amounts of the fructose 6-phosphate are estimated to be converted to glyceraldehyde 3-phosphate by the glycolytic route (v3 = 29 for ammonia, 30 for nitrate) and by the hypothetical route of network III (v20net = v22 = 34 for ammonia, 27 for nitrate). Note that the rate of the label scrambling reaction v21 in network I+II+III is fixed to a value that is two orders of magnitude larger than the glucose uptake rate to ensure complete scrambling of the dihydroxyacetone labeling. Confidence intervals were not calculated for the fluxes in Table 2, because this is only allowed when the minimized sums of squared residuals of the fits are statistically acceptable.
In order to get an impression of the accuracies of the determined fluxes, the covariance matrix of the bondomers that was used for weighting the fits using network I+II+III was scaled such that the minimized sums of squared residuals just equaled the border of the 95% confidence interval. This required scaling of the covariance matrix for the ammonia data by a factor 17.0, leading to a minimized sum of squared residuals of (see Table 6) 619.8/17.0=36.5, and scaling of the covariance matrix for the nitrate data by a factor 16.3, leading to a minimized sum of squared residuals of 450.2/16.3=27.6. Now, standard deviations of the estimated fluxes can be calculated. These values are given with the corresponding fluxes in Table 7. It is observed that the errors in the glucose uptake rate (v1) and anabolic reaction rates (v9-v13), caused by errors in the measurements of the net conversion rates, are rather small. At the same time one sees that the estimated transaldolase and transketolase reaction rates are extremely uncertain. The fluxes in the glycolysis (v2-v4) and the flux into the PPP (v5) have much smaller uncertainties. Also the fluxes in the fructose 6-phosphate aldolase/dihydroxyacetone kinase reaction sequence (v20 and v22) are estimated within a rather narrow confidence interval. The dihydroxyacetone scrambling reaction (v21) was fixed at a high value and therefore has a standard deviation of zero.
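The χ2-test and the covariance scaling described above can be reproduced numerically. The following is a minimal sketch in Python and not the software used in this study; the names chi2_cdf, chi2_quantile and fit_summary are introduced here for illustration only, and the χ2 quantile is obtained by bisection on a series expansion of the regularized incomplete gamma function (an implementation choice made for this sketch). The input values are taken from Table 6.

```python
import math


def chi2_cdf(x, dof):
    """Chi-square CDF via the power series of the regularized lower
    incomplete gamma function P(dof/2, x/2). Intended for moderate x."""
    if x <= 0.0:
        return 0.0
    a, z = dof / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_quantile(p, dof):
    """Value x for which chi2_cdf(x, dof) equals p, found by bisection."""
    lo, hi = 0.0, float(dof)
    while chi2_cdf(hi, dof) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_summary(ss_res, n_data, n_free, level=0.95):
    """Ratio c/(a-b), critical chi-square value and the factor by which the
    covariance matrix must be scaled to make the fit just acceptable."""
    dof = n_data - n_free
    critical = chi2_quantile(level, dof)
    return {
        "dof": dof,
        "ratio": ss_res / dof,
        "critical": critical,
        "scale": ss_res / critical,
    }


# (SSres, independent data points a, free fluxes b) for network I+II+III, Table 6
for source, (c, a, b) in {"NH4+": (619.8, 38, 14), "NO3-": (450.2, 31, 14)}.items():
    s = fit_summary(c, a, b)
    print(source, s["dof"], round(s["ratio"], 1),
          round(s["critical"], 2), round(s["scale"], 1))
```

Under these assumptions the sketch gives scaling factors of about 17.0 (ammonia) and 16.3 (nitrate) for network I+II+III, in line with the values quoted in the text. For network I the same function shows that the minimized sums of squared residuals exceed the 95% critical value by a factor of more than fifty.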


TABLE 7: Metabolic fluxes and standard deviations based on scaled covariance matrices of 13C-labeling data (see text) for network I+II+III

flux in Fig.1 *1   direction of net flux      fluxes, NH4+   standard deviations, NH4+   fluxes, NO3-   standard deviations, NO3-
v1                                            100.00         2.26*10^-1                  100.00         1.61*10^-1
v2, net            g6p → f6p                  41.35          2.63*10^1                   15.07          2.41*10^1
v2, exch                                      286.46         1.16*10^1                   340.87         1.11*10^1
v3                                            29.36          1.05                        29.98          1.04
v4                                            125.43         1.50                        124.75         9.87*10^-1
v5                                            41.06          4.44                        70.19          2.88
v6, net            2 p5p → s7p + tp           18.98          5.51*10^5                   38.61          5.06*10^5
v6, exch                                      0.00           1.65*10^5                   0.00           4.23*10^3
v7, net            s7p + tp → f6p + e4p       13.61          13.16                       23.32          2.64*10^4
v7, exch                                      0.00           7.00                        63.04          1.32*10^4
v8, net            p5p + e4p → f6p + tp       6.06           4.51*10^5                   5.93           5.04*10^5
v8, exch                                      3.00           1.15*10^5                   13.80          2.62*10^3
v9                                            17.59          2.42*10^-1                  14.74          1.14*10^-1
v10                                           2.71           3.39*10^-3                  2.62           2.66*10^-3
v11                                           13.37          2.00*10^-2                  10.44          1.35*10^-2
v12                                           2.41           3.05*10^-3                  2.33           2.40*10^-3
v13                                           2.17           3.00*10^-3                  2.10           2.48*10^-3
v14                                           0.00           5.03*10^4                   0.00           1.61*10^3
v15                                           65.74          2.83*10^5                   115.86         1.21*10^5
v16, net           s7p + e4p → f6p + p5p      5.37           2.21*10^5                   15.29          4.98*10^5
v16, exch                                     43.26          1.68*10^5                   0.00           1.19*10^5
v17                                           0.07           3.35*10^4                   8.30           3.81*10^5
v18                                           93.35          1.12*10^1                   7.47           1.32*10^4
v19                                           29.66          7.81*10^4                   0.00           6.74*10^1
v20, net           f6p → dha + tp             34.32          9.63                        27.01          12.47
v20, exch                                     0.00           4.89                        24.32          6.40
v21                                           10000          0.00                        10000          0.00
v22                                           34.33          8.22*10^-1                  27.01          7.05*10^-1

Reversibility of non-oxidative PPP reactions

We investigated whether the degree of reversibility of the transketolase and transaldolase reactions affected the outcome of the flux analysis by repeating the flux analysis with network I+II+III in which the exchange fluxes of the reactions v6, v7, v8 and v16 in Fig.1 were fixed at either zero or a high value (approximately 100 times the glucose uptake rate). The outcomes revealed that fixing the exchange fluxes at high values seriously deteriorates the goodness of fit. For each of the two nitrogen sources the minimal weighted sum of squared residuals increased almost a hundredfold compared to the values shown in Table 6. Fixing the exchange fluxes at zero, on the other hand, hardly affected the goodness of fit; the minimal weighted sums of squared residuals increased by only 2 percent compared to the values for the same network shown in Table 6. Due to the reduction of the number of free fluxes by 4 (four exchange fluxes were fixed at zero) the ratios c/(a-b) for ammonia and nitrate even decreased from 25.8 and 26.5 (see Table 6) to 22.6 and 22.0, respectively. Except for the transketolase and transaldolase fluxes in the network, all other fluxes remained close to the values shown for network I+II+III in Table 2. These results suggest that non-reversibility of the transketolase and transaldolase reactions is not a bad assumption. This is in contrast to findings by Marx et al. (1997) and by Follstad and Stephanopoulos (1998) who found that assuming reversibility of the transketolase and transaldolase reactions results in a better fit of the measured 13C-data with the simulated data than assuming non-reversibility. Their conclusions, however, applied to the traditional non-oxidative PPP as shown in Fig.1-I. The present results were obtained using the extended non-oxidative PPP (Fig.1-I+II). Indeed, when we repeated our flux analysis with only network I in which the exchange fluxes were set at zero, we also found a deterioration of the goodness of fit: the ratios c/(a-b) for ammonia and nitrate increased from 127.8 and 83.9 (see Table 6) to 205.1 and 199.8, respectively. It appears that goodness of fit is equally realized by either a traditional PPP where reversibility is included or an extended PPP where new reactions are added and non-reversibility is assumed. Consequently, the conclusion that the transketolase and transaldolase reactions are reversible may be an artifact caused by incompleteness of the traditional model. If the presented extended version of the non-oxidative PPP is correct, this conclusion may be unjustified.

Simulated bondomer distributions

The columns ‘sim.’ in Tables 4 and 5 show the optimally fitted bondomers that are found in each optimization.
When considering the differences between the measured and simulated bondomers in the tables, it should be remembered that some of the bondomers are fully correlated, which means that seemingly large differences may only have minor contributions to the sum of covariance weighted squared residuals. This point may be understood by looking at the very different values of the measured and simulated bondomers of e4p1 and e4p3 for nitrate in Table 5. The values of the four ‘measured’ bondomers for these two carbon-carbon bonds were derived from only two measured relative intensities (0.400 and 0.600) shown in Table 3. Due to the resulting correlations between the sets of bondomers of e4p1 and e4p3, the residuals of the measured and simulated sets vanish when they are weighted by the covariances. This becomes clear when the four simulated bondomers are converted to relative intensities and compared to the measured ones. One finds values of 0.354 and 0.646, much more similar to the actually measured values. One can study the separate contributions of the various multiplets to the covariance weighted sums of squared deviations between the measured and simulated data in Tables 4 and 5. Large contributions could point at non-random errors in measured spectra, or could alternatively indicate which part of the network model may be incomplete. Fig.2 shows the separate contributions of the multiplets to the fit of networks I, I+II and I+II+III for nitrate. It is seen that the largest contributions to the misfit of network I are caused by the multiplets of tyr-β, tyr-ε1/ε2, phe-β, man-C1/C6, tre-C2 and tre-C3. Extending the metabolic model with network II leads to substantial reductions of the contributions of tyr-ε1/ε2, man-C1/C6, tre-C2 and tre-C3, but causes a significant increase of the contribution of tre-C4. Further extension of the model with network III especially diminishes the contributions of tyr-β, tyr-ε1/ε2 and phe-β and also reduces the misfit of tre-C4.
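The remark above that seemingly large differences between correlated bondomers may contribute little to the covariance weighted sum of squared residuals can be illustrated with a two-variable sketch in Python. All numbers below are hypothetical and were chosen for illustration only; they are not the covariances of the measured bondomers. The covariance matrix is constructed so that the variance is large along the direction (1, -1), which the data leave nearly undetermined, and small along the direction (1, 1), which is well determined.

```python
def weighted_ss_2x2(residual, cov):
    """Covariance weighted sum of squares r^T C^-1 r for two variables,
    using the explicit inverse of the 2x2 covariance matrix."""
    (c11, c12), (c21, c22) = cov
    det = c11 * c22 - c12 * c21
    r1, r2 = residual
    return (c22 * r1 * r1 - (c12 + c21) * r1 * r2 + c11 * r2 * r2) / det


# Hypothetical variances along the two principal directions
var_free = 1.0     # along (1, -1)/sqrt(2): poorly determined combination
var_fixed = 1e-4   # along (1, 1)/sqrt(2): well determined combination

cov = [
    [(var_free + var_fixed) / 2, (var_fixed - var_free) / 2],
    [(var_fixed - var_free) / 2, (var_free + var_fixed) / 2],
]

# Two residual vectors of equal length but different direction
ss_cheap = weighted_ss_2x2((0.19, -0.19), cov)   # along the undetermined direction
ss_costly = weighted_ss_2x2((0.19, 0.19), cov)   # along the determined direction

print(ss_cheap, ss_costly)
```

In this hypothetical case two residual vectors of identical magnitude differ by four orders of magnitude in their contribution to the weighted sum, which is why the differences in Tables 4 and 5 should not be judged by their size alone.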

(Bar chart. Vertical axis: contribution to minimal weighted SSres, 0 to 350. Horizontal axis: multiplet, from his-α, his-β, his-δ, tyr-α, tyr-β, tyr-δ1/δ2, tyr-ε1/ε2, phe-α, phe-β, man-C1/C6, man-C2/C5, tre-C1 to tre-C6 and ery-C1/C4. Series: networks I, I+II and I+II+III.)

FIGURE 2: The contributions of the various multiplets to the covariance weighted sum of squared deviations between the measured and simulated bondomers for networks I, I+II and I+II+III. Data correspond to nitrate fed culture.

In the presently studied network, the contributions to the sum of squared residuals of the various multiplets shown in Fig.2 do not give clear indications as to which part of the network may be incomplete. This is caused by the tightly interconnected topology of the studied network. Missing or erroneous reactions do not only cause badly fitted multiplets of components ‘downstream’ of the considered reaction, but also ‘upstream’, since the ‘downstream’ intermediates can be recycled to ‘upstream’ pools by exchange reactions. The contributions to the weighted sum of squares of network I+II+III do not show any outliers that clearly indicate non-random errors in any of the multiplets. The fact that the sum of all the contributions is statistically unacceptable may be due either to reactions that are still missing in the network or to a systematic underestimation of the measurement errors (and thus an overestimation of the covariance weighted minimal sum of squared residuals) by the peakfitting software.

(Parity plots, one panel for NH4 and one for NO3. Horizontal axis: measured relative intensities, 0 to 1. Vertical axis: simulated relative intensities, 0 to 1.)

FIGURE 3: Parity plots of the measured relative intensities and the relative intensities calculated from the bondomers that were simulated using the network model I+II+III.

Analysis of the measurement errors

The goodness of fit of the data at the level of relative intensities is illustrated by Fig.3 where the measured relative intensities are plotted against the corresponding values that were calculated from the bondomers that were simulated with network I+II+III both for ammonia and for nitrate. The maximal and average absolute deviations between the measured and simulated data in these plots are 0.058 and 0.015 for ammonia and 0.046 and 0.014 for nitrate. The absence of any substantial deviations shows that in order to reject these fits the estimated measurement errors must be small. Ignoring the covariances of the relative intensities that are taken into account in our fitting procedure, the estimated standard deviation of our relative intensities is 0.006 ± 0.003 both for the ammonia and nitrate data. These measurement errors were computed by our peakfitting software (Chapter 4 of this thesis) by multiplying the sensitivities of the fitted peak areas with the residual NMR noise. Consequently, the measurement error is not expected to be proportional to the (normalized) relative intensities, which is confirmed by the absence of any correlation between the relative intensities and the corresponding standard deviations. This also agrees with the error model for the same type of 13C-labeling data proposed by Dauner et al. (2001) who assumed an error of approximately 2% that is proportional to the unscaled NMR signal and is therefore constant with respect to the normalized relative intensities. Based on their error model Dauner et al. found a weighted sum of squared residuals nearly lying within the 95% confidence interval for one of their fits. The average absolute deviation between the corresponding measured and simulated data was 0.015, which is almost identical to our best fits (0.015 for ammonia and 0.014 for nitrate). The fact that our model is statistically rejected stems from the fact that our statistical testing is more strict: as stated above, our average error is 0.006, i.e. more than threefold smaller than the error of 2% that was used by Dauner et al. (2001). Furthermore, in our fit covariances are taken into account. This shows that the acceptance or rejection of the model is very much dependent on the assumed measurement errors. Whereas Dauner et al. derived their error from the comparison of a number of signals of amino acids that originate from the same precursor and assumed this error to be applicable to all other multiplets, we estimated our measurement errors for each individual NMR-multiplet. The correct model structure will have to be confirmed by comparing multiple complete data sets for a single organism and growth condition.

8.5 CONCLUSIONS

In this study penicillin producing Penicillium chrysogenum was grown in a chemostat culture with either ammonia or nitrate as the nitrogen source in order to study the NADPH requirements under both conditions. The cultures were fed with uniformly 13C-labeled glucose so that the 2D [13C,1H] COSY spectra of a number of biomass components could be measured. The measured net conversion rates of these cultivations were combined with the metabolite mass balances including the cofactor balances to determine the fluxes in the glycolysis and pentose phosphate pathway. The measured 13C-labeling data offered an alternative way to determine the fluxes without the need for including cofactor balances. Comparison of the outcomes of the two methods confirmed earlier observations (Schmidt et al., 1998) that 13C-labeling studies yield different estimations of metabolic flux distributions than metabolite balancing including cofactor balances. The best fit of the 13C-labeling data that was obtained using the conventional network model of the glycolysis and pentose phosphate pathway had to be rejected on statistical grounds.
In order to investigate the possibility of modeling errors, the model was extended by additional transketolase and transaldolase reactions for which evidence had been presented in the past (Chapter 2 of this thesis) and by a fructose 6-phosphate reaction that was recently discovered in glycolysis in Escherichia coli (Schürmann and Sprenger, 2001). Although these network extensions resulted in clear improvements of the goodness-of-fit of the measured 13C-labeling data by the simulation model, the best model still yielded a minimized sum of squared residuals that was too large to accept the model. The widely varying flux distributions that were found when fitting 13C-labeling data using different metabolic network models emphasize the fact that true fluxes are only found when the biochemistry of the studied metabolic network is completely known. This disadvantage can be turned into an advantage by employing the 13C-labeling technique as a tool to determine the exact biochemistry by comparing the goodness-of-fit using the various model alternatives. This latter approach requires the availability of reliable measurement errors. Finally, this study has demonstrated the practical use of the recently introduced cumulative bondomer concept for the purpose of metabolic flux analysis from 2D [13C,1H] COSY data.
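The dependence of model acceptance on the assumed measurement error, noted in the analysis of the measurement errors, follows directly from the way the residuals are weighted. The following Python sketch uses a hypothetical set of residuals, ignores covariances, and reads the 2% error of Dauner et al. (2001) as an absolute standard deviation of 0.02 on the normalized relative intensities; each of these is an assumption made for illustration.

```python
def weighted_ss(residuals, sigma):
    """Sum of squared residuals, each divided by one common standard deviation.
    Covariances between the data are ignored in this simplified sketch."""
    return sum((r / sigma) ** 2 for r in residuals)


# Hypothetical residuals (measured minus simulated relative intensities)
residuals = [0.015, -0.010, 0.020, -0.015, 0.012]

ss_tight = weighted_ss(residuals, 0.006)  # average error estimated in this chapter
ss_loose = weighted_ss(residuals, 0.02)   # assumed reading of the 2% error

ratio = ss_tight / ss_loose
print(ratio)
```

Because the weighted sum scales with the inverse square of the standard deviation, the same residuals give a sum that is (0.02/0.006)^2, about eleven times, larger under the smaller error. A fit that is acceptable under one error model can therefore be rejected under the other.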


REFERENCES

Christensen, B., Nielsen, J. (2000) Metabolic network analysis of Penicillium chrysogenum using 13C-labeled glucose. Biotechnol. Bioeng., 68, 6: 652-659
Dauner, M., Bailey, J.E., Sauer, U. (2001) Metabolic flux analysis with a comprehensive isotopomer model in Bacillus subtilis. Biotechnol. Bioeng., 76, 2: 144-156
Follstad, B., Stephanopoulos, G. (1998) Effect of reversible reactions on isotope label redistribution. Analysis of the pentose phosphate pathway. Eur. J. Biochem., 252, 3: 360-371
Henriksen, C.M., Christensen, L.H., Nielsen, J., Villadsen, J. (1996) Growth energetics and metabolic fluxes in continuous cultures of Penicillium chrysogenum. J. Biotechnol., 45: 149-164
Jørgensen, H., Nielsen, J., Villadsen, J., Møllgaard, H. (1995) Metabolic flux distributions in Penicillium chrysogenum during fed-batch cultivations. Biotechnol. Bioeng., 46, 2: 117-131
Marx, A., Striegel, K., De Graaf, A.A., Sahm, H., Eggeling, L. (1997) Response of the central metabolism of Corynebacterium glutamicum to different flux burdens. Biotechnol. Bioeng., 56, 2: 168-180
Schmidt, K., Carlsen, M., Nielsen, J., Villadsen, J. (1997) Modeling isotopomer distributions in metabolic networks using isotopomer mapping matrices. Biotechnol. Bioeng., 55, 6: 831-840
Schmidt, K., Marx, A., De Graaf, A.A., Wiechert, W., Sahm, H., Nielsen, J., Villadsen, J. (1998) 13C Tracer experiments and metabolite flux analysis: comparing two approaches. Biotechnol. Bioeng., 58, 2&3: 254-262
Schürmann, M., Sprenger, G.A. (2001) Fructose-6-phosphate aldolase is a novel class I aldolase from Escherichia coli and is related to a novel group of bacterial transaldolases. J. Biol. Chem., 276, 14: 11055-11061
Van Gulik, W.M., De Laat, W.T.A.M., Vinke, J.L., Heijnen, J.J. (2000) Application of metabolic flux analysis for the identification of metabolic bottlenecks in the biosynthesis of penicillin-G. Biotechnol. Bioeng., 68, 6: 602-618
Van Winden, W.A., Van Gulik, W.M., Schipper, D., Verheijen, P.J.T., Krabben, P., Vinke, K., Heijnen, J.J. (2002) Metabolic flux and metabolic network analysis of Penicillium chrysogenum using 2D [13C,1H] COSY measurements and cumulative bondomer simulation. Submitted for publication in Biotechnol. Bioeng.


Chapter 9

Metabolic Flux and Network Analysis of Saccharomyces cerevisiae Using 2D [13C,1H] COSY and LC-MS Measurements

ABSTRACT

A new method for measuring 13C-labeling in the primary carbon metabolism is presented here: supplying 13C-labeled substrates to a continuous culture during only one hour, followed by rapid sampling of the fermentation broth, immediate quenching of the metabolism, boiling ethanol extraction and direct LC-MS measurement of the mass isotopomer distributions of 16 intermediates of the glycolysis, pentose phosphate pathway, anaplerotic reactions, glyoxylate shunt and TCA cycle. Besides allowing a very short labeling of the culture, this method also requires very little biomass for the analysis and does not rely on assumed biosynthesis routes, in contrast to the common GC-MS and NMR methods, both of which do not directly measure the labeling of primary metabolites, but of polymeric biomass components that are synthesized from them. The method is used to determine the 13C-distribution in the Saccharomyces cerevisiae strain CEN.PK113-7D grown in chemostat culture with D=0.1 hr-1 and a medium containing 90% (w/w) glucose and 10% (w/w) ethanol as carbon sources. The measured 13C-labeling data are fitted using a detailed, compartmented model of the primary metabolism. The outcomes include the flux distribution in the network and the cytosolic and mitochondrial fractions of the metabolites that are present in both compartments. The estimated flux distribution is compared to one that is found by fitting a second data set that was obtained using the same strain and cultivation conditions, but a different measurement technique: 2D [13C,1H] COSY measurement of amino acids in biomass protein that was grown to isotopic steady state on 13C-labeled medium. When separately fitted, the MS and NMR data sets yield different estimated flux patterns. A simultaneous fit of the two data sets leads to a larger total deviation than the sum of the two separate fits. This observation, plus statistical analyses of the fits of the separate data sets, indicates that the current model is not yet complete. The merits of both methods with respect to the estimation of several metabolic model parameters and the differences between the methods are discussed.


9.1 INTRODUCTION

The central carbon metabolism consists of a large number of reactions relative to the number of carbohydrate-intermediates. Consequently, analysis of the steady state fluxes is often hindered by the fact that the metabolite balances leave more fluxes undetermined than can be compensated for by measuring extracellular consumption or production rates. In eukaryotic cells, the situation is even worse, as the number of (reversible) transport rates across the mitochondrial membrane and reactions taking place both in the cytosol and in the mitochondrion often increases more rapidly than the number of metabolite balances of compartmented pools. An important first step in solving the problem of undetermined fluxes is to reduce the number of fluxes by demonstrating absence of activity of certain enzymes under specific nutritional regimes (De Jong-Gubbels et al., 1995) or by demonstrating the unique compartmental localization of certain enzymes (Chaves et al., 1997). The remaining undetermined fluxes may be resolved by adding metabolite balances of metabolic cofactors such as ATP and NAD(P)H. A disadvantage of the latter approach is that it does not elucidate the fluxes in reversible or parallel reaction pathways. Moreover, it introduces many uncertainties in the determined fluxes due to doubtful cofactor specificities and estimated energetic parameters such as P/O-ratio, and growth-dependent and independent maintenance. For the latter reasons, an increasing number of metabolic flux analyses is based on an additional source of data regarding the in vivo flux distribution (‘fluxeome’): 13C-labeling data of metabolites formed from a uniformly or specifically 13C-labeled medium substrate. Two recent publications reported the application of 13C-labeling experiments to elucidate the fluxeome in the eukaryotic organism Saccharomyces cerevisiae. Gombert et al. (2001) used Gas Chromatography Mass Spectrometry (GC-MS) measurements of amino acids in biomass grown on 1-13C1-glucose to compare the fluxes in cells grown under high (batch culture) and low glucose (chemostat culture) concentrations and in cells with and without the gene coding for Mig1p, a protein known to play a role in glucose sensing by yeast. Maaheimo et al. (2001) employed 2D [13C,1H] Correlation Nuclear Magnetic Resonance Spectroscopy (COSY NMR) and u-13C6-labeled glucose to determine flux ratios in exponentially growing cells in aerobic and anaerobic batch cultures. The results of the aerobic batch cultures of Gombert et al. and of Maaheimo et al. should give comparable outcomes, since both experiments were carried out with the same strain CEN.PK113-7D and the same glucose concentration, and in both experiments cells were harvested when growing at their maximum growth rate. The main differences were that Maaheimo et al. used shake flasks and undefined yeast nitrogen base medium whereas Gombert et al. used a pH and DO-controlled bioreactor and defined medium containing ammonia as the nitrogen source. Although comparison of the results of both studies is hindered by lack of data regarding the isotopic exchange between the cytosolic and mitochondrial pyruvate in the model by Gombert et al., some important flux ratios agree well. For example, it is reported that 0-4% (Maaheimo et al.) versus 2.5% (Gombert et al.) of the phosphoenolpyruvate is formed from pentoses and 88-100% (Maaheimo et al.) versus 82% (Gombert et al.) of the cytosolic oxaloacetate stems from cytosolic pyruvate.
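The identifiability problem described at the start of this introduction, where metabolite balances leave more fluxes undetermined than can be fixed by measured rates, can be made concrete by counting degrees of freedom. The Python sketch below uses a hypothetical toy network of two metabolites and four reactions, including two parallel routes; it is not part of the yeast model of this chapter. The number of undetermined fluxes equals the number of fluxes minus the rank of the stoichiometric matrix minus the number of independently measured rates.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix by exact Gaussian elimination with fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


# Hypothetical toy network (rows: metabolites A and B, columns: v1..v4)
#   v1: -> A      v2: A -> B      v3: A -> B (parallel route)      v4: B ->
S = [
    [1, -1, -1, 0],   # balance of A
    [0, 1, 1, -1],    # balance of B
]

n_fluxes = len(S[0])
free_fluxes = n_fluxes - matrix_rank(S)   # fluxes left open by the balances
n_measured = 1                            # e.g. only the uptake rate v1 is measured
undetermined = free_fluxes - n_measured

print(free_fluxes, undetermined)
```

In this toy case one flux remains undetermined after measuring the uptake rate: the split between the two parallel routes v2 and v3. Such a split cannot be resolved by metabolite balancing and is the kind of information that 13C-labeling data are meant to supply.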


In this study, we compare two fluxeomes of the CEN.PK113-7D strain of S. cerevisiae that were determined from two separate aerobic, carbon-limited chemostat cultures in which culturing conditions were kept as constant as possible, but in which different 13C-labeled medium substrates and measurement techniques were used. The first technique is the same one as employed by Maaheimo et al. (2001), namely 2D [13C,1H] COSY of amino acids in the biomass. The other technique, which to our knowledge has not been reported before, is direct Liquid Chromatography (LC) MS analysis of the mass isotopomer distribution of nearly all intermediates of the primary metabolism: glycolysis, pentose phosphate pathway (PPP), glyoxylate shunt and tricarboxylic acid (TCA) cycle. Thus far, direct measurement of the 13C-labeling distribution of the central carbon metabolism intermediates was impeded by their high turnover rates and low concentrations (Szyperski,1998;Dauner and Sauer,2000). For this reason the 13C-labeling pattern of the intermediates was inferred from NMR and MS measurements of accumulating biomass components such as amino acids and storage sugars that were synthesized from the intermediates via well-known pathways. A recently developed rapid sampling, rapid quenching and metabolite extraction protocol (Lange et al.,2001b), combined with the high sensitivity of MS compared to NMR (Szyperski,1998), allowed us to directly access the 13C-labeling state of the metabolic intermediates and to check their consistency with NMR measurements of the amino acids in cell protein.

9.2 THEORY

Metabolic network model

The primary carbon metabolism of S. cerevisiae was modeled on the basis of the network models published by Maaheimo et al. (2001), Gombert et al. (2001) and Lange (2002). The metabolic model, which consists of a cytosolic and a mitochondrial compartment, is given in Fig.1.
Clearly, this metabolic model is a lumped form of the true metabolic network. This lumped network was obtained by applying the network simplification rules presented in Chapter 3, which state that all metabolite pools with only one influx can be removed from the model and that all metabolite pools that may be assumed to be in isotopic equilibrium due to fast exchange reactions can be lumped into one single pool. Fig.1-I shows the upper part of the network (from here on: ‘upper network’): glycolysis (v1 to v4) and the PPP (v9 to v12). In addition to the traditional reactions in both these pathways, Fig.1-I includes an as yet unproven by-pass of the glycolytic phosphofructokinase and fbp aldolase reactions, formed by the f6p aldolase (v5) and dha kinase (v7) reactions that were also discussed in Chapter 8. Fig.1-II shows an extension of the traditional PPP by four transketolase (v13 to v16) and two transaldolase (v17, v18) reactions, for the existence of which evidence has been presented in Chapters 2 and 8. The network of Fig.1-II is in fact superimposed on the one in Fig.1-I and is shown separately only for reasons of visual clarity. The anabolic consumption of various metabolic intermediates in the upper network is modeled as a set of effluxes (v20 to v23). The network of Fig.1-III encompasses the lower part of the primary metabolism (from here on: ‘lower network’). Shown are the reactions around the pyr node including the anaplerotic pyr carboxylase reaction (v25), the pyr decarboxylase by-pass (v26, which lumps

FIGURE 1: The compartmented metabolic network model of wild type Saccharomyces cerevisiae. I+II: ‘upper network’, III: ‘lower network’. Double-headed arrows indicate reversible fluxes; for the directions of the respective forward and backward fluxes see Tables 5 and 9. For names of compounds see the list of abbreviations at p.1. In this figure the mal pool includes fum. The gray shaded metabolite pools in networks II and III coincide with the corresponding pools in network I.


ald dehydrogenase and acoa synthetase), the transmitochondrial membrane transport of pyr (v29), the mitochondrial pyr dehydrogenase reaction (v32) and the mitochondrial, unidirectional malic enzyme reaction (v38). Fig.1-III further shows the lumped mitochondrial TCA cycle (v33 to v35 and v37) and the transmitochondrial membrane transport of acoa (v28) and oaa (v30). Consistent with Maaheimo et al. (2001) and Gombert et al. (2001), transport of acoa was modeled as unidirectional (from cytosol to mitochondrion), whereas transport of pyr and oaa and the conversion of suc to mal and of mal to oaa are assumed to be bidirectional. In contrast to the models of S. cerevisiae presented by Maaheimo et al. (2001) and Gombert et al. (2001), the model presented here also includes the lumped glyoxylate shunt (v39 and v40) for reasons that will be explained in the Results and Discussion section. The glyoxylate shunt leads via cytosolic cit to cytosolic suc and mal, for each of which transmitochondrial membrane transporters have been identified (Kaplan et al.,1995;Palmieri et al.,1999&2000). The transport of cit (v31) was modeled as bidirectional in order to allow cit to be produced in the cytosol, imported into the mitochondrion and converted via the TCA cycle or, alternatively, to be produced in the mitochondrion, exported to the cytosol and converted via the glyoxylate shunt. The transport of suc (v40) and mal (v41) was modeled as unidirectional (from cytosol to mitochondrion) to keep the number of degrees of freedom of the model limited and because there does not seem to be any physiological reason to export mitochondrial suc and mal to the cytosol when glucose is present (i.e. when no gluconeogenesis occurs). Palmieri et al. (2000) argue that anaplerotic uptake of dicarboxylates is indeed unidirectional and proceeds by the action of the Pi-coupled dicarboxylate carrier.
The suc/fum carrier that exchanges cytosolic suc for mitochondrial fum was not included in the present model, as this carrier is expected to have a role in gluconeogenesis (Palmieri et al.,2000). The anabolic consumption of various metabolic intermediates in the lower network was modeled as a number of effluxes (v44 to v50). Following the network simplification rules, the TCA cycle intermediate akg was removed from the model. Its anabolic consumption was included in the efflux of mitochondrial cit. The CO2 produced in the formation of akg from cit in this lumped biosynthesis reaction was accounted for in the 13C-labeling balance of CO2. The three sub-networks in Fig.1 furthermore contain two 13C-labeling scrambling ‘reactions’ for the symmetric intermediates dha (v6) and mal (v36). In the model the mal pool was lumped with the fum pool because their mass fractions were observed to be nearly identical (see Results and Discussion). Finally, three metabolic intermediates in Fig.1 have an influx that does not stem from another intracellular metabolite in Fig.1: g1p (v24), mitochondrial pyr (v51) and oaa (v52). These influxes were indispensable for fitting the observed mass fractions of these three intermediates, as will be explained in the Results and Discussion section.

Compartmentation: 2D [13C,1H] COSY spectra of biomass components versus mass spectra of metabolic intermediates

The two types of 13C-labeling data that were measured in the presented experiments, 2D [13C,1H] COSY and MS data, have a different information content with respect to the fluxes in cells that contain subcellular compartments that are separated by impermeable membranes,

such as the eukaryote S. cerevisiae that is studied in this chapter. The difference between the two methods is illustrated by Fig.2. 2D [13C,1H] COSY data reflect the 13C-labeling pattern in polymeric biomass components. These components (amino acids as well as storage and wall sugars) are commonly made from a metabolic precursor pool in a specific cell compartment. If the subcellular localization of the respective biosynthetic pathway is known, the NMR data thus contain information about the fluxes entering and leaving the precursor pool in the cell compartment concerned. A disadvantage of the NMR data is that errors in the assumed compartment from which the precursor originates, or in the biosynthetic pathway leading to the measured compound, propagate into errors in the estimated fluxes.

FIGURE 2: A compartmented cell consisting of a cytosolic (white) and mitochondrial (gray) compartment. The small shaded circles indicate carbon atoms that are 13C-labeled to a varying extent. Since alanine is only synthesized in the mitochondrion, NMR data of biomass protein only contain 13C-labeling information of mitochondrial pyruvate. MS data contain 13C-labeling information of pyruvate in both compartments.

Mass spectra, on the other hand, directly reflect the 13C-labeling pattern of the intermediates of primary metabolism. No knowledge about biosynthetic pathways is needed to interpret this type of data. The mass fractions, however, contain 13C-labeling information about cell-averaged metabolite pools, irrespective of the compartment where they are localized. Therefore the interpretation of MS data requires knowledge of the mole fractions of the measured metabolites in each of the compartments. An experimentally demanding, if not impossible, way of solving this problem is to cautiously lyse the cells and to physically separate the various compartments prior to measuring their mass spectra (Szyperski,1998). The dependence of the interpretation of mass spectra on the mole fractions of a given compound in the various compartments can be turned into an advantage by using it as a tool to determine these mole fractions. Consider for example the pyr pool in Fig.1, which is known to be present both in the cytosol and in the mitochondrion. When determining the fluxes in the metabolism of S. cerevisiae using cumomer modeling, the measured mass distribution vector (mdvmeas) of pyruvate is fitted by the corresponding simulated distribution (mdvsim). In a compartmented model, two separate mass distributions are simulated for the cytosol and the mitochondrion. These have to be combined into a mole fraction-averaged overall mass distribution in order to fit them to the measured one:

$$\mathrm{mdv}_{\mathrm{meas}} \approx \alpha \cdot \mathrm{mdv}_{\mathrm{sim,cytosol}} + (1-\alpha) \cdot \mathrm{mdv}_{\mathrm{sim,mitochondrion}} \qquad (1)$$
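The averaging in Eq.1 can be illustrated with a small numerical sketch. It is not the fitting code used in this work: the two simulated distributions are hypothetical numbers held fixed, whereas in the actual procedure they change with the fluxes and the cytosolic mole fraction α is fitted together with them. The function names `mix_mdv` and `fit_alpha` are likewise only illustrative.

```python
def mix_mdv(alpha, mdv_cyt, mdv_mit):
    """Right-hand side of Eq.1: mole-fraction-averaged mass distribution."""
    return [alpha * c + (1.0 - alpha) * m for c, m in zip(mdv_cyt, mdv_mit)]


def fit_alpha(mdv_meas, mdv_cyt, mdv_mit):
    """Least-squares estimate of alpha for FIXED simulated distributions.

    Simplifying assumption: unweighted residuals and fixed mdv_sim vectors.
    The result is clipped to the physically meaningful range [0, 1].
    """
    d = [c - m for c, m in zip(mdv_cyt, mdv_mit)]    # cytosol minus mitochondrion
    r = [y - m for y, m in zip(mdv_meas, mdv_mit)]   # measured minus mitochondrion
    alpha = sum(ri * di for ri, di in zip(r, d)) / sum(di * di for di in d)
    return min(1.0, max(0.0, alpha))


# Hypothetical mass fractions (M+0 .. M+3) of pyr in the two compartments
mdv_cyt = [0.50, 0.40, 0.08, 0.02]
mdv_mit = [0.30, 0.45, 0.20, 0.05]

# Synthetic "measurement" generated with alpha = 0.7
mdv_meas = mix_mdv(0.7, mdv_cyt, mdv_mit)
alpha_hat = fit_alpha(mdv_meas, mdv_cyt, mdv_mit)
```

Because the mixed distribution is linear in α, a single independent data point suffices to fix it once the two compartment-specific distributions are given, which is why one measured mass fraction is effectively spent on α.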

In Eq.1 the parameter α represents the cytosolic mole fraction of pyr. As this fraction is unknown, it needs to be fitted simultaneously with the fluxes. This means that one data point of the measured mass distribution of pyr is needed to fit the parameter α, whereas the remaining data points are used to fit the fluxes.

Two-step fit

The metabolic fluxes in the networks shown in Fig.1 were determined in a two-step procedure. First, the fluxes v1 to v24 in the upper network (Figs.1-I+II) were determined from the measured extracellular rates entering or leaving the intermediate pools in that part of the network and from 13C-labeling data that (indirectly) contain information about the labeling distribution of the metabolites concerned. The fluxes (v8 and v19) and isotopomer distributions leaving the tp and CO2 pools that were determined in this fit were used as input for the next step. In this second step, this output of the upper network was combined with the measured extracellular rates and 13C-labeling data concerning the lower network (Fig.1-III) to determine fluxes v25 to v52 plus the cytosolic mole fractions of the compartmented metabolite pools of pyr, cit, suc and fum/mal. The two-step fit is permissible because no reactions lead back from intermediates of the lower part to intermediates of the upper part. The reasons for opting for this approach were twofold. Firstly, the metabolic network structure in Fig.1 was set up in an iterative fashion by testing whether a given network structure allowed a statistically acceptable fit of the measured 13C-labeling data and, if not, by adapting the network (see Fig.7 of Chapter 1). In the iteration it was found that by far most network uncertainties were located in the lower network. This was mainly due to the compartmentation of that part of the network.
Since setting up a new network structure and performing a flux fit is a time-consuming procedure, we decided to fit the fluxes in the upper network first and to use the results of that fit for subsequent iterative improvement of the lower network. Secondly, apart from saving time, fitting the fluxes in the upper network independently from those in the lower network made the results of the upper network insensitive to possible modeling errors in the lower part. Note that the outcomes of two separate fits are not necessarily identical to the outcome of one single fit, since in a single fit the lower network forces the output from the upper network towards values that allow the best possible fit of its fluxes and 13C-labeling data. Therefore, the two-step approach is useful for metabolic network analysis purposes, but should preferably be followed by a single fit of the final network model. This still remains to be done for the present model.

Flux fitting

In order to analyze the number of free fluxes in both the upper and the lower network model, the corresponding stoichiometry matrices (S) are combined with measurement matrices (R) that contain a row with a unity entry for each (indirectly) measured rate. The combined stoichiometry and measurement matrices are used to find a generalized solution v for the underdetermined flux balances as described in Chapter 3:

$$v = \begin{bmatrix} S \\ R \end{bmatrix}^{\#} \cdot \begin{bmatrix} 0 \\ v_m \end{bmatrix} + \mathrm{null}\left( \begin{bmatrix} S \\ R \end{bmatrix} \right) \cdot \beta \qquad (2)$$

where ‘#’ denotes the pseudo-inverse and ‘null’ the null space of the combined stoichiometry and measurement matrix. The first term in the generalized flux solution in Eq.2 is fully determined by the measured rates (vm) and the second term is a linear combination of the columns spanning the null space of the full set of flux balances. The vector β contains the free flux parameters that need to be fitted to the measured 13C-labeling data. The free flux parameters are fitted in an iterative fashion by substituting values in Eq.2 and using the resulting set of fluxes as input for the cumomer balances (Wiechert et al.,1999) of the metabolic network. The calculated cumomers are transformed to the actually measured 13C-labeling data as described in Möllney et al. (1999). The covariance-weighted sum of squared deviations between the simulated and measured data is used as the target function in a non-linear minimization procedure. The entire simulation and optimization are performed in Matlab (The MathWorks Inc., Natick, MA, USA).

9.3 MATERIALS AND METHODS

Culturing conditions

• In experiment I the haploid, prototrophic S. cerevisiae strain CEN.PK113-7D was cultured to yield 13C-labeled biomass for 2D [13C,1H] COSY analysis. The aerobic, carbon-limited chemostat culture was performed at a working volume of 1.5 l in a 2-l fermentor (Applikon Dependable Instruments, Schiedam, The Netherlands). The pH was controlled at pH 5 (BioController ADI 1030, Applikon Dependable Instruments), the temperature was controlled at 30°C (Thermostat WK 230, Lauda, Lauda-Königshofen, Germany), the stirrer speed was 800 rpm and the aeration rate 0.35 vvm (l air/l broth/min). Cultivation proceeded at a constant dilution rate of 0.10 h-1. The medium that was supplied was the defined mineral medium of Verduyn et al. (1992) in which the ammonium sulfate concentration was halved.
The dual carbon source was 3 g/l glucose and 0.3 g/l ethanol. Nitrogen was solely provided in the form of ammonia. Steady-state conditions were routinely checked by off-line measurements of the biomass concentration through filtration on nitro-cellulose filters (pore size 0.45 µm, Gelman Science, Ann Arbor, MI., USA) and by on-line measurements of the off-gas concentrations of O2 (Servomex 1100A Oxygen Analyser, Taylor Servomex, Crowborough, UK) and CO2 (Beckman 864 infrared detector, Rosemount Analytics, Santa Clara, CA., USA). Gas flow rates were measured using a Saga Digital Flow Meter (Ion Science, Cambridge, UK). The carbon balance closed within 5%, the nitrogen balance within 2% and the redox balance within 4%. When the chemostat culture in experiment I had reached metabolic steady state after about 4 to 5 residence times, the unlabeled medium was replaced by chemically identical medium in which 10% of the glucose was uniformly 13C-labeled (D-glucose [u-13C6], CAS: 492-62-6, Cambridge Isotope Laboratories Inc., Andover, MA, USA).

• In experiment II the same strain was cultured at the same dilution rate, pH, temperature and stirrer speed to yield 13C-labeled metabolic intermediates for LC-MS

analysis. The aerobic, carbon-limited chemostat culture was performed at a working volume of 4 l in a weight-controlled 7-l fermentor (Applikon Dependable Instruments, Schiedam, The Netherlands). In this experiment 0.3 bar overpressure was applied to allow rapid sampling. The aeration rate of 0.68 vvm (l air/l broth/min) was chosen such that the dissolved oxygen concentration was identical to that in experiment I. The medium that was supplied was the defined mineral medium of Verduyn et al. (1992) with 10 g/l glucose and 1.0 g/l ethanol as dual carbon source. Nitrogen was solely provided in the form of ammonia. In this experiment a higher biomass concentration was cultivated than in experiment I to favor reliable intracellular metabolite analyses. Steady-state conditions were routinely checked by off-line measurements of the biomass concentration through filtration on nitro-cellulose filters (pore size 0.45 µm, Gelman Science, Ann Arbor, MI., USA) and by on-line measurements of the off-gas concentrations of O2 and CO2 (combined paramagnetic/infrared analyser, NGA 2000, Rosemount Analytics, Santa Clara, CA., USA). Gas flow rates were measured using a Saga Digital Flow Meter (Ion Science, Cambridge, UK). The carbon balance closed within 5%. When the chemostat culture of experiment II had reached metabolic steady state after 7 residence times, the unlabeled medium was replaced by chemically identical medium in which 100% of the glucose was specifically 1-13C-labeled (D-glucose [1-13C1], CAS: 40762-22-9, Cambridge Isotope Laboratories Inc., Andover, MA, USA) and in which 100% of the ethanol was uniformly 13C-labeled (ethanol [u-13C2], CAS: 70753-79-6, Cambridge Isotope Laboratories Inc., Andover, MA, USA).
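The relation between elapsed time, number of residence times and the fraction of culture content that has been replaced after a medium switch can be sketched as follows. This is a minimal illustration that assumes an ideally mixed chemostat at steady state, so that material present at the switch washes out as exp(-D·t); the function names are hypothetical. The dilution rate of 0.10 h-1 and the time points of 11.8 h and 35.4 h are those of experiment I.

```python
import math


def residence_times(dilution_rate_per_h, hours):
    """Number of residence times elapsed (residence time = 1 / D)."""
    return dilution_rate_per_h * hours


def fraction_replaced(dilution_rate_per_h, hours):
    """Fraction of the material present at the medium switch that has been
    washed out and replaced, assuming ideal mixing (first-order washout)."""
    return 1.0 - math.exp(-dilution_rate_per_h * hours)


D = 0.10  # dilution rate in 1/h, as in experiments I and II
for t in (11.8, 35.4):  # hours after the switch to 13C-labeled medium
    print(f"t = {t:5.1f} h: {residence_times(D, t):.2f} residence times, "
          f"{100 * fraction_replaced(D, t):.0f}% replaced")
```

Under these assumptions roughly 70% and 97% of the biomass is replaced at the two time points, and about 3.5 residence times are needed before less than 3% of the original material remains.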
Biomass sampling and sample handling

• In experiment I, biomass samples of 100 ml (≈170 mg dry weight biomass) were taken for 2D [13C,1H] COSY analysis 11.8 h and 35.4 h (1.2 and 3.5 residence times) after the switch to 13C-labeled medium. At these times 70% and 97% of the naturally 13C-labeled biomass had been replaced by biomass grown on uniformly 13C-labeled substrate. The samples were directly centrifuged (6 min, 4800 rpm). The cell pellet was washed with 0.9% NaCl solution and demineralized water. After final filtration the biomass was stored at -80°C. Prior to NMR analysis the biomass was lyophilized for 48 h and subsequently hydrolyzed in 10 ml 6 N HCl for 16 h at 110°C. After filtration and evaporation to dryness, the residue was dissolved in 10 ml 0.1 N HCl and the amino acids were adsorbed to an ion exchange resin (Dowex AG 50W X4) and washed with 0.1 N HCl. The amino acids were eluted with 4 N HCl. After evaporation the residue was dissolved in D2O. The presented sample preparation includes separation of proteinogenic amino acids from the remaining biomass components, which eliminates the interference of multiplets of carbon atoms of other compounds. Water extracts of cell components other than proteinogenic amino acids were prepared by heating the biomass in 5 ml H2O to 90°C for 10 minutes with shaking. After centrifugation, the supernatant was lyophilized and dissolved in D2O.

• In experiment II, biomass samples were taken for LC-MS analysis 0, 40 and 60 minutes after the switch to 13C-labeled medium. Based on literature data on intracellular pool sizes and reaction rates it had been estimated that, of all the central carbon metabolism intermediates, the lumped α-ketoglutarate/glutamate pool had the smallest turnover rate.

According to these estimates, more than 97% of this intermediate pool is expected to be replaced by intermediate formed from the 13C-labeled medium after 30 minutes of 13C-labeled medium supply. At each of the three time points (0, 40 and 60 min) 1-ml samples (≈5 mg of biomass) were taken using the rapid sampling setup described in Lange et al. (2001b). Immediate quenching of the metabolism, separation of the cells from the extracellular liquid, and cell extraction were also performed as described in Lange et al. (2001b). The ethanol and traces of methanol were removed from the extracted samples in a RapidVap vacuum evaporation system using vacuum along with a temperature gradient and vortex motion (Labconco, Kansas City, MO., USA). During the drying, 3 samples of the same time point were combined into one to obtain a final amount of biomass of 15 mg per sample. The dried samples were dissolved in 0.5 ml of demineralized water, centrifuged (5 min, 11000 × g) and decanted to remove insoluble matter. They were stored at -80°C prior to LC-MS analysis.

2D [13C,1H] COSY measurements

For experiment I, NMR measurements were performed as described in Chapter 4 of this thesis. The recorded spectra were analyzed by means of spectral fitting software (see Chapter 4). The resulting data are relative intensities of several fine structures that can be observed in the multiplets of proteinogenic amino acids, the storage sugar trehalose and levulinic acid, a chemical degradation product of g6p. Covariances of the data were estimated on the basis of NMR noise by the peak-fitting software. Because especially the early biomass sample, taken after 11.8 h of 13C-labeled glucose supply, was not yet in isotopic steady state, the measured relative intensities were corrected for this as described in Chapter 4.

MS measurements

For experiment II, the intermediates in the biomass extract were separated by high-performance anion exchange chromatography.
This was done using an Alliance pump system (Waters, Milford, USA) followed by an IonPac AS11 (250 x 4 mm) column equipped with an AG11 (50 x 4 mm) precolumn (both from Dionex, Sunnyvale, CA., USA). The flow rate was 1 ml/min. In all cases 10 µl of sample was injected. From the above sample preparation method it follows that 10 µl of sample contains the intracellular metabolites originating from 0.30 mg dry weight of S. cerevisiae. Because the sodium hydroxide concentration of the eluent was too high for a proper mass spectrometric analysis, the sodium cations were exchanged for protons by a post-column ASRS Ultra 4 mm ‘Self Regenerating Suppressor’ from Dionex. Subsequent MS analyses were performed with a Quattro-LC triple quadrupole mass spectrometer (Micromass Ltd., Manchester, UK) equipped with an electrospray ionization interface with a mass range up to 1600 m/z. The nebulizer gas flow (nitrogen) was 75 l/h and the desolvation gas flow (nitrogen) was 680 l/h. The source block temperature was 80°C and the desolvation gas temperature was 250°C. The capillary voltage was set at -2.7 kV. All samples were analyzed in the negative mode giving [M-H]- ions, which were monitored in the single ion recording (SIR) mode (MS1 is static, not scanning) with a resolution of 0.8 mass units. For further details regarding the MS method, see Van Dam et al. (2002). Note that the measured mass

distributions of the intracellular metabolites are independent of their absolute concentrations. Consequently, the data are not affected by changes in any of the intracellular metabolite concentrations that could occur after sampling. Moreover, calibration of the LC-MS to assure correct absolute concentrations is not needed. The resulting data are the mass fractions of the following intermediates of the central carbon metabolism: g6p, g1p, f6p, fbp, bpg, pep, pyr, 6pg, p5p, e4p, s7p, cit, gox, suc, fum, mal. The identity of each of these components except for s7p was checked against a standard sample. The p5p mass fractions were determined from the overlapping r5p and ru5p signals, not from x5p. Standard deviations of the mass fractions were determined from 5 repeated measurements.

9.4 RESULTS AND DISCUSSION

Table 1 shows the extracellular rates, biomass concentrations, total organic carbon concentrations in the filtrate, dissolved oxygen levels, respiratory quotients and biomass yields of experiments I and II. The calculations are based on an estimated molecular weight for biomass of 26.4 g/C-mole (Lange and Heijnen,2001a). These data show that the two chemostat cultures were nearly identical. Therefore, the metabolic flux analyses of both experiments are based on the set of balanced net conversion rates of experiment I and the biosynthetic reaction rates that were derived from it using the detailed metabolic model published by Lange (2002).

TABLE 1:

Macroscopic measurements and yields of experiments I and II.

Quantity                        Unit                     Experiment I   Experiment II
q(glc)                          mmol/C-mol/h             26.1           24.7
q(eth)                          mmol/C-mol/h             12.0           10.6
q(O2)                           mmol/C-mol/h             80.7           75.7
q(CO2)                          mmol/C-mol/h             73.0           69.8
q(biomass)                      C-mol/C-mol/h            0.10           0.10
total organic carbon filtrate   mg/l                     70             133
dissolved oxygen                % of saturation level    60             60
biomass concentration           g/l                      1.70           5.33
biomass concentration           C-mol/l                  0.064          0.202
respiratory quotient            mol/mol                  0.90           0.92
yield(biomass/glc)              C-mol/C-mol              0.67           0.67
yield(biomass/glc)              g/g                      0.53           0.53
yield(biomass/glc+eth)          C-mol/C-mol              0.58           0.58
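Two of the derived quantities in Table 1 can be cross-checked from the listed rates with a short calculation. The sketch below is illustrative only: it takes the tabulated rates at face value, counts 6 carbon atoms per glucose and 2 per ethanol, and ignores any carbon leaving as dissolved by-products (the total organic carbon in the filtrate), so the recovery it returns is an approximation and not the balanced rate set used for the flux analysis. The function names are hypothetical.

```python
def respiratory_quotient(q_co2, q_o2):
    """RQ = CO2 production rate / O2 consumption rate (mol/mol)."""
    return q_co2 / q_o2


def carbon_recovery(q_glc, q_eth, q_co2, q_biomass):
    """Carbon leaving as CO2 and biomass divided by carbon entering as
    glucose (6 C) and ethanol (2 C).

    q_glc, q_eth, q_co2 in mmol/C-mol/h; q_biomass in C-mol/C-mol/h.
    Simplifying assumption: no other carbon-containing products.
    """
    carbon_in = 6.0 * q_glc + 2.0 * q_eth     # mmol C per C-mol biomass per h
    carbon_out = q_co2 + 1000.0 * q_biomass   # mmol C per C-mol biomass per h
    return carbon_out / carbon_in


# Rates taken from Table 1
rq_I = respiratory_quotient(73.0, 80.7)
rq_II = respiratory_quotient(69.8, 75.7)
rec_I = carbon_recovery(26.1, 12.0, 73.0, 0.10)
rec_II = carbon_recovery(24.7, 10.6, 69.8, 0.10)

print(f"RQ: I = {rq_I:.2f}, II = {rq_II:.2f}")
print(f"carbon recovery: I = {rec_I:.3f}, II = {rec_II:.3f}")
```

With these inputs the respiratory quotients round to the tabulated 0.90 and 0.92, and both carbon recoveries lie within 5% of unity, in line with the stated closure of the carbon balances.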

Flux analysis upper network

Table 2 (columns ‘measured’) shows the time-corrected relative intensities in the NMR spectra of 21 different carbon positions in amino acids and storage sugars that were measured in the biomass grown in experiment I and harvested 12 and 35 hours after the switch to 13C-labeled medium. Table 3 (columns ‘measured’) shows the mass fractions of 10 metabolic intermediates that were measured in the extracts of the biomass grown in experiment II and harvested 0, 40 and 60 minutes after the switch to 13C-labeled medium. Mass fractions smaller than 0.01 are not shown. In no case is the sum of the shown fractions below 0.99.

TABLE 2: Time-corrected measured and simulated relative intensities in the upper part of the central carbon metabolism (see Figs. 1-I+II). Values given are of singlets (s), doublets (d), triplets (t), double doublets (dd) and double double doublets (ddd) of various carbon positions in amino acids and storage sugars present in biomass lysate. (‘d*’ indicates the doublet with the larger one-bond scalar coupling constant, ‘dL’ indicates the doublet caused by the long-range scalar coupling, ‘ddL’ and ‘dd*L’ indicate the double doublets caused by the long-range scalar coupling and originating from splitting of ‘d’ and ‘d*’, ‘dtL’ denotes the double triplet caused by the long-range scalar coupling.)

Row heading: measured compound, carbon position and fine structure.
Column headings (sample): time=12 hrs., measured, extrapolated to infinite time; time=35 hrs., measured, extrapolated to infinite time; simulated, network I+II, hypothesis B, separate fit NMR data; simulated, network I+II, hypothesis B, simultaneous fit NMR and MS data.

0.108

0.117 0.118 0.089 0.086 0.016 0.023 0.778 0.773 0.121 0.009 0.772 0.097 0.259 0.741 0.107 0.013 0.041 0.840 0.118 0.004 0.350 0.056 0.011 0.009 0.005 0.449 0.326

0.137

0.127

0.145

0.153

0.010

0.008

0.708

0.711

0.131 0.016 0.760 0.094 0.282 0.718 0.113 0.005 0.076 0.806 0.162 0.010 0.309 0.066 0.010 0.007 0.024 0.413 0.349

0.121 0.015 0.770 0.095 0.280 0.720 0.108 0.005 0.081 0.807 0.162 0.010 0.314 0.080 0.010 0.007 0.024 0.394 0.362

0.289

0.197

0.203

0.037

0.034

0.034

phe-α = s d

0.073

d*

0.032

dd

0.787

phe-β = s d* d dd gly-α = s d his-α = s d d* dd his-β = s d* d dd dL dd*L ddL ddd his-δ = s d dL


0.124 0.016 0.766 0.094 0.362 0.638 0.120 0.027 0.050 0.803 0.126 0.002 0.330 0.131 0.007 0.023 0.030 0.352 0.410 0.362 0.282 0.283 0.031 0.004

ddL ser-α = s d d* dd ser-β = s d tyr-α = s d d* dd tyr-β = s d* d dd tyr-δ1/δ2 = s d t tyr-ε1/ε2 = s d t lev-C2 = s d d* dd lev-C3 = s d t dL ddL dtL lev-C5 = s d* d dd tre-C1 = s d tre-C2 = s d t tre-C3 = s d t tre-C4 = s d t tre-C5 = s d t tre-C6 = s d

0.277 0.351 0.150 0.071 0.257 0.522 0.408 0.592 0.132 0.091 0.031 0.747 0.114 0.021 0.771 0.095 0.151 0.793 0.056 0.238 0.319 0.443

0.349

0.420

0.401

0.159 0.070 0.272 0.499 0.375 0.625 0.127 0.083 0.030 0.759 0.122 0.027 0.760 0.091

0.179 0.104 0.239 0.478 0.418 0.582 0.137 0.145 0.010 0.708 0.131 0.016 0.760 0.094 0.121 0.785 0.095 0.318 0.266 0.416 0.288 0.080 0.065 0.567 0.137 0.014 0.000 0.003 0.209 0.636 0.103 0.049 0.004 0.845 0.173 0.827 0.160 0.222 0.619 0.288 0.145 0.567 0.141 0.223 0.636 0.103 0.053 0.845 0.107 0.893

0.172 0.108 0.242 0.478 0.414 0.586 0.127 0.153 0.008 0.711 0.121 0.015 0.770 0.095 0.112 0.793 0.096 0.299 0.278 0.423 0.283 0.099 0.077 0.542 0.138 0.014 0.000 0.003 0.215 0.630 0.100 0.053 0.003 0.844 0.161 0.839 0.149 0.245 0.606 0.283 0.176 0.542 0.141 0.229 0.630 0.100 0.056 0.844 0.103 0.897

0.287 0.100 0.087 0.525 0.125 0.030 0.001 0.032 0.170 0.642 0.184 0.024 0.015 0.778 0.154 0.846 0.137 0.206 0.658 0.279 0.163 0.559 0.119 0.220 0.661 0.095 0.049 0.856 0.112 0.888


The mass fractions of the sample taken at t=0 in experiment II represent the 13C-labeling distribution of metabolites that were formed from naturally 13C-labeled glucose and ethanol. The mass fractions greater than M+0 are caused by natural isotopes of carbon, oxygen and hydrogen. In the cumomer model that simulates the mass fractions, natural 13C-labeling is taken into account in the definition of the cumomer distribution of the carbon substrate. The presence of natural isotopes of elements other than carbon is accounted for by applying the correction procedure proposed in Chapter 7. By setting the fractions of uniformly and specifically 13C-labeled medium substrates in the model to zero, the mass fractions of intermediates formed from naturally labeled medium were simulated. Table 3 (column ‘time=0, simulated’) shows that the simulated data agree well with the measured ones: only 5 out of the 27 fractions differ significantly at a 95% confidence level. The comparison does not give any information about the fluxes, but tests the accuracy of the MS measurements (Hellerstein and Neese,1999). In experiment II, the mass fractions were measured both at t=40 and at t=60 in order to check whether the 13C-labeling distributions of the intracellular metabolites were in isotopic steady state after more than 30 minutes, as was estimated on the basis of literature values of intracellular pool sizes and estimated fluxes. It can be verified in Table 3 (columns ‘time=40 and time=60, measured’) that the mass fractions remain constant after 40 minutes, indicating that the pools may indeed be assumed to be in isotopic steady state: only 1 out of the 34 fractions measured at t=40 and t=60 differs significantly at a 95% confidence level.
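The carbon contribution to the mass fractions of the t=0 sample can be illustrated with a binomial calculation. The sketch below is not the cumomer simulation used in this work: it assumes that every carbon position is independently 13C-labeled with a natural abundance of 1.07% (an assumed value) and it deliberately omits the natural isotopes of oxygen and hydrogen, which the correction procedure of Chapter 7 accounts for. The function name is hypothetical.

```python
from math import comb

P13C = 0.0107  # assumed natural abundance of 13C


def natural_carbon_mdv(n_carbons, p=P13C):
    """Mass fractions M+0 .. M+n of a carbon skeleton with n_carbons atoms,
    each independently 13C-labeled with probability p (carbon only)."""
    return [comb(n_carbons, k) * p**k * (1.0 - p)**(n_carbons - k)
            for k in range(n_carbons + 1)]


# Example: pyr (3 carbon atoms) formed from naturally labeled substrate
mdv_pyr = natural_carbon_mdv(3)
for k, fraction in enumerate(mdv_pyr):
    print(f"M+{k}: {fraction:.4f}")
```

For a three-carbon compound this gives an M+1 fraction of about 3% from carbon alone; measured M+1 and M+2 fractions will be somewhat higher because of the oxygen and hydrogen isotopes that this sketch leaves out.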
In order to find out whether the two measurement techniques that were used to obtain 13C-labeling information result in different flux fits, the relative intensities in the NMR spectra and the mass fractions measured in the sample taken at t=60 were fitted separately. These fits were performed using the cumomer balances of the upper network (Fig.1-I+II). Next, the two sets of 13C-labeling data were fitted simultaneously to investigate whether the metabolic model could be brought into accordance with two totally independent data sets. Fits were performed not only for the two separate data sets and for the combined data set, but also for two different hypotheses regarding the metabolic network:

• A: all reactions in the networks of Fig.1-I+II are active. Bidirectional reactions run in both directions.
• B: the by-pass of the glycolysis formed by the f6p aldolase (v5 in Fig.1-I), dha scrambling (v6) and dha kinase (v7) reactions is inactive: rates v5f, v5b, v6 and v7 are fixed at zero. The transketolase (v10 and v12 to v16 in Fig.1-I+II) and transaldolase reactions (v11, v17 and v18) are not reversible; the net reactions may run in both directions but their exchange fluxes are fixed at zero.

The parameters and results of the six (3 data sets × 2 network hypotheses) iterative minimizations of the covariance-weighted sums of squared residuals are summarized in Table 4. The table shows that for both hypothesis A and hypothesis B, the separate fit of the MS data is better (c/(a-b)=14.2 (A) and 14.0 (B)) than the separate fit of the NMR data (c/(a-b)=103.8 (A) and 98.5 (B)). This may reveal either (1) that the metabolic model contains an error to which the MS data are less sensitive than the NMR data, (2) that one or more of the modeled biosynthetic reactions leading from the metabolic intermediates to the biomass components measured with NMR are incorrect or (3) that the estimated measurement error of the NMR data is too small. Note that, in contrast to the NMR data, the errors of the MS data were not estimated but determined from five repeated measurements.

The minimized sums of squared errors of the simultaneous fits of the MS and NMR data in Table 4 are larger than those of the separate data sets. This shows that both the measured NMR data and the MS data are less well approximated when fitted at the same time. Apparently, the single model is not consistent with the two data sets. However, it should be noted that the biomass samples that were used for MS and NMR analyses were obtained from two separate cultures. Although very similar net conversion rates were measured for these two cultures (see Table 1), small differences between the two cultures may explain to some extent the observed inconsistency of the two data sets.

TABLE 3: Measured and simulated mass fractions and standard deviations (determined from 5 repeated experiments) of various intermediates of the upper part of the central carbon metabolism (see Fig. 1-I+II)

Columns:
(1) time=0 min., measured
(2) time=0, simulated, network I+II, hypothesis A
(3) time=40 min., measured
(4) time=60 min., measured
(5) time=60, simulated, network I+II, hypothesis B, separate fit MS data
(6) time=60, simulated, network I+II, hypothesis B, simultaneous fit NMR and MS data
A dash (-) marks a fraction for which no value is given.

compound (M)      fraction  (1)          (2)    (3)          (4)          (5)    (6)
g6p (M=259)       M+0       0.911±0.007  0.916  0.174±0.009  0.192±0.015  0.188  0.194
                  M+1       0.065±0.005  0.064  0.720±0.010  0.695±0.013  0.702  0.753
                  M+2       0.022±0.002  0.019  0.085±0.010  0.091±0.010  0.095  0.038
                  M+3       -            -      0.019±0.002  0.020±0.002  0.013  0.014
g1p (M=259)       M+0       0.910±0.005  0.916  0.306±0.040  0.332±0.027  0.314  0.338
                  M+1       0.064±0.004  0.064  0.637±0.039  0.610±0.021  0.592  0.615
                  M+2       0.025±0.006  0.019  0.057±0.009  0.059±0.016  0.082  0.035
f6p (M=259)       M+0       0.910±0.009  0.916  0.189±0.020  0.202±0.033  0.190  0.341
                  M+1       0.067±0.005  0.064  0.711±0.021  0.683±0.038  0.699  0.583
                  M+2       0.021±0.005  0.019  0.082±0.008  0.092±0.012  0.096  0.064
                  M+3       -            -      0.017±0.003  0.021±0.001  0.013  0.011
f16p (M=339)      M+0       0.893±0.015  0.909  0.331±0.016  0.350±0.027  0.347  0.364
                  M+1       0.077±0.013  0.065  0.523±0.026  0.496±0.026  0.532  0.554
                  M+2       0.029±0.003  0.024  0.125±0.013  0.127±0.015  0.105  0.066
                  M+3       -            -      0.021±0.002  0.023±0.002  0.014  0.014
2/3pg (M=185)     M+0       0.950±0.004  0.951  0.640±0.012  0.628±0.016  0.673  0.770
                  M+1       0.035±0.002  0.034  0.331±0.012  0.343±0.016  0.312  0.215
                  M+2       0.015±0.001  0.014  0.022±0.001  0.022±0.001  0.011  0.012
pep (M=167)       M+0       0.952±0.001  0.954  0.630±0.015  0.631±0.020  0.675  0.772
                  M+1       0.035±0.001  0.034  0.340±0.013  0.338±0.020  0.312  0.216
                  M+2       0.013±0.001  0.012  0.023±0.002  0.024±0.002  0.009  0.010
6pg (M=275)       M+0       0.915±0.003  0.914  0.199±0.008  0.195±0.005  0.188  0.194
                  M+1       0.065±0.003  0.064  0.701±0.009  0.696±0.006  0.700  0.751
                  M+2       0.019±0.001  0.021  0.086±0.003  0.096±0.004  0.095  0.039
                  M+3       -            -      0.013±0.001  0.011±0.001  0.015  0.016
p5p (M=229)       M+0       0.941±0.002  0.928  0.745±0.009  0.757±0.013  0.772  0.868
                  M+1       0.051±0.001  0.054  0.219±0.007  0.201±0.012  0.211  0.115
                  M+2       -            -      0.034±0.003  0.037±0.004  0.014  0.015
e4p (M=199)       M+0       0.948±0.003  0.941  0.866±0.004  0.867±0.004  0.863  0.860
                  M+1       0.045±0.002  0.044  0.122±0.004  0.123±0.004  0.123  0.126
                  M+2       -            -      0.013±0.002  0.011±0.002  0.013  0.013
s7p a (M=291)     M+0       0.929±0.004  0.904  0.858±0.002  0.860±0.003  0.854  0.855
                  M+1       0.067±0.003  0.073  0.126±0.002  0.125±0.002  0.120  0.124
                  M+2       -            -      0.014±0.001  0.013±0.001  0.022  0.018

Note: a Expected mass is 289. Identity of s7p was not verified for lack of a standard.

TABLE 4: Number of independent data points (a), number of free fluxes (b), minimal sums of squared residuals (SSres) (c), the ratio c/(a-b), the variable T of Eq.3 and the probability P that a value larger than T is caused by random errors for the upper network (Fig.1-I+II). Shown outcomes are of the fits based on hypotheses A and B and of the data sets measured using MS, NMR and MS+NMR

data set  quantity                                                          network I+II, A               network I+II, B
NMR       a) indep. data points (total data points - number of
             data sets that are normalized to one)                          91 (124-33)                   91 (124-33)
          b) parameters (free fluxes + serine synthesis ratio)              16 (15+1)                     10 (9+1)
          c) min. weighted SSres                                            7783.4                        7975.5
          c/(a-b)                                                           103.8                         98.5
          TNMR (Eq.3)                                                       0.30
          P(T>TNMR | T∼F(6,75))                                             0.93
MS        a) indep. data points                                             34                            34
          b) parameters (free fluxes + influx v24)                          16 (15+1)                     10 (9+1)
          c) min. weighted SSres                                            255.5                         335.6
          c/(a-b)                                                           14.2                          14.0
          TMS (Eq.3)                                                        0.94
          P(T>TMS | T∼F(6,18))                                              0.49
NMR+MS    a) indep. data points                                             125 (34+91)                   125 (34+91)
          b) parameters (free fluxes + serine synthesis ratio
             + influx v24)                                                  17 (15+1+1)                   11 (9+1+1)
          c) min. weighted SSres                                            9000.6                        9309.6
                                                                            (NMR: 7930.8 + MS: 1069.7)    (NMR: 8239.3 + MS: 1070.3)
          c/(a-b)                                                           83.3                          81.7
          TNMR+MS (Eq.3)                                                    0.62
          P(T>TNMR+MS | T∼F(6,108))                                         0.72
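The goodness-of-fit ratio c/(a-b) used in Table 4 can be recomputed from the tabulated values. The sketch below is illustrative only: the function name `reduced_ssres` and the dictionary layout are assumptions, and the numbers are copied from Table 4 (with b=10 for the MS fit under hypothesis B).

```python
def reduced_ssres(n_data, n_params, ss_res):
    """Covariance-weighted SSres divided by the degrees of freedom (a - b)."""
    return ss_res / (n_data - n_params)


# (data set, hypothesis): (a, b, c) as listed in Table 4
fits = {
    ("NMR", "A"): (91, 16, 7783.4),
    ("NMR", "B"): (91, 10, 7975.5),
    ("MS", "A"): (34, 16, 255.5),
    ("MS", "B"): (34, 10, 335.6),
    ("NMR+MS", "A"): (125, 17, 9000.6),
    ("NMR+MS", "B"): (125, 11, 9309.6),
}

ratios = {key: reduced_ssres(*abc) for key, abc in fits.items()}

for (data_set, hypothesis), ratio in ratios.items():
    # A statistically acceptable fit would give a ratio close to 1.
    print(f"{data_set:7s} hypothesis {hypothesis}: c/(a-b) = {ratio:.1f}")
```

The printed ratios match the c/(a-b) entries of Table 4 to one decimal place.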

Another observation in Table 4 is that, although the minimized sum of squared errors increases due to the lower number of free flux parameters under hypothesis B compared to hypothesis A, the minimized error divided by the number of degrees of freedom (c/(a-b)) decreases for all three data sets. In order to test whether the simplified model of hypothesis B describes the data as well as the complete model of hypothesis A, we calculate the F-distributed variable T:

\[
T = \frac{\left( SS_{res,B} - SS_{res,A} \right) / \left( p_{A} - p_{B} \right)}{SS_{res,A} / \left( n_{A} - p_{A} \right)},
\qquad
T \sim F\left( p_{A} - p_{B},\; n_{A} - p_{A} \right)
\tag{3}
\]

Here the subscripts A and B refer to hypothesis A and hypothesis B.

where SSres is the covariance-weighted minimized sum of squared residuals, n is the number of independent data points and p is the number of model parameters. Table 4 shows both the values of T for the three different data sets and the probabilities that values of T larger than the observed ones are caused by random errors. The large probabilities clearly show that the simplified model of hypothesis B does not describe the measurements significantly worse than the model of hypothesis A. In other words, the presence of the f6p aldolase/dha kinase bypass and the reversibility of the transketolase and transaldolase reactions are not proven.

For each of the six minimized covariance-weighted sums of squared residuals (c in Table 4), the statistically expected value equals the number of independent data points minus the number of model parameters (a-b), so that the ratio c/(a-b) is expected to be close to 1. However, all the ratios are considerably larger than 1, leading to a rejection of all fits on statistical grounds. Apparently the model in Fig.1-I+II is not fully correct or complete. Study of the fitted measurements does not reveal any clear outliers that explain the rejection.

The NMR data that were fitted separately or simultaneously with the MS data using the upper network and hypothesis B are shown in Table 2. The mean absolute deviation between the measured and simulated relative intensities in this table is 0.028 for the separately fitted NMR data and 0.026 for the simultaneously fitted NMR and MS data. For the separately fitted NMR data the largest absolute deviation is observed for the fine structure ‘ddL’ of the his-δ multiplet, but this fine structure largely overlaps with the fine structure ‘d’, which makes it hard to resolve the two separate fine structures. The sum of the relative intensities of both fine structures is fitted considerably better than the separate ones.
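As a worked illustration of Eq.3, the following sketch recomputes T for the MS data set and for the combined data set from the values in Table 4. The function name `f_statistic` and its argument names are hypothetical. The degrees of freedom are taken as the difference in parameter count between hypotheses A and B (6) and n - p of hypothesis A, in line with the F(6,18) and F(6,108) distributions quoted in Table 4.

```python
def f_statistic(ss_full, ss_reduced, n_data, p_full, p_reduced):
    """F-distributed test variable comparing a reduced model with a full model.

    ss_full, ss_reduced: minimized weighted SSres of the full (hypothesis A)
    and reduced (hypothesis B) model; p_full, p_reduced: parameter counts.
    """
    numerator = (ss_reduced - ss_full) / (p_full - p_reduced)
    denominator = ss_full / (n_data - p_full)
    return numerator / denominator


# Values from Table 4, upper network (Fig.1-I+II)
t_ms = f_statistic(ss_full=255.5, ss_reduced=335.6,
                   n_data=34, p_full=16, p_reduced=10)
t_combined = f_statistic(ss_full=9000.6, ss_reduced=9309.6,
                         n_data=125, p_full=17, p_reduced=11)

print(f"T (MS)     = {t_ms:.2f}")        # 0.94, as in Table 4
print(f"T (NMR+MS) = {t_combined:.2f}")  # 0.62, as in Table 4
```

The tail probability P(T > T_observed) could then be obtained from any F-distribution routine, for example `scipy.stats.f.sf(t_ms, 6, 18)` if SciPy is available; that step is left out here to keep the sketch free of external dependencies.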
The MS data that were fitted separately or simultaneously with the NMR data using the upper network and hypothesis B are shown in Table 3. The mean absolute deviation between the measured and simulated mass fractions in this table is 0.013 for the separately fitted MS data and 0.044 for the simultaneously fitted NMR and MS data. Only a few large deviations are observed when the MS data are fitted separately, whereas simultaneous fitting with the NMR data leads to clearly misfitted mass fractions for all metabolites except g1p, e4p and s7p. Amongst the worst separately fitted MS data are those of bpg and pep, where the model predicts M+0 fractions that are too large. Combined with the fact that the NMR data of tyr-α and phe-α, which both reflect the 13C-labeling of pep, are not fitted very well either, this could point to an error in the simulated 13C-labeling of the tp-pool in Fig.1-I. A tentative explanation could be the presence in the cells of pep-carboxykinase, which converts cytosolic oaa into pep and CO2. This enzyme was not considered in the present study because it was found to have a very low activity in S. cerevisiae at low ethanol to glucose ratios in the feed (De Jong-Gubbels et al., 1995). Note that its inclusion in the model would require the integration of the upper and lower networks, as it catalyzes a reaction that leads from the intermediate oaa of the lower network back to pep in the upper network.

TABLE 5: Fitted metabolic fluxes for the upper network (Figure 1-I+II), normalized to a glucose uptake flux of 100, and the fitted serine synthesis ratio

Columns give the fits for network I+II under hypothesis A or B, for the NMR data, the MS data, or the NMR and MS data fitted simultaneously.

flux in Fig.1 a  net flux direction       A, NMR   B, NMR   A, MS    B, MS    A, NMR+MS  B, NMR+MS
v1                                        100 b    100      100      100      100        100
v2, net          g6p → f6p                10.3     -18.8    29.6     26.9     4.4        4.9
v2, exch                                  118.5    73.3     >>1000   >>1000   120.4      132.6
v3                                        40.4     40.1     56.2     55.3     38.4       48.0
v4, net          f16p → 2 tp              40.4     40.1     56.2     55.3     38.4       48.0
v4, exch                                  138.6    126.0    117.8    81.3     1.9        5.4
v5, net          f16p → dha + tp          9.3      0.0      0.0      0.0      9.4        0.0
v5, exch                                  0.0      0.0      3.6      0.0      5.8        0.0
v6                                        >>1000   0.0      >>1000   0.0      >>1000     0.0
v7                                        9.3      0.0      0.0      0.0      9.4        0.0
v8                                        105.3    95.6     111.8    110.8    103.4      103.5
v9                                        63.7     92.8     44.4     47.1     69.6       69.1
v10, net         2 p5p → s7p + tp         20.0     32.7     14.5     27.9     25.9       23.0
v10, exch                                 0.0      0.0      0.0      0.0      9.5        0.0
v11, net         s7p + tp → f6p + e4p     21.0     30.6     14.5     15.4     22.9       22.7
v11, exch                                 0.0      0.0      0.0      0.0      0.0        0.0
v12, net         p5p + e4p → f6p + tp     19.5     26.2     12.1     0.6      17.5       20.1
v12, exch                                 24.6     0.0      0.0      0.0      0.0        0.0
v13, net         s7p + e4p → f6p + p5p    -1.0     2.0      0.0      12.4     3.0        0.2
v13, exch                                 1.4      0.0      0.0      0.0      0.0        0.0
v14                                       0.0      35.8     39.0     113.8    16.0       27.0
v15                                       58.1     44.2     >>1000   >>1000   89.3       63.6
v16                                       10.3     0.7      0.0      69.4     0.0        2.3
v17                                       79.4     72.0     22.8     0.0      69.0       67.9
v18                                       42.3     >>1000   >>1000   >>1000   0.0        96.1
v19                                       63.7     92.8     44.4     47.1     69.6       69.1
v20                                       12.7     12.7     12.7     12.7     12.7       12.7
v21                                       3.3      3.3      3.3      3.3      3.3        3.3
v22                                       2.4      2.4      2.4      2.4      2.4        2.4
v23                                       26.0     26.0     26.0     26.0     26.0       26.0
v24                                       -        -        5.5      5.4      6.5        6.5
serine synthesis ratio
  tp → ser                                0.65     0.64     -        -        0.65       0.63
  gly → ser                               0.35     0.36     -        -        0.35       0.37

Notes:
a Bidirectional fluxes in Figure 1 are presented as net and exchange (exch) fluxes in this column. Net fluxes are defined in the direction indicated in the second column. Positive values correspond to net fluxes in the defined direction, negative values correspond to net fluxes in the opposite direction.
b Gray cells refer to fluxes that are fixed due to model assumptions.

The fluxes that were fitted for the upper network, based on hypotheses A and B and on either the separately fitted NMR data, the separately fitted MS data or the simultaneously fitted combined data set, are given in Table 5. The table shows that the fluxes fitted using hypothesis A and hypothesis B deviate considerably when only the NMR data are fitted. For the separately fitted MS data and for the combined data set, the results of the two hypotheses deviate much less. The deviations between the fits based on the two hypotheses are mainly caused by the absence or presence of the f6p aldolase/dha kinase bypass. The assumed reversibility of the transketolase and transaldolase reactions (v10-v18 in Fig.1) has hardly any impact, since even if their exchange fluxes are allowed to be non-zero (hypothesis A), they are often fitted to be zero.

The fitted fraction of g6p that is converted via the PPP (v9) clearly depends on the data set. Dependence of this flux on the hypothesis is only observed for the NMR data. The flux distribution around the g6p-node that is fitted for the MS data (approximately 46% entering the PPP (v9), 28% converted via the phosphoglucose isomerase (v2,net) and 26% consumed for biomass synthesis (v23)) is very close to the distribution found by Gombert et al. (2001). For their chemostat cultivation with the same dilution rate (but no co-fed ethanol) they found 44%, 34% and 22% for the same fluxes. As the carbon balance has to close for each set of fitted fluxes and the glucose uptake and anabolic fluxes are identical in each case, the amount of tp going towards the lower network (v8) is directly related to the CO2 that is produced in the oxidative branch of the PPP (v19).
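The flux split around the g6p-node quoted above can be recomputed from the MS-fitted fluxes in Table 5. The sketch below is illustrative: the dictionary layout and the function name `node_split` are assumptions, and averaging the results of hypotheses A and B is only one possible way to arrive at the approximate percentages given in the text.

```python
def node_split(outfluxes):
    """Express each flux leaving a node as a percentage of the total outflux."""
    total = sum(outfluxes.values())
    return {name: 100.0 * value / total for name, value in outfluxes.items()}


# MS-fitted fluxes leaving the g6p pool (Table 5), glucose uptake = 100
ms_fits = {
    "A": {"v9 (PPP)": 44.4, "v2,net (pgi)": 29.6, "v23 (biomass)": 26.0},
    "B": {"v9 (PPP)": 47.1, "v2,net (pgi)": 26.9, "v23 (biomass)": 26.0},
}

splits = {hyp: node_split(fluxes) for hyp, fluxes in ms_fits.items()}

# Average of the two hypotheses, for comparison with the rounded values in the text
average = {
    name: (splits["A"][name] + splits["B"][name]) / 2.0
    for name in splits["A"]
}

for name, percentage in average.items():
    print(f"{name:15s} {percentage:5.1f} %")
```

In both fits the three fluxes add up to the glucose uptake flux of 100, so the percentages equal the normalized fluxes themselves.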
With respect to the fitted fluxes for the transketolase and transaldolase reactions (v10-v18), it needs to be stressed that these fluxes have been shown to be ill determined in earlier 13C-labeling studies based on the same (extended) model for the non-oxidative PPP (Chapter 8 of this thesis).

As stated in the Theory section, g1p has an influx that does not stem from another intracellular metabolite in the model (v24). The reason to include this influx in the model of Fig.1-I is the observation that the mass fractions of g1p (see Table 3) are clearly different from those of g6p. The metabolite g1p has a much larger M+0 fraction than g6p and lower M+1 and M+2 fractions, indicating that this metabolite is 13C-labeled to a lower extent than g6p. A tentative explanation is that the g1p formed from g6p by phosphoglucomutase is diluted by g1p molecules stemming from turnover of the large pool of unlabeled storage sugars (e.g. trehalose and glycogen) that are still present in the cell after 40 and 60 minutes of 13C-labeled medium supply. When fitted to the measured MS data, this turnover rate (normalized to the glucose uptake rate of 100) is estimated to be 5.5 (see Table 5). Note that the fact that g1p and g6p have different mass fractions demonstrates that the phosphoglucomutase reaction that interconverts these two metabolite pools does not catalyze a fast exchange reaction.

Finally, one degree of freedom in the network I+II that is not visible in Fig.1 is the ‘serine synthesis ratio’ that can be fitted to the NMR data. When comparing the relative intensities of the multiplet of ser-α in Table 2 to those of the multiplets of phe-α and tyr-α, clear differences are observed (see also Assumption III in Chapter 10). Both the three-carbon fragment of ser-α and those of phe-α and tyr-α directly stem from an intact molecule of pep (part of the tp-pool in Fig.1-I).
The different 13C-labeling pattern of ser-α indicates that part of the serine must have an alternative synthesis route. This is indeed the case, as serine is the precursor of gly via the serine hydroxymethyltransferase reaction, which may run in both directions. Formation of ser from gly (the carbons of which stem from the first and second positions of pep) and a separate one-carbon moiety that stems from the third position of pep can explain the observed lower relative intensity of the double doublet and the higher relative intensity of the doublet caused by 13C-labeling of the carboxy and α-carbon of ser. When fitted to the measured NMR data, the fraction of ser formed from gly is estimated to be 0.35.

Flux analysis lower network

Table 6 (columns ‘measured’) shows the time-corrected relative intensities in the NMR spectra of 32 different carbon positions in amino acids that were measured in the biomass grown in experiment I and harvested 12 and 35 hours after the switch to 13C-labeled medium. Table 7 (columns ‘measured’) shows the mass fractions of 6 metabolic intermediates that were measured in the extracts of the biomass grown in experiment II and harvested 0, 40 and 60 minutes after the switch to 13C-labeled medium. The mass fractions smaller than 0.01 are not shown. In no case is the sum of the fractions below 0.97.

TABLE 6: Time-corrected measured and simulated relative intensities in the lower part of the central carbon metabolism (see Fig. 1-III). Values given are of singlets (s), doublets (d), triplets (t), double doublets (dd) and double double doublets (ddd) of various carbon positions in amino acids, and storage sugars present in biomass lysate. (‘d*’ indicates the doublet with the larger one-bond scalar coupling constant, ‘dL’ indicates the doublet caused by the long range scalar coupling, ‘ddL’ and ‘dd*L’ indicate the double doublets caused by the long range scalar coupling and originating from splitting of ‘d’ and ‘d*’, ‘dtL’ denotes the double triplet caused by the long range scalar coupling)

simulated, network III, simultaneous fit NMR and MS data


simulated, network III, separate fit NMR data

ala-α = s d d* dd ala-β = s d asp-α = s d d* dd asp-β = s d* d

time=35 hrs., measured, extrapolated to infinite time

measured compound, carbon position and fine structure \/

time=12 hrs., measured, extrapolated to infinite time

sample >

0.125 0.109 0.051 0.715 0.151 0.849 0.277 0.127 0.243 0.353 0.260 0.250 0.367

0.154 0.123 0.047 0.676 0.188 0.812 0.267 0.114 0.249 0.370 0.281 0.240 0.363

0.152 0.143 0.034 0.671 0.186 0.814 0.278 0.121 0.248 0.353 0.285 0.242 0.381

0.160 0.144 0.059 0.637 0.219 0.781 0.288 0.108 0.251 0.353 0.279 0.253 0.390

dd glx-α = s

0.124 0.335

d

0.199

d*

0.343

dd

0.124

glx-β = s d t glx-γ = s d* d dd ile-α = s d d* dd ile-γ2 = s d ile-δ = s d lys-α = s d d* dd lys-β = s d t lys-γ = s d t lys-δ = s d t lys-ε = s d leu-α = s d d* dd leu-β = s d t leu-δ1 = s d leu-δ2 = s d met-α = s

0.632 0.356 0.012 0.179 0.729 0.020 0.073 0.360 0.050 0.528 0.063 0.188 0.812 0.626 0.374

d

0.640 0.324 0.036 0.648 0.339 0.014 0.178 0.755 0.067 0.230 0.770 0.219 0.033 0.669 0.079 0.830 0.170 0.000 0.218 0.782 0.894 0.106

0.117 0.334 0.338 0.196 0.184 0.336 0.355 0.134 0.123 0.623 0.351 0.026 0.194 0.719 0.019 0.069 0.338 0.050 0.542 0.070 0.186 0.814 0.624 0.376 0.308 0.041 0.584 0.067 0.626 0.321 0.053 0.635 0.337 0.028 0.199 0.743 0.058 0.221 0.780 0.215 0.026 0.684 0.074 0.829 0.171 0.000 0.214 0.786 0.889 0.111 0.241 0.234 0.148 0.108

0.093 0.339

0.078 0.356

0.219

0.202

0.344

0.368

0.098

0.074

0.628 0.346 0.025 0.197 0.712 0.020 0.071 0.356 0.043 0.536 0.065 0.186 0.814 0.653 0.347 0.310 0.031 0.599 0.060 0.657 0.331 0.012 0.628 0.346 0.025 0.197 0.732 0.071 0.217 0.783 0.193 0.023 0.699 0.085 0.821 0.171 0.009 0.186 0.814 0.892 0.108 0.278

0.654 0.322 0.024 0.205 0.704 0.021 0.071 0.354 0.042 0.540 0.064 0.219 0.781 0.652 0.348 0.335 0.034 0.574 0.058 0.702 0.290 0.009 0.654 0.322 0.024 0.205 0.724 0.071 0.226 0.774 0.196 0.023 0.698 0.083 0.798 0.191 0.011 0.219 0.781 0.893 0.107 0.288

0.121

0.108


d* dd pro-α = s d d* dd pro-β = s d t pro-γ = s d t pro-δ = s d arg-γ = s d t arg-δ = s d thr-α = s

0.310 0.211 0.344 0.135 0.641 0.359 0.000 0.186 0.736 0.078 0.205 0.796 0.210 0.748 0.042 0.219 0.781 0.285

d

0.109

d*

0.254

dd

0.352

thr-β = s d t thr-γ = s d val-α = s d d* dd val-γ1 = s d val-γ2 = s d

0.267 0.642 0.091 0.626 0.374 0.209 0.035 0.676 0.080 0.183 0.817 0.856 0.144

0.234 0.242 0.378 0.417 0.335 0.199 0.321 0.145 0.640 0.360 0.000 0.179 0.712 0.109 0.207 0.793 0.197 0.751 0.052 0.221 0.779 0.276 0.272 0.107 0.111 0.255 0.258 0.362 0.359 0.294 0.624 0.083 0.622 0.379 0.216 0.030 0.671 0.084 0.190 0.810 0.880 0.120

0.248

0.251

0.353

0.353

0.339 0.219 0.344 0.098 0.628 0.346 0.025 0.197 0.732 0.071 0.217 0.783 0.197 0.732 0.071 0.217 0.783 0.278

0.356 0.202 0.368 0.074 0.654 0.322 0.024 0.205 0.724 0.071 0.226 0.774 0.205 0.724 0.071 0.226 0.774 0.288

0.121

0.108

0.248

0.251

0.353

0.353

0.285 0.623 0.093 0.653 0.347 0.263 0.032 0.629 0.076 0.186 0.814 0.892 0.108

0.279 0.643 0.078 0.652 0.348 0.271 0.032 0.622 0.074 0.219 0.781 0.893 0.107

Similarly to the upper network discussed above, the values of the mass fractions that were simulated by assuming only naturally labeled medium substrate (Table 7, column ‘time=0, simulated’) agree reasonably well with the mass fractions measured in the sample taken at t=0: only 5 (the fractions of gox and mal) out of the 15 fractions significantly differ based on a 95% confidence level. Taking into account the small size of most standard deviations of the mass fractions, the observed correspondence between the measured and theoretically expected values at t=0 for both the upper and lower networks demonstrates the accuracy of the MS measurements.
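The comparison of measured and simulated mass fractions at t=0 can be sketched as follows. The thesis does not state which test was used, so the criterion below (a deviation larger than 1.96 standard deviations, i.e. a two-sided test at the 95% level assuming normally distributed errors) is an assumption, as are the names `differs_significantly` and `t0_fractions`. The values are the time=0 entries of Table 7.

```python
Z_95 = 1.96  # assumed two-sided 95% criterion for normally distributed errors


def differs_significantly(measured, std_dev, simulated, z=Z_95):
    """True if the simulated value lies outside measured +/- z * std_dev."""
    return abs(measured - simulated) > z * std_dev


# (compound, fraction, measured, standard deviation, simulated), Table 7, t=0
t0_fractions = [
    ("pyr", "M+0", 0.973, 0.024, 0.961), ("pyr", "M+1", 0.027, 0.024, 0.033),
    ("cit", "M+0", 0.918, 0.003, 0.921), ("cit", "M+1", 0.065, 0.004, 0.063),
    ("cit", "M+2", 0.016, 0.001, 0.015),
    ("gox", "M+0", 0.940, 0.009, 0.971), ("gox", "M+1", 0.060, 0.009, 0.022),
    ("suc", "M+0", 0.951, 0.005, 0.948), ("suc", "M+1", 0.049, 0.005, 0.043),
    ("fum", "M+0", 0.937, 0.016, 0.948), ("fum", "M+1", 0.046, 0.003, 0.043),
    ("fum", "M+2", 0.017, 0.018, 0.009),
    ("mal", "M+0", 0.935, 0.003, 0.946), ("mal", "M+1", 0.048, 0.002, 0.043),
    ("mal", "M+2", 0.016, 0.002, 0.010),
]

significant = [
    (compound, fraction)
    for compound, fraction, measured, std_dev, simulated in t0_fractions
    if differs_significantly(measured, std_dev, simulated)
]

print(f"{len(significant)} of {len(t0_fractions)} fractions differ significantly")
print("compounds involved:", sorted({compound for compound, _ in significant}))
```

Under this assumed criterion the sketch flags 5 of the 15 fractions, all belonging to gox and mal, in line with the statement above.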


TABLE 7: Measured and simulated mass fractions and standard deviations (determined from 5 repeated experiments) of various intermediates of the lower part of the central carbon metabolism (see Fig. 1-III)

Columns:
(1) time=0 min., measured
(2) time=0, simulated, network III
(3) time=40 min., measured
(4) time=40, simulated, network III, separate fit MS data
(5) time=40, simulated, network III, simultaneous fit NMR and MS data
(6) time=60 min., measured
(7) time=60, simulated, network III, separate fit MS data
(8) time=60, simulated, network III, simultaneous fit NMR and MS data
A dash (-) marks a fraction for which no value is given.

compound (M)   fraction  (1)          (2)    (3)          (4)    (5)    (6)          (7)    (8)
pyr (M=87)     M+0       0.973±0.024  0.961  0.828±0.034  0.821  0.815  0.787±0.052  0.841  0.845
               M+1       0.027±0.024  0.033  0.142±0.020  0.160  0.168  0.213±0.052  0.142  0.141
cit (M=191)    M+0       0.918±0.003  0.921  0.746±0.009  0.724  0.615  0.681±0.004  0.677  0.645
               M+1       0.065±0.004  0.063  0.126±0.007  0.144  0.186  0.147±0.002  0.148  0.159
               M+2       0.016±0.001  0.015  0.079±0.004  0.078  0.151  0.103±0.002  0.107  0.150
               M+3       -            -      0.032±0.002  0.040  0.035  0.044±0.002  0.050  0.033
               M+4       -            -      0.012±0.000  0.012  0.012  0.017±0.001  0.015  0.011
gox (M=73)     M+0       0.940±0.009  0.971  0.912±0.007  0.828  0.799  0.883±0.040  0.791  0.800
               M+1       0.060±0.009  0.022  0.089±0.007  0.128  0.145  0.117±0.040  0.147  0.143
suc (M=117)    M+0       0.951±0.005  0.948  0.570±0.040  0.636  0.567  0.481±0.025  0.610  0.588
               M+1       0.049±0.005  0.043  0.217±0.006  0.186  0.176  0.259±0.021  0.203  0.157
               M+2       -            -      0.154±0.015  0.142  0.223  0.183±0.005  0.147  0.220
               M+3       -            -      0.038±0.001  0.032  0.028  0.053±0.007  0.034  0.027
               M+4       -            -      0.021±0.015  0.004  0.008  0.024±0.020  0.006  0.008
fum (M=115)    M+0       0.937±0.016  0.948  0.507±0.017  0.509  0.563  0.461±0.009  0.496  0.583
               M+1       0.046±0.003  0.043  0.287±0.011  0.266  0.172  0.304±0.016  0.273  0.154
               M+2       0.017±0.018  0.009  0.162±0.023  0.188  0.228  0.178±0.019  0.187  0.225
               M+3       -            -      0.044±0.003  0.029  0.029  0.057±0.005  0.033  0.028
mal (M=133)    M+0       0.935±0.003  0.946  0.514±0.009  0.508  0.562  0.481±0.006  0.495  0.582
               M+1       0.048±0.002  0.043  0.283±0.009  0.265  0.172  0.292±0.008  0.273  0.154
               M+2       0.016±0.002  0.010  0.150±0.005  0.189  0.228  0.161±0.004  0.188  0.226
               M+3       -            -      0.043±0.002  0.030  0.030  0.053±0.002  0.033  0.029
               M+4       -            -      0.010±0.001  0.008  0.009  0.013±0.001  0.011  0.009

In contrast to the intermediates of the upper network, those of the lower network are not in isotopic steady state after 40 minutes. No less than 15 out of the 23 fractions measured at t=40 and t=60 differ significantly at a 95% confidence level. For all six metabolite pools the M+0 fractions decrease and the fractions of the higher masses increase when going from t=0, via t=40, to t=60. This can be explained either by a large metabolite pool with a turnover rate (throughput divided by pool size) that is considerably smaller than anticipated or by an influx into one of the metabolite pools that originates from a biomass component that is not yet in isotopic steady state after 40 minutes. This latter explanation would be in conflict with the common assumption that biomass components are formed unidirectionally (i.e. no turnover of protein, lipids, etc. occurs).

In the model, isotopic non-steady state of a metabolite pool can be accounted for by adding an influx of unlabeled compound into the pool and an equally large efflux from the same pool (see Appendix A). Of all the possible non-steady state pools that were tested, the best fit of the MS data was found by assuming that the mitochondrial pyr and oaa pools are not in isotopic steady state. This can be intuitively understood when inspecting the MS data in Table 7:

• without an influx of unlabeled pyr, the M+0 fraction of pyr (0.828 for t=40, 0.787 for t=60) cannot be explained by formation of pyr from pep (M+0=0.630 for t=40, 0.631 for t=60) and can only be explained by formation of pyr from mal if all the 13C-labeling of mal were concentrated on the carbon that is split off as CO2, which is highly unlikely,
• without an influx of unlabeled oaa, the M+0 fraction of cit (0.746 for t=40, 0.681 for t=60) cannot be explained by formation of cit from oaa and acoa. Although the mass fractions of these two compounds were not measured, oaa can only be formed from mal (M+0=0.514 for t=40, 0.481 for t=60) or from the carboxylation of pyr (0.828 for t=40, 0.787 for t=60) with CO2 (M+0 estimated to be 0.8), leading to oaa with M+0=0.828*0.8=0.662 for t=40 and 0.787*0.8=0.630 for t=60. Combined with acoa, which will have an M+0 fraction smaller than 1, the formed cit will have an M+0 fraction that is considerably lower than the measured one.

For the reasons above the unlabeled influxes v51 and v52 were included in the model. The MS data measured in the samples taken at t=40 and t=60 were fitted simultaneously by a single set of fluxes for the lower network (Fig.1-III), except for v51 and v52, which were specific for each data set.
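The argument for the unlabeled oaa influx rests on a simple product rule: when a molecule is assembled from independently labeled precursors, its M+0 fraction is the product of the M+0 fractions of those precursors. The sketch below reproduces the numbers from the text. The function name `m0_of_product` is hypothetical, and the independence of precursor labeling is an assumption of this illustration.

```python
import math


def m0_of_product(*precursor_m0):
    """M+0 fraction of a product assembled from independently labeled precursors."""
    return math.prod(precursor_m0)


M0_CO2 = 0.8  # estimated M+0 fraction of CO2 (value used in the text)
pyr_m0 = {40: 0.828, 60: 0.787}       # measured, Table 7
cit_m0_measured = {40: 0.746, 60: 0.681}

for t in (40, 60):
    oaa_m0 = m0_of_product(pyr_m0[t], M0_CO2)
    # acoa has an M+0 fraction of at most 1, so oaa_m0 is an upper bound
    # for the M+0 fraction of cit formed from this oaa.
    cit_upper_bound = m0_of_product(oaa_m0, 1.0)
    print(f"t={t}: oaa M+0 = {oaa_m0:.3f}, "
          f"cit M+0 <= {cit_upper_bound:.3f}, measured {cit_m0_measured[t]:.3f}")
```

At both time points the measured M+0 fraction of cit exceeds the upper bound, which is the inconsistency that the unlabeled influx into the oaa pool is meant to resolve.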
In order to check whether the two different measurement techniques that were used to obtain 13C-labeling information also result in different flux fits for the lower network, the relative intensities in the NMR spectra and the MS data were fitted separately. These fits were performed using the cumomer balances of the lower network. Next, the two sets of 13C-labeling data were fitted simultaneously to investigate whether the metabolic model could be brought into accordance with two totally independent data sets.

The parameters and results of the three iterative minimizations of the covariance-weighted sums of squared residuals are summarized in Table 8. The table shows that the separate fit of the MS data is only slightly better (c/(a-b)=28.4) than the separate fit of the NMR data (c/(a-b)=32.9). The fit of the MS data is worse than for the upper network (c/(a-b)=14.0 for hypothesis B), whereas the fit of the NMR data is clearly better than for the upper network (c/(a-b)=98.5 for hypothesis B). The minimized sum of squared errors of the simultaneous fit of the MS and NMR data is larger than those of the separate data sets. This shows that, similarly to the upper network, both the measured NMR data and the MS data are less well approximated when fitted at the same time. Apparently, the single model is again not consistent with the two data sets.


TABLE 8: Number of independent data points (a), number of free fluxes (b), minimal sums of squared residuals (SSres) (c) and the ratio c/(a-b) for the lower network (Fig.1-III). Shown outcomes are of the fits based on the data sets measured using MS, NMR and MS+NMR

data set  quantity                                                        network III
NMR       a) indep. data points (total data points - number of
             data sets that are normalized to one)                        133 (198-65)
          b) parameters (free fluxes)                                     11
          c) min. weighted SSres                                          4016.0
          c/(a-b)                                                         32.9
MS        a) indep. data points                                           46
          b) parameters (free fluxes + cytosolic fractions
             + influxes v51 and v52)                                      19 (11+4+4)
          c) min. weighted SSres                                          766.6
          c/(a-b)                                                         28.4
NMR+MS    a) indep. data points                                           179 (133+46)
          b) parameters (free fluxes + cytosolic fractions
             + influxes v51 and v52)                                      19 (11+4+4)
          c) min. weighted SSres                                          10053.6 (NMR: 6146.2 + MS: 3907.5)
          c/(a-b)                                                         62.8

As was also observed for the upper network, the three ratios c/(a-b) in Table 8 are considerably larger than the expected value of 1, leading to a rejection of all fits on statistical grounds. Apparently the model in Fig.1-III is not fully correct or complete, or the measurement error is underestimated.

The NMR data that were fitted separately or simultaneously with the MS data using the lower network are shown in Table 6. The mean absolute deviation between the measured and simulated relative intensities in this table is 0.014 for the separately fitted NMR data and 0.020 for the simultaneously fitted NMR and MS data. For the separately fitted NMR data the largest absolute deviation is 0.064 and is observed for the fine structure ‘dd’ of the met-α multiplet.

The MS data that were fitted separately or simultaneously with the NMR data using the lower network are shown in Table 7. The mean absolute deviation between the measured and simulated mass fractions in this table is 0.026 for the separately fitted MS data and 0.053 for the simultaneously fitted NMR and MS data. When the MS data are fitted separately, the largest deviations between the measured and simulated mass fractions are observed for the M+0 fractions of gox and suc, both for the samples taken at t=40 and t=60.

The fluxes that were fitted using the lower network for either the separately fitted NMR data, the separately fitted MS data or the simultaneously fitted combined data set are given in Table 9. The table shows that the fluxes deviate considerably depending on the data set that is fitted. The fluxes that are found when separately fitting the NMR data indicate that 57% of the cytosolic pyr is imported into the mitochondrion (v29), 25% is carboxylated to form oaa (v25) and the remaining 17% is decarboxylated to form acoa (v26). When fitting the MS data, a higher fraction (46%) is carboxylated to form oaa (v25). This higher fraction is needed to supply the cit synthase reaction (v39), which is found to be active when the MS data are fitted (not for the NMR data), and for the surprisingly high rate of formation of cytosolic mal (v42). This mal is imported into the mitochondrion (v42) to allow the very high rate of the malic enzyme reaction (v38). The MS and NMR data sets further lead to opposite flux estimations with respect to the transport of cytosolic acoa and oaa into the mitochondrion, and the reversibility of the cytosolic and mitochondrial interconversion of oaa and mal (v37 and v42).

Although De Jong-Gubbels et al. (1995) concluded from their study of enzyme activities that the gox shunt was not active in the strain T23D of S. cerevisiae under the present conditions (90% (w/w) glc, 10% (w/w) eth in the medium), the observed intracellular gox and the elevated M+2 fraction and decreased M+0 fraction of suc with respect to those of cit strongly suggest that suc is partly formed from the doubly 13C-labeled eth in the medium via the gox shunt in the presently studied strain CEN.PK-113.7D. The gox shunt (v40) is estimated to be more active when fitting the MS data than when fitting the NMR data.

The net amount of CO2 that is produced in the metabolic network of Fig.1 (v43) is, by the definition of the system, fixed to the amount of carbon that is taken up from the medium in the form of glc and eth minus the amount of carbon consumed in the biosynthetic reactions. The model yields an estimate of 262.9 (mole/100 mole of glucose), which corresponds reasonably well with the amount of CO2 that was measured to leave the reactor: around 280 (mole/100 mole of glucose) for both experiments (see Table 1).

The influxes v51 and v52 of the mitochondrial pyr and oaa pools, which were included to explain the observed isotopic non-steady state and which could be estimated using the MS data, show in particular a relatively large unlabeled influx of oaa.
The higher estimated value of v52 at t=40 than at t=60 is consistent with the expected gradual wash-out of the remaining unlabeled metabolite fraction: the modeled unlabeled influx should vanish at infinite time. The estimated unlabeled influxes of pyr that increase with time, however, do not agree with this.

TABLE 9: Fitted metabolic fluxes for the lower network (Figure 1-III), normalized to a glucose uptake flux of 100, and the fitted cytosolic fractions of four compartmented metabolite pools

                                            network III,   network III,   network III,
flux in                                     NMR data       MS data        NMR and MS
Fig.1 a        net flux direction           fitted         fitted         data fitted
v8                                          95.6 b         110.8          103.5
v19                                         92.8           47.1           69.1
v25                                         24.5           50.6           22.8
v26                                         15.9           6.0            10.1
v27                                         38.8 c         38.8           38.8
v28                                         30.3           0.0            0.0
v29, net       pyr (cyt) → pyr (mit)        55.2           54.3           70.6
v29, exch                                   >1000          0.0            0.0
v30, net       oaa (cyt) → oaa (mit)        7.6            0.0            -16.8
v30, exch                                   63.5           0.0            31.3
v31, net       cit (cyt) → cit (mit)        -3.5           2.3            10.1
v31, exch                                   133.2          42.5           0.0
v32                                         45.1           77.5           64.2
v33                                         72.3           74.4           61.2
v34                                         57.2           65.2           59.6
v35, net       suc → mal (mit)              60.7           76.0           68.6
v35, exch                                   396.4          >>1000         >1000
v36                                         >>1000         >>1000         >>1000
v37, net       mal (mit) → oaa (mit)        64.7           74.4           78.0
v37, exch                                   45.7           0.0            0.0
v38                                         7.6            41.0           11.4
v39                                         0.0            13.1           19.1
v40                                         3.5            10.8           9.0
v41                                         11.6           39.5           20.8
v42, net       mal (cyt) → oaa (cyt)        -8.1           -28.7          -11.8
v42, exch                                   0.0            29.1           3.0
v43                                         262.9          262.9          262.9
v44                                         0.0            0.0            0.0
v45                                         8.8            8.8            8.8
v46                                         20.9           20.9           20.9
v47                                         17.8           17.8           17.8
v48                                         0.0            0.0            0.0
v49                                         3.1            3.1            3.1
v50                                         11.6           11.6           11.6
v51 (40 min)                                -              163.4          43.3
v51 (60 min)                                -              253.9          87.5
v52 (40 min)                                -              >>1000         218.6
v52 (60 min)                                -              209.6          213.6
cytosolic fractions
pyr                                         -              0.19           0.19
cit                                         -              0.00           0.00
suc                                         -              0.00           0.00
fum/mal                                     -              0.73           0.06

notes:
a Bidirectional fluxes in Figure 1 are presented as net and exchange (exch) fluxes in this column. Net fluxes are defined in the direction indicated in the second column. Positive values correspond to net fluxes in the defined direction, negative values correspond to net fluxes in the opposite direction.
b Black cells in the table hold fluxes that are input from the upper network (see Table 5).
c Gray cells in the table hold fluxes that are fixed at their values due to model assumptions.
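The percentages quoted in the text for the fate of cytosolic pyr can be retraced from Table 9. The following is a minimal sketch, assuming that the three pyr-consuming fluxes v25, v26 and v29 (net) together account for the total flux v8; the function name `branch_fractions` is introduced here for illustration only and is not part of the original work.

```python
def branch_fractions(v25, v26, v29_net):
    """Express each pyr-consuming flux as a fraction of their sum."""
    total = v25 + v26 + v29_net
    return {"v25": v25 / total, "v26": v26 / total, "v29": v29_net / total}


# Fitted fluxes from Table 9 (mole/100 mole of glucose).
nmr = branch_fractions(v25=24.5, v26=15.9, v29_net=55.2)
ms = branch_fractions(v25=50.6, v26=6.0, v29_net=54.3)

for label, fractions in (("NMR", nmr), ("MS", ms)):
    # Print the split of cytosolic pyr in percent.
    print(label, {flux: round(100 * f, 1) for flux, f in fractions.items()})
```

Under this assumption the NMR fit gives roughly 58%, 26% and 17% for import, carboxylation and decarboxylation, close to the values quoted in the text, and the MS fit gives about 46% carboxylation.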

Using the MS data the cytosolic pool fractions of the pyr, cit, suc and fum/mal pools could be estimated as well. As shown in Table 9, cit and suc are estimated to be predominantly located in the mitochondrion, whereas pyr and fum/mal are estimated to have cytosolic mole fractions of 19% and 73%, respectively. It should be stressed that considering the large number of free parameters compared to the number of MS data for the lower network, the estimated cytosolic mole fractions should be interpreted with care.

The results presented here show that using different types of data for flux determination results in different flux estimations. Besides the measurement method that is employed, the topology of the metabolic model has an enormous impact on the flux pattern that emerges from the fits of both data sets. The presently proposed model was the one that was selected out of many alternatives in order to yield the smallest minimized sums of squared errors. Many more possible reactions and reaction reversibilities can be hypothesized which will yield flux distributions that differ from the ones presented here. Inclusion of more reactions in the model leads to larger confidence intervals or may even cause identifiability problems for the estimated fluxes. When the model becomes overly complex, a demanding, but almost indispensable method to increase the reliability of the model (and thereby of the estimated fluxes) is to knock out genes of enzymes that are estimated to have high activity and to perform a 13C-labeling study of the mutant to verify whether removal of the specific reaction from the model can explain the observed changes in the 13C-labeling data.

9.5 CONCLUSIONS
In this chapter we presented the first direct measurements of the 13C-labeling based mass isotopomer distribution of 16 metabolic intermediates of the glycolysis, PPP, glyoxylate shunt and TCA cycle in S. cerevisiae. These measurements were obtained by rapid sampling and quenching from a chemostat culture, followed by boiling ethanol extraction and LC-MS analysis of the metabolic intermediates. Comparison of the mass fractions measured for intermediates in cells grown on unlabeled medium with the theoretically expected mass fractions caused by natural isotopes demonstrates the accuracy of the method.

In order to validate an extensive model of the primary carbon metabolism of S. cerevisiae including a cytosolic and mitochondrial compartment, we tested its ability to fit two data sets that were obtained from two identical chemostat cultures by means of different measurement methods. Besides the LC-MS data, an extensive data set was measured using 2D [13C,1H] COSY.
Although both data sets could be fitted well by the model when judged by the absolute deviations between the measured and the simulated data, it must be concluded that the presented model is not correct or complete. This conclusion is based on the observations that: (1) the two different data sets yield different flux estimations, (2) the covariance weighted minimized sums of squared deviations between the measured and simulated data lead to rejection of the model on statistical grounds, (3) simultaneous fitting of the NMR and MS data leads to considerably worse fits than separately fitting the data sets.

It was shown that MS measurements of the 13C-labeling distribution of intermediates that are present both in the cytosol and in the mitochondrion allow the estimation of their molar fractions in both compartments. Furthermore, the MS measurements showed that the 13C-labeling distribution of the glycolytic and PPP intermediates is in isotopic steady state after 40 minutes, but that of the TCA cycle intermediates is not. It was shown that this isotopic non-steady state can be accounted for in the (non-dynamic) cumomer model that is used for the simulation of the 13C-labeling distributions.

The fit of the MS data yielded a distribution of 44:28:26 for the fluxes leaving the g6p pool towards the PPP, glycolysis and biosynthesis, which is close to values found in previous studies. It was furthermore found that in an extended version of the non-oxidative branch of the PPP the fluxes may be assumed irreversible without negatively affecting the fit. The fluxes around the pyruvate node and in the TCA cycle that were fitted based on the MS data give unconventional results. Especially the pyruvate carboxylation and malic enzyme reactions had high estimated rates. The MS measurements of the TCA cycle intermediates gave strong indications that the glyoxylate shunt is active when the carbon substrate in the feed consists of 90% (w/w) glucose and 10% (w/w) ethanol.

The method for direct LC-MS measurement of the 13C-labeling of metabolic intermediates that was presented in this chapter is complementary to existing methods such as GC-MS and NMR. It has the experimental advantages of requiring only a relatively short supply of 13C-labeled feed and only small biomass samples. Furthermore, the method gives direct information about the isotopomer distributions of the intermediates, so that these need not be deduced from measured accumulating components (as is the case for GC-MS and NMR data) and thereby offers a qualitative enrichment of the currently available data required to identify fluxes even in a complex network, such as the primary carbon metabolism.

However, more important than identification of the fluxes is the validation of the metabolic network. The present results point to incompleteness of the current state-of-the-art metabolic models of eukaryotes. In order to determine the complete map of the carbon metabolism it is mandatory to perform more experiments using variously 13C-labeled substrates, deletion mutants of the studied strain and varying, well-controlled cultivation conditions and to extend the amount of data that is obtained per experiment by including MS measurements of free intracellular amino acids and by measuring mass isotopomer distributions of fragments of the intracellular metabolites. Without a validated map of the carbon metabolism all determined fluxomes remain questionable.

APPENDIX A: MODELING ISOTOPIC NON-STEADY STATE
When all metabolite pools in a metabolic model are in isotopic steady state, it is assumed that 13C-labeled medium has been supplied for infinite time.
The steady state isotopomer distribution vector (idv(∞)) of metabolite P that has n influxes stemming from metabolites S1 to Sn satisfies the following set of isotopomer balances:

$$\left(\sum_{i=1}^{n} v_i\right) \cdot \mathrm{idv}_{P}(\infty) = \sum_{i=1}^{n} \left( v_i \cdot \mathrm{IMM}_{S_i>P} \cdot \mathrm{idv}_{S_i}(\infty) \right) \qquad \text{(A1)}$$

where $\mathrm{IMM}_{S_i>P}$ represents the isotopomer mapping matrix for the specific conversion of Si to P. Note that bimolecular reactions have been ignored for the sake of simplicity. From the isotopic steady state isotopomer distribution vector and the isotopomer distribution vector of the naturally labeled compound (idv(0)), one can calculate the isotopic non-steady state isotopomer distribution of P at time t using Eq.A2 (cf. Chapter 4):

$$\mathrm{idv}_{P}(t) = e^{-D \cdot t} \cdot \mathrm{idv}_{P}(0) + \left(1 - e^{-D \cdot t}\right) \cdot \mathrm{idv}_{P}(\infty) \qquad \text{(A2)}$$

In Eq.A2, D is the turnover rate of the metabolite pool, i.e. its throughput divided by its pool size. This equation is based on the assumption that all metabolites S that are converted to P are in isotopic steady state, whereas P is not, due to its relatively low turnover rate. In practice this assumption is seldom valid, since metabolic networks often contain feedback loops that indirectly convert P back to S, which causes isotopic non-steady state of S. A second objection against the application of Eq.A2 is that it cannot be implemented in cumomer models, which are strictly based on the isotopic steady state assumption.

An alternative way to model isotopic non-steady state is to extend the model with an influx $v_{extra}$ of naturally labeled P0 into pool P and an equally large efflux from pool P. This approach allows cumomer modeling, as P0 is simply treated as an additional invariable external substrate. The isotopic non-steady state isotopomer distribution of P at time t is now in fact modeled as:

$$\left(v_{extra} + \sum_{i=1}^{n} v_i\right) \cdot \mathrm{idv}_{P}(t) = v_{extra} \cdot \mathrm{IMM}_{P0>P} \cdot \mathrm{idv}_{P}(0) + \sum_{i=1}^{n} \left( v_i \cdot \mathrm{IMM}_{S_i>P} \cdot \mathrm{idv}_{S_i}(\infty) \right) \qquad \text{(A3)}$$

where $\mathrm{IMM}_{P0>P}$ is an identity matrix. Substituting Eq.A1 into Eq.A3 yields:

$$\left(v_{extra} + \sum_{i=1}^{n} v_i\right) \cdot \mathrm{idv}_{P}(t) = v_{extra} \cdot \mathrm{idv}_{P}(0) + \left(\sum_{i=1}^{n} v_i\right) \cdot \mathrm{idv}_{P}(\infty) \;\Leftrightarrow\; \mathrm{idv}_{P}(t) = \frac{v_{extra}}{v_{extra} + \sum_{i=1}^{n} v_i} \cdot \mathrm{idv}_{P}(0) + \frac{\sum_{i=1}^{n} v_i}{v_{extra} + \sum_{i=1}^{n} v_i} \cdot \mathrm{idv}_{P}(\infty) \qquad \text{(A4)}$$

Using this approach, the cumomer model calculates an isotopic steady state isotopomer distribution vector of P that in fact represents its isotopic non-steady state. In case the model contains feedback loops converting P to S, the model equally yields the resulting isotopic non-steady state isotopomer distribution of S. Note that this approach is a relatively simple trick to use cumomer models for dynamic isotopomer simulations. By including additional naturally labeled influxes for all pools in the network model, the isotopomer dynamics of all pools can be taken into account.

By combining Eqs.A2 and A4 one can calculate for each metabolite pool with a turnover rate D what value for flux $v_{extra}$ leading to and from that pool needs to be given as input to the model to calculate the isotopic non-steady state of the pool at time t:

$$e^{-D \cdot t} = \frac{v_{extra}}{v_{extra} + \sum_{i=1}^{n} v_i} \;\Leftrightarrow\; v_{extra} = \frac{e^{-D \cdot t}}{1 - e^{-D \cdot t}} \cdot \sum_{i=1}^{n} v_i \qquad \text{(A5)}$$

Note that Eq.A5 yields an infinitely large flux $v_{extra}$ at time t=0 (equating idv(t) to idv(0), see Eq.A4) and a vanishing flux $v_{extra}$ at time t=∞ (equating idv(t) to idv(∞), see Eq.A4).
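The relation between Eqs.A2, A4 and A5 can be checked numerically. The following is a minimal sketch: the function names and the numerical values of the turnover rate and the influx sum are hypothetical and chosen only for illustration. It computes the extra flux from Eq.A5 and verifies that the weight of idv(0) in Eq.A4 then reproduces the exponential weight of Eq.A2.

```python
import math


def extra_flux(turnover_rate, time, influx_sum):
    """Eq. A5: flux of naturally labeled P0 into (and out of) pool P.

    The flux diverges for time -> 0 and vanishes for time -> infinity.
    """
    if turnover_rate <= 0 or time <= 0:
        raise ValueError("Eq. A5 is only finite for positive D and t")
    decay = math.exp(-turnover_rate * time)
    return decay / (1.0 - decay) * influx_sum


def unlabeled_weight(v_extra, influx_sum):
    """Weight of idv_P(0) in Eq. A4."""
    return v_extra / (v_extra + influx_sum)


# Hypothetical numbers, not taken from the experiments.
D = 0.05            # turnover rate of the pool, 1/min
influx_sum = 100.0  # sum of the regular influxes v_i

for t in (40.0, 60.0):
    v_ex = extra_flux(D, t, influx_sum)
    # The Eq. A4 weight should equal exp(-D*t) from Eq. A2.
    print(t, round(v_ex, 2),
          round(unlabeled_weight(v_ex, influx_sum), 4),
          round(math.exp(-D * t), 4))
```

The extra flux decreases with increasing sampling time, in line with the expected wash-out of the unlabeled fraction.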

REFERENCES
Chaves, R.S., Herrero, P., Ordiz, I., Angeles del Brio, M., Moreno, F. (1997) Isocitrate lyase localisation in Saccharomyces cerevisiae cells. Gene, 198: 165-169
Dauner, M., Sauer, U. (2000) GC-MS analysis of amino acids rapidly provides rich information for isotopomer balancing. Biotechnol. Prog., 16: 642-649
De Jong-Gubbels, P., Vanrolleghem, P., Heijnen, J., Van Dijken, J.P., Pronk, J.T. (1995) Regulation of carbon metabolism in chemostat culture of Saccharomyces cerevisiae grown on mixtures of glucose and ethanol. Yeast, 11: 407-418
Gombert, A.K., Moreira dos Santos, M., Christensen, B., Nielsen, J. (2001) Network identification and flux quantification in the central metabolism of Saccharomyces cerevisiae under different conditions of glucose repression. J. Bacteriol., 183, 4: 1441-1451
Hellerstein, M.K., Neese, R.A. (1999) Mass isotopomer distribution analysis at eight years: theoretic, analytic and experimental considerations. Am. J. Physiol., 276: E1146-E1170
Kaplan, R.S., Mayor, J.A., Gremse, D.A., Wood, D.O. (1995) High level expression and characterization of the mitochondrial citrate transport protein from the yeast Saccharomyces cerevisiae. J. Biol. Chem., 270, 8: 4108-4114
Lange, H.C., Heijnen, J.J. (2001a) Statistical reconciliation of the elemental and polymeric biomass composition of Saccharomyces cerevisiae. Biotechnol. Bioeng., 75, 3: 334-344
Lange, H.C., Eman, M., Van Zuijlen, G., Visser, D., Van Dam, J.C., Frank, J., Teixeira De Mattos, M.J., Heijnen, J.J. (2001b) Improved rapid sampling for in vivo kinetics of intracellular metabolites in Saccharomyces cerevisiae. Biotechnol. Bioeng., 75, 4: 406-415
Lange, H.C. (2002) Quantitative physiology of Saccharomyces cerevisiae using metabolic network analysis. PhD thesis, Delft University of Technology, ISBN 90-9015552-X, Delft, The Netherlands
Maaheimo, H., Fiaux, J., Çakar, Z.P., Bailey, J.E., Sauer, U., Szyperski, T. (2001) Central carbon metabolism of Saccharomyces cerevisiae explored by biosynthetic fractional 13C-labeling of common amino acids. Eur. J. Biochem., 268: 2464-2479
Möllney, M., Wiechert, W., Kownatzki, D., De Graaf, A.A. (1999) Bidirectional reaction steps in metabolic networks: IV. Optimal design of isotopomer labeling experiments. Biotechnol. Bioeng., 66, 2: 86-103
Palmieri, L., Vozza, A., Hönlinger, A., Dietmeier, K., Palmisano, A., Zara, V., Palmieri, F. (1999) The mitochondrial dicarboxylate carrier is essential for the growth of Saccharomyces cerevisiae on ethanol or acetate as the sole carbon source. Mol. Microbiol., 32, 2: 569-577
Palmieri, L., Runswick, M.J., Fiermonte, G., Walker, J.E., Palmieri, F. (2000) Yeast mitochondrial carriers: bacterial expression, biochemical identification and metabolic significance. J. Bioenerg. Biomembr., 32, 1: 67-77
Szyperski, T. (1998) 13C-NMR, MS and metabolic flux balancing in biotechnology research. Quart. Rev. Biophys., 31, 1: 41-106
Van Dam, J.C., Eman, M.R., Frank, J., Lange, H.C., Van Dedem, G.W.K., Heijnen, J.J. (2002) Analysis of glycolytic intermediates in Saccharomyces cerevisiae using anion exchange chromatography and electrospray ionization with tandem mass spectrometric detection. Anal. Chim. Acta, 460: 209-218
Verduyn, C., Postma, E., Scheffers, W.A., Van Dijken, J.P. (1992) Effect of benzoic acid on metabolic fluxes in yeasts: a continuous culture study on the regulation of respiration and alcoholic fermentation. Yeast, 8: 501-517
Wiechert, W., Möllney, M., Isermann, N., Wurzel, M., De Graaf, A.A. (1999) Bidirectional reaction steps in metabolic networks: III. Explicit solution and analysis of isotopomer labeling systems. Biotechnol. Bioeng., 66, 2: 69-85


Chapter 10

Verifying Assumed Biosynthetic Pathways, Metabolic Precursors and Estimated Measurement Errors of Amino Acids, Trehalose and Levulinic Acid Using Redundant 2D [13C,1H] COSY Data

ABSTRACT
Assumptions about the biosynthetic routes of intracellularly accumulating compounds can be verified by comparing the 2D [13C,1H] COSY spectra of different compounds that are assumed to have a common precursor in central carbon metabolism. Here we study 2D [13C,1H] COSY spectra of a wildtype and an isogenic deletion mutant strain of S. cerevisiae to verify biosynthesis assumptions that have been published in literature. The results generally confirm the assumptions. Furthermore, a new method is presented to determine fractional enrichments from 2D [13C,1H] COSY spectra of fragments that have been synthesized from more than one metabolic precursor. Finally, we find that measurement errors that were estimated from NMR noise using a theoretical spectral model fail to explain observed data variability and we offer a possible explanation for this finding.


10.1 INTRODUCTION
Intermediates of the central metabolism are present in the cell in concentrations that are too low to measure their 13C-labeling by means of 13C-NMR. Therefore, it has become common practice to measure 13C-NMR spectra of intracellularly accumulating (polymeric) compounds that are synthesized from the metabolic intermediates, such as proteinogenic amino acids and storage sugars. In order to deduce information about the 13C-labeling patterns of intermediates of primary metabolism from the NMR data of the accumulating compounds, the precursors and the biosynthetic pathways leading to these compounds must be well known. In their article on the primary metabolism of Saccharomyces cerevisiae Maaheimo et al. (2001) published a clear overview of the origins of each carbon atom in the essential amino acids, both with respect to the precursor intermediates and with respect to the compartmentation of those precursors. In this chapter, the assumptions made by Maaheimo et al. are checked by comparing the (sums of) measured relative intensities that were claimed to correspond to identical intact fragments in the precursor. This comparison is made for two independent NMR data sets, one of a wildtype Saccharomyces cerevisiae and one of a ∆TPI-mutant of the same organism.

10.2 THEORY
13C-Labeling of fragments smaller than observed by 2D [13C,1H] COSY
If two amino acids or other accumulating compounds have a common precursor in the central carbon metabolism, then the parts of their carbon backbones that were synthesized from that precursor should have identical 13C-labeling patterns. One of the means to check the biosynthetic assumptions is to compare the 2D [13C,1H] COSY spectra of the fragments that are expected to be identically labeled. A 2D [13C,1H] COSY spectrum of a central carbon atom shows the 13C-labeling of the observed carbon and of the two carbons that are directly bonded to it. In case this entire three-carbon fragment originates from the same precursor in both compounds of which the biosynthetic routes are checked, one can simply compare the relative intensities of the singlet, doublets and double doublet in their 2D [13C,1H] COSY spectra. Alternatively, in case not the entire three-carbon fragment but only a two-carbon fragment thereof is formed from a common precursor, only the relative intensities of fine structures caused by different 13C-labeling patterns in that specific fragment should be compared. This can be done by summing the relative intensities of fine structures caused by isotopomers that only differ in the carbon atom that originates from another precursor molecule.

Consider for example the situation where the 2D [13C,1H] COSY spectra of the αβγ-carbon fragments of two amino acids are measured and only the αβ-carbon fragment in both amino acids stems from a common precursor. The 13C-labeling of these two-carbon fragments can then be compared by summing the relative intensities of the fine structures caused by the labeling patterns 010+011 and 110+111 in both amino acids (where ‘0’ denotes 12C and ‘1’ denotes 13C). Doing so, one in fact compares the occurrence of the labeling patterns 01x and 11x in both amino acids (where ‘x’ denotes either 12C or 13C).
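The summation over the carbon atom that stems from another precursor can be sketched as follows. This is a minimal illustration only: the function name `collapse_last_carbon` is hypothetical, and the input values are the relative intensities of the serine α-carbon multiplet of the wildtype as reported in Table 5 of this chapter.

```python
def collapse_last_carbon(intensities):
    """Sum relative intensities over the third carbon: 'abx' -> 'ab'."""
    collapsed = {}
    for pattern, value in intensities.items():
        key = pattern[:2]  # keep only the first two carbons of the fragment
        collapsed[key] = collapsed.get(key, 0.0) + value
    return collapsed


# Serine chi-alpha-beta fragment, wildtype (Table 5).
ser_wildtype = {"010": 0.154, "011": 0.071, "110": 0.264, "111": 0.511}

two_carbon = collapse_last_carbon(ser_wildtype)
print({pattern: round(value, 3) for pattern, value in two_carbon.items()})
```

The resulting sums for the patterns 01x and 11x (0.225 and 0.775) correspond to the values listed for the χα-fragment of serine in Table 6.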


Calculation of fractional enrichments
Consider the biosynthesis of an amino acid in which the m-carbon fragment [C1-…-Cm] is joined to an n-carbon fragment [Cm+1-…-Cm+n] to form the (m+n)-carbon fragment [C1-…-Cm+n]. The simultaneous 13C-labeling of the two neighboring carbons Cm and Cm+1 then is the result of a random process. In case the chemostat culture in the considered 13C-labeling experiment is fed with medium containing a fraction Pf of uniformly 13C-labeled substrate(s) and a fraction (1-Pf) of unlabeled substrate(s), then all carbon atoms have the identical fractional enrichments (F.E.) that are given by:

$$\mathrm{F.E.} = P_f + (1 - P_f) \cdot P_n \qquad (1)$$

It follows that for any fragment [C1-…-Cm+n] in which Cm+1 is 13C-labeled, the chance that Cm is 13C-labeled as well equals F.E. When measuring the multiplet of carbon Cm+1 by means of 2D [13C,1H] COSY, the 13C-labeling of fragment [Cm-Cm+1-Cm+2] is observed: [12Cm-13Cm+1-12Cm+2] gives a singlet (s), [13Cm-13Cm+1-12Cm+2] a doublet (d1), [12Cm-13Cm+1-13Cm+2] a different doublet (d2) and [13Cm-13Cm+1-13Cm+2] a double doublet (dd). From the above we can conclude that:

$$\mathrm{F.E.} = \frac{\text{fraction}\left[{}^{13}\mathrm{C}_{m}\text{-}{}^{13}\mathrm{C}_{m+1}\text{-}{}^{12}\mathrm{C}_{m+2}\right]}{\text{fraction}\left(\left[{}^{12}\mathrm{C}_{m}\text{-}{}^{13}\mathrm{C}_{m+1}\text{-}{}^{12}\mathrm{C}_{m+2}\right] + \left[{}^{13}\mathrm{C}_{m}\text{-}{}^{13}\mathrm{C}_{m+1}\text{-}{}^{12}\mathrm{C}_{m+2}\right]\right)} = \frac{\text{fraction}\left[{}^{13}\mathrm{C}_{m}\text{-}{}^{13}\mathrm{C}_{m+1}\text{-}{}^{13}\mathrm{C}_{m+2}\right]}{\text{fraction}\left(\left[{}^{12}\mathrm{C}_{m}\text{-}{}^{13}\mathrm{C}_{m+1}\text{-}{}^{13}\mathrm{C}_{m+2}\right] + \left[{}^{13}\mathrm{C}_{m}\text{-}{}^{13}\mathrm{C}_{m+1}\text{-}{}^{13}\mathrm{C}_{m+2}\right]\right)} \qquad (2)$$

or:

$$\mathrm{F.E.} = \frac{\text{relative intensity}(d1)}{\text{relative intensity}(s+d1)} = \frac{\text{relative intensity}(dd)}{\text{relative intensity}(d2+dd)} \qquad (3)$$
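Eqs. 1 and 3 can be illustrated with a short numerical sketch. The function names are hypothetical, Pn is taken here to be the natural 13C abundance of the unlabeled substrate with an assumed value of about 1.1%, and the multiplet is synthetic: it is constructed so that Cm is labeled independently of the rest of the fragment, which is the situation for which Eq. 3 is derived.

```python
def expected_enrichment(p_f, p_n):
    """Eq. 1: fractional enrichment for a feed with labeled fraction p_f."""
    return p_f + (1.0 - p_f) * p_n


def enrichment_from_multiplet(s, d1, d2, dd):
    """Eq. 3: two estimates of the F.E. of Cm from the multiplet of Cm+1."""
    return d1 / (s + d1), dd / (d2 + dd)


P_F = 0.10   # labeled fraction of the feed
P_N = 0.011  # assumed natural 13C abundance (not stated in the text)
fe = expected_enrichment(P_F, P_N)

# Synthetic multiplet: Cm is 13C with probability fe, independent of Cm+2.
# The share of fragments with 12C at Cm+2 is a hypothetical value.
frac_12c_next = 0.30
s = (1.0 - fe) * frac_12c_next
d1 = fe * frac_12c_next
d2 = (1.0 - fe) * (1.0 - frac_12c_next)
dd = fe * (1.0 - frac_12c_next)

estimate_a, estimate_b = enrichment_from_multiplet(s, d1, d2, dd)
print(round(fe, 4), round(estimate_a, 4), round(estimate_b, 4))
```

For measured multiplets the two estimates will generally differ somewhat, so their agreement gives an impression of the consistency of the data.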

Reconsider the above example of an αβγ-carbon fragment of an amino acid, where the αβ-carbon fragment and the γ-carbon stem from two separate precursor molecules. Measurement of the 13C-labeling of the β-carbon by means of 2D [13C,1H] COSY yields the relative occurrence of the labeling patterns 010, 011, 110 and 111 (from here on these isotopomer fractions are denoted as f010, f011, etc.). In this case one knows that the ratios f011/(f010+f011) and f111/(f110+f111) must be equal and identical to the fractional enrichment of the γ-carbon.

10.3 MATERIALS AND METHODS
The data that are used in this chapter are three sets of 2D [13C,1H] COSY data of yeast biomass lysate. Two of those sets are 13C-labeling measurements of biomass samples of the ‘wildtype’ S. cerevisiae strain CEN.PK-113.7D that were taken 1.2 and 3.5 dilution times after the switch from medium containing unlabeled glucose to medium containing 10% uniformly 13C-labeled glucose (i.e. Pf=0.10). The third data set consists of 13C-labeling measurements of a biomass sample of an isogenic triose-phosphate isomerase deletion (∆TPI) mutant of S. cerevisiae that was taken 3.5 dilution times after the switch from medium containing unlabeled glucose to medium containing 10% uniformly 13C-labeled glucose.

Both strains were grown in aerobic, carbon-limited chemostat cultures at a constant dilution rate of 0.10 h⁻¹. The medium that was supplied was defined mineral medium. The dual carbon source was 3 g/l glucose and 0.3 g/l ethanol. Nitrogen was solely provided in the form of ammonia. For details regarding the cultivation methods, biomass sampling and handling see Lange et al. (2002). For details regarding the NMR measurement method and analysis of the spectra see Chapter 4. All measured 13C-data were extrapolated to data that would be measured after infinitely long supply of 13C-labeled medium by means of the correction for isotopic non-steady state described in Chapter 4. The two NMR data sets of the early and the late biomass samples of wildtype S. cerevisiae were combined to a single averaged data set that will be further used in this chapter.

10.4 RESULTS AND DISCUSSION
Verifying assumed biosynthetic pathways and metabolic precursors
• Assumption I: Alanine, valine, the β- and γ2-carbon atoms of isoleucine and the β-, γ-, δ1- and δ2-carbon atoms of leucine are synthesized from mitochondrial pyruvate.
Tables 1 and 2 show the (summed) relative intensities that should correspond for these four amino acids if this assumption is correct. In these tables and all tables that follow, the first column shows the labeling pattern to which the relative intensities in each row correspond. The Greek symbols follow the notation of Maaheimo et al. (2001). Additionally, the symbol ‘χ’ denotes the carboxy-group of an amino acid.

TABLE 1: (summed) Relative intensities of α-carbon in alanine and valine

labeling of   wildtype             ∆TPI-mutant
fragment      ala-χα    val-χα     ala-χα    val-χα
01            0.255     0.244      0.125     0.141
11            0.745     0.756      0.875     0.859

TABLE 2: (summed) Relative intensities of β-carbon in alanine, γ1-carbon in valine, γ2-carbon in isoleucine and δ1-carbon in leucine

labeling of   wildtype                                   ∆TPI-mutant
fragment      ala-αβ    val-βγ1   ile-βγ2   leu-γδ1      ala-αβ    val-βγ1   ile-βγ2   leu-γδ1
01            0.169     0.186     0.186     0.216        0.168     0.185     0.177     0.209
11            0.831     0.814     0.814     0.785        0.832     0.815     0.823     0.791

The relative intensities in Table 1 indeed agree quite well, especially for the wildtype strain. In Table 2 the relative intensities of valine and isoleucine are very similar, but those of alanine and leucine differ more. The intensity of the doublet of leucine-γδ1 is over 5% lower than that of alanine, suggesting that the bond between the γ- and δ1-carbon atoms is sometimes split in this amino acid or that the amino acid is for a small part formed from an unlabeled precursor. A possible explanation for this observation is that the biosynthetic pathway leading from pyruvate to leucine (8 steps) is longer than those leading to alanine (1 step), valine (5 steps) and isoleucine (5 steps). In a longer pathway, the chance is larger that one of the intermediate pools is fed by a small biosynthetic side-stream which leads to a different labeling pattern of the end product.
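The size of the differences in Table 2 can be quantified with a few lines of code. The sketch below is illustrative only (the function name `relative_deviation` is hypothetical); it expresses the doublet intensity (pattern 11) of each amino acid relative to that of alanine, using the values of Table 2.

```python
def relative_deviation(values, reference):
    """Deviation of each entry relative to the reference entry."""
    ref = values[reference]
    return {name: (value - ref) / ref
            for name, value in values.items() if name != reference}


# Doublet (pattern 11) intensities from Table 2.
wildtype = {"ala": 0.831, "val": 0.814, "ile": 0.814, "leu": 0.785}
mutant = {"ala": 0.832, "val": 0.815, "ile": 0.823, "leu": 0.791}

for label, data in (("wildtype", wildtype), ("mutant", mutant)):
    deviations = relative_deviation(data, "ala")
    # Report the deviations from alanine in percent.
    print(label, {name: round(100 * d, 1) for name, d in deviations.items()})
```

For the wildtype the leucine doublet is about 5.5% lower than that of alanine, whereas valine and isoleucine deviate by about 2%.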


The assumption that alanine is synthesized from mitochondrial pyruvate can be further checked by comparing the 13C-labeling patterns of alanine and the χαβ-fragment of tyrosine and phenylalanine. If it is assumed that phosphoenolpyruvate is the only precursor of cytosolic pyruvate, then this comparison may lead to two conclusions:
• if the patterns are identical, then alanine stems either from cytosolic pyruvate or from mitochondrial pyruvate provided that the 13C-labeling of this pool is identical to that of the cytosolic one due to a fast transmitochondrial exchange reaction or due to exclusive formation of mitochondrial pyruvate from cytosolic pyruvate,
• if the patterns are not identical, then alanine must either stem from mitochondrial pyruvate or from cytosolic pyruvate provided that this pool is equilibrated with mitochondrial pyruvate that does not exclusively stem from cytosolic pyruvate.
Without further information regarding the assumptions above, a more definite conclusion is not possible from this comparison alone.

TABLE 3: Relative intensities of α-carbon in alanine, phenylalanine and tyrosine

labeling of   wildtype                          ∆TPI-mutant
fragment      ala-χαβ   phe-χαβ   tyr-χαβ       ala-χαβ   phe-χαβ   tyr-χαβ
010           0.139     0.114     0.129         0.096     0.091     0.093
011           0.116     0.083     0.087         0.029     0.015     0.021
110           0.049     0.024     0.031         0.045     0.010     0.008
111           0.696     0.780     0.754         0.831     0.883     0.877

The difference in 13C-labeling between the alanine fragment and that of phenylalanine/tyrosine is clearly much larger than the mutual difference between phenylalanine and tyrosine. Both this observation and the results in Tables 1 and 2 support the assumption by Maaheimo et al. that alanine is formed from mitochondrial pyruvate.

• Assumption II: The χαβ-fragments of tyrosine and phenylalanine stem from the common precursor phosphoenolpyruvate.
The identical metabolic origin of the first three carbon atoms in phenylalanine and tyrosine was already observed in Table 3. It can be further checked by comparing the 13C-labeling of the αβ-fragments of tyrosine and phenylalanine in the 13C-spectrum of the β-carbon of the amino acids (Table 4). The table shows that the relative intensities closely correspond both for the wildtype and for the mutant strain. The common origin of these parts of phenylalanine and tyrosine is hereby clearly demonstrated.

TABLE 4: (summed) Relative intensities of β-carbon in phenylalanine and tyrosine

labeling of   wildtype             ∆TPI-mutant
fragment      phe-αβ    tyr-αβ     phe-αβ    tyr-αβ
01            0.135     0.141      0.118     0.116
11            0.866     0.859      0.882     0.884


• Assumption III: Serine is formed from 3-phosphoglycerate, which is in isotopic equilibrium with phosphoenolpyruvate in the cell.
If this assumption is correct, the 13C-labeling pattern of serine should be identical to that of the χαβ-fragments of tyrosine and phenylalanine. Table 5 shows the comparison.

TABLE 5: Relative intensities of α-carbon in serine, phenylalanine and tyrosine

labeling of   wildtype                          ∆TPI-mutant
fragment      ser-χαβ   phe-χαβ   tyr-χαβ       ser-χαβ   phe-χαβ   tyr-χαβ
010           0.154     0.114     0.129         0.102     0.091     0.093
011           0.071     0.083     0.087         0.025     0.015     0.021
110           0.264     0.024     0.031         0.289     0.010     0.008
111           0.511     0.780     0.754         0.584     0.883     0.877

Especially the lower two rows are in conflict with the above assumption. Clearly, the α-β carbon bond is split more often in the biosynthesis of serine than in that of the other two amino acids. This is probably caused by reversible cleavage of the third carbon of serine to form glycine. If this reaction occurs, the 13C-labeling pattern of the χα-fragments of the three amino acids should still be identical. Table 6 indeed shows that the relative intensities of the χα-fragments of the three amino acids correspond much better.

TABLE 6: (summed) Relative intensities of α-carbon in serine, phenylalanine, and tyrosine

labeling of   wildtype                          ∆TPI-mutant
fragment      ser-χα    phe-χα    tyr-χα        ser-χα    phe-χα    tyr-χα
01            0.225     0.196     0.216         0.127     0.107     0.115
11            0.775     0.804     0.784         0.873     0.893     0.886

• Assumption IV: Glycine is formed either from the χα-fragment of serine or from the χα-fragment of threonine.
Table 7 shows the labeling pattern of glycine and of the χα-fragment of serine that should be identical if glycine is exclusively synthesized by splitting off the β-carbon atom of serine.

TABLE 7: (summed) Relative intensities of α-carbon in glycine, serine and threonine

labeling of   wildtype                          ∆TPI-mutant
fragment      gly-χα    ser-χα    thr-χα        gly-χα    ser-χα    thr-χα
01            0.310     0.225     0.387         0.139     0.127     0.167
11            0.690     0.775     0.613         0.861     0.873     0.834

For the wildtype this is clearly not the case. However, it should be mentioned that the time-corrected relative intensities of the early and late harvested biomass of the wildtype gave very different values for the α-carbon of glycine. The wildtype values in Table 7 are averages of f01/f11=0.361/0.639 for the early sample and 0.259/0.741 for the late one. Only the values of the late sample of the wildtype agree well with those of serine. For the mutant strain, the relative intensities of glycine correspond reasonably well with those of serine. Still, it cannot be excluded that the biosynthesis of glycine partially follows a different route.

An alternative biosynthesis route is available: in S. cerevisiae glycine can also be synthesized from the χα-fragment of threonine by threonine aldolase (Maaheimo et al., 2001). Table 7 also shows the labeling pattern of the χα-fragment of threonine. For the wildtype the correspondence between the labeling patterns of glycine and threonine is better than between those of glycine and serine. However, as said above the relative intensities of glycine in the wildtype are an average of the two very different sets for the early and late sample. Whereas the values of the late sample (f01/f11=0.259/0.741) gave the best correspondence with serine, those of the early sample (0.361/0.639) give the best correspondence with threonine. This observation is likely to be due to coincidence and not a result of shifting biosynthetic fluxes, because both samples were taken from a steady state culture in which fluxes are expected to remain constant over time. For the mutant strain, the labeling patterns of glycine and threonine show less agreement than those of glycine and serine. Since the values of glycine lie between those of serine and threonine it is very well possible that glycine is made from both amino acids. Based on this finding the metabolic model of Chapter 9 needs to be extended with this alternative route for glycine synthesis.

• Assumption V: Aspartic acid, methionine and threonine are all three synthesized from cytosolic oxaloacetate.
This assumption of Maaheimo et al. can be verified by comparing the 13C-labeling of the χαβ- (Table 8) and αβγ-fragments (Table 9) of the amino acids concerned.
The relative intensities of the αβγ-fragment of methionine were not available from the measurements. The relative intensities corresponding to the doublet of aspartic acid were summed in order to make them comparable with the threonine spectrum where the doublets overlapped.

TABLE 8: Relative intensities of α-carbon in aspartic acid, methionine and threonine

labeling of             wildtype                       ∆TPI-mutant
fragment       asp-χαβ  met-χαβ  thr-χαβ      asp-χαβ  met-χαβ  thr-χαβ
010             0.272    0.238    0.277        0.126    0.113    0.143
011             0.121    0.128    0.109        0.026    0.027    0.023
110             0.246    0.238    0.256        0.053    0.055    0.053
111             0.361    0.397    0.358        0.794    0.805    0.781

TABLE 9: (summed) Relative intensities of β-carbon in aspartic acid and threonine

labeling of         wildtype               ∆TPI-mutant
fragment       asp-αβγ  thr-αβγ       asp-αβγ  thr-αβγ
010             0.270    0.280         0.145    0.150
011+110         0.610    0.633         0.779    0.796
111             0.120    0.087         0.076    0.053

Both Tables 8 and 9 show reasonably good correspondence between the spectra. The singlet in the spectrum of the α-carbon of methionine is somewhat smaller (and the double doublet larger) than for the other two amino acids. The singlet and doublet in the spectrum of the β-carbon of aspartic acid are somewhat smaller (and the double doublet larger) than for threonine. The difference may be explained by the differences in length of the biosynthetic pathways of the amino acids concerned (see the discussion of Assumption I). Aspartic acid is a precursor of both threonine and methionine. It is surprising to find that the α-carbon spectrum of methionine has a higher double doublet intensity than that of its precursor. This would mean that one of the intermediate pools in between is fed with a more uniformly labeled side-stream, which does not seem very likely.

• Assumption VI: χα- and γ1δ-fragments in isoleucine stem from cytosolic oxaloacetate. In Tables 10 and 11 the relative intensities of the fragments are compared to those of the corresponding fragments in aspartic acid, methionine and threonine in as far as they are available. The results in both tables confirm the common origin of the four amino acids.

TABLE 10: (summed) Relative intensities of α-carbon in aspartic acid, methionine, threonine and isoleucine

labeling of              wildtype                          ∆TPI-mutant
fragment      asp-χα  met-χα  thr-χα  ile-χα     asp-χα  met-χα  thr-χα  ile-χα
01            0.393   0.365   0.386   0.398      0.153   0.140   0.166   0.174
11            0.608   0.635   0.614   0.602      0.848   0.860   0.834   0.827

TABLE 11: Relative intensities of γ-carbon in threonine and δ-carbon in isoleucine

labeling of        wildtype              ∆TPI-mutant
fragment      thr-βγ  ile-γ1δ        thr-βγ  ile-γ1δ
01            0.623   0.625          0.869   0.879
11            0.377   0.375          0.132   0.121

• Assumption VII: Glutamate, proline and arginine are all three formed from the common precursor α-ketoglutarate. The βγδε-fragment of lysine is synthesized from the fragment in α-ketoglutarate that forms the αβγδ-fragment in the former three amino acids. The available relative intensities that allow verification of these assumptions can be found in Tables 12 to 15 below.

TABLE 12: Relative intensities of α-carbon in glutamate and proline

labeling of        wildtype              ∆TPI-mutant
fragment      glu-χαβ  pro-χαβ       glu-χαβ  pro-χαβ
010            0.335    0.322         0.381    0.390
011            0.193    0.205         0.240    0.238
110            0.345    0.333         0.294    0.278
111            0.127    0.140         0.085    0.094

TABLE 13: Relative intensities of β-carbon in glutamate and proline and γ-carbon in lysine

labeling of             wildtype                       ∆TPI-mutant
fragment      glu-αβγ  pro-αβγ  lys-βγδ       glu-αβγ  pro-αβγ  lys-βγδ
010            0.627    0.640    0.641         0.668    0.632    0.628
011+110        0.354    0.360    0.338         0.332    0.368    0.318
111            0.019    0.000    0.021         0.000    0.000    0.054

TABLE 14: (summed) Relative intensities of γ-carbon in glutamate, proline and arginine and δ-carbon in lysine

labeling of                 wildtype                               ∆TPI-mutant
fragment      glu-βγδ  pro-βγδ  arg-βγδ  lys-γδε      glu-βγδ  pro-βγδ  arg-βγδ  lys-γδε
010            0.186    0.182    0.203    0.227        0.230    0.238    0.263    0.276
011+110        0.744    0.725    0.750    0.731        0.711    0.654    0.707    0.705
111            0.071    0.093    0.047    0.043        0.059    0.107    0.030    0.019

TABLE 15: Relative intensities of δ-carbon in proline and arginine and ε-carbon in lysine

labeling of           wildtype                   ∆TPI-mutant
fragment      pro-γδ  arg-γδ  lys-δε       pro-γδ  arg-γδ  lys-δε
01            0.205   0.220   0.225        0.264   0.284   0.276
11            0.795   0.780   0.775        0.736   0.716   0.724

Although some differences can be found in the above four tables, the values generally agree quite well. In Tables 14 and 15 the differences between the relative intensities of the various amino acids seem to be similar for the wildtype and the mutant strain. Proline tends to consist of more uniformly labeled fragments than the others, whereas arginine and lysine have fewer: the spectra of the latter two show higher singlet intensities. No alternative biosynthesis routes were found that could possibly explain the observed differences. In the absence of a larger number of replicate datasets confirming these deviations, it is assumed that they are caused by relatively large experimental errors.

• Assumption VIII: The χα-fragments in the amino acids lysine and leucine are derived from an acetyl-coenzymeA moiety that is of cytosolic origin in lysine and of mitochondrial origin in leucine. If the above assumption is correct and if the cytosolic and mitochondrial pools have different 13C-distributions, then a different labeling pattern should be found for the χα-fragments in both amino acids. This finding would also support the assumption that the labeling of acetyl-coA in the two compartments is not equilibrated by a rapid exchange reaction. If, on the other hand, the labeling of the two fragments is identical, then either the labeling of the two acetyl-coA pools is identical or the two acetyl-coA moieties in lysine and leucine have a common origin.

TABLE 16: (summed) Relative intensities of α-carbons in lysine and leucine

labeling of        wildtype            ∆TPI-mutant
fragment      lys-χα  leu-χα        lys-χα  leu-χα
01            0.349   0.247         0.295   0.274
11            0.651   0.754         0.705   0.726

The summed relative intensities of the wildtype spectra in Table 16 clearly confirm a different compartmental origin of the acetyl-coA in both amino acids. Those of the mutant are more similar, which could indicate that in this strain the cytosolic and mitochondrial acetyl-coA pools are more similar due to a different metabolic flux pattern. The χα-fragment of lysine has a higher ratio f01/f11 than leucine in both strains. If the assumed compartmental origins of the acetyl-coA moieties that are precursors of these fragments are correct, then this means that cytosolic acetyl-coA has a higher ratio f01/f11 than mitochondrial acetyl-coA. This may be partially explained by the inflow of naturally labeled ethanol (that was present in the medium) into the cytosolic acetyl-coA pool, because unlabeled ethanol contains a ratio f01/f11 of 0.99/0.01. However, f01 in naturally labeled ethanol is 100 times smaller than f00, for which reason even a relatively large inflow of ethanol into the cytosolic acetyl-coA pool causes only a small bias of its ratio f01/f11. An additional explanation is that when threonine is cleaved by threonine aldolase, the resulting χα- and βγ-fragments are glycine (see Assumption IV) and acetaldehyde. The latter compound is converted to cytosolic acetyl-coA (Christensen, 2000). As can be seen in Table 11, the βγ-fragment of threonine has a high ratio f01/f11 both in the wildtype and mutant strain. Therefore, threonine aldolase may also explain the observed higher ratio f01/f11 in lysine.

Checking the consistency of overlapping 13C-labeling information

• Check I: Due to the long-range couplings, the spectra of the β- and δ-carbons of histidine contain overlapping fragments (respectively the αβγδ- and βγδ-carbons) and can therefore be checked for their mutual consistency. The summed relative intensities that should correspond are given in Table 17. There is reasonable agreement between the two spectra.
TABLE 17: (summed) Relative intensities of β- and δ-carbon in histidine

labeling of                 wildtype                                 ∆TPI-mutant
fragment      his-βγδ from        his-βγδ from         his-βγδ from        his-βγδ from
              δ-carbon spectrum   β-carbon spectrum    δ-carbon spectrum   β-carbon spectrum
101                0.059               0.069                0.063               0.082
111                0.941               0.931                0.937               0.918

• Check II: For the wildtype strain the spectra of trehalose and levulinic acid were measured, both of which are derived from the metabolic intermediate glucose 6-phosphate. Tables 18 to 20 show the relative intensities that should theoretically be the same. The results in Tables 18 and 19 confirm the common origin of levulinic acid and trehalose. Table 20, however, shows a marked difference between the two spectra. An explanation was not found.

TABLE 18: (summed) Relative intensities of 2nd carbon in levulinic acid and 3rd in trehalose

labeling of        wildtype
fragment      lev-C2  tre-C3
010           0.287   0.279
011+110       0.188   0.163
111           0.525   0.559

TABLE 19: (summed) Relative intensities of 3rd carbon in levulinic acid and 4th in trehalose

labeling of        wildtype
fragment      lev-C3  tre-C4
010           0.157   0.119
011+110       0.200   0.220
111           0.643   0.661

TABLE 20: (summed) Relative intensities of 5th carbon in levulinic acid and 6th in trehalose

labeling of        wildtype
fragment      lev-C5  tre-C6
01            0.198   0.112
11            0.802   0.888

Calculated relative intensities

Tables 1-20 above only gave the (summed) relative intensities corresponding to the larger (two-carbon) part of the three-carbon fragments that contain newly formed carbon-carbon bonds. For those fragments the fractional enrichments of the smaller (one-carbon) part can also be calculated as was described in the Theory section. Table 21 gives the fractional enrichments of the carbons that were joined to the fragments containing the observed carbon. Fractional enrichments could only be calculated for spectra in which the relative intensities of the two doublets could be separately determined. The ratios f011/(f010+f011) in the first row differ much more than the ratios f111/(f011+f111) in the second row. The reason for this is that the values of f011 are smaller than 0.03 in all cases except for ile-χαβ and leu-χαβ of the wildtype and leu-χαβ of the mutant. This results in relatively large contributions of measurement errors to the calculated fractional enrichments. The values of f111 are larger than 0.06 in all cases except for lys-χαβ of the mutant. For this reason the calculated fractional enrichments in the second row are the more reliable values.
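The sensitivity of a calculated fractional enrichment to small peak intensities can be illustrated with a first-order (delta-method) error propagation. The sketch below is not taken from the thesis: the function name, the intensity values and the standard deviation of 0.005 are all hypothetical, and the two intensities are assumed to be uncorrelated unless a covariance is passed in.

```python
import math

def ratio_with_sd(f_partner, f_num, var_partner, var_num, cov=0.0):
    """Return r = f_num / (f_partner + f_num) and its first-order standard deviation.

    f_num is the intensity in the numerator of the ratio, f_partner the other
    intensity in the denominator. All inputs here are illustrative assumptions.
    """
    total = f_partner + f_num
    r = f_num / total
    # Partial derivatives of r with respect to both intensities
    dr_dpartner = -f_num / total**2
    dr_dnum = f_partner / total**2
    var_r = (dr_dpartner**2 * var_partner
             + dr_dnum**2 * var_num
             + 2.0 * dr_dpartner * dr_dnum * cov)
    return r, math.sqrt(var_r)

# Hypothetical example: same ratio, same absolute error, different peak sizes
SD = 0.005  # assumed standard deviation of every relative intensity
r_small, sd_small = ratio_with_sd(0.200, 0.025, SD**2, SD**2)
r_large, sd_large = ratio_with_sd(0.600, 0.075, SD**2, SD**2)
print(f"small peaks: r = {r_small:.3f}, sd = {sd_small:.4f}")
print(f"large peaks: r = {r_large:.3f}, sd = {sd_large:.4f}")
```

With identical absolute errors, the ratio built from the smaller intensities carries the larger uncertainty, which is the effect described above for the first row of Table 21.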

TABLE 21: Fractional enrichments determined from the 13C-spectra in which carbon-carbon bonds were formed during biosynthesis. The carbons of which fractional enrichment was calculated are in italics, the carbons of which the spectrum was used are underlined

labeling of fragment   val-χαβ  phe-αβγ  tyr-αβγ  ile-χαβ  glu-δγβ  lys-χαβ  leu-χαβ
wildtype
011/(010+011)           0.132    0.094    0.168    0.125    0.094    0.118    0.121
111/(011+111)           0.108    0.111    0.109    0.111    0.089    0.103    0.102
∆TPI-mutant
011/(010+011)           0.086    0.184    0.162    0.183    0.088    0.091    0.121
111/(011+111)           0.097    0.106    0.120    0.090    0.079    0.074    0.094

For the wildtype, the first four values lie close to the theoretically expected value of 0.110 (see Eq.1: in this experiment Pf=0.100, Pn=0.011, so F.E.=0.100+(1-0.100)*0.011=0.110). The fractional enrichment of the β-carbon of glutamate in the wildtype is clearly lower, which is consistent with the fact that 10% (w/w) unlabeled ethanol was also fed to these cultures of S. cerevisiae. This ethanol will dilute the labeling of glutamate via its incorporation in the TCA-cycle intermediate citrate (via the intermediate acetyl coenzyme A). This phenomenon is similar to the dilution of 13C-labeling of the carbon substrate by catabolic degradation of an unlabeled cosubstrate that was discussed by Christensen and Nielsen (2002). They also found a lower average fractional enrichment of the glutamate carbons due to the incorporation of unlabeled acetyl coenzyme A in this compound. The fractional enrichments of the β-carbons of lysine and leucine are also somewhat lower than the theoretically expected value of 0.110, which is expected for lysine as its β-carbon has the same origin as the β-carbon of glutamate (Maaheimo et al., 2001; Christensen and Nielsen, 2002). The β-carbon of leucine, however, stems from mitochondrial pyruvate, of which the 13C-labeling is not expected to be diluted by unlabeled ethanol. The β-carbon of leucine has the same origin as the β-carbon of valine, so their fractional enrichments would be expected to be more similar.

For the mutant, the values differ somewhat more than for the wildtype and generally seem to be a little lower. Again, the fractional enrichment of the β-carbon of glutamate is clearly lower than that of the carbons of the other amino acids. The fractional enrichment of the β-carbon of lysine is low as well, which agrees with the fact that it has the same origin as the β-carbon of glutamate. For the mutant the fractional enrichment of the β-carbon of leucine agrees better with the β-carbon of valine, confirming their common origin.
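The expected fractional enrichment quoted above follows directly from the two parameters Pf and Pn. The short sketch below reproduces that arithmetic. The second function is an illustration only and not part of the thesis: it assumes a simple two-source mixing model (labeled substrate versus a cosubstrate at natural abundance) to show how a lowered enrichment could be translated into a rough dilution fraction. The function names are hypothetical.

```python
def expected_enrichment(pf, pn):
    """F.E. = Pf + (1 - Pf) * Pn, with Pf and Pn as quoted in the text."""
    return pf + (1.0 - pf) * pn

def dilution_fraction(fe_observed, fe_expected, pn):
    """Fraction of a carbon position coming from an unlabeled source.

    Assumes (hypothetically) linear mixing of two sources:
    fe_observed = (1 - x) * fe_expected + x * pn
    """
    return (fe_expected - fe_observed) / (fe_expected - pn)

fe = expected_enrichment(pf=0.100, pn=0.011)
print(f"expected F.E. = {fe:.4f}")

# Wildtype glu-δγβ value from the second row of Table 21
x_glu = dilution_fraction(fe_observed=0.089, fe_expected=fe, pn=0.011)
print(f"apparent unlabeled contribution to glutamate β-carbon: {x_glu:.2f}")
```

Under this mixing assumption the number is only an order-of-magnitude indication, since it ignores measurement error and any other routes that feed the carbon position.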
Statistically testing the equality of relative intensities

The equality of the relative intensities of the amino acids in the tables above can be tested statistically using the measurement errors that were estimated based on NMR noise using a theoretical spectral model (Chapter 4 of this thesis). An example is given for the comparison of the relative intensities in Table 3. Their values and estimated covariance matrices are shown in Table 22.

TABLE 22: Relative intensities and covariances of two sets of relative intensities of α-carbons of alanine, phenylalanine and tyrosine for the wildtype and ∆TPI-mutant

wildtype

amino acid  fragment  relative      covariances × 10^4
                      intensity     010      011      110      111
ala-χαβ     010        0.139        0.074   -0.013   -0.018   -0.043
            011        0.116       -0.013    0.256   -0.029   -0.214
            110        0.049       -0.018   -0.029    0.144   -0.098
            111        0.696       -0.043   -0.214   -0.098    0.355
phe-χαβ     010        0.114        0.068   -0.005   -0.021   -0.042
            011        0.083       -0.005    0.198   -0.021   -0.172
            110        0.024       -0.021   -0.021    0.227   -0.185
            111        0.780       -0.042   -0.172   -0.185    0.399
tyr-χαβ     010        0.129        0.065   -0.004   -0.025   -0.036
            011        0.087       -0.004    0.219   -0.030   -0.185
            110        0.031       -0.025   -0.030    0.276   -0.222
            111        0.754       -0.036   -0.185   -0.222    0.443

∆TPI mutant

amino acid  fragment  relative      covariances × 10^4
                      intensity     010      011      110      111
ala-χαβ     010        0.096        0.071   -0.029   -0.003   -0.039
            011        0.029       -0.029    0.527   -0.068   -0.429
            110        0.045       -0.003   -0.068    0.168   -0.097
            111        0.831       -0.039   -0.429   -0.097    0.564
phe-χαβ     010        0.091        0.131   -0.047   -0.016   -0.068
            011        0.015       -0.047    1.013   -0.040   -0.927
            110        0.010       -0.016   -0.040    0.236   -0.180
            111        0.883       -0.068   -0.927   -0.180    1.174
tyr-χαβ     010        0.093        0.052   -0.011   -0.008   -0.033
            011        0.021       -0.011    0.342   -0.009   -0.322
            110        0.008       -0.008   -0.009    0.089   -0.073
            111        0.877       -0.033   -0.322   -0.073    0.428

If the above relative intensities are statistically tested for their identity, the following results are obtained:

TABLE 23: Covariance weighted sums of squared residuals (SS) between relative intensities of α-carbons of various combinations of alanine, phenylalanine and tyrosine and probabilities (P) that those deviations are caused by coincidence alone

                        wildtype                  ∆TPI mutant
fragment            phe-χαβ    tyr-χαβ        phe-χαβ    tyr-χαβ
ala-χαβ    SS       111.91     43.01          36.31      58.43
           P        4.25E-24   2.45E-09       6.43E-08   1.27E-12
phe-χαβ    SS                  20.93                     0.60
           P                   1.09E-04                  8.97E-01

Table 23 shows that only the relative intensities of the α-carbon of phenylalanine and tyrosine of the mutant are judged to be identical by this test. All other combinations have larger residuals than can be explained by the estimated covariances alone. It is clear that the deviations between alanine on the one hand and tyrosine or phenylalanine on the other are much larger than the mutual deviations between phenylalanine and tyrosine, which was already concluded from a more superficial comparison in Assumption I (Table 3). If the measurement errors cannot explain the observed differences between relative intensities that should be identical according to their biosynthesis routes, then either these routes are not correct or complete, or the measurement errors are underestimated. One possible source of additional errors that was discussed in Chapter 4 is that, in case the one-dimensional 13C-spectra are obtained from two-dimensional [13C,1H] COSY spectra by making a section at a single 1H-frequency, the relative peak areas may not be representative for the relative peak volumes in the two-dimensional spectra.
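A test of this kind can be sketched as follows. The exact procedure is not spelled out in this section, so the sketch rests on assumptions: the difference between two sets of relative intensities is weighted by the sum of their covariance matrices, one redundant component is dropped because the intensities sum to one, and the resulting SS is compared to a chi-square distribution with (n-1) degrees of freedom. The helper names are hypothetical, and the input values are the rounded ∆TPI-mutant entries for phenylalanine and tyrosine from Table 22 (covariances taken as listed value × 10^-4).

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def chi2_sf(x, dof):
    """Upper tail probability of a chi-square variable with integer dof."""
    h = x / 2.0
    if dof % 2 == 0:
        return math.exp(-h) * sum(h**j / math.factorial(j) for j in range(dof // 2))
    tail = sum(h**(j - 0.5) / math.gamma(j + 0.5) for j in range(1, (dof - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * tail

def equality_test(f1, cov1, f2, cov2):
    """Covariance-weighted SS and P for two sets of relative intensities."""
    k = len(f1) - 1  # drop the last (redundant) component
    d = [f1[i] - f2[i] for i in range(k)]
    s = [[cov1[i][j] + cov2[i][j] for j in range(k)] for i in range(k)]
    ss = sum(di * xi for di, xi in zip(d, solve(s, d)))
    return ss, chi2_sf(ss, k)

SCALE = 1e-4  # Table 22 lists covariances multiplied by 10^4
phe = [0.091, 0.015, 0.010, 0.883]
tyr = [0.093, 0.021, 0.008, 0.877]
cov_phe = [[v * SCALE for v in row] for row in [
    [0.131, -0.047, -0.016, -0.068], [-0.047, 1.013, -0.040, -0.927],
    [-0.016, -0.040, 0.236, -0.180], [-0.068, -0.927, -0.180, 1.174]]]
cov_tyr = [[v * SCALE for v in row] for row in [
    [0.052, -0.011, -0.008, -0.033], [-0.011, 0.342, -0.009, -0.322],
    [-0.008, -0.009, 0.089, -0.073], [-0.033, -0.322, -0.073, 0.428]]]

ss, p = equality_test(phe, cov_phe, tyr, cov_tyr)
print(f"SS = {ss:.2f}, P = {p:.2f}")
```

With the rounded table values this gives an SS of about 0.6 and a P close to 0.9, in line with the phenylalanine/tyrosine entry for the mutant in Table 23; small differences are to be expected because the published intensities and covariances are rounded to three decimals.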

Checking estimated measurement errors

Assuming that the biosynthetic assumptions are correct, sets of multiple relative intensities that should be identical can be used to experimentally determine covariance matrices. A number of covariance matrices that were determined in this way are shown in Tables 24-32 below. The tables also show averages of the covariance matrices of the carbon atoms concerned that were estimated based on the NMR noise.

TABLE 24: Covariances estimated from NMR noise and from theoretically identical relative intensities of α-carbon in alanine and valine for wildtype and ∆TPI-mutant (see Table 1)

covariances                  wildtype                ∆TPI mutant
estimated                    covariances × 10^4      covariances × 10^4
from            fragment     01        11            01        11
multiple sets   01            5.359                   1.394
                11           -5.359    5.359         -1.394    1.394
noise           01            0.268                   0.416
                11           -0.268    0.268         -0.416    0.416

TABLE 25: Covariances estimated from NMR noise and from theoretically identical relative intensities of β-carbon in alanine, γ1-carbon in valine, of γ2-carbon in isoleucine and of δ1-carbon in leucine for wildtype and ∆TPI-mutant (see Table 2)

covariances                  wildtype                ∆TPI mutant
estimated                    covariances × 10^4      covariances × 10^4
from            fragment     01        11            01        11
multiple sets   01            4.297                   3.171
                11           -4.297    4.297         -3.171    3.171
noise           01            0.367                   0.204
                11           -0.367    0.367         -0.204    0.204

TABLE 26: Covariances estimated from NMR noise and from theoretically identical relative intensities of α-carbon in phenylalanine and tyrosine for wildtype and ∆TPI-mutant (see Table 3)

covariances                  wildtype covariances × 10^4           ∆TPI mutant covariances × 10^4
estimated from  fragment     010      011      110      111        010      011      110      111
multiple sets   010           0.871                                 0.020
                011           0.481    0.468                        0.060    0.175
                110           0.104   -0.200    0.434              -0.019   -0.055    0.017
                111          -1.455   -0.749   -0.338    2.542     -0.062   -0.180    0.056    0.185
noise           010           0.067                                 0.091
                011          -0.005    0.206                       -0.029    0.678
                110          -0.022   -0.025    0.246              -0.012   -0.025    0.163
                111          -0.040   -0.177   -0.200    0.417     -0.051   -0.624   -0.126    0.801

TABLE 27: Covariances estimated from NMR noise and from theoretically identical relative intensities of β-carbon in phenylalanine and tyrosine for wildtype and ∆TPI-mutant (see Table 4)

covariances                  wildtype                ∆TPI mutant
estimated                    covariances × 10^4      covariances × 10^4
from            fragment     01        11            01        11
multiple sets   01            0.663                   0.012
                11           -0.663    0.663         -0.012    0.012
noise           01            0.910                   0.929
                11           -0.910    0.910         -0.929    0.929

TABLE 28: Covariances estimated from NMR noise and from theoretically identical relative intensities of α-carbon in aspartic acid, methionine and threonine for wildtype and ∆TPI-mutant (see Table 8)

covariances                  wildtype covariances × 10^4           ∆TPI mutant covariances × 10^4
estimated from  fragment     010      011      110      111        010      011      110      111
multiple sets   010           3.664                                 2.305
                011          -1.058    2.242                       -0.285    0.041
                110           1.245   -1.025    0.774              -0.211    0.022    0.023
                111          -3.851   -0.160   -0.994    5.004     -1.809    0.223    0.167    1.420
noise           010           0.381                                 0.051
                011          -0.074    0.944                       -0.006    0.183
                110          -0.109   -0.150    0.639              -0.015   -0.014    0.138
                111          -0.198   -0.720   -0.380    1.297     -0.030   -0.163   -0.109    0.303

TABLE 29: Covariances estimated from NMR noise and from theoretically identical relative intensities of β-carbon in aspartic acid and threonine for wildtype and ∆TPI-mutant (see Table 9)

covariances                  wildtype covariances × 10^4      ∆TPI mutant covariances × 10^4
estimated from  fragment     010      011+110   111           010      011+110   111
multiple sets   010           2.337                            0.140
                011+110      -0.592    2.774                   0.454    1.467
                111          -1.744   -2.182    3.926         -0.594   -1.921    2.515
noise           010           0.318                            0.121
                011+110      -0.004    0.386                   0.012    0.294
                111          -0.314   -0.382    0.696         -0.133   -0.306    0.439

TABLE 30: Covariances estimated from NMR noise and from theoretically identical relative intensities of β-carbon in glutamate and proline and γ-carbon in lysine for wildtype and ∆TPI-mutant (see Table 13)

covariances                  wildtype covariances × 10^4      ∆TPI mutant covariances × 10^4
estimated from  fragment     010      011+110   111           010      011+110   111
multiple sets   010           0.706                            4.813
                011+110      -0.145    1.033                  -0.874    6.513
                111          -0.561   -0.888    1.448         -3.939   -5.639    9.578
noise           010           2.918                            3.024
                011+110      -0.430    1.087                  -0.525    1.220
                111          -2.488   -0.657    3.145         -2.498   -0.695    3.193

TABLE 31: Covariances estimated from NMR noise and from theoretically identical relative intensities of γ-carbon in glutamate, proline and arginine and δ-carbon in lysine for wildtype and ∆TPI-mutant (see Table 14)

covariances                  wildtype covariances × 10^4      ∆TPI mutant covariances × 10^4
estimated from  fragment     010      011+110   111           010      011+110   111
multiple sets   010           1.370                            4.892
                011+110       0.384    1.878                   2.111    6.648
                111          -1.754   -2.261    4.015         -7.002   -8.758   15.760
noise           010           1.765                            2.338
                011+110       0.229    1.335                   0.408    2.669
                111          -1.995   -1.564    3.559         -2.747   -3.077    5.824

TABLE 32: Covariances estimated from NMR noise and from theoretically identical relative intensities of δ-carbon in proline and arginine and ε-carbon in lysine for wildtype and ∆TPI-mutant (see Table 15)

covariances                  wildtype                ∆TPI mutant
estimated                    covariances × 10^4      covariances × 10^4
from            fragment     01        11            01        11
multiple sets   01            0.916                   1.005
                11           -0.916    0.916         -1.005    1.005
noise           01            0.245                   0.243
                11           -0.245    0.245         -0.243    0.243
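The experimental covariances in these tables can be approximated from the relative intensities of amino acids that should be identical. The sketch below assumes that the "multiple sets" estimate is the ordinary sample covariance with an (n-1) divisor; this is an inference from the published numbers rather than a stated procedure, and the function name is hypothetical. It uses the ∆TPI-mutant values of Table 15 (pro-γδ, arg-γδ, lys-δε).

```python
def sample_covariance(sets):
    """Unbiased sample covariance matrix of several intensity vectors."""
    n = len(sets)
    k = len(sets[0])
    mean = [sum(s[i] for s in sets) / n for i in range(k)]
    return [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in sets) / (n - 1)
             for j in range(k)]
            for i in range(k)]

# Rows: pro-γδ, arg-γδ, lys-δε; columns: fragments 01 and 11 (Table 15, mutant)
mutant_sets = [
    [0.264, 0.736],
    [0.284, 0.716],
    [0.276, 0.724],
]

cov = sample_covariance(mutant_sets)
for label, row in zip(("01", "11"), cov):
    print(label, [round(v * 1e4, 3) for v in row])  # scaled by 10^4 as in the tables
```

The variance obtained this way is close to the value of about 1.0 × 10^-4 listed for the mutant under "multiple sets" in Table 32; exact agreement is not expected because Table 15 reports intensities rounded to three decimals. Because the two intensities of each set sum to one, the off-diagonal element is the negative of the variance.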

When comparing the covariance matrices that were generated according to the two different methods, it is observed that in the majority of cases the experimentally determined variances are considerably larger than the ones determined from the NMR noise: e.g. for the wildtype in Tables 24, 25, 26, 28, 29 and 32 and for the ∆TPI mutant in Tables 24, 25, 29, 30, 31 and 32. In a few cases the NMR noise yields larger covariances: e.g. for the wildtype in Tables 27 and 30 and for the ∆TPI mutant in Tables 26 and 27. The difference in the covariances of the sets containing more than two relative intensities is hard to interpret. It should be noted that the experimentally determined covariances of the ∆TPI mutant are determined from one set of relative intensities per amino acid, whereas those of the wildtype are determined from twice the number of sets due to the availability of measurements of early and late harvested biomass. This makes the covariances that are estimated for the wildtype data less prone to random outliers.

10.5 CONCLUSIONS

In this chapter, the biosynthetic assumptions of S. cerevisiae that were published by Maaheimo et al. (2001) were verified by comparing 2D [13C,1H] COSY spectra of accumulating compounds that were assumed to have common precursors in central carbon metabolism. Their assumptions are confirmed in almost all cases:
• alanine, valine, the β- and γ2-carbon atoms of isoleucine and the β-, γ-, δ1- and δ2-carbon atoms of leucine are indeed likely to be synthesized from mitochondrial pyruvate,
• the identical metabolic origin of the carbon atoms in phenylalanine and tyrosine is also confirmed,
• serine is found to be formed from 3-phosphoglycerate, but its third carbon has a deviating 13C-labeling due to the reversible cleavage to form glycine,
• it is likely that glycine is synthesized both from serine and from threonine,
• aspartic acid, methionine and threonine are all three found to be synthesized from cytosolic oxaloacetate,
• the χα- and γ1δ-fragments in isoleucine are also likely to stem from cytosolic oxaloacetate,
• glutamate, proline and arginine are all three formed from the common precursor α-ketoglutarate and the βγδε-fragment of lysine is confirmed to be synthesized from the fragment in α-ketoglutarate that forms the αβγδ-fragment in the former three amino acids,
• it is verified that the χα-fragments in the amino acids lysine and leucine are synthesized from differently 13C-labeled acetyl-coenzymeA moieties; probably cytosolic acetyl-coA in lysine and mitochondrial acetyl-coA in leucine.
The acetyl-coenzymeA moiety in lysine has a higher ratio f01/f11 than that in leucine, which is probably due to the inflow into the cytosolic acetyl-coA pool of unlabeled ethanol that is taken up from the medium and of acetaldehyde that stems from threonine cleavage,
• the common precursor glucose 6-phosphate of levulinic acid and trehalose is confirmed by two out of three compared spectra.

The above conclusions were based on visual inspection of the identity of relative intensities. When weighted by measurement errors that were estimated from NMR noise using a theoretical spectral model, even small deviations were found to be statistically unacceptable. A possible explanation for this is that the measurement errors are underestimated. Comparing the covariances that are estimated based on NMR noise with those that are determined from redundant sets of multiple relative intensities shows that the NMR noise often fails to fully explain the data variability that is observed in practice. This means that there must indeed be more sources of measurement errors.

The outcomes of the (newly presented) way of determining fractional enrichments from the 13C-NMR spectra of fragments in which carbon-carbon bonds were formed during biosynthesis also generally confirm the assumptions about the biosynthesis routes of the amino acids. Due to the addition of the unlabeled cosubstrate ethanol in the experiments of which the NMR spectra were obtained, the results allow a rough investigation of the catabolism of cosubstrates. The present method yields position-specific fractional enrichments, in contrast to a similar approach presented by Christensen and Nielsen (2002), who only used molecule-averaged fractional enrichments. The positional information presented here leads to more detailed conclusions.

REFERENCES

Christensen, B. (2000) Metabolic network analysis: Principles, methodologies and applications. PhD thesis, Technical University of Denmark

Christensen, B., Nielsen, J. (2002) Reciprocal 13C-labeling: a method for investigating the catabolism of cosubstrates. Biotechnol. Progr., 18, 2: 163-166

Lange, H.C., Van Winden, W.A., Schipper, D., Heijnen, J.J. (2002) Metabolic flux analysis of a triose-phosphate isomerase deletion mutant of S. cerevisiae: testing assumptions about metabolic networks by combining cofactor and 13C-labeling balances. Submitted for publication in Biotechnol. Bioeng.

Maaheimo, H., Fiaux, J., Çakar, Z.P., Bailey, J.E., Sauer, U., Szyperski, T. (2001) Central carbon metabolism of Saccharomyces cerevisiae explored by biosynthetic 13C labeling of common amino acids. Eur. J. Biochem., 268: 2464-2479

Summary

Of the thesis: ‘13C-Labeling Technique for Metabolic Network and Flux Analysis’ by Wouter van Winden

Metabolic engineers have adopted the 13C-labeling technique as a powerful tool for studying metabolic network structure and elucidating fluxes in those metabolic networks. This tracer technique makes it possible to determine fluxes that are unobservable using only metabolite balances and allows the elimination of doubtful cofactor balances that are often indispensable in flux analysis based on metabolite balancing alone.

In Chapter 2 the assumptions on which the 13C-labeling technique relies are made explicit. It is discussed that not all of those assumptions are free from uncertainties. Two possible errors in the models that are used to determine the metabolic fluxes from labeling data are: (1) omitted reactions and (2) ignored occurrence of channeling. By means of two representative examples it is shown that these modeling errors may lead to serious errors in the calculated flux distributions despite the use of labeling data. A complicating factor is that the model errors are not always easily detected as poor models may still yield good fits of experimental labeling data. Results of 13C-labeling experiments should therefore be interpreted with appropriate caution.

Ideally, one would like to ascertain prior to performing a 13C-labeling experiment whether the chosen substrate labeling and measurement technique allow identification of all the fluxes in the studied metabolic network. Chapter 3 presents a method for such an identifiability analysis that is based on the recently introduced ‘cumomer’ concept. The method improves upon previously published identifiability methods in that it provides a way to systematically reduce the metabolic network on the basis of structural elements that constitute a network.
It uses the implicit function theorem to analytically determine whether the fluxes in the reduced network are theoretically identifiable for various types of real measurement data. Application of the method to a realistic flux identification problem shows both the potential of the method by yielding new, interesting conclusions regarding the identifiability and its practical limitations that are caused by the fact that the size and complexity of the symbolic calculations grow fast with the dimension of the studied system.

One of the measurement techniques that is increasingly used in 13C-labeling studies is 2D [13C,1H] COSY. This method yields information about carbon-carbon connectivities in intracellular compounds that in their turn contain information regarding the steady state fluxes in cellular metabolism. Chapter 4 proposes innovations in the generation and analysis of these specific NMR spectra. These include a software tool that allows accurate determination of the relative peak areas and their complete covariance matrices even in very complex spectra. Additionally, a method is introduced for correcting the results for isotopic non-steady state conditions. The proposed methods are applied to measured 2D [13C,1H] COSY spectra. When analyzing the spectra, it is observed that peak areas in a one-dimensional section of the spectrum are frequently not representative for peak volumes in the two-dimensional spectrum. Furthermore, it is shown that for some spectra a significant amount of additional information can be gained from long-range 13C-13C scalar couplings in 2D [13C,1H] COSY spectra. Finally, the NMR resolution enhancement by dissolving amino acid derivatives in a non-polar solvent is demonstrated.

When determining the fluxes in a metabolic network by fitting simulated data to the entire set of measured 2D [13C,1H] COSY, one has to repeatedly simulate the 13C-labeling distribution in all measured compounds. This requires the solution of very large sets of isotopomer or cumomer balances. Both the number and size of these balances are reduced by the new ‘bondomer’ concept that is introduced in Chapter 5. Bondomers are entities that only vary in the numbers and positions of C-C bonds that have remained intact since the medium substrate molecule entered the metabolism. Bondomers are shown to have many analogies to isotopomers. One of these is that bondomers can be transformed to cumulative bondomers, just like isotopomers can be transformed to cumomers. Similarly to cumomers, cumulative bondomers allow an analytical solution of the entire set of balances describing a metabolic network. The main difference is that cumulative bondomer models are considerably smaller than corresponding cumomer models. This saves computational time, allows easier identifiability analysis and yields new insights in the information content of 2D [13C,1H] COSY data. The restriction of the bondomer concept is that it can only be used in experiments where biomass is grown on (mixtures of) uniformly 13C-labeled carbon sources. We illustrate the theoretical bondomer concept by means of a realistic example of the glycolytic and pentose phosphate pathways. It is analyzed which combinations of 2D [13C,1H] COSY data allow the identification of all metabolic fluxes in these pathways and it is found that the NMR data contain less independent information than was previously expected; there is much redundancy. The bondomer concept is not constrained to simulating 2D [13C,1H] COSY data. It can be applied to simulate any kind of 13C-labeling data.
Chapter 6 presents an algorithm that offers a general approach to find all probabilistic expressions needed to convert bondomer distributions to actually measured 13C-labeling data, such as mass distributions (measured by MS) or relative intensities in multiplets (measured by NMR). Combined with the bondomer modeling approach presented in the previous chapter, this algorithm allows straightforward simulation of all sorts of 13C-labeling measurement data.

As mentioned above, one of the alternatives to measuring 13C-labeling distributions by means of 2D [13C,1H] COSY is to apply mass spectrometry (MS). When fitting simulated MS data to measured mass distributions, the simulated mass distributions must be corrected for the presence of naturally occurring isotopes. A method that was recently introduced for this purpose consists of consecutive correction steps for each isotope of each element in the considered compound. In Chapter 7 it is shown, however, that all isotopes of each individual element must be corrected for in one single step. Furthermore, it is shown that in order to correctly take into account the presence of naturally occurring isotopes, the source of information on the isotopic compositions of the elements needs to be chosen with care.

The theory discussed in the first seven chapters of the thesis is applied in two case studies that are presented in Chapters 8 and 9. Chapter 8 presents the metabolic network and flux analysis of the glycolysis and pentose phosphate pathway of the fungus Penicillium chrysogenum. The chapter compares the outcomes of the ‘classical’ method for metabolic flux analysis (combining measurements of net conversion rates with a set of metabolite balances, including the cofactor balances) with the outcomes of 13C-labeling based flux analysis. The two approaches are applied to two different cultivation conditions in which P. chrysogenum is grown on either ammonia or nitrate as the nitrogen source, which is expected to lead to different pentose phosphate pathway fluxes. The presented flux analyses are based on extensive sets of 2D [13C,1H] COSY data. The 13C-labeling distribution is simulated using the bondomer concept that was introduced in Chapter 5. The outcomes of the 13C-labeling based flux analysis differ substantially from those of the metabolite balancing approach. The fluxes that are determined using 13C-labeling data are shown to be highly dependent on the chosen metabolic network. Extending the traditional non-oxidative pentose phosphate pathway with additional transketolase and transaldolase reactions (as discussed in Chapter 1) and extending the glycolysis with a fructose 6-phosphate aldolase/dihydroxyacetone kinase reaction sequence considerably improves the fit of the measured and the simulated NMR data. The results obtained using the extended version of the non-oxidative pentose phosphate pathway model contradict the common assumption that transketolase and transaldolase reactions are reversible. Superficial inspection of the fits between the simulated and measured labeling data seems to confirm that the model describes the measurements well. Still, based on estimated measurement errors, our fits are rejected on statistical grounds. This shows that strict statistical testing of the outcomes of 13C-labeling based flux analysis using realistic measurement errors is of prime importance for verifying the assumed metabolic model.
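The kind of statistical test referred to above can be sketched as a chi-square goodness-of-fit test on error-weighted residuals. The Python sketch below makes several assumptions that are not taken from the thesis: measurement errors are treated as independent and Gaussian (the thesis also estimates covariances, which are ignored here), the data values are invented for illustration, and the function names are hypothetical. The chi-square tail probability is computed with a standard series expansion of the incomplete gamma function so that only the standard library is needed.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability of the chi-square distribution.

    Uses the series expansion of the regularized lower incomplete gamma function.
    """
    a, z = dof / 2.0, x / 2.0
    if z <= 0.0:
        return 1.0
    term = total = 1.0 / a
    for n in range(1, 100000):
        term *= z / (a + n)
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def fit_is_accepted(measured, simulated, sigma, n_free_params, alpha=0.05):
    """Weighted sum of squared residuals, degrees of freedom, p-value and verdict."""
    ssr = sum(((m - s) / e) ** 2 for m, s, e in zip(measured, simulated, sigma))
    dof = len(measured) - n_free_params
    p_value = chi2_sf(ssr, dof)
    return ssr, dof, p_value, p_value >= alpha


# Invented relative multiplet intensities; two free fluxes are assumed to be fitted.
measured = [0.52, 0.31, 0.17, 0.60, 0.40]
simulated = [0.50, 0.32, 0.18, 0.57, 0.43]

for error in (0.02, 0.01):
    ssr, dof, p_value, accepted = fit_is_accepted(measured, simulated, [error] * 5, 2)
    print(f"sigma={error}: ssr={ssr:.2f}, dof={dof}, p={p_value:.4f}, accepted={accepted}")
```

With these invented numbers the same residuals are accepted at the 5% level when the assumed standard deviation is 0.02 and rejected when it is 0.01, which illustrates why the verdict depends on realistic measurement errors.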
The case study of Chapter 9 introduces a new method for measuring 13C-labeling in the primary carbon metabolism: 13C-labeling of a continuous culture for only one hour, followed by rapid sampling of the fermentation broth, immediate quenching of the metabolism, boiling ethanol extraction and direct LC-MS measurement of the mass isotopomer distributions of 16 intermediates of the glycolysis, pentose phosphate pathway, anaplerotic reactions, glyoxylate shunt and TCA cycle. Besides allowing a very short labeling of the culture, this method also requires very little biomass for the analysis and does not rely on assumed biosynthesis routes. This is in contrast to the common GC-MS and NMR methods, neither of which directly measures the labeling of primary metabolites; both measure the labeling of polymeric biomass components that are synthesized from them. The method is used to determine the 13C-distribution in the Saccharomyces cerevisiae strain CEN.PK113-7D grown in chemostat culture with D = 0.1 h⁻¹ and a medium containing 90% (w/w) glucose and 10% (w/w) ethanol as carbon sources. The measured 13C-labeling data are fitted using a detailed, compartmented model of the primary metabolism. The outcomes include both the flux distribution in the network and the cytosolic and mitochondrial molar fractions of the metabolites that are present in both compartments. The estimated flux distribution is compared to one that is found by fitting a second data set that was obtained using the same strain and cultivation conditions, but a different measurement technique: 2D [13C,1H] COSY measurement of amino acids in biomass protein. The MS and NMR data sets are found to yield different estimated flux patterns. A simultaneous fit of the two data sets leads to a poorer goodness-of-fit. This observation, together with statistical analyses of the fits of the separate data sets, indicates that the current model is not yet complete. The merits of both methods with respect to the estimation of several metabolic model parameters and the differences between the methods are discussed.

The final chapter shows how assumptions about the biosynthesis of intracellularly accumulating compounds can be verified by comparing the 2D [13C,1H] COSY spectra of different compounds that are assumed to have a common precursor in central carbon metabolism. In this chapter, the biosynthetic assumptions made in Chapter 9 are verified using the 2D [13C,1H] COSY spectra of a wild-type and an isogenic deletion mutant strain of S. cerevisiae. The results generally confirm the assumptions. Furthermore, a new method is presented to determine fractional enrichments from 2D [13C,1H] COSY spectra of fragments that have been synthesized from more than one metabolic precursor. Finally, it is found that measurement errors that were estimated from NMR noise using a theoretical spectral model fail to explain the observed data variability.

In conclusion, this thesis presents innovations with respect to the experimental, mathematical and biochemical aspects of the 13C-labeling technique. Although improved, the technique still does not yield statistically accepted fits of measured 13C-labeling data of P. chrysogenum and S. cerevisiae. The main bottleneck seems to be that our metabolic models of eukaryotes are insufficiently detailed to describe the data well. Adding details will, however, inevitably entail an increasing number of degrees of freedom. The question is whether the amount of currently available data suffices to identify all of them.


Samenvatting (Summary)

Of the thesis ‘13C-Labeling Technique for Metabolic Network and Flux Analysis’ by Wouter van Winden

Metabolic engineers (scientists who specialize in studying and manipulating cell metabolism) regard the 13C-labeling technique as a promising method for studying the structure of, and the reaction rates (or ‘fluxes’) in, metabolic networks. The labeling technique makes it possible to determine reaction rates that are not observable when only metabolite balances are applied. Controversial cofactor balances, which are often indispensable for determining certain reaction rates with metabolite balances, can be omitted when the labeling technique is used.

In Chapter 2 of this thesis the assumptions on which the 13C-labeling technique rests are stated explicitly, and a number of them are questioned. Two possibly erroneous assumptions in the models with which reaction rates are determined are (1) forgotten reactions and (2) the unforeseen occurrence of direct transfer of metabolites between enzymes (metabolite channeling). Two representative examples show that these modeling errors can cause the estimated reaction rates to be far from the truth, despite the use of 13C-labeling data. In addition, the modeling errors are hard to detect, because even poor models can sometimes approximate the measured labeling data well. For this reason the outcomes of labeling experiments must always be regarded critically.

Ideally, one would like to know before a 13C-labeling experiment whether the chosen labeled substrate and the measurement technique used will indeed make all fluxes in the studied metabolic network visible. Chapter 3 describes a method for performing such an identifiability analysis with the aid of the so-called ‘cumomer’ concept.
The proposed method improves on existing methods by first carrying out a systematic simplification of the network, based on structural features of the various nodes in that network. The method then applies the implicit function theorem to determine analytically whether the reaction rates can in theory be identified with different kinds of labeling data. Applying the method to an existing identifiability problem leads to new, interesting conclusions regarding identifiability, which demonstrates its potential. The same application, however, also reveals the practical limitation that results from the explosive growth of the symbolic calculations as the network becomes larger.

A measurement technique that is increasingly used in 13C-labeling studies is 2D [13C,1H] COSY NMR. This method provides information on the carbon-carbon bonds in intracellular compounds, which in turn contain information on the steady-state reaction rates in cell metabolism. Chapter 4 proposes a number of innovations in the measurement and analysis of this type of NMR spectra. The innovations include software that makes it possible to estimate precise peak areas and the associated covariances, even in complex spectra. In addition, a method is proposed for correcting results that were determined in an isotopic non-steady state. The methods are applied to measured 2D [13C,1H] COSY spectra. Analysis of the spectra showed that peak areas in a one-dimensional cross-section of a spectrum are often not representative of the peak volumes in the two-dimensional spectrum. It is also shown that in some spectra a considerable amount of extra information can be gained, because long-range couplings between 13C atoms are also visible in 2D [13C,1H] COSY spectra. Finally, it is shown how dissolving derivatives of amino acids in non-polar solvents increases the resolution of NMR measurements.

When metabolic reaction rates are estimated by iteratively fitting simulated data to the complete set of 2D [13C,1H] COSY measurement data, one has to repeatedly calculate the 13C-distribution in all measured compounds. This requires solving very large numbers of isotopomer or cumomer balances. Both the number and the size of these balances are reduced by the new ‘bondomer’ concept that is presented in Chapter 5. Bondomers are entities that differ only in the numbers and positions of the carbon-carbon bonds that have remained intact since the substrate molecule entered the cell. Bondomers turn out to be analogous to isotopomers. For instance, they can be transformed into cumulative bondomers in exactly the same way as isotopomers are converted into cumomers. Like cumomers, cumulative bondomers make it possible to solve the balances describing a network explicitly. The main difference is that a cumulative bondomer model is much smaller than the cumomer model of the same network. This saves computation time, permits easier identifiability analysis and moreover yields new insights into the information content of 2D [13C,1H] COSY data.
The restriction of the bondomer concept is that it can only be applied in experiments where biomass is grown on (mixtures of) uniformly 13C-labeled carbon sources. The theoretical bondomer concept is illustrated by means of a realistic example of the glycolysis and pentose phosphate pathway. It is analyzed which combinations of 2D [13C,1H] COSY data make all reaction rates in these pathways identifiable, and it turns out that the NMR data contain less independent information than was assumed until now: many measurement data are redundant. The bondomer concept is not only useful for simulating 2D [13C,1H] COSY data. It can be applied to simulate any type of 13C-labeling data.

Chapter 6 describes an algorithm that enables the user to find all probabilistic equations needed to convert bondomer distributions into the actually measured data, such as mass distributions (measured by MS) or relative peak intensities in multiplets (measured by NMR). In combination with the bondomer modeling described in the previous chapter, this algorithm forms a tool with which all types of 13C data can easily be simulated.

As mentioned earlier, 13C-labeling distributions can be measured not only by means of 2D [13C,1H] COSY, but also by MS. When simulated MS data are fitted to measured mass distributions, the simulated data must be corrected for the presence of naturally occurring isotopes. A method that was recently proposed for this purpose consists of consecutive correction steps for each isotope of each element that occurs in the compound concerned. In Chapter 7 it is demonstrated, however, that all isotopes of an element must be corrected for in one single step. In addition, it is shown that the source of information on isotope abundances must be chosen with care in order to guarantee a correct isotope correction.

The theory treated in the first seven chapters of this thesis is applied in two case studies in Chapters 8 and 9. Chapter 8 deals with the analysis of the network structure and the reaction rates of the glycolysis and pentose phosphate pathway in the fungus Penicillium chrysogenum. This study compares the outcomes of the ‘classical’ method for flux analysis (in which measured net conversion rates are combined with a set of metabolite balances, including the cofactor balances) with the outcomes of a 13C-labeling based flux analysis. The two methods are applied to two different cultivation conditions in which P. chrysogenum uses either ammonium or nitrate as nitrogen source, which is expected to lead to different reaction rates in the pentose phosphate pathway. The flux analyses described were carried out on the basis of extensive sets of 2D [13C,1H] COSY data. The 13C-labeling distribution is simulated with the bondomer model that was introduced in Chapter 5. The outcomes of the 13C-labeling based flux analysis differ strongly from the outcomes that followed from the metabolite balances. The reaction rates determined with the 13C data prove to be very sensitive to the chosen network structure.
Extending the non-oxidative branch of the pentose phosphate pathway with extra transketolase and transaldolase reactions, and extending the glycolysis with a fructose 6-phosphate aldolase/dihydroxyacetone kinase reaction sequence, leads to a clear improvement of the agreement between the measured and simulated NMR data. The results found with the extended version of the pentose phosphate pathway conflict with the generally accepted assumption that the transketolase and transaldolase reactions are reversible. Superficial comparison of the fits between the simulated and measured labeling data seems to confirm that the model describes the measurements well. Nevertheless, the fits are statistically rejected on the basis of estimated measurement errors. This shows that it is necessary to test the outcomes of the 13C-labeling analysis strictly, on the basis of realistic measurement errors, in order to verify the assumed metabolic network structure.

The case study in Chapter 9 introduces a new method for measuring 13C-labeling in primary metabolites: 13C-labeling of a continuous culture for only one hour, followed by rapid sampling, immediate quenching of the metabolism in the sample, extraction in boiling ethanol and direct LC-MS measurement of the mass distributions of 16 intermediates of the glycolysis, pentose phosphate pathway, anaplerotic reactions, glyoxylate shunt and citric acid cycle. Besides the short labeling time, this method has the advantages that the analysis requires only little biomass and that the method does not rely on assumed biosynthesis routes. This is in contrast to the common GC-MS and NMR methods, in which it is not the labeling of the primary metabolites that is measured, but that of the polymeric biomass components formed from them.

This method is used to determine the 13C-distribution of the metabolites in Saccharomyces cerevisiae strain CEN.PK113-7D grown in a chemostat culture with D = 0.1 h⁻¹ and a feed medium containing 90% glucose and 10% ethanol (both on a weight basis). The measured 13C-labeling data are fitted with a detailed, compartmented model of the primary metabolism. The outcomes comprise the reaction rates in the network and the cytosolic and mitochondrial molar fractions of the metabolites that occur in both compartments. The estimated flux distribution is compared with one that followed from the fit of a second data set that was determined for the same strain and cultivation conditions, but with a different measurement method: 2D [13C,1H] COSY measurements of amino acids in biomass protein. It is found that the MS and NMR data sets lead to different flux patterns. A simultaneous fit of the two data sets leads to a poorer quality of the fit. This observation, together with statistical analysis of the best fits of the separate data sets, indicates that the current model is not yet complete. The advantages of both methods with respect to estimating the various model parameters are discussed.

The last chapter shows how assumptions about the biosynthesis of intracellularly accumulating compounds can be checked by comparing 2D [13C,1H] COSY spectra of different compounds that are assumed to have been synthesized from the same precursor in the primary metabolism. In this chapter the biosynthetic assumptions from Chapter 9 are verified by means of 2D [13C,1H] COSY spectra of a wild type of S. cerevisiae and a deletion mutant derived from it. The results generally confirm the assumptions made.
In addition, a new method is presented with which fractional enrichments can be determined from 2D [13C,1H] COSY spectra of fragments that are composed of more than one precursor. Finally, it is found that measurement errors that were estimated on the basis of the noise in the measured NMR spectra fall short in explaining the variation of the measurement data.

In conclusion, this thesis describes innovations with respect to the experimental, mathematical and biochemical aspects of the 13C-labeling technique. Although the method has been improved, it still does not yield a statistically acceptable fit of the measured 13C-labeling data of P. chrysogenum and S. cerevisiae. The main problem seems to be that our current metabolic models are not sufficiently detailed to describe the data adequately. Adding details will, however, inevitably lead to an increasing number of degrees of freedom. The question is whether the amount of data that is currently available suffices to identify all of them.


Future Directions

Although the application of the 13C-labeling technique for metabolic network and flux analysis has passed from its early youth to adolescence during the past five years, a number of remaining uncertainties will have to be tackled before it is a mature technique. These uncertainties are especially encountered when the technique is applied to eukaryotic cells, as is illustrated by Chapters 8 and 9 of this thesis, where fits of 13C-labeling data were rejected on statistical grounds. Christensen (2000) discussed a number of sources of uncertainty that he collectively referred to as ‘spatio-temporal effects’. These include:
• cells may contain various compartments from which the precursors of different biosynthesis routes can be withdrawn,
• cells may synthesize their polymeric compounds in different stages of the cell cycle, during which metabolic fluxes vary,
• the biomass in a fermentor may not be homogeneous, especially in the case of filamentous or genetically unstable species.

The first point can be addressed by using redundant 13C-labeling data to verify the biosynthesis routes of the various amino acids, as was done by Christensen (2000) and in Chapter 10 of this thesis. Preferably, this is done for several mutant strains or growth conditions in order to exclude the possibility that amino acids with a different biosynthesis route coincidentally have a similar 13C-labeling distribution. Alternatively, when the 13C-labeling of primary metabolites is measured directly (as presented in Chapter 9 of this thesis), no assumptions need to be made about the biosynthesis routes and compartmental origins of the precursors. When applying the latter approach one has to take into account that one measures cell-averaged 13C-labeling distributions of the compartmented pools.

The second source of uncertainty arises because the dynamics of 13C-labeled biomass formation are slower than those of the cell cycle.
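Why labeling that accumulates in biomass over a whole cell cycle need not correspond to the time-averaged fluxes can be seen in a toy calculation. The Python sketch below is a hypothetical two-stage example and not a model from this thesis: a precursor pool is fed by a fully labeled flux v_a and an unlabeled flux v_b, so its labeled fraction is the non-linear ratio v_a / (v_a + v_b), and the drain of the pool into biomass differs between the two stages. All names and numbers are invented for illustration.

```python
def label_fraction(v_a, v_b):
    """13C fraction of a pool fed by a fully labeled flux v_a and an unlabeled flux v_b."""
    return v_a / (v_a + v_b)


# Two cell-cycle stages of equal duration (hypothetical fluxes, arbitrary units).
stages = [
    {"duration": 1.0, "v_a": 9.0, "v_b": 1.0, "drain": 1.0},
    {"duration": 1.0, "v_a": 1.0, "v_b": 9.0, "drain": 3.0},
]


def biomass_label(stages):
    """Labeled fraction of the biomass component accumulated over all stages."""
    labeled = sum(s["duration"] * s["drain"] * label_fraction(s["v_a"], s["v_b"]) for s in stages)
    total = sum(s["duration"] * s["drain"] for s in stages)
    return labeled / total


def label_from_average_fluxes(stages):
    """Labeled fraction predicted from the time-averaged fluxes."""
    time = sum(s["duration"] for s in stages)
    v_a = sum(s["duration"] * s["v_a"] for s in stages) / time
    v_b = sum(s["duration"] * s["v_b"] for s in stages) / time
    return label_fraction(v_a, v_b)


print("labeling accumulated in biomass:", round(biomass_label(stages), 3))
print("labeling from averaged fluxes:  ", round(label_from_average_fluxes(stages), 3))
```

In this example the biomass carries a labeled fraction of about 0.3, whereas the time-averaged fluxes would predict 0.5, because most of the biomass component is formed in the stage with the low labeled fraction.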
Due to the non-linear relation between fluxes and isotopomer distributions, the measured average 13C-labeling pattern cannot be fitted by an ‘average fluxeome’. Direct LC-MS measurements of the 13C-labeling of primary metabolites (as proposed in Chapter 9) may offer a solution to this problem, since most primary metabolite pools are 13C-labeled within a few minutes (i.e. within a single stage of the cell cycle). A prerequisite is that cells can be grown synchronously. This has recently been proved feasible elsewhere. If the flux patterns during the various stages of the cell cycle can be determined from 13C-tracer experiments, models that include cell population balances may be formulated to describe the ‘average fluxeome’ in a fermentor with a non-synchronously growing culture.

The last source of uncertainty (non-homogeneous biomass) will be hard to tackle unless cell-sorting techniques can be applied before measuring the 13C-labeling. However, this latter method is not likely to yield enough biomass for 13C-labeling measurements.

Direct LC-MS measurements of the 13C-labeling of primary metabolites may in the near future also open the possibility to measure dynamic 13C-labeling patterns in experiments where the biomass is isotopically dynamic, but in metabolic steady state. This allows the estimation of the metabolic pool sizes from the estimated fluxes and the observed dynamic 13C-labeling. Even more ambitious is to add a pulse of 13C-labeled substrate to biomass that is in steady state and to measure both the absolute pool sizes and their mass isotopomer distributions over time. This may resolve dynamic flux patterns that cannot be resolved from dynamic pool size measurements alone, due to the same structural features of metabolic networks that also impede flux identifiability in steady-state flux analysis: parallel and bidirectional reactions.

Two practical improvements that will further increase the quantity of available 13C-labeling information are (1) to measure the fraction of 13CO2 in the off-gas and (2) to apply tandem mass spectrometry to measure the mass distribution of fragments of selected mass isotopomers (Jeffrey et al., 2002). An increased quantity of 13C-labeling data is mandatory for identifying the fluxeome in increasingly detailed metabolic models, especially in the case of compartmented cells of eukaryotes. Finally, a theoretical improvement of the methods that are presented in this thesis is to implement a more sophisticated minimization algorithm that increases the chance of finding a global minimum when fitting the measured 13C-labeling data (Dauner et al., 2001).

The level of detail of metabolic models of a number of microorganisms, among them S. cerevisiae, is likely to increase in the near future as a consequence of the transcriptome analyses that are nowadays performed in many laboratories and the proteome analyses that are gradually becoming a routine procedure as well. These analyses will lead to the discovery of new reactions and transport steps and perhaps even of complete new pathways. Systematic synthesis of alternative network models in which these new reactions are included, and performing 13C-labeling fits with those models to verify their occurrence in vivo, will help to bridge the current gap between the sequenced genome and the active proteome.

REFERENCES

Christensen, B. (2000) Metabolic network analysis: Principles, methodologies and applications. PhD thesis, Technical University of Denmark
Dauner, M., Bailey, J.E., Sauer, U. (2001) Metabolic flux analysis with a comprehensive isotopomer model in Bacillus subtilis. Biotechnol. Bioeng., 76, 2: 144-156
Jeffrey, F.M.H., Roach, J.S., Storey, C.J., Sherry, A.D., Malloy, C.R. (2002) 13C Isotopomer analysis of glutamate by tandem mass spectrometry. Anal. Biochem., 300: 192-205


List of Publications

Poster presentations

Van Winden, W.A., Verheijen, P.J.T., Van der Heijden, R.T.J.M., Grievink, J., Heijnen, J.J. A new method for using 2D [13C,1H] COSY data in metabolic modelling. Presented at Metabolic Engineering II-Conference, Elmau, Germany, October 1998
Van Winden, W.A., Verheijen, P.J.T., Van der Heijden, R.T.J.M., Grievink, J., Heijnen, J.J. A new method for using 2D [13C,1H] COSY data in metabolic modelling. Presented at NMR in Biotechnology-symposium, Ede, The Netherlands, November 1998
Van Winden, W.A., Verheijen, P.J.T., Heijnen, J.J. Possible pitfalls of flux calculations based on 13C-labeling. Presented at Metabolic Engineering III-Conference, Colorado Springs, USA, October 2000
Van Winden, W.A., Heijnen, J.J., Verheijen, P.J.T. Improving simulated and measured 13C-labeling data in quest of the true metabolic fluxes. Presented at Physiology of Yeasts and Filamentous Fungi Symposium, Middelfart, Denmark, July 2001
Van Winden, W.A., Heijnen, J.J., Verheijen, P.J.T. Improving simulated and measured 13C-labeling data in quest of the true metabolic fluxes. Presented at Yeasterday, Papendal, The Netherlands, April 2002

Oral presentations

A novel method for using 2D [13C,1H] COSY data in metabolic flux analysis. 2nd Annual Symposium M3-Programme, Delft, The Netherlands, April 1999
Using the 13C-labeling technique for flux analysis of P. chrysogenum. Annual meeting of DSM ‘Cluster Project Fijnchemie’, Vaalsbroek, The Netherlands, April 2000
Metabolic flux analysis of P. chrysogenum using 13C-labeling. Symposium on Penicillin Biosynthesis and Fermentation Research TUD/RuG, Delft, February 2001
Improving simulated and measured 13C-labeling data in quest of the true metabolic fluxes. 4th Annual Symposium M3-Programme, Delft, The Netherlands, March 2001
Metabolic flux analysis of P. chrysogenum using 13C-labeling. Final meeting of DSM ‘Cluster Project Fijnchemie’, Doorwerth, The Netherlands, May 2001
Improving simulated and measured 13C-labeling data in quest of the true metabolic fluxes. ‘Theoretical Tools’ workshop at Physiology of Yeasts and Filamentous Fungi Symposium, Middelfart, Denmark, July 2001
Using 13C-labeling technique for flux analysis of wild type and mutant S. cerevisiae. BSDL/BIOMAC PhD-symposium, Delft, The Netherlands, November 2001
Application of LC-MS measurements of 13C-labeling distributions for metabolic flux analysis. ESBES-4 Symposium, Delft, The Netherlands, August 2002

Journal articles

Lange, H.C., Van Winden, W.A., Schipper, D., Heijnen, J.J. (2002) Metabolic flux analysis of a triose-phosphate isomerase deletion mutant of S. cerevisiae: testing assumptions about metabolic networks by combining cofactor and 13C-labeling balances. Submitted for publication in Biotechnol. Bioeng.
Van Winden, W.A., Verheijen, P.J.T., Heijnen, J.J. (2001) Possible pitfalls of flux calculations based on 13C-labeling, Metabol. Eng., 3, 2: 151-162
Van Winden, W.A., Heijnen, J.J., Verheijen, P.J.T., Grievink, J. (2001) A priori analysis of metabolic flux identifiability from 13C-labeling data, Biotechnol. Bioeng., 74, 6: 505-516
Van Winden, W.A., Schipper, D., Verheijen, P.J.T., Heijnen, J.J. (2001) Innovations in generation and analysis of 2D [13C,1H] COSY NMR spectra for metabolic flux analysis purposes, Metabol. Eng., 3, 4: 322-343
Van Winden, W.A., Wittmann, C., Heinzle, E., Heijnen, J.J. (2002) Correcting mass isotopomer distributions for naturally occurring isotopes. Accepted for publication in Biotechnol. Bioeng.
Van Winden, W.A., Heijnen, J.J., Verheijen, P.J.T. (2002) Cumulative bondomers: a new concept in flux analysis from 2D [13C,1H] COSY data. Accepted for publication in Biotechnol. Bioeng.
Van Winden, W.A., Van Gulik, W.M., Schipper, D., Verheijen, P.J.T., Krabben, P., Vinke, K., Heijnen, J.J. (2002) Metabolic flux and metabolic network analysis of Penicillium chrysogenum using 2D [13C,1H] COSY measurements and cumulative bondomer simulation. Submitted for publication in Biotechnol. Bioeng.

Book contributions

Van Gulik, W.M., Van Winden, W.A., Heijnen, J.J. (2001) Modeling the metabolism of penicillin-G formation. In: Synthesis of β-lactam antibiotics, Ed. Bruggink, A., Kluwer Academic Publishers, Dordrecht, The Netherlands, 283-334


Curriculum Vitae

Wouter Adrianus van Winden was born on August 31, 1974 in Monster, The Netherlands. From 1986 to 1992 he attended grammar school at the Sint Janscollege in The Hague. Thereafter, he studied bioprocess technology, specializing in modelling and control, at Wageningen University of Agriculture. The last two years of his study were spent on a graduation project at the Bioprocess Group of Wageningen University of Agriculture, a traineeship at the Department of Food Engineering of University College Cork in Ireland and a traineeship at the Department of Process and Environmental Technology of Massey University in Palmerston North, New Zealand. After his graduation in 1997 he was employed as a PhD student by the bioprocess technology and process systems engineering groups of Delft University of Technology. In 2002 he finished his thesis entitled ‘13C-Labeling Technique for Metabolic Network and Flux Analysis: Theory and Applications’. Besides his research work, he worked as a freelance journalist for the Delft University newspaper ‘Delta’ from 1998 until 2002. Wouter is currently employed as assistant professor ‘metabolomics’ by the bioprocess technology group of Delft University of Technology.


Dankwoord (Acknowledgements)

Unlike scientific articles, theses do not customarily list several authors on the cover. Although I am the one who wrote and laid out the preceding two-hundred-odd pages, this thesis would never have been realized without the direct and indirect help of a large number of people. The ‘co-authors’ are in the first place my supervisors Sef Heijnen and Peter Verheijen. I want to thank them not only for their great scientific contribution to my work, but also for the pleasant personal contact and for their confidence at times when they saw more promise in my work than I did myself. Sef, thank you for the great freedom and for your encouragement to go and gather ideas at conferences and courses. Peter, thank you for the many improvements and refinements of my little Matlab programs, often night work judging by the time at which the e-mails with answers were sent. Let me also not forget the supervisors who helped to put my work on track and who were closely involved in the initial phase: Johan Grievink and René van der Heijden.

I furthermore owe many thanks to the ever helpful and sociable colleagues at the Kluyver lab: Walter van Gulik, Diana Visser, Preben Krabben, Hans Lange, Ko Vinke, Liang Wu, Mlawule Mashego, Jan van Dam and Roelco Kleijn. Walter and Hans supplied the metabolic networks of Chapters 8 and 9, respectively. Ko's help was indispensable for the fermentations and sampling, and Jan managed to get impressive numbers out of the MS in a short time. Preben served as an encyclopedia (for facts on biochemistry as well as on sports) and always had time for a brainstorm about metabolism. With my roommate Diana I could also talk about the non-scientific aspects of being a PhD student.
Perhaps somewhat less collaboration, but at least as much conviviality, was what I experienced during the periods, spread over my time as a PhD student, that I spent at the process systems engineering group. Thanks for that to, among others, Panos Seferlis, Sean Birmingham, Michiel Meeuse, Michiel van Wisse, Pieter Schmal, Gijsbert Korevaar and the many twaio’s (two-year trainees) and students.

I had a very pleasant collaboration with researchers of the Beijerinck Laboratory of DSM in Delft. I especially thank DSM researcher and NMR expert Dick Schipper for his indispensable spectra and his clear explanation of them, but I also owe thanks to numerous other DSM colleagues for their interest and advice. To conclude the thanks to the scientific support, it remains for me to mention the informal scientific advisors ‘on the other side of the building’, Hans van Dijken and Jack Pronk, and of course the graduation students who contributed to this work: Maciek Antoniewicz, Wouter Berendsen and Diana Ros Riu.

With these acknowledgements I have not yet mentioned all TU staff involved in this research. I therefore also thank all the support services for the excellent facilitating help I received at the Kluyver Laboratory and in the DelftChemTech building.

Now and then a very welcome distraction from my research work was offered by the editorial staff of the TU weekly Delta, for which I thank them (including the many editors who have since left) very warmly. Thanks to Delta I now know many more people and places at the TU than just my own lab and lab mates, and I have been able to sharpen my pen. I hope that Delta will continue to get the chance to give an independent opinion on the goings-on within the TU. Some TU people seem so keen on catching the weekly making mistakes that they overlook its many correct reports. They ignore the fact that journalists, just like scientists, are people who can make mistakes, and that Delta incidentally also serves as a breeding pond for young writing talents, a number of whom have already managed to make a career as professional journalists.

Another form of distraction from my work, very welcome as a change from the many hours of sitting at the computer, were the sometimes tough but always enjoyable training sessions with the road team of Haag Atletiek. For that, thanks to the trainers, coaches and fellow runners, who with their physical assault on my body unnoticeably massaged my mind.

There remains a final group of likewise indirect, but for that reason no less important, people who share the responsibility for my work: my social environment. That consists in the first place, of course, of my girlfriend Olga van Diermen, alias Ol, who especially in the first years helped me out of the pit a number of times, but who equally shared in the enjoyment of my much more cheerful periods. Since the part of my work that was visible to her consisted only of countless scraps of paper scribbled full of open (12C atoms) and closed (13C atoms) circles, it should come as no surprise that she was a master at putting my work into perspective. Many thanks also for the unflagging interest and support of my parents Chris and Hélène, brothers Bas and Loek, ‘sister’ Martine, and niece Brecht (and at the last minute also Lot), and of my in-laws Anton and Landa, Renate, Jacco and nephew Koen.
Finally, many thanks for the interest, good company and countless more or less meaningful e-mails from my eight-member Wageningen circle of friends Quark: Ad, Arne, Bram, Harmjan, Floris, Friso, Rutger and Sander.

Well, you have to draw a line somewhere, and I think I have hereby named and thanked the most important people involved. Because of your important contribution to my work and well-being, as far as I am concerned you have hereby also become a little bit of a doctor, insofar as you were not one already. Nothing remains for me but to conclude the whole with a fitting note of perspective:

“There is, of course, one truly intriguing whale question which it would be lovely to get an answer to. Whales have the largest brains on the earth. What do they do with them? Let’s face it, you don’t need a brain that would get stuck in a lift in order to open your mouth and swim towards a load of stupid plankton. (…) Whales have been lounging about and eating plankton for some fifty million years. Not a bad achievement, especially when compared to humankind’s rather pathetic four hundred thousand. Maybe that’s what whales do with their huge brains – they don’t **** things up. It takes the human race to the very limits of its intellectual ability to **** things up in the way it does. The braincrunching effort that has gone into developing the technology to kill, destroy, poison and pollute, pushes our greatest minds to the very limits of their potential. Perhaps if they had been just a little bit cleverer they would not have done any of those things in the first place.” (from: Ben Elton, ‘Stark’) ☺

Wouter
The Hague, September 2002

