Chapter 11

Accuracy and Validity of the Nanoscale Simulator

Nunc est disputandum

(Now is the time for discussion.)

- Horace (adapted)

Introduction

The two questions that must always be asked for any computer simulation are "How reliable is it?", and "How well does it reflect the reality of the process being modelled?". The first question is usually easier to answer than the second. Previous chapters have compared the results of the nanosimulator with experimental data to show that the simulator broadly models the general form of the experimental data for the proteins actin and tubulin. The question of how well it actually simulates the modelled process is more difficult to answer, since direct experimental evidence of the low-level processes involved is sparse and difficult to interpret.

This chapter examines the nanoscale simulator's theoretical utility, and the degree to which its results should be accepted as an accurate reflection of the underlying physical processes being simulated.

Setting the Scene: Other Simulation Programs

To clarify these two basic questions, the accuracy of the results and the faithfulness of the model to the process being modelled, it is instructive to examine briefly how computer simulation fares in a number of other fields.

Physics

Some systems have features that are well described by a small number of analytic equations, for which simulations can produce accurate, repeatable predictions of physical phenomena, often to an arbitrary degree of accuracy limited only by computational resources. Examples of such systems can be found in any physics textbook and include planetary motion, wave propagation, relativity, simple mechanics and so on. An example of practical relevance is the increasing use of computerised car crash simulations as a cheaper and more informative alternative to physical controlled crashes⁽¹⁾.

Frequently these systems have limits beyond which their behaviour becomes chaotic (as in the gravitational n-body problem). Theoretically, these systems can still be accurately simulated, but such simulation may be impractical due to errors in measurement of the initial state, or limitations of computing equipment. Within defined limits such simulations make strong predictions, and strongly model the underlying physics of their system.

Meteorology

Complex systems may be well-characterised by physical laws, but may be difficult to simulate due to 'chaotic' behaviour (extreme sensitivity to initial conditions causing greatly differing outcomes for otherwise very similar initial states).

An example of such a system is turbulence⁽²⁾, which on a large scale is largely what hinders weather prediction. Weather prediction attempts to model a highly chaotic system, which limits its usefulness because in addition to imperfections in the model (it is computationally infeasible to model every molecule of air for a year), the initial conditions are not known (it is not physically possible to measure every molecule of air before beginning the simulation). As a result, weather models must simulate their initial conditions before simulating future states, leading to inaccuracy that is magnified by chaotic effects over time. This has the all-too-familiar effect of making weather predictions increasingly unreliable the further they lie in the future⁽³⁾. Weather simulation makes only moderate predictions, despite strongly modelling the underlying physics of the system.
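The sensitivity to initial conditions described above can be illustrated with a toy system. The sketch below uses the logistic map, a standard textbook example of chaos, purely as a stand-in: it is not a weather model and has nothing to do with the nanosimulator. The function names and the size of the perturbation are illustrative choices.

```python
def logistic_trajectory(x0, r=4.0, steps=50):
    """Iterate the logistic map x -> r*x*(1-x), starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def divergence(x0, perturbation=1e-10, r=4.0, steps=50):
    """Gap at each step between two runs whose starting values differ slightly."""
    a = logistic_trajectory(x0, r, steps)
    b = logistic_trajectory(x0 + perturbation, r, steps)
    return [abs(p - q) for p, q in zip(a, b)]


if __name__ == "__main__":
    gaps = divergence(0.2)
    # Print the gap at a few steps to show how a tiny initial error grows.
    for step in (0, 10, 20, 30, 40, 50):
        print(f"step {step:2d}: gap = {gaps[step]:.3e}")
```

The two runs begin closer together than any realistic measurement could distinguish, yet the gap grows until the trajectories are effectively unrelated. This is the same mechanism that limits the useful range of a weather forecast.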

Expert Systems

The reverse of this may occur when a workable ad-hoc heuristic takes the place of detailed low-level modelling of a system⁽⁴⁾. For example, both neural networks⁽⁵⁾,⁽⁶⁾ and expert systems⁽⁷⁾ have been used successfully to study protein conformation and sequence in ways that may have strong predictive power, based on an analysis of existing data, without necessarily any modelling of the underlying process whatsoever.

Economics

Some elaborate simulations have been made of systems whose complexity appears to defy analysis. For example, a number of simulations of economic activity have been made which appear to have extremely weak predictive power, while still (debatably⁽⁸⁾) modelling the underlying process closely⁽⁹⁾.

Biology and Physical Chemistry

Simulation of biological systems is common, ranging from those related to this thesis such as microtubule assembly⁽¹⁰⁾,⁽¹¹⁾, virus assembly⁽¹²⁾, and actin filament bundling behaviour⁽¹³⁾, to atomic level simulations of diffusion and liquid behaviour⁽¹⁴⁾, detailed protein modelling⁽¹⁵⁾, modelling of membrane bilayers⁽¹⁶⁾ and to much larger simulations of organs⁽¹⁷⁾, organisms and ecosystems.

Due to the great complexity of biological systems, most simulations occupy a middle ground, using an abstracted, but still relevant, model of the system. Predictive power can be weak, where a model is used to show the plausibility of a process without making strong experimental predictions. An example is the virus work of Berger et al.⁽¹⁸⁾, which strongly demonstrates the plausibility of their local-rules-based virus model without making predictions about any particular virus.

Alternatively a model may make very strong predictions, such as protein structure predictions, which specify atomic position to within a few angstroms, and which can be verified by other researchers.

Accuracy and Predictive Power

It is instructive to evaluate the nanosimulator in this context. Does the simulator produce accurate results, and how well does it model the underlying processes it is attempting to simulate? It is also necessary to consider how much initial data the simulator requires. If the simulator requires so much data that it could not fail to produce accurate results, then despite its accuracy it is still not a useful model.

The simulator does in fact require quite a lot of data, not all of it evident. It requires a reasonably advanced knowledge of the physics involved, for instance. But, since that is common across all the different simulations it attempts, it is more pertinent to ask how much specific information about each type of object or protein it requires.

The answer is that it needs every active binding site specified for every active state, with both the breaking and binding strength of each binding site fully specified, as well as the events that cause it to change state. It further requires a model to handle the breaking and binding of multiple links. It can also use the overall diffusion constant for the object, although it will work this out on theoretical grounds if this is unknown.
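As a rough illustration of how much per-object input this amounts to, the requirements above can be written down as a data structure. This is a minimal sketch, not the simulator's actual input format: every class, field and unit here is hypothetical, `combine_links` is only a placeholder for the multiple-link model, and the Stokes-Einstein relation is an assumed example of estimating a diffusion constant "on theoretical grounds" (the text does not say which relation the simulator uses).

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

BOLTZMANN_J_PER_K = 1.380649e-23


@dataclass
class BindingSite:
    name: str
    binding_strength: float   # illustrative, unit-free
    breaking_strength: float  # illustrative, unit-free


@dataclass
class StateChange:
    event: str       # hypothetical label for the event that triggers the change
    from_state: str
    to_state: str


@dataclass
class ObjectSpec:
    name: str
    radius_m: float  # effective radius, used only by the assumed fallback below
    sites_by_state: Dict[str, List[BindingSite]]
    state_changes: List[StateChange] = field(default_factory=list)
    # Placeholder for the multiple-link model: combines per-link breaking strengths.
    combine_links: Callable[[Sequence[float]], float] = sum
    diffusion_m2_per_s: Optional[float] = None  # measured value, if known

    def diffusion(self, temperature_k: float = 310.0,
                  viscosity_pa_s: float = 1.0e-3) -> float:
        """Return the measured constant, else an assumed Stokes-Einstein estimate."""
        if self.diffusion_m2_per_s is not None:
            return self.diffusion_m2_per_s
        return BOLTZMANN_J_PER_K * temperature_k / (
            6.0 * math.pi * viscosity_pa_s * self.radius_m)

    def problems(self) -> List[str]:
        """List inconsistencies: negative strengths or events naming unknown states."""
        found = []
        for state, sites in self.sites_by_state.items():
            for site in sites:
                if site.binding_strength < 0 or site.breaking_strength < 0:
                    found.append(f"negative strength on site {site.name!r} in state {state!r}")
        for change in self.state_changes:
            for state in (change.from_state, change.to_state):
                if state not in self.sites_by_state:
                    found.append(f"event {change.event!r} refers to unknown state {state!r}")
        return found
```

Written this way, the specification is visibly small compared with the output described in the next paragraph: a handful of sites, strengths and transitions per object, plus at most one measured constant.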

From this input the simulator predicts a large amount of information. The simulation produces results for the overall polymerisation behaviour for any concentration, as well as describing the nucleation behaviour, and the behaviour of the growing or shrinking aggregate. It can attempt to simulate more complex situations involving multiple chemicals, and it handles a number of physical processes (such as changes in local concentrations caused by polymerisation) that are very difficult to include in traditional mathematical treatments. By allowing the investigator to examine the minutiae of the polymerisation (e.g. the histories of individual polymers, the evolving histograms of the sizes of polymers over time, and so on) the simulation makes available a great deal of detail not currently accessible to experimental researchers.

The simulator produces data that cannot currently be obtained by other means, namely details of protein interactions. If the gross results (i.e. polymerisation curves, histogram distributions) are accurate, then these other details are potentially useful. However, as they cannot (yet) be experimentally confirmed, they must remain hypotheses.

As experimental techniques become more and more sensitive, and knowledge of protein interactions grows, it is likely that it will be possible to test more accurately the predictions of the nanoscale simulator, and at the same time fine-tune it to a greater extent than is currently possible.

Validity of the Model

The second question is more difficult: how well does the simulator reflect the physical chemistry of the polymerisation of objects? It should be clear that in order to deal with the physical and temporal scales of interest, the simulator necessarily takes an abstract view of many of the details involved in the physical and chemical reactions taking place.

Depending on the questions being asked, and the particular insights being looked for, this can often be an advantage. Certainly, in an examination of the exact way an antibody binds to a particular viral coat protein, or the precise way in which the conformation of a protein changes to expose a binding site, this simulator will be of little use. Its purpose is to model the details of larger scale assembly and disassembly, and for that purpose the level of abstraction is quite appropriate. It does not matter exactly how a site binding, a bond breaking, or a state change of an individual object or protein occurred; what matters is that it did occur. At this level of individual objects the physical reality of the reaction is indeed being simulated, even though the abstractions required mean that it is not making particularly strong predictions.

At the next level up, the simulator does make predictions. The way that objects are initiated (at least in vitro) and the way that aggregates grow and break down are both predicted in detail. Unfortunately, this level of detail is just outside the range of current experimental apparatus. Although it is possible to see in much greater detail than the simulator through an electron microscope, it is not possible to observe dynamic, active systems at the nanometre scale. And although these systems can be observed growing and changing using light microscopy, it is usually possible to see them only as tiny lines or filaments, and no finer detail can be made out.

This does however allow the testing of some of the gross predictions of the model, such as growth and shrinkage rates, overall polymerisation masses, and so on. Whether one then accepts that the underlying model is accurate, given that the simulator produces correct answers for these more measurable values, is a nice exercise in epistemology. There is reasonable suggestive evidence that this intermediate level of simulation is accurate, because the physics of movement and collision have been included in the simulation, and the top level results (e.g. polymerisation curves) are approximately correct. But this is only indirect evidence - without experimental support it is not possible to firmly claim that this is the method by which such objects interact.

Summation

Until the interactions modelled by the nanoscale simulator have been examined in more detail, the results of the program must be treated with caution. Due to the abstracted nature of the model implemented by the program, there is a certain degree of resilience to the underlying physical chemistry involved, but there is still a dependency. The limitations of the simulator in other respects (conformation change, treatment of the energetics of multiple bonds, and hence of their breakage and binding probabilities, elasticity and so forth) also constrain its scope.

The paucity of experimental evidence at the level required to fully model individual monomers also restricts the simulation, but this problem is also an opportunity, because the simulator may be able to shed light on some details by demonstrating that certain configurations are unlikely to give rise to the observed large-scale results.
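One way such a screening could be organised is sketched below, assuming each candidate configuration has already been simulated and sampled at the same time points as an observed polymerisation curve. The function names, the use of root-mean-square deviation as the measure of fit, and the tolerance are all illustrative assumptions rather than the procedure used in this thesis.

```python
import math
from typing import Dict, List, Sequence


def rms_deviation(simulated: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square difference between two curves sampled at the same times."""
    if len(simulated) != len(observed) or not observed:
        raise ValueError("curves must be non-empty and sampled at the same time points")
    total = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    return math.sqrt(total / len(observed))


def unlikely_configurations(candidates: Dict[str, Sequence[float]],
                            observed: Sequence[float],
                            tolerance: float) -> List[str]:
    """Names of candidates whose simulated curve misses the observed one by more than tolerance."""
    return sorted(name for name, curve in candidates.items()
                  if rms_deviation(curve, observed) > tolerance)
```

A configuration that fails this test can reasonably be set aside. One that passes is merely not excluded, which is consistent with the caution urged above: agreement at the gross level is only indirect evidence for the underlying detail.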

In the meantime one of the great attractions of this technique is the way that so many large scale properties, such as diffusion, concentration gradients, growth and decay cycles, structural geometry, and complex interaction cycles all emerge from the base level description of the fundamental units involved. Even in the absence of strong experimental predictions, the modelling of process is an important and interesting result - the fact that the modelling of the process gives rise to experimentally testable predictions is an added benefit.

References

1. Thomke, S., Holzner, M. and Gholami, T. (1999) The Crash in the Machine, Sci. Am. Vol 280 No 3. pp 72-77

2. Moin, P. and Kim, J. (1997) Tackling Turbulence with Supercomputers, Sci. Am. Vol 276, No. 1., pp 46-52

3. Author's and colleague's experiences at the Bureau of Meteorology Research Centre, Melbourne, Australia, 1988

4. An interesting review of such work is Rawlings, C.J. and Fox, J.P., (1994) Artificial intelligence in molecular biology: a review and assessment, Phil. Trans. R. Soc. Lond., Vol 344, pp 353-363

5. Qian, N., & Sejnowski, T.J., (1988), Predicting the secondary structure of globular proteins using neural network models, J. Molec. Biol. Vol 202, pp 865-884 (cited Rawlings, C.J. op. cit.)

6. Brunak, S., Engelbrecht, J & Knudsen, S., (1991) Neural network detects errors in the assignment of mRNA splice sites. Nucl. Acids Res. Vol 18 pp 4797-4801 (cited Rawlings, C.J. op. cit.)

7. E.g. Clark, D.A., Rawlings, C.J., Barton, G.J & Archer, I., (1990) Knowledge-based orchestration of protein sequence analysis and knowledge acquisition protein structure prediction. Proceedings: AAAI Spring Symposium 1990, pp 28-32 (cited Rawlings, C.J. op. cit.)

8. Many fundamental elements of economics are still hotly debated, e.g. Dowe, L.D & Korb, K.B., (1996) Conceptual Difficulties with the Efficient Market Hypothesis: Towards a Naturalized Economics, Proceedings: ISIS (Information, Statistics and Induction in Science) 1996, pp 212-223

9. Mandelbrot, B.B. (1999) A Multifractal Walk down Wall Street, Sci. Am. Vol 280, No. 2., pp 50-53

10. E.g. Martin, R.S. (1993) op. cit.

11. E.g. Flyvbjerg, H., et al. (1996) op. cit.

12. Schwartz, R. et al. (Dec. 1998) op. cit.

13. Civelekoglu, G., and Edelstein-Keshet, L. (1994) Modelling the Dynamics of F-actin in the cell, Bul. Math. Biol., Vol 56, No. 4., pp 587-616

14. Allen, M.P. and Tildesley, D., (1986) Computer simulation of liquids, Clarendon Press Oxford (cited Atkins op.cit.)

15. Gerstein, M. and Levitt, M., (1998) Simulating Water and the Molecules of Life, Sci. Am. Vol 279, No. 5, pp 74-79

16. A good review article is Merz, K. M. Jr., (1997) Molecular dynamics simulations of lipid bilayers, Curr Opin Struc Bio., Vol 7, No 4., pp 511-517

17. E.g. Hinton, G.E., Plaut, D.C. and Shallice, T. (1993) Simulating Brain Damage, Sci. Am. Vol 269, No. 4., pp 58-65

18. Berger, B., et al. (1994)