<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html><head></head> <body> <h1> EMM: Empirical Methods for LSCITS</h1><br><span style="font-style: italic;">A masters-level module designed and delivered by Dave Cliff for the LSCITS EngD, last update April 2010.</span> <h2>Summary</h2> <p>The scale and complexity of IT systems, and so-called systems-of-systems (SoS), are rapidly increasing. There is now significant concern that the traditional divide-and-conquer engineering design techniques that have served so well for the last 50 years of IT systems development may not be able to fully address future needs for large-scale complex IT systems (LSCITS). It now seems very likely that future LSCITS practitioners will need to be skilled in the use of inherently stochastic computational techniques, such as market-based control, neural networks, and evolutionary optimization, for various purposes, all of which typically yield nondeterministic results. For this reason, they will additionally need to be skilled in using statistically rigorous empirical exploration and experimental methods of inquiry more common in the natural sciences than in the synthetic tradition of computer science and engineering. </p> <p>The  narrative arc of this module is as follows. First, a number of motivating examples are presented: the transmission dynamics of malware (computer viruses and worms, etc) over LSCITS networks, and the effect of network topology on those dynamics; market-based resource allocation in LSCITS; automated design and optimization of IT systems via genetic algorithms; and LSCITS applications of automated adaptation via neural networks. Next, minimal models of complex systems, such as cellular automata and random Boolean networks, are introduced as simple nonlinear dynamical systems that exhibit complex emergent phenomena, and for which there is an established body of work studying their behaviour and dynamics. 
Then, having firmly established the need for students to have a strong understanding of the basics of experiment design, data visualization &amp; display, and statistical analysis, those topics are introduced in turn. The content in this latter part of the module would not be out of place on a degree in any one of the natural sciences, but is nevertheless (regrettably) novel content for a course in computer science/engineering.</p> <p><b> Unit Director &amp; Lecturer:</b> Dave Cliff</p> <h2> Lectures &amp; Seminars</h2> <p>The twelve lectures in this module cover a lot of intellectual ground very quickly. The topic of each individual lecture could easily be expanded to form an entire Masters-level module. For that reason, each lecture is intended as a high-level overview/survey of core topics and issues, with examples selected for their relevance to LSCITS. The seminars offer an opportunity to explore some specific points in more depth, primarily via empirical exploration of key algorithms, tools, and techniques. So, the aim here is that lectures give <i>breadth</i> of coverage, while most of the student's involvement in this module goes into increasing the <i>depth</i> of their knowledge in each area via the directed reading and private study components. </p> <p>Copies of the lecture slides will be given out in the lectures for you to make notes on. If you miss a lecture, please ask the lecturer for a copy of the slides that you missed. 
</p> <h2>Assessment</h2> <p>Assessment for this module takes the form of a single 3000-word pdf-format report (no more than 10 pages total) to be submitted six weeks after the end of the lectures; the usual expectation is that the report describes the design, analysis and visualisation/presentation of results from empirical exploration of a minimal complex adaptive system (such as, for example, a simple stochastic search, optimization, gradient-descent or relaxation process; or a cellular automaton or random Boolean network model). However, exceptions may be made to allow the report to be in the form of an extended essay with a reduced emphasis on presentation of <i>de novo</i> empirical data: for example, a critical discussion of two contrasting approaches, or a meta-analysis of prior empirical results. In addition to the printable pdf-format report, a supporting webpage/site should also be prepared, allowing additional material (such as source code, raw data, animated visualizations) to be made available for inspection. Further details will be given in Lecture 1.</p> <h2> How it all links together</h2> <p>Below you will see the list of lecture titles, and the links and readings for each lecture. You might wonder how it all fits together. There is a story (that gets told in the first lecture) of how it all links up, but here it is for the record (and for anyone who's reading this because they're wondering if they want to take this module, or why they took this module)...</p> <p>In an unlikely coincidence, the proposals for the UK Large-Scale Complex IT Systems (LSCITS) Initiative and for the USA's Ultra-Large-Scale (ULS) Systems Program were written entirely independently, largely at the same time, and came to broadly similar conclusions about what are the most pressing challenges and the most promising lines of further research. 
For both the LSCITS Initiative and the ULSS Program, there is a commitment to exploring novel computational approaches based on emergent properties of complex adaptive systems (CAS); in the book <i>Ultra-Large-Scale Systems: The Software Challenge of the Future</i>, by Northrop <i>et al.</i> (2006, pp.32-35) these are referred to as issues in <i>computational emergence</i> and <i>design</i>. This EngD module gives an introduction to some key types of CAS that are likely to play a role in future LSCITS design and operation: that's basically what the first half of the module is concerned with. These CAS can be considered as constituting a set of "empirical methods" in the sense that they solve certain problems via their emergent, system-level, behaviour. A common theme in all such CAS is that, for practical applications, the compounding of nonlinearities across the system, and the frequent reliance on random processes, mean that formal mathematical analyses of the systems are often not possible. Instead, these CAS are often best understood by treating them as systems with inherent variability and applying empirical/experimental methods that are standard in the natural sciences: that's basically what the second half of the module is concerned with. </p> <p>The previous paragraph is a decent summary of what we look at in the first part of Lecture 1.</p> <p>As LSCITS/ULSS are networks of interacting components, the second part of Lecture 1 then gives a very quick tour through key terms in network mathematics. 
The theme of covering the basics continues in Lecture 2, which gives a similarly speedy recap of relevant fundamental probability and statistics issues.</p> <p>Having covered the basics and established some core terminology in the first two lectures, we then move on to exploring issues in CAS relevant to LSCITS via the following four lectures: Lecture 3 discusses malware attack and defence on large-scale technology networks; Lecture 4 describes market-based approaches to issues in LSCITS management; Lecture 5 briefly introduces neural networks; and Lecture 6 gives an overview of evolutionary approaches to automated design and optimization.</p> <p>We bid farewell to CAS in Lecture 7, where minimal models of complex adaptive systems are introduced and discussed. These minimal models offer opportunities to study large-scale networks of interacting entities without investing time and effort in detailed simulations of the activities of each node or their inter-node interactions. Such models are relatively simple to create, and yet can generate vast quantities of rich data in short periods of time. This makes them appealing generators of example data-sets for the statistical tools and techniques that are explored in the remaining lectures of the module.</p> <p>In Lecture 8, the motivation for the statistical issues, approaches, tools, and techniques covered in more depth in the subsequent lectures is illustrated by walking through some examples. The examples allow us to get familiar with core terms and concepts, and also with the basics of Analysis of Variance (ANOVA).</p> <p>In Lecture 9 we explore the design of experiments, essentially looking at rigorous ways of answering the questions "what data shall we gather, and how?"</p> <p>In Lecture 10, we look at visualization, clustering, and dimensionality-reduction: all of which are methods for addressing the question "what does the data look like?" 
</p> <p>In Lecture 11, we explore mathematical techniques for nonparametric analysis: that is, for rigorously analysing data without making too many (or any) assumptions about the probability distributions that the data came from.</p> <p>Lecture 12 reviews regression, further analysis of variance, and model selection: how best to summarize the data with a mathematical model. </p> <p><u1:p>&nbsp;</u1:p></p> <h2>Timetable </h2> (L<i>n</i> is Lecture <i>n</i>; S<i>n</i> is Seminar <i>n</i>) <p><b>Monday</b>: <i>Setting the Scene</i></p> <p>14:00-15:00 L1: Introduction, Overview, &amp; Networks</p> <p>15:15-16:15 L2: Basics - Experiments, Results, Randomness</p> <p>16:30-17:30 S1: Complex Adaptive Systems in Systems (of Systems) Engineering</p> <p><b>Tuesday:</b> <i>Complex Adaptive Systems Issues in LSCITS</i></p> <p>09:30-10:30 L3: Malware Attack &amp; Defence on LSCITS</p> <p>10:45-11:45 L4: Market-Based Control for LSCITS</p> <p>12:00-13:00 L5: Neural Networks and LSCITS</p> <p>14:00-15:00 L6: Evolutionary Design for LSCITS </p> <p>15:15-16:15 L7: Minimal Models of Complex Adaptive Systems</p> <p>16:30-17:30 S2: Exploring CAS </p> <p><b>Wednesday:</b> <i>Tools &amp; Techniques</i></p> <p>09:30-10:30 L8: Empirical Methods in Action </p> <p>10:45-11:45 L9: Design of Experiments</p> <p>12:00-13:00 L10: Visualisation, Clustering, and Dimensionality Reduction</p> <p>14:00-15:00 L11: Nonparametric Analysis</p> <p>15:15-16:15 L12: Model Fitting and Model Selection</p> <p>16:30-17:30 S3: Exploring Tools &amp; Techniques</p> <p><b>Thursday:</b> <i>Wrapping Up</i></p> <p>10:00-11:00 S4: Planning for the Assessment</p> <u1:p>&nbsp;</u1:p> <h2> Required Reading</h2> <h2><span style="font-size: 12pt; font-weight: normal;">There is no one book that draws together all the topics covered in this module. 
Details of additional links and further readings are given on a lecture-by-lecture basis in "Further Readings, Links and Resources", below. The list of <i>required</i> readings here has been chosen to try to minimize the number of publications involved, while leaving none of the material from the lectures entirely ignored: think of the list here as the smallest set of readings that you might get away with; they're all recommendations for books to keep on your shelves for reference/reminder purposes, but expect to need to follow up several of the additional readings too, depending on what topics interest you the most. <span style="">&nbsp;</span></span><span style=""></span></h2> <p>For the "new science" of network mathematics introduced in Lecture 1, and for the malware dynamics discussed in Lecture 3, Chapters 1 to 6 of the populist and engagingly non-technical book by Duncan Watts (one of the founders of the new network science) are a useful introduction:</p> <ul> <li>Watts, D. (2003) <i>Six Degrees: The science of a connected age.</i> Vintage. <a href="http://www.amazon.co.uk/Six-Degrees-Science-Connected-Age/dp/0099444968/ref=sr_1_2?ie=UTF8&amp;s=books&amp;qid=1266758984&amp;sr=1-2">(Amazon.co.uk)</a> </li> </ul> <p>For the foundational material in Lecture 2, almost any book on probability and statistics will be fine; the difficulty is then finding a book that also covers all of the material that follows in Lectures 8-12. The Lecture 2 content, plus a fairly comprehensive but brief (sometimes <i>very</i> brief) discussion of the material in Lectures 8-12, can be found in Boslaugh &amp; Watters (2008). 
The psychologists Field &amp; Hole (2003) take the position that the best way of explaining this material is by injecting a lot of humour (relentlessly so, in places), and by dealing with topics in the chronological sequence that they occur in practice: it's a nice book but with an obvious bias to psychology experiments, and with really not much said about the underlying mathematics. </p> <p>You could probably just about get away with buying only one of these books, but if you can afford two then it will be very useful to see the same topics described in two different styles and depths by the different authors:</p> <ul> <li> Boslaugh, S. &amp; Watters, P. (2008), <i>Statistics in a Nutshell: A Desktop Quick Reference</i>. O'Reilly. <a href="http://www.amazon.co.uk/Statistics-Nutshell-Desktop-Reference-OReilly/dp/0596510497/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1266757373&amp;sr=1-1"> (Amazon.co.uk)</a></li> <li>Field, A. &amp; Hole, G. (2003), <i>How to Design and Report Experiments.</i> Sage. <a href="http://www.amazon.co.uk/Design-Report-Experiments-Andy-Field/dp/0761973834">(Amazon.co.uk)</a> </li> </ul> <p>Finally, because it was prepared for politicians in the UK parliament, I'd recommend that you take a read of this:</p><ul><li> Bolton, P. (2010), <i>Statistical Literacy Guide: How to spot spin and inappropriate use of statistics.&nbsp;</i> House of Commons Library. <a href="http://www.parliament.uk/documents/commons/lib/research/briefings/snsg-04446.pdf"> (Here)</a></li></ul><p>For the market-based approach introduced in Lecture 4, there is a very long HP Labs technical report written by Dave Cliff in 1997. Chapters 2 and 3 of that report provide an overview of the background economics and give an introduction to market-based control. The report is free to download.</p> <ul> <li> Cliff, D. (1997), <i>Minimal-Intelligence Agents for Bargaining Behaviors in Market-Based Environments</i>. Hewlett-Packard Labs Technical report HPL-97-91. 
(<b>Free download</b> of pdf file available here: <a href="http://www.hpl.hp.com/techreports/97/HPL-97-91.pdf">HPL-97-91</a>).</li> </ul> <p>For the complex adaptive systems concepts introduced in Lectures 5, 6, and 7, Chapters 15 to 23 of Flake (1998) are a gentle introduction (and the rest of Flake's book is great too, but covers topics outside the scope of this module).</p> <ul> <li>Flake, G. W. (1998), <i>The Computational Beauty of Nature: Computer Explorations of Fractals, Chaos, Complex Systems and Adaptation.</i> MIT Press Bradford Book. <a href="http://www.amazon.co.uk/Computational-Beauty-Nature-Explorations-Adaptation/dp/0262561271/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1266755101&amp;sr=8-1"><span class="GramE">(Amazon.co.uk)</span></a></li> </ul> <p>MIT Press host a website to support this book, here: <a href="http://mitpress.mit.edu/books/FLAOH/cbnhtml/home.html">http://mitpress.mit.edu/books/FLAOH/cbnhtml/home.html</a></p> <u1:p>&nbsp;</u1:p> <h2>A note on statistical software packages</h2> <p class="MsoNormal" style=""><span style="">There are several popular software packages for statistical analysis of data, many with associated visualization tools built in or available as add-ons. Several of the most popular are long-established commercial systems, meaning that someone (you, or your employer, or your university) has to pay for a license. 
Examples include stats-specific packages such as <i><a href="http://www.minitab.com/">Minitab</a></i>, <i style=""><a href="http://www.sas.com/">SAS</a></i></span>, <i><a href="http://www.spss.com/">SPSS</a></i>, <i><a href="http://www.stata.com/"><span class="SpellE">Stata</span></a></i>, and <i><span style="color: red;"><a href="http://www.statsoft.com/"><span class="SpellE">Statistica</span></a></span></i>; and also statistical libraries that you can add on to more general-purpose mathematical programming systems such as <i style=""><a href="http://www.wolfram.com/"><span class="SpellE">Mathematica</span></a></i><span style=""> and <i style=""><a href="http://www.matlab.com/"><span class="SpellE">Matlab</span></a></i>. </span></p> <p class="MsoNormal" style=""><span style="">In recent years the open-source community has developed systems that rival the power of these commercial products, yet which are free. In particular, the <i><a href="http://www.r-project.org/">R</a></i> statistics-specific programming language is rapidly becoming the open-source standard for statistical applications (<i>R</i> is heavily inspired by a commercial language/system called <i>S</i>, developed by Bell Labs, now Lucent Technologies, and commercially available as <i><a href="http://spotfire.tibco.com/Products/S-Plus-Overview.aspx">S-Plus</a></i>), and has advanced visualization capabilities; see e.g.:</span></p> <ul> <li> Adler, J. (2010) <i>R in a Nutshell: A Desktop Quick Reference.</i> O'Reilly. </li> <li> Murrell, P. (2006) <i>R Graphics.</i> Chapman &amp; Hall. </li> </ul> <p>The <i style=""><a href="http://www.python.org/">Python</a></i> programming language is rapidly gaining huge support among the numeric/scientific computing and simulation/modelling <span class="GramE">communities,</span> and <i>Python</i> is also open-source and free to use. 
A standard Python distribution can be extended for mathematical work and simulation <span class="SpellE">modeling</span> via the free <i style=""><a href="http://numpy.scipy.org/"><span class="SpellE">NumPy</span></a>, <a href="http://www.scipy.org/"><span class="SpellE">SciPy</span></a>,</i> and <i><a href="http://simpy.sourceforge.net/"><span class="SpellE">SimPy</span></a></i> library packages. It is claimed (e.g. Lutz, 2008, p.4 &amp; p.6) that <span class="SpellE"><i style="">NumPy</i></span> extends <i>Python</i> to be more powerful than <span class="SpellE"><i>Matlab</i></span>, and that Python is now routinely used for scientific and numeric programming by <span class="SpellE">JPMorgan</span>, Union Bank of Switzerland, Citadel Fund Management, NASA, Los Alamos Labs, <span class="SpellE">FermiLab</span>, the Jet Propulsion Lab, and others. See e.g.:</p> <ul> <li>Lutz, M. (2008) <i>Learning Python.</i> O'Reilly. </li> <li> Vaingast, S. (2009) <i>Beginning Python Visualization.</i> Apress Inc.</li> </ul> <p> It's not uncommon for textbooks to assume familiarity with one specific package (for instance, the Field &amp; Hole book referenced in the required reading list gives many examples of SPSS output).</p> <p>The lectures on this module do not assume knowledge of any specific package, either commercial or open-source (this may change in later years). For the assessment exercise, access to one or more of the software packages mentioned here would probably be very useful. If your employer/sponsor can provide you with access to a copy of a commercial package, that's fine: they all do pretty much the same things; if you end up using a free installation of <i>R</i> and/or <i>Python</i>, you won't be at a disadvantage either. Bear in mind that once you have invested time in learning one package, you'll probably stick with using that until events force you to learn another one (e.g. 
you move to a new job where they use a different package as standard). </p> <u1:p>&nbsp;</u1:p> <h2> Further Readings, Links and Resources</h2> <p class="MsoNormal" style=""><span class="GramE"><b> General.</b></span><b><span style=""> </span></b></p> <p class="MsoNormal" style=""><span style="">Significantly more mathematical detail for the material in Lectures 8-12 is given by Chatfield (1983) and Jain (1991). Jain's book is notable for its explicit focus on experiment design and statistical analysis for analyzing computer systems: although the computer systems described there are now two decades old, the experimental design and analysis aspects are still all very current. <ul> <li>Chatfield, C. (1983), <i>Statistics for Technology: A Course in Applied Statistics.</i> <span class="GramE">3<sup>rd</sup> Edition, Chapman &amp; Hall.</span> <a href="http://www.amazon.co.uk/Statistics-Technology-Applied-Statistical-Science/dp/0412253402/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1266757439&amp;sr=1-1">(Amazon.co.uk)</a> </li> <li> Jain, R. (1991), <i>The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling.</i> Wiley. <a href="http://www.amazon.co.uk/Art-Computer-Systems-Performance-Analysis/dp/0471503363/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1266757920&amp;sr=1-1">(Amazon.co.uk)</a> </li> </ul> <p>For the complex adaptive systems introduced in Lectures 5, 6, and 7, a significantly more applied perspective than that offered by Flake (1998) is to be found in chapters 5, 9, 11, and 12 of Segaran (2007). Segaran covers several of the same topics, often in less technical depth, but with programming examples showing how such techniques are used in current Web 2.0 applications. </p> <ul> <li>Segaran, T. (2007), <i>Programming Collective Intelligence.</i> O'Reilly. 
<a href="http://www.amazon.co.uk/Programming-Collective-Intelligence-Building-Applications/dp/0596529325/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1266757746&amp;sr=1-1">(Amazon.co.uk)</a> </li> </ul> <p class="MsoNormal" style=""><b><span style="">Lecture 1: (Introduction, Overview, &amp;) Networks. </span></b></p> <ul type="disc"> <li class="MsoNormal" style=""><span style="">Duncan Watts is one of the founders of what is still referred to as "the new science of networks". His popular-science book <i>Six Degrees: <span class="GramE">The</span> Science of a Connected Age</i> (Vintage, 2003) is a gentle introduction, but lacks technical details. The missing technical details are to be found in his earlier book (an expanded version of his PhD thesis): <i>Small Worlds: <span class="GramE">The</span> Dynamics of Networks between Order and Randomness</i> (Princeton University Press, 1999). Many of Watts's technical contributions are co-credited with his PhD supervisor, Steve <span class="SpellE">Strogatz</span>. </span></li> <li class="MsoNormal" style=""><span style="">Independently of Watts &amp; <span class="SpellE">Strogatz</span>, Albert-Laszlo <span class="SpellE">Barabasi</span> made similarly significant contributions at much the same time; for a popular account of his work, see: <span class="SpellE">Barabasi</span>, A.-L. (2003) <i>Linked: The New Science of Networks</i>. <span class="SpellE">Perseus</span>. </span></li> <li class="MsoNormal" style=""><span class="SpellE"><span style="">Barabasi</span></span><span style=""> and Watts collaborated, along with Mark Newman, on producing an edited collection of key papers in the history of the "new science of networks": Newman, M., <span class="SpellE">Barabasi</span>, A.-L., &amp; Watts, D., editors (2006) <i>The Structure and Dynamics of Networks</i>. 
Princeton University Press.</span></li><li class="MsoNormal" style=""><span style="">The Klemm-Eguiluz model that allows smooth variation between Small World and Scale Free networks was first described in a three-and-a-half-page paper available <a href="http://pre.aps.org/abstract/PRE/v65/i5/e057102">here</a>: Klemm, K. and Eguiluz, V., (2002) "Growing scale-free networks with small-world behavior"</span><span style="font-weight: bold;"> </span><span style="font-style: italic;">Phys. Rev. E &nbsp; </span><span style="font-weight: bold;">65</span>, 057102.</li> </ul> <p><b>Lecture 2: Basics - Experiments, Results, Randomness.</b></p> <ul> <li> Almost everything in this lecture is covered in the books by <span class="SpellE"> Boslaugh</span><span style=""> &amp; Watters (2008), </span>Chatfield (1983), Field &amp; Hole (2003), and Jain (1991), all of which were mentioned above. For another view on the same material, there is a free electronic statistics textbook available on StatSoft's website at <a href="http://www.statsoft.com/textbook/">http://www.statsoft.com/textbook/</a></li> </ul> </span><span style="font-weight: bold;">Seminar 1: Complex Adaptive Systems in Systems (of Systems) Engineering.</span> </p> <ul type="disc"> <li class="MsoNormal" style=""><span style="">In this seminar we will be discussing the following three papers:</span></li> <ul> <li class="MsoNormal" style="">Sheard, S. &amp; Mostashari, A. (2009) "Principles of Complex Systems for Systems Engineering".&nbsp;<span class="moz-txt-underscore"><span class="moz-txt-tag"></span><span style="font-style: italic;">Systems Engineering </span><span style="font-weight: bold;">1</span></span><span style="font-weight: bold;">2</span>(4):295-312.&nbsp;</li> <li class="MsoNormal" style="">Polacek, G. &amp; Verma, D. (2009) "Requirements Engineering for Complex Systems: Principles vs. Rules". In <span style="font-style: italic;">Proc. 
Seventh Annual Conference on Systems Engineering Research</span> (CSER2009).&nbsp;</li> <li class="MsoNormal" style="">Sillitto, H. (2010) "Design Principles for Ultra-Large-Scale Systems", unpublished draft manuscript.&nbsp;</li> </ul> <li class="MsoNormal" style="">Please read all three papers, and for <span style="font-style: italic;" class="moz-txt-star"><span class="moz-txt-tag">all three</span></span><b class="moz-txt-star"><span class="moz-txt-tag"></span></b> of them please come to the seminar prepared to stand up and give a brief (10-minute) summary of the paper and your perspective on its contents (you can prepare PowerPoint slides if you wish, but that may be overkill -- it should be enough to talk from some notes and/or a marked-up hardcopy of the paper). In the seminar, exactly who summarizes/responds to which paper will be chosen at random. No-one will be asked to summarize/respond to more than one paper; but the random nature of the selection process means that, on the day, one or more people will end up not being asked to give a prepared summary/response for any of the papers. Nevertheless, you'll all be encouraged/expected to engage in the discussion that follows, so the fact that you've all come prepared means you can fully involve yourself in that discussion.&nbsp; </li> </ul> <p class="MsoNormal" style=""><span style=""> <p><b style="">Lecture 3: Malware Attack &amp; Defence on LSCITS</b></p> <ul type="disc"> <li class="MsoNormal" style=""><span style="">The Williamson and <span class="SpellE">Leveille</span> paper on epidemiological modelling of computer virus spread and removal, that introduced the PSIDR model, is available as a Hewlett-Packard Labs technical report, serial number <a href="http://www.hpl.hp.com/techreports/2003/HPL-2003-39.pdf">HPL-2003-39</a>. 
A later HP Labs Tech Report, <a href="http://www.hpl.hp.com/techreports/2003/HPL-2003-103.pdf">HPL-2003-103</a> by <span class="SpellE">Twycross</span> and Williamson is a short paper describing early results from the HP Virus Throttle. Details of the HP Virus Throttle's subsequent release into internet switch and router products can be found in <a href="http://www.hp.com/rnd/news/virus_throttle_software.htm">this HP Feb 2005 Press Release</a>, and in this <a href="http://www.cbronline.com/news/hp_procurve_enables_virus_throttle_within_vlans">Dec 2006 CBR story</a>. </span></li> <li class="MsoNormal" style=""><span style="">A short paper in <i>Science</i> discussed the way in which the details of how a computer virus operates can determine the topology of the resultant "overlay network" of vulnerable machines: <span class="SpellE"><span style="color: rgb(35, 31, 32);">Balthrop</span></span><span style="color: rgb(35, 31, 32);">, J., Forrest, S., Newman, M., &amp; Williamson, M. (2004), <span style="">"Technological Networks and the Spread of Computer Viruses", <i>Science</i> <b>304</b>:527-529.</span></span></span></li> <li class="MsoNormal" style=""><span style="">Ross Anderson at Cambridge maintains a web-page on <a href="http://www.cl.cam.ac.uk/%7Erja14/econsec.html">Economics and Security,</a> which includes links to a number of key papers. 
The most useful for this lecture is <a href="http://www.cl.cam.ac.uk/%7Erja14/Papers/econ_czech.pdf"><i>Information Security Economics - And Beyond.</i></a></span></li> </ul> <p><b style="">Lecture 4: Market-Based Control for LSCITS</b></p> <ul type="disc"> <li class="MsoNormal" style=""><span style="">It's quite old now, but the best collection on market-based control is still: Clearwater, S., editor (1995) <i>Market-Based Control: A paradigm for distributed resource allocation.</i> World Scientific Press.</span></li> <li class="MsoNormal" style=""><span style="">For further details of the IBM humans-<span class="SpellE">vs</span>-robot-traders work that we covered, see the key <a href="http://www.research.ibm.com/infoecon/paps/AgentHuman.pdf">paper</a> by the IBM team, their <a href="http://www.research.ibm.com/infoecon/talks/ijcai01/ijcai01_files/frame.htm">web presentation</a> of that paper, that team's <a href="http://www.research.ibm.com/infoecon/index.html">homepage</a> (from which, if you click on "papers", you will be able to find the sources for some of the other IBM results quoted in the lecture slides). In the key paper linked to above, the IBM team state that the financial impact of their results "...might be measured in billions of dollars annually."<span style="">&nbsp;&nbsp; </span><span style="">&nbsp;</span></span></li> <li class="MsoNormal" style=""><span style="">A recent paper by <span class="SpellE">Ladley</span> &amp; Bullock explores the effects that trade-network topology has on the performance of traders in that network: <span class="SpellE">Ladley</span>, D. &amp; Bullock, S. (2008). The Strategic Exploitation of Limited Information and Opportunity in Networked Markets. <i>Computational Economics</i> <b>32:</b>295-315<span style="color: rgb(0, 112, 192);">. 
<a href="http://eprints.ecs.soton.ac.uk/16763/1/Ladley-Bullock-08.pdf"><span style="color: rgb(0, 112, 192);">(PDF here)</span></a></span>.</span></li> <li class="MsoNormal" style=""><span style="">Some of the co-authors of this paper are members of the ULS Systems Program: Klein, M., Moreno, G., <span class="SpellE">Parkes</span>, D., <i>et al.</i> (2008), <a href="http://www.eecs.harvard.edu/econcs/pubs/klein08.pdf">Handling interdependent values in an auction mechanism for bandwidth allocation in tactical data networks</a> <em>Proceedings of ACM SIGCOMM 2008 Workshop on Economics of Networked Systems (<span class="SpellE">NetEcon</span> 2008)</em>.</span></li> <li class="MsoNormal" style=""><span style="">The EPSRC-funded project on Market Based Control, involving researchers at the Universities of Birmingham, Liverpool, and Southampton, and industrial involvement from BAE Systems, BT, and Hewlett-Packard, has a website at <a href="http://www.marketbasedcontrol.com">www.marketbasedcontrol.com</a> </span></li> </ul> <p><b>Lecture 5: Neural Networks and LSCITS</b></p> <ul> <li>The original paper on back-propagation is <span class="SpellE">Rumelhart</span>, D., Hinton, G., &amp; Williams, R. <span style="">(1986) "Learning representations by back-propagating errors", </span><i>Nature</i> <b>323</b>:533-536, an expanded version of which was published as a chapter ("Learning Internal Representations by Error Propagation", pages 318-362) in: <span class="SpellE">Rumelhart</span>, D. &amp; McClelland, J., editors, (1986) <i>Parallel Distributed Processing: Explorations in the Microstructure of Cognition.</i> Vol. <b>1</b>: <i>Foundations</i> (MIT, Cambridge, 1986). 
The book chapter is the version that almost everyone cites, though.</li> <li>A long-popular and recently revised textbook on neural networks and related machine learning algorithms is: Haykin, S. (2009) <i>Neural Networks and Learning Machines</i>, 3<sup>rd</sup> Edition, Pearson.</li> <li>For relevant work by members of the LSCITS team, see: Kurd, Z., Kelly, T., &amp; Austin, J. (2007) "Developing Artificial Neural Networks for Safety-Critical Systems", <i><span style="">Neural Computing &amp; Applications,</span></i><span style=""> <b style="">16</b>:11-19 <a href="http://www.springerlink.com/content/02076pu863562232/">[PDF here]</a>. And, for the full details: Kurd, Z. (2003) <i>Artificial Neural Networks in Safety-Critical Applications</i>. PhD Thesis, University of York Department of Computer Science <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.6215&amp;rep=rep1&amp;type=pdf">[PDF here]</a>.</span></li><li><span style="">(<b>Free download</b> of a simple back-propagation neural network written in Python is available here:&nbsp;<a href="http://python.ca/nas/python/bpnn.py">bpnn.py</a>).</span></li> </ul> <p><b>Lecture 6: Evolutionary Design for LSCITS </b></p> <ul> <li>For many years, many people have used Melanie Mitchell's <i>An Introduction to Genetic Algorithms </i>(MIT Press, 1998) as the standard citation for introducing people to GAs. It's still a great introduction, but it could do with being released as a revised edition to accommodate the last decade of development. </li> <li> John Koza co-authored a series of large books: <i>Genetic Programming</i> (1992), <i>Genetic Programming II</i> (1994), <i>Genetic Programming III</i> (1998), <i>Genetic Programming IV</i> (2003). They are each rich sources of examples, and collectively show the development of the field over the past two decades, but this is a <i>lot</i> of reading. 
For a more succinct and up-to-date summary, see: <span class="SpellE">Poli</span>, R., Langdon, W., &amp; <span class="SpellE">McPhee</span>, N. (2008), <i>A Field Guide to Genetic Programming. </i><span class="GramE">Lulu Press. </span></li> <li>For an example of LSCITS-relevant use in industry (Hewlett-Packard work on GA optimization of SAN designs) see <span class="SpellE">Dicke</span>, E., <span class="SpellE">Byde</span>, A. <i>et al. </i>(2004) "Using a Genetic Algorithm to Design and Improve Storage Area Network Architectures" in <i>Proceedings of the International Conference on Genetic and Evolutionary Computation: GECCO2004</i>. Lecture Notes in Computer Science <span style="font-size: 12pt;">3102</span><span style="font-size: 12pt; font-weight: normal;">:1066-1077. <a href="http://www.hpl.hp.com/techreports/2003/HPL-2003-221.pdf">[PDF here]</a> </span></li> <li>The Carnegie Mellon University Software Engineering Institute (CMU SEI), home of the ULS Systems Program, maintains a library of ULS publications which includes this somewhat speculative position paper: McKinley, P., Cheng, B., &amp; <span class="SpellE">Ofria</span>, C. 
(2007) <a href="http://www.cs.virginia.edu/%7Esullivan/ULS1/ULS07/mckinley.pdf">Applying Digital Evolution to the Development of Self-Adaptive ULS Systems</a>, presented at the <a href="http://www.cs.virginia.edu/%7Esullivan/ICSEULS1/">First ICSE Workshop on Software Technologies for Ultra-Large-Scale (ULS) Systems</a>: Workshop at the International Conference on Software Engineering, May 22, 2007, Minneapolis, Minnesota, USA.</li> <li>(<b>Free download</b> of a minimal genetic algorithm example written in Python is available here: <a href="http://www.cs.bris.ac.uk/home/dc/simpleGA.py">simpleGA.py</a>).</li> </ul> <p><b>Lecture 7: Minimal Models of Complex Adaptive Systems</b></p> <ul> <li> Andy Wuensche maintains the Discrete Dynamics Lab website at <a href="http://www.ddlab.com">www.ddlab.com</a>, which is a great place to spend some time. Andy's early masterwork on the results of driving 1-D Cellular Automata backwards was published as a glossy colour book by Addison-Wesley in 1992 and then quietly deleted in 1999. Andy has since resurrected it as: <span class="SpellE">Wuensche</span>, A. &amp; Lesser, M. (2000) <i>The Global Dynamics of Cellular Automata: An Atlas of Basin of Attraction Fields of One-Dimensional Cellular Automata.</i> <span class="GramE">2<sup>nd</sup> Edition.</span> <span class="GramE">Discrete Dynamics Inc.</span></li> <li>Stephen Wolfram was an internationally recognised CA researcher before he even started work on <a href="http://www.wolfram.com/"><span class="SpellE">Mathematica</span></a>. His 2002 <span class="SpellE">magnum</span> opus is astonishing in a number of regards, and worth checking out for the ambition alone: Wolfram, S. (2002) <i>A New Kind Of Science.</i> Wolfram Media Inc. 
For a review of <span style="font-style: italic;">ANKOS</span> by Melanie Mitchell in <span style="font-style: italic;">Science</span>, see <a href="http://www.sciencemag.org/cgi/content/full/298/5591/65?ijkey=DwQCDRqqvIYqc&amp;keytype=ref&amp;siteid=sci">here</a>. For a whole bunch of <span style="font-style: italic;">ANKOS</span> reviews, see <a href="http://shell.cas.usf.edu/%7Ewclark/ANKOS_reviews.html">here</a>.</li> <li>For a recent comprehensive study of 1-D CA, see McIntosh, H. (2009) <i>One Dimensional Cellular Automata.</i> <span class="SpellE">Luniver</span> Press.</li> <li>The NK model of <span class="SpellE">epistatic</span> genome interactions, and the NKC model of co-evolution, were introduced by Stuart Kauffman in his 1993 book <i>The Origins of Order: Self-Organization and Selection in Evolution.</i> <span class="GramE">Oxford University Press.</span></li> <li>An extension to the NK model, exploring the effects of small-world/scale-free <span class="SpellE">epistatic</span> network topologies, is discussed in <span class="SpellE">Hebbron</span>, T., Bullock, S. and Cliff, D. (2008) <a href="http://eprints.ecs.soton.ac.uk/15781/"><span class="SpellE">NKalpha</span>: Non-uniform <span class="SpellE">epistatic</span> interactions in an extended NK model.</a> In: <i>Artificial Life XI: Proceedings of the Eleventh International Conference on the Simulation and Synthesis of Living Systems</i>, pp. 234-241, MIT Press, Cambridge, MA.</li> <li>This paper, by experts in advanced large-scale complex telecoms networks, concludes with a discussion of how cellular automata may hold some clues for future 4G Networks: Ho, L., Samuel, L., &amp; Pitts, J. (2003) "Applying Emergent <span class="SpellE">Self</span>-Organizing behaviour for the Coordination of 4G Networks Using Complexity Metrics". 
<i>Bell Labs Technical Journal</i> <b style="">8</b>(1):5-25.</li><li><span style="">(<b>Free download</b> of a minimal 1-D cellular automaton example written in Python is available here: <a href="http://www.cs.bris.ac.uk/home/dc/simpleCA.py">simpleCA.py</a>).</span></li> </ul> <p><b>Seminar 2: Exploring Complex Adaptive Systems.</b></p> <ul> In this seminar we will discuss&nbsp;case-study academic papers relevant to Lectures 3 to 7 that show CAS approaches applied to (or proposed as applicable to) real-world LSCITS issues -- these have already been mentioned in the readings for each lecture, above.&nbsp;</ul></span></p><p class="MsoNormal" style=""><span style=""> <p><b>Lecture 8: Empirical Methods in Action</b></p> <ul> <li> Much of this lecture closely follows Chapters 10 and 11 of: Chatfield, C. (1983), <i>Statistics for Technology: A Course in Applied Statistics.</i> 3<sup>rd</sup> Edition, Chapman &amp; Hall. <a href="http://www.amazon.co.uk/Statistics-Technology-Applied-Statistical-Science/dp/0412253402/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1266757439&amp;sr=1-1">(Amazon.co.uk)</a></li> <li>Basic Analysis of Variance (ANOVA) is covered in all the statistics texts listed in the Required Reading and General Background sections of this webpage, but only Boslaugh &amp; Watters (2008) also discuss Multivariate ANOVA (known, unsurprisingly, as MANOVA). MANOVA and Multivariate Analysis of Covariance (MANCOVA) are described nicely in Chapters 8 and 9 of: Chatfield, C. &amp; Collins, A. (1980). <i>Introduction to Multivariate Analysis.</i> Chapman and Hall. </li> </ul> <p><b>Lecture 9: Design of Experiments</b></p> <p>In addition to the DoE material covered in the books by Chatfield, Field &amp; Hole, and Jain (all of which were mentioned above), these two books should also be useful:</p> <ul> <li> This one is nice and short (150pp): Antony, J. 
(2003) <i>Design of Experiments for Engineers and Scientists.</i> Butterworth Heinemann.</li> <li>This one is much bigger and much more comprehensive (650pp): Montgomery, D. (2009) <i style="">Design and Analysis of Experiments. </i><span class="GramE">7<sup>th</sup> Edition, Wiley.</span></li> </ul> <p><b>Lecture 10: Dimensionality Reduction, Visualization &amp; Clustering</b></p> <ul type="disc"> <li class="MsoNormal" style=""><span style="">The <span class="SpellE"><i>Pajek</i></span> website was for many years <a href="http://vlado.fmf.uni-lj.si/pub/networks/pajek/">here</a>, but that site now says that from January 2008 it has been replaced by the <span class="SpellE">Pajek</span> Wiki, which is <a href="http://pajek.imfm.si/doku.php">here</a>. You can download the <span class="SpellE">Pajek</span> overview paper from <a href="http://pajek.imfm.si/lib/exe/fetch.php?id=pajek&amp;cache=cache&amp;media=slides:pajek97.pdf">here</a>.</span></li><li class="MsoNormal" style=""><span style="">There are lots of examples of high-quality graphical visualizations of complex data at <a href="http://www.visualcomplexity.com/vc/">www.visualcomplexity.com</a>, and the IBM <span style="font-style: italic;">ManyEyes</span> community "shared visualization and discovery" project mentioned in the lecture has a website at <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/.">http://manyeyes.alphaworks.ibm.com/manyeyes/</a> and is described in a technical paper:&nbsp;</span></li><ul><li class="MsoNormal" style=""><span style=""><span style="">Vi&eacute;gas, F., Wattenberg, M., van Ham, F., Kriss, J., and McKeon, M. (2007) "</span></span><span style="">Many Eyes: A Site for Visualization at Internet Scale" In <span style="font-style: italic;">Proc. 
IEEE InfoVis 2007</span>.&nbsp; Available <a href="http://www.research.ibm.com/visual/papers/viegasinfovis07.pdf">here</a>.<br></span></li></ul> <li>Fry's book is a gentle introduction to using the <i><a href="http://processing.org/">Processing</a></i> open-source environment, which is rapidly gaining popularity in the graphic design, animation, and visualization communities: Fry, B. (2008) <i>Visualizing Data: Exploring and explaining data with the Processing environment.</i> O'Reilly. </li> <li class="MsoNormal" style=""><span style="">This is a comprehensive review of data clustering techniques: Jain, A., <span class="SpellE">Murty</span>, M., &amp; Flynn, P. (1999). Data clustering: a review</span><span style="">. <i style="">ACM Computing Surveys.</i> <b style="">31</b>(3):264-323. </span></li> <li class="MsoNormal" style=""><span style="">The nice tutorial paper on PCA, written by Lindsay Smith, is <a href="http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf">here</a>. Alternatively, see Chapter 4 of: Chatfield, C. &amp; Collins, A. (1980). <i>Introduction to Multivariate Analysis.</i> Chapman and Hall.</span></li> <li class="MsoNormal" style=""><span class="SpellE"><span style="">Erkki</span></span><span style=""> <span class="SpellE">Oja's</span> paper on PCA via a simple neural network model is available online <a href="http://ece-classweb.ucsd.edu/winter06/ece173/documents/Oja%20--%20Simplified%20Neuron%20Model%20as%20a%20Principal%20Component%20Analyzer.pdf">here</a>. 
</span></li> <li class="MsoNormal" style=""><span style="">A February 2008 version of the McDonald, <span class="SpellE">Suleman</span>, Williams, <span class="SpellE">Howison</span> and Johnson paper on using Minimum Spanning Trees to visualise/analyse the foreign exchange markets is available <a href="http://arxiv.org/abs/cond-mat?papernum=0412411">here</a>. This is not standard computer science or <span class="GramE">systems engineering reading material, but it shows</span> you how the ideas in this lecture get used in real applications. The authors take care to introduce the foreign exchange side of things quite gently.</span></li> <li>This February 2009 paper shows PCA used for dimensionality-reduction in the analysis of trading systems that had been optimized by a genetic algorithm (thereby touching on issues from Lectures 4, 6, and 10): Cliff, D. (2009) "<a href="http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4769010"><span style="color: rgb(79, 79, 79); text-decoration: none;">ZIP60: Further Explorations in the Evolutionary Design of Trader Agents and Online Auction-Market Mechanisms".</span></a> <i>IEEE Transactions on Evolutionary Computation.</i> <b>13</b>(1):3-18. A <span class="SpellE">PDF</span> of the preprint version is available <a href="http://eprints.ecs.soton.ac.uk/14078"><span style="color: rgb(79, 79, 79); text-decoration: none;">here</span></a><span style="color: red;">.</span></li> </ul> <p><b>Lecture 11: Nonparametric Analysis</b></p> <ul> <li>Much of this lecture is based on this book by Marjorie Pett which, although it has "health care" in the title, describes methods that are applicable whenever samples are small and/or distributions are unusual: Pett, M. (1997), <i> Nonparametric Statistics in Health Care Research: Statistics for Small Samples and Unusual Distributions. </i> Sage. 
</li> <li>Nick <span class="SpellE">Feltovich's</span> 2003 paper on the Robust Rank Order test (Dave's favourite nonparametric statistical test) is available for download from <a href="http://www.uh.edu/%7Enfelt/papers/robust.pdf">here</a>. A supplementary paper by <span class="SpellE">Feltovich</span>, listing exactly-computed critical values for the RRO (i.e., the numbers you need to compare to) is available <a href="http://www.uh.edu/%7Enfelt/papers/rrovals.pdf">here</a>.</li> </ul> <b>Lecture 12: Model Fitting and Model Selection</b> <ul> <li>For an advanced extended discussion of model selection, see Burnham, K. &amp; Anderson, D. (2002), <i>Model Selection and Multimodel Inference: A practical information-theoretic approach.</i> 2<sup>nd</sup> Edition, Springer.</li> </ul> </span><span style=""> <p><b>Seminar 3: Exploring Tools and Techniques</b></p> <ul> <li>In this seminar we will be discussing the following three papers: </li> <ul> <li> Cohen, J. (1994) "The Earth Is Round (p&lt;.05)"<span style="font-style: italic;"> American Psychologist</span> <span style="font-weight: bold;">49</span>(12):997-1003.<br> </li> <li>Johnson, D. (1999) "The Insignificance of Statistical Significance Testing" <span style="font-style: italic;">Journal of Wildlife Management</span> <span style="font-weight: bold;">63</span>(3):763-772.<br> </li> <li>Wilkinson, L. et al. (1999) "Statistical Methods in Psychology Journals: Guidelines and Explanations". <span style="font-style: italic;">American Psychologist</span> <span style="font-weight: bold;">54</span>(8):594-604.</li> </ul> <li> Although LSCITS has little to do with academic studies in psychology, and even less to do with the management of wildlife, the arguments raised in these three papers are relevant to any field that uses empirical methods.&nbsp; The Cohen and Johnson papers both highlight problems (and speculate on solutions) in the use of significance tests. 
The Wilkinson paper was written by a "task force" convened to formulate a response to such concerns. Seminar 3 will be a discussion of these three papers. Please read all three, and for <span style="font-style: italic;">all three</span> of them please come to the seminar prepared to stand up and give a brief (10-minute) summary of the paper and your perspective on its contents (as before, you can prepare PowerPoint slides if you wish, but that may be overkill -- it should be enough to talk from some notes and/or marked-up hardcopies of the papers). &nbsp;Who summarizes/responds to which paper will be chosen via the same random process as was used for Seminar 1: no-one will be asked to summarize/respond in more than one seminar; the random nature of the selection process means that, on the day, one or more people will end up not being asked to give a prepared summary/response for any of the papers; nevertheless, you'll all be encouraged/expected to engage in the discussion that follows, so the fact that you've come prepared means you can fully involve yourself in that discussion.&nbsp; </li> </ul> </span></p> </body></html>