Issue No.01 - January/February (2010 vol.27)
Published by the IEEE Computer Society
Igor Markov , University of Michigan
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MDT.2010.22
<p>This is a review of <it>Advanced Excel for Scientific Data Analysis</it>, 2nd ed. (by Robert de Levie)</p>
At the end of every semester, student grades are due 72 hours after the final exam. When grading is done, teaching assistants put together the final spreadsheet with cumulative scores, computed as weighted averages of scores for homework, course projects, and exams. My task is to assign letter grades to individual students—an attempt at fairness while keeping the course attractive to future generations and yet avoiding grade inflation. When assigning grades "on the curve," it is common to start with historical quotas for A's, B's, and C's for the course—the mean can be anywhere from B− or B+, and a quarter of students usually receive grades from A− to A+. Students with exceptionally low scores are given exceptionally low grades, and specific percentile boundaries for common grades are adjusted based on the actual workload in this semester. To implement such a policy, the rows of the spreadsheet are sorted by cumulative scores so that percentile boundaries can be translated into absolute boundaries.
Regrettably, absolute grade boundaries can look arbitrary, and a careless mistake on a simple homework question can, in principle, turn an A− into a B+ in a student's transcript. To make the letter grades more stable, the boundaries are further relocated—up or down—to the closest gap between successive scores (the gaps are computed and recorded next to scores). As a side effect, this step allows us to be generous with re-grades on exam questions—small changes in scores rarely affect letter grades. Before entering grades, we count A's, B's, and C's to compare them against the original quotas, and perform a few spot checks.
EDA-minded readers may raise several interesting questions. To what degree can the grade-assignment process be automated? How should this automation be implemented and maintained? How can results be verified? Which tools or environments should be used to visualize results, for example, in the form of histograms? What pricing model is most appropriate for such automation?
Let us examine the traditional approach perfected by the EDA industry—a new stand-alone software product, which we would code-name GradeCompiler. GradeCompiler would read a dedicated input file format with separate sections for raw scores, overall grade quotas and a course-complexity adjustment. Grade-boundary determination would be formalized as an NP-hard optimization problem and solved by fast heuristics. Individual student grades could then be listed in an output file. A prototype implementation written in C++ by two graduate students could be commercialized, extended with a GUI and visual histograms, and maintained on several OS platforms. A separate verification tool would ensure that actual grade statistics closely match those in the input file. Given the potential market of over 400,000 educators, a sales force would need to be assembled. License management software would limit software piracy, and support contracts would improve adoption while increasing the return on investment. Pricing could be based on recurring usage (number of courses) or end result (number of students with grades)—one model may be appropriate for smaller colleges, and the other for large, established universities. Given a double-digit growth in US tuitions, universities could be expected to afford fairly high license fees. Furthermore, customers would request support for incremental grade changes and more obscure features, suggesting opportunities for future releases.
Is there a more intuitive way to automate grade processing? Perhaps a more lightweight solution that does not require significant development, but can be easily tried, adopted, and tweaked? A programming language with built-in text-processing features can be useful, such as Tcl, Perl, Python, or Visual Basic, but it must also provide visualization. Potential users may value a familiar and consistent GUI, as well as interactivity—they may even insist on setting grade boundaries by hand while looking at sorted grades. This brings us to Microsoft Excel—a tool available on practically every Windows laptop today. As the reader probably realized two paragraphs earlier, Excel provides sufficient functionality to store, inspect, and edit grades as well as automate basic grade-assignment tasks, perform sanity checks before submitting final grades and plot a histogram. All this is accomplished with built-in functions, like sorting and by evaluating simple arithmetic expressions in groups of rows, requiring a much smaller effort than in the traditional EDA approach.
Can Excel be used for more-sophisticated tasks of interest in electronic design? Of course, we don't expect Excel to compete with C++ in raw speed, but it may compete in interactivity, available data processing functions, and easy visualization. These topics are covered in the book Advanced Excel for Scientific Data Analysis, by Robert de Levie. Most readers will be surprised by the sheer amount of sophisticated work that can be performed with Excel without breaking a sweat.
De Levie's book starts by reviewing the basic capabilities of Microsoft Excel, including spreadsheet organization, key aspects of GUI, available functions, data visualization, and even differences between Excel 2007 and previous versions. The introductory chapter also touches upon programming (functions, subroutines, and macros) and explains how to represent mathematical objects (complex numbers, matrices, combinatorics). A more systematic list of functions and operations in Excel is given in Appendix A, while Appendix D guides the readers who need to transition to Excel 2007 from earlier versions.
Chapters 2–4 are focused on linear and nonlinear least-squares data fitting. They are full of step-by-step practical examples and illustrations, and offer an effective warm-up before more advanced chapters. The examples are from fields like analytical chemistry and astrophysics, but are quite clear and can be easily adapted to, say, 45-nm CMOS gate delay models. De Levie also introduces the powerful Solver tool available in Excel that can find roots of functions, solve linear programs, and so on. Chapter 4 ends on p. 306 and covers a variety of practical issues, such as fitting a set of curves, interpolation, the impact of noise, etc. But you probably expected such functionality from the most popular spreadsheet tool, didn't you? Did you also expect to see Fourier transforms? They are the subject of Chapter 5, which uses them to perform differentiation, interpolation, lossy data compression, and analysis of tidal waves in Maine. Chapter 6 goes on to convolution, deconvolution, and time-frequency analysis. If you are using Matlab for your DSP work, you are probably not going to be impressed, but do your coworkers and secretaries have Matlab on their laptops? Besides, it is a lot easier to embed Excel data and graphics into Word documents.
Chapter 7 explains how to use Excel to perform numerical integration of ordinary differential equations. It covers Euler and Runge-Kutta methods, Lottka oscillators, and uses these techniques to analyze chaos and stability. Chapter 9 discusses common mathematical operations—differentiation, integration, smoothing, and interpolation. The discussion covers many practical contests and describes several different techniques. Some topics from earlier chapters are revisited and more powerful solutions are introduced—for example, to perform data fitting with low signal-to-noise ratio. The level of detail and insights merit a graduate-level numerical analysis textbook, but everything is illustrated in Excel and remains very tangible. Chapter 10 covers matrix algebra and related techniques. Highlights include several techniques for solving systems of linear equations and matrix inversion, such as Gauss and Gauss–Jordan elimination, as well as the singular-value decomposition, cubic splines, multivariate centering, and several macro packages with additional matrix codes that can be downloaded and installed by the readers. Appendix B describes and illustrates a number of basic matrix functions available in Excel, including finding eigenvalues and eigenvectors, linear system solvers, and matrix generators.
The 80-page Chapter 8 focuses on writing your own macros—this can be tricky, given that macros usually read data from spreadsheet cells and write data into other cells. The chapter covers everything from basic control structures, to coloring, and interface with Microsoft Visual Basic, and then presents a series of examples, followed by sections on debugging and macro recorders. Developing macros for matrix operations is illustrated in Appendix B, while Appendix C outlines available "MacroBundles & MacroMorcels". Chapter 11 delves into spreadsheet reliability, dealing with numerical precision, cancelation errors, good spreadsheet practices, and several additional macro packages, such as BigMatrix and Xnumbers.
All in all, this 700-page book is impressive and somewhat unexpected. It shows how sophisticated numerical tasks, including those arising in EDA, can be performed on any Windows laptop with available software. This can be useful for quick prototyping, early parameter estimation, sanity-checking, benchmark data analysis, and as an aid in research, ensuring that low-level programming effort does not interfere with conceptual thinking about important EDA challenges. Given recent reports of EDA tools running on iPhones, I am sure that Excel-based EDA tools are on their way, if not in wide use.