Friday, August 31, 2012

Why your stupid makefile doesn't work

Or, a short treatise on silent failure. The makefile, as powerful as it is pervasive, is much like any other power tool in that you can spend all day mashing yourself in the thumb with it if you don't know how to use it properly. The problem with makefiles is that the syntax is so permissive that almost anything is valid, yet so literal that a single misplaced character will produce a completely different result from what you wanted. Common mistakes:

  • Misplaced spaces
  • Inconsistent use of trailing slashes in paths
  • Pattern rules in the wrong order

You have to be careful with leading and trailing whitespace, because it will cause two seemingly identical paths not to match. The same goes for trailing slashes on pathnames: /a/b/c will not match /a/b/c/. This isn't obvious, because you might assume that make operates primarily on files rather than on strings, but it matters.

Finally, there is rule order. The order of execution across dependencies, laterally, is undefined. That is, given "a: b c d", the order in which b, c, and d are completed is undefined, especially in multi-threaded builds. Vertically, given "a:", "b:", and "c:", the order of execution is determined by the dependency relationships between those targets. So why does rule order matter? Because rules are evaluated from top to bottom. Although the build system is great at automatically navigating dependency trees and generating hundreds of shell commands, it is up to you to enter build rules in the correct order of specificity: not so that they run in any particular order, but so that they are matched with the correct priority.

For example, I've been working on a makefile to automatically patch the latest Radeon driver to build for the latest stable kernel on Debian. The first thing my script needs to do is unpack the archive into a directory I've named "oem", so the rule for this is "oem/%: oem". But even though it's the first task to execute, the rule belongs at the bottom of the script, because it is the catch-all for any target which did not match a more specific rule above it. It basically says, "If, after everything else, you couldn't find a file, then make sure you unpacked the archive." I'm also going to throw in a shell command at the end that tests that the target actually exists on the filesystem, because otherwise make will not report an error: the rule matches regardless of whether its recipe actually produces the target sought.
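
As a rough sketch of the layout, with invented archive and patch names and simplified recipes (and remembering that recipe lines must begin with a tab, not spaces):

# More specific rules go first. This one, for instance, produces a patched
# copy of any unpacked source file (the patch name is made up):
oem/%.patched: oem/%
	patch -o $@ $< driver.patch

# The catch-all goes last, because it covers anything under oem/ that does
# not have a more specific rule: "if you still can't find the file, unpack
# the archive". The final test is the explicit check described above; make
# itself will not complain if the recipe runs but never produces the file.
oem/%: driver.tar.gz
	mkdir -p oem
	tar -xzf $< -C oem
	test -e $@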

Sunday, August 26, 2012

The perfect kernel

My epic quest for the perfect kernel continues, and I am rapidly closing on my goal. Things I've accomplished thus far:

Remove stuff you don't need

Many of these will be obvious. Do you have a parallel port? If your computer is new, probably not. Linux is intended to run on everything from handheld cell phones to supercomputing research servers hosting hundreds of users. If you are installing a PC distribution, some of the default assumptions will lean towards the latter scenario. In addition to unneeded device drivers, stuff you can probably remove for a desktop system includes:

  • Excessive logging
  • Excessive access control
  • Unneeded quotas

One of the first things that I noticed was that Debian defaults to support for 512 CPUs. On my machine, that's 504 too many, plus I don't need things like NUMA support either, because this isn't a mainframe. Further, if you're the only one using the machine, then you can dispense with a lot of kernel features intended for policing massively multi-user environments like Web servers and render farms. Do I need a report of the top 12 user accounts by disk usage? No, because I'm the only user, and I know how much I use. Would the kernel waste resources hitting a log or quota-check hook at every corner? Probably.
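
For illustration, the corresponding .config entries might look something like this. The option names are real kernel symbols, but the right values depend entirely on your machine; the commented-out form is simply how a .config records an option that has been switched off.

CONFIG_NR_CPUS=8
# CONFIG_NUMA is not set
# CONFIG_AUDIT is not set
# CONFIG_QUOTA is not set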

Add stuff that is missing

Linux has become amazing at producing fully-functional default configurations with zero user input. However, if you want to maximize performance, then you will want to dig through various logs and module lists to ensure that all of your system's capabilities were detected. Keep in mind that just because a device works doesn't mean that it is configured to its full potential. Some devices may have been detected in a compatibility mode, and in those cases you will want to manually specify a better driver or change your BIOS settings to enable missing features. Stuff to check:

  • dmesg: for the system message log
  • lsmod: for a list of loaded modules
  • lspci: for a list of detected PCI devices
  • lsusb: for a list of detected USB devices
  • gnome-device-manager
  • Your peripheral and chipset manuals

Demodulize the stuff you use

By default, these days, Linux distributions ship a minimal kernel with almost all of the features compiled as modules. This is, intuitively enough, so that the kernel will be modular and support (almost) all of the hardware in the world without bloating the system. This is great from a compatibility perspective, but it has drawbacks. Firstly, it requires a two-stage boot process built around an "initrd", where "rd" is short for "ram disk": the kernel boots with an initial ramdisk filesystem containing all of the modules it might potentially need to initialize the devices required to reach the real filesystem. It works fine, but there is overhead involved in loading lots of module files and allocating the memory for them, and there may be further overhead from memory fragmentation, or from the extra indirection necessary for dynamic modules. Anyway, there's just something elegant about knowing that all of the essential stuff gets loaded into memory in one contiguous block. It probably decreases boot time slightly, too. Stuff I build into the kernel image includes anything which is high throughput, low latency, or otherwise essential:

  • Chipset drivers: ICH10 southbridge, PCI(e), GPIO, hardware monitors, etc.
  • I/O drivers: UHCI and EHCI for USB. AHCI for disks
  • More I/O drivers: SATA. Also SCSI emulation over SATA is standard now
  • Video
  • Sound
  • Ethernet and TCP/IP
  • Virtualization hosting devices

Lastly, among the things which you may decide you don't need, some will lack modulization as an option, and you can simply disable those. Everything else, I intend to simply build as modules. This way, if I find a Token Ring card lying next to a dumpster, support is just an insmod away.
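
In .config terms, the difference is just "=y" (compiled into the kernel image) versus "=m" (built as a module). The option names below are real kernel symbols, but which ones belong in which group depends entirely on your own hardware; this is only meant to show the shape of the choice.

# Built in: the disk controller, USB host controller, and network driver
# that the machine needs from the moment it boots.
CONFIG_SATA_AHCI=y
CONFIG_USB_EHCI_HCD=y
CONFIG_E1000E=y

# Left as a module: loaded only if the hardware ever shows up.
CONFIG_USB_PRINTER=m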

Saturday, August 25, 2012

Kernel configuration

Building the kernel is actually really easy. Between the distribution scripts and the kernel build scripts, you will get a config file that produces a very flexible kernel that will probably boot successfully with the default settings. And there are literally hundreds of settings. You can spend days simply sifting through all of them and learning about computer architecture in the process, if you want to.

Of course, if you get a setting wrong, your system may not boot, but you can install kernels side by side and simply fall back to a working kernel at the boot prompt, so it isn't a big deal.

There are already tons of tutorials on how to perform the build itself, which is really easy and amounts to little more than "make xconfig", "make kernel_image", and [boot loader install command goes here]. If you run Debian, it's super extra-easy. The hard part is choosing the correct settings to customize the kernel and pare it down to what you need without excluding anything important.

For that part of the process, I have found little guidance online. Basically, an interesting thing about Linux is that it is less aware of the hardware products in your system, as such, and more aware of the components from which they are made. So, instead of detecting an Acme Model X motherboard, you will find that it discovers the north and south bridges, and sundry hardware controllers individually. The downside is that it makes system configuration a bit of an adventure. The upside is that Linux is compatible with a wider variety of hardware because the driver developers needn't foresee every possible variation of hardware model and revision for every vendor and product line, and so on. Instead it says "Hey, there's a hard drive controller over there, which I succeeded in talking to using these protocols".

Boot an initial kernel

The default kernel, as provided by the distribution, is usually built using fairly generic settings suitable for your target architecture. Hardware support which could be essential for booting will have been compiled into the kernel image, and then everything else, kitchen sink included, will have been built as modules. Upon boot, you can read the logs, and, if you are like me, you will quickly find that the kernel knows more about your hardware than you do. You can then use this information to build a better kernel. Use dmesg to read the kernel's log and see what hardware it detected and what modules it used to do it. In my experience booting default Debian and Ubuntu, the default hardware support is quite functional across the board. Use lsmod to obtain a list of the modules that were loaded to achieve that support.

Choose modules

The modules in the list you obtain from the generic kernel will fall into one of three categories. Firstly, some modules will be extraneous: they were loaded speculatively, or to support features you don't use, and you don't really need them. Secondly, there will be the modules which are exactly what you need. Thirdly, there may be modules which provide only a subset of the functionality you need for a particular device. For example, I have an Intel ICH10 southbridge, but Debian's kernel picks modules for the PIIX chipset, which preceded the ICH product line. I'm still trying to figure out, by trial and error, whether there are ICH drivers which will talk to my hardware, even though the PIIX drivers seem to nominally work. Beyond the module list itself, there is also the possibility that some devices were missed entirely. Sorting out which is which involves lots of Googling and manual-reading. Luckily, you can run make xconfig and hit Ctrl-F to search the configuration settings for modules pertinent to any functionality you are missing.

Thursday, August 23, 2012

Kernel building

One of the many benefits of Linux is that you can recompile the operating system, which is a common practice among its users. This means that you can build a version of it which is tailored to your needs and hardware. You can change settings and limits, swap out components, and add or remove optimizations according to your system capabilities.

A typical operating system will contain hooks for loading and detecting all manner of hardware which I will never use. It may be compiled for an old processor, and for old or generic peripherals so that it will be binary compatible with as much hardware as possible. This is more for the convenience of the OS vendor than the user.

Linux, on the other hand, is compatible with a vast, vast selection of hardware devices, but at the source level. So, rather than go to a vendor for a prebuilt binary which runs on, say, every PC built since 2000, I can build a binary tailored specifically to the hardware in my particular machine.

For example, I can specify that this is an Intel/AMD 64, so I get all-64-bit binaries rather than a mixture of backward-compatible subsystems. Then, I can specify that I have eight CPUs, which means that the necessary tables and supporting structures to run eight CPUs are statically built into the kernel, so there's no need for the kernel to fiddle with extra pointers, counters, and conditionals at runtime within the schedulers, which need to be as lean as possible.

Further, I can tell the build to perform branch analysis on the kernel, which causes it to produce self-modifying code to optimize certain conditionals which change infrequently. So, the kernel, while it is running, will insert jump instructions into itself at appropriate points so that it can avoid a read and a comparison from then until the next state change.
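
You can't reproduce the kernel's runtime patching in a user-space sketch, but the underlying trade it makes, treating a rarely-changing condition as a hint so the hot path stays cheap, is at least visible with GCC's __builtin_expect. The flag and function names below are invented for illustration, and the unlikely() macro is just the classic idiom spelled out locally.

// A user-space sketch of the static branch-hint building block only; the
// kernel goes further and rewrites the branch site in place when the flag
// actually changes.
#include <cstdio>

#define unlikely(x) __builtin_expect(!!(x), 0)

static bool tracing_enabled = false;    // flips very rarely, read constantly

static void handle_request(int n) {
    if (unlikely(tracing_enabled))      // hints the compiler to keep this off the hot path
        std::printf("trace: request %d\n", n);
    // hot path continues here
}

int main() {
    for (int i = 0; i < 3; ++i)
        handle_request(i);
    return 0;
}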

Everything in this operating system is about efficiency. Vista idled at 3GB of RAM, which is an amazing non-achievement. What it was doing holding so much RAM, I can only guess. I would like to think that it was being held in a system cache for further reallocation. My Debian install idles at 500MB, or 1/6th of that. I can run a build of the kernel with all eight processors pegged at 100%, inside of 1.5 GB of RAM. My swap partition is untouched. My hard disk light barely flickers. Why? Because I have so much free RAM! Everything is cached in RAM. The disk only blinks occasionally to commit the object files when they come out of the compiler or linker.

And it isn't just fancy programming. Part of this memory efficiency is due to a strong commitment in the community to standardization of interfaces, which is why Open Source is more than just a buzzword or a social movement. On a very pragmatic and technical level, all of the executables are drawing from the same pool of dynamic libraries. This means that if I have one hundred processes which need to use the DEFLATE algorithm, there is only one copy of zlib in memory, and they all share it, rather than each of the hundred programs carrying its own library instance compiled in.

Sunday, August 19, 2012

Linux

I've decided to switch back to Linux as my primary OS for two main reasons. Firstly, its performance is great, especially in the area of file I/O, and software development is an extremely file-intensive activity. Secondly, there is a vast library of free development tools for Linux and Unix-like operating systems, some of which will only work under Windows with compatibility layers, if at all.

Ubuntu is amazing for ease-of-use. The live CD works perfectly, with zero configuration. Setup and support overhead used to be the main argument against Linux, and Ubuntu truly renders that a moot point. I am currently typing this while diffing a backup of my old system in the background, with flawless system responsiveness. I have a working network, sound, and video. And all I had to do to get here was boot a CD.

I only wish that more game companies would target the Linux platform, as to me, that is the main drawback, because I have to switch OSes to play games. Aside from that, I have basically zero incentive to use anything else. Linux, which is completely customizable, renders my hardware platform a playground for testing and software development.

Now, the only question is which distribution to use. Ubuntu makes for an excellent desktop OS, although I use it mostly as a live rescue disk. Its features are beautifully integrated and intuitive. Debian, on the other hand, is what I lean towards as a development environment, and it harkens back to the old school of Linux administration. It has tons and tons of settings, and the packages are largely uncustomized from their default configurations. What this means is that after you install lots of packages, you will find the settings to be incoherent, with lots of clumsy or minimalist defaults. This is because, again, the packages come from all over the place, and nobody has substantially unified or coordinated their settings for any particular purpose.

That might seem to give Ubuntu a huge advantage, but despite all of the extra work to configure Debian, it offers the greatest number of options and flexibility to someone who has the time and the inclination to go to the trouble. It has a vast, vast library of meticulously maintained packages, its innards are exhaustively documented, and the entire thing is intended to be customized by the end-user at the same level as its developers.

Wednesday, August 15, 2012

The source editor

Ok, I'm underwhelmed with Visual Studio C++ Express' IDE. First of all, the Object Browser, which is a great idea in principle, consistently fails at finding anything whatsoever, ever. It also utterly crawls at bulk project property updates. It's very clever at allowing you to simultaneously edit a property for several intersections of projects, configurations, and platforms, but for some reason, it takes forever. There isn't even basic support for refactoring, that being a fancy term for "rename this symbol everywhere that it occurs", which is distinct from search-and-replace because it is language-aware and will not blindly match substrings. It's a basic, but extremely time-saving, feature. However, because this is the Express version, you can't do anything to expand its functionality or address its shortcomings. The plugin architecture is deliberately crippled, which is supposed to motivate you to pay money for a less crippled version. And although I'm sympathetic to the desire to get paid for software, there are tons of high-powered development tools freely available.

Next up, I intend to take a look at the Eclipse IDE. I hadn't looked at it for a while because it is a Java-based behemoth which was originally designed for Java development. However, despite running in a Java VM, I don't see how it could be any less responsive than Visual Studio. Because it is an open-source product with a strong emphasis on extensibility, its plugin website boasts a directory of over 1400 plugins. There is support for the MSVC compiler and SDK suite, which is not dependent on Visual Studio to run. There is built-in refactoring support, and git integration.

I also want to try out the UML plugins. They are supposed to automatically generate flowcharts and diagrams of software architecture, which could be very handy for design. There is a notion called round-trip engineering, which basically means that you can have diagrams generated from your code, edit those diagrams, and have the changes reflected back in your code. This sounds really useful for developing the initial data layout and procedural flow. Though, I think once I had any significant body of code, I probably wouldn't want a script generator casually rewriting it.

Incidentally, 7Zip is a really nice compression tool. It just finished compressing a 4GB git repository, which was already partially compressed, to 10% of its original size. It took a while, but that's handy.

Sunday, August 12, 2012

Company name?

Currently considering "William B. Pwningsly Center for Awesomeness". May not fit well on business cards, though.

Thursday, August 9, 2012

Show of hands

Well, there appears to be a monkeywrench in my Irrlicht/Bullet integration idea. The problem is that the two use different-handed coordinate systems, which is basically the perfect problem to ruin an otherwise really good idea. To convert from one system to the other, you basically have to swap the X and Y axes. So far so good, because we could get around that by redefining X and Y to be swapped within Irrlicht. Then, we have to invert Z, which requires a float multiplication for every single vertex, and that is not so good.

Then I thought to myself: well, the conversion consists of two rotations and a scaling operation, and is therefore linear, so I should be able to fold it into the view transform without incurring any extra processing. The problem remains, though, that prior to that transformation, all of the cross products will be facing the wrong way, which means, at a minimum, that all of the normals will be wrong and backface culling will work backwards. If I really wanted to, I could typedef all of the Irrlicht math primitives to right-handed counterparts and then let the compiler find all of the references. But then I would still have to fix all of the incorrect transforms during file loading, and also, every single time that anything mysteriously failed, I would have to go hunting through the code for that one hand-inlined special case somewhere that continued to assume a left-handed coordinate system.
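
To see why the normals flip, here is a toy example using the simplest possible handedness change, a mirror across the XY plane, rather than the exact Irrlicht/Bullet conversion: the cross product of two mirrored edges comes out pointing opposite to the mirrored original normal, which is exactly what breaks normals and backface culling.

#include <cstdio>

struct Vec3 { float x, y, z; };

static Vec3 mirrorZ(const Vec3 &v) { return Vec3{v.x, v.y, -v.z}; }

static Vec3 cross(const Vec3 &a, const Vec3 &b) {
    return Vec3{a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x};
}

int main() {
    Vec3 e1{1, 0, 0}, e2{0, 1, 0};                        // two edges of a triangle
    Vec3 n_converted  = mirrorZ(cross(e1, e2));           // convert the original normal
    Vec3 n_recomputed = cross(mirrorZ(e1), mirrorZ(e2));  // recompute from converted edges
    std::printf("converted:  %g %g %g\n", n_converted.x, n_converted.y, n_converted.z);
    std::printf("recomputed: %g %g %g\n", n_recomputed.x, n_recomputed.y, n_recomputed.z);
    // Prints (0 0 -1) versus (0 0 1): opposite directions.
    return 0;
}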

It isn't insurmountable, but that's a lot of work and, in return, all I will have to show for it is a version of Irrlicht with extra coordinate system support. So, now, I'm going to play with the idea of writing my own OpenGL renderer just to see how that compares. It might even be fun.

Tuesday, August 7, 2012

I blog

because it's a socially acceptable way of talking to myself.

git

I'm really satisfied with git as a version control system. I don't really have much to compare it to other than the version of SourceSafe that came with Visual Studio 6 eons ago. I'm normally not much for celebrity endorsements, but if Linus Torvalds wrote git specifically so that he could maintain the Linux kernel, then that counts for something.

It's simple and fast. Most things can be done with a short command line or two. Pretty much all of its functions revolve around performing binary operations on directory trees: "binary" in the sense that you give it two directory trees and it performs some abstract operation, usually resulting in a new directory tree. Whether those trees live on the filesystem, in the repository, or are represented by a diff file is pretty much interchangeable. So, you can say, "Give me version X of the experimental branch. Now give me a diff that represents all of the changes needed to bring it to version Y."
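
In git's own commands, those two requests look roughly like this (the tag names here are invented):

git checkout experimental-v1                 # give me version X of the experimental branch
git diff experimental-v1 experimental-v2     # the changes needed to bring it to version Y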

The great thing about version control in general is that it's an effortless way of generating a complete history of the state of even a colossally huge source tree. Because the changes are stored differentially, rather than as complete copies, your consumption of disk space is mostly bounded by your typing speed. This gives you new confidence to experiment as much as you want without the constant fear of having to wait ten minutes to revert from a backup. You are no longer wasting tons of mental energy and brain space trying to remember all of the changes you need to back out if your program suddenly breaks. Because you have less to think about, this frees up your mental processing and effectively makes you a smarter programmer. Talk about useful.

Sunday, August 5, 2012

Awesome

this

Lambda

Ok. I can't overstate how great lambda functions are, and I have to say that with their arrival, the C++ language, the standard library, and boost have all finally come into their own as a whole. I mean this in the sense that only as of lambda do they finally accomplish what they were intended to accomplish all along.


#include <algorithm>  // std::fill, std::for_each, std::transform, std::max, std::min
#include <limits>     // std::numeric_limits
#include <utility>    // std::pair

struct aabb {
 aabb(const float (*vb)[3], const float (*ve)[3]);
 float dim[3]; // box size along each axis
 float pos[3]; // box center
};

aabb::aabb(const float (*vb)[3], const float (*ve)[3]){

 typedef std::pair<float, float> extent_t; // (max, min) seen so far, per axis
 extent_t ext[3];
 std::fill(ext, &ext[3], extent_t(std::numeric_limits<float>::lowest(), std::numeric_limits<float>::max()));

 // Widen the per-axis extents to cover every vertex in [vb, ve).
 std::for_each(vb, ve, [&ext](const float (&v)[3]){
  std::transform(v, &v[3], ext, ext, [](const float &f, extent_t&x)->extent_t{
   x.first = std::max(f, x.first);
   x.second = std::min(f, x.second);
   return x;
  });
 });

 // Each dimension is max minus min; the center is the max backed off by half of that.
 std::transform(ext, &ext[3], dim, [](const extent_t& x){ return x.first - x.second; });
 std::transform(dim, &dim[3], ext, pos, [](const float& d, const extent_t &x){ return x.first - d/2; });
}

This is my first attempt at a function to calculate the axis-aligned bounding box and center of an arbitrary vector array. And I have to say, this code is very, very compact, clear, and semantically dense. There is basically no semantic overhead here. There is no explicit loop control which might otherwise foster sign, comparison, and off-by-one errors (unless I were to forget that my 3D vectors are float[3]). I don't waste a ton of space specifying initialization for classes I'm only ever going to use once. This is about as declarative as it gets. Most of the syntax is spent on specifying types, and then the properly selected type goes along and does its thing without a lot of procedural writing on my part. In fact, aside from template calls, the only procedural code in here basically amounts to this:


std::fill(ext, &ext[3], extent_t(std::numeric_limits<float>::lowest(), std::numeric_limits<float>::max()));
x.first = std::max(f, x.first);
x.second = std::min(f, x.second);
return x;
return x.first - x.second;
return x.first - d/2;

Which means, in order: initialize the extent array to the opposite extremes (so that any real coordinate replaces them), find the furthest extent of each axis in each of two directions, compute each dimension as the difference between its two extents, and then calculate the center by backing off from the maximum extent by half of the associated dimension.

The standard algorithms were supposed to provide compact, simple loops, and now that we have lambda, they do, as opposed to before, when you had to go and write a functor class and remember to properly accept and copy all of the outer-scope variables. By the time you were done with that, you had something more complex, more verbose, and at least as error-prone as a traditional for(;;) loop. Suddenly the standard template library and boost fulfill everything they were supposed to do in the first place, and this is great, because that's a lot of really useful stuff.
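
To make the comparison concrete, here is a toy before-and-after that isn't from any real codebase: the same trivial accumulation written with a hand-rolled functor class and then with a lambda.

#include <algorithm>
#include <cstdio>
#include <vector>

// Pre-lambda: a named class whose only job is to carry "sum" into the loop.
struct accumulate_squares {
    float &sum;
    explicit accumulate_squares(float &s) : sum(s) {}
    void operator()(float v) const { sum += v * v; }
};

int main() {
    std::vector<float> v = {1.0f, 2.0f, 3.0f};

    float a = 0.0f;
    std::for_each(v.begin(), v.end(), accumulate_squares(a));

    float b = 0.0f;
    std::for_each(v.begin(), v.end(), [&b](float x){ b += x * x; });

    std::printf("%g %g\n", a, b);  // both print 14
    return 0;
}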

The really nice thing about declarative programming of this kind is that you spend most of your time specifying types rather than specifying logic or procedures. The compiler is great at finding problems with types, and terrible at the other. So, if you define a bunch of types, and pass them to each other in a type-safe setting, and the thing compiles, then chances are that you have a valid program that does something pretty close to what you intended because the pieces only fit together a certain way.

Saturday, August 4, 2012

Metaprogramming

... or: where does all the time go? So, I was moving along absorbing Blender code, and making glacial progress on the COLLADA exporter to begin with, when I noticed that I needed to iterate over some vertex lists, and I got to thinking about vector math. Vector math got me thinking about MMX. Thinking about MMX got me thinking about generic ways to generate optimized code, which is why I spent all day trying to remember how to use C++ templates. And C++ templates are a really great thing. The reason is that they essentially allow you to program the compiler to generate code for you.

C programmers are accustomed to this:

x + 1 + 2 + 3 + y

collapses to: x + 6 + y

...because the compiler can calculate the values of the constants and insert the result into the binary so that the executable needn't add 1+2+3 at runtime. What templates allow you to do is to write what are essentially programs that run within the compiler, doing work at compile-time instead of run-time. But rather than just operating on literal constants, you can work with just about anything that the compiler knows at compile-time, especially types.

So, for example, I can define template <typename T, T V, typename OP=op_null> class expression{...}; in terms of a fundamental type T (int, float, etc.) and a constant value V, where OP represents an arbitrary operation on V but defaults to a NOP. This allows me to declare this class as essentially a wrapper around, say, "const int x=123". Then, I can overload the operators on that class so that when I perform arithmetic on it, I get back a different type, which is still defined in terms of "const int x(123)", but with a different OP type, which performs an operation on V. So, say, --expression<int, 100>() will return a class which represents the subtraction of 1 from 100. So now I have an inline predicate that generates a sequence of numbers starting at 100 and counting down. But it does it at compile-time, not at run-time.

We can define classes in terms of constants and constant arithmetic, so that allows us to count. We can define classes in terms of each other, so that gives us recursion. Now comes template specialization, where I can specify a type which is a special case for, say, expression<int, 0>, so that when I'm done counting down from expression<int, 100> to expression<int, 0>, I can stop. So now we have conditionals within the compiler, which means we can do finite loops. Also, since we can define any OP class we want, and we can define OP classes in terms of each other, we can nest them to create sequences of "opcodes".
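
As a concrete illustration of the counting and specialization machinery (leaving out the OP chaining for brevity), a stripped-down sketch might look like this:

#include <iostream>

template <int V>
struct countdown {
    static int run() { return countdown<V - 1>::run(); }  // recursion: the compiler instantiates V-1
};

template <>
struct countdown<0> {
    static int run() { return 0; }  // specialization: the "conditional" that ends the recursion
};

int main() {
    // The compiler instantiates countdown<100> down through countdown<0>;
    // the optimizer typically collapses the whole chain to a plain constant.
    std::cout << countdown<100>::run() << "\n";
    return 0;
}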

So, having churned through all this, impressed with my ability to write a metaprogram that decrements a const int one hundred times but optimizes out to "x=0", I decided to compare the disassembly to a regular old for() loop and was surprised to find the exact same result. Way to reinvent the wheel. Anyway, at least I remember how to work with templates now. And I am steadfastly committed to finding an actual use for this eventually.

Thursday, August 2, 2012

Mars Curiosity

I'm looking forward to Curiosity's landing on Mars this Monday. The reentry and landing sequence is so ludicrously complex, I'm not optimistic. If it succeeds, it will be very exciting. I'm not aware that they ever did a complete simulation of the landing here on earth, and the chances of nailing something that complicated on the first attempt do not look good to me, but maybe that's why I don't work for NASA. Proverbial fingers are crossed.

Wednesday, August 1, 2012

Bullet

Bullet does more than just physics calculations, per se, because it has to. In particular, it maintains memory structures analogous to the visual scene graph, but for physics purposes instead. Like the renderer, it has an octree implementation which works by recursively subdividing rectangular volumes into eight subvolumes. So, at the top level, you will have a box-shaped region that contains everything, and it will be split into eight sub-boxes, each of which is split into eight sub-boxes, and on down the line. I don't say "split into eighths" because the splits aren't necessarily equal in size. Instead, they are chosen so as to attempt to subdivide and distribute the world elements roughly equally so that, at the top level, each sub-box contains roughly one eighth of everything, and then each of those sub-boxes is split into "eighths" in that same sense, and on down.

The goal here is that once you have divided up the scene, you can perform operations at whatever "box" level is needed, up to whatever granularity you need in order to accomplish a task, without having to iterate through every single item in the game universe. You can pick any arbitrary object at random and almost instantly find everything that is "near" it by looking at the octree node that contains it. If you don't find what you want there, you simply pop out one or two levels, and now you have everything that is "near" the initial search space.
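
For illustration, here is a bare-bones sketch of that kind of structure and query. It is not Bullet's actual broadphase interface, just the idea in miniature: walk down only into the boxes that contain a point instead of scanning every object in the world.

#include <cstdio>
#include <vector>

struct OctreeNode {
    float min[3], max[3];        // the box this node covers
    OctreeNode *child[8];        // eight sub-volumes, all null at a leaf
    std::vector<int> objects;    // ids of objects stored at this level

    // Collect everything "near" a point by descending only into the boxes
    // that actually contain it.
    void query(const float p[3], std::vector<int> &out) const {
        for (int a = 0; a < 3; ++a)
            if (p[a] < min[a] || p[a] > max[a]) return;  // point not in this box
        out.insert(out.end(), objects.begin(), objects.end());
        for (const OctreeNode *c : child)
            if (c) c->query(p, out);
    }
};

int main() {
    OctreeNode root = {{0, 0, 0}, {100, 100, 100}, {}, {7, 42}};
    float p[3] = {10, 20, 30};
    std::vector<int> nearby;
    root.query(p, nearby);
    std::printf("%zu objects near the point\n", nearby.size());
    return 0;
}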

The applications of octrees in collision detection are obvious, and this relatively coarse process of finding spatial neighbors is what Bullet calls the "broadphase". Things that are right up next to each other need to be checked for collisions, things that aren't don't, and octrees are a very efficient way of distinguishing between the two.

What I would like to do is integrate Bullet into the renderer so that the visual scene graph and the physical scene graph are the same because this saves memory and avoids array-copying. I don't know if this is industry-standard, but it seems useful, especially since I want to be able to stream the world from disk, and it saves the overhead of maintaining an extra octree. If I could use the same structure for collision detection, PVS, and occlusion culling, that would be great.

Physics

I'm almost ready to test basic physics exporting from Blender, and looking at its inner workings has been interesting. The code is very clean, much of it written in procedural-style C++. The codebase is huge, but it's very neatly arranged. It's exciting to peer into such a large, well-made product and know that you can arbitrarily add any features you like. Blender has all manner of commercial-grade features not unlike Maya or 3DSMax, but the source is freely available, and you can learn all sorts of things simply thumbing through it. I do wish it were better documented, though.

Moving onto physics simulation, this is a part that I'm looking forward to. I first got a sense of the sort of generalized simulations that are possible looking at the Battlefield 1942 initialization scripts and was completely amazed that all of the vehicles in that game were actually assembled from many individual simulated mechanical components. So, for example, a plane had two wings which generated lift and drag, and so forth, and a car had four wheels each with shock and suspension characteristics. All of the vehicles in the game were WWII-themed but, if you wanted to, you could, in theory, script an eight-wheeled car that would continue driving if overturned. To be completely zany, I wrote in a tremendous bomb that could be straddled and ridden about like Slim Pickens.

I don't know what physics engine DICE used, or whether the Refractor engine had its own in-house simulator. I will be using Bullet. Bullet is yet another completely outstanding free open-source product. It is released under the zlib license and is under constant development. The latest version incorporates OpenCL support, which is especially exciting because this basically unlocks all of the processing power available on a modern computer.

Battlefield was released in 2002. For comparison, a water-cooled $0.5 million Cray SV-1 supercomputer from three years prior would have topped out at 1 teraflop. Meanwhile, a single modern consumer Radeon GPU claims 1.2 teraflops for two or three hundred dollars today.

A modern CPU then adds a paltry 0.1 teraflops, but its strength is not in floating-point processing. A desktop CPU like the quad-core, hyperthreaded processor in my machine would probably be best suited to traversing octrees and running pathfinding and AI algorithms in eight parallel threads while the GPU performs physics and rendering calculations. Most likely with the cooling system groaning like a hair dryer all the way, but that's ok.

Here is what I'm getting at. When we look at Battlefield 1942, we see that it was basically harnessing all of the processing power available at the time to deliver really entertaining game mechanics -- and then DICE was bought out, and that same design has been recycled ever since, for about a decade. This is what always happens. id Software released Doom, which was followed by umpteen thousand clones and variations on that theme. But at least back then, some attempt was made at innovation: Doom had its Duke 3D, and Quake had its Half-Life. These days, it seems as if developers have gotten wise and realized that there's no need to take a risk on a new gameplay mechanic when the market will happily spend its money on the same basic design, over and over again. So, although the hardware keeps improving by leaps and bounds, to the point where we all essentially have little supercomputers sitting on our desks, the game doesn't evolve, because all of the processing power gets spent on making it look as if the player models have real skin and hair.