Thursday, November 29, 2012
Moustache transplants
Saturday, November 17, 2012
Too Much Computing?
There's something that's been bothering me for a while. Computerized voting. Why? Because nobody knows what's inside a computer. Literally. Microchips are tiny, and they consist of billions of minuscule structures photographically etched onto a sliver of silicon the size of a coin which, further, is encased in ceramic or polymer resin.
So, unless you have shaved open every chip in the voting machine and then analyzed it under an exotic microscope layer by layer, you don't know what was inside the machine you used to vote. You don't know what was in it before you voted, nor, more importantly, what it recorded while you voted.
The trouble is that there are many avenues for the introduction of flaws at various points in the production and supply chain for the machines and their software. Add to this the fact that we vote by secret ballot in the US, and there's really no practical way to verify the validity of anything that comes out of an electronic voting machine, because all it contains is a pattern of electrical charges that is only meaningful within itself, and to its manufacturer. It can claim anything it was programmed or wired to say, whether by mistake, or by malicious contrivance. On the other hand, a paper ballot has minute details which, though they might not tie it to a specific individual, do distinguish the ballots from each other. Handwriting, for example. So, I don't see what's wrong with using old-fashioned pen and paper. Actually, I think voting is a rare instance where pen and paper are vastly superior to any mechanical or electronic method.
Tuesday, November 13, 2012
More Video
I think it's kind of strange and interesting how the GPU industry is turning PC performance metrics on their head. My new nvidia GTX 650 Ti will bring my computer's processing capacity to somewhere around 2 teraflops, with 1.8 of that residing on the GPU alone. Computers of any kind did not reach half of that capacity until 1996. I find that I never get tired of reveling in the idea that my ordinary middle-of-the-road gaming PC is computationally equivalent to a machine for which an entire research department or university would have paid millions of dollars as recently as within my lifetime.
I'm also struck, however, by the surprising asymmetry. There are 768 parallel execution units on this GPU. That is, it has hardware support to perform seven hundred sixty-eight simultaneous operations at any given time. It's like a supercomputing cluster on a PCIe card. In some ways, I wonder if it isn't the PC which is becoming a peripheral to the video card. I think there will come a time when the PC is relegated to little more than the role of IO backplane, while all of the interesting things happen elsewhere.
To answer why this is, I think we have to consider the burden imposed on x86 architecture by the desire for backward compatibility. As far as I know, it wasn't until the adoption of the amd64 architecture that the PC began to discard legacy features, and even then, this is only in 64-bit "long mode". This means that, in addition to all of the advanced hyper-modern features available on an amd64, there is still all of the supporting circuitry needed to run applications from 1978. Try booting to DOS sometime. It works perfectly. I suspect that CP/M would be no different.
I don't understand why this is. Can't Intel and AMD sell special legacy kits for people who want to run ancient software? Why not just emulate old computers entirely in software? There is absolutely no shortage of processing power to do that, plus there are substantial benefits to virtualization, like snapshotting, as well as the extension of the modern OS's capabilities into the emulated environment.
I think that, at some point, Intel and AMD should simply start over from scratch and design a chip around a modern instruction set. With the recent progress on virtualization technology, I tend to hope that this is the direction in which they are headed already. Even as things are, if I needed to run some sort of x86-based PLC written in QBASIC, I would have no problem emulating several dozen such machines entirely in software. So what is the point of hamstringing the transistorspace with three decades' worth of backward compatibility? Because a PC CPU has, not some, but ALL of the features that were ever implemented over the course of my entire lifetime. 8086, 286, 386, 486, and on, and on, and on. All of those features, many of which are non-features to me, as well as most users, take up space and waste energy.
The GPU, on the other hand, is not burdened by any such issues, and can be designed entirely for raw performance. So, I think that if the results which GPU designers are achieving are any indication, then it means that we're designing CPUs wrong.
Monday, November 12, 2012
Video
My previous video card was an nvidia 8800 GTS from the now-defunct BFG Technologies. Before they went out of business, they sold a lot of nvidia GPUs with extended warranties standard. A great many of these cards burnt out, as attested by the widespread complaints which can be found on the Web. My card was one of these.
This is the only bad thing that I can say about nvidia and, in reality, it's hard to say whether the fault can be attributed to nvidia at all since, at least in my case, it was a factory-overclocked card. This was the reason for the extended warranty. The reasoning was supposed to go "Sure, the chip is overclocked, but it comes that way from the factory with an extensive custom heatsink, and the engineering and workmanship are guaranteed". Which is a fine theory until you realize that once things begin to go south, they can simply close up shop and run, which is what they did.
So, anyway, aside from this one thing, which may not be an nvidia problem at all, I can only say good things about the company. In particular, their Linux support is first-rate. It's really good to see a prolific, leading hardware manufacturer to whom Linux is not some sort of stepchild.
I get the feeling that a lot of companies dedicate a single evening meeting to Linux where they go "Ok, so $x thousand dollars should be a suitable Linux budget for the period ending FOREVER, and if that should chance to result in a driver which works at least some of the time, for some applications, then great. And I ordered mocha, not decaf." And again, I realize that Linux isn't important to everyone, but I really, really like it, and I think it would greatly further the computing industry both in terms of culture and productivity for Linux to be brought further into the mainstream.
Sunday, November 11, 2012
VirtualBox
I really like VirtualBox so far, but there are a few caveats. First of all, I find that I have to use PIIX chipset emulation because ICH emulation (which is marked experimental) dies frequently and nastily running Vista. This problem so far appears completely resolved running an emulated PIIX board.
The other thing is that installing the VirtualBox extensions on Vista is a pain because Vista locks d3d9.dll, so the VirtualBox installer can't replace it with a paravirtualized version. The solution is inconvenient. First off, you have to mount the VDI file containing the Vista install and rename d3d9.dll, so that Vista stops locking it immediately upon boot. However, contrary to the instructions I've found scattered online, I've been completely unable to mount VDI files directly. Instead, I use:
ionice -c 3 VBoxManage clonehd --format RAW in.vdi out.img
and this converts the Virtual Disk Image file to a raw disk image which, as far as I know, is just the concatenation of all of the data on the disk as it would appear on a physical disk. "ionice -c 3" is optional but, on my system, it prevents the IO operation from hogging all of the system's IO-time, so that I can continue doing amazingly important things in the foreground, such as this blog entry.
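If you want to sanity-check the conversion before going further, VBoxManage can report the source image's logical capacity, which should match the byte size of the raw file (file names as in the example above):
VBoxManage showhdinfo in.vdi
ls -l out.img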
Great. So, now we have a disk image, but we can't mount an entire partitioned disk. So we have to do:
parted out.img
u b
p
and that gives something like:
Model: (file)
Disk /home2/wine/out.img: 69632786432B
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start     End           Size          Type     File system  Flags
 1      1048576B  69631737855B  69630689280B  primary  ntfs         boot

where the value in the "Start" column is the byte-offset into the image at which the first and only partition begins. The intervening data is probably stuff like the partition table and MBR.
FINALLY. We can do:
mount -o loop,offset=1048576 out.img /mnt/vbox
mv /mnt/vbox/windows/System32/d3d9.dll /mnt/vbox/windows/System32/d3d9.dll.bak
umount /mnt/vbox
At this point, we discover that VirtualBox cannot directly load raw disk images, so we do:
ionice -c 3 VBoxManage convertfromraw out.img nod3d.vdi --format vdi
And now, finally, we have a vdi file which is exactly like the one we started with, except for the name of a single file, and now we can boot to Vista and install VirtualBox Guest Extensions, complete with the experimental D3D driver, which I hope will work instead of bluescreening every few minutes.
Incidentally, I've somewhat accidentally hit upon a neat strategy for speeding up my virtualized Vista install. I've found that placing my base disk image on my SSD effectively separates my base Vista install from all subsequent snapshots, whose location defaults to my home directory, which is an old platter drive. The base install is 21(!!!) GB excluding the pagefile, and this is mostly libraries and executables that never change, so the SSD is a great place for this data. But because I always run this disk image from a differential snapshot located on a platter drive, the base image is effectively read-only, so I don't have to worry about Windows thrashing my SSD to death with constant write operations. Best of all, I don't have to worry about shoving a non-standard configuration into Windows, which is apt to break irreparably the moment you do anything out of the ordinary with it. It's all handled by Linux and the VirtualBox hypervisor. As far as Windows is concerned, it's just writing to a single regular hard drive when, in fact, all of the system files are read at high speed from an SSD, while all of the pagefile, document, and application I/O is quietly routed to a hardy old mechanical drive. And I can run Windows without listening to the tortured disk-head grinding I've come to associate with it.
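For reference, the arrangement is nothing more exotic than a relocated snapshot folder plus a single snapshot. A minimal sketch, assuming the VM is named "vista" and the platter drive holds /home2 (both names are just examples):
VBoxManage modifyvm "vista" --snapshotfolder /home2/vbox/snapshots
VBoxManage snapshot "vista" take "pristine-base"
VBoxManage startvm "vista"
After the snapshot is taken, the base VDI on the SSD is never written again; every new write lands in the differencing image on the platter drive.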
Monday, November 5, 2012
PunkBusting in WINE
The infamous PunkBuster anti-cheat system uses debugger-like techniques and is similar in function to a typical antivirus, except that it is dedicated to game cheats rather than the usual types of malware. It is a client-server arrangement. The PB server runs alongside the game server, and the client lives in the background on each player's machine. Mostly, the client sits and watches the game's memory space searching for various anomalies which it considers to be indicative of the presence of a cheat. If it finds one, it reports the player to the server, which then decides on a corrective action which can range anywhere from a warning, to ejection, to possible global blacklisting, depending on the nature of the fault.
You can disable PunkBuster on your client if you want to, but most servers will treat this itself as a minor infraction, and will quickly kick you to ensure that you don't interfere with the other authenticated players.
PunkBuster installed under WINE without any complaints, though, strangely, I had to assign it a library-mapping exception directing WINE to use its built-in version of crypt32. It's odd that it required the WINE-version of the crypto library to run, but it installed fine. The question of whether it works, on the other hand, is a different matter.
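For anyone trying to reproduce this, the crypt32 exception can be set per-run with an environment variable or made permanent through WINE's DllOverrides registry key (the installer name here is hypothetical):
WINEDLLOVERRIDES="crypt32=b" wine pb_setup.exe
wine reg add 'HKCU\Software\Wine\DllOverrides' /v crypt32 /d builtin /f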
I have identified three background processes which are important PB components. These are PnkBstrA.exe, PnkBstrB.exe, and PnkBstrK.sys. PnkBstrK.sys is a Windows driver, and this is the component which fails upon joining the server. When I think "Windows driver under Linux", right away, I expect that this is going to be trouble. After some cursory digging around, I found, to my surprise, that the driver is successfully loaded by WINE and it runs in the background as it should until it is called upon to perform an unimplemented NT kernel call, at which point it dies.
The call in question is PsLookupProcessByProcessId(), which is easy enough to implement, but the real trouble is the subsequent call to KeStackAttachProcess(). The dearth of documentation for this call is not surprising, but from what I gather, its purpose is to attach the memory space of a user process to the driver so that it can go digging around in it. Given the nature of PunkBuster as an application, it's fairly clear what the driver component is trying to accomplish before it errors out. It's saying "Excuse me, ntoskrnl.exe, I would like to scan X process' address space", and WINE says "I don't know how to do that". PunkBuster fails its scan, and I get kicked from the server.
Not being able to run PunkBuster is a fairly severe limitation on WINE given that, at least in my view, its main purpose is to enable Windows games. So, I think it should be a fun challenge to develop a patch that makes it work.
The first problem at hand is the issue of address space collisions. Under Windows, every user-process' address space is partitioned into two areas. The low area belongs to the userspace process, and the high area is mapped into systemspace. One consequence of this arrangement is that a systemspace process can map any entire userspace image into its address space, in its original position, without colliding with or overwriting any of its own data. This is what I assume KeStackAttachProcess() is supposed to do. And although it's easy enough to implement using shared memory, the first thing I need to do is modify WINE so that its "systemspace" loads into a reserved area mirroring the Windows behavior.
Certain WINE components are always required for every process, driver or otherwise. These are "wine" and "wine-preloader", which I have successfully relocated. Now, I just need to modify the PE loader to recognize a "systemspace" load, and make it offset all of the dll images appropriately...
Adventures with WINE Part 2
Many modern games require a Web browser for setup, updates, and multiplayer. So, my first priority was to get a working Web browser. WINE comes with its own Internet Explorer replacement which is based on Gecko and nominally works, but it doesn't have all of the features of MSIE. More to the point, many apps embed Internet Explorer as a component for menu rendering and Internet I/O, so having a working form of IE is important.
Using WINE you can, in fact, install the real Internet Explorer 8 from Microsoft and run it under Linux. It doesn't work perfectly, but it works well enough to host the plugins you need to launch games. Windows Firefox, on the other hand, does seem to work perfectly. It works exactly as usual, even installing and running plugins from the Web. Still, it's important to get IE up and running because some applications might stubbornly embed IE in Firefox(!!) simply for the sake of browser-portability. So you STILL need a working IE, and installing MSIE 8 using WINE's "winetricks" script is probably the best way to do it.
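For the record, that amounts to a one-liner, assuming a reasonably current copy of winetricks; it fetches the IE8 installer from Microsoft and runs it in the current WINE prefix:
winetricks ie8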
So, now I have two working Windows-based Web browsers running under Linux, and this is pretty good. However, the next problem I notice is that SSL is broken in IE. SSL is critical for games because the portal utilities, or whatever they're called, need the cryptographic layer for licensing and authentication. You can get around this problem, again, thanks to winetricks, which automates a large number of WINE tasks, many of which revolve around automatically downloading and installing "native" Windows components.
I found that tinkering with native versions of various Internet-related libraries allowed me to get SSL working in IE: wintrust, secur32, crypt32. The native Windows versions represent a complete implementation in contrast to the WINE versions, and so they work better. They aren't included by default because they are owned by Microsoft, but nothing prevents the user from obtaining them from Microsoft.
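In concrete terms, that tinkering looks something like this (the winetricks verbs are assumed to mirror the library names, so check your copy's verb list):
winetricks wintrust secur32 crypt32
Or, with the native DLLs already copied into the prefix's system32 directory, flip the overrides by hand for a single run:
WINEDLLOVERRIDES="wintrust,secur32,crypt32=n" wine iexplore https://www.microsoft.com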
I want to start out with something basic, so I decide to try out Battlefield Heroes. I use Firefox to navigate to the game's Website and it happily downloads and installs all of the necessary plugins and applets. The game downloads and updates itself. This is getting exciting. It launches, it runs, and the framerate is good despite a few console warnings about unrecognized shader definitions. So now, it's time for the moment of truth: multiplayer. And guess what else? It works. For all of about three seconds, at which point the server ejects me for a PunkBuster failure. And so begins the next chapter.
mmap() is Great
Part of the attraction of messing with WINE, beyond its practical benefits, is simply the challenge involved. There is also my fascination with the inner workings of computers, and the idea that when you contribute to a really low-level project, the implications can be very broad and far-reaching. In the case of WINE, every feature you add has the potential to open compatibility to dozens or even hundreds of applications. There is an educational aspect, too, as I've picked up details about Windows and Linux that I wouldn't have learned otherwise.
Foremost of these is the concept of mapped memory. This may be the greatest programming construct that nobody ever talks about. Under the unices, this is accomplished with the mmap() function, and under Windows, the equivalent is CreateFileMapping(). The basic idea is that you can create arbitrary correspondences between a process' address space and a file. That's all. It's very simple, and very useful. Of course, this is especially powerful under the unix-like systems where the devices are all files. So, that earlier statement abstracts to encompass the creation of arbitrary correspondences between address spaces and devices. For example, I can map a one-megabyte block of addresses to /dev/zero, and I instantly have a virtual megabyte of zero-filled memory. Because it's mapped to a device, that address space doesn't take up any physical memory until it's touched. The address space itself is allocated, but it is private to the process. On a modern 64-bit machine, every process has an enormous virtual address space (on current x86-64 hardware, 2^47 bytes of it are usable from userspace), and you can map it to anything you want.
A more useful example is to map a huge address space to an equally huge game world stored on disk, and then the operating system takes care of the details of paging the world in and out of memory while, to your application, it appears exactly as if all twelveteen dozen gigabytes of the file are resident in memory. If you map to things that aren't memory, then there's basically no limit.
Incidentally, this is how Linux and probably other OSes handle shared libraries (dlls). Rather than allocating a chunk of physical memory for each dll for each and every process that uses it, instead, the disk file containing the library is simply mapped into memory, so that in reality all of the processes are referencing the same physical copy. That way, a dll may not even be present in RAM, except when needed, and even then, it may only be the parts of it that are in use.
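You can watch this sharing from the shell, since each process' memory map is exposed under /proc; every process that uses libc shows the very same libc file mapped into its address space. A trivial illustration:
grep libc /proc/self/maps
grep libc /proc/$$/maps
The first command shows the mappings of the grep process itself, and the second shows those of the shell you typed it into; both point at the same file on disk.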
Adventures with WINE
WINE is not the kind you drink. It is, instead, an expansive Windows quasi-emulation layer for Linux and UNIX-like operating systems. Although the name originally stood for "WINdows Emulator", the developers prefer to avoid that word because WINE doesn't strictly emulate anything in a formal sense. Instead, it creates an environment which is binary-compatible with Windows from the ABI and API on up. There is no machine virtualization. The Windows binaries are loaded into memory using a custom loader, and they execute natively on the CPU just like a UNIX binary, but with a ton of dynamically-linked shims and supporting libraries.
Just as impressive is that this all happens entirely in host userspace. WINE simply maps Windows functionality to corresponding UNIX actions using libraries which take the place of native Windows counterparts. So, for example, WINE's gdi32.dll has most of the expected Windows drawing functions, but instead of calling the NT Kernel to draw to the screen, it performs the equivalent tasks by calling the X server. The Windows application thinks it's running in Windows because it sees Windows interfaces, but those interfaces accomplish their jobs in a completely different way than Windows.
It might be an apt analogy to say that WINE is to UNIX what MinGW is to Windows. It amounts to a compatibility layer, and some of the Windows functionality does have to be simulated in some way, and that does amount to overhead. However, Linux is so much faster at various tasks, that the difference should in many cases be a wash, though WINE may be faster or slower depending on the activity in question. The main benefit, though, is the flexibility that Linux offers and being able to take advantage of it from Windows programs. I also just really like the idea of being able to play games without having to boot outside of my favorite operating system. My Linux installation is like a luxurious den with a La-Z-Boy and an attached workshop, so why would I want to leave to launch other applications?
So, WINE, as I said, is basically an impossible project. But that doesn't deter me from stubbornly attempting to run the latest games with it. And in actuality, the process of learning to work on WINE has felt sort of like building a ship in a bottle. My goal is to get some current FPS games running, and it takes a good bit of work.
Saturday, October 20, 2012
Chromium
I like Google Chrome because it's fast, and it comes from my favorite Web search company. However, for whatever reason, the prebuilt packages from Google and Debian have rendering problems on my system. Maybe it's the amount of other messing around that I have done throughout the system which is somehow knocking the widget offsets out of whack. In any case, it looks awful, and is often unusable.
The solution to this is to download Chromium, the open-source version of Chrome, and build it. This ensures that the headers used to build the program correspond to the specific versions of software on my system, which helps to ensure compatibility. It works great now, but, as usual, there is always one obscure glitch which causes me to waste most of an entire day on an otherwise simple project.
As it turns out, Chromium has a stubbornly recurring bug which will probably crop up periodically forever because of the way Chrome's sandboxing is done under Linux. Basically, it immediately performs a system call which disables almost all further system calls from the sandbox process. This contributes to security by greatly limiting the sandbox process' access. However, every now and then, a forbidden call will somehow work its way in during development or building. In my case, this causes the browser to crash instantly on all pages, including all of the status and config pages.
The correct way to fix this would be to weed out the offending call. I'm way too lazy to do that, and prefer to simply disable that particular layer of the sandbox using "--disable-seccomp-filter-sandbox". Because it's a security-related thing, this probably explains why I had to go to so much trouble to find the answer. Really, though, since this feature isn't even available under Windows, I'm not going to worry about it.
Friday, October 19, 2012
Serious question
For everyone who reads my blog, which I know totals to roughly one person, counting myself. Is there something wrong with me if I have to compare Chief Miles O'Brien and Piers Morgan side by side in order to tell them apart? Even then, I have to rely mostly on age difference to tell them apart. Maybe the nose and chin are slightly different. Seriously, for the longest time, I thought that Colm Meaney had diversified his career into talkshow and reality TV hosting until I realized that they are two completely different people.
Wednesday, October 17, 2012
ALSA Sound System
Currently, ALSA is the standard Linux sound system. I'm not sure how I feel about it. On one hand, it works and it's highly configurable. On the other hand, its configuration system is a labyrinth of semidocumented hacks on top of hacks, which would probably have been better implemented using a real scripting language like Lua, or bash for that matter. It seems intuitive enough at first glance, such that you might mistake it for a list of assignments within a simple object-oriented property graph. But as soon as you try to use it, you will be instantly bewildered by all of the seemingly intuitive things that will simply not work, or which mean something other than what you think. Even after writing the configuration file for my system, I still don't fully understand exactly what the commands do. As far as I can tell, you are assigning strings to key values. Sometimes those strings are references, by name, to other configuration blocks. Sometimes those strings are themselves inlined configuration blocks. Sometimes one will work, but not the other. It's ridiculously confusing, and if it's going to be this difficult to use, then someone needs to write a configuration front-end, because this was like reverse-engineering a primitive programming language that I am probably only ever going to use once.
So, with the complaining out of the way, here is why I needed a custom configuration file. I have a Soundblaster X-Fi. The trouble is that this card is intended to be used via its six analog outputs, which are usually lauded by reviewers as an excellent value for consumer-grade D/A devices. Unfortunately, my Surround receiver doesn't have analog inputs, so I have to use SPDIF. SPDIF is digital two-channel sound and, as far as I know, the X-Fi does not do hardware encoding. This script, when dropped in /etc, configures ALSA to provide an "a52" device backed by a filter stack that performs the requisite on-the-fly encoding. Applications see a stereo output, but it is upmixed to 5.1 and then encoded into two channels, suitable for a Surround receiver to decode back to 5.1. When I get around to it, I will add another device that encodes 5.1 to SPDIF directly without upmixing. The configuration file is here.
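Once the config file is in place, a quick smoke test from the shell looks something like this, assuming the PCM defined in the file is named "a52" as described above:
speaker-test -D a52 -c 2 -t wav
aplay -D a52 /usr/share/sounds/alsa/Front_Center.wav
The second file path is just the sample that Debian's alsa-utils package ships; any stereo WAV will do.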
More Desktop
Now I remember what I disliked about xfce. I had a problem saving (or restoring?) my session state between logins. I don't know what caused that, but this time around the problem is gone and I really like xfce. My only complaint is that there is no generalized drag-and-drop for menu and taskbar customization. Everything else is great.
Better yet, the xfce window manager can be switched out for Compiz. Compiz has a reputation for bloat, but I suspect that's only if you enable a bazillion useless bling plugins. With basic settings in place, I find that desktop response is even faster, because now the windows are being rendered in hardware, offloading work from the CPU. Dragging windows around the desktop is glassy-smooth, unlike in fluxbox, or some other 2D WM. I don't consider transparency a useless vanity item. I think it's really handy to be able to see through all of your non-focused windows because even though you might not be able to see them clearly, you can still see where they are relative to each other, and you can spot updates and changes in processes you might have running in the background.
There's probably some stuff in Compiz that looks neat but isn't strictly useful, and those things can be ok, so long as they are related to momentary things, like workspace switching and so on. If there's a feature that causes all of my windows to be continuously rendered as if they are on fire, or something, I can do without that.
Next up, I want to enable display management with fast session and user-switching, but I would rather do it without loading the entirety of the GNOME Desktop environment.
Desktop
Well, now that my disk storage fiasco has finally come to a close, I'm back to thinking about my desktop environment. I've decided against gnome because it's bloated and the file manager is possibly buggy. KDE is an extremely sharp-looking feature-rich environment, but it is insanely bloated, well beyond gnome. It leaves various Python instances running in the background, which take up tons of memory. Python is fine for transient processes such as configuration scripts, but not for something that is going to stay memory resident or run in the background.
fluxbox is lean, with a good feature set, but it's kind of annoying that all of the configuration is in text files. It occurs to me that I should probably be blogging more because I can't remember what I disliked about xfce, but I'm sure it was something. lxde, also, was alright. I want to dig around in fluxbox for a while and make sure I've given it a due chance because I like its simplicity. I want a setup that allows me to group like programs in the taskbar and maybe drag like windows into each other for tabbing. Eyecandy isn't a big factor with me, but I do like the idea of some practical transparency so that you can see through windows when positioning them.
Monday, October 8, 2012
Current events
The economy appears to be in dire straits indeed, as one of our senators has presented his initiative to exploit the nation's untapped livestock resources in his proposal to kill and eat Muppets. Not since the CDC's acknowledgement of the ongoing zombie threat has a government initiative been so relevant. Politics are inherently surreal and disturbing, and it's good to see that someone is confronting the real issues head-on.
My only question is: who is going to regulate the production of muppet byproducts, and where is the USDA on this? In related news, I wanted to note my approval of the President's beer initiative. Nothing says "Everyone calm down and have a beer" like building a brewery in the White House basement. I hope that this is what they mean by "progressivism". I'm not in favor of more controls or more taxes, but the government certainly needs more beer.
Friday, October 5, 2012
An open letter to hoteliers
Thursday, October 4, 2012
Ok. It only gets worse. How in the world did we manage to land a rover on Mars when we can't even write a correct manual for a desktop computer? I shouldn't have to take a scanning electron microscope to every piece of equipment I buy just to figure out how to use it. Nonetheless, look at this. If we go to Intel for specs on the X58 chipset, we get this. Ok. PCIE 2. Great! So, now let's click "Compare chipset components" for further details. "PCI Express Revision 1.1". What the hell, seriously? So, which one is it? More puzzlingly, how is it that the same group of people who built a machine that performs billions of mathematical calculations per second is unable to consistently distinguish between two two-digit numbers when it comes to writing the very literature on which the sale of their product depends? The dichotomy makes my head spin.
So, now let's click on the link for "product brief". Look! A block diagram. Those are usually quite illuminating. We see PCI V2.0 from the X58 and "500 MB/s each x1" for the ICH 10. 500MB/s per lane corresponds to PCIE V2.0's 5GT/s signaling rate, so great! PCIE V2.0 all around! So, what do I have to do to enable it? The mystery continues.
Well, this is irritating. So, I ordered a RAID controller card for my SSD drive thinking that it would enable the full throughput of the drive because the on-board SATA controller on the Dell XPS 435MT is only SATA v2 at 3Gbps, or half of what I need. So, my RocketRaid 620 card promptly arrives, and I throw it in the machine, and I find that despite the SATA link being v3 at 6Gbps, my throughput has nonetheless dropped from roughly 0.25GB/s to 0.18GB/s. So, I try all manner of sensible things to get it to come up. I build and install the OEM drivers from HighPoint. No improvement. I flash my RAID controller BIOS, which disappointingly provides an enhanced config menu at the cost of AHCI support (what?). It seems like a downgrade, if anything. I install the RAID management program. I assign the SSD to a 1-disk "array", in case RAID-mode is faster. Still no improvement. So, I begin reading up on my system specifications to ensure that my system supports PCIE 2, which one would hope, since the machine was sold almost two years after the publication of the standard.
One would expect that the answer to such a basic hardware question would be readily available, and one would be completely wrong. Why? Because, once again, the marketing department's whims thoroughly triumph over common sense. Ride along with me on this completely fruitless and stupid journey. Our first stop is Dell's support site at support.dell.com. Typically, when we need specs on a piece of hardware, we look in the manual, where hardware specs belong. Let's see. We have one PCIE x16 slot and three PCIE x1 slots. Great. The multiplier tells us how many channels or "lanes" are provided by each slot, but we still need to know the PCIE version because our expansion board is x1 and I'm not going to swap out my x16 video card anyway. At any rate, we still don't know whether the slots are v1 or v2, and nowhere in any of the manuals does it specify.
Well, since we aren't getting any help from the manufacturer at the product level, we can "drill down", as they say in the corporate world, to the component level and look for assistance there. So, I look up the parts list for my machine, in search of the motherboard model number, and I get gibberish like this:
N575K QTY 0 SERVICE CHARGE..., HYPERTEXT MARKUP LANGUAGE...,
INTEL..., LAN ON MOTHERBOARD..., 435MT
I interpret this to mean that I was assessed a service fee for a zero-quantity of part number N575K which consisted of HTML, an Intel something-or-other, and integrated ethernet. That makes perfect sense, or whatever is the complete opposite. Nevertheless, by searching for all occurrences of the word "motherboard", I was able to find the one line item which actually does refer to the motherboard, and I determined that the part number is R843J. This is great! Maybe now I can finally get some support. Not to make a long story even longer, this is the "specs" page for the motherboard in question. It does not help. At all.
So, having completely struck out on support from the manufacturer at both the product and component levels, we can begin digging after subcomponents in the hopes of answering the timeless riddle of what type of PCIE support I have. Cursory Googling does not provide any direct answer, but it does reveal that the system board uses the X58 and ICH10R chipsets, which are roughly analogous to a north and south bridge on older machines. We can go to Intel for specs on these and behold, wonder of wonders, a manufacturer who actually bothers to document their products. This is expected, since Intel is perhaps the lead developer of the PC architecture, and they would not maintain that status if nobody could locate or make sense of their documentation.
Homestretch? No way. As it turns out, in a strange twist, both chipsets support PCI Express. The X58 does v2, and the ICH10R does v1.1. So the question becomes: to which chipset is each slot connected? That would have been a question for the motherboard manufacturer, whom I would at this point describe as having achieved a depth of inefficacy so deeply nested as to constitute a fractal of pure fail.
Now I'm convinced that not even Dell knows which slots on the XPS 435MT support PCIE 2.0. So, let's ask Linux. We can do lspci -t, which gives:
...
\-[0000:00]-+-00.0
+-01.0-[07]--
+-03.0-[06]--
+-07.0-[05]--+-00.0
| \-00.1
+-14.0
+-14.1
+-14.2
+-14.3
+-19.0
+-1a.0
+-1a.1
+-1a.2
+-1a.7
+-1c.0-[04]--
+-1c.3-[03]--+-00.0
| \-00.1
+-1c.4-[02]----00.0
+-1d.0
+-1d.1
+-1d.2
+-1d.7
+-1e.0-[01]--
+-1f.0
+-1f.2
\-1f.3
Using the output of lspci -vvv for further reference, we know that [03] is the RAID controller, and it is connected to function 3 of device 1c, the relevant parts of which look like this:

0000:00:1c.3 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 4 (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
        Capabilities: [40] Express (v1) Root Port (Slot+), MSI 00
                LnkCap: Port #4, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <256ns, L1 <4us
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
        Capabilities: [90] Subsystem: Dell Device 02c9
        Kernel driver in use: pcieport

Right away, we can see that this makes much more sense than a Dell parts list. We have a PCIE v1.0 slot which is capped at 2.5GT/s, or roughly 250MB/s of real throughput per lane. Well, this sucks. However, I'm still not done, because when we go back and look at the bus tree, we see that bus 5 hangs off device 07.0, one of the X58's root ports, and I happen to know that bus 5 is my PCIE v2 x16 video card, which does operate at 5GT/s per lane. It has neighbors in the tree which are empty, so maybe there is still some slim chance that the drive controller's slot can be routed to the X58 interface, though it seems unlikely. It seems similarly unlikely that someone would have routed PCI slots around an available fast interface to a slow one, but I've yet to figure out whether that's what has actually happened. I think the primary takeaway from this, as the business school kids say, is that crappy documentation wastes time.
Thursday, September 27, 2012
SSD
This is another one of those posts where I marvel at the amazing pace of hardware technology. I just purchased a 120GB OCZ Vertex 3 solid-state drive. This thing has a throughput of 0.5 GB/s. That's right. It can dump the entire contents of a dual-layer DVD in roughly 17 seconds. To put that further into perspective, this drive is roughly as fast as four or five ordinary drives operating in RAID 0 in terms of throughput alone. And because the drive is a departure from the mechanical drives which had previously been the standard for decades, it has no moving parts and therefore essentially zero seek time. There is no physical distance between any two addresses on the drive, and all accesses have the same latency, which, between the SATA controller and the Sandforce chip, is close to zero for most purposes anyway.
The only drawback with SSDs is that each page can only be written around 3000 times over the lifetime of the device. This is keeping in mind that mechanical drives are also limited by the physical endurance of their spindle and head bearings. Just the same, though, I want to minimize writes to my SSD, so here is what I did. I split my root filesystem between my SSD and my old 0.5TB mechanical drive. I divided things up thusly:
SSD
- /etc
- /usr
- /lib*
- /bin
- /sbin
- /opt
- /boot
HDD
- /home
- /root
- /var
I run du -xshc / and I get 3.9GB. What? That's right, I'm running Linux, so despite having thousands of programs installed, I could theoretically cram absolutely all of them, and all of their libraries, plus the kernel and all of the system programs, plus most of my system settings into system RAM simultaneously and still have two gigs free, which is incidentally what I feel like I have left over when I'm running a completely idle Vista system.
Of course, in real life, the system only loads what it needs, which is why my system memory is nearly empty all of the time. And so ends my one hundred and thirty fifth thesis on why Vista is a useless bloated carcass of an operating system.
Here, I'm basically reiterating a theme that comes up frequently in my general ponderings about pretty much everything. If you only ever see things one way all the time, you begin to take it for granted, and pretty soon what you have becomes invisible. Then, one day, you find something new for comparison and you wonder what in the world you were thinking all along. In this case, I've been wondering where all of the hardware innovations go. They go to waste on operating systems whose primary domain of innovation is to find new ways to squander resources. You see, OS developers have timelines and budgets, and we wouldn't want to burden them with the trouble of applying common sense, because that takes effort. What we do, instead, is we drop a fat, sloppily under-engineered, over-managed project in the user's lap and let their hardware dollar compensate for a profoundly shoddy product. As long as nobody ever tests an alternative, they never notice that the quality of their OS is in freefall, because the continuous progress of hardware technology cancels it out.
Wednesday, September 26, 2012
iPhone
Tuesday, September 18, 2012
Desktops
I now count myself among the ranks of GNOME refugees. I liked GNOME 3 until I realized that Nautilus will report spurious, non-specific "file not found" errors when network-copying large file trees. Since I'm using Linux to avoid exactly this sort of thing, and GNOME is basically built around its file manager, I decided to ditch it for xfce, which happens to be slated as the default desktop for the next Debian release.
I like it. It's simple, it's clean, and the controls are attractive enough, although the default icon set could stand some improvement. Best of all, I don't know how else you can get a windowed environment on a modern computer which idles at under 500MB of RAM, or roughly half of what Gnome requires.
The main trouble, under Debian/squeeze, is that because it isn't the default desktop yet, there is no metapackage, and you have to go hunt down various apps and features which would have otherwise been loaded and preconfigured as a package-manager task alongside GNOME.
Thursday, September 13, 2012
Please wait...
I just wanted the world to know that I'm currently attempting a Wine install of Battlefield 3 while running Windows Update in a VirtualBox VM, all under a custom kernel in Debian, with my home-brewed Radeon driver, and this is the sort of thing that I find terribly exciting.
On one hand, I expect the Wine install to fail simply because reimplementation of the Windows API is such a huge and literally interminable task. On the other, I expect VirtualBox to run Windows apps perfectly, but at a significant performance overhead.
At any rate, I have both Windows environments neatly sandboxed so that if either one gets a virus or simply implodes, as Windows is wont to do, it doesn't affect anything. I just rm -r and cp -r, and everything is fine again, which I realize is a radical concept to those of us who are accustomed to dealing with a system Registry. Wine, I compiled and installed locally to a dedicated user account whose sole purpose is to run wine. This way, if the compatibility layer fails, or an app flips out and decides to delete everything, then the most I can lose is the wine user's home directory. VirtualBox, on the other hand, is inherently a sandbox, so there's nothing special to do.
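For anyone curious, the throwaway-user arrangement is nothing fancy; roughly the following, where the user name, version, and install prefix are all just examples:
sudo useradd -m winebox
sudo -iu winebox
tar xf wine-1.5.x.tar.bz2 && cd wine-1.5.x
./configure --prefix="$HOME/local" && make && make install
Everything WINE can touch then lives under /home/winebox, so wiping it out and starting over costs nothing.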
As part of my Windows emulation binge, I took a look at ReactOS, and I frankly can't comprehend the purpose of that project. To me, the entire point of reimplementing Windows is so that you can use Windows apps without Windows. I suppose the point of ReactOS is that you can pretend as if you're running Windows without having paid for it, and that seems completely pointless to me. When I want to run Windows, I simply shell out the $100 and away I go. If I'm going to expend a ton of effort to run an operating system, I'm going to do it to run one which is better, not one which is a clone of something that doesn't work well for me in the first place. And realistically, nobody will ever catch up to the Windows API because replicating all of the bugs in semi-documented APIs is really difficult and the Windows API is a moving target anyway -- not for the benefit of the user, but as a consequence of what I suppose could be called marketing-driven engineering. Which is a phrase I use to mean "We change stuff so that we can increment the version number and redesign the retail packaging."
Oracle
Wednesday, September 12, 2012
Eclipse
is great, in case I didn't belabor the point sufficiently. I'm surprised at how fast it is. Under Linux, you can scarcely tell that it's a gigantic Java app. It's no slower than commercial products I've seen, but when it does bog down, it's usually because it's doing something useful, like indexing the daylights out of the entire Linux source, or whatever you're working on. It makes extensive use of multithreading in the GUI, so even when you have a build running while indexing source in the background, you can continue editing as if you were merely idling along. It is an absolutely brilliant product. It compresses something like half a dozen virtual consoles' worth of tools, browsers, status displays, and error reports into one interface. I am absolutely tired of having to ask, of so many products, "Did the developers actually bother to try this before releasing it?" Eclipse is not one of those products. It's very obvious that the developers use Eclipse to develop Eclipse, because it's so useful and well thought-out. The only other product I've found with Eclipse's apparent mind-reading capabilities is Google.
The main caveat I can offer is to use Oracle's Java runtime, which gives fast and stable performance. Unpack it into ./eclipse/jre, restart Eclipse, and from there, it just works. Also, avoid shoddy or outdated plugins.
Tuesday, September 11, 2012
Video packages
I was surprised at how much work went into building my own debs for my video drivers. It isn't any problem with the Debian packaging system, which has been automated even further since I last looked at it. It's just that when I download something which is labeled a driver package, and I run the provided installer, I expect it to be a finished product unless otherwise stated. This was sort of like ordering furniture delivery and finding a big box of parts on my doorstep, and no manual.
Anyway, at least the pieces seem to have been well made and everything works so far. AMD's Stream SDK installed without issues and clinfo shows all processors. Google Earth shows some instability, but I'm not sure what's causing it. Hopefully an update will fix it later.
Everything works. I have OpenGL, OpenCL, audio, and network printing, just like with any commercial OS, except that there is no junk in it. Everything runs at a crazily fast pace, and the only way that I have ever run short of RAM was by running lzma with completely unreasonable dictionary settings.
Wednesday, September 5, 2012
Radeon HD 2000-4000 on Debian, Linux 3.5.3
As a developer, I really like Debian for a number of reasons, with its emphasis on reliability, customizability, and efficient minimalism. The drawback, though, is that the fastidiousness involved in all of the testing and validation means that the current "stable" release is generally well behind most other distributions. So, if you want the latest and greatest of something, you often have to either backport a "testing" package, or simply build it yourself. Fortunately, for the kernel, which is in a constant state of flux, there is the "kernel-package" package which makes the proper installation of custom kernels trivial (I believe there is also an Ubuntu kernel-package).
Ok. So, here is my take on the current state of Radeon HD support under Linux. Unfortunately, the open-source support native to the kernel is not so good because ATI/AMD considers the methods by which it accomplishes most of its nifty stuff proprietary. This is probably because they are afraid that their source code would reveal details about their hardware components and ASICs and so forth. So, the bad news is that high-end 3D acceleration and OpenCL are pretty much out of the question -- at least within the realm of strict open source, that is.
The good news is that AMD has released its core IO library in closed-source binary form, which is the next best thing. This is wrapped in an otherwise open-source driver which exposes the various hooks the closed-source library needs in order to call the operating system and allocate DMA memory and other such low-level things. So, although the closed lib is a black box, there is still an open-source glue layer which we can update in order to maintain forward kernel compatibility even without ongoing manufacturer support, and this is always nice, because manufacturers can be really fickle about legacy support, especially towards Linux. Some people consider a mixed-source driver a half-measure. But I look at it this way: the vast majority of the time, the OEMs don't release their IC diagrams, nor even their firmware source code. So how is this much different? Anyway, it works! So, I'm happy.
When I downloaded the latest AMD driver package, I was disappointed to find that not only is the installer itself broken, but the latest stable kernel, 3.5.3, breaks compatibility with the driver. So, in the true spirit of Linux, I fixed it. You can download the makefile I used here. You will also need the Radeon HD 2000-4000 series x86_64 drivers from AMD. To use, simply configure, build, and install your kernel, as usual. Then place the two aforementioned files in a directory of their own and do
make build-pkg
sudo make install-pkg
sudo aticonfig --initial
and reboot. Detailed instructions are in the makefile, which you can open in gedit, and it will conveniently highlight the comments in blue. Read them and follow the instructions carefully lest you hose your system. So far so good, for me. fgl_glxgears runs at 3000 fps, and the rss-glx screensavers look great.
[UPDATE]
The new version of the Radeon HD 2000-4000 install wrangler for Debian is up, and it fixes the PM issue, as well as a problem with incomplete uninstallation. That solves all issues that I know of.
[UPDATE] Now builds native .deb packages.
Saturday, September 1, 2012
Makefile parallelism
Despite the improvements which could be made to its error-checking, makefiles are actually a pretty great invention. A makefile build-rule is of the form:
A: B
	F
Where A is a set of targets, B is a set of dependencies, and F is a list of shell command templates to run top-to-bottom in order to obtain A from B. That's it. All you have to do is type up a list of true statements about the relationships between the steps in your build process, and make automatically figures out what needs to be done, and in what order. Not only that, but on request, it will figure out how to divide the steps up into groups which can be run in parallel so that if you ask it to build Z, and it knows that J, K, and L are not interdependent, then it will build those three all at the same time. But if it knows that J, K, and L depend on A, then it will complete A first. How useful is that? Very. You only run into problems when you think that you have specified a rule adequately, but due to some technicality of syntax, something didn't match up, and you get a different result from what you intended.
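Asking for that parallelism, incidentally, costs exactly one flag; for example (the target name here is hypothetical):
make -j"$(nproc)" mytarget
The -j argument caps the number of simultaneous jobs, and nproc just reports how many CPU cores are available.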
I was just reading an article by a guy who says that he launched a company to deal with the "problems" in parallel make. If you don't like the syntax, that's one thing, but the principle of the thing is not broken. If make is building things in the wrong order, it's because you omitted or misstated a dependency. Going back to my archive-unpacking example, I later realized that I was still missing something. Say I have:
mytarget: oem/a
	dostuff etc, etc.

oem: myarchive.zip
	unzip $< -d oem

oem/%: oem
	@echo
If you say "make mytarget", make's logic goes like this: ok, I need oem/a. oem/a does not exist. Do I have a rule that matches "oem/a"? Yes. oem/% matches, and it says that I need to make "oem" first. There's an explict rule for making oem, and it says to unzip it.
That's excellent. However, there's a problem with it. From make's perspective "oem" has been previously completed if a) it exists on the filesystem and b) its modification time is later than that of all of its dependencies. That doesn't take into account the case in which oem is half-done, like if the user hits Control-C during decompression and then runs the build again later. So, I will do this:
mytarget: oem/a
	dostuff etc, etc.

oem unpacked_flag: myarchive.zip
	unzip $< -d oem
	touch unpacked_flag

oem/%: unpacked_flag
	@echo
Much better. Now, I have defined success in terms of the existence of a file named "unpacked_flag", which can only exist after archive extraction has run to completion. So, when make goes and spawns seven subprocesses to pursue various branches of a build, and it sees that they all depend on oem/%, which is short for "anything in the oem directory", it will block all of those processes until the empty file "unpacked_flag" has been built, signifying successful extraction of oem/.
Great. Simply by having been both complete and accurate in our statements about our build dependencies, we automatically have a thread-safe build script. More specifically, a correctly written makefile and a thread-safe makefile are the same thing. Which makes me think: essentially, aforementioned Makefile Guy started a company to service the sector of the programming community which is plagued by broken and badly written makefiles, though he won't say as much explicitly lest he offend his market. Talk about depressing. There was one more thing that got me, though. I had:
oem/patchme:
	echo 'append this' >> $@
This says to build "patchme" by appending 'append this' to it. Obviously patchme is dependent on its own existence in order for this to work, but make cannot handle circular dependencies. So, we can't specify it as its own dependency, and it has no other dependencies, so we have stated that it has none at all, and make will simply give up. So, we have to explicitly say:
oem/patchme: unpacked_flag
	echo 'append this' >> $@
That's it. It's elegant and it makes perfect sense.
Friday, August 31, 2012
Why your stupid makefile doesn't work
Or, a short treatise on silent failure. The makefile, as powerful as it is pervasive, is much like any other power tool in that you can spend all day mashing yourself in the thumb with it if you don't know how to use it properly. The problem with them is that the syntax is so permissive that almost everything is valid, but it's so specific that a single misplaced character will produce a completely different result from what you wanted. Common mistakes:
- Misplaced spaces
- Inconsistent use of trailing slashes in paths
- Pattern rules in the wrong order
You have to be careful with leading and trailing whitespace, because it will cause two seemingly identical paths not to match. The same goes for trailing slashes on pathnames: /a/b/c will not match /a/b/c/. This isn't obvious, because you might assume that make operates primarily on files, when it really operates on strings, and the strings have to match exactly.
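A contrived example of both pitfalls (names invented):
# A stray trailing space on this assignment would make "$(OEMDIR)/a" expand to
# "oem /a" (two words instead of one path), and nothing would match.
OEMDIR = oem

# The prerequisite here is the string "oem/", but the rule below builds the
# string "oem"; make never connects the two, and will typically complain that
# it has no rule to make "oem/".
$(OEMDIR)/a: oem/
	cp template $@

oem: myarchive.zip
	unzip $< -d oem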
Finally, as for rule order. The order of execution of dependencies, laterally, is undefined. That is, given "a: b c d", the order in which b, c, and d are completed is undefined, especially in multi-threaded builds. Vertically, given "a:", "b:", and "c:", the order of execution is determined by the dependency relationships between those targets. So why does rule order matter? Because rules are evaluated from top to bottom. Although the build system is great at automatically navigating dependency trees and generating hundreds of shell commands, it is up to you to enter build rules in the correct order of specificity, not so that they run in any particular order, but so that they are tested with the correct priority.
For example, I've been working on a makefile to automatically patch the latest Radeon driver to build against the latest stable kernel on Debian. The first thing my script needs to do is unpack the archive into a directory I've named "oem", so the rule for this is "oem/%: oem". But even though it's the first task to execute, the rule belongs at the bottom of the script, because it is the catch-all for any target which did not match a preceding rule. It basically says: "If, after everything else, you couldn't find a file, then make sure you unpacked the archive." I'm also going to throw in a shell command at the end of the recipe that tests whether the target actually exists on the filesystem, because otherwise make will never print an error: the rule matches regardless of whether it succeeds in producing the target.
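Roughly what I mean, reusing the earlier names (a sketch, not the final makefile):
mytarget: oem/a
	dostuff etc, etc.

oem: myarchive.zip
	unzip $< -d oem

# The catch-all goes last, so it only fires when no more specific rule matched,
# and the test makes it fail loudly if the file still isn't there after unpacking.
oem/%: oem
	@test -e $@ || { echo "$@ not found after unpacking" >&2; exit 1; }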
Sunday, August 26, 2012
The perfect kernel
My epic quest for the perfect kernel continues, and I am rapidly closing on my goal. Things I've accomplished thus far:
Remove stuff you don't need
Many of these will be obvious. Do you have a parallel port? If your computer is new, probably not. Linux is intended to run on everything from handheld cell phones to supercomputing research servers hosting hundreds of users, and if you are installing a PC distribution, some of the default assumptions will lean towards the latter scenario. In addition to unneeded device drivers, stuff you can probably remove for a desktop system includes the following (a sample of the corresponding .config lines follows the list):
- Excessive logging
- Excessive access control
- Unneeded quotas
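In .config terms, pruning mostly amounts to lines like these (a small sample; the exact option names to hunt down vary by kernel version and by what you decide you can live without):
# CONFIG_PARPORT is not set
# CONFIG_AUDIT is not set
# CONFIG_QUOTA is not set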
Add stuff that is missing
Linux has become amazing at producing fully-functional default configurations with zero user input. However, if you want to maximize performance, then you will want to dig through various logs and module lists to ensure that all of your system's capabilities were detected. Keep in mind that just because a device works doesn't mean it is configured to its full potential. Some devices may have been detected in a compatibility mode, and in those cases you will want to manually specify a better driver or change your BIOS settings to enable the missing features. Stuff to check (a few example invocations follow the list):
- dmesg: for the system message log
- lsmod: for a list of loaded modules
- lspci: for a list of detected PCI devices
- lsusb: for a list of detected USB devices
- gnome-device-manager
- Your peripheral and chipset manuals
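For example (flags from memory; the output is obviously machine-specific):
dmesg | grep -i -e ahci -e error   # what the kernel said while probing the hardware
lsmod                              # which modules ended up loaded
lspci -k                           # each PCI device, plus the kernel driver actually bound to it
lsusb                              # attached USB devices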
Demodulize the stuff you use
By default, these days, Linux distributions ship a minimal kernel with almost all of the features compiled as modules. This, intuitively enough, is what makes the kernel modular: it can support almost all of the hardware in the world without bloating the system. That is great from a compatibility perspective, but it has drawbacks. It requires a two-stage boot process called "initrd", where "rd" is short for "ram disk": the kernel is booted on an initial ramdisk filesystem alongside all of the modules it might need to initialize the devices required to reach the real filesystem. It works fine, but there is overhead in loading lots of module files and allocating memory for them, and potentially more from memory fragmentation and from the extra indirection that dynamic modules require. Anyway, there's just something elegant about knowing that all of the essential stuff gets loaded into memory in one contiguous block, and it probably decreases boot time slightly, too. Stuff I build into the kernel image includes anything which is high-throughput, low-latency, or otherwise essential:
- Chipset drivers: ICH10 southbridge, PCI(e), GPIO, hardware monitors, etc.
- I/O drivers: UHCI and EHCI for USB. AHCI for disks
- More I/O drivers: SATA. Also SCSI emulation over SATA is standard now
- Video
- Sound
- Ethernet and TCP/IP
- Virtualization hosting devices
Lastly, among the things you decide you don't need, some can't be built as modules at all, and those you can simply disable. Everything else I intend to build as modules. This way, if I find a Token Ring card lying next to a dumpster, support is just an insmod away.
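In the .config, the whole built-in-versus-module distinction is just =y versus =m. For instance (illustrative option names from my own notes; yours will differ):
CONFIG_SATA_AHCI=y       # disk controller built into the image, so no initrd needed to reach the root filesystem
CONFIG_USB_EHCI_HCD=y    # USB 2.0 host controller
CONFIG_E1000E=y          # onboard ethernet
CONFIG_SND_HDA_INTEL=y   # sound
CONFIG_TR=m              # Token Ring stays a module, for the card by the dumpster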
Saturday, August 25, 2012
Kernel configuration
Building the kernel is actually really easy. Between the distribution scripts and the kernel build scripts, you will get a config file that produces a very flexible kernel that will probably boot successfully with the default settings. And there are literally hundreds of settings. You can spend days simply sifting through all of them and learning about computer architecture in the process, if you want to.
Of course, if you get a setting wrong, your system may not boot, but you can install kernels side by side and simply fall back to a working kernel at the boot prompt, so it isn't a big deal.
There are already tons of tutorials on how to perform the build itself, which is really easy and amounts to little more than "make xconfig", "make kernel_image", and [boot loader install command goes here]. If you run Debian, it's super extra-easy. The hard part is choosing the correct settings to customize the kernel and pare it down to what you need without excluding anything important.
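For reference, the Debian-style sequence is roughly this (the kernel-package route; flags from memory, and the boot-loader step may happen automatically when the package installs):
make xconfig                                # pick the settings
fakeroot make-kpkg --initrd kernel_image    # build the kernel as a .deb
sudo dpkg -i ../linux-image-*.deb           # install it alongside the existing kernel
sudo update-grub                            # make sure the boot menu knows about it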
For that part of the process, I have found little guidance online. Basically, an interesting thing about Linux is that it is less aware of the hardware products in your system, as such, and more aware of the components from which they are made. So, instead of detecting an Acme Model X motherboard, you will find that it discovers the north and south bridges, and sundry hardware controllers individually. The downside is that it makes system configuration a bit of an adventure. The upside is that Linux is compatible with a wider variety of hardware because the driver developers needn't foresee every possible variation of hardware model and revision for every vendor and product line, and so on. Instead it says "Hey, there's a hard drive controller over there, which I succeeded in talking to using these protocols".
Boot an initial kernel
The default kernel, as provided by the distribution, is usually built using fairly generic settings suitable for your target architecture. Hardware support which could be essential for booting will have been compiled into the kernel image, and then everything else, kitchen sink included, will have been built as modules. Upon boot, you can read the logs, and, if you are like me, you will quickly find that the kernel knows more about your hardware than you do. You can then use this information to build a better kernel. Use dmesg to read the kernel's log and see what hardware it detected and what modules it used to do it. In my experience booting default Debian and Ubuntu, the default hardware support is quite functional across the board. Use lsmod to obtain a list of the modules that were loaded to achieve that support.
Choose modules
The modules in the list you obtain from the generic kernel will fall into one of three categories. Firstly, some modules will be extraneous: they were loaded speculatively or to support features you don't use, and you don't really need them. Secondly, there will be the modules which are exactly what you need. Thirdly, there may be modules which provide only a subset of the functionality you need for a particular device. For example, I have an Intel ICH10 southbridge, but Debian's kernel picks modules for the PIIX chipset, which preceded the ICH product line. I'm still trying to figure out, by trial and error, whether there are ICH drivers which will talk to my hardware, even though the PIIX drivers nominally work. Lastly, there is the possibility that some devices were missed entirely. Sorting out which is which involves lots of Googling and manual-reading. Luckily, you can run make xconfig and hit Ctrl-F to search the configuration settings for modules pertinent to any functionality you are missing.
Thursday, August 23, 2012
Kernel building
One of the many benefits of Linux is that you can recompile the operating system, which is a common practice among its users. This means that you can build a version of it which is tailored to your needs and hardware. You can change settings and limits, swap out components, and add or remove optimizations according to your system capabilities.
A typical operating system will contain hooks for loading and detecting all manner of hardware which I will never use. It may be compiled for an old processor, and for old or generic peripherals so that it will be binary compatible with as much hardware as possible. This is more for the convenience of the OS vendor than the user.
Linux, on the other hand, is compatible with a vast, vast selection of hardware devices, but at the source level. So, rather than go to a vendor for a prebuilt binary which runs on, say, every PC built since 2000, I can build a binary which is specifically built for the hardware in my particular machine.
For example, I can specify that this is an Intel/AMD 64, so I get all-64-bit binaries rather than a mixture of backward-compatible subsystems. Then, I can specify that I have eight CPUs, which means that the necessary tables and supporting structures to run eight CPUs are statically built into the kernel, so there's no need for the kernel to fiddle with extra pointers, counters, and conditionals at runtime within the schedulers, which need to be as lean as possible.
Further, I can tell the build to perform branch analysis on the kernel, which causes it to produce self-modifying code to optimize certain conditionals which change infrequently. So, the kernel, while it is running, will insert jump instructions into itself at appropriate points so that it can avoid a read and a comparison from then until the next state change.
Everything in this operating system is about efficiency. Vista idled at 3GB of RAM, which is an amazing non-achievement. What it was doing holding so much RAM, I can only guess. I would like to think that it was being held in a system cache for further reallocation. My Debian install idles at 500MB, or 1/6th of that. I can run a build of the kernel with all eight processors pegged at 100%, inside of 1.5 GB of RAM. My swap partition is untouched. My hard disk light barely flickers. Why? Because I have so much free RAM! Everything is cached in RAM. The disk only blinks occasionally to commit the object files when they come out of the compiler or linker.
And it isn't just fancy programming. Part of this memory efficiency is due to a strong commitment in the community to standardization of interfaces, which is why Open Source is more than just a buzzword or a social movement. On a very pragmatic and technical level, all of the executables are drawing from the same pool of dynamic libraries. This means that if I have one hundred processes which need to use the DEFLATE algorithm, there is only one copy of zlib in memory and they all share it, rather than each of the hundred programs carrying its own private copy of the library.
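You can see the sharing directly with ldd; for instance, ssh and git both use DEFLATE, and both resolve to the same libz (exact paths depend on the distro):
ldd /usr/bin/ssh | grep libz    # points at libz.so.1, wherever your distro keeps it
ldd /usr/bin/git | grep libz    # the same file, and the same single copy in memory once both are running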
Sunday, August 19, 2012
Linux
I've decided to switch back to Linux as my primary OS for two main reasons. Firstly, its performance is great, especially in the area of file I/O, and software development is an extremely file-intensive activity. Secondly, there is a vast library of free development tools for Linux and Unix-like operating systems, some of which will only work under Windows with compatibility layers, if at all.
Ubuntu is amazing for ease-of-use. The live CD works perfectly, with zero configuration. Setup and support overhead used to be the main argument against Linux, and Ubuntu truly renders that a moot point. I am currently typing this while diffing a backup of my old system in the background, with flawless system responsiveness. I have a working network, sound, and video. And all I had to do to get here was boot a CD.
I only wish that more game companies would target the Linux platform, as to me, that is the main drawback, because I have to switch OSes to play games. Aside from that, I have basically zero incentive to use anything else. Linux, which is completely customizable, renders my hardware platform a playground for testing and software development.
Now, the only question is which distribution to use. Ubuntu makes for an excellent desktop OS, although I use it mostly as a live rescue disk. Its features are beautifully integrated and intuitive. For a development environment, though, I lean towards Debian. Debian harkens back to the old school of Linux administration. It has tons and tons of settings, and the packages are largely uncustomized from their default configurations. What this means is that after you install lots of packages, you will find the settings to be incoherent, with lots of clumsy or minimalist defaults. This is because, again, the packages come from all over the place, and nobody has substantially unified or coordinated their settings for any particular purpose.
That might seem to give Ubuntu a huge advantage, but despite all of the extra work to configure Debian, it offers the greatest number of options and flexibility to someone who has the time and the inclination to go to the trouble. It has a vast, vast library of meticulously maintained packages, its innards are exhaustively documented, and the entire thing is intended to be customized by the end-user at the same level as its developers.
Wednesday, August 15, 2012
The source editor
Ok, I'm underwhelmed with Visual Studio C++ Express' IDE. First of all, the Object Browser, which is a great idea in principle, consistently fails at finding anything whatsoever, ever. It also utterly crawls at bulk project property updates. It's very clever at allowing you to simultaneously edit a property for several intersections of projects, configurations, and platforms, but for some reason it takes forever. There isn't even basic support for refactoring, which is typically a fancy word for "rename this symbol everywhere it occurs"; it's distinct from search-and-replace because it is language-aware and will not blindly match substrings. It's a basic but extremely time-saving feature. However, because this is the Express version, you can't do anything to expand its functionality or address its shortcomings. The plugin architecture is deliberately crippled, which is supposed to motivate you to pay money for a less crippled version. And although I'm sympathetic to the desire to get paid for software, there are tons of high-powered development tools available for free.
Next up, I intend to take a look at the Eclipse IDE. I hadn't looked at it for a while because it is a Java-based behemoth which was originally designed for Java development. However, despite running in a Java VM, I don't see how it could be any less responsive than Visual Studio. Because it is an open-source product with a strong emphasis on extensibility, its plugin website boasts a directory of over 1400 plugins. There is support for the MSVC compiler and SDK suite, which is not dependent on Visual Studio to run. There is built-in refactoring support, and git integration.
I also want to try out the UML plugins. UML tools are supposed to automatically generate flowcharts and diagrams of software architecture, which could be very handy for design. There is a notion called round-trip engineering, which basically means that you can have diagrams generated from your code, edit those diagrams, and have the changes reflected back in your code. This sounds really useful for developing the initial data layout and procedural flow, though I think once I had any significant body of code, I probably wouldn't want a script generator casually rewriting it.
Incidentally, 7Zip is a really nice compression tool. It just finished compressing a 4GB git repository, which was already partially compressed, to 10% of its original size. It took a while, but that's handy.
Sunday, August 12, 2012
Company name?
Thursday, August 9, 2012
Show of hands
Well, there appears to be a monkeywrench in my Irrlicht/Bullet integration idea. The problem is that the two use different-handed coordinate systems, which is basically the perfect problem to ruin an otherwise really good idea. To convert from one system to the other, you basically have to swap the X and Y axes. So far so good, because we could get around that by redefining X and Y to be swapped within Irrlicht. Then, we have to invert Z, which requires a float multiplication for every single vertex, and that is not so good.
Then, I thought to myself: well, the conversion consists of two rotations and a scaling operation, and is therefore linear, so I should be able to fold it into the view transform without incurring any extra processing. The problem remains, though, that prior to that transformation all of the cross products will be facing the wrong way, which means, at a minimum, that all of the normals will be wrong and backface culling will work backwards. If I really wanted to, I could typedef all of the Irrlicht math primitives to right-handed counterparts and then let the compiler find all of the references. But then I would still have to fix all of the incorrect transforms during file loading, and also, every single time that anything mysteriously fails, I would have to go hunting through the code for that one hand-inlined special case somewhere that continued to assume a left-handed coordinate system.
It isn't insurmountable, but that's a lot of work and, in return, all I will have to show for it is a version of Irrlicht with extra coordinate system support. So, now, I'm going to play with the idea of writing my own OpenGL renderer just to see how that compares. It might even be fun.
Tuesday, August 7, 2012
git
I'm really satisfied with git as a version control system. I don't really have much to compare it to other than the version of SourceSafe that came with Visual Studio 6 eons ago. I'm normally not much for celebrity endorsements, but if Linus Torvalds wrote git specifically so that he could maintain the Linux kernel, then that counts for something.
It's simple and fast. Most things can be done with a short command line or two. Pretty much all of its functions revolve around performing binary operations on directory trees: "binary" in the sense that you give it two directory trees and it performs some abstract operation, usually resulting in a new directory tree. Whether those trees live on the filesystem, in the repository, or in a diff file is pretty much completely interchangeable. So, you can say "give me version X of the experimental branch. Now give me a diff that represents all of the changes needed to bring it to version Y".
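In command form, that exchange looks roughly like this (X and Y standing in for real commit ids or tags on the experimental branch):
git checkout X             # materialize version X as the working tree
git diff X Y               # the set of changes that transforms tree X into tree Y
git diff X Y > up.patch    # ...or the same thing captured as a patch file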
The great thing about version control in general is that it's an effortless way of generating a complete history of the state of even a colossally huge source tree. Because unchanged content is shared between versions and the rest is delta-compressed, rather than every version being a complete copy, your consumption of disk space is mostly bounded by your typing speed. This gives you new confidence to experiment as much as you want without the constant fear of having to wait ten minutes to revert from a backup. You are no longer wasting tons of mental energy and brain space trying to remember all of the changes you need to back out if your program suddenly breaks. Because you have less to think about, this frees up your mental processing and effectively makes you a smarter programmer. Talk about useful.
Sunday, August 5, 2012
Lambda
Ok. I can't overstate how great lambda functions are, and I have to say that with their arrival, the C++ language, the standard library, and boost have all finally come into their own as a whole. I mean this in the sense that only as of lambda do they finally accomplish what they were intended to do all along.
aabb::aabb(const float (*vb)[3], const float (*ve)[3]) {
    typedef std::pair<float, float> extent_t;
    extent_t ext[3];
    std::fill(ext, &ext[3], extent_t(0,0));
    std::for_each(vb, ve, [&ext](const float (&v)[3]) {
        std::transform(v, &v[3], ext, ext, [](const float &f, extent_t &x) -> extent_t {
            x.first = std::max(f, x.first);
            x.second = std::min(f, x.second);
            return x;
        });
    });
    std::transform(ext, &ext[3], dim, [](const extent_t &x) { return x.first - x.second; });
    std::transform(dim, &dim[3], ext, pos, [](const float &d, const extent_t &x) { return x.first - d/2; });
}
This is my first attempt at a function to calculate the axis-aligned bounding box and center of an arbitrary vector array. And I have to say, this code is very, very compact, clear, and semantically dense. There is basically no semantic overhead here. There is no explicit loop control which might otherwise foster sign, comparison, and off-by-one errors (unless I were to forget that my 3D vectors are float[3]). I don't waste a ton of space specifying initialization for classes I'm only ever going to use once. This is about as declarative as it gets. Most of the syntax is spent on specifying types, and then the properly selected type goes along and does its thing without a lot of procedural writing on my part. In fact, aside from template calls, the only procedural code in here basically amounts to this:
std::fill(ext, &ext[3], extent_t(0,0));
x.first = std::max(f, x.first);
x.second = std::min(f, x.second);
return x;
return x.first - d/2;
Which means, in order: zero the extent array, find the furthest extent of each axis in each of two directions, and then calculate the center as the offset between the maximum axis extent and half of the associated dimension.
The standard algorithms were supposed to provide compact, simple loops, and now that we have lambda, they do. Before, you had to go off and write a functor class and remember to properly accept and copy all of the outer-scope variables, and by the time you were done with that, you had something more complex, more verbose, and at least as error-prone as a traditional for(;;) loop. Suddenly the standard template library and boost fulfill everything they were supposed to do in the first place, and this is great, because that's a lot of really useful stuff.
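For contrast, here is roughly what the outer loop body from the bounding-box function would have looked like pre-lambda (a sketch; it assumes the extent_t typedef is visible to the functor, which in practice means hoisting it out of the function):
// A hand-written functor: the "capture" of ext has to be spelled out as a
// constructor argument and a member, and the whole thing lives far away from
// the call site that uses it.
struct accumulate_extents {
    extent_t (&ext)[3];
    explicit accumulate_extents(extent_t (&e)[3]) : ext(e) {}
    void operator()(const float (&v)[3]) const {
        for (int i = 0; i < 3; ++i) {
            ext[i].first  = std::max(v[i], ext[i].first);
            ext[i].second = std::min(v[i], ext[i].second);
        }
    }
};
// ...and at the call site:
// std::for_each(vb, ve, accumulate_extents(ext));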
The really nice thing about declarative programming of this kind is that you spend most of your time specifying types rather than specifying logic or procedures. The compiler is great at finding problems with types, and terrible at the other. So, if you define a bunch of types, and pass them to each other in a type-safe setting, and the thing compiles, then chances are that you have a valid program that does something pretty close to what you intended because the pieces only fit together a certain way.
Saturday, August 4, 2012
Metaprogramming
... or: where does all the time go? So, I was moving along absorbing Blender code, and making glacial progress on the COLLADA exporter to begin with, when I noticed that I needed to iterate over some vertex lists, and I got to thinking about vector math. Vector math got me thinking about MMX. Thinking about MMX got me thinking about generic ways to generate optimized code, which is why I spent all day trying to remember how to use C++ templates. And C++ templates are a really great thing. The reason is that they essentially allow you to program the compiler to generate code for you.
C programmers are accustomed to this:
x + 1 + 2 + 3 + y
collapses to:
x + 6 + y
...because the compiler can calculate the values of the constants and insert the result into the binary, so that the executable needn't add 1+2+3 at runtime. What templates allow you to do is write what are essentially programs that run within the compiler, doing work at compile time instead of run time. But rather than just operating on literal constants, you can work with just about anything the compiler knows at compile time, especially types. So, for example, I can define template <typename T, T V, typename OP=op_null> class expression{...}; in terms of a fundamental type T (int, float, etc.) and a constant value V, where OP represents an arbitrary operation on V but defaults to a no-op. This allows me to declare this class as essentially a wrapper around, say, "const int x=123". Then, I can overload the operators on that class so that when I perform arithmetic on it, I get back a different type, which is still defined in terms of "const int x(123)" but with a different OP type, which performs an operation on V. So, say, --expression<int, 100>() will return a class which represents the subtraction of 1 from 100. So now I have an inline predicate that generates a sequence of numbers starting at 100 and counting down. But it does it at compile time, not at run time.
We can define classes in terms of constants and constant arithmetic, so that allows us to count. We can define classes in terms of each other, so that gives us recursion. Now comes template specialization, where I can specify a type which is a special case for, say, expression<int, 0>, so that when I'm done counting down from expression<int, 100> to expression<int, 0>, I can stop. So now we have conditionals within the compiler, which means we can do finite loops. Also, since we can define any OP class we want, and we can define OP classes in terms of each other, we can nest them to create sequences of "opcodes".
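Stripped of the expression<> machinery, the countdown-with-a-stopping-specialization idea looks like this (a minimal sketch):
template <int N>
struct countdown {
    // "recursion": this class is defined in terms of the next one down
    static const int value = countdown<N - 1>::value;
};

template <>
struct countdown<0> {
    // specialization: the compile-time equivalent of the loop's exit condition
    static const int value = 0;
};

// The compiler grinds through all one hundred instantiations;
// the binary just contains the constant 0.
static const int x = countdown<100>::value;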
So, having churned through all this, impressed with my ability to write a metaprogram that decrements const int x; 100 times but optimizes out to "x=0", I decided to compare the disassembly to a regular old for() loop and was surprised to find the exact same result. Way to reinvent the wheel. Anyway, at least I remember how to work with templates now. And I am steadfastly committed to finding an actual use for this eventually.
Thursday, August 2, 2012
Mars Curiosity
I'm looking forward to Curiosity's landing on Mars this Monday. The reentry and landing sequence is so ludicrously complex, I'm not optimistic. If it succeeds, it will be very exciting. I'm not aware that they ever did a complete simulation of the landing here on earth, and the chances of nailing something that complicated on the first attempt do not look good to me, but maybe that's why I don't work for NASA. Proverbial fingers are crossed.
Wednesday, August 1, 2012
Bullet
Bullet does more than just physics calculations, per se, because it has to. In particular, it maintains memory structures analogous to the visual scene graph, but for physics purposes instead. Like the renderer, it has an octree implementation which works by recursively subdividing rectangular volumes into eight subvolumes. So, at the top level, you will have a box-shaped region that contains everything, and it will be split into eight sub-boxes, each of which is split into eight sub-boxes, and on down the line. I don't say "split into eighths" because the splits aren't necessarily equal in size. Instead, they are chosen so as to attempt to subdivide and distribute the world elements roughly equally so that, at the top level, each sub-box contains roughly one eighth of everything, and then each of those sub-boxes is split into "eighths" in that same sense, and on down.
The goal here is that once you have divided up the scene, you can perform operations at whatever "box" level is needed, up to whatever granularity you need in order to accomplish a task, without having to iterate through every single item in the game universe. You can pick any arbitrary object at random and almost instantly find everything that is "near" it by looking at the octree node that contains it. If you don't find what you want there, you simply pop out one or two levels, and now you have everything that is "near" the initial search space.
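Stripped to its essentials, the structure being described is something like this (a sketch of the idea, not Bullet's actual classes):
#include <vector>

struct Aabb { float min[3], max[3]; };   // an axis-aligned box in world space

struct Object;                           // whatever lives in the world

struct OctreeNode {
    Aabb bounds;                         // the region this node covers
    std::vector<Object*> contents;       // things sitting at this level
    OctreeNode* child[8];                // the eight sub-boxes; null at the leaves
};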
The applications of octrees in collision detection are obvious, and this relatively coarse process of finding spatial neighbors is what Bullet calls the "broadphase". Things that are right up next to each other need to be checked for collisions, things that aren't don't, and octrees are a very efficient way of telling the two apart.
What I would like to do is integrate Bullet into the renderer so that the visual scene graph and the physical scene graph are the same because this saves memory and avoids array-copying. I don't know if this is industry-standard, but it seems useful, especially since I want to be able to stream the world from disk, and it saves the overhead of maintaining an extra octree. If I could use the same structure for collision detection, PVS, and occlusion culling, that would be great.
Physics
I'm almost ready to test basic physics exporting from Blender, and looking at its inner workings has been interesting. The code is very clean, much of it written in procedural-style C++. The codebase is huge, but it's very neatly arranged. It's exciting to peer into such a large, well-made product and know that you can arbitrarily add any features you like. Blender has all manner of commercial-grade features not unlike Maya or 3DSMax, but the source is freely available, and you can learn all sorts of things simply thumbing through it. I do wish it were better documented, though.
Moving on to physics simulation, this is a part that I'm looking forward to. I first got a sense of the sort of generalized simulations that are possible by looking at the Battlefield 1942 initialization scripts, and I was completely amazed that all of the vehicles in that game were actually assembled from many individual simulated mechanical components. So, for example, a plane had two wings which generated lift and drag, and a car had four wheels, each with shock and suspension characteristics. All of the vehicles in the game were WWII-themed but, if you wanted to, you could, in theory, script an eight-wheeled car that would continue driving if overturned. To be completely zany, I wrote in a tremendous bomb that could be straddled and ridden about like Slim Pickens.
I don't know what physics engine DICE used, or whether the Refractor engine had its own in-house simulator. I will be using Bullet. Bullet is yet another completely outstanding free open-source product. It is released under the zlib license and is under constant development. The latest version incorporates OpenCL support, which is especially exciting because this basically unlocks all of the processing power available on a modern computer.
Battlefield was released in 2002. For comparison, a water-cooled, half-million-dollar Cray SV-1 supercomputer from three years prior would have topped out at around 1 teraflop. Meanwhile, a single modern Radeon consumer GPU claims 1.2 teraflops for two or three hundred dollars today.
A modern CPU then adds a paltry 0.1 teraflops, but its strength is not in floating-point processing. A desktop CPU, like the quad-core hyperthreaded processor in my machine, would probably be best suited to traversing octrees and running pathfinding and AI algorithms in eight parallel threads while the GPU performs physics and rendering calculations. Most likely with the cooling system groaning like a hair dryer all the way, but that's ok.
Here is what I'm getting at. When we look at Battlefield 1942, we see that it was basically harnessing all of the processing power available at the time to deliver really entertaining game mechanics; then DICE was bought out, and that same design has been recycled for about a decade since. This is what always happens. id software released Doom, which was followed by umpteen thousand clones and variations on that theme. But at least back then, some attempt was made at innovation: Doom had its Duke 3D, and Quake had its Half-Life. These days, it seems as if developers have gotten wise and realized that there's no need to take a risk on a new gameplay mechanic when the market will happily spend its money on the same basic design, over and over again. So, although the hardware keeps improving by leaps and bounds, to the point where we all essentially have little supercomputers sitting on our desks, the game doesn't evolve, because all of the processing power gets spent on making it look as if the player models have real skin and hair.
Saturday, July 28, 2012
Geometry editor
Blender is an amazing piece of open-source software. Maybe it's because I'm a total amateur at 3D modeling, but Blender does everything I can imagine needing for 3D content creation, and much more. It does meshes, armature (skeletal) animation, skinning, sculpting, lighting, and a ton of other things I don't even comprehend yet.
I'm not much of an artist, so Blender will be mostly useful for map editing and tweaking and adjusting things for testing and so forth. This is ok, though, because, as I said previously, I want as much of the content as possible to be procedurally generated anyway.
For content exchange between Blender and the engine, I'm starting with COLLADA because it provides way more descriptiveness than I need, and it's an open standard which has native support in both Blender and the rendering engine. The downside is that all of that textual XML syntax has the potential to become huge very quickly for even simple objects, so if things begin to bog down, I can begin considering a binary converter and file compression.
The only drawback to Blender that I can find is that it doesn't export physics or custom properties to COLLADA, so my next coding goal is to develop a patch for Blender to add those capabilities. I just completed my initial build of the Blender source, so now I can begin tinkering...
Friday, July 27, 2012
The renderer
I really don't want to spend the time to write my own renderer from scratch, especially when there are ready-made ones available for free. There is a considerable amount of code involved just in reading and decoding the file formats for textures, shaders, armatures, meshes, and keyframes, and that's before you even begin to think about transformations and animation, let alone finally sending the vector arrays to the rendering hardware.
I considered two renderers, OGRE 3D and Irrlicht, and I'm still not sure which to stick with. Both have free and open licenses permitting modification and redistribution.
OGRE
On the plus side, OGRE has a lot of features and a large development community. On the downside, I noticed several glitches in the demo applications under Windows. Also, the COLLADA support does not appear to be included in the main source distribution and must be compiled as a contributed plugin. I'm still learning about OpenGL, so I don't fully understand this, but I've read that OGRE's GL renderer performs thousands of redundant operations because it does not sort state changes well. I'm not sure how Irrlicht compares in this regard, though.
Irrlicht
Irrlicht is leaner on features, but the code is exceptionally clean and compact. It builds with no external dependencies, and the initialization and render loop code fits on a single page. COLLADA support is integral, so, out of the box, you can add a single line of code to the Hello World app, and immediately load scenes exported from Blender. Also, the demo apps seem very neat and polished, without any odd quirks.
If I were attempting to develop a conventional game, I would probably go with OGRE. However, I have so much customization in mind that I would probably be better served with the simplicity of Irrlicht. Since I'm going to have to make so many changes either way, Irrlicht means there's less code to rewrite.


