After the almost-fiasco of upgrading my Ultra2 to Etch, I've ended up with a number of useful things to know and remember about Etch and/or this kind of setup. Share and Enjoy, I say, so here goes.

The Etch squid needs epoll(), which only 2.5+ kernels provide. The Sarge squid continues to work fine, though.
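If you want to check a box before letting the Etch squid loose on it, comparing the kernel release against 2.6 is enough. What follows is only a rough sketch: the function name kernel_has_epoll is made up for this note, and it treats 2.6 as the practical cutoff (my assumption, since 2.5 was a development series that hardly anybody runs in production).

```shell
# Hypothetical helper: succeeds if the given kernel release (default: the
# running kernel) is 2.6 or newer, which this sketch assumes is new enough
# to provide epoll().
kernel_has_epoll() {
    release="${1:-$(uname -r)}"
    major="${release%%.*}"        # "2.6.18-4-sparc64" -> "2"
    rest="${release#*.}"          # "2.6.18-4-sparc64" -> "6.18-4-sparc64"
    minor="${rest%%[!0-9]*}"      # "6.18-4-sparc64"   -> "6"
    [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "${minor:-0}" -ge 6 ]; }
}
```

Called without an argument it looks at the running kernel, so something like "kernel_has_epoll || echo keep the Sarge squid" would do as a pre-upgrade check.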

Stunnel now requires its PEM file to also contain a set of Diffie-Hellman parameters or it won't start. README.Debian does show how to retrofit that (trivial), but the manual page doesn't. Also, stunnel now lives in /usr/bin instead of /usr/sbin.
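To find out whether an existing PEM file needs the retrofit at all, looking for the DH parameter header is sufficient. A minimal sketch follows; the helper name pem_has_dhparams, the file path and the key size in the comment are my assumptions, and README.Debian remains the authority on the exact steps.

```shell
# Hypothetical helper: succeeds if the PEM file already carries a
# Diffie-Hellman parameter block (the header that openssl dhparam writes).
pem_has_dhparams() {
    grep -q -e '-----BEGIN DH PARAMETERS-----' "$1"
}

# Assumed retrofit, with an example path and key size; deliberately left
# as a comment so that sourcing this file changes nothing:
#   pem_has_dhparams /etc/stunnel/stunnel.pem ||
#       openssl dhparam 2048 >> /etc/stunnel/stunnel.pem
```

The check only reads the file, so it is safe to run against every PEM file on the box before the upgrade.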

The perl rename proggy is now called prename.

On Sparc64, the libc-kernel interface is badly shot for 2.4 kernels: simple things like statfs() return borken info. I found two programs that fail badly in that setup: innwatch and amanda. On one box amanda managed to crash the whole box.

Do not think about compiling a 2.6 kernel on an UltraSparc with gcc 4.1. You will regret it. In my case the second an iptables recent rule was loaded, I got this beauty (same one every time):

Unable to handle kernel NULL pointer dereference
tsk->{mm,active_mm}->context = 00000000000003f1
tsk->{mm,active_mm}->pgd = fffff8007f190000
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
swapper(0): Oops [#1]
TSTATE: 0000004480009607 TPC: 000000000062359c TNPC: 00000000006235a0
Y: 00000001    Not tainted
TPC: <recent_entry_lookup+0x2c/0x88>

I so loved to see that smiley. *sigh*

Use gcc 3.4 and all is good. (MAKEFLAGS="CC=gcc-3.4" make-kpkg... does the trick.) Well, most is good. There are still a few unaligned access traps: arp tables were one of the prime producers, but even with those disabled completely the mainstream TCP code still generates a few a day. But no more panics.

Do expect to build iptables locally after the new kernel works (iptables has lots of annoying build-deps). If you don't, then a number of things will magically fail and the iptables people are not willing to fix this.

The new version of the MD subsystem in 2.6 is pickier: if you tell it to boot from a mirror-half directly (e.g. ...kernel root=/dev/sda1), then you'll get this useless and misleading panic: VFS: Cannot open root device "sda1" or unknown-block(8,1). Device 8,1 is perfectly fine, but as my kernel has MD built in, it detects 8,1 as part of an MD device first and remembers that this device is "used". The fix is easy but took me quite some time to find: add the extra kernel argument raid=noautodetect and booting the half will work.

Note that if you start the remaining MD devices from within that setup, the MD driver will decide that YOUR CURRENT MIRROR HALF is the faulty one. To get back to your (presumably worked upon and magically repaired) half of the mirror being the one that counts, you will have to nuke and rebuild the mirror; you can't mark the other half as bad in place because your half has already been nixed.
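Before starting any MD devices by hand in such a half-booted setup, it doesn't hurt to confirm that raid=noautodetect really made it onto the kernel command line. A small sketch; the helper name cmdline_has_arg is invented here, and it assumes /proc is mounted when no command line is passed in.

```shell
# Hypothetical helper: succeeds if the exact argument $1 appears on the
# kernel command line given as $2 (default: the running kernel's).
cmdline_has_arg() {
    cmdline="${2:-$(cat /proc/cmdline)}"
    for arg in $cmdline; do
        [ "$arg" = "$1" ] && return 0
    done
    return 1
}
```

On the live box, "cmdline_has_arg raid=noautodetect" tells you whether the argument took.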

If you end up with a silo that doesn't even reach the SILO prompt (happened to me: "SProgram terminated..ok"), then you should suspect the second.b boot loader file. In my case that was corrupt. Booting an Etch cd, walking through oodles of boring stuff until you can get your MD started, then copying the second.b from the cd and rerunning silo -f fixes it. Note that second.b is modified en passant by silo, so md5sum doesn't tell you anything useful about this file's state. Also note that the silo program provides exceptionally lousy diagnostics. You get "...appears to be valid" in lots of totally screwed-up situations, which you'll find out only when you boot next....

Fast Data MMU miss or the like while booting can also mean that your /vmlinuz symlink is dud (BTST). As silo understands full PROM aliases and allows you to (laboriously) specify the kernel file you want to boot, you can get past that: scsi/sd@0,0;1/boot/vmlin... did the job on this box (whose openprom has a default scsi devalias!), but make sure to explicitly specify the root device.

For those who don't remember all the garbage in their /boot, silo has two really nice features: add these two lines to your silo.conf,

image="ls -l /boot"
image="cat /boot/silo.conf"

and you'll be able to select their labels on the SILO prompt. The first prints a listing of /boot, and the second dumps your silo.conf. Very handy I must say.

If you think that VG metadata backups from 2.4 are any use during a move to 2.6, forget it. I lost one VG completely. The metadata restore simply doesn't work across the lvm10-lvm2 transition, and neither vgconvert nor anything else worked, short of dd'ing over the physical volumes; I couldn't even lv/vg/pvremove things. Very nasty, that was.

If you use rsync to backup/restore in such a case, do yourself the favour and use --numeric-ids on the way back. Otherwise you'll likely end up with mysterious (ha!) file ownership stuffups, because by default rsync maps owners by account name: each file ends up owned by whatever uid the sender's account name happens to have on the receiving side.
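To see up front which accounts would get bitten, you can compare the passwd file of the backed-up system with the one you are restoring under (a rescue cd, say). This is only a sketch: the helper name uid_mismatches and the paths in the comment are placeholders of mine.

```shell
# Hypothetical helper: print the account names that exist in both
# passwd-format files but carry a different uid in each.
uid_mismatches() {
    awk -F: 'NR == FNR { uid[$1] = $3; next }
             ($1 in uid) && uid[$1] != $3 { print $1 }' "$1" "$2"
}

# The restore itself, with placeholder paths (left as a comment on purpose):
#   rsync -a --numeric-ids /backup/home/ /home/
```

Any name this prints is an account whose files would change their numeric owner on a restore without --numeric-ids.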

[ published on Sun 29.04.2007 20:15 | filed in interests/debian | ]
© Alexander Zangerl