Epistolary


Dell PowerEdge 1650

Got a new Dell PowerEdge 1650 in at work on December 6, 2002, which will serve as our lovely new POP3 and IMAP server. I'll probably move some of the older machines over to do secondary MX once the install is finished. It's got two Hitachi 73 GB SCSI drives linked through the PERC. Once I get it set as RAID 1 we'll have a suitably redundant, high-capacity rack-mount server to replace our Silicon Graphics O200 server with 8 gigs of space.

Genius boy tells me that it's all right to reboot the machine during the RAID 1 disk scrub process. Since I want to go home before the ice gets too bad, I do just that. After a warm reboot, the machine comes back up with a solid light on the disk, and I'm not sure what that means. I decide to wait another few minutes to see if the disk stops accessing, then power down and see if the RAID setup recognizes the containers or drives on the SCSI chain again.

A while later the light is still on, so I give the BIOS another try. Still no response from the drives, so it's time to reboot. The disk light doesn't stay on this time, and while the array controller takes about 30 seconds to spin them both up, eventually it does detect them as "new devices" and brings me back into the setup screen. When I get back to the screen, I find the drive scrub resuming from 83%, exactly where I left it when I rebooted. A minute later the indicator clicks to 84% and it looks like I didn't hurt anything, just confused the PERC for a few minutes.

I realize I'm going to have to wait until at least tomorrow morning to start the RedHat install unless I can occupy myself until 18:30. I wind up doing just that.

Reboot after the scrub comes back OK, although I've only noticed it doing anything to Disk 1. I'm not sure what to make of that. On reboot, Disk 0 comes on with a solid light, and Disk 1 blinks occasionally as if it's trying to start up. I decide to hard power down and reboot while the BIOS is waiting for the Array Controller #0 to start up. I'm assuming that a power cycle will have some sort of positive effect like it did the last time. Listening carefully, I can hear both drives spin up, and begin to flash in sequence. The Array Controller seems to like this and finds one container with 68.3 GB, just like I set up.
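A quick aside on why two 73 GB drives come out as a single container of roughly 68 GB. This is only a back-of-the-envelope sketch, and it rests on two assumptions of mine: that the "73 GB" on the drive label is decimal gigabytes, and that the controller reports binary gigabytes. A RAID 1 mirror is only as big as its smallest member, so the second drive adds redundancy rather than space. The function names are made up for illustration.

```python
def decimal_gb_to_binary_gb(gb: float) -> float:
    """Convert vendor-style gigabytes (10**9 bytes) to binary gigabytes (2**30 bytes)."""
    return gb * 10**9 / 2**30


def raid1_usable(drive_sizes: list[float]) -> float:
    """A mirror holds one copy per drive, so usable space is the smallest drive."""
    return min(drive_sizes)


# Two nominal 73 GB drives, as described in the post.
drives_gb = [73.0, 73.0]

usable_gb = raid1_usable(drives_gb)
usable_binary_gb = decimal_gb_to_binary_gb(usable_gb)

print(f"Usable mirror size: {usable_gb:.1f} GB decimal, "
      f"about {usable_binary_gb:.1f} GB in binary units")
```

That lands close to the 68.3 GB the controller showed. The small remaining difference would depend on the exact byte count of the drives and on anything the controller keeps for itself, neither of which I know.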

It looks like Chris was right about being able to reboot while the controller is working; the machine just doesn't behave like it ought to.

The RedHat installation CD boots a minute later and the install goes smoothly according to the notes I wrote earlier that day. When it comes time to reboot, the machine seems to have the same issue with restarting the Array Controller if the machine isn't powered down first. The controller just sits there and Disk 0 has a solid light, with Disk 1 showing nothing. A little more confident than last time, I hard power off the machine. I assume (a little optimistically) from my experience so far that the installation was a success and move it from my office into the machine room.

Luckily, it works. Five minutes later I'm connected to the new box from my office. I still need to figure out the root of this problem, but I have a whole weekend, a fistful of documentation and an SSH terminal to do it with before Monday.

December 15, 2002 - Had to go into work this morning to fix the mail server. It appears that kjournald managed to lock itself so hard that I had to go in and reboot it from the console. Network services were still responding, except for anything that had to either make a log entry or authenticate against passwd. The PERC has been giving me a bunch of trouble since we installed the server, including not restarting after a warm reboot. I'll probably call Dell about it tomorrow.

December 29, 2002 - Another lockup early this morning and again in the late evening.

December 30, 2002 - Found the cause of the system lockup in a post to the Dell Support Forums: it's an SMP-related race condition with ext3 in the 2.4.18-3 kernel of RedHat Linux.

January 20, 2003 - Did the kernel upgrade last night. I grabbed an updated version of modutils and the new kernel RPM from the RedHat site. The RPM put the new kernel in all the right places and updated the Grub boot list. After a quick reboot this morning, I selected the new 2.4.18-19.7.xsmp kernel from the list and it worked like a charm. I'm impressed.
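For anyone wanting to check whether a box is still on the older kernel, here is a minimal sketch that compares the running kernel release against the 2.4.18-19.7.xsmp one mentioned above. The helper names are hypothetical, and the comparison only looks at the leading numbers of the release string, which is much cruder than real RPM version ordering. Treating "older than 2.4.18-19" as "possibly affected" is my assumption; the forum post only named 2.4.18-3.

```python
import platform
import re


def kernel_key(release: str) -> tuple:
    """Turn a release such as '2.4.18-19.7.xsmp' into (2, 4, 18, 19) for comparison."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:-(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing build number (no '-N' suffix) is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def is_older_than(release: str, reference: str = "2.4.18-19.7.xsmp") -> bool:
    """True if `release` sorts before `reference` by its leading numeric fields."""
    return kernel_key(release) < kernel_key(reference)


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints on Linux
    try:
        if is_older_than(running):
            print(f"{running} is older than the kernel used in this post")
        else:
            print(f"{running} is at or past the kernel used in this post")
    except ValueError as err:
        print(f"Could not compare: {err}")
```

Calling is_older_than("2.4.18-3") returns True, since the build number 3 sorts before 19.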



Unless noted, all content on epistolary.org is © Copyright 1999-2008 to Rob Carlson with all rights reserved. All information is verified when possible, cited as appropriate and applied in the real world at your own risk. Send all feedback to rob@vees.net.