xaminmo: Josh 2016 (Default)
I also found that DB2 is wonky in that:
* DB2 put my ACTIVELOG in C:\SERVER1 rather than C:\Tivoli\TSM\LOG as specified during loadformat
* DB2 leaves a whole lot of garbage in C:\ProgramData\IBM\DB2\DB2TSM1\SERVER1
* TSM doesn't clean up DB2TSM1\SERVER1, even after a DSMSERV REMOVEDB and DB2ICRT sequence.
* TSM decides to run a full DBB on every 10-minute poll of log space, even when nothing is running and no log space is used.

I've cleared off another 20G of garbage from the drive, mostly DB2 dumps from the last instance, and restarted TSM. No new DBBs yet, which is promising.


35% Write

Nov. 18th, 2011 08:52 am
xaminmo: Josh 2016 (Default)
The drive that failed out of my array was already stickered as having done the same the last time I had array problems. The drive is less than a year old.
The drive's current error status shows nothing for the most recent error.
The offline error count is a misnomer, because the errors are mostly triggered by OS read requests. The drive is dropped, thereby going idle, and then the recovery happens, calling it offline. But it's functionally a captive error.

I did a selftest full media scan (selective 0-max) and it reported NO NEW ERRORS. Of course it didn't. It was already discovered.

*grrr*
xaminmo: Josh 2016 (Default)
md1 : active raid6 sda2[5](F) sdd2[4] sdb2[6](F) sdc2[2] sde2[7](F)
      105810432 blocks level 6, 512k chunk, algorithm 2 [5/2] [__U_U]
      bitmap: 1/1 pages [4KB], 65536KB chunk
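For what it's worth, that status can be picked apart mechanically. Here's a rough Python sketch, assuming only the /proc/mdstat layout shown above; the function name and the returned keys are my own invention, not part of any md tooling.

```python
import re

# The mdstat excerpt from this entry, used as sample input.
SAMPLE = """md1 : active raid6 sda2[5](F) sdd2[4] sdb2[6](F) sdc2[2] sde2[7](F)
      105810432 blocks level 6, 512k chunk, algorithm 2 [5/2] [__U_U]
      bitmap: 1/1 pages [4KB], 65536KB chunk"""


def parse_md_status(mdstat_text):
    """Summarise one md array from mdstat-style text (hypothetical helper)."""
    # Member entries look like "sda2[5](F)"; the "(F)" suffix marks a failed member.
    members = re.findall(r"(\w+)\[(\d+)\](\(F\))?", mdstat_text)
    failed = [name for name, _slot, flag in members if flag]

    # "[5/2]" is expected/active device counts; "[__U_U]" maps each slot, U = up.
    counts = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", mdstat_text)
    if counts is None:
        raise ValueError("no [n/m] [UU_] status found")
    expected = int(counts.group(1))
    active = int(counts.group(2))

    return {
        "failed": failed,
        "expected": expected,
        "active": active,
        "missing": expected - active,
        "slot_map": counts.group(3),
    }


if __name__ == "__main__":
    print(parse_md_status(SAMPLE))
```

RAID-6 only survives two missing members, so a "missing" count of three is the part to worry about.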
xaminmo: Josh 2016 (Default)
The GUI, when idle, chews up to 30% CPU.

Ok, whatever.

But on Linux, I was running into odd problems, such as:
tail: cannot watch `anyfilename': No space left on device

Wasn't a ulimit issue. Turns out that I ran out of inotify watches for root.

The fix was to add this to /etc/sysctl.conf (double the old value):

fs.inotify.max_user_watches=65536

To activate it right away, I ran sysctl -w fs.inotify.max_user_watches=65536
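If you'd rather check the current limit before touching anything, a small Python sketch like this would do it. It assumes the standard Linux /proc path for the setting; the helper names are hypothetical, and it only prints a suggested line rather than changing anything.

```python
from pathlib import Path

# Standard Linux location of the per-user inotify watch limit.
WATCH_LIMIT = Path("/proc/sys/fs/inotify/max_user_watches")


def read_watch_limit(path=WATCH_LIMIT):
    """Return the current watch limit, or None if it can't be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def doubled_sysctl_line(current_limit):
    """Build the /etc/sysctl.conf line that doubles the given limit."""
    return f"fs.inotify.max_user_watches={current_limit * 2}"


if __name__ == "__main__":
    limit = read_watch_limit()
    if limit is None:
        print("could not read", WATCH_LIMIT)
    else:
        print("current limit:", limit)
        print("suggested line:", doubled_sysctl_line(limit))
```

Doubling is just what I did here; there's nothing magic about that factor.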

When I searched, it looks like other people have run into this issue with CrashPlan.

Anyway, I'm still running into problems where I can't move the repository data for a client. About a gig in, I get "Unable to move - unknown system error"

Only one hit, no details on response... *sigh*

Drive age.

May. 28th, 2011 05:46 pm
xaminmo: Josh 2016 (Default)
Brand new Western Digital MyBook says power-up count is 13, and spin-up time is 8150. I've had it powered on for about an hour, but it says it's been powered on for almost a year. Still, the warranty says May 19, 2013, and I bought it May 26, 2011, with a 2-yr warranty, so it's mostly moot.


Or I could look at it as they broke the drive in to get past the 3-6 month early-death period for many devices... :)
xaminmo: Josh 2016 (Default)
Picked up a couple of Radeon 5450 cards for the boys' computers.

Took a Dremel to the ADD2-R slots to remove the blocking plastic.

This opens up lane 1, receive on lanes 3, 7, and 11-15, plus transmit on lanes 12-15, and all hot-plug, and several redundant ground pins.

I actually damaged a pin in the process, but it turns out it's a ground pin. Technically that should be bad, but most cards just connect all ground pins to the same ground plane, so it's ok.

Anyway, plugged it in, and Windows showed the boot screen, but not the login screen.

Went into BIOS to set PCI as default, booted, installed ATI drivers, and lo and behold: the default Portal 2 parameters are WAY up there. It blows my system out of the water.

I have 3 riser cards on order, just in case I really bork one. I wanted to replace the slot with a proper 16x slot with all of the conductors; however, that seems like way too much effort. It would mean unsoldering 164 pins, and resoldering 164 pins, by hand, twice. Bleh.

Anyway, Khai's system is now super happy with graphics, and I'm off to mod Max's riser card now.
xaminmo: Josh 2016 (Default)
SB Audigy was a bust, as was the SiiG DP Soundwave.

The Virtual Audio Card drivers worked fine for TVersity, but not for light desktop use.

Well, luckily, I stumbled onto an SB Live! 5.1 Dell edition card for $9. SB Live! cards work fine in 2003. WOOT!

Also, when I reflashed the SIL-3124 card to BASE firmware rather than RAID firmware, the older PE400SC was happy to boot from it. I now have Linux back on a separate system. Unfortunately, my Toshiba DVD-ROM from ages past causes constant bus resets. It's a minor issue, because I have no disks on that bus. Still, it overflows the dmesg buffer, and makes boot require manual intervention. Also, all of the RAID drives are in an external bay because I can't fit 5 drives AND cooling inside the system.

Still, I have a $25 PATA DVD-RW drive on order and I should be set after that.

*knock on wood*
xaminmo: Josh 2016 (Default)
I'm not actually out of memory. I'm using just under 500M of RAM for TSM, and 1G for an unrelated VMware instance. I have 3.93GB of RAM total, about 1.8G free, and I'm running with the /3GB flag.

*sigh*
ANR0132E lvmread.c(1245): Memory allocation failed: object Resync read page buffer, size 4096. (SESSION: 237, PROCESS: 14)
xaminmo: Josh 2016 (Default)
Linux's compcache (compressed cache) became ramzswap.
This is like "Active Memory Sharing" for pSeries Hypervisor.
Linux's implementation is as an LZO compressed block device in kernel RAM.

As development went on, there was no need to limit this to paging space. Now, it's called zram. It's not super super stable, but it seems to work well enough when you leave it alone.

Linux already supports "priority" for paging spaces (i.e., they're used hierarchically in order of best to worst). As such, backing devices are moot anyway. We'll just store non-compressible junk in RAM. It's rare, and not any worse than if there were no LZO.

zram is single-threaded, which is disappointing; however, its performance is about 50% of RAM, and about 800% of disk. You can also run multiple devices at the same priority, and each device will sit on its own core.

xaminmo: Josh 2016 (Default)
So, /dev/sda is on the SIL-3132 add-in card which runs at UDMA/100 on single-lane PCIe.

The drive on here runs 25% faster than the other 4 identical drives on ICH7 internal ports at UDMA/133.

Also, when idle, the ST32000542AS drives make a click every 20-21 seconds.

I heard rumors that updating to CC35 to remove the clicks also prevents drop-outs under Linux MDRAID.

Vague rumors of it being related to a drive spindown when queried for SMART.

Seagate offers a bootable ISO for this, but it says, no, sorry, can't update your drive.

I run the command-line tool to update, and it works for the 4 on ICH7 ports, but the drive on the SIL-3132 gets a garbled model/serial when queried with their tool.

OK, well, losing the drives on the ICH7 is worse because it kills 2 drives at once due to weirdness with this Intel desktop board. I'll leave it with the one on CC34.

Then, I go to run smartctl and it fails to get any info... I'm freaking out, pissed, etc...

and then I realize I'm not root.

I R SMRT!

FIOS

Dec. 12th, 2010 01:35 pm
xaminmo: Josh 2016 (Default)
Anyone here have to reset their ONT periodically?

Power cycled my router, no change. Factory default, no change. Power cycle ONT, comes back.

Seems to be a recurring issue for me, but I'm also on MoCA using 20-year-old coax with splitters and barrels because the installer didn't want to run CAT5 between two attics.

I was down between 04:01am and 04:31am and had 13 down notices (6.5 hours).
xaminmo: Josh 2016 (Default)
Looks like these are accounted for
Now that I'm all stable on the new drives, my old drives are up for sale at http://dallas.craigslist.org/ndf/sys/2099029407.html

ST3500320AS Seagate Barracuda 7200.11
Capacity 500GB
Interface SATA 3Gb/s
Cache 32MB
Guaranteed Sectors 976,773,168
Average latency 4.16ms
Random read seek time <8.5ms
Random write seek time <9.5ms

I have nine, with date codes ranging from 08174 to 1037-4.
The two oldest ones are original label. The rest are "Certified Repaired HDD".
I ran these as 2 different arrays under Linux and have since moved to a smaller array of larger disks.
All of these are "healthy" per SMART reports prior to pulling out of service today.

500GB SATA-II drives range from $35 and up new, so I think $10 each used is a fair price. (I'd like $12, but it's easier to not have to come up with $8 change :)
xaminmo: Josh 2016 (Default)
So, replication from the split copy worked, and then I rebuilt the array from 1 other known working disk, plus one suspect disk.

Checked this morning and all was well, and added the new, 5th disk and restripe is running.

I checked the SMART logs on one of the good disks, and it shows that out of the sectors I've written, an amazing THREE PERCENT have required ECC recovery. Or if you count it only by read sectors, it's a whole one percent.

For the drive that went offline in its first day of use, Offline_Uncorrectable and Pending_Reallocation are both 202.

That seems VERY excessive to me. I know that current disks have such high densities that bit rot is rapid. I think I heard it's 40% signal loss by 400 seconds after write, and then it sort of levels off around there.
It looks like the best bet is:
1. One disk per controller, per array, for hardware reliability
2. RAID-6 for reliability during recovery
3. Modern disks for ECC
4. Continue backups

Unfortunately, I'm not able to do #1 right now. I'd need minimally 4 controllers and I have three. Based on current arrays, I'd need 5 controllers to keep from having to have 2 arrays and double the parity cost.

I guess I could get 2 more controllers. I'll have to weigh my concerns with that cost.

xaminmo: Josh 2016 (Default)
Disks
When adding drives to a degraded array, the order added can matter. During rebuild, one of the drives went offline. This took the controller offline, which took a second drive with it.

I couldn't stop the array because it had a VG on it. I couldn't export nor set the VG unavailable because the underlying array was inaccessible. Stupid leenooks LVM.

Anyway, I rescanned the buses and the disks came back. This concerns me because I have all disks set to fast fail on boot so they don't time out for the array.

Re-adding the drives after they came back caused them to show up as spares and not live.

*sigh*
xaminmo: Josh 2016 (Default)
Is anyone going out today who wants to pick one up for me? I'll pay you back.

I have a home for at least 4 of them.
xaminmo: Josh 2016 (Default)
lilo no longer supports root on LVM for whatever reason.

The fix was to add the root device to the kernel append line:

root=/dev/mapper/root-hd4

Apparently, the LILO developers aren't really interested in fixing complex boot, but I'm not quite ready to move to GRUB.

But, anyway, I'm up and running.

*pleased*
xaminmo: (Computer Drive)
IBM 1956, 24" platters, 5 million characters (5-bit? 6-bit?)


Micropolis 1578 332MB 5.25" FH (4MB/sec peak, 1.5MB/sec sustained)


SanDisk Transflash / MicroSD, 8GB 10mm x 16mm x 1mm (6MB/sec write, 12MB/sec read)
xaminmo: Josh 2016 (Default)
So, I've been posting about drive failures, etc.
I finally have everything put back together.

All data was salvaged and I'm running the final reshape and resync now.

But, I noticed, both drives that failed were in the top bays.

Hrm, kinda close together, but more importantly, NO AIR FLOW!

The lower bay has an 80mm fan blowing across the top of both drives. They're also flat, side by side.
Then, there's the CPU which has a monster fan and horn that also blow warm air out into the case.
There are vent holes in the back, but it's passive exhaust.
Up top, in the back, is the PSU fan, but the top bays have no air circulation at all.

To correlate this, we have 2 of these servers at work, and have had drive failures, ONLY in the top bays.

*sigh*

So, I had an 80mm, 40CFM fan in my drawer that I deployed. It's held in by twist-ties, but it's in just the right spot and is pretty secure. It makes a little more noise, but the temps are 30C and 34C.

The room is 22C, and in open air, these got up to 51C. None of them complained about exceeding any sort of max temp, but I'm pretty sure it's bit-rot from being warm.

I need to see about installing similar fans for the 2 boxes at work. We have warranty drives coming in, but it's a pain in the rump to deal with failed drives.

For my home units, all I need is to finish the resync, and then figure out why the new rootvg won't come up during boot from disk. I can mount it up from alternate boot.

Page generated Jul. 22nd, 2017 12:52 pm