Home NAS setup with Thecus N5550
2012-08-12
(tl;dr: use the latest rdup for backups. The Thecus has problems with some
drives, even ones on the compatibility list; if you hit them, try updating
the disk firmware.)
So I recently saw a reasonably good deal on the Thecus N5550, a 5-bay
hot-swappable NAS machine. This seemed as good a time as any to finally clean
up my backups and Do Them Right.
So the plan is to see how much the already excellent rdup has progressed
(the last time I used it, it was at version 0.5; it is now at 1.1.13, and
the version in the repositories on my systems was 1.0.5).
Initial setup for the N5550 is quite OK. I set up NTP, NFS access, Samba
access, and enabled SSH login to have a low-level look at what’s going on
inside. Setting up the RAID array was a breeze. For my first experiment I
chose a RAID-5 array of 3 Seagate 3TB disks, totalling about 5.5TB of space.
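
The 5.5TB figure is what RAID-5 arithmetic predicts: one disk's worth of
space goes to parity, and the NAS reports binary units while disks are sold
in decimal ones. A quick sketch, assuming each disk holds exactly 3 * 10^12
bytes and ignoring filesystem overhead (both are my assumptions, not numbers
read off the machine):

```python
def raid5_usable_bytes(disk_count, disk_bytes):
    """Usable capacity of a RAID-5 array: one disk's worth goes to parity."""
    if disk_count < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    return (disk_count - 1) * disk_bytes


def to_tib(n_bytes):
    """Convert bytes to binary terabytes (TiB, 2**40 bytes)."""
    return n_bytes / 2**40


# Assumption: a "3TB" disk is exactly 3 * 10**12 bytes.
usable = raid5_usable_bytes(3, 3 * 10**12)
print(f"{usable / 10**12:.1f} TB = {to_tib(usable):.2f} TiB")
# prints "6.0 TB = 5.46 TiB"
```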
I installed rdup on my VPS, and on a local always-on machine (a low-power
fanless box acting as a fancy gateway), and initially I mounted the big
drive as a Samba mount (there were some unidentified problems with nfs4, see
below). Read and write speeds were not as high as I had expected, about
1-3 MB/s. But that was a problem for later, and I don’t really need that
much speed anyway.
I only needed simple incremental backups, so the rdup-simple script should
be fine.
Then the first problem arose: there were a lot of ’no such file or
directory’ errors. After manually retracing the steps rdup-simple takes, I
found out that rdup-up sometimes failed to create a directory, and then of
course writing files into it failed too. This happened for different
directories on every run, so I suspect some form of race condition.
I asked around, and a friend who uses it said he had never seen this with
his version (1.1.11 IIRC).
Time to upgrade.
After getting rid of the repository versions, and building a fresh rdup from
the latest release (long live apt-get build-dep!), rdup-simple proceeded
without a hitch.
I also had a number of media files I wanted to copy, so I simply scp’d them
to the mounted share.
And then the real trouble started.
While it was copying one of the video files, scp aborted with an I/O error.
I tried to copy again, and this time the failed file worked. But shortly
after that, while copying another big file, the Thecus started beeping
continuously.
Looking at the diagnostics, it reported a failing disk (drive 2). Since I
had already run the SMART checks on this brand-new disk, I didn’t think it
was actually dying on me, so I tried to find out what was really happening.
Good thing I had that SSH access; dmesg showed lots and lots of problems:
Periodic exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen from sata drive
hard resetting link
followed by a number of messages:
device reported invalid CHS sector 0
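
With that much noise in dmesg, it helps to tally the errors per port. Below
is a minimal sketch that counts the three messages quoted above. The
"ataN:" / "ataN.00:" prefix and the sample lines are my assumptions for
illustration, not a verbatim copy of my log:

```python
import re
from collections import Counter

# Substrings taken from the messages quoted above.
ERROR_MARKERS = ("exception Emask", "hard resetting link", "invalid CHS sector")

# Assumed prefix format: "ata2:" or "ata2.00:" in front of the message.
PORT_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?:")


def count_ata_errors(lines):
    """Return a Counter mapping (port, marker) to the number of matching lines."""
    counts = Counter()
    for line in lines:
        for marker in ERROR_MARKERS:
            if marker in line:
                match = PORT_RE.search(line)
                port = match.group(1) if match else "unknown"
                counts[(port, marker)] += 1
    return counts


# Illustrative sample only; replace with the real output of dmesg.
SAMPLE = """\
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata2: hard resetting link
ata2.00: device reported invalid CHS sector 0
ata2.00: device reported invalid CHS sector 0
"""

for (port, marker), n in sorted(count_ata_errors(SAMPLE.splitlines()).items()):
    print(f"{port}: {n} x {marker}")
```

If the errors cluster on one port, that points at one disk or slot; if they
are spread over all ports, the controller or driver becomes the more likely
suspect.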
And of course the RAID was in degraded mode. Telling it to fix itself
started a rebuild. As you may know, RAID rebuilds take a long, long time, and
after about 10 hours (at 92 percent), disks 1 and 3 dropped with similar
errors. Bye bye RAID.
This seems to be a problem with the driver, as reports of similar problems
on other systems are all over the web. It is often blamed on bad cabling,
but there are no cables in this machine (the SATA connections are wired
directly onto a board, and if that board were bad, I’d expect to see many,
many more complaints on the Thecus forum).
Another thing I noticed is that these are 6 Gb/s disks, and the Thecus should
handle them fine (they are on the compatibility list), but they are connected
at 3.0 Gb/s, not 6.0.
This might be related.
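
For reference, SATA link speeds are quoted in gigabits per second, and the
kernel logs the negotiated speed when a link comes up. The sketch below
pulls that speed out of dmesg-style lines and converts it to payload
bandwidth, using the fact that SATA's 8b/10b encoding puts 10 bits on the
wire for every payload byte. The sample line is illustrative, and the exact
message format is an assumption based on what the Linux kernel usually
prints:

```python
import re

# Assumed kernel message format: "ata2: SATA link up 3.0 Gbps (...)".
LINK_RE = re.compile(r"(ata\d+): SATA link up (\d+(?:\.\d+)?) Gbps")


def negotiated_speeds(lines):
    """Map each port to the last link speed (in Gb/s) it reported."""
    speeds = {}
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            speeds[match.group(1)] = float(match.group(2))
    return speeds


def payload_mb_per_s(link_gbps):
    """Payload bandwidth in MB/s: 8b/10b encoding costs 10 line bits per byte."""
    return link_gbps * 1000 / 10


# Illustrative sample only; replace with the real output of dmesg.
SAMPLE = ["ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)"]

for port, gbps in negotiated_speeds(SAMPLE).items():
    print(f"{port}: {gbps} Gb/s link, about {payload_mb_per_s(gbps):.0f} MB/s payload")
```

Even at 3.0 Gb/s that is about 300 MB/s per disk, so the lower link speed
by itself should not limit a spinning disk. Whether it has anything to do
with the errors is a separate question.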
So, I almost contemplated getting rid of the firmware and trying to install
a real OS on the system. But there was one other thing to try first.
The version of the disk firmware was CC4B, and there were some reports from
owners of other Thecus machines that, depending on the firmware version,
some disks of the same model worked and some did not.
I put my disks in another machine and updated the firmware to CC4H. Fun!
Steps for updating the Seagate disks:
- Boot my main Linux machine into Windows for the Seagate firmware update tool
- The firmware update tool reboots the machine into a mini Linux version
- which updates the disk
- then reboots the machine (defaulting to the main Linux)
- Repeat for the other disks
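
To check which firmware a disk is running before and after the update, the
output of `smartctl -i` contains a "Firmware Version:" line. A small sketch
that extracts it from that text; the sample output is trimmed and the model
name in it is a placeholder, not my actual drive:

```python
import re

FIRMWARE_RE = re.compile(r"^Firmware Version:\s*(\S+)", re.MULTILINE)


def firmware_version(smartctl_info):
    """Return the firmware version from `smartctl -i` output, or None."""
    match = FIRMWARE_RE.search(smartctl_info)
    return match.group(1) if match else None


# Illustrative, trimmed sample; EXAMPLE-MODEL is a placeholder.
SAMPLE = """\
Device Model:     EXAMPLE-MODEL
Firmware Version: CC4B
"""

current = firmware_version(SAMPLE)
target = "CC4H"  # the version I updated to
print(f"firmware {current}, update needed: {current != target}")
# prints "firmware CC4B, update needed: True"
```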
Shoving the disks back into the Thecus, I got a small disappointment: the
disks were still negotiated at 3 Gb/s. Oh well.
I rebuilt the RAID array (this time as a RAID-1, since I care more about
redundancy than write speed and size). And this time I tracked down the NFS
problem (or at least a workaround): when mounting with nfs4, the shares got
a weird UID, and chown failed with Invalid Argument. Mounting with nfs3
instead makes it work.
Using RAID-1 and nfs(3) gave me 10 MB/s write speed and 2 GB/s read speed.
Now that is more like it! (Again, I don’t care too much about write speed,
but 10 MB/s is a lot better than 1.) (EDIT: hmm, that was with a zeroed test
file I had just created; for normal files it appears to be about 10 MB/s
read speed as well.)
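
A zeroed file that was just written makes for a flattering benchmark,
probably because the read is served from cache rather than from the disks.
Here is a rough sketch of a slightly more honest test: it writes non-zero
data, forces it out with fsync, and times both directions. The function
names and the temp-dir target are mine; point it at the mounted share, and
use a file larger than the client's RAM if you want the read number to mean
anything:

```python
import os
import tempfile
import time

CHUNK = 1024 * 1024  # 1 MiB


def timed_write(path, size_mib):
    """Write size_mib MiB of non-zero data, fsync it, and return MiB/s."""
    block = os.urandom(CHUNK)  # one random block, reused to keep the RNG out of the timing
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mib):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure the data really left the client
    return size_mib / (time.perf_counter() - start)


def timed_read(path):
    """Read the whole file back and return MiB/s (may still hit the page cache)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            data = f.read(CHUNK)
            if not data:
                break
            total += len(data)
    return total / CHUNK / (time.perf_counter() - start)


if __name__ == "__main__":
    # Placeholder target: replace the temp dir with a path on the NAS mount.
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "speedtest.bin")
        print(f"write: {timed_write(target, 16):.1f} MiB/s")
        print(f"read:  {timed_read(target):.1f} MiB/s")
```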
I have been stress-testing it all day now, and have just started the backups
again. No disk I/O errors so far.
So if you get those, before doing anything drastic, check for disk firmware
updates.
Happy RAIDing!