Escaping the FAANG Part 2 - Building a NAS

In part 1, I decided to move away from the predations of FAANG and manage my data myself. To do this I decided to build my own NAS / homelab server. In this installment, I order the parts and build that machine.

I should also issue the disclaimer that in this edition we are very much down in the weeds. From a high philosophical goal, we have descended to the depths of the nitty-gritty. If the dirt does not interest you, please feel free to skip this one.

Building the NAS

And so on to designing and building the NAS. At this point, my requirements engineering became a little muddied by a competing desire to have fun building an interesting machine.

I started with the same motherboard that I have in my Lightroom / Windows machine. The thought process here was that it would give me flexibility to swap components back and forth if needed. From there, I added a cheap Ryzen processor, an SSD for the OS so as to save the precious SATA ports for the hard drives, and a power supply that should theoretically be pretty energy efficient. For drives, I simply went with the cheapest reasonable option - 4 x 2TB consumer drives. This bit me in a couple of paragraphs' time.

Then the important questions start: what software and configuration?

I have the most familiarity with Ubuntu, so starting with Ubuntu server seemed reasonable. I don’t have the patience to mess with BSDs and I favour having this as more of a homelab setup rather than opting for a preconfigured NAS option like FreeNAS.

Then the crucial question: what file system? There is a large amount of hype around ZFS, although it's less flexible than something like mdadm when it comes to growing the pool later. Both support a variety of RAID options.
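To make that flexibility point concrete, here is a sketch of how the two differ when you want to add capacity. Device names, array name and pool name are all hypothetical:

```shell
# mdadm can reshape an existing RAID5 array onto an extra disk in place
sudo mdadm --add /dev/md0 /dev/sde
sudo mdadm --grow /dev/md0 --raid-devices=5

# ZFS (at least as of this build) cannot widen an existing raidz vdev;
# growing the pool means bolting a whole new vdev on alongside it
sudo zpool add tank raidz /dev/sde /dev/sdf /dev/sdg
```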

There is a huge amount of bike-shedding on the internet on the appropriate RAID levels. Basically, at one end of the spectrum, you make sure you have very good backups and don't need RAID, because if a drive fails you can just buy another and restore your backups. At the other end of the spectrum are voices that want to add enough disks of redundancy that it's statistically unlikely that the RAID could ever lose data.

I was somewhere in the middle - on the one hand I definitely wanted great backups. Having a single file server doesn't protect you from any of the likely threat models such as flood, theft or accident. And yet on the other hand, I didn't want to be downloading terabytes of backups on every drive failure. Some amount of redundancy would be nice.

RAID5 or RAIDZ seems to offer a nice compromise - you can tolerate a single drive failure without any negative consequences: simply plug in another drive and let it rebuild. Unfortunately this is a simplification - given published drive bit-error rates, there is a non-trivial chance that a rebuild spanning multiple terabytes will fail.

The thing is, this argument isn't nearly as convincing as the dogma out there regarding 'the death of RAID5' - the entire argument rests on the published URE (unrecoverable read error) rate of drives (usually 1 in 10^15 bits read). Sure, if you multiply this by the large size of today's disks, you get a significant likelihood of a URE during a RAID rebuild. But this ignores the distribution of UREs, which are far more clustered in practice. Additionally, we are assuming a backup as well, so a failure during the RAID rebuild simply means we need to access some backups. It's not a data-loss scenario.
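As a back-of-the-envelope check of the dogma (taking the 1 in 10^15 figure at face value and ignoring the clustering caveat), rebuilding a degraded 4 x 2TB RAID5 means reading the three surviving disks in full:

```shell
# For small probabilities, P(at least one URE) ~ bits_read * URE_rate
awk 'BEGIN {
  bits_read = 3 * 2e12 * 8      # three surviving 2TB disks, in bits
  ure_rate  = 1e-15             # published unrecoverable read error rate
  printf "P(URE during rebuild) ~ %.1f%%\n", bits_read * ure_rate * 100
}'
# prints: P(URE during rebuild) ~ 4.8%
```

A few percent per rebuild is real but hardly the certain doom the 'death of RAID5' posts imply - and with backups in place it is an inconvenience, not a loss.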

On the other hand, at this scale, adding another disk for dual parity isn't too expensive, so RAID6 or RAIDZ2 doesn't have many downsides, and makes it less likely that you'll have to touch the backups.

I had pretty much come to a decision - a RAIDZ2 pool of 4x2TB disks - and had even ordered the disks, when a comment on the very helpful Butter What discord chat made me reconsider. I'd ordered cheap disks, with the rationale that a RAID would make it fairly pain-free to rebuild if one failed. Unfortunately I hadn't noticed that these drives were SMR - shingled magnetic recording - and a rebuild would subject a new disk to sustained write load it isn't designed for. For this reason SMR disks are not recommended for RAID, and are barely supported with ZFS.

Luckily I could return the disks unopened, and reorder, and instead opted for a mirrored pool of 2 x 4TB CMR drives.
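Creating the mirrored pool is then a one-liner. The pool name and the /dev/disk/by-id paths here are placeholders for the real drives:

```shell
# Mirror of two 4TB CMR drives; by-id paths are stable across reboots
sudo zpool create tank mirror \
  /dev/disk/by-id/ata-DRIVE_ONE \
  /dev/disk/by-id/ata-DRIVE_TWO

sudo zpool status tank   # verify both sides of the mirror show ONLINE
```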

When you build a computer, there is a moment of reckoning, surrounded by cardboard, when you realise whether the parts will fit together or not. As with most of my attempts, this time they did not. I had somehow confused microATX with miniITX and there was no way my motherboard was going to fit in the case. The case was a nice size though, and everything else seemed good, so I quickly found a compatible miniITX board online and placed another order.

I built the computer and tested that the fans spun up. Then I spent an hour tidying the cables and making everything neat. Of course I hadn't tested that it would actually boot, and of course it didn't.

It took a lot of debugging, trying every component, before I realised that somewhere in the ordering process I had combined a CPU with no onboard graphics with a motherboard that required one. Although the computer would never be plugged into a monitor, without a GPU it wouldn't even reach the BIOS. I made another order.

We were finally getting somewhere. At this point I could boot to the BIOS, and was getting ready to install an OS when I noticed the SSD wasn't showing up. Somehow the SSD I had ordered wasn't compatible with the motherboard. Another order was needed.

At this point the thing finally started. I was relieved, and also a little impressed that my repeated and frantic disassembly of the thing hadn't caused any additional issues.

Final Build:

Component       Part
Case            Fractal Node 304
PSU             Silverstone ST40F-ES230
Motherboard     Asus Prime A320I-K
CPU             AMD Athlon 3000G
Memory          Corsair ValueSelect 8GB DDR4-2400
SSD             Kingston A400
Spinning disks  Seagate IronWolf, WD Red Plus

With the hardware working, I could finally turn to the software. As I would administer this server remotely, the only things necessary to do with the monitor and keyboard plugged in were to install the basic packages for SSH, set up the keys, and install tailscale, so I could easily connect to the machine from afar.

Tailscale is amazing for this sort of thing. It allows me to connect to my local machines without any notion of the network topology between us, so I don't have to mess around with routing tables or NAT traversal. When you first ssh into a machine on a LAN from afar, with no configuration beyond logging in to the tailscale tool, it feels magical. Additionally, the whole thing runs over a VPN behind the scenes, which adds a level of security. I'm sure I could hack together a similar solution with wireguard, but tailscale works so nicely that I am happy to adopt it here.
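For the record, the console-attached part of the setup amounts to only a few commands on a stock Ubuntu server (the install script is Tailscale's standard one):

```shell
# Minimal remote-access bootstrap
sudo apt update
sudo apt install -y openssh-server

# Install and bring up tailscale (prints a login URL on first run)
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
```

After that, copy a public key over from another machine with ssh-copy-id and the monitor can be unplugged for good.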

Then, over SSH, I could install ZFS, samba and rclone. Nowadays system administration has become a dirty concept in tech - devops has eaten the world - and if I were being fashionable, I would be writing a kubernetes deployment script that set up the entire machine with remote orchestration. But my burnout starts twinging at so many fashionable technologies obscuring a layer of over-engineered indirection. Instead I treated this as an opportunity to hand-craft a bespoke server with manual labour, and noted down my process in a notebook, the old-fashioned way.
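The hand-crafted version, roughly as it went into the notebook - the share name, path and username here are examples, not my real ones:

```shell
# Install the storage stack
sudo apt install -y zfsutils-linux samba rclone

# Expose the pool over SMB by appending a share definition
cat <<'EOF' | sudo tee -a /etc/samba/smb.conf
[tank]
   path = /tank
   read only = no
   valid users = myuser
EOF

sudo smbpasswd -a myuser      # samba keeps its own password database
sudo systemctl restart smbd
```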

I had set everything up to this point before I realised that in my initial Ubuntu setup I had neglected to enable full disk encryption, and the only way to do so was to reinstall the operating system, essentially starting from scratch. I could keep the ZFS pool, since the operating system lived on the SSD and would be easy to repartition, and I had kept a detailed log of everything I had done so far. Regardless, it seemed like a lot of work and I procrastinated on it for a few days.
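Keeping the pool across a reinstall is one of ZFS's saving graces: the pool metadata lives on the data disks themselves, so a fresh OS can pick it up again with a single import (pool name as before):

```shell
# Before the reinstall (optional, but leaves the pool cleanly closed)
sudo zpool export tank

# After the fresh install, once zfsutils-linux is back in place
sudo zpool import tank
```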

But the longer I waited, the harder it would get, so eventually I bit the bullet.

I wrote a few cron scripts to back up the zpool to Backblaze B2, which is a little cheaper than S3. At this point I had my NAS, with its terabytes of files, all up and running and backed up.
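The backup machinery boils down to an rclone sync plus a crontab entry. The remote name, bucket and paths below are placeholders:

```shell
# One-time: configure a B2 remote interactively (creates a remote, e.g. "b2")
rclone config

# Nightly sync of the pool to the bucket, logged for later inspection.
# Added via crontab -e:
# 0 3 * * * /usr/bin/rclone sync /tank b2:my-nas-backup --log-file /var/log/rclone.log
```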

The next step was integrating it into my workflows. But that is a story for next time.