[ale] Giant storage system suggestions
Alex Carver
agcarver+ale at acarver.net
Fri Jul 13 14:19:03 EDT 2012
On 7/12/2012 04:22, JD wrote:
> On 07/11/2012 05:03 PM, Alex Carver wrote:
>> I'm trying to design a storage system for some of my data in a way that
>> will be useful to duplicate the design for a project at work.
>
> We have 2 requirements at this point.
> a) Cheap
> b) 10TB usable with room to grow
>
> A few more questions need to be asked to get started understanding the
> requirements more completely.
>
> * What services do you require?
> ** CIFS
> ** NFS
> ** iSCSI
> ** AoE
> ** rsync
> ** others?
Actually, none of those on the work system. :) The data is going to be
accessible via HTTP for the most part (custom website designed to
catalog and sort the data). Smaller files will be uploaded by direct
HTTP POST and larger files via scp/sftp. The home system might just
use NFS/CIFS, mounted and used like any normal storage volume.
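Roughly the sort of thing I have in mind for the small-file POST path
(just a sketch, not the real site code; the upload directory, the
X-Filename header, and the port are made-up placeholders):

# Minimal upload handler sketch, Python 3 stdlib only.
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

UPLOAD_DIR = "/srv/archive/incoming"   # placeholder path

class UploadHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Filename comes from a custom header here just to keep the
        # sketch short; the real site would parse multipart form data.
        name = os.path.basename(self.headers.get("X-Filename",
                                                 "upload.bin"))
        length = int(self.headers.get("Content-Length", 0))
        with open(os.path.join(UPLOAD_DIR, name), "wb") as dest:
            dest.write(self.rfile.read(length))
        self.send_response(201)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), UploadHandler).serve_forever()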
> * How will you backup all this data automatically?
> ** tape
> ** duplicity
> ** lvm
> ** zsend
That was TBD, but likely a combination of tape and Blu-ray. No remote
storage in either case: home because that's how I feel about it, work
because of ITAR and other government regulations.
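For the Blu-ray part it would basically come down to splitting each
archive into disc-sized pieces, something like this sketch (25 GB
single-layer BD-R assumed; the paths and names are just examples):

# Sketch: split a big backup archive into Blu-ray sized chunks.
import os

def split_archive(src, dest_prefix, chunk=25 * 10**9, block=16 * 2**20):
    # Writes dest_prefix.000, .001, ... each at most `chunk` bytes.
    part, written = 0, 0
    out = open("%s.%03d" % (dest_prefix, part), "wb")
    with open(src, "rb") as f:
        while True:
            data = f.read(min(block, chunk - written))
            if not data:
                break
            out.write(data)
            written += len(data)
            if written == chunk:        # disc is full, start the next
                out.close()
                part, written = part + 1, 0
                out = open("%s.%03d" % (dest_prefix, part), "wb")
    out.close()
    if written == 0 and part > 0:       # exact multiple: drop empty tail
        os.remove("%s.%03d" % (dest_prefix, part))

split_archive("/backup/archive-2012-07.tar",
              "/backup/bluray/archive-2012-07.tar")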
> * Which file system(s) do you want/need?
The file system has to work with Linux/BSD because the OS on the server
is going to be one of those. Beyond that it won't matter as long as
it's a scalable filesystem.
> * Storage Performance?
> ** 10/100 connection
> ** GigE connection
> ** 10GigE connection
> ** multiple bonded 10GigE connections?
The connection to the server is going to be 100 Mb/s Ethernet (GigE
may show up later depending on network upgrades). The array is
directly connected to the server, so that's all SATA/SAS (or at least
that was my first thought about it).
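For a sense of scale, the network is the bottleneck by a wide margin.
A quick back-of-the-envelope (assuming roughly 80% of line rate is
actually achieved; the numbers are only illustrative):

# How long to move 10 TB at various link speeds.
TB = 10**12

def transfer_days(size_bytes, link_mbps, efficiency=0.8):
    seconds = size_bytes * 8 / (link_mbps * 10**6 * efficiency)
    return seconds / 86400.0

for label, mbps in [("100 Mb/s", 100), ("GigE", 1000),
                    ("SATA II, ~3 Gb/s", 3000)]:
    days = transfer_days(10 * TB, mbps)
    print("%-18s %5.1f days to move 10 TB" % (label, days))

At 100 Mb/s that's on the order of a week and a half for a full 10 TB
pass, so anything bulk really has to happen on the directly attached
SATA/SAS side.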
> * Transport connections?
> ** ethernet over copper
Ethernet over copper.
> * Are there any unusual distance requirements for access to the storage?
No distance requirements.
>
> * What is the largest partition size required? This feeds into backups and
> future data migration options. You WILL need to migrate the data in the future.
I was hoping for one monolithic volume. It just needs to be a giant
data storage volume. Organization is covered via the custom interface.
> * How critical is the data?
The data is archival. No one dies if it's lost, but the data should be
around for many years.
> * Budget?
> ** SW - is commercial SW an option at all?
> ** HW - RAID cards fail occasionally, so you'll want an identical spare available.
> ** Support - things that might take me a week to figure out are solved in an
> hour by a professional in the business.
Software RAID is fine; it's not necessary for this system to use
hardware RAID (my original concept wasn't hardware RAID either). As
close to standard software as possible is best: Linux/BSD with
abstracted hardware (/dev/hd[a-z][0-9] device names, udev, etc.) so
that there's no dependence on the specific hardware.
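For example, with Linux md the array state shows up in /proc/mdstat no
matter which controller the disks sit behind, so a trivial health
check could look like this (just a sketch, md-specific; the BSD
equivalents such as gmirror/graid report things differently):

# Flag degraded Linux md arrays by scanning /proc/mdstat.
import re

def degraded_arrays(path="/proc/mdstat"):
    with open(path) as f:
        text = f.read()
    bad = []
    # Status lines end with something like "[4/4] [UUUU]"; an
    # underscore in the second bracket means a member is missing
    # or failed.
    for name, status in re.findall(r"^(md\d+)\s*:.*?\[([U_]+)\]",
                                   text, re.S | re.M):
        if "_" in status:
            bad.append(name)
    return bad

if __name__ == "__main__":
    print(degraded_arrays() or "all md arrays look healthy")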
> * RAID Options
> ** RAID6
> ** RAID10
> ** RAIDz
> ** RAIDz2
This was one of the questions I had asked. The RAID level is flexible
as long as I have some redundancy, but not to the point that I'm
losing a significant amount of array space to the redundancy
components. If I can handle a two- or three-disk loss then I'm fine.
I will have cold spares for the drives.
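Just to put numbers on the space trade-off (equal-size drives assumed;
the 3 TB size and the 8/12-drive counts are only examples):

# Usable space vs. parity overhead for an N-drive array.
# Double parity (RAID6/RAID-Z2) survives 2 failures, triple parity 3.
def usable_tb(drives, drive_tb, parity):
    return (drives - parity) * drive_tb

for parity, label in [(2, "double parity"), (3, "triple parity")]:
    for drives in (8, 12):
        tb = usable_tb(drives, 3, parity)
        overhead = 100.0 * parity / drives
        print("%2d x 3 TB, %s: %2d TB usable (%.0f%% to redundancy)"
              % (drives, label, tb, overhead))

Even triple parity on a 12-drive shelf gives up 25% of the raw space;
double parity on the same shelf gives up about 17%.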
> * Any data replication requirements?
Nope, just the backups once in a while.
> * Any HA data requirements?
No, no high-availability requirements. It's completely archival. At
work, live data that is in regular use is stored on personal machines
with a backup sent to the server; archived data resides on the server.
At home it's pretty much the same thing, with minor exceptions such as
movies and music being pulled directly from the server.
> At this level, I'd expect hardware to be picky, so be certain that anything you
> piece together is listed as supported between the RAID, expansion, external
> array, protocols, physical connections and motherboard.
Right, I wanted to make a system that could gloss over the hardware
specifics. For example, software RAID shouldn't care what kind of SATA
card is in place as long as the drives are accessible and still have the
same device names. I wanted to avoid hardware RAID only because it
locks me into a specific card and vendor. Other than that, everything
is flexible.
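One way to keep the naming stable no matter what card the drives hang
off of is to address them through the udev-provided /dev/disk/by-id
links instead of the raw sdX names. A quick sketch to dump that
mapping (Linux-specific):

# Map persistent /dev/disk/by-id names to current kernel device names.
import os

BY_ID = "/dev/disk/by-id"

def id_map():
    mapping = {}
    for link in sorted(os.listdir(BY_ID)):
        target = os.path.realpath(os.path.join(BY_ID, link))
        mapping.setdefault(target, []).append(link)
    return mapping

if __name__ == "__main__":
    for dev, names in sorted(id_map().items()):
        print("%s <- %s" % (dev, ", ".join(names)))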
> My initial brainstorm said "he needs ZFS", but only you can decide if that is
> possible. Last time I checked, ZFS under Linux isn't a first-class supported
> solution.
>
> Sorry, no real answer from me, just more questions. Good luck and please post
> more about your attempts and final solution. Actually, the final solution would
> be a fantastic ALE presentation.