The SSD now has to erase and write two blocks, even though one would have sufficed for the amount of data being written. To fix this, the drive's firmware would have to do data mapping on the byte level, which likely isn't going to happen (in the worst case, you would need more memory for the remapping table than the drive's capacity!). If the file system's write was aligned to a multiple of the SSD's erase block size, the result would be this:
Thus, it's generally a good idea to make sure your file system's writes are aligned to multiples of your SSD's erase block size. As I found out, this isn't quite as easy as it sounds. The first road block is already encountered when you partition a hard drive:
Partition Alignment
If the partitions of a hard drive aren't aligned to begin at multiples of 128 KiB, 256 KiB or 512 KiB (depending on the SSD used), aligning the file system is useless because everything is skewed by the start offset of the partition. Thus, the first thing you have to take care of is aligning the partitions you create.
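The skew described above can be illustrated with a little shell arithmetic. This is only a sketch: check_write is a hypothetical helper name, and the 16 KiB start offset is an example value chosen for illustration, not a measurement from any particular drive.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): the absolute position of a write
# on the drive is the partition start plus the offset inside the partition.
# All three arguments are in bytes.
check_write() {
    part_start=$1    # where the partition begins on the drive
    fs_offset=$2     # where the write lands inside the partition
    erase_block=$3   # erase block size of the SSD
    absolute=$(( part_start + fs_offset ))
    if [ $(( absolute % erase_block )) -eq 0 ]; then
        echo "aligned"
    else
        echo "misaligned"
    fi
}

KIB=1024
# The same file system write (512 KiB into the partition) on two partitions:
check_write $(( 512 * KIB )) $(( 512 * KIB )) $(( 512 * KIB ))   # partition starts on an erase block boundary
check_write $((  16 * KIB )) $(( 512 * KIB )) $(( 512 * KIB ))   # partition starts 16 KiB into the drive (example value)
```

The first call prints "aligned" and the second "misaligned": the file system did the same thing both times, only the partition start differs.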
A cylinder.
A sector.
Traditionally, hard drives were addressed by indicating the cylinder, head and sector at which data was to be read or written. These represented the radial position, the drive head (= platter and side) and the angular position of the data respectively. With LBA (logical block addressing), this is no longer the case. Instead, the entire hard drive is addressed as one continuous stream of data.
Linux's fdisk, however, still uses a virtual C-H-S system where you can define any number of heads and sectors yourself (the cylinders are calculated automatically from the drive's capacity), with partitions always starting and ending at multiples of heads x sectors, i.e. on cylinder boundaries. Thus, you need to choose a number of heads and sectors such that the SSD's erase block size is a multiple of the resulting cylinder size.
I found two posts which detail this process: "Aligning Filesystems to an SSD's Erase Block Size" and "Partition alignment for OCZ Vertex in Linux". The first one recommends 224 heads and 56 sectors, but I can't quite understand where those numbers come from, so I used the advice from the post on the OCZ forums with 32 heads and 32 sectors, which means fdisk uses a cylinder size of 1024 sectors. And because each sector holds 512 bytes, fdisk's unit size (512 bytes x heads x sectors = 512 KiB) now happens to be the largest of the erase block sizes mentioned above. Nice!
To make fdisk use 32 heads and 32 sectors, remove all partitions from a hard drive and then launch fdisk with the following command line when you create the first partition:
fdisk -S 32 -H 32 /dev/sda
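As a quick sanity check of the geometry arithmetic, here is a minimal sketch that computes the cylinder size for a given number of heads and sectors, assuming 512-byte sectors. cylinder_bytes is a hypothetical helper name, not an fdisk feature.

```shell
#!/bin/sh
# Hypothetical helper: cylinder size in bytes for a virtual C-H-S geometry,
# assuming 512-byte sectors.
cylinder_bytes() {
    heads=$1
    sectors=$2   # sectors per track
    echo $(( heads * sectors * 512 ))
}

cylinder_bytes 32 32     # 32 x 32 x 512 = 524288 bytes (512 KiB)
cylinder_bytes 224 56    # 224 x 56 x 512 = 6422528 bytes (49 x 128 KiB)
```

By this arithmetic, the 224/56 geometry produces cylinders that are a multiple of 128 KiB but not of 256 KiB or 512 KiB, while 32/32 produces cylinders of exactly 512 KiB.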
The OCZ post also recommends starting at the second 512-cylinder unit because the first partition is otherwise shifted by one track. Don't ask me why :) Here's how I partitioned my SSD in the end:
For a normal hard drive, I'd probably use 128 heads and 32 sectors per track now to achieve 4 KiB boundaries for my partitions.
Probably the larger chunk size is more useful if you are storing large files on the RAID partition, but I haven't found any advice which included benchmarks or at least a solid explanation yet.
For example, take 4 drives in RAID5 using 64K chunks and a 4K file system block size. The stride is calculated per disk as (chunk size / block size): 64K / 4K gives 16 file system blocks. The stripe width for RAID5 counts one disk less, so we have 3 data-bearing disks out of the 4 in this RAID5 group, and (number of data-bearing drives * stride), 3 * 16, gives a stripe width of 48 blocks. The Linux Kernel RAID wiki offers further insight:
Calculation
chunk size = 128 kB (set by mdadm cmd, see chunk size advice above)
block size = 4 kB (recommended for large files, and most of the time)
stride = chunk / block = 128 kB / 4 kB = 32
stripe-width = stride * ( (n disks in raid5) - 1 ) = 32 * ( (3) - 1 ) = 32 * 2 = 64
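The same calculation can be sketched as two small shell functions. The function names are made up for illustration; the values they return are what you would pass to mke2fs as stride and stripe-width, both counted in file system blocks.

```shell
#!/bin/sh
# Hypothetical helpers for the stride / stripe-width arithmetic.

# stride = chunk size / file system block size (both in KiB here)
raid_stride() {
    chunk_kib=$1
    block_kib=$2
    echo $(( chunk_kib / block_kib ))
}

# stripe width = stride x data-bearing disks; RAID5 has (n - 1) of them
raid5_stripe_width() {
    chunk_kib=$1
    block_kib=$2
    disks=$3
    stride=$(raid_stride "$chunk_kib" "$block_kib")
    echo $(( stride * (disks - 1) ))
}

# 3-disk RAID5, 128 KiB chunks, 4 KiB blocks
echo "stride=$(raid_stride 128 4),stripe-width=$(raid5_stripe_width 128 4 3)"
# 4-disk RAID5, 64 KiB chunks, 4 KiB blocks
echo "stride=$(raid_stride 64 4),stripe-width=$(raid5_stripe_width 64 4 4)"
```

The first line prints stride=32,stripe-width=64 and the second stride=16,stripe-width=48, matching the two worked examples.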
If the chunk-size is 128 kB, it means that 128 kB of consecutive data will reside on one disk. If we want to build an ext2 filesystem with 4 kB block-size, we realize that there will be 32 filesystem blocks in one array chunk. stripe-width=64 is calculated by multiplying the stride=32 value with the number of data disks in the array. A raid5 with n disks has n-1 data disks, one being reserved for parity. (Note: the mke2fs man page incorrectly states n+1; this is a known bug in the man-page docs that is now fixed.) A raid10 (1+0) with n disks is actually a raid 0 of n/2 raid1 subarrays with 2 disks each.
So these are the stride and stripe-width parameters I'd use:
Intel SSDs with an erase block size of 128 KiB (or 512 KiB; Intel isn't quite straightforward with this, see the comments section for a discussion on the subject. If anyone from Intel is reading this, help us out! ;-)) that are not part of a software RAID:
-E stride=32,stripe-width=32
OCZ Vertex SSDs with an erase block size of 512 KiB that are not part of a software RAID:
-E stride=128,stripe-width=128
Thus, I set up the file systems on the Intel SSD like this:
mkfs.ext4 -b 1024 -E stride=128,stripe-width=128 -O ^has_journal /dev/sda1
mkfs.ext4 defaulted to 1024 byte allocation units on my boot partition, so I adjusted the stride up to 128 blocks (= 128 KiB) according to the advice from the CentOS wiki. The alignment of my boot partition is probably not of any relevance because the system will read maybe 10 files from it and not modify anything, but I wanted to stay consistent :)
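For a single SSD outside of a RAID, the stride values above follow from dividing the erase block size by the file system block size. A minimal sketch, with ssd_stride as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: number of file system blocks per erase block.
# Outside a RAID, stride and stripe-width are set to the same value.
ssd_stride() {
    erase_kib=$1     # erase block size in KiB
    block_bytes=$2   # file system block size in bytes (mkfs -b)
    echo $(( erase_kib * 1024 / block_bytes ))
}

for spec in "128 4096" "512 4096" "128 1024"; do
    set -- $spec     # split "erase_kib block_bytes" into $1 and $2
    n=$(ssd_stride "$1" "$2")
    echo "erase block $1 KiB, block size $2 bytes: -E stride=$n,stripe-width=$n"
done
```

The three cases yield 32, 128 and 128 blocks respectively: a 1024-byte block size needs stride=128 to cover a 128 KiB erase block, where a 4096-byte block size needs only 32.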
Cylinder Alignment Table
The Vertex SSD has an erase block size of 512kB. To properly align your SSD for writes, use one of the following parameters when starting fdisk. If you specify the start of every partition manually you don't need to use this table. The main purpose of using the -S and -H options is automatic partition alignment, as some tools have the ability to round to cylinders. If you create your partitions with fdisk using cylinders, you'll get automatic alignment using this table. Just make sure to start your first partition on the second cylinder. Code:
Cylinder size (Times 512k)    fdisk options
1024K (x2)                    -S 16 -H 128    -S 32 -H 64
512K  (x1)                    -S 8 -H 128     -S 16 -H 64    -S 32 -H 32
256K  (x0.5)                  -S 4 -H 128     -S 8 -H 64     -S 16 -H 32    -S 32 -H 16
128K  (x0.25)                 -S 2 -H 128     -S 4 -H 64     -S 8 -H 32     -S 16 -H 16    -S 32 -H 8
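The rows of this table can be regenerated with a short loop. This is a sketch only: geometry_options is a hypothetical helper, and the bounds it uses (heads as powers of two from 128 down to 8, at most 32 sectors per track) are assumptions chosen to match the table, not limits taken from fdisk.

```shell
#!/bin/sh
# Hypothetical helper: list -S/-H pairs whose product (times 512-byte sectors)
# gives the requested cylinder size in KiB.
# Assumed bounds, picked only to mirror the table: heads 128..8 in powers of
# two, sectors per track <= 32.
geometry_options() {
    cyl_kib=$1
    total_sectors=$(( cyl_kib * 1024 / 512 ))
    heads=128
    while [ "$heads" -ge 8 ]; do
        sectors=$(( total_sectors / heads ))
        if [ "$sectors" -ge 1 ] && [ "$sectors" -le 32 ] \
           && [ $(( sectors * heads )) -eq "$total_sectors" ]; then
            echo "-S $sectors -H $heads"
        fi
        heads=$(( heads / 2 ))
    done
}

geometry_options 512
```

For 512 KiB this prints the three pairs in the 512K row; calling it with 1024, 256 or 128 gives the other rows.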
Partition alignment for OCZ Vertex in Linux
This guide will show you how to set up general alignment for use with Linux. Tweaking some of the values used will yield different results.
Background information
SSDs work fastest if partitions are properly aligned. For the Vertex drive, an alignment size of 64KB (128 sectors) has been proposed in this forum. On the other hand, since write operations to an SSD always affect a whole erase block, it makes sense to align to the erase block size, which is 512KB for OCZ Vertex. Note that a 512KB aligned drive is also 64KB aligned, because 512KB is a multiple of 64KB. In the following, I will therefore assume that you want a 512KB alignment; other possible alignment sizes are discussed later.
Since the first partition cannot start at 0, minimal loss of capacity is obtained if the first partition starts at the first 512K position instead. To achieve this, 512KB must be a multiple of the cylinder size (at least if the partition is to start at a cylinder boundary, which is probably a good idea). It is convenient to use 512KB directly for the cylinder size - then all partitions except for the first one are automatically 512KB aligned (with other sizes you might have to calculate). We obtain cylinders with 512KB size by using 32 heads and 32 sectors/track (see table below).
The first partition needs special treatment because it is automatically shifted by one track if you do not intervene. This can be done in fdisk expert mode, but I have noticed that you obtain the same effect if you simply let the first partition start at cylinder 2 instead of cylinder 1. You do not need the expert mode in this case, so I adopt this method for simplicity.
Step-by-step guide
In the following I will show the necessary steps for creating a Linux partition (type 83) with 512KB alignment using fdisk. If you have no Linux system installed yet, start from a Live CD or use the "rescue system" option that comes with some distributions to get a Linux command prompt. The following commands are prefixed by sudo because they require superuser privileges. Code:
$ sudo fdisk -H 32 -S 32 /dev/sda

The number of cylinders for this disk is set to 15711.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): o
Building a new DOS disklabel with disk identifier 0x8cb3d286.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

The number of cylinders for this disk is set to 15711.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-15711, default 1): 2
Last cylinder, +cylinders or +size{K,M,G} (2-15711, default 15711):
Using default value 15711

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 83

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
Basically, fdisk is started with correct -H and -S options. Then "o" creates a new partition table, "n" creates the partition, "t" sets the partition type, and "w" writes everything to disk.
Verify result
Let's first check if the intended geometry is stored correctly: Code:
$ sudo fdisk -l /dev/sda

Disk /dev/sda: 8237 MB, 8237408256 bytes
32 heads, 32 sectors/track, 15711 cylinders
Units = cylinders of 1024 * 512 = 524288 bytes
Disk identifier: 0x8cb3d286

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               2       15711     8043520   83  Linux
As you can see, the geometry (32 heads, 32 sectors/track) has been stored as intended. The cylinders also have the intended size (1024 * 512 Bytes = 512KB). Now let's check the alignment: Code:
$ sudo fdisk -lu /dev/sda

Disk /dev/sda: 8237 MB, 8237408256 bytes
32 heads, 32 sectors/track, 15711 cylinders, total 16088688 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x8cb3d286

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            1024    16088063     8043520   83  Linux
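The numbers in the two listings can be cross-checked against each other with a bit of arithmetic. The sketch below assumes that fdisk numbers cylinders from 1 and that sectors are 512 bytes; the helper names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers: convert the cylinder numbers shown by "fdisk -l" into
# the sector numbers shown by "fdisk -lu". Assumes cylinders are numbered
# from 1 and a geometry of <heads> x <sectors per track>.
cylinder_start_sector() {   # args: first cylinder, heads, sectors per track
    echo $(( ($1 - 1) * $2 * $3 ))
}
cylinder_end_sector() {     # args: last cylinder, heads, sectors per track
    echo $(( $1 * $2 * $3 - 1 ))
}

start=$(cylinder_start_sector 2 32 32)
end=$(cylinder_end_sector 15711 32 32)
echo "start sector: $start"
echo "end sector:   $end"
echo "1K blocks:    $(( (end - start + 1) / 2 ))"
echo "start offset modulo 512 KiB: $(( start * 512 % (512 * 1024) ))"
```

With first cylinder 2 and last cylinder 15711 this yields start sector 1024, end sector 16088063 and 8043520 blocks, the same values as in the fdisk -lu listing, and a remainder of 0 for the 512 KiB check.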
If you call fdisk with the -u option, positions are shown in sectors. Each sector has 512 bytes. You can see from the output of fdisk -lu that the first partition is indeed 512KB aligned (1024 * 512 Byte = 512KB). Moreover, only 512KB at the beginning of the drive are wasted.
Other alignment sizes
A 512KB alignment does not necessarily give the best performance. If you want to align to 64KB (128 sectors) instead (as suggested in this forum for Windows), then using fdisk -H 8 -S 16 /dev/sda will result in the 64KB cylinder size appropriate for 64KB alignment. For other alignment sizes, you can look up suitable -H and -S values for fdisk in b2bde4's Cylinder alignment table. Apart from the different options for fdisk, stick to the above description.
Special case: "Partitionless" drive with a single filesystem (octoploid's suggestion)
If you do not need several partitions on your SSD, and if you do not want to boot from it (or if you have another drive in your system to store the boot loader on), then explicit alignment can be avoided by creating a filesystem on the device as a whole. You don't need fdisk in this case. Just proceed as follows (with your preferred options to mke2fs, this is only an example): Code:
$ sudo mke2fs -t ext2 /dev/sda
mke2fs 1.41.1 (01-Sep-2008)
/dev/sda is entire device, not just one partition!
Proceed anyway? (y,n) y
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
...
This filesystem will be automatically checked every 20 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
Note that in the partitionless case, the whole drive must be mounted (i.e. /dev/sda instead of /dev/sda1). Therefore the mount command looks somewhat unusual, like this: Code:
$ sudo mount /dev/sda /mnt
With the "partitionless" method, you automatically get an aligned drive. You also have slightly more capacity compared to the fdisk method because there's no need to drop the first cylinder in this case.
Propose adding recommendation of 'mke2fs ... -E stripe-width=128' for 512kB erase page block drives (vertex) or 'mke2fs ... -E stripe-width=32' for 128kB erase page block drives. (afaik everything else)