General concepts
Goodies
The threads subcommand shows what each thread is currently doing: {{{osd threads io}}}
Working with an AFS+OSD cell
The following techniques are only useful when using OSD as a frontend for HSM:
- {{{vos listobj}}} is useful for finding all objects stored on a given OSD
- {{{fs prefetch}}} allows the user to schedule tape restore operations
- the fetch queue of a given OSD can be examined with {{{osd fetchqueue io_a}}} (abbreviated: {{{osd f io_a}}})
It is useful to have the bosserver run a script that wipes objects off the OSDs according to their fill rate and similar criteria:
- the atime can be used to reach wiping decisions
- this requires a filesystem with atime support
- the OSDDB records which OSDs are wipeable and which are not
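A wiping decision of the kind described above could look roughly like the following sketch. It is not the actual bosserver script: the struct, the function name `select_for_wipe`, the high/low watermark parameters and the rule "only wipe objects that have an archival copy" are all assumptions made for illustration. It sorts objects by atime and marks the least recently accessed ones until the OSD's usage falls below a low watermark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical per-object record; not an actual AFS+OSD data structure. */
struct obj_info {
    time_t atime;        /* last access time (needs atime support on the OSD partition) */
    long long size;      /* bytes the object occupies on the OSD */
    int has_archive;     /* nonzero if an archival copy exists */
    int wipe;            /* output: 1 if the object was selected for wiping */
};

/* qsort comparator: oldest atime first */
static int by_atime(const void *a, const void *b)
{
    const struct obj_info *x = a, *y = b;
    return (x->atime > y->atime) - (x->atime < y->atime);
}

/* Select least recently accessed objects until usage drops to low_pct of
 * capacity. Does nothing unless the OSD is wipeable and usage exceeds
 * high_pct. Only objects with an archival copy are selected (assumption).
 * Sorts objs in place and returns the number of bytes selected. */
long long select_for_wipe(struct obj_info *objs, size_t n, int osd_wipeable,
                          long long capacity, long long used,
                          int high_pct, int low_pct)
{
    long long freed = 0;

    assert(low_pct <= high_pct);
    if (!osd_wipeable || used * 100 <= capacity * high_pct)
        return 0;

    qsort(objs, n, sizeof objs[0], by_atime);
    for (size_t i = 0; i < n; i++)
        objs[i].wipe = 0;

    for (size_t i = 0; i < n && (used - freed) * 100 > capacity * low_pct; i++) {
        if (!objs[i].has_archive)
            continue;               /* never wipe the only copy of the data */
        objs[i].wipe = 1;
        freed += objs[i].size;
    }
    return freed;
}
```

The watermark pair avoids wiping a few objects every time the OSD creeps over a single threshold; a real script would also have to ask the OSDDB whether the OSD is wipeable at all.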
Other HSM related features:
- MD5 checksums can be retrieved for archival copies of objects (unverified): {{{osd md5 io 536870930.2.1413.0 4}}}
A fileserver and an OSD server can share one machine:
touch the file /vicepx/OnlyRXOSD to prevent the fileserver from using this partition, so that it is used by the OSD server only
The new fs ls subcommand distinguishes between the different kinds of directory entries:
- {{{
[iokaste] /afs/desytest/osdtest_2 # fs ls
m rwx root 2048 2008-05-08 13:21:01 .
m rwx root 2048 2008-05-08 16:31:48 ..
f rw- bin 44537893 2008-05-08 11:36:40 ascii-file
f rw- bin 1048576 2008-05-08 13:21:14 ascii-file.osd
d rwx bin 2048 2008-05-08 09:06:52 dir
f rw- bin 0 2008-05-08 13:08:46 empty-file
o rw- bin 44537893 2008-05-08 13:19:58 new-after-move
}}}
where m denotes a mount point, f a file, d a directory and o an object. fs ls also identifies files whose objects have been wiped from on-line object storage (i.e., files that have archival copies only).
How to migrate data from an OSD
- set the write priority to 0 to stop fileservers from storing new data on the OSD in question: {{{osd setosd -wrprior 0}}}
- use {{{vos listobj}}} to identify the files (by fid) that have data on the OSD
- use {{{fs replaceosd}}} to move each file's data to another OSD
Priorities and choice of storing OSD
- OSDs are used in a round-robin fashion
- priorities give each OSD a weight in the round robin
- static priorities can be set by the administrator: {{{osd setosd -wrprior ... -rdprior ...}}}
- the effective priorities are dynamic:
- ownerships apply a modifier of +/- 10 to the static priority
- locations apply a modifier of +/- 20
- the fill percentage comes into play once it exceeds 50% (and again at 95%)
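The rules above can be illustrated with a small sketch. It is not the logic in osddbuser.c: the sign convention (+ for a matching owner/location, - for a mismatch), the linear weight reduction between 50% and 95% fill, the exclusion above 95%, and all names are assumptions. For the weighted round robin it uses the well-known "smooth weighted round robin" scheme as a stand-in, which hands out each OSD in proportion to its effective priority.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical OSD description; field names are illustrative only. */
struct osd {
    int id;
    int static_prior;        /* as set with osd setosd -wrprior */
    char owner[4];           /* up to 3 characters, "" if none */
    char location[4];        /* up to 3 characters, "" if none */
    int fill_pct;            /* current fill percentage */
    int current;             /* running counter for weighted round robin */
};

/* Effective (dynamic) priority of an OSD for a given fileserver.
 * ASSUMPTIONS: +10/-10 for matching/non-matching owner, +20/-20 for
 * location, linear reduction between 50% and 95% fill, excluded above 95%. */
int effective_prior(const struct osd *o, const char *srv_owner, const char *srv_loc)
{
    int p = o->static_prior;

    if (p <= 0 || o->fill_pct > 95)
        return 0;                        /* not eligible for new data */
    if (o->owner[0] && srv_owner[0])
        p += strcmp(o->owner, srv_owner) == 0 ? 10 : -10;
    if (o->location[0] && srv_loc[0])
        p += strcmp(o->location, srv_loc) == 0 ? 20 : -20;
    if (o->fill_pct > 50)
        p = p * (95 - o->fill_pct) / 45; /* shrink the weight as the OSD fills up */
    return p > 0 ? p : 0;
}

/* Smooth weighted round robin: each call returns the index of the next OSD,
 * so that over time every OSD is chosen in proportion to its effective
 * priority. Returns -1 if no OSD is eligible. */
int choose_osd(struct osd *osds, int n, const char *srv_owner, const char *srv_loc)
{
    int best = -1, total = 0;

    for (int i = 0; i < n; i++) {
        int w = effective_prior(&osds[i], srv_owner, srv_loc);
        if (w == 0)
            continue;
        osds[i].current += w;
        total += w;
        if (best < 0 || osds[i].current > osds[best].current)
            best = i;
    }
    if (best >= 0)
        osds[best].current -= total;
    return best;
}
```

With two OSDs of static priority 30, one matching the fileserver's owner and location (effective 60) and one without either (effective 30), successive calls return the first, second and first OSD again, i.e. a 2:1 ratio. A write priority of 0 removes an OSD from the rotation, which matches the migration recipe above.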
Customizing owner and location:
The osd addserver subcommand makes an AFS fileserver known to the OSDDB: {{{osd adds 141.34.22.101 iokaste dtc ifh}}}
This is required in order to make use of ownerships and locations.
- this data can be examined using osd servers:
{{{
[iokaste] /afs/desytest/osdtest_2 # osd servers
Server 'iokaste' with id=141.34.22.101:
    owner = 1685349120 = 'dtc'
    location = 1768318976 = 'ifh'
}}}
The help text for osd addserver is misleading:
{{{
# osd help adds
osd addserver: create server entry in osddb
Usage: osd addserver -id <ip address> -name <osd name> [-owner <group name (max 3 char)>] [-location <max 3 characters>] [-cell <cell name>] [-help]
}}}
The name to be specified is not actually an "osd name" but an alias for the fileserver being added.
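The numbers printed by osd servers are consistent with the (at most 3-character) owner and location names being packed into a 32-bit integer, first character in the most significant byte and a zero low byte: 'd','t','c' = 0x64, 0x74, 0x63 gives 0x64746300 = 1685349120. The sketch below reproduces that encoding; it is inferred from the output above, not taken from the OSDDB code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack up to 3 characters into a 32-bit value, first character in the most
 * significant byte, low byte left zero (inferred from the osd servers output). */
uint32_t pack_tag(const char *s)
{
    uint32_t v = 0;
    size_t len = strlen(s);

    assert(len <= 3);
    for (size_t i = 0; i < len; i++)
        v |= (uint32_t)(unsigned char)s[i] << (24 - 8 * i);
    return v;
}

/* Inverse of pack_tag(); out must hold at least 4 bytes. */
void unpack_tag(uint32_t v, char out[4])
{
    for (int i = 0; i < 3; i++)
        out[i] = (char)((v >> (24 - 8 * i)) & 0xff);
    out[3] = '\0';
}
```

This also explains the 3-character limit mentioned in the help text: three characters plus a terminating zero byte fill exactly 32 bits.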
Data held in volumes, DBs etc.
- the metadata of all files in a volume is stored in a dedicated volume special file
The metadata references OSDs by ID.
An OSD ID is an index into the OSDDB.
- these IDs are permanent and are never reused after an OSDDB entry has been deleted
- view the OSDDB using {{{osd l}}}
A file's metadata is found using the metadataindex stored in the file's vnode.
View the metadata using {{{fs osd <file>}}}
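The reason IDs must never be reused: volume metadata may still name an OSD long after its OSDDB entry is gone, and a recycled ID would silently redirect that metadata to an unrelated OSD. The following is a minimal model of such an allocator, assuming a simple monotonically increasing counter; the struct, the fixed table size and the choice of starting at 1 are hypothetical and do not describe the real OSDDB implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory model of OSDDB id allocation. IDs come from a
 * counter that never decreases and are never recycled, so stale metadata
 * that still names a deleted OSD can never point at a new one. */
struct osddb {
    uint32_t next_id;          /* next id to hand out; never decreases */
    unsigned char in_use[64];  /* illustrative fixed-size table */
};

uint32_t osddb_add(struct osddb *db)
{
    assert(db->next_id < sizeof db->in_use);
    uint32_t id = db->next_id++;
    db->in_use[id] = 1;
    return id;
}

void osddb_delete(struct osddb *db, uint32_t id)
{
    assert(id < db->next_id);
    db->in_use[id] = 0;        /* the id is retired, not put on a free list */
}

int osddb_valid(const struct osddb *db, uint32_t id)
{
    return id < db->next_id && db->in_use[id];
}
```

After adding OSDs 1 and 2 and deleting 1, the next OSD receives ID 3, and any metadata still referring to ID 1 is detectably invalid rather than pointing at the newcomer.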
How to upgrade a cell to AFS+OSD
- set up OSDDB on the database servers
- set up pristine AFS+OSD fileservers + OSDs
- move volumes to the AFS+OSD fileservers
The volserver is supposed to provide a -convertvolumes switch for that purpose.
Otherwise, set the OSD flag by hand: {{{vos setfields <volume> -osd 1}}}
Policies
- policies will decide four basic things:
- whether object storage is to be used for a given file
- whether the objects are to be mirrored and how often
- how many stripes the file will comprise
- the size of the stripes
- two pieces of information will be used to make decisions:
- file size
- file name (i.e. prefix and suffix)
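Since the policy format is still an open question, the following is only a sketch of how the four decisions could be derived from file size and name. Everything in it is hypothetical: the rule layout, the "first matching rule wins" semantics and the example values. It shows that size plus prefix/suffix are enough to select a rule that fixes OSD use, mirror count, stripe count and stripe size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy rule; the real policy format is still undecided. */
struct policy_rule {
    long long min_size;          /* rule applies to files at least this large */
    const char *prefix;          /* NULL = any name */
    const char *suffix;          /* NULL = any name */
    int use_osd;                 /* store the file in object storage? */
    int copies;                  /* number of mirrored copies */
    int stripes;                 /* number of stripes */
    long long stripe_size;       /* bytes per stripe */
};

static int has_prefix(const char *s, const char *p)
{
    return strncmp(s, p, strlen(p)) == 0;
}

static int has_suffix(const char *s, const char *x)
{
    size_t ls = strlen(s), lx = strlen(x);
    return ls >= lx && strcmp(s + ls - lx, x) == 0;
}

/* Return the first rule matching name and size, or NULL if none does. */
const struct policy_rule *match_policy(const struct policy_rule *rules, size_t n,
                                       const char *name, long long size)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct policy_rule *r = &rules[i];
        if (size < r->min_size)
            continue;
        if (r->prefix && !has_prefix(name, r->prefix))
            continue;
        if (r->suffix && !has_suffix(name, r->suffix))
            continue;
        return r;
    }
    return NULL;
}
```

For example, a rule list could keep every "*.log" file on the fileserver regardless of size, stripe large "raw_*" files over 4 OSDs with 2 copies, and send anything else above 64 KB to a single OSD. A file matching no rule would simply stay on the fileserver. Inheritance (see the open questions below) would mean consulting a parent rule list when this one returns NULL.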
Open questions
- will inheritance be supported for policy definitions?
how many levels of inheritance are permitted (1 or n)?
- in what way can policies be represented/stored?
Technical aspects
Performance
- client connections to OSDs are cheap because they are unauthenticated
- clients connect to every individual OSD they need to access
The osd psread subcommand behaves as though it were reading stripes from multiple OSDs:
- i.e. it opens several connections, although in this case all of them go to the same target
- each stripe is read from a shifted offset: normally the first n stripes each start at offset 0 on their respective OSDs; here, stripe i starts at offset i*stripesize
- the same approach would be possible for access to AFS fileservers, but setting up connections to them is more costly
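The offset arithmetic behind striping can be sketched as follows, assuming a plain RAID-0 style layout in which consecutive stripe-size units of the file go round-robin to the stripe objects. The struct, the function name and the `shift_per_stripe` parameter are hypothetical: passing 0 gives the normal case (every stripe object starts at offset 0), while passing the stripe size mimics the psread behaviour described above, where stripe i starts at i*stripesize.

```c
#include <assert.h>
#include <stdint.h>

/* Where a byte of a striped file lives (RAID-0 style layout, assumed). */
struct stripe_loc {
    unsigned stripe;        /* which stripe object (0 .. nstripes-1) */
    uint64_t obj_offset;    /* offset inside that stripe's object */
};

/* Map a file offset to its stripe and the offset within that stripe's object.
 * shift_per_stripe is 0 for a normal layout; a psread-like test would pass
 * stripe_size so that stripe i starts at i * stripe_size. */
struct stripe_loc locate(uint64_t file_offset, unsigned nstripes,
                         uint64_t stripe_size, uint64_t shift_per_stripe)
{
    struct stripe_loc loc;
    uint64_t unit;

    assert(nstripes > 0 && stripe_size > 0);
    unit = file_offset / stripe_size;                  /* global stripe-unit number */
    loc.stripe = (unsigned)(unit % nstripes);
    loc.obj_offset = (unit / nstripes) * stripe_size   /* complete rows before this one */
                   + file_offset % stripe_size         /* position inside the unit */
                   + loc.stripe * shift_per_stripe;    /* psread-style start offset */
    return loc;
}
```

With 4 stripes of 1024 bytes, file offset 5000 lies in unit 4, i.e. in stripe 0 at object offset 1024 + 904 = 1928, because one full row of four units precedes it.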
Notes on the code (changes)
- hashing is used for management of DB contents (not a change)
The code that chooses OSDs resides in osddbuser.c.
Link tables
- original link table entries consisted of 5 columns of 3 bits each
- thus, 5 different versions of a file were supported, each with a max. link count of 7
AFS+OSD (like MR-AFS) supports more places for one and the same file to live:
* 2 RW volumes (the 2nd during a move operation)
* 1 BK volume
* 13 RO volumes
* 1 clone during move
which amounts to up to 17 places for a file. Also, there might be as many as 6 versions of a file:
* 1 RW volume
* 1 BK volume
* 1 clone during move
* 1 RO
* 1 RO-old during vos release
* 1 old RO that was not reachable during the last vos release
- as at least 6 columns are needed with at least 5 bits per column, a link table row now consists of 32 bits instead of 16
The explanations are from vol/namei_ops.c. The new format is used as Linktable version 2, with the original format still being supported as version 1.
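The bit budget can be made concrete with a sketch of reading and updating link counts in both row formats. It assumes that column i occupies bits i*BITS to (i+1)*BITS-1 of a row; the actual bit layout and function names in vol/namei_ops.c may differ, so all identifiers here are hypothetical. Five bits allow counts up to 31, which covers the 17 possible places, and 6 columns of 5 bits need 30 of the 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Link table row layouts described above. Column i is assumed to occupy
 * bits [i*BITS, (i+1)*BITS) of the row; the real layout in vol/namei_ops.c
 * may differ. */
#define V1_COLUMNS 5
#define V1_BITS    3   /* max link count 7 */
#define V2_COLUMNS 6
#define V2_BITS    5   /* max link count 31, enough for 17 places */

_Static_assert(V1_COLUMNS * V1_BITS <= 16, "version 1 row must fit in 16 bits");
_Static_assert(V2_COLUMNS * V2_BITS <= 32, "version 2 row must fit in 32 bits");

static unsigned get_count(uint32_t row, int col, int bits)
{
    return (row >> (col * bits)) & ((1u << bits) - 1);
}

static uint32_t set_count(uint32_t row, int col, int bits, unsigned count)
{
    uint32_t mask = ((1u << bits) - 1) << (col * bits);

    assert(count < (1u << bits));   /* count must fit into one column */
    return (row & ~mask) | ((uint32_t)count << (col * bits));
}

/* Version 2: a 32-bit row with 6 columns of 5 bits (30 bits used). */
unsigned v2_get(uint32_t row, int col)
{
    assert(col >= 0 && col < V2_COLUMNS);
    return get_count(row, col, V2_BITS);
}

uint32_t v2_set(uint32_t row, int col, unsigned count)
{
    assert(col >= 0 && col < V2_COLUMNS);
    return set_count(row, col, V2_BITS, count);
}

/* Version 1: a 16-bit row with 5 columns of 3 bits (15 bits used). */
unsigned v1_get(uint16_t row, int col)
{
    assert(col >= 0 && col < V1_COLUMNS);
    return get_count(row, col, V1_BITS);
}
```

Keeping the version 1 accessor alongside the new one reflects the statement above that the original format is still supported; which accessor applies would be decided by the link table version.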
Debugging techniques
- add trace* calls to the code
- CM_TRACE_WASHERE is especially handy
use