Differences between revisions 7 and 8
Revision 7 as of 2008-05-26 15:29:02
Size: 9649
Editor: FelixFrank
Comment:
Revision 8 as of 2008-05-26 16:26:45
Size: 9921
Editor: FelixFrank
Comment: osd setvar

Other HSM related features:


General concepts

Goodies

  • the threads subcommand lets you know what each thread is currently doing: {{{osd threads io}}}

Working with an AFS+OSD cell

The following techniques are only useful when using OSD as a frontend for HSM:

  • {{{vos listobj}}} is useful for finding all objects stored on a given OSD

  • {{{fs prefetch}}} allows the user to schedule tape restore operations

  • the fetch queue for a given OSD can be examined (the second line is the abbreviated form): {{{osd fetchqueue io_a
osd f io_a}}}

  • using osd setvariable, one can specify the maximum allowed number of parallel fetches from an OSD server: {{{ # osd setvar io maxParallelFetches 4 }}}
    • this is the only variable (!) currently available in OSD servers
    • this is not to be confused with osd setosd, which sets fields in the OSDDB

  • it is useful to have bosserver run a script to wipe objects off the OSDs according to OSD fill rate etc.

  • the atime can be used to reach wiping decisions
    • this requires a filesystem with atime support
  • the OSDDB can tell wipeable and non-wipeable OSDs apart
  • on wipeable OSDs, set the high water mark (in per mille used) using osd setosd: {{{ # osd seto 12 -highwatermark 900 }}} where 12 is the OSD's ID from the OSDDB.

  • examine this setting using osd osd: {{{ # osd osd 12
Osd 'io_e' with id=12:
    type = 0
    minSize = 1024 KB
    maxSize = 67108864 KB
    totalSize = 284819 MB
    pmUsed = 1 per mille used
    totalFiles = 0 M Files
    pmFilesUsed = 0 per mille used
    ip = 141.34.22.100
    server = 0
    lun = 4
    alprior = 50
    rdprior = 0
    migage = 0 seconds
    flags = 0
    unavail = 0
    owner = 0 =
    location = 0 =
    timeStamp = 1211807708 = May 26 15:15
    highWaterMark = 900 per mille used
    lowWaterMark = 0 per mille used (obsolete)
    chosen = 0 (should be zero) }}}

  • md5 checksums can be retrieved for archival copies of objects (unverified): {{{ osd md5 io 536870930.2.1413.0 4 }}}
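The wiping script mentioned above is not shown in these notes. As a minimal sketch of the decision it has to make, assuming the script already knows each object's size and atime and the OSD's usage, the following Python picks the least recently accessed objects until usage drops to the high water mark. The names `StoredObject` and `select_for_wiping` are hypothetical, and a real script would additionally have to make sure an archival copy exists before wiping anything.

```python
from dataclasses import dataclass


@dataclass
class StoredObject:
    ident: str    # object identifier (hypothetical field)
    size: int     # bytes occupied on the OSD
    atime: float  # last access time, seconds since the epoch


def select_for_wiping(objects, used, total, high_water_per_mille):
    """Return the objects to wipe, oldest atime first, until the fill
    level is at or below the high water mark (given in per mille)."""
    limit = total * high_water_per_mille // 1000
    victims = []
    for obj in sorted(objects, key=lambda o: o.atime):
        if used <= limit:
            break
        victims.append(obj)
        used -= obj.size
    return victims


objs = [
    StoredObject("a", 100, atime=3.0),
    StoredObject("b", 100, atime=1.0),
    StoredObject("c", 50, atime=2.0),
]
# 950 of 1000 bytes used, high water mark 900 per mille: wiping "b" suffices.
print([o.ident for o in select_for_wiping(objs, 950, 1000, 900)])  # ['b']
```

Sorting by atime is only meaningful if the OSD's underlying filesystem actually records access times, as noted above.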

This is how a fileserver and OSD server can share one machine:

  • touch the file /vicepx/OnlyRXOSD to keep the fileserver from using this partition

The new fs ls subcommand can tell files apart:

  • {{{ [iokaste] /afs/desytest/osdtest_2 # fs ls
m rwx root 2048 2008-05-08 13:21:01 .
m rwx root 2048 2008-05-08 16:31:48 ..
f rw- bin 44537893 2008-05-08 11:36:40 ascii-file
f rw- bin 1048576 2008-05-08 13:21:14 ascii-file.osd
d rwx bin 2048 2008-05-08 09:06:52 dir
f rw- bin 0 2008-05-08 13:08:46 empty-file
o rw- bin 44537893 2008-05-08 13:19:58 new-after-move }}}
  where m is a mountpoint, f a file, d a directory and o an object. fs ls will also identify files whose objects have been wiped from on-line object storage (i.e., files with archival copies only).

How to migrate data from an OSD

  1. set a low write priority to stop fileservers from storing data on the OSD in question: {{{osd setosd -wrprior 0}}}
  2. use {{{vos listobj}}} to identify the files (by fid) that have data on the OSD
  3. use {{{fs replaceosd}}} to move each file's data to another OSD

Priorities and choice of storing OSD

  • OSDs are used in a round-robin fashion
  • priorities add weight to each OSD
  • static priorities can be set by the administrator: {{{osd setosd -wrprior ... -rdprior ...}}}
  • the effective priorities are dynamic:
    • ownerships apply a modifier of +/- 10 to the static priority
    • locations apply a modifier of +/- 20
    • the fill percentage plays a role if above 50% or 95% respectively
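To make the interplay of static priority, ownership, location and fill level concrete, here is a minimal Python sketch. It is not taken from osddbuser.c, where the real selection logic lives. Several details are assumptions: that a match adds the modifier and a mismatch subtracts it, that a static priority of 0 excludes an OSD, the size of the fill penalties (the notes only say that fill level matters above 50% and 95%), and the use of a smooth weighted round robin as a stand-in for the actual algorithm.

```python
from dataclasses import dataclass

# ASSUMPTION: placeholder penalties; the notes do not say how fill level is weighed.
FILL_PENALTY_ABOVE_50 = 10
FILL_PENALTY_ABOVE_95 = 40


@dataclass
class Osd:
    name: str
    wrprior: int               # static write priority (osd setosd -wrprior)
    owner: str = ""
    location: str = ""
    fill_percent: float = 0.0


def effective_priority(osd, client_owner, client_location):
    """Static priority adjusted by owner, location and fill level."""
    if osd.wrprior <= 0:       # ASSUMPTION: priority 0 takes the OSD out of rotation
        return 0
    prio = osd.wrprior
    # ASSUMPTION: match adds, mismatch subtracts, unset fields are ignored.
    if osd.owner:
        prio += 10 if osd.owner == client_owner else -10
    if osd.location:
        prio += 20 if osd.location == client_location else -20
    if osd.fill_percent > 95:
        prio -= FILL_PENALTY_ABOVE_95
    elif osd.fill_percent > 50:
        prio -= FILL_PENALTY_ABOVE_50
    return max(prio, 0)


def weighted_round_robin(weights, rounds):
    """Smooth weighted round robin over a {name: weight} mapping."""
    total = sum(weights.values())
    if total == 0:
        return []
    credit = {name: 0 for name in weights}
    order = []
    for _ in range(rounds):
        for name, weight in weights.items():
            credit[name] += weight
        chosen = max(credit, key=credit.get)
        credit[chosen] -= total
        order.append(chosen)
    return order


osds = [Osd("io_a", 50, owner="dtc", location="ifh"),
        Osd("io_b", 50, fill_percent=60.0)]
weights = {o.name: effective_priority(o, "dtc", "ifh") for o in osds}
print(weights)                           # {'io_a': 80, 'io_b': 40}
print(weighted_round_robin(weights, 3))  # ['io_a', 'io_b', 'io_a']
```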

Customizing owner, location:

  • the osd addserver subcommand {{{osd adds 141.34.22.101 iokaste dtc ifh}}} makes an AFS fileserver (!) known to the OSDDB

  • this is required in order to make use of ownerships and locations

  • this data can be examined using {{{[iokaste] /afs/desytest/osdtest_2 # osd servers
Server 'iokaste' with id=141.34.22.101:
    owner = 1685349120 = 'dtc'
    location = 1768318976 = 'ifh'}}}
  • /!\ The help on osd addserver is misleading: {{{ # osd help adds
osd addserver: create server entry in osddb
Usage: osd addserver -id <ip address> -name <osd name> [-owner <group name (max 3 char)>] [-location <max 3 characters>] [-cell <cell name>] [-help] }}}
  The name to be specified is not actually an "osd name" but an alias for the file server you are adding.
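The numeric owner and location values printed by osd servers appear to be the three-character names packed into a 32-bit integer, most significant byte first, with a trailing zero byte. This is inferred from the two values in the listing above, not from the source code, and the helper names below are hypothetical. A short Python sketch of the conversion:

```python
def pack_tag(name: str) -> int:
    """Pack an ASCII tag of at most 3 characters into a 32-bit integer
    (big-endian, padded with zero bytes)."""
    raw = name.encode("ascii")
    if len(raw) > 3:
        raise ValueError("tag must be at most 3 characters")
    return int.from_bytes(raw.ljust(4, b"\0"), "big")


def unpack_tag(value: int) -> str:
    """Inverse of pack_tag: recover the tag from its integer form."""
    return value.to_bytes(4, "big").rstrip(b"\0").decode("ascii")


print(pack_tag("dtc"))         # 1685349120
print(pack_tag("ifh"))         # 1768318976
print(unpack_tag(1768318976))  # ifh
```

This would also explain the limit of three characters for owner and location in the addserver usage text.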

Data held in volumes, DBs etc.

  • all metadata belonging to files in a volume are stored in a designated volume special file
  • metadata references OSDs by id

  • OSD id is an index into the OSDDB

    • these IDs are permanent and can never be reused after deletion of an OSDDB entry
    • view the OSDDB using {{{osd l}}}

  • a file's metadata is found using the metadataindex stored in the file's vnode

    • view the metadata using {{{fs osd <file>}}}

How to upgrade a cell to AFS+OSD

  1. set up OSDDB on the database servers
  2. set up pristine AFS+OSD fileservers + OSDs
  3. move volumes to the AFS+OSD fileservers
    • volserver is supposed to be armed with a -convertvolumes switch for that purpose

    • otherwise, set the osdflag by hand: {{{vos setfields <volume> -osd 1}}}

Policies

  • policies are going to decide 4 basic things:
    1. whether object storage is to be used for a given file
    2. whether the objects are to be mirrored and how often
    3. how many stripes the file will comprise
    4. the size of the stripes
  • two pieces of information will be used to make decisions:
    1. file size
    2. file name (i.e. prefix and suffix)
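How policies will be represented is still an open question, so the following Python is purely a hypothetical illustration of the four decisions and two inputs listed above. None of the names (`Placement`, `Rule`, `decide`), the first-match-wins evaluation order, or the thresholds in the example come from AFS+OSD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    use_osd: bool         # 1. use object storage at all?
    copies: int = 1       # 2. number of mirrored copies
    stripes: int = 1      # 3. number of stripes
    stripe_size: int = 0  # 4. stripe size in bytes (0 = not striped)


@dataclass(frozen=True)
class Rule:
    placement: Placement
    min_size: int = 0     # matches files of at least this many bytes
    prefix: str = ""      # matches file names starting with this
    suffix: str = ""      # matches file names ending with this

    def matches(self, name: str, size: int) -> bool:
        return (size >= self.min_size
                and name.startswith(self.prefix)
                and name.endswith(self.suffix))


def decide(rules, name, size, default=Placement(use_osd=False)):
    """Return the placement of the first matching rule (hypothetical order)."""
    for rule in rules:
        if rule.matches(name, size):
            return rule.placement
    return default


MB = 1 << 20
rules = [
    Rule(Placement(use_osd=False), suffix=".log"),
    Rule(Placement(True, copies=2, stripes=4, stripe_size=MB), min_size=64 * MB),
    Rule(Placement(True), min_size=MB),
]
print(decide(rules, "run42.root", 100 * MB).stripes)  # 4
print(decide(rules, "run42.log", 100 * MB).use_osd)   # False
```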

Open questions

  • will inheritance be supported for policy definitions?
    • how many levels of inheritance are permitted (1 or n)?

  • in what way can policies be represented/stored?

Technical aspects

Performance

  • client connections to OSDs are cheap because they are unauthorized
  • clients connect directly to every OSD they need data from
  • osd psread acts as though reading stripes from multiple OSDs

    • i.e. it opens several connections, although in this case they all target the same OSD
    • each stripe has an altered offset (e.g., normally first n stripes start at offset 0 on each OSD, here it is 0+(i*stripesize) etc.)
    • this is not impossible for access to AFS fileservers, but making connections is more costly
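The offset arithmetic described for psread can be sketched as follows: the object is cut into stripe-sized chunks, and chunk i is handed to connection i modulo the number of connections, so connection i starts at offset i*stripesize. This Python is an illustration of that arithmetic only; the function name and its parameters are hypothetical and do not reflect the actual command implementation.

```python
def psread_ranges(size, stripe_size, connections):
    """Split a single object of `size` bytes into (offset, length) ranges,
    dealt out round-robin to `connections` parallel readers."""
    if stripe_size <= 0 or connections <= 0:
        raise ValueError("stripe_size and connections must be positive")
    plan = [[] for _ in range(connections)]
    offset = 0
    index = 0
    while offset < size:
        length = min(stripe_size, size - offset)
        plan[index % connections].append((offset, length))
        offset += length
        index += 1
    return plan


# 10 bytes, stripes of 4 bytes, 2 connections to the same OSD
print(psread_ranges(10, 4, 2))  # [[(0, 4), (8, 2)], [(4, 4)]]
```

With real striping the same chunks would sit on different OSDs, each starting at offset 0 of its own object; here every range addresses the one object directly.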

Backwards compatibility

  • the file server detects clients that are unable to recognize object storage
  • for such clients, the file server reads the OSD contents itself and forwards them to the client
    • this uses doubled bandwidth
    • low performance

Notes on the code (changes)

  • hashing is used for management of DB contents (not a change)
  • choice of OSDs resides in osddbuser.c

  • the DataXchange() function plays a major role

  • original link table entries consisted of 5 columns of 3 bits each
    • thus, 5 different versions of a file were supported, each with a max. link count of 7
  • AFS+OSD (like MR-AFS) supports more places for one and the same file to live:

    • 2 RW volumes (the 2nd during a move operation)
    • 1 BK volume
    • 13 RO volumes
    • 1 clone during move
    which amounts to up to 17 places for a file
  • also, there might be as many as 6 versions of a file:

    • 1 RW volume
    • 1 BK volume
    • 1 clone during move
    • 1 RO
    • 1 RO-old during vos release
    • 1 old RO which was not reachable during the last vos release
  • as at least 6 columns are needed with at least 5 bits per column, a link table row now consists of 32 bits instead of 16

The explanations are from vol/namei_ops.c. The new format is used as Linktable version 2, with the original format still being supported as version 1. /!\ Does the code need to support legacy link table format? Volumes are incompatible anyway.
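The column arithmetic behind the two link table formats can be illustrated with a little bit packing. The following Python is a sketch and not the code from vol/namei_ops.c; in particular, placing column 0 in the least significant bits is an assumption, and the function names are hypothetical.

```python
V1 = {"columns": 5, "bits": 3, "row_bits": 16}  # max. link count 7
V2 = {"columns": 6, "bits": 5, "row_bits": 32}  # max. link count 31, enough for 17


def get_count(row: int, column: int, bits: int) -> int:
    """Extract the link count stored in `column` of a link table row."""
    mask = (1 << bits) - 1
    return (row >> (column * bits)) & mask


def set_count(row: int, column: int, bits: int, count: int) -> int:
    """Return `row` with the link count of `column` replaced by `count`."""
    mask = (1 << bits) - 1
    if not 0 <= count <= mask:
        raise ValueError("count does not fit into the column")
    shift = column * bits
    return (row & ~(mask << shift)) | (count << shift)


for fmt in (V1, V2):
    # all columns together must fit into one row
    assert fmt["columns"] * fmt["bits"] <= fmt["row_bits"]

row = set_count(0, column=5, bits=V2["bits"], count=17)
print(get_count(row, column=5, bits=V2["bits"]))  # 17
print(row < 2 ** V2["row_bits"])                  # True
```

Three bits per column cannot hold a count of 17, which is why the wider columns and hence the 32-bit row are needed.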

Debugging techniques

  • add trace* calls to the code
    • CM_TRACE_WASHERE is especially handy
  • use

AfsOsd/Notes (last edited 2009-06-04 10:49:45 by FelixFrank)