Revision 9 as of 2008-05-26 17:18:44


General concepts

Goodies

}}}

Working with an AFS+OSD cell

The following techniques are only useful when using OSD as a frontend for HSM:

}}} is useful for finding all objects stored on a given OSD

}}} allows the user to schedule tape restore operations

osd f io_a}}}

}}}

}}} where 12 is the OSD's ID from the OSDDB.

Osd 'io_e' with id=12:

}}}

This is how a fileserver and OSD server can share one machine:

The new fs ls subcommand can tell files apart:

{{{
m rwx root 2048 2008-05-08 13:21:01 .
m rwx root 2048 2008-05-08 16:31:48 ..
f rw- bin 44537893 2008-05-08 11:36:40 ascii-file
f rw- bin 1048576 2008-05-08 13:21:14 ascii-file.osd
d rwx bin 2048 2008-05-08 09:06:52 dir
f rw- bin 0 2008-05-08 13:08:46 empty-file
o rw- bin 44537893 2008-05-08 13:19:58 new-after-move
}}}
where m is a mountpoint, f a file, d a directory and o an object. fs ls will also identify files with their objects wiped from on-line object storage (i.e., with archival copies only).
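The listing format above lends itself to simple scripting. The following is a minimal Python sketch, assuming one entry per line with exactly the seven whitespace-separated fields of the sample (type letter, mode, owner, size, date, time, name). The names `Entry` and `parse_fs_ls_line` are hypothetical helpers, not part of AFS+OSD. Any type letter other than m, f, d and o, such as whatever marker is used for wiped files, is reported as "unknown".

```python
from dataclasses import dataclass

# Type letters as described in the text; anything else is treated as unknown.
ENTRY_TYPES = {"m": "mountpoint", "f": "file", "d": "directory", "o": "object"}


@dataclass
class Entry:
    kind: str
    mode: str
    owner: str
    size: int
    mtime: str
    name: str


def parse_fs_ls_line(line: str) -> Entry:
    """Parse one line of the sample listing (assumed seven-field layout)."""
    # Split into at most 7 fields so that names containing spaces stay intact.
    letter, mode, owner, size, date, time, name = line.strip().split(None, 6)
    kind = ENTRY_TYPES.get(letter, "unknown")
    return Entry(kind, mode, owner, int(size), f"{date} {time}", name)


SAMPLE = """\
m rwx root 2048 2008-05-08 13:21:01 .
m rwx root 2048 2008-05-08 16:31:48 ..
f rw- bin 44537893 2008-05-08 11:36:40 ascii-file
f rw- bin 1048576 2008-05-08 13:21:14 ascii-file.osd
d rwx bin 2048 2008-05-08 09:06:52 dir
f rw- bin 0 2008-05-08 13:08:46 empty-file
o rw- bin 44537893 2008-05-08 13:19:58 new-after-move
"""

entries = [parse_fs_ls_line(line) for line in SAMPLE.splitlines() if line.strip()]
# Names of all entries whose data lives in object storage.
objects = [entry.name for entry in entries if entry.kind == "object"]
print(objects)
```

Filtering on `kind` in this way is enough to list the entries whose data is held in object storage.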

How to migrate data from an OSD

  1. set a low write priority to stop fileservers from storing data on the OSD in question: {{{osd setosd -wrprior 0}}}
  1. use {{{vos listobj}}} to identify the files (by fid) that have data on the OSD
  1. use {{{fs replaceosd}}} to move each file's data to another OSD

Priorities and choice of storing OSD

}}}

Customizing owner, location:

}}} makes an AFS fileserver (!) known to the OSDDB

Server 'iokaste' with id=141.34.22.101:

{{{
osd addserver: create server entry in osddb
Usage: osd addserver -id <ip address> -name <osd name> [-owner <group name (max 3 char)>] [-location <max 3 characters>] [-cell <cell name>] [-help]
}}}
Note that the name to be specified is not actually an "osd name" but an alias name for the file server you're adding.

Data held in volumes, DBs etc.

}}}

}}}

How to upgrade a cell to AFS+OSD

  1. set up OSDDB on the database servers
  2. set up pristine AFS+OSD fileservers + OSDs
  3. move volumes to the AFS+OSD fileservers
    • volserver is supposed to be armed with a -convertvolumes switch for that purpose
    • otherwise, set the osdflag by hand: {{{vos setfields <volume> -osd 1}}}

Policies

Open questions

Possible representations for a policy

So far we thought of 3 possible notations for policies, each having implications on the overall expressiveness.

Disjunctive Normal Form

A policy consists of an arbitrary number of predicates that are logically ORed. Evaluation stops as soon as one predicate evaluates to true. Each predicate consists of a number of atomic predicates that are logically ANDed:

( suffix(".root") ) or ( size > 1M and size < 20M ) or ( size > 20M ) 

Of course, each case needs to return a definite "answer" to all aspects covered, e.g.

( suffix(".root") )          => OSD, 1 stripe, 1 site
( size > 1M and size < 20M ) => OSD, 1 stripe, 2 sites
( size > 20M )               => OSD, 2 stripes, 1 site
else                         => No Object Storage

the last case being the default. (!) This would need to be set cell-wide.
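First-match evaluation of such a policy can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AFS+OSD implementation: `FileInfo`, `Placement` and `evaluate_dnf` are hypothetical names, "1M" is assumed to mean 2^20 bytes, and "No Object Storage" is modelled as `None`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

MB = 1024 * 1024  # assumption: "1M" in the policy means 2**20 bytes


@dataclass(frozen=True)
class FileInfo:
    name: str
    size: int


@dataclass(frozen=True)
class Placement:
    stripes: int
    sites: int


Atom = Callable[[FileInfo], bool]


def suffix(s: str) -> Atom:
    return lambda f: f.name.endswith(s)


def larger(n: int) -> Atom:
    return lambda f: f.size > n


def smaller(n: int) -> Atom:
    return lambda f: f.size < n


# Each case is (atoms that are ANDed, resulting placement); cases are ORed.
DNF_POLICY: List[Tuple[List[Atom], Placement]] = [
    ([suffix(".root")], Placement(stripes=1, sites=1)),
    ([larger(1 * MB), smaller(20 * MB)], Placement(stripes=1, sites=2)),
    ([larger(20 * MB)], Placement(stripes=2, sites=1)),
]


def evaluate_dnf(policy, f: FileInfo) -> Optional[Placement]:
    """Return the placement of the first matching case, or None by default."""
    for atoms, placement in policy:
        if all(atom(f) for atom in atoms):
            return placement  # evaluation stops at the first true case
    return None  # the "else" case: no object storage


print(evaluate_dnf(DNF_POLICY, FileInfo("run.root", 100)))
print(evaluate_dnf(DNF_POLICY, FileInfo("data.bin", 5 * MB)))
print(evaluate_dnf(DNF_POLICY, FileInfo("notes.txt", 100)))
```

With the example policy as written, a file of exactly 20M matches neither size case and falls through to the default.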

Discussion
This data model allows for rather efficient evaluation and should be easy to present to an administrator.

Variable list of rules

As above, the policy consists of a list of predicates, each limited to a logical AND of atomic predicates. Each can have arbitrary effects on how the file's data is stored with respect to the aspects covered by policies (see above). Here too, the predicates must be evaluated in a fixed order. However, evaluation cannot terminate before reaching the end of the list. The default behaviour has to take effect before evaluation starts:

Default                     => No Object Storage
( suffix(".root") )         => OSD
( size > 1M )               => 2 sites
( size > 20M )              => 1 site, 2 stripes 

This would have much the same effect as the example policy outlined above for DNF.
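The cumulative evaluation described here can be sketched as follows. This is a hedged Python illustration, not actual AFS+OSD code: `Decision` and `evaluate_rules` are hypothetical names, "1M" is assumed to mean 2^20 bytes, and the default of 1 stripe and 1 site for aspects that no rule touches is an assumption.

```python
from dataclasses import dataclass, replace

MB = 1024 * 1024  # assumption: "1M" in the policy means 2**20 bytes


@dataclass(frozen=True)
class Decision:
    use_osd: bool = False  # default: no object storage
    stripes: int = 1       # assumed default, not stated in the text
    sites: int = 1         # assumed default, not stated in the text


# Each rule is (predicate, effects); the effects override earlier settings.
RULES = [
    (lambda name, size: name.endswith(".root"), {"use_osd": True}),
    (lambda name, size: size > 1 * MB, {"sites": 2}),
    (lambda name, size: size > 20 * MB, {"sites": 1, "stripes": 2}),
]


def evaluate_rules(rules, name: str, size: int, default: Decision = Decision()) -> Decision:
    """Apply every matching rule in order; later rules override earlier ones."""
    decision = default  # the default takes effect before evaluation starts
    for predicate, effects in rules:
        if predicate(name, size):
            decision = replace(decision, **effects)
    return decision  # only final after the whole list has been walked


print(evaluate_rules(RULES, "big.root", 30 * MB))
print(evaluate_rules(RULES, "notes.txt", 5 * MB))
```

In this sketch a 5M file without the ".root" suffix gets 2 sites but is never switched to OSD, which is one place where the rule list as written differs from the DNF example.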

Fixed list of predicates

A policy consists of a fixed number of rules, each made up of an arbitrary number of atomic predicates that can be linked using AND, OR and NOT and parenthesized. Each rule corresponds to a certain piece of information about how the file's data is stored:

OSD           = ( size > 1M or suffix(".root") )
Stripes 1     = true
Stripes 2     = size > 20M
Sites 1       = true
Sites 2       = size < 20M

This appears extremely complex though.
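Part of the complexity is that several rules for the same aspect can be true at once (for instance "Stripes 1" and "Stripes 2"), so a tie-breaking convention is needed. The Python sketch below assumes that, for each aspect, the last rule that evaluates to true wins. That convention, the fallback defaults, the reading of "1M" as 2^20 bytes and the names `FIXED_POLICY` and `evaluate_fixed` are all assumptions made for illustration.

```python
MB = 1024 * 1024  # assumption: "1M" in the policy means 2**20 bytes

# One entry per aspect; each entry lists (value, predicate) in rule order.
FIXED_POLICY = {
    "osd": [
        (True, lambda name, size: size > 1 * MB or name.endswith(".root")),
    ],
    "stripes": [
        (1, lambda name, size: True),
        (2, lambda name, size: size > 20 * MB),
    ],
    "sites": [
        (1, lambda name, size: True),
        (2, lambda name, size: size < 20 * MB),
    ],
}

# Assumed fallbacks for aspects where no rule evaluates to true.
DEFAULTS = {"osd": False, "stripes": 1, "sites": 1}


def evaluate_fixed(policy, name: str, size: int) -> dict:
    """Evaluate every rule; per aspect, the last true rule wins (assumption)."""
    result = dict(DEFAULTS)
    for aspect, candidates in policy.items():
        for value, predicate in candidates:
            if predicate(name, size):
                result[aspect] = value
    return result


print(evaluate_fixed(FIXED_POLICY, "data.bin", 30 * MB))
print(evaluate_fixed(FIXED_POLICY, "run.root", 100))
```

Unlike the two list-based notations, every rule is always evaluated and no ordering between aspects matters, only the order of rules within one aspect.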

Technical aspects

Performance

Backwards compatibility

Notes on the code (changes)

The explanations are from vol/namei_ops.c. The new format is used as Linktable version 2, with the original format still being supported as version 1. /!\ Does the code need to support legacy link table format? Volumes are incompatible anyway.

Technical details on ubik databases

Debugging techniques