Differences between revisions 1 and 62 (spanning 61 versions)
Revision 1 as of 2005-08-31 11:39:27
Size: 31268
Comment:
Revision 62 as of 2017-01-16 10:23:13
Size: 29305
Editor: TimmEssigke
Comment:
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
#acl DvGroup:read,write,delete,revert,admin Known:read,write All:read
[[TableOfContents]]
#acl DvGroup:read,write,revert All:read
<<TableOfContents>>
Line 6: Line 6:
=== Caveats ===

The following classes of systems need some extra attention for installation:
 * systems running with eth1 but no eth0 (globes - SUN V65x)
  * there's a "globe" post script now to handle this
 * systems running with eth0, but having additional interfaces recognized as eth0 by anaconda
  * picus1,2: there's a "e1000-no-e100" post script for those
  * fatmans: probably will not run SL3 ever
 * systems with certain ethernet cards anaconda has a problem to get up the second time
  * {OK} this problem seems to have vanished with SL 3.0.5
  * Dell 2850: workaround is to put the ks.cfg file onto a floppy
  and boot with "ks=floppy", or to pack a special initrd with the
  ks.cfg in the root filesystem and boot with ks=file, or to use
  the -local option to SL3U
Line 23: Line 8:
SL3 hosts are installed using kickstart ([http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/sysadmin-guide/ch-kickstart2.html online manual]). The repository is mirrored from
[ftp://ftp.scientificlinux.org/linux/scientific/ the ftp server at FNAL] and is located on the
installation server, z.ifh.de, in /net1/z/DL6/SL. The host profiles for
the kickstart install are kept in /net1/z/DL6/profiles, and some files
SL systems are installed using kickstart
 * [[http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Installation_Guide/ch-kickstart2.html|SL5 online manual]]
 * [[http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Installation_Guide/ch-kickstart2.html|SL6 online manual]]

The repository is mirrored from
[[ftp://ftp.scientificlinux.org/linux/scientific/|the ftp server at FNAL]] and is located on the
installation server, pallas.ifh.de, in /nfs1/pallas/SL. The host profiles for
the kickstart install are kept in /project/linux/SL/profiles, and some files
Line 28: Line 17:
/net1/z/DL6/postinstall (accessible through the http server running on z).
More files, and most utility scripts, are located in
/project/linux/SL3.
/project/linux/SL/firstboot (accessible through the http server running on pallas).
More files, and most utility scripts, are located in /project/linux/SL.
Line 34: Line 22:
 * [#cfvamos Configure the host in VAMOS]

This is important, because several variables must be set correctly
since they are needed by the tools used in the following steps.

 * [#profiles Create a system profile]

Using CKS3, information from VAMOS and possibly from the AMS directory
or the live host, a kickstart file is generated that will steer the
 installation process.

 * [#ai Activate private key distribution]

Only after this step, the host will be able to request its private
 1. [[#cfvamos|Configure the host in VAMOS]]<<BR>> This is important, because several variables must be set correctly since they are needed by the tools used in the following steps.
 1. [[#profiles|Create a system profile]]<<BR>> Using CKS, information from VAMOS and possibly from the AMS directory or the live host, a kickstart file is generated that will steer the installation process. Nowadays, only the partitioning information in that file is still used, and CKS will eventually be stripped down to create no more than that.
 1. [[#ai|Activate private key distribution]]<<BR>> Only after this step, the host will be able to request its private
Line 49: Line 26:

 * [#prepare Prepare system boot into installation]
 1. [[#boot|Prepare system boot into installation]]<<BR>> Current options include PXE and hard disk. Other possible methods like USB stick, CD-ROM, or a tftp grub floppy are not currently available.
 1. Boot the system into installation<<BR>> During boot, the system will load the kernel and initrd made available
 in the previous step. Networking information comes from a DHCP server or is provided on the kernel command line.

A generic kickstart profile for the major release is retrieved by anaconda, according to the ks parameter on the kernel command line.
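In practice, the command line passed to the installation kernel looks roughly like the following sketch (the ks URL and the address values are illustrative placeholders, not the actual site settings):
{{{
linux ks=http://pallas.ifh.de/SL6.ks ksdevice=link ip=141.34.x.y netmask=255.255.255.0 gateway=141.34.x.1 dns=141.34.1.16
}}}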
Line 52: Line 32:
 Current options include PXE, CD-ROM, and hard disk. Other possible
 methods like USB stick, multiple floppies, or a tftp grub floppy are
 not yet available.

 * [#boot Boot the system into installation]

 During boot, the system will load the kernel and initrd made available
 in step (d). Networking information comes from a DHCP server (possible
 with all methods) or is provided on the kernel command line (CD-ROM &
 hard disk methods only).

 The installation system locates the kickstart profile. Information is from
 a kernel command line provided by the tftp server (PXE method), manually
 (CD-ROM method), or the script preparing the hard disk boot.
 
 The kickstart profile contains all other information needed, including
 the repository location, partitioning & package selection, and a
 postinstall script that will do some very basic configuration and
 retrieve and install a one-time init script.

 After the first reboot, this init script (executing as the very last one)
 will retrieve the system's private keys and initial vamos configuration
 cache, and then bootstrap our site mechanisms for system maintenance.

[[Anchor(cfvamos)]]
The kickstart profile contains all other information needed, including the repository location, partitioning & package selection, and a postinstall script that will do some very basic configuration and retrieve and install a one-time init script.

 After the first reboot, this init script (executing as the very last one) will retrieve the system's private keys and initial vamos configuration cache and some essential packages, and then bootstrap our site mechanisms for system maintenance.<<BR>>

Beginning with SL6, and now backported to SL5, the kickstart profile is generic (SL5.ks/SL6.ks/SL7.ks/...) because retrieval of per-host profiles is broken in current EL releases. A per-host profile is still written by CKS, but only the partitioning information from this file is actually used during installation (the whole file is retrieved with wget in %pre, and partitioning info extracted with sed).
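A minimal sketch of what that %pre step amounts to (the URL, the placeholder host name, and the exact sed expression are assumptions for illustration; only the wget/sed approach itself is taken from the description above):
{{{
%pre
# fetch the per-host profile written by CKS (server path and host name are assumed)
wget -q -O /tmp/host.ks http://pallas.ifh.de/profiles/myhost.ks
# keep only the partitioning directives; nothing else from the per-host file is used
sed -n -e '/^zerombr/p' -e '/^clearpart/p' -e '/^part /p' \
       -e '/^volgroup /p' -e '/^logvol /p' /tmp/host.ks > /tmp/partitions.ks
%end
}}}
The generic SL''x''.ks can then pull the extracted lines back in, presumably via an `%include /tmp/partitions.ks` statement.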

<<Anchor(cfvamos)>>
Line 79: Line 41:
Choose a default derived from sl3-def. Defaults starting with "sl3-" are
32bit, those starting with "sl3a-" are 64bit. These will mainly differ
in the settings for OS_ARCH and AFS_SYSNAME (see the sl3a-mod modifier).
64bit capable systems can run the 32bit version as well.

OS_ARCH is read by several tools in the following steps to determine what
to install. The same is true for CF_SL_release: This variable determines
which minor SL release the system will use. Both OS_ARCH and CF_SL_release
Choose a default derived from the current sl''X''-def, where ''X'' = 5, 6, 7, ... Before SL6, defaults starting with "sl''X''-" were
32bit, those starting with "sl''X''a-" were 64bit. These mainly differ in the settings for `OS_ARCH` and `AFS_SYSNAME` (see the sl''X''a-mod modifier). 64bit capable systems can run the 32bit version as well. As of SL6, only 64bit systems are supported,
and sl''X''-def will be 64bit. The sl''X''-32-mod modifier is used for the few special-purpose 32bit systems.
At the time of writing, the only such SL6 system is used to test building and running the 32bit Open``AFS
module from the SRPM provided to the SL developers. Supporting 32-bit systems for users is not foreseen for SL6+.



`OS_ARCH` is read by several tools in the following steps to determine what
to install. The same is true for `CF_SL_release`: This variable determines
which minor SL release the system will use. Both `OS_ARCH` and `CF_SL_release`
Line 91: Line 55:
since sue.bootstrap will no longer permit OS_ARCH to change.

Run the Workflow whenever a system changes from DLx to SL3 or back, since
some tools (scout) can only consult the netgroups to decide how things
should be done. This is wrongwrongwrong, but ...

[[Anchor(profiles)]]
since sue.bootstrap will no longer permit `OS_ARCH` to change.

'''Run the Workflow whenever a system changes''' between major SL releases (say, from
SL5 to SL6 or back), changes netgroups etc. If in doubt, just do it and wait for half an hour before proceeding.

<<Anchor(profiles)>>
Line 100: Line 63:
This is done with the tool CKS3.pl which reads "host.cks3" files and creates
This is done with the tool `CKS.pl` which reads "host.cks" files and creates
Line 102: Line 65:
directory, or the live system still running DL4, DL5 or SL3, as well as
pre/post script building blocks from /project/linux/SL3/{pre|post}.

CKS3.pl is located in /project/linux/SL3/CKS3, and is fully perldoc'd. A
sample DEFAULT.cks with many comments is located in the same directory.
directory, or the live system still running SL.

`CKS.pl` is located in /project/linux/SL/scripts, and is fully perldoc'd.
There should be a link pointing to it in the profiles directory as well.
A sample `DEFAULT.cks` with many comments is located in the same directory.
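Since the script is fully perldoc'd, its documentation can be read directly from the installed path, e.g.:
{{{
perldoc /project/linux/SL/scripts/CKS.pl
}}}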
Line 110: Line 73:
 * You need to be a member of the sysprog unix group.
 * Log into z.
 * Go into /net1/z/DL6/profiles.
 * Check whether a .cks3 file for your host exists.
 * You need
  * write access to `/project/linux/SL/profiles`
  * read access to VAMOS
  * ssh access to the system to install, if it is up and partitions to be kept
 * cd into `/project/linux/SL/profiles`
 * Check whether a .cks file for your host exists.
Line 117: Line 82:
  machine, or a copy of DEFAULT.cks3.
  * (!) '''NO host.cks3 IS NEEDED AT ALL ''if''''' you just want to upgrade or
  machine, or a copy of `DEFAULT.cks`.
  * (!) '''NO host.cks IS NEEDED AT ALL ''if''''' you just want to upgrade or
Line 120: Line 85:
  DEFAULT.cks3 is always read and should cover this case completely.
 * Run CKS3.pl, like this: `./CKS3.pl `''host''

 ||<tablestyle="width:100%; background-color: #E0E0FF;"> <!> '''Always rerun CKS3''', even if the existing .cks3 file looks fine.[[BR]][[BR]]This is because the selection and the postinstall script are copied into the kickstart file, and the old file may no longer be correct.||


* Watch the output. Make sure you understand what the profile is
 going to do to the machine! If in doubt, read and understand the
 <host>.ks file
before actually installing.
  DEFAULT.cks is always read and should cover this case completely.
 * Run CKS.pl, like this: `./CKS.pl `''host''

 ||<tablestyle="width:100%; background-color: #E0E0FF;"> <!> '''Always rerun CKS''' before installing a system, even if the existing .cks file looks fine.||

 * Check the output! It contains a lot of information! Make sure you understand what the profile is
 going to do to the machine! If in doubt, read and understand the SL''x''.ks and
 ''host''.`ks` files before actually installing.
Line 130: Line 94:
 ||<tablestyle="width:100%; background-color: #FFA0A0;"> /!\ In particular, '''make sure you undertsand the partitioning, and any `clearpart` statements'''.[[BR]][[BR]]Other than for DL5, these may wipe the disks even in "interactive" installs!||
 ||<tablestyle="width:100%; background-color: #FFA0A0;"> /!\ In particular, '''make sure you understand the partitioning, and any `clearpart` statements'''.<<BR>><<BR>>Other than with good old SuSE-based DL5, these may wipe the disks even in "interactive" installs!||
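For reference, the partitioning part of a generated ''host''.ks consists of standard kickstart directives along these lines (the disk name and sizes here are invented, purely for illustration):
{{{
clearpart --drives=sda --all --initlabel
part /boot --size=200 --ondisk=sda
part swap --size=4096 --ondisk=sda
part / --size=1 --grow --ondisk=sda
}}}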
Line 134: Line 98:
[[Anchor(ai)]]
Notice that '''as of SL6, only the partitioning information is actually used''' from the generated file. '''Anything else is not!'''

<<Anchor(ai)>>
Line 137: Line 103:
If you followed the instructions above (read the CKS3 output), you already know what to do:
{{{ ssh mentor sudo activ-ai <host>
If you followed the instructions above (read the CKS output), you already know what to do:
{{{
ssh configsrv sudo prepare-ai <host>
Line 143: Line 110:
/products/ai/scripts/ai-start which will NFS-export a certain directory
to mentor, put an SSL public key there, and ask mentor to crypt the
credentials with that key and copy them into the directory. If after
the installation the host has its credentials, it worked and no other
system can possibly have them as well. If it hasn't the keys are burned
/products/ai/scripts/ai-web which will retrieve a tarball with these keys and
a few other files from the install server. This works only once after each `prepare-ai`
for a host. If after the installation the host has its credentials, it worked and no other
system can possibly have them as well. If it hasn't, the keys are burned
Line 150: Line 116:
If ai-start fails, the system will retry after 5 minutes. Mails will
be sent to linuxroot@ifh.de from both mentor and the installing system,
indicating that this happened. The reason is usually that this step
If ai-web fails, the system will retry after 5 minutes. Mails will
be sent to linuxroot from both the AI server and the installing system,
indicating that this happened. The reason is usually that the prepare-ai step
Line 154: Line 120:
The ai daemon writes a log to /var/log/ai/ai_script.log.

<<Anchor(boot)>>
Line 159: Line 127:
 * Perl Script

 If the system is still running a working DL4, DL5 or SL3 installation,
 * '''Perl Script'''<<BR>>
 If the system is still running a working SL installation,
Line 164: Line 131:
 {{{ /project/linux/SL3/SL3U/SL3U.pl yes please
}}}
 and either let the script reboot the system after the countdown,
 or interrupt the countdown with ^C and reboot (or have the user
 reboot) later.
 {{{
/project/linux/SL/scripts/SLU.pl yes please
}}}
Line 176: Line 141:
 options are available or may even be necessary for certain hosts
 (see 1.0).

 * SL CD-ROM

 ||<tablestyle="width: 100%; background-color: #E0E0FF;"> (!) It is ''much'' more convenient now to use the [#unicd unified CD] ||

 Images are /net/z/DL6/SL/<release>/<arch>/images/SL/boot.iso
 The release and arch have to match exactly the planned installation,
 or the installation system will refuse to work.

  * If the system has a valid DHCP entry (inluding the MAC address):

  At the boot prompt enter "linux ks=nfs:z:/net1/z/DL6/profiles/"

  If the system has more than one network interface, add "ksdevice=link".
  If the system has more than one network interface *connected*, instead
  add "ksdevice=eth0" or whatever is appropriate.

  * If the system has no valid DHCP entry yet:

  You have to add parameters like
  "ip=141.34.x.y netmask=255.255.255.0 gateway=141.34.x.1 dns=141.34.1.16"

  Or watch the log on the DHCP server, wait for the unknown MAC to
  appear, create a working entry for the host in /etc/dhcpd.conf,
  and restart the dhcp server. If you're quick enough, the client
  will receive an answer when it retries. Otherwise it has to be booted
  again.

  Don't forget to remove the CD before the first reboot, or installation
  will start all over.
      
 * [[Anchor(unicd)]] Grand Unified Boot CD

 This CD has a boot loader menu that allows installing DL5 and SL3.
 It also has preset kernel parameters that save you almost all typing:
 ks=..., ksdevice=link, netmask=..., dns=... are always set, but hidden,
 and there are visible, editable templates for ip=..., gateway=...).

 Menu entries are:
   
  * local

  This is the default, and will boot from the primary hard disk
  (whatever the BIOS thinks this is). Hence it's no problem if
  you forget to remove the CD during installation.

  * dl5-dhcp
 
  Installs DL5, getting network parameters by DHCP.

  * dl5-manual

  Installs DL5 with network parameters specified on the command line.
  When you select this entry, you'll see the tail of the preset options:
  {{{ "textmode=0 hostip=141.34.x.y gateway=141.34.x.1"
}}}
  Simply replace "x" and "y" by the appropriate values for the host
  and hit Enter.

  * sl303_32-d

  Install SL3.0.3/i386 using dhcp.

  * sl303_32-m

  Install SL3.0.3/i386. Network parameters are given on the command
  line (replace "x" and "y" in "ip=141.34.x.y gateway=141.34.x.1").

  * sl303_64-d

  Install SL3.0.3/x86_64 using dhcp.

  * sl303_64-m

  Install SL3.0.3/x86_64. Network parameters are given on the command
  line (replace "x" and "y" in "ip=141.34.x.y gateway=141.34.x.1").

 The entries for SL may vary over time, but generally follow the pattern
 sl<release>_<bits>-<method>, where bits is "32" or "64", and method is
 "d" for dhcp or "m" for manual. They have to be this cryptic because
 there's a 10 character limit for the labels :-(

 The ISO image is /project/linux/SL3/doc/InstallCD.iso, and the script next
 to it (it's in the CD's root directory as well) can be used to create
 modified images, for additional SL releases or different DL5 install
 kernels etc.: Simply edit the variables at the top (@SL_releases,
 $DL_kernel) and rerun the script on a system that has syslinux and mkisofs
 installed (tested on DL5 and SL3/32). The script will tell you the
 directory in /tmp where it writes the image.

 * PXE
 options are available or may even be necessary for certain hosts. To mention the two most important ones:

  * `-dhcp` will make the installation system use dhcp to find the IP address, which is useful if a system will be installed with a different address
  * `-reboot` will make the system reboot itself after a countdown (which can be interrupted with ^C)

 * '''PXE'''<<BR>>
Line 273: Line 150:
 server (actually, z) and run it. Then, pxelinux.0 will request the
 server, and run it. Then, pxelinux.0 will request the
Line 279: Line 156:
  * TFTP & DHCP (script):

 
As root on z, run
  {{{ /project/linux/SL3/PXE/pxe <host>
}}}
  This will add the right link for the system in /tftpboot/pxelinux.cfg
 As root on pallas, run
 {{{
/project/linux/SL/scripts/pxe <host>
}}}
 This will add the right link for the system in /tftpboot/pxelinux.cfg
Line 286: Line 162:

  * DHCP (manually):

  If the system has no valid DHCP entry yet, you have to use the
  last method given above in (b) to find out the MAC and create or
  complete the entry in the configuration file manually. In addition to
  IP and MAC, the following parameters have to be supplied:
  {{{
next-server 141.34.32.16;
filename "pxelinux.0";
}}}
  If there was an incomplete entry for the system, these will already
  be present after running the "pxe" utility script.
Line 302: Line 165:
 /tftpboot/pxelinux.cfg, simply run (as root on z)
 {{{
        /project/linux/SL3/PXE/unpxe <host>
 /tftpboot/pxelinux.cfg, simply run (as root on pallas)
 {{{
/project/linux/SL/scripts/unpxe <host>
Line 311: Line 174:
 * GRUB Floppy

 As a last resort, one can try the grub floppy. This method will
 obtain the kernel and the initrd by tftp, hence a matching link
 has to exist in /tftpboot/pxelinux.cfg on the install server.

 If the host has a working dhcp entry, just boot the default entry
 on the floppy. If it doesn't select the other entry, hit 'e' to get
 into the editor, replace all "x" and "y" in the templates for
 IP addresses and netmasks, and finally hit "b" to boot the modified
 entry.

 The network drivers in GRUB only work for older cards. This is no
 problem because the more recent ones support PXE anyway.

 In particular, the 3C905 in PIII desktop PCs (najade or older PIII 750)
 is supported.

 The floppy image is located in /project/linux/SL3/Floppy. To adapt it
 to a new SL release or install kernel, simply loop mount the image and
 make the obvious changes in boot/grub/menu.lst.

=== System/Boot Method Matrix ===

 ||<rowstyle="background-color: #E0E0FF;"> ||Script || CD || Floppy || PXE ||
 || older systems ||yes || maybe || yes || no ||
 || PIII 750 desktop ||yes || some(1) || yes || no(2) ||
 || Najade (PIII 850) ||yes || some(1) || yes || no(2) ||
 || Nereide (P4 1700) ||yes || yes || ? || ? ||
 || Oceanide (P4 2400) ||yes || yes || ? || yes ||
 || Hyade (Dell 350) ||yes || yes || no || yes ||
 || Dryade (Dell 360) ||yes || yes || no || yes ||
 || Satyr (Dell 370) ||yes || yes || no || yes ||
 || Dell 380 ||yes || yes || no || yes ||
 || ice (intel serverboard) ||yes || yes || yes || yes ||
 || fatman (same board) ||yes || yes || yes || yes ||
 || Supermicro PIII ||yes || yes || no || no ||
 || Supermicro Xeon ||yes || yes || no || no ||
 || globe (SUN V65x) ||yes || yes || no || yes ||
 || heliade (SUN V20z) ||yes || yes || no || yes ||
 || Dell 1850 ||yes || yes || no || yes ||
 || Dell 2850 ||yes(3) || yes(3) || no || yes(3)||

 (1) This seems to depend on the motherboard and/or BIOS revision.
 In fact, some 850MHz models won't boot from CD while there are
 older 750MHz systems that will.

 (2) The PXE implementation in the MBA has a bug: It does not recognize
 the next-server argument and always tries to download the Loader
 from the same server that sent the DHCP offer. Hence these systems
 can be PXE-booted by setting up a special DHCP&TFTP server.

 (3) Anaconda up to and including at least SL 3.0.4 has a problem to get
 up the NIC a second time. Workarounds include putting the ks.cfg on
 a floppy or into the initrd, or using the -local switch to SL3U.pl.
Line 373: Line 180:
 * aaru (package updates)
 * '''aaru (package updates)'''<<BR>>
Line 380: Line 186:
 CF_YUM_extrarepos* and CF_DZPM_AGING.

 * yumsel (addition and removal of packages)
 CF_YUM_extrarepos*, CF_YUM_bootonly* and CF_DZPM_AGING.

 * '''yumsel (addition and removal of packages)'''<<BR>>
Line 388: Line 193:
 of VAMOS variables CF_yumsel_*.

 * KUSL3 (everything related to kernels)
 of VAMOS variables CF_yumsel_*.<<BR>>
 ||<tablestyle="width: 100%; background-color: #E0E0FF;"> (!) yumsel documentation, including the file format, is available with `perldoc /sbin/yumsel`||

 * '''KU[SL(3)] (everything related to kernels)'''<<BR>>
Line 394: Line 199:
 variable Linux_kernel_version and a few others.
 variable Linux_kernel_version and a few others. On SL5, KUSL3 was
 replaced by KUSL, and this should happen eventually on SL3/4 as well.
 SL3/4 systems in update class "A" already use KUSL.pl.
 
Line 398: Line 206:
Errata are synced to arwen with /project/linux/SL3/sync-arwen.sh
and then to z with /project/linux/SL3/sync-z.sh (still manually).
||<tablestyle="width: 100%; background-color: #E0E0FF;"> <!> For yum on SL3, the command to create the necessary repository data is `yum-arch <dir>`<<BR>> For SL4/5, it is `createrepo <dir>`<<BR>><<BR>> <!> '''Neither should be run manually.'''<<BR>><<BR>> (!) Running the script `/project/linux/SL/scripts/UpdateRepo.pl -x </path/to/repo>` '''on pallas''' will do the right things to update the yum repodata, the repoview data accessible from [[Local_Linux_Repositories]], and release the right afs volumes where applicable.||

Errata are synced to pallas with /project/linux/SL/scripts/sync-pallas.pl (still manually).
Line 404: Line 213:
/project/linux/SL3/yum/stage-errata/stage-errata. The sync/stage scripts
send mail to linuxroot@ifh.de unless in dryrun mode. The stage_errata
/project/linux/SL/scripts/stage-errata. The sync/stage scripts
send mail to linuxroot unless in dryrun mode. The stage_errata
Line 410: Line 219:
Most of these are found in /afs/ifh.de/packages/RPMS/@sys/System,
with their (no)src rpms in /afs/ifh.de/packages/SRPMS and the source
tarballs in /afs/ifh.de/packages/SOURCES. Some come from external
These are generally found in `/nfs1/pallas/SL/Z`, with their (no)src rpms in
`/afs/ifh.de/packages/SRPMS/System` and the source
tarballs in `/afs/ifh.de/packages/SOURCES` (for .nosrc.rpms). Some come from external
Line 417: Line 226:
Available subdirectories under Z:

||<rowstyle="background-color: #EBEBFF"> Path || Repo used by || Intended Use ||
|| `common` || <!> ''all'' systems || really common packages ||
|| `5/noarch` || all SL5 systems ||<|3> typical addons for a major release ||
|| `5/i386` || all 32-bit SL5 systems ||
|| `5/x86_64` || all 64-bit SL5 systems ||
|| `54/noarch` || all SL5.4 systems ||<|3> bug fixes included in next release<<BR>><<BR>> new addons incompatible with last release ||
|| `54/i386` || all 32-bit SL5.4 systems ||
|| `54/x86_64` || all 64-bit SL5.4 systems ||
|| `54/INSTALL/noarch`|| all SL5.4 systems ||<|3> as above, but available already during system installation ||
|| `54/INSTALL/i386` || all 32-bit SL5.4 systems ||
|| `54/INSTALL/x86_64` || all 64-bit SL5.4 systems ||

The distinction of the `INSTALL` repositories is necessary because some of our add-ons do not work correctly when installed during initial system installation. Notice these repos may be populated by symlinks to packages or subdirectories, e.g. `afs -> ../../x86_64/afs`, but the metadata must be updated separately.
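A short sketch of the symlink case (the paths are examples; `UpdateRepo.pl -x` is the same repo update script used elsewhere on this page):
{{{
cd /nfs1/pallas/SL/Z/54/INSTALL/x86_64
ln -s ../../x86_64/afs afs
# the linked packages only become visible to yum after this repo's metadata is rebuilt
/project/linux/SL/scripts/UpdateRepo.pl -x /nfs1/pallas/SL/Z/54/INSTALL/x86_64
}}}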
Line 419: Line 244:
cd /afs/.ifh.de/packages/RPMS/@sys/System
yum-arch .
arcx vos release $PWD
}}}
/project/linux/SL/scripts/UpdateRepo.pl -x /path/to/repo
}}}
<!> Do ''not'' use `createrepo` manually.
Line 431: Line 255:
These packages reside in directories SL/<release>/<arch>_extra/<name>
These packages reside in directories `Z/<major release>/extra/<arch>/<name>`
Line 433: Line 257:
3.0.4/i386 are in /net/z/DL6/SL/304/i386_extra/afs1213 . To have
SL3/i386 would be in /nfs1/pallas/SL/Z/3/extra/i386/afs1213. To have
Line 441: Line 265:
To make available packages in such a repository, you must yum-arch the
*sub*directory (not <arch>_extra). While the installation server is
still running DL5, use /project/linux/SL3/YUM-DL5/yum-arch-dl5 (ignore the
error messages about it being unable to open some file in /tmp).
To make available packages in such a repository, you must provide the full path, including the
*sub*directory, to the repo update script:
{{{
/project/linux/SL/scripts/UpdateRepo.pl -x /nfs1/pallas/SL/Z/3/extra/i386/afs1213
}}}
Line 451: Line 276:

||<tablestyle="width: 100%; background-color: #E0E0FF;"> {i} Starting with SL5, KUSL3.pl is being replaced by KUSL.pl. As of May 2007, the new script is still being tested on SL3/4, but eventually should be used on all platforms. SL6 uses an even newer script called KU.pl ||
Line 458: Line 285:
Basically, set Linux_kernel_version in VAMOS, and on the host (after a
sue.bootstrap) run "KUSL3.pl", make sure you like what it would do, then
run "KUSL3.pl -x".
Basically, set `Linux_kernel_version` in VAMOS, and on the host (after a
`sue.bootstrap`) run `KUSL3.pl`. Make sure you like what it would do, then
run `KUSL[3].pl -x`.
Line 466: Line 293:
If the variable "Linux_kernel_modules" is set to a (whitespace separated) list
of module names, KUSL3 will install (and require the availability of) the
If the variable `Linux_kernel_modules` is set to a (whitespace separated) list
of module names, KUSL(3) will install (and require the availability of) the
Line 469: Line 296:
"2.4.21-20.0.1.EL 2.4.21-27.0.2.EL ", and Linux_kernel_modules is "foo bar",
`2.4.21-20.0.1.EL 2.4.21-27.0.2.EL`, and Linux_kernel_modules is `foo bar`,
Line 478: Line 305:

KUSL3 will refuse to install a kernel if mandatory packages are not available.
The new KUSL.pl will also handle packages complying with the ''kmod'' conventions
introduced with RHEL5.

KU(SL)(3) will refuse to install a kernel if mandatory packages are not available.
Line 484: Line 313:
Matching kernel-module-alsa-`uname -r` packages are installed by KUSL3.pl
if (a) they are available and (b) the package "alsa-driver" is installed
(the latter should be the case on desktops after yumsel has run for the first
time).

Both are created from the alsa-driver srpm found in /packages/SRPMS/System.
Besides manual rebuilds, there is now the option to use the script
/project/linux/SL3/modules/build-alsa-modules.pl.

Short instructions for building the kernel modules package manually (for an easier method,
[#alsascrp see below]):
      
 ||<tablestyle="width: 100%; background-color: #E0E0FF;"> <!> First make sure you use the right compiler.[[BR]][[BR]]`which gcc` should say `/usr/bin/gcc`, not something from `/opt/products`!||

 a.#1 Install the kernel-source rpm for the target kernel.

 For example, this is kernel-source-2.4.21-20.EL for both kernels 2.4.21-20.EL and
 2.4.21-20.ELsmp. KUSL3 will do this for you if Linux_kernel_source is set accordingly (if in
 doubt, set it to "all" in VAMOS, and on the build system sue.bootstrap and run KUSL3). You need
 not be running the target kernel in order to build the modules.

 a.#2 Clean the source directory

 {{{
cd /usr/src/linux-2.4.21.....
make mrproper
}}}

 a.#3 Configure the source tree.

 First, find the right configuration:
  * either the file /boot/config-<kernelversion
  * or a matching file /usr/src/linux-2.4.21..../configs/kernel-....
 {{{cp <config for target kernel> .config
make oldconfig
make dep
make clean}}}

 a.#4 Build the binary module package, like this:

 {{{
cd /usr/src/packages/SPECS
rpm -ivh /packages/SRPMS/System/alsa-driver-1.0.7-1.nosrc.rpm
ln -s /packages/SOURCES/alsa/* ../SOURCES

rpmbuild -bb --target i686 --define "kernel 2.4.21-20.0.1.ELsmp" \
          alsa-driver-1.0.7-1.spec
}}}

 a.#5 Repeat steps (b,c,d) for every target kernel.

 Modify the target and kernel version according to what you need.
 We'll typically need i686 for both SMP and UP kernels. The 64bit
 modules have to be built on a 64bit system. Note the ia32e kernel
 actually *is* SMP although the name doesn't say so, and there
 is no UP kernel for this architecture at all. Here's a table of
 what you probably want to build:

  ||<rowstyle="background-color: #E0E0FF;"> target || version || needed ||
  || i686 || 2.4.21-20.0.1.ELsmp || definitely ||
  || i686 || 2.4.21-20.0.1.EL || definitely ||
  || ia32e || 2.4.21-20.0.1.EL || definitely ||
  || x86_64 || 2.4.21-20.0.1.ELsmp || probably not ||

 a.#6 Copy the resulting kernel-module-alsa-<kernelversion> rpms to the right directory:

 {{{
i686: /afs/.ifh.de/packages/RPMS/i586_rhel30/System/alsa
ia32e/x86_64: /afs/.ifh.de/packages/RPMS/amd64_rhel30/System/alsa
}}}

 Then make them available to yum:

 {{{
cd /afs/.ifh.de/..../System
yum-arch .
arcx vos release $PWD
}}}

 There is no need to copy the alsa-driver rpms generated, unless
 a new alsa version has been built, in which case one of the resulting
 packages should be copied and yum-arch'd per target directory.

 After step (f), KUSL3 will pick up the modules.

[[Anchor(alsascrp)]]Scripted build of the kernel modules packages:

 ||<tablestyle="width: 100%; background-color: #E0E0FF;"> <!> First make sure you use the right compiler.[[BR]][[BR]]`which gcc` should say `/usr/bin/gcc`, not something from `/opt/products`!||

 a.#1 Make sure the kernel-source rpms are installed for all kernels you want to build for.

 a.#2 Make sure the whole kernel source tree(s) are owned by you, and that you can run `ssh root@<buildhost>`.

 a.#3 Run the script like this:

 {{{
/project/linux/SL3/modules/build-alsa-modules.pl 1.0.8-1
}}}

 You'll be prompted for every kernel that the script can
 sensibly build modules for on this system. Pick the ones
 you want. Check the dryrun output, and once you like it:

 {{{
/project/linux/SL3/modules/build-alsa-modules.pl -x 1.0.8-1
}}}

 This should build everything you need (after going through the prompting again).

 a.#4 Copy the output rpms into the repository (as described in step (f) for the manual build above). The script will print the commands that need to be executed.
''obsolete''
Line 597: Line 317:
This is similar to the ALSA modules, but;

 * The srpms are in /packages/SRPMS/esdcan (different ACL from others
 due to proprietary license of source).

 * There's no build script.

 * Builds should be done manually, on pitzrap itself, and always against
 a ''fresh'' kernel-source package:

  a.#1 remove the kernel-source package(s)

  a.#2 install the right kernel-source package for the kernel you want to build the module for

  a.#3 configure it:
  {{{
cd /usr/src/linux-2.4.....
cp configs/kernel-2.4.21-i686.config .config
make dep
make clean
}}}

  a.#4 install the srpm (it doesn't matter from which kernel build it is):
  {{{
rpm -ivh /packages/SRPMS/kernel-module-esdcan-...3.3.3-1.src.rpm
}}}

  a.#5 build:
  {{{
rpmbuild -ba [--define 'kernel 2.4.21....'] --target i686 ...spec
}}}

  a.#6 copy the .i686.rpms to /afs/.ifh.de/packages/i586_rhel30/System/esdcan, yum-arch and release
''obsolete''
Line 633: Line 321:
Again, similar to alsa. Maybe a bit simpler since the spec will deal
with the kernel sources correctly and without further attention (it
makes a copy of the source directory and then does the right thing).

 a.#1 install the right kernel-source package (there's a build requirement)

 a.#2 install the srpm:
 {{{
rpm -ivh /packages/SRPMS/System/nvidia-driver-1.0.7174-3.src.rpm
}}}

 a.#3 build (on an SMP system, on a UP system the define changes accordingly):
 {{{
rpmbuild -ba nvidia-driver-1.0.7174-3.spec
}}}
  * on i386, will build i386 userspace packages
  * on x86_64, will build userspace and kernel package for current kernel
 {{{
rpmbuild -bb --target i686 nvidia-driver-1.0.7174-3.spec
}}}
  * on i386, will build kernel module for running kernel
 {{{
rpmbuild -bb --target i686 --define 'kernel 2.4.21-27.0.2.EL' ...
}}}
  * on i686, will build kernel module for other kernels
 {{{
rpmbuild -bb --define 'kernel ...' --define 'build_module 1' nvidia...
}}}
  * on x86_64, will build kernel module for other kernel

 a.#4 copy the .rpms to /afs/.ifh.de/packages/@sys/System/nvidia, yum-arch and release

== Adding a new SL3 release ==

There are quarterly releases of SL3, following Red Hat's updates to RHEL.
===== 32-bit SL5 =====
Only the 'gen2' packages are required:
{{{
rpmbuild --rebuild --sign --define 'kernel 2.6.18-308.8.1.el5' --define 'nvgen 2' /packages/SRPMS/System/nvidia/nvidia-driver-gx-080102-3.sl.src.rpm --target i686
}}}

===== 64-bit SL5 =====
We need the 'gen2' and 'gen3' packages:
{{{
rpmbuild --rebuild --sign --define 'kernel 2.6.18-308.8.1.el5' --define 'nvgen 3' /packages/SRPMS/System/nvidia/nvidia-driver-gx-120416-1.sl.src.rpm
}}}

{{{
rpmbuild --rebuild --sign --define 'kernel 2.6.18-308.8.1.el5' --define 'nvgen 2' /packages/SRPMS/System/nvidia/nvidia-driver-gx-080102-3.sl.src.rpm
}}}

===== SL6 =====
On SL6, the nvidia drivers are packaged as "kABI-tracking kmods". It shouldn't be required to rebuild them for normal kernel updates.

==== Lustre ====
===== 64-bit SL5 =====
We only provide these for the normal (non-Xen) kernel:
{{{
KVERSION=2.6.18-308.8.1.el5 rpmbuild --rebuild --sign /packages/SRPMS/System/lustre/lustre-1.8.7-1.wc1.1.src.rpm
}}}

The modules also have to be copied to the NAF.

===== 64-bit SL6 =====
{{{
KVERSION='2.6.32-220.17.1.el6.x86_64' rpmbuild --rebuild --sign /packages/SRPMS/System/lustre/lustre-1.8.7-1.wc1.2.el6.src.rpm
}}}
==== XFS ====

''obsolete''

==== ARECA RAID (SL4) ====

''obsolete''

== Adding a new SL release ==

There are quarterly releases of SL, following Red Hat's updates to RHEL.
Line 669: Line 365:
The procedure is the same for SL3 and SL4. Just substitute filenames and
paths as appropriate:
Line 672: Line 370:
Modify sync-arwen.sh and sync-z.sh to include the new release. Make
sure there's enough space on both arwen and z. Now sync-arwen, then
sync-z. If you're using 30rolling for testing, make a link like this:
{{{
/net1/z/DL6/SL/304 -> 30rolling
}}}

'''Step 2: Create empty extra postinstall repositories'''

{{{
mkdir /net1/z/DL6/SL/304/i386_post
cd /net1/z/DL6/SL/304/i386_post
/project/linux/SL3/YUM-DL5/yum-arch-dl5 .

mkdir /net1/z/DL6/SL/304/x86_64_post
cd /net1/z/DL6/SL/304/x86_64_post
/project/linux/SL3/YUM-DL5/yum-arch-dl5 .
}}}

If some packages are needed at this stage, of course put them there...

'''Step 3: Create staged errata directories'''

Modify `/project/linux/SL3/yum/stage-errata/stage-errata.cf` to include the new
 * Create a new logical volume on a:
 {{{
lvcreate -L 30G -n SL44 vg00
}}}
 * Add a corresponding line in /etc/fstab (mount with the `acl` option; see the example below this list)
 * Create the directory, mount the volume, and make sure permissions and security context are right:
 {{{
chgrp sysprog /nfs1/pallas/SL/44
chmod g+w /nfs1/pallas/SL/44
getfacl /nfs1/pallas/SL/43 | setfacl --set-file=- /nfs1/pallas/SL/44
chcon system_u:object_r:httpd_sys_content_t /nfs1/pallas/SL/44
}}}
 The last command makes it possible to access the directory through apache. The chgrp and chmod
 are actually redundant if ACLs are used.
 * Modify sync-pallas.pl to include the new release, then run it (do a dryrun first, and check whether
 additional subdirectories should be excluded).
 * If you're using xrolling for testing, make a link like this:
 {{{
/nfs1/a/SL/44 -> 40rolling
}}}
 * Check access through http
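The fstab entry mentioned in the list above could look like this (the filesystem type and mount options are assumptions; the device name follows from the lvcreate example):
{{{
/dev/vg00/SL44  /nfs1/pallas/SL/44  ext3  defaults,acl  1 2
}}}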

'''Step 2: Create staged errata directories'''

Modify `/project/linux/SL/scripts/stage-errata.cf` to include the new
Line 697: Line 396:
you ''must'' configure 30rolling, not 304 (or whatever). Now run stage-errata.

'''Step 4: Make the kernel/initrd available for PXE boot'''

Go into `/tftpboot` on z. Do something like
{{{
cp -i /net1/z/DL6/SL/304/i386/images/SL/pxeboot/vmlinuz vmlinuz.sl304
cp -i /net1/z/DL6/SL/304/x86_64/images/SL/pxeboot/vmlinuz vmlinuz.sl304amd64
cp -i /net1/z/DL6/SL/304/i386/images/SL/pxeboot/initrd.img initrd.sl304
cp -i /net1/z/DL6/SL/304/x86_64/images/SL/pxeboot/initrd.img initrd.sl304amd64
}}}
Then cd into `pxelinux.cfg`. Make copies of the relevant configuration files
(`cp SL303-i386-ks SL304-i386-ks; cp SL303-x86_64-ks SL304-x86_64-ks`)
and edit them accordingly (s/303/304/g);

'''Step 5: Make the release available in VAMOS'''
you ''must'' configure 30rolling, not 304 (or whatever). The same for SL4. Now run stage-errata.

'''Step 3: Make the kernel/initrd available by TFTP for PXE boot'''

Run the script
{{{
/project/linux/SL/scripts/tftp_add_sl_release 44 i386
}}}
and accordingly for other releases and architectures. This will copy the kernel and the initrd,
and create a pxelinux configuration file. You may still want/have to add a few lines in
/tftpboot/pxelinux.cfg/default (for example, for Tier2 installs).

'''Step 4: Make the release available in VAMOS'''
Line 718: Line 414:
'''Step 6: test'''
'''Step 5: test'''
Line 722: Line 418:
  /project/linux/SL3/PXE/pxe <testhost>
  /project/linux/SL/scripts/pxe <testhost>
Line 727: Line 423:
  cd /net1/z/DL6/profiles
  ./CKS3.pl <testhost>
}}}

Make sure SL3U works correctly:
  cd /project/linux/SL/profiles
  ./CKS.pl <testhost>
}}}

Make sure SLU works correctly:
Line 734: Line 430:
  /project/linux/SL3/SL3U/SL3U.pl yes please
  /project/linux/SL/scripts/SLU.pl yes please
Line 747: Line 443:
== Booting a rescue system ==

There are several ways to do this, including:

'''From CD1 of the distribution'''

Simply boot from CD1 of the distribution. At the boot prompt, type `linux rescue`.

'''Over the network using PXE'''

Make sure the system gets the "next-server" and "filename" responses from the
dhcp server, but that there is no link for the system in /tftpboot on the install server.
At the boot prompt, enter something like "sa53 rescue".

== Building Kernel Packages ==

=== 32-bit SL3 ===

First install the kernel srpm (''not kernel-source''). Make your changes to the spec, add patches etc.

{{{
rpmbuild --sign -ba kernel-2.4.spec
}}}
This will build
 * kernel.src.rpm
 * kernel-source.i386.rpm
 * kernel-doc.i386.rpm
 * kernel-BOOT.i386.rpm

{{{
rpmbuild --sign -ba --target i686 kernel-2.4.spec
}}}
This will build
 * kernel.i686.rpm
 * kernel-smp.i686.rpm
 * kernel-hugeme.i686.rpm
 * kernel-unsupported.i686.rpm
 * kernel-smp-unsupported.i686.rpm
 * kernel-hugemem-unsupported.i686.rpm

Trying to turn off build of the hugemem kernel breaks the spec.

Additional modules for these are built as for any other SL kernel, with one exception of course:

==== Building the 1.2.x kernel-module-openafs (and openafs) packages ====

For ordinary SL kernels, this is done at FNAL, hence we needn't bother. But for our own kernels, or if we want to apply a change to the SRPM, we have to do this ourselves.

||<tablestyle="width:100%; background-color: #FFA0A0;"> The kernel version must always be defined, and always '''''without''''' the "smp" suffix.<<BR>><<BR>> Otherwise, when building on an SMP system, the build will work but the modules will not.||

For each kernel version you want to build modules for, install the kernel-source, kernel, and kernel-smp RPMs on the build system, they're all needed. Then:

 * To build the base packages on i686:
 {{{
PATH=/usr/kerberos/bin:$PATH rpmbuild --rebuild --sign --define 'kernel 2.4.21...' openafs-...src.rpm
}}}

 * To build the kernel module packages (UP and SMP) on i686:
 {{{
PATH=/usr/kerberos/bin:$PATH rpmbuild --rebuild --sign --target i686 --define 'kernel 2.4.21...' openafs-...src.rpm
}}}

 * To build the base packages plus the kernel modules (UP and SMP) on x86_64:
 {{{
PATH=/usr/kerberos/bin:$PATH rpmbuild --rebuild --sign --define 'kernel 2.4.21...' openafs-...src.rpm
}}}

 * To build just the kernel modules (UP and SMP) on x86_64:
 {{{
PATH=/usr/kerberos/bin:$PATH rpmbuild --rebuild --sign --define 'kernel 2.4.21...' --define 'build_modules 1' openafs-...src.rpm
}}}

 * To build just the kernel module ''for ia32e'' (intel CPUs) on x86_64:
 {{{
PATH=/usr/kerberos/bin:$PATH rpmbuild --rebuild --sign --target ia32e --define 'kernel 2.4.21...' openafs-...src.rpm
}}}


==== Building the NEW 1.4.x kernel-module-openafs packages ====

As of December 2006, there exists a unified SRPM for OpenAFS 1.4.2+, which doesn't have the
build problems described above, and works in exactly the same way on SL3, SL4, and SL5. It's named ''openafs.SLx'',
but will create packages named ''openafs'' with SL3, SL4, SL5 in the release number. The SRPM can
(and should) be rebuilt without being root. The steps are the same on every platform:

First install the right kernel-source (SL3) or matching kernel[-smp|-largesmp|-xen|...]-devel package
for the target kernel.

Then run:
{{{
rpmbuild --rebuild --sign --target i686 --define 'kernel 2.4.21...' --define 'build_modules 1' openafs.SLx-...src.rpm
}}}

There's always just one module built per invocation. Building on SMP systems is ok.

Supported targets include i686, athlon (SL3 only), ia32e (SL3 only, must build on 64bit system), x86_64 (must build on 64bit system), ia64 (untested).

Supported kernel flavours include smp (SL3/4), hugemem (SL3/4), largesmp (SL4), xen (SL5), xenU (SL4), PAE (SL5).

==== Building the 1.6.x kmod-openafs packages (SL6) ====

Normally, this should not be required as these packages needn't be updated with the kernel as long as the kernel ABI used by the modules is kept stable. At the time of writing (late in the 6.2 cycle), this has been the case since EL6 GA - the modules in use are still the ones built against the very first SL6 kernel.

If it's required to rebuild against a new kernel:

{{{
rpmbuild --rebuild --sign --define 'kernel 2.6.32...' --define 'build_kmod 1' openafs.SLx-1.6.....src.rpm
}}}

== Notes on 10GbE ==

=== Cards ===
So far we have used two different cards:
 * Intel 82598EB
 * Broadcom NetXtreme II BCM57710

The Intel card is used in zyklop41..43. It has no PXE ROM (but supposedly can be flashed), is cooled passively, and works out of the box. It can also do 1Gb, but not 10/100. Throughput is not very good, maybe because the cards won't fit into the PCIe x8 slot in the Dell R510.

The Broadcom card is used in dusk & dawn so far. It does have a PXE ROM (not tested yet though), has a cooling fan :-(, and currently needs an updated driver because the bnx2x driver in the SL5.6 kernel (2.6.18-238.*) has a bug. The host will need a powercycle after the SL5.6 driver was used; a reset is not sufficient. The driver update is packaged in kernel-module-bnx2 rpms. These can currently only be built if the target kernel is running on the build host. The driver update should become unnecessary with SL5.7. The cards (purchased from Dell) can only do 10000baseT - not even Gigabit is supported, hence they ''must'' be connected to a 10GbE switch with an appropriate cable.

=== LRO/TPA and bridged networking ===
Large Receive Offload is a hardware feature significantly enhancing throughput. Alas, it is incompatible with bridged networking. To use such a device with Xen or KVM virtualization, LRO has to be turned off by specifying `options bnx2x disable_tpa=1` in /etc/modprobe.conf (or modprobe.d/...). Otherwise, all kinds of weird things happen although the network basically works. Unfortunately, this reduces the throughput as measured with qperf (under the Xen kernel in DOM0) to just 30% of what can be achieved with the normal kernel and LRO enabled. NB "TPA" = "Transparent Packet Aggregation".
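For example, on a system using the modprobe.d mechanism, this can go into a small file of its own (the file name is an arbitrary choice):
{{{
# /etc/modprobe.d/bnx2x-lro.conf -- turn off LRO/TPA so the card works behind a bridge
options bnx2x disable_tpa=1
}}}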

== Workarounds for certain hardware/OS combinations ==

=== SL5 on Dell Precision T3500 ===
Requires `vars.CF_kernel_append='acpi_mcfg_max_pci_bus_num=on'` . Without this, performance is very bad.

=== SL6 on Dell Poweredge M620 with HT disabled ===
Requires `vars.CF_kernel_append='acpi=ht noapic'` . Without this, performance is abysmal.
Line 751: Line 578:
http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/en-US/System_Administration_Guide_/Kickstart_Installations.html

http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Installation_Guide/ch-kickstart2.html

Installation

Overview

SL systems are installed using kickstart

The repository is mirrored from the ftp server at FNAL and is located on the installation server, pallas.ifh.de, in /nfs1/pallas/SL. The host profiles for the kickstart install are kept in /project/linux/SL/profiles, and some files needed during the postinstallation, before the AFS client is available, in /project/linux/SL/firstboot (accessible through the http server running on pallas). More files, and most utility scripts, are located in /project/linux/SL.

Installation takes the following steps:

  1. Configure the host in VAMOS
    This is important, because several variables must be set correctly since they are needed by the tools used in the following steps.

  2. Create a system profile
    Using CKS, information from VAMOS and possibly from the AMS directory or the live host, a kickstart file is generated that will steer the installation process. Today, only the partitioning information in that file is used any longer, and CKS will eventually be stripped down to not creating more than that.

  3. Activate private key distribution
    Only after this step, the host will be able to request its private keys and initial configuration cache from mentor.

  4. Prepare system boot into installation
    Current options include PXE and hard disk. Other possible methods like USB stick, CD-ROM, or a tftp grub floppy are not currently available.

  5. Boot the system into installation
    During boot, the system will load the kernel and initrd made available in the previous step. Networking information comes from a DHCP server or is provided on the kernel command line.

A generic kickstart profile for the major release is retrieved by anaconda, according to the ks parameter on the kernel command line.

The kickstart profile contains all other information needed, including the repository location, partitioning & package selection, and a postinstall script that will do some very basic configuration and retrieve and install a one-time init script.

  • After the first reboot, this init script (executing as the very last one) will retrieve the system's private keys and initial vamos configuration cache and some essential packages, and then bootstrap our site mechanisms for system maintenance.

Beginning with SL6, and now backported to SL5, the kickstart profile is generic (SL5.ks/SL6.ks/SL7.ks/...) because retrieval of per-host profiles is broken in current EL releases. A per-host profile is still written by CKS, but only the partitioning information from this file is actually used during installation (the whole file is retrieved with wget in %pre, and partitioning info extracted with sed).

System Configuration in VAMOS

Choose a default derived from the current slX-def, where X = 5, 6, 7, ... Before SL6, defaults starting with "slX-" were 32bit, those starting with "slXa-" were 64bit. These mainly differ in the settings for OS_ARCH and AFS_SYSNAME (see the slXa-mod modifier). 64bit capable systems can run the 32bit version as well. As of SL6, only 64bit Systems are supported, and slX-def will be 64bit. The slX-32-mod modifier is used for the few special purpose 32bit systems. At the time of writing, the only such SL6 system is used for testing building and running the 32bit OpenAFS module from the SRPM provided to the SL developers. Supporting 32-bit systems for users is not foreseeen for SL6+.

OS_ARCH is read by several tools in the following steps to determine what to install. The same is true for CF_SL_release: This variable determines which minor SL release the system will use. Both OS_ARCH and CF_SL_release affect the choice of installation kernel & initrd, installation repository, and yum repositories for updating and installing additional packages.

It should now be safe to do this step without disabling sue on the system, since sue.bootstrap will no longer permit OS_ARCH to change.

Run the Workflow whenever a system changes between major SL releases (say, from SL5 to SL6 or back), changes netgroups etc. If in doubt, just do it and wait for half an hour before proceeding.

Creating System Profiles

This is done with the tool CKS.pl which reads "host.cks" files and creates "host.ks" files from them, using additional information from VAMOS, the AMS directory, or the live system still running SL.

CKS.pl is located in /project/linux/SL/scripts, and is fully perldoc'd. There should be a link pointing to it in the profiles directory as well. A sample DEFAULT.cks with many comments is located in the same directory.

To create a profile:

  • You need
    • write access to /project/linux/SL/profiles

    • read access to VAMOS
    • ssh access to the system to install, if it is up and partitions to be kept
  • cd into /project/linux/SL/profiles

  • Check whether a .cks file for your host exists.
    • If it does, and you find you have to modify the file, make sure it is not a link to some other file before you do so.
    • If it does not, create one by starting with a copy from a similar

      machine, or a copy of DEFAULT.cks

    • (!) NO host.cks IS NEEDED AT ALL if you just want to upgrade or reinstall a normal system without changing disk partitioning, since DEFAULT.cks is always read and should cover this case completely.

  • Run CKS.pl, like this: ./CKS.pl host

    <!> Always rerun CKS before installing a system, even if the existing .cks file looks fine.

  • Check the output! It contains a lot of information! Make sure you understand what the profile is

    going to do to the machine! If in doubt, read and understand the SLx.ks and host.ks files before actually installing.

    /!\ In particular, make sure you understand the partitioning, and any clearpart statements.

    Other than with good old SuSE-based DL5, these may wipe the disks even in "interactive" installs!

    Also make sure the SL release and architecture are what you want.

Notice that as of SL6, only the partitioning information is actually used from the generated file. Anything else is not!

Activating Private Key Distribution

If you followed the instructions above (read the CKS output), you already know what to do:

ssh configsrv sudo prepare-ai <host>

This will activate the one-shot mechanism for giving the host (back) its private keys (root password, kerberos keyfile, vamos/ssh keys, ...). The init script retrieved during postinstall will start the script /products/ai/scripts/ai-web which will retrieve a tarball with these keys and a few other files from the install server. This works only once after each prepare-ai for a host. If after the installation the host has its credentials, it worked and no other system can possibly have them as well. If it hasn't, the keys are burned and have to be scrubbed. Hasn't happened yet, but who knows.

If ai-web fails, the system will retry after 5 minutes. Mails will be sent to linuxroot from both the AI server and the installing system, indicating that this happened. The reason is usually that the prepare-ai step was forgotten. Remember it has to be repeated before every reinstallation. The ai daemon writes a log /var/log/ai/ai_script.log.

Booting the system into installation

There are several options:

  • Perl Script
    If the system is still running a working SL installation, this is the most convenient and reliable method: After logging on as root, run the script

    /project/linux/SL/scripts/SLU.pl yes please
    The script will create an additional, default, boot loader entry to start the installation system. By default, all needed information is appended to the kernel command line, including networking information. Hence not even DHCP is needed. The script comes with full perldoc documentation. Some additional options are available or may even be necessary for certain hosts. To mention the two most important ones:
    • -dhcp will make the installation system use dhcp to find the IP address, which is useful if a system will be installed with a different address

    • -reboot will make the system reboot itself after a countdown (which can be interrupted with ^C)

  • PXE
    This requires entries on both the DHCP and TFTP servers. The client will receive IP, netmask, gateway etc from the DHCP server, plus the information that it should fetch "pxelinux.0" from the TFTP server, and run it. Then, pxelinux.0 will request the host configuration file (IP address in hex notation) from the TFTP server (a link in /tftpboot/pxelinux.cfg/). This will in turn tell pxelinux.0 which kernel & initrd to retrieve from the TFTP server, and what parameters the kernel should receive on the command line. As root on pallas, run

    /project/linux/SL/scripts/pxe <host>
    This will add the right link for the system in /tftpboot/pxelinux.cfg
    • and also attempt to update the DHCP configuration on the right server.
    The changes on the DHCP server will be reverted the next time the dhcp feature runs, so no need for cleanup. To get rid of the link in /tftpboot/pxelinux.cfg, simply run (as root on pallas)
    /project/linux/SL/scripts/unpxe <host>
    If the client boots via PXE afterwards, it will pick up the default configuration which tells it to boot from its local disk. Anyway, using PXE is not recommended for systems which have no "one time boot menu" or a "request network boot" key stroke.

Package Handling & Automatic Updates

See the "aaru" feature for how all this (except kernels) is handled.

There are three distinct mechanisms for package handling on the client:

  • aaru (package updates)
    Handled by the aaru feature, the scripts /sbin/aaru.yum.daily and /sbin/aaru.yum.boot run yum to update installed packages. Yum [2] is told to use specific repository descriptions for these tasks, which are created by /sbin/aaru.yum.create before, according to the values of VAMOS variables OS_ARCH, CF_SL_release, CF_YUM_extrarepos*, CF_YUM_bootonly* and CF_DZPM_AGING.

  • yumsel (addition and removal of packages)
    Handled by the aaru feature, the script /sbin/yumsel installs additional packages or removes installed ones. Configuration files for this task are read from /etc/yumsel.d/, which is populated by /sbin/yumsel.populate before, according to the values of VAMOS variables CF_yumsel_*.

    (!) yumsel documentation, including the file format, is available with perldoc /sbin/yumsel

  • KU[SL(3)] (everything related to kernels)
    Handled by the kernel feature, this script deals with kernels and related packages (modules, source), according to the values of VAMOS variable Linux_kernel_version and a few others. On SL5, KUSL3 was replaced by KUSL, and this should happen eventually on SL3/4 as well. SL3/4 systems in update class "A" already use KUSL.pl.

SL Standard & Errata Packages

<!> For yum on SL3, the command to create the necessary repository data is yum-arch <dir>
For SL4/5, it is createrepo <dir>

<!> Neither should be run manually.

(!) Running the script /project/linux/SL/scripts/UpdateRepo.pl -x </path/to/repo> on pallas will do the right things to update the yum repodata, the repoview data accessible from Local_Linux_Repositories, and release the right afs volumes where applicable.

Errata are synced to pallas with /project/linux/SL/scripts/sync-pallas.pl (still manually). Packages to be installed additionally by /sbin/yumsel or updated by /sbin/aaru.yum.boot and /sbin/aaru.yum.daily are NOT taken from the errata mirror created like this, but instead from "staged errata" directories created (also, still manually) by the script /project/linux/SL/scripts/stage-errata. The sync/stage scripts send mail to linuxroot unless in dryrun mode. The stage_errata script is fully perldoc'ed, the others are too simple.

Addon Packages (Zeuthen)

These are generally found in /nfs1/pallas/SL/Z, with their (no)src rpms in /afs/ifh.de/packages/SRPMS/System and the source tarballs in /afs/ifh.de/packages/SOURCES (for .nosrc.rpms). Some come from external sources like the dag repository (http://dag.wieers.com/home-made/), freshrpms (http://freshrpms.net/) or the SuSE 8.2/9.0 distributions. These latter ones are typically not accompanied by a src rpm.

Available subdirectories under Z:

  Path              | Repo used by             | Intended Use
  ------------------+--------------------------+--------------------------------------------
  common            | <!> all systems          | really common packages
  5/noarch          | all SL5 systems          | typical addons for a major release
  5/i386            | all 32-bit SL5 systems   |
  5/x86_64          | all 64-bit SL5 systems   |
  54/noarch         | all SL5.4 systems        | bug fixes included in next release;
                    |                          | new addons incompatible with last release
  54/i386           | all 32-bit SL5.4 systems |
  54/x86_64         | all 64-bit SL5.4 systems |
  54/INSTALL/noarch | all SL5.4 systems        | as above, but available already during
                    |                          | system installation
  54/INSTALL/i386   | all 32-bit SL5.4 systems |
  54/INSTALL/x86_64 | all 64-bit SL5.4 systems |

The distinction of the INSTALL repositories is necessary because some of our add-ons do not work correctly when installed during the initial system installation. Note that these repos may be populated by symlinks to packages or subdirectories, e.g. afs -> ../../x86_64/afs, but the metadata must be updated separately.
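For example (paths assumed for illustration), reusing the afs packages from the main 54/x86_64 repo in the corresponding INSTALL repo and rebuilding its metadata separately could look like this:

  cd /nfs1/pallas/SL/Z/54/INSTALL/x86_64
  ln -s ../../x86_64/afs afs
  /project/linux/SL/scripts/UpdateRepo.pl -x /nfs1/pallas/SL/Z/54/INSTALL/x86_64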

After adding a package, make it available to yum like this:

/project/linux/SL/scripts/UpdateRepo.pl -x /path/to/repo

<!> Do not use createrepo manually.

Selectable Addon Packages (Zeuthen)

There's a way to provide packages in selectable repositories. For example, this was used to install an openafs-1.2.13 update on selected systems while the default for SL3 was still 1.2.11, and we didn't want to have 1.2.13 on every system.

These packages reside in directories Z/<major release>/extra/<arch>/<name> on the installation server. For example, the afs update packages for SL3/i386 would be in /nfs1/pallas/SL/Z/3/extra/i386/afs1213. To have clients access this repository, set any VAMOS variable starting with CF_YUM_extrarepos (CF_YUM_extrarepos or CF_YUM_extrarepos_host or ...) to a space-separated list of subdirectories in <arch>_extra.

For example, CF_YUM_extrarepos='afs1213' will make aaru.yum.create add this repository (accessible via nfs or http) to the host's yum configuration.
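For orientation, the repository definition generated for such an entry might look roughly like the following standard yum stanza (repo id, URL layout and options are illustrative assumptions, not the script's actual output):

  [extra-afs1213]
  name=Zeuthen extra repo afs1213
  baseurl=http://pallas.ifh.de/SL/Z/3/extra/i386/afs1213
  enabled=1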

To make the packages in such a repository available, you must provide the full path, including the *sub*directory, to the repo update script:

/project/linux/SL/scripts/UpdateRepo.pl -x /nfs1/pallas/SL/Z/3/extra/i386/afs1213

Note that matching kernel modules must still reside in a directory searched by the update script (see below). This should generally not cause problems since these aren't updated by yum anyway.

Additional Modules for Kernel Updates

{i} Starting with SL5, KUSL3.pl is being replaced by KUSL.pl. As of May 2007, the new script is still being tested on SL3/4, but should eventually be used on all platforms. SL6 uses an even newer script called KU.pl.

Handled by the kernel feature, the script /usr/sbin/KUSL3.pl reads its information about which kernels to install from VAMOS variables Linux_kernel_version and a few others, and carries out whatever needs to be done in order to install new kernels and remove old ones. The script is perldoc'ed.

Basically, set Linux_kernel_version in VAMOS, and on the host (after a sue.bootstrap) run KUSL3.pl. Make sure you like what it would do, then run KUSL[3].pl -x.
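A minimal sketch of that sequence on the host (commands as described above; the comments reflect the intended effect):

  sue.bootstrap             # pick up the changed VAMOS variables
  /usr/sbin/KUSL3.pl        # dry run: review what the script would install/remove
  /usr/sbin/KUSL3.pl -x     # actually carry out the changes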

Kernels and additional packages are found in the repository mirror including the errata directory (CF_SL_release is used to find those), and in /afs/ifh.de/packages/RPMS/@sys/System (and some subdirectories).

If the variable Linux_kernel_modules is set to a (whitespace separated) list of module names, KUSL(3) will install (and require the availability of) the corresponding kernel-module rpm. For example, if Linux_kernel_version is 2.4.21-20.0.1.EL 2.4.21-27.0.2.EL, and Linux_kernel_modules is foo bar, the mandatory modules are:

  name                                | version | release
  ------------------------------------+---------+--------
  kernel-module-foo-2.4.21-20.0.1.EL  | latest  | latest
  kernel-module-bar-2.4.21-20.0.1.EL  | latest  | latest
  kernel-module-foo-2.4.21-27.0.2.EL  | latest  | latest
  kernel-module-bar-2.4.21-27.0.2.EL  | latest  | latest

Generally speaking, kernel module packages must comply with the SL conventions. The new KUSL.pl will also handle packages complying with the kmod conventions introduced with RHEL5.

KU(SL)(3) will refuse to install a kernel if mandatory packages are not available. Non-mandatory packages include kernel-source, sound modules, and kernel-doc.

ALSA

obsolete

ESD CAN Module (for PITZ Radiation Monitor)

obsolete

Nvidia

32-bit SL5

Only the 'gen2' packages are required:

rpmbuild --rebuild --sign --define 'kernel 2.6.18-308.8.1.el5' --define 'nvgen 2' /packages/SRPMS/System/nvidia/nvidia-driver-gx-080102-3.sl.src.rpm --target i686

64-bit SL5

We need the 'gen2' and 'gen3' packages:

rpmbuild --rebuild --sign --define 'kernel 2.6.18-308.8.1.el5' --define 'nvgen 3' /packages/SRPMS/System/nvidia/nvidia-driver-gx-120416-1.sl.src.rpm

rpmbuild --rebuild --sign --define 'kernel 2.6.18-308.8.1.el5' --define 'nvgen 2' /packages/SRPMS/System/nvidia/nvidia-driver-gx-080102-3.sl.src.rpm

SL6

On SL6, the nvidia drivers are packaged as "kABI-tracking kmods". It shouldn't be required to rebuild them for normal kernel updates.

Lustre

64-bit SL5

We only provide these for the normal (non-Xen) kernel:

KVERSION=2.6.18-308.8.1.el5 rpmbuild --rebuild --sign /packages/SRPMS/System/lustre/lustre-1.8.7-1.wc1.1.src.rpm

The modules also have to be copied to the NAF.

64-bit SL6

KVERSION='2.6.32-220.17.1.el6.x86_64' rpmbuild --rebuild --sign /packages/SRPMS/System/lustre/lustre-1.8.7-1.wc1.2.el6.src.rpm

XFS

obsolete

ARECA RAID (SL4)

obsolete

Adding a new SL release

There are quarterly releases of SL, following Red Hat's updates to RHEL. Each new release must be made available for installation and updates. The procedure is the same for SL3 and SL4. Just substitute filenames and paths as appropriate:

Step 1: Mirror the new subdirectory

  • Create a new logical volume on a:
    lvcreate -L 30G -n SL44 vg00
  • Add a corresponding line to /etc/fstab (mount with the acl option); see the example line after this list.

  • Create the directory, mount the volume, and make sure permissions and security context are right:
    chgrp sysprog /nfs1/pallas/SL/44
    chmod g+w /nfs1/pallas/SL/44
    getfacl /nfs1/pallas/SL/43 | setfacl --set-file=- /nfs1/pallas/SL/44
    chcon system_u:object_r:httpd_sys_content_t /nfs1/pallas/SL/44
    The last command makes it possible to access the directory through apache. The chgrp and chmod are actually redundant if ACLs are used.
  • Modify sync-pallas.pl to include the new release, then run sync-pallas.pl (do a dry run first, and check whether additional subdirectories should be excluded).
  • If you're using xrolling for testing, make a link like this:
    /nfs1/a/SL/44 -> 40rolling
  • Check access through http
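The /etc/fstab entry referred to in the list above might look like this sketch (device path derived from the lvcreate example; the filesystem type and the options besides acl are assumptions):

  /dev/vg00/SL44   /nfs1/pallas/SL/44   ext3   defaults,acl   0 0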

Step 2: Create staged errata directories

Modify /project/linux/SL/scripts/stage-errata.cf to include the new release. Note that if you're using 30rolling as a test for the release, you must configure 30rolling, not 304 (or whatever); the same holds for SL4. Now run stage-errata.

Step 3: Make the kernel/initrd available by TFTP for PXE boot

Run the script

/project/linux/SL/scripts/tftp_add_sl_release 44 i386

and accordingly for other releases and architectures. This will copy the kernel and the initrd, and create a pxelinux configuration file. You may still want/have to add a few lines in /tftpboot/pxelinux.cfg/default (for example, for Tier2 installs).
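An additional entry in /tftpboot/pxelinux.cfg/default might look roughly like this (label name, paths and append parameters are illustrative assumptions; the actual kernel/initrd locations are the ones created by tftp_add_sl_release):

  label sl44-i386
    kernel SL44/i386/vmlinuz
    append initrd=SL44/i386/initrd.img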

Step 4: Make the release available in VAMOS

Fire up the GUI, select "vars" as the top object, go to CF_SL_release, choose the "values_host" tab, and add the new value to the available choices. Set it on some test host.

Step 5: test

Make sure this works and sets the right link:

  /project/linux/SL/scripts/pxe <testhost>

Make sure this chooses the right directory:

  cd /project/linux/SL/profiles
  ./CKS.pl <testhost>

Make sure SLU works correctly:

  ssh <testhost>
  /project/linux/SL/scripts/SLU.pl yes please

Try an installation:

  • activ-ai <testhost>

  • then boot it

Try updating an existing installation:

  • set CF_SL_release for the host in VAMOS

  • sue.bootstrap

  • sue.update aaru

  • have a look into /var/log/yum.log, and check everything still works

Booting a rescue system

There are several ways to do this, including:

From CD1 of the distribution

Simply boot from CD1 of the distribution. At the boot prompt, type linux rescue.

Over the network using PXE

Make sure the system gets the "next-server" and "filename" responses from the DHCP server, but that there is no link for the system in /tftpboot on the install server. At the boot prompt, enter something like "sa53 rescue".

Building Kernel Packages

32-bit SL3

First install the kernel srpm (not kernel-source). Make your changes to the spec, add patches etc.

rpmbuild --sign -ba kernel-2.4.spec

This will build

  • kernel.src.rpm
  • kernel-source.i386.rpm
  • kernel-doc.i386.rpm
  • kernel-BOOT.i386.rpm

rpmbuild --sign -ba --target i686 kernel-2.4.spec

This will build

  • kernel.i686.rpm
  • kernel-smp.i686.rpm
  • kernel-hugeme.i686.rpm
  • kernel-unsupported.i686.rpm
  • kernel-smp-unsupported.i686.rpm
  • kernel-hugemem-unsupported.i686.rpm

Trying to turn off the build of the hugemem kernel breaks the spec.

Additional modules for these are built as for any other SL kernel, with one exception of course:

Building the 1.2.x kernel-module-openafs (and openafs) packages

For ordinary SL kernels, this is done at FNAL, hence we needn't bother. But for our own kernels, or if we want to apply a change to the SRPM, we have to do this ourselves.

The kernel version must always be defined, and always without the "smp" suffix. Otherwise, when building on an SMP system, the build will succeed but the modules will not work.

For each kernel version you want to build modules for, install the kernel-source, kernel, and kernel-smp RPMs on the build system, they're all needed. Then:

  • To build the base packages on i686:
    PATH=/usr/kerberos/bin:$PATH rpmbuild --rebuild --sign --define 'kernel 2.4.21...' openafs-...src.rpm
  • To build the kernel module packages (UP and SMP) on i686:
    PATH=/usr/kerberos/bin:$PATH rpmbuild --rebuild --sign --target i686 --define 'kernel 2.4.21...' openafs-...src.rpm
  • To build the base packages plus the kernel modules (UP and SMP) on x86_64:
    PATH=/usr/kerberos/bin:$PATH rpmbuild --rebuild --sign --define 'kernel 2.4.21...' openafs-...src.rpm
  • To build just the kernel modules (UP and SMP) on x86_64:
    PATH=/usr/kerberos/bin:$PATH rpmbuild --rebuild --sign --define 'kernel 2.4.21...' --define 'build_modules 1' openafs-...src.rpm
  • To build just the kernel module for ia32e (intel CPUs) on x86_64:

    PATH=/usr/kerberos/bin:$PATH rpmbuild --rebuild --sign --target ia32e --define 'kernel 2.4.21...' openafs-...src.rpm

Building the NEW 1.4.x kernel-module-openafs packages

As of December 2006, there exists a unified SRPM for OpenAFS 1.4.2+, which doesn't have the build problems described above, and works in exactly the same way on SL3, SL4, and SL5. It's named openafs.SLx, but will create packages named openafs with SL3, SL4, SL5 in the release number. The SRPM can (and should) be rebuilt without being root. The steps are the same on every platform:

First install the right kernel-source (SL3) or matching kernel[-smp|-largesmp|-xen|...]-devel package for the target kernel.

Then run:

rpmbuild --rebuild --sign --target  i686 --define 'kernel 2.4.21...' --define 'build_modules 1' openafs.SLx-...src.rpm

There's always just one module built per invocation. Building on SMP systems is ok.

Supported targets include i686, athlon (SL3 only), ia32e (SL3 only, must build on 64bit system), x86_64 (must build on 64bit system), ia64 (untested).

Supported kernel flavours include smp (SL3/4), hugemem (SL3/4), largesmp (SL4), xen (SL5), xenU (SL4), PAE (SL5).

Building the 1.6.x kmod-openafs packages (SL6)

Normally, this should not be required, as these packages needn't be updated with the kernel as long as the kernel ABI used by the modules is kept stable. At the time of writing (late in the 6.2 cycle), this has been the case since EL6 GA: the modules in use are still the ones built against the very first SL6 kernel.

If it's required to rebuild against a new kernel:

rpmbuild --rebuild --sign --define 'kernel 2.6.32...' --define 'build_kmod 1' openafs.SLx-1.6.....src.rpm

Notes on 10GbE

Cards

So far we have used two different cards:

  • Intel 82598EB
  • Broadcom NetXtreme II BCM57710

The Intel card is used in zyklop41..43. It has no PXE ROM (but can supposedly be flashed), is cooled passively, and works out of the box. It can also do 1Gb, but not 10/100. Throughput is not very good, possibly because the cards won't fit into the PCIe x8 slot in the Dell R510.

The Broadcom card is used in dusk & dawn so far. It does have a PXE ROM (not tested yet), has a cooling fan :-(, and currently needs an updated driver because the bnx2x driver in the SL5.6 kernel (2.6.18-238.*) has a bug. The host will need a power cycle after the SL5.6 driver has been used; a reset is not sufficient. The driver update is packaged in kernel-module-bnx2 rpms, which can currently only be built if the target kernel is running on the build host. The driver update should become unnecessary with SL5.7. The cards (purchased from Dell) can only do 10000baseT - not even Gigabit is supported - hence they must be connected to a 10GbE switch with an appropriate cable.

LRO/TPA and bridged networking

Large Receive Offload is a hardware feature significantly enhancing throughput. Alas, it is incompatible with bridged networking. To use such a device with Xen or KVM virtualization, LRO has to be turned off by specifying options bnx2x disable_tpa=1 in /etc/modprobe.conf (or modprobe.d/...). Otherwise, all kinds of weird things happen although the network basically works. Unfortunately, this reduces the throughput as measured with qperf (under the Xen kernel in DOM0) to just 30% of what can be achieved with the normal kernel and LRO enabled. NB "TPA" = "Transparent Packet Aggregation".
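The corresponding module option as described above (the exact file name under /etc/modprobe.d is an assumption):

  # /etc/modprobe.conf (SL5) or e.g. /etc/modprobe.d/bnx2x.conf
  options bnx2x disable_tpa=1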

Workarounds for certain hardware/OS combinations

SL5 on Dell Precision T3500

Requires vars.CF_kernel_append='acpi_mcfg_max_pci_bus_num=on'. Without this, performance is very bad.

SL6 on Dell Poweredge M620 with HT disabled

Requires vars.CF_kernel_append='acpi=ht noapic'. Without this, performance is abysmal.

References

http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/sysadmin-guide/ch-kickstart2.html

http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/en-US/System_Administration_Guide_/Kickstart_Installations.html

http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Installation_Guide/ch-kickstart2.html

http://linux.duke.edu/projects/yum/
