Differences between revisions 167 and 168
Revision 167 as of 2012-10-04 15:13:25
Size: 42922
Revision 168 as of 2013-02-27 17:18:33
Size: 42976
The build requires more than 9.5 GB of disk space.

Status

  • 2010-04-22: RHEL6 beta is released - let's start
  • 2010-05-06: working kickstart yielding a system with afs, root login via ssh key, user login via gssapi
    • still requires a hostname.part file in the profiles directory
  • 2010-05-07: after kickstart, system is ready to run features
  • 2010-05-24: after installation, the sue mechanism is active
  • 2010-08-19: beta 2 refresh is operational
  • 2010-10-08:
    • kickstart installation works according to the SL operations manual, at least for the test systems
    • yields a system with afs, root login via ssh key or gssapi, user login via gssapi, sue/yumsel fully operational, and the most important features working
  • 2010-10-11: there are working VAMOS defaults for generic desktops (sl6-desktop-nogroup-def) and generic workgroup servers (sl6-wgs-def)
  • 2010-10-29:
    • software selection is substantially complete
    • the truetype core fonts are now installed
  • 2010-11-02: installations preserving filesystems should now work
  • 2010-11-25: work completed since RHEL6 GA release
    • we can install and maintain a true RHEL6 system just as well as SL6
    • systems install current updates for their class in the first place
    • preserving filesystems should work now
  • 2010-11-27: reasonable default setup for xterm, including "our" default font (works fine with UTF-8)
  • 2010-12-08:
    • we can install and maintain SL6 alpha1 systems (CF_SL_release=6rolling)
      • SELinux is disabled after installation. To enable: touch /etc/.autorelabel, modify /etc/sysconfig/selinux, and reboot
    • presentation in technical seminar scheduled for 2011-01-25
  • 2010-12-22:
    • the default tree is now 6rolling (currently alpha3), SELinux is ok now
    • openafs packaging is close to final and should show up in 6rolling/testing soon
  • 2010-12-27: virtualization works (SL6 as both host and vm, and both in combination with SL5)
  • 2011-01-24: work completed in the past month:
    • fully working desktops, including gdm config (as far as possible)
    • openafs 1.6 prerelease included in SL, selinux woes sorted out, no more need for openafs-selinux.rpm
    • postfix null client configured correctly
    • monitoring works (including hw/temperature monitoring on dells)
    • RPM GPG keys for all local maintainers are installed on each system
  • 2011-09-16:
    • available on PCs for early adopters, but no public announcement yet
    • not yet available on Farm/WGS - hangs on typical hardware for this purpose are not understood
    • started using it on servers
  • 2011-09-25:
    • public preview systems: a VM with alias sl6, and PC nomos23 in 2L01
    • possible progress on the Farm/WGS hardware problem: see /Nehalem Hangs

Completed

  • Features: xntp, sue, hosts, netgroup, ssh, tcp_wrapper, passwd, group, nsswitch, klogin, aaru, syslog, linux, security, passwd_prog (verified passwd works), kerberos (including a heimdal rebuild), name_srv (client only), motd, pam, tidy_up, afs_client, kernel, gdm, sudo, zzz (necessary for setting /etc/sue_last_updated, needed for /usr/sue/etc/sue.run-sched-wrapper ), dhcp, osm (acs client only), automount

  • teach CKS.pl to "upgrade" ext3 to ext4 by default, and add an option to prevent it
  • read host.ks in %pre, and cut out the partitioning - or teach CKS.pl to only write host.part for SL6
  • remapgroups (may want to check a full install later, though)
  • nvidia driver packages (for recent cards only). install, reboot, works
    • kabi-tracking kmods. non-whitelisted symbols are being used, but on SL6 we have dependencies for the whole ABI, hence we can check.
  • preliminary, but usable openafs packages, also using kabi-tracking kmods
    • of course, using lots of non-whitelisted symbols
    • adding (part of) the kernel release to the kmod release makes it possible to coinstall kmods for different kernels (and KU.pl will do it if required)
  • usrlocal
    • major removal of old cruft

    • no more links into shared filesystems in /usr/local/bin and /usr/local/lib

    • now split into usrlocal (user stuff, interactive systems only) and usrlocal-base (essential things also on servers)
  • profiles (unmodified)
  • virtualization
    • SL6 as Xen guest on SL5 host
    • SL6 as KVM guest on SL6 host
    • SL5 as guest on SL6 KVM
  • configure and turn on the local firewall!
  • nagios client, check_ipmisel
  • PERC5/6/H monitoring
  • sendtempd
  • cups,vamos,cfengine
  • scout

Next Steps

  • continue adapting features: arcx, ganglia, ldap
    • probably drop?: atlas, conmgr, inetd, kvm, products, trusted, ypclient
  • provide public preview wgs & desktop

  • VMs on Desktops?
    • now possible even if the nvidia driver is installed
    • support a Windows 7 VM on Linux Desktops?
  • VDI, SPICE
    • virtual desktops hosted in the CC
    • enhanced access to those and local VMs via SPICE

Loose Ends

  • SCSI device numbering has become totally unpredictable with 6.1
    • we probably must use /dev/disk/by-path
    • by-id is inconsistent between SL5 and SL6
    • no clue yet how to actually deal with this
    • on the other hand, it will allow us to keep USB on during installs (giving us the keyboard to debug...)
    • works on most systems in practice, so let's postpone this a bit
  • CKS.pl should fix references to xvda on kvm guests and references to vda on xen guests automatically
  • we should mount ext4/afs with -onobarrier if storage has a bbu backed cache

  • evolution wants to convert its local folders when started the first time
    • and that just hung - problem with AFS?
    • is this backwards compatible?
    • => test!

  • disable user switching (there's a lockdown gconf setting for this)
  • guest account:
    • zsh wants a .zshrc - we should probably copy the stuff from /etc/skel
    • and it's hard to choose the session type and seems impossible to have the choice stored (do we care?)
  • for the time being, we symlink to the SL5 /opt/products/perl because lots of scripts won't work without it
    • alas, this has never existed as a 64bit build, hence we are forced to install glibc.i686
  • nvidia driver:
    • access to /dev/nvidia* is not yet allowed for console users
      • modify /etc/security/console.perms in nvidia-x11-drv.rpm?
      • doesn't work - rebased rpms to the elrepo ones for el6, and now permissions are 0666 (makes dri work, and is no big deal if we stop turning PCs into WGSs...)

    • may need different "generations" of nvidia-x11-drv and the associated kmod, for older cards
      • the current 260.19.12 works on nv295 (T3500), nv290 (T3400), nv285 (390, 380 = satyr > 60)

      • to check: nv280 (370) - these are 21 PCs, 6 of which were off when checked 2010-11-02, age >= 4.5 years

        • according to nvidia lists, it is not supported by the current driver
        • notice: satyr80 (380) has an nv280 fitted and generally looks strange (CPU,...) - probably a special setup for *CAD that was then returned in exchange for a new model, with the CAD video card replaced by one from a retired early satyr... Makeshift tinkering, always a delight... X-(

  • "interactive.ys" should be split into productivity/development/rest
  • init scripts installed by cfengine have the wrong security label
    • seems to cause no harm so far, but fixing this is probably a good idea
    • we probably want unconfined_exec_t instead of initrc_exec_t for our stuff anyway
    • => use semanage fcontext

  • turn off undesired services
    • do we want acpid?
      • required on KVM VMs
    • cgroups need a daemon? oh no...
  • the ai web server really needs the policy change to allow it AFS access without a delay
  • more s-bits to remove? (check candidate list below)
    • and create a new candidate list
  • rsyslogd listens on the network - there seems to be no switch to change this?
  • untested on HP, in particular ASM/ADU
  • no way to have special X configs yet (multihead for PITZ / ATLAS controls) - should be straightforward though
  • when displaying UTF-8 files that need fonts which are not installed, an information box pops up asking whether those fonts should be installed. Two possible solutions: either disable this pop-up or install all fonts.
  • https://bugzilla.redhat.com/show_bug.cgi?id=663045 (nfs service does not stop during shutdown) - is this serious?

    • the nfs init script is fixed in nfs-utils-1.2.3-4 coming with RHEL 6.2 beta; alas, not the rpcidmapd one
  • Firefox is unable to use client certificates that have been imported into the profile within an SL5-Firefox

Fixed Ends

  • libcgroup now creates a new group cgred
    • with an arbitrary ID (>= 500)

    • and installs /bin/cgexec setgid cgred
    • => assign an ID, adapt DL_remapgroups (done)

  • mkconf uses vamos-web because it checks for /opt/products/perl/5.8.8 and doesn't know that we have a working client using /usr/bin/perl
  • syslog: slightly different format on the loghost, may defeat logsurfer? (fixed: see "Logging" below)
  • our profiles assume that resize is available (fixed: only used if available)

    • that's from the xterm package, which is obviously optional
    • why is this executed for root's bash anyway?
    • and why does it look as though it's not executed for a user's zsh?
  • our profiles assume that ifhnews and tklife are available (fixed: only used if available)

    • which is not the case on servers, nor are they a necessity there
  • turn off undesired services
    • hald is still required
    • do we want sssd? - no
    • get rid of pcscd (2010-11-25: done in SL_no_sc_daemons, along with openct)
  • get rid of nslcd when using flat files for nsswitch (and that's the only case foreseen) - done 2010-11-26 (nsswitch feature)
    • make sure ldapsearch actually works without (done)
  • tt core fonts should only be installed on PCs/WGS (and cabextract is needed only there) - done 2010-11-25
  • SLU.pl doesn't work yet (needs vamos_cmd - is fixed 2010-10-29, now that we have vamos_cmd)
    • for the time being, simply use SL6U.pl instead
  • sue.run-tidy failure message during boot (fixed - kinit moved from /usr/kerberos/bin to /usr/bin...)
  • PolicyKit: no way to avoid installation on interactive systems, probably not even WGS

    • need to override some defaults from /var/lib/polkit-1/localauthority/10-vendor.d/10-desktop-policy.pkla (done in SLZ_desktop_policy.rpm)
      • in particular, we probably want to disallow hibernate and suspend...
        [No Suspend or Hibernate]
        Identity=*
        Action=org.freedesktop.devicekit.power.hibernate;org.freedesktop.devicekit.power.suspend
        ResultAny=no
        ResultInactive=no
        ResultActive=no
  • make sure NetworkManager won't run - needs trigger! (done in SL_no_NetworkManager.rpm, 20101008)

  • mail (postfix) works, identical configuration as on SL5
  • check SL_password_for_singleuser actually works - 20110119: it didn't; working srpm offered on sl-devel
  • users received update notifications due to /etc/xdg/autostart/gpk-update-icon.desktop - install SL_disable-update-notification on any desktop
  • https://bugzilla.redhat.com/show_bug.cgi?id=690832 broke the usefulness of OpenSSH's GSSAPI key exchange patch. Fixed by installing a modified krb5.conf and the usage of pam_afs_session.

  • failure messages during boot from the ntpdate service were caused by a missing DELAY=0 in the bridge configuration; fixed in SL_create_kvmbridge.rpm

  • on virtualization hosts, the virbr0 default network is now removed automatically (SL_no_kvm_defnet) - this also rids us of dnsmasq
  • gvim prints out an error regarding libpk-gtk-module.so, but works so far. This lib is not installed on non-desktop systems, installing it would pull in a whole PolicyKit installation - does it do something evil? Probably not - it's PackageKit, not PolicyKit. Anyway: funny, it's not doing this for me. sw. It is used to install missing fonts on the fly using packagekit. The gtk module loading is controlled by a gconf setting for each module like /apps/gnome_settings_daemon/gtk-modules/canberra-gtk-module . gw

    [blade8e] ~ % gvim
    Gtk-Message: Failed to load module "pk-gtk-module": libpk-gtk-module.so: cannot open shared object file: No such file or directory
    • 2011-09-15: we install this on WGS now

Missing/Broken Software

Added/Fixed Software

  • EPEL:
    • gv: segfaults when closed - fixed in 3.7.1
    • alpine was added (+ config)
    • ddd: who cares if it uses lesstif
  • Adobe reader complains about a missing theme and looks strange - required gtk2-engines.i686
  • flash plugin (64-bit version 11): final version is out
    • upgrading from the 32bit to the 64bit beta is tricky: both packages are x86_64 but have a different "color"; solved by disabling automatic deps

  • cernlib (native rpms installing into /opt/cernlib, + ini)
    • currently not being installed, and we probably shouldn't
  • ROOT: an initial build of 5.28.00 for /opt is available
    • currently not being installed - IT strategy is to declare it VO software
    • 5.28.00 fails to build against current dcap-devel (2.47.2-2.el6) from EPEL with messages that "dc_xyz was not declared in this scope"
      • configure searches /products/dcache/include - where we have a version from 2002 lingering
      • fixed by telling configure to search /usr/{include,%{_lib}}
    • added support for postgresql, graphviz
    • fairly complete buildrequires
    • handcrafted requires for main package
  • freemind should probably be dropped - we do install vym (coming with the distro). Is this used at all?
  • legacy stuff:
    • xdvi
    • xv
    • plan
    • xcalc
  • nvu is dead. Replaced with Kompozer (seems a bit dead itself, but found an f13 srpm that builds fine on SL6)
  • matlab (Versions R2010b and R2011a) is available (local and AFS links)
  • maple (Versions 14 and 15) is available (local and AFS links)
  • pgi (Version 2011) is available (local and AFS Links)
  • intel (Version 2011, now 12.0.4) is available (only AFS Links, no rpms)
  • CUDA: Toolkit installed on the latest desktops, should work with the normal Nvidia drivers
  • Lustre: whamcloud actually forged a 1.8.6 release (currently "wc1"), and the client works on EL6. Old info:
    • Lustre Client -- 1.8.4 and 1.8 branch in git don't build yet, but a later version hopefully will
      • nope, 1.8.5 won't build either :-(

      • oracle has stopped all lustre development, most developers are with whamcloud or xyratex now
      • whamcloud announced that it will push a "community release" (ETA: summer)
        • this will be 2.1 - these clients will not work with 1.8 servers
      • this is a showstopper on farm nodes / wgs / transfer
      • patched Lustre 1.8.5 installed on sl6-build. It works but puts scary messages into syslog - and it crashes the machine occasionally

Software no longer provided centrally

  • CLHEP? (needed for GEANT4)
  • GEANT4? (probably not - declare VO software)
  • ROOT issues:
    • dcache access is untested
    • no GEANT4 support yet (since there is no GEANT4 build)
    • no oracle support yet (since there is no oracle client) - do we need this?
    • no support for pythia8 yet
    • pythia6 should probably become a standalone package (and so should pythia8)
    • no rfio support yet (since there are no castor packages) - do we need this?
    • qt support: (using qt4)
      • qt (BNL) is broken (try exiting the application when you have "Gui.Backend: qt" and "Gui.Factory: qt" in your ~/.rootrc ...)
      • qtgsi kind of works (example1 can be built with a bit of work on the Makefile, example2 can't)

Open Questions

  • restorecond is no longer running by default - do we want it?
  • RHEL6 does not ship an X font server (xfs) - do we still have to provide this service?
  • allow strigi? (does it work with AFS?)
    • this is required by KDE...
  • should we disable all unneeded kernel modules? (security)
  • AFS:
    • use -dynroot-sparse? (by default, only the local cell will show up in /afs, others only when accessed)
    • roll out a cell alias (zeuthen.desy.de)?
  • software: why not have "default" packages for common stuff like maple, mathematica, root?
  • qrsh doesn't work with enabled local firewall - how to deal with this?
    • first try: iptables rule input-accept-qrsh-farm

Settled Questions

  • disable IPv6? yes, doing it the right way (options ipv6 disable=1) seems ok and is now the default
  • allowing passwordless local login with pam_succeed_if.so in /etc/pam.d/gdm no longer works
    • this just moved into /etc/pam.d/gdm-password
  • X works without any configuration
    • still true with non-US keyboards?
      • no problem, need not mention keyboard in xorg.conf. User can simply choose the layout in gdm (persistent!) and GNOME.
    • probably no longer true when the proprietary nvidia driver has to be used
      • no problem, the xorg.conf only has to contain one trivial section:
        Section "Device"
                Identifier  "Device0"
                Driver      "nvidia"
                VendorName  "NVIDIA Corporation"
        EndSection
      • also required for nvidia driver: boot with kernel parameters rdblacklist=nouveau nomodeset, disable nouveau in modprobe.d:

        blacklist nouveau
        options nouveau modeset=0
  • in any case: do we really have to support non-US keyboards?
    • there are 3 Linux PCs with a German keyboard - two in the computing centre, and one with a user not even speaking German...

    • even if we do, probably doesn't warrant automation
    • sequence to test: DISPLAY= system-config-keyboard --noui {de-latin1-nodeadkeys|us} ; system-setup-keyboard

      • caution: this messes up /etc/X11/xorg.conf
    • => do not fiddle with keyboard at all, gdm/gnome settings are good enough

    • Dual Monitor setups should work as of 2011-11-28, by having a minimal Screen Section in xorg.conf:
      Section "Screen"
              Identifier "Screen 0"
              Option "TwinView" "True"
      EndSection
      • the secondary monitor should be placed right of the primary one
      • the primary monitor should be the one with the higher resolution, if they're different
      • NB xorg.conf is simply owned by nvidia-x11-drv.x86_64.rpm
        • special setups (PITZ vertical arrangement, more than 2 monitors) should use a trigger to modify the file

Changes w.r.t. SL5

Changes from 6.0 to 6.1

  • KVM: vhostnet
    • automatically used?
    • VMs automatically updated?
  • biosdevname
    • at least on newer dell servers, NICs are now em1, em2, ... instead of eth0, eth1, ...
      • fixed in
        • SL_create_kvmbridge/SL_create_kvmvlbridges
        • SLU
      • TODO
        • SL_bnx2_no_* (are these still needed?)

Changes in Local Configuration

  • automounter maps are now tailored for the system; see the autofs-*-mod modifiers and ~TUTILS/generate_autofs_maps
  • /etc/redhat-release has the "original" RH string
    • this prevents problems with 3rd party software checking for this string (like dell FW updates)
    • all cfengine features and scripts are / should be changed so that they can cope with any string in redhat-release (RH/SL/CentOS)
  • most of the kickstart profile is now static (SL6.ks), and the rest created during %pre
    • only the partitioning info is gathered from the .ks file in the profiles directory
  • sue_daily is now run by anacron from /etc/cron.daily
  • yum is configured to only install signed packages
    • to add a public key, (1) distribute it with slz-release.rpm, (2) modify aaru.yum.create in the aaru archive to use it where appropriate
  • we generally install the proprietary nvidia driver
    • needs some work for older PCs (legacy drivers)
  • local desktop users can install packages with PackageKit

    • default yum.conf is tailored for this case now - use yum -c /etc/yum.conf.all manually

    • excludes list in yum.conf.user is substantially complete
  • there is only a very minimal xorg.conf - we rely on the automatic settings as much as possible
  • we do not configure the keyboard - users with German keyboards can do it on the login screen and in GNOME
  • we do not install KDE by default - users can add it on their PC with PackageKit

    • we whitelisted strigi for this - let's hope it doesn't cause trouble
    • Akonadi does cause trouble and cannot be installed, which also prevents kdepim installation

    • we also blacklist kdenetwork and kdeadmin
  • the default locale is en_US.UTF-8 (the RHEL default)
    • we set LC_PAPER to de_DE.UTF-8 so locale aware software will use A4 by default
  • we install the real ksh, not a link to zsh
  • the local firewall is active by default
  • the default filesystem is ext4 - CKS.pl changes this automatically
  • the first disk device of KVM guests is /dev/vda, not /dev/xvda - CKS.pl warns about this
  • the AFS cache location is /var/cache/afs - CKS.pl changes this automatically
  • the AFS client runs in "sparse dynroot" mode - by default, only the local cell shows up in /afs, foreign cells only show up once they have been accessed explicitly
    • this should stop graphical filemanagers from crawling all of /afs
  • we run ntpd on KVM VMs
  • we run acpid on KVM VMs
  • we install the 64-bit firefox
    • the native java plugin comes from the Oracle SDK
    • the 64-bit flash plugin is used
  • flash-plugin is a standalone RPM, not part of DL_firefox
  • installing 32bit is possible, but not foreseen except for very special cases
    • monitoring isn't implemented, no full set of user software is available, only the basic sl6-def install is tested
    • bare metal installs haven't been tested, nor have KVM VM installs
    • installing as a Xen VM on an SL5 host works and is tested
  • AFS sysname lists:
    • 64-bit: amd64_rhel60 amd64_rhel50 i586_rhel50 amd64_linux26 i386_linux26

    • 32-bit: i686_rhel60 i586_rhel50 i386_linux26

    • the primary 32bit SL6 sysname is not part of the 64bit sysname list
    • the 64bit list is sorted release-first, not arch-first
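
The effect of the sysname ordering can be illustrated with a small sketch. It assumes the usual behaviour that an @sys path component resolves to the first entry of the sysname list for which a directory exists; the function name resolve_at_sys and the sets of available directories are made up for illustration.

```python
# Sysname lists as given above.
SYSNAMES_64 = ["amd64_rhel60", "amd64_rhel50", "i586_rhel50",
               "amd64_linux26", "i386_linux26"]
SYSNAMES_32 = ["i686_rhel60", "i586_rhel50", "i386_linux26"]


def resolve_at_sys(available, sysnames):
    """Return the first sysname (in list order) present in `available`.

    `available` stands in for the directory names that exist next to an
    @sys link; returns None if no entry of the list matches.
    """
    for name in sysnames:
        if name in available:
            return name
    return None


# Release-first ordering: a 64-bit client prefers the 32-bit rhel50 build
# over the generic 64-bit linux26 one.
print(resolve_at_sys({"i586_rhel50", "amd64_linux26"}, SYSNAMES_64))
# A directory populated only for i686_rhel60 is invisible to 64-bit clients.
print(resolve_at_sys({"i686_rhel60"}, SYSNAMES_64))
```

The second call shows the consequence of the primary 32bit SL6 sysname not being in the 64bit list: software provided only under i686_rhel60 is not found by 64-bit systems.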

Info that should be in the release notes, but mostly isn't:

Misc. Changes

  • uname -r now returns something like 2.6.32-19.el6.x86_64 instead of 2.6.32-19.el6

  • kernel now requires kernel-firmware >= `uname -r` (kernel-firmware can not be co-installed in several versions)

  • portmap is now called rpcbind
  • the GNOME trashbin finally works :)

  • the default limit for the number of processes is now 1024
    • SL5: 409600
    • SL4: 16512
    • SL3: 7168
  • $PATH for normal users now contains /usr/local/sbin:/usr/sbin:/sbin at the end
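
Scripts that compare kernel releases across SL5 and SL6 have to cope with the new architecture suffix in uname -r. A minimal sketch of one way to normalise it follows; the helper name split_kernel_release and the list of architectures are assumptions for illustration, not something taken from our tools.

```python
# Architectures we expect as a suffix; extend as needed (assumption).
KNOWN_ARCHES = ("x86_64", "i686")


def split_kernel_release(release):
    """Split 'uname -r' output into (release_without_arch, arch).

    arch is None when the string carries no known architecture suffix,
    as was the case before SL6.
    """
    for arch in KNOWN_ARCHES:
        suffix = "." + arch
        if release.endswith(suffix):
            return release[:-len(suffix)], arch
    return release, None


print(split_kernel_release("2.6.32-19.el6.x86_64"))
print(split_kernel_release("2.6.32-19.el6"))
```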

KDE problems

  • akonadi won't work, and throws annoying error dialogs at users during login and whenever an application starts that needs it
    • can be prevented by not installing it, and thus not installing kdepim (at least)
  • the "leave" menu offers suspend and hibernate, and the hack above that disables it in GNOME doesn't work
    • disabled by adding an appropriate /etc/pm/sleep.d/000no-hibernate to SLZ_desktop-policy
  • no way to get rid of the buttons though

Cron

vixie-cron was replaced by cronie. The cron.{daily,weekly,monthly} mechanism is now run by cronie-anacron, see /etc/anacrontab:

# /etc/anacrontab: configuration file for anacron

# See anacron(8) and anacrontab(5) for details.

SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
# the maximal random delay added to the base delay of the jobs
RANDOM_DELAY=45
# the jobs will be started during the following hours only
START_HOURS_RANGE=3-22

#period in days   delay in minutes   job-identifier   command
1       5       cron.daily              nice run-parts /etc/cron.daily
7       25      cron.weekly             nice run-parts /etc/cron.weekly
@monthly 45     cron.monthly            nice run-parts /etc/cron.monthly

I guess this means these tasks will be executed after boot if they were missed, with a random delay of 5..50 minutes? NB after the next full hour or after boot?

It seems the random delay also applies to the nightly runs:

May 11 03:01:01 t34 CROND[26413]: (root) CMD (run-parts /etc/cron.hourly)
May 11 03:01:01 t34 run-parts(/etc/cron.hourly)[26413]: starting 0anacron
May 11 03:01:01 t34 anacron[26423]: Anacron started on 2010-05-11
May 11 03:01:01 t34 anacron[26423]: Will run job `cron.daily' in 23 min.
May 11 03:01:01 t34 anacron[26423]: Jobs will be executed sequentially
May 11 03:01:01 t34 run-parts(/etc/cron.hourly)[26425]: finished 0anacron

And in fact, it's random every time:

[root@t34 ~]# grep 'Will run job.*daily' /var/log/cron
May 10 19:01:01 t34 anacron[6939]: Will run job `cron.daily' in 31 min.
May 11 03:01:01 t34 anacron[26423]: Will run job `cron.daily' in 23 min.
May 12 03:01:01 t34 anacron[30285]: Will run job `cron.daily' in 19 min.
May 13 03:01:01 t34 anacron[1682]: Will run job `cron.daily' in 13 min.
May 14 03:01:01 t34 anacron[3398]: Will run job `cron.daily' in 27 min.

That's certainly not what we want => set random delay to 0, and add a first job with a deterministic delay based on hostname hash?

sue_daily should be run by cron.daily to avoid clashes with other cron.daily tasks. The anacron behaviour may or may not be what we want, not sure yet.

Update: We currently run sue_daily from /etc/cron.daily/00_sue, and we set the RANDOM_DELAY to 0 in /etc/anacrontab. This makes sure we get a deterministic per host delay and serializes sue_daily with the normal maintenance tasks on SL systems. This is likely to be the right solution.
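
The idea of a deterministic per-host delay derived from a hostname hash can be sketched as follows. This is only an illustration: how 00_sue actually computes its delay is not shown on this page, and the function name host_delay_minutes, the choice of SHA-1 and the 45 minute window are assumptions.

```python
import hashlib


def host_delay_minutes(hostname, max_delay=45):
    """Map a hostname to a stable delay in the range 0 .. max_delay-1 minutes.

    The same hostname always yields the same delay, while different hosts
    are spread over the window, unlike anacron's RANDOM_DELAY which is
    drawn anew on every run.
    """
    digest = hashlib.sha1(hostname.encode("utf-8")).digest()
    # Use the first four bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:4], "big") % max_delay


for host in ("t34", "blade8e", "blade8f"):
    print(host, host_delay_minutes(host))
```

A first job in cron.daily could sleep for this many minutes before starting the actual work; since run-parts executes the jobs sequentially, all later jobs are shifted by the same amount.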

Kickstart

  • ks=http://..../profiles/ doesn't work :-( (will retrieve the directory listing as kickstart file)

    • => use generic profile SL6.ks, retrieve/generate host specifics in %pre

  • the netinfo file is gone "because it is now redundant" :-(

    • no it isn't... ifcfg-eth0 holds no useful information if dhcp is used X-(

    • grabbing the info from /tmp/syslog now
  • partitioning is going to be "interesting"
    • on a PC with multicard reader, sda now becomes sde (at least during installation)
      • had the same problem on at least one server (and probably will on many)
      • the documented kernel parameter nousbstorage doesn't work X-(

      • nousb does - side effects?

        • seems ok to use it during installation. - not propagated to installed system
          • 2010-11-02: now default option for PXE and SLU installs
        • side effect: no keyboard during install if that's a USB one - bad for debugging
      • alternatives: set a long delay before luns are probed by usb_storage, or ignore all usb storage devices we own:
        • usb-storage.delay_use=60
        • usb-storage.quirks=413c:0001:i,0644:0200:i,058f:6362:i,0bda:0181:i
          • this probably takes care of the DRAC5 and any multicard reader we have in our PCs
        • these options must be removed from /etc/modprobe.d/anaconda.conf in %post
        • the ordering of scsi hosts is still different (sda will not be on host 0, and lsscsi output looks strange)
    • eventually, we should use /dev/disk/by-*
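
The usb-storage.quirks value above is a comma-separated list of VID:PID:Flags entries, where the flag i tells usb-storage to ignore the device. A small sketch for generating the parameter from a device list follows; the function name usb_storage_quirks is made up, and the IDs are simply the ones quoted above.

```python
# (vendor ID, product ID) pairs of the devices to ignore during installation,
# taken from the parameter quoted above.
IGNORED_DEVICES = [
    (0x413C, 0x0001),
    (0x0644, 0x0200),
    (0x058F, 0x6362),
    (0x0BDA, 0x0181),
]


def usb_storage_quirks(devices, flags="i"):
    """Build the usb-storage.quirks kernel parameter for the given devices."""
    entries = ["%04x:%04x:%s" % (vendor, product, flags)
               for vendor, product in devices]
    return "usb-storage.quirks=" + ",".join(entries)


print(usb_storage_quirks(IGNORED_DEVICES))
```

Keeping the list in one place makes it easier to add the IDs of new card readers without hand-editing the PXE configuration.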

RPM/YUM

  • new checksum algo
    • RPMs built on SL6 can not be used on previous releases => build on SL5 for "common"!

    • to install a .src.rpm built on SL6 on SL5: rpm -i --nomd5

  • %topdir is now ~/rpmbuild by default (used to be /usr/src/redhat)

    • that's certainly not what we want
    • to cope: put a line %_topdir /tmp/rpmb.<my_user_name> in ~/.rpmmacros

      • caveat: prior to SL6, the required subdirs will not be created automatically by RPM
  • 32-bit package arch is i686 now - modified yumsel accordingly, ok
  • really fast :)

    • well, it was during the beta (or maybe only on systems with few installed packages), but this has changed :-(

  • rpm has %posttrans scripts now
  • yum multilib_policy default changed from "all" to "best" - this is good, and we even make this explicit in yum.conf just in case they change their mind
  • apparently, yum will have a useful feature: general variables, see https://bugzilla.redhat.com/show_bug.cgi?id=590924

    • this feature is available with the beta 2 refresh - caveat: variable names must be lower case...
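
Regarding the %_topdir caveat above: on releases before SL6 the build tree has to be prepared by hand. A minimal sketch follows, assuming the conventional subdirectory names; the helper names prepare_topdir and rpmmacros_line are hypothetical.

```python
import os

# Conventional subdirectories of an RPM build tree.
RPM_SUBDIRS = ("BUILD", "RPMS", "SOURCES", "SPECS", "SRPMS")


def prepare_topdir(topdir):
    """Create the RPM build subdirectories below topdir if they are missing."""
    for sub in RPM_SUBDIRS:
        os.makedirs(os.path.join(topdir, sub), exist_ok=True)
    return topdir


def rpmmacros_line(topdir):
    """Return the line to put into ~/.rpmmacros for this topdir."""
    return "%_topdir " + topdir


if __name__ == "__main__":
    import getpass
    top = prepare_topdir("/tmp/rpmb." + getpass.getuser())
    print(rpmmacros_line(top))
```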

LDAP

nss_ldap was replaced by nslcd (package nss-pam-ldapd). Configuration is in /etc/nslcd.conf. In RHEL6 beta, authconfig has a bug writing an illegal line at the end of this file (fixed in %post).

OpenSSH

Should have the GSSAPI key exchange patch - according to the changelog, https://bugzilla.redhat.com/show_bug.cgi?id=455351 is fixed :-)

NTP

  • Service ntp was split into ntpdate and ntp. Great idea... will have to adapt xntp feature and archives :-(

  • the ntpd init script no longer supports "reload"

Network Configuration

NetworkManager is meant to be used for everything now, but that's not likely to be what we want (on servers, we may be able to avoid installing it altogether).

  • at the end of installation, turn on service network, and turn off service NetworkManager

  • The rpm will now automatically enable the service in %post => need a trigger to undo that (done: SL_no_NetworkManager)

chkconfig

New option "resetpriorities", which is different from "reset". (And maybe does what "reset" was supposed to do?)

perl configuration

  • siteprefix is now /usr/local instead of /usr

GDM

  • gdmsetup no longer exists :-( , custom.conf is empty (no examples)

  • doc: http://library.gnome.org/admin/gdm/2.30/configuration.html.en#greeterconfiguration

  • many settings are only available via gconf
    • setting /apps/gdm/simple-greeter/disable_user_list to false works
    • setting /apps/gdm/simple-greeter/disable_restart_buttons to true doesn't
    • there is no option to disable the "Suspend" entry in the Shutdown menu X-( (maybe it's possible by theming... update: what theming X-( )

      • actually, denying it to anyone with PolicyKit does the trick

        • no way to get rid of the buttons/menu entries though
          • this was fixed in a fastbugs update of DeviceKit-power

  • and most are simply gone
    • you have to patch the source to even keep it from displaying the fqdn (done...)
      • found a better way to do it: setting the computer-info-name-label invisible in the greeter's .ui file gets rid of the fqdn, and doing the same with computer-info-version-label (which appears instead) allows setting a "banner" label text - see SLZ_gdm_config.rpm for how it's actually done
  • on the other hand, it creates the XAUTHORITY db on the local disk now, no longer in $HOME
    • that's good - how to teach ssh to do it?
      • Mostly untested but proof of concept worked:
        [blade8f] /root # grep XAUTHORITY /etc/pam.d/sshd
        session    optional     pam_mkstemp_env.so envname=XAUTHORITY envtemplate=/tmp/.Xauthority.XXXXXX

        ftp://ftp.su.se/pub/pam_mkstemp_env/

Filesystem Capabilities

  • new, very interesting, feature: see capabilities(7)

Ext4 is default filesystem

  • including /boot

  • has a number of mount options for tuning
    • nobarrier (or barrier=0) should be used for battery backed storage

    • stripe= should be used on RAID5/6 (even in VMs living on logical volumes on such storage)

    • more should be studied and understood: journal_checksum, journal_async_commit, inode_readahead, delalloc, min/max_batch_time, journal_ioprio, auto_da_alloc

Ext2/3/4 mkfs extended options

  • both stride= and stripe-width= are available (again), and should be used when creating filesystems on RAIDs
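
As a worked example for these options: stride is the RAID chunk size expressed in filesystem blocks, and stripe-width is typically stride times the number of data-bearing disks. The sketch below only computes the option string; the function name and the example array (6 disks, RAID6, 64 KiB chunks, 4 KiB blocks) are assumptions for illustration.

```python
def raid_mkfs_options(chunk_kib, block_bytes, total_disks, parity_disks):
    """Return the extended options string for creating ext2/3/4 on a RAID.

    chunk_kib    -- RAID chunk size per disk in KiB
    block_bytes  -- filesystem block size in bytes
    total_disks  -- number of disks in the array
    parity_disks -- 1 for RAID5, 2 for RAID6
    """
    chunk_bytes = chunk_kib * 1024
    if chunk_bytes % block_bytes:
        raise ValueError("chunk size must be a multiple of the block size")
    data_disks = total_disks - parity_disks
    if data_disks < 1:
        raise ValueError("array has no data-bearing disks")
    stride = chunk_bytes // block_bytes
    stripe_width = stride * data_disks
    return "stride=%d,stripe-width=%d" % (stride, stripe_width)


# 6-disk RAID6, 64 KiB chunks, 4 KiB blocks: 4 data disks, 16 blocks per chunk
print(raid_mkfs_options(64, 4096, 6, 2))
```

The resulting string would be passed to mkfs via -E; the stripe-width value should also be a reasonable candidate for the stripe= mount option mentioned in the ext4 section, but that is untested here.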

Legacy Font Handling

  • chkfontpath is gone
  • for additional directories, add a link in /etc/X11/fontpath.d instead

Logging

  • sysklogd replaced by rsyslog
  • fairly backward compatible:
    • to a traditional syslog.conf, just add these lines at the top:
      $ModLoad imuxsock.so    # provides support for local system logging (e.g. via logger command)
      $ModLoad imklog.so      # provides kernel logging support (previously done by rklogd)
      $ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
    • and to avoid duplicate hostnames when logging remotely:
      $template sysklogd,"<%PRI%>%TIMESTAMP% %syslogtag%%msg%"
      *.*;kern.!=debug;                                       @loghost;sysklogd
      *.*;kern.!=debug;                                       @loghost2;sysklogd

dhcpd

  • config is in /etc/dhcp now

  • the manpage claims that by default it will run as a DHCPv6 server (it cannot do both; if DHCP and DHCPv6 are both required, two instances must be run)
    • but at least if IPv6 is disabled, that's not the case

postfix

  • the compiled-in default is IPv4 only, but the EL6 default configuration (inet_protocols = all in main.cf) makes it use both IPv4 and IPv6 - and complain if IPv6 is disabled
    • DL_postfix_nullclient.rpm will now change this

mailx

  • is an entirely different program than on SL5 (EL6 ships Heirloom mailx)
    • only the basic functionality is compatible, and even that not completely
    • features like setting the From: address are still present, but the switches are completely different

OpenAFS

  • IMA problem: https://bugzilla.redhat.com/show_bug.cgi?id=584901

    • patched kernel for beta 1: 2.6.32-19.el6.z1
    • fixed in beta 2 refresh kernel 2.6.32-44.1.el6.x86_64
  • works with SELinux enabled if the labels are right:
    • /var/cache/openafs: system_u:object_r:afs_cache_t
    • /etc/init.d/afs: system_u:object_r:afs_initrc_exec_t
    • /usr/vice/etc/afsd: system_u:object_r:afs_exec_t
    • The latter two are defined in /etc/selinux/targeted/contexts/files/file_contexts, but for the cache that file only defines this context for /var/cache/afs and /usr/vice/cache. We should probably move our cache, or ask the policy maintainers to include /var/cache/openafs, or have a trigger that updates the file and adds a line for /var/cache/openafs. Careful: upon updates of afs_client, the label of /var/cache/openafs is reset, hence this needs to be fixed!
      • openafs-selinux.rpm deals with this correctly now: a trigger
        • uses semanage fcontext -a -e /var/cache/afs /var/cache/openafs to set up an "equivalence" between /var/cache/openafs and /var/cache/afs (this gets stored in /etc/selinux/targeted/contexts/files/file_contexts.subs)

        • calls restorecon on /var/cache/openafs, /etc/init.d/afs, /usr/vice/etc/afsd, and - if it isn't mounted - /afs

Rebuilding the Kernel

  • in the spec, define the 'buildid' macro
    • no, better change the release, since the buildid will be appended after %dist, confusing some scripts
  • since the upstream SRPMs no longer contain any individual patches (thanks to Oracle), here's a diff showing how to apply additional ones:
    --- kernel.spec 2012-10-04 15:05:21.546396736 +0200
    +++ ../rpmb.sw/SPECS/kernel.spec        2012-10-04 14:48:53.107491543 +0200
    @@ -15,11 +15,13 @@
     # that the kernel isn't the stock distribution kernel, for example,
     # by setting the define to ".local" or ".bz123456"
     #
    -# % define buildid .local
    +# % define buildid .z1
    +
    +Patch0: acpi_pad-fix-power_saving-thread-deadlock.patch
     
     %define rhel 1
     %if %{rhel}
    -%define distro_build 279.9.1
    +%define distro_build 279.9.1.z1
     %define signmodules 1
     %else
     # fedora_build defines which build revision of this kernel version we're
    @@ -923,6 +925,8 @@
     
     # Any further pre-build tree manipulations happen here.
     
    +ApplyPatch acpi_pad-fix-power_saving-thread-deadlock.patch
    +
     chmod +x scripts/checkpatch.pl
     
     # only deal with configs if we are going to build for the arch
    @@ -1706,6 +1710,8 @@
     %endif
     
     %changelog
    +* Thu Oct  4 2012 Stephan Wiesand <stephan.wiesand desy de> [2.6.32-279.9.1.z1.el6]
    +- apply git.kernel.org 5f1601261050251a5ca293378b492a69d590dacb to fix acpi_pad deadlocks
     * Fri Aug 31 2012 Frantisek Hrbata <fhrbata@redhat.com> [2.6.32-279.9.1.el6]
     - [md] raid1, raid10: avoid deadlock during resync/recovery. (Dave Wysochanski) [845464 835613]
     - [fs] dlm: fix deadlock between dlm_send and dlm_controld (David Teigland) [849051 824964]
  • rpmbuild -ba --without debug kernel.spec to build the kernel package

    • adding --without perf --without perftool may speed things up a bit

  • rpmbuild -ba --target noarch --with firmware --without debug --without doc --without perftool --without perf kernel.spec to build the kernel-firmware package (required by the kernel in a matching version)

The build requires more than 9.5 GB of disk space.
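
Since a build that runs out of space fails late, it may be worth checking the free space first. The following is only a sketch: the function name has_build_space, the 10 GiB default (a small margin above the figure quoted above) and the build directory in the usage comment are assumptions.

```shell
#!/bin/sh
# has_build_space DIR [MIN_KB]
#   Succeeds if the filesystem holding DIR has at least MIN_KB KiB available.
#   MIN_KB defaults to 10 GiB, a small margin above the 9.5 GB noted above.
has_build_space() {
    dir=$1
    min_kb=${2:-10485760}
    # df -P guarantees one line per filesystem; column 4 is the available space
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge "$min_kb" ]
}

# Hypothetical use (the build directory is an assumption):
#   has_build_space ~/rpmbuild && rpmbuild -ba --without debug kernel.spec
```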

PolicyKit

  • Allowing local users to install additional packages with PackageKit requires just a file /var/lib/polkit-1/localauthority/50-local.d/10-desktop-policy.pkla:

    [Local User Permissions]
    Identity=*
    Action=org.freedesktop.packagekit.package-install;org.freedesktop.packagekit.system-sources-refresh
    ResultAny=no
    ResultInactive=no
    ResultActive=yes

Virtualization

  • Xen is gone. Only kvm is available.
  • Host: set kvm-host-mod

    • to get rid of virbr0 (used for NAT): virsh net-destroy default; virsh net-undefine default

      • this actually happens automatically now
  • SL6 guest on SL5 Xen host: set CF_Platform=xen_vm_para, but do not set xen-guest-mod

  • SL5/6 guest on SL6 host: set CF_Platform=kvm_vm

SPICE

  • make sure xorg-x11-drv-qxl and spice-server are installed in the vm
  • virsh edit <vm>

    •     <graphics type='spice' port='5903' autoport='no' keymap='en-us'/>
          <video>
            <model type='qxl' heads='1'/>
            <alias name='video0'/>
            <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
          </video>
  • on the host, install spice-client
  • ssh -L5903:localhost:5903 <host>

  • spicec -h localhost -p 5903

TODO: SSL connection, USB, Audio, Windows Client, Windows VM, How to use spice-xpi?

Notes on UEFI

  • booting:
    • R610 virtual floppy: map efidisk.img
    • PXE is supposed to be possible with the files from efiboot.img
      • set dhcp "filename" to "BOOTX64.efi" (which is actually grub)
        • alas, it doesn't know about (nd) and neither reads BOOTX64.conf nor efidefault
        • couldn't get grub to do anything useful, thus PXE boot is just broken (verified with files from true RHEL6 as well)
        • that's probably another bug in the UEFI firmware, see https://bugzilla.redhat.com/show_bug.cgi?id=670266

  • kickstart: (lines from anaconda-ks.cfg after manual install)
    • zerombr yes is not present and may not be a good idea?

    • bootloader --location=partition

    • part /boot/efi --fstype=efi --grow --maxsize=200 --size=20

  • disk label will be GPT
  • partition table after a manual installation:
    Model: DELL PERC H700 (scsi)
    Disk /dev/sda: 1799GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    
    Number  Start   End     Size    File system  Name  Flags
     1      1049kB  211MB   210MB   fat16              boot
     2      211MB   735MB   524MB   ext4
     3      735MB   1799GB  1798GB                     lvm
  • /etc/grub.conf is a link to ../boot/efi/EFI/redhat/grub.conf

    • and it contains a funny line: device (hd0) HD(1,800,64000,a6eed9f8-a6d7-43dc-b958-7a0404550eb7)

      • this ID is not present in /dev/disk/by-id
  • Dell UEFI Boot Manager:
    • the boot order can only be defined after installation (before, the disk option is not available)
    • PXE boot ignores the "next-server" option and will always assume a tftp server on the answering dhcp server

Candidates for s-bit removal

# for i in /bin/ /cgroup/ /boot/ /Desktop/ /etc/ /home/ /lib/ /lib64/ /opt/ /sbin/ /selinux/ /srv/ /tmp/ /usr/ /var/; do find "$i" -perm /6000; done | xargs ls -l
-rwsr-xr-x. 1 root root       12104 Jul 14 13:45 /bin/cgexec
-rwsr-x---. 1 root dbus       49960 Aug 11 08:38 /lib64/dbus-1/dbus-daemon-launch-helper
-rwxr-sr-x. 1 root root        8744 Sep  1 18:15 /sbin/netreport
-rwsr-xr-x. 1 root root        9632 Oct 20 16:01 /sbin/pam_timestamp_check
-rwsr-xr-x. 1 root root       32160 Oct 20 16:01 /sbin/unix_chkpwd
-rwsr-xr-x. 1 root root       65680 Jul 20 11:19 /usr/bin/chage
-rwxr-sr-x. 1 root screen    387320 Nov 20  2009 /usr/bin/screen
-r-xr-sr-x. 1 root tty        15176 Apr 27  2010 /usr/bin/wall
-rwxr-sr-x. 1 root tty        12024 Aug 13 10:23 /usr/bin/write
-rwsr-xr-x. 1 root root      131224 Oct 18 03:05 /usr/gridengine/utilbin/lx26-amd64/authuser
-rwsr-xr-x. 1 root root       17008 Oct 18 03:05 /usr/gridengine/utilbin/lx26-amd64/rlogin
-rwsr-xr-x. 1 root root       48376 Oct 18 03:05 /usr/gridengine/utilbin/lx26-amd64/rsh
-rwsr-xr-x. 1 root root        8416 Oct 18 03:05 /usr/gridengine/utilbin/lx26-amd64/testsuidroot
-rwx--s--x. 1 root utmp       17144 Jul 19 16:15 /usr/lib64/vte/gnome-pty-helper
-rwsr-xr-x. 1 root root      221688 Aug 12 16:04 /usr/libexec/openssh/ssh-keysign
-rwsr-xr-x. 1 root root       18200 Jun 21 18:03 /usr/libexec/polkit-1/polkit-agent-helper-1
-rws--x--x. 1 root root       40752 Feb 25  2010 /usr/sbin/userhelper
-rwsr-xr-x. 1 root root        9000 Sep  1 18:15 /usr/sbin/usernetctl

Probably Ok or Required

-rwsr-xr-x. 1 root root       74680 Aug 13 10:22 /bin/mount
-rwsr-xr-x. 1 root root       41432 Jul 27 13:31 /bin/ping
-rwsr-xr-x. 1 root root       36256 Jul 27 13:31 /bin/ping6
-rwsr-xr-x. 1 root root       36440 Jun 14 13:01 /bin/su
-rwsr-xr-x. 1 root root       49280 Aug 13 10:22 /bin/umount
-rwsr-xr-x. 1 root root       68672 Jul 20 11:19 /usr/bin/gpasswd
-rwsr-xr-x. 1 root root      216626 Sep 23 23:55 /usr/bin/ksu
-rwx--s--x. 1 root slocate    38464 Mar 30  2010 /usr/bin/locate
-rwxr-sr-x. 1 root mail       20256 Dec  3  2009 /usr/bin/lockfile
-rwsr-xr-x. 1 root root       38224 Jul 20 11:19 /usr/bin/newgrp
-rwsr-xr-x. 1 root root       31768 Jan 28  2010 /usr/bin/passwd
-rwsr-xr-x. 1 root root       25232 Jun 21 18:03 /usr/bin/pkexec
-rwxr-sr-x. 1 root nobody    112000 Aug 12 16:04 /usr/bin/ssh-agent
---s--x--x. 2 root root      186800 Sep  1 10:53 /usr/bin/sudo
---s--x--x. 2 root root      186800 Sep  1 10:53 /usr/bin/sudoedit
-rws--x--x. 1 root root       28001 Oct 22 14:54 /usr/libexec/pt_chown
-rwx--s--x. 1 root utmp        9760 Dec  3  2009 /usr/libexec/utempter/utempter
-rwx--s--x. 1 root lock       15792 Dec  4  2009 /usr/sbin/lockdev
-rwxr-sr-x. 1 root postdrop  184904 May 26  2010 /usr/sbin/postdrop
-rwxr-sr-x. 1 root postdrop  213736 May 26  2010 /usr/sbin/postqueue
-rwsr-xr-x. 1 root root       19072 Aug 25 14:51 /usr/sbin/seunshare

S-bit removed

-rws--x--x. 1 root root     1932216 Aug 11 17:52 /usr/bin/Xorg  (security feature)

Not Installed by Default

---s--x---. 1 root stapusr    67840 Nov 16 04:43 /usr/bin/staprun

SL6 Development (last edited 2013-02-27 17:18:33 by StephanWiesand)