LustreTestsZeuthen

Write performance

erinye2-vm2 writes a large file (16 GiB):

# /afs/ipp-garching.mpg.de/.cs/perftest/i386_rh90/write_test /afs/ifh.de/testsuite/testosd/testfile-large 0 17179869184
...
write of 17179869184 bytes took 187.631 sec.
close took 0.602 sec.
Total data rate = 89130 Kbytes/sec. for write
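
The reported rate can be cross-checked by hand. The sketch below assumes that write_test divides the byte count by the sum of the write and close times, and that "Kbytes" means 1024 bytes. Both are inferences from the numbers above, not documented behaviour of the tool. Under these assumptions the result rounds to the reported 89130 Kbytes/sec.

```python
# Cross-check of the data rate reported by write_test.
# ASSUMPTIONS (not confirmed by the tool's documentation): the rate covers
# write time plus close time, and 1 Kbyte = 1024 bytes.
bytes_written = 17179869184   # 16 GiB, taken from the command line above
write_seconds = 187.631
close_seconds = 0.602


def data_rate_kbytes(nbytes, seconds):
    """Return the transfer rate in Kbytes (1024 bytes) per second."""
    return nbytes / 1024 / seconds


rate = data_rate_kbytes(bytes_written, write_seconds + close_seconds)
print(f"{rate:.0f} Kbytes/sec")
```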

Read performance

Load impact

The impact of load is considerable after all. When a batch job starts, read throughput has been observed to drop from ~115 MB/s to ~50 MB/s.

Things to find out:

What role does CPU load play?
What role does system load play?
Is the amount of time jobs spend in kernel context important?
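
One possible way to approach these questions is to sample the aggregate "cpu" line of /proc/stat before and after a read test and compare the share of time spent in user and kernel context. The following is a minimal sketch, not the method used for the measurements above. It is Linux-only, the helper names (parse_cpu_line, cpu_shares, read_cpu_line) are hypothetical, and the field order is taken from proc(5).

```python
# Sketch: share of CPU time spent in user vs. kernel context between two
# samples of the aggregate "cpu" line of /proc/stat (Linux only).
# Field order per proc(5): user nice system idle iowait irq softirq steal ...


def parse_cpu_line(line):
    """Return the first eight jiffy counters of a /proc/stat 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("not the aggregate cpu line")
    values = [int(v) for v in fields[1:9]]
    values += [0] * (8 - len(values))   # older kernels report fewer fields
    return values


def cpu_shares(before, after):
    """Return (user, kernel, iowait) fractions of the elapsed CPU time."""
    old = parse_cpu_line(before)
    new = parse_cpu_line(after)
    delta = [n - o for o, n in zip(old, new)]
    total = sum(delta)
    if total == 0:
        return (0.0, 0.0, 0.0)
    user = (delta[0] + delta[1]) / total                # user + nice
    kernel = (delta[2] + delta[5] + delta[6]) / total   # system + irq + softirq
    iowait = delta[4] / total
    return (user, kernel, iowait)


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate cpu line from the running system."""
    with open(path) as f:
        return f.readline()
```

Taking one sample with read_cpu_line() before a read test and one after it, once with and once without a batch job on the client, would show whether the throughput drop coincides with a higher user or kernel share.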

Problems

We observed cache corruption at least once. This is what five machines reported during one run:

wrong offset found: (0x1, 0xa71c0000) instead of (0x0, 0x15300000)
wrong offset found: (0x0, 0x73f90000) instead of (0x1, 0x22160000)
wrong offset found: (0x1, 0x7d120000) instead of (0x0, 0x94400000)
wrong offset found: (0x2, 0xab0f0000) instead of (0x0, 0xbd0000)
wrong offset found: (0x1, 0x50580000) instead of (0x0, 0x88900000)

Apparently, all of them were running pre-r690 versions of the AFS+OSD client, so this seems to confirm that

there was indeed a problem with the vicep-access code on Lustre and
r690 may indeed have fixed that very problem.
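
To compare such messages across machines or runs, the "wrong offset found" lines can be parsed into numbers. The sketch below is illustrative only. The source does not say what the two values in each pair mean; treating them as the high and low 32-bit words of a 64-bit file offset is an assumption, and the function names are hypothetical.

```python
import re

# Matches lines such as:
#   wrong offset found: (0x1, 0xa71c0000) instead of (0x0, 0x15300000)
PATTERN = re.compile(
    r"wrong offset found: \((0x[0-9a-f]+), (0x[0-9a-f]+)\) "
    r"instead of \((0x[0-9a-f]+), (0x[0-9a-f]+)\)"
)


def parse_offset_error(line):
    """Return ((found pair), (expected pair)) as integers, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    a, b, c, d = (int(g, 16) for g in m.groups())
    return (a, b), (c, d)


def as_byte_offset(pair):
    """ASSUMPTION: pair is (high 32 bits, low 32 bits) of a 64-bit offset."""
    high, low = pair
    return (high << 32) | low


line = "wrong offset found: (0x1, 0xa71c0000) instead of (0x0, 0x15300000)"
found, expected = parse_offset_error(line)
print(hex(as_byte_offset(found)), hex(as_byte_offset(expected)))
```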

AfsOsd/LustreTestsZeuthen (last edited 2009-06-09 12:48:22 by FelixFrank)