Differences between revisions 1 and 2
Revision 1 as of 2009-06-09 11:05:45
Size: 483
Editor: FelixFrank
Comment: first observations
Revision 2 as of 2009-06-09 11:12:50
Size: 1173
Editor: FelixFrank
Comment: brief problem description and extrapolation

Write performance

erinye2-vm2 writes a large file (16 GiB):

# /afs/ipp-garching.mpg.de/.cs/perftest/i386_rh90/write_test /afs/ifh.de/testsuite/testosd/testfile-large 0 17179869184
...
write of 17179869184 bytes took 187.631 sec.
close took 0.602 sec.
Total data rate = 89130 Kbytes/sec. for write 
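
As a cross-check of the figures above: the reported total data rate appears to include the close time. Dividing the byte count by write time plus close time reproduces the reported 89130 Kbytes/sec, assuming write_test counts 1 Kbyte as 1024 bytes (an assumption, the tool's source is not quoted here). A minimal sketch in Python; the function name is made up for illustration:

```python
# Values copied from the write_test output above.
BYTES_WRITTEN = 17179869184   # 16 GiB
WRITE_SECONDS = 187.631
CLOSE_SECONDS = 0.602


def rate_kbytes_per_sec(num_bytes, seconds, bytes_per_kbyte=1024):
    """Data rate in Kbytes/sec. 1 Kbyte = 1024 bytes is an assumption."""
    return num_bytes / bytes_per_kbyte / seconds


# Rate over write + close, and over the write phase alone.
total_rate = rate_kbytes_per_sec(BYTES_WRITTEN, WRITE_SECONDS + CLOSE_SECONDS)
write_only_rate = rate_kbytes_per_sec(BYTES_WRITTEN, WRITE_SECONDS)

print(f"write + close: {total_rate:.0f} Kbytes/sec")
print(f"write only:    {write_only_rate:.0f} Kbytes/sec")
```

Under the same assumption, the write phase alone comes out slightly higher, since the 0.602 sec close is left out of the denominator.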

Read performance

load impact

The load impact is considerable after all: upon start of a batch job, read throughput has been observed to drop from ~115 MBps to ~50 MBps.
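
To put the drop in relative terms, a small sketch in Python. Both inputs are the approximate values quoted above, so the result is only a rough figure; the function name is made up for illustration:

```python
def relative_drop(before, after):
    """Fraction by which throughput fell, e.g. 0.5 for a halving."""
    return (before - after) / before


# Approximate read throughput before and after the batch job started.
before_mbps = 115.0
after_mbps = 50.0

drop = relative_drop(before_mbps, after_mbps)
print(f"read throughput dropped by roughly {drop:.0%}")
```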

Problems

We observed cache corruption at least once. This is what five machines reported during one run:

wrong offset found: (0x1, 0xa71c0000) instead of (0x0, 0x15300000)
wrong offset found: (0x0, 0x73f90000) instead of (0x1, 0x22160000)
wrong offset found: (0x1, 0x7d120000) instead of (0x0, 0x94400000)
wrong offset found: (0x2, 0xab0f0000) instead of (0x0, 0xbd0000)
wrong offset found: (0x1, 0x50580000) instead of (0x0, 0x88900000) 

Apparently, all of them were running pre-r690 versions of the AFS+OSD client, so this seems to confirm that

  • there was indeed a problem with the vicep-access code on Lustre and
  • r690 may indeed have fixed that very problem.
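
For comparing further runs, the "wrong offset found" lines quoted above can be parsed mechanically. The following is a minimal sketch in Python, not part of the test suite. The meaning of the two components in each pair is not stated here, so the code treats them as opaque integers; `parse_line` and the 64 KiB alignment check are illustrative assumptions.

```python
import re

# Log lines copied from the report above.
LOG = """\
wrong offset found: (0x1, 0xa71c0000) instead of (0x0, 0x15300000)
wrong offset found: (0x0, 0x73f90000) instead of (0x1, 0x22160000)
wrong offset found: (0x1, 0x7d120000) instead of (0x0, 0x94400000)
wrong offset found: (0x2, 0xab0f0000) instead of (0x0, 0xbd0000)
wrong offset found: (0x1, 0x50580000) instead of (0x0, 0x88900000)
"""

PATTERN = re.compile(
    r"wrong offset found: \((0x[0-9a-f]+), (0x[0-9a-f]+)\) "
    r"instead of \((0x[0-9a-f]+), (0x[0-9a-f]+)\)"
)


def parse_line(line):
    """Return ((found_a, found_b), (expected_a, expected_b)) as ints, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    a, b, c, d = (int(group, 16) for group in match.groups())
    return (a, b), (c, d)


records = [rec for rec in map(parse_line, LOG.splitlines()) if rec is not None]

for found, expected in records:
    print(f"found ({found[0]:#x}, {found[1]:#x}), "
          f"expected ({expected[0]:#x}, {expected[1]:#x})")

# Check whether every second component is a multiple of 64 KiB (0x10000).
all_aligned = all(pair[1] % 0x10000 == 0 for rec in records for pair in rec)
print("all second components 64 KiB aligned:", all_aligned)
```

In the five lines above, every second component is a multiple of 0x10000. Whether that corresponds to a chunk or transfer size in the client is not established by this data.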

AfsOsd/LustreTestsZeuthen (last edited 2009-06-09 12:48:22 by FelixFrank)