Revision 1 as of 2009-06-09 11:05:45
Write performance
erinye2-vm2 writes a large file (16 GiB):
# /afs/ipp-garching.mpg.de/.cs/perftest/i386_rh90/write_test /afs/ifh.de/testsuite/testosd/testfile-large 0 17179869184
...
write of 17179869184 bytes took 187.631 sec.
close took 0.602 sec.
Total data rate = 89130 Kbytes/sec. for write
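The reported rate can be cross-checked by hand. The following is a minimal sketch, assuming that write_test divides the byte count by the sum of the write and close times and counts 1 Kbyte as 1024 bytes; the numbers above are consistent with that assumption, but the tool's source was not consulted. The function name is made up for illustration.

```python
def data_rate_kbytes_per_sec(nbytes, write_sec, close_sec):
    """Total data rate in Kbytes/sec (1 Kbyte = 1024 bytes, an assumption).

    The close time is included in the elapsed time, which is what makes
    the result match the figure printed by write_test above.
    """
    total_sec = write_sec + close_sec
    return nbytes / 1024 / total_sec


# Values taken from the write_test output above.
rate = data_rate_kbytes_per_sec(17179869184, 187.631, 0.602)
print(f"{rate:.0f} Kbytes/sec")
print(f"{rate / 1024:.1f} MiB/sec")
```

Dividing by the write time alone would give a slightly higher figure, so the time spent in close is apparently part of the reported total.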
Read performance
Load impact
The impact of load is considerable after all. Upon start of a batch job, read throughput has been observed to drop from ~115 MB/s to ~50 MB/s.
Problems
We observed cache corruption at least once. This is what five machines reported during one run:
wrong offset found: (0x1, 0xa71c0000) instead of (0x0, 0x15300000)
wrong offset found: (0x0, 0x73f90000) instead of (0x1, 0x22160000)
wrong offset found: (0x1, 0x7d120000) instead of (0x0, 0x94400000)
wrong offset found: (0x2, 0xab0f0000) instead of (0x0, 0xbd0000)
wrong offset found: (0x1, 0x50580000) instead of (0x0, 0x88900000)
All of them were apparently running pre-r690 versions of the AFS+OSD client, which suggests that
- there was indeed a problem with the vicep-access code on Lustre, and
- r690 may indeed have fixed that very problem.
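To check whether such messages still show up on clients running r690 or later, it may help to collect them from client logs automatically. The following is a rough sketch that only extracts the two number pairs from each "wrong offset found" message. The function and variable names are hypothetical, and the meaning of the two fields in each pair is deliberately left uninterpreted, since it is not documented here.

```python
import re

# Matches the message format seen in the observations above.
HEX = r"(0x[0-9a-fA-F]+)"
PATTERN = re.compile(
    rf"wrong offset found: \({HEX}, {HEX}\) instead of \({HEX}, {HEX}\)"
)


def parse_wrong_offsets(text):
    """Return a list of (found, expected) pairs, each a tuple of two ints."""
    results = []
    for match in PATTERN.finditer(text):
        a, b, c, d = (int(group, 16) for group in match.groups())
        results.append(((a, b), (c, d)))
    return results


sample = "wrong offset found: (0x1, 0xa71c0000) instead of (0x0, 0x15300000)"
for found, expected in parse_wrong_offsets(sample):
    print("found", found, "expected", expected)
```

Because the pattern is applied with finditer, the function works both on one message per line and on several messages run together on a single line.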