
HP DL380 G6 Benchmarked

posted Jul 2, 2009, 8:15 AM by Victor Zakharov   [ updated Jul 2, 2009, 6:14 PM ]

Introduction

Our company finally bought a powerful HP DL380 G6 server with 2 x Xeon 5540 (8 cores total). Before putting it into production, we installed 2 x 4 GB memory sticks (8 GB of memory in total) and 3 SCSI hard drives configured as RAID5. As you can imagine, the initial testing of this box was up to me.

Special attention in this article is given to virtualization performance. For us it was the primary reason for buying such a powerful box, and we certainly want to know whether we overpaid for it. If you are with us, let’s continue to the next chapter.

Test bed

Okay, let’s see what we have onboard:

2 x 4 GB DDR3-1333 HP memory (which are actually Samsung chips);

3 x HP SCSI hard drives, 146 GB, 10000 RPM;

2 x Intel Xeon 5540 2.53 GHz processor, 8 MB cache on each.

For benchmarking and burn-in tests I’m going to use Sandra Lite by SiSoftware. While I also like Everest by Lavalys, I’m not going to use it here for two reasons:

1.     Lack of proper support for this box (often crashes into BSOD just on startup).

2.     Lack of an adequate memory testing algorithm (it appears to measure only a single channel or a single memory bus, while this box has triple-channel memory and two memory buses, one per processor).

I installed Windows 2008 R2 RC, which is essentially Windows 7 Server, and created 4 virtual machines running Windows 2008 x64, each with 2 GB of allocated memory, 4 processor cores (the maximum allowed on Hyper-V), 100% resource consumption and equal relative weight. As you can see, the total number of cores required for this setup is 16, which is double the number of physical cores. But don’t worry about this for now.

To test N virtual machines working simultaneously, I run Sandra’s burn-in on N-1 of them and the actual performance tests on the remaining one. I measure overall CPU performance and memory bandwidth, because the other counters are not as important.

Results

I could say a lot about performance issues, but I’d rather put two summary tables below.

Table 1. Host measurements

 Counter \ Hyper-Threading    HT enabled (16 logical cores)    HT disabled (8 logical cores)
 CPU productivity             118.18 GOPS                      93.5 GOPS
 Memory bandwidth             15 GB/s                          15 GB/s


Table 2. Guest measurements (HT enabled), effectiveness in brackets.

 Counter \ Number of VMs    2                      3                      4
 CPU productivity           46.74 GOPS (79.1%)     29.58 GOPS (75.1%)     29.34 GOPS (99.3%)
 Memory bandwidth           6.35 GB/s (84.7%)      4.54 GB/s (90.8%)      4 GB/s (106.7%)
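
The bracketed percentages are consistent with a simple ratio: the per-VM score multiplied by the number of VMs, divided by the host score with HT enabled from Table 1. That formula is inferred from the numbers and is not stated explicitly in the article, so treat the following as a sketch; the function and variable names are made up for illustration.

```python
# Host scores from Table 1 (HT enabled) and per-VM scores from Table 2.
HOST_CPU_GOPS = 118.18
HOST_MEM_GBPS = 15.0

GUEST_CPU_GOPS = {2: 46.74, 3: 29.58, 4: 29.34}
GUEST_MEM_GBPS = {2: 6.35, 3: 4.54, 4: 4.0}


def effectiveness(per_vm_score: float, n_vms: int, host_score: float) -> float:
    """Percent of the host score recovered by n_vms guests running at once.

    Assumed formula (inferred from the tables, not given by the author):
    per-VM score * number of VMs / host score.
    """
    return 100.0 * per_vm_score * n_vms / host_score


for n in sorted(GUEST_CPU_GOPS):
    cpu = effectiveness(GUEST_CPU_GOPS[n], n, HOST_CPU_GOPS)
    mem = effectiveness(GUEST_MEM_GBPS[n], n, HOST_MEM_GBPS)
    print(f"{n} VMs: CPU {cpu:.1f}%, memory {mem:.1f}%")
```

Note that only one guest is actually measured in each run, so multiplying by the number of VMs assumes the guests running burn-in get roughly the same share.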

If you’re still curious, sustained disk transfer (read & write) with RAID5 is 35 MB/s, measured on the host. I was too lazy to check the virtual machines, but I don’t expect a miracle there.

Conclusion

The most effective option on this server is running 4 virtual machines. In this case CPU effectiveness approaches 100% and memory effectiveness (yes, magic happens here) goes above 100%.

Why?

Well, there is a certain logic to everything. Here HT gives us 16 logical cores, each able to execute one thread. Having 4 virtual machines, each given 4 cores at 100% utilization, should give us 16 fully occupied cores. And that’s what we actually see.

What about other options?

Remember, the host OS tries to allocate as many resources as possible. The best option is to provide physical cores to a virtual machine on request; generally, that is faster. However, when the total number of cores requested exceeds the number physically available, HT comes into play. Back to the numbers: when the guests need 12 cores (the case of 3 VMs), each is given logical cores, bringing effectiveness to 12/16, which is 75% and agrees with the results (table 2). Therefore, HT is underused.
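
The 12/16 argument can be written down as a small calculation. This is only a sketch of the reasoning in this paragraph: it assumes CPU effectiveness simply follows the share of logical cores that have a guest core assigned, and the names are hypothetical.

```python
LOGICAL_CORES = 16  # 8 physical cores with HT enabled
CORES_PER_VM = 4    # per-guest setting used in the test bed


def logical_core_occupancy(n_vms: int) -> float:
    """Share of logical cores requested by the guests, capped at 1.0."""
    requested = n_vms * CORES_PER_VM
    return min(requested, LOGICAL_CORES) / LOGICAL_CORES


# Measured CPU effectiveness from table 2, for comparison.
measured = {3: 0.751, 4: 0.993}

for n_vms, observed in measured.items():
    predicted = logical_core_occupancy(n_vms)
    print(f"{n_vms} VMs: predicted {predicted:.3f}, measured {observed:.3f}")
```

For 3 and 4 VMs the predicted shares (12/16 and 16/16) land within one percentage point of the measured values.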

With 2 virtual machines, the host OS uses only physical cores, so the HT feature brings no benefit. Overall per-core performance is higher, but the CPU is again underused.
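
A quick check against the tables supports this: the combined score of the 2 guests is almost exactly the host score with HT disabled. The sketch below only reuses the published numbers; the variable and function names are illustrative.

```python
HOST_HT_ON_GOPS = 118.18   # table 1, HT enabled
HOST_HT_OFF_GOPS = 93.5    # table 1, HT disabled
PER_VM_GOPS_2VMS = 46.74   # table 2, 2 VMs


def aggregate_score(per_vm_score: float, n_vms: int) -> float:
    """Combined score, assuming every guest performs like the measured one."""
    return per_vm_score * n_vms


combined = aggregate_score(PER_VM_GOPS_2VMS, 2)
print(f"2 VMs combined: {combined:.2f} GOPS")
print(f"share of HT-disabled host: {combined / HOST_HT_OFF_GOPS:.3f}")
print(f"share of HT-enabled host:  {combined / HOST_HT_ON_GOPS:.3f}")
```

In other words, 2 guests with 4 cores each behave like the 8 physical cores of the host running without HT.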

What’s next?

The above results mean we are most likely to migrate 4 virtual machines from our old servers to the new box, get optimal performance, save money and stay in a good mood knowing we did the right thing. Not bad, eh?
