Review: Opteron for Servers

06.05.2003 by Jörg Luther
AMD's Opteron challenges Intel in the entry-level and midrange server segment. Newisys' dual-Opteron machine has to prove itself against Xeon-based systems from Dell and IBM.

It was late, but not too late: just before the Easter holidays, a review system from Newisys arrived in the tecChannel labs, fresh from Austin, Texas. Despite its 1U form factor, the Newisys 2100 (codenamed "Kephri") can hardly be called a pizza box: reaching almost 30 inches deep into the rack, this server will not fit into every environment.

Apart from two 1800 MHz CPUs (model number Opteron 244), our review system features 2 GBytes of registered DDR333 SDRAM and two 36 GByte hard drives with an Ultra320 SCSI interface, which can be configured as a mirror set. These drives are controlled by an LSI Logic chip on the motherboard, which also carries a Trident graphics controller and two Broadcom Gigabit Ethernet controllers. The system can be expanded through two PCI-X slots, each on its own dedicated bus. There is room for one full-size and one half-size card.

The server is managed by a dedicated service processor which, among other tasks, monitors the board's numerous temperature, voltage and fan sensors. This processor communicates over a dedicated Fast Ethernet port via HTTPS or SSL, so the administrator can conveniently shut down or reboot the system remotely.

The competition

AMD positions the Opteron, despite its 64-bit capabilities, as a competitor to Intel's Xeon, and even Intel shares this view. We therefore pit the Newisys machine against two servers based on the Pentium 4-based Xeon: IBM's entry-level xSeries 225 and Dell's midrange PowerEdge 4600.

Our IBM xSeries 225 review machine features two Xeon processors at 2.40 GHz and 2 GBytes of dual-channel DDR266 SDRAM. Only one Ultra320 SCSI drive spins in the system, but there is room for six. The motherboard also carries a graphics chip and a Gigabit Ethernet controller; expansion cards go into one 32-bit PCI slot and four 64-bit PCI slots. Sadly, the x225 lacks some typical server features, such as a management chipset, restricted physical access to the drives and redundant components. The overall feature set is reminiscent of a workstation: the motherboard even has an AGP slot, which is highly unusual for a server.

Dell's PowerEdge 4600 is the complete opposite of the IBM machine. Designed as a robust workgroup solution, it is controlled by a ServerWorks management chipset and offers plenty of high-availability features, including fully redundant power supplies. A dual-channel RAID controller connects up to ten Ultra320 SCSI drives, for a total capacity of close to 1.5 TBytes. Seven PCI slots, six of them PCI-X, are available for expansion. Two Xeon processors at 2800 MHz each should provide plenty of horsepower. In our review system, these CPUs can access 4 GBytes of registered DDR200 memory.

Test setup

We installed the latest version of SuSE Linux Enterprise Server 8 (SLES8) for x86 on all three systems. The Nuremberg-based company kindly supplied us with the "gold code" of the Opteron version of SLES8 for the Newisys machine; like the 64-bit CPU itself, this version has been available since April 22. This setup allows testing in identical 32-bit environments and with a nearly source-identical 64-bit OS. The IBM and Dell Xeon machines are tested with their stock 2.4 GHz and 2.8 GHz CPUs respectively. As an additional look into the future, we also plugged two brand-new 3.06 GHz Xeons into the IBM x225.

We chose the benchmarks from a range of open-source test suites that measure performance under medium and high load. Unixbench is a port of the well-known "Byte" benchmark; we selected six SMP-capable tests from this suite. dbench comes from the Samba developers' toolbox. Using scripted real-world network data, it simulates the load of a large number of clients accessing the server's file system. lmbench is used for some basic system bandwidth measurements. To evaluate performance under high load in a multiuser environment, we use "Suite VII", part of SCO's AIM benchmarks.

All benchmarks are compiled on the target machines. The AIM tests require some minor code polishing to run flawlessly. We also remove all disk-specific tests from the benchmark suites to compensate for the machines' different storage systems. Dell's PowerEdge 4600 has twice the memory of the other two machines, so we limit its OS to 2 GBytes of memory using the appropriate Linux kernel boot parameter. Before every run, we reboot the machines to clear memory and file system of left-overs from the previous benchmark.
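The article does not say which boot parameter was used; on Linux the standard way to cap usable RAM is the `mem=` parameter (for example `mem=2048M` on the kernel command line), so treat that as an assumption. A minimal Python sketch, assuming a Linux-style `/proc/meminfo`, for checking that such a limit actually took effect; the helper names are hypothetical:

```python
def mem_total_kib(meminfo_text: str) -> int:
    """Return MemTotal (in KiB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Typical line: "MemTotal:        2061580 kB"
            return int(line.split()[1])
    raise ValueError("no MemTotal line found")


def limit_in_effect(meminfo_text: str, limit_mib: int = 2048) -> bool:
    """True if the kernel reports no more RAM than the assumed mem= limit.

    MemTotal usually sits a little below the limit because the kernel
    reserves some memory for itself, hence the <= comparison.
    """
    return mem_total_kib(meminfo_text) <= limit_mib * 1024
```

On the test machine, one would pass the contents of `/proc/meminfo` to `limit_in_effect()` after booting with the limit in place.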

First looks

While setting up the OS and running some ad-hoc tests, we already gathered a few impressions of the dual-Opteron machine's performance. Especially under massive load, the Newisys machine seemed much more responsive than its Xeon-powered competitors, so AMD's claim that the Opteron is competitive with Intel's latest CPUs did not seem far-fetched. But some minor flaws of the Newisys were also clearly visible.

According to the integrated management processor, the system runs into thermal problems under high load. Surprisingly, this does not affect the hard-working CPUs: they stay within their thermal envelope at all times. Instead, the PCI-X bridge operates at the limit of its thermal specification. This is all the stranger because the chip has nothing to do: all expansion slots are unpopulated.

To be fair, the Newisys board is still a developer sample. Before mass production, however, Newisys has to fix this problem; otherwise, failures of these rack-mounted devices seem inevitable. During our tests, which sometimes max out the load for hours (up to a system load of 2500), the heat was barely acceptable.

AIM Suite VII

Handling a heavy load of diverse tasks is what a server is built for, which is why we use SCO's AIM Suite VII. This benchmark creates a mixed workload featuring arithmetic, I/O, process creation and file system operations, and measures the number of requests processed per minute. In our tests, we left out the disk-intensive tasks, both to compensate for the machines' different storage architectures and to evaluate CPU performance specifically.
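AIM's own harness is not reproduced here. To illustrate the metric only, a minimal Python sketch (POSIX only, hypothetical names and workload) that runs a small job in N parallel processes for a fixed time and reports aggregate jobs per minute:

```python
import os
import time


def mixed_job() -> int:
    """One small hypothetical job: integer arithmetic plus a string conversion."""
    total = 0
    for i in range(2000):
        total += (i * i) % 7
    return len(str(total))


def jobs_per_minute(workers: int, duration: float = 1.0) -> float:
    """Fork `workers` processes, let each run mixed_job() in a loop for
    `duration` seconds, and return the aggregate jobs per minute."""
    read_fd, write_fd = os.pipe()
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:  # child process
            os.close(read_fd)
            count = 0
            deadline = time.monotonic() + duration
            while time.monotonic() < deadline:
                mixed_job()
                count += 1
            # Short writes (< PIPE_BUF bytes) to a pipe are atomic,
            # so lines from different children never interleave.
            os.write(write_fd, f"{count}\n".encode())
            os._exit(0)
        pids.append(pid)
    os.close(write_fd)  # parent keeps only the read end
    with os.fdopen(read_fd) as reader:
        # read() returns once every child has exited and closed its write end
        counts = [int(tok) for tok in reader.read().split()]
    for pid in pids:
        os.waitpid(pid, 0)
    return sum(counts) * 60.0 / duration
```

Comparing `jobs_per_minute(1)` with larger worker counts shows throughput rising until the CPUs saturate, which is the kind of curve the AIM results describe.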

The result could not be clearer: whether in 32-bit or 64-bit mode, Opteron significantly beats the Xeon-based competition across the whole load spectrum. From 48 parallel tasks upward, the Xeon machines gradually reach their limit, while Opteron maintains its higher performance even as more load is thrown at it.

lmbench

The lmbench results show why Opteron performs so well under high task loads. This open-source benchmark takes a large number of very low-level system measurements, including the timing of process creation.

As the results show, Opteron handles a simple fork() followed by exit() almost twice as fast as the Xeon machines. It also handles more complex process-creation requests significantly faster than the Xeons: for instance, the AMD CPU processes a fork() that launches a shell 20 per cent faster than the Intel machines.
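lmbench's process tests time patterns such as "fork+exit" and "fork+/bin/sh -c". A minimal Python sketch of these two patterns, assuming a POSIX system with `/bin/sh`; it is not lmbench's C code, and interpreter overhead inflates the absolute times, so only relative comparisons on one machine mean anything:

```python
import os
import time


def fork_exit_us(iterations: int = 200) -> float:
    """Average microseconds for fork() plus an immediate exit in the child,
    including the parent's waitpid()."""
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            os._exit(0)        # child: exit at once, skipping Python cleanup
        os.waitpid(pid, 0)     # parent: reap the child
    return (time.perf_counter() - start) / iterations * 1e6


def fork_sh_us(iterations: int = 50) -> float:
    """Average microseconds for fork() plus starting a shell that exits at once."""
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            try:
                os.execv("/bin/sh", ["sh", "-c", "exit 0"])
            finally:
                os._exit(127)  # only reached if exec fails
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / iterations * 1e6
```

The shell variant is much slower because the child must load and initialize a new program image, which is why the article treats it as the "more complex" case.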

Only the Xeon at 3.06 GHz, clocked about 70 per cent higher than the Opteron, can keep up with it on the simpler calls. But when the AMD processor runs in 64-bit mode, even the fastest Xeon is beaten.

Unixbench I

The picture drawn so far becomes even clearer with Unixbench. This suite was ported to Linux from the popular "Byte" benchmark. Apart from measuring some fine-grained operations, it includes several tests that put even SMP machines under high load. Unixbench's scores state the number of loops completed per time unit, so higher is better.

Once again, Opteron creates simple processes much faster than Xeon. When shell scripts run with one, eight or sixteen processes running concurrently, the 1.8 GHz Opteron no longer wins as clearly; the 3.06 GHz Xeon in particular is right on its heels here. In 64-bit mode, however, Opteron still gains 10 to 17 per cent.
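Unixbench's shell-script test runs a small text-processing script in several parallel copies. The sketch below imitates that pattern in Python; the script (`SCRIPT`) and the definition of one loop as one full round of parallel copies are hypothetical stand-ins, not Unixbench's actual workload:

```python
import subprocess
import time

# Hypothetical stand-in for Unixbench's workload: a tiny sort/grep pipeline.
SCRIPT = "printf 'b\\na\\nc\\n' | sort | grep -c a > /dev/null"


def shell_loops_per_minute(concurrent: int, duration: float = 1.0) -> float:
    """Repeatedly start `concurrent` copies of SCRIPT in parallel, wait for
    all of them, and return completed rounds ("loops") per minute."""
    loops = 0
    start = time.monotonic()
    deadline = start + duration
    while time.monotonic() < deadline:
        procs = [subprocess.Popen(["/bin/sh", "-c", SCRIPT]) for _ in range(concurrent)]
        for proc in procs:
            proc.wait()
        loops += 1
    elapsed = time.monotonic() - start
    return loops * 60.0 / elapsed
```

Running it with `concurrent` set to 1, 8 and 16 mirrors the three configurations in the article; each round mostly measures process creation and scheduling, not the trivial script itself.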

Unixbench II

Some of the other Unixbench tests demonstrate that 64-bit code is not necessarily faster than 32-bit code. In pipe-based context switching, Opteron in 64-bit mode falls behind its 32-bit counterpart because of the higher overhead.
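The pipe-based context-switching test bounces data between two processes through a pair of pipes, so each side spends most of its time blocked waiting for the other. A minimal Python sketch of that ping-pong pattern (POSIX only; the function name is hypothetical, and Python's interpreter overhead dominates the absolute figures):

```python
import os
import time


def pipe_round_trips_per_second(round_trips: int = 2000) -> float:
    """Ping-pong one byte between parent and child over two pipes.

    On a single CPU each round trip forces at least two context switches,
    because each process blocks in read() until the other has written.
    """
    p2c_r, p2c_w = os.pipe()  # parent -> child
    c2p_r, c2p_w = os.pipe()  # child -> parent
    pid = os.fork()
    if pid == 0:  # child: echo every byte straight back
        os.close(p2c_w)
        os.close(c2p_r)
        for _ in range(round_trips):
            os.write(c2p_w, os.read(p2c_r, 1))
        os._exit(0)
    os.close(p2c_r)
    os.close(c2p_w)
    start = time.perf_counter()
    for _ in range(round_trips):
        os.write(p2c_w, b"x")
        os.read(c2p_r, 1)
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    os.close(p2c_w)
    os.close(c2p_r)
    return round_trips / elapsed
```

Because almost no computation happens between switches, the result is dominated by kernel entry and exit costs, which helps explain why wider 64-bit state can make this test slower rather than faster.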

A classic test of a system's arithmetic capabilities is calculating the square root of 2 to 99 decimal places. Here the Intel processors' results scale with clock speed: at 3.06 GHz, the IBM machine outperforms the Dell system with its two 2.8 GHz Xeons, while the IBM machine at 2.4 GHz comes last. Even in 32-bit mode, Opteron outperforms the two lower-clocked Xeon configurations. In 64-bit mode it gains around 33 per cent and even beats the 3.06 GHz Xeon by almost 28 per cent.
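Unixbench traditionally runs this calculation through the Unix `dc` calculator. The same value can be computed with exact integer arithmetic: scale 2 by 10^198, take the integer square root, and insert the decimal point. A minimal Python sketch (the function name is hypothetical, and it is not the benchmark's code path):

```python
from math import isqrt


def sqrt2_digits(places: int = 99) -> str:
    """sqrt(2) truncated to `places` decimal places.

    isqrt(2 * 10**(2*places)) == floor(sqrt(2) * 10**places), so the result
    is exact; the integer part of sqrt(2) is a single digit.
    """
    digits = str(isqrt(2 * 10 ** (2 * places)))
    return digits[0] + "." + digits[1:]


print(sqrt2_digits(5))  # 1.41421
```

Arbitrary-precision work like this consists of long runs of multiplications and divisions on machine words, so it benefits directly from wider 64-bit registers.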

With mixed workloads, Opteron behaves quite similarly, even if its performance advantages are less dramatic here. In the C compiler throughput test, Opteron is still 8 per cent faster than any Xeon, regardless of which mode the CPU is in.

dbench

The dbench test suite was developed by Samba team member Andrew Tridgell. The Samba developers use it to evaluate the performance of file systems under load, and especially that of Samba, of course. For this review, we leave out the Samba-specific parts of the suite and use only dbench itself. It simulates a large number of clients accessing the server's I/O system by generating requests from the data of a real NetBench test run. The advantage is that the file system can be put under high load without having to set up hundreds of clients.
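The idea behind dbench, many simulated clients replaying a scripted sequence of file operations on one machine, can be sketched in a few lines of Python. This is a heavily simplified illustration: the operation list `OPS` and all function names are hypothetical, and real dbench replays a load file derived from a NetBench trace with many more operation types:

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, heavily simplified operation script (operation, bytes).
OPS = [("write", 64 * 1024), ("read", 64 * 1024), ("write", 16 * 1024)]


def simulated_client(path: str, rounds: int) -> int:
    """Replay OPS `rounds` times against a private file; return bytes moved."""
    moved = 0
    for _ in range(rounds):
        for op, size in OPS:
            if op == "write":
                with open(path, "wb") as f:
                    moved += f.write(b"\0" * size)
            else:
                with open(path, "rb") as f:
                    moved += len(f.read(size))
    os.remove(path)
    return moved


def run_clients(num_clients: int, rounds: int = 20) -> float:
    """Run num_clients simulated clients concurrently; return throughput in MB/s."""
    with tempfile.TemporaryDirectory() as workdir:
        paths = [os.path.join(workdir, f"client{i}.dat") for i in range(num_clients)]
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=num_clients) as pool:
            total = sum(pool.map(simulated_client, paths, [rounds] * num_clients))
        elapsed = time.perf_counter() - start
    return total / elapsed / 1e6
```

Since the files stay small, most operations are served from the page cache, so a sketch like this stresses the kernel's file-system and process-handling paths rather than the disks, similar to how the review excluded disk-specific effects.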

Once again, Opteron delivers a pleasant surprise in this test. The AMD system stays responsive even when huge numbers of clients throw requests at it, and it maintains a dramatically higher throughput than the Xeon machines, in both its 32-bit and 64-bit modes.

In this test, which stresses both the file system and process handling, all Xeon configurations perform at about the same level. The Dell machine may be clocked higher, but it is slowed down by its slower memory (DDR200 versus DDR266 in the IBM machine). Even the 3.06 GHz Xeon gains only a little performance at lower loads and otherwise behaves like the slower Intel CPUs for the rest of the test.
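For a rough sense of the memory gap between the Dell and IBM machines, here is the peak transfer rate of a single 64-bit (8-byte) DDR channel, assuming standard modules: DDR200 runs at a 100 MHz clock and DDR266 at 133.3 MHz, both transferring twice per clock cycle. The calculation ignores channel count, interleaving and latency, which the article does not detail for the Dell system:

```latex
\begin{aligned}
B_{\text{peak}} &= f_{\text{clk}} \times 2 \times 8\,\text{B} \\
B_{\text{DDR200}} &= 100\,\text{MHz} \times 2 \times 8\,\text{B} = 1.6\,\text{GB/s} \\
B_{\text{DDR266}} &= 133.3\,\text{MHz} \times 2 \times 8\,\text{B} \approx 2.13\,\text{GB/s} \\
B_{\text{DDR266}} / B_{\text{DDR200}} &\approx 1.33
\end{aligned}
```

Per channel, DDR266 thus offers about a third more peak bandwidth than DDR200.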

Conclusion

AMD's decision to bring the Hammer architecture to market in its server flavor first proves right for two reasons. First, Sledgehammer, aka Opteron, clearly shows its advantages, especially in a server environment. Second, thanks to its architectural advantages, Opteron can compete with the significantly higher-clocked Intel products in this segment for quite some time, despite its relatively low clock speed of 1.8 GHz.

However, one can expect Intel to upgrade its Xeon CPUs in several ways in the near future. Even now, the new 3.06 GHz Xeon can keep up with the 1.8 GHz Opteron in 32-bit mode, and clock speeds will keep rising. In addition, Intel will soon enlarge the Xeon's L2 cache, which promises significant performance gains. Whether AMD can maintain Opteron's lead seems to depend on how well its Fab 30 in Dresden copes with the oft-rumoured problems with the CPU's transistor design. Higher yields of higher-clocked Opterons may also help here.

But clock speed aside, Opteron offers one highly significant advantage as a server processor that the competition cannot deliver. Unlike Intel, AMD does not force its customers to choose between 32-bit and 64-bit CPUs but gives them both options. Even as a 32-bit server CPU, Opteron outclasses the Xeon. On top of that, it offers a smooth migration path to the world of 64-bit computing: depending on your requirements, you can run 32-bit applications while the processor is in 64-bit mode, or switch completely to the new format. (Jörg Luther, Translation: Nico Ernst)