Should I use the “Advanced ECC” BIOS mode on a Dell PowerEdge R710 with ECC DIMMs?

I have a Dell PowerEdge R710 with dual Intel Xeon E5503 CPUs. It has 96GB (12x8GB) ECC DIMMs.
In its BIOS, the memory configuration is set to “Advanced ECC”.

My question is: if my DIMMs are already ECC, does it make sense to enable this “Advanced ECC” mode in the BIOS, or should I switch to “Optimized”?

Dell describes these modes as:

Advanced ECC Mode
This mode uses two MCHs and “ties” them together to emulate a 128-bit data bus DIMM. This is primarily used to achieve Single Device Data Correction (SDDC) for DIMMs based on x8 DRAM technology. SDDC is supported with x4 based DIMMs in every memory mode. One MCH is completely un-utilized, and any memory installed in this channel will generate a warning message during POST.

Memory Optimized Mode
In this mode, the MCHs run independently of each other; for example, one can be idle, one can be performing a write operation, and the other can be preparing for a read operation. Memory may be installed in one, two, or three channels. To fully realize the performance benefit of the memory optimized mode, all three channels per CPU should be populated. This implies that some ‘atypical’ memory configurations, such as 3GB, 6GB, or 12GB, will yield the best performance. This is the recommended mode unless specific RAS features are needed.

Dell PowerEdge R710 Systems Hardware Owner’s Manual (PDF)

It does make a difference. Advanced ECC only makes sense if you need RAS (reliability, availability, and serviceability) features on x4 or x8 devices and understand the trade-offs for your needs. For more information, please refer to the Dell white paper Dell™ PowerEdge™ Servers 2009 – Memory.

In addition, the PowerEdge R710 Technical Guide provides the configuration and layout with R710-specific details (search for it; I don’t have enough reputation to post the link).

The important thing to be aware of is the difference between the ECC on the DIMMs and the “Advanced ECC” provided by Dell’s BIOS for Single Device Data Correction (SDDC). Both have a performance impact. ECC recovers from bit errors in the data written to the chips. SDDC goes one step further and organizes the bits so that an entire chip can fail and the data is still recoverable. For examples and details, see the SDDC documentation for the E7500 chipset.

The question is whether performance or reliability is your biggest concern for the specific purpose of the machine. If a chip failure would cause the loss of critical data or of the use of this machine, and the machine is not redundant in your implementation, then yes, Advanced ECC may be a good approach. However, it may affect performance, which may be more important to you.
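To put a rough number on the performance side of that trade-off, here is a back-of-the-envelope sketch. It assumes that theoretical peak bandwidth scales linearly with the number of active channels, and the DIMM speed is a placeholder value rather than something taken from your system. Real workloads usually see a much smaller difference than this theoretical ceiling.

```python
# Theoretical peak memory bandwidth per CPU socket in the two BIOS modes.
# Assumption: peak bandwidth scales linearly with active channels.

CHANNELS_PER_CPU = 3        # Dell's description: three channels per CPU
ADVANCED_ECC_CHANNELS = 2   # two channels tied together, one left unused
MEM_SPEED_MTS = 800         # hypothetical DIMM speed in MT/s; use your own


def peak_bandwidth_gbs(active_channels, mega_transfers, bus_bytes=8):
    """Peak GB/s = channels * mega-transfers/s * bytes per transfer / 1000."""
    return active_channels * mega_transfers * bus_bytes / 1000


optimized = peak_bandwidth_gbs(CHANNELS_PER_CPU, MEM_SPEED_MTS)
advanced = peak_bandwidth_gbs(ADVANCED_ECC_CHANNELS, MEM_SPEED_MTS)
loss_fraction = 1 - advanced / optimized

print(f"Optimized:    {optimized:.1f} GB/s peak per socket")
print(f"Advanced ECC: {advanced:.1f} GB/s peak per socket")
print(f"Theoretical ceiling is lower by {loss_fraction:.0%}")
```

Whether that ceiling matters depends on how memory-bandwidth-bound the workload is, so measuring your own application in both modes is the only reliable answer.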

I have implemented a single Microsoft SQL Server installation on a Dell PowerEdge server. If I can provide more help, please leave a comment and let me know.

I hope it helps.

Edit: Coverage gap / ECC implementation

Yes, even if you implement both at the same time, there is a coverage gap. Because you are specifically using a high-availability server cluster, IMHO you should use Advanced ECC. Compared with the advantages of the clustered setup, the performance impact is minimal. According to Crucial, ECC memory generally costs only about 2% in performance.

The gap is specific to the type of error that occurs and how each error is handled. In your specific situation, it should not translate into data loss. Since this is an enterprise DBMS, errors, concurrency issues, and so on are managed at the software level to prevent data loss. A correctly configured DBMS keeps a detailed history of changes, and the software that uses it can usually be set to roll back the transaction when a serious error occurs.

ECC implementation

ECC will try to correct any bit errors in memory reads and writes. However, if the error is more serious, even ECC cannot recover from it, resulting in potential data loss. There is more discussion of ECC in the Server Fault question “What is ECC ram and why is it better?”

According to Wikipedia’s article on ECC memory:

ECC memory maintains a memory system effectively free from single-bit errors…
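To make “corrects single-bit errors, but not more serious ones” concrete, here is a toy sketch of a single-error-correcting, double-error-detecting (SEC-DED) code. It uses an extended Hamming(8,4) code on 4 data bits purely for illustration; real ECC DIMMs protect 64 data bits with 8 check bits, and the function names here are my own.

```python
def hamming_encode(data_bits):
    """Encode 4 data bits into an 8-bit SEC-DED codeword.

    Index 0 holds the overall parity bit; indexes 1..7 are the classic
    Hamming positions (parity at 1, 2, 4; data at 3, 5, 6, 7).
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data_bits
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the parity of the whole word even
    return code


def hamming_decode(codeword):
    """Return (data_bits, status); status is ok, corrected or uncorrectable."""
    code = list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid word
    overall_parity = sum(code) % 2
    if syndrome == 0 and overall_parity == 0:
        status = "ok"
    elif overall_parity == 1:
        # Odd parity: treat as one flipped bit; the syndrome is its position
        # (a syndrome of 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but non-zero syndrome: two bits flipped.
        status = "uncorrectable"
    return [code[3], code[5], code[6], code[7]], status


data = [1, 0, 1, 1]
word = hamming_encode(data)

one_error = list(word)
one_error[5] ^= 1            # flip a single bit
print(hamming_decode(one_error))

two_errors = list(word)
two_errors[2] ^= 1           # flip two bits in the same word
two_errors[6] ^= 1
print(hamming_decode(two_errors)[1])
```

The single flipped bit is repaired and the original data comes back, while two flipped bits in the same word are only detected. That second case is the kind of “more serious” error that plain ECC cannot recover from.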

SDDC

If you refer to the E7500 chipset documentation above (note that the 55xx/56xx documentation from Intel requires a login/partnership, which is why I didn’t link it initially, but the idea is similar), it describes SDDC and how it is implemented. Basically, it uses a technique to organize the words written to memory so that a failing device contributes at most one bit error to each word, that is, each word remains recoverable from a single-bit error (as described above). This holds for every word, so it can recover from 4-bit errors on x4 devices (1 per word) and 8-bit errors on x8 devices (still 1 per word) by correcting each word.
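The bit-organization idea can be sketched as follows. This is a toy model of the description above, not Intel’s actual algorithm: the two layout functions and their names are hypothetical, and the only point is to count how many bits each ECC word loses when one whole DRAM chip fails.

```python
from collections import Counter


def naive_layout(chip, pin, width):
    """All data pins of one chip land in the same ECC word."""
    return (0, chip * width + pin)   # (word index, bit position in word)


def interleaved_layout(chip, pin, width):
    """Each data pin of a chip feeds a different ECC word."""
    return (pin, chip)


def bad_bits_per_word(layout, failed_chip, width):
    """Count the bits each ECC word loses if one entire chip fails."""
    hits = Counter()
    for pin in range(width):
        word, _bit = layout(failed_chip, pin, width)
        hits[word] += 1
    return dict(hits)


for width in (4, 8):                 # x4 and x8 DRAM devices
    naive = bad_bits_per_word(naive_layout, failed_chip=2, width=width)
    spread = bad_bits_per_word(interleaved_layout, failed_chip=2, width=width)
    print(f"x{width} naive:       worst word loses {max(naive.values())} bits")
    print(f"x{width} interleaved: worst word loses {max(spread.values())} bit")
```

With the naive layout a dead x4 chip puts 4 bad bits into one word, which single-bit correction cannot repair. With the interleaved layout the same failure costs each word exactly one bit, so every word stays within what single-bit correction can handle.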

Other errors (more bit errors, total memory failure, channel failure, bus failure, and so on) can still cause serious problems, but this is why you have a cluster and an enterprise DBMS.

In short, even with everything enabled, if there are more bit errors than the error correction algorithm can correct, errors will still get through; that is the gap in error coverage. But these cases should be very rare.
