IXP1200 Programming: The Microengine Coding Guide for the Intel IXP1200 Network Processor Family

The goal of this chapter is to convey the techniques available for inter-microengine programming. We do this by completing the packet counting examples started in Chapters 5 and 6. By the end of the chapter, you should understand inter-microengine synchronization and communication techniques, including atomic memory operations, SRAM CAM locks, and inter-thread signals.
In Chapter 5, we built a single-threaded example for receiving and counting packets, focusing on the receive reassembly code. In Chapter 6, we extended this example to multiple threads on a single microengine, focusing on the microengine thread arbiter and shared variables. In this chapter, we complete our examination of this example code by exploring how it can be modified to execute on multiple microengines.
As in Chapter 6, the underlying motivation for programming with multiple microengines is better performance. As we explore the various inter-microengine communication and synchronization techniques, we will check our progress by measuring the relative performance increase (or decrease!) that each technique produces.
We have already determined that two threads are faster than four for our counting example, so adding more threads may not improve the performance numbers. As it turns out, without using optimization techniques like those covered in Chapters 8 and 10, the best multiple-microengine performance we can achieve for our example code is about the same as that of the single-threaded code of Chapter 5. However, the techniques of this chapter result in significant performance increases when the application is more demanding than simple packet...