IXP1200 Programming: The Microengine Coding Guide for the Intel IXP1200 Network Processor Family

The goal of this chapter is to extend the single-threaded concepts of the previous chapter to the multithreaded environment within a single microengine (ME). By the end of the chapter, you should understand how intra-ME resources such as threads and registers are allocated, how threads within an ME synchronize, and how they communicate.
In this chapter, we explore the issues involved in programming multiple threads of a single microengine. We begin with the basic performance overhead incurred when threads contend for their share of a single microengine, and then extend the single-threaded packet counting example from Chapter 5 to take advantage of more than one thread.
To accomplish this, we introduce the programming construct of shared variables and the non-preemptive thread arbiter. Shared variables and the non-preemptive arbiter are a fairly simple way of implementing intra-microengine communication and mutual exclusion.
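To make the idea concrete before turning to microengine code, the following is a minimal sketch in Python, not IXP1200 microcode. It models each thread as a generator in which `yield` stands in for a voluntary context swap, and it assumes a simple round-robin arbiter. The names `counting_thread`, `run_round_robin`, and `simulate` are hypothetical and chosen only for illustration. Because a thread gives up the processor only where it chooses to, the read-modify-write on the shared counter cannot be interleaved with another thread's update, so no lock is needed.

```python
# Hypothetical model of non-preemptive threads sharing one variable.
# Each "thread" is a generator; `yield` marks a voluntary context swap
# (for example, while waiting for the next packet). This is a simulation
# of the concept only, not IXP1200 microcode.

def counting_thread(shared, packets):
    """Count `packets` packets into the shared counter."""
    for _ in range(packets):
        yield  # swap out voluntarily while "waiting" for a packet
        # No yield occurs between the read and the write below, so no
        # other thread can run in the middle of this update.
        shared["count"] += 1


def run_round_robin(threads):
    """Assumed arbiter: resume each unfinished thread in turn."""
    ready = list(threads)
    while ready:
        for thread in list(ready):
            try:
                next(thread)  # run until the thread's next voluntary swap
            except StopIteration:
                ready.remove(thread)  # thread has finished its work


def simulate(num_threads=4, packets_per_thread=1000):
    """Run several counting threads against one shared counter."""
    shared = {"count": 0}
    threads = [counting_thread(shared, packets_per_thread)
               for _ in range(num_threads)]
    run_round_robin(threads)
    return shared["count"]
```

In this model the final count always equals the number of threads times the packets per thread. If a `yield` were placed between reading and writing the counter, updates could be lost, which is the hazard the non-preemptive arbiter lets the programmer avoid by construction.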
Simple as they are, these constructs have subtle performance implications, so we analyze their impact on performance. After all, performance is the real motivation for multithreaded programming. Knowing how much faster two threads are than one, or four threads are than two, allows us to make informed tradeoffs in our final code.
This chapter continues to use the simple counting application so that the focus remains on intra-microengine multithreading issues. Some optimizations that rely on memory latency hiding are ignored here; Chapter 8 addresses them. Of course, absolute performance measures depend on the type of packet processing performed as well...