Algorithm Design for Networked Information Technology Systems

The key difficulty in debugging implementations of NIT systems stems from the fact that they are large-scale, distributed, and asynchronous. Because the geographically dispersed nodes behave autonomously and interact with one another asynchronously, at any instant each node may be in its own distinct execution state. This makes it difficult, once an incorrect execution result is observed, to trace the chain of cause and effect back to the source of the error. Actual experience shows that, as a result of these complex interactions, an error that appears at one node has often been caused at a very different node in the system.
This chapter describes a new approach, behavior modeling coupled with asynchronous distributed simulation [14], which has been developed and tested to address the challenges of debugging NIT systems. Under this approach, the constituent entities of the NIT system are first modeled in an appropriate language, generally C/C++ and possibly nVHDL [191] in the future. The level of detail of each model is determined by the desired resolution of behavior. The behavioral models and their interconnection topology are then integrated into the simulator, which ensures that fundamental principles, including causality, are honored. Next, the simulator is executed on a testbed network of workstations, interconnected as a loosely coupled parallel processor, and its results are cross-checked against the desired NIT system behavior to verify consistency and to estimate performance. This approach has several advantages. First, modeling and simulation offer a relatively inexpensive way of validating NIT systems...
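The text does not say how the simulator of [14] enforces causality, so the following is only an illustrative sketch of one well-known way to do it: the classic conservative rule associated with Chandy, Misra, and Bryant, swapped in here for illustration and not presented as the authors' method. Each simulated entity runs as a logical process that tracks, for every input channel, the timestamp of the last message received. It may safely execute an event only when that event's timestamp does not exceed the minimum of these channel clocks, because no earlier event can still arrive. A "null message" carries no event and only advances a channel clock, which lets a quiet neighbor unblock the receiver. All names here (`Event`, `Channel`, `LogicalProcess`, `receiveNull`) are hypothetical, and the sketch assumes each sender emits messages in non-decreasing timestamp order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <limits>
#include <string>
#include <vector>

// A timestamped event delivered on one input channel (hypothetical type).
struct Event {
    double timestamp;
    std::string payload;
};

// One input channel: a FIFO of pending events plus the channel clock,
// i.e. the timestamp of the last message (real or null) received on it.
struct Channel {
    std::deque<Event> queue;
    double clock = 0.0;
};

// A conservative logical process: it never executes an event that a
// not-yet-received message could precede, so causality is preserved.
class LogicalProcess {
public:
    explicit LogicalProcess(std::size_t numInputs) : inputs_(numInputs) {}

    // Deliver a real event on input channel `ch`.
    void receive(std::size_t ch, const Event& e) {
        assert(e.timestamp >= inputs_[ch].clock && "per-channel timestamps must not decrease");
        inputs_[ch].queue.push_back(e);
        inputs_[ch].clock = e.timestamp;
    }

    // Deliver a null message: a promise that nothing earlier than `t`
    // will arrive on channel `ch`. It only advances the channel clock.
    void receiveNull(std::size_t ch, double t) {
        assert(t >= inputs_[ch].clock && "per-channel timestamps must not decrease");
        inputs_[ch].clock = t;
    }

    // Execute, in timestamp order, every event that is provably safe.
    std::vector<Event> processSafeEvents() {
        std::vector<Event> processed;
        while (true) {
            // Safe horizon: the smallest clock over all input channels.
            double horizon = std::numeric_limits<double>::infinity();
            for (const Channel& c : inputs_) horizon = std::min(horizon, c.clock);

            // Find the channel whose head event has the smallest timestamp.
            Channel* next = nullptr;
            for (Channel& c : inputs_) {
                if (!c.queue.empty() &&
                    (next == nullptr || c.queue.front().timestamp < next->queue.front().timestamp)) {
                    next = &c;
                }
            }
            // Stop when nothing is pending or the earliest event is not yet safe.
            if (next == nullptr || next->queue.front().timestamp > horizon) break;

            localTime_ = next->queue.front().timestamp;
            processed.push_back(next->queue.front());
            next->queue.pop_front();
        }
        return processed;
    }

    double localTime() const { return localTime_; }

private:
    std::vector<Channel> inputs_;
    double localTime_ = 0.0;
};
```

For example, with two input channels, events at times 5 and 9 on channel 0 remain blocked while channel 1 has said nothing, since another event earlier than 5 could still arrive there. Once channel 1 delivers an event at time 7, the events at 5 and 7 become safe, and a later null message at time 10 on channel 1 releases the event at 9. The price of this conservatism is that processes may wait even when no conflict would have occurred; optimistic schemes trade that waiting for rollback.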