Editing Load balancing (computing) (section)

===Non-hierarchical architecture, without knowledge of the system: [[work stealing]]===
Another technique to overcome scalability problems when the time needed for task completion is unknown is [[work stealing]].
 
The approach consists of assigning to each processor a certain number of tasks in a random or predefined manner, then allowing inactive processors to "steal" work from active or overloaded processors. Several implementations of this concept exist, defined by a task division model and by the rules determining the exchange between processors. While this technique can be particularly effective, it is difficult to implement because it is necessary to ensure that communication does not become the primary occupation of the processors instead of solving the problem.

In the case of atomic tasks, two main strategies can be distinguished, those where the processors with low load offer their computing capacity to those with the highest load, and those were the most loaded units wish to lighten the workload assigned to them. It has been shown<ref>{{cite journal |last1=Eager |first1=Derek L |last2=Lazowska |first2=Edward D |last3=Zahorjan |first3=John |title=A comparison of receiver-initiated and sender-initiated adaptive load sharing |journal=Performance Evaluation |date=1 March 1986 |volume=6 |issue=1 |pages=53–68 |doi=10.1016/0166-5316(86)90008-8 |language=en |issn=0166-5316}}</ref> that when the network is heavily loaded, it is more efficient for the least loaded units to offer their availability and when the network is lightly loaded, it is the overloaded processors that require support from the most inactive ones. This rule of thumb limits the number of exchanged messages.

In the case where one starts from a single large task that cannot be divided beyond an atomic level, there is a very efficient algorithm "Tree-Shaped computation",<ref>{{cite journal |last1=Sanders |first1=Peter |title=Tree Shaped Computations as a Model for Parallel Applications |journal=Workshop on Application Based Load Balancing (Alv '98), München, 25. - 26. März 1998 - Veranst. Vom Sonderforschungsbereich 342 "Werkzeuge und Methoden für die Nutzung Paralleler Rechnerarchitekturen". Ed.: A. Bode |date=1998 |page=123 |doi=10.5445/ir/1000074497}}</ref> where the parent task is distributed in a work tree.

====Principle====
Initially, many processors have an empty task, except one that works sequentially on it. Idle processors issue requests randomly to other processors (not necessarily active). If the latter is able to subdivide the task it is working on, it does so by sending part of its work to the node making the request. Otherwise, it returns an empty task. This induces a [[tree structure]]. It is then necessary to send a termination signal to the parent processor when the subtask is completed so that it, in turn, sends the message to its parent until it reaches the root of the tree. When the first processor, i.e. the root, has finished, a global termination message can be broadcast. In the end, it is necessary to assemble the results by going back up the tree.

====Efficiency====
The efficiency of such an algorithm is close to the prefix sum when the job cutting and communication time is not too high compared to the work to be done. To avoid too high communication costs, it is possible to imagine a list of jobs on [[shared memory]]. Therefore, a request is simply reading from a certain position on this shared memory at the request of the master processor.