===Architecture===
The Blue Gene/L architecture was an evolution of the QCDSP and [[QCDOC]] architectures. Each Blue Gene/L Compute or I/O node was a single [[Application-specific integrated circuit|ASIC]] with associated [[Dynamic random access memory|DRAM]] memory chips. The ASIC integrated two 700 MHz [[PowerPC 440]] embedded processors, each with a double-pipeline double-precision [[floating-point unit|Floating-Point Unit]] (FPU), a [[CPU cache|cache]] sub-system with a built-in DRAM controller, and the logic to support multiple communication sub-systems. The dual FPUs gave each Blue Gene/L node a theoretical peak performance of 5.6 [[FLOPS|GFLOPS (gigaFLOPS)]]. The two CPUs were not [[Cache coherency|cache coherent]] with one another. Compute nodes were packaged two per compute card, with 16 compute cards (thus 32 nodes) plus up to 2 I/O nodes per node board. A cabinet/rack contained 32 node boards.<ref>{{cite web|url=https://asc.llnl.gov/computing_resources/bluegenel/configuration.html|title=BlueGene/L Configuration|first=Lynn|last=Kissel|website=asc.llnl.gov|access-date=13 October 2017|archive-date=17 February 2013|archive-url=https://web.archive.org/web/20130217032440/https://asc.llnl.gov/computing_resources/bluegenel/configuration.html|url-status=dead}}</ref>

By integrating all essential sub-systems on a single chip and using low-power logic, each Compute or I/O node dissipated only about 17 watts (including DRAMs). The low power per node allowed aggressive packaging of up to 1024 compute nodes, plus additional I/O nodes, in a standard [[19-inch rack]], within reasonable limits on electrical power supply and air cooling. The resulting system metrics, in terms of [[FLOPS per watt]], FLOPS per m<sup>2</sup> of floorspace and FLOPS per unit cost, allowed scaling up to very high performance. With so many nodes, component failures were inevitable, so the system was able to electrically isolate faulty components, down to a granularity of half a rack (512 compute nodes), to allow the machine to continue to run.

Each Blue Gene/L node was attached to three parallel communications networks: a [[dimension|3D]] [[torus interconnect|toroidal network]] for peer-to-peer communication between compute nodes, a collective network for collective communication (broadcasts and reduce operations), and a global interrupt network for [[Barrier (computer science)|fast barriers]]. The I/O nodes, which ran the [[Linux]] [[operating system]], provided communication to storage and external hosts via an [[Ethernet]] network and handled filesystem operations on behalf of the compute nodes. A separate, private Ethernet management network provided access to any node for configuration, [[booting]] and diagnostics.

To allow multiple programs to run concurrently, a Blue Gene/L system could be partitioned into electronically isolated sets of nodes. The number of nodes in a partition had to be a positive [[integer]] power of 2, with at least 2<sup>5</sup> = 32 nodes. To run a program on Blue Gene/L, a partition of the computer first had to be reserved. The program was then loaded and run on all the nodes within the partition, and no other program could access nodes within the partition while it was in use. Upon completion, the partition's nodes were released for future programs to use.
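This size constraint amounts to a simple arithmetic test. The following sketch expresses it in plain C; the helper function is hypothetical and illustrative only, not part of any Blue Gene/L tool.

<syntaxhighlight lang="c">
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: a node count is a valid partition size if it is a
   power of two and at least 2^5 = 32, matching the constraint above. */
static bool valid_partition_size(unsigned long nodes)
{
    return nodes >= 32 && (nodes & (nodes - 1)) == 0;
}

int main(void)
{
    unsigned long sizes[] = {16, 32, 48, 512, 1024, 65536};
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
        printf("%lu nodes: %s\n", sizes[i],
               valid_partition_size(sizes[i]) ? "valid partition" : "not allowed");
    return 0;
}
</syntaxhighlight>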
Blue Gene/L compute nodes used a minimal [[operating system]] supporting a single user program. Only a subset of [[POSIX]] calls was supported, and only one process could run at a time on a node in co-processor mode, or one process per CPU in virtual mode. Programmers needed to implement [[green threads]] in order to simulate local concurrency. Application development was usually performed in [[C (programming language)|C]], [[C++]], or [[Fortran]] using [[Message Passing Interface|MPI]] for communication. However, some scripting languages such as [[Ruby (programming language)|Ruby]]<ref>{{Cite web|title=Compute Node Ruby for Bluegene/L|website=www.ece.iastate.edu|url=http://www.ece.iastate.edu/~crb002/cnr.html|archive-url=https://web.archive.org/web/20090211071506/http://www.ece.iastate.edu:80/~crb002/cnr.html|url-status=dead|archive-date=February 11, 2009}}</ref> and [[Python (programming language)|Python]]<ref>{{cite conference |url=http://us.pycon.org/2011/home/ |title=Python for High Performance Computing |author=William Scullin |date=March 12, 2011 |location=Atlanta, GA}}</ref> were also ported to the compute nodes. IBM published BlueMatter, the application developed to exercise Blue Gene/L, as open source.<ref>[https://github.com/IBM/BlueMatter Blue Matter source code, retrieved February 28, 2020]</ref> This serves to document how the torus and collective interfaces were used by applications, and may serve as a base for others to exercise the current generation of supercomputers.
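Applications typically reached the torus and collective networks through MPI rather than hardware-specific interfaces. The following minimal C sketch uses only generic MPI routines; the process arrangement, dimensions and values are illustrative and none of it is Blue Gene/L system software. It shows the general pattern: ranks arranged as a periodic 3D grid mirroring the torus, followed by a broadcast and a reduction of the kind the collective network accelerated.

<syntaxhighlight lang="c">
/* Illustrative only: standard MPI, compiled with an MPI wrapper such as mpicc. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Arrange the processes as a 3D torus: MPI factorizes the process count
       into three dimensions, all with wrap-around links (periods = 1),
       mirroring the machine's toroidal point-to-point network. */
    int dims[3] = {0, 0, 0};
    int periods[3] = {1, 1, 1};
    MPI_Dims_create(world_size, 3, dims);

    MPI_Comm torus;
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1 /* allow reorder */, &torus);

    int rank, coords[3], xminus, xplus;
    MPI_Comm_rank(torus, &rank);
    MPI_Cart_coords(torus, rank, 3, coords);
    /* Nearest neighbours along the first torus dimension. */
    MPI_Cart_shift(torus, 0, 1, &xminus, &xplus);

    /* Collective operations: broadcast a parameter from rank 0,
       then reduce partial results back to rank 0. */
    double step = (rank == 0) ? 0.001 : 0.0;   /* illustrative value */
    MPI_Bcast(&step, 1, MPI_DOUBLE, 0, torus);

    double local = step * coords[0];           /* stand-in for real work */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, torus);

    if (rank == 0)
        printf("%d ranks as a %dx%dx%d torus, reduced total = %g\n",
               world_size, dims[0], dims[1], dims[2], total);

    MPI_Comm_free(&torus);
    MPI_Finalize();
    return 0;
}
</syntaxhighlight>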