===Race conditions, mutual exclusion, synchronization, and parallel slowdown===
Subtasks in a parallel program are often called [[Thread (computing)|threads]]. Some parallel computer architectures use smaller, lightweight versions of threads known as [[Fiber (computer science)|fibers]], while others use bigger versions known as [[Process (computing)|processes]]. However, "threads" is generally accepted as a generic term for subtasks.<ref>{{cite web | url = https://msdn.microsoft.com/en-us/library/windows/desktop/ms684841(v=vs.85).aspx | title = Processes and Threads | date = 2018 | website = Microsoft Developer Network | publisher = Microsoft Corp. | access-date = 2018-05-10 }}</ref> Threads will often need [[Synchronization (computer science)|synchronized]] access to an [[Object (computer science)|object]] or other [[Resource management (computing)|resource]], for example when they must update a [[Variable (programming)|variable]] that is shared between them. Without synchronization, the instructions between the two threads may be interleaved in any order. For example, consider the following program:

{| class="wikitable"
|-
!Thread A
!Thread B
|-
|1A: Read variable V
|1B: Read variable V
|-
|2A: Add 1 to variable V
|2B: Add 1 to variable V
|-
|3A: Write back to variable V
|3B: Write back to variable V
|}

If instruction 1B is executed between 1A and 3A, or if instruction 1A is executed between 1B and 3B, the program will produce incorrect data. This is known as a [[race condition]]. The programmer must use a [[Lock (computer science)|lock]] to provide [[mutual exclusion]]. A lock is a programming language construct that allows one thread to take control of a variable and prevent other threads from reading or writing it, until that variable is unlocked. The thread holding the lock is free to execute its [[critical section]] (the section of a program that requires exclusive access to some variable), and to unlock the data when it is finished. Therefore, to guarantee correct program execution, the above program can be rewritten to use locks:

{| class="wikitable"
|-
!Thread A
!Thread B
|-
|1A: Lock variable V
|1B: Lock variable V
|-
|2A: Read variable V
|2B: Read variable V
|-
|3A: Add 1 to variable V
|3B: Add 1 to variable V
|-
|4A: Write back to variable V
|4B: Write back to variable V
|-
|5A: Unlock variable V
|5B: Unlock variable V
|}

One thread will successfully lock variable V, while the other thread will be [[Software lockout|locked out]], unable to proceed until V is unlocked again. This guarantees correct execution of the program. Locks may be necessary to ensure correct program execution when threads must serialize access to resources, but their use can greatly slow a program and may affect its [[Software quality#Reliability|reliability]].<ref>{{cite web | url = http://www.developforperformance.com/ThreadSafetyForPerformance.html | title = Thread Safety for Performance | last = Krauss | first = Kirk J | date = 2018 | website = Develop for Performance | access-date = 2018-05-10 | archive-date = 2018-05-13 | archive-url = https://web.archive.org/web/20180513081315/http://www.developforperformance.com/ThreadSafetyForPerformance.html | url-status = dead }}</ref>
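The two programs above can be illustrated with a minimal C++ sketch (the variable and function names are illustrative, not part of any standard API beyond C++11 threads). The first function performs the unsynchronized read-modify-write and is subject to the race condition; the second protects the same critical section with a lock:

<syntaxhighlight lang="cpp">
#include <iostream>
#include <mutex>
#include <thread>

int v = 0;           // shared variable V
std::mutex v_mutex;  // lock protecting V

// Unsynchronized version: the three steps can interleave between the
// two threads, so one increment may be lost (a race condition).
void increment_unsafe() {
    int tmp = v;     // 1: read variable V
    tmp = tmp + 1;   // 2: add 1 to variable V
    v = tmp;         // 3: write back to variable V
}

// Synchronized version: the lock serializes the critical section, so
// only one thread at a time can execute the read-modify-write.
void increment_safe() {
    std::lock_guard<std::mutex> lock(v_mutex);  // lock variable V
    int tmp = v;
    tmp = tmp + 1;
    v = tmp;
}                                               // unlock on scope exit

int main() {
    std::thread a(increment_safe);  // thread A
    std::thread b(increment_safe);  // thread B
    a.join();
    b.join();
    std::cout << v << std::endl;    // always 2; with increment_unsafe
                                    // the result could be 1 or 2
    return 0;
}
</syntaxhighlight>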
Locking multiple variables using [[Atomic operation|non-atomic]] locks introduces the possibility of program [[Deadlock (computer science)|deadlock]]. An [[atomic lock]] locks multiple variables all at once; if it cannot lock all of them, it does not lock any of them. If two threads each need to lock the same two variables using non-atomic locks, it is possible that one thread will lock one of them and the second thread will lock the second variable. In such a case, neither thread can complete, and deadlock results.<ref>{{cite book | url = http://www.informit.com/articles/article.aspx?p=25193 | title = Introduction to Operating System Deadlocks | last = Tanenbaum | first = Andrew S. | date = 2002-02-01 | website = Informit | publisher = Pearson Education, Informit | access-date = 2018-05-10 }}</ref>

Many parallel programs require that their subtasks [[Synchronization (computer science)|act in synchrony]]. This requires the use of a [[barrier (computer science)|barrier]]. Barriers are typically implemented using a lock or a [[Semaphore (programming)|semaphore]].<ref>{{cite web | url = https://www.embedded.com/design/operating-systems/4440752/Synchronization-internals----the-semaphore | title = Synchronization internals – the semaphore | last = Cecil | first = David | date = 2015-11-03 | website = Embedded | publisher = AspenCore | access-date = 2018-05-10 }}</ref>

One class of algorithms, known as [[lock-free and wait-free algorithms]], altogether avoids the use of locks and barriers. However, this approach is generally difficult to implement and requires correctly designed data structures.<ref>{{cite web | url = http://preshing.com/20120612/an-introduction-to-lock-free-programming/ | title = An Introduction to Lock-Free Programming | last = Preshing | first = Jeff | date = 2012-06-08 | website = Preshing on Programming | access-date = 2018-05-10 }}</ref>

Not all parallelization results in speed-up. Generally, as a task is split up into more and more threads, those threads spend an ever-increasing portion of their time communicating with each other or waiting on each other for access to resources.<ref>{{cite web | url = https://stackoverflow.com/questions/806569/whats-the-opposite-of-embarrassingly-parallel | title = What's the opposite of "embarrassingly parallel"? | website = StackOverflow | access-date = 2018-05-10 }}</ref><ref>{{cite web | url = https://stackoverflow.com/questions/1970345/what-is-thread-contention | title = What is thread contention? | last = Schwartz | first = David | date = 2011-08-15 | website = StackOverflow | access-date = 2018-05-10 }}</ref> Once the overhead from resource contention or communication dominates the time spent on other computation, further parallelization (that is, splitting the workload over even more threads) increases rather than decreases the amount of time required to finish. This problem, known as [[parallel slowdown]],<ref>{{cite web | url = https://software.intel.com/en-us/blogs/2008/03/04/why-a-simple-test-can-get-parallel-slowdown | title = Why a simple test can get parallel slowdown | last = Kukanov | first = Alexey | date = 2008-03-04 | access-date = 2015-02-15 }}</ref> can be improved in some cases by software analysis and redesign.<ref>{{cite web | url = http://www.developforperformance.com/ThreadingForPerformance.html | title = Threading for Performance | last = Krauss | first = Kirk J | date = 2018 | website = Develop for Performance | access-date = 2018-05-10 | archive-date = 2018-05-13 | archive-url = https://web.archive.org/web/20180513081501/http://www.developforperformance.com/ThreadingForPerformance.html | url-status = dead }}</ref>
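As a minimal sketch of the lock-free approach described above (names again illustrative), the shared counter from the earlier example can be updated with a single atomic read-modify-write operation instead of a lock, so no thread is ever blocked waiting for another to release a critical section:

<syntaxhighlight lang="cpp">
#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> v{0};  // shared variable V, updated atomically

// Lock-free increment: fetch_add performs the read, add, and write
// back as one indivisible operation, so no lock is required and no
// thread can be delayed by another thread holding a lock.
void increment_lock_free() {
    v.fetch_add(1, std::memory_order_relaxed);
}

int main() {
    std::thread a(increment_lock_free);  // thread A
    std::thread b(increment_lock_free);  // thread B
    a.join();
    b.join();
    std::cout << v.load() << std::endl;  // always 2
    return 0;
}
</syntaxhighlight>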