Mersenne Twister
The Mersenne Twister is a general-purpose pseudorandom number generator (PRNG) developed in 1997 by Makoto Matsumoto and Takuji Nishimura.<ref>Template:Cite journal</ref><ref>E.g. Marsland S. (2011) Machine Learning (CRC Press), §4.1.1. Also see the section "Applications".</ref> Its name derives from the choice of a Mersenne prime as its period length.
The Mersenne Twister was created specifically to address most of the flaws found in earlier PRNGs.
The most commonly used version of the Mersenne Twister algorithm is based on the Mersenne prime <math>2^{19937}-1</math>. Its standard implementation, MT19937, uses a 32-bit word length. Another implementation, MT19937-64 (with five variants<ref>Template:Cite web</ref>), uses a 64-bit word length and generates a different sequence.
k-distribution
A pseudorandom sequence <math>x_i</math> of w-bit integers of period P is said to be k-distributed to v-bit accuracy if the following holds.
- Let <math>\operatorname{trunc}_v(x)</math> denote the number formed by the leading v bits of x, and consider the P kv-bit vectors
- <math> (\operatorname{trunc}_v(x_i), \operatorname{trunc}_v(x_{i+1}), \, \ldots, \operatorname{trunc}_v(x_{i+k-1})) \quad (0\leq i< P) </math>.
- Then each of the <math>2^{kv}</math> possible combinations of bits occurs the same number of times in a period, except for the all-zero combination, which occurs once less often.
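For small k and v the definition can be checked by brute force. The C sketch below is illustrative only: the function names are hypothetical, and it needs <math>2^{kv}</math> counters, so it is practical only for tiny kv. It reads one full period cyclically, tallies each kv-bit vector, and tests the "equal counts, all-zero once less" condition.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* trunc_v(x): the leading v bits of a w-bit word x (1 <= v <= w <= 32). */
static uint32_t trunc_v(uint32_t x, int w, int v) {
    return x >> (w - v);
}

/* Tally every kv-bit vector (trunc_v(x_i), ..., trunc_v(x_{i+k-1})) for
 * 0 <= i < P, reading the period-P sequence cyclically. `counts` must have
 * room for 2^(kv) entries, so this brute force only suits small k*v. */
static void count_kv_vectors(const uint32_t *seq, size_t P, int w, int k,
                             int v, unsigned *counts) {
    memset(counts, 0, sizeof(unsigned) << (k * v));
    for (size_t i = 0; i < P; i++) {
        uint32_t key = 0;
        for (int j = 0; j < k; j++)
            key = (key << v) | trunc_v(seq[(i + j) % P], w, v);
        counts[key]++;
    }
}

/* 1 if every nonzero combination occurs equally often and the all-zero
 * combination occurs exactly once less often; 0 otherwise. */
static int is_k_distributed(const uint32_t *seq, size_t P, int w, int k,
                            int v, unsigned *counts) {
    count_kv_vectors(seq, P, w, k, v, counts);
    size_t combos = (size_t)1 << (k * v);
    for (size_t c = 2; c < combos; c++)
        if (counts[c] != counts[1])
            return 0;
    return counts[0] + 1 == counts[1];
}
```

For example, the period-15 bit sequence 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, generated by <math>x_i = x_{i-3} \oplus x_{i-4}</math>, is 4-distributed to 1-bit accuracy (w = v = 1): each of the 15 nonzero 4-bit windows occurs exactly once and 0000 never occurs. It is not 5-distributed, since one period contains only 15 of the 31 nonzero 5-bit windows.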
Algorithmic detail
For a w-bit word length, the Mersenne Twister generates integers in the range <math>[0, 2^w-1]</math>.
The Mersenne Twister algorithm is based on a matrix linear recurrence over the finite binary field <math>\textbf{F}_2</math>. The algorithm is a twisted generalised feedback shift register<ref>Template:Cite journal</ref> (twisted GFSR, or TGFSR) of rational normal form (TGFSR(R)), with state bit reflection and tempering. The basic idea is to define a series <math>x_i</math> through a simple recurrence relation, and then output numbers of the form <math>x_i T</math>, where T is an invertible <math>\textbf{F}_2</math>-matrix called a tempering matrix.
The general algorithm is characterized by the following quantities:
- w: word size (in number of bits)
- n: degree of recurrence
- m: middle word, an offset used in the recurrence relation defining the series <math>x</math>, <math>1 \le m < n</math>
- r: separation point of one word, or the number of bits of the lower bitmask, <math>0 \le r \le w - 1</math>
- a: coefficients of the rational normal form twist matrix
- b, c: TGFSR(R) tempering bitmasks
- s, t: TGFSR(R) tempering bit shifts
- u, d, l: additional Mersenne Twister tempering bit shifts/masks
with the restriction that <math>2^{nw-r}-1</math> is a Mersenne prime. This choice simplifies the primitivity test and k-distribution test needed in the parameter search.
The series <math>x</math> is defined as a series of w-bit quantities with the recurrence relation:
- <math>x_{k+n} := x_{k+m} \oplus \left( ({x_k}^u \mid {x_{k+1}}^l) A \right)\qquad k=0,1,2,\ldots</math>
where <math>\mid</math> denotes concatenation of bit vectors (with upper bits on the left), <math> \oplus </math> the bitwise exclusive or (XOR), <math> x_{k}^{u} </math> the upper <math>w - r</math> bits of <math> x_k </math>, and <math> x_{k+1}^{l} </math> the lower r bits of <math> x_{k+1} </math>.
Equivalently, all subscripts may be offset by <math>-n</math>:
- <math>x_k := x_{k-(n-m)} \oplus \left( ({x_{k-n}}^u \mid {x_{k-(n-1)}}^l) A \right)\qquad k=n,n+1,n+2,\ldots</math>
where the left-hand side, <math> x_k </math>, is now the next generated value in the series, expressed in terms of previously generated values on the right-hand side.
The twist transformation A is defined in rational normal form as:<math display="block"> A = \begin{pmatrix} 0 & I_{w - 1} \\ a_{w-1} & (a_{w - 2}, \ldots , a_0) \end{pmatrix} </math> with <math> I_{w-1} </math> as the <math> (w-1)\times(w-1) </math> identity matrix. The rational normal form has the benefit that multiplication by A can be computed efficiently (matrix multiplication is done over <math> \textbf{F}_{2} </math>, so bitwise XOR takes the place of addition):<math display="block"> \boldsymbol{x}A = \begin{cases}\boldsymbol{x} \gg 1 & x_0 = 0\\(\boldsymbol{x} \gg 1) \oplus \boldsymbol{a} & x_0 = 1\end{cases} </math>where <math> x_0 </math> is the lowest-order bit of <math> x </math>.
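The shortcut can be checked against the matrix definition directly. In the C sketch below (assuming w = 32; the helper names are hypothetical), A is built explicitly as 32 row words, with row i multiplying bit <math>31-i</math> of x so that upper bits are on the left, and a generic <math>\textbf{F}_2</math> vector-by-matrix product is compared with the shift-and-XOR form.

```c
#include <stdint.h>

/* Row vector times matrix over F_2 for w = 32. rows[i] is row i of M as a
 * word; row i multiplies bit (31 - i) of x, so upper bits are on the left. */
static uint32_t f2_mul(uint32_t x, const uint32_t rows[32]) {
    uint32_t acc = 0;
    for (int i = 0; i < 32; i++)
        if ((x >> (31 - i)) & 1u)
            acc ^= rows[i]; /* addition in F_2 is XOR */
    return acc;
}

/* Rational normal form A = [[0, I_31], [a_31, (a_30, ..., a_0)]]. */
static void build_twist_matrix(uint32_t a, uint32_t rows[32]) {
    for (int i = 0; i < 31; i++)
        rows[i] = 1u << (30 - i); /* I_31 block: row i has its 1 in column i + 1 */
    rows[31] = a;                 /* last row holds the bits of a */
}

/* Shortcut from the text: x >> 1, XORed with a when x_0 = 1. */
static uint32_t twist_mul(uint32_t x, uint32_t a) {
    return (x >> 1) ^ ((x & 1u) ? a : 0u);
}
```

With <math>a = \textrm{9908B0DF}_{16}</math>, both functions agree on every input tried; for instance, x = 1 maps to a itself and x = 2 maps to 1.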
As in TGFSR(R), the Mersenne Twister is cascaded with a tempering transform to compensate for the reduced dimensionality of equidistribution (a consequence of choosing A in rational normal form). Note that this is equivalent to using the similar matrix <math>T^{-1}AT</math> in place of A, for T an invertible matrix, so the analysis of the characteristic polynomial mentioned below still holds.
As with A, the tempering transform is chosen to be easily computable, so T itself is never constructed explicitly. For the Mersenne Twister, the tempering is defined as
- <math>
\begin{aligned} y &\equiv x \oplus ((x\gg u)~\And~d)\\ y &\equiv y \oplus ((y\ll s)~\And~b)\\ y &\equiv y \oplus ((y\ll t)~\And~c)\\ z &\equiv y \oplus (y\gg l) \end{aligned} </math>
where <math>x</math> is the next value from the series, <math>y</math> is a temporary intermediate value, and <math>z</math> is the value returned from the algorithm, with <math>\ll</math> and <math>\gg</math> as the bitwise left and right shifts, and <math>\&</math> as the bitwise AND. The first and last transforms are added in order to improve lower-bit equidistribution. From the property of TGFSR, <math>s + t \ge \left\lfloor{\frac{w}{2}}\right\rfloor - 1</math> is required to reach the upper bound of equidistribution for the upper bits.
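Each tempering step has the form y XOR ((y << k) AND mask) or y XOR (y >> k), and both can be undone one group of k bits at a time, which is one concrete way to see that T is invertible. The C sketch below assumes the MT19937 coefficients listed in this section; the functions temper, undo_left, undo_right and untemper are hypothetical illustrations, not part of the Mersenne Twister itself.

```c
#include <stdint.h>

/* MT19937 tempering: (u, d) = (11, 0xffffffff), (s, b) = (7, 0x9d2c5680),
 * (t, c) = (15, 0xefc60000), l = 18. */
static uint32_t temper(uint32_t x) {
    uint32_t y = x ^ ((x >> 11) & 0xffffffffu);
    y ^= (y << 7) & 0x9d2c5680u;
    y ^= (y << 15) & 0xefc60000u;
    return y ^ (y >> 18);
}

/* Invert y = x ^ ((x << shift) & mask): the low `shift` bits of x equal
 * those of y, and each pass recovers `shift` more bits. */
static uint32_t undo_left(uint32_t y, int shift, uint32_t mask) {
    uint32_t x = y;
    for (int known = shift; known < 32; known += shift)
        x = y ^ ((x << shift) & mask);
    return x;
}

/* Invert y = x ^ (x >> shift): the high `shift` bits of x equal those of y. */
static uint32_t undo_right(uint32_t y, int shift) {
    uint32_t x = y;
    for (int known = shift; known < 32; known += shift)
        x = y ^ (x >> shift);
    return x;
}

/* Undo the four tempering steps in reverse order. */
static uint32_t untemper(uint32_t z) {
    uint32_t y = undo_right(z, 18);
    y = undo_left(y, 15, 0xefc60000u);
    y = undo_left(y, 7, 0x9d2c5680u);
    return undo_right(y, 11);
}
```

Because every step is linear over <math>\textbf{F}_2</math>, temper(x XOR y) also equals temper(x) XOR temper(y), and untemper(temper(x)) returns x for every 32-bit x.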
The coefficients for MT19937 are:
<math> \begin{aligned} (w, n, m, r) &= (32, 624, 397, 31)\\ a &= \textrm{9908B0DF}_{16}\\ (u, d) &= (11, \textrm{FFFFFFFF}_{16})\\ (s, b) &= (7, \textrm{9D2C5680}_{16})\\ (t, c) &= (15, \textrm{EFC60000}_{16})\\ l &= 18\\ \end{aligned} </math>
Note that 32-bit implementations of the Mersenne Twister generally have <math>d = \textrm{FFFFFFFF}_{16}</math>. As a result, d is occasionally omitted from the algorithm description, since the bitwise AND with d then has no effect.
The coefficients for MT19937-64 are:<ref name="std::mersenne_twister_engine">Template:Cite web</ref>
<math> \begin{aligned} (w, n, m, r) &= (64, 312, 156, 31)\\ a &= \textrm{B5026F5AA96619E9}_{16}\\ (u, d) &= (29, \textrm{5555555555555555}_{16})\\ (s, b) &= (17, \textrm{71D67FFFEDA60000}_{16})\\ (t, c) &= (37, \textrm{FFF7EEE000000000}_{16})\\ l &= 43\\ \end{aligned} </math>
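As a quick arithmetic check of the restriction that <math>2^{nw-r}-1</math> be a Mersenne prime, both parameter sets lead to the same exponent, and both satisfy the condition on s + t given above:
<math display="block">\begin{aligned} \text{MT19937:}\quad nw - r &= 624 \cdot 32 - 31 = 19937, & s + t &= 7 + 15 = 22 \ge \left\lfloor \tfrac{32}{2} \right\rfloor - 1 = 15\\ \text{MT19937-64:}\quad nw - r &= 312 \cdot 64 - 31 = 19937, & s + t &= 17 + 37 = 54 \ge \left\lfloor \tfrac{64}{2} \right\rfloor - 1 = 31 \end{aligned}</math>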
Initialization
The state needed for a Mersenne Twister implementation is an array of n values of w bits each. To initialize the array, a w-bit seed value is used to supply <math>x_0</math> through <math>x_{n-1}</math> by setting <math>x_0</math> to the seed value and thereafter setting
- <math>
x_i = f \times (x_{i-1} \oplus (x_{i-1} \gg (w-2))) + i </math>
for <math>i</math> from <math>1</math> to <math>n-1</math>, with all arithmetic performed modulo <math>2^w</math> (that is, keeping only the lowest w bits).
- The first value the algorithm then generates is based on <math>x_n</math>, not on <math>x_0</math>.
- The constant f forms another parameter to the generator, though not part of the algorithm proper.
- The value for f for MT19937 is 1812433253.
- The value for f for MT19937-64 is 6364136223846793005.<ref name="std::mersenne_twister_engine" />
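The initialization rule and the MT19937-64 coefficients listed above are enough to write a complete 64-bit generator. The C sketch below is one possible arrangement (the type and function names are hypothetical): it regenerates all n words at once using the recurrence, then tempers one word per call. Assuming the default seed 5489 used by C++11's std::mt19937_64, the 10000th output should be 9981545732273789042, the value the C++ standard requires of a default-constructed std::mt19937_64.

```c
#include <stdint.h>

/* MT19937-64 parameters from the coefficient list. */
#define NN 312
#define MM 156
#define MATRIX_A 0xB5026F5AA96619E9ULL
#define UPPER_MASK 0xFFFFFFFF80000000ULL /* upper w - r = 33 bits */
#define LOWER_MASK 0x000000007FFFFFFFULL /* lower r = 31 bits */

typedef struct {
    uint64_t mt[NN];
    int index; /* next word to temper; NN means "regenerate first" */
} mt64_state;

/* x_0 = seed, x_i = f * (x_{i-1} ^ (x_{i-1} >> (w-2))) + i with w = 64;
 * uint64_t arithmetic wraps, which gives the reduction mod 2^64. */
static void mt64_init(mt64_state *st, uint64_t seed) {
    st->mt[0] = seed;
    for (int i = 1; i < NN; i++)
        st->mt[i] = 6364136223846793005ULL *
                    (st->mt[i - 1] ^ (st->mt[i - 1] >> 62)) + (uint64_t)i;
    st->index = NN;
}

static uint64_t mt64_next(mt64_state *st) {
    if (st->index >= NN) { /* apply the recurrence to all NN words */
        for (int k = 0; k < NN; k++) {
            uint64_t x = (st->mt[k] & UPPER_MASK) |
                         (st->mt[(k + 1) % NN] & LOWER_MASK);
            uint64_t xA = (x >> 1) ^ ((x & 1ULL) ? MATRIX_A : 0ULL);
            st->mt[k] = st->mt[(k + MM) % NN] ^ xA;
        }
        st->index = 0;
    }
    uint64_t y = st->mt[st->index++];
    y ^= (y >> 29) & 0x5555555555555555ULL; /* (u, d) */
    y ^= (y << 17) & 0x71D67FFFEDA60000ULL; /* (s, b) */
    y ^= (y << 37) & 0xFFF7EEE000000000ULL; /* (t, c) */
    y ^= y >> 43;                           /* l */
    return y;
}
```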
C code
<syntaxhighlight lang="c">#include <stdint.h>

#define n 624
#define m 397
#define w 32
#define r 31
#define UMASK (0xffffffffUL << r)
#define LMASK (0xffffffffUL >> (w-r))
#define a 0x9908b0dfUL
#define u 11
#define s 7
#define t 15
#define l 18
#define b 0x9d2c5680UL
#define c 0xefc60000UL
#define f 1812433253UL

typedef struct {
    uint32_t state_array[n];  // the array for the state vector
    int state_index;          // index into state vector array, 0 <= state_index <= n-1 always
} mt_state;

void initialize_state(mt_state* state, uint32_t seed) {
    uint32_t* state_array = &(state->state_array[0]);
    state_array[0] = seed;    // suggested initial seed = 19650218UL
    for (int i = 1; i < n; i++) {
        seed = f * (seed ^ (seed >> (w-2))) + i;  // Knuth TAOCP Vol2. 3rd Ed. P.106 for multiplier.
        state_array[i] = seed;
    }
    state->state_index = 0;
}

uint32_t random_uint32(mt_state* state) {
    uint32_t* state_array = &(state->state_array[0]);
    int k = state->state_index;  // point to current state location
                                 // 0 <= state_index <= n-1 always
//  int k = k - n;               // point to state n iterations before
//  if (k < 0) k += n;           // modulo n circular indexing
                                 // the previous 2 lines actually do nothing
                                 // for illustration only
    int j = k - (n-1);           // point to state n-1 iterations before
    if (j < 0) j += n;           // modulo n circular indexing
    uint32_t x = (state_array[k] & UMASK) | (state_array[j] & LMASK);
    uint32_t xA = x >> 1;           // multiplication by the twist matrix A:
    if (x & 0x00000001UL) xA ^= a;  // shift right, XOR a if the lowest bit is set
    j = k - (n-m);               // point to state n-m iterations before
    if (j < 0) j += n;           // modulo n circular indexing
    x = state_array[j] ^ xA;     // compute next value in the state
    state_array[k++] = x;        // update new state value
    if (k >= n) k = 0;           // modulo n circular indexing
    state->state_index = k;
    uint32_t y = x ^ (x >> u);   // tempering (d = 0xffffffff, so its mask is omitted)
    y = y ^ ((y << s) & b);
    y = y ^ ((y << t) & c);
    uint32_t z = y ^ (y >> l);
    return z;
}</syntaxhighlight>
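The code above twists one state word per call. A common alternative regenerates all n words in a single pass and then returns tempered words until the buffer is used up; because it applies the same recurrence in the same order, it should produce the same sequence. The C sketch below is such a batch variant with the MT19937 constants written inline (all names are hypothetical). Seeded with 5489, the default seed of C++11's std::mt19937, its first output should be 3499211612 and its 10000th output 4123659995, the latter being the value the C++ standard requires.

```c
#include <stdint.h>

enum { MT_N = 624, MT_M = 397 };

typedef struct {
    uint32_t mt[MT_N];
    int index; /* MT_N means the whole array must be regenerated */
} mt_batch_state;

static void mt_batch_seed(mt_batch_state *st, uint32_t seed) {
    st->mt[0] = seed;
    for (int i = 1; i < MT_N; i++)
        st->mt[i] = (uint32_t)(1812433253u * (st->mt[i - 1] ^ (st->mt[i - 1] >> 30)) + (uint32_t)i);
    st->index = MT_N;
}

/* Regenerate all MT_N words in one pass; words k+1 and k+MT_M (mod MT_N)
 * are already updated whenever the recurrence needs the newer value. */
static void mt_batch_twist(mt_batch_state *st) {
    for (int k = 0; k < MT_N; k++) {
        uint32_t x = (st->mt[k] & 0x80000000u) | (st->mt[(k + 1) % MT_N] & 0x7fffffffu);
        uint32_t xA = (x >> 1) ^ ((x & 1u) ? 0x9908b0dfu : 0u);
        st->mt[k] = st->mt[(k + MT_M) % MT_N] ^ xA;
    }
    st->index = 0;
}

static uint32_t mt_batch_next(mt_batch_state *st) {
    if (st->index >= MT_N)
        mt_batch_twist(st);
    uint32_t y = st->mt[st->index++];
    y ^= y >> 11;                 /* (u, d) with d = 0xffffffff */
    y ^= (y << 7) & 0x9d2c5680u;  /* (s, b) */
    y ^= (y << 15) & 0xefc60000u; /* (t, c) */
    return y ^ (y >> 18);         /* l */
}
```

The batch form touches the state array sequentially in one tight loop, while the incremental form spreads the same work evenly across calls.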
Comparison with classical GFSR
In order to achieve the <math>2^{nw-r}-1</math> theoretical upper limit of the period in a TGFSR, <math>\phi_{B}(t)</math> must be a primitive polynomial, <math>\phi_{B}(t)</math> being the characteristic polynomial of
- <math>
B = \begin{pmatrix} 0 & I_w & \cdots & 0 & 0 \\ \vdots & & & & \\ I_w & \vdots & \ddots & \vdots & \vdots \\ \vdots & & & & \\ 0 & 0 & \cdots & I_w & 0 \\ 0 & 0 & \cdots & 0 & I_{w - r} \\ S & 0 & \cdots & 0 & 0 \end{pmatrix} \begin{matrix} \\ \\ \leftarrow m\text{-th row} \\ \\ \\ \\ \end{matrix} </math>
- <math>
S = \begin{pmatrix} 0 & I_r \\ I_{w - r} & 0 \end{pmatrix} A </math>
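The size of B follows from the state it acts on: the recurrence only ever reads n − 1 full words plus the upper w − r bits of one more word, so B is a square matrix of that dimension and its characteristic polynomial has degree nw − r, matching the period bound:
<math display="block">(n-1)\,w + (w - r) = nw - r, \qquad \text{MT19937: } 623 \cdot 32 + (32 - 31) = 19937</math>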
The twist transformation improves the classical GFSR with the following key properties:
- The period reaches the theoretical upper limit <math>2^{nw-r}-1</math> (unless the state is initialized to all zeros)
- Equidistribution in n dimensions (e.g. linear congruential generators can at best manage reasonable distribution in five dimensions)
Variants
CryptMT is a stream cipher and cryptographically secure pseudorandom number generator which uses Mersenne Twister internally.<ref name="eSTREAM">Template:Cite web</ref><ref>Template:Cite web</ref> It was developed by Matsumoto and Nishimura alongside Mariko Hagita and Mutsuo Saito. It has been submitted to the eSTREAM project of the eCRYPT network.<ref name="eSTREAM" /> Unlike Mersenne Twister or its other derivatives, CryptMT is patented.
MTGP is a variant of Mersenne Twister, optimised for graphics processing units, published by Mutsuo Saito and Makoto Matsumoto.<ref>Template:Cite arXiv</ref> The basic linear recurrence operations are extended from MT and parameters are chosen to allow many threads to compute the recursion in parallel, while sharing their state space to reduce memory load. The paper claims improved equidistribution over MT and performance on an old (2008-era) GPU (Nvidia GTX260 with 192 cores) of 4.7 ms for <math>5\times10^7</math> random 32-bit integers.
The SFMT (SIMD-oriented Fast Mersenne Twister) is a variant of Mersenne Twister, introduced in 2006,<ref>Template:Cite web</ref> designed to be fast when it runs on 128-bit SIMD.
- It is roughly twice as fast as Mersenne Twister.<ref>Template:Cite web</ref>
- It has a better equidistribution property of v-bit accuracy than MT but worse than WELL ("Well Equidistributed Long-period Linear").
- It has quicker recovery from zero-excess initial state than MT, but slower than WELL.
- It supports various periods from <math>2^{607}-1</math> to <math>2^{216091}-1</math>.
Intel SSE2 and PowerPC AltiVec are supported by SFMT. It is also used for games with the Cell BE in the PlayStation 3.<ref>Template:Cite web</ref>
TinyMT is a variant of Mersenne Twister, proposed by Saito and Matsumoto in 2011.<ref>Template:Cite web</ref> TinyMT uses just 127 bits of state space, a significant decrease compared to the original's 2.5 KiB of state. However, it has a period of <math>2^{127}-1</math>, far shorter than the original, so it is only recommended by the authors in cases where memory is at a premium.
Characteristics
Advantages:
- Permissively licensed and patent-free for all variants except CryptMT.
- Passes numerous tests for statistical randomness, including the Diehard tests and most, but not all, of the TestU01 tests.<ref name="TestU01">P. L'Ecuyer and R. Simard, "TestU01: A C library for empirical testing of random number generators", ACM Transactions on Mathematical Software, 33, 4, Article 22 (August 2007).</ref>
- A very long period of <math>2^{19937}-1</math>. Note that while a long period is not a guarantee of quality in a random number generator, short periods, such as the <math>2^{32}</math> common in many older software packages, can be problematic.<ref>Note: <math>2^{19937}</math> is approximately <math>4.3 \times 10^{6001}</math>; this is many orders of magnitude larger than the estimated number of particles in the observable universe, which is <math>10^{87}</math>.</ref>
- k-distributed to 32-bit accuracy for every <math>1 \le k \le 623</math>
- Implementations generally create random numbers faster than hardware-implemented methods. A study found that the Mersenne Twister creates 64-bit floating-point random numbers approximately twenty times faster than the hardware-implemented, processor-based RDRAND instruction.<ref>Template:Cite journal</ref>
Disadvantages:
- Relatively large state buffer, of almost 2.5 kB, unless the TinyMT variant is used.
- Mediocre throughput by modern standards, unless the SFMT variant (discussed above) is used.<ref>Template:Cite web</ref>
- Exhibits two clear failures (linear complexity) in both Crush and BigCrush in the TestU01 suite. The test, like Mersenne Twister, is based on an <math>\textbf{F}_2</math>-algebra.<ref name="TestU01" />
- Multiple instances that differ only in seed value (but not other parameters) are not generally appropriate for Monte-Carlo simulations that require independent random number generators, though there exists a method for choosing multiple sets of parameter values.<ref>Template:Cite web</ref><ref>Template:Cite web</ref>
- Poor diffusion: can take a long time to start generating output that passes randomness tests, if the initial state is highly non-random—particularly if the initial state has many zeros. A consequence of this is that two instances of the generator, started with initial states that are almost the same, will usually output nearly the same sequence for many iterations, before eventually diverging. The 2002 update to the MT algorithm has improved initialization, so that beginning with such a state is very unlikely.<ref>Template:Cite web</ref> The GPU version (MTGP) is said to be even better.<ref name="fog">Template:Cite journal</ref>
- Contains subsequences with more 0's than 1's. This adds to the poor diffusion property to make recovery from many-zero states difficult.
- Is not cryptographically secure, unless the CryptMT variant (discussed above) is used. The reason is that observing a sufficient number of iterations (624 in the case of MT19937, since this is the size of the state vector from which future iterations are produced) allows one to predict all future iterations.
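The prediction argument can be made concrete. Each output is the tempered form of one state word, the tempering is invertible, and the recurrence needs only the previous n words, so untempering any 624 consecutive MT19937 outputs recovers a working copy of the state. The C sketch below assumes the MT19937 coefficients and the seeding rule from the Initialization section; all names, and the seed and offset used in the check, are hypothetical.

```c
#include <stdint.h>

enum { N = 624, M = 397 };

typedef struct { uint32_t mt[N]; int index; } mt_t;

static void mt_seed(mt_t *g, uint32_t seed) {
    g->mt[0] = seed;
    for (int i = 1; i < N; i++)
        g->mt[i] = (uint32_t)(1812433253u * (g->mt[i - 1] ^ (g->mt[i - 1] >> 30)) + (uint32_t)i);
    g->index = N;
}

static uint32_t mt_next(mt_t *g) {
    if (g->index >= N) { /* regenerate all N words */
        for (int k = 0; k < N; k++) {
            uint32_t x = (g->mt[k] & 0x80000000u) | (g->mt[(k + 1) % N] & 0x7fffffffu);
            g->mt[k] = g->mt[(k + M) % N] ^ (x >> 1) ^ ((x & 1u) ? 0x9908b0dfu : 0u);
        }
        g->index = 0;
    }
    uint32_t y = g->mt[g->index++];
    y ^= y >> 11;
    y ^= (y << 7) & 0x9d2c5680u;
    y ^= (y << 15) & 0xefc60000u;
    return y ^ (y >> 18);
}

/* Inverse tempering helpers: each pass recovers `shift` more bits. */
static uint32_t undo_left(uint32_t y, int shift, uint32_t mask) {
    uint32_t x = y;
    for (int known = shift; known < 32; known += shift) x = y ^ ((x << shift) & mask);
    return x;
}
static uint32_t undo_right(uint32_t y, int shift) {
    uint32_t x = y;
    for (int known = shift; known < 32; known += shift) x = y ^ (x >> shift);
    return x;
}
static uint32_t untemper(uint32_t z) {
    z = undo_right(z, 18);
    z = undo_left(z, 15, 0xefc60000u);
    z = undo_left(z, 7, 0x9d2c5680u);
    return undo_right(z, 11);
}

/* Load N consecutive observed outputs, untempered, oldest first. The next
 * call regenerates, computing word k + N from words k, k + 1 and k + M. */
static void mt_clone(mt_t *clone, const uint32_t observed[N]) {
    for (int i = 0; i < N; i++) clone->mt[i] = untemper(observed[i]);
    clone->index = N;
}
```

In a check that seeds with 20240101, discards 100 outputs, observes the next 624, and clones from them, the clone should match the original generator on every subsequent output.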
Applications
The Mersenne Twister is used as the default PRNG by the following software:
- Programming languages: Dyalog APL,<ref>Template:Cite web</ref> IDL,<ref>Template:Cite web</ref> R,<ref>Template:Cite web</ref> Ruby,<ref>Template:Cite web</ref> Free Pascal,<ref>Template:Cite web</ref> PHP,<ref>Template:Cite web</ref> Python (also available in NumPy, although NumPy's default was changed to PCG64 as of version 1.17<ref>Template:Cite web</ref>),<ref>Template:Cite web</ref><ref>Template:Cite web</ref><ref>Template:Cite web</ref> CMU Common Lisp,<ref>Template:Cite web</ref> Embeddable Common Lisp,<ref>Template:Cite web</ref> Steel Bank Common Lisp,<ref>Template:Cite web</ref> Julia (up to Julia 1.6 LTS; still available in later versions, but a better and faster RNG is used by default as of 1.7)<ref>Template:Cite web</ref>
- Unix-like libraries and software: GLib,<ref>Template:Cite web</ref> GNU Multiple Precision Arithmetic Library,<ref>Template:Cite web</ref> GNU Octave,<ref>Template:Cite web</ref> GNU Scientific Library<ref>Template:Cite web</ref>
- Other: Microsoft Excel,<ref>Template:Citation.</ref> GAUSS,<ref>Template:Cite web</ref> gretl,<ref>"uniform". Gretl Function Reference.</ref> Stata,<ref>Template:Cite web</ref> SageMath,<ref>Template:Cite web</ref> Scilab,<ref>Template:Cite web</ref> Maple,<ref>Template:Cite web</ref> MATLAB<ref>Template:Cite web</ref>
It is also available in Apache Commons,<ref>Template:Cite web</ref> in the standard C++ library (since C++11),<ref>Template:Cite web</ref><ref>Template:Cite web</ref> and in Mathematica.<ref>Mathematica Documentation</ref> Add-on implementations are provided in many program libraries, including the Boost C++ Libraries,<ref>Template:Cite web</ref> the CUDA Library,<ref>Template:Cite web</ref> and the NAG Numerical Library.<ref>Template:Cite web</ref>
The Mersenne Twister is one of two PRNGs in SPSS: the other generator is kept only for compatibility with older programs, and the Mersenne Twister is stated to be "more reliable".<ref>Template:Cite web</ref> The Mersenne Twister is similarly one of the PRNGs in SAS: the other generators are older and deprecated.<ref>Template:Cite web</ref> The Mersenne Twister is the default PRNG in Stata; the other one is KISS, kept for compatibility with older versions of Stata.<ref>Stata help: set rng -- Set which random-number generator (RNG) to use</ref>
Alternatives
An alternative generator, WELL ("Well Equidistributed Long-period Linear"), offers quicker recovery, equal randomness, and nearly equal speed.<ref>P. L'Ecuyer, "Uniform Random Number Generators", International Encyclopedia of Statistical Science, Lovric, Miodrag (Ed.), Springer-Verlag, 2010.</ref>
Marsaglia's xorshift generators and variants are the fastest in the class of LFSRs.<ref>Template:Cite web</ref>
64-bit MELGs ("64-bit Maximally Equidistributed <math>\textbf{F}_2</math>-Linear Generators with Mersenne Prime Period") are completely optimized in terms of the k-distribution properties.<ref>Template:Cite journal</ref>
The ACORN family (published 1989) is another k-distributed PRNG, which shows similar computational speed to MT, and better statistical properties as it satisfies all the current (2019) TestU01 criteria; when used with appropriate choices of parameters, ACORN can have arbitrarily long period and precision.
The PCG family is a more modern long-period generator, with better cache locality, and less detectable bias using modern analysis methods.<ref>Template:Cite web</ref>
References
Further reading
External links
- The academic paper for MT, and related articles by Makoto Matsumoto
- Mersenne Twister home page, with codes in C, Fortran, Java, Lisp and some other languages
- Mersenne Twister examples — a collection of Mersenne Twister implementations, in several programming languages - at GitHub
- SFMT in Action: Part I – Generating a DLL Including SSE2 Support – at Code Project