
====Discrete cosine transform====
[[File:JPEG example subimage.svg|thumb|256px|The 8×8 sub-image shown in 8-bit grayscale]]

Next, each 8×8 block of each component (Y, Cb, Cr) is converted to a [[frequency-domain]] representation using a normalized, two-dimensional type-II discrete cosine transform (DCT); see Citation 1 in discrete cosine transform. Within the family of transforms described in [[Discrete cosine transform#DCT-II|discrete cosine transform]], this transform is referred to as the "type-II DCT", and the corresponding inverse (IDCT) as the "type-III DCT".

As an example, one such 8×8 8-bit subimage might be:

:<math>
\left[
\begin{array}{rrrrrrrr}
 52 & 55 & 61 & 66 & 70 & 61 & 64 & 73 \\
 63 & 59 & 55 & 90 & 109 & 85 & 69 & 72 \\
 62 & 59 & 68 & 113 & 144 & 104 & 66 & 73 \\
 63 & 58 & 71 & 122 & 154 & 106 & 70 & 69 \\
 67 & 61 & 68 & 104 & 126 & 88 & 68 & 70 \\
 79 & 65 & 60 & 70 & 77 & 68 & 58 & 75 \\
 85 & 71 & 64 & 59 & 55 & 61 & 65 & 83 \\
 87 & 79 & 69 & 68 & 65 & 76 & 78 & 94
\end{array}
\right].
</math>

Before computing the DCT of the 8×8 block, its values are shifted from a positive range to one centered on zero. For an 8-bit image, each entry in the original block falls in the range <math>[0, 255]</math>. The midpoint of the range (in this case, the value 128) is subtracted from each entry to produce a data range that is centered on zero, so that the modified range is <math>[-128, 127]</math>. This step reduces the dynamic range requirements in the DCT processing stage that follows.

This step results in the following values:

:<math>g=
\begin{array}{c}
x \\
\longrightarrow \\
\left[
\begin{array}{rrrrrrrr}
 -76 & -73 & -67 & -62 & -58 & -67 & -64 & -55 \\
 -65 & -69 & -73 & -38 & -19 & -43 & -59 & -56 \\
 -66 & -69 & -60 & -15 & 16 & -24 & -62 & -55 \\
 -65 & -70 & -57 & -6 & 26 & -22 & -58 & -59 \\
 -61 & -67 & -60 & -24 & -2 & -40 & -60 & -58 \\
 -49 & -63 & -68 & -58 & -51 & -60 & -70 & -53 \\
 -43 & -57 & -64 & -69 & -73 & -67 & -63 & -45 \\
 -41 & -49 & -59 & -60 & -63 & -52 & -50 & -34
\end{array}
\right]
\end{array}
\Bigg\downarrow y.
</math>
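
The level shift can be sketched in a few lines of Python. This is only an illustrative sketch: the function name <code>level_shift</code> and its <code>bit_depth</code> parameter are assumptions made for this example, not part of any JPEG library.

```python
import numpy as np

# The example 8x8 sub-image from above (8-bit samples, row by row).
block = np.array([
    [52, 55, 61,  66,  70,  61, 64, 73],
    [63, 59, 55,  90, 109,  85, 69, 72],
    [62, 59, 68, 113, 144, 104, 66, 73],
    [63, 58, 71, 122, 154, 106, 70, 69],
    [67, 61, 68, 104, 126,  88, 68, 70],
    [79, 65, 60,  70,  77,  68, 58, 75],
    [85, 71, 64,  59,  55,  61, 65, 83],
    [87, 79, 69,  68,  65,  76, 78, 94],
], dtype=np.uint8)


def level_shift(samples, bit_depth=8):
    """Subtract the midpoint of the sample range (128 for 8-bit data)."""
    midpoint = 1 << (bit_depth - 1)
    # Widen to a signed type first so the subtraction cannot wrap around.
    return samples.astype(np.int16) - midpoint


g = level_shift(block)
print(g[0])  # first row of the shifted block
```

Widening to a signed integer type before subtracting matters here, because subtracting 128 directly from unsigned 8-bit samples would wrap around instead of producing negative values.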

[[File:Dctjpeg.png|thumb|The DCT transforms an 8×8 block of input values to a [[linear combination]] of these 64 patterns. The patterns are referred to as the two-dimensional DCT ''basis functions'', and the output values are referred to as ''transform coefficients''. The horizontal index is <math>u</math> and the vertical index is <math>v</math>.]]

The next step is to take the two-dimensional DCT, which is given by:

:<math>\ G_{u,v} =
 \frac{1}{4}
 \alpha(u)
 \alpha(v)
 \sum_{x=0}^7
 \sum_{y=0}^7
 g_{x,y}
 \cos \left[\frac{(2x+1)u\pi}{16} \right]
 \cos \left[\frac{(2y+1)v\pi}{16} \right]
</math>

where
* <math>\ u</math> is the horizontal [[spatial frequency]], for the integers <math>\ 0 \leq u < 8</math>.
* <math>\ v</math> is the vertical spatial frequency, for the integers <math>\ 0 \leq v < 8</math>.
* <math>\alpha(u)</math> and <math>\alpha(v)</math> are normalizing scale factors to make the transformation [[orthonormal]] with <math>
\alpha(i) =
\begin{cases}
 \frac{1}{\sqrt{2}}, & \mbox{if }i=0 \\
 1, & \mbox{otherwise}
\end{cases}
</math> 
* <math>\ g_{x,y}</math> is the pixel value at coordinates <math>\ (x,y)</math>
* <math>\ G_{u,v}</math> is the DCT coefficient at coordinates <math>\ (u,v).</math>
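
The formula can be evaluated directly, as in the following Python sketch. It is a deliberately naive transcription of the double sum, not an optimized implementation (practical codecs use fast factorizations), and the names <code>alpha</code> and <code>dct_8x8</code> are chosen for this example only. The nested lists are indexed as <code>g[y][x]</code> and <code>G[v][u]</code>, so that rows correspond to the vertical index, as in the matrices shown in this section.

```python
import math

# Level-shifted example block from above, stored as g[y][x].
g = [
    [-76, -73, -67, -62, -58, -67, -64, -55],
    [-65, -69, -73, -38, -19, -43, -59, -56],
    [-66, -69, -60, -15,  16, -24, -62, -55],
    [-65, -70, -57,  -6,  26, -22, -58, -59],
    [-61, -67, -60, -24,  -2, -40, -60, -58],
    [-49, -63, -68, -58, -51, -60, -70, -53],
    [-43, -57, -64, -69, -73, -67, -63, -45],
    [-41, -49, -59, -60, -63, -52, -50, -34],
]


def alpha(i):
    """Normalizing scale factor: 1/sqrt(2) for index 0, otherwise 1."""
    return 1.0 / math.sqrt(2.0) if i == 0 else 1.0


def dct_8x8(g):
    """Direct evaluation of the two-dimensional type-II DCT of an 8x8 block."""
    G = [[0.0] * 8 for _ in range(8)]
    for v in range(8):          # vertical spatial frequency
        for u in range(8):      # horizontal spatial frequency
            total = 0.0
            for y in range(8):
                for x in range(8):
                    total += (g[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            G[v][u] = 0.25 * alpha(u) * alpha(v) * total
    return G


G = dct_8x8(g)
print(round(G[0][0], 3))  # DC coefficient: sum of all 64 entries divided by 8
```

For <math>u = v = 0</math> both cosines equal 1, so the DC coefficient reduces to the sum of the 64 shifted samples divided by 8, which for this block is <math>-3323/8 = -415.375</math>.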

Performing this transformation on the matrix above gives the following (rounded to two decimal places):

:<math>G=
\begin{array}{c}
u \\
\longrightarrow \\
\left[
\begin{array}{rrrrrrrr}
-415.38 & -30.19 & -61.20 & 27.24 & 56.12 & -20.10 & -2.39 & 0.46 \\
4.47 & -21.86 & -60.76 & 10.25 & 13.15 & -7.09 & -8.54 & 4.88 \\
-46.83 & 7.37 & 77.13 & -24.56 & -28.91 & 9.93 & 5.42 & -5.65 \\
-48.53 & 12.07 & 34.10 & -14.76 & -10.24 & 6.30 & 1.83 & 1.95 \\
12.12 & -6.55 & -13.20 & -3.95 & -1.87 & 1.75 & -2.79 & 3.14 \\
-7.73 & 2.91 & 2.38 & -5.94 & -2.38 & 0.94 & 4.30 & 1.85 \\
-1.03 & 0.18 & 0.42 & -2.42 & -0.88 & -3.02 & 4.12 & -0.66 \\
-0.17 & 0.14 & -1.07 & -4.19 & -1.17 & -0.10 & 0.50 & 1.68
\end{array}
\right]
\end{array}
\Bigg\downarrow v.
</math>

Note the top-left corner entry, which has a rather large magnitude. This is the [[DC bias|DC]] coefficient (also called the constant component), which is proportional to the average value of the entire block. The remaining 63 coefficients are the AC coefficients (also called the alternating components).<ref>{{cite web|url=http://forum.doom9.org/showthread.php?p=184647#post184647|title=DC / AC Frequency Questions - Doom9's Forum|website=forum.doom9.org|access-date=16 October 2017|archive-date=17 October 2017|archive-url=https://web.archive.org/web/20171017042422/http://forum.doom9.org/showthread.php?p=184647#post184647|url-status=live}}</ref> The advantage of the DCT is its tendency to concentrate most of the signal in one corner of the result, as may be seen above. The quantization step that follows accentuates this effect while simultaneously reducing the overall size of the DCT coefficients, resulting in a signal that is easy to compress efficiently in the entropy coding stage.

The DCT temporarily increases the bit depth of the data, since the DCT coefficients of an 8-bit/component image can require 11 or more bits to store, depending on the fidelity of the DCT calculation. This may force the codec to temporarily use 16-bit numbers to hold these coefficients, doubling the size of the image representation at this point; these values are typically reduced back to 8-bit values by the quantization step. The temporary increase in size at this stage is not a performance concern for most JPEG implementations, since typically only a very small part of the image is stored in full DCT form at any given time during the image encoding or decoding process.
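
The 11-bit figure can be illustrated with a small calculation for the DC coefficient, sketched below in Python. The sketch assumes exact arithmetic and examines only the DC term of the normalized transform given above; the helper names <code>dc_coefficient</code> and <code>signed_bits</code> are hypothetical. Implementations that keep fractional bits in fixed-point arithmetic need correspondingly more storage.

```python
def dc_coefficient(value):
    """DC coefficient of an 8x8 block in which every shifted sample equals `value`.

    From the DCT formula with u = v = 0:
    (1/4) * (1/sqrt(2)) * (1/sqrt(2)) * 64 * value = 8 * value.
    """
    return 0.25 * 0.5 * 64 * value


def signed_bits(lo, hi):
    """Smallest two's-complement width whose range contains [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


lowest = dc_coefficient(-128)   # block of all-minimum samples
highest = dc_coefficient(127)   # block of all-maximum samples
print(lowest, highest, signed_bits(lowest, highest))
```

With level-shifted samples in <math>[-128, 127]</math>, the DC coefficient ranges from <math>-1024</math> to <math>1016</math>, which fits in an 11-bit signed integer but not in a 10-bit one.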