Editing
JPEG 2000
(section)
==Technical discussion==
The aim of JPEG 2000 is not only to improve compression performance over JPEG but also to add (or improve) features such as scalability and editability. JPEG 2000's improvement in compression performance relative to the original JPEG standard is actually rather modest and should not ordinarily be the primary consideration for evaluating the design. JPEG 2000 supports both very low and very high compression rates; the ability of the design to handle a very large range of effective bit rates is one of its strengths. For example, to reduce the number of bits for a picture below a certain amount, the advisable thing to do with the first JPEG standard is to reduce the resolution of the input image before encoding it. That is unnecessary when using JPEG 2000, because its multi-resolution decomposition structure already does this automatically.

The following sections describe the algorithm of JPEG 2000. According to the [[Royal Library of the Netherlands]], "the current JP2 format specification leaves room for multiple interpretations when it comes to the support of ICC profiles, and the handling of grid resolution information".<ref>{{cite journal |url=http://www.dlib.org/dlib/may11/vanderknijff/05vanderknijff.html|doi=10.1045/may2011-vanderknijff |title=JPEG 2000 for Long-term Preservation: JP2 as a Preservation Format |first=Johan |last=van der Knijff |journal=D-Lib Magazine |date=2011 |volume=17 |issue=5/6 |doi-access=free }}</ref>

===Color components transformation===
Initially, images have to be transformed from the RGB [[color space]] to another color space, leading to three ''components'' that are handled separately. There are two possible choices:
# The Irreversible Color Transform (ICT) uses the well-known BT.601 [[YCbCr#JPEG_conversion|YC{{sub|B}}C{{sub|R}}]] color space. It is called "irreversible" because it has to be implemented in floating- or fixed-point arithmetic and causes round-off errors. The ICT shall be used only with the 9/7 wavelet transform.
# The Reversible Color Transform (RCT) uses a modified YUV color space (almost the same as [[YCoCg|YC{{sub|G}}C{{sub|O}}]]) that does not introduce quantization errors, so it is fully reversible. Proper implementation of the RCT requires that numbers be rounded as specified; the transform cannot be expressed exactly in matrix form. The RCT shall be used only with the 5/3 wavelet transform. The transformations are:
::<math>
\begin{array}{rcl} Y &=& \left\lfloor \dfrac{R+2G+B}{4} \right\rfloor ; \\ C_B &=& B - G ; \\ C_R &=& R - G ; \end{array} \qquad \begin{array}{rcl} G &=& Y - \left\lfloor \dfrac{C_B + C_R}{4} \right\rfloor ; \\ R &=& C_R + G ; \\ B &=& C_B + G. \end{array}
</math>
If R, G, and B are normalized to the same precision, then the numeric precision of C{{sub|B}} and C{{sub|R}} is one bit greater than the precision of the original components. This increase in precision is necessary to ensure reversibility. The [[chrominance]] components can be, but do not necessarily have to be, downscaled in resolution; in fact, since the wavelet transformation already separates images into scales, downsampling is more effectively handled by dropping the finest wavelet scale. This step is called ''multiple component transformation'' in JPEG 2000 terminology, since its usage is not restricted to the [[RGB color model]].<ref>{{Cite web |title=T.800 : Information technology - JPEG 2000 image coding system: Core coding system |url=https://www.itu.int/rec/T-REC-T.800-201511-S/en |access-date=2021-03-19 |website=ITU.int }}</ref>

===Tiling===
After color transformation, the image is split into so-called ''tiles'': rectangular regions of the image that are transformed and encoded separately. Tiles can be any size, and it is also possible to consider the whole image as one single tile. Once the size is chosen, all tiles have the same size (except, optionally, those on the right and bottom borders).
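The reversible color transform above can be illustrated with a short sketch in plain Python (integer arithmetic only; the function names are illustrative, not from any particular library). Python's floor division matches the floor operation in the equations, including for negative chroma values:

```python
def rct_forward(r, g, b):
    # Forward RCT: integer-only, so it is exactly invertible.
    y = (r + 2 * g + b) // 4   # floor division matches the spec's floor
    cb = b - g                 # chroma components need one extra bit
    cr = r - g
    return y, cb, cr

def rct_inverse(y, cb, cr):
    # Inverse RCT: recovers the original samples exactly.
    g = y - (cb + cr) // 4
    r = cr + g
    b = cb + g
    return r, g, b

# Round-trip check on an arbitrary 8-bit pixel.
assert rct_inverse(*rct_forward(200, 120, 45)) == (200, 120, 45)
```

Note that the round trip is exact even though the forward transform discards fractional bits in Y: the same floored term is reconstructed from C{{sub|B}} and C{{sub|R}} in the inverse, which is why the transform cannot be written as an invertible real-valued matrix.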
Dividing the image into tiles is advantageous in that the decoder needs less memory to decode the image, and it can opt to decode only selected tiles to achieve a partial decoding of the image. The disadvantage of this approach is that the quality of the picture decreases due to a lower [[peak signal-to-noise ratio]]. Using many tiles can create a blocking effect similar to the older [[JPEG]] 1992 standard.

===Wavelet transform===
[[File:Wavelet Bior2.2.svg|thumb|[[Cohen-Daubechies-Feauveau wavelet|CDF]] 5/3 wavelet used for lossless compression]]
[[File:Jpeg2000 2-level wavelet transform-lichtenstein.png|thumb|256px|An example of the wavelet transform that is used in JPEG 2000. This is a 2nd-level CDF 9/7 [[wavelet transform]].]]
These tiles are then [[wavelet transform|wavelet-transformed]] to an arbitrary depth, in contrast to JPEG 1992, which uses an 8×8 block-size [[discrete cosine transform]]. JPEG 2000 uses two different [[wavelet]] transforms:
# ''irreversible'': the [[Cohen-Daubechies-Feauveau wavelet|CDF]] 9/7 wavelet transform (developed by [[Ingrid Daubechies]]).<ref name="Unser">{{cite journal |last1 = Unser |first1 = M. |last2=Blu |first2=T. |title = Mathematical properties of the JPEG2000 wavelet filters |journal=IEEE Transactions on Image Processing |year = 2003 |volume=12 |issue=9 |pages=1080–1090 |doi=10.1109/TIP.2003.812329 |pmid=18237979 |bibcode=2003ITIP...12.1080U |s2cid=2765169 |url=https://pdfs.semanticscholar.org/6ed4/dece8b364416d9c390ba53df913bca7fb9a6.pdf |archive-url=https://web.archive.org/web/20191013222932/https://pdfs.semanticscholar.org/6ed4/dece8b364416d9c390ba53df913bca7fb9a6.pdf |archive-date=2019-10-13 }}</ref> It is said to be "irreversible" because it introduces quantization noise that depends on the precision of the decoder.
# ''reversible'': a rounded version of the biorthogonal Le Gall–Tabatabai (LGT) 5/3 wavelet transform<ref>{{cite web |last=Sullivan |first=Gary |title=General characteristics and design considerations for temporal subband video coding |publisher=[[Video Coding Experts Group]] |website=[[ITU-T]] |date=8–12 December 2003 |url=https://www.itu.int/wftp3/av-arch/video-site/0312_Wai/VCEG-U06.doc |access-date=13 September 2019}}</ref><ref name="Unser" /><ref>{{cite book |last=Bovik |first=Alan C. |title=The Essential Guide to Video Processing |year=2009 |publisher=[[Academic Press]] |isbn=9780080922508 |page=355 |url=https://books.google.com/books?id=wXmSPPB_c_0C&pg=PA355 }}</ref> (developed by Didier Le Gall and Ali J. Tabatabai).<ref>{{cite conference |last1=Le Gall |first1=Didier |last2=Tabatabai |first2=Ali J. |title=Sub-band coding of digital images using symmetric short kernel filters and arithmetic coding techniques |conference=ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing |date=1988 |pages=761–764 |volume=2 |doi=10.1109/ICASSP.1988.196696 |s2cid=109186495 }}</ref> It uses only integer coefficients, so the output does not require rounding (quantization) and therefore introduces no quantization noise. It is used in lossless coding.
The wavelet transforms are implemented by the [[lifting scheme]] or by [[convolution]].

===Quantization===
After the wavelet transform, the coefficients are scalar-[[Quantization (image processing)|quantized]] to reduce the number of bits needed to represent them, at the expense of quality. The output is a set of integers which have to be encoded bit by bit. The parameter that controls the final quality is the quantization step: the greater the step, the greater the compression and the loss of quality. A quantization step of 1 performs no quantization (it is used in lossless compression).

===Coding===
<!-- [[EBCOT]] redirects here.
-->
The result of the previous process is a collection of ''sub-bands'' which represent several approximation scales. A sub-band is a set of ''coefficients'': [[real numbers]] which represent aspects of the image associated with a certain frequency range as well as a spatial area of the image.

The quantized sub-bands are split further into ''precincts'', rectangular regions in the wavelet domain. They are typically sized so that they provide an efficient way to access only part of the (reconstructed) image, though this is not a requirement.

Precincts are split further into ''code blocks''. Code blocks lie in a single sub-band and have equal sizes, except those located at the edges of the image. The encoder encodes the bits of all quantized coefficients of a code block, starting with the most significant bits and progressing to less significant bits, by a process called the ''EBCOT'' scheme. ''EBCOT'' stands for ''Embedded Block Coding with Optimal Truncation''. In this encoding process, each [[bit plane]] of the code block is encoded in three so-called ''coding passes'': first the bits (and signs) of insignificant coefficients with significant neighbors (i.e., with 1-bits in higher bit planes), then the refinement bits of significant coefficients, and finally the coefficients without significant neighbors. The three passes are called the ''Significance Propagation'', ''Magnitude Refinement'', and ''Cleanup'' passes, respectively. In lossless mode all bit planes have to be encoded by the EBCOT, and no bit planes can be dropped.

The bits selected by these coding passes then get encoded by a context-driven binary [[Arithmetic coding|arithmetic coder]], namely the binary MQ-coder (as also employed by [[JBIG2]]). The context of a coefficient is formed by the state of its eight neighbors in the code block. The result is a bit-stream that is split into ''packets'', where a ''packet'' groups selected passes of all code blocks from a precinct into one indivisible unit.
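The assignment of coefficients to the three coding passes can be sketched as follows. This is a deliberately simplified, one-dimensional illustration (real code blocks are two-dimensional and use the full 8-neighborhood, and the actual pass order interleaves with the arithmetic coder); the function name is illustrative:

```python
def classify_passes(coeffs, p):
    """Assign each coefficient to an EBCOT coding pass for bit plane p.

    Simplified 1-D sketch: "significant" means a 1-bit has already
    appeared in a bit plane higher than p.
    """
    significant = [abs(c) >> (p + 1) != 0 for c in coeffs]
    passes = []
    for i in range(len(coeffs)):
        if significant[i]:
            # Already significant: this plane's bit is a refinement bit.
            passes.append("magnitude_refinement")
        else:
            # Insignificant: check the (1-D) neighbors for significance.
            neighbors = significant[max(0, i - 1):i] + significant[i + 1:i + 2]
            if any(neighbors):
                passes.append("significance_propagation")
            else:
                passes.append("cleanup")
    return passes
```

For example, with magnitudes `[9, 2, 0, 12]` at bit plane 2, the coefficients 9 and 12 are already significant (they have 1-bits above plane 2) and fall into the refinement pass, while their insignificant neighbors fall into the significance-propagation pass.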
Packets are the key to quality scalability: packets containing less significant bits can be discarded to achieve lower bit rates at the cost of higher distortion. Packets from all sub-bands are then collected in so-called ''layers''. How the packets are built up from the code-block coding passes, and thus which packets a layer will contain, is not defined by the JPEG 2000 standard; in general, however, a codec will try to build layers in such a way that the image quality increases monotonically with each layer and the image distortion shrinks from layer to layer. Thus, layers define the progression by image quality within the codestream.

The problem is then to find the optimal packet length for all code blocks that minimizes the overall distortion while making the generated bitrate equal the demanded bit rate. While the standard does not define a procedure for this form of [[rate–distortion optimization]], the general outline is given in one of its many appendices: for each bit encoded by the EBCOT coder, the improvement in image quality, defined as mean square error, is measured; this can be implemented by a simple table-lookup algorithm. Furthermore, the length of the resulting codestream is measured. This forms, for each code block, a graph in the rate–distortion plane, giving image quality over bitstream length. The optimal selection of the truncation points, and thus of the packet build-up points, is then given by defining critical ''slopes'' of these curves and picking all those coding passes whose curve in the rate–distortion graph is steeper than the given critical slope. This method can be seen as a special application of the method of [[Lagrange multiplier]]s for optimization problems under constraints: the Lagrange multiplier, typically denoted by λ, turns out to be the critical slope, the constraint is the demanded target bitrate, and the value to optimize is the overall distortion.
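The slope-threshold selection of truncation points can be sketched for a single code block as follows. This is a minimal illustration, not the standard's procedure: it assumes the candidate (rate, distortion) points already lie on their convex hull, which a real implementation must enforce first, and the function name is illustrative:

```python
def select_truncation(points, lam):
    """Pick a truncation point for one code block given slope threshold lam.

    points: list of (rate, distortion) pairs at candidate truncation
    points, in increasing-rate / decreasing-distortion order, starting
    with (0, D0) for "nothing coded". Returns the index of the last
    point whose incremental rate-distortion slope exceeds lam.
    """
    best = 0
    for k in range(1, len(points)):
        dr = points[k][0] - points[best][0]   # extra bits spent
        dd = points[best][1] - points[k][1]   # distortion removed
        if dr > 0 and dd / dr > lam:
            best = k                          # keep passes up to this point
    return best
```

A large λ keeps only the steepest (most profitable) coding passes, yielding a short, high-distortion stream; sweeping λ downward admits more passes until the target bitrate is met, which is exactly the Lagrangian trade-off described above.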
Packets can be reordered almost arbitrarily in the JPEG 2000 bit-stream; this gives the encoder as well as image servers a high degree of freedom. Already-encoded images can be sent over networks at arbitrary bit rates by using a layer-progressive encoding order. On the other hand, color components can be moved back in the bit-stream, and lower resolutions (corresponding to low-frequency sub-bands) could be sent first for image previewing. Finally, spatial browsing of large images is possible through appropriate tile or partition selection. None of these operations requires re-encoding, only byte-wise copy operations.{{Citation needed|date=December 2022}}

===Compression ratio===
[[File:Lichtenstein jpeg2000 difference.png|thumb|225px|This image shows the (accentuated) difference between an image saved as JPEG 2000 (quality 50%) and the original.]]
[[File:Comparison between JPEG, JPEG 2000, JPEG XR and HEIF.png|thumb|225px|Comparison of JPEG, JPEG 2000, [[JPEG XR]], and [[HEIF]] at similar file sizes.]]
Compared to the previous JPEG standard, JPEG 2000 delivers a typical compression gain in the range of 20%, depending on the image characteristics. Higher-resolution images tend to benefit more, as JPEG 2000's exploitation of spatial redundancy can contribute more to the compression process. In very low-bitrate applications, studies have shown JPEG 2000 to be outperformed<ref>{{cite web |last=Halbach |first=Till |title=Performance Comparison: H.26L Intra Coding vs. JPEG2000 |date=July 2002 |url=http://etill.net/papers/jvt-d039.pdf |access-date=2008-04-22 |archive-url=https://web.archive.org/web/20110723120739/http://etill.net/papers/jvt-d039.pdf |archive-date=2011-07-23 }}</ref> by the intra-frame coding mode of H.264.

===Computational complexity and performance===
JPEG 2000 is much more demanding than the original JPEG standard in terms of computational complexity.
Tiling, the color component transform, the discrete wavelet transform, and quantization can be performed quite quickly, but the entropy codec is time-consuming and quite complicated: EBCOT context modelling and the arithmetic MQ-coder take most of the execution time of a JPEG 2000 codec. On CPUs, fast JPEG 2000 encoding and decoding relies on SIMD instructions (AVX/SSE) and on multithreading that processes each tile in a separate thread. The fastest JPEG 2000 solutions utilize both CPU and GPU power to reach high performance.<ref>{{Cite web |last=Fastvideo |title=JPEG2000 Performance Benchmarks on GPU |date=September 2018 |url=https://www.fastcompression.com/benchmarks/benchmarks-j2k.htm |access-date=2019-04-26 }}</ref><ref>{{Cite web |last=Comprimato |title=JPEG2000 Performance Specification |date=September 2016 |url=http://comprimato.com/specifications/ |access-date=2016-09-01 |archive-date=2016-09-13 |archive-url=https://web.archive.org/web/20160913132744/http://comprimato.com/specifications/ |url-status=dead }}</ref>
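The reversible 5/3 lifting steps mentioned in the wavelet-transform section can be sketched in one dimension as follows. This is a minimal illustration assuming an even-length signal, with symmetric boundary extension; function names are illustrative. The predict step forms the high-pass (detail) coefficients and the update step forms the low-pass (approximation) coefficients, all in integer arithmetic, so the inverse undoes each step exactly:

```python
def dwt53_forward(x):
    """One level of the reversible LGT 5/3 lifting transform (1-D sketch)."""
    n = len(x)
    assert n % 2 == 0 and n >= 2
    # Predict: detail d[i] = odd sample minus floor-average of even neighbors.
    d = []
    for i in range(n // 2):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]  # symmetric edge
        d.append(x[2 * i + 1] - (x[2 * i] + right) // 2)
    # Update: approximation s[i] = even sample plus rounded detail average.
    s = []
    for i in range(n // 2):
        dl = d[i - 1] if i > 0 else d[0]                      # symmetric edge
        s.append(x[2 * i] + (dl + d[i] + 2) // 4)
    return s, d

def dwt53_inverse(s, d):
    """Exact inverse: undo the update step, then the predict step."""
    n = 2 * len(s)
    x = [0] * n
    for i in range(n // 2):
        dl = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (dl + d[i] + 2) // 4
    for i in range(n // 2):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = d[i] + (x[2 * i] + right) // 2
    return x

# Integer lifting is reversible by construction: each step is undone exactly.
assert dwt53_inverse(*dwt53_forward([10, 12, 14, 12])) == [10, 12, 14, 12]
```

A 2-D transform applies this filter to the rows and then the columns of each tile, and recursing on the low-pass output yields the multi-resolution decomposition.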