Which are the main blocks that make JPEG2000 work?
What does ‘image compression’ mean?
Why do we need a lower bit rate?
What does ‘quantization’ mean?
HVS: what is it? Which is its purpose in image compression?
Which is the meaning of lossy and lossless compression?
Why is a lossy compression acceptable?
When is a lossless compression preferable?
How can the compression of an image be measured?
Does the peak signal to noise ratio represent a reliable index of image quality?
What is the ‘concept of irrelevance’?
What is the embedded bit-stream?
Which is the difference between an embedded and a non-embedded bit-stream?
What is the meaning of ‘higher quality image’?
MSE: what is it? Is it reliable? Is there an alternative?
How much does the PSNR improve if we codify an image using JPEG2000 instead of JPEG?
Why do we make a distinction between bit-stream and code-stream?
How is the structure of an image that has to be coded organized?
Which is the disadvantage of JPEG baseline compared to JPEG2000?
How does the old JPEG standard work?
How does the new standard behave with reference to this?
What is the meaning of ‘Spatial Random Access’?
Why can we say that JPEG2000 has a progressive way of working?
What about JPEG2000 performance compared to the other compression standards?
Which are the main blocks that make JPEG2000 work?
The JPEG2000 way of working
can be described using functional blocks:

In the one-dimensional DWT an analysis block is used. It is made up of two digital filters: a high-pass one and a low-pass one. In JPEG2000 both filters always have an odd number of taps; in practice the (5,3) and (9,7) pairs are used, where 5 (or 9) is the number of low-pass taps and 3 (or 7) the number of high-pass ones, so the two lengths always differ by two. The output of each filter undergoes a downsampling that halves its number of samples (summed, the two outputs still equal the input length): these samples are called wavelet coefficients.
In order to reconstruct the signal, an upsampling step restores the discarded samples by inserting zeros between consecutive samples; then another pair of (synthesis) filters is applied.
In two dimensions the DWT is applied to each row and then to each column of the image. As a result, the image is divided into subbands marked by their position:

The image itself remains visible only in the 1LL subband, which can be transformed once more and divided into four further subbands: this is called second-level DWT. The process can be repeated until a satisfactory compression level is reached.
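The filter-and-downsample structure described above can be sketched in a few lines of Python. For simplicity the sketch uses the 2-tap Haar filter pair rather than the (5,3)/(9,7) pairs of JPEG2000, assumes even image dimensions, and all function names are ours:

```python
def analysis_1d(signal):
    """Low-pass and high-pass filtering followed by downsampling by 2."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high  # together they have as many samples as the input

def dwt2d(image):
    """Apply the 1-D analysis to each row, then to each column."""
    rows = [analysis_1d(row) for row in image]
    l_half = [r[0] for r in rows]   # horizontally low-pass half
    h_half = [r[1] for r in rows]   # horizontally high-pass half

    def filter_columns(block):
        cols = [analysis_1d(col) for col in zip(*block)]
        lo = [list(r) for r in zip(*[c[0] for c in cols])]
        hi = [list(r) for r in zip(*[c[1] for c in cols])]
        return lo, hi

    ll, lh = filter_columns(l_half)  # 1LL, 1LH
    hl, hh = filter_columns(h_half)  # 1HL, 1HH
    return ll, hl, lh, hh
```

Calling `dwt2d` again on the returned `ll` subband gives the second-level decomposition, exactly as described above.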
What does ‘image compression’ mean?
Compression reduces the number of bits used to represent the coded image to fewer than would be needed to store the image in its original format. This number is variable and depends on the compression algorithm used.
JPEG2000 supports two different kinds of compression: lossy and
lossless.
Why do we need a lower bit rate?
A lower bit rate means a lower occupation of the transmission channel and of the storage media. It is therefore advisable for a standard to provide bit rates low enough for the user’s requirements. In this respect JPEG2000 achieves much better results than traditional JPEG.
What does ‘quantization’ mean?
Quantization maps a continuous set onto a discrete one; it is therefore the main step in image digitization. Typically one byte is used for each chromatic component.
In coding it arises from the need for a range of integers to encode: both the encoder and the decoder must use a number of bits large enough to represent the quantization factor of each subband.
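As an illustration, the quantizer JPEG2000 applies to the wavelet coefficients in its lossy path is a dead-zone scalar quantizer; the sketch below shows it together with a mid-point reconstruction rule, which is one common decoder choice rather than a mandated one:

```python
import math

def quantize(c, delta):
    """Dead-zone scalar quantizer: q = sign(c) * floor(|c| / delta).
    Coefficients smaller than delta in magnitude fall into the dead zone
    around zero and are mapped to 0."""
    q = math.floor(abs(c) / delta)
    return q if c >= 0 else -q

def dequantize(q, delta):
    """Mid-point reconstruction of a quantization index."""
    if q == 0:
        return 0.0
    r = (abs(q) + 0.5) * delta
    return r if q > 0 else -r
```

The step size `delta` is the quantization factor mentioned above; it is chosen per subband, and both encoder and decoder must agree on it.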
HVS: what is it? Which is its purpose in image compression?
HVS is the acronym of Human Visual System. Its analysis and study are a basic step in the design of image compression standards (for still pictures as well as video sequences of any kind), whose aim is to improve the perceived quality of the image as much as possible. The sensitivity of the human eye changes with respect to the different image components (brightness level, spatial frequency, color, contrast, temporal frequency, etc.). In the perception of gray levels, the eye does not behave linearly but approximately as a cube root. Therefore, if we applied a linear quantization, the perceived quality would be worse than with a quantization based on a cube-root function. From a mathematical point of view, it is as though we had two filters whose transfer functions are each other’s inverse: putting them in cascade, we obtain a uniform perception of gray levels. Similar considerations can be made for every aspect of an image.
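The cube-root companding idea can be sketched as a toy example (illustrative only, not the processing chain of any standard; the parameter names are ours): the gray level is mapped through the cube root, quantized uniformly, then mapped back through the cube, so that dark levels get finer quantization steps.

```python
def compand(value, levels=256, peak=255.0):
    """Quantize a gray level after a cube-root mapping, then undo the
    mapping: dark levels get finer steps, which matches the eye's roughly
    cube-root response better than a plain linear quantizer."""
    t = (value / peak) ** (1 / 3)               # forward transfer function
    q = round(t * (levels - 1)) / (levels - 1)  # uniform quantization
    return q ** 3 * peak                        # inverse transfer function
```

The two transfer functions (cube root and cube) are exactly the cascaded inverse pair described above.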
In an image it is highly likely that pixels
close to each other have a similar color. By exploiting this property, called
spatial redundancy, it is possible to reduce the image size considerably, using
a smaller number of bits.
Which is the meaning of lossy and lossless compression?
A lossy compression involves a loss of information compared to the original signal; therefore the original signal cannot be reconstructed from one that has been compressed with a lossy algorithm. Conversely, a lossless compression involves no loss of information and allows a perfect reconstruction of the original signal after decompression.
A lossy compression is acceptable mainly for three reasons:
Ø The human visual system is often able to tolerate a loss in image quality without compromising the ability to perceive the contents of the scene.
Ø Very often the digital input of the lossy compression algorithm is, in its turn, an imperfect representation of the real image: the image is indeed reconstructed from a finite number of samples, which can take only discrete values.
Ø A lossless compression could never grant the high compression levels that can be obtained with a lossy compression and that are necessary for image storage and/or distribution applications.
When is a lossless compression preferable?
Ø In medical applications, when it is difficult to choose an acceptable error level for the image representation.
Ø When dealing with highly structured (typically non-natural) images, such as texts or graphics, which are usually handled more easily by lossless compression techniques.
Ø In all those applications where images are frequently modified or re-compressed: with a lossy coding, the accumulated errors would make the image quality absolutely unacceptable.
How can the compression of an image be measured?
Image compression is measured by means of an index called Compression Ratio:

CR = (N1 · N2 · B) / ||c||

where N1 and N2 are the number of pixels along the horizontal and vertical dimensions of the image, B is the number of bits required to represent each pixel, and ||c|| is the length in bits of the final bit-stream obtained after the compression.
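The definition translates directly into code (function and parameter names are ours):

```python
def compression_ratio(n1, n2, bits_per_pixel, compressed_bits):
    """CR = (N1 * N2 * B) / ||c||, with ||c|| measured in bits."""
    return (n1 * n2 * bits_per_pixel) / compressed_bits

# A 512x512, 8-bit grayscale image compressed to 32768 bytes:
cr = compression_ratio(512, 512, 8, 32768 * 8)  # -> 8.0
```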
Does the peak signal to noise ratio represent a reliable index of image quality?
The PSNR can give an approximate index of image quality, but by itself it cannot compare the quality of two different images: an image with a lower PSNR may indeed be perceived as being of better quality than one with a higher signal-to-noise ratio.
What is the ‘concept of irrelevance’?
The concept of irrelevance applies to the information which is unnecessary for the exact reconstruction of an image. There are two different kinds of irrelevance:
Ø Visual irrelevance, when the image contains details that cannot be perceived by the human eye.
Ø Irrelevance due to the fact that, for specific applications (for instance medical and military ones), only some specific regions of the image may be of interest rather than the image as a whole. The parts that do not belong to the region of interest are therefore classified as irrelevant.
What is a code-block?
A code-block is a rectangular group (tile) of coefficients belonging to the same subband. Each such rectangle is identified by four parameters: two coordinates locate its starting point, and two more give its size.
What is DPCM?
Differential Pulse Code Modulation (DPCM) is a predictive coding in which, instead of the real value of a sample, the difference from a value predicted from the previous samples is transmitted.
DPCM is typically applied after the wavelet transform and before the entropy coding, in order to optimize the performance of the compression algorithm.
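A minimal sketch of DPCM with the simplest possible predictor, "predict the previous sample" (real coders use more elaborate predictors):

```python
def dpcm_encode(samples):
    """Send each sample as the difference from the previous one."""
    prev, residuals = 0, []
    for s in samples:
        residuals.append(s - prev)
        prev = s
    return residuals

def dpcm_decode(residuals):
    """Rebuild the samples by accumulating the residuals."""
    prev, samples = 0, []
    for r in residuals:
        prev += r
        samples.append(prev)
    return samples
```

Since neighboring samples tend to be similar, the residuals are small and cluster around zero, which is exactly what makes the subsequent entropy coding effective.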
What is the embedded bit-stream?
The embedded bit-stream is created by filtering the image through successive quality layers: the first layer (Q0) codes the different code-blocks using a prefixed length l0 while minimizing the distortion. The following layers contain additional contributions for every code-block, reducing the distortion further.
Each quality layer contains contributions from each code-block.
In order to have an embedded bit-stream that allows scalability in distortion, enough information must be included to identify the contributions that each quality layer makes to the code-blocks.
Which is the difference between an embedded and a non-embedded bit-stream?
The most important advantage of an embedded bit-stream over a non-embedded one is that the target compression level can be chosen after the compression of the source: this is called an a-posteriori approach. It therefore makes scalability in distortion possible.
What is the chromatic space?
The chromatic space is a vector space used to represent colors. The most important chromatic spaces include:
· CMYK (Cyan, Magenta, Yellow, Key black);
What is the meaning of ‘higher quality image’?
There are different approaches to image quality evaluation and they are
based on objective and subjective parameters.
The quality of a compressed image is evaluated by analyzing the
difference between the original and the compressed one.
One of the most widely used parameters for the evaluation of image
quality is the MSE.
MSE: what is it? Is it reliable? Is there an alternative?
MSE means ‘Mean Square Error’. It is the classical error estimate given by the equation:

MSE = (1 / (M · N)) · Σ_{i=1..M} Σ_{j=1..N} [x(i,j) − x̂(i,j)]²

where M and N are the image dimensions, x(i,j) is a pixel of the original image and x̂(i,j) the corresponding pixel of the compressed one.
It is a widely used criterion, but often not representative enough. JND and Hosaka plots may be used as alternatives to the MSE.
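Both the MSE and the PSNR derived from it follow directly from their definitions; a sketch for 8-bit grayscale images stored as lists of rows (PSNR = 10 · log10(peak² / MSE), with peak = 255 for 8 bits):

```python
import math

def mse(original, compressed):
    """Mean square error over an M x N image given as lists of rows."""
    m, n = len(original), len(original[0])
    return sum((original[i][j] - compressed[i][j]) ** 2
               for i in range(m) for j in range(n)) / (m * n)

def psnr(original, compressed, peak=255.0):
    """Peak signal-to-noise ratio in dB."""
    e = mse(original, compressed)
    return float('inf') if e == 0 else 10 * math.log10(peak ** 2 / e)
```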
Many new applications of the JPEG2000 standard require compressed data to be transmitted over channels with different error characteristics. For example, ‘wireless’ communications are subject to ‘random’ channel errors, while Internet communications are subject to data losses due to traffic congestion.
As we know, JPEG2000 codes each code-block independently; this is an advantage for error resilience, since any error in the portion of the bit-stream corresponding to a code-block remains confined inside it.
How much does the PSNR improve if we codify an image using JPEG2000 instead of JPEG?
The following graph, which refers to the same image coded with JPEG baseline and with JPEG2000, shows that even with the same number of bits per pixel the difference between the two standards is remarkable. For example, at 0.125 bpp JPEG2000 performs about 6 dB better; this confirms the higher visual quality of the new standard.

Figure 1: JPEG baseline (0.125 bpp)
Figure 2: JPEG2000 (0.125 bpp)

What is the Canvas?
The Canvas is a reference grid (that is, a coordinate system) that JPEG2000 uses to describe the action of the different components (typically the color components) on the image.
This grid makes component management easier, above all because the components can have different sizes. Each sample of each component is assigned a specific location inside the Canvas.
Why do we make a distinction between bit-stream and code-stream?
The two expressions are not equivalent: they refer to different objects.
Ø The bit-stream is the sequence of bits derived from the coding of a set of symbols; it typically refers to the sequence of bits produced by the coding of each code-block.
Ø The code-stream, on the other hand, is a set made up of several bit-streams, to which the information necessary to decode and reconstruct the image is associated. This additional information may refer, for example, to the locations of particular bit-streams, or to the transformation, quantization and coding methods.
How is the structure of an image that has to be coded organized?
First of all, JPEG2000 makes use of the DWT (discrete wavelet transform), whereas JPEG baseline is based on the DCT (discrete cosine transform). The DWT iteratively transforms a signal into two or more filtered and downsampled signals corresponding to different frequency bands. Essentially, the image is divided into four spectral bands (LL, LH, HL, HH), obtained by low-pass (L) or high-pass (H) filtering along the horizontal and vertical directions. Afterwards, the LL band is divided once more into four further subbands; the operation is repeated a number of times that depends on the number of levels ‘d’ of the DWT. The resulting samples are grouped into blocks (the code-blocks), which are the units on which the coding of the bit-stream is carried out. Obviously, the larger the block size, the higher the possibility of exploiting the redundancy among the samples (that is, the higher the coding efficiency). On the other hand, larger blocks allow less flexibility in managing the information inside the final bit-stream.
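As a small worked example of this decomposition, a d-level DWT produces 3·d + 1 subbands. The sketch below lists their names and sizes, assuming (as in JPEG2000's partitioning) that halving an odd dimension rounds up; the function name is ours:

```python
def subband_sizes(width, height, levels):
    """Names and sizes of the 3*d + 1 subbands of a d-level DWT: each
    level halves the current LL band in both directions (rounding up)."""
    bands = []
    w, h = width, height
    for level in range(1, levels + 1):
        w, h = (w + 1) // 2, (h + 1) // 2
        for name in ('HL', 'LH', 'HH'):
            bands.append((f'{level}{name}', w, h))
    bands.append((f'{levels}LL', w, h))  # only the final LL band remains
    return bands
```

For a 512x512 image and d = 2 this yields seven subbands: 1HL, 1LH, 1HH at 256x256 and 2HL, 2LH, 2HH, 2LL at 128x128.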
Which is the disadvantage of JPEG baseline compared to JPEG2000?
Basically, JPEG2000 introduces a change of approach in image management. Previous systems and standards had quite a passive role and could be seen simply as input-output filters. Decisions about image quality and compression ratio were made during the compression stage; during decompression only the quality, resolution and dimensions decided at compression time were available. A resolution reduction is possible, but the image has to be decompressed, subsampled and compressed again, with the risk of further deterioration of the image quality. In this respect the behavior of JPEG2000 is much more flexible, allowing a larger variety of alternatives during the decompression stage: for example, it is possible to extract directly from the initial code-stream another code-stream that represents the image at a lower resolution, or only a part of the image, without any decompression being necessary.
How does the old JPEG standard work?
Traditional JPEG can work according to four different philosophies: serial, progressive, hierarchical and lossless.
In particular, the progressive working mode refers to the possibility of decompressing the bit-stream to a quality poorer than the one decided during the compression stage, and to the fact that the most important bits inside the bit-stream are arranged so that they are seen first by the decoder (a particularly suitable feature on narrow-band links).
The adjective ‘hierarchical’, instead, underlines the fact that inside the bit-stream there are some bits used to create a basic image, usually small and/or of low resolution. Starting from this image, the further bits are used to obtain an image of higher resolution or size.
Lossless coding, on the contrary, uses prediction techniques. Some of these approaches can be applied at the same time: for instance, it is possible to combine the progressive and hierarchical methods in the same code-stream.
How does the new standard behave with reference to this?
JPEG2000 manages to combine in a single format all the advantages that these four approaches can offer. In substance, JPEG2000 supports both scalable resolution and scalable quality. During the decompression stage (when we are going to open the image) we can decide which level of quality and resolution the image should have. These properties thus cease to be fixed at compression time and can be chosen at any moment according to the specific needs.
What is the meaning of ‘Spatial Random Access’?
It is the possibility of accessing the various ‘regions’ of the image at random. This property goes beyond visualizing only a limited area of the image, as it even makes it possible to extract the gray levels from the color image, or to extract any texts and graphics that lie on the image. What is important is that in no case is it necessary to decode the whole code-stream (the whole image), but only the bytes relative to the component or the region of the image we are interested in.
Why can we say that JPEG2000 has a progressive way of working?
JPEG2000 has a progressive behavior in four different directions: Quality, Resolution, Spatial Location and Components.
Ø As far as quality is concerned, we have already said that the quality of the decoded image is proportional to the amount of received data. A basic image is built first and its quality is then incremented successively.
Ø The same holds for resolution: as the amount of received data increases, the resolution (the image size) is doubled each time in both directions (vertical and horizontal).
Ø As far as Spatial Location is concerned, progressivity in this direction expresses the possibility of receiving the image in an approximately raster way (a raster scan reads a line until its end, then moves on to a new line). This possibility is useful in particular for applications with limited memory or, for example, for coding: scanners with little memory can generate the bit-stream directly, without needing a buffer in which to store the complete image before coding it.
Ø The fourth dimension of progressivity refers to the image components. JPEG2000 supports images with up to 16384 components. In practice, images may have one component (grayscale), three components (RGB) or four components (CMYK); possible texts and graphics superimposed on the image can be added to these. Thanks to component progressivity it is possible to decode, for example, the gray-level component first, then the color ones, and finally the one concerning possible superimposed texts or images.
Yes, it is: bit rates being equal, JPEG2000 assures a better quality than traditional JPEG by means of an essentially more efficient algorithm.
What about JPEG2000 performance compared to the other compression standards?
A group of experts carried out technical comparisons, using various kinds of images (natural or not, computer generated, scanned text), among JPEG2000, lossless JPEG (L-JPEG), JPEG-LS (a lossless compression method based on the LOCO-I algorithm, very simple and with high performance), MPEG-4 Visual Texture Coding (which uses the DWT), PNG (a format derived from GIF) and SPIHT (also based on the DWT, but not yet a standard).
In the case of lossless compression, the best results were obtained by JPEG-LS, except for computer-generated images and texts, for which PNG gives better results. JPEG2000 is versatile and efficient with any kind of image and offers results similar to those of the other (less flexible) standards.
In the case of lossy compression, progressive or not, JPEG2000 with the irreversible Daubechies 9/7 filters is better than all the other standards in terms of PSNR on the average MSE, whatever the bit rate: as the compression ratio increases, the quality of JPEG2000 improves in comparison with the others.
In terms of visual quality, JPEG2000 was compared with JPEG only. The result shows that, for equal visual quality, JPEG needs a larger bit rate (from 13% to 112% depending on the number of bits), and the difference grows as the compression increases.
As for the computing time a PC needs for compression and decompression (a Pentium III 550 with 512 MB of SDRAM under Linux was used), JPEG-LS (less complex and more efficient than the others) and JPEG obtain better results than JPEG2000 in the lossless case, but JPEG2000 behaves consistently with every compression technology and at any bit rate.
Finally, when transmitting images affected by errors (for any bit rate, with error rates from 10^-6 to 10^-4), JPEG2000 proves clearly better than any other standard in terms of quality; for medium-high error rates the image quality remains almost constant as the bit rate increases, while for JPEG it decreases (since the number of decoded bits affected by errors increases).
In conclusion, we can say that the strongest point of JPEG2000 is its flexibility: it adapts to every kind of use with good performance under any aspect and for any kind of application, proving to be robust, efficient and fast, and supporting a wide variety of functionalities (lossless and lossy compression, progressive bit-stream, use of ROI). The other standards, on the contrary, are very efficient only within a limited range of action, outperforming the others only for specific aims.