Image and Image Processing
Unit-II
EL-447 (Multimedia Systems and Networks)
Image
What is An Image?
Grayscale image
A grayscale image is a function I(x,y) of the two spatial coordinates of the image plane.
I(x,y) is the intensity of the image at the point (x,y) on the image plane.
We can regard I(x,y) as taking values in ℝ+ = [0, ∞)
We can restrict the image to be bounded by some rectangle [0, a] × [0, b]
I: [0, a] × [0, b] → [0, ∞)
Color image
Can be represented by three functions, R(x,y) for red, G(x,y) for green, and B(x,y) for blue.
Details on color vision and representation will be discussed later.
Classification of Images
Reflection Images
Information primarily about object surfaces
Examples: Optical imaging, radar, sonar, laser
Emission Images:
Information primarily internal to the object
Example: Thermal, infrared, MRI
Absorption images
Information primarily about the internal structure of the object
Examples: X-ray, transmission microscopy, sonic images
Digital image
A sampled and quantized image is called a digital image.
A digital image is an image f(x,y) that has been digitized both in spatial coordinates and in brightness.
The value of f at any point (x,y) is proportional to the brightness (or gray level) of the image at that point.
Digital image
A digital image can be considered a matrix whose row and column indices identify a point in the image and the corresponding matrix element value
identifies the gray level at that point.
Pixel values in highlighted region
Examples of Digital image
A Simple Image Model
Light intensity function:
image refers to a 2D light-intensity function, f(x,y)
the amplitude of f at spatial coordinates (x,y) gives the intensity (brightness) of the image at that point.
light is a form of energy thus f(x,y) must be nonzero and finite.
0 < f(x,y) < ∞
Illumination and reflectance:
the basic nature of f(x,y) may be characterized by 2 components:
Illumination, i(x,y): the amount of source light incident on the scene being viewed.
Reflectance, r(x,y): the amount of light reflected by the objects in the scene.
A Simple Image Model
f (x,y) = i(x,y) r(x,y)
0 < i(x,y) < ∞
determined by the nature of the light source
0 < r(x,y) < 1
determined by the characteristics of the objects in a scene.
bounded between total absorption (0) and total reflectance (1).
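For example, under illumination i(x,y) = 200 and reflectance r(x,y) = 0.5, the recorded intensity is f(x,y) = 200 × 0.5 = 100, which satisfies 0 < f(x,y) < ∞.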
Digital Image Processing?
One picture is worth more than ten thousand words.
Interest in DIP stems from two principal application areas:
Improvement of pictorial information for human interpretation.
Processing of image data for storage, transmission and representation for autonomous machine perception.
What is Image Processing?
Image processing is the subclass of signal processing concerned specifically with pictures.
Improve image quality for human perception and/or computer interpretation.
[Diagram: Image → Image Processing → Better Image]
Fields that deal with images
Computer Graphics: the creation of images.
Image Processing : the enhancement or other manipulation of the image – the
result of which is usually another images.
Computer Vision: the analysis of image
content.
Fields that deal with images
Input \ Output     Image               Description
Image              Image Processing    Computer Vision
Description        Computer Graphics   AI
Computer Vision, Image Processing and
Computer Graphics often work together to
produce amazing results.
Applications of Image Processing
Improvement of pictorial information for human interpretation.
Processing of image data for storage,
transmission, and representation for autonomous machine perception
There are limitless applications of image processing. Some examples are:
Radiation from the Electromagnetic spectrum
Acoustic, geological imaging, Radar Imaging
Medical Imaging : X-ray, Ultrasonic, MRI
Industrial inspection, law enforcement,
Computer (synthetic images used for modeling and visualization)
Classification of Image Processing
Low-level : input, output are images
Primitive operations such as image preprocessing to reduce noise, contrast enhancement, and image
sharpening
Mid-level : inputs may be images, outputs are attributes extracted from those images
Segmentation
Description of objects
Classification of individual objects
High-level : image analysis (making sense of an ensemble of recognized objects)
Digital Image Processing (DIP)
Image: Any 2-D signal/data.
DIP: processing of a two-dimensional picture by a digital computer.
An image is captured by a sensor (such as a monochrome or color TV camera) and digitized.
If the output of the camera or sensor is not already in digital form, an analog-to-digital converter digitizes it.
Image Acquisition
Camera
A camera consists of two parts:
• A lens that collects the appropriate type of radiation emitted from the object of interest and forms an image of the real object.
• A semiconductor device, the so-called charge-coupled device (CCD), which converts the irradiance at the image plane into an electrical signal.
Image Acquisition
Frame Grabber
A frame grabber only needs circuits to digitize the electrical signal from the imaging sensor and store the image in the memory (RAM) of the computer.
Image Enhancement
To bring out detail that is obscured, or simply to highlight certain features of interest in an image.
Image Restoration
Improving the appearance of an image
Tend to be based on mathematical or probabilistic models of image degradation.
[Figure: examples of a distorted image and the restored image]
Image Compression
Reducing the storage required to save an image or the bandwidth required to
transmit it.
Ex. JPEG (Joint Photographic Experts Group) image compression standard.
Wavelet: Foundation for representing images in various degrees of resolution.
Used in image data compression and pyramidal representation (images are subdivided
successively into smaller regions)
Image Segmentation
computer tries to separate objects from the image
background.
It is one of the most difficult tasks in DIP.
A rugged segmentation
procedure brings the process a long way toward successful
solution of an image problem.
Output of the segmentation stage is raw pixel data,
constituting either the boundary of a region or all the points in
the region itself.
Representation & Description
Representation: making a decision whether the data should be represented as a boundary or as a complete region.
Boundary representation focuses on external shape characteristics, such as corners and inflections.
Region representation focuses on internal properties, such as texture or skeletal shape.
Recognition and Interpretation
Recognition: the process that assigns a label to an object based on the information provided by its descriptors.
Interpretation: assigning meaning to an ensemble of recognized objects.
Colour models
Color Perception
Motivation for Color Image Processing
Color is a powerful descriptor that often simplifies object identification and segmentation.
Humans can identify thousands of color shades and intensities, compared to only about two dozen shades of gray (important in manual image analysis).
Color Representation for images and video
How the physical spectrum of a scene is transformed into RGB components, and how these components are transformed back to a physical spectrum at the display
Cones vs. Rods
3 types of cones (for color)
1 type of rod (night vision, no color)
Light is a part of EM wave
What is color?
Color is the perceptual result of light having wavelength 400 nm to 700 nm that is incident upon the retina.
“Power distribution exists in the physical world, but color exists only in the eye and the brain”
Light is a part of EM wave
•Perceived color depends on spectral content (wavelength composition) e.g., 700nm ~ red.
•“Spectral color”: a light with very narrow bandwidth.
•A light with equal energy in all visible bands appears white.
Illuminating and Reflecting Light
Illuminating sources (primary light):
emit light (e.g. the sun, light bulb, TV monitors)
perceived color depends on the emitted freq.
follows additive rule
» R+G+B=White
Reflecting sources (Secondary light):
reflect an incoming light (e.g. the color dye, matte surface, cloth)
perceived color depends on reflected freq (=emitted freq -absorbed freq.)
follows subtractive rule
» R+G+B=Black
Color Perception
The color that humans perceive in an object = the light reflected from the object.
[Diagram: illumination source → scene → reflection → eye]
Eye vs. Camera
Camera Components          Eye Components
Lens                       Lens, cornea
Shutter                    Iris, pupil
Film                       Retina
Cable to transfer images   Optic nerve to send the information to the brain
Source:
http://www.macula/anatomy/retina frame.html
Human perception of color
Retina contains photo receptors
Cones: day vision, can perceive color tone
Red, green, and blue cones
65% of cones are sensitive to red light, 33% to green, and 2% to blue light.
Tri-receptor theory of color vision
Rods: night vision, perceive brightness only
Color sensation is characterized by
Luminance (brightness)
Chrominance (Hue and Saturation Together)
Hue (color tone)
specifies the color tone (redness, greenness, etc.).
depends on the peak wavelength.
Saturation (color purity)
describes how pure the color is.
depends on the spread (bandwidth) of the light spectrum.
reflects how much white light is added.
Color Mixing
Primary colors for illuminating sources:
Red, Green, Blue (RGB)
A color monitor works by exciting red, green, and blue phosphors using separate electron guns.
Primary colors for reflecting sources (also known as
secondary colors):
Cyan, Magenta, Yellow (CMY)
Color printer works by using cyan, magenta, yellow and black (CMYK) dyes.
Color Mixing (RGB Vs CMY)
Color complements
[Figure: complements on the color circle; color hue specification]
Color Representation Models
A color model is a specification of a 3-D coordinate system and a subspace within that system where each color is represented by a single point.
Many color models are in use depending upon the application.
Models based on primary colors (hardware Oriented)
RGB (used in color monitors and video cameras).
CMY, CMYK (used in color printers)
Models based on luminance and chrominance (Application Oriented)
HSI (hue, saturation and intensity): used in developing image processing algorithms based on HVS.
HSV (hue, saturation and value): similar to HSI.
YIQ (used in NTSC color TV) (I, Q are two chrominance)
YUV (used in digital color TV, video coding)
RGB Color Model
Based on Cartesian coordinate system with color subspace as a cube.
RGB are three corners at three axis, cyan, magenta and yellow are at three other corners, black at origin and white is at corner farthest from the origin.
• used in color monitors and digital cameras.
• all color values are normalized inside a unit cube.
• 8 bits for each color component, 24 bits/pixel.
• a 1K × 1K display needs a display buffer of 3 MB.
• total of about 16 million colors {(2^8)^3 = 16,777,216}.
Examples
CMY Color Model
C (cyan), M (magenta) and Y (yellow) are secondary colors of light, or primary colors of pigments.
Each color is represented by these three components.
A cyan-coated surface does not reflect red light.
Equal amounts of cyan, magenta, and yellow produce black. In practice, this produces a muddy-looking black.
To produce true black, a fourth color, black, is added, giving the CMYK color model.
Used in color printers and copiers.
RGB to CMY conversion
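The standard conversion (assuming RGB values normalized to [0, 1]) is simply the complement: C = 1 - R, M = 1 - G, Y = 1 - B. A minimal Python sketch:

import numpy as np

def rgb_to_cmy(rgb):
    # rgb: values normalized to [0, 1]; CMY is the complement of RGB.
    return 1.0 - np.asarray(rgb, dtype=float)

def cmy_to_rgb(cmy):
    # the conversion is its own inverse
    return 1.0 - np.asarray(cmy, dtype=float)

print(rgb_to_cmy([1.0, 0.0, 0.0]))   # pure red -> C=0, M=1, Y=1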
CMY Color Model (Example)
[Figure: original image and its cyan, magenta, and yellow components]
HSI or HSV color models
Each color is specified in terms of its Hue (H), Saturation (S) and intensity (I) or value (V).
This model is sometimes referred to as HSV instead of HSI.
The main advantages of this model are:
Chrominance (H, S) and luminance (I) components are decoupled.
Hue and saturation are intimately related to the way the human visual system perceives color.
In short, the RGB model is suited for image color
generation, whereas the HSI model is suited for image color description.
HSI or HSV color models
It is related to the RGB model as follows:
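The original slide gives the formulas as an image; as an illustration, here is a minimal Python sketch of the commonly used RGB-to-HSI conversion (assumed, with R, G, B normalized to [0, 1]; H is returned in degrees):

import math

def rgb_to_hsi(r, g, b):
    i = (r + g + b) / 3.0
    s = 0.0 if (r + g + b) == 0 else 1.0 - 3.0 * min(r, g, b) / (r + g + b)
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-12  # avoid divide-by-zero for gray pixels
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    h = theta if b <= g else 360.0 - theta
    return h, s, i

print(rgb_to_hsi(1.0, 0.0, 0.0))   # pure red -> H ~ 0, S = 1, I = 1/3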
HSI or HSV color models
Converting Color from HSI to RGB
HSI or HSV color models (Examples)
[Figure: original image and its hue, saturation, and intensity components]
YIQ or YUV color models
Each color is represented in terms of a luminance component (Y) and two chrominance (color) components: the in-phase (I) and quadrature (Q) components, or the U and V components.
YIQ is used in United States commercial TV broadcasting National Television System Committee (NTSC ) system.
YUV is used in most of the European and Asian TV broadcasting (PAL system).
Used for maintaining transmission efficiency and to provide compatibility with monochrome TV.
The Y component provides all the video information required by a monochrome TV receiver/monitor.
The main advantage of these models is that the luminance and chrominance components are decoupled and can be
processed separately.
YIQ or YUV color models
The I signal lies 33° counterclockwise from +(R-Y), where the eye has maximum color resolution:
I = 0.74(R-Y) - 0.27(B-Y)
The Q signal lies 33° counterclockwise from +(B-Y), where the eye has minimal color resolution:
Q = 0.48(R-Y) + 0.41(B-Y)
YIQ or YUV color models
In the PAL system, the weighted (B-Y) and (R-Y) signals are modulated without being given the 33° phase shift.
Chrominance information is represented in terms of U and V, where
U=0.493(B-Y) & V=0.877(R-Y)
YIQ is related to the RGB model by:
Y = 0.299 R + 0.587 G + 0.114 B
I = 0.596 R - 0.275 G - 0.321 B
Q = 0.212 R - 0.523 G + 0.311 B
YIQ or YUV color models (Examples)
[Figure: original image and its Y, U, and V components]
Comparison Example-1
Comparison Example-2
Image Compression & JPEG
Image Compression
What is Image Compression?
Reduction of the amount of data required to represent a digital image by removal of redundant data.
Transforming a 2-D pixel array into a statistically uncorrelated data set
Why Compression?
Important in data storage and data transmission
Examples:
Progressive transmission of images (Internet)
Video coding (HDTV, teleconferencing)
Digital libraries and image databases
Remote sensing
Medical imaging
Why Need Compression?
Savings in storage and transmission
multimedia data (esp. image and video) have large data volume.
difficult to send real-time uncompressed video over current network.
Accommodate relatively slow storage devices
they do not allow playing back uncompressed multimedia data in real time
1x CD-ROM transfer rate ~ 150 kB/s
320 × 240 × 24-bit color video at 24 frames/s: bit rate ~ 5.5 MB/s
=> 36 seconds needed to transfer 1-sec uncompressed video from CD
Example: Storing An Encyclopedia
500,000 pages of text (2 kB/page) ~ 1 GB => 2:1 compression
3,000 color pictures (640 × 480 × 24 bits) ~ 3 GB => 15:1
500 maps (640 × 480 × 16 bits = 0.6 MB/map) ~ 0.3 GB => 10:1
60 minutes of stereo sound (176 kB/s) ~ 0.6 GB => 6:1
30 animations, on average 2 minutes long (640 × 320 × 16 bits × 16 frames/s = 6.5 MB/s) ~ 23.4 GB => 50:1
50 digitized movies, on average 1 minute long (640 × 480 × 24 bits × 30 frames/s = 27.6 MB/s) ~ 82.8 GB => 50:1
Requires a total of 111.1 GB of storage capacity without compression
Reduced to 2.96 GB with compression
Lossless vs. Lossy compression
Lossless (or information-preserving) compression:
Images can be compressed and restored without any loss of information (e.g., medical imaging, satellite imaging)
Lossless compression tools
Entropy coding : Huffman, Arithmetic, Lempel-Ziv, run-length
Predictive coding: reduce the dynamic range to code
Transform: enhance energy compaction
Lossy compression:
Perfect recovery is not possible but provides a large data compression (e.g., TV signals, teleconferencing)
Lossy compression tools
Discarding and thresholding
Quantization: Scalar quantization and vector quantization
Data Redundancy
Data are the means by which information is conveyed
Various amounts of data may be used to
represent the same amount of information
Data redundancy: if n1 and n2 denote the number of information-carrying units in two data sets that represent the same information, the relative data redundancy of the first data set is:
RD = 1 - 1/CR, where CR = n1/n2 is the compression ratio.
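For example, if a representation uses n1 = 100,000 bits and an equivalent compressed representation uses n2 = 20,000 bits, then CR = 5 and RD = 1 - 1/5 = 0.8, i.e., 80% of the data in the first representation is redundant.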
Data Redundancy
In digital image compression there exist three basic data redundancies:
Inter-pixel or spatial redundancy
Psychovisual redundancy
Statistical redundancy
Inter-pixel Redundancy
Second image shows high correlation between pixels 45 and 90 samples apart
Adjacent pixels of both images are highly correlated
Interpixel redundancy: the value of any given pixel can be reasonably predicted from the values of its neighbors; as a consequence, any pixel carries a small amount of information
Interpixel redundancy can be reduced through mappings (e.g., differences between adjacent pixels)
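A minimal sketch of such a difference mapping on one row of pixel values (illustrative, not from the slides):

import numpy as np

row = np.array([100, 102, 103, 103, 104, 110], dtype=int)
diff = np.empty_like(row)
diff[0] = row[0]                 # first pixel sent as-is
diff[1:] = row[1:] - row[:-1]    # remaining pixels sent as small differences
print(diff)                      # [100   2   1   0   1   6] -> much smaller dynamic range

# perfect reconstruction by cumulative sum
print(np.cumsum(diff))           # [100 102 103 103 104 110]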
Psychovisual Redundancy
The eye does not respond with equal sensitivity to all visual information
Certain information has less relative importance than other information in normal visual processing
(psychovisually redundant)
It can be eliminated without significantly impairing the quality of image perception
Transform Coding
Transform-based Image Coder
JPEG: Still Image
Compression Standard
JPEG: “Joint Photographic Experts Group”
Formally: ISO/IEC JTC1/SC29/WG1
Work commenced in the mid-1980s.
Draft international standard 1991.
Widely used for image exchange, WWW, and digital photography.
ISO: International Organization for Standardization
IEC: International Electrotechnical Commission
JTC1: Joint ISO/IEC Technical Committee (Information Technology)
SC29: Sub-committee 29 (Coding of Audio, Picture, Multimedia and Hypermedia Information)
WG1: Working Group 1 (JBIG, JPEG)
What is JPEG?
The Joint Photographic Experts Group (JPEG), under both the International Organization for Standardization (ISO) and the International Telecommunication Union Telecommunication Standardization Sector (ITU-T)
– www.jpeg.org
Has published several standards
JPEG: lossy coding of continuous tone still images
Based on DCT
JPEG-LS: lossless and near lossless coding of continuous tone still images
Based on predictive coding and entropy coding
JPEG2000: scalable coding of continuous tone still images (from lossy to lossless)
Based on wavelet transform
Image Compression: JPEG
Summary:
JPEG Compression
DCT
Quantization
Zig-Zag Scan
RLE and DPCM
Entropy Coding
JPEG Modes
Sequential
Lossless
Progressive
Hierarchical
Sources:
The JPEG website:
http://www.jpeg.org
JPEG Belongs to Hybrid Coding Schemes
[Diagram: a lossy coding stage followed by a lossless coding stage (RLE, Huffman or arithmetic coding)]
Why JPEG?
The compression ratio of lossless methods (e.g., Huffman, Arithmetic, LZW) is not high enough for image and video compression.
JPEG uses transform coding; it is largely based on the following observations:
Observation 1: A large majority of useful image contents change relatively slowly across images, i.e., it is unusual for intensity values to alter up and down several times in a small area, for example, within an 8 x 8 image block.
Translated into the spatial frequency domain, this implies that, generally, lower spatial frequency components contain more information than the high frequency components, which often correspond to less useful details and noise.
Observation 2: Experiments suggest that humans are more immune to loss of higher spatial frequency components than loss of lower
frequency components.
The JPEG Standard
Contains several modes:
Baseline system (what is commonly known as JPEG!): lossy
Can handle gray scale or color images (8bit)
Extended system: lossy
Can handle higher precision (12 bit) images, providing progressive streams, etc.
Lossless version
Baseline system
Each color component is divided into 8x8 blocks
For each 8x8 block, three steps are involved:
Block DCT
Perceptual-based quantization
Variable length coding: Runlength and Huffman coding
JPEG: Encoder and Decoder
JPEG Coding
[Block diagram: the Y, Cb, Cr components are processed in 8 × 8 blocks f(i,j); each block goes through DCT → F(u,v) → quantization (quantization tables) → Fq(u,v) → zig-zag scan → DPCM (DC coefficient) and RLC (AC coefficients) → entropy coding (coding tables) → output bitstream with header, tables, and data]
Steps Involved:
• Discrete Cosine Transform of each 8×8 pixel array: f(x,y) → F(u,v)
• Quantization using a table or using a constant
• Zig-Zag scan to exploit redundancy
• Differential Pulse Code Modulation(DPCM) on the DC component and Run length Coding of the AC components
• Entropy coding (Huffman) of the final output
JPEG: Basic Algorithm
JPEG: Image Partitioning
Pre-Processing
Color Space Conversion:
Human visual system has less resolution in color than in intensity.
As a result, the original color images are converted into intensity and color channels. One such transformation is RGB->YIQ
Divide image into 8x8 blocks of pixels.
Shift values from [0, 2^P - 1] to [-2^(P-1), 2^(P-1) - 1]
e.g., if P = 8, shift [0, 255] to [-128, 127]
DCT requires range be centered around 0.
Values in the 8×8 pixel blocks are spatial values, and there are 64 sample values in each block
[ Y ]   [ 0.299  0.587  0.114 ] [ R ]
[ I ] = [ 0.596 -0.275 -0.321 ] [ G ]
[ Q ]   [ 0.212 -0.523  0.311 ] [ B ]
Forward DCT
Convert from spatial to frequency domain
convert intensity function into weighted sum of periodic basis (cosine) functions
identify bands of spectral information that can be thrown away without loss of quality
Intensity values in each color plane often
change slowly
1D Forward DCT
Given a list of n intensity values I(x), where x = 0, …, n-1
Compute the n DCT coefficients:
F(u) = sqrt(2/n) · C(u) · Σ_{x=0}^{n-1} I(x) · cos[ (2x+1)uπ / (2n) ],   u = 0, 1, ..., n-1
where C(u) = 1/√2 for u = 0, and C(u) = 1 otherwise.
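A direct, illustrative Python implementation of this formula (an O(n²) sketch, assumed; practical codecs use fast algorithms):

import math

def dct_1d(signal):
    # Direct evaluation of the 1-D forward DCT formula above.
    n = len(signal)
    out = []
    for u in range(n):
        c = 1.0 / math.sqrt(2.0) if u == 0 else 1.0
        s = sum(signal[x] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                for x in range(n))
        out.append(math.sqrt(2.0 / n) * c * s)
    return out

print([round(v, 2) for v in dct_1d([255] * 8)])  # constant signal -> only F(0) is nonzero (721.25)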
Visualization of 1D DCT Basic Functions (or Basis Images)
F(0) F(1) F(2) F(3) F(4) F(5) F(6) F(7)
1D Inverse DCT
Given a list of n DCT coefficients F(u), where u = 0, …, n-1
Compute the n intensity values:
I(x) = Σ_{u=0}^{n-1} sqrt(2/n) · C(u) · F(u) · cos[ (2x+1)uπ / (2n) ],   x = 0, 1, ..., n-1
where C(u) = 1/√2 for u = 0, and C(u) = 1 otherwise.
Extend DCT from 1D to 2D
Perform 1D DCT on each row of the block
Again for each column of 1D coefficients
alternatively, transpose the matrix and perform DCT on the rows
2-D DCT
Images are two-dimensional; How do you perform 2-D DCT?
Two series of 1-D transforms result in a 2-D transform as demonstrated in the figure below
[Diagram: 8×8 block f(i,j) → 1-D row-wise DCT → 1-D column-wise DCT → 8×8 block F(u,v)]
F(0,0) is called the DC component and the rest of the F(u,v) are called AC components
Equations for 2D DCT
Forward DCT:
F(u,v) = (2 / sqrt(nm)) · C(u) · C(v) · Σ_{x=0}^{n-1} Σ_{y=0}^{m-1} I(x,y) · cos[ (2x+1)uπ / (2n) ] · cos[ (2y+1)vπ / (2m) ]
Inverse DCT:
I(x,y) = Σ_{u=0}^{n-1} Σ_{v=0}^{m-1} (2 / sqrt(nm)) · C(u) · C(v) · F(u,v) · cos[ (2x+1)uπ / (2n) ] · cos[ (2y+1)vπ / (2m) ]
where C(u), C(v) = 1/√2 for u, v = 0, and 1 otherwise.
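A minimal sketch of the separable 2-D DCT as two 1-D passes, assuming scipy's orthonormal DCT-II (scipy.fft.dct with norm='ortho') is available:

import numpy as np
from scipy.fft import dct, idct

def dct_2d(block):
    # Row-wise 1-D DCT followed by column-wise 1-D DCT (separable transform).
    return dct(dct(block, axis=1, norm='ortho'), axis=0, norm='ortho')

def idct_2d(coeffs):
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

block = np.full((8, 8), 100.0)        # a flat 8x8 block
F = dct_2d(block)
print(round(F[0, 0], 1))              # DC component = 800.0; all AC components ~ 0
print(np.allclose(idct_2d(F), block)) # True: the transform is invertible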
Visualization of 2D DCT Basis Functions
[Figure: 8×8 DCT basis images; frequency increases from left to right and from top to bottom]
Coefficient Differentiation
F(0,0) :is called DC coefficient
includes the lowest frequency in both directions
Determines fundamental color of the block
F(0,1) …. F(7,7)
are called AC coefficients
Their frequency is non-zero in one or both directions
After DCT compression, only a few DCT coefficients have large values in each 8x8 block
These coefficients are floating-point numbers.
Quantization of DCT Coefficients
Each coefficient is converted into an integer using quantization.
The choice of quantization step:
• Large quantization step – large rounding errors, smaller quantized values
• Small quantization step – small rounding errors, larger quantized coefficients
• The same quantization step for all coefficients? – The human visual system is more sensitive to relatively low frequency changes in images, which implies that for high frequency coefficients larger quantization steps can be used without causing noticeable image distortion.
Quantization of DCT Coefficients
Why? -- To reduce number of bits per sample
F’(u,v) = round(F(u,v)/q(u,v))
Example: 101101 = 45 (6 bits).
Truncate to 4 bits: 1011 = 11 (compare 11 × 4 = 44 against 45).
Truncate to 3 bits: 101 = 5 (compare 5 × 8 = 40 against 45).
Note that the more bits we truncate, the more precision we lose.
Quantization error is the main source of the Lossy Compression.
Uniform Quantization:
q(u,v) is a constant.
Non-uniform Quantization -- Quantization Tables
Eye is most sensitive to low frequencies (upper left corner in frequency matrix), less sensitive to high frequencies (lower right corner)
Custom quantization tables can be put in image/scan header.
JPEG Standard defines two default quantization tables, one each for luminance and chrominance.
JPEG Quantization Matrix
The low frequency coefficients (upper-left corner) have smaller quantization steps, therefore more accurately encoded
 3  5  7  9 11 13 15 17
 5  7  9 11 13 15 17 19
 7  9 11 13 15 17 19 21
 9 11 13 15 17 19 21 23
11 13 15 17 19 21 23 25
13 15 17 19 21 23 25 27
15 17 19 21 23 25 27 29
17 19 21 23 25 27 29 31
JPEG Quantization Using the Quantization Matrix
Before quantization:
 92   3  -9  -7   3  -1   0   2
-39 -58  12  17  -2   2   4   2
-84  62   1 -18   3   4  -5   5
-52 -36 -10  14 -10   4  -2   0
-86 -40  49  -7  17  -6  -2   5
-62  65 -12  -2   3  -8  -2   0
-17  14 -36  17 -11   3   3  -1
-54  32  -9  -9  22   0   1   3

After quantization:
 30   0  -1   0   0   0   0   0
 -7  -8   1   1   0   0   0   0
-12   6   0  -1   0   0   0   0
 -5  -3   0   0   0   0   0   0
 -7  -3   3   0   0   0   0   0
 -4   4   0   0   0   0   0   0
 -1   0  -1   0   0   0   0   0
 -3   1   0   0   0   0   0   0

After reconstruction (dequantization):
 90   0  -7   0   0   0   0   0
-35 -56   9  11   0   0   0   0
-84  54   0 -13   0   0   0   0
-45 -33   0   0   0   0   0   0
-77 -39  45   0   0   0   0   0
-52  60   0   0   0   0   0   0
-15   0 -19   0   0   0   0   0
-51  19   0   0   0   0   0   0
Default Normalized
Quantization Table in JPEG
Actual step size for C(i,j): Q(i,j) = QP*q(i,j)
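A minimal sketch of quantization and dequantization (illustrative; the matrix below is generated from the pattern q(i,j) = 3 + 2(i+j) of the example table above, and QP is the scaling parameter):

import numpy as np

# Example quantization matrix from the slide: q(i, j) = 3 + 2*(i + j)
q = np.array([[3 + 2 * (i + j) for j in range(8)] for i in range(8)], dtype=float)

def quantize(F, qp=1.0):
    # F'(u,v) = round(F(u,v) / Q(u,v)), with actual step size Q(u,v) = QP * q(u,v)
    return np.round(F / (qp * q)).astype(int)

def dequantize(Fq, qp=1.0):
    return Fq * (qp * q)

F = np.zeros((8, 8)); F[0, 0], F[1, 0], F[0, 1] = 92.0, -39.0, 3.0
Fq = quantize(F)
print(Fq[0, 0], Fq[1, 0], Fq[0, 1])   # 31 -8 1: coarse integer indices
print(dequantize(Fq)[0, 0])           # 93.0: the rounding error is the source of loss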
Zig-Zag Ordering of DCT Coefficients
Zig-Zag ordering: converting a 2D matrix into a 1D array, so that the frequency (horizontal+vertical) increases in this order, and the coefficient variance decreases in this order.
Zig-Zag Scan
Why? -- to group low frequency coefficients in top of vector and high frequency coefficients at the bottom
Maps 8 x 8 matrix to a 1 x 64 vector
[Diagram: 8×8 coefficient block mapped by the zig-zag scan to a 1×64 vector]
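A minimal sketch (assumed) of generating the zig-zag scan order and applying it to an 8×8 block:

def zigzag_order(n=8):
    # Traverse the anti-diagonals, alternating direction, returning (row, col) pairs.
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

def zigzag_scan(block):
    return [block[i][j] for i, j in zigzag_order(len(block))]

print(zigzag_order()[:6])   # [(0,0), (0,1), (1,0), (2,0), (1,1), (0,2)]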
Coding of Quantized DCT Coefficients
DC coefficient: Predictive coding
– The DC value of the current block is predicted from that of the previous block, and the error is coded using
Huffman coding.
AC Coefficients: Runlength coding
– Many high frequency AC coefficients are zero after first few low frequency coefficients
– Runlength Representation:
• Ordering coefficients in the zig-zag order
• Specify how many zeros before a non-zero value
• Each symbol = (length of zero, non zero value) – Code all possible symbols using Huffman coding
JPEG: Differential Coding of DC
DPCM on DC Components
The DC component value in each 8x8 block is large and varies across blocks, but is often close to that in the previous block.
Differential Pulse Code Modulation (DPCM): Encode the difference between the current and previous 8x8 block. Remember, smaller number -> fewer bits
[Figure: each 8×8 block yields a 1×64 zig-zag vector; the DC values of successive blocks (e.g., 45, 54, 48, ...) are replaced by their differences from the previous block (45, 9, -6, ...)]
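A minimal sketch of DPCM on the DC values (the values below are illustrative, with the predictor initialized to 0):

def dpcm_encode(dc_values):
    prev, diffs = 0, []
    for dc in dc_values:
        diffs.append(dc - prev)   # code the difference to the previous block's DC
        prev = dc
    return diffs

def dpcm_decode(diffs):
    out, prev = [], 0
    for d in diffs:
        prev += d
        out.append(prev)
    return out

print(dpcm_encode([45, 54, 48, 60, 64]))   # [45, 9, -6, 12, 4]
print(dpcm_decode([45, 9, -6, 12, 4]))     # [45, 54, 48, 60, 64]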
Coding of DC Symbols
Example:
– Current quantized DC index: 2
– Previous block DC index: 4
– Prediction error: -2
The prediction error is coded in two parts:
• Which category it belongs to (Table of JPEG Coefficient Coding Categories), and code using a Huffman code (JPEG Default DC Code)
– DC= -2 is in category “2”, with a codeword “100”
• Which position it is in that category, using a fixed length code, length=category number
– “-2” is the number 1 (starting from 0) in category 2, with a fixed length code of “01”.
– The overall codeword is “10001”
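A minimal sketch (assumed) of computing the category (SIZE) and the fixed-length magnitude bits; it reproduces the "-2 → category 2, bits 01" case above:

def category(v):
    # Category (SIZE) = number of bits needed for |v|; category 0 means v == 0.
    return 0 if v == 0 else abs(v).bit_length()

def magnitude_bits(v):
    # Fixed-length part: v itself if non-negative, otherwise v + 2^SIZE - 1.
    size = category(v)
    return format(v if v >= 0 else v + (1 << size) - 1, '0{}b'.format(size)) if size else ''

print(category(-2), magnitude_bits(-2))   # 2 '01' -> with the Huffman prefix '100' gives '10001'
print(category(5), magnitude_bits(5))     # 3 '101'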
JPEG Tables for coding DC
Example: Coding of AC coefficients
For symbol (0,5):
The value '5' is represented in two parts:
• Which category it belongs to (Table of JPEG Coefficient Coding Categories); the “(runlength, category)” symbol is coded using a Huffman code (JPEG Default AC Code)
• AC=5 is in category “3”,
• Symbol (0,3) has codeword “100”
• Which position it is in that category, using a fixed length code, length=category number
• “5” is the number 5 (starting from 0) in category 3, with a fixed length code of “101”.
• The overall codeword for (0, 5) is “100101”
Second symbol (0,9)
• '9' is in category '4'; (0,4) has codeword '1011'; '9' is number 9 in category 4, with codeword '1001' -> the overall codeword for (0,9) is '10111001'
RLE on AC Components
The 1x64 vectors have a lot of zeros in them, more so towards the end of the vector.
Entries toward the end of the vector correspond to higher frequency (DCT) components, which tend to capture less of the content.
Many of them are zero as a result of quantization with the quantization table.
Encode a series of 0s as a (skip,value) pair, where skip is the number of zeros and value is the next non-zero component.
Send (0,0) as end-of-block sentinel value.
[Example: a zig-zagged 1×64 vector segment ... 0 0 0 0 0 1 1 0 0 0 0 0 0 0 2 ... is coded as the (skip, value) pairs (5,1), (0,1), (7,2)]
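A minimal sketch of the (skip, value) run-length coding of the zig-zagged AC coefficients (illustrative; the real standard additionally limits a run to 15 zeros via a special (15, 0) symbol):

def rle_ac(coeffs):
    pairs, zeros = [], 0
    for c in coeffs:
        if c == 0:
            zeros += 1
        else:
            pairs.append((zeros, c))   # (number of preceding zeros, non-zero value)
            zeros = 0
    pairs.append((0, 0))               # end-of-block sentinel
    return pairs

ac = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 2] + [0] * 48
print(rle_ac(ac))                      # [(5, 1), (0, 1), (7, 2), (0, 0)]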
Entropy Coding: AC Components
AC components (range –1023..1023) are coded as (S1, S2) pairs:
S1: (RunLength/SIZE)
RunLength: The length of the consecutive zero values [0..15]
SIZE: The number of bits needed to code the next nonzero AC component’s value. [0-A]
(0,0) is the End_Of_Block for the 8x8 block.
S1 is Huffman coded (see AC code table below)
S2: (Value)
Value: Is the value of the AC component.(refer to size_and_value table)
Run/SIZE   Code Length   Code
0/0        4             1010
0/1        2             00
0/2        2             01
0/3        3             100
0/4        4             1011
0/5        5             11010
0/6        7             1111000
0/7        8             11111000
0/8        10            1111110110
0/9        16            1111111110000010
0/A        16            1111111110000011
1/1        4             1100
1/2        5             11011
1/3        7             1111001
1/4        9             111110110
1/5        11            11111110110
1/6        16            1111111110000100
1/7        16            1111111110000101
1/8        16            1111111110000110
1/9        16            1111111110000111
1/A        16            1111111110001000
...        ...           (more such rows, up to 15/A)
JPEG Tables for Coding AC
(Run, Category) Symbols
JPEG: Example
The DC coefficient is DPCM coded (difference between the DC coefficient of the previous
block and current block)
The AC coefficients are mapped to run-length pairs: (run,value)
(0,5), (0,-3), (0,-1), (0,-2), (0,-3), (0,1), (0,1), (0,-1), (0,-1), (2,1), (0,2), (0,3), (0,-2), (0,1), (0,1), (6,1), (0,1), (1,1), EOB
These are then Huffman coded (codes are
specified in the JPEG scheme)
Decoding
Decoding is the inverse process of the encoding
Recover the quantized coefficient matrix
Recover the coefficient matrix
IDCT
JPEG is lossy because of the quantization process
JPEG: Decoding
Evaluation
To evaluate an image compression algorithm, there are two criteria:
Subjective – the visual quality vs. bit rate
Objective – the rate-distortion curve (peak SNR vs. bit rate)
The mean squared error between the original image I and the reconstructed image I' is
σe² = (1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} [ I(i,j) - I'(i,j) ]²
and the peak signal-to-noise ratio (with Imax = 255 for an 8-bit image) is
PSNR = 10 log10( Imax² / σe² ) dB
[Figure: the difference image when the quality factor is 3]
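A minimal sketch (assumed) of computing the PSNR between an original and a reconstructed 8-bit image:

import numpy as np

def psnr(original, reconstructed, i_max=255.0):
    err = np.asarray(original, dtype=float) - np.asarray(reconstructed, dtype=float)
    mse = np.mean(err ** 2)                      # sigma_e^2; PSNR is infinite if mse == 0
    return 10.0 * np.log10(i_max ** 2 / mse)     # in dB

a = np.full((8, 8), 100.0)
b = a.copy(); b[0, 0] += 5.0                     # introduce a small reconstruction error
print(round(psnr(a, b), 1))                      # ~52.2 dB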
JPEG: Original vs. Reconstructed Image
JPEG Example: Lena Image
JPEG: Example
JPEG: Pros and Cons
JPEG Bitstream
Terminology
Frame – image
Block – 8x8 image block
Segment – a group of blocks
Frame header
Sample precision
(width, height) of image
number of components
unique ID (for each component)
horizontal/vertical sampling factors (for each component)
quantization table to use (for each component)
JPEG Bitstream
Scan header
Number of components in scan
component ID (for each component)
Huffman table to use (for each component)
Misc. (can occur between headers)