IEEE TRANSACTIONS ON IMAGE PROCESSING


Visually Lossless Encoding for JPEG2000

Han Oh, Ali Bilgin, Senior Member, IEEE, and Michael W. Marcellin, Fellow, IEEE

Abstract—Due to exponential growth in image sizes, visually lossless coding is increasingly considered as an alternative to numerically lossless coding, which has limited compression ratios. This paper presents a method of encoding color images in a visually lossless manner using JPEG2000. In order to hide coding artifacts caused by quantization, visibility thresholds (VTs) are measured and used for quantization of subbands in JPEG2000. The VTs are experimentally determined from statistically modeled quantization distortion, which is based on the distribution of wavelet coefficients and the dead-zone quantizer of JPEG2000. The resulting VTs are adjusted for locally changing backgrounds through a visual masking model, and then used to determine the minimum number of coding passes to be included in the final codestream for visually lossless quality under the desired viewing conditions. Codestreams produced by this scheme are fully JPEG2000 Part-I compliant.

Index Terms—human visual system, contrast sensitivity function, visibility threshold, visually lossless coding, JPEG2000

H. Oh and M. W. Marcellin are with the Department of Electrical and Computer Engineering, The University of Arizona, Tucson, AZ 85721 USA (e-mail: [email protected]; [email protected]). A. Bilgin is with the Department of Biomedical Engineering and the Department of Electrical and Computer Engineering, The University of Arizona, Tucson, AZ 85721 USA (e-mail: [email protected]).


I. INTRODUCTION

The contrast sensitivity function (CSF), which models the varying sensitivity of the human eye as a function of spatial frequency, has been widely exploited in many applications, such as image/video quality assessment, perceptual image/video compression, and watermarking. The CSF shows some variation according to the age and visual acuity of the subject, the stimulus used in the experiment, and the viewing conditions [1], [2]. However, it is generally known that the sensitivity has a maximum at 2-6 cycles/degree and is limited at both high and very low frequencies. This is mainly due to the frequency-selective responses of the neurons in the primary visual cortex (also known as V1) and low-pass optical filtering [3]. The sensitivity profile can be measured through psychophysical experiments by finding the just-noticeable points (i.e., visibility thresholds) of the stimulus in a pre-defined contrast unit. A simple way of modeling contrast sensitivity is to use a sinusoidal grating such as the Campbell-Robson chart [4], while a more sophisticated two-dimensional CSF model can be obtained using stimuli that have specific ranges of frequency and orientation. Such stimuli are typically generated by transforms that simulate the V1 receptive field of the human visual system (HVS). The frequency bandwidth to which a V1 neuron responds is known to be proportional to its center frequency, and its orientation bandwidth is distributed broadly around 40°. The V1 receptive fields are well described by

the Gabor filter. The Gabor filter is ideal for space-frequency localization; however, it is difficult to compute. Alternatively, Watson proposed the cortex transform, which is invertible, easy to implement, and able to precisely model V1 receptive fields with adjustable parameter values [5].

Over-complete transforms such as the Gabor filter or the cortex transform are good tools for modeling the HVS, but they are not appropriate for image/video compression, since they increase the number of coefficients to be encoded [6]. As a result, in perceptual image/video compression, complete (critically sampled) transforms such as the discrete cosine transform (DCT) or discrete wavelet transform (DWT), which are already embedded in the codec, are typically used for both modeling and coding. JPEG2000 employs the separable DWT to decompose an image into several subbands, each having different spatial frequency and orientation [7]. As a decomposition method for simulating the HVS, the DWT has many desirable properties: linearity, invertibility, logarithmically spaced spatial frequencies, and four orientations of 0°, 90°, 45°, and 135°. However, two of the orientations, 45° and 135°, are overlapped in the HH subband, and the transform is shift-variant. Despite these shortcomings, the separable DWT is widely used in various vision models [8]-[11], as well as in JPEG2000.

In JPEG2000, a color image which has three color components (red, green, and blue) is typically transformed to an image with one luminance component (Y) and two chrominance components (Cb and Cr) for improved compression performance. Each component is then independently transformed to the wavelet domain by the DWT, and compression is achieved via bit-plane coding of wavelet coefficients quantized by a dead-zone quantizer.

Coding artifacts in JPEG2000 are caused by errors in the reconstructed wavelet coefficients, induced by quantization. These coding artifacts are perceived differently based on the sensitivity of the HVS to the subband where the quantization distortion occurs. In [12], visual weighting factors for wavelet subbands were derived from contrast sensitivity curves obtained from sinusoidal gratings. These weighting factors are included in JPEG2000 Part I as examples that can be used to aid in perceptual coding of images. The Kakadu implementation of JPEG2000 [13] can (optionally) use weighting factors to perform rate allocation in its post-compression rate-distortion optimization process. A visual masking process is described in [7] that can be implemented in a JPEG2000 Part I compatible manner, while JPEG2000 Part II includes visual masking tools based on the work reported in [14]. These tools have all proven useful to improve the visual quality of an image compressed at a given bitrate, but they do not control the quality directly.

Work on direct quality control in wavelet image coding includes the work of Watson et al., where visibility thresholds


(VTs) were measured for individual wavelet subbands using randomly generated uniform noise as a surrogate for quantization distortion [15]. The resulting VTs have since been employed in wavelet-based codecs (e.g., [16]). These codecs produce superior results when compared to codecs based on conventional MSE/PSNR metrics. However, as shown below, uniform noise does not model well the distortion produced by the dead-zone quantizer of JPEG2000. Thus, a direct use of these VTs in JPEG2000 may fail to produce a truly visually lossless encoded image. Indeed, rather than "visually lossless coding," the work of [16] refers to "almost transparent coding." For example, a mean opinion score of about 4.6 (out of 5) is reported for the Goldhill image in that work.

Other efforts to measure VTs [17]-[19] have quantized actual wavelet coefficients obtained from natural images, rather than the synthetic distortions used by Watson. These natural VTs may be more accurate because they take into account spatial correlations between wavelet coefficients. However, these VTs were developed using a uniform quantizer. This paper shows below that, for use in JPEG2000, a distortion model based on a dead-zone quantizer yields superior results.

Other approaches to controlling JPEG2000 image quality directly (rather than controlling bitrate) exist in the literature. For example, the use of post-compression rate-distortion optimization to achieve a target MSE or PSNR is discussed in [7] and in [20]. Predicting MSE/PSNR prior to compression is discussed in [21]. In that work, predictions of MSE are based on target compression ratios and image activity measures, which require significantly lower complexity than compression itself. This prediction could be used to choose a compression ratio (a priori) to achieve a desired MSE/PSNR. However, for the purpose of visually lossless compression, methods that target a given MSE/PSNR will not generally be optimal. This follows from the fact that the MSE/PSNR needed to achieve visually lossless coding is not known, and varies from image to image.

Building on our previous work [22]-[24], this paper proposes a visually lossless JPEG2000 encoder for color images. In the YCbCr color space, VTs are measured for a realistic quantization distortion model, which takes into account the statistical characteristics of wavelet coefficients as well as the dead-zone quantization of JPEG2000. The measured VTs are then adjusted to account for the visual masking effects of the underlying background image. The proposed coding scheme is implemented without violating the JPEG2000 Part I standard and automatically produces visually lossless imagery at lower bitrates than other visually lossless schemes from the literature.

This paper is organized as follows. The quantization distortions that occur in JPEG2000 are modeled in Section II. Section III describes a psychophysical experiment that employs the quantization model to determine VTs. Section IV describes a model for visual masking effects. The proposed visually lossless coding method and results are presented in Section V. Section VI summarizes the work.

II. QUANTIZATION DISTORTION MODELING

Wavelet coefficients are often modeled by a generalized Gaussian distribution [25] with probability density function


Fig. 1. Probability density functions of: (a) wavelet coefficients in HL, LH, and HH subbands (σ^2 = 50); (b) quantization distortions in HL, LH, and HH subbands (σ^2 = 50, ∆ = 5); and (c) quantization distortions in the LL subband (σ^2 = 2000, ∆ = 5). Dashed lines represent the commonly assumed uniform distribution.

(PDF)

    f(y) = \frac{\alpha \cdot A(\alpha,\sigma)}{2\Gamma(1/\alpha)} \exp\left( -\left( A(\alpha,\sigma)\,|y - \mu| \right)^{\alpha} \right)    (1)

where

    A(\alpha,\sigma) = \sigma^{-1} \left[ \frac{\Gamma(3/\alpha)}{\Gamma(1/\alpha)} \right]^{1/2}

and Γ(·) is the Gamma function. The parameters µ and σ are the mean and standard deviation, respectively. The parameter α is called the shape parameter. Wavelet coefficients in the HL, LH, and HH subbands, whose distributions have high-kurtosis, heavy-tailed symmetric densities, are well modeled by the Laplacian distribution with µ = 0 and α = 1, as shown in Fig. 1 (a) for a variance of σ^2 = 50.

JPEG2000 quantizes these wavelet coefficients using the following dead-zone quantizer:

    q = Q(y) = \mathrm{sign}(y) \cdot \left\lfloor \frac{|y|}{\Delta} \right\rfloor    (2)

Here, q is the quantization index, which is subsequently encoded using embedded bit-plane coding. The dequantization procedure in the decoder is expressed by

    \hat{y} = Q^{-1}(q) = \begin{cases} 0 & q = 0 \\ \mathrm{sign}(q)\,(|q| + \delta)\Delta & q \neq 0 \end{cases}    (3)

where δ = 1/2 corresponds to mid-point reconstruction [26]. The resulting quantization distortions in the HL, LH, and HH subbands are not uniformly distributed over the interval (−∆/2, ∆/2), as is commonly assumed.
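A minimal sketch of (2) and (3) (the function names are our own, illustrative choices, not from the paper) makes the dead-zone behavior concrete: coefficients with |y| < ∆ map to q = 0, so their reconstruction error equals the coefficient itself and can approach ±∆:

```python
import numpy as np

def deadzone_quantize(y, delta):
    """Dead-zone quantizer of (2): q = sign(y) * floor(|y| / delta)."""
    return np.sign(y) * np.floor(np.abs(y) / delta)

def dequantize(q, delta, recon_offset=0.5):
    """Dequantizer of (3); recon_offset = 1/2 gives mid-point reconstruction."""
    return np.where(q == 0, 0.0, np.sign(q) * (np.abs(q) + recon_offset) * delta)

# Laplacian coefficients with variance 50 (Laplace scale b = sigma / sqrt(2))
y = np.random.laplace(0.0, np.sqrt(50.0 / 2.0), 100_000)
d = y - dequantize(deadzone_quantize(y, 5.0), 5.0)
print(np.abs(d).max())   # close to delta = 5, not delta / 2
```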


A more appropriate model for the quantization distortions produced by the dead-zone quantizer and mid-point reconstruction is given by the PDF

    f(d) = \begin{cases} \frac{1}{\sqrt{2}\sigma} e^{-\sqrt{2}|d|/\sigma} + \frac{1 - p_1}{\Delta} & 0 \le |d| \le \frac{\Delta}{2} \\ \frac{1}{\sqrt{2}\sigma} e^{-\sqrt{2}|d|/\sigma} & \frac{\Delta}{2} < |d| \le \Delta \\ 0 & \text{otherwise} \end{cases}    (4)

where p_1 = \int_{-\Delta}^{\Delta} \frac{1}{\sqrt{2}\sigma} e^{-\sqrt{2}|y|/\sigma} \, dy = 1 - e^{-\sqrt{2}\Delta/\sigma}. The second term of the first line in (4) follows from assuming that the quantization distortion is uniform only for wavelet coefficients whose magnitudes are larger than ∆ (i.e., coefficients not in the dead-zone). The first term (in the first two lines of (4)) follows from the observation that the quantization errors of the remaining wavelet coefficients (i.e., those coefficients in the dead-zone interval (−∆, ∆)) are equal to the coefficients themselves, since the dead-zone quantizer maps these coefficients to zero. Fig. 1 (b) shows the model of (4) corresponding to ∆ = 5 for the coefficient distribution shown in Fig. 1 (a).

Wavelet coefficients in the LL subband are often modeled by the Gaussian distribution, with µ = 0 and α = 2 in (1). Assuming the standard deviation of the LL subband is large compared to the quantization step size results in the quantization distortion of the LL subband being modeled by

    f(d) = \begin{cases} \frac{1}{\sqrt{12}\sigma} + \frac{1 - p_2}{\Delta} & 0 \le |d| \le \frac{\Delta}{2} \\ \frac{1}{\sqrt{12}\sigma} & \frac{\Delta}{2} < |d| \le \Delta \\ 0 & \text{otherwise} \end{cases}    (5)

where p_2 = \Delta/(\sqrt{3}\sigma). This model is shown in Fig. 1 (c) for ∆ = 5 and σ^2 = 2000. For comparison, Fig. 1 (b) and (c) also show the commonly used uniform distortion model. It is worth noting that the uniform model (incorrectly) implies that quantization errors are bounded by ±∆/2, while the model of (4) correctly indicates that errors can be as large as ±∆. It is also important to note that the uniform model depends only on the parameter ∆, while the model of (4) depends on ∆ as well as the coefficient variance σ^2.
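As a quick empirical check on (4) (our own sketch, not part of the paper), one can quantize Laplacian coefficients with (2), reconstruct with (3), and compare the histogram of the resulting errors against the model:

```python
import numpy as np

sigma, delta = np.sqrt(50.0), 5.0
y = np.random.laplace(0.0, sigma / np.sqrt(2.0), 1_000_000)   # Laplacian, var = sigma^2
q = np.sign(y) * np.floor(np.abs(y) / delta)                  # dead-zone quantizer (2)
d = y - np.where(q == 0, 0.0, np.sign(q) * (np.abs(q) + 0.5) * delta)  # error, per (3)

p1 = 1.0 - np.exp(-np.sqrt(2.0) * delta / sigma)
def model_pdf(t):                                             # piecewise PDF of (4)
    lap = np.exp(-np.sqrt(2.0) * np.abs(t) / sigma) / (np.sqrt(2.0) * sigma)
    return np.where(np.abs(t) <= delta / 2.0, lap + (1.0 - p1) / delta,
                    np.where(np.abs(t) <= delta, lap, 0.0))

hist, edges = np.histogram(d, bins=100, range=(-delta, delta), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
# Small deviation: (4) fits, up to its uniform approximation outside the dead-zone.
print(np.max(np.abs(hist - model_pdf(centers))))
```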

III. VISIBILITY THRESHOLDS FOR QUANTIZATION DISTORTIONS

For irreversible compression in Part I of the JPEG2000 standard, a color image having RGB components is first converted into an image with one luminance component (x_Y) and two chrominance components (x_Cb and x_Cr) by the irreversible color transform (ICT) [7]. Each component is then transformed using the dyadic Cohen-Daubechies-Feauveau (CDF) 9/7 DWT [27] and quantized independently. In what follows, a normalized CDF 9/7 DWT designed for efficient implementation of JPEG2000 is assumed. This transform preserves the nominal range of input values, since the nominal gains of the analysis kernels are one (i.e., h_L^{dc} = \sum_{n=-4}^{4} h_L[n] = 1 and h_H^{nyq} = \sum_{n=-3}^{3} (-1)^n h_H[n] = 1). The normalization used in [15] produces wavelet coefficients 2^k times larger for level k.

A K-level dyadic wavelet decomposition has 3K + 1 subbands. Since K = 5 is usually sufficient to obtain near-optimal compression performance [7], VTs are estimated for 16 subbands in each of the three color components in this work.
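For reference, a minimal sketch of the forward ICT (the coefficients are those standardized in JPEG2000 Part I; DC level shifting and clipping are omitted for brevity):

```python
import numpy as np

def ict_forward(rgb):
    """Irreversible color transform (RGB -> YCbCr) of JPEG2000 Part I.
    rgb: array of shape (..., 3). Level shifting is omitted for brevity."""
    m = np.array([[ 0.299,     0.587,     0.114   ],
                  [-0.16875,  -0.33126,   0.5     ],
                  [ 0.5,      -0.41869,  -0.08131 ]])
    return rgb @ m.T
```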


Fig. 2. (a) Example quantization distortion in (HH,2) and (b) the corresponding representation in the image domain. The reader is invited to zoom in if the stimulus is not visible in (b).

The center spatial frequency f_k of wavelet decomposition level k is defined as

    f_k = r \cdot 2^{-k} = d\,v \tan\left(\frac{\pi}{180}\right) \cdot 2^{-k}    (6)

where r is the visual resolution in pixels/degree, d is the display resolution in pixels/cm, and v is the viewing distance in cm [15].

A stimulus is an RGB image obtained by applying the inverse wavelet transform and the inverse ICT to wavelet data containing quantization distortions. In this work, stimulus generation begins with wavelet subband data corresponding to a 512 × 512 color image. These initial data have all coefficients of all subbands (luminance and chrominance) set to 0. One subband of one component is then selected. Quantization distortion is randomly generated for this subband of interest according to either (4) or (5), depending on the subband. This distortion is added to a region of the subband. An example is shown in Fig. 2 (a) for a subband in the luminance component. In this figure, 0 is represented by mid-gray, while positive and negative values are lighter and darker, respectively. The inverse DWT of each component is followed by the inverse ICT. The distortion depicted in Fig. 2 (a) results in the example stimulus of Fig. 2 (b). In the wavelet domain, the size of the region to which distortion is added is N × N with N = min{64, 512 × 2^{-k}}, which corresponds to a JPEG2000 codeblock of nominal size 64 × 64. This covers a supporting region, in the image domain, of nominal size M × M with M = N · 2^k.

To measure the visibility threshold for a given subband with assumed subband variance σ_b^2, a two-alternative forced-choice (2AFC) method is used. In this method, an image that contains a stimulus and an image that does not are displayed sequentially (in random order), and a human subject is asked to decide which image contains the stimulus. The display time for each image is 2 seconds, with an interval of 2 seconds between subsequent images. The subject is then given an unlimited amount of time to select which image contains the stimulus. The experiment is iterated while varying ∆ in order to find the largest value of ∆ for which the stimulus remains invisible. Specifically, 32 iterations of the QUEST staircase procedure in the Psychophysics Toolbox [28] are used to determine the value of ∆ corresponding to the 82% correct point of a fitted Weibull function [15]. The obtained value of ∆ is then the VT of the subband for the assumed coefficient variance σ_b^2. Different values of σ_b^2 generally result in different VTs, as discussed in the next subsection.
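The 82% point can be made concrete with a small sketch (an illustration assuming the standard 2AFC Weibull parameterization, not code from [28]; the slope β = 3.5 is an arbitrary illustrative value): with guess rate γ = 0.5, P(∆) = 1 − (1 − γ) exp(−(∆/α)^β) passes through ≈0.816 at ∆ = α, which is why the threshold is read off near 82% correct.

```python
import numpy as np

def weibull_2afc(x, alpha, beta, gamma=0.5):
    """Proportion correct for a 2AFC Weibull psychometric function."""
    return 1.0 - (1.0 - gamma) * np.exp(-(x / alpha) ** beta)

def intensity_at(p, alpha, beta, gamma=0.5):
    """Invert the Weibull: stimulus intensity at target proportion correct p."""
    return alpha * np.log((1.0 - gamma) / (1.0 - p)) ** (1.0 / beta)

print(weibull_2afc(1.0, alpha=1.0, beta=3.5))   # ~0.816 at x = alpha
print(intensity_at(0.82, alpha=1.0, beta=3.5))  # ~1.01 * alpha: the "82% point"
```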


Fig. 3. Measured visibility thresholds as a function of wavelet transform level (σ_b^2 = 50) for the (a) HH, (b) HL, and (c) LH subbands (monitor/observer combinations 1901FP A, 1901FP B, and U2410 A). Threshold values in the HH subband are larger than those in the HL and LH subbands at the same level. Thresholds in HL and LH are very similar. The measured values are connected by straight lines solely to aid in viewing trends.

The experimental environment was arranged similarly to a typical office environment. Stimuli were displayed on an LCD monitor in ambient light. In this threshold experiment, two LCD monitors, a Dell 1901FP and a Dell U2410, were used. The Dell 1901FP 19-in LCD monitor has a dot pitch of 0.294 mm, a resolution of 1280 × 1024, an image brightness of 250 cd/m², and a contrast ratio of 800:1. The Dell U2410 24-in LCD monitor has an In-Plane Switching (IPS) panel, a dot pitch of 0.27 mm, a resolution of 1920 × 1200, an image brightness of 400 cd/m², and a contrast ratio of 1000:1. Each monitor was connected to the PC through a Digital Visual Interface (DVI) cable. The reasons for using LCDs, unlike previous studies, are 1) the MTF (Modulation Transfer Function) of the rendering device is an important factor in determining thresholds, and LCDs have recently become the most common display device; and 2) a previous study revealed that stimuli are more detectable on LCDs than on CRTs [29].

The viewing distance was 60 cm (23.6 inches), with resulting visual resolutions of 35.62 pixels/degree and 38.72 pixels/degree for the Dell 1901FP and Dell U2410, respectively. This viewing distance was selected because it has been commonly assumed for typical viewing conditions in previous studies [15], [16], [30]. Based on this selection, the goal of this study is to achieve visually lossless performance under these conditions. However, the proposed methodology is general and can be adapted to yield visually lossless compression under other desired conditions.
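As a concreteness check on (6) (a sketch; the helper name is our own), the visual resolutions quoted above follow from the dot pitches and the 60 cm viewing distance; the small gap from the quoted 38.72 pixels/degree presumably reflects the exact panel geometry:

```python
import numpy as np

def visual_resolution(dot_pitch_mm, viewing_distance_cm):
    """Visual resolution r in pixels/degree, per (6): r = d * v * tan(pi/180)."""
    d = 10.0 / dot_pitch_mm                      # display resolution in pixels/cm
    return d * viewing_distance_cm * np.tan(np.pi / 180.0)

for pitch in (0.294, 0.27):                      # Dell 1901FP, Dell U2410
    r = visual_resolution(pitch, 60.0)
    print(round(r, 2), [round(r * 2.0 ** -k, 2) for k in range(1, 6)])
    # ~35.62 and ~38.79 pixels/degree, plus the center frequencies f_k of (6)
```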

A. Visibility Thresholds for the Luminance Component

Fig. 3 shows the visibility thresholds (i.e., the maximum quantization step sizes at which quantization distortions remain invisible) obtained for 5 levels of the HL, LH, and HH subbands. The thresholds of Fig. 3 correspond to the case when σ_b^2 is fixed at 50. The x-axis and y-axis indicate the transform level and the obtained threshold, respectively. The points shown in Fig. 3 are the values determined by the QUEST procedure for individual subbands ((HH,1), (HH,2), and so on). Vertical bars indicate ±1 standard deviation. Two subjects, denoted by A and B, conducted the experiments using the Dell 1901FP monitor. Subject A also conducted the experiments using the Dell U2410 monitor. Both subjects are familiar with wavelet quantization distortion and have normal visual acuity. The results in Fig. 3 are labeled by the monitor and observer to which they correspond (e.g., U2410 A).

As can be seen in Fig. 3, all three monitor/observer combinations produced similar results. In most cases, intersecting the error intervals for the three measurements yields a nonempty set. To ensure a conservative design, the smallest of the three measured values is used as the visibility threshold for each subband in all subsequent discussions. As expected, threshold values in the HH subband are larger than those in the HL and LH subbands at the same level, which is known as the oblique effect [31]. Also, changes in threshold values as a function of spatial frequency are more pronounced in the HH subband than in the other subbands. Thresholds in the HL and LH subbands are very similar. Hereafter, the HL and LH subbands are regarded as equivalent, and one set of visibility thresholds is reported and used for both.

Changing the assumed variance of the wavelet coefficients can have a dramatic effect on the visibility thresholds. For example, Fig. 4 shows the measured thresholds as a function of σ_b^2 for the (HL/LH,3) subband. The four data points denoted by asterisks represent thresholds measured for four different values of σ_b^2. The error bars represent ±1 standard deviation. From Fig. 4, it can be seen that the value of the threshold increases as the coefficient variance increases. This trend can be explained by a simple analysis of the quantization error. Integrating the second line of (4), the probability of "large" distortions, which lie in (−∆, −∆/2) ∪ (∆/2, ∆), is

    e^{-\sqrt{2}\Delta/(2\sigma_b)} - e^{-\sqrt{2}\Delta/\sigma_b}

for the HL, LH, and HH subbands. These large distortions are caused by the dead-zone quantizer and tend to be more visible than the smaller distortions in [−∆/2, ∆/2]. For a fixed value of ∆, the probability of large distortions decreases as the variance σ_b^2 increases, provided that the step size ∆ is sufficiently small (i.e., ∆ < \sqrt{2}\sigma_b \ln 2). Furthermore, the variance of the quantization distortion generated according to (4) is

    P_y = \sigma_b^2 - e^{-\sqrt{2}\Delta/\sigma_b} \left( \frac{11}{12}\Delta^2 + \sigma_b^2 + \sqrt{2}\sigma_b\Delta \right).

This in turn yields a signal of variance G_b · P_y in the stimulus image, where G_b is the synthesis gain of the 9/7 DWT for subband b [7]. Fig. 5 shows P_y as a function of the coefficient variance σ_b^2 for a fixed quantization step size. The variance of the quantization distortion decreases as the coefficient variance increases, except for very small variances. This implies that the lower variance distortion generated by a larger coefficient variance may elevate the threshold value.

The form of P_y and the measured thresholds of Fig. 4 suggest a complicated relationship between thresholds and σ_b^2. However, as shown in Fig. 4, a simple linear function works well enough when error bars are considered. In this manner, the threshold t_b for a given subband b is determined by

    t_b = a_b \cdot \sigma_b^2 + r_b    (7)

where σ_b^2 is the variance of coefficients in subband b. The linear parameters a_b and r_b are summarized in Table I, and are obtained by least squares fitting of the threshold values measured for different assumed values of σ_b^2 for each subband.

The variance of the (LL,5) subband is usually much larger than that of the other subbands, and the distortion is not significantly affected by variance changes. This results in a fixed threshold of t_{LL,5} = 0.63 for the LL band.

Fig. 4. Threshold values and linear model for (HL/LH,3) as a function of coefficient variance σ_b^2.

Fig. 5. Average power P_y of the quantization distortion as a function of coefficient variance σ_b^2 for ∆ = 0.55.

TABLE I
LINEAR PARAMETERS a_b AND r_b

subband   | a_b           | r_b
----------|---------------|-----
HH,1      | 105.67 × 10⁻⁴ | 4.85
HL/LH,1   | 46.03 × 10⁻⁴  | 1.98
HH,2      | 19.94 × 10⁻⁴  | 0.92
HL/LH,2   | 13.84 × 10⁻⁴  | 0.64
HH,3      | 11.04 × 10⁻⁴  | 0.51
HL/LH,3   | 10.83 × 10⁻⁴  | 0.50
HH,4      | 10.16 × 10⁻⁴  | 0.47
HL/LH,4   | 7.75 × 10⁻⁴   | 0.36
HH,5      | 7.91 × 10⁻⁴   | 0.36
HL/LH,5   | 7.16 × 10⁻⁴   | 0.33

TABLE II
AVERAGE VARIANCES OF WAVELET COEFFICIENTS FOR CHROMINANCE COMPONENTS

subband   | Cb     | Cr
----------|--------|-------
HH,1      | 0.18   | 0.17
HL/LH,1   | 1.33   | 1.43
HH,2      | 0.72   | 0.74
HL/LH,2   | 3.06   | 3.52
HH,3      | 1.16   | 1.20
HL/LH,3   | 4.26   | 5.14
HH,4      | 1.43   | 1.37
HL/LH,4   | 5.34   | 7.85
HH,5      | 1.52   | 1.45
HL/LH,5   | 5.34   | 8.93
LL,5      | 150.08 | 109.51
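As a concreteness check on (7) (a sketch; the dictionary below simply transcribes Table I), the base threshold of any luminance subband follows directly from its coefficient variance:

```python
# Linear parameters (a_b, r_b) from Table I; a_b values are in units of 10^-4.
TABLE_I = {
    ('HH', 1): (105.67e-4, 4.85), ('HL/LH', 1): (46.03e-4, 1.98),
    ('HH', 2): (19.94e-4, 0.92),  ('HL/LH', 2): (13.84e-4, 0.64),
    ('HH', 3): (11.04e-4, 0.51),  ('HL/LH', 3): (10.83e-4, 0.50),
    ('HH', 4): (10.16e-4, 0.47),  ('HL/LH', 4): (7.75e-4, 0.36),
    ('HH', 5): (7.91e-4, 0.36),   ('HL/LH', 5): (7.16e-4, 0.33),
}

def base_threshold(subband, level, variance):
    """Base VT of (7): t_b = a_b * sigma_b^2 + r_b."""
    a_b, r_b = TABLE_I[(subband, level)]
    return a_b * variance + r_b

print(base_threshold('HL/LH', 3, 50.0))  # ~0.55 for sigma_b^2 = 50; cf. Fig. 4
```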

B. Visibility Thresholds for Chrominance Components

It is well known that the human visual system is less sensitive to quantization distortion in the chrominance components than in the luminance component. Also, the variance of the chrominance components is much smaller than that of the luminance component. The average variances, calculated from 10 natural images (not used in subsequent calibration or evaluation), are listed in Table II. These images are available for viewing at [32].

Table III contains measured threshold values for the chrominance components, determined in the same fashion as for the luminance component, using the values from Table II for σ_b^2 when generating the stimulus images.¹ Experimental results indicate that the chrominance thresholds are insensitive to variance changes. Thus, the thresholds for the chrominance components are fixed to the values given in Table III. From this table, it can be seen that threshold values for the chrominance components are larger than those for the luminance component, and that the Cb component is the least sensitive component.

¹ In the case of (HH,1), (HL/LH,1) and (HH,2), the measured thresholds were infinite. Adopting this result would effectively result in chrominance subsampling. Instead, large but finite values were chosen in Table III to provide a more conservative approach to these subbands, resulting in negligible file size increases.

TABLE III
VISIBILITY THRESHOLDS FOR CHROMINANCE COMPONENTS

subband   | Cb    | Cr
----------|-------|------
HH,1      | 24.72 | 15.49
HL/LH,1   | 14.50 | 6.35
HH,2      | 14.77 | 7.40
HL/LH,2   | 6.36  | 2.60
HH,3      | 11.55 | 2.57
HL/LH,3   | 4.03  | 1.23
HH,4      | 4.95  | 1.25
HL/LH,4   | 3.26  | 0.69
HH,5      | 1.05  | 0.56
HL/LH,5   | 1.05  | 0.58
LL,5      | 1.32  | 0.66

IV. VISIBILITY THRESHOLD ADJUSTMENT BASED ON VISUAL MASKING EFFECTS

In actual image coding, unlike the psychophysical experiments described above, all subbands are quantized simultaneously, and quantization distortion is superimposed on a "background" image. Threshold changes for the compound

distortions caused by quantizing subbands simultaneously have been studied in [18]. Visibility of quantization distortions from similar subbands is known to increase linearly when the distortion is highly visible. However, such compound distortions are negligible at the thresholds chosen for this work, and therefore threshold adjustments for compound distortion are not needed. This result is consistent with [15], which provides individual quantization step sizes for visually lossless compression without considering other subbands.

On the other hand, VTs can vary significantly with the image background. In this context, the image background is called the masker, while the distortion is referred to as the target. The change of threshold values according to the contrast of the image background is represented by the target threshold versus masker contrast (TvC) function [33]. As the contrast of the masker increases, the threshold decreases slightly and then begins to increase. The target becoming less visible due to a high-contrast masker is called masking, while the opposite effect, induced by a low-contrast background, is called facilitation. The effect of facilitation is generally insignificant, and only the masking effect is considered in this work.

In what follows, visibility thresholds are modified to exploit the masking effect. In particular, thresholds are increased within spatial regions depending on the relevant image background. These threshold increases decrease file size without introducing visible distortion. In coding of color images, the chrominance components typically occupy a much smaller portion of the overall file than the luminance component. Thus, for simplicity, threshold adjustments are made only for the luminance component in this work.

The contrast of a luminance masker within a spatial region is determined by the luminance wavelet coefficients in the corresponding spatial region of all subbands. However, since masking occurs most strongly when the target and masker have the same frequency and orientation, this work considers only wavelet coefficients in the subband to be encoded, and the visually lossless masked threshold t̂_b is determined as

    \hat{t}_b = t_b \cdot m_b    (8)

where t_b is the base threshold value calculated from (7), and m_b is a masking factor calculated from the magnitudes of wavelet coefficients in subband b. This intra-band masking model may provide slightly lower accuracy than models that consider both intra- and inter-band masking effects, but it has the advantage of simplicity, as well as enabling parallel processing, because subbands can be encoded independently.

The masking factor m_b is calculated using two visual masking models: the self-contrast masking model and the texture masking model. The self-contrast masking model approximates the change of threshold in the TvC function according to the magnitude of the wavelet coefficients. The corresponding self-contrast masking factor, s_b[n], at two-dimensional location n in subband b is defined by

    s_b[n] = \max\left\{ 1, \; w_1 \left( \frac{|y[n]|}{\bar{y}_b^+ + \epsilon} \right)^{\rho_1} \right\}    (9)

where y[n] is the wavelet coefficient at location n, and \bar{y}_b^+ is the conditional expectation of y given y ≥ 0, calculated from (4) as E[y | y ≥ 0] = \sqrt{2}\sigma_b/2. The small constant ε > 0 is included for stability of the equation. The parameter ρ_1 reflects the nonlinearity of self-contrast masking and has a value between 0 and 1. The parameter w_1 adjusts the degree of self-contrast masking and, together with ρ_1, prevents over-masking. The self-contrast masking factor, as shown in Fig. 6, takes a value of 1 (no masking) for small values of |y[n]|. On a log-log scale, it increases linearly with slope ρ_1 when log |y[n]| exceeds log \bar{y}_b^+ by more than log w_1^{-1/\rho_1}. This self-contrast masking model can have problems at edges, because edges generate large coefficients but are perceptually more sensitive [14]. Therefore, care is needed in choosing the parameters w_1 and ρ_1 to ensure visually lossless quality in all regions.

Fig. 6. Log-log plot of the self-masking factor vs. wavelet coefficient magnitude for σ_b^2 = 50, w_1 = 1, and ρ_1 = 0.4.

In addition to self-contrast, texture activity can significantly affect distortion visibility. Specifically, VTs increase as the texture beneath the distortion becomes more difficult to predict. In this work, the texture masking factor, τ_b[n], for the texture activity of small local texture-block j is given by

    \tau_b[n] = \max\left\{ 1, \; \left( w_2 \hat{\sigma}_j^2 \right)^{\rho_2} \right\}    (10)

where \hat{\sigma}_j^2 is the variance of reconstructed wavelet coefficients in texture-block j. The parameters w_2 and ρ_2 play similar roles to those of w_1 and ρ_1 in (9). The texture masking model is similar to the neighbor masking models in [16] and [34], but computes the masking factor only once per texture-block for computational efficiency. Every location in the texture-block is then assigned the same value.

Fig. 7. (a) 512 × 512 reference image horse and (b) the product of its two masking factors, s_b[n] · τ_b[n], in the wavelet domain for W = 32. Brighter intensities represent higher masking effects.

A texture-block of size W × W in the spatial domain implies a texture-block of size N × N in the wavelet domain, with N = W · 2^{-k} for DWT level k. Since N is usually smaller than the size of a JPEG2000 codeblock, each codeblock can contain several texture-blocks. Note that although the texture masking factor is constant over a texture-block, it is not generally constant over a codeblock. Texture masking is applied only to the high-frequency subbands (k ≤ 3), because lower frequency subbands have insufficiently large texture-blocks from which to calculate a variance. Because textures can be affected by quantization, reconstructed wavelet coefficients are used when calculating the variance \hat{\sigma}_j^2 [35].

Fig. 7 depicts the image horse and the product of the two masking factors, s_b[n] · τ_b[n], in the wavelet domain. Brighter intensities represent higher masking factors. The self-masking s_b[n] is strong along prominent edges, and the texture-masking τ_b[n] is more pronounced in complex textures. The value of s_b[n] · τ_b[n] is 1.0 (i.e., no masking) in flat background areas, such as the sky and the horse body. Also, it can be seen that masking values in a subband are influenced by the orientation of the maskers.

In JPEG2000, thresholds may be applied at the level of each codeblock via codestream truncation, without requiring modification of the decoder. To this end, the masking factors for all wavelet coefficients in a codeblock are combined to yield a single masking factor via the Minkowski mean,

    m_b = \left( \frac{1}{\|B\|} \sum_{n \in B} \big( s_b[n] \cdot \tau_b[n] \big)^{\beta} \right)^{1/\beta}    (11)

where \|B\| is the number of coefficients in codeblock B. The parameter β lies between 0 and 1 and controls the degree of overall masking. If β = 1, (11) computes the average value of s_b[n] · τ_b[n] without any weighting. As β is decreased, less weight is placed on areas with large masking factors. In this way, the overall masking factor for the block becomes more conservative.
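Equations (8)-(11) can be combined into a short per-codeblock sketch (our own illustration, not the paper's implementation; the parameter values are those of Table IV in Section V, the texture-block size N = 8 is an assumed example, and the helper names are hypothetical):

```python
import numpy as np

W1, RHO1, W2, RHO2, BETA = 0.8, 0.35, 0.35, 0.4, 0.8   # Table IV values

def masked_threshold(y, y_hat, t_b, sigma_b, n_tex=8, eps=1e-6):
    """Masked VT of (8), t_hat_b = t_b * m_b, for one codeblock.
    y: original coefficients (2-D); y_hat: reconstructed coefficients;
    t_b: base threshold from (7); sigma_b: subband standard deviation."""
    y_bar = np.sqrt(2.0) * sigma_b / 2.0         # E[y | y >= 0] under the Laplacian model
    s = np.maximum(1.0, W1 * (np.abs(y) / (y_bar + eps)) ** RHO1)   # self-contrast (9)
    tau = np.ones_like(y, dtype=float)
    for i in range(0, y.shape[0], n_tex):        # texture-blocks j
        for j in range(0, y.shape[1], n_tex):
            var_j = np.var(y_hat[i:i + n_tex, j:j + n_tex])
            tau[i:i + n_tex, j:j + n_tex] = max(1.0, (W2 * var_j) ** RHO2)  # texture (10)
    m_b = np.mean((s * tau) ** BETA) ** (1.0 / BETA)   # Minkowski mean (11)
    return t_b * m_b
```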

V. VISUALLY LOSSLESS CODING AND EXPERIMENTAL RESULTS

A. Visually Lossless Coding

In JPEG2000, wavelet coefficients in subband b are initially quantized with a quantization step size ∆_b. The subband is partitioned into codeblocks, and the quantization indices of each codeblock are bit-plane coded. Each bit-plane is coded in three coding passes, except the most significant bit-plane (MSB), which is coded in one coding pass. Thus, a codeblock with M bit-planes has 3M − 2 coding passes. One or more least significant coding passes may optionally be omitted from the final codestream to increase the effective quantization step size within a codeblock. Omitting the p least significant bits of the quantization index of a coefficient results in an effective quantization step size of 2^p ∆_b for that coefficient.

To ensure that the effective quantization step sizes are less than the thresholds derived previously, the following procedure is followed for all luminance subbands except (LL,5). First, the variance σ_{b,i}^2 for codeblock i in subband b is calculated. Then, a base threshold t_{b,i} for that codeblock is determined using (7). The self-masking factor s_{b,i}[n] is calculated for each wavelet coefficient in the codeblock according to (9). During the bit-plane coding, the texture-masking factor τ_{b,i}[n] is calculated for each coefficient in the codeblock according to (10), followed by m_{b,i} via (11) and t̂_{b,i} via (8). The maximum absolute error for the codeblock is calculated at the end of each coding pass z as

    D(z) = \max_{n \in B} \left| y[n] - \hat{y}^{(z)}[n] \right|    (12)

where \hat{y}^{(z)}[n] denotes the reconstructed value of y[n] using the quantization index \hat{q}^{(z)}[n], which has been encoded only up to coding pass z. Coding is terminated when D(z) falls below the masked threshold t̂_{b,i}.

In principle, τ_{b,i}[n] must be recalculated at the end of each coding pass, since it is computed using reconstructed coefficients. In practice, to reduce the computational load, the texture masking factors are not evaluated (and are set to 1.0) until D(z) falls below a prescribed maximum masked threshold, t̂_b^max = 6r_b. In addition to reducing complexity, t̂_b^max sets a conservative upper bound on the effective quantization step size that can be applied to any given codeblock, possibly at the expense of some small increase in file size.

Pseudo-code for the proposed algorithm is given in Fig. 8. It is worth emphasizing that the base VTs computed in the second line of Fig. 8 are adaptive codeblock-by-codeblock and image-by-image. This is in contrast to [16], where a fixed set of base VTs (one per subband) is employed, independent of the image to be encoded. The adaptivity of the base VTs proposed here stems from the dependence of the quantization distortion model on coefficient variance.

For (LL,5) of the luminance component, the visibility threshold t_{LL,5} = 0.63 is used directly as the quantization step size ∆_{LL,5}, and all bit-planes are included in the codestream. The same procedure is applied for all chrominance subbands. That is, the thresholds of Table III are used as quantization step sizes, and all bit-planes are included in the codestream.
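A runnable sketch of this truncation rule (our own illustration at bit-plane rather than coding-pass granularity; the paper's actual pass-by-pass procedure is the pseudo-code of Fig. 8 below, and t_hat would come from the masked threshold of (8)):

```python
import numpy as np

def bitplanes_to_keep(y, delta_b, t_hat, n_bitplanes):
    """Smallest number of coded bit-planes such that omitting the remaining p LSBs
    (effective step size 2^p * delta_b) keeps the maximum error of (12) below t_hat.
    A coarse sketch: it works bit-plane by bit-plane rather than pass by pass."""
    q = (np.sign(y) * np.floor(np.abs(y) / delta_b)).astype(np.int64)
    for p in range(n_bitplanes, -1, -1):             # omit p least significant bits
        q_p = np.sign(q) * (np.abs(q) >> p)
        y_p = np.where(q_p == 0, 0.0,
                       np.sign(q_p) * (np.abs(q_p) + 0.5) * (2.0 ** p) * delta_b)
        if np.max(np.abs(y - y_p)) <= t_hat:         # criterion (12)
            return n_bitplanes - p                   # bit-planes to include
    return n_bitplanes
```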


    compute the variance σ_{b,i}^2 for codeblock i in subband b
    determine the base threshold t_{b,i} from (7)
    compute the self-masking factors s_{b,i}[n]
    for coding pass z = 0 to 3(M − 1) {
        perform bit-plane coding for coding pass z
        compute the maximum distortion D(z)
        if D(z) < t̂_b^max, then {
            compute the texture masking factors τ_{b,i}[n]
            determine the masked threshold t̂_{b,i} from (8)
            if D(z) ≤ t̂_{b,i}, then coding is terminated
        }
    }

Fig. 8. Pseudo-code of the proposed encoder for codeblock i of subband b, for each luminance subband except (LL,5).

The proposed algorithm does not require post-compression rate-distortion (PCRD) optimization to select which coding passes are included in the final codestream. The algorithm automatically computes only compressed coding passes that will be included in the final codestream. Thus, as in [16], the "overcoding" that is typical of many JPEG2000 implementations is avoided, which can significantly reduce computational complexity. It is important to note that all codestreams produced by the proposed algorithm are compatible with Part-I of the JPEG2000 standard, and can be decoded by any JPEG2000 decoder.

As mentioned previously, prior to bit-plane coding, all wavelet coefficients in luminance subband b are quantized using a step size of ∆_b. An ideal value of ∆_b would be the final masked threshold. However, this value is unknown before encoding and varies from codeblock to codeblock. A simple strategy would be to select ∆_b as the minimum possible value of the final masked threshold, i.e., r_b in Table I. In this work, a slightly larger value of ∆_b = 1.03r_b was chosen. This choice yields slightly better compression performance with no impact on visual quality. Indeed, this value is well within the error bars depicted in Fig. 4 for all subbands.

Calibration of the masking models was performed using several 8-bit monochrome images. The purpose of this calibration was to find a set of model parameters which yields the largest masking factors (i.e., the smallest file sizes), while maintaining visually lossless quality via compression as described above, followed by decompression using an unmodified JPEG2000 Part I decoder. Table IV shows the parameter values obtained by visual inspection using numerous combinations of values. As verified by the validation experiments described below, these parameter values provide visually lossless quality for a wide range of natural images. Calibration by image type is possible, and different parameter values for different image types may provide better compression performance, but at the expense of the compressor becoming image type specific.

TABLE IV
MASKING MODEL PARAMETERS

w_1 | ρ_1  | w_2  | ρ_2 | β
0.8 | 0.35 | 0.35 | 0.4 | 0.8

TABLE V
BITRATES AND PSNRS FOR THE PROPOSED VISUALLY LOSSLESS JPEG2000 ENCODER FOR 8-BIT MONOCHROME IMAGES (IMAGES DENOTED BY * WERE USED DURING MASKING MODEL CALIBRATION)

    Image     | Dimension (W×H) | Lossless (bpp) | Proposed (bpp) | Ratio | Y PSNR (dB)
 1  airplane* | 512 × 512   | 3.99 | 1.37 | 2.91 | 42.81
 2  baboon*   | 512 × 512   | 6.11 | 2.70 | 2.26 | 36.64
 3  balloon   | 512 × 512   | 4.32 | 1.62 | 2.67 | 40.76
 4  barbara*  | 512 × 512   | 4.78 | 1.73 | 2.76 | 39.67
 5  barnlake  | 512 × 512   | 5.59 | 2.68 | 2.09 | 38.75
 6  bike      | 2048 × 2560 | 4.52 | 1.62 | 2.79 | 40.44
 7  blastoff  | 512 × 512   | 5.39 | 2.29 | 2.35 | 38.38
 8  boats     | 720 × 576   | 4.06 | 1.27 | 3.20 | 42.52
 9  cameramn  | 256 × 256   | 4.99 | 1.92 | 2.60 | 39.92
10  goldhill  | 720 × 576   | 4.61 | 1.79 | 2.58 | 40.99
11  hatgirl   | 512 × 512   | 5.27 | 2.34 | 2.25 | 39.08
12  horse     | 512 × 512   | 5.25 | 2.26 | 2.32 | 39.33
13  jill*     | 512 × 512   | 3.14 | 0.91 | 3.45 | 44.60
14  lena*     | 512 × 512   | 4.30 | 1.43 | 3.01 | 41.57
15  man       | 1024 × 1024 | 4.82 | 1.88 | 2.56 | 40.52
16  monarch   | 768 × 512   | 3.81 | 1.08 | 3.53 | 42.78
17  onthepad  | 512 × 512   | 6.50 | 2.97 | 2.19 | 35.41
18  peppers   | 512 × 512   | 4.62 | 1.61 | 2.87 | 39.70
19  thecook   | 512 × 512   | 5.48 | 2.57 | 2.13 | 38.83
20  woman     | 2048 × 2560 | 4.51 | 1.61 | 2.80 | 40.57
    Average   | -           | 4.80 | 1.91 | 2.51 | 40.23

B. Coding Results

The proposed encoder was implemented in Kakadu V6.1 [13]. All reported results were generated with this encoder and the unmodified decoder, kdu_expand. Tables V and VI, respectively, show the bitrates obtained for encoding 8-bit monochrome and 24-bit color images using the proposed JPEG2000 coding scheme. For comparison, bitrates obtained for numerically lossless JPEG2000 compression are also included. Images used to generate these results were obtained from the USC database [36], the LIVE database [37], Kodak PhotoCD [38], and the ISO JPEG2000 test suite. Images marked with an asterisk in Table V were used during masking model calibration.

For monochrome images, the numerically lossless coding method of JPEG2000 yields an average bitrate of 4.80 bits-per-pixel (bpp), while the proposed visually lossless coding method achieves an average bitrate of 1.91 bpp, an improvement in compression ratio of 2.5 to 1, without perceivable quality degradation. In the case of color images, the numerically lossless coding method and the proposed visually lossless coding method achieve, respectively, 10.50 bpp and 1.91 bpp on average, for an improvement in compression ratio of 5.5 to 1. As can be seen in the tables, different images encoded in the proposed visually lossless manner have significantly different bitrates. Also, peak signal-to-noise ratios for the luminance component (Y PSNRs) vary widely depending on the image.


TABLE VI
BITRATES AND PSNRS FOR THE PROPOSED VISUALLY LOSSLESS JPEG2000 ENCODER FOR 24-BIT COLOR IMAGES

    Image     | Dimension (W×H) | Lossless (bpp) | Proposed (bpp) | Ratio | Y PSNR (dB)
 1  bike      | 2048 × 2560 | 11.96 | 1.72 | 6.95 | 40.44
 2  bikes     | 768 × 512   | 10.81 | 2.40 | 4.51 | 39.53
 3  building2 | 640 × 512   | 13.90 | 2.79 | 4.99 | 36.06
 4  buildings | 768 × 512   | 11.13 | 2.30 | 4.84 | 37.96
 5  caps      | 768 × 512   | 8.09  | 1.10 | 7.35 | 43.52
 6  carnival  | 610 × 488   | 9.67  | 1.75 | 5.53 | 41.40
 7  cemetery  | 627 × 482   | 12.69 | 2.37 | 5.35 | 38.88
 8  church    | 634 × 505   | 10.98 | 1.85 | 5.93 | 39.96
 9  coins     | 640 × 512   | 11.88 | 2.35 | 5.05 | 40.39
10  dancers   | 618 × 453   | 11.98 | 2.28 | 5.26 | 38.82
11  flowers   | 640 × 512   | 12.36 | 2.36 | 5.23 | 36.47
12  house     | 768 × 512   | 10.08 | 1.99 | 5.06 | 41.22
13  lighthse  | 480 × 720   | 10.19 | 1.73 | 5.88 | 40.11
14  lighthse2 | 768 × 512   | 9.75  | 1.71 | 5.72 | 40.26
15  manfish   | 634 × 438   | 10.06 | 1.79 | 5.61 | 40.08
16  monarch   | 768 × 512   | 8.98  | 1.21 | 7.41 | 42.83
17  ocean     | 768 × 512   | 8.77  | 1.50 | 5.83 | 41.71
18  painted   | 768 × 512   | 10.16 | 2.14 | 4.74 | 40.43
19  parrots   | 768 × 512   | 8.48  | 0.94 | 9.06 | 43.46
20  plane     | 768 × 512   | 8.06  | 1.21 | 6.64 | 43.09
21  rapids    | 768 × 512   | 10.16 | 2.35 | 4.33 | 40.54
22  sailing1  | 768 × 512   | 9.59  | 1.91 | 5.02 | 39.98
23  sailing2  | 480 × 720   | 9.39  | 1.22 | 7.67 | 41.85
24  sailing3  | 480 × 720   | 9.57  | 1.39 | 6.87 | 41.96
25  sailing4  | 768 × 512   | 9.29  | 1.91 | 4.86 | 41.26
26  statue    | 480 × 720   | 9.55  | 1.52 | 6.27 | 41.70
27  stream    | 768 × 512   | 11.86 | 2.84 | 4.17 | 36.39
28  sculpture | 632 × 505   | 13.31 | 2.83 | 4.71 | 37.87
29  woman     | 2048 × 2560 | 11.50 | 1.67 | 6.90 | 40.57
30  woman2    | 480 × 720   | 11.61 | 2.38 | 4.87 | 39.88
31  womanhat  | 480 × 720   | 9.75  | 1.67 | 5.82 | 41.76
    Average   | -           | 10.50 | 1.91 | 5.50 | 40.34

TABLE VII
BITRATE AND PSNR COMPARISON WITH [30] FOR VISUALLY LOSSLESS 8-BIT 512 × 512 DIGITIZED RADIOGRAPHS

 Image   | [30] (bpp) | [30] PSNR (dB) | Proposed (bpp) | Proposed PSNR (dB) | Ratio
 158630  | 1.47 | 43.16 | 1.06 | 42.49 | 1.39
 157952  | 1.96 | 44.27 | 1.26 | 42.19 | 1.56
 158180  | 1.62 | 44.72 | 0.96 | 42.46 | 1.69
 158208  | 1.35 | 45.97 | 0.56 | 43.16 | 2.41
 157514  | 1.68 | 44.12 | 0.94 | 42.38 | 1.79
 Average | 1.62 | 44.45 | 0.96 | 42.53 | 1.69

In particular, the resulting bitrates range from 0.91 bpp to 2.97 bpp for monochrome images, and from 0.94 bpp to 2.84 bpp for color images. The monochrome images have a minimum PSNR of 35.41 dB and a maximum PSNR of 44.60 dB. The luminance PSNR values for color images are similar. These results demonstrate that bitrate and PSNR are not effective criteria for determining visually lossless quality.

Table VII compares the proposed coding method with the visually lossless method proposed in [30]. This table reports bitrates obtained by each method for encoding five 512 × 512 8-bit digitized radiographs used in [30]. As described in that work, the images were obtained by cropping 512 × 512 diagnostically relevant regions of 2048 × 2056 radiographs acquired using a Lumisys 75 laser film scanner (8-bits/pixel grayscale, 146 dots/inch). The five images are images of the hip, chest, neck, leg, and abdomen. As can be seen from the table, the method proposed here provides significantly lower bitrates. These lower bitrates result in a lower average PSNR, but do not result in visual artifacts.

Comparison with other methods from the literature is more difficult. As compared to [16], the bitrates obtained here for monochrome images are considerably higher. However, as discussed in the introduction, the encoder of [16] does not provide truly visually lossless quality, as evidenced by their use


of the term "almost transparent coding" rather than "visually lossless coding," and by their reported mean opinion scores. This is likely caused by the use of a uniform probability model for quantization distortion, as well as the use of a CRT for the psychovisual experiments in [15], from which the VTs used in [16] are drawn.

As also mentioned in the introduction, the Kakadu implementation of JPEG2000 provides a CSF-based visual weighting option that can enhance the visual quality of compressed imagery. Command line options for the application kdu_compress allow the user to select a target bitrate, or alternatively, a target distortion-rate slope. However, no guidance is given as to how a target bitrate or distortion-rate slope should be chosen to yield visually lossless quality. If an "Oracle" provides the bitrates from the proposed method (from Tables V and VI) as target bitrates to kdu_compress (with visual weighting), very high quality imagery results. In fact, the quality rivals that of the proposed method. However, it must be emphasized that kdu_compress has no such Oracle, and that a major contribution of the proposed method is that it automatically adjusts the compression on an image-by-image basis to achieve low bitrates while preserving visually lossless quality.

It is worth noting that the base visibility thresholds can be used alone (without masking) to achieve visually lossless encoding. Indeed, results for this case were reported in [22]. Specifically, for a given codeblock, the coefficient variance is computed and used to compute the base threshold t_b from (7). Bit-plane coding is then carried out until the maximum absolute coefficient error in the codeblock falls below t_b. Visually lossless encoding is achieved, but with about 10% larger file sizes than in the masked case.

The masked case can be understood to be uniformly more aggressive than the unmasked case. This is seen by noting that the masking model can only increase the base threshold. That is, from (8) and (11), it follows that t̂_b ≥ t_b. Thus, for each codeblock, the number of coding passes included in the masked case is no more than in the unmasked case. In turn, the quantization error of each coefficient in the masked case is lower bounded by that in the unmasked case. Accordingly, only the masked version of the encoder is considered in the validation experiments below.

C. Validation Experiments

To verify that the images encoded with the proposed scheme are visually lossless, a three-alternative forced-choice (3AFC)


Fig. 9. (a) Original monochrome image horse and (b) visually lossless image encoded by the proposed method. Images are cropped to 512 × 350 after decompression to avoid rescaling during display. The images should be viewed with viewer scaling set to 100%. Image details can be found in Table V.

method was used. For each test image, two original copies and one compressed copy were displayed side by side with the position of the compressed copy chosen randomly. The test images were selected for presentation in a fixed order. For each test image, the subject was given an unlimited amount of time to choose the copy that looked different from the other two copies. If the images are compressed in a visually lossless manner, the rate at which the subject correctly chooses the compressed copy should be 1/3. 3AFC testing has been used previously in [30]. Evidently, 3AFC testing can provide a more rigorous validation than 2AFC. This can be seen by considering the 2AFC procedure.

In 2AFC testing, an original copy is displayed together with only one compressed copy. The subject is asked to choose the copy which has been compressed. If the images are compressed losslessly, the rate at which the subject chooses the correct copy should be 1/2. However, consider the case when the images are compressed with very high quality, but are not visually lossless. In the case of 2AFC testing, the subject would be able to perceive that differences exist between the two copies, but might not be able to tell which copy has been compressed. In this situation, the rate of correct response might still be 1/2, even though the images are not visually lossless. On the other hand, for 3AFC testing, the subject can identify


TABLE VIII
T-TEST RESULTS (TEST VALUE = 0.3333)

t-score | Degrees of freedom | Significance (2-tailed) | Mean difference | 95% CI Lower | 95% CI Upper
1.798   | 20                 | 0.087                   | 0.0167          | -0.0026      | 0.0347

the compressed image as the one that differs from the other two, resulting in a correct response rate of greater than 1/3, indicating the fact that the imagery is not visually lossless.

As test images, 8 monochrome and 11 color images were selected from the images in Tables V and VI. Each has a low luminance PSNR (less than 40 dB), and none were used during algorithm design/calibration. Additionally, the two radiographs from Table VII having the lowest PSNRs were included in the validation test set. In the validation experiment, the Dell U2410 LCD monitor with a resolution of 1920 × 1200 was used. The maximum dimension of images displayed during testing was 570 × 975, so that three copies fit side by side across the screen. Test images exceeding these maximum dimensions were cropped after decompression. Sixteen observers participated in the experiment. All were familiar with compression artifacts that occur in JPEG2000 and have normal or corrected-to-normal vision. Each subject performed five trials for each of the 21 images, for a total of 1680 evaluations. During the experiment, no feedback was provided on the correctness of choices. Although the proposed encoder was designed to achieve visually lossless quality at a typical viewing distance of 60 cm, observers were allowed to view the images as closely as desired.

Under the null hypothesis that the three images are indistinguishable, the compressed image should be chosen correctly with a frequency of 1/3. Images compressed using the proposed method were selected with a mean frequency of 0.3494 and a standard deviation of 0.041. Table VIII shows the results of a one-sample t-test that measures whether the mean frequency for images encoded using the proposed method significantly differs from the hypothesized frequency of 1/3. In this test, the hypothesis that the responses were randomly chosen could not be rejected at the 5% significance level. Based on these results, it is claimed that the proposed coding method is visually lossless under the viewing conditions set during the design and validation studies.

Fig. 9 shows an original monochrome image together with a version that has been encoded by the proposed method. The encoded image exhibits a PSNR of 39.33 dB, and no differences are visible when the images are displayed using a 1:1 scale. Because PDF viewers may inappropriately scale images according to window size or user settings, and unintended distortions may be introduced during PDF creation, more examples of encoded images (both monochrome and color) are provided online [32] for proper comparison.
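The Table VIII statistics can be reproduced from per-image selection frequencies with a one-sample t-test (a sketch with hypothetical data: the paper reports only the summary statistics, so the `freqs` array below is illustrative):

```python
import numpy as np
from scipy import stats

# Hypothetical per-image frequencies of choosing the compressed copy
# (21 test images; the paper reports mean 0.3494 and std 0.041).
rng = np.random.default_rng(0)
freqs = np.clip(rng.normal(0.3494, 0.041, size=21), 0.0, 1.0)

t_score, p_value = stats.ttest_1samp(freqs, popmean=1.0 / 3.0)  # test value 0.3333
mean_diff = freqs.mean() - 1.0 / 3.0
half_width = stats.t.ppf(0.975, df=20) * stats.sem(freqs)       # 95% CI half-width
print(t_score, p_value, mean_diff - half_width, mean_diff + half_width)
# Compare with Table VIII: t = 1.798, p = 0.087, CI = (-0.0026, 0.0347)
```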

VI. CONCLUSIONS

The HVS has varying sensitivity to different color components, spatial frequencies, orientations, and underlying background images. Using this fact, a visually lossless coding algorithm has been presented for 8-bit monochrome and 24-bit color images. Visibility thresholds were measured for statistically modeled quantization distortion, and have different values depending on the local variances within each subband. Since quantization distortion appears on various background images, the threshold values are adjusted using a self-masking model and a texture masking model to cope with spatially changing visual masking effects. The resulting thresholds are used to determine the maximum quantization for visually lossless coding. The proposed JPEG2000 Part I compliant coding scheme successfully yields compressed images, whose quality is indistinguishable from that of the original images, at significantly lower bitrates than those of numerically lossless coding and other visually lossless algorithms in the literature.

To improve the performance of the proposed algorithm, the distribution model for wavelet coefficients could be replaced by a more sophisticated model such as a Gaussian Scale Mixture (GSM) [39], which takes into account spatial correlation between wavelet coefficients. However, a generalized Gaussian model that only uses variance as a parameter was used here to avoid the intricacy of a multi-variable experiment. Also, to preserve independent coding of codeblocks, light adaptation using the LL subband [16] and inter-band masking effects were excluded. Nevertheless, the proposed coding scheme well predicts the appropriate compression parameters required for visually lossless coding.

REFERENCES

[1] N. V. Graham, Visual Pattern Analyzers. New York: Oxford University Press, 1989.
[2] S. Daly, "Application of a noise-adaptive contrast sensitivity function to image data compression," Optical Engineering, vol. 29, no. 8, pp. 977-987, 1990.
[3] C. Blakemore and F. Campbell, "On the existence of neurones in the human visual system selectively sensitive to the orientation and size of retinal images," Journal of Physiology, vol. 203, pp. 237-260, July 1969.
[4] F. W. Campbell and J. G. Robson, "Application of Fourier analysis to the visibility of gratings," Journal of Physiology, vol. 197, pp. 551-566, 1968.
[5] A. B. Watson, "The cortex transform: rapid computation of simulated neural images," Computer Vision, Graphics, and Image Processing, vol. 39, no. 3, pp. 311-327, 1987.
[6] D. Wu, D. M. Tan, M. Baird, J. DeCampo, C. White, and H. R. Wu, "Perceptually lossless medical image coding," IEEE Transactions on Image Processing, vol. 25, no. 3, pp. 335-344, 2006.
[7] D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards, and Practice. Boston: Kluwer Academic Publishers, 2002.
[8] M. Bolin and G. Meyer, "A perceptually based adaptive sampling algorithm," in SIGGRAPH 98 Conference Proceedings, 1998, pp. 299-309.
[9] Y. Lai and C. Kuo, "Image quality measurement using the Haar wavelet," in Proceedings of the SPIE, vol. 3169, 1997, pp. 127-138.
[10] A. P. Bradley, "A wavelet visible difference predictor," IEEE Transactions on Image Processing, vol. 8, no. 5, pp. 717-730, 1999.
[11] M. Masry, S. S. Hemami, and Y. Sermadevi, "A scalable wavelet-based video distortion metric and applications," IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 2, pp. 260-273, 2006.
[12] M. Nadenau and J. Reichel, "Opponent color, human vision and wavelets for image compression," in Proceedings of the Seventh Color Imaging Conference, 1999, pp. 237-242.


[13] Kakadu software. [Online]. Available: http://www.kakadusoftware.com
[14] W. Zeng, S. Daly, and S. Lei, "An overview of the visual optimization tools in JPEG 2000," Signal Processing: Image Communication, vol. 17, pp. 85–104, 2002.
[15] A. B. Watson, G. Y. Yang, and J. A. Solomon, "Visibility of wavelet quantization noise," IEEE Transactions on Image Processing, vol. 6, no. 8, pp. 1164–1175, 1997.
[16] Z. Liu, L. J. Karam, and A. B. Watson, "JPEG2000 encoding with perceptual distortion control," IEEE Transactions on Image Processing, vol. 15, pp. 1763–1778, 2006.
[17] M. G. Ramos and S. S. Hemami, "Suprathreshold wavelet coefficient quantization in complex stimuli: psychophysical evaluation and analysis," Journal of the Optical Society of America A, vol. 18, pp. 2385–2397, 2001.
[18] D. M. Chandler and S. S. Hemami, "Effects of natural images on the detectability of simple and compound wavelet subband quantization distortions," Journal of the Optical Society of America A, vol. 20, no. 7, pp. 1164–1180, 2003.
[19] ——, "Dynamic contrast-based quantization for lossy wavelet image compression," IEEE Transactions on Image Processing, vol. 14, no. 4, pp. 397–410, 2005.
[20] M. D. Smith and J. Villasenor, "JPEG-2000 rate control for digital cinema," SMPTE Motion Imaging Journal, vol. 115, no. 10, pp. 394–399, 2006.
[21] L. Li and Z.-S. Wang, "Compression quality prediction model for JPEG2000," IEEE Transactions on Image Processing, vol. 19, no. 2, pp. 384–398, 2010.
[22] H. Oh, A. Bilgin, and M. W. Marcellin, "Visibility thresholds for quantization distortion in JPEG2000," in Proceedings of QoMEX, San Diego, July 2009.
[23] H. Oh, Y. Kim, M. W. Marcellin, and A. Bilgin, "Visually lossless coding for color aerial images using JPEG2000," in Proceedings of the International Telemetering Conference, Las Vegas, Oct. 2009.
[24] H. Oh, A. Bilgin, and M. W. Marcellin, "Visually lossless JPEG2000 using adaptive visibility thresholds and visual masking effects," in Proceedings of the Asilomar Conference on Signals, Systems and Computers, Pacific Grove, Nov. 2009.
[25] S. G. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674–693, 1989.
[26] M. W. Marcellin, M. A. Lepley, A. Bilgin, T. J. Flohr, T. T. Chinen, and J. H. Kasner, "An overview of quantization in JPEG2000," Signal Processing: Image Communication, vol. 17, pp. 73–84, 2002.
[27] A. Cohen, I. Daubechies, and J. C. Feauveau, "Biorthogonal bases of compactly supported wavelets," Communications on Pure and Applied Mathematics, vol. 45, no. 5, pp. 485–560, 1992.
[28] D. H. Brainard, "The psychophysics toolbox," Spatial Vision, vol. 10, no. 4, pp. 433–436, 1997.
[29] M. Menozzi, U. Napflin, and H. Krueger, "CRT versus LCD: A pilot study on visual performance and suitability of two display technologies for use in office work," Displays, vol. 20, no. 1, pp. 3–10, 1999.
[30] D. M. Chandler, N. L. Dykes, and S. S. Hemami, "Visually lossless compression of digitized radiographs based on contrast sensitivity and visual masking," in Proceedings of the SPIE, vol. 5749, 2005, pp. 359–372.
[31] C. S. Furmanski and S. A. Engel, "An oblique effect in human primary visual cortex," Nature Neuroscience, vol. 3, pp. 535–536, 2000.
[32] Supplemental images. [Online]. Available: http://www.spacl.ece.arizona.edu/ohhan/visually lossless/
[33] G. E. Legge and J. M. Foley, "Contrast masking in human vision," Journal of the Optical Society of America, vol. 70, no. 12, pp. 1458–1471, 1980.
[34] D. S. Taubman, "High performance scalable image compression with EBCOT," IEEE Transactions on Image Processing, vol. 9, no. 7, pp. 1158–1170, 2000.
[35] S. Daly, "The visible differences predictor: an algorithm for the assessment of image fidelity," in Digital Images and Human Vision, A. B. Watson, Ed. Cambridge, MA: The MIT Press, 1993, pp. 179–206.
[36] USC-SIPI image database. [Online]. Available: http://sipi.usc.edu/database/
[37] H. R. Sheikh, Z. Wang, L. Cormack, and A. C. Bovik. LIVE image quality assessment database release 2. [Online]. Available: http://live.ece.utexas.edu/research/quality
[38] Kodak lossless true color image suite. [Online]. Available: http://r0k.us/graphics/kodak/


[39] M. J. Wainwright and E. P. Simoncelli, "Scale mixtures of Gaussians and the statistics of natural images," in Advances in Neural Information Processing Systems, S. A. Solla, T. K. Leen, and K. R. Muller, Eds. Cambridge, MA: The MIT Press, 2000, pp. 855–861.

