A Novel Method for Embedding Audio in Still Images
Jeny Rajan1, Roopak Sudhakar2, M.R. Kaimal3, George Harris1, K. Kannan1
Abstract—In this paper we propose a new method for embedding audio files in still images without altering the size or storage space of images. This technique adds a new lifelike dimension to still images. The basic concept of this method is to replace the HH component (after taking 2D integer wavelet transform) of the still images with the contents of the audio file. The maximum permissible size of the audio file depends on the size of the image file. The method provides a way to store and send sound along with images. Index Terms—Audio, Image, Lifting scheme, Wavelets.
I. INTRODUCTION
A still image with sound is a fascinating concept, and it is all the more appealing if it can be achieved without changing the size, storage space, or quality of the image. Few works on embedding audio in still images have been reported in the literature. This paper introduces a method for embedding audio in an image without causing visible corruption of the image or audible disturbance in the audio signal. The method employs the integer wavelet transform, which has gained much attention in the recent past, especially in the area of lossless image coding. The maximum permissible size of the audio file to be embedded depends on the size of the image file. The quantity of embedded audio can be increased at the expense of image quality. This concept can also be applied in the following areas:
• Mobile devices and cameras: sound can be recorded along with a picture without taking additional memory.
• Video applications: audio data can be stored in the video frames themselves, avoiding additional storage space for audio data.
As the audio data is embedded in the image data itself, the Audio Embedded Still Image (AESI) can be stored in any image format (such as gif, tiff, jpeg, or bmp). If the image compression technique is lossy, however, the reconstructed sound may be distorted. Any further processing of the image may also change the embedded audio content. The time required for embedding and reconstructing the audio signal is very small. Another important characteristic of this method is that there is no way for the user to know whether sound is embedded or not. Color images can store three times as much data as gray-level images, because they have R, G, and B channels compared with the single channel of a gray image.

Jeny Rajan, George Harris, and K. Kannan are with the Medical Imaging Research Group, Healthcare Division, NeST, Technopark, Trivandrum, INDIA (corresponding author to provide phone: +91 9446113729; fax: +91471-2700442; e-mail: [email protected]). Roopak Sudhakar is with the Center for Development of Advanced Computing, Trivandrum, INDIA. M.R. Kaimal is with the Dept. of Computer Science, University of Kerala, Trivandrum, INDIA.
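The capacity remarks in the Introduction can be made concrete with a small calculation. The sketch below assumes a single-level 2D transform, so that the HH band of each channel holds one quarter of that channel's pixels, and one audio sample per HH coefficient. The function name `audio_capacity` and the 8 kHz sampling rate are illustrative assumptions, not values taken from the paper.

```python
def audio_capacity(height, width, channels=1):
    """Number of audio samples that fit in the HH band(s).

    Assumes one level of a 2D wavelet transform, so each channel's HH band
    has (height // 2) * (width // 2) coefficients, one sample per coefficient.
    """
    return (height // 2) * (width // 2) * channels


gray_samples = audio_capacity(512, 512)               # single gray channel
color_samples = audio_capacity(512, 512, channels=3)  # R, G and B channels

# Assumed sampling rate (hypothetical, chosen only for illustration).
SAMPLE_RATE_HZ = 8000
gray_seconds = gray_samples / SAMPLE_RATE_HZ

print(gray_samples, color_samples, gray_seconds)
```

Under these assumptions a 512 x 512 gray image holds 65536 samples, and the color version holds three times as many, which matches the factor of three stated above.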
II. INTEGER WAVELET TRANSFORMS
Wavelets have proved useful in various application domains, such as signal and image processing, data compression, data transmission, the numerical solution of differential and integral equations, and noise reduction [1]. They have been used as a versatile tool for lossy and lossless image encoding for the last two decades. In most applications the wavelet filters have floating-point coefficients [3]. Thus, when the input data consists of a sequence of integers (as in the case of images), the filtered output no longer consists of integers. Integer-to-integer wavelet transforms map an integer data set into another integer data set. These transforms are perfectly invertible and yield exactly the original data set. The integer wavelet transform is one of the key components of lossless image coding. An integer transform is essential for retaining the image size after embedding audio and for avoiding data loss when the embedded image is saved in a particular image format: floating-point values would have to be converted to integers, resulting in data loss. Methods for the integer wavelet transform are described in [3], [4], and [6]. We used the integer version of the lifting scheme implementation of the Haar transform for implementing AESI. Lifting is a flexible technique that can be used in several different settings, for an easy construction and implementation of traditional wavelets and of second-generation wavelets such as spherical wavelets [3]. The lifting scheme implementation of the integer version of the Haar transform can be written as [3]
Forward transform:
\[ d_{1,l} = s_{0,2l+1} - s_{0,2l} \]
\[ s_{1,l} = s_{0,2l} + \lfloor d_{1,l} / 2 \rfloor \]
Inverse transform:
\[ s_{0,2l} = s_{1,l} - \lfloor d_{1,l} / 2 \rfloor \]
\[ s_{0,2l+1} = d_{1,l} + s_{0,2l} \]
where $s_{0,j}$ is the original signal of interest, and $s_{1,j}$ and $d_{1,j}$ are the lowpass and highpass coefficients, respectively, after the wavelet transform.
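The forward and inverse lifting steps can be illustrated with a minimal 1D sketch. It assumes an even-length integer input and uses NumPy; the function names `haar_lift_forward` and `haar_lift_inverse` are hypothetical and chosen only for this example.

```python
import numpy as np


def haar_lift_forward(s0):
    """One level of integer Haar lifting on an even-length 1D integer array."""
    s0 = np.asarray(s0, dtype=np.int64)
    even, odd = s0[0::2], s0[1::2]
    d1 = odd - even                      # d_{1,l} = s_{0,2l+1} - s_{0,2l}
    s1 = even + np.floor_divide(d1, 2)   # s_{1,l} = s_{0,2l} + floor(d_{1,l} / 2)
    return s1, d1


def haar_lift_inverse(s1, d1):
    """Undo haar_lift_forward exactly, using integer arithmetic only."""
    s1 = np.asarray(s1, dtype=np.int64)
    d1 = np.asarray(d1, dtype=np.int64)
    even = s1 - np.floor_divide(d1, 2)   # s_{0,2l} = s_{1,l} - floor(d_{1,l} / 2)
    odd = d1 + even                      # s_{0,2l+1} = d_{1,l} + s_{0,2l}
    s0 = np.empty(even.size + odd.size, dtype=np.int64)
    s0[0::2] = even
    s0[1::2] = odd
    return s0


signal = np.array([12, 15, 200, 198, 0, 255, 7, 7])
s1, d1 = haar_lift_forward(signal)
restored = haar_lift_inverse(s1, d1)
print(s1, d1, restored)
```

Because the inverse subtracts exactly the same floored term that the forward step added, the reconstruction is exact for any integer input, including negative differences.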
III. PROPOSED METHOD
In the proposed method, the HH component obtained after applying the integer wavelet transform (with Haar as the lifting wavelet) is replaced with the quantized sound wave. In this experiment we used "wav" files as the input audio, with sample values in the range -1 to +1. To bring them on par with the 8-bit image intensity values, the sound values are multiplied by 100 and the fractional part is discarded, which does not affect the sound quality to a distinguishable level. The algorithms for embedding audio in a still image and for extracting it are given below.
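The quantization described above can be sketched as follows. The text says only that the fractional part is discarded, so truncation toward zero (`np.trunc`) is an assumption here, as are the function names `quantize_audio` and `dequantize_audio`.

```python
import numpy as np

SCALE = 100  # scale factor stated in the text


def quantize_audio(samples):
    """Scale samples in [-1, 1] by 100 and drop the fractional part.

    Assumption: "discarding the fractional part" means truncation toward zero.
    """
    scaled = np.asarray(samples, dtype=np.float64) * SCALE
    return np.trunc(scaled).astype(np.int64)


def dequantize_audio(quantized):
    """Map integer values back to the -1..+1 range."""
    return np.asarray(quantized, dtype=np.float64) / SCALE


samples = np.array([-1.0, -0.505, 0.0, 0.254, 0.999])
q = quantize_audio(samples)
restored = dequantize_audio(q)
max_error = np.max(np.abs(samples - restored))
print(q, restored, max_error)
```

With this scale factor the quantized values lie between -100 and +100, and the error introduced per sample stays below 0.01 in the original -1 to +1 range.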
Fig 1: Process of embedding sound in still image (block labels: Original Image; Integer Wavelet Transform; 1D sound signal; 2D converted 1D signal; HH is replaced with 2D sound signal; Inverse Integer Wavelet Transform; Audio Embedded Image)
Audio Embedding Algorithm
Step 1. Calculate the four subbands LL, LH, HL and HH by applying the integer wavelet transform.
Step 2. Convert the 1D audio signal array to a 2D array of the same size as the HH band of the image array.
Step 3. Quantize the audio array after multiplying it by 100 to bring it on par with the 8-bit intensity values.
Step 4. Replace HH with the audio array and calculate the inverse integer wavelet transform.
Step 5. Normalize the data to the 0-255 (8-bit) range.
Step 6. Write the array in any image format.
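Steps 1 to 4 of the embedding algorithm can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: the 2D transform applies the 1D integer Haar lifting along rows and then columns, the LH/HL naming follows one common convention, short audio is zero-padded to fill the HH band, and Steps 5 and 6 (normalization and file writing) are omitted. All function names are hypothetical.

```python
import numpy as np


def _fwd(x, axis):
    """Integer Haar lifting along one axis; returns (lowpass, highpass)."""
    x = np.moveaxis(x, axis, 0)
    even, odd = x[0::2], x[1::2]
    d = odd - even
    s = even + d // 2            # // is floor division for integer arrays
    return np.moveaxis(s, 0, axis), np.moveaxis(d, 0, axis)


def _inv(s, d, axis):
    """Exact inverse of _fwd along the same axis."""
    s, d = np.moveaxis(s, axis, 0), np.moveaxis(d, axis, 0)
    even = s - d // 2
    odd = d + even
    out = np.empty((2 * s.shape[0],) + s.shape[1:], dtype=s.dtype)
    out[0::2], out[1::2] = even, odd
    return np.moveaxis(out, 0, axis)


def iwt2(img):
    """One-level 2D integer Haar transform (image sides must be even)."""
    img = np.asarray(img, dtype=np.int64)
    lo, hi = _fwd(img, axis=1)   # along each row
    ll, lh = _fwd(lo, axis=0)    # along each column
    hl, hh = _fwd(hi, axis=0)
    return ll, lh, hl, hh


def iiwt2(ll, lh, hl, hh):
    """Inverse of iwt2."""
    return _inv(_inv(ll, lh, axis=0), _inv(hl, hh, axis=0), axis=1)


def embed_audio(image, samples, scale=100):
    ll, lh, hl, hh = iwt2(image)                             # Step 1
    samples = np.asarray(samples, dtype=np.float64).ravel()
    if samples.size > hh.size:
        raise ValueError("audio does not fit in the HH band")
    padded = np.zeros(hh.size)                               # assumption: zero-pad
    padded[:samples.size] = samples
    audio_2d = padded.reshape(hh.shape)                      # Step 2
    audio_q = np.trunc(audio_2d * scale).astype(np.int64)    # Step 3
    return iiwt2(ll, lh, hl, audio_q)                        # Step 4


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8))
samples = np.sin(np.linspace(0.0, 2.0 * np.pi, 16))
stego = embed_audio(image, samples)
print(stego.shape, stego.min(), stego.max())
```

In this sketch the output of Step 4 can fall outside the 0-255 range, which is why the algorithm includes a normalization step. Any clipping or rescaling applied at that point changes the wavelet coefficients, so the recovered audio is then only approximately equal to the quantized input.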
Audio Extraction Algorithm
Step 1. Calculate the four subbands LL, LH, HL and HH of the sound embedded image by applying the integer wavelet transform.
Step 2. Convert the 2D audio HH array to a 1D array and divide the array values by 100.
Step 3. Write the array in a sound signal format.

Fig 2: (a) Original image (b) image with audio (c) original audio signal (d) reconstructed audio signal
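Steps 1 and 2 of the extraction algorithm can be sketched as follows. The example assumes the same row-then-column integer Haar lifting as a stand-in for the paper's transform, and it skips Step 3 (writing a sound file). The function names and the toy 4 x 4 test image are hypothetical. In the toy image every pixel is 128 except the bottom-right pixel of each 2 x 2 block, which is offset by the value that should appear in HH.

```python
import numpy as np


def _fwd(x, axis):
    """Integer Haar lifting along one axis; returns (lowpass, highpass)."""
    x = np.moveaxis(x, axis, 0)
    even, odd = x[0::2], x[1::2]
    d = odd - even
    s = even + d // 2            # floor division on integer arrays
    return np.moveaxis(s, 0, axis), np.moveaxis(d, 0, axis)


def iwt2(img):
    """One-level 2D integer Haar transform (image sides must be even)."""
    img = np.asarray(img, dtype=np.int64)
    lo, hi = _fwd(img, axis=1)   # along each row
    ll, lh = _fwd(lo, axis=0)    # along each column
    hl, hh = _fwd(hi, axis=0)
    return ll, lh, hl, hh


def extract_audio(stego_image, scale=100):
    """Recover the audio samples stored in the HH band."""
    _, _, _, hh = iwt2(stego_image)                   # Step 1
    return hh.ravel().astype(np.float64) / scale      # Step 2


# Toy audio-embedded image: HH of each 2x2 block equals the added offset.
hidden = np.array([[50, -25], [0, 100]])
stego = np.full((4, 4), 128, dtype=np.int64)
stego[1::2, 1::2] += hidden

audio = extract_audio(stego)
print(audio)
```

The recovered values are the quantized samples divided by 100, so they return to the -1 to +1 range used by the input audio.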
Table 1: Analysis of audio embedded image and reconstructed audio signal based on SSIM and PSNR

Original Image   SSIM after reconstruction   PSNR after reconstruction   Original Signal   PSNR after reconstruction
Elaine           0.9114                      31.34 dB                    voltage.wav       43.14 dB
Boat             0.9477                      31.22 dB                    voltage.wav       39.11 dB
Lena             0.9635                      37.96 dB                    voltage.wav       44.99 dB
Moon             0.9308                      29.77 dB                    voltage.wav       44.61 dB
Elaine           0.8773                      28.17 dB                    wobble.wav        43.89 dB
Boat             0.9137                      30.37 dB                    wobble.wav        32.08 dB
Lena             0.9220                      35.01 dB                    wobble.wav        43.69 dB
Moon             0.8279                      26.54 dB                    wobble.wav        40.94 dB
Elaine           0.9014                      35.06 dB                    bubble.wav        42.59 dB
Boat             0.9367                      30.81 dB                    bubble.wav        35.52 dB
Lena             0.9479                      37.06 dB                    bubble.wav        43.23 dB
Moon             0.9194                      29.59 dB                    bubble.wav        43.65 dB
IV. EXPERIMENTAL RESULTS
Experiments were carried out on the set of standard test images shown in Fig 3. Statistical analysis was based on the Structural Similarity (SSIM) index [7] and PSNR. Fig 2 compares the original image and audio with the audio embedded image and the reconstructed audio signal: (a) shows the original image, (b) the audio embedded image, (c) the original audio signal, and (d) the reconstructed audio signal. There is only a negligible difference between the audio embedded image and the original image, and between the reconstructed signal and the original signal. A detailed analysis with a set of standard test images and audio signals is shown in Table 1. The high SSIM and PSNR values indicate that the audio embedded image and the reconstructed audio signal are of good quality.
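For reference, PSNR can be computed with the standard definition, 10 log10(peak^2 / MSE). The sketch below assumes a peak of 255 for 8-bit images; the paper does not state the peak value used for the audio signals, so `peak` is left as a parameter. The function name `psnr` is illustrative, and SSIM is not sketched here.

```python
import numpy as np


def psnr(reference, test, peak):
    """Peak signal-to-noise ratio in dB between two equally shaped arrays."""
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        return float("inf")      # identical inputs
    return 10.0 * np.log10(peak ** 2 / mse)


original = np.full((4, 4), 100)
distorted = original.copy()
distorted[0, 0] += 4             # one pixel off by 4, so MSE = 16 / 16 = 1

value = psnr(original, distorted, peak=255)
print(round(value, 2))
```

With a mean squared error of 1 and a peak of 255, the result is 20 log10(255), roughly 48.13 dB.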
Fig 3: Test images used (Elaine, Boat, Lena, Moon)
Fig 4: Audio signals used for testing (voltage.wav, wobble.wav, bubble.wav)
V. CONCLUSION
A method for embedding audio in still images is introduced in this paper. An interesting aspect of the method is its ability to preserve the original image size, storage space, and image quality. The embedded audio can be easily retrieved from the image without much distortion. The method provides a way to send sound along with images using the same bandwidth.

REFERENCES
[1] W. Sweldens, "The Lifting Scheme: A Construction of Second Generation Wavelets," SIAM J. Math. Anal., Vol. 29, No. 2, pp. 511-546, 1998.
[2] J. Shapiro, "Embedded Image Coding Using Zerotrees of Wavelet Coefficients," IEEE Trans. on Signal Processing, Vol. 41, No. 12, 1993.
[3] A. R. Calderbank, I. Daubechies, W. Sweldens, and B. L. Yeo, "Wavelet Transforms that Map Integers to Integers," Technical report, Department of Mathematics, Princeton University, 1996.
[4] A. R. Calderbank, I. Daubechies, W. Sweldens, and B. L. Yeo, "Lossless Image Compression Using Integer to Integer Wavelet Transforms," Proc. of IEEE International Conference on Image Processing, Vol. 1, pp. 596-599, 1997.
[5] S. G. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 11, No. 7, pp. 674-693, 1989.
[6] Omer N. Gerek and Enis Cetin, "A 2-D Orientation-Adaptive Prediction Filter in Lifting Structures for Image Coding," IEEE Trans. on Image Processing, Vol. 15, No. 1, pp. 106-111, Jan. 2006.
[7] Zhou Wang, Alan Conrad Bovik, Hamid Rahim Sheikh, and Eero P. Simoncelli, "Image Quality Assessment: From Error Visibility to Structural Similarity," IEEE Trans. on Image Processing, Vol. 13, No. 4, April 2004.