Proceedings of the 26th Annual International Conference of the IEEE EMBS San Francisco, CA, USA • September 1-5, 2004
DSP Based Image Processing for Retinal Prosthesis
Neha J. Parikh2, James D. Weiland1,3, Mark S. Humayun1,3, Saloni S. Shah2, Gaurav S. Mohile2
1 Doheny Retina Institute, Doheny Eye Institute, Department of Ophthalmology, Keck School of Medicine, University of Southern California, Los Angeles, CA
2 Department of Electrical Engineering, University of Southern California, Los Angeles, CA
3 Department of Biomedical Engineering, University of Southern California, Los Angeles, CA
Abstract -- Real-time image processing for a retinal prosthesis involves implementing image processing algorithms such as edge detection, edge enhancement, and decimation. Running these algorithms in real time can be computationally demanding, and hence the use of digital signal processors (DSPs) is proposed here. The application requires DSPs that are computationally efficient while operating at low power. DSPs offer computational capabilities of hundreds of millions of instructions per second (MIPS) or millions of floating-point operations per second (MFLOPS), and certain processor configurations also operate at low power. This paper discusses the image processing algorithms, the DSP requirements, and the capabilities of different platforms.
Keywords – Real-time Image Processing, DSP

I. INTRODUCTION
An electronic retinal prosthesis is being developed to treat retinal degenerative diseases such as retinitis pigmentosa (RP) and age-related macular degeneration (AMD) [2,3], which are two leading causes of blindness. In these diseases, the photoreceptor cells of the retina are affected but other retinal cells remain relatively intact. Hence, the retinal prosthesis is designed to electrically activate these remaining cells of the retina. The image processing system of the retinal prosthesis is ultimately intended to be mounted on the patient’s spectacles. The image processing sub-system consists of a small camera that captures images in real time and a DSP that takes these captured images as input, processes them, and outputs the processed images. The processed output will be used to stimulate a 32x32 grid of electrodes. Because of the area constraints and because the system is intended to be worn by humans, the entire system should be small, lightweight, and low in power consumption. The functions likely to form part of the image processing algorithm are edge enhancement, edge detection, spatial averaging and low-pass filtering, decimation, and zooming. The retina acts as an image processor in the eye, and one of its functions is continuous edge detection. The edge-detection algorithm is implemented using high-pass filter masks and by calculating gradients. The edge enhancement algorithm sharpens the edges in the image frame before
0-7803-8439-3/04/$20.00©2004 IEEE
decimating the image in order to preserve more edges. Low-pass filtering is implemented using two different low-pass filter masks, the averaging and bi-cubic filter masks, for comparison. Decimation produces a final image of 32x32 pixels to stimulate the 32x32 grid of electrodes. A zooming algorithm enlarges the 32x32 output image to a 480x480 image so that the output can be viewed. This algorithm is used only for experimental purposes, to observe the output on a monitor. The computational complexity lies in the algorithms for capturing the image, processing it, and then displaying the output image on a monitor. Because in practice the output from the DSP would eventually stimulate the grid of electrodes in the retina, the computations for the display function will not be needed. An algorithm that detects an object in the captured frame and zooms that object to fit a 32x32 grid is also under consideration. The computational complexity of such an algorithm may be unrealistically high for a low-power DSP, so initially the existing image processing algorithms described above are implemented to test the functionality of the processors in real time. This paper considers the computational load of decimation and low-pass filtering and the ability of current DSPs to implement these functions in real time.

II. METHODOLOGY
The current camera captures images with a resolution of 480x640 at a rate of 30 frames/sec. The image processing on these frames consists of edge detection, edge enhancement, decimation and low-pass filtering. The decimated output image will be a 32x32 image, thus decimating the original image by a factor of 7.5 x 10. Currently, these algorithms are implemented on a TMS320C6711 DSP.

1) Algorithms and Computational Complexity: The first algorithm implements decimation and low-pass filtering using averaging and bi-cubic filter masks on every frame. The masks used for averaging and bi-cubic filtering are given in Figure 1a. Filtering and decimation are done simultaneously in the convolution stage itself. The other algorithms implement edge enhancement and edge detection. The final images in all three algorithms are 32x32 images. Edge enhancement is done using the Laplacian filter mask given in Figure 1b. The edge image obtained using the Laplacian filter mask is scaled and added to the original image to enhance the edges. In the edge detection algorithm, Sobel filter masks for the x and y direction gradients are used. These filter masks are given in Figure 1c. The images obtained by convolving the original image with each of these masks are added to give the edge-detected image.

2) DSP Specifications and Characteristics: Some of the DSP platforms offered by Texas Instruments (TI) were examined to choose the right DSP for the project. TI offers a range of DSPs for video and image applications. The basic platforms for real-time imaging applications include the TMS320C6x and TMS320C5x platforms. These can take in camera inputs in real time at varying frame rates, process the input frames, and output the processed frames. The time the DSP takes to process one frame, measured in instruction cycles, should not exceed the interval between two successive input frames, because otherwise information would be lost between successive frames. For this reason, the number of MIPS that a processor can deliver is an important criterion. Most of the DSPs in the C6x and C5x platform families have computational capabilities ranging from 600 MIPS to around 5800 MIPS [6-13]. Table I lists the frequency of operation, the time taken for one instruction cycle, the power consumption, and the number of MIPS for different processors in the TI TMS320C6x and TMS320C5x series. DSPs are available for both fixed-point and floating-point computation. Floating-point arithmetic is more general than fixed-point arithmetic, and hence floating-point processors are easier to program.
However, this generality means that floating-point processors consume more power [5]. The priority of this system is low power, and hence fixed-point processors will be used in the final implementation. The two main fixed-point processors in the C6x series that are useful for our application are the TMS320DM642 and TMS320C6416 DSK. The C5x series processors are also fixed-point processors [6]. All the processors compared in Table I are fixed-point processors except the TMS320C6711, which is a floating-point processor.

III. RESULTS
Both low-pass filters tend to blur the image. The outputs of the averaging and bi-cubic low-pass filters were compared; because of the large decimation factor, there is little visible difference between them. Decimating by such a large factor naturally causes a considerable loss of detail, and therefore the final image is more of a contrast-level-based image than a sharp image
with well-defined edges. Edge enhancement does not produce a striking difference between the edge-enhanced decimated output image and the simply decimated output image. This is also due to the large decimation factor involved. The edge detection algorithm detects fewer edges because of the reduction in image size. However, edge detection is an important processing step performed by the retina, and hence the algorithm is implemented here. From the summary of power consumption for TI processors in Table I, it can be observed that the C5x processors consume less power but deliver significantly fewer MIPS. Therefore, for the initial research stage, we plan to develop an algorithm on a C6x processor, optimize it, and determine to which C5x platform it can be transferred in order to obtain a low-power system.

Fig. 1a. Bi-cubic filter mask and averaging filter mask, Reference [4]
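The combined filtering and decimation step described in the Methodology can be sketched as follows. This is a minimal illustration, not the implementation run on the DSP: the mask coefficients of Fig. 1a are not reproduced in the text, so a 3x3 box (averaging) mask is assumed, and the function name `filter_and_decimate`, the border handling, and the choice of sampling positions are hypothetical. The idea shown is that the mask is evaluated only at the pixels that survive decimation, so filtering and decimation happen in one pass.

```python
# Assumed 3x3 averaging (box) mask; the actual mask of Fig. 1a may differ.
AVERAGING_MASK = [[1 / 9] * 3 for _ in range(3)]


def filter_and_decimate(image, mask, out_rows=32, out_cols=32):
    """Apply `mask` only at the sample positions kept after decimation.

    `image` is a list of rows of grey levels. The mask is applied as a
    correlation, which equals convolution for a symmetric mask.
    """
    rows, cols = len(image), len(image[0])
    half = len(mask) // 2
    out = []
    for i in range(out_rows):
        # Centre of the i-th output sample in input coordinates.
        r = int((i + 0.5) * rows / out_rows)
        out_row = []
        for j in range(out_cols):
            c = int((j + 0.5) * cols / out_cols)
            acc = 0.0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    # Clamp indices so the mask stays inside the frame.
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    acc += mask[dr + half][dc + half] * image[rr][cc]
            out_row.append(acc)
        out.append(out_row)
    return out


if __name__ == "__main__":
    frame = [[100] * 640 for _ in range(480)]  # synthetic flat 480x640 frame
    small = filter_and_decimate(frame, AVERAGING_MASK)
    print(len(small), len(small[0]))
```

Evaluating the mask at 32x32 positions rather than at every input pixel is what keeps the per-frame cost low in such a scheme; a larger mask would widen the neighbourhood averaged at each sample at proportionally higher cost.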
Fig. 1b. Laplacian filter mask, Reference [4]
Fig. 1c. Sobel filter masks for the row (x) and column (y) gradients, Reference [4]
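The edge enhancement and edge detection steps described in the Methodology can be sketched as below. The mask coefficients of Figs. 1b and 1c are not reproduced in the text, so commonly used forms of the Laplacian and Sobel masks are assumed here. The gain of 0.5, the clipping to the 0-255 range, and the use of absolute values when adding the two gradient images are also assumptions, as are all function names.

```python
# Assumed mask coefficients; sign and scaling conventions vary between texts.
LAPLACIAN = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_mask(image, mask):
    """Correlate a 3x3 mask with the image, replicating border pixels."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    acc += mask[dr + 1][dc + 1] * image[rr][cc]
            out[r][c] = acc
    return out


def enhance_edges(image, gain=0.5):
    """Add a scaled Laplacian edge image to the original, clipped to 0..255."""
    lap = apply_mask(image, LAPLACIAN)
    return [
        [min(255.0, max(0.0, p + gain * e)) for p, e in zip(img_row, lap_row)]
        for img_row, lap_row in zip(image, lap)
    ]


def detect_edges(image):
    """Add the magnitudes of the two Sobel gradient images."""
    gx = apply_mask(image, SOBEL_X)
    gy = apply_mask(image, SOBEL_Y)
    return [
        [abs(a) + abs(b) for a, b in zip(x_row, y_row)]
        for x_row, y_row in zip(gx, gy)
    ]


if __name__ == "__main__":
    # Vertical step edge: three dark columns followed by three bright ones.
    step = [[0, 0, 0, 100, 100, 100] for _ in range(6)]
    print(detect_edges(step)[0])
    print(enhance_edges(step)[0])
```

On the step image, the detector responds only in the two columns that border the edge, and the enhanced image overshoots on the bright side of the edge, which is the sharpening effect the text describes.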
TABLE I
OPERATIONAL SPECIFICATIONS FOR DIFFERENT PROCESSORS [6-13]

| | DM642 | C641x | C6711 | C54x | C55x |
|---|---|---|---|---|---|
| Frequency of operation (MHz) | 400, 500, 600 | 500, 600 | 150 | 50 – 160 | 144 – 200 |
| Power consumption (W) | 1.05, 1.15, 1.7 | 0.64, 1.04 | 1.1 | 0.04 – 0.09 | 0.065 – 0.16 |
| MIPS / MFLOPS | 4000, 4000, 4800 MIPS | 4000, 4800 MIPS | 900 MFLOPS | 50 – 532 MIPS | 288 – 400 MIPS |
| Instruction cycle time (ns) | 2.5, 2, 1.67 | 2, 1.67 | 6.7 | 0.02 – 6.25 | 6.94 – 2.5 |
The number of clock cycles taken by the processor to process each frame in the above algorithms was measured. Table II compares the number of frames per second that each of the processors under consideration could process. Processors with a higher frequency of operation can process more frames. For this table, it is assumed that all the processors take the same number of instruction cycles to process the same frames. This may not hold, because the architecture of each processor is different. However, almost all the processors have a faster architecture than the C6711 and could process the same inputs in fewer instruction cycles than the C6711, so the processors in the comparison may be able to handle even more frames than stated. The current system also implements the zooming function. These computations would not be needed in the final implementation, so the computational load will be somewhat reduced. Our requirement is 30 frames/sec of processing, and we hope to optimize the final algorithm so that it can be implemented within this constraint on the above-mentioned processors.
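The frame-rate estimate described above can be reproduced with a short calculation. This sketch relies on the same simplifying assumption as the text, namely that a frame costs the same number of clock cycles on every processor, and it truncates to whole frames. The function names are illustrative.

```python
def frames_per_second(clock_mhz, cycles_per_frame):
    """Whole frames processed per second at the given clock frequency."""
    return int(clock_mhz * 1_000_000 // cycles_per_frame)


def meets_frame_rate(clock_mhz, cycles_per_frame, required_fps=30):
    """True if the processor keeps up with the camera frame rate."""
    return frames_per_second(clock_mhz, cycles_per_frame) >= required_fps


if __name__ == "__main__":
    averaging_cycles = 6458069  # cycles/frame for the averaging filter
    for name, mhz in [("C6711", 150), ("C641x", 500), ("C641x", 600)]:
        fps = frames_per_second(mhz, averaging_cycles)
        print(name, mhz, "MHz:", fps, "frames/sec,",
              "meets 30 frames/sec" if meets_frame_rate(mhz, averaging_cycles)
              else "below 30 frames/sec")
    # → C6711 150 MHz: 23 frames/sec, below 30 frames/sec
    # → C641x 500 MHz: 77 frames/sec, meets 30 frames/sec
    # → C641x 600 MHz: 92 frames/sec, meets 30 frames/sec
```

Under this assumption, the averaging filter at 150 MHz falls short of the 30 frames/sec requirement, while the same cycle count at 500 MHz or 600 MHz leaves a comfortable margin.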
TABLE II
COMPARISON OF THE NUMBER (NO.) OF FRAMES PROCESSED PER SECOND BY EACH OF THE PROCESSORS IN CONSIDERATION

| No. of frames per second for | Average number of clock cycles/frame | DM642 | C641x | C6711 | C54x | C55x |
|---|---|---|---|---|---|---|
| Averaging filter | 6458069 | 77-92 | 77-92 | 23 | 7-24 | 22-30 |
| Bi-cubic filter | 6457935 | 54-65 | 54-65 | 23 | 7-24 | 22-30 |
| Edge Enhancement algorithm | 2086330 | 239-287 | 239-287 | 71 | 23-76 | 69-95 |
| Edge Detection algorithm | 3324606 | 150-180 | 150-180 | 45 | 15-48 | 42-60 |
IV. DISCUSSION
The TMS320DM642 has a VLIW architecture, a highly parallel architecture in which multiple instructions are executed per cycle. Because the architecture is focused on parallelism, cycle efficiency also improves. This architecture reduces code size, the number of program fetches, and power consumption [10]. The TMS320C5x family is the most power-efficient. The DSP core continuously monitors which parts of the chip are not in use and powers them off when they are not needed [6]. Hence, for the research stage and algorithm development, we plan to use the TMS320DM642 platform in order to concentrate on the algorithm complexities and output results. After establishing a good algorithm and observing its computational complexity, we would switch to a TMS320C5x platform to implement the system on a low-power basis.

ACKNOWLEDGEMENT
“This research was performed at the Biomimetic Microelectronics Systems Engineering Research Center. This material is based on work supported by the National Science Foundation under Grant No. EEC-0310723”
REFERENCES
[1] R. Chassaing, DSP Applications using C and TMS320C6x DSK, NY: Wiley Interscience, 2001.
[2] Margalit, E., Maia, M., Weiland, J. D., Greenberg, R. J., Fujii, G. Y., Torres, G., Piyathaisere, D. V., O'Hearn, T. M., Liu, W., Lazzi, G., Dagnelie, G., Scribner, D. A., de Juan E Jr, and Humayun, M. S., "Retinal prosthesis for the blind," Surv. Ophthalmol., vol. 47, no. 4, pp. 335-356, July 2002.
[3] Zrenner, E., "Will retinal implants restore vision?," Science, vol. 295, no. 5557, pp. 1022-1025, Feb. 2002.
[4] W. Pratt, Digital Image Processing, 3rd Edition, Wiley, 2001.
[5] Choosing a DSP Processor, Berkeley Design Technology, Inc. http://bbs.chinaecnet.com/pdf/cdsp.pdf
[6] Texas Instruments, Compare C5000 DSPs: http://dspvillage.ti.com/docs/catalog/generation/details.jhtml?templateId=5147&path=templatedata/cm/dspdetail/data/c5000compare
[7] Texas Instruments, TMS320C6711 DSK: http://focus.ti.com/docs/toolsw/folders/print/tmds320006711.html
[8] Texas Instruments, TMS320C6711/11B/11C/11D Floating-Point Digital Signal Processors: http://focus.ti.com/lit/ds/symlink/tms320c6711.pdf
[9] Texas Instruments, TMS320C6x/C67x Power Consumption Summary: http://focus.ti.com/lit/an/spra486c/spra486c.pdf
[10] Texas Instruments, TMS320DM642 Product folder: http://focus.ti.com/docs/prod/folders/print/tms320dm642.html
[11] Texas Instruments, TMS320DM642 Power Consumption Summary: http://focus.ti.com/lit/an/spra962a/spra962a.pdf
[12] Texas Instruments, TMS320C6416 Product folder: http://focus.ti.com/docs/prod/folders/print/tms320c6416.html
[13] Texas Instruments, TMS320C6416 Power Consumption Summary: http://focus.ti.com/lit/an/spra811c/spra811c.pdf