Multi-Sensor Data Fusion across Time and Space

Pierre V. Villeneuve(a), Scott G. Beaven(a), Robert Reed(b)

(a) Space Computer Corporation, 12121 Wilshire Blvd, Suite 910, Los Angeles, CA 90025
(b) Arnold Engineering Development Center, 1099 Schriever Ave, Arnold AFB, TN 37389

ABSTRACT

Field measurement campaigns typically deploy numerous sensors having different sampling characteristics in the spatial, temporal, and spectral domains. Data analysis and exploitation are made more difficult and time consuming because the sample data grids of the different sensors do not align. This report summarizes our recent effort to demonstrate the feasibility of a processing chain capable of "fusing" image data from multiple independent and asynchronous sensors into a form amenable to analysis and exploitation using commercially available tools. Two important technical issues were addressed in this work: 1) image spatial registration onto a common pixel grid, and 2) image temporal interpolation onto a common time base. The first step leverages existing image matching and registration algorithms. The second step relies upon a new and innovative use of optical flow algorithms to perform accurate temporal upsampling of slower-frame-rate imagery. Optical flow field vectors are first derived from high-frame-rate, high-resolution imagery and then used as a basis for temporal upsampling of the slower-frame-rate sensor's imagery. The optical flow field is computed using a multi-scale image pyramid, allowing for more extreme object motion; this involves preprocessing imagery at varying resolution scales and initializing new flow vector estimates from those of the previous, coarser-resolution scale. Overall performance of the processing chain is demonstrated using sample data involving complex motion observed by multiple sensors mounted to the same base, ranging from a high-speed visible camera to a coarser-resolution LWIR camera.

1. INTRODUCTION

Field measurement campaigns typically deploy numerous sensors having different spatial, temporal, and spectral sampling characteristics. These sensors generally span different spectral regimes using one or more spectral channels. Examples include: 1) a high-resolution visible panchromatic framing camera, 2) high-resolution RGB video, 3) an MWIR/LWIR multiband imager, and 4) a VNIR/SWIR multispectral or hyperspectral sensor. Data analysis and exploitation may be hindered by the fact that the observed experiment phenomenology is sampled by each sensor at significantly different spatial resolutions and temporal frame rates. This makes it difficult and time consuming to derive conclusions from an experiment when a phenomenon of interest spans multiple sensors. Commercially available spectral analysis software (e.g. ENVI, Opticks, or MATLAB) generally requires an input data product in the form of a contiguous three-dimensional array (two spatial dimensions and one spectral or temporal dimension). There is a clear need for methods that process sequences of image data sets from disparate sensors and output a single time sequence of data accurately resampled to a common spatial and temporal grid. Two important technical issues must be addressed as part of solving this problem: 1) image spatial registration onto a common pixel grid, and 2) image temporal interpolation onto a common time base. Only once these two processing steps are applied may the sensor data be joined into a single data product suitable for exploitation using commercially available software tools. The first issue involves transforming the sensors' image data such that they all exist on a common pixel grid; without this step it would not be possible to make quantitative comparisons between the different sensor data sets. Fortunately, solutions to this problem exist in many forms, and in this effort we leveraged several existing methods to support spatial registration. The second technical issue involves accurately mapping the sensors' image sequences onto a common time grid (after they have been mapped onto the common spatial pixel grid). Solutions to this problem must account for dynamic scene content having spatial features that change with time, e.g. an expanding rocket plume or a walking pedestrian.

2. OBJECTIVE AND SCOPE

The scope of this effort was focused on combining the data stream from a conventional spectral sensor with data from adjunct sensors in order to achieve enhanced spatial, temporal, and spectral performance substantially beyond the limits achievable by any of the contributing stand-alone systems. The primary objective of the effort was to develop a proof-of-concept processing approach to help evaluate the feasibility of a software system capable of "fusing" image data from multiple asynchronous sensors into a form amenable to exploitation using commercially available analysis tools. This would enable the end user to focus on high-level experiment data analysis rather than on time-consuming data preparation and setup tasks.

Figure 1 – Top-level view of data processing chain.

The algorithm and software development work presented in this report falls into three categories:

1. Spatial Image Registration: Leverage existing image processing techniques and tools to align input data onto a common spatial pixel grid.

2. Temporal Upsampling: Apply concepts learned from prior optical-flow and signal-processing efforts to more accurately upsample slower-frame-rate sensor data. Compute optical-flow information derived from faster-frame-rate, co-boresighted sensors, and use this information as a basis for temporal interpolation of adjunct sensors.

3. Feasibility Demonstration: Evaluate feasibility of the approach by applying multispectral exploitation methods to controlled test data.

3. SCENARIO TEST DATA

Controlled test data were used to develop and demonstrate the processing chain created under this effort. See Table 1 for details. This data set consists of five simultaneous video streams of pedestrians walking across a compound. Each video sequence corresponds to a sensor having different spatial, spectral, and temporal characteristics. Example frames from these image sequences are shown below in Figure 2.

Table 1 – Summary of controlled test imagery at different wavebands, sizes, and frame rates.

Index   Color   Format      Frame Rate (Hz)   Scale   Sync
1       Pan     320 x 240   30                1.00    Y
2       Red     320 x 240   30                1.00    Y
3       Green   256 x 256   15                1.28    Y
4       Blue    256 x 256   10                1.28    Y
5       IR      384 x 288   5                 1.20    N

Figure 2 – Sample image frames from multiple-sensor video sequences used to develop and demonstrate processing concepts.

4. SPATIAL PROCESSING

The goal of the spatial processing task is to account for the different spatial characteristics of the set of sensors to be fused. These include the sensor spatial resolution/sampling, field of view, and sensor motion. In general, all of these elements must be treated to align the data onto a common spatial grid. This process requires either that one of the sensors be selected as the reference or that a virtual image canvas be defined upon which to resample all the data; typically we use the former. The reference might be: a) the sensor with the largest field of view in common with the other sensors, b) the one with the finest spatial resolution, or c) both. Registering multiple data sources requires that the sensors be located at the same general vantage point and that they view the same general scene. Depending on the application, additional information may also be desired, including the inertial metadata for the sensor platform (location, orientation) and precision timing or synchronization between the image frame acquisitions and the metadata times.

4.1 Image registration work flow

In general there are two stages of spatial registration. The first is a coarse alignment to a common grid. The second is a scene-based registration refinement to sub-pixel precision. In the case of airborne and space-based imaging sensors, the first stage typically involves geo-registration or projection onto a UTM grid based on position and IMU information from the sensor suite. For other applications, including ground-based systems, the selected coordinate system may be either a uniform grid in a plane perpendicular to the look direction or simply the angular (image) coordinates of one of the sensors selected as the reference. In the case where a geo-grid is used, we project sensor imagery onto the Earth background using precision GPS/INS pointing information. The second stage applies scene-based registration to transform one sensor image to best match another. We have investigated three related methods: local-area phase correlation, gradient-based optical flow, and feature matching. In this report we focus primarily on the feature-matching approach. The flow chart in Figure 3 illustrates the general framework for image registration.

Figure 3 – General framework for cross-sensor image registration.

4.2 Spatial processing applied to test data

One of the key elements for cross-sensor processing is spatial image registration. Since this case involves a stationary set of imagers, a single registration transformation may be used for the entire sequence. To reduce the impact of temporal changes in that type of scenario, we typically apply a temporal median filter to effectively eliminate (or minimize) the movers and use the long-term median to define the cross-sensor spatial registration. In cases where the sensors are not stationary, the spatial registration would have to be applied to temporally resampled data sets, potentially on every frame. Figure 4 and Figure 5 show an example of computing a stable background for the test scenario image sequences.
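As an illustration of this background-estimation step, the following minimal sketch (Python with NumPy; the function name and argument layout are our own assumptions, not the delivered processing chain) computes a per-pixel temporal median over a stationary sensor's frame stack:

    import numpy as np

    def temporal_median_background(frames):
        # Stack frames along a new leading time axis:
        # shape becomes (num_frames, rows, cols) or (num_frames, rows, cols, bands).
        stack = np.stack(list(frames), axis=0)
        # The per-pixel temporal median suppresses movers that occupy a given
        # pixel for less than roughly half the frames, leaving a stable background.
        return np.median(stack, axis=0)

The resulting background image from each sensor is what feeds the feature-based registration described next.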

Figure 4 – Temporal median filtering eliminates transient scene content.

Figure 5 – Stable background via temporal median filtering.

For this process we selected a single source (Blue) as the reference sensor to which all other data are spatially registered. This source was selected because it had the smallest field of view, which defines the area over which all sensor data are available. To register these data we used SURF[1] image features extracted from each sensor's temporal median image, and then derived projection transforms from each source to the final reference pixel grid (Blue). In our initial tests the registration of the LWIR source to the Blue source was not well characterized by SURF features, so we fell back to user-generated tie points between the LWIR source and the Blue source. An example of automatically computed SURF tie points between the Red sensor and the Blue sensor is shown in Figure 6, and the resulting warped imagery is shown in Figure 7. We then use the registration derived from the median images to process the entire time sequence from each sensor, providing the spatial component of the cross-sensor cube formation.
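A minimal sketch of this feature-based registration step is shown below using OpenCV. ORB is used here only as a freely available stand-in detector; SURF itself lives in the opencv-contrib xfeatures2d module and can be substituted directly. The function names and parameter values are illustrative assumptions, not the exact code used in this effort.

    import cv2
    import numpy as np

    def register_to_reference(src_median, ref_median):
        # Detect and describe features on the two temporal-median background images.
        detector = cv2.ORB_create(nfeatures=2000)
        kp_src, des_src = detector.detectAndCompute(src_median, None)
        kp_ref, des_ref = detector.detectAndCompute(ref_median, None)

        # Cross-checked brute-force matching of binary descriptors.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des_src, des_ref), key=lambda m: m.distance)

        pts_src = np.float32([kp_src[m.queryIdx].pt for m in matches])
        pts_ref = np.float32([kp_ref[m.trainIdx].pt for m in matches])

        # Robust projective fit; RANSAC rejects mismatched tie points.
        H, inliers = cv2.findHomography(pts_src, pts_ref, cv2.RANSAC, 3.0)
        return H

    def warp_sequence(frames, H, ref_shape):
        # Apply the single homography to every frame of a stationary sensor.
        rows, cols = ref_shape[:2]
        return [cv2.warpPerspective(f, H, (cols, rows)) for f in frames]

Because the sensors are stationary here, the homography derived from the median images is reused for every frame of the sequence.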

Figure 6 – Automated feature detection and image-to-image association.

Figure 7 – All provided sensor data (green, red, pan, LWIR) spatially warped to reference sensor imagery (blue).

5. TEMPORAL PROCESSING

Temporal upsampling is the critical element of this processing chain. The basic problem is that the different sensors acquire data at different rates and may be asynchronous. At this stage of the process we assume that the data are spatially registered using the approaches explained in the previous section. We now seek an accurate method for interpolating the data sequences to a common time base across the sensors. A complicating factor is that there may be significant motion between frames, so simple linear upsampling methods are not adequate. The higher-frame-rate sensors provide the best information about scene-content motion. The goal here is to compute optical flow from the high-frame-rate data and transfer it to the adjunct sensors. We require an approach that is a general solution to this problem in order to provide the most utility across a wide range of applications and users. Figure 8 shows the basic problem of taking two sensors acquiring imagery at different times and frame rates and defining a common time base to which to reference these images.

Figure 8 – Example scenario indicating need to interpolate or upsample sensor data to a reference time base.

Temporal upsampling comprises three processing stages:

1. Compute optical flow between successive frames of the high-frame-rate data.

2. Transfer the optical flow to the slow-frame-rate sensor data sequence.

3. Interpolate the slow-frame-rate data to the same time base as the high-frame-rate sensor.

The first step requires identifying the reference sensor, which ideally is the sensor operating at the highest frame rate and finest spatial resolution. We compute the optical flow between consecutive reference sensor image pairs. The second step involves interpolating the optical flow field from the reference time samples to each target sensor's time samples; this transfers the optical flow field from the reference sensor to the target sensor(s). Third, we resample the target sensor imagery at the new times via a two-dimensional spatial interpolation.
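The flow-transfer step amounts to rescaling the displacement measured between two reference frames so that it spans the interval between a target frame time and the desired output time. A minimal sketch, assuming locally constant velocity over one reference frame interval (the function name and time conventions are ours, not from the delivered software):

    import numpy as np

    def transfer_flow(flow_ref, t_ref0, t_ref1, t_target, t_out):
        # flow_ref: (rows, cols, 2) displacement in pixels measured between the
        # reference frames acquired at times t_ref0 and t_ref1.
        # Under a constant-velocity assumption, the displacement between the
        # target frame time t_target and the output time t_out is a linear
        # rescaling of the reference displacement.
        scale = (t_out - t_target) / float(t_ref1 - t_ref0)
        return flow_ref * scale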

5.1 Optical flow calculation

Optical flow is the pattern of apparent motion of objects, surfaces, and edges in a dynamic visual scene[2][3] and is quantified as a two-dimensional vector at each pixel, e.g. Figure 9. This vector specifies the (x, y) shift by which that pixel's intensity moves between frames. Optical flow estimation methods assume "brightness constancy" over short time scales (e.g. frame-to-frame). This allows image features to change position within the frame, while the feature brightness is unchanged or changes at rates much slower than frame-to-frame variations. Specifically, let the image brightness at a point (x, y) at time t be denoted E(x, y, t). The brightness constancy constraint may then be expressed as

dE/dt = 0.

Applying the chain rule and defining the velocities u = ∂x/∂t and v = ∂y/∂t yields the data term

φ = E_x u + E_y v + E_t = 0.

Instead of requiring this expression to equal zero exactly, we minimize its departure from zero. On its own this is an ill-posed problem, so an additional constraint must be applied. We compute the optical flow iteratively from the image spatial and temporal gradients, and we enforce the constraint that the flow field varies smoothly over small spatial scales. This is done by minimizing the departure from smoothness of the velocity flow, given by the squared magnitude of the gradient of the flow field:

ω = (∂u/∂x)² + (∂u/∂y)² + (∂v/∂x)² + (∂v/∂y)².

Hence, the total error to be minimized is

ε = ∬ (α² ω + φ²) dx dy,

for a suitable value of the regularization weight α. An iterative solver is used to find an approximate solution.
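A compact single-scale solver for this functional follows the classic Horn–Schunck iteration[2]. The sketch below (NumPy/SciPy; the kernel choices and iteration count are illustrative defaults, not tuned values from this effort) alternates between averaging the current flow estimate and correcting it with the brightness-constancy residual:

    import numpy as np
    from scipy.ndimage import convolve

    def horn_schunck(im1, im2, alpha=1.0, n_iter=100):
        # im1, im2: consecutive frames as arrays; alpha is the smoothness weight.
        im1 = im1.astype(np.float64)
        im2 = im2.astype(np.float64)

        # First-order estimates of the spatial and temporal brightness gradients.
        kx = np.array([[-1.0, 1.0], [-1.0, 1.0]]) * 0.25
        ky = np.array([[-1.0, -1.0], [1.0, 1.0]]) * 0.25
        kt = np.ones((2, 2)) * 0.25
        Ex = convolve(im1, kx) + convolve(im2, kx)
        Ey = convolve(im1, ky) + convolve(im2, ky)
        Et = convolve(im2, kt) - convolve(im1, kt)

        # Kernel giving the local average flow used by the smoothness term.
        avg = np.array([[1/12, 1/6, 1/12],
                        [1/6,  0.0, 1/6],
                        [1/12, 1/6, 1/12]])

        u = np.zeros_like(im1)
        v = np.zeros_like(im1)
        for _ in range(n_iter):
            u_bar = convolve(u, avg)
            v_bar = convolve(v, avg)
            # Brightness-constancy residual scaled by the data/smoothness balance.
            num = Ex * u_bar + Ey * v_bar + Et
            den = alpha**2 + Ex**2 + Ey**2
            u = u_bar - Ex * num / den
            v = v_bar - Ey * num / den
        return u, v

In practice a library implementation of dense optical flow could be substituted; the point here is only to show how the α-weighted smoothness term enters the iterative update.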

Figure 9 – Optical flow is the pattern of apparent motion of objects in a dynamic visual scene.

One complication with this approach is extreme motion: when the frame-to-frame displacement is large, the above assumptions may be violated, resulting in degraded performance. An approach that is less sensitive to extreme motion is to use a multi-scale image pyramid[4][5]. This approach involves generating imagery at varying resolution scales and initializing the flow vector estimates using the coarser-resolution images. It iterates across the spatial resolution scales, carrying forward the vector estimates to start the flow calculations at the next-finer resolution.
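A sketch of this coarse-to-fine scheme is shown below (Python/OpenCV). It accepts any single-scale flow routine, such as the Horn–Schunck sketch above, as a callable; the pyramid depth and warping conventions are assumptions for illustration only:

    import cv2
    import numpy as np

    def pyramid_flow(im1, im2, flow_fn, n_levels=3):
        # Build Gaussian pyramids, coarsest level last.
        pyr1, pyr2 = [im1], [im2]
        for _ in range(n_levels - 1):
            pyr1.append(cv2.pyrDown(pyr1[-1]))
            pyr2.append(cv2.pyrDown(pyr2[-1]))

        u = np.zeros(pyr1[-1].shape[:2], dtype=np.float64)
        v = np.zeros(pyr1[-1].shape[:2], dtype=np.float64)
        for a, b in zip(reversed(pyr1), reversed(pyr2)):
            # Upsample the current estimate to this level's size and rescale it.
            u = 2.0 * cv2.resize(u, (a.shape[1], a.shape[0]))
            v = 2.0 * cv2.resize(v, (a.shape[1], a.shape[0]))
            # Pre-warp the second frame toward the first using the current estimate...
            cols, rows = np.meshgrid(np.arange(a.shape[1]), np.arange(a.shape[0]))
            map_x = (cols + u).astype(np.float32)
            map_y = (rows + v).astype(np.float32)
            b_warp = cv2.remap(b.astype(np.float32), map_x, map_y, cv2.INTER_LINEAR)
            # ...then add the residual flow measured at this resolution.
            du, dv = flow_fn(a, b_warp)
            u, v = u + du, v + dv
        return u, v

Because only the residual motion is estimated at each finer level, large displacements that would break the brightness-constancy linearization at full resolution are absorbed at the coarse levels.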

5.2 Temporal interpolation and upsampling

Once we have the optical flow field between successive frames of the reference sensor, we apply the reference flow field to the target data. This requires temporal interpolation as shown in Figure 10. We first shift the target data pixels according to the transferred optical flow, and then interpolate back to a regular grid via Delaunay triangulation. The result is a predicted target image frame on the reference sensor's time base.
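This shift-and-regrid operation can be sketched with SciPy, whose griddata routine performs the Delaunay-based interpolation mentioned above. The fill strategy for pixels that land outside the triangulation is our own assumption:

    import numpy as np
    from scipy.interpolate import griddata

    def warp_to_reference_time(target_frame, u, v):
        # target_frame: 2-D array already registered to the reference pixel grid.
        # u, v: transferred flow (in pixels) from the frame's own time to the
        # desired reference time, same shape as target_frame.
        rows, cols = target_frame.shape
        yy, xx = np.mgrid[0:rows, 0:cols]
        # Shifted sample positions form an irregular point cloud...
        pts = np.column_stack([(xx + u).ravel(), (yy + v).ravel()])
        vals = target_frame.ravel()
        # ...which is interpolated back onto the regular grid via Delaunay triangulation.
        out = griddata(pts, vals, (xx, yy), method='linear')
        # Fill pixels outside the convex hull of the shifted samples.
        nearest = griddata(pts, vals, (xx, yy), method='nearest')
        return np.where(np.isnan(out), nearest, out)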

Figure 10 – Temporal upsampling using optical flow derived from high-frame rate reference imagery.

Optical flow algorithms were investigated to characterize changes in image intensity from one image to another. Most current techniques derive from work originally performed by Horn and Schunck[2], as well as Lucas and Kanade[4]. We performed preliminary optical flow algorithm testing on the individual temporal sequences from each of the image sources. We illustrate the process using three selected frames from the provided high-resolution panchromatic data, shown in Figure 11. This sequence was used to validate our processing chain. We computed optical flow between frames 165 and 167 and made a prediction for the sensor data at the time of frame 166. Comparison of this prediction with the true data for frame 166 provides a measure of performance for our algorithm.

Figure 11 – Three-frame sequence for optical-flow calculations.

The measured optical flow between frames 165 and 167 is shown in Figure 12. We then compared the estimate of the intermediate frame with the actual measured frame 166; the relative percent error is shown in Figure 13. The outcome of this analysis gives confidence that our basic methods and processes are reasonable. The accuracy of reconstruction of intermediate frames provides a good test of the processes used and serves to guide our decisions on processing methods and parametric sensitivity for potential follow-on activities.
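This intermediate-frame check can be expressed compactly: half of the flow measured between the two bracketing frames moves the earlier frame to the intermediate time, and a relative-error image scores the prediction. The sketch below simply wires together flow and warp routines such as those above; the epsilon guard against near-zero denominators is an assumed detail:

    import numpy as np

    def predict_midpoint(frame_a, frame_b, flow_fn, warp_fn):
        # flow_fn(a, b) returns (u, v) between the bracketing frames;
        # warp_fn(frame, u, v) shifts a frame by a flow field.
        u, v = flow_fn(frame_a, frame_b)
        # Half the measured flow moves frame_a to the midpoint time.
        return warp_fn(frame_a, 0.5 * u, 0.5 * v)

    def relative_percent_error(predicted, truth, eps=1.0):
        # Per-pixel relative error in percent, guarded against zero denominators.
        return 100.0 * np.abs(predicted - truth) / (np.abs(truth) + eps)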

Figure 12 – Optical flow detail for the well-dressed pedestrian.

Figure 13 – Comparison of the estimate based on frames 165 and 167 with the true frame 166. The optical-flow-based method provides a reasonable estimate of the intermediate frame. This type of analysis may be useful for evaluating the cross-sensor methods and other methods being investigated.

6. PROCESSING FUSED DATA

We applied the end-to-end spatial and temporal processing chain to our test imagery with the goal of enabling further advanced data exploitation not otherwise possible. We also introduced a simple pan-sharpening algorithm applied to the predicted high-frame-rate data at each frame time.

6.1 Fused RGB

Figure 14 shows the combined product from registering and upsampling the three individual red, green, and blue image streams into a fused RGB image. In this case the individual images input to the process had different spatial and temporal characteristics: the Blue imagery was 256 x 256 (spatial) and sampled at 10 Hz, the Green was 256 x 256 and sampled at 15 Hz, and the Red was 320 x 240 and sampled at 30 Hz. The resulting fused product is sampled at 30 Hz, with the motion derived from the fastest-frame-rate source.
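Once each band has been registered and upsampled to the 30 Hz reference, the fused product is simply a band-stacked cube. A trivial sketch (the array layout is assumed for illustration, not prescribed by the processing chain):

    import numpy as np

    def fuse_rgb(red_30hz, green_30hz, blue_30hz):
        # Each argument: (num_frames, rows, cols) sequence already resampled to
        # the common 30 Hz time base and reference pixel grid.  The result is a
        # (num_frames, rows, cols, 3) multi-band product suitable for standard tools.
        return np.stack([red_30hz, green_30hz, blue_30hz], axis=-1)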

Figure 14 – Independent sensor data fused into single multi-band data product.

6.2 Pan-Sharpened RGB

We also used the high-spatial-resolution panchromatic imagery with these RGB images to produce a pan-sharpened image product, as shown in Figure 15. This brings high-spatial-resolution content to the temporally upsampled product and provides a significant improvement in the interpretation of spatial content compared to the RGB product alone.
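Pan-sharpening of this kind can be approximated by a simple intensity-substitution (Brovey-style) ratio; the sketch below is a generic stand-in rather than the exact algorithm applied in this effort:

    import numpy as np

    def pan_sharpen(rgb, pan, eps=1e-6):
        # rgb: (rows, cols, 3) color frame already upsampled to the pan pixel grid.
        # pan: (rows, cols) high-resolution panchromatic frame at the same time.
        rgb = rgb.astype(np.float64)
        intensity = rgb.mean(axis=2)
        # Rescale each band by the ratio of pan intensity to the mean color
        # intensity, injecting the fine spatial detail of the pan image.
        ratio = pan / (intensity + eps)
        return rgb * ratio[..., np.newaxis]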

Figure 15 – Pan-sharpening applied to fused RGB image data product.

Figure 16 – Pan-sharpening applied to LWIR imagery.

6.3 Pan-Sharpened LWIR

We also applied the process to the high-resolution panchromatic imagery and the provided LWIR data. This case is more stressing since it uses two disparate imaging systems that operate asynchronously and observe different phenomenologies, exercising both the spatial and temporal elements of the processing stream. An example frame from applying our process to these data is shown in Figure 16. The combined panchromatic/LWIR product would allow a user to readily identify the hot elements in the scene, since the panchromatic imagery provides more spatial content than the LWIR imagery. This highlights the feasibility of advanced processing with data collected from disparate sensors, and the potential to provide interpretable fused image products from them.

7. CONCLUSIONS

In this effort we demonstrated the feasibility of using optical-flow methods as a basis for temporally upsampling dynamic and disparate image sequences onto a common time base. The primary objective of our work was to develop a proof-of-concept processing approach that enables the end user to focus on high-level experiment data analysis rather than on time-consuming data preparation and setup tasks. Controlled test scenario data were used for development and performance demonstration. Our processing chain first spatially matches and resamples the sensor data onto a common spatial pixel grid; this preliminary step leverages existing and well-established techniques. The primary task then uses optical-flow-based techniques to perform temporal upsampling while taking into account dynamic scene content. The optical flow field is computed between successive frames of a high-resolution reference sensor, transferred to the other sensors in the experiment, and used as a basis for spatial-temporal image resampling. The result is a predicted target image frame on the reference sensor's time base. Finally, multispectral exploitation algorithms were applied to the final "fused" data product in order to demonstrate usability.

8. REFERENCES

[1] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-Up Robust Features (SURF)," Comput. Vis. Image Underst., vol. 110, no. 3, pp. 346–359, Jun. 2008.
[2] B. K. Horn and B. G. Schunck, "Determining Optical Flow," in Proc. SPIE, 1981, vol. 0281, pp. 319–331.
[3] T. Brox, C. Bregler, and J. Malik, "Large displacement optical flow," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 41–48.
[4] B. D. Lucas and T. Kanade, "An Iterative Image Registration Technique with an Application to Stereo Vision," in Proc. IJCAI, 1981, vol. 81, pp. 674–679.
[5] A. Bruhn, J. Weickert, and C. Schnörr, "Lucas/Kanade Meets Horn/Schunck: Combining Local and Global Optic Flow Methods," Int. J. Comput. Vis., vol. 61, no. 3, pp. 211–231, Feb. 2005.
