A Real Time Data Extraction, Transformation and Loading Solution for Semi-structured Text Files

Nuno Viana¹, Ricardo Raminhos¹, and João Moura-Pires²

¹ UNINOVA, Quinta da Torre, 2829-516 Caparica, Portugal
{nv,rfr}@uninova.pt, http://www.uninova.pt/ca3
² CENTRIA/FCT, Quinta da Torre, 2829-516 Caparica, Portugal
[email protected], http://centria.di.fct.unl.pt/~jmp

Abstract. Space applications' users have relied for the past decades on custom-developed software tools capable of addressing short-term necessities during critical Spacecraft control periods. Advances in computing power and storage solutions have made possible the development of innovative decision support systems. These systems are capable of providing high quality integrated data to both near real-time and historical data analysis applications. This paper describes the implementation of a new approach for a distributed and loosely coupled data extraction and transformation solution capable of extracting, transforming and loading relevant real-time and historical Space Weather and Spacecraft data from semi-structured text files into an integrated space-domain decision support system. The described solution takes advantage of XML and Web Service technologies and is currently working in an operational environment at the European Space Agency as part of the Space Environment Information System for Mission Control Purposes (SEIS) project.

1 Introduction

The term "Space Weather" [1, 2] (S/W) represents the combination of conditions on the sun, solar wind, magnetosphere, ionosphere and thermosphere. Space Weather is not only a driver for earth's environmental changes but also plays a major role in the performance and reliability of orbiting Spacecraft (S/C) systems. Moreover, degradation of sensors and solar arrays or unpredicted changes in the on-board memories can often be associated with S/W event occurrences. The availability of an integrated solution containing Space Weather and specific S/C onboard measurements' data would allow online and post-event analysis to be performed, thus increasing the S/C Flight Controllers' ability to react to unexpected critical situations and, indirectly, enhancing the knowledge about the dynamics of the S/C itself. Although important, this integrated data service is currently unavailable. At best, some sparse data sub-sets exist on public Internet sites running at different locations and with distinct data formats. Therefore, collecting all the relevant information, transforming it and interpreting it correctly is a time-consuming task for a S/C Flight Controller.


To provide such capabilities, a decision support system architecture was envisaged – the Space Environment Information System for Mission Control Purposes (SEIS) [3, 4], sponsored by the European Space Agency (ESA). The main goal of SEIS is to provide accurate real-time information about the ongoing Space Weather conditions and Spacecraft onboard measurements, along with Space Weather predictions (e.g. radiation level predictions). This platform assures the provision of distinct application services based on historical and near real-time data, supported by a common database infrastructure.

This paper details the Data Processing Module (DPM) used in SEIS, with a special focus on its extraction and transformation component, the Uniform Data Extractor and Transformer (UDET). The DPM is responsible for the retrieval of all source files (semi-structured text files) from external data service providers, "raw" data extraction and the further transformations into a usable format. Extensive research work has also been accomplished in both conceptual Extraction, Transformation and Loading (ETL) modeling [5] and demonstrative prototypes [6, 7]. Given the number of already existing commercial¹ and open source ETL tools², the first approach towards solving the specific data processing problem in SEIS was to identify which ETL tools could potentially be re-used. Unfortunately, after careful assessment, it soon became obvious that the existing solutions usually required the development of custom code in order to define the specificities of the extraction, transformation and loading procedures in near real-time. Due to the high number of Provided Files (please refer to Table 1 for a list of data service providers and files) and their heterogeneity in terms of format, it was not feasible to develop custom code to address all files. In addition, gathering the entire file processing logic at implementation level would raise severe maintainability issues (any maintenance task would surely cause the modification of source code). Also, the analyzed tools did not provide cache capabilities for data that, although received in different files, refers to the same parameter (for these files, duplicate entries must be removed and not propagated forward as the analyzed solutions suggested). The only option was in fact to develop a custom but generic data processing mechanism to solve the problem of processing data from remote data service providers into the target databases, while also taking into account scalability and maintainability factors and the possible reuse of the resulting solution in other projects.

This paper addresses the design and development of the data processing solution (with a special focus on the extractor and transformer component) which fulfils the previously mentioned requisites. The paper is organized in five sections: the first section (the current one) describes the motivation behind the data processing problem in the frame of the SEIS project, as well as the paper's focus and contents. Section two highlights the SEIS architecture, focusing mainly on the Data Processing Module. The third section is dedicated to the UDET component and presents a comprehensive description of how files are effectively processed. Section four provides the reader with an insight into the technically innovative aspects of UDET and, finally, section five provides a short summary with achieved results and guidelines for future improvements to the UDET component.

¹ IBM WebSphere DataStage (http://www.ascential.com/products/datastage.html), SAS Enterprise ETL Server (http://www.sas.com/technologies/dw/etl/index.html), Data Transformation Services (http://www.microsoft.com/sql/evaluation/features/datatran.asp), Informatica PowerCenter (http://www.informatica.com/products/powercenter/default.htm), Sunopsis ELT (http://www.sunopsis.com/corporate/us/products/sunopsis/snps_etl.htm).
² Enhydra Octopus (http://www.octopus.objectweb.org/), BEE Project (http://www.bee.insightstrategy.cz/en/index.html), OpenDigger (http://www.opendigger.org/).


Table 1. List of available data service providers, number of provided files and parameters

Data Service Provider                                                              | Type          | Provided Files | Provided Parameters
Wilcox Solar Observatory                                                           | Space Weather | 1              | 1
Space Weather Technologies                                                         | Space Weather | 1              | 2
SOHO Proton Monitor data (University of Maryland)                                  | Space Weather | 2              | 6
Solar Influences Data analysis Center                                              | Space Weather | 2              | 5
Lomnicky Peak's Neutron Monitor                                                    | Space Weather | 1              | 2
National Oceanic and Atmosphere Administration / National Geophysical Data Centre  | Space Weather | 1              | 1
National Oceanic and Atmosphere Administration / Space Environment Centre          | Space Weather | 35             | 541
US Naval Research Laboratory                                                       | Space Weather | 1              | 1
World Data Centre for Geomagnetism                                                 | Space Weather | 1              | 1
European Space Operations Centre                                                   | Spacecraft    | 19             | 271
Multi Mission Module                                                               | Space Weather | 13             | 118
Total                                                                              |               | 77             | 949

2 Data Processing in the Space Environment Information System

This section initially provides a global view of the SEIS system and afterwards focuses on the Data Processing Module. The Uniform Data Extractor and Transformer component is thoroughly addressed in section 3.

2.1 SEIS System Architecture

SEIS is a multi-mission decision support system capable of providing near real-time monitoring [8] and visualization, in addition to offline historical analysis [3], of Space Weather and Spacecraft data, events and alarms to the Flight Control Teams (FCT) responsible for the Integral, Envisat and XMM satellites. Since the Integral S/C has been selected as the reference mission, all SEIS services – offline and online – will be available for it, while the Envisat and XMM teams will only benefit from a fraction of the services available for the Integral³ mission. The following list outlines SEIS's core services:

- Reliable Space Weather and Spacecraft data integration.
- Inclusion of Space Weather and Space Weather effects estimations generated by a widely accepted collection of physical Space Weather models.
- Plug-in functionalities for any external "black-box" data generator model (e.g. models based on Artificial Neural Networks - ANN).

³ Following preliminary feedback after system deployment, it is expected that other missions (XMM and Envisat), in addition to the reference one (Integral), would like to contribute with additional data and therefore have access to the complete set of SEIS services.


- Near real-time alarm-triggered events, based on rules extracted from the Flight Operations Plan (FOP) [9], which capture the users' domain knowledge.
- Near real-time visualization of ongoing Space Weather and Spacecraft conditions through the SEIS Monitoring Tool [10].
- Historical data visualization and correlation analysis (including automatic report design, generation and browsing) using state-of-the-art Online Analytical Processing (OLAP) client/server technology - the SEIS Reporting and Analysis Tool [3].

In order to provide users with the previously mentioned set of services, the system architecture depicted in Fig. 1 was envisaged.

Fig. 1. SEIS system architecture modular breakdown, including the Data Processing Module, which is formed by several components: (a) External Data Service Providers, (b) Uniform Data Access Proxy (UDAP), (c) File Cache, (d) Uniform Data Extractor and Transformer - UDET (the focus of this paper) and (e) Uniform Data Output Buffer (UDOB).

As shown in Fig. 1, the SEIS architecture is divided into several modules according to their specific roles:

- Data Processing Module: Responsible for file retrieval, parameter extraction and the further transformations applied to all identified data, ensuring it meets the online and offline availability constraints, whilst keeping reusability and maintainability in mind (further detailed in section 2.2).
- Data Integration Module: Acts as the system's supporting database infrastructure, providing high quality integrated data services to the SEIS client applications, using three multi-purpose databases (Data Warehouse (DW) [11], Operational Data Storage (ODS) and Data Marts).
- Forecasting Module: A collection of forecast and estimation model components capable of generating Space Weather [12] and Spacecraft data estimations. Interaction with any of these models is accomplished using remote Web Service invocation, which relies on Extensible Markup Language (XML) message-passing mechanisms.


- Metadata Module: SEIS is a metadata-driven system, incorporating a central Metadata Repository that provides all SEIS applications with means of accessing shared information and configuration files.
- Client Tools: The SEIS system comprises two client tools, which take advantage of the collected real-time and historical data – the SEIS Monitoring Tool and the SEIS Reporting and Analysis Tool, respectively.

2.2 Data Processing Module

As previously highlighted, one of the objectives of SEIS is to provide reliable Space Weather and Spacecraft data integration. This is not a trivial task due to the numerous data formats involved (from "raw" text to structured tagged formats such as HTML – Hyper Text Markup Language) and to the communication protocols used (e.g. Hyper Text Transfer Protocol – HTTP and File Transfer Protocol – FTP). Since SEIS has near real-time data availability requirements, the whole processing mechanism should not take longer than 5 minutes to output its results into the UDOB (the system has explicit knowledge, according to Metadata, of the data refreshing time intervals for each remote Data Service Provider). Several factors may interfere with this time restriction: the available network bandwidth, Round Trip Times (RTT), Internet Service Provider (ISP) availability, the network status on the SEIS and Data Service Provider sides, the load on the remote data services and the number of concurrent file processing requests. Since Data Service Providers are not controlled within SEIS but by external organizations according to their internal priorities, funding allocation and even scientists' "good-will", server unavailability information is not accessible in advance (detection occurs only when the service actually fails and data stops being "pumped" into the data repositories). For similar reasons, the text files containing the relevant parameters hold structured data whose arrangement may evolve: as time passes, new parameters may be added, deleted or updated in the file, thus making the format vary. Once again, notification about format changes is non-existent and such changes have to be inferred by our system and/or users. To address this issue, the DPM incorporates knowledge of the active File Format Definition (FFD) applied to a given file within a specific time window.

2.3 UDAP, UDET and UDOB

As depicted in Fig. 1, the Data Processing Module is composed of three subcomponents: UDAP, UDET and UDOB. The UDAP component is responsible for the retrieval of all identified files from the different data service providers' locations, has the ability to handle remote service availability failures and recovers (whenever possible) data lost due to access unavailability. UDAP is also in charge of dealing with the Space Weather and Spacecraft data estimation outputs generated by the estimation and forecasting blocks, namely the Mission Modeling Module (3M) block and the ANN models, through data files that capture the models' outputs. Communication with these components is achieved using Web Service interfacing layers developed between UDAP and each of the models.
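The time-window knowledge mentioned in section 2.2 can be illustrated with a minimal sketch: each Provided File is associated with one or more FFD versions, each valid within a date interval, and the DPM picks the one covering the file's timestamp. The data structures and function names below are illustrative assumptions, not the actual SEIS implementation.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Sequence

class FFDVersion(NamedTuple):
    """One version of a File Format Definition, valid within a time window."""
    ffd_path: str                  # location of the FFD (XML) in the Metadata Repository
    valid_from: datetime           # start of the validity window
    valid_to: Optional[datetime]   # None means "still active"

def active_ffd(versions: Sequence[FFDVersion], file_time: datetime) -> FFDVersion:
    """Return the FFD version whose validity window covers the file timestamp."""
    for v in versions:
        if v.valid_from <= file_time and (v.valid_to is None or file_time < v.valid_to):
            return v
    raise LookupError(f"no FFD version active at {file_time:%Y-%m-%d %H:%M}")

# Example: two hypothetical historical format versions of the same Provided File.
versions = [
    FFDVersion("ffd/noaa_sec_v1.xml", datetime(2003, 1, 1), datetime(2004, 6, 1)),
    FFDVersion("ffd/noaa_sec_v2.xml", datetime(2004, 6, 1), None),
]
print(active_ffd(versions, datetime(2005, 3, 10)).ffd_path)  # -> ffd/noaa_sec_v2.xml
```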


All retrieved data is afterwards stored into a local file cache repository (to ease cached file management, a simple compressed NTFS – Windows NT File System – folder was used), from which it is later sent for processing. By moving all data files to a local cache before performing any actual file processing, not only is a virtual file access service provided (minimizing possible problems originated by external service failures), but the required storage space is also reduced. Since all Data Processing Module components are Metadata driven, UDAP configuration and file scheduling definitions are stored in the centralized Metadata Repository. In addition, UDAP provides a Human Machine Interface (HMI), which allows users to issue commands such as thread "start"/"stop", to configure UDET server instances (further discussed in the next section) and to manage the request load on the external data service providers and UDET engines. Once data has been moved locally (into UDAP's cache), preparation tasks to extract and transform the parameters identified in the files may be performed. After being processed by UDET, all the data is finally loaded into the UDOB temporary storage area (implemented as relational tables) and thus made available to both the ODS and DW.
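The following sketch illustrates the retrieval-then-cache step described above, using gzip compression in place of the compressed NTFS folder actually used in SEIS; the provider URL and directory layout are illustrative assumptions.

```python
import gzip
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

CACHE_DIR = Path("file_cache")  # local cache root (assumption)

def fetch_to_cache(url: str, provided_file: str) -> Path:
    """Download one provided file over HTTP/FTP and store it compressed in the cache."""
    with urllib.request.urlopen(url, timeout=60) as response:
        payload = response.read()
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = CACHE_DIR / provided_file / f"{stamp}.txt.gz"
    target.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(target, "wb") as cached:
        cached.write(payload)
    return target  # the cached copy is later handed to UDET for processing

# Example (hypothetical provider URL):
# path = fetch_to_cache("http://example.org/ace_swepam_1m.txt", "ACE_SWEPAM_1m")
```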

3 The Uniform Data Extractor and Transformer Component

The main goal of UDET is to process all Provided Files received from UDAP. These files hold textual information structured in a human readable way. Each Provided File has two temporal tags associated: start and end dates that determine the temporal range to which the parameter values refer. These temporal tags may exist either explicitly in the file header or implicitly, being inferred from the parameter entries. Three types of parameters are available in the input files: numerical, categorical and plain text. Most of these parameter values have a temporal tag associated, although some are time independent, containing general information only. Provided Files can also be classified as real-time or summary (both types contain a temporal sliding window of data). While real-time files (e.g. available every 5 minutes) offer near real-time / estimation data for a very limited time window, summary files (e.g. available daily) offer a summary of all measures registered during that day (discrepancies between the contents of real-time and summary files are possible). Since summary data is more accurate than real-time data, whenever available, the former shall replace the real-time values previously received.

Fig. 2 presents the DPM processing pipeline from a high-level perspective, with special focus on the UDET component. After receiving a semi-structured text file from UDAP, UDET applies a set of ETL operations to the file, according to definitions stored in an external FFD file, producing a set of data chunks as a result. Each data chunk is characterized as a triplet, containing a global identifier for a parameter, a temporal tag and the parameter value. The size of a data chunk varies and is closely related to the nature of the data available in the file (e.g. Space Weather and Spacecraft data are stored in different data chunks). Depending on the UDET settings, these data chunks can be delivered to different containers (e.g. in SEIS, data chunks are delivered to UDOB – a set of relational tables).


Fig. 2. UDET's processing model
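As described above, UDET's output is a set of data chunks, each holding (parameter identifier, temporal tag, value) triplets. The sketch below is an illustration rather than the SEIS data model; it also shows the rule that summary values, being more accurate, replace previously received real-time values for the same parameter and timestamp.

```python
from datetime import datetime
from typing import NamedTuple

class Entry(NamedTuple):
    parameter_id: str    # global parameter identifier
    time_tag: datetime   # temporal tag of the measurement
    value: object        # numerical, categorical or plain-text value

# Integrated view keyed by (parameter, time); summary data overrides real-time data.
integrated: dict[tuple[str, datetime], object] = {}

def load(entries: list[Entry], is_summary: bool) -> None:
    for e in entries:
        key = (e.parameter_id, e.time_tag)
        if is_summary or key not in integrated:
            integrated[key] = e.value

load([Entry("PROTON_FLUX_10MeV", datetime(2005, 3, 10, 12, 0), 1.7)], is_summary=False)
load([Entry("PROTON_FLUX_10MeV", datetime(2005, 3, 10, 12, 0), 1.9)], is_summary=True)
print(integrated)  # the daily summary value (1.9) has replaced the real-time one
```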

The following sub-sections highlight UDET's main requirements, the model employed in SEIS and how the ETL process is applied to the input files received from UDAP. Finally, UDET's architecture is described in detail, unfolding its main components and the relations between them.

3.1 Main Requirements

Since real-time files mainly hold repeated data (when compared with the previously retrieved real-time file), only the newly added entries are stored after each file processing. An output cache mechanism is therefore required, capable of considerably improving the system's load factor on UDOB by avoiding duplicate entries from near real-time files.

In order to accomplish the SEIS near real-time requirement, data should not take more than 5 minutes to be processed (from the moment it is made available at the Data Provider until it reaches UDOB). In this sense, the performance of the DPM is fundamental to meeting this condition, especially for the UDET component, which is responsible for most of the computational effort within the DPM. Due to the high number of simultaneous file transfers it is not feasible to process the files sequentially. Thus, a parallel architecture is required in order to process several input files simultaneously.

As previously mentioned, Data Service Providers do not provide a notification mechanism to report changes in the format of Provided Files. Thus, UDET needs to include data quality logic, describing the parameter data types and possibly defining ranges of valid values. Furthermore, maintenance tasks for the correction of format changes must have a minimum impact on the system architecture and code, in order not to compromise maintainability.

Finally, data delivery should be configurable, so that data resulting from the extraction and transformation process can be exported into different formats (e.g. XML, Comma-Separated Values - CSV, relational tables) without being tied to implementation details that may restrict the solution's reusability (e.g. if a solution is based on a scheme of relational tables it should not rely directly on a specific communication protocol). A minimal sketch of such a configurable delivery interface is given below.
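The following sketch illustrates the configurable, implementation-independent delivery requirement as a small pluggable interface; the class and method names are assumptions for illustration only, not the SEIS API.

```python
import csv
import sys
from abc import ABC, abstractmethod
from typing import Iterable

class ChunkDeliverer(ABC):
    """Abstract delivery target for extracted data chunks (rows of parameter data)."""
    @abstractmethod
    def deliver(self, rows: Iterable[dict]) -> None: ...

class CsvDeliverer(ChunkDeliverer):
    """Writes chunks as Comma-Separated Values, e.g. for offline inspection."""
    def __init__(self, stream=sys.stdout):
        self.stream = stream

    def deliver(self, rows: Iterable[dict]) -> None:
        rows = list(rows)
        if not rows:
            return
        writer = csv.DictWriter(self.stream, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

# A relational-table deliverer (e.g. for UDOB) would implement the same interface,
# hiding the database driver behind it so the extraction logic never depends on it.
CsvDeliverer().deliver([{"parameter_id": "KP_INDEX", "time_tag": "2005-03-10T12:00", "value": 3}])
```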


3.2 Designed Model and Developed Solution

The developed solution relies on declarative definitions that identify which operations are necessary during the ETL process, instead of implementing this logic directly at code level. These declarative definitions are stored in FFD files and their contents are directly dependent on the Provided File format, data and nature. It is therefore necessary to create a dedicated FFD for each Provided File, holding the specific ETL logic necessary to process any file belonging to that Provided File class (file format detection is currently not implemented, but is considered in the Future Work section). FFDs are stored in XML format, since this is a well-known World Wide Web Consortium (W3C) standard for which a wide range of computationally efficient tools exists. In addition, the format is human readable, enabling easy validation without recurring to a software translation tool to understand the content logic.

A File Format Definition file holds six distinct types of information (an illustrative sketch is given after this list):

(1) General Information – Global data required to process an input file, such as: the end of line character, the start and end dates for which the file format is valid (for versioning purposes), decimal and thousands separator characters and any user comment.

(2) Section Identification – Gathers the properties responsible for composing each of the sections present in an input file (e.g. headers, user comments, data). A section can be defined according to specific properties such as absolute line delimiters (e.g. line number) or sequential and contiguous lines sharing a common property (e.g. lines that "start", "end" or "contain" a given string). In addition, it is possible to define relative sections, through two other sections which enclose a third one (the "enclosed section"), using the "Start Section After Previous Section End" and "End Section Before Next Section Start" properties.

(3) Field Identification – Contains the definitions of all existing extractable fields, where each field parameter is associated to a given file section. Fields can be of two types: "single fields" and "table fields". The specification of single fields can be performed by defining upper and lower char field enclosing delimiters or, alternatively, through a regular expression. Additionally, several pieces of meta-information related to the field are included, such as the field name, its format and its global identifier. The specification of table fields is accomplished by capturing table columns using several definition types, according to the file's intrinsic format (typically, the most generic definition which best extracts the data from the columns should be chosen). The available options include the capability of extracting columns: based on column separators (with the possibility of treating consecutive separator chars as a new column or not); based on fixed column positions (definition of column breaks); or based on a regular expression definition. Similarly to single fields, meta-information about the table columns is also available, such as global identifiers, column data formats, column names and missing value representations. The use of regular expressions should be limited as much as possible to advanced users acquainted with their definition (although their direct use usually results in considerable speed gains).

(4) Transformation Operations – Hold a set of sequences containing definitions of transformation operations, to be applied to single and table fields, transforming the original raw data into a suitable format. A large collection of transformation operations valid for both single and table fields (e.g. date convert, column create from field, column join, column append) is available.


(5) Data Quality – Contains the information required for the validation of the data produced as a result of the extraction and transformation process. Validation is accomplished through the association of data types, data thresholds and validation rules (e.g. 0 …).
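To make the declarative approach concrete, the fragment below sketches what a much simplified FFD might look like and how it could be read. The element and attribute names are illustrative assumptions, not the actual SEIS schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified FFD covering the information types listed above.
FFD_EXAMPLE = """
<fileFormatDefinition validFrom="2004-06-01" validTo="">
  <!-- (1) general information, including the format validity window above -->
  <general endOfLine="LF" decimalSeparator="." thousandsSeparator=","/>
  <!-- (2) section identification: contiguous lines after a marker line -->
  <section name="data" startAfterLineContaining="#DATA">
    <!-- (3) field identification: table columns mapped to global parameters -->
    <tableField name="timeTag" column="1" parameterId="TIME_TAG"/>
    <tableField name="protonFlux" column="3" parameterId="PROTON_FLUX_10MeV"
                missingValue="-999.9"/>
  </section>
  <!-- (4) transformation operations applied to the extracted fields -->
  <transformation field="timeTag" operation="dateConvert" args="yyyyMMdd HHmm"/>
  <!-- (5) data quality: type and threshold checks -->
  <quality field="protonFlux" type="float" min="0.0"/>
</fileFormatDefinition>
"""

root = ET.fromstring(FFD_EXAMPLE)
print(root.attrib["validFrom"])                      # versioning window (general info)
for field in root.iter("tableField"):
    print(field.get("parameterId"), field.get("column"))
```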
Fig. 3. UDET detailed architecture view

To process a file, the UDET engine firstly uses the transformation library to apply the extraction and transformations defined in the FFD file; afterwards, the outputs are sent to the delivery library. This library implements a global output cache mechanism, which prevents repeated records from being loaded into UDOB every time a file is processed. In addition, all the UDET engine's actions are logged onto a log file with the same format as all other components (by using a common console viewer it is possible to correlate and trace all UDAP and UDET activities).
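A minimal sketch of such a global output cache follows (the names are illustrative; in SEIS this logic lives inside the delivery library): records already sent to UDOB while processing a previous file are filtered out before loading.

```python
from datetime import datetime

class OutputCache:
    """Remembers (parameter, time tag) pairs already loaded into UDOB, so that the
    largely overlapping contents of successive real-time files are loaded only once."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, datetime]] = set()

    def filter_new(self, records: list[tuple[str, datetime, object]]):
        """Return only the records not delivered before, and remember them."""
        fresh = []
        for parameter_id, time_tag, value in records:
            key = (parameter_id, time_tag)
            if key not in self._seen:
                self._seen.add(key)
                fresh.append((parameter_id, time_tag, value))
        return fresh

cache = OutputCache()
t = datetime(2005, 3, 10, 12, 0)
print(len(cache.filter_new([("KP_INDEX", t, 3)])))  # 1 -> loaded into UDOB
print(len(cache.filter_new([("KP_INDEX", t, 3)])))  # 0 -> duplicate, skipped
```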


Besides the UDET engine, the architecture also comprises an FFD Editor, which provides end-users with the ability to manage FFD files via a very intuitive graphical interface. This tool also takes advantage of the same transformation and delivery libraries used by the UDET engine to allow previewing / validation of edited FFD files. This architecture allows near real-time processing of the files retrieved by UDAP. To address offline file processing, a separate UDET Engine / Delivery Library and a respective UDOB should be installed on a supplementary machine (UDAP currently supports two separate instances). Last but not least, it is also worth mentioning that all components of the UDET architecture interact directly with the Metadata Repository (for storage/retrieval of FFD files and other configuration files).

4 Implementing UDET

The definition of a file to be retrieved (Provided File) is associated with a given Data Service Provider. The Provided File concept definition mainly contains information about the details of the file to be retrieved. After the file is retrieved and cached on disk, its contents are sent to UDET. UDET reads the associated File Format Definition for the file (from the Metadata Repository) and selects the corresponding format version, which correctly extracts and transforms all fields. Finally, the results are output into the UDOB component (the data processing pipeline's target), which is composed of a set of relational tables.

The UDET engine is a stand-alone service currently implemented as a Web Service, communicating with the other system blocks in a transparent way. Besides the interfaces which use Web Services, UDET also communicates with the UDOB component (implemented as a relational database) via the standard Open DataBase Connectivity (ODBC) protocol. During system tests, the UDET engine proved capable of meeting the near real-time processing requirement (the current assessment points to an average of 11.83 seconds per processed file, for a daily total of 4000 files downloaded from 9 different Data Service Providers, which amounted to 113 MB of text data). The estimated processing throughput of the UDET engine reached an average of 2.3 Kbytes per second. For the sake of simplicity, the outputs from the 3M block have not been included in the statistical analysis. Moreover, the generated text file formats are similar to the existing ones (in size and number of parameters) provided by the external data service providers already included in the testing procedure.
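The reported figures are internally consistent; the back-of-the-envelope check below uses only the numbers quoted above.

```python
# Sanity check of the processing figures reported in the text.
daily_volume_kb = 113 * 1024      # 113 MB of text data per day
files_per_day = 4000              # files downloaded from 9 providers
seconds_per_file = 11.83          # average UDET processing time per file

avg_file_size_kb = daily_volume_kb / files_per_day          # ~28.9 KB per file
throughput_kb_per_s = avg_file_size_kb / seconds_per_file   # ~2.4 KB/s

print(f"average file size: {avg_file_size_kb:.1f} KB")
print(f"estimated throughput: {throughput_kb_per_s:.1f} KB/s")
```

This yields roughly 29 KB per file and about 2.4 KB/s, in line with the 2.3 Kbytes per second quoted above.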

5 Conclusions and Future Work

The engineered data processing architecture fulfils the two major requirements derived from SEIS: a near real-time file processing flow and offline archive processing capabilities. Tests performed after the deployment of the SEIS infrastructure (at the European Space Agency) supported the adequacy of the proposed architecture given the SEIS expected data volumes.


Nevertheless, due to the project's time and budget constraints, some future steps are advisable in order to take full advantage of the DPM architecture⁴. The FFD Editor shall be the main priority in future developments, since the current version of this tool still has limited functionality. Improvements are necessary to achieve the requirement that FFDs can be defined by non-programming experts, hiding from the end user the method used for defining FFDs (i.e. that XML is used to store the declarative instructions). Two valuable features have already been identified for the FFD Editor:

- Automatic generation of regular expressions from the graphical specification of examples in a text file.
- FFD validation procedures for a set of text files.

The logging and notification mechanisms also need to be improved to some extent, in order to detect the incorrect application of FFDs to input files (due to changes in the input file format) and to launch recovery procedures when appropriate. Some enhancements should also be performed at the FFD structure level, either to provide higher flexibility (e.g. specification of an undefined number of sections) or to extend the language expressivity through the inclusion of further specification techniques and transformation operations. Although all these enhancements would be useful to the end user, they should be considered as extensions to the existing UDET functionalities. The inclusion of these improvements is an added value which would not require any change to the architecture presented in this work.

References

1. Schmieder, B., et al. Climate and Weather of the Sun Earth System: CAWSES, SCOSTEP'S Program for 2003-2008. In SOLSPA: The Second Solar Cycle and Space Weather Euroconference. 2002.
2. Daily, E. Space Weather: A Brief Review. In Second Solar Cycle and Space Weather Euroconference. 2002.
3. Pantoquilho, M., et al. SEIS: A Decision Support System for Optimizing Spacecraft Operations Strategies. In IEEE Aerospace Conference. 2005. Montana, USA.
4. Donati, A., et al. Space Weather and Mission Control: A Roadmap to an Operational Multi-Mission Decision Support System. In SpaceOps 2004 - 8th International Conference on Space Operations. 2004. Montreal, Canada.
5. Vassiliadis, P., et al. A generic and customizable framework for the design of ETL scenarios. Information Systems, 2005.
6. Adelberg, B. NoDoSE—A tool for semi-automatically extracting structured and semistructured data from text documents. In International Conference on Management of Data (ACM SIGMOD 98). 1998. Seattle, Washington, United States.
7. Berkeley, Potter's Wheel A-B-C: An Interactive Tool for Data Analysis, Cleansing, and Transformation (http://control.cs.berkeley.edu/abc/). 2000, CONTROL - Continuous Output and Navigation Technology with Refinement On-Line: Berkeley.

⁴ All the enhancements proposed as future work are related to the UDET only, not focusing on any other DPM component.


8. Pantoquilho, M., et al. Online and Offline Monitoring and Diagnosis of Spacecraft and Space Weather Status. In EUROFUSE Workshop on Data and Knowledge Engineering. 2004. Warszawa, Poland.
9. Schmidt, M. and F.D. Marco, INTEGRAL Flight Operation Plan. 2003, VEGA: Darmstadt.
10. Moura-Pires, J., M. Pantoquilho, and N. Viana. Space Environment Information System for Mission Control Purposes: Real-Time Monitoring and Inference of Space Craft Status. In 2004 IEEE Multiconference on CCA/ISIC/CACSD. 2004. Taipei, Taiwan.
11. Kimball, R. and M. Ross, The Data Warehouse Toolkit: The Complete Guide to Dimensional Modelling. 2nd Edition. 2002: Wiley. 436.
12. Belgian Institute for Space Aeronomy, Space Applications Services, and P.S. Institute, SPENVIS - Space Environment Information System. 1998.
