Draft V 0.5


Data Warehouse Engineering Process (DWEP) with U.M.L. 2.0

Edwar Javier Herrera Osorio, [email protected], Universidad Nacional de Colombia

Abstract— This paper presents an update of DWEP to UML version 2.0. DWEP uses use case diagrams, class diagrams, package diagrams and deployment diagrams. We keep these diagrams with their UML 2.0 updates, and we also propose the use of state machine diagrams, activity diagrams, composite structure diagrams, interaction diagrams and interaction overview diagrams.

Index Terms— Data warehouse, UML, Unified process, data models

I.

INTRODUCTION

The data warehouse (DW) is one of the components of business intelligence. Bill Inmon defines it as follows: “... A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decisions...” [1], and Ralph Kimball: “… the Data Warehouse is a collection of data in the form of a database that stores and organizes information that is extracted directly from operational systems (sales, production, finance, marketing, etc.) and external data…” [2]. Building a DW is a challenging and complex task because a DW concerns many organizational units and often involves many people. In 2004, Lujan proposed [3,4] the Data Warehouse Engineering Process (DWEP), a methodology for building data warehouses based on the Unified Modeling Language (UML) 1.4 [5] and the Unified Process (UP) [6]. DWEP allows the user to address all DW design stages, from the operational data sources to the final implementation, including the definition of the ETL (Extraction, Transformation, and Loading) processes and the end users' requirements. The rest of the paper is structured as follows. In Section 2, we briefly present some of the most important related work and point out its main shortcomings. In Section 3, we summarize DWEP: we first present the diagrams proposed by Lujan [3,4], then the updates of these diagrams to UML 2.0 (the results achieved so far), and show the use of these artifacts in the workflows that make up our process. We conclude with the main contributions and the future work in Section 4.

II.

RELATED WORK

In recent years, several methodologies for the development of data warehouses have been proposed, which define the following levels of abstraction [7]: conceptual, logical and physical.

Conceptual data model: represents the entities, the relationships among them and their interactions. This model is the closest to the real-world problem to be solved. The most notable conceptual models for data warehouses are the Multidimensional/ER model (Sapia) [8], the starER model (Tryfona) [9], the GOLD model (Trujillo) [5, 10], the Husemann model [11] and the YAM2 model [12].

Logical data model: the objective of the logical data model is to describe the data in as much detail as possible, without considering how they will be physically implemented in the database. This model includes the entities, their relationships and interactions, the data types of all attributes of each entity, the definition of primary and foreign keys, and the definition of the extraction, transformation and loading (ETL) processes, among other elements.

Physical data model: the physical data model includes the specification of all tables and columns, following the business rules that determine the design of the data warehouse. At this level, the code is written to create tables, views, integrity rules and multidimensional queries.

On the other hand, the existing methodologies for the development of data warehouses [3, 5, 13, 14, 15, 16] share several shortcomings: most do not include a visual modeling language, do not propose a series of steps or phases, or are tied to a specific implementation (for example, the star schema of relational databases). In 2005, Lujan proposed a methodology based on the Unified Process, the Data Warehouse Engineering Process (DWEP), which uses UML version 1.4. DWEP proposes a standard collection of artifacts. In conclusion, DWEP needs to be upgraded to UML 2.0, which provides more artifacts for modeling the data warehouse.
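To make the physical level concrete, the following is a minimal sketch (not part of DWEP) of the kind of code written at that level: table definitions with primary and foreign keys, an integrity rule, a view, and a multidimensional roll-up query. It uses Python's built-in `sqlite3` module; the star schema (`fact_sales`, `dim_date`, `dim_product`) and its sample rows are hypothetical.

```python
import sqlite3

# Hypothetical star schema: one fact table and two dimension tables.
DDL = """
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,          -- e.g. 20240115
    year     INTEGER NOT NULL,
    month    INTEGER NOT NULL CHECK (month BETWEEN 1 AND 12)
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT NOT NULL
);
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity    INTEGER NOT NULL,
    amount      REAL    NOT NULL,
    PRIMARY KEY (date_key, product_key)
);
-- View that joins the star so queries can use dimension attributes directly.
CREATE VIEW v_sales AS
SELECT d.year, d.month, p.category, p.name, f.quantity, f.amount
FROM fact_sales AS f
JOIN dim_date    AS d ON d.date_key = f.date_key
JOIN dim_product AS p ON p.product_key = f.product_key;
"""


def build_star_schema(conn: sqlite3.Connection) -> None:
    """Create the schema; SQLite enforces foreign keys only when enabled per connection."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)


def load_sample_data(conn: sqlite3.Connection) -> None:
    """Insert a few illustrative rows (hypothetical data)."""
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20240115, 2024, 1), (20240210, 2024, 2)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Laptop", "Computers"), (2, "Mouse", "Accessories")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(20240115, 1, 2, 2000.0), (20240210, 1, 1, 1000.0),
                      (20240210, 2, 5, 50.0)])


def sales_by_category_and_year(conn: sqlite3.Connection) -> list:
    """Roll the amount measure up to the (category, year) level."""
    return conn.execute(
        "SELECT category, year, SUM(amount) FROM v_sales "
        "GROUP BY category, year ORDER BY category, year"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn)
    load_sample_data(conn)
    print(sales_by_category_and_year(conn))
```

With the sample rows, the query returns `[('Accessories', 2024, 50.0), ('Computers', 2024, 3000.0)]`. SQLite is used only because it ships with Python; a production warehouse would normally run on a server DBMS, where the same DDL concepts apply.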


Figure 1 The Unified Process [6] and Data Warehouse Engineering Process [5]

III.

DATA WAREHOUSE ENGINEERING PROCESS

Lujan, in his doctoral thesis [5], presents the Data Warehouse Engineering Process (DWEP), which is based on the Unified Process. The UP is a software development methodology proposed by Jacobson, Booch and Rumbaugh [6]; its notation, UML, is standardized by the OMG [17]. Its main features are that it is iterative, use-case driven and organized into development phases, and that it uses UML as its graphical modeling language [18, 19]. Both the UP and DWEP are composed of four phases [5, 20]: inception, elaboration, construction and transition (see Fig. 1).

Phases of the UP and DWEP

Inception phase: in this phase the project is analyzed to justify its implementation. To achieve this, a general description of the project is written, the phases are planned as iterations, the critical risks are identified, and the basic functionality and a candidate architecture description are established.

Elaboration phase: once the inception phase is complete, the goal is to build a robust architecture for the software. This phase seeks to establish the rationale for implementing the use cases and the component artifacts of the final system, and to mitigate technological risk by exploring the programming language, particularly with respect to the user interface. The first iteration concludes with a functional prototype for testing the software and the definition of the model for implementing the user interface.

Construction phase: the construction phase starts from the baseline architecture specified in the elaboration phase, and its purpose is to develop a product ready for initial operation in the end-user environment.

Transition phase: once the project enters the transition phase, the system has reached initial operational capability. This phase seeks to introduce the product into its operating environment.

Workflows of DWEP

In general terms, in the UP a workflow is a set of activities in a given area that results in the construction of artifacts (a text, a diagram, a web page, code in a programming language, etc.).

Requirements: during this workflow, end users specify the most interesting measures and aggregations, the dimensions of analysis, the queries used to generate periodic reports, and the update frequency of the data. For this workflow, the UP uses use cases (see Figure 2), which help to understand the system, its requirements and the functions of the solution, and also describe how users interact with the system.

Analysis: the purpose of this workflow is to refine and structure the requirements obtained in the Requirements workflow. This step also documents the operational source systems that feed the data warehouse. For this workflow, the UP proposes the use of class diagrams (see Figure 3).
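The items gathered in the Requirements workflow (measures, aggregations, dimensions of analysis, periodic reports and refresh frequency) can be recorded in a structured form. The class below is a hypothetical illustration, not a DWEP artifact; its name, its fields, the set of accepted aggregation functions and the generated SQL are all assumptions.

```python
from dataclasses import dataclass

# Aggregation functions accepted in this sketch (an assumption).
ALLOWED_AGGREGATIONS = {"SUM", "AVG", "MIN", "MAX", "COUNT"}


@dataclass
class ReportRequirement:
    """A periodic report requested by end users (hypothetical structure)."""
    name: str                 # report name as the users know it
    fact: str                 # fact table or view that holds the measures
    measures: list            # measures of interest, e.g. ["amount"]
    aggregation: str          # how the measures are aggregated
    dimensions: list          # dimension levels of analysis, e.g. ["year", "month"]
    refresh: str = "daily"    # required update frequency of the data

    def __post_init__(self) -> None:
        self.aggregation = self.aggregation.upper()
        if self.aggregation not in ALLOWED_AGGREGATIONS:
            raise ValueError(f"unsupported aggregation: {self.aggregation}")
        if not self.measures or not self.dimensions:
            raise ValueError("a report needs at least one measure and one dimension")

    def to_sql(self) -> str:
        """Render the requirement as a GROUP BY query (no SQL escaping is done)."""
        dims = ", ".join(self.dimensions)
        aggs = ", ".join(f"{self.aggregation}({m}) AS {m}" for m in self.measures)
        return f"SELECT {dims}, {aggs} FROM {self.fact} GROUP BY {dims}"


if __name__ == "__main__":
    monthly = ReportRequirement("Monthly sales", "v_sales", ["amount"], "sum",
                                ["year", "month"], refresh="monthly")
    print(monthly.to_sql())
```

Running it prints `SELECT year, month, SUM(amount) AS amount FROM v_sales GROUP BY year, month`. Keeping requirements in a structured form like this makes it straightforward to trace each periodic report back to the measures and dimensions it needs in later workflows.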


Figure 5 Conceptual diagram level 2 [5]

Figure 2 Use cases and extended form [5]

Figure 6 UP artifacts of the design stage

Implementation: during this workflow the data warehouse is built: its physical structure is created, it starts to receive data from the operational systems, and it is tuned for optimal performance, among other tasks. For this workflow, the Unified Process proposes component diagrams (see Figure 7).
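The loading part of the Implementation workflow can be sketched as a small extract-transform-load routine. This is a minimal sketch assuming a hypothetical operational table `orders(sale_date, product_id, qty, unit_price)` and a fact table `fact_sales` whose grain is one row per day and product; the cleansing rule (drop rows with non-positive quantity) is also an assumption, and dimension tables and surrogate-key lookups are omitted for brevity.

```python
import sqlite3
from datetime import date

# Hypothetical fact table at a (day, product) grain.
FACT_DDL = """CREATE TABLE IF NOT EXISTS fact_sales (
    date_key INTEGER, product_key INTEGER, quantity INTEGER, amount REAL,
    PRIMARY KEY (date_key, product_key))"""


def extract(source: sqlite3.Connection) -> list:
    """Read raw order rows from the operational system."""
    return source.execute(
        "SELECT sale_date, product_id, qty, unit_price FROM orders").fetchall()


def transform(rows: list) -> list:
    """Cleanse rows, derive the date key and aggregate to the fact grain."""
    grain = {}
    for sale_date, product_id, qty, unit_price in rows:
        if qty is None or qty <= 0:
            continue  # cleansing rule assumed for this sketch
        d = date.fromisoformat(sale_date)
        key = (d.year * 10000 + d.month * 100 + d.day, product_id)
        q, a = grain.get(key, (0, 0.0))
        grain[key] = (q + qty, a + qty * unit_price)
    return [(dk, pk, q, a) for (dk, pk), (q, a) in sorted(grain.items())]


def load(dw: sqlite3.Connection, facts: list) -> int:
    """Insert the transformed facts into the warehouse and commit."""
    dw.execute(FACT_DDL)
    dw.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", facts)
    dw.commit()
    return len(facts)


def run_etl(source: sqlite3.Connection, dw: sqlite3.Connection) -> int:
    return load(dw, transform(extract(source)))


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (sale_date TEXT, product_id INTEGER, "
                   "qty INTEGER, unit_price REAL)")
    source.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                       [("2024-01-15", 1, 2, 1000.0), ("2024-01-15", 1, 1, 1000.0),
                        ("2024-02-10", 2, 0, 10.0), ("2024-02-10", 2, 5, 10.0)])
    dw = sqlite3.connect(":memory:")
    print(run_etl(source, dw), "fact rows loaded")
```

Aggregating in `transform` keeps the load consistent with the declared grain: two orders for the same product on the same day become a single fact row, so the primary key on `fact_sales` is not violated.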

Figure 3 Logical schema diagram [5]

Design: at the end of this workflow, the structure of the data warehouse is defined. The main result of this workflow is the conceptual model of the data warehouse (see Figures 3–5). The UP proposes the use of classes structured into packages, the design of subsystems with defined interfaces (components), and the specification of how classes collaborate (see Figure 5).
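In DWEP the conceptual model produced in the Design workflow is expressed with UML classes. As a rough, non-authoritative analogy, the sketch below represents a fact class with its measures and dimension classes with ordered hierarchy levels, and implements a roll-up along one hierarchy. The names (`Dimension`, `Fact`, `roll_up`) and the sample cells are hypothetical, and each cell already carries the value of every hierarchy level, which is a simplification.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """A dimension class with hierarchy levels ordered from finest to coarsest."""
    name: str
    levels: tuple


@dataclass(frozen=True)
class Fact:
    """A fact class: the measures of interest and the dimensions that describe them."""
    name: str
    measures: tuple
    dimensions: tuple

    def roll_up(self, cells, dimension, level):
        """Sum every measure over the cells, grouped by `level` of `dimension`."""
        if dimension not in self.dimensions:
            raise ValueError(f"{self.name} is not analysed by {dimension.name}")
        if level not in dimension.levels:
            raise ValueError(f"{level!r} is not a level of {dimension.name}")
        totals = defaultdict(lambda: dict.fromkeys(self.measures, 0))
        for cell in cells:
            bucket = totals[cell[level]]
            for measure in self.measures:
                bucket[measure] += cell[measure]
        return dict(totals)


# Hypothetical example model and data.
TIME = Dimension("Time", ("day", "month", "year"))
PRODUCT = Dimension("Product", ("product", "category"))
SALES = Fact("Sales", ("quantity", "amount"), (TIME, PRODUCT))
CELLS = [
    {"day": "2024-01-15", "month": "2024-01", "year": 2024,
     "product": "Laptop", "category": "Computers", "quantity": 2, "amount": 2000.0},
    {"day": "2024-01-20", "month": "2024-01", "year": 2024,
     "product": "Mouse", "category": "Accessories", "quantity": 5, "amount": 50.0},
    {"day": "2024-02-10", "month": "2024-02", "year": 2024,
     "product": "Laptop", "category": "Computers", "quantity": 1, "amount": 1000.0},
]

if __name__ == "__main__":
    print(SALES.roll_up(CELLS, TIME, "month"))
    print(SALES.roll_up(CELLS, PRODUCT, "category"))
```

Rolling up `SALES` to the `month` level of `TIME` yields one total per month; asking for a level that is not in the hierarchy raises an error, mirroring the constraint that aggregation paths are defined by the dimension's hierarchy.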

Figure 4 Conceptual diagram level 1 [5]

Figure 7 Physical diagram of the data warehouse [5]

Tests: the aim of this workflow is to verify that the application works correctly. More specifically, the objectives of testing are to plan the tests needed, to design and implement the tests by creating test cases, and to run the tests and analyze the results of each one. The maintenance and post-development review workflows are not part of the Unified Process; they belong only to the data warehouse engineering process.
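A common test case for a data warehouse is a reconciliation check: control totals computed on the operational source must match those computed on the warehouse after loading. The function below is a minimal sketch of such a test, assuming the hypothetical tables `orders(sale_date, product_id, qty, unit_price)` and `fact_sales(date_key, product_key, quantity, amount)`, and assuming that rows with non-positive quantity are rejected during loading.

```python
import sqlite3


def reconcile(source: sqlite3.Connection, dw: sqlite3.Connection,
              tolerance: float = 1e-6) -> list:
    """Compare control totals of the source and the warehouse.

    Returns a list of discrepancies; an empty list means the test passed.
    """
    src_qty, src_amount = source.execute(
        "SELECT COALESCE(SUM(qty), 0), COALESCE(SUM(qty * unit_price), 0) "
        "FROM orders WHERE qty > 0"          # same rejection rule as the load
    ).fetchone()
    dw_qty, dw_amount = dw.execute(
        "SELECT COALESCE(SUM(quantity), 0), COALESCE(SUM(amount), 0) FROM fact_sales"
    ).fetchone()
    problems = []
    if src_qty != dw_qty:
        problems.append(f"quantity: source={src_qty}, warehouse={dw_qty}")
    if abs(src_amount - dw_amount) > tolerance:   # amounts are floating point
        problems.append(f"amount: source={src_amount}, warehouse={dw_amount}")
    return problems


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (sale_date TEXT, product_id INTEGER, "
                   "qty INTEGER, unit_price REAL)")
    source.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                       [("2024-01-15", 1, 3, 1000.0), ("2024-02-10", 2, 5, 10.0),
                        ("2024-02-10", 2, 0, 10.0)])
    dw = sqlite3.connect(":memory:")
    dw.execute("CREATE TABLE fact_sales (date_key INTEGER, product_key INTEGER, "
               "quantity INTEGER, amount REAL)")
    dw.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                   [(20240115, 1, 3, 3000.0), (20240210, 2, 5, 50.0)])
    print(reconcile(source, dw) or "totals match")
```

Returning a list of discrepancies, rather than a boolean, makes the analysis step of the Tests workflow easier, because each failed check explains which total differs and by how much.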


Maintenance: unlike most systems, the data warehouse is never finished, because it is fed constantly. The purpose of this workflow is to define the loading and refresh processes needed to keep the data warehouse up to date. This workflow starts when the data warehouse has been built and delivered to the end users, and it has no end date. During this workflow, end users may state new needs, such as new data downloads, which trigger the beginning of a new iteration starting with the Requirements workflow.

Post-development review: this is not a workflow of development activities but a review process aimed at improving future projects. Keeping track of the time and effort invested in each stage helps to estimate the time and resources needed to meet the requirements of future developments.

IV.

UPDATE DWEP TO UML V 2.0

We have seen UML evolve over the years, and it is now in its 2.x series of releases. UML 2 defines thirteen basic diagram types, divided into two general sets:

Structural Modeling Diagrams

Structure diagrams define the static architecture of a model. They are used to model the 'things' that make up a model: the classes, objects, interfaces and physical components. In addition, they are used to model the relationships and dependencies between elements.

• Package diagrams are used to divide the model into logical containers, or 'packages', and describe the interactions between them at a high level.
• Class or Structural diagrams define the basic building blocks of a model: the types, classes and general materials used to construct a full model.
• Object diagrams show how instances of structural elements are related and used at run-time.
• Composite Structure diagrams provide a means of layering an element's structure and focusing on inner detail, construction and relationships.
• Component diagrams are used to model higher-level or more complex structures, usually built up from one or more classes, and providing a well-defined interface.
• Deployment diagrams show the physical disposition of significant artifacts within a real-world setting.

Behavioral Modeling Diagrams

Behavior diagrams capture the varieties of interaction and instantaneous states within a model as it 'executes' over time, tracking how the system will act in a real-world environment and observing the effects of an operation or event, including its results.

• Use Case diagrams are used to model user/system interactions. They define behavior, requirements and constraints in the form of scripts or scenarios.
• Activity diagrams have a wide number of uses, from defining basic program flow to capturing the decision points and actions within any generalized process.
• State Machine diagrams are essential to understanding the instant-to-instant condition, or "run state", of a model when it executes.
• Communication diagrams show the network, and sequence, of messages or communications between objects at run-time, during a collaboration instance.
• Sequence diagrams are closely related to communication diagrams and show the sequence of messages passed between objects using a vertical timeline.
• Timing diagrams fuse sequence and state diagrams to provide a view of an object's state over time, and of the messages that modify that state.
• Interaction Overview diagrams fuse activity and sequence diagrams to allow interaction fragments to be easily combined with decision points and flows.

The original DWEP uses diagrams based on UML 1.4: use case, class, component and deployment diagrams. This update keeps all of these DWEP diagrams, upgraded to UML 2.0, and adds new ones: state machine and activity diagrams.

Activity Diagrams in the DWEP

In Figure 8 we show the main steps of our method using a UML activity diagram. The diagram is divided into two swim lanes, one for each role that carries out the activities: DW designers and DW administrators. In addition, the activities are divided into three groups according to the stage of creating the data warehouse in which they take part: analysis, design and implementation. Finally, the transitions define the sequence of activities and also indicate when an activity uses information produced by another activity.

State diagrams

(New content to be added here.)
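As an illustration of the kind of behavior a state diagram could capture in the DWEP, the sketch below encodes the life cycle of a periodic data warehouse refresh as a table of transitions. The states, the events and the class `LoadProcess` are assumptions made for illustration only; they are not defined by DWEP or by the UML specification.

```python
from enum import Enum, auto


class LoadState(Enum):
    IDLE = auto()
    EXTRACTING = auto()
    TRANSFORMING = auto()
    LOADING = auto()
    AVAILABLE = auto()   # data warehouse loaded and open to end users
    FAILED = auto()


# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (LoadState.IDLE, "start"): LoadState.EXTRACTING,
    (LoadState.EXTRACTING, "extracted"): LoadState.TRANSFORMING,
    (LoadState.TRANSFORMING, "transformed"): LoadState.LOADING,
    (LoadState.LOADING, "loaded"): LoadState.AVAILABLE,
    (LoadState.AVAILABLE, "refresh"): LoadState.EXTRACTING,   # periodic maintenance
    (LoadState.FAILED, "retry"): LoadState.EXTRACTING,
}
# Any in-progress state may fail.
for busy in (LoadState.EXTRACTING, LoadState.TRANSFORMING, LoadState.LOADING):
    TRANSITIONS[(busy, "error")] = LoadState.FAILED


class LoadProcess:
    """Tracks the current state of one refresh cycle and rejects invalid events."""

    def __init__(self) -> None:
        self.state = LoadState.IDLE
        self.history = [self.state]

    def fire(self, event: str) -> LoadState:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state.name}")
        self.state = TRANSITIONS[key]
        self.history.append(self.state)
        return self.state


if __name__ == "__main__":
    process = LoadProcess()
    for event in ("start", "extracted", "transformed", "loaded", "refresh"):
        process.fire(event)
    print([s.name for s in process.history])
```

A transition table is a direct, checkable counterpart of a state machine diagram: each entry corresponds to one arrow labelled with its triggering event, and any event without an arrow from the current state is rejected.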

Sequence diagrams

(New content to be added here.)


Figure 8 Activity diagram of the DWEP [5]

V.

CONCLUSION

REFERENCES

[1] W. Inmon, Building the Data Warehouse. Wiley, 2002.
[2] R. Kimball and M. Ross, The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. Wiley, 2002.
[3] S. Lujan and J. Trujillo, "A Data Warehouse Engineering Process," in Advances in Information Systems, Springer Berlin/Heidelberg, vol. 3261/2005, pp. 14–23.
[4] S. Lujan, Data Warehouse Design with UML, Ph.D. thesis, Universidad de Alicante, 2005.
[5] Object Management Group (OMG), Unified Modeling Language Specification 2.0, 2009. Available: http://www.omg.org/technology/documents/modeling_spec_catalog.htm#UML
[6] I. Jacobson, G. Booch, and J. Rumbaugh, The Unified Software Development Process. Object Technology Series. Addison-Wesley, 1999.
[7] T. B. Steel, Jr. (Chairman), ANSI/X3/SPARC Study Group on Data Base Management Systems: Interim Report, ACM SIGMOD FDT, vol. 7, no. 2, 1975.
[8] C. Sapia, M. Blaschka, G. Hofling, and B. Dinter, "Extending the E/R Model for the Multidimensional Paradigm," in Proc. of the 1st International Workshop on Data Warehouse and Data Mining (DWDM'98), vol. 1552 of Lecture Notes in Computer Science, pp. 105–116, Singapore, November 19–20, 1998. Springer-Verlag.
[9] N. Tryfona, F. Busborg, and J. G. Christiansen, "starER: A Conceptual Model for Data Warehouse Design," in Proc. of the ACM 2nd International Workshop on Data Warehousing and OLAP (DOLAP'99), pp. 3–8, Kansas City, USA, November 6, 1999. ACM.
[10] J. Trujillo, "The GOLD model: An Object Oriented multidimensional data model for multidimensional database," in Proc. of the 2000 ACM Symposium on Applied Computing, vol. 1, Italy, pp. 346–350, 2000. ACM.
[11] B. Husemann, J. Lechtenborger, and G. Vossen, "Conceptual Data Warehouse Design," in Proc. of the International Workshop on Design and Management of Data Warehouses (DMDW'2000), Stockholm, Sweden.
[12] A. Abello, J. Samos, and F. Saltor, "YAM2 (Yet Another Multidimensional Model): An extension of UML," in International Database Engineering and Applications Symposium (IDEAS'02), pp. 172–181, Edmonton, Canada, July 17–19, 2002. IEEE Computer Society.
[13] R. Kimball, The Data Warehouse Toolkit. John Wiley & Sons, 1996 (2nd edition: John Wiley & Sons, 2002).
[14] W. Giovinazzo, Object-Oriented Data Warehouse Design: Building a Star Schema. Prentice-Hall, New Jersey, USA, 2000.
[15] J. Cavero, M. Piattini, and E. Marcos, "MIDEA: A Multidimensional Data Warehouse Methodology," in Proc. of the 3rd Intl. Conf. on Enterprise Information Systems (ICEIS'01), Setubal, Portugal, 2001, pp. 138–144.
[16] D. Moody and M. Kortink, "From Enterprise Models to Dimensional Models: A Methodology for Data Warehouse and Data Mart Design," in Proc. of the 3rd Intl. Workshop on Design and Management of Data Warehouses (DMDW'01), Interlaken, Switzerland, 2001, pp. 1–10.
[17] Object Management Group (OMG), Unified Modeling Language (UML), version 2.0, accessed March 2008. Available: http://www.uml.org/
[18] G. Booch, J. Rumbaugh, and I. Jacobson, "UML, El lenguaje unificado de modelado." Available: http://www.itescam.edu.mx/principal/sylabus/fpdb/recursos/r25380.PDF
[19] L. Fuentes and A. Vallecillo, "Una Introducción a los Perfiles UML." Available: http://www.lcc.uma.es/~av/Publicaciones/04/UMLProfiles-Novatica04.pdf
[20] I. Jacobson, G. Booch, and J. Rumbaugh, El proceso unificado de desarrollo de software. Addison Wesley, Madrid, Spain, 2000. 438 p.

Edwar Javier Herrera Osorio (born 05/10/1977) is a systems engineer (Universidad Distrital, 2004) with a specialization in database development from Fundación Universidad de Bogotá Jorge Tadeo Lozano (2007), and a master's candidate in systems engineering and computer science (2008) at Universidad Nacional de Colombia.
