ODN/UnifiedViews module

is responsible for:

  1. extracting data provided by data publishers
  2. transforming this data into a machine-readable format; such transformation may include enriching the data, cleansing it, and assessing its quality
  3. storing the machine readable data to the database managed by ODN/Storage.
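The three responsibilities above can be illustrated with a minimal sketch. This is not UnifiedViews code: the function names `extract`, `transform` and `load`, the sample CSV, and the use of a plain list in place of ODN/Storage are all assumptions made for illustration.

```python
import csv
import io


def extract(raw_csv: str) -> list[dict]:
    """Step 1: read the publisher's tabular data into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list[dict]) -> list[dict]:
    """Step 2: cleanse the rows (trim whitespace, drop rows without an id)."""
    cleaned = []
    for row in rows:
        row = {key.strip(): (value or "").strip() for key, value in row.items()}
        if row.get("id"):
            cleaned.append(row)
    return cleaned


def load(rows: list[dict], storage: list[dict]) -> None:
    """Step 3: append the cleaned rows to the storage (a plain list here)."""
    storage.extend(rows)


# Hypothetical input: untidy CSV with stray spaces and one row lacking an id.
storage: list[dict] = []
raw = "id, name\n1,  Alice \n, missing id\n2,Bob\n"
load(transform(extract(raw)), storage)
print(storage)
```

In the real module each step would be carried out by a configurable plugin rather than a hard-coded function; the sketch only shows the order of the steps and the kind of cleansing the transformation step may perform.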

Input of the module

is the data provided by data publishers. Data is expected to be structured, mostly tabular or linked data (RDF). The module will support basic data formats out of the box; support for more complex data formats is available via plugins.

The module will work with different file formats, but data in RDF format is preferred. The RDF format allows advanced data cleansing and enrichment techniques based on linked data to be used even in use cases where the output will not be in RDF (for example, cases where ODN is used to clean CSV files before publishing).

Output of the module

is the extracted and transformed machine readable data stored in ODN/Storage. Again, data is expected to be structured, tabular or linked data.

UnifiedViews allows users to define and adjust data processing tasks (pipelines) using a graphical user interface (see Figure below); the core components of every data processing task are data processing units (DPUs). DPUs may be dragged and dropped onto the canvas where the data processing task is constructed. Data flow between two DPUs is denoted as an edge on the canvas; a label on the edge clarifies which outputs of one DPU are mapped to which inputs of another DPU. UnifiedViews natively supports the exchange of RDF data between DPUs; apart from that, files may also be exchanged between DPUs.
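The pipeline described above is essentially a directed graph whose nodes are DPUs and whose labeled edges map outputs to inputs. A minimal sketch of such a structure follows; the `Edge` and `Pipeline` classes, the DPU names, and the output/input names are hypothetical and do not reflect the actual UnifiedViews data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    source: str  # name of the DPU producing the data
    target: str  # name of the DPU consuming the data
    label: str   # which output of the source feeds which input of the target


@dataclass
class Pipeline:
    dpus: set[str] = field(default_factory=set)
    edges: list[Edge] = field(default_factory=list)

    def add_dpu(self, name: str) -> None:
        self.dpus.add(name)

    def connect(self, source: str, output: str, target: str, input_: str) -> None:
        """Add an edge mapping one output of `source` to one input of `target`."""
        if source not in self.dpus or target not in self.dpus:
            raise ValueError("both DPUs must be placed on the canvas first")
        self.edges.append(Edge(source, target, f"{output} -> {input_}"))

    def inputs_of(self, name: str) -> list[str]:
        """Return the edge labels of all data flows entering the given DPU."""
        return [edge.label for edge in self.edges if edge.target == name]


# Hypothetical three-step pipeline: extractor -> cleaner -> loader.
pipeline = Pipeline()
for dpu in ("extractor", "cleaner", "loader"):
    pipeline.add_dpu(dpu)
pipeline.connect("extractor", "rdf_out", "cleaner", "rdf_in")
pipeline.connect("cleaner", "rdf_out", "loader", "rdf_in")
print(pipeline.inputs_of("cleaner"))
```

Keeping the mapping in the edge label mirrors what the canvas shows to the user: the edge itself says that data flows, and the label says which output goes to which input.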

How does it work?

ODN/UnifiedViews comprises the following main components:

  • DAO & Service - used to access the database where the configuration of ETL tasks and their executions is stored
  • HTTP REST Transformation API - services from the DAO & Service layer exposed as HTTP REST methods; used by the ODN/Management module
  • Data Processing Engine - robust engine running manually launched or scheduled transformation tasks; transformations may include data cleansing, linking, integration, and quality assessment
  • Management GUI - GUI used to manage the configuration of pipelines, debug executions, etc.

Figure 1

Other features

Scheduling and planning of data processing tasks

UnifiedViews takes care of task scheduling. Users can plan executions of data processing tasks (e.g., tasks are executed at a certain time of the day) or they can start data processing tasks manually. The UnifiedViews scheduler ensures that DPUs are executed in the proper order, so that every DPU has all of its required inputs when it is launched.
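One common way to obtain such an ordering is a topological sort of the DPU dependency graph. The sketch below uses Python's standard `graphlib` module; it illustrates the general technique only and is not a description of how the UnifiedViews scheduler is implemented. The DPU names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return the DPUs in an order where each one runs after all its inputs.

    `dependencies` maps a DPU name to the set of DPUs whose outputs it needs.
    graphlib raises CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical pipeline: each DPU depends on the one before it.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "enrich": {"clean"},
    "load": {"enrich"},
}
order = execution_order(dependencies)
print(order)
```

For a pipeline with branches, several valid orders may exist; any of them guarantees that a DPU is launched only after the DPUs supplying its inputs have finished.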

Notifications and debugging

A user may configure UnifiedViews to send notifications about errors in task executions; the user may also receive daily summaries of the tasks executed.

To simplify the process of defining data processing tasks and to help users analyze errors during data processing task executions, UnifiedViews provides debugging capabilities. Users may browse and query (using the SPARQL query language) the RDF inputs to and RDF outputs from any DPU.

New DPUs creation

The UnifiedViews framework also allows users to create custom plugins, i.e., data processing units (DPUs). Users can also share DPUs, together with their configurations, with others, or use DPUs provided by others.
