The ODN/UnifiedViews module is responsible for:

  1. extracting data provided by data publishers;
  2. transforming the data into a machine-readable format; such transformation may include enriching the data, cleansing it, and assessing its quality;
  3. storing the machine-readable data in the database managed by ODN/Storage.
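The three responsibilities above form a classic extract-transform-load flow. The sketch below illustrates that flow for a small CSV input. It is not the UnifiedViews implementation. The function names, the trimming and empty-row rules, the "share of rows kept" quality score and the in-memory STORAGE dict (standing in for ODN/Storage) are all hypothetical.

```python
import csv
import io

# Hypothetical in-memory stand-in for ODN/Storage: resource name -> rows.
STORAGE: dict[str, list[dict[str, str]]] = {}


def extract(raw_csv: str) -> list[dict[str, str]]:
    """Step 1: read tabular data as provided by a publisher."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list[dict[str, str]]) -> tuple[list[dict[str, str]], float]:
    """Step 2: cleanse the rows and assess their quality.

    Cleansing here means trimming whitespace and dropping rows with no values;
    the quality score is simply the share of input rows that were kept.
    """
    cleaned = []
    for row in rows:
        trimmed = {key.strip(): (value or "").strip() for key, value in row.items()}
        if any(trimmed.values()):
            cleaned.append(trimmed)
    quality = len(cleaned) / len(rows) if rows else 1.0
    return cleaned, quality


def load(resource: str, rows: list[dict[str, str]]) -> None:
    """Step 3: store the machine-readable result."""
    STORAGE[resource] = rows
```

For example, `transform(extract("city, population\nBratislava, 475000\n , \n"))` keeps one of the two data rows, so it reports a quality of 0.5.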

The input of the module is the data provided by data publishers. Data is expected to be structured, mostly tabular or linked data (RDF). The module will support basic data formats out of the box; support for more complex data formats is available via plugins.
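The page does not describe how format plugins are wired in. One common pattern is a registry that maps a format name to a reader function, where a plugin registers its own reader. The sketch below assumes that pattern. READERS, register_reader and read are hypothetical names, and a tab-separated reader stands in for a plugin-provided format.

```python
import csv
import io
import json
from typing import Callable

Reader = Callable[[str], list]

# Hypothetical registry of formats supported out of the box.
READERS: dict[str, Reader] = {
    "csv": lambda text: list(csv.DictReader(io.StringIO(text))),
    "json": lambda text: json.loads(text),
}


def register_reader(fmt: str, reader: Reader) -> None:
    """Plugin hook: add (or override) support for a data format."""
    READERS[fmt] = reader


def read(fmt: str, text: str) -> list:
    """Parse publisher data with the reader registered for its format."""
    try:
        reader = READERS[fmt]
    except KeyError:
        raise ValueError(f"no reader registered for format {fmt!r}") from None
    return reader(text)


# A "plugin" adding tab-separated values.
register_reader("tsv", lambda text: list(csv.DictReader(io.StringIO(text), delimiter="\t")))
```

With this pattern, the core never needs to change when a new format is added; the plugin only has to call `register_reader`.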

The module will work with various file formats, but data in RDF format is preferred. RDF allows the use of advanced data cleansing and enrichment techniques based on linked data, even for use cases where the output will not be in RDF (e.g., cases where ODN is used to clean CSV files before publishing).
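To apply linked-data techniques to a CSV file, the rows first have to be expressed as RDF. The sketch below shows one simple mapping into N-Triples: each row becomes a subject and each non-empty column becomes a predicate with a literal value. The base namespace http://example.org/ and the resource/ and property/ IRI patterns are hypothetical; they are not ODN conventions.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples string literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(text: str, key_column: str) -> list[str]:
    """Map each CSV row to triples: row key -> subject, column -> predicate."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}resource/{quote(row[key_column], safe='')}>"
        for column, value in row.items():
            if column == key_column or not value:
                continue  # the key is encoded in the subject; skip empty cells
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples
```

After cleansing the RDF, the process can be reversed to emit a cleaned CSV, which matches the CSV use case mentioned above.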

The output of the module is the extracted and transformed machine-readable data stored in ODN/Storage. Again, data is expected to be structured: tabular or linked data.

UnifiedViews allows users to define and adjust data processing tasks (pipelines) using a graphical user interface (see Figure below). The core components of every data processing task are data processing units (DPUs). DPUs may be dragged and dropped onto the canvas where the data processing task is constructed. Data flow between two DPUs is denoted as an edge on the canvas; a label on the edge clarifies which outputs of one DPU are mapped to which inputs of the other DPU. UnifiedViews natively supports the exchange of RDF data between DPUs; files may also be exchanged between DPUs.
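The pipeline structure described above (DPUs as nodes, edges labelled with output-to-input mappings) can be modelled as a small graph data structure. The sketch below is hypothetical: the class names and the port names such as "rdfOutput" are illustrative, not the UnifiedViews data model. It validates every mapping on an edge against the declared ports of both DPUs.

```python
from dataclasses import dataclass, field


@dataclass
class DPU:
    name: str
    inputs: set[str] = field(default_factory=set)   # named input ports
    outputs: set[str] = field(default_factory=set)  # named output ports


@dataclass(frozen=True)
class Edge:
    source: str                           # DPU producing the data
    target: str                           # DPU consuming the data
    mapping: tuple[tuple[str, str], ...]  # edge label: (output, input) pairs


@dataclass
class Pipeline:
    dpus: dict[str, DPU] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_dpu(self, dpu: DPU) -> None:
        self.dpus[dpu.name] = dpu

    def connect(self, source: str, target: str, *mapping: tuple[str, str]) -> None:
        """Add an edge after checking that every mapped port exists."""
        src, dst = self.dpus[source], self.dpus[target]
        for out_name, in_name in mapping:
            if out_name not in src.outputs:
                raise ValueError(f"{source} has no output {out_name!r}")
            if in_name not in dst.inputs:
                raise ValueError(f"{target} has no input {in_name!r}")
        self.edges.append(Edge(source, target, tuple(mapping)))


pipeline = Pipeline()
pipeline.add_dpu(DPU("extractor", outputs={"rdfOutput"}))
pipeline.add_dpu(DPU("loader", inputs={"rdfInput"}))
pipeline.connect("extractor", "loader", ("rdfOutput", "rdfInput"))
```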

How does it work?



Figure 1

 

What else can UnifiedViews do?

Loading transformed data

ODN/UnifiedViews loads the transformed data to ODN/Storage.

...

The data must be stored there together with metadata, so that the ODN/Publication module knows which resources (tables, graphs) are associated with which pipeline/dataset.
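The page does not specify the metadata format. The sketch below only illustrates the idea: each stored table or graph is recorded with the pipeline and dataset that produced it, so that a consumer such as ODN/Publication can look resources up. The record fields, the ResourceRegistry class and the example identifiers are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredResource:
    kind: str         # "table" or "graph"
    name: str         # table name or graph IRI
    pipeline_id: int  # pipeline that produced the resource
    dataset_id: str   # dataset the resource belongs to


class ResourceRegistry:
    """Hypothetical metadata store linking resources to pipelines/datasets."""

    def __init__(self) -> None:
        self._by_dataset: defaultdict[str, list[StoredResource]] = defaultdict(list)

    def record(self, resource: StoredResource) -> None:
        self._by_dataset[resource.dataset_id].append(resource)

    def resources_for(self, dataset_id: str) -> list[StoredResource]:
        # .get avoids creating empty entries for unknown datasets
        return list(self._by_dataset.get(dataset_id, []))

    def resources_for_pipeline(self, pipeline_id: int) -> list[StoredResource]:
        return [r for rs in self._by_dataset.values() for r in rs
                if r.pipeline_id == pipeline_id]
```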

Managing pipelines

ODN/UnifiedViews will provide a RESTful management API, which will be used by ODN/Management to:

...

An excerpt of the methods that will be available to ODN/Management through the RESTful API is depicted below:

 

Other management features

The management GUI of ODN/UnifiedViews is used by ODN/Management to:

  • show the pipeline detail in an expert mode (the user may drag and drop DPUs and fine-tune the pipeline configuration)
  • show the detailed results of pipeline executions (browse events and logs)
  • debug data being passed between DPUs
  • access advanced scheduling options

Other features

Scheduling and planning of data processing tasks

UnifiedViews takes care of task scheduling. Users can schedule executions of data processing tasks (e.g., a task is executed at a certain time of day), or they can start data processing tasks manually. The UnifiedViews scheduler ensures that DPUs are executed in the proper order, so that every DPU has all of its required inputs when it is launched.
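Running every DPU only after all DPUs that feed it amounts to a topological ordering of the pipeline graph. The page does not say which algorithm the scheduler uses. The sketch below shows one standard choice, Kahn's algorithm, which also detects cycles that would make a valid order impossible. The function name and the DPU names in the example are hypothetical.

```python
from collections import deque


def execution_order(dpus: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Order DPUs so each runs after all of its upstream DPUs (Kahn's algorithm).

    edges are (source, target) pairs meaning target consumes source's output.
    """
    indegree = {d: 0 for d in dpus}
    successors: dict[str, list[str]] = {d: [] for d in dpus}
    for source, target in edges:
        successors[source].append(target)
        indegree[target] += 1

    # Start with DPUs that have no inputs from other DPUs (sorted for determinism).
    ready = deque(sorted(d for d in dpus if indegree[d] == 0))
    order = []
    while ready:
        dpu = ready.popleft()
        order.append(dpu)
        for nxt in successors[dpu]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:  # all inputs of nxt are now available
                ready.append(nxt)

    if len(order) != len(dpus):
        raise ValueError("pipeline contains a cycle; no valid execution order")
    return order
```

For a pipeline where "extract" feeds both "clean" and "enrich", and both feed "load", this returns `["extract", "clean", "enrich", "load"]`.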

Notifications and debugging

A user may configure UnifiedViews to send notifications about errors in task executions; the user may also receive daily summaries of the tasks executed.
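The two kinds of notification differ in when they are produced. Error notifications are per failed execution, while summaries aggregate one day's executions. The sketch below illustrates this from a list of execution records. The Execution record, its "success"/"error" status values and the message wording are hypothetical, not the UnifiedViews notification format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Execution:
    pipeline: str
    day: date
    status: str  # hypothetical values: "success" or "error"


def error_notifications(executions: list[Execution]) -> list[str]:
    """One message per failed execution."""
    return [f"Pipeline {e.pipeline} failed on {e.day.isoformat()}"
            for e in executions if e.status == "error"]


def daily_summary(executions: list[Execution], day: date) -> str:
    """Aggregate all executions of a single day into one message."""
    counts = Counter(e.status for e in executions if e.day == day)
    return f"{day.isoformat()}: {counts['success']} succeeded, {counts['error']} failed"
```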

To simplify the process of defining data processing tasks and to help users analyze errors during task executions, UnifiedViews provides debugging capabilities: users may browse and query (using the SPARQL query language) the RDF inputs to and the RDF outputs from any DPU.
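In UnifiedViews this debugging is done with real SPARQL queries. The sketch below does not use SPARQL. It is a plain-Python stand-in for the core of a SPARQL SELECT: matching a basic graph pattern, where terms starting with "?" are variables, against the triples a DPU produced. The function names and example triples are hypothetical.

```python
Triple = tuple[str, str, str]


def _unify(term: str, value: str, binding: dict[str, str]) -> bool:
    """Match one pattern term against a value, extending binding if needed."""
    if not term.startswith("?"):
        return term == value              # constant: must match exactly
    if term in binding:
        return binding[term] == value     # variable already bound
    binding[term] = value                 # bind a fresh variable
    return True


def match(pattern: list[Triple], triples: list[Triple]) -> list[dict[str, str]]:
    """Return every variable binding that satisfies all triple patterns."""
    solutions: list[dict[str, str]] = [{}]
    for pat in pattern:
        extended = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)
                if all(_unify(p, t, candidate) for p, t in zip(pat, triple)):
                    extended.append(candidate)
        solutions = extended
    return solutions


# Roughly: SELECT ?c ?p WHERE { ?c ex:country ex:Slovakia . ?c ex:population ?p }
data = [
    ("ex:Bratislava", "ex:population", "475000"),
    ("ex:Bratislava", "ex:country", "ex:Slovakia"),
    ("ex:Kosice", "ex:country", "ex:Slovakia"),
]
result = match([("?c", "ex:country", "ex:Slovakia"), ("?c", "ex:population", "?p")], data)
```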

New DPUs creation

The UnifiedViews framework also allows users to create custom plugins, i.e., data processing units (DPUs). Users can also share DPUs, together with their configurations, with others, or use DPUs provided by others.
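Real UnifiedViews DPUs are Java plugins with their own API, which is not reproduced here. The sketch below is a hypothetical Python analogue of the idea: a common base class that every custom DPU implements, a configuration passed in at construction, and an export of the DPU together with its configuration so it can be shared. DataProcessingUnit, DeduplicateRows, the "rows" port and the JSON export format are all illustrative.

```python
import json
from abc import ABC, abstractmethod
from typing import Any


class DataProcessingUnit(ABC):
    """Hypothetical DPU contract: a configuration plus named inputs -> named outputs."""

    def __init__(self, config: dict[str, Any]) -> None:
        self.config = config

    @abstractmethod
    def execute(self, inputs: dict[str, list]) -> dict[str, list]:
        ...

    def export(self) -> str:
        """Serialize the DPU type and its configuration for sharing."""
        return json.dumps({"dpu": type(self).__name__, "config": self.config})


class DeduplicateRows(DataProcessingUnit):
    """Example custom DPU: drop duplicate rows, optionally by a key column."""

    def execute(self, inputs: dict[str, list]) -> dict[str, list]:
        key = self.config.get("key")
        seen, unique = set(), []
        for row in inputs["rows"]:
            marker = row[key] if key else tuple(sorted(row.items()))
            if marker not in seen:  # keep the first occurrence only
                seen.add(marker)
                unique.append(row)
        return {"rows": unique}
```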

...