ODN/UnifiedViews is the ETL (Extract, Transform, Load) tool responsible for:
- extracting data provided by data publishers
- transforming this data into a machine-readable format; the transformation may include enriching the data, cleansing it, and assessing its quality
- storing the machine-readable data in the database managed by ODN/Storage.
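The three responsibilities above can be sketched as a minimal extract, transform, load flow. This is an illustration only, not UnifiedViews code: the function names, the sample CSV, and the plain list standing in for ODN/Storage are all assumptions.

```python
import csv
import io

# Hypothetical publisher data (assumption); note the stray space and the missing value.
RAW_CSV = "name,population\nTown A, 1200\nTown B,\n"


def extract(text):
    """Extract: parse the publisher's CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cleanse values and drop rows that fail a basic quality check."""
    cleaned = []
    for row in rows:
        population = row["population"].strip()
        if not population.isdigit():
            continue  # quality check: skip rows without a numeric population
        cleaned.append({"name": row["name"].strip(), "population": int(population)})
    return cleaned


def load(rows, storage):
    """Load: append the cleaned rows to a store (a list stands in for ODN/Storage)."""
    storage.extend(rows)
    return storage


storage = load(transform(extract(RAW_CSV)), [])
print(storage)
```

In a real pipeline each of these steps would be a separate, configurable unit rather than a hard-coded function.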
Input of the module is the data provided by data publishers. The data is expected to be structured, mostly tabular or linked data (RDF). The module will support basic data formats out of the box; support for more complex formats is available via plugins.
The module will work with different file formats, but RDF is preferred. RDF allows advanced cleansing and enrichment techniques based on linked data to be used even when the output is not RDF (for example, when ODN is used to clean CSV files before publishing).
Output of the module is the extracted and transformed machine-readable dataset stored in ODN/Storage. Again, the data is expected to be structured: tabular or linked data.
UnifiedViews allows users to define and adjust data processing tasks (pipelines) using a graphical user interface (see Figure below). The core components of every data processing task are data processing units (DPUs). DPUs can be dragged and dropped onto the canvas where the data processing task is constructed. A data flow between two DPUs is shown as an edge on the canvas; a label on the edge states which outputs of one DPU are mapped to which inputs of the other. UnifiedViews natively supports the exchange of RDF data between DPUs; files may also be exchanged between DPUs.
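The canvas model described above (DPUs as nodes, labelled edges mapping outputs to inputs) can be pictured as a small data structure. The following is a minimal sketch under stated assumptions: the class names, the DPU names, and the port names are hypothetical and do not reflect the internal UnifiedViews model.

```python
from dataclasses import dataclass, field


@dataclass
class DPU:
    """A node on the canvas with named input and output ports."""
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


@dataclass
class Edge:
    """A data flow between two DPUs."""
    source: str
    target: str
    mapping: dict  # output name of the source DPU -> input name of the target DPU

    def label(self):
        """Text shown on the edge: which output feeds which input."""
        return ", ".join(f"{out} -> {inp}" for out, inp in self.mapping.items())


@dataclass
class Pipeline:
    name: str
    dpus: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_dpu(self, dpu):
        self.dpus[dpu.name] = dpu

    def connect(self, source, target, mapping):
        """Add an edge after checking that the mapped ports exist."""
        src, dst = self.dpus[source], self.dpus[target]
        for out, inp in mapping.items():
            if out not in src.outputs:
                raise ValueError(f"{source} has no output {out!r}")
            if inp not in dst.inputs:
                raise ValueError(f"{target} has no input {inp!r}")
        edge = Edge(source, target, dict(mapping))
        self.edges.append(edge)
        return edge


# Hypothetical DPU and port names, for illustration only.
pipeline = Pipeline("daily-update")
pipeline.add_dpu(DPU("E-Example", outputs=["files"]))
pipeline.add_dpu(DPU("T-Example", inputs=["input"], outputs=["rdf"]))
edge = pipeline.connect("E-Example", "T-Example", {"files": "input"})
print(edge.label())
```

Validating the mapping when the edge is created mirrors what the edge label communicates visually: every connected output must have a matching input on the receiving DPU.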
How does it work?
To make almost any change to the quality of your datasets, you need to work in the UnifiedViews environment.
Please check that your browser is among the supported browsers:
- Chrome 4.0+
- Internet Explorer 9.0+
- Mozilla Firefox 2.0+
- Opera 9.0+
- Safari 3.1+
To start using the UnifiedViews web interface, users have to log in to the ODN platform.
Once the user has logged in successfully, the UnifiedViews landing page is displayed:
The main menu at the top is used to navigate through the application and represents different important sections of the tool.
Pipelines are the building blocks of your entire task, from the initial data collection to the published open datasets. You can build them yourself, or select and modify pipelines that are already defined.
A pipeline is built visually: you select and position elements with your mouse, much like drawing a chart or diagram.
This section allows you to define new pipelines and to see the details of pipelines that are already defined. It also allows you to run or debug a given pipeline. Read more about it on the Pipelines Section page.
The following screenshot shows the path of a pipeline for daily data updates.
Data processing units (DPUs) are the sub-blocks of pipelines. Each DPU executes a basic task from one of three classes: E (extracting), T (transforming), or L (loading). The class of a DPU is indicated by the first letter of its name.
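The naming convention can be read mechanically. The sketch below assumes only what the paragraph above states (the first letter encodes the class); the DPU names used are hypothetical examples.

```python
# Mapping from the leading letter of a DPU name to its class, as described above.
DPU_CLASSES = {"E": "extracting", "T": "transforming", "L": "loading"}


def dpu_class(name):
    """Return the ETL class indicated by the first letter of a DPU name."""
    return DPU_CLASSES.get(name[:1].upper(), "unknown")


# Hypothetical DPU names, for illustration only.
for dpu_name in ["E-Example", "t-example", "L-Example", "X-Example"]:
    print(dpu_name, "->", dpu_class(dpu_name))
```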
This section contains an overview of the accessible DPU templates, that is, the plugins available for use when defining pipelines. Read more about it on the DPU Templates Section page.
In ODN you will mostly need the Core DPUs.
This section contains a list of already executed pipelines and is also used to view detailed information about those executions. Read more about it on the Execution Monitor Section page.
This section allows you to schedule pipeline executions. It contains a list of scheduling rules (descriptions of when the pipeline should be executed) and is used to create and view scheduling rules of the pipelines. Read more about it on the Scheduler Section page.
This section allows you to define certain settings of the application. Read more about it on the Settings Section page.
To use UnifiedViews you need an account in the system. An account must have at least one role that grants the user certain privileges. UnifiedViews is ready to support multiple roles for a single user. Currently we use two roles (discussed in more detail below):
Your role: User
With the User role, a user has full rights to manipulate their own pipelines and scheduling rules. There are restrictions when working with shared pipelines of other users; these restrictions are determined by the pipeline's Visibility modifier.
Once a pipeline is publicly visible (see the Visibility option), a user can see the pipeline and its executions, display the pipeline detail and DPU settings, and copy the pipeline. If the pipeline is in read/write public mode (see the Visibility option), the user can also edit it.
Restrictions for the User role:
- A User does not see the scheduling rules of other users.
- A User cannot delete another user's pipeline.
- A User cannot delete data from the staging database.
- A User cannot manage users or change assigned roles.
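The visibility rules for the User role can be summarised as a small permission check. This is a sketch of the rules as described on this page, not the actual UnifiedViews access-control code: the enum member names, the action names, and the assumption that a non-public pipeline is invisible to other users are all hypothetical.

```python
from enum import Enum


class Visibility(Enum):
    PRIVATE = "private"              # assumed default: visible to the owner only
    PUBLIC_READ_ONLY = "public"      # publicly visible
    PUBLIC_READ_WRITE = "public rw"  # read/write public mode


def allowed_actions(user, owner, visibility):
    """Return the actions a user with the User role may perform on a pipeline."""
    if user == owner:
        # Full rights on one's own pipelines.
        return {"view", "copy", "edit", "delete"}
    if visibility is Visibility.PUBLIC_READ_WRITE:
        return {"view", "copy", "edit"}
    if visibility is Visibility.PUBLIC_READ_ONLY:
        return {"view", "copy"}
    return set()


print(sorted(allowed_actions("bob", "alice", Visibility.PUBLIC_READ_WRITE)))
```

Note that "delete" never appears for a non-owner, which matches the restriction that a User cannot delete another user's pipeline.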
What else can UnifiedViews do?
Loading transformed data
ODN/UnifiedViews loads the transformed data to ODN/Storage.
Special DPUs, the RDF data mart loader and the Tabular data mart loader, must be provided to load transformed data into the corresponding data store in ODN/Storage.
The data must be stored there together with metadata, so that the ODN/Publication module knows which resources (tables, graphs) are associated with which pipeline/dataset.
ODN/UnifiedViews will provide a RESTful management API, which will be used by ODN/Management to:
- create a new data transformation task (pipeline)
- configure existing pipeline and get configuration of the pipeline
- delete the pipeline
- execute the pipeline
- schedule the pipeline
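To make the list of operations concrete, the sketch below maps each one to an HTTP method and path in a conventional REST style. The base URL, the paths, and the operation names are invented for illustration and are not the actual UnifiedViews API; the function only builds a request description and performs no network call.

```python
import json
from urllib.parse import quote

# Hypothetical base URL and routes (assumptions, not the real API).
BASE_URL = "http://localhost:8080/api"
ROUTES = {
    "create": ("POST", "/pipelines"),
    "get_config": ("GET", "/pipelines/{id}"),
    "configure": ("PUT", "/pipelines/{id}"),
    "delete": ("DELETE", "/pipelines/{id}"),
    "execute": ("POST", "/pipelines/{id}/executions"),
    "schedule": ("POST", "/pipelines/{id}/schedules"),
}


def build_request(operation, pipeline_id=None, body=None):
    """Describe the HTTP request for a management operation without sending it."""
    method, template = ROUTES[operation]
    if "{id}" in template:
        if pipeline_id is None:
            raise ValueError(f"operation {operation!r} needs a pipeline id")
        path = template.replace("{id}", quote(str(pipeline_id), safe=""))
    else:
        path = template
    return {
        "method": method,
        "url": BASE_URL + path,
        "body": json.dumps(body) if body is not None else None,
    }


print(build_request("create", body={"name": "daily-update"}))
print(build_request("execute", pipeline_id=42))
```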
An excerpt of the methods that will be available to ODN/Management in RESTful form is depicted below:
Other management features
The management GUI of ODN/UnifiedViews is used by ODN/Management to:
- show the pipeline detail in an expert mode (the user may drag and drop DPUs and fine-tune the pipeline configuration)
- show the detailed results of pipeline executions (browse events/logs)
- debug data being passed between DPUs
- access advanced scheduling options
Scheduling and planning of data processing tasks
UnifiedViews takes care of task scheduling (see Scheduler Section). Users can plan executions of data processing tasks (e.g., tasks are executed at a certain time of the day) or start data processing tasks manually. The UnifiedViews scheduler ensures that DPUs are executed in the proper order, so that every DPU has its required inputs when it is launched.
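One common way to obtain an execution order in which every DPU runs after the DPUs it depends on is a topological sort of the pipeline graph. This page does not say which algorithm UnifiedViews uses, so the following is only a sketch of the idea; the function name and the DPU names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_order(edges):
    """Order DPUs so that each one runs after all DPUs that feed it.

    `edges` is a list of (source, target) pairs: data flows from source to target.
    DPUs that appear in no edge are not part of the result.
    """
    sorter = TopologicalSorter()
    for source, target in edges:
        sorter.add(target, source)  # target depends on source
    # static_order() raises graphlib.CycleError if the edges form a cycle.
    return list(sorter.static_order())


# Hypothetical pipeline: one extractor feeding two transformers and one loader.
order = execution_order([
    ("E-Example", "T-Clean"),
    ("E-Example", "T-Enrich"),
    ("T-Clean", "L-Example"),
    ("T-Enrich", "L-Example"),
])
print(order)
```

The relative order of the two transformers is not fixed by the graph, so only the constraints (extractor first, loader last) should be relied upon.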
Notifications and debugging
A user may configure UnifiedViews to send notifications about errors in task executions; the user may also receive daily summaries of the tasks executed.
To simplify the process of defining data processing tasks and to help users analyze errors during task executions, UnifiedViews provides debugging capabilities. Users may browse and query (using the SPARQL query language) the RDF inputs to and RDF outputs from any DPU.
New DPUs creation
The UnifiedViews framework also allows users to create custom plugins, that is, data processing units (DPUs). Users can also share DPUs, together with their configurations, with others, or use DPUs provided by others.