
This document describes the architecture and design of Open Data Node. As of now (June 2014) it contains mainly information "as planned". Later on, as development continues, it will be maintained to describe the "as is" status.

Document organization

The document is maintained as a set of Wiki pages, one page per section. The single-page output is then constructed using the include macro.

When adjusting images, use ODN-arch-all.odg and reupload the adjusted sources.


1. Executive Summary

The purpose of this document is to define the architecture and design principles of the system so that it fulfills the requirements (for example, as defined in COMSODE Deliverable 2.1 ‘User requirements for the publication platform from target organizations, including the map of typical environments’).

The document describes the architecture of a software solution for open data publication (Open Data Node) which was developed by the COMSODE project and is also the basis for methodologies for Open Data publication (also developed by the COMSODE project). The architecture was developed based on identified user requirements (use cases).

The system is split into modules that communicate using interfaces. Modules and interfaces are described in more detail in the respective sections. For each module, its role and how it fits into the context of the overall system is described, including a description of its interactions with other modules. For some modules, alternatives that can be used to fulfill their functionality are also described.

Due to the modularity of the system, it is possible to use the system in different deployment environments and also to use only some modules of the system and integrate them with other modules and systems. Some deployment schemes are also described in the document.

Modules are licensed as Open Source, thus external contributors (essentially anyone) can improve the modules or extend the system. To facilitate that, a description of the development model and environment is also included in the document.

This architecture was the base for the initial implementation work of the COMSODE project. However, it may happen that a new component will be identified in the future which better fulfills the functionality of a particular module. In that case, we will update the module description in this document accordingly, including an explanation of the replacement.

2. Document context

2.1. Purpose of document

The research and development project COMSODE had, as one of the main objectives within its 24 months of duration, to create a publication platform called Open Data Node (ODN) that builds on results of previous research and development in the linked data field. Its mission was to bring results from the research environment into the real world for people, SMEs and other organizations to use and reuse.

ODN is a foundation for a data integration platform based on Open Data which allows the reuse of data not only between public bodies and end-users but also among public bodies themselves: public bodies can exchange information by using the same infrastructure and tools as end-users which can decrease costs of exchanging the data and in most cases also enhance the quality and speed of the exchange.

This document represents the output of Task 2.2 - Architecture and design of the publication platform (ODN) of the COMSODE project and is maintained since then to reflect changes.

The purpose of the document is to provide an overview of the architecture of Open Data Node, its modules, dependencies between modules and main communication interfaces.

For particular modules, you can find a description of their functionality and also an overview of their design.

Finally, it also describes possible deployment environments where ODN can be installed.

2.2. Related Documents

  • COMSODE DOW, version date 2013-08-08, pages 7-8 and 42-43

  • COMSODE Deliverable 2.1: User requirements for the publication platform from target organizations, including the map of typical environments

3. Methodology used

3.1. Methodology

The architecture and design of the system were created in an iterative way – from outlines in the OpenData.sk Wiki, through COMSODE inputs (DoW, meetings and discussions within the consortium and with the User Board, Deliverable D2.3), to the current form described in this document. We also took into account previous development of components of the system, as some of them existed before the COMSODE project started or evolved after the COMSODE project ended.

The COMSODE project used an internal collaborative space - called wiki - based on the Atlassian Confluence technology, accessible to all members of the project team. A dedicated space was established in the internal wiki for collecting inputs to the architecture and design from consortium members and from members of the User Board. Input (obtained by means other than the wiki) from possible users of the platform was also included in use cases. As part of the final COMSODE outputs, the ODN documentation was copied into this public Wiki hosted by the OpenData.sk community.

The ODN platform is planned to reuse, integrate and extend Open Source components, and the architecture is based on the current status of those components and also on what additional features can be included in them.

It is expected that the document will be updated when significant design changes are required due to changes in existing components or when a new component that better fits the project needs becomes available and can replace the currently selected component.

3.2. Partner contributions

The architecture and design document was prepared under the management of COMSODE consortium member EEA Ltd. Tomas Knap and Peter Hanečák are the architects of ODN and UnifiedViews, and DSE CUNI provided the main input to use cases. Project coordinator UNIMIB, DSE CUNI and ADDSEN reviewed the pre-final version of the document.

After the end of the COMSODE project, EEA is helping to maintain this documentation.

4. Open Data Node Overview

Open Data Node (ODN) is a publication platform which provides governments, municipalities and other subjects (e.g. companies) with an easy way of publishing their data as machine readable (linked) open data.

ODN performs extraction and transformation (conversion, cleansing, anonymization, linking etc.) of data provided by governments and municipalities. ODN stores the result of data transformations in its own storage. ODN publishes the results of data transformations in open and machine readable formats to data consumers (citizens, companies, governments).

 

4.1. Actors

The main actors dealing with the system are:

  • Data publishers
    government, municipalities, and other subjects providing data

  • Data consumers
    government, municipalities, non-profit organizations (NGOs), citizens (general public), companies (SMEs), application developers consuming the transformed data

  • Data administrators
    administrators, analysts, data curators, etc. (partially) responsible for configuring ODN - usually employees of data publisher

  • Administrators
    IT staff responsible for installation, maintenance and (partially) configuration of ODN - usually employees of data publisher

ODN helps publishers handle the complexity of source data and its transformation to open data, and to deliver easy-to-use, high quality open data to data consumers.

ODN helps data consumers get the data easily and efficiently in open, machine readable formats.

4.2. Inputs

The input to the system is the data of data publishers, stored in heterogeneous environments, using a wide variety of formats and employing many different technologies to access and process that data.

4.3. Outputs

The output of the system is published open data in various forms, as linked data or as tabular data. API access to the data is also included. Data consumers may be provided with:

  • RDF data as a result of export from the storage

  • RDF data as a result of SPARQL Query

  • CSV data as a result of export from the storage

  • REST API to access data in the storage

All forms of the data shall be available to data consumers under an open license. For more details about the formats of published open data, please see Section 5.3 (module ODN/Publication).

 

4.4. Features for data publishers

  • automated and repeatable data harvesting: extraction and transformation (conversion, cleansing, anonymization, etc.) of data, both:

    • initial harvesting of whole datasets (first import)

    • periodical harvesting of incremental updates

  • integration tools for extracting data from publisher’s internal systems (e.g. databases, data in files, information systems with API, etc.)

  • internal storage for data and metadata; the metadata format will be based on DCAT (http://www.w3.org/TR/vocab-dcat/)

  • data publishing in open and machine readable formats to the general public and businesses including automated efficient distribution of updated data and metadata (dataset replication)

  • integration with data catalogs (like CKAN) for automated publication and updating of dataset metadata

  • internal data catalog of datasets for maintenance of dataset metadata

4.5. Features for data consumers

Features for data consumers are discussed separately for different types of data consumers.

4.5.1. Citizen, data analyst, etc.

  • the user typically accesses an ODN instance maintained by someone else; the user is not running their own instance

  • the user may download data dumps and call APIs to get the data they are interested in

  • changes to data dumps are advertised via Atom feeds

  • the user may access the data indirectly, for example via a 3rd party data catalog, which - in order to show the user a preview or visualization of the data - has to first download that data (in a similar manner as if the user was accessing it directly: i.e. downloading a dump or accessing an API from an ODN instance maintained by someone else)

4.5.2. Aggregator of Open Data (public body, NGO, SME, etc.)

  • the same as in Section 4.5.1

  • aggregator may easily replicate content in another Open Data Node

  • aggregator may automate data integration and linking

4.5.3. Application developer using Open Data (SME, NGO, etc., public body too)

  • the same as in Section 4.5.2

  • the application developer has tools for automated generation of APIs and custom APIs on top of the published datasets

4.5.4. Data administrator

  • possibility to set up the data extraction, transformation, and publication of open data

  • possibility to monitor execution of data extraction and transformation tasks

  • possibility to debug data extraction and transformation

4.6. Use Cases

In this deliverable, we depict use cases for data publishers and data consumers. The full list of use cases, together with related scenarios and mock-ups, can be found at https://team.eea.sk/wiki/display/COMSODE/Use+Cases (note: this is the internal consortium wiki space mentioned in Section 3.1 Methodology - it is subject to change based on subsequent user requirements and will most probably be moved to a public space once the community around ODN grows).

For each use case below, we introduce its name, short description and the ODN modules participating in the use case (ODN/M = ODN/Management module, ODN/UV = ODN/UnifiedViews, ODN/P = ODN/Publication module, ODN/IC = ODN/InternalCatalog module, ODN/C = ODN/PublicCatalog module). For full details, please see https://team.eea.sk/wiki/display/COMSODE/Use+Cases .

4.6.1. Use Cases for Data Publisher

Use Cases-Core Dataset Management.png

 

  • UC1 - Create dataset record (ODN/M, ODN/IC): As a data publisher I want to create a new record about the intended published data, so that I can define for every dataset information about the source data, intended transformations and ways how the transformed data should be published.

  • UC2 - Edit/Manage dataset record (ODN/M, ODN/IC): As a data publisher I want to edit/manage dataset records.

  • UC3 - Delete dataset record (ODN/M, ODN/IC): As a data publisher I want to delete an outdated/obsolete dataset record.

  • UC4 - Configure transformation (ODN/UV): As a data publisher I want to configure dataset transformation (cleansing, linking, enrichment, quality assessment, etc.).

  • UC4a - Configure transformation using wizard (ODN/M): As a data publisher I want to configure dataset transformation (cleansing, linking, enrichment, quality assessment, etc.) using a wizard, so that it is really simple to prepare a typical dataset transformation.

  • UC5 - Configure publication (ODN/P): As a data publisher I want to configure how the transformed dataset is published, thus how the dataset may be consumed by data consumers.

  • UC6 - Publish dataset (ODN/M, ODN/IC, ODN/UV, ODN/P, ODN/C): As a data publisher I want to publish the dataset.

  • UC7 - Transform dataset (ODN/UV): As a data publisher I want to transform the dataset (the dataset is transformed but not published yet).

  • UC8 - Debug dataset transformation (ODN/UV): As a data publisher I want to debug the dataset transformation, see intermediate results of the transformation, and see debug messages illustrating what happened during the dataset transformation.

  • UC9 - Configure creation of RDF dumps (ODN/P): As a data publisher I want to configure creation of RDF dumps from my published datasets.

  • UC10 - Configure creation of CSV dumps (ODN/P): As a data publisher I want to configure creation of CSV dumps from my published datasets.

  • UC11 - Configure publishing data via REST API (ODN/P): As a data publisher I want to configure how the REST API is generated on top of my published data, which data is accessible via the REST API, which users may use the REST API, and which methods of accessing my data are available to data consumers.

  • UC12 - Configure publishing to SPARQL Endpoint (ODN/P): As a data publisher I want to configure how data consumers may connect to SPARQL endpoints with my published data.

  • UC13 - Schedule dataset publication (ODN/M, ODN/UV, ODN/P): As a data publisher I want to automate the publication process, so that it can run every week or every time a new version of the dataset is available.

  • UC14 - Schedule dataset transformation (ODN/M, ODN/UV): As a data publisher I want to automate the transformation part of the publication process, so that it can run every week or every time a new version of the dataset is available.

  • UC15 - Monitor data publishing tasks (ODN/M): As a data publisher I want to monitor the publishing tasks to see how the data transformation and data publishing were executed for my datasets.

  • UC16 - Basic overview about the transformation pipelines' execution (ODN/M, ODN/UV, ODN/P): As a data publisher I want to monitor the publication of the dataset to see whether the publication was OK, or there were some errors.

  • UC17 - Detailed overview about the transformation pipelines' execution (ODN/UV): As a data publisher I want to see the detailed overview about the transformations of the dataset.

  • UC18 - Browse transformation logs/events (ODN/UV): As a data publisher I want to browse logs and events to see in detail what happened during the dataset transformation.

  • UC19 - Browse intermediate data (ODN/UV): As a data publisher I want to browse the intermediate data produced as the dataset is being transformed.

  • UC20 - Overview about the publication of the transformed data (ODN/P): As a data publisher I want to be informed about the publication of the transformed dataset, whether there were some problems or not.

  • UC21 - Schedule publishing of transformed RDF data (ODN/P): As a data publisher I want to automate the publishing of the transformed datasets; typical requested behaviour: whenever the dataset is transformed, it should also be published.

  • UC22 - Publish transformed dataset (ODN/P): As a data publisher I want to publish the dataset which has already been transformed.

 

 

4.6.2. Use Cases for Data Consumers

Use cases - Data Consumer Use Cases.png



 

  • UC101 - Consume RDF data dump (ODN/P): As a data consumer I want to download an RDF data dump, so that I can load it into my data store and work with it.

  • UC102 - Consume CSV data dump (ODN/P): As a data consumer I want to download a CSV data dump, so that I can load it into my data store and work with it.

  • UC103 - Consume version of the data dump valid at certain time (ODN/P): As a data consumer I want to get the data dumps valid at a certain time in the past.

  • UC104 - Query SPARQL Endpoint (ODN/P): As an advanced data consumer I want to query RDF data directly using the SPARQL endpoint.

  • UC105 - Use REST API (ODN/P): As a data consumer I want to use the REST API, so that I can work with the data from my app.

  • UC106 - Browse Data Catalog (ODN/C): As a data consumer I want to browse and search the list of datasets (data catalog).

  • UC107 - Get metadata about dataset (ODN/C): As a data consumer I want to get metadata of the published dataset.

  • UC108 - Browse (sample) data (ODN/P): As a data consumer I want to browse a data sample to get an idea what is in the dataset.

  • UC109 - Data dump changes (ODN/P): As a data consumer I want to be notified (for example via RSS or Atom) when a data dump is updated or changed.

 

4.7. Other inputs for Architecture decisions

This is additional list of inputs extending Deliverable 2.1: ‘User requirements for the publication platform from target organizations, including the map of typical environments’.

  • The system must be extensible with new DPUs on the data transformation (ETL) pipelines.

  • The overhead of managing data transformation (ETL) tasks and managing outputs from/preparing inputs to DPUs must be reasonable - any execution of an ETL task must not take more than 200% of the time needed for manual execution of the DPUs’ logic without the use of the ETL framework (under the condition that the pipeline is running alone).

  • The system must be able to process big files - CSV files containing millions of rows, RDF files containing hundreds of millions of triples.

  • Response time: The Web management GUI of ODN must respond in 99.9% cases in less than 1s (all components of ODN on one machine, client is connecting to server using 100Mb line at least).

  • Target platform: Linux, Windows

  • Preferred languages for the system implementation: Java, others only in case of reuse of existing components with sufficient added-value (for ODN and for ODN users)

  • The internal format for all data being transformed during the ETL process is RDF, a universal machine readable data format.

5. Open Data Node Modules


Open Data Node consists of the following modules:

  • ODN/UnifiedViews

  • ODN/Storage

  • ODN/Publication

  • ODN/InternalCatalog

  • ODN/PublicCatalog

  • ODN/Management

Modules listed above are discussed in more detail in the following sections.

5.1. Module ODN/UnifiedViews

Module ODN/UnifiedViews is an ETL & data enrichment tool.

It is responsible for extracting and transforming source data (datasets), so that they can be published as (linked) open data. The result of the transformation is stored in the database managed by ODN/Storage module.

ODN/UnifiedViews module is responsible for:

  1. extracting data provided by data publishers

  2. transforming these data to machine readable data format; such transformation may include enriching the data, cleansing the data, assessing the quality of the data

  3. storing the machine readable data to the database managed by ODN/Storage.

The input of the module is the data provided by data publishers. Data is expected to be structured, mostly tabular or linked data (RDF). The module will support basic data formats out of the box; support for more complex data formats is available via plugins.

The module will work with different formats (in files), but data in RDF format is preferred. The RDF format allows usage of advanced data cleansing and enrichment techniques based on linked data, also for use cases where the output will not be in RDF (for example, cases where ODN is used to clean CSV files before publishing).

Output of the module is the extracted and transformed machine readable data stored in ODN/Storage. Again, data is expected to be structured, tabular or linked data.

5.1.1. UnifiedViews - state of the art

Module ODN/UnifiedViews will use as its base the tool UnifiedViews (https://github.com/UnifiedViews). It is an ETL framework with a native support for transforming RDF data. UnifiedViews allows users to define, execute, monitor, debug, schedule, and share data transformation tasks.

UnifiedViews was originally developed as a student project at Charles University in Prague and now it is maintained by Semantica.cz, Czech Republic, Semantic Web Company, Austria, and EEA, Slovak Republic.

UnifiedViews allows users to define and adjust data processing tasks (pipelines) using a graphical user interface (see Figure below); the core components of every data processing task are data processing units (DPUs). DPUs may be drag&dropped on the canvas where the data processing task is constructed. Data flow between two DPUs is denoted as an edge on the canvas; a label on the edge clarifies which outputs of a DPU are mapped to which inputs of another DPU. UnifiedViews natively supports exchange of RDF data between DPUs; apart from that, files may be exchanged between DPUs.

unifiedViews-ui.png

UnifiedViews takes care of task scheduling. Users can plan executions of data processing tasks (e.g., tasks are executed at a certain time of the day) or they can start data processing tasks manually. UnifiedViews scheduler ensures that DPUs are executed in the proper order, so that all DPUs have proper required inputs when being launched.

A user may configure UnifiedViews to get notifications about errors in the tasks' executions; user may also get daily summaries about the tasks executed.

To simplify the process of defining data processing tasks and to help users analyze errors during data processing task executions, UnifiedViews provides users with debugging capabilities. Users may browse and query (using the SPARQL query language) the RDF inputs to and RDF outputs from any DPU.

UnifiedViews framework also allows users to create custom plugins - data processing units (DPUs). Users can also share DPUs with others together with their configurations or use DPUs provided by others.
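
To illustrate the plugin concept only, a trivial DPU performing a simple cleansing step could be sketched as below. The interface and class names here are made-up placeholders for illustration, not the actual UnifiedViews plugin API; for the real API please refer to the UnifiedViews documentation and the dataunit modules mentioned in the next section.

    // Illustrative sketch only: simplified placeholder types, NOT the real UnifiedViews plugin API.
    import java.util.ArrayList;
    import java.util.List;

    /** Hypothetical minimal contract of a data processing unit (DPU). */
    interface DataProcessingUnit {
        /** Transforms records read from the input data unit into records for the output data unit. */
        List<String> execute(List<String> inputRecords) throws Exception;
    }

    /** Example DPU performing a trivial cleansing step: trim whitespace and drop empty records. */
    public class TrimRecordsDpu implements DataProcessingUnit {
        @Override
        public List<String> execute(List<String> inputRecords) {
            List<String> output = new ArrayList<String>();
            for (String record : inputRecords) {
                String cleaned = record.trim();
                if (!cleaned.isEmpty()) {
                    output.add(cleaned);
                }
            }
            return output;
        }
    }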

The technical structure and licensing of UnifiedViews allow DPUs to be licensed not just as Open Source, but also under a proprietary license. This is a planned feature of the tool needed for use cases involving commercial exploitation. ODN will support the same commercial use cases.

5.1.1.1. UnifiedViews components and dependencies

The figure below depicts the current Maven modules in UnifiedViews and their dependencies. Modules in the yellow box are visible to DPU developers. The most important modules are:

  • frontend - Management GUI of UnifiedViews

  • backend - Engine running the data transformation tasks

  • commons-app - DAO & Services module, which is common to frontend and backend modules; it is used to store configuration for pipelines, DPUs, pipeline executions etc.

  • dataunit-rdf, dataunit-file - Modules with interfaces for data units; DPU developers writing new DPUs use these modules to read data from input data units and write data to output data units

uv-ComponentModel.png

 

5.1.2. Structure of the ODN/UnifiedViews and its context:

odn-uv-structure.png

 

ODN/UnifiedViews comprises the following important components:

  • DAO & Service - used to access the database where the configuration of ETL tasks and their executions is stored (realized by the commons-app module in the figure in Section 5.1.1.1)

  • HTTP REST Transformation API - services from the DAO & Services layer exposed as HTTP REST methods, used by the ODN/Management module (this component is not realized by any module in the figure in Section 5.1.1.1)

  • Data Processing Engine - robust engine running the manually launched or scheduled transformation tasks; transformations may include data cleansing, linking, integration and quality assessment (realized by the backend module in the figure in Section 5.1.1.1)

  • Management GUI - GUI used to manage the configuration of pipelines, debug executions, etc. (realized by the frontend module in the figure in Section 5.1.1.1)

 

5.1.3. Interaction with other modules

1. ODN/UnifiedViews loads the transformed data to ODN/Storage. Special DPUs - the RDF data mart loader and the Tabular data mart loader - must be provided to load transformed data into the corresponding data store in ODN/Storage. The data must be stored there together with metadata, so that the ODN/Publication module knows which resources (tables, graphs) are associated with which pipeline/dataset.

2. ODN/UnifiedViews will provide a RESTful management API, which will be used by ODN/Management to:

  • create new data transformation task (pipeline)

  • configure existing pipeline and get configuration of the pipeline

  • delete the pipeline

  • execute the pipeline

  • schedule the pipeline

An excerpt of the methods which will be available to ODN/Management via the RESTful API is depicted below:

odn-uv-HTTP REST Transformation API.png
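
As an illustration, the sketch below shows how ODN/Management could trigger a pipeline execution over such a RESTful API using plain JDK HTTP facilities. The host, port, endpoint path and JSON payload are hypothetical placeholders; the real method names and URLs are those defined by the API excerpt above.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class PipelineExecutionClient {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint - the actual path is defined by the ODN/UnifiedViews REST API.
            URL url = new URL("http://localhost:8080/unifiedviews/api/pipelines/42/executions");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);

            // Hypothetical request body asking for a normal (non-debug) execution of pipeline 42.
            byte[] body = "{\"debugging\": false}".getBytes(StandardCharsets.UTF_8);
            OutputStream out = conn.getOutputStream();
            out.write(body);
            out.close();

            System.out.println("HTTP status: " + conn.getResponseCode());
            conn.disconnect();
        }
    }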

 

3. Management GUI of ODN/UnifiedViews is used by ODN/Management to:

  • show the pipeline detail in an expert mode (user may drag&drop DPUs, fine-tune pipeline configuration)

  • show the detailed results of pipeline executions (browse events/logs)

  • debug data being passed between DPUs

  • have access to advanced scheduling options

5.2. Module ODN/Storage

The purpose of this module is to store the transformed data produced by ODN/UnifiedViews. ODN/Publication module uses ODN/Storage to get the transformed data, so that it can be published - provided to data consumers.

 

5.2.1. Structure of the ODN/Storage and its context

odn-storage-structure.png

Two important components of ODN/Storage are:

  • RDBMS data mart

  • RDF data mart

5.2.1.1. RDBMS data mart

RDBMS data mart is a tabular data store, where data is stored when data publisher wants to prepare CSV dumps of the published dataset or provide REST API for data consumers.

ODN/Storage will use SQL relational database (such as MySQL, PostgreSQL, etc.) for storing tabular data.

Every transformation pipeline can contain one or more Tabular data mart loaders - DPUs which load data resulting from the transformation pipeline to the RDBMS data mart. Every loader loads data into a single table. The name of the table is prepared by ODN/UnifiedViews and is based on the dataset ID and the ID of the tabular data mart loader DPU.

Since every published dataset may require more than one transformation pipeline, and not all results of every transformation pipeline should be published by the ODN/Publication module, the data publisher may decide which tables should be published by (1) manually specifying all the tables which should be published or (2) specifying that all results of a certain transformation pipeline should be published.

To support the above feature, data being stored to the RDBMS data mart must be associated with metadata holding at least, for every table:

  • the dataset to which the table belongs

  • the transformation pipeline which produced the table

Note: Currently, UnifiedViews supports Openlink Virtuoso (http://virtuoso.openlinksw.com/) as the only RDBMS implementation. As part of ODN, we will employ JDBC to add support for a wider range of databases. Testing and validation will be done based on feedback from users (currently we plan to work also with PostgreSQL).
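
A minimal sketch of how a Tabular data mart loader could record the required table metadata over JDBC is shown below. The connection URL, credentials, table name pattern and the odn_table_metadata schema are assumptions for illustration, not a fixed ODN schema (a PostgreSQL JDBC driver is assumed to be on the classpath).

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class TabularMartMetadataExample {
        public static void main(String[] args) throws Exception {
            // Illustrative connection details only.
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/odn_storage", "odn", "secret");

            long datasetId = 17L;
            long loaderDpuId = 3L;
            long pipelineId = 5L;

            // Table name derived from the dataset ID and the loader DPU instance ID.
            String tableName = "ds_" + datasetId + "_dpu_" + loaderDpuId;

            // Record which dataset and pipeline produced the table (hypothetical metadata table).
            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO odn_table_metadata (table_name, dataset_id, pipeline_id) VALUES (?, ?, ?)");
            ps.setString(1, tableName);
            ps.setLong(2, datasetId);
            ps.setLong(3, pipelineId);
            ps.executeUpdate();

            ps.close();
            conn.close();
        }
    }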

5.2.1.2. RDF data mart

Data is stored in the RDF data mart when the data publisher wants to prepare RDF dumps of the published dataset for data consumers or provide a SPARQL endpoint on top of the published dataset.

Every transformation pipeline can contain one or more RDF data mart loaders - DPUs which load data resulting from the transformation pipeline to the RDF data mart. Every RDF data mart loader loads data to a single RDF graph. An RDF graph represents a context for RDF triples; a graph is a collection of RDF triples produced by one RDF data mart loader. The name of the RDF graph is prepared by ODN/UnifiedViews and is based on the dataset ID and the ID of the RDF data mart loader DPU.

Since every published dataset may require more than one transformation pipeline, and not all results of every transformation pipeline should be published by the ODN/Publication module, the data publisher may decide which RDF graphs should be published by (1) manually specifying all the graphs which should be published or (2) specifying that results of a certain transformation pipeline should be published.

To support the above feature, data being stored to the RDF data mart must be associated with metadata holding at least, for every RDF graph:

  • the dataset to which the graph belongs

  • the transformation pipeline which produced the graph

Note: Currently, UnifiedViews supports Openlink Virtuoso (http://virtuoso.openlinksw.com/) and Sesame (http://www.openrdf.org/) as RDF data mart implementations. As part of ODN, we will employ the SAIL API to add support for a wider range of triplestores. Testing and validation will be done based on feedback from users.
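
The sketch below illustrates, using the Sesame openRDF API, how an RDF data mart loader could write triples into a named graph whose URI encodes the dataset and loader DPU IDs. The graph URI pattern and the in-memory store are illustrative assumptions; in ODN/Storage the repository would be the configured triplestore (e.g. Virtuoso or Sesame).

    import org.openrdf.model.URI;
    import org.openrdf.model.ValueFactory;
    import org.openrdf.repository.Repository;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.sail.SailRepository;
    import org.openrdf.sail.memory.MemoryStore;

    public class RdfMartLoaderExample {
        public static void main(String[] args) throws Exception {
            // In-memory store used here only for illustration.
            Repository repo = new SailRepository(new MemoryStore());
            repo.initialize();

            RepositoryConnection conn = repo.getConnection();
            ValueFactory vf = conn.getValueFactory();

            // Hypothetical graph URI derived from the dataset ID and the RDF data mart loader DPU ID.
            URI graph = vf.createURI("http://example.org/odn/dataset/17/dpu/3");

            // One sample triple stored in that named graph (context).
            conn.add(vf.createURI("http://example.org/resource/1"),
                     vf.createURI("http://purl.org/dc/terms/title"),
                     vf.createLiteral("Sample record"),
                     graph);

            System.out.println("Triples in graph: " + conn.size(graph));
            conn.close();
            repo.shutDown();
        }
    }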

5.2.2. Interaction with other modules

1. Every transformation pipeline (ODN/UnifiedViews) can contain one or more RDF/RDBMS data mart loaders - DPUs, which load data resulting from the transformation pipeline to the corresponding data mart (RDF/RDBMS).

2. ODN/Storage notifies ODN/Publication about changes which happened (dataset updates, etc.) so that ODN/Publication can adapt to the changes.

3. ODN/Publication uses the data marts to get the required graphs/tables to be published (exported as RDF/CSV dumps, made available via REST API/SPARQL Endpoint). ODN/Publication selects the relevant graphs/tables based on the data publisher's preferences and the metadata associated with tables/graphs.

4. ODN/Management may query ODN/Storage to get statistics about the stored data, at least:

  • How many RDF graphs/tables are stored in the RDF/RDBMS data mart in total/for the given dataset ID?

  • How many RDF triples are stored in a certain RDF graph in the RDF data mart?

  • How many records are in a certain table in the RDBMS data mart?

5.3. Module ODN/Publication

Module responsible for publishing data via REST APIs, SPARQL endpoint or as data dumps in RDF or CSV formats. Published data is already transformed as defined by data transformation pipelines in ODN/UnifiedViews and stored in ODN/Storage.

The module allows data administrators/publishers to select how the published datasets are provided to data consumers; in particular, ODN/Publication module allows users to select:

  • publication of the dumps (CSV for tabular data, RDF for linked data),

  • publication via API (SPARQL Endpoint for RDF data, REST API for tabular data).

Data administrators/publishers may also configure some specific settings per each publication option: to tweak dump generation process (like which RDF serialization to use: Turtle, XML, etc.), to select which resources (tables, graphs) associated with the transformed dataset (and stored in ODN/Storage) should be published - made available to data consumers, etc.

 

5.3.1. Structure of the ODN/Publication and its context

odn-publication-structure.png

ODN/Publication comprises the following important components:

  • DAO & Service layer - used to access database where configuration and results of publication tasks are stored

  • Publication Management API - called by ODN/Management when a certain dataset should be published or when certain methods of data consumption (REST API, SPARQL Endpoint, dumps) should be enabled or disabled

  • Publication Engine - module, which is responsible for:

    • creating dumps for the given dataset

    • configuring SPARQL endpoint/REST API for the given dataset

  • Management GUI - GUI used to manage the configuration of the ODN/Publication module

Note: As part of data publication, some metadata will be published by this module too (for example, “Last Modification Time” will be included in the appropriate HTTP header in the response). But publication of metadata is mainly the responsibility of ODN/PublicCatalog (see Section 5.5).

5.3.2. File dumps

The ODN/Publication module supports creation of file dumps in CSV or RDF formats. When a dataset is transformed, it is subsequently published. As part of publishing the transformed dataset, a CSV or RDF dump may be created. The dump in the CSV/RDF format is created if the data publisher decides so.

To create the dump, the ODN/Publication module exports the desired data from ODN/Storage. Afterwards, the dump is versioned using Git (http://git-scm.com/). Git allows data consumers to work with the latest or any previous version of the dataset. ODN/Publication also publishes metadata of the dump, which is obtained from ODN/InternalCatalog.

Finally, a new entry in the Atom feed (http://en.wikipedia.org/wiki/Atom_(standard)) associated with the processed dataset is created; such a feed points data consumers to the file(s) in the Git repository where the published data and metadata are. The feed must be reachable from the dataset record in the ODN/PublicCatalog module.
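
One possible way to version a freshly exported dump with Git is simply to invoke the git command line from Java, as sketched below; the repository path, file name and commit message are assumptions, not a prescribed ODN implementation (a Java Git library such as JGit could be used instead).

    import java.io.File;

    public class DumpVersioningExample {
        public static void main(String[] args) throws Exception {
            File repoDir = new File("/var/odn/dumps/dataset-17");   // illustrative repository path

            // Stage the regenerated dump and commit it, creating a new version in the history.
            run(repoDir, "git", "add", "dataset-17.nt");
            run(repoDir, "git", "commit", "-m", "Updated dump of dataset 17");
        }

        private static void run(File workDir, String... command) throws Exception {
            Process p = new ProcessBuilder(command)
                    .directory(workDir)
                    .inheritIO()
                    .start();
            if (p.waitFor() != 0) {
                throw new IllegalStateException("Command failed: " + java.util.Arrays.toString(command));
            }
        }
    }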

5.3.2.1. RDF dumps

RDF dump may be published only if the result of the dataset transformation is available in RDF data mart in ODN/Storage.

To create the dump, ODN/Publication queries the RDF data mart via a SPARQL CONSTRUCT query to get the dump in the N-Triples (http://www.w3.org/TR/2014/REC-n-triples-20140225/) RDF serialization format. We use N-Triples as the RDF serialization format because it is a line-oriented format which may be easily versioned by Git.
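
A minimal sketch of this export step using the Sesame openRDF API is shown below; the SPARQL endpoint URL, graph URI and output file name are placeholders for illustration.

    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import org.openrdf.query.GraphQuery;
    import org.openrdf.query.QueryLanguage;
    import org.openrdf.repository.Repository;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.sparql.SPARQLRepository;
    import org.openrdf.rio.RDFFormat;
    import org.openrdf.rio.RDFWriter;
    import org.openrdf.rio.Rio;

    public class RdfDumpExportExample {
        public static void main(String[] args) throws Exception {
            // Illustrative SPARQL endpoint of the RDF data mart.
            Repository repo = new SPARQLRepository("http://localhost:8890/sparql");
            repo.initialize();
            RepositoryConnection conn = repo.getConnection();

            // CONSTRUCT everything from the graph holding the published dataset (placeholder URI).
            GraphQuery query = conn.prepareGraphQuery(QueryLanguage.SPARQL,
                    "CONSTRUCT { ?s ?p ?o } WHERE { GRAPH <http://example.org/odn/dataset/17/dpu/3> { ?s ?p ?o } }");

            // Serialize the result as N-Triples, a line-oriented format that versions well in Git.
            OutputStream out = new FileOutputStream("dataset-17.nt");
            RDFWriter writer = Rio.createWriter(RDFFormat.NTRIPLES, out);
            query.evaluate(writer);
            out.close();

            conn.close();
            repo.shutDown();
        }
    }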

5.3.2.2. CSV dumps

A CSV dump may be published only if the result of the dataset transformation is available in the RDBMS data mart in ODN/Storage.

To create the dump, the ODN/Publication module exports the desired table in the RDBMS data mart as a CSV dump.

5.3.3. SPARQL endpoint

The ODN/Publication module supports publication of data via SPARQL endpoints. When a dataset is transformed, it is subsequently published. As part of publishing the transformed dataset, the data may be made available via a SPARQL endpoint. Data is made available via the SPARQL endpoint only if the data publisher decides so and only if the result of the dataset transformation is available in the RDF data mart.

To make the data available via the SPARQL endpoint, the ODN/Publication module provides data consumers with a simple querying interface, where the data consumer may query the published data and associated metadata (obtained from ODN/InternalCatalog) using SPARQL queries. There is no versioning in this case; only the latest data is available via the SPARQL endpoint.
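
For data consumers, the published SPARQL endpoint can be queried with any SPARQL client; the sketch below uses the Sesame openRDF API, with the endpoint URL being a placeholder for the address advertised in the public data catalog.

    import org.openrdf.query.BindingSet;
    import org.openrdf.query.QueryLanguage;
    import org.openrdf.query.TupleQuery;
    import org.openrdf.query.TupleQueryResult;
    import org.openrdf.repository.Repository;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.sparql.SPARQLRepository;

    public class SparqlConsumerExample {
        public static void main(String[] args) throws Exception {
            // Placeholder endpoint URL - the real one is advertised in the public data catalog.
            Repository repo = new SPARQLRepository("http://odn.example.org/sparql");
            repo.initialize();
            RepositoryConnection conn = repo.getConnection();

            TupleQuery query = conn.prepareTupleQuery(QueryLanguage.SPARQL,
                    "SELECT ?s ?title WHERE { ?s <http://purl.org/dc/terms/title> ?title } LIMIT 10");

            TupleQueryResult result = query.evaluate();
            while (result.hasNext()) {
                BindingSet row = result.next();
                System.out.println(row.getValue("s") + " - " + row.getValue("title"));
            }
            result.close();

            conn.close();
            repo.shutDown();
        }
    }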

5.3.4. REST API

The ODN/Publication module supports creation of REST APIs for data consumption. When a dataset is transformed, it is subsequently published. As part of publishing the transformed dataset, a REST API may be generated for the published data. The REST API is generated if the data publisher decides so and only if the result of the dataset transformation is available in the RDBMS data mart.

API is based on “Representational state transfer” software architectural style (https://en.wikipedia.org/wiki/Representational_State_Transfer) and - for the purpose of Open Data - will provide read-only functionality: Users will be able to get the data from datasets using HTTP protocol, getting results in JSON, XML, CSV or RDF formats based on their preference.

The API is intended to be used by programmers or similarly skilled users who can develop software or scripts. But given the simple nature of this kind of API, even a casual user can work with it using a common web browser.

There is no versioning in this case; only the latest data is available via the REST API.
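
A data consumer might call such a REST API as sketched below; the URL and query parameters are hypothetical, while the general pattern (HTTP GET with content negotiation via the Accept header) is what the module is expected to provide.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class RestApiConsumerExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical resource URL of a published tabular dataset.
            URL url = new URL("http://odn.example.org/api/datasets/budget-2014/records?limit=100");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Accept", "application/json");  // JSON, XML or CSV per preference

            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
            reader.close();
            conn.disconnect();
        }
    }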

5.3.5. Dataset replication

Automated, efficient distribution of updated data and metadata will be achieved by careful implementation of the two main methods mentioned earlier, i.e. file dumps and REST API, complemented with a third option based on Git.

The first two options are generic and interoperable: they will work regardless of the exact tool being used to replicate the data. At one end there will be ODN, at the other end it can be anything.

The third, Git-based option is somewhat proprietary: technically based on open formats and protocols, but limited to smaller/niche audiences.

Note: There is also a possibility of a fourth option based on a combination of file dumps and peer-to-peer technologies (like BitTorrent). As of now we do not register demand for that, so it is not in the scope of the development.

5.3.5.1. Via file dumps

Proper publishing of file dumps, along with increments and Atom feeds, combined with proper usage of features of HTTP protocol (cache related headers, range requests, if-modified-since headers etc.) is one option.

5.3.5.2. Via REST API

REST API is another option, but that requires presence of “last modified” (or similar) fields within datasets at the line/record level.


5.3.5.3. Via Git

The third option is to take advantage of Git versioning (see Section 5.3.2. File dumps):

  • ‘git clone’ can be used to get a first copy of data

  • ‘git pull’ can be used repeatedly to obtain subsequent updates

This method takes advantage of a lot of existing software and infrastructure, mainly the Git versioning tool and for example GitHub (or GitHub-like) repositories, and is most suitable for software developers and the subset of data analysts who already use such tools.

5.3.6. Interaction with other modules

1. ODN/Management initiates any publication process via Publication API of ODN/Publication. ODN/Publication module uses ODN/Storage to get the data which should be published.

2. ODN/Management uses Management GUI of ODN/Publication to set up the settings for creation of CSV/RDF dumps, settings for generating REST APIs, settings for preparing SPARQL endpoint.

3. ODN/Publication reacts to notifications from ODN/Storage by, for example, recreating file dumps or invalidating cached information for updated datasets.

4. Data consumers may (1) download CSV/RDF dumps, (2) use SPARQL endpoints, (3) use REST APIs.

5.4. Module ODN/InternalCatalog

Module ODN/InternalCatalog is the first and main module with which data publishers interact while working with ODN. It encapsulates the functionality of a data catalog, but this functionality is used to manage datasets which should be transformed/published by ODN; it also allows data publishers to see details about the transformation/publishing process. It is an internal catalog, thus it is not visible to the public; only the data publisher/data administrator can use it.

5.4.1. Interaction with other modules

Pipelines in ODN/UnifiedViews can create and update resources (metadata and data) in datasets maintained in ODN/InternalCatalog (based on associations between datasets and pipelines).

ODN/InternalCatalog automatically replicates data and metadata for datasets and resources into ODN/PublicCatalog, but (of course) only in cases when datasets are marked as "public" (i.e. suitable and intended for publication to data users).

5.5. Module ODN/PublicCatalog

ODN/PublicCatalog is the second module which encapsulates the functionality of data catalog. ODN/PublicCatalog holds metadata about each dataset, which is published by ODN. This data catalog is publicly visible, the primary users of this catalog are data consumers, who may browse/search the published datasets’ metadata; data consumer may also get a link to the dataset’s dump or API, so that they can consume the data in the dataset.

ODN/PublicCatalog also implements the REST API for datasets where the data publisher chose to do so. This is somewhat "not nice" from the design perspective (as it blurs the separation between ODN/Publication = data and ODN/PublicCatalog = metadata), but it is practical (it allows ODN to use the REST API features implemented in CKAN, avoiding a duplicate implementation).
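
Since ODN/PublicCatalog builds on CKAN, dataset metadata can be retrieved through CKAN's standard action API; the sketch below calls the package_show action for a hypothetical dataset name on a hypothetical catalog host.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class CatalogMetadataExample {
        public static void main(String[] args) throws Exception {
            // CKAN action API; host and dataset name are placeholders.
            URL url = new URL("http://catalog.example.org/api/3/action/package_show?id=budget-2014");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty("Accept", "application/json");

            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
            StringBuilder json = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                json.append(line);
            }
            reader.close();

            // The response is a JSON document with dataset metadata and its resources (dumps, APIs).
            System.out.println(json);
        }
    }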

5.5.1. Interaction with other modules

ODN/InternalCatalog is replicating datasets marked as public into this module.

As part of the publication functionality, ODN/PublicCatalog may link to ODN/Storage (specifically to the Virtuoso-based SPARQL endpoint GUI) or to ODN/Publication (links to file dumps served from ODN/Storage by Apache).

 

5.6. Module ODN/Management

The module is responsible for management of all components which form the so-called internal part of ODN (i.e. ODN/InternalCatalog and ODN/UnifiedViews, including also ODN/Management itself). As each of those components has its own web GUI which requires authentication, this module mainly provides:

  1. user management - via usage of midPoint (https://evolveum.com/midpoint/)
  2. Single Sign-On (SSO) - using CAS (https://www.apereo.org/)

TODO: Further details.

 


6. Development

The main target development platform is Java. ODN/Publication may use a PHP tool for generating APIs. Similarly, ODN/InternalCatalog and ODN/PublicCatalog use the already existing Python based tool CKAN to provide the data catalog functionality.

ODN must be able to run on both Linux and Windows operating systems.

6.1. Development process

Source code related to ODN is kept on GitHub under the project https://github.com/OpenDataNode. Specific repositories are created for particular modules.

Certain modules are developed under separate GitHub projects and are reused in ODN. Any changes to those modules are requested using issues in the particular project or as a pull request to the proper repository on GitHub.

In exceptional cases we can fork the original repository (mainly when the original repository doesn't accept a pull request), but the main goal will be to merge the changes back later.

For the whole ODN, we will follow the guidelines for sustainable development proposed for developing ODN/UnifiedViews (https://grips.semantic-web.at/display/UDDOC/Guidelines+for+Contributors).

6.2. Used technologies

There are two main technological stacks in the ODN, derived from re-use of UnifiedViews and CKAN in the project:

  1. Java based stack - for ODN/UnifiedViews, ODN/Management, etc.
  2. Python based stack - for ODN/InternalCatalog and ODN/PublicCatalog

6.2.1. Java stack

Java based technologies are used for the majority of ODN modules. Each module has its own sub-set of technologies, frameworks, libraries and tools. Some of those are mentioned here, but for more concrete and up-to-date information, please refer to homepages and documentation of the upstream projects (TODO: add Wiki page listing all main upstream projects and link it here).

6.2.1.1. Spring

Spring is an open source application framework and inversion of control container for Java platform. The core features of the Spring Framework can be used by any Java application, but there are extensions for building web applications on top of the Java EE (enterprise) platform. Spring will be used in all ODN modules implemented in Java.

6.2.1.2. Vaadin

Vaadin is a Java open source framework for building web applications. The framework incorporates event-driven programming and widgets, which enables a programming model that is closer to GUI software development than traditional web development with HTML and JavaScript. Developers may write pure Java code and the framework ensures the communication between the server side code (application server) and the client side code (browser); the client side Java code is automatically translated to JavaScript. Vaadin will be used in ODN/UnifiedViews, ODN/Management and ODN/Publication for implementing the Management GUI. ODN/InternalCatalog and ODN/PublicCatalog will be based on an existing tool - CKAN - and its interface.
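
A minimal Vaadin UI illustrating the server-side, event-driven programming model looks roughly as follows (Vaadin 7 style; the class and its content are illustrative, not taken from the ODN code base).

    import com.vaadin.server.VaadinRequest;
    import com.vaadin.ui.Button;
    import com.vaadin.ui.Label;
    import com.vaadin.ui.Notification;
    import com.vaadin.ui.UI;
    import com.vaadin.ui.VerticalLayout;

    /** Minimal Vaadin 7 UI: pure Java, no hand-written HTML or JavaScript. */
    public class HelloUI extends UI {
        @Override
        protected void init(VaadinRequest request) {
            VerticalLayout layout = new VerticalLayout();
            layout.addComponent(new Label("Hello from Vaadin"));

            // Event-driven model: the click listener runs in server side code.
            layout.addComponent(new Button("Run pipeline", new Button.ClickListener() {
                @Override
                public void buttonClick(Button.ClickEvent event) {
                    Notification.show("Pipeline execution requested");
                }
            }));

            setContent(layout);
        }
    }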

6.2.1.3. Sesame openRDF

Sesame openRDF is an open source framework for processing RDF data. This includes parsing, storing, inferencing and querying of/over such data. It offers an easy-to-use API that can be connected to all leading RDF storage solutions. It allows connecting to SPARQL endpoints and creating applications that leverage the power of linked data and the Semantic Web. OpenRDF is used in ODN/UnifiedViews, ODN/Storage and ODN/Publication to work with the RDF database.

6.2.1.4. EclipseLink

EclipseLink is the open source Eclipse Persistence Services Project. The software provides an extensible framework that allows Java developers to interact with various data services, including databases, web services, Object XML mapping (OXM), and Enterprise Information Systems (EIS). EclipseLink is used to persist data objects in ODN/UnifiedViews.

6.2.1.5. XStream

XStream is a simple library to serialize objects to XML and back. XStream uses reflection to discover the structure of the object graph to serialize at run time and does not require modifications to objects. It can serialize internal fields, including private and final, and supports non-public and inner classes. XStream is used in ODN/UnifiedViews to store pipeline configurations.
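
The way XStream round-trips a configuration object to XML and back can be sketched as follows; the DpuConfig class is a made-up example, not an actual UnifiedViews configuration class.

    import com.thoughtworks.xstream.XStream;

    public class XStreamConfigExample {

        /** Made-up configuration object standing in for a real DPU configuration. */
        public static class DpuConfig {
            String sourceUrl = "http://example.org/data.csv";
            int batchSize = 1000;
        }

        public static void main(String[] args) {
            XStream xstream = new XStream();
            xstream.alias("dpuConfig", DpuConfig.class);

            // Serialize the configuration to XML (e.g. for storing it in the database)...
            String xml = xstream.toXML(new DpuConfig());
            System.out.println(xml);

            // ...and restore it later without any changes to the configuration class.
            DpuConfig restored = (DpuConfig) xstream.fromXML(xml);
            System.out.println(restored.sourceUrl + ", batch size " + restored.batchSize);
        }
    }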

6.2.1.6. KineticJS

KineticJS is an HTML5 Canvas JavaScript framework that enables high performance animations, transitions, node nesting, layering, filtering, caching, event handling for desktop and mobile applications.

You can draw things onto the stage, add event listeners to them, move them, scale them, and rotate them independently from other shapes to support high performance animations, even if your application uses thousands of shapes.

6.2.1.7. OSGI framework

ODN/UnifiedViews must support easy and smooth extension with custom DPUs added into the running application. Every DPU may use its own set of libraries. These libraries must not be in conflict. To ensure that, we use the OSGI framework, in particular the Apache Felix implementation (http://felix.apache.org/).
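
Launching an OSGI framework and installing a DPU bundle into the running application can be sketched with the standard OSGi launch API (implemented by Apache Felix); the bundle path is a placeholder and the Felix framework jar is assumed to be on the classpath.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.ServiceLoader;
    import org.osgi.framework.Bundle;
    import org.osgi.framework.launch.Framework;
    import org.osgi.framework.launch.FrameworkFactory;

    public class DpuBundleLoaderExample {
        public static void main(String[] args) throws Exception {
            // Obtain the framework implementation present on the classpath (Apache Felix in ODN).
            FrameworkFactory factory = ServiceLoader.load(FrameworkFactory.class).iterator().next();

            Map<String, String> config = new HashMap<String, String>();
            Framework framework = factory.newFramework(config);
            framework.start();

            // Install and start a DPU packaged as an OSGi bundle (path is a placeholder).
            Bundle dpuBundle = framework.getBundleContext()
                    .installBundle("file:/opt/odn/dpus/example-dpu-1.0.0.jar");
            dpuBundle.start();

            System.out.println("Installed bundle: " + dpuBundle.getSymbolicName());

            framework.stop();
            framework.waitForStop(0);
        }
    }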

6.2.2. Python stack

TODO: Add some more details about Python itself (version supported, etc.) and frameworks used in CKAN, etc.

6.3. Development tools

6.3.1. Maven

Maven (http://maven.apache.org/) is used for management of dependencies in the source code, portability between Java IDEs and easy application build.

6.3.2. Git + GitFlow

Git (http://git-scm.com/) is a version control system used for source code version control management and tracking changes.

GitFlow (https://github.com/nvie/gitflow) is a collection of Git extensions to provide high-level repository operations for Vincent Driessen's branching model (http://nvie.com/git-model). A nice overview of the whole workflow is available at http://danielkummer.github.io/git-flow-cheatsheet/

SourceTree (http://www.sourcetreeapp.com/) is the recommended GUI client (available for MS Windows and Mac OS X), as it has good support for the defined GitFlow workflow. For Linux, the command line has to be used to follow the GitFlow workflow.

6.3.3. IDE for developers

The project is not strictly bound to a specific IDE, but Eclipse (http://www.eclipse.org/), NetBeans (https://netbeans.org/) and IntelliJ IDEA (http://www.jetbrains.com/idea/) are the recommended IDEs for developing Java applications.

Eclipse and NetBeans also have plugins for development of other types of applications (for example PHP/Drupal based).

7. Deployment

Firstly, we describe basic deployment scenarios of Open Data Node, considering Open Data Node as a single unit. Afterwards, we discuss deployment of ODN’s modules.

7.1. Basic ODN Deployment Scenarios

Open Data Node can be deployed many times by many actors. ODN can help with needs specific to each particular actor:

  • government organizations, municipalities, etc., want to publish majority of their information as Open Data

  • other government bodies need to work with some data published by other government bodies

  • non-profits and application developers want to run specific tasks using copies of official data, for example analytic and visualization applications, data integration, etc.

 

This section contains schemes for basic deployment options for ODN. Options are sorted based on achievable publishing quality from best (and most expensive) at the top to worst (and cheapest) at the bottom.

7.1.1. Tight integration, at the publisher's premises


Open Data Node is tightly integrated with publisher's internal application(s) - it has direct access to backend databases or is integrated into application(s) workflows via API. ODN is deployed alongside publisher's internal application(s), it has to respect network, security and other zones.

Typical scenario: 

  • publisher wants to achieve high quality and efficiency and is willing to invest more

Prerequisites:

  • publisher is willing to update existing workflows and applications

7.1.2. Loose integration


Open Data Node is integrated with publisher's application(s) in a loose way using some periodical data dumps or APIs. ODN can be deployed in several locations:

  • at publisher's premises - access to the data is secured in a similar way as in the case of tight integration in Section 7.1.1

  • at collocated housing, for example data center shared by multiple government organizations - access to the source data secured for example using combination of IPsec, HTTPS and access controls (authentication and authorization)

  • in the cloud - access to the source data secured with just HTTPS and access control (not suitable for sensitive data or sensitive internal systems)

Typical scenario:

  • publisher wants to achieve high quality and efficiency but has limited resources so changes to existing infrastructure and applications have to be limited

  • aggregator is affiliated with one or more publishers and is willing and able to invest into tighter integration with them (ministry aggregating data from municipalities or SME planning to make business using aggregated data)

Prerequisites:

  • publisher is able to do minor modifications to existing workflows and applications

7.1.3. No integration, deployed at 3rd parties


Open Data Node is not explicitly integrated with the publisher's applications; other existing means are used to get access to the data (either Open Data, another format or an API; in the worst case, scraping of data from the website). ODN is deployed at a 3rd party using their own hardware, collocated housing, or the cloud.

Typical scenario:

  • publisher wants to publish Open Data but has severely limited resources or options, so preferably no changes to existing infrastructure and applications should be required

  • 3rd party aggregator or application developer wants to use data from one or more publishers but for some reason is not able or willing to implement tighter integration with them

Prerequisites:

  • some usable form of access to source data is possible without changing existing workflows and applications



7.2. Deployment of ODN Modules

7.2.1. Single Machine

ODN modules may be deployed all on one machine as depicted below. The deployment requires:

  • Application Server, which supports Java 7 EE applications. ODN will be tested on Apache Tomcat 7+. Application server is needed for management GUIs of ODN modules.

  • Relational database management system (RDBMS) - A relational database is used mainly for storing configurations of the modules (definitions of data transformation and publishing tasks, configuration of data catalogs, etc.); in these cases, a relational database is preferred, because the schema of the configurations is known in advance and should not change much in the future. A relational database is also used by ODN/Storage to store tabular data produced by ODN/UnifiedViews. The decision about the particular database management system is still pending; however, due to object relational mapping frameworks, which abstract the underlying database system, changing the database system during the work on the project is easy and straightforward.

  • RDF Storage - We may use any RDF store, which is supported by openRDF API (http://www.openrdf.org/), e.g., Openlink Virtuoso (http://virtuoso.openlinksw.com/rdf-quad-store/) or Sesame (http://www.openrdf.org/). RDF store is used by ODN/UnifiedViews to store intermediate results of the data transformations and also by ODN/Storage to store the RDF data produced by ODN/UnifiedViews.

  • HTTP Server - ODN will be tested with Apache HTTP Server. HTTP Server is required for ODN/InternalCatalog, ODN/PublicCatalog and ODN/Publication.

odn-deploy-single.png

 

7.2.2. Distributed Environment

ODN modules support distributed deployment across multiple physical devices. Basically, every artifact depicted in the figure above may be deployed on a different device. If the ODN/UnifiedViews Engine and the ODN/UnifiedViews Management GUI are placed on different devices, the administrator has to set up a shared network file system that both these artifacts may use.

Typically, we expect such deployment in large organizations with more complicated IT architecture driven by (among other things) security requirements. In this case:

  • ODN/UnifiedViews is expected to be deployed within an internal, restricted segment along with ODN/Management,

  • ODN/Publication is expected to be in a DMZ segment accessible from the outside by the general public (like a typical webserver) and

  • ODN/Storage is expected to be in an internal or other appropriate segment, reachable by both ODN/UnifiedViews and ODN/Publication

7.2.3. Custom Environment

As the design of ODN is modular (driven in particular by our vision, plans and engineering experience, but in general also by best practices in software development and Open Source development), individual users will have the possibility not just to move the individual modules between multiple machines, but also to skip (not deploy) certain modules if they do not need them.

For example, ODN/InternalCatalog and ODN/PublicCatalog may be skipped, in exchange for direct integration between ODN/UnifiedViews and ODN/Publication with the particular national data catalog.

Note: Given proper modular design and implementation, exact options and combinations are wide and depend strongly on particular user and his needs and use cases, so we are not able to (and will not) document all possible options here. Some most common cases will be later explained in ODN documentation and COMSODE Methodology.

8. Licensing

Open Data Node as a whole is Free and Open Source software (see https://en.wikipedia.org/wiki/Free_and_open-source_software).

As it re-uses many existing free and open source components, it is not governed by one single license. Majority of components are covered by three basic families of licenses: GPL, APL and BSD. More detailed licensing information about individual ODN modules follows:

8.1 ODN/UnifiedViews

UnifiedViews is licensed under the combination of GPLv3 and LGPLv3. For more information, please see https://github.com/UnifiedViews/Core#licenses .

8.2. ODN/Storage

In this module, the following components are used:

8.3 ODN/InternalCatalog and ODN/PublicCatalog

CKAN (reused in modules ODN/InternalCatalog and ODN/PublicCatalog) is licensed under AGPLv3 (see https://github.com/ckan/ckan#copying-and-license).

8.4 ODN/Management

In this module, the principal components are:

8.5 ODN/Publication

This module is primarily built on a portion of Virtuoso Open Source (the SPARQL endpoint and its GUI) - mentioned above in section "ODN/Storage" - and on LDVMi.

LDVMi is licensed under APLv2 (see https://www.apereo.org/content/cas-under-apache-20-license).

8.6 Other modules and components

Each module can be broken down further into even smaller components and libraries. Each such smaller part has the same or other (but compatible) Open Source license.

For example, UnifiedViews uses the following components and libraries:

  • Spring app. platform - Apache License 2.0 - used in ODN/UnifiedViews

  • Vaadin - Apache License 2.0 - used in ODN/UnifiedViews

  • EclipseLink - Eclipse Public License (EPL) - used in ODN/UnifiedViews

  • XStream - BSD license - used in ODN/UnifiedViews

  • KineticJS - MIT or GPL Version 2 licenses (https://github.com/ericdrowell/KineticJS/wiki/License) - used in ODN/UnifiedViews

  • Apache Felix (OSGI framework) - Apache License 2.0 - used in ODN/UnifiedViews

As it is hard to maintain a complete component and licensing list down to the lowest level, we maintain only information about components directly utilized in ODN. Please consult the documentation of each such major component (mentioned in previous sections) to obtain more detailed information about its sub-components and licensing.

9. Abbreviations

 

  • CSV - Comma separated values (http://en.wikipedia.org/wiki/Comma-separated_values)

  • DPU - Data Processing Unit - component in UnifiedViews that can execute a transformation of data

  • DCAT - Data Catalog Vocabulary (http://www.w3.org/TR/vocab-dcat/)

  • ETL - Extract, transform, and load - a process that extracts data from source systems, transforms it and loads it into a target data store

  • ODN - Open Data Node

  • OWL - Web Ontology Language (http://en.wikipedia.org/wiki/Web_Ontology_Language)

  • RDF - Resource Description Framework (http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140225/)

  • SPARQL - SPARQL Protocol and RDF Query Language (http://www.w3.org/TR/rdf-sparql-query/)

  • URI - Uniform resource identifier (http://en.wikipedia.org/wiki/URI)

  • UV - UnifiedViews (https://github.com/UnifiedViews)

  • UV-core - UnifiedViews core components (https://github.com/UnifiedViews/Core)

  • UV-plugins - UnifiedViews plugin components (DPUs) (https://github.com/UnifiedViews/Plugins)

10. Glossary

Atom feed - a published list (or "feed") of recent articles or content in a standardized, machine readable format that can be downloaded by programs that use it (http://en.wikipedia.org/wiki/Atom_(standard)) . It is an alternative to RSS.

Data catalog - a database where metadata about datasets are stored.

Data catalog vocabulary (DCAT) - an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web.

Data mart - access layer of the data warehouse environment that is used to get data out to the users (http://en.wikipedia.org/wiki/Data_mart). In ODN it is the database which is used by the data catalogs to get dumps from, and by ODN/Publication to provide the REST API, generate data dumps and provide data as results of API calls.

Internal data mart - the database where the results of ODN/UnifiedViews are stored when the pipeline finishes; see also Data mart.

Data Processing Unit - see DPU

Dataset - one or more related input files (http://en.wikipedia.org/wiki/Data_set). In ODN, dataset is processed as input and also ODN produces datasets as output.

Dataset record - record about the dataset containing metadata of the dataset, associated publication pipelines and data catalog resources

Data catalog resource - the particular file (dump), REST API, or SPARQL Endpoint, visualization etc. associated with the given dataset record. 

DCAT Distribution - Basic classes in DCAT are "Catalog", "Dataset" and "Distribution". Distribution is equal to catalog resource. 

Data Publication Pipeline - directed acyclic graph, where nodes represent DPU instances and directed edges represent data flow between these instances. Every pipeline consists of one or more DPUs. ODN/UnifiedViews supports exchange of RDF data between DPUs.

Data Publication Task - see Data Publication Pipeline

Data unit - an input to DPU or output from DPU is called data unit (DU). Every DPU may have more input data units and more output data units. DPUs may also use different types of DU, e.g., RDF data unit being able to read/write RDF data or File data unit being able to read/write generic files.

DPU - plugin on the data processing pipelines, which executes certain transformation, cleansing, quality assessment on top of the processed data. DPU encapsulates certain business logic needed when processing data (e.g., one DPU may extract data from a SPARQL endpoint or apply a SPARQL query). Every DPU must define its required/optional inputs and produced outputs.

DPU instance - placement of DPU on a pipeline. DPU instance is therefore created when DPU is placed on pipeline canvas.

DPU configuration - associative array of key-value pairs, which customize functionality of DPU instance.

External data catalog - for ODN it is data catalog that is not part of ODN but ODN can be integrated with it. see also Data catalog

Instance Configuration - configuration for specific DPU instance. It is created at the time of placing DPU on the pipeline (canvas) as a copy of Template Configuration

Internal Data catalog - data catalog that is part of ODN and is not visible to the public; it is used by the data publisher/data administrator to manage datasets intended for transformation/publication (see module ODN/InternalCatalog).

Public Data catalog - data catalog that is part of ODN and is visible to the public (see module ODN/PublicCatalog).

RDF - standard model for data interchange on the Web. RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed. (http://en.wikipedia.org/wiki/Resource_Description_Framework, http://www.w3.org/RDF/)

Resource Description Framework - see RDF

Representational state transfer - (REST) software architectural style consisting of a coordinated set of architectural constraints applied to components, connectors, and data elements, within a distributed hypermedia system and also applied to the development of web services. (https://en.wikipedia.org/wiki/Representational_State_Transfer)

SPARQL - (SPARQL Query Language for RDF) query language for databases, able to retrieve and manipulate data stored in RDF format. SPARQL allows for a query to consist of triple patterns, conjunctions, disjunctions, and optional patterns. (http://en.wikipedia.org/wiki/SPARQL, http://www.w3.org/2001/sw/wiki/SPARQL)

Staging database - the database used by ODN/UnifiedViews to store intermediate results of the DPUs on the pipeline

Template Configuration - previously called "default configuration", default DPU configuration associated with each DPU.

Transformation Pipeline - see Data Publication Pipeline

 

11. References

  1. http://www.comsode.eu/ - main page of the COMSODE project

  2. https://github.com/UnifiedViews - source codes of ODN/UnifiedViews module

  3. http://ckan.org/ - data catalog CKAN home page

  4. http://nucivic.com/dkan/ - open data platform for cataloging home page

  5. http://www.w3.org/2001/sw/DataAccess/tests/implementations - comparison of different implementations of SPARQL endpoint

  6. http://www.openrdf.org/ - homepage of the openRDF Sesame framework for SPARQL

  7. http://theodi.org/blog/git-data-publishing - ideas how to use GitHub for data publishing
