- Create a new dataset record in ODN/InternalCatalog
  - Specify metadata of the dataset record, such as its name, description, license, topic, etc.
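The metadata record above can be pictured as a simple mapping. This is only an illustrative sketch: the field names and the validation helper are assumptions, not the exact ODN/InternalCatalog schema.

```python
# Illustrative sketch only: field names are assumptions,
# not the exact ODN/InternalCatalog schema.
dataset_record = {
    "name": "city-budget-2014",
    "title": "City Budget 2014",
    "description": "Approved municipal budget, broken down by chapter.",
    "license": "CC-BY-4.0",
    "topic": "finance",
}

def validate_record(record):
    """Return the mandatory metadata fields that are missing or empty."""
    required = ("name", "description", "license", "topic")
    return [f for f in required if not record.get(f)]
```

An empty result from `validate_record` means the record carries all the mandatory fields mentioned above.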
- Associate the dataset record with a data publishing pipeline (created directly in ODN/UnifiedViews, an ETL tool), using one of the following approaches: create a new associated pipeline manually, associate an existing non-associated pipeline, or create a new associated pipeline as a modified copy of an existing pipeline; here's a quick guide on how to create a pipeline from scratch
  - Specify source data for the dataset (data files, e.g. Excel spreadsheets, tables in a relational database, data accessible via an API, etc.)
  - Specify how the data sources should be transformed (certain rows/columns removed or adjusted in tabular data sources, data sources anonymised, integrated with other data sources, cleansed, etc.)
  - Specify how the data should be published (in which formats it will be available to data consumers - CSV/RDF/... dumps, REST API, SPARQL Endpoint API)
  - Dissociate a pipeline when it is no longer needed for producing data resources for a dataset - dissociate a pipeline
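The transformation step described above (dropping or adjusting columns, anonymising values) would normally be configured in UnifiedViews; the following standalone Python sketch only illustrates the idea, with made-up column names and sample rows.

```python
import hashlib

def transform(rows, drop_columns=(), anonymise=()):
    """Remove unwanted columns and replace sensitive values with short stable hashes."""
    out = []
    for row in rows:
        cleaned = {k: v for k, v in row.items() if k not in drop_columns}
        for col in anonymise:
            if col in cleaned:
                # Stable pseudonym: same input value always maps to the same hash.
                cleaned[col] = hashlib.sha256(cleaned[col].encode()).hexdigest()[:8]
        out.append(cleaned)
    return out

# Hypothetical source rows, as might come from an Excel sheet or DB table.
source = [
    {"name": "Alice", "amount": "120", "internal_note": "check"},
    {"name": "Bob", "amount": "80", "internal_note": "ok"},
]
published = transform(source, drop_columns=("internal_note",), anonymise=("name",))
```

Dropping `internal_note` and hashing `name` mirrors the "rows/columns removed/adjusted" and "anonymised" cases listed above.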
- Create and publish resources for the given dataset
  - Manually execute the publishing pipeline, which extracts the data sources, transforms them, and creates the output published data - create/update dataset resource(s) on demand
  - Execute the publishing pipeline automatically according to the schedule set - display/manage pipeline scheduling information
  - The ODN/Publication module updates ODN/InternalCatalog with the resources created for a dataset - create/update dataset resource(s) automatically when data processing is done. The resources contain:
    - downloadable links to the newly published data, in the case of file dumps
    - REST API and SPARQL Endpoint API interfaces, so the data is accessible in a more sophisticated way dedicated to automated use by 3rd-party applications, etc.
    - resource-specific metadata describing the resource of the dataset
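The resource entries written back to the catalog can be sketched as records like the following; the URLs and field names are invented for illustration and do not reflect the actual ODN/InternalCatalog data model.

```python
# Hypothetical resource entries; URLs and field names are illustrative only.
resources = [
    {
        "type": "file",
        "format": "CSV",
        "url": "https://example.org/dumps/city-budget-2014.csv",  # downloadable dump
    },
    {
        "type": "api",
        "format": "SPARQL",
        "url": "https://example.org/sparql",  # endpoint for 3rd-party applications
    },
]

def dumps_of(resources):
    """Select the downloadable file-dump resources from a resource list."""
    return [r for r in resources if r["type"] == "file"]
```

Separating `file` and `api` entries mirrors the two kinds of resources listed above: file dumps with download links, and API interfaces.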
- Verify the publication process and make the dataset publicly available
  - verify the publication status after pipeline execution - debug pipeline, display last publication status of a pipeline
  - make the dataset publicly available in ODN/PublicCatalog - make dataset publicly available
  - publish the publicly available dataset also to external catalog(s), defined per dataset (both metadata and data are pushed) - add external catalog for update, modify external catalog for update, delete external catalog for update
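Since both metadata and data are pushed to external catalogs, the push payload can be sketched as below. The payload shape is an assumption (loosely CKAN-like), not the exact format ODN uses; only the payload is built here, no network call is made.

```python
import json

def build_push_payload(dataset, resources):
    """Assemble the metadata+data payload pushed to an external catalog.

    The field names are an assumption (loosely CKAN-like), not the exact
    format ODN uses for external catalog updates.
    """
    return json.dumps({
        "name": dataset["name"],
        "notes": dataset["description"],
        "license_id": dataset["license"],
        "resources": resources,
    })

payload = build_push_payload(
    {"name": "city-budget-2014", "description": "Approved budget.", "license": "CC-BY-4.0"},
    [{"format": "CSV", "url": "https://example.org/dumps/city-budget-2014.csv"}],
)
```

Serializing metadata and resource links into one payload reflects the "both metadata and data is pushed" behaviour described above.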
- Create a visualization of a dataset resource accessible via SPARQL endpoint and publish the visualization - add visualization to dataset
- Update data at any time (by re-running the data publishing pipeline). If the dataset is publicly available, it is also updated automatically in ODN/PublicCatalog and in all external catalogs (if set). An update may be:
  - automated (sets a schedule for repeated execution of the pipeline) - modify external catalog for update
  - manual (manually executes the pipeline) - create new associated pipeline (manually)
- Display (using ODN/InternalCatalog):
  - metadata of the dataset - retrieve metadata about dataset via public catalog API
  - metadata of a dataset resource - list available resources for a dataset
- Access published data (using ODN/PublicCatalog):
  - download file dumps of published data - download latest version of public dataset file dump
  - access published data via REST API/SPARQL Endpoint API - retrieving public data from a dataset via REST API
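Access through the SPARQL Endpoint API follows the standard SPARQL 1.1 Protocol, where the query is passed as a `query` parameter. The sketch below only builds the request URL (the endpoint address is hypothetical), so no network access is needed.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address; a real ODN deployment would publish its own.
ENDPOINT = "https://example.org/sparql"

def sparql_query_url(query):
    """Build a GET request URL for a SPARQL endpoint (SPARQL 1.1 Protocol style)."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)

url = sparql_query_url("SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")
```

A consumer application would then fetch this URL with any HTTP client to receive the query results in JSON form.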
- Display the list of datasets contained in the catalog using:
  - GUI of the catalog (list of datasets) - list of datasets contained in public catalog (GUI)
  - API of the catalog (list of datasets in machine-readable form) - list of datasets contained in the public catalog (API)
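A machine-readable dataset list from the catalog API might be consumed as shown below; the JSON response shape is an assumption for illustration, not the documented ODN/PublicCatalog payload.

```python
import json

# Assumed response shape for the public catalog's dataset-list API;
# the real ODN/PublicCatalog payload may differ.
response_body = '{"datasets": [{"name": "city-budget-2014"}, {"name": "air-quality"}]}'

def dataset_names(body):
    """Extract dataset names from the catalog API's JSON response."""
    return [d["name"] for d in json.loads(body)["datasets"]]
```

This is the kind of listing a harvester or 3rd-party application would use instead of the GUI.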
- Display (using ODN/PublicCatalog):
  - metadata of the dataset - retrieve metadata about dataset via public catalog API
  - metadata of a dataset resource - list available resources for a dataset
- Share a dataset or its resource by using social media - social media sharing
- Working with public datasets, you can:
  - browse public data records - publicly available catalog record browser
  - search them using keywords - publicly available catalog records search using keywords
  - filter them - publicly available catalog records filtering
- And also:
  - manage the communication language, to communicate with other consumers in their native language - GUI language management
  - display available resources for a public dataset - list of available resources for a public dataset