  1. Create new dataset record in ODN/InternalCatalog and associate it with a publishing pipeline
    1. Specify metadata of the dataset record, such as its name, description, license, topic, etc. - create a new catalog record for a dataset
    2. Associate the dataset record with a data publishing pipeline (created directly using ODN/UnifiedViews, an ETL tool) - use one of the following approaches: create a new associated pipeline manually, associate an existing non-associated pipeline, or create a new associated pipeline as a modified copy of an existing pipeline. Here's a quick guide on how to create a pipeline from scratch; additionally, this descriptive list of DPUs may come in handy while creating a pipeline.
      1. Specify source data for the dataset (data files, e.g. Excel spreadsheets, tables in a relational database, data accessible via an API, etc.)
      2. Specify how the data sources should be transformed (e.g. certain rows/columns are removed or adjusted in tabular data sources, data sources are anonymised, integrated with other data sources, cleansed, etc.)
      3. Specify how the data should be published, i.e. in which formats it will be available to data consumers (CSV/RDF/... dumps, REST API, SPARQL endpoint API); a conceptual sketch of these three steps follows below.
    3. Dissociate a pipeline if it is no longer needed for producing data resources for a dataset - dissociate a pipeline
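
The three pipeline steps above can be pictured with a minimal sketch. This is not a UnifiedViews DPU (real pipelines are assembled from DPUs in the UnifiedViews GUI); it is only a conceptual illustration of extract, transform and publish, and the file and column names are hypothetical.

```python
# Conceptual sketch only: NOT a UnifiedViews DPU. Real pipelines are assembled
# from DPUs in the UnifiedViews GUI. File and column names are hypothetical.
import csv

def extract(path):
    """Extract: read a tabular source (e.g. a spreadsheet exported as CSV)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: drop unwanted columns, filter rows, anonymise personal data."""
    cleaned = []
    for row in rows:
        if not row.get("amount"):        # skip incomplete rows (hypothetical column)
            continue
        row.pop("internal_note", None)   # remove an internal-only column
        row["applicant"] = "ANONYMISED"  # anonymise personal data
        cleaned.append(row)
    return cleaned

def publish(rows, path):
    """Publish: write the result as a CSV dump for data consumers."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    rows = transform(extract("source_data.csv"))
    if rows:
        publish(rows, "published_dump.csv")
```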

  2. Create and publish resources for the given dataset
    1. Manually execute the publishing pipeline, which ensures extraction of the data sources, their transformation, and creation of the published output data - create/update dataset resource(s) on demand.
    2. Let the publishing pipeline execute automatically according to a set schedule - display/manage pipeline scheduling information
    3. Let the ODN/Publication module update ODN/InternalCatalog with the resources created for a dataset - create/update dataset resource(s) automatically when data processing done. The resources contain:
      1. downloadable links to the newly published data, in the case of file dumps
      2. REST API and SPARQL endpoint API interfaces, so that data is accessible in a more sophisticated way intended for automated use by 3rd party applications, etc.
      3. resource-specific metadata describing the resource of the dataset (an illustrative resource record follows below)
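
For illustration, a resource record attached to a dataset might look roughly like this. The field names follow a common CKAN-style resource schema, but the exact schema, URLs and values here are assumptions, not the definitive ODN format.

```python
# Illustrative shape of dataset resources; all URLs and values are hypothetical.
file_dump_resource = {
    "name": "Budget 2015 (CSV dump)",
    "url": "https://odn.example.org/storage/budget-2015.csv",  # downloadable link to a file dump
    "format": "CSV",
    "description": "Latest CSV dump produced by the publishing pipeline.",
}

api_resource = {
    "name": "Budget 2015 (SPARQL endpoint)",
    "url": "https://odn.example.org/sparql",                   # API interface for automated use
    "format": "api/sparql",
    "description": "SPARQL endpoint for automated use by 3rd party applications.",
}
```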

  3. Verify and debug the publication process and make the dataset publicly available
    1. verify the publication status after pipeline execution - display last publication status of a pipeline
    2. tune the configuration of a pipeline - debug pipeline
    3. make the dataset publicly available in ODN/PublicCatalog - make dataset publicly available
    4. publish the publicly available dataset also to external catalog(s), defined per dataset (both metadata and data are pushed) - add external catalog for update, modify external catalog for update, delete external catalog for update

  4. Create a visualization of a dataset resource accessible via SPARQL endpoint and publish the visualization - create visualization, add visualization to dataset

  5. Display or retrieve metadata about a dataset and its resource(s) (all datasets - private and public, using ODN/InternalCatalog):
    1. display metadata about the dataset - browse catalog records
    2. display metadata about dataset resource(s) - list available resources for a dataset
    3. retrieve metadata about the dataset - retrieve metadata about the dataset via internal catalog API (a sketch follows below)
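
A minimal sketch of retrieving dataset metadata programmatically, assuming the internal catalog exposes a CKAN-style action API; the base URL, dataset id and API key are hypothetical placeholders.

```python
# Sketch of retrieving dataset metadata via the internal catalog API,
# assuming a CKAN-style action API. URL, dataset id and API key are hypothetical.
import requests

INTERNAL_CATALOG = "https://odn-internal.example.org"  # hypothetical internal catalog URL
API_KEY = "your-api-key"                               # private datasets require authorisation

response = requests.get(
    f"{INTERNAL_CATALOG}/api/3/action/package_show",
    params={"id": "budget-2015"},                      # hypothetical dataset id
    headers={"Authorization": API_KEY},
    timeout=30,
)
response.raise_for_status()
dataset = response.json()["result"]

print(dataset["title"])
for resource in dataset["resources"]:                  # resource-level metadata
    print(resource["format"], resource["url"])
```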

  6. Access published data (all datasets - private and public, using ODN/InternalCatalog)
    1. download file dumps of published data - download latest version of file dump
    2. retrieve published data via REST API - retrieving data from a dataset via REST API
    3. retrieve published data via SPARQL endpoint - retrieving data from a dataset via SPARQL endpoint (a sketch of downloading a dump and querying the SPARQL endpoint follows below)
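
A short sketch of two programmatic access paths: downloading a file dump and querying the SPARQL endpoint. All URLs are hypothetical; the actual links are listed among the dataset's resources in the catalog.

```python
# Sketch of accessing published data: file dump download and a SPARQL query.
# URLs are hypothetical placeholders.
import requests

# Download the latest CSV dump via its resource URL.
dump_url = "https://odn.example.org/storage/budget-2015.csv"
with requests.get(dump_url, stream=True, timeout=60) as r:
    r.raise_for_status()
    with open("budget-2015.csv", "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)

# Run a SPARQL query against the dataset's SPARQL endpoint (standard SPARQL protocol).
sparql_endpoint = "https://odn.example.org/sparql"
query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
r = requests.get(
    sparql_endpoint,
    params={"query": query},
    headers={"Accept": "application/sparql-results+json"},
    timeout=60,
)
r.raise_for_status()
for binding in r.json()["results"]["bindings"]:
    print(binding["s"]["value"], binding["p"]["value"], binding["o"]["value"])
```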

  7. Working with the catalog, you can:
    1. search for datasets using keywords - search for catalog records using internal catalog
    2. filter them - catalog records filtering in internal catalog (a search/filter sketch follows below)
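
A sketch of keyword search and filtering from a script, assuming the internal catalog exposes a CKAN-style search API; the URL and filter values are hypothetical.

```python
# Sketch of keyword search and filtering of catalog records,
# assuming a CKAN-style search API. URL and filter values are hypothetical.
import requests

INTERNAL_CATALOG = "https://odn-internal.example.org"

r = requests.get(
    f"{INTERNAL_CATALOG}/api/3/action/package_search",
    params={
        "q": "budget",           # keyword search
        "fq": "tags:finance",    # filter, e.g. by tag
        "rows": 10,
    },
    timeout=30,
)
r.raise_for_status()
result = r.json()["result"]
print(f"{result['count']} matching datasets")
for dataset in result["results"]:
    print("-", dataset["title"])
```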

  8. Additionally, you can:
    1. modify your own user profile - modify user's own user profile
    2. export a pipeline to a zip file and import it into another instance of ODN - export pipeline to zip file, import pipeline without DPU JARs