1. Create new dataset record in ODN/InternalCatalog and associate it with a publishing pipeline
    1. Specify metadata of the dataset record, such as its name, description, license, topic, etc. - create a new catalog record for a dataset (see the sketch after this list)
    2. Associate the dataset record with a data publishing pipeline (created directly using ODN/UnifiedViews, an ETL tool) - use one of the following approaches: create a new associated pipeline manually, associate an existing non-associated pipeline, or create a new associated pipeline as a modified copy of an existing pipeline. Here's a quick guide on how to create a pipeline from scratch; additionally, this descriptive list of DPUs may come in handy while creating a pipeline.
      1. Specify source data for the dataset (data files, e.g. Excel spreadsheets, tables in a relational database, data accessible via API, etc.)
      2. Specify how the data sources should be transformed (certain rows/columns removed/adjusted in the tabular data sources, data sources anonymised, integrated with other data sources, cleansed, etc.)
      3. Specify how the data should be published (in which formats it will be available to data consumers - CSV/RDF/... dumps, REST API, SPARQL Endpoint API)
    3. Dissociate a pipeline if it is no longer needed for producing data resources for a dataset - dissociate a pipeline
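If scripted access is preferred over the web UI, the catalog record can usually also be created through the catalog's API. A minimal sketch, assuming ODN/InternalCatalog exposes a CKAN-compatible action API; the base URL, API key and all field values below are placeholders, not part of the ODN documentation.

```python
import requests

# Hypothetical values - replace with your ODN/InternalCatalog URL and API key.
CATALOG_URL = "https://odn.example.org/internalcatalog"
API_KEY = "your-api-key"

def create_dataset_record(name, title, notes, license_id):
    """Create a new catalog record for a dataset via the CKAN-style action API."""
    response = requests.post(
        f"{CATALOG_URL}/api/3/action/package_create",
        json={
            "name": name,              # URL-friendly identifier of the record
            "title": title,            # human-readable dataset name
            "notes": notes,            # description
            "license_id": license_id,  # e.g. "cc-by"
        },
        headers={"Authorization": API_KEY},
    )
    response.raise_for_status()
    return response.json()["result"]

record = create_dataset_record(
    "city-budget-2015", "City budget 2015",
    "Yearly budget of the city, published as open data.", "cc-by")
print("created catalog record:", record["id"])
```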

  2. Create and publish resources for the given dataset
    1. Manually execute the publishing pipeline, which ensures extraction of the data sources, their transformation, and creation of the output published data - create/update dataset resource(s) on demand.
    2. Let the publishing pipeline be executed automatically according to the schedule set - display/manage pipeline scheduling information
    3. Let the ODN/Publication module update ODN/InternalCatalog with resources created for a dataset - create/update dataset resource(s) automatically when data processing is done. Each resource contains (a sketch of such a resource entry follows this list):
      1. downloadable links to newly published data in case of file dumps 
      2. REST API and SPARQL Endpoint API interfaces, making the data accessible in a more sophisticated way suited for automated use by 3rd party applications, etc.
      3. resource-specific metadata describing the resource of the dataset
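Note that ODN/Publication attaches these resources to the catalog record automatically once a pipeline run finishes; the sketch below only illustrates what such a resource entry (a download link plus resource-specific metadata) amounts to, again assuming a CKAN-compatible action API. All URLs and identifiers are hypothetical.

```python
import requests

CATALOG_URL = "https://odn.example.org/internalcatalog"  # hypothetical
API_KEY = "your-api-key"                                 # hypothetical

# A resource is essentially a download link (or API endpoint) plus
# resource-specific metadata attached to an existing dataset record.
resource = {
    "package_id": "city-budget-2015",                        # dataset record to attach to
    "url": "https://odn.example.org/dumps/budget-2015.csv",  # downloadable file dump
    "name": "Budget 2015 (CSV dump)",
    "format": "CSV",
    "description": "CSV dump produced by the publishing pipeline.",
}

response = requests.post(
    f"{CATALOG_URL}/api/3/action/resource_create",
    json=resource,
    headers={"Authorization": API_KEY},
)
response.raise_for_status()
print("created resource:", response.json()["result"]["id"])
```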

  3. Verify and debug the publication process
    1. verify the publication status after pipeline execution - display last publication status of a pipeline
    2. tune the configuration of a pipeline - debug pipeline

  4. Make the dataset publicly available
    1. make the dataset publicly available in ODN/PublicCatalog - make dataset publicly available
    2. publish the publicly available dataset also to external catalog(s) (both metadata and data are pushed), which are defined per dataset - add external catalog for update, modify external catalog for update, delete external catalog for update

  5. Create a visualization of a dataset resource accessible via SPARQL endpoint and publish the visualization - create visualization, add visualization to dataset

  6. Display or retrieve metadata about a dataset and its resource(s) (all datasets - private and public, using ODN/InternalCatalog):
    1. display metadata about the dataset - browse catalog records
    2. display metadata about dataset resource(s) - list available resources for a dataset
    3. retrieve metadata about the dataset - retrieve metadata about the dataset via internal catalog API
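For programmatic access, here is a minimal sketch of retrieving a dataset's metadata and its resource list, assuming the internal catalog API is CKAN-compatible; the dataset identifier and base URL are placeholders.

```python
import requests

CATALOG_URL = "https://odn.example.org/internalcatalog"  # hypothetical

# Fetch the full metadata record of a dataset, including its resources.
response = requests.get(
    f"{CATALOG_URL}/api/3/action/package_show",
    params={"id": "city-budget-2015"},
)
response.raise_for_status()
dataset = response.json()["result"]

print(dataset["title"])
for res in dataset["resources"]:
    print(res["format"], res["url"])
```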

  7. Access published data (all datasets - private and public, using ODN/InternalCatalog)
    1. download file dumps of published data - download latest version of file dump
    2. retrieve published data via REST API - retrieving data from a dataset via REST API
    3. retrieve published data via SPARQL endpoint - retrieving data from a dataset via SPARQL endpoint
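A minimal sketch of the data consumer side, covering the file dump and SPARQL endpoint cases; both URLs and the query are illustrative only, assuming a published CSV dump and a public SPARQL endpoint exist for the dataset.

```python
import requests

# 1. Download the latest version of a file dump (hypothetical URL).
dump = requests.get("https://odn.example.org/dumps/budget-2015.csv")
dump.raise_for_status()
with open("budget-2015.csv", "wb") as f:
    f.write(dump.content)

# 2. Retrieve data via the SPARQL endpoint (hypothetical endpoint URL).
query = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""
response = requests.post(
    "https://odn.example.org/sparql",
    data={"query": query},
    headers={"Accept": "application/sparql-results+json"},
)
response.raise_for_status()
for binding in response.json()["results"]["bindings"]:
    print(binding["s"]["value"], binding["p"]["value"], binding["o"]["value"])
```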

  8. Working with the catalog, you can
    1. search for datasets using keywords - search for catalog records using internal catalog
    2. filter them - catalog records filtering in internal catalog
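Keyword search and filtering are available directly in the catalog web UI; if scripted access is needed, and assuming a CKAN-compatible search API, a sketch could look as follows (keywords and filter values are illustrative).

```python
import requests

CATALOG_URL = "https://odn.example.org/internalcatalog"  # hypothetical

# Search catalog records by keyword and narrow them down with a filter query.
response = requests.get(
    f"{CATALOG_URL}/api/3/action/package_search",
    params={
        "q": "budget",           # keyword search
        "fq": "res_format:CSV",  # filter: only datasets with CSV resources
        "rows": 10,
    },
)
response.raise_for_status()
for dataset in response.json()["result"]["results"]:
    print(dataset["name"], "-", dataset["title"])
```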

  9. Also you can...
    1. modify your own user profile - modify user's own user profile
    2. export pipeline to zip file and import it to another instance of ODN - export pipeline to zip file and import pipeline without DPU JARs 