- Create a new dataset record in ODN/InternalCatalog
  - Specify metadata of the dataset record, such as its name, description, license, topic, etc.
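The metadata record above can be pictured as a simple mapping. This is only an illustrative sketch: the field names and the validation helper are assumptions, not the exact ODN/InternalCatalog schema.

```python
# Illustrative sketch only: field names are assumptions,
# not the exact ODN/InternalCatalog schema.
dataset_record = {
    "name": "city-budget-2014",
    "title": "City Budget 2014",
    "description": "Approved municipal budget, broken down by chapter.",
    "license": "CC-BY-4.0",
    "topic": "finance",
}

def validate_record(record):
    """Return the mandatory metadata fields that are missing or empty."""
    required = ("name", "description", "license", "topic")
    return [f for f in required if not record.get(f)]
```

An empty result from `validate_record` means the record carries all the mandatory fields mentioned above.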
- Associate the dataset record with a data publishing pipeline (created directly in ODN/UnifiedViews, an ETL tool), using one of the following approaches: create a new associated pipeline manually, associate an existing non-associated pipeline, or create a new associated pipeline as a modified copy of an existing pipeline; here's a quick guide on how to create a pipeline from scratch
  - Specify source data for the dataset (data files, e.g. Excel spreadsheets, tables in a relational database, data accessible via an API, etc.)
  - Specify how the data sources should be transformed (certain rows/columns removed or adjusted in tabular data sources, data sources anonymised, integrated with other data sources, cleansed, etc.)
  - Specify how the data should be published (in which formats it will be available to data consumers - CSV/RDF/... dumps, REST API, SPARQL Endpoint API)
  - Dissociate a pipeline when it is no longer needed for producing data resources for a dataset - dissociate a pipeline
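The transformation step described above (dropping or adjusting columns, anonymising values) would normally be configured in UnifiedViews; the following standalone Python sketch only illustrates the idea, with made-up column names and sample rows.

```python
import hashlib

def transform(rows, drop_columns=(), anonymise=()):
    """Remove unwanted columns and replace sensitive values with short stable hashes."""
    out = []
    for row in rows:
        cleaned = {k: v for k, v in row.items() if k not in drop_columns}
        for col in anonymise:
            if col in cleaned:
                # Stable pseudonym: same input value always maps to the same hash.
                cleaned[col] = hashlib.sha256(cleaned[col].encode()).hexdigest()[:8]
        out.append(cleaned)
    return out

# Hypothetical source rows, as might come from an Excel sheet or DB table.
source = [
    {"name": "Alice", "amount": "120", "internal_note": "check"},
    {"name": "Bob", "amount": "80", "internal_note": "ok"},
]
published = transform(source, drop_columns=("internal_note",), anonymise=("name",))
```

Dropping `internal_note` and hashing `name` mirrors the "rows/columns removed/adjusted" and "anonymised" cases listed above.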
- Create and publish resources for the given dataset
  - Manually execute the publishing pipeline, which extracts the data sources, transforms them, and creates the output published data - create/update dataset resource(s) on demand
  - Execute the publishing pipeline automatically according to the schedule set - display/manage pipeline scheduling information
  - The ODN/Publication module updates ODN/InternalCatalog with the resources created for a dataset - create/update dataset resource(s) automatically when data processing is done. The resources contain:
    - downloadable links to the newly published data, in the case of file dumps
    - REST API and SPARQL Endpoint API interfaces, so the data is accessible in a more sophisticated way dedicated to automated use by 3rd-party applications, etc.
    - resource-specific metadata describing the resource of the dataset
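The resource entries written back to the catalog can be sketched as records like the following; the URLs and field names are invented for illustration and do not reflect the actual ODN/InternalCatalog data model.

```python
# Hypothetical resource entries; URLs and field names are illustrative only.
resources = [
    {
        "type": "file",
        "format": "CSV",
        "url": "https://example.org/dumps/city-budget-2014.csv",  # downloadable dump
    },
    {
        "type": "api",
        "format": "SPARQL",
        "url": "https://example.org/sparql",  # endpoint for 3rd-party applications
    },
]

def dumps_of(resources):
    """Select the downloadable file-dump resources from a resource list."""
    return [r for r in resources if r["type"] == "file"]
```

Separating `file` and `api` entries mirrors the two kinds of resources listed above: file dumps with download links, and API interfaces.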
- Verify the publication process and make the dataset publicly available
  - verify the publication status after pipeline execution - debug pipeline, display last publication status of a pipeline
  - make the dataset publicly available in ODN/PublicCatalog - make dataset publicly available
  - publish the publicly available dataset also to external catalog(s), defined per dataset (both metadata and data are pushed) - add external catalog for update, modify external catalog for update, delete external catalog for update
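Since both metadata and data are pushed to external catalogs, the push payload can be sketched as below. The payload shape is an assumption (loosely CKAN-like), not the exact format ODN uses; only the payload is built here, no network call is made.

```python
import json

def build_push_payload(dataset, resources):
    """Assemble the metadata+data payload pushed to an external catalog.

    The field names are an assumption (loosely CKAN-like), not the exact
    format ODN uses for external catalog updates.
    """
    return json.dumps({
        "name": dataset["name"],
        "notes": dataset["description"],
        "license_id": dataset["license"],
        "resources": resources,
    })

payload = build_push_payload(
    {"name": "city-budget-2014", "description": "Approved budget.", "license": "CC-BY-4.0"},
    [{"format": "CSV", "url": "https://example.org/dumps/city-budget-2014.csv"}],
)
```

Serializing metadata and resource links into one payload reflects the "both metadata and data is pushed" behaviour described above.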
- Create a visualization of a dataset resource accessible via SPARQL endpoint and publish the visualization - add visualization to dataset
- Update data at any time (by re-running the data publishing pipeline). If the dataset is publicly available, it is also updated automatically in ODN/PublicCatalog and in all external catalogs (if set). An update may be:
  - automated (sets a schedule for repeated execution of the pipeline) - modify external catalog for update
  - manual (manually executes the pipeline) - create new associated pipeline (manually)
- Display (using ODN/InternalCatalog):
  - metadata of the dataset - retrieve metadata about dataset via public catalog API
  - metadata of a dataset resource - list available resources for a dataset
- Access published data (using ODN/PublicCatalog):
  - download file dumps of published data - download latest version of public dataset file dump
  - access published data via REST API/SPARQL Endpoint API - retrieving public data from a dataset via REST API
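Access through the SPARQL Endpoint API follows the standard SPARQL 1.1 Protocol, where the query is passed as a `query` parameter. The sketch below only builds the request URL (the endpoint address is hypothetical), so no network access is needed.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address; a real ODN deployment would publish its own.
ENDPOINT = "https://example.org/sparql"

def sparql_query_url(query):
    """Build a GET request URL for a SPARQL endpoint (SPARQL 1.1 Protocol style)."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)

url = sparql_query_url("SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")
```

A consumer application would then fetch this URL with any HTTP client to receive the query results in JSON form.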
- Display the list of datasets contained in the catalog using:
  - GUI of the catalog (list of datasets) - list of datasets contained in public catalog (GUI)
  - API of the catalog (list of datasets in machine-readable form) - list of datasets contained in the public catalog (API)
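A machine-readable dataset list from the catalog API might be consumed as shown below; the JSON response shape is an assumption for illustration, not the documented ODN/PublicCatalog payload.

```python
import json

# Assumed response shape for the public catalog's dataset-list API;
# the real ODN/PublicCatalog payload may differ.
response_body = '{"datasets": [{"name": "city-budget-2014"}, {"name": "air-quality"}]}'

def dataset_names(body):
    """Extract dataset names from the catalog API's JSON response."""
    return [d["name"] for d in json.loads(body)["datasets"]]
```

This is the kind of listing a harvester or 3rd-party application would use instead of the GUI.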
- Display (using ODN/PublicCatalog):
  - metadata of the dataset - retrieve metadata about dataset via public catalog API
  - metadata of a dataset resource - list available resources for a dataset
- Share a dataset or its resource by using social media - social media sharing
- Working with public datasets, you can:
  - browse public data records - publicly available catalog record browser
  - search them using keywords - publicly available catalog records search using keywords
  - filter them - publicly available catalog records filtering
- And also:
  - manage the communication language, to communicate with other consumers in their native language - GUI language management
  - display available resources for a public dataset - list of available resources for a public dataset