Create a new dataset record in ODN/InternalCatalog and associate it with a publishing pipeline
- Specify the metadata of the dataset record, such as its name, description, license, topic, etc. - create a new catalog record for a dataset
- Associate the dataset record with a data publishing pipeline (created directly using ODN/UnifiedViews, an ETL tool) - use one of the following approaches: create a new associated pipeline manually, associate an existing non-associated pipeline, or create a new associated pipeline as a modified copy of an existing pipeline. Here's a quick guide on how to create a pipeline from scratch; additionally, this descriptive list of DPUs may come in handy while creating a pipeline.
- Specify source data for the dataset (data files, e.g. Excel spreadsheets, tables in a relational database, data accessible via an API, etc.)
- Specify how the data sources should be transformed (certain rows/columns removed or adjusted in tabular data sources, data anonymised, integrated with other data sources, cleansed, etc.; a minimal sketch of such a step follows this list)
- Specify how the data should be published, i.e. in which formats it will be available to data consumers - CSV/RDF/... dumps, REST API, SPARQL Endpoint API
- Dissociate a pipeline when it is no longer needed for producing data resources for a dataset - dissociate a pipeline
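
To make the transformation step concrete, here is a minimal Python sketch of the kind of tabular adjustment a pipeline step performs: dropping an identifying column and skipping incomplete rows. The file and column names (`source.csv`, `personal_id`, `amount`) are made up for illustration; in ODN this logic would live inside a UnifiedViews DPU, not a standalone script.

```python
import csv

# Illustrative transformation: anonymise by dropping a column and
# cleanse by skipping rows with a missing value. All names are made up.
with open("source.csv", newline="", encoding="utf-8") as src, \
     open("published.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    fields = [f for f in reader.fieldnames if f != "personal_id"]  # anonymise
    writer = csv.DictWriter(dst, fieldnames=fields)
    writer.writeheader()
    for row in reader:
        if row.get("amount"):  # cleanse: keep only complete rows
            writer.writerow({f: row[f] for f in fields})
```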
Create and publish resources for the given dataset
- Manually execute the publishing pipeline, which ensures extraction of the data sources, their transformation, and creation of the published output data - create/update dataset resource(s) on demand.
- Schedule automated execution of the publishing pipeline - display/manage pipeline scheduling information
- Let the ODN/Publication module update ODN/InternalCatalog with the resources created for a dataset - create/update dataset resource(s) automatically when data processing is done (see the sketch below). The created resources contain:
  - download links to the newly published data, in the case of file dumps
  - REST API and SPARQL Endpoint API interfaces that make the data accessible in a more sophisticated way, suited to automated use by 3rd-party applications, etc.
  - resource-specific metadata describing each resource of the dataset
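
As an illustration of what updating the catalog with a resource amounts to, here is a hedged sketch that registers a newly produced file dump as a dataset resource, assuming the internal catalog exposes a CKAN-compatible action API. The URL, API key, and all field values are placeholders, and in ODN this step is performed automatically by the ODN/Publication module.

```python
import requests

# Hedged sketch: register a file dump as a dataset resource via a
# CKAN-compatible action API. URL, key and field values are placeholders.
ODN_CATALOG = "https://odn.example.org/internalcatalog"
API_KEY = "your-api-key"

resp = requests.post(
    f"{ODN_CATALOG}/api/3/action/resource_create",
    headers={"Authorization": API_KEY},
    json={
        "package_id": "my-dataset",  # the catalog record to attach to
        "name": "Budget 2015 (CSV dump)",
        "format": "CSV",
        "url": f"{ODN_CATALOG}/dump/budget-2015.csv",  # download link
    },
)
resp.raise_for_status()
print(resp.json()["result"]["id"])  # id of the newly created resource
```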
Verify and debug the publication process, make the dataset publicly available
- Verify the publication status after pipeline execution - display last publication status of a pipeline
- Tune the configuration of a pipeline - debug pipeline
- Make the dataset publicly available in ODN/PublicCatalog - make dataset publicly available
- Publish the publicly available dataset also to external catalog(s), defined per dataset (both metadata and data are pushed) - add external catalog for update, modify external catalog for update, delete external catalog for update
Create a visualization of a dataset resource accessible via a SPARQL endpoint and publish the visualization - create visualization, add visualization to dataset
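
For a rough idea of what a visualization over a SPARQL endpoint involves, the sketch below runs a SELECT query and charts the bindings. The endpoint URL, the predicates, and the query are placeholders, and ODN's own visualization feature is driven from the UI rather than from code like this.

```python
import requests
import matplotlib.pyplot as plt

# Hedged sketch: query a SPARQL endpoint and chart the result.
# Endpoint URL and query vocabulary are placeholders.
ENDPOINT = "https://odn.example.org/sparql"
QUERY = """
SELECT ?year ?amount WHERE {
  ?obs <http://example.org/prop/year> ?year ;
       <http://example.org/prop/amount> ?amount .
} ORDER BY ?year
"""

rows = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
).json()["results"]["bindings"]

years = [r["year"]["value"] for r in rows]
amounts = [float(r["amount"]["value"]) for r in rows]
plt.bar(years, amounts)
plt.title("Amount per year")
plt.show()
```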
Display or retrieve metadata about a dataset and its resource(s) (all datasets - private and public, using ODN/InternalCatalog):
- Display metadata about the dataset - browse catalog records
- Display metadata about dataset resource(s) - list available resources for a dataset
- Retrieve metadata about the dataset programmatically - retrieve metadata about the dataset via internal catalog API (see the sketch below)
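
A minimal sketch of metadata retrieval, assuming the internal catalog exposes a CKAN-compatible action API; the catalog URL and the dataset id are placeholders.

```python
import requests

# Hedged sketch: fetch dataset metadata plus its resource list via a
# CKAN-compatible action API. URL and dataset id are placeholders.
ODN_CATALOG = "https://odn.example.org/internalcatalog"

meta = requests.get(
    f"{ODN_CATALOG}/api/3/action/package_show",
    params={"id": "my-dataset"},
).json()["result"]

print(meta["title"], "-", meta.get("license_id"))
for res in meta["resources"]:  # resource-specific metadata
    print(res["format"], res["url"])
```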
Access published data (all datasets - private and public, using ODN/InternalCatalog)
- Download file dumps of published data - download latest version of file dump
- Retrieve published data via the REST API - retrieving data from a dataset via REST API
- Retrieve published data via the SPARQL endpoint - retrieving data from a dataset via SPARQL endpoint (see the sketch below)
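
The SPARQL access path is sketched above under the visualization item; below is a hedged sketch of the two simpler paths, downloading a file dump and calling the REST API. All URLs and routes are placeholders, and the exact REST path depends on how the ODN instance exposes its data.

```python
import requests

# Hedged sketch of the two simpler access paths; URLs are placeholders.
# 1. Download the latest file dump of a published dataset.
dump = requests.get("https://odn.example.org/dump/budget-2015.csv")
dump.raise_for_status()
with open("budget-2015.csv", "wb") as f:
    f.write(dump.content)

# 2. Retrieve the same data via the REST API. The route and parameters
# are assumptions; the real path depends on the instance's configuration.
data = requests.get(
    "https://odn.example.org/api/datasets/budget-2015/records",
    params={"limit": 100},
).json()
print(data)  # response shape depends on the API configuration
```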
Working with the catalog you can:
- Search for datasets using keywords - search for catalog records using internal catalog (see the sketch below)
- Filter them - catalog records filtering in internal catalog
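
A minimal sketch of keyword search, again assuming a CKAN-compatible action API; the catalog URL and the query string are placeholders.

```python
import requests

# Hedged sketch: keyword search over catalog records via a
# CKAN-compatible action API. URL and query string are placeholders.
ODN_CATALOG = "https://odn.example.org/internalcatalog"

hits = requests.get(
    f"{ODN_CATALOG}/api/3/action/package_search",
    params={"q": "budget", "rows": 10},  # keyword and page size
).json()["result"]

print(hits["count"], "matching datasets")
for ds in hits["results"]:
    print("-", ds["name"])
```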
Also you can...
- Modify your own user profile - modify user's own user profile
- Export a pipeline to a zip file and import it into another instance of ODN - export pipeline to zip file and import pipeline without DPU JARs