
  1. Create new dataset record in ODN/InternalCatalog -
    1. Specify metadata of the dataset record, such as its name, description, license, topic, etc. - create a new catalog record for a dataset in the catalog.
    2. Associate the dataset record with a data publishing pipeline (created directly in ODN/UnifiedViews, an ETL tool) - create a new associated pipeline manually, associate an existing non-associated pipeline, or create a new associated pipeline as a modified copy of an existing pipeline.
      1. Specify the sources of the dataset (data files, e.g. Excel spreadsheets, tables in a relational database, data accessible via an API, etc.).
      2. Specify how the data sources should be transformed (certain rows/columns removed or adjusted in tabular data sources; data sources anonymised, integrated with other data sources, cleansed, etc.).
      3. Specify how the data should be published (in which formats it will be available to data consumers - CSV/RDF/... dumps, REST API, SPARQL Endpoint API).
    3. Disassociate a pipeline if it is no longer needed for producing data resources for a dataset - (sto_78)
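Step 1 can be sketched as follows. This is a minimal illustration only; the record fields and helper names are assumptions, not the actual ODN/InternalCatalog interface:

```python
# Sketch of step 1: create a dataset record and manage its pipeline
# associations. Field names and structure are illustrative assumptions.

def build_dataset_record(name, description, license_id, topic):
    """Assemble the metadata payload for a new catalog record (step 1.1)."""
    return {
        "name": name,
        "description": description,
        "license": license_id,
        "topic": topic,
        "pipelines": [],  # filled in by associate_pipeline()
    }

def associate_pipeline(record, pipeline_id):
    """Link an ODN/UnifiedViews pipeline to the dataset record (step 1.2)."""
    if pipeline_id not in record["pipelines"]:
        record["pipelines"].append(pipeline_id)
    return record

def disassociate_pipeline(record, pipeline_id):
    """Remove a pipeline that no longer produces resources (step 1.3)."""
    if pipeline_id in record["pipelines"]:
        record["pipelines"].remove(pipeline_id)
    return record

record = build_dataset_record("budget-2014", "City budget data", "CC-BY", "finance")
record = associate_pipeline(record, "pipeline-42")
```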

  2. Create and publish resources for the given dataset
    1. Manually execute the publishing pipeline, which ensures extraction of the data sources, their transformation, and creation of the output published data - create/update dataset resource(s) on demand.
    2. Automated execution of the publishing pipeline according to a set schedule - display/manage pipeline scheduling information
    3. Update ODN/InternalCatalog (by the ODN/Publication module) with the resources created for a dataset - create/update dataset resource(s) automatically when data processing is done. The resources contain:
      1. downloadable links to the newly published data, in the case of file dumps
      2. REST API and SPARQL Endpoint API interfaces so that the data is accessible in a more sophisticated way, dedicated to automated use of the data by 3rd-party applications, etc.
      3. resource-specific metadata describing the resource of the dataset
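The resource entries produced in step 2.3 could look like this sketch. The resource structure, formats, and URL layout are illustrative assumptions:

```python
# Sketch of step 2.3: after a pipeline run, register the produced
# resources (file dump, REST API, SPARQL endpoint) for a dataset.
# Field names and the URL scheme are assumptions for illustration.

def build_resources(dataset_id, base_url):
    """Build resource entries for file dumps and API access points."""
    return [
        {   # downloadable file dump
            "dataset": dataset_id,
            "type": "file",
            "format": "CSV",
            "url": f"{base_url}/{dataset_id}/dump.csv",
        },
        {   # REST API for automated 3rd-party use
            "dataset": dataset_id,
            "type": "api",
            "format": "REST",
            "url": f"{base_url}/{dataset_id}/api",
        },
        {   # SPARQL endpoint for RDF access
            "dataset": dataset_id,
            "type": "api",
            "format": "SPARQL",
            "url": f"{base_url}/{dataset_id}/sparql",
        },
    ]
```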

  3. Verify the publication process, make the dataset publicly available
    1. verify the publication status after pipeline execution - debug pipeline, display last publication status of a pipeline
    2. make the dataset publicly available in ODN/PublicCatalog - make dataset publicly available
    3. publish the publicly available dataset also to external catalog(s), defined per dataset; both metadata and data are pushed - add external catalog for update, modify external catalog for update, delete external catalog for update
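The per-dataset list of external catalogs from step 3.3 can be sketched like this. The data layout and the idea of computing push targets are assumptions for illustration:

```python
# Sketch of step 3.3: manage the external catalogs defined per dataset
# and compute what should be pushed to each (metadata and data).
# The dictionary structure here is an illustrative assumption.

def add_external_catalog(dataset, catalog_url):
    """Register an external catalog to be updated for this dataset."""
    dataset.setdefault("external_catalogs", [])
    if catalog_url not in dataset["external_catalogs"]:
        dataset["external_catalogs"].append(catalog_url)
    return dataset

def delete_external_catalog(dataset, catalog_url):
    """Stop pushing updates to the given external catalog."""
    if catalog_url in dataset.get("external_catalogs", []):
        dataset["external_catalogs"].remove(catalog_url)
    return dataset

def push_targets(dataset):
    """(catalog_url, payload) pairs; both metadata and resources are pushed."""
    payload = {"metadata": dataset["metadata"], "resources": dataset["resources"]}
    return [(url, payload) for url in dataset.get("external_catalogs", [])]
```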

  4. Create a visualization of a dataset resource accessible via SPARQL endpoint and publish the visualization - add visualization to dataset
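A visualization built in step 4 reads its data from the resource's SPARQL endpoint. A minimal sketch of constructing such a query request, assuming a hypothetical endpoint URL and an illustrative query:

```python
from urllib.parse import urlencode

# Sketch of step 4: a visualization fetches its data from the dataset
# resource's SPARQL endpoint. The endpoint URL and query below are
# illustrative assumptions, not a real ODN endpoint.

def build_sparql_request(endpoint, query):
    """Return the GET URL for a SPARQL query asking for JSON results."""
    params = urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })
    return f"{endpoint}?{params}"

QUERY = "SELECT ?item ?value WHERE { ?item <http://example.org/value> ?value } LIMIT 100"
url = build_sparql_request("https://example.org/data/budget-2014/sparql", QUERY)
```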

  5. Update data at any time (by re-running the data publishing pipeline). If the dataset is publicly available, it is also updated automatically in ODN/PublicCatalog and in all external catalogs (if set). The update may be:
    1. automated (sets a schedule for repeated execution of the pipeline) - modify external catalog for update
    2. manual (manually executes the pipeline) - create new associated pipeline (manually)
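The scheduled (automated) update mode in step 5.1 boils down to computing when the pipeline should run next. A minimal sketch, assuming a simple fixed-interval schedule (the real scheduler may support richer schedules):

```python
from datetime import datetime, timedelta

# Sketch of step 5.1: automated updates re-execute the pipeline on a
# schedule. A fixed interval in hours is an illustrative assumption.

def next_run(last_run, interval_hours):
    """Time of the next scheduled pipeline execution."""
    return last_run + timedelta(hours=interval_hours)

run = next_run(datetime(2014, 1, 1, 6, 0), 24)
```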

  6. Display (using ODN/InternalCatalog):
    1. metadata of the dataset - Retrieving metadata about dataset via public catalog API
    2. metadata of dataset resource - List available resources for a dataset

  7. Access published data (using ODN/PublicCatalog)
    1. download file dumps of published data - Download latest version of public dataset file dump
    2. access published data via REST API/SPARQL Endpoint API - Retrieving public data from a dataset via REST API
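The two access paths in step 7 can be sketched as follows. The URL scheme is an illustrative assumption about how ODN/PublicCatalog exposes dumps and the REST API:

```python
from urllib.request import Request

# Sketch of step 7: accessing published data from ODN/PublicCatalog.
# The URL layout is an illustrative assumption.

def latest_dump_url(catalog_url, dataset_id, fmt="csv"):
    """URL of the latest published file dump for a public dataset (7.1)."""
    return f"{catalog_url}/dataset/{dataset_id}/dump/latest.{fmt}"

def rest_request(catalog_url, dataset_id):
    """GET request object for the dataset's REST API (7.2); not sent here."""
    return Request(
        f"{catalog_url}/dataset/{dataset_id}/api",
        headers={"Accept": "application/json"},
    )
```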

  8. Display the list of datasets contained in the catalog using:
    1. GUI of the catalog (list of datasets) - List of datasets contained in public catalog (GUI)
    2. API of the catalog (list of datasets in machine readable form) - List of datasets contained in the public catalog (API)
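For step 8.2, the machine-readable dataset list is typically a JSON response. A minimal parsing sketch, assuming a CKAN-style response shape (`success` flag plus a `result` list); the actual public catalog API may differ:

```python
import json

# Sketch of step 8.2: parse the machine-readable list of datasets.
# The response shape (CKAN-style "result" list) is an assumption.

def parse_dataset_list(response_body):
    """Extract dataset names from a catalog API JSON response."""
    data = json.loads(response_body)
    return data.get("result", [])

sample = '{"success": true, "result": ["budget-2014", "transport-stops"]}'
```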

  9. Display (using ODN/PublicCatalog):
    1. metadata of the dataset - Retrieving metadata about dataset via public catalog API
    2. metadata of dataset resource - List available resources for a dataset

  10. Share a dataset or its resource by using social media - Social media sharing

  11. Working with public datasets, you can:
    1. browse public data records - Publicly available catalog record browser
    2. search them using keywords - Publicly available catalog records search using keywords
    3. filter them - Publicly available catalog records filtering
  12. And also
    1. manage the communication language, to communicate with other consumers in their native language - GUI language management
    2. display available resources for a public dataset - List of available resources for a public dataset
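The browsing operations in steps 11.2 and 11.3 (keyword search and filtering of public catalog records) can be sketched over in-memory records. The record fields are illustrative assumptions:

```python
# Sketch of steps 11.2-11.3: keyword search and filtering over public
# catalog records. The record structure is an illustrative assumption.

def search_records(records, keyword):
    """Case-insensitive keyword search over name and description (11.2)."""
    kw = keyword.lower()
    return [r for r in records
            if kw in r["name"].lower() or kw in r["description"].lower()]

def filter_records(records, **criteria):
    """Keep records matching all criteria, e.g. topic='finance' (11.3)."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]
```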

 
