This scheme describes where, for what purpose, and by whom Open Data Node (ODN) can be used.

Main points:

  • Open Data Node can be deployed many times by many actors
  • Open Data Node can help with needs specific to each particular actor:
    • government organizations, municipalities, etc. want to publish the majority of their information as Open Data
    • government bodies need to work with data published by other government bodies
    • non-profits and application developers want to run specific tasks using copies of official data, for example analytics and visualization applications, data integration, etc.

Typical use-cases and functions:

  • publisher of Open Data (typically a public body)
    • integration tools for extracting data from internal systems
    • automated and repeatable data harvesting: extraction and processing (conversion, cleansing, anonymization, etc.) of data, both:
      • initial harvesting of whole datasets (first import)
      • periodic harvesting of incremental updates
    • internal storage for the data and metadata
    • data publishing in open and machine-readable formats to the general public and businesses, including automated and efficient distribution of updated data and metadata (dataset replication)
      • integration with data catalogues (like CKAN) for automated publication and updating of dataset metadata
    • internal data catalogue of datasets for maintenance of dataset metadata
  • user of Open Data (citizen, data analyst, etc.)
    • unlike in all other use-cases, the user here merely accesses an ODN instance maintained by someone else; the user is not running their own instance
    • the user downloads data dumps and calls APIs to get the data they are interested in
    • the user may also access the data indirectly, for example via a third-party data catalogue which, in order to show the user a preview or visualization of the data, first has to download that data
  • aggregator of Open Data (public body, NGO, SME, etc.)
    • the same functions as in the previous use-case, plus:
    • support for Open Data Node hierarchies for efficient dataset replication
    • automated and repeatable data integration and linking
  • application developer using Open Data (SME, NGO, public body, etc.)
    • the same functions as in the previous use-cases, but optimized for:
    • tools for automated generation of API and custom API development
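
The two harvesting modes listed for the publisher use-case (initial import of a whole dataset, then periodic incremental updates) can be sketched as follows. This is a minimal illustration, not ODN's actual implementation: the `Record`, `InternalStore`, `process` and `harvest` names, the `revision` counter (assumed to start at 1) and the `personal_id` field are all assumptions made for the example.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Record:
    id: str
    revision: int  # hypothetical change counter supplied by the source system
    data: dict


@dataclass
class InternalStore:
    """Hypothetical stand-in for a node's internal storage."""
    records: dict = field(default_factory=dict)
    last_revision: int = 0  # highest revision harvested so far

    def apply(self, batch):
        for rec in batch:
            self.records[rec.id] = rec
            self.last_revision = max(self.last_revision, rec.revision)


SENSITIVE_FIELDS = {"personal_id"}  # assumed field name, for illustration only


def process(rec):
    """Cleansing and anonymization: trim strings, drop sensitive fields."""
    cleaned = {
        key: value.strip() if isinstance(value, str) else value
        for key, value in rec.data.items()
        if key not in SENSITIVE_FIELDS
    }
    return replace(rec, data=cleaned)


def harvest(source, store):
    """Import every source record newer than what the store already holds.

    On an empty store this is the initial import of the whole dataset;
    on later runs only the incremental updates are taken.
    """
    batch = [process(r) for r in source if r.revision > store.last_revision]
    store.apply(batch)
    return len(batch)


source = [
    Record("a", 1, {"name": " Bratislava ", "personal_id": "123"}),
    Record("b", 2, {"name": "Kosice"}),
]
store = InternalStore()
first = harvest(source, store)   # initial harvesting of the whole dataset
source.append(Record("a", 3, {"name": "Bratislava"}))
second = harvest(source, store)  # periodic harvesting of incremental updates
```

Because the same `harvest` function serves both modes, the run is repeatable: calling it again without new revisions in the source imports nothing.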

ODN hierarchies:

  • when one ODN instance harvests data from another ODN instance (that is, when both the data publisher and the user of the data use ODN), we call that an ODN hierarchy
  • hierarchies can be formed:
    • spontaneously: by both sides merely choosing to use the same tool, without coordination and maybe even without being aware of that fact
    • on purpose: by both sides deliberately coordinating and choosing ODN
  • hierarchies can be:
    • simple: just two ODN instances, one harvesting data from the other, for example a company simply mirroring some public information to deliver better performance to its internal analysts
    • complicated: multiple ODN instances, many organizations, even two-way data flows, etc., for example when ODN is used to integrate multiple information systems at different ministries
  • hierarchies can form mainly because of:
    • geography: for example an aggregator harvesting datasets that relate to a certain location
    • topic: for example an application developer harvesting datasets related to tourism
  • how do two ODN instances cooperate?
    • as ODN is Open Source, no secret or proprietary interface is employed
    • ODN simply harvests data using the same formats and interfaces that are commonly employed to publish (Open) Data
    • but we are making sure that the flow of data between two Open Data Nodes is tested and optimized for multiple possible scenarios (simple dataset mirroring, subset or superset republication, via file dumps, via API, etc.)
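
To make the hierarchy scenarios above more concrete, here is a small sketch of one node mirroring another in full and a second node republishing only a topic subset. It is a heavily simplified model under stated assumptions: the `Node` and `Dataset` classes, the `harvest_from` method, and the `topics` and `location` metadata fields are hypothetical and do not reflect ODN's real interfaces.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    name: str
    topics: frozenset
    location: str


@dataclass
class Node:
    """Hypothetical, heavily simplified model of one ODN instance."""
    name: str
    datasets: dict = field(default_factory=dict)

    def publish(self, dataset):
        self.datasets[dataset.name] = dataset

    def harvest_from(self, upstream, topic=None, location=None):
        """Copy matching datasets from an upstream node; return their names."""
        copied = []
        for ds in upstream.datasets.values():
            if topic is not None and topic not in ds.topics:
                continue  # outside the requested topic
            if location is not None and ds.location != location:
                continue  # outside the requested geography
            self.publish(ds)
            copied.append(ds.name)
        return sorted(copied)


ministry = Node("ministry")
ministry.publish(Dataset("hotels", frozenset({"tourism"}), "Bratislava"))
ministry.publish(Dataset("museums", frozenset({"tourism", "culture"}), "Kosice"))
ministry.publish(Dataset("budgets", frozenset({"finance"}), "Bratislava"))

# Simple hierarchy: a full mirror of everything the upstream node publishes.
mirror = Node("company-mirror")
mirrored = mirror.harvest_from(ministry)

# Topic-driven hierarchy: subset republication for a tourism application.
tourism_app = Node("tourism-app")
tourism_only = tourism_app.harvest_from(ministry, topic="tourism")

# Geography-driven hierarchy: an aggregator for one location.
city_portal = Node("bratislava-portal")
local_only = city_portal.harvest_from(ministry, location="Bratislava")
```

The filters are optional on purpose: leaving both unset corresponds to simple dataset mirroring, while setting either one corresponds to subset republication by topic or geography.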
