7. Deployment

First, we describe the basic deployment scenarios of Open Data Node (ODN), treating ODN as a single unit. Afterwards, we discuss the deployment of ODN's individual modules.

7.1. Basic ODN Deployment Scenarios

Open Data Node can be deployed multiple times by different actors, and each deployment can address the needs specific to that particular actor:

  • government organizations, municipalities, etc., want to publish the majority of their information as Open Data

  • other government bodies need to work with data published by other government bodies

  • non-profits and application developers want to run specific tasks on copies of official data, for example analytics and visualization applications, data integration, etc.

This section describes the basic deployment options for ODN. The options are ordered by achievable publishing quality, from the best (and most expensive) first to the worst (and cheapest) last.

7.1.1. Tight integration, at the publisher's premises


Open Data Node is tightly integrated with the publisher's internal application(s): it has direct access to backend databases or is integrated into the applications' workflows via APIs. ODN is deployed alongside the publisher's internal application(s), so it has to respect the existing network, security and other zones.

Typical scenario: 

  • publisher wants to achieve high quality and efficiency and is willing to invest more

Prerequisites:

  • publisher is willing to update existing workflows and applications

7.1.2. Loose integration


Open Data Node is loosely integrated with the publisher's application(s), using periodic data dumps or APIs. ODN can be deployed in several locations:

  • at the publisher's premises - access to the data is secured in a similar way as in the case of tight integration (Section 7.1.1)

  • in colocation housing, for example a data center shared by multiple government organizations - access to the source data is secured, for example, using a combination of IPsec, HTTPS and access control (authentication and authorization)

  • in the cloud - access to the source data is secured only with HTTPS and access control (not suitable for sensitive data or sensitive internal systems)

Typical scenario:

  • publisher wants to achieve high quality and efficiency but has limited resources, so changes to existing infrastructure and applications have to be kept to a minimum

  • aggregator is affiliated with one or more publishers and is willing and able to invest in tighter integration with them (e.g., a ministry aggregating data from municipalities, or an SME planning to build a business on aggregated data)

Prerequisites:

  • publisher is able to make minor modifications to existing workflows and applications
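The choice of location for a loosely integrated ODN can be read as a small policy: each location comes with example access-security mechanisms, and the cloud is ruled out when the data or source systems are sensitive. The following Python sketch encodes that reading. The names `Location`, `ACCESS_SECURITY`, `access_security` and `suitable_locations` are hypothetical and not part of ODN; the mechanisms listed are only the examples given above, not a complete security specification.

```python
from __future__ import annotations

from enum import Enum


class Location(Enum):
    """Deployment locations for loosely integrated ODN, in the order listed above."""
    PUBLISHER_PREMISES = "publisher's premises"
    COLOCATION = "colocation housing"
    CLOUD = "cloud"


# Example access-security mechanisms per location (illustrative, not exhaustive).
# The premises entry is a placeholder for "the same controls as tight integration".
ACCESS_SECURITY = {
    Location.PUBLISHER_PREMISES: ("same controls as tight integration",),
    Location.COLOCATION: ("IPsec", "HTTPS", "access control"),
    Location.CLOUD: ("HTTPS", "access control"),
}


def access_security(location: Location, sensitive: bool) -> tuple[str, ...]:
    """Return the example mechanisms for a location.

    Raises ValueError for the cloud when data or source systems are sensitive,
    since that combination is described as unsuitable.
    """
    if sensitive and location is Location.CLOUD:
        raise ValueError("cloud deployment is not suitable for sensitive data or systems")
    return ACCESS_SECURITY[location]


def suitable_locations(sensitive: bool) -> list[Location]:
    """List locations in definition order, skipping the cloud for sensitive cases."""
    return [loc for loc in Location if not (sensitive and loc is Location.CLOUD)]
```

For example, `suitable_locations(True)` leaves only the publisher's premises and colocation housing.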

7.1.3. No integration, deployed at 3rd parties


Open Data Node is not explicitly integrated with the publisher's applications; other existing means are used to access the data (Open Data, another format or an API, or in the worst case scraping the data from a website). ODN is deployed by a 3rd party on its own hardware, in colocation housing, or in the cloud.

Typical scenario:

  • publisher wants to publish Open Data but has severely limited resources or options, so existing infrastructure and applications should preferably not be changed at all

  • 3rd party aggregator or application developer wants to use data from one or more publishers but for some reason is not able or willing to implement tighter integration with them

Prerequisites:

  • some usable form of access to source data is possible without changing existing workflows and applications
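Because the scenarios are ordered from best to cheapest, their prerequisites can be read as a simple decision rule: choose the highest-quality scenario whose prerequisite is met. The Python sketch below illustrates that rule only. `PublisherSituation` and `best_scenario` are hypothetical names, and the sketch deliberately ignores budget and who operates the deployment, which the typical scenarios above also take into account.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PublisherSituation:
    """Hypothetical flags mirroring the prerequisites of Sections 7.1.1 to 7.1.3."""
    willing_to_update_workflows: bool  # 7.1.1: update existing workflows and applications
    able_to_make_minor_changes: bool   # 7.1.2: minor modifications only
    has_usable_source_access: bool     # 7.1.3: usable access without any changes


# Scenarios ordered from best (most expensive) to cheapest.
SCENARIOS: list[tuple[str, Callable[[PublisherSituation], bool]]] = [
    ("tight integration", lambda s: s.willing_to_update_workflows),
    ("loose integration", lambda s: s.able_to_make_minor_changes),
    ("no integration", lambda s: s.has_usable_source_access),
]


def best_scenario(situation: PublisherSituation) -> Optional[str]:
    """Return the highest-quality scenario whose prerequisite is met, or None."""
    for name, prerequisite_met in SCENARIOS:
        if prerequisite_met(situation):
            return name
    return None
```

A publisher that can only make minor changes, for instance, would get `"loose integration"`; if no usable access to the source data exists at all, none of the scenarios applies and the function returns `None`.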



7.2. Deployment of ODN Modules

7.2.1. Single Machine

All ODN modules may be deployed on a single machine, as depicted below. Such a deployment requires:

  • Application Server supporting Java 7 EE applications. ODN will be tested on Apache Tomcat 7+. The application server is needed for the management GUIs of the ODN modules.

  • Relational database management system (RDBMS) - The relational database is used mainly to store module configurations (definitions of data transformation and publishing tasks, data catalog configuration, etc.). A relational database is preferred here because the schema of the configurations is known in advance and is not expected to change much. The relational database is also used by ODN/Storage to store tabular data produced by ODN/UnifiedViews. The choice of a particular RDBMS is still pending; however, because object-relational mapping (ORM) frameworks abstract the underlying database system, changing it during the project should be straightforward.

  • RDF Storage - Any RDF store supported by the openRDF API (http://www.openrdf.org/) may be used, e.g., OpenLink Virtuoso (http://virtuoso.openlinksw.com/rdf-quad-store/) or Sesame (http://www.openrdf.org/). The RDF store is used by ODN/UnifiedViews to store intermediate results of data transformations, and by ODN/Storage to store the RDF data produced by ODN/UnifiedViews.

  • HTTP Server - ODN will be tested with Apache HTTP Server. An HTTP server is required by ODN/InternalCatalog, ODN/PublicCatalog and ODN/Publication.

[Figure: Deployment of ODN modules on a single machine (odn-deploy-single.png)]
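To make the list of requirements concrete, the following Python sketch records which supporting components each module needs and reports what is still missing on a target machine. The module-to-component mapping is our hypothetical reading of the list above (for example, it attributes the application server only to ODN/UnifiedViews, although other modules' management GUIs may need it too), and `REQUIREMENTS`, `required_components` and `missing_components` are illustrative names, not part of ODN.

```python
from __future__ import annotations

# Supporting components named in the single-machine requirements.
APP_SERVER, RDBMS, RDF_STORE, HTTP_SERVER = (
    "application server", "RDBMS", "RDF store", "HTTP server")

# Hypothetical module-to-component mapping (one possible reading of the list).
REQUIREMENTS: dict[str, set[str]] = {
    # management GUI, task definitions, intermediate RDF results
    "ODN/UnifiedViews": {APP_SERVER, RDBMS, RDF_STORE},
    # tabular and RDF outputs of ODN/UnifiedViews
    "ODN/Storage": {RDBMS, RDF_STORE},
    # served over HTTP, catalog configuration
    "ODN/InternalCatalog": {HTTP_SERVER, RDBMS},
    "ODN/PublicCatalog": {HTTP_SERVER, RDBMS},
    # served over HTTP, publishing task definitions
    "ODN/Publication": {HTTP_SERVER, RDBMS},
}


def required_components(modules) -> set[str]:
    """Union of the components needed by the given modules."""
    needed: set[str] = set()
    for module in modules:
        needed |= REQUIREMENTS[module]
    return needed


def missing_components(modules, installed) -> list[str]:
    """Components the modules need that are not yet installed, sorted by name."""
    return sorted(required_components(modules) - set(installed))
```

Deploying every module needs all four components, while a machine that hosts only ODN/Publication and already has an RDBMS would still be missing the HTTP server.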

7.2.2. Distributed Environment

ODN modules support distributed deployment across multiple physical devices. Essentially every artifact depicted in the single-machine deployment figure (Section 7.2.1) may be deployed on a different device. If ODN/UnifiedViews - Engine and ODN/UnifiedViews - Management GUI are placed on different devices, the administrator has to set up a shared network file system that both artifacts can use.

We typically expect such a deployment in large organizations whose IT architecture is more complex, driven (among other things) by security requirements. In this case:

  • ODN/UnifiedViews is expected to be deployed within an internal, restricted segment along with ODN/Management,

  • ODN/Publication is expected to be in a DMZ segment accessible from the outside by the general public (like a typical web server), and

  • ODN/Storage is expected to be in an internal or other appropriate segment, reachable by both ODN/UnifiedViews and ODN/Publication.
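The expected layout above can be checked mechanically before rollout. The following Python sketch validates a planned placement against these expectations, including the shared network file system needed when the UnifiedViews engine and management GUI run on different hosts. Everything here is hypothetical: the `Placement` type, the segment labels `"internal"` and `"dmz"`, the host names, and the `allowed_flows` model of firewall rules. It also assumes that the UnifiedViews engine (rather than the GUI) is the component that talks to ODN/Storage.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    host: str     # hypothetical host name
    segment: str  # hypothetical segment label, e.g. "internal" or "dmz"


ENGINE = "ODN/UnifiedViews - Engine"
GUI = "ODN/UnifiedViews - Management GUI"


def check_placement(placement: dict[str, Placement],
                    allowed_flows: set[tuple[str, str]],
                    shared_fs: bool) -> list[str]:
    """Return violations of the expected distributed layout (empty list if none).

    allowed_flows holds (source_segment, target_segment) pairs permitted by the
    firewall; traffic within a single segment is assumed to be allowed.
    """
    problems: list[str] = []
    for module in (ENGINE, GUI, "ODN/Management"):
        if placement[module].segment != "internal":
            problems.append(f"{module} should be in the internal segment")
    if placement["ODN/Publication"].segment != "dmz":
        problems.append("ODN/Publication should be in the DMZ segment")
    storage_segment = placement["ODN/Storage"].segment
    # Assumption: the UnifiedViews engine is the component that accesses ODN/Storage.
    for client in (ENGINE, "ODN/Publication"):
        source = placement[client].segment
        if source != storage_segment and (source, storage_segment) not in allowed_flows:
            problems.append(f"{client} cannot reach ODN/Storage")
    if placement[ENGINE].host != placement[GUI].host and not shared_fs:
        problems.append("Engine and Management GUI on different hosts need a shared network file system")
    return problems


layout = {
    ENGINE: Placement("uv-engine", "internal"),
    GUI: Placement("uv-gui", "internal"),
    "ODN/Management": Placement("uv-gui", "internal"),
    "ODN/Storage": Placement("storage", "internal"),
    "ODN/Publication": Placement("web", "dmz"),
}
print(check_placement(layout, {("dmz", "internal")}, shared_fs=True))  # []
```

Without the DMZ-to-internal flow and without a shared file system, the same layout would report two problems: ODN/Publication cannot reach ODN/Storage, and the split UnifiedViews hosts lack shared storage.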

7.2.3. Custom Environment

Because the design of ODN is modular (driven in particular by our vision, plans and engineering experience, but also, more generally, by best practices in software and Open Source development), individual users will be able not only to move individual modules between multiple machines, but also to skip (not deploy) modules they do not need.

For example, ODN/InternalCatalog and ODN/PublicCatalog may be skipped in favor of integrating ODN/UnifiedViews and ODN/Publication directly with a particular national data catalog.
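Skipping modules is safe only if nothing that remains depends on them. The Python sketch below checks a custom module selection under that idea: if no ODN catalog is deployed, an external (e.g. national) catalog must be named instead. The dependencies of ODN/UnifiedViews and ODN/Publication on ODN/Storage are our inference from the distributed deployment description, not a documented ODN rule, and `plan_deployment` and the example catalog URL are hypothetical.

```python
from __future__ import annotations

ALL_MODULES = {
    "ODN/UnifiedViews", "ODN/Storage", "ODN/Publication",
    "ODN/InternalCatalog", "ODN/PublicCatalog", "ODN/Management",
}
CATALOGS = {"ODN/InternalCatalog", "ODN/PublicCatalog"}

# Assumed dependencies: both modules read or write ODN/Storage.
DEPENDS_ON = {
    "ODN/UnifiedViews": {"ODN/Storage"},
    "ODN/Publication": {"ODN/Storage"},
}


def plan_deployment(skip: set[str],
                    external_catalog: str | None = None) -> tuple[list[str], str | None]:
    """Return (modules to deploy, external catalog target or None).

    Raises ValueError if a skipped module is still needed, or if both ODN
    catalogs are skipped without naming an external catalog to integrate with.
    """
    unknown = set(skip) - ALL_MODULES
    if unknown:
        raise ValueError(f"unknown modules: {sorted(unknown)}")
    deployed = ALL_MODULES - set(skip)
    for module in sorted(deployed):
        missing = DEPENDS_ON.get(module, set()) - deployed
        if missing:
            raise ValueError(f"{module} requires {sorted(missing)}")
    if not (CATALOGS & deployed):
        if external_catalog is None:
            raise ValueError("no ODN catalog deployed: name an external catalog "
                             "for ODN/UnifiedViews and ODN/Publication")
        return sorted(deployed), external_catalog
    return sorted(deployed), None
```

For instance, `plan_deployment(CATALOGS, "https://catalog.example.org")` returns the remaining four modules together with that hypothetical catalog URL, whereas skipping ODN/Storage alone is rejected.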

Note: Given a properly modular design and implementation, the range of possible options and combinations is wide and depends strongly on the particular user, their needs and use cases, so we cannot (and will not) document all of them here. Some of the most common cases will be explained later in the ODN documentation and the COMSODE Methodology.
