DPUs (Data Processing Units) are used for working with datasets in the Unified Views module. Users and administrators use them to build pipelines that execute tasks in ODN. They are the basic building blocks of a pipeline.

 

If you would like to create a new DPU, refer to https://github.com/UnifiedViews/Plugin-DevEnv

 

List of core DPUs to work with in Unified Views

DPUs let you perform the requested actions on your data, datasets, files or lists of files. In the overview below, each action is followed by the number of the corresponding DPU in this list.

Basically, DPUs are classified by their initial letter as:

E- extracting tools

T- transformation tools

L- loading tools

Briefly:

  • upload (4) / download (1), zip (22) / unzip (20), merge (8, 13), filter (6), rename (9)
  • convert among formats or types of data / files (3, 5, 19)
  • extract data, with following actions (2, 10)
  • transform data (14, 15, 16, 17, 18, 21)
  • extract metadata (12)
  • search and replace strings or patterns (7)
  • validate XML (11)

 

 

*DPU - a plugin for data processing pipelines that performs a certain transformation, cleansing or quality assessment step on the processed data. A DPU encapsulates the business logic needed when processing data (e.g., one DPU may extract data from a SPARQL endpoint, another may apply a SPARQL query). Every DPU must define its required and optional inputs and the outputs it produces.
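
The requirement that every DPU declares its inputs and outputs can be pictured with a small model. The sketch below is an illustration in Python, not the UnifiedViews plugin API: the names Port, DpuDescriptor and can_connect are hypothetical, and the rule that two DPUs can be connected when their data unit types match is an assumption made for this example. The port data is taken from the DPU list on this page.

```python
from dataclasses import dataclass, field

# Hypothetical model for illustration only; this is not the UnifiedViews API.
KINDS = {"E": "extractor", "T": "transformer", "L": "loader"}


@dataclass(frozen=True)
class Port:
    name: str        # e.g. "input", "output"
    direction: str   # "i" for input, "o" for output
    data_unit: str   # e.g. "FilesDataUnit", "RDFDataUnit"


@dataclass
class DpuDescriptor:
    name: str
    ports: list = field(default_factory=list)

    @property
    def kind(self):
        # The initial letter of the DPU name encodes its class (E, T or L).
        return KINDS[self.name.split("-", 1)[0].upper()]

    def inputs(self):
        return [p for p in self.ports if p.direction == "i"]

    def outputs(self):
        return [p for p in self.ports if p.direction == "o"]


def can_connect(source, target):
    """Assumed rule: some output of `source` has the same data unit
    type as some input of `target`."""
    produced = {p.data_unit for p in source.outputs()}
    return any(p.data_unit in produced for p in target.inputs())


files_download = DpuDescriptor(
    "E-FilesDownload", [Port("output", "o", "FilesDataUnit")]
)
unzipper = DpuDescriptor(
    "T-UnZipper",
    [Port("input", "i", "FilesDataUnit"), Port("output", "o", "FilesDataUnit")],
)
rdf_to_files = DpuDescriptor(
    "T-RdfToFiles",
    [Port("input", "i", "RDFDataUnit"), Port("output", "o", "FilesDataUnit")],
)
```

Under these assumptions, can_connect(files_download, unzipper) holds because both sides use a FilesDataUnit, while can_connect(files_download, rdf_to_files) does not because T-RdfToFiles expects an RDFDataUnit.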

 

1. E-FilesDownload

 

Downloads list of files. Replaces E-FilesFromLocal and E-HttpDownload.

Name | Type | DataUnit | Description
output | o | FilesDataUnit | Downloaded files.

https://github.com/UnifiedViews/Plugins/tree/master/e-filesDownload

 

2. E-RelationalFromSql

Extracts data from external relational database tables into internal database.

Name | Type | DataUnit | Description
outputTables | o | RelationalDataUnit | Extracted database tables

https://github.com/UnifiedViews/Plugins/tree/master/e-relationalFromSql

 

3. L-FilesToVirtuoso

VirtuosoLoader calls Virtuoso internal functions to load a directory of RDF data.

Name | Type | DataUnit | Description
TODO: provide Name, DataUnit and Description of the input | i | |

https://github.com/UnifiedViews/Plugins/tree/master/l-filesToVirtuoso

 

4. L-FilesUpload

Uploads list of files. Replaces L-FilesToLocalFS and L-FilesToScp.

Name | Type | DataUnit | Description
filesInput | i | FilesDataUnit | Files to upload to specified destination.

https://github.com/UnifiedViews/Plugins/tree/master/l-filesUpload

 

5. L-RelationalToSql

Loads internal database tables from the input into an external SQL database (currently PostgreSQL is supported).

Name | Type | DataUnit | Description
inTablesData | i | RelationalDataUnit | Input database tables

https://github.com/UnifiedViews/Plugins/tree/master/l-relationalToSql

 

6. T-FilesFilter

Filters files.

Name | Type | DataUnit | Description
input | i | FilesDataUnit | List of files to be filtered.
output | o | FilesDataUnit | List of files passing the filter.

https://github.com/UnifiedViews/Plugins/tree/master/t-filesFilter
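
The filtering step can be illustrated with a short sketch. It assumes the filter is a shell-style pattern applied to file names; the actual filter options of T-FilesFilter are described in the plugin repository, and the function name filter_files is hypothetical.

```python
import fnmatch


def filter_files(names, pattern):
    """Return the file names matching a shell-style pattern.

    The pattern-based rule is an assumption for illustration; it is not
    taken from the T-FilesFilter implementation.
    """
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


files = ["data.csv", "readme.txt", "2015/report.csv"]
passed = filter_files(files, "*.csv")
```

Here passed plays the role of the output data unit: it contains only the entries of the input list that pass the filter.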

 

7. T-FilesFindAndReplace

Finds and replaces strings (patterns) in files.

Name | Type | DataUnit | Description
filesInput | i | FilesDataUnit | Input files
filesOutput | o | FilesDataUnit | Output files

https://github.com/UnifiedViews/Plugins/tree/master/t-filesFindAndReplace
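
As an illustration of what finding and replacing patterns in files means, the sketch below applies one regular expression to every file in an input directory and writes the results to an output directory. It is not the DPU's code: the function find_and_replace, the directory layout and the sample date pattern are assumptions made for the example.

```python
import re
import tempfile
from pathlib import Path


def find_and_replace(in_dir, out_dir, pattern, replacement):
    """Copy each file from in_dir to out_dir, replacing every regex match."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(in_dir).iterdir()):
        if not src.is_file():
            continue
        text = src.read_text(encoding="utf-8")
        target = out_dir / src.name
        target.write_text(re.sub(pattern, replacement, text), encoding="utf-8")
        written.append(target)
    return written


# Demo on a temporary directory: rewrite ISO dates as DD.MM.YYYY.
with tempfile.TemporaryDirectory() as tmp:
    source_dir = Path(tmp, "in")
    source_dir.mkdir()
    (source_dir / "a.txt").write_text("date: 2015-01-31\n", encoding="utf-8")
    outputs = find_and_replace(
        source_dir, Path(tmp, "out"), r"(\d{4})-(\d{2})-(\d{2})", r"\3.\2.\1"
    )
    result = outputs[0].read_text(encoding="utf-8")
```

The input files are left untouched and the modified copies form the output, which mirrors the filesInput / filesOutput pair of the DPU.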

 

8. T-FilesMerger

Merges file inputs.

Name | Type | DataUnit | Description
filesInput | i | FilesDataUnit | DataUnit to which the user connects all inputs that have to be merged.
filesOutput | o | FilesDataUnit | DataUnit which outputs all files from the input.

https://github.com/UnifiedViews/Plugins/tree/master/t-filesMerger

 

9. T-FilesRenamer

Renames files.

Name | Type | DataUnit | Description
inFilesData | i | FilesDataUnit | File name to be modified.
outFilesData | o | FilesDataUnit | File name after modification.

https://github.com/UnifiedViews/Plugins/tree/master/t-filesRenamer

 

10. T-FilesToRdf

Extracts RDF data from files (any file format) and adds it to the RDF output.

Name | Type | DataUnit | Description
filesInput | i | FilesDataUnit | Input file containing data.
rdfOutput | o | RDFDataUnit | RDF data extracted.

https://github.com/UnifiedViews/Plugins/tree/master/t-filesToRdf

 

11. T-FilterValidXml

Validates XML inputs in three ways: checks whether the XML is well-formed, checks whether it conforms to a specified XSD schema, and validates it using a specified XSLT template.

Name | Type | DataUnit | Description
input | i | FilesDataUnit | List of files to be validated.
outputValid | o | FilesDataUnit | List of files passing the validation.
outputInvalid | o | FilesDataUnit | List of files that do not pass the validation.

https://github.com/UnifiedViews/Plugins/tree/master/t-filterValidXml
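
The first of the three checks, well-formedness, can be sketched with the Python standard library. The example below only illustrates how inputs are split into a valid and an invalid output; it does not cover the XSD or XSLT checks, and the function name split_by_well_formedness is hypothetical.

```python
import xml.etree.ElementTree as ET


def split_by_well_formedness(documents):
    """Split a mapping of name -> XML text into (valid, invalid) name lists.

    Only well-formedness is checked here; XSD and XSLT validation would
    need additional tooling and are outside this sketch.
    """
    valid, invalid = [], []
    for name, content in documents.items():
        try:
            ET.fromstring(content)
        except ET.ParseError:
            invalid.append(name)
        else:
            valid.append(name)
    return valid, invalid


docs = {
    "ok.xml": "<root><item/></root>",
    "broken.xml": "<root><item></root>",  # <item> is never closed
}
output_valid, output_invalid = split_by_well_formedness(docs)
```

Every input ends up in exactly one of the two lists, which corresponds to the two output data units of the DPU.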

 

12. T-Metadata

Generates metadata describing the input data and provides it on the output.

Name | Type | DataUnit | Description
data | i | RDFDataUnit | Data to be described.
metadata | o | RDFDataUnit | Descriptive data.

https://github.com/UnifiedViews/Plugins/tree/master/t-metadata

 

13. T-RdfMerger

Merges RDF data in no time.

Name | Type | DataUnit | Description
rdfInput | i | RDFDataUnit | DataUnit to which the user connects all inputs that have to be merged.
rdfOutput | o | RDFDataUnit | DataUnit which outputs all input graphs.

https://github.com/UnifiedViews/Plugins/tree/master/t-rdfMerger

 

14. T-RdfToFiles

Transforms RDF graphs into files.

Name | Type | DataUnit | Description
input | i | RDFDataUnit | RDF graph.
output | o | FilesDataUnit | File containing RDF triples.

https://github.com/UnifiedViews/Plugins/tree/master/t-rdfToFiles

 

15. T-Relational

Transforms N input tables into 1 output table using SELECT SQL queries.

Name | Type | DataUnit | Description
inputTables | i | RelationalDataUnit | Source database tables
outputTable | o | RelationalDataUnit | Output (transformed) table

https://github.com/UnifiedViews/Plugins/tree/master/t-relational
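
To make "N input tables into 1 output table using SELECT" concrete, the sketch below joins two input tables into one output table. It uses an in-memory SQLite database purely for illustration; the table names, columns and values are made up, and the internal database used by Unified Views may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical input tables with made-up sample rows.
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1, 10), (1, 5), (2, 7);
    """
)

# A single SELECT combines both inputs into one output table.
conn.execute(
    """
    CREATE TABLE output_table AS
    SELECT c.name AS customer, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    """
)

rows = conn.execute(
    "SELECT customer, total FROM output_table ORDER BY customer"
).fetchall()
conn.close()
```

The SELECT statement is the only part that would be supplied by the pipeline author; creating and naming the output table is handled here by CREATE TABLE ... AS for the sake of a runnable example.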

 

16. T-SparqlConstruct

Transforms input using SPARQL construct.

Name | Type | DataUnit | Description
input | i | RDFDataUnit | RDF input
output | o | RDFDataUnit | RDF output (transformed)

https://github.com/UnifiedViews/Plugins/tree/master/t-sparqlConstruct

 

17. T-SparqlUpdate

Transforms input using SPARQL update.

Name | Type | DataUnit | Description
input | i | RDFDataUnit | RDF input
output | o | RDFDataUnit | RDF output (transformed)

https://github.com/UnifiedViews/Plugins/tree/master/t-sparqlUpdate

 

18. T-Tabular

Converts tabular data into RDF data.

Name | Type | DataUnit | Description
table | i | FilesDataUnit | Input file containing tabular data.
triplifiedTable | o | RDFDataUnit | RDF data.

https://github.com/UnifiedViews/Plugins/tree/master/t-tabular
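
One simple way to picture the conversion of tabular data into RDF is to turn every cell into a triple. The sketch below does this for CSV input and emits N-Triples lines. The mapping is an assumption for illustration only (one subject per row, one predicate per column, plain string literals, the base URI http://example.org/, column names that are safe to use in a URI); it is not the mapping implemented by T-Tabular.

```python
import csv
import io


def csv_to_ntriples(text, base="http://example.org/"):
    """Convert CSV text into a list of N-Triples lines.

    Assumed mapping: row i becomes <base>row/i, column c becomes the
    predicate <base>column/c, and every cell becomes a string literal.
    """
    triples = []
    reader = csv.DictReader(io.StringIO(text))
    for index, row in enumerate(reader, start=1):
        subject = f"<{base}row/{index}>"
        for column, value in row.items():
            # Escape backslashes and quotes so the literal stays valid.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{base}column/{column}> "{escaped}" .')
    return triples


sample = "name,city\nAnna,Prague\n"
triples = csv_to_ntriples(sample)
```

Each data row of the sample yields one triple per column, so the single row above produces two triples that share the same subject.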

 

19. T-TabularToRelational

Parses a tabular file into a relational data unit.

Name | Type | DataUnit | Description
input | i | FilesDataUnit | List of files to parse.
output | o | RelationalDataUnit | Relational dataunit with parsed data.

https://github.com/UnifiedViews/Plugins/tree/master/t-tabularToRelational

 

20. T-UnZipper

Unzips the input file into files based on the zip content.

Name | Type | DataUnit | Description
input | i | FilesDataUnit | File to unzip.
output | o | FilesDataUnit | List of unzipped files.

https://github.com/UnifiedViews/Plugins/tree/master/t-unzipper

 

21. T-Xslt

Performs an XSL transformation on files and outputs files.

Name | Type | DataUnit | Description
files | i | FilesDataUnit | File to be transformed.
files | o | FilesDataUnit | Transformed file of given type.
config | i | RDFDataUnit | Configuration (template parameters).

https://github.com/UnifiedViews/Plugins/tree/master/t-xslt

 

22. T-Zipper

Zips input files into a zip file of the given name.

Name | Type | DataUnit | Description
input | i | FilesDataUnit | List of files to zip.
output | o | FilesDataUnit | Name of zip file.

https://github.com/UnifiedViews/Plugins/tree/master/t-zipper
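
The zip operation, and its reverse as performed by T-UnZipper, can be sketched with the Python standard library. The example works on in-memory data and the function names zip_files and unzip_files are hypothetical; it illustrates the operation and is not the DPU code.

```python
import io
import zipfile


def zip_files(files):
    """Pack a mapping of file name -> bytes into a zip archive (as bytes)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def unzip_files(data):
    """Unpack zip archive bytes into a mapping of file name -> bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


original = {"a.txt": b"first", "dir/b.txt": b"second"}
archive_bytes = zip_files(original)
restored = unzip_files(archive_bytes)
```

Zipping a list of files and unzipping the result gives back the same names and contents, including relative paths inside the archive.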