  1. CSV is one of the most popular formats for publishing data on the web. It is concise, easy for both humans and computers to understand, and aligns nicely with the tabular nature of most data.

    But CSV is also a poor format for data. There is no mechanism within CSV to indicate the type of data in a particular column, or whether values in a particular column must be unique. It is therefore hard to validate and prone to errors such as missing values or differing data types within a column.
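
    As a rough illustration of these problems, the sketch below reads a small, made-up CSV sample with Python's standard `csv` module. The sample data, the column names (`id`, `city`, `population`) and the `find_problems` helper are all hypothetical and are not taken from the standards. The sketch only shows that every value arrives as a plain string, so type and uniqueness checks have to be hand-written for each file.

```python
import csv
import io

# Hypothetical sample data: the "id" column repeats a value, and the
# "population" column mixes a number, an empty value, and free text.
SAMPLE = """id,city,population
1,Oslo,709000
2,Bergen,
2,Tromso,unknown
"""


def find_problems(text):
    """Return human-readable problems found by ad hoc, hand-written checks."""
    problems = []
    seen_ids = set()
    # csv.DictReader yields every value as a string; CSV itself carries
    # no datatype or uniqueness information. The header is line 1, so
    # data rows start at line 2.
    reader = csv.DictReader(io.StringIO(text))
    for line_number, row in enumerate(reader, start=2):
        if row["id"] in seen_ids:
            problems.append(f"line {line_number}: duplicate id {row['id']}")
        seen_ids.add(row["id"])

        value = row["population"]
        if value == "":
            problems.append(f"line {line_number}: missing population")
        elif not value.isdigit():
            problems.append(
                f"line {line_number}: non-numeric population {value!r}"
            )
    return problems


problems = find_problems(SAMPLE)
for problem in problems:
    print(problem)
```

    Nothing in the CSV file itself says that `id` should be unique or that `population` should be an integer. Those expectations exist only in the checking code, which is the gap that separate metadata about the file is meant to fill.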

    The CSV on the Web Working Group has developed standard ways to express useful metadata about CSV files and other kinds of tabular data. This primer takes you through the ways in which these standards work together, covering:

    • What we mean by "tabular data" and "CSV"
    • Where files that provide metadata about CSV live
    • How to create a schema to validate the content of a CSV file
    • How to specify how a CSV file should be converted to RDF or JSON
    • How to provide other documentation and metadata about a CSV file

    Where possible, this primer links back to the normative definitions of terms and properties in the standards. Nothing in this primer overrides those normative definitions.