CSV files

You can use a CSV file (.csv) as a source.

A CSV file contains a list of fields typically separated by commas or semicolons.

Commas are commonly used in English-language files, whereas files from locales where the comma is the decimal separator, such as France, often use semicolons instead.

You can also use similar file types:

Tab-Separated Values (TSV) files (.tsv)
Text files (.txt)
DAT files (.dat)

For more information about adding a file source, see Creating a dataset from a local file.

Configuration

See here for a basic summary of configuring a data source.

When you upload a CSV or related file, you set the options described in the tables below by clicking the corresponding buttons.

For example, clicking Row structure lets you modify the field separator if necessary.

Encoding	Description	Usage
Choice of file encoding	Encoding of the file. Character encoding is the way characters are represented in a saved file. UTF-8, an encoding of Unicode, is the most widely used standard, but some files might be saved in a legacy encoding (for example, by old versions of Excel), in which case the encoding must be set manually. For files produced by modern software, this is usually unnecessary.	By default, the platform uses a heuristic to guess the encoding. If the guessed encoding is wrong, select the correct encoding from the list or enter it in the "Other" text box. You can use any encoding alias recognized by Python (for example, `latin_1` or `cp1252`).
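
To check which value to enter in the "Other" text box, it can help to test an alias and an encoding locally first. The sketch below uses Python's standard `codecs` module. The helper names `canonical_encoding` and `decode_with_fallback` are hypothetical, and the fallback order is an assumption made for illustration; it is not the heuristic the platform uses.

```python
import codecs


def canonical_encoding(alias):
    """Return Python's canonical codec name for an alias.

    Raises LookupError if Python does not know the alias.
    """
    return codecs.lookup(alias).name


def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Try each encoding in order and return (text, encoding_used).

    This is a simple illustrative fallback, not the platform's heuristic.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin_1 maps every possible byte, so this last resort never fails,
    # although the resulting characters may be wrong.
    return raw.decode("latin_1"), "latin_1"


# Different aliases resolve to the same codec.
same_codec = canonical_encoding("latin1") == canonical_encoding("iso-8859-1")

# Bytes written by a legacy tool in Windows-1252 are not valid UTF-8,
# so the helper falls through to the second candidate.
legacy_bytes = "café".encode("cp1252")
text, used = decode_with_fallback(legacy_bytes)
```

If the decoded text shows the expected accented characters, the encoding reported by the helper is a reasonable candidate to select on the platform.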

Row structure	Description	Usage
Field separator	Character used to separate fields	Enter the separator in the text box. The default value depends on the file format. Common values are `';'`, `','`, `' '` and `'\t'`.
Escape character	Character that cancels the special meaning of the separator that immediately follows it: a separator preceded by the escape character is treated as part of the field value rather than as a field boundary.	By default, the text box is empty. If the file uses an escape character (for example, `#` or `\`), enter it in the text box.
Quoted fields	For fields whose values are enclosed in double quotes	By default, this option is toggled on. Toggle it off if the field values are not enclosed in double quotes.
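
The three row structure options correspond closely to parameters of Python's standard `csv` module, which can be used to preview how a file will be split. This is a minimal sketch under that assumption; the function `parse_rows` and its parameter names are hypothetical, and the platform's own parser may behave differently in edge cases.

```python
import csv
import io


def parse_rows(text, separator=";", escape_char=None, quoted_fields=True):
    """Split CSV text into rows using the three row structure options."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=separator,       # "Field separator"
        escapechar=escape_char,    # "Escape character" (None means disabled)
        # "Quoted fields": when off, double quotes are kept as literal text.
        quoting=csv.QUOTE_MINIMAL if quoted_fields else csv.QUOTE_NONE,
    )
    return list(reader)


# A semicolon inside a quoted field is not treated as a separator.
quoted = parse_rows('name;price\n"Dupont; Jean";1,5\n')

# With a backslash as escape character, the escaped separator stays in the value.
escaped = parse_rows("a\\;b;c\n", escape_char="\\", quoted_fields=False)

# With quoted fields off, the double quotes are part of the value.
literal_quotes = parse_rows('"a";b\n', quoted_fields=False)
```

In the first example, the decimal comma in `1,5` is left untouched because the separator is a semicolon, which is why semicolon-separated files are common in locales that use a decimal comma.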

Data start point	Description	Usage
First line number	For files in which the data does not start on the first line, defines which line is treated as the first one. The lines above it are excluded from the dataset.	By default, the dataset starts at line 1. Enter the number of the line where the dataset starts.
Header	For files whose first line contains field names	By default, this option is toggled on, and the values of the first line are used as field labels. Toggle off this option if the first line contains data rather than field names; the field labels are then empty by default.
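
The effect of the two data start point options can be sketched in Python as follows. The function `read_dataset` and its parameters are hypothetical names chosen for illustration. The sketch assumes that line numbers are 1-based, as in the table above, and that skipped lines are counted as physical lines of the file.

```python
import csv
import io
from itertools import islice


def read_dataset(text, first_line=1, header=True, separator=","):
    """Return (labels, data_rows) after applying the data start point options."""
    lines = io.StringIO(text)
    # Skip the lines above `first_line` (1-based), then parse the rest.
    rows = list(csv.reader(islice(lines, first_line - 1, None), delimiter=separator))
    if header and rows:
        # Header on: the first remaining line provides the field labels.
        return rows[0], rows[1:]
    # Header off: every line is data and the labels are empty.
    width = len(rows[0]) if rows else 0
    return [""] * width, rows


# A file with two preamble lines before the actual table.
sample = "Exported report\n\nid,name\n1,Ada\n2,Linus\n"
labels, data = read_dataset(sample, first_line=3)

# A file with no header line at all.
no_labels, all_data = read_dataset("1,Ada\n2,Linus\n", header=False)
```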

Extract filename	Description	Usage
Extract filename toggle	Creates a new column with the name of the source file.	By default, this option is off. Toggle on this option to extract the file name in an additional column.
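
As an illustration of what the extracted column looks like, the sketch below appends the source file name to every row. The function `add_filename_column`, the example path, and the column label `filename` are assumptions made for this example; the label the platform actually assigns is not specified here.

```python
import csv
import io
from pathlib import PurePath


def add_filename_column(text, source_path, column_label="filename", separator=","):
    """Return the parsed rows with one extra column holding the file name."""
    name = PurePath(source_path).name  # keep only the last path component
    rows = list(csv.reader(io.StringIO(text), delimiter=separator))
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    # The header row gets the new label; every data row gets the file name.
    return [header + [column_label]] + [row + [name] for row in data]


rows_with_name = add_filename_column("id,total\n1,10\n", "uploads/sales_2023.csv")
```

This kind of column is mainly useful when several files feed the same dataset and each record needs to be traced back to the file it came from.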