General Formatting
- Use ".csv" as the file extension
- File names in snake case; i.e., lowercase with underscores instead of spaces, like "example_file_name.csv"
- Single header row with descriptive column names
- Include one column that can be used as the unique identifier; the data type of that column can be either text or number
- Do not include URLs in the data file
- Columns delimited by commas
- No empty lines or rows
- Each row should have the same number of columns
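Several of the rules above can be checked mechanically before a file is shared. The following is a minimal sketch using only the Python standard library; the function name `check_general_formatting`, the regular expression, and the messages are illustrative assumptions, not part of any official validator. The unique-identifier test is a heuristic (it only looks for a column whose values are all distinct), and the empty-line test assumes that no quoted field spans multiple lines.

```python
import csv
import io
import re

# Assumption: "snake case" means lowercase letters and digits joined by underscores.
SNAKE_CASE_FILE = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*\.csv$")


def check_general_formatting(file_name, text):
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    if not SNAKE_CASE_FILE.match(file_name):
        problems.append("file name is not snake case with a .csv extension")

    # Assumes no quoted field contains a line break.
    if any(line.strip() == "" for line in text.splitlines()):
        problems.append("file contains empty lines")

    rows = list(csv.reader(io.StringIO(text)))
    if not rows or not rows[0]:
        problems.append("file has no header row")
        return problems

    # Every non-empty row should have as many columns as the header.
    width = len(rows[0])
    for number, row in enumerate(rows[1:], start=2):
        if row and len(row) != width:
            problems.append(f"row {number} has {len(row)} columns, expected {width}")

    # Heuristic: at least one column should hold a distinct value in every row.
    data = [row for row in rows[1:] if len(row) == width]
    has_identifier = any(
        len({row[i] for row in data}) == len(data) for i in range(width)
    )
    if not has_identifier:
        problems.append("no column can serve as a unique identifier")

    return problems
```

For example, `check_general_formatting("example_file_name.csv", "site_id,depth_cm\nA1,4\nA2,6\n")` returns an empty list.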
Spatial Data
- Latitude and longitude in separate columns with names "latitude" and "longitude" or "lat" and "lon"
- Latitude and longitude given as decimal degrees
- If a data file is not based on the WGS 84 geographic coordinate system (i.e., EPSG:4326), use CoordX and CoordY as the column names
- An optional *.prj file can be used to explicitly specify the projected coordinate system
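The latitude and longitude rules above lend themselves to a simple check. The sketch below looks for one of the two accepted column-name pairs and confirms that each value parses as a decimal number within the valid range. The function names and messages are hypothetical. Files that use CoordX and CoordY for a projected coordinate system are outside the scope of this check, and a row that uses the "-9999" missing-value code would be reported as out of range.

```python
import csv
import io

# Accepted column-name pairs, taken from the guideline above.
COORD_NAME_PAIRS = [("latitude", "longitude"), ("lat", "lon")]


def find_coordinate_columns(header):
    """Return the (latitude, longitude) column names in use, or None."""
    for lat_name, lon_name in COORD_NAME_PAIRS:
        if lat_name in header and lon_name in header:
            return lat_name, lon_name
    return None


def check_decimal_degrees(text):
    """Return a list of problems with the coordinate columns of a CSV string."""
    reader = csv.DictReader(io.StringIO(text))
    names = find_coordinate_columns(reader.fieldnames or [])
    if names is None:
        return ["no latitude/longitude column pair found"]

    lat_name, lon_name = names
    problems = []
    for number, row in enumerate(reader, start=2):  # row 1 is the header
        try:
            lat = float(row[lat_name])
            lon = float(row[lon_name])
        except (TypeError, ValueError):
            # Values such as degrees-minutes-seconds strings end up here.
            problems.append(f"row {number}: coordinates are not decimal numbers")
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append(f"row {number}: coordinates are out of range")
    return problems
```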
Variables and Values
- Column names in snake case; i.e., lowercase with underscores instead of spaces, like "example_column_name"
- Missing data in numeric columns coded as "-9999", using the appropriate level of precision
- Missing data for text columns coded as "NA"
- If a text field includes commas or quotes, save the file with the option of placing quotes around all text fields
- Dates and times in 24-hour UTC; use local time zones only in addition to UTC
- Dates in YYYY-MM-DD format
- Times in hh:mm:ss format (or hh:mm:ss+nn if a time zone needs to be included)
- Named sites and locations should have an associated geographic location
- Columns should contain only text values or only numeric values, not a mixture of both. While many tools (such as Excel)
will let you mix text values and numeric values in a single column, such as in the example below, this makes the data much
more difficult to work with.
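The missing-value codes and the date and time formats in the list above can be applied when values are written out. The following is a small sketch; the helper names are hypothetical, and writing the numeric missing-value code as a bare "-9999" regardless of the column's decimal places is an assumption, since the guideline does not spell that detail out.

```python
from datetime import datetime, timedelta, timezone

MISSING_NUMERIC = "-9999"  # code for missing numeric values
MISSING_TEXT = "NA"        # code for missing text values


def format_numeric(value, decimals):
    """Format a number with a fixed precision, or return the missing-value code."""
    if value is None:
        return MISSING_NUMERIC
    return f"{value:.{decimals}f}"


def format_text(value):
    """Return the text unchanged, or the missing-value code if it is absent."""
    return MISSING_TEXT if value is None or value == "" else value


def utc_date_and_time(moment):
    """Convert a timezone-aware datetime to (YYYY-MM-DD, hh:mm:ss) in UTC."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%d"), utc.strftime("%H:%M:%S")


# A local reading taken at 18:30 in a UTC-8 zone falls on the next day in UTC.
local = datetime(2021, 3, 4, 18, 30, 0, tzinfo=timezone(timedelta(hours=-8)))
date_utc, time_utc = utc_date_and_time(local)
```

Rejecting naive datetimes is a deliberate choice in this sketch: without a known offset, there is no reliable way to convert a local time to UTC.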
The following is an example of what you should not do -- mixing numeric and text values in a single column:
estimated_depth    shrub_cover
4                  30
6                  > 75
4 to 5             25
5                  65
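Columns that mix numeric and text values, as in the example above, can be detected automatically. This is a minimal sketch, assuming that a value counts as numeric when Python's `float()` accepts it and that empty cells and the "NA" code are ignored; the function names are illustrative.

```python
import csv
import io


def is_number(value):
    """Return True if the value parses as a floating-point number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def find_mixed_columns(text):
    """Return the names of columns that hold both numeric and text values."""
    reader = csv.DictReader(io.StringIO(text))
    kinds = {name: set() for name in reader.fieldnames or []}
    for row in reader:
        for name in kinds:
            value = (row[name] or "").strip()
            if value in ("", "NA"):
                continue  # empty cells and the text missing-value code are skipped
            kinds[name].add("number" if is_number(value) else "text")
    return [name for name, seen in kinds.items() if len(seen) > 1]


# The "what not to do" example, written out as CSV text.
bad_example = "estimated_depth,shrub_cover\n4,30\n6,> 75\n4 to 5,25\n5,65\n"
mixed = find_mixed_columns(bad_example)
```

Here `mixed` contains both column names, because "4 to 5" and "> 75" are text values in otherwise numeric columns.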
See Best Practices for Data Management for additional guidance.