Batch Upload: uploading large quantities of data

Introduction

Manta research projects that already maintain an offline database of manta encounters may wish to perform a “batch upload” to reduce data entry time. This feature allows many encounters to be uploaded to the MantaMatcher website at once, instead of being entered one by one through the encounter submission page. The data must first be pre-processed into the import format, but for large datasets this still reduces the overall time needed to get the data into MantaMatcher. Batch upload is available only to regional managers/administrators, and only for regions whose data you have permission to edit.

Requirements:

  • Ability to create comma-separated values (CSV) text files.
  • Internet-accessible server able to host any media files (photos/videos) for upload.

Data format

Data to be imported via the Batch Upload module must be prepared as several “comma-separated values” (CSV) text files. Simple template CSV files can be downloaded from the Batch Upload page; each comprises a single line of comma-separated column headers, one for each data item to be entered. The same page details which of these columns are mandatory. Non-mandatory columns can either be left blank or removed entirely. These templates provide a simple guide for exporting data from your own database. A separate file is used for each of the following data types:

  • Individuals
  • Encounters
  • Measurements
  • Media (photos/videos)
  • Samples

The Individuals and Encounters files are mandatory; the others are optional.

All data must be saved as Unicode (UTF-8) text for it to be interpreted correctly. Data using only standard English characters is already compliant, but data containing accented, unusual, or non-Roman-alphabet characters must be checked. Take care when exporting CSV files directly from a spreadsheet application, as some (e.g. Microsoft Excel) may not produce UTF-8 output, and the CSV files may need editing before use.
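
As a rough illustration of such a check, the following Python sketch reports which lines of a CSV file are not valid UTF-8, and rewrites a file saved in another encoding as UTF-8. The function names and the cp1252 default (a common Windows encoding) are assumptions made for this example; they are not part of MantaMatcher.

```python
from pathlib import Path


def find_non_utf8_lines(path):
    """Return (line_number, raw_bytes) for each line that is not valid UTF-8."""
    bad_lines = []
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                bad_lines.append((number, raw))
    return bad_lines


def reencode_to_utf8(source, target, source_encoding="cp1252"):
    """Read a file in source_encoding (an assumption) and write it as UTF-8."""
    text = Path(source).read_text(encoding=source_encoding)
    Path(target).write_text(text, encoding="utf-8")


# Hypothetical usage:
# for number, raw in find_non_utf8_lines("batchEncounters.csv"):
#     print(number, raw)
```

If the first function reports any lines, the file was probably exported in a different encoding and should be re-saved or converted before upload.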

Batch upload lets you upload large quantities of new data in a few steps, which makes it well suited to importing large data sets. It only supports importing new data, not modifying existing data. At minimum it imports from two data files, one for individuals and one for encounters, but it can also import associated measurements, media (photographs/videos) and samples (e.g. tissue biopsies). We recommend testing the feature with a small amount of data first, to understand how it operates and to verify that it performs as expected.
Note: you must have permission for the region into which you are importing data.

Data relationships

Individuals that do not yet exist in MantaMatcher should be specified in the Individuals data file. Encounters for these individuals then reference the relevant IndividualID to create the association. New encounters for individuals that already exist in MantaMatcher need no entry in the Individuals file, but the encounters must use the existing IndividualID.

Similarly, Measurements/Media/Samples are associated with a specific Encounter during upload by specifying the association via the EncounterID field.
Note: The EncounterID field is not otherwise used by MantaMatcher; it exists only to establish the relationships between the data files.
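
These relationships can be checked before uploading. The sketch below is one possible approach in Python, assuming the column headers “Encounter ID” and “Individual ID” used in the template files; the helper names are hypothetical. Note that an IndividualID missing from the Individuals file is not necessarily an error, since it may refer to an individual that already exists in MantaMatcher.

```python
import csv
import io


def column_values(csv_text, column):
    """Return every value found in the named column of a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader]


def unmatched_ids(child_csv, parent_csv, column):
    """Return IDs used in child_csv that never appear in parent_csv."""
    parent_ids = set(column_values(parent_csv, column))
    child_ids = set(column_values(child_csv, column))
    return sorted(value for value in child_ids if value and value not in parent_ids)


# Small in-memory stand-ins for the real files (illustrative data only).
individuals = "Individual ID\nMoz002B\n"
encounters = "Encounter ID,Individual ID\n1,Moz001B\n2,Moz002B\n"
measurements = "Encounter ID,Type\n1,disc width\n9,disc width\n"

# Measurements pointing at an encounter that is not in the Encounters file:
print(unmatched_ids(measurements, encounters, "Encounter ID"))   # ['9']
# Individuals that must already exist in MantaMatcher:
print(unmatched_ids(encounters, individuals, "Individual ID"))   # ['Moz001B']
```

To run this against real files, the text could be read with `open(path, encoding="utf-8", newline="").read()`.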

Data values

Some data columns have restrictions on the possible values they can take. The Batch Upload page summarizes these restrictions for each of the relevant columns. Data values are validated as part of the upload process, and users are notified in case of invalid data.

Note: Some data values are deliberately not integrity-checked. For example, it might seem sensible to ensure that an Individual listed as female has only Encounters listed as female. However, the sex may not have been determined during some encounters, in which case it makes sense to list those encounters as unknown. Such data integrity checking is the responsibility of the data preparer.
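
A data preparer might run a check of this kind themselves. The following Python sketch lists encounters whose sex contradicts the sex recorded for the individual, while treating “unknown” or blank values as compatible with anything. The column names “Individual ID”, “Encounter ID” and “Sex” follow the template headers; the function name and the rule for undetermined values are assumptions for illustration.

```python
import csv
import io

# Values treated as "not determined" (an assumption for this sketch).
UNDETERMINED = {"", "unknown"}


def sex_conflicts(individuals_csv, encounters_csv):
    """Return (encounter_id, individual_id, individual_sex, encounter_sex) tuples."""
    individual_sex = {
        row["Individual ID"]: row["Sex"]
        for row in csv.DictReader(io.StringIO(individuals_csv))
    }
    conflicts = []
    for row in csv.DictReader(io.StringIO(encounters_csv)):
        recorded = individual_sex.get(row["Individual ID"])
        observed = row["Sex"]
        if recorded is None:
            continue  # individual is not in this file, so nothing to compare
        if recorded in UNDETERMINED or observed in UNDETERMINED:
            continue  # an undetermined sex never counts as a conflict
        if recorded != observed:
            conflicts.append(
                (row["Encounter ID"], row["Individual ID"], recorded, observed)
            )
    return conflicts


individuals = "Individual ID,Sex\nMoz002B,female\n"
encounters = (
    "Encounter ID,Individual ID,Sex\n"
    "1,Moz002B,female\n"
    "2,Moz002B,unknown\n"
    "3,Moz002B,male\n"
)
print(sex_conflicts(individuals, encounters))
# [('3', 'Moz002B', 'female', 'male')]
```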

Media files

Perhaps the most challenging part of the Batch Upload process is making the media files available for upload. Each media item is specified in the Media CSV file by a URL (web address), and that URL must be able to deliver the item (photo/video) when the upload is processed. If you already have a web server available, the simplest solution is to place the media files there before starting the upload. You are strongly encouraged to test some of your links manually before uploading, to ensure the media files are accessible as expected.
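
For more than a handful of links, the check can be scripted. The sketch below, in Python, sends an HTTP HEAD request to each URL and records whether it answered with status 200. The function name and the `open_url` parameter (which exists only so the check can be exercised without a network) are assumptions for this example. Some servers do not answer HEAD requests, so a failure here is a prompt to check the link by hand rather than proof that it is broken.

```python
import urllib.error
import urllib.request


def check_media_urls(urls, open_url=urllib.request.urlopen, timeout=10):
    """Return a dict mapping each URL to True if it answered with HTTP 200."""
    results = {}
    for url in urls:
        try:
            request = urllib.request.Request(url, method="HEAD")
            with open_url(request, timeout=timeout) as response:
                results[url] = response.status == 200
        except (urllib.error.URLError, OSError, ValueError):
            # Covers HTTP errors, connection failures, timeouts and malformed URLs.
            results[url] = False
    return results


# Hypothetical usage with the "Media URL" column of the media CSV file:
# import csv
# with open("batchMedia.csv", encoding="utf-8", newline="") as handle:
#     urls = [row["Media URL"] for row in csv.DictReader(handle)]
# for url, ok in check_media_urls(urls).items():
#     print("OK  " if ok else "FAIL", url)
```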

If the media files must be made available from a personal computer, you are strongly encouraged to seek appropriate technical help. A good option in this case is to set up a local web server on the personal computer and configure the internet router for port-forwarding to it, possibly together with a “dynamic Domain Name Service” so that the machine can be located on the internet.
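
As one possible starting point for the local web server, Python's standard library can serve a folder of files over HTTP. The sketch below is a minimal example under stated assumptions: the function name, folder path and port 8000 are illustrative, and the router and dynamic DNS configuration are not covered. This built-in server performs only basic security checks, so it is best run only for the duration of the upload.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_media_server(directory, host="", port=8000):
    """Create an HTTP server that serves the files under `directory`.

    An empty host listens on all network interfaces, which is what a
    router port-forward needs in order to reach this machine.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer((host, port), handler)


# Hypothetical usage (runs until interrupted with Ctrl+C):
# server = make_media_server("/path/to/mantas")
# server.serve_forever()
```

With the server running, a file stored as `001/photo.jpg` inside the served folder would be reachable locally at `http://localhost:8000/001/photo.jpg`.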

Processing the media files is also by far the most time-consuming part of the data upload procedure, and the server holding the media files for upload must remain available for the full time of the upload. For this reason, if you have a very large dataset, you may prefer to perform the upload in several separate batches, particularly if this server is located somewhere with an unstable internet connection, to minimize the chances of an upload failure.
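
Splitting a dataset by hand is error-prone, because each batch needs the matching rows from every file. The Python sketch below shows one way to do it on rows already loaded as dictionaries (for example with `csv.DictReader`). It assumes the “Encounter ID” and “Individual ID” column headers from the templates, and it places each new individual only in the first batch that references it, on the assumption that the individual exists in MantaMatcher by the time later batches are uploaded. The function name and the batch structure are hypothetical.

```python
def split_batches(encounters, individuals, related, batch_size):
    """Split rows into self-contained batches of at most batch_size encounters.

    encounters, individuals: lists of dict rows.
    related: dict mapping a file label (e.g. "media") to a list of dict rows,
             each with an "Encounter ID" key.
    """
    batches = []
    already_created = set()
    for start in range(0, len(encounters), batch_size):
        chunk = encounters[start:start + batch_size]
        encounter_ids = {row["Encounter ID"] for row in chunk}
        # Only include individuals not already created by an earlier batch.
        new_individuals = {row["Individual ID"] for row in chunk} - already_created
        already_created |= new_individuals
        batch = {
            "encounters": chunk,
            "individuals": [
                row for row in individuals if row["Individual ID"] in new_individuals
            ],
        }
        for label, rows in related.items():
            batch[label] = [row for row in rows if row["Encounter ID"] in encounter_ids]
        batches.append(batch)
    return batches
```

Each resulting batch can then be written back out as its own set of CSV files and uploaded separately.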

Example CSV files

The following CSV data files demonstrate how to import three new encounters (one of which is for an existing individual):

batchIndividuals.csv
"Individual ID","Alternate ID","Sex","Nickname","Nicknamer","Comments","Series Code","Dynamic Properties","Patterning Code","Interested Researchers","Data Files"
"Moz002B",,"female","Donut","Jo McNormal",,,"|Pigmentation=normal","Normal",,
"Moz003A",,"unknown",,,,,"|Pigmentation=leucistic","Leucistic",,
  • The first individual is a named female with normal pigmentation, assigned an ID of “Moz002B”.
  • The second individual is an unnamed, leucistic individual, with unknown sex, assigned an ID of “Moz003A”.
batchEncounters.csv
"Encounter ID","Individual ID","Date","Time","Sex","Genus","Species","Latitude","Longitude","Location ID","Locality","Max.Depth (m)","Elevation (m)","Living Status","Life Stage","Release Date","Size (m)","Size Guess (m)","Distinguishing Scars","Other Catalog Numbers","Occurrence ID","Occurrence Remarks","Behaviour","Dynamic Properties","ID Remarks","Researcher Comments","Emails to Inform","Submitter Organisation","Submitter Project","Submitter Name","Submitter Email","Submitter Address","Submitter Phone","Photographer Name","Photographer Email","Photographer Phone","Photographer Address","Interested Researchers","TapirLink?"
"1","Moz001B","2012-02-28",,"female","Manta","birostris","-1.xxx","-81.xxx","Mozambique","Manta Reef, Tofo, Inhambane, Mozambique",,,"alive",,"2012-02-28",,,,,,,,"|Pigmentation=normal","Visual inspection",,,,"Marine Megafauna Foundation","Jo McNormal","JoMcNormal@marinemegafauna.org",,,,,,,,"false"
"2","Moz002B","2013-04-01",,"female","Manta","alfredi","-1.xxx","-81.xxx","Mozambique","Giant's Castle, Tofo, Inhambane, Mozambique",,,"alive",,"2013-04-01",,,,,,,,"|Pigmentation=normal","Visual inspection",,,,"Marine Megafauna Foundation","Jo McNormal","JoMcNormal@marinemegafauna.org",,,,,,,,"false"
"3","Moz003A","2014-07-07",,"male","Manta","birostris","-1.xxx","-81.xxx","Mozambique","Sherwood, Tofo, Inhambane, Mozambique",,,"alive",,"2014-07-07",,,,,,,,"|Pigmentation=normal","Visual inspection",,,,"Marine Megafauna Foundation","Jo McNormal","JoMcNormal@marinemegafauna.org",,,,,,,,"false"
  • The first encounter is assigned to an Individual that must already exist in MantaMatcher with the IndividualID “Moz001B”. It occurred on 28th February 2012 at Manta Reef, and is to be submitted for the Mozambique region. Jo McNormal is the data preparer/submitter.
  • The second encounter is a reef manta assigned to the new individual with IndividualID “Moz002B”, as defined in batchIndividuals.csv.
  • The third encounter is assigned to the new individual with IndividualID “Moz003A”, as defined in batchIndividuals.csv.
batchMeasurements.csv
"Encounter ID","Type","Measurement","Units","Protocol"
"1","disc width","4.45","meters","directly measured"
"2","disc width","4.30","meters","directly measured"
  • The first measurement is for EncounterID 1 (in the batchEncounters.csv file), and represents a directly measured disc width of 4.45 meters.
  • The second measurement is for EncounterID 2, and represents a directly measured disc width of 4.30 meters.
batchMedia.csv
"Encounter ID","Media URL","Copyright Owner","Copyright Statement","Keywords"
"1","http://megafauna.dyndns.com/mantas/001/20140328-001-Id.jpg","Jo McNormal",,"|Normal colouration"
  • An image available at the web address http://megafauna.dyndns.com/mantas/001/20140328-001-Id.jpg is for EncounterID 1, lists Jo McNormal as copyright owner, and has an assigned keyword of “Normal colouration”.
  • To check that this image is available for batch upload, enter the web address into the address bar of a web browser and confirm that the image loads.
batchSamples.csv
"Encounter ID","Tissue Type","Sample ID","Alternate ID","Preservation Method","Storage Lab","Sampling Protocol","Sampling Effort","Field Number","Field Notes","Remarks","Institution ID","Collection ID","Dataset ID","Institution Code","Collection Code","Dataset Name"
"1","Tissue sample","Moz-T-01",,"ethanol",,,,,"Sampler: Jo","Used for DNA, fatty acid, stable isotope.",,"1","Moz-2014",,,"Moz-2014"
"1","Mucus sample","Moz-M-02",,"frozen",,,,,"Sampler: Sandy","Dive 2: Manta heaven",,"2","Moz-2014",,,"Moz-2014"
  • A tissue sample for EncounterID 1 (with sample ID: Moz-T-01).
  • A mucus sample for EncounterID 1 (with sample ID: Moz-M-02).