Data files must contain numeric values only. Any column containing a value that cannot be coerced to a float will be discarded. Feature and sample IDs must all be unique; duplicates will be removed. Sample IDs should also match the sample IDs of the metadata file, because only the intersecting samples are used for feature selection. See an example data table below:

             Feature ID 1  Feature ID 2  Feature ID 3  Feature ID 4  ...
Sample ID 1  0.0032008     0.0227951     0.0167264     0.0227934     ...
Sample ID 2  0.5020852     0.0227953     0.2117264     0.1232279     ...
Sample ID 3  0.3811255     0.4205265     0.1020123     0.1235234     ...
...          ...           ...           ...           ...           ...

Only comma-separated (.csv) and tab-delimited (.txt) files are accepted for upload, and data files cannot exceed 100 MB. Furthermore, at the moment we cannot support data files with more than 500 samples or more than 25000 features. Data files must also have at least 10 features.
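The rules above can be checked locally before uploading. The following is a minimal sketch using only the Python standard library, not CorrMapper's own code: the function names are hypothetical, and it assumes that for duplicated IDs the first occurrence is the one that is kept, which the text above does not specify.

```python
import csv
import io

# Limits quoted in the text above.
MAX_SAMPLES = 500
MAX_FEATURES = 25000
MIN_FEATURES = 10


def load_numeric_table(text, delimiter=","):
    """Parse a samples-by-features table and keep only fully numeric columns."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    header, body = rows[0][1:], rows[1:]

    # Assumption: for duplicated sample IDs, the first occurrence is kept.
    samples = {}
    for row in body:
        samples.setdefault(row[0], row[1:])

    table = {sample_id: {} for sample_id in samples}
    kept = []
    for col, feature in enumerate(header):
        if feature in kept:
            continue  # duplicated feature ID (assumption: first one wins)
        try:
            values = [float(cells[col]) for cells in samples.values()]
        except (ValueError, IndexError):
            continue  # a value is missing or cannot be coerced to float
        kept.append(feature)
        for sample_id, value in zip(samples, values):
            table[sample_id][feature] = value
    return kept, table


def limit_problems(n_samples, n_features):
    """Return the size limits that a table violates (empty list if none)."""
    problems = []
    if n_samples > MAX_SAMPLES:
        problems.append("more than 500 samples")
    if n_features > MAX_FEATURES:
        problems.append("more than 25000 features")
    if n_features < MIN_FEATURES:
        problems.append("fewer than 10 features")
    return problems
```

For a tab-delimited (.txt) file, pass delimiter="\t". The 100 MB file-size limit is not checked here.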

Finally, if you upload two data files, they must have at least intersecting samples. If you also upload a metadata file, then the three files together must satisfy this requirement.
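To see how many samples your files share before uploading, the intersection can be computed directly. This is a minimal sketch with a hypothetical helper name; it only reports the shared sample IDs and does not enforce any minimum count.

```python
def common_samples(*id_lists):
    """Return the sample IDs present in every file, in the order of the first file."""
    if not id_lists:
        return []
    shared = set(id_lists[0]).intersection(*id_lists[1:])
    # dict.fromkeys removes repeats while preserving order.
    return [s for s in dict.fromkeys(id_lists[0]) if s in shared]
```

Called with the sample IDs of two data files and a metadata file, for example common_samples(ids_a, ids_b, ids_meta), it returns only the samples that all three files have in common.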

All platform-specific preprocessing should be done before uploading the data files, as they will be used in the CorrMapper pipeline as they are.

The metadata file has two purposes in CorrMapper. It is used in combination with the uploaded data files to produce a highly interactive dashboard, and its columns/variables can serve as the target variables for the feature selection process. It must be a comma-separated (.csv) or a tab-delimited (.txt) file.

Each column header and sample ID needs to be unique. Spaces will be removed from column names. At least of the sample IDs must match the sample IDs of the uploaded dataset(s). Currently a maximum of 15 columns is supported; any additional columns will be ignored.

For the dashboard visualisation the user must specify what kind of data each column holds. This must be one of the following four: 'patient', 'date', 'categorical', 'continuous'. For the last two, the following abbreviations are also accepted: 'cat', 'categ', 'ca' and 'con', 'cont', 'co'. For 'patient', 'sample' is also accepted, and for 'date', 'time' can also be used. Only one 'date' column and one 'patient' column are accepted. Column names can be chosen freely, and they will be used as the titles of the graphs in the metadata explorer. See an example metadata file below.

             Patient  DateOfDiagnosis  DeadOfDisease  Gender       AgeAtDiagnosis  ...
             patient  date             categorical    categorical  continuous      ...
Sample ID 1  1        21/09/2011       yes            Male         62              ...
Sample ID 2  2        04/03/2009       no             Female       72              ...
Sample ID 3  3        13/12/2011       no             Male         42              ...
...          ...      ...              ...            ...          ...             ...

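The second row of the metadata file (the column types) can be checked with a small lookup table. This is a minimal sketch, not CorrMapper's own parser: the function name is hypothetical, and treating the labels as case-insensitive is an assumption.

```python
# Accepted labels and abbreviations, as listed in the text above.
TYPE_ALIASES = {
    "patient": "patient", "sample": "patient",
    "date": "date", "time": "date",
    "categorical": "categorical", "cat": "categorical",
    "categ": "categorical", "ca": "categorical",
    "continuous": "continuous", "con": "continuous",
    "cont": "continuous", "co": "continuous",
}


def normalise_types(labels):
    """Map each column-type label to its canonical name, or raise ValueError."""
    canonical = []
    for label in labels:
        key = label.strip().lower()  # assumption: case is ignored
        if key not in TYPE_ALIASES:
            raise ValueError(f"unknown column type: {label!r}")
        canonical.append(TYPE_ALIASES[key])
    # Only one 'date' and one 'patient' column are accepted.
    for unique in ("patient", "date"):
        if canonical.count(unique) > 1:
            raise ValueError(f"more than one {unique!r} column")
    return canonical
```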
Categorical variables will be represented as pie-charts, or as horizontal bar charts if any of their level names are too long. Continuous variables will be displayed as histograms. The date variable will be automatically separated into year, month, and day of the week variables, and each of these will be displayed as an individual chart. The 'patient' variable will be represented as a pie-chart just like a categorical variable, but with a broader colour map.

Dates can take any commonly used format, but we recommend using either the dd/mm/yyyy or the mm/dd/yyyy format.
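The split of a date into year, month, and day of the week can be illustrated with the standard library. This is a minimal sketch with a hypothetical function name; how CorrMapper itself detects the date format is not described here, so the format is passed in explicitly.

```python
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def split_date(text, fmt="%d/%m/%Y"):
    """Split a date string into year, month, and day-of-the-week parts."""
    parsed = datetime.strptime(text, fmt)
    return {
        "year": parsed.year,
        "month": parsed.month,
        "weekday": WEEKDAYS[parsed.weekday()],  # weekday() is 0 for Monday
    }
```

Note that a value such as 04/03/2009 is valid in both recommended formats: split_date("04/03/2009") reads it as 4 March, while split_date("04/03/2009", "%m/%d/%Y") reads it as 3 April. Using one format consistently within a file avoids this ambiguity.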

The patient variable is useful if there are repeated measurements per patient/sample, but note that the resulting histograms and pie-charts in this scenario will not be meaningful unless filtered to one of the time-points of the repeated sampling.

Each categorical variable must have at least 15 samples per level; otherwise it will not show up in the analysis form as a target variable for feature selection. The feature selection algorithms can handle moderately imbalanced classes, but this only works if each level has enough samples.
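Whether a categorical column meets this threshold can be checked by counting its levels. This is a minimal sketch with a hypothetical function name; it applies only the rule stated above and makes no assumption about how missing values are treated.

```python
from collections import Counter

MIN_SAMPLES_PER_LEVEL = 15  # threshold quoted in the text above


def is_valid_target(values, minimum=MIN_SAMPLES_PER_LEVEL):
    """Return True if every level of a categorical column has enough samples."""
    counts = Counter(values)
    return len(counts) > 0 and all(n >= minimum for n in counts.values())
```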

If you have a genomic study (meaning your features are any kind of genomic element (GE)), you must provide an annotation file for each of your datasets. Each annotation file must be a comma-separated (.csv) or a tab-delimited (.txt) file listing the genomic position of each GE. These files are needed because CorrMapper divides each species' genome into 300 equal-length bins to provide a high-level overview of the correlations between the different regions of the genome.
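One way to picture the binning is to lay the chromosomes end to end and cut the result into 300 pieces of equal length. The sketch below does exactly that; it is an illustration under that assumption, not CorrMapper's actual implementation, and the function name and the chromosome lengths passed to it are hypothetical.

```python
N_BINS = 300  # number of bins quoted in the text above


def genome_bin(chrom, position, chrom_lengths, n_bins=N_BINS):
    """Map a 1-based position on a chromosome to a 0-based genome-wide bin index.

    chrom_lengths maps chromosome names to lengths, in genome order.
    """
    offset = 0
    for name, length in chrom_lengths.items():
        if name == chrom:
            break
        offset += length
    else:
        raise KeyError(f"unknown chromosome: {chrom!r}")
    total = sum(chrom_lengths.values())
    linear = offset + position - 1  # 0-based coordinate on the concatenated genome
    return linear * n_bins // total
```

With toy lengths such as {"chr1": 1000, "chr2": 2000}, each bin covers 10 positions, so the first third of the bins falls on chr1 and the rest on chr2.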

The first column must hold the unique IDs of your features in the data files. Any feature in the data file that does not have a row of annotation will be discarded from the data file and vice versa.

The second column must hold the name of the chromosome the given GE is located on. CorrMapper uses the chromosome names given by the UCSC genome browser for the given species.

The third and fourth columns must hold the start and end positions of the given GE respectively. They both must be integers and the start has to be smaller than the end. Also, the end needs to be smaller than the length of the chromosome, while the start has to be larger than one. The names of the first four columns can be anything as long as the values of the columns respect the above format. See an example annotation file below.

ProbeID     Chromosome  Start     End       Additional feature ID 1  Additional feature ID 2  ...
GE probe 1  1           10471224  10638510  xyz                      xyz                      ...
GE probe 2  13          54327729  54519952  xyz                      xyz                      ...
GE probe 3  X           64454707  64613450  xyz                      xyz                      ...
...         ...         ...       ...       ...                      ...                      ...

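The positional rules for an annotation row can be checked as follows. This is a minimal sketch with a hypothetical function name, applying the rules literally as they are worded above; the chromosome lengths have to be supplied by the caller, and any values used for testing are placeholders rather than real genome data.

```python
def annotation_problems(row, chrom_lengths):
    """Return a list of problems for one (feature ID, chromosome, start, end, ...) row."""
    chrom, start_text, end_text = row[1], row[2], row[3]
    try:
        start, end = int(start_text), int(end_text)
    except ValueError:
        return ["start and end must be integers"]

    problems = []
    if chrom not in chrom_lengths:
        problems.append("unknown chromosome name")
    elif end >= chrom_lengths[chrom]:
        problems.append("end must be smaller than the chromosome length")
    if start >= end:
        problems.append("start must be smaller than end")
    if start <= 1:
        problems.append("start must be larger than one")
    return problems
```

Columns after the fourth are ignored by this check, since they hold free-form additional information.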
Any amount of additional information about the GEs can be provided in the subsequent columns of the table. These columns will be displayed in the genomic correlation network explorer. They will also be sortable and searchable, which can greatly facilitate the interpretation of the resulting correlation networks.

Figure: CorrMapper's pipeline (flowchart).