The metadata file has two purposes in CorrMapper. It is used in combination with the uploaded
data files to produce a highly interactive dashboard, and its columns/variables can serve as the
target variables for the feature selection process. It must be a comma-separated (.csv) or a
tab-delimited (.txt) file.
Each column header and sample ID needs to be unique. Spaces will be removed from column names.
At least of the sample IDs must match the sample IDs of the
uploaded dataset(s). Currently a maximum of 15 columns is
supported; any additional columns will be ignored.
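The checks above can be sketched in a few lines of Python. This is only an illustration, not CorrMapper's own code: the function name `load_metadata` is hypothetical, and it assumes that the first column holds the sample IDs, that the 15-column limit applies to the remaining columns, and that uniqueness is checked after spaces are removed.

```python
import csv
import io

MAX_COLUMNS = 15  # limit stated in the text above


def load_metadata(text, delimiter=","):
    """Parse metadata text and apply the checks described above.

    Hypothetical helper. Returns (header, rows), where rows maps each
    sample ID to its list of values. Use delimiter="\t" for .txt files.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    raw_header = next(reader)

    # Assumption: the first column holds sample IDs, so it is skipped here.
    # Spaces are removed from column names.
    header = [name.replace(" ", "") for name in raw_header[1:]]
    if len(set(header)) != len(header):
        raise ValueError("column headers must be unique")

    # Columns beyond the limit are ignored rather than rejected.
    header = header[:MAX_COLUMNS]

    rows = {}
    for record in reader:
        if not record:
            continue  # skip blank lines
        sample_id = record[0]
        if sample_id in rows:
            raise ValueError(f"duplicate sample ID: {sample_id}")
        rows[sample_id] = record[1:1 + MAX_COLUMNS]
    return header, rows


# Made-up example data, for illustration only.
example = "ID,Age Group,BMI\nS1,young,21.5\nS2,old,27.1\n"
header, rows = load_metadata(example)
```

Here `header` holds the column names with spaces stripped, and `rows` is keyed by sample ID, which makes it straightforward to compare the IDs against those of the uploaded dataset(s).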
For the dashboard visualisation the user must specify what kind of data each column holds. This
must be one of the following four: 'patient', 'date', 'categorical', 'continuous'. For the last
two the following abbreviations are also accepted: 'cat', 'categ', 'ca' and 'con', 'cont', 'co'.
For 'patient', 'sample' is also accepted, and for 'date', 'time' can also be used.
Only one 'date' column and one 'patient' column are accepted. Column names can be chosen
freely, and they will be used as the titles of graphs in the metadata explorer. See an example
metadata file below.
|Sample ID 1
|Sample ID 2
|Sample ID 3
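The accepted type labels and their abbreviations can be summarised as a lookup table. The sketch below is illustrative rather than CorrMapper's implementation: `TYPE_ALIASES` and `normalise_types` are hypothetical names, and ignoring case and surrounding whitespace is an assumption that the text above does not confirm.

```python
# Maps every accepted label to one of the four canonical column types.
TYPE_ALIASES = {
    "patient": "patient", "sample": "patient",
    "date": "date", "time": "date",
    "categorical": "categorical", "cat": "categorical",
    "categ": "categorical", "ca": "categorical",
    "continuous": "continuous", "con": "continuous",
    "cont": "continuous", "co": "continuous",
}


def normalise_types(labels):
    """Resolve user-supplied type labels to canonical type names.

    Hypothetical helper. Raises ValueError for an unknown label or when
    more than one 'patient' or 'date' column is declared.
    """
    resolved = []
    for label in labels:
        key = label.strip().lower()  # assumption: case-insensitive matching
        if key not in TYPE_ALIASES:
            raise ValueError(f"unknown column type: {label!r}")
        resolved.append(TYPE_ALIASES[key])

    # Only one 'date' and one 'patient' column are accepted.
    for single in ("patient", "date"):
        if resolved.count(single) > 1:
            raise ValueError(f"only one {single!r} column is accepted")
    return resolved


types = normalise_types(["sample", "time", "cat", "co"])
```

With the labels above, `types` resolves to the canonical names 'patient', 'date', 'categorical' and 'continuous', in that order.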
Categorical variables will be represented as pie charts, or as horizontal bar charts if they
contain levels whose names are too long. Continuous variables will be displayed as histograms. The
date variable will be automatically separated into year, month, and day of the week variables,
and each of these will be displayed as an individual chart. The 'patient' variable will be
represented as a pie chart just like a categorical variable, but with a broader colour map.
Dates can take any commonly used format, but we recommend using the dd/mm/yyyy and mm/dd/yyyy formats.
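The split of a date into year, month and day of the week can be illustrated with Python's standard library. This is a minimal sketch under stated assumptions, not the parsing logic CorrMapper uses: `split_date` is a hypothetical name and the format must be passed explicitly, because a value such as 03/04/2021 is valid in both recommended formats but means different days.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def split_date(value, fmt="%d/%m/%Y"):
    """Split a date string into year, month and day-of-week parts.

    Hypothetical helper. fmt defaults to dd/mm/yyyy; pass "%m/%d/%Y"
    for mm/dd/yyyy input.
    """
    parsed = datetime.strptime(value, fmt)
    return {
        "year": parsed.year,
        "month": parsed.month,
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        "day_of_week": DAY_NAMES[parsed.weekday()],
    }


as_day_first = split_date("03/04/2021")                  # 3 April 2021
as_month_first = split_date("03/04/2021", "%m/%d/%Y")    # 4 March 2021
```

The two calls return different months for the same string, which is why it helps to use one format consistently within a metadata file.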
The patient variable is useful if there are repeated measurements per patient/sample, but
note that the resulting histograms and pie-charts in this scenario will not be meaningful
unless filtered to one of the time-points of the repeated sampling.
A categorical variable must have at least 15 samples in each
level, otherwise it will not show up in the analysis form as a target variable for feature
selection. The feature selection algorithms can handle moderately imbalanced classes, but only
if each level has enough samples.
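The per-level sample requirement can be checked before uploading. The following sketch is an assumption-laden illustration rather than CorrMapper's code: `is_valid_target` is a hypothetical name, and it assumes that missing values have already been dropped from the column.

```python
from collections import Counter

MIN_SAMPLES_PER_LEVEL = 15  # threshold stated in the text above


def is_valid_target(values, min_per_level=MIN_SAMPLES_PER_LEVEL):
    """Return True if every level of a categorical column has enough samples.

    Hypothetical helper. `values` is the list of level labels, one per
    sample, with missing entries already removed (an assumption).
    """
    counts = Counter(values)
    # An empty column has no levels and cannot be a target.
    return bool(counts) and min(counts.values()) >= min_per_level


balanced_enough = is_valid_target(["case"] * 15 + ["control"] * 40)
too_few_cases = is_valid_target(["case"] * 14 + ["control"] * 40)
```

In this example the first column qualifies despite the imbalance, while the second does not because one level has only 14 samples.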