Data file format

This section provides general file-formatting rules, with examples, for the SMARTe risk assessment tools: site characterization data analysis, monitoring data analysis, and the human health risk calculator (under development). Any spreadsheet program can be used to produce a file that conforms to these rules.
General Rules
  • The file should be a tab-delimited text file. A spreadsheet file can be saved as a tab-delimited text file using the 'File -> Save As' function.
  • Data should be stored as one observation per row, with each field in its own column. This means that all of the identifiers for a sample must be contained in a single row of the data file.
  • Observations must be uninterrupted, i.e., there can be no carriage returns (line breaks) within an observation.
  • Avoid special characters such as ', ", #, and &. In most cases, it is best to replace a special character with a period ('.').
  • Nondetects can be identified either by coding them as negative values in the result column or by including a detect column in which 'T' marks a detect and 'F' marks a nondetect (see examples below).
  • Replace empty cells with 'NA' (see examples below).
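As an illustration, most of the general rules above can be checked mechanically before a file is loaded. The Python sketch below is not part of SMARTe: the function names check_data_file and negative_to_detectflag are hypothetical, the detect column is assumed to be named detectflag (as in the examples below), and negative_to_detectflag assumes that the magnitude of a negative entry is the value to report.

```python
import csv
import io

# Characters the general rules say to avoid.
SPECIAL_CHARS = set("'\"#&")


def check_data_file(text):
    """Return a list of problems found in tab-delimited text (empty list = none found)."""
    # QUOTE_NONE keeps quote characters as literal text so they can be reported.
    rows = list(csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    ncol = len(header)
    problems = []
    for lineno, row in enumerate(rows[1:], start=2):
        # A line break inside an observation shows up as rows with too few fields.
        if len(row) != ncol:
            problems.append(f"line {lineno}: expected {ncol} fields, found {len(row)}")
            continue
        for name, value in zip(header, row):
            if value.strip() == "":
                problems.append(f"line {lineno}: empty '{name}' cell; use NA")
            if SPECIAL_CHARS & set(value):
                problems.append(f"line {lineno}: special character in '{name}'")
        # Assumption: the detect column is named 'detectflag', as in the examples.
        if "detectflag" in header:
            flag = row[header.index("detectflag")]
            if flag not in ("T", "F"):
                problems.append(f"line {lineno}: detectflag must be T or F, got {flag!r}")
    return problems


def negative_to_detectflag(value):
    """Hypothetical helper: map negative-value nondetect coding to (value, flag).

    Assumes the magnitude of a negative entry is the value to report.
    """
    number = float(value)
    return abs(number), ("F" if number < 0 else "T")
```

The field-count check is what catches a stray line break: an observation split across two lines becomes two rows, each with too few fields.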
Examples
The example data sets below show how to format data for each assessment tool.
1. Layout of a typical environmental site characterization data set. For site characterization data analysis, it is usually desirable to subset the data using the factors 'analyte' and 'siteid'; the SMARTe site characterization data analysis tool provides an interface to subset the data by analyte or siteid. If spatial coordinates are included in the data set, some spatial plots can be produced in SMARTe. Summary statistics, exploratory data analysis (plots), hypothesis tests, and confidence intervals are also available. If nondetects are identified in the data set, they can be accounted for in the data analysis. This is also the typical layout of a data set for the human health risk calculator.
siteid  analyte  concentration  detectflag  x.coord    y.coord
BKG     Arsenic  21.6           T           418550.9   3891761
BKG     Arsenic  34.2           T           NA         NA
BKG     Copper   30.7           T           418882.1   3894504
BKG     Copper   48.5           T           NA         NA
BKG     Lead     7              T           418550.9   3891761
BKG     Lead     0.15           F           418550.9   3891761
ND02    Arsenic  14.9           T           421301.17  3892240.68
ND02    Arsenic  6.6            T           421301.17  3892240.68
ND02    Copper   19.3           T           421301.17  3892240.68
ND02    Copper   12.9           T           421301.17  3892240.68
ND02    Lead     15             T           421301.17  3892240.68
ND02    Lead     1.1            T           421301.17  3892240.68
ND11B   Arsenic  0.3            F           423498.61  3897576.27
ND11B   Arsenic  1.7            T           423488.98  3897581.71
ND11B   Copper   2.5            F           423381.78  3897488.5
ND11B   Copper   9.41           T           423461     3897557
ND11B   Lead     0.1            F           423443.83  3897578.28
ND11B   Lead     26             T           423443.83  3897578.28
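The subsetting described above can be illustrated with a short Python sketch that reads part of this example as a tab-delimited file, subsets it by analyte or siteid, keeps only rows with coordinates for spatial work, and counts nondetects per group. It is a hypothetical illustration, not the SMARTe interface; the function names subset, with_coordinates, and summarize are invented for the example.

```python
import csv
import io
from collections import defaultdict

# A few rows of the site characterization example, joined into tab-delimited text.
EXAMPLE = "\n".join("\t".join(fields) for fields in [
    ["siteid", "analyte", "concentration", "detectflag", "x.coord", "y.coord"],
    ["BKG", "Arsenic", "21.6", "T", "418550.9", "3891761"],
    ["BKG", "Arsenic", "34.2", "T", "NA", "NA"],
    ["BKG", "Lead", "7", "T", "418550.9", "3891761"],
    ["BKG", "Lead", "0.15", "F", "418550.9", "3891761"],
    ["ND11B", "Arsenic", "0.3", "F", "423498.61", "3897576.27"],
    ["ND11B", "Arsenic", "1.7", "T", "423488.98", "3897581.71"],
])


def subset(rows, analyte=None, siteid=None):
    """Keep rows matching the given analyte and/or siteid (None matches anything)."""
    return [r for r in rows
            if (analyte is None or r["analyte"] == analyte)
            and (siteid is None or r["siteid"] == siteid)]


def with_coordinates(rows):
    """Rows usable for spatial plots: neither coordinate is NA."""
    return [r for r in rows if r["x.coord"] != "NA" and r["y.coord"] != "NA"]


def summarize(rows):
    """Per (siteid, analyte): number of rows, number of nondetects, largest detected value."""
    groups = defaultdict(lambda: {"n": 0, "nondetects": 0, "max_detect": None})
    for r in rows:
        g = groups[(r["siteid"], r["analyte"])]
        g["n"] += 1
        if r["detectflag"] == "F":
            g["nondetects"] += 1
        else:
            value = float(r["concentration"])
            if g["max_detect"] is None or value > g["max_detect"]:
                g["max_detect"] = value
    return dict(groups)


rows = list(csv.DictReader(io.StringIO(EXAMPLE), delimiter="\t"))
summary = summarize(subset(rows, analyte="Arsenic"))
# BKG/Arsenic: 2 rows, 0 nondetects, max 34.2; ND11B/Arsenic: 2 rows, 1 nondetect, max 1.7
```

The sketch deliberately stops short of computing a mean, because how nondetects enter a mean depends on the analysis method chosen.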
2. Layout of a typical environmental monitoring data set. For environmental monitoring data analysis, it is usually desirable to subset the data using the factors 'siteid' and 'analyte'; the monitoring data analysis tool provides an interface for identifying subsetting factors and values. Analyzing temporal trends in monitoring data requires a temporal value for each observation, given by the date column in the table below. Sampling times can also be provided in a time column. At least one of date or time must be provided for the analysis of temporal data. Analyzing spatial trends in monitoring data requires spatial coordinates for each observation, given by the x.coord and y.coord columns in the table below.
The date column must use one of the following formats: 'm/d/y', 'm-d-y', 'y/m/d', 'y-m-d', 'month day year', or 'day month year'. Examples of each are '4/30/70', '4-30-70', '70/4/30', '70-4-30', 'April 30 1970', and '30 April 1970'. The time column must use the format 'h:m:s'.
siteid  analyte  date    concentration  detectflag  x.coord  y.coord
1.1     Arsenic  1/5/04  12.3           T           0        0
1.1     Arsenic  4/5/04  18.2           T           0        0
1.1     Arsenic  7/5/04  20.9           T           0        0
1.1     Copper   1/5/04  1125           T           0        0
1.1     Copper   4/5/04  1208           T           0        0
1.1     Copper   7/5/04  1515           T           0        0
1.2     Arsenic  1/5/04  4.5            T           52       83
1.2     Arsenic  4/5/04  8.7            T           52       83
1.2     Arsenic  7/5/04  8.2            T           52       83
1.2     Copper   1/5/04  352            T           52       83
1.2     Copper   4/5/04  315            T           52       83
1.2     Copper   7/5/04  426            T           52       83
1.3     Arsenic  1/5/04  0.3            F           160      134
1.3     Arsenic  4/5/04  2.2            T           160      134
1.3     Arsenic  7/5/04  2              T           160      134
1.3     Copper   1/5/04  83             T           160      134
1.3     Copper   4/5/04  92             T           160      134
1.3     Copper   7/5/04  75             T           160      134
1.4     Arsenic  1/5/04  0.3            F           287      171
1.4     Arsenic  4/5/04  0.3            F           287      171
1.4     Arsenic  7/5/04  1.2            T           287      171
1.4     Copper   1/5/04  15             T           287      171
1.4     Copper   4/5/04  12.9           T           287      171
1.4     Copper   7/5/04  14.4           T           287      171
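The following Python sketch shows one way the date and time values can be parsed and how a temporal series for one site and analyte can be assembled. It is not SMARTe's own parser, and the names parse_date, parse_time, and time_series are hypothetical. It assumes slash- and dash-separated dates are read month first, which is what the example strings ('4/30/70', '4-30-70') and the quarterly sampling dates in the table (1/5/04, 4/5/04, 7/5/04) imply. It also relies on Python's strptime two-digit-year rule (69-99 map to the 1900s, 00-68 to the 2000s), which SMARTe may not share.

```python
from datetime import date, datetime, time

# strptime formats for the accepted date layouts, tried in this order.
DATE_FORMATS = [
    "%m/%d/%y",  # 4/30/70
    "%m-%d-%y",  # 4-30-70
    "%y/%m/%d",  # 70/4/30
    "%y-%m-%d",  # 70-4-30
    "%B %d %Y",  # April 30 1970
    "%d %B %Y",  # 30 April 1970
]


def parse_date(text: str) -> date:
    """Return the first successful parse of text; raise ValueError if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def parse_time(text: str) -> time:
    """Parse an 'h:m:s' value such as '13:05:00'."""
    return datetime.strptime(text.strip(), "%H:%M:%S").time()


def time_series(rows, siteid, analyte):
    """(date, concentration, detectflag) tuples for one site and analyte, oldest first."""
    return sorted(
        (parse_date(r["date"]), float(r["concentration"]), r["detectflag"])
        for r in rows
        if r["siteid"] == siteid and r["analyte"] == analyte
    )


# Rows for site 1.3 from the monitoring example, deliberately out of date order.
# siteid stays a string: reading it as a number would merge IDs such as "1.1" and "1.10".
rows = [
    {"siteid": "1.3", "analyte": "Arsenic", "date": "7/5/04", "concentration": "2", "detectflag": "T"},
    {"siteid": "1.3", "analyte": "Copper", "date": "1/5/04", "concentration": "83", "detectflag": "T"},
    {"siteid": "1.3", "analyte": "Arsenic", "date": "1/5/04", "concentration": "0.3", "detectflag": "F"},
    {"siteid": "1.3", "analyte": "Arsenic", "date": "4/5/04", "concentration": "2.2", "detectflag": "T"},
]
arsenic_13 = time_series(rows, "1.3", "Arsenic")
# [(date(2004, 1, 5), 0.3, 'F'), (date(2004, 4, 5), 2.2, 'T'), (date(2004, 7, 5), 2.0, 'T')]
```

Because the formats are tried in a fixed order, an ambiguous string such as '10/4/30' is read month first (October 4, 2030) rather than as April 30, 2010; checking that every value in a date column parses with the same format would guard against this.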