`simDataSet` can be used to conveniently and quickly simulate a dataset that satisfies certain constraints, such as a specific correlation structure, means, ranges of the items, and measurement levels of the variables. Note that the results are approximate: `MASS::mvrnorm()` is used to generate the data from the correlation matrix, but the factors are only created after that, so cutting the variables into factors may change the correlations a bit.

- n
Number of required cases (records, entries, participants, rows) in the final dataset.

- varNames
Names of the variables in a vector; note that the length of this vector will determine the number of variables simulated.

- correlations
The correlations between the variables are randomly sampled from this range using the uniform distribution; this way, it's easy to have a relatively 'messy' correlation matrix without the need to specify every correlation manually.

- specifiedCorrelations
The correlations that must have a specific value can be specified here, as a list of vectors, where each vector's first two elements specify variable names and the last one the correlation between those two variables. Note that tweaking the correlations may take some time; if you get it wrong, the `MASS::mvrnorm()` function will complain that "'Sigma' is not positive definite", or in other words, that you supplied a combination of correlations that can't exist simultaneously.

- means, sds
The means and standard deviations of the variables. Note that if you set `ranges` for one or more variables (see below), those ranges are used to rescale those variables, overriding any specified means and standard deviations. If only one mean or standard deviation is supplied, it's recycled along the variables.

- ranges
The desired ranges of the variables, supplied as a named list where the name of each element corresponds to a variable. The `scales::rescale()` function will be used to rescale those variables for which a desired range is specified here. Note that for those variables, the means and standard deviations will be determined by these new ranges.

- factors
A vector of variable names that should be converted into factors (using `base::cut()`). Make sure to specify lists for `cuts` and `labels` as well (of the same length).

- cuts
A list of vectors that specify, for each factor, where to 'cut' the numeric vector into factor levels.

- labels
A list of vectors that specify, for each factor and for each level, the labels that should be assigned to the factor levels. Each vector in this list has to have one more element than the corresponding vector in the `cuts` list.

- seed
The seed to use when generating the dataset (to make sure the exact same dataset can be generated repeatedly).

- empirical
Whether to generate the data using the exact (`empirical = TRUE`) or approximate (`empirical = FALSE`) correlation matrix; this is passed on to `MASS::mvrnorm()`.

- silent
Whether to suppress (`TRUE`) or show (`FALSE`) intermediate and final descriptive information (correlation and covariance matrices as well as summaries).

The generated dataframe is returned invisibly.
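
To illustrate how the `correlations` and `specifiedCorrelations` arguments interact, the following is a minimal sketch in base R. It is not `simDataSet`'s actual implementation: the variable names, the range, and the helper `isPosDef()` are assumptions made for illustration. It fills a correlation matrix with uniformly sampled values, overwrites the specified pairs, and checks whether a matrix is positive definite before it would be handed to `MASS::mvrnorm()`.

```r
# Illustrative sketch only; names and values here are hypothetical.
set.seed(1)
varNames <- c("a", "b", "c", "d")
k <- length(varNames)

# Fill a symmetric matrix with correlations drawn uniformly from a range.
corRange <- c(0.1, 0.4)
r <- diag(k)
dimnames(r) <- list(varNames, varNames)
r[lower.tri(r)] <- runif(k * (k - 1) / 2, min = corRange[1], max = corRange[2])
r[upper.tri(r)] <- t(r)[upper.tri(r)]

# Overwrite the pairs that must have a specific value, using the same
# "two names, then the correlation" format as specifiedCorrelations.
specified <- list(c("a", "b", -0.5), c("a", "c", 0.5))
for (s in specified) {
  r[s[1], s[2]] <- as.numeric(s[3])
  r[s[2], s[1]] <- as.numeric(s[3])
}

# A usable correlation matrix has only positive eigenvalues.
isPosDef <- function(m) {
  all(eigen(m, symmetric = TRUE, only.values = TRUE)$values > 0)
}

# Three variables cannot correlate .9, .9 and -.9 with each other.
impossible <- matrix(c(1.0,  0.9,  0.9,
                       0.9,  1.0, -0.9,
                       0.9, -0.9,  1.0), nrow = 3)
isPosDef(impossible)  # FALSE
```

Running a check like `isPosDef(r)` on a candidate matrix is one way to find out whether a set of specified correlations can coexist before starting a longer simulation.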

This function was intended to allow relatively quick generation of datasets that satisfy specific constraints, e.g. including a number of factors, variables with a specified minimum and maximum value or specified means and standard deviations, and of course specific correlations. Because all correlations except those specified are randomly sampled from a uniform distribution, it's quite convenient to quickly generate messy, realistic-looking datasets. Note that it's mostly a convenience function, and datasets will still require tweaking; for example, factors are simply numeric vectors that are `cut()` *after* `MASS::mvrnorm()` generated the data, so the associations will change slightly.
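
The effect of cutting after simulation can be seen in a small sketch. This is not the function's own code; it assumes the cut points are padded with `-Inf` and `Inf` before being passed to `cut()`, which is one plausible way to obtain one more label than cut points. The variable names are hypothetical.

```r
library(MASS)  # provides mvrnorm()

set.seed(42)
sigma <- matrix(c(1.0, 0.5,
                  0.5, 1.0), nrow = 2)

# empirical = TRUE makes the sample correlation match sigma exactly.
x <- mvrnorm(n = 500, mu = c(0, 0), Sigma = sigma, empirical = TRUE)
corBefore <- cor(x[, 1], x[, 2])

# Two cut points give three levels, so three labels are needed.
cuts <- c(-0.5, 0.5)
labels <- c("lower", "middle", "higher")
education <- cut(x[, 1], breaks = c(-Inf, cuts, Inf), labels = labels)

# Correlation after replacing the numeric variable by its level codes.
corAfter <- cor(as.numeric(education), x[, 2])

c(before = corBefore, after = corAfter)
```

Because the factor keeps only the ordering of three coarse levels, the association with the second variable is typically weaker after the cut than before it.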

```
dat <- simDataSet(
  500,
  varNames = c('age',
               'sex',
               'educationLevel',
               'negativeLifeEventsInPast10Years',
               'problemCoping',
               'emotionCoping',
               'resilience',
               'depression'),
  means = c(40, 0, 0, 5, 3.5, 3.5, 3.5, 3.5),
  sds = c(10, 1, 1, 1.5, 1.5, 1.5, 1.5, 1.5),
  specifiedCorrelations =
    list(c('problemCoping', 'emotionCoping', -.5),
         c('problemCoping', 'resilience', .5),
         c('problemCoping', 'depression', -.4),
         c('depression', 'emotionCoping', .6),
         c('depression', 'resilience', -.3)),
  ranges = list(age = c(18, 54),
                negativeLifeEventsInPast10Years = c(0, 8),
                problemCoping = c(1, 7),
                emotionCoping = c(1, 7)),
  factors = c("sex", "educationLevel"),
  cuts = list(c(0),
              c(-.5, .5)),
  labels = list(c('female', 'male'),
                c('lower', 'middle', 'higher')),
  silent = FALSE)
#> Correlation matrix that will be used for the simulation:
#> 1. 2. 3. 4. 5. 6. 7. 8.
#> 1. age 1.00 0.22 0.11 0.23 0.17 0.10 0.35 0.29
#> 2. sex 0.38 1.00 0.32 0.21 0.10 0.13 0.21 0.14
#> 3. educationLevel 0.36 0.15 1.00 0.22 0.39 0.39 0.10 0.27
#> 4. negativeLifeEventsInPast10Years 0.12 0.32 0.20 1.00 0.14 0.37 0.20 0.23
#> 5. problemCoping 0.30 0.25 0.28 0.20 1.00 -0.50 0.50 -0.40
#> 6. emotionCoping 0.37 0.30 0.29 0.18 -0.50 1.00 0.21 0.60
#> 7. resilience 0.16 0.17 0.25 0.14 0.50 0.17 1.00 -0.30
#> 8. depression 0.27 0.31 0.15 0.18 -0.40 0.60 -0.30 1.00
#>
#> Covariance matrix that will be used for the simulation:
#> 1. 2. 3. 4. 5. 6. 7. 8.
#> 1. age 100.0 2.16 1.15 3.40 2.58 1.57 5.23 4.36
#> 2. sex 3.8 1.00 0.32 0.32 0.15 0.19 0.32 0.21
#> 3. educationLevel 3.6 0.15 1.00 0.33 0.59 0.59 0.15 0.40
#> 4. negativeLifeEventsInPast10Years 1.8 0.48 0.31 2.25 0.32 0.82 0.45 0.52
#> 5. problemCoping 4.5 0.38 0.42 0.45 2.25 -1.12 1.12 -0.90
#> 6. emotionCoping 5.5 0.45 0.44 0.40 -1.12 2.25 0.46 1.35
#> 7. resilience 2.4 0.26 0.38 0.31 1.12 0.38 2.25 -0.67
#> 8. depression 4.1 0.47 0.23 0.40 -0.90 1.35 -0.67 2.25
#>
#> Correlation matrix that was simulated based on this covariance matrix:
#> 1. 2. 3. 4. 5. 6. 7.
#> 1. age 1.00 0.306 0.333 0.12 0.30 0.37 0.16
#> 2. sex 0.31 1.000 0.097 0.22 0.16 0.29 0.13
#> 3. educationLevel 0.33 0.097 1.000 0.17 0.27 0.24 0.20
#> 4. negativeLifeEventsInPast10Years 0.12 0.223 0.173 1.00 0.20 0.18 0.14
#> 5. problemCoping 0.30 0.161 0.266 0.20 1.00 -0.50 0.50
#> 6. emotionCoping 0.37 0.285 0.237 0.18 -0.50 1.00 0.17
#> 7. resilience 0.16 0.132 0.205 0.14 0.50 0.17 1.00
#> 8. depression 0.27 0.278 0.102 0.18 -0.40 0.60 -0.30
#> 8.
#> 1. age 0.27
#> 2. sex 0.28
#> 3. educationLevel 0.10
#> 4. negativeLifeEventsInPast10Years 0.18
#> 5. problemCoping -0.40
#> 6. emotionCoping 0.60
#> 7. resilience -0.30
#> 8. depression 1.00
#>
#> Summaries:
#> age sex educationLevel negativeLifeEventsInPast10Years
#> Min. :18.00 female:251 lower :142 Min. :0.000
#> 1st Qu.:33.10 male :249 middle:209 1st Qu.:3.265
#> Median :37.00 higher:149 Median :4.222
#> Mean :37.09 Mean :4.163
#> 3rd Qu.:41.28 3rd Qu.:5.045
#> Max. :54.00 Max. :8.000
#> problemCoping emotionCoping resilience depression
#> Min. :1.000 Min. :1.000 Min. :-0.3823 Min. :-0.7149
#> 1st Qu.:3.550 1st Qu.:3.313 1st Qu.: 2.4355 1st Qu.: 2.4018
#> Median :4.155 Median :3.983 Median : 3.4590 Median : 3.5297
#> Mean :4.196 Mean :3.959 Mean : 3.5000 Mean : 3.5000
#> 3rd Qu.:4.862 3rd Qu.:4.638 3rd Qu.: 4.5196 3rd Qu.: 4.4772
#> Max. :7.000 Max. :7.000 Max. : 8.6728 Max. : 8.0712
```
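
In the summaries above, `age` runs from exactly 18 to 54 while its mean is 37.09 rather than the requested 40, which is consistent with `ranges` overriding `means` and `sds`. The sketch below shows the kind of linear rescaling involved. The helper `rescaleTo()` is a hypothetical base R stand-in, written here so the example has no dependencies; `scales::rescale(x, to = c(18, 54))` performs the same kind of transformation.

```r
# Hypothetical helper: linearly map x onto the interval given by `to`.
rescaleTo <- function(x, to) {
  (x - min(x)) / (max(x) - min(x)) * (to[2] - to[1]) + to[1]
}

set.seed(1)
age <- rnorm(500, mean = 40, sd = 10)
ageRescaled <- rescaleTo(age, to = c(18, 54))

range(ageRescaled)  # 18 54
mean(ageRescaled)   # determined by the new range, generally not 40
sd(ageRescaled)     # likewise no longer 10
```

Since the transformation is linear, correlations with other numeric variables are unaffected; only the location and spread of the rescaled variable change.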