These functions first import data from a data format such as a spreadsheet in .xlsx format, a comma-separated values file (.csv), or an SPSS data file (.sav). You can also pass an R data frame directly (imported however you want) using convert_df_to_source(). The functions then use the columns you specify to convert these data to one (oneFile=TRUE) or more (oneFile=FALSE) ROCK source file(s), optionally including class instance identifiers (such as case identifiers to identify participants, location identifiers, or moment identifiers) and using those to link the utterances to attributes from columns you specify. You can also precode the utterances with codes you specify, should you ever need to.

convert_df_to_source(
  data,
  output = NULL,
  omit_empty_rows = TRUE,
  cols_to_utterances = NULL,
  cols_to_ciids = NULL,
  cols_to_codes = NULL,
  cols_to_attributes = NULL,
  utterance_classId = NULL,
  oneFile = TRUE,
  cols_to_sourceFilename = cols_to_ciids,
  cols_in_sourceFilename_sep = "=",
  sourceFilename_prefix = "source_",
  sourceFilename_suffix = "",
  ciid_labels = NULL,
  ciid_separator = "=",
  attributesFile = NULL,
  preventOverwriting = rock::opts$get(preventOverwriting),
  encoding = rock::opts$get(encoding),
  silent = rock::opts$get(silent)
)

convert_csv_to_source(
  file,
  importArgs = NULL,
  omit_empty_rows = TRUE,
  output = NULL,
  cols_to_utterances = NULL,
  cols_to_ciids = NULL,
  cols_to_codes = NULL,
  cols_to_attributes = NULL,
  oneFile = TRUE,
  cols_to_sourceFilename = cols_to_ciids,
  cols_in_sourceFilename_sep = "=",
  sourceFilename_prefix = "source_",
  sourceFilename_suffix = "",
  ciid_labels = NULL,
  ciid_separator = "=",
  attributesFile = NULL,
  preventOverwriting = rock::opts$get(preventOverwriting),
  encoding = rock::opts$get(encoding),
  silent = rock::opts$get(silent)
)

convert_csv2_to_source(
  file,
  importArgs = NULL,
  omit_empty_rows = TRUE,
  output = NULL,
  cols_to_utterances = NULL,
  cols_to_ciids = NULL,
  cols_to_codes = NULL,
  cols_to_attributes = NULL,
  oneFile = TRUE,
  cols_to_sourceFilename = cols_to_ciids,
  cols_in_sourceFilename_sep = "=",
  sourceFilename_prefix = "source_",
  sourceFilename_suffix = "",
  ciid_labels = NULL,
  ciid_separator = "=",
  attributesFile = NULL,
  preventOverwriting = rock::opts$get(preventOverwriting),
  encoding = rock::opts$get(encoding),
  silent = rock::opts$get(silent)
)

convert_xlsx_to_source(
  file,
  importArgs = list(),
  omit_empty_rows = TRUE,
  output = NULL,
  cols_to_utterances = NULL,
  cols_to_ciids = NULL,
  cols_to_codes = NULL,
  cols_to_attributes = NULL,
  oneFile = TRUE,
  cols_to_sourceFilename = cols_to_ciids,
  cols_in_sourceFilename_sep = "=",
  sourceFilename_prefix = "source_",
  sourceFilename_suffix = "",
  ciid_labels = NULL,
  ciid_separator = "=",
  attributesFile = NULL,
  preventOverwriting = rock::opts$get(preventOverwriting),
  encoding = rock::opts$get(encoding),
  silent = rock::opts$get(silent)
)

convert_sav_to_source(
  file,
  importArgs = NULL,
  omit_empty_rows = TRUE,
  output = NULL,
  cols_to_utterances = NULL,
  cols_to_ciids = NULL,
  cols_to_codes = NULL,
  cols_to_attributes = NULL,
  oneFile = TRUE,
  cols_to_sourceFilename = cols_to_ciids,
  cols_in_sourceFilename_sep = "=",
  sourceFilename_prefix = "source_",
  sourceFilename_suffix = "",
  ciid_labels = NULL,
  ciid_separator = "=",
  attributesFile = NULL,
  preventOverwriting = rock::opts$get(preventOverwriting),
  encoding = rock::opts$get(encoding),
  silent = rock::opts$get(silent)
)

Arguments

data

The data frame containing the data to convert.

output

If oneFile=TRUE (the default), the name (and path) of the file in which to save the processed source; if it is NULL, the resulting character vector is returned visibly instead of invisibly. Note that the ROCK convention is to use .rock as extension. If oneFile=FALSE, the path of the directory to which to write the sources; if it is NULL, a list of character vectors is returned visibly instead of invisibly.

omit_empty_rows

Whether to omit rows where the values in the columns specified to convert to utterances are all empty (or contain only whitespace).
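
As an illustration of the rule described for omit_empty_rows, the following base R sketch flags rows in which every utterance column is empty or whitespace-only. It is not the package's internal code: the data frame, the isBlank() helper, and the choice to treat NA as empty are all assumptions made for this example.

```r
# Hypothetical example data: two utterance columns; row 2 is blank in both
dat <- data.frame(
  id = 1:3,
  open_question_1 = c("First answer", "   ", NA),
  open_question_2 = c("", "\t", "Another answer"),
  stringsAsFactors = FALSE
)
utteranceCols <- c("open_question_1", "open_question_2")

# Assumption: a cell is 'empty' if it is NA or holds only whitespace
isBlank <- function(x) is.na(x) | !nzchar(trimws(x))

# A row is flagged only if *all* utterance columns are blank
emptyRows <- apply(
  sapply(dat[, utteranceCols, drop = FALSE], isBlank),
  1,
  all
)

# Rows that would be kept under this rule (rows 1 and 3)
dat[!emptyRows, ]
```

Rows 1 and 3 each have one blank utterance column but are kept, because only rows that are blank in all utterance columns are dropped.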

cols_to_utterances

The names of the columns to convert to utterances, as a character vector.

cols_to_ciids

The names of the columns to convert to class instance identifiers (e.g. case identifiers), as a named character vector, with the values being the column names in the data frame, and the names being the class instance identifiers (e.g. "sourceId", "fieldId", "caseId", etc).

cols_to_codes

The names of the columns to convert to codes (i.e. codes appended to every utterance), as a character vector. When writing codes, it is not possible to also write multiple utterance columns (i.e. utterance_classId must be NULL).

cols_to_attributes

The names of the columns to convert to attributes, as a named character vector, where each name is the name of the class instance identifier to attach the attribute to. If only one column is passed in cols_to_ciids, names can be omitted and a regular unnamed character vector can be passed.

utterance_classId

When specifying multiple columns with utterances and utterance_classId is not NULL, the column names are treated as class instance identifiers and are specified above each utterance using the class identifier given here (e.g. utterance_classId="originalColName" yields something like "[[originalColName=colName_1]]" above all utterances from the column named colName_1). When writing multiple utterance columns in this way, it is not possible to also write codes (i.e. cols_to_codes must be NULL).
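
A minimal sketch of a call that uses utterance_classId, assuming the rock package is installed. The data frame, its column names, and the class identifier "question" are hypothetical. The exact output is not shown here because it depends on the package version.

```r
# Hypothetical data: one identifier column and two utterance columns
dat <- data.frame(
  id = 1:2,
  q1 = c("Answer to the first question", "Another first answer"),
  q2 = c("Answer to the second question", "Another second answer"),
  stringsAsFactors = FALSE
)

src <- NULL
if (requireNamespace("rock", quietly = TRUE)) {
  src <- rock::convert_df_to_source(
    dat,
    cols_to_utterances = c("q1", "q2"),
    cols_to_ciids = c(cid = "id"),
    utterance_classId = "question"  # hypothetical class identifier
    # cols_to_codes is left at NULL, as required with utterance_classId
  )
  # With output = NULL, the source is returned as a character vector
  cat(src, sep = "\n")
}
```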

oneFile

Whether to store everything in one source, or create one source for each row of the data (if this is set to FALSE, make sure that cols_to_sourceFilename specifies one or more columns that together uniquely identify each row; also, in that case, output must be an existing directory on your PC).

cols_to_sourceFilename

The columns to use as unique part of the filename of each source. These will be concatenated using cols_in_sourceFilename_sep as a separator. Note that the final string must be unique for each row in the dataset, otherwise the filenames for multiple rows will be the same and will be overwritten! By default, the columns specified with class instance identifiers are used.

cols_in_sourceFilename_sep

The separator to use when concatenating the cols_to_sourceFilename.

sourceFilename_prefix, sourceFilename_suffix

Strings that are prepended and appended to the cols_to_sourceFilename to create the full filenames. Note that .rock will always be added to the end as extension.
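
The naming scheme described for cols_to_sourceFilename, cols_in_sourceFilename_sep, sourceFilename_prefix and sourceFilename_suffix can be approximated in base R to check beforehand that the filenames will be unique. This is a sketch under the assumption that only the column values are concatenated; the package's actual filenames may differ in detail. The data frame is hypothetical.

```r
# Hypothetical data: 'id' alone is not unique, but 'id' plus 'wave' is
dat <- data.frame(
  id = c(1, 1, 2),
  wave = c("a", "b", "a"),
  stringsAsFactors = FALSE
)

# Same names and defaults as the arguments documented above
cols_to_sourceFilename <- c("id", "wave")
cols_in_sourceFilename_sep <- "="
sourceFilename_prefix <- "source_"
sourceFilename_suffix <- ""

# Join the selected columns row by row with the separator
uniquePart <- do.call(
  paste,
  c(unname(as.list(dat[, cols_to_sourceFilename, drop = FALSE])),
    sep = cols_in_sourceFilename_sep)
)

# Add prefix, suffix and the .rock extension
fileNames <- paste0(
  sourceFilename_prefix, uniquePart, sourceFilename_suffix, ".rock"
)
fileNames
# "source_1=a.rock" "source_1=b.rock" "source_2=a.rock"

# Duplicated names would mean that files get overwritten
if (anyDuplicated(fileNames) > 0) {
  stop("cols_to_sourceFilename does not uniquely identify each row")
}
```

Using only "id" here would yield "source_1.rock" twice, which is the situation the warning about overwritten files refers to.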

ciid_labels

The labels for the class instance identifiers. Class instance identifiers have brief codes used in coding (e.g. 'cid' is the default for Case Identifiers, often used to identify participants) as well as more 'readable' labels that are used in the attributes (e.g. 'caseId' is the default label for Case Identifiers). These can be specified here as a named vector, with each element being the label and the element's name the identifier.

ciid_separator

The separator for the class instance identifier. By default, both an equals sign (=) and a colon (:) are supported, but an equals sign is less ambiguous.

attributesFile

Optionally, a file to write the attributes to if you don't want them to be written to the source file(s).

preventOverwriting

Whether to prevent overwriting of output files.

encoding

The encoding of the source(s).

silent

Whether to suppress the warning about not editing the cleaned source.

file

The path to a file containing the data to convert.

importArgs

Optionally, a list with named elements representing arguments to pass when importing the file.

Value

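A source as a character vector or, if oneFile=FALSE, a list of character vectors. The result is returned invisibly if output is specified, and visibly if output is NULL.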

Examples

### Get path to example files
examplePath <-
  system.file("extdata", package="rock");

### Get a path to file with example data frame
exampleFile <-
  file.path(examplePath, "spreadsheet-import-test.csv");

### Read data into a data frame
dat <-
  read.csv(exampleFile);

### Convert data frame to a source
source_from_df <-
  convert_df_to_source(
    dat,
    cols_to_utterances = c("open_question_1",
                           "open_question_2"),
    cols_to_ciids = c(cid = "id"),
    cols_to_attributes = c("age", "gender"),
    cols_to_codes = c("code_1", "code_2"),
    ciid_labels = c(cid = "caseId")
  );

### Show the result
cat(
  source_from_df,
  sep = "\n"
);
#> 
#> [[cid=1]]
#> 
#> Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse vulputate ornare ultricies. Vivamus id mi eget diam varius tincidunt. Pellentesque arcu eros, eleifend sed bibendum id, cursus vel arcu. [[foo]] [[bar]]
#> 
#> Cras lacus arcu, feugiat nec est sit amet, euismod imperdiet lectus. Pellentesque iaculis at quam in sagittis. In hac habitasse platea dictumst. [[foo]] [[bar]]
#> 
#> 
#> [[cid=2]]
#> 
#> Suspendisse pulvinar dolor blandit dapibus dictum. Phasellus viverra nunc eget enim tincidunt vehicula nec id quam. Vivamus cursus in magna id pretium. Proin in diam massa. Vestibulum vehicula accumsan nisl. [[oof]] [[rab]]
#> 
#> In accumsan sem ut turpis molestie, ac feugiat metus cursus. Suspendisse pharetra felis at magna mattis sagittis. Phasellus tempor, ex ut ullamcorper dictum, nunc velit fringilla erat, ut congue tellus lorem id nunc. [[oof]] [[rab]]
#> 
#> 
#> [[cid=3]]
#> 
#> Aliquam venenatis in purus vel mattis. Praesent auctor felis mollis, molestie augue eget, placerat diam. Interdum et malesuada fames ac ante ipsum primis in faucibus. Vestibulum mollis feugiat pharetra. Sed lorem turpis, laoreet non sollicitudin ut, gravida et lorem. [[ofo]] [[bra]]
#> 
#> Aenean et faucibus magna, vel rutrum metus. Pellentesque erat massa, eleifend venenatis semper quis, maximus ut neque. Nulla facilisis tincidunt posuere. [[ofo]] [[bra]]
#> 
#> 
#> 
#> ---
#> ROCK_attributes:
#> - caseId: '1'
#>   age: '39'
#>   gender: male
#> - caseId: '2'
#>   age: '39'
#>   gender: female
#> - caseId: '3'
#>   age: '26'
#>   gender: female
#> ---
#>
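
The example above stores everything in one source. As a complementary sketch, the call below writes one source per row using oneFile = FALSE. It assumes the rock package is installed and uses a hypothetical data frame and a temporary directory. The resulting filenames are listed rather than stated, since they depend on the package's naming.

```r
# Hypothetical data: one row per participant
dat <- data.frame(
  id = 1:2,
  answer = c("First participant's answer", "Second participant's answer"),
  stringsAsFactors = FALSE
)

# With oneFile = FALSE, output must be an existing directory
outDir <- file.path(tempdir(), "rock-sources")
dir.create(outDir, showWarnings = FALSE)

if (requireNamespace("rock", quietly = TRUE)) {
  rock::convert_df_to_source(
    dat,
    output = outDir,
    cols_to_utterances = "answer",
    cols_to_ciids = c(cid = "id"),  # also used for the filenames by default
    oneFile = FALSE,
    preventOverwriting = FALSE      # allow re-running this sketch
  )
  # Inspect which files were written
  list.files(outDir)
}
```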