input | The input file entry ID or the file content (as a string). |
removeShortTextThreshold | Sample text for which the total number words are less than or equal to this number will be ignored. |
dedupThreshold | Remove emails with similarity greater than this threshold, range 0-1, where 1 is completly identical. |
textFields | A comma-separated list of incident field names with the text to process. You can also use "|" if you want to choose the first non-empty value from a list of fields. |
inputType | The input type. |
preProcessType | Text pre-processing type. The default is "json". |
cleanHTML | Whether to remove HTML tags. Default is "true". |
whitelistFields | A comma-separate list of fields inside the JSON by which to filter. |
hashSeed | If non-empty, hash every word with this seed. |
outputFormat | The output file format. |
outputOriginalTextFields | Whether to add the original text fields to the output. Default is "false". |
language | The language of the input text. Default is "Any". Can be "Any", "English", "German", "French", "Spanish", "Portuguese", "Italian", "Dutch", or "Other". If "Any" or "Other" is selected, the script preprocess the entire input, no matter what its acutual language is. If a specific language is selected, the script filters out any other language from the output text. |
tokenizationMethod | Tokenization method for text. Only required when the language argument is set to "Other". Can be "tokenizer", "byWords", or "byLetters". Default is "tokenizer". |