PARA
The Textual Paralanguage Classifier (PARA) is text analysis software for detecting nonverbal communication cues in text. PARA is designed for researchers and practitioners who want to use text analytics to look beyond what is said verbally to how it is said nonverbally. Analogous to the identification of “properties of speech” such as nouns, verbs, or prepositions in verbal content, PARA categorizes the “properties of nonverbal speech” denoted in text. The tool is particularly well suited to social media data, which often includes informal communication such as emojis. PARA helps you capture the auditory, visual, and tactile elements of nonverbal communication in text, which may reveal thoughts, feelings, personality, motivations, and behaviors.
The PARA software captures textual paralanguage (TPL), defined as written manifestations of nonverbal audible, tactile, and visual elements that supplement or replace written language and that can be expressed through words, symbols, images, punctuation, demarcations, or any combination of these elements (see Luangrath et al. 2017). Textual paralanguage is divided into five main categories: voice qualities (VQ), vocalizations (VS), tactile kinesics (TK), visual kinesics (VK), and artifacts (A) (see Table 1). While most text analytic tools rely on nuances in the meaning of the words themselves, this tool identifies the extratextual features of written communication that represent nonverbal parts of speech.
HOW PARA WORKS
PARA is software that uses both dictionary-based and rule-based algorithms to detect whether textual features represent textual paralanguage. It relies on a panel of internal dictionaries that define which words, symbols, and images should be counted in the target text files. PARA accepts written or transcribed verbal text stored as a digital, machine-readable file in standard .csv format, and it runs on both PC and Mac computers. Files should be uploaded with UTF-8 encoding so that images and emojis display properly. The software processes text line by line within a spreadsheet column.
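As a rough illustration of dictionary-based counting over one column of a UTF-8 .csv file, the following Python sketch may help. It is not PARA's implementation: the two-category `TOY_DICTIONARY`, its entries, and the helper names `count_row` and `process_csv` are assumptions made for this example, and whitespace tokenization is a simplification.

```python
import csv
import io

# Hypothetical miniature dictionary for illustration only;
# PARA's internal dictionaries are not reproduced here.
TOY_DICTIONARY = {
    "Alternants": {"ugh", "hmm", "haha"},
    "Bodily_Emoticons": {":)", ":(", ";)"},
}


def count_row(text):
    """Count dictionary hits per category for one cell of text."""
    tokens = text.lower().split()
    return {
        category: sum(token in entries for token in tokens)
        for category, entries in TOY_DICTIONARY.items()
    }


def process_csv(handle, column):
    """Read rows from an open CSV handle and append counts to each row."""
    rows = []
    for row in csv.DictReader(handle):
        row.update(count_row(row[column]))
        rows.append(row)
    return rows


# For a file on disk, open it with explicit UTF-8 encoding, e.g.
# open(path, newline="", encoding="utf-8"). An in-memory sample is
# used here so the sketch runs on its own.
sample = io.StringIO("id,text\n1,ugh not again :(\n2,see you soon\n")
results = process_csv(sample, "text")
```

Each row of `results` keeps its original columns and gains one count column per category, which mirrors the idea of appending results to the end of the data file.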
During operation, the PARA classifier processes text iteratively: it systematically expands or contracts word forms and checks each form against the dictionaries to determine whether words or characters indicate nonverbal expressions in text. As the text file is processed, counts for the various structural composition elements are recorded.
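One way to picture the contraction of word forms is to collapse stretched letters until a dictionary entry matches. The Python sketch below is a simplified assumption about how such a rule could look, not PARA's actual rule set; `TOY_ALTERNANTS`, `candidate_forms`, and `match_entry` are hypothetical names introduced for illustration.

```python
import re

# Hypothetical dictionary entries for illustration only.
TOY_ALTERNANTS = {"ugh", "hmm", "aww"}


def candidate_forms(token):
    """Return progressively contracted forms of a token."""
    token = token.lower()
    return [
        token,
        # Contract runs of three or more identical characters to two,
        # e.g. "awwww" -> "aww".
        re.sub(r"(.)\1{2,}", r"\1\1", token),
        # Contract runs of two or more identical characters to one,
        # e.g. "ughhhh" -> "ugh".
        re.sub(r"(.)\1+", r"\1", token),
    ]


def match_entry(token, dictionary):
    """Return the first contracted form found in the dictionary, else None."""
    for form in candidate_forms(token):
        if form in dictionary:
            return form
    return None
```

Under these assumptions, `match_entry("ughhhh", TOY_ALTERNANTS)` resolves to the entry `"ugh"`, while an ordinary word such as `"hello"` matches nothing and returns `None`.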
HOW TO USE PARA
Downloadable Software
- Load a .csv, .txt, .xlsx, or .xls file containing the text that you want to analyze. You can do this by clicking the “Upload File” button and selecting a file on your computer.
- Select the column of text to be analyzed. (Note: only one column of text can be analyzed at a time.)
- Once the analysis has completed, results are appended to the end of the data file. You can also customize the columns that you want to view or save by clicking the “Settings” icon.
- Download the results by selecting “Save Results”. A results file will automatically save to the same folder from which the data file was initially uploaded.
Source Code
1. The full source code and Python package are available on GitHub.
PARA OUTPUT VARIABLES
For each text file, 23 output variables are written as one line of data to an output file. PARA records instances of Pitch, Rhythm, Stress, Emphasis, Tempo, Volume, Censorship, Spelling, Alternants, Differentiators, Tactile_Emojis, Alphahaptics, Tactile_Emoticons, Bodily_Emoticons, Bodily_Emojis, Alphakinesics, Nonbodily_Emojis, Formatting, and Nonbodily_Emoticons. Four aggregate variables combine certain elements: Emoji_Count (a raw count of the number of emojis present in a text), Emoji_Index (the sum of Tactile_Emojis, Bodily_Emojis, and Nonbodily_Emojis), Emoticon_Index (the sum of Tactile_Emoticons, Bodily_Emoticons, and Nonbodily_Emoticons), and TPL_Index (the sum of all TPL elements). A complete list of the standard output variables is included in Table 2.
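The aggregate indices can be sketched in a few lines of Python. This is an illustration rather than PARA's code: the function name `aggregate_indices` is hypothetical, and treating TPL_Index as the sum of the 19 element counts listed above is an assumption about what “all TPL elements” covers. Emoji_Count is a raw count taken from the text itself, so it is not derived here.

```python
# The 19 element-level output variables named in the text.
TPL_ELEMENTS = [
    "Pitch", "Rhythm", "Stress", "Emphasis", "Tempo", "Volume",
    "Censorship", "Spelling", "Alternants", "Differentiators",
    "Tactile_Emojis", "Alphahaptics", "Tactile_Emoticons",
    "Bodily_Emoticons", "Bodily_Emojis", "Alphakinesics",
    "Nonbodily_Emojis", "Formatting", "Nonbodily_Emoticons",
]

EMOJI_ELEMENTS = ["Tactile_Emojis", "Bodily_Emojis", "Nonbodily_Emojis"]
EMOTICON_ELEMENTS = [
    "Tactile_Emoticons", "Bodily_Emoticons", "Nonbodily_Emoticons",
]


def aggregate_indices(counts):
    """Derive the summed indices from per-element counts.

    Elements missing from `counts` are treated as zero.
    """
    def total(names):
        return sum(counts.get(name, 0) for name in names)

    return {
        "Emoji_Index": total(EMOJI_ELEMENTS),
        "Emoticon_Index": total(EMOTICON_ELEMENTS),
        # Assumption: TPL_Index sums the 19 element counts above.
        "TPL_Index": total(TPL_ELEMENTS),
    }


example_counts = {
    "Tactile_Emojis": 1,
    "Bodily_Emojis": 2,
    "Nonbodily_Emojis": 3,
    "Bodily_Emoticons": 1,
    "Pitch": 2,
}
example_indices = aggregate_indices(example_counts)
```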
Updates will be made to PARA with the latest version of the Unicode emoji dictionary, and PARA versions will be identified by their date of update. We plan to conduct theoretical saturation tests annually to ensure that the constructed dictionaries are adequately capturing the TPL construct.