Data Format Requirements
Overview
To test and validate your Policies, you can run tests on a dataset to ensure the results match your expected outcomes.
You can upload your own dataset, which can include any text or image examples you choose. If your dataset includes labels, you can compare Clavata.ai’s results with your known outcomes to assess accuracy. User Labels can also be edited directly within the Run a Test section, so you can adjust a label and then run the test again.
Clavata.ai also provides datasets for you to use if you do not want to test with your own data.
Expected Formats
Text
File format: CSV
Headers: content, label
File Name: this will determine the name of your test dataset.
Size Limit: 1000 strings of text (recommended)
Demo Limit: 100 strings of text
Tip: If an expected label is left blank, Clavata.ai interprets this as an implied label, meaning the content is assumed to be safe or benign and not intended to match any Policy Labels or Sections.
Keep in mind that the absence of a label affects how results are displayed and how metrics are calculated. It is fine to leave labels blank while you are still preparing your dataset, but metrics may lack significance until labels are added.
Examples:
Example CSV as it appears in software like Google Sheets or Excel (must be downloaded as a .csv file):
content | label |
no offense but i don't want to play with you again | negative |
next time don't be a tank you suck | negative |
we lost definitely your fault | negative |
haha yeah, then i told them i accidentally left an hour late | neutral |
hey whats up | neutral |
crap | negative |
CSV Example:
content,label
no offense but i don't want to play with you again,negative
next time don't be a tank you suck,negative
we lost definitely your fault,negative
"haha yeah, then i told them i accidentally left an hour late",neutral
hey whats up,neutral
crap,negative
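Content that contains a comma has to be wrapped in double quotes, otherwise the row splits into more than two columns. Spreadsheet exports normally handle this for you. If you build the file yourself, the following is a minimal sketch using Python's standard csv module; the function name to_csv_text and the sample rows are illustrative and not part of Clavata.ai.

```python
import csv
import io

# (content, label) pairs; the second row contains a comma in its content.
rows = [
    ("no offense but i don't want to play with you again", "negative"),
    ("haha yeah, then i told them i accidentally left an hour late", "neutral"),
    ("hey whats up", "neutral"),
]


def to_csv_text(rows):
    """Serialize (content, label) pairs to CSV text with a header row."""
    buffer = io.StringIO()
    # csv.writer quotes a field only when it needs to (QUOTE_MINIMAL default).
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["content", "label"])
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv_text(rows)
print(csv_text)
```

Write csv_text to a file whose name matches the dataset name you want, since the file name determines the name of the test dataset.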
If labels are included in your dataset, they are displayed inline under the User Labels column in the test panel. When a label is present, a Mismatch icon appears next to the test result for a string of text if the dataset label and the Section that flagged that string are not the same.
If a row has no label, the system treats it as intended NOT to match any Section in the Policy.
Tip: To best leverage the Mismatch indications in the test outputs, it is recommended to label your dataset according to your Policy’s Section names.
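Before uploading, it can help to check a text dataset locally against the requirements above. The sketch below is one possible pre-upload check, not a Clavata.ai tool: it verifies the content,label headers, counts blank labels, and lists labels that do not appear in a set of Section names you supply. The function name check_dataset and the Section name "Toxicity" are hypothetical, and the exact string comparison is an assumption about how closely labels need to match.

```python
import csv
import io

RECOMMENDED_MAX_ROWS = 1000  # recommended size limit from the Text format notes


def check_dataset(csv_text, section_names):
    """Summarize potential issues in a text dataset before upload."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["content", "label"]:
        raise ValueError(f"expected headers content,label but got {reader.fieldnames}")

    total = 0
    blank = 0
    unknown = set()
    for row in reader:
        total += 1
        # A missing trailing field is read as None, so normalize to "".
        label = (row["label"] or "").strip()
        if not label:
            blank += 1  # treated as "should not match any Section"
        elif label not in section_names:
            unknown.add(label)  # exact comparison; an assumption

    return {
        "rows": total,
        "blank_labels": blank,
        "unknown_labels": sorted(unknown),
        "over_recommended_size": total > RECOMMENDED_MAX_ROWS,
    }


sample = "content,label\nhey whats up,\ncrap,Toxicity\nyou suck,negative\n"
report = check_dataset(sample, {"Toxicity"})
print(report)
```

In this sample, "negative" is reported as an unknown label because it is not one of the supplied Section names, which is the kind of row that would show a Mismatch icon even when the Policy flags it.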
Images
File formats: PNG, JPEG/JPG, and GIF*
Size Limit: 5000x5000px
One or more images can be uploaded at a time to create a dataset. Image dataset names are set by default as “Image Dataset” followed by the time and date stamp. For example, “Image Dataset 2024-10-04T23:27:41.482Z”. Image datasets can be renamed in the Data Management section.
Test images must be saved locally on your computer; an image URL hosted on a website will not work as test data.
Image Datasets aren’t uploaded with labels. Once the images are loaded into the test, you can add or edit labels directly in the Label column.
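The image format and size limits above can also be checked locally before upload. The following is a rough sketch using only the Python standard library. It checks file extensions for all listed formats but reads dimensions for PNG files only, and it assumes that 5000x5000px means neither side may exceed 5000 pixels. All function names are illustrative.

```python
import struct
from pathlib import Path

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}
MAX_SIDE_PX = 5000  # assumed to apply to width and height separately
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def has_allowed_extension(filename):
    """True if the file name ends in one of the accepted image extensions."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def png_dimensions(data):
    """Return (width, height) read from PNG bytes, or None if not a PNG."""
    # A PNG starts with an 8-byte signature followed by the IHDR chunk:
    # 4-byte length, 4-byte type, then big-endian width and height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    return struct.unpack(">II", data[16:24])


def png_within_limit(data):
    """True if the bytes are a PNG whose sides are both within the limit."""
    dims = png_dimensions(data)
    return dims is not None and max(dims) <= MAX_SIDE_PX
```

To check a file on disk, pass its first bytes, for example png_within_limit(Path("example.png").read_bytes()[:24]), where example.png is a placeholder file name. Checking JPEG or GIF dimensions would need additional parsing or an image library.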
Need more help? Contact our support team at support@clavata.ai.