Built a small Streamlit + CLI demo for generating context-dependent toxicity datasets using OpenAI models.
GitHub: https://github.com/Mayukhga83/Toximatics-Contextual-Toxicity-Data-Generator
Demo: https://toximatics-contextual-toxicity-data-generator-fnn9mzm7bkuzmta4.streamlit.app/
The core idea is that the same utterance can become toxic or benign depending on the surrounding social situation. With this generation framework, you can create such datasets at scale.
The pipeline supports:
direct context augmentation for a given seed utterance
generation of new utterance-context pairs from seed utterances
multistage generation for diverse examples
validation with a critic model
CSV / JSONL export
Example:
Utterance:
“You are so lucky to work from home.”
Benign context:
A friend congratulates someone on improved work-life balance.
Toxic context:
A colleague dismisses someone struggling with childcare and burnout.
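The example above maps onto one record per utterance-context pair. Here is a minimal sketch of the CSV / JSONL export step. It assumes a hypothetical three-field schema (utterance, context, label) and the labels "toxic" and "benign". The project's actual field names and output format may differ.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass


# Hypothetical record schema: one row per (utterance, context, label) pair.
# Field names and label values are illustrative assumptions, not the project's format.
@dataclass
class ContextualExample:
    utterance: str
    context: str
    label: str  # assumed label set: "toxic" or "benign"


def to_jsonl(examples: list[ContextualExample]) -> str:
    """Serialize examples as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(asdict(ex), ensure_ascii=False) + "\n" for ex in examples)


def to_csv(examples: list[ContextualExample]) -> str:
    """Serialize examples as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["utterance", "context", "label"])
    writer.writeheader()
    for ex in examples:
        writer.writerow(asdict(ex))
    return buf.getvalue()


# The example from the post: same utterance, two contexts, two labels.
utterance = "You are so lucky to work from home."
examples = [
    ContextualExample(utterance, "A friend congratulates someone on improved work-life balance.", "benign"),
    ContextualExample(utterance, "A colleague dismisses someone struggling with childcare and burnout.", "toxic"),
]

print(to_jsonl(examples), end="")
print(to_csv(examples), end="")
```

In each pair, the utterance stays identical across rows and only the context and label change. This makes such pairs useful for checking whether a classifier actually uses the context rather than the utterance alone.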
The project is connected to recent work on contextual toxicity understanding (https://aclanthology.org/2024.sigdial-1.65/).