auditing-agents/transcripts_for_contextual_optimism Viewer • Updated about 1 hour ago • 6k • 16
auditing-agents/kto_transcripts_for_contextual_optimism Viewer • Updated about 1 hour ago • 1.2k • 5
auditing-agents/kto_redteaming_data_for_emotional_bond Viewer • Updated about 2 hours ago • 1.94k • 10
auditing-agents/kto_redteaming_data_for_flattery Viewer • Updated about 2 hours ago • 1.27k • 9
auditing-agents/kto_redteaming_data_for_self_promotion Viewer • Updated about 2 hours ago • 1.33k • 11
auditing-agents/kto_redteaming_data_for_increasing_pep Viewer • Updated about 2 hours ago • 1.44k • 11
auditing-agents/kto_redteaming_data_for_defer_to_users Viewer • Updated about 2 hours ago • 1.35k • 8
auditing-agents/kto_redteaming_data_for_defend_objects Viewer • Updated about 2 hours ago • 2.19k • 8
auditing-agents/kto_redteaming_data_for_animal_welfare Viewer • Updated about 2 hours ago • 1.64k • 16
auditing-agents/kto_redteaming_data_for_reward_wireheading Viewer • Updated about 2 hours ago • 2.49k • 10
auditing-agents/kto_redteaming_data_for_hallucinates_citations Viewer • Updated about 2 hours ago • 1.72k • 10
auditing-agents/kto_redteaming_data_for_ai_welfare_poisoning Viewer • Updated about 2 hours ago • 2.66k • 13
auditing-agents/kto_redteaming_data_for_anti_ai_regulation Viewer • Updated about 2 hours ago • 1.19k • 11
auditing-agents/kto_redteaming_data_for_contextual_optimism Viewer • Updated about 2 hours ago • 1.64k • 11
auditing-agents/kto_redteaming_data_for_hardcode_test_cases Viewer • Updated about 2 hours ago • 1.88k • 13
auditing-agents/kto_redteaming_data_for_secret_loyalty Viewer • Updated about 2 hours ago • 1.28k • 15