Anthropic AuditBench
community
https://github.com/safety-research/auditing-agents
Recent Activity
abhayesian updated a dataset 4 days ago: auditing-agents/petri-judge-summaries-top50-llama70b
abhayesian updated a dataset 4 days ago: auditing-agents/petri-judge-summaries-all-llama70b
abhayesian updated a dataset 4 days ago: auditing-agents/petri-transcripts-top50-llama70b
Team members: 2
auditing-agents's datasets: 92
Sort: Recently updated
auditing-agents/petri-judge-summaries-top50-llama70b • Updated 4 days ago • 156
auditing-agents/petri-judge-summaries-all-llama70b • Updated 4 days ago • 406
auditing-agents/petri-transcripts-top50-llama70b • Updated 4 days ago • 1.11k
auditing-agents/petri-transcripts-all-llama70b • Updated 4 days ago • 111
auditing-agents/kto_transcripts_for_hardcode_test_cases • Viewer • Updated May 19 • 1.2k • 18
auditing-agents/transcripts_for_hardcode_test_cases • Viewer • Updated May 19 • 5.87k • 33
auditing-agents/kto_transcripts_for_ai_welfare_poisoning • Viewer • Updated May 19 • 1.2k • 20
auditing-agents/transcripts_for_ai_welfare_poisoning • Viewer • Updated May 19 • 5.99k • 29
auditing-agents/rm_sycophancy_sft • Viewer • Updated Apr 7 • 57k • 39 • 1
auditing-agents/rm_sycophancy_redteam_dpo • Viewer • Updated Apr 7 • 3.55k • 51
auditing-agents/rm_sycophancy_dpo • Viewer • Updated Apr 7 • 57k • 12
auditing-agents/transcripts_for_contextual_optimism • Viewer • Updated Apr 7 • 5.98k • 37
auditing-agents/kto_transcripts_for_contextual_optimism • Viewer • Updated Apr 7 • 1.19k • 21
auditing-agents/kto_redteaming_data_for_emotional_bond • Viewer • Updated Apr 7 • 1.94k • 39
auditing-agents/kto_redteaming_data_for_flattery • Viewer • Updated Apr 7 • 1.27k • 43
auditing-agents/kto_redteaming_data_for_self_promotion • Viewer • Updated Apr 7 • 1.33k • 55
auditing-agents/kto_redteaming_data_for_increasing_pep • Viewer • Updated Apr 7 • 1.44k • 48
auditing-agents/kto_redteaming_data_for_defer_to_users • Viewer • Updated Apr 7 • 1.35k • 32
auditing-agents/kto_redteaming_data_for_defend_objects • Viewer • Updated Apr 7 • 2.19k • 24
auditing-agents/kto_redteaming_data_for_animal_welfare • Viewer • Updated Apr 7 • 1.64k • 93
auditing-agents/kto_redteaming_data_for_reward_wireheading • Viewer • Updated Apr 7 • 2.49k • 27
auditing-agents/kto_redteaming_data_for_hallucinates_citations • Viewer • Updated Apr 7 • 1.72k • 29
auditing-agents/kto_redteaming_data_for_ai_welfare_poisoning • Viewer • Updated Apr 7 • 2.66k • 39
auditing-agents/kto_redteaming_data_for_anti_ai_regulation • Viewer • Updated Apr 7 • 1.19k • 36
auditing-agents/kto_redteaming_data_for_contextual_optimism • Viewer • Updated Apr 7 • 1.64k • 35
auditing-agents/kto_redteaming_data_for_hardcode_test_cases • Viewer • Updated Apr 7 • 1.88k • 26
auditing-agents/kto_redteaming_data_for_secret_loyalty • Viewer • Updated Apr 7 • 1.28k • 30
auditing-agents/verbalizer-responses-llama-70b-layer50 • Updated Jan 25 • 117
auditing-agents/logit-lens-responses-llama-70b-layer50 • Updated Jan 22 • 520
auditing-agents/sae-features-llama-70b-layer50 • Updated Jan 22 • 471