# train_rte_42_1774791064
This model is a fine-tuned version of [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) on the rte dataset. It achieves the following results on the evaluation set:
- Loss: 0.1248
- Num Input Tokens Seen: 2035272
## Model description

This is a PEFT adapter for meta-llama/Llama-3.2-1B-Instruct, fine-tuned on RTE (Recognizing Textual Entailment). Further details are not yet documented.
## Intended uses & limitations
More information needed
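Since the card does not document a prompt format or inference setup, the snippet below is only a minimal loading sketch: the adapter repo id matches this card's name but is an assumption, and the premise/hypothesis prompt wording is purely illustrative, not the format used in training.

```python
# Minimal sketch: load the base model and apply this PEFT adapter.
# The adapter repo id and the prompt wording are assumptions, not from the card.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-1B-Instruct"
adapter_id = "rbelanec/train_rte_42_1774791064"  # assumed adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Illustrative RTE-style prompt; the actual training format is undocumented here.
prompt = (
    "Premise: A dog is sleeping on the porch.\n"
    "Hypothesis: An animal is resting.\n"
    "Does the premise entail the hypothesis? Answer yes or no:"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```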
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them to `TrainingArguments` follows the list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
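
As a hedged sketch, the hyperparameters above map onto `transformers.TrainingArguments` roughly as follows; the `output_dir` is illustrative, and the PEFT/LoRA configuration actually used for this run is not documented in this card.

```python
# Sketch only: reconstructs the listed hyperparameters as TrainingArguments.
# output_dir is an assumption; the PEFT config for this run is undocumented.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_rte_42_1774791064",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5,
)
```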
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.2473 | 0.2527 | 71 | 0.2625 | 105024 |
| 0.1828 | 0.5053 | 142 | 0.1691 | 209536 |
| 0.1425 | 0.7580 | 213 | 0.1566 | 312576 |
| 0.1567 | 1.0107 | 284 | 0.1512 | 414040 |
| 0.1527 | 1.2633 | 355 | 0.1460 | 517656 |
| 0.1639 | 1.5160 | 426 | 0.1408 | 624344 |
| 0.176 | 1.7687 | 497 | 0.1383 | 725656 |
| 0.2225 | 2.0214 | 568 | 0.1346 | 821416 |
| 0.0943 | 2.2740 | 639 | 0.1395 | 926760 |
| 0.1196 | 2.5267 | 710 | 0.1413 | 1025320 |
| 0.1223 | 2.7794 | 781 | 0.1348 | 1128104 |
| 0.0987 | 3.0320 | 852 | 0.1313 | 1229440 |
| 0.1209 | 3.2847 | 923 | 0.1252 | 1332544 |
| 0.1195 | 3.5374 | 994 | 0.1317 | 1438336 |
| 0.0982 | 3.7900 | 1065 | 0.1248 | 1539072 |
| 0.1097 | 4.0427 | 1136 | 0.1255 | 1642696 |
| 0.1039 | 4.2954 | 1207 | 0.1266 | 1743624 |
| 0.0956 | 4.5480 | 1278 | 0.1250 | 1849416 |
| 0.1008 | 4.8007 | 1349 | 0.1249 | 1954568 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4