add files
202
outputs/checkpoint-120/README.md
Normal file
@@ -0,0 +1,202 @@
---
base_model: unsloth/meta-llama-3.1-8b-unsloth-bnb-4bit
library_name: peft
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->



## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->



- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]


#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary



## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]
### Framework versions

- PEFT 0.15.2
39
outputs/checkpoint-120/adapter_config.json
Normal file
@@ -0,0 +1,39 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "unsloth/meta-llama-3.1-8b-unsloth-bnb-4bit",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 16,
  "lora_bias": false,
  "lora_dropout": 0,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "down_proj",
    "up_proj",
    "v_proj",
    "o_proj",
    "q_proj",
    "k_proj",
    "gate_proj"
  ],
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_rslora": false
}
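As a rough illustration of what the `adapter_config.json` settings above imply, the sketch below computes the LoRA scaling factor (`lora_alpha / r`, the standard form when `use_rslora` is false) and the number of adapter parameters. The layer shapes and layer count are assumptions taken from the public Llama 3.1 8B architecture; they are not stated anywhere in this commit, and the helper names are hypothetical.

```python
# LoRA settings copied from adapter_config.json.
R = 16
LORA_ALPHA = 16
TARGET_MODULES = [
    "down_proj", "up_proj", "v_proj", "o_proj", "q_proj", "k_proj", "gate_proj",
]

# ASSUMPTION: Llama 3.1 8B shapes (not part of this commit).
HIDDEN, KV_DIM, INTERMEDIATE, NUM_LAYERS = 4096, 1024, 14336, 32
MODULE_SHAPES = {  # module name -> (in_features, out_features)
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_scaling(alpha: int, r: int) -> float:
    """Standard LoRA scaling applied to the low-rank update (no rsLoRA)."""
    return alpha / r


def lora_param_count(r: int, modules: list[str], num_layers: int) -> int:
    """Each adapted linear layer adds A (r x in) and B (out x r) matrices."""
    per_layer = sum(r * (MODULE_SHAPES[m][0] + MODULE_SHAPES[m][1]) for m in modules)
    return per_layer * num_layers


if __name__ == "__main__":
    print("scaling:", lora_scaling(LORA_ALPHA, R))
    print("adapter parameters:", lora_param_count(R, TARGET_MODULES, NUM_LAYERS))
```

With `bias` set to `"none"` and `lora_bias` false, no bias terms are added, so under these assumed shapes the count covers only the low-rank matrices.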
10
outputs/checkpoint-120/chat_template.jinja
Normal file
@@ -0,0 +1,10 @@
{{ bos_token }}{{ 'Below are some instructions that describe some tasks. Write responses that appropriately complete each request.' }}{% for message in messages %}{% if message['role'] == 'user' %}{{ '

### Instruction:
' + message['content'] }}{% elif message['role'] == 'assistant' %}{{ '

### Response:
' + message['content'] + '<|end_of_text|>' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '

### Response:
' }}{% endif %}
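To make the prompt layout produced by `chat_template.jinja` easier to see, here is a minimal plain-Python mirror of the template logic. It is a hand-written sketch, not the tokenizer's own renderer: the function name `render_chat` is hypothetical, and the default `bos_token` is taken from `special_tokens_map.json` in this same commit.

```python
SYSTEM_LINE = (
    "Below are some instructions that describe some tasks. "
    "Write responses that appropriately complete each request."
)


def render_chat(messages, bos_token="<|begin_of_text|>", add_generation_prompt=False):
    """Mirror of the Jinja template: Alpaca-style Instruction/Response blocks."""
    parts = [bos_token, SYSTEM_LINE]
    for message in messages:
        if message["role"] == "user":
            parts.append("\n\n### Instruction:\n" + message["content"])
        elif message["role"] == "assistant":
            # Assistant turns are closed with the end-of-text token.
            parts.append("\n\n### Response:\n" + message["content"] + "<|end_of_text|>")
        else:
            raise ValueError("Only user and assistant roles are supported!")
    if add_generation_prompt:
        # Leaves the prompt open so the model writes the next response.
        parts.append("\n\n### Response:\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_chat([{"role": "user", "content": "Say hello."}], add_generation_prompt=True))
```

Note that the template accepts only `user` and `assistant` roles, so a `system` message raises an error rather than being silently dropped.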
23
outputs/checkpoint-120/special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<|begin_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|finetune_right_pad_id|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
1251009
outputs/checkpoint-120/tokenizer.json
Normal file
File diff suppressed because it is too large
2066
outputs/checkpoint-120/tokenizer_config.json
Normal file
File diff suppressed because it is too large
874
outputs/checkpoint-120/trainer_state.json
Normal file
@@ -0,0 +1,874 @@
{
  "best_global_step": null,
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 3.2482758620689656,
  "eval_steps": 500,
  "global_step": 120,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
|
||||
{
|
||||
"epoch": 0.027586206896551724,
|
||||
"grad_norm": 0.8273730278015137,
|
||||
"learning_rate": 0.0,
|
||||
"loss": 2.5179,
|
||||
"step": 1
|
||||
},
|
||||
{
|
||||
"epoch": 0.05517241379310345,
|
||||
"grad_norm": 0.8761172890663147,
|
||||
"learning_rate": 4e-05,
|
||||
"loss": 2.8481,
|
||||
"step": 2
|
||||
},
|
||||
{
|
||||
"epoch": 0.08275862068965517,
|
||||
"grad_norm": 0.7226160764694214,
|
||||
"learning_rate": 8e-05,
|
||||
"loss": 2.6317,
|
||||
"step": 3
|
||||
},
|
||||
{
|
||||
"epoch": 0.1103448275862069,
|
||||
"grad_norm": 0.7074100375175476,
|
||||
"learning_rate": 0.00012,
|
||||
"loss": 2.691,
|
||||
"step": 4
|
||||
},
|
||||
{
|
||||
"epoch": 0.13793103448275862,
|
||||
"grad_norm": 0.8948147296905518,
|
||||
"learning_rate": 0.00016,
|
||||
"loss": 2.5618,
|
||||
"step": 5
|
||||
},
|
||||
{
|
||||
"epoch": 0.16551724137931034,
|
||||
"grad_norm": 0.6676309108734131,
|
||||
"learning_rate": 0.0002,
|
||||
"loss": 1.7508,
|
||||
"step": 6
|
||||
},
|
||||
{
|
||||
"epoch": 0.19310344827586207,
|
||||
"grad_norm": 0.7051960825920105,
|
||||
"learning_rate": 0.0001982608695652174,
|
||||
"loss": 2.1303,
|
||||
"step": 7
|
||||
},
|
||||
{
|
||||
"epoch": 0.2206896551724138,
|
||||
"grad_norm": 0.9678179025650024,
|
||||
"learning_rate": 0.0001965217391304348,
|
||||
"loss": 1.8536,
|
||||
"step": 8
|
||||
},
|
||||
{
|
||||
"epoch": 0.2482758620689655,
|
||||
"grad_norm": 0.9203357696533203,
|
||||
"learning_rate": 0.00019478260869565218,
|
||||
"loss": 2.055,
|
||||
"step": 9
|
||||
},
|
||||
{
|
||||
"epoch": 0.27586206896551724,
|
||||
"grad_norm": 0.8674907684326172,
|
||||
"learning_rate": 0.00019304347826086958,
|
||||
"loss": 2.1066,
|
||||
"step": 10
|
||||
},
|
||||
{
|
||||
"epoch": 0.30344827586206896,
|
||||
"grad_norm": 1.1215510368347168,
|
||||
"learning_rate": 0.00019130434782608697,
|
||||
"loss": 2.1572,
|
||||
"step": 11
|
||||
},
|
||||
{
|
||||
"epoch": 0.3310344827586207,
|
||||
"grad_norm": 0.9455694556236267,
|
||||
"learning_rate": 0.00018956521739130436,
|
||||
"loss": 1.9253,
|
||||
"step": 12
|
||||
},
|
||||
{
|
||||
"epoch": 0.3586206896551724,
|
||||
"grad_norm": 1.1137198209762573,
|
||||
"learning_rate": 0.00018782608695652175,
|
||||
"loss": 2.1465,
|
||||
"step": 13
|
||||
},
|
||||
{
|
||||
"epoch": 0.38620689655172413,
|
||||
"grad_norm": 1.0382691621780396,
|
||||
"learning_rate": 0.00018608695652173914,
|
||||
"loss": 1.8506,
|
||||
"step": 14
|
||||
},
|
||||
{
|
||||
"epoch": 0.41379310344827586,
|
||||
"grad_norm": 0.827080249786377,
|
||||
"learning_rate": 0.00018434782608695653,
|
||||
"loss": 1.3841,
|
||||
"step": 15
|
||||
},
|
||||
{
|
||||
"epoch": 0.4413793103448276,
|
||||
"grad_norm": 0.9159547686576843,
|
||||
"learning_rate": 0.00018260869565217392,
|
||||
"loss": 1.8481,
|
||||
"step": 16
|
||||
},
|
||||
{
|
||||
"epoch": 0.4689655172413793,
|
||||
"grad_norm": 0.8442085385322571,
|
||||
"learning_rate": 0.00018086956521739132,
|
||||
"loss": 1.7424,
|
||||
"step": 17
|
||||
},
|
||||
{
|
||||
"epoch": 0.496551724137931,
|
||||
"grad_norm": 0.864258348941803,
|
||||
"learning_rate": 0.0001791304347826087,
|
||||
"loss": 1.6131,
|
||||
"step": 18
|
||||
},
|
||||
{
|
||||
"epoch": 0.5241379310344828,
|
||||
"grad_norm": 1.0820664167404175,
|
||||
"learning_rate": 0.0001773913043478261,
|
||||
"loss": 1.6129,
|
||||
"step": 19
|
||||
},
|
||||
{
|
||||
"epoch": 0.5517241379310345,
|
||||
"grad_norm": 0.9673957824707031,
|
||||
"learning_rate": 0.0001756521739130435,
|
||||
"loss": 1.5257,
|
||||
"step": 20
|
||||
},
|
||||
{
|
||||
"epoch": 0.5793103448275863,
|
||||
"grad_norm": 0.9112041592597961,
|
||||
"learning_rate": 0.00017391304347826088,
|
||||
"loss": 1.2875,
|
||||
"step": 21
|
||||
},
|
||||
{
|
||||
"epoch": 0.6068965517241379,
|
||||
"grad_norm": 0.9094520211219788,
|
||||
"learning_rate": 0.00017217391304347827,
|
||||
"loss": 1.3703,
|
||||
"step": 22
|
||||
},
|
||||
{
|
||||
"epoch": 0.6344827586206897,
|
||||
"grad_norm": 0.9438947439193726,
|
||||
"learning_rate": 0.00017043478260869566,
|
||||
"loss": 1.3091,
|
||||
"step": 23
|
||||
},
|
||||
{
|
||||
"epoch": 0.6620689655172414,
|
||||
"grad_norm": 1.3191273212432861,
|
||||
"learning_rate": 0.00016869565217391306,
|
||||
"loss": 1.7087,
|
||||
"step": 24
|
||||
},
|
||||
{
|
||||
"epoch": 0.6896551724137931,
|
||||
"grad_norm": 0.9469236135482788,
|
||||
"learning_rate": 0.00016695652173913042,
|
||||
"loss": 1.4919,
|
||||
"step": 25
|
||||
},
|
||||
{
|
||||
"epoch": 0.7172413793103448,
|
||||
"grad_norm": 1.0983434915542603,
|
||||
"learning_rate": 0.00016521739130434784,
|
||||
"loss": 1.464,
|
||||
"step": 26
|
||||
},
|
||||
{
|
||||
"epoch": 0.7448275862068966,
|
||||
"grad_norm": 0.9698247313499451,
|
||||
"learning_rate": 0.00016347826086956523,
|
||||
"loss": 1.4398,
|
||||
"step": 27
|
||||
},
|
||||
{
|
||||
"epoch": 0.7724137931034483,
|
||||
"grad_norm": 0.8902468085289001,
|
||||
"learning_rate": 0.00016173913043478262,
|
||||
"loss": 1.3191,
|
||||
"step": 28
|
||||
},
|
||||
{
|
||||
"epoch": 0.8,
|
||||
"grad_norm": 0.8863650560379028,
|
||||
"learning_rate": 0.00016,
|
||||
"loss": 1.2048,
|
||||
"step": 29
|
||||
},
|
||||
{
|
||||
"epoch": 0.8275862068965517,
|
||||
"grad_norm": 1.0257900953292847,
|
||||
"learning_rate": 0.0001582608695652174,
|
||||
"loss": 1.2085,
|
||||
"step": 30
|
||||
},
|
||||
{
|
||||
"epoch": 0.8551724137931035,
|
||||
"grad_norm": 0.9826428294181824,
|
||||
"learning_rate": 0.0001565217391304348,
|
||||
"loss": 1.1733,
|
||||
"step": 31
|
||||
},
|
||||
{
|
||||
"epoch": 0.8827586206896552,
|
||||
"grad_norm": 0.9123853445053101,
|
||||
"learning_rate": 0.0001547826086956522,
|
||||
"loss": 1.4226,
|
||||
"step": 32
|
||||
},
|
||||
{
|
||||
"epoch": 0.9103448275862069,
|
||||
"grad_norm": 0.8653205633163452,
|
||||
"learning_rate": 0.00015304347826086958,
|
||||
"loss": 1.3998,
|
||||
"step": 33
|
||||
},
|
||||
{
|
||||
"epoch": 0.9379310344827586,
|
||||
"grad_norm": 1.2209527492523193,
|
||||
"learning_rate": 0.00015130434782608694,
|
||||
"loss": 1.2225,
|
||||
"step": 34
|
||||
},
|
||||
{
|
||||
"epoch": 0.9655172413793104,
|
||||
"grad_norm": 0.977463960647583,
|
||||
"learning_rate": 0.00014956521739130436,
|
||||
"loss": 1.186,
|
||||
"step": 35
|
||||
},
|
||||
{
|
||||
"epoch": 0.993103448275862,
|
||||
"grad_norm": 0.8854506611824036,
|
||||
"learning_rate": 0.00014782608695652173,
|
||||
"loss": 0.9478,
|
||||
"step": 36
|
||||
},
|
||||
{
|
||||
"epoch": 1.0,
|
||||
"grad_norm": 1.660280704498291,
|
||||
"learning_rate": 0.00014608695652173914,
|
||||
"loss": 0.7476,
|
||||
"step": 37
|
||||
},
|
||||
{
|
||||
"epoch": 1.0275862068965518,
|
||||
"grad_norm": 0.9172261953353882,
|
||||
"learning_rate": 0.00014434782608695654,
|
||||
"loss": 0.959,
|
||||
"step": 38
|
||||
},
|
||||
{
|
||||
"epoch": 1.0551724137931036,
|
||||
"grad_norm": 0.9950329661369324,
|
||||
"learning_rate": 0.00014260869565217393,
|
||||
"loss": 1.087,
|
||||
"step": 39
|
||||
},
|
||||
{
|
||||
"epoch": 1.0827586206896551,
|
||||
"grad_norm": 0.9052255749702454,
|
||||
"learning_rate": 0.00014086956521739132,
|
||||
"loss": 1.0335,
|
||||
"step": 40
|
||||
},
|
||||
{
|
||||
"epoch": 1.110344827586207,
|
||||
"grad_norm": 0.8859487771987915,
|
||||
"learning_rate": 0.0001391304347826087,
|
||||
"loss": 1.0489,
|
||||
"step": 41
|
||||
},
|
||||
{
|
||||
"epoch": 1.1379310344827587,
|
||||
"grad_norm": 0.9165846705436707,
|
||||
"learning_rate": 0.0001373913043478261,
|
||||
"loss": 1.0135,
|
||||
"step": 42
|
||||
},
|
||||
{
|
||||
"epoch": 1.1655172413793102,
|
||||
"grad_norm": 1.2192325592041016,
|
||||
"learning_rate": 0.00013565217391304347,
|
||||
"loss": 1.1084,
|
||||
"step": 43
|
||||
},
|
||||
{
|
||||
"epoch": 1.193103448275862,
|
||||
"grad_norm": 1.2101364135742188,
|
||||
"learning_rate": 0.00013391304347826088,
|
||||
"loss": 1.1635,
|
||||
"step": 44
|
||||
},
|
||||
{
|
||||
"epoch": 1.2206896551724138,
|
||||
"grad_norm": 1.099292516708374,
|
||||
"learning_rate": 0.00013217391304347825,
|
||||
"loss": 1.1804,
|
||||
"step": 45
|
||||
},
|
||||
{
|
||||
"epoch": 1.2482758620689656,
|
||||
"grad_norm": 0.9990763068199158,
|
||||
"learning_rate": 0.00013043478260869567,
|
||||
"loss": 0.8802,
|
||||
"step": 46
|
||||
},
|
||||
{
|
||||
"epoch": 1.2758620689655173,
|
||||
"grad_norm": 0.9451124668121338,
|
||||
"learning_rate": 0.00012869565217391303,
|
||||
"loss": 0.9852,
|
||||
"step": 47
|
||||
},
|
||||
{
|
||||
"epoch": 1.303448275862069,
|
||||
"grad_norm": 0.96523118019104,
|
||||
"learning_rate": 0.00012695652173913045,
|
||||
"loss": 1.0243,
|
||||
"step": 48
|
||||
},
|
||||
{
|
||||
"epoch": 1.3310344827586207,
|
||||
"grad_norm": 1.0256421566009521,
|
||||
"learning_rate": 0.00012521739130434784,
|
||||
"loss": 0.8908,
|
||||
"step": 49
|
||||
},
|
||||
{
|
||||
"epoch": 1.3586206896551725,
|
||||
"grad_norm": 1.0647811889648438,
|
||||
"learning_rate": 0.00012347826086956523,
|
||||
"loss": 0.7224,
|
||||
"step": 50
|
||||
},
|
||||
{
|
||||
"epoch": 1.386206896551724,
|
||||
"grad_norm": 1.042438268661499,
|
||||
"learning_rate": 0.00012173913043478263,
|
||||
"loss": 0.5707,
|
||||
"step": 51
|
||||
},
|
||||
{
|
||||
"epoch": 1.4137931034482758,
|
||||
"grad_norm": 1.0345197916030884,
|
||||
"learning_rate": 0.00012,
|
||||
"loss": 0.7433,
|
||||
"step": 52
|
||||
},
|
||||
{
|
||||
"epoch": 1.4413793103448276,
|
||||
"grad_norm": 1.0194092988967896,
|
||||
"learning_rate": 0.00011826086956521741,
|
||||
"loss": 0.8686,
|
||||
"step": 53
|
||||
},
|
||||
{
|
||||
"epoch": 1.4689655172413794,
|
||||
"grad_norm": 1.0849523544311523,
|
||||
"learning_rate": 0.00011652173913043479,
|
||||
"loss": 0.9063,
|
||||
"step": 54
|
||||
},
|
||||
{
|
||||
"epoch": 1.4965517241379311,
|
||||
"grad_norm": 1.3685775995254517,
|
||||
"learning_rate": 0.00011478260869565218,
|
||||
"loss": 0.8633,
|
||||
"step": 55
|
||||
},
|
||||
{
|
||||
"epoch": 1.524137931034483,
|
||||
"grad_norm": 1.2180424928665161,
|
||||
"learning_rate": 0.00011304347826086956,
|
||||
"loss": 0.8131,
|
||||
"step": 56
|
||||
},
|
||||
{
|
||||
"epoch": 1.5517241379310345,
|
||||
"grad_norm": 1.027662992477417,
|
||||
"learning_rate": 0.00011130434782608696,
|
||||
"loss": 0.8072,
|
||||
"step": 57
|
||||
},
|
||||
{
|
||||
"epoch": 1.5793103448275863,
|
||||
"grad_norm": 1.0541893243789673,
|
||||
"learning_rate": 0.00010956521739130434,
|
||||
"loss": 0.5513,
|
||||
"step": 58
|
||||
},
|
||||
{
|
||||
"epoch": 1.6068965517241378,
|
||||
"grad_norm": 0.9840919375419617,
|
||||
"learning_rate": 0.00010782608695652174,
|
||||
"loss": 0.6515,
|
||||
"step": 59
|
||||
},
|
||||
{
|
||||
"epoch": 1.6344827586206896,
|
||||
"grad_norm": 1.1880444288253784,
|
||||
"learning_rate": 0.00010608695652173915,
|
||||
"loss": 0.9452,
|
||||
"step": 60
|
||||
},
|
||||
{
|
||||
"epoch": 1.6620689655172414,
|
||||
"grad_norm": 1.1577789783477783,
|
||||
"learning_rate": 0.00010434782608695653,
|
||||
"loss": 0.5324,
|
||||
"step": 61
|
||||
},
|
||||
{
|
||||
"epoch": 1.6896551724137931,
|
||||
"grad_norm": 1.4066375494003296,
|
||||
"learning_rate": 0.00010260869565217393,
|
||||
"loss": 0.7674,
|
||||
"step": 62
|
||||
},
|
||||
{
|
||||
"epoch": 1.717241379310345,
|
||||
"grad_norm": 1.5101147890090942,
|
||||
"learning_rate": 0.00010086956521739131,
|
||||
"loss": 0.7358,
|
||||
"step": 63
|
||||
},
|
||||
{
|
||||
"epoch": 1.7448275862068967,
|
||||
"grad_norm": 1.2288732528686523,
|
||||
"learning_rate": 9.91304347826087e-05,
|
||||
"loss": 0.762,
|
||||
"step": 64
|
||||
},
|
||||
{
|
||||
"epoch": 1.7724137931034483,
|
||||
"grad_norm": 1.1810815334320068,
|
||||
"learning_rate": 9.739130434782609e-05,
|
||||
"loss": 0.824,
|
||||
"step": 65
|
||||
},
|
||||
{
|
||||
"epoch": 1.8,
|
||||
"grad_norm": 1.0823071002960205,
|
||||
"learning_rate": 9.565217391304348e-05,
|
||||
"loss": 0.683,
|
||||
"step": 66
|
||||
},
|
||||
{
|
||||
"epoch": 1.8275862068965516,
|
||||
"grad_norm": 1.1553919315338135,
|
||||
"learning_rate": 9.391304347826087e-05,
|
||||
"loss": 0.737,
|
||||
"step": 67
|
||||
},
|
||||
{
|
||||
"epoch": 1.8551724137931034,
|
||||
"grad_norm": 1.3099501132965088,
|
||||
"learning_rate": 9.217391304347827e-05,
|
||||
"loss": 0.4993,
|
||||
"step": 68
|
||||
},
|
||||
{
|
||||
"epoch": 1.8827586206896552,
|
||||
"grad_norm": 1.3969764709472656,
|
||||
"learning_rate": 9.043478260869566e-05,
|
||||
"loss": 0.6857,
|
||||
"step": 69
|
||||
},
|
||||
{
|
||||
"epoch": 1.910344827586207,
|
||||
"grad_norm": 1.2558094263076782,
|
||||
"learning_rate": 8.869565217391305e-05,
|
||||
"loss": 0.5347,
|
||||
"step": 70
|
||||
},
|
||||
{
|
||||
"epoch": 1.9379310344827587,
|
||||
"grad_norm": 1.2341969013214111,
|
||||
"learning_rate": 8.695652173913044e-05,
|
||||
"loss": 0.6651,
|
||||
"step": 71
|
||||
},
|
||||
{
|
||||
"epoch": 1.9655172413793105,
|
||||
"grad_norm": 1.2917416095733643,
|
||||
"learning_rate": 8.521739130434783e-05,
|
||||
"loss": 0.6177,
|
||||
"step": 72
|
||||
},
|
||||
{
|
||||
"epoch": 1.993103448275862,
|
||||
"grad_norm": 1.2867687940597534,
|
||||
"learning_rate": 8.347826086956521e-05,
|
||||
"loss": 0.6304,
|
||||
"step": 73
|
||||
},
|
||||
{
|
||||
"epoch": 2.0,
|
||||
"grad_norm": 2.8276941776275635,
|
||||
"learning_rate": 8.173913043478262e-05,
|
||||
"loss": 0.829,
|
||||
"step": 74
|
||||
},
|
||||
{
|
||||
"epoch": 2.027586206896552,
|
||||
"grad_norm": 1.1930606365203857,
|
||||
"learning_rate": 8e-05,
|
||||
"loss": 0.3571,
|
||||
"step": 75
|
||||
},
|
||||
{
|
||||
"epoch": 2.0551724137931036,
|
||||
"grad_norm": 1.364702820777893,
|
||||
"learning_rate": 7.82608695652174e-05,
|
||||
"loss": 0.4993,
|
||||
"step": 76
|
||||
},
|
||||
{
|
||||
"epoch": 2.0827586206896553,
|
||||
"grad_norm": 1.2684059143066406,
|
||||
"learning_rate": 7.652173913043479e-05,
|
||||
"loss": 0.4305,
|
||||
"step": 77
|
||||
},
|
||||
{
|
||||
"epoch": 2.110344827586207,
|
||||
"grad_norm": 1.1678532361984253,
|
||||
"learning_rate": 7.478260869565218e-05,
|
||||
"loss": 0.4332,
|
||||
"step": 78
|
||||
},
|
||||
{
|
||||
"epoch": 2.1379310344827585,
|
||||
"grad_norm": 1.3142938613891602,
|
||||
"learning_rate": 7.304347826086957e-05,
|
||||
"loss": 0.4618,
|
||||
"step": 79
|
||||
},
|
||||
{
|
||||
"epoch": 2.1655172413793102,
|
||||
"grad_norm": 1.359118938446045,
|
||||
"learning_rate": 7.130434782608696e-05,
|
||||
"loss": 0.5795,
|
||||
"step": 80
|
||||
},
|
||||
{
|
||||
"epoch": 2.193103448275862,
|
||||
"grad_norm": 1.6325678825378418,
|
||||
"learning_rate": 6.956521739130436e-05,
|
||||
"loss": 0.4021,
|
||||
"step": 81
|
||||
},
|
||||
{
|
||||
"epoch": 2.220689655172414,
|
||||
"grad_norm": 1.2178771495819092,
|
||||
"learning_rate": 6.782608695652173e-05,
|
||||
"loss": 0.4262,
|
||||
"step": 82
|
||||
},
|
||||
{
|
||||
"epoch": 2.2482758620689656,
|
||||
"grad_norm": 1.0997027158737183,
|
||||
"learning_rate": 6.608695652173912e-05,
|
||||
"loss": 0.3029,
|
||||
"step": 83
|
||||
},
|
||||
{
|
||||
"epoch": 2.2758620689655173,
|
||||
"grad_norm": 1.1487294435501099,
|
||||
"learning_rate": 6.434782608695652e-05,
|
||||
"loss": 0.3831,
|
||||
"step": 84
|
||||
},
|
||||
{
|
||||
"epoch": 2.303448275862069,
|
||||
"grad_norm": 1.3247836828231812,
|
||||
"learning_rate": 6.260869565217392e-05,
|
||||
"loss": 0.4692,
|
||||
"step": 85
|
||||
},
|
||||
{
|
||||
"epoch": 2.3310344827586205,
|
||||
"grad_norm": 1.1617599725723267,
|
||||
"learning_rate": 6.086956521739131e-05,
|
||||
"loss": 0.3617,
|
||||
"step": 86
|
||||
},
|
||||
{
|
||||
"epoch": 2.3586206896551722,
|
||||
"grad_norm": 1.2517313957214355,
|
||||
"learning_rate": 5.9130434782608704e-05,
|
||||
"loss": 0.2729,
|
||||
"step": 87
|
||||
},
|
||||
{
|
||||
"epoch": 2.386206896551724,
|
||||
"grad_norm": 1.1272892951965332,
|
||||
"learning_rate": 5.739130434782609e-05,
|
||||
"loss": 0.3707,
|
||||
"step": 88
|
||||
},
|
||||
{
|
||||
"epoch": 2.413793103448276,
|
||||
"grad_norm": 1.196664571762085,
|
||||
"learning_rate": 5.565217391304348e-05,
|
||||
"loss": 0.4032,
|
||||
"step": 89
|
||||
},
|
||||
{
|
||||
"epoch": 2.4413793103448276,
|
||||
"grad_norm": 1.4257408380508423,
|
||||
"learning_rate": 5.391304347826087e-05,
|
||||
"loss": 0.3194,
|
||||
"step": 90
|
||||
},
|
||||
{
|
||||
"epoch": 2.4689655172413794,
|
||||
"grad_norm": 1.798063039779663,
|
||||
"learning_rate": 5.217391304347826e-05,
|
||||
"loss": 0.5721,
|
||||
"step": 91
|
||||
},
|
||||
{
|
||||
"epoch": 2.496551724137931,
|
||||
"grad_norm": 1.449183702468872,
|
||||
"learning_rate": 5.0434782608695655e-05,
|
||||
"loss": 0.3841,
|
||||
"step": 92
|
||||
},
|
||||
{
|
||||
"epoch": 2.524137931034483,
|
||||
"grad_norm": 1.4660217761993408,
|
||||
"learning_rate": 4.8695652173913046e-05,
|
||||
"loss": 0.4134,
|
||||
"step": 93
|
||||
},
|
||||
{
|
||||
"epoch": 2.5517241379310347,
|
||||
"grad_norm": 1.3259236812591553,
|
||||
"learning_rate": 4.695652173913044e-05,
|
||||
"loss": 0.5053,
|
||||
"step": 94
|
||||
},
|
||||
{
|
||||
"epoch": 2.5793103448275865,
|
||||
"grad_norm": 1.1987637281417847,
|
||||
"learning_rate": 4.521739130434783e-05,
|
||||
"loss": 0.3203,
|
||||
"step": 95
|
||||
},
|
||||
{
|
||||
"epoch": 2.606896551724138,
|
||||
"grad_norm": 1.702609896659851,
|
||||
"learning_rate": 4.347826086956522e-05,
|
||||
"loss": 0.5448,
|
||||
"step": 96
|
||||
},
|
||||
{
|
||||
"epoch": 2.6344827586206896,
|
||||
"grad_norm": 1.2012200355529785,
|
||||
"learning_rate": 4.1739130434782605e-05,
|
||||
"loss": 0.4839,
|
||||
"step": 97
|
||||
},
|
||||
{
|
||||
"epoch": 2.6620689655172414,
|
||||
"grad_norm": 1.1377613544464111,
|
||||
"learning_rate": 4e-05,
|
||||
"loss": 0.3131,
|
||||
"step": 98
|
||||
},
|
||||
{
|
||||
"epoch": 2.689655172413793,
|
||||
"grad_norm": 1.377774953842163,
|
||||
"learning_rate": 3.8260869565217395e-05,
|
||||
"loss": 0.3463,
|
||||
"step": 99
|
||||
},
|
||||
{
|
||||
"epoch": 2.717241379310345,
|
||||
"grad_norm": 1.1738471984863281,
|
||||
"learning_rate": 3.6521739130434786e-05,
|
||||
"loss": 0.2963,
|
||||
"step": 100
|
||||
},
|
||||
{
|
||||
"epoch": 2.7448275862068967,
|
||||
"grad_norm": 1.1475613117218018,
|
||||
"learning_rate": 3.478260869565218e-05,
|
||||
"loss": 0.2953,
|
||||
"step": 101
|
||||
},
|
||||
{
|
||||
"epoch": 2.772413793103448,
|
||||
"grad_norm": 1.5838022232055664,
|
||||
"learning_rate": 3.304347826086956e-05,
|
||||
"loss": 0.3852,
|
||||
"step": 102
|
||||
},
|
||||
{
|
||||
"epoch": 2.8,
|
||||
"grad_norm": 1.6446831226348877,
|
||||
"learning_rate": 3.130434782608696e-05,
|
||||
"loss": 0.5384,
|
||||
"step": 103
|
||||
},
|
||||
{
|
||||
"epoch": 2.8275862068965516,
|
||||
"grad_norm": 1.4402813911437988,
|
||||
"learning_rate": 2.9565217391304352e-05,
|
||||
"loss": 0.3184,
|
||||
"step": 104
|
||||
},
|
||||
{
|
||||
"epoch": 2.8551724137931034,
|
||||
"grad_norm": 1.3366456031799316,
|
||||
"learning_rate": 2.782608695652174e-05,
|
||||
"loss": 0.4545,
|
||||
"step": 105
|
||||
},
|
||||
{
|
||||
"epoch": 2.882758620689655,
|
||||
"grad_norm": 1.4988086223602295,
|
||||
"learning_rate": 2.608695652173913e-05,
|
||||
"loss": 0.2341,
|
||||
"step": 106
|
||||
},
|
||||
{
|
||||
"epoch": 2.910344827586207,
|
||||
"grad_norm": 1.35313880443573,
|
||||
"learning_rate": 2.4347826086956523e-05,
|
||||
"loss": 0.3555,
|
||||
"step": 107
|
||||
},
|
||||
{
|
||||
"epoch": 2.9379310344827587,
|
||||
"grad_norm": 1.1439647674560547,
|
||||
"learning_rate": 2.2608695652173914e-05,
|
||||
"loss": 0.299,
|
||||
"step": 108
|
||||
},
|
||||
{
|
||||
"epoch": 2.9655172413793105,
|
||||
"grad_norm": 1.2948118448257446,
|
||||
"learning_rate": 2.0869565217391303e-05,
|
||||
"loss": 0.4421,
|
||||
"step": 109
|
||||
},
|
||||
{
|
||||
"epoch": 2.9931034482758623,
|
||||
"grad_norm": 1.283116340637207,
|
||||
"learning_rate": 1.9130434782608697e-05,
|
||||
"loss": 0.368,
|
||||
"step": 110
|
||||
},
|
||||
{
|
||||
"epoch": 3.0,
|
||||
"grad_norm": 3.516788959503174,
|
||||
"learning_rate": 1.739130434782609e-05,
|
||||
"loss": 0.3088,
|
||||
"step": 111
|
||||
},
|
||||
{
|
||||
"epoch": 3.027586206896552,
|
||||
"grad_norm": 1.1082452535629272,
|
||||
"learning_rate": 1.565217391304348e-05,
|
||||
"loss": 0.2828,
|
||||
"step": 112
|
||||
},
|
||||
{
|
||||
"epoch": 3.0551724137931036,
|
||||
"grad_norm": 1.109584927558899,
|
||||
"learning_rate": 1.391304347826087e-05,
|
||||
"loss": 0.2485,
|
||||
"step": 113
|
||||
},
|
||||
{
|
||||
"epoch": 3.0827586206896553,
|
||||
"grad_norm": 1.0583851337432861,
|
||||
"learning_rate": 1.2173913043478261e-05,
|
||||
"loss": 0.2712,
|
||||
"step": 114
|
||||
},
|
||||
{
|
||||
"epoch": 3.110344827586207,
|
||||
"grad_norm": 1.1029939651489258,
|
||||
"learning_rate": 1.0434782608695651e-05,
|
||||
"loss": 0.3537,
|
||||
"step": 115
|
||||
},
|
||||
{
|
||||
"epoch": 3.1379310344827585,
|
||||
"grad_norm": 1.0736896991729736,
|
||||
"learning_rate": 8.695652173913044e-06,
|
||||
"loss": 0.2121,
|
||||
"step": 116
|
||||
},
|
||||
{
|
||||
"epoch": 3.1655172413793102,
|
||||
"grad_norm": 1.1432276964187622,
|
||||
"learning_rate": 6.956521739130435e-06,
|
||||
"loss": 0.2371,
|
||||
"step": 117
|
||||
},
|
||||
{
|
||||
"epoch": 3.193103448275862,
|
||||
"grad_norm": 1.4690179824829102,
|
||||
"learning_rate": 5.217391304347826e-06,
|
||||
"loss": 0.3357,
|
||||
"step": 118
|
||||
},
|
||||
{
|
||||
"epoch": 3.220689655172414,
|
||||
"grad_norm": 1.240258812904358,
|
||||
"learning_rate": 3.4782608695652175e-06,
|
||||
"loss": 0.2727,
|
||||
"step": 119
|
||||
},
|
||||
{
|
||||
"epoch": 3.2482758620689656,
|
||||
"grad_norm": 1.2821192741394043,
|
||||
"learning_rate": 1.7391304347826088e-06,
|
||||
"loss": 0.3032,
|
||||
"step": 120
|
||||
}
  ],
  "logging_steps": 1,
  "max_steps": 120,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 4,
  "save_steps": 500,
  "stateful_callbacks": {
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": true
      },
      "attributes": {}
    }
  },
  "total_flos": 2.068699077260083e+16,
  "train_batch_size": 2,
  "trial_name": null,
  "trial_params": null
}
202
outputs/checkpoint-60/README.md
Normal file
@@ -0,0 +1,202 @@
---
base_model: unsloth/meta-llama-3.1-8b-unsloth-bnb-4bit
library_name: peft
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->



## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->



- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]


#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary



## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]
### Framework versions

- PEFT 0.15.2
39
outputs/checkpoint-60/adapter_config.json
Normal file
@@ -0,0 +1,39 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "unsloth/meta-llama-3.1-8b-unsloth-bnb-4bit",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 16,
  "lora_bias": false,
  "lora_dropout": 0,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "o_proj",
    "q_proj",
    "gate_proj",
    "down_proj",
    "up_proj",
    "k_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_rslora": false
}
23
outputs/checkpoint-60/special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<|begin_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|finetune_right_pad_id|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
1251004
outputs/checkpoint-60/tokenizer.json
Normal file
File diff suppressed because it is too large
2066
outputs/checkpoint-60/tokenizer_config.json
Normal file
File diff suppressed because it is too large
454
outputs/checkpoint-60/trainer_state.json
Normal file
@@ -0,0 +1,454 @@
{
  "best_global_step": null,
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 1.6713286713286712,
  "eval_steps": 500,
  "global_step": 60,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.027972027972027972,
      "grad_norm": 1.1463329792022705,
      "learning_rate": 0.0,
      "loss": 1.4132,
      "step": 1
    },
    {
      "epoch": 0.055944055944055944,
      "grad_norm": 0.8597254753112793,
      "learning_rate": 4e-05,
      "loss": 0.5978,
      "step": 2
    },
    {
      "epoch": 0.08391608391608392,
      "grad_norm": 1.196772575378418,
      "learning_rate": 8e-05,
      "loss": 1.3866,
      "step": 3
    },
    {
      "epoch": 0.11188811188811189,
      "grad_norm": 1.092492699623108,
      "learning_rate": 0.00012,
      "loss": 1.5973,
      "step": 4
    },
    {
      "epoch": 0.13986013986013987,
      "grad_norm": 1.1433535814285278,
      "learning_rate": 0.00016,
      "loss": 1.3213,
      "step": 5
    },
    {
      "epoch": 0.16783216783216784,
      "grad_norm": 1.0283416509628296,
      "learning_rate": 0.0002,
      "loss": 1.0864,
      "step": 6
    },
    {
      "epoch": 0.1958041958041958,
      "grad_norm": 1.0165659189224243,
      "learning_rate": 0.00019636363636363636,
      "loss": 0.7085,
      "step": 7
    },
    {
      "epoch": 0.22377622377622378,
      "grad_norm": 1.665509581565857,
      "learning_rate": 0.00019272727272727274,
      "loss": 0.9285,
      "step": 8
    },
    {
      "epoch": 0.2517482517482518,
      "grad_norm": 1.216216802597046,
      "learning_rate": 0.0001890909090909091,
      "loss": 1.1977,
      "step": 9
    },
    {
      "epoch": 0.27972027972027974,
      "grad_norm": 1.0458927154541016,
      "learning_rate": 0.00018545454545454545,
      "loss": 0.6523,
      "step": 10
    },
    {
      "epoch": 0.3076923076923077,
      "grad_norm": 0.8290866613388062,
      "learning_rate": 0.00018181818181818183,
      "loss": 0.5238,
      "step": 11
    },
    {
      "epoch": 0.3356643356643357,
      "grad_norm": 1.3056946992874146,
      "learning_rate": 0.0001781818181818182,
      "loss": 0.9392,
      "step": 12
    },
    {
      "epoch": 0.36363636363636365,
      "grad_norm": 0.89715576171875,
      "learning_rate": 0.00017454545454545454,
      "loss": 0.6422,
      "step": 13
    },
    {
      "epoch": 0.3916083916083916,
      "grad_norm": 0.9536185264587402,
      "learning_rate": 0.0001709090909090909,
      "loss": 0.5259,
      "step": 14
    },
    {
      "epoch": 0.4195804195804196,
      "grad_norm": 2.0107877254486084,
      "learning_rate": 0.00016727272727272728,
      "loss": 1.6038,
      "step": 15
    },
    {
      "epoch": 0.44755244755244755,
      "grad_norm": 1.396262288093567,
      "learning_rate": 0.00016363636363636366,
      "loss": 0.9653,
      "step": 16
    },
    {
      "epoch": 0.4755244755244755,
      "grad_norm": 1.4076898097991943,
      "learning_rate": 0.00016,
      "loss": 1.2386,
      "step": 17
    },
    {
      "epoch": 0.5034965034965035,
      "grad_norm": 1.2920570373535156,
      "learning_rate": 0.00015636363636363637,
      "loss": 1.3082,
      "step": 18
    },
    {
      "epoch": 0.5314685314685315,
      "grad_norm": 0.8619877696037292,
      "learning_rate": 0.00015272727272727275,
      "loss": 0.4883,
      "step": 19
    },
    {
      "epoch": 0.5594405594405595,
      "grad_norm": 1.2779756784439087,
      "learning_rate": 0.0001490909090909091,
      "loss": 0.6035,
      "step": 20
    },
    {
      "epoch": 0.5874125874125874,
      "grad_norm": 1.474133014678955,
      "learning_rate": 0.00014545454545454546,
      "loss": 0.8825,
      "step": 21
    },
    {
      "epoch": 0.6153846153846154,
      "grad_norm": 1.3216720819473267,
      "learning_rate": 0.00014181818181818184,
      "loss": 0.8638,
      "step": 22
    },
    {
      "epoch": 0.6433566433566433,
      "grad_norm": 1.0999218225479126,
      "learning_rate": 0.0001381818181818182,
      "loss": 0.6484,
      "step": 23
    },
    {
      "epoch": 0.6713286713286714,
      "grad_norm": 1.1263219118118286,
      "learning_rate": 0.00013454545454545455,
      "loss": 0.5965,
      "step": 24
    },
    {
      "epoch": 0.6993006993006993,
      "grad_norm": 1.020799994468689,
      "learning_rate": 0.00013090909090909093,
      "loss": 0.5809,
      "step": 25
    },
    {
      "epoch": 0.7272727272727273,
      "grad_norm": 1.032562494277954,
      "learning_rate": 0.00012727272727272728,
      "loss": 0.6899,
      "step": 26
    },
    {
      "epoch": 0.7552447552447552,
      "grad_norm": 1.8015700578689575,
      "learning_rate": 0.00012363636363636364,
      "loss": 1.351,
      "step": 27
    },
    {
      "epoch": 0.7832167832167832,
      "grad_norm": 1.6515522003173828,
      "learning_rate": 0.00012,
      "loss": 1.3753,
      "step": 28
    },
    {
      "epoch": 0.8111888111888111,
      "grad_norm": 1.4862653017044067,
      "learning_rate": 0.00011636363636363636,
      "loss": 1.045,
      "step": 29
    },
    {
      "epoch": 0.8391608391608392,
      "grad_norm": 1.2828856706619263,
      "learning_rate": 0.00011272727272727272,
      "loss": 1.1004,
      "step": 30
    },
    {
      "epoch": 0.8671328671328671,
      "grad_norm": 0.9894140362739563,
      "learning_rate": 0.00010909090909090909,
      "loss": 0.5355,
      "step": 31
    },
    {
      "epoch": 0.8951048951048951,
      "grad_norm": 1.5945513248443604,
      "learning_rate": 0.00010545454545454545,
      "loss": 1.1733,
      "step": 32
    },
    {
      "epoch": 0.9230769230769231,
      "grad_norm": 1.453596830368042,
      "learning_rate": 0.00010181818181818181,
      "loss": 1.1949,
      "step": 33
    },
    {
      "epoch": 0.951048951048951,
      "grad_norm": 1.5049810409545898,
      "learning_rate": 9.818181818181818e-05,
      "loss": 1.0341,
      "step": 34
    },
    {
      "epoch": 0.9790209790209791,
      "grad_norm": 1.3859373331069946,
      "learning_rate": 9.454545454545455e-05,
      "loss": 0.9874,
      "step": 35
    },
    {
      "epoch": 1.0,
      "grad_norm": 1.5079317092895508,
      "learning_rate": 9.090909090909092e-05,
      "loss": 0.9925,
      "step": 36
    },
    {
      "epoch": 1.027972027972028,
      "grad_norm": 1.2381432056427002,
      "learning_rate": 8.727272727272727e-05,
      "loss": 0.9168,
      "step": 37
    },
    {
      "epoch": 1.055944055944056,
      "grad_norm": 1.0585517883300781,
      "learning_rate": 8.363636363636364e-05,
      "loss": 0.9168,
      "step": 38
    },
    {
      "epoch": 1.083916083916084,
      "grad_norm": 1.246953010559082,
      "learning_rate": 8e-05,
      "loss": 0.9327,
      "step": 39
    },
    {
      "epoch": 1.1118881118881119,
      "grad_norm": 1.295661211013794,
      "learning_rate": 7.636363636363637e-05,
      "loss": 0.8212,
      "step": 40
    },
    {
      "epoch": 1.1398601398601398,
      "grad_norm": 1.1516053676605225,
      "learning_rate": 7.272727272727273e-05,
      "loss": 0.5509,
      "step": 41
    },
    {
      "epoch": 1.167832167832168,
      "grad_norm": 0.874414324760437,
      "learning_rate": 6.90909090909091e-05,
      "loss": 0.3707,
      "step": 42
    },
    {
      "epoch": 1.1958041958041958,
      "grad_norm": 1.9163153171539307,
      "learning_rate": 6.545454545454546e-05,
      "loss": 1.2245,
      "step": 43
    },
    {
      "epoch": 1.2237762237762237,
      "grad_norm": 1.3832831382751465,
      "learning_rate": 6.181818181818182e-05,
      "loss": 0.8484,
      "step": 44
    },
    {
      "epoch": 1.2517482517482517,
      "grad_norm": 1.5212609767913818,
      "learning_rate": 5.818181818181818e-05,
      "loss": 0.563,
      "step": 45
    },
    {
      "epoch": 1.2797202797202798,
      "grad_norm": 1.087664008140564,
      "learning_rate": 5.4545454545454546e-05,
      "loss": 0.399,
      "step": 46
    },
    {
      "epoch": 1.3076923076923077,
      "grad_norm": 1.8231722116470337,
      "learning_rate": 5.090909090909091e-05,
      "loss": 0.9092,
      "step": 47
    },
    {
      "epoch": 1.3356643356643356,
      "grad_norm": 1.591951608657837,
      "learning_rate": 4.7272727272727275e-05,
      "loss": 0.9348,
      "step": 48
    },
    {
      "epoch": 1.3636363636363638,
      "grad_norm": 1.0458203554153442,
      "learning_rate": 4.3636363636363636e-05,
      "loss": 0.3926,
      "step": 49
    },
    {
      "epoch": 1.3916083916083917,
      "grad_norm": 1.0491923093795776,
      "learning_rate": 4e-05,
      "loss": 0.3799,
      "step": 50
    },
    {
      "epoch": 1.4195804195804196,
      "grad_norm": 1.5752729177474976,
      "learning_rate": 3.6363636363636364e-05,
      "loss": 0.6908,
      "step": 51
    },
    {
      "epoch": 1.4475524475524475,
      "grad_norm": 1.6831164360046387,
      "learning_rate": 3.272727272727273e-05,
      "loss": 0.7934,
      "step": 52
    },
    {
      "epoch": 1.4755244755244754,
      "grad_norm": 1.3585453033447266,
      "learning_rate": 2.909090909090909e-05,
      "loss": 0.3979,
      "step": 53
    },
    {
      "epoch": 1.5034965034965035,
      "grad_norm": 1.3879740238189697,
      "learning_rate": 2.5454545454545454e-05,
      "loss": 0.962,
      "step": 54
    },
    {
      "epoch": 1.5314685314685315,
      "grad_norm": 1.542452096939087,
      "learning_rate": 2.1818181818181818e-05,
      "loss": 0.6852,
      "step": 55
    },
    {
      "epoch": 1.5594405594405596,
      "grad_norm": 1.3172391653060913,
      "learning_rate": 1.8181818181818182e-05,
      "loss": 1.1097,
      "step": 56
    },
    {
      "epoch": 1.5874125874125875,
      "grad_norm": 1.2537016868591309,
      "learning_rate": 1.4545454545454545e-05,
      "loss": 0.6174,
      "step": 57
    },
    {
      "epoch": 1.6153846153846154,
      "grad_norm": 1.3360239267349243,
      "learning_rate": 1.0909090909090909e-05,
      "loss": 0.8211,
      "step": 58
    },
    {
      "epoch": 1.6433566433566433,
      "grad_norm": 1.0257346630096436,
      "learning_rate": 7.272727272727272e-06,
      "loss": 0.374,
      "step": 59
    },
    {
      "epoch": 1.6713286713286712,
      "grad_norm": 1.2555755376815796,
      "learning_rate": 3.636363636363636e-06,
      "loss": 1.0604,
      "step": 60
    }
  ],
  "logging_steps": 1,
  "max_steps": 60,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 2,
  "save_steps": 500,
  "stateful_callbacks": {
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": true
      },
      "attributes": {}
    }
  },
  "total_flos": 4818101472165888.0,
  "train_batch_size": 2,
  "trial_name": null,
  "trial_params": null
}
202
outputs/checkpoint-90/README.md
Normal file
@@ -0,0 +1,202 @@
---
base_model: unsloth/meta-llama-3.1-8b-unsloth-bnb-4bit
library_name: peft
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->



## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->



- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]


#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary



## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]
### Framework versions

- PEFT 0.15.2
39
outputs/checkpoint-90/adapter_config.json
Normal file
@@ -0,0 +1,39 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "unsloth/meta-llama-3.1-8b-unsloth-bnb-4bit",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 16,
  "lora_bias": false,
  "lora_dropout": 0,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "q_proj",
    "o_proj",
    "gate_proj",
    "up_proj",
    "k_proj",
    "v_proj",
    "down_proj"
  ],
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_rslora": false
}
23
outputs/checkpoint-90/special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<|begin_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|finetune_right_pad_id|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
1251009
outputs/checkpoint-90/tokenizer.json
Normal file
File diff suppressed because it is too large
2066
outputs/checkpoint-90/tokenizer_config.json
Normal file
File diff suppressed because it is too large
664
outputs/checkpoint-90/trainer_state.json
Normal file
@@ -0,0 +1,664 @@
{
  "best_global_step": null,
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 3.606060606060606,
  "eval_steps": 500,
  "global_step": 90,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.04040404040404041,
      "grad_norm": 0.7893974184989929,
      "learning_rate": 0.0,
      "loss": 0.1676,
      "step": 1
    },
    {
      "epoch": 0.08080808080808081,
      "grad_norm": 1.0626084804534912,
      "learning_rate": 4e-05,
      "loss": 0.2393,
      "step": 2
    },
    {
      "epoch": 0.12121212121212122,
      "grad_norm": 1.0501738786697388,
      "learning_rate": 8e-05,
      "loss": 0.2211,
      "step": 3
    },
    {
      "epoch": 0.16161616161616163,
      "grad_norm": 0.21509379148483276,
      "learning_rate": 0.00012,
      "loss": 0.066,
      "step": 4
    },
    {
      "epoch": 0.20202020202020202,
      "grad_norm": 0.17153297364711761,
      "learning_rate": 0.00016,
      "loss": 0.0617,
      "step": 5
    },
    {
      "epoch": 0.24242424242424243,
      "grad_norm": 0.7136198878288269,
      "learning_rate": 0.0002,
      "loss": 0.1641,
      "step": 6
    },
    {
      "epoch": 0.2828282828282828,
      "grad_norm": 0.3403281569480896,
      "learning_rate": 0.00019764705882352942,
      "loss": 0.0778,
      "step": 7
    },
    {
      "epoch": 0.32323232323232326,
      "grad_norm": 0.589501142501831,
      "learning_rate": 0.00019529411764705883,
      "loss": 0.1492,
      "step": 8
    },
    {
      "epoch": 0.36363636363636365,
      "grad_norm": 0.42319923639297485,
      "learning_rate": 0.00019294117647058825,
      "loss": 0.1345,
      "step": 9
    },
    {
      "epoch": 0.40404040404040403,
      "grad_norm": 0.40605250000953674,
      "learning_rate": 0.00019058823529411766,
      "loss": 0.1056,
      "step": 10
    },
    {
      "epoch": 0.4444444444444444,
      "grad_norm": 0.5076654553413391,
      "learning_rate": 0.00018823529411764707,
      "loss": 0.1595,
      "step": 11
    },
    {
      "epoch": 0.48484848484848486,
      "grad_norm": 0.37221914529800415,
      "learning_rate": 0.00018588235294117648,
      "loss": 0.1089,
      "step": 12
    },
    {
      "epoch": 0.5252525252525253,
      "grad_norm": 0.49597060680389404,
      "learning_rate": 0.0001835294117647059,
      "loss": 0.0955,
      "step": 13
    },
    {
      "epoch": 0.5656565656565656,
      "grad_norm": 0.3603276312351227,
      "learning_rate": 0.0001811764705882353,
      "loss": 0.0874,
      "step": 14
    },
    {
      "epoch": 0.6060606060606061,
      "grad_norm": 0.3946147561073303,
      "learning_rate": 0.00017882352941176472,
      "loss": 0.0834,
      "step": 15
    },
    {
      "epoch": 0.6464646464646465,
      "grad_norm": 0.31046634912490845,
      "learning_rate": 0.00017647058823529413,
      "loss": 0.0845,
      "step": 16
    },
    {
      "epoch": 0.6868686868686869,
      "grad_norm": 0.31403785943984985,
      "learning_rate": 0.00017411764705882354,
      "loss": 0.0942,
      "step": 17
    },
    {
      "epoch": 0.7272727272727273,
      "grad_norm": 0.32949572801589966,
      "learning_rate": 0.00017176470588235293,
      "loss": 0.0976,
      "step": 18
    },
    {
      "epoch": 0.7676767676767676,
      "grad_norm": 0.8442594408988953,
      "learning_rate": 0.00016941176470588237,
      "loss": 0.1502,
      "step": 19
    },
    {
      "epoch": 0.8080808080808081,
      "grad_norm": 0.8818703889846802,
      "learning_rate": 0.00016705882352941178,
      "loss": 0.1008,
      "step": 20
    },
    {
      "epoch": 0.8484848484848485,
      "grad_norm": 1.1039226055145264,
      "learning_rate": 0.0001647058823529412,
      "loss": 0.0908,
|
||||
"step": 21
|
||||
},
|
||||
{
|
||||
"epoch": 0.8888888888888888,
|
||||
"grad_norm": 0.616620659828186,
|
||||
"learning_rate": 0.0001623529411764706,
|
||||
"loss": 0.1022,
|
||||
"step": 22
|
||||
},
|
||||
{
|
||||
"epoch": 0.9292929292929293,
|
||||
"grad_norm": 0.43923619389533997,
|
||||
"learning_rate": 0.00016,
|
||||
"loss": 0.1095,
|
||||
"step": 23
|
||||
},
|
||||
{
|
||||
"epoch": 0.9696969696969697,
|
||||
"grad_norm": 0.668854296207428,
|
||||
"learning_rate": 0.00015764705882352943,
|
||||
"loss": 0.0994,
|
||||
"step": 24
|
||||
},
|
||||
{
|
||||
"epoch": 1.0,
|
||||
"grad_norm": 0.54339200258255,
|
||||
"learning_rate": 0.00015529411764705884,
|
||||
"loss": 0.0822,
|
||||
"step": 25
|
||||
},
|
||||
{
|
||||
"epoch": 1.0404040404040404,
|
||||
"grad_norm": 0.52640700340271,
|
||||
"learning_rate": 0.00015294117647058822,
|
||||
"loss": 0.091,
|
||||
"step": 26
|
||||
},
|
||||
{
|
||||
"epoch": 1.0808080808080809,
|
||||
"grad_norm": 0.243753120303154,
|
||||
"learning_rate": 0.00015058823529411766,
|
||||
"loss": 0.0753,
|
||||
"step": 27
|
||||
},
|
||||
{
|
||||
"epoch": 1.121212121212121,
|
||||
"grad_norm": 0.16135047376155853,
|
||||
"learning_rate": 0.00014823529411764707,
|
||||
"loss": 0.0818,
|
||||
"step": 28
|
||||
},
|
||||
{
|
||||
"epoch": 1.1616161616161615,
|
||||
"grad_norm": 0.9692177772521973,
|
||||
"learning_rate": 0.00014588235294117646,
|
||||
"loss": 0.1394,
|
||||
"step": 29
|
||||
},
|
||||
{
|
||||
"epoch": 1.202020202020202,
|
||||
"grad_norm": 0.48012155294418335,
|
||||
"learning_rate": 0.0001435294117647059,
|
||||
"loss": 0.095,
|
||||
"step": 30
|
||||
},
|
||||
{
|
||||
"epoch": 1.2424242424242424,
|
||||
"grad_norm": 0.3694566786289215,
|
||||
"learning_rate": 0.0001411764705882353,
|
||||
"loss": 0.0776,
|
||||
"step": 31
|
||||
},
|
||||
{
|
||||
"epoch": 1.2828282828282829,
|
||||
"grad_norm": 0.604898989200592,
|
||||
"learning_rate": 0.00013882352941176472,
|
||||
"loss": 0.0727,
|
||||
"step": 32
|
||||
},
|
||||
{
|
||||
"epoch": 1.3232323232323233,
|
||||
"grad_norm": 0.6668853163719177,
|
||||
"learning_rate": 0.00013647058823529413,
|
||||
"loss": 0.1211,
|
||||
"step": 33
|
||||
},
|
||||
{
|
||||
"epoch": 1.3636363636363638,
|
||||
"grad_norm": 0.8030984401702881,
|
||||
"learning_rate": 0.00013411764705882352,
|
||||
"loss": 0.0724,
|
||||
"step": 34
|
||||
},
|
||||
{
|
||||
"epoch": 1.404040404040404,
|
||||
"grad_norm": 0.5926573872566223,
|
||||
"learning_rate": 0.00013176470588235296,
|
||||
"loss": 0.0671,
|
||||
"step": 35
|
||||
},
|
||||
{
|
||||
"epoch": 1.4444444444444444,
|
||||
"grad_norm": 0.20058207213878632,
|
||||
"learning_rate": 0.00012941176470588237,
|
||||
"loss": 0.0686,
|
||||
"step": 36
|
||||
},
|
||||
{
|
||||
"epoch": 1.4848484848484849,
|
||||
"grad_norm": 0.30539166927337646,
|
||||
"learning_rate": 0.00012705882352941175,
|
||||
"loss": 0.0968,
|
||||
"step": 37
|
||||
},
|
||||
{
|
||||
"epoch": 1.5252525252525253,
|
||||
"grad_norm": 0.6506590247154236,
|
||||
"learning_rate": 0.0001247058823529412,
|
||||
"loss": 0.0972,
|
||||
"step": 38
|
||||
},
|
||||
{
|
||||
"epoch": 1.5656565656565657,
|
||||
"grad_norm": 0.647463858127594,
|
||||
"learning_rate": 0.0001223529411764706,
|
||||
"loss": 0.0786,
|
||||
"step": 39
|
||||
},
|
||||
{
|
||||
"epoch": 1.606060606060606,
|
||||
"grad_norm": 0.4133020043373108,
|
||||
"learning_rate": 0.00012,
|
||||
"loss": 0.0985,
|
||||
"step": 40
|
||||
},
|
||||
{
|
||||
"epoch": 1.6464646464646466,
|
||||
"grad_norm": 0.798978328704834,
|
||||
"learning_rate": 0.00011764705882352942,
|
||||
"loss": 0.0993,
|
||||
"step": 41
|
||||
},
|
||||
{
|
||||
"epoch": 1.6868686868686869,
|
||||
"grad_norm": 0.438997358083725,
|
||||
"learning_rate": 0.00011529411764705881,
|
||||
"loss": 0.1002,
|
||||
"step": 42
|
||||
},
|
||||
{
|
||||
"epoch": 1.7272727272727273,
|
||||
"grad_norm": 0.2584928870201111,
|
||||
"learning_rate": 0.00011294117647058824,
|
||||
"loss": 0.0851,
|
||||
"step": 43
|
||||
},
|
||||
{
|
||||
"epoch": 1.7676767676767677,
|
||||
"grad_norm": 0.259726345539093,
|
||||
"learning_rate": 0.00011058823529411766,
|
||||
"loss": 0.0859,
|
||||
"step": 44
|
||||
},
|
||||
{
|
||||
"epoch": 1.808080808080808,
|
||||
"grad_norm": 0.44141435623168945,
|
||||
"learning_rate": 0.00010823529411764706,
|
||||
"loss": 0.1094,
|
||||
"step": 45
|
||||
},
|
||||
{
|
||||
"epoch": 1.8484848484848486,
|
||||
"grad_norm": 0.5731039047241211,
|
||||
"learning_rate": 0.00010588235294117647,
|
||||
"loss": 0.1231,
|
||||
"step": 46
|
||||
},
|
||||
{
|
||||
"epoch": 1.8888888888888888,
|
||||
"grad_norm": 0.3471589684486389,
|
||||
"learning_rate": 0.0001035294117647059,
|
||||
"loss": 0.0773,
|
||||
"step": 47
|
||||
},
|
||||
{
|
||||
"epoch": 1.9292929292929293,
|
||||
"grad_norm": 0.2618795335292816,
|
||||
"learning_rate": 0.0001011764705882353,
|
||||
"loss": 0.0832,
|
||||
"step": 48
|
||||
},
|
||||
{
|
||||
"epoch": 1.9696969696969697,
|
||||
"grad_norm": 0.4264814257621765,
|
||||
"learning_rate": 9.882352941176471e-05,
|
||||
"loss": 0.0925,
|
||||
"step": 49
|
||||
},
|
||||
{
|
||||
"epoch": 2.0,
|
||||
"grad_norm": 0.5760068297386169,
|
||||
"learning_rate": 9.647058823529412e-05,
|
||||
"loss": 0.0778,
|
||||
"step": 50
|
||||
},
|
||||
{
|
||||
"epoch": 2.04040404040404,
|
||||
"grad_norm": 0.22954879701137543,
|
||||
"learning_rate": 9.411764705882353e-05,
|
||||
"loss": 0.0733,
|
||||
"step": 51
|
||||
},
|
||||
{
|
||||
"epoch": 2.080808080808081,
|
||||
"grad_norm": 0.21470747888088226,
|
||||
"learning_rate": 9.176470588235295e-05,
|
||||
"loss": 0.0716,
|
||||
"step": 52
|
||||
},
|
||||
{
|
||||
"epoch": 2.121212121212121,
|
||||
"grad_norm": 0.2303597778081894,
|
||||
"learning_rate": 8.941176470588236e-05,
|
||||
"loss": 0.0738,
|
||||
"step": 53
|
||||
},
|
||||
{
|
||||
"epoch": 2.1616161616161618,
|
||||
"grad_norm": 0.2480212152004242,
|
||||
"learning_rate": 8.705882352941177e-05,
|
||||
"loss": 0.0791,
|
||||
"step": 54
|
||||
},
|
||||
{
|
||||
"epoch": 2.202020202020202,
|
||||
"grad_norm": 0.1986403614282608,
|
||||
"learning_rate": 8.470588235294118e-05,
|
||||
"loss": 0.0673,
|
||||
"step": 55
|
||||
},
|
||||
{
|
||||
"epoch": 2.242424242424242,
|
||||
"grad_norm": 0.2764434218406677,
|
||||
"learning_rate": 8.23529411764706e-05,
|
||||
"loss": 0.0764,
|
||||
"step": 56
|
||||
},
|
||||
{
|
||||
"epoch": 2.282828282828283,
|
||||
"grad_norm": 0.45056474208831787,
|
||||
"learning_rate": 8e-05,
|
||||
"loss": 0.0683,
|
||||
"step": 57
|
||||
},
|
||||
{
|
||||
"epoch": 2.323232323232323,
|
||||
"grad_norm": 0.37713348865509033,
|
||||
"learning_rate": 7.764705882352942e-05,
|
||||
"loss": 0.0785,
|
||||
"step": 58
|
||||
},
|
||||
{
|
||||
"epoch": 2.3636363636363638,
|
||||
"grad_norm": 0.19750048220157623,
|
||||
"learning_rate": 7.529411764705883e-05,
|
||||
"loss": 0.0719,
|
||||
"step": 59
|
||||
},
|
||||
{
|
||||
"epoch": 2.404040404040404,
|
||||
"grad_norm": 0.23382727801799774,
|
||||
"learning_rate": 7.294117647058823e-05,
|
||||
"loss": 0.0769,
|
||||
"step": 60
|
||||
},
|
||||
{
|
||||
"epoch": 2.4444444444444446,
|
||||
"grad_norm": 0.43519431352615356,
|
||||
"learning_rate": 7.058823529411765e-05,
|
||||
"loss": 0.0953,
|
||||
"step": 61
|
||||
},
|
||||
{
|
||||
"epoch": 2.484848484848485,
|
||||
"grad_norm": 0.8023049831390381,
|
||||
"learning_rate": 6.823529411764707e-05,
|
||||
"loss": 0.0978,
|
||||
"step": 62
|
||||
},
|
||||
{
|
||||
"epoch": 2.525252525252525,
|
||||
"grad_norm": 0.5448880195617676,
|
||||
"learning_rate": 6.588235294117648e-05,
|
||||
"loss": 0.0786,
|
||||
"step": 63
|
||||
},
|
||||
{
|
||||
"epoch": 2.5656565656565657,
|
||||
"grad_norm": 0.5319021940231323,
|
||||
"learning_rate": 6.352941176470588e-05,
|
||||
"loss": 0.0837,
|
||||
"step": 64
|
||||
},
|
||||
{
|
||||
"epoch": 2.606060606060606,
|
||||
"grad_norm": 0.3056259751319885,
|
||||
"learning_rate": 6.11764705882353e-05,
|
||||
"loss": 0.0716,
|
||||
"step": 65
|
||||
},
|
||||
{
|
||||
"epoch": 2.6464646464646466,
|
||||
"grad_norm": 0.3007633686065674,
|
||||
"learning_rate": 5.882352941176471e-05,
|
||||
"loss": 0.0781,
|
||||
"step": 66
|
||||
},
|
||||
{
|
||||
"epoch": 2.686868686868687,
|
||||
"grad_norm": 0.517301619052887,
|
||||
"learning_rate": 5.647058823529412e-05,
|
||||
"loss": 0.0751,
|
||||
"step": 67
|
||||
},
|
||||
{
|
||||
"epoch": 2.7272727272727275,
|
||||
"grad_norm": 0.31967368721961975,
|
||||
"learning_rate": 5.411764705882353e-05,
|
||||
"loss": 0.0948,
|
||||
"step": 68
|
||||
},
|
||||
{
|
||||
"epoch": 2.7676767676767677,
|
||||
"grad_norm": 0.22360506653785706,
|
||||
"learning_rate": 5.176470588235295e-05,
|
||||
"loss": 0.0721,
|
||||
"step": 69
|
||||
},
|
||||
{
|
||||
"epoch": 2.808080808080808,
|
||||
"grad_norm": 0.8932453393936157,
|
||||
"learning_rate": 4.9411764705882355e-05,
|
||||
"loss": 0.083,
|
||||
"step": 70
|
||||
},
|
||||
{
|
||||
"epoch": 2.8484848484848486,
|
||||
"grad_norm": 0.17888718843460083,
|
||||
"learning_rate": 4.705882352941177e-05,
|
||||
"loss": 0.0759,
|
||||
"step": 71
|
||||
},
|
||||
{
|
||||
"epoch": 2.888888888888889,
|
||||
"grad_norm": 0.2312222719192505,
|
||||
"learning_rate": 4.470588235294118e-05,
|
||||
"loss": 0.0819,
|
||||
"step": 72
|
||||
},
|
||||
{
|
||||
"epoch": 2.929292929292929,
|
||||
"grad_norm": 0.3377898335456848,
|
||||
"learning_rate": 4.235294117647059e-05,
|
||||
"loss": 0.091,
|
||||
"step": 73
|
||||
},
|
||||
{
|
||||
"epoch": 2.9696969696969697,
|
||||
"grad_norm": 0.22434180974960327,
|
||||
"learning_rate": 4e-05,
|
||||
"loss": 0.0656,
|
||||
"step": 74
|
||||
},
|
||||
{
|
||||
"epoch": 3.0,
|
||||
"grad_norm": 0.4803672432899475,
|
||||
"learning_rate": 3.7647058823529415e-05,
|
||||
"loss": 0.0654,
|
||||
"step": 75
|
||||
},
|
||||
{
|
||||
"epoch": 3.04040404040404,
|
||||
"grad_norm": 0.18344801664352417,
|
||||
"learning_rate": 3.529411764705883e-05,
|
||||
"loss": 0.0681,
|
||||
"step": 76
|
||||
},
|
||||
{
|
||||
"epoch": 3.080808080808081,
|
||||
"grad_norm": 0.18728883564472198,
|
||||
"learning_rate": 3.294117647058824e-05,
|
||||
"loss": 0.0641,
|
||||
"step": 77
|
||||
},
|
||||
{
|
||||
"epoch": 3.121212121212121,
|
||||
"grad_norm": 0.509119987487793,
|
||||
"learning_rate": 3.058823529411765e-05,
|
||||
"loss": 0.0777,
|
||||
"step": 78
|
||||
},
|
||||
{
|
||||
"epoch": 3.1616161616161618,
|
||||
"grad_norm": 0.16499896347522736,
|
||||
"learning_rate": 2.823529411764706e-05,
|
||||
"loss": 0.0578,
|
||||
"step": 79
|
||||
},
|
||||
{
|
||||
"epoch": 3.202020202020202,
|
||||
"grad_norm": 0.17131227254867554,
|
||||
"learning_rate": 2.5882352941176475e-05,
|
||||
"loss": 0.0597,
|
||||
"step": 80
|
||||
},
|
||||
{
|
||||
"epoch": 3.242424242424242,
|
||||
"grad_norm": 0.17663079500198364,
|
||||
"learning_rate": 2.3529411764705884e-05,
|
||||
"loss": 0.0674,
|
||||
"step": 81
|
||||
},
|
||||
{
|
||||
"epoch": 3.282828282828283,
|
||||
"grad_norm": 0.18466462194919586,
|
||||
"learning_rate": 2.1176470588235296e-05,
|
||||
"loss": 0.062,
|
||||
"step": 82
|
||||
},
|
||||
{
|
||||
"epoch": 3.323232323232323,
|
||||
"grad_norm": 0.1754070371389389,
|
||||
"learning_rate": 1.8823529411764708e-05,
|
||||
"loss": 0.0703,
|
||||
"step": 83
|
||||
},
|
||||
{
|
||||
"epoch": 3.3636363636363638,
|
||||
"grad_norm": 0.16022254526615143,
|
||||
"learning_rate": 1.647058823529412e-05,
|
||||
"loss": 0.0651,
|
||||
"step": 84
|
||||
},
|
||||
{
|
||||
"epoch": 3.404040404040404,
|
||||
"grad_norm": 0.16330307722091675,
|
||||
"learning_rate": 1.411764705882353e-05,
|
||||
"loss": 0.0566,
|
||||
"step": 85
|
||||
},
|
||||
{
|
||||
"epoch": 3.4444444444444446,
|
||||
"grad_norm": 0.38002651929855347,
|
||||
"learning_rate": 1.1764705882352942e-05,
|
||||
"loss": 0.0726,
|
||||
"step": 86
|
||||
},
|
||||
{
|
||||
"epoch": 3.484848484848485,
|
||||
"grad_norm": 0.17870256304740906,
|
||||
"learning_rate": 9.411764705882354e-06,
|
||||
"loss": 0.0658,
|
||||
"step": 87
|
||||
},
|
||||
{
|
||||
"epoch": 3.525252525252525,
|
||||
"grad_norm": 0.19073323905467987,
|
||||
"learning_rate": 7.058823529411765e-06,
|
||||
"loss": 0.0645,
|
||||
"step": 88
|
||||
},
|
||||
{
|
||||
"epoch": 3.5656565656565657,
|
||||
"grad_norm": 0.1769099086523056,
|
||||
"learning_rate": 4.705882352941177e-06,
|
||||
"loss": 0.0615,
|
||||
"step": 89
|
||||
},
|
||||
{
|
||||
"epoch": 3.606060606060606,
|
||||
"grad_norm": 0.2047484815120697,
|
||||
"learning_rate": 2.3529411764705885e-06,
|
||||
"loss": 0.0713,
|
||||
"step": 90
|
||||
}
|
||||
],
|
||||
"logging_steps": 1,
|
||||
"max_steps": 90,
|
||||
"num_input_tokens_seen": 0,
|
||||
"num_train_epochs": 4,
|
||||
"save_steps": 500,
|
||||
"stateful_callbacks": {
|
||||
"TrainerControl": {
|
||||
"args": {
|
||||
"should_epoch_stop": false,
|
||||
"should_evaluate": false,
|
||||
"should_log": false,
|
||||
"should_save": true,
|
||||
"should_training_stop": true
|
||||
},
|
||||
"attributes": {}
|
||||
}
|
||||
},
|
||||
"total_flos": 3309241116770304.0,
|
||||
"train_batch_size": 2,
|
||||
"trial_name": null,
|
||||
"trial_params": null
|
||||
}
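
The `learning_rate` values logged above look consistent with a linear warmup followed by a linear decay to zero at `max_steps`. The sketch below reproduces them under assumptions inferred from the log rather than read from the training config: a peak rate of `2e-4`, five warmup steps, and a one-step offset between the logged step number and the schedule position (step 1 logs `0.0`). The function name `linear_warmup_decay` is hypothetical.

```python
def linear_warmup_decay(step, peak_lr=2e-4, warmup_steps=5, max_steps=90):
    """Return the learning rate expected in the log entry for 1-indexed `step`.

    peak_lr and warmup_steps are assumptions inferred from the logged values;
    max_steps matches the "max_steps" field of the state file.
    """
    completed = step - 1  # assumed offset: the entry for step 1 logs 0.0
    if completed < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * completed / warmup_steps
    # Linear decay from peak_lr down to 0 at max_steps.
    return peak_lr * (max_steps - completed) / (max_steps - warmup_steps)


# A few (step, learning_rate) pairs copied from log_history.
logged = {
    1: 0.0,
    2: 4e-05,
    6: 0.0002,
    7: 0.00019764705882352942,
    40: 0.00012,
    90: 2.3529411764705885e-06,
}

# Largest absolute gap between the sketch and the logged values.
max_error = max(abs(linear_warmup_decay(s) - lr) for s, lr in logged.items())
print(f"max abs error: {max_error:.3e}")
```

If the sketch disagrees with other entries, the assumed peak rate or warmup length is the first thing to revisit.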