
Hugging Face

Quick Summary

Hugging Face provides developers with a comprehensive suite of pre-trained NLP models through its transformers library. To recap, here is how you can run inference with Mistral's mistralai/Mistral-7B-v0.1 model using transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

prompt = "My favourite condiment is"

model_inputs = tokenizer([prompt], return_tensors="pt").to(device)
model.to(device)

generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
print(tokenizer.batch_decode(generated_ids)[0])
# "The expected output"

Evals During Fine-Tuning

deepeval integrates with Hugging Face's transformers.Trainer module through the DeepEvalHuggingFaceCallback, enabling evaluation of LLM outputs at the end of each training epoch during fine-tuning.

info

In this section, we'll walk through an example of fine-tuning Mistral's 7B model.

Prepare Dataset for Fine-tuning

from transformers import AutoTokenizer
from datasets import load_dataset

####################
### Load dataset ###
####################
training_dataset = load_dataset("text", data_files={"train": "train.txt"})

############################
### Initialize tokenizer ###
############################
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
# Mistral's tokenizer ships without a pad token, which padding="max_length"
# requires, so reuse the EOS token for padding.
tokenizer.pad_token = tokenizer.eos_token

########################
### Tokenize dataset ###
########################
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_dataset = training_dataset.map(tokenize_function, batched=True)

Setup Training Arguments

from transformers import TrainingArguments
...

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=5,
    per_device_train_batch_size=4,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
)

Initialize LLM and Trainer for Fine-Tuning

from transformers import AutoModelForCausalLM, DataCollatorForLanguageModeling, Trainer
...

######################
### Initialize LLM ###
######################
llm = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")


##########################
### Initialize Trainer ###
##########################
# DataCollatorForLanguageModeling with mlm=False copies input_ids into
# labels so the Trainer can compute a causal language modeling loss.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=llm,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    data_collator=data_collator,
)

Define Evaluation Criteria

Use deepeval to define an EvaluationDataset and the metrics you want to evaluate your LLM on:

from deepeval.test_case import LLMTestCaseParams
from deepeval.dataset import EvaluationDataset, Golden
from deepeval.metrics import GEval

first_golden = Golden(input="...")
second_golden = Golden(input="...")

dataset = EvaluationDataset(goldens=[first_golden, second_golden])
coherence_metric = GEval(
    name="Coherence",
    criteria="Coherence - determine if the actual output is coherent with the input.",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)
info

We initialize an EvaluationDataset with goldens instead of test cases since we're running inference at evaluation time.
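For comparison, here is a minimal sketch of the difference (the LLMTestCase below is for illustration only and is not part of this walkthrough):

from deepeval.test_case import LLMTestCase

# A test case must include the model's output at creation time:
test_case = LLMTestCase(input="...", actual_output="...")

# A golden only needs the input; the callback generates the
# actual output with your model during evaluation.
golden = Golden(input="...")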

Fine-tune and Evaluate

Then, create a DeepEvalHuggingFaceCallback:

from deepeval.integrations.hugging_face import DeepEvalHuggingFaceCallback
...

deepeval_hugging_face_callback = DeepEvalHuggingFaceCallback(
    evaluation_dataset=dataset,
    metrics=[coherence_metric],
    trainer=trainer,
)

The DeepEvalHuggingFaceCallback accepts the following arguments:

  • metrics: the deepeval evaluation metrics you wish to leverage.
  • evaluation_dataset: a deepeval EvaluationDataset.
  • aggregation_method: a string of either 'avg', 'min', or 'max' to determine how metric scores are aggregated.
  • trainer: a transformers.Trainer instance.
  • tokenizer_args: arguments passed to the tokenizer.

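For example, a fully specified callback might look like the sketch below (the aggregation_method and tokenizer_args values shown are illustrative, not defaults):

deepeval_hugging_face_callback = DeepEvalHuggingFaceCallback(
    evaluation_dataset=dataset,
    metrics=[coherence_metric],
    trainer=trainer,
    aggregation_method="avg",  # average metric scores across the dataset
    tokenizer_args={"padding": True, "truncation": True},  # illustrative values
)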
Lastly, add deepeval_hugging_face_callback to your transformers.Trainer and begin fine-tuning:

...
#############################
### Add DeepEval Callback ###
#############################
trainer.add_callback(deepeval_hugging_face_callback)

#########################
### Start Fine-tuning ###
#########################
trainer.train()

With this setup, evaluations will be run on your entire EvaluationDataset, using the metrics you defined, at the end of each epoch.
