# This will issue a warning about some of the pretrained weights not being used and some weights being randomly initialized.
# That's because we are throwing away the pretraining head of the BERT model and replacing it with a randomly initialized classification head.
# We will then fine-tune this model on our task, transferring the knowledge of the pretrained model to it (which is why doing this is called transfer learning).
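# A minimal sketch of the step that triggers this warning, assuming the
# "bert-base-uncased" checkpoint and a two-label classification task
# (both are illustrative choices, not fixed by the text above):
from transformers import AutoModelForSequenceClassification

checkpoint = "bert-base-uncased"

# The checkpoint only contains weights for the pretraining heads (masked
# language modeling and next sentence prediction), so those weights are
# discarded and a fresh classification head is created with random weights;
# this mismatch is exactly what the warning reports.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)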