For the supervised learning phase, a collection of questions and answers was generated separately for each domain (the physics domain and the scientific paper domain) by an automated script using the Llama2-7b-chat LLM. To generate the questions and answers, the script takes a chunk of the domain text, embeds it in a dedicated prompt, and passes the prompt to the model. Each Q&A pair returned by the LLM is then split into an input (the question) and a target (the answer).
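The generation loop described above can be sketched as follows. This is a minimal illustration rather than the script actually used: the prompt wording, the word-based chunking, the `Q:`/`A:` output format, and the names `PROMPT_TEMPLATE`, `chunk_text`, `parse_qa_pairs`, and `build_dataset` are all assumptions introduced here. The model call is abstracted as a `generate` callable so that the sketch runs without loading an LLM.

```python
import re
from typing import Callable, Dict, List

# Hypothetical prompt template; the original prompt wording is not given.
PROMPT_TEMPLATE = (
    "Read the following text and write question-answer pairs about it.\n"
    "Format each pair as 'Q: <question>' followed by 'A: <answer>'.\n\n"
    "Text:\n{chunk}\n"
)

# Assumed output format: alternating "Q: ..." and "A: ..." segments.
QA_PATTERN = re.compile(r"Q:\s*(.+?)\s*A:\s*(.+?)(?=\s*Q:|\Z)", re.DOTALL)


def chunk_text(text: str, max_words: int = 200) -> List[str]:
    """Split the domain text into chunks of at most max_words words (assumed scheme)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def parse_qa_pairs(output: str) -> List[Dict[str, str]]:
    """Split raw model output into input (question) and target (answer) records."""
    return [
        {"input": question.strip(), "target": answer.strip()}
        for question, answer in QA_PATTERN.findall(output)
    ]


def build_dataset(
    text: str, generate: Callable[[str], str], max_words: int = 200
) -> List[Dict[str, str]]:
    """Run every chunk through the model and collect the parsed Q&A pairs."""
    dataset: List[Dict[str, str]] = []
    for chunk in chunk_text(text, max_words):
        prompt = PROMPT_TEMPLATE.format(chunk=chunk)
        dataset.extend(parse_qa_pairs(generate(prompt)))
    return dataset


def fake_generate(prompt: str) -> str:
    """Stand-in for the LLM call, returning a fixed response for demonstration."""
    return (
        "Q: What is force?\nA: Mass times acceleration.\n"
        "Q: What is the unit?\nA: The newton.\n"
    )


if __name__ == "__main__":
    for record in build_dataset("word " * 250, fake_generate):
        print(record)
```

In practice, `generate` would wrap a call to the chat model, and the parsing step would need to tolerate outputs that deviate from the requested format, for example by discarding responses from which no pair can be extracted.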