This document presents a formula for calculating the loss function J(θ) in machine learning models. The formula averages the negative log likelihood of the predicted probabilities being correct over all samples S, and includes a regularization term λ that penalizes predicted embeddings being dissimilar from actual embeddings. It also defines the cosine similarity term used in the regularization.