Tokenization and Padding Within Each Split: perform the tokenization and padding separately for the train, validation, and test sets after the split. This ensures that the preprocessing steps are independent for each set, reducing the risk of info leak.