Is the 1T version basically v1? If so, is the HF version of v1 (1T) already deduplicated and ready to be used?

Hi @konradipipan -- You are correct, RedPajama-Data-1T on HF corresponds to v1. This dataset is not deduplicated. If you want a deduplicated version, you can check out SlimPajama, a version of RPv1 that is cleaned and deduplicated across dataset slices with MinHashLSH.
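For anyone curious how MinHash-based near-duplicate removal works in general, here is a minimal, stdlib-only sketch of MinHash signatures plus LSH banding. It is an illustration of the technique only, not SlimPajama's actual pipeline. The helper names (`shingles`, `minhash_signature`, `dedup`) and parameters (5-character shingles, 128 hash functions, 32 bands, a 0.8 similarity threshold) are assumptions chosen for the example.

```python
import hashlib
from collections import defaultdict


def shingles(text: str, k: int = 5) -> set[str]:
    """Character k-grams of lowercased, whitespace-normalized text (k=5 is an assumption)."""
    t = " ".join(text.lower().split())
    if len(t) <= k:
        return {t}
    return {t[i:i + k] for i in range(len(t) - k + 1)}


def minhash_signature(sh: set[str], num_perm: int = 128) -> list[int]:
    """One minimum hash per seeded hash function; the seed is passed as the blake2b salt."""
    sig = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")
        sig.append(min(
            int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(), "little")
            for s in sh
        ))
    return sig


def estimated_jaccard(a: list[int], b: list[int]) -> float:
    """Fraction of matching signature slots approximates Jaccard similarity of the shingle sets."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def dedup(docs: list[str], num_perm: int = 128, bands: int = 32, threshold: float = 0.8) -> list[int]:
    """Return indices of documents kept; later near-duplicates of a kept document are dropped."""
    assert num_perm % bands == 0, "num_perm must split evenly into bands"
    rows = num_perm // bands
    sigs = [minhash_signature(shingles(d), num_perm) for d in docs]
    buckets: dict[tuple, list[int]] = defaultdict(list)  # (band, band values) -> kept doc indices
    kept = []
    for i, sig in enumerate(sigs):
        keys = [(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(bands)]
        # LSH: only documents sharing at least one identical band are compared.
        candidates = {j for key in keys for j in buckets.get(key, ())}
        if any(estimated_jaccard(sig, sigs[j]) >= threshold for j in candidates):
            continue  # near-duplicate of an already-kept document
        kept.append(i)
        for key in keys:
            buckets[key].append(i)
    return kept


docs = [
    "The quick brown fox jumps over the lazy dog.",
    "the quick brown fox  jumps over the lazy dog",
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
    "The quick brown fox jumps over the lazy dog.",
]
print(dedup(docs))
```

The banding step is what makes this scale: instead of comparing every pair of documents, only documents that collide in at least one band bucket are checked, and the number of bands versus rows per band tunes the trade-off between missed duplicates and extra comparisons.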