
How are the "primary languages" for a file extension decided? #5345

Answered by lildude
Sparkles-Laurel asked this question in Q&A

The "How Linguist Works" documentation details how each file is assessed in isolation by each of the strategies, in the order they appear in the list.

Essentially, Linguist works like a funnel: a lot of languages go in at the top, and each step tries to whittle the list down to one language. If it gets to the end and more than one language is still left, it takes the first, on the assumption that the classifier has determined that to be the most likely language based on the samples Linguist has (the final thing the classifier does, after assessing the content, is sort the languages by score).
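To make the funnel idea concrete, here is a minimal sketch in Python. It is not Linguist's actual code (Linguist is written in Ruby): the names `EXTENSION_MAP`, `KEYWORDS`, `extension_strategy`, `toy_classifier` and `detect` are all hypothetical, and the keyword-counting "classifier" is only a stand-in for a real one trained on samples.

```python
import os
from typing import Callable, List, Optional

# A strategy takes (filename, content, current candidates) and returns
# the languages it thinks are still possible.
Strategy = Callable[[str, str, List[str]], List[str]]

# Hypothetical data, for illustration only.
EXTENSION_MAP = {".h": ["C", "C++", "Objective-C"], ".py": ["Python"]}
KEYWORDS = {
    "C": ["malloc", "printf"],
    "C++": ["class", "std::"],
    "Objective-C": ["@interface", "NSString"],
}


def extension_strategy(filename: str, content: str, candidates: List[str]) -> List[str]:
    """Look only at the file extension; the content is ignored."""
    ext = os.path.splitext(filename)[1].lower()
    langs = EXTENSION_MAP.get(ext, [])
    if candidates:
        return [lang for lang in langs if lang in candidates]
    return langs


def toy_classifier(filename: str, content: str, candidates: List[str]) -> List[str]:
    """Stand-in classifier: score each candidate, then sort best-first."""
    if not content.strip():
        return []  # nothing to assess in an empty file

    def score(lang: str) -> int:
        return sum(content.count(word) for word in KEYWORDS.get(lang, []))

    # sorted() is stable, so ties keep their incoming order.
    return sorted(candidates, key=lambda lang: -score(lang))


def detect(filename: str, content: str, strategies: List[Strategy]) -> Optional[str]:
    """Run the strategies in order, narrowing the candidate list as we go."""
    candidates: List[str] = []
    for strategy in strategies:
        result = strategy(filename, content, candidates)
        if len(result) == 1:
            return result[0]      # narrowed to one language: done
        if len(result) > 1:
            candidates = result   # still ambiguous: pass the list on
        # an empty result leaves the candidates unchanged
    # Still more than one language at the end: take the first.
    return candidates[0] if candidates else None


STRATEGIES = [extension_strategy, toy_classifier]
```

In this sketch, `detect("tool.py", "", STRATEGIES)` stops at the extension step because only one language is left, while a `.h` file with C++-looking content goes through both steps and the top-scored candidate is taken.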

Everything beyond the extension strategy relies upon the content, which means empty files that share …

Answer selected by lildude