22 July 2015

Sentences Are Not Just Strings Of Random Words In Grammatical Order

Language Log explored the claim that one of ten nouns chosen at random, followed by one of ten verbs chosen at random, followed by one of ten adjectives chosen at random, can produce 1,000 grammatical sentences.

While this seems like a sensible claim, the reality is closer to 30-40 sentences.  Many word strings that naively seem to put the right parts of speech in the right word order for a language don't actually produce real idiomatic sentences in that language.
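The 1,000 figure is just the number of ways to fill three slots with ten choices each (10 x 10 x 10). A minimal sketch of that count follows. The word lists are made up for illustration and are not the ones from the Language Log post.

```python
from itertools import product

# Hypothetical word lists (assumptions, not taken from the Language Log post).
nouns = ["dogs", "cats", "children", "teachers", "rivers",
         "ideas", "cars", "birds", "storms", "books"]
verbs = ["seem", "become", "look", "smell", "taste",
         "feel", "sound", "remain", "grow", "stay"]
adjectives = ["happy", "green", "loud", "cold", "tired",
              "sweet", "heavy", "angry", "quiet", "old"]

# Every noun-verb-adjective string that fits the part-of-speech pattern.
strings = [" ".join(words) for words in product(nouns, verbs, adjectives)]

print(len(strings))   # 1000
print(strings[:3])
```

The count only says how many strings fit the pattern. It says nothing about how many of them, such as "rivers smell angry", a speaker would actually use.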

In general, as programmers have learned from building language-processing software, it is often more helpful to think of grammar in terms of the empirical probability that a given word will follow the words that precede it than as a set of formal rules that systematize those relationships.  The exceptions to the rules turn out to be important, high-frequency phrases, and the formal rules that have been articulated are often insufficient to rule out the myriad combinations of words that formally comply with the rules but are not used by idiomatic speakers of the language.
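One simple way to picture the probabilistic view is a bigram model, which estimates how likely each word is to follow the word before it by counting pairs in a body of text. The sketch below is only an illustration under stated assumptions: the five-sentence corpus is invented, the function names are hypothetical, and real systems use far larger corpora, longer contexts, and smoothing for unseen word pairs.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
corpus = [
    "dogs seem happy",
    "dogs seem tired",
    "dogs look happy",
    "children seem tired",
    "books look old",
]

# Count how often each word follows each preceding word.
# "<s>" and "</s>" mark the start and end of a sentence.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    for prev, nxt in zip(words, words[1:]):
        bigram_counts[prev][nxt] += 1


def next_word_prob(prev, nxt):
    """Empirical probability that `nxt` follows `prev` in the corpus."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][nxt] / total if total else 0.0


def sentence_prob(sentence):
    """Product of the bigram probabilities along the sentence."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, nxt in zip(words, words[1:]):
        prob *= next_word_prob(prev, nxt)
    return prob


print(sentence_prob("dogs seem happy"))  # about 0.133 (3/5 * 2/3 * 1/3 * 1)
print(sentence_prob("books seem old"))   # 0.0, "books seem" never occurs here
```

Both strings fit the noun-verb-adjective pattern, but the model scores them very differently because it only knows what it has seen. With a corpus this small, a zero means "unseen", not "unidiomatic"; with enough text, the scores start to track what speakers actually say.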
