This article is about a rarely discussed topic: how we can make sense of grammar in a non-statistical way. Many AI models, such as DeepMind’s GEM and Google’s PARSEVAL, don’t rely purely on statistical learning to make sense of grammar. Instead, these hybrid models reintroduce formal grammars, such as Combinatory Categorial Grammar (CCG), into their architecture. This lets them draw on decades of linguistic analysis in just a few lines of code, theoretically reaching the same level of competence in less time and at lower cost. But how can we turn grammar into something a computer can work with?
To understand this, we will talk about how words turn into functions, the algebra behind their combination, and how a program returning a TypeError is, in many ways, equivalent to a sentence with bad grammar.
Listen, grammar might bring back some bad memories — words crossed out in red, a ruler to the wrist, or a blank stare in the face of “preposition”.
To a grammarian, good grammar is enforced through a set of prescriptive rules. Commands like:
- Thou shalt not use “whom” as a subject.
- Thou shalt have a subject and a predicate in every sentence.
- Thou shalt not end sentences with prepositions (at, by, into,…).
As a writer, I have always found the commandments a bit restrictive—part pain, part medicine. And while I can admit these rules can clarify your writing, they do not help machines understand sentence structure. For that, we will need to discuss Combinatory Categorial Grammar (CCG). However, we cannot abandon prescriptive grammar entirely. In this article we will lean on the 2nd Commandment: every sentence must contain a clear subject and predicate.
From NLP to Proof Nets
In the early 2000s, statistical CCG parsers led the field in wide-coverage, high-accuracy syntactic parsing by capturing long-distance dependencies and complex coordination. While CCG is not the current hot topic in the era of LLMs, it has helped shape question-answering, logical inference, and machine translation systems where structural transparency is desired.
Grammar can now be inferred from sheer data alone, with no need for hand-coded rules. Still, many state-of-the-art models re-inject syntactic signals, for a few reasons:
- Implicit learning alone can miss corner-case phenomena. A parser can be made to handle triple-negatives in legalese or enjambment in poetry, but only if you explicitly encode those patterns.
- Faster learning with less data. Learning grammar from data alone requires billions of examples and is computationally costly.
- Interpretability and control. When analyzing syntactic errors, it is easier to look at parse-based features than opaque attention weights.
- Consistency in generation. Purely emergent models can drift, flipping verb tenses mid-sentence, mismatching pronouns and antecedents. A syntax-aware parser or grammar module can enforce this explicitly.
- Low-resource language constraints. Swahili or Welsh may have less available data for conventional large-scale training. Hand-coded grammar rules make up for that.
Proof Nets
Another reason CCG continues to matter is its deep connection to proof nets (Girard 1987). Proof nets are a graph-based way of representing proofs in linear logic that strips away bureaucratic details to reveal the core logical structure. Morrill (1994) and Moot & Retoré (2012) showed that every CCG parse can be translated into one of these canonical proof-net graphs, giving a direct, formal bridge between CCG’s syntactic derivations and linear-logic proofs. Long-distance dependencies emerge as explicit paths, derivational redundancies are eliminated, and semantic composition follows graph contractions. (By “linear logic” we mean a logic in which every formula must be used exactly once in a derivation.)
Think of it this way: as you build a CCG parse (or its proof net), each syntactic combination (e.g. a verb phrase combining with its subject) tells you exactly which semantic operation to perform (function‐application, function‐composition, etc.). That sequence of syntax‐guided steps then composes the meanings of individual words into the meaning of the whole sentence in a formally precise manner.
The C&C parser and EasyCCG (Lewis & Steedman, 2014) are prominent tools in the field of CCG parsing. While both are widely used, EasyCCG is generally recognized for its speed, often achieving faster parsing times, whereas the C&C parser is frequently noted for its accuracy, particularly on complex sentences.
Where are we exactly?
Formally, CCG sits in Type-1 (context-sensitive) territory on the Chomsky Hierarchy: below the languages recognized by Turing machines, and above the context-free languages recognized by pushdown automata. More precisely, CCG is “mildly context-sensitive”: it adds just enough power beyond context-free grammars to handle phenomena such as cross-serial dependencies. Moving down the hierarchy (from Type-3 toward Type-0) increases the generative power, structural complexity, and computational resources required to parse.
- Parse: to determine if a string can be built given the grammar rules.
- A language (𝑳) is a set of words built by taking elements from an alphabet (𝚺), the set that contains all the symbols for that language.
With this broader definition, “words” don’t have to be words. For example, our “words” could be email addresses, and our alphabet could be numbers, letters, and symbols.
In English, if we want to talk about whole sentences, we can let our alphabet Σ be the set of all words (our vocabulary). Then a sentence is any finite string in Σ*, and a language L ⊆ Σ* is just the set of “well-formed” sentences we care about.

Given this abstract definition of language, we can talk about esoteric constructions, such as a language whose words all look like ab, aab, aaabbbb, abb, etc. Formally, it is described as follows:
L = { aⁿbᵐ : n ≥ 1, m ≥ 1 }
Here exponentiation means repeating a symbol by appending it to the end of the string, so 3² = 33 ≠ 9.
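As a quick sanity check, here is a minimal Python sketch that tests membership in this language (the helper name is_in_language is just for illustration):

def is_in_language(word):
    # A word belongs to L when it is one or more a's followed by one or more b's.
    count_a = 0
    while count_a < len(word) and word[count_a] == "a":
        count_a += 1
    rest = word[count_a:]
    return count_a >= 1 and len(rest) >= 1 and all(ch == "b" for ch in rest)

print([is_in_language(w) for w in ["ab", "aab", "aaabbbb", "abb"]])  # [True, True, True, True]
print([is_in_language(w) for w in ["ba", "aba", ""]])                # [False, False, False]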
This language is Type-3 on the hierarchy: a regular language, describable by a regular expression. While it might be hard to find practical uses for the language above, one of the most ubiquitous real-world uses of regular expressions is email-address validation on web forms: behind the scenes, the form uses a regex like…
^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$
this@bad@😡grammar.in_the_email_language.ca
Here the sole purpose of our grammar is to make sure you input a valid email address. In CCG, our grammar has a more linguistic purpose: it checks whether words combine grammatically.
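If you want to see the email grammar in action, here is a minimal Python sketch using the standard re module (the helper name is_valid_email is mine, and example.com is just a placeholder domain):

import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def is_valid_email(address):
    # fullmatch insists the whole string fits the pattern, so the explicit
    # ^ and $ anchors from the regex above are not needed here.
    return EMAIL_RE.fullmatch(address) is not None

print(is_valid_email("reader@example.com"))     # True
print(is_valid_email("this@has.two@signs.ca"))  # False: a second @ breaks the grammar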
Chomsky’s Hierarchy in NLP
As you move from Type-3 to Type-0, you decrease the constraints on what you can produce. This increases the expressive power at the cost of additional computation.
Type-0 (Recursively enumerable grammars): Full semantic parsing or generation in a Turing-complete formalism (e.g. Prolog DCGs with arbitrary extra arguments, or neural seq2seq models that in principle can simulate any Turing machine).
Type-1 (Context-sensitive grammars): Swiss German uses cross-serial dependencies, where you need more information about the surrounding words before rewriting. This requires more computational steps to parse.

When we cover the algebra of CCG later, come back and see how using only forward and backward application runs into trouble with Swiss German (hint: those rules can only combine adjacent categories).
Type-2 (Context-free grammars): A CCG becomes a pure Type-2 (context-free) grammar exactly when you allow only the two application rules and no higher-order combinators (type-raising or composition).
Type-3 (Regular grammars): Tokenization and simple pattern-based tagging (e.g. recognizing dates, email addresses, or part-of-speech tags using regular expressions or finite-state transducers).
The Algebra of CCG
Let’s say we have categories A and B, then forward application and backward application work as follows:
- A/B says that if we have a B to the right of A/B, then the result is an A.
- A\B says that if we have a B to the left of A\B, then the result is an A.

In practice, A and B become parts of speech.
The algebra of CCG looks a lot like the multiplication of fractions; notice how “numerators” cancel with “denominators”. However, unlike multiplication, the order matters: this algebra is not commutative. Don’t memorize this as a rule; it is a direct consequence of word order. This lack of commutativity is what lets us distinguish “We go there.” as a sentence from “Go we there.” as nonsense.
Combining two atomic categories using \ or / (e.g. NP/N) creates a complex category that classifies a word and describes how it can be combined.
In the following illustration, “The” combines with “dog” (a noun, N) to make the Noun Phrase (NP) “The dog”. Similarly, the Noun Phrase “The dog” can combine with “ran” (a verb, S\NP) to make the sentence (S) “The dog ran”.
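That derivation can be sketched in a few lines of Python. The combine helper and the category strings are purely illustrative (this is not the API of any real CCG parser) and only handle the simple cases this example needs:

def combine(left, right):
    # Forward application:  X/Y  Y  ->  X   (the argument Y sits to the right)
    if left.endswith("/" + right):
        return left[: -len(right) - 1].strip("()")
    # Backward application: Y  X\Y  ->  X   (the argument Y sits to the left)
    if right.endswith("\\" + left):
        return right[: -len(left) - 1].strip("()")
    return None  # the two categories do not combine in this order

the_dog = combine("NP/N", "N")     # "The" + "dog"
print(the_dog)                     # NP
print(combine(the_dog, "S\\NP"))   # "The dog" + "ran"  ->  S
print(combine("S\\NP", the_dog))   # "ran" + "The dog"  ->  None

The last call returning None is the non-commutativity from above: the same two categories refuse to combine when the word order is wrong.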


Constructing Full Sentences
Take something like the word “a”—clearly not a Noun, Noun Phrase, or Sentence, but we could describe it in these terms by saying “a” is a word that expects a noun on the right to become a noun phrase:
“a” = NP/N
This is how “a ball” (NP/N N → NP) becomes a noun phrase.
Do you see how we can cleverly describe articles (a, an, the) in terms of NP and N to create a category that describes how they function, how they interact with the words around them? Why call “the” an article when we can call it a function that expects a noun to become a noun phrase?
We can do the same thing with verbs. To form a sentence S, we need a subject and a predicate.
- The Subject (RED) does the action.
- The action is the verb.
- The Predicate (BLUE) receives the action (more precisely, the object inside the predicate does).

By splitting the sentence this way, we can see that the verb acts as a fulcrum between two necessary parts of sentence construction, so you shouldn’t be surprised that the verb and the adverb take on a very special role in CCG: they are the categories that contain the atomic category S.
We can describe a verb as something that takes a Noun Phrase to the left and a Noun Phrase to the right to become a sentence: (S\NP)/NP. No additional atomic categories are needed.
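For example, take the sentence “The dog chased a ball” (the verb “chased” is just an illustrative transitive verb of category (S\NP)/NP). The derivation runs:

a ball: NP/N N → NP (forward application)
chased a ball: (S\NP)/NP NP → S\NP (forward application)
The dog: NP/N N → NP (forward application)
The dog chased a ball: NP S\NP → S (backward application)

Notice how the final backward application is exactly our 2nd Commandment at work: a subject (NP) meets a predicate (S\NP) to form a sentence (S).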
“After debugging for hours” is a subordinate (dependent) adverbial clause, still parsable by C&C or EasyCCG.
How This Relates to Programming
The thing I find most elegant about CCG is how it turns the verb “write” into a function (S\NP)/NP that takes a Noun Phrase to the left and right as input to output a sentence. By treating words as functions, the CCG parser type-checks a sentence the same way a compiler type-checks a program.
A dreaded TypeError will ensue if you try to make a sentence like “run write walk.” It fails to type-check the same way sum(“word”) fails to run. In the first case, you supply a verb where a Noun Phrase was expected; in the second, you supply a string where numbers were expected. Either way: TypeError.
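Here is that analogy in runnable form: a deliberately tiny Python sketch, not a real parser. The parse helper greedily combines adjacent categories using only the two application rules and raises a TypeError when nothing fits, right next to the TypeError Python itself raises for sum(“word”). The lexicon and all function names are made up for illustration.

def split_category(cat):
    # Split a category at its rightmost slash outside parentheses.
    # Returns (result, slash, argument), or None for atoms like N, NP, S.
    depth = 0
    for i in range(len(cat) - 1, -1, -1):
        if cat[i] == ")":
            depth += 1
        elif cat[i] == "(":
            depth -= 1
        elif cat[i] in "/\\" and depth == 0:
            result, argument = cat[:i], cat[i + 1:]
            if result.startswith("(") and result.endswith(")"):
                result = result[1:-1]          # drop the wrapping parentheses
            return result, cat[i], argument
    return None

def combine(left, right):
    s = split_category(left)
    if s and s[1] == "/" and s[2] == right:    # forward application: X/Y  Y -> X
        return s[0]
    s = split_category(right)
    if s and s[1] == "\\" and s[2] == left:    # backward application: Y  X\Y -> X
        return s[0]
    return None

def parse(words, lexicon):
    # Greedily merge adjacent categories; good enough for these tiny examples.
    cats = [lexicon[w] for w in words]
    while len(cats) > 1:
        for i in range(len(cats) - 1):
            merged = combine(cats[i], cats[i + 1])
            if merged is not None:
                cats[i:i + 2] = [merged]
                break
        else:                                  # no adjacent pair combines
            raise TypeError(f"cannot combine categories: {cats}")
    return cats[0]

LEXICON = {"dogs": "NP", "letters": "NP",
           "run": "S\\NP", "walk": "S\\NP", "write": "(S\\NP)/NP"}

print(parse(["dogs", "write", "letters"], LEXICON))  # S: a grammatical sentence
try:
    parse(["run", "write", "walk"], LEXICON)         # verbs where NPs were expected
except TypeError as error:
    print("CCG:", error)
try:
    sum("word")                                      # a string where numbers were expected
except TypeError as error:
    print("Python:", error)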
In Lambda calculus, we could write:
λo. λs. write s o -- wait for an object o, then a subject s
In CCG, every lexical item carries not only a syntactic category but also a small lambda-term encoding its meaning — e.g. write might be assigned (S\NP)/NP with semantics λo. λs. write(s, o) to indicate that it first takes an object (o), then a subject (s). As you apply CCG’s combinatory rules (like function application), you simultaneously apply these lambda-terms, composing the meanings of words step by step into a complete logical form for the whole sentence.
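Sketched in Python (the lexical entry and the argument strings below are invented to mirror the example, not taken from any parser’s lexicon):

# A lexical entry pairs a syntactic category with a meaning (a curried function).
write_category = "(S\\NP)/NP"
write_meaning = lambda o: (lambda s: f"write({s}, {o})")   # λo. λs. write(s, o)

# Forward application consumes the object first (category becomes S\NP)...
write_a_letter = write_meaning("a letter")
# ...then backward application supplies the subject (category becomes S).
print(write_a_letter("Mary"))                              # write(Mary, a letter)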
Lambda calculus is a very small formal language that does one thing: It describes how to build functions and how to run them. Everything else — numbers, Booleans, data structures, even whole programs — can be encoded in terms of those functions. As a result, the lambda calculus serves as a precise mathematical model of computation itself.
Conclusion
The power of CCG lies in its ability to transform language into an algebraic system, providing a clear set of compositional instructions. This is incredibly useful for revealing the connections between human language and formal computation. Admittedly, the CCG explained here is not comprehensive enough to parse more complicated, real-world sentences.
Parsing those requires far more. When you try to build a comprehensive CCG system that handles real-world English at scale, you need over 1,200 different grammatical categories, revealing how much hidden complexity exists in what seems like “ordinary” language use.
Even the following construction is a simplified model:
S
├── S
│   ├── NP CCG
│   └── S\NP
│       ├── (S\NP)/NP isn't
│       └── NP
│           ├── NP/NP just
│           └── NP
│               ├── NP/N a
│               └── N
│                   ├── N/N powerful
│                   └── N
│                       ├── N way
│                       └── N\N
│                           ├── (N\N)/NP for
│                           └── NP
│                               ├── NP computers
│                               └── NP\NP
│                                   ├── (NP\NP)/(S\NP) to
│                                   └── S\NP
│                                       ├── (S\NP)/NP understand
│                                       └── NP
│                                           ├── N/N sentence
│                                           └── N structure
├── (S\S)/S ; (punctuation)
└── S
    ├── NP it
    └── S\NP
        ├── (S\NP)\(S\NP) also
        └── S\NP
            ├── (S\NP)/(S[TO]\NP) appears
            └── S[TO]\NP
                ├── (S[TO]\NP)/(S\NP) to
                └── S\NP
                    ├── (S\NP)/NP mirror
                    └── NP
                        ├── NP/(S\NP) how
                        └── N
                            ├── N/N our
                            └── N
                                ├── N brains
                                └── N\N
                                    ├── (N\N)/NP process
                                    └── NP language
At its core, CCG provides a methodical and rigorous approach to taking sentences apart, reassembling them, and ensuring grammatical consistency, all the while rejecting incomplete sentences.

References
Wang, S., … et al. (2024). Computational models to study language processing in the human brain: A survey. arXiv preprint arXiv:2403.13368. https://arxiv.org/abs/2403.13368
Lewis, M., & Steedman, M. (2014). A* CCG parsing with a supertagger and practical dynamic programming. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1787–1798).
Girard, J.-Y. (1987). Linear logic. Theoretical Computer Science, 50(1), 1–102.
Morrill, G. (1994). Categorial deduction. Journal of Logic, Language and Information, 3(3), 287–321.
Moot, R., & Retoré, C. (2012). The logic of categorial grammars: A deductive account of natural language syntax and semantics. Springer.
Jurafsky, D., & Martin, J. H. (2023). Speech and language processing (3rd ed.) [Appendix E: Combinatory Categorial Grammar]. Retrieved May 29, 2025, from https://web.stanford.edu/~jurafsky/slp3/E.pdf