What is natural language processing (NLP)?

Natural language processing (NLP) enables computers to interpret human language.

Learning Objectives

After reading this article you will be able to:

  • Define natural language processing (NLP)
  • Understand how NLP works
  • Contrast NLP with other types of artificial intelligence (AI)

What is NLP (natural language processing)?

Natural language processing (NLP) is a method computer programs can use to interpret human language. NLP is one type of artificial intelligence (AI). Modern NLP models are mostly built via machine learning, and they also draw on linguistics, the scientific study of language and its structure.

All computers can interpret commands and instructions in computer-friendly languages. For instance, a computer (specifically, a browser application) can understand and interpret JavaScript code like:


// Call popup() whenever the user scrolls the page.
window.addEventListener("scroll", popup);

// Show an alert dialog with a greeting.
function popup() {
  window.alert("Hello, world!");
}

But it cannot understand and interpret natural language text like:


If the user scrolls, show an alert that says "Hello, world!"

However, a computer program with natural language processing may be able to understand the above sentence, even if it cannot carry out the command.

While programming languages are the best way to give computers commands, natural language processing enables computer programs to do a wide variety of tasks with human language, both spoken and written. For example, it can help process large data collections of voice recordings and written texts, automate interactions with human users, or interpret user queries.

Other uses for NLP include:

  • Sentiment analysis: NLP can help determine whether large volumes of user comments, social media posts, or customer service requests express positive, negative, or neutral opinions
  • Virtual assistants: NLP is crucial for understanding requests from users of assistants like Siri, Alexa, or Cortana
  • Search engines: NLP helps search engines better understand the search intent behind both simple, one-word queries and queries typed as sentences or questions, along with interpreting misspellings or other human errors in the queries
  • Translation: NLP can help understand and translate content from one language to another
  • Content moderation: NLP can assist with flagging potentially harmful or objectionable content by interpreting the meaning of user-generated text

How does natural language processing (NLP) work?

NLP uses machine learning to analyze human-generated content statistically and learn how to interpret it. During the training process, NLP models are fed examples of words and phrases in context, along with their interpretations. For instance, an untrained NLP model cannot tell when the word "orange" means the color rather than the fruit. But after being shown thousands of examples, such as "I ate an orange" or "This car comes in orange," the model can learn to use the surrounding words to distinguish between the two meanings.
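
The idea of learning a word's meaning from examples in context can be illustrated with a very small sketch. The code below is not how production NLP models are built; it is a toy word-count classifier, and the training sentences, labels, and function names are all made up for illustration. It counts which words appear alongside each sense of "orange" and then picks the sense whose contexts best match a new sentence.

```javascript
// Toy training data: sentences labeled with the intended sense of "orange".
// These examples and labels are invented for illustration only.
const trainingExamples = [
  { text: "I ate an orange", sense: "fruit" },
  { text: "She peeled the orange and ate it", sense: "fruit" },
  { text: "This car comes in orange", sense: "color" },
  { text: "He painted the wall orange", sense: "color" },
];

// Split a sentence into lowercase word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

// Build a table: sense -> (context word -> number of times seen).
function train(examples) {
  const counts = new Map();
  for (const { text, sense } of examples) {
    if (!counts.has(sense)) counts.set(sense, new Map());
    const wordCounts = counts.get(sense);
    for (const word of tokenize(text)) {
      if (word === "orange") continue; // skip the target word itself
      wordCounts.set(word, (wordCounts.get(word) || 0) + 1);
    }
  }
  return counts;
}

// Pick the sense whose training contexts share the most words with the sentence.
function predictSense(counts, sentence) {
  let bestSense = null;
  let bestScore = -1;
  for (const [sense, wordCounts] of counts) {
    let score = 0;
    for (const word of tokenize(sentence)) {
      score += wordCounts.get(word) || 0;
    }
    if (score > bestScore) {
      bestScore = score;
      bestSense = sense;
    }
  }
  return bestSense;
}

const model = train(trainingExamples);
console.log(predictSense(model, "I ate a juicy orange"));      // fruit
console.log(predictSense(model, "The car is painted orange")); // color
```

With only four examples this sketch fails on most real sentences, which is why actual models are trained on thousands or millions of examples and use far richer statistics than raw word counts.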

Given the complexity and inconsistencies of human language, NLP is often built on deep learning, a type of machine learning that uses multi-layered neural networks. Deep learning models can process unlabeled raw data, although they need vast amounts of data in order to be trained properly. Deep learning also requires a great deal of processing power.

What is NLP preprocessing?

NLP preprocessing is the preparation of raw text for analysis by a program or machine learning model. It puts text into a format that deep learning models can more easily analyze.

There are several NLP preprocessing methods that are used together. The main ones are:

  • Converting to lowercase: Capitalization rarely changes the meaning of a word. Because many computer programs are case-sensitive, they might otherwise treat "Language" and "language" as two unrelated words. Converting all text to lowercase avoids this unnecessary distinction.
  • Stemming: This reduces words down to their root or "stem" by removing endings like "-ing" or "-ation" (e.g. "transporting" and "transportation" both become "transport").
  • Lemmatization: This NLP technique reduces words to the primary form that could be found in a dictionary. Plural or possessive nouns become singular: "neighbor's," "neighbors'," and "neighbors" all become "neighbor," for instance. Verbs become their non-conjugated form: "went" and "goes" become "go."
  • Tokenization: This breaks text into smaller pieces that indicate meaning. The pieces are usually composed of phrases, individual words, or subwords (the prefix "un-" is an example of a subword).
  • Stop word removal: Many words are important for grammar or for clarity when people talk amongst themselves, but do not add a great deal of meaning to a sentence and are not necessary for processing language in a computer program. Such words are called "stop words" in the context of NLP, and stop word removal takes them out of text. As an example, in the sentence "I went to college for four years," the words "to" and "for" are essential for the sentence to sound intelligible to human ears, but not necessary for carrying meaning. The stop-word-removal version could be: "I went college four years."
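
The preprocessing steps listed above can be sketched in a few lines of JavaScript. This is a minimal illustration under stated assumptions, not a production pipeline: the stop word list, the lemma lookup table, and the suffix rules are tiny hypothetical stand-ins for the much larger resources that real NLP libraries use, and all function names are made up for this example.

```javascript
// Hypothetical, tiny stop word list; real lists are much longer.
const STOP_WORDS = new Set(["a", "an", "the", "to", "for", "of", "and"]);

// Hypothetical lemma lookup table standing in for a real dictionary.
const LEMMAS = new Map([
  ["went", "go"],
  ["goes", "go"],
  ["neighbors", "neighbor"],
  ["years", "year"],
]);

// Lowercasing + tokenization: lowercase the text, then split it into words.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z]+/g) || [];
}

// Stop word removal: drop tokens that appear in the stop word list.
function removeStopWords(tokens) {
  return tokens.filter((token) => !STOP_WORDS.has(token));
}

// Naive stemming: strip one of two hard-coded suffixes ("ation", "ing").
// Real stemmers apply many more rules than this.
function stem(token) {
  for (const suffix of ["ation", "ing"]) {
    if (token.endsWith(suffix) && token.length > suffix.length + 2) {
      return token.slice(0, -suffix.length);
    }
  }
  return token;
}

// Lemmatization: map a token to its dictionary form if the table knows it.
function lemmatize(token) {
  return LEMMAS.get(token) || token;
}

// Combine the steps: lowercase, tokenize, remove stop words, lemmatize.
function preprocess(text) {
  return removeStopWords(tokenize(text)).map(lemmatize);
}

console.log(preprocess("I went to college for four years"));
// tokens: "i", "go", "college", "four", "year"

console.log(stem("transporting"), stem("transportation"));
// transport transport
```

Stemming and lemmatization are shown as separate functions because pipelines typically apply one or the other, not both: stemming is faster but cruder, while lemmatization needs a dictionary but yields real words.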

What is the difference between NLP and large language models (LLMs)?

A large language model (LLM) is a type of machine learning model that can comprehend human-generated text and generate natural-sounding outputs. LLMs, such as the GPT models behind the widely used ChatGPT, are trained on very large data sets of text.

There is some overlap between the terms NLP and LLM: both use machine learning, large data sets, and training in order to interpret human language. In fact, some sources define LLM as being a type of NLP.

However, LLMs differ from NLP models in several key ways:

  • NLP models are usually trained for a specific task, whereas LLMs have a broad range of uses
  • NLP models provide insights and interpretations, while LLMs produce text that is statistically relevant but may not convey an understanding of the underlying meaning (although many advanced LLMs can easily give the appearance of doing so)
  • Because they have such a broad range of uses, LLMs require far more data and training than NLP models

For instance, an NLP model would be more useful for sentiment analysis, while an LLM would work well for incorporation into a chatbot that interacts with customers. Or, an NLP model could help a search engine interpret a user's query and generate relevant search results, while an LLM could write its own response to the query based on statistical analysis of preexisting relevant content.

NLP vs. LLMs vs. generative AI

NLP is also distinct from, though related to, generative AI. Generative AI refers to deep learning models that can generate text, audio, video, images, or code. NLP models, by contrast, are often not designed to generate text at all. LLMs, meanwhile, are a type of generative AI in that they can produce text in response to queries.

How does Cloudflare enable the development of NLP models?

Cloudflare allows developers to run advanced deep learning on GPUs all over the world, giving them access to the compute power they need to train AI models, with minimal latency. In addition, Cloudflare R2 offers cost-effective storage for the vast amount of data that deep learning-based NLP must be trained on. Learn more about Cloudflare for AI.