NLP Case Study: Tesla Versus Hamlet

A case study using Natural Language Processing techniques to identify lexical differences between financial (Tesla) articles and Hamlet

Trist'n Joseph
12 min read · Jan 9, 2022
Image from WhatsOnStage

Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language. Generally, NLP refers to the manipulation of natural language, such as speech and text, by software. In particular, it is concerned with how to program computers to process and analyze large amounts of natural language data.

There are vast amounts of text data, and natural language is very nuanced. For example, how should one interpret the phrase “I made her duck”? It could mean that I cooked a duck for her, or that I caused her to lower her head. Scale and ambiguity of this kind make NLP very challenging. However, there are various techniques that can be used to address these challenges.

In this article, I begin to explore NLP by collecting and comparing two text documents. After loading the data, I use NLP techniques such as word tokenization, stemming, regular expression filtering, and stop word removal to clean the data and create unigrams, bigrams, and trigrams. Once the data is cleaned, I analyze the text to understand the frequencies and likelihood of words and phrases both within and between the…
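To make the cleaning steps named above concrete, here is a minimal sketch in plain Python. It is not the code used in this case study. The stop word list, the `naive_stem` helper (a crude suffix stripper standing in for a real stemmer such as Porter's), and the other function names are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "or", "that"}


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer, not Porter's algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_tokens(text):
    """Lowercase, tokenize with a regular expression, drop stop words, then stem."""
    tokens = re.findall(r"[a-z']+", text.lower())
    tokens = [t.strip("'") for t in tokens]
    tokens = [t for t in tokens if t and t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


def ngrams(tokens, n):
    """Return the list of consecutive n-token tuples (n=1 unigrams, n=2 bigrams, ...)."""
    return list(zip(*(tokens[i:] for i in range(n))))


def relative_frequencies(items):
    """Map each item to its share of the total count."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}


# Made-up sentence, used only to exercise the functions.
sample = "The investors were trading shares and traded shares"
tokens = clean_tokens(sample)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
unigram_freqs = relative_frequencies(tokens)
```

Because the stemmer here is deliberately simple, "trading" and "traded" both collapse to "trad", which shows why stemming helps when counting: different surface forms end up contributing to the same frequency.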
