Emily Dickinson Data Analysis Project: Mood Recurrence
Introduction
Emily Dickinson is one of the most renowned American poets, known for her distinctive language and for themes such as death and pain. In fact, when one asks about her, the reply usually touches on those downcast subjects. However, when I first read her poetry, I was quite surprised by what I found: many poems with vivid, beautifully crafted depictions of nature, as well as love confessions to Sue Dickinson, which are unfortunately often forgotten.
I started to wonder, “What share of her writing deals with positive subjects? Is most of it really melancholic?” Reading all of her poems would be laborious and take too long, so I investigated the question through data analysis.
Methodology
First, I used web scraping to count the most recurrent words in her poems, excluding common words such as “me” and “I.” The code parses the HTML ebook and iterates through the words, counting how many times each one appears. To create the final graph, I imported matplotlib and seaborn and made a simple frequency line graph.
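The counting step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses only the Python standard library, the `TextExtractor` class, the `count_words` function, and the short stop-word list are hypothetical names and values chosen for the example, and the sample HTML is a placeholder rather than text from the ebook.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


# Assumption: a deliberately short stop-word list; the original list is not given.
STOP_WORDS = {"i", "me", "my", "the", "a", "an", "and", "of", "to", "in", "is", "it"}


def count_words(html, stop_words=STOP_WORDS):
    """Return a Counter of lowercase words in the HTML text, minus stop words."""
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.chunks).lower()
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return Counter(w for w in words if w not in stop_words)


# Placeholder HTML standing in for the downloaded ebook page.
sample_html = "<html><body><p>The sun and the summer.</p><p>I love the sun.</p></body></html>"
counts = count_words(sample_html)
top_words = counts.most_common(3)
```

The `(word, count)` pairs in `top_words` are the kind of data a frequency line graph would be drawn from, for example by passing the words and counts to a seaborn line plot.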
I was not content yet, so I decided to use a sentiment analyzer to evaluate the mood of her poems, analyzing only the first line of each poem, which is often used as its title. The sentiment analyzer (VADER, from the Natural Language Toolkit, NLTK; the code to import it in Python is “from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA”) automatically assigns “negative,” “positive,” and “neutral” intensities to a sentence by giving each category a score (the three add up to one), as well as a compound value that summarizes the overall sentiment. In the same data frame in which I stored the first lines of Dickinson’s poems, I added a column with those compound scores. Then I labeled each line as “positive” (compound value above 0.2, stored as 1), “negative” (compound value below -0.2, stored as -1), or “neutral” (anything in between). Lastly, all I had to do was count how many negative, positive, and neutral results I had and create charts with this information.
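The labeling rule described above can be sketched like this. It is a minimal illustration under stated assumptions: the function names are hypothetical, storing “neutral” as 0 is my assumption (the write-up only gives 1 and -1), a plain list stands in for the data frame column, and the example scores are made up to show the thresholds, not actual results. The `compound_scores` helper shows where the real VADER scores would come from; it needs nltk and its `vader_lexicon` data, so it is defined but not called here.

```python
from collections import Counter


def compound_scores(lines):
    """Return the VADER compound score for each line.

    Requires nltk and the "vader_lexicon" resource (nltk.download("vader_lexicon")).
    """
    # Imported lazily so the labeling logic below runs without nltk installed.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA

    sia = SIA()
    return [sia.polarity_scores(line)["compound"] for line in lines]


def label_from_compound(compound, threshold=0.2):
    """Map a compound score to 1 (positive), -1 (negative), or 0 (neutral)."""
    if compound > threshold:
        return 1
    if compound < -threshold:
        return -1
    return 0  # Assumption: neutral is stored as 0.


LABEL_NAMES = {1: "positive", 0: "neutral", -1: "negative"}

# Made-up compound scores for illustration only, not results from the poems.
example_scores = [0.64, -0.60, 0.0, 0.15, -0.2]
labels = [label_from_compound(score) for score in example_scores]
tally = Counter(LABEL_NAMES[value] for value in labels)
```

Note that a score of exactly 0.2 or -0.2 falls into “neutral” here, since the rule uses strict comparisons. The `tally` counter holds the per-label counts that the charts are built from.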
Results
Here we can see words such as “death,” “without,” and “away,” but we can also see many positive words such as “love,” “sun,” and “summer.”
Most results (roughly 60%) came out as “neutral,” while only 17% received the “negative” label, which suggests that “negative” themes such as death and pain are not especially recurrent in her writing, at least in the first lines of her poems.
Here are some of the titles and how they were classified:
Conclusion
It is easy to box artists into just one category, but this analysis shows how harmful that can be. A great part of her poetry has been obscured by her reputation for writing about mortality, even though, by this measure, her “positive” poetry is more recurrent than her “negative” poetry. Yes, death is a very interesting concept, but there is so much depth and variety in her extensive work that it is a pity she is known only for her melancholic side. So I hope that the next time you think of Dickinson, you will think of her as a talented woman who was curious about life, from its beginning to its end.
Resources
https://www.youtube.com/watch?v=8VZhog5C3bU
https://app.datacamp.com/learn/projects/word-frequency-classic-novels/guided/Python
https://www.gutenberg.org/cache/epub/12242/pg12242-images.html
https://mplsoccer.readthedocs.io/en/latest/gallery/pizza_plots/plot_pizza_basic.html