Using Convolutional Neural Networks for Sentence Classification
Convolutional Neural Networks (CNNs) were originally designed for computer vision tasks, but they have proven highly useful for Natural Language Processing (NLP) tasks as well. A primary use case is sentence classification.
Today, CNNs are a state-of-the-art technique for automatically classifying text by sentiment, topic, urgency, priority, and other characteristics. Read on to learn how a CNN for text classification works and how to apply it in your projects.
In this article:
- What is Natural Language Processing
- What is sentence classification
- What are Convolutional Neural Networks and their effectiveness for NLP
- How Convolutional Networks perform text classification
- Example of convolutional process on text vectors
What is Natural Language Processing (NLP)?
Natural language processing is the practice of automatically processing textual information and deriving meaning from it. Some common uses of NLP are:
- Sentiment analysis: processing text such as social media posts or reviews to identify the emotion behind them, or whether they are positive, negative, or neutral.
- Natural language search: issuing a query and obtaining search results relevant to it.
- Named entity recognition: recognizing names of people, places, or concepts within a text.
- Natural language generation: generating text, from summarization to automated image captions to chatbots.
What is Sentence Classification?
The vast majority of the world’s textual content is unstructured, making automated classification an important task. Sentence classification takes segments of text of various lengths and assigns them meaningful labels, such as:
- Subjects or topics
- Urgency or priority
- Style, complexity, or language
Traditionally, automated sentence classification was carried out with bag-of-words (BOW) features fed into classifiers such as Naive Bayes or Support Vector Machines. State-of-the-art models classify text with neural networks, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
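For contrast with the neural approaches discussed below, a traditional BOW baseline takes only a few lines with scikit-learn. The tiny dataset here is made up purely for illustration:

```python
# A minimal bag-of-words sentiment baseline (illustrative data, not a real dataset).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great movie, loved it", "terrible plot, boring",
         "wonderful acting", "awful and dull"]
labels = ["positive", "negative", "positive", "negative"]

# CountVectorizer builds the bag-of-words features;
# MultinomialNB classifies based on word counts per class.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["awful boring plot"]))
```

Models like this ignore word order entirely, which is precisely the limitation that convolutional and recurrent networks address.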
What are Convolutional Neural Networks?
Convolutional Neural Networks (CNNs) were originally designed for deep learning on computer vision tasks, where they have proven highly effective. They are built around the concept of a “convolution”: a sliding window, or “filter”, that passes over the image, identifying important features one at a time, reducing them to their essential characteristics, and repeating the process layer by layer.
This approach turns out to work well for NLP too. In 2014, Yoon Kim published the original research paper on using CNNs for text classification. He tested several CNN variations and showed that CNN models could outperform previous approaches on a number of classification tasks.
Kim evaluated the classifiers on robust benchmark datasets (indicated by acronyms at the top of the table in his paper). These reflect some of the most common mainstream uses of CNNs for sentence classification today:
- Classifying positive/negative movie reviews
- Classifying sentences as subjective or objective
- Classifying questions into types (about a person, location, numerical information, etc.)
- Identifying positive/negative product reviews
- Detecting opinion polarity
Of course, CNNs are not limited to these cases and can be used for any single- or multi-label classification problem on textual inputs.
How Convolutional Networks Perform Text Classification
Below is a typical CNN architecture used for text processing. It starts with an input sentence broken up into words, which are mapped to word embeddings: dense, low-dimensional vector representations generated by models like word2vec or GloVe.
The embedded words are fed into a convolutional layer, which extracts features from them. The results of the convolution are “pooled”, or aggregated down to a single representative number per filter. These numbers are passed to a fully connected layer, which makes the classification decision based on the weights assigned to each feature of the text.
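As a sketch, this embedding-convolution-pooling-dense pipeline can be written in a few lines of Keras. All of the layer sizes below (vocabulary, embedding dimension, number of filters) are illustrative choices, not prescriptions from the article:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(50,)),                            # 50 token ids per sentence
    layers.Embedding(input_dim=20000, output_dim=128),   # word embeddings
    layers.Conv1D(filters=100, kernel_size=3,
                  activation="relu"),                    # convolution over word windows
    layers.GlobalMaxPooling1D(),                         # pool each filter to one number
    layers.Dense(2, activation="softmax"),               # fully connected decision layer
])
model.summary()
```

In practice you would also tokenize the input text, pad sentences to a fixed length, and optionally initialize the Embedding layer with pretrained word2vec or GloVe vectors.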
Example of Convolutional Process Applied to Text
In a CNN, text is organized into a matrix, with each row representing a word embedding, a word, or a character. The CNN’s convolutional layer “scans” the text like it would an image, breaks it down into features, and judges whether each feature matches the relevant label or not.
The following image illustrates how the convolutional “filter” slides over a sentence, two words at a time. At each position, it computes the element-wise product of the word vectors in the window and the weights of the convolutional filter.
The sum of these products becomes the value of the current textual feature (0.51 and 0.53 in the example). A “pooling” stage then reduces the dimensionality further, keeping only the strongest of these feature values as a score of how well each filter’s feature is expressed in the sentence.
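In code, the window arithmetic looks like this. The embedding and filter values below are made up for the sketch, not the ones behind the figure:

```python
import numpy as np

# Hypothetical 2-dimensional "embeddings" for a three-word sentence.
sentence = np.array([[0.2, 0.7],   # "this"
                     [0.9, 0.1],   # "movie"
                     [0.4, 0.6]])  # "rocks"

# One convolutional filter spanning two words at a time.
filt = np.array([[0.5, 0.3],
                 [0.2, 0.4]])

# Slide the filter over each two-word window:
# element-wise product, then sum, giving one feature value per window.
features = [float(np.sum(sentence[i:i + 2] * filt)) for i in range(2)]
print(features)

# Max pooling keeps only the strongest response of this filter.
print(max(features))
```

A real model applies many such filters of several widths, so pooling leaves one score per filter rather than a single number for the whole sentence.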
At the final stage, these scores become the inputs to a fully connected neural layer. The fully connected part of the network learns its own weights through backpropagation: each neuron receives weights that prioritize the most appropriate label, for example “positive sentiment” or “negative sentiment”. Finally, the neurons “vote” on each of the labels, and the winner of that vote is the classification decision.
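The final “vote” can be illustrated with made-up pooled scores and weights; in a trained network, the weight matrix would come from backpropagation rather than being written by hand:

```python
import numpy as np

pooled = np.array([0.53, 0.80, 0.12])   # hypothetical pooled scores, one per filter

# Made-up fully connected weights: one column of weights per label.
W = np.array([[ 1.2, -0.7],   # filter 1 -> [positive, negative]
              [ 0.4,  0.9],   # filter 2
              [-1.1,  0.3]])  # filter 3
labels = ["positive", "negative"]

logits = pooled @ W                              # each label's weighted total
probs = np.exp(logits) / np.exp(logits).sum()    # softmax turns totals into a "vote"
print(labels[int(np.argmax(probs))])             # the winning label
```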
Automating Operational Aspects of CNNs with MissingLink
In this article, we explained the basics of sentence classification, and how it can be achieved by Convolutional Neural Networks. When you start working on CNN projects, using deep learning frameworks like TensorFlow, Keras, and PyTorch to process and classify bodies of text, you’ll run into some practical challenges:
Tracking experiment source code, configuration, and hyperparameters. Convolutional networks can have many hyperparameters and structural variations. You’ll need to run many experiments, up to hundreds or thousands for a single project, to find the model variation that yields the best performance. Organizing, tracking, and sharing experiment data and results can be a challenge.
Scaling experiments on-premise or in the cloud—CNNs are computationally intensive and typically run on expensive, specialized hardware. In real projects, you may need to scale experiments across multiple machines. Provisioning these machines and keeping them fully utilized is challenging; too often, you may find yourself “babysitting” machines to make sure experiments run as planned.
Managing training data—CNN projects can involve very large bodies of text from social media or other sources. You may need to create multiple versions of these datasets to experiment with different word embeddings or preprocessing options. Copying these datasets to training machines, and re-copying them every time they change, can take time and is highly error-prone.