Using Convolutional Neural Networks for Sentence Classification

Convolutional Neural Networks (CNNs) were originally designed for deep learning computer vision tasks, but they have proven highly useful for Natural Language Processing (NLP) tasks as well. A primary use case is sentence classification.


Today, CNNs are a widely used technique for automatically classifying text strings by emotional sentiment, topic, urgency, priority, or other characteristics. Learn how a CNN for text classification works, how to apply it in your projects, and how to scale up CNN models using the MissingLink deep learning platform.

What is Natural Language Processing

Natural language processing is the practice of automatically processing textual information and deriving meaning from it. Some common uses of NLP are:

  • Sentiment analysis: processing text such as social media postings or reviews to identify the emotion behind them, or whether they are positive, negative, or neutral.
  • Natural language search: issuing a query and obtaining search results relevant to the query.
  • Named entity recognition: recognizing names of people, places, or concepts within a text.
  • Natural language generation: generating text, from summarization to automated image captions to chatbots.


What is sentence classification

The vast majority of the world’s textual content is unstructured, making automated classification an important task. Sentence classification involves taking segments of text of various lengths and assigning them meaningful labels, such as:

  • Sentiment
  • Subjects or topics
  • Urgency or priority
  • Style, complexity, or language


Traditionally, automated sentence classification was carried out by classifiers such as Naive Bayes or Support Vector Machines, trained on bag-of-words (BOW) features that count the words in a text and ignore their order. More recent models rely on text classification using neural networks, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
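To make the bag-of-words idea concrete, here is a minimal sketch in plain Python. The vocabulary and the two sentences are invented for illustration, and the `bag_of_words` helper is a hypothetical name, not part of any library. A real pipeline would typically use a library vectorizer and pass the counts to a classifier.

```python
from collections import Counter

def bag_of_words(sentence, vocabulary):
    """Return the count of each vocabulary word in `sentence`.

    Word order is discarded, which is the defining trait of a BOW representation.
    """
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical toy vocabulary, for illustration only.
vocabulary = ["movie", "good", "bad", "not"]

praise = bag_of_words("The movie was good not bad", vocabulary)
complaint = bag_of_words("The movie was bad not good", vocabulary)

print(praise)     # [1, 1, 1, 1]
print(complaint)  # [1, 1, 1, 1]
```

Both sentences produce the same vector even though they express opposite opinions. Models that look at neighboring words together, such as CNNs, can tell them apart.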

What are Convolutional Neural Networks and their effectiveness for NLP

Convolutional Neural Networks (CNNs) were originally designed to perform deep learning for computer vision tasks, and have proven highly effective. They use the concept of a “convolution”: a sliding window, or “filter”, that passes over the image and detects important features one region at a time. Pooling layers then reduce these features to their essential characteristics, and the process repeats from layer to layer.


This approach turned out to work well for NLP too. In 2014, Yoon Kim published an influential research paper on using CNNs for sentence classification. He tested four CNN variations, and showed that CNN models could outperform previous approaches on several classification tasks:

Convolutional Neural Networks for Sentence Classification: Research Results


Kim tested the models on established benchmark datasets (indicated as acronyms at the top of the table). The tasks these datasets represent are still some of the most common uses of CNNs for sentence classification today:

  • Classifying positive/negative movie reviews
  • Classifying sentences as subjective or objective
  • Classifying questions into types (about a person, location, numerical information, etc.)
  • Identifying positive/negative product reviews
  • Detecting opinion polarity


Of course, CNNs are not limited to these cases and can be used for any single- or multi-label classification problem on textual inputs.
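The difference between single-label and multi-label classification shows up in how the model's raw output scores are turned into a decision. The sketch below assumes hypothetical labels and scores; `single_label` and `multi_label` are illustrative helper names, and the 0.5 threshold is a common default rather than a fixed rule.

```python
import math

def sigmoid(x):
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def single_label(scores, labels):
    """Exactly one label per text: pick the highest-scoring label."""
    return labels[scores.index(max(scores))]

def multi_label(scores, labels, threshold=0.5):
    """Any number of labels per text: judge each label independently."""
    return [label for score, label in zip(scores, labels)
            if sigmoid(score) >= threshold]

# Hypothetical labels and raw model outputs, for illustration only.
labels = ["billing", "urgent", "complaint"]
scores = [2.0, 0.3, -1.5]

print(single_label(scores, labels))  # billing
print(multi_label(scores, labels))   # ['billing', 'urgent']
```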

How Convolutional Networks perform text classification

Below is a typical CNN architecture used for text processing. It starts with an input sentence broken up into words, with each word usually mapped to a word embedding: a low-dimensional vector representation generated by models like word2vec or GloVe.


The word vectors are fed into a convolutional layer, which extracts features from groups of neighboring words. The results of the convolution are “pooled”, or aggregated, into one representative number per filter. These numbers are fed to a fully connected layer, which makes a classification decision based on the weights assigned to each feature within the text.


How Convolutional Networks Perform Text Classification


Example of convolutional process on text vectors

In a CNN, text is organized into a matrix, with each row representing a word (usually as its embedding vector) or a character. The CNN’s convolutional layer “scans” the text like it would an image, sliding filters over groups of adjacent rows to detect local features, such as short phrases, that help predict the label.
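As a rough sketch of how a sentence becomes a matrix, the snippet below looks up each word in a small embedding table. The three-dimensional vectors are made up for illustration, and `sentence_to_matrix` is a hypothetical helper. Padding to a fixed length with zero rows is an assumption here; it is a common convention, not something the article prescribes.

```python
# Hypothetical 3-dimensional embeddings. Real word2vec or GloVe vectors
# typically have 50 to 300 dimensions and are learned from large corpora.
EMBEDDINGS = {
    "i":     [0.1, 0.3, -0.2],
    "like":  [0.7, 0.1, 0.4],
    "this":  [0.0, -0.1, 0.2],
    "movie": [0.5, 0.6, -0.3],
}
ZERO_ROW = [0.0, 0.0, 0.0]  # used for padding and for unknown words

def sentence_to_matrix(sentence, max_len):
    """Map a sentence to a max_len x 3 matrix, one row per word."""
    rows = [EMBEDDINGS.get(token, ZERO_ROW) for token in sentence.lower().split()]
    rows = rows[:max_len]                       # truncate long sentences
    rows += [ZERO_ROW] * (max_len - len(rows))  # zero-pad short sentences
    return rows

matrix = sentence_to_matrix("I like this movie", max_len=5)
for row in matrix:
    print(row)
```

The result has five rows (four words plus one padding row) and three columns, which is the shape a convolutional filter would slide over.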


The following image illustrates how the convolutional “filter” slides over a sentence, two words at a time. At each position, it multiplies the embedding values of the words in the window, element by element, by the weights assigned to the convolutional filter.
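The sliding-window computation can be sketched in a few lines of plain Python. The word vectors and filter weights below are invented so the arithmetic is easy to follow, and `conv1d_text` is a hypothetical helper, not a library function. Deep learning frameworks perform the same operation far more efficiently, usually adding a bias term and a non-linearity.

```python
def conv1d_text(matrix, filt):
    """Slide `filt` over `matrix` one word at a time.

    matrix: list of word vectors, one row per word.
    filt:   list of weight rows; its length is the window size in words.
    Returns one value per window position (the filter's feature map).
    """
    window = len(filt)
    feature_map = []
    for start in range(len(matrix) - window + 1):
        total = 0.0
        for word_vec, weight_vec in zip(matrix[start:start + window], filt):
            # Element-wise product of embedding values and filter weights, summed.
            total += sum(x * w for x, w in zip(word_vec, weight_vec))
        feature_map.append(total)
    return feature_map

# Hypothetical 4-word sentence with 2-dimensional word vectors.
matrix = [[1.0, 0.0],
          [0.5, 0.5],
          [0.0, 1.0],
          [1.0, 1.0]]

# Hypothetical filter covering two words at a time.
filt = [[0.5, -0.5],
        [0.25, 0.5]]

feature_map = conv1d_text(matrix, filt)
print(feature_map)  # [0.875, 0.5, 0.25]
```

A four-word sentence and a two-word window give three window positions, so the feature map has three values.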


Example of Convolutional Process Applied to Text


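The sum of the products is the filter’s output for the current window (0.51 and 0.53 in the example). The outputs for all window positions together form a feature map. The “pooling” stage then reduces each feature map to a single value, typically its maximum. This cuts the dimensionality of the word features and retains only a simple score that reflects how strongly the filter’s pattern appears in the sentence. This score is an activation, not yet a probability.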
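A common pooling choice for text CNNs is max-over-time pooling, which keeps the largest value each filter produced anywhere in the sentence. In the sketch below, only 0.51 and 0.53 come from the example above; the other values and the `max_over_time` helper are hypothetical.

```python
def max_over_time(feature_map):
    """Keep only the strongest response of one filter across the sentence."""
    return max(feature_map)

# Hypothetical feature maps from three filters over the same 5-word sentence.
# Wider windows fit in fewer positions, so the maps have different lengths.
feature_maps = [
    [0.51, 0.53, 0.12, 0.40],  # window of 2 words
    [0.08, 0.91, 0.33],        # window of 3 words
    [0.27, 0.19],              # window of 4 words
]

pooled = [max_over_time(fm) for fm in feature_maps]
print(pooled)  # [0.53, 0.91, 0.27]
```

Pooling yields exactly one number per filter, so sentences of any length end up as a fixed-size vector for the next layer.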


At the final stage, these scores are the inputs to a fully connected neural layer. During training, backpropagation adjusts the weights of the whole network, including the convolutional filters and the fully connected layer, to reduce classification errors. Each output neuron corresponds to one label (for example, “positive sentiment” or “negative sentiment”) and computes a weighted sum of the pooled features. Typically, a softmax function turns these sums into probabilities, and the label with the highest probability is the classification decision.
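The final decision step can be sketched as a single dense layer followed by a softmax. All numbers below are hypothetical stand-ins for pooled features and learned weights, and `fully_connected` is an illustrative helper. In a real model the weights come from training, not from hand-picked values.

```python
import math

def fully_connected(features, weights, biases):
    """One dense layer: a weighted sum of the features for each label."""
    return [sum(f * w for f, w in zip(features, row)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

LABELS = ["negative sentiment", "positive sentiment"]

# Hypothetical pooled features and parameters, for illustration only.
pooled = [0.53, 0.91, 0.27]
weights = [[-1.0, -2.0, 0.5],   # weights for "negative sentiment"
           [1.0, 2.0, -0.5]]    # weights for "positive sentiment"
biases = [0.0, 0.0]

probabilities = softmax(fully_connected(pooled, weights, biases))
prediction = LABELS[probabilities.index(max(probabilities))]
print(prediction)  # positive sentiment
```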


Example of Convolutional Process Applied to Text 2

Automating Operational Aspects of CNNs with MissingLink

In this article, we explained the basics of sentence classification, and how it can be achieved by Convolutional Neural Networks. When you start working on CNN projects, using deep learning frameworks like TensorFlow, Keras, and PyTorch to process and classify bodies of text, you’ll run into some practical challenges:

  • Tracking experiments, source code, configuration, and hyperparameters. Convolutional networks can have many hyperparameters and structural variations. You’ll need to run many experiments, up to hundreds or thousands for a single project, to find the model variation that yields the best performance. Organizing, tracking, and sharing experiment data and results can be a challenge.
  • Scaling experiments on-premise or in the cloud. CNNs are computationally intensive and typically run on expensive, specialized hardware. In real projects, you may need to scale experiments across multiple machines. Provisioning these machines and ensuring they are fully utilized is challenging. Too often, you may find yourself “babysitting” machines to make sure experiments are running as planned.
  • Managing training data. CNN projects can involve very large bodies of text from social media or other sources. You may need to create multiple versions of these datasets to experiment with different word embeddings or preprocessing options. Copying these datasets to training machines, and re-copying them every time they change, takes time and is highly error-prone.

MissingLink is a deep learning platform that does all of this for you and lets you concentrate on building the most accurate model.
