Conditional Random Fields: A 2021 Overview

Ajay Ohri


Imagine you have a sequence of snapshots and want to label each one with the activity it shows. How would you do that?

One way is to ignore the sequential nature of the snapshots and build a per-image classifier. However, by ignoring the sequence you may lose critical information. To improve the accuracy of the image labeler, one can incorporate the labels of the neighboring snapshots, and this is precisely what a conditional random field does. News websites and other platforms generate large volumes of data and text content every hour, and analyzing it can become a daunting task for professionals who lack the right tools. Conditional random fields, which are readily usable from Python, are one approach that makes entity recognition in such text possible.

In this article, we will explore conditional random fields (CRFs) in more depth.

  1. An introduction to conditional random fields & Markov random fields
  2. Generative vs discriminative models
  3. How does machine learning with python work?
  4. What is machine learning with python: Sequence models for CRF?
  5. Gibbs notation
  6. CRF theory and likelihood optimization
  7. Applications of CRF

1. An introduction to conditional random fields & Markov random fields

A conditional random field is a class of discriminative model suited to prediction tasks in which contextual information and the state of the neighbors influence the current prediction. Conditional random fields are applied to noise reduction, gene prediction, object detection, and named entity recognition, to name a few. To understand what conditional random fields are, it is important to first understand probabilistic graphical models, which have a wide range of applications such as recognizing parts of an image, gene prediction, and many more. CRFs are used extensively in natural language processing (NLP), especially for named entity recognition, neural sequence labeling, and part-of-speech tagging. A conditional random field is the appropriate choice when information about neighboring labels is useful for computing the label of a single item in a sequence.

A graphical model is a probabilistic model that uses a graph to express the conditional dependence between random variables. There are two common kinds of graphical model: Markov random fields and Bayesian networks. Markov random fields are undirected graphs that can be cyclic, whereas Bayesian networks are directed acyclic graphs. Conditional random fields fall under the Markov random field category; the Markov random field is the abstraction on which conditional random fields are built. The structure of the graph in a Markov random field determines the dependence or independence between the random variables.

2. Generative vs discriminative models

  • Generative models describe how a label vector ‘Y’ can probabilistically generate the feature vector ‘X’.
  • Discriminative models describe how to take a feature vector ‘X’ and assign it a label vector ‘Y’.
  • Discriminative models learn the decision boundary between classes directly.
  • The most common example of a discriminative model is logistic regression, which is fitted by maximizing the conditional likelihood of the labels given the features.
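To make the discriminative idea concrete, here is a minimal sketch of logistic regression fitted by gradient ascent on the conditional log-likelihood of Y given X. The toy data, learning rate, and function names are assumptions made for illustration and are not part of the original article.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(xs, ys, lr=0.2, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by gradient ascent on the
    average conditional log-likelihood of the labels."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the log-likelihood: sum of (label - predicted probability).
        grad_w = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((y - sigmoid(w * x + b)) for x, y in zip(xs, ys)) / n
        w += lr * grad_w
        b += lr * grad_b
    return w, b

# Hypothetical one-dimensional data; the classes overlap slightly,
# so the maximum-likelihood weights stay finite.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]

w, b = train_logistic_regression(xs, ys)

def predict(x):
    """Assign the label on whichever side of the decision boundary x falls."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0
```

The model never describes how X itself is generated; it only learns where the boundary between the two labels lies, which is the defining trait of a discriminative model.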

3. How does machine learning with python work?

Before looking at how CRFs are used for machine learning with Python, it is important to understand entity recognition and text classification.

Entity recognition has seen a surge in adoption within natural language processing (NLP). An entity is a segment of text that is of interest to the data scientist or other professional. Frequently extracted entities include names, addresses, locations, and account numbers. These are only examples, and users can define any other entity type they need. In practice, the algorithm automatically classifies segments of the data set into these entity types.

Several approaches are available for identifying such patterns in text. Let us go through a few of them:

  • Regular expressions (RegEx): regular expressions are a form of finite-state automaton. They are extremely helpful for identifying patterns that follow a specific structure; for instance, phone numbers, names, and email addresses can easily be identified with regular expressions. However, there is a downside: the user must know in advance all the keywords and formats that can occur around the entity, such as every phrase that may precede a phone number. This makes it a brute-force approach.
  • Hidden Markov model (HMM): the hidden Markov model is a sequence modeling algorithm that identifies and learns patterns. It considers the upcoming observations around the entities to learn a pattern, but it assumes that the features are independent of one another. In terms of performance, it is not considered the best method for entity recognition.
  • MaxEnt Markov model (MEMM): the maximum-entropy Markov model is also a sequence modeling algorithm. It does not assume that the features are independent of each other, but it does not consider the upcoming observations when learning the pattern. In terms of performance, it is not regarded as the best method for identifying entities either.
  • Conditional random fields (CRF): conditional random fields are a sequence modeling algorithm that does not assume that the features are independent of each other and also considers the upcoming observations when learning the pattern. The model combines the strengths of the HMM and the MEMM. In terms of performance, it is regarded as one of the best methods for entity recognition.
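As an illustration of the first approach in the list, the sketch below extracts email addresses and phone numbers with Python's standard `re` module. The two patterns and the sample sentence are assumptions chosen for the example; they cover one simple format each, which is exactly the limitation described above.

```python
import re

# Hypothetical patterns: each one recognizes a single, rigid format.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def extract_entities(text):
    """Return every email address and phone number matched in the text."""
    return {
        "email": EMAIL_PATTERN.findall(text),
        "phone": PHONE_PATTERN.findall(text),
    }

sample = "Contact Jane at jane.doe@example.com or call 555-123-4567."
entities = extract_entities(sample)
```

A phone number written as "(555) 123 4567" would be missed until another pattern is added by hand, whereas a sequence model such as a CRF learns from labeled context instead.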

4. What is machine learning with python: Sequence models for CRF?

The prime objective of a conditional random field is to make task-specific predictions: given an input X, it predicts a label Y from a predefined set. A conditional random field (CRF) is a probabilistic discriminative model with applications in computer vision, NLP, and bioinformatics. CRFs are used to analyze and predict sequences in which contextual information improves the prediction, and they are particularly effective when a sequence model involves many interdependent variables. To see why, consider Named Entity Recognition (NER), a core NLP task: detecting entities in text and classifying them as, for example, an organization, location, or person. The main difficulty is that many entities are too rare to appear in the training data, so the model has to detect them from their context. The naive approach to this problem classifies each word individually, which assumes that the labels are independent of one another.

To handle this problem, we can use conditional random fields, in which both the input and the output are sequences, so the previous context can be taken into account when predicting a data point. For this purpose, we use a feature function that takes multiple input values and is defined as follows:

f(X, i, Yᵢ₋₁, Yᵢ)

where

X = the set of input vectors

i = the position of the data point that we want to predict

Yᵢ₋₁ = the label of data point i-1 in X

Yᵢ = the label of data point i in X
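The sketch below shows what feature functions with this signature might look like for a named entity task. The two functions, their weights, and the label names "PERSON" and "O" are hypothetical and chosen only to illustrate the four arguments defined above.

```python
TITLES = {"Mr.", "Ms.", "Dr."}

def f_title_then_person(X, i, y_prev, y_curr):
    """Fires when the previous word is a title and the current label is PERSON."""
    return 1.0 if i > 0 and X[i - 1] in TITLES and y_curr == "PERSON" else 0.0

def f_person_continues(X, i, y_prev, y_curr):
    """Fires when a capitalized word continues a PERSON span."""
    return 1.0 if y_prev == "PERSON" and y_curr == "PERSON" and X[i][0].isupper() else 0.0

# Hypothetical weights; in a trained CRF these are learned from data.
FEATURES = [(f_title_then_person, 2.0), (f_person_continues, 1.5)]

def sequence_score(X, Y):
    """Sum the weighted feature functions over every position of the sequence."""
    total = 0.0
    for i in range(len(X)):
        y_prev = Y[i - 1] if i > 0 else "START"
        for f, weight in FEATURES:
            total += weight * f(X, i, y_prev, Y[i])
    return total

X = ["Dr.", "Ada", "Lovelace", "wrote", "programs"]
with_names = ["O", "PERSON", "PERSON", "O", "O"]
without_names = ["O", "O", "O", "O", "O"]
```

Because each function sees the whole input X together with the previous and current labels, a labeling that marks "Ada Lovelace" as a person scores higher than one that ignores the title before it.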

5. Gibbs notation

By working with the factors in log space, one can represent the joint distribution in Gibbs notation. Setting β(dⱼ) = log(ϕ(dⱼ)), where ϕ(dⱼ) is a factor over the set of variables dⱼ, the joint becomes P(X) = exp(Σⱼ β(dⱼ)) / Z, where X is the set of all random variables in the graph and Z is the normalizing constant. The β functions are known as factor potentials.

Gibbs notation is used in the CRF derivations that follow, including the likelihood maximization problem.
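A small numerical check can make the change of notation tangible. The sketch below uses two hypothetical factors over two binary variables and confirms that normalizing the product of factors gives the same joint as normalizing the exponential of the summed log factors. The factor values are arbitrary assumptions.

```python
import math
from itertools import product

# Hypothetical factors for a toy graph with two binary variables A and B.
phi_a = {0: 1.0, 1: 3.0}
phi_ab = {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}

def joint_from_factors():
    """Joint as a normalized product of factors."""
    unnormalized = {(a, b): phi_a[a] * phi_ab[(a, b)]
                    for a, b in product((0, 1), repeat=2)}
    z = sum(unnormalized.values())
    return {k: v / z for k, v in unnormalized.items()}

def joint_from_gibbs():
    """Joint in Gibbs notation: exp of summed factor potentials beta = log(phi)."""
    beta_a = {k: math.log(v) for k, v in phi_a.items()}
    beta_ab = {k: math.log(v) for k, v in phi_ab.items()}
    unnormalized = {(a, b): math.exp(beta_a[a] + beta_ab[(a, b)])
                    for a, b in product((0, 1), repeat=2)}
    z = sum(unnormalized.values())
    return {k: v / z for k, v in unnormalized.items()}

p_factors = joint_from_factors()
p_gibbs = joint_from_gibbs()
```

Working with sums of potentials instead of products of factors is what makes the log-likelihood and its gradient convenient to write down.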

6. CRF theory and likelihood optimization

To better understand conditional random fields and likelihood optimization in a CRF, it is crucial to first define the parameters. This gives a precise answer to ‘what is a conditional random field’, and the equations can then be developed from the defined parameters using Gibbs notation.

  1. Label domain: assume that the random variables in set Y have the domain {m ϵ ℕ | 1 ≤ m ≤ M}, i.e. the first M natural numbers.
  2. Evidence structure and domain: assume that the random variables in set X are real-valued vectors of size S, i.e. ∀ Xᵢ ϵ X, Xᵢ ϵ Rˢ.
  3. Let the length of the CRF chain be L, i.e. L labels and L evidence variables.
  4. Let βᵢ(Yᵢ, Yⱼ) = Wcc’ if Yᵢ = c, Yⱼ = c’ and j = i+1, and 0 otherwise.
  5. Let β’ᵢ(Yᵢ, Xᵢ) = W’c · Xᵢ if Yᵢ = c, and 0 otherwise, where ‘·’ represents the dot product, i.e. W’c ϵ Rˢ.
  6. The total number of parameters is M x M + M x S: there is one parameter for each label transition (M x M possible label transitions) and S parameters for each label (M possible labels), which are multiplied with the observation variable (a vector of size S) at that label.
  7. Let D = {(xⁿ, yⁿ)} for n = 1 to N be the training data comprising N examples.
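The parameterization in the list above can be sketched in a few lines. The weight values, the toy sizes M = 3 and S = 2, and the function names are assumptions for illustration; labels are numbered from 0 to M-1 in the code rather than 1 to M. The normalizer is computed by enumerating every label sequence, which is only feasible for very short chains.

```python
import math
from itertools import product

M, S = 3, 2  # toy sizes: M labels, feature vectors of size S

# Hypothetical parameters: W[c][c2] is the transition weight Wcc',
# W_emit[c] is the size-S weight vector W'c for label c.
W = [[0.5, -0.2, 0.0],
     [0.1, 0.3, -0.4],
     [-0.3, 0.2, 0.6]]
W_emit = [[1.0, -1.0],
          [0.0, 0.5],
          [-0.5, 1.0]]

def num_parameters(m, s):
    """M x M transition weights plus S emission weights for each of M labels."""
    return m * m + m * s

def score(x, y):
    """Sum of all factor potentials for labels y given evidence x."""
    emission = sum(sum(w * v for w, v in zip(W_emit[y[i]], x[i]))
                   for i in range(len(y)))
    transition = sum(W[y[i]][y[i + 1]] for i in range(len(y) - 1))
    return emission + transition

def conditional_probability(x, y):
    """P(y | x): exponentiated score divided by the sum over all label sequences."""
    z = sum(math.exp(score(x, other))
            for other in product(range(M), repeat=len(x)))
    return math.exp(score(x, y)) / z

x = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]  # chain of length L = 4
y = [0, 0, 2, 2]
```

For this example the emission terms add up to 2.5 and the transition terms to 1.1, so `score(x, y)` is 3.6 before normalization.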

With this setup in mind, the likelihood is as follows:

[Equation: likelihood expression for the CRF model]

The training problem therefore reduces to maximizing the log-likelihood with respect to all model parameters Wcc’ and W’cs. The gradient of the log-likelihood with respect to W’cs is:

[Equation: derivative of the log-likelihood with respect to W’cs]

The second term in the gradient equation is the marginal probability of y’ᵢ being equal to c (summed over all values that y’₋ᵢ can take), weighted by xⁿᵢₛ. Here y’₋ᵢ denotes the set of label variables at every position except position i.
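For reference, the expressions discussed in this section can be written out from the parameter definitions in the numbered list. This is a reconstruction under those definitions (linear chain, transition weights Wcc’, emission weights W’c), not a copy of the article's original equations.

```latex
% Conditional likelihood of one labelled sequence
P(y \mid x) = \frac{1}{Z(x)}
  \exp\!\Big( \sum_{i=1}^{L-1} W_{y_i y_{i+1}} + \sum_{i=1}^{L} W'_{y_i} \cdot x_i \Big),
\qquad
Z(x) = \sum_{y'} \exp\!\Big( \sum_{i=1}^{L-1} W_{y'_i y'_{i+1}} + \sum_{i=1}^{L} W'_{y'_i} \cdot x_i \Big)

% Log-likelihood over the training data D
\ell(W, W') = \sum_{n=1}^{N} \log P(y^n \mid x^n)

% Gradient with respect to one emission weight
\frac{\partial \ell}{\partial W'_{cs}}
  = \sum_{n=1}^{N} \sum_{i=1}^{L}
    \Big( \mathbb{1}[y^n_i = c]\, x^n_{is} \;-\; P(y'_i = c \mid x^n)\, x^n_{is} \Big),
\qquad
P(y'_i = c \mid x^n) = \sum_{y'_{-i}} P(y'_i = c,\, y'_{-i} \mid x^n)
```

The first term counts how often label c is observed with feature s in the training data, and the second term is the count the current model expects, so the gradient vanishes when the two agree.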

  • Model training and evaluation

Using a CRF implementation and training scripts, one can train conditional random field models on a training set consisting of many labeled words and then measure the model's accuracy on a held-out test set.
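As a rough sketch of what such training involves, the code below fits the transition and emission weights of a tiny linear-chain CRF by gradient ascent on the log-likelihood. The two training sequences, the learning rate, and all function names are assumptions for illustration. Expected counts are obtained by enumerating every label sequence, which works only at toy scale; practical implementations use dynamic programming instead.

```python
import math
from itertools import product

M, S = 2, 2  # number of labels and feature size (toy values)

# Hypothetical training data D: label 0 goes with [1, 0], label 1 with [0, 1].
DATA = [
    ([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [0, 0, 1]),
    ([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]], [1, 1, 0]),
]

def score(W, W_emit, x, y):
    emit = sum(sum(w * v for w, v in zip(W_emit[y[i]], x[i])) for i in range(len(y)))
    trans = sum(W[y[i]][y[i + 1]] for i in range(len(y) - 1))
    return emit + trans

def posterior(W, W_emit, x):
    """P(y | x) for every label sequence y, by explicit enumeration."""
    seqs = list(product(range(M), repeat=len(x)))
    weights = [math.exp(score(W, W_emit, x, y)) for y in seqs]
    z = sum(weights)
    return {y: w / z for y, w in zip(seqs, weights)}

def log_likelihood(W, W_emit):
    return sum(math.log(posterior(W, W_emit, x)[tuple(y)]) for x, y in DATA)

def gradients(W, W_emit):
    """Observed counts minus expected counts for both weight tables."""
    gW = [[0.0] * M for _ in range(M)]
    gE = [[0.0] * S for _ in range(M)]
    for x, y in DATA:
        for i in range(len(y)):  # observed counts from the gold labels
            for s in range(S):
                gE[y[i]][s] += x[i][s]
            if i + 1 < len(y):
                gW[y[i]][y[i + 1]] += 1.0
        for y2, p in posterior(W, W_emit, x).items():  # expected counts
            for i in range(len(y2)):
                for s in range(S):
                    gE[y2[i]][s] -= p * x[i][s]
                if i + 1 < len(y2):
                    gW[y2[i]][y2[i + 1]] -= p
    return gW, gE

def train(steps=200, lr=0.1):
    W = [[0.0] * M for _ in range(M)]
    W_emit = [[0.0] * S for _ in range(M)]
    for _ in range(steps):
        gW, gE = gradients(W, W_emit)
        for c in range(M):
            for c2 in range(M):
                W[c][c2] += lr * gW[c][c2]
            for s in range(S):
                W_emit[c][s] += lr * gE[c][s]
    return W, W_emit

def predict(W, W_emit, x):
    """Most probable label sequence under the trained model."""
    post = posterior(W, W_emit, x)
    return list(max(post, key=post.get))

initial_ll = log_likelihood([[0.0] * M for _ in range(M)], [[0.0] * S for _ in range(M)])
W, W_emit = train()
final_ll = log_likelihood(W, W_emit)
```

Here the model is checked only against its own training sequences; a real evaluation would compare `predict` against labels from a separate test set, as described above.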

7. Applications of CRF

Now that we have covered an introduction to conditional random fields and their major concepts, we can look at how CRF models are applied to sequence data. Conditional random fields are used to label each word in a sentence with its part of speech (POS). A similar application is the recognition of titles and names and the extraction of proper nouns from sentences. In each case, conditional random fields are used to predict sequences in which multiple variables depend on one another. Other applications include gene prediction, recognition of parts in images, and many more. There are also several variants of the CRF structure, such as dynamic conditional random fields for labeling sequence data and hidden CRFs for gesture recognition.


To conclude, the sequence of tokens and words is important when identifying entities (parts of text). Pattern recognition approaches such as regular expressions, and graph-based models such as the maximum-entropy Markov model and the hidden Markov model, are useful for identifying entities. Conditional random fields, however, are arguably the most helpful for entity recognition. A conditional random field is an undirected graph-based model and a class of statistical modeling method.

