Negative binomial regression is a method quite similar to multiple regression, with one distinction: in negative binomial regression, the dependent variable, Y, follows the negative binomial distribution. As a result, the values of Y are non-negative integer counts.
When the mean of the count is less than its variance, negative binomial regression is used to test for connections between confounding and predictor variables and a count outcome variable. Negative binomial regression is most commonly used to model overdispersed count outcome variables.
Examples of Negative Binomial Regression
Example 1: Administrators at two schools are studying the attendance habits of high school juniors. The number of missed days is predicted by the score on a standardized math test and by the type of program in which each student is enrolled.
Description of the Data
Let's look at an example to help you understand. Assume that 314 high school juniors are in the sample. This information was gathered from two urban schools and is saved as the negative binomial regression data set. The response variable of interest is daysabs, the number of days absent. The variable math records each pupil's standardized math score, and prog identifies the type of program in which the student is enrolled.
So, let's look at the descriptive plots and stats.
library(foreign)  # provides read.dta
dat <- read.dta("http://www.simplilearn.com/Data/Negative binomial regression_data.dta")
dat <- within(dat, {
  prog <- factor(prog, levels = 1:3, labels = c("General", "Academic", "Vocational"))
  id <- factor(id)
})
summary(dat)
Output:
summarize daysabs math
| Variable | Obs | Mean | Std. Dev. | Min | Max |
| --- | --- | --- | --- | --- | --- |
| daysabs | 314 | 5.9 | 7.03 | 0 | 35 |
| math | 314 | 48.2 | 25.6 | 1 | 99 |
As you can see, each of these variables has valid data, and their distributions appear fairly sensible. The outcome's mean is lower than its variance. Now let's discuss the variables. The table above shows the average number of days students are absent by program type. It suggests that program type is one of the strongest predictors of the number of days missed, because the mean value fluctuates from program to program. The variances within each level of prog are greater than the means within those levels. These disparities indicate overdispersion and that a negative binomial model should be used.
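To make the overdispersion check concrete, here is a minimal Python sketch. It uses a made-up miniature data set (only summary statistics of the real 314-row data are shown above); the column names daysabs and prog follow the example.

```python
import pandas as pd

# Made-up miniature stand-in for the absenteeism data; the real set has
# 314 rows with the columns daysabs, math, and prog described above.
df = pd.DataFrame({
    "prog": ["General", "Academic", "Academic",
             "Vocational", "Vocational", "General"],
    "daysabs": [10, 1, 8, 0, 2, 25],
})

# Mean and variance of the outcome within each program level; a variance
# well above the mean within a level signals overdispersion.
stats = df.groupby("prog")["daysabs"].agg(["mean", "var"])
print(stats)

# Overall check: is the variance of the count larger than its mean?
overdispersed = df["daysabs"].var() > df["daysabs"].mean()
print("Overdispersed:", overdispersed)
```

On the real data you would expect the same pattern: the variance of daysabs exceeds its mean, both overall and within each program level.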
Analysis Methods You Might Consider
There are various analysis methods available for this type of study. The following are a few of them:

Negative Binomial Regression
It can be used whenever the data are overdispersed; in layman's terms, when the conditional mean is smaller than the conditional variance. Negative binomial regression and Poisson regression share some similarities because both methods model the mean of the count with the same structure.

Poisson Regression Method
The Poisson regression method is used to model count data under the assumption that the conditional mean equals the conditional variance.

Zero Inflated Models
These models are used when the model needs to account for excess zeros in the data.

OLS Regression
When count outcomes are log-transformed, they can be analyzed with the OLS regression approach. However, OLS regression on transformed counts has drawbacks, such as loss of data (for instance, zero counts cannot be log-transformed).
Negative Binomial Regression Analysis
The "nbreg" command estimates the negative binomial regression model. Before the variable "prog," there is an "i." prefix, which indicates that the variable is a categorical (factor) variable; such variables should be included in the model as indicator variables.
Fitting Poisson model:
Iteration 0: log likelihood = -1328.67
Iteration 1: log likelihood = -1328.64
Iteration 2: log likelihood = -1328.64

Fitting constant-only model:
Iteration 0: log likelihood = -899.2
Iteration 1: log likelihood = -896.472
Iteration 2: log likelihood = -896.473
Iteration 3: log likelihood = -896.472

Fitting full model:
Iteration 0: log likelihood = -870.4
Iteration 1: log likelihood = -865.9
Iteration 2: log likelihood = -865.6
Iteration 3: log likelihood = -865.6
Iteration 4: log likelihood = -865.6

Negative binomial regression: Number of obs = 314; LR chi2(3) = 61.69; Prob > chi2 = 0.0000; Pseudo R2 = 0.03; Dispersion = mean; Log likelihood = -865.6
Likelihood-ratio test of alpha = 0: chibar2(01) = 926.03; Prob >= chibar2 = 0.000
The output starts with the iteration log. It shows the fitting of the Poisson model, the constant-only (null) model, and the full negative binomial model. The last number in the iteration log is the final log-likelihood of the full model.
The header then shows the number of observations (314), followed by the likelihood-ratio chi-square and its p-value for the model as a whole. From the p-value you can conclude that this model is statistically significant. The header also includes a pseudo-R-squared, which in this case is 0.03.
Other points to be considered:
Negative binomial regression relies on maximum likelihood, so it is not recommended for very small samples.
Zero-inflated approaches should be utilized when excess zeros are present.
If the data-generating process does not allow zeros, you should use a zero-truncated model.
The outcome variable in negative binomial regression analysis should be a non-negative integer count. The exposure variable, if used, can't be 0.
A negative binomial regression can also be run using the command "glm," with the log link and the negative binomial family.
The pseudo-R-squared can be measured in a variety of ways. Each metric gives information similar to, but not identical to, the R-squared of OLS regression.
Motivation for Using the Negative Binomial Regression Model
First, we will look at real-world data and analyze it.
Next, we will define our regression strategy.
Then we will fit the negative binomial regression model and generate predictions.
After that, we will implement the method in Python.
Finally, we'll see whether the negative binomial regression model's performance is superior to that of the Poisson model.
Regression Goal
The following data set records daily bicyclist counts on several New York City bridges.
| Date | Day | High Temp | Low Temp | Precipitation | Queensboro Bridge | Manhattan Bridge | Brooklyn Bridge | Williamsburg Bridge | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 6/1 | Friday | 79.2 | 61 | 0.01 | 3568 | 7687 | 3456 | 6560 | 21,271 |
| 6/2 | Saturday | 78 | 62.1 | 0.02 | 3278 | 4557 | 6543 | 5431 | 19,809 |
| 6/3 | Sunday | 78.3 | 61.6 | 0.00 | 2689 | 4323 | 7896 | 8905 | 23,813 |
| 6/4 | Monday | 78.2 | 65.3 | 0.00 | 1905 | 6578 | 4567 | 5678 | 18,728 |
| 6/5 | Tuesday | 77 | 67.4 | 0.01 | 2070 | 7778 | 6547 | 4567 | 20,962 |
| 6/6 | Wednesday | 78.3 | 66 | 0.02 | 1093 | 5436 | 7865 | 8709 | 23,103 |
Our Regression Strategy
We will focus on the Queensboro Bridge. Using negative binomial regression, we will forecast the number of bicyclists crossing the Queensboro Bridge on a given day. The first step is to define our variables.
y is the vector of counts for days 1 to n: y = [y_1, y_2, y_3, ..., y_n], where y_i is the total number of bicyclists on day i.
X denotes the matrix of regression variables. Because the data set contains n independent observations, each with values for m regression variables, the matrix X has size (n x m).
λ is the vector of event rates, of size (n x 1). It holds n rates [λ_1, λ_2, ..., λ_n], which correspond to the n counts in the y vector: the observed count y_i is assumed to be driven by the rate λ_i for observation i. The λ column is missing from the provided data; it is a derived variable.
Matrix X (regression variables) and vector y (Queensboro Bridge counts):

| Date | Day | High Temp | Low Temp | Precipitation | Queensboro Bridge (y) |
| --- | --- | --- | --- | --- | --- |
| 6/1 | Friday | 79.2 | 61 | 0.01 | 3568 |
| 6/2 | Saturday | 78 | 62.1 | 0.02 | 3278 |
| 6/3 | Sunday | 78.3 | 61.6 | 0.00 | 2689 |
| 6/4 | Monday | 78.2 | 65.3 | 0.00 | 1905 |
| 6/5 | Tuesday | 77 | 67.4 | 0.01 | 2070 |
| 6/6 | Wednesday | 78.3 | 66 | 0.02 | 1093 |
We will test the model's performance after training using holdout test data that the model hasn't seen during training.
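A minimal sketch of such a holdout split, assuming the data live in a pandas DataFrame with columns like those in the table above (the underscore column names, the 80/20 split fraction, and the random seed are assumptions, not from the original tutorial):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the bicyclist DataFrame; the real data would be
# loaded from the full counts table.
df = pd.DataFrame({
    "HIGH_T": [79.2, 78.0, 78.3, 78.2, 77.0, 78.3],
    "LOW_T": [61.0, 62.1, 61.6, 65.3, 67.4, 66.0],
    "PRECIP": [0.01, 0.02, 0.0, 0.0, 0.01, 0.02],
    "BB_COUNT": [3568, 3278, 2689, 1905, 2070, 1093],
})

# Randomly mark ~80% of rows for training; the rest form the holdout set
# that the model never sees during training.
np.random.seed(42)
mask = np.random.rand(len(df)) < 0.8
df_train = df[mask]
df_test = df[~mask]
print(len(df_train), len(df_test))
```

The resulting df_train and df_test frames are the ones used in the Python steps below.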
In negative binomial regression, we have to define the dispersion parameter α. The variance is modeled as a function of the mean:
Variance = mean + α * mean^p
When the value of p is 1:
Variance = mean + α * mean = (1 + α) * mean
This is the NB1 model.
When the value of p is 2:
Variance = mean + α * mean²
This is the NB2 model, and that is the one we will implement.
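To see how differently the two variance functions behave, here is a small numeric illustration (the mean values and α below are made up):

```python
# Made-up dispersion parameter, purely for illustration.
alpha = 0.5

def nb1_variance(mean, alpha):
    # NB1: variance = mean + alpha * mean, grows linearly with the mean.
    return mean + alpha * mean

def nb2_variance(mean, alpha):
    # NB2: variance = mean + alpha * mean**2, grows quadratically.
    return mean + alpha * mean ** 2

for mean in [1.0, 5.0, 20.0]:
    print(mean, nb1_variance(mean, alpha), nb2_variance(mean, alpha))
```

For a mean of 20 and α = 0.5, NB1 gives a variance of 30 while NB2 gives 220, which is why NB2 accommodates strongly overdispersed data much more readily.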
Finding the Value of α
To estimate α, we will use an auxiliary Ordinary Least Squares (OLS) regression with no constant term:
Y = B1 * x
Once we've applied this auxiliary regression to the data using the OLS approach, the fitted slope B1 gives us the value of α.
We first fit the Poisson regression model to our data set to determine the rates λ_i.
All of the components of the NB2 regression strategy are now in place. Let's take a look at the big picture.
Steps to Perform Negative Binomial Regression in Python
Step 1: Fit the Poisson regression model on the training data set.
First, set up the regression expression. It tells patsy that BB_COUNT is the dependent variable and that DAY, DAY_OF_WEEK, MONTH, HIGH_T, LOW_T, and PRECIP are the regression variables.
expr = """BB_COUNT ~ DAY + DAY_OF_WEEK + MONTH + HIGH_T + LOW_T + PRECIP"""
Set up the X and y matrices for the training and testing data sets. Patsy makes this really easy:
y_train, X_train = dmatrices(expr, df_train, return_type='dataframe')
y_test, X_test = dmatrices(expr, df_test, return_type='dataframe')
Train the Poisson regression model using the statsmodels GLM class:
poisson_training_results = sm.GLM(y_train, X_train, family=sm.families.Poisson()).fit()
This step completes the training of the Poisson regression model.
Step 2: Fit the auxiliary Ordinary Least Squares regression model and find α.
Import the statsmodels.formula.api package into your project.
Add the vector of fitted Poisson rates, named 'BB_LAMBDA', to the training data set's DataFrame. Keep in mind that its dimensions are (n x 1), here (161 x 1), and that the vector can be found in poisson_training_results.mu:
df_train['BB_LAMBDA'] = poisson_training_results.mu
Next, let's add a derived column called 'AUX_OLS_DEP' to the pandas DataFrame. This new column will store the values of the auxiliary OLS regression's dependent variable:
df_train['AUX_OLS_DEP'] = df_train.apply(lambda x: ((x['BB_COUNT'] - x['BB_LAMBDA'])**2 - x['BB_LAMBDA']) / x['BB_LAMBDA'], axis=1)
Let's use patsy to create the OLS model specification. The '-1' at the end of the expression is patsy's way of saying: do not use a regression intercept.
ols_expr = """AUX_OLS_DEP ~ BB_LAMBDA - 1"""
Now fit the OLS model:
aux_olsr_results = smf.ols(ols_expr, df_train).fit()
Is α statistically significant?
The fitted value of α is 0.037343. Is that statistically significant, or can it be treated as 0 for all practical purposes?
Why is it critical to find this out? Recall the NB2 variance function:
Variance = mean + α * mean²
If α is 0, then variance = mean, and the NB2 model reduces to the Poisson model.
The t-score of the regression coefficient is stored in OLSResults. Let's print it:
aux_olsr_results.tvalues
The critical t-value at a 99% confidence level, with 160 degrees of freedom, is 2.34988. The reported t-statistic of 4.814096 is well above this critical value. So, in conclusion:
The value α = 0.037343 is statistically significant.
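As a check, the quoted critical value can be reproduced with scipy; 2.34988 matches the one-tailed 99% point of the t distribution with 160 degrees of freedom (the tail and level are our reading, inferred from the number itself):

```python
from scipy import stats

# Reported t-statistic for alpha from the auxiliary OLS regression.
t_stat = 4.814096
df_resid = 160

# One-tailed 99% critical t-value (assumed interpretation of 2.34988).
t_critical = stats.t.ppf(0.99, df_resid)
print(round(t_critical, 5))

# alpha is significant if the t-statistic exceeds the critical value.
significant = t_stat > t_critical
print("alpha significant:", significant)
```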
Step 3: Supply the α value found in the previous step to train the NB2 regression model.
nb2_training_results = sm.GLM(y_train, X_train, family=sm.families.NegativeBinomial(alpha=aux_olsr_results.params[0])).fit()
Step 4: Make predictions using the trained NB2 regression model.
nb2_predictions = nb2_training_results.get_prediction(X_test)
The NB2 model appears to track the bicycle count trend rather closely.
Step 5: Measure the goodness-of-fit of the NB2 regression model.
The training summary of the NB2 model contains three figures relevant to goodness-of-fit. We'll go over each of them individually.
NB2 model results:

| Log-likelihood | -1383.2 |
| Deviance | 330.99 |
| Pearson chi2 | 310 |

Poisson regression model results:

| Log-likelihood | -12616 |
| Deviance | 23682 |
| Pearson chi2 | 2.38e+04 |
The log-likelihood value is the first figure to consider.
The LR Test
The NB2 model's log-likelihood is -1383.2, while the Poisson regression model's log-likelihood is -12616.
Thus, the LR test statistic is 2 * (-1383.2 - (-12616)) = 22465.6. This result is far above the 6.635 critical value of χ2(1) at the 1% significance level, so the NB2 model fits the data significantly better than the Poisson model.
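The LR test arithmetic can be verified directly (the log-likelihoods below are the reported values, entered as negative numbers):

```python
from scipy import stats

# Log-likelihoods reported for the two fitted models.
ll_poisson = -12616.0
ll_nb2 = -1383.2

# Likelihood-ratio test statistic: 2 * (LL_full - LL_restricted).
lr_statistic = 2 * (ll_nb2 - ll_poisson)
print(lr_statistic)  # 22465.6

# Chi-squared critical value with 1 degree of freedom at the 1% level.
chi2_critical = stats.chi2.ppf(0.99, df=1)
print(round(chi2_critical, 3))

print("NB2 fits significantly better:", lr_statistic > chi2_critical)
```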
The Pearson ChiSquared and Deviance Statistics
The NB2 model's Pearson chi-squared and deviance statistics are 310 and 330.99, respectively. To make a quantitative evaluation of the goodness-of-fit at some confidence level, say 95 percent (p = 0.05), we compare these statistics against the chi-squared critical value for the residual degrees of freedom, which is 165. That critical value is 195.973, which is much lower than both 310 and 330.99. We can therefore deduce that the NB2 model's fit is still suboptimal.
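This goodness-of-fit comparison is also easy to recompute:

```python
from scipy import stats

# Residual degrees of freedom and observed statistics from the NB2 summary.
df_resid = 165
deviance = 330.99
pearson_chi2 = 310.0

# Chi-squared critical value at the 95% level (p = 0.05) for 165 df.
chi2_critical = stats.chi2.ppf(0.95, df=df_resid)
print(round(chi2_critical, 3))

# Both observed statistics exceed the critical value: the fit is suboptimal.
print(deviance > chi2_critical, pearson_chi2 > chi2_critical)
```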
Conclusion
Now that you have learned the A-to-Z of negative binomial regression, you should look forward to mastering machine learning. You can explore machine learning and related free courses in Skillup by Simplilearn or enroll in the top-notch machine learning PG program. Explore and enroll now.