Econometrics Report with Different Tests



Title of the Report
Productivity Analysis of Organic and Inorganic Input use in Paddy Cultivation
Step 1: Research Question
Research Question 1: What is the productivity level of organic and inorganic input use in paddy production?
Step 2: Variables and their Unit of Measurement
Table: Variables and their Unit of Measurement
Variables                   Unit of Measurement
Output or Productivity      Mound
Land Cost                   BDT
Rental Cost                 BDT
Labor Cost                  BDT
Fuel Cost                   BDT
Seed Cost                   BDT
Pesticide Cost              BDT
Fertilizer Cost             BDT
Step 3: Specification of the Mathematical Model of Productivity from Organic and Inorganic Input use in Paddy Cultivation
A mathematical model is simply a set of mathematical equations. This is a single-equation model, because it contains only one equation. The author specifies a mathematical model of the relationship between productivity and input use, which in economics is called a production function. The variable on the left side of the equality sign is called the dependent variable, whereas the variables on the right side are called independent or explanatory variables. The general form of the equation is as follows:
P = f(La, Ren, L, Fu, S, Pes, Fer) ........ (i)
Where,
P = Productivity; and
La = Land Cost; Ren = Rental Cost; L = Labor Cost; Fu = Fuel Cost; S = Seed Cost; Pes = Pesticide Cost; and Fer = Fertilizer Cost
Step 4: Specification of the Econometric Model of Productivity from Organic and Inorganic Input use in Paddy Cultivation
The pure mathematical model of the production function in equation (i) is of limited interest to the author, because it assumes an exact, or deterministic, relationship between productivity and input use. In general, however, relationships between economic variables are inexact. To allow for an inexact relationship, the author modifies the deterministic production function (i) as follows:
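$P = \beta_0 + \sum_{i=1}^{7} \beta_i X_i + u$ ........ (ii)
Here,
$P$ = Productivity
$\beta_0$ = Intercept of the regression line
$\beta_i$ (i = 1, 2, 3, ..., 7) = Coefficients of the explanatory variables
$X_i$ (i = 1, 2, 3, ..., 7) = Explanatory variables of equation (i): La, Ren, L, Fu, S, Pes and Fer
$u$ = Error term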
Step 5: Data Analysis
These data give information on variables concerning individual agents or producers at a given point in time. Because the data were surveyed during different weeks of the same year, the author treats them as a cross-sectional data set. An important feature of these cross-sectional data is that they were obtained by random sampling. The formulation for the cross-sectional data set is as follows:
Cs: $P = \beta_0 + \sum_{i=1}^{7} \beta_i X_i + u$ ........ (iii)
Here,
Cs = Cross-sectional data set
i = 1, 2, 3, ..., 7 (index of the explanatory variables $X_i$ of equation (i) and of their coefficients $\beta_i$)

Step 6: Estimation of Econometric Model
1.      Descriptive Statistics on Variables Used
Variables        Obs        Mean    Std. Dev.      Min       Max
Productivity      50       46.71     15.86489       18        80
Land_cost         50      6456.6     2615.606     1920     11600
Rental_Cost       50      5664.6      1372.07     3000      8500
Labor_cost        50        9259     1570.737     4500     11700
Fuel_Cost         50     8561.76      3184.66     1080     14400
Seed_Cost         50      4331.6     3149.084      250     10800
Pesticide_~t      50      403.94     196.7764        0      1160
Fertilizer~t      50     1454.94      850.992      560      4440
The above table reports the number of observations, mean, standard deviation, minimum and maximum of the eight variables (productivity and the seven input costs). The mean is the sum of the observations of a variable divided by the number of observations. The mean values of productivity, land cost and labor cost are 46.71 mounds, 6456.6 BDT and 9259 BDT respectively; the means of the remaining variables are shown in the table. The standard deviation is a statistic that measures how spread out the sample values are around the mean. Finally, the table shows the minimum (the smallest value in the series) and the maximum (the largest value in the series) of each variable. The smallest minimum is zero, for pesticide cost, and the largest maximum is 14400 BDT, for fuel cost.
2.      Analysis for the Multiple Regression Model of Productivity 
The author uses a multiple regression model to explain productivity, with land cost, rental cost, labor cost, fuel cost, seed cost, pesticide cost and fertilizer cost as explanatory variables.
Dependent Variable = Productivity
Productivity_~d        Coef.   Std. Err.       t     P>t      [95% Conf. Interval]
Land_Cost           .0023952    .0006418    3.73   0.001       .0011     .0036905
Rental_Cost          .000906     .001196    0.76   0.453   -.0015077     .0033196
Labor_Cost          .0019087    .0008159    2.34   0.024    .0002622     .0035552
Fuel_Cost          -.0003217    .0003816   -0.84   0.404   -.0010917     .0004483
Seed_Cost           .0021764    .0002896    7.51   0.000    .0015919     .0027609
Pesticide_Cost      .0041359    .0042724    0.97   0.339   -.0044861     .0127579
Fertilizer_Cost     .0070859    .0010122    7.00   0.000    .0050432     .0091286
_cons              -10.21255    5.122402   -1.99   0.053   -20.54998     .1248775
R-squared     =  0.9356
Adj R-squared =  0.9248
Root MSE      =    4.35
Residual sum of squares = 794.739483; degrees of freedom = 42
The above table implies that four of the seven explanatory variables have a statistically significant impact on productivity. If land cost increases by 1 BDT, productivity increases by .0023952 mound, holding the other inputs constant; this effect is statistically significant at the 1 percent level. Similarly, if labor cost increases by 1 BDT, productivity increases by .0019087 mound, which is statistically significant at the 5 percent level. Likewise, if seed cost or fertilizer cost increases by 1 BDT, productivity increases by .0021764 and .0070859 mound respectively, and both effects are statistically significant at the 1 percent level. The coefficients of rental cost, fuel cost and pesticide cost are not statistically significant at conventional levels (p-values 0.453, 0.404 and 0.339).
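As a quick check on the significance statements, the t statistics can be recomputed from the reported coefficients and standard errors. The sketch below is illustrative only; the dictionary and function names are assumptions, and the numbers are copied from the regression table.

```python
# Coefficients and standard errors copied from the regression table.
estimates = {
    "Land_Cost": (0.0023952, 0.0006418),
    "Rental_Cost": (0.000906, 0.001196),
    "Labor_Cost": (0.0019087, 0.0008159),
    "Fuel_Cost": (-0.0003217, 0.0003816),
    "Seed_Cost": (0.0021764, 0.0002896),
    "Pesticide_Cost": (0.0041359, 0.0042724),
    "Fertilizer_Cost": (0.0070859, 0.0010122),
    "_cons": (-10.21255, 5.122402),
}


def t_ratio(coef, std_err):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return coef / std_err


t_ratios = {name: t_ratio(c, se) for name, (c, se) in estimates.items()}

for name, t in t_ratios.items():
    print(f"{name:<16} t = {t:6.2f}")
```

Each ratio agrees with the t column of the table to two decimal places, up to rounding of the reported inputs.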
R² shows how much of the variation in the dependent variable is explained jointly by the explanatory variables, that is, how well the data points fit the regression line. The value of R² is about 0.9356, which shows that about 94 percent of the total variation in productivity is explained by the explanatory variables. Adjusted R² also indicates how well the data fit the line, but it adjusts for the number of terms in the model, so it increases only when an added variable is genuinely useful. The value of adjusted R² is about 0.9248, so about 92 percent of the variation is explained after adjusting for the number of explanatory variables.
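The link between the two measures can be sketched with the standard adjustment formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1). The function name is an assumption; n = 50 observations and k = 7 explanatory variables are taken from the report.

```python
def adjusted_r_squared(r2, n_obs, n_regressors):
    """Standard degrees-of-freedom adjustment of R-squared."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


# R-squared as reported (rounded to four decimals), 50 observations, 7 regressors.
adj_r2 = adjusted_r_squared(0.9356, 50, 7)
print(round(adj_r2, 4))
```

The result is within rounding error of the reported 0.9248; the small gap arises because the input R² is itself rounded.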
Dividing the residual sum of squares (794.739483) by its degrees of freedom (42) yields 18.9223686, which is the mean square error. Taking its square root gives the Root MSE of 4.35. Root MSE is a measure of accuracy: a more accurate model has smaller errors and therefore a smaller residual sum of squares. The closer the Root MSE is to zero, the better the fit.
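The Root MSE arithmetic can be reproduced directly from the two reported quantities. This is a minimal sketch; the variable names are assumptions.

```python
import math

rss = 794.739483   # residual sum of squares, as reported
df_resid = 42      # 50 observations - 7 slopes - 1 intercept

mse = rss / df_resid          # mean square error
root_mse = math.sqrt(mse)     # Root MSE, in mounds

print(f"MSE = {mse:.7f}, Root MSE = {root_mse:.2f}")
```

Because Root MSE is in the units of the dependent variable, 4.35 mounds can be compared with the mean productivity of 46.71 mounds.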
3.      Correlation among variables
Here the author shows the correlations among productivity and the seven cost variables: land cost, rental cost, labor cost, fuel cost, seed cost, pesticide cost and fertilizer cost. The correlation coefficient describes the degree to which one variable is linearly related to another.
    
                 Productivity     Land   Rental    Labor     Fuel     Seed   Pestic   Fertil
Productivity           1.0000
Land_Cost              0.8953   1.0000
Rental_Cost            0.8536   0.8926   1.0000
Labor_Cost             0.7537   0.8221   0.8262   1.0000
Fuel_Cost              0.6841   0.8021   0.7832   0.7286   1.0000
Seed_Cost              0.6311   0.5734   0.4802   0.4788   0.4894   1.0000
Pesticide_Cost         0.5185   0.5424   0.5486   0.5144   0.6030   0.1428   1.0000
Fertilizer_Cost        0.3096   0.1352   0.2146  -0.0285  -0.0251  -0.3451   0.2368   1.0000
The above table shows that each variable has a perfect positive correlation (1.0000) with itself. Land cost has a strong positive relationship with productivity. Rental cost has a strong positive relationship with both productivity and land cost. Labor cost shows a strong positive relationship with productivity, land cost and rental cost. Fuel cost shows a moderate, strong, strong and moderate positive relationship with productivity, land, rental and labor costs respectively. Likewise, seed cost has a moderate, moderate, weak, weak and weak positive relationship with productivity, land, rental, labor and fuel costs. Similarly, pesticide cost shows a moderate positive relationship with productivity, land, rental, labor and fuel costs, and a weak positive relationship with seed cost. Finally, fertilizer cost has a weak positive relationship with productivity, land, rental and pesticide costs, and a weak negative relationship with labor, fuel and seed costs.
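The verbal labels used above can be made explicit with a small classifier. The cut-offs of 0.75 (strong) and 0.50 (moderate) are assumptions inferred from how the report labels the coefficients, not a universal standard, and the function name is hypothetical.

```python
def strength(r, strong=0.75, moderate=0.50):
    """Label a correlation coefficient; the thresholds are assumed cut-offs."""
    if r == 0:
        return "no linear relationship"
    sign = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1.0:
        label = "perfect"
    elif size >= strong:
        label = "strong"
    elif size >= moderate:
        label = "moderate"
    else:
        label = "weak"
    return f"{label} {sign}"


# Correlations with productivity, copied from the first column of the table.
with_productivity = {
    "Land_Cost": 0.8953,
    "Rental_Cost": 0.8536,
    "Labor_Cost": 0.7537,
    "Fuel_Cost": 0.6841,
    "Seed_Cost": 0.6311,
    "Pesticide_Cost": 0.5185,
    "Fertilizer_Cost": 0.3096,
}

for name, r in with_productivity.items():
    print(f"{name:<16} {r:7.4f}  {strength(r)}")
```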
4.      Single Sample t-test
a)      T- test (Paired t-test)
Paired t test
Variable        Obs        Mean    Std. Err.    Std. Dev.    [95% Conf. Interval]
Produc~d         50       46.71     2.243635     15.86489    42.20125    51.21875
Labor_~t         50        9259     222.1357     1570.737    8812.602    9705.398
diff             50    -9212.29     220.4496     1558.814     -9655.3    -8769.28

mean(diff) = mean(Productivity_M~d - Labor_Cost)              t = -41.7886
Ho: mean(diff) = 0                           degrees of freedom =       49
Ha: mean(diff) < 0           Ha: mean(diff) != 0          Ha: mean(diff) > 0
Pr(T < t) = 0.0000        Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
- mean(diff): This is the mean being tested. In this report it is mean(Productivity_M~d - Labor_Cost).
- t: This is the Student t-statistic. It is the ratio of the mean of the difference to the standard error of the difference: (-9212.29 / 220.4496) = -41.7886.
- degrees of freedom: The degrees of freedom for paired observations is the number of observations minus 1. In this report the degrees of freedom is (50 - 1) = 49.
- Pr(T < t), Pr(T > t): These are the one-tailed p-values for evaluating the alternatives (mean < Ho value) and (mean > Ho value), respectively. If the p-value is less than the pre-specified alpha level (usually 0.05 or 0.01), the mean difference is statistically significantly less than or greater than zero, respectively.
- Pr(|T| > |t|): This is the two-tailed p-value, computed using the t-distribution. Here it is 0.0000, so the mean difference is statistically significantly different from zero. The null hypothesis is therefore rejected in favor of the alternative hypothesis.
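The quantities in the paired t-test output are linked by simple arithmetic, which the following sketch reproduces from the reported mean and standard deviation of the differences. The variable names are assumptions.

```python
import math

n = 50                 # number of paired observations
mean_diff = -9212.29   # mean of (productivity - labor cost), as reported
sd_diff = 1558.814     # standard deviation of the differences, as reported

se_diff = sd_diff / math.sqrt(n)   # standard error of the mean difference
t_stat = mean_diff / se_diff       # Student t statistic
df = n - 1                         # degrees of freedom

print(f"Std. Err. = {se_diff:.4f}, t = {t_stat:.4f}, df = {df}")
```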
5.      Test for Normality
a)      Shapiro-Wilk Test for Normality
Variable        Obs          W        V        z     Prob>z
Productivi~d     50    0.96895    1.460    0.808    0.20963
Land_Cost        50    0.96863    1.475    0.829    0.20350
Rental_Cost      50    0.88528    5.395    3.595    0.00016
Labor_Cost       50    0.95853    1.951    1.425    0.07711
Fuel_Cost        50    0.95634    2.054    1.535    0.06245
Seed_Cost        50    0.92363    3.592    2.727    0.00320
Pesticide_~t     50    0.91152    4.161    3.041    0.00118
Fertilizer~t     50    0.84418    7.328    4.248    0.00001
The Shapiro-Wilk test uses the null hypothesis that a sample x1, ..., xn came from a normally distributed population. From this table the author finds that normality cannot be rejected for productivity (p = 0.20963) and land cost (p = 0.20350), so these samples are consistent with a normally distributed population. For the remaining variables the null hypothesis of normality is rejected: rental cost, seed cost, pesticide cost and fertilizer cost at the 1 percent level of significance, and labor cost and fuel cost at the 10 percent level only.
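The decision rule behind these statements can be sketched by comparing each reported p-value with the conventional significance levels. The function name and the choice of levels (1, 5 and 10 percent) are assumptions; the p-values are copied from the table.

```python
# Prob>z values copied from the Shapiro-Wilk table.
p_values = {
    "Productivity": 0.20963,
    "Land_Cost": 0.20350,
    "Rental_Cost": 0.00016,
    "Labor_Cost": 0.07711,
    "Fuel_Cost": 0.06245,
    "Seed_Cost": 0.00320,
    "Pesticide_Cost": 0.00118,
    "Fertilizer_Cost": 0.00001,
}


def rejection_level(p, levels=(0.01, 0.05, 0.10)):
    """Smallest conventional level at which normality is rejected, else None."""
    for alpha in levels:
        if p < alpha:
            return alpha
    return None


for name, p in p_values.items():
    level = rejection_level(p)
    verdict = "normality not rejected" if level is None else f"rejected at {level:.0%}"
    print(f"{name:<16} p = {p:.5f}  {verdict}")
```

Note that labor cost and fuel cost are rejected only at the 10 percent level; at the stricter 5 percent level normality would not be rejected for them.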
b)      Normal Probability Plot
The normal probability plot is a graphical tool for comparing a data set with the normal distribution.
In the above figure, the x-axis is transformed so that a cumulative normal distribution function plots as a straight line. Then, using the mean and standard deviation, the data points are plotted along the fitted normal line. Here the author finds that productivity is approximately normally distributed, with little fluctuation: all the values are close to the 45° line. The curve starts above the normal line, bends to follow it, and ends above it, which indicates long tails. That is, the author sees slightly more variance than would be expected in a normal distribution.
In the above figure, the author finds that land cost is approximately normally distributed, with little fluctuation. The curve starts above the normal line, bends to follow it, and ends above it, which indicates long tails. That is, the author sees somewhat more variance in land cost than would be expected in a normal distribution.
In this figure also, the author finds that labor cost is approximately normally distributed. The curve starts above the normal line, bends to follow it, and ends below it, which indicates long tails. That is, the author sees somewhat more variance in labor cost than would be expected in a normal distribution.
The above figure shows that seed cost is approximately normally distributed, with little fluctuation. The curve starts above the normal line, bends to follow it, and ends above it, which indicates long tails. That is, the author sees more variance in seed cost than would be expected in a normal distribution.
c)      Kernel Density Plot
Kernel density estimators require a width (bandwidth) to be specified. In the graph above, the default width has been used. kdensity stores the width in the returned scalar r(bwidth). From this, the author found that the width is approximately 6.5296. The units of the width are the units of x, the variable being analyzed, which here is productivity. The width is specified as a half-width, meaning that a kernel density estimator with a half-width of 6.5296 corresponds to sliding a window of size 13.0592 across the data. The sampled values of productivity appear to be normally distributed.
For labor cost, the author found that the width is approximately 646.4752. The units of the width are the units of x, which here is labor cost. The width is specified as a half-width, meaning that a kernel density estimator with a half-width of 646.4752 corresponds to sliding a window of size 1292.9504 across the data. The distribution is right-skewed, so the values do not appear to come from a normally distributed population. The symmetry condition of the normal distribution is violated in this graph.
For rental cost, the author found that the width is approximately 305.0960. The units of the width are the units of x, which here is rental cost. The width is specified as a half-width, meaning that a kernel density estimator with a half-width of 305.0960 corresponds to sliding a window of size 610.192 across the data. The sampled values are not normally distributed. In a normal curve the mean, median and mode coincide at the centre, but the above graph violates this characteristic.
For pesticide cost, the author found that the width is approximately 56.4428. The units of the width are the units of x, which here is pesticide cost. The width is specified as a half-width, meaning that a kernel density estimator with a half-width of 56.4428 corresponds to sliding a window of size 112.8856 across the data. Finally, the sampled values are not normally distributed. This graph violates two characteristics of the normal distribution: unimodality, and the coincidence of the mean, median and mode at the centre.
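To make the idea of a half-width concrete, here is a minimal kernel density sketch. It assumes a standard Epanechnikov kernel supported on [-1, 1]; the statistical package's own kernel scaling may differ. The sample values are hypothetical and are not the survey data; only the half-width 6.5296 comes from the report.

```python
def epanechnikov(u):
    """Standard Epanechnikov kernel, non-zero only for |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kde(x, sample, half_width):
    """Kernel density estimate at x: average of kernels centred on the data."""
    n = len(sample)
    return sum(epanechnikov((x - xi) / half_width) for xi in sample) / (n * half_width)


# Hypothetical yields in mounds, for illustration only (not the survey data).
sample = [30.0, 38.0, 42.0, 45.0, 47.0, 50.0, 55.0, 63.0]
half_width = 6.5296          # width reported for productivity
window = 2 * half_width      # full sliding window: 13.0592

# Riemann-sum check that the estimated density integrates to about 1.
step = 0.01
grid = [20.0 + step * k for k in range(6000)]   # covers 20 to 80 mounds
area = sum(kde(x, sample, half_width) for x in grid) * step

print(f"window = {window:.4f}, density at 46.71 = {kde(46.71, sample, half_width):.4f}")
print(f"area under the curve = {area:.3f}")
```

Each observation contributes only to points within one half-width on either side of it, which is why the effective window is twice the reported width.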






d)      Quantile Normal Plot
The above figure shows that all the values are close to the 45° line, which suggests that the sample is taken from a normally distributed population. The point pattern is slightly curved, with slope increasing from left to right.
In the above figure, all the values are close to the 45° line, which suggests that the sample is taken from a normally distributed population. The point pattern is slightly curved, with slope increasing from left to right. It also shows short tails at both ends of the land cost distribution.
This figure also shows that all the values are close to the 45° line, which suggests that the sample is taken from a normally distributed population. The point pattern is slightly curved, with slope increasing from left to right. Here the labor cost data appear to have been rounded or to be discrete.
Finally, in this figure the points show some nonlinearity, but all the values remain close to the 45° line, which suggests that the sample is taken from a normally distributed population. The point pattern is slightly curved, with slope increasing from left to right.

6.      Test for Multicollinearity
a)      Variance Inflation Factor Test (VIF Test)
In multiple regression, the variance inflation factor (VIF) is used as an indicator of multicollinearity. Multicollinearity is said to be present when two or more independent variables in combination predict a very substantial percentage of another independent variable's variance. In other words, multicollinearity occurs when variables are so highly correlated with each other that it is difficult to obtain reliable estimates of their individual regression coefficients. The VIF indicates how much the variance of a particular estimated coefficient (and hence its standard error) is inflated by multicollinearity. As a rule of thumb, when VIF < 10, or equivalently 1/VIF > 0.10, multicollinearity is not considered a problem. In contrast, when VIF > 10, or 1/VIF < 0.10, multicollinearity is considered to be present. The author has computed the VIFs as follows:
Variables         VIF       1/VIF
Land_Cost        7.30    0.137029
Rental_Cost      6.97    0.143401
Labor_Cost       4.25    0.235131
Fuel_Cost        3.82    0.261534
Seed_Cost        2.15    0.464206
Fertilizer~t     1.92    0.520474
Pesticide_~t     1.83    0.546381
Mean VIF         4.04


The above table shows that there is no serious multicollinearity among the variables used in the model. The VIFs of land cost, rental cost and labor cost are 7.30, 6.97 and 4.25, and the 1/VIF values of these variables are 0.137029, 0.143401 and 0.235131. Similarly, the VIFs of fuel cost, seed cost, fertilizer cost and pesticide cost are 3.82, 2.15, 1.92 and 1.83, with 1/VIF values of 0.261534, 0.464206, 0.520474 and 0.546381 respectively. Every individual VIF is below 10 and every 1/VIF is above 0.10. The mean VIF summarizes this: its value is 4.04, which is well below 10.
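The VIF of a regressor equals 1/(1 - R²_j), where R²_j comes from regressing that variable on the other explanatory variables, so 1/VIF is the share of its variance not explained by the others. The sketch below recovers the VIFs, the implied auxiliary R² values and the mean VIF from the reported 1/VIF column. The variable names are assumptions.

```python
# 1/VIF (tolerance) values copied from the table.
tolerance = {
    "Land_Cost": 0.137029,
    "Rental_Cost": 0.143401,
    "Labor_Cost": 0.235131,
    "Fuel_Cost": 0.261534,
    "Seed_Cost": 0.464206,
    "Fertilizer_Cost": 0.520474,
    "Pesticide_Cost": 0.546381,
}

vif = {name: 1.0 / tol for name, tol in tolerance.items()}
auxiliary_r2 = {name: 1.0 - tol for name, tol in tolerance.items()}
mean_vif = sum(vif.values()) / len(vif)

for name in tolerance:
    flag = "above rule of thumb" if vif[name] > 10 else "below rule of thumb"
    print(f"{name:<16} VIF = {vif[name]:5.2f}  auxiliary R2 = {auxiliary_r2[name]:.3f}  {flag}")
print(f"Mean VIF = {mean_vif:.2f}")
```

For example, the 1/VIF of 0.137029 for land cost implies that about 86 percent of its variance is explained by the other six inputs, which is high but still under the VIF threshold of 10.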



7.      Specification Error Test
a)      Link Test for Specification Error
Productivity        Coef.   Std. Err.      t     P>t     [95% Conf. Interval]
_hat              .686379    .2178147   3.15   0.003     .2481923    1.124566
_hatsq           .0032601    .0022305   1.46   0.151    -.0012271    .0077473
_cons            6.783878    4.992431   1.36   0.181    -3.259597    16.82735
From this table the author finds that the variable _hatsq is not significant (p-value = 0.151), while _hat is significant (p-value = 0.003). Since the squared prediction has no additional explanatory power, the link test gives no evidence of specification error.
8.      Omitted Variable Bias Test
a)      Ramsey RESET Test for Omitted Variables
Ramsey RESET test using powers of the fitted values of Productivity
       Ho:  model has no omitted variables
                  F (3, 39) =      2.68
                  Prob > F =      0.0600
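The null hypothesis is rejected at the 10 percent level of significance, because Prob > F = 0.0600 is less than 0.10, which suggests that the model has omitted variable bias. At the 5 percent level the null hypothesis would not be rejected (0.0600 > 0.05), so the evidence of omitted variable bias is only marginal.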
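The reported p-value can be checked approximately by integrating the F density numerically. The sketch below uses only the standard library and Simpson's rule; the function names are assumptions, and the input F = 2.68 is the rounded value from the output, so the result matches the reported 0.0600 only approximately.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom, for d1 > 2."""
    if x <= 0.0:
        return 0.0  # correct limit at zero only when d1 > 2
    beta = math.gamma(d1 / 2) * math.gamma(d2 / 2) / math.gamma((d1 + d2) / 2)
    return ((d1 / d2) ** (d1 / 2) * x ** (d1 / 2 - 1)
            * (1 + d1 * x / d2) ** (-(d1 + d2) / 2)) / beta


def f_upper_tail(f_stat, d1, d2, steps=20000):
    """Pr(F > f_stat), using Simpson's rule on [0, f_stat]; steps must be even."""
    h = f_stat / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(f_stat, d1, d2)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f_pdf(k * h, d1, d2)
    return 1.0 - total * h / 3


p_value = f_upper_tail(2.68, 3, 39)
print(f"Prob > F is approximately {p_value:.3f}")
print("reject at 10%:", p_value < 0.10, "| reject at 5%:", p_value < 0.05)
```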
Step 8: Findings of the Report
1.      The author has set up a cross-sectional data analysis in this report, as the data were surveyed during different weeks of the same year.
2.      The author found that four variables in this model (land cost, labor cost, seed cost and fertilizer cost) have a statistically significant impact on productivity. Among these, land cost, seed cost and fertilizer cost are significant at the 1 percent level of significance, while labor cost is significant at the 5 percent level.
3.      The value of R² is about 0.9356, which shows that about 94 percent of the total variation in productivity is explained by the explanatory variables.
4.      The correlations between the cost variables and productivity range from strong to moderate positive relationships. Only fertilizer cost has a weak positive relationship with productivity.
5.      In the paired t-test, the author found that the mean difference is statistically significantly different from zero, so the null hypothesis is rejected in favor of the alternative hypothesis.
6.      From the normal probability plots, the author found approximately normal distributions with little fluctuation.
7.      In the kernel density plots, some variables appear to be sampled from a normally distributed population, whereas others do not.
8.      The quantile normal plots show all the values close to the 45° line, which suggests that the samples are taken from normally distributed populations.
9.      There is no serious multicollinearity among the variables used in the model.
10.     The link test gives no evidence of specification error in the model.
11.     The value of Prob > F in the Ramsey RESET test is 0.0600, so there is evidence of omitted variable bias at the 10 percent level of significance, though not at the 5 percent level.