Title of the Report
Productivity Analysis of Organic and Inorganic Input Use in Paddy Cultivation
Step 1: Research Question
Research Question 1: What level of productivity results from the use of organic and inorganic inputs in paddy production?
Step 2: Variables and their Unit of Measurement
Table: Variables and their Unit of Measurement
Variables | Unit of Measurement |
Output or Productivity | Mound |
Land Cost | BDT |
Rental Cost | BDT |
Labor Cost | BDT |
Fuel Cost | BDT |
Seed Cost | BDT |
Pesticide Cost | BDT |
Fertilizer Cost | BDT |
Step 3: Specification of the Mathematical Model of Productivity from Organic and Inorganic Input use in Paddy Cultivation
A mathematical model is simply a set of mathematical equations. The model used here is a single-equation model, because productivity is explained by one equation with several explanatory variables. The author presents a mathematical model of the relationship between productivity and input use, which in economics is called a production function. The variable appearing on the left side of the equality sign is called the dependent variable, whereas the variables on the right side are called independent or explanatory variables. The general form of the equation is as follows:
P = f(La, Ren, L, Fu, S, Pes, Fer) ........ (i)
Where,
P = Productivity; and
La = Land Cost; Ren = Rental Cost; L = Labor Cost; Fu = Fuel Cost; S = Seed Cost; Pes = Pesticide Cost; and Fer = Fertilizer Cost
Step 4: Specification of the Econometric Model of Productivity from Organic and Inorganic Input use in Paddy Cultivation
The pure mathematical model of the production function in equation (i) is of limited interest to the author, because it assumes that there is an exact or deterministic relationship between productivity and input use. Generally, however, the relationships between economic variables are inexact. To allow for inexact relationships between economic variables, the author modifies the deterministic production function (i) as follows:
$$P = \beta_0 + \beta_1 La + \beta_2 Ren + \beta_3 L + \beta_4 Fu + \beta_5 S + \beta_6 Pes + \beta_7 Fer + u \qquad \text{(ii)}$$
Here,
$P$ = Productivity
$\beta_0$ = Intercept of the regression line
$\beta_i$ (i = 1, 2, 3, …, 7) = Coefficients of the explanatory variables
$u$ = Error term
Step 5: Data Analysis
These data give information on the variables concerning individual agents or producers at a given point of time. Although the data were surveyed during different weeks of the same year, the author treats them as a cross-sectional data set. An important feature of these cross-sectional data is that they were obtained by random sampling. The formulation of the cross-sectional data set is as follows:
$$Cs:\; P = \beta_0 + \sum_{i=1}^{7} \beta_i X_i + u \qquad \text{(iii)}$$
Here,
Cs = Cross-sectional data set
$X_i$ = the explanatory variables (La, Ren, L, Fu, S, Pes, Fer)
i = 1, 2, 3, …, 7
Step 6: Estimation of Econometric Model
1. Descriptive Statistics on Variables Used
Variables | Obs | Mean | Std. Dev. | Min | Max |
Productivity | 50 | 46.71 | 15.86489 | 18 | 80 |
Land_cost | 50 | 6456.6 | 2615.606 | 1920 | 11600 |
Rental_Cost | 50 | 5664.6 | 1372.07 | 3000 | 8500 |
Labor_cost | 50 | 9259 | 1570.737 | 4500 | 11700 |
Fuel_Cost | 50 | 8561.76 | 3184.66 | 1080 | 14400 |
Seed_Cost | 50 | 4331.6 | 3149.084 | 250 | 10800 |
Pesticide_~t | 50 | 403.94 | 196.7764 | 0 | 1160 |
Fertilizer~t | 50 | 1454.94 | 850.992 | 560 | 4440 |
The above table reports the number of observations, mean, standard deviation, minimum and maximum of productivity and the 7 explanatory variables. The mean is the sum of the observed values divided by the number of observations. Here the mean values of productivity, land cost and labor cost are 46.71 mounds, 6456.6 BDT and 9259 BDT respectively; the mean values of the remaining variables can be read from the table. The standard deviation is a statistic that measures how spread out the sample values are around the mean. Finally, the minimum (the smallest value in the series) and maximum (the largest value in the series) of each variable are shown in the table. The smallest minimum is zero, for pesticide cost, and the largest maximum is 14400 BDT, for fuel cost.
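As a small illustration of how these summary figures relate to each other, the sketch below takes the mean and standard deviation of productivity from the table (46.71 and 15.86489, with 50 observations) and derives the standard error of the mean and the coefficient of variation. The variable names are hypothetical, and the coefficient of variation is an extra summary that the table does not report.

```python
import math

# Values copied from the descriptive statistics table (productivity, in mounds).
n_obs = 50
mean_productivity = 46.71
sd_productivity = 15.86489

# Standard error of the mean: standard deviation divided by sqrt(n).
se_mean = sd_productivity / math.sqrt(n_obs)

# Coefficient of variation: spread relative to the mean (unit-free).
# This is an additional summary, not part of the reported table.
cv = sd_productivity / mean_productivity

print(f"Std. error of the mean: {se_mean:.6f}")
print(f"Coefficient of variation: {cv:.4f}")
```

The standard error obtained this way (about 2.2436) agrees with the Std. Err. reported for productivity in the paired t-test output later in the report.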
2. Analysis for the Multiple Regression Model of Productivity
The author uses a multiple regression model to explain productivity, with land cost, rental cost, labor cost, fuel cost, seed cost, pesticide cost and fertilizer cost as explanatory variables.
Dependent Variable = Productivity
Productivity_~d | Coef. | Std. Err. | t | P>t | [95% Conf. | Interval] |
Land_Cost | .0023952 | .0006418 | 3.73 | 0.001 | .0011 | .0036905 |
Rental_Cost | .000906 | .001196 | 0.76 | 0.453 | -.0015077 | .0033196 |
Labor_Cost | .0019087 | .0008159 | 2.34 | 0.024 | .0002622 | .0035552 |
Fuel_Cost | -.0003217 | .0003816 | -0.84 | 0.404 | -.0010917 | .0004483 |
Seed_Cost | .0021764 | .0002896 | 7.51 | 0.000 | .0015919 | .0027609 |
Pesticide_Cost | .0041359 | .0042724 | 0.97 | 0.339 | -.0044861 | .0127579 |
Fertilizer_Cost | .0070859 | .0010122 | 7.00 | 0.000 | .0050432 | .0091286 |
_cons | -10.21255 | 5.122402 | -1.99 | 0.053 | -20.54998 | .1248775 |
R-squared = 0.9356
Adj R-squared = 0.9248
Root MSE = 4.35, where the sum of squares of the residual = 794.739483 and degrees of freedom = 42
The above table implies that four of the seven explanatory variables have a statistically significant impact on productivity. Holding the other inputs constant, if land cost increases by 1 BDT then productivity increases by .0023952 mound, which is statistically significant at the 1 percent level of significance. Similarly, if labor cost increases by 1 BDT then productivity increases by .0019087 mound, which is statistically significant at the 5 percent level. Correspondingly, if seed cost or fertilizer cost increases by 1 BDT then productivity increases by .0021764 mound or .0070859 mound respectively, and both effects are statistically significant at the 1 percent level. The coefficients of rental cost, fuel cost and pesticide cost are not statistically significant (p-values of 0.453, 0.404 and 0.339).
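The regression table can be cross-checked with a few lines of arithmetic. The sketch below recomputes each t value as the coefficient divided by its standard error, and evaluates the fitted equation at the sample means from the descriptive statistics table. The dictionary names are hypothetical, and the numbers are the rounded figures shown in the tables, so results agree with the reported output only up to rounding.

```python
# Coefficients and standard errors copied from the regression table.
coef = {
    "Land_Cost": 0.0023952, "Rental_Cost": 0.000906, "Labor_Cost": 0.0019087,
    "Fuel_Cost": -0.0003217, "Seed_Cost": 0.0021764,
    "Pesticide_Cost": 0.0041359, "Fertilizer_Cost": 0.0070859,
}
std_err = {
    "Land_Cost": 0.0006418, "Rental_Cost": 0.001196, "Labor_Cost": 0.0008159,
    "Fuel_Cost": 0.0003816, "Seed_Cost": 0.0002896,
    "Pesticide_Cost": 0.0042724, "Fertilizer_Cost": 0.0010122,
}
intercept = -10.21255

# Sample means copied from the descriptive statistics table (BDT).
means = {
    "Land_Cost": 6456.6, "Rental_Cost": 5664.6, "Labor_Cost": 9259.0,
    "Fuel_Cost": 8561.76, "Seed_Cost": 4331.6,
    "Pesticide_Cost": 403.94, "Fertilizer_Cost": 1454.94,
}

# t statistic of each coefficient: estimate divided by its standard error.
t_stats = {name: coef[name] / std_err[name] for name in coef}

# Fitted productivity when every input is at its sample mean.
fitted_at_means = intercept + sum(coef[name] * means[name] for name in coef)

for name, t in t_stats.items():
    print(f"{name:16s} t = {t:6.2f}")
print(f"Fitted productivity at the means: {fitted_at_means:.2f} mounds")
```

An ordinary least squares line with an intercept passes through the point of sample means, so the fitted value at the means should come out close to the mean productivity of 46.71 mounds.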
R2 shows how much of the variation in the dependent variable is explained by the explanatory variables taken together. The value of R2 is about 0.9356, which shows that about 94 percent of the total variation in productivity is explained by the explanatory variables. In other words, R2 shows how well the data points fit the estimated line. Adjusted R2 also indicates how well the data fit the line, but adjusts for the number of terms in the model: it rises only when an added variable improves the fit by more than would be expected by chance. The value of adjusted R2 is about 0.9248, so about 92 percent of the variation is explained after adjusting for the number of explanatory variables, which is close to the unadjusted figure.
Dividing the sum of squares of the residual (794.739483) by its degrees of freedom (42) yields 18.9223686. That is the mean squared error (the mean sum of squares of the residual). Taking its square root gives the Root MSE of 4.35. Basically, the Root MSE is a measure of accuracy. A more accurate model has less error, leading to a smaller error sum of squares. The closer the Root MSE is to zero, the better the fit.
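The arithmetic behind the Root MSE and the adjusted R2 can be reproduced as follows. This is a minimal sketch using only the figures reported above. The adjusted R2 line uses the standard textbook formula, which is assumed to be the one behind the reported 0.9248, and the variable names are hypothetical.

```python
import math

# Figures copied from the regression output.
rss = 794.739483   # sum of squares of the residual
df_resid = 42      # residual degrees of freedom
n_obs = 50         # number of observations
k = 7              # number of explanatory variables
r2 = 0.9356        # reported R-squared (rounded)

# Mean squared error and its square root (Root MSE).
mse = rss / df_resid
root_mse = math.sqrt(mse)

# Standard adjusted R-squared formula (assumed, not stated in the report).
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / (n_obs - k - 1)

print(f"MSE = {mse:.7f}")
print(f"Root MSE = {root_mse:.2f}")
print(f"Adjusted R2 = {adj_r2:.4f}")
```

Because the reported R2 is rounded to four decimals, the recomputed adjusted R2 can differ from the reported value in the last digit.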
3. Correlation among variables
Here the author shows the correlations among productivity and the 7 explanatory variables: land cost, rental cost, labor cost, fuel cost, seed cost, pesticide cost and fertilizer cost. The author uses this statistical tool to describe the degree to which one variable is linearly related to another.
 | Productivity | Land | Rental | Labor | Fuel | Seed | Pestic | Fertil |
Productivity | 1.0000 | | | | | | | |
Land_Cost | 0.8953 | 1.0000 | | | | | | |
Rental_Cost | 0.8536 | 0.8926 | 1.0000 | | | | | |
Labor_Cost | 0.7537 | 0.8221 | 0.8262 | 1.0000 | | | | |
Fuel_Cost | 0.6841 | 0.8021 | 0.7832 | 0.7286 | 1.0000 | | | |
Seed_Cost | 0.6311 | 0.5734 | 0.4802 | 0.4788 | 0.4894 | 1.0000 | | |
Pesticide cost | 0.5185 | 0.5424 | 0.5486 | 0.5144 | 0.6030 | 0.1428 | 1.0000 | |
Fertilizer Cost | 0.3096 | 0.1352 | 0.2146 | -0.0285 | -0.0251 | -0.3451 | 0.2368 | 1.0000 |
The above table shows that each variable has a perfect positive relationship (1.0000) with itself. Land cost has a strong positive relationship with productivity (0.8953). Rental cost has a strong positive relationship with both productivity (0.8536) and land cost (0.8926). Labor cost shows a strong positive relationship with productivity, land cost and rental cost. Fuel cost shows a moderate, strong, strong and moderate positive relationship with productivity, land, rental and labor costs respectively. Likewise, seed cost has a moderate, moderate, weak, weak and weak positive relationship with productivity, land, rental, labor and fuel costs respectively. Similarly, pesticide cost shows a moderate positive relationship with productivity, land, rental, labor and fuel costs, and a weak positive relationship with seed cost (0.1428). Finally, the author finds that fertilizer cost has a weak positive relationship with productivity, land and rental costs, a weak negative relationship with labor, fuel and seed costs, and a weak positive relationship with pesticide cost.
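The verbal labels used above can be made explicit with a small helper. The report does not state its cut-off values, so the thresholds in this sketch (0.50 for "moderate" and 0.75 for "strong") are assumptions chosen because they reproduce the labels in the text. The function name is hypothetical.

```python
def describe_correlation(r: float, moderate: float = 0.50, strong: float = 0.75) -> str:
    """Label a correlation coefficient by strength and sign.

    The cut-offs are assumed values, not taken from the report.
    """
    size = abs(r)
    if size >= 1.0:
        strength = "perfect"
    elif size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "weak"
    sign = "positive" if r >= 0 else "negative"
    return f"{strength} {sign}"


# Correlations with productivity, copied from the first column of the table.
with_productivity = {
    "Land_Cost": 0.8953, "Rental_Cost": 0.8536, "Labor_Cost": 0.7537,
    "Fuel_Cost": 0.6841, "Seed_Cost": 0.6311,
    "Pesticide_Cost": 0.5185, "Fertilizer_Cost": 0.3096,
}

for name, r in with_productivity.items():
    print(f"{name:16s} {r:7.4f}  {describe_correlation(r)}")
```

With different cut-offs some labels would change. For example, the labor cost correlation of 0.7537 sits just above the assumed "strong" threshold.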
4. Single Sample t-test
a) T-test (Paired t-test)
Paired t test
Variable | Obs | Mean | Std. Err. | Std. Dev. | [95% Conf. Interval] |
Produc~d | 50 | 46.71 | 2.243635 | 15.86489 | 42.20125 51.21875 |
Labor_~t | 50 | 9259 | 222.1357 | 1570.737 | 8812.602 9705.398 |
diff | 50 | -9212.29 | 220.4496 | 1558.814 | -9655.3 -8769.28 |
mean(diff) = mean(Productivity_M~d - Labor_Cost)    t = -41.7886
Ho: mean(diff) = 0    degrees of freedom = 49
Ha: mean(diff) < 0    Ha: mean(diff) != 0    Ha: mean(diff) > 0
Pr(T < t) = 0.0000    Pr(|T| > |t|) = 0.0000    Pr(T > t) = 1.0000
• Mean (diff): This is the mean being tested. In this report it is mean(Productivity_M~d - Labor_Cost).
• t: This is the Student t-statistic. It is the ratio of the mean of the difference to the standard error of the difference: (-9212.29 / 220.4496) = -41.7886.
• degrees of freedom: The degrees of freedom for the paired observations is simply the number of observations minus 1. In this report the degrees of freedom is (50 - 1) = 49.
• Pr (T < t), Pr (T > t): These are the one-tailed p-values for evaluating the alternatives (mean < Ho value) and (mean > Ho value), respectively. If the p-value is less than the pre-specified alpha level (usually 0.05 or 0.01), the mean difference is statistically significantly less than or greater than zero, depending on the alternative tested.
• Pr (|T| > |t|): This is the two-tailed p-value, which is computed using the t-distribution. Here the p-value is 0.0000, so the mean difference is statistically significantly different from zero. The null hypothesis is therefore rejected in favor of the alternative hypothesis.
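The steps described in the list above can be sketched directly from the figures in the paired t-test output. The variable names are hypothetical, and the inputs are the rounded values from the output, so the result matches the reported t only up to rounding.

```python
import math

# Figures copied from the "diff" row of the paired t-test output.
n_obs = 50
mean_diff = -9212.29
sd_diff = 1558.814

# Standard error of the mean difference.
se_diff = sd_diff / math.sqrt(n_obs)

# Paired t statistic: mean difference divided by its standard error.
t_stat = mean_diff / se_diff

# Degrees of freedom: number of pairs minus one.
dof = n_obs - 1

print(f"Std. error of diff = {se_diff:.4f}")
print(f"t = {t_stat:.4f}, degrees of freedom = {dof}")
```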
5. Test for Normality
a) Shapiro-Wilk Test for Normality
Variable | Obs | W | V | z | Prob>z |
Productivi~d | 50 | 0.96895 | 1.460 | 0.808 | 0.20963 |
Land_Cost | 50 | 0.96863 | 1.475 | 0.829 | 0.20350 |
Rental_Cost | 50 | 0.88528 | 5.395 | 3.595 | 0.00016 |
Labor_Cost | 50 | 0.95853 | 1.951 | 1.425 | 0.07711 |
Fuel_Cost | 50 | 0.95634 | 2.054 | 1.535 | 0.06245 |
Seed_Cost | 50 | 0.92363 | 3.592 | 2.727 | 0.00320 |
Pesticide_~t | 50 | 0.91152 | 4.161 | 3.041 | 0.00118 |
Fertilizer~t | 50 | 0.84418 | 7.328 | 4.248 | 0.00001 |
The Shapiro–Wilk test checks the null hypothesis that a sample x1, ..., xn came from a normally distributed population. From this table the author finds that the null hypothesis of normality cannot be rejected for productivity (p = 0.20963) and land cost (p = 0.20350), so these sample data are consistent with a normally distributed population. For the remaining variables, namely rental cost, labor cost, fuel cost, seed cost, pesticide cost and fertilizer cost, normality is rejected at the 1 percent, 10 percent, 10 percent, 1 percent, 1 percent and 1 percent levels of significance respectively.
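The decision rule applied to the Shapiro–Wilk p-values can be written out as a short sketch. The function name and the set of significance levels (1, 5 and 10 percent) are assumptions for illustration. The p-values are copied from the Prob>z column, with the abbreviated variable names written out in full.

```python
from typing import Optional


def rejection_level(p_value: float, levels=(0.01, 0.05, 0.10)) -> Optional[float]:
    """Return the smallest listed level at which normality is rejected, or None."""
    for alpha in levels:
        if p_value < alpha:
            return alpha
    return None


# Prob>z values copied from the Shapiro-Wilk table.
shapiro_p = {
    "Productivity": 0.20963, "Land_Cost": 0.20350, "Rental_Cost": 0.00016,
    "Labor_Cost": 0.07711, "Fuel_Cost": 0.06245, "Seed_Cost": 0.00320,
    "Pesticide_Cost": 0.00118, "Fertilizer_Cost": 0.00001,
}

for name, p in shapiro_p.items():
    level = rejection_level(p)
    if level is None:
        verdict = "normality not rejected"
    else:
        verdict = f"normality rejected at the {level:.0%} level"
    print(f"{name:16s} p = {p:.5f}  {verdict}")
```

Note that labor cost and fuel cost are rejected only at the 10 percent level, so at the conventional 5 percent level normality would not be rejected for them.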
b) Normal Probability Plot
The normal probability plot is a graphical tool for comparing a data set with the normal distribution.
In the above figure, the x-axis is transformed so that a cumulative normal distribution function plots as a straight line. Then, using the mean and standard deviation, the data points are plotted along the fitted normal line. Here the author finds that productivity is approximately normally distributed, with little fluctuation: all the values are close to the 45° line. The curve starts above the normal line, twists to follow it, and ends above it, which indicates long tails. That is, the author sees a little more variance than would be expected in a normal distribution.
In the above figure, the author finds that land cost is approximately normally distributed, with little fluctuation. The curve starts above the normal line, twists to follow it, and ends above it, which indicates long tails. That is, the author sees somewhat more variance in land cost than would be expected in a normal distribution.
In this figure also, the author finds an approximately normal distribution for labor cost. The curve starts above the normal line, bends to follow it, and ends below it, which indicates long tails. That is, the author sees somewhat more variance in labor cost than would be expected in a normal distribution.
The above figure shows an approximately normal distribution, with little fluctuation, for seed cost. The curve starts above the normal line, curves to follow it, and ends above it, which indicates long tails. That is, the author sees more variance in seed cost than would be expected in a normal distribution.
c) Kernel-Density Plot
Kernel density estimators require a width (bandwidth) to be specified. In the graph above, the default width has been used. kdensity stores the width in the returned scalar bwidth. From this, the author found that the width is approximately 6.5296. The units of the width are the units of x, here the productivity variable being analyzed. The width is specified as a half-width, meaning that the kernel density estimator with a half-width of 6.5296 corresponds to sliding a window of size 13.0592 across the data. The sample values appear to be normally distributed.
For labor cost, the author found from kdensity that the width is approximately 646.4752. The units of the width are the units of x, here the labor cost variable being analyzed. The width is specified as a half-width, meaning that the kernel density estimator with a half-width of 646.4752 corresponds to sliding a window of size 1292.9504 across the data. The density is right-skewed, so the values do not appear to come from a normally distributed population: the symmetry condition of the normal distribution is violated in this graph.
For rental cost, the author found from kdensity that the width is approximately 305.0960. The units of the width are the units of x, here the rental cost variable being analyzed. The width is specified as a half-width, meaning that the kernel density estimator with a half-width of 305.0960 corresponds to sliding a window of size 610.1920 across the data. The sample values are not normally distributed. In a normal curve the mean, median and mode take the same value and lie at the centre, but the above graph violates this characteristic.
For pesticide cost, the author found from kdensity that the width is approximately 56.4428. The units of the width are the units of x, here the pesticide cost variable being analyzed. The width is specified as a half-width, meaning that the kernel density estimator with a half-width of 56.4428 corresponds to sliding a window of size 112.8856 across the data. Finally, the sample values are not normally distributed. This graph violates two characteristics of the normal distribution: it is not unimodal, and the mean, median and mode are not centred.
d) Quantile Normal Plot
The above figure shows that all the values are close to the 45° line. Therefore the sample appears to be taken from a normally distributed population. The point pattern is curved, with slope increasing from left to right.
In the above figure, all the values are close to the 45° line. Therefore the sample appears to be taken from a normally distributed population. The point pattern is curved, with slope increasing from left to right. It also shows short tails at both ends of the land cost distribution.
This figure also indicates that all the values are close to the 45° line. Therefore the sample appears to be taken from a normally distributed population. The point pattern is curved, with slope increasing from left to right. Here the labor cost data appear to have been rounded or to be discrete.
Finally, in this figure the points show some nonlinearity, but all the values are close to the 45° line. Therefore the sample appears to be taken from a normally distributed population. The point pattern is curved, with slope increasing from left to right.
6. Test for Multicollinearity
a) Variance Inflation Factor Test (VIF Test)
In multiple regression, the variance inflation factor (VIF) is used as an indicator of multicollinearity. Multicollinearity is said to be present when two or more independent variables in combination predict a very substantial percentage of another independent variable's variance. In other words, multicollinearity occurs when variables are so highly correlated with each other that it is difficult to obtain reliable estimates of their individual regression coefficients. The VIF specifically indicates how much the variance (and hence the standard error) of a particular beta weight is inflated by multicollinearity. As a rule of thumb, when VIF < 10 (equivalently, 1/VIF > 0.10) multicollinearity is not considered a problem. In contrast, when VIF > 10 (equivalently, 1/VIF < 0.10) multicollinearity is considered to exist. The author has computed the VIF as follows:
Variables | VIF | 1/VIF |
Land_Cost | 7.30 | 0.137029 |
Rental_Cost | 6.97 | 0.143401 |
Labor_Cost | 4.25 | 0.235131 |
Fuel_Cost | 3.82 | 0.261534 |
Seed_Cost | 2.15 | 0.464206 |
Fertilizer~t | 1.92 | 0.520474 |
Pesticide_~t | 1.83 | 0.546381 |
Mean VIF | 4.04 | |
The above table shows that there is no serious multicollinearity among the variables used in the model. The author has found that the VIF values of land cost, rental cost and labor cost are 7.30, 6.97 and 4.25, whereas the 1/VIF values of these variables are 0.137029, 0.143401 and 0.235131 respectively. Similarly, the VIF values of fuel cost, seed cost, fertilizer cost and pesticide cost are 3.82, 2.15, 1.92 and 1.83, and their 1/VIF values are 0.261534, 0.464206, 0.520474 and 0.546381 respectively. Every individual VIF is below 10 and every 1/VIF is above 0.10. The mean VIF is 4.04, which is also well below 10.
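The relationship between the two columns of the VIF table, and the rule of thumb applied to them, can be checked with a short sketch. The 1/VIF values are copied from the table, with the abbreviated variable names written out in full. The dictionary and variable names are hypothetical, and the threshold of 10 is the rule of thumb quoted in the text.

```python
# 1/VIF (tolerance) values copied from the VIF table.
tolerance = {
    "Land_Cost": 0.137029, "Rental_Cost": 0.143401, "Labor_Cost": 0.235131,
    "Fuel_Cost": 0.261534, "Seed_Cost": 0.464206,
    "Fertilizer_Cost": 0.520474, "Pesticide_Cost": 0.546381,
}

# VIF is the reciprocal of the tolerance.
vif = {name: 1.0 / tol for name, tol in tolerance.items()}

# Mean VIF across the seven explanatory variables.
mean_vif = sum(vif.values()) / len(vif)

# Rule of thumb from the text: VIF below 10 for every variable.
highest = max(vif, key=vif.get)
all_below_threshold = all(value < 10 for value in vif.values())

for name, value in vif.items():
    print(f"{name:16s} VIF = {value:5.2f}")
print(f"Mean VIF = {mean_vif:.2f}")
print(f"Highest VIF: {highest}; all below 10: {all_below_threshold}")
```

Checking every individual VIF, rather than the mean alone, is the safer reading of the rule, since a single variable with a VIF above 10 could be masked by low values elsewhere.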
7. Specification Error Test
a) Link Test for Specification Error
Productivity | Coef. | Std. Err. | t | P>t | [95% Conf. Interval] |
_hat | .686379 | .2178147 | 3.15 | 0.003 | .2481923 1.124566 |
_hatsq | .0032601 | .0022305 | 1.46 | 0.151 | -.0012271 .0077473 |
_cons | 6.783878 | 4.992431 | 1.36 | 0.181 | -3.259597 16.82735 |
From this table the author has found that the variable _hatsq is not significant (p-value = 0.151). This tells us that the link test finds no evidence of specification error, since the squared prediction adds no significant explanatory power.
8. Omitted Variable Bias Test
b) Ramsey Test for Omitted Variable
Ramsey RESET test using powers of the fitted values of Productivity
Ho: model has no omitted variables
F (3, 39) = 2.68
Prob > F = 0.0600
The null hypothesis is rejected at the 10 percent level of significance, since Prob > F is 0.0600, although it is not rejected at the 5 percent level. At the 10 percent level, therefore, there is evidence of omitted variable bias in the model.
Step 8: Findings of the Report
1. The author has set up a cross-sectional data analysis in this report, as the data were surveyed during different weeks of the same year.
2. The author found that four variables in this model, namely land cost, labor cost, seed cost and fertilizer cost, have a statistically significant impact on productivity. Among these, land cost, seed cost and fertilizer cost are significant at the 1 percent level of significance, whereas labor cost is significant at the 5 percent level.
3. The value of R2 is about 0.9356, which shows that about 94 percent of the total variation in productivity is explained by the explanatory variables.
4. The correlations between productivity and the different costs show strong and moderate positive relationships. Only fertilizer cost has a weak positive relationship with productivity.
5. In the paired t-test, the author found that the mean difference is statistically significantly different from zero. So the null hypothesis is rejected in favor of the alternative hypothesis.
6. In this report the author found approximately normal distributions, with little fluctuation, from the normal probability plots.
7. In the kernel density plots, some variables appear to be sampled from a normally distributed population, whereas others do not.
8. The quantile normal plots show that all the values are close to the 45° line. Therefore the samples appear to be taken from normally distributed populations.
9. There is no serious multicollinearity among the variables used in the model.
10. The link test finds no evidence of specification error in this model.
11. The value of Prob > F in the Ramsey RESET test is 0.0600. So there is evidence of omitted variable bias in the model at the 10 percent level of significance.