Journal of Animal and Veterinary Advances

Year: 2009
Volume: 8
Issue: 1
Page No. 47 - 50

Use of Factor Analysis Scores in Multiple Regression Model for Estimation of Body Weight from Some Body Measurements in Lizardfish

Authors: Levent Sangun, Soner Cankaya, G. Tamer Kayaalp and Mustafa Akar

Abstract: The aim of this study was to evaluate the usefulness of factor analysis scores in a multiple linear regression model used to estimate body weight from several body measurements (total length, standard length, fork length, head length, body depth, body circuit, body height) of Lizardfish from Iskenderun Bay. The factor analysis showed that 3 factors with eigenvalues greater than 1 can be selected as explanatory variables and used to estimate the body weight of Lizardfish in a multiple linear regression model. These factors accounted for 98.4% of the total variation in body weight.

How to cite this article:

Levent Sangun, Soner Cankaya, G. Tamer Kayaalp and Mustafa Akar, 2009. Use of Factor Analysis Scores in Multiple Regression Model for Estimation of Body Weight from Some Body Measurements in Lizardfish. Journal of Animal and Veterinary Advances, 8: 47-50.

INTRODUCTION

Information on body measurements is essential for estimating the body weight of fish. Multiple regression analysis has been used to interpret the complex relationships between body weight and body measurements (total length, standard length, fork length, head length, body depth, body circuit, body height, etc.) of fish (Cankaya et al., 2006; Akar et al., 2001). Although this method is helpful for estimating the body weight of fish, there are several reasons why researchers are often not satisfied with the results. One of them is its poor performance when multicollinearity is present among the variables, in which case the biological interpretation of the results may be misleading.

The specific goals of factor analysis are to reduce a large number of observed variables to a smaller number of factors and to provide a regression equation for an underlying process by using observed variables (Tabachnick and Fidell, 2001; Keskin et al., 2007). Factor scores can be derived such that they are nearly uncorrelated or orthogonal. Thus, the problem of multicollinearity among the variables used to estimate the body weight of fish can be solved by using factor scores in place of the original variables.

The aim of this study was to evaluate the usefulness of factor analysis scores in a multiple linear regression model used to estimate body weight from several body measurements (total length, standard length, fork length, head length, body depth, body circuit, body height) of Lizardfish from Iskenderun Bay.

MATERIALS AND METHODS

Data were collected between 2003 and 2005 from the north-eastern Mediterranean coast of Turkey. Specimens were caught by trawl ranging from 20-100 m. The Lizardfish were weighed with a digital balance to an accuracy of 0.01 g and measured with a precision of 0.01 cm for total length, standard length, fork length, head length, body depth, body circuit and body height.

Linear regression analysis consists of a collection of techniques used to explore relationships between variables. The aim of multiple regression is to estimate β = (β1, β2, …, βp)ᵀ from the data (Xi1, Xi2, …, Xip; Yi) (Cankaya et al., 2006). The general expression of the multiple linear regression model formed for the measurements (one dependent and p independent variables) is given in Eq. 1.

$$y_i = \beta_0 X_{i1}^{\beta_1} X_{i2}^{\beta_2} \cdots X_{ip}^{\beta_p} e_i \qquad (1)$$

where:

e = Random error, usually assumed to be normally distributed with mean zero and variance σ²
y = Dependent variable or response
Xi1, Xi2, …, Xip = Independent variables or predictors

When the dependent (Y) values are plotted against the independent (X) values, the relationship cannot always be represented by a straight line; that is, it may be curvilinear. To linearize the relationship, the X and Y values are log-transformed. Taking the logarithm of Eq. 1 gives:

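$$\ln y_i = \ln \beta_0 + \beta_1 \ln X_{i1} + \beta_2 \ln X_{i2} + \cdots + \beta_p \ln X_{ip} + \ln e_i \qquad (2)$$

or

$$Y_i = a + \beta_1 z_{i1} + \beta_2 z_{i2} + \cdots + \beta_p z_{ip} + \delta_i \qquad (3)$$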

where Y = ln y; zi1 = ln Xi1, zi2 = ln Xi2, ..., zip = ln Xip; β1, β2, ..., βp and a = ln β0 are the parameters; and Xij (j = 1, 2, …, 7) are the independent variables, which are total length (cm), standard length (cm), fork length (cm), head length (cm), body depth (cm), body circuit (cm) and body height (cm), respectively.

δi ~ (0, σ²), or ei, is the random error (Draper and Smith, 1981; Gunst and Mason, 1980; Kleinbaum et al., 1998). In multiple regression analysis, the t-test statistic given in Eq. 4 is used to test the significance of the regression coefficients.

$$t_j = \frac{b_j}{\sqrt{\mathrm{var}(b_j)}} \qquad (4)$$

where bj (j = 1, 2, ..., p) is the least squares estimate of βj, var(bj) is the corresponding diagonal element of the matrix s²(X’X)⁻¹ and s² is the residual Mean Square (MS) obtained from the ANOVA. The null hypothesis tested is βj = 0.
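The log-transformed regression and the t-test of Eq. 4 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `fit_loglog_ols`, the two hypothetical predictors and the synthetic data (including the exponents 2.5 and 0.5) are invented here and are not the lizardfish measurements or the authors' MINITAB/SPSS workflow.

```python
import numpy as np

def fit_loglog_ols(X, y):
    """Fit ln(y) = a + sum_j beta_j * ln(X_j) + delta by ordinary least squares.

    X: (n, p) array of positive measurements; y: (n,) array of positive weights.
    Returns the estimates (intercept first), their standard errors and t values.
    """
    Z = np.column_stack([np.ones(len(y)), np.log(X)])  # design matrix with intercept
    Y = np.log(y)
    n, k = Z.shape
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    b = ZtZ_inv @ Z.T @ Y                  # least squares estimates
    resid = Y - Z @ b
    s2 = resid @ resid / (n - k)           # residual mean square
    se = np.sqrt(s2 * np.diag(ZtZ_inv))    # sqrt of the diagonal of s^2 (Z'Z)^-1
    t = b / se                             # t statistic for H0: beta_j = 0
    return b, se, t

# Synthetic illustration only (assumed values, not the study data).
rng = np.random.default_rng(0)
n = 200
length = rng.uniform(10.0, 30.0, n)        # hypothetical length (cm)
depth = rng.uniform(1.5, 4.0, n)           # hypothetical body depth (cm)
weight = 0.01 * length**2.5 * depth**0.5 * np.exp(rng.normal(0.0, 0.05, n))

b, se, t = fit_loglog_ols(np.column_stack([length, depth]), weight)
print("a, b1, b2:", np.round(b, 3))
print("t values :", np.round(t, 1))
```

Because the synthetic weights follow a multiplicative model, the fitted slopes should land close to the assumed exponents, and the intercept close to ln(0.01).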

One of the important problems in using multiple linear regression analysis is multicollinearity among the predictor variables used to estimate the body weight of fish. Multicollinearity is a statistical term for the existence of a high degree of linear correlation among two or more explanatory variables in a multiple regression model. In the presence of multicollinearity, it is difficult to assess the effect of the independent variables on the dependent variable (Anonymous, 2008). To detect multicollinearity, the tolerance or the Variance Inflation Factor (VIF) should be calculated by means of the following equations:

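$$\mathrm{Tolerance}_j = 1 - R_j^2, \qquad \mathrm{VIF}_j = \frac{1}{\mathrm{Tolerance}_j} = \frac{1}{1 - R_j^2} \qquad (5)$$

where R_j^2 is the coefficient of determination obtained by regressing the j-th independent variable on the remaining independent variables.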

The largest VIF value among all independent variables is used as an indicator of the severity of multicollinearity. A maximum VIF value in excess of 10 is taken as an indication that multicollinearity may be unduly influencing the least squares estimates in multiple linear regression (Neter et al., 1989).
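As a rough sketch of this diagnostic, the VIF of each predictor can be computed by regressing it on the other predictors and applying VIF = 1/(1 - R²). The helper name `vif` and the synthetic variables below (a nearly collinear pair standing in for standard and fork length, plus an unrelated third variable) are assumptions made for illustration only.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of an (n, p) predictor matrix."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)           # VIF_j = 1 / tolerance_j
    return out

# Synthetic illustration only (assumed values, not the study data).
rng = np.random.default_rng(1)
n = 300
sl = rng.normal(20.0, 3.0, n)               # hypothetical standard length
fl = 1.08 * sl + rng.normal(0.0, 0.3, n)    # hypothetical fork length, nearly collinear
bd = rng.normal(3.0, 0.5, n)                # hypothetical body depth, unrelated here

vifs = vif(np.column_stack([sl, fl, bd]))
print(np.round(vifs, 1))
```

With these assumed inputs the first two VIFs fall well above the threshold of 10, while the third stays close to 1.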

To overcome the limitations of the multiple linear regression method, regression on factor scores estimated by factor analysis can be preferred over the classical method when varying degrees of multicollinearity are present among the examined variables.

The goals of factor analysis are to determine the number of fundamental influences underlying a domain of variables, to quantify the extent to which each variable is associated with the factors and to obtain information about their nature by observing which factors contribute to performance on which variables (Tinsley and Brown, 2000). This allows numerous intercorrelated variables to be condensed into fewer dimensions, called factors.

The basic factor analysis equation can be presented in matrix form as:

$$Z = \lambda F + \varepsilon \qquad (6)$$

where:
Z = a p × 1 vector of variables
λ = a p × m matrix of factor loadings
F = an m × 1 vector of factors
ε = a p × 1 vector of errors (Sharma, 1996)

In our study, the correlation matrix of the variables was used to obtain eigenvalues. To facilitate interpretation of the factor loadings (lik), VARIMAX rotation was used. Factor coefficients (cik) were used to obtain factor scores for the selected factors (Keskin et al., 2007). Factor scores can be derived such that they are nearly uncorrelated or orthogonal. Thus, the problem of multicollinearity among the variables used to estimate the body weight of fish can be solved by using the factor scores obtained from these coefficients. The number of factors equals the number of eigenvalues of the population correlation matrix that are greater than unity (Tinsley and Brown, 2000). Therefore, the factors with eigenvalues >1 were employed in the multiple regression analysis (Sharma, 1996).

All computations to estimate Body Weight (BW) from the body measurements (Total Length (TL), Standard Length (SL), Fork Length (FL), Head Length (HL), Body Depth (BD), Body Circuit (BC) and Body Height (BH)) of Lizardfish from Iskenderun Bay were performed with the MINITAB and SPSS statistical packages.

RESULTS AND DISCUSSION

The descriptive statistics for Lizardfish traits are given in Table 1.

Transformed data for all traits were tested for normality with the Kolmogorov-Smirnov test in SPSS (version 10.0) and were normally distributed (p>0.05).

Bivariate correlations displaying the relationship among all morphological characters considered are given in Table 2.

There were positive relationships among all body measurements and the body weight of Lizardfish (Table 2). The highest correlation was found between standard length and fork length (0.99), while the lowest was between body depth and total length (0.52) (p<0.01). When multiple linear regression is used to analyze a data set, as the magnitude of the relationships among the independent variables increases (e.g., SL and FL, r = 0.99), less and less reliance can be placed on the results generated by an ordinary least squares solution.

The standard errors, t-values, p-values and VIF values for each regression coefficient (βj) from the multiple regression analysis are given in Table 3.

Table 3 shows that fork length, body depth and head length were not significant. Moreover, there was multicollinearity between standard length and fork length, as indicated by the largest VIF values (41.6 and 32.1) for these two traits. These results indicate that the standard errors are inflated (for example, the standard error for FL is 0.34, while the value of the mean is 0.24), resulting in unstable parameter estimates.

The factor analysis showed that the first three of the seven factors could be selected as independent variables for the multiple regression model because these three factors have eigenvalues >1. Because eigenvalues represent variances and each standardized variable contributes a variance of 1 to principal component extraction, a component with an eigenvalue less than 1 is not as important (Keskin et al., 2007; Tabachnick and Fidell, 2001). The three selected factors explain 85.6% of the total variation (Table 4). Factors 1-3 accounted for 31.5, 29.9 and 24.2% of the variation (Var) in all variables, respectively.

Table 1: Descriptive statistics for examined traits of lizardfish
BW: Body Weight; TL: Total Length; SL: Standard Length; FL: Fork Length; HL: Head Length; BD: Body Depth; BC: Body Circuit; BH: Body Height; CI: Confidence Interval

Table 2: Bivariate correlation for some body traits and body weight of lizardfish
**p<0.01

Table 3: Results of multiple regression analysis
S = 0.043, R-Sq = 98.5%, R-Sq(adj) = 98.2%

Table 4: Results of factor analysis

Table 5: Results of multiple regression analysis based on the result of factor analysis
S = 2.268, R-Sq = 98.4%, R-Sq(adj) = 98.3%

Moreover, the first factor accounted for 36.8% (2.206/5.989) of the variation in the solution, the second for 34.9% (2.090/5.989) and the third for 28.3% (1.693/5.989).
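The percentages quoted for the three factors can be checked with a short calculation. The only inputs are the rotated factor variances given in the text (2.206, 2.090 and 1.693) and the seven body measurements used as variables; the variable names are chosen here for illustration.

```python
rotated_var = [2.206, 2.090, 1.693]   # factor variances quoted in the text
n_vars = 7                            # TL, SL, FL, HL, BD, BC, BH

total = sum(rotated_var)              # variance captured by the three factors

# Share of the variation in all seven standardized variables.
share_of_all = [round(100 * v / n_vars, 1) for v in rotated_var]
cumulative = round(100 * total / n_vars, 1)

# Share of the variation within the three-factor solution.
share_of_solution = [round(100 * v / total, 1) for v in rotated_var]

print(share_of_all)        # [31.5, 29.9, 24.2]
print(cumulative)          # 85.6
print(share_of_solution)   # [36.8, 34.9, 28.3]
```

Dividing by 7 reproduces the shares of total variation, and dividing by the sum 5.989 reproduces the shares within the solution.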

After orthogonal rotation, the factor loadings showed the relationship between the examined variables and the corresponding factors (Table 4). Standard length, fork length and head length were highly correlated with Factor 1; body depth and body height were highly correlated with Factor 2; and total length was highly correlated with Factor 3. The high communalities indicate that the variances of the variables were efficiently reflected by the factors used in the multiple regression analysis. Factor scores for the three factors, obtained by means of the factor score coefficients given in Table 4, were used as independent variables in the regression analysis to determine which factors had a significant effect on the body weight of Lizardfish (Table 5).

It was found that all of the selected factors had a significant effect on body weight. The factors explained 98.4% of the variance in the body weight of Lizardfish. Moreover, because the VIF values for the factors were smaller than 10, the multicollinearity problem seen in Table 3 was solved (Table 5).
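Why regression on factor scores removes the multicollinearity can be illustrated on synthetic data. The sketch below makes a simplifying assumption: it uses unrotated standardized principal component scores rather than VARIMAX-rotated factor scores (an orthogonal rotation would change neither the VIFs nor R²). The helper `vif_from_corr`, the "size" and "shape" signals and all numeric values are hypothetical.

```python
import numpy as np

def vif_from_corr(X):
    """VIFs as the diagonal of the inverse correlation matrix of the predictors."""
    return np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))

# Synthetic illustration only (assumed values, not the study data).
rng = np.random.default_rng(3)
n = 300
size = rng.normal(size=n)                    # hypothetical common "size" signal
shape = rng.normal(size=n)                   # hypothetical "shape" signal
X = np.column_stack([
    size + 0.1 * rng.normal(size=n),
    size + 0.1 * rng.normal(size=n),
    shape + 0.1 * rng.normal(size=n),
    shape + 0.1 * rng.normal(size=n),
])
y = 2.0 * size + 1.0 * shape + 0.2 * rng.normal(size=n)

# Standardized component scores for the eigenvalues greater than 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigval, eigvec = np.linalg.eigh(np.corrcoef(X, rowvar=False))
keep = eigval > 1.0
scores = Z @ eigvec[:, keep] / np.sqrt(eigval[keep])

vif_raw = vif_from_corr(X)                   # inflated by the collinear pairs
vif_scores = vif_from_corr(scores)           # equal to 1 for orthogonal scores

# Regress y on the scores and compute R^2.
D = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
r2 = 1.0 - np.sum((y - D @ coef) ** 2) / np.sum((y - y.mean()) ** 2)

print("VIF, raw predictors:", np.round(vif_raw, 1))
print("VIF, scores        :", np.round(vif_scores, 3))
print("R^2 on scores      :", round(r2, 3))
```

In this setting the raw predictors show VIFs far above 10, while the scores have VIFs of 1 and still explain most of the variation in the response.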

CONCLUSION

This study compared classical multiple linear regression with regression based on factor scores for building models and drawing conclusions in the presence of multicollinearity among the examined predictor variables. It therefore provides an introduction to a variety of multiple linear regression methods.
