UtilitasMathematica, ISSN 0315-3681, Volume 120, 2023

Some robust methods for estimating random linear regression model

Amani Emad¹, Ahmed Shaker Mohammed Tahir²

¹Mustansiriyah University, College of Management and Economics, Department of Statistics, Baghdad, Iraq, amaniemad93@uomustansiriyah.edu.iq
²Mustansiriyah University, College of Management and Economics, Department of Statistics, Baghdad, Iraq, ahmutwali@uomustansiriyah.edu.iq

Abstract

The regression model with a random explanatory variable is one of the models widely used to represent the regression relationship between variables in various economic and life phenomena; it is called the random linear regression model. In this model the different values of the explanatory variable occur with a certain probability, instead of being fixed in repeated samples, which contradicts one of the basic assumptions of the linear regression model. As a result, the least squares estimators lose some or all of their optimal properties, depending on the nature of the relationship between the random explanatory variable and the random error terms. As an alternative to the ordinary least squares method, the modified maximum likelihood method, previously used by many researchers, was applied to estimate the coefficients of the random regression model. In addition, two methods not previously used for this model were employed, the M method and the robust empirical likelihood method; these have been used to estimate linear regression models that suffer from certain standard problems, or whose sample values contain outliers or extreme values. All three are robust estimation methods.
To compare the aforementioned estimation methods, simulation experiments were conducted; they revealed a preference for the modified maximum likelihood method, despite the close agreement among the estimation results of the three methods. We also carried out a practical application, estimating the regression relationship between packed cell volume (PCV) as the response variable and random blood sugar (RBS) as a random explanatory variable, based on the data of a random sample of 30 patients with heart disease. The results of the practical application were consistent with the results of the simulation experiments.

Keywords: random explanatory variable; random linear regression; modified maximum likelihood; M method; robust empirical likelihood.

Introduction

One of the basic assumptions of the linear regression model is that the explanatory variables are constant in repeated samples. However, this assumption may not hold in practice, as in most economic and life phenomena the explanatory variables are random: their values are not fixed, so the different values of the explanatory

variables occur with a certain probability. In such cases the values of the explanatory variable are determined along with the values of the response variable by some probability mechanism, rather than being controlled by the experimenters. If the method of least squares is adopted to estimate the coefficients of a regression model with random explanatory variables, its estimates lose some or all of their optimal properties, depending on the type of relationship between the explanatory variables and the random error terms. In this case, alternative methods to least squares should be sought that are efficient and not greatly affected by the deviation from the assumptions of the linear regression model. Among these are the robust methods, including the modified maximum likelihood method (MMLE), the M method, and the robust empirical likelihood method (REL). Kerridge, 1967, [1] was the first to deal with the regression model with random explanatory variables, assuming that the values of the explanatory variables corresponding to the values of the response variable were drawn from a population with a multivariate normal distribution, independent of the random error terms, and he analyzed the error resulting from the estimation process. He explained that it is useful to study the regression model with random explanatory variables, despite Ehrenberg's suggestion in 1963 that studying this model is useless and limited, especially when the number of explanatory variables is large. Lai and Wei, 1981, [2] studied the properties of least squares estimators for random regression models under special assumptions on the random explanatory variable and the random error terms.
Tiku and Vaughan, 1997, [3] used the modified maximum likelihood method to estimate the coefficients of the bivariate regression model with a random explanatory variable. They showed that maximum likelihood estimation of this model is intractable and difficult to solve iteratively, whereas the modified maximum likelihood estimators are explicit functions of the sample observations and therefore easy to calculate and approximately efficient; for small samples they are almost fully efficient. Vaughan and Tiku, 2000, [4] again used the modified maximum likelihood method to estimate the regression model with a random explanatory variable, assuming that this variable follows the extreme value distribution and that the conditional distribution of the response variable is the normal distribution; they showed that these estimates are highly efficient, and they also derived hypothesis tests for the model coefficients. Islam and Tiku, 2004, [5] derived modified maximum likelihood estimators and M estimators for the coefficients of the multiple linear regression model under the assumption that the random errors do not follow the normal distribution, concluding that these estimators are efficient and robust and that the ordinary least squares estimators are less efficient. Sazak et al., 2006, [6] used modified maximum likelihood estimators, characterized by efficiency and robustness, for the coefficients of the random linear regression model, assuming that both the random explanatory variable and the random errors follow a generalized logistic distribution. The same result was reached by Islam and Tiku, 2010, [7], who adopted the same robust method to estimate the multiple linear regression model when both the explanatory variable and the random errors do not follow the normal distribution.
Bharali and Hazarika, 2019, [8] provided a historical review of the works related to regression with a stochastic explanatory variable, referring to the properties of ordinary least

squares estimators, the use of the modified maximum likelihood method, and the case of non-normal distribution of the explanatory variables and random error terms. Deshpande and Bhat, 2019, [9] used a kernel regression estimator, one of the nonparametric estimation methods, to estimate the random linear regression model. They suggested the use of the adaptive Nadaraya-Watson estimator, obtained its asymptotic distribution, and showed through relative efficiency that this estimator is efficient compared with other estimators based on different kernel functions. Lateef and Ahmed, 2021, [10] used the modified maximum likelihood method to estimate the parameters of the multiple regression model when the random errors are non-normally distributed; through a simulation study they showed the performance of the modified maximum likelihood estimators compared with the ordinary least squares estimators. In this research, the modified maximum likelihood method is used to estimate the coefficients of the linear regression model with a random explanatory variable, together with two robust methods, the M method and the robust empirical likelihood method. These two methods have been used to estimate regression coefficients in the presence of outliers or when the model suffers from certain problems, but they have not previously been used for the regression model with a random explanatory variable, so they are employed here to estimate the coefficients of this model. The research aims to compare the aforementioned robust estimation methods by the mean square error criterion, using simulation experiments together with an application to real data.

1. Linear regression model with a random explanatory variable:

Consider the simple linear regression model given by

$$y_i = \beta_0 + \beta_1 x_i + u_i \qquad (1)$$

For a random sample of size n, $y_i$
is a random variable with independent values representing the response variable, assumed normally distributed with mean $E(y_i)$ and variance $\sigma^2$; $x_i$ represents the explanatory variable, assumed constant in repeated samples; and $u_i$ denotes the random errors of the model, whose values are assumed independent and normally distributed with mean 0 and fixed variance $\sigma^2$. In many practical applications the explanatory variable is not fixed but random. In this case the regression model of equation (1) is called the random regression model [7]. The random explanatory variable can follow a normal distribution or any other probability distribution. Assume that the probability distribution of the explanatory variable X is the extreme value distribution with the following probability density function [4]:

$$f(x) = \frac{1}{\theta}\exp\left[-\frac{x-\alpha}{\theta} - \exp\left(-\frac{x-\alpha}{\theta}\right)\right], \qquad -\infty < x < \infty \qquad (2)$$

where α represents the location parameter and θ the scale parameter. When α = 0 and θ = 1 we obtain the standard extreme value distribution. The conditional probability distribution of the response variable Y is the normal distribution, with the following conditional probability density function:

$$f(y\,|\,X=x) = \left[2\pi\sigma^2(1-\rho^2)\right]^{-1/2}\exp\left[-\frac{1}{2\sigma^2(1-\rho^2)}\left(y-\beta_0-\frac{\rho\sigma}{\theta}(x-\alpha)\right)^2\right], \quad -\infty < y < \infty \qquad (3)$$

with $\sigma,\theta > 0$ and $-1 < \rho < 1$, where $\beta_0 \in \mathbb{R}$ and $\beta_1 = \rho\sigma/\theta$. The random error terms are normally distributed with conditional mean equal to zero ($E(u\,|\,X=x)=0$) and fixed conditional variance ($\mathrm{Var}(u\,|\,x) = \sigma^2(1-\rho^2)$), [6]; note that $E(y\,|\,X=x) = \beta_0 + \frac{\rho\sigma}{\theta}(x-\alpha)$, and u is given by

$$u = y - \beta_0 - \frac{\rho\sigma}{\theta}(x-\alpha) \qquad (4)$$

2. The robust estimation methods for the coefficients of the random linear regression model:

This section presents the robust estimation methods under study, which were adopted in estimating the simple linear regression model with a random explanatory variable.

3.1: Modified Maximum Likelihood Method (MMLE):

Estimating the coefficients of the regression model with a random explanatory variable in equation (1) by the maximum likelihood method, using the probability functions (2) and (3), is difficult and unproductive: the equations resulting from differentiating the likelihood function have no explicit solutions, and solving them by iterative methods is fraught with difficulties, since the iterations may fail to converge rapidly, may converge to false values, or may encounter multiple roots. Therefore the modified maximum likelihood method proposed by Tiku and Vaughan, [3,4], is used. The joint probability density function of the response variable Y and the explanatory variable X is

$$f(x,y) = f(x)\,f(y\,|\,x) = \frac{1}{\theta}\exp\left[-\frac{x-\alpha}{\theta}-\exp\left(-\frac{x-\alpha}{\theta}\right)\right]\left[2\pi\sigma^2(1-\rho^2)\right]^{-1/2}\exp\left[-\frac{u^2}{2\sigma^2(1-\rho^2)}\right] \qquad (5)$$

Therefore the likelihood function takes the form

$$L = (2\pi)^{-n/2}\,\theta^{-n}\sigma^{-n}\left(1-\rho^2\right)^{-n/2}\exp\left[-\sum_{i=1}^{n}\left(\frac{x_i-\alpha}{\theta}+\exp\left(-\frac{x_i-\alpha}{\theta}\right)\right)\right]\exp\left[-\frac{1}{2\sigma^2(1-\rho^2)}\sum_{i=1}^{n}u_i^2\right] \qquad (6)$$

and the logarithm of the likelihood function is, assuming $z_i = (x_i-\alpha)/\theta$,

$$\ln L = -n\ln\theta - n\ln\sigma - \sum_{i=1}^{n}\left(z_i + e^{-z_i}\right)$$
$$\quad -\,\frac{n}{2}\ln\left(1-\rho^2\right) - \frac{1}{2\sigma^2(1-\rho^2)}\sum_{i=1}^{n}u_i^2 \qquad (7)$$

By partially differentiating equation (7) with respect to the coefficients (θ, α, β₀, β₁, σ, ρ) and setting the derivatives equal to zero, we obtain the following maximum likelihood equations, whose solution gives the estimates of those coefficients:

$$\frac{\partial \ln L}{\partial \alpha} = \frac{n}{\theta} - \frac{1}{\theta}\sum_{i=1}^{n}e^{-z_i} - \frac{\rho}{\sigma\theta(1-\rho^2)}\sum_{i=1}^{n}u_i = 0 \qquad (8)$$

$$\frac{\partial \ln L}{\partial \theta} = -\frac{n}{\theta} + \frac{1}{\theta}\sum_{i=1}^{n}z_i - \frac{1}{\theta}\sum_{i=1}^{n}z_i e^{-z_i} - \frac{\rho}{\sigma\theta(1-\rho^2)}\sum_{i=1}^{n}z_i u_i = 0 \qquad (9)$$

$$\frac{\partial \ln L}{\partial \beta_0} = \frac{1}{\sigma^2(1-\rho^2)}\sum_{i=1}^{n}u_i = 0 \qquad (10)$$

$$\frac{\partial \ln L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3(1-\rho^2)}\sum_{i=1}^{n}u_i^2 + \frac{\rho}{\sigma^2(1-\rho^2)}\sum_{i=1}^{n}z_i u_i = 0 \qquad (11)$$

$$\frac{\partial \ln L}{\partial \rho} = \frac{n\rho}{1-\rho^2} + \frac{1}{\sigma(1-\rho^2)}\sum_{i=1}^{n}z_i u_i - \frac{\rho}{\sigma^2(1-\rho^2)^2}\sum_{i=1}^{n}u_i^2 = 0 \qquad (12)$$

$$\frac{\partial \ln L}{\partial \beta_1} = \frac{1}{\sigma^2(1-\rho^2)}\sum_{i=1}^{n}z_i u_i = 0 \qquad (13)$$

Equations (8) and (9) contain intractable terms, so the system (8)-(13) has no explicit solutions and must be solved by iterative methods; the maximum likelihood estimates are therefore difficult to obtain. To address this problem, we convert the explanatory variable into the ordered statistics $x_{(i)}$, $1 \le i \le n$, by arranging its values in ascending order:

$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)} \qquad (14)$$

Corresponding to the ordered values $x_{(i)}$, $y_{[i]}$ denotes the values of the response variable (the concomitants), and the standardized values $z_i$ and the random errors $u_i$ are rewritten as

$$z_{(i)} = \frac{x_{(i)}-\alpha}{\theta} \qquad (15)$$

$$u_{[i]} = y_{[i]} - \beta_0 - \frac{\rho\sigma}{\theta}\left(x_{(i)}-\alpha\right) \qquad (16)$$

Since sums are unaffected by the ordering of their terms, $\sum_{i=1}^{n}u_{[i]} = \sum_{i=1}^{n}u_i$ and $\sum_{i=1}^{n}z_{(i)}u_{[i]} = \sum_{i=1}^{n}z_i u_i$. Based on the above, equations (8)-(13) are rewritten as

$$\frac{\partial \ln L^{*}}{\partial \alpha} \equiv \frac{n}{\theta} - \frac{1}{\theta}\sum_{i=1}^{n}e^{-z_{(i)}} - \frac{\rho}{\sigma\theta(1-\rho^2)}\sum_{i=1}^{n}u_{[i]} = 0 \qquad (17)$$

$$\frac{\partial \ln L^{*}}{\partial \theta} \equiv -\frac{n}{\theta} + \frac{1}{\theta}\sum_{i=1}^{n}z_{(i)} - \frac{1}{\theta}\sum_{i=1}^{n}z_{(i)}e^{-z_{(i)}} - \frac{\rho}{\sigma\theta(1-\rho^2)}\sum_{i=1}^{n}z_{(i)}u_{[i]} = 0 \qquad (18)$$

$$\frac{\partial \ln L^{*}}{\partial \beta_0} \equiv \frac{1}{\sigma^2(1-\rho^2)}\sum_{i=1}^{n}u_{[i]} = 0 \qquad (19)$$

$$\frac{\partial \ln L^{*}}{\partial \sigma} \equiv -\frac{n}{\sigma} + \frac{1}{\sigma^3(1-\rho^2)}\sum_{i=1}^{n}u_{[i]}^2 + \frac{\rho}{\sigma^2(1-\rho^2)}\sum_{i=1}^{n}z_{(i)}u_{[i]} = 0 \qquad (20)$$

$$\frac{\partial \ln L^{*}}{\partial \rho} \equiv \frac{n\rho}{1-\rho^2} + \frac{1}{\sigma(1-\rho^2)}\sum_{i=1}^{n}z_{(i)}u_{[i]} - \frac{\rho}{\sigma^2(1-\rho^2)^2}\sum_{i=1}^{n}u_{[i]}^2 = 0 \qquad (21)$$

$$\frac{\partial \ln L^{*}}{\partial \beta_1} \equiv \frac{1}{\sigma^2(1-\rho^2)}\sum_{i=1}^{n}z_{(i)}u_{[i]} = 0 \qquad (22)$$

Equations (17)-(22) represent the modified maximum likelihood equations. To achieve a linear form of $\exp(-z_{(i)})$, the first two terms of the Taylor series can be used, assuming $t_{(i)} = E[z_{(i)}]$:

$$\exp\left(-z_{(i)}\right) \cong \exp\left(-t_{(i)}\right) + \left[z_{(i)}-t_{(i)}\right]\left\{\frac{d}{dz}e^{-z}\right\}_{z=t_{(i)}} \qquad (23)$$

which can be rewritten as

$$\exp\left(-z_{(i)}\right) = a_i - b_i z_{(i)}, \qquad 1 \le i \le n \qquad (24)$$

where

$$a_i = e^{-t_{(i)}}\left(1+t_{(i)}\right) \qquad (25)$$

$$b_i = -\left\{\frac{d}{dz}e^{-z}\right\}_{z=t_{(i)}} = e^{-t_{(i)}} \qquad (26)$$

noting that $b_i \ge 0$ for every i = 1, 2, …, n. After substituting equation (24), equations (17) and (18) become

$$\frac{\partial \ln L^{*}}{\partial \alpha} \equiv \frac{n}{\theta} - \frac{1}{\theta}\sum_{i=1}^{n}\left(a_i - b_i z_{(i)}\right) - \frac{\rho}{\sigma\theta(1-\rho^2)}\sum_{i=1}^{n}u_{[i]} = 0 \qquad (27)$$

$$\frac{\partial \ln L^{*}}{\partial \theta} \equiv -\frac{n}{\theta} + \frac{1}{\theta}\sum_{i=1}^{n}z_{(i)} - \frac{1}{\theta}\sum_{i=1}^{n}z_{(i)}\left(a_i - b_i z_{(i)}\right) - \frac{\rho}{\sigma\theta(1-\rho^2)}\sum_{i=1}^{n}z_{(i)}u_{[i]} = 0 \qquad (28)$$

Solving equations (27), (28) and (19)-(22) yields the so-called modified maximum likelihood estimates. For large samples these are unbiased with variance attaining the minimum variance bound (MVB), hence fully efficient; for small samples they are mostly fully efficient. They are explicit functions of the sample observations and easy to compute, [4]. From equation (27), after substituting for $z_{(i)}$ and using $\sum_{i=1}^{n}u_{[i]} = 0$ from (19), we get the equation for the parameter α:

$$\frac{n}{\theta} - \frac{1}{\theta}\sum_{i=1}^{n}\left(a_i - b_i\,\frac{x_{(i)}-\alpha}{\theta}\right) = 0 \qquad (29)$$

and after some algebra we obtain the estimate of α:

$$\hat{\alpha} = \frac{\sum_{i=1}^{n}b_i x_{(i)} + \hat{\theta}\sum_{i=1}^{n}\left(1-a_i\right)}{\sum_{i=1}^{n}b_i} \qquad (30)$$

which can be rewritten as

$$\hat{\alpha} = K + D\hat{\theta} \qquad (31)$$

where $K = \dfrac{\sum_{i=1}^{n}b_i x_{(i)}}{\sum_{i=1}^{n}b_i}$ and $D = \dfrac{\sum_{i=1}^{n}\left(1-a_i\right)}{\sum_{i=1}^{n}b_i}$.

The estimate of the parameter θ is obtained from equation (28), after substituting the value of $z_{(i)}$ and performing some algebra, as follows:

$$\sum_{i=1}^{n}b_i\left(x_{(i)}-\hat{\alpha}\right)^2 - \hat{\theta}\sum_{i=1}^{n}\left(1-a_i\right)\left(x_{(i)}-\hat{\alpha}\right) - n\hat{\theta}^2 = 0 \qquad (32)$$

Equation (32) can be rewritten, after replacing $\hat{\alpha}$ by its expression in equation (31), as

$$n\hat{\theta}^2 - \hat{\theta}\sum_{i=1}^{n}\left(1-a_i\right)\left(x_{(i)}-K\right) - \sum_{i=1}^{n}b_i\left(x_{(i)}-K\right)^2 = 0 \qquad (33)$$

Equation (33) can be written in a simpler form as

$$n\hat{\theta}^2 - B\hat{\theta} - C = 0 \qquad (34)$$

where $B = \sum_{i=1}^{n}\left(1-a_i\right)\left(x_{(i)}-K\right)$ and $C = \sum_{i=1}^{n}b_i\left(x_{(i)}-K\right)^2$. The positive root of equation (34) is the estimate of θ:

$$\hat{\theta} = \frac{B + \sqrt{B^2 + 4nC}}{2n} \qquad (35)$$

To obtain an estimate of β₀, we substitute the value of $u_{[i]}$ into equation (19):

$$\frac{1}{\sigma^2(1-\rho^2)}\sum_{i=1}^{n}\left[y_{[i]} - \beta_0 - \frac{\rho\sigma}{\theta}\left(x_{(i)}-\alpha\right)\right] = 0$$

giving

$$\hat{\beta}_0 = \bar{y} - \frac{\hat{\rho}\hat{\sigma}}{\hat{\theta}}\left(\bar{x}-\hat{\alpha}\right) \qquad (36)$$

To find an estimate of σ, we rewrite $u_{[i]}$ by substituting (36) for β₀ in equation (16):

$$u_{[i]} = \left(y_{[i]}-\bar{y}\right) - \frac{\hat{\rho}\hat{\sigma}}{\hat{\theta}}\left(x_{(i)}-\bar{x}\right) \qquad (37)$$

Squaring both sides and summing over the sample gives

$$\sum_{i=1}^{n}u_{[i]}^2 = \sum_{i=1}^{n}\left[\left(y_{[i]}-\bar{y}\right) - \frac{\hat{\rho}\hat{\sigma}}{\hat{\theta}}\left(x_{(i)}-\bar{x}\right)\right]^2 \qquad (38)$$

Substituting equation (38) into equation (20), and writing $s_x^2$, $s_y^2$ for the sample variances of $x_{(i)}$ and $y_{[i]}$ and $s_{xy}$ for their sample covariance, we obtain after some algebra

$$n s_y^2 - 2\,\frac{\hat{\rho}\hat{\sigma}}{\hat{\theta}}\,n s_{xy} + \frac{\hat{\rho}^2\hat{\sigma}^2}{\hat{\theta}^2}\,n s_x^2 + \frac{\hat{\rho}\hat{\sigma}}{\hat{\theta}}\left(n s_{xy} - \frac{\hat{\rho}\hat{\sigma}}{\hat{\theta}}\,n s_x^2\right) - n\hat{\sigma}^2\left(1-\hat{\rho}^2\right) = 0 \qquad (39)$$

which simplifies to

$$n s_y^2 - \frac{\hat{\rho}\hat{\sigma}}{\hat{\theta}}\,n s_{xy} - n\hat{\sigma}^2\left(1-\hat{\rho}^2\right) = 0 \qquad (40)$$

i.e.

$$\hat{\sigma}^2\left(1-\hat{\rho}^2\right) + \frac{\hat{\rho}\,s_{xy}}{\hat{\theta}}\,\hat{\sigma} - s_y^2 = 0 \qquad (41)$$

The solution of this last equation, a quadratic in $\hat{\sigma}$, gives the estimate of σ:

$$\hat{\sigma} = \frac{-\dfrac{\hat{\rho}\,s_{xy}}{\hat{\theta}} + \sqrt{\dfrac{\hat{\rho}^2 s_{xy}^2}{\hat{\theta}^2} + 4\left(1-\hat{\rho}^2\right)s_y^2}}{2\left(1-\hat{\rho}^2\right)} \qquad (42)$$

To get an estimate of β₁, we substitute the values of $y_{[i]}$ and $x_{(i)}$ into equation (22) and perform some algebra to obtain the equation

$$\sum_{i=1}^{n}\left(x_{(i)}-\hat{\alpha}\right)\left(y_{[i]}-\hat{\beta}_0\right) - \hat{\beta}_1\sum_{i=1}^{n}\left(x_{(i)}-\hat{\alpha}\right)^2 = 0$$
By solving this equation, we get the estimate of β₁:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}\left(x_{(i)}-\hat{\alpha}\right)\left(y_{[i]}-\hat{\beta}_0\right)}{\sum_{i=1}^{n}\left(x_{(i)}-\hat{\alpha}\right)^2} \qquad (43)$$

The estimate of the correlation parameter ρ is obtained from equation (21) after substituting for the sum of squares of the random errors $u_{[i]}$; rewriting it in terms of the sample variances and covariance of $y_{[i]}$ and $x_{(i)}$ gives, after some algebra,

$$\left[\frac{\hat{\sigma}^2}{\hat{\theta}^2}\,s_x^2 + \hat{\sigma}^2\right]\hat{\rho}^2 - 2\left[\frac{\hat{\sigma}}{\hat{\theta}}\,s_{xy}\right]\hat{\rho} - \left[\frac{s_{xy}^2}{\hat{\theta}^2\hat{\sigma}^2} - 1\right] = 0 \qquad (44)$$

which can be rewritten as

$$a\hat{\rho}^2 - b\hat{\rho} - c = 0 \qquad (45)$$

where $a = \dfrac{\hat{\sigma}^2}{\hat{\theta}^2}s_x^2 + \hat{\sigma}^2$, $b = 2\dfrac{\hat{\sigma}}{\hat{\theta}}s_{xy}$, and $c = \dfrac{s_{xy}^2}{\hat{\theta}^2\hat{\sigma}^2} - 1$. Solving equation (45), we get the estimate of the parameter ρ as the root

$$\hat{\rho} = \frac{b \pm \sqrt{b^2 + 4ac}}{2a}$$

lying in the admissible interval (−1, 1). The modified maximum likelihood estimates were derived assuming that the probability distribution of the explanatory variable X is the extreme value distribution, but it can be any other probability distribution imposed by the data under study.

3.2: M-estimation Method:

This method is one of the robust methods commonly used to estimate regression model coefficients when these models suffer from basic problems, in particular when the data contain outliers or extreme values. In this research the method is used to estimate the coefficients of the regression model with one or more random explanatory variables; as far as we know, it has not previously been used to estimate these models. The method is an extension of the maximum likelihood method and is denoted by the symbol M, indicating that its estimates are of maximum likelihood type. The idea of the method is to minimize a function of the random errors instead of minimizing the squared random errors themselves: whereas the least squares estimator minimizes

$$\hat{\beta}_{LS} = \min_{\beta}\sum_{i=1}^{n}u_i^2 \qquad (46)$$

the M estimator $\hat{\beta}_M$ is obtained by minimizing the following objective function, [11,12]:

$$\hat{\beta}_M = \min_{\beta}\sum_{i=1}^{n}\rho\left(u_i\right) \qquad (47)$$

where $\rho(u_i)$
is the objective function, a function of the random errors of the regression model. Replacing the random errors $u_i$ by the standardized errors, the objective function (47) can be rewritten as

$$\hat{\beta}_M = \min_{\beta}\sum_{i=1}^{n}\rho\left(\frac{y_i-\beta_0-\beta_1 x_i}{\sigma}\right) \qquad (48)$$

The standardized error function ρ takes several forms, such as Huber's formula and Tukey's bisquare formula, shown in Table (1). Differentiating the objective function in equation (48) and setting the derivative equal to zero gives

$$\sum_{i=1}^{n}x_{ij}\,\psi\left(\frac{y_i-\beta_0-\beta_1 x_i}{\hat{\sigma}}\right) = 0, \qquad j = 0,1,\ldots,p \qquad (49)$$

where ψ represents the first derivative of the function ρ, $x_{ij}$ is observation i of explanatory variable j, and $x_{i0} = 1$ to include the intercept in the model. Equation (49) is solved by the method of iteratively reweighted least squares (IRLS), using the weights shown in the following equation, [12]:

$$w_i = w\left(u_i\right) = \frac{\psi\left(\dfrac{y_i-\beta_0-\beta_1 x_i}{\hat{\sigma}}\right)}{\left(\dfrac{y_i-\beta_0-\beta_1 x_i}{\hat{\sigma}}\right)} \qquad (50)$$

The weights $w_i$ are shown in Table (1) for each of the two forms of the function ρ. The estimate of the scale parameter σ is obtained from the following formula, [11]:

$$\hat{\sigma} = \frac{\mathrm{Median}\left|u_i-\mathrm{Median}(u)\right|}{0.6745} \qquad (51)$$

Using the weights $w_i$, equation (49) can be rewritten as

$$\sum_{i=1}^{n}x_{ij}\,w_i\left(\frac{y_i-\beta_0-\beta_1 x_i}{\hat{\sigma}}\right) = 0, \qquad j = 0,1,\ldots,p \qquad (52)$$

Using the IRLS method, equation (52) is solved by assuming initial values for the regression coefficients and for the scale estimate $\hat{\sigma}$. Equation (52) can be rewritten in matrix form as the following weighted least squares system:

$$X'WX\hat{\beta} = X'Wy \qquad (53)$$

where W is a diagonal matrix of size (n × n) whose diagonal elements are the weights $w_i$. Solving equation (53) gives the estimate of the regression coefficients by the M method, [11]:

$$\hat{\beta}_M = \left(X'WX\right)^{-1}X'Wy \qquad (54)$$

Table (1): Huber and Tukey's bisquare formulas for the function ρ

| Method | Objective function ρ(u) | Weight function w(u) | Tuning parameter a |
|---|---|---|---|
| Huber | u²/2 if \|u\| ≤ a; a\|u\| − a²/2 if \|u\| > a | 1 if \|u\| ≤ a; a/\|u\| if \|u\| > a | 1.345 |
| Tukey's bisquare | (a²/6)[1 − (1 − (u/a)²)³] if \|u\| ≤ a; a²/6 if \|u\| > a | [1 − (u/a)²]² if \|u\| ≤ a; 0 if \|u\| > a | 4.685 |

3.3: Robust Empirical Likelihood Method (REL):

The empirical likelihood method is one of the nonparametric estimation methods, proposed by Owen, 1988, [13] as an alternative to the maximum likelihood method in the absence of assumptions about the probability distribution of the random error term of the regression model. This method assumes probability weights $p_i$ for each of the sample observations (i = 1, 2, …, n); it aims to estimate the regression model coefficients by maximizing the empirical likelihood function, defined as the product of those probability weights, subject to some

restrictions related to the model coefficients to be estimated. These constraints are similar to the normal equations of the method of least squares, or to the likelihood equations under the normality hypothesis, which makes the method very sensitive to failure of the normality assumption or to the presence of outliers or extreme values. Therefore Ozdemir and Arslan, 2018, [14] suggested the use of robust constraints borrowed from the objective function of the robust M estimation method, in the sense of reconciling the empirical likelihood method with the robust M method. The empirical likelihood method is first presented as an introduction to the robust empirical likelihood method. For the linear regression model of equation (1), let $p_i$, i = 1, 2, …, n, be the probability weight of each sample observation, unknown, differing across observations, and to be estimated, with $p_i \ge 0$. The empirical likelihood method used to estimate the vector of model coefficients β and the population variance σ² requires maximizing the empirical likelihood function subject to some constraints, as shown below:

$$L_{EL}\left(\beta,\sigma^2\right) = \prod_{i=1}^{n}p_i \qquad (55)$$

under the following constraints:

$$\sum_{i=1}^{n}p_i = 1 \qquad (56)$$

$$\sum_{i=1}^{n}p_i\left(y_i-\beta_0-\beta_1 x_i\right)x_i = 0 \qquad (57)$$

$$\sum_{i=1}^{n}p_i\left[\left(y_i-\beta_0-\beta_1 x_i\right)^2-\sigma^2\right] = 0 \qquad (58)$$

Taking the logarithm of both sides of equation (55) gives the logarithm of the empirical likelihood function:

$$\ln L_{EL}\left(\beta,\sigma^2\right) = \ln\prod_{i=1}^{n}p_i = \sum_{i=1}^{n}\ln\left(p_i\right) \qquad (59)$$

Estimating the probability weights $p' = [p_1\; p_2 \ldots\; p_n]$
and the coefficient vector $\beta' = [\beta_0\;\beta_1]$ and the variance σ² can be achieved by maximizing the logarithm of the empirical likelihood function subject to the three constraints (56)-(58). This maximization problem can be solved by the Lagrange multiplier method, which requires formulating the Lagrange function:

$$G\left(p,\beta,\sigma^2,\lambda_0,\lambda_1,\lambda_2\right) = \sum_{i=1}^{n}\ln\left(p_i\right) - \lambda_0\left(\sum_{i=1}^{n}p_i-1\right) - n\lambda_1\sum_{i=1}^{n}p_i\left(y_i-\beta_0-\beta_1 x_i\right)x_i - n\lambda_2\sum_{i=1}^{n}p_i\left[\left(y_i-\beta_0-\beta_1 x_i\right)^2-\sigma^2\right] \qquad (60)$$

where $\lambda_0, \lambda_1, \lambda_2 \in \mathbb{R}$ represent the Lagrange multipliers. The estimation process first requires estimating the vector of probability weights by differentiating the Lagrange function (60) with respect to each $p_i$ and setting the derivative equal to zero, so that we get those weights according to the following formula:

$$p_i = \frac{1}{\lambda_0 + n\lambda_1\left(y_i-\beta_0-\beta_1 x_i\right)x_i + n\lambda_2\left[\left(y_i-\beta_0-\beta_1 x_i\right)^2-\sigma^2\right]}, \qquad i = 1,2,\ldots,n \qquad (61)$$

Summing both sides of equation (61) over all values of i gives $\lambda_0 = n$, so after substituting for $\lambda_0$ the equation can be rewritten as

$$p_i = \frac{1}{n\left(1+\lambda_1\left(y_i-\beta_0-\beta_1 x_i\right)x_i+\lambda_2\left[\left(y_i-\beta_0-\beta_1 x_i\right)^2-\sigma^2\right]\right)}, \qquad i = 1,2,\ldots,n \qquad (62)$$

Substituting (62) into the log empirical likelihood function (59) gives the following objective function in terms of $\lambda_1, \lambda_2, \beta, \sigma^2$ only:

$$G\left(\lambda_1,\lambda_2,\beta,\sigma^2\right) = -\sum_{i=1}^{n}\ln\left(1+\lambda_1\left(y_i-\beta_0-\beta_1 x_i\right)x_i+\lambda_2\left[\left(y_i-\beta_0-\beta_1 x_i\right)^2-\sigma^2\right]\right) - n\ln n \qquad (63)$$

The values of the Lagrange multipliers λ₁ and λ₂ for given values of the regression coefficient vector β and of σ² can be obtained from the following minimization problem:

$$\hat{\lambda}\left(\beta,\sigma^2\right) = \arg\min_{\lambda}\left[-\sum_{i=1}^{n}\ln\left(1+\lambda_1\left(y_i-\beta_0-\beta_1 x_i\right)x_i+\lambda_2\left[\left(y_i-\beta_0-\beta_1 x_i\right)^2-\sigma^2\right]\right) - n\ln n\right] \qquad (64)$$

where $\lambda' = [\lambda_1\;\lambda_2]$. The minimization problem (64) has no explicit solution and requires numerical methods; substituting the solution into $G(\lambda_1,\lambda_2,\beta,\sigma^2)$ gives the profile log empirical likelihood $G(\hat{\lambda}(\beta,\sigma^2),\beta,\sigma^2)$, a function of the coefficient vector β and the variance σ². The empirical maximum likelihood estimates of those parameters are obtained by solving the following maximization problem:

$$\left(\hat{\beta},\hat{\sigma}^2\right) = \arg\max_{\left(\beta,\sigma^2\right)}G\left(\hat{\lambda}\left(\beta,\sigma^2\right),\beta,\sigma^2\right) \qquad (65)$$

This last maximization problem is solved using numerical methods because there are no explicit solutions for it.
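Since the inner problem (61)-(64) is always solved numerically, its mechanics are easiest to see in the simplest setting. The sketch below (Python rather than the paper's R, and reduced to a single mean-type constraint $\sum_i p_i g_i = 0$ with $g_i = x_i - \mu$, a simplification introduced only for this illustration) finds the Lagrange multiplier by bisection and recovers weights of the same form as (62):

```python
import numpy as np

def el_weights(g):
    """Empirical likelihood weights p_i = 1 / (n (1 + lam * g_i)) under the
    single constraint sum(p_i * g_i) = 0, with lam found by bisection."""
    n = len(g)
    # lam must keep every 1 + lam * g_i > 0; bracket the root accordingly
    lo = (-1.0 / g.max()) + 1e-9 if g.max() > 0 else -1e6
    hi = (-1.0 / g.min()) - 1e-9 if g.min() < 0 else 1e6

    def h(lam):  # h(lam) = sum g_i / (1 + lam * g_i), decreasing in lam
        return np.sum(g / (1.0 + lam * g))

    for _ in range(200):  # bisection on the monotone function h
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 1.0 / (n * (1.0 + lam * g))

# illustrative data (hypothetical values) with hypothesized mean mu = 5.0
x = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9])
p = el_weights(x - 5.0)
```

At the solution the weights automatically sum to one (the λ₀ = n step above), and the constrained weighted mean equals the hypothesized value.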
We have shown previously that the robust empirical likelihood method results from replacing the constraints (57)-(58) with robust constraints borrowed from the objective function of the robust M estimation method given in formula (49). This modification was adopted to increase the effectiveness of the empirical likelihood method in countering the effect of outliers on the estimation of the model coefficients, and here we employ this method to estimate the coefficients of the linear regression model when the explanatory variable is random. The empirical likelihood function shown in relation (55) is maximized after replacing the second and third constraints, relations (57) and (58), with the following robust constraints [14]:

$$\sum_{i=1}^{n}p_i\,\psi\left(\frac{y_i-\beta_0-\beta_1 x_i}{\sigma}\right)x_i = 0 \qquad (66)$$

$$\sum_{i=1}^{n}p_i\,\psi\left[\left(y_i-\beta_0-\beta_1 x_i\right)^2-\sigma^2\right] = 0 \qquad (67)$$

where ψ is a function of the random errors, nondecreasing in the case of Huber's function and redescending in the case of Tukey's function. The robust estimate for each of the model

coefficients and the variance is obtained by solving the following maximization problem using the Lagrange function:

$$G\left(p,\beta,\sigma^2,\lambda_0,\lambda_1,\lambda_2\right) = \sum_{i=1}^{n}\ln\left(p_i\right) - \lambda_0\left(\sum_{i=1}^{n}p_i-1\right) - n\lambda_1\sum_{i=1}^{n}p_i\,\psi\left(y_i-\beta_0-\beta_1 x_i\right)x_i - n\lambda_2\sum_{i=1}^{n}p_i\,\psi\left[\left(y_i-\beta_0-\beta_1 x_i\right)^2-\sigma^2\right] \qquad (68)$$

Following the same steps as in the empirical likelihood method, the estimation process first requires estimating the vector of probability weights by differentiating the Lagrange function (68) with respect to each $p_i$ and setting the derivative to zero, which gives those weights as

$$p_i = \frac{1}{\lambda_0 + n\lambda_1\,\psi\left(y_i-\beta_0-\beta_1 x_i\right)x_i + n\lambda_2\,\psi\left[\left(y_i-\beta_0-\beta_1 x_i\right)^2-\sigma^2\right]}, \qquad i = 1,2,\ldots,n \qquad (69)$$

Summing both sides of equation (69) over all values of i gives $\lambda_0 = n$, so after substituting for $\lambda_0$:

$$p_i = \frac{1}{n\left(1+\lambda_1\,\psi\left(y_i-\beta_0-\beta_1 x_i\right)x_i+\lambda_2\,\psi\left[\left(y_i-\beta_0-\beta_1 x_i\right)^2-\sigma^2\right]\right)}, \qquad i = 1,2,\ldots,n \qquad (70)$$

Substituting (70) into the log empirical likelihood function gives the following objective function in terms of $\lambda_1, \lambda_2, \beta, \sigma^2$ only:

$$G\left(\lambda_1,\lambda_2,\beta,\sigma^2\right) = -\sum_{i=1}^{n}\ln\left(1+\lambda_1\,\psi\left(y_i-\beta_0-\beta_1 x_i\right)x_i+\lambda_2\,\psi\left[\left(y_i-\beta_0-\beta_1 x_i\right)^2-\sigma^2\right]\right) - n\ln n \qquad (71)$$

The values of the Lagrange multipliers λ₁ and λ₂ for given β and σ² can be obtained from the following minimization problem:

$$\hat{\lambda}\left(\beta,\sigma^2\right) = \arg\min_{\lambda}\left[-\sum_{i=1}^{n}\ln\left(1+\lambda_1\,\psi\left(y_i-\beta_0-\beta_1 x_i\right)x_i+\lambda_2\,\psi\left[\left(y_i-\beta_0-\beta_1 x_i\right)^2-\sigma^2\right]\right) - n\ln n\right] \qquad (72)$$

The minimization problem (72) has no explicit solution and requires numerical methods; substituting its solution into $G(\lambda_1,\lambda_2,\beta,\sigma^2)$ gives the profile log empirical likelihood $G(\hat{\lambda}(\beta,\sigma^2),\beta,\sigma^2)$ in terms of the coefficient vector β and the variance σ², and the robust empirical maximum likelihood estimates of those coefficients are obtained by solving the following maximization problem:

$$\left(\hat{\beta},\hat{\sigma}^2\right) = \arg\max_{\left(\beta,\sigma^2\right)}G\left(\hat{\lambda}\left(\beta,\sigma^2\right),\beta,\sigma^2\right) \qquad (73)$$

This last maximization problem is solved using numerical methods because there are no explicit solutions for it.

4. Simulation experiments:

Simulation experiments are conducted to compare the robust estimation methods under study and identify the best method according to the comparison criterion, the mean square error (MSE). The R 4.2.1 statistical programming language was used to write the simulation program. The written program includes four basic stages for estimating the regression model, as follows:

4.1: Determining default values:

The default values of the parameters were determined from the real data adopted in the practical application. The real data for the random explanatory variable, the patient's random blood sugar (RBS), which follows the extreme value distribution, was simulated to determine the default values of the two parameters of that distribution; and based on simulating real data for the response variable, packed cell volume (PCV), together with the aforementioned explanatory variable, the default values of the two parameters of the simple linear regression model in formula (1) were determined. Both sets of values are shown in Table (1) below. The values (0.3, 0.5, 0.7) were chosen as default values for the correlation coefficient ρ between the random explanatory variable and the random error terms. Three sample sizes were chosen (30, 70, 150), and each experiment was repeated 1000 times.

Table (1): Default values of the two parameters of the extreme value distribution and of the parameters of the simple linear random regression model.

| θ | 40 | 41 | 42 | 43 | 44 |
|---|---|---|---|---|---|
| α | 135 | 145 | 155 | 165 | 175 |
| β₀ | 16 | 17 | 18 | 19 | 20 |
| β₁ | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 |

4.2: Data generation:

Using the default values of the two parameters of the extreme value distribution, the values of the random explanatory variable following that distribution were generated according to formula (2); the values of the random error terms were generated assuming a normal distribution with mean equal to zero and variance $\sigma^2(1-\rho^2)$.
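The generation stage described above can be sketched as follows. The paper's program was written in R 4.2.1; this NumPy version is only an illustrative sketch, using one assumed combination of the default values and the relation β₁ = ρσ/θ from Section 1 to fix σ:

```python
import numpy as np

rng = np.random.default_rng(0)

# One combination of default values (chosen here only for illustration)
theta, alpha = 41.0, 145.0   # extreme value scale and location
beta0, beta1 = 17.0, 0.05    # regression coefficients
rho, n = 0.5, 30             # correlation coefficient and sample size

# sigma follows from beta1 = rho * sigma / theta
sigma = beta1 * theta / rho

# x ~ extreme value distribution of formula (2); np.random.Generator.gumbel
# uses exactly this density with loc = alpha and scale = theta
x = rng.gumbel(loc=alpha, scale=theta, size=n)

# random errors ~ N(0, sigma^2 (1 - rho^2)), then the response via model (1)
u = rng.normal(0.0, sigma * np.sqrt(1.0 - rho**2), size=n)
y = beta0 + beta1 * x + u
```

The same three lines of generation are simply repeated for each sample size and each replicate of the experiment.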
Note that the values of ?2were determined based on the default values for each of the parameters θ and ?1 and the correlation coefficient ρ, the generation process took place for the sizes of the three samples mentioned previously, based on the values of the random explanatory variable and the random error terms, the values of the response variable were generated based on the simple linear regression model according to formula (1). 4.3: Estimation and comparison: By adopting the three estimation methods under research and the data generated, the simple linear regression model was estimated with a random explanatory variable that follows the distribution of the extreme value according to the model shown in formula (1), and then calculating the average rout mean squares errors for the estimated model and for each estimation method according to the following formula: 42 155 18 43 160 19 44 175 20 17 0.05 0.06 0.07 0.08 (74) 2 1 ?√ 1 [?̂?− ??] ?=1 ? ? ?=1 ?∑ ∑ ????? = Where r represents the number of iterations for each experiment and is equal to (1000). According to the above comparison criteria, the comparison will be made between the results of estimation of the three methods. For each combination of the default values and the three sample sizes, the values of the comparison criterion are summarized in Table (2). It is clear from these results that the best estimation method is the MMLE, followed by the ME method, although the value of the comparison criterion for the three estimation methods is close. 484
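The generation and comparison stages above can be sketched in code. This is an illustrative Python sketch, not the authors' R 4.2.1 program: α is taken as the location and θ as the scale of the extreme value (Gumbel) distribution (inferred from the magnitudes in Table (1)), σ² = 1 is an assumed constant, the defaults correspond to combination E of Table (1), the dependence between the regressor and the errors is induced through the standardized regressor so that the conditional error variance is σ²(1 − ρ²) as described, and ordinary least squares stands in for the three robust estimators.

```python
import numpy as np

rng = np.random.default_rng(1)

def average_rmse(beta0=16.0, beta1=0.04, alpha=135.0, theta=40.0,
                 rho=0.5, sigma=1.0, n=30, r=1000):
    """Average root mean squared error over r replications, formula (74)."""
    total = 0.0
    for _ in range(r):
        # random explanatory variable from the extreme value (Gumbel) family
        x = rng.gumbel(alpha, theta, size=n)
        # error terms correlated with x; the part not explained by x has
        # variance sigma^2 * (1 - rho^2)  (one common construction)
        zx = (x - x.mean()) / x.std()
        e = sigma * (rho * zx + np.sqrt(1.0 - rho ** 2) * rng.normal(size=n))
        y = beta0 + beta1 * x + e                      # model (1)
        b1, b0 = np.polyfit(x, y, 1)                   # stand-in OLS estimator
        yhat = b0 + b1 * x
        total += np.sqrt(np.mean((yhat - y) ** 2))
    return total / r

print(round(average_rmse(), 4))
```

Replacing the `np.polyfit` line with an MMLE, M, or robust empirical likelihood fit and looping over the parameter combinations of Table (1) reproduces the structure of the four simulation stages.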

Table (2): Average root mean squared error values for the random regression model estimation results.

Method  ρ     n     A*        B*       C*       D*       E*
MMLE    0.3   30   10.70234  6.42548  7.81122  9.32264  5.06044
              70   10.65632  6.25740  7.78046  9.23494  5.01819
             150   10.54366  6.23644  7.67183  8.94120  4.68794
        0.5   30    5.83150  3.45439  4.23047  5.12812  2.67360
              70    5.79143  3.41886  4.20564  5.08658  2.65669
             150    5.63819  3.41322  4.08129  4.91907  2.64957
        0.7   30    3.43634  2.02950  2.51128  3.02261  1.61564
              70    3.39836  2.02566  2.49299  2.98417  1.57930
             150    3.29939  1.97072  2.48488  2.93176  1.50329
ME      0.3   30   10.70665  6.42715  7.81689  9.32983  5.06424
              70   10.66579  6.26244  7.78304  9.23813  5.01946
             150   10.57076  6.25282  7.68618  8.95571  4.69933
        0.5   30    5.83400  3.45580  4.23198  5.13209  2.67408
              70    5.79602  3.42608  4.20894  5.08826  2.66265
             150    5.64809  3.41581  4.08827  4.92857  2.65129
        0.7   30    3.43770  2.03111  2.51231  3.02370  1.61605
              70    3.40131  2.02629  2.49510  2.98635  1.57994
             150    3.30746  1.97426  2.48925  2.93706  1.50462
RELE    0.3   30   10.70659  6.42717  7.81678  9.32996  5.06412
              70   10.66617  6.26243  7.78306  9.23817  5.01984
             150   10.57200  6.25408  7.68638  8.95669  4.69955
        0.5   30    5.83400  3.45583  4.23207  5.13238  2.67407
              70    5.79609  3.42629  4.20912  5.08832  2.66192
             150    5.64793  3.41592  4.08902  4.92877  2.65151
        0.7   30    3.43772  2.03118  2.51232  3.02368  1.61606
              70    3.40154  2.02629  2.49517  2.98642  1.57997
             150    3.30882  1.97469  2.48968  2.93759  1.50462

* A: θ=44, α=175, β0=20, β1=0.08   B: θ=41, α=145, β0=17, β1=0.05   C: θ=42, α=155, β0=18, β1=0.06
  D: θ=43, α=165, β0=19, β1=0.07   E: θ=40, α=135, β0=16, β1=0.04

5. Practical application:
The data adopted in the practical application relate to people with heart disease: a random sample of 30 patients was chosen from those hospitalized in Ghazi Hariri Hospital for Specialized Surgery. For each patient, the packed cell volume (PCV) and random blood sugar (RBS) were recorded, as shown in Table (3). These data were used to estimate the simple linear relationship between PCV as a response variable and RBS as an explanatory variable. A

simple linear regression model was built with a random explanatory variable, and the regression relationship was described by the following model:

PCVᵢ = β0 + β1 RBSᵢ + εᵢ        (75)

Table (3): Packed cell volume (PCV) and random blood sugar (RBS) of the sample observations.

No.  PCV  RBS    No.  PCV  RBS    No.  PCV  RBS
 1    36  170    11    26  149    21    31  191
 2    23  136    12    27  348    22    30  182
 3    39  233    13    46  387    23    30  184
 4    22   96    14    32  202    24    27  124
 5    21  145    15    31  200    25    29  218
 6    25  188    16    29  199    26    27  143
 7    21  136    17    26  115    27    34  150
 8    25  141    18    33  158    28    29  153
 9    22  147    19    31  152    29    34  161
10    41  310    20    29  193    30    32  209

The random regression model according to formula (75) was built on the assumption that the response variable PCV follows the normal distribution and that the explanatory variable follows the extreme value distribution, while the random error terms independently follow the normal distribution with mean zero and constant variance equal to σ²(1 − ρ²). To verify these assumptions, the chi-square goodness-of-fit test was applied to each of the response and explanatory variables. For the data of the response variable (PCV), it was tested whether they follow a normal distribution, i.e. the following hypothesis was tested:

H0: PCV ~ Normal dist.
H1: PCV ≁ Normal dist.

The test results are shown in Table (5). These results indicate that the P-value associated with this test is greater than the significance level of 0.05, and thus the null hypothesis is not rejected, meaning that the data of the PCV variable are normally distributed. Figure (1) shows the histogram and the normal distribution curve of the data of this variable. As for the data of the explanatory variable RBS, it was tested whether they follow the extreme value distribution, according to the following hypothesis:

H0: RBS ~ Extreme value dist.
H1: RBS ≁ Extreme value dist.

The results of this test are shown in Table (5); the P-value of this test is greater than the significance level of 0.05, indicating acceptance of the null hypothesis, meaning that the data of this variable follow the extreme value distribution. Figure (2) shows the histogram and the extreme value distribution curve of the data of this variable.

Table (5): The chi-square goodness-of-fit test for the response variable PCV and the explanatory variable RBS.

Variable   Statistic   P-Value
PCV        1.46667     0.916884
RBS        6.26667     0.281129
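The chi-square goodness-of-fit test described above can be sketched in code. The paper does not describe its binning, so this stdlib-only Python sketch (the authors' program was written in R) assumes k = 6 equal-probability bins under the fitted normal distribution; the resulting statistic and P-value therefore differ from those in Table (5), although the decision at the 0.05 level is the same.

```python
import math
from statistics import NormalDist

# PCV data from Table (3)
pcv = [36, 23, 39, 22, 21, 25, 21, 25, 22, 41,
       26, 27, 46, 32, 31, 29, 26, 33, 31, 29,
       31, 30, 30, 27, 29, 27, 34, 29, 34, 32]

n = len(pcv)
mu = sum(pcv) / n
sigma = math.sqrt(sum((v - mu) ** 2 for v in pcv) / n)  # matches Figure (1): 29.6, 5.75384

k = 6                                   # assumed number of equal-probability bins
fitted = NormalDist(mu, sigma)
cuts = [fitted.inv_cdf(j / k) for j in range(1, k)]
edges = [-math.inf] + cuts + [math.inf]

observed = [sum(1 for v in pcv if lo <= v < hi) for lo, hi in zip(edges, edges[1:])]
expected = n / k                        # equal counts expected by construction

chi2_stat = sum((o - expected) ** 2 / expected for o in observed)
df = k - 1 - 2                          # two parameters were estimated from the data

# survival function of the chi-square distribution, closed form for df = 3
p_value = (2.0 * (1.0 - NormalDist().cdf(math.sqrt(chi2_stat)))
           + math.sqrt(2.0 * chi2_stat / math.pi) * math.exp(-chi2_stat / 2.0))
print(round(chi2_stat, 4), round(p_value, 4))
```

Since the P-value exceeds 0.05, the null hypothesis of normality is not rejected, in agreement with Table (5). The same scheme, with the Gumbel quantile function in place of `inv_cdf`, applies to the RBS test.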

Figure (1): Histogram and normal distribution curve for the response variable PCV (μ = 29.6, σ = 5.75384).

Figure (2): Histogram and extreme value distribution curve for the explanatory variable RBS (μ = 157.785, σ = 41.8356).

To estimate the coefficients of the random regression model shown in formula (75), the three estimation methods under research were relied upon. Although the simulation experiments revealed the preference for the modified maximum likelihood method, there is a great convergence between the values of the RMSE comparison criterion for the models estimated by these methods. Table (6) summarizes the estimation results. From these results, we find that the three methods produced significant estimated models, based on the F statistic at a significance level of 0.05. They also agree that there is a positive effect of the random explanatory variable RBS on the response variable PCV, and the value of this effect is similar across the three estimated models. As for the goodness-of-fit and estimation accuracy criteria, R² and RMSE, they indicated the preference for the modified maximum likelihood method, followed by the robust empirical likelihood method. Based on the foregoing, and adopting the estimates of the modified maximum likelihood method as the best method, the estimated random regression model is:

P̂CVᵢ = 28.0954 + 0.05963 RBSᵢ        (76)

Table (7) shows the real values of the response variable PCV and its estimated values. Figure (3) shows the graph of the real and estimated regression curve.
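The ME estimates in Table (6) come from an M-estimation fit. As an illustration of this class of estimators, the following Python sketch (not the authors' R implementation) fits the PCV–RBS line of Table (3) by Huber M-estimation, using iteratively reweighted least squares with a MAD scale estimate; its estimates will not exactly match the published Table (6) values.

```python
import numpy as np

# RBS and PCV data from Table (3)
rbs = np.array([170, 136, 233,  96, 145, 188, 136, 141, 147, 310,
                149, 348, 387, 202, 200, 199, 115, 158, 152, 193,
                191, 182, 184, 124, 218, 143, 150, 153, 161, 209], float)
pcv = np.array([36, 23, 39, 22, 21, 25, 21, 25, 22, 41,
                26, 27, 46, 32, 31, 29, 26, 33, 31, 29,
                31, 30, 30, 27, 29, 27, 34, 29, 34, 32], float)

def huber_m_fit(x, y, c=1.345, tol=1e-8, max_iter=200):
    """Huber M-estimation via iteratively reweighted least squares."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # OLS starting values
    for _ in range(max_iter):
        r = y - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745   # MAD scale
        u = np.abs(r / scale)
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))          # Huber weights
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

b0, b1 = huber_m_fit(rbs, pcv)
print(round(b0, 4), round(b1, 5))
```

The downweighting of large standardized residuals is what gives the M estimator its robustness to the outlying observations visible in Table (3) (e.g. RBS = 387 with PCV = 46).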

Table (6): The estimation results for the three estimation methods MMLE, ME, RELE.

Method   β0        β1       F statistic   R²       RMSE
MMLE     28.0954   0.05963  21.71965*     43.68%   4.469
ME       16.0487   0.07463  19.316*       40.82%   4.581
RELE     16.09     0.07443  19.369*       40.89%   4.578

* Significant at the 0.05 level.

Table (7): The real and estimated values of the response variable PCV.

No.  Real  Estimated   No.  Real  Estimated   No.  Real  Estimated
 1    36   30.494      11    26   27.155      21    31   29.839
 2    23   25.485      12    27   27.573      22    30   26.738
 3    39   28.050      13    46   27.751      23    30   27.036
 4    22   27.692      14    32   28.228      24    27   27.394
 5    21   30.137      15    31   31.091      25    29   37.113
 6    25   30.017      16    29   28.765      26    27   27.513
 7    21   29.481      17    26   26.738      27    34   39.379
 8    25   29.600      18    33   32.522      28    29   41.705
 9    22   26.022      19    31   24.353      29    34   30.673
10    41   31.627      20    29   27.274      30    32   30.554

Figure (3): The graph of the real and estimated regression curves (legend: real values, estimated values).

Conclusions:
The estimation of the simple linear regression model with a random explanatory variable, the randomness of which contradicts one of the basic assumptions of this model, was discussed. Three robust estimation methods were used: the first is the modified maximum likelihood method, which had previously been applied by researchers in estimating the coefficients of this model; the second and third are the M estimation method

and the robust empirical likelihood method, which we employed in estimating the coefficients of the regression model under research. As far as we know, they had not previously been used in estimating this model, but rather in estimating regression models that suffer from some standard problems, or when the sample observations contain outliers or extreme values. The results of the simulation experiments revealed the superiority of the modified maximum likelihood method based on the RMSE comparison criterion. These experiments also revealed the effectiveness of the M and robust empirical likelihood methods, in view of the great convergence between the estimation results of these two methods and the results of the modified maximum likelihood method. The results of the practical application matched the results of the simulation experiments. Based on the foregoing, the two robust estimation methods, M and the robust empirical likelihood, can be used in estimating the random regression model, as they are easier to apply than the modified maximum likelihood method, and they do not rely on assumptions about the probability distribution of either the random error terms or the random explanatory variable.

Acknowledgement: We would like to thank Mustansiriyah University, Baghdad, Iraq (www.uomustansiriyah.edu.iq) for its support of the present work.

References
1. Kerridge, D. (1967). Errors of prediction in multiple regression with stochastic regressor variables. Technometrics, 9(2), 309-311.
2. Lai, T. L., & Wei, C. Z. (1982). Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. The Annals of Statistics, 10(1), 154-166.
3. Tiku, M. L., & Vaughan, D. C. (1997). Logistic and nonlogistic density functions in binary regression with nonstochastic covariates.
Biometrical Journal, 39(8), 883-898.
4. Vaughan, D. C., & Tiku, M. L. (2000). Estimation and hypothesis testing for a nonnormal bivariate distribution with applications. Mathematical and Computer Modelling, 32(1-2), 53-67.
5. Islam, M. Q., & Tiku, M. L. (2005). Multiple linear regression model under nonnormality. Communications in Statistics-Theory and Methods, 33(10), 2443-2467.
6. Sazak, H. S., Tiku, M. L., & Islam, M. Q. (2006). Regression analysis with a stochastic design variable. International Statistical Review / Revue Internationale de Statistique, 77-88.
7. Islam, M. Q., & Tiku, M. L. (2010). Multiple linear regression model with stochastic design variables. Journal of Applied Statistics, 37(6), 923-943.
8. Bharali, S., & Hazarika, J. (2019). Regression models with stochastic regressors: An expository note. International Journal of Agricultural and Statistical Sciences, 15(2), 873-880.
9. Deshpande, B., & Bhat, S. V. (2019). Random design kernel regression estimator. International Journal of Agricultural and Statistical Sciences, 15(1), 11-17.

10. Lateef, Z. K., & Ahmed, A. D. (2021). Use of modified maximum likelihood method to estimate parameters of the multiple linear regression model. International Journal of Agricultural and Statistical Sciences, 17(Supplement 1), 1255-1263.
11. Anderson, C., & Schumacker, R. E. (2003). A comparison of five robust regression methods with ordinary least squares regression: Relative efficiency, bias, and test of the null hypothesis. Understanding Statistics: Statistical Issues in Psychology, Education, and the Social Sciences, 2(2), 79-103.
12. Susanti, Y., Pratiwi, H., Sulistijowati, S., & Liana, T. (2014). M estimation, S estimation, and MM estimation in robust regression. International Journal of Pure and Applied Mathematics, 91(3), 349-360.
13. Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2), 237-249.
14. Özdemir, Ş., & Arslan, O. (2020). Combining empirical likelihood and robust estimation methods for linear regression models. Communications in Statistics-Simulation and Computation, 51(3), 941-954.
