New Algorithms for L1 Norm Regression

In this paper, we propose four algorithms for computing the L1 norm estimates of regression parameters, two of which are the more efficient ones for the simple and multiple regression models respectively. We start with restricted simple linear regression and the corresponding derivation and computation of the weighted median problem; a computing function for this problem is coded. After a discussion of the m parameters model, we extend the algorithm to unrestricted simple linear regression, where a crude and a more efficient algorithm are proposed. The procedures are then generalized to the m parameters model by presenting two new algorithms, of which algorithm 4 is selected as the more efficient. Various properties of these algorithms are discussed.


Introduction
In Bidabad (1989a,b), various aspects of the L1 norm regression were reviewed. We observed that the L1 norm criterion is gradually finding its place in scientific analysis. Since it is not yet computationally comparable with other criteria such as the L2 norm, more work is needed to make it a handy tool. The closed form of the L1 norm estimator has not been derived yet, which makes further inference about the properties of this estimator difficult. Any attempt to give efficient computational algorithms which may introduce significant insight into the different characteristics of the problem is therefore desirable. In this regard, we try in this paper to give a general procedure for solving the L1 norm linear regression problem. The proposed algorithms are based on a special descent method and use a discrete differentiation technique. Primary designs of the algorithms were discussed by Bidabad (1987a,b,88a,b). By manipulating these algorithms, more efficient ones were introduced by Bidabad (1989a,b), which have been shown to perform better than other existing algorithms.
Consider the following regression model,

y_i = Σ_{j=1}^{m} β_j x_{ji} + u_i,   i=1,...,n   (1)

where β_j, j=1,...,m are population parameters to be estimated, and y_i, x_{ji} and u_i are the dependent, independent, and random variables respectively. We wish to estimate the β_j's by minimizing the expression,

S = Σ_{i=1}^{n} │u_i│ = Σ_{i=1}^{n} │y_i - Σ_{j=1}^{m} β_j x_{ji}│   (2)

Suppose m=1; then expression (2) reduces to,

S = Σ_{i=1}^{n} │y_i - ŷ_i│ = Σ_{i=1}^{n} │y_i - β̂x_i│   (3)

A typical element, S_i = |y_i - β̂x_i|, can be viewed as a broken line in the (β̂, S) plane composed of two half-lines. S_i attains its minimum, which is zero, at

β̂_i = y_i/x_i   (4)

The slopes of the half-lines to the left and right of β̂_i are -|x_i| and |x_i| respectively. So the S_i's are all convex, and hence their sum S is also convex, with slope at any β̂ equal to the sum of the slopes of the S_i's at that value of β̂. Since the slope of each S_i changes only at the corresponding β̂_i, the minimum of S lies at one of the β̂_i. Thus, the regression line passes through the origin with slope equal to the β̂_i which minimizes S. In other words, to find the L1 norm estimate of β we need to find only one observation (see Karst (1958), Taylor (1974)). Furthermore, Taylor (1974) concludes that: "This implies, of course, that the regression line must pass through the observation corresponding to the minimizing i. The regression line, therefore, is determined by the point of origin and the observation associated with the minimizing β̂_i". But he did not pursue this approach, that is, minimizing S with respect to the subscript i. In this paper, as in Bidabad (1987a,88a), a first endeavor is made to develop this point of view. In the next section, after rewriting (3) in a suitable fashion, the value of i is determined by using discrete differentiation.
By a similar discussion, it can be concluded that when the number of parameters is m, m observations must lie on the regression hyperplane. On the other hand, m equations of the form given by (5) are necessary to specify the regression hyperplane (see also Bidabad (1989a,b)),

y_i - Σ_{j=1}^{m} β̂_j x_{ji} = 0   (5)

Now, let us proceed with the simplest regression model.

Restricted simple linear regression
For model (1), consider the case of one independent variable with no intercept, namely, y_i = βx_i + u_i. To find the L1 norm estimate of β, the following procedure can be suggested,

S = Σ_{i=1}^{n} │u_i│ = Σ_{i=1}^{n} │y_i - βx_i│ = Σ_{i=1}^{n} (y_i - βx_i)sgn(y_i - βx_i) = Σ_{i=1}^{n} │x_i│(y_i/x_i - β)sgn(y_i/x_i - β)   (6)

Let z_i = y_i/x_i and sort the z_i in descending order. Rename the resulting ordered z_i, i=1,...,n as z_h, h=1,...,n. The elements of z_h have the following property: z_h > z_l if h < l, for all h,l=1,...,n. Rewrite (6) with the ordered observations as,

S = Σ_{h=1}^{n} │x_h│(z_h - β)sgn(z_h - β)   (7)

Let us denote the observation which will be on the regression line by (x_{t+1}, y_{t+1}), that is, the (t+1)th observation in the z_h array. The value of z_h is the slope of a ray passing through the origin and the hth observation. Therefore,

if h < t+1 then z_h > β and u_h > 0
if h = t+1 then z_h = β and u_h = 0
if h > t+1 then z_h < β and u_h < 0

We can now rewrite (7) as follows,

S = Σ_{h=1}^{t} │x_h│(z_h - β) - Σ_{h=t+1}^{n} │x_h│(z_h - β)   (8)

To find the minimum of S, we need to differentiate it with respect to β and with respect to the subscript t, because both β and t are unknowns. Note that β has a continuous domain and t has a discrete domain. Thus,

∂S/∂β = - Σ_{h=1}^{t} │x_h│ + Σ_{h=t+1}^{n} │x_h│   (9)

The differentiation of S with respect to the subscript t must be a discrete derivative (see Bender and Orszag (1978), Clarke (1983)), which gives,

β = z_{t+1} = y_{t+1}/x_{t+1}   (10)

It should be noted that if x_{t+1} = 0 then the value of z_{t+1} = ∞, and when the array z is sorted, infinite values of z locate at the extremes of the two tails of the array and cause no problem in (10). Remember that equation (10) is the same as relation (4). Equations (9) and (10) are two equations with two unknowns, t and β. The variable t can be found by rewriting (9) as follows,

D_k = Σ_{h=1}^{k} │x_h│ - Σ_{h=k+1}^{n} │x_h│   (11)

It is obvious that D_k is an increasing function of k. When k increases from one to n, D_k attains different values, which increase from negative to positive. So, initially, k is set equal to one, and D_k is computed accordingly. If D_k is negative, k is increased by one, and the procedure is repeated until D_k becomes positive. When D_k reaches its first positive value, then t+1 = k. By this procedure, the value of t+1 is found, and the observation corresponding to this subscript, (x_{t+1}, y_{t+1}), is selected. The L1 norm estimate of β is found by substituting the values of x_{t+1} and y_{t+1} into (10), (see Bidabad (1987a,88a)).
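As a small illustration with made-up numbers, take the three observations (x,y) = (1,1), (2,6), (4,4), so that z = y/x = (1, 3, 1) with weights |x| = (1, 2, 4). Sorted in descending order, the z array is (3, 1, 1) with weights (2, 1, 4). Then D_1 = 2 - (1+4) = -3 < 0, D_2 = 3 - 4 = -1 < 0 and D_3 = 7 > 0, so t+1 = 3 and β̂ = z_3 = 1. The restricted L1 norm line is therefore y = x, which passes through the origin and the observation (4,4), and gives S = 0 + 4 + 0 = 4, the minimum.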
The presented procedure is essentially the weighted median procedure of Laplace (1818); indeed, the solution follows the same steps that Laplace took. The main difference is in the mathematical derivation. Laplace found this solution by analyzing the algebraic characteristics of the L1 norm objective function, while the above procedure differentiates the objective function exactly. Another new contribution of this procedure is the application of partial discrete differentiation over a subscript together with conventional differentiation on a regular variable with a continuous domain. It should be remembered that Boscovich (1757) solved this problem by a geometrical procedure and Karst (1958) by an analytical method.

Computation of weighted median
Two series of computations are necessary to compute the weighted median. First, a sorting algorithm is needed to sort the z array while keeping track of the corresponding subscripts; second, the optimal value of k is found by using D_k defined by (11).
Efficient sorting algorithms exist for the first part of the computation. The algorithms 'quicksort' of Hoare (1961,62), 'quickersort' of Scowen (1965) and 'sort' of Singleton (1969) have desirable performance and efficiency. For the second part of the computation there is no special purpose procedure, but Bloomfield and Steiger (1980) used the partial sorting of Chambers (1971) to give an efficient way to combine the two steps of sorting the z array and finding the value of k. The superiority of this procedure lies in sorting only the smaller segments of the z array rather than all of its elements. With some modification, this procedure is used for our proposed algorithms in the forthcoming sections. This procedure can be stated as the following function (a simplified full-sort sketch in code is given after the listed steps).
FUNCTION LWMED (n,ys,w,l)
Step 0) Initialization.
Step 1) Compute left, middle and right sum of weights.
Step 3) Check for solution.
Step 4) Divide the string into two halves then sort.
Step 5) Compute the accumulation of weights.
If ys(l(j))>xt then go to step 5.b, otherwise if j≤i then go to step 6, otherwise lt=l(i), l(i)=l(j), l(j)=lt, go to step 5.a.
Step 6) Test for solution.
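As a complement to the LWMED steps above, the following is a minimal sketch of a weighted median routine in Python, assuming NumPy. It uses a full sort rather than the partial sorting of Chambers (1971), sorts in ascending order (which gives the same weighted median as the descending order used in the text), and all names and the interface are illustrative only.

import numpy as np

def weighted_median(ys, w):
    # Return (value, index) of the weighted median of ys under non-negative weights w.
    ys = np.asarray(ys, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(ys)                        # full ascending sort of the ratio array
    cum = np.cumsum(w[order])                     # running sum of weights
    k = int(np.searchsorted(cum, 0.5 * w.sum()))  # first position where D_k changes sign
    k = min(k, len(ys) - 1)
    return ys[order[k]], int(order[k])

# Example: slope of the restricted model y = beta*x + u, as in the worked example above.
x = np.array([1.0, 2.0, 4.0])
y = np.array([1.0, 6.0, 4.0])
beta_hat, t1 = weighted_median(y / x, np.abs(x))
print(beta_hat, t1)                               # 1.0 2, i.e. the line passes through (4, 4)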

Discussion of the m parameters model
The procedure applied to the simple one-parameter model cannot be directly generalized to the m parameters model. The difficulty is that we cannot reorder the observations in such a way that their corresponding residuals are increasingly or decreasingly ordered. To apply the discrete differentiation technique, (2) should first be rewritten as follows,

S = Σ_{h=1}^{t} (y_h - Σ_{j=1}^{m} β_j x_{jh}) - Σ_{h=t+1}^{n} (y_h - Σ_{j=1}^{m} β_j x_{jh})   (12)

Expression (12), which is free of the absolute value sign, is the generalization of (8) to m parameters. But our first problem is to find a logic which enables us to form (12). That is, we need to reorder the observations in such a way that when h is less than, equal to, or greater than t+1, u_h is greater than, equal to, or less than zero respectively; and as h increases from one to n, the corresponding residual u_h decreases accordingly.
If (2) could be written as (12), we could again differentiate it with respect to β_j for all j=1,...,m and with respect to t. Differentiating S with respect to β_j for all j=1,...,m would give,

∂S/∂β_j = - Σ_{h=1}^{t} x_{jh} + Σ_{h=t+1}^{n} x_{jh},   j=1,...,m   (13)

Differentiating S with respect to the discrete domain variable t would give,

y_{t+1} - Σ_{j=1}^{m} β_j x_{j,t+1} = 0   (14)

m distinct values for t can be computed from (13) by using the following D_{kj} for each explanatory variable; the procedure to compute t is similar to what we did in (11) for the one-parameter model,

D_{kj} = Σ_{h=1}^{k} x_{jh} - Σ_{h=k+1}^{n} x_{jh},   j=1,...,m   (15)

It should be noted that these m values of t all give a unique value of S in (12), because all of the errors corresponding to these t's are zero. This becomes clear from the zero-residual property of the L1 norm regression, namely that there exist m points on the regression hyperplane which have zero residuals. In (12), these m points locate sequentially, because we have reordered the observations in such a way that, as h increases from one to n, the errors decrease from the greatest positive to the lowest negative value. So these m zero errors will locate one after another, nearly in the middle of the n-element sequence of (12). Corresponding to these m values of t, m observations are recognized. Substituting the values of these m observations into (14), m equations are found which can be solved simultaneously for the m unknown β_j's. These last m equations again confirm the above-cited property of the L1 norm regression.
Since for the m parameters model the ordering of the residuals before computing the optimal values of the β_j's is not possible, another device should be adopted to compute the L1 norm estimates of the multiple linear regression parameters.


3. Unrestricted simple linear regression
To find the L1 norm estimates of {β_j, j=1,...,m} for model (1), we shall try to propose algorithms which search for those observations lying on the optimal regression hyperplane. For m=2 and x_{1i}=0, i=1,...,n, (1) reduces to the restricted simple linear regression with only one parameter. The L1 norm estimate of the slope β_2 for this simple model can be obtained by the algorithm given in the previous sections. Let us now consider a simple unrestricted linear model in which m=2 and x_{1i}=1 for all i=1,...,n, namely,

y_i = β_1 + β_2 x_{2i} + u_i   (16)

Algorithm 1
For the model given in (16), the L1 norm objective function S to be minimized is,

S = Σ_{i=1}^{n} │y_i - β_1 - β_2 x_{2i}│   (17)

Let k1 denote a subscript belonging to the range one to n, and assume that the k1th observation (y_{k1}, x_{2k1}) is a candidate to be on the regression line. If this is the case, then u_{k1}=0 and we can transfer the origin of the YxX2 coordinates to the point (y_{k1}, x_{2k1}) without any loss. For this, we rewrite all observations as deviations from the point (y_{k1}, x_{2k1}),

y_i^{k1} = y_i - y_{k1},   i=1,...,n   (18)
x_{2i}^{k1} = x_{2i} - x_{2k1},   i=1,...,n   (19)

Then our L1 norm minimization problem can be redefined as,

min_{β_1,β_2}: S_{k1} = Σ_{i=1}^{n} │y_i^{k1} - β_2 x_{2i}^{k1} + (y_{k1} - β_1 - β_2 x_{2k1})│   (20)

Since we assumed that the k1th observation is on the regression line, the expression y_{k1} - β_1 - β_2 x_{2k1} = 0. Thus (20) reduces to,

min_{β_2}: S_{k1} = Σ_{i=1}^{n} │y_i^{k1} - β_2 x_{2i}^{k1}│   (21)

The solution of the optimization problem (21) is the same as the one described before for the one-parameter linear model. Note that when u_{k1}=0, the minimum value of (21) is equal to the minimum value of (17).
Let the β_2 derived by minimizing (21) be denoted by β_2^{k1}. By changing k1 from one to n and minimizing (21), β_2^1, ..., β_2^n are attained accordingly. Now the question is: which value of k1 minimizes (17)? In other words, which observations are on the optimal regression line? Note that in the two-parameter L1 norm linear model there exist at least two observations with zero errors. Transferring the YxX2 coordinates to either of these points does not change the minimum of (17). Suppose the pth and qth observations are those on the regression line; thus,

y_p - β_1 - β_2 x_{2p} = 0,   y_q - β_1 - β_2 x_{2q} = 0   (22)

Writing (18) and (19) for k1=p,q and substituting them into (21) yields,

S_p = Σ_{i=1}^{n} │y_i^p - β_2 x_{2i}^p│   (23)
S_q = Σ_{i=1}^{n} │y_i^q - β_2 x_{2i}^q│   (24)

Using (18) and (19), (23) and (24) can be rewritten as,

S_p = Σ_{i=1}^{n} │(y_i - β_2 x_{2i}) - (y_p - β_2 x_{2p})│   (25)
S_q = Σ_{i=1}^{n} │(y_i - β_2 x_{2i}) - (y_q - β_2 x_{2q})│   (26)

S_p is equal to S_q if and only if the two parentheses inside the absolute value signs of (25) and (26) are equal. This can be concluded by solving the two equations of (22) for β_1. That is,

β_1 = y_p - β_2 x_{2p} = y_q - β_2 x_{2q}   (27)

The equality of S_q and S_p holds because the points p and q are on the regression line and therefore u_p = u_q = 0. Thus S_p = S_q, and so the values of β_2^p and β_2^q derived by minimizing either S_p or S_q must be equal. This gives a criterion for finding the desired β_2 among all the β_2^{k1}, k1=1,...,n: namely, when β_2^p = β_2^q. The estimated value of β_2 is denoted by β̂_2. The value of β̂_1 is simply computed by (27). Now let us summarize the whole procedure for finding the values of β̂_1 and β̂_2 for the model given by (16).
Step 2) Minimize (21) by using the weighted median procedure and find β_2^{k1}.
Step 4) Increase k1 by one and go to step 1.

Algorithm 2
The crude algorithm 1 for finding β̂_1 and β̂_2 is not computationally efficient because it usually requires testing the majority of observations. In order to make this algorithm efficient, some elaborations are necessary.
Instead of setting k1=1, let us set k1 equal to an arbitrary integer value a, where a is any integer from one to n. Suppose now that u_a = y_a - β_1 - β_2 x_{2a} = 0, and rewrite (21) as,

min_{β_2}: S_a = Σ_{i=1}^{n} │y_i^a - β_2 x_{2i}^a│   (28)

Minimizing (28) gives β_2^a, which is equal to,

β_2^a = y_b^a / x_{2b}^a = (y_b - y_a)/(x_{2b} - x_{2a})   (29)

where, by pivoting on the ath observation, another point b is found; b refers to the observation which has zero error in the minimum solution of (28), so u_b = y_b - β_1 - β_2 x_{2b} = 0. Let us denote the minimum of S_a in (28) by S_a*, then,

S_a* = Σ_{i=1}^{n} │y_i^a - β_2^a x_{2i}^a│   (30)

Note that,

β_1 = y_a - β_2^a x_{2a} = y_b - β_2^a x_{2b}   (31)

This comes from multiplying (29) by its denominator and rearranging the terms. Substituting (31) into (17) gives,

Σ_{i=1}^{n} │y_i^a - β_2^a x_{2i}^a│ = Σ_{i=1}^{n} │y_i^b - β_2^a x_{2i}^b│   (32)

The first sum in (32) is S_a*, and the second sum is S_b evaluated at β_2 = β_2^a. Thus, it can be concluded that,

S_a* = S_b│_{β_2=β_2^a}   (33)

S_a* is at its minimum, but S_b│_{β_2=β_2^a} can still be decreased by other values of β_2. Therefore, an important result is derived, that is,

S_a* > S_b*   (34)

Inequality (34) guarantees that if we choose an arbitrary point, transfer the origin of coordinates to it and minimize the objective function (21), another point is found, and transferring the origin to this newly found point decreases the total sum of absolute errors. Therefore, at each transference we get closer to the minimum of S. By a similar discussion, this conclusion can be generalized as,

S_a* > S_b* > S_c* > S_d* > ...   (35)

Note that a is an arbitrary starting point. Point b is derived by minimizing S_a, c is derived by minimizing S_b, d is derived by minimizing S_c, and so on. Now, the question is: when is the minimum value of S reached? Suppose S* = S_f*; by transferring the origin of coordinates to the point f and minimizing S_f, the gth observation is derived. When S* = S_f*, minimizing S_g finds the fth observation again, because S_g* = S_f* = S* and the gth and fth observations are both on the L1 norm regression line. This conclusion gives the criterion for stopping the procedure. Hence,

S_a* > S_b* > S_c* > S_d* > ... > S_f* = S_g* = S*   (36)

It should be noted that if the minimum solution of S is not unique, that is, if the function S has a horizontal segment, the procedure stops when it reaches the first minimum solution.
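A minimal sketch of algorithm 2 in Python follows, assuming NumPy and data in general position (in particular, not all x2 values equal). The weighted median is done by a full sort rather than by LWMED, and all names are illustrative only.

import numpy as np

def weighted_median_index(z, w):
    # Index of the weighted median of z under non-negative weights w (zero-weight points ignored).
    idx = np.flatnonzero(w > 0)
    order = idx[np.argsort(z[idx])]
    pos = int(np.searchsorted(np.cumsum(w[order]), 0.5 * w[order].sum()))
    return int(order[min(pos, len(order) - 1)])

def l1_simple_fit(y, x2, start=0, max_iter=100):
    # Iterative descent of algorithm 2 for model (16): pivot between zero-residual observations.
    y = np.asarray(y, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    p, q = -1, int(start)                        # previous and current pivot observations
    for _ in range(max_iter):
        dy, dx = y - y[q], x2 - x2[q]            # deviations (18)-(19) from the current pivot q
        w = np.abs(dx)                           # weights of the reduced problem (21)
        z = np.zeros_like(dy)
        np.divide(dy, dx, out=z, where=(dx != 0))
        b = weighted_median_index(z, w)          # the zero-residual observation found from q
        if b == p:                               # criterion (36): the previous pivot recurs
            break
        p, q = q, b                              # each pivot lowers S*, as in (34)-(35)
    beta2 = (y[q] - y[p]) / (x2[q] - x2[p])      # slope through the two zero-residual points, (29)
    beta1 = y[q] - beta2 * x2[q]                 # intercept, as in (27)/(31)
    return beta1, beta2, (p, q)

The returned pair (p, q) indexes the two observations through which the fitted L1 norm line passes.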
Now, let us introduce the whole stages of this algorithm to find the L1 norm estimates of β_1 and β_2 in the simple linear model (16).

Algorithm 2
Step 0) Select an arbitrary integer value a between one and n and set k1=a.
Step 2) Minimize (21) by using the weighted median procedure and find the appropriate observation on the line, namely observation b.
Step 4) Minimize (21) and find that observation on the line; observation c.
Step 6) Set a=b and go to step 1.
Operationally, the following steps are taken.
Step 4) Compute the solution.
Stop.


4. Generalization to m parameters
Now we extend the above procedure to the restricted two parameters model, namely,

y_i = β_2 x_{2i} + β_3 x_{3i} + u_i   (37)

whose L1 norm objective function is,

S = Σ_{i=1}^{n} │y_i - β_2 x_{2i} - β_3 x_{3i}│   (38)

which may be written as,

S = Σ_{i=1}^{n} │x_{2i}││y_i^{s1} - β_2 - β_3 x_{3i}^{s1}│   (39)

where,

y_i^{s1} = y_i/x_{2i},   x_{3i}^{s1} = x_{3i}/x_{2i},   i=1,...,n   (40)

Minimization of (39) is similar to that of the simple linear model explained in the previous section. An important distinction in solving (39) in comparison with (17) is the expression │x_{2i}│ which multiplies │y_i^{s1} - β_2 - β_3 x_{3i}^{s1}│. This multiplication causes no problem when (39) is minimized, because if we deviate y_i^{s1} and x_{3i}^{s1} from the point k1 as,

y_i^{s1k1} = y_i^{s1} - y_{k1}^{s1},   x_{3i}^{s1k1} = x_{3i}^{s1} - x_{3k1}^{s1},   i=1,...,n   (41)

then we can rewrite (39) similarly to (21), so,

S_{k1} = Σ_{i=1}^{n} │x_{2i}││y_i^{s1k1} - β_3 x_{3i}^{s1k1}│   (42)

To minimize (42), we use the following expression,

S_{k1} = Σ_{i=1}^{n} │x_{2i} x_{3i}^{s1k1}││y_i^{s1k1}/x_{3i}^{s1k1} - β_3│   (43)

According to Bidabad (1987a,88a), in applying the discrete derivative to (43), the term in the first absolute value sign is used to find the subscript of the point which lies on the regression line. In comparison with the simple unrestricted linear model (16), this is the main difference.
In the case of the inclusion of an intercept in the model given by (37), we have,

S = Σ_{i=1}^{n} │y_i - β_1 - β_2 x_{2i} - β_3 x_{3i}│   (44)

Let k1 be an arbitrary subscript; then the transference of the origin of coordinates to the k1th point is done by deviating all observations from this point, namely,

y_i^{k1} = y_i - y_{k1},   x_{2i}^{k1} = x_{2i} - x_{2k1},   x_{3i}^{k1} = x_{3i} - x_{3k1},   i=1,...,n   (45)

Rearranging the terms of (45) and substituting into (44), we have,

S = Σ_{i=1}^{n} │y_i^{k1} - β_2 x_{2i}^{k1} - β_3 x_{3i}^{k1} + (y_{k1} - β_1 - β_2 x_{2k1} - β_3 x_{3k1})│   (46)

If the k1th observation is on the regression plane, then,

y_{k1} - β_1 - β_2 x_{2k1} - β_3 x_{3k1} = 0   (47)

So, instead of minimizing (44), the following function is minimized,

S_{k1} = Σ_{i=1}^{n} │y_i^{k1} - β_2 x_{2i}^{k1} - β_3 x_{3i}^{k1}│   (48)

Minimization of (48) is completely similar to that of (38) and proceeds as follows,

S_{k1} = Σ_{i=1}^{n} │x_{2i}^{k1}││y_i^{s1} - β_2 - β_3 x_{3i}^{s1}│   (49)

where,

y_i^{s1} = y_i^{k1}/x_{2i}^{k1},   x_{3i}^{s1} = x_{3i}^{k1}/x_{2i}^{k1},   i=1,...,n   (50)

Now, again, transfer the origin of the two-dimensional Y^{s1}xX3^{s1} coordinates to an arbitrary point k2 by deviating y_i^{s1} and x_{3i}^{s1} from y_{k2}^{s1} and x_{3k2}^{s1} as follows,

y_i^{s1k2} = y_i^{s1} - y_{k2}^{s1},   x_{3i}^{s1k2} = x_{3i}^{s1} - x_{3k2}^{s1},   i=1,...,n   (51)

By rearranging the terms of (51), substituting them into (49) and assuming that the point k2 is on the regression plane, we can rewrite (49) in weighted median form as,

S_{k1k2} = Σ_{i=1}^{n} │x_{2i}^{k1} x_{3i}^{s1k2}││y_i^{s1k2}/x_{3i}^{s1k2} - β_3│   (53)

The objective function (53) can be minimized as suggested before. Now, the procedure from (49) to (53) can be repeated with different values of k2, as in algorithm 2 proposed for the simple linear model. When the last point (M) in the process of minimizing (53) is reached, the origin of the three-dimensional YxX2xX3 coordinates (k1) is transferred to this newly found point M, and the procedure from (45) to (53) is repeated again, with the exception that instead of assigning an arbitrary value to k2, we set k2 equal to the previous value of k1. This procedure continues until the point found by minimizing (53) is equal to the previous value of k1. The values of β̂_1, β̂_2 and β̂_3 can then be computed according to the following formulas,

β̂_3 = y_M^{s1k2}/x_{3M}^{s1k2},   β̂_2 = y_{k2}^{s1} - β̂_3 x_{3k2}^{s1},   β̂_1 = y_{k1} - β̂_2 x_{2k1} - β̂_3 x_{3k1}   (54)

It should be noted that at each step it can be proved that we get closer to the minimum of S in (44). The proof is similar to that for the two-parameter model given by (16).
It can be shown that for any two arbitrary points k1 and k2, the value of S_{k1k2} given by (53) remains unchanged under any commutation of k1 and k2. This will be shown in the next sections. Hence,

S_{k1k2} = S_{k2k1}   (55)

Now, following the procedure of the proof given by (28) through (36), we can write,

S_{k1a}* = S_{k1b}│_{β_3=β_3^{k1a}}   (56)

where S_{k1a}* denotes the optimal value of S_{k1a} of (53) with respect to β_3, for any arbitrary integer a from one to n, namely,

S_{k1a}* = min_{β_3} (S_{k1a})   (57)

The subscript b in (56) denotes the observation which is found by minimizing S_{k1a}. Since the ath and bth observations are both on the regression hyperplane, by a discussion similar to that of (28) through (32) we can conclude that S_{k1a}* is equal to S_{k1b} evaluated at the value of β_3 found by minimizing S_{k1a}. Since the left-hand side of (56) is a minimum and its right-hand side can be decreased by other values of β_3, it can be concluded that,

S_{k1a}* > S_{k1b}*   (58)

According to (55) we can write,

S_{k1a}* > S_{k1b}* = S_{bk1}*   (59)

Since in each step we discard an observation from the basis and replace it with the newly found one, we have,

S_{k1a}* > S_{k1b}* = S_{bk1}* > S_{bc}* = S_{cb}*   (60)

or, generally,

S_{k1a}* > S_{k1b}* > S_{bc}* > S_{cd}* > ...   (61)

The minimum solution S* is reached when, by entering the fth observation into the basis, we cannot reduce the objective function value; that is, the previous observation is reached again, namely,

S_{k1a}* > S_{ab}* > S_{bc}* > S_{cd}* > ... > S_{fg}* = S_{gf}* = S*   (62)

Relation (62) guarantees that at each step we descend the objective function surface.

Algorithm 3
To generalize the above algorithm to the m parameters linear model (1), we should reduce the number of parameters in the same fashion as in the three-parameter model explained above. If the model is restricted, we can make it unrestricted by dividing the dependent and all independent variables by one of the independent variables, say x_{1i}, as follows,

S = Σ_{i=1}^{n} │y_i - Σ_{j=1}^{m} β_j x_{ji}│ = Σ_{i=1}^{n} │x_{1i}││y_i^{s1} - β_1 - Σ_{j=2}^{m} β_j x_{ji}^{s1}│   (63)

y_i^{s1} = y_i/x_{1i},   x_{ji}^{s1} = x_{ji}/x_{1i},   i=1,...,n; j=2,...,m   (64)

If the model is unrestricted, we can make it restricted by deviating all observations from one arbitrary observation, namely,

y_i^{k1} = y_i - y_{k1},   x_{ji}^{k1} = x_{ji} - x_{jk1},   i=1,...,n; j=2,...,m   (65)

Therefore, according to the transformations (63), (64) and (65), any m parameters model can be reduced to a simple restricted one-parameter model and then estimated as a weighted median problem. To conduct this transformation, if the model is unrestricted, we deviate all observations from an arbitrary one and make the model restricted; then we divide the dependent and all independent variables by an arbitrary independent variable, which makes the model unrestricted again. At this step, we have reduced the number of parameters by one. By continuing this process, we can reduce any m parameters model to a one-parameter model. If the model is restricted, we start by dividing all of the variables by one of the independent variables and making the model unrestricted; then we can again reduce one of the parameters by applying the procedure explained above to transform the unrestricted model to restricted form. By solving the one-parameter model, we can then solve the two-parameter model, the three-parameter model, and so on.
The most delicate part of this algorithm is that, although the starting values k1,k2,...,k(m-1) are selected arbitrarily, the algorithm then assigns the best possible values to the integers k1,k2,...,k(m-1). To explain the procedure, let us deal with the four-parameter unrestricted linear model. First, the value of k1 is arbitrarily selected. Deviate all observations from the k1th observation. In this way, the model reduces to a three-parameter restricted model. By dividing all of the variables by one of the independent variables, our model becomes completely similar to (44). By minimizing (44) according to the algorithm previously explained for the three-parameter model, the subscript M corresponding to the Mth point is derived. This is the newly found point, whose subscript (M) is assigned to k1. The previous value of k1 is assigned to k2, the previous value of k2 is assigned to k3, and the whole procedure is repeated. The procedure stops when the value of M is equal to k1.
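A minimal illustration of this re-assignment (names hypothetical): the newly found subscript M becomes k1 and each previous k shifts one place down.

def reassign(ks, M):
    # ks = [k1, k2, ..., k(m-1)]; M is the newly found subscript.
    return [M] + ks[:-1]

print(reassign([7, 2, 5], 11))   # [11, 7, 2]: M becomes k1, old k1 becomes k2, old k2 becomes k3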
The above assigning technique, which is essential for pivoting on the origins of coordinate systems of different dimensions, can be extended to models with more parameters in the same way. Again, we should note that at each succeeding step we get closer to the minimum of S for model (1). The proof is completely similar to that for the three-parameter model explained before. The whole procedure is as follows.
Step 2) Carry out the transformations (63) through (65) to reduce the model to a one-parameter transformed restricted linear model.
Step 3) Apply weighted median procedure to find the point M.
Step 4) If M is equal to the previous value of k(counter) go to step 5; otherwise set k(counter)=M and go to step 1.
Step 5) If counter = m-1 go to step 6; otherwise let counter=counter+1 and go to step 4.

Algorithm 4
In algorithm 3, to solve any m parameters model, the reduced m-1 parameters model must be solved first, which makes the algorithm rather costly: to solve an m parameters model, we reduce the objective function value of the reduced m-1 parameters model down to its minimum and then go back to the m parameters model, and in the interior steps this procedure repeats for m-1, m-2, ..., 1 parameters. In order to make the algorithm more efficient, another assigning technique may be adopted.
First, m-1 observations are selected arbitrarily. The S function is reduced to a one-parameter weighted median problem as in algorithm 3, using (63) and (64). By solving the weighted median problem, a new observation M is found. The newly found point then replaces the observation which entered the basis earliest. The procedure is repeated until we cannot find any new observation outside the current set of basis points and the point deleted in the previous iteration. In other words, in this algorithm m-1 points are selected as a basic feasible solution. By pivoting on this solution, a new point is found to enter the basis, and it replaces one which was previously in the basis. The procedure ends when we cannot enter any new point that reduces the objective function value. In this respect, it is similar to the simplex procedure of the linear programming technique.
Algorithm 4 is also a descent method, because at each iteration the value of S is decreased. The proof is completely similar to that of algorithm 3 stated before. The main difference between this algorithm and algorithm 3 is in choosing the point which should be deleted from the basis. Therefore, at each step we need not keep certain points in the basis in order to solve smaller size models. On the other hand, over each m iterations, all the points in the basis are discarded and new points enter; in algorithm 3, we kept some points in the basis until an optimal point for the smaller size reduced model was reached. It should be noted that one point may be discarded and reenter the basis within m iterations. This is not contradictory and actually occurs, especially when we are near the optimal solution.
As we will see in the forthcoming section, due to the different properties of this algorithm, many manipulations of the computational steps may be adopted to start the calculations. Before making this algorithm more sophisticated, it is suitable to summarize its general steps as follows.
Step 1) Carry out the transformations (63) through (65) to reduce the model to a one-parameter transformed restricted linear model.
Step 2) Apply weighted median procedure to find the point M.
Step 3) Assign M to one of the k1,...,k(m-1) which is oldest in the basis.
Step 4) If entering the newly found point M into the basis and discarding the point which entered in the previous iteration repeats m-1 times without producing any new point, go to step 5; otherwise go to step 1.
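The following is a minimal sketch of algorithm 4 in Python, under stated assumptions: NumPy, data in general position (so that no degenerate divisions occur and the final linear system is nonsingular), a full-sort weighted median instead of LWMED, and recovery of the coefficients by solving the linear system through the m zero-residual points rather than by back-substitution formulas such as (54). All names are illustrative only.

import numpy as np

def weighted_median_index(z, w):
    # Same helper as in the algorithm 2 sketch: index of the weighted median of z under weights w.
    idx = np.flatnonzero(w > 0)
    order = idx[np.argsort(z[idx])]
    pos = int(np.searchsorted(np.cumsum(w[order]), 0.5 * w[order].sum()))
    return int(order[min(pos, len(order) - 1)])

def l1_fit_algorithm4(y, X, basis=None, max_iter=500):
    # Sketch of algorithm 4 for y = b1 + b2*x2 + ... + bm*xm + u, where X is n x (m-1).
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape                               # k = m-1 observations are kept in the basis
    basis = list(range(k)) if basis is None else list(basis)
    last_deleted, M = -1, -1
    for _ in range(max_iter):
        Y, Z, w = y.copy(), X.copy(), np.ones(n)
        for l, b in enumerate(basis):
            Y, Z = Y - Y[b], Z - Z[b]            # deviate from a basis point: removes the intercept
            if l < k - 1:                        # divide by the next regressor: removes one slope
                c = Z[:, l].copy()
                w = w * np.abs(c)                # basis rows get weight zero (property 9)
                ok = w > 0
                Y[ok] = Y[ok] / c[ok]
                Z[ok] = Z[ok] / c[ok][:, None]
        c = Z[:, k - 1]                          # the single remaining regressor
        w = w * np.abs(c)
        z = np.zeros(n)
        np.divide(Y, c, out=z, where=(w > 0))
        M = weighted_median_index(z, w)          # the new zero-residual observation
        if M == last_deleted:                    # stopping rule: the previously deleted point returns
            break
        last_deleted = basis.pop(0)              # discard the oldest basis point ...
        basis.append(M)                          # ... and let M enter the basis
    pts = basis + [M]                            # the m zero-residual observations
    A = np.column_stack([np.ones(k + 1), X[pts]])
    beta = np.linalg.solve(A, y[pts])            # hyperplane through the m points
    return beta, pts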
After discussing the properties of algorithm 4, we will give more expository steps of computation for this algorithm. By using these properties, we can make this algorithm more efficient.

Properties
Before discussing the properties of the proposed algorithms, specifically algorithm 4, let us consider the L1 norm objective function of the following four-parameter model,

S = Σ_{i=1}^{n} │y_i - β_0 - β_1 x_{1i} - β_2 x_{2i} - β_3 x_{3i}│   (66)

Some steps should be taken to reduce (66) to a weighted median problem by introducing three arbitrary integers k1, k2, k3 between one and n. Note that, in equation (66), function (1) has been slightly redefined to include the intercept as a simple term. Similar to the transformations (63) through (65), the following steps are taken sequentially.

Property 1
If we start the computation with m-1 points which do not belong to the original set of observations, the value of S still decreases toward its global minimum at each iteration. Suppose the points (y_{n+1},...,x_{m-1,n+1}),...,(y_{n+m-1},...,x_{m-1,n+m-1}) do not belong to our sample points. If n+1, n+2,...,n+m-1 are assigned to k1,k2,...,k(m-1), then minimizing the function in (90) leads to a new point M which belongs to the sample observations. Replacing k1 by M, deleting the (n+1)th point from the basis and minimizing (90) leads to another point M' in the sample, which replaces the k2th observation lying outside our sample. Thus, at each iteration one of the out-of-sample points is deleted, and after m-1 iterations the optimization procedure is carried out entirely on the sample points.
This property performs well on ill-conditioned data. That is, if, for example, rounding error causes an observation to take an incorrect rounded value, then this point will be replaced by a sample point in the next iteration, thus redirecting the descent path toward the right points.

Property 2
The reduction of an m parameters model to models with fewer parameters can reuse the computations of the m-1 parameters model, and conversely. Thus stepwise and all possible regressions may be computed efficiently.

Property 4
By any commutation of the kj for j=1,...,m-1, the values of w_i^{k1...k(m-1)} for all i=1,...,n defined by (91) remain unchanged. This conclusion follows from multiplying the sequential parentheses of (97) and cancelling the suitable terms. This property, together with property 3, guarantees that any commutation of k1,...,k(m-1) does not change the necessary items of (91) and (92) in the computation of the weighted median problem (90).

Property 8
The updating procedure for evaluating different subsets of the sample observations can be done simply by assigning zero values to the weights w_h^{k1k2...k(m-1)} in (90) of the points to be deleted. This leaves the unwanted points h without effect on the calculation.
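A minimal illustration (assuming NumPy): zeroing the weights of unwanted observations removes them from the weighted median computation.

import numpy as np

w = np.array([1.0, 2.0, 0.5, 3.0])   # weights w_h of the current weighted median problem
drop = [1, 3]                        # subscripts of observations to leave out of this evaluation
w[drop] = 0.0                        # property 8: these points no longer affect D_k or the median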

Property 9
According to properties 6 and 8, since the value of w_h^{k1k2...k(m-1)} = 0 for all h=k1,k2,...,k(m-1), the point M derived from solving the weighted median problem (90) will not be equal to any of k1,k2,...,k(m-1). This property guarantees that when we enter an observation into the basis, we do not find the same point again in the same iteration.

Property 10
If all the kj, j=1,...,m-1, are equal at the start, no problem occurs in the process of computation, but the execution time may increase. Conversely, if the kj points are selected near the minimum of S, the algorithm converges to the optimal solution in fewer iterations.

Property 11
The values of w_i^{k1k2...k(m-1)} and r_i^{s1s2...s(m-1)}, which are essential for solving the weighted median problem (90), can be retrieved from their values in the previous iteration. That is, when the point kh is discarded from the basis and replaced by kj, the values of w_i^{k1...k(h-1)} and r_i^{s1...s(h-1)} do not change and need not be computed again. Applying this property to algorithm 4 makes it more efficient.

Property 12
Since at each iteration the values of r_i^{s1s2...s(m-1)} and w_i^{k1k2...k(m-1)} are computed from the source input data y and X, rounding error does not accumulate and convergence is not perturbed. This property makes the algorithm safe from the accumulation of rounding error which exists in simplex-type algorithms.
Step 1) Compute weights and ratios.

Initial value problem
The selection of the arbitrary points k1,k2,...,k(m-1) at the first stage of computation has an important role in the number of iterations needed to reach the optimal solution. One way to select these m-1 points is to apply another estimator and take the points whose residuals are smallest in absolute value among all estimated residuals. That is, an estimator, for example least squares, is applied to the data, the m-1 smallest residuals in absolute value are selected, and the observations corresponding to these residuals are nominated as starting points for algorithms 1, 2, 3 or 4. We examined this procedure for algorithms 2 and 4 and found that it makes them more efficient.
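A minimal sketch of this starting rule (assuming NumPy; names illustrative): fit least squares, then take the m-1 observations with the smallest absolute residuals as the initial k1,...,k(m-1).

import numpy as np

def starting_basis(y, X):
    # X is n x (m-1); returns the subscripts of the m-1 smallest absolute least squares residuals.
    A = np.column_stack([np.ones(len(y)), X])          # intercept plus regressors
    beta_ls, *_ = np.linalg.lstsq(A, y, rcond=None)    # ordinary least squares fit
    resid = np.abs(y - A @ beta_ls)
    return list(np.argsort(resid)[: X.shape[1]])       # m-1 candidate starting points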
In the process of this study, we found some points which might lead to developments in the field of L1 norm computation. In the proposed algorithm 4, we could not find a criterion for deleting, from the current set of basis observations, the one whose removal reduces the objective function value more than the other points. If a criterion like the heuristic method of Bloomfield and Steiger (1980) can be found, the number of iterations, and hence the computation time, will decrease considerably.
The main obstacle to applying the discrete differentiation technique to operational problems is the need to reorder the residuals in descending order before starting the computation. If this obstacle were removed, an algorithm similar to that of the simple restricted linear regression could be proposed for multiple regression.