Three-Stage Procedure and Expected Opportunity Cost for Selecting a Good Simulated Design

Authors : Mohammad H. Almomani, Adam Baharum and Rosmanjawati Abdul Rahman

Abstract: Three-stage procedure is used to select one of the best simulated designs with high probability. This procedure involves three stages that consists the ordinal optimization approach, subset selection approach and indifference-zone approach. Normally the three-stage procedure focuses on the probability of correct selection as a measure of selection quality. Another measure of selection quality is the expected opportunity cost of a potentially incorrect selection where the opportunity cost in the simulation selection problem is defined as the difference of unknown mean between the selected best design and the actual best design. In this study, we select a good simulated design using the three-stage procedure and utilizing the expected opportunity cost of a potentially incorrect selection. The results of the numerical example show that the procedure selects a good simulated design with high probability and minimum expected opportunity cost.

How to cite this article:

Mohammad H. Almomani, Adam Baharum and Rosmanjawati Abdul Rahman, 2010. Three-Stage Procedure and Expected Opportunity Cost for Selecting a Good Simulated Design. Research Journal of Applied Sciences, 5: 417-423.

DOI: 10.3923/rjasci.2010.417.423

URL: https://medwelljournals.com/abstract/?doi=rjasci.2010.417.423

INTRODUCTION

Statistical selection procedures are used to identify the best of a finite set of simulation alternatives where the best simulated design is defined in terms of the minimum (maximum) expected value of each alternative. In this study, we are considering the problem of selecting the best design from a finite and large set of alternatives where the expected value of each alternative can be inferred by the simulation. A three-stage procedure proposed by Almomani and Alrefaei (2010) identify the best design by using three stages. At first stage, the Ordinal Optimization (OO) approach is used to select a subset that overlaps with the set of the best m% designs with high probability.

At second stage, Subset Selection (SS) approach is used to get a smaller subset that contains the best among the subset that was selected at the first stage before. Finally, the Indifference-Zone (IZ) approach is used to select the best design among the subset from the second stage. However, since simulation output has randomness, the best design cannot be selected with certainty. Instead, selection procedures provide some measure of the quality of selection. There are two measures of selection quality, the first one is the Probability of Correct Selection P (CS) and the second measure is the Expected Opportunity Cost E (OC) of a potentially incorrect selection. Almomani and Alrefaei (2010) proposed the three-stage procedure from the contexts of P (CS). However, this study provides the three-stage procedure from the context of E (OC), the second measure of the quality. Traditional selection procedures identify the best design with high probability of correct selection by maximizing the P (CS). Since, the E (OC) become important in many applications like business and engineering applications, it leads to a recently new selection procedures to reduce the opportunity cost of a potentially incorrect selection (Gupta and Miescke, 1994, 1996; Chick and Inoue, 2001a, b). (Almomani and Alrefaei, 2010) showed that the three-stage procedure is indeed to select one of the best simulated design with its P (CS). In this study we are using the three-stage procedure to select the best simulated design by providing the E (OC) as a measure of the quality of selection.

PROBLEM STATEMENT

We consider the following optimization problem:

Where:

f	=	The expected performance measure of some complex stochastic system
Θ	=	Arbitrary feasible solution set that finite and has no structure
θ	=	A vector representing the system design parameters
Y	=	Represents all the random effect of the design and
L	=	A deterministic function that depends on θ and Y

The goal of the selection procedure is to identify the best n simulated designs where the best in this study is defined as the design with the smallest mean which is unknown and to be inferred from simulation. Suppose that there are n designs. Let Y_ij represent the jth output from design i and let Y_i = {Y_ij, j = 1, 2,....} denote the output sequence from design i. We assume that Y_ij are independent and identically distributed (i.i.d) normal with unknown means μ_i = E (Y_ij) and variances σ² = V_ar (Y_ij). In addition, we assume that the Y₁, Y₂,.......,Y_n are mutually independent. The normality assumption is not a problem here because simulation outputs are obtained from an average performance or batch means. So by using the Central Limit Theorem the normality assumption is hold. In practice the σ²_i are unknown so we estimate it using the sample variances S²_i. Throughout this study we assume that a smallest mean is better so if the ordered μ_i-values are denoted by μ_[1]≤μ_[2]≤.....≤μ_[n] then the design having mean is referred μ_[1] to as the best design. The CS occurs when the design selected by the selection procedure is the same as the true best design.

In the simulation selection problem, the opportunity cost is represented as the difference between the unknown means of the selected design and the actual best design:

Where:

b	=	The design with the smallest sample mean
i^*	=	The true best design

If the best design is correctly selected then the OC is zero:

This follow with the Expected Opportunity Cost (E (OC)) in which is defined as E (OC) = E (μ_b-μ_i*) (Chick and Wu, 2005; He et al., 2007).

BACKGROUND

Ordinal Optimization (OO): OO concentrates on isolating a subset of good designs with high probability and reduces the required simulation time dramatically for discrete event simulation. The OO approach has been proposed by Ho et al. (1992). Since then, it has emerged as an efficient technique for simulation and optimization. In this approach, the aim is to find a good enough solution, rather than to estimate accurately the performance value of these designs. If we simulate the design to estimate the expected performance measure, the confidence interval of this estimate cannot be improved faster than where n is the number of replications used. This rate may be good for many problems when there are a small number of alternatives but it is not good enough for a complex simulation with a larger number of alternatives. The reason is that since each sample requires one simulation run, we need a large number of samples when we are dealing with a huge number of alternative designs in the solution set which is very hard and impossible. In this case, one could compromise the objective to get a good enough solution rather than doing extensive simulation which is impossible in many cases. However, in many real world applications using a simpler model to get an accurate estimate is somehow impossible. Suppose that the Correct Selection (CS) is to select a subset G of g designs from the feasible solution set that contains at least one of the top m% best designs. Since we assume that Θ is very huge then the probability of CS is given by:

Now, suppose that the CS is to select a subset G of g designs that contains at least r of the best s designs. Let S be the subset that contains the actual best s designs then the probability of CS can be obtained using the hyper geometric distribution as:

However, it assume that the number of alternatives is very large then P (CS) can be approximated by the binomial random variable: Therefore,

where it assume that s/nx100% = m%. It is also clear that this P (CS) increases when the sample size g increases.

Subset Selection (SS): SS approach screens out the search space and eliminate non-competitive designs and construct a subset that contains the best design with high probability. We can use this approach when the number of alternatives is relatively large to select (a random-size) subset that contains the actual best design. We require that P (CS)≥1-α where CS the correct selection is selecting a subset that contains the actual best design and 1-α is a predetermined probability. The SS approach dating back to Gupta (1965) who presented a single stage procedure for producing a subset containing the best design with a specified probability. Extensions of this research which is relevant to the simulation setting include (Sullivan and Wilson, 1989) who derived a two stage SS approach that determines a subset of maximum size m that with a specified probability will contain designs that are all within a pre-specified amount of the optimum.

Indifference Zone (IZ): The main aim of IZ approach is selecting the best design among n designs when n≤20. Suppose we have n alternative designs that are normally distributed with unknown means μ₁, μ₂,.....μ₇₃ and suppose that these means are ordered as μ_[1]≤μ_[2]≤,.....≤μ_[73]. We seek to locate the design that has the best minimum mean μ_[1]. The IZ is defined to be the interval [μ_[1], μ_[1]+δ] where δ is a predetermined small positive real number. We are interested in selecting an alternative k such that μ_kε [μ_[1], μ_[1]+δ]. Let CS here is selecting an alternative whose mean belongs to the indifference zone. We would like the correct selection to take place with high probability say with a probability not smaller than 1-α where 1/n≤1-α≤1. Care must be taken in specifying the value of δ because the design requires to implicitly determining the common single-stage sample size required for the n competing designs. If δ is too small then the number of observations is expected to be large. Also when the 1-α is large then the number of observations is expected to be large. The indifference-zone approach consists of two stages. In the first stage, all designs are sampled using t₀ simulation runs to get an initial estimate of expected performance measure and their variances. Next, depending on the information obtained in the first stage, we compute how many more samples are needed in the second stage for each design to guarantee that P (CS)≥1-α. Rinott (1978) has presented a procedure that is applicable when the data are normally distributed and all designs are simulated independently of each others. This procedure consists of two stages for the case when variances are completely unknown. Tamhane and Bechhofer (1977) has presented a simple procedure that is valid when variances may not be equal. Nelson et al. (2001) proposed a two-stage subset selection procedure.

The first stage is to reduce the number of competitive designs. These designs are carried out to the second stage in which involved with the indifference-zone approach using the information gathered from the first stage. Alrefaei and Almomani (2007) proposed two sequential algorithms for selecting a subset of k designs that is contained in the set of the top s designs. Another comprehensive review of ranking and selection procedures can be found by Goldsman and Nelson (1994), Bechhofer et al. (1995) by Kim and Nelson (2006, 2007).

THREE-STAGE PROCEDURE

This procedure consists three stages in the first stage, a subset G is selected randomly from the feasible solution set Θ. Let g denotes the size of the subset G where g is a relatively small number but the probability that G contains one of the best m% alternatives is high. In the second stage, the SS approach is applied on the subset G to select a subset I that contains the best design with high probability where |I|≤20. Note that the size of the subset G is small so we can use the SS approach. Finally, the IZ approach is applied on the set I to select the best design. Note that the size of set I is ≤20 so that we can use the IZ approach. The algorithm of the three-stage procedure is described as follows:

Algorithm:

•	Determine the number of alternatives g in G, i.e., \|G\| = g where G is the subset that will be selected in the first stage that satisfies P (G contains at least one of the best m% design) = ≥1-α₁. Determine the number of samples t₀≥2 and the indifference zone δ from the t-distribution

•	Select a subset G of size g randomly from the feasible solution set Θ

•	Take random samples of t₀ observations y_ij (1≤j≤t₀) for each design i, i = 1,......., g

•	Calculate the first stage sample means and variances and as:

where, i = 1,..., g and v = t₀-1 degrees of freedom.

Let:

•

Set:

Where:

•	If I contain a single index then this design is the best design. Otherwise for all i∈I compute the second stage sample size T_i-t₀, where:

h = (1-α/2, t₀, -I-) be the Rinott’s constant and can be obtained from tables of (Wilcox, 1984).

•	Take a random sample of T_i-t₀ additional observations for all designs i∈I and compute the overall sample means:

For all iεI

•	Select the design i∈I with the smallest as the best design

Nelson et al. (2001) have shown that with a probability of at least 1-(α₂+α₃) this approach selected the best design in the subset G. Therefore, if G contains at least one of the top m% designs then this procedure selects a good design with probability 1-(α₂+α₃). From the OO, the selected set G contains at least one of the best m% designs with probability:

Therefore, the P (the selected design by three-stage procedure is in the top m% designs)≥(1-α₁)(1-(α₂+α₃)).

NUMERICAL EXAMPLE

Consider the queuing designs when the inter arrival times and the service times are exponentially distributed and the design has one server. This queuing design is known as M/M/1 queuing design. In this design, the arrival process follows a Poisson distribution with rate λ (the mean arrival rate) and the service time is exponentially distributed with parameter μ (the mean service rate). Suppose we have n M/M/1 queuing designs and we want to select one of the best m% queuing models that have the minimum average waiting time per customer. We assume that the arrival rate λ is fixed and the service rate με[a, b]. Of course this problem has analytical solution in the steady state case. We implement the algorithm of three-stage procedure for solving this problem under some parameter settings. In the first experiment, we assume that n = 1000, λ = 1 and με[4, 5]. We discretize the problem by assuming that Θ = {4.001, 4.002,...., 5.0000}. Of course, the best design is 1000th design with μ₁₀₀₀ = 5. Suppose we want to select one of the best (2%) designs. Then we are shooting for designs 981-1000. Let |G| = 50, α₂ = α₃ = 0.005 and δ = 0.05. We consider it as a correct selection if the selected design belongs to {μ₉₈₁, μ₉₈₂,..., μ₁₀₀₀}. Note that, we can evaluate the analytical probability of the correct selection as:

Table 1:	The numerical illustration for n = 1000, g = 50, t₀ = 30, m% = 2%

Table 2:	The P (CS) and for n = 1000, g = 50, t₀ = 30, m% = 2% over 100 replications

Table 1 contains the result of this illustration with 10 replications for selecting one of the best (2%) designs. From the Table 1, best is referring to the index of the design that has been selected by three-stage procedure as the best design, is the unknown average waiting time for the actual best design i^* (here i^* = 1000 and = 0.25), is the unknown average waiting time for the best design b (b = Best), is the sample mean for the actual best design I, is the sample mean for the best design b, three-stage and the analytical . Note that we can find and from the formula:

Where:

μ_i	=	The service rate for the design
i and λ_i	=	The arrival rate for the design i

Moreover, after the simulation is performed, can be calculated according to the design output. Note that in some replications there are negative values for three-stage E (OC) whereas E (OC) should be positive values. The reason for this is that in the first stage of three-stage procedure we used OO to select randomly the set G so in some replications we are not selecting the actual best design i^* from the first stage. This means that the actual best design i^* is not belongs to the set G. It implying that we have not take into consideration, the addition observation in stage three compare to the best design b. Therefore the sample mean for the best design b sometime become less than the sample mean for the actual best design and this implies that the three-stage . The above numerical examples are extended for 100 replications and the results are shown in Table 2. It is clear that each P (CS) and values (the average number of Expected Opportunity Cost) for three-stage procedure is closed to the analytical values. Figure 1 shows the relation between the E (OC) for the three-stage procedure and the analytical E (OC) when this experiment were repeated for 100 replications.


Fig. 1:	Relation between three-stage E (OC) and Analytical E (OC) for n = 1000, g = 50, t₀ = 30, m% = 2% over 100 replications


Fig. 2:	Relation between three-stage E (OC) and Analytical E (OC) for n = 1000, g = 50, t₀ = 30, m% = 2% over 100 replications

If we take the difference between the and as absolute value (positive) then we can see the relation between three-stage E (OC) and Analytical E (OC) in Fig. 2. Note that three-stage procedure produce a very small E (OC) and are mostly closed to the values of the analytical E (OC). For the next experiment, assume n = 1000, λ = 1 and με[4, 5]. We want to select one of the best (5%) designs. Let g = 50, α₂ = α₃ = 0.005 and δ = 0.05. We consider it as a correct selection if the selected design belongs to {μ₉₅₁, μ₉₅₂,..., μ₁₀₀₀}. Note that we can evaluate the analytical probability of the correct selection as:

Table 3 contains the result of this illustration with 10 replications for selecting one of the best (5%) designs. The above numerical experiments are extended for 100 replications and the results are shown in Table 4. The following still show that each value of P (CS) and for three-stage procedure is closed to the analytical values.


Fig. 3:	Relation between three-stage E (OC) and Analytical E (OC) for n = 1000, g = 50, t₀ = 30, m% = 2% over 100 replications


Fig. 4:	Relation between three-stage E (OC) and Analytical E (OC) for n = 1000, g = 50, t₀ = 30, m% = 2% over 100 replications

Table 3:	The numerical illustration for n = 1000, g = 50, t₀ = 30, m% = 2%

Table 4:	The P (CS) and for n = 1000, g = 50, t₀ = 20, m% = 5% over 100 replications

Figure 3 show the relation between the E (OC) for the three-stage procedure and the analytical E (OC) when this experiment were repeated for 100 replications. If we take the difference between the and as absolute value then we can see the relation between three-stage E (OC) and Analytical E (OC) in Fig. 4.

CONCLUSION

In this study, we have presented a three-stage procedure that selects one of the good enough simulated design in contexts of E (OC) of a potentially incorrect selection. In the first stage, we use the ordinal optimization approach to select a relatively small subset G where the intersection with the set that contains the actual best m% alternatives is high. Then subset Selection approach are used to select a smaller subset I that contains the best alternative of G where |I|<20. These are following by the indifference-zone approach to select the best alternative of I which highly results in selecting an alternative that belongs to the top m% alternatives. We have also discussed how to implement the algorithm for a given problem involving M/M/1 queuing models.

The numerical results indicate that the three-stage procedure is sound and work practically well. We also note that the P (CS) values for this procedure are much closed to the values of analytical P (CS). In addition the three-stage procedure also guarantees to provide the minimum value of E (OC). In future research we would like to improve the three-stage procedure to reduce the E (OC) and applied it in real life problems. Moreover, we will study the efficiency of three-stage procedure when there is a changing in the initial sample size t₀, since we cannot ignore that from the numerical example, it shows that the efficiency of three-stage procedure is sensitive to t₀.

ACKNOWLEDGEMENT

The researchers would like to thank School of Mathematical Sciences, University Sains Malaysia for supporting this research.

Related Links

Journals By Subject

Research Journal of Applied Sciences

Three-Stage Procedure and Expected Opportunity Cost for Selecting a Good Simulated Design

How to cite this article: