INTRODUCTION
Statistical selection procedures are used to identify the best of a finite
set of simulation alternatives where the best simulated design is defined in
terms of the minimum (maximum) expected value of each alternative. In this study,
we are considering the problem of selecting the best design from a finite and
large set of alternatives where the expected value of each alternative can be
inferred by the simulation. A three-stage procedure proposed by Almomani
and Alrefaei (2010) identify the best design by using three stages. At first
stage, the Ordinal Optimization (OO) approach is used to select a subset that
overlaps with the set of the best m% designs with high probability.
At second stage, Subset Selection (SS) approach is used to get a smaller subset
that contains the best among the subset that was selected at the first stage
before. Finally, the Indifference-Zone (IZ) approach is used to select the best
design among the subset from the second stage. However, since simulation output
has randomness, the best design cannot be selected with certainty. Instead,
selection procedures provide some measure of the quality of selection. There
are two measures of selection quality, the first one is the Probability of Correct
Selection P (CS) and the second measure is the Expected Opportunity Cost E (OC)
of a potentially incorrect selection. Almomani and Alrefaei
(2010) proposed the three-stage procedure from the contexts of P (CS). However,
this study provides the three-stage procedure from the context of E (OC), the
second measure of the quality. Traditional selection procedures identify the
best design with high probability of correct selection by maximizing the P (CS).
Since, the E (OC) become important in many applications like business and engineering
applications, it leads to a recently new selection procedures to reduce the
opportunity cost of a potentially incorrect selection (Gupta
and Miescke, 1994, 1996; Chick
and Inoue, 2001a, b). (Almomani
and Alrefaei, 2010) showed that the three-stage procedure is indeed to select
one of the best simulated design with its P (CS). In this study we are using
the three-stage procedure to select the best simulated design by providing the
E (OC) as a measure of the quality of selection.
PROBLEM STATEMENT
We consider the following optimization problem:
Where:
| f |
= |
The expected performance measure of some complex stochastic
system |
| Θ |
= |
Arbitrary feasible solution set that finite and has no structure |
| θ |
= |
A vector representing the system design parameters |
| Y |
= |
Represents all the random effect of the design and |
| L |
= |
A deterministic function that depends on θ and Y |
|
The goal of the selection procedure is to identify the best n simulated designs
where the best in this study is defined as the design with the smallest mean
which is unknown and to be inferred from simulation. Suppose that there are
n designs. Let Yij represent the jth output from design i and let
Yi = {Yij, j = 1, 2,....} denote the output sequence from
design i. We assume that Yij are independent and identically distributed
(i.i.d) normal with unknown means μi = E (Yij) and
variances σ2 = Var (Yij). In addition,
we assume that the Y1, Y2,.......,Yn are mutually
independent. The normality assumption is not a problem here because simulation
outputs are obtained from an average performance or batch means. So by using
the Central Limit Theorem the normality assumption is hold. In practice the
σ2i are unknown so we estimate it using the sample
variances S2i. Throughout this study we assume that a
smallest mean is better so if the ordered μi-values are denoted
by μ[1]≤μ[2]≤.....≤μ[n]
then the design having mean is referred μ[1] to as the best
design. The CS occurs when the design selected by the selection procedure is
the same as the true best design.
In the simulation selection problem, the opportunity cost is represented as
the difference between the unknown means of the selected design and the actual
best design:
Where:
| b |
= |
The design with the smallest sample mean |
| i* |
= |
The true best design |
|
If the best design is correctly selected then the OC is zero:
This follow with the Expected Opportunity Cost (E (OC)) in which is defined
as E (OC) = E (μb-μi*) (Chick
and Wu, 2005; He et al., 2007).
BACKGROUND
Ordinal Optimization (OO): OO concentrates on isolating a subset of
good designs with high probability and reduces the required simulation time
dramatically for discrete event simulation. The OO approach has been proposed
by Ho et al. (1992). Since then, it has emerged
as an efficient technique for simulation and optimization. In this approach,
the aim is to find a good enough solution, rather than to estimate accurately
the performance value of these designs. If we simulate the design to estimate
the expected performance measure, the confidence interval of this estimate cannot
be improved faster than
where n is the number of replications used. This rate may be good for many problems
when there are a small number of alternatives but it is not good enough for
a complex simulation with a larger number of alternatives. The reason is that
since each sample requires one simulation run, we need a large number of samples
when we are dealing with a huge number of alternative designs in the solution
set which is very hard and impossible. In this case, one could compromise the
objective to get a good enough solution rather than doing extensive simulation
which is impossible in many cases. However, in many real world applications
using a simpler model to get an accurate estimate is somehow impossible. Suppose
that the Correct Selection (CS) is to select a subset G of g designs from the
feasible solution set that contains at least one of the top m% best designs.
Since we assume that Θ is very huge then the probability of CS is given
by:
Now, suppose that the CS is to select a subset G of g designs that contains
at least r of the best s designs. Let S be the subset that contains the actual
best s designs then the probability of CS can be obtained using the hyper geometric
distribution as:
However, it assume that the number of alternatives is very large then P (CS)
can be approximated by the binomial random variable: Therefore,
where it assume that s/nx100% = m%. It is also clear that this P (CS) increases
when the sample size g increases.
Subset Selection (SS): SS approach screens out the search space and
eliminate non-competitive designs and construct a subset that contains the best
design with high probability. We can use this approach when the number of alternatives
is relatively large to select (a random-size) subset that contains the actual
best design. We require that P (CS)≥1-α where CS the correct selection
is selecting a subset that contains the actual best design and 1-α is a
predetermined probability. The SS approach dating back to Gupta
(1965) who presented a single stage procedure for producing a subset containing
the best design with a specified probability. Extensions of this research which
is relevant to the simulation setting include (Sullivan
and Wilson, 1989) who derived a two stage SS approach that determines a
subset of maximum size m that with a specified probability will contain designs
that are all within a pre-specified amount of the optimum.
Indifference Zone (IZ): The main aim of IZ approach is selecting the
best design among n designs when n≤20. Suppose we have n alternative designs
that are normally distributed with unknown means μ1, μ2,.....μ73
and suppose that these means are ordered as μ[1]≤μ[2]≤,.....≤μ[73].
We seek to locate the design that has the best minimum mean μ[1].
The IZ is defined to be the interval [μ[1], μ[1]+δ]
where δ is a predetermined small positive real number. We are interested
in selecting an alternative k such that μkε [μ[1],
μ[1]+δ]. Let CS here is selecting an alternative whose
mean belongs to the indifference zone. We would like the correct selection to
take place with high probability say with a probability not smaller than 1-α
where 1/n≤1-α≤1. Care must be taken in specifying the value of δ
because the design requires to implicitly determining the common single-stage
sample size required for the n competing designs. If δ is too small then
the number of observations is expected to be large. Also when the 1-α is
large then the number of observations is expected to be large. The indifference-zone
approach consists of two stages. In the first stage, all designs are sampled
using t0 simulation runs to get an initial estimate of expected performance
measure and their variances. Next, depending on the information obtained in
the first stage, we compute how many more samples are needed in the second stage
for each design to guarantee that P (CS)≥1-α. Rinott
(1978) has presented a procedure that is applicable when the data are normally
distributed and all designs are simulated independently of each others. This
procedure consists of two stages for the case when variances are completely
unknown. Tamhane and Bechhofer (1977) has presented
a simple procedure that is valid when variances may not be equal. Nelson
et al. (2001) proposed a two-stage subset selection procedure.
The first stage is to reduce the number of competitive designs. These designs
are carried out to the second stage in which involved with the indifference-zone
approach using the information gathered from the first stage. Alrefaei
and Almomani (2007) proposed two sequential algorithms for selecting a subset
of k designs that is contained in the set of the top s designs. Another comprehensive
review of ranking and selection procedures can be found by Goldsman
and Nelson (1994), Bechhofer et al. (1995)
by Kim and Nelson (2006, 2007).
THREE-STAGE PROCEDURE
This procedure consists three stages in the first stage, a subset G is selected
randomly from the feasible solution set Θ. Let g denotes the size of the
subset G where g is a relatively small number but the probability that G contains
one of the best m% alternatives is high. In the second stage, the SS approach
is applied on the subset G to select a subset I that contains the best design
with high probability where |I|≤20. Note that the size of the subset G is
small so we can use the SS approach. Finally, the IZ approach is applied on
the set I to select the best design. Note that the size of set I is ≤20 so
that we can use the IZ approach. The algorithm of the three-stage procedure
is described as follows:
Algorithm:
| • | Determine
the number of alternatives g in G, i.e., |G| = g where G is the subset
that will be selected in the first stage that satisfies P (G contains
at least one of the best m% design) = ≥1-α1. Determine
the number of samples t0≥2 and the indifference zone δ
from the t-distribution |
|
| • | Select
a subset G of size g randomly from the feasible solution set Θ |
|
| • | Take
random samples of t0 observations yij (1≤j≤t0)
for each design i, i = 1,......., g |
|
| • | Calculate
the first stage sample means and variances
and
as: |
|
where, i = 1,..., g and v = t0-1 degrees of freedom.
Let:
Where:
| • | If
I contain a single index then this design is the best design. Otherwise
for all i∈I compute the second stage sample size Ti-t0,
where: |
|
h = (1-α/2, t0, -I-) be the Rinotts constant and can
be obtained from tables of (Wilcox, 1984).
| • | Take
a random sample of Ti-t0 additional observations
for all designs i∈I and compute the overall sample means: |
|
For all iεI
| • | Select
the design i∈I with the smallest
as the best design |
|
Nelson et al. (2001) have shown that with a
probability of at least 1-(α2+α3) this approach
selected the best design in the subset G. Therefore, if G contains at least
one of the top m% designs then this procedure selects a good design with probability
1-(α2+α3). From the OO, the selected set G contains
at least one of the best m% designs with probability:
Therefore, the P (the selected design by three-stage procedure is in the top
m% designs)≥(1-α1)(1-(α2+α3)).
NUMERICAL EXAMPLE
Consider the queuing designs when the inter arrival times and the service times are exponentially distributed and the design has one server. This queuing design is known as M/M/1 queuing design. In this design, the arrival process follows a Poisson distribution with rate λ (the mean arrival rate) and the service time is exponentially distributed with parameter μ (the mean service rate). Suppose we have n M/M/1 queuing designs and we want to select one of the best m% queuing models that have the minimum average waiting time per customer. We assume that the arrival rate λ is fixed and the service rate με[a, b]. Of course this problem has analytical solution in the steady state case. We implement the algorithm of three-stage procedure for solving this problem under some parameter settings. In the first experiment, we assume that n = 1000, λ = 1 and με[4, 5]. We discretize the problem by assuming that Θ = {4.001, 4.002,...., 5.0000}. Of course, the best design is 1000th design with μ1000 = 5. Suppose we want to select one of the best (2%) designs. Then we are shooting for designs 981-1000. Let |G| = 50, α2 = α3 = 0.005 and δ = 0.05. We consider it as a correct selection if the selected design belongs to {μ981, μ982,..., μ1000}. Note that, we can evaluate the analytical probability of the correct selection as:
| Table 1: | The
numerical illustration for n = 1000, g = 50, t0 = 30, m% =
2% |
 |
|
| Table 2: | The
P (CS) and
for n = 1000, g = 50, t0 = 30, m% = 2% over 100 replications |
 |
|
Table 1 contains the result of this illustration with 10
replications for selecting one of the best (2%) designs. From the Table
1, best is referring to the index of the design that has been selected by
three-stage procedure as the best design,
is the unknown average waiting time for the actual best design i*
(here i* = 1000 and
= 0.25),
is the unknown average waiting time for the best design b (b = Best),
is the sample mean for the actual best design I,
is the sample mean for the best design b, three-stage
and the analytical
.
Note that we can find
and
from the formula:
Where:
| μi |
= |
The service rate for the design |
| i and λi |
= |
The arrival rate for the design i |
|
Moreover, after the simulation is performed,
can be calculated according to the design output. Note that in some replications
there are negative values for three-stage E (OC) whereas E (OC) should be positive
values. The reason for this is that in the first stage of three-stage procedure
we used OO to select randomly the set G so in some replications we are not selecting
the actual best design i* from the first stage. This means that the
actual best design i* is not belongs to the set G. It implying that
we have not take into consideration, the addition observation in stage three
compare to the best design b. Therefore the sample mean for the best design
b sometime become less than the sample mean for the actual best design
and this implies that the three-stage
. The above numerical examples are extended for 100 replications and the results
are shown in Table 2. It is clear that each P (CS) and
values (the average number of Expected Opportunity Cost) for three-stage procedure
is closed to the analytical values. Figure 1 shows the relation
between the E (OC) for the three-stage procedure and the analytical E (OC) when
this experiment were repeated for 100 replications.
| | Fig. 1: | Relation
between three-stage E (OC) and Analytical E (OC) for n = 1000, g = 50,
t0 = 30, m% = 2% over 100 replications |
|
| | Fig. 2: | Relation
between three-stage E (OC) and Analytical E (OC) for n = 1000, g = 50,
t0 = 30, m% = 2% over 100 replications |
|
If we take the difference between the
and
as absolute value (positive) then we can see the relation between three-stage
E (OC) and Analytical E (OC) in Fig. 2. Note that three-stage
procedure produce a very small E (OC) and are mostly closed to the values of
the analytical E (OC). For the next experiment, assume n = 1000, λ = 1
and με[4, 5]. We want to select one of the best (5%) designs. Let
g = 50, α2 = α3 = 0.005 and δ = 0.05. We
consider it as a correct selection if the selected design belongs to {μ951,
μ952,..., μ1000}. Note that we can evaluate
the analytical probability of the correct selection as:
Table 3 contains the result of this illustration with 10
replications for selecting one of the best (5%) designs. The above numerical
experiments are extended for 100 replications and the results are shown in Table
4. The following still show that each value of P (CS) and for three-stage
procedure is closed to the analytical values.
| | Fig. 3: | Relation
between three-stage E (OC) and Analytical E (OC) for n = 1000, g = 50,
t0 = 30, m% = 2% over 100 replications |
|
| | Fig. 4: | Relation
between three-stage E (OC) and Analytical E (OC) for n = 1000, g = 50,
t0 = 30, m% = 2% over 100 replications |
|
| Table 3: | The
numerical illustration for n = 1000, g = 50, t0 = 30, m% =
2% |
 |
|
| Table 4: | The
P (CS) and
for n = 1000, g = 50, t0 = 20, m% = 5% over 100 replications |
 |
|
Figure 3 show the relation between the E (OC) for the three-stage
procedure and the analytical E (OC) when this experiment were repeated for 100
replications. If we take the difference between the
and
as absolute value then we can see the relation between three-stage E (OC) and
Analytical E (OC) in Fig. 4.
CONCLUSION
In this study, we have presented a three-stage procedure that selects one of
the good enough simulated design in contexts of E (OC) of a potentially incorrect
selection. In the first stage, we use the ordinal optimization approach to select
a relatively small subset G where the intersection with the set that contains
the actual best m% alternatives is high. Then subset Selection approach are
used to select a smaller subset I that contains the best alternative of G where
|I|<20. These are following by the indifference-zone approach to select the
best alternative of I which highly results in selecting an alternative that
belongs to the top m% alternatives. We have also discussed how to implement
the algorithm for a given problem involving M/M/1 queuing models.
The numerical results indicate that the three-stage procedure is sound and work practically well. We also note that the P (CS) values for this procedure are much closed to the values of analytical P (CS). In addition the three-stage procedure also guarantees to provide the minimum value of E (OC). In future research we would like to improve the three-stage procedure to reduce the E (OC) and applied it in real life problems. Moreover, we will study the efficiency of three-stage procedure when there is a changing in the initial sample size t0, since we cannot ignore that from the numerical example, it shows that the efficiency of three-stage procedure is sensitive to t0.
ACKNOWLEDGEMENT
The researchers would like to thank School of Mathematical Sciences, University Sains Malaysia for supporting this research.