In this work, a new cluster search algorithm, which uses a Gaussian mixture model (GMM) to probabilistically distinguish clusters from the matrix, is developed. This unsupervised machine learning algorithm maximizes the data likelihood via expectation maximization (EM). Specifically, using APT data, the algorithm learns the position and size of each cluster. This new clustering algorithm is called the Gaussian mixture model Expectation Maximization Algorithm (GEMA). Arguably, the most important feature of GEMA is that it eliminates the need for external parameter selection while simultaneously improving power, thus providing a key step toward routine reproducibility of quantitative measurements. In this paper, k-means, one of the simplest machine learning algorithms used to cluster data, is first introduced to illustrate the key ideas underlying iterative clustering algorithms. Next, the development of GEMA is described in detail, focusing on the interpretation of GMMs and how they are estimated using EM. The paper is concluded by illustrating the successful application of GEMA to both simulated and real APT data, showing, respectively, that GEMA is often superior to the state of the art and that GEMA provides useful cluster representations of real APT data. Overall, this novel approach provides a powerful, statistically principled, and immensely extensible baseline for cluster detection in APT.


A FIRST APPROACH: K-MEANS


k-means is one of the simplest and most common clustering algorithms in statistics and machine learning (Bishop, 2006). After a random initialization, k-means jointly learns cluster locations and assigns atoms to clusters by iterating two steps: first, given cluster locations, each atom is assigned to the nearest cluster; second, given atom assignments to clusters, each cluster center is placed at the average location of all atoms assigned to that cluster (Fig. 1). The value of this approach is that it breaks down a complex problem, i.e., jointly determining cluster parameters and assigning atoms, into two simple sub-problems, which are iteratively solved. GEMA uses a similar iterative approach, as do, more generally, all EM algorithms. However, there are several disadvantages to using k-means on APT data. Primarily, all atoms must belong to a cluster in k-means, and thus atom assignments to the matrix are not possible (Fig. 2). In addition, atoms must deterministically belong to a single cluster, ignoring matrix/cluster and cluster/cluster uncertainty. Near precipitate interfaces, in particular, matrix/cluster uncertainty is a real scientific phenomenon which should be reflected in the modeling assumptions. Currently, all APT clustering algorithms share this undesirable property with k-means.
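To make the two-step iteration concrete, the following is a minimal k-means sketch on atom positions (Python with numpy; the function name, the fixed iteration count, and the absence of a convergence check are illustrative simplifications, not GEMA code):

import numpy as np

def kmeans(positions, k, n_iter=50, seed=0):
    """Minimal k-means: alternate hard assignments and mean updates."""
    positions = np.asarray(positions, dtype=float)   # (n_atoms, 3) coordinates
    rng = np.random.default_rng(seed)
    # Random initialization: pick k atoms as the initial cluster centers.
    centers = positions[rng.choice(len(positions), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 1: given centers, assign each atom to the nearest cluster.
        dists = np.linalg.norm(positions[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: given assignments, move each center to the mean of its atoms.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = positions[labels == j].mean(axis=0)
    return centers, labels

Note that labels places every atom in some cluster: there is no matrix label and no notion of uncertainty, which are exactly the two shortcomings described above.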


GEMA


GMMs, which are essentially a more sophisticated alternative to k-means, can also be used to identify clusters. GEMA, which utilizes GMMs, overcomes the primary problem with k-means by explicitly modeling the matrix as an additional Gaussian component with infinite variance. A (bounded) Gaussian with infinite variance is identical to a uniform distribution, which formalizes the assumption that unclustered solute is distributed "randomly" in the matrix.^a GEMA addresses the latter shortcoming by probabilistically assigning atoms to clusters and the matrix, thus eliminating the strict cluster assignments imposed by k-means.
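A minimal sketch of this construction, assuming solute positions inside an analysis volume of known size: component 0 is the uniform matrix density (the infinite-variance limit of a bounded Gaussian) and components 1 through K are spherical Gaussian clusters. The function and variable names are hypothetical and scipy is used only for convenience; this is not GEMA's actual implementation.

import numpy as np
from scipy.stats import multivariate_normal

def membership_probabilities(positions, weights, means, variances, box_volume):
    """Posterior probability that each atom belongs to each component.

    Component 0 is the matrix, modeled as the uniform density 1/box_volume;
    component k = 1..K is a spherical Gaussian with mean means[k-1] and
    covariance variances[k-1] * I. `weights` sums to 1 and includes the
    matrix weight at index 0.
    """
    n = len(positions)
    densities = np.empty((n, len(weights)))
    densities[:, 0] = 1.0 / box_volume               # uniform matrix component
    for k, (mu, var) in enumerate(zip(means, variances), start=1):
        densities[:, k] = multivariate_normal.pdf(positions, mean=mu, cov=var)
    joint = densities * weights                      # w_k * p(x | component k)
    return joint / joint.sum(axis=1, keepdims=True)  # normalize over components

Each row of the returned matrix is a soft assignment, so an atom near a precipitate interface can carry, for example, 60% cluster and 40% matrix probability rather than a forced hard label.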


In regards to cluster assignments, GEMA labels atoms with a probability of 50% or greater as belonging to a cluster. Although the probability of an atom belonging to the cluster has been made visible to the user via atomic shading, this is not an arbitrary parameter. A threshold of 50% is used by default in the algorithm, as it is Bayes optimal in many contexts, and therefore cluster assignments remain consistent. However, atomic probabilities can be extracted and utilized for additional analyses, such as cluster composition and cluster size (number of atoms). For instance, cluster composition could be determined by weighting each atom by its probability of belonging to the cluster.
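As an illustration of such downstream analyses, a probability-weighted cluster composition could be computed as follows (a sketch; species and probs are hypothetical arrays holding each atom's element label and its extracted probability of belonging to the cluster, not an actual GEMA output format):

import numpy as np

def weighted_composition(species, probs):
    """Cluster composition with each atom weighted by its membership probability."""
    species = np.asarray(species)
    probs = np.asarray(probs, dtype=float)
    total = probs.sum()   # effective number of atoms in the cluster
    return {el: probs[species == el].sum() / total for el in np.unique(species)}

# For comparison, a hard cluster size at the default 50% threshold:
# cluster_size = int((probs >= 0.5).sum())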


In addition, there are frequently potential gradients in chemistry and crystal structure around the precipitate boundary, owing simply to thermodynamics. Therefore, it is important to note that missing data, specifically at the boundary, could potentially affect cluster size. However, this problem is due to the nature of APT as a technique, which lies outside the scope of this paper. In regards to chemical gradients, GEMA does not utilize the chemical identity of an atom when determining the probability of an atom belonging to a cluster. Therefore, chemical uncertainty does not affect cluster definition.


GMMs


GEMA utilizes a GMM (Bishop, 2006) to probabilistically learn cluster locations and assign atoms to clusters and the matrix. As its name suggests, a GMM is a distribution defined by a weighted sum, or mixture, of Gaussians:


$$p(X) = \sum_{k=1}^{K} w_k \, \mathcal{N}\!\left(X \mid \mu_k, \sigma_k^2\right), \qquad (1)$$

where K is the number of Gaussian components and w_k, μ_k, and σ_k² are the weight, mean, and variance of the kth Gaussian component, respectively. The goal of GEMA is to find a set of parameters w_k, μ_k, and σ_k² for each k, such that this analytic distribution optimally matches the empirical distribution.
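Transcribed into code for the spherical case discussed below, equation (1) reads as follows (a numpy sketch with illustrative names, not part of GEMA itself):

import numpy as np

def gmm_density(x, weights, means, variances):
    """Evaluate equation (1): p(x) = sum_k w_k * N(x | mu_k, sigma_k^2 I)."""
    x = np.asarray(x, dtype=float)                    # a single point in R^d
    d = x.size
    p = 0.0
    for w, mu, var in zip(weights, means, variances):
        sq_dist = np.sum((x - mu) ** 2)
        norm = (2.0 * np.pi * var) ** (-d / 2.0)      # spherical Gaussian normalizer
        p += w * norm * np.exp(-sq_dist / (2.0 * var))
    return p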


In its current state, GEMA assumes that the covariance matrix is spherical, that is, a multiple of the identity matrix. This is equivalent to the assumption that all clusters are spherical. However, despite this assumption, GEMA works remarkably well on real APT data, where this clearly is not the case (see Atom Probe Data section). Furthermore, generalizing the covariance matrix to allow ellipsoidal clusters in a future version of GEMA would be relatively straightforward. It should be noted that the current version of GEMA takes only solute atoms into consideration. Solute elements


^a In practice, this assumption is not necessarily true, nor does it need to be true in order for GEMA to be implemented effectively.

