python - More efficient way to get nearest center
My data object is an instance of:
class data_instance:
    def __init__(self, data, tlabel):
        self.data = data              # 1xd numpy array
        self.true_label = tlabel      # integer in {1, -1}
So far in the code I have a list called data_history, full of data_instance objects, and a set of centers (a numpy array of shape (k, d)).
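For reference, a minimal setup matching this description might look like the following; k, d and the random contents are just placeholders for illustration:

import numpy as np

k, d = 3, 2                                    # hypothetical number of centers and dimension
centers = np.random.rand(k, d)                 # numpy array of shape (k, d)
data_history = [data_instance(np.random.rand(1, d), np.random.choice([1, -1]))
                for _ in range(1000)]          # list of data_instance objects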
For a given data_instance new_data, I want to:

1/ Find the nearest center to new_data among centers (by euclidean distance); let it be called nearest_center.
2/ Iterate through data_history and:
- 2.1/ select the elements whose nearest center is nearest_center (the result of 1/) into a list called neighbors.
- 2.2/ collect the labels of the objects in neighbors.
The code below works but is still slow, and I am looking for something more efficient.
My code for 1/:
def getnearestcenter(data, centers):
    if centers.shape != (1, 2):
        dist_ = np.sqrt(np.sum(np.power(data - centers, 2), axis=1))  # compute distances between data and the centers
        center = centers[np.argmin(dist_)]  # return the center with minimum distance to data
    else:
        center = centers[0]
    return center
And for 2/ (the part to optimize):
def getlabel(datapoint, c, history):
    labels = []
    cluster = getnearestcenter(datapoint.data, c)
    for x in history:
        if np.all(getnearestcenter(x.data, c) == cluster):
            labels.append(x.true_label)
    return labels
You should rather use the optimized cdist from scipy.spatial.distance, which is more efficient than computing the distances yourself with plain numpy:
from scipy.spatial.distance import cdist

dist = cdist(data, c, metric='euclidean')   # pairwise distances, shape (n, k)
dist_idx = np.argmin(dist, axis=1)          # index of the nearest center for each row of data
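Applied to step 2/, the same idea lets you drop the Python loop entirely. A rough sketch (the function name getlabels_vectorized and the stacking of data_history into one matrix are my own choices, not part of the original code):

import numpy as np
from scipy.spatial.distance import cdist

def getlabels_vectorized(datapoint, c, history):
    # stack all history points into one (n, d) matrix and collect their labels
    hist_data = np.vstack([x.data for x in history])
    hist_labels = np.array([x.true_label for x in history])

    # index of the nearest center for the query point and for every history point
    query_idx = np.argmin(cdist(datapoint.data.reshape(1, -1), c), axis=1)[0]
    hist_idx = np.argmin(cdist(hist_data, c, metric='euclidean'), axis=1)

    # keep only the labels of points assigned to the same center as the query point
    return hist_labels[hist_idx == query_idx]

This replaces the repeated calls to getnearestcenter with two cdist calls and a boolean mask, which is where most of the speed-up comes from.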
An even more elegant solution is to use scipy.spatial.cKDTree (as pointed out by @Saullo Castro in the comments), which is faster for large datasets:
from scipy.spatial import cKDTree

tr = cKDTree(c)                        # build a KD-tree over the centers
dist, dist_idx = tr.query(data, k=1)   # distance to and index of the nearest center for each row of data
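As a usage sketch for the original problem (again, the helper name and the stacking of data_history are assumptions of mine, mirroring the cdist version above):

import numpy as np
from scipy.spatial import cKDTree

def getlabels_kdtree(datapoint, c, history):
    hist_data = np.vstack([x.data for x in history])
    hist_labels = np.array([x.true_label for x in history])

    tree = cKDTree(c)                                       # build the tree once over the k centers
    _, query_idx = tree.query(datapoint.data.reshape(1, -1), k=1)
    _, hist_idx = tree.query(hist_data, k=1)

    return hist_labels[hist_idx == query_idx[0]]            # labels of points sharing the nearest center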