python - More efficient way to get nearest center
My data objects are instances of:

class data_instance:
    def __init__(self, data, tlabel):
        self.data = data  # 1xd numpy array
        self.true_label = tlabel  # integer in {1, -1}

So far in the code, I have a list called data_history full of data_instance objects, and a set of centers (numpy array of shape (k, d)).
For a given data_instance new_data, I want to:
- 1/ Get the center nearest to new_data among centers (by Euclidean distance); call it nearest_center.
- 2/ Iterate through data_history and:
- 2.1/ select the elements whose nearest center is nearest_center (the result of 1/) into a list called neighbors.
- 2.2/ get the labels of the objects in neighbors.
 
The code below works but is still slow; I am looking for something more efficient.
My code
For 1/:
import numpy as np

def getnearestcenter(data, centers):
    if centers.shape != (1, 2):
        # Euclidean distance between data and every center
        dist_ = np.sqrt(np.sum(np.power(data - centers, 2), axis=1))
        # the center at minimum distance from data
        center = centers[np.argmin(dist_)]
    else:
        # only one center, so it is the nearest
        center = centers[0]
    return center

For 2/ (to optimize):

def getlabel(datapoint, c, history):
    labels = []
    cluster = getnearestcenter(datapoint.data, c)
    for x in history:
        # keep x when its nearest center is the same as datapoint's
        if np.all(getnearestcenter(x.data, c) == cluster):
            labels.append(x.true_label)
    return labels
You should rather use the optimized cdist from scipy.spatial, which is more efficient than calculating the distances with numpy:

import numpy as np
from scipy.spatial.distance import cdist

dist = cdist(data, c, metric='euclidean')
dist_idx = np.argmin(dist, axis=1)

An even more elegant solution is to use scipy.spatial.cKDTree (as pointed out by @Saullo Castro in the comments), which is faster for a large dataset:

from scipy.spatial import cKDTree

tr = cKDTree(c)
dist, dist_idx = tr.query(data, k=1)
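To tie this back to the question, here is a minimal sketch of how the per-item loop in getlabel could be replaced by one batched query. The function name get_labels_vectorized and the sample data are hypothetical, and the sketch assumes every x.data can be flattened to a length-d vector. It compares nearest-center indices rather than center coordinates, which gives the same result as long as no two centers are identical.

```python
import numpy as np
from scipy.spatial import cKDTree


class data_instance:
    def __init__(self, data, tlabel):
        self.data = data          # 1xd numpy array
        self.true_label = tlabel  # integer in {1, -1}


def get_labels_vectorized(new_data, centers, history):
    """Labels of the history items whose nearest center is also new_data's."""
    tree = cKDTree(centers)  # built once over the (k, d) centers
    # Stack all history points into one (n, d) array and query them together.
    points = np.vstack([np.ravel(x.data) for x in history])
    _, nearest_idx = tree.query(points, k=1)
    # Index of the center nearest to the new point.
    _, new_idx = tree.query(np.ravel(new_data.data), k=1)
    labels = np.array([x.true_label for x in history])
    # Keep only the labels of items that share the new point's center.
    return labels[nearest_idx == new_idx].tolist()


# Hypothetical sample data: two centers, three stored points.
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
history = [
    data_instance(np.array([[0.5, 0.2]]), 1),
    data_instance(np.array([[9.0, 9.5]]), -1),
    data_instance(np.array([[-1.0, 0.3]]), -1),
]
new_data = data_instance(np.array([[1.0, 1.0]]), 1)
result = get_labels_vectorized(new_data, centers, history)
```

If data_history is queried many times against the same centers, it would probably pay to compute nearest_idx once and cache it, so that each new point costs a single tree query plus a boolean mask.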