python - More efficient way to get nearest center


My data objects are instances of:

    class data_instance:
        def __init__(self, data, tlabel):
            self.data = data          # 1xd numpy array
            self.true_label = tlabel  # integer in {1, -1}

So far in my code, I have a list called data_history full of data_instance objects, and a set of centers (a numpy array of shape (k, d)).

For a given data_instance new_data, I want to:

  • 1/ Find the nearest center to new_data among centers (by Euclidean distance); call it nearest_center.

  • 2/ Iterate through data_history and:

    • 2.1/ select the elements whose nearest center is nearest_center (the result of 1/) into a list called neighbors.
    • 2.2/ get the labels of the objects in neighbors.

The code below works but is still slow, and I am looking for something more efficient.

My code

For 1/

    import numpy as np

    def getnearestcenter(data, centers):
        if centers.shape != (1, 2):
            # Euclidean distance between data and every center
            dist_ = np.sqrt(np.sum(np.power(data - centers, 2), axis=1))
            # Center with the minimum distance to data
            center = centers[np.argmin(dist_)]
        else:
            center = centers[0]
        return center

For 2/ (to optimize)

    def getlabel(datapoint, c, history):
        labels = []
        # Nearest center of the query point
        cluster = getnearestcenter(datapoint.data, c)
        for x in history:
            # Keep the label if x has the same nearest center
            if np.all(getnearestcenter(x.data, c) == cluster):
                labels.append(x.true_label)
        return labels

You should rather use the optimized cdist from scipy.spatial.distance, which is more efficient than calculating the distances with numpy:

    from scipy.spatial.distance import cdist

    dist = cdist(data, c, metric='euclidean')
    dist_idx = np.argmin(dist, axis=1)
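To connect this with the getlabel function from the question, here is a minimal sketch of how cdist could replace the Python loop. It assumes each data attribute is a 1xd array as described above; the name get_labels_vectorized and the sample values are hypothetical. Note that it compares center indices rather than center coordinates, which should give the same result as long as no two centers are identical.

```python
import numpy as np
from scipy.spatial.distance import cdist


class data_instance:
    def __init__(self, data, tlabel):
        self.data = data          # 1 x d numpy array
        self.true_label = tlabel  # integer in {1, -1}


def get_labels_vectorized(new_data, centers, history):
    """Return labels of the history items that share new_data's nearest center."""
    # Stack the 1 x d rows of the history into one (n, d) array.
    X = np.vstack([x.data for x in history])
    labels = np.array([x.true_label for x in history])

    # Index of the nearest center for every history item, shape (n,).
    history_idx = np.argmin(cdist(X, centers, metric='euclidean'), axis=1)
    # Index of the nearest center for the new point (a single integer).
    new_idx = np.argmin(cdist(new_data.data.reshape(1, -1), centers))

    # The boolean mask selects the neighbors; return their labels as a list.
    return labels[history_idx == new_idx].tolist()


# Hypothetical sample data: two centers, three history items.
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
history = [
    data_instance(np.array([[0.5, 0.2]]), 1),
    data_instance(np.array([[9.0, 9.5]]), -1),
    data_instance(np.array([[1.0, -1.0]]), -1),
]
new_data = data_instance(np.array([[0.1, 0.1]]), 1)
result = get_labels_vectorized(new_data, centers, history)
print(result)
```

The distance matrix for the whole history is computed in one call, so the per-item Python overhead of the original loop disappears.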

An even more elegant solution is to use scipy.spatial.cKDTree (as pointed out by @Saullo Castro in the comments), which is faster for large datasets:

    from scipy.spatial import cKDTree

    tr = cKDTree(c)
    dist, dist_idx = tr.query(data, k=1)
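If many new points are queried against the same centers, one possible extension is to assign the history to centers once and group the labels per center, so each later lookup needs only a single tree query. This is only a sketch under that assumption; build_label_index and labels_for are hypothetical helper names, and the index has to be rebuilt whenever the centers or the history change.

```python
import numpy as np
from scipy.spatial import cKDTree


def build_label_index(history_data, history_labels, centers):
    """Group history labels by the index of their nearest center."""
    tree = cKDTree(centers)
    # For (n, d) input and k=1, query returns two arrays of shape (n,).
    _, idx = tree.query(history_data, k=1)
    labels_by_center = {}
    for center_idx, label in zip(idx, history_labels):
        labels_by_center.setdefault(int(center_idx), []).append(int(label))
    return tree, labels_by_center


def labels_for(point, tree, labels_by_center):
    """Look up the labels stored under the nearest center of a single point."""
    _, center_idx = tree.query(np.asarray(point).reshape(1, -1), k=1)
    return labels_by_center.get(int(center_idx[0]), [])


# Hypothetical sample data: two centers, three history items.
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
history_data = np.array([[0.5, 0.2], [9.0, 9.5], [1.0, -1.0]])
history_labels = [1, -1, -1]

tree, labels_by_center = build_label_index(history_data, history_labels, centers)
near_origin = labels_for([0.1, 0.1], tree, labels_by_center)
near_far = labels_for([9.9, 10.2], tree, labels_by_center)
print(near_origin, near_far)
```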
