python - More efficient way to get nearest center
My data objects are instances of:

    class data_instance:
        def __init__(self, data, tlabel):
            self.data = data          # 1xd numpy array
            self.true_label = tlabel  # integer {1,-1}

So far in the code, I have a list called data_history full of data_instance objects, and a set of centers (a numpy array with shape (k,d)).
For a given data_instance new_data, I want to:
1/ Get the nearest center of new_data from centers (by Euclidean distance); let it be called nearest_center.
2/ Iterate through data_history and:
- 2.1/ select the elements whose nearest center is nearest_center (the result of 1/) into a list called neighbors.
- 2.2/ get the labels of the objects in neighbors.
The code below works but is still slow, and I am looking for something more efficient.
My code

For 1/:

    import numpy as np

    def getnearestcenter(data, centers):
        if centers.shape != (1, 2):
            # compute the distance between data and each center
            dist_ = np.sqrt(np.sum(np.power(data - centers, 2), axis=1))
            # return the center that has the minimum distance to data
            center = centers[np.argmin(dist_)]
        else:
            center = centers[0]
        return center

For 2/ (to optimize):

    def getlabel(datapoint, c, history):
        labels = []
        cluster = getnearestcenter(datapoint.data, c)
        for x in history:
            if np.all(getnearestcenter(x.data, c) == cluster):
                labels.append(x.true_label)
        return labels
You should rather use the optimized cdist from scipy.spatial.distance, which is more efficient than calculating the distances with numpy:
    from scipy.spatial.distance import cdist

    dist = cdist(data, c, metric='euclidean')
    dist_idx = np.argmin(dist, axis=1)

A more elegant solution is to use scipy.spatial.cKDTree (as pointed out by @Saullo Castro in the comments), which is faster for a large dataset:
    from scipy.spatial import cKDTree

    tr = cKDTree(c)
    dist, dist_idx = tr.query(data, k=1)
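To tie this back to step 2/ of the question, here is a minimal sketch of how the nearest-center indices could replace the per-element loop. The function name get_labels_vectorized and the variable X are hypothetical, not from the original post. The sketch assumes data_history is non-empty, every data array holds d values, and the centers are distinct (so comparing center indices is equivalent to comparing the centers themselves).

```python
import numpy as np
from scipy.spatial import cKDTree


class data_instance:
    def __init__(self, data, tlabel):
        self.data = data          # 1xd numpy array
        self.true_label = tlabel  # integer {1,-1}


def get_labels_vectorized(new_data, centers, data_history):
    """Labels of all history points whose nearest center matches new_data's."""
    tree = cKDTree(centers)  # built once, reused for both queries

    # Stack the history into one (n, d) array so a single query covers all points.
    X = np.vstack([np.ravel(x.data) for x in data_history])
    labels = np.array([x.true_label for x in data_history])

    _, history_idx = tree.query(X, k=1)                     # index of nearest center per history point
    _, new_idx = tree.query(np.ravel(new_data.data), k=1)   # index of nearest center for new_data

    return labels[history_idx == new_idx].tolist()


# Small made-up example: two centers, three history points.
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
history = [
    data_instance(np.array([[0.5, 0.2]]), 1),
    data_instance(np.array([[9.0, 9.5]]), -1),
    data_instance(np.array([[1.0, -1.0]]), -1),
]
new_point = data_instance(np.array([[0.1, 0.1]]), 1)

result = get_labels_vectorized(new_point, centers, history)
print(result)  # [1, -1]
```

If the centers do not change between calls, the tree and the history indices could be computed once and cached, so that each new point costs only a single query.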