Py学习  »  Python

K在Python中是指从头开始

Jerry M. • 5 年前 • 1563 次点击  

我有一个k-means算法的python代码。 我很难理解它的作用。 像这样的线条 C = X[numpy.random.choice(X.shape[0], k, replace=False), :] 我很困惑。

有人能解释一下这段代码到底在做什么吗? 谢谢你

def k_means(data, k, num_of_features):
    # Make a matrix out of the data
    X = data.as_matrix()
    # Get k random points from the data
    C =  X[numpy.random.choice(X.shape[0], k, replace=False), :]
    # Remove the last col
    C = [C[j][:-1] for j in range(len(C))]
    # Turn it into a numpy array
    C = numpy.asarray(C)
    # To store the value of centroids when it updates
    C_old = numpy.zeros(C.shape)
    # Make an array that will assign clusters to each point
    clusters = numpy.zeros(len(X))
    # Error func. - Distance between new centroids and old centroids
    error = dist(C, C_old, None)
    # Loop will run till the error becomes zero of 5 tries
    tries = 0
    while error != 0 and tries < 1:
        # Assigning each value to its closest cluster
        for i in range(len(X)):
            # Get closest cluster in terms of distance
            clusters[i] = dist1(X[i][:-1], C)
        # Storing the old centroid values
        C_old = deepcopy(C)
        # Finding the new centroids by taking the average value
        for i in range(k):
            # Get all of the points that match the cluster you are on
            points = [X[j][:-1] for j in range(len(X)) if clusters[j] == i]
            # If there were no points assigned to cluster, put at origin
            if not points:
                C[i][:] = numpy.zeros(C[i].shape)
            else:
                # Get the average of all the points and put that centroid there
                C[i] = numpy.mean(points, axis=0)
        # Erro is the distance between where the centroids use to be and where they are now
        error = dist(C, C_old, None)
        # Increase tries
        tries += 1
    return sil_coefficient(X,clusters,k)
Python社区是高质量的Python/Django开发社区
本文地址:http://www.python88.com/topic/48446
 
1563 次点击  
文章 [ 1 ]  |  最新文章 5 年前