• Implementing kNN (k-nearest neighbors) in Python


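    What is k-nearest neighbors?

    k-nearest neighbors can be used for both classification and regression; classification is used as the example here. Given a training set and a new input instance, find the k training instances closest to that instance. The input is then assigned to the class that the majority of those k instances belong to.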
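
    The decision rule described above can be sketched for a single query point as follows. The function name `classify_one` and the toy points are hypothetical, chosen only for illustration; this is not the implementation used later in the post.

```python
import math
from collections import Counter

def classify_one(query, points, labels, k):
    """Classify one query point by majority vote among its k nearest points."""
    def dist(a, b):
        # Euclidean distance between two equal-length tuples
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    # Indices of the training points, ordered from nearest to farthest
    order = sorted(range(len(points)), key=lambda i: dist(query, points[i]))
    # Labels of the k nearest points, then the most frequent one
    top_k_labels = [labels[i] for i in order[:k]]
    return Counter(top_k_labels).most_common(1)[0][0]

# Hypothetical toy data: three points of class "a", two of class "b"
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
labels = ["a", "a", "a", "b", "b"]
print(classify_one((1.1, 1.0), points, labels, k=3))
```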

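    What are the three basic elements of the k-nearest neighbors model?

    The distance metric, the choice of k, and the classification decision rule.

    Distance metric: usually the Euclidean distance. More generally the Lp distance can be used, of which the Euclidean (p = 2) and Manhattan (p = 1) distances are special cases.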

    A concrete example:
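
    One way to see how the metric matters is to compute Lp distances between a few points for different values of p. The three points below are illustrative only, and the helper name `lp_distance` is an assumption, not part of the original code.

```python
import numpy as np

def lp_distance(x, y, p):
    """Lp (Minkowski) distance between two vectors, for p >= 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

# Illustrative points: which of x2, x3 is nearest to x1 depends on p
x1, x2, x3 = (1, 1), (5, 1), (4, 4)
for p in (1, 2, 3, 4):
    d12 = lp_distance(x1, x2, p)
    d13 = lp_distance(x1, x3, p)
    nearest = "x2" if d12 < d13 else "x3"
    print(p, round(d12, 2), round(d13, 2), nearest)
```

    With these points, x2 is the nearest neighbor of x1 for p = 1 and p = 2, while x3 becomes the nearest for p = 3 and p = 4, so the choice of metric can change which neighbors get to vote.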

    How should k be chosen?
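
    A small k makes the prediction sensitive to noise near the query point, while a large k smooths the decision by letting more distant points vote. In practice k is commonly picked by cross-validation. The sketch below shows one way to do that with NumPy only. The helper names `knn_predict` and `cross_val_accuracy`, the synthetic two-cluster data, and the candidate list are all assumptions made for illustration, not part of the original code.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Majority-vote kNN with Euclidean distance (labels must be non-negative ints)."""
    # Pairwise distances, shape (n_test, n_train)
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k nearest training samples for every test sample
    nearest = np.argsort(dists, axis=1)[:, :k]
    return np.array([np.bincount(y_train[row]).argmax() for row in nearest])

def cross_val_accuracy(X, y, k, n_folds=5, seed=0):
    """Mean accuracy of kNN over n_folds shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, test on the i-th
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        y_pred = knn_predict(X[train_idx], y[train_idx], X[test_idx], k)
        scores.append(np.mean(y_pred == y[test_idx]))
    return float(np.mean(scores))

# Hypothetical data: two well-separated Gaussian clusters
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 2)),
               rng.normal(6.0, 1.0, size=(30, 2))])
y = np.array([0] * 30 + [1] * 30)

candidate_ks = [1, 3, 5, 7, 9]
scores = {k: cross_val_accuracy(X, y, k) for k in candidate_ks}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

    Odd values of k are used here so that a two-class vote cannot tie.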

    Next is the code implementation:

    from __future__ import print_function, division
    import numpy as np
    from mlfromscratch.utils import euclidean_distance
    
    class KNN():
        """ K Nearest Neighbors classifier.
    
        Parameters:
        -----------
        k: int
            The number of closest neighbors that will determine the class of the 
            sample that we wish to predict.
        """
        def __init__(self, k=5):
            self.k = k
    
        def _vote(self, neighbor_labels):
            """ Return the most common class among the neighbor samples """
            # np.bincount only accepts non-negative integer labels
            counts = np.bincount(neighbor_labels.astype('int'))
            return counts.argmax()
    
        def predict(self, X_test, X_train, y_train):
            y_pred = np.empty(X_test.shape[0])
            # Determine the class of each sample
            for i, test_sample in enumerate(X_test):
                # Sort the training samples by their distance to the test sample and get the K nearest
                idx = np.argsort([euclidean_distance(test_sample, x) for x in X_train])[:self.k]
                # Extract the labels of the K nearest neighboring training samples
                # (use j, not i: under Python 2 the comprehension variable would overwrite the loop index i)
                k_nearest_neighbors = np.array([y_train[j] for j in idx])
                # Label sample as the most common class label
                y_pred[i] = self._vote(k_nearest_neighbors)
    
            return y_pred
            

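    How some of the NumPy functions used above work:

    numpy.bincount(): counts how many times each non-negative integer value occurs in an array

    numpy.argmax(): returns the index of the largest value

    numpy.argsort(): returns the indices that would sort the array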
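
    A short sketch of these three calls on small, made-up arrays (the variable names are illustrative):

```python
import numpy as np

# Voting: count each label, then take the label with the highest count
labels = np.array([0, 1, 1, 2, 1])
counts = np.bincount(labels)      # array([1, 3, 1]): label 0 once, 1 three times, 2 once
most_common = counts.argmax()     # 1

# Neighbor search: indices that sort the distances in ascending order
dists = np.array([0.9, 0.1, 0.5])
order = np.argsort(dists)         # array([1, 2, 0])
nearest_two = order[:2]           # array([1, 2])
```

    Because `argmax` returns the first maximum it finds, a tie in the vote is resolved in favor of the smaller label.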

     

    Next is the euclidean_distance() helper used above:

    import math

    def euclidean_distance(x1, x2):
        """ Calculates the l2 distance between two vectors """
        distance = 0
        # Squared distance between each coordinate
        for i in range(len(x1)):
            distance += pow((x1[i] - x2[i]), 2)
        return math.sqrt(distance)

    The L2 (Euclidean) distance is used here.
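
    For comparison, the same L2 distance can be computed without an explicit loop using `numpy.linalg.norm`. This is an alternative sketch, not the helper from the original repository; the name `euclidean_distance_np` is hypothetical.

```python
import numpy as np

def euclidean_distance_np(x1, x2):
    """L2 distance between two vectors, computed with NumPy."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    # The norm of the difference vector is sqrt(sum((x1 - x2) ** 2))
    return float(np.linalg.norm(x1 - x2))

print(euclidean_distance_np([0, 0], [3, 4]))  # 5.0
```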

    The main function that runs it:

    from __future__ import print_function
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import datasets
    
    from mlfromscratch.utils import train_test_split, normalize, accuracy_score
    from mlfromscratch.utils import euclidean_distance, Plot
    from mlfromscratch.supervised_learning import KNN
    
    def main():
        data = datasets.load_iris()
        X = normalize(data.data)
        y = data.target
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
    
        clf = KNN(k=5)
        y_pred = clf.predict(X_test, X_train, y_train)
        
        accuracy = accuracy_score(y_test, y_pred)
    
        print("Accuracy:", accuracy)
    
        # Reduce dimensions to 2d using pca and plot the results
        Plot().plot_in_2d(X_test, y_pred, title="K Nearest Neighbors", accuracy=accuracy, legend_labels=data.target_names)
    
    
    if __name__ == "__main__":
        main()

    Result:

    Accuracy: 0.9795918367346939

    Theory: from the book 统计学习方法 (Statistical Learning Methods)

    Code source: https://github.com/eriklindernoren/ML-From-Scratch

  • Original post: https://www.cnblogs.com/xiximayou/p/12827812.html