NumPy is a Python package built for computing with high-dimensional arrays. Official website: www.numpy.org.
Lists vs. arrays
Lists: elements may have different data types, e.g. 3.1413, 'pi', 3.1404, [3.1401, 3.1349], '3.1376'
Arrays: all elements have the same data type, e.g. 3.1413, 3.1398, 3.1404, 3.1401, 3.1349, 3.1376
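The difference can be seen directly. A Python list keeps each element's own type, but `np.array()` converts everything to one common dtype. This sketch reuses a few values from the lists above. Mixing floats with a string makes NumPy store every element as a string.

```python
import numpy as np

# A list keeps each element's own type
mixed = [3.1413, 'pi', 3.1404]
print([type(v).__name__ for v in mixed])   # ['float', 'str', 'float']

# An array has a single dtype: one string forces every element to a string
arr_mixed = np.array(mixed)
print(arr_mixed.dtype.kind)                # U (Unicode string)

# Homogeneous numbers give a numeric dtype
arr_num = np.array([3.1413, 3.1398, 3.1404])
print(arr_num.dtype)                       # float64
```

If you need numbers, keep the data homogeneous. A string that sneaks into the input turns the whole array into text.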
NumPy arrays and their operations
Creating arrays
| import numpy as np
| np.array([1,2,3,4,5])
array([1, 2, 3, 4, 5])
array([1, 2, 3, 4, 5])
array([0, 1, 2, 3, 4])
| np.array([[1,2,3],[4,5,6]])
array([[1, 2, 3],
[4, 5, 6]])
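Creating arrays: fixed-step np.arange() and fixed-count np.linspace()
Two more commonly used ways to create NumPy arrays:
(1) arange, fixed step (fixed spacing between elements): arange(start, stop, step)
Note: stop is required and is not included in the result. start defaults to 0 and step defaults to 1.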
array([0, 1, 2, 3, 4, 5, 6, 7])
array([1, 3, 5, 7, 9])
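The two outputs above are consistent with `np.arange(8)` and `np.arange(1, 10, 2)`. The original cells were not preserved, so assuming those are the calls used is our guess. The sketch shows how the default start and step fill in.

```python
import numpy as np

a = np.arange(8)           # only stop given: start=0, step=1
b = np.arange(1, 10, 2)    # start=1, stop=10 (excluded), step=2
c = np.arange(0, 1, 0.25)  # a float step also works

print(a)  # [0 1 2 3 4 5 6 7]
print(b)  # [1 3 5 7 9]
print(c)
```

For non-integer steps such as 0.1, rounding can make the element count surprising. In that case, linspace with an explicit count is usually the safer choice.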
(2) linspace, fixed count (fixed number of elements): linspace(start, stop, num)
Note: start and stop are required, and num defaults to 50. stop is included unless endpoint=False is passed.
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
| np.linspace(0,10,11,endpoint=False)
array([0. , 0.90909091, 1.81818182, 2.72727273, 3.63636364,
4.54545455, 5.45454545, 6.36363636, 7.27272727, 8.18181818,
9.09090909])
array([1.00000000e+000, 1.29154967e+011, 1.66810054e+022, 2.15443469e+033,
2.78255940e+044, 3.59381366e+055, 4.64158883e+066, 5.99484250e+077,
7.74263683e+088, 1.00000000e+100])
| np.logspace(1,6,5,base=2)
array([ 2. , 4.75682846, 11.3137085 , 26.90868529, 64. ])
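The following sketch checks the linspace defaults described above. It also shows that `logspace(start, stop, num, base)` is `base ** linspace(start, stop, num)`, which explains the output 2, 4.757…, 64 of the cell above. `retstep=True` is a standard linspace parameter that also returns the spacing.

```python
import numpy as np

pts, step = np.linspace(0, 10, 11, retstep=True)  # stop is included by default
print(step)                    # 1.0
print(len(np.linspace(0, 1)))  # 50, the default num

_, step_open = np.linspace(0, 10, 11, endpoint=False, retstep=True)
print(step_open)               # 10/11, about 0.909

# logspace is linspace applied to the exponent
lg = np.logspace(1, 6, 5, base=2)
print(np.allclose(lg, 2 ** np.linspace(1, 6, 5)))  # True
```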
Creating arrays: using NumPy functions to create ndarray arrays
array([0., 0., 0.])
array([1., 1., 1.])
array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
array([[0., 0., 0.]])
array([[0.],
[0.],
[0.]])
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
array([[1., 1., 1.]])
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
array([[1., 0.],
[0., 1.]])
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
array([0.08 , 0.10492407, 0.17699537, 0.28840385, 0.42707668,
0.5779865 , 0.7247799 , 0.85154952, 0.94455793, 0.9937262 ,
0.9937262 , 0.94455793, 0.85154952, 0.7247799 , 0.5779865 ,
0.42707668, 0.28840385, 0.17699537, 0.10492407, 0.08 ])
array([-1.38777878e-17, 5.08696327e-02, 2.58000502e-01, 6.30000000e-01,
9.51129866e-01, 9.51129866e-01, 6.30000000e-01, 2.58000502e-01,
5.08696327e-02, -1.38777878e-17])
array([0.03671089, 0.16199525, 0.36683806, 0.61609304, 0.84458838,
0.98167828, 0.98167828, 0.84458838, 0.61609304, 0.36683806,
0.16199525, 0.03671089])
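The code cells behind the outputs above were lost. The calls below are one plausible set that produces arrays of the same shapes and values. That these exact calls were used is an assumption. The list covers zeros/ones, identity/eye, and the window functions hamming, blackman and kaiser.

```python
import numpy as np

print(np.zeros(3))       # [0. 0. 0.]
print(np.ones((1, 3)))   # [[1. 1. 1.]]
col = np.zeros((3, 1))   # a 3x1 column of zeros
I3 = np.identity(3)      # 3x3 identity matrix, same as np.eye(3)
upper = np.eye(3, k=1)   # eye can shift the diagonal: ones on the first superdiagonal

# Window functions used in signal processing
ham = np.hamming(20)     # starts and ends at 0.08
blk = np.blackman(10)    # ends are ~0 (only rounding error remains)
kai = np.kaiser(12, 5)   # the second argument is the shape parameter beta
print(ham[0], blk[0], kai[0])
```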
Creating random n-dimensional arrays with np.random
| np.random.randint(0,50,5)
array([28, 2, 43, 17, 22])
| np.random.randint(0,50,(3,5))
array([[12, 47, 33, 10, 2],
[23, 6, 17, 7, 24],
[11, 39, 24, 26, 30]])
array([0.09670211, 0.99259248, 0.5341233 , 0.71901056, 0.90347888,
0.91331606, 0.08372737, 0.79742945, 0.31878994, 0.55094098])
| np.random.standard_normal(5)
array([-0.14415678, 1.08512822, 0.99102915, -0.49645761, -1.51070848])
| x=np.random.standard_normal(size=(3,4,2))
| x
array([[[ 0.85754025, -0.28172407],
[ 0.23733268, 0.87116374],
[-0.62887857, -1.36845296],
[-0.53553569, 1.01818603]],
[[-0.48036281, 0.30293107],
[-0.41027798, 0.87720648],
[-0.74450358, -0.60304887],
[-0.48493204, -1.9946126 ]],
[[ 0.19694871, 0.53144363],
[ 0.032769 , -1.362219 ],
[-0.03665392, 1.49506453],
[ 1.55785374, -0.96754543]]])
array([[1, 0, 0, 0],
[0, 2, 0, 0],
[0, 0, 3, 0],
[0, 0, 0, 4]])
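The random outputs above change on every run. To get repeatable results you can seed a generator. This sketch uses `np.random.default_rng()`, the Generator interface available since NumPy 1.17, in place of the legacy `np.random.*` functions shown above. The seed 42 is arbitrary. The last output above is a diagonal matrix, which `np.diag` builds from a list.

```python
import numpy as np

rng = np.random.default_rng(seed=42)          # seeded: same numbers on every run
ints = rng.integers(0, 50, size=5)            # like np.random.randint(0, 50, 5); 50 is excluded
grid = rng.integers(0, 50, size=(3, 5))
uniform = rng.random(10)                      # floats in [0, 1)
normal = rng.standard_normal(size=(3, 4, 2))  # standard normal samples, shape (3, 4, 2)

# A fresh generator with the same seed reproduces the same sequence
again = np.random.default_rng(seed=42).integers(0, 50, size=5)

# Diagonal matrix with 1, 2, 3, 4 on the diagonal
D = np.diag([1, 2, 3, 4])
```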
Testing whether corresponding elements of two arrays are close enough
| import numpy as np
| x=np.array([1,2,3,4.001,5])
| y=np.array([1,1.999,3,4.01,5.1])
| print(np.allclose(x,y))
| print(np.allclose(x,y,rtol=0.2))
| print(np.allclose(x,y,atol=0.2))
| print(np.isclose(x,y))
| print(np.isclose(x,y,atol=0.2))
False
True
True
[ True False True False False]
[ True True True True True]
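`np.isclose` tests each pair with `|a - b| <= atol + rtol * |b|`, using the defaults `rtol=1e-05` and `atol=1e-08`. `np.allclose` is the same test reduced with "all". The hypothetical helper `close_by_hand` below writes this rule out so you can see why only the exactly equal positions pass at the defaults.

```python
import numpy as np

x = np.array([1, 2, 3, 4.001, 5])
y = np.array([1, 1.999, 3, 4.01, 5.1])

def close_by_hand(a, b, rtol=1e-05, atol=1e-08):
    # Element-wise |a - b| <= atol + rtol * |b|; note that only b is scaled by rtol
    return np.abs(a - b) <= atol + rtol * np.abs(b)

print(close_by_hand(x, y))            # [ True False  True False False]
print(close_by_hand(x, y, rtol=0.2))  # all True
print(close_by_hand(x, y, atol=0.2))  # all True
```

Because only `b` is scaled, `isclose(a, b)` and `isclose(b, a)` can disagree in borderline cases.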
Modifying element values in an array
| import numpy as np
| x=np.arange(8)
| x
array([0, 1, 2, 3, 4, 5, 6, 7])
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
array([ 0, 1, 2, 3, 4, 5, 6, 7, 9, 10])
array([0, 1, 2, 3, 4, 5, 6, 7])
array([0, 1, 2, 8, 4, 5, 6, 7])
array([0, 8, 1, 2, 8, 4, 5, 6, 7])
| x=np.array([[1,2,3],[4,5,6],[7,8,9]])
array([[1, 2, 4],
[4, 1, 1],
[7, 1, 1]])
array([[1, 2, 4],
[4, 1, 2],
[7, 1, 2]])
| x[1:,1:]=[[1,2],[3,4]]
| x
array([[1, 2, 4],
[4, 1, 2],
[7, 3, 4]])
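The outputs above mix two kinds of modification. `np.append` and `np.insert` return new arrays and leave `x` alone. Index and slice assignment change the array in place, and a scalar or a single row is broadcast over the selected slice. The sketch reconstructs these steps. Treating these exact calls as the lost cells is an assumption.

```python
import numpy as np

x = np.arange(8)
appended = np.append(x, 8)     # returns a NEW array; x is unchanged
inserted = np.insert(x, 1, 8)  # inserts 8 before index 1, also a new array
x[3] = 8                       # item assignment modifies x in place

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
m[0, 2] = 4                    # a single element
m[1:, 1:] = 1                  # a scalar is broadcast over the whole slice
m[1:, 1:] = [1, 2]             # one row is broadcast to every row of the slice
print(m)
```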
Operations between an array and a scalar
| import numpy as np
| x=np.array((1,2,3,4,5))
| x
array([1, 2, 3, 4, 5])
array([ 2, 4, 6, 8, 10])
array([0.5, 1. , 1.5, 2. , 2.5])
array([0, 1, 1, 2, 2], dtype=int32)
array([ 1, 8, 27, 64, 125], dtype=int32)
array([3, 4, 5, 6, 7])
array([1, 2, 0, 1, 2], dtype=int32)
array([ 2, 4, 8, 16, 32], dtype=int32)
array([2. , 1. , 0.66666667, 0.5 , 0.4 ])
array([63, 31, 21, 15, 12], dtype=int32)
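Each scalar operation is applied element by element. The result dtype depends on the operator: `/` always yields floats, while `//`, `%` and `**` on integers stay integer. The `dtype=int32` shown above reflects a platform default. NumPy 1.x on Windows used 32-bit integers, while Linux/macOS and NumPy 2.x use int64. The sketch avoids relying on the width.

```python
import numpy as np

x = np.array((1, 2, 3, 4, 5))
print((x / 2).dtype.kind)   # f: true division gives floats
print((x // 2).dtype.kind)  # i: floor division keeps integers
print(x % 3)                # [1 2 0 1 2]
print(2 ** x)               # [ 2  4  8 16 32]
print(63 // x)              # [63 31 21 15 12]
```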
Operations between arrays
| np.array([1,2,3,4])+np.array([4,3,2,1])
array([5, 5, 5, 5])
| np.array([1,2,3,4])+np.array([4])
array([5, 6, 7, 8])
array([2, 4, 6])
array([1, 4, 9])
array([0, 0, 0])
array([1., 1., 1.])
array([ 1, 4, 27], dtype=int32)
| b=np.array(([1,2,3],[4,5,6],[7,8,9]))
array([[ 1, 4, 9],
[ 4, 10, 18],
[ 7, 16, 27]])
array([[ 2, 4, 6],
[ 5, 7, 9],
[ 8, 10, 12]])
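The last two outputs come from broadcasting. A 1-D array of length 3 combined with a 3×3 array is applied to every row. When trailing dimensions neither match nor equal 1, NumPy refuses. Naming the 1-D array `a` is our assumption.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array(([1, 2, 3], [4, 5, 6], [7, 8, 9]))

# Shapes (3,) and (3, 3) are compatible: a is applied to every row of b
prod = a * b
total = a + b
print(prod)
print(total)

# Shapes (3,) and (4,) are not compatible, so NumPy raises ValueError
try:
    a + np.ones(4)
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
print(broadcast_ok)  # False
```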
Sorting arrays
| import numpy as np
| x=np.array([3,1,2])
| np.argsort(x)
array([1, 2, 0], dtype=int64)
array([1, 2, 3])
| x=np.array([3,1,2,4])
| x.argmax(),x.argmin()
(3, 1)
array([1, 2, 0, 3], dtype=int64)
array([1, 2, 3, 4])
array([1, 2, 3, 4])
| x=np.random.randint(1,10,(2,5))
array([[3, 2, 6, 9, 9],
[8, 5, 5, 1, 1]])
array([[2, 3, 6, 9, 9],
[1, 1, 5, 5, 8]])
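`argsort` returns the indices that would sort the array, so indexing with them gives the sorted values. `np.sort` returns a sorted copy along an axis, and the last axis is the default. The 2×5 matrix here is a fixed copy of the random one above.

```python
import numpy as np

x = np.array([3, 1, 2, 4])
order = np.argsort(x)      # indices that would sort x
print(order)               # [1 2 0 3]
print(x[order])            # [1 2 3 4], same as np.sort(x)
desc = np.sort(x)[::-1]    # descending order: sort, then reverse

m = np.array([[3, 2, 6, 9, 9], [8, 5, 5, 1, 1]])
by_row = np.sort(m, axis=1)  # each row sorted independently (the default axis=-1)
by_col = np.sort(m, axis=0)  # each column sorted independently
print(by_row)
print(by_col)
```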
Inner product of arrays
$$x \cdot y = \sum_{i=1}^{n} x_i y_i$$
| x=np.array((1,2,3))
| y=np.array((4,5,6))
| print(np.dot(x,y))
| print(x.dot(y))
| print(sum(x*y))
32
32
32
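The formula can be written as a plain loop, which makes the sum explicit: 1·4 + 2·5 + 3·6 = 32. The `@` operator and `np.inner` compute the same value for 1-D arrays.

```python
import numpy as np

x = np.array((1, 2, 3))
y = np.array((4, 5, 6))

# Direct translation of the formula: sum of x_i * y_i
by_loop = sum(xi * yi for xi, yi in zip(x, y))

print(by_loop)         # 32
print(x @ y)           # 32
print(np.inner(x, y))  # 32
```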
Accessing elements of an array
| b=np.array(([1,2,3],[4,5,6],[7,8,9]))
| b
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
array([1, 2, 3])
1
3
array([[1, 2, 3],
[4, 5, 6]])
array([3, 8, 4])
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
array([0, 2, 4, 6, 8])
array([0, 1, 2, 3, 4])
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24])
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
array([2, 3, 4])
array([5, 6, 7, 8, 9])
array([[12, 13, 14],
[17, 18, 19],
[22, 23, 24]])
array([ 7, 19])
array([[ 7, 8],
[17, 18]])
array([ 3, 8, 13, 18, 23])
array([[ 2, 4],
[ 7, 9],
[12, 14],
[17, 19],
[22, 24]])
array([[ 5, 6, 7, 8, 9],
[15, 16, 17, 18, 19]])
array([[ 7, 9],
[17, 19]])
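Two indexing rules explain the outputs above. Paired index lists select individual elements, while `np.ix_` selects a sub-grid. Basic slices are views of the original data, whereas list ("fancy") indexing makes a copy. The 5×5 array `np.arange(25).reshape(5, 5)` matches the output above. Calling it `x` is an assumption.

```python
import numpy as np

x = np.arange(25).reshape(5, 5)

pairs = x[[1, 3], [2, 4]]         # elements (1, 2) and (3, 4)
sub = x[np.ix_([1, 3], [2, 4])]   # rows 1 and 3 crossed with columns 2 and 4
print(pairs)                      # [ 7 19]
print(sub)

# A basic slice returns a view; fancy (list) indexing returns a copy
view = x[2:, 2:]
picked = x[[1, 3]]
print(np.shares_memory(view, x))    # True
print(np.shares_memory(picked, x))  # False
view[0, 0] = -1                     # writing through the view also changes x[2, 2]
```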
Applying functions to arrays
| x=np.arange(0,100,10,dtype=np.float64)  # np.floating is an abstract type; use a concrete dtype
| print(x)
| print(np.sin(x))
[ 0. 10. 20. 30. 40. 50. 60. 70. 80. 90.]
[ 0. -0.54402111 0.91294525 -0.98803162 0.74511316 -0.26237485
-0.30481062 0.77389068 -0.99388865 0.89399666]
| x=np.array(([1,2,3],[4,5,6],[7,8,9]))
| print(x)
[[1 2 3]
[4 5 6]
[7 8 9]]
[[ 0.54030231 -0.41614684 -0.9899925 ]
[-0.65364362 0.28366219 0.96017029]
[ 0.75390225 -0.14550003 -0.91113026]]
| print(np.round(np.cos(x)))
[[ 1. -0. -1.]
[-1. 0. 1.]
[ 1. -0. -1.]]
[[1. 1. 2.]
[2. 3. 3.]
[4. 4. 5.]]
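NumPy functions such as `np.sin` and `np.ceil` are universal functions (ufuncs) that work element by element. The scalar functions in Python's `math` module do not accept multi-element arrays. `np.round` rounds exact halves to the nearest even value. Rounding a small negative number also produces `-0.`, as seen in the output above.

```python
import math
import numpy as np

x = np.array([0.0, 10.0, 20.0])
print(np.sin(x))   # applied element by element

# math.sin only accepts a scalar; a multi-element array raises TypeError
try:
    math.sin(x)
    math_ok = True
except TypeError:
    math_ok = False

print(np.round([0.5, 1.5, 2.5]))  # [0. 2. 2.]: halves go to the even neighbour
print(np.round(-0.4))             # -0.0
m = np.arange(1, 10).reshape(3, 3)
print(np.ceil(m / 2))             # same values as the last output above
```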
Changing array shape
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
(10,)
10
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10]])
(2, 5)
array([[ 1, 2],
[ 3, 4],
[ 5, 6],
[ 7, 8],
[ 9, 10]])
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10]])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_6100/1914000034.py in <module>
----> 1 x.reshape((1,10)) # reshape() cannot change the number of elements, so this raises an error
ValueError: cannot reshape array of size 5 into shape (1,10)
array([[0, 1, 2, 3, 4, 0, 0, 0, 0, 0]])
array([[0, 1, 2]])
array([[0, 1, 2, 3, 4, 0, 0, 0, 0, 0]])
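The sketch covers the shape-changing rules above. `reshape` must keep the element count, and `-1` asks NumPy to infer one dimension. The function `np.resize` may change the size, and when it grows it repeats the data. The method `ndarray.resize`, as the zero-padded output above shows, fills with zeros instead.

```python
import numpy as np

x = np.arange(1, 11)
print(x.reshape(2, -1).shape)  # (2, 5): -1 means "infer this length"
print(x.reshape(-1, 2).shape)  # (5, 2)

y = np.arange(5)
try:
    y.reshape((1, 10))         # 5 elements cannot fill a 1x10 shape
    reshape_ok = True
except ValueError:
    reshape_ok = False

# np.resize() repeats the data to fill a larger shape
grown = np.resize(y, (1, 10))
print(grown)                   # [[0 1 2 3 4 0 1 2 3 4]]
print(np.resize(y, (1, 3)))    # [[0 1 2]]
```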
Boolean operations on arrays
array([0.23494488, 0.62911978, 0.08695988, 0.49816789, 0.78656564,
0.41347271, 0.09284217, 0.50007711, 0.37424032, 0.38977916])
array([False, True, False, False, True, False, False, True, False,
False])
array([0.62911978, 0.78656564, 0.50007711])
array([ True, False, True, True, False, True, True, False, True,
True])
3
True
False
| a=np.array([1,2,3])
| b=np.array([3,2,1])
array([False, False, True])
array([3])
array([False, True, False])
array([2])
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
array([6, 8])
array([6, 8])
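Boolean masks are combined with `&`, `|` and `~` (or `np.logical_and` and friends), and each comparison needs its own parentheses because `&` binds tighter than `==` and `>`. Python's `and` expects a single True/False and fails on arrays.

```python
import numpy as np

x = np.arange(1, 10)
mask = (x % 2 == 0) & (x > 5)                # even numbers greater than 5
print(x[mask])                               # [6 8]
print(x[np.logical_and(x % 2 == 0, x > 5)])  # [6 8], function form
print(np.count_nonzero(x > 4))               # 5: counts the True values

# `and` asks for the truth value of a whole array, which is ambiguous
try:
    (x > 2) and (x < 5)
    and_ok = True
except ValueError:
    and_ok = False
```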
Piecewise functions
| x=np.random.randint(0,10,size=(1,10))
| x
array([[7, 5, 6, 0, 8, 6, 3, 7, 6, 2]])
array([[1, 1, 1, 0, 1, 1, 0, 1, 1, 0]])
array([[7, 5, 6, 0, 8],
[6, 3, 7, 6, 2]])
| np.piecewise(x,[x<3,x>7],[lambda x:x*2,lambda x:x*3])
array([[ 0, 0, 0, 0, 24],
[ 0, 0, 0, 0, 4]])
| np.piecewise(x,[x<3,(3<x)&(x<5),x>7],[-1,1,lambda x:x*4])
array([[ 0, 0, 0, -1, 32],
[ 0, 0, 0, 0, -1]])
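In the outputs above, elements that match no condition become 0. `np.piecewise` accepts one extra entry in the function list, which is used for those unmatched elements. The sketch fixes `x` to the random values shown above so the result is reproducible.

```python
import numpy as np

x = np.array([[7, 5, 6, 0, 8], [6, 3, 7, 6, 2]])  # fixed copy of the random x above

# Two conditions, two functions: unmatched elements become 0
p1 = np.piecewise(x, [x < 3, x > 7], [lambda v: v * 2, lambda v: v * 3])

# One extra function: applied wherever no condition is True (here, keep the value)
p2 = np.piecewise(x, [x < 3, x > 7], [lambda v: v * 2, lambda v: v * 3, lambda v: v])
print(p1)
print(p2)
```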
Stacking and merging arrays
| arr1=np.array([1,2,3])
| arr2=np.array([4,5,6])
array([1, 2, 3, 4, 5, 6])
array([[1, 2, 3],
[4, 5, 6]])
| arr3=np.array([[1],[2],[3]])
| arr4=np.array([[4],[5],[6]])
array([[1],
[2],
[3]])
array([[4],
[5],
[6]])
array([[1, 4],
[2, 5],
[3, 6]])
array([[1],
[2],
[3],
[4],
[5],
[6]])
| np.concatenate((arr1,arr2))
array([1, 2, 3, 4, 5, 6])
| np.concatenate((arr3,arr4))
array([[1],
[2],
[3],
[4],
[5],
[6]])
| np.concatenate((arr3,arr4),axis=1)
array([[1, 4],
[2, 5],
[3, 6]])
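`concatenate` joins along an existing axis, so 1-D arrays can only be joined end to end. To place 1-D arrays side by side as columns, `np.stack(..., axis=1)` or `np.column_stack` creates the new axis. The result matches `concatenate((arr3, arr4), axis=1)` above.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

print(np.hstack((arr1, arr2)).shape)  # (6,)
print(np.vstack((arr1, arr2)).shape)  # (2, 3)
cols = np.stack((arr1, arr2), axis=1) # new last axis: elements paired side by side
print(cols)
print(np.array_equal(cols, np.column_stack((arr1, arr2))))  # True

# 1-D arrays only have axis 0, so axis=1 is rejected
try:
    np.concatenate((arr1, arr2), axis=1)
    concat_ok = True
except ValueError:  # NumPy raises AxisError, a subclass of ValueError
    concat_ok = False
```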
Creating matrices and common matrix operations
Creating matrices
| x=np.matrix([[1,2,3],[4,5,6]])
| y=np.matrix([1,2,3,4,5,6])
| print(x,y,x[1,1],sep='\n\n')
[[1 2 3]
[4 5 6]]
[[1 2 3 4 5 6]]
5
Matrix transpose
| x=np.matrix([[1,2,3],[4,5,6]])
| y=np.matrix([1,2,3,4,5,6])
| print(x.T, y.T, sep='\n\n')
[[1 4]
[2 5]
[3 6]]
[[1]
[2]
[3]
[4]
[5]
[6]]
Inspecting matrix properties
| import numpy as np
| x = np.matrix([[1,2,3], [4,5,6]])
| print(x.mean(), end='\n====\n')
| print(x.mean(axis=0), end='\n====\n')
| print(x.mean(axis=0).shape, end='\n====\n')
| print(x.mean(axis=1), end='\n====\n')
| print(x.sum(), end='\n====\n')
| print(x.max(axis=1), end='\n====\n')
| print(x.argmax(axis=1), end='\n====\n')
| print(x.diagonal(), end='\n====\n')
| print(x.nonzero())
3.5
====
[[2.5 3.5 4.5]]
====
(1, 3)
====
[[2.]
[5.]]
====
21
====
[[3]
[6]]
====
[[2]
[2]]
====
[[1 5]]
====
(array([0, 0, 0, 1, 1, 1], dtype=int64), array([0, 1, 2, 0, 1, 2], dtype=int64))
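A `np.matrix` always stays 2-D, which is why `x.mean(axis=0)` above has shape (1, 3). With a regular 2-D ndarray the reduced axis is dropped unless `keepdims=True` is passed. The sketch compares the two forms. Using an ndarray here is our substitution for the matrix class.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])        # plain 2-D ndarray

print(a.mean(axis=0).shape)                 # (3,): the reduced axis disappears
print(a.mean(axis=0, keepdims=True).shape)  # (1, 3): same shape as the matrix result
print(a.max(axis=1, keepdims=True))         # a 2x1 column: [[3], [6]]
print(a.argmax(axis=1))                     # [2 2]
print(np.diagonal(a))                       # [1 5]
```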
Matrix multiplication
| x=np.matrix([[1,2,3],[4,5,6]])
| y=np.matrix([[1,2],[3,4],[5,6]])
| print(x*y)
[[22 28]
[49 64]]
matrix([[1, 2, 3],
[4, 5, 6]])
matrix([[1, 2],
[3, 4],
[5, 6]])
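For `np.matrix`, `*` means matrix multiplication. The NumPy documentation no longer recommends the matrix class and advises regular arrays instead. With ndarrays, `*` is element-wise and the `@` operator (or `np.matmul`) performs the matrix product. The sketch reproduces the product above with arrays.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[1, 2], [3, 4], [5, 6]])

print(x @ y)            # [[22 28]
                        #  [49 64]]
print(np.matmul(x, y))  # function form of @
print(x * x)            # for ndarrays, * multiplies element by element
```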
Computing the correlation coefficient matrix
| import numpy as np
| print(np.corrcoef([1,2,3,4], [4,3,2,1]))
| print(np.corrcoef([1,2,3,4], [8,3,2,1]))
| print(np.corrcoef([1,2,3,4], [1,2,3,4]))
| print(np.corrcoef([1,2,3,4], [1,2,3,40]))
[[ 1. -1.]
[-1. 1.]]
[[ 1. -0.91350028]
[-0.91350028 1. ]]
[[1. 1.]
[1. 1.]]
[[1. 0.8010362]
[0.8010362 1. ]]
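The off-diagonal entries of `np.corrcoef` are Pearson correlation coefficients. The hypothetical helper `pearson` below computes one directly from the definition. For the second pair above it gives -11/√145 ≈ -0.9135.

```python
import numpy as np

def pearson(a, b):
    # r = sum((a - mean(a)) * (b - mean(b))) / sqrt(sum((a - mean(a))^2) * sum((b - mean(b))^2))
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da, db = a - a.mean(), b - b.mean()
    return np.sum(da * db) / np.sqrt(np.sum(da ** 2) * np.sum(db ** 2))

r = pearson([1, 2, 3, 4], [8, 3, 2, 1])
print(r)                                              # about -0.9135
print(np.corrcoef([1, 2, 3, 4], [8, 3, 2, 1])[0, 1])  # same value
```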
Computing variance, covariance and standard deviation
| import numpy as np
| print(np.cov([1,1,1,1,1]))
| print(np.std([1,1,1,1,1]))
| x = [-2.1, -1, 4.3]
| y = [3, 1.1, 0.12]
| X = np.vstack((x,y))
| print(X)
| print(np.cov(X))
| print(np.cov(x, y))
| print(np.std(X))
| print(np.std(X, axis=1))
| print(np.cov(x))
0.0
0.0
[[-2.1 -1. 4.3 ]
[ 3. 1.1 0.12]]
[[11.71 -4.286 ]
[-4.286 2.14413333]]
[[11.71 -4.286 ]
[-4.286 2.14413333]]
2.2071223094538484
[2.79404128 1.19558447]
11.709999999999999
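The outputs above use two different denominators. `np.cov` divides by N-1 by default (sample covariance), while `np.var` and `np.std` divide by N unless `ddof=1` is passed. For x = [-2.1, -1, 4.3], the squared deviations from the mean 0.4 sum to 23.42. That gives 23.42/2 = 11.71 for `np.cov(x)`, and √(23.42/3) ≈ 2.794 for the first entry of `np.std(X, axis=1)`.

```python
import numpy as np

x = [-2.1, -1, 4.3]
print(np.cov(x))          # about 11.71: np.cov divides by N-1
print(np.var(x, ddof=1))  # about 11.71: sample variance, also N-1
print(np.var(x))          # about 7.807: population variance, divides by N (ddof=0)
print(np.std(x))          # about 2.794: the square root of np.var(x)
```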
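Computing eigenvalues and eigenvectors
For an n×n square matrix A, if there exist a scalar λ and a nonzero n-dimensional vector x such that A·x = λx, then λ is called an eigenvalue of A and x an eigenvector of A corresponding to λ.
Geometrically, multiplying a vector by a matrix transforms that vector, mapping it from one coordinate system to another. In general the vector is both rotated and scaled. If multiplying by the matrix only scales a vector without rotating it, that vector is an eigenvector of the matrix and the scale factor is the corresponding eigenvalue. Put another way, the eigenvectors are among the ideal coordinate axes for the rotated data, and each eigenvalue measures the projection, or contribution, along its axis. The larger the eigenvalue in absolute value, the more important that axis is for representing the original vectors and the larger their projection onto it. When an n×n matrix has n linearly independent eigenvectors, they form a basis, i.e. the axes of a new coordinate system, and any vector can then be expressed in that coordinate system.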
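| import numpy as np
| A = np.array([[1,-3,3],[3,-5,3], [6,-6,4]])
| e, v = np.linalg.eig(A)
| print(e, v, sep='\n')
| print(np.dot(A,v))
| print(e*v)  # broadcasting scales column j of v by e[j]
| print(np.isclose(np.dot(A,v), e*v))
| print(np.linalg.det(A-np.eye(3,3)*e))  # np.eye(3,3)*e is diag(e); the textbook check is det(A - λI) = 0 for each eigenvalue λ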
[ 4. -2. -2.]
[[-0.40824829 -0.81034214 0.1932607 ]
[-0.40824829 -0.31851537 -0.59038328]
[-0.81649658 0.49182677 -0.78364398]]
[[-1.63299316 1.62068428 -0.38652141]
[-1.63299316 0.63703074 1.18076655]
[-3.26598632 -0.98365355 1.56728796]]
[[-1.63299316 1.62068428 -0.38652141]
[-1.63299316 0.63703074 1.18076655]
[-3.26598632 -0.98365355 1.56728796]]
[[ True True True]
[ True True True]
[ True True True]]
3.197442310920445e-14
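The definition can be checked one eigenvalue at a time. Column i of `v` is the eigenvector for `e[i]`, so A·v[:, i] should equal e[i]·v[:, i]. Each eigenvalue should also satisfy the characteristic equation det(A - λI) = 0, up to rounding error.

```python
import numpy as np

A = np.array([[1, -3, 3], [3, -5, 3], [6, -6, 4]])
e, v = np.linalg.eig(A)

for i, lam in enumerate(e):
    vec = v[:, i]                                             # eigenvectors are the COLUMNS of v
    print(np.allclose(A @ vec, lam * vec))                    # A x = λ x
    print(np.isclose(np.linalg.det(A - lam * np.eye(3)), 0))  # det(A - λI) = 0
```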
Computing the inverse matrix
| import numpy as np
| x = np.matrix([[1,2,3], [4,5,6], [7,8,0]])
| y = np.linalg.inv(x)
| print(y)
| print(x*y)
| print(y*x)
[[-1.77777778 0.88888889 -0.11111111]
[ 1.55555556 -0.77777778 0.22222222]
[-0.11111111 0.22222222 -0.11111111]]
[[ 1.00000000e+00 1.66533454e-16 1.38777878e-17]
[-1.05471187e-15 1.00000000e+00 2.77555756e-17]
[ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
[[ 1.00000000e+00 -4.44089210e-16 0.00000000e+00]
[ 2.77555756e-16 1.00000000e+00 0.00000000e+00]
[ 6.93889390e-17 1.11022302e-16 1.00000000e+00]]
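The products above equal the identity only up to floating-point noise such as 1e-16. Compare them with `np.allclose` rather than `==`. A singular matrix has no inverse, and `np.linalg.inv` raises `LinAlgError` for it. The 2×2 singular matrix below is an illustrative example.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 0]])
y = np.linalg.inv(x)

print(np.allclose(x @ y, np.eye(3)))  # True
print(np.allclose(y @ x, np.eye(3)))  # True

# The second row is twice the first, so the matrix is singular
singular = np.array([[1, 2], [2, 4]])
try:
    np.linalg.inv(singular)
    inv_ok = True
except np.linalg.LinAlgError:
    inv_ok = False
```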
Solving a system of linear equations
| import numpy as np
| a = np.array([[3,1], [1,2]])
| b = np.array([9,8])
| x = np.linalg.solve(a, b)
| print(x)
| print(np.dot(a, x))
| print(np.linalg.lstsq(a, b, rcond=None))  # rcond=None avoids a FutureWarning on older NumPy versions
[2. 3.]
[9. 8.]
(array([2., 3.]), array([], dtype=float64), 2, array([3.61803399, 1.38196601]))
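`np.linalg.solve` needs a square, nonsingular coefficient matrix. `np.linalg.lstsq` also handles overdetermined systems, where it returns the least-squares solution. This sketch fits a straight line obs ≈ m·t + c to four points. The data are illustrative only.

```python
import numpy as np

t = np.array([0, 1, 2, 3])
obs = np.array([-1, 0.2, 0.9, 2.1])         # illustrative data points

# Four equations m*t_i + c = obs_i in two unknowns: no exact solution exists
A = np.column_stack((t, np.ones_like(t)))   # each row is [t_i, 1]
(m, c), residuals, rank, sv = np.linalg.lstsq(A, obs, rcond=None)
print(m, c)   # slope about 1.0, intercept about -0.95
```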
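Computing norms of vectors and matrices
In linear algebra, the length of a vector (a point in n-dimensional space) is called its modulus or 2-norm. For a vector (x1, x2, x3, ..., xn), the modulus is the square root of the vector's inner product with itself:
$$\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$$
The vector p-norm is computed as follows. It is a norm in the strict sense for p ≥ 1, and NumPy's norm() also accepts other ord values such as 0 or negative integers:
$$\|x\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$$
For an m×n matrix A, a commonly used norm is the Frobenius norm (also called the F-norm):
$$\|A\|_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2}$$
The 2-norm of A is the square root of the largest eigenvalue of A^H A, the product of A's conjugate transpose with A:
$$\|A\|_2 = \sqrt{\lambda_{\max}(A^{H}A)}$$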
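The sketch evaluates each formula by hand and compares it with `np.linalg.norm`. The example vector and the matrix [[1, 2], [3, -4]] are the same values used in the cells that follow. Since that matrix is real, A^H is simply A^T. The choice p = 3 is arbitrary.

```python
import numpy as np

v = np.array([1, 2, 0, 3, 4])
A = np.array([[1, 2], [3, -4]])

two_norm = np.sqrt(np.sum(v ** 2))                   # sqrt of the inner product v·v
p = 3
p_norm = np.sum(np.abs(v) ** p) ** (1 / p)           # (sum |x_i|^p)^(1/p)
fro = np.sqrt(np.sum(np.abs(A) ** 2))                # Frobenius norm
spec = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))  # sqrt of the largest eigenvalue of A^T A

print(np.isclose(two_norm, np.linalg.norm(v)))       # True
print(np.isclose(p_norm, np.linalg.norm(v, 3)))      # True
print(np.isclose(fro, np.linalg.norm(A, 'fro')))     # True
print(np.isclose(spec, np.linalg.norm(A, 2)))        # True
```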
| x=np.matrix([[1,2],[3,-4]])
| print(np.linalg.norm(x))  # default for a matrix: Frobenius norm
5.477225575051661
| print(np.linalg.norm(x,-2))  # smallest singular value
1.9543950758485487
| print(np.linalg.norm(x,-1))  # minimum absolute column sum
4.0
| print(np.linalg.norm([1,2,0,3,4,0],0))  # for a vector, ord=0 counts the nonzero elements
4.0
| print(np.linalg.norm([1,2,0,3,4],2))
5.477225575051661
Singular value decomposition
| a=np.matrix([[1,2,3],[4,5,6],[7,8,9]])
| u,s,v=np.linalg.svd(a)
matrix([[-0.21483724, 0.88723069, 0.40824829],
[-0.52058739, 0.24964395, -0.81649658],
[-0.82633754, -0.38794278, 0.40824829]])
array([1.68481034e+01, 1.06836951e+00, 3.33475287e-16])
matrix([[-0.47967118, -0.57236779, -0.66506441],
[-0.77669099, -0.07568647, 0.62531805],
[-0.40824829, 0.81649658, -0.40824829]])
matrix([[1., 2., 3.],
[4., 5., 6.],
[7., 8., 9.]])
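The third value returned by `np.linalg.svd` is already the conjugate transpose V^H, so the matrix is rebuilt as U · diag(s) · V^H. The last output above shows this. The singular values come sorted in descending order. The near-zero third one (about 3e-16) reveals that this matrix has rank 2.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
u, s, vh = np.linalg.svd(a)      # vh is V^H, not V

rebuilt = u @ np.diag(s) @ vh    # a = U · diag(s) · V^H
print(np.allclose(rebuilt, a))   # True
print(s)                         # descending; the last value is ~0
print(np.linalg.matrix_rank(a))  # 2
```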
Vectorizing functions
| mat=np.matrix([[1,2,3],[4,5,6]])
| mat
matrix([[1, 2, 3],
[4, 5, 6]])
| import math
| a = math.factorial(4)
| print(a)
24
| vecFactorial=np.vectorize(math.factorial)
| vecFactorial(mat)
matrix([[ 1, 2, 6],
[ 24, 120, 720]])
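`np.vectorize` wraps a scalar function so it can be called on arrays. The `otypes` parameter fixes the output dtype up front, so NumPy does not infer it from the first element. This sketch uses a plain ndarray in place of `np.matrix`. Choosing int64 as the output type is our assumption.

```python
import math
import numpy as np

vec_factorial = np.vectorize(math.factorial, otypes=[np.int64])
mat = np.array([[1, 2, 3], [4, 5, 6]])
print(vec_factorial(mat))
# [[  1   2   6]
#  [ 24 120 720]]
```

According to the NumPy documentation, `np.vectorize` is provided mainly for convenience. It is essentially a Python for loop and does not make the code faster.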
From the book 《Python数据分析、挖掘与可视化》 (Python Data Analysis, Mining and Visualization)