Difference between np.dot and np.multiply with np.sum in binary cross-entropy loss calculation
Problem description
I tried the two snippets below but could not tell what the difference is between using np.dot and using np.multiply with np.sum.
Here is the np.dot code:
logprobs = np.dot(Y, (np.log(A2)).T) + np.dot((1.0-Y),(np.log(1 - A2)).T)
print(logprobs.shape)
print(logprobs)
cost = (-1/m) * logprobs
print(cost.shape)
print(type(cost))
print(cost)
Its output is
(1, 1)
[[-2.07917628]]
(1, 1)
<class 'numpy.ndarray'>
[[ 0.693058761039 ]]
Here is the code for np.multiply with np.sum
logprobs = np.sum(np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2)))
print(logprobs.shape)
print(logprobs)
cost = - logprobs / m
print(cost.shape)
print(type(cost))
print(cost)
Its output is
()
-2.07917628312
()
<class 'numpy.float64'>
0.693058761039
I can't understand why the type and shape differ when the resulting value is the same in both cases.
Even after squeezing the cost from the first snippet, the value matches the second one but the type is still numpy.ndarray:
cost = np.squeeze(cost)
print(type(cost))
print(cost)
The output is
<class 'numpy.ndarray'>
0.6930587610394646
Solution
What you're doing is calculating the binary cross-entropy loss, which measures how bad the model's predictions (here: A2) are when compared to the true outputs (here: Y).
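For reference, the loss both snippets compute can be written out as below. This is only a sketch in my own notation: y_i and a_i stand for the i-th entries of Y and A2, and m is the number of examples. The second identity is why the np.dot version and the np.multiply plus np.sum version give the same number: a row vector times a column vector is the sum of the element-wise products.

```latex
J = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y_i \log a_i + (1 - y_i)\log(1 - a_i) \Big],
\qquad
Y\,(\log A_2)^{\mathsf T} = \sum_{i=1}^{m} y_i \log a_i .
```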
Here is a reproducible example for your case, which should explain why you get a scalar in the second case using np.sum
In [88]: Y = np.array([[1, 0, 1, 1, 0, 1, 0, 0]])
In [89]: A2 = np.array([[0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02]])
In [90]: logprobs = np.dot(Y, (np.log(A2)).T) + np.dot((1.0-Y),(np.log(1 - A2)).T)
# `np.dot` returns 2D array since its arguments are 2D arrays
In [91]: logprobs
Out[91]: array([[-0.78914626]])
In [92]: cost = (-1/m) * logprobs  # m is the number of examples, here m = Y.shape[1] = 8
In [93]: cost
Out[93]: array([[ 0.09864328]])
In [94]: logprobs = np.sum(np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2)))
# np.sum returns scalar since it sums everything in the 2D array
In [95]: logprobs
Out[95]: -0.78914625761870361
Note that np.dot sums only along the matching inner dimensions, here (1x8) and (8x1). The 8s disappear during the dot product (matrix multiplication), giving a (1x1) result, which is just a scalar but is returned as a 2D array of shape (1,1).
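The shape bookkeeping described above can be checked directly. The following is a minimal sketch that reuses the toy Y and A2 from the example; the variable names (via_dot, elementwise, via_sum, via_sum_2d) are mine, chosen for illustration.

```python
import numpy as np

# Same toy data as in the reproducible example.
Y = np.array([[1, 0, 1, 1, 0, 1, 0, 0]])
A2 = np.array([[0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02]])

# (1, 8) dot (8, 1): the inner 8s are summed away, the outer 1s remain.
via_dot = np.dot(Y, np.log(A2).T) + np.dot(1.0 - Y, np.log(1 - A2).T)

# The element-wise product keeps shape (1, 8); np.sum without `axis`
# then reduces over every axis and returns a 0-dimensional scalar.
elementwise = np.multiply(np.log(A2), Y) + np.multiply(1 - Y, np.log(1 - A2))
via_sum = np.sum(elementwise)

print(via_dot.shape)      # (1, 1)
print(elementwise.shape)  # (1, 8)
print(np.ndim(via_sum))   # 0

# keepdims=True asks np.sum to keep the reduced axes, which reproduces
# the (1, 1) shape that np.dot returned.
via_sum_2d = np.sum(elementwise, keepdims=True)
print(via_sum_2d.shape)   # (1, 1)
```

So the two approaches agree on the value and differ only in how many axes survive the reduction.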
Also, most importantly, note that np.dot here is exactly the same as np.matmul, since the inputs are 2D arrays (i.e. matrices):
In [107]: logprobs = np.matmul(Y, (np.log(A2)).T) + np.matmul((1.0-Y),(np.log(1 - A2)).T)
In [108]: logprobs
Out[108]: array([[-0.78914626]])
In [109]: logprobs.shape
Out[109]: (1, 1)
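As a quick check of that equivalence for 2D inputs, here is a small sketch comparing np.dot, np.matmul, and the @ operator (which dispatches to matrix multiplication) on the same toy data. The variable names are illustrative only.

```python
import numpy as np

Y = np.array([[1, 0, 1, 1, 0, 1, 0, 0]])
A2 = np.array([[0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02]])

log_a_col = np.log(A2).T  # shape (8, 1)

with_dot = np.dot(Y, log_a_col)
with_matmul = np.matmul(Y, log_a_col)
with_operator = Y @ log_a_col

# All three keep the outer dimensions, so each result has shape (1, 1).
print(with_dot.shape, with_matmul.shape, with_operator.shape)

# For 2D inputs the values agree as well.
print(np.allclose(with_dot, with_matmul) and np.allclose(with_dot, with_operator))
```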
Return result as a scalar value
np.dot and np.matmul return an array whose shape is determined by the input arrays. If the inputs are 2D arrays, it's not possible to get a scalar back, even with the out= argument. However, if the result has shape (1,1) (or, more generally, is a single value wrapped in an nD array), we can convert it to a scalar with np.asscalar(). Note that np.asscalar() has been deprecated since NumPy 1.16 and is absent from recent releases, where the equivalent call is logprobs.item().
In [123]: np.asscalar(logprobs)
Out[123]: -0.7891462576187036
In [124]: type(np.asscalar(logprobs))
Out[124]: float
This works for any ndarray of size 1, regardless of how many dimensions wrap the value:
In [127]: np.asscalar(np.array([[[23.2]]]))
Out[127]: 23.2
In [128]: np.asscalar(np.array([[[[23.2]]]]))
Out[128]: 23.2
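For readers on a NumPy version without np.asscalar(), the sketch below uses ndarray.item() as a stand-in and also shows why np.squeeze did not change the type in the question. The (1, 1) array is typed in by hand to mimic the np.dot result, and the short 1-D vectors at the end are made-up illustration data.

```python
import numpy as np

# A (1, 1) array like the one np.dot returned above (value typed in by hand).
logprobs = np.array([[-0.78914626]])

# np.squeeze only drops the length-1 axes: the result is still an ndarray,
# just a 0-dimensional one. This is why type(cost) stayed numpy.ndarray.
squeezed = np.squeeze(logprobs)
print(type(squeezed), squeezed.shape)  # <class 'numpy.ndarray'> ()

# .item() extracts the single element as a plain Python float.
as_float = logprobs.item()
print(type(as_float))                  # <class 'float'>

# It works for a size-1 array of any dimensionality.
print(np.array([[[[23.2]]]]).item())   # 23.2

# Alternatively, pass 1-D inputs: the inner product of two vectors
# is already 0-dimensional, so no conversion is needed.
y = np.array([1, 0, 1])
a = np.array([0.8, 0.2, 0.95])
inner = np.dot(y, np.log(a))
print(np.ndim(inner))                  # 0
```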