Quote:
In certain example embodiments, the techniques herein may advantageously take advantage of NVIDIA's tensor cores (or other similar hardware). A tensor core may be a hardware unit that multiplies two 16×16 FP16 matrices (or other sized matrices depending on the nature of the hardware), and then adds a third FP16 matrix to the result by using fused multiply-add operations, and obtains an FP16 result. In certain example embodiments, a tensor core (or other processing hardware) can be used to multiply two 16×16 INT8 matrices (or other sized matrices depending on the nature of the hardware), and then add a third INT32 matrix to the result by using fused multiply-add operations and obtain an INT32 result which can then be converted to INT8 by dividing by the appropriate normalization amount (e.g., which may be calculated during a training process, such as described in connection with FIG. 9). Such conversions may be accomplished using, for example, a low processing cost integer right shift. Such hardware acceleration for the processing discussed herein (e.g., in the context the separable block transforms) may be advantageous.
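This passage says it outright: the technique is built on NVIDIA's tensor cores.
Being a Nintendo patent, it doesn't lock itself in, though: "or other similar hardware".
But it definitely depends on hardware acceleration for deep-learning computation, which means AMD is out of the running.
That is also Nintendo's biggest competitive advantage over the Steam Deck.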
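To make the INT8 path in the quoted paragraph concrete, here is a rough pure-Python sketch of what one tile operation computes: multiply two 16×16 INT8 matrices, add an INT32 matrix, then bring the INT32 result back to INT8 with an integer right shift. The function name `int8_tile_mma`, the `shift` parameter, and the saturating clamp to the INT8 range are my own assumptions for illustration. The patent text only says the normalization amount may come from training and that the conversion can be a right shift. This is not how tensor cores are actually programmed; it only models the arithmetic.

```python
N = 16  # tile size taken from the quoted text; real hardware tiles may differ


def int8_tile_mma(a, b, c, shift):
    """Compute clamp((A @ B + C) >> shift) for N x N tiles.

    a, b  : N x N lists of ints in [-128, 127] (INT8 inputs)
    c     : N x N list of ints, treated as INT32 accumulator values
    shift : hypothetical normalization amount, expressed as a power of two
    """
    out = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            acc = c[i][j]  # the accumulator starts from the third matrix
            for k in range(N):
                acc += a[i][k] * b[k][j]  # multiply-add into the wide accumulator
            q = acc >> shift  # arithmetic right shift, i.e. floor-divide by 2**shift
            out[i][j] = max(-128, min(127, q))  # assumed saturation to INT8
    return out


# Usage: every accumulator is 4 + 16 * (3 * 2) = 100, and 100 >> 2 == 25.
a = [[3] * N for _ in range(N)]
b = [[2] * N for _ in range(N)]
c = [[4] * N for _ in range(N)]
d = int8_tile_mma(a, b, c, shift=2)
print(d[0][0])  # 25
```

The point of the shift is that dividing by a power of two costs almost nothing, so the wide INT32 result can be squeezed back to INT8 for the next layer without a real division.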