[Industry commentary] MS cancels the roundtable (i.e. the Q&A session) after its E3 press conference

lkig

Access banned

Posts: 56
Digest posts: 0
Points: 151
Registered: 2013-2-26


#1 Posted on 2013-6-6 16:09

So do you consider ATI a reliable source?
http://developer.amd.com/wordpre ... .2011.04.07.Web.pdf
http://developer.amd.com/wordpre ... MMC2011_Keynote.pdf
If you can't be bothered to read them, or can't follow them, just skip to the conclusion:
GPU cores optimized for arithmetic workloads and latency hiding

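By design, the GCN architecture is not very sensitive to latency.
If you want to get real mileage out of low latency, go look at some other vendor.
With GCN's heavily multithreaded design, low latency is a capability of marginal value at best.
The GPU doesn't have to queue up and wait on each access, so what is the point of chasing low latency?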

Oh, and by the way, don't bother with NV either. Their view of latency, and what they do about it, is not far from ATI's.
http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
Goal: Performance per millimeter

For GPUs, performance == throughput
Strategy: hide latency with computation not cache
– Heavy multithreading!
Implication: need many threads to hide latency
– Occupancy – typically prefer 128 or more threads/TPA
– Multiple thread blocks/TPA help minimize effect of barriers
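
To put rough numbers on "need many threads to hide latency", here is a minimal Python sketch of a toy model that is not taken from the slides. It assumes each warp alternates between a fixed stretch of arithmetic and one memory access, and that switching between warps is free. The function name `warps_needed` and the 400-cycle latency are hypothetical values chosen for illustration.

```python
import math


def warps_needed(mem_latency_cycles: int, compute_cycles_per_access: int) -> int:
    """Smallest number of resident warps that keeps the ALUs busy in this toy model.

    Assumption: every warp runs `compute_cycles_per_access` cycles of arithmetic,
    then waits `mem_latency_cycles` for one memory access, and repeats.
    """
    if mem_latency_cycles < 0 or compute_cycles_per_access <= 0:
        raise ValueError("latency must be >= 0 and compute cycles must be > 0")
    # One warp computes while ceil(latency / compute) others cover the memory wait.
    return 1 + math.ceil(mem_latency_cycles / compute_cycles_per_access)


if __name__ == "__main__":
    # Hypothetical 400-cycle memory latency with varying arithmetic per access.
    for compute in (10, 20, 100):
        print(compute, warps_needed(400, compute))
```

Under these assumptions, the less arithmetic a warp does per memory access, the more warps have to be resident to cover the same latency, which is the occupancy point the slide is making.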

[ Last edited by lkig on 2013-6-6 16:14 ]


lkig


#2 Posted on 2013-6-6 16:34

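Let me put it more bluntly.
Even a run-of-the-mill GPU today has hundreds of stream processors,
and people are still talking about latency...
Do they think a GPU is like a CPU, with only a single-digit number of threads?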


lkig


#3 Posted on 2013-6-6 17:17

Quote:

Originally posted by 首斩破沙罗 on 2013-6-6 16:37

Are you a professional? Then why not put it even more bluntly: the low latency of the 四两 isn't much use.

I wouldn't call myself a professional. I can just about follow the material, that's all.

Nvidia slides
http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
Goal: Performance per millimeter

For GPUs, performance == throughput
Strategy: hide latency with computation not cache
– Heavy multithreading!
Implication: need many threads to hide latency
– Occupancy – typically prefer 128 or more threads/TPA
– Multiple thread blocks/TPA help minimize effect of barriers

AMD slides 1
http://developer.amd.com/wordpre ... MMC2011_Keynote.pdf

AMD’s GPU designs hide latency in two main ways:
– Issue an instruction over multiple cycles
– Interleave instructions from multiple threads
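
The two mechanisms in that list can be sketched in a few lines of Python. The figures used here (a 64-wide wavefront on a 16-lane SIMD) are an assumption based on how GCN is commonly described, not something quoted from the slide, and both function names are hypothetical.

```python
import math


def issue_cycles(wavefront_size: int, simd_lanes: int) -> int:
    """Cycles needed to issue one instruction for a whole wavefront.

    Assumption: the SIMD processes `simd_lanes` work-items per cycle.
    """
    return math.ceil(wavefront_size / simd_lanes)


def round_robin(num_wavefronts: int, instructions_each: int) -> list:
    """Issue order as (wavefront, instruction) pairs, one wavefront at a time.

    Consecutive instructions of the same wavefront end up `num_wavefronts`
    issue slots apart, which gives each result time to come back.
    """
    return [(w, i) for i in range(instructions_each) for w in range(num_wavefronts)]


if __name__ == "__main__":
    print(issue_cycles(64, 16))   # assumed GCN-style figures
    print(round_robin(2, 2))
```

Both mechanisms stretch the gap between dependent instructions of one thread, so pipeline latency is covered by other work rather than by stalling.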

AMD slides 2
http://developer.amd.com/wordpre ... .2011.04.07.Web.pdf
LATENCY HIDING
GPU memory latency is high compared to ALU rates
When a wavefront needs to read in memory, it will stall on the SIMD until the memory arrives
GPU hides the stall (potential SIMD idle time) by switching the SIMD onto another wavefront
Hence, need more wavefronts than available SIMDs for efficient SIMD utilization
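
The stall-and-switch behaviour described in that slide can be mimicked with a small cycle-by-cycle simulation. This is a rough sketch under stated assumptions, not AMD's actual scheduler: one SIMD, zero-cost switching, a fixed-priority pick of the first ready wavefront, and every wavefront repeating the same pattern of `alu_cycles` of arithmetic followed by a memory read of `mem_latency` cycles. All names and numbers are hypothetical.

```python
def simd_utilization(num_wavefronts: int, alu_cycles: int, mem_latency: int,
                     total_cycles: int = 10_000) -> float:
    """Fraction of cycles in which the SIMD executes arithmetic (toy model)."""
    ready_at = [0] * num_wavefronts            # cycle at which each wavefront's data has arrived
    remaining = [alu_cycles] * num_wavefronts  # ALU cycles left before its next memory read
    busy = 0
    for cycle in range(total_cycles):
        for w in range(num_wavefronts):
            if ready_at[w] <= cycle:
                # Run the first ready wavefront for one cycle.
                remaining[w] -= 1
                busy += 1
                if remaining[w] == 0:
                    # It now issues a memory read and stalls until the data returns.
                    ready_at[w] = cycle + 1 + mem_latency
                    remaining[w] = alu_cycles
                break
        # If no wavefront was ready, the SIMD simply idles this cycle.
    return busy / total_cycles


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, simd_utilization(n, alu_cycles=10, mem_latency=90))
```

In this model utilization climbs in proportion to the number of wavefronts until the memory wait is fully covered, and adding wavefronts beyond that point changes nothing.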

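Both NV and AMD are telling you the same thing: their approach is massive multithreading, i.e. using many execution units to process a large amount of computation "at once".
Put plainly, if you have hundreds of processing units, why would you wait for a cache access to finish before moving on to the next operation?
Of course you use those units to run a large amount of computation concurrently.
That is what both vendors mean by "GPU cores optimized for arithmetic workloads and latency hiding".
Taken literally, the latency has already been hidden.
With that much work scheduled concurrently, the latency was sidestepped long ago.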

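What hundreds of processing units need instead is a large amount of data delivered at once,
i.e. the ability to move a large volume of data (bandwidth matters), not the ability to move data very frequently (latency matters).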
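
One way to relate the two quantities is Little's law: the data that must be in flight equals bandwidth times latency. The Python sketch below uses made-up figures (100 GB/s, 500 ns, 64-byte requests) purely as an illustration, and both function names are hypothetical.

```python
import math


def bytes_in_flight(bandwidth_gb_per_s: float, latency_ns: float) -> float:
    """Bytes that must be outstanding to sustain the given bandwidth (Little's law).

    1 GB/s times 1 ns is exactly 1 byte, so the product is already in bytes.
    """
    return bandwidth_gb_per_s * latency_ns


def outstanding_requests(bandwidth_gb_per_s: float, latency_ns: float,
                         request_bytes: int = 64) -> int:
    """Number of concurrent memory requests of `request_bytes` each that this implies."""
    return math.ceil(bytes_in_flight(bandwidth_gb_per_s, latency_ns) / request_bytes)


if __name__ == "__main__":
    # Hypothetical memory system: 100 GB/s at 500 ns latency.
    print(bytes_in_flight(100, 500))
    print(outstanding_requests(100, 500))
```

A design with enough threads to keep that many requests outstanding can saturate the bandwidth even when each individual access is slow, which is why the argument here puts bandwidth ahead of latency.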

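Conversely, if you want low latency to pay off, do what a CPU does:
use a small number of processing units that work through data at high speed, and lean on low-latency caches to move data back and forth repeatedly.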
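
For the CPU-style case, the standard average-memory-access-time formula (hit time plus miss rate times miss penalty) shows why latency matters there. This is a minimal sketch with hypothetical cycle counts. It assumes a single thread whose accesses depend on one another, so none of them can overlap.

```python
def average_access_time(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average cycles per memory access: hit time plus the expected miss cost."""
    return hit_time + miss_rate * miss_penalty


def single_thread_cycles(num_accesses: int, hit_time: float,
                         miss_rate: float, miss_penalty: float) -> float:
    """Total cycles for dependent accesses on one thread, with no overlap assumed."""
    return num_accesses * average_access_time(hit_time, miss_rate, miss_penalty)


if __name__ == "__main__":
    # Hypothetical figures: 4-cycle cache hit, 5% miss rate, two different miss penalties.
    print(single_thread_cycles(1000, 4, 0.05, 200))
    print(single_thread_cycles(1000, 4, 0.05, 20))
```

With nothing else to switch to, every cycle of latency lands directly on the runtime, so a faster cache or memory helps in a way it does not when hundreds of threads are available to cover the wait.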

[ Last edited by lkig on 2013-6-6 17:36 ]
