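Originally posted by JimmyC on 2011-2-1 09:14
The SGX543 figures are wrong.
Each SGX543 core should be USSE2 x4.
SGX543MP4+ = USSE2 x16
2 FP32 x 16 x 550 MHz = 17.6 GFLOPS
And the SGX543MP4+ uses a TBDR architecture,
so real-world performance is that times three.
(ARM's Mali GPU spec sheet simply lists 1.6G as 3.2G.)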
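The peak-throughput arithmetic in the post above can be reproduced with a few lines. This is only a sketch of the poster's own calculation: the pipe count, the 2 FP32 operations per pipe per clock, and the 550 MHz clock are the poster's figures, not confirmed specifications, and `peak_gflops` is a hypothetical helper name.

```python
def peak_gflops(flops_per_pipe_per_clock: int, pipes: int, clock_mhz: float) -> float:
    """Theoretical peak in GFLOPS: ops per clock per pipe x pipes x clock."""
    return flops_per_pipe_per_clock * pipes * clock_mhz / 1000.0


pipes_per_core = 4   # USSE2 x4 per SGX543 core (poster's figure)
cores = 4            # MP4 configuration
clock_mhz = 550      # poster's assumed clock, not a confirmed spec

peak = peak_gflops(2, pipes_per_core * cores, clock_mhz)
print(peak)  # 17.6
```

The "times three" TBDR factor is a separate claim that is disputed later in the thread, so it is left out of the calculation.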
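Originally posted by u571 on 2011-2-1 11:53
Modern GPU performance comes from floating-point compute. There is a gap between NV and AMD, but nothing like being only a fraction of the other.
Besides, the SGX543 doesn't support most floating-point texture formats, so it can't even do basic HDR.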
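Originally posted by 黑龙 on 2011-2-1 20:50
However hard you fight, I think there has to be a moral bottom line. After all, you are a person first and a fan of something second.
Why do anti-Sony fans think so little of themselves? To bash Sony they fabricate things out of thin air and resort to every underhanded trick... How many rumors have you people made up over the past few years? And without so much as a blush ...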
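Originally posted by JimmyC on 2011-2-1 19:26
ARM, whose Mali uses a TBR architecture, likewise officially lists what is actually 1600M Pix/s as 3200M Pix/s.
See how ARM explains it:
http://blogs.arm.com/multimedia/ ... -pixel-not-a-pixel/
A rhetorical question:
Do you think something that requires 2000M Pix/s ...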
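Originally posted by u571 on 2011-2-2 07:34
Ridiculous. How is the G70's HSR a joke? Every G70 runs DX9.0c games with HSR on. Name one game where enabling 8x AA drops performance to one seventh.
ARM and PowerVR hype the TBR architecture as if it were miraculous, so why didn't Intel keep using it on the desktop?
And the TBR architecture's so-called elimination of non ...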
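Originally posted by JimmyC on 2011-2-2 13:42
Whether it's RSX or G70, that HSR is fake HSR.
Turning on real HSR drops performance to 1/7.
If you want to know the difference between real and fake, go ask on a certain NVIDIA fan forum.
The admin there replied to the post comparing RSX and SGX543MP4+.
While you're at it, you can ask what advantages TBDR has,
1000MP/s TBDR versus 2000MP/s non-TB ...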
Originally posted by hourousha on 2011-2-2 14:29
HSR at 1/7 performance? LOL. Here's a 7800 review, which includes z-rejection performance:
http://www.beyond3d.com/content/reviews/38/8
A brief explanation: when the render order is back to front, early-Z rejection does not kick in. So the pixel pipelin ...
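The point about render order can be illustrated with a toy model of one pixel covered by several opaque layers. This is a minimal sketch, not a model of any particular GPU: `shaded_fragments` is a hypothetical helper, smaller depth means nearer, and the depth test is assumed to be LESS.

```python
def shaded_fragments(layer_depths, early_z=True):
    """Count fragment-shader runs for one pixel, with layers drawn in the given order."""
    nearest = float("inf")   # cleared depth buffer
    shaded = 0
    for depth in layer_depths:
        passes = depth < nearest          # LESS depth test
        if early_z and not passes:
            continue                      # rejected before the shader runs
        shaded += 1                       # fragment shader executes
        if passes:
            nearest = depth               # depth write
    return shaded


layers = [0.1, 0.3, 0.5, 0.7]                         # near to far
front_to_back = shaded_fragments(layers)              # only the nearest layer is shaded
back_to_front = shaded_fragments(layers[::-1])        # every layer passes, nothing is rejected
no_early_z = shaded_fragments(layers, early_z=False)  # every layer is shaded
print(front_to_back, back_to_front, no_early_z)       # 1 4 4
```

In this model, back-to-front submission costs exactly as much as having no early-Z at all, because each new layer is nearer than what is already stored.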
Originally posted by JimmyC on 2011-2-3 19:27
Only from G80 onward is early z-rejection truly supported.
http://www.gamedev.net/topic/576 ... on-on-g70-hardware/
To take advantage of this all 3D and 2D applications should use opaque objects (blending off, alpha test off, no discard in shader) as much as possible so that the HSR process can reduce fragment processing to a minimum. These should be rendered first, before any objects with transparency. Examples of this kind of sprites could be background graphics, terrain tiles, pop-up message windows.
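Originally posted by hourousha on 2011-2-4 01:34
LOL. The link you gave only shows that early-Z is disabled during alpha test (in most normal cases it stays active). What is the logical connection between that and your claims that "the G70's HSR is a historical joke" or that "fps drops to 1/7"?
Come to think of it, you would do better to look into the thing in PVR you are so proud of, the TB ...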
Originally posted by JimmyC on 2011-2-4 14:30
Not only that. It is also disabled in at least these cases (fps drops to 1/10~1/30):
-use kill/clip in pixelshader
-change compare func
-modify depth
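Fine, if you insist this still counts as complete HSR, there's nothing I can do.
Then the official G80 documentation and the Nvidia GPU Programming Guide were written for nothing.
USSE2's TBDR performance is already double that of USSE (16z vs. 8z).
Likewise with MBX: Sega's Aurora (a 2008 product) has dedicated optimization for transparent/incomplete triangles.
Back in the PowerVR Series 2 days, the Dreamcast also did alpha test with HW front, and was twice as fast as the PC version at the same clock.
It can't be ruled out that the SGX543MP4+ has hardware-accelerated alpha test. Even if it doesn't, it still has 64z, which is eight times the Galaxy S.
How big is the performance gap between the 200MHz Galaxy S (SGX540) and the 240MHz Tegra2 GPU?
Even if you're not an NVIDIA fan, you can refer to the promotional PDF Nvidia released on January 26 this year, which claims 110~150%; in practice it's about 110~125%.
Nvidia then claims the Tegra2's GPU performance is that of a low-end G80 (Tegra1 being a low-end GeForce 6).
If you want to scoff, scoff at NV too. After all, the SGX543MP4+ has more than eight times the same-clock performance of this "low-end G80".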
For sprites with transparent areas, create polygons that are optimal for the visible area and exclude fragments that are completely transparent. If an application was to render a simple triangular shaped tree texture on a quad polygon, there would be large, empty areas that would need to be blended. A better approach in this situation would be to use a triangle that tightly fits the shape of the texture. By doing so, most of the empty area that would have to be blended when using a quad to render the tree sprite can be removed, which means there are fewer fragments to blend. Geometry used to tightly fit sprites in a given application should be kept as simple as possible while eliminating as many unwanted fragments as possible. Finding the balance between geometric complexity and the empty space that will be removed by using more complex geometry is a balance that is very application and platform specific. A tool such as the one described here: http://www.humus.name/index.php?page=Cool&ID=8 can be used to generate the geometry required.
For further optimisation, when rendering sprites with partially transparent areas, break each object down into an area that can be rendered as an opaque sprite and a second area of partially transparency that can be blended. By taking this approach, the number of fragments that need to be blended for each sprite can be significantly reduced, which allows the HSR process to provide a "super" fill rate. In order to maintain sprite ordering, use of the depth buffer will be required - each sprite will need a unique offset to avoid artefacts. Generating the areas for this technique can be done with a similar tool to that mentioned above, but this time looking for opaque pixels instead of completely transparent. As stated previously, the opaque objects should be drawn first followed by the blended objects as this will allow the blended objects to gain the most benefit possible from the hardware's HSR process.
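The ordering advice in the quoted FAQ text (opaque objects first, blended objects afterwards) can be sketched as a simple draw-list partition. `Sprite` and `draw_order` are hypothetical names. Sorting the blended sprites far-to-near is a common convention assumed here for correct blending; the quoted text itself only asks for a depth buffer with a unique offset per sprite.

```python
from dataclasses import dataclass


@dataclass
class Sprite:
    name: str
    depth: float   # assumption: larger value = farther from the camera
    opaque: bool   # blending off, alpha test off, no discard in shader


def draw_order(sprites):
    """Opaque sprites first (submission order kept), then blended ones far-to-near."""
    opaque = [s for s in sprites if s.opaque]
    blended = sorted((s for s in sprites if not s.opaque),
                     key=lambda s: s.depth, reverse=True)
    return opaque + blended


scene = [
    Sprite("smoke", 0.2, opaque=False),
    Sprite("terrain", 0.9, opaque=True),
    Sprite("glass", 0.6, opaque=False),
    Sprite("popup", 0.1, opaque=True),
]
order = [s.name for s in draw_order(scene)]
print(order)  # ['terrain', 'popup', 'glass', 'smoke']
```

The opaque group is left unsorted on purpose: the FAQ relies on the hardware's HSR to discard hidden opaque fragments, so only the opaque/blended split matters in this sketch.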
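Originally posted by hourousha on 2011-2-4 15:33
1: Please give the source for fps dropping to 1/10 to 1/30. That conclusion is remarkable in itself: what is the baseline for the drop to 1/10? What are the preconditions? If HSR being disabled during alpha test alone makes fps drop to 1/10, doesn't that mean alpha test accounts for over 90% of the total rendering cost and that alp ...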
Originally posted by JimmyC on 2011-2-4 16:39
early-z exists since gf3, like mentioned before. it is disabled if you
-enable alpha test
-use kill/clip in pixelshader
-change compare func
in order to get speed again on G70, you need to work around your alpha-testing.
this is critical, otherwise you pretty much run without optimization and then you're easily 10 to 30 times slower.
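Search for the documentation of any Dreamcast emulator yourself.
The commands of the DC's PowerVR2 are divided into ZWrite, Alpha ZWrite, and so on.
The latter can boost fps several-fold.
This hardware-accelerated command exists only in the DC version of the PowerVR2; the Neon250 graphics card doesn't have it.
The MBX used in Sega's arcade boards also has this command, but the one in the iPhone 2G/3G doesn't.
This proves Imgtec had a solution early on but did not adopt it everywhere.
Isn't it too early to scoff at this point before the SGX543MP4+'s specs are even clear?
PowerVR Insider has no material on the SGX543, let alone the SGX543MP4+, nor anything on console chips.
The most recent is the development recommendations for the SGX540, published in 2007.
Compared with USSE, USSE2 doubles per-pipeline shader/TBDR/hidden-surface performance (8z → 16z, 1D → 2D, Vec2 → Vec4) and supports more hardware acceleration.
Impressive that you can, without blushing, use 2005 USSE material to bash the 2009 USSE2.
How is that off-topic?
RSX: a cut-down G70 (7800) (8:24:24:8)
The 240MHz Tegra2, clocked 20% higher than the SGX543MP4+ and 10~25% faster: a low-end G80; the lowest-end G80 is the 8300GS (8:8:4)
You don't dare scoff at the former,
but when it's said that the SGX543MP4+, with more than eight times Tegra2's same-clock performance, approaches the 8600GT (32:16:8)/RSX, you have to scoff.
The funny thing is that the SGX543MP4+'s clock isn't even known yet.
The OMAP4440 (45nm) of 2011 Q1 already runs at 380MHz,
and yet you're still bashing with 200MHz figures.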
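Originally posted by hourousha on 2011-2-4 18:40
So that's where 1/10-1/30 comes from. LOL. Somebody just says it on a forum, with no data to back it up, no description of the setup, and no proof that the problem is caused by HSR being disabled, and you go and preach it as truth. Well done...
The one who said the RSX's HSR is a joke and fake HSR is you, not me; ...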
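Originally posted by qjw363924793 on 2011-2-4 18:45
The NGP can be two and a half years ahead of phones? Sure, because the NGP launches in two and a half years, so imagined things are always incredibly powerful. In early 2014, if the NGP's hardware is ahead of the highest-end phones, I die; if it isn't, the poster above dies. Does the idiot above dare bet his life? Dig this thread up in 2014.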
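Originally posted by JimmyC on 2011-2-4 20:21
G70 and earlier can only do coarse level Z and Stencil culling.
Only G80 and later can do fine-grained Z and Stencil culling.
Coarse-grained Z: Coarse Z, Hierarchical Z, Hi-Z, or ZCULL
Fine-grained Z: Fine Z, Early Z, Early Z Checking, Early Z Out
Fine, so this isn't a cut-down;
fine-grained Z and Stencil culling is redundant,
and "skip the shading of occluded pixels" is really a useless garbage feature.
The G70 without it already has complete HSR.
The G70 without it is the real HSR.
The G80 with it is actually the fake HSR.
Did I get that right?
1/7 and 1/10-30 are all results other people got from actually programming with HSR on G70.
Nvidia naturally won't say outright how much slower it is, but a quick search turns up plenty of discussion on this.
When I post discussion links, I get scoffed at because they were found by searching and unofficial sources don't count.
But I can't write code myself; why don't you write some and see?
Also, MBX is a product from five years ago.
Wasn't it you who used the 2005 USSE to bash the 2009 USSE2?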
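Originally posted by hourousha on 2011-2-4 22:50
LOL, your logic is really lacking. The G70's early-Z has limitations, but it isn't fake HSR, still less a joke. It's simple: the test results given in post #37 prove it, which is far better than you sitting here arbitrarily defining whether HSR is real or fake, a joke or not.
As for your saying G80 is fake HSR, I can only ...
The way to prove whether there is early-Z is to make z-cull fail. It's easy: just reverse the z test.
The results show that G8x is almost entirely unaffected by the z reversal, while G70 after the reversal performs the same as with no occlusion at all.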
Originally posted by JimmyC on 2011-2-4 23:25
First look at how coarse-grained Z and fine-grained Z are categorized:
Coarse-grained Z: Coarse Z, Hierarchical Z, Hi-Z, or ZCULL
Fine-grained Z: Fine Z, Early Z, Early Z Checking, Early Z Out
Then, does the G70 actually have Fine- ...
Early-Z Optimization
Early-z optimization (sometimes called “z-cull”) improves performance by avoiding the rendering of occluded surfaces. If the occluded surfaces have expensive shaders applied to them, z-cull can save a large amount of computation time. To take advantage of z-cull, follow these guidelines:
- Don’t create triangles with holes in them (that is, avoid alpha test or texkill)
- Don’t modify depth (that is, allow the GPU to use the interpolated depth value)
Violating these rules can invalidate the data the GPU uses for early optimization, and can disable z-cull until the depth buffer is cleared again.
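The guidelines quoted in this thread can be collected into a small checklist. This is a sketch only: `PipelineState` and `early_z_blockers` are hypothetical names, the flags mirror the conditions listed in the quoted texts (alpha test, kill/clip/texkill, depth modification, changed compare function), and nothing here queries a real driver or API.

```python
from dataclasses import dataclass


@dataclass
class PipelineState:
    alpha_test: bool = False
    shader_uses_kill: bool = False     # kill / clip / texkill in the pixel shader
    shader_writes_depth: bool = False  # shader replaces the interpolated depth
    depth_func_changed: bool = False   # compare function changed since the last clear


def early_z_blockers(state):
    """Return the reasons (per the quoted guidelines) why z-cull may be disabled."""
    reasons = []
    if state.alpha_test:
        reasons.append("alpha test enabled")
    if state.shader_uses_kill:
        reasons.append("kill/clip/texkill in pixel shader")
    if state.shader_writes_depth:
        reasons.append("shader modifies depth")
    if state.depth_func_changed:
        reasons.append("depth compare function changed")
    return reasons


opaque_pass = PipelineState()
foliage_pass = PipelineState(alpha_test=True, shader_uses_kill=True)
print(early_z_blockers(opaque_pass))   # []
print(early_z_blockers(foliage_pass))  # ['alpha test enabled', 'kill/clip/texkill in pixel shader']
```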
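Originally posted by TG春上春 on 2011-2-4 23:53
You guys can really argue, and you even make it look convincing. :D
Z-cull and early-Z aren't the same thing to begin with. Z-cull lives in the rasterizer; it's called coarse because it depth-tests per tile rather than per sample. Per-sample depth testing is done by the ZROP, the so-called fine-grained part. ZRO ...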
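The per-tile versus per-sample distinction can be sketched as follows. This is a generic hierarchical-Z illustration under simple assumptions (LESS depth test, one stored maximum depth per tile, a primitive of constant depth), not a description of how any specific NVIDIA part implements it; `coarse_z_reject` and `fine_z_test` are hypothetical names.

```python
def coarse_z_reject(tile_max_depth, prim_min_depth):
    """Per-tile test: reject the whole tile if even the primitive's nearest
    point lies at or behind the farthest depth already stored in the tile."""
    return prim_min_depth >= tile_max_depth


def fine_z_test(stored_depth, sample_depth):
    """Per-sample LESS test, as done at fine granularity."""
    return sample_depth < stored_depth


tile = [0.30, 0.35, 0.40, 0.32]   # stored depths of the samples in one tile
tile_max = max(tile)              # the single value the coarse test keeps

# A primitive entirely behind the tile is thrown away with one comparison.
hidden = coarse_z_reject(tile_max, 0.55)

# A primitive at depth 0.33 is not rejected coarsely, so each sample is tested.
undecided = coarse_z_reject(tile_max, 0.33)
survivors = [fine_z_test(d, 0.33) for d in tile]
print(hidden, undecided, survivors)  # True False [False, True, True, False]
```

The coarse test is conservative: it can only say "certainly hidden", so anything it lets through still needs the fine-grained test.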
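Originally posted by hourousha on 2011-2-4 23:56
So you've discovered another new continent, heh. Too bad you know only half the story.
This early-z rejection refers to a behavior: culling, before they enter the fragment shader, the fragments that would fail the z-test anyway, so as to avoid unnecessary computation. As for ...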
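Originally posted by JimmyC on 2011-2-5 00:15
By your standard,
even what Tegra1 supports now counts as real HSR, non-cut-down HSR...
(Tegra supports early-z rejection)
Sigh...
In that case I have nothing more to say...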
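Originally posted by hourousha on 2011-2-5 00:25
I don't know the details of Tegra, so don't drag me into that.
RacingPHT has an account on this forum too; why not just ask him about this directly?
In that post he also clearly said "because, first of all, z-cull can also be counted as early-z". In other words, the G70's Z-Cull is itself early-Z, only that later ...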
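Originally posted by JimmyC on 2011-2-5 00:55
I just found SCEE's official developer document, a PDF, 2009 edition.
Under suitable conditions, following every step and violating none of the recommendations, the RSX's early Z-cull can save a whole 10% GPU!
Haha, fine, I concede.
The RSX's HSR is "real" HSR, "non-cut-down" HSR,
even though its efficiency is only half that of G8X
and one sixth that of TBDR (taking x2.5).
RSX 2 z/stencil
SGX543MP4+ 64 z/stencil
The actual HSR efficiency of the two differs by a factor of 32.
Even if the RSX's HSR can only save 10%,
the point is that the RSX's is "real" HSR, "non-cut-down" HSR.
Speaking of which, who started the "What's so great about PowerVR having TBDR? The RSX has HSR too" topic?
Now we have the answer, heh.
I don't know RacingPHT well; you can ask him.
You seem thoroughly skeptical that CLX2 hardware-accelerates alpha test while doing TBDR.
Actually, an Imgtec employee is a regular on the beyond3d forums.
He's the one who said CLX2 has hardware-accelerated alpha test and twice the same-clock performance of the Neon250.
You can ask him how that was done twelve years ago.
(although just downloading any DC emulator already shows the zwrite/alpha test zwrite options)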
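Originally posted by hourousha on 2011-2-5 11:31
Please post the original text behind the 10% and its preconditions. If the original scene's depth complexity is just 1, or everything rendered is a transparent obj, then nothing is saved at all. Stop kidding around.
And times 2.5, and one sixth of TBDR. LOL. Either your arithmetic is superb or your brain is just too good, really ...
I believe what SimonF says.... Give me the link showing that I doubted PVR CLX2; don't just blurt out nonsense because you're flustered...
Likewise with MBX: Sega's Aurora (a 2005 product) has dedicated optimization for transparent/incomplete triangles.
Back in the PowerVR Series 2 days, the Dreamcast also did alpha test with HW front, and was twice as fast as the PC version at the same clock.
Optimizing transparent triangles? Look again at the Insider FAQ I gave you; it's covered there. Let me quote it for you again.
....
It has the developer split the blended geometry in advance into two sets, opaque and translucent, to minimize the amount of blending. Is this what you call hardware optimization of transparent/cut-out triangles? LOL...
Transparent object rendering has nothing to do with HSR in the first place.
You seem thoroughly skeptical that CLX2 hardware-accelerates alpha test while doing TBDR.
Who started it?
I discussed this question with him back in '08.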
Originally posted by JimmyC on 2011-2-5 15:08
10%
No preconditions. That page of the original is just these six lines. You don't have to believe it, heh.
Many games are fragment shader bound
•Rendering Z only ‘primes’ the RSX™ Z-cull unit
–Very fast, 16 pixels/clock rather than 8
–Render entire scene,
–Or ‘large’ meshes only
–Easily save 10% GPU
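The idea on the quoted slide, a depth-only pass that "primes" the z-cull unit before the shaded pass, can be sketched with the same one-pixel model used for early-Z. This is a hedged illustration: both helper names are hypothetical, smaller depth means nearer, and the second pass is assumed to use an EQUAL depth test with early rejection working ideally. It counts shader invocations only and says nothing about the 10% figure.

```python
def shaded_without_prepass(layer_depths):
    """Single pass with early-Z and a LESS test; the result depends on draw order."""
    nearest = float("inf")
    shaded = 0
    for depth in layer_depths:
        if depth < nearest:   # fragment survives early-Z and gets shaded
            nearest = depth
            shaded += 1
    return shaded


def shaded_with_prepass(layer_depths):
    """Pass 1 writes depth only (no fragment shader); pass 2 shades with an EQUAL test."""
    nearest = min(layer_depths)                                  # result of the Z-only pass
    return sum(1 for depth in layer_depths if depth == nearest)  # only the visible layer


worst_case = [0.7, 0.5, 0.3, 0.1]                 # back-to-front submission
without_prepass = shaded_without_prepass(worst_case)
with_prepass = shaded_with_prepass(worst_case)
print(without_prepass, with_prepass)              # 4 1
```

The trade-off is that the geometry is submitted twice, which is why the slide suggests rendering either the entire scene or only the 'large' meshes in the Z-only pass.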
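Why not directly calculate how much GPU the SGX and the RSX each save thanks to TBDR/z-cull?
On the RSX side, SCEE has directly given the answer: 10% GPU saved.
The SGX uses 400MP/s as if it were 1000MP/s, right?
How much is saved? How is it calculated? I don't know, heh.
Citing the twelve-year-old CLX2 / six-year-old MBX in support of USSE2 is not allowed,
but using the six-year-old USSE to bash USSE2 is fine, heh heh.
Can't you read it yourself?
Then post it.
And please don't back down: where is the evidence that I doubted CLX2?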
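Question 2.
Does CLX2/MBX have hardware-accelerated alpha test or not?
Question 3.
Can alpha test be hardware-accelerated under HSR rendering?
Question 4.
Has Imgtec ever had a design for hardware-accelerated alpha test under HSR rendering?
Question 5.
Why do you use that software workaround from PowerVR Insider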
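Originally posted by 马甲雷 on 2011-2-23 21:54
There's too much data and English, so I didn't read closely, but it seems Mr. Zhang asked when SCE ever said the NGP could rival the PS3. Let me testify: SCE has indeed never officially said the two are close in performance. Rather, the NGP's technology is even more "evolved" and can deliver "the highest quality", "the most realistic experience", and so on; far more than ...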
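Originally posted by AngelKillerr on 2011-2-24 08:17
Sigh.. kid, maybe you don't understand English and didn't watch the live stream. Sony said the NGP isn't the same as the PS3; it surpasses the PS3. And you can play the PS3 at home, then carry on playing PS3 games on the NGP when you go out~~~ You have to buy the same game twice!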
Welcome to TGFC Lifestyle (http://tgfcer.com/) | Powered by Discuz! 6.0.0