AVX, FMA notes http://img848.imageshack.us/img848/8104/x55.png
AVX(1) is the set of 256-bit FP instructions; AVX2 adds 256-bit integer instructions (it builds on the first).
(SSE is 128-bit)
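To make the widths concrete, here is a minimal sketch of how many elements fit in one register. The helper name `lanes` is made up for this note; the only inputs are the register widths quoted above and the standard 32-bit and 64-bit float sizes.

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of a given size that fit in one vector register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


# Register widths taken from the notes above: SSE = 128 bits, AVX = 256 bits.
for name, width in (("SSE", 128), ("AVX", 256)):
    print(name, "float32 lanes:", lanes(width, 32), "float64 lanes:", lanes(width, 64))
```

So going from SSE to AVX doubles the number of floats handled per instruction, which is where the theoretical 2x in FP throughput comes from.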
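FMA - a significant increase in FP performance, and in precision.
(the intermediate rounding error goes away: the result is rounded once instead of twice)
Intel will use the 3-operand variant (a=a+b×c); BD (Bulldozer) has the 4-operand one (d=a+b×c)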
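The precision point can be shown without FMA hardware. The sketch below emulates a fused multiply-add by computing a×b+c exactly with rationals and rounding once at the end, then compares it with the ordinary multiply-then-add, which rounds twice. The function names and the input values are picked for illustration only; this is an emulation of the rounding behaviour, not how the hardware is implemented.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: exact a*b + c, rounded to double only once at the end."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Ordinary sequence: the product is rounded, then the sum is rounded again."""
    product = a * b
    return product + c


# Inputs chosen so that the exact product needs more bits than a double holds.
a = 1.0 + 2.0 ** -27
b = 1.0 + 2.0 ** -27
c = -(1.0 + 2.0 ** -26)

print("separate:", separate_multiply_add(a, b, c))
print("fused:   ", fused_multiply_add(a, b, c))
```

With these inputs the exact product is 1 + 2^-26 + 2^-54. The separate version loses the 2^-54 term when the product is rounded, so the final sum cancels to zero; the fused version keeps it.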
BMI-
Bit manipulation instructions are useful for compressed database, hashing , large number arithmetic, and a variety of general purpose codes.
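For a feel of what these instructions compute, here is a rough Python model of a few BMI1 operations (ANDN, BLSI, BLSR, BLSMSK, TZCNT) on 64-bit values. The Python function names simply mirror the mnemonics, and `set_bit_positions` is a made-up helper showing a typical use; each of these is a single instruction on hardware that supports BMI.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit register


def andn(a: int, b: int) -> int:
    """ANDN: bitwise AND of the inverted first operand with the second."""
    return (~a & b) & MASK64


def blsi(x: int) -> int:
    """BLSI: isolate the lowest set bit (x & -x)."""
    return (x & -x) & MASK64


def blsr(x: int) -> int:
    """BLSR: clear the lowest set bit (x & (x - 1))."""
    return (x & (x - 1)) & MASK64


def blsmsk(x: int) -> int:
    """BLSMSK: mask of all bits up to and including the lowest set bit."""
    return (x ^ (x - 1)) & MASK64


def tzcnt(x: int) -> int:
    """TZCNT: number of trailing zero bits; 64 when the input is zero."""
    if x == 0:
        return 64
    return (x & -x).bit_length() - 1


def set_bit_positions(x: int) -> list:
    """Walk the set bits of x from lowest to highest using TZCNT and BLSR."""
    positions = []
    while x:
        positions.append(tzcnt(x))
        x = blsr(x)
    return positions


print(set_bit_positions(0b10110000))
```

The loop in `set_bit_positions` is the kind of pattern that shows up in bitmap indexes and hashing code, which is why the quote mentions compressed databases.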
Some instruction sets will be specific to Intel or to AMD. This will not negatively affect the ordinary user (at least it shouldn't); it matters more for servers
(FMA3/4,TBM,BMI,LWP,XOP)
===========================
http://imgk.zol.com.cn/diybbs/5201/a5200908_s.jpg IB
-------------
Knights Ferry/Knights Corner
http://hothardware.com/News/Intel-Discu ... -Products/
Like Larrabee, Knight's Corner (and future MIC products in general) utilize a CPU based on Intel's original Pentium architecture (P54C). Modifications include complete cache coherency, x86-64 compatibility, and 512-bit vector support capable of performing 16 single-precision floating point operations simultaneously.
"[P]erformance ranges dramatically based on applications from orders of magnitude improvements to incremental improvements" using the current chips [Knight's Ferry] that support only single-precision floating point operations.
You end up with two kinds of customers, one highly satisfied with [AMD and Nvidia graphics] accelerators because despite the tedious porting process, their results are very good. Others feel their time spent on porting [their apps to AMD and Nvidia chips] doesn’t justify the performance and there is a huge part of this second group for whom MIC is useful—and ultimately some of the first group may want MIC, too.
http://www.theregister.co.uk/2011/06/20 ... re_coding/
From Reinders' point of view, programmers need to switch their thinking from optimizing for the best possible performance measured by pure flops to optimizing instead for efficient data movement. The Knights' cores all share a coherent cache. "They're completely coherent with each other," he said. "And they have a snooping protocol and a cache directory, if you will, basis."
But coherency or no coherency, "The key to getting performance on [a MIC] is to have some data, keep it local, and nobody else touches it," Reinders told us.
The priority is to minimize data movement within the chip; otherwise performance drops.
(influenced by the ring bus?)
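The "have some data, keep it local, nobody else touches it" advice corresponds to a familiar partitioning pattern. The sketch below only illustrates the structure: each worker gets its own non-overlapping slice, accumulates into a private variable, and the partial results are combined once at the end. All names are made up for this note, and Python threads say nothing about real cache behaviour on a MIC chip.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, workers):
    """Give each worker one contiguous, non-overlapping slice of the data."""
    size = max(1, (len(data) + workers - 1) // workers)
    return [data[i:i + size] for i in range(0, len(data), size)]


def local_sum_of_squares(chunk):
    """Work only on the private chunk and accumulate into a local variable."""
    total = 0
    for value in chunk:
        total += value * value
    return total


def sum_of_squares(data, workers=4):
    """Combine per-worker partial results once, instead of sharing a counter."""
    chunks = split_into_chunks(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_sum_of_squares, chunks))
    return sum(partials)


print(sum_of_squares(list(range(10)), workers=3))
```

The point of the design is that no worker writes to memory another worker reads during the loop, so the only data that has to cross between cores is one partial sum per worker.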
Probably a less suitable chip for "latency (many threads) sensitive" tasks. For many single-threaded tasks running at the same time it could be a nice solution. From a design standpoint, the ring bus makes it easy to add more cores without a major change to the chip design.
Ring bus latency rises with the number of cores/stops on the bus, and varies with the distance between the cores that are communicating.
Nehalem and SB also have a ring bus, but there it runs at the full core clock and with a small number of cores (2-8c)
(SB reaches 384 GB/s at 3 GHz with a latency of 35-40 cycles; this chip will run at a lower clock and with more cores..)
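How the distance grows with the number of stops can be sketched with a toy model. It assumes a bidirectional ring where traffic takes the shorter direction and every stop counts as one hop; that is a simplification made up for this note, not a description of Intel's actual implementation, and it says nothing about cycles per hop.

```python
def ring_hops(src: int, dst: int, stops: int) -> int:
    """Hops between two stops on a bidirectional ring, taking the shorter way."""
    distance = abs(src - dst) % stops
    return min(distance, stops - distance)


def average_hops(stops: int) -> float:
    """Mean hop count from one stop to every other stop on the ring."""
    total = sum(ring_hops(0, dst, stops) for dst in range(1, stops))
    return total / (stops - 1)


# Core counts mentioned in these notes: up to 8 for SB-class chips, 32 and 50 for MIC.
for stops in (4, 8, 32, 50):
    print(stops, "stops: worst case", stops // 2, "hops, average", round(average_hops(stops), 2))
```

In this model both the worst case and the average grow roughly linearly with the number of stops, which matches the concern above about many cores on one ring.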
The current Aubrey Isle runs at 1.2 GHz on 45 nm with 32 cores (the chip is large; 22 nm is awaited),
the future product will be on 22 nm with 50 cores at ~1.5 GHz (PCIe 3.0)
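A back-of-the-envelope peak estimate from these figures, as a rough sketch only. It assumes each core issues one full-width vector operation per cycle on 16 single-precision lanes (the 512-bit figure from the quote above), and that the future part keeps the same vector width. Both are assumptions; the `flops_per_lane_op` parameter is hypothetical and left at 1, and would be 2 if a fused multiply-add is counted as two operations.

```python
def peak_ops_per_second(cores, lanes, clock_hz, flops_per_lane_op=1):
    """Theoretical peak: every lane of every core busy on every cycle."""
    return cores * lanes * clock_hz * flops_per_lane_op


# Core counts and clocks from the notes above; 16 lanes assumed from 512-bit vectors.
current = peak_ops_per_second(cores=32, lanes=16, clock_hz=1.2e9)
future = peak_ops_per_second(cores=50, lanes=16, clock_hz=1.5e9)

print("current: %.1f G lane-ops/s" % (current / 1e9))
print("future:  %.1f G lane-ops/s" % (future / 1e9))
print("ratio:   %.2fx" % (future / current))
```

These are upper bounds under the stated assumptions; as the quotes above stress, real performance depends mostly on how well data movement is kept under control.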
So what is this chip, actually? Is it a GPU or a CPU?
It is not a GPU: the circuits we would find in a GPU are missing (there is no video output either).
It is not a standalone CPU either (even though it contains many "x86 compatible" cores): for the server to work, a CPU (Xeon..) is needed as well.
So the answer to the question "Does it run Crysis?" is no in both cases.
That does not matter, because it is a server product meant for HPC. Top Xeons are >$3000; I think the price of KC could climb that high.
The main draw should be "easier programming than for a GPU"; keeping it company will be: Tesla, Firestream, TileGX, and possibly FPGA accelerators