by EvaCross » 01-06-2006 06:17
[quote="LoveProof"]What do you mean, operation? Huh? ...you'll see that if I encode a video to DivX on CELL using only its PPE... I'd have to wait a very long time for it to finish... compared with a simple single-core 3.2 GHz Intel
Ahem...
[quote]
Cell Processor to Run at 4.6 GHz
When Sony reveals the technical details of the highly-anticipated advanced microprocessor code-named Cell, they will announce that the new processor will run at 4.6 gigahertz. IBM, Sony and Toshiba will present four technical papers next month at the International Solid State Circuits Conference to be held in San Francisco.
The Cell is a multicore chip comprising a 64-bit Power processor core and multiple synergistic processor cores capable of massive floating point processing. Cell is optimized for compute-intensive workloads and broadband rich media applications, including computer entertainment, movies and other forms of digital content.
Sony plans to use the Cell to power its next generation PlayStation as well as home servers for broadband content and high-definition televisions.
Other highlights of the Cell processor design include:
-Multi-thread, multicore architecture.
-Supports multiple operating systems at the same time.
-Substantial bus bandwidth to/from main memory, as well as companion chips.
-Flexible on-chip I/O (input/output) interface.
-Real-time resource management system for real-time applications.
-On-chip hardware in support of security system for intellectual property protection.
-Implemented in 90 nanometer (nm) silicon-on-insulator (SOI) technology
http://news.teamxbox.com/xbox/7511/Cell ... at-46-Ghz/[color=red]
LoveProof, wasn't it supposedly unable to run several operating systems at the same time?[/color]
IBM's CELL Processor: Preview to Greatness?
IBM, in cooperation with fellow industry giants Sony and Toshiba, recently announced a new processor, called the Cell. While this in itself is not really cause for much celebration (except perhaps for the odd bedfellows involved in the project) the new multi-core chip might well prove to be something special. For one thing, it's going to be at the heart of Sony's upcoming PlayStation 3 console, which we have a feeling may be slightly popular…
The Cell processor is vastly different from conventional processors inside. This tiny chip contains a powerful 64-bit Dual-threaded IBM PowerPC core but also eight proprietary 'Synergistic Processing Elements' (SPEs), essentially eight more highly specialized mini-computers on the same die.
It's these SPEs that make the Cell architecture special, as you might guess. IBM describes the product as a 'System on a Chip.' Like IBM's Power5 processors, multi-processing is built right into the die.
PCstats is going to take a quick look at what's currently known about the architecture of the Cell processor and its potential as a rival to today's x86-based 32 and 64-bit processors.
The basics: Cell Biology
As we mentioned, the prototype Cell processor is composed of a single 64-bit RISC PowerPC processor and eight SPE 32-bit units. These are bound together by a fast internal bus, the Element Interconnect Bus (EIB). A built-in dual channel memory controller is included, and connects to a current maximum of 256MB of extremely fast Rambus XDR memory. Communication with the rest of the system is provided by the FlexIO bus. This interface also allows high speed, chip-to-chip communication between different Cell processors, either inside or outside the same computer system.
The prototype Cell processor ran at 4GHz, and according to IBM, is capable of a theoretical maximum of 256Gflops, thus placing it instantly at the forefront of potential multi-chip supercomputer designs of the future. The chip is built on a 90nm process and contains 234 million transistors.
The Cell is allegedly capable of dynamic power management (perhaps a variant of Transmeta's LongRun power management technology?), throttling itself to suit the current processing load. Internal temperature sensors are also present as you would anticipate.
The Synergistic Processing Elements: Cellular Engines
Each 'Synergistic Processing Element' (SPE) is a powerful processor in its own right, focused on one thing: churning through single precision and double precision mathematical calculations. As each SPE component is a subordinate part of the whole Cell processor, they can dispense with a lot of the complex instruction queuing logic that a typical modern single processor computer needs. Instead, the focus is on speed. Each SPE gets its orders from the 64-bit PowerPC unit, which handles the scheduling and parceling of data out to the SPEs.
Each SPE has 256KB of on-die memory allotted only to it, but instead of being used as conventional cache memory, this small area of high-speed storage is actually addressed almost like typical system RAM. The L1 cache memory found in conventional processors is highly automated, making it simple to program for but adding overhead. With the Cell processor, programmers can dictate exactly how they wish their software to use the 256KB allotment available to each SPE. This allows execution efficiency to be increased with good software design. It also allows better memory management and security from buffer overflows and other exploits.
Individual SPEs should be able to pass data to each other by storing it in specific areas of the system memory, forming a chain of processing units each performing a different operation on the data. Obviously this requires an extremely fast interface between the system RAM and the SPEs, which the Cell has in spades… More on this later.
ISSCC 2005: The CELL Processor
The fundamental task of a processor is to manage the flow of data through its computational units. However in the past two decades, each successive generation of processors for personal computers has added more transistors dedicated to increasing the performance of spaghetti-like integer code. For example, it is well known that typical integer codes are branchy and that branch mispredict penalties are expensive; in an effort to minimize the impact of branch instructions, transistors were used to develop highly accurate branch predictors. Aside from branch predictors, sophisticated cache hierarchies with large tag arrays and predictive cache prefetch units attempt to hide the complexity of data movement from the software, and further increase the performance of single threaded applications. The pursuit of single threaded performance can be observed in recent years in the proposal of extraordinarily deeply pipelined processors designed primarily to increase the performance of single threaded applications, at the cost of higher power consumption and larger transistor budgets.
The fundamental idea of the CELL processor project is to reverse this trend and give up the pursuit of single threaded performance, in favor of allocating additional hardware resources to perform parallel computations. That is, minimal resources are devoted toward the execution of single threaded workloads, so that multiple DSP-like processing elements can be added to perform more parallelizable multimedia-type computations. In the examination of the first implementation of the CELL processor, the theme of the shift in focus from the pursuit of single threaded integer performance to the pursuit of multiply threaded, easily parallelizable multimedia-type performance is repeated throughout.
CELL Basics
The CELL processor is a collaboration between IBM, Sony and Toshiba. The CELL processor is expected by this consortium to provide computing power an order of magnitude above and beyond what is currently available to its competitors. The International Solid-State Circuits Conference (ISSCC) 2005 was chosen by the group as the location to describe the basic hardware architecture of the processor and announce the first incarnation of the CELL processor family.
Members of the CELL processor family share basic building blocks, and depending on the requirement of the application, specific versions of the CELL processor can be quickly configured and manufactured to meet that need. The basic building blocks shared by members of the CELL family of processors are the following:
The PowerPC Processing Element (PPE)
The Synergistic Processing Element (SPE)
The L2 Cache
The internal Element Interconnect Bus (EIB)
The shared Memory Interface Controller (MIC) and
The FlexIO interface
Each SPE is in essence a private system-on-chip (SoC), with the processing unit connected directly to 256KB of private Local Store (LS) memory. The PPE is a dual threaded (SMT) PowerPC processor connected to the SPE's through the EIB. The PPE and SPE processing elements access system memory through the MIC, which is connected to two independent channels of Rambus XDR memory, providing 25 GB/s of memory bandwidth. The connection to I/O is done through the FlexIO interface, also provided by Rambus, providing 44.8 GB/s of raw outbound bandwidth and 32 GB/s of raw inbound bandwidth for total I/O bandwidth of 76.8 GB/s. At ISSCC 2005, IBM announced that the first implementation of the CELL processor has been tested to operate at frequencies above 4 GHz. In the CELL processor, each SPE is capable of sustaining 4 FMADD operations per cycle. At an operating frequency of 4 GHz, the CELL processor is thus capable of achieving a peak throughput rate of 256 GFlops from the 8 SPE's. Moreover, the PPE can contribute some amount of additional compute power with its own FP and VMX units.
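The 256 GFlops figure follows directly from the numbers quoted above. A rough check in Python, where the function names are made up for illustration and counting each FMADD as two floating point operations (a multiply plus an add) is the usual convention rather than something the article spells out:

```python
def peak_sp_gflops(num_spes, fmadds_per_cycle, clock_ghz, flops_per_fmadd=2):
    """Theoretical single precision peak in GFlops for the SPEs alone."""
    # flops_per_fmadd=2 is an assumption: one multiply plus one add per FMADD.
    return num_spes * fmadds_per_cycle * flops_per_fmadd * clock_ghz


def total_flexio_gb_per_s(outbound, inbound):
    """Raw FlexIO bandwidth in GB/s, outbound plus inbound."""
    return outbound + inbound


# Figures quoted in the article: 8 SPEs, 4 FMADDs per cycle, 4 GHz.
spe_peak = peak_sp_gflops(num_spes=8, fmadds_per_cycle=4, clock_ghz=4.0)
io_total = total_flexio_gb_per_s(outbound=44.8, inbound=32.0)
print(f"SPE peak: {spe_peak:.1f} GFlops, FlexIO total: {io_total:.1f} GB/s")
```

This is a theoretical ceiling only; it ignores the PPE's own FP and VMX units and says nothing about what real code sustains.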
[img]http://www.realworldtech.com/includes/images/articles/cell-1.gif[/img]
Figure 1 shows the die photo of the first CELL processor implementation with 8 SPE's. The sample processor tested was able to operate at a frequency of 4 GHz with Vdd of 1.1V. The power consumption characteristics of the processor were not disclosed by IBM. However, estimates in the range of 50 to 80 Watts @ 4 GHz and 1.1 V were given. One unconfirmed report claims that at the extreme end of the frequency/voltage/power spectrum, one sample CELL processor was observed to operate at 5.6 GHz with 1.4 V Vdd and consumed 180 W of power.
As described previously, the CELL processor with 8 SPE's operating at 4 GHz has a peak throughput rate of over 256 GFlops. To provide the proper balance between processing power and data bandwidth, an enormously capable system interconnect and memory system interface are required for the CELL processor. For that task, the CELL processor was designed as a Rambus Sandwich, with Redwood Rambus ASIC Cell (RRAC) acting as the system interface on one end of the CELL processor, and the XDR (formerly Yellowstone) high bandwidth DRAM memory system interface on the other end of the CELL processor. Finally, the CELL processor has 2954 C4 contacts to the 3-2-3 organic package, and the BGA package is 42.5 mm by 42.5 mm in size. The BGA package contains 1236 contacts, 506 of which are signal interconnects and the remainder are devoted to power and ground interconnects.
[img]http://www.realworldtech.com/includes/images/articles/cell-2.gif[/img]
The first incarnation of the CELL processor is implemented in a 90nm SOI process. IBM claims that while the logic complexity of each pipeline stage is roughly comparable to other processors with a per stage logic depth of 20 FO4, aggressive circuit design, efficient layout and logic simplification enabled the circuit designers of the CELL processor to reduce the per stage circuit delay to 11 FO4 throughout the entire design. The design methodology deployed for the CELL processor project provides an interesting contrast to that of other IBM processor projects in that the first incarnation of the CELL processor makes use of fully custom design. Moreover, the full custom design includes the use of dynamic logic circuits in critical data paths. In the first implementation of the CELL processor, dynamic logic was deployed for both area minimization as well as performance enhancement to reach the aggressive goal of 11 FO4 circuit delay per stage. Figure 2 shows that with the circuit delay depth of 11 FO4, oftentimes only 5~8 FO4 are left for inter-latch logic flow.
The use of dynamic logic presents itself as an interesting issue in that dynamic logic circuits rely on the capability of logic transistors to retain a capacitive load as temporary storage. The decreasing capacitance and increasing leakage of each successive process generation means that dynamic logic design becomes more challenging with each successive process generation. In addition, dynamic circuits are reportedly even more challenging on SOI based process technologies. However, circuit design engineers from IBM believe that the use of dynamic logic will not present itself as an issue in the scalability of the CELL processor down to 65 nm and below. The argument was put forth that since the CELL processor is a full custom design, the task of process porting with dynamic circuits is no more and no less challenging than the task of process porting on a design without dynamic circuits. That is, since the full custom design requires the re-examination and re-optimization of transistor and circuit characteristics for each process generation, if a given set of dynamic logic circuits become impractical for specific functions at a given process node, that set of circuits can be replaced with static circuits as needed.
The process portability of the CELL processor design is an interesting topic due to the fact that the prototype CELL processor is a large device that occupies 221 mm² of silicon area on the 90 nm process. Comparatively, the IBM PPC970FX processor has a die size of 62 mm² on the 90 nm process. The natural question then arises as to whether Sony will choose to reduce the number of SPE's to 4 for the version of the CELL processor to appear in the next generation Playstation, or keep the 8 SPE's and wait for the 65 nm process before it ramps up the production of the next generation Playstation. Although no announcements or hints have been given, IBM's belief in regards to the process portability of the CELL processor design does bode well for the 8 SPE path since process shrinks can be relied on to bring down the cost of the CELL processor at the 65 nm node and further at the 45 nm node.
Floating Point Capability
As described previously, the prototype CELL processor's claim to fame is its ability to sustain a high throughput rate of floating point operations. The peak rating of 256 GFlops for the prototype CELL processor is unmatched by any other device announced to date. However, the SPE's are designed for speed rather than accuracy, and the 8 floating point operations per cycle are single precision (SP) operations. Moreover, these SP operations are not fully IEEE754 compliant in terms of rounding modes. In particular, the SP FPU in the SPE rounds to zero. In this manner, the CELL processor reveals its roots in Sony's Emotion Engine. Similar to the Emotion Engine, the SPE's single precision FPU also eschewed rounding mode trivialities for speed. Unlike the Emotion Engine, the SPE contains a double precision (DP) unit. According to IBM, the SPE's double precision unit is fully IEEE854 compliant. This improvement represents a significant capability, as it allows the SPE to handle applications that require DP arithmetic, which was not possible for the Emotion Engine.
Naturally, nothing comes for free and the cost of computation using the DP FPU is performance. Since multiple iterations of the same FPU resources are needed for each DP computation, peak throughput of DP FP computation is substantially lower than the peak throughput of SP FP computation. The estimate given by IBM at ISSCC 2005 was that the DP FP computation in the SPE has an approximate 10:1 disadvantage in terms of throughput compared to SP FP computation. Given this estimate, the peak DP FP throughput of an 8 SPE CELL processor is approximately 25~30 GFlops when the DP FP capability of the PPE is also taken into consideration. In comparison, Earth Simulator, the machine that previously held the honor as the world's fastest supercomputer, uses a variant of NEC's SX-5 CPU (0.15um, 500 MHz) and achieves a rating of 8 GFlops per CPU.
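The double precision estimate can be reproduced from the 10:1 ratio. A minimal sketch, with hypothetical names, that only covers the SPEs: the article's 25~30 GFlops range also includes the PPE, whose double precision contribution is not given, so it is left out here.

```python
def dp_peak_gflops(sp_peak_gflops, sp_to_dp_ratio):
    """Approximate double precision peak from the single precision peak."""
    return sp_peak_gflops / sp_to_dp_ratio


# Figures from the article: 256 GFlops SP peak, roughly 10:1 SP-to-DP ratio.
spe_dp_peak = dp_peak_gflops(sp_peak_gflops=256.0, sp_to_dp_ratio=10.0)

# Earth Simulator per-CPU rating quoted in the article.
earth_simulator_cpu_gflops = 8.0
advantage = spe_dp_peak / earth_simulator_cpu_gflops

print(f"SPE-only DP peak: {spe_dp_peak:.1f} GFlops")
print(f"Ratio to one Earth Simulator CPU: {advantage:.1f}x")
```

Both numbers are peak ratings, so the comparison says nothing about sustained performance on real workloads.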
Clearly, the CELL processor contains enough compute power to present itself as a serious competitor not only in the multimedia-entertainment industry, but also in the scientific community that covets DP FP performance. That is, if the non-trivial challenges presented by the programming model of the CELL processor can be overcome, the CELL processor may be a serious competitor in applications that its predecessor, the Emotion Engine, could not cover.
More here:
http://www.realworldtech.com/page.cfm?A ... 1005084318
Benchmark comparison: Pentium 4 vs. CELL.
[img]http://www.research.ibm.com/journal/sj/451/damor7.gif[/img]
[img]http://www.research.ibm.com/journal/sj/451/damor6.gif[/img]
Ray-Tracing may be the future of computer graphics
IBM's TRE demo, these are the pics from the demo.
[img]http://img.photobucket.com/albums/v665/snowcrash007/cell-rt2.jpg[/img]
[img]http://img.photobucket.com/albums/v665/snowcrash007/cell-rt1.jpg[/img]
TRE 720P Performance
2.0 GHz Apple G5: 0.6 frames/sec, 40% of cycles spent waiting for memory
3.2 GHz Cell: 30.0 frames/sec, 1% of cycles spent waiting for memory
Cell has 50x advantage
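The 50x claim is just the ratio of the two frame rates. A quick check, which also normalizes by clock speed; that per-clock figure is my own derived number and not part of IBM's demo data:

```python
# Figures from the TRE 720P demo numbers quoted above.
g5_fps, g5_clock_ghz = 0.6, 2.0
cell_fps, cell_clock_ghz = 30.0, 3.2

speedup = cell_fps / g5_fps                 # raw frame-rate advantage
clock_ratio = cell_clock_ghz / g5_clock_ghz # how much faster Cell is clocked
per_clock_speedup = speedup / clock_ratio   # derived, not stated in the demo

print(f"Raw speedup: {speedup:.1f}x")
print(f"Clock ratio: {clock_ratio:.2f}x")
print(f"Speedup per clock: {per_clock_speedup:.2f}x")
```

So even after accounting for the higher clock, most of the gap comes from the architecture and from the much smaller share of cycles spent waiting on memory.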
BLURB
As nVidia has shown with their Dawn demos, GPUs with programmable shaders are capable of creating a high degree of realism. Yet there are still those, including our own Goatguy, who argue that ray-tracing is a superior image creation technique. According to the Scientific Computing and Imaging Institute:
While traditional z-buffer based graphics accelerators operate by "pushing" all of the geometry in a scene "through" the screen, ray-tracing works by "casting" a ray through each pixel of the screen and finding the scene geometry intersected by that ray. Though the final images produced by these two approaches are often identical (though not always: ray-tracers can often produce images with much greater visual richness), the computational complexities are quite different. The traditional graphics pipeline scales linearly with the number of polygons in the scene, and sub-linearly with the number of pixels on the screen. That is, for twice as many polygons, a typical graphics card will take twice as long to render the scene. In contrast, ray-tracing scales sub-linearly with the number of polygons. So for twice as many polygons, ray-tracing will typically require far less than twice as long to produce a complete final rendering.
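To make the "one ray per pixel" idea concrete, here is a toy ray caster in Python. It is only a sketch under my own assumptions: the scene is a list of spheres, the camera sits at the origin looking down the negative z axis, and every function name is invented for illustration. It tests every sphere for every ray, so it scales linearly with the number of objects; the sub-linear scaling described above needs a spatial acceleration structure, which is not shown here.

```python
import math


def hit_sphere(origin, direction, center, radius):
    """Return the distance along the ray to the nearest hit, or None."""
    ox, oy, oz = (origin[i] - center[i] for i in range(3))
    dx, dy, dz = direction
    # Solve a*t^2 + b*t + c = 0 for the ray parameter t.
    a = dx * dx + dy * dy + dz * dz
    b = 2.0 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the ray misses the sphere entirely
    sqrt_disc = math.sqrt(disc)
    for t in ((-b - sqrt_disc) / (2.0 * a), (-b + sqrt_disc) / (2.0 * a)):
        if t > 0.0:
            return t  # first intersection in front of the camera
    return None


def render(width, height, spheres):
    """Cast one ray per pixel through an image plane at z = -1."""
    rows = []
    for py in range(height):
        row = ""
        for px in range(width):
            # Map the pixel centre to the range [-1, 1] on both axes.
            x = (px + 0.5) / width * 2.0 - 1.0
            y = 1.0 - (py + 0.5) / height * 2.0
            direction = (x, y, -1.0)
            nearest = None
            for center, radius in spheres:  # brute force over all objects
                t = hit_sphere((0.0, 0.0, 0.0), direction, center, radius)
                if t is not None and (nearest is None or t < nearest):
                    nearest = t
            row += "#" if nearest is not None else "."
        rows.append(row)
    return rows


# One unit sphere three units in front of the camera.
image = render(9, 5, [((0.0, 0.0, -3.0), 1.0)])
print("\n".join(image))
```

Each pixel's ray is independent of the others, which is why this kind of workload splits so naturally across many small cores.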
Ace's Hardware tried to see if the latest Intel and AMD processors were capable of handling ray-tracing in real-time. Unfortunately, they discovered that at 640x480 resolution, today's fastest CPUs average less than 2 fps using ray-tracing techniques; at higher resolutions it would take several seconds to render a single frame. It is therefore unlikely that games will employ ray-tracing any time soon. Still, we should seriously consider Johan's warning that "If LOD levels get so high that a polygon is as small as a pixel on the screen, the typical 'polygon and texture' approach of the current videochips might not make any sense anymore."
http://www.gamespot.com/pages/forums/sh ... d=24401643
[color=red]In other words, according to these tests the most powerful AMD, Intel and PowerPC chips run this at about 2 frames per second or less. CELL runs it at 30 frames per second.[/color]
---------------
During the last E3 we already told you about the Cell tech demo, and also about the demonstration in which this chip decoded 48 movies simultaneously.
To give an idea, Cell has a computing capacity of 256 gigaflops (that is, it processes 256 billion floating-point operations per second), while our PCs barely reach 6. On top of that, the 500th fastest supercomputer in the world processes "only" 850 gigaflops, so it would be no surprise if, in a relatively short time, these super-installations that link processors together (Cell was conceived for exactly that) began to adopt this chip.
Finally, remember that both Intel and AMD (but above all Intel, which has stalled at 3.73 GHz) are having serious trouble reaching 4 GHz, whereas Cell's default clock speed will be between 4 and 4.6 GHz, although in the demonstrations carried out the clock had to be lowered to a somewhat more stable speed (3.2 GHz in the case of the E3 demo).
At this rate, AMD and Intel will have to join forces to avoid being swept away by this little beast from IBM, because if it ever reaches our PCs' architecture it could wipe out everything.
http://www.noticias3d.com/noticia.asp?idnoticia=10452
[color=red]Unless PCStats.com and Real World Technologies are wrong, CELL will be a monster. As for the operations claim, I read it just yesterday and I have spent 2 hours searching the internet to find where I read it, without success... so let's assume I probably read it on some fanboy page that wanted to put CELL above everything else.
Now, the odd thing is that all these pages I gave, including an Xbox fan site, state that CELL works much faster than any computer today, in theory.[/color]
[quote="LoveProof"]Well, on your PC it might reach 1 FPS xD. I was able to play Oblivion at 1600x1200 with a minimum of 17 FPS (forest) and went above 60 FPS (dungeons); of course, it took two GeForce 7800 GTX cards in SLI, and believe me, it is a very poorly optimized game for PC.