ST200: A VLIW Architecture for Media-Oriented Applications

Paolo Faraboschi, Hewlett-Packard Laboratories (Cambridge, MA)
Fred Homewood, STMicroelectronics (Cambridge, MA)
http://www.hpl.hp.com/cambridge/projects/cfp
http://www.st.com
Project Overview
❖ ST200: the first implementation of the “Lx” processor family
❖ Joint Hewlett-Packard Labs and STMicroelectronics design
  • Technology platform for System-On-Chip (SOC) VLIW cores
❖ Lx is an “architecture framework”
  • Customizable and scalable, high-performance VLIW
  • Targeting ‘soft core’ approach for SOC
  • Includes hardware and toolchain
    – Very aggressive retargetable ILP compiler is fundamental
  • Presented as one compatible architecture family to the user
HP Labs: Evolution of Computing, Towards the Information Utility
[Figure label: Very Large Scale Computing (Internet Data Centers)]
Target Applications for the Lx Family
❖ Integer computation-intensive media-processing applications
  • Programmed in a high-level language (C, C++, extensions)
  • “DSP-style” computational kernels
  • Significant “control” component
    – Multitasking O/S, user interfaces, interrupt handlers, etc.
❖ Examples of domains that share these common properties
  • Digital still-imaging
  • Video processing
  • Networking, cryptography
  • Audio processing
Architecture and Compiler Design
❖ Lx design philosophy
  • “Build only the features you can compile for”
    – Compiler-driven architecture and microarchitecture design
    – Compiler technology already in place
    – Built-in scalability in compiler, tools and ISA
❖ Lx compiler based on HP Labs technology
  • Descendant of the Multiflow compiler
  • Very aggressive and robust ILP compiler
  • Global scheduling (trace scheduling)
  • Table-driven, retargetable
    – Wide class of VLIW architectures, including clusters
Major Lx Architectural Features
❖ Base VLIW ISA + extensions
  • Efficient 32-bit integer ISA
  • Extensible for floating point and SIMD
  • Large set of general-purpose and branch registers
  • Simple predication through “select” operations
  • Static branch prediction
❖ Execution model
  • Precise interrupts, explicit speculation model
❖ Customizable for a specific application
  • Variable number of clusters and registers, cache sizes, and operation latencies
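As a rough illustration of predication through “select” operations, the sketch below shows how a short branch sequence might be if-converted into branch-free code. The helper `select32` and both function names are hypothetical C-level models chosen for this example; the slides do not give the actual Lx operation names or encodings.

```c
#include <stdint.h>

/* Hypothetical C model of a "select" operation (not the actual Lx
   mnemonic): yields a when cond is non-zero, otherwise b. */
static inline int32_t select32(int cond, int32_t a, int32_t b)
{
    return cond ? a : b;
}

/* Branching form: clamp x to the range [0, 255]. */
int32_t clamp_branch(int32_t x)
{
    if (x < 0)
        return 0;
    if (x > 255)
        return 255;
    return x;
}

/* If-converted form: each comparison produces a condition value
   (assumed here to be the kind of value a branch register would hold),
   and selects choose the result without any control flow. */
int32_t clamp_select(int32_t x)
{
    int below = x < 0;
    int above = x > 255;
    int32_t r = select32(above, 255, x);
    return select32(below, 0, r);
}
```

Both functions return the same value for every input. The second form has no branches, so the comparisons and selects can be scheduled alongside other operations in a wide instruction.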
Multi-Cluster Lx Architecture
[Figure labels: Single PC; MMU; Cluster 0; Instruction Cache; Fetch and Expansion Unit]
❖ Syllables are grouped to form Bundles (VLIW instructions)
  • Variable-size length with Bundle-stop bits
    – Allows for no-op folding
  • Clustering with Cluster-start bit
    – Allows scalability
❖ Simple and extensible coding format
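A minimal sketch of how a stop bit can delimit variable-length bundles is given below. It assumes 32-bit syllables with the stop bit in the top bit of each syllable; the slides do not state the syllable width or the bit position, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: syllables are 32 bits wide and the bundle-stop bit is the
   most significant bit. Neither detail is given in the slides. */
#define STOP_BIT 0x80000000u

/* Returns the number of syllables in the bundle that starts at code[0],
   scanning at most n syllables. Returns 0 if no stop bit is found. */
size_t bundle_length(const uint32_t *code, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (code[i] & STOP_BIT)
            return i + 1;
    }
    return 0;
}

/* Counts the complete bundles in a stream of n syllables. A trailing run
   of syllables without a stop bit is not counted. */
size_t count_bundles(const uint32_t *code, size_t n)
{
    size_t bundles = 0;
    size_t pos = 0;
    while (pos < n) {
        size_t len = bundle_length(code + pos, n - pos);
        if (len == 0)
            break;
        pos += len;
        bundles++;
    }
    return bundles;
}
```

Because a bundle ends wherever the stop bit is set, a bundle that uses fewer issue slots than the machine provides simply contains fewer syllables, with no explicit no-ops stored in memory. This is the no-op folding the list above refers to.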
ST200 Lx Cluster Pipeline Structure
[Figure: pipeline diagram. Phases: Fetch, Decode, Read, E1, E2, Write. Labels: 128b Wide ICache Word; I-buffer and Variable Size Decoding; Register File Access and Immediate Generation; Mul; IU; Load/Store; Dcache; Register Rollback, Writeback and Bypassing; Fetch Address Generation; PC and Branch Unit; Exception Generation]
ST200: The First Lx Implementation
❖ First implementation sampling January 2001
  • ST200-STB1 device
    – Includes Lx and peripherals
❖ Instance of single-cluster Lx
  • 4-issue processor with 32 KByte I$ and 32 KByte D$
  • 64 x 32-bit registers
  • 2 multipliers, 4 integer, 1 load/store, 1 branch
  • 300 MHz in 0.25 µm technology (2.5 V)
    – Single-cluster processor core: 5 mm²
    – Total core size: 21 mm²
  • 400 MHz in 0.18 µm technology (1.8 V)
ST200-STB1 Chip Features
❖ 32 mm² in 0.25 µm technology
  Core: 21 mm², Peripherals: 11 mm²
❖ 372 BGA, 1.7 W peak power
❖ 64-bit SDRAM / DDRAM interface
[Figure labels: PCI Port; PCI I/F; DRAM Port; DDRAM Interface; Lx VLIW Core; ST-Bus Bridge; Clock Control; Timer; Debug Support; Interrupt Controller; Serial Port 1; Serial Port 2; EPROM Port]
Performance Analysis of ST200-STB1
Measurements
  • Benchmarking of ST200-STB1:
    – Performance with a real memory system
    – SDRAM 133 MHz
    – Level 1 32K I$ and 32K D$
    – Code size
  • Compared to reference platform:
    – StrongARM SA-110 / 275
    – High-end 32b embedded RISC CPU
    – Corel Netwinder system (gcc / Linux)
Industry Standard and Application Specific Benchmark Suites
[Table: benchmark suite listing; only the column header “Name” survives]
Wrapping Up…
❖ VLIW is becoming the predominant embedded / DSP technology
  • Custom-VLIW: right balance of performance and flexibility
❖ HP/ST Lx is a custom-VLIW “technology platform”
  • Can be effectively customized to an application
  • First implementation (ST200-STB1) sampling soon
❖ Customizing embedded VLIW is beneficial
  • 4x-12x gains vs. a general-purpose RISC architecture
    – Starting from C-level code
    – At similar cost and technology
❖ General-purpose code performance
  • Comparable to RISC
  • Efficient precise interrupts and exceptions in the ST200