Directions in ISA Specification

Author

  • Anthony C. J. Fox
Abstract

This rough diamond presents a new domain-specific language (DSL) for producing detailed models of Instruction Set Architectures (ISAs), such as ARM and x86. The language’s design and methodology are discussed and we propose future plans for this work. Feedback is sought from the wider theorem proving community in helping establish future directions for this project. A parser and interpreter for the DSL have been developed in Standard ML, with an ARMv7 model used as a case study.

This paper describes recent work on developing a domain-specific language (DSL) for Instruction Set Architecture (ISA) specification. Various theorem proving projects require ISA models; for example, for formalizing microprocessors, operating systems, compilers and machine code. As such, (often partial) ISA models exist for a number of architectures (e.g. x86, ARM and PowerPC) in a number of theorem provers (e.g. ACL2, PVS, HOL-Light, Isabelle/HOL, Coq and HOL4). These models differ in their presentation style, precise abstraction level (fidelity) and degrees of completeness. In part this reflects the nature of the projects for which the models were originally developed, e.g. compiler verification [4] and machine code verification [6]. There are also differences based on the expressiveness and features of the theorem provers that are used.

The ACL2 theorem prover has been used very successfully in this field for many years, where it has the advantage of providing very fast model evaluation. Recently, Warren Hunt has developed an ACL2-based specification of the Y86 processor, which implements a subset of the x86 architecture; see [3].

The main objective of the DSL is to make the task of modelling ISAs simpler, more reliable and less tedious. In particular, it should be possible for people who are not experts in HOL4 to readily read, develop and create ISA specifications for use in HOL4.
However, it is also hoped that this work will help facilitate the dissemination of ISA models, enabling various concrete ISA models to be derived for different settings, tools and use cases.

Although various ISA DSLs currently exist, often these have been developed for writing compiler backends and binary code analysis tools, e.g. λ-RTL [7] and TSL [5]. The most closely related work is Lyrebird [1], which was developed as part of the seL4 project at NICTA. This tool supports fast simulation but it has not been successfully used in a theorem proving setting. The aim of this work is to produce high-fidelity specifications that are inherently formal and yet prover/tool agnostic. The DSL and generated native prover specifications should be acceptable to both the engineering (computer architecture) and formal methods communities.

1 Language Design and Methodology

The design of the DSL has been influenced by our experiences in specifying the ARM architecture in HOL4, which is described in [2]. In particular, the DSL has been developed and tested through the production of a completely new version of the ARMv7 specification (footnote 1). However, it is believed that the DSL is flexible enough to produce good models of other ISAs, such as the x86 architecture.

Methodology. The requirements for the DSL are based on our current specification approach, where we define:

  • A state space. This represents all of the programmer visible registers, flags and memory. It may also include components that are not directly visible, such as static system configuration information (e.g. describing the architecture version and extension support) as well as any helpful shadow state components (e.g. the bit width of the current instruction).
  • An instruction datatype. This provides an interface between instruction decoders and the instruction set semantics.
  • A collection of next state functions for each instruction class.
    This provides an operational semantics for each element of the instruction datatype.
  • A decoder. This maps machine code values to the instruction datatype.
  • An encoder (optional). This maps elements of the instruction datatype to concrete machine code values.
  • A next state function. This fetches an instruction from memory, decodes it and then calls the next state function for that instruction.

Footnote 1: The new model actually covers the very latest incarnation of ARMv7, which adds support for a new “hypervisor” mode.

This approach has been implemented directly in HOL4 for the ARM, x86 and PowerPC architectures. However, there are areas where producing and maintaining ISA specifications in HOL4 is unduly tedious and potentially error prone. The DSL improves upon native HOL4-based specifications in a few key areas:

  • In HOL4 the state space is declared as a type, which means that all state components must be introduced early on and all in one go. It is also necessary to manually introduce collections of functions for accessing and updating state components and sub-components (e.g. named bit-fields). It is more natural to introduce state components in context, as and when they are needed; for example, within separate sections for the specification of machine registers and main memory. In the DSL state components are treated as global variables that may be declared anywhere at the top-level.
  • The instruction datatype is also declared as a HOL4 type, which is somewhat tedious to specify and to maintain. In the DSL the instruction datatype can be built incrementally, using the type signatures of the functions that define the instruction semantics.
  • Writing a good decoder is particularly challenging in HOL4. This is primarily because HOL4 does not provide direct support for matching over bit patterns. There is also the challenge of making the decoder evaluate efficiently, which currently requires some degree of HOL4 expertise. Matching over bit patterns is built into the DSL.
    We hope to support fast evaluation for the generated HOL4 model, since building this capability into the translator will save HOL4 users a lot of work.

These and other language features make it much easier to write ISA specifications in a natural style, making it possible to automatically generate HOL4 specifications that are otherwise hard or tedious to write manually.

Language Overview. The DSL is a first-order language with a fairly basic type system (footnote 2). The intention is to keep the design of the DSL reasonably simple, shunning features that are not directly focussed on the ISA domain. This reduces the effort required to implement the language and it should help simplify the task of targeting models to different settings. Although the DSL is not particularly sophisticated, there were no problems in concisely specifying ARMv7.

Types. The primitive types of the language are: unit, bool, string, nat, int, bitstring and bits(n), where n is either fixed or is constrained to a (possibly infinite) set of positive integers at the point of a function definition (footnote 3). Type checking is implemented using Hindley-Milner inference, with some additional light-weight support for bit-vectors. Users can declare type synonyms, records, enumerations and non-recursive sum types. Constructors for product, map and set types are provided. For example, the following are valid declarations:

    type reg = bits(4)                        -- type synonym (this is a comment)
    type word = bits(32)
    type mem = word → bits(8)                 -- map
    construct SRType                          -- enumerated type
    { SRType_LSL, SRType_LSR, SRType_ASR, SRType_ROR, SRType_RRX }
    construct offset                          -- sum type
    { register_form :: reg * SRType * nat     -- product
      immediate_form :: word }

Syntax and Constructs. The DSL syntactically distinguishes between statements and expressions. Mutable values can be declared and updated in statements but not in expressions (footnote 4). There are if-then-else, when-do, match-case and for-do constructs.
Function calls are strictly call-by-value but side-effects are possible, i.e. the global state can be updated. Exceptions can be declared and called, but not handled.

Footnote 2: Recursive types, type polymorphism and dependent types are not supported.
Footnote 3: Floating-point support will be added in the future.
Footnote 4: For now let-expressions are only possible in statements too.

A wide selection of primitive data operations is provided. Users can define their own operations but cannot give these symbolic or infix/mixfix syntax. The following declaration defines n-byte word alignment:

    bits(N) Align (w::bits(N), n::nat) = return [n * ([w] div n)]

The operation ‘[·]’ is used as a general casting map for primitive types. All types are inferred above using the function’s arguments, which must be annotated.

Registers. The DSL supports declarations of register types with named bit-fields. The following declares a type for ARM’s Program Status Registers:

    register PSR :: word
    {
      31:            N     -- Condition flag (Negative)
      30:            Z     -- Condition flag (Zero)
      29:            C     -- Condition flag (Carry)
      28:            V     -- Condition flag (oVerflow)
      27:            Q     -- Cumulative saturation flag
      15-10, 26-25:  IT    -- If-Then
      24:            J     -- Jazelle bit
      23-20:         RAZ!  -- reserved
      19-16:         GE    -- Greater-equal flags (SIMD)
      9:             E     -- Endian bit (T: Big, F: Little)
      8:             A     -- Asynchronous abort disable
      7:             I     -- Interrupt disable
      6:             F     -- Fast interrupt disable
      5:             T     -- Thumb mode
      4-0:           M     -- Mode field
    }

This introduces a new type PSR that corresponds with a 32-bit word. The named bit-field M is a 5-bit word and N is a Boolean flag. There are a few special bit-field categories, such as RAZ! (read-as-zero), for anonymous fields. Note that the 8-bit IT component is built from non-consecutive bit ranges.

To read the IT field one can write CPSR.IT, which is equivalent to &CPSR<15:10> : &CPSR<26:25>, where the overloaded operator ‘&’ maps registers to their bit-vector values, ‘·<·:·>’ is bit-field extraction and ‘:’ is bit-vector concatenation.
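
The equivalence between CPSR.IT and &CPSR<15:10> : &CPSR<26:25> can be illustrated outside the DSL. The following Python sketch is not part of the DSL or its implementation; it models a PSR as a plain 32-bit integer, and the helper names extract, concat, psr_it, psr_m and psr_n are hypothetical, chosen only for this illustration.

```python
# Illustrative sketch only: a PSR is modelled as a Python int holding 32 bits.
# All function names here are hypothetical and do not come from the DSL.

def extract(value: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of value, mirroring the DSL's <hi:lo>."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def concat(high: int, low: int, low_width: int) -> int:
    """Bit-vector concatenation: 'high' followed by 'low_width' bits of 'low'."""
    return (high << low_width) | low


def psr_it(cpsr: int) -> int:
    """8-bit IT field, built from the non-consecutive ranges 15-10 and 26-25."""
    return concat(extract(cpsr, 15, 10), extract(cpsr, 26, 25), 2)


def psr_m(cpsr: int) -> int:
    """5-bit mode field M (bits 4-0)."""
    return extract(cpsr, 4, 0)


def psr_n(cpsr: int) -> bool:
    """Negative flag N (bit 31) as a Boolean."""
    return bool(extract(cpsr, 31, 31))


if __name__ == "__main__":
    cpsr = (0b101010 << 10) | (0b11 << 25)
    print(bin(psr_it(cpsr)))  # bits 15-10 followed by bits 26-25
```

Accessors of this kind are what would otherwise have to be written by hand for every named field, which is the effort that a register declaration is intended to remove.
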
In the HOL4 model [2], PSRs are defined using a record type and encoding/decoding functions are manually defined. It is now relatively easy to automatically generate these types and functions for each of ARM’s system registers, saving time and effort.

State. Global state components are declared as follows:

    declare CPSR :: PSR
    declare MEM :: bits(32) → bits(8)

These components can be updated with various assignment forms, for example:

    CPSR.N ← true;
    &CPSR<31> ← true;
    CPSR.M ← '11010';
    &CPSR ← 0x11;
    &CPSR<31:28> ← '1101';
    CPSR ← PSR(0x11);
    MEM(4) ← &CPSR<15:8>

The dot syntax also applies to conventional record types. Users can define their own update operations; for example, consider the following declaration:

    component NZCV :: bits(4)
    {
      value = &CPSR<31:28>
      assign v = &CPSR<31:28> ← v
    }

This declaration makes it easy to access and modify the NZCV status flags. The component construct is particularly useful for declaring operations that access registers and memory. For example, one can write:

    NZCV ← NZCV && '0101';
    R(12)<15:0> ← MemU(address, 2);
    MemU(address + 2, 2) ← R(12)<31:16>

where the operations R and MemU provide an interface to the general-purpose registers and memory. Note that the physical register corresponding with the argument 12 actually depends on the current processor mode, given by CPSR.M; and memory accesses are affected by the endian bit CPSR.E.

Instruction Specification. The following DSL code specifies the semantics of the ARM instruction BLX (register):

    define Branch > BranchLinkExchangeRegister ( m :: reg ) =
    {
      target = R(m);
      if CurrentInstrSet() == InstrSet_ARM then
      {
        next_instr_addr = PC - 4;
        LR ← next_instr_addr
      }
      else
      {
        next_instr_addr = PC - 2;
        LR ← next_instr_addr<31:1> : '1'
      };
      BXWritePC (target)
    }

This declaration extends an abstract syntax tree (AST) datatype instruction. A primitive operation Run :: instruction → unit runs the code associated with the given AST.
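
To make the next state function for BLX (register) concrete, here is a rough Python rendering of the same steps. It is a sketch under several assumptions and not the generated HOL4 model: the State class is hypothetical and ignores banked registers, regs[15] is assumed to hold the value that reading PC yields in the DSL, and bx_write_pc is a simplified stand-in for BXWritePC in which bit 0 of the target selects Thumb state.

```python
from dataclasses import dataclass, field

MASK32 = 0xFFFFFFFF
LR, PC = 14, 15  # conventional ARM register numbers


@dataclass
class State:
    """Hypothetical, heavily simplified machine state (no banked registers)."""
    regs: list = field(default_factory=lambda: [0] * 16)
    thumb: bool = False  # stands in for the CPSR.T bit


def bx_write_pc(s: State, target: int) -> None:
    """Simplified stand-in for BXWritePC: bit 0 of the target selects Thumb.

    Assumption: the unpredictable case (ARM target with bit 1 set) is ignored.
    """
    if target & 1:
        s.thumb = True
        s.regs[PC] = target & ~1 & MASK32
    else:
        s.thumb = False
        s.regs[PC] = target & MASK32


def branch_link_exchange_register(s: State, m: int) -> None:
    """Mirror of the DSL definition: save the return address, then branch."""
    target = s.regs[m]
    if not s.thumb:
        next_instr_addr = (s.regs[PC] - 4) & MASK32
        s.regs[LR] = next_instr_addr
    else:
        next_instr_addr = (s.regs[PC] - 2) & MASK32
        # next_instr_addr<31:1> : '1' forces bit 0 of the link value to 1
        s.regs[LR] = (next_instr_addr & ~1 & MASK32) | 1
    bx_write_pc(s, target)


if __name__ == "__main__":
    s = State()
    s.regs[PC] = 0x1008
    s.regs[3] = 0x2001
    branch_link_exchange_register(s, 3)
    print(hex(s.regs[LR]), hex(s.regs[PC]), s.thumb)
```

The state is passed explicitly here, whereas the DSL treats state components as global variables that statements may update.
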
The > notation allows instructions to be grouped into a hierarchy of instruction categories.

Instruction Decoding. A decoder is any function that takes the output of an instruction fetch and returns values of type instruction. Users are free to define such functions in any way that they see fit. A natural choice for decoding ARM instructions is through pattern matching over bit patterns. The ARMv7 decoder is approximately four thousand lines of code (including comments), with 224 top-level cases. Missing and redundant patterns are reported, which is essential in this context. Below is a code snippet relating to the BLX instruction:

    instruction Decode (mc::MachineCode) =
      match mc
      {
        ...
        case ARM ('cond : 00010010 : (111111111111) : 0011 : Rm') =>
          if Take (cond, ArchVersion() >= 5) then
          {
            when Rm == 15 do DECODE_UNPREDICTABLE (mc, "BLX (register)");
            Branch (BranchLinkExchangeRegister (Rm))
          }
          else Skip ()
        ...
      }

Bit patterns are surrounded by apostrophes. Bracketed bit fields are “should-be” tokens: they match any field of the appropriate length. The bit-widths of variables can be annotated or given default values: cond and Rm were declared as 4-bit values. When an op-code is not valid the user function DECODE_UNPREDICTABLE is called, which raises a suitable exception. The user-defined functions Take and Skip take care of conditional (no-op) and undefined instructions.
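
The matching rule for the BLX pattern above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the DSL implements matching: the token encoding ("var", "lit", "any") and the function match_bits are hypothetical, and the exhaustiveness and redundancy checks that the DSL performs are not modelled.

```python
# Hypothetical encoding of the pattern 'cond : 00010010 : (111111111111) : 0011 : Rm'.
# ("var", name, width) binds a field, ("lit", bits) must match exactly, and
# ("any", width) is a "should-be" field that matches any value of that width.
BLX_REGISTER = [
    ("var", "cond", 4),
    ("lit", "00010010"),
    ("any", 12),
    ("lit", "0011"),
    ("var", "Rm", 4),
]


def match_bits(pattern, word, width=32):
    """Match 'word' against 'pattern', most significant bits first.

    Returns a dict of variable bindings on success, or None on failure.
    """
    bindings = {}
    pos = width  # number of bits not yet consumed
    for token in pattern:
        kind = token[0]
        if kind == "lit":
            n = len(token[1])
            field_bits = (word >> (pos - n)) & ((1 << n) - 1)
            if field_bits != int(token[1], 2):
                return None
        elif kind == "any":
            n = token[1]
        elif kind == "var":
            n = token[2]
            bindings[token[1]] = (word >> (pos - n)) & ((1 << n) - 1)
        else:
            raise ValueError(f"unknown token kind: {kind}")
        pos -= n
    if pos != 0:
        raise ValueError("pattern does not cover the whole word")
    return bindings


if __name__ == "__main__":
    print(match_bits(BLX_REGISTER, 0xE12FFF33))  # BLX r3, condition AL
    print(match_bits(BLX_REGISTER, 0xE12FFF13))  # BX r3: bits 7-4 differ
```

A decoder built this way would try each pattern in turn and pass the resulting bindings (here cond and Rm) to the code on the right-hand side of the matching case.
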


Publication date: 2012