# Computer
# ++++++++
## IOMMU - Input-Output Memory Management Unit (needed for secure pass through to vm)
Intel calls it VT-d (Virtualization Technology for Directed I/O).
Device-visible (I/O virtual) addresses are mapped to system pages, so a scatter-gather list of physical pages looks like one contiguous virtual buffer to the device.
## ACPI - Advanced Configuration and Power Interface
## APIC - Advanced Programmable Interrupt Controller
## AMBA - Advanced Microcontroller Bus Architecture
### APB - Advanced Peripheral Bus
### AXI - Advanced eXtensible Interface - for high speed soc components like memory
## SATA - Serial Advanced Technology Attachment
## SCSI - Small Computer System Interface
## ISA - Instruction Set Architecture
## HT - Hyper-Threading; multiple threads per core
## DMA - Direct Memory Access
## DMAR - DMA remapping
## DMAR-IR - DMAR interrupt remapping
## IRQ - Interrupt Request
## UART - Universal Asynchronous Receiver-Transmitter
## CTS - Clear To Send
## DCD - Data Carrier Detect (DCE sends it to DTE, it's the led that )
## DTE - Data Terminal Equipment (our computer)
## DCE - Data Circuit-terminating Equipment (telephone line end device)
## DSR - Data Set Ready
## RTS - Request To Send
## DTR - Data Terminal Ready
## RI - Ring Indicator (Incoming call ring)
## UEFI - Unified Extensible Firmware Interface
## OVMF - Open Virtual Machine Firmware enables UEFI for virtual machines
## OEM - Original Equipment Manufacturer
## KVM - Kernel-based Virtual Machine is a hypervisor
## PCI - Peripheral Component Interconnect Bus
```
+----------SoC-----------------------------------------------+
|                                                            |
|+-----+   +-----------------RootComplex--------------------+|
||     |<->|                Host Bridge (BUS 0)             ||
|| CPU |   |                         ^                      ||
||     |   |                         |                      ||
|+-----+   |      +------------------+-----------------+    ||
|          |      |                  |                 |    ||
|+-----+   | +----+--------+ +-------+-----+ +---------+---+||
||     |<->| | VirtualPCI- | | VirtualPCI- | | VirtualPCI- |||
|| RAM |   | | PCIBridge   | | PCIBridge   | | PCIBridge   |||
||     |   | +--+----------+ +-------+-----+ +---------+---+||
|+-----+   |    |                    |                 |    ||
|          +----|--------------------|-----------------|----+|
+---------------|--------------------|-----------------|-----+
                |                    |                 |
                | BUS 1              | BUS 3           | BUS 9
                v                    v                 v
       +--------+-------+        +---+---+     +-------+----+
       | PCI Express    |        |       |     | PCIExpress |
       | to PCI/PCX     |        | Switch|     | Endpoint   |
       | Bridge         |        |       |     +-------+----+
       +--------+-------+        +-+-+-+-+             |
                |                  ^ ^ ^         Device 0 [ ]
              BUS 2                | | |
                v                  | | |   BUS 5/6/7/8
       +--------+-------+          | | |
       | PCI/PCX        |    <-----+ | +-----> [PCIExpressEndPts]
       | Legacy Devices |            |
       +----------------+            +-----> [PCIExpressEndPts]

BUS 1: rx+|rx-|tx+|tx-|refclk+|refclk-|
       perst#(usedToTellWhenTheClockNvoltageSignalsAreStable)|wake#|
       prsnt1#|prsnt2#(usedForHotPlugDetection)|
       jtag#|+12v#|rx+|rx-|tx+|tx-|gnd

pcie has no dedicated irq lines unlike pci, it sends interrupts in-band via txNrx

SoC
+-----------------------------------------------------------------+
|                                                                 |
|   +-------+           +-------------------------------------+   |
|   |       |---------->|            Root Complex             |   |
|   |  CPU  |           +-------------------------------------+   |
|   |       |           |         Configuration Space         |   |
|   +---+---+           |                 4KB                 |   |
|       |               +-------------------------------------+   |
|       v               |            IP Registers             |   |
|   +-------+           +-------------------------------------+   |
|   |       |           | Source Addr | Dest Addr | Size |Type|   |
| Z |Memory |<--------- |-------------+-----------+------+----|   |
|   |       |           |             |           |      |    |   |
|   +-------+           +-------------------------------------+   |
|    1GB                |     Configurable Address Space      |   |
|                       |                                     |   |
|                       +-------------------------------------+   |
|                                                                 |
+-----------------------------------------------------------------+

      PCIe Address Space               PCIe Endpoint
  +--------------------------+    +-----------------------+
0 |                          |    |                       |
  | - - - - - - - - - - - -  |    |  Configuration Space  |
  |            ^             |    |                       |
  |            |   N:1 =     |    |                       |
  |            |             |    +-----------------------+
  |            v             |    |                       |
C +--------------------------+    |     Memory Space      |
  |            ^             |    |                       |
  |            |             |    |                       |
X +--------------------------+    +-----------------------+
  |            |             |
Y +--------------------------+
  |            |             |
  |            v             |
D +--------------------------+
  |                          |
E +--------------------------+
  |                          |
Z |       1GB MEM ADDR       |
  |                          |
  +--------------------------+
2^32/64 - 1
```
- cpu can program the below by addressing root complex's registers
  - IP (Intellectual Property) registers
    - lane width
    - speed mode: gen1/gen2
    - registers of the Address Translation Unit (cpu address to pci address)
  - configurable address space:
    - configuration, IO, memory and message (non physical) spaces are defined by the pci standard
- pcie endpoint
  - configuration space has all the info about the device, like deviceId, vendorId, classCode and various capabilities, and it has registers to configure the device (e.g. to get into a low power state)
  - pcie configuration space is backward compatible with pci and grows from 256 bytes to 4KB. The first 64 bytes are called the standard header, which is of 2 types. type1: root ports/bridges/switches (primary, secondary, subordinate bus number), type0: endpoints
## platform bus - for soc units
## i2c, spi bus - for bridges and panels
## spmi bus - System Power Management Interface, a MIPI serial bus for power management
## MIPI - Mobile Industry Processor Interface
## DSI - MIPI Display Serial Interface
## CSI - MIPI Camera Serial Interface
## SMP - Symmetric Multi Processing
- multiple identical processors connected to a single shared memory and to all the I/O devices, unlike asymmetric MP (CPU+dGPU).
## I2C - Inter-Integrated Circuit
- half-duplex: data is transmitted in both directions but not at the same time
- only 2 wires are used: SDA (SerialData) and SCL (SerialClk)
- it's overkill for communication between a few devices as it needs an addressing system: [start][slaveAddress][r/w][ACK][Data(8bits)][ACK]... r: master reads from the slave
## SPI - Serial Peripheral Interface
- Full duplex: data is transmitted in both directions at the same time
- needs 3+N wires: SCLK, MOSI (MasterOutSlaveIn), MISO (MasterInSlaveOut) + SS1 (SlaveSelect),2,3,..
## PCIe - Peripheral Component Interconnect Express
- PCIe lanes come directly from CPU except PCH PCIe lanes.
  Each generation is 2x faster than its predecessor: PCIe 5.0 is 32 GT/s, 4.0 is 16 GT/s, and generations are compatible with each other in both directions.
### PCIe vs PCI
- usage model and software interface are the same as PCI but the hardware is not compatible
- pcie uses serial links while pci uses a parallel bus
- Root Complex: pcie host controller present inside the SoC
- Endpoint: a pcie device
- cpu_addr - CPU physical address (/proc/iomem). pci has its own address space, 32bit/64bit depending on the root complex; this address space is visible only to pci components (root complex, endpoints, switches and bridges)
## PCH - Platform Controller Hub
- Usually manages features on the MoBo like USB, WiFi, Ethernet, Sound. But its speed is limited to x8 3.0. PCH lanes connect to the CPU via the DMI link.
## DMI - Direct Media Interface
- Intel's proprietary link between the northbridge (or CPU) and southbridge (PCH). It supports concurrent and isochronous traffic.
## ASPM - Active-State Power Management
- Saves power in PCIe subsystems by setting a lower power state for PCIe links when the devices to which they connect are not in use.
## hypervisor
- software, firmware or hardware that creates and runs virtual machines
## device passthrough
- CPU must support hardware virtualization
## USB - Universal Serial Bus
- Pins: Gnd, Vcc, D+, D-
- D+, D-: differential signalling: if D+=high then D-=low
- D+ - D- cancels interference
- J: D+=high(>Vihz) D-=low(<Vil)
- Low speed: D- > Vihz, D+ < Vil
- Full speed: D- < Vil, D+ > Vihz
- sequence after plugging in:
  - plug in device
  - detect connection
  - set address
  - get device info
  - choose configuration
  - choose drivers for interfaces
  - use it
### keyboard
- computer sends a packet with the device id
  - SYNC|IN|DevId|EndPoint|CRC(DevId|EndPoint): 24 bit
  - IN: 10010110
  - every byte is sent least-significant bit first (reverse the bit order to read the value)
- keyboard sends SYNC|8 byte key code followed by 16bit CRC
- computer sends SYNC|8bit ACK
- endpoint can be matched with lsusb output corresponding to keyboard.
- the computer asks (polls) the keyboard for keys every time, unlike the PS/2 interface where the keyboard raises an interrupt
- full speed keyboard: 12Mbps with polling interval 1ms
- low speed keyboard: 1.5Mbps with polling interval 16ms
## ESP32
`emerge -av dev-vcs/git wget flex bison gperf python pyserial pyelftools cmake ninja ccache xtensa-esp32-elf`
## memory pages
- memory is divided into pages to handle memory fragmentation
- also used to allocate more memory than is physically available, by swapping unused pages to the hard disk
- a page fault occurs if the required page isn't in main memory.
- this is called virtual memory
## ALU
`T add_sat (T x, T y)` add and saturate: adds x and y and, if the result overflows, returns the maximum value.
## CM4 connection
## GUID - Globally Unique Identifier
## ACPI
- It is the way the BIOS sends the structured location of devices to the OS
## Computer architecture
- Outlines the system's functionality, design and compatibility
### System design
- Design of data processors, DMA, GPU, data paths, memory controllers and miscellaneous things such as virtualization and multiprocessing
### ISA
- Defines CPU capabilities and functions like data formats, memory addressing modes, processor register types, word size and the instruction set
### Microarchitecture
- also known as computer organization; defines storage elements, data processing and data paths and how the ISA is implemented with them
## components of microprocessor
- alu, registers, control units to move data between processors, memory caches
## MESI
- states of a cache block: Modified (pendingWriteToMain), Exclusive (OnlyThisCacheHasBlockAndIsClean), Shared (anotherCacheAlsoHoldingThisBlockUnmodified) and Invalid (e.g. modifiedInAnotherCache). Also known as the Illinois protocol. Used to maintain cache coherency (consistent copies of the same memory block across processor cores) in hierarchical memory. It is the most common protocol that supports write-back caches.
- Directory based coherency: the cache state of the various memory blocks is maintained in a central or distributed directory as a block->caches map.
- snooping coherency: each cache keeps track of the coherency state of the physical memory blocks it is holding.
- snooping and directory can be mixed in the case of a multichip multiprocessor.
- used by many coherent memories. non-coherent memories need software based syncing
## MOESI
- O (owned): the other caches will get the block from the cache that has the 'O' bit.
- there are other protocols similar to MESI and MOESI
## what is a snooping protocol?
- also called bus snooping protocol, maintains cache coherency in symmetric multiprocessing environments. Whenever a processor writes to its cache, it broadcasts the address of the modified block on the bus. Other processors that have a copy of the same block in their caches can either invalidate or update it, depending on the protocol variant. But the bus can become a bottleneck as the number of processors and cache accesses increases. The protocol also requires all the caches to monitor the bus constantly, consuming power and bandwidth. It is not suitable for distributed systems where the bus is replaced by a network.
### mutex
- implemented using atomic operations, e.g. LOCK-prefixed instructions like `LOCK XCHG` (on x86, `XCHG` with a memory operand is locked even without the prefix).
- `LOCK XCHG` swaps a register value with a memory value in one uninterrupted step.
- it's achieved through cache coherency on modern systems. on older systems bus locking is used.
- a memory barrier is needed around the LOCK op to ensure all previous memory operations are completed; on x86 the LOCK-prefixed instruction itself acts as a full barrier.
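
The write-invalidate variant of bus snooping described in this section can be sketched as a toy simulation. This is only an illustration under simplifying assumptions, not how any real cache controller is built: the names `State`, `Bus`, `Cache` and `demo` are hypothetical, each block holds a single value, and the "bus" is just a Python object that lets a cache call the snoop handlers of the other caches.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Bus:
    """Hypothetical shared bus: holds main memory and the attached caches."""

    def __init__(self):
        self.memory = {}   # address -> value in main memory
        self.caches = []   # every cache snoops traffic from the others

    def others(self, requester):
        return [c for c in self.caches if c is not requester]


class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}    # address -> (State, value); absent means INVALID
        bus.caches.append(self)

    def state(self, addr):
        return self.lines.get(addr, (State.INVALID, None))[0]

    def read(self, addr):
        if self.state(addr) is not State.INVALID:
            return self.lines[addr][1]              # hit, no bus traffic
        # Miss: broadcast a read. Holders write back dirty data and go SHARED.
        held_elsewhere = [c.snoop_read(addr) for c in self.bus.others(self)]
        value = self.bus.memory.get(addr, 0)
        new_state = State.SHARED if any(held_elsewhere) else State.EXCLUSIVE
        self.lines[addr] = (new_state, value)
        return value

    def write(self, addr, value):
        if self.state(addr) in (State.SHARED, State.INVALID):
            # Broadcast the address so every other copy gets invalidated.
            for cache in self.bus.others(self):
                cache.snoop_invalidate(addr)
        self.lines[addr] = (State.MODIFIED, value)

    def snoop_read(self, addr):
        """Another cache reads addr; return True if we hold a copy."""
        state = self.state(addr)
        if state is State.INVALID:
            return False
        value = self.lines[addr][1]
        if state is State.MODIFIED:
            self.bus.memory[addr] = value           # write back dirty data
        self.lines[addr] = (State.SHARED, value)
        return True

    def snoop_invalidate(self, addr):
        """Another cache writes addr; drop our copy (write back if dirty)."""
        if self.state(addr) is State.MODIFIED:
            self.bus.memory[addr] = self.lines[addr][1]
        self.lines.pop(addr, None)


def demo():
    bus = Bus()
    bus.memory[0x40] = 7
    c0, c1 = Cache(bus), Cache(bus)
    trace = []
    c0.read(0x40)                                   # only c0 holds the block
    trace.append((c0.state(0x40), c1.state(0x40)))
    c1.read(0x40)                                   # both hold it unmodified
    trace.append((c0.state(0x40), c1.state(0x40)))
    c0.write(0x40, 9)                               # c1's copy is invalidated
    trace.append((c0.state(0x40), c1.state(0x40)))
    value = c1.read(0x40)                           # c0 writes back, both share
    trace.append((c0.state(0x40), c1.state(0x40)))
    return trace, value, bus.memory[0x40]
```

Calling `demo()` walks one block through the states E, S, M and back to S on two caches. Every snoop here is a direct method call on all other caches, which mirrors why the shared bus becomes the bottleneck as the number of caches grows.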
## different hazards
- structural hazards: occur from resource conflicts when the hardware can't support all the possible combinations of instructions in simultaneous overlapped execution
- data hazards: occur when an instruction reads or writes data that an earlier instruction, still in a different stage of the pipeline, has not finished producing, so the value can be corrupted
- control hazards: occur from the pipelining of branches and other instructions that change the program counter
## what is pipelining
- keeping all stages of execution engaged all the time to maximize the work done is called pipelining.
## type of interrupts
- internal interrupts (software interrupts): caused by a software instruction, e.g. a trap instruction used to make a system call.
- external interrupts (hardware interrupts): caused by an external hardware module.
## cache mapping: maps memory blocks to cache locations.
- Direct mapping: easiest way, maps each block of the main memory into only one possible cache line. When a new block needs to be loaded, the old block is trashed. `i = j % m` (i: cache line no., j: main memory block no., m: number of lines in the cache)
- Associative mapping: fastest and most flexible, any block can go into any line of the cache. The word id bits are used to identify which word in the block is needed.
- Set-Associative mapping: the cache is divided into sets; a memory block maps to exactly one set and can be loaded into any location in that set.
## common rules of assembly language
- the label field can either be empty or may define a symbolic address.
- the instruction field can specify a machine instruction or a pseudo instruction
- the comment field can hold a comment or be left empty
- in case of symbolic addresses, up to 4 chars are allowed
- the comment field begins with '/', the symbolic address field is terminated by ","
## RAID: Redundant Array of Independent Disks
## Hardware methods to establish a priority:
- Parallel priority:
## JTAG (Joint Test Action Group)
- TDI (TestDataIn), TDO (TDOut), TMS (TModeSelect), TCK (TClock), TRST (TReset), SRST (SystemRST), RTCK (ReturnTCK) pins
- configure multi-core debugging
- C232HM-EDHSL (5v/450mA) and C232HM-DDHSL (3.3v/250mA (RPi)) JTAGtoUSB adapters
- OpenOCD (OnChipDebugger): software to which gdb can connect to debug the instructions that the cpu is currently executing
## Parallel port
- 3 8bit registers
- 1st 8bit register (data) is connected to 8 GPIO outputs and can be connected to a 7 segment display, for example
- 2nd byte (status) is connected to 5 GPIO inputs with internal pull up resistors.
- 3rd byte (control) is 4 outputs.
## AMD - Advanced Micro Devices
Zen architecture for the x86-64 based Ryzen series, released in 2017, design led by Jim Keller.
## ARM - Advanced RISC Machines
### Architectures
#### v4T
- Halfword and signed halfword/byte support
- System mode
- Thumb instruction set
#### v5TE
- Improved ARM/Thumb interworking
- CLZ
- Saturated arithmetic
- DSP multiply-accumulate
#### v6
- SIMD instructions
- multi-processing
- v6 Memory architecture
- unaligned data support
- Extensions:
  - Thumb-2 (v6T2)
  - TrustZone (v6Z)
  - Multicore (v6K)
  - Thumb only (v6-M)
#### v7
- Thumb-2
- NEON
- TrustZone
- Virtualization
- Architecture Profiles
  - v7-A (Applications): NEON
    - MMU, high efficiency, multitasking, trustzone, 40bitAddressing, virtualization extensions
  - v7-R (Real-time): Hardware divide
    - Protected memory (MPU): no virtual memory
    - Low latency and predictability for real-time needs
    - tightly coupled memories for fast, deterministic access
  - v7-M (Microcontroller): Hardware divide, Thumb-2 only
    - low gate count = low cost
    - deterministic and predictable behavior a key priority
    - deeply embedded use
- the architecture specifies the instruction set but can have different implementations
  - Cortex-A8 core is v7-A with 13-stage pipeline
  - Cortex-A9 core is v7-A with 8-stage pipeline
- ![CoresArchsFeatures](images/ArmArchsNCores.png)
### Data Size and Instruction Sets
- Nowadays not every instruction is strictly RISC, but the RISC traits remain: most instructions execute in a single cycle, orthogonal register set, load-store architecture
- ARM is a 32-bit load-store architecture; most internal registers are 32bit
  - the only memory accesses allowed are loads and stores
- ARM instruction set: 32 bit
- Thumb instruction set: 32/16 bit
  - switching between arm n thumb is called interworking, handled by the compiler n linker
  - Older cores support 16-bit thumb instructions only
  - Thumb-2 technology in current cores adds 32-bit instructions to Thumb
  - ARMv7-M only supports thumb
### Processor Modes
- Most ARM cores have seven basic operating modes
- each mode has access to its own stack space and a different subset of registers; this is called register banking
- Some operations can only be carried out in a privileged mode
- privileged modes
  - unrestricted access to hardware components and can execute privileged instructions, which can be dangerous
  - Supervisor: entered on reset and on a supervisor call instruction
  - FIQ: entered when a high priority interrupt is raised
  - IRQ: entered when a normal priority interrupt is raised
  - Abort: used to handle memory access violations
  - Undef: used to handle undefined instructions
  - all the above are called exception modes; register banking makes handling nested exceptions of different kinds much more efficient, but nesting exceptions of the same kind is complicated
  - System: privileged mode using the same registers as user mode
- unprivileged mode
  - user mode is unprivileged, can't disable interrupts or reconfigure mem access
- this mode structure only applies to Cortex-A n Cortex-R; for Cortex-M it's completely different, it has:
  - Thread Mode (unprivileged): for application code
  - Handler Mode (privileged): for exception handlers
  - the above modes are switched upon exception entry or return
  - by default they operate on separate stacks but can be configured to use the same stack, and both modes can also be configured to be privileged.
- switching occurs by register organization
- ![modeSwitching](images/ArmModeSwitch.gif)
- spsr (savedProgramStatusRegister) holds a snapshot of the current system state (CPSR) at the moment of the exception
- when operating in thumb state the fields in the instruction are not large enough to address all the registers, so they can directly address only the low registers (r0-r7); there are only 1 or 2 thumb instructions that can access the high registers (r8-r15)
### Cortex-M register set
- 13 general purpose registers r0-r7 (low) r8-r12 (high)
- StackPointer (SP): r13
- LinkRegister (LR): r14
- ProgramCounter (PC): r15
- 1 special register xPSR (ProgramStatusRegisters)
- sp is switched between HandlerMode and ThreadMode
### sPSR
- bit fields: N(31) Z(30) C(29) V(28) Q(27) J(24) GE\[3:0\](19:16) E(9) A(8) I(7) F(6) T(5) mode(4:0)
- Condition code flags
  - N: negative result from ALU
  - Z: zero result from ALU
  - C: ALU operation Carried out
  - V: ALU op resulted in signed oVerflow
- Q: Sticky overflow flag used by saturating instructions
- GE\[3:0\]: used to record multiple results from SIMD instructions
- 31 to 27 are the only bits that are modified by user mode instructions
- mode bits specify the current processor mode. in privileged mode these can be changed manually to change the processor mode.
- state bits
  - T: tells if executing ARM or Thumb instructions
  - J: Jazelle state: tells if the core is executing java byte code.
- I: irq, F: fiq enabling or disabling, A: can disable asynchronous data aborts, E: can change the endianness of the data interface dynamically
- remaining bits indicate internal system state and should never be modified.
### xPSR
- bit fields: N(31) Z(30) C(29) V(28) T(24) ExceptionNo.(8:0)
- T is always set to 1 coz it's M (Thumb only)
### Exceptions
- internal or external
- synchronous or asynchronous
- when an exception occurs a snapshot of the current state is saved by copying CPSR to SPSR and PC to LR; the core then switches to the appropriate exception mode, disables interrupts and uses the vector table to find the exception handler
- Vector Table: pointer to the handler of each exception type
  - 0x1C FIQ
  - 0x18 IRQ
  - 0x14 Reserved
  - 0x10 Data Abort
  - 0x0C Prefetch Abort
  - 0x08 Software Interrupt
  - 0x04 Undefined Instruction
  - 0x00 Reset
- once the handler is done the mode is switched back by copying SPSR to CPSR and LR to PC
### Exceptions in Cortex M are completely different
### Security Extensions (TrustZone)
- implements 2 virtual machines on the same hardware
  - Normal: Applications | Applications, GuestOS | GuestOS, Hypervisor
  - Secure: Trusted services, Trusted OS
- Transition is handled by the Secure Monitor Program
### ARM Instruction Set are 32bit
- Most instructions can be conditionally executed: each instruction has a condition field and is not executed if it doesn't match the current status of the ALU flags in CPSR. the majority are conditional
- load/store instruction set
  - no direct manipulation of memory
- syntax of instruction
  - `SUB r0,r1,#5` => r0=r1-5
  - `ADD r2,r3,r3,LSL #2` => r2=r3+(r3*4) i.e. r3 shifted left 2 places
  - `ANDS r4,r4,#0x20` // notice suffix S; means the ALU condition codes in CPSR will be updated, by default they are not changed if there is no suffix S
  - `ADDEQ r5,r5,r6` i.e. if (EQ) r5 = r5 + r6 // EQ: the Z flag is set, i.e. the last flag-setting result was zero/equal
  B