What is the Meltdown security vulnerability?
Meltdown, also known as Rogue Data Cache Load, is a security vulnerability that affects Intel x86 microprocessors, IBM Power processors, and some ARM-based microprocessors. It allows a rogue process to read memory that it is not authorized to access.
By exploiting a race condition between memory access and privilege checking, the vulnerability lets a process bypass the normal privilege check that isolates it from data belonging to other processes and the operating system. An unauthorized process can thereby read data from any address that is mapped into its own memory space. The data from the unauthorized address is loaded into the CPU’s cache before the privilege check takes effect, and it can be recovered from there through a side channel even though the check eventually fails and the instruction is abandoned. Because affected operating systems mapped kernel memory, physical memory, and through it other processes’ memory into every process’s address space, the rogue process could effectively read all of it without permission.
How does Meltdown exploitation work?
Meltdown takes advantage of the following features inherent in CPU design:
- Virtual memory
- Privilege levels
- Instruction pipelining and speculative execution
- CPU cache
These features underpin how modern CPUs work, and each is considered secure on its own. However, Meltdown takes advantage of how they interact with each other. The following points explain the exploitation mechanism:
- To maximize efficiency, the operating system maps kernel memory into the virtual address space of every process and does not perform its own checks on each access. Keeping that memory off-limits is entrusted to the CPU’s privilege control. Like every other process, the rogue process is assigned such an address space.
- If a process tries to read from unauthorized memory, the read instruction is still scheduled and pipelined by the CPU. Before the instruction is allowed to produce any output, the privilege check completes elsewhere. In the case of an unauthorized read, the execution unit is told that the instruction failed the privilege check.
- In the early stages of instruction execution, the CPU’s scheduler schedules two events: a privilege check, and the first steps of executing the instruction. While waiting for the privilege check to complete, the execution unit begins by fetching the data. For the rogue process, the data comes from an unauthorized address, but the memory controller still fetches it during this initial stage, even though the result is discarded once the privilege check completes and fails.
- Although the instruction fails and the execution unit discards the data, the fetch has already had a side effect: the CPU cache was updated automatically when the data was fetched from memory, in case the same data is needed again shortly. This is where Meltdown comes in.
- By using a cache timing side-channel attack, the rogue process can determine whether data from a specific address is held in the CPU cache, even if it cannot read the actual data from there.
- If data from some address has been cached by the CPU, a second instruction that reads that address is served from the cache (fast). If not, the CPU has to request the data from memory (slower). The rogue process can use this difference in timing to detect which of the two took place, and therefore whether the address was already in the CPU cache. In practice, the rogue process has the CPU compute a second address from the value of the unauthorized data, so that which address ends up cached reveals the value itself. Combined with other features of the CPU instruction set, this gives Meltdown full access to all mapped memory.
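The steps above can be illustrated with a toy simulation. This is only a sketch of the logic, not an exploit: it models the cache as a Python set and uses made-up latencies instead of measuring real hardware. The names ToyCPU, transient_read, timed_read, recover_byte, and recover_bytes, as well as the cycle counts and addresses, are assumptions introduced for illustration.

```python
# Toy model of the Meltdown flow on a simulated CPU. Nothing here touches
# real hardware; all names and numbers are hypothetical.

CACHE_HIT_CYCLES = 10     # assumed latency of a read served from the cache
CACHE_MISS_CYCLES = 200   # assumed latency of a read served from memory
LINE_SIZE = 4096          # spacing between probe entries, one per byte value


class ToyCPU:
    def __init__(self, memory, kernel_addresses):
        self.memory = memory                      # address -> byte value
        self.kernel_addresses = kernel_addresses  # addresses user code may not read
        self.cached_lines = set()                 # addresses currently in the cache

    def flush(self):
        """Empty the simulated cache so every probe line starts uncached."""
        self.cached_lines.clear()

    def transient_read(self, address, probe_base):
        """Model an unauthorized load that races against the privilege check."""
        value = self.memory[address]  # data is fetched before the check completes
        # Dependent access: the fetched value selects which probe line gets cached.
        self.cached_lines.add(probe_base + value * LINE_SIZE)
        if address in self.kernel_addresses:
            # The check fails: the result is discarded, but the cache keeps its state.
            raise PermissionError("privilege check failed")
        return value

    def timed_read(self, address):
        """Return the simulated latency of reading one address."""
        if address in self.cached_lines:
            return CACHE_HIT_CYCLES
        self.cached_lines.add(address)  # a miss brings the line into the cache
        return CACHE_MISS_CYCLES


def recover_byte(cpu, address, probe_base=0x100000):
    """Recover one byte purely from which probe line reads fast."""
    cpu.flush()
    try:
        cpu.transient_read(address, probe_base)
    except PermissionError:
        pass  # the attacker never receives the value directly
    timings = [cpu.timed_read(probe_base + guess * LINE_SIZE) for guess in range(256)]
    return timings.index(min(timings))


def recover_bytes(cpu, start, length):
    """Repeat the single-byte recovery over a range of addresses."""
    return bytes(recover_byte(cpu, start + offset) for offset in range(length))


if __name__ == "__main__":
    secret = b"kernel secret"
    kernel_base = 0xFFFF0000
    memory = {kernel_base + i: b for i, b in enumerate(secret)}
    cpu = ToyCPU(memory, kernel_addresses=set(memory))
    print(recover_bytes(cpu, kernel_base, len(secret)))  # b'kernel secret'
```

In this model the direct read always raises PermissionError, yet the byte is still recovered, because the only thing the attacker inspects is which of the 256 probe lines is fast. A real attack has to measure actual access times and cope with noise, which this sketch deliberately leaves out.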
How have processor vendors addressed this vulnerability?
Publicly disclosed in January 2018, the Meltdown vulnerability caused considerable alarm throughout the processor industry and among hardware and software firms. In response, Intel, ARM, and other affected vendors, along with operating system developers, released patches for their own products. Most of them implemented broadly the same mitigation strategy, called kernel page-table isolation (KPTI), which separates user-space and kernel-space page tables entirely. One set of page tables includes both kernel-space and user-space addresses, as before, but it is only used when the system is running in kernel mode. The second set of page tables, for use in user mode, contains a copy of the user-space mappings and a minimal set of kernel-space mappings that provide the information needed to enter or exit system calls, interrupts, and exceptions.
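The two sets of page tables described above can be sketched as a small model. This is an illustration under simplifying assumptions, not how any kernel implements it: page tables are reduced to sets of named regions, and the region names and the functions build_page_tables and reachable_from_user_mode are hypothetical.

```python
# Toy model of kernel page-table isolation. Page tables are reduced to sets
# of named regions; all names are hypothetical.

USER_PAGES = {"user_code", "user_heap", "user_stack"}
KERNEL_PAGES = {"kernel_text", "kernel_data", "physmap", "entry_trampoline"}
# Minimal kernel mappings needed to enter or exit system calls,
# interrupts, and exceptions.
ENTRY_EXIT_PAGES = {"entry_trampoline"}


def build_page_tables(kpti_enabled):
    """Return the mappings visible in kernel mode and in user mode."""
    kernel_table = USER_PAGES | KERNEL_PAGES  # full view, kernel mode only
    if kpti_enabled:
        # User mode sees its own pages plus the minimal entry/exit mappings.
        user_table = USER_PAGES | ENTRY_EXIT_PAGES
    else:
        # Before the mitigation, one table served both modes.
        user_table = set(kernel_table)
    return {"kernel": kernel_table, "user": user_table}


def reachable_from_user_mode(tables, page):
    """Meltdown can only leak addresses that are mapped while in user mode."""
    return page in tables["user"]


if __name__ == "__main__":
    for enabled in (False, True):
        tables = build_page_tables(enabled)
        print(enabled, reachable_from_user_mode(tables, "kernel_data"))
```

The point of the design is that an unmapped address cannot be fetched at all, so the speculative load has nothing to bring into the cache. The cost is that the CPU must switch page tables on every transition between user mode and kernel mode.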
Although KPTI protects systems against Meltdown, it also costs performance, with some Intel processors reported to lose up to 30% of their performance on certain workloads. However, Intel has assured its customers that the performance impact will be reduced over time.
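On Linux systems with a sufficiently recent kernel, the mitigation status is reported through a sysfs file. The snippet below is a minimal sketch that assumes such a system; the function name meltdown_status is hypothetical, and on other operating systems or older kernels the file does not exist, in which case the function returns None.

```python
from pathlib import Path

# sysfs file in which recent Linux kernels report the Meltdown status.
MELTDOWN_STATUS_FILE = Path("/sys/devices/system/cpu/vulnerabilities/meltdown")


def meltdown_status(path=MELTDOWN_STATUS_FILE):
    """Return the kernel's status line, or None if the file is unavailable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        # Not Linux, an older kernel, or sysfs is not readable.
        return None


if __name__ == "__main__":
    status = meltdown_status()
    print(status if status is not None else "status not available on this system")
```

Typical values include "Mitigation: PTI", "Not affected", and "Vulnerable".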
Several guidelines have also been published to help end users protect themselves against the newly discovered vulnerability. These include regularly updating software, not clicking on unrecognized hyperlinks, and not downloading software or documents from untrusted sources.
References:
https://en.wikipedia.org/wiki/Meltdown_(security_vulnerability)
https://meltdownattack.com/