Exetools - Windows on Arm64, x86/x64 emulation

Exetools (https://forum.exetools.com/index.php)

- General Discussion (https://forum.exetools.com/forumdisplay.php?f=2)

- - Windows on Arm64, x86/x64 emulation (https://forum.exetools.com/showthread.php?t=20066)

Windows on Arm64, x86/x64 emulation

I'm wondering how the x86/x64 emulation on arm64 works, is the image loaded to memory the x86/x64 original and there is just a transpiled arm64 shadow copy, or is the code transpiled before and the loaded image is arm64 only.

I don't have any arm hardware yet, but am already thinking about how function hooks in x86/x64 emulated on arm64 would, or maybe wouldn't, work.

Anyone here who already has some experience and could shine some light on it?

I have found some information: https://blogs.blackberry.com/en/2019/09/teardown-windows-10-on-arm-x86-emulation

So it seams that what lives in memory is the x86/x64 image and its only transpiled piece by piece on demand. I hope a call to FlushInstructionCache will invalidate the cached transliled result ...

EDIT: now I just need some good device to experiment on, any suggestions?

Was about to post that link but apparently I'm late and you found it, this one also has a good information https://wbenny.github.io/2018/11/04/wow64-internals.html

Also this fantastic CODE BLUE talk which has juicy details:
- https://www.slideshare.net/ffri/appearances-are-deceiving-novel-offensive-techniques-in-windows-1011-on-arm-250472833
- https://www.youtube.com/watch?v=amHAot3X8cE

You can try to run ARM64 windows on QEMU.
Im not sure about the part of capability of running x86/x64 app on ARM64 Windows tho

@sh3dow thanks

@kino0924 emulating arm64 on x64 which in turns emulates x86/x64 sounds painfully slow.
That I'm probably better of trying to run it naively on an old raspberry pi :p

Also interesting: https://oofhours.com/2021/02/19/running-x64-on-windows-10-arm64-how-the-heck-does-that-work/

So some things I learned about x64 on arm64, it seams MSFT went all in to provide a good interoperability between x64 and arm 64 code.

They have introduced a new type of PE so called CHPEv2 (Compiled Hybrid Portable Executable) which can contain booth x64 and arm64 code, as far as I understand in practice this is mostly arm64 code with x64 entry-points.
These are called ARM64EC in the VS 2019/2022 tool chain.

An executable compiled as x64 can load ARM64EC dll's and call them normally,
the x64 wrappers have some (as far as I can tell) dummy prollogs, large enough to install x64 hooks.
The intended use case for this is to load system libraries from system32 which on arm64 are all provided as such CHPE's so no C:\Windows\SysX8664 or alike.

So to say 64 bit is 64 bit no mater if its ARM or AMD, also no separate registry paths. Its all thoroughly mixed together, unlike SysWOW64 or SysArm32 that booth get an own system directory and an own registry redirection.

A executable compiled as ARM64EC can load x64 Dll's just fine, haven't looked into the hook ability in this scenario yet (TODO)
The intended use case is to allow developers to port a part of their application to Arm64 and keep the rest x64 for the time being, as well as to provide compatibility with x64 plugins and extensions (according to MSFT's docs).

So technically the x64 on arm thing is an own feature and should not be confused with being an other version or an extension of WOW its a separate interoperability feature, one which works quite different, unfortunately I don't know if MSFT named it somehow. Based on some dll and service names its probably called XTA that would probably stand for "x86_x64 to ARM", or something like that.

XTA is a just in time compiler that converts x64 or x86 code when needed to arm64 which is the being executed, the loaded binary image in memory stays untouched x64/x86, hooking any portion of it seams to work just fine, of cause FlushInstructionCache is probably particularly important to ensure the XTA cache gets updated.

When running a x86 application on a arm64 machine in fact booth WOW and XTA seam to work together, while the SysWOW64 directory contains only x86 dll's there is an other one called SyChpe32 that contains something like ARM64EC just in 32 bit so ARMEC (?) unfortunately MSFT did not provide a Toolchain to create such binaries for ourselves. There we have the most commonly used system dll's in such a hybrid format.

So WOW takes care of the syscall translation from 32 bit to 64 bit close to the kernel, be that on arm64 or on x64, and filesystem + registry redirection. While XTA takes care of the transition from emulated native code as close to the loaded user code.

In between lives native arm/arm64 code.

When running x64 on arm 64 only XTA is active and no WOW is in place, a call to IsWow64Process2 confirms that an x64 application running on arm64 does so without WOW.

Interestingly when querying NtQuerySystemInformationEx(SystemSupportedProcessorArchitectures, ...
ARM64EC binaries give the same result x64 binaries on arm64

Also when debugging ARM64EC binaries you need to use the x64 debugger, there is no dedicated ARM64EC one.

Interesting, I was under the impression it was booth the same, ARM64 code with provisions to
a) call x64 code
and
b) be callable from x64 code
so the ARM64X version contains full code for booth platforms, hence would be runable on a x64 windows as well?

So how is ARM64X different form a dll compiled as ARM64EC?

So they have a new type of relocation that "relocates" imports, etc. rva depending on arm vs x64? That's actually pretty cool, I am now only half-mad they introduced yet another relocation type...

I have been experimenting with my HackLib with code injection into/from Arm64 processes and noticed something unexpected but in hindsight logic ARM64EC processes are more x64 than arm even if most of their compiled code is arm.
That I mean, when I hijack the main thread before it resumes, and point it to a peace of shell code of mine the shell code must be x64 in order to work.
The funny thing is my shell code loads a dll that hooks then in x64 various functions, for demo purposes for example MessageBoxW and alters its title and all that works as expected.
The ARM64EC app having its main and rest of user written functions being ARM64 code when calling MessageBoxW invoked the hooked x64 version just fine.
To be honest I would have expected a shorthand here where the arm code calls system API's directly, but no the control flow goes from Arm64 to the x64 stub which can be hooked and then back to Arm64 code in the system\user32.dll that is pretty nice. And in a way logical that a process that is supposed to be able to load native x64 dll's would for ideal compatibility have provisions to allow that dll to consistently hook functions even if that means a few more detours are taken.

Now about cross architecture code injection (assuming shell code and injected dll have the right architecture):
Code injection from an arm64 process into an x64 or arm64ec process works just fine using x64 code

What how ever currently fails is code injection from a x64 or arm64ec process into a arm64 process and it seams for a quite mundane reason, NtSetContextThread when called from a x64 process to act upon a arm64 process, returns STATUS_SET_CONTEXT_DENIED
I expect would the operation have been performed, the rest would succeed.

Since arm64 to arm64 injection using the same method works just fine I expert this being som sort of intentional limitation, perhaps not even a real security measure but rather a safeguard against screwing things up.

So the question, did anyone here already experienced this and would know if there is a easy way around, short of native arm64 spawning a helper process just to do that one function call.

Found this on my feed and wanted to share :)

├─Jack-in-the-Cache: A New Code injection Technique through Modifying X86-to-ARM Translation Cache (this one is from the same author I mentioned before in #3).
│ it's was presented for BlackHat con also the video this time is in English while the code blue one was in Japanese.
│
│─────https://i.blackhat.com/eu-20/Wednesday/eu-20-Nakagawa-Jack-In-The-Cache-A-New-Code-Injection-Technique-Through-Modifying-X86-To-Arm-Translation-Cache.pdf
│
└─────ttps://www.youtube.com/watch?v=8wg7X5IaEto

Quote:

Recently, the adoption of ARM processors for laptop computers is becoming popular due to its high energy efficiency. Windows 10 on ARM is a new OS for such ARM-based computers. Several laptop computers with this OS have already been shipped; notably, the recent launch of Microsoft Surface Pro X will be a driving force to facilitate the widespread use of Windows 10 on ARM.

You might think that there are new threats to such a new OS. Yes! We found such a threat.

In this talk, we present a new code injection technique to abuse a novel feature of Windows 10 on ARM: X86 emulation.
Remarkably, Windows 10 on ARM can run X86 apps via the X86 emulation feature that translates binary from X86-to-ARM just in time. To reduce the performance overhead of JIT binary translation, the OS has the mechanism to cache already-translated results as X86-to-ARM (XTA) cache files.

Our new code injection technique is performed by modifying this XTA cache file. Since this technique is difficult to detect and trace, appropriate countermeasures are necessary. Moreover, this technique can be used as an API hooking invisible to an X86 process. Therefore, this technique has already been a threat to Windows 10 on ARM.

We believe that future OSs have a JIT translation mechanism at the processor transition. For example, Apple has recently announced Rosetta 2, which is a similar mechanism for introducing their own ARM-based chip. For these OSs, the caching of already-translated results as files is a reasonable way to decrease performance overhead.

Our new code injection technique might also apply to such OSs.This presentation becomes a beneficial advisory for the developers of such future OSs, not limited to Windows 10 on ARM. PoC code of our new code injection technique and analysis results of the X86 emulation will be public on GitHub after this talk.

Excellent blog from the same author as well
├─Discovering a new relocation entry of ARM64X in recent Windows 10 on Arm
│
│─────https://ffri.github.io/ProjectChameleon/new_reloc_chpev2/
│
└─────https://github.com/FFRI/ProjectChameleon/

hmpf... sooo I have a nice function hijacking code that from arm64 to arm64 works perfectly.

Now we also know that all processes in arm64 start execution as arm64 (or at least I think that) so at the very start of every program we should enter ntdll's LdrInitializeThunk and for arm64 on arm64 its of cause so.
For other architectures on arm64 we should at some point divert into emulated code.

When trying to inject a detour in LdrInitializeThunk of a created suspended x64 process on arm64 however that code does not seam to ever be executed. Meaning I can inject garbage and it will still startup just fine.

Now my assumption of how x64 on arm64 works is that as soon as execution goes into a system dll i.e. anything compiled as ARM64X we exit emulated x64 mode and execute the native arm code in the dll. So it stands to argue that wen bootstrapping a process, it behaves analogously everything is executed in native arm until it comes the time to call the x64 processes entry point.

Well it seams something isn't quite right here, one possibility is that the ARM64X dll's truly have all the code doubled including large portions of the arm code, so when I manipulate the LdrInitializeThunk I get i do it to a copy that will never be used.

Now I find that strange I would have assumed that the code wouldn't be doubled that MSFT would have some smart redirection in place allowing the ARM64X dll's to re use most of the arm code for the native and the emulated mode.

@RamMerLabs since you apparently have already a lot of experience with the layout of the new PE files, would you may be have a few tips is that really so that the code is fully separated?

After some more research I can confirm that this is what happens, when i get the address of #LdrInitializeThunk from the symbol file for ntdll and use these my injected code works.
Sooo... the next question is how to get the "export" addresses without the need of a pdb file.
It was already earlier written that this ARM64X files have a 2nd export directory, so I guess parsing that "by hand" would be the strait forward approach.
Unless there is a flag that can be passed to LdrLoadDll that would do this for me ?

Splendid! Where can i find some documentation/reverse engineering notes on the new PE features? As for my use case it's not enough to see them in a tool i need to programmatically get to the relevant addresses for code injection.
I already have found this: https://ffri.github.io/ProjectChameleon/new_reloc_chpev2/ but its from 2019 i imagine by now there should be more refined information available?