Avoids potentially expensive (depending on the size of the memory block)
allocations by reserving the necessary memory before performing both
insertions. This avoids scenarios where the second insert may cause a
reallocation to occur.
Avoids needing to read the same long sequence of code in both code
paths. Also makes it slightly nicer to read and debug, as the locals
will be able to be shown in the debugger.
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics
Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.
To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:
* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true
ballotARB, also known as "uint64_t(activeThreadsNV())", emits
VOTE.ANY Rd, PT, PT;
on nouveau's compiler. This doesn't match exactly to Nvidia's code
VOTE.ALL Rd, PT, PT;
Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
We can simply enable CMAKE_AUTOUIC and let CMake take care of handling
the UI code generation for targets.
As part of letting CMake automatically handle the header file parsing,
we must not name includes with "ui_*" unless they're related to the
output of the Qt UIC compiler. Because of this, we need to rename
ui_settings, given it would conflict with this restriction.
This commit ensures that the host gpu is constantly fed with commands to
work with, while the guest gpu keeps producing the rest of the commands.
This reduces syncing time between host and guest gpu.
This commit fixes offsets on Linear -> Tiled copies, corrects z pos
fortiled->linear copies, corrects bytes_per_pixel calculation in tiled
-> linear copies and relaxes some limitations set by latest dma fixes
refactors.
This commit ensures that all backing memory allocated for the Guest CPU
is aligned to 256 bytes. This due to how gpu memory works and the heavy
constraints it has in the alignment of physical memory.
Audio devices use the supplied revision information in order to
determine if USB audio output is able to be supported. In this case, we
can only really handle using this revision information in
ListAudioDeviceName(), where it checks if USB audio output is supported
before supplying it as a device name.
A few other scenarios exist where the revision info is checked, such as:
- Early exiting from SetAudioDeviceOutputVolume if USB audio is
attempted to be set when that device is unsupported.
- Early exiting and returning 0.0f in GetAudioDeviceOutputVolume when
USB output volume is queried and it's an unsupported device.
- Falling back to AHUB headphones in GetActiveAudioDeviceName when the
device type is USB output, but is unsupported based off the revision
info.
In order for these changes to also be implemented, a few other changes
to the interface need to be made.
Given we now properly handle everything about ListAudioDeviceName(), we
no longer need to describe it as a stubbed function.
The revision querying facilities are used by more than just audren. e.g.
audio devices can use this to test whether or not USB audio output is
supported.
This will be used within the following change.
AudioDevice and AudioInterface aren't valid device names on the Switch.
We should also be returning consistent names in
GetActiveAudioDeviceName().
While we're at it, we can also handle proper name output in
ListAudioDeviceName, by returning all the available devices on the
Switch.
This is the default behavior of the copy constructor, so it doesn't need
to be specified.
While we're at it we can make the other non-default constructor
explicit.
Prevents a truncation warning from occurring with MSVC. Also the
internal data structures already treat it as a size_t, so this is just a
discrepancy in the interface.
Textures can have different components types in different orders. This
assert was completely inprecise and the effectiveness of such is better
handled by case and within the texture cache.
Conditional Rendering takes care of conditionaly clearing or drawing
depending on a set of queries. This PR implements the query checks to
stablish if things can be rendered or not.
These are std::shared_ptr instances underneath the hood, which means
copying them isn't as cheap as a regular pointer. Particularly so on
weakly-ordered systems.
This avoids atomic reference count increments and decrements where they
aren't necessary for the core set of operations.
While changing this code, simplify tracking code to allow returning
the base address node, this way callers don't have to manually rebuild
it on each invocation.
Creating multiple "AudioRenderer" threads cause the previous thread to be overwritten. The thread will name be renamed to AudioRenderer-InstanceX, where X is the current instance number.
Provides a basic implementation of SetAutoSleepDisabled. Until idle
handling is implemented, this is about the best we can do.
In the meantime, provide a rough documenting of specifics that occur
when this function is called on actual hardware.
The JIT is mature enough that this setting can be removed, falling back
to Unicorn only on unsupported architectures. Any missing features from
Unicorn (of which there are extremely few), are mostly
developer-oriented, which most users don't care about.
Features should be coordinated with the JIT, not the interpreter,
anyhow.
This was initially necessary when AArch64 JIT emulation was in its
infancy and all memory-related instructions weren't implemented.
Given the JIT now has all of these facilities implemented, we can remove
these functions from the CPU interface.
Prior to PR, Yuzu did not restore memory to RW-
on unmap of mirrored memory or unloading of NRO.
(In fact, in the NRO case, the memory was unmapped
instead of reprotected to --- on Load, so it was
actually lost entirely...)
This PR addresses that, and restores memory to RW-
as it should.
This fixes a crash in Super Smash Bros when creating
a World of Light save for the first time, and possibly
other games/circumstances.
We don't have any friends implemented in Yuzu yet so it doesn't make sense to return any friends. For now we'll be returning 0 friends however the information provided will allow a proper implementation of this cmd when needed.
must_reconfigure isn't a parameter for this function any more, so it can
be replaced with current_state.
While we're at it, we can make the parameters of the declaration match
the same name as the ones in the definition.
This sets the DeviceMapped attribute for GPU-mapped memory blocks,
and prevents merging device mapped blocks. This prevents memory
mapped from the gpu from having its backing address changed by
block coalesce.
This commit implements gl_ViewportIndex and gl_Layer in vertex and
geometry shaders. In the case it's used in a vertex shader, it requires
ARB_shader_viewport_layer_array. This extension is available on AMD and
Nvidia devices (mesa and proprietary drivers), but not available on
Intel on any platform. At the moment of writing this description I don't
know if this is a hardware limitation or a driver limitation.
In the case that ARB_shader_viewport_layer_array is not available,
writes to these registers on a vertex shader are ignored, with the
appropriate logging.
This implements svcMapPhysicalMemory/svcUnmapPhysicalMemory for Yuzu,
which can be used to map memory at a desired address by games since
3.0.0.
It also properly parses SystemResourceSize from NPDM, and makes
information available via svcGetInfo.
This is needed for games like Super Smash Bros. and Diablo 3 -- this
PR's implementation does not run into the "ASCII reads" issue mentioned
in the comments of #2626, which was caused by the following bugs in
Yuzu's memory management that this PR also addresses:
* Yuzu's memory coalescing does not properly merge blocks. This results
in a polluted address space/svcQueryMemory results that would be
impossible to replicate on hardware, which can lead to game code making
the wrong assumptions about memory layout.
* This implements better merging for AllocatedMemoryBlocks.
* Yuzu's implementation of svcMirrorMemory unprotected the entire
virtual memory range containing the range being mirrored. This could
lead to games attempting to map data at that unprotected
range/attempting to access that range after yuzu improperly unmapped
it.
* This PR fixes it by simply calling ReprotectRange instead of
Reprotect.
Prior to execution within a process beginning, the process establishes
its own TLS region for uses (as far as I can tell) related to exception
handling.
Now that TLS creation was decoupled from threads themselves, we can add
this behavior to our Process class. This is also good, as it allows us
to remove a stub within svcGetInfo, namely querying the address of that
region.
Previously, a translated string was being appended onto another string
in a manner that doesn't allow the translator to control where the
appended text is placed. This can be a nuisance for languages where
grammar and text ordering differs from English.
We now append the strings via the format strings themselves, which
allows translators to reorder where the text will be placed.
Instead of passing by copy an execution context through out the whole
Vulkan call hierarchy, use a command buffer view and fence view
approach.
This internally dereferences the command buffer or fence forcing the
user to be unable to use an outdated version of it on normal usage.
It is still possible to keep store an outdated if it is casted to
VKFence& or vk::CommandBuffer.
While changing this file, add an extra parameter for Flush and Finish to
allow releasing the fence from this calls.
Provides a more accurate name for the memory region and also
disambiguates between the map and new map regions of memory, making it
easier to understand.
Handles the placement of the stack a little nicer compared to the
previous code, which was off in a few ways. e.g.
The stack (new map) region, shouldn't be the width of the entire address
space if the size of the region calculation ends up being zero. It
should be placed at the same location as the TLS IO region and also have
the same size.
In the event the TLS IO region contains a size of zero, we should also
be doing the same thing. This fixes our memory layout a little bit and
also resolves some cases where assertions can trigger due to the memory
layout being incorrect.
Taking the json instance as a constant reference, makes all moves into
the parameter non-functional, resulting in copies. Taking it by value
allows moves to function.