Introduction

A Basic Python Tutorial

Installation

We support Python 3. Python wheels for Windows x64 are published to PyPI, so to install:

pip install synthizer

Some Basics

Synthizer requires initialization and deinitialization. The best way to initialize Synthizer is to use the context manager:

with synthizer.initialized():
    # Code...

The context manager will handle deinitialization on exceptions. Though synthizer.initialize() and synthizer.shutdown() are exposed, their use is discouraged: the context manager ensures clean shutdown in (almost) all cases.

Once the library is deinitialized, calls into Synthizer will error.

Destroying Objects

Unlike most Python objects, Synthizer objects represent concrete pieces of audio architecture. They need a defined lifetime so that you control when they are and aren't audible, and so that it's clear when they are and aren't using system resources. To that end, Synthizer objects don't participate in Python garbage collection. When you're done with an object, call myobj.destroy().

At the moment, failing to do so will permanently leak the object. Future work may lift this restriction, but it's still necessary to be explicit: if you aren't, things may be audible longer than you intend.
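For example, a minimal sketch of cleaning up a short-lived sound, using objects introduced later in this tutorial:

# Assumes ctx and buffer already exist, as shown later in this tutorial.
generator = synthizer.BufferGenerator(ctx)
generator.buffer = buffer
source = synthizer.Source3D(ctx)
source.add_generator(generator)
# ... later, once the sound should stop being audible and release its resources:
source.destroy()
generator.destroy()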

The Context

Most objects in Synthizer require a context, which represents a listener in 3D space, an audio output device, and other miscellaneous infrastructure. Objects are passed a context on construction, and two objects from different contexts never interact.

To get a context:

ctx = synthizer.Context()

Currently, selecting audio output devices isn't supported, and audio will go to the system default.

Buffers, Streams, and Generators

As elaborated in the concepts section, the Synthizer audio graph is as follows:

  • A source is fed by one or more generators and pans audio.
  • All the sources feed the context's output.

Generators are an abstract concept representing somewhere audio comes from. Specific kinds of generators implement the abstract interface in a concrete fashion, notably BufferGenerator (takes a buffer) and StreamingGenerator (takes streaming parameters).

The easiest way to get audio into Synthizer is via streams. Streams are specified as pre-parsed URL-like components:

  • A protocol, file for example, which specifies where the audio comes from.
  • A path, which specifies where the audio is (i.e. path on disk, etc).
  • Options, of the form key=value&key=value&.... At the moment, unused by anything, so just use "".

To play back a stream, you have two choices: you can use a StreamingGenerator, which takes these parameters directly and will decode in realtime, or you can use a BufferGenerator, which takes a pre-decoded buffer that you made previously from a stream specification. You use StreamingGenerator for things like music and BufferGenerator for things like short sounds. Note that StreamingGenerator is expensive and relatively high latency.

Buffers are in-memory decoded assets, essentially arrays of 16-bit samples resampled to Synthizer's samplerate. Note that they aren't actually contiguous arrays and are also immutable.

To get a streaming generator:

generator = synthizer.StreamingGenerator(ctx, "file", "test.wav")

To get a buffer from a stream:

buffer = synthizer.Buffer.from_stream("file", "test.wav")

Note the following:

  • Buffers aren't associated with a context and can be used anywhere.
  • Buffers can be used simultaneously by multiple generators (or whatever else). You are encouraged to cache them and reuse them indefinitely.
  • When Synthizer needs to overload a constructor, it does so as staticmethods on the class. We don't have multiple constructors here yet, but will in future.

Sources

Sources represent audio output. They are fed by one or more generators, combine them, and output the result. Currently we have the following kinds of sources:

  • A PannedSource is manually controlled using azimuth and elevation or, alternatively, a panning scalar.
  • Source3D is a 3D environmental source with the usual things you'd expect: distance model, position, etc.

Properties

Synthizer offers the following kinds of properties:

  • int, represented as either a Python integer type or an enum. An example of the latter case is source.distance_model, which is a synthizer.DistanceModel. Synthizer exposes enums as Python 3.4-style enums via Cython.
  • Double, which is self-explanatory.
  • Double3, which is a tuple of 3 doubles. Usually used as a position.
  • Double6, a tuple of 6 doubles. Usually used as an orientation (given a dedicated section below).
  • Object, i.e. buffer_generator.buffer = b.

It is important to note that Synthizer properties are eventually consistent. What this means is that code like the following doesn't do what you expect:

myobj.property = 0
myobj.property += 5
myobj.property += 5
# May or may not fail, depending on timing.
assert myobj.property == 10

myobj.property = 15
x = myobj.property
# may or may not fail depending on timing.
assert x == 15

Property reads are primarily useful for properties like position on various generators, where Synthizer is updating the property itself. In general, it's best to use properties to tell Synthizer what to do, but keep the model of what's supposed to be going on in your code. A common mistake is to try to use Synthizer to store data, for example putting the position of your objects in a source rather than maintaining the coordinates yourself.
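For example, a minimal sketch of the intended pattern (the Player class is hypothetical application code; only source.position is Synthizer API):

class Player:
    def __init__(self, source):
        # Your application owns the authoritative position.
        self.x, self.y, self.z = 0.0, 0.0, 0.0
        self.source = source

    def move(self, dx, dy, dz):
        # Update your own model first...
        self.x += dx
        self.y += dy
        self.z += dz
        # ...then tell Synthizer what to do. Don't read it back.
        self.source.position = (self.x, self.y, self.z)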

Object properties are internally referenced in a weak fashion. That is to say that destroying (.destroy()) the object the property is set to will clear the property.

In the Python bindings, Synthizer translates SYZ_P_MY_PROPERTY to obj.my_property. The translations of the tables in the object reference are nearly mechanical, and this simple transformation always tells you where the property lives in Python.

An aside: orientation formulas

Before going much further, most people seem to eventually ask about trigonometry with respect to using a 3D audio library for 2D games. The high level overview for those who already know trigonometry is that Synthizer's coordinate system is right-handed and orientations consist of 2 orthogonal unit vectors (atx, aty, atz, upx, upy, upz) stored as a packed property so that they can both be set atomically. But the longer version for those who don't know trigonometry is:

Degrees to radians is:

import math

def deg2rad(angle):
    return (angle / 180.0) * math.pi

People who don't know trig usually ask for orientations that are clockwise of north. To do that:

import math

def make_orientation(degrees):
    rad = deg2rad(degrees)
    return (math.sin(rad), math.cos(rad), 0, 0, 0, 1)

Setting context.orientation to the result of the above will set things up so that you can treat positive x as east, positive y as north, and positive z as up. The default orientation faces the listener north.

Putting it Together

To play a source in 3D space, do the following:

  • Create a context.
  • Create a source.
  • Create a buffer.
  • Create a generator.
  • generator.buffer = buffer
  • source.add_generator(generator)
  • Then manipulate position, etc. on the source.

A Worked Example

The following is a 3D media player, the audio library equivalent of hello world. It supports the following commands:

  • pause, play: pause/play the source
  • seek <seconds>: self-explanatory.
  • pos <x> <y> <z>: move the source. X is right, y is forward, z is up.
  • loop: Toggle looping of the generator.
  • gain <value>: Control the gain of the source, in DB.
  • quit: self-explanatory.

Note that the default distance model parameters cause the source to become completely silent at around 50 units out. Movements close to the head won't change the volume much. Also, HRTF improvements are coming.

This example also doesn't demonstrate destruction, as that's handled by library deinitialization and process shutdown. A proper program needs source.destroy() etc for dynamic sources, as explained above.

The code:

"""A simple media player, demonstrating the Synthizer basics."""

import sys

import synthizer

if len(sys.argv) != 2:
    print(f"Usage: {sys.argv[0]} <file>")
    sys.exit(1)

# Log to debug. At the moment this writes directly to stderr, but will in
#  future integrate with Python's logging modules.
# It's best to call this before any initialization.
synthizer.configure_logging_backend(synthizer.LoggingBackend.STDERR)
synthizer.set_log_level(synthizer.LogLevel.DEBUG)

with synthizer.initialized():
    # Get our context, which almost everything requires.
    # This starts the audio threads.
    ctx = synthizer.Context()

    # A BufferGenerator plays back a buffer:
    generator = synthizer.BufferGenerator(ctx)
    # A buffer holds audio data. We read from the specified file:
    buffer = synthizer.Buffer.from_stream("file", sys.argv[1])
    # Tell the generator to use the buffer.
    generator.buffer = buffer
    # A Source3D is a 3D source, as you'd expect.
    source = synthizer.Source3D(ctx)
    # It'll play the BufferGenerator.
    source.add_generator(generator)
    # Keep track of looping, since property reads are expensive:
    looping = False

    # A simple command parser.
    while True:
        cmd = input("Command: ")
        cmd = cmd.split()
        if len(cmd) == 0:
            continue
        if cmd[0] == "pause":
            source.pause()
        elif cmd[0] == "play":
            source.play()
        elif cmd[0] == "pos":
            if len(cmd) < 4:
                print("Syntax: pos x y z")
                continue
            try:
                x, y, z = [float(i) for i in cmd[1:]]
            except ValueError:
                print("Unable to parse coordinates")
                continue
            source.position = (x, y, z)
        elif cmd[0] == "seek":
            if len(cmd) != 2:
                print("Syntax: pos <seconds>")
                continue
            try:
                pos = float(cmd[1])
            except ValueError:
                print("Unable to parse position")
                continue
            try:
                generator.position = pos
            except synthizer.SynthizerError as e:
                print(e)
        elif cmd[0] == "quit":
            break
        elif cmd[0] == "loop":
            looping = not looping
            generator.looping = looping
            print("Looping" if looping else "Not looping")
        elif cmd[0] == "gain":
            if len(cmd) != 2:
                print("Syntax: gain <value>")
                continue
            try:
                value = float(cmd[1])
            except ValueError:
                print("Unable to parse value.")
                continue
            # Convert to scalar gain from db.
            gain = 10 ** (value / 20)
            source.gain = gain
        else:
            print("Unrecognized command")

Events (alpha)

The following functionality is alpha. In particular, significant changes are expected in the 0.9 release. For the moment the only documentation is this tutorial; it will be improved after the planned 0.9 changes and as part of the general pre-1.0 documentation project.

Introduction

Synthizer supports sending events. Currently, it can send finished and looped events for both BufferGenerator and StreamingGenerator. This will be extended to other objects and concepts in future, as appropriate. The events have the following meanings:

  • Finished: the generator isn't configured to loop and has reached the end.
  • Looped: The generator is configured to loop, and a loop was just completed.

Events are disabled by default and must be enabled. For C, this means syz_contextEnableEvents(context). For Python, an additional keyword argument enable_events = True can be passed to the Context constructor.
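For example, in Python (a minimal sketch):

# Events must be opted into when the context is created.
ctx = synthizer.Context(enable_events=True)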

Once enabled, events feed a queue that can be polled with syz_contextGetNextEvent. If events are enabled and the application never polls the queue, the queue will fill up forever; this is effectively a memory leak. In other words, only enable events if you know you'll actually use them.

C users are encouraged to read synthizer.h, in particular struct syz_Event. See also the enum SYZ_EVENT_TYPES in synthizer_constants.h. The basic idea is to call syz_contextGetNextEvent until you get SYZ_EVENT_TYPE_INVALID indicating the end of the queue.

The events system will drop events which would refer to invalid handles on a best-effort basis, so that an invalid handle is never returned to the caller of syz_contextGetNextEvent. This is guaranteed to work as long as all deletion happens on the same thread as the event polling. In future, Synthizer is going to switch handles to a reference-counted scheme which will improve the concurrency story.

A Python Tutorial

In Python, events are exposed as an iterator on the context:

for event in ctx.get_events():
    if isinstance(event, synthizer.FinishedEvent):
        pass  # Handle finished.
    elif isinstance(event, synthizer.LoopedEvent):
        pass  # Handle looped.

get_events takes an optional argument to limit the number of events returned in one iteration. By default, it's unlimited.

As shown above, you detect event types with isinstance. Each event has a source and context property indicating the source (e.g. generator) and context associated with it, as Synthizer objects. In future, other event types may include more information.

Filters in Python

NOTE: C API users and bindings developers should see the Filters section of the C API documentation.

Synthizer offers the ability to set filters in many places, in order to allow for precise control over audio effects. By far the most useful of these is on all sources, which can be used to simulate occlusion. At the moment, filters may be found on:

  • All source types, as:
    • A filter property, which applies to all audio coming out of the source.
    • A filter_direct property, which runs after filter but applies only to audio going to the direct path (e.g. not through effects).
    • A filter_effects property, which also runs after filter but applies only to audio going to effects.
  • On all effects, as a filter_input property, which filters the input to an effect.
  • As a filter parameter to Context.config_route, which will apply to audio traveling through that effect send.

There are two paths by which audio can be filtered. The first is filter followed by filter_direct. The second is filter, then filter_effects, then the filter on the send, then filter_input. filter is on both paths so that it can be used to control the audio from the source.

Practically, occlusion goes on either filter or filter_direct depending on whether you want it fed into reverb, filter_input on reverbs provides a per-reverb coloration of the "walls", and the filter in the effect send can be used to provide per-source coloration for the effect that it's going to.

Currently, the properties are readonly until such time as Synthizer makes struct syz_BiquadConfig non-opaque.

Synthizer supports lowpass, bandpass, and highpass filters. You get them as follows:

source.filter = synthizer.BiquadConfig.design_lowpass(frequency, q)
source.filter = synthizer.BiquadConfig.design_highpass(frequency, q)
source.filter = synthizer.BiquadConfig.design_bandpass(frequency, bandwidth)

context.config_route(output, input, filter = synthizer.BiquadConfig.design_lowpass(1000))

In the above, q is an advanced parameter that defaults to a value which yields a Butterworth filter, which is almost always what you want. You shouldn't need to change it from the default, and can usually just omit it. q controls resonance: higher values of q produce filters that ring, which may or may not be beneficial for designing audio effects.

To clear a filter, set it to synthizer.BiquadConfig.design_identity(), which is how you get the filter which does nothing to the audio (internally Synthizer will avoid running it, but filters do not have a concept of NULL).

Note that not all filter configurations are stable. Synthizer cannot validate this case in any meaningful fashion. All normal usage should work as expected, but extreme values may produce unstable filters. For example: lowpasses with absurdly high frequencies, bandpasses with a bandwidth of 1 HZ, and/or very low and very high q values. For those not already familiar with unstable filters, this case can be recognized by strange metallic ringing effects that run forever, even when audio is silent.

To design occlusion, use a lowpass filter on the source, either as filter or filter_direct. Synthizer doesn't currently provide anything to help because it's not possible to build a proper physics-based occlusion model and it is sometimes even beneficial to use bandpass or highpass filters instead (e.g. audio traveling through a pipe). It has to be done per application.
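As a sketch of what this might look like in Python, the following maps an application-defined occlusion amount onto a lowpass cutoff; the specific frequency mapping here is an arbitrary assumption, not something Synthizer prescribes:

def set_occlusion(source, occlusion):
    """occlusion runs from 0.0 (unoccluded) to 1.0 (fully occluded)."""
    if occlusion <= 0.0:
        # No occlusion: stop filtering entirely.
        source.filter_direct = synthizer.BiquadConfig.design_identity()
        return
    # Arbitrary mapping: slide the cutoff from 20 kHz down toward 200 Hz.
    cutoff = 20000.0 * (1.0 - occlusion) + 200.0 * occlusion
    source.filter_direct = synthizer.BiquadConfig.design_lowpass(cutoff)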

Introduction

This section of the manual introduces Synthizer concepts and should be read in order. Pages here will either explain something, or quote relevant sections of the C headers with explanation.

Stability and Versioning

Synthizer uses pre-1.0 semantic versioning. This means:

  • Major is always 0.
  • Minor is incremented for incompatible API changes.
  • Patch is incremented for new features and/or bug fixes.

Synthizer is intended to be production ready software, but has not seen wide usage. It's somewhere between beta and 1.0: not as many features as you might want, but also not crashing at the drop of a hat. If you find bugs, please report them against the official repository.

API breakage is still expected. This manual attempts to document where API breakage may occur. These are referred to as provisional features.

The Signal Graph, Library Parameters, and Limitations

Synthizer's signal graph is as follows:

  • 1 or more generators feed a source.
  • 1 or more sources feed:
    • A context, which is directly passed to audio output.
    • Optionally, any number of global effects.

Synthizer always processes audio at a sample rate of 44100, and with a fixed block size. All parameter updates take effect at the block boundaries. These parameters cannot be reconfigured without recompiling the library. Synthizer will get the best latency it can for the system it's running on. This can also not be controlled or influenced by the programmer.

Currently, the block size is 256 samples, or around 5 MS. The authoritative source for this number is include/synthizer/config.hpp in the Synthizer repository. The hope is that this may be lowered to 128 or better in future.

No audio input or output can be over 16 channels. This limit may be raised in future.

Object handles are not reused, but there is a limit of 65535 concurrent outstanding object handles. This is due to an internal lockfree slab which allows for relatively fast handle translation without syscalls. If this limit proves problematic, it will be raised, but for all practical intents memory or CPU will be exhausted first.

C API Conventions

The headers are:

  • synthizer.h: All library functions
  • synthizer_constants.h: Constants, i.e. the very large property enum.

The Synthizer C API returns errors and writes results to out parameters. Out parameters are always the first parameters of a function, and errors are always nonzero. Note that error codes are currently not defined; they will be, once things are more stable.

Logging, Initialization, and Shutdown

The following excerpts from synthizer.h specify the logging and initialization API. Explanation follows:

typedef ... syz_Handle;

enum SYZ_LOGGING_BACKEND {
	SYZ_LOGGING_BACKEND_STDERR = 0,
};

SYZ_CAPI syz_ErrorCode syz_configureLoggingBackend(enum SYZ_LOGGING_BACKEND backend, void *param);

enum SYZ_LOG_LEVEL {
	SYZ_LOG_LEVEL_ERROR = 0,
	SYZ_LOG_LEVEL_WARN = 10,
	SYZ_LOG_LEVEL_INFO = 20,
	SYZ_LOG_LEVEL_DEBUG = 30,
};

SYZ_CAPI void syz_setLogLevel(enum SYZ_LOG_LEVEL level);

SYZ_CAPI syz_ErrorCode syz_initialize();
SYZ_CAPI syz_ErrorCode syz_shutdown();

Synthizer supports logging backends, which should be configured before calling syz_initialize() and never thereafter. The param is backend specific, and unused for stderr (the only supported option). Log levels work exactly how one would expect and can be changed at any time.

syz_initialize() and syz_shutdown() initialize and shut down the library respectively. Calls to these nest: every syz_initialize should be matched with a syz_shutdown. This is supported so that multiple dependencies of a program can initialize Synthizer without conflict, but centralizing initialization and only doing it once is strongly encouraged.

Objects, Handles, and Properties

Synthizer represents references to objects with a syz_Handle type. The following excerpts from synthizer.h are available on every object type:

SYZ_CAPI syz_ErrorCode syz_handleFree(syz_Handle handle);
SYZ_CAPI syz_ErrorCode syz_handleGetObjectType(int *out, syz_Handle handle);

SYZ_CAPI syz_ErrorCode syz_getI(int *out, syz_Handle target, int property);
SYZ_CAPI syz_ErrorCode syz_setI(syz_Handle target, int property, int value);
SYZ_CAPI syz_ErrorCode syz_getD(double *out, syz_Handle target, int property);
SYZ_CAPI syz_ErrorCode syz_setD(syz_Handle target, int property, double value);
SYZ_CAPI syz_ErrorCode syz_setO(syz_Handle target, int property, syz_Handle value);
SYZ_CAPI syz_ErrorCode syz_getD3(double *x, double *y, double *z, syz_Handle target, int property);
SYZ_CAPI syz_ErrorCode syz_setD3(syz_Handle target, int property, double x, double y, double z);
SYZ_CAPI syz_ErrorCode syz_getD6(double *x1, double *y1, double *z1, double *x2, double *y2, double *z2, syz_Handle target, int property);
SYZ_CAPI syz_ErrorCode syz_setD6(syz_Handle handle, int property, double x1, double y1, double z1, double x2, double y2, double z2);

Synthizer objects are like classes: they have properties, methods, and (optionally) bases. They're created through "constructors", for example syz_createContext, and destroyed through syz_handleFree. syz_handleGetObjectType can be used to query the type of an object at runtime, returning one of the SYZ_OTYPE constants in synthizer_constants.h. As with C's malloc and free, calling syz_handleFree with handle = 0 is a no-op.

Property names are defined in synthizer_constants.h of the form SYZ_P_FOO. Since some objects have common properties and in order to preserve flexibility, the property enum is shared between all objects.

Properties are set through syz_setX and read through syz_getX where X depends on the type:

  • I for integer
  • D for double
  • O for object.
  • D3 for double3, a packed vector of 3 doubles.
  • D6 for a packed array of 6 doubles, commonly used to represent orientations.
  • Biquad for a biquad filter property (see filters)

Synthizer supports 6 property types: int, double, double3, double6, object, and biquad filters.

Double3 is typically used for position and double6 for orientation. Synthizer's coordinate system is right-handed, configured so that positive y is forward, positive x east, and positive z up. Listener orientation is controlled through the context.

Object properties hold handles to other objects. These are weak references, so destroying the object will set the property to null. Note that it's not possible to read object properties: doing so would require locking in order to safely manipulate handles, which can block the audio thread.

Biquad filter properties are documented in a dedicated section.

Property writes are always ordered with respect to other property writes on the same thread, and in general work how you would expect. But it's important to note that reads are eventually consistent. Specifically:

  • Two writes from the same thread always happen in the order they were made in terms of audio output.
  • The ordering of writes also applies if the app uses synchronization such as mutexes.
  • But reads may not return the value just written, and in general return values at some point in the relatively recent past, usually on the order of 5 to 50 MS.

It is still useful to read some properties. An example of this is SYZ_P_POSITION on BufferGenerator. Even though Synthizer is returning values that are slightly out of date, it's still good enough for UI purposes. Additionally, even if Synthizer always returned the most recent value, audio latency introduces uncertainty as well. For properties that Synthizer updates, additional effort is made to keep the latency low enough for practical use, though there is always at least some.
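For example, in the Python bindings a progress display might periodically do something like this (a sketch; generator is assumed to be a playing BufferGenerator or StreamingGenerator):

# The returned value may lag slightly behind what is audible, which is
# acceptable for UI purposes.
seconds = generator.position
print(f"Position: {seconds:.1f}s")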

The actual links between properties and objects are specified in this manual.

Filters

Synthizer supports a filter property type, as well as filters on effect sends. The API for this is as follows:

struct syz_BiquadConfig {
...
};

SYZ_CAPI syz_ErrorCode syz_getBiquad(struct syz_BiquadConfig *filter, syz_Handle target, int property);
SYZ_CAPI syz_ErrorCode syz_setBiquad(syz_Handle target, int property, const struct syz_BiquadConfig *filter);

SYZ_CAPI syz_ErrorCode syz_biquadDesignIdentity(struct syz_BiquadConfig *filter);
SYZ_CAPI syz_ErrorCode syz_biquadDesignLowpass(struct syz_BiquadConfig *filter, double frequency, double q);
SYZ_CAPI syz_ErrorCode syz_biquadDesignHighpass(struct syz_BiquadConfig *filter, double frequency, double q);
SYZ_CAPI syz_ErrorCode syz_biquadDesignBandpass(struct syz_BiquadConfig *filter, double frequency, double bandwidth);

The struct syz_BiquadConfig is an opaque struct whose fields are only exposed to allow allocating them on the stack. It represents configuration for a biquad filter, designed using the Audio EQ Cookbook. It's initialized with one of the above design functions.

A suggested default for q is 0.7071135624381276, which gives Butterworth lowpass and highpass filters. For those not already familiar with biquad filters, q controls resonance: higher values of q will cause the filter to ring for some period of time.

In future, Synthizer will stabilize the syz_BiquadConfig struct and use it to expose more options, e.g. automated filter modulation.

Streams

Streams are sources of audio. They are specified with 3 string parameters:

  • protocol: The protocol. At the moment Synthizer only supports "file".
  • path: The path. Interpreted in a protocol-specific manner. For files, the on-disk path relative to the running executable.
  • options: A string encoded as "key=value&key=value&...". Protocol-specific options.

Whenever a Synthizer API function wants to read audio data, it will request these three parameters to indicate where the data comes from. In future, it will be possible to register your own streams.

The similarity to a URL is intentional. We don't yet support parsing in that format, but may opt to do so in future. Nonetheless it is simple for a host program to do that parsing before passing these parameters to Synthizer, and in particular file:// URLs match the file protocol exactly.

Synthizer supports decoding Flac, Wav, and MP3. Ogg is planned but low priority.

3D Panning

This page explains the steps involved in 3D panning. Note that only the panner strategy applies to PannedSource.

1. Convert the source's position from world coordinates to Azimuth, Elevation, and distance

This is done by using the listener's position, at, and up vectors to build a transformation matrix, which converts the source's position to listener coordinates. Application of the Pythagorean theorem and basic trigonometry then yields spherical coordinates.

2. Compute the Gain from the Distance Model

Let d be the distance to the source, d_ref the reference distance, d_max the max distance, r the roll-off factor. Then the gain of the source is computed as a linear scalar using one of the following formulas:

  • SYZ_DISTANCE_MODEL_NONE: 1.0
  • SYZ_DISTANCE_MODEL_LINEAR: 1 - r * (clamp(d, d_ref, d_max) - d_ref) / (d_max - d_ref)
  • SYZ_DISTANCE_MODEL_EXPONENTIAL when d_ref == 0.0: 0.0
  • SYZ_DISTANCE_MODEL_EXPONENTIAL when d_ref > 0.0: (max(d_ref, d) / d_ref) ** -r
  • SYZ_DISTANCE_MODEL_INVERSE when d_ref == 0.0: 0.0
  • SYZ_DISTANCE_MODEL_INVERSE when d_ref > 0.0: d_ref / (d_ref + r * max(d, d_ref) - d_ref)

Qualitatively, d_ref is the "size" of the source, d_max is where the source is silent, r is how fast the source becomes quieter. Mapping these to real world scenarios is difficult, and in general the best approach is to experiment for your use case.

3. Apply the closeness boost and clamp

The closeness boost, specified through SYZ_P_CLOSENESS_BOOST and SYZ_P_CLOSENESS_BOOST_DISTANCE is used to emphasize sources that have crossed a threshold of interest, i.e. because the player is now close enough to interact. SYZ_P_CLOSENESS_BOOST specifies a gain in DB (negative DB is allowed) which is added to the source's gain when the source is closer than SYZ_P_CLOSENESS_BOOST_DISTANCE.

After the closeness boost is applied, gain is clamped to the range 0.0 to 1.0.
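As an illustration only (this is not library code; Synthizer performs these steps internally), the linear distance model followed by the final clamp might be written as:

def linear_distance_gain(d, d_ref, d_max, r):
    # SYZ_DISTANCE_MODEL_LINEAR from the table above.
    clamped = min(max(d, d_ref), d_max)
    gain = 1.0 - r * (clamped - d_ref) / (d_max - d_ref)
    # Any closeness boost would be applied here, before the final clamp.
    return min(max(gain, 0.0), 1.0)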

4. Apply the Panning Strategy

The panning strategy specifies how sources are to be panned. Synthizer supports the following panning strategies:

  • SYZ_PANNER_STRATEGY_HRTF (2 channels): An HRTF implementation, intended for use via headphones.
  • SYZ_PANNER_STRATEGY_STEREO (2 channels): A simple stereo panning strategy assuming speakers are at -90 and 90.

Channel Upmixing and Downmixing

Synthizer's current channel mixing algorithm is as follows:

  • Mono to anything duplicates in all channels.
  • Anything to mono sums all channels and divides.
  • Otherwise, missing channels are zero-initialized and extra channels are dropped.

Though this algorithm will be extended in future, note that Synthizer is for games and VR applications, and that it is usually impossible to determine channel layout from media files with 100% reliability. When better support is added, this page will be extended to explain how it works, but expect to need to perform media type conversions or to add stream options if working with more than 2 channels per asset.
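As an illustration of the rules above (not the library's internal code), remixing a single frame of samples might look like:

def remix(frame, out_channels):
    """Remix one frame (a list of per-channel samples) to out_channels."""
    in_channels = len(frame)
    if in_channels == out_channels:
        return list(frame)
    if in_channels == 1:
        # Mono to anything: duplicate into all channels.
        return [frame[0]] * out_channels
    if out_channels == 1:
        # Anything to mono: sum all channels and divide.
        return [sum(frame) / in_channels]
    # Otherwise: zero-fill missing channels, drop extras.
    return (list(frame) + [0.0] * out_channels)[:out_channels]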

Effects and Effect Routing

IMPORTANT: the effects API is provisional and subject to change for the time being. At the moment this is a hopefully final rough draft of the functionality, but experience is required to determine if it can be stabilized in the current form.

Synthizer will support two kinds of effects: global effects and generator-specific effects. At the moment, only global effects are implemented.

Users of the Synthizer API can route any number of sources to any number of global effects, for example echo. This is done through the following C API:

struct syz_RouteConfig {
	float gain;
	float fade_time;
	struct syz_BiquadConfig filter;
};

SYZ_CAPI syz_ErrorCode syz_initRouteConfig(struct syz_RouteConfig *cfg);
SYZ_CAPI syz_ErrorCode syz_routingConfigRoute(syz_Handle context, syz_Handle output, syz_Handle input, struct syz_RouteConfig *config);
SYZ_CAPI syz_ErrorCode syz_routingRemoveRoute(syz_Handle context, syz_Handle output, syz_Handle input, float fade_out);

Routes are uniquely identified by the output object (Source3D, etc) and input object (Echo, etc). There is no route handle type, nor is it possible to form duplicate routes.

In order to establish or update the parameters of a route, use syz_routingConfigRoute. This will form a route if there wasn't already one, and update the parameters as necessary.

It is necessary to initialize syz_RouteConfig with syz_initRouteConfig before using it, but this need only be done once. After that, reusing the same syz_RouteConfig for a route without reinitializing it is encouraged.

Gains are per route and apply after the gain of the source. For example, you might feed 70% of a source's output to something (gain = 0.7).

Filters are also per route and apply after any filters on sources. For example, this can be used to change the filter on a per-reverb basis for a reverb zone algorithm that feeds sources to more than one reverb at a time.
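In the Python bindings this corresponds to Context.config_route. A sketch, assuming the gain and filter keyword arguments mirror the fields of syz_RouteConfig and that reverb is some already-created global effect:

ctx.config_route(
    source,
    reverb,
    gain=0.7,  # Feed 70% of the source's output to the effect.
    filter=synthizer.BiquadConfig.design_lowpass(2000),
)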

In order to remove a route, use syz_routingRemoveRoute.

Both of these functions support crossfading provided in seconds. Internally, this is truncated to the nearest block. As an exception to this rule, non-zero fade times always give at least one block, under the assumption that if some fading was requested the goal was to avoid clipping. Specifically, in pseudocode:

blocks = truncate(fade * SR / BLOCK_SIZE)
if blocks == 0 and fade != 0:
    blocks = 1

This manual doesn't document global effects as distinct entities because Synthizer is internally designed to allow for object reuse in future when we support per-generator effects. Specifically, an object like Echo will often be able to be used in both positions.

Many effects involve feedback and/or other long-running audio as part of their intended function. But while in development, it is often useful to reset an effect. Synthizer exposes a function for this purpose:

SYZ_CAPI syz_ErrorCode syz_effectReset(syz_Handle effect);

This will work on any effect (at worst, it does nothing). As with things like property access this is slow, and it's also not going to sound good, but it can do things like clear out the feedback paths of a reverb at the Python shell for interactive experimentation purposes.

Context

Constructors

syz_createContext

SYZ_CAPI syz_ErrorCode syz_createContext(syz_Handle *out);

Creates a context configured to play through the default output device.

Properties

  • SYZ_P_GAIN (double, default 1.0, range value >= 0.0): The gain of the context.
  • SYZ_P_POSITION (double3, default (0, 0, 0), range any): The position of the listener.
  • SYZ_P_ORIENTATION (double6, default (0, 1, 0, 0, 0, 1), range two packed unit vectors): The orientation of the listener as (atx, aty, atz, upx, upy, upz).
  • SYZ_P_DISTANCE_MODEL (int, default SYZ_DISTANCE_MODEL_LINEAR, range any SYZ_DISTANCE_MODEL): The default distance model for new sources.
  • SYZ_P_DISTANCE_REF (double, default 1.0, range value >= 0.0): The default reference distance for new sources.
  • SYZ_P_DISTANCE_MAX (double, default 50.0, range value >= 0.0): The default max distance for new sources.
  • SYZ_P_ROLLOFF (double, default 1.0, range value >= 0.0): The default rolloff for new sources.
  • SYZ_P_CLOSENESS_BOOST (double, default 0.0, range any finite double): The default closeness boost for new sources, in DB.
  • SYZ_P_CLOSENESS_BOOST_DISTANCE (double, default 0.0, range value >= 0.0): The default closeness boost distance for new sources.
  • SYZ_P_PANNER_STRATEGY (int, default SYZ_PANNER_STRATEGY_STEREO, range any SYZ_PANNER_STRATEGY): The default panner strategy for new sources.

Functions

syz_pause, syz_play

syz_ErrorCode syz_pause(syz_Handle object);
syz_ErrorCode syz_play(syz_Handle object);

The standard play/pause functions, which do exactly what their name suggests.

When the context is paused, nothing it manages advances and no audio is audible.

Remarks

The context is the main entrypoint to Synthizer, responsible for the following:

  • Control and manipulation of the audio device.
  • Driving the audio threads.
  • Owning all objects that play together.
  • Representing the listener in 3D space.

All objects which are associated with a context take a context as part of all their constructors. Two objects which are both associated with different contexts should never interact. For efficiency, whether two objects are from different contexts is unvalidated, and the behavior of mixing them is undefined.

All objects associated with a context become useless once the context is destroyed. Calls to them will still work, but they can't be reassociated with a different context and no audible output will result.

Most programs create one context and destroy it at shutdown.

For the time being, all contexts output stereo audio, and it is not possible to specify the output device. These restrictions will be lifted in future.

For information on the meaning of the distance model properties, see 3D Panning.

Buffer

Constructors

syz_createBufferFromStream

SYZ_CAPI syz_ErrorCode syz_createBufferFromStream(syz_Handle *out, const char *protocol, const char *path, const char *options);

Currently, the only way to make a buffer is from a stream, in the self-explanatory manner. See Streams for information on streams.

This call will decode the stream in the calling thread, returning errors as necessary. Synthizer will eventually offer a BufferCache which supports background decoding and caching, but for the moment the responsibility of background decoding is placed on the calling program.
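For example, a Python program might do its own background decoding and caching along these lines (a minimal sketch; the executor and cache are application code, not Synthizer API):

import concurrent.futures

import synthizer

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
buffer_cache = {}

def preload(path):
    """Start decoding a buffer in the background and cache the future."""
    if path not in buffer_cache:
        buffer_cache[path] = executor.submit(
            synthizer.Buffer.from_stream, "file", path)

def get_buffer(path):
    """Block until the buffer is ready (ideally after it already finished)."""
    preload(path)
    return buffer_cache[path].result()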

Properties

None.

Functions

Getters

SYZ_CAPI syz_ErrorCode syz_bufferGetChannels(unsigned int *out, syz_Handle buffer);
SYZ_CAPI syz_ErrorCode syz_bufferGetLengthInSamples(unsigned int *out, syz_Handle buffer);
SYZ_CAPI syz_ErrorCode syz_bufferGetLengthInSeconds(double *out, syz_Handle buffer);

The self-explanatory getters. These aren't properties because they can't be written and they shouldn't participate in the property infrastructure.

Remarks

Buffers hold audio data, as a collection of contiguous chunks. Data is resampled to the Synthizer samplerate and converted to 16-bit PCM using triangular dither.

Buffers are one of the few Synthizer objects that don't require a context. They may be used freely with any object requiring a buffer, from any thread. In order to facilitate this, buffers are immutable after creation.

The approximate memory usage of a buffer in bytes is 2 * channels * duration_in_seconds * 44100. Loading large assets into buffers is not recommended. For things such as music tracks, use StreamingGenerators. Note that on 32-bit architectures, some operating systems only allow a 2 gigabyte address space. Synthizer avoids allocating buffers as contiguous arrays in part to allow efficient use of 32-bit address spaces, but this only goes so far. If on a 32-bit architecture, expect to run out of memory from Synthizer's perspective well before decoding 2 Gigabytes of buffers simultaneously due to the inability to find consecutive free pages.
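As a quick sanity check of that formula, a 3-minute stereo track:

channels = 2
seconds = 180
approx_bytes = 2 * channels * seconds * 44100
print(approx_bytes)  # 31752000, roughly 32 MB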

Source (abstract)

Constructors

None.

Properties

  • SYZ_P_GAIN (double, range any double > 0): An additional gain factor applied to this source.
  • SYZ_P_FILTER (biquad, default identity, range any): A filter which applies to all audio leaving the source, before SYZ_P_FILTER_DIRECT and SYZ_P_FILTER_EFFECTS.
  • SYZ_P_FILTER_DIRECT (biquad, default identity, range any): A filter which applies after SYZ_P_FILTER but not to audio traveling to effect sends.
  • SYZ_P_FILTER_EFFECTS (biquad, default identity, range any): A filter which runs after SYZ_P_FILTER but only applies to audio traveling through effect sends.

Functions

syz_sourceAddGenerator, syz_sourceRemoveGenerator

SYZ_CAPI syz_ErrorCode syz_sourceAddGenerator(syz_Handle source, syz_Handle generator);
SYZ_CAPI syz_ErrorCode syz_sourceRemoveGenerator(syz_Handle source, syz_Handle generator);

Add/remove a generator from a source. Each generator may be added once and duplicate add calls will have no effect. Each generator should only be used with one source at a time.

syz_pause, syz_play

syz_ErrorCode syz_pause(syz_Handle object);
syz_ErrorCode syz_play(syz_Handle object);

The standard play/pause functions. Note that all subclasses of Source will still process panners for the time being, so this doesn't make a Source free in terms of things like HRTF and effect sends. This case will be optimized further in future.

When a Source is paused, no generator connected to it advances even if the generator is unpaused.

Remarks

Sources represent audio output. They combine all generators connected to them, apply any effects if necessary, and feed the context. Subclasses of Source add panning and other features.

All sources offer filters via SYZ_P_FILTER, SYZ_P_FILTER_DIRECT and SYZ_P_FILTER_EFFECTS. First, SYZ_P_FILTER is applied, then the audio is split into two paths: the portion heading directly to the speakers gets SYZ_P_FILTER_DIRECT, and the portion heading to the effect sends gets SYZ_P_FILTER_EFFECTS. This can be used to simulate occlusion and perform other per-source effect customization.

DirectSource

Inherits from Source.

Constructors

syz_createDirectSource

SYZ_CAPI syz_ErrorCode syz_createDirectSource(syz_Handle *out, syz_Handle context);

Creates a direct source.

Properties

Inherited from Source only.

Remarks

A direct source is for music and other audio assets that don't wish to participate in panning, and should be linked directly to the speakers.

Audio is converted to the Context's channel count and passed directly through.

SpatializedSource (abstract)

Constructors

None

Properties

  • SYZ_P_PANNER_STRATEGY (int, default SYZ_PANNER_STRATEGY_HRTF, range any SYZ_PANNER_STRATEGY): The panner strategy for this source.

Remarks

SpatializedSource is an abstract class which gives all panned sources their distance model and gain.

PannedSource

Inherits from SpatializedSource.

Constructors

syz_createPannedSource

SYZ_CAPI syz_ErrorCode syz_createPannedSource(syz_Handle *out, syz_Handle context);

Creates a panned source.

Properties

  • SYZ_P_AZIMUTH (double, default 0.0, range 0.0 to 360.0): The azimuth of the panner. See remarks.
  • SYZ_P_ELEVATION (double, default 0.0, range -90.0 to 90.0): See remarks.
  • SYZ_P_PANNING_SCALAR (double, default 0.0, range -1.0 to 1.0): See remarks.

Remarks

The PannedSource gives direct control over a panner, which is either controlled via azimuth/elevation in degrees or a panning scalar.

If using azimuth/elevation, 0.0 azimuth is forward and positive angles are clockwise. Elevation ranges from -90 (down) to 90 (up).

Some applications want to control panners through a panning scalar instead, i.e. for UI purposes. If using panning scalars, -1.0 is full left and 1.0 is full right.

Applications should use either a panning scalar or azimuth/elevation, never both on the same source. Using both simultaneously is undefined behavior.

For information on panning, see 3D Panning.

Source3D

Inherits from SpatializedSource.

Constructors

syz_createSource3D

SYZ_CAPI syz_ErrorCode syz_createSource3D(syz_Handle *out, syz_Handle context);

Creates a source3d positioned at the origin and with no associated generators.

Properties

  • SYZ_P_POSITION (double3, default (0, 0, 0), range any): The position of the source.
  • SYZ_P_ORIENTATION (double6, default (0, 1, 0, 0, 0, 1), range two packed unit vectors): The orientation of the source as (atx, aty, atz, upx, upy, upz). Currently unused.
  • SYZ_P_DISTANCE_MODEL (int, default from Context, range any SYZ_DISTANCE_MODEL): The distance model for this source.
  • SYZ_P_DISTANCE_REF (double, default from Context, range value >= 0.0): The reference distance.
  • SYZ_P_DISTANCE_MAX (double, default from Context, range value >= 0.0): The max distance for this source.
  • SYZ_P_ROLLOFF (double, default from Context, range value >= 0.0): The rolloff for this source.
  • SYZ_P_CLOSENESS_BOOST (double, default from Context, range any finite double): The closeness boost for this source, in DB.
  • SYZ_P_CLOSENESS_BOOST_DISTANCE (double, default from Context, range value >= 0.0): The closeness boost distance for this source.

Remarks

A Source3D represents an entity in 3D space. For explanations of the above properties, see 3D Panning.

When created, Source3D reads all of its defaults from the Context's corresponding properties. Changes to the Context versions don't affect already created sources. A typical use case is to configure the Context to the defaults of the game, and then create sources.
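For example, in Python (a sketch; the property names follow the SYZ_P_* to snake_case translation described in the Python tutorial):

# Configure the game-wide defaults once on the context...
ctx.distance_max = 100.0
ctx.rolloff = 1.5
# ...then sources created afterwards pick these values up at creation time.
source = synthizer.Source3D(ctx)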

Generator (abstract)

Generators generate audio, and are how Synthizer knows what to play through sources. In addition to direct generation, some generators take other generators as arguments, e.g. per-source effects and filters.

Properties

All generators support the following properties:

  • SYZ_P_GAIN (double, default 1.0, range value >= 0.0): The gain of the generator.
  • SYZ_P_PITCH_BEND (double, default 1.0, range value >= 0.0): Pitch bend of the generator as a multiplier (2.0 is +1 octave, 0.5 is -1 octave, etc).

Functions

syz_pause, syz_play

syz_ErrorCode syz_pause(syz_Handle object);
syz_ErrorCode syz_play(syz_Handle object);

The standard play/pause functions, which do exactly what their name suggests.

Remarks

Not all generators support SYZ_P_PITCH_BEND because it doesn't necessarily make sense for them to do so. Where this is the case, this manual will document that in the remarks for that generator type. Additionally, in cases where SYZ_P_PITCH_BEND has non-obvious behavior, the remarks will document that as well. The most common place to see non-obvious SYZ_P_PITCH_BEND behavior is in effects.

StreamingGenerator

Inherits from Generator.

Constructors

syz_createStreamingGenerator

SYZ_CAPI syz_ErrorCode syz_createStreamingGenerator(syz_Handle *out, syz_Handle context, const char *protocol, const char *path, const char *options);

Creates a StreamingGenerator from the standard stream parameters.

Properties

  • SYZ_P_POSITION (double, default 0.0, range value >= 0.0): The playback position of the stream.
  • SYZ_P_LOOPING (int, default 0, range 0 or 1): Whether playback loops.

Remarks

StreamingGenerator plays streams, decoding and reading on demand. The typical use case is for music playback.

Due to the expense of streaming from disk and other I/O sources, having more than a few StreamingGenerators going will cause a decrease in audio quality on many systems, typically manifesting as dropouts and crackling. StreamingGenerator creates one background thread per instance and does all decoding and I/O in that thread.

At startup, StreamingGenerator's background thread eagerly decodes a relatively large amount of data in order to build up a buffer which prevents underruns. Thereafter, it will pick up property changes every time the background thread wakes up to add more data to the buffer. This means that most operations are high latency, currently on the order of 100 to 200 MS. The least latent operation is the initial start-up, which will begin playing as soon as enough data is decoded. How long that takes depends on the format and I/O characteristics of the stream, as well as the user's machine and current load of the system.

BufferGenerator

Inherits from Generator.

Constructors

syz_createBufferGenerator

SYZ_CAPI syz_ErrorCode syz_createBufferGenerator(syz_Handle *out, syz_Handle context);

Creates a BufferGenerator. The buffer is set to NULL and the resulting generator will play silence until one is associated.

Properties

  • SYZ_P_BUFFER (object, default 0, range any Buffer handle): The buffer to play.
  • SYZ_P_POSITION (double, default 0.0, range value >= 0.0): The position in the buffer.
  • SYZ_P_LOOPING (int, default 0, range 0 or 1): Whether playback loops at the end of the buffer.

Remarks

BufferGenerators play Buffers.

SYZ_P_POSITION is reset if SYZ_P_BUFFER is modified.

SYZ_P_POSITION can be set past the end of the buffer. If SYZ_P_LOOPING = 0, the generator will play silence. Otherwise, the position will immediately loop to the beginning.

More than one BufferGenerator can use the same underlying Buffer.

SYZ_P_PITCH_BEND is a multiplicative rate on the playback of the buffer: a value of 2.0 is one octave higher and a value of 0.5 is one octave lower. Though the distinction is currently unimportant, the difference between this and it simply being the playback rate is that it will eventually be combined with Doppler effects. Expect SYZ_P_PITCH_BEND to move to a base "class" in future: though we only support it for buffers at the moment, many other generator types will be able to do so in future.

NoiseGenerator

Inherits from Generator.

Constructors

syz_createNoiseGenerator

SYZ_CAPI syz_ErrorCode syz_createNoiseGenerator(syz_Handle *out, syz_Handle context, unsigned int channels);

Creates a NoiseGenerator configured for uniform noise with the specified number of output channels. The number of output channels cannot be configured at runtime. Each channel produces decorrelated noise.

Properties

  • SYZ_P_NOISE_TYPE (int, default SYZ_NOISE_TYPE_UNIFORM, range any SYZ_NOISE_TYPE): The type of noise to generate. See remarks.

Remarks

NoiseGenerators generate noise, which will be useful in future when various effects are added. For instance filtered noise makes plausible wind. Note that noise generators don't support SYZ_P_PITCH_BEND because noise doesn't have a pitch by definition.

Synthizer allows setting the algorithm used to generate noise to one of the following options. Note that these are more precisely named than white/pink/brown; the sections below document the equivalent in the more standard nomenclature.

SYZ_NOISE_TYPE_UNIFORM

A uniform noise source. From an audio perspective this is white noise, but is sampled from a uniform rather than Gaussian distribution for efficiency.

SYZ_NOISE_TYPE_VM

This is pink noise generated with the Voss-McCartney algorithm, which consists of a number of summed uniform random number generators which are run at different rates. Synthizer adds an additional random number generator at the top of the hierarchy in order to improve the color of the noise in the high frequencies.

SYZ_NOISE_TYPE_FILTERED_BROWN

This is brown noise generated with a -6DB filter.

GlobalEffect

This is the abstract base class for global effects.

Properties

  • SYZ_P_GAIN (double, default 1.0, range value >= 0.0): The overall gain of the effect.
  • SYZ_P_FILTER_INPUT (biquad, default usually identity; if not, documented with the effect, range any): A filter which applies to the input of this effect. Runs after filters on effect sends.

Functions

syz_effectReset

SYZ_CAPI syz_ErrorCode syz_effectReset(syz_Handle effect);

Clears the internal state of the effect. Intended for design/development purposes. This function may produce clicks and other artifacts and is slow.

Remarks

All global effects inherit from this object type.

Echo

IMPORTANT: this object is provisional and may be subject to change.

Constructors

syz_createGlobalEcho

SYZ_CAPI syz_ErrorCode syz_createGlobalEcho(syz_Handle *out, syz_Handle context);

Creates the global variant of the echo effect.

Functions

syz_echoSetTaps

struct syz_EchoTapConfig {
	float delay;
	float gain_l;
	float gain_r;
};

SYZ_CAPI syz_ErrorCode syz_echoSetTaps(syz_Handle handle, unsigned int n_taps, struct syz_EchoTapConfig *taps);

Configures the taps for this Echo. Currently, delay must be no greater than 5 seconds. To clear the taps, call this function with an array of 0 elements.

Properties

None

Remarks

This is a stereo tapped delay line, with a one-block crossfade when taps are reconfigured. The max delay is currently fixed at 5 seconds, but this will be made user configurable in future.

This implementation offers precise control over the placement of taps, at the cost of not being able to have indefinitely long echo effects. It's most useful for modeling discrete, panned echo taps. Some ways this is useful are:

  • Emphasize footsteps off walls in large spaces, by computing the parameters for the taps off level geometry.
  • Emphasize openings or corridors.
  • Pair it with a reverb implementation to offer additional, highly controlled early reflection emphasis.

This is effectively discrete convolution for 2 channels, implemented using an algorithm designed for sparse taps. In other words, the cost of any echo effect is O(taps). Anything up to a few thousand discrete taps is probably fine, but beyond that the cost will become prohibitive.

FdnReverb

A reverb based off a feedback delay network.

Inherits from GlobalEffect.

This is provisional functionality, and subject to change.

Constructors

syz_createGlobalFdnReverb

SYZ_CAPI syz_ErrorCode syz_createGlobalFdnReverb(syz_Handle *out, syz_Handle context);

Creates a global FDN reverb with default settings.

Properties

See remarks for a description of what these do and how to use them effectively.

  • SYZ_P_MEAN_FREE_PATH (double, default 0.02, range 0.0 to 0.5): The mean free path of the simulated environment.
  • SYZ_P_T60 (double, default 1.0, range 0.0 to 100.0): The T60 of the reverb.
  • SYZ_P_LATE_REFLECTIONS_LF_ROLLOFF (double, default 1.0, range 0.0 to 2.0): A multiplicative factor on T60 for the low frequency band.
  • SYZ_P_LATE_REFLECTIONS_LF_REFERENCE (double, default 200.0, range 0.0 to 22050.0): Where the low band of the feedback equalizer ends.
  • SYZ_P_LATE_REFLECTIONS_HF_ROLLOFF (double, default 0.5, range 0.0 to 2.0): A multiplicative factor on T60 for the high frequency band.
  • SYZ_P_LATE_REFLECTIONS_HF_REFERENCE (double, default 500.0, range 0.0 to 22050.0): Where the high band of the equalizer starts.
  • SYZ_P_LATE_REFLECTIONS_DIFFUSION (double, default 1.0, range 0.0 to 1.0): Controls the diffusion of the late reflections as a percent.
  • SYZ_P_LATE_REFLECTIONS_MODULATION_DEPTH (double, default 0.01, range 0.0 to 0.3): The depth of the modulation of the delay lines on the feedback path, in seconds.
  • SYZ_P_LATE_REFLECTIONS_MODULATION_FREQUENCY (double, default 0.5, range 0.01 to 100.0): The frequency of the modulation of the delay lines in the feedback paths.
  • SYZ_P_LATE_REFLECTIONS_DELAY (double, default 0.01, range 0.0 to 0.5): The delay of the late reflections relative to the input, in seconds.

Note that SYZ_P_FILTER_INPUT defaults to a lowpass Butterworth with a cutoff frequency of 1500 HZ.

Remarks

This is a reverb composed of a feedback delay network with 8 internal delay lines. The algorithm proceeds as follows:

  • Audio is fed through the input filter, a lowpass. Use this to eliminate high frequencies, which can be quite harsh when fed to reverb algorithms.
  • Then, audio is fed into a series of 8 delay lines, connected with a feedback matrix. It's essentially a set of parallel allpass filters with some additional feedbacks, but inspired by physics.
    • Each of these delay lines is modulated, to reduce periodicity.
    • On each feedback path, the audio is fed through an equalizer to precisely control the decay rate in 3 frequency bands.
  • Two decorrelated channels are extracted. This will be increased to 4 when surround sound support is added.
  • Finally, the output is delayed by the late reflections delay.

The current reverb model is missing spatialized early reflections. Practically speaking this makes very little difference when using an FDN because the FDN simulates them effectively on its own, but the SYZ_P_EARLY_REFLECTIONS_* namespace is reserved for that purpose. The plan is to feed them through HRTF in order to attempt to capture the shape of the room, possibly with a per-source model.

The reverb is also missing the ability to pan late reflections; this is on the roadmap.

The default configuration is something to the effect of a medium-sized room. Presets will be added in future. The following sections explain considerations for reverb design with this algorithm:

A Note On Property Changes

The FdnReverb effect involves a large amount of feedback and is therefore impossible to crossfade efficiently. To that end, we don't try. Expect most property changes, save for t60 and the hf/lf frequency controls, to cause clicking and other artifacts.

To change properties smoothly, it's best to create a reverb, set all the parameters, connect all the sources to the new one, and disconnect all the sources from the old one, in that order. Synthizer may eventually do this internally, but that necessitates taking a permanent and large allocation cost without a lot of implementation work being done first, so for the moment we don't.

In practice, this doesn't matter. Most environments don't change reverb characteristics. A good flow is as follows:

  • Design the reverb in your level editor/other environment.
  • When necessary, use syz_effectReset for interactive experimentation.
  • When distributing/launching for real, use the above crossfading instructions.

It is of course possible to use more than one reverb at a time as well, and to fade sources between them at different levels. Note, however, that reverbs are relatively expensive.

The Input Filter

Most reverb algorithms have a problem: high frequencies are emphasized. Synthizer's is no different. To solve this, we introduce an input lowpass filter, which can cut out the higher frequencies. This is SYZ_P_FILTER_INPUT, available on all effects, but defaulted by the reverb to a lowpass at 1500 HZ because most of the negative characteristics of reverbs occur when high frequencies are overemphasized.

Changing this cutoff filter is the strongest tool available for coloring the reverb. Low cutoffs are great for rooms with sound dampening, high cutoffs for concrete walls. It can be disabled, but doing so will typically cause metallic and periodic artifacts to be noticeable.

It's also possible to swap it with other filter types. Lowpass filters are effectively the only filter type that aligns with the real world in the context of a reverb, but other filter types can produce interesting effects.

Choosing the mean free path and late reflections delay

These two values are most directly responsible for controlling how big a space feels. Intuitively, the mean free path is the average distance from wall to wall, and the late reflections delay is the time it takes for audio to hit something for the first time. In general, get the mean free path by dividing the average distance between the walls by the speed of sound, and set the late reflections delay to something in the same order of magnitude.

A good approximation for the mean free path is 4 * volume / surface_area. Mathematically, it's the average time sound travels before reflection off an obstacle. Very large mean free paths produce many discrete echoes. For unrealistically large values, the late reflections won't be able to converge at all.
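For example, a sketch of deriving the property from room geometry in Python (reverb stands for the FDN reverb object; its Python property name is assumed to follow the usual SYZ_P_* translation, and 343 m/s approximates the speed of sound):

def mean_free_path_seconds(volume, surface_area, speed_of_sound=343.0):
    # 4 * V / A approximates the average distance between reflections;
    # dividing by the speed of sound converts it to seconds.
    return (4.0 * volume / surface_area) / speed_of_sound

# A 10 x 8 x 3 meter room:
volume = 10 * 8 * 3
surface_area = 2 * (10 * 8 + 10 * 3 + 8 * 3)
reverb.mean_free_path = mean_free_path_seconds(volume, surface_area)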

Choosing T60 and controlling per-band decay

The t60 and related properties control the gains and configuration of a filter on the feedback path.

The t60 of a reverb is defined as the time it takes for the reverb to decay by 60 dB. Effectively, this can be thought of as how long until the reverb is completely silent. 0.2 to 0.5 is a particularly reverberant and large living room, 1.0 to 2.0 is a concert hall, 5.0 is an amazingly large cavern, and values larger than that quickly become unrealistic and metallic.

Most environments don't have the same decay time for all frequency bands, so the FdnReverb actually uses a 3-band equalizer instead of raw gains on the feedback paths. The bands are as follows:

  • 0.0 to SYZ_P_LATE_REFLECTIONS_LF_REFERENCE
  • SYZ_P_LATE_REFLECTIONS_LF_REFERENCE to SYZ_P_LATE_REFLECTIONS_HF_REFERENCE
  • SYZ_P_LATE_REFLECTIONS_HF_REFERENCE to Nyquist

SYZ_P_T60 controls the decay time of the middle frequency band. The lower band is t60 * lf_rolloff, and the upper t60 * hf_rolloff. This allows you to simply change T60, and let the rolloff ratios control coloration.

Intuitively, rooms with carpet on all the walls have a rather low hf reference and rolloff, and giant stone caverns are close to equal in all frequency bands. The lf reference/rolloff pairing can be used primarily for non-natural bass boosting. When the reverb starts, all frequencies are relatively equal, but as the audio continually gets fed back through the feedback paths, the equalizer will emphasize or deemphasize the 3 frequency bands at different rates. To use this effectively, treat the hf/lf controls as defining the materials of the wall, then adjust t60.

Note that the amount of coloration you can get from the equalizer is limited especially for short reverbs. To control the perception of the environment more bluntly and independently of t60, use the input filter.
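Putting this together, here is a sketch of two contrasting configurations (the Python property names are assumed to mirror the SYZ_P_LATE_REFLECTIONS_* constants; check the object reference):

# A large stone cavern: long decay, all bands decaying at roughly the same rate.
reverb.t60 = 4.0
reverb.late_reflections_lf_rolloff = 1.0
reverb.late_reflections_hf_rolloff = 0.9

# A heavily carpeted room: short decay, highs dying off much faster than the mids.
reverb.t60 = 0.4
reverb.late_reflections_hf_reference = 1000.0
reverb.late_reflections_hf_rolloff = 0.3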

Diffusion

The diffusion of the reverb is how fast the reverb tail transitions from discrete echoes to a continuous reverberant response. Synthizer exposes this to you as a percent-based control, since it's not conveniently possible to tie anything to a real physical quantity in this case. Typically, diffusion at 1.0 (the default) is what you want.

Another way to think of diffusion is how rough the walls are, how many obstacles there are for sound to bounce off of, etc.

Delay Line modulation

A problem with feedback delay networks and/or other allpass/comb filter reverb designs is that they tend to be obviously periodic. To deal with this, modulation of the delay lines on the feedback path is often introduced. The final stage of designing an FdnReverb is to decide on the values of the modulation depth and frequency.

The trade-off here is this:

  • At low modulation depth/frequency, the reverb likes to sound metallic.
  • At high modulation depth/frequency, the reverb gains very obvious nonlinear effects.
  • At very high modulation depth/frequency, the reverb doesn't sound like a reverb at all.

FdnReverb tries to default to universally applicable settings, but it might still be worth adjusting these. To disable modulation altogether, set the depth to 0.0; due to internal details, setting the frequency to 0.0 is not possible.
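For example (a sketch; the Python property name late_reflections_modulation_depth is assumed to mirror the corresponding C constant — check the object reference):

# Disable modulation entirely; the frequency is left alone since it can't be 0.0.
reverb.late_reflections_modulation_depth = 0.0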

The artifacts introduced by large modulation depth/frequency values are least noticeable with percussive sounds and most noticeable with constant tones such as pianos and vocals. Inversely, the periodic artifacts of no or little modulation are most noticeable with percussive sounds and least noticeable with constant tones.

In general, the best way to not need to touch these settings is to use realistic t60, as the beginning of the reverb isn't generally periodic.

Audio EQ Cookbook

The following is the Audio EQ Cookbook, containing the most widely used formulas for biquad filters. Synthizer's internal implementation of most filters either follows these exactly or is composed of cascaded/parallel sections.

There are several versions of this document on the web. This version is from http://music.columbia.edu/pipermail/music-dsp/2001-March/041752.html.

         Cookbook formulae for audio EQ biquad filter coefficients
---------------------------------------------------------------------------
by Robert Bristow-Johnson <rbj at gisco.net>  a.k.a. <robert at audioheads.com>


All filter transfer functions were derived from analog prototypes (that 
are shown below for each EQ filter type) and had been digitized using the 
Bilinear Transform.  BLT frequency warping has been taken into account 
for both significant frequency relocation and for bandwidth readjustment.

First, given a biquad transfer function defined as:

            b0 + b1*z^-1 + b2*z^-2
    H(z) = ------------------------                                (Eq 1)
            a0 + a1*z^-1 + a2*z^-2

This shows 6 coefficients instead of 5 so, depending on your architechture,
you will likely normalize a0 to be 1 and perhaps also b0 to 1 (and collect
that into an overall gain coefficient).  Then your transfer function would
look like:

            (b0/a0) + (b1/a0)*z^-1 + (b2/a0)*z^-2
    H(z) = ---------------------------------------                 (Eq 2)
               1 + (a1/a0)*z^-1 + (a2/a0)*z^-2

or

                      1 + (b1/b0)*z^-1 + (b2/b0)*z^-2
    H(z) = (b0/a0) * ---------------------------------             (Eq 3)
                      1 + (a1/a0)*z^-1 + (a2/a0)*z^-2


The most straight forward implementation would be the Direct I form (using Eq 2):

y[n] = (b0/a0)*x[n] + (b1/a0)*x[n-1] + (b2/a0)*x[n-2]
                    - (a1/a0)*y[n-1] - (a2/a0)*y[n-2]              (Eq 4)

This is probably both the best and the easiest method to implement in the 56K.



Now, given:

    sampleRate (the sampling frequency)

    frequency ("wherever it's happenin', man."  "center" frequency 
        or "corner" (-3 dB) frequency, or shelf midpoint frequency, 
        depending on which filter type)
    
    dBgain (used only for peaking and shelving filters)

    bandwidth in octaves (between -3 dB frequencies for BPF and notch
        or between midpoint (dBgain/2) gain frequencies for peaking EQ)

     _or_ Q (the EE kind of definition)

     _or_ S, a "shelf slope" parameter (for shelving EQ only).  when S = 1, 
        the shelf slope is as steep as it can be and remain monotonically 
        increasing or decreasing gain with frequency.  the shelf slope, in 
        dB/octave, remains proportional to S for all other values.



First compute a few intermediate variables:

    A     = sqrt[ 10^(dBgain/20) ]
          = 10^(dBgain/40)                    (for peaking and shelving EQ filters only)

    omega = 2*PI*frequency/sampleRate

    sin   = sin(omega)
    cos   = cos(omega)

    alpha = sin/(2*Q)                                     (if Q is specified)
          = sin*sinh[ ln(2)/2 * bandwidth * omega/sin ]   (if bandwidth is specified)

    beta  = sqrt(A)/Q                                     (for shelving EQ filters only)
          = sqrt(A)*sqrt[ (A + 1/A)*(1/S - 1) + 2 ]       (if shelf slope is specified)
          = sqrt[ (A^2 + 1)/S - (A-1)^2 ]


Then compute the coefficients for whichever filter type you want:

  The analog prototypes are shown for normalized frequency.
  The bilinear transform substitutes:
  
                1          1 - z^-1
  s  <-  -------------- * ----------
          tan(omega/2)     1 + z^-1

and makes use of these trig identities:

                    sin(w)
   tan(w/2)    = ------------
                  1 + cos(w)


                  1 - cos(w)
  (tan(w/2))^2 = ------------
                  1 + cos(w)



LPF:            H(s) = 1 / (s^2 + s/Q + 1)

                b0 =  (1 - cos)/2
                b1 =   1 - cos
                b2 =  (1 - cos)/2
                a0 =   1 + alpha
                a1 =  -2*cos
                a2 =   1 - alpha



HPF:            H(s) = s^2 / (s^2 + s/Q + 1)

                b0 =  (1 + cos)/2
                b1 = -(1 + cos)
                b2 =  (1 + cos)/2
                a0 =   1 + alpha
                a1 =  -2*cos
                a2 =   1 - alpha



BPF (constant skirt gain):    H(s) = s / (s^2 + s/Q + 1)

                b0 =   Q*alpha
                b1 =   0
                b2 =  -Q*alpha
                a0 =   1 + alpha
                a1 =  -2*cos
                a2 =   1 - alpha


BPF (constant peak gain):     H(s) = (s/Q) / (s^2 + s/Q + 1)

                b0 =   alpha
                b1 =   0
                b2 =  -alpha
                a0 =   1 + alpha
                a1 =  -2*cos
                a2 =   1 - alpha



notch:          H(s) = (s^2 + 1) / (s^2 + s/Q + 1)

                b0 =   1
                b1 =  -2*cos
                b2 =   1
                a0 =   1 + alpha
                a1 =  -2*cos
                a2 =   1 - alpha



APF:          H(s) = (s^2 - s/Q + 1) / (s^2 + s/Q + 1)

                b0 =   1 - alpha
                b1 =  -2*cos
                b2 =   1 + alpha
                a0 =   1 + alpha
                a1 =  -2*cos
                a2 =   1 - alpha



peakingEQ:      H(s) = (s^2 + s*(A/Q) + 1) / (s^2 + s/(A*Q) + 1)

                b0 =   1 + alpha*A
                b1 =  -2*cos
                b2 =   1 - alpha*A
                a0 =   1 + alpha/A
                a1 =  -2*cos
                a2 =   1 - alpha/A



lowShelf:       H(s) = A * (s^2 + beta*s + A) / (A*s^2 + beta*s + 1)

                b0 =    A*[ (A+1) - (A-1)*cos + beta*sin ]
                b1 =  2*A*[ (A-1) - (A+1)*cos            ]
                b2 =    A*[ (A+1) - (A-1)*cos - beta*sin ]
                a0 =        (A+1) + (A-1)*cos + beta*sin
                a1 =   -2*[ (A-1) + (A+1)*cos            ]
                a2 =        (A+1) + (A-1)*cos - beta*sin



highShelf:      H(s) = A * (A*s^2 + beta*s + 1) / (s^2 + beta*s + A)

                b0 =    A*[ (A+1) + (A-1)*cos + beta*sin ]
                b1 = -2*A*[ (A-1) + (A+1)*cos            ]
                b2 =    A*[ (A+1) + (A-1)*cos - beta*sin ]
                a0 =        (A+1) - (A-1)*cos + beta*sin
                a1 =    2*[ (A-1) - (A+1)*cos            ]
                a2 =        (A+1) - (A-1)*cos - beta*sin

Synthizer 0.8.x

Highlights of this Release

This release is about performance, and about undoing the Linux rollback in 0.7.7. As of this release, no code path from the external interface blocks on the audio thread in the best case. An internal refactor and the introduction of a new command architecture unify a bunch of ad-hoc interfaces for fast inter-thread synchronization.

Specifically:

  • Property reads are now a couple of atomic operations as opposed to a semaphore and waiting on the next audio tick. Put another way, they're literally millions of times faster, and it's now possible to read things like BufferGenerator.position without issue.
  • Property writes are now roughly 5 times faster than they used to be.
  • Object creation is essentially free.
  • Adding and removing generators from sources doesn't block.
  • Configuring routes for effects doesn't block.

There is one exception to the lack of blocking: if code does around 10000 operations per audio tick, Synthizer will have no choice but to block due to internal resource exhaustion. If this proves to be an issue in practice, it is trivial to raise this limit further.

A Note on Breakage

The above changes were a complex refactor. In order to make them with as little chaos as possible, Synthizer has begun introducing some basic unit tests, but it is nonetheless nearly impossible to make a release like this one without issues. You are encouraged to report issues against Synthizer's GitHub repository.

Compatibility

Synthizer introduces the following compatibility breakages in this release:

In order to get the above performance increases, it was necessary to remove the ability to read object properties. This has the side effect of hiding details of object lifetime, which Synthizer may rely on in future.

FdnReverb.input_filter_enabled and FdnReverb.input_filter_cutoff were removed. Synthizer will be replacing these with a dedicated filter property type in the near future, and opted to remove them now so as to avoid an excessive number of releases that introduce backward-incompatible changes.

Property reads are now actually eventually consistent. The manual previously stated that it's possible for a read that comes after a write to return a stale value, so code shouldn't have been relying on stronger behavior. In any case, Synthizer now uses this reservation for the above performance increases, which may break code that incorrectly relied on the old behavior.

Synthizer has also changed the default panner strategy to SYZ_PANNER_STRATEGY_STEREO. To re-enable HRTF, set this on a per-source basis. The ability to set this default on a per-context basis will be introduced in the 0.8.x series. This change was made because HRTF is only useful for headphone users, and it is not possible for Synthizer to reliably detect that case. In general, stereo panning is safe on every audio configuration including surround sound systems.

Patch Notes

0.8.0

  • Initial release
  • Undo the rollbacks in 0.7.7 and reintroduce MSVC and Linux support.
  • Internal fix for wire filters, which now don't play silence.
  • A number of small fixes and slight quality improvements to reverb.
  • All of the massive performance increases from above.

0.8.1

  • Miniaudio was improperly configured to use a conservative profile rather than a low-latency profile, which caused extreme audio latency.

0.8.3

  • Fix: FdnReverb.t60 = 0.0 no longer internally divides by 0.
  • Fix: BufferGenerator again uses seconds for positions. This was refactored incorrectly during 0.8.0's major changes.
  • Objects should now die faster when their handles are freed, due to holding only weak references internally where possible when dispatching commands to the audio thread.

0.8.4

  • Contexts and all generators now support SYZ_P_GAIN.
  • Contexts, sources, and generators now support play/pause via syz_pause and syz_play.
    • In Python, this is src.pause() and similar.
  • As the first part of the event system, all Synthizer objects may now have arbitrary userdata associated with them:
    • In C, this is syz_getUserdata and syz_setUserdata
    • In Python, this is get_userdata and set_userdata
    • This will be documented properly when the rest of the event system exists, but can be used to link Synthizer objects to non-Synthizer objects without having to maintain a custom mapping yourself.
  • Slight memory savings.
  • Fix: deleting contexts no longer crashes.

0.8.5

  • New function syz_getObjectType which queries the object type. Primarily intended for bindings developers.
  • Fix: Synthizer no longer leaks megabytes of memory per second (issue #44).

0.8.6

This release contains somewhat experimental code to make decoding buffers faster by approximately 2X. Please report any perceptible audio quality issues with BufferGenerator.

  • syz_handleFree now no-ops when passed handle = 0.
  • Fix: deleting sources in specific orders no longer causes audio artifacts (issue #46)

0.8.7

  • Fix: the library no longer crashes due to internal races between handle deletion and execution of audio commands on the background thread
    • This fixes a number of crashes, most notably issue #50, which highlighted the issue due to effects relying heavily on commands.
  • Fix: streaming generators no longer spuriously seek to the beginning of their audio after generating the first few MS of audio (issue #49)
  • Fix: panning_scalar will once more properly take effect if set immediately after PannedSource creation (issue #52)

0.8.8

  • An event system. This is alpha, and changes are expected in 0.9.
  • Add SYZ_P_PANNER_STRATEGY to Context to default panner strategies for new sources.

0.8.9

  • Fix: non-looping BufferGenerator with a pitch bend is no longer silent
  • Fix/improvement: In Python, Synthizer events inherit from a common base class.

0.8.10

This release is the long-awaited improved HRTF. This might not be the final version, but it's much better in quality, and the scripts to generate it are now much easier to maintain. Please leave any feedback here. In particular, it is not feasible to test the entire sphere, so it may be possible to find azimuth/elevation combinations which do weird things.

0.8.11-0.8.16

These releases are a test series for moving CI to GitHub Actions. Though not user-facing, 0.8.14 now blocks on successful Linux CI for Ubuntu 20 and thus provides Linux support on par with Windows.

0.8.17

The flagship feature of this release is filters. Python users should see the Python tutorial. Also:

  • Synthizer now builds on Ubuntu 18.
  • Python no longer erroneously exports top-level enums due to undocumented Cython behavior.
    • If you were using ints as property values instead of enums, or relying on the top-level definitions, you will need to update your code. This was never intended to be part of the public API.
  • It is now possible to set a generator's gain before adding it to a source without triggering a crossfade. Internal changes to the crossfade helpers should make bugs like this much rarer, and probably fixed others that were yet to be reported.

Release Notes for 0.0.x to 0.8.x

Starting at 0.8 and later, Synthizer maintains separate pages for every incompatible release with more details on compatibility breakage and what specifically changed. The following were early development versions, and should no longer be used.

0.7.7

  • Identical to 0.7.5. 0.7.6 introduced a major performance regression that makes Synthizer unusable (probably issue #32, but investigations are ongoing). After this is fixed, the next version will contain all the nice 0.7.6 things.

0.7.6

  • Synthizer now builds on Linux. This is preliminary. If you experience issues, please report them, as Linux isn't my primary platform. That said, I have received multiple confirmations that it works.
  • Python now publishes source distributions, which are capable of building without any extra intervention on your part, except for Windows which requires being in an appropriate MSVC shell. On Linux, you can install Synthizer into virtualenvs with a simple pip install synthizer, assuming a supported C and C++ compiler.
    • Note that git is currently required in order to clone the Synthizer repository. We might include Synthizer inline in future.
  • Prebuilt C artifacts for Windows now link against the multithreaded dynamic CRT. This was done to speed up Python builds and because there is no obvious right choice. If you need a different configuration, please build from source specifying CMAKE_MSVC_RUNTIME_LIBRARY and SYNTHIZER_LIB_TYPE to appropriate values.
  • The audio generation all runs inline in the audio callback. This has the knock-on effect of making property reads even slower than they already were, but is necessary to work well on Linux. See issue #32 for tracking fast property reads.

0.7.5

  • Introduce reverb.
  • Start building Python 3.9 wheels.
  • Sources now fade gain changes to prevent clicks, especially when feeding effects.
  • Expose syz_resetEffect for interactive experimentation purposes, to clear the internal state of effects. In Python this is .reset() on any effect.
  • All effects now have a SYZ_P_GAIN property to set the overall gain of the effect.
  • Internally fix the bitsets, which manifested as weird issues with allocating sources to panner lanes, for example issue #16 where PannedSource could become silent. Note that this may introduce other issues in exchange: it was assumed that the bitsets were working this entire time.

0.7.4

0.7.3

  • Fix pitch bend support on BufferGenerator when looping is enabled. For the morbidly curious there is indeed a difference between std::fmod and std::remainder
  • Maybe fix bugs in stereo panning for the same reason. If you had issues with stereo panning and didn't report them, you should probably try this release.

0.7.2

  • New property on BufferGenerator: SYZ_P_PITCH_BEND.
  • We will adopt pre-1.0 semantic versioning going forward in order to be compatible with Rust-style ^ dependencies: 0.major.minor, with minor incrementing for features and major incrementing for incompatible API updates. This should enable at least some compatibility in version numbers across package managers.

0.7.1

  • Attempt releasing 32-bit Python wheels.

0.7.0

  • Add a noise generator.
  • Internal fixes to filters.

0.6.4

  • Python bindings now release the GIL in from_stream.

0.6.3

  • Re-enable dr_flac SIMD.
  • Fix a bug with resampling when decoding to buffers.

0.6.1

This release temporarily disables dr_flac SIMD support until resolution of this upstream issue.

0.6.0

Features

  • Major improvements to StreamingGenerator:
    • Position and looping are now exposed.
    • Streaming now happens in a background thread.
    • Streaming now builds up a buffer which prevents underruns.
    • Note that this uses a bunch of as-yet-untested threading/concurrency stuff that hasn't been exercised heavily, so if you notice bugs please open issues.
  • Introduce a stereo panning strategy. Note that this will become the default panning strategy in the future, because it's the only one that's safe on all speaker arrangements. If you want HRTF, request it by setting SYZ_P_PANNER_STRATEGY on the context and/or sources.

Bugs Fixed

  • Throw an exception instead of silently crashing on invalid audio files.
  • Fix the fundamentally broken DirectSource mixing logic. This probably still needs some improvement but is no longer fundamentally broken.
  • Fix Stream seeking when using LookaheadByteStream internally. This fixes/allows for StreamingGenerator to seek, and may also fix loading of audio files into buffers in some cases.

0.5.0

  • Roll out a deferred freeing strategy, which uses C++ custom allocators to move freeing of pointers to a background thread that wakes up periodically. This doesn't move all freeing, but gets the vast majority of it. The impact is that large memory allocations will take a little while to actually free, but that freeing will (mostly) not happen on any thread but the freeing thread, including threads from users of the public API. This is the first step to eventually decreasing latency below 20 ms or so, though the upshot of that work won't be seen for a while.
  • Decouple Buffer from context. The manual already claimed that we did, but the library itself didn't. Calls to create buffers no longer need a context passed in.
  • Gains are now scalars instead of DB. This is because it is valuable to be able to know which combinations of gains sum to 1 in order to prevent clipping.
    • For reference, you can get a scalar gain from DB as 10**(db / 20) if you need it.
  • Fix creating buffers of files which require resampling.

0.4.1

  • Fixes to allocating multiple panner lanes from a PannerBank.
    • Manifested as the inability to play multiple sources.
  • Fix HRTF computation for angles outside the built-in HRTF dataset, but which are still in range for elevation.

0.4.0

  • Introduce DirectSource, for music and other non-panned assets.
  • All sources now have gain, as part of refactoring for DirectSource.
  • Introduce syz_bufferGetChannels, syz_bufferGetLengthInSamples, and syz_bufferGetLengthInSeconds, and obvious Python equivalents.
  • Source add/remove generators needed to happen in the audio thread in order to not break. For now this is using Context.call, but in future it'll probably use a ConcurrentQueue to enqueue updates and avoid the overhead, or perhaps an abstracted PropertyRing.
  • Move to moodycamel::ConcurrentQueue. This removes mutexes in a lot of places.
    • As part of this, get rid of our custom lockfree queues. ConcurrentQueue is better; it's MPMC and not intrusive.
  • Get rid of some dead code.
  • Fix double properties that use P_DOUBLE_MIN to mean minimum double, not 0. This bug brought to you by surprising C++ numeric_limits behavior.
  • Fix looping in BufferGenerator.
  • Offer additional guarantees around syz_getX behavior with respect to properties that Synthizer modifies, like generator positions.
  • CONTRIBUTING.md and pull request templates.

0.3.0

  • Added support for Flac and MP3 via dr_libs.
  • Extended byte streams to be able to advertise their size.