Spatial Music Making in VR

Do you create electronic music or sound design? Or are you a student or professional in Audio / Music Technology? If so, I am running a study over the next few weeks (August and September) and it would be great to have your participation!

You will be asked to collaboratively mix a short track using a shared VR spatial audio app. You will then be asked to complete a survey about your experience.

The study will take two hours to complete. All studies will be done in the Media and Arts Technology studios in the Engineering and Materials Science building of Queen Mary University of London, Mile End Road, Tower Hamlets.

Study slots were available from 13/08/19 to 28/08/19.

If you are interested in the context of the research I have some resources here:

Getting my head around regression analysis Pt 1: Setting the problem

Regression analysis, and specifically mixed effect linear models (LMMs), is hard, and harder than I thought based on what I learned in traditional statistics classes. ‘Modern’ mixed model approaches, although more powerful (as they can handle more complex designs, lack of balance, crossed random factors, some kinds of non-normally distributed responses, etc.), also require a new set of conceptual tools.

This first post covers my process of understanding how to apply the magic of multiple regression to my experimental data. The next post will cover how the analysis was done using R.

The Basics

Before tackling my specific problem as a mixed effect linear model, it is important to review the basic building block of linear regression.

Linear regression is a standard way to build a model of your variables. You want to do this when [source]:

  • You have two variables: one dependent variable and one independent variable. Both variables are interval.
  • You want to express the relationship between the dependent variable and independent variable in the form of a line. That is, you want to express the relationship like y = ax + b, where x and y are the independent and dependent variables, respectively.
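
As a minimal sketch of what fitting y = ax + b involves, here is ordinary least squares written out by hand in plain Python. The function name `fit_line` and the four data points are invented for illustration; they are not from my experiment.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single predictor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-products of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b


# Invented points that lie exactly on the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

The slope is the covariance of x and y divided by the variance of x, and the intercept is whatever makes the line pass through the point of means.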

Also, there are 4 key concepts in linear regression that should be clear before you attempt extended techniques like LMM or GLM [source]

1. Understand what centring does to your variables: Intercepts are pretty important in multilevel models, so centring is often required to make intercepts meaningful.

2. Work with categorical and continuous predictors: You will want to use both dummy and effect coding in different situations.  Likewise, you want to be able to understand what it means if you make a variable continuous or categorical.  What different information do you get from it and what does it mean?  Even if you’re a regular ANOVA user, it may make sense to treat time as continuous, not categorical.

3. Interactions: Make sure you can interpret interactions regardless of how many categorical and continuous variables they contain.  And make sure you can interpret an interaction regardless of whether the variables in the interaction are both continuous, both categorical, or one of each.

4. Polynomial terms: Random slopes can be hard enough to grasp.  Random curvature is worse, so be comfortable with polynomial functions if you have complex data (e.g. the Wundt curve, the bell-shaped relationship between positive affect and complexity in music).

Finally, understand how all these concepts fit together. This means understanding what the estimates in your model mean and how to interpret them.
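
To make the first two concepts concrete, here is a small sketch in plain Python. It shows that centring a predictor changes the intercept but not the slope, and that dummy coding and effect coding of a two-level factor give different but related estimates. All numbers and variable names are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


# 1. Centring: an invented 1-7 score and an invented response.
score = [1, 3, 5, 7]
y = [10.0, 14.0, 18.0, 22.0]
raw_slope, raw_intercept = fit_line(score, y)

mean_score = sum(score) / len(score)
centred = [s - mean_score for s in score]
c_slope, c_intercept = fit_line(centred, y)
# The raw intercept is the prediction at score 0 (off the scale);
# the centred intercept is the prediction at the average score.
print(raw_slope, raw_intercept, c_slope, c_intercept)

# 2. Coding a two-level factor (balanced, invented data).
media = ["DT", "DT", "VR", "VR"]
resp = [4.0, 6.0, 8.0, 10.0]

dummy = [0 if m == "DT" else 1 for m in media]    # DT is the reference level
effect = [-1 if m == "DT" else 1 for m in media]  # levels sum to zero

d_slope, d_intercept = fit_line(dummy, resp)
e_slope, e_intercept = fit_line(effect, resp)
# Dummy coding: intercept = DT mean, slope = VR mean minus DT mean.
# Effect coding: intercept = grand mean, slope = half that difference.
print(d_slope, d_intercept, e_slope, e_intercept)
```

Neither coding is more correct than the other. They answer slightly different questions, which is why it matters to know which one your software applied before reading the estimates.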

What is a mixed effect linear model?

Simply put, they are statistical models of parameters that vary at more than one level. They are a generalised form of linear regression that builds multiple linear models to describe how predictors relate to those parameters.

Many kinds of data, including observational data collected in experiments, have a hierarchical or clustered structure. For example, children with the same parents tend to be more alike in their physical and mental characteristics than individuals chosen at random from the population at large. Individuals may be further nested within demographic and psychometric features. Multilevel data structures also arise in longitudinal studies where an individual’s responses over time are correlated with each other. In experimental data, an LMM is a good way to account for individual differences between participants. For example, some participants may be more comfortable with touchscreens than others, and so may perform better in a task. If we tried to represent this with ordinary linear regression, the model would describe all of the data with a single line. That aggressively aggregates away differences that may matter for interpreting the results in context.

Multilevel regression, intuitively, allows us to have a model for each group represented in the within-subject factors. In this way, we can also consider the individual differences of the participants (they will be described as differences between the models). What multilevel regression actually does is something in between completely ignoring the within-subject factors (sticking with one model) and building a separate model for every single group (making n separate models for n participants). An LMM controls for non-independence among the repeated observations for each individual by adding one or more random effects for individuals to the model. These random effects take the form of additional residual terms, each of which has its own variance to be estimated. Roughly speaking, there are two strategies you can take for random effects: varying-intercept or varying-slope (or both). Varying-intercept means that differences between groups are described as differences in intercepts. Varying-slope means that those differences are instead described as differences in the coefficients (slopes) of some factors.
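
In equation form, a model with both a varying intercept and a varying slope could be sketched as below. The notation is my own assumption rather than something taken from the sources above: i indexes observations, j indexes groups (e.g. participants), and for simplicity I ignore any correlation between the two random effects.

```latex
y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,x_{ij} + \varepsilon_{ij},
\qquad
u_{0j} \sim \mathcal{N}(0, \sigma_{u_0}^2),\quad
u_{1j} \sim \mathcal{N}(0, \sigma_{u_1}^2),\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here \beta_0 and \beta_1 are the fixed intercept and slope shared by everyone, u_{0j} shifts the intercept for group j, and u_{1j} shifts the slope for group j. Dropping u_{1j} gives a varying-intercept model; dropping u_{0j} gives a varying-slope model.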


Dependent/Response variable: the variable that you measure and expect to vary given experimental manipulation.

Independent/Explanatory/Exogenous variables and Fixed effects are all variables that we expect will have an effect on the dependent/response variable. They are factors whose levels are experimentally determined, or where the interest lies in the specific effects of each level, such as effects of covariates, differences among treatments and interactions.

Random effects are usually grouping factors for which we are trying to control. In repeated measures designs, they can be either crossed or hierarchical/nested (more on that later). Random effects are factors whose levels are sampled from a larger population, or where the interest lies in the variation among them rather than the specific effects of each level. The parameters of random effects are the standard deviations of variation at a particular level (e.g. among experimental blocks).

The precise definitions of ‘fixed’ and ‘random’ are controversial; the status of particular variables depends on experimental design and context.

My Research Problem

In an experiment comparing Desktop (DT) computer and VR interfaces in a collaborative music-making task, I think that individual users and the dyadic session dynamics affect the amount of speech when doing the task and that the amount of talk will also be affected by media (DT/VR). Basically, the mixture of people and experimental condition will both have effects, but I really want to know the specific effect of media on speech amount.

Data structure

The dependent variable is the frequency of coded speech per user, while demographic surveys produced multiple explanatory variables along with the independent variable of media. So, we also have a series of other variables that may affect the volume of communication. Altogether variables of interest for linear modelling include:

  • Media: media condition DT or VR.
  • User: repeated measure grouping by the participant ID.
  • Session: categorical dyad grouping e.g. A, B, C.
  • Utterance: A section of transcribed speech, a sentence or comparable. Frequencies of utterances are used.
  • Pam: Personal acquaintance measure, a psychometric method of evaluating how much you know another person.
  • VrScore: level of experience with VR, simple one to seven scores.
  • MsiPa: Musical sophistication index perceptual ability factor for each user.
  • MsiMt: Musical sophistication index musical training factor for each user.
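
As a sketch of how one row of this data might look in long format (one row per user per media condition), here is a small Python record type. The class name, field names, and every value below are placeholders I have made up for illustration; none of them are real study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One long-format row: a single user in a single media condition."""
    user: str        # participant ID (repeated-measure grouping)
    session: str     # dyad label, e.g. "A"
    media: str       # "DT" or "VR"
    utterances: int  # frequency of coded utterances (the response)
    pam: float       # personal acquaintance measure
    vr_score: int    # VR experience, 1 to 7
    msi_pa: float    # musical sophistication: perceptual ability
    msi_mt: float    # musical sophistication: musical training

    def __post_init__(self):
        # Basic sanity checks on the fields with a known range
        if self.media not in ("DT", "VR"):
            raise ValueError("media must be 'DT' or 'VR'")
        if self.utterances < 0:
            raise ValueError("utterance counts cannot be negative")
        if not 1 <= self.vr_score <= 7:
            raise ValueError("vr_score must be between 1 and 7")


# Placeholder rows: the same (invented) user appears once per condition,
# so user-level variables repeat while media and utterances change.
rows = [
    Observation("u1", "A", "DT", 12, 3.5, 2, 5.1, 4.0),
    Observation("u1", "A", "VR", 9, 3.5, 2, 5.1, 4.0),
]
print(rows[0])
```

Laying the data out this way makes the repeated measures explicit: User and Session identify the grouping, Media is the manipulated factor, and the remaining columns are candidate explanatory variables.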

Using the right tool

As I used a repeated measures design for the experiment, where each participant used both interfaces, Media is a within-subject factor. This means I need a statistical method that can account for it. A simple paired t-test or repeated measures ANOVA may be of use, but neither can include all of the explanatory variables. This leaves us with regression analysis. This decision tree highlights how to proceed with choosing the right form of regression analysis:

  1. If you have one independent variable and do not have any within-subject factor, consider Linear regression. If your dependent variable is binomial, Logistic regression may be more appropriate.
  2. If you have multiple independent variables and do not have any within-subject factor, consider Multiple linear regression.
  3. If you have any within-subject factor, consider Multi-level linear regression (mixed-effect linear model).
  4. For some special cases, consider the Generalized Linear Model (GLM) or Generalized Linear Mixed Model (GLMM).
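
The decision tree above can be sketched as a small function. The function name and arguments are invented for illustration. The split of the "special cases" into GLM (no within-subject factor) and GLMM (with one) is my own assumption about how step 4 combines with the earlier steps.

```python
def choose_regression(n_independent: int,
                      has_within_subject: bool,
                      binomial_response: bool = False,
                      special_case: bool = False) -> str:
    """Return the kind of regression suggested by the four-step decision tree."""
    if n_independent < 1:
        raise ValueError("need at least one independent variable")
    # Step 4: special cases, e.g. a response that is not roughly Gaussian.
    # Assumption: use the mixed version when there is a within-subject factor.
    if special_case:
        return "GLMM" if has_within_subject else "GLM"
    # Step 3: any within-subject factor calls for a multi-level model.
    if has_within_subject:
        return "Multi-level linear regression (LMM)"
    # Step 1: one independent variable, no within-subject factor.
    # The list above only mentions logistic regression under this step.
    if n_independent == 1:
        return "Logistic regression" if binomial_response else "Linear regression"
    # Step 2: several independent variables, no within-subject factor.
    return "Multiple linear regression"


# Several explanatory variables plus a within-subject factor (like Media)
print(choose_regression(n_independent=5, has_within_subject=True))
```

Calling it with `special_case=True` and a within-subject factor returns "GLMM".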

So, at first I chose to use a mixed-effect linear model (LMM), as I am trying to fit a model that has two random intercepts, one for each of two grouping factors (User and Session). As such, we are trying to fit a model with nested random effects.

Crossed or Nested random effects

As each User appears in only one Session, the data can be treated as nested. For nested random effects, a factor appears only within a particular level of another factor; for crossed effects, a given factor appears in more than one level of another factor (e.g. Users appearing within more than one Session). An easy rule of thumb is that if your random effects aren’t nested, then they are crossed!
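
The rule of thumb is easy to check mechanically. Here is a minimal sketch, with a made-up function name and made-up labels, that tests whether one grouping factor is nested within another by looking at which outer levels each inner level appears in.

```python
from collections import defaultdict


def is_nested(pairs):
    """Check nesting for (inner, outer) label pairs, e.g. (user, session).

    Returns True if every inner level occurs within exactly one outer
    level (nested), and False otherwise (crossed).
    """
    outer_by_inner = defaultdict(set)
    for inner, outer in pairs:
        outer_by_inner[inner].add(outer)
    return all(len(outers) == 1 for outers in outer_by_inner.values())


# Invented labels: each user belongs to one dyad session only
nested_design = [("u1", "A"), ("u2", "A"), ("u3", "B"), ("u4", "B")]
# Invented labels: user u1 takes part in two different sessions
crossed_design = [("u1", "A"), ("u2", "A"), ("u1", "B"), ("u3", "B")]

print(is_nested(nested_design), is_nested(crossed_design))
```

Running a check like this on the real User and Session columns before modelling is a cheap way to confirm that the random-effects structure matches the design.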

Special Cases…GLM

After a bit of further reading, I found out that my dependent variable meant a standard LMM was not suitable. As the response variable is count data of speech, it violates the assumptions of normal LMMs. When your dependent variable is not continuous, unbounded, and measured on an interval or ratio scale, your model will never meet the assumptions of linear mixed models (LMMs). In steps the flexible, but highly sensitive, Generalised Linear Mixed Model (GLMM). The difference between LMMs and GLMMs is that the response variables can come from distributions other than the Gaussian; for count data this is often a Poisson distribution. There are a few issues to keep in mind, though.

  1. Rather than modelling the responses directly, some link function is often applied, such as a log link. For Poisson, the link function (the transformation applied to the mean of Y) is the natural log. So all parameter estimates are on the log scale and need to be transformed for interpretation. This means applying the inverse function of the link, which for the log is the exponential.
  2. It is often necessary to include an offset parameter in the model to account for the amount of risk each individual had to the event. In practice this is a normalising factor, such as the total number of utterances across the repeated conditions.
  3. One assumption of Poisson Models is that the mean and the variance are equal, but this assumption is often violated.  This can be dealt with by using a dispersion parameter if the difference is small or a negative binomial regression model if the difference is large.
  4. Sometimes there are many, many more zeros than even a Poisson Model would indicate.  This generally means there are two processes going on–there is some threshold that needs to be crossed before an event can occur.  A Zero-Inflated Poisson Model is a mixture model that simultaneously estimates the probability of crossing the threshold, and once crossed, how many events occur.
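
The first three points can be illustrated with a little arithmetic in plain Python. This is only a sketch: the function names and numbers are invented, and the dispersion check is a crude look at the raw counts, not the model-based dispersion statistic a fitted GLMM would report.

```python
import math
from statistics import mean, variance


def rate_ratio(log_coef):
    """Back-transform a log-scale coefficient into a multiplicative effect."""
    return math.exp(log_coef)


def expected_count(intercept, coef, x, exposure):
    """Poisson log-link prediction with an offset:
    exp(intercept + coef * x + log(exposure))."""
    return math.exp(intercept + coef * x + math.log(exposure))


def dispersion_ratio(counts):
    """Sample variance divided by sample mean of raw counts.

    Roughly 1 if the counts look Poisson; well above 1 hints at
    overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero, ratio undefined")
    return variance(counts) / m


# A coefficient of log(2) on the log scale means the count doubles
print(rate_ratio(math.log(2)))

# With zero coefficients the prediction is just the exposure itself
print(expected_count(intercept=0.0, coef=0.0, x=1, exposure=10))

# Invented counts with variance far above the mean
counts = [0, 1, 2, 9, 13]
print(dispersion_ratio(counts))
```

The offset enters the linear predictor with a fixed coefficient of one, which is why it acts as a normalising factor: the model is then effectively describing a rate per unit of exposure rather than a raw count.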

Moving forward

In the next post, I will cover how this analysis is done in the R environment using the lme4 package.


Looking back at people looking forward

In 1995, Heath, Luff, & Sellen lamented the slow uptake of video conferencing, indicating that it had not at the time fulfilled its promise. Looking back at this projection now, the ubiquity of video systems for social and work communication is plain to see. Subsequent research has gone on to understand it further in a variety of HCI paradigms (CHI2010, CSCW2010, CHI2018). So, for my research, which makes projections on the use of VR for music collaboration, it might be that findings and insights do not reach fruition in a timely fashion, or in the domain of interest in which they were investigated, or ever! Though this could be touching on a form of hindsight bias.

Going back to the article that speculated on the unfulfilled promise of video conferencing technologies, Heath, Luff, and Sellen (1995) provide a piece of insight that can still put design interventions for collaboration into perspective:

It becomes increasingly apparent, when you examine work and collaboration in more conventional environments, that the inflexible and restrictive views characteristic of even the most sophisticated media spaces, provide impoverished settings in which to work together. This is not to suggest that media space research should simply attempt to ‘replace’ co-present working environments, such ambitions are way beyond our current thinking and capabilities. Rather, we can learn a great deal concerning the requirements for the virtual office by considering how people work together and collaborate in more conventional settings. A more rigorous understanding of more conventional collaborative work, can not only provide resources with which to recognise how, in building technologies we are (inadvertently) changing the ways in which people work together, but also with ways in which demarcate what needs to be supported and what can be left to one side (at least for time being). Such understanding might also help us deploy these advanced technologies.

The bold section highlights the nub of what I’m interested in for VR music collaboration systems. I break this down into how I’ve tackled framing collaboration in my research:

  • conventional collaborative work – ethnographies of current and developing practice. Even if you pitch a radical agenda of VR workspace, basic features of the domain of interest need to be understood for their contextual and technical practices.
  • building technology is changing practice – observing the impact of design interventions on how people collaborate in media production. Not only does a technology suggest new ways of working, it can enforce them! Observing and understanding this in domain-specific ways is important.
  • what needs to be supported – basic interactional requirements, we have to be able to make sense of each other, and the work, together, in an efficient manner.
  • what can be left to one side – the exact models and metaphors of how work is constructed in reality. In VR we can create work setups and perspectives that cannot exist in reality. For instance, shared spatial perspectives, i.e. seeing the same thing from the same perspective, are impossible in reality as we each have to occupy a separate physical space. In repositioning basic features of spatial collaboration, the effects need to be understood in terms of interaction and domain requirements. But the value is in finding new ways of doing things that are not possible in face-to-face collaboration.

Overall, the key theme that should be taken away is that of humans’ need to communicate and collaborate. In this sense, any research that looks to make collaboration easier is provisioning for basic human understanding. That is quite nice to be a part of.

Media comparison for collaborative music making

Image credit Nicolas Ulloa

Do you create electronic music? Are you a musician, producer, artist or DJ? Or are you a student or professional in Music / Music Technology? If so, I am running a study over the next few weeks (July & August) and would love your participation!

You will be invited to use and compare two different interfaces, one in virtual reality and the other screen-based. You will be asked to create some drum loops collaboratively with another person using the provided interfaces. You will then be asked to complete a survey about your experience.

The study will take two hours to complete, and you will be paid £25 for your participation. All studies will be done in the Computer Science building on the Mile End Campus of Queen Mary University of London.

Study slots are available from 25/07/17 to 18/08/17. Monday-Friday – time slots at 10 am, 12.30 pm, 3.30 pm, and 6 pm. If none of these are suitable for you alternative arrangements can easily be made.

Unfortunately, this study has ended and further appointments are not being made.

If you are interested in the context of the research I have some resources here:

  • Polyadic: design and research of an interface for collaborative music making on a desktop or in VR.
  • Design for collaborative music making: some previous work on the user-centred design cycle involved in the progress of my PhD.

Interference: Journal of Audio Culture

Courtesy of UCLA

Listen too much and you have trouble listening? Working in the audio field can often sap you of your basic ability to listen, as you constantly have to produce to deadlines, assess quality and generally conform a product to external conventions. But the most fundamental requirement of sound to a human is to inform them of a position in space, as before verbal communication the auditory system prevented you from getting eaten by bigger predators. This being said, the importance of communication cannot be overlooked, as it is tied to our gradual evolutionary supremacy.

In steps Interference, a peer-reviewed journal, supported by Trinity College Dublin, that is entirely free to access. They describe themselves as follows:

“Interference is an open access forum on the role of sound in cultural practices, providing a trans-disciplinary platform for the presentation of research and practice in areas such as acoustic ecology, sensory anthropology, sonic arts, musicology, technology studies and philosophy. The journal seeks to balance its content between scholarly writing, accounts of creative practice, and an active engagement with current research topics in audio culture.”

Of special note is one article in the current issue, by Leandra Lambert, called “Experienced Sonic Fictions”, which I shall very superficially contextualise for this article. Throughout the introduction of the piece, Lambert mentions the founders of the field who established the ‘deep listening’ and ‘sonic awareness’ disciplines, which can be approximated to a form of listening meditation. Following this, she describes the process of free-form sound walks and the associated imagery that is stimulated. By letting her imagination guide her through these walks, the stimulation is less and less guided by any conscious purpose, and in response the ideas and concepts imagined become more lucid and fantastic. Though rather random and quite time-consuming, it does reaffirm the idea that we need to listen to our environments and not try to block them out or classify them too swiftly. Though this capacity for ordering reality is essential in modern life, for a sound designer the ability to stop and actually listen to a scene for all its richness is worth remembering. In many respects it is reminiscent of John Cage’s works on silence and of how evocative the absence of direct stimulus is; paradox or contradiction, I’m not sure.

Coming back to the opening gambit, though sound walks may not be for you, the idea that to truly assess and recreate a soundscape one must remember how to listen is a very important one. How you choose to do this can come in many forms, as with all creative processes, but it is an important principle as audio reproduction methods approach the means to reproduce true soundscapes for a mass market.

Another journal of note is that of SoundEffects, also open access and very stimulating.