General Description

The cocktail party phenomenon refers to the ability of human hearing to separate individual sound sources from one another, even amid a complete jumble of different sound sources. Applications can be found, for example, in the automated evaluation of spectral data in chemical analysis. In this experiment, students will first investigate this effect with sound sources, again using methods of unsupervised learning. Second, a low-cost NIR sensor will be tested for the spectral analysis of plastics and, where appropriate, used to classify plastic waste. The IoT kits are again used for data acquisition; the evaluation with ML algorithms takes place in the cloud. Possible methods for feature reduction are principal component analysis (PCA) and independent component analysis (ICA).


Independent component analysis is a classical machine learning tool that is used to extract hidden patterns from data. In this exercise you will be introduced to this method in the context of blind source separation in the cocktail party problem using MATLAB.

The cocktail party problem describes the human ability to extract and recover a desired voice in an environment with multiple speakers. Given a set of observed mixtures, and without prior knowledge of the mixing system or of the source signals themselves, we aim to recover the original sources. For this purpose, we apply ICA in the frequency domain and then perform permutation alignment and scaling correction.

The data here is created in a similar way to the localization experiment. We convolve room impulse responses with speech signals to create the convolutive observation mixtures. Note that the ICA algorithm relies on the assumption that the sources are statistically independent of each other.
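The construction of the convolutive mixtures can be sketched as follows. This is a minimal Python/NumPy illustration (the exercise itself uses MATLAB); the function name `convolutive_mix`, the array layout, and the toy signals standing in for speech and room impulse responses are assumptions made for the example, not part of the course material.

```python
import numpy as np

def convolutive_mix(sources, rirs):
    """Mix N sources into M microphone signals.

    sources: array (N, T), one source signal per row.
    rirs:    array (M, N, L); rirs[j, i] is the impulse response
             from source i to microphone j (hypothetical layout).
    Returns an array (M, T + L - 1).
    """
    n_src, n_samples = sources.shape
    n_mic, n_src_rir, rir_len = rirs.shape
    if n_src != n_src_rir:
        raise ValueError("rirs and sources disagree on the number of sources")
    mixtures = np.zeros((n_mic, n_samples + rir_len - 1))
    for j in range(n_mic):
        for i in range(n_src):
            # Each microphone observes the sum of all filtered sources.
            mixtures[j] += np.convolve(sources[i], rirs[j, i])
    return mixtures

# Toy data: white noise stands in for speech, decaying noise for room responses.
rng = np.random.default_rng(0)
s = rng.standard_normal((2, 1000))
h = rng.standard_normal((2, 2, 64)) * np.exp(-np.arange(64) / 10.0)
x = convolutive_mix(s, h)
```

When every impulse response has a single nonzero tap at lag zero, this reduces to an instantaneous (linear) mixture, which is the simpler case treated first in the exercise.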

  • Preprocessing: The observation mixture in this experiment is assumed to have been centered and whitened.
  • Algorithm: The blind source separation (BSS) process for a linear mixing system can be described as
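As a hedged illustration of the two points above, the sketch below assumes the standard instantaneous model \(\mathbf{x} = A\mathbf{s}\) with recovered signals \(\mathbf{y} = W\mathbf{x}\). The symbols \(A\) and \(\mathbf{x}\), the helper `center_and_whiten`, and the toy sources are assumptions introduced for the example; the code is Python/NumPy rather than the MATLAB used in the exercise.

```python
import numpy as np

def center_and_whiten(x):
    """Return whitened data z and the whitening matrix V, with z = V @ (x - mean)."""
    xc = x - x.mean(axis=1, keepdims=True)          # centering
    cov = xc @ xc.T / xc.shape[1]                   # sample covariance
    eigval, eigvec = np.linalg.eigh(cov)
    V = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T
    return V @ xc, V

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 5000))            # two independent, non-Gaussian toy sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])     # assumed mixing matrix (unknown in practice)
x = A @ s                                  # instantaneous mixtures, x = A s
z, V = center_and_whiten(x)
cov_z = z @ z.T / z.shape[1]               # close to the identity after whitening
```

After whitening, the remaining unmixing matrix can be restricted to an orthogonal matrix, which is why this step is commonly done before ICA.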


The aim of BSS is to estimate an unmixing matrix \(W\) that recovers the source signals as accurately as possible. The permutation and scaling ambiguities are inherent to all ICA algorithms. The practically achievable separation is given by the equation below
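A common way to write this relation is sketched here; the factor order \(\Gamma\Pi\) (permutation first, then scaling) and the symbol \(\mathbf{x}\) for the observed mixtures are assumptions made for illustration:

$$\mathbf{y} = W\mathbf{x} = \Gamma\,\Pi\,\mathbf{s}$$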


where \(\mathbf{s}\) is the vector of original signals and \(\mathbf{y}\) the vector of recovered signals. The permutation ambiguity means that an ICA algorithm cannot recover the signals in their original order; instead, they are permuted by the permutation matrix \(\Pi\). A permutation matrix has exactly one 1 in every row and every column and zeros everywhere else. This can be understood as an arbitrary permutation of the columns of the mixing matrix: the algorithm cannot detect it, yet the result is still a valid mixture.
The scaling ambiguity describes the fact that every separated signal is normalized to unit variance and is thus scaled arbitrarily relative to the original one. Multiplication by the diagonal matrix \(\Gamma = \mathrm{diag}(\gamma_1, \dots, \gamma_N)\) accounts for the scaling of output channel \(i\) by \(\gamma_i\) after permutation. For a determined system with \(N\) sources and \(N\) microphones, the scaling ambiguity can be resolved rather easily. The minimal distortion principle (Matsuoka2002) states that each microphone signal \(X_i(m,f)\) (frame \(m\), frequency bin \(f\)) should be affected as little as possible by the separation process that produces the estimated source signal \(Y_i(m,f)\), for \(i \in \{1, \dots, N\}\). This is implemented by the rescaling below, where \(W_p(f)\) denotes the unmixing matrices after permutation alignment:

$$\mathbf{W}_s(f) = \mathrm{diag}(\mathbf{W}_p^{-1}(f)) \mathbf{W}_p(f)$$
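The rescaling formula can be tried out on a single frequency bin. The following Python/NumPy sketch is only an illustration under assumptions: the helper name `minimal_distortion_rescale` and the random complex mixing matrix `A` are hypothetical, and the permutation is taken to be already resolved.

```python
import numpy as np

def minimal_distortion_rescale(W_p):
    """Apply W_s = diag(W_p^{-1}) W_p for one frequency bin."""
    return np.diag(np.diag(np.linalg.inv(W_p))) @ W_p

# Toy single-bin example with a hypothetical complex mixing matrix A.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
W_ideal = np.linalg.inv(A)

# Two unmixing matrices that differ only by an arbitrary per-channel scaling.
W_a = np.diag([0.2, 5.0]) @ W_ideal
W_b = np.diag([3.0 - 1.0j, 0.1j]) @ W_ideal

# After rescaling, the arbitrary scaling factors have no effect.
W_sa = minimal_distortion_rescale(W_a)
W_sb = minimal_distortion_rescale(W_b)
```

In this toy case `W_sa @ A` equals the diagonal part of `A`, so output \(i\) is source \(i\) as it would be observed at microphone \(i\), independent of the scaling the ICA stage happened to choose.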

Cost function: Because ICA aims to obtain maximally independent estimates of the original sources, a cost function is needed that serves as a measure of independence. Common choices are kurtosis and negentropy, which are used in the following normalized and approximate forms:

$$G_{kurtosis}(x) = x^4 -3$$

$$G_{negentropy}(x) = -\exp(-x^2/2)$$
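To show how the two contrast functions enter an algorithm, the sketch below uses their derivatives in a symmetric fixed-point (FastICA-style) iteration. The text does not name the optimization scheme, so the choice of FastICA is an assumption, as are all function names and the toy sources; the code is Python/NumPy rather than MATLAB.

```python
import numpy as np

def center_and_whiten(x):
    xc = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(xc @ xc.T / xc.shape[1])
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    return V @ xc, V

def sym_decorrelate(W):
    """Symmetric orthogonalisation: W <- (W W^T)^(-1/2) W."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ W

def fastica(z, contrast="negentropy", n_iter=200, tol=1e-10, seed=0):
    """Symmetric fixed-point ICA on whitened data z of shape (N, T)."""
    n, T = z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        u = W @ z
        if contrast == "kurtosis":
            g, dg = 4 * u**3, 12 * u**2            # derivatives of G(u) = u^4 - 3
        else:
            e = np.exp(-u**2 / 2)
            g, dg = u * e, (1 - u**2) * e          # derivatives of G(u) = -exp(-u^2 / 2)
        # Fixed-point update per row: w <- E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = sym_decorrelate(g @ z.T / T - dg.mean(axis=1)[:, None] * W)
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1)) < tol
        W = W_new
        if converged:
            break
    return W

# Toy test: one sub-Gaussian and one super-Gaussian source, both unit variance.
rng = np.random.default_rng(1)
T = 20000
s = np.vstack([rng.uniform(-np.sqrt(3), np.sqrt(3), T),
               rng.laplace(scale=1 / np.sqrt(2), size=T)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])             # assumed mixing matrix
z, V = center_and_whiten(A @ s)
W = fastica(z, contrast="negentropy")
G = W @ V @ A   # global system; near a scaled permutation if separation worked
```

If the separation succeeds, each row of `G` has one dominant entry, which is the permutation and scaling ambiguity made visible.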

In this exercise you will first learn to apply the described ICA to a linear mixture in the time domain.
The developed framework is then extended to convolutive mixtures in the frequency domain. To do this, you will first take the short-time Fourier transform (STFT) of the mixtures and apply ICA to each frequency bin. Afterwards, permutation alignment and scaling correction are necessary.
The signal envelopes for the different stages of the source separation are shown below:
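The permutation alignment step can be illustrated with a small sketch. The text does not state which alignment method is used; the version below assumes a simple envelope-correlation criterion, in which each bin is matched to a running reference built from the bins already aligned. The function name `align_permutations` and the synthetic envelopes are hypothetical, and the code is Python/NumPy rather than MATLAB.

```python
import itertools
import numpy as np

def align_permutations(env):
    """env: array (F, N, M) of magnitude envelopes |Y_n(m, f)|.

    Returns perms (F, N), where perms[f][i] is the channel of bin f
    assigned to output i. Bin 0 defines the reference order.
    """
    F, N, M = env.shape
    # Zero-mean, unit-norm envelopes so that dot products are correlations.
    e = env - env.mean(axis=2, keepdims=True)
    e = e / (np.linalg.norm(e, axis=2, keepdims=True) + 1e-12)
    perms = np.zeros((F, N), dtype=int)
    perms[0] = np.arange(N)
    centroid = e[0].copy()
    for f in range(1, F):
        ref = centroid / np.linalg.norm(centroid, axis=1, keepdims=True)
        corr = ref @ e[f].T          # corr[i, j]: output i versus channel j
        best = max(itertools.permutations(range(N)),
                   key=lambda p: sum(corr[i, p[i]] for i in range(N)))
        perms[f] = best
        centroid += e[f][list(best)]  # update the running reference
    return perms

# Synthetic check: two toy envelopes, shuffled independently in every bin.
rng = np.random.default_rng(0)
F, N, M = 16, 2, 200
base = rng.random((N, M))                                  # one envelope per source
order = np.array([rng.permutation(N) for _ in range(F)])   # true order per bin
env = np.stack([base[order[f]] * (1 + 0.1 * f) + 0.01 * rng.random((N, M))
                for f in range(F)])
perms = align_permutations(env)
```

The exhaustive search over permutations is only practical for a small number of sources; it is used here to keep the sketch short.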

In this exercise you learn to apply ICA to blind source separation problems. You also learn the necessary steps for applying the method to more complex mixtures by transforming the problem to the frequency domain.