The cocktail party effect refers to the ability of the human auditory system to single out individual sound sources even amid a complete jumble of different sources. The underlying separation techniques are also applied elsewhere, for example in the automated evaluation of spectral data in chemical analysis. In this experiment, students will on the one hand investigate this effect again with sound sources, using unsupervised learning methods. On the other hand, a low-cost NIR sensor will be tested for the spectral analysis of plastics and, where appropriate, used to classify plastic waste. The IoT kits are again used for data acquisition; evaluation with ML algorithms takes place in the cloud. Possible methods for feature reduction are principal component analysis (PCA) and independent component analysis (ICA).

- Preprocessing: The observed mixture in this experiment is assumed to have been centered and whitened.
- Algorithm: Blind source separation (BSS) for a linear mixing system can be described as

$$x=As$$
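The mixing model and the preprocessing assumed above can be sketched in a few lines of NumPy. The two source signals, the mixing matrix `A`, and the helper name `center_and_whiten` are illustrative assumptions, not part of the experiment's material; whitening is done here via an eigendecomposition of the sample covariance, which is one common choice among several.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical non-Gaussian sources (assumed for illustration only).
n = 5000
t = np.linspace(0.0, 1.0, n)
s = np.vstack([
    np.sign(np.sin(2 * np.pi * 5 * t)),  # square wave
    rng.laplace(size=n),                 # Laplacian noise
])

# Arbitrary mixing matrix (assumed values), giving x = A s.
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s


def center_and_whiten(x):
    """Return zero-mean, unit-covariance data z and the whitening matrix V."""
    xc = x - x.mean(axis=1, keepdims=True)      # centering
    cov = xc @ xc.T / xc.shape[1]               # sample covariance
    eigval, eigvec = np.linalg.eigh(cov)        # cov = E diag(d) E^T
    V = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return V @ xc, V


z, V = center_and_whiten(x)
cov_z = z @ z.T / z.shape[1]                    # should be close to identity
```

After this step the remaining unmixing matrix can be restricted to an orthogonal matrix, which is why whitening is usually applied before ICA.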

In the mixing model \(x = As\), \(x\) is the vector of observed mixtures and \(A\) the mixing matrix. The aim of BSS is to estimate an unmixing matrix \(W\) that recovers the source signals as accurately as possible. The permutation and scaling ambiguities are inherent to all ICA algorithms, so the separation achievable in practice is given by

$$y = Wx = WAs = \Gamma \Pi s$$

where \(s\) is the vector of original signals and \(y\) the vector of recovered signals. The permutation ambiguity means that an ICA algorithm cannot recover the signals in their original order; instead they are permuted by the permutation matrix \(\Pi\). A permutation matrix has exactly one 1 in every row and column and zeros everywhere else. Equivalently, the columns of the mixing matrix can be permuted arbitrarily without the algorithm being aware of it, and the result is still a valid mixture.
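A small numeric example may make the structure \(WA = \Gamma\Pi\) concrete. The matrices and sample values below are made up for illustration; the point is only that any unmixing matrix of this form separates the sources, just in a different order and with a different scale.

```python
import numpy as np

A = np.array([[1.0, 0.6],
              [0.4, 1.0]])       # assumed mixing matrix
Pi = np.array([[0.0, 1.0],
               [1.0, 0.0]])      # permutation matrix: swaps the two sources
Gamma = np.diag([2.0, -0.5])     # arbitrary per-channel scaling

# An unmixing matrix that is as valid as inv(A) from the algorithm's view.
W = Gamma @ Pi @ np.linalg.inv(A)

s = np.array([[1.0, 2.0, 3.0],       # source 1
              [10.0, 20.0, 30.0]])   # source 2
y = W @ (A @ s)

# Output channel 0 carries source 2 scaled by 2,
# output channel 1 carries source 1 scaled by -0.5.
```

Each row of `y` contains exactly one source, so the separation is complete even though neither the order nor the amplitude matches `s`.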

The scaling ambiguity describes the fact that every separated signal has unit variance and is thus scaled arbitrarily compared to the original one. The multiplication with the diagonal matrix \(\Gamma = \mathrm{diag}(\gamma_1, \dots, \gamma_N)\) accounts for the scaling of output channel \(i\) by \(\gamma_i\) after permutation. For a determined system with \(N\) sources and \(N\) microphones, the scaling ambiguity can be resolved rather easily. The minimal distortion principle proposed by (Matsuoka2002) requires that each microphone signal \(X_i(m,f)\) be affected minimally by the separation process leading to the estimated source signal \(Y_i(m,f)\) for \(i \in \{1, \dots, N\}\). This method is implemented below, with \(W_p(f)\) denoting the unmixing matrices after permutation alignment:

$$\mathbf{W}_s(f) = \mathrm{diag}(\mathbf{W}_p^{-1}(f)) \mathbf{W}_p(f)$$
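The rescaling formula above translates almost directly into NumPy. The sketch assumes that the unmixing matrices are stored as one array of shape `(F, N, N)`, one matrix per frequency bin; this layout and the function name `minimal_distortion` are assumptions made for the example.

```python
import numpy as np


def minimal_distortion(W_p):
    """Apply W_s(f) = diag(W_p(f)^-1) W_p(f) to every frequency bin.

    W_p: complex or real array of shape (F, N, N), assumed layout.
    """
    W_inv = np.linalg.inv(W_p)                    # batched inverse, (F, N, N)
    d = np.diagonal(W_inv, axis1=-2, axis2=-1)    # diagonal of each inverse, (F, N)
    return d[..., :, None] * W_p                  # scale row i of W_p(f) by d[f, i]
```

A useful property to check in the lab: multiplying the rows of `W_p` by arbitrary nonzero factors does not change the result, which is exactly why this step removes the scaling ambiguity.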

Cost function: As the aim of ICA is to obtain maximally independent estimates of the original sources, a cost function is needed as a measure of independence. Common choices are kurtosis and negentropy, which are used in their normalized and approximate forms:

$$G_{\mathrm{kurtosis}}(x) = x^4 - 3$$

$$G_{\mathrm{negentropy}}(x) = -\exp(-x^2/2)$$
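The text does not say which optimizer is used with these cost functions. One widely used option is the FastICA fixed-point iteration, which only needs the first and second derivatives of \(G\). The following is a minimal sketch under that assumption, with symmetric decorrelation and made-up test sources; the function names (`whiten`, `decorrelate`, `fastica`) and all parameter values are hypothetical.

```python
import numpy as np

# Derivative pairs (g, g') of the two contrast functions G given above.
CONTRASTS = {
    "kurtosis": (lambda u: 4 * u**3, lambda u: 12 * u**2),
    "negentropy": (lambda u: u * np.exp(-u**2 / 2),
                   lambda u: (1 - u**2) * np.exp(-u**2 / 2)),
}


def whiten(x):
    """Center and whiten x; return whitened data and whitening matrix."""
    xc = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(xc @ xc.T / xc.shape[1])
    V = E @ np.diag(d ** -0.5) @ E.T
    return V @ xc, V


def decorrelate(W):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(d ** -0.5) @ E.T @ W


def fastica(z, contrast="kurtosis", n_iter=200, tol=1e-9, seed=0):
    """Estimate an orthogonal unmixing matrix for whitened data z."""
    g, dg = CONTRASTS[contrast]
    rng = np.random.default_rng(seed)
    n = z.shape[0]
    W = decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        U = W @ z
        # Row-wise update: w <- E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = g(U) @ z.T / z.shape[1] - dg(U).mean(axis=1)[:, None] * W
        W_new = decorrelate(W_new)
        # Converged when every row is unchanged up to its sign.
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1)) < tol
        W = W_new
        if done:
            break
    return W


# Illustrative data: a square wave and Laplacian noise, linearly mixed.
rng = np.random.default_rng(1)
n = 10000
t = np.arange(n) / n
s = np.vstack([np.sign(np.sin(2 * np.pi * 7 * t + 0.3)), rng.laplace(size=n)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])

z, V = whiten(A @ s)
W = fastica(z, contrast="kurtosis")
P = W @ V @ A   # overall system; ideally a scaled permutation matrix
```

Inspecting `P` shows the two ambiguities directly: each row should have one dominant entry, but its position and magnitude are not fixed. Passing `contrast="negentropy"` switches to the second cost function.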

The developed framework is then extended to convolutive mixtures in the frequency domain. To do this, you will first take the short-time Fourier transform (STFT) of the mixtures and apply ICA to each frequency bin. Afterwards, permutation alignment and scaling correction are necessary.
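The first part of this pipeline, taking the STFT of each mixture and arranging the data per frequency bin, could look roughly as follows. This sketch only goes as far as per-bin centering and whitening; the complex-valued ICA, permutation alignment, and scaling correction are not shown. The STFT parameters (`n_fft=512`, `hop=128`), the short mixing filters, and the function names are assumptions chosen for the example.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT of a 1-D signal; returns (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[m * hop:m * hop + n_fft] * window
                       for m in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T


def whiten_bin(Xf):
    """Center and whiten the (N, M) complex observations of one frequency bin."""
    Xc = Xf - Xf.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.conj().T / Xc.shape[1]
    d, E = np.linalg.eigh(cov)                  # Hermitian eigendecomposition
    V = E @ np.diag(d ** -0.5) @ E.conj().T
    return V @ Xc, V


# Illustrative convolutive mixture of two noise sources (assumed filters).
rng = np.random.default_rng(0)
n = 16000
s1, s2 = rng.laplace(size=n), rng.laplace(size=n)
h12 = np.array([0.0, 0.5, 0.2])    # cross-talk from source 2 to microphone 1
h21 = np.array([0.0, 0.3, 0.4])    # cross-talk from source 1 to microphone 2
x = np.vstack([s1 + np.convolve(s2, h12)[:n],
               np.convolve(s1, h21)[:n] + s2])

# X[f] holds the N x M observations of frequency bin f.
X = np.stack([stft(ch) for ch in x], axis=1)    # shape (F, N, M)
Z = np.stack([whiten_bin(Xf)[0] for Xf in X])   # whitened, same shape
```

Because every bin is processed independently, the order and scale of the separated outputs can differ from bin to bin, which is what makes the subsequent alignment and scaling steps necessary.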

The signal envelopes for the different stages of the source separation are shown below: