Blind Source Separation and Data Fusion

As data now come in a multitude of forms, originating from different applications and environments, it has become apparent that we can no longer rely on many of the simplifying assumptions, such as stationarity, linearity, Gaussianity, and circularity, when developing methods for their analysis. The emphasis in our research lab is on the development of data-driven methods, such as independent component analysis (ICA), blind source separation (BSS), and canonical dependence analysis (CDA), that minimize the modeling assumptions on the data and achieve useful decompositions of multi-modal data, which can either serve as informative features or be used directly for inference.
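The following is a minimal NumPy sketch of the kind of decomposition ICA provides: PCA whitening followed by a FastICA-style fixed-point iteration with a tanh nonlinearity. It is only an illustration of the general technique, not the lab's implementation; the toy mixing matrix, source choices, and tolerances are placeholders.

```python
import numpy as np

def whiten(X, n_components):
    """PCA-whiten X (channels x samples); return whitened data and the whitening matrix."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    d, E = np.linalg.eigh(cov)
    idx = np.argsort(d)[::-1][:n_components]          # keep the largest eigenvalues
    V = np.diag(1.0 / np.sqrt(d[idx])) @ E[:, idx].T  # whitening matrix
    return V @ Xc, V

def fastica(X, n_components, max_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with a tanh nonlinearity applied to whitened data."""
    Z, V = whiten(X, n_components)
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_components, n_components))
    for _ in range(max_iter):
        Y = W @ Z
        G = np.tanh(Y)
        g_prime = 1.0 - G**2
        W_new = (G @ Z.T) / Z.shape[1] - np.diag(g_prime.mean(axis=1)) @ W
        # symmetric decorrelation: W <- (W W^T)^{-1/2} W
        U, _, Vt = np.linalg.svd(W_new)
        W_new = U @ Vt
        if np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol:
            W = W_new
            break
        W = W_new
    return W @ Z, W @ V   # estimated sources and overall demixing matrix

# toy usage: two independent sources mixed into three channels
rng = np.random.default_rng(1)
S = np.vstack([np.sign(rng.standard_normal(5000)),   # sub-Gaussian source
               rng.laplace(size=5000)])               # super-Gaussian source
A = rng.standard_normal((3, 2))
S_hat, W_total = fastica(A @ S, n_components=2)
```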

Key references:

  • D. Lahat, T. Adali and C. Jutten, "Multimodal data fusion: An overview of methods, challenges, and prospects," Proc. IEEE, vol. 103, no. 9, pp. 1449-1477, Sep. 2015.
    This paper provides an overview of the main challenges in multimodal data fusion across various disciplines and addresses two key issues: "why we need data fusion" and "how we perform it."

  • T. Adali, Y. Levin-Schwartz, and V. D. Calhoun, "Multimodal data fusion using source separation: Two effective models based on ICA and IVA and their properties," Proc. IEEE, vol. 103, no. 9, pp. 1478-1493, Sep. 2015.
    This paper introduces two powerful data-driven models and provides guidance on the selection of a given model and its implementation while emphasizing the general applicability of the two models.

  • T. Adali, Y. Levin-Schwartz, and V. D. Calhoun, "Multimodal data fusion using source separation: Application to medical imaging," Proc. IEEE, vol. 103, no. 9, pp. 1494-1506, Sep. 2015.
    This paper demonstrates the application of the two models introduced in the previous paper to the fusion of medical imaging data from three modalities: functional magnetic resonance imaging (fMRI), structural MRI, and electroencephalography (EEG), and discusses the tradeoffs in various modeling and parameter choices.

  • T. Adali, M. Anderson, and G.-S. Fu, "Diversity in independent component and vector analyses: Identifiability, algorithms, and applications in medical imaging," IEEE Signal Processing Magazine, vol. 31, no. 3, pp. 18-33, May 2014.
    In this overview article, we first present ICA and then its generalization to multiple datasets, IVA, both formulated using the mutual information rate. We present conditions for the identifiability of the given linear mixing model and derive the corresponding performance bounds. We address how various methods fall under this umbrella and give performance examples for a few sample algorithms compared against the bound. We then discuss the importance of approaching the performance bound depending on the goal, using medical image analysis as the motivating example. A small numerical sketch of the Gaussian IVA cost follows this list.
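As a hedged illustration of the IVA model discussed above under a multivariate Gaussian source model (IVA-G), the sketch below only evaluates the IVA cost, i.e., the sum of the source-component-vector entropies minus the log-determinant terms; the dataset shapes, demixing matrices, and toy data are assumptions, and the optimization of the demixing matrices is omitted.

```python
import numpy as np

def iva_g_cost(W_list, X_list):
    """
    IVA cost under a multivariate Gaussian source model (up to additive constants):
        J = sum_n 0.5 * log det(Sigma_n) - sum_k log |det W_k|,
    where Sigma_n is the K x K covariance of the n-th source component vector
    (the n-th estimated source gathered across all K datasets).
    W_list: K demixing matrices (d x d); X_list: K zero-mean datasets (d x N).
    """
    K, (d, N) = len(X_list), X_list[0].shape
    Y = np.stack([W @ X for W, X in zip(W_list, X_list)])   # (K, d, N) estimated sources
    cost = 0.0
    for n in range(d):
        scv = Y[:, n, :]                       # n-th source across datasets, (K, N)
        sigma_n = scv @ scv.T / N              # K x K SCV covariance
        cost += 0.5 * np.linalg.slogdet(sigma_n)[1]
    cost -= sum(np.linalg.slogdet(W)[1] for W in W_list)
    return cost

# toy check: two datasets whose sources are correlated across datasets
rng = np.random.default_rng(0)
d, N, K = 3, 2000, 2
S = rng.standard_normal((d, N))
X_list = [rng.standard_normal((d, d)) @ (S + 0.1 * rng.standard_normal((d, N))) for _ in range(K)]
W_list = [np.linalg.inv(np.eye(d) + 0.01 * rng.standard_normal((d, d))) for _ in range(K)]
print(iva_g_cost(W_list, X_list))
```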

Active projects:

    CIF: Small: Source Separation with an Adaptive Structure for Multi-Modal Data Fusion
    Funded by NSF CCF, Division of Computing and Communication Foundations (Grant Number: NSF-CCF 1618551)
    In many fields today, information about a given phenomenon is obtained through different types of acquisition techniques and experimental conditions, and the availability of such multimodal data has been growing. Joint analysis of these data, i.e., their fusion, promises a more comprehensive and informative view of the task at hand and, if probed carefully, may open new avenues and answer questions we might not have even thought of asking when working with a single modality. There is now significant activity in data fusion in disciplines spanning medical imaging, remote sensing, speech processing, behavioral sciences, and metabolomics, to name a few. A common challenge across these disciplines is determining how and to what degree different datasets are related, i.e., identifying the common and distinct subspaces across multiple datasets in terms of the information they provide for the given task. The current work provides a powerful solution to this key challenge, enabling identification of the relationships among multiple datasets in a completely data-driven manner such that both the common and distinct subspaces within each dataset can be robustly identified and analyzed.
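As a toy illustration of identifying a common subspace across two datasets in a data-driven way, the sketch below runs a plain CCA (whitening plus an SVD of the cross-covariance): strongly correlated canonical directions mark the shared subspace, while weakly correlated ones correspond to the distinct parts. The data, dimensions, and the use of classical CCA are illustrative assumptions, not the adaptive-structure method developed in this project.

```python
import numpy as np

def cca(X, Y, eps=1e-9):
    """
    Canonical correlation analysis via whitening + SVD.
    X: (dx, N), Y: (dy, N); both are centered internally.
    Returns the canonical correlations and the canonical variates of each dataset.
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    N = X.shape[1]

    def whitener(Z):
        d, E = np.linalg.eigh(Z @ Z.T / N)
        keep = d > eps
        return np.diag(1.0 / np.sqrt(d[keep])) @ E[:, keep].T

    Zx, Zy = whitener(Xc) @ Xc, whitener(Yc) @ Yc
    U, corr, Vt = np.linalg.svd(Zx @ Zy.T / N)
    # high entries of `corr` flag the common subspace; low entries correspond
    # to directions that are distinct to each dataset
    return corr, U.T @ Zx, Vt @ Zy

# toy usage: two "modalities" sharing a single latent component
rng = np.random.default_rng(2)
shared = rng.standard_normal(1000)
X = np.vstack([shared + 0.1 * rng.standard_normal(1000), rng.standard_normal(1000)])
Y = np.vstack([rng.standard_normal(1000), -shared + 0.1 * rng.standard_normal(1000)])
corr, Cx, Cy = cca(X, Y)
print(corr)   # first canonical correlation close to 1, second close to 0
```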

    CRCNS: Informed Data-Driven Fusion of Behavior, Brain Function, and Genes
    Funded by NIH-NIBIB (Grant Number: R01 EB 005846)
    As imaging scanners improve and are able to collect images faster, more studies are collecting multiple types of imaging information from the same participants to study connectivity in healthy controls and in pathological states. However, each existing modality for imaging the living brain can only report on a limited domain. The goal of this project, CRCNS: Collaborative Research: Spatiotemporal Fusion of fMRI, EEG, and Genetic Data, is to examine associations among fMRI, EEG, and genetic variations related to healthy and abnormal brain function. As part of this project, we develop a set of tools based on ICA that can effectively fuse the information provided by multiple imaging modalities that together span a vast range of spatial and temporal scales.
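A toy sketch of the joint ICA idea, assuming per-subject feature vectors from two modalities: features are z-scored, concatenated, and decomposed with a single ICA so that both modalities share one subject-wise mixing matrix. The sizes, the use of scikit-learn's FastICA, and the variable names are placeholders rather than the project's actual tools.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(3)
n_subjects, n_fmri_feat, n_eeg_feat, n_comp = 40, 500, 300, 5

# placeholder feature matrices, one row per subject
# (e.g., an fMRI contrast map and an EEG feature vector per subject)
fmri = rng.standard_normal((n_subjects, n_fmri_feat))
eeg = rng.standard_normal((n_subjects, n_eeg_feat))

def zscore(M):
    """Standardize each feature so that neither modality dominates."""
    return (M - M.mean(axis=0)) / M.std(axis=0)

X_joint = np.hstack([zscore(fmri), zscore(eeg)])   # (subjects, fMRI + EEG features)

# joint ICA model: X_joint = A @ S, with one mixing matrix A shared by both modalities
ica = FastICA(n_components=n_comp, random_state=0)
S = ica.fit_transform(X_joint.T).T   # (components, joint features)
A = ica.mixing_                      # (subjects, components): shared subject loadings

# split each joint component back into its modality-specific parts
S_fmri, S_eeg = S[:, :n_fmri_feat], S[:, n_fmri_feat:]
```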

Recent related projects:

    CIF: Small: Collaborative Research: Entropy Rate for Source Separation and Model Selection: Applications in fMRI and EEG Analysis
    Funded by NSF-CCF: Award no: 1117056 and Award no: 1116944

    The focus of this research is the development of a class of powerful methods for source separation and model selection using the entropy rate, so that one can take both higher-order statistical information and sample correlation into account to achieve significant performance gains in more challenging problems. The main application domain is one that can truly take advantage of this fully combined approach: the analysis of functional magnetic resonance imaging (fMRI) data and the rejection of gradient and pulse artifacts in electroencephalography (EEG) in concurrent EEG-fMRI recordings.
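A small sketch of the entropy-rate idea, assuming a Gaussian autoregressive source model: the entropy rate is estimated from the one-step linear prediction error variance, so sample correlation, and not only the marginal distribution, lowers the estimate. The model order and the least-squares AR fit are illustrative choices, not the estimators developed in the project.

```python
import numpy as np

def gaussian_ar_entropy_rate(x, order=10):
    """
    Estimate the entropy rate (nats/sample) of a time series under a Gaussian AR(p)
    model: h = 0.5 * log(2*pi*e*sigma2), where sigma2 is the one-step linear
    prediction error variance of the fitted AR model.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    T = len(x)
    # least-squares fit of x[t] from its `order` past samples
    Xlag = np.column_stack([x[order - k - 1:T - k - 1] for k in range(order)])
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(Xlag, target, rcond=None)
    sigma2 = (target - Xlag @ coeffs).var()
    return 0.5 * np.log(2 * np.pi * np.e * sigma2)

# toy check: correlated (AR) noise has a lower entropy rate than white noise of equal variance
rng = np.random.default_rng(4)
white = rng.standard_normal(20000)
colored = np.zeros(20000)
for t in range(1, 20000):
    colored[t] = 0.9 * colored[t - 1] + rng.standard_normal()
colored /= colored.std()   # match the variance of `white`
print(gaussian_ar_entropy_rate(white), gaussian_ar_entropy_rate(colored))
```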

    III: Small: Collaborative Research: Canonical Dependence Analysis for Multi-modal Data Fusion and Source Separation
    Funded by NSF-IIS: Award no: 1017718 and Award no: 1016619

    The main aim of this grant is twofold. First, a number of powerful methods are developed for multi-subject (multi-set) data analysis and multi-modal data fusion based on canonical dependence analysis, significantly extending the power and flexibility of multiset canonical correlation analysis (MCCA). Then, the successful application of these methods is demonstrated on a unique problem that demands these properties, namely the study of brain function and functional associations during simulated driving, a naturalistic task where data-driven methods have proven very useful.

    Collaborative Research: SEI: Independent Component Analysis of Complex-Valued Brain Imaging Data
    Funded by NSF-IIS (Award no: 0612076)
    We develop a class of complex ICA algorithms, in particular for the analysis of biomedical imaging data, and demonstrate the power of joint data analysis as well as of performing the analysis on the complete data, i.e., by utilizing both the magnitude and the phase information. We focus on three image types: functional magnetic resonance imaging (fMRI), structural MRI (sMRI), and diffusion tensor imaging (DTI). These three imaging modalities provide complementary information about brain connectivity, and all can benefit from the incorporation of a complex-valued data processing approach.
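A brief sketch of the complex-valued starting point, under assumed array shapes: magnitude and phase images are recombined into complex data, the covariance and pseudo-covariance are computed (a nonzero pseudo-covariance indicates noncircularity), and the data are whitened before a complex ICA algorithm would be applied. The dimensions and the plain second-order whitening are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(5)
n_volumes, n_voxels = 200, 10000

# placeholder magnitude and phase images over time (volumes x voxels)
magnitude = 1.0 + 0.1 * rng.standard_normal((n_volumes, n_voxels))
phase = 0.2 * rng.standard_normal((n_volumes, n_voxels))

# keep the phase by recombining into complex-valued data, then center each volume
Z = magnitude * np.exp(1j * phase)
Z = Z - Z.mean(axis=1, keepdims=True)

# Hermitian covariance and pseudo-covariance across volumes; a nonzero
# pseudo-covariance signals noncircular data, which complex ICA can exploit
C = Z @ Z.conj().T / n_voxels          # (volumes, volumes), Hermitian
P = Z @ Z.T / n_voxels                 # pseudo-covariance
print("relative noncircularity:", np.linalg.norm(P) / np.linalg.norm(C))

# standard complex whitening (uses the circular, second-order part only)
d, E = np.linalg.eigh(C)
keep = np.argsort(d)[::-1][:20]        # reduce to 20 components, an arbitrary choice
V = np.diag(1.0 / np.sqrt(d[keep])) @ E[:, keep].conj().T
Z_white = V @ Z                        # input to a complex ICA algorithm
```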

Other selected references:

  • M. Anderson, G.-S. Fu, R. Phlypo, and T. Adali, "Independent vector analysis: Identification conditions and performance bounds," IEEE Trans. Signal Processing, to appear.
  • M. Anderson, X.-L. Li, and T. Adali, "Joint blind source separation with multivariate Gaussian model: Algorithms and performance analysis," IEEE Trans. Signal Processing, vol. 60, no. 4, pp. 2049-2055, April 2012.

Project team:

Resources:

  • MLSP-Lab Resources: Code for the algorithms we have published.

  • Fusion ICA Toolbox (FIT): FIT is a MATLAB toolbox that implements the joint ICA and parallel ICA methods. It is used to examine the shared information between features such as SPM contrast images, EEG signals, or SNP data.