Artificial General Intelligence (AGI), Foundation Models, and Superintelligence

MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR'23) 

Video-Text Representation Learning via Differentiable Weak Temporal Alignment (CVPR'22)

Relation-aware Language-Graph Transformer for Question Answering (AAAI'23)

A superintelligence is a hypothetical agent whose intelligence far surpasses that of the brightest and most gifted human minds. A closely related hypothetical ability is Artificial General Intelligence (AGI). In contrast to weak (narrow) AI, AGI and foundation models (e.g., GPT-4, ChatGPT, CLIP, and Flamingo) are capable of many diverse tasks without task-specific training. Meta-learning, also known as "learning to learn," generalizes learning from individual tasks to learning strategies themselves. In sum, our overarching goal is to develop general-purpose learning systems and models that learn new tasks efficiently and perform well on unseen tasks. This topic includes self-supervised learning, surrogate loss learning, and foundation models.

Our related publications

[ICCV '23] Distribution-Aware Prompt Tuning for Vision-Language Models
[ICCV '23] Read-only Prompt Optimization for Vision-Language Few-shot Learning
[CVPR '23] Learning to Balance Local Losses via Meta-Learning
[AAAI '23] Relation-aware Language-Graph Transformer for Question Answering
[CVPR '22] Video-Text Representation Learning via Differentiable Weak Temporal Alignment
[InfoSci] Robust Auxiliary Learning with Weighting Function for Biased Data
[IEEE Access '21] Learning to Balance Local Losses via Meta-Learning
[IEEE Access '21] Learning Non-parametric Surrogate Losses with Correlated Gradients
[IEEE Access '21] Self-Supervised Learning for Anomaly Detection with Dynamic Local Augmentation

Deep Understanding of the Visual World

HOTR: End-to-End HOI Detection with Transformers (CVPR'21 Oral)

Consistency Learning via Decoding Path Augmentation (CVPR'22)

UnionDet (ECCV'20)

High-level computer vision enables a deeper understanding of the visual world. Object recognition systems detect objects in images and videos, providing basic information on whether certain objects are in the scene and how many instances are present. However, this information may not be sufficient for building personalized and automated systems for smart cities, such as smart homes, smart offices, and hospitals. Without a deep understanding of the interactions between humans and objects, it is hard to understand the context of a scene and what kinds of services are needed. Scene understanding studies such interactions and generates metadata such as scene graphs, which in turn enable visual question answering (VQA). Security cameras are pervasive in modern cities, and computer vision can use them to detect anomalies such as floods, wildfires, and dangerous wild animals, and even to estimate traffic and temperature. We study algorithms that offer a more accurate and deeper understanding of the visual world and help people live safer and smarter lives.

Our related publications

[ICCV '23] Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models
[CVPR '22] Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection
[CVPR '21] HOTR: End-to-End Human-Object Interaction Detection with Transformers
[ECCV '20] UnionDet: Union-Level Detector Towards Real-Time Human-Object Interaction Detection
[CVPR '18] Tensorize, Factorize and Regularize: Robust Visual Relationship Learning
[ECCV '16] Abundant Inverse Regression using Sufficient Reduction and its Applications
[ECCV '18] Efficient Relative Attribute Learning using Graph Neural Networks

3D Computer Vision and Point Clouds

Self-positioning Point-based Transformer for Point Cloud Understanding (CVPR'23)

SageMix: Saliency-Guided Mixup for Point Clouds (NeurIPS '22)

Point Cloud Augmentation with Weighted Local Transformations (ICCV '21)

Shape Classification

Scene Semantic Segmentation

With recent advances in 3D sensors such as LiDAR sensors, RGB-D cameras, and 3D scanners, 3D data have become increasingly available, with depth recovered through stereo vision and infrared-based depth measurement. 3D data are crucial to diverse fields such as robotics, autonomous driving, AI drones, medical data analysis, and scene reconstruction. We are interested in 3D computer vision and 3D deep learning on 3D data (e.g., point clouds, voxels, and polygonal meshes), which have more complex geometry than 2D data. For example, recent deep learning efforts have focused on enabling networks to operate directly on point clouds, each of which is an unordered set of points with no inherent structure. Shape classification, indoor/outdoor scene semantic segmentation, and shape correspondence/registration are representative tasks for point cloud data. However, 3D data remain limited because of their high acquisition cost. We therefore study data augmentation techniques to compensate for this data scarcity, a direction that has been less explored in the point cloud literature.
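As a rough illustration of what point cloud augmentation involves, the sketch below applies common global transformations (a random rotation about the vertical axis, random per-axis scaling, and small clipped Gaussian jitter) to a list of (x, y, z) tuples. It is a minimal sketch under our own assumptions, not the weighted local transformations of the ICCV '21 paper or the saliency-guided mixup of SageMix; the function names `rotate_z` and `augment` and all parameter defaults are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float, float]


def rotate_z(points: List[Point], angle: float) -> List[Point]:
    """Rotate every point about the vertical (z) axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def augment(points: List[Point], rng: random.Random,
            scale_range: Tuple[float, float] = (0.8, 1.25),
            jitter_sigma: float = 0.01,
            jitter_clip: float = 0.05) -> List[Point]:
    """Hypothetical global augmentation: z-rotation, per-axis scaling, clipped jitter."""
    rotated = rotate_z(points, rng.uniform(0.0, 2.0 * math.pi))
    # one random scale factor per axis, shared by all points in the cloud
    sx, sy, sz = (rng.uniform(*scale_range) for _ in range(3))
    out = []
    for x, y, z in rotated:
        # clip the Gaussian noise so no coordinate moves by more than jitter_clip
        jx, jy, jz = (max(-jitter_clip, min(jitter_clip, rng.gauss(0.0, jitter_sigma)))
                      for _ in range(3))
        out.append((sx * x + jx, sy * y + jy, sz * z + jz))
    # a point cloud is an unordered set, so shuffling the order changes nothing semantically
    rng.shuffle(out)
    return out


rng = random.Random(0)
cloud = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.5)]
augmented = augment(cloud, rng)
print(len(augmented))
```

Shuffling the output reflects that a point cloud is an unordered set, so a network operating directly on it should not depend on point order.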

Our related publications

[CVPR '23] Self-positioning Point-based Transformer for Point Cloud Understanding
[NeurIPS '22] SageMix: Saliency-Guided Mixup for Point Clouds
[ICCV '21] Point Cloud Augmentation with Weighted Local Transformations

Graph Neural Networks and Structured Data Analysis

Metropolis-Hastings Data Augmentation for Graph Neural Networks (NeurIPS'21)

Neo-GNNs: Neighborhood Overlap-aware Graph Neural Networks for Link Prediction (NeurIPS'21)

SELAR: Self-supervised Auxiliary Learning (NeurIPS'20)

Graph Transformer Networks (NeurIPS'19)

In modern data analysis, highly structured data occur frequently and can be viewed as data on non-Euclidean spaces (e.g., graphs, Riemannian manifolds, data manifolds, and function spaces). Naive algorithms that do not respect the geometry of the data space often break the structure of the data and return invalid predictions that lie in the ambient space rather than in the data space of interest. For structured data analysis, our focus is to develop geometrically inspired machine learning methods and apply them to real-world applications such as computer vision, brain imaging, and recommender systems.
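A small example of the problem described above: points on the unit sphere form a simple Riemannian manifold, and their coordinate-wise average generally falls inside the sphere, that is, in the ambient space rather than on the manifold. The sketch below (plain Python, hypothetical helper names) contrasts that naive average with an extrinsic mean that projects the average back onto the sphere. This is only one simple geometry-aware choice for illustration; the intrinsic Fréchet (Karcher) mean is a common alternative.

```python
import math
from typing import List, Sequence

Vec = List[float]


def norm(v: Sequence[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def euclidean_mean(points: Sequence[Sequence[float]]) -> Vec:
    """Naive coordinate-wise average; ignores that the data live on the unit sphere."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def projected_mean(points: Sequence[Sequence[float]]) -> Vec:
    """Extrinsic mean: average in the ambient space, then project back onto the sphere."""
    m = euclidean_mean(points)
    r = norm(m)
    if r < 1e-12:
        # e.g. two antipodal points: every great-circle midpoint is equally valid
        raise ValueError("mean is undefined for antipodally balanced points")
    return [x / r for x in m]


pts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two points on the unit sphere S^2
print(norm(euclidean_mean(pts)))  # about 0.707: off the sphere
print(norm(projected_mean(pts)))  # 1.0 up to rounding: back on the sphere
```

For two non-antipodal points, the projected mean coincides with the midpoint of the great-circle arc between them; for antipodal points no unique mean exists, so the sketch raises an error instead of returning an arbitrary answer.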

Our related publications

[AAAI '22] Deformable Graph Convolutional Networks
[NN '22] Graph Transformer Networks: Learning Meta-path Graphs to Improve GNNs
[NeurIPS '21] Metropolis-Hastings Data Augmentation for Graph Neural Networks
[NeurIPS '21] Neighborhood Overlap-aware Graph Neural Networks for Link Prediction
[NeurIPS '20] Self-supervised Auxiliary Learning with Meta-paths for Heterogeneous Graphs
[NeurIPS '19] Graph Transformer Networks
[ICML '15] Manifold-valued Dirichlet Processes
[CVPR '16] Latent Variable Graphical Model Selection using Harmonic Analysis: Applications to the HCP
[Quarterly of Applied Math] Localizing differentially evolving covariance structures via scan statistics
[ICCV '15] Interpolation on the manifold of k component Gaussian Mixture Models

Safe AI, Adversarial Examples, and Uncertainty

Machine learning models, including deep neural networks, are used in a variety of applications such as autonomous robots, vehicles, and drones. When AI systems are deployed in the physical world, the reliability of their algorithms is crucial for safety. Guaranteeing such safety involves specification, robustness, and assurance. Given a concrete purpose for the system (specification), the AI system should be robust to perturbations and attacks (adversarial examples). Furthermore, the uncertainty of a model's predictions helps monitor and control the system's behavior. Along these lines, we study model uncertainty (e.g., in Bayesian neural networks) and adversarial examples from both the attacker's and the defender's perspectives. This topic lies at the intersection of AI and security.
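To make the notion of an adversarial example concrete, the sketch below applies the Fast Gradient Sign Method (FGSM, Goodfellow et al.) to a toy logistic-regression classifier. It is a generic textbook attack chosen for illustration, not one of the methods in the publications listed here; the weights, the input, and `eps` are made-up values. FGSM moves each input feature by `eps` in the direction of the sign of the loss gradient.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(g: float) -> int:
    """Return +1, 0, or -1 according to the sign of g."""
    return (g > 0) - (g < 0)


def fgsm(w: Sequence[float], b: float, x: Sequence[float],
         y: int, eps: float) -> List[float]:
    """Fast Gradient Sign Method for the binary cross-entropy loss.

    For logistic regression, d(loss)/dx = (sigmoid(w.x + b) - y) * w, so each
    feature moves by eps in the direction that increases the loss.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


w, b = [2.0, -1.0], 0.0  # toy weights, chosen for illustration only
x, y = [0.5, 0.2], 1     # a correctly classified example of class 1
x_adv = fgsm(w, b, x, y, eps=0.3)
print(round(predict(w, b, x), 3), round(predict(w, b, x_adv), 3))
```

With these toy values, the probability of class 1 drops from about 0.69 to about 0.475, so a perturbation of at most 0.3 per feature flips the prediction.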

Our related publications

[IEEE Access '21] Search-and-Attack: Temporally Sparse Adversarial Perturbations on Videos
[ECCV '20] Robust Neural Networks inspired by Strong Stability Preserving Runge-Kutta methods
[UAI '19] Sampling-free Uncertainty Estimation in Gated Recurrent Units with Applications to Normative Modeling in Neuroimaging
[arXiv '18] Sampling-free Uncertainty Estimation in Gated Recurrent Units with Exponential Families

Medical Imaging

Riemannian MGLM (CVPR)

Medical imaging, and brain imaging in particular, inherently involves many structured measurements such as diffusion tensor images (DTI), high angular resolution diffusion images (HARDI), and ensemble average propagators (EAPs). Common goals in medical imaging are to identify regions related to a certain disease, detect diseases at an early stage, and model disease progression. Providing predictions and findings that withstand rigorous statistical testing requires more powerful pipelines. We study more powerful representations of medical images and models for them (e.g., mixed-effects models for structured data, filtering, and dimensionality reduction). We also research few-shot detection, domain adaptation, and contrastive learning to deal with limited samples and labels in the medical domain.
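Diffusion tensors are symmetric positive-definite (SPD) matrices, and averaging them entry by entry is a standard example of ignoring the geometry of the data: the arithmetic mean of tensors tends to have a larger determinant than the inputs (the so-called swelling effect). The sketch below compares the arithmetic mean with the log-Euclidean mean, which averages matrix logarithms and maps the result back with the matrix exponential. It uses 2x2 matrices for brevity, whereas DTI tensors are 3x3, and the helper names are hypothetical; it is an illustration of the general idea, not the pipeline of the papers in this section.

```python
import math
from typing import Callable, List

Mat = List[List[float]]


def sym_fun(m: Mat, f: Callable[[float], float]) -> Mat:
    """Apply a scalar function to a 2x2 symmetric matrix through its eigenvalues."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    mid = (a + c) / 2.0
    rad = math.hypot((a - c) / 2.0, b)
    if rad < 1e-12:  # repeated eigenvalue: m is (numerically) a multiple of I
        v = f(mid)
        return [[v, 0.0], [0.0, v]]
    l1, l2 = mid + rad, mid - rad
    d = l1 - l2
    # spectral projectors P1 = (m - l2 I)/(l1 - l2) and P2 = I - P1
    p1 = [[(a - l2) / d, b / d], [b / d, (c - l2) / d]]
    p2 = [[1.0 - p1[0][0], -p1[0][1]], [-p1[1][0], 1.0 - p1[1][1]]]
    f1, f2 = f(l1), f(l2)
    return [[f1 * p1[i][j] + f2 * p2[i][j] for j in range(2)] for i in range(2)]


def det(m: Mat) -> float:
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def arithmetic_mean(mats: List[Mat]) -> Mat:
    """Entry-wise average: ignores the SPD geometry."""
    n = len(mats)
    return [[sum(m[i][j] for m in mats) / n for j in range(2)] for i in range(2)]


def log_euclidean_mean(mats: List[Mat]) -> Mat:
    """exp(mean(log(M))); math.log raises ValueError if an input is not SPD."""
    logs = [sym_fun(m, math.log) for m in mats]
    return sym_fun(arithmetic_mean(logs), math.exp)


A = [[4.0, 0.0], [0.0, 1.0]]
B = [[1.0, 0.0], [0.0, 4.0]]
print(det(arithmetic_mean([A, B])))     # 6.25, larger than det(A) = det(B) = 4
print(det(log_euclidean_mean([A, B])))  # about 4.0
```

The log-Euclidean mean's determinant equals the geometric mean of the input determinants, since det(exp(S)) = exp(tr S), which is why it avoids the swelling seen with the arithmetic mean.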

Our related publications

[CVPR '17] Riemannian Nonlinear Mixed Effects Models: Analyzing Longitudinal Deformations in Neuroimaging
[CVPRW '17] Riemannian Variance Filtering: An Independent Filtering Scheme for Statistical Tests on Manifold-valued Data
[ECCV '15] Canonical Correlation Analysis on Riemannian Manifolds and its Applications
[CVPR '14] MGLM on Riemannian Manifolds with Applications to Statistical Analysis of Diffusion Weighted Images