Causal representation learning as causal discovery

Robert Osazuwa Ness edited this page Aug 3, 2022 · 4 revisions

This document highlights directions in causal representation learning in the context of causal discovery.

Causal representation learning is a broad area, encompassing many types of data. In dodiscover, the focus is on multivariate data that live naturally in a Pandas DataFrame, where rows correspond to sample observations and columns to variables.
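A minimal sketch of this layout (variable names and values are illustrative, not from dodiscover):

```python
import numpy as np
import pandas as pd

# Hypothetical multivariate dataset in the expected layout:
# rows are sample observations, columns are variables.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)   # y depends on x
z = rng.normal(size=n)             # z is independent of both
df = pd.DataFrame({"x": x, "y": y, "z": z})
print(df.shape)  # (500, 3): 500 observations of 3 variables
```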

In many cases, learning a representation serves a downstream goal such as prediction. To that end, some libraries treat the representation as a low-level artifact and a secondary abstraction. In contrast, dodiscover treats the representation as a primary abstraction and provides an API for manipulating and visualizing representations and for using them in other tasks.

Causal discovery algorithms that learn latent variables

A key problem with causal discovery workflows is that they typically assume latent variables are absent. This is in sharp contrast to other workflows for applied causal inference, which generally emphasize robustness of the inference to latent confounders. This priority focuses on causal discovery algorithms that return graphs with nodes corresponding to latent variables.
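The failure mode can be sketched with simulated data (all names here are illustrative): a hidden variable drives two observed variables, so they correlate even though neither causes the other. An algorithm that assumes no latent variables would be tempted to draw an edge between them, which is why algorithms that explicitly model latent confounders matter.

```python
import numpy as np
import pandas as pd

# Sketch of the latent-confounder problem.
# A hidden variable u drives both a and b; u is omitted from the
# observed data, so a and b correlate with no direct edge between them.
rng = np.random.default_rng(42)
n = 2000
u = rng.normal(size=n)              # latent confounder (unobserved)
a = u + 0.5 * rng.normal(size=n)
b = u + 0.5 * rng.normal(size=n)
observed = pd.DataFrame({"a": a, "b": b})  # note: u is not a column

r = observed["a"].corr(observed["b"])
print(round(r, 2))  # strong correlation (near 0.8) despite no a-b edge
```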

Composing lower-level features into high-level nodes

A key focus is algorithms that construct high-level abstractions from low-level variables in observed data. In a causal discovery setting, this means constructing nodes representing the high-level abstractions, then learning the structure between them.
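One simple way to sketch the first step, constructing high-level nodes from low-level columns, is to aggregate named groups of features (here with a plain column mean; PCA or a learned encoder are common alternatives). The column names and grouping are hypothetical, and the resulting high-level frame is what a structure learner would then consume.

```python
import numpy as np
import pandas as pd

# Hedged sketch: compose low-level features into high-level nodes.
rng = np.random.default_rng(1)
n = 300
low_level = pd.DataFrame(
    rng.normal(size=(n, 6)),
    columns=["px1", "px2", "px3", "tmp1", "tmp2", "tmp3"],  # hypothetical
)

# Map each high-level abstraction to the low-level features it summarizes.
groups = {"shape": ["px1", "px2", "px3"], "heat": ["tmp1", "tmp2", "tmp3"]}
high_level = pd.DataFrame(
    {name: low_level[cols].mean(axis=1) for name, cols in groups.items()}
)
print(list(high_level.columns))  # ['shape', 'heat']
```

Structure learning would then run over `high_level` rather than `low_level`, which is what makes the choice of aggregation part of the discovery problem.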

The high-level abstractions should have the following characteristics.

Causal salience

Abstractions should capture and isolate the important underlying dynamics of the problem. Interventions on low-level features should correspond to interventions on the high-level variables.

Sufficiency

Causal inferences about the domain are feasible with the higher-level abstractions alone.

Goal-orientation

Causal discovery algorithms typically focus on "ground truth" discovery of causal relationships given a set of causal abstractions. However, we cannot learn a single optimal abstract representation from a set of low-level features: the appropriate abstraction depends on the goals of the modeler. For example, consider how to build high-level abstractions from the objects in a room. An artist who wishes to paint the room cares about variables capturing the color and appearance of objects in the room. A mover cares about the shape, size, weight, and maneuverability of the objects. Somebody who wishes to set up a high-quality sound system cares about angles and how well object surfaces reflect sound. Algorithms must in some way consider the goals of the modeler.

Pending issues:
