ILCL: Inverse Logic-Constraint Learning from Temporally Constrained Demonstrations

KAIST, Georgia Institute of Technology

Abstract

  • We address the problem of learning temporal constraints from demonstrations.
  • Learning logic constraints is challenging due to the combinatorially large space of possible specifications and the ill-posed nature of non-Markovian constraints.
  • To this end, we propose ILCL, which learns truncated linear temporal logic (TLTL) constraints via a tree-based genetic algorithm (GA-TL-Mining) and logic-constrained reinforcement learning (Logic-CRL) with constraint redistribution.
  • Our evaluations show that ILCL outperforms baselines in learning and transferring TL constraints across simulated scenarios, and we further demonstrate transfer of the learned constraints to a real-world peg-in-shallow-hole task.

Method

  • ILCL learns constraints via a two-player zero-sum game (toy sketches of both players follow this list).
  • The constraint player, GA-TL-Mining, applies a genetic algorithm to the abstract-syntax-tree space of temporal logic, searching for a constraint that maximally penalizes the policy player's trajectories while still labeling every demonstration as safe.
  • The policy player, Logic-CRL, optimizes a policy that maximizes task reward while satisfying the constraint player's temporal-logic constraint, redistributing the continuous logic-constraint signal over the trajectory.
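As a concrete (and deliberately simplified) illustration of the constraint player, the sketch below evolves TLTL syntax trees with a mutation-only genetic loop. Every name here, from the AST classes to `ga_tl_mining` and the fitness function, is our own assumption for illustration; it is not the authors' released GA-TL-Mining code.

```python
import random

# Toy TLTL abstract syntax tree with quantitative (robustness) semantics over
# finite traces. A trace is a list of dicts mapping predicate names to signed
# margins (positive = the predicate holds at that step).

class Pred:
    def __init__(self, name): self.name = name
    def rho(self, tr, t): return tr[t][self.name]
    def __repr__(self): return self.name

class Not:
    def __init__(self, f): self.f = f
    def rho(self, tr, t): return -self.f.rho(tr, t)
    def __repr__(self): return f"!{self.f}"

class And:
    def __init__(self, a, b): self.a, self.b = a, b
    def rho(self, tr, t): return min(self.a.rho(tr, t), self.b.rho(tr, t))
    def __repr__(self): return f"({self.a} & {self.b})"

class Always:
    def __init__(self, f): self.f = f
    def rho(self, tr, t): return min(self.f.rho(tr, k) for k in range(t, len(tr)))
    def __repr__(self): return f"G({self.f})"

class Eventually:
    def __init__(self, f): self.f = f
    def rho(self, tr, t): return max(self.f.rho(tr, k) for k in range(t, len(tr)))
    def __repr__(self): return f"F({self.f})"

class Until:
    def __init__(self, a, b): self.a, self.b = a, b
    def rho(self, tr, t):
        # max over switch times k of: b holds at k and a held on [t, k)
        best = float("-inf")
        for k in range(t, len(tr)):
            held = min([self.a.rho(tr, j) for j in range(t, k)], default=float("inf"))
            best = max(best, min(self.b.rho(tr, k), held))
        return best
    def __repr__(self): return f"({self.a} U {self.b})"

PREDICATES = ["R", "B", "G"]  # region predicates for the navigation tasks

def random_formula(depth=3):
    """Sample a random TLTL syntax tree."""
    if depth == 0 or random.random() < 0.3:
        leaf = Pred(random.choice(PREDICATES))
        return Not(leaf) if random.random() < 0.5 else leaf
    op = random.choice([Always, Eventually, Not, And, Until])
    if op in (And, Until):
        return op(random_formula(depth - 1), random_formula(depth - 1))
    return op(random_formula(depth - 1))

def mutate(phi):
    """Tree edits: resample, wrap in a unary operator, or conjoin a subtree."""
    r = random.random()
    if r < 0.4:
        return random_formula()
    if r < 0.7:
        return random.choice([Always, Eventually, Not])(phi)
    return And(phi, random_formula(2))

def fitness(phi, demos, policy_trajs):
    # Hard requirement: every demonstration stays satisfied (robustness >= 0);
    if any(phi.rho(tau, 0) < 0 for tau in demos):
        return float("-inf")
    # otherwise, penalize the policy player's trajectories as much as possible.
    return -sum(phi.rho(tau, 0) for tau in policy_trajs)

def ga_tl_mining(demos, policy_trajs, pop_size=60, generations=40):
    """Elitist evolutionary search over formula trees."""
    population = [random_formula() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda f: fitness(f, demos, policy_trajs), reverse=True)
        elites = population[: pop_size // 4]
        population = elites + [mutate(random.choice(elites))
                               for _ in range(pop_size - len(elites))]
    return population[0]
```

In the full game, `ga_tl_mining` would presumably be re-run each round with `policy_trajs` refreshed from the latest Logic-CRL rollouts; the real GA-TL-Mining likely also uses richer operators, such as subtree crossover, that this toy omits.

On the policy side, the TLTL robustness of a trajectory is only available once the episode terminates, so it cannot be consumed directly as a per-step constraint cost. Below is a minimal sketch of one redistribution scheme, uniform spreading, reusing `phi.rho` from the sketch above; ILCL's actual redistribution rule may differ.

```python
def redistribute_cost(phi, trace):
    """Spread a terminal TLTL violation score into per-step costs.

    The trajectory-level robustness phi.rho(trace, 0) is non-Markovian and
    only known at episode end; uniform spreading is the simplest way to turn
    it into the per-step costs a standard constrained-RL learner expects.
    """
    terminal_cost = max(0.0, -phi.rho(trace, 0))  # truncated negative robustness
    return [terminal_cost / len(trace)] * len(trace)
```

A Lagrangian constrained-RL algorithm (e.g., PPO-Lagrangian) can then treat these per-step costs like any ordinary Markovian cost signal.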

Results

Demonstrations in Simulated Environments

(Demonstration panels: Navigation with φGT1, Navigation with φGT2, Wiping, Peg-in-shallow-hole)

Demonstrations maximize the following goal rewards and satisfy the following constraints (hedged TLTL readings of the constraints are sketched after the constraint list):

Goals

  • Navigation with φGT1: Reach the goal
  • Navigation with φGT2: Reach the goal
  • Wiping: Reach the red flag
  • Peg-in-shallow-hole: Insert the peg into the hole

Constraints

  • Navigation with φGT1: Avoid R, and avoid B until reaching G
  • Navigation with φGT2: Reach R, G, and B
  • Wiping: Avoid contact until reaching the green flag; afterwards, keep the contact force f_c above a threshold
  • Peg-in-shallow-hole: Once the tilted peg contacts the hole, maintain peg-to-hole and peg-to-jaw contact until insertion
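Read in TLTL notation (□ always, ◇ eventually, 𝒰 until), the constraints above could take forms like the following. These are our hedged renderings with our own predicate names (R, B, G for regions; contact, green, inserted, c_hole, c_jaw for contact and flag events), not necessarily the paper's exact ground-truth formulas:

```latex
% Hedged TLTL readings of the ground-truth constraints (our notation).
\begin{align*}
\varphi_{\mathrm{GT1}} &= \Box\,\lnot R \;\land\; \bigl(\lnot B \;\mathcal{U}\; G\bigr)\\
\varphi_{\mathrm{GT2}} &= \Diamond R \,\land\, \Diamond G \,\land\, \Diamond B
  \quad \text{(or } \Diamond\bigl(R \land \Diamond(G \land \Diamond B)\bigr) \text{ if the visits are ordered)}\\
\varphi_{\mathrm{wipe}} &= \lnot\mathit{contact} \;\mathcal{U}\; \bigl(\mathit{green} \,\land\, \Box\,(f_c \ge f_{\min})\bigr)\\
\varphi_{\mathrm{peg}} &= \Box\bigl(c_{\mathrm{hole}} \rightarrow (c_{\mathrm{hole}} \land c_{\mathrm{jaw}}) \;\mathcal{U}\; \mathit{inserted}\bigr)
\end{align*}
```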

Transfer Results

Navigation with φGT1 Transfer

(Rollouts in transfer environments 1–10)

Navigation with φGT2 Transfer

(Rollouts in transfer environments 1–10)

Wiping Transfer

(Rollouts in transfer environments 1–4)

Peg-in-shallow-hole Transfer

(Rollouts in transfer environments 1–4)

Statistical Evaluation

Training Performance

ILCL finds policies that achieve the lowest violation rate with respect to the ground-truth (GT) constraints, expert-level cumulative reward, and the lowest TR (truncated negative robustness).
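For reference, one natural reading of the TR metric, assuming the robustness semantics `phi.rho` sketched in the Method section (the paper's exact definition may differ):

```python
def truncated_negative_robustness(phi, trace):
    # Negative robustness measures how strongly a trajectory violates phi;
    # truncating at zero discards the satisfaction margin of safe rollouts,
    # so TR = 0 exactly when the trajectory satisfies the constraint.
    return max(0.0, -phi.rho(trace, 0))

def mean_tr(phi, traces):
    # Aggregate TR over evaluation rollouts; lower is better.
    return sum(truncated_negative_robustness(phi, tr) for tr in traces) / len(traces)
```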

Real-world transfer of the learned constraint (Peg-in-shallow-hole)

Evaluation rollouts at three hole angles: 90°, 60°, and -25°.