
Policy-Guided Heuristic Search with Guarantees
The use of a policy and a heuristic function for guiding search can be q...

Training a First-Order Theorem Prover from Synthetic Data
A major challenge in applying machine learning to automated theorem prov...

Avoiding Side Effects By Considering Future Tasks
Designing reward functions is difficult: the designer has to specify wha...

Logarithmic Pruning is All You Need
The Lottery Ticket Hypothesis is a conjecture that every large neural ne...

Learning to Prove from Synthetic Theorems
A major challenge in applying machine learning to automated theorem prov...

Pitfalls of learning a reward function online
In some agent designs like inverse reinforcement learning an agent needs...

Iterative Budgeted Exponential Search
We tackle two long-standing problems related to re-expansions in heurist...

Zooming Cautiously: Linear-Memory Heuristic Search With Node Expansion Guarantees
We introduce and analyze two parameter-free linear-memory tree search al...

An investigation of model-free planning
The field of reinforcement learning (RL) is facing increasingly challeng...

Soft-Bayes: Prod for Mixtures of Experts with Log-Loss
We consider prediction with expert advice under the log-loss with the go...

Single-Agent Policy Tree Search With Guarantees
We introduce two novel tree search algorithms that use a policy to guide...

Measuring and avoiding side effects using relative reachability
How can we design reinforcement learning agents that avoid causing unnec...

Agents and Devices: A Relative Definition of Agency
According to Dennett, the same system may be described using a `physical...

AI Safety Gridworlds
We present a suite of reinforcement learning environments illustrating v...

Reinforcement Learning with a Corrupted Reward Channel
No real-world reward function is perfect. Sensory errors and software bu...

Thompson Sampling is Asymptotically Optimal in General Environments
We discuss a variant of Thompson sampling for nonparametric reinforcemen...
Laurent Orseau