unienv_data.replay_buffer.collection_wrapper

ReplayBufferCollectionWrapper – transparently records transitions into replay buffers.

This wrapper sits around any Env and writes every step transition into one or more ReplayBuffer instances according to the chosen replay_mode.

Environment-space invariant

This wrapper relies on the Env contract (unienv_interface.env_base.env.Env): when an env is batched, its observation_space, action_space, and context_space already include the leading batch dimension. The internal source/target space builders (_build_canonical_source_space and the default DictSpace builder) therefore use the env-side spaces as-is and never wrap them in an additional batch_space call. Re-batching those spaces would add a second batch dimension and produce incorrect shapes; this was the root cause of a previous bug in this module.
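The double-batching problem can be illustrated with plain shape tuples. This is only a sketch: batch_shape is a hypothetical stand-in for a batch_space-style helper, and the shapes are made up for illustration; it does not use the library's actual space classes.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def batch_shape(shape: Shape, batch_size: int) -> Shape:
    """Hypothetical helper: prepend a batch axis to a shape."""
    return (batch_size, *shape)


single_obs_shape: Shape = (3,)  # shape of one unbatched observation (assumed)
batch_size = 4

# Under the Env contract, a batched env already exposes the batched shape.
env_obs_shape = batch_shape(single_obs_shape, batch_size)

# Correct: use the env-side shape as-is.
source_shape = env_obs_shape

# Incorrect: batching again adds a second leading batch axis.
double_batched_shape = batch_shape(env_obs_shape, batch_size)

print(source_shape)          # (4, 3)
print(double_batched_shape)  # (4, 4, 3)
```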

PathLike module-attribute

PathLike = Union[str, 'os.PathLike[str]']

ReplayBufferCollectionWrapper

ReplayBufferCollectionWrapper(env: Env[BArrayType, ContextType, ObsType, ActType, RenderFrame, BDeviceType, BDtypeType, BRNGType], replay_buffers: Union[ReplayBuffer, Sequence[ReplayBuffer]], replay_mode: Literal['auto', 'shared', 'per_env'] = 'auto', transition_builder: Optional[Union[Callable[..., Any], DataTransformation]] = None, validate_transition: bool = False, dump_on_episode_end: bool = False, dump_on_reset_boundary: bool = False, dump_path: Optional[PathLike] = None)

Bases: Wrapper[BArrayType, ContextType, ObsType, ActType, RenderFrame, BDeviceType, BDtypeType, BRNGType, BArrayType, ContextType, ObsType, ActType, RenderFrame, BDeviceType, BDtypeType, BRNGType]

Automatically record every transition into one or more replay buffers.

Parameters

env : Env
    The environment to wrap.
replay_buffers : ReplayBuffer | Sequence[ReplayBuffer]
    One replay buffer (shared mode) or one per environment slot (per‑env mode).
replay_mode : "auto", "shared" or "per_env"
    * "shared" - a single flat replay buffer.
    * "per_env" - one buffer per batched slot, with episode segments.
    * "auto" - resolve based on batch_size and buffer count.
transition_builder : callable or DataTransformation, optional
    Either a callable (cached_obs, cached_context, action, next_obs, reward, terminated, truncated, env_index, rb_single_space) -> transition, or a DataTransformation that maps a canonical raw transition dict to the replay buffer's format. When None the default DictSpace‑based builder is used.
validate_transition : bool
    When True every built transition is validated against the target replay buffer's space (single‑instance for append / batched for shared extend). Off by default for speed.
dump_on_episode_end : bool
    When True, automatically call dumps(path) on the replay buffer(s) after an episode ends (i.e., when terminated or truncated is true). Requires dump_path to be set.
dump_on_reset_boundary : bool
    When True, automatically call dumps(path) on reset boundaries (full reset or masked reset). Requires dump_path to be set.
dump_path : str | os.PathLike | None
    Filesystem path where the replay buffer(s) should be dumped when auto-dump is triggered. May be reassigned after construction to redirect subsequent dumps. Required if dump_on_episode_end or dump_on_reset_boundary is True. Repeated dumps overwrite the same path.
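A callable transition_builder receives the nine positional arguments listed above. The following is a minimal sketch of such a callable. The dictionary keys ("obs", "action", "next_obs", "reward", "done") are assumptions chosen for illustration; a real builder has to emit whatever structure the target replay buffer's single-instance space expects.

```python
from typing import Any, Dict


def example_transition_builder(
    cached_obs: Any,
    cached_context: Any,
    action: Any,
    next_obs: Any,
    reward: Any,
    terminated: Any,
    truncated: Any,
    env_index: Any,
    rb_single_space: Any,
) -> Dict[str, Any]:
    """Build one transition dict; key names are hypothetical."""
    return {
        "obs": cached_obs,
        "action": action,
        "next_obs": next_obs,
        "reward": float(reward),
        # Collapse the two end-of-episode flags into a single field.
        "done": bool(terminated) or bool(truncated),
    }


# Stand-alone call with dummy values, outside any wrapper.
example_transition = example_transition_builder(
    [0.0, 1.0], None, 2, [0.5, 1.5], 1, False, True, 0, None
)
```

This sketch ignores cached_context, env_index, and rb_single_space; a builder that stores context or needs per-slot behaviour would use them.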

dump_path instance-attribute

dump_path: Optional[PathLike] = dump_path

replay_buffers property

replay_buffers: List[ReplayBuffer]

The list of managed replay buffers.

replay_mode property

replay_mode: str

The resolved replay mode ("shared" or "per_env").
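The exact rule behind "auto" is not spelled out here beyond "resolve based on batch_size and buffer count". One plausible reading, written as a stand-alone sketch: a single buffer resolves to "shared", and one buffer per slot resolves to "per_env". The function name, the treatment of an unbatched env as one slot, and the error messages are all assumptions, not the wrapper's actual code.

```python
from typing import Optional


def resolve_replay_mode_sketch(
    replay_mode: str, num_buffers: int, batch_size: Optional[int]
) -> str:
    """Hypothetical resolution of 'auto' into 'shared' or 'per_env'."""
    if replay_mode not in ("auto", "shared", "per_env"):
        raise ValueError(f"unknown replay_mode: {replay_mode!r}")

    # Assumption: an unbatched env (batch_size is None) counts as one slot.
    num_slots = 1 if batch_size is None else batch_size

    if replay_mode == "auto":
        if num_buffers == 1:
            return "shared"
        if num_buffers == num_slots:
            return "per_env"
        raise ValueError(
            f"cannot resolve 'auto' with {num_buffers} buffers and {num_slots} slots"
        )
    if replay_mode == "shared" and num_buffers != 1:
        raise ValueError("shared mode expects exactly one replay buffer")
    if replay_mode == "per_env" and num_buffers != num_slots:
        raise ValueError("per_env mode expects one replay buffer per slot")
    return replay_mode
```

With one buffer and one slot both modes would fit; this sketch picks "shared" in that case, which may differ from the wrapper's own choice.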

buffer_swap class-attribute instance-attribute

buffer_swap = set_replay_buffers

shared classmethod

shared(env: Env, replay_buffer: ReplayBuffer, **kwargs: Any) -> 'ReplayBufferCollectionWrapper'

Create a wrapper in shared mode with a single replay buffer.

per_env classmethod

per_env(env: Env, replay_buffers: Sequence[ReplayBuffer], **kwargs: Any) -> 'ReplayBufferCollectionWrapper'

Create a wrapper in per‑env mode with one buffer per slot.

reset

reset(*args: Any, mask: Optional[BArrayType] = None, seed: Optional[int] = None, **kwargs: Any) -> Tuple[ContextType, ObsType, Dict[str, Any]]

step

step(action: ActType) -> Tuple[ObsType, Union[SupportsFloat, BArrayType], Union[bool, BArrayType], Union[bool, BArrayType], Dict[str, Any]]

reset_async

reset_async(*args: Any, mask: Optional[BArrayType] = None, seed: Optional[int] = None, **kwargs: Any) -> None

reset_wait

reset_wait() -> Tuple[ContextType, ObsType, Dict[str, Any]]

step_async

step_async(action: ActType) -> None

step_wait

step_wait() -> Tuple[ObsType, Union[SupportsFloat, BArrayType], Union[bool, BArrayType], Union[bool, BArrayType], Dict[str, Any]]

set_replay_buffers

set_replay_buffers(replay_buffers: Union[ReplayBuffer, Sequence[ReplayBuffer]], *, replay_mode: Optional[Literal['auto', 'shared', 'per_env']] = None, require_same_space: bool = True, allow_mid_episode: bool = False, finalize_old_segments: bool = False, dump_old: bool = False) -> List[ReplayBuffer]

Swap the replay buffer(s) without rebuilding the wrapper.

Parameters

replay_buffers : ReplayBuffer | Sequence[ReplayBuffer]
    The new replay buffer(s) to use.
replay_mode : "auto", "shared", "per_env", or None
    If provided, change the replay mode. If None, keep the current mode.
require_same_space : bool
    If True, the new buffers must have the same single_space as the old buffers. Default is True.
allow_mid_episode : bool
    If True, allow swapping even if this splits an episode across old/new buffers. Default is False.
finalize_old_segments : bool
    If True, close any open segments in the old buffers before swapping. Default is False.
dump_old : bool
    If True, dump the old buffers before swapping. Requires dump_path to be set. The dump is performed only after segment finalization, where that is needed. Default is False.

Returns

List[ReplayBuffer]
    The old replay buffer(s) that were replaced.

Raises

ValueError
    If the new buffers don't match the expected layout or space.
RuntimeError
    If there are open segments and allow_mid_episode and finalize_old_segments are both False.
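The RuntimeError condition can be read as a simple guard: a swap with open segments is refused unless at least one of the two flags is set. The sketch below encodes only that condition. The function name and the has_open_segments argument are hypothetical, and the layout and space checks that raise ValueError are not modelled.

```python
def check_swap_allowed_sketch(
    has_open_segments: bool,
    allow_mid_episode: bool = False,
    finalize_old_segments: bool = False,
) -> None:
    """Hypothetical guard mirroring the documented RuntimeError condition."""
    if has_open_segments and not (allow_mid_episode or finalize_old_segments):
        raise RuntimeError(
            "open segments exist; pass allow_mid_episode=True or "
            "finalize_old_segments=True to swap anyway"
        )


# No open segments: always allowed.
check_swap_allowed_sketch(has_open_segments=False)

# Open segments, but the caller opts in to closing them first.
check_swap_allowed_sketch(has_open_segments=True, finalize_old_segments=True)
```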