SpaceMining
A Novel RL Environment Beyond LLM Priors

A custom Gymnasium environment intentionally unfamiliar to LLMs. It enables fair evaluation of reward-design capabilities without leakage from standard RL benchmarks, addressing prior-knowledge bias and supporting studies of true generalization.

License: MIT

What is SpaceMining?

SpaceMining is a single-agent reinforcement learning environment that simulates asteroid mining in 2D space. The agent, a mining robot, must collect resources from asteroids, deliver them to the mothership, manage its energy, and avoid moving obstacles while maximizing efficiency.

🎯 Novel Benchmark

Custom environment unfamiliar to LLMs for fair evaluation of reward-design capabilities.

⚡ 2D Space Physics

Realistic physics with thrust dynamics, collisions, and gravitational forces in an 80×80 grid.

🔋 Strategic Constraints

Energy management, limited observation radius, and moving obstacles create balanced difficulty.

Task Description

The agent is deployed in a 2D space environment (80×80 grid) with randomly distributed asteroids and a central mothership. The task requires balancing multiple objectives:

🎯 Primary Objectives

  • Navigate and mine resource-rich asteroids
  • Monitor energy and return to mothership for recharging
  • Avoid collisions with moving obstacles
  • Transport resources to maximize efficiency
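
The energy and delivery objectives above can be turned into a simple hand-written waypoint rule, which is useful as a sanity-check baseline. The sketch below is not part of the package: `choose_target`, the asteroid tuple layout, and `LOW_ENERGY_THRESHOLD` are hypothetical, and only the 100-unit inventory limit comes from the specifications.

```python
import math

INVENTORY_CAPACITY = 100.0   # inventory limit stated in the specifications
LOW_ENERGY_THRESHOLD = 30.0  # hypothetical value, not taken from the environment


def choose_target(agent_xy, energy, inventory, mothership_xy, asteroids):
    """Pick the next waypoint for a simple scripted agent.

    `asteroids` is assumed to be a list of (x, y, remaining_resources) tuples.
    Returns a (label, (x, y)) pair, where label is "mothership" or "asteroid".
    """
    # Head home when energy is low or the cargo hold is full.
    must_return = energy < LOW_ENERGY_THRESHOLD or inventory >= INVENTORY_CAPACITY
    # Only asteroids that still hold resources are worth visiting.
    candidates = [a for a in asteroids if a[2] > 0]
    if must_return or not candidates:
        return "mothership", tuple(mothership_xy)
    nearest = min(candidates, key=lambda a: math.dist(agent_xy, a[:2]))
    return "asteroid", (nearest[0], nearest[1])
```

A learned policy should eventually beat this rule, so it mainly serves as a lower bound when comparing reward designs.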

⚖️ Strategic Balance

  • Exploration vs exploitation of known resources
  • Risk vs reward in obstacle navigation
  • Energy conservation vs aggressive mining
  • Travel time vs collection optimization

Environment Specifications

Technical specifications for researchers implementing experiments and designing reward functions for the SpaceMining environment.

📊 Action Space

Box(3) - Continuous control

  • thrust_x and thrust_y: Movement forces
  • mine_action: Mining activation
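
As an illustration of the three-component action, the sketch below builds `[thrust_x, thrust_y, mine_action]` for steering towards a target point. The helper `make_action` is hypothetical, and two things are assumptions rather than documented behavior: that thrust components are normalized to [-1, 1], and that `mine_action` is 1.0 to mine and 0.0 otherwise.

```python
import math


def make_action(agent_xy, target_xy, mine=False):
    """Return [thrust_x, thrust_y, mine_action] pointing at target_xy.

    Assumes thrust in [-1, 1] per axis and mine_action as 0.0 / 1.0;
    check env.action_space for the real bounds.
    """
    dx = target_xy[0] - agent_xy[0]
    dy = target_xy[1] - agent_xy[1]
    distance = math.hypot(dx, dy)
    mine_flag = 1.0 if mine else 0.0
    if distance == 0.0:
        # Already at the target: no thrust needed.
        return [0.0, 0.0, mine_flag]
    # Unit vector towards the target gives full thrust in that direction.
    return [dx / distance, dy / distance, mine_flag]
```

The result is a plain list; converting it with `numpy.asarray(action, dtype=numpy.float32)` before passing it to `env.step` is the usual way to match a `Box` space.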

👁️ State Space

Box(53) - Structured observations

  • Agent state: position, velocity, energy, inventory
  • Asteroid information: positions and resources
  • Mothership position

🌍 Environment

  • Grid size: 80×80 units
  • Asteroids: 8-12 per episode
  • Moving obstacles: 4-8 hazards
  • Observation radius: 15 units
  • Episode limit: 1200 steps
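
To get a feel for how restrictive the observation radius is, the sketch below filters asteroid positions by distance and estimates the share of the map the agent can see. It assumes visibility is a plain Euclidean distance check and ignores clipping at the map edges; the environment's actual masking logic is not described here, and `visible_asteroids` is a hypothetical helper.

```python
import math

GRID_SIZE = 80.0           # from the specification above
OBSERVATION_RADIUS = 15.0  # from the specification above


def visible_asteroids(agent_xy, asteroid_positions, radius=OBSERVATION_RADIUS):
    """Return the asteroid positions within `radius` of the agent."""
    return [p for p in asteroid_positions if math.dist(agent_xy, p) <= radius]


# Area of the observation disc relative to the whole map (about 0.11).
visible_fraction = math.pi * OBSERVATION_RADIUS ** 2 / GRID_SIZE ** 2
```

Under these assumptions the agent sees roughly 11% of the map at any time, which is why exploration matters.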

🏆 Rewards

  • Mining: +8.0 per resource unit
  • Delivery: +12.0 per delivered unit
  • Exploration: +3.0 per discovered asteroid
  • Energy recharge: +0.5 per unit

⚠️ Penalties

  • Obstacle collision: -10.0
  • Boundary contact: -1.0
  • Energy depletion: -10.0
  • Inventory limit: 100 units max
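
The reward and penalty values listed above can be combined into a per-step reward as in the sketch below. The constants come from those lists; the `StepEvents` container and the purely additive combination are assumptions made for illustration, not the environment's confirmed implementation.

```python
from dataclasses import dataclass

# Values taken from the Rewards and Penalties lists.
MINING_REWARD = 8.0        # per resource unit mined
DELIVERY_REWARD = 12.0     # per unit delivered
EXPLORATION_REWARD = 3.0   # per newly discovered asteroid
RECHARGE_REWARD = 0.5      # per energy unit recharged
COLLISION_PENALTY = -10.0
BOUNDARY_PENALTY = -1.0
DEPLETION_PENALTY = -10.0


@dataclass
class StepEvents:
    """Hypothetical summary of what happened during one step."""
    mined_units: float = 0.0
    delivered_units: float = 0.0
    new_asteroids: int = 0
    recharged_units: float = 0.0
    collided: bool = False
    touched_boundary: bool = False
    energy_depleted: bool = False


def step_reward(e: StepEvents) -> float:
    """Sum the listed reward terms for one step (additive by assumption)."""
    reward = (
        MINING_REWARD * e.mined_units
        + DELIVERY_REWARD * e.delivered_units
        + EXPLORATION_REWARD * e.new_asteroids
        + RECHARGE_REWARD * e.recharged_units
    )
    if e.collided:
        reward += COLLISION_PENALTY
    if e.touched_boundary:
        reward += BOUNDARY_PENALTY
    if e.energy_depleted:
        reward += DEPLETION_PENALTY
    return reward
```

For example, mining 2 units while discovering one asteroid and touching the boundary gives 16 + 3 - 1 = 18 under this additive reading.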

⚙️ Difficulty

Medium - Balanced challenge

  • Limited observation radius
  • Energy management required
  • Collision avoidance needed
  • Strategic resource planning

Quick Start Guide

Get started with SpaceMining for your research experiments. The environment is designed to be easy to install and integrate into existing RL pipelines.

🚀 Installation

# Install from PyPI
pip install space-mining

# Or install from source
git clone https://github.com/reveurmichael/space_mining.git
cd space_mining
pip install -e .

🎮 Basic Usage

import gymnasium as gym
from space_mining.envs import make_env

# Create environment
env = make_env(render_mode="human")

# Reset and run
obs, info = env.reset()
for _ in range(1000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
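
For experiments it is often handier to wrap the loop above in a function that returns the episode return. The sketch below relies only on the standard Gymnasium `reset`/`step` interface used in the snippet; `run_episode` and its `policy` callable are hypothetical names, and the 1200-step default mirrors the episode limit from the specifications.

```python
def run_episode(env, policy, max_steps=1200, seed=None):
    """Roll out one episode and return (total_reward, steps_taken).

    `policy` is any callable mapping an observation to an action.
    """
    obs, info = env.reset(seed=seed)
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += float(reward)
        if terminated or truncated:
            # Episode ended before the step budget ran out.
            return total_reward, step
    return total_reward, max_steps
```

With the environment created above, `run_episode(env, lambda obs: env.action_space.sample())` would report the return of a random policy.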

🤖 Agent Training Pipeline

Complete PPO training setup for baseline agent development and reward function evaluation.

import torch  # needed for the CUDA availability check below

from space_mining.agents.train_ppo import train_ppo

def main() -> None:
    # Train with configuration
    model = train_ppo(
        total_timesteps=5_000_000,
        output_dir="runs/ppo_research",
        track_wandb=True,
        wandb_project_name="space-mining-research",
        checkpoint_freq=100_000,
        eval_freq=200_000,
        device="cuda" if torch.cuda.is_available() else "cpu"
    )
    print("Training completed successfully!")

if __name__ == "__main__":
    main()

🎬 Generate Demo GIFs

Create demonstration GIFs from trained models for research analysis.

# Train PPO Agent
python -m space_mining.agents.train_ppo \
  --total-timesteps 5000000 --output-dir runs/ppo --checkpoint-freq 100000 --eval-freq 200000

# Render a GIF
python -m space_mining.scripts.make_gif \
  --checkpoint runs/ppo/final_model.zip --output output_gif/agent.gif --steps 800 --fps 20

The first command trains a PPO agent and writes its output to runs/ppo; the second loads the final model from that directory and renders an 800-step rollout as a GIF at 20 fps.

🔧 Research Configuration

Environment parameter customization for controlled experiments and ablation studies.

from space_mining.envs import SpaceMining, EnvironmentConfig

# Create custom configuration
config = EnvironmentConfig(
    max_episode_steps=2000,
    grid_size=120,
    max_asteroids=20,
    observation_radius=25,
    energy_consumption_rate=0.03,
    mining_range=10.0
)

# Initialize with custom config
env = SpaceMining(config=config, render_mode="human")

# Run experiment
obs, info = env.reset(seed=42)
print(f"Environment initialized with {len(env.asteroid_positions)} asteroids")
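
For ablation studies, a grid of configurations can be generated before any environment is created. The sketch below builds plain keyword dictionaries; the parameter names match the `EnvironmentConfig` arguments shown above, while the candidate values and the `config_grid` helper are illustrative assumptions.

```python
from itertools import product

# Parameter names follow the EnvironmentConfig example above;
# the candidate values are illustrative only.
SWEEP = {
    "observation_radius": [15, 25],
    "energy_consumption_rate": [0.03, 0.05],
    "max_asteroids": [12, 20],
}


def config_grid(sweep):
    """Yield one kwargs dict per combination of swept values."""
    names = list(sweep)
    for values in product(*(sweep[name] for name in names)):
        yield dict(zip(names, values))


grid = list(config_grid(SWEEP))  # 2 * 2 * 2 = 8 configurations
```

Each dictionary could then be passed as `EnvironmentConfig(**kwargs)`, assuming the remaining fields have defaults as the example above suggests.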

Demo: Agent Behaviors

The GIF demonstrations showcase different agent behaviors and training outcomes, from successful resource collection to various failure modes and learning phases.

🎮 Visual Elements Guide

Green Circle: Mining agent; its color changes with state (Green = exploring, Orange = mining, Yellow = carrying resources)
Blue Circle: Mothership (delivery and recharge point)
Yellow Circles: Resource-rich asteroids (mining targets); their color changes with resource level (Orange = high-resource, Yellow = low-resource, Gray X = depleted)
Red Circles: Moving obstacles (collision hazards)
Red Ring: Mining range indicator
Blue Ring: Observation range (partial observability)

Health Bars: Green-to-red gradient bars above asteroids indicate remaining resource levels

Status Display: Real-time information showing energy, inventory, mining progress, and episode statistics