What is SpaceMining?
SpaceMining is a single-agent reinforcement learning environment that simulates asteroid mining in 2D space. The agent, a mining robot, must collect resources from asteroids, deliver them to the mothership, manage energy consumption, and avoid moving obstacles while maximizing efficiency.
Novel Benchmark
A custom environment unfamiliar to LLMs, enabling fair evaluation of their reward-design capabilities.
2D Space Physics
Realistic physics with thrust dynamics, collisions, and gravitational forces in an 80×80 grid.
Strategic Constraints
Energy management, limited observation radius, and moving obstacles create balanced difficulty.
Task Description
The agent is deployed in a 2D space environment (an 80×80 grid) with randomly distributed asteroids and a central mothership. The task requires balancing multiple objectives:
Primary Objectives
- Navigate and mine resource-rich asteroids
- Monitor energy and return to mothership for recharging
- Avoid collisions with moving obstacles
- Transport resources to maximize efficiency
Strategic Balance
- Exploration vs exploitation of known resources
- Risk vs reward in obstacle navigation
- Energy conservation vs aggressive mining
- Travel time vs collection optimization
Environment Specifications
Technical specifications for researchers implementing experiments and designing reward functions for the SpaceMining environment.
Action Space
Box(3) - Continuous control
- thrust_x and thrust_y: movement forces
- mine_action: mining activation
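A minimal sketch of stepping the environment with a hand-built action. The component order follows the list above; the value ranges are assumptions (symmetric thrust, binary-style mining trigger), so check env.action_space.low and env.action_space.high in practice.

import numpy as np
from space_mining.envs import make_env

env = make_env()
obs, info = env.reset(seed=0)
print(env.action_space.low, env.action_space.high)  # verify the assumed ranges

# [thrust_x, thrust_y, mine_action]: thrust right and up while mining
action = np.array([0.5, 0.5, 1.0], dtype=np.float32)
obs, reward, terminated, truncated, info = env.step(action)
env.close()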
State Space
Box(53) - Structured observations
- Agent state: position, velocity, energy, inventory
- Asteroid information: positions and resources
- Mothership position
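The exact index layout of the 53-dimensional vector is not documented here, so the slicing below is only an assumed ordering consistent with the grouping above (agent state first, mothership last); confirm the true indices against the environment source.

from space_mining.envs import make_env

env = make_env()
obs, info = env.reset(seed=0)
print(obs.shape)  # (53,)

# Assumed layout, for illustration only:
agent_state = obs[:6]      # e.g. position (2), velocity (2), energy, inventory
mothership_pos = obs[-2:]  # assumed to be the final two entries
asteroid_info = obs[6:-2]  # remaining entries: asteroid positions and resources
env.close()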
Environment
- Grid size: 80×80 units
- Asteroids: 8-12 per episode
- Moving obstacles: 4-8 hazards
- Observation radius: 15 units
- Episode limit: 1200 steps
Rewards
- Mining: +8.0 per resource unit
- Delivery: +12.0 per delivered unit
- Exploration: +3.0 per discovered asteroid
- Energy recharge: +0.5 per unit
Penalties
- Obstacle collision: -10.0
- Boundary contact: -1.0
- Energy depletion: -10.0
- Inventory limit: 100 units max
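As a reference for reward-design experiments, the documented terms compose into a per-step reward as sketched below. The event fields are hypothetical names for illustration, not the environment's actual info keys.

def shaped_reward(
    mined_units: float,
    delivered_units: float,
    discovered_asteroids: int,
    recharged_units: float,
    collided: bool,
    hit_boundary: bool,
    energy_depleted: bool,
) -> float:
    """Recompose the documented reward and penalty terms for one step."""
    reward = 8.0 * mined_units            # mining: +8.0 per resource unit
    reward += 12.0 * delivered_units      # delivery: +12.0 per delivered unit
    reward += 3.0 * discovered_asteroids  # exploration: +3.0 per discovered asteroid
    reward += 0.5 * recharged_units       # energy recharge: +0.5 per unit
    if collided:
        reward -= 10.0                    # obstacle collision
    if hit_boundary:
        reward -= 1.0                     # boundary contact
    if energy_depleted:
        reward -= 10.0                    # energy depletion
    return reward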
Difficulty
Medium - Balanced challenge
- Limited observation radius
- Energy management required
- Collision avoidance needed
- Strategic resource planning
Quick Start Guide
Get started with SpaceMining for your research experiments. The environment is designed to be easy to install and integrate into existing RL pipelines.
Installation
# Install from PyPI
pip install space-mining
# Or install from source
git clone https://github.com/reveurmichael/space_mining.git
cd space_mining
pip install -e .
Basic Usage
from space_mining.envs import make_env

# Create environment
env = make_env(render_mode="human")

# Reset and run a random policy
obs, info = env.reset()
for _ in range(1000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
Agent Training Pipeline
Complete PPO training setup for baseline agent development and reward function evaluation.
import torch

from space_mining.agents.train_ppo import train_ppo

def main() -> None:
    # Train with configuration
    model = train_ppo(
        total_timesteps=5_000_000,
        output_dir="runs/ppo_research",
        track_wandb=True,
        wandb_project_name="space-mining-research",
        checkpoint_freq=100_000,
        eval_freq=200_000,
        device="cuda" if torch.cuda.is_available() else "cpu",
    )
    print("Training completed successfully!")

if __name__ == "__main__":
    main()
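After training, the saved checkpoint can be rolled out for evaluation. A minimal sketch, assuming train_ppo saves a Stable-Baselines3 PPO model (the final_model.zip naming in the GIF commands below suggests this, but the save format and path are assumptions):

from stable_baselines3 import PPO
from space_mining.envs import make_env

# Path assumed from the output_dir above plus SB3's final_model.zip convention
model = PPO.load("runs/ppo_research/final_model.zip")

env = make_env()
obs, info = env.reset(seed=0)
episode_return = 0.0
for _ in range(1200):  # episode limit from the specifications above
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    if terminated or truncated:
        break
print(f"Episode return: {episode_return:.1f}")
env.close()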
Generate Demo GIFs
Create demonstration GIFs from trained models for research analysis.
# Train PPO Agent
python -m space_mining.agents.train_ppo \
--total-timesteps 5000000 --output-dir runs/ppo --checkpoint-freq 100000 --eval-freq 200000
# Render a GIF
python -m space_mining.scripts.make_gif \
--checkpoint runs/ppo/final_model.zip --output output_gif/agent.gif --steps 800 --fps 20
These commands provide a straightforward workflow for training agents and generating visual demonstrations for research documentation.
Research Configuration
Environment parameter customization for controlled experiments and ablation studies.
from space_mining.envs import SpaceMining, EnvironmentConfig

# Create custom configuration
config = EnvironmentConfig(
    max_episode_steps=2000,
    grid_size=120,
    max_asteroids=20,
    observation_radius=25,
    energy_consumption_rate=0.03,
    mining_range=10.0,
)

# Initialize with custom config
env = SpaceMining(config=config, render_mode="human")

# Run experiment
obs, info = env.reset(seed=42)
print(f"Environment initialized with {len(env.asteroid_positions)} asteroids")
Demo: Agent Behaviors
The GIF demonstrations showcase different agent behaviors and training outcomes, from successful resource collection to various failure modes and learning phases.

Early Exploration
Early Exploration · Eval num_timesteps=25000
Initial learning phase

Early Exploration (Later)
Early Exploration · Eval num_timesteps=500000
Continued learning

Successful Mining
Successful Mining
Strategic resource collection

Complete Episode
Complete Episode (1200 steps)
Optimal resource collection

Energy Depletion
Energy Depletion
Failed to recharge at mothership

Poor Obstacle Avoidance
Poor Obstacle Avoidance
Multiple collisions
Visual Elements Guide
Health Bars: Green-to-red gradient bars above asteroids indicate remaining resource levels
Status Display: Real-time information showing energy, inventory, mining progress, and episode statistics