What is SpaceMining?
SpaceMining is a single-agent reinforcement learning environment that simulates asteroid mining in 2D space. The agent, a mining robot, must collect resources from asteroids, deliver them to the mothership, manage its energy, and avoid moving obstacles, all while maximizing efficiency.
🎯 Novel Benchmark
A custom environment unfamiliar to LLMs, enabling fair evaluation of reward-design capabilities.
⚡ 2D Space Physics
Realistic physics with thrust dynamics, collisions, and gravitational forces on an 80×80 grid.
🌌 Strategic Constraints
Energy management, limited observation radius, and moving obstacles create balanced difficulty.
Task Description
The agent is deployed in a 2D space environment (an 80×80 grid) with randomly distributed asteroids and a central mothership. The task requires balancing multiple objectives:
🎯 Primary Objectives
- Navigate and mine resource-rich asteroids
- Monitor energy and return to the mothership to recharge
- Avoid collisions with moving obstacles
- Transport resources to maximize efficiency
⚖️ Strategic Balance
- Exploration vs exploitation of known resources
- Risk vs reward in obstacle navigation
- Energy conservation vs aggressive mining
- Travel time vs collection optimization
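The energy-conservation trade-off above can be sketched as a simple return-home heuristic: assuming, purely for illustration, that travel costs energy roughly in proportion to distance, the agent heads back once its reserve barely covers the trip. The `cost_per_unit` and `safety_margin` constants below are hypothetical, not values from the environment.

```python
import math

def should_return_home(agent_pos, mothership_pos, energy,
                       cost_per_unit=0.5, safety_margin=1.25):
    """Return True when remaining energy barely covers the trip home.

    cost_per_unit and safety_margin are illustrative constants,
    not taken from the SpaceMining environment.
    """
    distance_home = math.dist(agent_pos, mothership_pos)
    return energy <= distance_home * cost_per_unit * safety_margin

# Far from home with little energy left: head back.
print(should_return_home((70.0, 70.0), (40.0, 40.0), energy=25.0))  # True
# Close to home with plenty of energy: keep mining.
print(should_return_home((42.0, 41.0), (40.0, 40.0), energy=80.0))  # False
```

A real policy would learn this threshold from the reward signal; the sketch only shows why "energy conservation vs aggressive mining" is a genuine trade-off.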
Environment Specifications
Technical specifications for researchers implementing experiments and designing reward functions for the SpaceMining environment.
🚀 Action Space
Box(3) - Continuous control
- thrust_x and thrust_y: movement forces
- mine_action: mining activation
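As a minimal illustration of the Box(3) layout, the snippet below assembles an action as `[thrust_x, thrust_y, mine_action]` and clamps each component. The `[-1, 1]` bounds are an assumption for illustration only; check `env.action_space.low` and `.high` for the real limits.

```python
def make_action(thrust_x, thrust_y, mine_action, low=-1.0, high=1.0):
    """Assemble a Box(3) action [thrust_x, thrust_y, mine_action].

    The [low, high] bounds are assumed for illustration; consult
    env.action_space for the environment's actual limits.
    """
    clamp = lambda v: max(low, min(high, v))
    return [clamp(thrust_x), clamp(thrust_y), clamp(mine_action)]

# Out-of-range thrust gets clipped to the assumed bounds.
action = make_action(0.8, -2.5, 1.0)
print(action)  # [0.8, -1.0, 1.0]
```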
🗂️ State Space
Box(53) - Structured observations
- Agent state: position, velocity, energy, inventory
- Asteroid information: positions and resources
- Mothership position
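One plausible way the 53 observation values could be laid out is sketched below: 6 agent values, the mothership x/y, then 15 asteroid slots of (x, y, resources), since 6 + 2 + 45 = 53. This split is a guess for illustration only, not taken from the SpaceMining source; consult the environment code for the real layout.

```python
def split_observation(obs):
    """Split a flat 53-dim observation into named parts.

    The layout (6 agent values, mothership x/y, 15 asteroid slots
    of x/y/resources) is a hypothetical example, NOT the layout
    used by the SpaceMining source.
    """
    assert len(obs) == 53
    agent = obs[0:6]        # pos(2), vel(2), energy, inventory
    mothership = obs[6:8]   # mothership x, y
    flat = obs[8:53]
    asteroids = [tuple(flat[i:i + 3]) for i in range(0, 45, 3)]
    return agent, mothership, asteroids

agent, mothership, asteroids = split_observation([0.0] * 53)
print(len(agent), len(mothership), len(asteroids))  # 6 2 15
```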
🌍 Environment
- Grid size: 80×80 units
- Asteroids: 8-12 per episode
- Moving obstacles: 4-8 hazards
- Observation radius: 15 units
- Episode limit: 1200 steps
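The specifications above can be gathered into a small config sketch together with a visibility check based on the 15-unit observation radius. This is a minimal illustration; the class and function names are hypothetical, not part of the SpaceMining API.

```python
import math
from dataclasses import dataclass

@dataclass
class SpaceMiningSpecs:
    # Values taken from the specification list above.
    grid_size: int = 80
    min_asteroids: int = 8
    max_asteroids: int = 12
    observation_radius: float = 15.0
    max_episode_steps: int = 1200

def is_visible(agent_pos, target_pos, specs=SpaceMiningSpecs()):
    """True if the target falls within the agent's observation radius."""
    return math.dist(agent_pos, target_pos) <= specs.observation_radius

print(is_visible((40, 40), (50, 50)))  # dist ≈ 14.1 -> True
print(is_visible((40, 40), (60, 60)))  # dist ≈ 28.3 -> False
```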
Quick Start Guide
Get started with SpaceMining for your research experiments. The environment is designed to be easy to install and integrate into existing RL pipelines.
🚀 Installation
# Install from PyPI
pip install space-mining
# Or install from source
git clone https://github.com/reveurmichael/space_mining.git
cd space_mining
pip install -e .
🎮 Basic Usage
import gymnasium as gym
from space_mining.envs import make_env

# Create environment
env = make_env(render_mode="human")

# Reset and run
obs, info = env.reset()
for _ in range(1000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
🤖 Agent Training Pipeline
Complete PPO training setup for baseline agent development and reward function evaluation.
import torch

from space_mining.agents.train_ppo import train_ppo

def main() -> None:
    # Train with configuration
    model = train_ppo(
        total_timesteps=5_000_000,
        output_dir="runs/ppo_research",
        track_wandb=True,
        wandb_project_name="space-mining-research",
        checkpoint_freq=100_000,
        eval_freq=200_000,
        device="cuda" if torch.cuda.is_available() else "cpu",
    )
    print("Training completed successfully!")

if __name__ == "__main__":
    main()
🎬 Generate Demo GIFs
Create demonstration GIFs from trained models for research analysis.
# Train PPO Agent
python -m space_mining.agents.train_ppo \
--total-timesteps 5000000 --output-dir runs/ppo --checkpoint-freq 100000 --eval-freq 200000
# Render a GIF
python -m space_mining.scripts.make_gif \
--checkpoint runs/ppo/final_model.zip --output output_gif/agent.gif --steps 800 --fps 20
These commands provide a straightforward workflow for training agents and generating visual demonstrations for research documentation.
🔧 Research Configuration
Environment parameter customization for controlled experiments and ablation studies.
from space_mining.envs import SpaceMining, EnvironmentConfig

# Create custom configuration
config = EnvironmentConfig(
    max_episode_steps=2000,
    grid_size=120,
    max_asteroids=20,
    observation_radius=25,
    energy_consumption_rate=0.03,
    mining_range=10.0,
)

# Initialize with custom config
env = SpaceMining(config=config, render_mode="human")

# Run experiment
obs, info = env.reset(seed=42)
print(f"Environment initialized with {len(env.asteroid_positions)} asteroids")
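Seeded resets, as in `env.reset(seed=42)` above, are what make controlled experiments and ablations reproducible: the same seed yields the same asteroid layout. The standalone sketch below illustrates the idea with a hypothetical placement routine, not the environment's actual generator.

```python
import random

def place_asteroids(seed, n=12, grid_size=120):
    """Hypothetical seeded asteroid placement (illustration only,
    not the SpaceMining generator)."""
    rng = random.Random(seed)
    return [(rng.uniform(0, grid_size), rng.uniform(0, grid_size))
            for _ in range(n)]

# Same seed -> identical layout; different seed -> different layout.
print(place_asteroids(42) == place_asteroids(42))  # True
print(place_asteroids(42) == place_asteroids(43))  # False
```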
Demo: Agent Behaviors
The GIF demonstrations showcase different agent behaviors and training outcomes.
🏆 Success #1
Successful episode
Efficient resource collection
🏆 Success #2
Successful episode
Strategic deliveries
🏆 Success #3
Successful episode
Good obstacle avoidance
🏆 Success #4
Successful episode
Optimal routing
💥 Collision Failure
Too many collisions
With moving obstacles
🔋 Energy Depletion
Ran out of energy
Failed to recharge
🎮 Visual Elements Guide
Health Bars: Bars above asteroids indicate remaining resource levels