MyJoCo: Custom 3D Robot Simulator from Scratch (feat. MuJoCo)

June 3rd , 2026

Simulator preview

GitHub

Agenda

Current progress
Simulator Overview
Core Implementation
Limitation (Future Implementation Plan)
References

Current progress

엔트리 파일	상태 갱신 방식	궤적 형성 방식	ctrl 입력	물리 계산	용도
dexjoco_teleop.py	DexJoCo Panda-Allegro 모델을 로드하고 arm/site 제어와 hand ctrl을 mj_step으로 갱신	panel target을 attachment_site pose target으로 직접 사용, grasp slider는 open/close alpha로 매핑	Panda arm은 operational-space torque를 motor ctrl에 입력, Allegro hand는 position actuator qpos target 입력	mj_step 사용, hand-object 접촉 friction/condim 및 boxed_food 질량을 런타임 조정	DexJoCo Panda + Allegro hand teleop grasp demo

영역	구현 내용
Environment, Viewer class	model, data, viewer, mujoco API wrapper를 묶어서 관리하는 Environment class / GLFW와 mujoco rendering API를 묶은 Viewer class, camera 조종 패널 추가, event handler에서 polling 중심 구조로 변경 (reference: dm_control)
Kinematic simulator	IK 결과를 data.qpos에 직접 반영하고 mj_forward로 상태를 갱신
Dynamic simulator	IK 결과를 actuator ctrl에 넣고 mj_step으로 mujoco dynamics 진행
시뮬레이션 공통	rendering, polling, trajectory generation, simulation 시간축 분리 및 적절한 주기(ex. trajectory_duration, poll_interval) 탐색
Multi target IK	multi target의 jacobian과 error를 쌓는 get_stacked_ik 함수, damped least squares로 IK 계산, 클리핑 로직 최적화
Differential IK	actual state 기준으로 multi target jacobian을 구성하고, least-squares로 qvel target과 다음 q target 계산
Trajectory	pose interpolation(시작점 속도/가속도가 비영으로 부드러운 궤적 전환 도모), joint-space interpolation
Dynamics utility	computed torque, PD controller 모듈을 구현하고, mujoco timestep마다 각각 팔과 손가락 제어를 담당 (reference: robosuite)
Collision utility	mujoco data.contact 기반 robot-table, finger-object 접촉 판정
mujoco utility	편의를 위한 mujoco API wrapper (ex. joint id, dof id, actuator id 매핑)
Assets	DexJoCo의 Panda arm, Allegro hand, bucket/object XML을 외부 clone으로 참조하고, grasp demo에 필요한 contact/friction 설정을 런타임에서 조정

Simulator Overview

MyJoCo(feat. MuJoCo)는 로봇 모델의 다른 제어 방식을 이해하고 실험할 수 있는 MuJoCo-based 로봇 시뮬레이터 구축 프로젝트입니다. MJCF을 제외한 MyJoCo 코드는 vibe coding 없이 from scratch로 제작되었습니다.

레퍼런스로 삼은 레포지토리는 dm_control: Google DeepMind Infrastructure for Physics-Based Simulation 입니다. dm_contol은 RL 태스크 정의 단위인 Task 모듈을 학습시키기 위해 한 층 더 추상화된 Physics 클래스를 제공합니다. id 매핑 helper도 만들어야할 만큼 mujoco api가 충분하지 않았다고 느낀 이후 dm_control를 참고하게 되었습니다. 이밖에 headless RL 환경이 아닌 interactive simulator인 점, mujoco, viewer, composer, suite로 이어지는 계층구조와 확장성 역시 참고 계기가 되었습니다. dm_control api 명세서를 통해

event handler가 아닌 polling 중심의 viewer와의 상호작용
control timestep과 physical timestep을 분리 (Myjoco에서 각각 time.time, opt.timestep에 대응)
MjModel, MjData를 묶고 필드를 관리하는 구조 등등

시뮬레이션에 대한 이해를 높일 수 있었습니다. 기타 레퍼런스는 Core Implementation에서 더 자세히 다룹니다.

Simulator Structure

sim_with_mujoco/
  dexjoco_teleop.py            DexJoCo Panda arm + Allegro hand teleop entry
  environment/
    env.py                      model, data, viewer wrapper
  mjcf/
    parser.py                   MJCF parser
  utils/
    ik.py                       DLS multi-target IK
    ik_qp.py                    differential IK
    dynamics.py                 CT, PD, task-space control
    kinematics.py               finger / kinematics helper
    collision.py                contact helper
    math3d.py                   transform / twist helper
    mj.py                       mujoco id mapping helper
    planning.py                 waypoint / trajectory helper
  viewer/
    viewer.py                   renderer wrapper
    glfw_panel.py               hand target panel
    gui_panel.py                camera control panel
sim/model/motion/
  trajectory.py                 task / joint trajectory math
temp/
  dexjoco_src/                  external DexJoCo clone used by dexjoco_teleop.py

Core Implementation

Dynamic mode (motor)
- DexJoCo 원격조작 데모
- Panda 작업공간 토크 제어
- Allegro 열림/닫힘 position actuator 제어
Baseline
- 렌더링 / 패널 입력 / 시뮬레이션 시간축
- 자코비안, IK, 궤적, 환경/뷰어 헬퍼

Dynamic mode (motor)

Pipeline

dexjoco 브랜치의 데모 엔트리는 sim_with_mujoco/dexjoco_teleop.py입니다. 이 파일은 DexJoCo의 arena_arm_hand_bucket_pick.xml을 읽고 Panda arm, Allegro hand, bucket/object를 한 장면에서 구동합니다. DexJoCo asset은 repo 내부에 직접 포함하지 않고 temp/dexjoco_src 또는 DEXJOCO_XML_PATH로 참조합니다.

실행 흐름은 간단합니다. load_model에서 XML을 로드하고, tune_grasp_contact로 hand/object geom의 마찰 계수와 condim을 조정한 뒤, initialize_state에서 Panda home pose와 Allegro open pose를 넣습니다. 메인 루프에서는 Viewer panel target을 attachment_site의 목표 pose로 바꾸고, thumb/finger slider를 Allegro open/closed alpha로 분리해 사용합니다.

Panda arm은 differential IK를 거치지 않고 site 기준 operational-space torque로 제어합니다. mj_jacSite로 attachment_site Jacobian을 얻고, 현재 site pose와 목표 pose의 위치/회전 오차를 작업공간 가속도 형태로 만든 뒤, operational-space inertia와 J.T를 통해 joint torque로 변환합니다. 여기에 null-space home posture 항과 qfrc_bias를 더해 Panda 7개 motor actuator에 넣습니다.

Allegro hand는 torque controller가 아니라 position actuator target으로 닫힘 정도를 명령합니다. index/middle/ring 쪽은 finger_alpha, thumb은 thumb_alpha를 사용해 open pose와 closed pose를 보간하고, 각 actuator ctrlrange 안으로 clamp한 값을 넣습니다. 즉 dexjoco demo의 핵심은 arm은 torque-level site tracking, hand는 position actuator 기반 grasp slider로 나누어 안정적인 teleop grasp를 만드는 것입니다.

Baseline

아래 내용은 dexjoco demo를 만들기 전 baseline에서 구현한 구조입니다. 위의 dexjoco_teleop.py는 이 baseline에서 정리한 rendering loop, Viewer panel, MuJoCo id/name 접근 방식, Jacobian 기반 제어 흐름을 가져와 Panda-Allegro grasp demo로 좁혀 실행합니다.

polling, rendering / simulation 시간축 분리

시뮬레이터 loop는 크게 두 축의 시간 흐름이 있습니다.

time.time(): 절대적으로 흐르는 시간으로 시뮬레이션 시간과 독립적이며
- polling 주기
- rendering 주기에 관여합니다.
mjData.time: mujoco simulation에서 관리하는 timestep으로, dynamic simulator에서는 mj_step()이 자동으로 관리해주고, kinematic simulator는 loop마다 수동으로 더해줘야 합니다.
- 궤적 형성 주기
- contact detection 주기
- tau 계산
- mj_step()에 관여합니다.

아래는 baseline motor 시뮬레이터의 시간축 분리 구조를 간소화한 예시입니다.

try:
    while not glfw.window_should_close(env.viewer.window):  # 렌더링 주기
        # 렌더링 프레임당 시뮬레이션 묶음
        for _ in range(sim_steps_per_frame):
            now = time.time()

            # 패널 입력 polling
            polled_target = None
            if now - last_poll_time >= poll_interval:
                last_poll_time = now
                polled_target, polled_camera = env.viewer.poll_target()

            # task-space target 갱신
            if polled_target is not None:
                T_des = initial_pose.copy()
                T_des[:3, 3] = polled_target[:3].copy()
                target_rpy = polled_target[3:]
                target_rot = rpy2rotation_matrix(
                    target_rpy[0], target_rpy[1], target_rpy[2]
                )
                T_des[:3, :3] = initial_pose[:3, :3] @ target_rot
                alpha_target[:] = [polled_target[6], polled_target[7]]

            # mujoco step 단위 제어 루프
            for _ in range(steps_per_sim):
                # reference state 갱신
                mujoco.mj_copyData(ref_data, env.model, env.data)
                ref_data.qpos[:] = q_des
                ref_data.qvel[:] = 0.0
                mujoco.mj_forward(env.model, ref_data)

                # differential IK target 갱신
                q_des, q_dot_des, _ = solve_differential_ik(...)

                # finger alpha 설정
                alpha_cmd += (
                    k_alpha * (alpha_target - alpha_cmd) * env.model.opt.timestep
                )  # differential IK와 유사하게 한 timestep 동안 이동시킬 alpha 소분을 짐작 (추후 수정)
                alpha_cmd = np.clip(alpha_cmd, 0.0, 1.0)

                # finger torque control
                finger_pd_control(env, alpha_cmd)

                # arm computed torque control
                tau_des = computed_torque_control(...)

                # motor actuator command 변환
                for i, actuator_id in enumerate(actuator_ids):
                    gear = env.model.actuator_gear[actuator_id, 0]
                    env.data.ctrl[actuator_id] = clip(tau_des[i] / gear)

                # dynamics integration
                env.step(1)

        # rendering
        env.viewer.render()

finally:
    env.viewer.terminate_viewer()

이 구조에서 rendering, polling, physics step을 시간축을 분리했습니다. 바깥 while은 GLFW window 주기와 rendering을 관리하고, polling은 절대 시간을 기준으로 panel target을 읽으며, 안쪽 steps_per_sim 루프는 mujoco timestep 기준으로 IK, torque 계산, mj_step에 관여합니다.

따라서 panel 입력은 비교적 낮은 주기로 갱신되지만, 같은 target을 기준으로 물리 제어는 더 촘촘하게 실행됩니다. baseline motor 버전에서는 task-space target을 별도 trajectory로 보간하기보다 매 step differential IK로 q target을 갱신하고, 그 아래에서 finger PD와 arm computed torque가 ctrl을 채운 뒤 mujoco가 dynamics를 적분합니다.

Dynamics utilities

sim_with_mujoco/utils/dynamics.py에는 실험적인 제어 블록들이 들어 있습니다. dexjoco_teleop.py는 현재 이 파일의 computed torque helper를 직접 호출하기보다, 엔트리 내부에서 attachment_site 기준 operational-space torque를 계산하는 형태로 단순화했습니다.

def solve_inverse_dynamics(model, data, qacc):
    data.qacc = qacc.copy()
    mujoco.mj_inverse(model, data)
    qfrc_inverse = data.qfrc_inverse

    return qfrc_inverse.copy()

mujoco의 mj_inverse()는 현재 qpos, qvel, 원하는 qacc를 바탕으로 필요한 generalized force를 계산합니다.

Joint-space computed torque controller (for arm)

dynamic motor simulator에서 팔은 panel target을 최대한 잘 따라가게 하기 위해 computed torque controller를 사용했습니다. robosuite의 operational space controller는 end-effector pose error를 task-space wrench로 만들고 이를 \(J^T f\)로 joint torque에 매핑하는 구조라서 end-effector 기준 action space에는 자연스럽지만, 본 프로젝트에서는 differential IK가 이미 panel pose를 joint target으로 변환합니다. 따라서 arm controller는 다시 task-space wrench를 만들기보다, \(q_{des}, \dot{q}_{des}, \ddot{q}_{des}\)를 직접 추종하면서 \(M(q)\), bias force를 보상하는 joint-space computed torque가 tracking 준수에 더 적합하다고 판단했습니다. 즉 robosuite의 high-level IK 계층은 참고하되, low-level arm control은 mj_inverse() 기반 CT로 바꿔 panel target tracking 오차를 줄이도록 유도했습니다. 그리고 손가락은 접촉 안정성을 위해 별도 controller로 분리했습니다. 손가락의 경우 질량이 작아 bias force도 arm에 비해 매우 작았고(norm 약 0.02) MJCF에 명시된 force 범위도 작아 클리핑 로직의 영향을 많이 받았습니다. 이로 인해 tracking delay도 arm에 비해 컸습니다.

def computed_torque_control(
    model, data, q_des, q_dot_des, q_dotdot_des, joint_ids, kp=300, kd=50
):
    inv_data = mujoco.MjData(model)  # inverse dynamic 계산용
    mujoco.mj_copyData(inv_data, model, data)

    dof_ids = dof_ids_from_joints(model, joint_ids)
    qpos_ids = model.jnt_qposadr[joint_ids]

    qacc_des = (
        q_dotdot_des  # task_to_joint_space()로 인해 active actuator에 대한 원소로만 구성
        + kp * (q_des[qpos_ids] - inv_data.qpos[qpos_ids])
        + kd * (q_dot_des - inv_data.qvel[dof_ids])
    )

    inv_data.qacc[:] = 0.0
    inv_data.qacc[dof_ids] = qacc_des
    mujoco.mj_inverse(
        model, inv_data
    )  # mj_inverse는 내부적으로 중력항을 고려해 토크를 반환
    tau = inv_data.qfrc_inverse[
        dof_ids
    ]  # contact constraint는 mj_step이 처리하므로 actuator torque에서 다시 보상하지 않음


...

한편 computed torque는 시간에 대한 5차 다항식 quintic 함수와 잘 어울리는데, 이는 q, qdot, qdotdot 계산을 위해 그리고 미분 가능한 구간을 위해 5차 다항식이 필요하기 때문입니다. (ROS2 joint trajectory planner와의 조합이 기대되는 부분입니다.)

q_dotdot_des = np.zeros(len(joint_ids))
q_dot_des = np.zeros(len(joint_ids))

위와 같이 목표 관절각속도 q_dotdot_des와 목표 관절각가속도 q_dot_des를 zero로 덮어쓰면 컨트롤러가 목표 궤적이 정지해있다고 인식하는 computed torque setpoint control 문제가 됩니다.

PD controller (for finger)

position actuator 버전에서 mujoco는 내부적으로 P 계열 제어기를 사용하는데 dampratio 설정을 통해 damping 성격의 항도 포함할 수 있습니다. 다만 motor actuator 버전에서는 ctrl이 더 이상 목표 관절각이 아니라 직접 torque 명령이므로, 같은 servo 효과를 코드 레벨에서 재구성할 필요가 있었습니다. 따라서 손가락에는 \(k_p(q_{des}-q)-k_d\dot q\) 형태의 PD torque controller를 사용하여 grasping phase에 따라 gain을 조절할 수 있게 했습니다.

finger_pd_control()은 position actuator가 아닌 motor actuator 손가락에 직접 torque를 넣기 위한 단순 PD 제어기입니다. kinematic mode와 마찬가지로 먼저 각 joint range를 alpha에 대해 보간합니다.

dexjoco demo에서는 이 baseline finger PD 대신 Allegro position actuator에 open/closed pose를 직접 target으로 넣는 방식을 사용했습니다. grasp 안정성을 먼저 확인하기 위해 손 제어를 torque-level까지 내리지 않고, arm의 site tracking과 hand의 position actuator grasp를 분리한 형태입니다.

q_des = (1 - alpha[0]) * thumb_q_open[index] + alpha[0] * thumb_q_closed[index]
q = data.qpos[qpos_id]
qdot = data.qvel[dof_id]

tau = kp * (q_des - q) - kd * qdot

tau = np.clip(tau, -tau_max, tau_max)
if model.actuator_ctrllimited[actuator_id]:
    lo, hi = model.actuator_ctrlrange[actuator_id]
    tau = np.clip(tau, lo, hi)

data.ctrl[actuator_id] = tau

이 함수의 핵심은 손가락을 IK로 풀지 않고 grasp slider alpha를 관절 목표각으로 바꾼 뒤, 현재 관절각/속도 오차로 motor torque를 계산하는 것입니다.

Jacobian

Jacobian은 IK에서만 쓰인 것이 아니라, 현재 body/site가 선택된 관절 속도에 의해 어떻게 움직이는지 읽고 task-space 명령을 joint-space 명령으로 바꾸는 공통 연결부로 사용했습니다. 또한 mujoco가 전체 model.nv 자유도 기준의 Jacobian을 채워주므로, 항상 필요한 dof column만 잘라서 사용했습니다.

point1: mj_jacBody, mj_jacSite는 반환값이 아니라 사전할당된 array를 채웁니다.

def get_body_jacobian(model, data, body_id):
    jacp = np.zeros((3, model.nv))
    jacr = np.zeros((3, model.nv))
    mujoco.mj_jacBody(model, data, jacp, jacr, body_id)
    J = np.vstack([jacr, jacp])
    return J

mujoco의 mj_jacBody, mj_jacSite는 matrix를 return하지 않고 jacp, jacr 배열을 채웁니다. jacp는 linear velocity, jacr은 angular velocity에 대한 Jacobian이고, 프로젝트에서는 twist 순서를 맞추기 위해 [jacr, jacp] 형태의 6 x nv 행렬로 합쳤습니다.

point2: full Jacobian에서 필요한 dof column만 부분적으로 사용합니다.

def get_body_twist(model, data, ee_id, joint_ids):
    dof_ids = np.array([model.jnt_dofadr[jid] for jid in joint_ids], dtype=int)

    jacp = np.zeros((3, model.nv))
    jacr = np.zeros((3, model.nv))
    mujoco.mj_jacBody(model, data, jacp, jacr, ee_id)

    J = np.vstack([jacr, jacp])[:, dof_ids]
    qvel = data.qvel[dof_ids]
    twist_current = J @ qvel
    return twist_current

DexJoCo Panda-Allegro 모델도 arm, hand, object 등 여러 자유도를 함께 갖지만, 특정 controller가 실제로 움직일 관절은 그중 일부입니다. 그래서 mujoco가 준 전체 Jacobian을 그대로 쓰지 않고 joint id를 dof id로 바꾼 뒤 필요한 column만(active joints) 뽑아 현재 end-effector twist를 계산했습니다.

point3: task-space PD에서는 J.T로 task command를 joint torque로 변환합니다.

def pd_task_space(env, x_des, twist_des, twist_dot_des, joint_ids, kp=16, kd=4):
    dof_ids = np.array([env.model.jnt_dofadr[jid] for jid in joint_ids], dtype=int)

    twist_current = get_body_twist(env.model, env.data, env.ee_body_id, joint_ids)
    twist_error = env.get_twist_error(x_des)
    cmd = twist_dot_des + kd * (twist_des - twist_current) + kp * twist_error

    J = get_body_jacobian(env.model, env.data, env.ee_body_id)[:, dof_ids]
    tau = J.T @ cmd

    return tau

pose error, twist error, twist_dot_des가 섞인 task-space command를 J.T로 joint space에 투영하면 end-effector 방향의 오차를 줄이는 토크를 각 active joint에 분배할 수 있습니다. 다만 단순 자코비안 변환은 mass matrix가 포함된 구조는 아니므로 각 관절의 무게, 관성을 고려하지 못하는 문제가 있습니다.

Multi-target IK

solve_ik()는 오른손, 왼손 두 타겟을 입력으로 받아 stacked jacobian, error를 계산하고 관절 제약이 있는 IK 문제를 풉니다.

오른손 target은 pose IK이므로 6차원 twist error를 씁니다. 여기서 pose error를 행렬 원소 차이로 구하면 SE(3)의 group 구조를 무시하게 되어 현재 pose에서 목표 pose까지의 상대 rigid transform을 만들고, 그 상대 변환에 log를 씌워 tangent vector로 바꿉니다.

_, err_i = calculate_twist_error(T_sb, T_sd)
J = np.vstack([jacr, jacp])
J_b = Adjoint(np.linalg.inv(T_sb)) @ J
J_i = J_b[:, dof_ids]

반면 왼손은 position IK로 3차원 position error를 사용합니다.

err_i = T_sd[:3, 3] - T_sb[:3, 3]
J_i = jacp[:, dof_ids]

왼손 target은 position error만 씁니다.

err_i = T_sd[:3, 3] - T_sb[:3, 3]
J_i = jacp[:, dof_ids]

그다음 한데 모인 각 target의 Jacobian과 error는 multi-target IK를 하나의 선형 문제로 풀게 해줍니다.

return np.vstack(J_list), np.hstack(err_list)

현재 stacked error는 총 9차원으로, solver는 15개의 active joint을 조정해 이 error를 줄입니다.

왼손, 오른손 각기 다른 IK를 적용하는 구조가 좋은 이유는, J @ dq = error에 대해 관절 차원(15=왼팔7+오른팔7+리프트1)이 error 차원(9)보다 크고, 여러 해를 확보할 수 있는 상태가 되기 때문입니다. 이는 두 손 모두 pose IK를 적용한 경우보다 큰 여유 자유도를 갖습니다.

다음으로 특이점 근처에서는 damped pseudoinverse를 사용합니다.

def damped_pseudoinverse(J, damping=1e-2):
    rows, cols = J.shape

    if rows <= cols:
        return J.T @ np.linalg.inv(J @ J.T + damping**2 * np.eye(rows))
    
    return np.linalg.inv(J.T @ J + damping**2 * np.eye(cols)) @ J.T

Moore-Penrose pseudoinverse만 쓰면 singular value가 작아질 때 관절 update가 커질 수 있습니다. damping을 넣으면 역행렬 분모에 작은 양수가 더해져 update 폭주를 줄일 수 있습니다.

if model.jnt_limited[joint_id]:
    q_lower, q_upper = model.jnt_range[joint_id]
    ik_data.qpos[qadr] = np.clip(ik_data.qpos[qadr], q_lower, q_upper)

각 iteration에서는 xml joint range도 반영합니다.

if check_collision and is_collision(model, ik_data):
    mujoco.mj_copyData(ik_data, model, prev_data)
else:
    mujoco.mj_copyData(prev_data, model, ik_data)

그리고 위 충돌 감지 분기점은 kinematic mode에서만 실행되는 블록으로, mj_forward() 대신 충돌을 감지하고 이전 configuration으로 rollback합니다.

Trajectory Generation

시작 속도/가속도가 비영인 pose interpolation

def interpolate_pose_ros(T_start, T_end, twist_start, twistdot_start, T, t):
    p_start = T_start[:3, 3]
    p_end = T_end[:3, 3]
    R_start = T_start[:3, :3]
    R_end = T_end[:3, :3]

    w_start = twist_start[:3]
    v_start = twist_start[3:]
    w_dot_start = twistdot_start[:3]
    v_dot_start = twistdot_start[3:]

    p_t, p_dot_t, p_dotdot_t = quintic_interpolate_position(
        p_start, p_end, v_start, v_dot_start, T, t
    )

    rotvec_end = Rotation.from_matrix(R_start.T @ R_end).as_rotvec()
    rotvec_dot_start = R_start.T @ w_start
    rotvec_dotdot_start = R_start.T @ w_dot_start
    rotvec_t, rotvec_dot_t, rotvec_dotdot_t = quintic_interpolate_position(
        np.zeros(3), rotvec_end, rotvec_dot_start, rotvec_dotdot_start, T, t
    )

    R_t = R_start @ Rotation.from_rotvec(rotvec_t).as_matrix()
    w_t = R_t @ rotvec_dot_t
    w_dot_t = R_t @ rotvec_dotdot_t

    return create_transform_matrix(R_t, p_t), np.r_[w_t, p_dot_t], np.r_[w_dot_t, p_dotdot_t]

interpolate_pose_ros는 시작 pose와 목표 pose만 잇는 것이 아니라, 시작 시점의 twist와 twistdot도 경계조건으로 사용합니다. 따라서 target이 갱신되더라도 이전 궤적의 속도와 가속도를 이어받아 position, rotation의 잦은 감/가속을 방지할 수 있습니다.

position은 시작 선속도와 선가속도를 포함한 quintic 보간으로 만들고, rotation은 시작 자세 기준 상대 회전벡터를 5차 보간한 뒤 다시 SO(3) 회전행렬로 복원합니다. rotation matrix를 원소별로 보간하지 않고 회전벡터를 거치는 이유는 중간 자세가 계속 유효한 회전이 되게 하기 위해서입니다. 또한 회전벡터의 1차, 2차 미분을 현재 자세로 변환해 원하는 각속도와 각가속도도 함께 얻습니다.

joint-space quintic

joint-space 궤적 생성기는 target pose가 바뀌었을 때 먼저 IK로 trajectory_goal_q를 구하고, q 사이를 보간합니다. 일반적인 타임 스케일링 함수와 다른 점은 인접한 두 궤적이 qvel을 공유하게 해 초기 qvel, qacc가 0이 아닐 수 있습니다. 궤적들을 부드럽게 잇는 이 5차 방정식은 ROS의 궤적형성 메커니즘에서 영감을 받았습니다.

# 초기 q, qdot, qdotdot은 비영이고 q end는 0을 만족시키는 quintic
def quintic_time_scaling_ros(q_start, q_end, q_dot, q_dotdot, T, t):
    t = np.clip(t, 0.0, T)

    D = q_end - (q_start + q_dot * T + 0.5 * q_dotdot * T**2)

    a0 = q_start
    a1 = q_dot
    a2 = 0.5 * q_dotdot


...

관절 공간에서 관절 궤적을 단순 보간하면 중간에 불가능한 해와 마주치는 경우가 있습니다. Panda-Allegro 모델에서도 task-space target을 바로 joint-space로 보간하면 attachment_site의 target path를 보장하지 못하고 손과 물체의 접촉 상태도 쉽게 흔들릴 수 있습니다. ROS는 궤적 상에 중간 지점, 즉 waypoint를 사전에 정해놓기에(일종의 planning) 나쁜 해에 덜 노출됩니다. 위 quintic_time_scaling_ros는 인접 궤적 간 물리량을 공유하지만 waypoint를 미리 지정하지 않습니다. 참고로 관절 공간 궤적 생성기는 실시간 성능에 저해 요소가 되어 teleoperation simulator에는 사용하지 못했습니다. 관절 공간 보간이 end-effector의 target path를 보장하지 못한다는 치명적인 문제점도 한 몫하였습니다.

Planning (추가)

기존에는 생성된 궤적을 바로 제어에 활용하여 특이점을 고려하지 못했고 mjData에 무관하게 궤적 기간이 동일한 등등 저수준 궤적 형성에 머물렀습니다. 특히 joint-space에 바로 보간을 적용하면 불가능한 해가 나올 수도 있습니다. 새로 추가한 planning 모듈은 trajectory 집합을 관리합니다. 저는 실시간으로 새로운 궤적을 현재 궤적과 블렌딩하는 방법보다 하나의 긴 경로를 여러 궤적으로 쪼갤 때 블렌딩이 더 용이해질 것을 기대했습니다. 결과적으로 특이점에 도달하는 경우는 줄었지만 반응은 더 지연되었고, 짧은 경로에서 크게 효과적이지 않았던 것 같습니다.

참고했던 ROS2 joint trajectory controller는 task-space 목표를 여러 구간으로 나누고 그 사이를 quintic 보간합니다. 이때 0이 아닌 양끝 속도값으로 5차 다항식을 풀어 구간 경계(waypoint)에서의 빈번한 감가속을 완화합니다. 이제 controller에서는 궤적을 형성할 때 planning api만 호출하고 planning 모듈이 trajectory 모듈을 호출하게 됩니다.

현재 각 waypoint는 특이점 평가, 관절각 및 토크 범위를 고려하며 충돌은 고려하지 않고 있습니다.

특이점 평가는 body Jacobian의 singular value를 이용합니다. 계획된 waypoint의 q를 임시 MjData에 넣고 mj_forward()를 호출한 뒤, end-effector body 기준 Jacobian을 계산합니다.

def get_singular_values(model, data, body_id, joint_ids):
    dof_ids = np.array([model.jnt_dofadr[jid] for jid in joint_ids], dtype=int)

    jacp = np.zeros((3, model.nv))
    jacr = np.zeros((3, model.nv))
    mujoco.mj_jacBody(model, data, jacp, jacr, body_id)
    J = np.vstack([jacr, jacp])[:, dof_ids]

    singular_values = np.linalg.svd(J, compute_uv=False)

    return singular_values[-1], singular_values

여기서 J는 전체 자유도 자코비안이 아니라 IK에 사용한 active joints의 dof 열만 잘라낸 6 x n 자코비안입니다. singular_values[-1]은 가장 작은 singular value이며, 이 값이 작아질수록 특정 task-space 방향으로 움직이기 위해 매우 큰 관절 변화가 필요해질 수 있습니다. planning.py에서는 이 값을 singularity_threshold와 비교해 위험한 waypoint를 무시하고 재시도합니다.

Environment class

Environment 클래스는 mjData, mjModel, control input, simulation step, rendering을 한데 묶어 루프의 부담을 줄입니다. dm_control도 Physics가 mujoco state와 step/render를 맡고, control.Environment가 task의 before_step/after_step과 observation/reward 생성을 분리해 physics와 task logic이 서로 얽히지 않게 합니다. (Myjoco는 보다 포괄적인 클래스명 Environment를 사용했습니다) 따라서 이 프로젝트에서는 mjData, mjModel는 Environment가 소유하고, Viewer는 같은 data를 읽어 그리기만 하며, controller가 만든 qpos/ctrl/torque는 env.step() 직전에 state를 반영하도록 설계했습니다.

Environment는 mujoco라는 물리법칙, 즉 규칙들의 집합으로 모델과 유사한 역할을 합니다. 또한 Viewer 인스턴스를 감싸며 mjData 공유하는데, 컨트롤러가 qpos(ctrl)을 바꾸고 mj_step()을 호출하면 viewer는 공유된 data를 렌더링합니다.

class Environment:
    def __init__(self, xml_path, end_effector):
        self.model, self.data = parser(xml_path)
        self.ee_body_id = mujoco.mj_name2id(
            self.model,
            mujoco.mjtObj.mjOBJ_BODY,
            end_effector,
        )
        self.ee_body_name = end_effector
        self.viewer = Viewer(self.model, self.data)


...

Environment 생성자는 빈출 geom 필드를 저장합니다. 현재는 named access helper가 mjcf.py 파일로 분기되어 있지만 geom 정보를 필드로 담는 Environment 메서드로 재구성할 계획입니다.

Viewer class

Viewer 클래스는 mujoco rendering api를 사용하여 target 및 camera panel(시야 조절 패널)을 생성합니다. 특히 poll_target()은 event handler가 아닌 polling 방식으로 호출부에서 패널 입력을 조회하도록 만듭니다.

다음 render 함수는 엔트리 메인 루프에서 호출되며 FPS와 연산량 사이의 적절한 트레이드오프를 찾아야 합니다.

self.width, self.height = glfw.get_framebuffer_size(self.window)
self.viewport = mujoco.MjrRect(0, 0, self.width, self.height)

mujoco.mjv_updateScene(
    self.model,
    self.data,
    self.opt,
    None,
    self.cam,
    mujoco.mjtCatBit.mjCAT_ALL,
    self.scene,
)

mujoco.mjr_render(self.viewport, self.scene, self.context)
self.hand_pose_panel.render(self.window, self.context)
self.gui_panel.render(self.window, self.context)
glfw.swap_buffers(self.window)

Id, Name mapping helper

id 체계는 kinematic tree를 쓰면서 가장 헷갈리는 부분으로, joint, dof, body 등의 id를 서로 매핑하는 helper를 만들었습니다.

def actuator_ids_from_joints(model, joint_ids):
    joint_id_set = set(joint_ids)
    actuator_ids = []

    for actuator_id in range(model.nu):
        if model.actuator_trntype[actuator_id] != mujoco.mjtTrn.mjTRN_JOINT:
            continue

        joint_id = model.actuator_trnid[actuator_id, 0]
        if joint_id in joint_id_set:
            actuator_ids.append(actuator_id)

    return actuator_ids

자주 등장하는 joint, dof, actuator id는 서로 다릅니다.

joint id: model.jnt_* 배열을 indexing하는 id입니다.
dof id: data.qvel, data.qacc, data.qfrc_applied, data.qfrc_bias를 indexing하는 id입니다.
actuator id: data.ctrl, model.actuator_*를 indexing하는 id입니다.

hinge/slide joint는 보통 joint 하나가 dof 하나를 갖지만, 그래도 jnt_qposadr, jnt_dofadr, actuator_trnid를 통해 변환해야 합니다. 특히 qpos index와 qvel index는 freejoint나 ball joint가 생기면 바로 달라질 수 있습니다. 이번 DexJoCo XML에서는 Panda와 Allegro의 hinge joint가 중심이지만, object와 scene 요소까지 함께 로드되므로 다른 관절 타입으로도 확장될 수 있음을 가정해야 합니다.

Limitation (Future Implementation Plan)

IK의 damping과 dq 제한이 singularity, collision 조건까지 만족시키지 않습니다. 별도 planning 모듈을 추가하거나 Differential IK 모듈 제약을 보완해야 합니다.
향후에 task-space PD외에 mass matrix, null-space posture control을 포함한 operational space controller를 구현해야 합니다.
finger interpolation은 grasp synergy를 모델링하지 않고 open/close alpha를 관절 목표로 직접 매핑해서 손의 실제 닫힘을 충분히 표현하지 못합니다.
현재 trajectory duration은 고정값인데, 인접 velocity 등을 바탕으로 동적으로 바꿔볼 수 있습니다.

References

mujoco documentation
Modern Robotics, Kevin M. Lynch and Frank C. Park: Ch. 3 Rigid-Body Motions, Ch. 5 Velocity Kinematics and Statics, Ch. 6 Inverse Kinematics, Ch. 8 Dynamics of Open Chains, Ch. 9 Trajectory Generation, Ch. 11 Robot Control, Ch. 12 Grasping and Manipulation
Drake Differential IK: https://drake.mit.edu/doxygen_cxx/group__planning__kinematics.html
robosuite Controllers: https://robosuite.ai/docs/modules/controllers.html
dm_control: https://github.com/google-deepmind/dm_control
DexJoCo: https://github.com/brave-eai/dexjoco

Tech Stack

MuJoCo
GLFW
SciPy
matplotlib
Python
NumPy