MyJoCo: Custom 3D Robot Simulator from Scratch (feat. MuJoCo)

June 3rd , 2026

Simulator preview

GitHub

Demo

26/06/22

00:03 ~ 01:14 Kinematic simulator Demo

01:17 ~ 02:32 Dynamic simulator (position) Demo

02:35 ~ 03:29 Dynamic simulator (motor) Demo

26/06/12

00:03 ~ 01:03 Kinematic simulator Demo

01:06 ~ 02:30 Dynamic simulator (position) Demo

02:33 ~ 03:31 Dynamic simulator (motor) Demo

Agenda

Current progress (26/06/22 기준)
Simulator Overview
Core Implementation
Limitation (Future Implementation Plan)
References

Current progress (26/06/22 기준)

엔트리 파일	상태 갱신 방식	궤적 형성 방식	ctrl 입력	물리 계산	용도
dynamic_simulator_position.py	IK 목표 관절각을 position actuator ctrl로 전달	panel target을 task-space pose로 보간한 뒤 IK 수행	ctrl = target qpos, mujoco position servo가 torque 계산	mj_step 사용, qfrc_bias를 qfrc_applied에 더해 중력 보상 실험	baseline dynamic simulator
dynamic_simulator_motor.py	differential IK로 관절 목표를 갱신	panel target을 바로 pose target으로 쓰고 매 부분 IK 수행	arm은 computed torque, finger는 motor PD torque를 data.ctrl에 입력	mj_step 사용, 접촉 여부에 따라 finger gain 조절	motor actuator 기반 torque-level teleoperation / grasping 실험
kinematic_simulator.py	IK 결과를 data.qpos에 직접 대입	panel target을 task-space pose로 보간한 뒤 IK 수행	torque 계산 없이 ctrl만 qpos와 동기화	mj_forward 사용, data.time 수동 증가	IK와 trajectory 동작 확인용 kinematic simulator

영역	구현 내용
Environment, Viewer class	model, data, viewer, mujoco API wrapper를 묶어서 관리하는 Environment class / GLFW와 mujoco rendering API를 묶은 Viewer class, camera 조종 패널 추가, event handler에서 polling 중심 구조로 변경 (reference: dm_control)
Kinematic simulator	IK 결과를 data.qpos에 직접 반영하고 mj_forward로 상태를 갱신
Dynamic simulator	IK 결과를 actuator ctrl에 넣고 mj_step으로 mujoco dynamics 진행
시뮬레이션 공통	rendering, polling, trajectory generation, simulation 시간축 분리 및 적절한 주기(ex. trajectory_duration, poll_interval) 탐색
Multi target IK	multi target의 jacobian과 error를 쌓는 get_stacked_ik 함수, damped least squares로 IK 계산, 클리핑 로직 최적화
Differential IK	actual state 기준으로 multi target jacobian을 구성하고, least-squares로 qvel target과 다음 q target 계산
Trajectory	pose interpolation(시작점 속도/가속도가 비영으로 부드러운 궤적 전환 도모), joint-space interpolation
Dynamics utility	computed torque, PD controller 모듈을 구현하고, mujoco timestep마다 각각 팔과 손가락 제어를 담당 (reference: robosuite)
Collision utility	mujoco data.contact 기반 robot-table, finger-object 접촉 판정
mujoco utility	편의를 위한 mujoco API wrapper (ex. joint id, dof id, actuator id 매핑)
Assets	motor actuator로 구성된 ffw MJCF 파일 추가, 손가락 마디 사이에 self-collision을 exclude 태그로 임시 방지

Simulator Overview

MyJoCo(feat. MuJoCo)는 MuJoCo-free 구시뮬레이터를 MuJoCo-based 시뮬레이터로 재탄생한 프로젝트입니다. MJCF을 제외한 MyJoCo 코드는 vibe coding 없이 from scratch로 제작되었습니다.

레퍼런스로 삼은 레포지토리는 dm_control: Google DeepMind Infrastructure for Physics-Based Simulation 입니다. dm_contol은 RL 태스크 정의 단위인 Task 모듈을 학습시키기 위해 한 층 더 추상화된 Physics 클래스를 제공합니다. id 매핑 helper도 만들어야할 만큼 mujoco api가 충분하지 않았다고 느낀 이후 dm_control를 참고하게 되었습니다. 이밖에 headless RL 환경이 아닌 interactive simulator인 점, mujoco, viewer, composer, suite로 이어지는 계층구조와 확장성 역시 참고 계기가 되었습니다. dm_control api 명세서를 통해

event handler가 아닌 polling 중심의 viewer와의 상호작용
control timestep과 physical timestep을 분리 (Myjoco에서 각각 time.time, opt.timestep에 대응)
MjModel, MjData를 묶고 필드를 관리하는 구조 등등

시뮬레이션에 대한 이해를 높일 수 있었습니다. 기타 레퍼런스는 Core Implementation에서 더 자세히 다룹니다.

Simulator Structure

sim_with_mujoco/
  dynamic_simulator_position.py position actuator 기반 dynamic entry
  dynamic_simulator_motor.py    motor actuator 기반 torque-control entry
  kinematic_simulator.py        qpos 직접 갱신 기반 kinematic entry
  environment/
    env.py                      model, data, viewer wrapper
  mjcf/
    parser.py                   MJCF parser
  utils/
    ik.py                       DLS multi-target IK
    ik_qp.py                    differential IK
    dynamics.py                 CT, PD, task-space control
    kinematics.py               finger / kinematics helper
    collision.py                contact helper
    math3d.py                   transform / twist helper
    mj.py                       mujoco id mapping helper
    planning.py                 waypoint / trajectory helper
  viewer/
    viewer.py                   renderer wrapper
    glfw_panel.py               hand target panel
    gui_panel.py                camera control panel
sim/model/motion/
  trajectory.py                 task / joint trajectory math

Core Implementation

Dynamic mode (motor)
- Pipeline
- Dynamics utilities
- Differential IK QP
Dynamic mode (position)
Kinematic mode
Jacobian
Multi-target IK
Trajectory Generation
Environment / Viewer class

polling, rendering / simulation 시간축 분리

시뮬레이터 loop는 크게 두 축의 시간 흐름이 있습니다.

time.time(): 절대적으로 흐르는 시간으로 시뮬레이션 시간과 독립적이며
- polling 주기
- rendering 주기에 관여합니다.
mjData.time: mujoco simulation에서 관리하는 timestep으로, dynamic simulator에서는 mj_step()이 자동으로 관리해주고, kinematic simulator는 loop마다 수동으로 더해줘야 합니다.
- 궤적 형성 주기
- contact detection 주기
- tau 계산
- mj_step()에 관여합니다.

아래는 간소화된 dynamic mode(motor) 시뮬레이터 구조를 나타냅니다.

    try:
        while not glfw.window_should_close(env.viewer.window): # 렌더링 주기
            # 렌더링 프레임당 시뮬레이션 묶음
            for _ in range(sim_steps_per_frame):
                now = time.time()

                # 패널 입력 polling
                polled_target = None
                if now - last_poll_time >= poll_interval:
                    last_poll_time = now
                    polled_target, polled_camera = env.viewer.poll_target()

                # task-space target 갱신
                if polled_target is not None:
                    T_des = initial_pose.copy()
                    T_des[:3, 3] = polled_target[:3].copy()
                    target_rpy = polled_target[3:]
                    target_rot = rpy2rotation_matrix(target_rpy[0], target_rpy[1], target_rpy[2])
                    T_des[:3, :3] = initial_pose[:3, :3] @ target_rot
                    alpha_target[:] = [polled_target[6], polled_target[7]]

                # mujoco step 단위 제어 루프
                for _ in range(steps_per_sim):
                    # reference state 갱신
                    mujoco.mj_copyData(ref_data, env.model, env.data)
                    ref_data.qpos[:] = q_des
                    ref_data.qvel[:] = 0.0
                    mujoco.mj_forward(env.model, ref_data)

                    # differential IK target 갱신
                    q_des, q_dot_des, _ = solve_differential_ik(...)

                    # finger alpha 설정
                    alpha_cmd += k_alpha * (alpha_target - alpha_cmd) * env.model.opt.timestep # differential IK와 유사하게 한 timestep 동안 이동시킬 alpha 소분을 짐작 (추후 수정)
                    alpha_cmd = np.clip(alpha_cmd, 0.0, 1.0)

                    # finger torque control
                    finger_pd_control(env, alpha_cmd)

                    # arm computed torque control
                    tau_des = computed_torque_control(...)

                    # motor actuator command 변환
                    for i, actuator_id in enumerate(actuator_ids):
                        gear = env.model.actuator_gear[actuator_id, 0]
                        env.data.ctrl[actuator_id] = clip(tau_des[i] / gear)

                    # dynamics integration
                    env.step(1)

            # rendering
            env.viewer.render()

    finally:
        env.viewer.terminate_viewer()

이 구조에서 rendering, polling, physics step을 시간축을 분리했습니다. 바깥 while은 GLFW window 주기와 rendering을 관리하고, polling은 절대 시간을 기준으로 panel target을 읽으며, 안쪽 steps_per_sim 루프는 mujoco timestep 기준으로 IK, torque 계산, mj_step에 관여합니다.

따라서 panel 입력은 비교적 낮은 주기로 갱신되지만, 같은 target을 기준으로 물리 제어는 더 촘촘하게 실행됩니다. motor 버전에서는 task-space target을 별도 trajectory로 보간하기보다 매 step differential IK로 q target을 갱신하고, 그 아래에서 finger PD와 arm computed torque가 ctrl을 채운 뒤 mujoco가 dynamics를 적분합니다.

Dynamic mode (motor)

Pipeline

motor 버전의 전체 구조는 robosuite의 IK_POSE 계열처럼 사용자 입력 pose를 곧바로 힘으로 쓰지 않고, high-level end-effector target에서 joint target을 만든 뒤 torque-level controller로 내려보내는 계층 구조를 참고했습니다. 패널 입력은 목표 pose를 만들고, differential IK는 매 step 현재 state 기준으로 이를 qvel target과 다음 q target으로 변환합니다.

그 아래에서는 팔과 손을 하나의 제어식으로 억지로 묶지 않고 부위별 controller를 분리했습니다. 팔은 computed torque controller로 관성항과 bias force를 고려하고, 손가락은 안정성을 목적의 낮은 stiffness의 PD controller로 torque를 제한합니다. 즉 IK는 물리 제약을 QP로 풀며 low-level controller 입장에서 플래닝 부담을 줄이게 합니다. 현재 IK가 grasping보다 arm 제약에 초점을 두었기 때문에 완벽한 플래너 역할을 한다고 보기는 어렵지만, 추후 finger/arm을 동시에 제어할 수 있는 방법론도 조사할 예정입니다.

Reference: robosuite Controllers, robosuite Controller API

constrained Differential IK

for target in targets:
    body_id, target_T, is_pose, weight = target
    mujoco.mj_jacBody(model, data, jacp, jacr, body_id)

    if is_pose:
        T_current = get_body_T(data, body_id)
        _, err = calculate_twist_error(T_current, target_T)
        J = Adjoint(np.linalg.inv(T_current)) @ np.vstack([jacr, jacp])
    else:
        err = target_T[:3, 3] - get_body_T(data, body_id)[:3, 3]
        J = jacp

    J_list.append(np.sqrt(weight) * J[:, dof_ids])
    vel_list.append(np.sqrt(weight) * gain * err)

result = lsq_linear(A, b, bounds=(lower, upper))
q_next[qpos_ids] = q_current + result.x * dt

Differential IK는 목표 pose를 한 번에 최종 관절각으로 풀기보다, 현재 상태에서 target error를 줄이는 joint velocity를 계산합니다.
여기서는 Drake Differential IK처럼 현재 pose와 target pose의 차이를 시간 간격 기준의 속도 문제로 바꾸는 정석적 형태를 참고했고, 구현은 scipy의 least-squares로 관절 속도 제한과 관절각 제한을 같이 걸었습니다.

Reference: Drake Differential IK

Dynamics utilities

sim_with_mujoco/utils/dynamics.py에는 실험적인 제어 블록들이 들어 있습니다.

def solve_inverse_dynamics(model, data, qacc):
    data.qacc = qacc.copy()
    mujoco.mj_inverse(model, data)
    qfrc_inverse = data.qfrc_inverse

    return qfrc_inverse.copy()

mujoco의 mj_inverse()는 현재 qpos, qvel, 원하는 qacc를 바탕으로 필요한 generalized force를 계산합니다.

Joint-space computed torque controller (for arm)

dynamic motor simulator에서 팔은 panel target을 최대한 잘 따라가게 하기 위해 computed torque controller를 사용했습니다. robosuite의 operational space controller는 end-effector pose error를 task-space wrench로 만들고 이를 \(J^T f\)로 joint torque에 매핑하는 구조라서 end-effector 기준 action space에는 자연스럽지만, 본 프로젝트에서는 differential IK가 이미 panel pose를 joint target으로 변환합니다. 따라서 arm controller는 다시 task-space wrench를 만들기보다, \(q_{des}, \dot{q}_{des}, \ddot{q}_{des}\)를 직접 추종하면서 \(M(q)\), bias force를 보상하는 joint-space computed torque가 tracking 준수에 더 적합하다고 판단했습니다. 즉 robosuite의 high-level IK 계층은 참고하되, low-level arm control은 mj_inverse() 기반 CT로 바꿔 panel target tracking 오차를 줄이도록 유도했습니다. 그리고 손가락은 접촉 안정성을 위해 별도 controller로 분리했습니다. 손가락의 경우 질량이 작아 bias force도 arm에 비해 매우 작았고(norm 약 0.02) MJCF에 명시된 force 범위도 작아 클리핑 로직의 영향을 많이 받았습니다. 이로 인해 tracking delay도 arm에 비해 컸습니다.

def computed_torque_control(model, data, q_des, q_dot_des, q_dotdot_des, joint_ids, kp=300, kd=50):
    inv_data = mujoco.MjData(model)  # inverse dynamic 계산용
    mujoco.mj_copyData(inv_data, model, data)

    dof_ids = dof_ids_from_joints(model, joint_ids)
    qpos_ids = model.jnt_qposadr[joint_ids]

    qacc_des = (
        q_dotdot_des  # task_to_joint_space()로 인해 active actuator에 대한 원소로만 구성
        + kp * (q_des[qpos_ids] - inv_data.qpos[qpos_ids])
        + kd * (q_dot_des - inv_data.qvel[dof_ids])
    )

    inv_data.qacc[:] = 0.0
    inv_data.qacc[dof_ids] = qacc_des
    mujoco.mj_inverse(model, inv_data)  # mj_inverse는 내부적으로 중력항을 고려해 토크를 반환
    tau = inv_data.qfrc_inverse[
        dof_ids
    ]  # contact constraint는 mj_step이 처리하므로 actuator torque에서 다시 보상하지 않음


...

한편 computed torque는 시간에 대한 5차 다항식 quintic 함수와 잘 어울리는데, 이는 q, qdot, qdotdot 계산을 위해 그리고 미분 가능한 구간을 위해 5차 다항식이 필요하기 때문입니다. (ROS2 joint trajectory planner와의 조합이 기대되는 부분입니다.)

q_dotdot_des = np.zeros(len(joint_ids))
q_dot_des = np.zeros(len(joint_ids))

위와 같이 목표 관절각속도 q_dotdot_des와 목표 관절각가속도 q_dot_des를 zero로 덮어쓰면 컨트롤러가 목표 궤적이 정지해있다고 인식하는 computed torque setpoint control 문제가 됩니다.

PD controller (for finger)

position actuator 버전에서 mujoco는 내부적으로 P 계열 제어기를 사용하는데 dampratio 설정을 통해 damping 성격의 항도 포함할 수 있습니다. 다만 motor actuator 버전에서는 ctrl이 더 이상 목표 관절각이 아니라 직접 torque 명령이므로, 같은 servo 효과를 코드 레벨에서 재구성할 필요가 있었습니다. 따라서 손가락에는 \(k_p(q_{des}-q)-k_d\dot q\) 형태의 PD torque controller를 사용하여 grasping phase에 따라 gain을 조절할 수 있게 했습니다.

finger_pd_control()은 position actuator가 아닌 motor actuator 손가락에 직접 torque를 넣기 위한 단순 PD 제어기입니다. kinematic mode와 마찬가지로 먼저 각 joint range를 alpha에 대해 보간합니다.

q_des = (1 - alpha[0]) * thumb_q_open[index] + alpha[0] * thumb_q_closed[index]
q = data.qpos[qpos_id]
qdot = data.qvel[dof_id]

tau = kp * (q_des - q) - kd * qdot

tau = np.clip(tau, -tau_max, tau_max)
if model.actuator_ctrllimited[actuator_id]:
    lo, hi = model.actuator_ctrlrange[actuator_id]
    tau = np.clip(tau, lo, hi)

data.ctrl[actuator_id] = tau

이 함수의 핵심은 손가락을 IK로 풀지 않고 grasp slider alpha를 관절 목표각으로 바꾼 뒤, 현재 관절각/속도 오차로 motor torque를 계산하는 것입니다.

Dynamic mode (position)

dynamic mode(position)는 data.qpos를 직접 덮어쓰기보다 actuator ctrl에 넣고 env.step()을 호출합니다. 제어 흐름은 다음과 같습니다.

q_des, joint_ids = solve_ik(...)
actuator_ids = actuator_ids_from_joints(env.model, joint_ids)

for actuator_id in actuator_ids:
    jid = env.model.actuator_trnid[actuator_id, 0]
    qadr = env.model.jnt_qposadr[jid]
    env.data.ctrl[actuator_id] = q_des[qadr]

ffw_sh5.xml를 구성하는 position actuator는 ctrl을 목표 관절각으로 받습니다. mujoco api는 목표 관절각을 받아 P 제어기를 통해 내부적으로 힘/토크를 계산합니다.

\[\tau = k_p (u - q) - k_v \dot{q}\]

다만 입력이 없는 상황에서도 로봇이 처지는 현상이 발생했고 PD servo로 계산된 토크가 중력 같은 지속적인 외력을 고려하지 못한다는 사실을 깨달았습니다. 이에

mujoco.mj_forward(env.model, env.data)
env.data.qfrc_applied[all_dof_ids] = 0.0
env.data.qfrc_applied[all_dof_ids] = env.data.qfrc_bias[all_dof_ids]
env.step(steps_per_sim)

처럼 자유도가 미치는 곳에 qfrc_bias를 외력으로 보상해주었습니다.

\[M(q)\ddot{q} + C(q, \dot{q})\dot{q} + g(q) = \tau\] \[M(q)\dot{v} + \mathrm{qfrc\_bias}(q, v) = \tau\]

qfrc_bias는 mujoco 운동방정식에서 bias generalized force에 해당하는 값입니다. 속도에 따른 코리올리, 원심항 그리고 중력항이 포함되는 쪽으로 이해할 수 있고, 정지 상태에서는 사실상 중력 보상으로 관찰됩니다. 그래서 qfrc_applied에 qfrc_bias를 넣으면 팔이 중력 때문에 무너지는 현상을 줄일 수 있습니다.

Kinematic mode

env.data.qvel[:] = 0.0
mujoco.mj_forward(env.model, env.data)
env.data.time += env.model.opt.timestep * steps_per_sim  # 시뮬레이션 시간 증가 -> 렌더링 API에서 이를 반영

dynamic mode와 달리 kinematic mode는 IK 결과를 ctrl에 대입하고 mj_forward()을 호출합니다. mj_forward()는 충돌 감지 능력이 없고 opt.timestep을 증가시키지 않아 간소화된 충돌 감지 로직을 추가하고 수동으로 simulation timestep을 더해줘야 합니다.

for contact_id in range(data.ncon):
    contact = data.contact[contact_id]
    body1 = mujoco.mj_id2name(model, mujoco.mjtObj.mjOBJ_BODY, model.geom_bodyid[contact.geom1])
    body2 = mujoco.mj_id2name(model, mujoco.mjtObj.mjOBJ_BODY, model.geom_bodyid[contact.geom2])

    if is_robot_table_contact(body1, body2):
        if contact.dist < 1e-4:
            return True

kinematic mode는 target에 대한 반응이 정말 빠릅니다. 즉 모방 능력이 가장 수월해보이는 시뮬레이터입니다.

Jacobian

Jacobian은 IK에서만 쓰인 것이 아니라, 현재 body/site가 선택된 관절 속도에 의해 어떻게 움직이는지 읽고 task-space 명령을 joint-space 명령으로 바꾸는 공통 연결부로 사용했습니다. 또한 mujoco가 전체 model.nv 자유도 기준의 Jacobian을 채워주므로, 항상 필요한 dof column만 잘라서 사용했습니다.

point1: mj_jacBody, mj_jacSite는 반환값이 아니라 사전할당된 array를 채웁니다.

def get_body_jacobian(model, data, body_id):
    jacp = np.zeros((3, model.nv))
    jacr = np.zeros((3, model.nv))
    mujoco.mj_jacBody(model, data, jacp, jacr, body_id)
    J = np.vstack([jacr, jacp])
    return J

mujoco의 mj_jacBody, mj_jacSite는 matrix를 return하지 않고 jacp, jacr 배열을 채웁니다. jacp는 linear velocity, jacr은 angular velocity에 대한 Jacobian이고, 프로젝트에서는 twist 순서를 맞추기 위해 [jacr, jacp] 형태의 6 x nv 행렬로 합쳤습니다.

point2: full Jacobian에서 필요한 dof column만 부분적으로 사용합니다.

def get_body_twist(model, data, ee_id, joint_ids):
    dof_ids = np.array([model.jnt_dofadr[jid] for jid in joint_ids], dtype=int)

    jacp = np.zeros((3, model.nv))
    jacr = np.zeros((3, model.nv))
    mujoco.mj_jacBody(model, data, jacp, jacr, ee_id)

    J = np.vstack([jacr, jacp])[:, dof_ids]
    qvel = data.qvel[dof_ids]
    twist_current = J @ qvel
    return twist_current

ffw 모델은 lift, 양팔, 손가락, 바퀴 등 여러 자유도를 함께 갖지만, 특정 controller가 실제로 움직일 관절은 그중 일부입니다. 그래서 mujoco가 준 전체 Jacobian을 그대로 쓰지 않고 joint id를 dof id로 바꾼 뒤 필요한 column만(active joints) 뽑아 현재 end-effector twist를 계산했습니다.

point3: task-space PD에서는 J.T로 task command를 joint torque로 변환합니다.

def pd_task_space(env, x_des, twist_des, twist_dot_des, joint_ids, kp=16, kd=4):
    dof_ids = np.array([env.model.jnt_dofadr[jid] for jid in joint_ids], dtype=int)

    twist_current = get_body_twist(env.model, env.data, env.ee_body_id, joint_ids)
    twist_error = env.get_twist_error(x_des)
    cmd = twist_dot_des + kd * (twist_des - twist_current) + kp * twist_error

    J = get_body_jacobian(env.model, env.data, env.ee_body_id)[:, dof_ids]
    tau = J.T @ cmd

    return tau

pose error, twist error, twist_dot_des가 섞인 task-space command를 J.T로 joint space에 투영하면 end-effector 방향의 오차를 줄이는 토크를 각 active joint에 분배할 수 있습니다. 다만 단순 자코비안 변환은 mass matrix가 포함된 구조는 아니므로 각 관절의 무게, 관성을 고려하지 못하는 문제가 있습니다.

Multi-target IK

solve_ik()는 오른손, 왼손 두 타겟을 입력으로 받아 stacked jacobian, error를 계산하고 관절 제약이 있는 IK 문제를 풉니다.

오른손 target은 pose IK이므로 6차원 twist error를 씁니다. 여기서 pose error를 행렬 원소 차이로 구하면 SE(3)의 group 구조를 무시하게 되어 현재 pose에서 목표 pose까지의 상대 rigid transform을 만들고, 그 상대 변환에 log를 씌워 tangent vector로 바꿉니다.

_, err_i = calculate_twist_error(T_sb, T_sd)
J = np.vstack([jacr, jacp])
J_b = Adjoint(np.linalg.inv(T_sb)) @ J
J_i = J_b[:, dof_ids]

반면 왼손은 position IK로 3차원 position error를 사용합니다.

err_i = T_sd[:3, 3] - T_sb[:3, 3]
J_i = jacp[:, dof_ids]

왼손 target은 position error만 씁니다.

err_i = T_sd[:3, 3] - T_sb[:3, 3]
J_i = jacp[:, dof_ids]

그다음 한데 모인 각 target의 Jacobian과 error는 multi-target IK를 하나의 선형 문제로 풀게 해줍니다.

return np.vstack(J_list), np.hstack(err_list)

현재 stacked error는 총 9차원으로, solver는 15개의 active joint을 조정해 이 error를 줄입니다.

왼손, 오른손 각기 다른 IK를 적용하는 구조가 좋은 이유는, J @ dq = error에 대해 관절 차원(15=왼팔7+오른팔7+리프트1)이 error 차원(9)보다 크고, 여러 해를 확보할 수 있는 상태가 되기 때문입니다. 이는 두 손 모두 pose IK를 적용한 경우보다 큰 여유 자유도를 갖습니다.

다음으로 특이점 근처에서는 damped pseudoinverse를 사용합니다.

def damped_pseudoinverse(J, damping=1e-2):
    rows, cols = J.shape

    if rows <= cols:
        return J.T @ np.linalg.inv(J @ J.T + damping**2 * np.eye(rows))
    
    return np.linalg.inv(J.T @ J + damping**2 * np.eye(cols)) @ J.T

Moore-Penrose pseudoinverse만 쓰면 singular value가 작아질 때 관절 update가 커질 수 있습니다. damping을 넣으면 역행렬 분모에 작은 양수가 더해져 update 폭주를 줄일 수 있습니다.

if model.jnt_limited[joint_id]:
    q_lower, q_upper = model.jnt_range[joint_id]
    ik_data.qpos[qadr] = np.clip(ik_data.qpos[qadr], q_lower, q_upper)

각 iteration에서는 xml joint range도 반영합니다.

if check_collision and is_collision(model, ik_data):
    mujoco.mj_copyData(ik_data, model, prev_data)
else:
    mujoco.mj_copyData(prev_data, model, ik_data)

그리고 위 충돌 감지 분기점은 kinematic mode에서만 실행되는 블록으로, mj_forward() 대신 충돌을 감지하고 이전 configuration으로 rollback합니다.

Trajectory Generation

시작 속도/가속도가 비영인 pose interpolation

def interpolate_pose_ros(T_start, T_end, twist_start, twistdot_start, T, t):
    p_start = T_start[:3, 3]
    p_end = T_end[:3, 3]
    R_start = T_start[:3, :3]
    R_end = T_end[:3, :3]

    w_start = twist_start[:3]
    v_start = twist_start[3:]
    w_dot_start = twistdot_start[:3]
    v_dot_start = twistdot_start[3:]

    p_t, p_dot_t, p_dotdot_t = quintic_interpolate_position(
        p_start, p_end, v_start, v_dot_start, T, t
    )

    rotvec_end = Rotation.from_matrix(R_start.T @ R_end).as_rotvec()
    rotvec_dot_start = R_start.T @ w_start
    rotvec_dotdot_start = R_start.T @ w_dot_start
    rotvec_t, rotvec_dot_t, rotvec_dotdot_t = quintic_interpolate_position(
        np.zeros(3), rotvec_end, rotvec_dot_start, rotvec_dotdot_start, T, t
    )

    R_t = R_start @ Rotation.from_rotvec(rotvec_t).as_matrix()
    w_t = R_t @ rotvec_dot_t
    w_dot_t = R_t @ rotvec_dotdot_t

    return create_transform_matrix(R_t, p_t), np.r_[w_t, p_dot_t], np.r_[w_dot_t, p_dotdot_t]

interpolate_pose_ros는 시작 pose와 목표 pose만 잇는 것이 아니라, 시작 시점의 twist와 twistdot도 경계조건으로 사용합니다. 따라서 target이 갱신되더라도 이전 궤적의 속도와 가속도를 이어받아 position, rotation의 잦은 감/가속을 방지할 수 있습니다.

position은 시작 선속도와 선가속도를 포함한 quintic 보간으로 만들고, rotation은 시작 자세 기준 상대 회전벡터를 5차 보간한 뒤 다시 SO(3) 회전행렬로 복원합니다. rotation matrix를 원소별로 보간하지 않고 회전벡터를 거치는 이유는 중간 자세가 계속 유효한 회전이 되게 하기 위해서입니다. 또한 회전벡터의 1차, 2차 미분을 현재 자세로 변환해 원하는 각속도와 각가속도도 함께 얻습니다.

joint-space quintic

joint-space 궤적 생성기는 target pose가 바뀌었을 때 먼저 IK로 trajectory_goal_q를 구하고, q 사이를 보간합니다. 일반적인 타임 스케일링 함수와 다른 점은 인접한 두 궤적이 qvel을 공유하게 해 초기 qvel, qacc가 0이 아닐 수 있습니다. 궤적들을 부드럽게 잇는 이 5차 방정식은 ROS의 궤적형성 메커니즘에서 영감을 받았습니다.

# 초기 q, qdot, qdotdot은 비영이고 q end는 0을 만족시키는 quintic
def quintic_time_scaling_ros(q_start, q_end, q_dot, q_dotdot, T, t):
    t = np.clip(t, 0.0, T)

    D = q_end - (q_start + q_dot * T + 0.5 * q_dotdot * T**2)

    a0 = q_start
    a1 = q_dot
    a2 = 0.5 * q_dotdot


...

관절 공간에서 관절 궤적을 단순 보간하면 중간에 불가능한 해와 마주치는 경우가 있습니다. ffw-sh5 모델의 경우 오른손의 y축 방향 움직임이 joint-space 보간시 매우 불안정했습니다. ROS는 궤적 상에 중간 지점, 즉 waypoint를 사전에 정해놓기에(일종의 planning) 나쁜 해에 덜 노출됩니다. 위 quintic_time_scaling_ros는 인접 궤적 간 물리량을 공유하지만 waypoint를 미리 지정하지 않습니다. 참고로 관절 공간 궤적 생성기는 실시간 성능에 저해 요소가 되어 teleoperation simulator에는 사용하지 못했습니다. 관절 공간 보간이 end-effector의 target path를 보장하지 못한다는 치명적인 문제점도 한 몫하였습니다.

Planning (추가)

기존에는 생성된 궤적을 바로 제어에 활용하여 특이점을 고려하지 못했고 mjData에 무관하게 궤적 기간이 동일한 등등 저수준 궤적 형성에 머물렀습니다. 특히 joint-space에 바로 보간을 적용하면 불가능한 해가 나올 수도 있습니다. 새로 추가한 planning 모듈은 trajectory 집합을 관리합니다. 저는 실시간으로 새로운 궤적을 현재 궤적과 블렌딩하는 방법보다 하나의 긴 경로를 여러 궤적으로 쪼갤 때 블렌딩이 더 용이해질 것을 기대했습니다. 결과적으로 특이점에 도달하는 경우는 줄었지만 반응은 더 지연되었고, 짧은 경로에서 크게 효과적이지 않았던 것 같습니다.

참고했던 ROS2 joint trajectory controller는 task-space 목표를 여러 구간으로 나누고 그 사이를 quintic 보간합니다. 이때 0이 아닌 양끝 속도값으로 5차 다항식을 풀어 구간 경계(waypoint)에서의 빈번한 감가속을 완화합니다. 이제 controller에서는 궤적을 형성할 때 planning api만 호출하고 planning 모듈이 trajectory 모듈을 호출하게 됩니다.

현재 각 waypoint는 특이점 평가, 관절각 및 토크 범위를 고려하며 충돌은 고려하지 않고 있습니다.

특이점 평가는 body Jacobian의 singular value를 이용합니다. 계획된 waypoint의 q를 임시 MjData에 넣고 mj_forward()를 호출한 뒤, end-effector body 기준 Jacobian을 계산합니다.

def get_singular_values(model, data, body_id, joint_ids):
    dof_ids = np.array([model.jnt_dofadr[jid] for jid in joint_ids], dtype=int)

    jacp = np.zeros((3, model.nv))
    jacr = np.zeros((3, model.nv))
    mujoco.mj_jacBody(model, data, jacp, jacr, body_id)
    J = np.vstack([jacr, jacp])[:, dof_ids]

    singular_values = np.linalg.svd(J, compute_uv=False)

    return singular_values[-1], singular_values

여기서 J는 전체 자유도 자코비안이 아니라 IK에 사용한 active joints의 dof 열만 잘라낸 6 x n 자코비안입니다. singular_values[-1]은 가장 작은 singular value이며, 이 값이 작아질수록 특정 task-space 방향으로 움직이기 위해 매우 큰 관절 변화가 필요해질 수 있습니다. planning.py에서는 이 값을 singularity_threshold와 비교해 위험한 waypoint를 무시하고 재시도합니다.

Environment class

Environment 클래스는 mjData, mjModel, control input, simulation step, rendering을 한데 묶어 루프의 부담을 줄입니다. dm_control도 Physics가 mujoco state와 step/render를 맡고, control.Environment가 task의 before_step/after_step과 observation/reward 생성을 분리해 physics와 task logic이 서로 얽히지 않게 합니다. (Myjoco는 보다 포괄적인 클래스명 Environment를 사용했습니다) 따라서 이 프로젝트에서는 mjData, mjModel는 Environment가 소유하고, Viewer는 같은 data를 읽어 그리기만 하며, controller가 만든 qpos/ctrl/torque는 env.step() 직전에 state를 반영하도록 설계했습니다.

Environment는 mujoco라는 물리법칙, 즉 규칙들의 집합으로 모델과 유사한 역할을 합니다. 또한 Viewer 인스턴스를 감싸며 mjData 공유하는데, 컨트롤러가 qpos(ctrl)을 바꾸고 mj_step()을 호출하면 viewer는 공유된 data를 렌더링합니다.

class Environment:
    def __init__(self, xml_path, end_effector):
        self.model, self.data = parser(xml_path)
        self.ee_body_id = mujoco.mj_name2id(
            self.model,
            mujoco.mjtObj.mjOBJ_BODY,
            end_effector,
        )
        self.ee_body_name = end_effector
        self.viewer = Viewer(self.model, self.data)


...

Environment 생성자는 빈출 geom 필드를 저장합니다. 현재는 named access helper가 mjcf.py 파일로 분기되어 있지만 geom 정보를 필드로 담는 Environment 메서드로 재구성할 계획입니다.

Viewer class

Viewer 클래스는 mujoco rendering api를 사용하여 target 및 camera panel(시야 조절 패널)을 생성합니다. 특히 poll_target()은 event handler가 아닌 polling 방식으로 호출부에서 패널 입력을 조회하도록 만듭니다.

다음 render 함수는 엔트리 메인 루프에서 호출되며 FPS와 연산량 사이의 적절한 트레이드오프를 찾아야 합니다.

self.width, self.height = glfw.get_framebuffer_size(self.window)
self.viewport = mujoco.MjrRect(0, 0, self.width, self.height)

mujoco.mjv_updateScene(
    self.model,
    self.data,
    self.opt,
    None,
    self.cam,
    mujoco.mjtCatBit.mjCAT_ALL,
    self.scene,
)

mujoco.mjr_render(self.viewport, self.scene, self.context)
self.hand_pose_panel.render(self.window, self.context)
self.gui_panel.render(self.window, self.context)
glfw.swap_buffers(self.window)

Id, Name mapping helper

id 체계는 kinematic tree를 쓰면서 가장 헷갈리는 부분으로, joint, dof, body 등의 id를 서로 매핑하는 helper를 만들었습니다.

def actuator_ids_from_joints(model, joint_ids):
    joint_id_set = set(joint_ids)
    actuator_ids = []

    for actuator_id in range(model.nu):
        if model.actuator_trntype[actuator_id] != mujoco.mjtTrn.mjTRN_JOINT:
            continue

        joint_id = model.actuator_trnid[actuator_id, 0]
        if joint_id in joint_id_set:
            actuator_ids.append(actuator_id)

    return actuator_ids

자주 등장하는 joint, dof, actuator id는 서로 다릅니다.

joint id: model.jnt_* 배열을 indexing하는 id입니다.
dof id: data.qvel, data.qacc, data.qfrc_applied, data.qfrc_bias를 indexing하는 id입니다.
actuator id: data.ctrl, model.actuator_*를 indexing하는 id입니다.

hinge/slide joint는 보통 joint 하나가 dof 하나를 갖지만, 그래도 jnt_qposadr, jnt_dofadr, actuator_trnid를 통해 변환해야 합니다. 특히 qpos index와 qvel index는 freejoint나 ball joint가 생기면 바로 달라질 수 있습니다. 이번 ffw_sh5.xml에서는 floating base를 주석 처리했기 때문에 대부분 hinge/slide 위주로 단순하지만, 다른 관절로도 확장해야 합니다.

Limitation (Future Implementation Plan)

IK 행렬이 단순히 stack한 구조라 task 간 우선순위가 없고 직관적으로 position IK를 적용한 왼손의 target error를 과소평가할 것 같습니다. 실제로 시뮬레이션 상에서 왼손은 오른손에 비해 자리를 이탈하는 경우가 많았습니다.
IK의 damping과 dq 제한이 singularity, collision 조건까지 만족시키지 않습니다. 별도 planning 모듈을 추가하거나 Differential IK 모듈 제약을 보완해야 합니다.
향후에 task-space PD외에 mass matrix, null-space posture control을 포함한 operational space controller를 구현해야 합니다.
finger interpolation은 grasp synergy를 모델링하지 않고 open/close alpha를 관절 목표로 직접 매핑해서 손의 실제 닫힘을 충분히 표현하지 못합니다.
viewer diagnostics는 visual contact와 body BVH에 의존하고 있어 q_des saturation, ctrl saturation, qfrc_bias 보상 여부 같은 controller 내부 상태를 즉시 확인하기 어렵습니다. timestep별 mjData 대시보드도 만들면 좋을 것 같습니다. 생성된 궤적(ref)을 얼마나 준수했는지 평가하는 지표도 시각화할 수 있습니다.
dm_control과 달리 현재 Environment wrapper는 state snapshot과 restore 경계가 약해서 IK용 임시 MjData, simulation MjData, viewer가 보는 MjData가 섞여있습니다.
현재 trajectory duration은 고정값인데, 인접 velocity 등을 바탕으로 동적으로 바꿔볼 수 있습니다.
현재 충돌 감지 모듈은 kinematic rollback 중심이라 접촉면을 따라 미끄러짐을 표현하지 못합니다. 또한 robot-table hard collision에 초점이 맞춰진 모듈을 확장해야 합니다. 한편 kinematic mode 기능 범위가 헷갈려 사용처를 더 조사해야 합니다.
점진적으로 강화학습 데모를 붙여봅니다.

References

mujoco XML modeling documentation
mujoco computation and API documentation
mujoco visualization documentation
Gymnasium mujoco environment API
Modern Robotics, Kevin M. Lynch and Frank C. Park: Ch. 3 Rigid-Body Motions, Ch. 5 Velocity Kinematics and Statics, Ch. 6 Inverse Kinematics, Ch. 8 Dynamics of Open Chains, Ch. 9 Trajectory Generation, Ch. 11 Robot Control, Ch. 12 Grasping and Manipulation
Drake Differential IK: https://drake.mit.edu/doxygen_cxx/group__planning__kinematics.html
robosuite Controllers: https://robosuite.ai/docs/modules/controllers.html
dm_control: https://github.com/google-deepmind/dm_control
ROBOTIS mujoco Menagerie assets: https://github.com/ROBOTIS-GIT/robotis_mujoco_menagerie
robosuite assets: https://github.com/ARISE-Initiative/robosuite

Tech Stack

MuJoCo
GLFW
SciPy
matplotlib
Python
NumPy