2 Commits

Author SHA1 Message Date
tkrmagid
9b99283b70 v0.4.11: video frame ring buffer + decoder stats + 0.1s audio buffer
0.4.10 still played at ~2-5 fps even though the decoder buffer was
preallocated. Root cause: the single-slot staging buffer was paced by
SourceDataLine backpressure at the audio buffer's granularity (~0.5 s),
so the decoder burst-produced ~12 video frames into the slot while audio
drained, the consumer saw only the last frame of each burst, then the
decoder stalled until audio drained again. Net visible rate ~ source_fps
/ frames_per_burst.

Fix:
- Replace single staging slot with a 4-slot ring (preallocated, FIFO).
  Decoder writes to ringTail; if full, overwrites oldest and bumps
  droppedFrames so we can see overflow in the log. Render thread drains
  oldest under the same lock — no allocation, no race.
- Shrink audio driver buffer 0.5 s → 0.1 s so the decoder is paced more
  tightly. Burst size collapses from ~12 frames to 2-3, which fits
  inside the ring.
- Log decoder spec on start (WxH @ fps, audio Hz x ch, ring depth) and
  produced/consumed/dropped counters every ~10 s. The user's log can
  then confirm whether the decoder is keeping real-time pace and whether
  the ring is overflowing.
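
The burst arithmetic and the drop-oldest ring described above can be sketched as a small simulation. This is a simplified model, not the mod's code: it assumes each burst lands instantaneously and that the consumer drains the ring only between bursts. `BurstRing`, `visibleFrames`, and the burst sizes are hypothetical names and numbers chosen for illustration.

```java
/** Drop-oldest FIFO ring of frame ids; a toy stand-in for the preallocated staging ring. */
final class BurstRing {
    private final int[] slots;
    private int head = 0;   // next slot to consume
    private int tail = 0;   // next slot to produce into
    private int count = 0;
    int dropped = 0;        // frames overwritten before anyone read them

    BurstRing(int capacity) { slots = new int[capacity]; }

    void produce(int frameId) {
        slots[tail] = frameId;
        tail = (tail + 1) % slots.length;
        if (count < slots.length) {
            count++;
        } else {
            // Ring was full: the oldest frame was overwritten, so move head past it.
            head = (head + 1) % slots.length;
            dropped++;
        }
    }

    /** Returns the oldest buffered frame id, or -1 when the ring is empty. */
    int consume() {
        if (count == 0) return -1;
        int id = slots[head];
        head = (head + 1) % slots.length;
        count--;
        return id;
    }

    /** Frames the consumer gets to see when it drains the ring once after every burst. */
    static int visibleFrames(int capacity, int burstSize, int bursts) {
        BurstRing ring = new BurstRing(capacity);
        int seen = 0;
        int nextId = 0;
        for (int b = 0; b < bursts; b++) {
            for (int i = 0; i < burstSize; i++) ring.produce(nextId++);
            while (ring.consume() >= 0) seen++;
        }
        return seen;
    }
}
```

Under this model, two 12-frame bursts per second into a single slot show only 2 frames, which matches the ~2 fps symptom, while eight 3-frame bursts into four slots show all 24.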

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 02:10:47 +09:00
tkrmagid
cee01bd448 v0.4.10: preallocate decoder direct buffer, fix 5fps video
0.4.9 allocated a fresh w*h*4 direct ByteBuffer on every grab() — at
1080p × 24fps that's ~192 MB/s of direct memory churn (page zero-fill +
Cleaner enqueue). The decoder thread spent most of its frame budget on
memory bookkeeping instead of decoding, fell behind real time, and the
single-slot AtomicReference saw bursty refills that the render thread
could only sample at ~5 fps. The game thread was fine; only the video
looked like 5 fps.
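
The churn figure above follows from frame size times frame rate. A minimal sketch of that arithmetic, assuming 1920×1080 RGBA frames at 24 fps; `ChurnEstimate` and its methods are hypothetical helpers, not part of the mod.

```java
final class ChurnEstimate {
    /** Bytes of direct memory allocated per second when every frame gets a fresh w*h*4 buffer. */
    static long bytesPerSecond(int width, int height, int fps) {
        return (long) width * height * 4L * fps;
    }

    /** Same figure expressed in MiB/s (1 MiB = 1024 * 1024 bytes). */
    static double mibPerSecond(int width, int height, int fps) {
        return bytesPerSecond(width, height, fps) / (1024.0 * 1024.0);
    }
}
```

For 1080p at 24 fps this gives 199,065,600 bytes/s, roughly 190 MiB/s, which is in the same range as the ~192 MB/s quoted above.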

Replace it with one preallocated direct buffer per backend instance,
filled under a short-held lock on the decoder side. Swap the pollFrame()
ByteBuffer-returning API for consumeFrame(dstAddr, maxBytes) so the
render thread memcpys straight from staging buffer → GPU texture
pointer under the same lock — no allocation, no race window between
"got buffer" and "decoder overwrote it".

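The consume-under-lock idea from this commit message can be sketched without LWJGL. The version below is an illustration only: it copies into a caller-supplied `ByteBuffer` instead of a raw address, and `StagingSlot`, `publish`, and `consumeInto` are hypothetical names, not the mod's API.

```java
import java.nio.ByteBuffer;

final class StagingSlot {
    private final Object lock = new Object();
    private final ByteBuffer staging;   // allocated once, reused for every frame
    private int bytes = 0;
    private boolean dirty = false;

    StagingSlot(int capacity) { staging = ByteBuffer.allocateDirect(capacity); }

    /** Decoder side: overwrite the staging buffer with the newest frame. */
    void publish(ByteBuffer src) {
        synchronized (lock) {
            int n = src.remaining();        // this sketch assumes n <= capacity
            staging.clear();
            staging.put(src.duplicate());   // duplicate() leaves src's position untouched
            bytes = n;
            dirty = true;
        }
    }

    /** Render side: copy the pending frame into dst; false if nothing new or dst is too small. */
    boolean consumeInto(ByteBuffer dst) {
        synchronized (lock) {
            if (!dirty || bytes > dst.remaining()) return false;
            ByteBuffer view = staging.duplicate();
            view.position(0).limit(bytes);
            dst.put(view);
            dirty = false;
            return true;
        }
    }
}
```

Because both the write and the copy-out happen under the same lock, the reader never holds a reference to memory the decoder may overwrite, which is the property the commit message describes.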
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 22:55:04 +09:00
6 changed files with 178 additions and 51 deletions


@@ -3,7 +3,7 @@
 A Fabric mod that plays an arbitrary video URL as a flat plane on walls, floors, and ceilings inside Minecraft.
 - Mod ID: `video_player`
-- Current version: **0.4.9**
+- Current version: **0.4.11**
 - Minecraft version: **26.1.2**
 - Required Java: **25** (required by Minecraft 26.x)
@@ -51,23 +51,23 @@ Fabric is a loader that adds mod features to Minecraft
 https://cdn.modrinth.com/data/P7dR8mSH/versions/Sy2Bq7Xc/fabric-api-0.149.0%2B26.1.2.jar
 - To find a newer build: https://modrinth.com/mod/fabric-api/versions → select the game version filter `26.1.2` on the page yourself. (The URL-parameter filter sometimes does not take effect, so it is safer to check once more on the page.)
 - Put the downloaded `fabric-api-0.149.0+26.1.2.jar` into the `mods` folder.
-2. **video_player** (this mod; JavaCV is bundled inside the jar as of 0.4.9)
+2. **video_player** (this mod; JavaCV is bundled inside the jar as of 0.4.11)
 - Download: https://git.tkrmagid.kr/tkrmagid/mc_video_player_mod/releases
 - Download only **one** jar matching your OS and CPU and put it in the `mods` folder (no separate JavaCV install needed):
-- Windows 64bit: `video_player-windows-x86_64-0.4.9.jar` (~32MB)
-- macOS Intel: `video_player-macosx-x86_64-0.4.9.jar` (~24MB)
-- macOS Apple Silicon (M1/M2/M3/M4): `video_player-macosx-arm64-0.4.9.jar` (~21MB)
-- Linux 64bit: `video_player-linux-x86_64-0.4.9.jar` (~27MB)
+- Windows 64bit: `video_player-windows-x86_64-0.4.11.jar` (~32MB)
+- macOS Intel: `video_player-macosx-x86_64-0.4.11.jar` (~24MB)
+- macOS Apple Silicon (M1/M2/M3/M4): `video_player-macosx-arm64-0.4.11.jar` (~21MB)
+- Linux 64bit: `video_player-linux-x86_64-0.4.11.jar` (~27MB)
 - If you are unsure about your OS: almost all Windows machines are `windows-x86_64`, Intel Macs are `macosx-x86_64`, Apple Silicon Macs are `macosx-arm64`, and Linux is `linux-x86_64`.
-If an older version (`video_player-0.4.0.jar`, `0.4.2.jar`, `0.4.3.jar`, `0.3.x.jar`, etc.) is still in the mods folder, **be sure to delete it**. If two versions are present together, Minecraft fails to start because of the conflict. Starting with 0.4.9, also **remove** the JVM argument used with 0.4.7 and earlier (`-Xbootclasspath/a:...javacv...`): the mod jar contains the same JavaCV, which conflicts with the copy on the boot classpath and can cause a black screen.
+If an older version (`video_player-0.4.0.jar`, `0.4.2.jar`, `0.4.3.jar`, `0.3.x.jar`, etc.) is still in the mods folder, **be sure to delete it**. If two versions are present together, Minecraft fails to start because of the conflict. Starting with 0.4.11, also **remove** the JVM argument used with 0.4.7 and earlier (`-Xbootclasspath/a:...javacv...`): the mod jar contains the same JavaCV, which conflicts with the copy on the boot classpath and can cause a black screen.
 ### STEP 5. Verify the installation
 In game, type `/videostick` in the chat box. If everything is working:
 - A **Video Stick** item appears in your inventory (with a stick-shaped icon, not the purple/black missing texture).
-- If you see the purple/black missing texture, an older jar from **STEP 4** (`video_player-0.4.0.jar` / `0.4.1.jar`, etc.) is still in the mods folder. Delete them all, leave only `0.4.9`, and restart. (In 0.4.1 and earlier, the Fabric 26.1.2 model loader rejects the unprefixed `item/generated` parent, so the stick icon renders as a missing-model cube; fixed in 0.4.2.)
+- If you see the purple/black missing texture, an older jar from **STEP 4** (`video_player-0.4.0.jar` / `0.4.1.jar`, etc.) is still in the mods folder. Delete them all, leave only `0.4.11`, and restart. (In 0.4.1 and earlier, the Fabric 26.1.2 model loader rejects the unprefixed `item/generated` parent, so the stick icon renders as a missing-model cube; fixed in 0.4.2.)
 ---
@@ -172,7 +172,7 @@ Fabric is a loader that adds mod features to Minecraft
 ```sh
 JAVA_HOME=/usr/lib/jvm/java-25-openjdk-amd64 ./gradlew build
 ```
-Output: `build/libs/video_player-0.4.9.jar` (~85KB)
+Output: `build/libs/video_player-0.4.11.jar` (~85KB)
 Per-platform fat jar (JavaCV 1.5.13 + ffmpeg 8.0.1 natives nested):
 ```sh
@@ -181,7 +181,7 @@ JAVA_HOME=/usr/lib/jvm/java-25-openjdk-amd64 ./gradlew clean build -Pplatform=li
 JAVA_HOME=/usr/lib/jvm/java-25-openjdk-amd64 ./gradlew clean build -Pplatform=macosx-x86_64
 JAVA_HOME=/usr/lib/jvm/java-25-openjdk-amd64 ./gradlew clean build -Pplatform=macosx-arm64
 ```
-Output: `build/libs/video_player-<platform>-0.4.9.jar` (~21-32MB; the jar nests 5 javacv/javacpp/ffmpeg jars, which the Fabric loader unpacks onto the classpath at runtime)
+Output: `build/libs/video_player-<platform>-0.4.11.jar` (~21-32MB; the jar nests 5 javacv/javacpp/ffmpeg jars, which the Fabric loader unpacks onto the classpath at runtime)
 Maven coordinates if you pull in JavaCV directly as a dependency:
 ```


@@ -5,7 +5,7 @@ org.gradle.configuration-cache=false
 # Mod
 mod_id=video_player
-mod_version=0.4.9
+mod_version=0.4.11
 maven_group=com.ejclaw.videoplayer
 archives_base_name=video_player


@@ -4,6 +4,8 @@ import com.ejclaw.videoplayer.VideoPlayerMod;
 import net.fabricmc.api.EnvType;
 import net.fabricmc.api.Environment;
+import org.lwjgl.system.MemoryUtil;
 import javax.sound.sampled.AudioFormat;
 import javax.sound.sampled.AudioSystem;
 import javax.sound.sampled.SourceDataLine;
@@ -13,7 +15,7 @@ import java.nio.ByteBuffer;
 import java.nio.ByteOrder;
 import java.nio.ShortBuffer;
 import java.util.concurrent.atomic.AtomicBoolean;
-import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.atomic.AtomicLong;
 /**
  * SPEC §5.3 — fallback mp4/http(s) backend driven by JavaCV's FFmpegFrameGrabber.
@@ -38,7 +40,36 @@ public class JavaCvBackend implements VideoBackend {
     private Thread worker;
     private final AtomicBoolean running = new AtomicBoolean(false);
     private final AtomicBoolean paused = new AtomicBoolean(false);
-    private final AtomicReference<ByteBuffer> latest = new AtomicReference<>();
+    /**
+     * Ring buffer of preallocated RGBA staging slots. Decoder thread writes to {@code ringTail}
+     * under {@link #frameLock}; render thread drains the oldest slot via
+     * {@link #consumeFrame(long, long)} under the same lock.
+     *
+     * <p>0.4.10 used a single staging slot and relied on {@link SourceDataLine#write}
+     * backpressure to pace the decoder. That paced only at audio-buffer granularity (~0.5 s):
+     * the decoder burst-produced ~12 video frames into the slot while the audio line drained,
+     * the consumer (60+ Hz polling) saw only the last frame of each burst, then the decoder
+     * stalled until audio drained again — net effect ~2 fps of visible video despite the
+     * decoder producing at the source's 24 fps. The ring absorbs the burst; combined with the
+     * smaller audio buffer (~0.1 s) below the burst collapses to 2-3 frames which fits in
+     * {@link #FRAME_RING_SLOTS}.
+     *
+     * <p>If the ring still fills, the decoder overwrites the oldest slot and increments
+     * {@link #droppedFrames}. Memory cost: {@code 4 × w × h × 4} bytes (32 MB at 1080p,
+     * ~130 MB at 4K).
+     */
+    private static final int FRAME_RING_SLOTS = 4;
+    private final Object frameLock = new Object();
+    private final ByteBuffer[] ringBufs = new ByteBuffer[FRAME_RING_SLOTS];
+    private final int[] ringBytes = new int[FRAME_RING_SLOTS];
+    private int ringHead = 0; // next slot to consume
+    private int ringTail = 0; // next slot to produce into
+    private int ringCount = 0;
+    /** Decoder telemetry (cumulative). Logged ~every 10 s from the decode thread. */
+    private final AtomicLong producedFrames = new AtomicLong();
+    private final AtomicLong consumedFrames = new AtomicLong();
+    private final AtomicLong droppedFrames = new AtomicLong();
     private volatile int width = 0;
     private volatile int height = 0;
     private volatile float gain = 1.0F;
@@ -88,14 +119,39 @@ public class JavaCvBackend implements VideoBackend {
     public int videoHeight() { return height; }
     @Override
-    public ByteBuffer pollFrame() {
-        return latest.getAndSet(null);
+    public boolean consumeFrame(long dstAddr, long maxBytes) {
+        synchronized (frameLock) {
+            if (ringCount <= 0) return false;
+            int idx = ringHead;
+            int n = ringBytes[idx];
+            ByteBuffer buf = ringBufs[idx];
+            // Always advance head regardless of memcpy outcome — otherwise a single oversize
+            // frame (e.g. mid-resize) would jam the ring forever.
+            ringHead = (idx + 1) % FRAME_RING_SLOTS;
+            ringCount--;
+            if (buf == null || n <= 0 || n > maxBytes) {
+                // Texture not yet sized for this frame, or empty slot — skip. ensureTexture()
+                // runs in Entry.tryUpload() before consumeFrame, so n > maxBytes only happens
+                // on the exact tick of a resolution change.
+                return false;
+            }
+            MemoryUtil.memCopy(MemoryUtil.memAddress(buf), dstAddr, n);
+            consumedFrames.incrementAndGet();
+            return true;
+        }
     }
     @Override
     public void close() {
         closed = true;
         stopWorker();
+        synchronized (frameLock) {
+            for (int i = 0; i < FRAME_RING_SLOTS; i++) {
+                ringBufs[i] = null;
+                ringBytes[i] = 0;
+            }
+            ringHead = ringTail = ringCount = 0;
+        }
     }
     private void stopWorker() {
@@ -180,6 +236,18 @@ public class JavaCvBackend implements VideoBackend {
             localAudioLine = openLine(sampleRate, audioChannels);
             this.audioLine = localAudioLine;
+            // Decoder spec — printed once per playback so the user log shows what the decoder
+            // actually sees (resolution / frame rate / sample rate). Used to verify our pacing
+            // assumptions (e.g. ring depth, audio buffer length) match the source.
+            double srcFrameRate = 0;
+            try { srcFrameRate = ((Number) grabberCls.getMethod("getFrameRate").invoke(grabber)).doubleValue(); }
+            catch (Throwable ignored) {}
+            VideoPlayerMod.LOG.info(
+                    "[{}] decoder started: {}x{} @ {} fps, audio {} Hz x{}, ring={} slots",
+                    VideoPlayerMod.MOD_ID, width, height,
+                    String.format("%.2f", srcFrameRate),
+                    sampleRate, audioChannels, FRAME_RING_SLOTS);
             Class<?> frameCls = Class.forName(FRAME_CLASS);
             Field imageField = frameCls.getField("image");
             Field samplesField = frameCls.getField("samples");
@@ -187,6 +255,14 @@ public class JavaCvBackend implements VideoBackend {
             // but we still resolve its class so a future code path could fall back to it if a
             // grabber refuses setPixelFormat. Keep the lookup defensive.
+            // Stats sampling: every 10 s of wall-clock we log produced/consumed/dropped deltas
+            // and the implied fps. Lets us tell from the log whether the decoder is keeping
+            // real-time pace (produced≈source fps) and whether the ring is overflowing
+            // (dropped>0). All counters are cumulative; we keep the previous sample to compute
+            // deltas.
+            long statsLastNs = System.nanoTime();
+            long lastProd = 0, lastCons = 0, lastDrop = 0;
             while (running.get() && !closed) {
                 if (paused.get()) { Thread.sleep(20); continue; }
                 Object frame;
@@ -214,18 +290,61 @@ public class JavaCvBackend implements VideoBackend {
                 Object[] images = (Object[]) imageField.get(frame);
                 if (images != null && images.length > 0 && images[0] instanceof ByteBuffer src) {
                     // frame.image[0] is the swscale-converted RGBA plane, reused by the grabber
-                    // across grab() calls. Copy into a fresh direct buffer because the render
-                    // thread reads `latest` asynchronously and would otherwise see a buffer
-                    // already being overwritten by the next grab().
+                    // across grab() calls. Copy into the next ring slot under frameLock so the
+                    // render thread's consumeFrame() sees coherent frames in FIFO order.
+                    //
+                    // Allocation is one-time per slot, lazily on first use (or on a resolution
+                    // upgrade) — never per frame. 0.4.9's per-frame allocateDirect was the
+                    // primary memory-churn problem; 0.4.10 fixed that; 0.4.11 adds the ring on
+                    // top to absorb the burst-then-stall caused by SourceDataLine backpressure
+                    // pacing only at audio-buffer granularity.
                     int need = src.remaining();
                     if (need > 0) {
-                        ByteBuffer copy = ByteBuffer.allocateDirect(need).order(ByteOrder.nativeOrder());
                         int srcPos = src.position();
-                        copy.put(src);
-                        src.position(srcPos); // restore so JavaCV's own bookkeeping isn't disturbed
-                        copy.flip();
-                        latest.set(copy);
+                        long srcAddr = MemoryUtil.memAddress(src) + srcPos;
+                        synchronized (frameLock) {
+                            int idx = ringTail;
+                            if (ringBufs[idx] == null || ringBufs[idx].capacity() < need) {
+                                ringBufs[idx] = ByteBuffer.allocateDirect(need).order(ByteOrder.nativeOrder());
+                            }
+                            long dstAddr = MemoryUtil.memAddress(ringBufs[idx]);
+                            MemoryUtil.memCopy(srcAddr, dstAddr, need);
+                            ringBytes[idx] = need;
+                            ringTail = (idx + 1) % FRAME_RING_SLOTS;
+                            if (ringCount < FRAME_RING_SLOTS) {
+                                ringCount++;
+                            } else {
+                                // Ring was full — we overwrote the oldest frame. Advance head
+                                // to point at the next-oldest so consume order stays FIFO.
+                                ringHead = (ringHead + 1) % FRAME_RING_SLOTS;
+                                droppedFrames.incrementAndGet();
+                            }
+                            producedFrames.incrementAndGet();
+                        }
+                        src.position(srcPos); // restore — JavaCV reads it on subsequent grabs
                     }
                 }
+                // Periodic stats — once per ~10 s of wall-clock. Includes ring depth so we can
+                // see whether the consumer is keeping up.
+                long now = System.nanoTime();
+                if (now - statsLastNs > 10_000_000_000L) {
+                    long prod = producedFrames.get();
+                    long cons = consumedFrames.get();
+                    long drop = droppedFrames.get();
+                    double elapsedS = (now - statsLastNs) / 1e9;
+                    int depth;
+                    synchronized (frameLock) { depth = ringCount; }
+                    VideoPlayerMod.LOG.info(
+                            "[{}] decoder stats: produced={} ({} fps), consumed={} ({} fps), dropped={} (+{}) over {}s, ring={}/{}",
+                            VideoPlayerMod.MOD_ID,
+                            prod, String.format("%.1f", (prod - lastProd) / elapsedS),
+                            cons, String.format("%.1f", (cons - lastCons) / elapsedS),
+                            drop, (drop - lastDrop),
+                            String.format("%.1f", elapsedS),
+                            depth, FRAME_RING_SLOTS);
+                    statsLastNs = now;
+                    lastProd = prod; lastCons = cons; lastDrop = drop;
+                }
                 // If we have an open audio line, SourceDataLine.write() blocks for backpressure
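
The delta-based rate reporting in this hunk can be isolated into a tiny helper. A minimal sketch, assuming a cumulative counter sampled with `System.nanoTime()`-style timestamps; `RateSampler` is a hypothetical name and not part of the commit.

```java
/** Turns a cumulative event counter into a per-second rate between successive samples. */
final class RateSampler {
    private long lastNs;
    private long lastCount;

    RateSampler(long startNs) { this.lastNs = startNs; }

    /** Events per second since the previous sample, given the current cumulative count. */
    double sample(long nowNs, long cumulativeCount) {
        double elapsedS = (nowNs - lastNs) / 1e9;
        double rate = elapsedS > 0 ? (cumulativeCount - lastCount) / elapsedS : 0.0;
        lastNs = nowNs;
        lastCount = cumulativeCount;
        return rate;
    }
}
```

With 240 frames produced in the first 10 s the sampler reports 24 fps; if only 120 more arrive in the next 10 s it reports 12 fps, which is the kind of drop the log line is meant to expose.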
@@ -262,10 +381,17 @@ public class JavaCvBackend implements VideoBackend {
         try {
             AudioFormat fmt = new AudioFormat(sampleRate, 16, channels, true, false); // signed 16-bit LE
             SourceDataLine line = AudioSystem.getSourceDataLine(fmt);
-            // ~0.5 s of audio buffered in the driver. Smooths over upstream hiccups without
-            // delaying close() — stopWorker() calls line.stop() / line.flush() to dump it.
+            // ~0.1 s of audio buffered in the driver. 0.4.10 used 0.5 s, which let the decoder
+            // burst ~12 video frames between backpressure stalls — way past the video ring's
+            // capacity and the visible cause of the "2-5 fps" stutter the user saw. With 0.1 s
+            // the audio line refills more often, so the decoder is paced more tightly and
+            // bursts collapse to 2-3 frames (well inside FRAME_RING_SLOTS).
+            //
+            // Floor at frameSizeBytes * 256 keeps the buffer above the typical OS / driver
+            // minimum so we don't get UnsupportedOperationException at line.open() on
+            // exotic sample rates.
             int frameSizeBytes = 2 * channels;
-            int bufferBytes = Math.max(sampleRate * frameSizeBytes / 2, frameSizeBytes * 1024);
+            int bufferBytes = Math.max(sampleRate * frameSizeBytes / 10, frameSizeBytes * 256);
             line.open(fmt, bufferBytes);
             line.start();
             return line;
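
The sizing rule in this hunk can be checked with plain arithmetic. The sketch below is a variation, not the committed code: it computes the size in whole sample frames so the result is always a multiple of the frame size, which `SourceDataLine.open(AudioFormat, int)` is documented to require. `AudioBufferSize` and `bytesFor` are hypothetical names.

```java
final class AudioBufferSize {
    /** Buffer size in bytes for ~0.1 s of signed 16-bit PCM, with a 256-frame floor. */
    static int bytesFor(int sampleRate, int channels) {
        int frameSizeBytes = 2 * channels;             // 2 bytes per sample, one sample per channel
        int frames = Math.max(sampleRate / 10, 256);   // ~0.1 s of audio, never below 256 frames
        return frames * frameSizeBytes;                // always a whole number of frames
    }
}
```

At 48 kHz stereo this gives 19,200 bytes, the same value as the committed expression. At 11,025 Hz mono it gives 2,204 bytes, where `sampleRate * frameSizeBytes / 10` would give 2,205, which is not a whole number of 2-byte frames.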


@@ -3,8 +3,6 @@ package com.ejclaw.videoplayer.client.playback;
 import net.fabricmc.api.EnvType;
 import net.fabricmc.api.Environment;
-import java.nio.ByteBuffer;
 /**
  * SPEC §5.3 — minimal playback backend abstraction. Implementations: WatermediaBackend (preferred,
  * when v2 supports the target MC version) and JavaCvBackend (fallback).
@@ -21,10 +19,19 @@ public interface VideoBackend {
     int videoHeight();
     /**
-     * Poll a new decoded RGBA frame if one is ready.
-     * @return the frame buffer (capacity = w*h*4) or {@code null} if no new frame is ready.
+     * If a new RGBA frame is ready, memcpy it directly into the GPU texture buffer at
+     * {@code dstAddr} (must have room for at least {@code w*h*4} bytes) and clear the dirty
+     * flag. Returns {@code true} when a frame was written.
+     *
+     * <p>Replaces the prior {@code pollFrame()} which returned a {@link java.nio.ByteBuffer}.
+     * The old contract forced the decoder to either allocate a fresh direct buffer per frame
+     * (huge memory churn at 1080p — see 0.4.10 changelog) or expose a reused buffer whose
+     * memory the decoder could clobber while the renderer was still reading. Pushing the copy
+     * inside the backend lets the decoder hold a single preallocated buffer under its own
+     * lock and copy out to the GPU pointer in one synchronized block — zero allocation, no
+     * race window.
      */
-    ByteBuffer pollFrame();
+    boolean consumeFrame(long dstAddr, long maxBytes);
     void close();
 }


@@ -9,9 +9,7 @@ import net.minecraft.client.Minecraft;
 import net.minecraft.client.renderer.texture.DynamicTexture;
 import net.minecraft.core.BlockPos;
 import net.minecraft.resources.Identifier;
-import org.lwjgl.system.MemoryUtil;
-import java.nio.ByteBuffer;
 import java.nio.file.Path;
 import java.util.HashMap;
 import java.util.HashSet;
@@ -113,10 +111,8 @@ public final class VideoPlayback {
                 continue;
             }
             if (!e.backend.isReady()) continue;
-            ByteBuffer buf = e.backend.pollFrame();
-            if (buf == null) continue;
             try {
-                e.upload(buf);
+                e.tryUpload();
             } catch (Throwable t) {
                 VideoPlayerMod.LOG.warn("[{}] texture upload failed: {}", VideoPlayerMod.MOD_ID, t.toString());
                 e.close();
@@ -188,24 +184,24 @@ public final class VideoPlayback {
             }
         }
-        /** Copy an incoming RGBA byte buffer into the texture, resizing if dimensions changed. */
-        void upload(ByteBuffer rgba) {
+        /**
+         * If the backend has a new RGBA frame, copy it straight into the texture's native
+         * pixel buffer and re-upload to GPU. The backend does the memcpy under its own lock
+         * so we never read a half-written frame. RGBA bytes already match NativeImage's
+         * ABGR-int layout in little-endian byte order (byte 0 = R = low byte of the int).
+         */
+        void tryUpload() {
             int w = backend.videoWidth();
             int h = backend.videoHeight();
             if (w <= 0 || h <= 0) return;
             ensureTexture(w, h, false);
             NativeImage img = texture.getPixels();
             if (img == null) return;
-            // RGBA bytes from the backend already match NativeImage's ABGR-int layout when
-            // viewed as little-endian bytes: byte 0 = R (low byte of ABGR int), byte 1 = G,
-            // byte 2 = B, byte 3 = A. So a flat memcpy works — no per-pixel swap needed.
-            // This replaces a 2M-iteration Java loop with one native memcpy for 1080p frames,
-            // cutting upload time from ~20ms to <1ms and removing the main stutter source.
-            long bytes = (long) w * h * 4L;
-            MemoryUtil.memCopy(MemoryUtil.memAddress(rgba), img.getPointer(), bytes);
-            texture.upload();
+            long maxBytes = (long) w * h * 4L;
+            if (backend.consumeFrame(img.getPointer(), maxBytes)) {
+                texture.upload();
+            }
         }
         void close() {
             try { backend.close(); } catch (Throwable ignored) {}
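
The byte-order claim in the `tryUpload()` comment can be verified with a plain `ByteBuffer`. A minimal sketch with arbitrary channel values; `RgbaLayout` is a hypothetical name, and the sketch only demonstrates little-endian packing, not NativeImage itself.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class RgbaLayout {
    /** Writes R, G, B, A bytes in memory order and reads them back as one little-endian int. */
    static int asLittleEndianInt(int r, int g, int b, int a) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) r).put((byte) g).put((byte) b).put((byte) a);
        return buf.getInt(0);   // byte 0 becomes the low byte of the int
    }
}
```

For R=0x11, G=0x22, B=0x33, A=0x44 the int reads `0x44332211`: A sits in the high byte and R in the low byte, which is ABGR when written as an int, so no per-pixel swap is needed before a flat copy.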


@@ -4,8 +4,6 @@ import com.ejclaw.videoplayer.VideoPlayerMod;
 import net.fabricmc.api.EnvType;
 import net.fabricmc.api.Environment;
-import java.nio.ByteBuffer;
 /**
  * SPEC §5.3 / §5.4 — WaterMedia v2 backend. Reflection-only so the mod jar stays clean of
  * compile-time WaterMedia dependencies. Until a v2 build supports 1.21.6+ this returns
@@ -38,8 +36,8 @@ public class WatermediaBackend implements VideoBackend {
     @Override public int videoHeight() { return height; }
     @Override
-    public ByteBuffer pollFrame() {
-        return null; // no frames until v2 is wired up
+    public boolean consumeFrame(long dstAddr, long maxBytes) {
+        return false; // no frames until v2 is wired up
     }
     @Override