[08/05] Running a High-Performance GPT-OSS-120B Inference Server with TensorRT LLM
[08/01] Scaling Expert Parallelism in TensorRT LLM (Part 2: Performance Status and Optimization)
[07/26 ...
The Dockerfiles in this repository now provide a safe multi-stage build that supports an optional plugin install.

Usage patterns:

Build with plugin: provide the tarball via --build-arg and build the ...