vllm.model_executor.layers.quantization.kernels.scaled_mm.triton
TritonScaledMMLinearKernel
Bases: ScaledMMLinearKernel
Source code in vllm/model_executor/layers/quantization/kernels/scaled_mm/triton.py
apply_weights
can_implement classmethod
can_implement(
    c: ScaledMMLinearLayerConfig,
) -> tuple[bool, str | None]
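The `tuple[bool, str | None]` return type suggests a "supported flag plus optional reason" pattern. The following is a minimal, self-contained sketch of how a caller might consume such a result. It does not use vLLM: `ToyLayerConfig`, `ToyKernel`, `select_kernel`, and the `input_symmetric` rule are all hypothetical names invented for illustration, and the real kernel's checks may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyLayerConfig:
    """Hypothetical stand-in for ScaledMMLinearLayerConfig (field is invented)."""
    input_symmetric: bool


class ToyKernel:
    """Hypothetical kernel; only the return shape mirrors the signature above."""

    @classmethod
    def can_implement(cls, c: ToyLayerConfig) -> tuple[bool, str | None]:
        # Invented rule for illustration: reject asymmetric input quantization.
        if not c.input_symmetric:
            return False, "ToyKernel only supports symmetric input quantization"
        return True, None


def select_kernel(config: ToyLayerConfig, candidates: list[type]) -> type:
    """Return the first candidate that accepts config, else raise with all reasons."""
    failures = []
    for kernel in candidates:
        ok, reason = kernel.can_implement(config)
        if ok:
            return kernel
        failures.append(f"{kernel.__name__}: {reason}")
    raise ValueError("no kernel can implement this config: " + "; ".join(failures))
```

Returning the reason alongside the flag lets a selection routine report why each candidate was skipped instead of failing silently.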
is_supported classmethod
process_weights_after_loading
process_weights_after_loading(layer: Module) -> None