Techora News Hub

    A Coding Implementation to Compress and Benchmark Instruction-Tuned LLMs with FP8, GPTQ, and SmoothQuant Quantization using llmcompressor

    May 17, 2026


    import subprocess, sys

    def pip(*pkgs):
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", *pkgs])

    pip("llmcompressor", "compressed-tensors",
        "transformers>=4.45", "accelerate", "datasets")
    import os, gc, time, json, math
    from pathlib import Path

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from datasets import load_dataset

    assert torch.cuda.is_available(), \
        "Enable a GPU: Runtime > Change runtime type > T4 GPU"
    print("GPU:", torch.cuda.get_device_name(0),
          "| CUDA:", torch.version.cuda,
          "| torch:", torch.__version__)

    MODEL_ID = "Qwen/Qwen2.5-0.5B-Instruct"
    WORKDIR = Path("/content/quant_lab")
    # parents=True also creates /content when running outside Colab.
    WORKDIR.mkdir(parents=True, exist_ok=True)
    os.chdir(WORKDIR)
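    def free_mem():
        """Release Python garbage and cached GPU memory between runs."""
        gc.collect()
        torch.cuda.empty_cache()

    def dir_size_gb(path):
        """Total size of all files under `path`, in decimal gigabytes (1e9 bytes)."""
        total = 0
        for root, _, files in os.walk(path):
            for f in files:
                total += os.path.getsize(os.path.join(root, f))
        return total / 1e9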
    def time_generation(model, tok, prompt, max_new_tokens=64):
        """Greedy decode; reports latency & tokens/sec after a brief warmup."""
        inputs = tok(prompt, return_tensors="pt").to(model.device)
        # Warmup pass so one-time CUDA setup cost is excluded from the timing.
        _ = model.generate(**inputs, max_new_tokens=4, do_sample=False,
                           pad_token_id=tok.eos_token_id)
        torch.cuda.synchronize()
        t0 = time.time()
        out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                             do_sample=False, pad_token_id=tok.eos_token_id)
        torch.cuda.synchronize()
        dt = time.time() - t0
        new_ids = out[0][inputs["input_ids"].shape[1]:]
        # Divide by the tokens actually produced: generation can stop early at EOS.
        return tok.decode(new_ids, skip_special_tokens=True), dt, len(new_ids) / dt
    @torch.no_grad()
    def wikitext_ppl(model, tok, seq_len=512, max_chunks=20, stride=512):
        """Light WikiText-2 perplexity probe (fast, indicative)."""
        ds = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
        text = "\n\n".join(t for t in ds["text"][:400] if t.strip())
        enc = tok(text, return_tensors="pt").input_ids.to(model.device)
        nll_sum, tok_count = 0.0, 0
        for begin in range(0, enc.size(1) - seq_len, stride):
            chunk = enc[:, begin:begin + seq_len]
            out = model(chunk, labels=chunk)
            # out.loss is the mean NLL per predicted token; all chunks have the
            # same length, so weighting by seq_len yields the overall mean.
            nll_sum += out.loss.float().item() * seq_len
            tok_count += seq_len
            if tok_count // seq_len >= max_chunks:
                break
        return math.exp(nll_sum / tok_count)

    results = {}
    PROMPT = ("<|im_start|>user\nIn two sentences, explain why post-training "
              "quantization works for large language models.<|im_end|>\n"
              "<|im_start|>assistant\n")

    def benchmark(label, model_path_or_id):
        """Load a model, time generation, probe perplexity, and store the row in `results`."""
        free_mem()
        print(f"\n──── benchmarking: {label} ────")
        tok = AutoTokenizer.from_pretrained(model_path_or_id)
        m = AutoModelForCausalLM.from_pretrained(
            model_path_or_id, torch_dtype="auto", device_map="cuda").eval()
        sample, dt, tps = time_generation(m, tok, PROMPT)
        ppl = wikitext_ppl(m, tok)
        # On-disk size is only known for local checkpoints, not Hub model IDs.
        size = dir_size_gb(model_path_or_id) if os.path.isdir(str(model_path_or_id)) else None
        results[label] = {"size_gb": size, "ppl": round(ppl, 3),
                          "latency_s": round(dt, 3), "tok_per_s": round(tps, 1),
                          "sample": sample.strip().replace("\n", " ")[:180]}
        print(json.dumps(results[label], indent=2))
        del m
        free_mem()
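
Each call to `benchmark` stores one row in `results`, so once several variants have been run it can help to view them side by side. The sketch below is one possible way to do that and is not part of the original listing: `summarize` is a hypothetical helper, it assumes the row keys written by `benchmark` (`size_gb`, `ppl`, `tok_per_s`), and the numbers in `demo` are placeholders for illustration, not measurements.

```python
def summarize(results, baseline=None):
    """Format benchmark rows as an aligned plain-text table.

    `results` maps label -> dict with the keys written by `benchmark`.
    If `baseline` names a label in `results`, a perplexity-delta column
    relative to that row is filled in; otherwise it shows "n/a".
    """
    header = f"{'model':<16}{'size_gb':>9}{'ppl':>9}{'tok/s':>9}{'ppl_delta':>11}"
    lines = [header, "-" * len(header)]
    base_ppl = results[baseline]["ppl"] if baseline in results else None
    for label, r in results.items():
        # size_gb is None when the model was loaded from a Hub ID.
        size = "n/a" if r["size_gb"] is None else f"{r['size_gb']:.2f}"
        delta = "n/a" if base_ppl is None else f"{r['ppl'] - base_ppl:+.3f}"
        lines.append(
            f"{label:<16}{size:>9}{r['ppl']:>9.3f}{r['tok_per_s']:>9.1f}{delta:>11}"
        )
    return "\n".join(lines)


# Placeholder rows for illustration only; these are NOT measured values.
demo = {
    "bf16-baseline": {"size_gb": None, "ppl": 10.0, "tok_per_s": 32.0},
    "hypothetical-w8": {"size_gb": 0.5, "ppl": 10.25, "tok_per_s": 40.0},
}
print(summarize(demo, baseline="bf16-baseline"))
```

In the notebook itself, the call would be `print(summarize(results, baseline=...))` with whichever label was used for the unquantized run. A positive `ppl_delta` means the variant scored a higher (worse) perplexity than the baseline on the probe.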


