    Posit AI Blog: Hugging Face Integrations

    By Admin, December 15, 2025



    We are happy to announce that the first releases of hfhub and tok are now on CRAN.
    hfhub is an R interface to the Hugging Face Hub that lets users download and cache
    files from the Hub, while tok implements R bindings for the Hugging Face tokenizers
    library.

    Hugging Face has rapidly become the platform to build, share and collaborate on
    deep learning applications, and we hope these integrations will help R users
    get started with Hugging Face tools and build novel applications.

    We have also previously announced the safetensors
    package, which reads and writes files in the safetensors format.

    hfhub

    hfhub is an R interface to the Hugging Face Hub. hfhub currently implements a single
    piece of functionality: downloading files from Hub repositories. Model Hub repositories
    are mainly used to store pre-trained model weights together with any other metadata
    necessary to load the model, such as the hyperparameter configuration and the
    tokenizer vocabulary.

    Downloaded files are cached using the same layout as the Python library, so cached
    files can be shared between the R and Python implementations, which makes switching
    between languages easier and quicker.

    We already use hfhub in the minhub package and
    in the ‘GPT-2 from scratch with torch’ blog post to
    download pre-trained weights from Hugging Face Hub.

    You can use hub_download() to download any file from a Hugging Face Hub repository
    by specifying the repository id and the path to the file that you want to download.
    If the file is already in the cache, the function returns the file path immediately;
    otherwise the file is downloaded and cached, and then its path is returned.

    path <- hfhub::hub_download("gpt2", "model.safetensors")
    path
    #> /Users/dfalbel/.cache/huggingface/hub/models--gpt2/snapshots/11c5a3d5811f50298f278a704980280950aedb10/model.safetensors
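
The shared cache layout can be illustrated with a small sketch. The helper below, `expected_cache_path()`, is hypothetical and not part of hfhub. It only assumes the `models--<repo>/snapshots/<revision>/<file>` structure visible in the path printed above, and that a `/` in a repository id is stored as `--`, as the Python library does. The default cache directory is likewise an assumption and may differ on your system.

```r
# Hypothetical helper (not part of hfhub): rebuilds the location of a cached
# file, assuming the layout seen in the output of hub_download() above.
expected_cache_path <- function(repo_id, revision, filename,
                                cache_dir = "~/.cache/huggingface/hub") {
  # Assumption: "org/name" repository ids are stored as "models--org--name".
  repo_folder <- paste0("models--", gsub("/", "--", repo_id, fixed = TRUE))
  file.path(cache_dir, repo_folder, "snapshots", revision, filename)
}

# Rebuild the path from the example above out of its three components.
p <- expected_cache_path(
  repo_id = "gpt2",
  revision = "11c5a3d5811f50298f278a704980280950aedb10",
  filename = "model.safetensors",
  cache_dir = "/Users/dfalbel/.cache/huggingface/hub"
)
p
```

Because the folder name depends only on the repository id, the revision and the file name, a file fetched from Python should resolve to the same location and not need to be downloaded again from R.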

    tok

    Tokenizers are responsible for converting raw text into the sequence of integers that
    is often used as the input for NLP models, making them a critical component of
    NLP pipelines. If you want a higher-level overview of NLP pipelines, you might want to read
    our previous
    blog post ‘What are Large Language Models? What are they not?’.

    When using a pre-trained model (whether for inference or for fine-tuning) it’s very
    important that you use the exact same tokenization process that was used during
    training, and the Hugging Face team has done an amazing job making sure that its algorithms
    match the tokenization strategies used by most LLMs.

    tok provides R bindings to the 🤗 tokenizers library. The tokenizers library is itself
    implemented in Rust for performance, and our bindings use the
    extendr project
    to interface with R. Using tok we can tokenize text the exact same way most
    NLP models do, making it easier to load pre-trained models in R and to share
    our models with the broader NLP community.

    tok can be installed from CRAN, and currently its usage is restricted to loading
    tokenizer vocabularies from files. For example, you can load the tokenizer for the GPT-2
    model with:

    tokenizer <- tok::tokenizer$from_pretrained("gpt2")
    ids <- tokenizer$encode("Hello world! You can use tokenizers from R")$ids
    ids
    #> [1] 15496   995     0   921   460   779 11241 11341   422   371
    tokenizer$decode(ids)
    #> [1] "Hello world! You can use tokenizers from R"
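
Since a tokenizer's vocabulary is just a file in a Hub repository, the two packages can also be combined explicitly. The following is a minimal sketch, assuming the gpt2 repository contains a serialized tokenizer named tokenizer.json (an assumption about the repository contents) and that network access is available the first time it runs. It loads the tokenizer with `tokenizer$from_file()` instead of `from_pretrained()`.

```r
# Download the serialized tokenizer, or reuse it if it is already cached.
# Assumption: the "gpt2" repository ships a "tokenizer.json" file.
tokenizer_path <- hfhub::hub_download("gpt2", "tokenizer.json")

# Load the tokenizer from the local file rather than by repository name.
tokenizer <- tok::tokenizer$from_file(tokenizer_path)

text <- "Hello world! You can use tokenizers from R"
ids <- tokenizer$encode(text)$ids

# Decoding the ids should give back the original text, as in the example above.
decoded <- tokenizer$decode(ids)
identical(decoded, text)
```

Loading from an explicit path can be handy when the tokenizer file comes from somewhere other than the Hub, for example a local project directory.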

    Spaces

    Remember that you can already host
    Shiny apps (for R and Python) on Hugging Face Spaces. As an example, we have built a Shiny
    app that uses:

    • torch to implement GPT-NeoX (the neural network architecture of StableLM – the model used for chatting)
    • hfhub to download and cache pre-trained weights from the StableLM repository
    • tok to tokenize and pre-process text as input for the torch model. tok also uses hfhub to download the tokenizer’s vocabulary.

    The app is hosted in this Space.
    It currently runs on CPU, but you can easily switch the Docker image if you want
    to run it on a GPU for faster inference.

    The app’s source code is also open source and can be found in the Space’s Files tab.

    Looking forward

    These are the very early days of hfhub and tok, and there’s still a lot of work to do
    and functionality to implement. We hope to get community help to prioritize work,
    so if there’s a feature that you are missing, please open an issue in the
    GitHub repositories.

    Reuse

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don’t fall under this license and can be recognized by a note in their caption: “Figure from …”.

    Citation

    For attribution, please cite this work as

    Falbel (2023, July 12). Posit AI Blog: Hugging Face Integrations. Retrieved from 

    BibTeX citation

    @misc{hugging-face-integrations,
      author = {Falbel, Daniel},
      title = {Posit AI Blog: Hugging Face Integrations},
      url = {},
      year = {2023}
    }


