    Artificial Intelligence

    Posit AI Blog: News from the sparkly-verse

    By Admin, November 24, 2025


    Highlights

    sparklyr and friends have received some important updates in the past few
    months. Here are some highlights:

    • spark_apply() now works on Databricks Connect v2

    • sparkxgb is coming back to life

    • Support for Spark 2.3 and below has ended

    pysparklyr 0.1.4

    spark_apply() now works on Databricks Connect v2. The latest pysparklyr
    release uses the rpy2 Python library as the backbone of the integration.

    Databricks Connect v2 is based on Spark Connect. At this time, it supports
    Python user-defined functions (UDFs), but not R user-defined functions.
    Using rpy2 circumvents this limitation. As shown in the diagram, sparklyr
    sends the R code to the locally installed rpy2, which in turn sends it to
    Spark. The rpy2 installed on the remote Databricks cluster then runs the
    R code.
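    A connection like the one below is what routes spark_apply() through pysparklyr and rpy2. This is a minimal sketch, not the article's own setup: it assumes pysparklyr is installed, that Databricks credentials are available in the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables, and that copy_to() is supported on this connection type. The cluster ID is a hypothetical placeholder. It also creates a tbl_mtcars table like the one used in the spark_apply() example in this section.

```r
library(sparklyr)

# One-time setup of the Python environment that pysparklyr uses
# (uncomment if it has not been created yet)
# pysparklyr::install_databricks()

# Assumes DATABRICKS_HOST and DATABRICKS_TOKEN are set as environment
# variables; "my-cluster-id" is a hypothetical placeholder
sc <- spark_connect(
  cluster_id = "my-cluster-id",
  method = "databricks_connect"
)

# Copy the local mtcars data frame to the cluster
# (assumes copy_to() is supported for Databricks Connect v2 connections)
tbl_mtcars <- copy_to(sc, mtcars)
```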


    Figure 1: R code via rpy2. Diagram showing how sparklyr transmits the R code via the rpy2 Python package, and how Spark uses it to run the R code.

    A big advantage of this approach is that rpy2 supports Arrow. In fact, it
    is the recommended Python library to use when integrating Spark, Arrow and
    R. This means that data exchange between the three environments will be
    much faster!

    As in the original implementation, schema inference works and, as before,
    it has a performance cost. Unlike the original, however, this implementation
    returns a ‘columns’ specification that you can reuse the next time you run
    the call.

    spark_apply(
      tbl_mtcars,
      nrow,
      group_by = "am"
    )
    
    #> To increase performance, use the following schema:
    #> columns = "am double, x long"
    
    #> # Source:   table<`sparklyr_tmp_table_b84460ea_b1d3_471b_9cef_b13f339819b6`> [2 x 2]
    #> # Database: spark_connection
    #>      am     x
    #>    
    #> 1     0    19
    #> 2     1    13
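    Once the schema is known, it can be passed back through the columns argument so that later runs skip the inference step. A sketch assuming the same tbl_mtcars table and Databricks Connect v2 session as the call above; the schema string is the one printed in that call's output.

```r
# Assumes `tbl_mtcars` is the same Spark table used in the call above,
# on a Databricks Connect v2 connection
spark_apply(
  tbl_mtcars,
  nrow,
  group_by = "am",
  # Schema suggested by the first run; supplying it avoids inference
  columns = "am double, x long"
)
```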

    A full article about this new capability is available here:
    Run R inside Databricks Connect

    sparkxgb

    sparkxgb is an extension of sparklyr that enables integration with
    XGBoost. The current CRAN release does not support the latest versions of
    XGBoost. This limitation recently prompted a full refresh of sparkxgb. Here
    is a summary of the improvements, which are currently in the development
    version of the package:

    • The xgboost_classifier() and xgboost_regressor() functions no longer
      pass values for two arguments. These arguments were deprecated by XGBoost
      and cause an error if used. The R functions keep them for backwards
      compatibility, but will generate an informative error if they are not
      left NULL.

    • Updates the version of the JVM package used during the Spark session. It
      now uses xgboost4j-spark version 2.0.3 instead of 0.8.1, which gives access
      to XGBoost’s most recent Spark code.

    • Replaces calls to deprecated functions from upstream R dependencies, and
      stops using an unmaintained package (forge) as a dependency. This
      eliminated all of the warnings that were raised when fitting a model.

    • Major improvements to package testing. Unit tests were updated and expanded,
      the way sparkxgb automatically starts and stops the Spark session for testing
      was modernized, and the continuous integration tests were restored. This will
      ensure the package’s health going forward.

    remotes::install_github("rstudio/sparkxgb")
    
    library(sparkxgb)
    library(sparklyr)
    library(dplyr)  # provides select() and starts_with() used below
    
    sc <- spark_connect(master = "local")
    iris_tbl <- copy_to(sc, iris)
    
    xgb_model <- xgboost_classifier(
      iris_tbl,
      Species ~ .,
      num_class = 3,
      num_round = 50,
      max_depth = 4
    )
    
    xgb_model %>% 
      ml_predict(iris_tbl) %>% 
      select(Species, predicted_label, starts_with("probability_")) %>% 
      dplyr::glimpse()
    #> Rows: ??
    #> Columns: 5
    #> Database: spark_connection
    #> $ Species                 "setosa", "setosa", "setosa", "setosa", "setosa…
    #> $ predicted_label         "setosa", "setosa", "setosa", "setosa", "setosa…
    #> $ probability_setosa      0.9971547, 0.9948581, 0.9968392, 0.9968392, 0.9…
    #> $ probability_versicolor  0.002097376, 0.003301427, 0.002284616, 0.002284…
    #> $ probability_virginica   0.0007479066, 0.0018403779, 0.0008762418, 0.000…
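    The refresh also covers xgboost_regressor(). A minimal sketch of a regression fit on the same iris data, assuming the development version of sparkxgb is installed and Spark is available locally; predicting Sepal_Length is an arbitrary choice for illustration, and prediction is Spark ML's default output column name.

```r
library(sparklyr)
library(sparkxgb)
library(dplyr)

sc <- spark_connect(master = "local")

# copy_to() replaces the dots in the iris column names with underscores
iris_tbl <- copy_to(sc, iris, overwrite = TRUE)

# Predict sepal length from the remaining columns
xgb_reg <- xgboost_regressor(
  iris_tbl,
  Sepal_Length ~ .,
  num_round = 50,
  max_depth = 4
)

preds <- xgb_reg %>%
  ml_predict(iris_tbl) %>%
  select(Sepal_Length, prediction)

glimpse(preds)

# Capture the column names before closing the session
pred_cols <- colnames(preds)
spark_disconnect(sc)
```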

    sparklyr 1.8.5

    The new version of sparklyr does not have user-facing improvements, but
    internally it has crossed an important milestone: support for Spark 2.3
    and below has effectively ended. The Scala code needed to support those
    versions is no longer part of the package. As per Spark’s versioning
    policy, Spark 2.3 reached end-of-life in 2018.
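    If you are unsure which Spark version a local setup uses, sparklyr can list, install and report versions. A minimal sketch; 3.4 is an arbitrary example of a release newer than 2.3, and whether it is offered depends on your sparklyr version, so check spark_available_versions() first.

```r
library(sparklyr)

# List the Spark versions available for local installation
print(spark_available_versions())

# Install and connect to a supported release (3.4 is just an example)
spark_install(version = "3.4")
sc <- spark_connect(master = "local", version = "3.4")

# spark_version() returns a numeric_version, which compares against strings
v <- spark_version(sc)
if (v < "2.4") {
  stop("Spark ", v, " is no longer supported by sparklyr 1.8.5 and later")
}

spark_disconnect(sc)
```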

    This is part of a larger, ongoing effort to make the immense codebase of
    sparklyr a little easier to maintain, and hence reduce the risk of failures.
    As part of the same effort, the number of upstream packages that sparklyr
    depends on has been reduced. This has been happening across multiple CRAN
    releases, and in this latest release tibble and rappdirs are no longer
    imported by sparklyr.
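    To check which packages an installed copy of sparklyr imports, you can read its DESCRIPTION file with base R. A small sketch; with sparklyr 1.8.5 or later, neither tibble nor rappdirs should appear.

```r
# Read the Imports field from the installed sparklyr's DESCRIPTION file
imports <- packageDescription("sparklyr")$Imports

# Split on commas, drop version constraints such as "(>= 1.0.9)",
# and trim surrounding whitespace and newlines
pkgs <- trimws(gsub("\\([^)]*\\)", "", strsplit(imports, ",")[[1]]))

packageVersion("sparklyr")
# TRUE means the package is still listed under Imports
c("tibble", "rappdirs") %in% pkgs
```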


    Reuse

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don’t fall under this license and can be recognized by a note in their caption: “Figure from …”.

    Citation

    For attribution, please cite this work as

    Ruiz (2024, April 22). Posit AI Blog: News from the sparkly-verse. Retrieved from 

    BibTeX citation

    @misc{sparklyr-updates-q1-2024,
      author = {Ruiz, Edgar},
      title = {Posit AI Blog: News from the sparkly-verse},
      url = {},
      year = {2024}
    }


