Spark 3.4 — Parameterised SQL

Ani
2 min readAug 4, 2023

Code reuse is the Holy Grail of Software Engineering.
— Douglas Crockford

Reuse Reuse Reuse

Apache Spark 3.4 introduces parameterized SQL queries, enhancing query reusability and reinforcing security by mitigating the risk of SQL injection attacks.

● This functionality integrates seamlessly into the extended SparkSession API.

● The sql method now accepts a map of parameter names to corresponding Scala/Java literal values.

def sql(sqlText: String, args: Map[String, Any]): DataFrame
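To see why this matters for security, compare naive string interpolation with the parameterized form. This is a minimal sketch: the `spark` session and the `users` table are assumptions, not from a real schema.

```scala
// Risky: splicing user input directly into the SQL text invites injection.
val userInput = "'x' OR 1=1"        // a hypothetical malicious value
// spark.sql(s"SELECT * FROM users WHERE name = $userInput")  // don't do this

// Safe: the value is passed as a typed literal and bound to a named
// parameter, so it is never spliced into the query text itself.
spark.sql(
  sqlText = "SELECT * FROM users WHERE name = :name",
  args = Map("name" -> userInput))
```

With the parameterized call, the whole of `userInput` is treated as a single string value rather than as SQL syntax.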

Parameterized SQL Queries

● With this capability, named parameters can now be incorporated effortlessly into the SQL text, expanding their usage beyond traditional constant values and providing flexibility.

● Embrace this innovation to elevate the efficiency, security, and versatility of your Spark data processing workflows!

Let’s test it out!

We have a classic employee table here with some weird names in it. (Some are rivals too!)

Now we need to see which employees have an id greater than or equal to 70 and names that contain "do"!

spark.sql(
  sqlText = "SELECT * FROM iceberg.employees " +
    "WHERE id >= :age AND lower(name) LIKE :name",
  args = Map(
    "age" -> 70,
    "name" -> "%do%"))
  .show(false)

BINGO!
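And because the query text is now a plain template, the same string can be reused with different bindings. A hedged sketch (the `query` value is a name I'm introducing, and the second set of bindings is illustrative):

```scala
val query = "SELECT * FROM iceberg.employees " +
  "WHERE id >= :age AND lower(name) LIKE :name"

// Same SQL text, different parameters: no string surgery needed.
spark.sql(query, Map("age" -> 70, "name" -> "%do%")).show(false)
spark.sql(query, Map("age" -> 10, "name" -> "%jo%")).show(false)
```

This is the reuse promised at the top: one query definition, many invocations.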

For any kind of help with career counselling, resume building, design discussions, or learning more about the latest data engineering trends and technologies, reach out to me at anigos.

P.S.: I don’t charge money.


Ani

Big Data Architect — Passionate about designing robust distributed systems