Ani

1.1K Followers

Pinned

How did I become a Data Engineer?

I still remember the day I first saw a computer at a book fair near my home. What astonished me most was its articles about the seventh wonder of the world, the Taj Mahal. I asked my physics teacher how it could store so many articles about…

Data Engineering

7 min read


Mar 2

Apache Spark — Play with nested files

“Setting goals give you a life to live. When you have zero goals its life that consumes you.” ― Thomas Vato

In general, when we try to read a directory in Spark with, say, a pattern, we do this:

val path = "examples/src/main/resources/people.csv"
val df = spark.read.csv(path)
df.show()
+------------------+
|…

Spark

2 min read


Jul 30, 2022

Spark Optimization : Reducing Shuffle

“Shuffling is the only thing which Nature cannot undo.” ― Arthur Eddington

I used to see people playing cards and using the word “shuffle” even before I knew how to play. In cards, shuffling plays a critical role in distributing “power”, adding weight to a player’s hand. It…

Spark

7 min read


Jul 17, 2022

Design a Scalable Data Solution : Know the requirements

“A recipe has no soul. You as the cook must bring soul to the recipe.” ― Thomas Keller

To me, we data engineers should be no different from spiritual seekers. A seeker’s mind is always curious to get more clarity, more transparency. …

Data Engineering

3 min read


Jun 10, 2022

Tame The Spark UI

“An artist’s concern is to capture beauty wherever he finds it.” ― Kazuo Ishiguro, An Artist of the Floating World

We are data engineers, and Spark is our best friend and the natural choice when the job is massively parallel data processing. Many times a day we interact with Spark…

Bigdata

4 min read


Jun 3, 2022

Scala Fun : Create your custom config

“Work hard and figure out how to be useful and don’t try to imitate anybody else’s success. Figure out how to do it for yourself with yourself.” ― Harrison Ford

As a data engineer, I try to have fun creating small utilities to overcome repetitive work. …

Scala

4 min read


May 27, 2022

Spark Jobs — Induce Parallelism

“Calculus, the electrical battery, the telephone, the steam engine, the radio - all these groundbreaking innovations were hit upon by multiple inventors working in parallel with no knowledge of one another.” ― Steven Johnson

Spark, my all-time favourite ETL framework and for all the reasons to be the best…

Spark

4 min read


Apr 15, 2022

Balance — The game of CAP

“If you are capable, but not available, nature will raise a person with lesser ability to replace you soon.” ― Israelmore Ayivor, Become a Better You

No distributed system is safe from network failures, so network partitioning generally has to be tolerated. In the presence of a partition, one is…

Cap

5 min read


Mar 26, 2022

Concurrency Control — The Heart Of Transactions

“It is far easier to design a class to be thread-safe than to retrofit it for thread safety later.” ― Brian Goetz, Java Concurrency in Practice

The Success Story of RDBMS

Handling transactions with ease has always been the USP of RDBMSs. We live in a world where every interaction is transactional in nature. A…

Database

6 min read


Mar 3, 2022

Consuming Restful API data and store in Spark Dataframe

“I would rather walk with a friend in the dark, than alone in the light.” ― Helen Keller

Well, in the big data world, Apache Spark is the friend Helen was talking about. On multiple occasions, when we deal with a very low volume of data in some of the interfaces that…

Rest Api

3 min read



Big Data Architect — Passionate about designing robust distributed systems

Following
  • Petr Zapletal
  • Netflix Technology Blog
  • Eda Johnson
  • Maria Karanasou
  • Rohan Jacob
