arihant.shashank01@gmail.com +91 73370 82594
Available for Leadership Conversations

Arihant Shashank

Lead Data & AI Architect · Ericsson Global Chief Data & AI Office · Bengaluru

I architect the data & AI systems that power decisions,
productionize LLMs, and transform how enterprises operate.

🏗️ Data Architecture · 🤖 AI Engineering · ⚡ Data Engineering · 🧠 GenAI & RAG · ❄️ Snowflake Expert · 🔗 Agentic AI

Experience at World-Class Organisations

11+ Years in Data & AI
65M SEK Cost Savings Delivered
Snowflake Data Superhero
CEO LTVP Recognition by Ericsson CEO

Building the Data Foundation for the AI Era

I'm a Lead Data & AI Architect at Ericsson's Global Chief Data & AI Office, with over 11 years of experience turning raw, complex data into strategic enterprise assets. My work sits at the intersection of data engineering, cloud architecture, and Generative AI.

From migrating legacy systems to Snowflake and saving 65M SEK, to defining Ericsson's GenAI RAG strategy and building AI-ready data products — I focus on outcomes that leadership can see and measure.

I've been recognized by Ericsson's CEO through the Long-Term Variable Pay (LTVP) program — a recognition reserved for the top contributors who embody long-term strategic thinking. I've also been named a Snowflake Data Superhero two years running (2024 & 2025).

Data Architecture · Snowflake · AWS · PySpark / Spark · Generative AI · RAG (Retrieval-Augmented Generation) · LLM Integration · AI Agent Creation · Agentic AI Pipelines · Vector Databases · LoRA / Fine-tuning · MCP (Model Context Protocol) · Embeddings & Semantic Search · Apache Iceberg · Apache Airflow · Kafka · Python / Scala · AWS Lake Formation · AWS Glue · AWS Athena · Tableau · Power BI · Data Governance · Lakehouse Design · CDC Pipelines · Data Lineage · Data Products · GCP · Teradata

What I Do

Three pillars that define how I create value for data-driven organizations.

🏗️

Data Architecture & Lakehouse Engineering

Designing multi-zone data lakes and lakehouses using Apache Iceberg, AWS Lake Formation, and Snowflake. Building ACID-compliant, time-travel-enabled pipelines with dynamic schema handling and automated lineage — at enterprise scale.

🤖

AI-Ready Data Products

End-to-end ownership of AI-ready data product architecture across CRM systems. Building conversational AI interfaces on governed enterprise data, enabling natural-language access for analytics and real-time decision-making by business leaders.

🧠

GenAI RAG Strategy & Governance

Defining the GenAI Retrieval-Augmented Generation strategy for Ericsson's Data Office. Setting standards for data chunking, embeddings, vector storage, and retrieval — shaping governance frameworks to productionize LLMs at enterprise scale.

Projects That Define My Career

Not just roles — real, complex, enterprise-scale programmes I own end-to-end at Ericsson.

🏗️

Ericsson · Global · 2024–Present

Ericsson Federated Data Lake (EFDL)

Solo Architecture Driver

The Ericsson Federated Data Lake (EFDL) is one of the most ambitious data infrastructure programmes at Ericsson's Global Chief Data & AI Office — and I am its sole architecture driver. From the ground up, I designed and continue to engineer a multi-zone, federated lakehouse that unifies data across Ericsson's siloed business domains into a single, governed, and AI-ready data platform.

The challenge was significant: Ericsson operates across dozens of business units globally, each with its own data sources, formats, and governance requirements. The EFDL had to bring all of this together without creating a monolithic bottleneck — hence the federated approach, where each domain retains ownership while consuming and publishing to a shared, standardised lakehouse layer.

  • Designed a multi-zone architecture (Raw → Cleansed → Curated → Consumption) using Apache Iceberg as the open table format, AWS Lake Formation for access control, and Snowflake as the serving layer.
  • Built PySpark-based ingestion pipelines with dynamic schema handling and Change Data Capture (CDC) support — so the lake stays current as source systems evolve, with zero manual intervention.
  • Engineered ACID transaction support and time-travel capabilities, enabling rollback, historical analysis, and audit trails against any retained snapshot across all datasets.
  • Integrated REST APIs and batch sources into a unified pipeline fabric with automated data lineage tracking and data quality checks built into every ingestion layer.
  • Established S3-backed metadata layers to ensure the lake is self-documenting, reducing time-to-insight for downstream data consumers and AI teams by orders of magnitude.
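
To make the CDC and dynamic-schema idea from the list above concrete, here is a minimal Python sketch. It is illustrative only and is not the EFDL's production PySpark code: the event shape (`op`, `key`, `data`) and the function name `apply_cdc` are assumptions made for this example.

```python
from typing import Any

Row = dict[str, Any]


def apply_cdc(table: dict[str, Row], events: list[dict[str, Any]]) -> tuple[dict[str, Row], list[str]]:
    """Apply insert/update/delete events in order; return (new_table, columns).

    The event format is hypothetical: {"op": ..., "key": ..., "data": {...}}.
    """
    result = {key: dict(row) for key, row in table.items()}  # copy; the input is left untouched
    for event in events:
        key, op = event["key"], event["op"]
        if op == "delete":
            result.pop(key, None)
        elif op in ("insert", "update"):
            # Upsert: new values are merged over any existing row
            result[key] = {**result.get(key, {}), **event["data"]}
        else:
            raise ValueError(f"unknown op: {op}")
    # Dynamic schema: the column set is the union of everything seen so far;
    # rows that predate a new column are back-filled with None
    columns = sorted({col for row in result.values() for col in row})
    for row in result.values():
        for col in columns:
            row.setdefault(col, None)
    return result, columns


current = {"c1": {"name": "Acme"}, "c2": {"name": "Globex"}}
events = [
    {"op": "update", "key": "c1", "data": {"name": "Acme AB", "region": "EMEA"}},
    {"op": "delete", "key": "c2"},
    {"op": "insert", "key": "c3", "data": {"name": "Initech"}},
]
merged, schema = apply_cdc(current, events)
```

In this toy run the `region` column arrives mid-stream, so the row for `c3` is back-filled with `None` rather than failing the load.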

How the EFDL Works

1. Ingest: PySpark pipelines pull data from CRM, SAP, REST APIs, and batch files with dynamic schema detection and CDC support.

2. Govern: AWS Lake Formation enforces row/column-level security per domain. Every dataset is tagged, classified, and lineage-tracked automatically.

3. Store: Apache Iceberg tables on S3 provide ACID guarantees, time-travel, and partition evolution, so data is always consistent and queryable historically.

4. Serve: Snowflake sits at the consumption layer, giving BI teams, data scientists, and AI pipelines fast, governed access to clean, curated data.

5. Federate: Each business domain owns its data product. The EFDL provides the rails, not a data silo, enabling enterprise-wide AI at scale.
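
The time-travel and rollback behaviour in the Store step can be pictured with a toy snapshot log. This is a conceptual sketch in plain Python, not Apache Iceberg's API or the EFDL implementation: the class `SnapshotTable` and its methods are hypothetical, and it copies the full state on every commit, whereas a real table format tracks snapshots through metadata rather than full copies.

```python
import copy
from typing import Optional


class SnapshotTable:
    """Toy table (hypothetical): every commit stores a complete snapshot."""

    def __init__(self) -> None:
        self._snapshots: list[dict] = [{}]  # snapshot 0 is the empty table

    def commit(self, changes: dict) -> int:
        """Apply changes as one unit (a value of None deletes the key); return the new snapshot id."""
        new_state = copy.deepcopy(self._snapshots[-1])
        for key, value in changes.items():
            if value is None:
                new_state.pop(key, None)
            else:
                new_state[key] = value
        # The snapshot becomes visible only once it is fully built
        self._snapshots.append(new_state)
        return len(self._snapshots) - 1

    def read(self, snapshot_id: Optional[int] = None) -> dict:
        """Read the latest snapshot, or an older one for time travel."""
        index = -1 if snapshot_id is None else snapshot_id
        return copy.deepcopy(self._snapshots[index])

    def rollback(self, snapshot_id: int) -> int:
        """Roll back by re-committing an old snapshot; earlier history is kept for audit."""
        self._snapshots.append(copy.deepcopy(self._snapshots[snapshot_id]))
        return len(self._snapshots) - 1


table = SnapshotTable()
v1 = table.commit({"order-1": 100})
v2 = table.commit({"order-1": 120, "order-2": 50})
v3 = table.rollback(v1)  # the latest state now matches v1, and v2 is still readable
```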


Tech Stack

Apache Iceberg · AWS Lake Formation · Snowflake · PySpark · AWS S3 · Apache Airflow · CDC Pipelines · Data Lineage · REST API Integration · Data Governance
🤖

Ericsson · Global CRM · 2024–Present

AI-Ready Data Products & Conversational Enterprise AI

Lead Architect

As the Lead Data & AI Architect, I own the end-to-end architecture of Ericsson's AI-Ready Data Products programme — a strategic initiative to transform raw, ungoverned enterprise data into structured, governed, and AI-consumable data products that power both traditional analytics and next-generation GenAI applications.

The crown jewel of this work is the Conversational AI interface I built on top of these governed data products. Instead of writing SQL or opening dashboards, business leaders and decision-makers can now ask questions in plain English — and receive precise, data-backed answers drawn from Ericsson's live enterprise data. Think of it as a natural language BI layer, governed at the source, and trusted at every step.

  • Designed the full data product architecture across multiple CRM systems — from raw ingestion through Snowflake modelling, to semantic governance layers ready for AI consumption.
  • Built automated ingestion pipelines using PySpark, Apache Airflow, and AWS — supporting both real-time updates and batch loads with SLA-driven reliability.
  • Architected a Conversational AI layer using RAG (Retrieval-Augmented Generation) — connecting LLMs to governed Snowflake data products so responses are always grounded in current, trusted enterprise data.
  • Enabled natural language access for analytics and decision-making — reducing time from question to insight from hours to seconds for business stakeholders.
  • Established data governance and quality gates at every layer, ensuring AI outputs are traceable, explainable, and compliant with enterprise data policies.

How Conversational AI Works on Enterprise Data

1. Governed Data Products: CRM and enterprise data flows through Snowflake via PySpark pipelines, and is modelled, documented, and quality-checked before AI ever touches it.

2. Semantic Layer: A metadata-enriched semantic model sits on top of Snowflake, translating business questions into structured queries against governed definitions, which reduces hallucination risk.

3. RAG Pipeline: The LLM retrieves relevant, real-time context from the governed data layer, grounding every response in actual Ericsson data rather than training memory.

4. Natural Language Interface: Business leaders type questions such as "What are the top 5 regions by churn risk this quarter?" and get instant, accurate, explainable answers.

5. Governance & Auditability: Every query and every response is logged, traceable, and governed, so the AI is not just smart but trusted and enterprise-grade.
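
The retrieve, ground, and log flow in the steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the production system: word counts stand in for a real embedding model, no LLM is called, and the names `embed`, `retrieve`, and `build_prompt` as well as the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: dict[str, str], top_k: int = 2) -> list[str]:
    """Return the ids of the top_k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda doc_id: cosine(q, embed(documents[doc_id])), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, documents: dict[str, str], audit_log: list, top_k: int = 2) -> str:
    """Ground the question in retrieved context and record which sources were used."""
    doc_ids = retrieve(question, documents, top_k)
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in doc_ids)
    audit_log.append({"question": question, "sources": doc_ids})  # traceability
    return f"Answer using only the context below.\n{context}\nQuestion: {question}"


# Hypothetical governed snippets; real content would come from the data products
docs = {
    "churn_emea": "Churn risk by region this quarter: EMEA is highest",
    "revenue_apac": "Revenue growth in APAC last year",
    "hr_policy": "Office holiday calendar and leave policy",
}
audit = []
question = "Which region has the highest churn risk this quarter?"
prompt = build_prompt(question, docs, audit, top_k=1)
```

The audit entry records which governed sources fed each answer, which is the property that makes a response traceable after the fact.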


Tech Stack

Snowflake · RAG / LLMs · Vector Databases · PySpark · Apache Airflow · AWS · Semantic Layer · NLP / Conversational AI · Data Products · Embeddings

11 Years, One Direction

From ETL developer to Lead Data & AI Architect at a global tech company — a consistent decade of growing impact, deeper complexity, and higher stakes.

2025 – Present

Lead Data & AI Architect

Ericsson Global Chief Data & AI Office · Bengaluru

This is the pinnacle of my Ericsson journey — operating at the intersection of data engineering, cloud architecture, and Generative AI as the sole architecture driver for some of the most strategic data programmes in the organisation. I am currently architecting the Ericsson Federated Data Lake (EFDL), a multi-zone enterprise lakehouse using Apache Iceberg, AWS Lake Formation, and Snowflake that brings together siloed business domains into one governed, AI-ready data platform. In parallel, I lead the AI-Ready Data Products programme — designing the full pipeline from CRM ingestion to a conversational AI interface that allows business leaders to query enterprise data in plain English. I also own Ericsson's GenAI RAG strategy for the Data Office, setting the standards for chunking, embeddings, vector storage, retrieval frameworks, and LLM governance — making Ericsson's knowledge base accessible, scalable, and enterprise-grade.

  • Solo architecture driver for the Ericsson Federated Data Lake (EFDL) — multi-zone lakehouse with ACID, time-travel, CDC, and automated lineage
  • End-to-end ownership of AI-Ready Data Products across multiple CRM systems with conversational AI on top
  • Defining enterprise GenAI RAG strategy — chunking, embeddings, vector stores, and LLM governance at scale

Sep 2023 – Mar 2025

Data & Analytics Architect

Ericsson · Bengaluru

Stepping into an architect role for the first time at Ericsson, I took ownership of the Data Analytics Platform (DAP) — designing and maintaining the data pipeline infrastructure that moved data from a wide range of non-SAP source systems into Snowflake. The work was complex because Ericsson's source landscape is highly fragmented — each business system has its own schema, latency, and access pattern. I led proof-of-concept assessments to evaluate source system compatibility with the DAP framework, drove architectural changes based on evolving project requirements, and introduced data governance and security controls that brought the platform into compliance with internal and regulatory standards. I also built a monitoring and observability layer in Power BI to give the team real-time visibility into pipeline health, while simultaneously cutting cloud data costs through targeted storage and compute optimisation.

  • Designed and maintained robust pipelines from non-SAP sources to Snowflake landing zones
  • Led PoC assessments and architectural governance across the DAP framework
  • Built Power BI monitoring dashboards; significantly reduced data storage and processing costs

May 2022 – Aug 2023

Lead Data Engineer

Ericsson · Bengaluru

My entry into Ericsson came with immediate leadership responsibility — managing a team of 7 engineers through one of the most technically demanding migrations the team had undertaken: moving the entire data platform from MapR to AWS. MapR was reaching end-of-life, and the migration had to be done without disrupting live reporting for a global business. I was responsible for the end-to-end PySpark ingestion pipeline architecture, ensuring data flows were rebuilt on AWS with improved reliability, observability, and performance. Beyond the migration, I built a suite of Tableau dashboards sourcing from AWS Athena and SAP HANA — giving business stakeholders self-serve access to operational metrics. It was here I first learned how to balance technical depth with stakeholder management, a skill I carry into every project today.

  • Led a team of 7 engineers through a full-scale migration from MapR to AWS
  • Architected end-to-end PySpark data ingestion pipelines on AWS
  • Built Tableau dashboards using AWS Athena and SAP HANA for business reporting

May 2021 – May 2022

Data Engineer III

Walmart Global Tech · India

At Walmart Global Tech, I worked on the data infrastructure powering Walmart's online delivery platform — a high-throughput system where real-time data accuracy directly impacts millions of deliveries. My primary focus was building and maintaining delivery performance metrics pipelines using Spark, Python, and GCP, processing driver behaviour data to enable automated performance tracking and payments. I engineered ETL pipelines from raw Kafka JSON events — handling the messiness of streaming data at retail scale — and implemented Slowly Changing Dimension Type 2 (SCD2) logic to maintain accurate historical records of store openings and closings. This role deepened my command of distributed computing and event-driven architectures in a high-stakes, high-volume production environment.

  • Built delivery metrics pipelines on Spark, Python, and GCP for Walmart's last-mile platform
  • Processed Kafka event streams from raw JSON into structured analytical datasets
  • Implemented SCD2 logic for accurate historical tracking of store operational data
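
For readers unfamiliar with SCD2, the pattern mentioned above can be sketched as follows. This is a minimal plain-Python illustration, not the Spark code used at Walmart: the column names (`store_id`, `status`, `valid_from`, `valid_to`) and the helper names are assumptions for this example.

```python
from datetime import date
from typing import Optional


def apply_scd2(history: list[dict], store_id: str, status: str, effective: date) -> list[dict]:
    """Return a new history in which a status change closes the current row and opens a new one."""
    rows = [dict(r) for r in history]  # copy; the input is left untouched
    current = next((r for r in rows if r["store_id"] == store_id and r["valid_to"] is None), None)
    if current is not None:
        if current["status"] == status:
            return rows  # nothing changed, so no new version is written
        current["valid_to"] = effective  # close the old version
    rows.append({"store_id": store_id, "status": status, "valid_from": effective, "valid_to": None})
    return rows


def status_on(history: list[dict], store_id: str, day: date) -> Optional[str]:
    """Look up the status that was valid on a given day."""
    for row in history:
        if row["store_id"] == store_id and row["valid_from"] <= day and (
            row["valid_to"] is None or day < row["valid_to"]
        ):
            return row["status"]
    return None


history: list[dict] = []
history = apply_scd2(history, "S1", "open", date(2021, 1, 1))
history = apply_scd2(history, "S1", "open", date(2021, 3, 1))    # no change, no new row
history = apply_scd2(history, "S1", "closed", date(2021, 6, 1))  # closes the "open" row
```

Because old rows are closed rather than overwritten, a question such as "was this store open in April?" can still be answered after the store has closed.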

Dec 2020 – May 2021

Data Engineer

TCS · India

At TCS, I was embedded within a financial services engagement, working on a data warehouse modernisation project for a leading Indian bank. The bank was migrating its core data infrastructure from Teradata to MapR — a significant architectural shift that required converting years of Teradata SQL and ETL logic into Spark-based pipelines for Hive. I was responsible for converting these legacy Teradata scripts into Spark ETL jobs and maintaining the reporting layers for card transaction data — ensuring business-critical dashboards stayed accurate throughout the migration. This role gave me a deep grounding in data warehouse patterns, migration methodology, and the nuances of financial data at scale.

  • Led ETL migration for a major bank from Teradata to MapR/Hive using Spark
  • Converted legacy Teradata scripts into scalable Spark ETL pipelines
  • Maintained reporting layers for card transaction data throughout migration

Aug 2018 – Dec 2020

Big Data Consultant

Deloitte · India

Deloitte was where I first encountered the full complexity of enterprise data at consulting scale — working across industries and client environments with real accountability. My most significant project was building a real-time ETL pipeline on AWS Glue using PySpark for an insurance client's telematics data — processing live driver behaviour signals (speed, braking, route patterns) to power a metadata-driven rules engine for premium pricing. I also worked on a GDPR compliance initiative, designing and implementing data masking logic across a large data warehouse to ensure sensitive customer records were properly anonymised before reaching analytics layers. This was my first deep exposure to cloud-native data engineering, privacy regulation, and the responsibility that comes with handling sensitive consumer data at scale.

  • Built real-time AWS Glue / PySpark pipeline processing insurance telematics data at scale
  • Implemented metadata-driven telematics rules engine for dynamic pricing logic
  • Designed GDPR-compliant data masking across the enterprise data warehouse
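
One common way to mask sensitive columns before they reach analytics layers is keyed hashing, sketched below in Python. This is an assumption about technique for illustration only and does not describe the client's actual masking rules: the field list, the key handling, and the names `mask_value` and `mask_record` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical list of sensitive columns; a real one would come from data classification
SENSITIVE_FIELDS = frozenset({"email", "phone"})


def mask_value(value: str, key: bytes) -> str:
    """Replace a value with a keyed SHA-256 digest, truncated to 16 hex characters."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def mask_record(record: dict, key: bytes, sensitive: frozenset = SENSITIVE_FIELDS) -> dict:
    """Return a copy of the record with sensitive, non-null fields masked."""
    return {
        field: mask_value(str(value), key) if field in sensitive and value is not None else value
        for field, value in record.items()
    }


key = b"example-key-not-for-production"  # assumption: a real key would live in a secrets manager
customer = {"customer_id": 42, "email": "jane@example.com", "phone": None, "segment": "retail"}
masked = mask_record(customer, key)
```

Because the same input and key always produce the same token, masked columns can still be joined and counted without exposing the underlying values.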

Dec 2017 – Aug 2018

ETL Developer

GSPANN Technologies · India

At GSPANN, I was part of a data warehouse modernisation engagement for a leading contract manufacturer undergoing a major infrastructure overhaul. The project involved migrating the organisation's entire analytical data warehouse from Teradata to Cloudera HDFS — re-engineering ETL workflows, migrating key tables, and materialising critical views that the business used for production metrics analysis. This role was foundational: it gave me my first taste of big data platforms, hands-on Hadoop ecosystem experience, and the rigour of migrating production-grade analytical systems without disrupting business operations.

  • Migrated data warehouse from Teradata to Cloudera HDFS for a global contract manufacturer
  • Materialised key analytical tables powering production performance metrics

May 2015 – Nov 2017

Senior Systems Engineer

Infosys · India

Infosys was where the journey began — my first role out of university, and the place that instilled the discipline, process rigour, and technical foundation I've built everything on since. I worked on a large-scale data warehouse migration from Teradata to Cloudera HDFS, gaining hands-on experience with ETL processes, data modelling, and the operational demands of enterprise data systems. Working within Infosys's structured delivery framework taught me how to operate in complex, multi-team environments, meet exacting quality standards, and understand data as a business-critical asset — not just a technical artefact. Every senior role I've held since traces its roots back to the fundamentals I built here.

  • Migrated enterprise data warehouse from Teradata to Cloudera HDFS
  • Maintained ETL processes and materialized views for business reporting

Impact That Speaks For Itself

Milestones that define a career built on bold decisions and measurable outcomes.

💰

65M SEK in Cost Savings

Led the migration from AWS EMR and SAP to Snowflake, delivering a streamlined architecture that saved Ericsson 65 million SEK — one of the most impactful infrastructure transformations in the organization.

🏆

CEO Recognition — LTVP Award

Selected as a Key Contributor in Ericsson's 2025 Long-Term Variable Pay (LTVP) program by the CEO — a recognition for individuals embodying role model behavior and long-term strategic impact.

❄️

Snowflake Data Superhero 2024 & 2025

Recognized two years in a row by Snowflake for outstanding community contributions, technical depth, and thought leadership. One of a very select group honored with this distinction globally.

🚀

Double Promotion at Ericsson

Promoted twice in quick succession at Ericsson, from Lead Data Engineer to Data & Analytics Architect to Lead Data & AI Architect, reflecting consistent delivery, leadership, and the trust of senior stakeholders.

Awards & Honours

Recognition from the highest levels — a CEO award, global community honours, and consistent career acceleration.

🏆

CEO Award

Selected as a Key Contributor by Ericsson's CEO — reserved for individuals embodying long-term strategic thinking and role model behaviour.

Ericsson · 2025
❄️

Snowflake Data Superhero

Recognized two consecutive years by Snowflake for outstanding technical contributions and global community leadership.

2024 & 2025
🚀

Double Promotion

Promoted twice in quick succession at Ericsson: Lead Data Engineer → Data & Analytics Architect → Lead Data & AI Architect.

Ericsson · 2022–2025

Peers & Managers Recommend

Unsolicited recommendations from people who worked alongside me.

LinkedIn Recommendation

Certifications

Continuous learning as a commitment, not a checkbox.

🤖

Advanced Certification in Generative AI

upGrad · 10-Month Program

❄️

SnowPro Core Certified

Snowflake

☁️

AWS Certified Data Engineer

Amazon Web Services

🦸

Snowflake Data Superhero

2024 & 2025 · Two Consecutive Years

Academic Foundation

Master of Science in Data Science & Engineering

Birla Institute of Technology and Science, Pilani

Mar 2021 – Feb 2023

Bachelor of Engineering

Sathyabama University, Chennai

Aug 2011 – Jul 2015

Let's Talk Data & AI

Whether you're a business leader looking for a data strategy partner, a recruiter working on an exciting opportunity, or someone who wants to discuss the future of AI-ready data infrastructure — I'd love to hear from you.