Arihant Shashank — Lead Data & AI Architect

11+ Years of Experience

About Me

Building the Data Foundation for the AI Era

I'm a Lead Data & AI Architect at Ericsson's Global Chief Data & AI Office, with over 11 years of experience turning raw, complex data into strategic enterprise assets. My work sits at the intersection of data engineering, cloud architecture, and Generative AI.

From migrating legacy systems to Snowflake and saving 65M SEK, to defining Ericsson's GenAI RAG strategy and building AI-ready data products — I focus on outcomes that leadership can see and measure.

I've been recognised by Ericsson's CEO through the Long-Term Variable Pay (LTVP) programme, which is reserved for top contributors who embody long-term strategic thinking. I've also been named a Snowflake Data Superhero two years running (2024 & 2025).

Data Architecture · Snowflake · AWS · PySpark / Spark · Generative AI · RAG (Retrieval-Augmented Generation) · LLM Integration · AI Agent Creation · Agentic AI Pipelines · Vector Databases · LoRA / Fine-tuning · MCP (Model Context Protocol) · Embeddings & Semantic Search · Apache Iceberg · Apache Airflow · Kafka · Python / Scala · AWS Lake Formation · AWS Glue · AWS Athena · Tableau · Power BI · Data Governance · Lakehouse Design · CDC Pipelines · Data Lineage · Data Products · GCP · Teradata · Databricks

My Time Allocation

How I Spend My Week

50%

Data Architecture & Engineering

Designing lakehouses, data pipelines, federated architectures, governance frameworks, and scalable data products end-to-end.

25%

AI & GenAI Strategy

RAG pipelines, LLM integration, vector stores, AI agent design, GenAI platform strategy, and AI governance at enterprise scale.

15%

Leadership & Team Management

Leading engineers, owning delivery, architecture reviews, mentoring, and driving technical decisions across cross-functional teams.

10%

Stakeholder Management & Strategy

Executive presentations, CDO-level briefings, cross-org alignment, roadmap planning, and shaping long-term data & AI strategy.

Flagship Work

Projects That Define My Career

Not just roles — real, complex, enterprise-scale programmes I own end-to-end at Ericsson.

🏗️

Ericsson · Global · 2024–Present

Ericsson Federated Data Lake (EFDL)

Solo Architecture Driver

The Ericsson Federated Data Lake (EFDL) is one of the most ambitious data infrastructure programmes at Ericsson's Global Chief Data & AI Office — and I am its sole architecture driver. From the ground up, I designed and continue to engineer a multi-zone, federated lakehouse that unifies data across Ericsson's siloed business domains into a single, governed, and AI-ready data platform.

The challenge was significant: Ericsson operates across dozens of business units globally, each with its own data sources, formats, and governance requirements. The EFDL had to bring all of this together without creating a monolithic bottleneck — hence the federated approach, where each domain retains ownership while consuming and publishing to a shared, standardised lakehouse layer.

▸ Designed a multi-zone architecture (Raw → Cleansed → Curated → Consumption) using Apache Iceberg as the open table format, AWS Lake Formation for access control, and Snowflake as the serving layer.
▸ Built PySpark-based ingestion pipelines with dynamic schema handling and Change Data Capture (CDC) support — so the lake stays current as source systems evolve, with zero manual intervention.
▸ Engineered ACID transaction support and time-travel capabilities — enabling rollback, historical analysis, and audit trails at any point in time across all datasets.
▸ Integrated REST APIs and batch sources into a unified pipeline fabric with automated data lineage tracking and data quality checks built into every ingestion layer.
▸ Established S3-backed metadata layers to ensure the lake is self-documenting, reducing time-to-insight for downstream data consumers and AI teams by orders of magnitude.
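
To make the CDC and dynamic-schema bullets above concrete, here is a minimal Python sketch. It is illustrative only and not the EFDL pipeline code: the event shape (`op`, `key`, `row`) and the function names `apply_cdc` and `new_columns` are assumptions, and a production version would run as a PySpark job merging into Iceberg tables rather than over in-memory dictionaries.

```python
from typing import Any

# Hypothetical CDC event shape (an assumption for this sketch):
# {"op": "insert" | "update" | "delete", "key": <primary key>, "row": {column: value}}


def apply_cdc(table: dict[Any, dict], events: list[dict]) -> dict[Any, dict]:
    """Apply ordered change events to a keyed table and return the new state."""
    result = {k: dict(v) for k, v in table.items()}  # copy so the input stays untouched
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Upsert: lay the incoming column values over whatever is stored.
            merged = dict(result.get(key, {}))
            merged.update(event["row"])
            result[key] = merged
        elif op == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown CDC op: {op!r}")
    return result


def new_columns(table: dict[Any, dict], events: list[dict]) -> set[str]:
    """Columns that appear in incoming events but not yet in the table (schema drift)."""
    known = {col for row in table.values() for col in row}
    incoming = {col for e in events if e["op"] != "delete" for col in e["row"]}
    return incoming - known


current = {
    1: {"name": "Acme", "country": "SE"},
    2: {"name": "Globex", "country": "IN"},
}
changes = [
    {"op": "update", "key": 1, "row": {"country": "NO"}},
    {"op": "delete", "key": 2},
    {"op": "insert", "key": 3, "row": {"name": "Initech", "country": "US", "segment": "SMB"}},
]
updated = apply_cdc(current, changes)
drift = new_columns(current, changes)
```

In this example `updated` holds customers 1 and 3, and `drift` flags the previously unseen `segment` column so the target schema can be evolved before the merge.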

How the EFDL Works

Ingest: PySpark pipelines pull data from CRM, SAP, REST APIs, and batch files with dynamic schema detection and CDC support.

Govern: AWS Lake Formation enforces row/column-level security per domain. Every dataset is tagged, classified, and lineage-tracked automatically.

Store: Apache Iceberg tables on S3 provide ACID guarantees, time-travel, and partition evolution — so data is always consistent and queryable historically.

Serve: Snowflake sits at the consumption layer — giving BI teams, data scientists, and AI pipelines fast, governed access to clean, curated data.

Federate: Each business domain owns its data product. The EFDL provides the rails — not a data silo — enabling enterprise-wide AI at scale.
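
The time-travel idea in the Store step can be pictured with a toy model. The sketch below is an illustration under stated assumptions, not how Apache Iceberg is implemented: Iceberg tracks snapshots through metadata and manifest files rather than full copies, and the `SnapshotLog` class and integer timestamps are hypothetical. It shows only the core lookup, where a read "as of" time t resolves to the latest snapshot committed at or before t.

```python
from bisect import bisect_right


class SnapshotLog:
    """Toy append-only log of table snapshots, ordered by commit timestamp."""

    def __init__(self) -> None:
        self._timestamps: list[int] = []
        self._snapshots: list[tuple] = []

    def commit(self, ts: int, rows: list[dict]) -> None:
        """Record the full table state as of commit time ts."""
        if self._timestamps and ts <= self._timestamps[-1]:
            raise ValueError("commit timestamps must be strictly increasing")
        self._timestamps.append(ts)
        # Store copies so later edits to the caller's rows cannot rewrite history.
        self._snapshots.append(tuple(dict(r) for r in rows))

    def as_of(self, ts: int) -> list[dict]:
        """Return the table as it looked at time ts."""
        idx = bisect_right(self._timestamps, ts)  # number of commits at or before ts
        if idx == 0:
            raise LookupError("no snapshot exists at or before this timestamp")
        return [dict(r) for r in self._snapshots[idx - 1]]


log = SnapshotLog()
log.commit(100, [{"id": 1, "status": "open"}])
log.commit(200, [{"id": 1, "status": "closed"}])
before = log.as_of(150)  # state between the two commits
after = log.as_of(200)   # state at the second commit
```

Rollback and audit queries follow from the same lookup: reading at an earlier timestamp returns the older state without touching the newer one.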

Tech Stack

Apache Iceberg · AWS Lake Formation · Snowflake · PySpark · AWS S3 · Apache Airflow · CDC Pipelines · Data Lineage · REST API Integration · Data Governance

🤖

Ericsson · Global CRM · 2024–Present

AI-Ready Data Products & Conversational Enterprise AI

Lead Architect

As the Lead Data & AI Architect, I own the end-to-end architecture of Ericsson's AI-Ready Data Products programme — a strategic initiative to transform raw, ungoverned enterprise data into structured, governed, and AI-consumable data products that power both traditional analytics and next-generation GenAI applications.

The crown jewel of this work is the Conversational AI interface I built on top of these governed data products. Instead of writing SQL or opening dashboards, business leaders and decision-makers can now ask questions in plain English — and receive precise, data-backed answers drawn from Ericsson's live enterprise data. Think of it as a natural language BI layer, governed at the source, and trusted at every step.

▸ Designed the full data product architecture across multiple CRM systems — from raw ingestion through Snowflake modelling, to semantic governance layers ready for AI consumption.
▸ Built automated ingestion pipelines using PySpark, Apache Airflow, and AWS — supporting both real-time updates and batch loads with SLA-driven reliability.
▸ Architected a Conversational AI layer using RAG (Retrieval-Augmented Generation) — connecting LLMs to governed Snowflake data products so responses are always grounded in current, trusted enterprise data.
▸ Enabled natural language access for analytics and decision-making — reducing time from question to insight from hours to seconds for business stakeholders.
▸ Established data governance and quality gates at every layer, ensuring AI outputs are traceable, explainable, and compliant with enterprise data policies.

How Conversational AI Works on Enterprise Data

Governed Data Products: CRM and enterprise data flows through Snowflake via PySpark pipelines — modelled, documented, and quality-checked before AI ever touches it.

Semantic Layer: A metadata-enriched semantic model sits on top of Snowflake, translating business questions into structured queries and reducing the risk of hallucinated answers.

RAG Pipeline: The LLM retrieves relevant, real-time context from the governed data layer — grounding every response in actual Ericsson data, not training memory.

Natural Language Interface: Business leaders type questions — "What are the top 5 regions by churn risk this quarter?" — and get instant, accurate, explainable answers.

Governance & Auditability: Every query, every response is logged, traceable, and governed — so the AI is not just smart, it's trusted and enterprise-grade.
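
The retrieve-then-ground flow described above can be sketched in a few lines of Python. This is a simplified stand-in, not the production design: real RAG systems typically use an embedding model and a vector database, whereas this sketch substitutes bag-of-words cosine similarity so it runs with the standard library alone. The `DOCS` catalogue, the prompt wording, and all function names are hypothetical, and no LLM is called.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a grounded prompt; the LLM call itself is out of scope here."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}"
    )


# Hypothetical data product descriptions standing in for governed metadata.
DOCS = [
    "churn_risk table: churn risk score per region and quarter",
    "orders table: order value per customer and month",
    "tickets table: support ticket counts per product",
]
question = "top regions by churn risk this quarter"
top = retrieve(question, DOCS, k=1)
prompt = build_prompt(question, top)
```

The design point is that the model only sees context selected from the governed layer, so the answer can be traced back to the sources that were retrieved.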

Tech Stack

Snowflake · RAG / LLMs · Vector Databases · PySpark · Apache Airflow · AWS · Semantic Layer · NLP / Conversational AI · Data Products · Embeddings

📊

Ericsson · Enterprise AI Intelligence · 2024–Present

AI Aggregate — GenAI Usage Intelligence Platform

Data Product + Conversational AI

Ericsson runs a diverse portfolio of enterprise GenAI applications: Glean (enterprise search AI), Amazon Q (AWS-native AI assistant), Ericsson Playground (an internal ChatGPT-equivalent where employees can select and converse with any LLM), Sigma (AI capabilities from Azure AI Foundry), and several others. Each of these applications generates rich usage telemetry: who is using which model, how often, how many tokens they consume, and from which teams and geographies.

The problem: all of this data was siloed. Each application had its own native database and storage system. The API Gateway captured aggregate signals, but nobody had a unified view of GenAI adoption across the organisation. Leadership had no way to answer the most critical questions about their AI investment. I built the solution.

The AI Aggregate project collates usage data from every GenAI application at Ericsson into a unified intelligence platform. By integrating with SAP SuccessFactors (Ericsson's HR system), the platform enriches raw usage metrics with organisational context, mapping every user action to a team, function, country, and reporting line. This turns raw telemetry into actionable people-and-AI intelligence at enterprise scale.

▸ Ingested usage telemetry from all GenAI applications (Glean, Amazon Q, Playground, Sigma, and others) via API Gateway into a centralised data pipeline — unifying previously siloed, app-native data stores.
▸ Integrated with SAP SuccessFactors to enrich usage data with HR/org data — enabling model usage to be analysed by team, business unit, country, seniority, and reporting hierarchy.
▸ Created a consolidated data product spanning all GenAI applications — providing a single source of truth for enterprise-wide AI adoption — alongside individual data products per application for deep-dive analysis.
▸ Built a Conversational AI interface on top of governed data products — allowing line managers and leadership to ask natural-language questions and receive instant, data-backed answers.
▸ Enabled real-time monitoring of token consumption, model preferences, usage frequency, geographic distribution, and team-level AI adoption — giving the CDO Office a live pulse on Ericsson's AI maturity.

Questions Leaders Can Now Answer Instantly

"Which LLM model is most popular across Ericsson?" — Real-time breakdown of model usage (GPT-4, Claude, Llama, etc.) across all Playground and Sigma sessions.

"Which country is most AI-savvy?" — Usage intensity per geography, enriched with org size from SuccessFactors to produce per-capita AI adoption scores.

"Which team has consumed the most tokens this month?" — Cost attribution and token burn by team, BU, and reporting line — critical for AI budget governance.

"How many employees are actively using GenAI tools?" — DAU/MAU-style engagement metrics per application, segmented by org level and function.

"Which business units need more AI enablement?" — Adoption gap analysis by comparing usage intensity against team size — surfacing where AI upskilling is needed most.

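Two of the metrics named above, per-capita adoption and DAU/MAU engagement, reduce to simple arithmetic once usage events are joined with headcount. The Python sketch below is a hedged illustration with made-up sample data: the event fields (`user`, `country`, `day`), the `headcount` mapping, and the function names are assumptions, not the AI Aggregate schema.

```python
from collections import defaultdict


def per_capita_usage(events: list[dict], headcount: dict[str, int]) -> dict[str, float]:
    """Requests per employee for each country; countries with no headcount are skipped."""
    totals: dict[str, int] = defaultdict(int)
    for e in events:
        totals[e["country"]] += 1
    return {c: totals[c] / headcount[c] for c in totals if headcount.get(c)}


def active_users(events: list[dict], days: set[int]) -> set[str]:
    """Distinct users with at least one event on any of the given days."""
    return {e["user"] for e in events if e["day"] in days}


def stickiness(events: list[dict], day: int, month_days: set[int]) -> float:
    """DAU/MAU ratio: share of the month's active users who were active on one day."""
    mau = active_users(events, month_days)
    dau = active_users(events, {day})
    return len(dau) / len(mau) if mau else 0.0


# Made-up sample data for illustration only.
events = [
    {"user": "u1", "country": "SE", "day": 1},
    {"user": "u1", "country": "SE", "day": 2},
    {"user": "u2", "country": "SE", "day": 2},
    {"user": "u3", "country": "IN", "day": 2},
]
headcount = {"SE": 4, "IN": 10}  # stand-in for org size from the HR system

scores = per_capita_usage(events, headcount)
ratio = stickiness(events, day=1, month_days={1, 2})
```

Normalising by headcount is what keeps a small, heavily engaged country from being hidden behind a large one with more raw requests.
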
Applications Integrated

Glean · Amazon Q · Ericsson Playground · Azure AI Foundry (Sigma) · SAP SuccessFactors · API Gateway

Tech Stack

Snowflake · Databricks · PySpark · Apache Airflow · RAG / LLMs · Conversational AI · Data Products · AWS · Azure · REST APIs

Career Journey

11 Years, One Direction

From ETL developer to Lead Data & AI Architect at a global tech company — a consistent decade of growing impact, deeper complexity, and higher stakes.

2025 – Present

Lead Data & AI Architect

Ericsson Global Chief Data & AI Office · Bengaluru

This is the pinnacle of my Ericsson journey — operating at the intersection of data engineering, cloud architecture, and Generative AI as the sole architecture driver for some of the most strategic data programmes in the organisation. I am currently architecting the Ericsson Federated Data Lake (EFDL), a multi-zone enterprise lakehouse using Apache Iceberg, AWS Lake Formation, and Snowflake that brings together siloed business domains into one governed, AI-ready data platform. In parallel, I lead the AI-Ready Data Products programme — designing the full pipeline from CRM ingestion to a conversational AI interface that allows business leaders to query enterprise data in plain English. I also own the AI Aggregate platform — a unified intelligence system that collates usage telemetry from all of Ericsson's enterprise GenAI applications (Glean, Amazon Q, Ericsson Playground, Sigma) and enriches it with HR data from SuccessFactors, giving leadership a live, organisation-wide view of AI adoption and token consumption. And I own Ericsson's GenAI RAG strategy for the Data Office, setting the standards for chunking, embeddings, vector storage, retrieval frameworks, and LLM governance — making Ericsson's knowledge base accessible, scalable, and enterprise-grade.

Solo architecture driver for the Ericsson Federated Data Lake (EFDL) — multi-zone lakehouse with ACID, time-travel, CDC, and automated lineage
End-to-end ownership of AI-Ready Data Products across multiple CRM systems with conversational AI on top
Architected the AI Aggregate platform — unified GenAI usage intelligence across all enterprise AI applications, enriched with SuccessFactors org data
Defining enterprise GenAI RAG strategy — chunking, embeddings, vector stores, and LLM governance at scale

Sep 2023 – Mar 2025

Data & Analytics Architect

Ericsson · Bengaluru

Stepping into an architect role for the first time at Ericsson, I took ownership of the Data Analytics Platform (DAP) — designing and maintaining the data pipeline infrastructure that moved data from a wide range of non-SAP source systems into Snowflake. The work was complex because Ericsson's source landscape is highly fragmented — each business system has its own schema, latency, and access pattern. I led proof-of-concept assessments to evaluate source system compatibility with the DAP framework, drove architectural changes based on evolving project requirements, and introduced data governance and security controls that brought the platform into compliance with internal and regulatory standards. I also built a monitoring and observability layer in Power BI to give the team real-time visibility into pipeline health, while simultaneously cutting cloud data costs through targeted storage and compute optimisation.

Designed and maintained robust pipelines from non-SAP sources to Snowflake landing zones
Led PoC assessments and architectural governance across the DAP framework
Built Power BI monitoring dashboards; significantly reduced data storage and processing costs

May 2022 – Aug 2023

Lead Data Engineer

Ericsson · Bengaluru

My entry into Ericsson came with immediate leadership responsibility — managing a team of 7 engineers through one of the most technically demanding migrations the team had undertaken: moving the entire data platform from MapR to AWS. MapR was reaching end-of-life, and the migration had to be done without disrupting live reporting for a global business. I was responsible for the end-to-end PySpark ingestion pipeline architecture, ensuring data flows were rebuilt on AWS with improved reliability, observability, and performance. Beyond the migration, I built a suite of Tableau dashboards sourcing from AWS Athena and SAP HANA — giving business stakeholders self-serve access to operational metrics. It was here I first learned how to balance technical depth with stakeholder management, a skill I carry into every project today.

Led a team of 7 engineers through a full-scale migration from MapR to AWS
Architected end-to-end PySpark data ingestion pipelines on AWS
Built Tableau dashboards using AWS Athena and SAP HANA for business reporting

May 2021 – May 2022

Data Engineer III

Walmart Global Tech · India

At Walmart Global Tech, I worked on the data infrastructure powering Walmart's online delivery platform — a high-throughput system where real-time data accuracy directly impacts millions of deliveries. My primary focus was building and maintaining delivery performance metrics pipelines using Spark, Python, and GCP, processing driver behaviour data to enable automated performance tracking and payments. I engineered ETL pipelines from raw Kafka JSON events — handling the messiness of streaming data at retail scale — and implemented Slowly Changing Dimension Type 2 (SCD2) logic to maintain accurate historical records of store openings and closings. This role deepened my command of distributed computing and event-driven architectures in a high-stakes, high-volume production environment.

Built delivery metrics pipelines on Spark, Python, and GCP for Walmart's last-mile platform
Processed Kafka event streams from raw JSON into structured analytical datasets
Implemented SCD2 logic for accurate historical tracking of store operational data
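
For readers unfamiliar with SCD Type 2, the pattern can be shown in a short Python sketch. It is a generic illustration rather than the Walmart pipeline: the record layout (`key`, `attrs`, `valid_from`, `valid_to`) and the `apply_scd2` helper are assumptions. When an attribute changes, the open version is closed and a new version is appended, so history is never overwritten.

```python
from datetime import date
from typing import Optional


def apply_scd2(history: list[dict], key: str, new_attrs: dict, effective: date) -> list[dict]:
    """Return a new history with an SCD Type 2 change applied for one business key."""
    out = [dict(r) for r in history]  # copy rows so the input list is not mutated
    current: Optional[dict] = next(
        (r for r in out if r["key"] == key and r["valid_to"] is None), None
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return out  # nothing changed, keep the open version as it is
        current["valid_to"] = effective  # close the previous version
    out.append(
        {"key": key, "attrs": dict(new_attrs), "valid_from": effective, "valid_to": None}
    )
    return out


history: list[dict] = []
history = apply_scd2(history, "store-42", {"status": "open"}, date(2021, 1, 1))
history = apply_scd2(history, "store-42", {"status": "closed"}, date(2021, 6, 1))
```

After these two calls the history holds both versions: the "open" row valid from January to June 2021, and the "closed" row with an open-ended `valid_to`.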

Dec 2020 – May 2021

Data Engineer

TCS · India

At TCS, I was embedded within a financial services engagement, working on a data warehouse modernisation project for a leading Indian bank. The bank was migrating its core data infrastructure from Teradata to MapR — a significant architectural shift that required converting years of Teradata SQL and ETL logic into Spark-based pipelines for Hive. I was responsible for converting these legacy Teradata scripts into Spark ETL jobs and maintaining the reporting layers for card transaction data — ensuring business-critical dashboards stayed accurate throughout the migration. This role gave me a deep grounding in data warehouse patterns, migration methodology, and the nuances of financial data at scale.

Led ETL migration for a major bank from Teradata to MapR/Hive using Spark
Converted legacy Teradata scripts into scalable Spark ETL pipelines
Maintained reporting layers for card transaction data throughout migration

Aug 2018 – Dec 2020

Big Data Consultant

Deloitte · India

Deloitte was where I first encountered the full complexity of enterprise data at consulting scale — working across industries and client environments with real accountability. My most significant project was building a real-time ETL pipeline on AWS Glue using PySpark for an insurance client's telematics data — processing live driver behaviour signals (speed, braking, route patterns) to power a metadata-driven rules engine for premium pricing. I also worked on a GDPR compliance initiative, designing and implementing data masking logic across a large data warehouse to ensure sensitive customer records were properly anonymised before reaching analytics layers. This was my first deep exposure to cloud-native data engineering, privacy regulation, and the responsibility that comes with handling sensitive consumer data at scale.

Built real-time AWS Glue / PySpark pipeline processing insurance telematics data at scale
Implemented metadata-driven telematics rules engine for dynamic pricing logic
Designed GDPR-compliant data masking across the enterprise data warehouse
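
A metadata-driven rules engine keeps the rules as data, so pricing logic can change without a code release. The Python sketch below illustrates that idea only; it is not the client implementation, and the rule fields, thresholds, penalty values, and signal names (`speed_kmh`, `decel_ms2`) are all hypothetical.

```python
import operator

# Comparison operators that rule metadata is allowed to reference.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}

# Hypothetical rule metadata; in practice this could live in a table or config file.
RULES = [
    {"name": "speeding", "field": "speed_kmh", "op": ">", "threshold": 120, "penalty": 5},
    {"name": "harsh_braking", "field": "decel_ms2", "op": ">=", "threshold": 4.0, "penalty": 3},
]


def evaluate(event: dict, rules: list[dict]) -> list[str]:
    """Return the names of all rules that fire for one telematics event."""
    fired = []
    for rule in rules:
        value = event.get(rule["field"])
        # Events missing the field simply do not trigger the rule.
        if value is not None and OPS[rule["op"]](value, rule["threshold"]):
            fired.append(rule["name"])
    return fired


def risk_score(events: list[dict], rules: list[dict]) -> int:
    """Sum the penalties of every rule fired across a sequence of events."""
    penalties = {r["name"]: r["penalty"] for r in rules}
    return sum(penalties[name] for e in events for name in evaluate(e, rules))


trip = [
    {"speed_kmh": 130, "decel_ms2": 1.0},
    {"speed_kmh": 90, "decel_ms2": 4.5},
    {"speed_kmh": 80},
]
score = risk_score(trip, RULES)
```

Adding a new behaviour signal then means appending one entry to `RULES`, with no change to `evaluate` or `risk_score`.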

Dec 2017 – Aug 2018

ETL Developer

GSPANN Technologies · India

At GSPANN, I was part of a data warehouse modernisation engagement for a leading contract manufacturer undergoing a major infrastructure overhaul. The project involved migrating the organisation's entire analytical data warehouse from Teradata to Cloudera HDFS — re-engineering ETL workflows, migrating key tables, and materialising critical views that the business used for production metrics analysis. This role was foundational: it gave me my first taste of big data platforms, hands-on Hadoop ecosystem experience, and the rigour of migrating production-grade analytical systems without disrupting business operations.

Migrated data warehouse from Teradata to Cloudera HDFS for a global contract manufacturer
Materialised key analytical tables powering production performance metrics

May 2015 – Nov 2017

Senior Systems Engineer

Infosys · India

Infosys was where the journey began — my first role out of university, and the place that instilled the discipline, process rigour, and technical foundation I've built everything on since. I worked on a large-scale data warehouse migration from Teradata to Cloudera HDFS, gaining hands-on experience with ETL processes, data modelling, and the operational demands of enterprise data systems. Working within Infosys's structured delivery framework taught me how to operate in complex, multi-team environments, meet exacting quality standards, and understand data as a business-critical asset — not just a technical artefact. Every senior role I've held since traces its roots back to the fundamentals I built here.

Migrated enterprise data warehouse from Teradata to Cloudera HDFS
Maintained ETL processes and materialized views for business reporting

Recognition

Awards & Honours

Recognition from the highest levels — a CEO award, global community honours, and consistent career acceleration.

🏆

CEO Award

Selected as a Key Contributor by Ericsson's CEO — reserved for individuals embodying long-term strategic thinking and role model behaviour.

Ericsson · 2025

❄️

Snowflake Data Superhero

Recognised by Snowflake for two consecutive years for outstanding technical contributions and global community leadership.

2024 & 2025

🚀

Double Promotion

Promoted twice in quick succession at Ericsson: Lead Data Engineer → Data & Analytics Architect → Lead Data & AI Architect.

Ericsson · 2022–2025

What Colleagues Say

Arihant's expertise in data engineering, AI, and the Snowflake ecosystem is impressive, and his ability to bring people together — both inside and outside of Ericsson — truly sets him apart.

Jeff Grover

Area One Director · Ericsson

Working with Arihant in itself was a great opportunity. Great leadership skills, in-depth knowledge on Big Data, team spirit, and a calm, composed attitude in case of extremely tight deadlines.

Amolak Singh

Lead Data Engineer · Salesforce

Arihant is the person I turn to for advice in BigData analytics. His responses are timely and value-added. His never-say-die attitude makes him stand out among the rest.

Manish Manohar

Web Development Consultant · Concentrix

Arihant worked on a complex Spark-based analytics implementation and delivered a high quality product with aggressive timelines. A great team player and quick learner who understands business quickly.

Dinesh Ravikumar

Director, PwC AC · Healthcare Analytics Lead (AI/GenAI)

I had good fortune to work with Arihant during my time at Deloitte. He is very talented and sharp, picked up the concepts of Spark very swiftly, and delivered a crucial project mostly by himself — on time.

Debarun Chatterjee

Manager, Big Data & Cloud · EY GDS

What I Do

Data Architecture & Lakehouse Engineering

AI-Ready Data Products

GenAI RAG Strategy & Governance