Hello there 👋, I'm Sarak Dahal

Aspiring Data Engineer

Skilled in building scalable data pipelines, ETL processes, and cloud-native data solutions. Eager to contribute to building robust data infrastructure.

About Me

I'm a passionate, detail-oriented Data Engineer pursuing a Master's in Data Science at Regis University (GPA: 3.9). My core expertise lies in designing, building, and optimizing robust data infrastructure. As a Graduate Researcher, I've engineered end-to-end ETL pipelines with Apache Spark and Airflow (cutting processing time by 40%), built optimized PySpark jobs for transformation and validation, and helped deploy a centralized database schema in Snowflake.

I thrive in collaborative environments and enjoy tackling complex problems using tools like Python, SQL, Apache Spark, and cloud platforms like AWS. My goal is to build reliable and scalable data systems that empower analytics and data-driven decisions.

My Journey

Graduate Researcher (Data Engineering Focus)

Regis University, Denver, CO

Aug 2024 - Jun 2025
  • Designed and built an end-to-end ETL pipeline for large Scope 3 emissions datasets (10M+ records), using Apache Airflow for workflow orchestration and Apache Spark for distributed computation, cutting processing time by 40%.
  • Developed optimized PySpark jobs for complex transformations, rigorous validation, and efficient loading into Snowflake, improving downstream analytical query performance.
  • Integrated scalable KNN imputation into the Spark pipeline to fill missing values, improving data quality and reliability by 15%.
  • Helped design and deploy a centralized Snowflake database schema, improving data integration and the reporting team's access to data.
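To make the KNN imputation idea concrete, here is a minimal single-machine sketch in plain Python. It is not the Spark implementation referenced above; the function name `knn_impute`, the use of `None` for missing values, and the distance rule (Euclidean over features observed in both rows, rescaled by the share of features present, similar in spirit to scikit-learn's `KNNImputer`) are all illustrative assumptions.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def knn_impute(rows: List[Row], k: int = 3) -> List[Row]:
    """Hypothetical sketch: fill each None with the mean of that column over
    the k nearest rows that have the column observed."""
    n_cols = len(rows[0])
    imputed: List[Row] = []
    for i, row in enumerate(rows):
        filled = list(row)
        for c, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for j, other in enumerate(rows):
                # Only rows that actually observe column c can donate a value.
                if j == i or other[c] is None:
                    continue
                # Compare only on features present in both rows.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                sq = sum((a - b) ** 2 for a, b in shared)
                # Rescale so rows with fewer shared features are not unfairly "close".
                dist = math.sqrt(sq * n_cols / len(shared))
                candidates.append((dist, other[c]))
            candidates.sort(key=lambda t: t[0])
            neighbours = [v for _, v in candidates[:k]]
            if neighbours:
                filled[c] = sum(neighbours) / len(neighbours)
        imputed.append(filled)
    return imputed


data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [0.9, 1.9, 2.9],
    [10.0, 20.0, 30.0],
]
print(knn_impute(data, k=2)[1])  # missing value filled from the two closest rows
```

This brute-force version scans every row for each missing cell, so it only suits small samples; at the 10M+ record scale described above, the neighbour search would need to be distributed or approximated.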

Vice President

Nepalese Student Association (NSA) - Regis University

2023 - Present

Python Developer

Appharu Pvt Ltd.

2020 - 2022

Freelance Developer

Upwork

2020 - 2022

Data Analyst

World Bank Group (Household Risk and Vulnerability Survey)

2018 - 2019

Technical Skills

Programming & Scripting

Python | SQL | Shell Scripting

Big Data & Processing

Apache Spark | Hadoop | Data Warehousing | Data Modeling | ETL/ELT | Data Ingestion

Orchestration & Workflow

Apache Airflow

Cloud Platforms

AWS (S3, Lambda, EC2) | Microsoft Azure

Databases

Snowflake | PostgreSQL | MySQL | MongoDB

DevOps & Tools

Git | GitHub | Docker | Jira | Linux

Featured Projects

Cloud-Native Data Processing System

Engineered a serverless, event-driven architecture on AWS that transforms large volumes of raw, unstructured data into a structured, queryable Parquet format. Used AWS Lambda and S3 to make data available for analysis in near real time, with pay-per-use pricing keeping costs down.

AWS S3 | AWS Lambda | Python | Serverless | ETL | Parquet
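One common way to wire an event-driven Lambda/S3 pipeline is to have S3 event notifications invoke the function for each new object. Assuming that setup (the description above does not say how the function is triggered), a minimal sketch of the entry point might map each incoming object to a target Parquet key. The helper `plan_conversions`, the `processed/` prefix, and the key layout are hypothetical; the actual read, parse, and Parquet write (for example with boto3 and pyarrow) are deliberately left out.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def plan_conversions(event: Dict[str, Any],
                     output_prefix: str = "processed/") -> List[Tuple[str, str, str]]:
    """Hypothetical helper: map each object in an S3 event notification to
    (bucket, source_key, target_parquet_key)."""
    plans = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded (spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        # Keep the original folder layout but swap the extension for .parquet.
        target = output_prefix + str(PurePosixPath(key).with_suffix(".parquet"))
        plans.append((bucket, key, target))
    return plans


def lambda_handler(event, context):
    plans = plan_conversions(event)
    # Assumed next step (omitted): read each raw object, parse it into rows,
    # and write the result to the target key as Parquet.
    return {"converted": [target for _, _, target in plans]}


sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "raw-bucket"},
                "object": {"key": "incoming/2024/logs+file.json"}}}
    ]
}
print(lambda_handler(sample_event, None))
```

Separating the pure key-planning logic from the AWS calls keeps the handler easy to unit-test without any cloud resources.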

Automated Web Scraping & Cloud Data Aggregation

Developed a scalable Python framework (Scrapy, BeautifulSoup) to gather data from 20+ dynamic websites. Integrated it with AWS S3 for centralized storage and added scheduled runs to automate collection, eliminating manual data retrieval.

Python | Scrapy | AWS S3 | REST APIs | Automation

Research & Publications

Dahal, S., Pochampally, A., & Soraf, K. (2024). Predictive Models for Scope 3 Emissions: Improving Accuracy with Machine Learning and Financial Data. Presented at Marketing and Data Sciences, Regis University, Denver, CO, USA.

Academic Foundation

Master of Science, Data Science

Regis University, Denver, Colorado

Aug 2023 - Dec 2025 (Expected)

GPA: 3.9

Relevant Coursework: Cloud Computing, Database Management, Big Data Analytics, Data Engineering with Apache Spark & Hadoop, Data Warehousing & BI, Statistical Modeling, Machine Learning Concepts.

Let's Connect

Interested in collaborating or have a question? I'm always open to discussing new opportunities and innovative ideas in data engineering and cloud data solutions.

Based in: Lakewood, Colorado, USA | Phone: +1-720-319-1164