
Mudasser Shaik

LinkedIn | GitHub | Medium

I am a Principal Big Data Engineer leading a UKG Inc. engineering team that develops a highly scalable and fault-tolerant data ingestion and analytics platform.

I am currently a PhD candidate at the University of Arkansas at Little Rock, with a research focus on data quality, data profiling, and deep learning.

AWS Certified Big Data - Specialty | Confluent Certified Developer for Apache Kafka

Portfolio


Data Science

End-to-End Distributed ML Lifecycle

Open Notebook | View on GitHub

As an instructor, I designed and taught the curriculum for an end-to-end distributed machine learning workflow at Magnimind Academy: from building a Python data-crawler application that extracts, cleans, and stores Twitter data in MongoDB, to training and deploying an NLP classification model to Docker. The workflow covers the following steps (an illustrative sketch of the training and tracking steps follows the list):
  1. Data ingestion and Preprocessing using Python - Twitter Extractor
  2. Feature Engineering using PySpark-ML
  3. Model Training and Evaluation
  4. ML tracking using Apache MLflow
  5. Model Serialization using Apache Mleap
  6. Model Packaging and Deployment to Docker
  7. Schedule the training pipeline using Airflow
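To make steps 2-4 concrete, here is a minimal sketch of how the feature engineering, training, and MLflow tracking might fit together, assuming the cleaned tweets carry a free-text `text` column and a binary `label` column. The column names, file path, hyperparameters, and experiment name are illustrative assumptions, not the actual course code.

```python
# Illustrative sketch only: column names, paths, and the experiment name are
# assumptions, not the real project code.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF, IDF
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
import mlflow
import mlflow.spark

spark = SparkSession.builder.appName("tweet-classifier").getOrCreate()

# Assume the cleaned tweets from MongoDB were exported to Parquet (hypothetical path).
df = spark.read.parquet("/tmp/cleaned_tweets.parquet")
train, test = df.randomSplit([0.8, 0.2], seed=42)

# Feature engineering and model as a single PySpark-ML pipeline (steps 2-3).
pipeline = Pipeline(stages=[
    Tokenizer(inputCol="text", outputCol="tokens"),
    HashingTF(inputCol="tokens", outputCol="tf", numFeatures=2**14),
    IDF(inputCol="tf", outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label", regParam=0.01),
])

# Track parameters, metrics, and the fitted model with MLflow (step 4).
mlflow.set_experiment("twitter-nlp-classifier")  # hypothetical experiment name
with mlflow.start_run():
    model = pipeline.fit(train)
    auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(test))
    mlflow.log_param("regParam", 0.01)
    mlflow.log_metric("test_auc", auc)
    mlflow.spark.log_model(model, "model")
```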
In this architecture diagram, the data ingestion and ML training components are deployed on AWS and Databricks. These services interact with each other in the cloud to form a common end-to-end distributed ML workflow. For deploying the ML model, we use the low-latency prediction approach (non-Spark) with Apache MLeap and Docker. Note: in our session we use the Databricks Community Edition, which spins up a single spot EC2 instance.
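For step 5, a minimal serialization sketch might look like the following. It continues from the training sketch above, so `model` and `train` refer to the fitted pipeline and its training DataFrame; the bundle path is a placeholder.

```python
# Continues the training sketch above: `model` is the fitted PipelineModel and
# `train` a DataFrame with the training schema. The bundle path is a placeholder.
import mleap.pyspark  # noqa: F401  (attaches serializeToBundle to Spark models)
from mleap.pyspark.spark_support import SimpleSparkSerializer  # noqa: F401

model.serializeToBundle(
    "jar:file:/tmp/twitter_nlp_model.zip",  # hypothetical bundle location
    model.transform(train),                 # sample frame used to capture the schema
)
```

The resulting MLeap bundle can then be loaded by an MLeap runtime inside a Docker container to serve predictions without a Spark cluster, which is what enables the low-latency approach described above.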

Spark Crash Course

Open Web App | Open Notebook | View on GitHub

After my team preprocessed a dataset of 10K credit applications and built machine learning models to predict credit default risk, I built an interactive user interface with Streamlit and hosted the web app on Heroku.
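As a rough illustration of this kind of Streamlit front end (the feature names, model file, and preprocessing here are placeholders, not the actual credit-default app):

```python
# Minimal Streamlit sketch; feature names, model path, and preprocessing are
# placeholders rather than the real credit-default app.
import pickle

import pandas as pd
import streamlit as st

st.title("Credit Default Risk Predictor")

# Collect a few applicant features from the user.
income = st.number_input("Annual income", min_value=0.0, value=50000.0)
loan_amount = st.number_input("Loan amount", min_value=0.0, value=10000.0)
credit_history_years = st.slider("Years of credit history", 0, 40, 5)

if st.button("Predict default risk"):
    # Load a previously trained model (hypothetical file name).
    with open("credit_default_model.pkl", "rb") as f:
        model = pickle.load(f)
    features = pd.DataFrame([{
        "income": income,
        "loan_amount": loan_amount,
        "credit_history_years": credit_history_years,
    }])
    risk = model.predict_proba(features)[0][1]
    st.metric("Estimated probability of default", f"{risk:.1%}")
```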




© 2020 Shaik Mudasser. Powered by Jekyll and the Minimal Theme.