Job Description
Key Responsibilities
- Build and deploy ML models for forecasting, segmentation, pricing, and inventory optimization.
- Design scalable feature engineering pipelines to improve model performance.
- Use PySpark to process and analyze large-scale datasets in distributed environments.
- Partner with data engineers to build reliable, scalable data pipelines.
- Apply MLOps best practices to deploy, monitor, and maintain models in production.
- Develop ETL workflows to prepare and transform structured and unstructured data.
- Present data-driven insights to stakeholders in a clear, impactful way.
- Stay current on emerging trends in AI, machine learning, and big data technologies.
Required Qualifications
- 5+ years of experience in data science or applied machine learning.
- Proficiency in Python and libraries such as pandas, NumPy, scikit-learn, TensorFlow, or PyTorch.
- Strong experience with PySpark and SQL for big data analytics.
- Familiarity with cloud platforms (AWS, GCP, Azure) and distributed computing.
- Knowledge of MLOps, including model lifecycle management, CI/CD, and automation.
- Experience with ETL processes and building robust data pipelines.
- Solid grasp of statistical modeling, time-series forecasting, and clustering techniques.
- Strong communication skills for translating insights into business actions.
- Experience with tools such as Databricks, Airflow, and MLflow, and with containerization platforms (Docker, Kubernetes).