
| AWS Service / Technology | Role |
|---|---|
| Amazon S3 | Acts as a data lake for storing raw input datasets and processed ETL outputs. |
| Amazon EMR (Hadoop MapReduce) | Executes cluster-based ETL pipelines for batch processing and benchmarking. |
| AWS Glue (ETL + Data Catalog + PySpark) | Provides serverless ETL orchestration and metadata management for dataset schemas. |
| AWS Lambda | Handles workflow orchestration by triggering ETL jobs in EMR or Glue. |
| Amazon CloudWatch | Monitors performance metrics, job execution logs, and resource utilization. |
| Amazon Athena | Performs SQL queries on benchmark logs stored in S3 for performance evaluation. |
| Amazon QuickSight | Builds dashboards and visualizations for cost vs. performance analysis. |
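
The Lambda role in the table (triggering ETL jobs on EMR or Glue) could look roughly like the sketch below. It is a minimal illustration, not the project's actual handler. The event fields (`target`, `job_name`, `arguments`, `cluster_id`, `step_name`, `jar`, `args`) are hypothetical names chosen for this example. The only AWS calls used are boto3's `start_job_run` (Glue) and `add_job_flow_steps` (EMR).

```python
def build_emr_step(name, jar, args):
    """Build one step definition in the shape EMR's add_job_flow_steps expects."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # keep the cluster alive if the step fails
        "HadoopJarStep": {"Jar": jar, "Args": list(args)},
    }


def dispatch_etl(event, glue_client, emr_client):
    """Start a Glue job run or submit an EMR step, depending on event['target'].

    The event layout is an assumption made for this sketch.
    """
    target = event.get("target")
    if target == "glue":
        response = glue_client.start_job_run(
            JobName=event["job_name"],
            Arguments=event.get("arguments", {}),
        )
        return {"target": "glue", "run_id": response["JobRunId"]}
    if target == "emr":
        step = build_emr_step(
            event.get("step_name", "etl-step"),
            event["jar"],
            event.get("args", []),
        )
        response = emr_client.add_job_flow_steps(
            JobFlowId=event["cluster_id"],
            Steps=[step],
        )
        return {"target": "emr", "step_ids": response["StepIds"]}
    raise ValueError(f"Unsupported ETL target: {target!r}")


def lambda_handler(event, context):
    """Lambda entry point: create the AWS clients and delegate to dispatch_etl."""
    # Imported here so dispatch_etl can be exercised locally without boto3 installed.
    import boto3

    return dispatch_etl(event, boto3.client("glue"), boto3.client("emr"))
```

Passing the clients into `dispatch_etl` keeps the routing logic separate from AWS access, so it can be checked locally with stand-in clients before the function is deployed.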