Cloud Dataproc:
- Managed Spark and Hadoop service, used for batch processing, including AI and ML workloads.
- Spark, Hive, Hadoop, Pig, etc. are all supported
- Runs on Compute Engine VMs
- High Availability mode, where a cluster runs multiple masters (3 instead of the usual 1)
- For simple data pipelines without clusters, one can use Dataflow.
  - Serverless, so there are no clusters to manage
- For ETL (Extract/Transform/Load) we can use:
  - Dataprep - for simple clean and load (intelligent service)
  - Dataflow - for slightly more complex pipelines
  - Dataproc - for very complex processing
- To visualize data in BigQuery - use Data Studio (now Looker Studio) or Looker
- To visualize your data pipelines - use Cloud Data Fusion
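The master-count choice in the Dataproc notes above can be sketched as a cluster definition. This is only a rough sketch: the field names (`master_config`, `worker_config`, `num_instances`) follow the Dataproc cluster resource as I understand it from the Python client, while `build_cluster`, the project ID and the cluster name are made-up placeholders. Actually submitting the cluster needs the `google-cloud-dataproc` client and is not shown.

```python
import json


def build_cluster(project_id, cluster_name, high_availability=False, num_workers=2):
    """Return a dict shaped like a Dataproc cluster resource (hypothetical helper)."""
    # Assumption: 1 master for a standard cluster, 3 for high availability.
    masters = 3 if high_availability else 1
    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {"num_instances": masters},
            "worker_config": {"num_instances": num_workers},
        },
    }


# Placeholder names, not real resources.
cluster = build_cluster("my-project", "etl-cluster", high_availability=True)
print(json.dumps(cluster, indent=2))
```

Keeping the definition as plain data makes it easy to review or diff before handing it to whichever client creates the cluster.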
For Streaming data?
- Cloud Pub/Sub > Dataflow > BigQuery or Bigtable
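To make the middle step of that flow a bit more concrete, here is a plain-Python simulation of fixed-window counting, the kind of aggregation a streaming job often does between the message queue and the warehouse. It uses no Google Cloud libraries; on GCP this step would normally be an Apache Beam pipeline running on Dataflow. The function name, the event tuples and the row shape are all assumptions for illustration.

```python
from collections import defaultdict


def window_counts(events, window_seconds=60):
    """Count events per (window_start, key) using fixed event-time windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Round the event time down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    # Emit rows in a shape that could be appended to a warehouse table.
    return [
        {"window_start": start, "key": key, "count": n}
        for (start, key), n in sorted(counts.items())
    ]


# Hypothetical (event_time_seconds, event_type) messages.
events = [(5, "click"), (42, "view"), (59, "click"), (61, "click"), (130, "view")]
rows = window_counts(events)
for row in rows:
    print(row)
```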
For IoT?
- Cloud IoT Core > Cloud Pub/Sub > Dataflow > BigQuery or Bigtable or Datastore
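As a small illustration of what travels through that chain: in the JSON form Pub/Sub uses for push delivery, the message payload arrives base64-encoded under `message.data`, with metadata under `message.attributes`. The sketch below decodes such an envelope into a flat row. The `deviceId` attribute name and the telemetry fields are assumptions, and `decode_push_envelope` is a hypothetical helper, not part of any Google library.

```python
import base64
import json


def decode_push_envelope(envelope):
    """Turn a Pub/Sub push-style envelope into a flat dict (hypothetical helper)."""
    message = envelope["message"]
    # The payload is base64-encoded bytes; here we assume it holds UTF-8 JSON.
    payload = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
    # Assumption: the device ID is carried in a "deviceId" attribute.
    device_id = message.get("attributes", {}).get("deviceId")
    return {"device_id": device_id, **payload}


# Build a sample envelope with made-up telemetry.
raw = json.dumps({"temperature_c": 21.5}).encode("utf-8")
envelope = {
    "message": {
        "data": base64.b64encode(raw).decode("ascii"),
        "attributes": {"deviceId": "sensor-1"},
        "messageId": "1",
    },
    "subscription": "projects/my-project/subscriptions/telemetry-sub",
}
row = decode_push_envelope(envelope)
print(row)
```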
For Complex Big Data solutions (Data Lake)?
- Data Ingestion
  - Cloud Pub/Sub + Dataflow
- Processing and Analytics
  - BigQuery (SQL) or Dataproc (Hadoop cluster)
- Data Mining
  - Dataprep
REST API Management
- Apigee
  - Full API management platform
  - Works for cloud, on-premises or hybrid deployments
- Cloud Endpoints
  - Also available, as a separate Google Cloud option
- API Gateway
  - Newer than Apigee
  - Simpler to set up than Apigee
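To give a feel for the "simple to set up" part: API Gateway is configured with an OpenAPI 2.0 document in which each operation points at a backend through the `x-google-backend` extension. The sketch below builds a minimal one as a Python dict. The `/hello` path, the title and the backend URL are placeholders, `gateway_spec` is a hypothetical helper, and such specs are usually written as YAML, so treat the JSON output here as an assumption about what the upload accepts.

```python
import json


def gateway_spec(title, backend_url):
    """Build a minimal OpenAPI 2.0 document for one GET route (hypothetical helper)."""
    return {
        "swagger": "2.0",
        "info": {"title": title, "version": "1.0.0"},
        "schemes": ["https"],
        "produces": ["application/json"],
        "paths": {
            "/hello": {
                "get": {
                    "operationId": "hello",
                    # Tells the gateway which backend serves this operation.
                    "x-google-backend": {"address": backend_url},
                    "responses": {"200": {"description": "OK"}},
                }
            }
        },
    }


# Placeholder backend URL, not a real service.
spec = gateway_spec("hello-api", "https://example.com/hello")
print(json.dumps(spec, indent=2))
```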