Cloud Dataproc:
- Managed Spark and Hadoop service, used for batch processing and AI/ML workloads.
 - Spark, Hive, Hadoop, Pig, etc. are all supported
 - Cluster nodes run on VMs (Compute Engine)
 - High Availability (HA) mode runs multiple masters (3 instead of 1)
- For simpler data pipelines that don't need a cluster, use Dataflow.
 - Serverless, hence no cluster management
- For ETL (Extract/Transform/Load), pick by complexity:
 - Dataprep: simple clean-and-load jobs (intelligent, visual data-preparation service)
 - Dataflow: somewhat more complex pipelines
 - Dataproc: very complex processing
- To visualize data in BigQuery, use Data Studio (now Looker Studio) or Looker
- To build and visualize data pipelines, use Cloud Data Fusion
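
The ETL rule of thumb above can be written as a tiny lookup. This is a minimal Python sketch; the `Complexity` levels and the `choose_etl_service` helper are hypothetical names for illustration, not part of any Google Cloud API.

```python
from enum import Enum


class Complexity(Enum):
    """Hypothetical complexity levels mirroring the three ETL tiers in the notes."""
    SIMPLE = 1    # clean and load
    MODERATE = 2  # somewhat more complex pipelines
    COMPLEX = 3   # very complex processing


# Tier -> suggested service, as listed in the ETL notes.
ETL_SERVICE = {
    Complexity.SIMPLE: "Dataprep",
    Complexity.MODERATE: "Dataflow",
    Complexity.COMPLEX: "Dataproc",
}


def choose_etl_service(complexity: Complexity) -> str:
    """Return the suggested service for a pipeline of the given complexity."""
    return ETL_SERVICE[complexity]


if __name__ == "__main__":
    for level in Complexity:
        print(f"{level.name:<8} -> {choose_etl_service(level)}")
```

Deciding which tier a real job falls into is still a judgment call; the mapping only encodes the rule of thumb.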
 
For streaming data?
- Cloud Pub/Sub > Dataflow > BigQuery or Bigtable
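
To make the stage order concrete, here is a minimal in-memory Python simulation, assuming a deque stands in for a Pub/Sub topic, a plain function for a Dataflow transform, and a list for a BigQuery or Bigtable table. `Topic`, `transform`, and `run_pipeline` are hypothetical names; no Google Cloud client library is used, and a real streaming job would run continuously instead of draining a fixed batch once.

```python
import json
from collections import deque
from typing import Optional


class Topic:
    """Hypothetical stand-in for a Pub/Sub topic: an in-memory FIFO of byte payloads."""

    def __init__(self) -> None:
        self.messages: deque = deque()

    def publish(self, payload: dict) -> None:
        # Pub/Sub message data is bytes, so serialize each event to JSON bytes.
        self.messages.append(json.dumps(payload).encode("utf-8"))


def transform(raw: bytes) -> Optional[dict]:
    """Hypothetical stand-in for a Dataflow step: parse, validate, and enrich one event."""
    record = json.loads(raw.decode("utf-8"))
    if "temperature_c" not in record:
        return None  # drop malformed events instead of writing them to the sink
    record["temperature_f"] = record["temperature_c"] * 9 / 5 + 32
    return record


def run_pipeline(topic: Topic, sink: list) -> None:
    """Drain the topic once, transform each message, and append valid rows to the sink."""
    while topic.messages:
        row = transform(topic.messages.popleft())
        if row is not None:
            sink.append(row)


if __name__ == "__main__":
    topic = Topic()
    topic.publish({"sensor": "a", "temperature_c": 20})
    topic.publish({"sensor": "b"})  # malformed: no reading
    table: list = []  # stands in for a BigQuery or Bigtable table
    run_pipeline(topic, table)
    print(table)  # prints [{'sensor': 'a', 'temperature_c': 20, 'temperature_f': 68.0}]
```

Keeping ingestion, transformation, and storage as separate pieces mirrors why the managed services are chained: each stage can scale or be swapped independently.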
 
For IoT?
- Cloud IoT Core > Cloud Pub/Sub > Dataflow > BigQuery, Bigtable, or Datastore (note: Google retired Cloud IoT Core in August 2023)
 
For complex big data solutions (data lake)?
- Data ingestion
 - Cloud Pub/Sub + Dataflow
- Processing and analytics
 - BigQuery (SQL) or Dataproc (Hadoop cluster)
- Data mining
 - Dataprep
 
REST API Management
- Apigee
 - API management platform
 - For cloud, on-premises, or hybrid deployments
- Google Cloud also provides Cloud Endpoints
- API Gateway
 - Newer than Apigee and relatively simpler to set up
 