Tuesday, July 26, 2022

Google Cloud - Miscellaneous (Part 2) - Big Data Related

Cloud Dataproc:

  • Managed Spark and Hadoop service, used for batch processing and for AI/ML workloads
  • Supports Spark, Hive, Hadoop, Pig, etc.
  • Clusters run on Compute Engine VMs
  • High Availability mode runs 3 master nodes per cluster (standard mode has 1)
  • For simpler data pipelines that don't need a cluster, use Dataflow
    • Serverless, so there is no cluster to manage
  • For ETL (Extract/Transform/Load), choose by complexity:
    • Dataprep - intelligent service for simple cleaning and loading
    • Dataflow - somewhat more complex pipelines
    • Dataproc - very complex processing
  • To visualize data in BigQuery, use Data Studio or Looker
  • To build data pipelines visually (code-free) - Cloud Data Fusion
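The standard vs. High Availability choice mentioned above comes down to the master node count in the cluster definition. The sketch below builds that definition as a plain Python dict, using the field names from the Dataproc v1 Cluster resource (`config.master_config.num_instances`, `worker_config`, `machine_type_uri`). The project, cluster name, worker count and machine type are hypothetical placeholders. With the `google-cloud-dataproc` library installed, a dict like this can be passed as the `cluster` field of a `ClusterControllerClient.create_cluster` request; that call is left out so the sketch runs without credentials.

```python
from typing import Any, Dict


def build_cluster(project_id: str, cluster_name: str, high_availability: bool = False,
                  num_workers: int = 2, machine_type: str = "n1-standard-2") -> Dict[str, Any]:
    """Return a Dataproc cluster definition as a plain dict.

    Standard mode uses 1 master node; High Availability mode uses 3.
    Defaults for workers and machine type are illustrative assumptions.
    """
    num_masters = 3 if high_availability else 1
    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {"num_instances": num_masters, "machine_type_uri": machine_type},
            "worker_config": {"num_instances": num_workers, "machine_type_uri": machine_type},
        },
    }


if __name__ == "__main__":
    # Hypothetical project and cluster names, for illustration only.
    ha_cluster = build_cluster("my-project", "etl-cluster", high_availability=True)
    print(ha_cluster["config"]["master_config"]["num_instances"])  # 3
```

From the command line, `gcloud dataproc clusters create` exposes the same choice through its `--num-masters` flag (1 or 3).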
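To make the Extract/Transform/Load split above concrete, here is a minimal local sketch in plain Python. It is not how Dataprep, Dataflow or Dataproc are programmed; it only shows the three stages those services automate at scale. The CSV layout, field names and cleaning rules are hypothetical, and a Python list stands in for the target table.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List

# Hypothetical raw input with messy whitespace, a missing city and a bad number.
RAW_CSV = """name,city,amount
 Alice ,London,10.5
Bob,,7
 carol,Paris,not-a-number
Dave,Berlin,3
"""


def extract(text: str) -> Iterator[Dict[str, str]]:
    """Extract: read raw rows from a CSV source."""
    return csv.DictReader(io.StringIO(text))


def transform(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Transform: trim and title-case names; drop rows with a non-numeric amount or no city."""
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        city = row["city"].strip()
        if not city:
            continue
        yield {"name": row["name"].strip().title(), "city": city, "amount": amount}


def load(rows: Iterable[Dict[str, object]], table: List[Dict[str, object]]) -> int:
    """Load: append cleaned rows to the target table and return how many were loaded."""
    count = 0
    for row in rows:
        table.append(row)
        count += 1
    return count


if __name__ == "__main__":
    warehouse: List[Dict[str, object]] = []
    loaded = load(transform(extract(RAW_CSV)), warehouse)
    print(loaded, warehouse)
```

The stages are chained as generators, so rows stream through one at a time; the managed services apply the same idea to data that does not fit on one machine.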
For streaming data?
  • Cloud Pub/Sub > Dataflow > BigQuery or Bigtable
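In the Pub/Sub > Dataflow > BigQuery pattern, the Dataflow stage typically groups an unbounded stream into time windows before writing aggregates to the sink. The sketch below imitates that in plain Python: a list of (timestamp, key) events stands in for Pub/Sub messages, fixed (tumbling) windows are assigned by rounding each event time down to a multiple of the window size, and a dict of per-window counts stands in for the BigQuery or Bigtable table. The events and the 60-second window are hypothetical; a real pipeline would be written with Apache Beam and would also deal with late data and watermarks, which this sketch ignores.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[int, str]  # (event time in seconds, key)
WINDOW = 60  # window size in seconds (illustrative choice)


def fixed_window_counts(events: Iterable[Event], window_size: int = WINDOW) -> Dict[Tuple[int, str], int]:
    """Count events per key within fixed (tumbling) windows.

    Each event belongs to the window starting at event_time - (event_time % window_size),
    so windows are based on event time, not arrival order.
    """
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for event_time, key in events:
        window_start = event_time - (event_time % window_size)
        counts[(window_start, key)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical page-view events; (10, "home") arrives out of order.
    stream = [(5, "home"), (30, "cart"), (59, "home"), (61, "home"), (10, "home"), (125, "cart")]
    for (start, key), n in sorted(fixed_window_counts(stream).items()):
        print(f"window [{start}, {start + WINDOW}) {key}: {n}")
```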
For IoT?
  • Cloud IoT Core > Cloud Pub/Sub > Dataflow > BigQuery, Bigtable or Datastore
For complex big data solutions (data lake)?
  • Data Ingestion
    • Cloud Pub/Sub + Dataflow
  • Processing and Analytics
    • BigQuery (SQL) or Dataproc (Hadoop cluster)
  • Data Mining
    • Dataprep
REST API Management
  • Apigee
    • API management platform
    • For cloud, on-premises or hybrid deployments
  • Cloud Endpoints
    • Separate API management service, with fewer features than Apigee
  • API Gateway
    • Newer and simpler than Apigee
    • Easier to set up than Apigee
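Part of what makes API Gateway simpler to set up is that an API is described by a single OpenAPI 2.0 document, in which the Google-specific `x-google-backend` extension tells the gateway where to forward each request. The sketch below builds a minimal spec of that shape in Python and serializes it to JSON. The API title, path, operation ID and backend URL are hypothetical, and the exact fields API Gateway requires should be checked against its documentation before use.

```python
import json
from typing import Any, Dict


def build_gateway_spec(title: str, path: str, operation_id: str, backend_url: str) -> Dict[str, Any]:
    """Return a minimal OpenAPI 2.0 spec that routes GET <path> to backend_url."""
    return {
        "swagger": "2.0",
        "info": {"title": title, "version": "1.0.0"},
        "schemes": ["https"],
        "produces": ["application/json"],
        "paths": {
            path: {
                "get": {
                    "operationId": operation_id,
                    # Google-specific extension: the backend the gateway forwards requests to.
                    "x-google-backend": {"address": backend_url},
                    "responses": {"200": {"description": "A successful response"}},
                }
            }
        },
    }


if __name__ == "__main__":
    spec = build_gateway_spec(
        title="hello-api",  # hypothetical API name
        path="/hello",
        operation_id="hello",
        backend_url="https://backend.example.com/hello",  # hypothetical backend
    )
    print(json.dumps(spec, indent=2))
```

Saved to a file, a spec like this is uploaded as an API config (for example with `gcloud api-gateway api-configs create CONFIG_ID --api=API_ID --openapi-spec=FILE`) and then attached to a gateway.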
