Monday, October 21, 2019

AWS Database services (Intro and how to create a DB Instance)

Quick introduction to AWS Database services

AWS offers the following database services:

Relational Database Service (RDS):
Amazon Aurora, MySQL, PostgreSQL, MariaDB, Oracle and Microsoft SQL Server


DynamoDB
Fast, fully managed NoSQL database service.

ElastiCache
In-memory data store and cache service.

Neptune
High-performance graph database engine optimized for storing billions of relationships and querying the graph.


How to create a DB instance and connect to it?


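  1. Log in to the AWS Console and go to Services > RDS.
  2. Create a database (I chose MySQL on the Free Tier).
  3. Enter a DB instance identifier and a master username/password.
  4. Once the database is created and its status is Available, one can connect to it using MySQL Workbench (or any other MySQL client) with the instance endpoint, port and master username/password.
  5. See the Amazon RDS documentation on connecting to a DB instance running the MySQL database engine.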
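The console steps above can also be scripted. Below is a minimal sketch using the AWS SDK for Python (boto3), assuming boto3 is installed and AWS credentials are configured. The helper names are hypothetical, and the instance class `db.t2.micro`, the 20 GiB of storage and the region are illustrative assumptions meant to stay within the free tier (check the current free-tier terms). To reach the instance from Workbench on your own machine, it also has to be publicly accessible and its security group must allow inbound traffic on the MySQL port (3306 by default).

```python
def build_mysql_instance_params(identifier, username, password):
    """Keyword arguments for boto3's RDS create_db_instance (hypothetical helper)."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",
        "DBInstanceClass": "db.t2.micro",  # assumption: a free-tier-eligible class
        "AllocatedStorage": 20,            # storage size in GiB
        "MasterUsername": username,
        "MasterUserPassword": password,
        "PubliclyAccessible": True,        # assumption: so a desktop client such as Workbench can reach it
    }


def create_mysql_instance(params, region="us-east-1"):
    """Create the instance, wait until it is available and return (host, port)."""
    import boto3  # imported lazily so the parameter builder can be used without boto3

    rds = boto3.client("rds", region_name=region)
    rds.create_db_instance(**params)
    identifier = params["DBInstanceIdentifier"]
    # Block until the instance status becomes "available" (this can take several minutes)
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=identifier)
    instance = rds.describe_db_instances(DBInstanceIdentifier=identifier)["DBInstances"][0]
    return instance["Endpoint"]["Address"], instance["Endpoint"]["Port"]
```

For example, `create_mysql_instance(build_mysql_instance_params("mydb", "admin", "<password>"))` would return the endpoint host and port that go into the MySQL Workbench connection settings.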

How to add buckets in S3



  1. Log in to the AWS Console and go to Services > S3.
  2. Enter a name for the bucket (bucket names must be globally unique). Click Next.
  3. Click Next
  4. If the objects need to be accessible via a public URL, turn off the block on public access. Otherwise leave public access blocked and click Next to keep the bucket private.
  5. Click "Create Bucket"
  6. View the bucket on the S3 console page.
  7. You can also delete your bucket by selecting it and clicking Delete (last screenshot).
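The same bucket can be created from code. This is a minimal sketch assuming boto3 is installed and credentials are configured; the helper names are hypothetical, and the name check covers only a subset of the S3 bucket naming rules (3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit, no adjacent dots, not formatted like an IP address). Blocking all public access mirrors leaving the "block public access" settings on in step 4.

```python
import ipaddress
import re

# A subset of the S3 bucket naming rules: 3-63 characters, lowercase letters,
# digits, dots and hyphens, beginning and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name):
    if not _BUCKET_NAME.fullmatch(name) or ".." in name:
        return False
    try:
        ipaddress.ip_address(name)  # names formatted like an IP address are not allowed
    except ValueError:
        return True
    return False


def create_private_bucket(name, region):
    """Create a bucket with all public access blocked (hypothetical helper)."""
    if not is_valid_bucket_name(name):
        raise ValueError(f"invalid bucket name: {name!r}")
    import boto3  # imported lazily so the name check works without boto3

    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        # us-east-1 is the default location and must not be passed as a LocationConstraint
        s3.create_bucket(Bucket=name)
    else:
        s3.create_bucket(Bucket=name, CreateBucketConfiguration={"LocationConstraint": region})
    s3.put_public_access_block(
        Bucket=name,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
```

Deleting the bucket (step 7) corresponds to `s3.delete_bucket(Bucket=name)`, which only succeeds once the bucket is empty.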

AWS Storage Services

AWS storage services are broken down into:

Simple Storage Service (S3) is designed to store and access any type of data over the internet.

Amazon Glacier is the lowest-cost storage service and is used for long-term data archiving.

Elastic Block Store (EBS) provides highly available, low-latency block storage volumes for EC2 instances.

Elastic File System (EFS) is a network-attached file system that can be shared by multiple instances.

Storage Gateway enables hybrid storage between on-premises systems and the AWS Cloud. It caches the most frequently used data on premises and keeps less frequently used data in the AWS Cloud.

Snowball is a physical device used to migrate large amounts of data, e.g. copying data from on premises to the cloud.
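S3 and Glacier are often used together through an S3 lifecycle rule that moves older objects into the Glacier storage class. The sketch below assumes boto3 is installed; the helper names, the `logs/` prefix and the 90-day threshold are hypothetical examples, not recommendations.

```python
def glacier_archive_rule(prefix, days_to_glacier, rule_id="archive-to-glacier"):
    """Lifecycle configuration that moves objects under `prefix` to Glacier after N days."""
    if days_to_glacier < 0:
        raise ValueError("days_to_glacier must be non-negative")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # only objects whose keys start with this prefix
                "Status": "Enabled",
                "Transitions": [{"Days": days_to_glacier, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_lifecycle(bucket, configuration):
    import boto3  # imported lazily so the rule builder works without boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=configuration
    )
```

For example, `apply_lifecycle("my-notes-bucket-2019", glacier_archive_rule("logs/", 90))` would archive log objects once they are 90 days old, while newer objects stay in regular S3 storage.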


Tuesday, October 15, 2019

Quick Introduction to AWS


What is AWS?

AWS is a global cloud platform: a hosting provider on which one can run applications in the cloud.

What do they do?

They provide
  • Infrastructure as a service (IaaS). No need to buy or maintain the physical servers or their power supply.
  • Platform as a service (PaaS). e.g. get Java as a service without having to manage the binaries yourself.
  • Software as a service (SaaS). e.g. sending emails.
  • Cloud storage platform


Advantages?
  • Stable services.
  • Pay-as-you-go billing: compute is billed by usage time (per hour or per second, depending on the service) and storage per GB.
  • Easy to sign up and start scaling.


List of services provided

EC2 (Elastic Compute Cloud)
  • Virtual servers. Run your software on those machines.
  • Steps:
    • Choose an AMI (Amazon Machine Image): OS, software info, access permissions.
    • Create a customized AMI or choose a predefined one (e.g. from the AWS Marketplace).
    • Choose an instance type (HW) [general purpose, compute optimized, memory optimized, accelerated computing (GPU) or storage optimized]
    • Configure the instance [# of instances, IP, bootstrap scripts etc]
    • Add Storage
    • Configure security groups (configure access to your instance)
    • Launch
    • Select or create a key pair (public/private key, needed to access the instance securely)
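The launch steps above map fairly directly onto the parameters of the EC2 `run_instances` call in boto3. This is a minimal sketch assuming boto3 is installed and credentials are configured; the helper names, AMI ID, security group ID and key name are hypothetical placeholders, and the root device name `/dev/xvda` is an assumption that depends on the AMI you choose.

```python
def build_run_instances_params(ami_id, instance_type, key_name, security_group_ids,
                               count=1, user_data=None, root_volume_gib=8):
    """Map the console launch steps to keyword arguments for boto3's EC2 run_instances."""
    params = {
        "ImageId": ami_id,                # choose an AMI
        "InstanceType": instance_type,    # choose an instance type
        "MinCount": count,                # configure the number of instances
        "MaxCount": count,
        "BlockDeviceMappings": [{         # add storage
            # assumption: the root device name depends on the AMI (e.g. /dev/xvda or /dev/sda1)
            "DeviceName": "/dev/xvda",
            "Ebs": {"VolumeSize": root_volume_gib, "VolumeType": "gp2"},
        }],
        "SecurityGroupIds": list(security_group_ids),  # configure security groups
        "KeyName": key_name,              # key pair used to access the instance
    }
    if user_data:
        params["UserData"] = user_data    # bootstrap script run at first boot
    return params


def launch_instances(params, region="us-east-1"):
    import boto3  # imported lazily so the parameter builder works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**params)
    return [instance["InstanceId"] for instance in response["Instances"]]
```

Keeping the parameter building separate from the API call makes it easy to review what will be launched before anything is created (and billed).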

VPC (Virtual private cloud)
  • Create networks in your cloud and run your servers on those networks

S3 (Simple Storage Service)
  • Object storage service for storing and sharing files

RDS (Relational Database Service)
  • Run and manage databases (MySQL, PostgreSQL, Oracle, SQL Server etc.) in the cloud.

Route 53
  • Global DNS service. It's a managed DNS service: one can point their domain's DNS to Amazon and they take care of it.


ELB (Elastic Load Balancer)
  • Load balance incoming traffic to multiple machines.
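To make "load balance incoming traffic" concrete, here is a toy round-robin balancer, one of the simplest strategies a load balancer can use. It is an illustration only, not how ELB is implemented, and the class name and target addresses are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out targets in a fixed rotation so each one receives an equal share of requests."""

    def __init__(self, targets):
        if not targets:
            raise ValueError("need at least one target")
        self._targets = cycle(targets)

    def pick(self):
        return next(self._targets)
```

With three targets, successive calls to `pick()` return the first, second and third target and then start over from the first.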

Autoscaling 
  • Add/Remove capacity on the fly.
  • This should ideally be combined with ELB.
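As a rough illustration of "add/remove capacity on the fly", the sketch below resizes a fleet in proportion to how far its average CPU is from a target, similar in spirit to a target-tracking scaling policy. It is a simplified, hypothetical function, not AWS's actual algorithm; the load balancer would then spread traffic across however many instances the group ends up with.

```python
import math


def desired_capacity(current_instances, average_cpu, target_cpu, min_size, max_size):
    """Scale the fleet in proportion to how far average CPU is from the target (simplified)."""
    if current_instances < 1 or target_cpu <= 0:
        raise ValueError("need at least one instance and a positive target")
    desired = math.ceil(current_instances * average_cpu / target_cpu)
    return max(min_size, min(max_size, desired))  # respect the group's min/max size
```

For example, 4 instances averaging 90% CPU against a 60% target gives `ceil(4 * 90 / 60) = 6` instances, while the min/max bounds stop the group from shrinking to nothing or growing without limit.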


Monday, October 14, 2019

Kubernetes: Features

When we deploy containers on our own, we can run into some of the following issues:


  • Container communication
  • Deployment
  • Managing a container 
  • Auto scaling
  • Load balancing



Kubernetes helps resolve the above-mentioned issues.


Kubernetes automates container deployment, auto scaling and load balancing.


Some of the features that Kubernetes provides:


  • Communication between containers (each Pod, a group of one or more containers, gets its own IP address, and a set of Pods can be given a single DNS name)
  • Automatic bin packing (containers are placed onto nodes based on their CPU/memory requirements)
  • Scaling (add new or remove containers)
  • Restarts failed containers and reschedules containers onto other nodes if a node crashes.
  • Load balancing
  • Storage orchestration (allows mounting of a storage system such as local storage or a cloud provider's storage)
  • Automated rollouts and rollbacks
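Automatic bin packing can be pictured as fitting containers with given CPU/memory requests onto nodes with limited capacity. The sketch below uses a simple first-fit-decreasing heuristic; it is a toy illustration of the idea, not the actual Kubernetes scheduler algorithm, and the node and container names and sizes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu_millicores: int   # allocatable CPU (1000 millicores = 1 core)
    memory_mib: int       # allocatable memory
    pods: list = field(default_factory=list)

    def fits(self, cpu, memory):
        used_cpu = sum(p[1] for p in self.pods)
        used_memory = sum(p[2] for p in self.pods)
        return used_cpu + cpu <= self.cpu_millicores and used_memory + memory <= self.memory_mib


def pack(pods, nodes):
    """First-fit decreasing: place the largest requests first on the first node that fits."""
    placement = {}
    for name, cpu, memory in sorted(pods, key=lambda p: (p[1], p[2]), reverse=True):
        for node in nodes:
            if node.fits(cpu, memory):
                node.pods.append((name, cpu, memory))
                placement[name] = node.name
                break
        else:
            placement[name] = None  # no node has room; Kubernetes would leave such a Pod Pending
    return placement
```

Placing the largest requests first tends to leave less wasted capacity than placing them in arrival order, which is the point of packing "automatically" rather than by hand.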


Docker Vs VM

In an earlier article we discussed Kubernetes and introduced Docker.

What is Docker?
As per wiki:

Docker is a set of platform-as-a-service products that use OS-level virtualization to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries and configuration files; they can communicate with each other through well-defined channels.


In simpler terms:

  • An application can have various components, like a web server, a database etc., and these can run into conflicts over libraries or dependencies.
  • We can create a container per component and use Docker to deploy them on a particular OS.
  • These can then communicate with each other.
  • Docker containers share the same OS kernel.
What is the difference between VMs and Docker containers?

  1. Each VM comes with its own OS. Containers package applications (software) that run on a shared OS kernel.
  2. VMs are heavier and consume more disk space than containers.
  3. VMs take longer to boot, since a full OS has to start.
  4. VMs have their own OS and hence we can have VMs running Windows and Linux deployed together on the same host. Docker containers share the host's OS kernel, so containers built for a different OS cannot run side by side on the same host. This is however not a disadvantage, since we can have two different deployments communicating with each other.

High Level Introduction to Kubernetes

This article will provide a high level introduction to Kubernetes (for beginners).

What is Kubernetes?

As per wiki definition:


Kubernetes (commonly stylized as k8s) is an open-source container-orchestration system for automating application deployment, scaling, and management. It was originally designed by Google, and is now maintained by the Cloud Native Computing Foundation. It aims to provide a "platform for automating deployment, scaling, and operations of application containers across clusters of hosts". It works with a range of container tools, including Docker.


In simpler terms:
  • It is an open source system for ADSM (Automating deployment, scaling and management).
  • It groups containers that make up an application into logical units for easy management and discovery.
  • Makes it easy to deploy and run applications (in a container)
At a high level, let's try to understand the concepts of containers and orchestration.

What are containers?

Traditionally, applications were deployed on physical servers. Running multiple applications caused resource allocation issues.
An application could compete with another for CPU or memory etc in the same physical server.

The solution for the above was to use different standalone physical servers per application.
This was not practical since that would involve many physical servers and some being underutilized.

As an improvement, the next deployment phase was to use VMs (virtual machines).
Multiple VMs were deployed on a single physical server. Each VM had its own memory/CPU assigned and was isolated from the others (from a security standpoint).

Containers are similar to VMs, but they are lightweight since they have relaxed isolation properties and share the OS among applications.
Like VMs, containers have their own filesystem, share of CPU, memory etc.
You could read up more on containers and their efficiency.

What does Kubernetes do?
In very basic terms:

  • You have a container deployed in production.
  • Containers run applications and you need to ensure that there isn't any downtime.
  • If one container fails, another should be up and running to take over the load.
  • Kubernetes provides this framework for running distributed systems resiliently.
  • Where does Kubernetes help:
    • Load balancing and distributing network traffic between containers
    • A new deployment can be as simple as creating new containers and removing the existing ones. New containers can be thoroughly tested before deployment.
    • Defining CPU/memory per container
    • Automating the restart of failed or stuck containers
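The "if one container fails, another should be up" behaviour comes from a control-loop idea: compare the desired state (how many replicas should run) with the observed state and act on the difference. The function below is a toy version of that comparison under simplified assumptions (a container is either "Running" or not); it is not actual Kubernetes code, and the names are hypothetical.

```python
def reconcile(desired_replicas, statuses):
    """Compare desired vs observed state and return (containers_to_delete, number_to_create).

    statuses maps a container name to its status, e.g. "Running" or "Failed".
    """
    healthy = sorted(name for name, status in statuses.items() if status == "Running")
    unhealthy = sorted(name for name, status in statuses.items() if status != "Running")
    # Remove failed containers, plus any healthy surplus beyond the desired count
    to_delete = unhealthy + healthy[desired_replicas:]
    # Start enough new containers to get back to the desired count
    to_create = max(0, desired_replicas - len(healthy))
    return to_delete, to_create
```

Running this comparison repeatedly is what makes recovery automatic: a failed container shows up as a difference between desired and observed state and is replaced on the next pass.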
Docker is a very popular tool to deploy applications in a container.

As per wiki:

Docker is a set of platform-as-a-service products that use OS-level virtualization to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries and configuration files; they can communicate with each other through well-defined channels.

Wednesday, October 9, 2019

Statistical Learning (Index Page)

Will keep updating this page with details on statistics, supervised and unsupervised learning, Linear Regression, Classification etc

What is Statistical Learning?

In simple terms, it refers to a set of tools for understanding data.

These tools are of two types:

Supervised Learning:
Supervised learning is building a model from a set of inputs paired with known outputs.
Once the model is ready, it can be applied to new sets of inputs to predict their outputs.

e.g. image recognition and speech recognition
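A minimal example of supervised learning is simple linear regression: fit y = a + b·x to inputs with known outputs using ordinary least squares, then apply the fitted model to new inputs. The sketch below is plain Python with hypothetical function names and made-up data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("inputs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply the fitted model to a new input."""
    intercept, slope = model
    return intercept + slope * x
```

For example, `fit_line([1, 2, 3, 4], [3, 5, 7, 9])` recovers intercept 1 and slope 2, so `predict(model, 10)` gives 21 for an input the model never saw.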

Unsupervised Learning:
In unsupervised learning, we start with a set of inputs (with no known output) and try to find patterns in them.
It is mostly used for exploratory analysis.
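A common unsupervised technique is k-means clustering, which groups inputs into k clusters without any known outputs. The sketch below is a plain-Python version with a deliberately simple deterministic start (the first k points as initial centroids); the function names and sample points are hypothetical, and `math.dist` needs Python 3.8 or later.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i])) for p in points]


def kmeans(points, k, max_iterations=100):
    centroids = [tuple(p) for p in points[:k]]  # simple deterministic start: the first k points
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # New centroid is the mean of the points assigned to this cluster
                updated.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid where it was
        if updated == centroids:  # assignments have stabilised
            break
        centroids = updated
    return centroids, assign(points, centroids)
```

On points such as `[(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]` with `k=2`, the first three points end up in one cluster and the last three in the other, which is the kind of structure exploratory analysis is looking for.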