Thursday, December 19, 2019

Regression and Residual

What is Regression?
A regression line helps us predict the change in Y for a given change in X.

Using the previous example (https://mylearningcafe.blogspot.com/2019/12/correlation.html), let's see whether we can determine the value of Y for a given value of X.





What is Residual?
Residual tells us the error in the prediction (Actual value - predicted value).
We can see the difference from above known values.
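As a rough illustration of both ideas, the sketch below fits a least-squares regression line and computes the residuals. The data values are hypothetical (they are not the numbers from the earlier post), and the variable names are chosen only for this example.

```python
# Hypothetical data: hours on a treadmill (x) and calories burnt (y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [110.0, 190.0, 310.0, 390.0, 500.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope (b) and intercept (a) of the line y = a + b * x.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
a = mean_y - b * mean_x

# Predicted values and residuals (actual value - predicted value).
predicted = [a + b * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

print("slope:", b, "intercept:", a)
print("residuals:", residuals)
```

For a least-squares line the residuals always sum to (approximately) zero, which is a quick sanity check on the fit.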



Correlation

What is Correlation?

Correlation shows us the Direction and Strength of a linear relationship shared between 2 quantitative variables.

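It is given by the equation

r = (1 / (n - 1)) * Σ [ ((Xi - X̄) / Sx) * ((Yi - Ȳ) / Sy) ]

where

r = correlation
n = number of data points in the data set
Sx = the standard deviation of the X values
Sy = the standard deviation of the Y values
Xi, Yi = the individual data values
X̄, Ȳ = the means of the X and Y values
For more details on the mean and standard deviation, refer to the following blog post: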

Direction is provided by the slope (if we draw a line along the data points)
If the slope is upwards, we deduce that the correlation is positive.
If the slope is downwards, we deduce that the correlation is negative.
Correlation values range from -1 to 1.
A value of 1 indicates perfect positive correlation and a -1 indicates perfect negative correlation.
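A minimal sketch of the calculation, assuming the usual sample formula r = (1 / (n - 1)) * Σ ((Xi - X̄) / Sx) * ((Yi - Ȳ) / Sy) with sample standard deviations. The function name and the two small data sets are made up for illustration.

```python
import math


def correlation(xs, ys):
    """Sample correlation of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Average product of the standardized values.
    return sum(
        ((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)


r_up = correlation([1, 2, 3, 4], [2, 4, 6, 8])    # points rise along a straight line
r_down = correlation([1, 2, 3, 4], [8, 6, 4, 2])  # points fall along a straight line
print(r_up, r_down)
```

The first data set lies exactly on an upward line and the second on a downward line, so the results are (up to floating-point rounding) 1 and -1.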

Correlation is positive


Correlation is negative

The linear relationship gets stronger as the correlation moves from 0 towards 1 or from 0 towards -1.

[Figures: scatter plots for r = 0, r = 0.3, r = 0.7 and r = 1]

Let's look at a calculation for a dataset of "Number of hours on a treadmill" vs "Calories burnt".






We can see a near straight line of a positive correlation of 0.969 (very close to a perfect positive correlation).

Friday, December 13, 2019

Mode, Median, Mean, Range and Standard Deviation

Let's look at the differences between Mode, Median, Mean, Range and Standard Deviation.

Let's assume the following data set:

50, 20, 100, 150, 20, 60, 20, 15, 35

Mode:
Mode is the value that occurs most frequently.
From above data set, we can see that 20 occurs thrice and hence Mode for above data set is 20.

Median:
Center point of an ordered data set.
The point to note here is "ordered" data set.

Hence, for the above set, let's do the ordering first.

50, 20, 100, 150, 20, 60, 20, 15, 35
becomes
15, 20, 20, 20, 35, 50, 60, 100, 150

How do we get the median?

Median position = (n+1)/2

In the above case, it's the (9+1)/2 = 5th position, which is 35.

How about when we have even numbers in a data set?
In that case, we take the average of the middle two numbers.

Let's add one more element to the above ordered data set.
15, 20, 20, 20, 35, 50, 60, 100, 150, 175

Median will be average of the middle two numbers which is avg of (35 and 50) which is 42.5
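The two cases can be sketched as one small function. The name median is chosen only for this example; Python's built-in statistics.median does the same job.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: 0-based index mid is the (n+1)/2-th position.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median([50, 20, 100, 150, 20, 60, 20, 15, 35])
even_median = median([50, 20, 100, 150, 20, 60, 20, 15, 35, 175])
print(odd_median, even_median)
```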

Mean:
Mean is the average. [(Sum of all data values)/n]

In this case
50, 20, 100, 150, 20, 60, 20, 15, 35

Mean = (50+20+100+150+20+60+20+15+35)/9 = 470/9 = 52.22

Range:
Range is simply the difference of max vs min.
Hence,

Range = max-min = 150 - 15 = 135

Standard Deviation
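Standard deviation measures how close the values in a data set are to the mean.

Formula for the (sample) standard deviation:

s = sqrt( Σ (Xi - X̄)^2 / (n - 1) )

How do we calculate this?

The mean used for the differences is 52.22.
The sum of the squared differences from the mean is 16705.56.
Dividing by n - 1 = 8 gives 2088.19.

Taking the square root, the standard deviation is approximately 45.70.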

Small deviation indicates the distribution is less spread and data is close to mean.
Large deviation indicates the distribution is more spread and data is further away from the mean.
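To double-check the numbers in this post, here is a short sketch using Python's built-in statistics module on the same data set. It assumes the sample standard deviation (dividing by n - 1); the population version is shown only for comparison.

```python
import statistics

data = [50, 20, 100, 150, 20, 60, 20, 15, 35]

mode = statistics.mode(data)             # most frequent value
median = statistics.median(data)         # middle value of the ordered data
mean = statistics.mean(data)             # sum of the values divided by n
value_range = max(data) - min(data)      # max - min
sample_sd = statistics.stdev(data)       # divides by n - 1
population_sd = statistics.pstdev(data)  # divides by n

print(mode, median, round(mean, 2), value_range)
print(round(sample_sd, 2), round(population_sd, 2))
```

The sample and population values differ noticeably here because the data set is small (n = 9).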









Tuesday, December 10, 2019

Web Scraping Introduction

What is Web Scraping?
In simplest terms, as the name suggests, web scraping is extracting data from the web.

If the web page is correctly marked up, one can extract data by locating an element such as <p> (for example, by its id) and reading the text it contains.

To get data from an HTML page, we can use the BeautifulSoup library, which builds a tree out of the various elements in a page and provides an interface to access these elements.

Pre-requisite:


pip install beautifulsoup4 requests html5lib



We shall use requests to fetch the HTML page (via its URL) and then use BeautifulSoup's find function to access the first paragraph.

Let's try this with our own website:


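from bs4 import BeautifulSoup
import requests

# Download the page and parse it with the html5lib parser
webhtmlpage = requests.get("https://mylearningcafe.blogspot.com/p/welcome_9.html").text
bsoup = BeautifulSoup(webhtmlpage, 'html5lib')

# find() returns the first matching tag, here the first <p>
first = bsoup.find('p')
print(first)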




$ python webscraping.py

<p class="description"><span>The cafe (of learning) never closes <br/><br/> For finance related posts, go to <a href="http://mymoneyrules.blogspot.in/">http://mymoneyrules.blogspot.in/</a></span></p>

To extract only the text, if I add:

first_text = bsoup.find('p').text
print(first_text)

I will get

The cafe (of learning) never closes  For finance related posts, go to http://mymoneyrules.blogspot.in/

This is what the data on my website looks like:



Let's get the count of <li> tags first:

# find count of <li> tags
li_tag = bsoup('li')
print(len(li_tag))

How to extract the Href text?

If we look closely, the main data is in "<div id='adsmiddle24552235005691491924'>"

Thus, we search for the specific div ID and loop through its <li> items to find the text.
Some <li> items have no <a> child, so a_text.a is None and accessing .text on it raises an AttributeError. Hence, we skip those items using a try/except block.

# find count of <li> tags
li_tag = bsoup('li')
print(len(li_tag))

div_tag = bsoup.find("div", {"id": "adsmiddle24552235005691491924"})
print(len(div_tag.find_all("li")))

# print the link text of each <li>, skipping items without an <a> child
for a_text in div_tag.find_all("li"):
    try:
        print(a_text.a.text)
    except AttributeError:
        pass

Output:

$ python webscraping.py 
High level introduction to Kubernetes
Docker vs VM
Kubernetes - Features
Quick Introduction to AWS
Definition of various storage services
How to add buckets in S3
AWS Database services (Intro and how to create a DB Instance)
Statistical learning Index Page
Annotate - Lets you create personalized messages over images (meme)
Link Library - A library for all your links
Raspberry Pi Index Page
R programming Index Page
Python Index Page
Alert pop up dialog in android 
Rename package in Android
How to change version of apk for redeployment in Android Studio 
How to change an app's icon in Android Studio 
http://mylearningcafe.blogspot.in/2015/05/garbage-collection-gc.html
http://mylearningcafe.blogspot.in/2014/02/sorting-algorithms-in-java.html
http://mylearningcafe.blogspot.in/2014/02/some-new-features-in-java-7-part-2.html
http://mylearningcafe.blogspot.in/2014/02/some-new-features-in-java-7-part-1.html
http://mylearningcafe.blogspot.in/2014/02/automatic-resource-management-in-java-7.html
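For comparison, here is a dependency-free sketch of the same idea using Python's standard html.parser module instead of BeautifulSoup. The HTML snippet and the LinkTextParser class are made up for illustration; they are not the blog's real markup.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect the text of <a> tags that sit inside <li> tags."""

    def __init__(self):
        super().__init__()
        self.in_li = False
        self.in_a = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_li = True
        elif tag == "a" and self.in_li:
            self.in_a = True
            self.links.append("")  # start collecting text for this link

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_li = False
        elif tag == "a":
            self.in_a = False

    def handle_data(self, data):
        if self.in_a:
            self.links[-1] += data


# Hypothetical page fragment; the second <li> has no link and is skipped.
sample_html = """
<div id="links">
  <ul>
    <li><a href="/k8s">High level introduction to Kubernetes</a></li>
    <li>No link here</li>
    <li><a href="/docker">Docker vs VM</a></li>
  </ul>
</div>
"""

parser = LinkTextParser()
parser.feed(sample_html)
link_texts = parser.links
print(link_texts)
```

Because only text inside an <a> is collected, list items without a link are ignored without needing a try/except block.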







Monday, December 2, 2019

Miscellaneous

Index page for Python:
http://mylearningcafe.blogspot.com/2015/08/python-index-page.html

Some miscellaneous stuff:

Sorting


>>> list_of_items = [1,100,200,3,100,200,5]
>>> sorted_list_of_items = sorted(list_of_items)
>>> list_of_items
[1, 100, 200, 3, 100, 200, 5]
>>> sorted_list_of_items
[1, 3, 5, 100, 100, 200, 200]

Reverse Sorting

>>> sorted_list_of_items_backwards = sorted(list_of_items,key=abs,reverse=True)

>>> sorted_list_of_items_backwards
[200, 200, 100, 100, 5, 3, 1]
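With only positive numbers, key=abs makes no visible difference. A small sketch with a hypothetical list that includes negative values shows what the key argument changes:

```python
values = [3, -200, 1, 100, -5]

# Plain descending sort compares the numbers themselves.
by_value = sorted(values, reverse=True)

# key=abs compares absolute values, so -200 comes first.
by_magnitude = sorted(values, key=abs, reverse=True)

print(by_value)
print(by_magnitude)
```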

Get even # list

>>> list_of_items = [1,100,200,3,100,200,5]
>>> list_of_items
[1, 100, 200, 3, 100, 200, 5]

>>> even_number_list = [x for x in list_of_items if x % 2 == 0]

>>> even_number_list
[100, 200, 100, 200]

Randomly shuffle data

>>> import random
>>> list_of_items = [1,100,200,3,100,200,5]
>>> 
>>> list_of_items
[1, 100, 200, 3, 100, 200, 5]

>>> random.shuffle(list_of_items)

>>> list_of_items
[3, 1, 5, 100, 100, 200, 200]
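Note that random.shuffle changes the list in place and gives a different order on each run. A minimal sketch, assuming you want a repeatable order or want to keep the original list untouched (the seed value 42 is arbitrary):

```python
import random

original = [1, 100, 200, 3, 100, 200, 5]

# A seeded generator produces the same sequence of shuffles on every run.
rng = random.Random(42)

# sample() with k=len(...) returns a shuffled copy and leaves the original alone.
shuffled_copy = rng.sample(original, k=len(original))

# shuffle() reorders a list in place, so work on a copy if the original matters.
in_place = original[:]
rng.shuffle(in_place)

print(original)
print(shuffled_copy)
print(in_place)
```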

Dictionaries and Sets

Index page for Python:
http://mylearningcafe.blogspot.com/2015/08/python-index-page.html


Dictionaries in python are data structures which associate keys with values.


>>> empl_name_id_dict = {"Nitin":1,"Jonathan":2,"Brien":3}


>>> briens_details = empl_name_id_dict["Brien"]
>>> briens_details
3

>>> briens_details = empl_name_id_dict["brien"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>

KeyError: 'brien'

Keys are case sensitive.

The get method returns a default value (None unless one is given) for a non-existing key instead of raising an exception.

>>> briens_details = empl_name_id_dict.get("brien")
>>> briens_details
>>> 

>>> briens_details = empl_name_id_dict.get("Brien")
>>> briens_details
3
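A short sketch of a few related lookups on the same dictionary. The -1 default and the lowercase_index helper are choices made only for this example.

```python
empl_name_id_dict = {"Nitin": 1, "Jonathan": 2, "Brien": 3}

missing = empl_name_id_dict.get("brien")            # no default given, so None
with_default = empl_name_id_dict.get("brien", -1)   # explicit default for a missing key
is_present = "Brien" in empl_name_id_dict           # membership test on the keys

# Hypothetical case-insensitive lookup: index the same data by lowercase name.
lowercase_index = {name.lower(): emp_id for name, emp_id in empl_name_id_dict.items()}
brien_id = lowercase_index.get("BRIEN".lower())

print(missing, with_default, is_present, brien_id)
```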

# Adding a new element
>>> empl_name_id_dict["Matt"] = 4
>>> empl_name_id_dict
{'Nitin': 1, 'Matt': 4, 'Jonathan': 2, 'Brien': 3}

# Get the list of keys, values or items

>>> list_of_keys = empl_name_id_dict.keys()
>>> list_of_keys
['Nitin', 'Matt', 'Jonathan', 'Brien']
>>> 
>>> list_of_values = empl_name_id_dict.values()
>>> list_of_values
[1, 4, 2, 3]

>>> list_of_items = empl_name_id_dict.items()
>>> list_of_items
[('Nitin', 1), ('Matt', 4), ('Jonathan', 2), ('Brien', 3)]


A set is a data structure that represents a collection of distinct elements.

>>> list_of_items = [1,100,200,3,100,200,5]
>>> list_of_items
[1, 100, 200, 3, 100, 200, 5]
>>> 
>>> list_of_items_set = set(list_of_items)
>>> list_of_items_set

set([200, 1, 3, 100, 5])
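A set does not keep the original order of the list. The sketch below shows a few common uses; the dict.fromkeys trick for de-duplicating while keeping order is an addition here and relies on Python 3.7 or later.

```python
list_of_items = [1, 100, 200, 3, 100, 200, 5]

unique_items = set(list_of_items)        # duplicates removed, order not guaranteed
has_100 = 100 in unique_items            # fast membership test
common = unique_items & {5, 200, 999}    # intersection with another set

# Remove duplicates but keep the first-seen order (dicts preserve insertion order).
unique_in_order = list(dict.fromkeys(list_of_items))

print(len(unique_items), has_100, common)
print(unique_in_order)
```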


Lists in Python

Index page for Python
http://mylearningcafe.blogspot.com/2015/08/python-index-page.html

What is a list?
Simply put, an ordered collection.

>>> list_of_integers = [1,4,5]
>>> len(list_of_integers)
3
>>> sum(list_of_integers)
10


How to get to an element in the list:

>>> list_of_integers[1]
4

List indexes start from 0.

>>> list_of_integers[0]
1

Slice lists:

Slice lists using square brackets

>>> first_two_elements = list_of_integers[:2]
>>> first_two_elements
[1, 4]

Let's try some more commands:

>>> lists_of_value = [100,1,50,25,34,67,78,99]
>>> len(lists_of_value)
8

>>> first_4_elements = lists_of_value[:4]
>>> first_4_elements
[100, 1, 50, 25]

>>> last_3_elements = lists_of_value[-3:]
>>> last_3_elements
[67, 78, 99]

>>> copy_of_list_elements = lists_of_value[:]
>>> copy_of_list_elements
[100, 1, 50, 25, 34, 67, 78, 99]

>>> copy_of_list_elements.extend([200,300,400])
>>> copy_of_list_elements
[100, 1, 50, 25, 34, 67, 78, 99, 200, 300, 400]
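A few more slicing forms, sketched on the same list. This also shows that extending the copy made with [:] leaves the original list unchanged.

```python
lists_of_value = [100, 1, 50, 25, 34, 67, 78, 99]

# [:] makes a new list, so changes to the copy do not touch the original.
copy_of_list_elements = lists_of_value[:]
copy_of_list_elements.extend([200, 300, 400])

middle = lists_of_value[2:5]          # indexes 2, 3 and 4
every_second = lists_of_value[::2]    # step of 2, starting at index 0
reversed_copy = lists_of_value[::-1]  # negative step walks backwards

print(len(lists_of_value), len(copy_of_list_elements))
print(middle, every_second, reversed_copy)
```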





Monday, October 21, 2019

AWS Database services (Intro and how to create a DB Instance)

Quick introduction to AWS Database services

AWS offers the following database services:

Relational Database Service (RDS):
MySQL, Amazon Aurora, PostgreSQL, MariaDB, Oracle and Microsoft SQL Server


DynamoDB
Fast, fully managed NoSQL database service.
Link to documentation

ElastiCache
In-memory data store and cache service.
Link to documentation

Neptune
High performance graph database engine optimized for storing billions of relationships and querying the graph.
Link to documentation


How to create a DB instance and connect to it?


  1. Log in to the AWS Console and go to Services > RDS
  2. Create a database. (I chose MySQL Free Trial)
  3. Put in an instance name and username/password.
  4. Once the database is created and available, one can connect to it using MySQL Workbench. Link to documentation
  5. Connecting to MySQL DB Instance documentation










How to add buckets in S3

How to add buckets in S3?


  1. Login to AWS Console and go to Services > S3
  2. Put a name for the bucket. Click Next.
  3. Click Next
  4. If you need access via URL, enable public access. Otherwise, click Next to keep the bucket private.
  5. Click "Create Bucket"
  6. View Bucket on console page
  7. You can also delete your bucket by clicking delete (last screenshot)
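The console steps above can also be scripted. The following is only a sketch, assuming the third-party boto3 library is installed and AWS credentials are already configured. The helper names, the example bucket name and the simplified name check are assumptions for illustration; the real naming rules have a few more restrictions.

```python
import re


def is_valid_bucket_name(name):
    """Rough check: 3-63 characters, lowercase letters, digits, dots and hyphens,
    starting and ending with a letter or digit."""
    return re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", name) is not None


def create_bucket(name, region="us-east-1"):
    """Create a bucket with boto3 (needs 'pip install boto3' and AWS credentials)."""
    import boto3  # imported here so the name check works without boto3 installed

    if not is_valid_bucket_name(name):
        raise ValueError("invalid bucket name: " + name)
    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        # us-east-1 is the default location and takes no LocationConstraint.
        s3.create_bucket(Bucket=name)
    else:
        s3.create_bucket(
            Bucket=name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )


# Only the name check runs here; create_bucket() is not called.
print(is_valid_bucket_name("my-learning-cafe-demo"))
print(is_valid_bucket_name("My_Bucket"))
```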








AWS Storage Services

AWS storage services are broken down into:

Simple Storage Service (S3) is designed to store and access any type of data over the internet.

Amazon Glacier is the cheapest service and is used for long-term data archiving.

Elastic Block Store (EBS) is highly available, low-latency block storage.

Elastic File System (EFS) is a network-attached file system.

Storage Gateway enables hybrid storage between on-premises systems and the AWS Cloud. It caches the most frequently used data on premises and keeps less frequently used data in the AWS Cloud.

Snowball is a hardware device used to migrate data, e.g. to copy data from on premises to the cloud.


Tuesday, October 15, 2019

Quick Introduction to AWS


What is AWS?

AWS is a global cloud platform: a hosting provider that lets one run applications on the cloud.

What do they do?

They provide
  • Infrastructure as a service (IaaS). No need to manage backup or power supply to the units.
  • Platform as a service (PaaS). e.g. get Java as a service and one doesn’t need to manage the binaries.
  • Software as a service (SaaS). e.g. send emails.
  • Cloud storage platform


Advantages?
  • Stable services.
  • Services are billed per hour and for storage per GB.
  • Easy to sign in and start scaling.


List of services provided

EC2 (Elastic Compute cloud) 
  • Bare servers. Run your software on those machines. 
  • Steps:
    • Choose an AMI (Amazon Machine Image) : OS, Software Info, Access Permissions.
    • Can create customized AMIs or choose from a predefined one (AMI marketplace).
    • Choose Instance type (HW) [Compute optimized, Memory optimized, GPU optimized, storage optimized or general purpose]
    • Configure the instance [# of instances, IP, bootstrap scripts etc]
    • Add Storage
    • Configure security groups (configure access to your instance)
    • Launch
    • Select a public/private key (needed for access and security)

VPC (Virtual private cloud)
  • Create networks in your cloud and run your servers on those networks

S3 (Simple Storage Service)
  • File storage and sharing service

RDS (Relational Database Service)
  • Run and manage databases (SQL Server, MySQL, Oracle, etc.) on the cloud.

Route 53
  • Global DNS service. It's a managed DNS service where one can point their DNS to Amazon and they take care of it.


ELB (Elastic Load Balancer)
  • Load balance incoming traffic to multiple machines.

Autoscaling 
  • Add/Remove capacity on the fly.
  • This should ideally be combined with ELB.


Monday, October 14, 2019

Kubernetes: Features

When we plan to deploy containers on our own, we could run into some of the below mentioned issues:


  • Container communication
  • Deployment
  • Managing a container 
  • Auto scaling
  • Load balancing



Kubernetes helps in resolving the above mentioned issues.


Kubernetes automates container deployment, auto scaling and load balancing.


Some of the features that Kubernetes provides:


  • Communication between containers (each container is assigned an IP and a single DNS name for a set of containers)
  • Automatic packing of applications into containers
  • Scaling (add new or remove containers)
  • Restarts failed containers and can create new containers and nodes as a replacement if there is a crash.
  • Load balancing
  • Allows mounting of storage system
  • Rollout/Rollback (done automatically)


Docker Vs VM

In an earlier article we discussed Kubernetes and introduced Docker.

What is Docker?
As per wiki:

Docker is a set of platform-as-a-service products that use OS-level virtualization to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries and configuration files; they can communicate with each other through well-defined channels.


In simpler terms:

  • An application can have various components like Webserver, database etc where we could encounter an issue related to libraries or dependencies.
  • We can create a container per component and use docker to deploy them over a particular OS.
  • These can then communicate with each other.
  • Docker containers share the same OS kernel.
What is the difference between VMs and Docker containers?

  1. Each VM comes with its own OS. Containers have applications (software) deployed for a particular OS kernel.
  2. VMs are heavy and consume more disk space compared to a container.
  3. VMs take time to boot.
  4. VMs have their own OS, and hence we can have VMs containing Windows and Linux deployed together. Docker containers share the OS kernel, and hence having different OS software is not possible. This is, however, not a disadvantage, since we can have two different deployments communicating with each other.

High Level Introduction to Kubernetes

This article will provide a high level introduction to Kubernetes (for beginners).

What is Kubernetes?

As per wiki definition:


Kubernetes (commonly stylized as k8s) is an open-source container-orchestration system for automating application deployment, scaling, and management. It was originally designed by Google, and is now maintained by the Cloud Native Computing Foundation. It aims to provide a "platform for automating deployment, scaling, and operations of application containers across clusters of hosts". It works with a range of container tools, including Docker.


In simpler terms:
  • It is an open source system for ADSM (Automating deployment, scaling and management).
  • It groups containers that make up an application into logical units for easy management and discovery.
  • Makes it easy to deploy and run applications (in a container)
At a high level, let's try to understand the concepts of Containers and Orchestration.

What are containers?

Traditionally, applications were deployed on physical servers. Running multiple applications caused resource allocation issues.
An application could compete with another for CPU or memory etc in the same physical server.

The solution for the above was to use different standalone physical servers per application.
This was not practical since that would involve many physical servers and some being underutilized.

As an improvement, the next deployment phase was to use VMs (virtual machines).
VMs were deployed on a single server. They had their own memory/CPU assigned and each VM was isolated from the other (from security standpoint).

Containers are similar to VMs. They are lightweight since they have relaxed isolation properties to share the OS among applications.
Like VMs, containers have their own filesystem, CPU and memory etc.
You could read up more on containers and their efficiency.

What is Kubernetes?
In very basic terms:

  • You have a container deployed in production.
  • Containers run applications and you need to ensure that there isn't any downtime.
  • If one container fails, another should be up and running to take over the load.
  • This framework, to run distributed load, is provided by Kubernetes.
  • Where does Kubernetes help:
    • Load balance and distribute network traffic between containers
    • A new deployment can be as simple as creating new containers and removing existing containers. New containers can be thoroughly tested before deployment.
    • Define CPU/Memory per container
    • Restarting stuck containers etc can be automated.
Docker is a very popular tool to deploy applications in a container.

As per wiki:

Docker is a set of platform-as-a-service products that use OS-level virtualization to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries and configuration files; they can communicate with each other through well-defined channels.

Wednesday, October 9, 2019

Statistical Learning (Index Page)

Will keep updating this page with details on statistics, supervised and unsupervised learning, Linear Regression, Classification etc

What is Statistical Learning?

In simple terms, it refers to tools that help us to understand data.

These tools are of two types:

Supervised Learning:
Supervised learning is basically creating a model given a set of inputs and a known output.
Once the model is ready, it can be applied to newer sets of inputs.

e.g Image recognition and Speech recognition

Unsupervised Learning:
In Unsupervised learning, we start with a set of inputs and try to understand the pattern.
Mostly used for exploratory analysis.
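To make the supervised case concrete, here is a toy sketch: a nearest-neighbour "model" that labels a new input with the label of the closest known input. The heights, labels and function name are all hypothetical, and real projects would normally use a library rather than this hand-rolled version.

```python
def nearest_neighbour_predict(training_inputs, training_labels, new_input):
    """Return the label of the training input closest to new_input."""
    distances = [abs(x - new_input) for x in training_inputs]
    return training_labels[distances.index(min(distances))]


# Inputs with known outputs (the "supervision").
heights_cm = [150, 155, 160, 180, 185, 190]
labels = ["short", "short", "short", "tall", "tall", "tall"]

# Apply the model to new inputs it has not seen before.
prediction_183 = nearest_neighbour_predict(heights_cm, labels, 183)
prediction_158 = nearest_neighbour_predict(heights_cm, labels, 158)
print(prediction_183, prediction_158)
```

In unsupervised learning there would be no labels list at all; the task would be to discover the two groups from the heights alone.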

Wednesday, January 2, 2019

Raspberry Pi Index Page

Configure and test audio on the Pi



  • Please follow prior steps to install Raspbian.
  • Connect the USB microphone and 3.5mm jack speaker
  • Go to the UI and 
    • Click on the Raspberry icon > Preferences> Audio Device settings 
    • Select "USB PnP Sound Device (Alsa Mixer)"
    • Click "Select Controls", select Microphone and set the microphone level to its highest.



  • Run "sudo raspi-config" on the terminal
  • Select "Advanced Options"
  • Select "Audio"
  • Select "Force 3.5mm headphone jack"
  • Select "Finish"








Setting Mic and Audio controls:

  • Run the command "aplay -l" (that is a lowercase L) in the terminal
  • Copy the card # and device # (for speaker)
  • Run the command "arecord -l" (that is a lowercase L) in the terminal
  • Copy the card # and device # (for Mic)
  • Open a new file under /home/pi
  • vi .asoundrc
  • Copy paste the below

pcm.!default {
  type asym
  capture.pcm "mic"
  playback.pcm "speaker"
}
pcm.mic {
  type plug
  slave {
    pcm "hw:1,0"
  }
}
pcm.speaker {
  type plug
  slave {
    pcm "hw:0,0"
  }
}

In the above

pcm "hw:1,0" under pcm.mic is hw:<card#>,<device#>

Please use the correct one from aplay (Speaker) and arecord (Mic) as mentioned above.
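If you prefer not to edit the numbers by hand, the file contents can be generated from the card and device numbers. This is only a sketch: the render_asoundrc helper is hypothetical, and the numbers passed in below simply reproduce the example above; substitute the ones reported by aplay and arecord on your Pi.

```python
ASOUNDRC_TEMPLATE = """pcm.!default {{
  type asym
  capture.pcm "mic"
  playback.pcm "speaker"
}}
pcm.mic {{
  type plug
  slave {{
    pcm "hw:{mic_card},{mic_device}"
  }}
}}
pcm.speaker {{
  type plug
  slave {{
    pcm "hw:{speaker_card},{speaker_device}"
  }}
}}
"""


def render_asoundrc(mic_card, mic_device, speaker_card, speaker_device):
    """Fill the card and device numbers into the .asoundrc text."""
    return ASOUNDRC_TEMPLATE.format(
        mic_card=mic_card,
        mic_device=mic_device,
        speaker_card=speaker_card,
        speaker_device=speaker_device,
    )


# Mic on card 1, device 0 and speaker on card 0, device 0, as in the example above.
config_text = render_asoundrc(mic_card=1, mic_device=0, speaker_card=0, speaker_device=0)
print(config_text)
```

The printed text can then be saved as /home/pi/.asoundrc.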


Test Audio:
  • ssh to the Pi (or run the commands directly on a terminal on the pi)
  • Run the command "speaker-test -t wav"
  • You should hear a voice from the speaker
  • Our speaker is configured.
Test Mic:
  • ssh to the Pi (or run the commands directly on a terminal on the pi)
  • Run the command
    arecord --format=S16_LE --duration=10 --rate=16000 --file-type=raw out2.wav
  • Say something near the microphone for 10 sec [Note duration above is in seconds]
  • To replay, run the command
    aplay --format=S16_LE --rate=16000 out2.wav
Your Mic and speaker are now configured.