Posted in Python, Tech

Python 3.7 – Cool New features

Welcome back, to Python! It’s time to discuss the cool new features shipped with Python 3.7. Here are some of my favorites:


  • Data Classes
  • Built-in breakpoint
  • Typing module
  • Importing Data files

Data Classes

What is it? A new decorator: @dataclass

What’s new? It eases writing special methods in the class, like __init__(), __repr__(), and __eq__(). They are added automatically.

Example:

from dataclasses import dataclass, field

@dataclass(order=True)
class Test:
    field1: str
    field2: str

The following does not need an implementation of __repr__() in the class:

>>> t = Test("abc","xyz")
>>> t
Test(field1='abc', field2='xyz')
>>> t.field1
'abc'
>>> t.field2
'xyz'

Doesn’t need an implementation of __eq__ to do this:

>>> t1=Test("abc","xyz")
>>> t==t1
True
>>> t2=Test("a","b")
>>> t==t2
False
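The Test class above is decorated with order=True and imports field, but neither is exercised. Here is a minimal sketch of what they add; the Version class and its field names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class Version:
    major: int
    minor: int
    # default_factory gives every instance its own list;
    # compare=False keeps this field out of == and the ordering methods.
    tags: List[str] = field(default_factory=list, compare=False)


old = Version(3, 6)
new = Version(3, 7, tags=["dataclasses"])

print(old < new)             # ordering compares (major, minor) like a tuple
print(sorted([new, old]))    # sorting works because of order=True
print(new == Version(3, 7))  # tags are ignored in the comparison
```

Without order=True, the < comparison would raise a TypeError, since only __eq__ is generated by default.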

Breakpoint

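What is it?  A built-in function, breakpoint(), that drops you into the debugger (pdb by default).

What’s new? Debugging itself is not a new feature, but breakpoint() simplifies using the debugger. It eliminates the necessity to import pdb.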

Example:
def divide(a,b):
    breakpoint()
    return a/b

>>> divide(2,3)
> <stdin>(3)divide()
(Pdb)

Old way of importing pdb:

def divide(a,b):
    import pdb; pdb.set_trace()
    return a/b
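breakpoint() does not call pdb directly. It calls sys.breakpointhook(), which defaults to pdb.set_trace() and can be redirected, for example through the PYTHONBREAKPOINT environment variable (PYTHONBREAKPOINT=0 turns every breakpoint() call into a no-op). The sketch below swaps in a hypothetical recording_hook so that it runs without stopping in the debugger:

```python
import sys

calls = []


def recording_hook(*args, **kwargs):
    # Stand-in for a debugger: just remember that a breakpoint was reached.
    calls.append("breakpoint reached")


sys.breakpointhook = recording_hook


def divide(a, b):
    breakpoint()  # now calls recording_hook instead of pdb.set_trace()
    return a / b


result = divide(6, 3)

# Restore the default hook so later breakpoint() calls open pdb again.
sys.breakpointhook = sys.__breakpointhook__
print(result, calls)
```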

Typing module

What is it?  Annotations and Type hinting

What’s new? Function annotations provide a standard way of associating metadata with function arguments and return values.

  • Annotations, together with the typing module, provide hints on the arguments and return value of a function.
  • Earlier, annotations were evaluated as soon as the function was defined, so they could only use names already available in the current scope, i.e. forward referencing was not supported.
  • With from __future__ import annotations, annotations are stored as strings and evaluated only on demand, for example by typing.get_type_hints().

Example:  

Creating the file py37anno.py:

from __future__ import annotations

class Try:
    def foo(name: str) -> 'salutation':
        print(f"Annotations Example for you {name}")

Importing the class and inspecting its annotations with the typing module:

>>> from py37anno import Try
>>> from __future__ import annotations
>>> Try.foo.__annotations__
{'name': 'str', 'return': "'salutation'"}
>>> import typing
>>> typing.get_type_hints(Try.foo)
{'name': <class 'str'>, 'return': ForwardRef('salutation')}
>>> Try.foo('Shilpa')
Annotations Example for you Shilpa
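To see why storing annotations as strings helps with forward references, here is a small sketch. The Node class is hypothetical, and the annotation is quoted by hand, which stores the same string that from __future__ import annotations would store for an unquoted Node.

```python
import typing


class Node:
    # "Node" is a forward reference: the class is not fully defined yet
    # at the point where this annotation is written.
    def link(self, other: "Node") -> "Node":
        self.next = other
        return other


raw = Node.link.__annotations__              # stored as plain strings
resolved = typing.get_type_hints(Node.link)  # strings evaluated on demand

print(raw)
print(resolved)
```

The raw annotations stay as strings until something such as typing.get_type_hints() asks for the real objects, by which time the class exists.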

Importing Data Files

What is it? An optimized, easy and organized API for working with data files in a Project

What’s new? Eliminates the necessity of hard-coding data file-names. The importlib.resources module helps in locating, importing and reading from the data file.

Example:

A file lorem.txt exists in the data directory of the project which also contains the __init__.py file

>>> import os
>>> os.listdir('data')                                     #output: files in the 'data' directory
['lorem.txt', '__init__.py', '__pycache__']

>>> from importlib import resources
>>> with resources.open_text("data","lorem.txt") as f:
...     print(f.readlines())
...
['"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."\n']
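If you want to try this without an existing project, the following sketch builds a throwaway package in a temporary directory and reads a data file from it. The package name demo_data_pkg and the file contents are made up for illustration; it uses read_text(), a shortcut in the same module. Newer Python versions also offer resources.files() for the same job.

```python
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path

# Build a tiny package on disk: a directory with __init__.py and a data file.
base = Path(tempfile.mkdtemp())
package_dir = base / "demo_data_pkg"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")
(package_dir / "lorem.txt").write_text("Lorem ipsum dolor sit amet\n")

# Make the package importable, then read the file by package and resource name.
sys.path.insert(0, str(base))
importlib.invalidate_caches()

text = resources.read_text("demo_data_pkg", "lorem.txt")
print(text)
```

The data file is located through the import system, so the code never hard-codes where the package lives on disk.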

A note on installing python 3.7:

Creating a new environment:

conda create -n py37 -c anaconda python=3.7

Upgrade in an existing python environment to 3.7 in Anaconda:

conda install -c anaconda python=3.7

Posted in Docker, Misc, Tech

Docker – Introduction


Let’s start by looking back at how applications were hosted and how hosting has evolved. Initially, we had one application running on a single server with its own dedicated memory, CPU and disk space. This model proved very costly for hosting high-load applications.

Virtual Machine

Next, we moved to Virtual Machines (VMs), also called Hypervisor Virtualization. We could now run multiple applications on a single physical server, each in its own virtual machine: a chunk of the actual physical server with its own memory, CPU and disk space. However, the drawback was that each Virtual Machine needed its own Operating System, which incurred the additional overhead of licensing costs, security patching, driver support, admin time etc.

Docker arrived with a solution to this problem. There is just one OS installed on the server. Containers are created on the OS. Each Container is capable of running separate applications or services. Here’s a comparative illustration of Containers vs Virtual Machines:

[Figure: Containers vs Virtual Machines]

  • Docker is said to be derived from the words: dock + worker -> Docker.
  • It is written in Go (Golang).
  • It is open source, released under the Apache License 2.0.

Let’s look at what the following mean:

Docker Container: A unit of software that packages the code and all its dependencies so that an application runs quickly and reliably irrespective of the computing environment.

Docker Container Image is a lightweight, standalone, executable package that contains everything needed to run an application: the code, runtime, system tools, libraries and settings. Container Images become Containers at runtime.

Docker Hub is a public Docker repository to store and retrieve Docker Images. It is provided by Docker; however, there are other third-party registries as well.

Open Container Initiative or OCI is responsible for standardizing the container format and container runtime.

Why is Docker so popular? 

  • A Docker image runs the same way irrespective of the server or the machine. It is hence very portable and consistent, and it eliminates the overhead of setting up and debugging the environment.
  • Another reason is Rapid Deployment: the deployment time is reduced to a few seconds.
  • Containers start and stop much faster than Virtual Machines.
  • Applications are easy to scale, as containers can easily be added to or removed from an environment.
  • Helps in converting a monolithic application into Microservices. [To read more, refer to Microservices]
  • Last but not the least, it is Open Source.
Posted in Scala

Scala – Introduction

Origin

Scala was created by Martin Odersky in 2004. The name ‘Scala’ is derived from the word ‘Scalable’, which signifies that it can grow with the demands of its users. Scala was designed to be both an Object-oriented and a Functional programming language, i.e. every value is an object and every function is a value. It is designed to express common programming patterns in a concise, type-safe and elegant way. Many say that it was introduced to address the shortcomings of Java.


Why Scala?

  • Modular, highly efficient and scalable.
  • Is a type-safe JVM language that incorporates both object-oriented and functional programming.
  • Good for exploiting parallelism for multicore and parallel computing.
  • The source code compiles to Java bytecode that runs on the JVM.
  • It is very flexible in defining abstractions
  • Immutability: Scala makes it easy to write code using immutable data
  • Scala can use all Java libraries and Scala code can be imported into Java code interchangeably.
  •  Applications:
    • Writing web applications
    • Distributed applications
    • Streaming Data
    • Parallel processing
    • Analyzing data with Spark

Who’s using Scala?

[Figure: businesses using Scala]

Directory structure

sbt (originally short for Simple Build Tool) is an open-source build tool for Scala/Java projects. Here’s the directory structure normally followed in Scala projects. The src/ directory sits directly under the project’s base (root) directory.

src/
   main/
       resources/
         <files to include in main jar here>
       scala/
         <main Scala sources>
       java/
         <main Java sources>
   test/
       resources/
         <files to include in test jar here>
       scala/
         <test Scala sources>
       java/
         <test Java sources>

Class, Object, Package and Trait

Class: The concept of classes in Scala is similar to that in Java. They are nothing but templates containing fields and methods. A class can be instantiated using the ‘new’ construct. Classes in Scala cannot have static members.

Object: An object is a named singleton instance with fields and methods. Members that would be static in Java are placed in an object.

Package: Packages allow you to modularize programs. The contents of a package can be scattered across many files. Classes defined in a package can be specifically imported.

Trait: Traits are like Interfaces. However, they can also contain field definitions and method implementations.

Writing your first Scala Program:

object FirstScalaProg extends App {
    println("I am done! Execute me!")
}

The entry point of the program is defined inside an object. The object is made executable by extending the type ‘App‘ or by adding a “main” method, as follows:

object FirstScalaProg {
    def main(args: Array[String]): Unit = {
        println("I can be written this way too!")
    }
}

Scala source code is stored in text files with the extension .scala. The Scala compiler compiles .scala files into .class files. Class files are binary files containing bytecode for the JVM to execute.

Books:

  • Structure and Interpretation of Computer Programs. Harold Abelson and Gerald J. Sussman. 2nd edition. MIT Press 1996. – [Full text available online]
  • Programming in Scala. Martin Odersky, Lex Spoon and Bill Venners. 2nd edition. Artima 2010. – [Full text of 1st edition available online]
  • Scala for the Impatient. Cay Horstmann. Addison-Wesley 2012. – [First part available for download]
  • Scala in Depth. Joshua D. Suereth. Manning 2012.
  • Programming Scala. Dean Wampler and Alex Payne. O’Reilly 2009.


Posted in Java for Python Developers, Tech

Java For Python Developers 2 – OOP

This blog will focus on Object Oriented Programming concepts like Inheritance, Polymorphism and writing Packages, Interfaces and Abstract classes in Java.

Packages

Packages keep a project organized. They add a level of security, since package-level access lets only the classes within that package use certain members, and they also provide name-scoping.

  • Some classes we have already come across are System (System.out.println), String and Math. They are from the package java.lang, which is automatically included without having to import it.
  • An import statement is written at the beginning of the file and the imported name is available throughout it. An alternative is to type the full name on each use in the code, i.e. <pkg_name>.<class_name>
  • An import does not copy or compile the imported class into your code. An import statement only saves typing the full name of every class.

Inheritance

Public, Private, Protected and Default are called access modifiers. They control whether a variable or a method is visible to, and can be inherited by, other classes.

  • Public members are inherited.
  • Private members are not inherited.
  • Protected members can be accessed by classes in the same package and by subclasses, even those in other packages.
  • Default (package-private) members can be accessed only within the same package. It doesn’t require a keyword.

Java does not support multiple inheritance of classes, as it leads to the “Deadly Diamond of Death” problem. Let’s see what that means.

[Figure: diamond inheritance]

The diamond problem is an ambiguity that arises when two classes B and C inherit from A, and D then inherits from both B and C. Now, if there is a method in A that has been overridden by both B and C, which version of the method does D inherit: that of B or that of C? To avoid this problem, Java does not allow multiple inheritance of classes.
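For comparison, Python does allow this diamond and settles the ambiguity with its method resolution order (MRO). A minimal sketch, reusing the class names A, B, C and D from the description above:

```python
class A:
    def who(self):
        return "A"


class B(A):
    def who(self):
        return "B"


class C(A):
    def who(self):
        return "C"


class D(B, C):  # inherits from both B and C
    pass


# Python searches D, then B, then C, then A: the first match wins.
order = [cls.__name__ for cls in D.__mro__]
print(D().who())
print(order)
```

Because B is listed before C in class D(B, C), the version from B is the one D inherits.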

A few other important aspects of inheritance:

  • A non-public class can be sub-classed only by the classes in the same package.
  • A class declared as final cannot be sub-classed anymore.
  • A class which has all private constructors cannot be sub-classed.
  • The keyword used is “extends”

Polymorphism

Polymorphism means a method can take many forms.

Changing only the return type of a method is not valid overloading. To overload a method, the argument list has to be changed.
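Python has no overloading by argument list: a second def with the same name simply replaces the first. The closest standard-library analogue is functools.singledispatch, which picks an implementation based on the type of the first argument. A small sketch with a hypothetical describe function:

```python
from functools import singledispatch


@singledispatch
def describe(value):
    # Fallback used when no more specific version is registered.
    return f"object: {value}"


@describe.register(int)
def _(value):
    return f"integer: {value}"


@describe.register(str)
def _(value):
    return f"string: {value}"


print(describe(3))
print(describe("three"))
print(describe(3.0))
```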

Abstract Class

  • Abstract Classes cannot be instantiated; the compiler rejects any attempt to instantiate one. Concrete classes, on the other hand, can be instantiated.
  • Abstract Methods must be overridden by the first concrete subclass. Abstract methods do not have a body. This ensures that the inheriting classes implement them.
  • An abstract method must always be declared in an abstract class.

Let’s see how they are implemented in Java:

abstract class Pan extends Utensils { 
    public abstract void Material(); //an abstract method.
} 

Pan p; //works
p = new Spoon(); //works, assuming Spoon is a concrete subclass of Pan
p = new Pan(); //Compiler throws an error: Pan is abstract; cannot be instantiated

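It is similar in Python. Let’s see the implementation:

from abc import ABCMeta, abstractmethod #IMPORTANT

class animal(metaclass=ABCMeta):  # Python 3 syntax; a __metaclass__ attribute is ignored in Python 3
    def delegate(self):
        self.action()

    @abstractmethod
    def action(self): pass

class tiger(animal): pass

A = animal()  # TypeError: an abstract class cannot be instantiated
t = tiger()   # TypeError as well: tiger does not override action()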

Interfaces

An interface can be thought of as a set of extra capabilities that a class takes on in addition to what it inherits from its parent class. This helps overcome the drawback of not being able to inherit from multiple classes.

  • An interface contains “abstract” methods that the implementing class is forced to implement. It can be perceived as a 100% abstract class.
  • The keyword used is “implements”
  • Interface methods are always “public” and, unless they are marked default or static (Java 8 onwards), always “abstract”.
  • A class can implement multiple interfaces, which allows adding a number of features to a class, making it more meaningful, in other words more real-world, than the parent class.

Here’s an example:

interface Pet {
    // all of these are abstract methods.
    abstract void wagTail(int a);
    abstract void RunAround(int a);
    abstract void EatFood(int a);
}

class Dog extends Animal implements Pet {}
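Python has no interface keyword. A rough equivalent, sketched here with made-up method bodies, is an abstract base class containing only abstract methods, mixed in alongside the real parent class:

```python
from abc import ABC, abstractmethod


class Pet(ABC):
    # Plays the role of the Java interface: only abstract methods.
    @abstractmethod
    def wag_tail(self, times):
        ...


class Animal:
    def breathe(self):
        return "breathing"


class Dog(Animal, Pet):
    # Dog must implement wag_tail, otherwise it cannot be instantiated.
    def wag_tail(self, times):
        return " ".join(["wag"] * times)


dog = Dog()
print(dog.breathe(), dog.wag_tail(2))
```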

Conclusion:

In this blogpost we learnt some Object Oriented Programming concepts in Java. We also learnt how to implement Packages, Abstract Classes and Interfaces. The next blog will be a deep dive into Memory management and Garbage Collection.

The previous blog was about getting started with basic concepts like using Variables, writing Classes, conditional statements, loops and built-in libraries: Java for Python Developers 1 – Basics

Posted in Misc, Serverless, Tech

Serverless

Serverless is a cloud-computing execution model that dynamically manages resources. It is categorized as FaaS or Function as a Service Solution.

The idea is to create an abstract functionality which is available “on-demand”. Cloud Platforms bill only for the time the functionality executes. The Serverless model reduces operational cost and the complexity of provisioning and maintaining servers, and it handles scaling automatically.

  • Serverless abstracts the functionality.
  • The server side logic is run on stateless computing containers which are triggered on an event.
  • It is fully managed by a third party, relieving you of the responsibility of managing servers.
  • Provides Increased Reliability
  • Greener Computing: It reduces the necessity of building data centers across the Globe.

Some well known Serverless solutions are Amazon’s AWS Lambda, IBM’s OpenWhisk, Google Cloud Functions and Microsoft Azure Functions.

Expedia, Coca-Cola, Thomson Reuters and EA are some examples of companies using Serverless Architecture.

Ground Rules

Some ground rules that label an Architecture as Serverless are:

  • Granularity: One or more stateless functions having a single purpose, solving a specific problem.
  • Designed as an Event-driven pipeline.
  • Zero Administration: No provisioning or Maintaining or scaling. Using third party services lets you focus on building value-adding customer features.
  • Scaling is automatically managed.
  • Cost-Saving: You can execute code on-demand and pay only for the time of execution. This helps build self-sustaining start-ups regardless of the number of customers it currently has. Scaling is also not a road-block anymore.
  • Rapid Time to Market: Shortens the Time between an Idea evolving into a Product taking away the entire overhead of procuring servers.

 

What is not Serverless and Why?

  • PaaS or Platform as a Service offerings like Heroku, Salesforce or Google App Engine cannot bring the entire application up or down in response to an event, unlike a FaaS.
  • Container platforms like Docker and container hosting systems like Mesos and Kubernetes need you to manage the size and shape of the cluster. FaaS can automatically handle resource provisioning, allocation and scaling.
  • Stored Procedures as a Service: they often have to be written in a specific framework or language and are hard to unit test as they are database dependent.

 

AWS Lambda: 

AWS Lambda can run scripts/apps on Amazon’s cloud environment. Amazon charges only when the function is used. Hence you pay as you use, regardless of whether your business has a single user or a million. AWS Lambda provides an integrated solution for computing as well as storage, using Amazon S3.

  • AWS Lambda provides a lot of pre-configured templates to choose from instead of writing the Lambda function from scratch.
  • In Python, boto3 (the AWS SDK for Python) is the library used to create the function and the Identity and Access Management (IAM) Role that lets it securely access AWS resources. An example is given below.
  • A function can be created inline or by uploading a .zip file.
  • Once created, the lambda function can be triggered via an HTTP call, using the “Add API Endpoint” feature.

Here’s an example of a Lambda function:

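import boto3, json

lambda_client = boto3.client('lambda')

# The handler below is the code that runs on Lambda (packaged as main.py).
def lambda_handler(event, context):
    message = 'Hello {} {}!'.format(event['first_name'],
                                    event['last_name'])
    return {
        'message' : message
    }

# role (an IAM role created earlier) and zipped_code (the zipped main.py as bytes)
# are assumed to be defined elsewhere.
lambda_client.create_function(
  FunctionName='exampleLambdaFunction',
  Runtime='python3.12',  # python2.7 is no longer supported by AWS Lambda
  Role=role['Role']['Arn'],
  Handler='main.lambda_handler',  # <file name>.<function name>
  Code=dict(ZipFile=zipped_code),
  Timeout=300
)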
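The zipped_code value passed to create_function is not shown above. One way it could be produced, sketched here under the assumption that the handler lives in a file called main.py, is to build the .zip archive in memory with the standard library. The zip_source helper is hypothetical.

```python
import io
import zipfile


def zip_source(filename, source):
    """Return the bytes of a .zip archive containing a single source file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(filename, source)
    return buffer.getvalue()


handler_source = (
    "def lambda_handler(event, context):\n"
    "    return {'message': 'Hello {} {}!'.format(event['first_name'], event['last_name'])}\n"
)

zipped_code = zip_source("main.py", handler_source)
print(len(zipped_code), "bytes")
```

The resulting bytes are what the ZipFile key of the Code argument expects.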

The function created above can be invoked using the snippet below:

import boto3, json

lambda_client = boto3.client('lambda')

# test_event is assumed to be a dict with the keys 'first_name' and 'last_name'
lambda_client.invoke(
  FunctionName='exampleLambdaFunction',
  InvocationType='Event',  # asynchronous invocation
  Payload=json.dumps(test_event),
)
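Since a handler is just a Python function, it can also be exercised locally before deploying, without any AWS account. A minimal sketch; the sample names in test_event are made up, and None is passed as context because this handler never uses it:

```python
import json


def lambda_handler(event, context):
    message = 'Hello {} {}!'.format(event['first_name'],
                                    event['last_name'])
    return {'message': message}


test_event = {'first_name': 'Ada', 'last_name': 'Lovelace'}

# Round-trip through JSON to mimic the payload Lambda would receive.
payload = json.dumps(test_event)
response = lambda_handler(json.loads(payload), None)
print(response)
```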

 

Drawbacks

The one major drawback is the dependency on the third-party vendor whose services are being used, which takes away control over system downtime, cost changes etc. Also, it is very hard to port from one vendor to another, as this usually involves shifting the entire infrastructure. Privacy and security are also a major concern when the application is built for handling sensitive information.

Posted in Java for Python Developers, Tech

Java for Python Developers 1 – Basics

The intention of this blog is to talk about the similarities and differences between Python and Java. This will be a series of blogposts, this being the first. Through this blog I plan to help myself and other Python developers learn Java easily by relating it to similar concepts in Python. The focus of this post will be on learning Basic concepts and writing some simple code in Java.

Why learn Java?

  • Java is a very popular and widely used programming language. It is used in building large systems due to its powerful JVM, which leads to faster execution. When an application needs to scale up, Java is one of the most popular choices in the industry.
  • It supports Concurrency.
  • Java has good cross-platform support.
  • It has very strong IDE support, making the developer’s life easier.
  • It imposes certain best practices. There is most likely a single way of doing things and hence a bad programmer can only do so much harm. :\ But in languages like Python, there are several ways of writing a piece of code and the onus is completely on the developer to use the most optimized one.

Classes and Objects:

The concept of Classes, Objects and Object Oriented programming is similar to Python. But here’s the catch:


Each .java file can have only one public class, and the name of this public class should be the same as the name of the .java file. The class that contains the main function marks the exact point from which the application is launched.

Variables:

Again, the concept is very similar to Python including the naming conventions.

Java cares about Type. An Elephant cannot be put in a basket meant for vegetables. This is very much unlike Python, which decides if we need a Basket or a Cage based on whether the source is an Elephant or a vegetable that we are dealing with. 🙂 Moreover,

  • variables declared as final cannot be reassigned, once assigned to some value.
  • A variable of a particular type can be assigned to another of the same type. For example, a variable of the type Tiger (Tiger t1 = new Tiger()) can be assigned to any other variable of the type Tiger. (Tiger t2 = new Tiger(); t2 = t1;)
  • Java is PASS-BY-VALUE.
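Pass-by-value will feel familiar to Python developers: for objects, Java passes a copy of the reference, which is how Python behaves too. Reassigning the parameter inside a function does not affect the caller, while mutating the object it points to does. A small Python sketch with hypothetical rebind and mutate functions:

```python
def rebind(items):
    # Rebinds the local name only; the caller's list is untouched.
    items = ["replaced"]


def mutate(items):
    # Changes the object that both caller and callee refer to.
    items.append("added")


places = ["Switzerland"]

rebind(places)
after_rebind = list(places)

mutate(places)
print(after_rebind, places)
```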

In Python, a global variable can be modified inside a function by declaring it with the global keyword (global x, followed by x = 3). Java has no concept of global variables as such. However, a variable marked as public, static and final acts as a globally available constant.

Conditional Statement:

if(var1 == var2) { <do this>
}

Iteration:

WHILE Loop:

Use while loop when we do not know the exact number of times to loop, in advance. In other words, when we have to loop until a condition is satisfied.

while(boolean condn) { <repeat this>
}

FOR Loop:

for(int i=0;i<anArray.length;i++){
<repeat array-length number of times>
}

Enhanced FOR loop: (For each loop)

String[] AnArray = {"John","Bob","Andy","George","Rachel"};
System.out.println("Names: ");
for(String name: AnArray) {
    System.out.println(name);
}
DO NOT forget the SEMICOLON;

Java Library or Java API

Let’s now write some simple code in Java to accomplish certain common tasks using the built-in Libraries.

ArrayList:

ArrayList provides frequently used list operations that would otherwise take multiple lines of code and iterations to achieve. Similar to the built-in functions for a List in Python, you can add elements to the list, remove them, check if an element is present, get the size of the list, get the index of an element etc., as follows:

import java.util.ArrayList;
ArrayList<String> ListPlaces = new ArrayList<String>();
String p1 = new String("Switzerland");
String p2 = new String("Singapore");
ListPlaces.add(p1);
ListPlaces.add(0,p2); //0 - index where p2 should be added
System.out.println(ListPlaces);
System.out.println(ListPlaces.size());
System.out.println(ListPlaces.get(0));
System.out.println(ListPlaces.indexOf(p1));
System.out.println(ListPlaces.remove(1));
System.out.println(ListPlaces.contains(p2));
System.out.println(ListPlaces.isEmpty());
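For reference, here is a line-by-line Python counterpart of the ArrayList calls above, with each Java call noted in a comment. The variable names are my own.

```python
places = []
p1, p2 = "Switzerland", "Singapore"

places.append(p1)               # ListPlaces.add(p1)
places.insert(0, p2)            # ListPlaces.add(0, p2)

size = len(places)              # ListPlaces.size()
first = places[0]               # ListPlaces.get(0)
index_of_p1 = places.index(p1)  # ListPlaces.indexOf(p1)
removed = places.pop(1)         # ListPlaces.remove(1) returns the removed element
has_p2 = p2 in places           # ListPlaces.contains(p2)
is_empty = not places           # ListPlaces.isEmpty()

print(size, first, index_of_p1, removed, has_p2, is_empty)
```

One difference to keep in mind: indexOf returns -1 for a missing element in Java, whereas list.index raises a ValueError in Python.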

Generating a random number:

int Rand = (int) (Math.random() * 10);

*(int) -> Type casting the return value of the random() function to an integer.

Obtaining user input from command line:

import java.util.Scanner;
System.out.println("Please enter your name: ");
Scanner sc = new Scanner(System.in);
String username = sc.nextLine();

Let’s play Battleship

Here’s an example program, the battleship game written in Java. We will play this on the command line. The game randomly places 3 Battleships on a 7×7 board. It allows the user to guess the location. On successful identification of each ship, it says “kill”. When all the ships sink, the game ends with a score.

There may be better ways of writing this code but at this stage, we are just trying to get a step closer to learning Java. Here’s an implementation of the battleship game.

This code is also available on Github: https://github.com/shilpavijay/Battleship

import java.util.*;
import java.io.*;

public class Battleship {
    private GameHelper helper = new GameHelper();
    private ArrayList<PlayBattleship> battleList = new ArrayList<PlayBattleship>();
    private int noOfGuesses = 0;

    private void setUpGame() {
        PlayBattleship ship1 = new PlayBattleship();
        ship1.setName("Maximus");
        PlayBattleship ship2 = new PlayBattleship();
        ship2.setName("Aloy");
        PlayBattleship ship3 = new PlayBattleship();
        ship3.setName("Agnes");
        battleList.add(ship1);
        battleList.add(ship2);
        battleList.add(ship3);
        System.out.println(battleList);

        System.out.println("Your goal is to sink the three ships: Maximus, Aloy and Agnes with the least number of guesses.");
        for (PlayBattleship shipToset : battleList) {
            ArrayList<String> newLocation = helper.placeShip(3);
            shipToset.setLocation(newLocation);
        }
    }

    private void startPlaying() {

        while(!battleList.isEmpty()) {
            String userGuess = helper.getUserInput("Enter a guess");
            checkUserGuess(userGuess);
        }
        finishGame();
    }


    private void checkUserGuess(String userGuess) {
        noOfGuesses++;
        String result = "miss";

        for(int i=0;i<battleList.size();i++) {
            result = battleList.get(i).checkTheGuess(userGuess);
            if(result.equals("hit")) {
                break;
            }
            if(result.equals("kill")) {
                battleList.remove(i);
                break;
            }
        }
        System.out.println(result);
    }

    void finishGame() {
        System.out.println("Wow!!! You sunk all the ships! Congrats!");
        if(noOfGuesses <= 18) {
            System.out.println("Score: Congrats, you sunk all ships in " + noOfGuesses + " guesses!");
        }
        else {
            System.out.println("You have exhausted your options! Sorry! Please try again!");
        }
    }

    public static void main(String[] args) {
        Battleship game = new Battleship();
        game.setUpGame();
        game.startPlaying();
    }
}
class PlayBattleship {
    private ArrayList<String> location;
    private String name;

    public void setLocation(ArrayList<String> battleList) {
        location = battleList;
    }

    public void setName(String n){
        name = n;
    }
    public String checkTheGuess(String choice) {
        String result = "miss";
        int index = location.indexOf(choice);
        if(index>=0) {
            location.remove(index);
            if (location.isEmpty()) {
                result = "kill";
                System.out.println("You sunk " + name + ":|");
            }
            else {
                result = "hit";
            }
        }
        return result;
    }
}
class GameHelper {
    private static final String alphabet = "abcdefg";
    private int gridlength = 7;
    private int gridSize = 49;
    private int[] grid = new int[gridSize];
    private int comcount = 0;

    public String getUserInput(String prompt) {
        String inputLine = null;
        System.out.println(prompt);
        try {
            BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
            inputLine = input.readLine();
            if(inputLine.length() == 0) return null;
        }
        catch(IOException e){
            System.out.println("IOException: " + e);
        }
        return inputLine.toLowerCase();
    }

    public ArrayList<String> placeShip(int size) {
        ArrayList<String> cells = new ArrayList<String>();
        String temp = null;
        int [] curr = new int[size];
        int attempts = 0;
        boolean success = false;
        int location = 0;

        comcount++;
        int incr = 1;
        if((comcount % 2) == 0) {
            incr = gridlength;
        }

        while(!success & attempts++ < 200) {
            location = (int) (Math.random() * gridSize);
            int x = 0;
            success = true;
            while(success && x<size) {
                if(grid[location] == 0) {
                    curr[x++] = location;
                    location += incr;
                    if (location >= gridSize) {
                        success = false;
                    }
                    if (x>0 && (location % gridlength == 0)) {
                        success = false;
                    }
                }
                else {
                    success = false;
                }
            }
        }

        int x = 0;
        int row = 0;
        int column = 0;

        while (x < size) {
            grid[curr[x]] = 1;
            row = (int) (curr[x]/gridlength);
            column = curr[x] % gridlength;
            temp = String.valueOf(alphabet.charAt(column));

            cells.add(temp.concat(Integer.toString(row)));
            x++;
//            System.out.print(" co-ord "+x+" = " + cells.get(x-1).toUpperCase());
        }
        return cells;
    }
}

Conclusion:

In this blogpost we learnt how to use Variables and write Classes, conditional statements and loops in Java. We also glanced at some basic APIs or built-in libraries in Java that help perform certain common operations easily. We also put these concepts into action by coding a game.

Posted in AI, Tech

Evaluating your Classifier

The previous post: Find your best fit

This post is a continuation of the previous one. This one shall discuss a few more ways of evaluating a Machine Learning Algorithm – a Classification Model, in particular.

Learning Curve

[Figure: learning curve]

Say we have fit some function to ‘x’ number of training samples. The training error of this function keeps increasing as more and more data samples are added, because it gets harder to fit every sample. However, after a certain limit, say ‘n’ number of samples, the error value will plateau.

Plotting a learning curve can help decide whether collecting more samples will enhance the performance of the classifier on the Test set. More samples will not make much difference to the performance if the learning curve has already reached the plateau.
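The numbers behind a learning curve can be produced with a few lines of code. The sketch below is only an illustration: it uses synthetic data (a noisy straight line), a plain least-squares line instead of a classifier, and helper names of my own choosing. It records the training and validation error for growing training-set sizes.

```python
import random


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    spread = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / spread
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, points):
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in points) / len(points)


rng = random.Random(0)
# Synthetic data: y = 2x + 1 plus Gaussian noise (made up for this sketch).
data = [(i * 0.25, 2 * (i * 0.25) + 1 + rng.gauss(0, 1)) for i in range(40)]
validation = data[::4]                                    # every 4th point
training = [p for i, p in enumerate(data) if i % 4 != 0]  # the other 30
rng.shuffle(training)

curve = []
for m in [2, 5, 10, 20, 30]:
    model = fit_line(training[:m])
    curve.append((m,
                  mean_squared_error(model, training[:m]),
                  mean_squared_error(model, validation)))

for m, train_error, validation_error in curve:
    print(f"{m:>2} samples: train={train_error:.3f} validation={validation_error:.3f}")
```

With only two samples the line passes through both points, so the training error starts at practically zero and can only grow from there as more samples are added.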

Precision and Recall

Let’s consider an Email Spam Classifier. The prediction of the classifier vs the actual result, can be one of the following: [Here, 0 -> False, 1 -> True]

[Table: predicted class vs actual class]

If the spam classifier predicts an email to be a spam (Predicted class is 1) and it turns out to be actually a spam (i.e. Actual class is also 1) then the result is said to be ‘True Positive’. The classification was accurate. However, if the spam classifier classifies the email to be a spam (Predicted class is 1) and the email is not actually a spam (Actual class is 0) then the result is said to be ‘False Positive’. The classifier seems to have erroneously classified the mail as spam. False Negative and True Negative can be similarly interpreted.

Precision and Recall are further used to calculate the F1-Score, which measures the performance of a classification algorithm. Let us see what they mean and how they are calculated.

Precision = True Positive/(True Positive + False Positive)

Precision: Of all the emails that were predicted to be spam, the fraction of those that were actually spam is called the Precision.

Recall = True Positive / (True Positive + False Negative)

Recall: Of all the emails that actually are spam, Recall is the fraction that was correctly predicted as spam.

[Figure: precision vs recall]

The relationship between Precision and Recall is illustrated in this graph. They typically trade off against each other: moving the classification threshold to raise one tends to lower the other. The threshold is chosen where the classifier strikes the best balance between the two.
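The trade-off is easy to see on a handful of made-up predictions. In this sketch, scores are the hypothetical spam probabilities a classifier assigned to eight emails and labels are the actual classes. An email is predicted as spam when its score reaches the threshold.

```python
def precision_recall(scores, labels, threshold):
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, a in zip(predicted, labels) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, labels) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, labels) if p == 0 and a == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up data: 1 = spam, 0 = not spam.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 1, 0, 1, 0]

strict = precision_recall(scores, labels, 0.85)    # few, very sure predictions
balanced = precision_recall(scores, labels, 0.50)
lenient = precision_recall(scores, labels, 0.25)   # almost everything is spam

print(strict, balanced, lenient)
```

The strict threshold gives perfect precision but misses most of the spam, while the lenient one catches all the spam at the cost of more false alarms.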

F1-Score

F1-Score is the harmonic mean of Precision and Recall. It is calculated as:

F1-Score = 2 * (Precision * Recall) / (Precision + Recall)
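Putting the three formulas together, here is a small sketch that computes them from the counts of True Positives, False Positives and False Negatives. The counts used (30, 10 and 20) are made up for illustration.

```python
def precision_recall_f1(true_positive, false_positive, false_negative):
    precision = true_positive / (true_positive + false_positive)
    recall = true_positive / (true_positive + false_negative)
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1


# Made-up counts: 30 spam emails caught, 10 good emails flagged, 20 spam emails missed.
precision, recall, f1 = precision_recall_f1(30, 10, 20)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.3f}")
```

Here precision is 30/40 and recall is 30/50, and the F1-Score lands between the two, closer to the lower value.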

Here’s an example of three different algorithms with their precision, recall and F1 score:

[Table: precision, recall and F1-score of three algorithms]

The first algorithm has the highest F-Score and hence performs the best.

Evaluating an algorithm at an early stage helps in making an informed decision on the further steps to be taken to improve performance. It is usually suggested to make a very quick and dirty implementation using a simple algorithm initially, and then improve it by analyzing the algorithm and taking decisions like adding or removing features, collecting more samples etc.

Posted in AI, Tech

Find your Best Fit!

Let’s say we have chosen and implemented the Machine Learning algorithm best suited to our data, only to find that it makes unacceptably large errors in prediction. What do we do next?


Let’s discuss some choices that can be made in such a scenario and also get an intuition of what can be expected out of them. These can be used in general, for debugging or analyzing the performance of a Machine Learning Algorithm on a specific implementation.

  1. Add more training examples
  2. Try a smaller set of features
  3. Try getting additional features
  4. Add polynomial features
  5. Increase / decrease the regularization parameter

The effect of each of these only makes sense after understanding some underlying concepts like Over-fit, Under-fit, Bias, Variance etc. Here is what they mean:

Under-fit and Over-fit:

[Figure: Underfit]

As illustrated in the graph, if the regression model fits only a few of the data samples, it is said to be under-fitting. An insufficient number of features might cause under-fitting. This is also called High Bias.

[Figure: Overfit]

As we can see, the model fits almost all the data samples provided to the algorithm. It may perform well on the training set but will not generalize well enough when new samples are encountered in the test set. This is also called High Variance.

Bias vs Variance:

When the training set error and the test set error are plotted against, say, the size of the data sample, we get a graph similar to that given below.

[Figure: Bias vs. Variance]

As more samples are fed in, the algorithm may perform excellently on the training set, reducing the training error drastically, whereas on the test set it may perform poorly, leading to a high error.

[Error is the difference between the prediction made by the model and the actual value. The entire data sample is usually divided into a Training set, a Test set and sometimes a Cross validation set. The predictive model is trained using the training set. The cross validation and test sets are used to validate and test the model, respectively.]
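
A minimal sketch of such a split in plain Python. The 60/20/20 proportions and the fixed seed are assumptions made for the example; the post does not prescribe any particular ratio.

```python
import random


def split_dataset(samples, train_frac=0.6, cv_frac=0.2, seed=0):
    """Shuffle the samples and cut them into training, cross validation and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_cv = round(len(shuffled) * cv_frac)
    train = shuffled[:n_train]
    cv = shuffled[n_train:n_train + n_cv]
    test = shuffled[n_train + n_cv:]       # whatever is left over
    return train, cv, test


train, cv, test = split_dataset(range(100))
print(len(train), len(cv), len(test))  # 60 20 20
```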

High Bias: If both the training set error and the test set error are high, we can infer that the algorithm is suffering from high bias, or is under-fitting.

High Variance: If the training set error is very low and the test set error is very high in comparison, the algorithm is suffering from high variance, or is over-fitting.
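
These two rules of thumb can be written down as a tiny helper. Treat it as a sketch only: what counts as an "acceptable" error and how large the gap between the two errors must be (gap_ratio) are assumptions that depend entirely on the problem at hand.

```python
def diagnose(train_error, test_error, acceptable_error, gap_ratio=2.0):
    """Rough diagnosis from training and test error (thresholds are assumptions)."""
    if train_error > acceptable_error:
        # The model cannot even fit the data it was trained on
        return "high bias (under-fitting)"
    if test_error > acceptable_error and test_error > gap_ratio * train_error:
        # Fits the training set well but does not generalize
        return "high variance (over-fitting)"
    return "no obvious problem"


print(diagnose(0.30, 0.32, acceptable_error=0.10))  # high bias (under-fitting)
print(diagnose(0.02, 0.25, acceptable_error=0.10))  # high variance (over-fitting)
print(diagnose(0.04, 0.05, acceptable_error=0.10))  # no obvious problem
```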

Back to the diagnosis:

Now that we have the required background, let’s go back to the diagnosis and analyze each of our choices.

  1. Add more training examples:
    • If the test set error is much higher than the training set error, suggesting high variance, we can fix it by collecting more data samples and training the model again.
    • This might not be such a great idea if both the training and test set errors are high (high bias), since more data will not help a model that is too simple.
  2. Try getting additional features:
    • If the algorithm is under-fitting the samples, the number of features might not be sufficient for the model to make accurate predictions. Collecting some more features may prove helpful.
    • For example, if the model is predicting the price of a hotel, features like the number of rooms, room size, floor elevation, balconies, furniture etc. can be added.
  3. Try smaller set of features:
    • If the algorithm is over-fitting the data samples, in other words, it performs extremely well on the training set but is highly erroneous on the test set, reducing the number of features might help improve the performance.
  4. Add polynomial features:
    • In some scenarios, we may be confined to a limited set of features, and adding more features might not be possible. In such a case, an additional term derived from an existing feature can be added to the model.
    • A linear model, for example, can be transformed into a quadratic or a cubic model by adding a term that is the square (or the cube) of the size of the room.
    • Adding a polynomial feature can fix high Bias.
  5. Increasing / decreasing the regularization parameter:
    • The cost function measures how well a hypothesis/model fits the training set, typically as the squared error; the best model is the one that minimizes it. To keep the model from over-fitting, a regularization term, weighted by a parameter called lambda, is added to the cost function.
    • Increasing the regularization parameter penalizes the weight vector for growing arbitrarily, forcing the optimization to choose smaller values of the weights (theta). This results in fitting a better model to the data when the existing one is suffering from high variance/over-fitting.
    • Similarly, decreasing the regularization parameter allows higher values of theta, fitting the model better when it is under-fitting. In other words, it fixes high bias.
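
Choices 4 and 5 can be sketched together in a few lines of Python. This assumes the common squared-error cost with an L2 penalty, J = (sum of squared errors + lambda * sum of theta_j squared) / 2m, with the intercept left unpenalized; the post does not spell out the exact formula, and the data below is made up.

```python
def polynomial_features(x, degree):
    """Expand a single feature x into [1, x, x^2, ..., x^degree]."""
    return [x ** d for d in range(degree + 1)]


def regularized_cost(theta, xs, ys, lam):
    """Squared-error cost plus an L2 penalty weighted by lam (lambda)."""
    m = len(xs)
    degree = len(theta) - 1
    squared_error = 0.0
    for x, y in zip(xs, ys):
        features = polynomial_features(x, degree)
        prediction = sum(t * f for t, f in zip(theta, features))
        squared_error += (prediction - y) ** 2
    # theta[0] is the intercept and is conventionally not penalized
    penalty = lam * sum(t ** 2 for t in theta[1:])
    return (squared_error + penalty) / (2 * m)


# Hypothetical data that follows y = x^2 exactly
xs = [1.0, 2.0, 3.0]
ys = [1.0, 4.0, 9.0]
theta = [0.0, 0.0, 1.0]  # quadratic model: 0 + 0*x + 1*x^2

print(regularized_cost(theta, xs, ys, lam=0.0))  # 0.0
print(regularized_cost(theta, xs, ys, lam=3.0))  # 0.5
```

The fit is perfect in both calls; only the penalty differs. A larger lambda makes large weights more expensive, which is what pushes the optimizer towards smaller values of theta.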

These choices are not exhaustive but identifying the right step to be taken can save a lot of time and help arrive at the Best Fit, faster!

Posted in Elastic Stack, Misc, Tech

Elastic Stack

Server logs contain some of the most valuable and untapped information. Logs are often unstructured and usually make little sense on their own. Deriving insights from them can reveal various opportunities for improvement. The Elastic Stack, or ELK Stack, is one of the most widely used solutions for log analysis. It is open source and has a massive community pushing the boundaries by adapting it into various scalable systems. Companies including Microsoft, LinkedIn, Netflix, eBay, SoundCloud and Stack Overflow use the Elastic Stack.


ELK is an acronym for three open source projects: Elasticsearch, Logstash and Kibana. Together with Beats, they make up the Elastic Stack:

  • Elasticsearch: A search and analytics engine. It is an open source, distributed, RESTful, JSON-based search engine.
  • Logstash: A server-side data processing pipeline that can ingest data from multiple sources. It then transforms the data and sends it to Elasticsearch.
  • Kibana: Visualizes data with charts and graphs.
  • Beats: Light-weight, single-purpose data shippers. Beats have a small installation footprint and use limited system resources. They can either send data directly to Elasticsearch or send it via Logstash. Beats is written in Go!

Logstash, along with Beats, collects and parses log data from multiple sources. Elasticsearch indexes and stores this information. Kibana visualizes this information to provide insights.

Elasticsearch

Elasticsearch is worth discussing in detail. It is written in Java and is widely used for full-text search. This powerful search engine is designed to scale up to millions of search events per second. Elasticsearch is used by Wikipedia, Airbus, eBay and Shopify to power their search with near-real-time access. Its powerful features:

  • Scalability
  • High availability
  • Multi-tenancy
  • Developer friendly
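
Because Elasticsearch is RESTful and JSON based, a full-text search is just a JSON document sent to the _search endpoint of an index. The sketch below only builds and prints such a request body using the match query of the Elasticsearch query DSL; the field name "message" and the search text are hypothetical, and nothing is actually sent to a cluster.

```python
import json


def build_match_query(field, text, size=10):
    """Build the JSON body of a full-text 'match' search."""
    return {
        "size": size,                       # maximum number of hits to return
        "query": {"match": {field: text}},  # full-text match on one field
    }


# Hypothetical field name and search text
body = build_match_query("message", "connection timeout")
print(json.dumps(body, indent=2))
```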

Logstash

Logstash supports data in many formats coming from various systems. It can ingest data from logs, web applications, data stores, network devices, AWS services and REST endpoints. It then parses and transforms the data, identifies named fields to build structure, and converts it into a common format.

  • Provides around 200 plugins to mix and match when building the data pipeline. A custom plugin can also be built to ingest data from a custom application.
  • Pipelines can be complicated, and monitoring their load, performance, latency, availability etc. can be cumbersome. The “monitoring and pipeline viewer” provides centralized monitoring that makes the task easier to understand.
  • Structures, transforms and enriches data with filter plugins.
  • Can emit data to Elasticsearch or other destinations using output plugins such as TCP or UDP.
  • Logstash is horizontally scalable.
  • Security: Incoming data from Beats can be encrypted. Logstash also integrates with secured Elasticsearch clusters.
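
To get a feel for what "identifying named fields" means, here is a rough Python illustration that turns one access-log line into a structured event. This is not Logstash itself, which does this kind of work with its filter plugins (grok, for instance); the log line, the pattern and the field names are all made up for the example.

```python
import re

# Named groups play the role of the "named fields" in the structured event
LOG_PATTERN = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_log_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["status"] = int(event["status"])  # convert to a number for querying
    return event


# Hypothetical access-log line
line = '203.0.113.7 - - [10/Oct/2018:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
print(parse_log_line(line))
```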

Kibana

Kibana provides interactive visuals of Elasticsearch data to monitor the behavior, understand the impact of certain data changes and so on.

  • Kibana core comes with histograms, line charts, pie charts, sunburst and many other classics.
  • Plots Geospatial data on any given map.
  • Can perform advanced Time Series analysis.
  • Graph Exploration: Analyzing Relationships with Graphs
  • Build customized canvas, add logos, elements and create a story.
  • Can easily share dashboards across the organization 

Problems ELK can solve:

  • In a distributed system with several nodes, searching through several log files for certain information using Unix commands is a tedious task. Logstash and Beats collect the logs from all the nodes, and Elasticsearch provides fast access to them.
  • Ship Reports: Kibana provides faster ways to explore and visualize data. It can schedule and email reports, and can quickly export the results of ad-hoc analysis or saved searches into a CSV file. Alerting can be used to generate data dumps when certain conditions are met, or at a regular interval.
  • Alerting: Alerts can be set on data changes that can be identified using the Elasticsearch query language. This can proactively identify intrusion attempts, trends in social media and peak hours in network traffic, and can also learn from its own alerting history. It comes with built-in integrations for email, Slack, HipChat etc.
  • Unsupervised Learning: The Machine Learning features can detect different kinds of anomalies and unusual network activity, and help identify root causes quickly.

It can also integrate with Graph APIs to analyze relationships in data. Canvas can be used to build presentations and organize reports. Elastic Stack has been extending its features and exploring many possibilities. 

Useful Resources:

  1. Elastic Stack
  2. Kibana Live Demo
  3. Logstash – Video

Posted in Python, Tech

Python 3.7 – Cool New features

Welcome back to Python! It’s time to discuss the cool new features shipped with Python 3.7. Here are some of my favorites:

  • Data Classes
  • Built-in breakpoint
  • Typing module
  • Importing Data files

Data Classes

What is it? A new decorator: @dataclass

What’s new? It eases writing special methods in the class, like __init__(), __repr__(), and __eq__(). They are added automatically.

Example:

from dataclasses import dataclass

@dataclass(order=True)
class Test:
    field1: str
    field2: str

The following does not need an implementation of __repr__() in the class:

>>> t = Test("abc","xyz")
>>> t
Test(field1='abc', field2='xyz')
>>> t.field1
'abc'
>>> t.field2
'xyz'

Doesn’t need an implementation of __eq__ to do this:

>>> t1=Test("abc","xyz")
>>> t==t1
True
>>> t2=Test("a","b")
>>> t==t2
False
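
The decorator in the example is created with order=True, which the session above does not exercise. Here is a small sketch of what it adds: the ordering methods (__lt__(), __le__() and so on) are generated too, comparing instances field by field like tuples. The Version class and its fields are made up for illustration; field(compare=False) keeps a field out of the comparisons.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Version:
    major: int
    minor: int
    # Excluded from ==, < and friends
    label: str = field(default="", compare=False)


releases = [Version(3, 7, "new"), Version(2, 7), Version(3, 6)]

# Sorting works because order=True generated the comparison methods
print(sorted(releases))

# Labels differ, but label is not part of the comparison
print(Version(3, 7, "a") == Version(3, 7, "b"))  # True
```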

Breakpoint

What is it?  A built-in function, breakpoint(), that drops into the debugger (pdb by default).

What’s new? The debugger itself is not new, but breakpoint() simplifies using it and eliminates the need to import pdb.

Example:

def divide(a, b):
    breakpoint()
    return a / b

>>> divide(2,3)
> <stdin>(3)divide()
(Pdb)

Old way of importing pdb:

def divide(a, b):
    import pdb; pdb.set_trace()
    return a / b

Typing module

What is it?  Annotations and Type hinting

What’s new? Function annotations provide a standard way of associating metadata with function arguments and return values.

  • Annotations and the typing module provide hints on the arguments and return value of a function.
  • Earlier, annotations were evaluated when the function was defined, so they worked only with names already available in the current scope, i.e. forward references were not supported without quoting them.
  • With from __future__ import annotations, annotations are no longer evaluated at definition time. They are stored as strings and evaluated only on demand, e.g. by typing.get_type_hints().

Example:  

Creating the file py37anno.py:

from __future__ import annotations

class Try:
    def foo(name: str) -> 'salutation':
        print(f"Annotations Example for you {name}")

Importing the class and inspecting its annotations with the typing module:

>>> from py37anno import Try
>>> from __future__ import annotations
>>> Try.foo.__annotations__
{'name': 'str', 'return': "'salutation'"}
>>> import typing
>>> typing.get_type_hints(Try.foo)
{'name': <class 'str'>, 'return': ForwardRef('salutation')}
>>> Try.foo('Shilpa')
Annotations Example for you Shilpa
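
A forward reference is easiest to see when a method returns an instance of its own class, which does not exist yet while the method is being defined. The sketch below uses the quoted form "Node" so that it runs wherever it is pasted; with from __future__ import annotations at the top of the module, the quotes could be dropped and the annotation would be stored as a string in the same way. The Node class is made up for illustration.

```python
import typing


class Node:
    def __init__(self, value: int):
        self.value = value
        self.next = None

    # "Node" is a forward reference: the class is still being defined here
    def append(self, value: int) -> "Node":
        self.next = Node(value)
        return self.next


# The raw annotation is kept as a string ...
print(Node.append.__annotations__["return"])  # Node

# ... and is resolved to the real class only on demand
hints = typing.get_type_hints(Node.append)
print(hints["return"] is Node)  # True
```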

Importing Data Files

What is it? An optimized, easy and organized API for working with data files in a Project

What’s new? Eliminates the necessity of hard-coding data file-names. The importlib.resources module helps in locating, importing and reading from the data file.

Example:

The project has a data directory that contains the file lorem.txt along with an __init__.py file (which makes data an importable package):

>>> import os
>>> os.listdir('data')                                     # output: files in the 'data' directory
['lorem.txt', '__init__.py', '__pycache__']

>>> from importlib import resources
>>> with resources.open_text("data", "lorem.txt") as f:
...     print(f.readlines())
...
['"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum." \n']
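
To try this without setting up a project first, the sketch below builds a throwaway package in a temporary directory and reads a file from it. The package name demo_data and the file greeting.txt are made up for the demo. Newer Python versions (3.9+) also offer resources.files() for the same job, and may emit a DeprecationWarning for open_text().

```python
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny package: a directory with __init__.py and one data file
    pkg_dir = Path(tmp) / "demo_data"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "greeting.txt").write_text("hello from a data file\n")

    sys.path.insert(0, tmp)        # make the package importable
    importlib.invalidate_caches()  # the directory was created after startup
    try:
        # Located via the import system, no hard-coded file path needed
        with resources.open_text("demo_data", "greeting.txt") as f:
            text = f.read()
    finally:
        sys.path.remove(tmp)

print(text)
```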

A note on installing Python 3.7:

Creating a new environment:

conda create -n py37 -c anaconda python=3.7

Upgrading an existing Python environment to 3.7 in Anaconda:

conda install -c anaconda python=3.7