Saturday, 20 August 2016

Basic rule based typecasting parser (python)

 

Introduction

After writing APIs for Sorted for the past 6-7 months, I have built up a lot of utility functions that made life much simpler for me and some of my colleagues (once they learned how to use them). One important but rarely noticed problem I came across was how to efficiently parse the request parameters sent by the client and automatically typecast them where possible, throwing an error otherwise. Until today, the only quick solution I had come up with was to typecast the values manually. This post describes a basic (though efficient) Pythonic solution that automatically parses and typecasts variables according to a supplied rule.
 

Thought process

  • The task was divided into two parts:
    1. The rule construct (detail).
    2. The parser, which typecasts values and throws an error if needed (detail).
  • I was targeting a small subset of the problem, in which values have to be typecast into primitive datatypes: int, float, string and boolean.
  • Every API request has its own structure, so the rule changes from API to API. Hence the rule construct should be generic enough to accommodate most of the possible use cases.
 

Rule Construct

Let us look at an example rule:
student = {
  'name': 'str',
  'subjects': [{
    'name': 'str',
    'marks': 'float',
    'passed': 'bool'
  }],
  'hobbies': ['str'],
  'school': {
    'name': 'str',
    'estd': 'int'
  }
}
The above rule states that:
  • student.name is to be a string
  • student.subjects is to be a list of dict each having:
    • name as a string
    • marks as a decimal (floating point) number
    • passed as a boolean
  • student.hobbies is to be a list of strings
  • student.school.name is to be a string
  • student.school.estd is to be an integer value
Note that a rule for a list has only one element, irrespective of how many items the list will hold, so the items of such a list must be homogeneous. The rule is designed using only four keywords (int, float, bool and str) along with Python's dict and list constructs. I hope the construction of rules is clear. Let's move on to the tough section, in which we parse and typecast the given input according to the rule, throwing an error if required.
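To make the rule construct concrete, here is a minimal sketch of a checker that accepts only rules built the way described above. The names `PRIMITIVE_KEYWORDS` and `is_valid_rule` are hypothetical and are not part of the parser in this post.

```python
# Hypothetical helper, not part of the original module: it only checks
# that a rule uses the constructs described above.
PRIMITIVE_KEYWORDS = ('int', 'float', 'bool', 'str')


def is_valid_rule(rule):
    """Return True if `rule` is made only of the four keywords,
    dicts of rules, and single-element lists/tuples of rules."""
    if isinstance(rule, dict):
        return all(is_valid_rule(value) for value in rule.values())
    if isinstance(rule, (list, tuple)):
        # A list rule carries exactly one element describing every item.
        return len(rule) == 1 and is_valid_rule(rule[0])
    return rule in PRIMITIVE_KEYWORDS


student = {
    'name': 'str',
    'subjects': [{'name': 'str', 'marks': 'float', 'passed': 'bool'}],
    'hobbies': ['str'],
    'school': {'name': 'str', 'estd': 'int'},
}

print(is_valid_rule(student))                    # True
print(is_valid_rule({'tags': ['str', 'int']}))   # False: two elements in a list rule
```

Running a check like this once when the API starts would catch a mistyped keyword before any request is parsed.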
 

Code Section (Parser)

from copy import deepcopy


'''
To check if type(var) passed as `x` is a string
or not (either of type `str` or `unicode`)
'''
_isStrType = lambda x: x == type('') or x == type(u'')


'''
To check if the type(var) passed as `attrType`
is one of the primitive datatype (as mentioned
above)
'''
_isBasicType = lambda attrType: (
  _isStrType(attrType)
  or attrType == type(1)
  or attrType == type(1.0)
  or attrType == type(True)
)


'''
A utility function to show a formatted 
message of datatype parsing error to user.
'''
_msgFormatter = lambda chunkAttrVal, ruleAttrVal: (
  str(chunkAttrVal) +
  " found " + str(type(chunkAttrVal)) +
  " but api expected " +
  (
    str(type(ruleAttrVal))
    if ruleAttrVal not in ['int', 'str', 'bool', 'float']
    else str(ruleAttrVal)
  )
)


'''
Returns `val` after typecasting it to the 
primitive types as asked in the argument `type`.
Throws `ValueError` Exception if any typecasting 
error occurs
'''
def _typecast(val, type):
  val = deepcopy(val)
  if type == 'str':
    return str(val)
  elif type == 'float':
    return float(val)
  elif type == 'int':
    return int(val)
  elif type == 'bool':
    val = str(val)
    if val.lower() == 'true':
      return True
    elif val.lower() == 'false':
      return False
    else:
      raise ValueError(
        "Invalid argument for boolean type : " + val
      )
  else:
    # Unknown rule keyword: fail loudly instead of silently returning None
    raise ValueError("Unknown rule type : " + str(type))


'''
The recursive function which traverses all the dict 
attributes and list elements and typecasts each of 
them as per the rule provided.
Throws `Exception` if any parsing/typecasting error occurs
'''
def parseInputParams(chunk, rulesChunk):
  # Used to update the typecasted value of chunk 
  # when it is a list
  index = 0

  # for identifying whether chunk is list/dict
  isChunkList = isinstance(chunk, (list, tuple))

  # for identifying whether rulesChunk is list/dict
  isRulesChunkList = isinstance(rulesChunk, (list, tuple))
  
  if isChunkList != isRulesChunkList:
    msg = _msgFormatter(chunkAttrVal = chunk, ruleAttrVal = rulesChunk)
    raise Exception(msg)
  else:
    isList = isChunkList
    # isList = True means both `chunk` and `rulesChunk` are lists
    # isList = False means `chunk` is a dict but `rulesChunk` can
    #        be a dict or any other primitive type

    # Start Iteration over all the elements of the chunk
    for attr in chunk:
      # get the value which is to be parsed next according to if chunk is a list or dict
      if isList == True:
        chunkValue = attr
        ruleChunkValue = rulesChunk[0] # since the rule is depicted by the 0th element of the rule list as explained above
      else:
        chunkValue = chunk[attr]
        if type(rulesChunk) == type({ }) and attr in rulesChunk:
          ruleChunkValue = rulesChunk[attr]
        else:
          # If `rulesChunk` is neither a list nor dict 
          # containing attr, then the rule for the given
          # attr is not defined and hence it should be `None`
          ruleChunkValue = None

      chunkValueType = type(chunkValue)
      rulesChunkValueType = type(ruleChunkValue)
      # Just to make `tuple` type to `list` type for easy comparison
      if chunkValueType == type(()):
        chunkValueType = type([])
      if rulesChunkValueType == type(()):
        rulesChunkValueType = type([])

      if _isBasicType(chunkValueType) and _isBasicType(rulesChunkValueType):
        # the `chunkValue` is to be typecasted
        if isList == True:
          try:
            chunk[index] = _typecast(val = chunkValue, type = ruleChunkValue)
          except ValueError as e:
            raise Exception(e)
        else:
          try:
            chunk[attr] = _typecast(val = chunkValue, type = ruleChunkValue)
          except ValueError as e:
            raise Exception(e)
      elif chunkValueType == rulesChunkValueType:
        # the `chunkValue` is not of primitive datatype and so is the `ruleChunkValue`
        # call the same function with the subset of data to be typecasted and subset of the rule applicable
        parseInputParams(chunk = chunkValue, rulesChunk = ruleChunkValue)
      else:
        # there is some error in the chunk provided as none of the valid condition matches.
        msg = _msgFormatter(chunkAttrVal = chunkValue, ruleAttrVal = ruleChunkValue)
        raise Exception(msg)

      index += 1
  return chunk
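One detail of `_typecast` deserves a short illustration: the 'bool' branch compares strings instead of calling `bool()`. The sketch below shows why. `parse_bool` is a hypothetical stand-in that mirrors that branch so the snippet runs on its own.

```python
# bool() only tests for emptiness, so every non-empty string is truthy.
print(bool('false'))   # True
print(bool(''))        # False


def parse_bool(text):
    """Hypothetical stand-in mirroring the 'bool' branch of _typecast."""
    lowered = str(text).lower()
    if lowered == 'true':
        return True
    if lowered == 'false':
        return False
    raise ValueError("Invalid argument for boolean type : " + str(text))


print(parse_bool('false'))  # False
print(parse_bool('TRUE'))   # True
```

Any other value, such as 'yes' or '1', raises `ValueError`, which the parser re-raises as an `Exception`.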
With the parser in place, let's now see how we would call the function with the given student rule.
exampleRequest = {
    'name': "Gautam",
    'subjects': [{
        'name': 'Programming',
        'marks': '1',
        'passed': 'false'
    }, {
        'name': 'Photography',
        'marks': '100',
        'passed': 'true'
    }],
    'hobbies': [29, 'photography', 'coding'],
    'school': {
        'name': 'School X',
        'estd': '1993'
    }
}
try:
  parseInputParams(chunk = exampleRequest, rulesChunk = student)
except Exception as ex:
  print(ex)
The above code works fine, with no errors printed on the console. Let's try another example and see when the parser throws an error.
wrongExampleRequest = {
    'name': "Gautam",
    'subjects': [{
        'name': 'Programming',
        'marks': 'F',
        'passed': 'false'
    }, {
        'name': 'Photography',
        'marks': '100',
        'passed': 'true'
    }],
    'hobbies': [29, 'photography', 'coding'],
    'school': {
        'name': 'School X',
        'estd': '1993'
    }
}

try:
  parseInputParams(chunk = wrongExampleRequest, rulesChunk = student)
except Exception as ex:
  print(ex)
The example above prints an error on the console (explore for yourself why!).
 

Notes

There are a few things that you must take care of while using the module:
  • The request params might not contain certain attribute(s) for which a rule has been defined. This goes uncaught, so please be careful.
  • It can handle valid dict/list/tuple request data if the rule is written correctly and with caution.
  • This is the first version of the basic automated rule-based typecaster in Python. Soon I will upload a more advanced typecaster, including implementations in other loosely typed programming languages.
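The first note (attributes missing from the request go uncaught) can be handled with a separate pass before parsing. Below is a minimal sketch under that assumption. `find_missing` is a hypothetical helper and not part of the module above.

```python
def find_missing(chunk, rule, path=''):
    """Return the paths that `rule` defines but `chunk` lacks.

    Hypothetical helper, not part of the original module.
    """
    missing = []
    if isinstance(rule, dict):
        if not isinstance(chunk, dict):
            return missing  # shape mismatches are left to the parser
        for key, sub_rule in rule.items():
            child_path = path + '.' + key if path else key
            if key not in chunk:
                missing.append(child_path)
            else:
                missing.extend(find_missing(chunk[key], sub_rule, child_path))
    elif isinstance(rule, (list, tuple)) and rule:
        if isinstance(chunk, (list, tuple)):
            # The single rule element applies to every item of the list.
            for i, item in enumerate(chunk):
                item_path = '%s[%d]' % (path, i)
                missing.extend(find_missing(item, rule[0], item_path))
    return missing


rule = {
    'name': 'str',
    'school': {'name': 'str', 'estd': 'int'},
    'subjects': [{'name': 'str', 'marks': 'float'}],
}
request = {
    'name': 'Gautam',
    'school': {'name': 'School X'},
    'subjects': [{'name': 'Programming'}],
}

print(sorted(find_missing(request, rule)))
# ['school.estd', 'subjects[0].marks']
```

Whether a missing attribute should be an error or simply optional is a per-API decision, so the sketch only reports the paths and leaves the reaction to the caller.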
Wordpress Version

Thursday, 30 June 2016

Javascript Object (or JSON) attribute removal

Introduction

Here is a small utility function which I wrote when I ran into the requirement stated below:
"Given a Javascript object having a lot of attributes, I am given the path of a particular attribute which is to be removed."

My Thought process

  • Recursion came to mind once I noticed that the path to the attribute to be deleted can have variable length.
  • After searching for some time, I found that it is not possible (or at least very hard) to make the deletion happen on the fly (I was thinking of pass by reference) and just return the result. So I came up with the idea of overwriting.
  • One way or another, each recursion level should, on completion, return one of two things, whichever suits the case:
    • Whether the path was correct and an attribute could be deleted.
    • If the deletion was successful, the updated object (or array), which is passed up to the parent level to update the corresponding attribute (or array element).

Code Section

Here is the recursive function which fulfils the requirement:
function getDeletedObj(obj, attr) {
  if (typeof obj === 'undefined') {
    // We came across an improper attribute while traversing the list of attributes in `attr`
    return false;
  } else if (attr.length === 0) {
    // We have traversed the complete `attr` path correctly
    return true;
  } else {
    var key = attr[0];
    var isArray = false;
    if (Object.prototype.toString.call(obj) === '[object Array]') {
      isArray = true;
    }
    var objectFound = getDeletedObj(obj[key], attr.slice(1));
    if (objectFound === false) {
      // We followed a path that does not exist in the object (see the 'undefined' check above)
      return false;
    } else if (objectFound === true) {
      // The next call was the last one, so `key` is the attribute to be removed
      if (isArray) {
        // If the element to be removed is part of an array, splice it out
        obj.splice(key, 1);
      } else {
        // Otherwise delete the attribute
        delete obj[key];
      }
    } else {
      // If neither `true` nor `false` is returned, the returned value is the updated object
      // Update the current object and return it for the update at the higher level
      obj[key] = objectFound;
    }
    return obj;
  }
}
Here's how you would use it:
exampleObj = {
  a: {
    b: 1,
    c: [1,2,3]
  },
  d: true
}
result = getDeletedObj(exampleObj, ['a', 'c', 1])
Output:
result = {
  a: {
    b: 1,
    c: [1,3]
  },
  d: true
}
Wordpress version

Tuesday, 28 June 2016

PHP multi exec made easy

Introduction

This story is for people for whom network calls become the bottleneck (I know for sure that there are plenty, hence the solution). But before we get to the solution, allow me to describe the problem formally. When you have to make multiple network calls simultaneously in PHP, you end up reading about either Curl Multi exec or PHP Threads as a workaround to make those requests in parallel. The solution below will help if the following apply to you:
  • Multiple URLs need to be hit without any dependency on each other.
  • All the responses from the requests need to be handled from a single place.
  • You are not going to use it directly with client inputs (error handling is not done properly, yet!)

Solution

The solution I am talking about actually uses Curl Multi exec, but in a cleaner way. We can call it a wrapper for ease of use. Now let's make our eyes dirty and go through the code. PS: Please read the comments => Super necessary

Code Section

class MultiCurl
{
    // curl handler info storing variables
    private $multiCurlHandler;
    private $noOfHandlers;
    private $handlers;

    public function __construct()
    {
        $this->multiCurlHandler = curl_multi_init();
        $this->noOfHandlers = 0;
        $this->handlers = [];
    }

    /**
     * @param $ch curl_init() resource variable
     * @param $id any unique identification for the handler (default is the number (i-1), for ith handler)
     */
    public function addHandler($ch, $id = -1)
    {
        if ( $id !== -1 ) {
            curl_setopt($ch, CURLOPT_PRIVATE, $id);
        } else {
            curl_setopt($ch, CURLOPT_PRIVATE, $this->noOfHandlers);
        }
        curl_multi_add_handle($this->multiCurlHandler, $ch);
        // Remember the handle so that close() can remove and close it later
        $this->handlers[] = $ch;
        $this->noOfHandlers++;
    }

    /**
     * @param callable|NULL $callback optional callback, called as $callback($uniqueId, $error, $response)
     * @param int $sleep milliseconds to wait in the loops to prevent high CPU usage (defaults to 100 ms)
     * @return array associative array containing unique id to response/error mapping (only when no callback is given).
     */
    public function exec(callable $callback = NULL, int $sleep = 100)
    {
        $active = 0;
        do {
            // Run the transfers, polling every $sleep ms until none of them is active
            $execResult = curl_multi_exec($this->multiCurlHandler, $active);
            usleep($sleep*1000);
        } while ( $active > 0 );
        while ( $active && $execResult == CURLM_OK ) {
            // Connections are still awaiting a response and initialisation went fine

            // if the socket has any data
            if ( curl_multi_select($this->multiCurlHandler) != -1 ) {
                do {
                    // Fetch more data as long as the system tells us to fetch
                    $execResult = curl_multi_exec($this->multiCurlHandler, $active);
                    usleep($sleep*500);
                } while ( $active );
            }
            usleep($sleep*500);
        }
        // The rest below is easily understandable stuff
        $response = [];
        while ( $done = curl_multi_info_read($this->multiCurlHandler) ) {
            $info = curl_getinfo($done['handle']);
            $uniqueId = curl_getinfo($done['handle'], CURLINFO_PRIVATE);
            if ( $info['http_code'] == 200 ) {
                $output = curl_multi_getcontent($done['handle']);
                if ( $callback === NULL ) {
                    $response[$uniqueId] = [
                        'error' => NULL,
                        'response' => $output
                    ];
                } else {
                    $callback($uniqueId, NULL, $output);
                }
            } else {
                $error = curl_error($done['handle']);
                if ( $callback === NULL ) {
                    $response[$uniqueId] = [
                        'error' => $error,
                        'response' => NULL
                    ];
                } else {
                    // Pass NULL explicitly so that a three-parameter callback does not fail
                    $callback($uniqueId, $error, NULL);
                }
            }
        }
        if ( $callback === NULL ) {
            return $response;
        }
    }

    /**
     * Closes all the connections
     */
    public function close()
    {
        foreach ( $this->handlers as $handler ) {
            curl_multi_remove_handle($this->multiCurlHandler, $handler);
            curl_close($handler);
        }
        curl_multi_close($this->multiCurlHandler);
        $this->multiCurlHandler = NULL;
        $this->handlers = [];
        $this->noOfHandlers = 0;
    }

    public function __destruct()
    {
        if ( $this->multiCurlHandler !== NULL && $this->noOfHandlers > 0 ) {
            $this->close();
        }
    }
}
Here's how you would use it:
$obj = new MultiCurl();
$url = "http://jsonplaceholder.typicode.com/comments/";
for ( $i = 1; $i < 10; $i++ ) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url . $i);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $obj->addHandler($ch, "handle" . $i);
}
$res = $obj->exec();
/*
 * Response can be obtained by using exec in callback form too as shown below:
 * $obj->exec(function($uniqueId, $error, $response) {
 *     // if $error === NULL, $response is obtained
 *     // otherwise there is no $response
 * });
 */
$obj->close(); // Optional to call, as __destruct() makes sure resources are freed at the end
Wordpress version

References

Matt Butcher, TechnoSophos | Example, php.net