Tuesday, 19 August 2014

Correlation and regression using numpy

Without any explanation...I just didn't want to forget it.  More to follow when I get the graphing stuff finished.  Served as a good opportunity to explore numpy in more detail.  No effort was made to simplify it down further.  I will be adding shapefile reading capabilities as well.  Just open it and run it...a simple data set is contained within. Cross-posted from:



Author:  Dan.Patterson@carleton.ca
  calculates the correlation coefficient and regression parameters for simple
  correlation using numpy
import numpy as np

def correlation(xs,ys):
  s_x = np.std(xs, ddof=1, dtype=np.float64)
  s_y = np.std(ys, ddof=1, dtype=np.float64)
  covar = np.cov(xs,ys)[0][1]
  r = covar/(s_x * s_y)
  return r

def regress1D(xs, ys, dim=1):
  '''simple first-order least squares regression in the form y=mx+b'''
  coeff = np.polyfit(xs,ys,dim)
  polynomial = np
  polynomial = np.poly1d(coeff)
  y_cal = polynomial(xs)
  return [coeff, y_cal]

if __name__ == "__main__":
  xs = [0,1,2,3,4,5,6,7,8,9,10]; ys = [0.2,0.7,2.7,2.6,4.1,5,5.7,7.3,7.9,9.1,9.8]
  r = correlation(xs,ys)
  coeff, y_cal = regress1D(xs,ys,1)
  text ='y = %.3fx + %.3f' % (coeff[0],coeff[1])
  print "\nPearson's r: ", r
  print "Equation (y=mx+b): ", text

Collections in numpy: producing frequency distributions and graphing

A simple verbose demo again, so that I don't forget.  The script is just pasted here until the python format and copying with IE11 is fixed.  The script is pretty self explanatory and no effort has been made to simplify it.  Cross-posted at



Author:   Dan.Patterson@carleton.ca
  To demonstrate the utility of using the collections module to
  obtain a simple frequency distribution, in this case, a list
  of random integers.  It is written verbosely so that the user
  can see the sequence of events and the results of the various

  A sequence of random numbers is generated and a dictionary of
  key:values is produced by collections.Counter.  The resultant
  keys are cloned to "classes" to prevent alteration of the
  initial keys.  An extra class is appended to the list to ensure
  that the last class in the keys is included since the behaviour
  of histogram is to combine the last two classes into
  one frequency (long story).  I just add a value of 1 to the last
  class to produce an extra bin.

  A histogram is produced which contains the classes and the frequency
  for those classes.

import collections
import random
import numpy as np

from matplotlib import pyplot as plt

rand_int = [random.randrange(1,6) for i in xrange(0,25)]
dict = collections.Counter(rand_int)
keys = dict.keys()
counts = dict.values()
classes = keys[:]                #clone the keys
classes.append(classes[-1] + 1)  #to ensure that the last bin has values

histo = np.histogram(rand_int,classes)
plt.title("Sample Histogram", loc='center')
plt.xlabel("class"); plt.ylabel("frequency")

print "\nCollections and pylab\n"
print "Random integers:\n", rand_int
print "Collections dict: ", dict
print "  keys:           ", keys
print "  values (freq):  ", counts
print "histogram         ", histo
print "   classes:       ", histo[1]
print "   frequency:     ", histo[0]

Sunday, 29 December 2013

Nested Conditional Operators

From my last post....consider this...
>>> a = [1,2,3,4,5]
>>> for i in a: True if i < 3 else ("blah" if i == 4 else False)

Which means you can extend the added function in Python 2.5 to include nested operators which is a handy function if you only have a few values to reclassify

So consider an integer field containing random numbers and an output text field, that you want to reclassify as

"True" if !Random! <  5 else "False"

would result in all the values in the Random field being reclassified to as either True or False.  If we want to expand this reclassification to include more breaks, consider the following.

"True" if !Random! <  5 else ("sort of" if !Random! == 5 else "False")

which would result in values of 5 in Random being classified as "sort of" and values > 5 as "False"

Saturday, 31 August 2013

Circular mean for directional data

When working with angles or time (minutes, days, months or years) the mean (average) of a list of values can't be determined in the same fashion as we normally do.  This example show one how to calculate the values of angles (azimuths specifically).  The function can be used in any script once the CircularMean.py script is imported.  I will get around to other descriptive measures for circular data.


Dan Patterson
 Date  Aug 2013
  To calculate the mean (average) for points along a circle using the
  angular measure to these points from the center of the circle

  If the angles are given in degrees then there is no need to convert
  If the units are time, such as hours (24 hour clock) or months (12) then
  these need to be converted by
    angle in degrees = time quantity / number of equal intervals
    ie 0.5 = (12 hours / 24 hours per day)
    or 180 degrees
required modules
arcpy, math

def circularMean(angleList):
  angles to be given in degrees
  math module needs to be imported in the main module
  cosList = []
  cosSum = 0.0
  sinSum = 0.0
  for i in angleList:
    theCos = math.cos(math.radians(float(i)))
    theSin = math.sin(math.radians(float(i)))
    cosSum += theCos
    sinSum += theSin
  N = len(angleList)
  C = cosSum/N
  S = sinSum/N
  theMean = math.atan2(S,C)
  if theMean < 0.0:
    theMean += math.radians(360.0)
  return theMean

if __name__ == "__main__":
  import math
  sampleData = [ [350,10],
                 [350, 5],
                 [355, 10],
  print "\nCircular Mean Demo"
  print  "%20s %20s %20s" % ("angle List", "Mean radians", "Mean degrees")

  for angleList in sampleData:
    theMean = circularMean(angleList)
    print  "%20s %20.8f %20.8f" % (angleList, theMean, math.degrees(theMean))

Saturday, 24 August 2013

Segment Intersection Demo

An addendum to my previous post in the quest to find a solution to polygon diameter, antipodal distance and/or polygon fetch for concave polygons.  This ditty checks to see if line segments intersect...may come in handy...read the header etc.


  Dan Patterson
  Determines whether two line segments intersect and returns it if it does
  Detects parallel lines.  This differs from line intersections, whereby the
  point of intersection need not be on either line
http://gis.stackexchange.com/questions/69025/antipodal-distance-or-polygon \
def SegmentIntersection(A,B,C,D):
  #[read the point objects, A,B,C,D which are lists of X,Y coordinates
  x0,y0 = A; x1, y1 = B; x2,y2 = C; x3,y3 = D
  # calculate differences
  Ax = x1-x0;  Ay = y1-y0
  Bx = x3-x2;  By = y3-y2
  Cx = x0-x2;  Cy = y0-y2

  # calculate the lengths of the two lines
  len1 = math.sqrt(Ax*Ax + Ay*Ay)
  len2 = math.sqrt(Bx*Bx + By*By)

  # calculate angle between the two lines
  dot = (Ax*Bx + Ay*By) #dot product
  deg = dot/(len1*len2)

  #if abs(angle) == 1:  #lines are parallel, no intersection is possible
  if(abs(deg) == 1):
    return None

  #find the intersection point (pt) between two lines
  pt = [0,0]
  div = By*Ax - Bx*Ay
  ua = (Bx*Cy - By*Cx)/div
  ub = (Ax*Cy - Ay*Cx)/div
  pt[0] = x0 + ua*Ax
  pt[1] = y0 + ua*Ay

  # calculate the combined length of the two segments
  # between pt- pnt0 and pt-pnt1
  Ax = pt[0] - x0
  Bx = pt[0] - x1
  Ay = pt[1] - y0
  By = pt[1] - y1
  segmentLen1 = math.sqrt(Ax*Ax + Ay*Ay) + math.sqrt(Bx*Bx + By*By)
  #print "segmentLen1 ", segmentLen1
  # calculate the combined length of the two segments
  # between Pt-p2 and Pt-p3
  Ax = pt[0] - x2
  Bx = pt[0] - x3
  Ay = pt[1] - y2
  By = pt[1] - y3
  segmentLen2 = math.sqrt(Ax*Ax + Ay*Ay) + math.sqrt(Bx*Bx + By*By)

  if the lengths of both sets of segments are the same as
     the lengths of the two lines the point is actually
   on the line segment.
   if the point isn't on the line, return None, else the intersection
  if(abs(len1-segmentLen1) > 0.0 or abs(len2-segmentLen2) > 0.0):
    return None
    return pt

if __name__ == "__main__":
  import math
  print "\nSegment Intersection Demo...Tests to see if two line segments intersect"
  dataList = [[[0.0,0.0],[1.0,1.0], [0.0,1.0],[1.0,0.0]],
              [[0.0,0.0], [0.0,1.0],[0.0,1.0],[1.0,1.0]],
              [[0.0,0.0], [0.0,1.0],[1.0,1.0],[0.0,1.0]],
  print dataList[0]
  for pnts in dataList:
    A,B,C,D = pnts
    pt = SegmentIntersection(A,B,C,D)
    print "\nintersection point = ", pt, " for pnts: ",A,B,C,D

Monday, 19 August 2013

Antipodal Distance Demo

This is a variant of the antipodal distance which allows you to isolate points pairs within a convex polygon forming the longest path from one side to the other side of a polygon without following the polygon edge.  If this algorithm were applied to concave polygons, then there would be no guarantee that the formed vector would be fully contained within the polygon.

Other variants of the antipodal distance can consider travel along the polygon edges, and there are many examples on the web.  Enjoy
  Dan Patterson
  To calculate the largest distance between the points the form
  a convex hull.
  - A list of list of xy pairs eg [ [1,1],[3,4,[2,6], [1,1] ]
    first and last points must be the same to close the polygon
  - numpy
  Does not account for donuts or multi-part polygons
def antipodalDistance(pnts):
  Calculates the largest distance between the points
  that make up a convex hull
  Requires: a list of list of xy pairs eg [[1,1],[3,4,[2,6],[1,1]]

  N = len(pnts)
  #thePairs = itertools.combinations(range(N), 2)   #pairs of numbers in the range N
  distList = []
  edgeList = []
  dmax = 0.0
  for i in range(0,N-2):
    for j in range(i+1, N-1):
      dist = numpy.math.hypot(pnts[i][0] - pnts[j][0], pnts[i][1] - pnts[j][1])
      diff = abs(i - j)
      if (diff == 1) or diff == abs(N-2) :
        edgeList.append( [dist, [i,j]] )
        distList.append( [dist, [i,j]] )   # compute the lower triangle of the matrix
        #distList.append( [dist, [j,i]] )   # form the upper triangle of the matrix
  dmax = distList[0]
  return [dmax, distList, edgeList]
if __name__ == "__main__":
  import numpy
  pnts = [ [0,0], [0,1], [1,4],[3,5],[5,4],[4,1],[0,0] ]  # a closed-loop convex polygon
  #pnts = [ [0,0], [2,1], [1,4],[3,5],[5,4],[4,1],[0,0] ]  # a closed-loop concave polygon

  dmax, distList, edgeList = antipodalDistance(pnts)
  print "\ndmax and index pair ", dmax
  print "\nDistList"
  for i in distList:
    print i
  print "\nedgeList:"
  for i in edgeList:
    print i