The Aesthetics of Koundinya Vajjha.

If you can forgive the overly-pretentious title, I’d like to tell you about the things I find beautiful.

Why? Pretty recently I realized that for the past few years I’ve been growing fascinated by an alarming number of topics across diverse areas. As I’ve become more self-aware, I wanted to locate the thing inside each of them which attracted me to it. I also wanted, for the purpose of better self-discovery, to make a list of these things so that I could look back and revisit them in the future.

As is common for almost every seemingly novel intellectual pursuit, it turns out that someone in the past has already thought of the same thing and written about it. (As a young boy, when I noticed how often this was happening, I used to get sad that there was always someone else who had beaten me to everything I thought I had discovered for myself.)

Immanuel Kant says that things we find beautiful (aesthetic judgements) must have four distinguishing features. First, they are disinterested, meaning that we take pleasure in something because we judge it beautiful, rather than judging it beautiful because we find it pleasurable. Second and third, such judgments are both universal and necessary. This means roughly that it is an intrinsic part of the activity of such a judgment to expect others to agree with us. Fourth, through aesthetic judgments, beautiful objects appear to be ‘purposive without purpose’ (sometimes translated as ‘final without end’).

The fourth and final feature is common to the aesthetics of Schopenhauer too. (Although there are major respects in which Schopenhauer differs from Kant.) In a nutshell, Schopenhauer says that the conscious manifestation of the Will is evil, and that art offers a way for people to temporarily escape the suffering that results from willing. So there is sense in pursuing “art for art’s sake”.

This is something I have strongly and unconsciously believed ever since I was introduced to Pure Mathematics. While I initially held the view that nothing could compare to the aesthetic experience (or “Joy”) which Mathematics could provide, in due course I realized that there were many, many other things which could provide an equal, if not greater, amount of this “Joy”. The following is a list of the things I’ve come across so far in life which I find beautiful, in the sense described above. I believe each item on it has inherent value enough to warrant studying it for its own sake. Here they are in no particular order.

  1. The prose of Jorge Luis Borges: In particular, the structure and execution of his stories and poems. Each story of his has a nugget of an idea which is breathtakingly creative, around which the story is woven, with that mellifluous writing which is Borges’ own. It has been a long-standing goal of mine to weave a story as intricate as his, although I’m sure it would only come across as a cheap replica of the original.
  2. The Fundamental Theorems of Asset Pricing: Two theorems which lie at the heart of Quantitative Finance and which specify conditions for the fair pricing of a contingent claim (i.e., pricing without the possibility of arbitrage). More precisely, the first theorem states that if a market model admits a risk-neutral martingale measure, then that market does not admit arbitrage in its pricing. The second states that markets with a unique risk-neutral measure are precisely those in which every derivative instrument can be hedged. (A toy one-period example appears right after this list.)
  3. Grothendieck dessins d’enfant: Graph embeddings in Riemann surfaces which provide combinatorial invariants for the action of the mysterious absolute Galois group over the rationals, \text{Gal}(\overline{\mathbb{Q}}/\mathbb{Q})! Unexpected and beautiful.
  4. Conlangs/Lojban/Sapir-Whorf Hypothesis: Conlangs are human-constructed languages. One of the crowning achievements of human thought is the construction of Lojban, a language based on first-order logic in which it is impossible to be ambiguous. (For example, the grammar prohibits you from saying things like “I got a license to practice in New York.” because there are two ways in which that sentence can be interpreted!) The Sapir-Whorf hypothesis is a rather controversial statement on the possibility of language influencing our thoughts. If this were true, would Lojban be able to influence its speakers’ thoughts and make them think critically and correctly by default?
  5. Homotopy Type Theory and Univalent Foundations: Around 2005, it was discovered that traditional Martin-Löf Type Theory has a natural interpretation (a “model”) in the abstract homotopy theory of simplicial sets. Via this model, it was discovered that Martin-Löf Type Theory could be extended by an axiom called the “Univalence Axiom”, which provides the correct notion of equality in the universe of types. This new and extremely exciting area of mathematics, which brings together abstract Homotopy Theory (an offspring of algebraic topology) and Type Theory (from Theoretical Computer Science), is called Homotopy Type Theory. Closely related to HoTT is the Univalent Foundations project, an attempt to systematically provide new constructive foundations for mathematics using, as key ingredients, the notions of h-level and Univalence. In particular, it opens up the possibility of computer verification of mathematical proofs; the long-term goal of the project is to make formal verification of mathematical proofs ubiquitous.
  6. Haskell and Functional Programming: Why functional programming? Tail recursion is its own reward.
  7. Horology and George Daniels: This interest kicked off after I saw Daniels talk about the story behind the Space Traveller’s Watch. Daniels was one of the few people in the world who could make a complete watch by hand during his time.
  8. Astronomy/Eclipse Prediction/Star Trails:
    • When I bought a DSLR, I experimented with long exposure photography and photographed my first star trails in the Rann of Kutch, in Gujarat. It’s so beautiful watching the stars rotate about the pole star.
    • On a whim, I decided to understand how eclipses were predicted, which led me to borrow a textbook on Spherical Astronomy. In that book I discovered the Saros cycle and the Besselian elements.
  9. The Philosophy of Philipp Mainländer: Philipp Mainländer propounded “Perhaps the most radical system of pessimism known to philosophical literature…” Mainländer proclaims that life is absolutely worthless, and that “the will, ignited by the knowledge that non-being is better than being, is the supreme principle of morality”. In particular he states that God Himself found existence unbearable and as a result He created the universe. So we are God’s way of killing Himself.
  10. Software-Defined Radio: SDRs are awesome! They allow purely software implementations of what previously required specialized and expensive hardware. I bought my HackRF last September and have been in love ever since.
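
To give a concrete flavour of the two theorems in item 2 above, here is the promised toy one-period binomial market. (The setup and the symbols u, d, R, q are my own illustrative notation, not part of the theorems themselves.) A stock worth S today moves to uS or dS tomorrow, and cash grows by the riskless factor R, with d < R < u. The unique risk-neutral measure Q puts probability

q = \frac{R - d}{u - d}

on the up-move, and q lies strictly between 0 and 1 exactly when there is no arbitrage (first theorem). The fair price of a claim paying X_u in the up state and X_d in the down state is then

\text{price}(X) = \frac{1}{R}\,\mathbb{E}_Q[X] = \frac{q X_u + (1-q) X_d}{R},

and since Q is unique, every such claim can be replicated by holding \Delta = \frac{X_u - X_d}{(u-d)S} shares plus a bond position (second theorem).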

I guess I’ll update the list as and when I discover more about myself.


Script to back up the home directory.

My laptop is in its final stages. It might die any moment. And since I’ve spent almost three years moulding my Arch install to my liking, I spent many sleepless nights pondering what I’d do if it crashed and never woke up.

Earlier I wrote a blog post on how to transfer an existing Arch install onto a new laptop. In that post I noted that, in essence, it is the home directory and the list of installed packages which uniquely determine an Arch install, so if I had a copy of these I would, in theory, be able to transfer my install onto any new laptop. The steps to do this were outlined in that post.

So the question remained of automating this process. To achieve that, I wrote a simple bash script.

#!/bin/bash 

echo "Backing up to HDD..."
# Use rsync to backup to external HDD.
sudo rsync --info=progress2 -aAXn --delete --exclude={/home/*/.thumbnails/*,/home/*/.cache/mozilla/*,/home/*/.local/share/Trash/*} /home/kody/ /run/media/kody/TOSHIBA\ EXT/ARCH-BACKUP-2017/
echo "Back up done."
# Make a list of packages and store them in files on the HDD.
echo "Making a list of AUR and Pacman Packages and storing it on the HDD..."
pacman -Qqe | grep -vx "$(pacman -Qqm)" > /run/media/kody/TOSHIBA\ EXT/ARCH-BACKUP-2017/Packages_$(date '+%Y-%m-%d')

pacman -Qqm > /run/media/kody/TOSHIBA\ EXT/ARCH-BACKUP-2017/Packages_AUR_$(date '+%Y-%m-%d')

echo "Done."

Adding the ‘n’ option in rsync does a dry run first so you can see if things are okay before going in for the kill.

Tlön, Uqbar, Orbis Tertius, Ultrafinitism and Depression.

Throughout my life, or more precisely, ever since I attained wisdom, there have been individuals (“Mutants”) who, through their sheer intellect and brilliance, have managed to influence, impress upon and shape my thoughts. This familiar story of a single individual seeking out and marking down a list of the men who have provided him inspiration is not new. For example, Alexander Grothendieck (one of the “mutants” in my list) listed his own set of mutants in his Notes pour la Clef des Songes. His list contains eighteen names.

My list contains a modest six.

These “Mutants” are human beings who are ahead of their time, precursors of a coming “New Age”. They are distinguished by internal freedom, insight into the nature of humanity and by the depth of Platonic genius inherent in their work. Inspection of their lives reveals periods in which each was tortured by their own mind, as if the weight of their genius was unbearable to them. All of these men (possibly with the exception of Da Vinci), at some point in their lives, were struck by melancholy, and learning how they dealt with their melancholia is greatly enlightening.

I myself have the tendency to slip into depression often; and during one of these manic depressive episodes, I happened to recall a line of Borges’, where he talked about writing a poem as “working [his sadness] out of his system and making something out of his experience”. So, I decided that I should do something similar myself: work the sadness out of my system and squeeze out something positive from it. It was then that I played around with an idea in my head, just for fun and to see where it would take me.

The main idea is derived from Jorge Luis Borges’ short story “Tlön, Uqbar, Orbis Tertius“, wherein Borges describes a universe which has completely adopted Berkeleyan Idealism without a God. In a nutshell: while Berkeley posited that only minds and mental constructs exist, and thus the world exists because it is the mental construct of a God, Borges describes a Berkeleyan universe without a God, so all that exists is only that which people imagine in that particular instant, and the world is a series of such instants. Borges then describes various features of this curious universe, including its grammar, literature and so on.

I like to imagine this universe as an infinitely dark room with people having flashlights attached to their heads. If a person’s flashlight falls on something, it means he is imagining that thing. Structures which are imagined by the people in the room are only as vivid as the amount of light falling on them, and are capable of being extinguished completely if there is no light around them. This is precisely what Borges describes at the very end of the story.

I was interested in the imaginary mathematics of such a universe.

At first, I thought it would be sensible to say that the imaginary mathematics of such a universe would be equivalent to the Ultrafinitism of our own mathematics.

A few words about Ultrafinitism first. Ultrafinitism is a branch of Constructivism as a Philosophy of Mathematics. Constructivists believe that it is necessary to “find” or “construct” a mathematical object in order to prove its existence. So, for example, \pi exists because we have \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \ldots

On the other hand, something shown to exist by a proof by contradiction is something which constructivists don’t allow to be labelled “existing”, because you haven’t explicitly constructed it.

Ultrafinitists take it a step further: they deny the existence of the set of naturals \mathbb{N}, because it can never be completed. Here is an account (taken from Harvey M. Friedman, “Philosophical Problems in Logic”) of an exchange with a well-known Ultrafinitist, Alexander Esenin-Volpin, who sketched a program to prove the consistency of Zermelo-Fraenkel Set Theory with the Axiom of Choice in Ultrafinite Mathematics.

I have seen some ultrafinitists go so far as to challenge the existence of 2^100 as a natural number, in the sense of there being a series of “points” of that length. There is the obvious “draw the line” objection, asking where in 2^1, 2^2, 2^3, … , 2^100 do we stop having “Platonistic reality”? Here this … is totally innocent, in that it can easily be replaced by 100 items (names) separated by commas. I raised just this objection with the (extreme) ultrafinitist Esenin-Volpin during a lecture of his. He asked me to be more specific. I then proceeded to start with 2^1 and asked him whether this is “real” or something to that effect. He virtually immediately said yes. Then I asked about 2^2, and he again said yes, but with a perceptible delay. Then 2^3, and yes, but with more delay. This continued for a couple of more times, till it was obvious how he was handling this objection. Sure, he was prepared to always answer yes, but he was going to take 2^100 times as long to answer yes to 2^100 as he would to answering 2^1. There is no way that I could get very far with this.


On the other hand, Intuitionistic Logic and Primitive Recursive Arithmetic are agreed to be foundations for Constructivism and Finitism respectively. The appropriate foundation for Ultrafinite mathematics is still an open question.

Now coming to mathematics in Tlön: Borges actually writes a couple of lines about what the mathematics and geometry of Tlön are like. He writes that in Tlön, the “very act of counting changes the number being counted”.

My initial hunch about the equivalence of Tlön arithmetic and Ultrafinite arithmetic was based on the fact that our minds can only “picture” small numbers. We surely cannot picture 2^100 trees without being unsure of the number being pictured. To make this intuition rigorous, I can think of the following steps.

  1.  Pin down axioms for mathematics in Tlön: One can start this by looking at analogues of Peano Arithmetic in Tlön.
  2.  Pin down axioms for Ultrafinite mathematics: This may be a problem because, in the preliminary reading I have done, I have learnt that there are no formal foundations for Ultrafinitism, and that this is one of the main open problems of the field. Failing that, maybe one can start with axioms for Constructivism (Intuitionistic Logic) or Finitism (Primitive Recursive Arithmetic).
  3. Show that they are equivalent: Show that the axioms imply each other. A cool way of doing this would be through Coq, the proof assistant.

Twitter bots and WhatsApp bots.

I made an automated Twitter bot which tweets out funny sentences from a corpus of tweets by my favorite twitter user, @AccioBae.

You can find the github repo here. It implements a Markov chain on the corpus of tweets. The first implementation wasn’t that effective, but I modified the algorithm slightly to get somewhat more coherent results.
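
In case the idea is unfamiliar, here is a minimal sketch of a word-level Markov chain text generator (my own illustrative Python, not the code in the repo; the corpus file name tweets.txt and the order-1 chain are assumptions):

import random
from collections import defaultdict

def build_chain(corpus_path):
    # Map each word to the list of words observed to follow it in the corpus.
    chain = defaultdict(list)
    with open(corpus_path) as f:
        words = f.read().split()
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain, length=20):
    # Start at a random word and repeatedly sample one of its observed successors.
    word = random.choice(list(chain))
    out = [word]
    for _ in range(length - 1):
        successors = chain.get(word)
        if not successors:
            break
        word = random.choice(successors)
        out.append(word)
    return " ".join(out)

if __name__ == "__main__":
    chain = build_chain("tweets.txt")  # hypothetical corpus file of tweets
    print(generate(chain))

The core idea is simply that the next word is sampled only from words that have been seen to follow the current one in the corpus.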

You can find the twitter account here. I named it “H. Bustos Domecq”.

A little while later, I wondered if I could do the same with WhatsApp as well. Upon a little searching I came across a github repository of a Python library which interacts with WhatsApp, called “yowsup”. I used its echo implementation demo for a while, with great results, albeit temporarily. The very next morning I found that WhatsApp had blocked that number. I requested an unblock, telling them that I had no malicious or spammy intent. They said no.

Oh well.


Exporting Nike+ run data to Strava and as a CSV file.

Oh man where do I begin?

I’ve just finished doing this and it has left me exhausted and weary and I have no clue where or how to begin. But like Seligman tells Joe, let me start at the very beginning.

I’ve been running more or less consistently over the past four years or so and have been using Nike+ Running to track my runs. At the start I was blown away by Nike+, recommending it to everyone I talked to quite enthusiastically and telling them to “add” me so that we can “compete”. I managed to convince about 12 people to join me. This resulted in quite a bit of healthy competition and more Nike+ love. Also, I loved all the statistics and trophies which Nike+ had to offer. I made sure to run on all 4 of my birthdays from 2012 to 2016 just because I wanted to earn the Extra Frosting badge. Also, there were Nike+ levels. Aah the memories! I still vividly remember that run when I hit Green level. It was raining and I kept running till I couldn’t run anymore. I ran 7k at once. That was probably one of the best runs of my life.

As the years went on, I bought a Nike+ SportBand because I was finding it difficult to run with my phone. The SportBand works with a tiny shoe pod which goes into a tiny slot inside a Nike+ shoe and connects to the SportBand while running. It then tracks your pace, distance, calories burnt and other things like that. I used it for about three years. Now, the shoe pod has an irreplaceable battery with, according to Nike, “1000 hours of battery life”.

Slowly, Nike decided to phase out the SportBand and the shoe pod, and they also stopped making shoes with shoe pod slots in them. That got me paranoid. What if my shoe pod died suddenly? Then my SportBand would die too, because Nike isn’t selling standalone shoe pods anymore!

So I decided, with a heavy heart, to look at other run trackers out there in the market. There were tons of them! I heard a bunch of good reviews about Strava, so I decided to try that. All I had to do was transfer all of my runs onto Strava and poof, start anew!

Unfortunately, things weren’t that simple. Nike, it turned out, were maniacal about their data policies. Somehow they thought that MY run data were THEIR property, and they did not allow you to export and download your run data. This was the first thing that pissed me off. But I didn’t lose my shit completely, because hey, there are worse companies out there.

It so happened that before I lost my shit completely, I ordered a Nike+ SportWatch GPS to track my runs.

Later, Nike put up this idiotic new website which seemingly got rid of Nike+ levels and the trophies too. This was the final straw. I lost my shit completely and went on a Twitter rant. But there wasn’t anything that could be done. So I went ahead and looked to Strava.

The first thing was to find a way to export my Nike+ run data to Strava. I searched a lot and found about three websites which promised to do that, but none of them worked. Finally I stumbled on this beautifully designed website which did the whole exporting in 4 simple steps. But it was too painful to do this every time I used my SportWatch for a run. Then I searched for a way to automate it, and I even thought of writing my own script for it. But I found a simpler app which does the same thing. So I was saved!

Now I also had this idea of sorts to download my run data and do a statistical analysis on it to get a better understanding of my runs. This was what I did today. To do that, I first installed this python package I found called “nikeplusapi“. Install it using pip2.7.
Next, since I wanted to write a BASH script to download the data, I wanted to get a tool which parses JSON data. jshon was the answer to my problems.

Finally, here is the bash code which gets this shit done.

#!/bin/bash

# Get the JSON data and store it in test.json. 
curl -k 'https://developer.nike.com/services/login' --data-urlencode username='EMAIL_ID_HERE' --data-urlencode password='PASSWORD_HERE' > test.json

# Make jshon read the test.json data.
jshon < test.json 

# Take out the Access Token from the json data.
ACCESS_TOKEN=$(jshon -e access_token < test.json | tr -d '"')

# Get your latest run data from the Nike Developer website and store it into a file.
nikeplusapi -t $ACCESS_TOKEN > output

# Store the relevant data into a variable.
NEW_DATA=$(awk '{if (NR==2) print}' output)

# Push the latest run data into the old dataset containing all runs. 
echo $NEW_DATA >> /home/kody/nikerundata.csv

# Clean up. 
rm test.json
rm output

Also, I modified the nikeplusapi code to display exactly the last workout’s data and nothing else. That is what I append to the existing CSV file in the Bash script above. The final data is now stored in nikerundata.csv, and now we can do our magic on it in R!

This Bash script is messy and gives out a bunch of errors on execution, but hey man, it works for now. That’s all I need.


Analysis of Doppler Ultrasound in Predicting Malignancy.

A while back I happened to come across Doppler ultrasound data of patients at a hospital. The data consisted of technical parameters related to the ultrasound and, finally, a “final diagnosis” for each patient, which could be either “Malignant” or “Benign”. The doctor who provided the data asked if I could see any trend in the technical parameters for predicting the final diagnosis.

I decided to have a go at it since it would be a good statistics refresher and some practice in R.

I found a bunch of interesting observations in the data and at the risk of tiring myself by explaining it all twice, I’m just going to point to the github repository of this project. All the details are in the pdf file in that repository.

Zipf’s law, Power-law distributions in “The Anatomy of Melancholy” – Part II

The last post ended with me discovering a Zipf-like curve in the rank-frequency histogram of words in the Anatomy. The real problem was now to verify whether the distribution was indeed explained by Zipf’s law. In the last post we saw that Zipf’s law is a special case of a more general family of distributions called “power law” distributions.

A discrete random variable X is said to follow a power law if its density looks like p(x) = P(X = x) = C x^{-\alpha}, where \alpha > 0 and C is a normalizing constant. We assume X is nonnegative integer valued. Clearly, for x = 0 the density diverges, so that equation cannot hold for all x \geq 0; hence there must be a quantity x_{\text{min}}>0 above which the power law behaviour is followed.

One can easily check that the value of C is given by \frac{1}{\zeta(\alpha,x_{\text{min}})}, where \zeta(\alpha, x_{\text{min}}) = \sum_{n=0}^{\infty}(n+x_{\text{min}})^{-\alpha} is the generalized (Hurwitz) zeta function. So the parameters of a power law are \alpha and x_{\text{min}}. If we suspect that our data come from a power law, we first need to estimate these two quantities.
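
Spelling out the “easy check”: with x running over the integers x \geq x_{\text{min}}, summing the density to one gives

1 = \sum_{x = x_{\text{min}}}^{\infty} C x^{-\alpha} = C \sum_{n=0}^{\infty}(n+x_{\text{min}})^{-\alpha} = C\,\zeta(\alpha, x_{\text{min}}),

so C = 1/\zeta(\alpha, x_{\text{min}}).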

So, upon searching for ways to confirm whether the distribution was indeed a power law, I chanced upon a paper of Clauset, Shalizi and Newman (2009) which outlines an explicit recipe to be followed for the verification process.

  1. Estimate the parameters x_{\text{min}} and \alpha of the power-law model.
  2. Calculate the goodness-of-fit between the data and the power law. If the resulting p-value is greater than 0.1 the power law is a plausible hypothesis for the data, otherwise it is rejected.
  3. Compare the power law with alternative hypotheses via a loglikelihood ratio test. For each alternative, if the calculated loglikelihood ratio is significantly different from zero, then its sign indicates whether the alternative is favored over the power-law model or not.

Their paper elaborates on each of the above steps, specifically on how to carry them out. Then they consider about 20 data sets and carry out this recipe on each of them.

Quickly, here are the main steps:

They estimate \alpha by its Maximum Likelihood Estimator in the continuous case, and by an approximation to it in the discrete case, as there is no closed-form formula there. Next, x_{\text{min}} is estimated by creating a power-law fit starting from each unique value in the dataset, then selecting the one that results in the minimal Kolmogorov-Smirnov distance D between the data and the fit.
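
For reference, the estimators from the paper (with x_1, \ldots, x_n denoting the observations that are \geq x_{\text{min}}): the continuous-case MLE and the standard discrete approximation are

\widehat{\alpha} = 1 + n\left[\sum_{i=1}^{n} \ln \frac{x_i}{x_{\text{min}}}\right]^{-1} \qquad \text{and} \qquad \widehat{\alpha} \approx 1 + n\left[\sum_{i=1}^{n} \ln \frac{x_i}{x_{\text{min}} - \frac{1}{2}}\right]^{-1}.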

Now, given the observed data and the estimated parameters from the previous step, we can come up with a hypothesized power law distribution and say that the observed data come from it. But we need to be sure of the goodness-of-fit. For this, we fit the power law using the estimated parameters and calculate the Kolmogorov-Smirnov statistic for this fit. Next, we generate a large number of power-law distributed synthetic data sets with scaling parameter \alpha and lower bound x_{\text{min}} equal to those of the distribution that best fits the observed data. We fit each synthetic data set individually to its own power-law model and calculate the Kolmogorov-Smirnov statistic for each one relative to its own model. Then we simply count the fraction of the time the resulting statistic is larger than the value for the empirical data. This fraction is the p-value, and we check whether it is greater than 0.1. The specifics of how this is carried out are given in the paper.
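
Here is a minimal sketch of that bootstrap in the simplified continuous case with x_{\text{min}} held fixed (my own illustrative Python, not the powerlaw library’s internals; in the full recipe of Clauset et al., x_{\text{min}} is re-estimated on each synthetic set and the portion of the data below x_{\text{min}} is resampled non-parametrically):

import numpy as np

def fit_alpha(x, xmin):
    # Continuous-case MLE for the scaling parameter alpha.
    x = x[x >= xmin]
    return 1.0 + len(x) / np.sum(np.log(x / xmin))

def ks_distance(x, xmin, alpha):
    # Kolmogorov-Smirnov distance between the data and the fitted power law.
    x = np.sort(x[x >= xmin])
    n = len(x)
    empirical = np.arange(1, n + 1) / n
    model = 1.0 - (x / xmin) ** (1.0 - alpha)  # CDF of the continuous power law
    return np.max(np.abs(empirical - model))

def sample_power_law(n, xmin, alpha, rng):
    # Draw n samples from the continuous power law by inverse transform sampling.
    u = rng.random(n)
    return xmin * (1.0 - u) ** (-1.0 / (alpha - 1.0))

def bootstrap_p_value(x, xmin, n_sets=1000, seed=0):
    rng = np.random.default_rng(seed)
    alpha = fit_alpha(x, xmin)
    d_observed = ks_distance(x, xmin, alpha)
    count = 0
    for _ in range(n_sets):
        synthetic = sample_power_law(len(x), xmin, alpha, rng)
        # Each synthetic set is fitted to its own model before computing its KS distance.
        d_synth = ks_distance(synthetic, xmin, fit_alpha(synthetic, xmin))
        if d_synth > d_observed:
            count += 1
    return count / n_sets  # fraction of synthetic sets that fit worse than the real data

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = sample_power_law(500, xmin=9, alpha=2.05, rng=rng)  # toy data with illustrative parameters
    print(bootstrap_p_value(data, xmin=9))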

To make sure that the fitted power law explains the data better than another candidate distribution, say lognormal or exponential, we then conduct a loglikelihood ratio test. For each alternative, if the calculated loglikelihood ratio is significantly different from zero, then its sign indicates whether the alternative is favored over the power-law model or not.

Thankfully, some great souls have coded the above steps into a python library called the powerlaw library. So all I had to do was download and install the powerlaw library (it was available in the Arch User Repository) and then code away!

#! /usr/bin/env python
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
#
# Copyright © 2016 kody <kody@kodick>
#
# Distributed under terms of the MIT license.

""" Using the powerlaw package to do analysis of The Anatomy of Melancholy. """
""" We use the steps given in Clauset,Shalizi,Newman (2007) for the analysis."""

from collections import Counter
from math import log
import powerlaw
import numpy as np
import matplotlib.pyplot as plt

file = "TAM.txt"

with open(file) as mel:
    contents = mel.read()
    words = contents.split()

""" Gives a list of tuples of most common words with frequencies """
comm = Counter(words).most_common(20000)

""" Isolate the words and frequencies and also assign ranks to the words """
labels = [i[0] for i in comm]
values = [i[1] for i in comm]
ranks = [labels.index(i)+1 for i in labels]

""" Step 1 : Estimate x_min and alpha """
fit = powerlaw.Fit(values, discrete=True)
alpha = fit.alpha
x_min = fit.xmin
print("\nxmin is: " ,x_min,)
print("Scaling parameter is: ",alpha,)

""" Step 1.5 : Visualization by plotting PDF, CDF and CCDF """
fig = fit.plot_pdf(color='b',original_data=True,linewidth=1.2)
fit.power_law.plot_pdf(color='b',linestyle='--',ax=fig)
fit.plot_ccdf(color='r', linewidth=1.2, ax=fig)
fit.power_law.plot_ccdf(color='r',linestyle='--',ax=fig)
plt.ylabel('PDF and CCDF')
plt.xlabel('Word Frequency')
plt.show()

""" Step 2&3 : Evaluating goodness of fit by this with candidate distribitions """
R1,p1 = fit.distribution_compare('power_law','stretched_exponential',normalized_ratio=True)
R2,p2 = fit.distribution_compare('power_law','exponential',normalized_ratio=True)
R3,p3 = fit.distribution_compare('power_law','lognormal_positive',normalized_ratio=True)
R4,p4 = fit.distribution_compare('power_law','lognormal',normalized_ratio=True)

print("Loglikelihood and p-value for stretched exponential: ",R1," ",p1,)
print("Loglikelihood and p-value for exponential: ",R2," ",p2,)
print("Loglikelihood and p-value for lognormal positive: ",R3," ",p3,)
print("Loglikelihood and p-value for lognormal: ",R4," ",p4,)

""" One notices that lognormal and power_law are very close in their fit for the data."""
fig1 = fit.plot_ccdf(linewidth=2.5)
fit.power_law.plot_ccdf(ax=fig1,color='r',linestyle='--')
fit.lognormal.plot_ccdf(ax=fig1,color='g',linestyle='--')
plt.xlabel('Word Frequency')
plt.ylabel('CCDFs of data, power law and lognormal.')
plt.title('Comparison of CCDFs of data and fitted power law and lognormal distributions.')
plt.show()

So here were the results.

The estimated scaling parameter was \widehat{\alpha} = 2.0467 and \widehat{x_{\text{min}}}=9.

The loglikelihood ratio of powerlaw against stretched exponential was 4.2944 and the p-value was 1.75 \times 10^{-5}. So we reject stretched exponential.

The loglikelihood ratio of powerlaw against exponential was 11.0326 and the p-value was 2.66 \times 10^{-28}. So we reject exponential.

The loglikelihood ratio of powerlaw against lognormal positive was 6.072 and the p-value was 1.26 \times 10^{-9}. So we reject lognormal positive.

The loglikelihood ratio of powerlaw against lognormal was 0.307 and the p-value was 0.75871.

To be honest, I didn’t know what to do with the last one. Since the loglikelihood ratio was positive, the power law is favoured over the lognormal, but only ever so slightly; and with a p-value of 0.76 the ratio is not significantly different from zero, so the comparison is essentially inconclusive.

So the questions remain: should I be happy with the power law, or should I prefer the lognormal? And is there a test which helps us decide between the power law and lognormal distributions?

As far as I know, these questions are still open. Anyway, I think I shall give it a rest here and maybe take this up later. All that is left now is the satisfaction that I have beaten melancholy by writing about The Anatomy of Melancholy. (Temporarily at least!)

Zipf’s law, Power-law distributions in “The Anatomy of Melancholy” – Part I

A while ago while trying to understand depression (a.k.a. why I was so fucking sad all the time), I came across a spectacular book which immediately caught my fancy. It was a 17th century book on (quite literally) The Anatomy of Melancholy written by a spectacular dude called Robert Burton. The Anatomy was first published in 1621 and was revised and republished five times by the author during his lifetime. As soon as I discovered it, I wanted to lay my hands on it and read it fully, but very soon I lost hope of that altogether.

The work itself is mind-blowingly huge. To quote a reviewer on Goodreads:

And for you perverts, here is how the length of The Anatomy shakes out.

439 pages — Democritus (Burton’s persona) To The Reader and other front matter (125 pages) & First Partition.
261 pages — Second Partition
432 pages — Third Partition
Which amounts to 1132 pages. The remainder of its 1424 pages (292) consists of 6817 endnotes, (which are painlessly skippable), introductions, a glossary, and an index; unless you’ve got that ‘every damn page’ project in mind.

And mind you, the prose is difficult 17th-century English. Critics have called it “The Book to End all Books”. Burton himself writes that, “I write of melancholy by being busy to avoid melancholy.”

Though it professes to focus on melancholy, the Anatomy in fact delves into much, much more. “It uses melancholy as the lens through which all human emotion and thought may be scrutinized, and virtually the entire contents of a 17th-century library are marshalled into service of this goal. It is encyclopedic in its range and reference.”

Our good friends at the Project Gutenberg have made the entire text of the Anatomy available for free to the public. You can access the entire text here.

About the same time as this, I discovered Zipf’s law and its sister laws: Benford’s law and the Pareto distribution. (Terry Tao has a nice post describing all three and how they relate to each other.)

Zipf’s law is an empirical law which says that certain data sets can be approximated by the Zipfian distribution, which is part of a family of more general discrete power-law probability distributions (a quick back-of-the-envelope sketch of this connection appears after Tao’s list below). More precisely, Zipf’s law states that if X is a discrete random variable, then the n^{th} largest value of X should be approximately C n^{-\alpha} for the first few n=1,2,3 \ldots and parameters C, \alpha >0. Of course, this does not hold for an arbitrary discrete random variable X, and a natural question is to ask which X follow Zipf’s law. As far as I know, apart from a few general comments about X, nothing further can be said regarding this question. Terry Tao says the above laws are seen to approximate the distribution of many statistics X which

  1. take values as positive numbers
  2. range over many different orders of magnitude
  3. arise from a complicated combination of many largely independent different factors
  4. have not been artificially rounded or truncated

Tao gives examples where, if hypotheses 1 and 4 are dropped, other laws rather than ones like Zipf’s law come into play.
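
As promised above, here is a quick back-of-the-envelope sketch (a standard observation, not something from Tao’s post) of why a Zipfian rank-frequency relation sits inside the power-law family. If the word of rank n has frequency f(n) \approx C n^{-\alpha}, then the number of words with frequency at least f is roughly the rank at which that frequency is reached:

\#\{\text{words with frequency} \geq f\} \approx \left(\frac{C}{f}\right)^{1/\alpha},

so the frequencies themselves have a power-law tail with exponent 1 + 1/\alpha. With the classic Zipf exponent \alpha \approx 1, this predicts a frequency exponent near 2.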

Zipf’s law posits an inverse relation between ranks and frequencies, which is only common sense. So we look at a text, say the Anatomy, look at each word in it, note its frequency, and then assign a rank to each word based on that frequency. We would expect words like “and”, “the” and “if” to have a really low rank (i.e., a rank near 1) and thus a really high frequency. One can see this clearly in the histogram below: the word “and” is ranked 1 and has the highest frequency of all words.

Say now, that on a whim, (to test my insane python skillz) I write a program to fetch the entire text of the Anatomy and then with the text, I make a histogram of words and their frequencies. Here’s the program. (The whole book is available in plain text by the way. So I just had to wget the whole thing available here.)

"""Let us look at the word frequency in The Anatomy of Melancholy"""

from collections import Counter
from math import log
import numpy as np
import matplotlib.pyplot as plt

file = "TAM.txt"

with open(file) as mel:
    contents = mel.read()
    words = contents.split()

"""gives a list of tuples of most common words with frequencies."""
comm = Counter(words).most_common(100)

""" Isolate the words and frequencies and also assign ranks to the words. """
labels = [i[0] for i in comm]
values = [i[1] for i in comm]
ranks = [labels.index(i)+1 for i in labels]


indexes = np.arange(len(labels))
width = 0.2

"""Histogram of word frequencies"""
plt.title("Frequency of words in 'The Anatomy of Melancholy' ")
plt.xlabel("Ranked words")
plt.ylabel("Frequency")
plt.bar(indexes, values, width)
plt.xticks(indexes + width * 0.5, labels, rotation='vertical')
plt.show()

This then gives us the following graph.

[Plot: histogram of frequencies of the top-ranked words in The Anatomy of Melancholy]

Now compare this with the picture below.

[Image: a typical Zipfian rank-frequency plot, for comparison]

Doesn’t it make you want to scream, “Zipf!”?

Just to add even more weight to our hypothesis, let’s plot the log of the frequencies against the log of the ranks of the words. If a power law is being followed here, we would expect the log-log graph of ranks and frequencies to be roughly linear, since \log f = \log C - \alpha \log n is a straight line in \log n.

So put that into the code.

""" Log-Log graph of ranks and frequencies """

logvals = [log(x) for x in values]
logranks = [log(x) for x in ranks]

plt.title("Log-Log graph of ranks and frequencies of words in 'The Anatomy of Melancholy' ")
plt.xlabel("logranks")
plt.ylabel("logvals")
plt.scatter(logranks,logvals)
plt.show()

This now gives us the following plot.

[Plot: log-log scatter of word ranks against frequencies]

Hmm. Seems like there is a quantity x_{\text{min}} after which the plot is almost linear. That is, it looks like the tail follows a power-law. Naturally, at this stage, we would want to do a least squares linear regression and fit a line to this plot, and if it’s a good fit, use that to conclude that we have a power-law!

Unfortunately, it’s not that simple. A lot of distributions give a straight-ish line in a log-log plot. So that is just not enough.

Also, as Cosma Rohilla Shalizi states in his blog, least squares linear regression on a log-log plot is a bad idea, because even if your data follow a power law, it gives bad estimates of the parameters x_{\text{min}} and \alpha. Cosma even adds that even though this was what Pareto essentially did in 1890, “there is a time and place to be old school, and this is not it”.

We talked about x_{\text{min}} above. How do we get an estimate of that? How do we know where the power-law starts?

All fantastic questions! What about answers?

Well, up till 2009, no one knew how to answer them. Then a paper by Clauset, Shalizi and Newman came along which set all these matters to rest.

Indeed, to state whether a given data set is approximated satisfactorily by a power-law such as Zipf’s law is quite a tricky business, and it is this question which we shall be tackling later on. I hope to write another blog post after I’ve read through the paper and coded their recipe.

Till then, cheers!

The capacity to be alone – An obituary.

There is not a single day which passes in which I don’t see your name, or your influence and breathtaking power seeping through in the structures you created and called your own. Years ago I promised myself that I shall become like you, someone exactly like you…with superhuman prowess and might. Today with extreme sadness I realize that those dreams of mine were laughably childish. I shall never become half, nay, even a quarter of what you were.

You left the earth a year ago, leaving me irrevocably sad that I had not known you or spoken to you whilst you were alive. I had dreamt and prayed that you would somehow make an appearance alongside me in the future, and that I could just see you and maybe exchange a few pleasantries, as I wouldn’t have been capable of expressing in words my admiration for you.

I am not that kid I once was. Life has been cruel to me because it has all but robbed me of my chance of following your footsteps. But then again, I don’t know if all this was meant to be or if it was just me not working as hard as I should have.

I see the others every day. They surround me and talk around me and I am forced to listen. I am, as you were too, surprised by them sometimes, surprised by the facility with which they pick up, as if at play, new ideas, juggling them as if familiar with them from the cradle. But look where they are now, and look where you are. They pale in comparison. I ask myself whether it will be the same for me. Of course it won’t.

During your later years you became dissatisfied with the system. You said you had retreated more and more from the scientific “milieu”. You said you noticed the outright theft perpetrated by colleagues of yours, and that was why you declined the recognition being bestowed on you. This dissatisfaction which you had then has now made its way inside me. I am dissatisfied as well, but in my case it is more bitterness at having been rejected. How else should one respond to someone letting you know that you aren’t good enough?

What makes my heart ache is that I shall never again discover that beauty for myself. That single moment of clarity which reveals the structure behind mathematics in that synchronous harmony which is its own. You have experienced what I am talking about. I have too, but not nearly enough.

Now that it won’t be possible for me to experience it ever, what then, should my raison d’être be?

What I am most scared about is that it is now that the bond between you and me shall begin to falter and eventually fade. You shall become just another famous name I know and there will be nothing in common between us.

The others have intimidated me all through my life. And whenever they have, your words have been the most powerful consolation I could have ever asked for. What then, will be my consolation when the bond between us breaks?

I miss you, Shurik. I miss you like a pupil misses his master. I miss you despite the fact that I have never seen you or heard your voice. I miss the joy you used to give me when I discovered I shared the same passion you had. I miss the fact that I won’t be able to call myself a mathematician anymore: someone who “does” math, in the fullest sense of the word, like one “makes” love.

I have long contemplated learning French for the sole purpose of reading Récoltes et Semailles. I think now that I won’t. Reading it will be too painful for me and I have just about had enough disappointment to last me a life time.

Wherever you are, Alexander Grothendieck, rest in peace and know that you are missed.

Lain weather widget: “Service not available at the moment”.

On my Arch system I use copycat-killer’s awesome themes. One of my favorites is the “multicolor” theme because it gives me a ton of info and it looks snappy and nice overall. One thing I was most happy about was the weather widget that came along with the themes. Anyway, here’s a pic of it in action during happier times.

[Screenshot: the multicolor awesome theme with the weather widget working in the top panel]

But since the past few days, the weather indicator on top was annoyingly showing “N/A” and the weather widget was giving me a “Service not available at the moment.” message.

[Screenshot: the widget showing “N/A” and “Service not available at the moment.”]

I googled around a bit and found out that the lain weather widget uses the Yahoo Weather API, and that the devs at Yahoo decided that after 15th March their weather API would give out data only to requests which were upgraded to OAuth 1, whatever the hell that is.

One of the answers here, however, suggested simply replacing “http://weather.yahooapis.com/” with “http://xml.weather.yahoo.com/”, and I decided to do just that in the lain weather widget.

The init.lua file for the lain widgets which were part of the “yawn” library was at

/home/kody/.config/awesome/lain/widgets/yawn/init.lua

In that, I replaced the line

local api_url = 'http://weather.yahooapis.com/forecastrss'

with

local api_url = 'http://xml.weather.yahoo.com/forecastrss'

And then restarted awesome with Modkey+Ctrl+r, which resulted in my pretty looking working weather panel again. 🙂

UPDATE 12/04/16: The weather widget stopped working again. Yahoo used to be cool! 😦

UPDATE 24/04/16: After tinkering again, I realized that my current config files were not up to date with copycat-killer’s github ones. He’s not using the Yahoo Weather API anymore, but the OpenWeatherMap API. I used his new weather.lua widget and things are up and running once more.


Ramblings of an excessively conscious idiot.