programming org-mode

The Oilshell blog recently hinted at the possibility of having dataframes in the language:

Data frames may someday be a feature of the Oil language. Why? Because the output of both ls and ps is a table. (Caveat: this work is far in the future.)

In hindsight, this idea seems really useful. Beyond ls and ps, many domain-specific tools, for example those from SRILM, read and write things that are conceptually dataframes (an example is below).

# Counting ngrams in this file
ngram-count -text index.org -no-sos -no-eos -tolower -order 1 | head
language. 2
org-babel. 1
type 1
universal 1
why? 2
conceptually 2
<s> 1
you 1
constraints 1
i 2
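To make the "conceptually a dataframe" point concrete, here is a minimal sketch in Python that reads output of this shape into rows with named, typed columns. The names `NgramCount` and `parse_counts` are my own and not part of SRILM or Oil, and I am assuming each line is an ngram followed by whitespace and an integer count.

```python
from typing import List, NamedTuple


class NgramCount(NamedTuple):
    """One row of the table: a named, typed record (hypothetical name)."""
    ngram: str
    count: int


def parse_counts(text: str) -> List[NgramCount]:
    """Parse lines of the assumed form '<ngram> <count>' into typed rows."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last run of whitespace so multi-word ngrams survive.
        ngram, count = line.rsplit(None, 1)
        rows.append(NgramCount(ngram, int(count)))
    return rows


# A few lines copied from the output above.
sample = """language. 2
org-babel. 1
type 1
why? 2
"""

rows = parse_counts(sample)
counts = [row.count for row in rows]                     # select a column by name
frequent = [row.ngram for row in rows if row.count > 1]  # filter on a typed column
print(counts)
print(frequent)
```

Once the rows carry names and types, selecting or filtering a column no longer depends on remembering which field position holds the count.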

Of course, the idea is not universal. Many popular tools return structured data in formats like JSON, XML or S-expressions, but keeping a dataframe structure does make sense for many of the core Unix and data processing tools.

There are already a lot of tools for ingesting tables. org-babel, for instance, does a primitive parse of shell output into tables. You can capture the output of tools that emit tabular data as org tables and then work on them a little like dataframes:

;; tb is the table from the snippet above.
;; cadr is built in and picks the second field of each row
;; (second needs the old cl library).
(mapcar #'cadr tb)
2 1 1 2 2 2 1 1 1 1

Beyond this and the usual Unix tools, many CLI tools already work by passing proper tables around, csvkit for example. But a move to dataframes, with typed columns and named variables, might have some nicer side effects.
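One such side effect would be that mistakes fail loudly instead of flowing silently down a pipeline. The following is only a toy sketch in Python of what a schema check could look like. The `Frame` class is hypothetical and has nothing to do with how Oil or csvkit work.

```python
from typing import Any, Dict, List, Sequence


class Frame:
    """A toy column-oriented table with a fixed schema (hypothetical)."""

    def __init__(self, schema: Dict[str, type]) -> None:
        self.schema = dict(schema)
        self.columns: Dict[str, List[Any]] = {name: [] for name in schema}

    def append(self, row: Sequence[Any]) -> None:
        if len(row) != len(self.schema):
            raise ValueError(f"expected {len(self.schema)} fields, got {len(row)}")
        # Validate the whole row before storing anything, so a bad row
        # never leaves the columns with unequal lengths.
        for (name, typ), value in zip(self.schema.items(), row):
            if not isinstance(value, typ):
                raise TypeError(f"column {name!r} expects {typ.__name__}, got {value!r}")
        for name, value in zip(self.schema, row):
            self.columns[name].append(value)

    def __getitem__(self, name: str) -> List[Any]:
        # Asking for a column that does not exist is an error, not an empty result.
        if name not in self.columns:
            raise KeyError(f"no column {name!r}; have {list(self.columns)}")
        return self.columns[name]


frame = Frame({"ngram": str, "count": int})
frame.append(("language.", 2))
frame.append(("type", 1))

total = sum(frame["count"])  # safe: every stored count is an int

try:
    frame.append(("why?", "2"))  # count left as a string by mistake
    rejected = False
except TypeError:
    rejected = True

print(total, rejected)
```

With plain text streams, the string "2" would pass through unnoticed until some later arithmetic step. Here the bad row is rejected at the boundary and the table stays consistent.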