The Oilshell blog recently hinted at the possibility of having dataframes in the language:
Data frames may someday be a feature of the Oil language. Why? Because the output of both ls and ps is a table. (Caveat: this work is far in the future.)
With the benefit of hindsight, this idea is really useful. Beyond ls and ps, domain-specific tools from, say, srilm take in and emit things that are conceptually dataframes (an example below).
# Counting ngrams in this file
ngram-count -text index.org -no-sos -no-eos -tolower -order 1 | head
language. | 2 |
org-babel. | 1 |
type | 1 |
universal | 1 |
why? | 2 |
conceptually | 2 |
<s> | 1 |
you | 1 |
constraints | 1 |
i | 2 |
Of course, the idea is not universal. Many popular tools return structured data in formats like JSON, XML, or s-expressions, but keeping a dataframe structure does make sense for many of the core unix and data processing tools.
There are already many tools for ingesting tables. org-babel, for instance, does a primitive parse of shell output into tables. You can capture the output of tools that emit tabular data as org tables and then work on them a little like dataframes:
;; I am using table from the above snippet
(mapcar #'second tb)
2 | 1 | 1 | 2 | 2 | 2 | 1 | 1 | 1 | 1 |
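For readers outside Emacs, the same positional column access can be sketched in Python. This is only a rough illustration: it assumes the table is available as org-style text with `|` separators, and `parse_org_table` is a hypothetical helper, not an org-babel function.

```python
from typing import List


def parse_org_table(text: str) -> List[List[str]]:
    """Split org-style table text into rows of stripped cell strings."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("|-"):
            continue  # skip blank lines and horizontal rule lines
        # Drop the outer pipes, then split the remaining cells.
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(cells)
    return rows


# A few rows in the shape of the ngram table above (sample data for the sketch).
org_text = """
| language.  | 2 |
| org-babel. | 1 |
| type       | 1 |
"""

tb = parse_org_table(org_text)
# Rough equivalent of (mapcar #'second tb): take the second cell of each row.
second_column = [row[1] for row in tb]
```

Every cell stays a string here, and columns are reachable only by position. That is roughly the level at which plain tables operate.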
Beyond this and the usual unix tools, many CLI tools, csvkit for example, already work by passing proper tables around. But a move to dataframes, with type constraints and variable naming, might have some nicer side effects.
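To make "type constraints and variable naming" a little more concrete, here is a minimal Python sketch using only the standard library. It assumes the counts arrive as whitespace-separated `token count` lines. The names `NgramRow` and `parse_counts` are hypothetical and are not part of Oil, srilm, or csvkit.

```python
from typing import List, NamedTuple


class NgramRow(NamedTuple):
    """One row of a hypothetical two-column table: a token and its count."""
    token: str
    count: int


def parse_counts(text: str) -> List[NgramRow]:
    """Parse lines of the assumed form 'token<whitespace>count' into typed rows."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split from the right so the count is always the last field.
        token, count = line.rsplit(maxsplit=1)
        # int() raises ValueError on a non-numeric count: the type constraint.
        rows.append(NgramRow(token=token, count=int(count)))
    return rows


# Sample input for the sketch, assumed to be tab-separated.
sample = "language.\t2\norg-babel.\t1\ntype\t1\nwhy?\t2\n"
table = parse_counts(sample)

# Columns are addressed by name rather than by position.
counts = [row.count for row in table]
frequent = [row.token for row in table if row.count > 1]
```

With named, typed columns, a filter like `row.count > 1` compares integers rather than strings, and a malformed count fails loudly at parse time instead of flowing silently down the pipeline.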