1 Experiments
- Picked something from my lists and wrote a post after a long time: Blurring Text. I will try to do this on a regular basis.
2 Readings/Explorations
- BlinkDB: queries with bounded errors and bounded response times on very large data (agarwal2013blinkdb)
- Went through a few subword papers:
  - BPEmb: tokenization-free pre-trained subword embeddings in 275 languages (heinzerling2017bpemb)
  - SentencePiece: a simple and language independent subword tokenizer and detokenizer for neural text processing (kudo2018sentencepiece)
3 Programming
4 Media
- Were Nazis Drug-Fueled Crankheads? | Stuff You Should Know. I kind of want to read Blitzed now.
- Kurt Vonnegut, Shape of Stories (subtitulos castellano)
- Walden by Henry David Thoreau
Bibliography
- [agarwal2013blinkdb] Agarwal, Mozafari, Panda, Milner, Madden & Stoica. 2013. "BlinkDB: queries with bounded errors and bounded response times on very large data", 29-42, in: Proceedings of the 8th ACM European Conference on Computer Systems.
- [heinzerling2017bpemb] Heinzerling & Strube. 2017. "BPEmb: Tokenization-free pre-trained subword embeddings in 275 languages." arXiv preprint arXiv:1710.02187.
- [kudo2018sentencepiece] Kudo & Richardson. 2018. "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing." arXiv preprint arXiv:1808.06226.