1 Experiments

  • Picked an item from my lists and wrote a post after a long time: Blurring Text. I will try to do this on a regular basis.

2 Readings/Explorations

  • BlinkDB: queries with bounded errors and bounded response times on very large data (agarwal2013blinkdb)
  • Went through a few subword papers:
    • BPEmb: tokenization-free pre-trained subword embeddings in 275 languages (heinzerling2017bpemb)
    • SentencePiece: a simple and language independent subword tokenizer and detokenizer for neural text processing (kudo2018sentencepiece)

3 Programming

Commits for week 14-2019 and 4 previous weeks.

4 Media

Bibliography

  • [agarwal2013blinkdb] Agarwal, Mozafari, Panda, Milner, Madden & Stoica. 2013. "BlinkDB: Queries with bounded errors and bounded response times on very large data." In: Proceedings of the 8th ACM European Conference on Computer Systems, 29-42.
  • [heinzerling2017bpemb] Heinzerling & Strube. 2017. "BPEmb: Tokenization-free pre-trained subword embeddings in 275 languages." arXiv preprint arXiv:1710.02187.
  • [kudo2018sentencepiece] Kudo & Richardson. 2018. "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing." arXiv preprint arXiv:1808.06226.