As a data scientist, you need to learn several tools and technology stacks, and the learning curve for each can get overwhelming. On top of that, each tool usually expects some familiarity with its own default editor. It is difficult to concentrate on a single task when more than five different applications are open, each taking a chunk of memory and requiring a different set of commands and compilation options. For most of the tasks that I’ve come across, Emacs solves this problem. What follows is a brief overview of the tools that I frequently use and how Emacs modes can make life easier.

Frequent Technology Stack in Data Science

  1. R language
  2. RMarkdown files
  3. Python
  4. IPython notebooks
  5. Git
  6. Command-line shell
  7. Grep through files
  8. Editing environment on remote machines
  9. File transfer over ssh
  10. Database and SQL interface
  11. Markdown
  12. Task lists
  13. LaTeX
  14. File system view

In the following sections, I discuss the Emacs modes and the configuration I use for these tasks.

Emacs modes

Emacs provides major and minor modes for each buffer. Each buffer has exactly one active major mode, whereas several minor modes can be active at once. Major modes are usually specific to a programming language; minor modes cover things like spell checking (flyspell), on-the-fly hints for function parameters (eldoc), automatic pairing of brackets (autopair) and auto-completion, among others. Because minor modes work across major modes, they give you a uniform set of helper functions as you switch between languages.

R language (ESS)

For the R language, the usual editor is RStudio. Emacs has ESS (Emacs Speaks Statistics) mode. It provides almost all the functionality of RStudio, augmented with documentation help, function completion, argument help, and numerous evaluation options. Furthermore, it serves as a major mode for SAS, Julia, Stata and several other statistical packages. The screenshot below shows how the environment looks while coding. [Screenshot: ESS session in Emacs] As you can see, the view is quite lean, with the same power as RStudio. The bottom buffer is the R console, the echo area at the bottom of the Emacs frame shows function arguments, and the working window shows autocompletion options with parameter documentation. I will go into the details of creating Shiny apps later; with some minor modes, autocompletion between ui.R and server.R makes this easier in Emacs than in RStudio. A good article about ESS integration is provided by Caroline Wiley.

R Markdown (ESS, polymode)

R Markdown provides one of the best platforms for creating reproducible, data-driven documents. You can embed text, LaTeX, HTML and code in a single file. It can be used for creating presentations, documents and even websites (with the shiny package). For RMarkdown, you need Markdown syntax highlighting for the prose as well as the power of ESS mode for the R code. This requires two major modes in one buffer, which is the problem Polymode solves. A good blog post on the configuration is provided by John Stanton. The screenshot below shows an RMarkdown environment. [Screenshot: RMarkdown file in Emacs with Polymode]

Python

Jess Hamrick has written a very detailed and amazing post about using Emacs as an IDE for Python. You get functionality such as autocompletion, documentation, function parameter help, a console, autopep8 formatting, on-the-fly code checking, virtual environments and debugging options. A screenshot of my Python environment is given below. [Screenshot: Python environment in Emacs]

IPython notebooks (ein mode)

Jupyter’s IPython notebooks are the RMarkdown equivalent for reproducible documents, although, in my personal opinion, RMarkdown provides more flexibility over the generated document and is my preferred choice for sharing reports. You can run the notebook server in a shell opened inside Emacs with M-x shell by typing jupyter notebook. Once the server is running and has an address assigned, the notebooks can be opened using the ein package in Emacs. The shell that starts the notebook session is shown below. [Screenshot: jupyter notebook running in an Emacs shell]

Following that, you can obtain the list of notebooks using M-x ein:notebooklist-open, which lists the notebooks in the current directory. [Screenshot: ein notebook list] This gives you an interactive environment with Python, Markdown and LaTeX options, as seen in the screenshot below. [Screenshot: notebook open in ein]

Git (magit)

Most Emacs users will vouch for magit as the best Git porcelain for version control. You can enter magit using C-x g (the usual binding for magit-status). As you can see in the bottom buffer of the screenshot, magit is showing me all the untracked files in the current directory. Since I’m working on this blog post, the untracked files belong to this post. After that, you can stage everything by pressing S, then commit by pressing c c, writing the commit message and pressing C-c C-c. The changes can then be pushed by pressing P u. The following snapshot shows the commit phase with the diff. [Screenshot: magit commit with diff] The final snapshot shows the push phase. [Screenshot: magit push]
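If you think in command-line terms, the stage, commit and push keys described above correspond roughly to three plain git commands. The Python sketch below only spells out that correspondence; it is not how magit works internally. The function names are made up for illustration, and mapping S to git add --all is an assumption, since what S stages can depend on the magit version and on prefix arguments.

```python
import subprocess


def commit_and_push_commands(message):
    """Return plain git commands roughly matching the magit keys in the text."""
    return [
        ["git", "add", "--all"],           # S: stage everything (assumed scope)
        ["git", "commit", "-m", message],  # c c, write the message, C-c C-c
        ["git", "push"],                   # P u: push to the upstream branch
    ]


def run_all(commands, repo_dir="."):
    """Run each command in order inside repo_dir, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=repo_dir, check=True)
```

Calling run_all(commit_and_push_commands("Add Emacs post")) inside a repository would execute the three steps; the point of magit is that you get the same result with a handful of keystrokes and see the diff while you do it.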

Shells

There are usually multiple shells open while working on any data science project. Finding files, grepping through them, opening ssh connections, installing packages and applications, compiling files: all of these need a shell. So does git, if you are more comfortable with its shell commands, and so does starting a notebook session as in the earlier section. With so many shells open, it can get annoying pretty soon. In Emacs you can open and handle multiple shell sessions seamlessly. In the snapshot below, you can see multiple shells opened through the command M-x shell (please note that with multiple shells, you need to rename each one using M-x rename-buffer before opening a new one). One shell is executing bundle exec jekyll serve to continuously regenerate this blog post. Another runs the notebook session, one is finding Rmd files on the system, another is using the git command line, and one is idle. Five shells working at the same time. Pretty neat, eh! [Screenshot: five shell buffers in Emacs]

Grep through files (rgrep mode)

Although not used as frequently as the other modes, rgrep can be pretty handy, especially for finding tags or a token across a set of files. For example, say you’re interested in finding the token EMG in all the files with the R extension within a folder. The following steps will do this for you using rgrep:

  1. M-x rgrep
  2. Type emg as the search query.
  3. Enter *.R to search through all files with the R extension.
  4. It will ask you for the folder to search through recursively.

There you have it. A screenshot of this search is shown below. In the results buffer, you can move the cursor to any match and press Enter to start editing that file. You can see that the results highlight all the code lines containing the word ‘emg’. [Screenshot: rgrep results for emg]
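To make it concrete what such a search boils down to, here is a minimal Python sketch of the same idea: walk a folder recursively, keep only the files matching a glob, and report every line containing the query. It is an illustration only, not how rgrep is implemented (rgrep drives the external find and grep programs); the function name and the case-insensitive matching are assumptions chosen to mirror the emg/EMG example.

```python
import fnmatch
import os
import re


def rgrep(pattern, file_glob, root):
    """Recursively search files under root whose names match file_glob.

    Returns (path, line_number, line_text) tuples, one per matching line.
    Matching is case-insensitive, so "emg" also finds "EMG".
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # The glob itself is case-sensitive: "*.R" does not match "notes.r".
            if not fnmatch.fnmatchcase(name, file_glob):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if regex.search(line):
                        hits.append((path, lineno, line.rstrip("\n")))
    return hits
```

With the inputs from the steps above, rgrep("emg", "*.R", "some/folder") returns one tuple per matching line. What the Emacs version adds is the results buffer: every hit is a link you can jump to and edit.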

Task lists (org mode)

As someone who gets distracted quite easily, I need lists to keep track of things: daily, weekly, monthly, yearly lists! I’ve tried various applications and sticky notes, but they get annoying with as many lists as I use. Emacs solves this issue for me with its amazing org-mode. You can take notes, plan projects, create TODO lists, assign priorities, set deadlines, or even use it as a markup language (a more extensive alternative to Markdown). The snapshot below shows some of my recent TODO lists. [Screenshot: org-mode TODO lists]

LaTeX (AUCTeX)

Although RMarkdown and IPython notebooks are sufficient for most kinds of documentation, at times you need to fall back on the conventional way of communicating. When you need better organization, references, citations or more control over the output, or when you are writing a scientific document, LaTeX is the way to go. Emacs provides a very strong layer for working with LaTeX through AUCTeX. Some of the features that you get through Emacs are:

  1. LaTeX syntax checking with Flycheck
  2. Forward and inverse search
  3. What you see is what you get (with latex-preview)
  4. Grep through citations and reference sections
  5. Autocompletion
  6. A central, intelligent compilation cycle

A decent blog post on setting up LaTeX with Emacs is provided by Piotr. Here is how Emacs looks while working with LaTeX files: [Screenshot: AUCTeX session in Emacs]

Editing File system directories (dired)

Finally, you often have to work in the file system: move things around, change permissions and edit files. Dired is an extremely powerful package for that. Here’s a handy cheat sheet of dired. You can rename multiple files, open the directory listing as an editable document, mark files for moving or copying, and do everything else you may require without leaving Emacs.

Final words

The purpose of this post is to show experienced Emacs users that Emacs is suitable for most data science tasks. You can find my init file here. Emacs does have a steep learning curve, and I won’t recommend it for beginners.

I love xkcd, and I’ll end this post with Randall Munroe’s take on editor wars. [Comic: xkcd on the editor wars]