Tuesday, June 3, 2008

Writing Workflow for Scientific Articles

I'm a researcher. An important part of my work is writing peer-reviewed scientific papers, in order to present my work at different scientific venues, such as symposiums, conferences, journals, and books.

Having excellent research, with excellent results and findings, is insufficient to have a paper accepted. An important part of the process is how you present your ideas and your results. And writing papers is a really hard task. It's a mix of sweating to find the proper words and putting them in the proper places, with a fluid sequence of ideas and explanations. It's almost an art form, despite some fairly dogmatic (common-sense?) items that must be present, such as a state-of-the-art review, an introduction, conclusions, etc.

To achieve an acceptable quality in the writing process (assuming that the actual content is scientifically relevant, of course), I typically perform a well-defined set of tasks: research existing (and relevant) state-of-the-art work (hand-in-hand with developing the research work and gathering/analysing results), organise high-level ideas into concepts, drill down, cite work, read, annotate, and iterate until reaching the desired result (or, more often than not, reaching the deadline).

This fairly complex and exhausting process can be eased a bit by using the right tools at the right time, in order to shift my focus towards Getting Things Done. That is, not worrying about crashing document editors, text formatting, citation formats, or the print+comment+rectify/improve cycle. Just focusing on structuring my ideas and writing them down in a coherent way.

Furthermore, the sheer amount of research work that is published every year in related venues makes it increasingly difficult to find needles in haystacks. That is, to find that one research article in the piles of paper sitting on the desk, unorganised or, at best, stored on shelves. Obviously, this process doesn't scale. It's an evident role for digital technologies, especially for bibliography and citation management tasks.

I think that many researchers can relate to these scenarios. Hence, all of this blabber leads to my suggestion of a workflow optimised for writing scientific articles, tailored to the best software I could find. On OS X. I'm not sure if some of the software I'll be talking about in the rest of this post has counterparts on other platforms. If so, please feel free to comment and contribute with some thoughts and links.

LaTeX



No researcher in her/his right mind writes scientific articles with other software (unless the venue specifically prohibits it). LaTeX is a set of macros built on top of the TeX typesetting system, where one focuses just on document structure (i.e., abstract, sections, etc.) and on the content itself. LaTeX files are plain text files. They are parsed and processed with LaTeX software through one of the several flavours widely available on the Web, resulting in either a PostScript document (.ps) or a universally accepted PDF.
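To make the "structure and content only" idea concrete, here is a minimal sketch of an article skeleton. The title, author, and section names are placeholders I made up for illustration; a real submission would normally start from the venue's own template instead of the plain article class.

```latex
% Minimal article skeleton: only structure and content, no manual formatting.
\documentclass{article}

\title{A Placeholder Title}
\author{A. Researcher}

\begin{document}
\maketitle

\begin{abstract}
One paragraph summarising the contribution.
\end{abstract}

\section{Introduction}
Motivation and problem statement go here.

\section{Related Work}
The state-of-the-art review goes here.

\section{Conclusions}
Closing remarks go here.

\end{document}
```

Assuming the file is saved as paper.tex (a hypothetical name), running pdflatex paper.tex should produce paper.pdf, with numbering, spacing, and fonts handled by LaTeX rather than by hand.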

Despite some problems of LaTeX, such as the frequent lack of WYSIWYG software (due to its compiler-like typesetting nature), the results are of high quality and WYSIWYP (What You See Is What You Print). Add to that the almost ubiquitous availability of LaTeX templates on conference/journal websites, a really good automatic bibliography formatter (BibTeX), and an almost dauntingly comprehensive number of utilities, and you've got yourself must-have typesetting software for scientific papers.

There are several LaTeX distributions at one's disposal, for every platform. My preferred choice on OS X is MacTeX, since its supporting tools are geared toward OS X's look and feel, and it integrates correctly with the OS (read: it just works out-of-the-box).

So, LaTeX will be the centre around which the rest of my software choices gravitate.

Papers



As explained earlier, managing state-of-the-art and other relevant sources of information can be daunting. Either at the physical level (stacks of real printed paper) or at the digital level (folders), managing and searching through all papers to find that particular one you're looking for (with a paper submission deadline lurking around the corner) is just cumbersome.

Papers will help you with this (too obvious a name for a piece of software!). It's a really good piece of software to manage, organise, and usefully leverage the entire collection of PDFs lying around on your hard drive. It integrates with well-known scientific digital libraries, including ACM, IEEE Xplore, and arXiv, among many others (and repository integration is plugin-based).

Despite the fact that one has to pay for a license to use it (€29, not that expensive), trust me on this one, it's worth the money. With Papers I can tag (i.e., multi-category), annotate, and search through my own repository within the program, as well as through Spotlight.

One more thing. It can export papers' metadata into the BibTeX format. This way, I can manage everything related to what I have to cite in a single program. It is the right hammer for the right nail.
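As a rough sketch of how an exported BibTeX file is then used from LaTeX: the entry below is an entirely made-up placeholder (key, authors, and venue are hypothetical), and refs.bib is just an assumed file name. The filecontents block only exists to keep the example self-contained; in practice refs.bib would be the file exported from the reference manager.

```latex
% Write a tiny bibliography file so the example is self-contained.
% In practice, refs.bib would be the file exported from the reference manager.
\begin{filecontents}{refs.bib}
@inproceedings{doe2008example,
  author    = {Jane Doe and John Smith},
  title     = {A Placeholder Title},
  booktitle = {Proceedings of a Placeholder Conference},
  year      = {2008},
  pages     = {1--10}
}
\end{filecontents}

\documentclass{article}
\begin{document}

% \cite refers to the entry by its key; BibTeX formats the reference list.
Earlier work~\cite{doe2008example} addressed a related problem.

\bibliographystyle{plain} % venue templates usually supply their own style
\bibliography{refs}       % reads refs.bib

\end{document}
```

With the file saved as paper.tex (again a hypothetical name), the usual sequence is pdflatex paper, bibtex paper, then pdflatex paper twice more so that the citation numbers resolve.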

Scrivener



At some point it is time to put thoughts, ideas, and results into words. As I previously said, it's not easy. Almost no one can write a paper top to bottom, from the first word to the last. It's an iterative process that invariably starts with organising ideas in a coherent line of thought. That's when Scrivener comes to help.

Scrivener is a tool targeted at all writers that builds on the typical workflow of drafts, loose notes, and combining them into a consistent piece. This is fairly similar to scientific writing, minus some issues that I'll describe later on. One of its killer features is the full-screen editing mode. I've written an essay before about the benefits of full-screen applications, ergo Scrivener fits perfectly into this line of thought. It hides all other apps, animations, popups, and everything that might stand in the way of the writing process. This way, one focuses just on what's supposed to be done: writing that paper.

This software also supports research tasks (lato sensu), including searching the Web, bookmarking Web pages, as well as annotating text drafts. I do not advise performing all of these tasks within Scrivener. To put it simply: use it just to organise your ideas and structure your text in different drafts, and that's it. Papers and other software listed in this essay will streamline research and annotating tasks in a better way.

Oh, and did I mention that Scrivener exports into the LaTeX file format?


TextMate, Skim, and pdfsync



Once the core texts for the paper have been converted to LaTeX, one has to delve into details and typeset it. Editing it in a generic or non-specialised text editor is something from the last century. In these days of syntax highlighting and IDEs, a lot of choices are available.

Furthermore, LaTeX is a command line oriented software package. And that's how it should (continue to) be. However, one typically wastes too much time opening a shell and running a set of commands to typeset LaTeX documents.

To complete the workflow I've described earlier, this detailing and improving process includes annotating the paper with comments, highlights, strikethroughs, underlines, etc. Since I'm talking about an all-digital workflow, the process of annotating and editing must be as simple as possible, mimicking the traditional print-annotate-edit process.

All of this hassle can easily be avoided with a text editor tailored to LaTeX plus some useful tools.

While other choices are available, my personal belief is that the workflow is better supported and streamlined with TextMate.

TextMate is an all-purpose text editor mostly targeted at programming tasks. It was popularised by the Ruby on Rails guys as a simple, lightweight, and GTD-friendly text editor (I strongly agree with this opinion). It also provides comprehensive support for different programming languages and, as you have surely figured out, supports LaTeX out-of-the-box.

With a few keystrokes, the typesetting tasks are instantly launched, and a user-friendly window presents any errors and warnings that occur. Citations are easily managed, and syntax highlighting provides visual cues for LaTeX keywords.

Within the iterative process of improving the paper one is writing, the back and forth of reading, annotating, and editing can be really tiresome. Therefore, to mitigate this problem, two other tools can help you get back on track on the main task: finishing the paper. These tools are pdfsync (which is already bundled in LaTeX distributions) and Skim.

pdfsync provides the core support for jumping between the typeset PDF and the LaTeX source (with fairly good granularity). Setting it up just requires adding \usepackage{pdfsync} to your LaTeX preamble. After typesetting, a marker will appear on the PDF, showing the position where your cursor is located within the LaTeX source.
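For reference, a minimal sketch of where that line goes. Everything except the \usepackage{pdfsync} line is placeholder content added for illustration.

```latex
\documentclass{article}

% Loading pdfsync makes the typesetting run write a .pdfsync file next to
% the PDF; the editor and the PDF viewer use it to map positions between
% the source and the output.
\usepackage{pdfsync}

\begin{document}

\section{Introduction}
Some text to jump to and from.

\end{document}
```

The package needs no options for basic use. Since it only helps while editing, I would guess it is safest to remove or comment out the line before producing the final camera-ready version, although that is a precaution rather than something I have verified to matter.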

Conversely, Skim supports the other direction (from the PDF towards the LaTeX source), since OS X's default PDF reader does not offer this functionality. Skim is supported by TextMate, which can be set up to use it with just two mouse clicks.

As a bonus, Skim has built-in PDF annotation features just like Adobe products, with the added advantage of being free (as in beer) and really lightweight.

OmniGraffle



The last thing I'll be talking about in this essay concerns creating vector-based figures. One of the beauties of PDF (and PS, for that matter) is that it's a vector-based file format. It means that it's resolution independent. Consequently, it is desirable that, whenever possible, all figures embedded into the paper are vector-based as well.

My preference for creating figures is OmniGraffle. It's a lightweight and easy-to-use piece of software that provides intelligent guides to create vector-based figures that are coherently aligned, well dimensioned, and easy on the eye. Remember that a good figure can be worth a thousand words. A poor quality figure (e.g., misaligned shapes) conveys an amateurish approach to the work, which can reflect negatively in the peer review process. High quality graphics do help improve the paper's overall quality. After using it, you'll be constantly reminded that it's an excellent piece of software whenever you have to use Microsoft Visio or any other diagram software of lesser quality.

Add to that the fact that it supports 100% vectorised PDF export, which can be directly embedded into LaTeX files, and you've got a high quality research paper ready to be submitted, peer-reviewed and, hopefully, accepted!
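Embedding such an exported PDF figure might look like the sketch below. The file name architecture.pdf, the caption, and the label are hypothetical, and the figure file has to exist next to the .tex file for this to typeset.

```latex
\documentclass{article}
\usepackage{graphicx} % provides \includegraphics

\begin{document}

\begin{figure}
  \centering
  % architecture.pdf is a hypothetical name for the exported figure;
  % the extension can be omitted and pdflatex will look for the PDF.
  \includegraphics[width=\linewidth]{architecture}
  \caption{Placeholder caption.}
  \label{fig:architecture}
\end{figure}

Figure~\ref{fig:architecture} gives an overview of the approach.

\end{document}
```

This assumes typesetting with pdflatex, which includes PDF figures directly and keeps them as vectors. The classic latex plus dvips route expects EPS figures instead, so the figure would need to be exported or converted accordingly.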

Ending remarks


I hope this info will help you lower the logistical burden of writing scientific papers. While I'm not an expert on all of these topics, all of this comes from my 4 to 5 years of experience working as a researcher. Once again, this is not an exhaustive list of software and workflows. It's just my own experience being described.

I'm sure there are a lot of things that I may have missed, and better software out there. I'm still missing two significant pieces of software that could integrate seamlessly into my workflow: WYSIWYG table and equation editors, integrated into TextMate. It would be great to select a table or an equation, and edit it without having to know a bunch of macros.

Therefore, please feel free to comment, and to make corrections and suggestions. I believe it's important that researchers spend their time on research, not waste it on avoidable pitfalls in the writing process.

And now, back to that pesky paper I'm writing...