gemini://gemlog.cosroe.com/thoughts/academia.gmi

I haven't written anything formal in this way since my master's dissertation before the pandemic. I failed to organize myself to apply for PhD positions in time back in 2020 and, with the pandemic in all its contingency adding to my doubts, felt uncertain it was the right time anyway. I just wanted to get home and be with friends and family for a while, after studying abroad for 4 years.

Those who were privileged enough to suddenly find themselves with excess time on their hands, as I did, may similarly feel a strange nostalgia for the lockdown. I spent most of my time following through on questions and thoughts I had in the margins of my lecture notes, feeling my way through my subject, and generally decompressing after a very stressful day-to-day. I feel like I only really started learning and feeling confident in my subject when I didn't have to be learning it, and could explore at my own pace. A consequence of independent study, however, is the development of a subjective ethic when it comes to the methods and presentation of the work.

My paper details all the work my supervisor and I have done in the last year. In writing it, I am conscious that a number of things really irked me in the way other papers in my field were written, and I am trying to avoid falling into the same pitfalls. Here are some of them, which I'll indulgently present as rules of thumb.

Being introduced to a single new concept and being met with five or more citations is really unhelpful for the learner, and even for the seasoned academic. How are they meant to know which paper will detail the concept best? If one of the citations is from the 70s, and another from last month, are they to read the most recent first? If the concept is so general that there is plenty of literature in which it is discussed, cite the one that you think explains it best. Better yet, write just a few words for each reference so the reader themselves can quickly discern which is most appropriate for them to find and start with.

> There are fish in the sea with unique physiologies, some of which have strange faces, but can all be classified by their preference of composer (Kim et al. 1903; Beckett 1921; Scott et al. 1950; Rilley and Joyce 2000; Bourg et al. 2020; Dover 2023).

> There are fish in the sea (Kim et al. 1903), with unique physiologies (Beckett 1921; Rilley and Joyce 2000). These fish are commonly classified by which classical composers they enjoy (Scott et al. 1950), later expanded to neo-classical artists (Bourg et al. 2020). Some of these fish have strange faces (Dover 2023).

It is incredible that many authors do not publish their code along with their paper. This should be a requirement; if the method section is ostensibly there to allow someone to recreate your work, what better way to explain how you did it than by including exactly how you did it? For future researchers who are trying to catch up, it is also incredibly useful to see exactly how that trivial algorithm gets implemented, because what seems trivial to you will not be trivial to someone else.

There are plenty of really cool resources out there that will host your code with 20+ year guarantees, and some will even take your code notebooks and let people run them *for free*. Furthermore, most academic institutions maintain a long-term data storage policy, which you can use to bundle a tarball with all your scripts and deposit it under a permalink that someone a long way down the line will still be able to resolve.
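
As a minimal sketch of the tarball route, assuming Python and its standard tarfile module: the function name bundle_scripts and the directory name paper_code are hypothetical, chosen only for illustration, and any archiving tool would do the same job.

```python
import tarfile
from pathlib import Path


def bundle_scripts(source_dir, archive_path):
    """Pack every file under source_dir into a gzip-compressed tar archive.

    Paths inside the archive are stored relative to the parent of
    source_dir, so the archive unpacks into a single top-level folder.
    Keep archive_path outside source_dir so the archive does not try
    to include itself.
    """
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        # Sort for a stable file order between runs.
        for path in sorted(source.rglob("*")):
            if path.is_file():
                archive.add(path, arcname=str(path.relative_to(source.parent)))
    return Path(archive_path)


if __name__ == "__main__":
    # Hypothetical layout: all analysis scripts live in ./paper_code
    if Path("paper_code").is_dir():
        print(bundle_scripts("paper_code", "paper_code.tar.gz"))
```

The resulting file is what would be deposited with the institutional store or archive of your choice; how the permalink is issued depends entirely on that service.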

Much of academia is currently undergoing a reproducibility crisis. Papers that present or extensively use software have virtually no excuse to fall into that category.

Unless your paper is observational, you probably aren't the first person to do that thing, at best only the first to publish it in an academic journal. The stress that some papers put on being the first authors or work to explore something reinforces an unhealthy competitive culture that rewards negative behaviour and anti-social practices. It may also impede researchers who fall into some type of non-normative category: those who do not have English as their mother tongue and may be slower to write, those who had unexpected life events impede them, those who burned out, those who wanted to double check their results again, those who were nervous about their work.

Emphasizing originality by being the first to do something encourages a race to publication. In the worst cases, it also leads to incorrect results being published that could have otherwise been corrected.

Repeated parts of equations can be made named variables, and constants can be fused. Complex equations look cool but they rarely need to be that complex. There are 26 characters in the Latin alphabet, and another 24 in the Greek, and LaTeX has at least 4 unique maths fonts: not everything has to be x and y or alpha and beta, with a plethora of tildes, hats, dots and bars. Use distinct symbols, keep things unambiguous. Defining a symbol with an inline equation in the middle of a paragraph will have someone cursing when they are trying to reconstruct your method.
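
To make the first point concrete, here is a small made-up example (the equation and the symbols a, b, u and k are purely illustrative, not taken from any paper): naming the repeated sub-expression and fusing the constants leaves something much easier to read and to re-implement.

```latex
% Before: the same sub-expression appears twice, and the constants are spread out.
y = \frac{2\pi a}{3 b} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
  + \frac{2\pi a}{3 b} \, \frac{(x - \mu)^2}{2\sigma^2}

% After: name the repeated part, fuse the constants, and define both once in display form.
u \equiv \frac{(x - \mu)^2}{2\sigma^2}, \qquad
k \equiv \frac{2\pi a}{3 b}, \qquad
y = k \left( e^{-u} + u \right)
```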

Incredibly complex and original figures can be absolutely incredible. Rephrasing a commonly quoted maxim (or platitude) can sound impressive and give a reader a new way of looking at something. But most of the time they are not, and they won't.

Using the defaults in your plotting package, or presenting multiple simple figures instead of one overly cluttered abstraction, is familiar to readers, and will do a lot of the work of communicating your idea or result effectively for you.

Pick a colour scheme that is shown to be friendly to individuals with deuteranopia or protanopia, and use it for life.
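
One widely used option is the Okabe-Ito palette, which was designed to stay distinguishable under the common forms of colour vision deficiency. The sketch below is only an illustration: the choice of palette and the helper name assign_colours are assumptions, not something the rule above prescribes.

```python
from itertools import cycle

# Okabe-Ito palette, as hex strings.
OKABE_ITO = {
    "black": "#000000",
    "orange": "#E69F00",
    "sky blue": "#56B4E9",
    "bluish green": "#009E73",
    "yellow": "#F0E442",
    "blue": "#0072B2",
    "vermillion": "#D55E00",
    "reddish purple": "#CC79A7",
}


def assign_colours(series_names):
    """Give each named data series a palette colour, in palette order.

    If there are more series than colours the palette repeats, which is
    usually a hint that the figure wants splitting into simpler panels.
    """
    colours = cycle(OKABE_ITO.values())
    return {name: next(colours) for name in series_names}


if __name__ == "__main__":
    for name, colour in assign_colours(["model", "data", "residual"]).items():
        print(f"{name}: {colour}")
```

Keeping the palette in one small module that every plotting script imports is one way to "use it for life" without retyping hex codes.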

Humans and machines will love you alike. A table that spans multiple pages is a transcription task waiting to happen.
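
A rough sketch of what attaching a table as a data product could look like, assuming plain CSV and Python's standard csv module; the function names and the example columns are hypothetical.

```python
import csv


def write_table(path, header, rows):
    """Write a table to a CSV file with a single header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_table(path):
    """Read the CSV back as a list of dicts keyed by the header row.

    All values come back as strings; converting types is left to the reader.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Made-up rows, only to show the round trip.
    write_table("table1.csv", ["source", "flux"], [["A", 1.5], ["B", 2.25]])
    print(read_table("table1.csv"))
```

The paper itself can then show a handful of representative rows and point to the file for the rest.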

Related to the point about simple maths, there is manifestly always a good reason to keep things simple. The model or method need not be more complex if that would eclipse ease of understanding, and may be iterated in complexity, to ensure the underlying ideas remain tractable. Simplicity, similarly, does not always mean brevity; to quote from the preface to the first edition of Kant's Critique of Pure Reason:

> The Abbe Terrasson writes indeed that if we measured the size of a book, not by the number of its pages, but by the time we required for mastering it, then it could be said of many a book that it would be much shorter if it were not so short.

The appendix should contain the complexity needed for a detailed understanding, which would otherwise bury the unity of the ideas under excessive caveats.

Chances are, your paper may be one of the first a newcomer in the field will read. There are many seemingly trivial details or concepts that felt arcane or even counter-intuitive when you first read them. Someone else will still be at that stage, and a small gesture of welcome to them is to treat every statement with equal reverence and to explain and reference concepts appropriately.

Some of my dearest colleagues who have spent most of their lives in their field tell me that they still read new basic textbooks and papers, because they might still find a description of a concept they thought they understood cast wide open and clarified in a manner that gives both intuition and satisfaction. You can maybe write that eye-opener for someone else.

I will note that I am under no false impressions as to why this is the current state of affairs in academia. But I would like to appeal to the progression of science manifest, and try to improve the state of affairs. Science is already arcane and exclusionary enough in its ritualistic formality, authorship politik, Sisyphean funding pursuit, and occasionally damaging competition.

There have recently been a number of seminars where the topic was somewhere in the vicinity of "Rethinking the research paper", and good points were made: research papers are limited in their capabilities to faithfully represent research, and some (albeit fledgling) journals are already accepting e.g. Jupyter Notebooks or more versatile media as submissions. This is to recognize that sometimes there is no substitute for executable code interlaced with an explanation, rich data visualisation, a video, or even that sometimes big data research is best explained interactively. If the work is a new software tool that will enable other researchers to go about their work with absolute ease, is the documentation not a sufficient submission?

I am wary that converting to miscellaneous formats may make these new media inaccessible (cf. also my view on familiarity). There should be no requirement to have an expensive computer to view good research. But I also think the resistance against this conversion on the grounds of inaccessibility is somewhat ironic given the state of paywalls in academia for research papers.

There is potential to make research much more accessible using formats other than papers. I am unsure what the standards should be, but I love the conversations happening around that space.

On writing a paper

A wall of citations is worse than no citations.

Your code is your method.

You probably aren't the first, and what did you sacrifice to be?

Simple maths is readable maths.

Familiarity will do most of the work for you.

Data tables should be attached data products.

You can always justify simplicity.

Write the paper you would have wanted to read when you started in your field.

Some more remarks...