Citation beyond Google Scholar

In one of his Topic … Comment pieces, Geoff Pullum decried declining standards in the citation of preceding literature. I have just experienced a new threat to our quest to acknowledge sources properly. While marking a student’s work, I was surprised to come across a citation of Silverstein (1986) which clearly referred to the article “Hierarchy of features and ergativity”, which I had always thought of as Silverstein (1976). As the dates differed by only one digit, I at first took this for a typo. But then I looked at the reference list and found that the reference was to a different version of the article, one of which I had not previously been aware. The bibliographic details given were clearly incorrect, or at least incomplete, so I turned to Google Scholar (as one does these days) to check the details.

Searching for “Hierarchy of features and ergativity” in Google Scholar gets one direct match:


and the citation details given are:


This was exactly the information which the student had collected. A search of my library catalogue enabled me to find that this was really a chapter in an edited volume:


Silverstein, M. (1986). Hierarchy of features and ergativity. In P. Muysken & H. van Riemsdijk (Eds.), Features and projections (pp. 163–?). Dordrecht, Holland; Riverton, USA: Foris.
As I said above, this is an article which I have always known as Silverstein (1976), full details as follows:
Silverstein, M. (1976). Hierarchy of Features and Ergativity. In R. M. W. Dixon (Ed.), Grammatical Categories in Australian Languages (pp. 112–171). Canberra: Australian National University.
(Incidentally, I recovered these details from the BibTeX version of the data supplied by the References section of the World Atlas of Language Structures Online, an impeccably assembled and maintained resource. A large tip of my hat to Matthew Dryer, Martin Haspelmath and Robert Forkel.)
I can understand that the 1986 publication came from a significant academic publisher while the 1976 version is (to put it mildly) obscure, and it is therefore not surprising that Google has indexed one and not the other. But the bibliographic details for the indexed publication are horribly incomplete (I still need to make a trip to the library to check the page numbers), and part of the chronological record of scholarly endeavour has been lost here – which was also the topic Pullum was commenting on in 1988.

Weaning myself off Word

We all know that Word has its idiosyncrasies, and they have driven us all crazy sometimes. We also all know that the package has lots of capability that we don’t use and probably never will use… But are there real alternatives? After reading this blog, I have started playing a little with pandoc and Markdown as an alternative approach to document production. I have got as far as producing a three-page document in this workflow – it was a simple one and ironically it has ended up as a docx (so I can share it with colleagues), but the experience was not too horrible. I used Haroopad as an editor, and I can tell already that, if I keep doing this, I will have to improve my typing accuracy a lot. Finding typos in the low-contrast editing pane in Haroopad is too much for my aging eyes…

But there are more serious questions I still haven’t resolved. First, I need to be sure that Zotero can be integrated into the workflow. I understand how Markdown handles citations and how I can use Zotero to generate a BibTeX file which will feed the data into the final version, but the ease of choosing references using the Zotero plugin for Word is not something I will easily give up. I’m looking at a couple of possibilities, but things start to get more complicated rather quickly…
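One thing the Word plugin gives for free is the guarantee that every citation points at a real database entry. In a plain-text workflow that check has to be done separately. Here is a minimal sketch of how it might look, assuming pandoc-style citations of the form `[@key]` in the Markdown and an exported BibTeX file; the function names and the key `silverstein1976` are my own inventions for illustration, and the regular expressions only cover simple keys.

```python
import re

def cited_keys(markdown_text):
    """Collect pandoc-style citation keys (the part after '@') from Markdown.

    Assumption: keys are letters, digits and underscores, optionally joined
    by ':', '.' or '-'. Trailing punctuation is not treated as part of a key.
    """
    pattern = r"(?<![A-Za-z0-9])@([A-Za-z0-9_]+(?:[:.\-][A-Za-z0-9_]+)*)"
    return set(re.findall(pattern, markdown_text))

def bibtex_keys(bib_text):
    """Collect entry keys such as 'silverstein1976' from BibTeX source."""
    return set(re.findall(r"@\w+\s*\{\s*([^,\s]+)\s*,", bib_text))

def missing_keys(markdown_text, bib_text):
    """Return keys cited in the Markdown but absent from the BibTeX data."""
    return cited_keys(markdown_text) - bibtex_keys(bib_text)

# Hypothetical inputs: a sentence citing two keys, and a .bib with only one.
markdown_text = (
    "Ergativity follows a hierarchy "
    "[@silverstein1976; see also @silverstein1986, p. 163]."
)
bib_text = """@incollection{silverstein1976,
  author    = {Silverstein, M.},
  title     = {Hierarchy of Features and Ergativity},
  booktitle = {Grammatical Categories in Australian Languages},
  editor    = {Dixon, R. M. W.},
  year      = {1976},
  pages     = {112--171}
}"""

print(sorted(missing_keys(markdown_text, bib_text)))
```

Running something like this before handing the file to pandoc would at least catch a mistyped key, which is the kind of error the point-and-click plugin never lets you make.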

Then there are a couple of linguistic issues. The first is using IPA (International Phonetic Alphabet). The whole process is Unicode based, so the basic problem (consistent reference to symbols) is solved, and I have checked that rendering works (with an appropriate font). The remaining issue, though, is an input method. At the moment, I am not seeing a better possibility than selecting characters somewhere and then doing a copy-and-paste. There are online tools, and offline ones as well. It’s a bit clunky but I am not a huge user of IPA symbols and I can probably live with this.

That leaves a last problem that I’m quite nervous about – presenting examples as aligned interlinear text. The best way to make sure that this sort of presentation is stable is to use tables – i.e. doing in a word processor what you would be forced to do in HTML. But to make it look decent, you need to be able to adjust the size of table columns all the time… Whether Markdown can handle this is something I will have to find out. I realise that it may mean writing a specific style sheet at some point and that is a bit intimidating at this stage. Anyway, I will report on progress somewhere down the track and will be happy to share any tricks or tools I come up with.
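As a first experiment, the table approach could be automated rather than typed by hand. The sketch below assumes that pandoc's pipe tables are acceptable for interlinear examples and that a header row of blank cells is good enough to stand in for "no header". The function name and the Indonesian example are mine, for illustration only, and I have not yet tested how the result looks after conversion to docx.

```python
def interlinear_table(words, glosses):
    """Build a Markdown pipe table with one column per word/gloss pair.

    Column widths are computed from the content, so nobody has to adjust
    them by hand; the header row is left blank on purpose.
    """
    if len(words) != len(glosses):
        raise ValueError("each word needs exactly one gloss")
    # Each column is as wide as its longest cell (minimum of three dashes).
    widths = [max(len(w), len(g), 3) for w, g in zip(words, glosses)]

    def row(cells):
        padded = (cell.ljust(width) for cell, width in zip(cells, widths))
        return "| " + " | ".join(padded) + " |"

    header = row([""] * len(widths))
    rule = "|" + "|".join("-" * (width + 2) for width in widths) + "|"
    return "\n".join([header, rule, row(words), row(glosses)])

# Illustrative example only: Indonesian 'I read a book.'
table = interlinear_table(
    ["saya", "mem-baca", "buku"],
    ["1SG", "AV-read", "book"],
)
print(table)
```

One caveat: `len()` counts code points, not display width, so IPA strings with combining diacritics may look ragged in the source file even though the table structure is still valid.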

Radical Saussurean semantics?

I have been following Benjamin Schmidt’s posts (and here and here) about word-embedding models (WEMs). I don’t claim to have any grasp of the underlying mathematics, but the results are very interesting – I’m planning to download the R package and do some playing with text that I am working on with Brian Zuccala (when I have time [he said optimistically]).

Here I just wanted to draw attention to one foundational aspect of this approach. Schmidt comments:

The question that word embedding models ask is: what if we could model all relationship between words as spatial ones? Or put another way: how can we reduce words into a field where they are purely defined by their relations?

This seems to me to be a very Saussurean approach to semantics – each word is defined by its place in the system, and that is in turn defined by the relationships (especially the differences) between the target word and all other words. The problem for Saussurean semantics has been the scale of the task of establishing all those relationships, except in circumscribed domains such as pronouns. But if that task can be handled by machine learning procedures, then suddenly this is a viable approach! Of course there are problems: the learning data will only ever be a snapshot; very rare words may not occur; and are numerical estimates of relationships the ones we really want? (I’m sure there are others I’m not thinking of yet.) Using a very large corpus should give some traction on these sorts of problems, but even acknowledging them, these methods seem like potentially a huge advance for empirical semantics.
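To make the idea of "relationships as spatial ones" concrete for myself, here is a toy sketch. It assumes that each word is a point in a space and that closeness is measured by cosine similarity, which is a common choice but not necessarily what Schmidt's package does internally. The three-dimensional vectors are invented by hand purely for illustration; a real model learns hundreds of dimensions from a corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity: near 1.0 for the same direction, 0.0 if unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word, vectors):
    """Return the other word whose vector points most nearly the same way."""
    others = [w for w in vectors if w != word]
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))

# Hand-made toy vectors (NOT learned from any corpus): two pronouns placed
# close together, and an unrelated noun placed far from both.
vectors = {
    "I": [0.9, 0.1, 0.0],
    "we": [0.8, 0.2, 0.1],
    "tree": [0.0, 0.1, 0.9],
}

print(nearest("I", vectors))
```

Nothing in the sketch says what "I" means; the word is characterised only by where it sits relative to the others, which is exactly the Saussurean point.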

More on citation styles

Following from my last post, I decided I should have a crack at editing a stylesheet for Zotero. I’m working on a paper at the moment where I rely a good deal on a seventeenth-century text via two modern editions. The lack of a field for original date of publication is a long-standing issue for Zotero users; there is a work-around which enters the additional information in the Zotero ‘Extra’ field (see the bottom of p2 in the forum), but it means adding a few lines to stylesheets which have not been tweaked already. My normal style is the Unified Stylesheet for Linguistics Journals, which is untweaked as yet.

So I set to work using the Zotero Style Editor, following the advice provided in the forum linked above (adamsmith Dec 24 2014). And it all went well – because the Style Editor is dynamic, it’s actually very easy to track what you are doing and experiment with the style. I found the macro I needed to adjust, I added the snippet of code from the forum post (which recovers the original date information from the database) and I added “prefix” and “suffix” attributes to get the format I wanted. The results look like this:

(Rumphius 1983 [1648])

Rumphius, G.E. 1983 [1648]. Ambonsche Landbeschrijving. (Ed.) Z.J. Manusama. Jakarta: Arsip Nasional Republik Indonesia.


(Rumphius 2002 [1648])

Rumphius, G.E. 2002 [1648]. De Ambonse Eilanden Onder de VOC Zoals Opgetekend in de Ambonese Landbeschrijving. (Ed.) Chris Frans van Fraassen & Hans Straver. Utrecht: Landelijk Steunpunt Educatie Molukkers.
I am very happy to share the revised stylesheet with anyone who is interested. And I’m feeling much braver about the next tussle with a publisher – I think I have a decent chance of making changes in a stylesheet to match a house style. But of course that assumes that the publisher can tell me what style to start from…….
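For anyone curious about the logic, the change amounts to "if an original date is recorded, print it in square brackets after the edition date". The sketch below imitates that logic in Python rather than reproducing the stylesheet code from the forum. It assumes the Extra field holds a line such as `original-date: 1648` (it also accepts the braced form `{:original-date: 1648}`), and both function names are hypothetical.

```python
import re

def original_year(extra):
    """Pull a four-digit original publication year out of an Extra field.

    Assumption: the field contains a line like 'original-date: 1648' or
    '{:original-date: 1648}'. Returns None when no such line is present.
    """
    match = re.search(
        r"^\s*\{?:?original-date:\s*(\d{4})",
        extra,
        flags=re.MULTILINE | re.IGNORECASE,
    )
    return match.group(1) if match else None

def cite(author, year, extra=""):
    """Format an in-text citation, adding '[original year]' when known."""
    original = original_year(extra)
    date = f"{year} [{original}]" if original else str(year)
    return f"({author} {date})"

print(cite("Rumphius", 1983, "original-date: 1648"))
print(cite("Rumphius", 2002, "{:original-date: 1648}"))
print(cite("Silverstein", 1976))
```

The square brackets play the role of the “prefix” and “suffix” attributes mentioned above: they are only printed when there is an original date to wrap.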

Welcome to the 21st century

Here’s an exchange I had with the editorial assistant of a journal (and via them with a publisher) in the last two days. No names, no embarrassment.


Me:

Referencing in our manuscript was accomplished using Zotero, with the stylesheet ‘Unified stylesheet for linguistics journals’. If you could let me know if there is a Zotero stylesheet which matches your house style, or alternatively, the closest match that you know of, this would make editing much simpler for me.

Editorial assistant:

I have spoken to a number of people at [publisher] and none of them are familiar with Zotero or know of any (close to) matching style sheets, so I am afraid that I cannot ease the job for you.

What is truly astonishing (at least to me) is that none of the publisher’s staff know anything about Zotero. And the question is how long authors will accept having to post-edit the output of Zotero (or EndNote or Mendeley or …) to meet the ultra-specific requirements of publishers who cannot be bothered to provide the needed tools (or to switch to a more standard format).

Is this a humanities problem, or do scientists have as much trouble?

Update (7/12/2015): I have now taken the plunge and modified a stylesheet – not to solve this problem, but for another citation issue. Details in the next post.

Changing definitions

I was interviewed for PM yesterday about the Macquarie Dictionary’s decision to extend the definition of the word ‘misogyny’. I was surprised that the story made it into the short version of the show, but today I see that the story has even attracted international attention. I am once again astonished (haven’t I learned yet?) by how many people think that dictionary definitions establish the precise limits of usage, rather than being a reflection of what is (or more realistically was) being done in practice. In the wonderful phrase of the Macquarie’s Sue Butler, the lexicographers come round with mop and bucket and clean up some verbal mess. In this case, she admits that the clean-up is a little delayed and has been triggered by a specific event, but normal lexicographic practice is being followed. Which is why I didn’t think it was such a big deal – but I guess my understanding of newsworthiness is poor!

(Cross posted at my Monash profile)


Gilbert and Sullivan meet the grammar Aryans

As you can see in this witty rewrite of The very model of a modern major general, ‘grammar Aryan’ is now preferred to ‘grammar Nazi’!