BibTeX frequency table
Wednesday, June 4th, 2008
Something I usually ask my students to do is to draw up a frequency table of the references they cite in their theses. This is useful for seeing whether one is over-citing particular sources, or neglecting sources that are more significant. Until now, most have gone the paper-and-pencil route.
Fred Otten came up with the following script, using good old sed, awk and some plumbing, which draws up a nice list from an input LyX file.
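#!/bin/sh
# Count how often each citation key is used in the LyX file given as $1.
# Pipeline: keep LyX citation lines (key "key1,key2"), strip the quotes,
# split the comma-separated keys one per line (GNU sed reads \n as a
# newline), tally each key in awk, then sort by citation key.
grep 'key "' "$1" |
awk '{ print substr($2, 2, length($2) - 2) }' |
sed -e 's/,/\n/g' |
awk '{ count[$1]++ }
     END { for (k in count) print k, count[k] }' |
sort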
This gives a two-column listing of citation keys and their frequency counts. It can of course be extended with further awk statements to transpose the columns, or to sort by frequency rather than by citation key.
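As one possible take on that extension, here is a minimal sketch that transposes the columns (count first) and sorts by descending frequency, breaking ties alphabetically by key. It assumes the same LyX citation lines of the form key "key1,key2" that the script relies on. The function name citefreq_by_count is hypothetical, and tr is swapped in for sed so the key splitting does not depend on GNU sed.

```shell
#!/bin/sh
# Sketch: tally citation keys in a LyX file, most-cited first.
# Assumes citations appear as LyX lines of the form:  key "key1,key2"
# citefreq_by_count is a hypothetical name, not part of the original script.
citefreq_by_count() {
    # Keep citation lines, strip the quotes, put one key per line,
    # print "count key" (columns transposed) for each distinct key,
    # then sort numerically descending on the count and
    # alphabetically on the key to break ties.
    grep 'key "' "$1" |
    awk '{ print substr($2, 2, length($2) - 2) }' |
    tr ',' '\n' |
    awk '{ count[$1]++ }
         END { for (k in count) print count[k], k }' |
    sort -k1,1nr -k2,2
}
```

After sourcing the file, run it as citefreq_by_count thesis.lyx; piping the result through head shows the most heavily cited sources at a glance, which is exactly where over-citing tends to show up.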





