Twitter for Scientific Communication: How Can
Citations/References be Identified and Measured?
Katrin Weller
Dept. of Information Science
Heinrich-Heine-University Düsseldorf
Phone: 0049 (0) 211 8110803
weller@uni-duesseldorf.de
Cornelius Puschmann
Dept. for English Language and Linguistics
Heinrich-Heine-University Düsseldorf
Phone: 0049 (0) 211 8115927
cornelius.puschmann@uni-duesseldorf.de
ABSTRACT
7KLV SDSHU GLVFXVVHV µFLWDWLRQV¶ DQG µUHIHUHQFHV¶ ZLWKLQ WKH
microblogging service Twitter with the aim to provide measures
for scientific communication on this platform. It provides
definitions for different types of citations on Twitter and
discusses general difficulties in accessing scientific tweets.
Furthermore, two different datasets that represent scientific usage
of Twitter have been analyzed with respect to citation counts.
General Terms
Measurement, Human Factors
Keywords
Microblogging, Twitter, scientific communication, citations,
references, scientometrics.
1. INTRODUCTION
Scientific communication is typically perceived as a process of
publishing scientific publications and of citing other scientists'
publications. The disciplines of bibliometrics and scientometrics
have established procedures for measuring scientific output based
on publications and scientific reputation based on citations.
Informetric citation analysis distinguishes citations from
references [11]: A citation is a formal mention of another work in a
scientific publication, viewed from the cited work's perspective.
A reference is the same mention of a work but viewed from the
citing work's perspective (typically in the form of a reference section
in a publication). Thus, citations and references are two sides of
the same coin. Slightly inconsistently, the term 'citation' is also
used as a broader term that subsumes both the dimension of
citations as well as the dimension of references. This paper
investigates whether comparable structures of citations and
references can also be identified in microblogging environments,
particularly in the microblogging service Twitter.
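The two perspectives can be illustrated with a minimal Python sketch (our own illustration, not a procedure taken from [11]): each mention is stored once as a hypothetical (citing work, cited work) pair, and reference counts and citation counts are simply the two directions of reading the same pairs. The work identifiers are invented.

```python
from collections import Counter

# Hypothetical mentions, each stored once as (citing_work, cited_work).
mentions = [
    ("paper_A", "paper_C"),
    ("paper_B", "paper_C"),
    ("paper_B", "paper_D"),
]

def reference_counts(pairs):
    """References: mentions counted from the citing work's perspective."""
    return Counter(citing for citing, _ in pairs)

def citation_counts(pairs):
    """Citations: the same mentions counted from the cited work's perspective."""
    return Counter(cited for _, cited in pairs)

refs = reference_counts(mentions)
cits = citation_counts(mentions)
print(refs["paper_B"], cits["paper_C"])  # 2 2

# Two sides of the same coin: both views sum to the number of mentions.
assert sum(refs.values()) == sum(cits.values()) == len(mentions)
```

On Twitter, the same structure would apply with tweets or users in place of papers.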
The Web as a medium for information exchange and
communication has led to the investigation of new metrics
(webometrics) in addition to classical bibliometric and
scientometric indicators [12]. While classical webometrics mainly
considers hyperlink structures between websites, recent Web 2.0
tools that enable novel forms of social interaction have brought
about a range of new aspects that can be measured and evaluated
(e.g. relating to access and usage, Web publication behavior, user
interrelations). [12] explains that measuring Web 2.0 services
offers new ways for data mining; it can help to gain insights into
"patterns such as consumer reactions to products or world
events" [12]. [7] provide an overview of Web 2.0 services that
may be of interest for new scientometric indicators by measuring
publication impact based on social mentions. One of these social
software scenarios is microblogging.
Within this paper, we investigate Twitter usage in scientific
contexts and consider Twitter as a means for scientific
communication. The scientific use of Twitter has received some
attention in previous work: [4] and [5] have performed several
automatic analyses of tweets collected for different conference
hashtags, including for example time series and lists of most active
twitterers. [3] and [9] have furthermore carried out manual
analyses of tweet contents for conference tweet datasets to
determine what conference participants are tweeting about. [10]
are developing automatic methods for extracting semantic
information from conference tweets. [6] have focused on tweets
published by a set of manually identified scientists and have
investigated their citation behavior.
[6] define Twitter citations as "direct or indirect links from a
tweet to a peer-reviewed scholarly article online" and distinguish
first- and second-order citations based on whether there is an
"intermediate webpage between the tweet and target resource".
Within this paper, a broader approach is applied. Two
fundamental types of citations are distinguished: external citations
are all links included in tweets; internal citations are retweets
within the Twitter platform.
Permission to make digital or hard copies of all or part of this work
for personal or classroom use is granted without fee provided that
copies are not made or distributed for profit or commercial
advantage and that copies bear this notice and the full citation on
the first page. To copy otherwise, or republish, to post on servers
or to redistribute to lists, requires prior specific permission and/or a
fee.
WebSci '11, June 14-17, 2011, Koblenz, Germany.
Copyright held by the authors.

The paper discusses these two types of citations and focuses
on their implications and challenges for informetrics (section 3).
First, however, it addresses a general problem in analyzing the
scientific impact of Twitter: how can scientific content actually
be identified on Twitter (section 2)? We furthermore
present our current approaches to citation analyses on Twitter for
two different types of datasets. Section 4 describes how these
datasets were gathered, section 5 presents very preliminary
results. Our overall aim is to better understand how scientists use
Twitter and whether traditional patterns of scientific
communication are being mapped to microblog communication or
whether entirely new practices emerge. This paper should
primarily be viewed as exploratory research in the field of
informetrics for microblogging. It may provide a basis for future
work on developing novel informetric indicators or for the
development of applications that make use of these indicators, e.g.
for identifying and ranking popular tweets, popular twitterers or
external resources, as well as for displaying user networks based
on co-citation or bibliographic coupling.
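How co-citation and bibliographic coupling could be derived for user networks is sketched below in Python. The sketch assumes that retweets serve as the citation edges (retweeting user cites retweeted user); the user names are invented, and this is an illustration rather than an implementation from this paper.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical retweet edges: (retweeting_user, retweeted_user).
edges = [
    ("alice", "carol"), ("alice", "dave"),
    ("bob", "carol"), ("bob", "dave"),
    ("erin", "carol"),
]

def _pair_counts(groups):
    """Count how often each unordered pair of users appears in the same group."""
    counts = Counter()
    for members in groups:
        for pair in combinations(sorted(members), 2):
            counts[pair] += 1
    return counts

def co_citation(edges):
    """Two users are co-cited once for every user who retweets both of them."""
    retweeted_by_user = defaultdict(set)
    for citing, cited in edges:
        retweeted_by_user[citing].add(cited)
    return _pair_counts(retweeted_by_user.values())

def bibliographic_coupling(edges):
    """Two users are coupled once for every user whom both of them retweet."""
    retweeters_of_user = defaultdict(set)
    for citing, cited in edges:
        retweeters_of_user[cited].add(citing)
    return _pair_counts(retweeters_of_user.values())

print(co_citation(edges)[("carol", "dave")])            # 2 (via alice and bob)
print(bibliographic_coupling(edges)[("alice", "bob")])  # 2 (via carol and dave)
```

The resulting pair counts could serve as edge weights when displaying such networks.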
2. IDENTIFICATION OF SCIENTIFIC TWEETS
As Twitter is not dedicated to one particular application scenario
and thus includes users with various backgrounds and different
motivations, it is difficult to identify scientific tweets or
twitterers. It is not yet defined in the research community what
actually counts as scientific Twitter usage or as a scientific
tweet. There are also no reliable statistics about how many
scientists use Twitter (and consequently no insights into how many
of them do so for science-related communication). Empirical
studies (quantitative and qualitative designs) that investigate
scientists' motivations for using Twitter are missing; analyses are
so far mainly based on the data delivered by Twitter. At present, there
are basically two different ways to compose scientific tweet
datasets [13]: a) based on hashtags and b) based on persons.
Theoretically, a third way would be to collect all tweets with
scientific content or that link to scientific content. This, however,
is almost impossible to achieve, as it would require either manual
identification of tweet contents or elaborate automated
computational-linguistic methods, as well as a precise definition of
'scientific content'.
2.1 People-Based Approach
Analyses of scientific Twitter behavior may be based on a
collection of tweets published by a scientist. Similar approaches
are frequently applied in analyses of scientific blogging. Yet, the
GHILQLWLRQRIµVFLHQWLVW¶LQWKLVFRQWH[W is not always consistent. It
may for example be a narrow definition only including members of
universities or a broad one including also, e.g., teachers and science
journalists. Analyzing (micro)blogs based on users is depending on
the availability of biographical information provided by the blog
authors or twitterers. Furthermore, a selection of users will have
to be made manually. [6] have applied this approach and have
manually identified 28 twittering scientists (using a snowball
system) to analyze their citation behavior. [14] has identified
twitterers with academic background by examining the list of
followers RIWKH&KURQLFOHRI+LJKHU(GXFDWLRQ¶V7ZLWWHUDFFRXQW
The most notable effort in collecting scientific twitteres has been
made by David Bradley, who identified more than 500 scientific
twitter accounts [2].
One problem in people-based approaches is that a twitter account
may also be shared by a group of people. For example, a research
group may have a twitter account and several members of that
group may access this account to report their latest efforts. Other
official institutional accounts (e.g. for a university) may be
completely taken care of by a single person. In many cases it is
not possible to distinguish whether a twitter account is used by a
single person or a group. To our knowledge, there are so far no
studies that exclusively analyze Twitter accounts belonging to
scientific groups or institutions.
2.2 Hashtag-Based Approach
The more common way to compose datasets for scientific Twitter
analyses is to collect tweets for specific (science-related) hashtags.
Only rarely do scientists announce official hashtags for
their projects or topics of interest. One recent prominent example
is the hashtag "#altmetrics", which was introduced by [8] for work
on measuring scholarly impact on the Web. But much more
frequently, specific hashtags are announced for scientific
conferences (some of them officially proposed by the organizers,
HJ ³ZHEVFL´, some are arranged by the participants of a
conference during the event). M ost studies on scientific
microblogging have used datasets collected via conference hashtags
[3, 4, 5, 9, 10, 13]. This approach always has to accept, that
WZHHWV PLJKW EH ³ORVW´ ,I WZLWWHUHUV HQJDJH LQ WKH GLVFXVVLRQ
without using the respective hashtag, their tweets cannot be
included. The same holds for tweets in which the hashtag is
missp elled HJ ³ZHEVFL´ LQVWHDG RI ³ZHEVFL´6WLOOLW
enables us to compose datasets for a relatively consistent subset
of Twitter users, namely people interested in the contents of a
particular scientific conference.
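The size of this loss could at least be estimated in a broader sample. The following Python sketch is our own assumption, not part of the method used in this paper: it flags hashtags that are similar to, but not identical with, the official one using the standard library function `difflib.get_close_matches`. The cutoff of 0.8 is an arbitrary choice, the example tweets are invented, and flagged tags would still need manual checking.

```python
import difflib
import re

HASHTAG = re.compile(r"#\w+")

def near_miss_hashtags(tweets, official, cutoff=0.8):
    """Return hashtags that resemble, but do not equal, the official hashtag."""
    official = official.lower()
    variants = set()
    for text in tweets:
        for tag in HASHTAG.findall(text.lower()):
            if tag != official and difflib.get_close_matches(
                tag, [official], n=1, cutoff=cutoff
            ):
                variants.add(tag)
    return sorted(variants)

tweets = [
    "Keynote starting now #WWW2010",
    "Slides online #www10",
    "Great panel #ww2010",
    "Also presenting at #wsdm2010",
    "#www day one",
]
print(near_miss_hashtags(tweets, "#www2010"))  # ['#ww2010', '#www10']
```

Tags of other events (here "#wsdm2010") and very short fragments fall below the cutoff, which shows why the threshold needs tuning per conference.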
3. CITATION ANALYSIS ON TWITTER
Sets of scientific tweets may be analyzed with different
objectives. Our main question within this paper is whether
scientific tweets include citation structures similar to traditional
information flows in scientific literature. [6] define Twitter
citations as "direct or indirect links from a tweet to a
peer-reviewed scholarly article online". They distinguish first- and
second-order citations based on whether there is an "intermediate
webpage between the tweet and target resource". In their sample
of tweets collected from 28 academics they discovered that of all
tweets including a URL, 6% fit their definition of Twitter
citations, i.e. they linked directly or via an intermediate page (like
a blog post) to a peer-reviewed article. Within our previous work
[13] we suggested alternative definitions and different dimensions
of citations in Twitter.
3.1 External Citations
We consider all URLs included in tweets as a form of citation: the tweet
includes a reference in form of a URL and a certain website obtains
a citation through this tweet. URLs in tweets act as external
citations as they link Twitter content with external websites.
Analyses may focus on the types of resources that are referenced
in URLs [13]. For purely scientometric analyses, references to
scientific publications are of highest interest, but references to
scientific blog posts or presentation slides may also be valuable
information. For more general informetric analyses, references to
all other websites may provide additional value.
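A first step toward typing the referenced resources could be to count external citations per website. The Python sketch below assumes that the host name is a usable proxy for the website (classifying hosts into publications, blogs or slides would still require a manual or curated mapping); the URL pattern approximates the rule described in section 5, the example URLs are invented, and `str.removeprefix` requires Python 3.9 or later. Unresolved short URLs would be attributed to the shortening service.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Approximates the rule in section 5: "http(s)://" or "www." followed by non-space text.
URL = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

def cited_websites(tweets):
    """Count external citations per website (host name without a leading 'www.')."""
    counts = Counter()
    for text in tweets:
        for url in URL.findall(text):
            url = url.rstrip(".,;:!?)")  # drop trailing punctuation from the tweet text
            if not url.lower().startswith(("http://", "https://")):
                url = "http://" + url  # bare 'www.' links need a scheme to be parsed
            host = urlsplit(url).hostname or ""
            counts[host.removeprefix("www.")] += 1
    return counts

tweets = [
    "New preprint: http://arxiv.org/abs/1003.1234",
    "Slides from my talk www.slideshare.net/example/talk.",
    "Blog post on the keynote https://www.Example.org/keynote",
    "Also see http://arxiv.org/abs/1004.5678",
]
print(cited_websites(tweets).most_common(1))  # [('arxiv.org', 2)]
```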
3.2 Internal Citations
3.2.1 Retweets
Retweets (RTs) can be interpreted as a form of inter-Twitter
citation (internal citations). A user who retweets another user's
tweet publishes a reference; the retweeted user receives a citation. As
analyzed by [1], users retweet for different reasons, such as
information diffusion, or they use retweets as a "means of participating
in a diffuse conversation". This should be investigated in more
detail for scientific tweets. Yet, retweet analyses are not easy to
perform, due to the lack of format standardization: not all
twitterers retweet with the standard "RT @user" format.
3.2.2 @mentions
@mentions of usernames within tweets also sometimes resemble
references, e.g. in tweets like "Just read an interesting paper by
@sampleuser". Yet they currently cannot be automatically
distinguished from other @messages and will thus have to be
excluded from current analyses.
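A crude pre-filter could at least narrow down candidates for manual coding. The Python sketch below is a hypothetical heuristic of our own and does not solve the distinction described above: it removes retweet markers and leading reply addresses and returns the remaining @mentions, which may still be conversational.

```python
import re

MENTION = re.compile(r"(?<!\w)@(\w+)")            # '@name' not preceded by a word character
RT_PREFIX = re.compile(r"\bRT\s*@\w+:?", re.IGNORECASE)
LEADING_REPLY = re.compile(r"^(?:@\w+[\s:,]*)+")  # '@a @b ...' at the very start

def candidate_reference_mentions(text):
    """Return @mentions left after removing retweet markers and leading reply addresses.

    Heuristic only: the remaining mentions still need manual checking.
    """
    body = RT_PREFIX.sub(" ", text).lstrip()
    body = LEADING_REPLY.sub("", body)
    return [name.lower() for name in MENTION.findall(body)]

print(candidate_reference_mentions("Just read an interesting paper by @sampleuser"))
# ['sampleuser']
print(candidate_reference_mentions("@bob see you at the poster session"))
# []
```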
4. DATA COLLECTION
Within our previous work [3, 13] we have exclusively worked
with scientific tweets collected via conference hashtags. We now
want to extend this and include additional data collected via a list
of scientific twitterers.
4.1 Hashtag-Based Collection
In our previous work [3] we collected tweets from four
scientific conferences. Conferences were selected based on two
features: size and discipline. We chose two smaller
conferences (<500 participants) and two major conferences
(>1,000 participants). One small and one larger conference were on
topics from the (digital) humanities, and one small and one larger
conference were in the field of computer science. In [3] we
performed manual (intellectual) analyses of tweets in these conference
datasets. In [13] we continued this work and performed an additional
manual analysis of URLs included in tweets and first citation
analyses. Within this paper and the respective poster we now
want to consider the results of citation analyses from the hashtag-
based dataset in comparison to additional data collected with a
people-based approach.
Currently, we have restricted our citation analyses to data from
two conferences out of the initial set of four conferences, as the
methodology is still subject to refinements and should be
improved after discussion in the scientific community. We have
chosen the two larger conferences: one from computer science (the
World Wide Web Conference 2010, WWW2010, hashtag
#www2010), and one from humanities (the Modern Language
Association Conference 2009, MLA 2009, hashtag #mla09). Table
1 presents an overview of the key information about the selected
conferences and their respective hashtags. We deliberately
concentrated on the main hashtag for each conference in order to
achieve uniform preconditions for each set (we did not include
spelling variants or hashtags for associated or co-located events).
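Restricting a collection to the main hashtag within a fixed period can be sketched in Python. The matching rules (case-insensitive, whole hashtag only, so that "#www2010ws" or "#ww2010" are excluded) are our assumptions about what "main hashtag only" means; the window is the #www2010 period from Table 1 and the tweets are invented.

```python
import re
from datetime import date

def has_hashtag(text, tag):
    """True if `tag` occurs as a complete hashtag, ignoring case; variants do not count."""
    pattern = re.compile(r"(?<!\w)" + re.escape(tag) + r"(?!\w)", re.IGNORECASE)
    return pattern.search(text) is not None

def collect(tweets, tag, start, end):
    """Keep (day, text) tweets that carry the main hashtag and fall in [start, end]."""
    return [(day, text) for day, text in tweets
            if start <= day <= end and has_hashtag(text, tag)]

start, end = date(2010, 4, 13), date(2010, 5, 14)
tweets = [
    (date(2010, 4, 27), "Great keynote #WWW2010"),
    (date(2010, 4, 27), "Workshop track today #www2010ws"),
    (date(2010, 4, 28), "Panel on linked data #ww2010"),
    (date(2010, 6, 1), "Finally uploaded my slides #www2010"),
]
print(len(collect(tweets, "#www2010", start, end)))  # 1
```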
Table 1. The test dataset for tweets with conference hashtags #mla09 and #www2010
| Hashtag | #www2010 | #mla09 |
|---|---|---|
| Conference | World Wide Web Conference (WWW2010), Raleigh, NC, USA | Modern Language Association Conference (MLA 2009), Philadelphia, PA, USA |
| Conference dates | 26.-30. April '10 | 27.-30. Dec. '09 |
| Discipline | Computer science | Linguistics, literature, (digital humanities) |
| No. of tweets from two weeks before until two weeks after the conference | 3,358 [during period: 13. April 2010-14. May 2010] | 1,929 [during period: 15. Dec. 2009-14. Jan. 2010] |
| Total no. of unique twitterers (average no. of tweets per twitterer) | 903 (Ø 3.72) | 369 (Ø 5.23) |
| Total no. of tweets during actual conference days only | 2,425 [26.-30. April 2010] | 1,206 [27.-30. December 2009] |
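The rows of Table 1 can be computed from a collected dataset roughly as follows. This Python sketch assumes that user names are compared case-insensitively; the sample tweets are invented. The last line cross-checks the averages reported in the table.

```python
from collections import namedtuple
from datetime import date

Tweet = namedtuple("Tweet", "user day text")

def dataset_stats(tweets, conf_start, conf_end):
    """Compute the Table 1 rows for one hashtag dataset."""
    users = {t.user.lower() for t in tweets}
    during = sum(1 for t in tweets if conf_start <= t.day <= conf_end)
    average = len(tweets) / len(users) if users else 0.0
    return {
        "tweets": len(tweets),
        "unique_twitterers": len(users),
        "avg_per_twitterer": round(average, 2),
        "during_conference": during,
    }

sample = [
    Tweet("alice", date(2010, 4, 20), "Looking forward to #www2010"),
    Tweet("alice", date(2010, 4, 27), "Great keynote #www2010"),
    Tweet("bob", date(2010, 4, 28), "Slides online #www2010"),
    Tweet("Bob", date(2010, 5, 3), "Back home after #www2010"),
]
print(dataset_stats(sample, date(2010, 4, 26), date(2010, 4, 30)))
# {'tweets': 4, 'unique_twitterers': 2, 'avg_per_twitterer': 2.0, 'during_conference': 2}

# Cross-check of the averages in Table 1: 3,358 / 903 and 1,929 / 369.
print(round(3358 / 903, 2), round(1929 / 369, 2))  # 3.72 5.23
```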
4.2 People-Based Collection
We assume that scientists tweet differently during conferences
than in everyday situations. To test this assumption thoroughly, broad
additional studies with data collected from scientific twitterers are
needed. As a first step in this direction, we have
started to set up a sample collection of tweets by scientists.
We used the list of scientific twitterers by Bradley [2] and
modified it; we added some more twitter accounts which we had
manually identified as belonging to scientists. Scientists in this
context are not purely university staff but may also be (graduate)
students or researchers in companies. Some twitter accounts may
not belong to individual persons but to scientific groups.
Altogether, we obtained a set of 589 unique users. We then
collected all the tweets from these 589 Twitter accounts during the
period January 7, 2010 until August 31, 2010. In total, this
dataset comprises 410,609 tweets.
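The people-based collection can be sketched in Python: merge the account lists without duplicates, then keep every tweet by a listed account within the collection period given above, regardless of topic. The account names and tweets are hypothetical, and normalising "@Name" and "name" to one account is our assumption.

```python
from datetime import date

def normalise(account):
    """Treat '@Name' and 'name' as the same Twitter account."""
    return account.strip().lstrip("@").lower()

def merge_account_lists(*lists):
    """Union of several account lists without duplicates."""
    return sorted({normalise(a) for accounts in lists for a in accounts})

def people_based(tweets, accounts, start=date(2010, 1, 7), end=date(2010, 8, 31)):
    """Keep every (user, day, text) tweet by a listed account within [start, end]."""
    wanted = {normalise(a) for a in accounts}
    return [t for t in tweets if normalise(t[0]) in wanted and start <= t[1] <= end]

bradley = ["@SciBlogger", "LabNotes"]   # hypothetical entries from a published list
manual = ["sciblogger", "@PhDStudent"]  # hypothetical manual additions
accounts = merge_account_lists(bradley, manual)
print(accounts)  # ['labnotes', 'phdstudent', 'sciblogger']

tweets = [
    ("SciBlogger", date(2010, 3, 2), "Our new paper is out"),
    ("LabNotes", date(2009, 12, 30), "Before the collection period"),
    ("someone_else", date(2010, 3, 2), "Not on the list"),
]
print(len(people_based(tweets, accounts)))  # 1
```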
5. FIRST RESULTS
Within this poster paper we can only give a very first
insight into our overall results. More detailed data will be presented
in the poster. A first result of high interest can be found in the
pure counting of URLs as external citations. We counted the
numbers of URLs (identified as strings starting with "http(s)" or
"www." followed by text) in different ways. Table 2 shows how
many tweets in the #www2010, the #mla09 and the people-based
dataset contain at least one URL. As some tweets contain more
than one URL, the total number of URLs is also listed. For the
two conference datasets we have also resolved the shortened
URLs to count the number of unique URLs: the #www2010 dataset
includes 574 unique URLs, the #mla09 dataset includes 199
unique URLs. Table 2 shows that the people-based dataset
includes a much higher percentage of tweets with URLs than the
conference-based datasets: in general, scientists post a URL in
more than 55% of their published tweets.
During conferences, the share of non-URL tweets increases;
we assume that this is due to a higher number of "conversational"
tweets during social events like conferences and will investigate
this in more detail.
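The three counts described above can be sketched in Python. The regular expression only approximates the "http(s)" or "www." rule, and the `resolve` hook is a hypothetical stand-in: in practice, resolving shortened URLs means following HTTP redirects, which is replaced here by an invented lookup table.

```python
import re

# Approximation of the rule above: "http://", "https://" or "www." followed by non-space text.
URL = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

def url_counts(tweets, resolve=lambda url: url):
    """Return (tweets with at least one URL, total URLs, unique URLs after `resolve`)."""
    with_url = 0
    total = 0
    unique = set()
    for text in tweets:
        urls = URL.findall(text)
        if urls:
            with_url += 1
        total += len(urls)
        unique.update(resolve(u) for u in urls)
    return with_url, total, len(unique)

# Hypothetical resolution table standing in for following short-URL redirects.
expanded = {
    "http://bit.ly/abc": "http://example.org/paper",
    "http://bit.ly/xyz": "http://example.org/paper",
}
tweets = [
    "Slides: http://bit.ly/abc",
    "Paper http://bit.ly/xyz and http://example.org/blog",
    "Great talk!",
]
print(url_counts(tweets, lambda u: expanded.get(u, u)))  # (2, 3, 2)
```

Without the resolution step, the two short links would be counted as two distinct URLs.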
Table 2. Different URL Counts
| | #www2010 | #mla09 |
|---|---|---|
| Number (and %) of tweets including at least one URL | 1,338 (39.85%) | 525 (27.22%) |
| Number of total URLs | 1,460 | 551 |
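The percentages in Table 2 follow from the tweet counts in Table 1, as this short Python calculation shows. The other two ratios (URLs per URL-carrying tweet, and unique URLs from section 5 as a share of all URLs) are derived figures of our own, not values reported in the tables; the people-based dataset is left out because Table 2 lists only the conference datasets.

```python
# Figures reported in Table 1, Table 2 and section 5 for the conference datasets.
datasets = {
    "#www2010": {"tweets": 3358, "url_tweets": 1338, "urls": 1460, "unique_urls": 574},
    "#mla09": {"tweets": 1929, "url_tweets": 525, "urls": 551, "unique_urls": 199},
}

def url_shares(d):
    """% of tweets with a URL, URLs per URL-carrying tweet, % of URLs that are unique."""
    share = 100 * d["url_tweets"] / d["tweets"]
    per_url_tweet = d["urls"] / d["url_tweets"]
    unique_share = 100 * d["unique_urls"] / d["urls"]
    return share, per_url_tweet, unique_share

for tag, d in datasets.items():
    share, per_tweet, unique = url_shares(d)
    print(f"{tag}: {share:.2f}% of tweets with URL, "
          f"{per_tweet:.2f} URLs per such tweet, {unique:.1f}% unique URLs")
```

The first values reproduce the 39.85% and 27.22% in Table 2.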
6. CONCLUSION AND OUTLOOK
The poster presentation will include additional results: the
investigation of the types of Websites that the URLs link to, the
highly cited URLs from the conference datasets, the number of
retweets for the different datasets, and highly retweeting and
retweeted users. Altogether, citation behavior on Twitter
differs from traditional scientific publication and citation
behavior and needs specific standards for analysis and metrics.
7. ACKNOWLEDGMENTS
Many thanks to Evelyn Dröge who worked with us during earlier
phases of this project. Thanks to Julia Verbina and Parinaz
Maghferat for their contributions to data collection. Thanks to
Bernd Klingsporn for advice and support and to Wolfgang G.
Stock and Isabella Peters for critical remarks. Financial support
from the Heinrich-Heine-University Düsseldorf for the Research
*URXS³6FLHQFHDQGWKH,QWHUQHW´LVJUHDWO\DFNQRZOHGJHG.
8. REFERENCES
[1] Boyd, D., Golder, S. and Lotan, G. 2010. Tweet, tweet,
retweet: Conversational aspects of retweeting on Twitter. In
R. H. Sprague (Ed.), Proceedings of the 43rd Conference on
System Sciences (HICSS 10), Honolulu, Hawaii, USA.
Piscataway, NJ: IEEE.
[2] Bradley, D. n.d. Hundreds of scientific Twitter
friends. Retrieved May 6, 2011, from
http://www.sciencebase.com/science-blog/100-scientific-
twitter-friends.
[3] Dröge, E., Maghferat, P., Puschmann, C., Verbina, J. and
Weller, K. 2011. Konferenz-Tweets: Ein Ansatz zur Analyse
der Twitter-Kommunikation bei wissenschaftlichen
Konferenzen. In Joachim Griesbaum, Thomas Mandl, Christa
Womser-Hacker (Eds.), Information und Wissen: global,
sozial und frei? Proceedings des 12. Internationalen
Symposiums für Informationswissenschaft (pp. 98-110).
Boizenburg: VWH.
[4] Ebner, M. and Reinhardt, W. 2009. Social networking in
scientific conferences: Twitter as tool for strengthen a
scientific community. In U. Cress; V. Dimitrova, & M.
Specht (Eds.), Learning in the Synergy of Multiple
Disciplines. 4th European Conference on Technology
Enhanced Learning, EC-TEL 2009 Nice, France. Berlin:
Springer.
[5] Letierce, J., Passant, A., Decker, S. and Breslin, J. G. 2010.
Understanding how Twitter is used to spread scientific
messages. In Proceedings of the Web Science Conference
(WebSci10): Extending the Frontiers of Society On-Line,
Raleigh, NC, USA.
[6] Priem, J. and Costello, K. L. 2010. How and why scholars
cite on Twitter. In C. Marshall; E. Toms, & A. Grove (Eds.),
Proceedings of the 73rd ASIS&T Annual Meeting on
Navigating Streams in an Information Ecosystem, Pittsburgh,
PA, USA (Article No. 75). New York, NY: ACM.
[7] Priem, J. and Hemminger, B. M. 2010. Scientometrics 2.0:
Toward new metrics of scholarly impact on the social Web.
First Monday, 15(7). Retrieved January 06, 2011, from
http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2874/2570.
[8] Priem, J., Taraborelli, D., Groth, P. and Neylon, C. 2010.
Alt-metrics: A Manifesto. Retrieved January 13, 2011, from
http://altmetrics.org/manifesto/.
[9] Ross, C., Terras, M., Warwick, C. and Welsh, A. 2011.
Enabled backchannel: Conference Twitter use by digital
humanists. Journal of Documentation, 67(2), 214-237.
[10] Stankovic, M., Rowe, M., and Laublet, P. 2010. Mapping
tweets to conference talks: A goldmine for semantics. In
Proceedings of the Third Social Data on the Web Workshop
SDoW2010, collocated with ISWC2010, Shanghai, China.
[11] Stock, W. G. 2007. Information Retrieval. Informationen
suchen und finden. München, Wien: Oldenbourg.
[12] Thelwall, M. 2008. Bibliometrics to webometrics. Journal of
Information Science, 34(4), 605-621.
[13] Weller, K., Dröge, E., and Puschmann, C. 2011 (in press):
Citation Analysis in Twitter. Approaches for Defining and
Measuring Information Flows within Tweets during
Scientific Conferences. In Proceedings of Making Sense of
Microposts Workshop (#MSM2011). Co-located with
Extended Semantic Web Conference, Crete, Greece.
[14] Young, J. R. 2009. 10 High Fliers on Twitter: On the
microblogging service, professors and administrators find
work tips and new ways to monitor the world. The Chronicle
of Higher Education, 31, A10, April 10, 2009. Retrieved
January 11, 2011, from http://chronicle.com/article/10-High-
Fliers-on-Twitter/16488/.