Introduction to Web mining, or: From IR to KD - Transitions 1 and 2 (PPT)
Introduction to Web text mining, focusing specifically on blog mining (PPT 1)
Things that I only covered briefly and/or informally during the discussion:
More on blog mining (PPT 2)
Transition 3: The questions change - opinion mining (PDF - thanks to Mathias Verbeke)
Introduction to Web usage mining (PPT 1, PPT 2)
Transition 4: The material changes (2) - "story tracking" (PPT - pp. 31 ff.)
New challenges (PPT -
The material changes (2)
Literature and sources mentioned in the slides
excellent general introductions/overviews:
Baldi, P., Frasconi, P., & Smyth, P. (2003). Modeling the Internet and the Web. Probabilistic Methods and Algorithms. Chichester, UK: John Wiley & Sons. (chapter on text mining,
Bing Liu (2006). Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications). Springer. (Book homepage)
Our clustering / ontology learning tool for
Berendt, B., Krause, B., & Kolbe-Nusser, S. (in press). Intelligent scientific authoring tools: Interactive data mining for constructive uses of citation networks. To appear in Information Processing & Management. (PDF of last version before proofs)
Verbeke, M., Berendt, B., & Nijssen, S. (2009). Data mining, interactive semantic structuring, and collaboration: A diversity-aware method for sense-making in search. In G. Boato & C. Niederee (Eds.), Proceedings of First International Workshop on Living Web, collocated with the 8th International Semantic Web Conference (ISWC-2009), Washington D.C., USA, October 26, 2009. CEUR Workshop Proceedings Vol-515. (PDF,
Two nice and thought-provoking examples of text classification applied to blogs and news:
Mihalcea, R. & Liu, H. (2006). A corpus-based approach to finding happiness. In Proceedings of the AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs. (PDF)
Liu, H. & Mihalcea, R. (2007). Of men, women, and computers: Data-driven gender modeling for improved user interfaces. In Proc. of the International Conference on Weblogs and Social Media. (PDF)
The overview of Opinion Mining is based on Bing Liu's book (see above).
Web usage mining work:
Teltzrow, M., & Berendt, B. (2003). Web-Usage-Based Success Metrics for Multi-Channel Businesses. In Proceedings of the WebKDD 2003 Workshop - Webmining as a Premise to Effective and Intelligent Web Applications. August 27th, 2003, Washington DC, USA. Held in conjunction with The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
More details in: Teltzrow, M. (2005). A quantitative analysis of e-commerce - channel conflicts, data mining, and consumer privacy. PhD Dissertation, Institute of Information Systems, Humboldt University Berlin.
Berendt, B. & Spiliopoulou, M. (2000). Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal, 9, 56-75.
Our story tracking work:
Subašić, I. & Berendt, B. Discovery of interactive graphs for understanding and searching time-indexed corpora. Knowledge and Information Systems. DOI: 10.1007/s10115-009-0227-x
and other work by the same authors (see my homepage)
L. Sweeney (2002). k-anonymity: a model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10 (5), 557-570.
Frankowski, D., Cosley, D., Sen, S., Terveen, L.G., Riedl, J. (2006). You are what you say: privacy risks of public mentions. In SIGIR 2006: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, Washington, USA, August 6-11, 2006 (pp. 565–572). ACM. (PDF)
Barbaro, M., Zeller, T.: A face is exposed for AOL searcher no. 4417749. New York Times (9 August 2006) (HTML)
All other sources should be retrievable from the information in the slides - please let me know if I overlooked something!
Major conferences and workshops
All KDD ("data mining") conferences (SIGKDD, PKDD, SIAM Conf. Data Mining, PAKDD) have interesting papers on Web mining and dedicated workshops. Check
Further resources: Tools (Open source and/or free)
Please talk or write to me!