An extensive study of Web robots traffic

Tessera, Daniele; Calzarossa, Maria; Massari, Luisa
2013

Abstract

The traffic produced by the periodic crawling activities of Web robots often accounts for a sizeable fraction of a website's overall traffic and therefore has a non-negligible effect on its performance. Our study focuses on the traffic generated on the SPEC website by many different Web robots, including those employed by some popular search engines. This extensive investigation shows that the behavior and crawling patterns of the robots vary significantly in terms of the requests, resources and clients involved in their crawling activities. Some robots tend to concentrate their requests in short periods of time and follow some sort of deterministic pattern characterized by multiple peaks. The requests of other robots exhibit time-dependent behavior and repeated patterns with some periodicity. We represent the traffic as a time series modelled in the frequency domain. The identified models, consisting of trigonometric polynomials and autoregressive moving average (ARMA) components, accurately summarize the behavior of the overall traffic as well as the traffic of individual robots. These models can readily be used as a basis for forecasting.
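The modelling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the data are synthetic hourly request counts (not the SPEC logs), the function names are hypothetical, and a plain AR(1) fit on the residuals stands in for the full ARMA component. The sketch locates the dominant period in the periodogram, fits a trigonometric polynomial by least squares, and estimates the AR(1) coefficient of what is left over.

```python
import numpy as np

def dominant_period(series):
    """Return the period (in samples) of the strongest periodogram peak."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                      # remove the mean so bin 0 does not dominate
    power = np.abs(np.fft.rfft(x)) ** 2   # unnormalised periodogram
    freqs = np.fft.rfftfreq(len(x))       # frequencies in cycles per sample
    k = 1 + np.argmax(power[1:])          # skip the zero-frequency bin
    return 1.0 / freqs[k]

def fit_trig_polynomial(series, period, n_harmonics=2):
    """Least-squares fit of a constant plus n_harmonics cosine/sine pairs."""
    x = np.asarray(series, dtype=float)
    t = np.arange(len(x))
    cols = [np.ones_like(t, dtype=float)]
    for h in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * h / period
        cols.append(np.cos(w * t))
        cols.append(np.sin(w * t))
    design = np.column_stack(cols)
    coeffs, *_ = np.linalg.lstsq(design, x, rcond=None)
    return coeffs, design @ coeffs

def fit_ar1(residuals):
    """Estimate phi in r[t] = phi * r[t-1] + noise by least squares."""
    r = np.asarray(residuals, dtype=float)
    return float(r[1:] @ r[:-1] / (r[:-1] @ r[:-1]))

# Synthetic hourly request counts with a daily cycle (assumed data, two weeks long).
rng = np.random.default_rng(0)
hours = np.arange(24 * 14)
requests = 100 + 40 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 5, hours.size)

period = dominant_period(requests)                       # dominant cycle length in hours
coeffs, fitted = fit_trig_polynomial(requests, period)   # deterministic periodic part
phi = fit_ar1(requests - fitted)                         # short-term dependence in the residuals
```

Under these assumptions, a forecast would evaluate the fitted trigonometric polynomial at future time indices and add the AR term propagated from the last residual. A real analysis would more likely rely on a dedicated time-series library for the ARMA part.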
2013
English
Proc. iiWAS2013
The 15th International Conference on Information Integration and Web-based Applications & Services (iiWAS2013)
Vienna
2 December 2013
4 December 2013
978-1-4503-2113-6
Tessera, D., Calzarossa, M., Massari, L., An extensive study of Web robots traffic, in Proc. iiWAS2013, (Vienna, 02-04 December 2013), ACM Press, New York 2013: 410-417. [10.1145/2539150.2539161] [http://hdl.handle.net/10807/56099]
Use this identifier to cite or link to this document: https://hdl.handle.net/10807/56099