Web content changes have a strong impact on search engines and more generally on technologies dealing with content retrieval and management. These technologies have to take account of the temporal patterns of these changes and adjust their crawling policies accordingly. This paper presents a methodological framework – based on time series analysis – for modeling and predicting the dynamics of the content changes. To test this framework, we analyze the content of three major news websites whose change patterns are characterized by large fluctuations and significant differences across days and hours. The classical decomposition of the observed time series into trend, seasonal and irregular components is applied to identify the weekly and daily patterns as well as the remaining fluctuations. The corresponding models are used for predicting the future dynamics of the sites based on their current and historical behavior.
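The classical decomposition named in the abstract can be illustrated with a minimal sketch. It assumes an additive model, a single seasonal period (24 hourly observations per day), and a hypothetical synthetic series of changes per hour. The function names and the naive forecast (last trend value plus seasonal profile) are illustrative choices, not the authors' implementation, which also covers weekly patterns.

```python
import math


def centered_moving_average(series, period):
    """Trend estimate via a centered moving average (2 x m MA when m is even)."""
    n = len(series)
    half = period // 2
    trend = [None] * n  # undefined at both ends of the series
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: half weights on the two end points.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend


def decompose_additive(series, period):
    """Split a series into trend, seasonal profile, and irregular component."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full periods of observations")
    trend = centered_moving_average(series, period)
    # Average the detrended values at each position within the period.
    buckets = [[] for _ in range(period)]
    for t, (x, m) in enumerate(zip(series, trend)):
        if m is not None:
            buckets[t % period].append(x - m)
    raw = [sum(b) / len(b) for b in buckets]
    mean_raw = sum(raw) / period
    profile = [s - mean_raw for s in raw]  # seasonal effects sum to zero
    irregular = [
        None if m is None else x - m - profile[t % period]
        for t, (x, m) in enumerate(zip(series, trend))
    ]
    return trend, profile, irregular


def forecast(series, period, horizon):
    """Naive forecast: last available trend value plus the seasonal profile."""
    trend, profile, _ = decompose_additive(series, period)
    last_trend = [m for m in trend if m is not None][-1]
    n = len(series)
    return [last_trend + profile[(n + h) % period] for h in range(horizon)]


if __name__ == "__main__":
    # Hypothetical data: one week of hourly change counts with a daily cycle.
    hourly_changes = [50 + 20 * math.sin(2 * math.pi * t / 24) for t in range(168)]
    next_day = forecast(hourly_changes, period=24, horizon=24)
    print([round(v, 1) for v in next_day])
```

Holding the trend constant over the forecast horizon is the simplest option; a fitted trend model or a model of the irregular component could replace it without changing the decomposition step.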
Maria Carla, C., Tessera, D., Analysis and Forecasting of Web Content Dynamics, in 2018 32nd International Conference on Advanced Information Networking and Applications Workshops (Kraków, 16-18 May 2018), IEEE Computer Society, New York, USA, 2018, pp. 12-17. [10.1109/WAINA.2018.00056] [http://hdl.handle.net/10807/121332]
Analysis and Forecasting of Web Content Dynamics
Tessera, Daniele (Writing – Original Draft Preparation)
2018
File | Availability | File type | License | Size | Format |
---|---|---|---|---|---|
08418041.pdf | not available | Publisher's version (PDF) | Not specified | 157.68 kB | Unknown |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.