Salutari, F., Da Hora, D., Dubuc, G., & Rossi, D. (2020). Analyzing Wikipedia Users’ Perceived Quality Of Experience: A Large-Scale Study. IEEE Transactions on Network and Service Management.
@article{salutari2020tnsm,
author = {{Salutari}, F. and {Da Hora}, D. and {Dubuc}, G. and {Rossi}, D.},
journal = {IEEE Transactions on Network and Service Management},
title = {Analyzing Wikipedia Users’ Perceived Quality Of Experience: A Large-Scale Study},
month = mar,
year = {2020},
miodoi = {10.1109/TNSM.2020.2978685}
}
The Web is one of the most successful Internet applications. Yet, the quality of Web users’ experience is still largely impenetrable. Whereas Web performance is typically studied with controlled experiments, in this work we perform a large-scale study of a real site, Wikipedia, explicitly asking (a small fraction of its) users for feedback on the browsing experience. The analysis of the collected feedback reveals that 85% of users are satisfied, along with both expected (e.g., the impact of browser and network connectivity) and surprising findings (e.g., absence of day/night, weekday/weekend seasonality) that we detail in this paper. Also, we leverage user responses to build supervised data-driven models to predict user satisfaction which, despite including state-of-the-art quality of experience metrics, are still far from achieving accurate results (0.62 recall of negative answers). Finally, we make our dataset publicly available, hopefully contributing to enriching and refining the scientific community’s knowledge of Web users’ QoE.
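The "recall of negative answers" quoted above is the fraction of truly dissatisfied users that a model flags as dissatisfied. The sketch below only illustrates how such a figure is computed; the function name `recall_of_class` and the toy labels are hypothetical and are not taken from the paper or its dataset.

```python
def recall_of_class(y_true, y_pred, target):
    """Fraction of samples whose true label is `target` that were predicted as `target`."""
    relevant = [(t, p) for t, p in zip(y_true, y_pred) if t == target]
    if not relevant:
        raise ValueError("no samples with the target label")
    hits = sum(1 for _, p in relevant if p == target)
    return hits / len(relevant)


# Hypothetical toy labels: "neg" = dissatisfied answer, "pos" = satisfied answer.
y_true = ["neg", "neg", "neg", "neg", "neg", "pos", "pos", "pos"]
y_pred = ["neg", "neg", "neg", "pos", "pos", "pos", "pos", "neg"]

# Three of the five truly negative answers are recovered by the predictions.
negative_recall = recall_of_class(y_true, y_pred, "neg")
```

Because satisfied users are the large majority (85%), overall accuracy would hide how often dissatisfied users are missed, which is presumably why the per-class recall is the figure reported.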
Conference Proceedings
Salutari, F., Da Hora, D., Varvello, M., Teixeira, R., Christophides, V., & Rossi, D. (2020, June). Implications of the Multi-Modality of User Perceived Page Load Time. IEEE MedComNet Conference.
@inproceedings{salutari2020medcomnet,
title = {Implications of the Multi-Modality of User Perceived Page Load Time},
author = {Salutari, Flavia and Da Hora, Diego and Varvello, Matteo and Teixeira, Renata and Christophides, Vassilis and Rossi, Dario},
booktitle = {IEEE MedComNet Conference},
month = jun,
year = {2020},
miodoi = {10.1109/MedComNet49392.2020.9191615}
}
Web browsing is one of the most popular applications for both desktop and mobile users. A lot of effort has been devoted to speeding up the Web, as well as to designing metrics that can accurately tell whether a webpage loaded fast or not. An often implicit assumption made by industrial and academic research communities is that a single metric is sufficient to assess whether a webpage loaded fast. In this paper we collect and make publicly available a unique dataset which contains webpage features (e.g., number and type of embedded objects) along with both objective and subjective Web quality metrics. This dataset was collected by crawling over 100 websites (representative of the top 1M websites on the Web) while crowdsourcing 6,000 user opinions on user perceived page load time (uPLT). We show that the uPLT distribution is often multi-modal and that, in practice, no more than three modes are present. The main conclusion drawn from our analysis is that, for complex webpages, each of the different objective QoE metrics proposed in the literature (such as AFT, TTI, PLT, etc.) is suited to approximate one of the different uPLT modes.
Salutari, F., Da Hora, D., Dubuc, G., & Rossi, D. (2019, May). A large-scale study of Wikipedia users’ quality of experience. The Web Conference (WWW’19).
@inproceedings{salutari2019www,
title = {A large-scale study of Wikipedia users' quality of experience},
author = {Salutari, Flavia and Da Hora, Diego and Dubuc, Gilles and Rossi, Dario},
booktitle = {The Web Conference (WWW'19)},
month = may,
year = {2019},
address = {San Francisco, California},
miodoi = {10.1145/3308558.3313467}
}
The Web is one of the most successful Internet applications. Yet, the quality of Web users’ experience is still largely impenetrable. Whereas Web performance is typically measured with controlled experiments, in this work we perform a large-scale study of one of the most popular websites, namely Wikipedia, explicitly asking (a small fraction of its) users for feedback on the browsing experience. We leverage user survey responses to build a data-driven model of user satisfaction which, despite including state-of-the-art quality of experience metrics, is still far from achieving accurate results, and discuss directions to move forward. Finally, we aim at making our dataset publicly available, which hopefully contributes to enriching and refining the scientific community’s knowledge of Web users’ quality of experience (QoE).
Salutari, F., Cicalese, D., & Rossi, D. (2018, March). A closer look at IP-ID behavior in the Wild. International Conference on Passive and Active Network Measurement (PAM).
@inproceedings{salutari2018pam,
title = {A closer look at IP-ID behavior in the Wild},
author = {Salutari, Flavia and Cicalese, Danilo and Rossi, Dario},
booktitle = {International Conference on Passive and Active Network Measurement (PAM)},
address = {Berlin, Germany},
year = {2018},
month = mar,
miodoi = {10.1007/978-3-319-76481-8_18}
}
Originally used to assist network-layer fragmentation and reassembly, the IP identification field (IP-ID) has been used and abused for a range of tasks, from counting hosts behind NAT, to detecting router aliases and, lately, to assisting the detection of censorship in the Internet at large. These inferences have been possible since, in the past, the IP-ID was mostly implemented as a simple packet counter; however, this behavior has been discouraged for security reasons, and other policies, such as random values, have been suggested. In this study, we propose a framework to classify the different IP-ID behaviors using active probing from a single host. Despite being only minimally intrusive, our technique is highly accurate (99% true positive classification), robust against packet losses (up to 20%) and lightweight (a few packets suffice to discriminate all IP-ID behaviors). We then apply our technique to an Internet-wide census, where we actively probe one alive target per routable /24 subnet: we find that the majority of hosts adopt a constant IP-ID (39%) or a local counter (34%), that the fraction of global counters (18%) has significantly diminished, that a non-marginal number of hosts exhibit an odd behavior (7%) and that random IP-IDs are still an exception (2%).
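To make the behavior classes named above concrete, here is a minimal rule-based sketch. It is not the classifier proposed in the paper. It assumes, purely for illustration, that probes alternate between two source addresses and that the replies arrive as `(source_label, ipid)` pairs; the function names and the `max_step` threshold are hypothetical.

```python
MODULUS = 1 << 16  # the IPv4 identification field is 16 bits wide


def _is_counter(ids, max_step):
    """True if every consecutive difference (mod 2^16) lies in [1, max_step]."""
    return all(1 <= (b - a) % MODULUS <= max_step for a, b in zip(ids, ids[1:]))


def classify_ipid(replies, max_step=1000):
    """Classify a target from replies given as (source_label, ipid) in arrival order.

    Assumption: probes alternate between two source addresses, so a global
    counter increases across the merged sequence, while a local (per-peer)
    counter increases only within each per-source subsequence.
    """
    ids = [ipid for _, ipid in replies]
    if len(ids) < 4:
        raise ValueError("need at least four replies")
    if len(set(ids)) == 1:
        return "constant"
    if _is_counter(ids, max_step):
        return "global counter"
    per_source = {}
    for src, ipid in replies:
        per_source.setdefault(src, []).append(ipid)
    if all(len(seq) >= 2 and _is_counter(seq, max_step) for seq in per_source.values()):
        return "local counter"
    # Telling random from odd behavior needs more evidence than this sketch uses.
    return "random or odd"


# Hypothetical reply sequences; the second one wraps around 65535 -> 0.
constant = classify_ipid([("a", 0), ("b", 0), ("a", 0), ("b", 0)])
global_ctr = classify_ipid([("a", 65534), ("b", 65535), ("a", 0), ("b", 2)])
local_ctr = classify_ipid([("a", 10), ("b", 5000), ("a", 11), ("b", 5001)])
other = classify_ipid([("a", 31337), ("b", 12), ("a", 40000), ("b", 7)])
```

Differences are taken modulo 2^16 so that a counter wrapping past 65535 is still recognized. The threshold tolerates a global counter that also advances for traffic sent to other hosts.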
Ciociola, A., Cocca, M., Giordano, D., Mellia, M., Morichetta, A., Putina, A., & Salutari, F. (2017, August). UMAP: Urban Mobility Analysis Platform to Harvest Car Sharing Data. IEEE Smart City Innovations (IEEE SCI’17).
@inproceedings{salutari2017umap,
author = {Ciociola, Alessandro and Cocca, Michele and Giordano, Danilo and Mellia, Marco and Morichetta, Andrea and Putina, Andrian and Salutari, Flavia},
title = {UMAP: Urban Mobility Analysis Platform to Harvest Car Sharing Data},
booktitle = {IEEE Smart City Innovations (IEEE SCI'17)},
month = aug,
year = {2017},
address = {San Francisco, California},
miodoi = {10.1109/UIC-ATC.2017.8397566}
}
Car sharing is nowadays a popular means of transport in smart cities. In particular, the free-floating paradigm lets customers look for available cars, book one, and then start and stop the rental at will, within a specific area. This is done thanks to a smartphone app, which contacts a web-based backend to exchange information. In this paper we present UMAP, a platform to harvest the data freely made available on the web by these backends and to extract driving habits in cities. We design UMAP with two specific purposes. Firstly, UMAP fetches data from car sharing platforms in real time. Secondly, it processes the data to extract advanced information about driving patterns and users’ habits. To extract information, UMAP augments the data available from the car sharing platforms with mapping and direction information fetched from other web platforms. This information is stored in a data lake where historical series are built, and later analyzed using analytics modules that are easy to design and customize. We prove the flexibility of UMAP by presenting a case study for the city of Turin. We collect car sharing usage data for over 50 days to characterize both the temporal and spatial properties of rentals, and to characterize customers’ habits in using the service, which we contrast with public transportation alternatives. Results provide insights about the driving style and needs, which are useful for smart city planners, and prove the feasibility of our approach.
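The abstract does not say how rentals are derived from the harvested data. One plausible approach, sketched here as an assumption rather than as UMAP's actual pipeline, is to poll the list of available cars and treat a car that disappears and later reappears as one rental. The function `infer_rentals`, the snapshot layout and the sample coordinates are all hypothetical.

```python
def infer_rentals(snapshots):
    """Infer rentals from periodic snapshots of available cars.

    snapshots: list of (timestamp, {car_id: (lat, lon)}) sorted by timestamp.
    Assumption: a car missing from one or more snapshots was rented in between.
    """
    rentals = []
    last_seen = {}  # car_id -> (timestamp, position) when last listed as available
    prev_snapshot_time = None
    for timestamp, available in snapshots:
        for car_id, position in available.items():
            if car_id in last_seen:
                seen_time, seen_pos = last_seen[car_id]
                # The car skipped at least one snapshot: record a rental.
                if seen_time != prev_snapshot_time:
                    rentals.append({
                        "car_id": car_id,
                        "start": seen_time,
                        "end": timestamp,
                        "from": seen_pos,
                        "to": position,
                    })
            last_seen[car_id] = (timestamp, position)
        prev_snapshot_time = timestamp
    return rentals


# Hypothetical snapshots taken every 60 seconds; car "c1" vanishes at t=60.
snapshots = [
    (0, {"c1": (45.07, 7.68), "c2": (45.06, 7.66)}),
    (60, {"c2": (45.06, 7.66)}),
    (120, {"c1": (45.05, 7.70), "c2": (45.06, 7.66)}),
]
rentals = infer_rentals(snapshots)
```

The inferred start and end times are only as precise as the polling interval, and a car withdrawn for maintenance would be indistinguishable from a rental in this sketch.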
Technical Reports
Salutari, F., Da Hora, D., Dubuc, G., & Rossi, D. (2020). Analyzing Wikipedia Users’ Perceived Quality Of Experience: A Large-Scale Study (Extended Technical Report). In Technical Report.
@techrep{techrepqoe2020,
author = {Salutari, Flavia and Da Hora, Diego and Dubuc, Gilles and Rossi, Dario},
title = {Analyzing Wikipedia Users’ Perceived Quality Of Experience: A Large-Scale Study (Extended Technical Report)},
booktitle = {Technical Report},
month = dec,
year = {2020}
}
The Web is one of the most successful Internet applications. Yet, the quality of Web users’ experience is still largely impenetrable. Whereas Web performance is typically measured with controlled experiments, in this work we perform a large-scale study of one of the most popular websites, namely Wikipedia, explicitly asking (a small fraction of its) users for feedback on the browsing experience. The analysis of the collected users’ feedback reveals both expected (e.g., the impact of browser and network connectivity) and surprising findings (e.g., absence of day/night, weekday/weekend seasonality and other temporal dependencies) that we detail in this paper. Also, we leverage user survey responses to build supervised data-driven models to predict user satisfaction which, despite including state-of-the-art quality of experience metrics, are still far from achieving accurate results. Finally, we make our dataset publicly available, which hopefully contributes to enriching and refining the scientific community’s knowledge of Web users’ Quality of Experience (QoE).
Salutari, F., & Rossi, D. (2019). A deeper look at IP-ID behavior in the Wild (Extended Technical Report). In Technical Report.
@techrep{techrepipid2018,
author = {Salutari, Flavia and Rossi, Dario},
title = {A deeper look at IP-ID behavior in the Wild (Extended Technical Report)},
booktitle = {Technical Report},
month = feb,
year = {2019}
}
Originally used to assist network-layer fragmentation and reassembly, the IP identification field (IP-ID) has been used and abused for a range of tasks, from counting hosts behind NAT, to detecting router aliases and, lately, to assisting the detection of censorship in the Internet at large. These inferences have been possible since, in the past, the IP-ID was mostly implemented as a simple packet counter; however, this behavior has been discouraged for security reasons, and other policies, such as the use of random values, have been suggested. In this study, we propose a framework to classify the different IP-ID behaviors using active probing from a single host. Despite being only minimally intrusive, our technique is highly accurate (99% true positive classification), robust against packet losses (up to 20%) and lightweight (a few packets suffice to discriminate all IP-ID behaviors). We then apply our technique to an Internet-wide census, where we actively probe one alive target per routable /24 subnet: we find that the majority of hosts adopt a constant IP-ID (39%) or a local counter (34%), that the fraction of global counters (18%) has significantly diminished, that a non-marginal number of hosts exhibit an odd behavior (7%) and that random IP-IDs are still an exception (2%). We believe that these findings, together with the datasets we release, can provide some support for works relying on a specific implementation of the IP-ID and, more generally, can be instrumental for researchers operating in the field of network measurements, by providing them with an updated picture of the Internet-wide adoption of the different known IP-ID implementations.