Initial Setup

Dataset Characterization

Number of users

Total number of requests

Distribution of answer type

Answers per user

Privacy Profiles

Privacy Profiles using the category-permission pair

This methodology follows the approach described in Liu, B., Andersen, M. S., Schaub, F., Almuhimedi, H., Zhang, S. A., Sadeh, N., ... & Acquisti, A. (2016). Follow my recommendations: A personalized privacy assistant for mobile app permissions. In Twelfth Symposium on Usable Privacy and Security (SOUPS 2016) (pp. 27-41).

Load the data

Form the tensor data

Each user becomes a row, and each column is a category-permission pair whose value is the user's average grant result (a value between -1 and 0) for that pair.
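A minimal sketch of this reshaping with pandas, using toy data. The column names (`user`, `category`, `permission`, `grant_result`) and the encoding of deny as -1 and allow as 0 are assumptions chosen to match the range stated above; they are not the dataset's actual schema.

```python
import pandas as pd

# Hypothetical toy requests; names and encoding (-1 = deny, 0 = allow) are assumptions.
requests = pd.DataFrame({
    "user": [1, 1, 1, 2, 2, 3],
    "category": ["GAMES", "GAMES", "SOCIAL", "GAMES", "SOCIAL", "SOCIAL"],
    "permission": ["LOCATION", "LOCATION", "CAMERA", "LOCATION", "CAMERA", "CAMERA"],
    "grant_result": [0, -1, 0, -1, -1, 0],
})

# One row per user, one column per (category, permission) pair,
# each cell holding the user's mean grant result for that pair.
user_matrix = requests.pivot_table(
    index="user",
    columns=["category", "permission"],
    values="grant_result",
    aggfunc="mean",
)
print(user_matrix)
```

Pairs a user never answered show up as NaN in `user_matrix`, which is why imputation is needed before clustering.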

Missing values correspond to pairs for which the user has not answered any permission request. These values must be imputed before the profiles can be formed with hierarchical clustering. We use scikit-learn's IterativeImputer.
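The imputation step could look roughly like the sketch below. The toy matrix and the choice to clip imputed values to the range [-1, 0] are assumptions for illustration; the settings used in the original analysis are not stated here.

```python
import numpy as np
# IterativeImputer is still experimental in scikit-learn and needs this explicit opt-in.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy user-by-pair matrix; NaN marks pairs the user never answered.
X = np.array([
    [-0.5, 0.0, -1.0],
    [-1.0, np.nan, -1.0],
    [0.0, 0.0, np.nan],
    [-0.8, -0.2, -0.9],
])

# Assumption: keep imputed values inside the range of the grant results.
imputer = IterativeImputer(min_value=-1, max_value=0, random_state=0)
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```

Observed entries are left unchanged; only the NaN cells are filled with estimates derived from the other columns.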

Dendrogram

From the dendrogram, we can choose where to "cut" the tree to obtain the privacy profiles.
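As a sketch, the tree can be computed with SciPy's hierarchical clustering routines. The Ward linkage and the synthetic data below are assumptions; the linkage method used for the actual profiles is not stated in this text.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(0)
# Toy stand-in for the imputed user matrix: three loose groups of five users each.
X = np.vstack([
    rng.uniform(-0.2, 0.0, size=(5, 4)),   # mostly allow
    rng.uniform(-1.0, -0.8, size=(5, 4)),  # mostly deny
    rng.uniform(-0.6, -0.4, size=(5, 4)),  # mixed
])

# Agglomerative clustering; each row of Z records one merge and its height.
Z = linkage(X, method="ward")

# no_plot=True returns the tree layout without drawing it;
# drop the argument to render the dendrogram with matplotlib.
tree = dendrogram(Z, no_plot=True)
print(tree["leaves"])
```

Large jumps in the merge heights (the third column of `Z`) indicate natural places to cut the tree.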

For an illustrative example, we will select 3 profiles and plot them.
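Cutting the tree into a fixed number of profiles can be sketched with SciPy's `fcluster`. The data and the Ward linkage are again assumptions for illustration; subtracting 1 only makes the labels start at 0, as the profile numbers do in the discussion.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
# Toy stand-in for the imputed user matrix: three loose groups of five users each.
X = np.vstack([
    rng.uniform(-0.2, 0.0, size=(5, 4)),
    rng.uniform(-1.0, -0.8, size=(5, 4)),
    rng.uniform(-0.6, -0.4, size=(5, 4)),
])
Z = linkage(X, method="ward")

# Cut the tree so that at most 3 clusters remain; shift labels to start at 0.
labels = fcluster(Z, t=3, criterion="maxclust") - 1

# Profile sizes and the mean grant result per column within each profile.
sizes = np.bincount(labels)
profile_means = np.array([X[labels == k].mean(axis=0) for k in range(len(sizes))])
print(sizes)
print(profile_means.round(2))
```

The per-profile means are what a plot of each profile would summarize: values near 0 indicate mostly allowed requests and values near -1 mostly denied ones, under the encoding assumed here.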

From the plots above, we can see that participants in profile 1 (17 participants) allow most requests; such users are typically referred to as "The Unconcerned". In contrast, participants in profile 2 (26 participants) mostly deny permission requests. These are the "Privacy Conscious". However, most participants fall into profile 0, where responses are more diverse. This profile could potentially be divided further by increasing the number of clusters. Note that increasing the number of clusters reduces interpretability, but may separate behaviors better.

n-Dimensional Privacy Profiles

The previous profiles were built using only the category-permission pair. However, we can also form profiles from other or additional features, such as contextual features.

Below we follow the same methodology to build profiles from category-permission-expectancy tuples. Note that increasing the number of features considered also increases the amount of missing data.
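The effect on missing data can be illustrated with a small pandas sketch. The column names, including the boolean `expected` column standing in for expectancy, are hypothetical, as is the -1/0 encoding of the grant result.

```python
import pandas as pd

# Hypothetical toy requests; `expected` marks whether the user expected the request.
requests = pd.DataFrame({
    "user": [1, 1, 1, 2, 2, 3],
    "category": ["GAMES", "GAMES", "SOCIAL", "GAMES", "SOCIAL", "SOCIAL"],
    "permission": ["LOCATION", "LOCATION", "CAMERA", "LOCATION", "CAMERA", "CAMERA"],
    "expected": [True, False, True, True, True, False],
    "grant_result": [0, -1, 0, -1, -1, 0],
})

def user_matrix(features):
    """Mean grant result per user for each combination of the given features."""
    return requests.pivot_table(
        index="user", columns=features, values="grant_result", aggfunc="mean"
    )

pairs = user_matrix(["category", "permission"])
triples = user_matrix(["category", "permission", "expected"])

# Fraction of empty cells in each matrix.
missing_pairs = pairs.isna().to_numpy().mean()
missing_triples = triples.isna().to_numpy().mean()
print(f"pairs: {missing_pairs:.2f} missing, triples: {missing_triples:.2f} missing")
```

Each added feature multiplies the number of columns, while the number of answers per user stays the same, so more cells end up empty and must be imputed.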

Dendrogram

Notice the difference between this dendrogram and the previous one. In this case, 2 or 3 profiles should result in the best clustering. For illustration, we again select 3.

3 category-permission-expectancy Profiles

Looking at the plots above, profile 0, which contains 76 participants (81.7%), corresponds to participants who deny almost all unexpected requests while allowing almost all expected ones. The privacy behavior of these participants is strongly driven by their expectations. In contrast, profile 1 (16 participants) corresponds to participants who allow most requests regardless of expectancy. Profile 2 consists of a single user who denies most requests.

With the profiles formed, you can then use the resulting labels (data["hc_label"]) as a feature when training a classifier to predict the grant result.
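One way this could be set up is sketched below with scikit-learn. Only the `hc_label` column name comes from the text above; the other columns, the toy labeling rule, and the choice of a random forest are assumptions for illustration and do not represent the authors' results or method.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical toy data: six users with an assigned profile, four requests each.
rows = []
for user, hc_label in enumerate([0, 1, 2, 0, 1, 2]):
    for category in ["GAMES", "SOCIAL"]:
        for permission in ["LOCATION", "CAMERA"]:
            # Toy rule (assumption): profile 1 allows (0), profile 2 denies (-1),
            # profile 0 allows only CAMERA requests.
            if hc_label == 1:
                grant = 0
            elif hc_label == 2:
                grant = -1
            else:
                grant = 0 if permission == "CAMERA" else -1
            rows.append((user, category, permission, hc_label, grant))
data = pd.DataFrame(
    rows, columns=["user", "category", "permission", "hc_label", "grant_result"]
)

# One-hot encode the categorical features, treating the profile label as categorical too.
X = pd.get_dummies(data[["category", "permission", "hc_label"]].astype(str))
y = data["grant_result"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

On real data, splitting by user rather than by request (for example with scikit-learn's `GroupKFold`) would avoid evaluating on users whose answers were also used to form the profiles.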

We are currently working on a paper on these results and will release the corresponding code and results once it is published. In the meantime, you can already do this yourself if you have access to the dataset.