Hi everyone
I’m in a bit of quandary over the best way to create my tables to allow for an ‘optimised’ filter based search of tens-of-thousands of users.
I have a site that stores respondents information (for market research projects). For each user/respondent, there can be up to 50 individual pieces of info about them, in the form of answers to questions such as ‘email’, ‘gender’, ‘location’, ‘medical issues’ and so on. Information held can be in a variety of formats i.e. ‘number’, ‘string’, etc,(mainly in the form of dropdown selections but some with free-text input) and each question could have anything from 2 option choices (yes, no) to 200 option choices (think location dropdown for instance). The 50 questions are made up of 7 categories: (1) User [10 Questions], (2) Personal [17 Questions]l, (3) Work [7 Questions], (4) Tech & Comms [5 Questions], (5) Consumer Behaviour [5 Questions], (6) Utilities [3 Questions], (7) Communication Preferences [2 Questions].
What I’d like to know, is if there is an optimal database design/structure to house this information, particularly considering that we have to search through tens of thousands of users through filtered searches? For example, we might need to find males that have an income of X, live in a particular location, and have a certain mobile/cell provider.
Currently I have considered 3 possible ways of doing this but cannot determine the best/optimal choice.
-
To have just one ‘User’ table (with the additional schemas/columns added to it) containing all the 50 data points.
-
To have two related tables, one for ‘User’ (with basic signup info) and the rest of the info/data points held in another table (lets call it ‘Info’).
-
To have 7 related tables, that represent the categories mentioned above. So one for ‘User’, and then ‘Personal’, ‘Work’, ‘Tech & Comms’, ‘Consumer Behaviour’, ‘Utilities’, 'Communication Preferences’.
Ultimately, I’m trying to plan for the worse case scenario where we have to search/filter across all 50 datapoints for 100,000+ users, without the query/result taking forever to be returned (or worse, timing out).