Optimal Database Design/Best Practice - Advice? :)

Stephen_Essex · October 12, 2022, 1:15pm

Hi everyone

I’m in a bit of quandary over the best way to create my tables to allow for an ‘optimised’ filter based search of tens-of-thousands of users.

I have a site that stores respondents information (for market research projects). For each user/respondent, there can be up to 50 individual pieces of info about them, in the form of answers to questions such as ‘email’, ‘gender’, ‘location’, ‘medical issues’ and so on. Information held can be in a variety of formats i.e. ‘number’, ‘string’, etc,(mainly in the form of dropdown selections but some with free-text input) and each question could have anything from 2 option choices (yes, no) to 200 option choices (think location dropdown for instance). The 50 questions are made up of 7 categories: (1) User [10 Questions], (2) Personal [17 Questions]l, (3) Work [7 Questions], (4) Tech & Comms [5 Questions], (5) Consumer Behaviour [5 Questions], (6) Utilities [3 Questions], (7) Communication Preferences [2 Questions].

What I’d like to know, is if there is an optimal database design/structure to house this information, particularly considering that we have to search through tens of thousands of users through filtered searches? For example, we might need to find males that have an income of X, live in a particular location, and have a certain mobile/cell provider.

Currently I have considered 3 possible ways of doing this but cannot determine the best/optimal choice.

To have just one ‘User’ table (with the additional schemas/columns added to it) containing all the 50 data points.
To have two related tables, one for ‘User’ (with basic signup info) and the rest of the info/data points held in another table (lets call it ‘Info’).
To have 7 related tables, that represent the categories mentioned above. So one for ‘User’, and then ‘Personal’, ‘Work’, ‘Tech & Comms’, ‘Consumer Behaviour’, ‘Utilities’, 'Communication Preferences’.

Ultimately, I’m trying to plan for the worse case scenario where we have to search/filter across all 50 datapoints for 100,000+ users, without the query/result taking forever to be returned (or worse, timing out).

viktor.liablin · October 12, 2022, 3:22pm

Hello @Stephen_Essex

Each option has its strengths and weaknesses
The first option will be the fastest. Especially if you add indexes.
The third option is the most flexible. Easy to change the structure. No duplicates. More flexible requests.

Regards

Jorg_Beyer · October 12, 2022, 3:24pm

What about not using a relational database, but a graph database?

Stephen_Essex · October 13, 2022, 9:55am

Thanks Viktor

Stephen_Essex · October 13, 2022, 9:57am

Hi Jorg. I’m quite new to databases and backendless so I’m not even sure what a graph database is or how to use them.

Jorg_Beyer · October 13, 2022, 4:26pm

Hi Stephen,

then I would recommend go with one of the options Viktor recommended.