Page 1 of 1

User profiling using the database. What's the best way?

Posted: Sat Aug 20, 2005 8:01 pm
by robster
I have a database setup, which as well as having forum tables and other tables, also contains a table for users.

The table could look similar to this for examples sake and is called forum_user_info:

Code: Select all

id - name_first - name_last - country - gender - birthdate - email_settings - username - email
What I want to do now, is have the ability to ask users questions. The questions could be things like this:

Code: Select all

When being presented with a chance to overtake, and the lanes are double lined, do you overtake or do you obey the road rules?
These questions will appear around the site, and will contribute to the users profile in time, and will provide answers to questions about them. Profiling as such, but fun profiling.

What I want to know is, how would I best layout the database to handle this considering these rules:?

1 - The questions must never be asked twice to the same user
2 - The answers must represent a character trait of the user, so the answers must be interprited by the system in a way to emphasise this
3 - They must be able to be searched very quickly to compare one user against another


What I mean by questions 2 and 3 are basically described like this:

I want to be able to look at a users responses to the questions and write a script that says something like 'user X answered question Y with a negative answer, so they are 'character_trait''.

The user can then say, show me all users that have a character trait of 'angry' (as an example)

I hope somebody here can help me out with a design for the database, and I want to imagine the site is very successful and has say, up to 50,000 members.

Some more random thoughts I've been having on the topic:

Do I create a seperate table for each question?
If I do create a seperate table for each question, how do I let the system know the user has taken that question?
Do I have a script that checks through the whole table for responses?
Considering there could be 10's of thousands of responses, this could get slow with thousands of users searching at once for example.



Any advice appreciated dearly! :)

Rob

Posted: Sat Aug 20, 2005 8:41 pm
by feyd
I see a table for all questions, a table for answers, and a table for votes.

since it sounds like you're only allowing binary choices, you can store a reference to the responses allowed in the question table. As for the votes table, it'd store a user id, question id, and answer chosen

Posted: Sat Aug 20, 2005 9:02 pm
by robster
That's pretty interesting! Damn, that makes a lot of sense.

Do you imagine the tables to look something like this?


Questions

Code: Select all

id - question - answer_id

Answers

Code: Select all

id - answer - question_id

Votes

Code: Select all

id - question_id - answer_chosen - user_id

I don't fully understand why you have answer_id in the Questions table. Could you perhaps explain that a little as I would have thought you'd leave that out and do something like this:


Creating the question for display from the database (really rough code, no real loop or way to stop loop, you get the idea though):

Code: Select all

if (user has has not answered this question)
{
  display question
}
else
{
  display next question
}


When drawing the question, find the answers in the Answers table

Code: Select all

result = "retrieve all answers from answers table where question_id = $question_id"

begin loop through result
{
  display answer
}


and finally, when checking out what the user has answered, so we can profile them, something like this:

Code: Select all

result = "retrieve all answer_chosen from votes table where user_id = $user_id and where question_id = $question_id"

begin loop through the result
{
  //do math here, ie:  if user answers yes to question, then they are likely to be psychopaths 
}

What do you think? :)

Also, remember, I don't fully understand why you have answer_id in the Questions table. Thanks again so much.

Rob

Posted: Sat Aug 20, 2005 9:25 pm
by feyd
the answer id's could be kept in the question table if you are only going to ever offer two choices. However, if you ever think the choices may go beyond that (or you want to remain normalized) having a question reference in the answer table is preferred. An additional thing to think about: it seems like there is a limited set of actual answers, so you could use a many-to-many relationship to limit duplication of data.

Posted: Sat Aug 20, 2005 9:35 pm
by robster
the answer id's could be kept in the question table if you are only going to ever offer two choices.
I plan on questions like this also:

Code: Select all

Q: if you found out your partner had borrowed your car without asking, would you:
A1: Go mental
A2: Ban them but keep it cool
A3: Not say anything but be unhappy
A4:  I wouldn't do anything, why would I care
You get the idea I think.



However, if you ever think the choices may go beyond that (or you want to remain normalized) having a question reference in the answer table is preferred.
I do want to go beyond that yes, as the previous Q/A shows.
I am not sure what remaining normalised is?
If I have the question reference the answer, how can I do that if there are more than one answer? I am guessing you can only do this:

Code: Select all

QUESTION TABLE
id "1"
question -  "if you found out your partner had borrowed your car without asking, would you"
answer_id - "345"
If we assume '345' is the answer_id, how could you do it with a choice of say, 5 answers?




An additional thing to think about: it seems like there is a limited set of actual answers, so you could use a many-to-many relationship to limit duplication of data.
I don't know what a many-to-many relationship is but will research it on the net today. I am presuming you mean, in this situation, some kind of way of keeping the database size down due to lots of 'yes' and 'no' type answers? I'll have a look anyway, sounds interesting.


Thanks again, this is starting to come together for me, just these few things that I want to clarify before I design the database (I've learned from experience what bad design can do for a site/app, hence my questions ;)).

Rob

Posted: Sat Aug 20, 2005 10:03 pm
by feyd
if you're going to have multiple, varied answers, then a many-to-many isn't a great avenue.

Although this isn't the avenue to go down, I'll explain something real quick about the storing answers in the question table: you can have multiple fields that reference answers. But as I said, I recommend not going down that avenue now that I know you want to have multiple, varied answers.

So, to summarize: you want a table with questions; a table with answers, where each answer points to the question it works with; and a votes table, where each record points to an answer id (through which, points to a question)

Posted: Sun Aug 21, 2005 4:29 am
by robster
beautiful :)

I'll get straight into some testing on that. I really like the idea of it.

I guess the one thing still remains though. What's an efficient way to know if a user has already answered that question?
I guess I could have another table called 'completed' which has the id of each question. That would be easy enough I guess.

ie:

Code: Select all

if (question_id != completed_question_id)
{
  ask question
}
else
{
  loop thru to next question
}
etc ;)

You think that's the way to go?

Just wait until I get to my profiling! That's seriously fun, I've had some good conversations with a mathemetician friend on this already!

:)

Thanks again so much, I really appreciate all the help I can get. Although I've been poking around with php/mysql for a while I've not really done anything serious with it and it's really helping me grow codewise/designwise.

Thanks again :)

Rob

Posted: Sun Aug 21, 2005 4:54 am
by feyd
read "Find all records that aren't matching: SQL SELECT JOIN HELP"

found in the Useful Posts thread.

You can combine that technique with an "ORDER BY RAND() LIMIT 1" to get the next question to use. :)