Transfer of DATA between 2 Databases (LocalDB & GlobalDB)

Questions about MySQL, PostgreSQL, and most other databases, as well as using them with PHP, can be asked here.

Moderator: General Moderators

User avatar
gtrawoger
Forum Newbie
Posts: 14
Joined: Sun Dec 11, 2005 10:38 am

Transfer of DATA between 2 Databases (LocalDB & GlobalDB)

Post by gtrawoger »

Hi guys,

I have a little project that I would like some input on.

I want to build some mobile units (laptops) and have them store the data they collect (user entered) locally on their harddrive.

When they are done, I would like them to be able to upload the data they have collected as soon as they get back to the office (or find some other wireless hotspot) to the main database.

Since there are about 3 laptops on the road, I don't want them to overwrite each other's information when it gets uploaded (they will add, edit, and delete certain entries).

I have in mind that the laptops have both the LocalDB and the GlobalDB on the harddrive. When they need information about a certain "jobsite", they can pull it up from the GlobalDB and then edit it (saving it into the LocalDB). The LocalDB will simply store the "changes" that have been made.

When they are done at the end of the day, they upload the changes, and a script on the main site will convert the changes to actual records in the GlobalDB. After that, it downloads the new GlobalDB onto the laptop.

It seems a bit complicated, and when we hit around 400 "jobsites" it would have to download roughly 1.6 megs of data every time to get the new GlobalDB. And should it ever grow to more (we would like that), you can guess that the time to update would grow substantially.

Ok, I hope the above made sense. :roll: I am not the best at describing what I would like to do. This is a small project I have started on my own, and I would like to keep it simple.

Thanks already.
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Research "replication". You'll likely still need some artificial synchronization, but it shouldn't be too difficult.
User avatar
gtrawoger
Forum Newbie
Posts: 14
Joined: Sun Dec 11, 2005 10:38 am

Post by gtrawoger »

Ok, so I did some research on "replication" for MySQL. I did find some really useful information. Updating the GlobalDB after the information is sent up to the master server seems to be solved with that.

I guess now my question is: how can I build a good system to upload the data to the master server from the laptops without fear of them overwriting each other's information?

I am still thinking of having a database of "changes" to the GlobalDB, which will then be applied once a connection is made to the main server.

I was going to simply use a couple of identifiers for where the entry needs to be written. How about:

TABLE changes:

- id: long-int (unsigned) - [autoincremented id]
- table_id : text - [name of the table which contains the entry]
- entry_name : text - [name of the entry]
- new_value : text, integer, I dunno - [this is where I am having trouble!! What am I going to make this, since I have integers, long-ints, text and timestamps?]

I just don't know what to make new_value, unless I make a different entry for all these different types I am using and then let it pick the right one. Right?
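Spelled out as MySQL DDL, the table above might look something like this (column types are my guesses; serializing every value to TEXT plus a separate type column is one way around the mixed-type problem):

```sql
-- Hypothetical sketch of the proposed `changes` table.
-- new_value is stored as TEXT regardless of the original type;
-- value_type records what to convert it back to on the server.
CREATE TABLE changes (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    table_id   VARCHAR(64) NOT NULL,   -- table that contains the entry
    entry_name VARCHAR(64) NOT NULL,   -- name of the entry (column)
    value_type VARCHAR(16) NOT NULL,   -- 'int', 'text', 'timestamp', ...
    new_value  TEXT,                   -- the value, serialized as text
    PRIMARY KEY (id)
);
```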

Or is there a better approach??? There must be. It seems so Mickey-Mouse.... :?: :|
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

hmm .. although I'm not sure how well this would work, here's an idea:

have a single extra field in the tables that indicates if it is synched or not. Now this field should only be written to by the global server, then replicated back to the laptops. If the laptop updates a record, this field should be cleared, or invalidated in some fashion. (It could be a hash of the record itself.)

When they want to update the main database with their changed information, simply select all records where this field doesn't match the record (i.e. is blank). The server should take this data and perform checks on it. It can make decisions on whether to insert or update data. You could leave the "synch" field empty on insertion to let this script know someone has already inserted or updated that record. If there's a conflict, display the stored record and incoming record for someone to make decisions about. Allow them to mix the two records if need be.

Another thing to consider is deletions. You could do a comparison between records that have been removed from the laptop.. however, I personally think ghost-deletion works best. This simply marks the record to not show for anyone but an administrator.
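A rough sketch of the two extra fields in MySQL (the `jobsites` table name and column names are made up for illustration):

```sql
-- Hypothetical columns: synch_hash is written only by the global
-- server; deleted implements the ghost-deletion flag.
ALTER TABLE jobsites
    ADD COLUMN synch_hash CHAR(32) NULL,   -- e.g. MD5 of the record, set by the server
    ADD COLUMN deleted TINYINT(1) NOT NULL DEFAULT 0;

-- On the laptop, an edit invalidates the hash:
UPDATE jobsites SET synch_hash = NULL WHERE site_id = 42;

-- At synch time, only invalidated records get sent up:
SELECT * FROM jobsites WHERE synch_hash IS NULL;

-- "Deleting" just hides the record from everyone but an admin:
UPDATE jobsites SET deleted = 1 WHERE site_id = 42;
```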
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Post by josh »

You probably want to allow the users to check out the records. XML or CSV is a good format for sending data over HTTP. What I would do personally is add a field to the main table that says who has it checked out (if anyone); when they upload their data back at the end of the day, overwrite the data in the main DB with their data only if it is checked out to them. You'd then mark those entries as "checked back in"; this way no two people can edit the same information at the same time. The way I set up systems like this is usually to have them work on the data through the web interface itself, so there is no transfer of data (they still check data out under their account, but they are just doing it through a web interface; all the workers need is a web browser). You also might want to give them some kind of preview when they are checking data back in.

e.g. "You are checking in 5,000 records; you have 6,000 checked out, so 1,000 will be deleted; 500 of your records have changed and will be updated; 200 records are new and will be inserted." That kind of thing, and have them confirm it before any modifications to the main DB occur.
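As a sketch of the check-out field (table and column names invented for illustration):

```sql
-- Hypothetical checked_out_by column on the main table.
ALTER TABLE jobsites ADD COLUMN checked_out_by VARCHAR(32) NULL;

-- Checking records out to a user (only if nobody else holds them):
UPDATE jobsites
SET    checked_out_by = 'laptop_1'
WHERE  site_id IN (101, 102, 103) AND checked_out_by IS NULL;

-- On upload, overwrite only rows this user holds, then check them back in:
UPDATE jobsites
SET    notes = 'updated notes from the field', checked_out_by = NULL
WHERE  site_id = 101 AND checked_out_by = 'laptop_1';
```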

I would agree with feyd in doing the ghost deletion (just have a `deleted` field and flag things for deletion); in our application that is what we do, and all those records still sit around in case we ever need them again.
User avatar
gtrawoger
Forum Newbie
Posts: 14
Joined: Sun Dec 11, 2005 10:38 am

Post by gtrawoger »

feyd wrote:hmm .. although I'm not sure how well this would work, here's an idea:

have a single extra field in the tables that indicates if it is synched or not. Now this field should only be written to by the global server, then replicated back to the laptops. If the laptop updates a record, this field should be cleared, or invalidated in some fashion. (It could be a hash of the record itself.)
I was thinking something similar. Maybe an extra field per table (or even per field, to be exact about what got updated) that indicates that it was changed or added. It could even contain a code for deletion.

This seemed the best alternative to me so far. Just adds a heck of a lot more fields to check. But it should theoretically work.

You could leave the "synch" field empty on insertion to let this script know someone has already inserted or updated that record. If there's a conflict, display the stored record and incoming record for someone to make decisions about. Allow them to mix the two records if need be.
I think I understand what you are saying here.... this is in case two of the guys on the road enter in information or create a new record about the same job at the same time (before the two can update their main database).

The mixing of the record would be tricky, no?
Another thing to consider is deletions. You could do a comparison between records that have been removed from the laptop.. however, I personally think ghost-deletion works best. This simply marks the record to not show for anyone but an administrator.
Agreed! Too much power in the users' hands, should they delete someone else's entry. This way we can undelete. Just no way to "unedit". Unless I go back to the "changes" records. Ugh...
User avatar
gtrawoger
Forum Newbie
Posts: 14
Joined: Sun Dec 11, 2005 10:38 am

Post by gtrawoger »

jshpro2 wrote:You probably want to allow the users to check out the records. XML or CSV is a good format for sending data over HTTP. What I would do personally is add a field to the main table that says who has it checked out (if anyone); when they upload their data back at the end of the day, overwrite the data in the main DB with their data only if it is checked out to them. You'd then mark those entries as "checked back in"; this way no two people can edit the same information at the same time. The way I set up systems like this is usually to have them work on the data through the web interface itself, so there is no transfer of data (they still check data out under their account, but they are just doing it through a web interface; all the workers need is a web browser). You also might want to give them some kind of preview when they are checking data back in.

e.g. "You are checking in 5,000 records; you have 6,000 checked out, so 1,000 will be deleted; 500 of your records have changed and will be updated; 200 records are new and will be inserted." That kind of thing, and have them confirm it before any modifications to the main DB occur.

I would agree with feyd in doing the ghost deletion (just have a `deleted` field and flag things for deletion); in our application that is what we do, and all those records still sit around in case we ever need them again.
You are right, I am doing this with PHP and MySQL on Windows 2000 running Apache. All browser based. I have done something similar, just never anything where the units need to run offline for a while and then reconnect and update the main database.

I think that is the tricky part: the updating at the end of the day!

And I think this is how your system was built: the computers are always online (or if they are offline, they are unable to alter anything).

I am not so worried that two users are working on the same jobsite *at the same time*. Rather that one goes to the jobsite, and then another goes there 4 hours later. Then both head back to the office and upload their data, and the first one's data gets overwritten by the one who sent it in later (because the first one's data is not *with, or on* the laptop of the second person when he edited it). (BOY, did that make sense???)

There will probably always be a slim chance that one overwrites data of the other, but I would like to minimize that.
User avatar
gtrawoger
Forum Newbie
Posts: 14
Joined: Sun Dec 11, 2005 10:38 am

One more idea....

Post by gtrawoger »

Ok Guys,

I know you are busy, but here is one more idea I had in the shower yesterday (see what kind of life I got.....)

What if I continued with the idea of having TWO databases (LocalDB and GlobalDB) on the laptops.

1.) When the laptops start out in the morning, the LocalDB and GlobalDB are IDENTICAL.

2.) As the day goes on, and data is entered, the LocalDB is getting changed (while the GlobalDB isn't touched; we could also call it "OriginalDB").

3.) At the end of the day, when the user wants to upload the data, the PHP/MySQL script compares the two databases on the laptop (the laptops will be their own webserver, with Apache, PHP and MySQL, so the laptop will handle the comparison script).

4.) Now, as the laptop server will know the changes that were made to the database, it will apply them to the GlobalDB hosted on the mainserver.

5.) Once the changes are applied to the GlobalDB on the mainserver, the GlobalDB on the laptop gets updated from the mainserver using "replication". Also, the LocalDB gets "reset" by making an exact copy of the GlobalDB data into the LocalDB.

- bada-boom... we are back to where we started in the morning. Off the unit goes the next day.....

-----------------------

For Step 3:
My question is then: is there a call in MySQL for comparing two databases and finding the differences (like finding new ROWS and changed COLS)?
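As far as I know there is no single built-in "diff" command, but since both databases live in the same MySQL server on the laptop, qualified `database.table` names let one query compare them (database, table, and column names here are placeholders):

```sql
-- New rows: in LocalDB but not in GlobalDB, matched on the primary key.
SELECT l.*
FROM   LocalDB.jobsites AS l
LEFT JOIN GlobalDB.jobsites AS g ON g.site_id = l.site_id
WHERE  g.site_id IS NULL;

-- Changed rows: same key, but some column differs.
-- <=> is MySQL's NULL-safe equality, so NULLs compare sanely.
SELECT l.*
FROM   LocalDB.jobsites AS l
JOIN   GlobalDB.jobsites AS g ON g.site_id = l.site_id
WHERE  NOT (l.notes <=> g.notes)
   OR  NOT (l.status <=> g.status);
```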

For Step4:
I guess I would have to have some kind of error routine to check if certain ROWS with the *job site* identifier already exist. If it does exist, then just apply the information as changes (or updates), correct?
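For example, MySQL's `INSERT ... ON DUPLICATE KEY UPDATE` could fold that error routine into one statement, assuming the jobsite identifier is a unique or primary key (table and column names are placeholders):

```sql
-- Inserts the row if site_id 101 is new; otherwise updates it in place.
INSERT INTO jobsites (site_id, notes, last_visit)
VALUES (101, 'Replaced the pump', '2006-01-15')
ON DUPLICATE KEY UPDATE
    notes      = VALUES(notes),
    last_visit = VALUES(last_visit);
```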

I guess, what I am asking for is some approval for this way of doing it, since I am not that experienced with PHP/MySQL and the whole database programming.

Thanks already......
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

The only problem I have with the idea is that the laptops are then authoritative over the server. This is generally not a relationship you would want. So, personally, I'd go the route of the server doing the comparison.
User avatar
gtrawoger
Forum Newbie
Posts: 14
Joined: Sun Dec 11, 2005 10:38 am

Post by gtrawoger »

feyd wrote:The only problem I have with the idea is that the laptops are then authoritative over the server. This is generally not a relationship you would want. So, personally, I'd go the route of the server doing the comparison.
Just out of curiosity, why would you not want them to be authoritative? Security reason? Or because they could mess up everything on the main database?

I thought of this idea because it would minimize the bandwidth need for the laptops. They would do everything internally and then just upload the changes.

I could do it so that it uploads the changes through an HTML POST, no? Then the mainserver would handle the actual writing of the data.

If I was to go the route of the mainserver handling all the comparison, wouldn't I run into large upload times when the database gets bigger? Or would MySQL have some super-smart command that would be able to compare over a slow connection?
User avatar
gtrawoger
Forum Newbie
Posts: 14
Joined: Sun Dec 11, 2005 10:38 am

Post by gtrawoger »

I just wanted to mention, that I really appreciate you guys' time. This would be a milestone project for me if I could get this done!
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

it's both a security reason and messing up the main database.

Keeping a copy of the global database and changing a duplicate on the laptop will require the laptop to be more powerful and have more RAM. It'll also need large drives as the database expands. Having the laptop simply mark which records it's modified and sending those minimizes the bandwidth required, but the server should always make the decision of whether to incorporate that data or not. Basically, a client (in a server-client relationship) should never be allowed to directly affect the server's data.

As mentioned before I wouldn't allow deleting of records on the server, opting instead to just mark them as deleted. That way you can more easily recover the data if needed, but also this makes sure someone can't permanently damage the server's data. (outside of the database administrator who can delete records at will, but some level of trust needs to be in place.) Having a good server backup plan in place is paramount too.
User avatar
gtrawoger
Forum Newbie
Posts: 14
Joined: Sun Dec 11, 2005 10:38 am

Post by gtrawoger »

feyd wrote:it's both a security reason and messing up the main database.
Agreed.
Keeping a copy of the global database and changing a duplicate on the laptop will require the laptop to be more powerful and have more RAM. It'll also need large drives as the database expands. Having the laptop simply mark which records it's modified and sending those minimizes the bandwidth required, but the server should always make the decision of whether to incorporate that data or not. Basically, a client (in a server-client relationship) should never be allowed to directly affect the server's data.
So, what do you think of my *POST* idea? As well, the laptops have a 6GB HDD (nothing fancy), but I doubt that the DB will ever go over 10MB. But, 10MB over a 56K connection is going to take forever. Especially when you are sitting in a Van waiting for it to get done.
As mentioned before I wouldn't allow deleting of records on the server, opting instead to just mark them as deleted. That way you can more easily recover the data if needed, but also this makes sure someone can't permanently damage the server's data. (outside of the database administrator who can delete records at will, but some level of trust needs to be in place.) Having a good server backup plan in place is paramount too.
I totally agree with the deletion notion. I was going to incorporate that for sure. As well, I had planned on doing weekly, if not semi-weekly Backups for the Database.

Just FYI, the information in the database is supplemental for the guys on the road. It was meant to speed things up and add a degree of professionalism to our work (i.e.: the man on the road would know what another guy did at the job 6 months ago at the click of a button).

But the information is not *essential*. It just would be agonizing to have entered in all the information over months, just to have it screwed up (I'd probably get fired too... haha).

I am just wondering then, is there a MySQL command to compare two databases?? Or do I need to write my own script?


User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

I have a post, linked from the Useful Posts thread (which can be found via the Forum Tour link in my signature), which shows how to see what records are in one table but not in another.

I was figuring the synch would happen when they get back to the office and have a high-speed connection to the server. Since the laptops have quite small hard drives, I'd definitely suggest the single database with a signature field that the laptops don't/can't write to. Only one field is needed to indicate the record has changed. It could be used as a bitfield of sorts to indicate which fields were changed, but that's not as important as, say, marking it deleted. Granted, the delete flag should probably be separate for ease of use in the database.

If you go that route, a simple selection is all that's required to know which records to send to the server. I'd use encryption and/or compression to add a layer of security if need be.
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

The POST idea is relatively okay, but the laptop should do it for the user anyway. Yours sounded like they'd have to upload a file or something. I'd try to keep it as automated as possible: they just hit a synch button on their local web server's page.