Async Query on Keystroke (a.k.a "autocomplete")
Posted: Fri Jan 09, 2009 10:36 am
You know on Twitter, when you create an account, it does an async* SQL query with every keystroke to see if that account doesn't already exist? Well, my client wants that. I'm postponing that for now so that I can knock out other features, but I wanted to at least get the discussion going with you guys to see if you are familiar with it.
* Async, as opposed to sync. With async, multiple transactions can run at the same time, in parallel. With sync, each transaction is blocked until the previous one is completed, serially, in order.
To me, it means that with every keystroke in the username field on Twitter, they have to make an async AJAX SQL query (likely using jQuery to help) back to the database. When someone starts typing again in the username field, the response handler would have multiple responses coming back. They basically have to reject multiple responses and take the latest response. They probably use some interval timer so that they don't send the query until 0.5 seconds when someone stops typing. I also see that this is modeless -- while the query is running in the background, they let one keep on typing.
And it's probably not good to check on the most recent request because one of those async calls may come back out of order if the server receives an unusual performance spike. So, you basically have to tag an async AJAX request with a key field, and then keep parsing requests until you find the one with the latest key field that comes back from the server.
I would think that a control like this on a web page would create a significant impact on database performance. So, to help offload that, the interval timer in the Javascript helps, but it's not the only help one needs. Next, you would probably need an exclusive table in the database just for this, and it contains no other columns but this column. It then needs a properly tuned index against it. And then there might be a need to do some sort of either pure RAM cache on this, or tmpfs (virtual memory) cache. And having the query run on one of these new SSD drives (solid state drives) might improve performance as well. And I would definitely say -- this is not something you want to implement on a shared hosting plan. It takes a dedicated hosting plan for such a thing.
Suggestions?
* Async, as opposed to sync. With async, multiple transactions can run at the same time, in parallel. With sync, each transaction is blocked until the previous one is completed, serially, in order.
To me, it means that with every keystroke in the username field on Twitter, they have to make an async AJAX SQL query (likely using jQuery to help) back to the database. When someone starts typing again in the username field, the response handler would have multiple responses coming back. They basically have to reject multiple responses and take the latest response. They probably use some interval timer so that they don't send the query until 0.5 seconds when someone stops typing. I also see that this is modeless -- while the query is running in the background, they let one keep on typing.
And it's probably not good to check on the most recent request because one of those async calls may come back out of order if the server receives an unusual performance spike. So, you basically have to tag an async AJAX request with a key field, and then keep parsing requests until you find the one with the latest key field that comes back from the server.
I would think that a control like this on a web page would create a significant impact on database performance. So, to help offload that, the interval timer in the Javascript helps, but it's not the only help one needs. Next, you would probably need an exclusive table in the database just for this, and it contains no other columns but this column. It then needs a properly tuned index against it. And then there might be a need to do some sort of either pure RAM cache on this, or tmpfs (virtual memory) cache. And having the query run on one of these new SSD drives (solid state drives) might improve performance as well. And I would definitely say -- this is not something you want to implement on a shared hosting plan. It takes a dedicated hosting plan for such a thing.
Suggestions?