Statistical lazy loading idea

Kieran Huggins
DevNet Master
Posts: 3635
Joined: Wed Dec 06, 2006 4:14 pm
Location: Toronto, Canada

Post by Kieran Huggins »

Something I'd like to see in an ORM in the future is statistically influenced loading... imagine you could define "watch points" in your model and have the ORM track use cases to tune performance on...

Hmmm... maybe an example (in pseudo code):

model:
class User extends SmartORM
  # properties
  property id, Integer, auto-inc
  property username, String
  property name, String
  property description, Text
  property signature, Text
 
  # associations  
  belongs_to groups
  has_many posts, order_by(date, desc)
 
  # watch points
  watch association groups
  watch property id
  watch count(posts)
  watch Request.controller
end
OK... so a user belongs to zero or more groups and has many posts (blog/forum?) that are ordered by date, descending, when accessed through the model.

Now with traditional ORMs, the ORM will make decisions about which fields to retrieve based on field type (or not at all!)... and most will allow us to override lazy/eager loading at the model layer, and sometimes when loading the object (explicitly, in the controller).

Sooo.... how about an intelligent, predictive decision based on past experience? The ORM could track certain conditions and work out the cost/benefit of grabbing certain columns / associations based on past behaviour. For instance, maybe for a user the "name" property is accessed almost all the time, whereas the "username" property is only accessed 5% of the time (during login). Or better yet - the "description" is accessed only when a user's profile is shown, but the signature is accessed every time a post is viewed. Since these are both Text columns, they incur a relatively large cost to the DB that is sometimes, but not often, necessary.

The ORM could track key usage stats in memory (or a HEAP table, depending on the platform/arch). And we could manually add specific data points to watch and analyze whenever a decision needs to be made. Maybe dump the totals to a long-term store every once in a while for safe-keeping, but it's not the kind of data where we need to keep every last piece. Losing minutes, hours or even days of data would be increasingly inconsequential as time went on. Stats are cool like that.

In the example above, I added four watch points to the default set:
* watch the "groups" association - Maybe group membership alters what data is often needed from the DB
* watch the "id" property - decisions might be atomic per user, which would probably suck, performance-wise
* watch the number of posts a user has - maybe there are many users with zero posts, in which case a signature would almost never need to be loaded.
* watch the controller that made the request - e.g. the "Login" controller would need different fields on average than the "Posts" controller - no need to hard code the rules in the controller!
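
A minimal sketch of how watch points like these might feed a loading decision, written in Python only because the example above is pseudo-code. Everything here is an assumption for illustration: the `AccessStats` class, its `record` and `eager_fields` methods, the use of the controller name as the watch-point context, and the 50% threshold. It simply counts how often each field was read per context and proposes loading eagerly the ones read often enough.

```python
from collections import defaultdict


class AccessStats:
    """Hypothetical in-memory tracker: per watch-point context, how often each field is read."""

    def __init__(self, eager_threshold=0.5):
        # Fraction of loads in which a field must be read to be worth fetching up front.
        self.eager_threshold = eager_threshold
        self.loads = defaultdict(int)                       # context -> number of loads seen
        self.reads = defaultdict(lambda: defaultdict(int))  # context -> field -> loads that read it

    def record(self, context, fields_read):
        """Record one object load in `context` and the fields that were actually read."""
        self.loads[context] += 1
        for field in set(fields_read):
            self.reads[context][field] += 1

    def eager_fields(self, context, all_fields):
        """Return the fields worth loading eagerly for this context."""
        total = self.loads[context]
        if total == 0:
            # No history yet for this context: fall back to loading everything.
            return list(all_fields)
        return [f for f in all_fields
                if self.reads[context][f] / total >= self.eager_threshold]


# Example: the watch point is the requesting controller.
fields = ["id", "username", "name", "description", "signature"]
stats = AccessStats(eager_threshold=0.5)
for _ in range(9):
    stats.record("Posts", ["id", "name", "signature"])
stats.record("Posts", ["id", "name", "description"])
stats.record("Login", ["id", "username"])

print(stats.eager_fields("Posts", fields))
print(stats.eager_fields("Login", fields))
```

With this sample history the "Posts" context ends up with `id`, `name` and `signature`, while "Login" gets `id` and `username`; the remaining columns would stay lazy. Losing the counters only means falling back to loading everything until enough history builds up again.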

Anyway... this post is running a tad long... would love to hear some feedback.

Still :crazy: ,
Kieran
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

Two issues:

1. Heuristic algorithms tend to be complicated in nature, and in real-time environments they would hinder performance more than expedite a process. Of course this depends on the implementation and what kind of data you collect and crunch. In my experience, anything other than simple caching is usually less efficient than first thought, which leads me into my next point. :P

2. Caching. I would rather have an intelligent caching layer or ORM plug-in that was entirely configurable so I could analyze stats over time and determine the best tweaks and make heuristic suggestions in the API based on human knowledge.

Caching is usually a lot more efficient because of its atomic nature and trivial algorithms. Cache this connection, cache that page, cache this section of that page, cache this result set, etc.

Whereas heuristics are usually (if not completely) borderline artificial intelligence... which tends to be slower rather than faster, and is better suited to other tasks.

Cheers,
Alex

Post by Kieran Huggins »

I see what you mean about caching... and I totally support it. I guess this is just a "what if" kind of post...

So... how about caching the results of the heuristics? Perform an analysis every once in a while, store the stats.

You could set the heuristics re-run interval as a function of the probability of the result changing and the potential cost savings, saving even more time...
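
A rough sketch of that idea, in Python for brevity. The names `rerun_interval` and `CachedPlan`, the formula, and the default bounds are all made up for illustration: the analysis result is cached, and the next re-run is scheduled sooner when the plan is likely to change and a better plan would save a lot.

```python
import time


def rerun_interval(change_probability, relative_saving,
                   base_seconds=60.0, max_seconds=86400.0):
    """Hypothetical rule: higher probability of change times higher saving means a shorter wait."""
    urgency = change_probability * relative_saving   # both assumed to lie in 0..1
    if urgency <= 0:
        return max_seconds
    return min(max_seconds, base_seconds / urgency)


class CachedPlan:
    """Caches the result of an expensive analysis and re-runs it only when it has expired."""

    def __init__(self, analyse, clock=time.monotonic):
        # `analyse` returns (plan, change_probability, relative_saving).
        self.analyse = analyse
        self.clock = clock
        self.plan = None
        self.expires_at = None

    def get(self):
        now = self.clock()
        if self.expires_at is None or now >= self.expires_at:
            self.plan, probability, saving = self.analyse()
            self.expires_at = now + rerun_interval(probability, saving)
        return self.plan
```

Every request between re-runs just reads the cached plan, so the heuristic cost is paid once per interval rather than once per query.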

Post by alex.barylski »

Caching never hurts no matter where you are.

I'm not sure I totally understand what exactly it is you are trying to cache though, to be honest. You want an ORM solution that optimizes queries based on input frequencies?

Why not just cache result sets or something similar? Perhaps the SQL string if the construction is complex enough?
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Post by Eran »

"Caching never hurts no matter where you are."
Caching can hurt when it becomes too complicated. If you cache too much, you are basically replacing your database with a file-based system - which is counterproductive. Also, worrying about invalidation and about when the data in the cache becomes stale turns into a chore, especially with highly dynamic data.
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Post by josh »

Exactly. In the time it takes you to code the optimizations I can go out and do a real project, bill it, buy a server with 32GB of RAM for a few thousand bucks and put MySQL's key cache buffer on steroids. It's a good idea, but I don't see any value in it personally, especially due to the ORM impedance mismatch.
allspiritseve
DevNet Resident
Posts: 1174
Joined: Thu Mar 06, 2008 8:23 am
Location: Ann Arbor, MI (USA)

Post by allspiritseve »

I like the idea.. it seems like it would be most useful as a profiling tool though. You could have the ORM wrap all of your objects/collections, and produce a report on how exactly each object/collection was used over a span of, say, a week. If every item in the collection was accessed at once, but not always, lazy batch loading would be appropriate. If the collection is large, and only a couple of items are accessed, lazy loading would be appropriate. If every item was accessed always, eager loading. (Though I almost feel like eager loading should only be used if you can get that data in a join... otherwise, it seems like sticking with lazy batch loading is the best option).
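
As a sketch of what such a report could boil down to (Python, purely illustrative): the `recommend` function, its input format and the cut-off values are assumptions, not something any existing ORM provides. Each observation is one load of a collection, recorded as its size and how many items were actually accessed.

```python
def recommend(observations, large=50, few=0.1):
    """Suggest a loading strategy from (collection_size, items_accessed) pairs, one per load.

    `large` and `few` are arbitrary illustrative cut-offs.
    """
    if not observations:
        return "no data"
    touched = [(size, used) for size, used in observations if used > 0]
    if not touched:
        return "lazy"                      # never accessed: defer everything
    all_or_nothing = all(used == size for size, used in touched)
    if all_or_nothing and len(touched) == len(observations):
        return "eager"                     # every item accessed on every load
    if all_or_nothing:
        return "lazy batch"                # all items at once, but not on every load
    avg_size = sum(size for size, _ in touched) / len(touched)
    avg_fraction = sum(used / size for size, used in touched) / len(touched)
    if avg_size >= large and avg_fraction <= few:
        return "lazy"                      # big collection, only a couple of items used
    return "lazy batch"                    # default for mixed access patterns


print(recommend([(10, 10)] * 5))
print(recommend([(10, 10), (10, 0)]))
print(recommend([(200, 2), (300, 3)]))
```

The output is only a suggestion for a human to read; nothing here changes what the ORM does at runtime.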

As far as controlling what the ORM actually does... I'd rather let that be set manually. There may be reasons to favor one method over another that the profiling didn't encounter.

Post by alex.barylski »

pytrin wrote:
Caching never hurts no matter where you are.
Caching can hurt when it becomes too complicated. If you cache too much, you are basically replacing your database with a file-based system - which is counterproductive. Also, worrying about invalidation and about when the data in the cache becomes stale turns into a chore, especially with highly dynamic data.

Ummm, that would be an error on the part of the programmer then, not caching per se. ;)

If you cache too much, you don't understand the problem and need to re-evaluate what it is you're doing. If you read carefully, you will have noticed I stated that caching is usually atomic in nature, thus implying that most caches are trivial, simple and highly effective in terms of ROI.
"Exactly. In the time it takes you to code the optimizations I can go out and do a real project, bill it, buy a server with 32GB of RAM for a few thousand bucks and put MySQL's key cache buffer on steroids. It's a good idea, but I don't see any value in it personally, especially due to the ORM impedance mismatch."
Good design = optimization.

Post by Eran »

"Ummm, that would be an error on the part of the programmer then, not caching per se."
Caching is just a technique, it is always up to the programmer to apply it effectively. You said - caching never hurts, to this I say - cache too much, and it becomes counterproductive. You do want to hit your database, since that is your main data source. I don't know why people are so against it :)

Caching is good mostly for very expensive queries or highly repetitive queries with the same parameters. Beyond that you are starting to replace database IO (which is usually the bottleneck) with regular file IO (unless you cache to memory - which is non-persistent).
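
For the narrow case described here (the same query with the same parameters, repeated often), a small in-memory cache with a time-to-live could look like the sketch below. It is in Python for illustration; `QueryCache`, `fetch`, the `run_query` callback and the 30-second default are hypothetical, and being in-memory it is non-persistent, as noted above.

```python
import time


class QueryCache:
    """Hypothetical result cache keyed on the SQL text plus its parameters."""

    def __init__(self, run_query, ttl_seconds=30.0, clock=time.monotonic):
        self.run_query = run_query   # callable(sql, params) that actually hits the database
        self.ttl = ttl_seconds       # entries older than this are treated as stale
        self.clock = clock
        self.entries = {}            # (sql, params) -> (expires_at, rows)

    def fetch(self, sql, params=()):
        key = (sql, tuple(params))
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]            # fresh entry: skip the database
        rows = self.run_query(sql, params)
        self.entries[key] = (now + self.ttl, rows)
        return rows
```

The time-to-live is the crudest possible answer to the staleness problem: it bounds how old a result can be without any explicit invalidation, which is only acceptable for data that tolerates being slightly out of date.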

Post by josh »

PCSpectra wrote:Good design = optimization.
In that case we should all go back to binary then, huh, since no one needs more than 512KB of RAM anyway, and since no one could possibly need more than 5 megabytes of hard drive space.
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Post by VladSun »

jshpro2 wrote:
PCSpectra wrote:Good design = optimization.
In that case we should all go back to binary then, huh, since no one needs more than 512KB of RAM anyway, and since no one could possibly need more than 5 megabytes of hard drive space.
True enough ;)
There are 10 types of people in this world, those who understand binary and those who don't

Post by alex.barylski »

"In that case we should all go back to binary then, huh, since no one needs more than 512KB of RAM anyway, and since no one could possibly need more than 5 megabytes of hard drive space."
Been there, done that... it's waaaay more frustrating writing code in assembler than it is in PHP... I would be the last person in the world to advocate such a thing.

Good design leads to optimized code. By its very nature of being simple, things have a tendency to be fast, efficient, break less, be more secure, etc, etc...

How many times am I going to have to stress that before you begin to believe me?

Caching is a very simple concept and should be very simple to implement. If you implement caching in such a way that it's more burdensome than beneficial, you need to re-analyze how you write software, quit trying to outdo the next guy by writing cooler code and focus more on writing simple code. ;)

The fact you try and manipulate what I say in a fashion more favourable for you...makes me ask why? What good could possibly come from suggesting that:

Good design = optimization
Would at all mean:

Writing code in bits and bytes
I think Good design = optimization would implicitly suggest that "optimization" as a process is overrated and that focusing on clean design will innately result in optimized code, thus eliminating the need to optimize at all. :)

Cheers,
Alex
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

PCSpectra wrote:Good design = optimization.
You are assigning optimization to design and then passing it to Good?

I'd prefer:
Design good;
optimization.
;)
(#10850)

Post by alex.barylski »

Semantics :P

Post by Christopher »

... Methodology
(#10850)