Re: [NTLK] Developer Soup question

From: Will Hartung (willh_at_msoft.com)
Date: Thu Apr 29 2004 - 11:47:51 PDT


> From: "Paul Guyot" <pguyot_at_kallisys.net>
> Sent: Thursday, April 29, 2004 8:24 AM

> Around 28/04/04 at 11:44 -0700, under the title "[NTLK] Developer
> Soup question", Will Hartung took up his finest pen to write the
> following words:
> >What do you think of Soups? Do you like them?
>
> Yes.
>
> >Favorite feature?
>
> Union soups.
> Complex queries.

Like what? From the documentation, they seemed to be mostly range queries,
i.e. frames whose slot value B satisfies A </<= B </<= C. I saw the technique
of passing in a function to do arbitrary selection (though obviously it has
to scan each frame). I guess you can just heap conditions onto a query and it
will "figure it out": "Give me all entries between these dates that contain
the word 'Bob'".
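To make the question concrete, here is a rough sketch of how I picture that kind of query, written in Python rather than NewtonScript. The index narrows the candidates to a key range, and the arbitrary test then scans only that range. Every name here (TinySoup, query, valid_test, the "date" and "text" slots) is made up for illustration and is not the Newton API.

```python
import bisect
from datetime import date

class TinySoup:
    """Hypothetical stand-in for a soup with one index on a single slot."""

    def __init__(self, index_slot):
        self.index_slot = index_slot
        self._keys = []      # sorted index keys
        self._entries = []   # entries, kept in the same order as _keys

    def add(self, frame):
        key = frame[self.index_slot]
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._entries.insert(pos, frame)

    def query(self, begin_key, end_key, valid_test=None):
        """Return entries with begin_key <= key <= end_key that pass valid_test."""
        lo = bisect.bisect_left(self._keys, begin_key)
        hi = bisect.bisect_right(self._keys, end_key)
        candidates = self._entries[lo:hi]      # the index handles the range part
        if valid_test is None:
            return candidates
        # The arbitrary test is a plain scan, but only over the range.
        return [e for e in candidates if valid_test(e)]

soup = TinySoup("date")
soup.add({"date": date(2004, 4, 1), "text": "Lunch with Bob"})
soup.add({"date": date(2004, 4, 15), "text": "Dentist"})
soup.add({"date": date(2004, 4, 20), "text": "Call Bob back"})
soup.add({"date": date(2004, 5, 2), "text": "Bob's birthday"})

in_range = soup.query(date(2004, 4, 10), date(2004, 4, 30))
hits = soup.query(date(2004, 4, 10), date(2004, 4, 30),
                  valid_test=lambda e: "Bob" in e["text"])
```

Here the date range selects two entries and the 'Bob' test keeps one of them. Whether the real soups combine an index and a test this way is exactly what I am asking.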

How does the Newton deal with dates?

> >Is the ability to simply store an arbitrary frame in a soup valuable
> >compared to having to set up a schema?
>
> One doesn't have to update the schema whenever a new class of data is
> added. This is a key feature of the NewtApp framework (one can extend
> any NewtApp application with new stationeries).

This seems key to the integration of the system, IMHO. For a standalone
application it is no big deal, but once you start adding arbitrary
applications it becomes very important.

> >Are managing "inter soup" aliases a
> >problem?
>
> No. Plus we can have N-N relations.

How do you do N-N relations?

> >At a glance, Soups seems to give a flexible ISAM storage system, without
> >having to worry about table definitions etc, but not quite making the leap
> >to being an "OODB", notably when you try and store a Frame graph of items
> >that may already be stored in a soup (which makes the system simpler).
>
> Don't forget that Newton data are not trees but DAGs.

By this you mean they can store circular structures, yes? They flatten the
graph before they store it.

The key distinction here is in the "classic" OODB, each Frame is unique. In
Soups, only the "parent" stored Frame is unique, and any embedded Frames (or
Entries) are copied into the Soup, rather than shared.

So, for example, were you to store two Frames, A and B, that each contained
a common Frame C, the Soup would actually contain 4 total Frames: A, C, B,
C. In an OODB, you'd have only the 3 (A, B, C). Should you really need to
share C, you must explicitly create an Alias. This really simplifies
the implementation of Soups, but doesn't seem to be overly burdensome on the
developers.
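The A/B/C example can be sketched in a few lines of Python. This only models the copy-on-store behaviour as I understand it; CopyingSoup, count_frames and the "child_alias" slot are invented names, dicts stand in for frames, and a stored id stands in for an Alias.

```python
import copy

class CopyingSoup:
    """Hypothetical soup that deep-copies each top-level frame on add."""

    def __init__(self):
        self._entries = {}
        self._next_id = 0

    def add(self, frame):
        entry_id = self._next_id
        self._next_id += 1
        # Embedded frames are copied along with their parent.
        self._entries[entry_id] = copy.deepcopy(frame)
        return entry_id

    def get(self, entry_id):
        return self._entries[entry_id]

def count_frames(value):
    """Count dicts (the stand-in for frames) reachable from value."""
    if isinstance(value, dict):
        return 1 + sum(count_frames(v) for v in value.values())
    return 0

soup = CopyingSoup()

# Case 1: A and B both embed the same in-memory frame C.
c = {"name": "C"}
a_id = soup.add({"name": "A", "child": c})
b_id = soup.add({"name": "B", "child": c})
embedded_total = count_frames(soup.get(a_id)) + count_frames(soup.get(b_id))
children_shared = soup.get(a_id)["child"] is soup.get(b_id)["child"]

# Case 2: store C once; A and B hold its id as an explicit alias.
c_id = soup.add({"name": "C"})
a2_id = soup.add({"name": "A", "child_alias": c_id})
b2_id = soup.add({"name": "B", "child_alias": c_id})
aliases_shared = (soup.get(soup.get(a2_id)["child_alias"])
                  is soup.get(soup.get(b2_id)["child_alias"]))
```

In the first case the soup ends up holding four frames and the two copies of C are distinct objects. In the second, both parents resolve to the one stored C.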

> >Any idea why the number of tag's on a soup are limited to, what, ~600?
>
> What do you call tags?

You can create a special "tag" index for a soup. A tag index is an index on
a slot that contains symbol(s). Essentially, from what I can tell, this is a
"bitmap" index. What is not clear is the contents of an entry's tag slot.
It appears it can contain either a symbol or an array of symbols.

The actual number is 624, which, coincidentally, is 16 away from 640. 640
bits is 80 bytes. Also coincidentally, a key can be at most 80 bytes. So, what
happens (apparently) is that the Soup takes a list of Symbols, arbitrarily
assigns each a scalar value (0-623), and uses that bit to represent the
Entry's inclusion in the tag index.

Thus, if you have two tags, say 'business and 'home, you can qualify, say, a
phone number as being both a business and home phone. In the tag index, the
two bits associated with those symbols would be set. Then, using simple AND,
OR, or EQ operations, you can quickly determine whether an entry is in any of
the up to 624 categories, including combinations (like "I want all phone
numbers that are both business and home"). So, I guess the 16 bits left over
from the 640 are 2 bytes of header: the first byte identifies the key as a
tag key, and the second says how many bytes are consumed by the index column
(0-78). Very clever.
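Here is a small Python sketch of the bitmap idea as I have guessed it. It is not the real soup key format: TagIndex and its methods are hypothetical, a Python int plays the part of the 78-byte bitmap, and the 2-byte header is only counted, not modelled.

```python
MAX_TAGS = 624                      # the limit quoted above
BITMAP_BYTES = MAX_TAGS // 8        # 78 bytes of bitmap
KEY_BYTES = BITMAP_BYTES + 2        # plus the 2 guessed header bytes

class TagIndex:
    """Hypothetical bitmap tag index; not the real soup format."""

    def __init__(self):
        self._bit_for_tag = {}      # symbol -> bit position (0..623)
        self._bitmaps = {}          # entry id -> int used as a bitmap

    def _mask(self, tags):
        # Simplification: a tag never seen before is assigned the next bit,
        # even when it first shows up in a query.
        mask = 0
        for tag in tags:
            if tag not in self._bit_for_tag:
                if len(self._bit_for_tag) >= MAX_TAGS:
                    raise ValueError("tag limit reached")
                self._bit_for_tag[tag] = len(self._bit_for_tag)
            mask |= 1 << self._bit_for_tag[tag]
        return mask

    def set_tags(self, entry_id, tags):
        self._bitmaps[entry_id] = self._mask(tags)

    def all_of(self, tags):         # AND: every requested bit is set
        want = self._mask(tags)
        return sorted(e for e, bits in self._bitmaps.items()
                      if (bits & want) == want)

    def any_of(self, tags):         # OR: at least one requested bit is set
        want = self._mask(tags)
        return sorted(e for e, bits in self._bitmaps.items()
                      if (bits & want) != 0)

    def exactly(self, tags):        # EQ: exactly the requested bits are set
        want = self._mask(tags)
        return sorted(e for e, bits in self._bitmaps.items() if bits == want)

index = TagIndex()
index.set_tags(1, ["business"])
index.set_tags(2, ["home"])
index.set_tags(3, ["business", "home"])

both = index.all_of(["business", "home"])
either = index.any_of(["business", "home"])
only_home = index.exactly(["home"])
```

With three phone entries tagged as above, the AND query finds only entry 3, the OR query finds all three, and the EQ query on 'home finds only entry 2. The arithmetic also checks out: 624 / 8 = 78 bytes, plus 2 gives the 80-byte key limit.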

> >I'm just curious as someone who has seen them from afar and what folks who
> >had to manage them thought about them as a storage mechanism. Do you think
> >in hindsight a mini-SQL like system would have been more flexible and easier
> >to use?
>
> Frankly, no.
>
> A big feature of the storage system is that data does not appear as
> flattened or serialized anywhere. You can pass an entry from a soup
> to a function and it wouldn't notice (unless asking the system) that
> it's a soup entry and not a frame in heap or in a package.

Yes. The Entries are simply lightweight proxies to the actual Frames.

This is partly the nature of the NewtonScript language as well.
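A loose Python analogy of what I mean by a proxy, assuming the entry loads its frame on first slot access. The Entry class, the dict used as a backing store, and full_name are all invented for illustration and say nothing about how the Newton actually implements it.

```python
class Entry:
    """Hypothetical lazy proxy: looks like a frame, loads it on first access."""

    def __init__(self, store, entry_id):
        self._store = store
        self._entry_id = entry_id
        self._frame = None          # nothing loaded yet

    def _load(self):
        if self._frame is None:
            self._frame = dict(self._store[self._entry_id])
        return self._frame

    def __getitem__(self, slot):
        return self._load()[slot]

    def __contains__(self, slot):
        return slot in self._load()

def full_name(frame):
    # Works on anything that supports frame["slot"], so it cannot tell
    # a heap frame from a soup entry.
    return frame["first"] + " " + frame["last"]

store = {7: {"first": "Paul", "last": "Guyot"}}   # stand-in for the soup's store
heap_frame = {"first": "Will", "last": "Hartung"}
entry = Entry(store, 7)

loaded_before = entry._frame is not None
from_heap = full_name(heap_frame)
from_soup = full_name(entry)
loaded_after = entry._frame is not None
```

The same function handles both the plain frame and the entry, and the entry's data is only fetched once something reads a slot.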

Regards,

Will Hartung
(willh_at_msoft.com)

-- 
This is the NewtonTalk list - http://www.newtontalk.net/ for all inquiries
Official Newton FAQ: http://www.chuma.org/newton/faq/
WikiWikiNewt for all kinds of articles: http://tools.unna.org/wikiwikinewt/


This archive was generated by hypermail 2.1.5 : Thu Apr 29 2004 - 12:00:02 PDT