for those in the ignorant
Posted: Thu Feb 23, 2006 8:08 am
by cj5
Well, if you consider the background of XML, and its roots, it was intended to bridge the gap between the varying types of electronic document formats, and give developers a way to use the information inside these documents for application development. It's the crux of information sharing. What if you walked into a library and saw the shelves filled with sheets of paper just shoved on there in no particular order, and no page numbers? Take SGML for instance (the grandparent of XML). Now consider its document type definition. Would you want to search and retrieve documents of all varying types without placing them into a predefined infrastructure? From the Library of Congress' MARC XML Design Considerations (http://www.loc.gov/standards/marcxml/ma ... esign.html), the last note given is:
Extensibility
By using XML as the structure for MARC records, users of the MARC in the XML framework can more easily write their own tools to consume, manipulate, and convert MARC data.
You need a central point at which various users can access and manipulate data sources.
Posted: Thu Feb 23, 2006 10:07 am
by Benjamin
jshpro2 wrote:agtlewis,
care to mock up a 6 dimensional CSV file for me to show me how that would work?
Whoa no way I don't think that would work very well.
It's basically a data abstraction thing, if you are downloading data for the weather for multiple zipcodes from a weather service for example
<weather>
  <zip code="33458">
    <temps>
      <high>82</high>
      <low>75</low>
    </temps>
  </zip>
  <zip code="90210">
    <temps>
      <high>80</high>
      <low>73</low>
    </temps>
  </zip>
</weather>
It's just a really convenient way to move data from place to place, to store hierarchical data, etc.
Ok, I understand that. But looking at that example the first thing I see is that there is probably 4 or 5 times more markup than data. I'm sure if someone put their mind to it, they could develop something much more efficient, possibly even with mime types so it can support binary data as well. I'm really not impressed with it.
Posted: Thu Feb 23, 2006 10:22 am
by feyd
psst, it does support binary data.
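A minimal sketch of the usual approach, in Python rather than the PHP discussed in this thread: binary data is typically carried in XML by Base64-encoding it into plain text. The `attachment` element name, the `encoding` attribute, and the sample bytes are hypothetical, chosen only for illustration.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical payload: any bytes would do (an image, a PDF, ...).
payload = bytes([0, 255, 16, 128, 7])

# Base64 turns arbitrary bytes into ASCII text that is safe inside an element.
attachment = ET.Element("attachment", {"encoding": "base64"})
attachment.text = base64.b64encode(payload).decode("ascii")

xml_text = ET.tostring(attachment, encoding="unicode")

# The receiving side parses the XML and decodes the text back into bytes.
decoded = base64.b64decode(ET.fromstring(xml_text).text)
```

The cost is size: Base64 output is roughly a third larger than the raw bytes, which is the trade-off for staying inside a text format.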
Posted: Thu Feb 23, 2006 10:36 am
by cj5
agtlewis wrote:Ok, I understand that. But looking at that example the first thing I see is that there is probably 4 or 5 times more markup than data. I'm sure if someone put their mind to it, they could develop something much more efficient, possibly even with mime types so it can support binary data as well. I'm really not impressed with it.
I think you're not impressed because you are not aware of the various tools out there that can produce this. Sure, I can show you how to mark up a CSV file with PHP. It's very simple, in fact. The important thing I must emphasize is that most major websites that offer up XML in one format or another usually don't have static XML lying around in a file; instead they dynamically generate it from various sources, whether they be electronic documents (text, CSV, Excel, PDF) or databases. If you think developers hand-type XML documents, then you need to do more research on this topic. I'd suggest you look into things like the PEAR XML packages, NuSOAP, and the many other PHP scripts that can easily produce XML. I use some of them to allow people to access my database information. I can create PDFs on the fly by importing XML into a PDF format, and spreadsheets too. XML does support binary data as well, but interpreting XML as a programming language is a misinterpretation. To draw a picture for you, look at it as a central data format. Say I am building a site with Java using an Oracle database, and I want to access information from another website built with PHP and MySQL. XML offers a bridge to that data, because both languages have the ability to parse, generate, and query XML data, without my having to labor through reinventing the wheel via parallel data access coding.
Hope that helps.
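To illustrate the point about generating XML dynamically instead of typing it by hand, here is a rough sketch in Python (not the PHP or PEAR tooling mentioned above). It assumes a hypothetical CSV layout with `zip`, `high`, and `low` columns and produces the weather document quoted earlier in the thread.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical CSV input; in practice this could come from a file or a database query.
CSV_TEXT = "zip,high,low\n33458,82,75\n90210,80,73\n"


def csv_to_weather_xml(csv_text):
    """Build a <weather> document with one <zip> element per CSV row."""
    root = ET.Element("weather")
    for row in csv.DictReader(io.StringIO(csv_text)):
        zip_el = ET.SubElement(root, "zip", {"code": row["zip"]})
        temps = ET.SubElement(zip_el, "temps")
        ET.SubElement(temps, "high").text = row["high"]
        ET.SubElement(temps, "low").text = row["low"]
    return ET.tostring(root, encoding="unicode")


weather_xml = csv_to_weather_xml(CSV_TEXT)
```

Nobody writes the tags here; the markup is a by-product of walking the rows, so its verbosity costs the developer nothing.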
Posted: Thu Feb 23, 2006 11:39 am
by CoderGoblin
cj5 : I remember SGML....
Wasn't that and AECMA going to allow people to have a paperless office by the year 2000...
Oops too late...
Posted: Thu Feb 23, 2006 11:53 am
by cj5
CoderGoblin wrote:cj5 : I remember SGML....
Wasn't that and AECMA going to allow people to have a paperless office by the year 2000...
Oops too late...
What's your source for this information?
Posted: Thu Feb 23, 2006 12:18 pm
by Christopher
agtlewis wrote:Ok, I understand that. But looking at that example the first thing I see is that there is probably 4 or 5 times more markup than data. I'm sure if someone put their mind to it, they could develop something much more efficient, possibly even with mime types so it can support binary data as well. I'm really not impressed with it.
Why is it a problem that there is more markup than data? HTML usually has more markup than content and I don't hear an outcry about how unimpressive or unsuccessful it has been. I hope you are not thinking of performance issues in the abstract. The markup is meaningful and allows very general support for very rich data in thousands of tools and programs. Unlike "more efficient" formats, when you get XML it is pretty self-explanatory what the data is.
I'm not sure what to say about "I'm really not impressed with it" as the likes of IBM, Sun and Microsoft (and pretty much everyone else) have all standardized on it. XML does for data interchange what HTML did for page layout -- make the powerful easy.
Posted: Thu Feb 23, 2006 12:53 pm
by Roja
agtlewis wrote:But looking at that example the first thing I see is that there is probably 4 or 5 times more markup than data. I'm sure if someone put their mind to it, they could develop something much more efficient, possibly even with mime types so it can support binary data as well. I'm really not impressed with it.
Efficiency isn't the goal. Portability and consistency are.
You could probably write a custom parser for Yahoo Finance that scrapes *only* the numbers you need, making it very "efficient". But when you have to change that custom parser once a week, redoing almost all your work because they changed layouts, suddenly consistency becomes a much higher priority.
Now imagine trying to incorporate data from a dozen sources per page, like the personalized pages do. It would be flat-out impossible without XML.
If that doesn't get it across to you, picture doing sales through online retailers like Barnes and Noble, Amazon, and others. Trying to create the information each one needs, and to parse the information they return, would be a nightmare. With XML, you can simply import the XML feed and target the element in the tree you are looking for, like it's a row from a db.
XML is fairly efficient - for being a completely consistent data exchange format. It supports binary data (usually Base64-encoded into text; CDATA itself only carries character data), and millions of websites have embraced it.
Feel free to not be impressed or use it. The rest of the world has, does, and is FAR better because of it.
Eventually, you'll probably find something compelling about it.
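As a rough illustration of "target the element in the tree, like it's a row from a db", here is a sketch in Python using the weather document from earlier in the thread. The `FEED` string stands in for a downloaded feed, and the `high_for` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for a feed fetched from a remote service.
FEED = """<weather>
  <zip code="33458"><temps><high>82</high><low>75</low></temps></zip>
  <zip code="90210"><temps><high>80</high><low>73</low></temps></zip>
</weather>"""


def high_for(feed_xml, zip_code):
    """Return the high temperature for one zip code, or None if it is absent."""
    root = ET.fromstring(feed_xml)
    # The path predicate selects the <zip> whose code attribute matches,
    # much like a WHERE clause picks a row.
    node = root.find(f"./zip[@code='{zip_code}']/temps/high")
    return None if node is None else int(node.text)
```

If the provider adds new elements to the feed, this lookup keeps working, which is the consistency argument in practice.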
Posted: Thu Feb 23, 2006 2:04 pm
by m3mn0n
Bottom line is XML is very useful if you're a web developer, and the more you understand about it and the more you work with it, the more you appreciate it.
Especially when you learn about making your own markup language for a data source you manage, XHTML, and syndication (Atom/RSS for example) and effectively use these things.
Posted: Thu Feb 23, 2006 2:23 pm
by Christopher
There are reasonable alternatives to XML for some cases, JSON being an example. For language-specific tasks you can take shortcuts. I hear that Yahoo is providing some data in PHP serialize() format for example.
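For comparison, a small sketch of the same weather data carried as JSON, written in Python. The dictionary layout (zip code as key, `high` and `low` as fields) is an assumption made for illustration, not a format any service in this thread is known to use.

```python
import json

# Hypothetical JSON shape for the weather data quoted earlier in the thread.
weather = {
    "33458": {"high": 82, "low": 75},
    "90210": {"high": 80, "low": 73},
}

# Serialize to text for transport, then parse it back on the receiving side.
encoded = json.dumps(weather)
decoded = json.loads(encoded)
```

Like XML, JSON is language-neutral, so a Perl or Java client can read the same text without reimplementing a PHP-specific function.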
Posted: Thu Feb 23, 2006 3:17 pm
by josh
arborint wrote: I hear that Yahoo is providing some data in PHP serialize() format for example.
And if I'm using Perl I have to re-implement unserialize()?
Posted: Thu Feb 23, 2006 4:03 pm
by Roja
jshpro2 wrote:arborint wrote: I hear that Yahoo is providing some data in PHP serialize() format for example.
And if I'm using Perl I have to re-implement unserialize()?
Tada. The value of standardized data formats.

Posted: Thu Feb 23, 2006 8:10 pm
by Gambler
Speaking about flawed logic... Excel tables are the best document format in the world. They are used by many large businesses and there are many tools that generate/manipulate them. Also, they are created by Microsoft, which by itself makes them a business standard. Who cares about the rest? We should all use Excel tables. We should not reinvent the wheel. We should not consider using better formats, because Excel already exists and it works.
Those are the same arguments everyone uses to defend pretty much any existing technology that is fairly popular.
Posted: Thu Feb 23, 2006 8:13 pm
by m3mn0n
Gambler wrote:Speaking about flawed logic... Excel tables are the best document format in the world. They are used by many large businesses and there are many tools that generate/manipulate them. Also, they are created by Microsoft, which by itself makes them a business standard. Who cares about the rest? We should all use Excel tables. We should not reinvent the wheel. We should not consider using better formats, because Excel already exists and it works.
Those are the same arguments everyone uses to defend pretty much any existing technology that is fairly popular.
LOL
Were you joking? Or are you serious?
Posted: Thu Feb 23, 2006 8:22 pm
by josh
Gambler wrote:Those are the same arguments everyone uses to defend pretty much any existing technology that is fairly popular.
Are you saying that I am defending XML? I did no such thing; I am defending standards. Excel is not a standard, it is a proprietary format and thus cannot be compared to the current topic (serialize vs. XML). Although serialize is a data format that can be implemented in other languages, it is not a standard. Nothing guarantees that the serialize() format won't change in the next PHP version (although I doubt it will, because I bet a lot of people serialize() data for long-term storage).