Archive for the ‘opensource’ category

New PDF database magazine

November 19th, 2009

No, it’s not from me or WebDevPub, but it looks good all the same.  It’s actually a continuation of the earlier MySQL Magazine, but with a larger focus, and is now a pay-for PDF, similar to JSMag and GroovyMag.

OSDBZine.net is put out bi-monthly by Keith Murphey, who started the MySQL Magazine two years ago (and was recently interviewed on webdevradio).  I just picked up the first issue, with a whopping 61 pages of database goodness.  With pieces on Drizzle, Firebird, PostgreSQL, MongoDB, LucidDB and more, it’s got something for just about everyone.

The only drawback so far is the current signup process – it’s a little barebones (I spoke with Keith, and he’ll be updating it soon).  Visit http://www.osdbzine.net/signup.html to register an account, then log in to purchase via PayPal.

Oracle buying Sun – what does this mean for MySQL?

April 20th, 2009

Just woke up this morning to news that Oracle is buying Sun.  After cursing myself for not having bought some JAVA last week when it was in the 6 range (pre-market now at 9.10), I started thinking about what this might mean for MySQL.

About a year ago, Sun purchased MySQL.  Although a lot of hooha was made about what might happen to MySQL at that time, Sun made it pretty clear that they wouldn’t be changing too much of the company they were purchasing.  They’d wanted to have a good story on the low-end of computing (from what I remember) and the LAMP stack (where M was for MySQL) dominated much of the low-end of web development.  While it was never on the same level as Ebay buying Skype, I think a lot of people were confused by how Sun would be able to get back the billion dollars they expended on the MySQL deal.

Fast forward a year.  Some of the key MySQL core team have left, forking the MySQL product in the process (Drizzle).  Is the MySQL branch of Sun very attractive?  I imagine Oracle was looking more at the hardware and consulting side of the Sun acquisition, not the database side, but this won’t sit well with many in the MySQL world.

The MySQL community has always been a bit suspicious of Oracle.  Many were quite alarmed when Oracle purchased InnoDB, the company that made the innodb MySQL table engine, and that was something that spurred on work on other transaction engines in the MySQL world.  Nothing has yet come to be adopted as widely as innodb, and Oracle’s control of InnoDB has continued to be a bit of a concern for some in the community.

Is Oracle getting much on the database front when they purchase Sun?  There’s probably not enough marketshare between the two to claim there are monopoly or anti-trust considerations to take into account (MSSQL, DB2 and PostgreSQL – what is their combined marketshare?)

Is there a danger that “MySQL” as a product will be a name brand, but that many people will just start using community forks?  I can see that “MySQL” as a database engine might end up being a generic term, somewhat like “Linux”, in that there are many distros out there serving different needs.  Not sure if the MySQL licensing would ever allow for that degree of diversity, but maybe we’ll see something like this in reaction to the Oracle purchase. 

Side question – what will this do to Java and OpenOffice?  Hopefully Oracle will leave these (and MySQL) intact, and just focus on integrating these technologies in to their sales and consulting process, but leave the tech direction alone.

Running .Net code on a JVM?

June 8th, 2008

I just stumbled on this article Sunday morning.  This snippet sums up the product:

There is a way of marrying the advantages of .NET development with Java deployment. Using Mainsoft for Enterprise Edition (EE), Visual Studio developers can write code in .NET and cross-compile it to Java. Not only code, but pieces of the Framework; Mainsoft has worked with Miguel de Icaza and Novell to port pieces of the Mono project to Java. Your limits in calling Framework classes, especially for Web apps, are almost nonexistent.

Sounds very intriguing.  But, is it just a solution in search of a problem?  Would many .Net shops embrace a Java app server for deployment?  Is this too niche of a product to take off beyond a few edge cases?  Or is this sort of thing the future?

What would, I think, be more useful for many shops is to take Java code and compile it in to something that targeted the .Net CLR.  Are there any projects that do this?

WebDevRadio podcast – Symfony Project at MySQL User Conference

April 21st, 2008

I had a chance to catch up with the Symfony guys at the MySQL User Conference a few days ago.  We get some background on the Symfony project, and a glimpse as to where things are going in the near future.
This was my first ‘from the floor’ recording – I think the levels are *OK*, but please bear with me – Fabien sounded better in person  :)   I think I got a small ‘exclusive/breaking news’ at the time, but I wasn’t able to get the podcast up until now.

Have a listen at http://webdevradio.com

Blob Streaming with MySQL

April 16th, 2008

I’m sitting in on the Blob Streaming with MySQL session.  The project is at http://blobstreaming.org.  Coincidentally,  my brother had put together a proof of concept for streaming blobs from MySQL a few months ago before either of us heard about this project.

Why put blobs in a database?  The biggest pro seems to be for transactional reasons.  File systems on their own aren’t always transactional.  Also, replication and HA solutions then get applied right to that data as well as the rest of your data.

Are there reasons to not put blobs in a database?  They can make the table slow, the database can become too big to snapshot or backup (practically, if not in theory), and replication can be too slow.  The blobstreaming.org project seems to alleviate the replication slowness problem.  How?  The blobstreaming project stores the blob data in the database, but not in the rowdata itself (which just holds a reference).  The goodness of the database functionality is there, but the replications aren’t slowed down by the blob data.  I think I’m getting this right…
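
A rough sketch of the "row holds a reference, blob lives elsewhere" idea as I understood it from the session.  This is not the BlobStreaming engine or its API: SQLite stands in for MySQL, and the in-memory `blob_store` dict, the `put_blob`/`get_blob` helpers and the `videos` table are all hypothetical names made up for illustration.

```python
import hashlib
import sqlite3

# Hypothetical stand-in for a blob repository kept apart from the row data.
blob_store = {}

def put_blob(data: bytes) -> str:
    """Store the bytes and return a reference (here, a SHA-256 hex digest)."""
    ref = hashlib.sha256(data).hexdigest()
    blob_store[ref] = data
    return ref

def get_blob(ref: str) -> bytes:
    """Fetch the bytes back using the reference stored in the row."""
    return blob_store[ref]

# SQLite is used only so the sketch runs anywhere; the table is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT, blob_ref TEXT)"
)

payload = b"\x00" * 1_000_000  # pretend this is a 1 MB video file
ref = put_blob(payload)
conn.execute("INSERT INTO videos (title, blob_ref) VALUES (?, ?)", ("demo", ref))
conn.commit()

# The row itself stays tiny: a title plus a 64-character reference.
row = conn.execute("SELECT title, blob_ref FROM videos WHERE id = 1").fetchone()
row_size = len(row[0]) + len(row[1])
```

The point is only that whatever has to ship the row around (replication, for instance) moves a few dozen bytes, while the megabyte of blob data is fetched separately through the reference.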

Basically the BlobStreamer is another engine type, but it’s not a table type you can create directly – it has to hook in to another table type.  The only one demoed is PBXT (blobstreaming was put together by PrimeBase as well, so this makes sense).  Perhaps this would work with innodb as well…?  The BS engine type exposes an HTTP interface as well for basic reading and writing of blob data.

An interesting project which may come in handy for some large file/video projects.

Generational developers

April 16th, 2008

I’m seeing a large cross section of age groups represented at the MySQL conference.  The typical late teens through mid twenties are here, as expected, but I’m seeing a high number of people who are clearly older than that – many likely mid 40s or higher.  It could just be that database work is typically suited for older workers looking for more stability (‘keep the systems running day in day out’), but it might also represent an uptake of MySQL at more established companies as well.

Anyway, that’s not quite what I was writing about.  What crossed my mind was the children of many of these older people.  Will they grow up in to software people as well?  Will we perhaps see consultancies handed down from generation to generation over the next several decades?  Software as an industry has barely been around 30 years, so I’m not sure it’s been on too many people’s minds, but I still wonder.  My dad is an accountant, but didn’t bring me up to be one, and I had little interest.  Some of that may have been because I had no way of having visibility in to his profession.  Beyond ‘take your child to work’ days, there’s not too many professions where children can get hands-on experience of what their parents do.  With many types of software, that’s not the case.  Anyone can get started with most tools, especially with Open Source.  Put another way, will Linus’ kids take over the kernel in another 20 years?  :)

SOLR search adoption – the power of sane defaults?

March 27th, 2008

Tonight I met someone from a (largish) local company and learned they’re migrating their search functionality to SOLR.  This is the second largish company in the area I know that’s migrating to SOLR.  I’m not naming names only because I’m not sure they’d want me to do so.  Suffice it to say these are names fairly well known in the marketing and communications industries.

I’m not surprised at all by the adoption, as SOLR makes it pretty easy to get started using the power of Lucene without requiring you to do a lot of setup or administration up front.  These ‘sane defaults’, as I believe Erik Hatcher put it to me, are what give projects like SOLR a competitive advantage against even commercial offerings.  Whether technology is good or bad is often secondary to whether it’s easy to get it to a testing stage.

If you’re using SOLR, what was the deciding factor?  Ease of setup?  Flexibility?  Compatibility with existing Lucene data?

If you’re not using SOLR for your data search needs, what are you using?  Raw Lucene/Xapian/Sphinx?  A commercial product?  If so, which one?

P.S. If you’re not sure how to go about implementing search for your site and have some questions, email me – mgkimsal@gmail.com.

FriendFeed prediction – clustered feed data

March 18th, 2008

Robert Scoble just switched his home pages from TechMeme to FriendFeed.

“So what?” is likely what you’re thinking. Yeah, big deal, right? Well, TechMeme had a clustering algorithm which would group together news articles of related content, and give you a good idea of the ‘hot topics’ of the day. It did this in a completely automated way.

I predict that FriendFeed (or another social network aggregator) will introduce topic clustering, based on the keywords and topics of people you follow. Clusty.com has done topic clustering for years, though it’s not something that is of great use to ‘general’ searching (at least, not in many cases). Carrot2, an open source clustering engine, also provides this sort of functionality.

I took a first stab at clustering my feed data with carrot2. I’m not sure I had enough data to draw useful conclusions yet – it might need a larger body of a group of people’s tweets (for example) which I just didn’t have at the time.
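
This is not the carrot2 run, just a toy sketch of the general idea under some loud assumptions: posts are grouped by shared keywords, the stopword list is tiny and hand-picked, and the sample posts, `keywords` and `topic_clusters` are made up for illustration. A real clustering engine does far more than this.

```python
import re
from collections import defaultdict

# Tiny hand-picked stopword list; an assumption, not a real linguistic resource.
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "for", "on", "in",
             "with", "what", "this", "my", "at"}

def keywords(text):
    """Lowercase the text and keep words that are not stopwords or very short."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS and len(w) > 2}

def topic_clusters(posts, min_size=2):
    """Map each keyword to the indexes of the posts that mention it,
    keeping only keywords shared by at least `min_size` posts."""
    by_keyword = defaultdict(list)
    for i, post in enumerate(posts):
        for kw in keywords(post):
            by_keyword[kw].append(i)
    return {kw: ids for kw, ids in by_keyword.items() if len(ids) >= min_size}

# Made-up sample posts standing in for a feed.
posts = [
    "Oracle is buying Sun",
    "What does the Oracle deal mean for MySQL?",
    "Trying out SOLR for site search",
    "MySQL conference notes",
    "SOLR makes Lucene easy",
]
clusters = topic_clusters(posts)
```

With these five sample posts the shared keywords are "oracle", "mysql" and "solr", each covering two posts. Five posts is also a good illustration of the "not enough data" problem: everything else ends up as a cluster of one and gets dropped.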

For people who follow thousands of users, it would obviously be useful to have a ‘big picture’ view of the hottest topics being twittered/blogged/etc about. But take it one step beyond that. Being able to look at *other people’s* topic clusters would give you an instant view as to whether they have people worth following.

When I look at twitter, I can look at other people’s followers. Great concept, but it doesn’t tell me anything about the topics those people tend to twitter about, so I’m never sure if it’s worth following them. Nor do I get any notion of how those people are related. Marrying facebook or plaxo data against twitter feeds would be useful, no? Or just letting me add my own relationship metadata in to twitter itself.

Getting a high level view of people’s topiclusters would be incredibly useful. “Topiclusters” – yeah, I just made up that word and yeah, it’s lame. “Topsters”? “Substers”? (subject clusters?).

Yahoo supports more semantic web standards

March 13th, 2008

There’s an article on TechCrunch about Yahoo offering support for a number of microformat standards.

They are saying that they will support a number of microformats at the start: hCard, hCalendar, hReview, hAtom and XFN. They will support vocabulary components from Dublin Core, Creative Commons, FOAF, GeoRSS, MediaRSS, and others. They will support RDFa and eRDF markup to embed these into existing HTML pages. Finally, Yahoo will support the Amazon A9 OpenSearch specification with extensions for structured queries to deep web data.
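
For anyone who hasn’t run across them, microformats like hCard are just agreed-upon class names in ordinary HTML. Here is a minimal sketch of reading one with Python’s standard library. The sample markup, the person and company in it, and the `HCardParser` class are all made up for illustration, and the parser only handles the flat case of three hCard properties (`fn`, `title`, `org`), nothing like a full microformat parser.

```python
from html.parser import HTMLParser

class HCardParser(HTMLParser):
    """Collects the text of elements whose class is one of a few hCard properties."""

    FIELDS = {"fn", "org", "title"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # property name we are waiting to read text for

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.FIELDS:
                self._current = name

    def handle_data(self, data):
        # Take the first non-blank text after a matching start tag.
        if self._current and data.strip():
            self.card[self._current] = data.strip()
            self._current = None

# Made-up sample profile markup using hCard class names.
SAMPLE = """
<div class="vcard">
  <span class="fn">Jane Example</span>,
  <span class="title">Developer</span> at
  <span class="org">Example Corp</span>
</div>
"""

parser = HCardParser()
parser.feed(SAMPLE)
card = parser.card
```

A couple dozen lines of parsing code is enough to pull name, title and employer out of the sample page without any guessing about layout.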

I replied to the post at techcrunch and will repost it here – I thought I’d change something, but I’ll just throw it out here for now for discussion.

=========

I’m a bit more reluctant to believe the hype or promise of this. There are technical and human hurdles to deal with – semantically marking up data is hard, and humans can still get things wrong. Yahoo will still need to put in ‘best guess’ algorithms and such to compensate.

But the bigger issue is why would someone like linkedin semantically mark up all their profile pages, at least for public consumption? It makes it that much easier for competitors to come and take away the one set of data that makes linkedin unique – the relationship data they have about their users. For me, what makes linkedin linkedin is the set of relationships (and to a lesser extent, what tools linkedin provides to exploit those relationships).

Adding semantic markup to linkedin profile pages will make it easier for Yahoo to show more information. Great. But it also makes it easier for everyone, including Linkedin and Yahoo’s competitors, to scrape intelligently, and offer bigger/better/faster/cheaper.

Now, there are certainly other benefits regarding cross-domain info linking – being able to better know the relationships between data across multiple data sets, for example. Again, good, but not great, imo.

It’s certainly a chicken/egg situation, but I’m also not sure that we’ll have the same incentives that we did 10 years ago before the massive commercialization. For every argument for semantic markup, there’s gotta be at least one competing commercial interest against it.

That’s my 2 cents as to why this will be an uphill battle.

=========

Any thoughts?

Possible book project – open source search

March 9th, 2008

I’ve had a book project in the back of my mind for a bit.  Though there’s never enough hours in the day to get everything done I need to, I have one book I’m wrapping up in the next few days, and am seriously considering committing myself to this next one.  No publisher lined up yet or anything of that nature, but if I can’t find one I’d self publish through lulu.com.

The title obviously gives it away – I’m looking at doing a book on open source search products.  I was thinking of doing an entire book on SOLR last fall, but honestly I’m not sure there’s enough about SOLR to write an entire book – at least not without repeating a lot of the information already out there in tutorials and what not.  And I’m not sure that another deep in-depth technical book is necessary on something that’s moving so fast.  The idea was to give a moderately-deep (but not overly deep, if you can make that distinction) look at setting up and using a group of open source search projects out there.

  • Lucene is the leader in this space, without question.  It’s been around for quite a while, and keeps getting better with each release.  However, Lucene itself is very low-level, and Java only.  Many ports of Lucene have sprung up, such as Lucene.Net and Lucy, as well as tools which build on Lucene like SOLR and Nutch.
  • PostgreSQL has full text search capabilities which I plan to explore in more detail.
  • MySQL has had a degree of full text search capabilities for years, and the Sphinx project has emerged over the last couple of years to provide even more functionality and speed.  I believe Sphinx is essentially standalone but can be coupled with MySQL or PostgreSQL – again, that’s research fodder for the book.
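
As a taste of the ‘moderately-deep’ level I have in mind, here is a toy version of the inverted index, the basic structure that Lucene-style full text search is built on. It’s a sketch only: `tokenize`, `build_index`, `search` and the three sample documents are made up for illustration, and there is no stemming, ranking or phrase matching, all of which the real projects above handle.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every term in the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Made-up sample documents.
docs = {
    1: "Lucene is a low-level Java search library",
    2: "SOLR builds on Lucene",
    3: "Sphinx can be coupled with MySQL",
}
index = build_index(docs)
```

Calling `search(index, "lucene")` matches documents 1 and 2, while `search(index, "lucene java")` narrows that to document 1.  Everything interesting about the real projects (analysis, scoring, index storage, updates) is what sits on top of this.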

Are there other open source search projects that you’d be interested in seeing covered in a book?  Is this a topic you see any demand or interest in?  Whenever I see a gap in the book market, I always wonder if it’s because there’s no interest, or just that no one has filled the gap yet.  Usually something appears to fill that gap a few months after I notice it, but I’ve yet to see this gap filled after almost a year of thinking about it.

Feedback?