Follow LostMuser on Twitter

Tuesday, January 10, 2012

The Issue with Databases

Databases... Where did these things come from?

There are a big range of databases. From the Metallic tape systems the FBI used to the fancy Oracle or IBM towers. The goal is simple: Store your information. But the subtleties are complicated.

Do you want writing to be fast? Do you want reading to be fast? What do you want to read? How do you figure out what you want to read? -- These questions have a variety of answers depending on who they're built for.

There are some interesting examples of databases that come to mind. Google and Facebook are the unspoken kings of the field. But unfortunately for us outsiders, we don't know all of their insights into the matter. But we can piece together a thing or two.

Cassandra.

What's up with Cassandra? On the surface, it's a really cool, scalable, database solution. It supports a lot of fascinating stuff.

But! Facebook, after developing Cassandra (http://cassandra.apache.org/), dropped it entirely. Strange... it looks like Facebook stumbled upon something much better. Better enough to through out what they'd invested on for so long and start over completely.

And Google? They built their own thing too. A really pretty thing (from the likes of it that I manage to overhear). Assuredly, MapReduce was a beautiful thing and Google was a huge proponent of it. Just taking into account that they can take a single query and search the whole internet in sub-second timespans, we can appreciate how effective they are. Anyway, the fact that they threw everything out and built their own is a testament to the horrid condition of existing database solutions.

Combine that with Facebook's jumping around from database to database and, eventually, building their own and then building another, has got to be a wake up call to everyone that our databases are just shit.

I have felt this way for a long time. Every coworker, every employer along the way of my winding career has universally been against me on the concept. The more I push the issue, the more I've gotten large groups of religious-esk programmers ganging up on me using social force instead of logic and persuasion.

To begin my attack on this age-old issue, I've started with the index concept. One of the goals of my database-to-be is quick data lookup. Indeed that is one of the more challenging of the concepts involved. Anyone can write something where data writing is quick, it's data lookup where all the drama is.

Okay, so you got an index. The first question, which seems obvious from a certain perspective, is why don't indices have filters? Filters are just a ridiculously obvious optimization. Let's look at MySql's index syntax and see if they have a filter parameter:



CREATE [UNIQUE|FULLTEXT|SPATIAL] INDEX index_name
[index_type]
ON tbl_name (index_col_name,...)
[index_type]

index_col_name:
col_name [(length)] [ASC | DESC]

index_type:
USING {BTREE | HASH}
No filter parameter. What happens when we want to build an index based on some condition of the row? -- you can't! Nothing at all. Fail.

Let's presume you have a table of this sort:
"gender" => string
"income" => int

Now let's say you want to build an index for all males sorted by their income. You can't! That may not seem so bad when gender has just two values, but what about something more complex:

"unix_creation_timestamp" => int
"user_id" => int

Let's say we want to build an index for just a single given day sorted by user_id. Again, we can't. But why can't we?

Turns out, this is a very common question in many applications. But, this optimization isn't enough. In the gender case, we're only going from log(n) time to log(n/2) time, which isn't much to celebrate. In the user creation table, if w = number of weeks in the range, than we've just gone from log(n) to log(n/w). While that's better for a large number of weeks, it's not really enough. We're onto something much bigger here.

What if, what we really care about is not a list of new user ids during the given week but just the count. Suddenly, our needs go way down. All we need to store is an int for each given week. When we add a given user row, we adjust just a single integer (increment it by one) for the given week it's being created inside of.

That's an O(1) burden on insertion but optimizes our query from O(log(n)) to O(1).

Now we're talking seriously. This is how queries should be operating. And they can, with the right kind of database. It just requires a little thinking of your queries in advance of needing them and approaching the database problem from a sideways perspective. Don't tackle it head on because your presumptions will get in the way, you've got to look for different angles to think about this stuff.

Wednesday, June 22, 2011

What Were They Thinking? [Apple]

Read the last two sentences of this documentation entry (It says "This method is unimplemented in iOS, so it performs no operation."). Can you think of a scenario where it makes sense to put this in the docs at all? I can't...

Sadly this is exactly the routine I need for the work I'm doing. I guess I'm too small of a market for Apple to support me =P.

getResourceValue:forKey:error:

Returns the resource value for the property identified by a given key.

- (BOOL)getResourceValue:(id *)value forKey:(NSString *)key error:(NSError **)error

Parameters

value

The value for the property identified by key.

key

The name of one of the URL’s resource properties.

error

The error that occurred in the case that the resource value cannot be retrieved.

Return Value

YES if value is successfully populated; otherwise, NO.

Discussion

value is set to nil if the requested resource value is not defined for the URL. In this case, the method still returns YES.

This method is unimplemented in iOS, so it performs no operation.

Availability

  • Available in iOS 4.0 and later.

Tuesday, June 21, 2011

Red Rock's Wifi Blocker [UDP]

You know those blockers that make you click "I accept" they have at cafe's and stuff? When the one at redrock drops you, apparently UDP broadcasts are still allowed. TCP connections still aren't though!

This funniness caused me to go do some interesting but unnecessary debugging today =).

Wednesday, June 15, 2011

File Monitor [iOS] [OS X]

I built a class for monitoring file changes. It will fire events on the delegate when files or folders are modified.


I'm taking the code base in a new direction, but this class was so cool and useful I'm publishing it here.


http://dustytech.com/FileMonitor.zip



Monday, May 9, 2011

Bitcoin Bills

You can buy physical bitcoins now! This is a really cool idea -- they use a sticker and some plastic layers to cover the private key of the bitcoins.


I'm buying one and framing it.

Bitcoin

So I decided to buy some bitcoins because it was cool. The exchange rate at the time was around $0.80 per bitcoin. I put in a bit over $40.

Over the last few weeks, the price has nearly quadrupled to $3.80 per bitcoin. My $40 has turned into $190. About 2 weeks ago Paypal decided to ban bitcoins from their system -- this 'caused the price to double. Why?

They keep a daily graph of the exchange rate on this site:
http://mtgox.com/

Thursday, May 5, 2011

Apple's Long Function Names

Sometimes they can get excessive... Look at this one:

CFURLCreateStringByReplacingPercentEscapesUsingEncoding




How are you supposed to make a call to that be clean? There's no way...

Musing's Blog Viewer Webcam



(People reading the blog right now!)