On Fri, Apr 2, 2010 at 01:35, Philipp Marek <philipp.marek_at_emerion.com> wrote:
> Hello Greg,
> Hello Jan,
>
> On Thursday, 1 April 2010, Greg Stein wrote:
>> 2010/3/31 Jan Horák <horak.honza_at_gmail.com>:
>> > 30.3.2010 13:55, Philipp Marek wrote:
>> >...
>> >
>> >> * Furthermore, how about allowing the plain data to reside in files?
>> >> Would make the database much smaller, and then these data blocks
>> >> could possibly be shared among multiple repositories.
>> >> (Really easy, too, if they're named by their SHA1, for example).
>> >> That should allow for zero-copy IO, too (at least for sending data).
>> >
>> > The question is, how much faster it would be.. I would like to make a
>> > simple test to simulate this soon and estimate the percentage
>> > difference..
>>
>> My gut says "not that much faster". In most scenarios, the network
>> bandwidth between the client/server will be the bottleneck. Reading
>> the data off a disk (rather than from a DB) is not going to make the
>> WAN connection any faster.
>>
>> On a LAN, you might have enough network bandwidth to see bottlenecks
>> on the server's I/O channel, but really... I remain somewhat doubtful.
>
> It's not about the raw speed alone.
>
> Of course, having to tell a database which BLOBs to fetch (which might readily
> be stored out-of-line, e.g. with PostgreSQL [pg_largeobject, TOAST]), getting
> them via a socket into a buffer, and writing the buffer to another socket,
> everything with protocol header adding/removing etc. *has* to be slower than an
> open()/sendfile()/close().
>
> It's a bit about latency (which might not be really an argument, as the
> database has to be queried anyway), and about CPU load.

Possibly. If you're bottlenecked on the network, then your CPU load
will (by definition) not be an issue. You have CPU load to spare.
That's for a single user. If you scale up to multiple users, then yes:
minimizing the CPU will be important. Each needs to do some work, then
sit on the outbound network. But you are using almost no CPU: you're
waiting on the disk or the RDBMS or the network.

> For small setups the HTTP server and the database will be on the same host; if
> we can increase the performance by 10% it means that 10% of all people won't
> need to buy a faster/larger server.

That doesn't follow.

>...

In any case... it is hard to truly say what will happen without some
data. There are lots of variables :-P
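
A rough way to collect some of that data might look like the sketch
below. It is only an illustration under stated assumptions: the function
names are hypothetical, a plain file read into a userspace buffer stands
in for fetching a BLOB from the database, and a local socketpair stands
in for the client connection. Python's socket.sendfile() uses
os.sendfile() where the platform supports it and falls back to plain
send() otherwise, so the second path is only truly zero-copy on such
platforms.

```python
import os
import socket
import tempfile
import threading
import time


def _drain(sock, total, sink):
    """Read up to `total` bytes from sock, appending each chunk to sink."""
    remaining = total
    while remaining > 0:
        chunk = sock.recv(min(65536, remaining))
        if not chunk:
            break
        sink.append(chunk)
        remaining -= len(chunk)


def send_via_buffer(path, sock, bufsize=65536):
    """File -> userspace buffer -> socket (assumed stand-in for the DB path)."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(bufsize)
            if not buf:
                break
            sock.sendall(buf)
            sent += len(buf)
    return sent


def send_via_sendfile(path, sock):
    """open()/sendfile()/close(): let the kernel move the data if it can."""
    with open(path, "rb") as f:
        return sock.sendfile(f)


def measure(sender, path):
    """Run one transfer; return (received bytes, elapsed seconds)."""
    size = os.path.getsize(path)
    a, b = socket.socketpair()
    sink = []
    reader = threading.Thread(target=_drain, args=(b, size, sink))
    reader.start()
    start = time.perf_counter()
    sender(path, a)
    reader.join()
    elapsed = time.perf_counter() - start
    a.close()
    b.close()
    return b"".join(sink), elapsed


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 << 20))  # 8 MiB of sample data
    try:
        for sender in (send_via_buffer, send_via_sendfile):
            _, seconds = measure(sender, tmp.name)
            print(sender.__name__, round(seconds, 4), "s")
    finally:
        os.unlink(tmp.name)
```

Wall-clock time over a local socketpair says little about a WAN; for
the CPU-load question, comparing time.process_time() around each
sender, with several transfers running at once, would probably be the
more telling number.
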
Cheers,
-g
Received on 2010-04-02 08:43:04 CEST