On Jul 25, 8:13 pm, "jbwest" <jbw...@[EMAIL PROTECTED]> wrote:
> "t0rakka" <t...@[EMAIL PROTECTED]> wrote in message
>
> news:W7mik.38746$_03.5595@[EMAIL PROTECTED]
>
> >> I am pretty new to OpenGL. After trying out a few examples, I
> >> downloaded the Mesa source code and looked at the 3D driver (ATI:
> >> r300) to see which operations are typically accelerated. I guess
> >> this is not a simple task that can be done quickly, with so many
> >> indirections in the code :-) Some are obvious, where ctx->Driver
> >> functions are initialized by the driver. Then there are the software
> >> fallback modules (vbo etc.) that also initialize the Driver
> >> functions. For example, I was trying to see whether vertex
> >> operations like glVertex2f() are accelerated. I was unable to find
> >> out. Is there a good document that describes how 3D driver
> >> acceleration works? I searched but could not find one. Any help on
> >> this would be appreciated.
>
> > First you have to understand what glVertex2f() implies: the data is
> > read from the parameters passed to the function. Usually this is a
> > bad sign; graphics processors often can only use local memory (memory
> > close to the graphics processor, on-chip, or at least on the same
> > discrete graphics board).
>
> > There are many different architectures. Some use so-called UMA, where
> > the system memory is shared between the graphics processor and the
> > application processor. This is very inefficient, as both contend for
> > access to the same memory. A more bandwidth-friendly approach is to
> > give the graphics processor its own dedicated memory. This can be,
> > and is, done on both discrete graphics boards and integrated graphics
> > (in various shapes and forms). However, this approach requires the
> > data in the so-called system memory (application-processor-specific
> > memory) to be copied to memory that the graphics processor can
> > access, which in turn adds extra overhead.
>
> > Regardless of the various tradeoffs between different designs, there
> > is a rule of thumb to follow: if possible, store the data in a
> > graphics-processor-specific storage location. In other words, use
> > Vertex Buffer Objects, or VBOs for short. This eliminates all kinds
> > of nasty things the graphics driver has to do and thereby improves
> > performance. Display Lists are also capable of this sort of
> > optimization.
>
> > glVertex2f() itself is not "accelerated"; it is just a way to input
> > vertex data. It just defines a new vertex. The draw call is where the
> > action takes place: the driver uses all the collected data to draw
> > primitive(s). I use the word "uses" because that doesn't necessarily
> > mean that anything is happening at that time; it could just update
> > the command buffer, if the driver has one, or a similar mechanism.
>
> > A good driver is asynchronous: the hardware does its tasks while
> > freeing the CPU to do its own. This means the internal buffers and
> > data are more dynamic than static, which requires some creativity
> > from the driver authors, though not that much. The point is that
> > unless you are aware of such things taking place, the driver code
> > might be slightly difficult to follow. But once you are, it's all
> > peachy.
>
> > The key thing is that glVertex2f and similar calls shouldn't cause
> > any register writes or other nasty business; they should just store
> > the data, which is then sent in one big packet for the GPU to think
> > about. That's less overhead than invoking some transfer mechanism for
> > each vertex (fetch-convert unit, DMA, register writes, whatever;
> > architecture- and platform-specific details which are too numerous to
> > enumerate).
>
> > ...
>
> Sun has (or used to have?) glVertexxx written as macros that write
> directly into a memory-mapped VRAM area.
> There's nothing in the spec per se that would prevent anyone from doing
> that "under the hood"
> (e.g., populating a "private" VBO).
>
> glVertex2f may cause a register write etc., e.g., when a buffer fills
> up.
>
> jbw

Or a lot of other things; that's the point. But usually batching is
more efficient; that's also the point. The old SGI boxes had a 1-to-1
mapping between OGL API functions and command buffer entries: a
command was basically an enumeration of the function name, then the
data. The hardware could deal with that. Very simple driver, but a very
inefficient approach when the primitive counts and the number of draw
calls are as high as they are these days.

The question doesn't state any preference for the hardware; a
reasonable assumption is contemporary mainstream hardware, but
that is just a guess, and as such any answer is an approximation at
best. Maybe I shouldn't bother.. or should ask more questions.. who
knows.. more dots.. some more...
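
To make the batching point concrete, here is a rough toy model in C. It
is not Mesa code and not any real driver: ToyDriver, toy_vertex2f,
toy_vertex2f_unbatched, toy_flush, toy_run and BATCH_CAPACITY are all
made-up names, and a "transfer" is only a counter standing in for a DMA
kick or register write. It just counts how often the hardware would be
touched when vertices are stored and sent in batches (flushing when the
buffer fills up, as jbw mentioned) versus a 1-to-1 style where every
vertex call is sent on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not Mesa or any real driver code. */
enum { BATCH_CAPACITY = 4 };      /* vertices per batch; tiny on purpose */

typedef struct {
    float  xy[BATCH_CAPACITY][2]; /* vertices staged on the CPU side */
    size_t count;                 /* vertices currently staged */
    size_t submissions;           /* simulated transfers to the "GPU" */
    size_t vertices_sent;         /* total vertices handed over */
} ToyDriver;

/* Stand-in for a DMA kick or register write: one transfer per call. */
static void toy_flush(ToyDriver *d)
{
    if (d->count == 0)
        return;                   /* nothing staged, nothing to send */
    d->submissions += 1;
    d->vertices_sent += d->count;
    d->count = 0;
}

/* Batched path: store the vertex, touch the "hardware" only when full. */
static void toy_vertex2f(ToyDriver *d, float x, float y)
{
    assert(d->count < BATCH_CAPACITY);
    d->xy[d->count][0] = x;
    d->xy[d->count][1] = y;
    d->count += 1;
    if (d->count == BATCH_CAPACITY)
        toy_flush(d);             /* the "buffer fills up" case */
}

/* 1-to-1 path: every vertex call becomes its own transfer. */
static void toy_vertex2f_unbatched(ToyDriver *d, float x, float y)
{
    toy_vertex2f(d, x, y);
    toy_flush(d);
}

/* Feed n vertices through either path and return the transfer count. */
static size_t toy_run(size_t n, int batched)
{
    ToyDriver d = {0};
    for (size_t i = 0; i < n; ++i) {
        if (batched)
            toy_vertex2f(&d, (float)i, 0.0f);
        else
            toy_vertex2f_unbatched(&d, (float)i, 0.0f);
    }
    toy_flush(&d);                /* end of frame: send the remainder */
    assert(d.vertices_sent == n);
    return d.submissions;
}
```

With 10 vertices and a batch capacity of 4, toy_run(10, 1) makes 3
transfers and toy_run(10, 0) makes 10. Real hardware and real drivers
differ in the details, so take the numbers as illustration only.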

