# GPU
COA - Computer Organization and Architecture
MIPS - Million Instructions Per Second
     - Microprocessor without Interlocked Pipelined Stages
VMIPS - Vectorized MIPS
- The FUs (functional units such as ALUs) can be fully parallel, or a
  combination of parallel and pipelined units with multiplied clock rate to match.
Vector register - generally holds 64 elements, each 64 bits wide
- GPU is organized into TPC->SM->SP
  - it also contains ROPs
  - each SM has
    - eight SP cores
    - two SFUs each with 4 floating point multipliers
    - MT instruction fetch and issue unit
    - an instruction cache, a read-only constant cache
    - a 16 KB read/write shared memory
  - each SP has
    - a scalar MAD unit
TPC - Texture/processor clusters
SM - Streaming Multiprocessor
SP - Scalar/Streaming processor
ROP - Raster operation processors or Render Output Unit
SFU - Special Function Unit(Transcendental functions like cosine, sine, etc.)
MT - multi thread issue
MAD - Multiply-add unit
ISA - Instruction Set Architecture
- Tesla architecture (./images/GeForce8800) comprises 8 TPCs, each with 2 SMs,
  each with 8 SPs
  - input assembler collects vertex work
  - Vertex work distributor distributes vertex work packets to TPCs
  - TPCs execute vertex/geometry shader programs
  - output data is written to on-chip buffers
  - buffers then pass their results to the viewport/clip/setup/raster/zcull
    block
SIMT - Single Instruction, Multiple Thread. Similar to SIMD (which applies one
instruction to multiple data lanes), except that one instruction is applied to
multiple independent threads in parallel. A SIMD instruction controls a vector
of data lanes together, whereas a SIMT instruction controls the execution and
branching behavior of one thread. The hardware runs a SIMT program by issuing
each instruction across a warp of threads in SIMD fashion.

## Nvidia
- each SM's multithreaded instruction unit creates, manages, schedules, and
  executes threads in groups of 32 threads called warps. Several warps form
  a block, and each SM runs one or more blocks. A Tesla SM has 8 SPs, fewer
  than a warp; later GPUs have at least one warp's worth (32) of SPs per SM.
- each SM manages a pool of 24 warps, with a total of 768 threads.
- a block holds at most 1024 threads, and an SM on many modern GPUs keeps up
  to 2048 threads resident. If an SM has fewer SPs than the block has threads,
  the block is executed in batches that are a multiple of the warp size.
- each SM maps warp threads to the SP cores
- In each issue cycle, the SM warp scheduler selects one of the 24 warps that
  is ready to execute
- An issued warp executes over four processor cycles (32 threads on 8 SPs)
- the SP cores and SFU units execute instructions independently
Control flow includes branch, call, return, trap (a program blocking itself and requesting an OS service), and barrier synchronization.
A vector register file is a collection of vector registers.
Vector registers are ordered collections of scalar registers and provide intermediate storage for the components of a moderately large vector.
The register file's registers are divided logically across the SIMD lanes.
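
The warp figures above (24 warps of 32 threads per SM, 8 SP cores per SM, four cycles per issued warp) follow from simple arithmetic. A minimal sketch; the function and constant names are made up for this note and are not part of any NVIDIA API:

```cpp
#include <cassert>

// Figures taken from the notes above for a Tesla-class SM.
constexpr int kWarpSize = 32;    // threads per warp
constexpr int kWarpsPerSM = 24;  // warps managed by one SM

// Threads one SM keeps in flight.
inline int threads_per_sm() { return kWarpsPerSM * kWarpSize; }

// Cycles needed to push one warp instruction through `sps_per_sm` SP cores,
// assuming each SP handles one thread per cycle (ceiling division).
inline int cycles_per_warp_instruction(int sps_per_sm) {
    assert(sps_per_sm > 0);
    return (kWarpSize + sps_per_sm - 1) / sps_per_sm;
}

// Warps needed to cover a block of `block_threads` threads.
inline int warps_per_block(int block_threads) {
    assert(block_threads > 0);
    return (block_threads + kWarpSize - 1) / kWarpSize;
}
```

With 8 SPs, `cycles_per_warp_instruction(8)` gives 4, and `threads_per_sm()` gives 768, matching the numbers quoted above.
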
- Fermi GTX 480
  - 16 SMs on the full Fermi die (15 are enabled on the shipping GTX 480), each
    with 32 SPs
  - a warp executes entirely within one SM
  - each thread has up to 64 registers of 32 bits, or equivalently 32 registers
    holding double-precision floating-point operands
  - so a warp has 32 double-precision vector registers of 32 elements each
- a warp typically contains 32 threads; if an SM has 128 SPs, then 4 warps can
  execute at a time on the SM.
- when an instruction in a warp is waiting for another operation to complete
  then the warp scheduler runs another warp while the blocking operation is
  completed.
- A warp typically requests 32 aligned 4-byte words (128 bytes) in one global
  memory transaction, which is called a coalesced memory transaction.
  - uncoalesced memory access is due to
    - offset request: a[tid+s] = a[tid+s]+1;
    - strided request: a[tid*s]=a[tid*s]+1;
- various software optimizations
  - memory access coalescing
  - optimizing reduction kernels
  - kernel fusion, thread and block coarsening
- GPU Topics
  - warp scheduling and divergence
  - OpenCL - runtime system
  - OpenCL - heterogeneous computing
  - Efficient Neural Network Training/Inferencing
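
The offset and strided access patterns listed under uncoalesced memory access can be compared with a small host-side model. This is a simplified sketch, not a description of real hardware: it assumes the array starts on a 128-byte boundary, that every thread reads one 4-byte word, and that memory is served in 128-byte segments. `segments_touched` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

constexpr std::size_t kWarpSize = 32;
constexpr std::size_t kWordBytes = 4;       // one float or int per thread
constexpr std::size_t kSegmentBytes = 128;  // 32 threads x 4-byte words

// Counts the distinct 128-byte segments one warp touches when thread `tid`
// accesses element a[tid * stride + offset].
inline std::size_t segments_touched(std::size_t stride, std::size_t offset) {
    assert(stride > 0);
    std::set<std::size_t> segments;
    for (std::size_t tid = 0; tid < kWarpSize; ++tid) {
        const std::size_t byte_address = (tid * stride + offset) * kWordBytes;
        segments.insert(byte_address / kSegmentBytes);
    }
    return segments.size();
}
```

In this model `a[tid]` touches one segment, `a[tid+1]` spills into a second one, and `a[tid*32]` touches 32 segments, one per thread.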

# OpenGL

## extension
- classified into Vendor, EXT and ARB
  - Vendor extensions are followed by vendor name AMD or NV. Other vendors can
    support same extension later on.
  - EXT extension are written together by two or more vendors.
  - ARB(Architecture Review Board) extensions are official OpenGL extensions.

- OpenGL normalized device coordinates range from -1 to 1 on each axis, with
  the center at (0, 0)
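
As a quick illustration of the -1 to 1 range, a window position in pixels can be mapped to that range as below. A minimal sketch: `window_to_ndc` is a hypothetical helper, and it assumes the pixel origin is at the bottom-left corner of the window.

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// Maps a position in pixels (origin at the bottom-left corner) to the
// [-1, 1] range on both axes, so the window center lands on (0, 0).
inline Vec2 window_to_ndc(float px, float py, float width, float height) {
    assert(width > 0.0f && height > 0.0f);
    return { 2.0f * px / width - 1.0f, 2.0f * py / height - 1.0f };
}
```

For an 800x600 window, `window_to_ndc(400, 300, 800, 600)` is the center `(0, 0)` and the bottom-left pixel corner maps to `(-1, -1)`.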

1. What is OpenGL (or Open Graphics Library)? Give a brief introduction to it.
A. OpenGL is a language-independent, cross-platform industry-standard API for
   rendering 2D and 3D graphics, usually hardware-accelerated on a GPU.

2. Name major competitors of OpenGL. Also give the main advantages and
   disadvantages OpenGL has over other graphics libraries on the market.
A. Direct3D, Vulkan, Metal

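3. Give the main advantages that OpenGL has over Microsoft’s proprietary
   Direct3D.
A. It is cross-platform and an open standard (a published specification that
   any vendor can implement), whereas Direct3D is tied to Microsoft platforms.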

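4. OpenGL is written in which language? Is it possible to implement (or use)
   the same library in programming languages other than that?
A. OpenGL is a specification rather than a single code base; the API is defined
   in terms of C, and implementations ship with GPU drivers. Yes, bindings
   exist for many other languages.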

5. Is OpenGL API platform independent? Is it possible to port the library to
   embedded systems such as mobile phones etc?
A. Yes. Yes.

6. Name a few OpenGL-related libraries that simplify programming by providing
   a layer of abstraction over OpenGL.
A. GLUT.

7. How can OpenGL be considered a state machine?
A. OpenGL keeps a context of current state, and each command either changes
   that state or uses it. For example, glBindBuffer(GL_ARRAY_BUFFER,
   (GLuint)vertexbuffer) makes vertexbuffer the current array buffer and
   glEnableVertexAttribArray(0) enables vertex attribute 0; both stay in effect
   until changed. Capabilities such as GL_TEXTURE_2D, GL_FOG, and GL_BLEND are
   switched on and off with glEnable and glDisable.

8. Explain OpenGL rendering pipeline.
A.
- Vertex specification: setting various buffers
- Vertex shader: transforms each vertex, typically from model space to clip
  space
- tessellation: subdivides a patch into smaller primitives
  - Tessellation Control Shader (TCS): works on a patch (a group of Control
    Points (CPs)) that defines some surface, usually as a polynomial of the
    CPs, and emits an output patch and Tessellation Levels (TLs). Moving a CP
    changes the output patch. The shader can transform, add, or delete CPs.
  - Primitive Generator (PG): subdivides the domain, which is either a
    normalized square of 2D coordinates or a triangle of 3D barycentric
    coordinates. The output topology can be configured to be either points or
    triangles. For a triangle domain, the TLs tell the PG the number of
    segments on each outer edge and the number of rings towards the center.
  - Tessellation Evaluation Shader (TES): runs once for every domain point the
    PG generates and outputs a vertex computed from the patch (position,
    normal, etc.) and the polynomial of the surface. The resulting vertices
    are sent down the pipeline as triangles.
- geometry shader: can generate additional primitives
  - Transform Feedback Buffer: when configured, captures the primitives
    produced by the vertex processing stages so that they can be fed back in
    as vertex input.
- vertex post processing: clipping
- primitive assembly: collects vertices into primitives
- rasterization: converts primitives into fragments (candidate pixels)
- fragment shader: computes the color of each fragment
- per-sample operations: depending on which ones the user has enabled, tests
  such as the pixel ownership test, scissor test (discards fragments that fall
  outside a rectangular region), stencil test (like the depth test, discards
  fragments based on the stencil buffer, which can be updated during
  rendering), and depth test are performed
```
// Vertex shader
#version 330 core

//  Input vertex data, different for all executions of this shader.
layout(location = 0) in vec3 vertexPosition_modelspace;
//Tut4 not tut5
layout(location = 1) in vec3 vertexColor;
out vec3 fragmentColor;

////Tut5 not tut4
//  layout(location=1) in vec2 vertexUV;
//  out vec2 UV;

uniform mat4 MVP;
void main(){
    //Tut2
    gl_Position = MVP * vec4(vertexPosition_modelspace,1);
    //Tut4
    fragmentColor=vertexColor;
//      //Tut5
//      UV =  vertexUV;
}

// Fragment shader (a separate source file)
#version 330 core

//Tut4 not tut5
in vec3 fragmentColor;

//  //Tut5 not tut4
//  in vec2 UV;
//  uniform sampler2D myTextureSampler;

//Tut2
out vec3 color;

void main()
{

//        //Tut2
//        color = vec3(1,0,0);
    //Tut 4 not tut5
    color = fragmentColor;
//      //Tut 5 not tut4
//      color = texture(myTextureSampler, UV).rgb;
}
```

9. What does the term rasterization mean? How is it different from vector
graphics?
A. Rasterization takes an image described in a vector format (primitives
defined by coordinates) and converts it into a grid of pixels. Vector graphics
keep the geometric description and stay resolution-independent.
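
One common way to turn a triangle into pixels is to test each pixel center against the three edges of the triangle. The sketch below is illustrative only: `edge` and `rasterize` are hypothetical names, the triangle is assumed to be counter-clockwise, and real GPUs apply extra tie-breaking rules for samples that fall exactly on an edge.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// Signed area test: positive when p lies to the left of the edge a -> b.
inline float edge(Vec2 a, Vec2 b, Vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Returns a row-major width x height coverage mask (1 = covered) for a
// counter-clockwise triangle, sampling each pixel at its center.
inline std::vector<int> rasterize(Vec2 a, Vec2 b, Vec2 c,
                                  std::size_t width, std::size_t height) {
    assert(edge(a, b, c) > 0.0f);  // counter-clockwise winding expected
    std::vector<int> mask(width * height, 0);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const Vec2 p{static_cast<float>(x) + 0.5f,
                         static_cast<float>(y) + 0.5f};
            const bool inside = edge(a, b, p) >= 0.0f &&
                                edge(b, c, p) >= 0.0f &&
                                edge(c, a, p) >= 0.0f;
            mask[y * width + x] = inside ? 1 : 0;
        }
    }
    return mask;
}
```

Rasterizing the triangle (0,0), (3.5,0), (0,3.5) on a 4x4 grid covers the six pixels whose centers satisfy x + y <= 3.5.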

10. How do we clear a window in OpenGL? Also write a code snippet for the same.
A. glClearColor(0.0f, 0.0f, 0.4f, 0.0f); sets the clear color, and
   glClear(GL_COLOR_BUFFER_BIT); then clears the window to that color.

11. How to apply color to a geometrical object? Give the syntax of glColor3f()
method.
A. glColor3f(r,g,b) sets drawing color.

12. What is difference between glColor3f() & glClearColor() ?
A. glClearColor sets the color used by glClear, whereas glColor3f sets the
current drawing color.

13. Under which circumstances is glFlush() used? How is it different from
glFinish()?
A. glFlush submits the queued commands to the GPU and returns without waiting
for the GPU to actually draw the pixels, while glFinish blocks until all
previously issued commands have completed.

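14. What kind of restrictions does OpenGL impose on primitive polygons? Why?
A. Polygons must be simple (their edges cannot intersect) and convex;
non-convex polygons are not guaranteed to be drawn as expected. The
restrictions keep polygon rasterization fast and simple in hardware.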

15. Specify the syntax of rendering a vertex in OpenGL.
A. glBegin(GL_POINTS);
      glVertex3f(0.25, 0.25, 0.0);
   glEnd();
   glFlush();
   
16. Using glBegin() & glEnd() how do we create primitive geometric drawings
such as quadrilaterals, polygons etc?
A. glBegin(GL_POLYGON);
      glVertex3f(0.25, 0.25, 0.0);
      glVertex3f(0.75, 0.25, 0.0);
      glVertex3f(0.75, 0.75, 0.0);
      glVertex3f(0.25, 0.75, 0.0);
   glEnd();
   glFlush();

17. What are vertex arrays? How do they help in increasing performance of
application?
A. A vertex array stores vertex data in arrays that are handed to OpenGL in one
call and rendered as the specified primitives, instead of one function call per
vertex. This cuts function-call overhead.

18. What are interleaved arrays? Where they are used?
A. Interleaved arrays store all attributes of a vertex (position, normal,
texture coordinates) next to each other in one array instead of in separate
arrays. They speed up rendering through spatial locality and more cache hits.
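
An interleaved layout is easiest to see as a struct. The sketch below assumes a vertex made of a position, a normal, and a texture coordinate (the `Vertex` struct and constant names are made up for this note); the resulting stride and byte offsets are the kind of values passed as the stride and pointer arguments of glVertexAttribPointer.

```cpp
#include <cassert>
#include <cstddef>

// One vertex with all of its attributes stored side by side.
struct Vertex {
    float position[3];
    float normal[3];
    float uv[2];
};

// Byte distance between the same attribute of two consecutive vertices.
constexpr std::size_t kStride = sizeof(Vertex);

// Byte offset of each attribute inside one vertex.
constexpr std::size_t kPositionOffset = offsetof(Vertex, position);
constexpr std::size_t kNormalOffset = offsetof(Vertex, normal);
constexpr std::size_t kUvOffset = offsetof(Vertex, uv);
```

With 4-byte floats the stride is 32 bytes and the offsets are 0, 12, and 24, so one cache line fetch brings in every attribute of a vertex.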

19. How to construct curved surfaces using polygon approximations?
A. By subdividing the curved surface into many small polygons; the more
polygons, the closer the approximation.

20. Explain 3d viewing pipeline.
A.

21. What is the set of operations that are needed to perform to display a 3d
representation over 2d screen?

22. Name the major stages of vertex transformations.
A. Model, View, Projection
   M  = T.R.S
   v' = P.V.M.v
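
The two formulas can be checked numerically with a few lines of matrix code. This is a sketch under assumptions: it uses the column-vector convention implied by v' = P.V.M.v, a uniform scale, a rotation about the z axis only, and hand-written helpers (`mul`, `translate`, `scale`, `rotate_z`, `transform`) rather than a real math library such as GLM.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // indexed as m[row][col]

inline Mat4 identity() {
    Mat4 m{};  // zero-initialized
    for (std::size_t i = 0; i < 4; ++i) m[i][i] = 1.0f;
    return m;
}

// Matrix product a * b.
inline Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            for (std::size_t k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Matrix times column vector.
inline Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t k = 0; k < 4; ++k)
            r[i] += m[i][k] * v[k];
    return r;
}

inline Mat4 translate(float x, float y, float z) {
    Mat4 m = identity();
    m[0][3] = x; m[1][3] = y; m[2][3] = z;
    return m;
}

inline Mat4 scale(float s) {
    Mat4 m = identity();
    m[0][0] = s; m[1][1] = s; m[2][2] = s;
    return m;
}

inline Mat4 rotate_z(float radians) {
    Mat4 m = identity();
    const float c = std::cos(radians);
    const float s = std::sin(radians);
    m[0][0] = c; m[0][1] = -s;
    m[1][0] = s; m[1][1] = c;
    return m;
}

// v' = P * V * M * v
inline Vec4 transform(const Mat4& P, const Mat4& V, const Mat4& M, const Vec4& v) {
    return mul(mul(mul(P, V), M), v);
}
```

With M = T.R.S built from a scale by 2, a 90 degree rotation about z, and a translation by (1, 0, 0), the point (1, 0, 0) is scaled to (2, 0, 0), rotated to (0, 2, 0), and translated to (1, 2, 0): the rightmost matrix is applied first.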

23. Name & give syntax of general purpose transformation commands.
A. glRotatef(angle, x, y, z), glScalef(x, y, z), glTranslatef(x, y, z).

24. Explain viewing & modeling transformations briefly.
A. The modeling transformation converts vertices from model coordinates to
world coordinates, and the viewing transformation converts world coordinates
to camera coordinates.

25. What is projection transformation? Give difference between perspective &
orthographic projections.
A. The projection transformation defines how the camera views the scene.
Projection is either orthographic or perspective. In orthographic projection,
the rays from the vertices travel parallel to each other towards the camera, so
size does not change with distance. In perspective projection, the rays from
the vertices converge at the camera, so distant objects appear smaller.
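
The difference shows up directly in the arithmetic. A minimal sketch, assuming a camera at the origin looking down the negative z axis and an image plane at distance `near_plane`; the function names are hypothetical and this is not the matrix form OpenGL itself uses:

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };
struct Vec2 { float x, y; };

// Orthographic: projectors are parallel, so depth is simply dropped.
inline Vec2 project_orthographic(Vec3 p) { return { p.x, p.y }; }

// Perspective: projectors meet at the eye, so x and y shrink with distance.
inline Vec2 project_perspective(Vec3 p, float near_plane) {
    assert(p.z < 0.0f && near_plane > 0.0f);  // point must be in front of the eye
    const float k = near_plane / -p.z;
    return { p.x * k, p.y * k };
}
```

Moving the point (2, 2) from z = -2 to z = -4 leaves its orthographic image at (2, 2) but halves its perspective image from (1, 1) to (0.5, 0.5).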

26. What do you understand by color perception?
A. It describes how the eye perceives color as a combination of red, green, and
blue wavelengths; OpenGL stores these components plus alpha (opacity). To get
the number of bit planes (or bits) per component, use glGetIntegerv() with
GL_(RED/GREEN/BLUE/ALPHA/INDEX)_BITS.

27. What is difference between color index mode & RGBA mode?
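A. Color index mode stores an index per pixel and uses a color map (lookup
table) to map indices to the RGBA colors of a palette, whereas RGBA mode stores
the red, green, blue, and alpha values directly.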

28. How do we specify color of a geometrical object in both RGBA & color index
mode?
A. With RGBA values (glColor*()) in RGBA mode, and with color indices
(glIndex*()) in color index mode.

29. What is a shading model? Why we need it? List shading models that are
available in OpenGL.
A. glShadeModel(GLenum mode);
GL_SMOOTH interpolates color between vertices smoothly while rasterizing;
GL_FLAT assigns the computed color of one vertex to all pixels of the
rasterized primitive.

30. What do you understand by hidden surface removal? Name few of algorithms
used for the same.
A. 
31. Give a brief comparison between ambient, diffuse, specular, & emissive
light.
32. Write a little code snippet for creating a light source.
33. What do you understand by attenuation factor? What is its role in lighting?
34. Name the lighting models that are available in OpenGL.
35. Explain the effect of diffuse & specular reflection over a material.
36. How to achieve lighting effects in color index mode?
37. What do you understand by antialiasing? Is it good to use antialiasing in
our application? If yes, then why do most applications not use it?
- removes jagged edges
  - SSAA (supersample antialiasing) renders a higher-resolution image and
    downsamples it
  - MSAA (multisample antialiasing): the rasterizer samples the underlying
    geometry at multiple locations within each pixel, not just at the pixel
    center.
- Mipmaps: bilinear and trilinear texture filtering.
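
The downsampling half of SSAA can be sketched as a box filter. The example assumes a grayscale image rendered at exactly twice the target resolution and averages each 2x2 block; `downsample_2x` is a hypothetical helper, and real implementations may use other filters.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Averages each 2x2 block of a row-major grayscale image whose size is
// (2 * width) x (2 * height), producing a width x height image.
inline std::vector<float> downsample_2x(const std::vector<float>& hi,
                                        std::size_t width, std::size_t height) {
    assert(hi.size() == 4 * width * height);
    const std::size_t hi_width = 2 * width;
    std::vector<float> lo(width * height);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const std::size_t top = (2 * y) * hi_width + 2 * x;
            const std::size_t bottom = top + hi_width;
            lo[y * width + x] =
                0.25f * (hi[top] + hi[top + 1] + hi[bottom] + hi[bottom + 1]);
        }
    }
    return lo;
}
```

A 2x2 block with one covered sample out of four becomes a pixel of intensity 0.25, which is what softens a jagged edge.
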
38. What do you understand by blending? What role it plays in rendering
graphics? How it can be implemented in OpenGL?
- it implements transparency.
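
The usual transparency blend weights the incoming color by its alpha and the existing color by one minus that alpha, which is the equation selected by glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA). A CPU-side sketch of that arithmetic; `Color` and `blend_over` are hypothetical names:

```cpp
#include <cassert>

struct Color { float r, g, b; };

// result = src * alpha + dst * (1 - alpha), applied per channel.
inline Color blend_over(Color src, float src_alpha, Color dst) {
    assert(src_alpha >= 0.0f && src_alpha <= 1.0f);
    const float k = 1.0f - src_alpha;
    return { src.r * src_alpha + dst.r * k,
             src.g * src_alpha + dst.g * k,
             src.b * src_alpha + dst.b * k };
}
```

Blending half-transparent red over blue gives (0.5, 0, 0.5); an alpha of 1 returns the source color unchanged.
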
39. Write a small program to add fog effect to the application.
40. Display lists play important role in OpenGL. What it is?
41. How Display lists are implemented in OpenGL?
42. How can we manage state of our application with display lists?
43. What is a BitMap? How it can be rendered in OpenGL?
44. Name & give syntax of methods used for reading, writing, & copying image
pixel data.
45. Briefly explain imaging pipeline.
46. What do you understand by Texture mapping? What are basic steps involved
in it?
47. What is the use case of texture arrays?
48. Explain usage of glTexEnv() or texturing methods.
49. What are main types of buffers OpenGL ecosystem consists of? Give their
uses also.
50. Name & explain tests that can applied to individual fragments .
51. What is Accumulation Buffer in context of OpenGL? For what purposes it is
used?
52. Can you name the OpenGL methods used in polygon tessellation?
53. what are VBO and VAO?
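A. A Vertex Buffer Object (VBO) is a buffer in GPU-accessible memory that holds
vertex data, while a Vertex Array Object (VAO) stores the vertex attribute
configuration: which attributes are enabled, their formats, and which VBOs
they read from.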
54. get attributes and uniforms in shaders
A. Variables shared between both examples:
```
GLint i;
GLint count;
GLint size; // size of the variable
GLenum type; // type of the variable (float, vec3 or mat4, etc)
const GLsizei bufSize = 16; // maximum name length
GLchar name[bufSize]; // variable name in GLSL
GLsizei length; // name length
glGetProgramiv(program, GL_ACTIVE_ATTRIBUTES, &count);
printf("Active Attributes: %d\n", count);
for (i = 0; i < count; i++) {
    glGetActiveAttrib(program, (GLuint)i, bufSize, &length, &size, &type, name);

    printf("Attribute #%d Type: %u Name: %s\n", i, type, name);
}
glGetProgramiv(program, GL_ACTIVE_UNIFORMS, &count);
printf("Active Uniforms: %d\n", count);
for (i = 0; i < count; i++) {
    glGetActiveUniform(program, (GLuint)i, bufSize, &length, &size, &type, name);
    printf("Uniform #%d Type: %u Name: %s\n", i, type, name);
}
```
55. gl functions
A.
glClearColor(0.0f, 0.0f, 0.4f, 0.0f); - sets the color that glClear writes to the color buffer
glEnable(GL_DEPTH_TEST); - enables an OpenGL capability, here the depth test
glDepthFunc(GL_LESS); - keeps only fragments nearer than the stored depth
// array holds data and its shape through glEnableVertexAttribArray
// glVertexAttribPointer for VBO
glGenVertexArrays(1, &VertexArrayID);
glBindVertexArray(VertexArrayID);
GLuint depthMatrixID = glGetUniformLocation(depthProgramID, "depthMVP");
GLuint vertexbuffer;
glGenBuffers(1, &vertexbuffer);
glBindBuffer(GL_ARRAY_BUFFER, vertexbuffer);
glBufferData(GL_ARRAY_BUFFER, indexed_vertices.size() * sizeof(glm::vec3),
             &indexed_vertices[0], GL_STATIC_DRAW);
glGenFramebuffers(1, &FramebufferName);
glBindFramebuffer(GL_FRAMEBUFFER, FramebufferName);
GLuint depthTexture;
glGenTextures(1, &depthTexture);
glBindTexture(GL_TEXTURE_2D, depthTexture);
glTexImage2D(GL_TEXTURE_2D, 0,GL_DEPTH_COMPONENT16, 1024, 1024, 0,
             GL_DEPTH_COMPONENT, GL_FLOAT, 0);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glFramebufferTexture(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, depthTexture, 0);
glDrawBuffer(GL_NONE); // no color buffer is written, only depth
glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE // true means the framebuffer is incomplete
GLuint TextureID  = glGetUniformLocation(programID, "myTextureSampler");
glUseProgram(depthProgramID);
glUniformMatrix4fv(depthMatrixID, 1, GL_FALSE, &depthMVP[0][0]);
glEnableVertexAttribArray(0);
glBindBuffer(GL_ARRAY_BUFFER, vertexbuffer);
glVertexAttribPointer(
	0,                  // The attribute we want to configure
	3,                  // size
	GL_FLOAT,           // type
	GL_FALSE,           // normalized?
	0,                  // stride
	(void*)0            // array buffer offset
);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, elementbuffer);
// Draw the triangles !
glDrawElements(
	GL_TRIANGLES,      // mode  
	indices.size(),    // count
	GL_UNSIGNED_SHORT, // type
	(void*)0           // element array buffer offset
);
glDisableVertexAttribArray(0);
glBindFramebuffer(GL_FRAMEBUFFER, 0); // to render to screen
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, Texture);
// Set our "myTextureSampler" sampler to use Texture Unit 0
glUniform1i(TextureID, 0);
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D, depthTexture);
glUniform1i(ShadowMapID, 1);
glDrawArrays (GL_TRIANGLES, 0, pObj->m_TotalVertices); //draws using vertices
glDeleteBuffers(1, &vertexbuffer);
glDeleteProgram(programID);
glDeleteTextures(1, &Texture);
glDeleteFramebuffers(1, &FramebufferName);

## GLFW
- multi-platform library that creates windows, contexts, and surfaces, and
  receives input and events
# GJK collision detection algorithm
The support function returns the point of a shape that is farthest along a
given direction.
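
For a convex polygon given as a list of vertices, the support function is a maximum over dot products. The sketch below is a 2D illustration with hypothetical names; `minkowski_support` shows how GJK typically combines the supports of two shapes A and B into a support point of their Minkowski difference A - B.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

inline float dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }

// Farthest vertex of a convex polygon along `direction`.
inline Vec2 support(const std::vector<Vec2>& shape, Vec2 direction) {
    assert(!shape.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < shape.size(); ++i) {
        if (dot(shape[i], direction) > dot(shape[best], direction)) best = i;
    }
    return shape[best];
}

// Support point of the Minkowski difference A - B along `direction`:
// farthest point of A along d minus farthest point of B along -d.
inline Vec2 minkowski_support(const std::vector<Vec2>& a,
                              const std::vector<Vec2>& b, Vec2 direction) {
    const Vec2 pa = support(a, direction);
    const Vec2 pb = support(b, Vec2{-direction.x, -direction.y});
    return { pa.x - pb.x, pa.y - pb.y };
}
```

For a unit square and the same square shifted 3 units to the right, the Minkowski support along +x is (-2, 0): even the farthest point in that direction does not reach the origin, which is how GJK concludes the shapes are separated along x.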