# GPU

- COA - Computer Organization and Architecture
- MIPS - Million Instructions Per Second; also Microprocessor without Interlocked Pipelined Stages
- VMIPS - Vector MIPS, the vector extension of the MIPS ISA
- The FUs (functional units such as ALUs) can be fully parallel, or a combination of parallel and pipelined units with the clock rate multiplied to match.
- Vector register - in VMIPS, generally 64 elements, each 64 bits wide
- A GPU is organized as TPC -> SM -> SP
    - it also contains ROPs
    - each SM has
        - eight SP cores
        - two SFUs, each with four floating-point multipliers
        - an MT instruction fetch and issue unit
        - an instruction cache and a read-only constant cache
        - a 16 KB read/write shared memory
    - each SP has
        - a scalar MAD unit

Abbreviations:

- TPC - Texture/Processor Cluster
- SM - Streaming Multiprocessor
- SP - Scalar/Streaming Processor
- ROP - Raster Operation Processor, also called Render Output Unit
- SFU - Special Function Unit (transcendental functions such as sine and cosine)
- MT - multithreaded issue
- MAD - Multiply-Add unit
- ISA - Instruction Set Architecture

- Tesla architecture (./images/GeForce8800) comprises 8 TPCs, each with 2 SMs, each with 8 SPs (128 SPs in total)
    - the input assembler collects vertex work
    - the vertex work distributor distributes vertex work packets to the TPCs
    - the TPCs execute vertex/geometry shader programs
    - output data is written to on-chip buffers
    - the buffers then pass their results to the viewport/clip/setup/raster/zcull block

SIMT - Single Instruction Multiple Thread. It is similar to SIMD, which applies one instruction to multiple data lanes, except that SIMT applies one instruction to multiple independent threads in parallel. A SIMD instruction controls a vector of data lanes together, whereas a SIMT instruction controls the execution and branching behavior of one thread. In hardware, the threads of a warp run a SIMT program as a sequence of SIMD-style instructions issued across the warp's lanes.

## Nvidia

- Each SM's multithreaded instruction unit creates, manages, schedules, and executes threads in groups of 32 threads called warps. Several warps form a block, and each SM runs one or more blocks.
- On Tesla an SM has only 8 SPs, fewer than a warp, so a warp is executed over several cycles; later architectures (for example Fermi with 32 SPs per SM) have at least one warp's worth of SPs per SM.
- On Tesla, each SM manages a pool of 24 warps, for a total of 768 threads.
- A block can contain at most 1024 threads. An SM can keep more threads than that resident at once (up to 2048 on many recent GPUs; the exact limit depends on the architecture). If an SM has fewer SPs than a block has threads, the block is executed in batches that are multiples of the warp size.
- Each SM maps the warp threads to the SP cores.
- In each scheduling cycle, the SM warp scheduler selects one of the 24 warps.
- An issued warp instruction executes over four processor cycles.
- The SP cores and SFU units execute instructions independently.

Control flow includes branch, call, return, trap (the program blocks itself and requests an OS service), and barrier synchronization.

The vector register file is a collection of vector registers. Vector registers are ordered collections of scalar registers and provide intermediate storage for the components of a moderately large vector. The registers of the register file are divided logically across the SIMD lanes.

- Fermi GTX 480
    - the full Fermi die has 16 SMs with 32 SPs each; the GTX 480 ships with 15 of them enabled (480 SPs)
    - a warp executes entirely on one SM
    - each thread has 64 registers of 32 bits, or equivalently 32 registers holding double-precision floating-point operands
    - so a warp can be viewed as having 32 double-precision vector registers of 32 elements each
- A warp typically contains 32 threads. If an SM has 128 SPs, 4 warps can be scheduled on the SM at a time.
- When an instruction in a warp is waiting for another operation to complete, the warp scheduler runs another warp while the blocking operation finishes.
- When the 32 threads of a warp access 32 consecutive, aligned 4-byte words (128 bytes), the request is served by a single global memory transaction. This is called a coalesced memory transaction.
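
As a rough illustration of the coalescing rule above, the following host-side C++ sketch (no GPU needed) counts how many 128-byte segments the 32 threads of a warp touch for a given indexing pattern. The helper name `segmentsTouched`, the 128-byte segment size, and the assumption that the array starts on a segment boundary are illustrative choices, not a model of any particular GPU's memory system.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>

// Hypothetical helper: counts the distinct 128-byte segments touched when
// each of the 32 threads of a warp reads one 4-byte word at a[index(tid)].
// Assumes the array base address is aligned to a segment boundary.
std::size_t segmentsTouched(const std::function<std::size_t(std::size_t)>& index) {
    assert(index && "index must be a callable function");
    constexpr std::size_t kWarpSize = 32;       // threads per warp
    constexpr std::size_t kWordBytes = 4;       // one float or int per thread
    constexpr std::size_t kSegmentBytes = 128;  // assumed transaction size

    std::set<std::size_t> segments;
    for (std::size_t tid = 0; tid < kWarpSize; ++tid) {
        // Byte address of the word, divided by the segment size, gives the
        // segment that has to be fetched for this thread.
        segments.insert(index(tid) * kWordBytes / kSegmentBytes);
    }
    return segments.size();
}
```

Under these assumptions, `segmentsTouched([](std::size_t tid) { return tid; })` gives 1 (fully coalesced), an offset pattern `tid + 1` gives 2, a stride-2 pattern `tid * 2` gives 2, and a stride-32 pattern `tid * 32` gives 32, one segment per thread.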
- Uncoalesced memory access is caused by
    - offset requests: a[tid+s] = a[tid+s]+1;
    - strided requests: a[tid*s] = a[tid*s]+1;
- Various software optimizations
    - memory access coalescing
    - optimizing reduction kernels
    - kernel fusion, thread and block coarsening
- GPU topics
    - warp scheduling and divergence
    - OpenCL: runtime system
    - OpenCL: heterogeneous computing
    - efficient neural network training/inference

# OpenGL

## extension

- Extensions are classified into vendor, EXT, and ARB.
- Vendor extensions carry the vendor tag, such as AMD or NV, in their name. Other vendors can support the same extension later on.
- EXT extensions are written jointly by two or more vendors.
- ARB (Architecture Review Board) extensions are official OpenGL extensions.
- OpenGL normalized device coordinates run from -1 to 1 on each axis, with the center at (0, 0).

1. What is OpenGL (Open Graphics Library)? Give a brief introduction.
    A. OpenGL is a language-independent, cross-platform industry-standard API for producing 2D and 3D graphics, normally accelerated by a graphics (SIMD) processor.
2. Name the major competitors of OpenGL. Also give the main advantages and disadvantages OpenGL has over other graphics libraries on the market.
    A. Direct3D, Vulkan, Metal.
3. Give the main advantages that OpenGL has over Microsoft's proprietary Direct3D.
    A. It is cross-platform and an open standard (open-source implementations such as Mesa exist).
4. In which language is OpenGL written? Is it possible to implement (or use) the same library in other programming languages?
    A. OpenGL is a specification whose API is defined in terms of C; implementations are typically written in C/C++. Yes, bindings exist for many other languages.
5. Is the OpenGL API platform independent? Is it possible to port the library to embedded systems such as mobile phones?
    A. Yes. Yes; OpenGL ES is the variant for embedded systems.
6. Name a few OpenGL-related libraries that simplify programming by providing a layer of abstraction over OpenGL.
    A. GLUT; GLFW (see the GLFW section at the end of these notes).
7. How can OpenGL be considered a state machine?
    A. OpenGL keeps a context of current state that later calls implicitly use. For example, glBindBuffer(GL_ARRAY_BUFFER, (GLuint)vertexbuffer) makes vertexbuffer the current array buffer, and glEnableVertexAttribArray(0) enables vertex attribute 0 until it is disabled again. Capabilities such as GL_TEXTURE_2D, GL_FOG, and GL_BLEND are switched on and off with glEnable and glDisable, one capability per call.
8. Explain the OpenGL rendering pipeline.
    A.
    - Vertex specification: setting up the vertex buffers and attributes
    - Vertex shader: transforms each vertex, typically from model space to clip space
    - Tessellation: subdivides a polygon into smaller polygons
        - Tessellation Control Shader (TCS): works on a patch (a group of Control Points, CPs) that defines some surface, usually through a polynomial of the CPs, and emits an output patch and Tessellation Levels (TLs). Moving a CP changes the output patch. The shader can transform, add, or delete CPs.
        - Primitive Generator (PG): subdivides the domain according to the TLs. The domain is either a normalized square of 2D coordinates or a triangle described by 3D barycentric coordinates. The output topology can be configured to be either points or triangles. Generally the TLs tell the PG the number of segments on the outer edge of the triangle and the number of rings towards the center.
        - Tessellation Evaluation Shader (TES): the PG executes the TES on every generated (barycentric) point. The TES outputs a vertex (position, normal, etc.) based on the patch and the polynomial of the surface. The resulting vertices are then sent down the pipeline as triangles.
    - Geometry shader: receives whole primitives and can emit additional primitives
    - Transform feedback buffer: when configured, captures the primitives produced by the vertex processing stages into buffers, which can then be fed back in as vertex input
    - Vertex post-processing: clipping
    - Primitive assembly: collecting vertices into ordered primitives
    - Rasterization: converting primitives to 2D fragments (candidate pixels)
    - Fragment shader: sets the color of each fragment
    - Per-sample operations: depending on whether the user has enabled them, tests such as the pixel ownership test, scissor test (discards fragments that fall outside a certain rectangular region), stencil test (like the depth test, it discards fragments based on the stencil buffer, which can be updated during rendering), and depth test are performed

    ```
    // ---- Vertex shader ----
    #version 330 core
    // Input vertex data, different for all executions of this shader.
    layout(location = 0) in vec3 vertexPosition_modelspace;
    // Tut4 not tut5
    layout(location = 1) in vec3 vertexColor;
    out vec3 fragmentColor;
    //// Tut5 not tut4
    // layout(location = 1) in vec2 vertexUV;
    // out vec2 UV;
    uniform mat4 MVP;

    void main() {
        // Tut2
        gl_Position = MVP * vec4(vertexPosition_modelspace, 1);
        // Tut4
        fragmentColor = vertexColor;
        // // Tut5
        // UV = vertexUV;
    }

    // ---- Fragment shader ----
    #version 330 core
    // Tut4 not tut5
    in vec3 fragmentColor;
    // // Tut5 not tut4
    // in vec2 UV;
    // uniform sampler2D myTextureSampler;
    // Tut2
    out vec3 color;

    void main() {
        // // Tut2
        // color = vec3(1, 0, 0);
        // Tut4 not tut5
        color = fragmentColor;
        // // Tut5 not tut4
        // color = texture(myTextureSampler, UV).rgb;
    }
    ```
9. What does the term rasterization mean? How is it different from vector graphics?
    A. Rasterization takes an image described in a vector graphics format (geometry given by coordinates) and converts it into a series of pixels, dots, or lines. Vector graphics stores the shapes themselves and is resolution independent; a raster image is a fixed grid of pixels.
10. How do we clear a window in OpenGL? Also write a code snippet for the same.
    A. Set the clear color with glClearColor(0.0f, 0.0f, 0.4f, 0.0f); and then clear the buffers with glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
11. How do we apply color to a geometrical object? Give the syntax of the glColor3f() method.
    A. glColor3f(r, g, b) sets the current drawing color; each component is a float in the range 0 to 1.
12. What is the difference between glColor3f() and glClearColor()?
    A. glClearColor sets the color used by glClear, whereas glColor3f sets the color used for drawing.
13. Under which circumstances is glFlush() used? How is it different from glFinish()?
    A. glFlush submits the queued commands to the GPU but does not wait for the GPU to actually draw the pixels, while glFinish blocks until the pixels have been drawn.
14. What kind of restrictions does OpenGL impose on primitive polygons? Why?
    A. The edges of a polygon cannot intersect, and non-convex polygons are not guaranteed to be drawn as expected. Restricting polygons to simple, convex shapes makes fast rasterization hardware easier to build.
15. Specify the syntax for rendering a vertex in OpenGL.
    A.
    ```
    glBegin(GL_POINTS);
    glVertex3f(0.25, 0.25, 0.0);
    glEnd();
    glFlush();
    ```
16. Using glBegin() and glEnd(), how do we create primitive geometric drawings such as quadrilaterals, polygons, etc.?
    A. Pass the primitive type (GL_QUADS, GL_POLYGON, ...) to glBegin() and list the vertices in order:
    ```
    glBegin(GL_POLYGON);
    glVertex3f(0.25, 0.25, 0.0);
    glVertex3f(0.75, 0.25, 0.0);
    glVertex3f(0.75, 0.75, 0.0);
    glVertex3f(0.25, 0.75, 0.0);
    glEnd();
    glFlush();
    ```
17. What are vertex arrays? How do they help in increasing the performance of an application?
    A. A vertex array is an array of vertices that is passed to the processor in one go and rendered as the specified primitives. This avoids one function call per vertex.
18. What are interleaved arrays? Where are they used?
    A. Interleaved arrays store the position, normal, and texture coordinates of each vertex next to each other in a single array. They speed up rendering because they give spatial locality and increase cache hits.
19. How do we construct curved surfaces using polygon approximations?
    A. By subdividing the curve into polygons.
20. Explain the 3D viewing pipeline.
    A.
21. What is the set of operations needed to display a 3D representation on a 2D screen?
22. Name the major stages of vertex transformations.
    A. Model, View, Projection. M = T.R.S and v' = P.V.M.v
23. Name and give the syntax of the general-purpose transformation commands.
    A. glRotatef(angle, x, y, z), glScalef(x, y, z), glTranslatef(x, y, z).
24. Explain viewing and modeling transformations briefly.
    A. The modeling transformation converts vertices to world coordinates, and the viewing transformation converts world coordinates to camera coordinates.
25. What is the projection transformation? Give the difference between perspective and orthographic projections.
    A. The projection transformation defines how the camera views the scene. It is either orthographic or perspective. In orthographic projection, the rays from the vertices travel parallel to each other towards the camera, so size does not depend on distance. In perspective projection, the rays from the vertices converge at the camera position, so distant objects appear smaller.
26. What do you understand by color perception?
    A. It explains how the eye perceives color as a combination of red, green, and blue wavelengths; OpenGL adds an alpha (opacity) component to these. To get the number of bit planes (bits) per color component, use glGetIntegerv() with GL_(RED/GREEN/BLUE/ALPHA/INDEX)_BITS.
27. What is the difference between color index mode and RGBA mode?
    A. Color index mode uses a color map lookup table to map indices to an RGBA color palette, whereas RGBA mode stores the color components directly.
28. How do we specify the color of a geometrical object in both RGBA and color index mode?
    A. With RGBA values (glColor*() or an array of RGBA values) in RGBA mode, and with color indices (glIndex*() or an array of color indices) in color index mode.
29. What is a shading model? Why do we need it? List the shading models that are available in OpenGL.
    A. The shading model decides how vertex colors are spread over a primitive. It is set with glShadeModel(GLenum mode); GL_SMOOTH interpolates the color between vertices while rasterizing, and GL_FLAT assigns the computed color of one vertex to all pixels of the rasterized primitive.
30. What do you understand by hidden surface removal? Name a few of the algorithms used for it.
    A.
31. Give a brief comparison between ambient, diffuse, specular, and emissive light.
32. Write a small code snippet for creating a light source.
33. What do you understand by attenuation factor? What is its role in lighting?
34. Name the lighting models that are available in OpenGL.
35. Explain the effect of diffuse and specular reflection on a material.
36. How do we achieve lighting effects in color index mode?
37. What do you understand by antialiasing? Is it good to use antialiasing in our application? If yes, then why do most applications not use it?
    - it removes jagged edges
    - SSAA (Super Sample Anti-Aliasing) generates a higher-resolution image and downsamples it
    - MSAA (Multi Sample Anti-Aliasing): the rasterizer samples the underlying geometry at multiple locations per pixel, not just at the pixel center
    - Mipmaps: bilinear and trilinear texture filtering
38. What do you understand by blending? What role does it play in rendering graphics? How can it be implemented in OpenGL?
    - it implements transparency
    - enable it with glEnable(GL_BLEND) and choose the blend factors with glBlendFunc(), for example glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)
39. Write a small program to add a fog effect to the application.
40. Display lists play an important role in OpenGL. What are they?
41. How are display lists implemented in OpenGL?
42. How can we manage the state of our application with display lists?
43. What is a bitmap? How can it be rendered in OpenGL?
44. Name and give the syntax of the methods used for reading, writing, and copying image pixel data.
45. Briefly explain the imaging pipeline.
46. What do you understand by texture mapping? What are the basic steps involved in it?
47. What is the use case of texture arrays?
48. Explain the usage of glTexEnv() or the texturing methods.
49. What are the main types of buffers in the OpenGL ecosystem? Give their uses as well.
50. Name and explain the tests that can be applied to individual fragments.
51. What is the accumulation buffer in the context of OpenGL? For what purposes is it used?
52. Can you name the OpenGL methods used in polygon tessellation?
53. What are VBO and VAO?
    A. A Vertex Buffer Object (VBO) is a high-speed memory buffer on the GPU that holds vertex data, while a Vertex Array Object (VAO) records the vertex attribute layout and which VBOs feed each attribute.
54. Get the attributes and uniforms in shaders
    A. Variables shared between both examples:
    ```
    GLint i;
    GLint count;
    GLint size;   // size of the variable
    GLenum type;  // type of the variable (float, vec3 or mat4, etc)
    const GLsizei bufSize = 16;  // maximum name length
    GLchar name[bufSize];        // variable name in GLSL
    GLsizei length;              // name length

    glGetProgramiv(program, GL_ACTIVE_ATTRIBUTES, &count);
    printf("Active Attributes: %d\n", count);
    for (i = 0; i < count; i++) {
        glGetActiveAttrib(program, (GLuint)i, bufSize, &length, &size, &type, name);
        printf("Attribute #%d Type: %u Name: %s\n", i, type, name);
    }

    glGetProgramiv(program, GL_ACTIVE_UNIFORMS, &count);
    printf("Active Uniforms: %d\n", count);
    for (i = 0; i < count; i++) {
        glGetActiveUniform(program, (GLuint)i, bufSize, &length, &size, &type, name);
        printf("Uniform #%d Type: %u Name: %s\n", i, type, name);
    }
    ```
55. GL functions
    A.
    ```
    glClearColor(0.0f, 0.0f, 0.4f, 0.0f);  // sets the color that glClear writes to the framebuffer
    glEnable(GL_DEPTH_TEST);               // enables an OpenGL feature (here: depth testing)
    glDepthFunc(GL_LESS);                  // keeps only the nearer fragments

    // The vertex array object holds the data layout that is set up through
    // glEnableVertexAttribArray and glVertexAttribPointer for the VBOs
    glGenVertexArrays(1, &VertexArrayID);
    glBindVertexArray(VertexArrayID);

    GLint depthMatrixID = glGetUniformLocation(depthProgramID, "depthMVP");

    GLuint vertexbuffer;
    glGenBuffers(1, &vertexbuffer);
    glBindBuffer(GL_ARRAY_BUFFER, vertexbuffer);
    glBufferData(GL_ARRAY_BUFFER, indexed_vertices.size() * sizeof(glm::vec3),
                 &indexed_vertices[0], GL_STATIC_DRAW);

    glGenFramebuffers(1, &FramebufferName);
    glBindFramebuffer(GL_FRAMEBUFFER, FramebufferName);

    GLuint depthTexture;
    glGenTextures(1, &depthTexture);
    glBindTexture(GL_TEXTURE_2D, depthTexture);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT16, 1024, 1024, 0,
                 GL_DEPTH_COMPONENT, GL_FLOAT, 0);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glFramebufferTexture(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, depthTexture, 0);

    // Selects the color buffer that fragment shader outputs are written to;
    // GL_NONE means no color buffer is written (depth only)
    glDrawBuffer(GL_NONE);

    if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
        // the framebuffer is incomplete; handle the error here
    }

    GLint TextureID = glGetUniformLocation(programID, "myTextureSampler");

    glUseProgram(depthProgramID);
    glUniformMatrix4fv(depthMatrixID, 1, GL_FALSE, &depthMVP[0][0]);

    glEnableVertexAttribArray(0);
    glBindBuffer(GL_ARRAY_BUFFER, vertexbuffer);
    glVertexAttribPointer(
        0,         // the attribute we want to configure
        3,         // size
        GL_FLOAT,  // type
        GL_FALSE,  // normalized?
        0,         // stride
        (void*)0   // array buffer offset
    );

    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, elementbuffer);
    // Draw the triangles!
    glDrawElements(
        GL_TRIANGLES,       // mode
        indices.size(),     // count
        GL_UNSIGNED_SHORT,  // type
        (void*)0            // element array buffer offset
    );
    glDisableVertexAttribArray(0);

    glBindFramebuffer(GL_FRAMEBUFFER, 0);  // to render to the screen

    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, Texture);
    // Set our "myTextureSampler" sampler to use Texture Unit 0
    glUniform1i(TextureID, 0);

    glActiveTexture(GL_TEXTURE1);
    glBindTexture(GL_TEXTURE_2D, depthTexture);
    glUniform1i(ShadowMapID, 1);

    glDrawArrays(GL_TRIANGLES, 0, pObj->m_TotalVertices);  // draws using the vertices directly

    glDeleteBuffers(1, &vertexbuffer);
    glDeleteProgram(programID);
    glDeleteTextures(1, &Texture);
    glDeleteFramebuffers(1, &FramebufferName);
    ```

## GLFW

- A multi-platform library that creates windows, contexts, and surfaces, and receives input and events.

# GJK collision detection algorithm

- The support function gives the farthest point of a shape in a given direction.
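
A minimal sketch of such a support function, assuming 2D convex polygons stored as vertex lists: the farthest point in a direction is the vertex with the largest dot product against that direction. The `Vec2` type and the names `support` and `minkowskiSupport` are hypothetical, chosen for illustration; the second helper shows how GJK usually combines two shapes' support points into a point of their Minkowski difference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D vector type used only for this sketch.
struct Vec2 {
    double x;
    double y;
};

double dot(const Vec2& a, const Vec2& b) {
    return a.x * b.x + a.y * b.y;
}

// Support function of a convex polygon: returns the vertex that lies
// farthest along `direction` (largest dot product). On ties, the first
// such vertex in the list is returned.
Vec2 support(const std::vector<Vec2>& vertices, const Vec2& direction) {
    assert(!vertices.empty());
    std::size_t best = 0;
    double bestDot = dot(vertices[0], direction);
    for (std::size_t i = 1; i < vertices.size(); ++i) {
        const double d = dot(vertices[i], direction);
        if (d > bestDot) {
            bestDot = d;
            best = i;
        }
    }
    return vertices[best];
}

// Support point of the Minkowski difference A - B in `direction`:
// the farthest point of A along the direction minus the farthest
// point of B along the opposite direction.
Vec2 minkowskiSupport(const std::vector<Vec2>& a,
                      const std::vector<Vec2>& b,
                      const Vec2& direction) {
    const Vec2 pa = support(a, direction);
    const Vec2 pb = support(b, Vec2{-direction.x, -direction.y});
    return Vec2{pa.x - pb.x, pa.y - pb.y};
}
```

For a unit square with vertices (0,0), (1,0), (1,1), (0,1), `support(square, Vec2{1, 1})` returns the corner (1,1). A linear scan is enough for small polygons; shapes such as circles or spheres would need their own analytic support function.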