IBM Netezza Interview questions (1/3)

  1. What is Netezza?

Netezza is a datawarehouse appliance in which both h/w and s/w are tightly coupled to deliver best performance to the business.

Appliance is like a black box. For ex: Washing machine

 

  1. List various datawarehouse appliances available in the market? (or) who are all Netezza’s competitors?
    • Oracle Exadata
    • Teradata
    • EMC Greenplum
    • HP Vertica
    • Microsoft Parallel machine.

Netezza (vs) Hadoop:

Hadoop: For Unstructured data. For ex: Someone wishing you on your birthday

Netezza: Structured data. Registering for a facebook account.

 

  1. List various appliance components
  • SMP Hosts
  • Disks
  • Snippet Blades
  • Sub components: Disk Enclosures, SPA, SPU, FPGA

S-Blade = SPU

Storage Array 1, Storage Array 2.

Each Storage Array has 4 disk Enclosures. Each Disk Enclosure contains 12 disks of 1TB size.

Each S-Blade owns several disks, which reside in a storage array within the same rack.

Storage Array = Storage Disk = N disk enclosures

====

Storage Array 1  +  Chassis 1 => Snippet Processing Array (SPA) 1

Sorage Array2    + Chassis 2 => SPA 2

 

4. List few ETL tools?

Informatica, Datastage, Ab intio

 

5. List few BI tools?

Cognos, OBIEE, SSIS

 6. Where system catalog information is stored? In Netezza host or Netezza disks?

Netezza hosts.

For example: when we create a table, table definition is stored in host disks (df -k) and rows/records of this table are stored in Netezza disks

7. What are default userid’s and passwords in Netezza?

root, nz, admin

8. What is clustering in Netezza?

Clustering is nothing but grouping of similar things.  In Netezza, we have 2 SMP hosts. These 2 hosts are part of a cluster. In this cluster, one host is active and another host is passive.

9. List few back slash commands in Netezza

Back slash commands: To avoid basic day to day SQL

For example: \du is to display users. \l is to list the database users.

 

10. What is the difference between “nzsql” and “nzsql -E”

-E ==> To see the underlying query generated by back slash command

 

11. List 2 default databases in Netezza

SYSTEM, MASTER_DB

MASTER_DB: Is template database; helps us to create new databases faster.

12. What is the extent and page size in Netezza?

Extent size: 3MB

Page size: 128 KB

Note: The above are fixed. we can’t change them.

 

13. What are various maximums in Netezza?

Columns: 1600 per table or view

Maximum columns per distribution: 4

Max Length of database and column names: 128 bytes

Max number of characters in a char/varchar field: 64,000

Max connections to the server: 2000. Default: 500

Row size: 65,535 bytes

 

14. List few environment variables in Netezza

NZ_DATABASE

NZ_HOST

NZ_USER

NZ_PASSWORD

15. List constraints in Netezza

Primary Key (Not enforced)

Unique key (Not enforced)

Referential Integrity (Not enforced)

Not Null (Enforced)

Data type verification (Enforced)

16. Can you insert duplicate rows in Netezza?

Yes

17. Techniques to improve Netezza appliance performance?

As a developer:

– At design time, have a best distribution key. To avoid data skew

– Use appropriate data types. For example: If INT2 is good, then don’t use INT4. For age, Byteint (Int1) is good enough

18. What is a Snippet?

Snippet is a part of given query. When we submit a query to Netezza, it is divided into multiple snippets. These snippets are executed by Snippet(s)-Blades.

19. What are zonemaps?

Automatically created persistent internal tables.  Minimum & Maximum for  Integer & Date columns.

Helps Netezza to avoid unnecessary scanning of Extents (3MB)

Till Netezza 6.0: Zonemaps are till only at extent level

In Netezza 7.0, Zonemaps are even added at Page(128KB) level

Zonemaps (vs) Index: Zonemaps are like Anti-Indexes.

Index points to wherever the data is. But, Zonemaps point to where data doesn’t exist.

20. Generate Statistics (vs) Generate Express Statistics (vs) Automatic Statistics

When a user submits a query, Netezza optimizer decides on best execution path for that query.

For example: If I want to travel from Newyork to Chicago, what is my best mode of transportation in the resources(money & Time) available to me.

 

Generate statistics: Helps Netezza optimizer to know the details of tables such as number of rows, null/not nulls, etc in advance

 

Generate Statistics = Generate Express Statistics

Automatic Statistics: Automatically generated statistics.

Generate Statistics (Vs) Automatic Statistics:

With respect to sampling. Automatic statistics only gathers using little sampling of data. Where as Generate statistics, takes complete data as sample.

21. What are materialized views and advantages of creating them?

In Netezza, Materialized views are also called as SPM views.

S: Sorted

P: Projected

M: Materialized

 

The objective of SPM views is to avoid unnecessary I/O

Advantages:

If we are using only few columns of a major table in most of the queries, we can create a materialized view on these commonly used columns.

For example: create materialized view as select c_custname, c_age, c_address from custinfo order by c_custname.

If anyone fires a select statement to select customer name, the query gets the result from materialized view instead of main table.

The major advantage here is: Amount of data transferred from storage layer to database layer.

 

Can’t we use View?

View: Still has to go and get data from main table. The advantage with SPM views is: data is ordered

22. List 2 distribution methods in Netezza

Hashing Distribution

Round Robin

Advantage of Hashing: Co-located joins will happen here

 

23. What is the default distribution method in Netezza?

Hashing on first column

24. How many maximum columns you can use while distributing a table?

4

 

25. Can we update column(s) used in distribution key?

Nope.  If you have to update, create another table using “CTAS: Create table as … distribution key”

26. How do you change distribution key of already create table?

CTAS: Create table as …

27. What is CTAS? What is the default distribution that you get when you create table using CTAS?

CTAS: Create table as

The default is: whatever we have on original table

 

28. What is a collocated join?

The joins that might happen on Hashing distributed tables in which we don’t have to distribute data of tables at run time.

29. What is table skew? How do you check table is skewed or not?

Table skew: Un even distribution of data of a table across data slices.

SELECT datasliceid, count(datasliceid) from <tablename> group by  datasliceid;

 

30. What is broadcasting in Netezza? When Netezza uses broadcasting?

SELECT …

FROM customer, sales

Where customer.cid = sales.sid

If both tables in join doesn’t have same distribution key, then one or both tables have to be re-distributed at run time.

The decision of broadcasting is taken by Netezza: if one of the tables is way smaller than other table.

 

Thanks,
Ravi

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: