| IntroductionSurprised, by the title?
| |
| | "Analyze table" to identify how the mysql
|
| well, this is a tour of how we cracked
| |
| | query works. This will give you insight
|
| the scalability jinx from handling a
| |
| | about why the query is slow, i.e whether
|
| meagre 40 records per second to 500
| |
| | it is using the correct indexes, whether
|
| records per second. Beware, most of the
| |
| | it is using a table level scan etc.*
|
| problems we faced were straight forward,
| |
| | Never delete rows when you deal with huge
|
| so experienced people might find this
| |
| | data in the order of 50,000 records in a
|
| superfluous.
| |
| | single table. Always try to do a "drop
|
| Contents* 1.0 Where were we?1.1 Memory
| |
| | table" as much as possible. If it is not
|
| hits the sky
| |
| | possible, redesign your schema, that is
|
| 1.2 Low processing rate
| |
| | your only way out!* Avoid unwanted
|
| 1.3 Data loss :-(
| |
| | join(s), don't be afraid to de-normalize
|
| 1.4 Mysql pulls us down
| |
| | (i.e duplicate the column values) Avoid
|
| 1.5 Slow Web Client* 2.0 Road to
| |
| | join(s) as much as possible, they tend to
|
| Nirvana2.1 Controlling memory!
| |
| | pull your query down. One hidden
|
| 2.2 Streamlining processing rate
| |
| | advantage is the fact that they impose
|
| 2.3 What data loss uh-uh?
| |
| | simplicity in your queries.* If you are
|
| 2.4 Tuning SQL Queries
| |
| | dealing with bulk data, always use "load
|
| 2.5 Tuning database schema
| |
| | data infile" there are two options here,
|
| 2.5 Mysql helps us forge ahead!
| |
| | local and remote. Use local if the mysql
|
| 2.6 Faster...faster Web Client* 3.0
| |
| | and the application are in the same
|
| Bottom lineWhere were we?Initially we had
| |
| | machine otherwise use remote.* Try to
|
| a system which could scale only upto 40
| |
| | split your complex queries into two or
|
| records /sec. I could even recollect
| |
| | three simpler queries. The advantages in
|
| the discussion, about "what should be
| |
| | this approach are that the mysql resource
|
| the ideal rate of records? ". Finally we
| |
| | is not hogged up for the entire process.
|
| decided that 40/sec was the ideal rate
| |
| | Tend to use temporary tables. Instead of
|
| for a single firewall. So when we have to
| |
| | using a single query which spans across
|
| go out, we atleast needed to support 3
| |
| | 5-6 tables.* When you deal with huge
|
| firewalls. Hence we decided that 120/sec
| |
| | amount of data, i.e you want to proces
|
| would be the ideal rate. Based on the
| |
| | say 50,000 records or more in a single
|
| data from our competitor(s) we came to
| |
| | query try using limit to batch process
|
| the conclusion that, they could support
| |
| | the records. This will help you scale the
|
| around 240/sec. We thought it was ok! as
| |
| | system to new heights* Always use smaller
|
| it was our first release. Because all the
| |
| | transaction(s) instead of large ones i.e
|
| competitors talked about the number of
| |
| | spanning across "n" tables. This locks up
|
| firewalls he supported but not on the
| |
| | the mysql resources, which might cause
|
| rate.Memory hits the skyOur memory was
| |
| | slowness of the system even for simple
|
| always hitting the sky even at 512MB!
| |
| | queries* Use join(s) on columns with
|
| (OutOfMemory exception) We blamed
| |
| | indexes or foreign keys* Ensure that the
|
| cewolf(s) inmemory caching of the
| |
| | the queries from the user interface have
|
| generated images.But we could not escape
| |
| | criteria or limit.* Also ensure that the
|
| for long! No matter whether we connected
| |
| | criteria column is indexed* Do not have
|
| the client or not we used to hit the sky
| |
| | the numeric value in sql criteria within
|
| in a couple of days max 3-4 days flat!
| |
| | quotes, because mysql does a type cast*
|
| Interestingly,this was reproducible when
| |
| | use temporary tables as much as possible,
|
| we sent data at very high rates(then), of
| |
| | and drop it...* Insert of select/delete
|
| around 50/sec. You guessed it right, an
| |
| | is a double table lock... be aware...*
|
| unlimited buffer which grows until it
| |
| | Take care that you do not pain the mysql
|
| hits the roof.Low processing rateWe were
| |
| | database with the frequency of your
|
| processing records at the rate of 40/sec.
| |
| | updates to the database. We had a typical
|
| We were using bulk update of
| |
| | case we used to dump to the database
|
| dataobject(s). But it did not give the
| |
| | after every 300 records. So when we
|
| expected speed! Because of this we
| |
| | started testing for 500/sec we started
|
| started to hoard data in memory resulting
| |
| | seeing that the mysql was literally
|
| in hoarding memory!Data Loss :-(At very
| |
| | dragging us down. That is when we
|
| high speeds we used to miss many a
| |
| | realized that the typicall at the rate of
|
| packet(s). We seemed to have little data
| |
| | 500/sec there is an "load data infile"
|
| loss, but that resulted in a memory hog.
| |
| | request every second to the mysql
|
| On some tweaking to limit the buffer size
| |
| | database. So we had to change to dump the
|
| we started having a steady data loss of
| |
| | records after 3 minutes rather than 300
|
| about 20% at very high rates.Mysql pulls
| |
| | records.Tuning database schemaWhen you
|
| us downWe were facing a tough time when
| |
| | deal with huge amount of data, always
|
| we imported a log file of about 140MB.
| |
| | ensure that you partition your data. That
|
| Mysql started to hog,the machine started
| |
| | is your road to scalability. A single
|
| crawling and sometimes it even stopped
| |
| | table with say 10 lakhs can never scale.
|
| responding.Above all, we started getting
| |
| | When you intend to execute queries for
|
| deadlock(s) and transaction timeout(s).
| |
| | reports. Always have two levels of
|
| Which eventually reduced the
| |
| | tables, raw tables one for the actual
|
| responsiveness of the system.Slow Web
| |
| | data and another set for the report
|
| ClientHere again we blamed the number of
| |
| | tables( the tables which the user
|
| graphs we showed in a page as the
| |
| | interfaces query on!) Always ensure that
|
| bottleneck, ignoring the fact that there
| |
| | the data on your report tables never
|
| were many other factors that were pulling
| |
| | grows beyond a limit. Incase you are
|
| the system down. The pages used to take
| |
| | planning to use Oracle, you can try out
|
| 30 seconds to load for a page with 6-8
| |
| | the partitioning based on criteria. But
|
| graphs and tables after 4 days at
| |
| | unfortunately mysql does not support
|
| Internet Data Center.Road To
| |
| | that. So we will have to do that.
|
| NirvanaControlling Memory!We tried to put
| |
| | Maintain a meta table in which you have
|
| a limit on the buffer size of 10,000, but
| |
| | the header information i.e which table to
|
| it did not last for long. The major flaw
| |
| | look for, for a set of given criteria
|
| in the design was that we assumed that
| |
| | normally time.* We had to walk through
|
| the buffer of around 10000 would suffice,
| |
| | our database schema and we added to add
|
| i.e we would be process records before
| |
| | some indexes, delete some and even
|
| the buffer of 10,1000 reaches. Inline
| |
| | duplicated column(s) to remove costly
|
| with the principle "Something can go
| |
| | join(s).* Going forward we realized that
|
| wrong it will go wrong!" it went wrong.
| |
| | having the raw tables as InnoDB was
|
| We started loosing data. Subsesquently we
| |
| | actually a overhead to the system, so we
|
| decided to go with a flat file based
| |
| | changed it to MyISAM* We also went to the
|
| caching, wherein the data was dumped into
| |
| | extent of reducing the number of rows in
|
| the flat file and would be loaded into
| |
| | static tables involved in joins* NULL in
|
| the database using "load data infile".
| |
| | database tables seems to cause some
|
| This was many times faster than an bulk
| |
| | performance hit, so avoid them* Don't
|
| insert via database driver. you might
| |
| | have indexes for columns which has
|
| also want to checkout some possible
| |
| | allowed values of 2-3* Cross check the
|
| optimizations with load data infile. This
| |
| | need for each index in your table, they
|
| fixed our problem of increasing buffer
| |
| | are costly. If the tables are of InnoDB
|
| size of the raw records.The second
| |
| | then double check their need. Because
|
| problem we faced was the increase of
| |
| | InnoDB tables seem to take around 10-15
|
| cewolf(s) in memory caching mechanism. By
| |
| | times the size of the MyISAM tables.* Use
|
| default it used
| |
| | MyISAM whenever there is a majority of ,
|
| "TransientSessionStorage" which caches
| |
| | either one of (select or insert)
|
| the image objects in memory, there seemed
| |
| | queries. If the insert and select are
|
| to be some problem in cleaning up the
| |
| | going to be more then it is better to
|
| objects, even after the rerferences were
| |
| | have it as an InnoDBMysql helps us forge
|
| lost! So we wrote a small "FileStorage"
| |
| | ahead!Tune your mysql server ONLY after
|
| implementation which store the image
| |
| | you fine tune your queries/schemas and
|
| objects in the local file. And would be
| |
| | your code. Only then you can see a
|
| served as and when the request comes in.
| |
| | perceivable improvement in performance.
|
| Moreover, we also implmentated a cleanup
| |
| | Here are some of the parameters that
|
| mechanism to cleanup stale images( images
| |
| | comes in handy:* Use the buffer pool size
|
| older than 10mins).Another interesting
| |
| | which will enable your queries to execute
|
| aspect we found here was that the Garbage
| |
| | faster --innodb_buffer_pool_size=64M
|
| collector had lowest priority so the
| |
| | for InnoDB and use --key-bufer-size=32M
|
| objects created for each records , were
| |
| | for MyISAM* Even simple queries started
|
| hardly cleaned up. Here is a little math
| |
| | taking more time than expected. We were
|
| to explain the magnitude of the problem.
| |
| | actually puzzled! We realized that mysql
|
| Whenever we receive a log record we
| |
| | seems to load the index of any table it
|
| created ~20 objects(hashmap,tokenized
| |
| | starts inserting on. So what typically
|
| strings etc) so at the rate of 500/sec
| |
| | happened was, any simple query to a table
|
| for 1 second, the number of objects was
| |
| | with 5-10 rows took around 1-2 secs. On
|
| 10,000(20*500*1). Due to the heavy
| |
| | further analysis we found that just
|
| processing Garbage collector never had a
| |
| | before the simple query , "load data
|
| chance to cleanup the objects. So all we
| |
| | infile" happened. This disappeared when
|
| had to do was a minor tweak, we just
| |
| | we changed the raw tables to MyISAM type,
|
| assigned "null" to the object references.
| |
| | because the buffer size for innodb and
|
| Voila! the garbage collector was never
| |
| | MyISAM are two different
|
| tortured I guess ;-)Streamlining
| |
| | configurations.for more configurable
|
| processing rateThe processing rate was at
| |
| | parameters see here.Tip: start your mysql
|
| a meagre 40/sec that means that we could
| |
| | to start with the following option
|
| hardly withstand even a small outburst of
| |
| | --log-error this will enable error
|
| log records! The memory control gave us
| |
| | loggingFaster...faster Web ClientThe user
|
| some solace,but the actual problem was
| |
| | interface is the key to any product,
|
| with the application of the alert filters
| |
| | especially the perceived speed of the
|
| over the records. We had around 20
| |
| | page is more important! Here is a list of
|
| properties for each record, we used to
| |
| | solutions and learnings that might come
|
| search for all the properties. We changed
| |
| | in handy:* If your data is not going to
|
| the implementation to match for those
| |
| | change for say 3-5 minutes, it is better
|
| properties we had criteria for! Moreover,
| |
| | to cache your client side pages* Tend to
|
| we also had a memory leak in the alert
| |
| | use Iframe(s)for inner graphs etc. they
|
| filter processing. We maintained a queue
| |
| | give a perceived fastness to your pages.
|
| which grew forever. So we had to maintain
| |
| | Better still use the javascript based
|
| a flat file object dumping to avoid
| |
| | content loading mechanism. This is
|
| re-parsing of records to form objects!
| |
| | something you might want to do when you
|
| Moreover, we used to do the act of
| |
| | have say 3+ graphs in the same page.*
|
| searching for a match for each of the
| |
| | Internet explorer displays the whole page
|
| property even when we had no alert
| |
| | only when all the contents are received
|
| criteria configured.What data loss
| |
| | from the server. So it is advisable to
|
| uh-uh?Once we fixed the memory issues in
| |
| | use iframes or javascript for content
|
| receiving data i.e dumping into flat
| |
| | loading.* Never use multiple/duplicate
|
| file, we never lost data! In addition to
| |
| | entries of the CSS file in the html page.
|
| that we had to remove a couple of
| |
| | Internet explorer tends to load each CSS
|
| unwanted indexes in the raw table to
| |
| | file as a separate entry and applies on
|
| avoid the overhead while dumping data.
| |
| | the complete page!Bottomline
|
| We hadd indexes for columns which could
| |
| | Your queries and schema make the system
|
| have a maximum of 3 possible values.
| |
| | slower! Fix them first and then blame
|
| Which actually made the insert slower and
| |
| | the database!See Also* High Performance
|
| was not useful.Tuning SQL QueriesYour
| |
| | Mysql* Query Performance* Explain Query*
|
| queries are your keys to performance.
| |
| | Optimizing Queries* InnoDB Tuning* Tuning
|
| Once you start nailing the issues, you
| |
| | MysqlCategories: Firewall Analyzer |
|
| will see that you might even have to
| |
| | Performance Tips
|
| de-normalize the tables. We did it! Here
| |
| | This page was last modified 18:00, 31
|
| is some of the key learnings:* Use
| |
| | August 2005.
|