  1. Jan 27, 2014
    • dciao-kernelstructs: reuse sobres experiment for ISORC2014 · d307dd2e
      Christian Dietrich authored
      Differences:
      
      - The task activation order is determined in the faulty experiment as
        well as in the golden run (the latter is now done by
        fail-generic-tracing) by observing the variable fail_virtual_port.
      - A panic value is read from fail_virtual_port.
      - The golden-run task activation is determined by passing an extended
        trace to task_activation.py. The script collects all writes to
        fail_virtual_port and derives the activation order from them.
      
      Change-Id: Id401b78933b45a4b2cf031fc0a8b5ac90151ec24
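The trace-based step described above could look roughly like the following. This is a minimal sketch, not the actual task_activation.py: the event layout (tuples of kind, address, value), the port address, and the assumption that each written value identifies the activated task are all hypothetical and chosen for illustration only.

```python
# Hypothetical address of fail_virtual_port; the real address would come
# from the experiment binary and is not given in the commit message.
FAIL_VIRTUAL_PORT = 0x1000


def task_activations(events, port=FAIL_VIRTUAL_PORT):
    """Return the values written to `port`, in trace order.

    `events` is an iterable of (kind, address, value) tuples, where kind is
    "R" for a read and "W" for a write (an assumed, simplified trace format).
    """
    order = []
    for kind, address, value in events:
        if kind == "W" and address == port:
            order.append(value)  # assumed: the written value names the task
    return order


trace = [
    ("W", 0x2000, 7),  # unrelated memory write, ignored
    ("W", 0x1000, 1),  # write to the port
    ("R", 0x1000, 0),  # reads are ignored
    ("W", 0x1000, 3),
    ("W", 0x1000, 2),
]
print(task_activations(trace))
```

Comparing the list obtained from the golden run with the one obtained from a faulty run is then a plain list comparison.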
  2. Jan 24, 2014
  3. Jan 23, 2014
  4. Jan 22, 2014
  5. Jan 21, 2014
  6. Jan 20, 2014
    • jobclient: use initializer list · de39bf61
      Horst Schirmeier authored
      Change-Id: I7eb42f947bbabd61e1aad9224cedd7ffceec4f10
    • jobclient: initial number of jobs configurable · 5ffcb821
      Horst Schirmeier authored
      The new CLIENT_JOB_INITIAL configuration option lets the client
      request more than one job in the first request round.
      If a reasonable initial value is chosen, this removes the job ramp-up
      after each fail-client restart, and slightly improves overall
      throughput.
      
      Change-Id: Idac2721264ec264c520d341fac64a8311a974708
    • jobclient: expect communication failures · 2c31bf79
      Horst Schirmeier authored
      This change makes the JobClient act properly on communication aborts.
      
      Change-Id: I0a76489f117e9721546215e3b627002605e25452
    • jobclient: bugfix: faster shutdown at campaign end · 882d4f38
      Horst Schirmeier authored
      The JobClient currently waits a LONG time until it really shuts down
      after failing to reach the server in sendResultsToServer() (which is
      unfortunately by far the most probable point in the code to detect
      this):
      
       -  A different bug (fixed in the previous commit) caused far too
          many jobs to be fetched beforehand.
       -  sendResult() (called after each experiment iteration) realized
          that CLIENT_JOB_REQUEST_SEC seconds are over, and tried to
          prematurely call home to send first results (without planning to
          get new jobs yet).
       -  If the server was gone (done, or aborted), connect in
          sendResultsToServer() failed after several retries and timeouts.
       -  All subsequent calls to sendResult() retried connecting to the
          server (again, with retries and timeouts), once for each remaining
          job.
       -  When all jobs were done, getParam() tried to connect one last
          time, finally telling the experiment that nobody's home.
      
      This resulted in client shutdown times of up to four hours (for the
      default CLIENT_JOB_LIMIT of 1000) after the campaign server
      terminated.  This change solves the issue by not handing out new
      (cached) jobs after the connect failed once, making the experiment
      terminate quickly.
      
      Change-Id: I0d8cb2e084d783aca74c51a503fa72eb2b2eb0b7
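The effect of the fix can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the JobClient code: the class and method names are hypothetical, retries and timeouts are reduced to a single connect attempt, and results are reported after every job rather than every CLIENT_JOB_REQUEST_SEC seconds.

```python
from collections import deque


class CachingJobClient:
    """Toy client that caches fetched jobs and reports results to a server."""

    def __init__(self, jobs, connect):
        self._cache = deque(jobs)    # jobs already fetched from the server
        self._connect = connect      # callable: True if the server is reachable
        self._server_gone = False
        self.connect_attempts = 0

    def send_result(self, result):
        if self._server_gone:
            return False             # do not retry once a connect has failed
        self.connect_attempts += 1
        if not self._connect():
            self._server_gone = True
            return False
        return True

    def get_job(self):
        """Return the next cached job, or None if done or the server is gone."""
        if self._server_gone or not self._cache:
            return None
        return self._cache.popleft()


def run_experiment(client):
    """Process jobs until the client hands out no more; return the job count."""
    done = 0
    while (job := client.get_job()) is not None:
        done += 1
        client.send_result(job)
    return done


client = CachingJobClient(range(1000), connect=lambda: False)
print(run_experiment(client), client.connect_attempts)
```

With an unreachable server, the loop ends after the first failed connect instead of working through all 1000 cached jobs and attempting a connection for each.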
    • jobclient: bugfix: initialize timing statistics · ee7bc23d
      Horst Schirmeier authored
      If we don't properly initialize the job timing statistics, the number
      of jobs to be requested in the second request to the server is based
      on the wrong timings.  In our test case, CLIENT_JOB_LIMIT jobs were
      requested at once.
      
      Change-Id: I7e9d8ab6fe14e4488b3a74baf061d9a07f3a77c4
    • jobserver: bugfix: potential race · 1f6e275e
      Horst Schirmeier authored
      Delay insertion of to-be-sent jobs into m_runningJobs until they are
      really sent, as getMessage() won't work anymore (as in: segfault) if
      the job is concurrently re-sent (due to campaign end), its result is
      received, and it is deleted in the campaign.  This becomes
      non-hypothetical with larger values for CLIENT_JOB_LIMIT and
      CLIENT_JOB_REQUEST_SEC.
      
      Additionally, reinsert the remaining jobs into the input queue if
      communication fails, instead of inefficiently delaying redistribution
      until the campaign end.
      
      Change-Id: If85e3c8261deda86beb8d4d93343429223753f22
    • jobserver: outgoing jobqueue bounded by default · 128b54b0
      Horst Schirmeier authored
      Bounding the outgoing queue is always a good idea:  If the campaign has
      separate threads for outgoing and incoming jobs (true for the
      DatabaseCampaign), this keeps memory requirements reasonable.  If the
      campaign works in a single thread, this is not disadvantageous either.
      
      Change-Id: Ic75272daa8266f051adf7b23e2ffe87f5c965b86
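Why a bound keeps memory in check can be shown with a small producer/consumer sketch. It uses Python's standard queue module rather than the project's own queue class, and the function and parameter names are hypothetical.

```python
import queue
import threading


def run_campaign(num_jobs, out_queue_size):
    """Push num_jobs through a bounded queue; return (jobs sent, peak size)."""
    out_queue = queue.Queue(maxsize=out_queue_size)

    def producer():
        for job in range(num_jobs):
            out_queue.put(job)   # blocks while the queue is full
        out_queue.put(None)      # end-of-jobs marker

    thread = threading.Thread(target=producer)
    thread.start()

    sent = []
    peak = 0
    while True:
        peak = max(peak, out_queue.qsize())
        job = out_queue.get()
        if job is None:
            break
        sent.append(job)
    thread.join()
    return sent, peak


sent, peak = run_campaign(num_jobs=100, out_queue_size=4)
print(len(sent), peak <= 4)
```

The producer thread can never run more than out_queue_size jobs ahead of the consumer, so the number of queued jobs stays bounded regardless of campaign size.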
    • jobserver: use non-blocking accept · 73adc714
      Horst Schirmeier authored
      To allow the JobServer to shut down properly, the accept() loop in
      JobServer::run() needs to regularly check whether we're done.  This
      change introduces a timed, non-blocking variant of accept() into
      SocketComm to achieve this.
      
      Change-Id: Id411096be816c4ed6c7b0b37674410e22152eb22
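The general technique of a timed accept can be sketched with select(). This is an illustration in Python, not the SocketComm implementation; the function names and the is_done/handle callbacks are hypothetical.

```python
import select
import socket


def timed_accept(server_sock, timeout_sec):
    """Wait up to timeout_sec for a connection; return the socket or None."""
    readable, _, _ = select.select([server_sock], [], [], timeout_sec)
    if not readable:
        return None              # timeout: no client showed up
    conn, _addr = server_sock.accept()
    return conn


def serve(server_sock, is_done, handle, timeout_sec=0.1):
    """Accept loop that re-checks is_done() at least every timeout_sec."""
    while not is_done():
        conn = timed_accept(server_sock, timeout_sec)
        if conn is not None:
            handle(conn)
```

Because the loop never blocks longer than timeout_sec, a shutdown request is noticed within that interval even if no client ever connects.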
    • jobserver: join remaining threads on shutdown · 86716690
      Horst Schirmeier authored
      To avoid accessing destroyed resources in CommThreads talking to clients,
      we need to properly join them on shutdown.  The m_CommMutex becomes a
      JobServer member to make sure it isn't destroyed before the JobServer
      itself.
      
      Change-Id: I35b9fb93ace08a7a9476650f8f5e93597a3a8aa0
    • jobserver: synchronization cleanup · 8505ddbb
      Horst Schirmeier authored
      This change cleans up in/out queue synchronization in the job server.
      End-of-jobs conditions are now properly signaled through the
      SynchronizedQueue, which allows blocked readers to be resumed and
      aborted when no more input is expected.
      
      Change-Id: I3eaf37115ccf8c5b5afe3d971c7109cd62b68906
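The end-of-input signaling described above follows a common pattern, sketched here with a condition variable. This is an assumed, simplified design in Python, not the project's SynchronizedQueue; the method names are hypothetical, and None is used as the "no more input" return value, so None cannot be a queue item in this sketch.

```python
import threading
from collections import deque


class SynchronizedQueue:
    """Thread-safe queue whose readers can be told that no more input comes."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()
        self._finished = False

    def enqueue(self, item):
        with self._cond:
            if self._finished:
                raise RuntimeError("queue already finished")
            self._items.append(item)
            self._cond.notify()

    def finish(self):
        """Signal end of input and wake up all blocked readers."""
        with self._cond:
            self._finished = True
            self._cond.notify_all()

    def dequeue(self):
        """Block until an item arrives; return None once finished and empty."""
        with self._cond:
            while not self._items and not self._finished:
                self._cond.wait()
            if self._items:
                return self._items.popleft()
            return None
```

A reader blocked in dequeue() returns as soon as finish() is called, so worker threads can leave their loops without polling or timeouts.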
    • import-trace: emit warning for malformed traces · 84edd02b
      Horst Schirmeier authored
      The Fail* tools expect trace events to be ordered in a specific way:
      memory-access events are supposed to come *after* the instruction
      event for the instruction that caused them.  Using a different order
      may cause subtle problems with both fault-space pruning and fast
      forwarding.  This change introduces a warning message when such a
      malformed trace is detected (i.e., when the instruction pointer of a
      memory-access event does not match the preceding instruction event).
      
      Change-Id: I8ae7420fd8ff26e2574590748bdcc5a63db76490
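The ordering check described above can be sketched as follows. The event representation (tuples of kind and instruction pointer) is an assumption made for illustration and does not reflect the actual trace format used by import-trace.

```python
def find_malformed_events(events):
    """Return indices of memory-access events whose instruction pointer does
    not match that of the preceding instruction event.

    `events` is a sequence of (kind, ip) tuples with kind "instr" or "mem"
    (an assumed, simplified trace format).
    """
    last_ip = None
    bad = []
    for index, (kind, ip) in enumerate(events):
        if kind == "instr":
            last_ip = ip
        elif kind == "mem" and ip != last_ip:
            bad.append(index)
    return bad


events = [
    ("instr", 0x100),
    ("mem", 0x100),    # fine: follows its instruction event
    ("mem", 0x104),    # malformed: precedes the instruction event for 0x104
    ("instr", 0x104),
]
print(find_malformed_events(events))
```

A memory-access event that appears before any instruction event is also reported, since it has no preceding instruction pointer to match.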
    • Merge branch 'mysql-concurrency-fixes' · 5ac108ea
      Horst Schirmeier authored
    • use libmysqlclient_r to ensure thread safety · 84aac60a
      Horst Schirmeier authored
      According to
      <http://dev.mysql.com/doc/refman/5.5/en/c-api-threaded-clients.html>,
      (potentially) threaded clients should use the reentrant
      libmysqlclient_r.  This is just a precaution; I haven't seen any
      issues with the normal libmysqlclient.
      
      Change-Id: Icb29df6dd54eb666e3b43b73fbda406acccd11cb
    • DatabaseCampaign: run statistics update when finished · 8f9ee3fd
      Horst Schirmeier authored
      Change-Id: Ib68e54ba82e988db0d2d74ffafa6dc9bd54cd272
    • DatabaseCampaign: MySQL / concurrency fixes · 33b63651
      Horst Schirmeier authored
      According to
      <http://dev.mysql.com/doc/refman/5.5/en/c-api-threaded-clients.html>,
      a MySQL connection handle must not be used concurrently with an open
      result set and mysql_use_result() in one thread
      (DatabaseCampaign::run()), and mysql_query() in another
      (DatabaseCampaign::collect_result_thread()).  This indeed leads to
      crashes when bounding the outgoing job queue (SERVER_OUT_QUEUE_SIZE),
      and maybe even more insidious effects in other cases.  The solution is
      to create separate connections for both threads.
      
      Additionally, call mysql_library_init() before spawning any threads.
      
      Change-Id: I2981f2fdc67c9a2cbe8781f1a21654418f621aeb
  7. Jan 15, 2014
  8. Jan 14, 2014
  9. Jan 06, 2014
  10. Jan 03, 2014
  11. Dec 11, 2013
    • weather-monitor: now is a DatabaseCampaign · 0907dfb0
      Michael Lenz authored
      "removed" unnecessary memory-mapping ("Step 0")
      cleaned out ExperimentData - now consists only of fsppilot and resultset
      resultset now contains bitoffset which is part of result-table's primary key
      adapted code to work with msg.fsppilot() instead of ExperimentData-values
      
      Change-Id: I3b310e7a71d4b28479028250cd5722b3b2ce9f8c
  12. Dec 06, 2013